qemu-devel

Re: [PATCH V5 15/23] migration: cpr-transfer mode


From: Steven Sistare
Subject: Re: [PATCH V5 15/23] migration: cpr-transfer mode
Date: Thu, 2 Jan 2025 15:05:51 -0500
User-agent: Mozilla Thunderbird

On 1/2/2025 2:57 PM, Peter Xu wrote:
On Thu, Jan 02, 2025 at 02:21:13PM -0500, Steven Sistare wrote:
On 12/24/2024 2:24 PM, Peter Xu wrote:
On Tue, Dec 24, 2024 at 08:17:00AM -0800, Steve Sistare wrote:
Add the cpr-transfer migration mode, which allows the user to transfer
a guest to a new QEMU instance on the same host with minimal guest pause
time, by preserving guest RAM in place, albeit with new virtual addresses
in new QEMU, and by preserving device file descriptors.  Pages that were
locked in memory for DMA in old QEMU remain locked in new QEMU, because the
descriptor of the device that locked them remains open.

cpr-transfer preserves memory and device descriptors by sending them to
new QEMU over a unix domain socket using SCM_RIGHTS.  Such CPR state cannot
be sent over the normal migration channel, because devices and backends
are created prior to reading the channel, so this mode sends CPR state
over a second "cpr" migration channel.  New QEMU reads the cpr channel
prior to creating devices or backends.  The user specifies the cpr channel
in the channel arguments on the outgoing side, and in a second -incoming
command-line parameter on the incoming side.
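The SCM_RIGHTS transfer described above can be sketched in C. This is a minimal, illustrative pair of helpers, not QEMU's implementation. The names send_fd and recv_fd are hypothetical. Each descriptor travels as ancillary data alongside one ordinary data byte, and the receiver gets a new descriptor number that refers to the same open file.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>
#include <unistd.h>

/* Send one descriptor over a connected AF_UNIX socket as SCM_RIGHTS
 * ancillary data; at least one ordinary data byte must accompany it. */
static int send_fd(int sock, int fd)
{
    char byte = 'F';
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;          /* forces cmsghdr alignment of buf */
    } ctl;
    struct msghdr msg;

    memset(&msg, 0, sizeof(msg));
    memset(&ctl, 0, sizeof(ctl));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    assert(cmsg != NULL);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one descriptor; returns the new descriptor number, or -1. */
static int recv_fd(int sock)
{
    char byte;
    struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
    union {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctl;
    struct msghdr msg;
    int fd = -1;

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = ctl.buf;
    msg.msg_controllen = sizeof(ctl.buf);

    if (recvmsg(sock, &msg, 0) != 1)
        return -1;

    struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);
    if (cmsg && cmsg->cmsg_level == SOL_SOCKET &&
        cmsg->cmsg_type == SCM_RIGHTS &&
        cmsg->cmsg_len == CMSG_LEN(sizeof(int)))
        memcpy(&fd, CMSG_DATA(cmsg), sizeof(int));
    return fd;
}
```

A socketpair() is enough to exercise both helpers within one process. This is also why the cpr channel must be a unix domain socket: SCM_RIGHTS is ancillary data that only AF_UNIX sockets carry.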

The user must start old QEMU with the '-machine aux-ram-share=on' option,
which allows anonymous memory to be transferred in place to the new process
by transferring a memory descriptor for each ram block.  Memory-backend
objects must have the share=on attribute, but memory-backend-epc is not
supported.
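Preserving RAM "in place, albeit with new virtual addresses" can be illustrated with a shareable memory descriptor mapped twice. This is a minimal sketch under some assumptions. It uses a Linux memfd to stand in for the per-RAM-block descriptor, and the helper names are hypothetical. The two mappings in one process stand in for old and new QEMU.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Create an anonymous RAM region that is reachable through a descriptor
 * (memfd_create needs Linux and glibc 2.27 or later). */
static int make_ram_fd(size_t size)
{
    assert(size > 0);
    int fd = memfd_create("guest-ram", MFD_CLOEXEC);
    if (fd < 0)
        return -1;
    if (ftruncate(fd, (off_t)size) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Map the region MAP_SHARED, so every mapping of fd sees the same pages,
 * each at whatever virtual address the kernel picks. */
static void *map_ram(int fd, size_t size)
{
    void *p = mmap(NULL, size, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    return p == MAP_FAILED ? NULL : p;
}
```

Data written through the first mapping is visible through the second at a different address. Sending the descriptor over the cpr channel lets a new process map the same pages without copying them.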

The user starts new QEMU on the same host as old QEMU, with command-line
arguments to create the same machine, plus the -incoming option for the
main migration channel, like normal live migration.  In addition, the user
adds a second -incoming option with channel type "cpr".  The CPR channel
address must be of a type, such as a unix domain socket, that supports
SCM_RIGHTS.

To initiate CPR, the user issues a migrate command to old QEMU, adding
a second migration channel of type "cpr" in the channels argument.
Old QEMU stops the VM, saves state to the migration channels, and enters
the postmigrate state.  New QEMU mmaps the memory descriptors, and execution
resumes.

The implementation splits qmp_migrate into start and finish functions.
Start sends CPR state to new QEMU, which responds by closing the CPR
channel.  Old QEMU detects the HUP, then calls finish, which connects the
main migration channel.
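The start/finish handshake relies on old QEMU noticing that new QEMU has closed the cpr channel. The sketch below detects that with plain poll(). It is an illustration only and does not describe how QEMU's event loop actually watches the channel. The helper name wait_for_hup is hypothetical.

```c
#include <assert.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Wait until the peer closes its end of the cpr channel.
 * Returns 0 once POLLHUP is reported, -1 on error or timeout. */
static int wait_for_hup(int sock, int timeout_ms)
{
    assert(sock >= 0);
    /* events = 0: POLLHUP is always reported in revents when it applies */
    struct pollfd pfd = { .fd = sock, .events = 0 };
    int n = poll(&pfd, 1, timeout_ms);
    if (n <= 0)
        return -1;
    return (pfd.revents & POLLHUP) ? 0 : -1;
}
```

In this model, old QEMU writes the CPR state and new QEMU reads it, then closes its end. wait_for_hup then returns 0, and the "finish" step may connect the main channel.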

In summary, the usage is:

    qemu-system-$arch -machine aux-ram-share=on ...

    start new QEMU with "-incoming <main-uri> -incoming <cpr-channel>"

    Issue commands to old QEMU:
      migrate_set_parameter mode cpr-transfer

      {"execute": "migrate", ...
          {"channel-type": "main"...}, {"channel-type": "cpr"...} ... }

Signed-off-by: Steve Sistare <steven.sistare@oracle.com>

Feel free to take:

Reviewed-by: Peter Xu <peterx@redhat.com>

I still have a few trivial comments.

[...]

diff --git a/migration/cpr.c b/migration/cpr.c
index 87bcfdb..584b0b9 100644
--- a/migration/cpr.c
+++ b/migration/cpr.c
@@ -45,7 +45,7 @@ static const VMStateDescription vmstate_cpr_fd = {
           VMSTATE_UINT32(namelen, CprFd),
           VMSTATE_VBUFFER_ALLOC_UINT32(name, CprFd, 0, NULL, namelen),
           VMSTATE_INT32(id, CprFd),

Could you remind me again when id != 0 will start to be used?

Each of vfio, iommufd, chardev, and tap will use id != 0.

I don't remember the details of the planned future series, but I just want
to mention that using integer IDs can be error-prone with device hot
plug/unplug.

QEMU has a known bug even now on some devices (e.g. slirp network backends):
if the src QEMU originally has two devices (e.g. id=1,2) and device id=1 is
unplugged (leaving id=2) before migrating, migration can fail because the
dest QEMU, started with only one device, has only id=1, a mismatched ID.

I recall that PCIe frontend devices are not prone to this issue; that should
depend on whether ->get_id() (qdev_get_dev_path?) is properly implemented to
generate a globally unique ID that is not affected by the order in which
devices are realized / created.

It could boil down to how the IDs are allocated: anything allocated on the
fly may not work well if there is no solid topology information to fetch.
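The failure mode described above can be modeled in a few lines. In this hypothetical model, a counter assigns IDs in creation order, so unplugging a device before migration leaves the source and destination numbering out of step. A topology-derived name, analogous to what a qdev_get_dev_path()-style ->get_id() provides, still matches. The struct, helpers, and path strings are all illustrative.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical device record: an order-based integer id plus a path-style
 * name that depends only on where the device sits, not on creation order. */
struct dev {
    int id;            /* allocated 1, 2, ... in creation order */
    const char *path;  /* stable, topology-derived name */
};

/* Allocate ids in creation order, as an on-the-fly counter would. */
static void assign_ids(struct dev *devs, int n)
{
    assert(n >= 0);
    for (int i = 0; i < n; i++)
        devs[i].id = i + 1;
}

/* Return 1 if every source device has a destination device with the same
 * key: the integer id when by_path == 0, the path when by_path == 1. */
static int keys_match(const struct dev *src, int ns,
                      const struct dev *dst, int nd, int by_path)
{
    assert(ns >= 0 && nd >= 0);
    for (int i = 0; i < ns; i++) {
        int found = 0;
        for (int j = 0; j < nd && !found; j++)
            found = by_path ? strcmp(src[i].path, dst[j].path) == 0
                            : src[i].id == dst[j].id;
        if (!found)
            return 0;
    }
    return 1;
}
```

For example, assume the source creates "/net-a" (id=1) and "/net-b" (id=2) and then unplugs id=1. The destination starts with only "/net-b", which becomes id=1. Matching by id then fails, while matching by path succeeds.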

I wonder if CPR could be prone to this too when using IDs, just FYI.  It
might be a good idea if integer IDs could be avoided somehow.  But you
definitely have the best picture of the whole thing, so this may or may not
apply.

Thanks for the thought, but I don't use such ids.
I use them for things like vfio interrupt index 0, 1, 2, etc.
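One reading of this reply is that the id is local to a named object. Each saved descriptor would then be found by the pair (name, id), where id enumerates the descriptors of one object, such as vfio interrupt indexes. Under that reading, a plain table lookup is enough. The sketch below is hypothetical and is not QEMU's cpr code. The struct mirrors the name and id fields from the vmstate_cpr_fd hunk, but uses a NUL-terminated name in place of name/namelen.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical saved-descriptor entry: name identifies the object, and
 * id is an index within it (e.g. interrupt index 0, 1, 2, ...). */
struct saved_fd {
    const char *name;
    int id;
    int fd;
};

/* Find the descriptor saved under (name, id); returns -1 if absent. */
static int find_saved_fd(const struct saved_fd *tab, size_t n,
                         const char *name, int id)
{
    assert(tab != NULL || n == 0);
    for (size_t i = 0; i < n; i++)
        if (tab[i].id == id && strcmp(tab[i].name, name) == 0)
            return tab[i].fd;
    return -1;
}
```

Because the index is scoped by name, its value does not depend on how many other devices were plugged or unplugged, provided the name itself is stable.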

- Steve
