Re: [PATCH V7 00/24] Live update: cpr-transfer
From: Fabiano Rosas
Subject: Re: [PATCH V7 00/24] Live update: cpr-transfer
Date: Mon, 27 Jan 2025 12:39:18 -0300
Steve Sistare <steven.sistare@oracle.com> writes:
> What?
>
> This patch series adds the live migration cpr-transfer mode, which
> allows the user to transfer a guest to a new QEMU instance on the same
> host with minimal guest pause time, by preserving guest RAM in place,
> albeit with new virtual addresses in new QEMU, and by preserving device
> file descriptors.
>
> The new user-visible interfaces are:
> * cpr-transfer (MigMode migration parameter)
> * cpr (MigrationChannelType)
> * incoming MigrationChannel (command-line argument)
> * aux-ram-share (machine option)
>
> The user sets the mode parameter before invoking the migrate command.
> In this mode, the user starts new QEMU on the same host as old QEMU, with
> the same arguments as old QEMU, plus two -incoming options; one for the main
> channel, and one for the CPR channel. The user issues the migrate command to
> old QEMU, which stops the VM, saves state to the migration channels, and
> enters the postmigrate state. Execution resumes in new QEMU.
>
> Memory-backend objects must have the share=on attribute, but
> memory-backend-epc is not supported. The VM must be started with the
> '-machine aux-ram-share=on' option, which allows auxiliary guest memory to
> be transferred in place to the new process.
>
> This mode requires a second migration channel of type "cpr", in the channel
> arguments on the outgoing side, and in a second -incoming command-line
> parameter on the incoming side. This CPR channel must support file descriptor
> transfer with SCM_RIGHTS, i.e. it must be a UNIX domain socket.
>
> Why?
>
> This mode has less impact on the guest than any other method of updating
> in place. The pause time is much lower, because devices need not be torn
> down and recreated, DMA does not need to be drained and quiesced, and minimal
> state is copied to new QEMU. Further, there are no constraints on the guest.
> By contrast, cpr-reboot mode requires the guest to support S3 suspend-to-ram,
> and suspending plus resuming vfio devices adds multiple seconds to the
> guest pause time.
>
> These benefits all derive from the core design principle of this mode,
> which is preserving open descriptors. This approach is very general and
> can be used to support a wide variety of devices that do not have hardware
> support for live migration, including but not limited to: vfio, chardev,
> vhost, vdpa, and iommufd. Some devices need new kernel software interfaces
> to allow a descriptor to be used in a process that did not originally open it.
>
> How?
>
> All memory that is mapped by the guest is preserved in place. Indeed,
> it must be, because it may be the target of DMA requests, which are not
> quiesced during cpr-transfer. All such memory must be mmap'able in new QEMU.
> This is easy for named memory-backend objects, as long as they are mapped
> shared, because they are visible in the file system in both old and new QEMU.
> Anonymous memory must be allocated using memfd_create rather than MAP_ANON,
> so the memfd's can be sent to new QEMU. Pages that were locked in memory
> for DMA in old QEMU remain locked in new QEMU, because the descriptor of
> the device that locked them remains open.
>
> cpr-transfer preserves descriptors by sending them to new QEMU via the CPR
> channel, which must support SCM_RIGHTS, and by sending the unique name of
> each descriptor to new QEMU via CPR state.
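
As a rough sketch of the mechanism described above (not the series' implementation, which lives in the QEMUFile and cpr-state code), a descriptor can be sent together with its name over a UNIX domain socket using SCM_RIGHTS. This assumes Python 3.9+ for socket.send_fds/recv_fds; the helper names and the name "demo-pipe" are hypothetical:

```python
import os
import socket


def send_named_fd(sock, name, fd):
    """Send one descriptor plus its name; the fd travels as SCM_RIGHTS data."""
    socket.send_fds(sock, [name.encode()], [fd])


def recv_named_fd(sock):
    """Receive a (name, fd) pair sent by send_named_fd."""
    data, fds, _flags, _addr = socket.recv_fds(sock, 1024, 1)
    return data.decode(), fds[0]


def demo_fd_transfer():
    # A socketpair stands in for the CPR channel between old and new QEMU.
    old_end, new_end = socket.socketpair(socket.AF_UNIX, socket.SOCK_STREAM)
    read_fd, write_fd = os.pipe()  # stand-in for a device descriptor

    send_named_fd(old_end, "demo-pipe", write_fd)
    name, received_fd = recv_named_fd(new_end)

    # The sender can close its copy; the received descriptor stays valid.
    os.close(write_fd)
    os.write(received_fd, b"ok")
    result = os.read(read_fd, 2)

    for fd in (received_fd, read_fd):
        os.close(fd)
    old_end.close()
    new_end.close()
    return name, result
```

The receiver gets a new descriptor number that refers to the same open file, which is why the name has to travel with it: the number itself is not meaningful across processes.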
>
> For device descriptors, new QEMU reuses the descriptor when creating the
> device, rather than opening it again. For memfd descriptors, new QEMU
> mmap's the preserved memfd when a ramblock is created.
>
> CPR state cannot be sent over the normal migration channel, because devices
> and backends are created prior to reading the channel, so this mode sends
> CPR state over a second "cpr" migration channel. New QEMU reads the second
> channel prior to creating devices or backends.
>
> Example:
>
> In this example, we simply restart the same version of QEMU, but in
> a real scenario one would use a new QEMU binary path in terminal 2.
>
> Terminal 1: start old QEMU
> # qemu-kvm -qmp stdio \
>     -object memory-backend-file,id=ram0,size=4G,mem-path=/dev/shm/ram0,share=on \
>     -m 4G -machine aux-ram-share=on ...
>
> Terminal 2: start new QEMU
> # qemu-kvm -monitor stdio ... -incoming tcp:0:44444 \
>     -incoming '{"channel-type": "cpr",
>                 "addr": { "transport": "socket", "type": "unix",
>                           "path": "cpr.sock"}}'
>
> Terminal 1:
> {"execute":"qmp_capabilities"}
>
> {"execute": "query-status"}
> {"return": {"status": "running",
> "running": true}}
>
> {"execute":"migrate-set-parameters",
> "arguments":{"mode":"cpr-transfer"}}
>
> {"execute": "migrate", "arguments": { "channels": [
> {"channel-type": "main",
> "addr": { "transport": "socket", "type": "inet",
> "host": "0", "port": "44444" }},
> {"channel-type": "cpr",
> "addr": { "transport": "socket", "type": "unix",
> "path": "cpr.sock" }}]}}
>
> {"execute": "query-status"}
> {"return": {"status": "postmigrate",
> "running": false}}
>
> Terminal 2:
> QEMU 10.0.50 monitor - type 'help' for more information
> (qemu) info status
> VM status: running
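
For scripting the Terminal 1 steps, the two QMP commands from the example can be generated rather than typed. The sketch below only builds the JSON text; the function name is hypothetical, and delivering the commands to the QMP monitor is left out:

```python
import json


def build_cpr_transfer_commands(host, port, cpr_socket_path):
    """Return the mode-setting and migrate commands as JSON strings."""
    set_mode = {
        "execute": "migrate-set-parameters",
        "arguments": {"mode": "cpr-transfer"},
    }
    migrate = {
        "execute": "migrate",
        "arguments": {
            "channels": [
                # Main migration channel, as in the example above.
                {"channel-type": "main",
                 "addr": {"transport": "socket", "type": "inet",
                          "host": host, "port": str(port)}},
                # CPR channel: a UNIX socket, so descriptors can be passed.
                {"channel-type": "cpr",
                 "addr": {"transport": "socket", "type": "unix",
                          "path": cpr_socket_path}},
            ]
        },
    }
    return [json.dumps(set_mode), json.dumps(migrate)]
```

Calling build_cpr_transfer_commands("0", 44444, "cpr.sock") reproduces the arguments used in the example.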
>
> This patch series implements a minimal version of cpr-transfer. Additional
> series are ready to be posted to deliver the complete vision described
> above, including
> * vfio
> * chardev
> * vhost and tap
> * blockers
> * cpr-exec mode
> * iommufd
>
> Changes in V2:
> * cpr-transfer is the first new mode proposed, and cpr-exec is deferred
> * anon-alloc does not apply to memory-backend-object
> * replaced hack with proper synchronization between source and target
> * defined QEMU_CPR_FILE_MAGIC
> * addressed misc review comments
>
> Changes in V3:
> * added cpr-transfer to migration-test
> * documented cpr-transfer in CPR.rst
> * fix size_t trace format for 32-bit build
> * drop explicit fd value in VMSTATE_FD
> * defer cpr_walk_fd() and cpr_resave_fd() to later series
> * drop "migration: save cpr mode".
> delete mode from cpr state, and use cpr_uri to infer transfer mode.
> * drop "migration: stop vm earlier for cpr"
> * dropped cpr helpers, to be re-added later when needed
> * fixed an unreported bug for cpr-transfer and migrate cancel
> * documented cpr-transfer restrictions in qapi
> * added trace for cpr_state_save and cpr_state_load
> * added ftruncate to "preserve ram blocks"
>
> Changes in V4:
> * cleaned up qtest deferred connection code
> * renamed pass_fd -> can_pass_fd
> * squashed patch "split qmp_migrate"
> * deleted cpr-uri and its patches
> * added cpr channel and its patches
> * added patch "hostmem-shm: preserve for cpr"
> * added patch "fd-based shared memory"
> * added patch "factor out allocation of anonymous shared memory"
> * added RAM_PRIVATE and its patch
> * added aux-ram-share and its patch
>
> Changes in V5:
> * added patch 'enhance migrate_uri_parse'
> * supported dotted keys for -incoming channel,
> and rewrote incoming_option_parse
> * moved migrate_fd_cancel -> vm_resume to "stop vm earlier for cpr"
> in a future series.
> * updated command-line definition for aux-ram-share
> * added patch "resizable qemu_ram_alloc_from_fd"
> * rewrote patch "fd-based shared memory"
> * fixed error message in qemu_shm_alloc
> * added patch 'tests/qtest: optimize migrate_set_ports'
> * added patch 'tests/qtest: enhance migration channels'
> * added patch 'tests/qtest: assert qmp_ready'
> * modified patch 'migration-test: cpr-transfer'
> * polished the documentation in CPR.rst, qapi, and the
> cpr-transfer mode commit message
> * updated to master, and resolved massive context diffs for migration tests
>
> Changes in V6:
> * added RB's and Acks.
> * in patch "assert qmp_ready", deleted qmp_ready and checked qmp_fd instead.
> renamed patch to "assert qmp connected"
> * factored out fix into new patch
> "fix qemu_ram_alloc_from_fd size calculation"
> * deleted a redundant call to migrate_hup_delete
> * added commit message to "migration: cpr-transfer documentation"
> * polished the text of cpr-transfer mode in qapi
>
> Changes in V7:
> * fixed cpr-transfer test failure for s390
> * fixed machine_get_aux_ram_share compilation error for Windows
> * fixed size_t print format compilation error for misc architectures
> * fixed memory leaks in cpr_transfer_output, cpr_transfer_input, and
> qemu_file_get_fd
>
> The first 10 patches below are foundational and are needed for both
> cpr-transfer mode and the proposed cpr-exec mode. The next 6 patches are
> specific to cpr-transfer and implement the mechanisms for sharing state
> across a socket using SCM_RIGHTS. The last 8 patches supply tests and
> documentation.
>
> Steve Sistare (24):
> backends/hostmem-shm: factor out allocation of "anonymous shared
> memory with an fd"
> physmem: fix qemu_ram_alloc_from_fd size calculation
> physmem: qemu_ram_alloc_from_fd extensions
> physmem: fd-based shared memory
> memory: add RAM_PRIVATE
> machine: aux-ram-share option
> migration: cpr-state
> physmem: preserve ram blocks for cpr
> hostmem-memfd: preserve for cpr
> hostmem-shm: preserve for cpr
> migration: enhance migrate_uri_parse
> migration: incoming channel
> migration: SCM_RIGHTS for QEMUFile
> migration: VMSTATE_FD
> migration: cpr-transfer save and load
> migration: cpr-transfer mode
> migration-test: memory_backend
> tests/qtest: optimize migrate_set_ports
> tests/qtest: defer connection
> migration-test: defer connection
> tests/qtest: enhance migration channels
> tests/qtest: assert qmp connected
> migration-test: cpr-transfer
> migration: cpr-transfer documentation
>
> backends/hostmem-epc.c | 2 +-
> backends/hostmem-file.c | 2 +-
> backends/hostmem-memfd.c | 14 ++-
> backends/hostmem-ram.c | 2 +-
> backends/hostmem-shm.c | 51 ++------
> docs/devel/migration/CPR.rst | 182 ++++++++++++++++++++++++++-
> hw/core/machine.c | 22 ++++
> include/exec/memory.h | 10 ++
> include/exec/ram_addr.h | 13 +-
> include/hw/boards.h | 1 +
> include/migration/cpr.h | 33 +++++
> include/migration/misc.h | 7 ++
> include/migration/vmstate.h | 9 ++
> include/qemu/osdep.h | 1 +
> meson.build | 8 +-
> migration/cpr-transfer.c | 71 +++++++++++
> migration/cpr.c | 224 +++++++++++++++++++++++++++++++++
> migration/meson.build | 2 +
> migration/migration.c | 139 +++++++++++++++++++-
> migration/migration.h | 4 +-
> migration/options.c | 8 +-
> migration/qemu-file.c | 84 ++++++++++++-
> migration/qemu-file.h | 2 +
> migration/ram.c | 2 +
> migration/trace-events | 11 ++
> migration/vmstate-types.c | 24 ++++
> qapi/migration.json | 44 ++++++-
> qemu-options.hx | 34 +++++
> stubs/vmstate.c | 7 ++
> system/memory.c | 4 +-
> system/physmem.c | 150 ++++++++++++++++++----
> system/trace-events | 1 +
> system/vl.c | 43 ++++++-
> tests/qtest/libqtest.c | 86 ++++++++-----
> tests/qtest/libqtest.h | 19 ++-
> tests/qtest/migration/cpr-tests.c | 62 +++++++++
> tests/qtest/migration/framework.c | 74 +++++++++--
> tests/qtest/migration/framework.h | 11 ++
> tests/qtest/migration/migration-qmp.c | 53 ++++++--
> tests/qtest/migration/migration-qmp.h | 10 +-
> tests/qtest/migration/migration-util.c | 23 ++--
> tests/qtest/migration/misc-tests.c | 9 +-
> tests/qtest/migration/precopy-tests.c | 6 +-
> tests/qtest/virtio-net-failover.c | 8 +-
> util/memfd.c | 16 ++-
> util/oslib-posix.c | 52 ++++++++
> util/oslib-win32.c | 6 +
> 47 files changed, 1472 insertions(+), 174 deletions(-)
> create mode 100644 include/migration/cpr.h
> create mode 100644 migration/cpr-transfer.c
> create mode 100644 migration/cpr.c
>
> base-commit: e8aa7fdcddfc8589bdc7c973a052e76e8f999455
I'd like to merge this series by the end of the week if possible. Please
take a look at some comments from Markus that were left behind in v5.