Re: [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO c
From: Maciej S. Szmigiero
Subject: Re: [PATCH v4 00/33] Multifd 🔀 device state transfer support with VFIO consumer
Date: Thu, 30 Jan 2025 21:27:29 +0100
User-agent: Mozilla Thunderbird
On 30.01.2025 21:19, Fabiano Rosas wrote:
"Maciej S. Szmigiero" <mail@maciej.szmigiero.name> writes:
From: "Maciej S. Szmigiero" <maciej.szmigiero@oracle.com>
This is an updated v4 patch series of the v3 series located here:
https://lore.kernel.org/qemu-devel/cover.1731773021.git.maciej.szmigiero@oracle.com/
Changes from v3:
* MigrationLoadThread now returns a bool plus an Error (a richer error type)
instead of just an int.
* qemu_loadvm_load_thread_pool now reports error via migrate_set_error()
instead of dedicated load_threads_ret variable.
* Since the change above uncovered an issue with respect to multifd send
channels not terminating TLS session properly QIOChannelTLS now allows
gracefully handling this situation.
* qemu_loadvm_load_thread_pool state is now part of MigrationIncomingState
instead of being stored in global variables.
This state now also has its own init/cleanup helpers.
* qemu_loadvm_load_thread_pool code is now moved into a separate section
of the savevm.c file, marked by an appropriate comment.
* thread_pool_free() is now documented to have wait-before-free semantics,
which allowed removal of explicit waits from thread pool cleanup paths.
* thread_pool_submit_immediate() method was added since this functionality
is used by both generic thread pool users in this patch set.
* postcopy_ram_listen_thread() now takes BQL around function calls that
ultimately call migration methods requiring BQL.
This fixes one of QEMU tests failing when explicitly BQL-sensitive code
is added later to these methods.
* qemu_loadvm_load_state_buffer() now returns a bool value instead of int.
* "Send final SYNC only after device state is complete" patch was
dropped since Peter implemented equivalent functionality upstream.
* "Document the BQL behavior of load SaveVMHandlers" patch was dropped
since that's something better done later, separately from this patch set.
* Header size is now added to mig_stats.multifd_bytes where it is actually
sent in the zero copy case - in multifd_nocomp_send_prepare().
* Spurious wakeups from qemu_cond_wait() are now handled properly as
pointed out by Avihai.
* VFIO migration FD now allows partial write() completion as pointed out
by Avihai.
* Patch "vfio/migration: Don't run load cleanup if load setup didn't run"
was dropped, instead all objects related to multifd load are now located in
their own VFIOMultifd struct which is allocated only if multifd device state
transfer is actually in use.
* Intermediate VFIOStateBuffers API as suggested by Avihai is now introduced
to simplify vfio_load_state_buffer() and vfio_load_bufs_thread().
* VFIO device config state loading can now optionally be interlocked with
loading of the other iterables, since on ARM64 VFIO depends on the
interrupt controller being loaded first, as pointed out by Avihai.
* Patch "Multifd device state transfer support - receive side" was split
into a few smaller patches as suggested by Cédric.
* x-migration-multifd-transfer VFIO property compat changes were moved
into a separate patch as suggested by Cédric.
* Other small changes, like renamed functions and variables/members, added
review tags, code formatting, moved QEMU_LOCK_GUARD() instances closer to
actual protected blocks, etc.
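The "partial write() completion" item above refers to the general POSIX rule that write() may accept fewer bytes than requested. A minimal sketch of the usual retry loop follows; write_all() is a hypothetical helper name used for illustration only, not the function from the VFIO patches, and it uses plain POSIX calls rather than any QEMU API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdint.h>
#include <unistd.h>

/*
 * Hypothetical helper (not from the patch set): keep calling write()
 * until the whole buffer has been accepted by the file descriptor.
 * Returns 0 on success, -1 on error with errno set.
 */
static int write_all(int fd, const void *buf, size_t len)
{
    const uint8_t *p = buf;

    assert(buf != NULL || len == 0);

    while (len > 0) {
        ssize_t n = write(fd, p, len);

        if (n < 0) {
            if (errno == EINTR) {
                continue;       /* interrupted before writing, retry */
            }
            return -1;          /* real error, errno set by write() */
        }
        if (n == 0) {
            errno = EIO;        /* no progress, avoid looping forever */
            return -1;
        }

        /* Partial completion: advance past the bytes already written. */
        p += n;
        len -= (size_t)n;
    }

    return 0;
}
```

A caller would simply replace a single write(fd, buf, len) call with write_all(fd, buf, len) and check for a non-zero return.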
========================================================================
This patch set is targeting QEMU 10.0.
What's not yet present is a documentation update under docs/devel/migration
but I didn't want to delay posting the code any longer.
Such a doc can still be merged later when the design is 100% finalized.
========================================================================
Maciej S. Szmigiero (32):
migration: Clarify that {load,save}_cleanup handlers can run without
setup
thread-pool: Remove thread_pool_submit() function
thread-pool: Rename AIO pool functions to *_aio() and data types to
*Aio
thread-pool: Implement generic (non-AIO) pool support
migration: Add MIG_CMD_SWITCHOVER_START and its load handler
migration: Add qemu_loadvm_load_state_buffer() and its handler
io: tls: Allow terminating the TLS session gracefully with EOF
migration/multifd: Allow premature EOF on TLS incoming channels
migration: postcopy_ram_listen_thread() needs to take BQL for some
calls
error: define g_autoptr() cleanup function for the Error type
migration: Add thread pool of optional load threads
migration/multifd: Split packet into header and RAM data
migration/multifd: Device state transfer support - receive side
migration/multifd: Make multifd_send() thread safe
migration/multifd: Add an explicit MultiFDSendData destructor
migration/multifd: Device state transfer support - send side
migration/multifd: Add multifd_device_state_supported()
migration: Add save_live_complete_precopy_thread handler
vfio/migration: Add x-migration-load-config-after-iter VFIO property
vfio/migration: Add load_device_config_state_start trace event
vfio/migration: Convert bytes_transferred counter to atomic
vfio/migration: Multifd device state transfer support - basic types
vfio/migration: Multifd device state transfer support -
VFIOStateBuffer(s)
vfio/migration: Multifd device state transfer - add support checking
function
vfio/migration: Multifd device state transfer support - receive
init/cleanup
vfio/migration: Multifd device state transfer support - received
buffers queuing
vfio/migration: Multifd device state transfer support - load thread
vfio/migration: Multifd device state transfer support - config loading
support
migration/qemu-file: Define g_autoptr() cleanup function for QEMUFile
vfio/migration: Multifd device state transfer support - send side
vfio/migration: Add x-migration-multifd-transfer VFIO property
hw/core/machine: Add compat for x-migration-multifd-transfer VFIO
property
Peter Xu (1):
migration/multifd: Make MultiFDSendData a struct
hw/core/machine.c | 2 +
hw/vfio/migration.c | 754 ++++++++++++++++++++++++++++-
hw/vfio/pci.c | 14 +
hw/vfio/trace-events | 11 +-
include/block/aio.h | 8 +-
include/block/thread-pool.h | 62 ++-
include/hw/vfio/vfio-common.h | 7 +
include/io/channel-tls.h | 11 +
include/migration/client-options.h | 4 +
include/migration/misc.h | 16 +
include/migration/register.h | 54 ++-
include/qapi/error.h | 2 +
include/qemu/typedefs.h | 6 +
io/channel-tls.c | 6 +
migration/colo.c | 3 +
migration/meson.build | 1 +
migration/migration-hmp-cmds.c | 2 +
migration/migration.c | 6 +-
migration/migration.h | 7 +
migration/multifd-device-state.c | 192 ++++++++
migration/multifd-nocomp.c | 30 +-
migration/multifd.c | 248 ++++++++--
migration/multifd.h | 74 ++-
migration/options.c | 9 +
migration/qemu-file.h | 2 +
migration/savevm.c | 195 +++++++-
migration/savevm.h | 6 +-
migration/trace-events | 1 +
scripts/analyze-migration.py | 11 +
tests/unit/test-thread-pool.c | 6 +-
util/async.c | 6 +-
util/thread-pool.c | 184 +++++--
util/trace-events | 6 +-
33 files changed, 1814 insertions(+), 132 deletions(-)
create mode 100644 migration/multifd-device-state.c
Hi!
We have build issues:
https://gitlab.com/farosas/qemu/-/pipelines/1649146958
Looks like the issue is that the 64-bit qatomic operations used for
the VFIO bytes transferred counters aren't available on
32-bit host platforms.
The easiest fix would probably be to change these to
32-bit counters on 32-bit platforms since they can't
realistically address more memory anyway.
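One way to picture the idea above is a counter whose width follows the host pointer size, so that it never needs a 64-bit atomic on a 32-bit host. This is only a sketch under assumptions: it uses C11 <stdatomic.h> rather than QEMU's qatomic helpers, and the type and function names are hypothetical, not taken from the patch set.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/*
 * Hypothetical counter type: uintptr_t is 32 bits wide on 32-bit hosts
 * and 64 bits wide on 64-bit hosts.
 */
typedef atomic_uintptr_t bytes_counter_t;

static_assert(sizeof(uintptr_t) == sizeof(void *),
              "counter is expected to match the host pointer width");

static bytes_counter_t bytes_transferred;

/* Add n bytes to the counter; relaxed ordering is enough for statistics. */
static void bytes_transferred_add(uintptr_t n)
{
    atomic_fetch_add_explicit(&bytes_transferred, n, memory_order_relaxed);
}

/* Read the current counter value. */
static uintptr_t bytes_transferred_read(void)
{
    return atomic_load_explicit(&bytes_transferred, memory_order_relaxed);
}

/* Reset the counter, for example at the start of a new migration. */
static void bytes_transferred_reset(void)
{
    atomic_store_explicit(&bytes_transferred, 0, memory_order_relaxed);
}
```

Note that with this choice the counter wraps modulo 2^32 on 32-bit hosts.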
And the postcopy/recovery test is failing. It seems the migration
finishes before the test can issue migrate-pause:
QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p
/x86_64/migration/postcopy/recovery/plain
...
{"execute": "migrate-start-postcopy"}
{"return": {}}
{"secs": 1738267018, "usecs": 860991}, "event": "MIGRATION", "data": {"status":
"postcopy-active"}
{"secs": 1738267018, "usecs": 861284}, "event": "STOP"
{"secs": 1738267017, "usecs": 960322}, "event": "MIGRATION", "data": {"status":
"active"}
{"secs": 1738267018, "usecs": 865589}, "event": "MIGRATION", "data": {"status":
"postcopy-active"}
{"secs": 1738267099, "usecs": 120971}, "event": "MIGRATION", "data": {"status":
"completed"}
{"secs": 1738267099, "usecs": 121154}, "event": "RESUME"
{"execute": "query-migrate"}
ERROR:../tests/qtest/migration/migration-qmp.c:172:check_migration_status:
assertion failed (current_status != "completed"): ("completed" !=
"completed")
Hmm, it looks like this failure wasn't showing
in my tests because the test was skipped due to
missing userfaultfd support:
$ QTEST_QEMU_BINARY=./qemu-system-x86_64 ./tests/qtest/migration-test -p
/x86_64/migration/postcopy/recovery/plain
TAP version 14
# random seed: R02Sc99a7d93274064bb87f3e0789fbf8326
# Skipping test: userfaultfd not available
# Start of x86_64 tests
# Start of migration tests
# End of migration tests
# End of x86_64 tests
1..0
Will try to make this test run and investigate the reason for
failure.
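For reference, a rough way to check whether a host lets the current user create a userfaultfd at all is to try the system call directly. This is a simplified sketch and not the probe that migration-test itself performs; userfaultfd_available() is a hypothetical name, and the check is Linux-only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <stdbool.h>
#include <unistd.h>
#ifdef __linux__
#include <sys/syscall.h>
#endif

/*
 * Hypothetical probe: returns true if the userfaultfd() system call
 * succeeds for the calling process, false otherwise (syscall missing,
 * not permitted, or non-Linux host).
 */
static bool userfaultfd_available(void)
{
#if defined(__linux__) && defined(__NR_userfaultfd)
    long fd = syscall(__NR_userfaultfd, O_CLOEXEC | O_NONBLOCK);

    if (fd >= 0) {
        close((int)fd);
        return true;
    }
#endif
    return false;
}
```

On recent Linux kernels an unprivileged call can fail depending on the vm.unprivileged_userfaultfd sysctl, which may be one reason for a skip like the one above; that is an assumption about this particular setup, not something the log confirms.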
Thanks,
Maciej