[PULL 02/25] migration/multifd: Further remove the SYNC on complete
From: Fabiano Rosas
Subject: [PULL 02/25] migration/multifd: Further remove the SYNC on complete
Date: Fri, 10 Jan 2025 09:13:50 -0300
From: Peter Xu <peterx@redhat.com>
Commit 637280aeb2 ("migration/multifd: Avoid the final FLUSH in
complete()") stopped sending the RAM_SAVE_FLAG_MULTIFD_FLUSH flag at
ram_save_complete(), because the sync on the destination side is not
needed due to the last iteration of find_dirty_block() having already
done it.
However, that commit overlooked that multifd_ram_flush_and_sync() on the
source side is also not needed at ram_save_complete(), for the same
reason.
Moreover, removing the RAM_SAVE_FLAG_MULTIFD_FLUSH but keeping the
multifd_ram_flush_and_sync() means that currently the recv threads will
hang when receiving the MULTIFD_FLAG_SYNC message, waiting for the
destination sync which only happens when RAM_SAVE_FLAG_MULTIFD_FLUSH is
received.
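The mismatch described above can be illustrated with a toy model. This is a sketch only, not QEMU code: the toy_* names and the event enum are hypothetical stand-ins, and the pairing rule is a simplification that assumes each SYNC seen on a channel is released by exactly one FLUSH on the main channel.

```c
#include <stddef.h>

/* Hypothetical stand-ins for the two messages; not QEMU's definitions. */
enum toy_event {
    TOY_CHANNEL_SYNC, /* models MULTIFD_FLAG_SYNC arriving on a multifd channel */
    TOY_MAIN_FLUSH,   /* models RAM_SAVE_FLAG_MULTIFD_FLUSH on the main channel */
};

/*
 * Return how many SYNC events have no FLUSH to pair with.
 * In this simplified model, a non-zero result means a recv thread
 * would be left waiting for a destination-side sync that never comes.
 */
static int toy_unmatched_syncs(const enum toy_event *events, size_t n)
{
    int syncs = 0;
    int flushes = 0;

    for (size_t i = 0; i < n; i++) {
        if (events[i] == TOY_CHANNEL_SYNC) {
            syncs++;
        } else {
            flushes++;
        }
    }
    return syncs > flushes ? syncs - flushes : 0;
}
```

In this model, a completion sequence containing a lone TOY_CHANNEL_SYNC leaves one unmatched SYNC, while sending neither message (or both) leaves none.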
Luckily, multifd still works correctly, because the recv side cleanup
code (mostly multifd_recv_sync_main()) is smart enough to kick out any
recv threads that are stuck at SYNC. And since this is the completion
phase of migration, nothing else is sent after the SYNCs.
This needs to be fixed because in the future VFIO will have data to push
after ram_save_complete() and we don't want the recv thread to be stuck
in the MULTIFD_FLAG_SYNC message.
Remove the unnecessary (and buggy) invocation of
multifd_ram_flush_and_sync().
For very old binaries (multifd_flush_after_each_section==true), the
flush_and_sync is still needed because each EOS received on destination
will enforce all-channel sync once.
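The resulting rule for the completion path can be stated as a small predicate. This is a minimal sketch: toy_sync_needed_on_complete() is a hypothetical helper, and its two parameters are assumed to carry the values that migrate_multifd() and migrate_multifd_flush_after_each_section() would return.

```c
#include <stdbool.h>

/*
 * Hypothetical helper, not part of QEMU.
 * Returns true only when multifd is in use and the legacy
 * flush-after-each-section behaviour is enabled, i.e. the one case
 * where the completion path still has to flush and sync.
 */
static bool toy_sync_needed_on_complete(bool multifd,
                                        bool flush_after_each_section)
{
    return multifd && flush_after_each_section;
}
```

Of the four possible input combinations, only multifd with flush-after-each-section enabled keeps the sync; every other combination skips it.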
Stable branches do not need this patch, as I cannot think of a real bug
that it would cause there. Hence no Fixes tag is attached, to make clear
that no backport is needed.
Reviewed-by: Fabiano Rosas <farosas@suse.de>
Signed-off-by: Peter Xu <peterx@redhat.com>
Message-Id: <20241206224755.1108686-2-peterx@redhat.com>
Signed-off-by: Fabiano Rosas <farosas@suse.de>
---
migration/ram.c | 13 ++++++++++---
1 file changed, 10 insertions(+), 3 deletions(-)
diff --git a/migration/ram.c b/migration/ram.c
index a60666d3f6..f0ddd5eabe 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -3283,9 +3283,16 @@ static int ram_save_complete(QEMUFile *f, void *opaque)
         }
     }
 
-    ret = multifd_ram_flush_and_sync();
-    if (ret < 0) {
-        return ret;
+    if (migrate_multifd() &&
+        migrate_multifd_flush_after_each_section()) {
+        /*
+         * Only the old dest QEMU will need this sync, because each EOS
+         * will require one SYNC message on each channel.
+         */
+        ret = multifd_ram_flush_and_sync();
+        if (ret < 0) {
+            return ret;
+        }
     }
 
     if (migrate_mapped_ram()) {
--
2.35.3