qemu-devel

Re: [PATCH v2] migration/multifd: Don't fsync when closing QIOChannelFile


From: Peter Xu
Subject: Re: [PATCH v2] migration/multifd: Don't fsync when closing QIOChannelFile
Date: Thu, 7 Mar 2024 08:14:49 +0800

On Tue, Mar 05, 2024 at 04:56:29PM -0300, Fabiano Rosas wrote:
> Commit bc38feddeb ("io: fsync before closing a file channel") added a
> fsync/fdatasync at the closing point of the QIOChannelFile to ensure
> integrity of the migration stream in case of QEMU crash.
> 
> The decision to do the sync at qio_channel_close() was not the best
> since that function runs in the main thread and the fsync can cause
> QEMU to hang for several minutes, depending on the migration size and
> disk speed.
> 
> To fix the hang, remove the fsync from qio_channel_file_close().
> 
> At this moment, the migration code is the only user of the fsync and
> we're taking the tradeoff of not having a sync at all, leaving the
> responsibility to the upper layers.
> 
> Fixes: bc38feddeb ("io: fsync before closing a file channel")
> Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
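
To illustrate the tradeoff described in the commit message, here is a minimal sketch in plain POSIX C, not QEMU code. The names chan_close(), save_stream() and demo_save() are hypothetical and exist only for this example. The close helper never syncs, so it cannot block on storage; the caller (the "upper layer") decides whether to pay for an explicit fsync() before closing.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical stand-in for a channel close that does no implicit sync. */
static int chan_close(int fd)
{
    return close(fd);
}

/*
 * Hypothetical upper layer: write the whole buffer, optionally flush it
 * to stable storage, then close.  Returns 0 on success, -1 on failure.
 */
static int save_stream(int fd, const void *buf, size_t len, bool want_durable)
{
    const char *p = buf;

    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            chan_close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }

    /* The potentially slow step is now an explicit, caller-side choice. */
    if (want_durable && fsync(fd) < 0) {
        chan_close(fd);
        return -1;
    }

    return chan_close(fd);
}

/* Small self-check that writes to a temporary file and removes it. */
static int demo_save(void)
{
    char path[] = "/tmp/sync-demo-XXXXXX";
    int fd = mkstemp(path);
    int ret;

    if (fd < 0) {
        return -1;
    }
    ret = save_stream(fd, "hello", 5, true);
    unlink(path);
    return ret;
}
```

A caller that cannot afford to block, such as a main loop, would pass want_durable as false here or run the sync from another thread.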

Since the 9.0 release is approaching and it's important that we avoid such a
hang, I queued this version.

However, to make sure we still remember why we do this a few years from
now, I added a detailed comment and will squash it into this patch:

=======

diff --git a/migration/multifd.c b/migration/multifd.c
index 0a8fef046b..bf9d483f7a 100644
--- a/migration/multifd.c
+++ b/migration/multifd.c
@@ -714,6 +714,22 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
          * released because finalize() of the iochannel is only
          * triggered on the last reference and it's not guaranteed
          * that we always hold the last refcount when reaching here.
+         *
+         * Closing the fd explicitly has the benefit that if there is any
+         * registered I/O handler callbacks on such fd, that will get a
+         * POLLNVAL event and will further trigger the cleanup to finally
+         * release the IOC.
+         *
+         * FIXME: It should logically be guaranteed that all multifd
+         * channels have no I/O handler callback registered when reaching
+         * here, because migration thread will wait for all multifd channel
+         * establishments to complete during setup.  Since
+         * migrate_fd_cleanup() will be scheduled in main thread too, all
+         * previous callbacks should guarantee to be completed when
+         * reaching here.  See multifd_send_state.channels_created and its
+         * usage.  In the future, we could replace this with an assert
+         * making sure we're the last reference, or simply drop it if above
+         * is more clear to be justified.
          */
         qio_channel_close(p->c, &error_abort);
         object_unref(OBJECT(p->c));

========
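
The POLLNVAL behaviour that the new comment relies on can be observed outside QEMU. The following is a minimal sketch in plain POSIX C; the function name closed_fd_reports_pollnval() is made up for this example, and it only shows what poll() reports for a stale descriptor, not how QEMU's I/O handlers react to it.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <poll.h>
#include <unistd.h>

/*
 * Returns 1 if poll() flags a closed descriptor with POLLNVAL,
 * 0 if it does not, and -1 if the setup or poll() itself failed.
 */
static int closed_fd_reports_pollnval(void)
{
    int fds[2];
    struct pollfd pfd;
    int n;

    if (pipe(fds) < 0) {
        return -1;
    }

    /* A watcher keeps the descriptor number around... */
    pfd.fd = fds[0];
    pfd.events = POLLIN;
    pfd.revents = 0;

    /* ...while the owner closes the descriptor underneath it. */
    close(fds[0]);

    /* Zero timeout: report the current state without blocking. */
    n = poll(&pfd, 1, 0);
    close(fds[1]);

    if (n < 0) {
        return -1;
    }
    return (pfd.revents & POLLNVAL) ? 1 : 0;
}
```

An event loop that still watches the closed descriptor sees this event on its next poll, which gives any registered callback a chance to run its cleanup path.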

Thanks,

-- 
Peter Xu
