qemu-devel

Re: [PATCH] migration/multifd: Don't fsync when closing QIOChannelFile


From: Daniel P . Berrangé
Subject: Re: [PATCH] migration/multifd: Don't fsync when closing QIOChannelFile
Date: Tue, 5 Mar 2024 17:49:33 +0000
User-agent: Mutt/2.2.12 (2023-09-09)

On Tue, Mar 05, 2024 at 02:43:32PM -0300, Fabiano Rosas wrote:
> Commit bc38feddeb ("io: fsync before closing a file channel") added a
> fsync/fdatasync at the closing point of the QIOChannelFile to ensure
> integrity of the migration stream in case of QEMU crash.
> 
> The decision to do the sync at qio_channel_close() was not the best
> since that function runs in the main thread and the fsync can cause
> QEMU to hang for several minutes, depending on the migration size and
> disk speed.
> 
> To fix the hang, remove the fsync from qio_channel_file_close().
> 
> At this moment, the migration code is the only user of the fsync and
> we're taking the tradeoff of not having a sync at all, leaving the
> responsibility to the upper layers.
> 
> Fixes: bc38feddeb ("io: fsync before closing a file channel")
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> ---
>  docs/devel/migration/main.rst |  3 ++-
>  io/channel-file.c             |  5 -----
>  migration/multifd.c           | 13 -------------
>  3 files changed, 2 insertions(+), 19 deletions(-)
> 
> diff --git a/docs/devel/migration/main.rst b/docs/devel/migration/main.rst
> index 8024275d6d..54385a23e5 100644
> --- a/docs/devel/migration/main.rst
> +++ b/docs/devel/migration/main.rst
> @@ -44,7 +44,8 @@ over any transport.
>  - file migration: do the migration using a file that is passed to QEMU
>    by path. A file offset option is supported to allow a management
>    application to add its own metadata to the start of the file without
> -  QEMU interference.
> +  QEMU interference. Note that QEMU does not flush cached file
> +  data/metadata at the end of migration.
>  
>  In addition, support is included for migration using RDMA, which
>  transports the page data using ``RDMA``, where the hardware takes care of
> diff --git a/io/channel-file.c b/io/channel-file.c
> index d4706fa592..a6ad7770c6 100644
> --- a/io/channel-file.c
> +++ b/io/channel-file.c
> @@ -242,11 +242,6 @@ static int qio_channel_file_close(QIOChannel *ioc,
>  {
>      QIOChannelFile *fioc = QIO_CHANNEL_FILE(ioc);
>  
> -    if (qemu_fdatasync(fioc->fd) < 0) {
> -        error_setg_errno(errp, errno,
> -                         "Unable to synchronize file data with storage device");
> -        return -1;
> -    }
>      if (qemu_close(fioc->fd) < 0) {
>          error_setg_errno(errp, errno,
>                           "Unable to close file");

Up to here:

   Reviewed-by: Daniel P. Berrangé <berrange@redhat.com>


> diff --git a/migration/multifd.c b/migration/multifd.c
> index d4a44da559..2edcd5104e 100644
> --- a/migration/multifd.c
> +++ b/migration/multifd.c
> @@ -709,19 +709,6 @@ static bool multifd_send_cleanup_channel(MultiFDSendParams *p, Error **errp)
>  {
>      if (p->c) {
>          migration_ioc_unregister_yank(p->c);
> -        /*
> -         * An explicit close() on the channel here is normally not
> -         * required, but can be helpful for "file:" iochannels, where it
> -         * will include fdatasync() to make sure the data is flushed to the
> -         * disk backend.
> -         *
> -         * The object_unref() cannot guarantee that because: (1) finalize()
> -         * of the iochannel is only triggered on the last reference, and
> -         * it's not guaranteed that we always hold the last refcount when
> -         * reaching here, and, (2) even if finalize() is invoked, it only
> -         * does a close(fd) without data flush.
> -         */
> -        qio_channel_close(p->c, &error_abort);
>          object_unref(OBJECT(p->c));
>          p->c = NULL;
>      }

I don't think you should be removing this. Calling qio_channel_close()
remains recommended best practice, even with fdatasync() removed, as
it provides a strong guarantee that the FD is released which you don't
get if you rely on the ref count being correctly decremented in all
code paths.
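
To illustrate the difference, here is a minimal standalone sketch. It does
not use QEMU's QIOChannel API: Chan, chan_close() and chan_unref() are
hypothetical stand-ins, and the "stray reference" is an assumed bug in some
other code path. It only shows that an explicit close releases the FD
immediately, while an unref releases it only if it happens to be the last one.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical refcounted channel; a stand-in, not QEMU's QIOChannel. */
typedef struct {
    int fd;        /* -1 once the descriptor has been closed */
    int refcount;
} Chan;

static Chan *chan_new(int fd)
{
    Chan *c = malloc(sizeof(*c));
    assert(c != NULL);
    c->fd = fd;
    c->refcount = 1;
    return c;
}

static void chan_ref(Chan *c)
{
    c->refcount++;
}

/* Explicit close: releases the FD now, whatever the refcount is. */
static int chan_close(Chan *c)
{
    int ret = 0;

    if (c->fd >= 0) {
        ret = close(c->fd);
        c->fd = -1;
    }
    return ret;
}

/* Only the final unref finalizes the object and closes a still-open FD. */
static void chan_unref(Chan *c)
{
    if (--c->refcount > 0) {
        return;
    }
    chan_close(c);
    free(c);
}

/* Returns 1 if fd still refers to an open descriptor, 0 otherwise. */
static int fd_is_open(int fd)
{
    return fcntl(fd, F_GETFD) != -1;
}

/* Cleanup relying on unref alone while another holder keeps a reference. */
static int cleanup_unref_only(void)
{
    int fd = open("/dev/null", O_WRONLY);
    assert(fd >= 0);
    Chan *c = chan_new(fd);
    chan_ref(c);                    /* stray reference held elsewhere */
    chan_unref(c);                  /* our cleanup path */
    int still_open = fd_is_open(fd);
    chan_unref(c);                  /* stray holder eventually lets go */
    return still_open;
}

/* Cleanup that closes explicitly before dropping its reference. */
static int cleanup_close_then_unref(void)
{
    int fd = open("/dev/null", O_WRONLY);
    assert(fd >= 0);
    Chan *c = chan_new(fd);
    chan_ref(c);                    /* same stray reference */
    chan_close(c);                  /* FD released here */
    chan_unref(c);
    int still_open = fd_is_open(fd);
    chan_unref(c);
    return still_open;
}
```

In the first helper the FD stays open after the cleanup path has run, because
the stray reference keeps the object alive; in the second the FD is gone as
soon as chan_close() returns, regardless of who else still holds a reference.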

With regards,
Daniel
-- 
|: https://berrange.com      -o-    https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org         -o-            https://fstop138.berrange.com :|
|: https://entangle-photo.org    -o-    https://www.instagram.com/dberrange :|



