qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC PATCH 14/14] migration: Fix return-path thread exit


From: Fabiano Rosas
Subject: Re: [RFC PATCH 14/14] migration: Fix return-path thread exit
Date: Fri, 23 Feb 2024 11:05:34 -0300

Peter Xu <peterx@redhat.com> writes:

> On Fri, Feb 16, 2024 at 02:35:26PM -0300, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>> 
>> > Hello Fabiano
>> >
>> > On 2/14/24 21:35, Fabiano Rosas wrote:
>> >> Cédric Le Goater <clg@redhat.com> writes:
>> >> 
>> >>> Hello Fabiano
>> >>>
>> >>> On 2/8/24 14:29, Fabiano Rosas wrote:
>> >>>> Cédric Le Goater <clg@redhat.com> writes:
>> >>>>
>> >>>>> In case of error, close_return_path_on_source() can perform a shutdown
>> >>>>> to exit the return-path thread.  However, in migrate_fd_cleanup(),
>> >>>>> 'to_dst_file' is closed before calling close_return_path_on_source()
>> >>>>> and the shutdown fails, leaving the source and destination waiting for
>> >>>>> an event to occur.
>> >>>>
>> >>>> Hi, Cédric
>> >>>>
>> >>>> Are you sure this is not caused by patch 13?
>> >>>
>> >>> It happens with upstream QEMU without any patch.
>> >> 
>> >> I might have taken that "shutdown fails" in the commit message too
>> >> literaly. Anyway, I have a proposed solution:
>> >> 
>> >> -->8--
>> >>  From 729aa7b5b7f130f756d41649fdd0862bd2e90430 Mon Sep 17 00:00:00 2001
>> >> From: Fabiano Rosas <farosas@suse.de>
>> >> Date: Wed, 14 Feb 2024 16:45:43 -0300
>> >> Subject: [PATCH] migration: Join the return path thread before releasing
>> >>   to_dst_file
>> >> MIME-Version: 1.0
>> >> Content-Type: text/plain; charset=UTF-8
>> >> Content-Transfer-Encoding: 8bit
>> >> 
>> >> The return path thread might hang at a blocking system call. Before
>> >> joining the thread we might need to issue a shutdown() on the socket
>> >> file descriptor to release it. To determine whether the shutdown() is
>> >> necessary we look at the QEMUFile error.
>> >> 
>> >> Make sure we only clean up the QEMUFile after the return path has been
>> >> waited for.
>> >
>> > Yes. That's the important part.
>> >
>> >> This fixes a hang when qemu_savevm_state_setup() produced an error
>> >> that was detected by migration_detect_error(). That skips
>> >> migration_completion() so close_return_path_on_source() would get
>> >> stuck waiting for the RP thread to terminate.
>> >> 
>> >> At migrate_fd_cleanup() I'm keeping the relative order of joining the
>> >> migration thread and the return path just in case.
>> >
>> > That doesn't look necessary.
>> 
>> Indeed. But I don't trust the migration code, it's full of undocumented
>> dependencies like that.
>> 
>> > What was the reason to join the migration thread only when
>> > s->to_dst_file is valid ?
>> 
>> I didn't find any explicit reason looking through the history. It seems
>> we used to rely on to_dst_file before migration_thread_running was
>> introduced.
>> 
>> I wouldn't mind keeping that 'if' there.
>> 
>> Let's see what Peter thinks about it.
>
> Frankly I don't have a strong opinion on current patch 14 or the new
> proposal, but it seems we reached a consensus.
>
> Fabiano, would you repost with a formal patch, with the proper tags?

Yes, I'll post it soon.

>
> One thing I am still not sure is whether we should still have patch 13
> altogether? Please see my other reply on whether it's possible to have
> migrate_get_error() == true but qemu_file_get_error() == false.

I'll include it then.

> In
> postcopy_pause(), currently we constantly shutdown() so the join() should
> always work:
>
>         qemu_file_shutdown(file);
>         qemu_fclose(file);
>
>         /*
>          * We're already pausing, so ignore any errors on the return
>          * path and just wait for the thread to finish. It will be
>          * re-created when we resume.
>          */
>         close_return_path_on_source(s);
>
> If move close_return_path_on_source() upper, qemu_file_shutdown() may not
> be needed? And I think we need to make sure close_return_path_on_source()
> will always properly kick the other thread.




reply via email to

[Prev in Thread] Current Thread [Next in Thread]