[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [RFC PATCH 14/14] migration: Fix return-path thread exit
From: |
Fabiano Rosas |
Subject: |
Re: [RFC PATCH 14/14] migration: Fix return-path thread exit |
Date: |
Fri, 23 Feb 2024 11:05:34 -0300 |
Peter Xu <peterx@redhat.com> writes:
> On Fri, Feb 16, 2024 at 02:35:26PM -0300, Fabiano Rosas wrote:
>> Cédric Le Goater <clg@redhat.com> writes:
>>
>> > Hello Fabiano
>> >
>> > On 2/14/24 21:35, Fabiano Rosas wrote:
>> >> Cédric Le Goater <clg@redhat.com> writes:
>> >>
>> >>> Hello Fabiano
>> >>>
>> >>> On 2/8/24 14:29, Fabiano Rosas wrote:
>> >>>> Cédric Le Goater <clg@redhat.com> writes:
>> >>>>
>> >>>>> In case of error, close_return_path_on_source() can perform a shutdown
>> >>>>> to exit the return-path thread. However, in migrate_fd_cleanup(),
>> >>>>> 'to_dst_file' is closed before calling close_return_path_on_source()
>> >>>>> and the shutdown fails, leaving the source and destination waiting for
>> >>>>> an event to occur.
>> >>>>
>> >>>> Hi, Cédric
>> >>>>
>> >>>> Are you sure this is not caused by patch 13?
>> >>>
>> >>> It happens with upstream QEMU without any patch.
>> >>
>> >> I might have taken that "shutdown fails" in the commit message too
>> >> literaly. Anyway, I have a proposed solution:
>> >>
>> >> -->8--
>> >> From 729aa7b5b7f130f756d41649fdd0862bd2e90430 Mon Sep 17 00:00:00 2001
>> >> From: Fabiano Rosas <farosas@suse.de>
>> >> Date: Wed, 14 Feb 2024 16:45:43 -0300
>> >> Subject: [PATCH] migration: Join the return path thread before releasing
>> >> to_dst_file
>> >> MIME-Version: 1.0
>> >> Content-Type: text/plain; charset=UTF-8
>> >> Content-Transfer-Encoding: 8bit
>> >>
>> >> The return path thread might hang at a blocking system call. Before
>> >> joining the thread we might need to issue a shutdown() on the socket
>> >> file descriptor to release it. To determine whether the shutdown() is
>> >> necessary we look at the QEMUFile error.
>> >>
>> >> Make sure we only clean up the QEMUFile after the return path has been
>> >> waited for.
>> >
>> > Yes. That's the important part.
>> >
>> >> This fixes a hang when qemu_savevm_state_setup() produced an error
>> >> that was detected by migration_detect_error(). That skips
>> >> migration_completion() so close_return_path_on_source() would get
>> >> stuck waiting for the RP thread to terminate.
>> >>
>> >> At migrate_fd_cleanup() I'm keeping the relative order of joining the
>> >> migration thread and the return path just in case.
>> >
>> > That doesn't look necessary.
>>
>> Indeed. But I don't trust the migration code, it's full of undocumented
>> dependencies like that.
>>
>> > What was the reason to join the migration thread only when
>> > s->to_dst_file is valid ?
>>
>> I didn't find any explicit reason looking through the history. It seems
>> we used to rely on to_dst_file before migration_thread_running was
>> introduced.
>>
>> I wouldn't mind keeping that 'if' there.
>>
>> Let's see what Peter thinks about it.
>
> Frankly I don't have a strong opinion on current patch 14 or the new
> proposal, but it seems we reached a consensus.
>
> Fabiano, would you repost with a formal patch, with the proper tags?
Yes, I'll post it soon.
>
> One thing I am still not sure is whether we should still have patch 13
> altogether? Please see my other reply on whether it's possible to have
> migrate_get_error() == true but qemu_file_get_error() == false.
I'll include it then.
> In
> postcopy_pause(), currently we constantly shutdown() so the join() should
> always work:
>
> qemu_file_shutdown(file);
> qemu_fclose(file);
>
> /*
> * We're already pausing, so ignore any errors on the return
> * path and just wait for the thread to finish. It will be
> * re-created when we resume.
> */
> close_return_path_on_source(s);
>
> If move close_return_path_on_source() upper, qemu_file_shutdown() may not
> be needed? And I think we need to make sure close_return_path_on_source()
> will always properly kick the other thread.
- [RFC PATCH 14/14] migration: Fix return-path thread exit, (continued)
- [RFC PATCH 14/14] migration: Fix return-path thread exit, Cédric Le Goater, 2024/02/07
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Fabiano Rosas, 2024/02/08
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Cédric Le Goater, 2024/02/12
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Fabiano Rosas, 2024/02/14
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Cédric Le Goater, 2024/02/16
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Fabiano Rosas, 2024/02/16
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Peter Xu, 2024/02/22
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit,
Fabiano Rosas <=
- Re: [RFC PATCH 14/14] migration: Fix return-path thread exit, Cédric Le Goater, 2024/02/26
[PATCH 12/14] migration: Report error when shutdown fails, Cédric Le Goater, 2024/02/07