qemu-devel



From: Wang, Wei W
Subject: RE: [PATCH v1] migration/postcopy: ensure preempt channel is ready before loading states
Date: Fri, 5 Apr 2024 03:06:20 +0000

On Friday, April 5, 2024 10:33 AM, Peter Xu wrote:
> On Fri, Apr 05, 2024 at 01:38:31AM +0000, Wang, Wei W wrote:
> > On Friday, April 5, 2024 4:57 AM, Peter Xu wrote:
> > > On Fri, Apr 05, 2024 at 12:48:15AM +0800, Wang, Lei wrote:
> > > > On 4/5/2024 0:25, Wang, Wei W wrote:
> > > > > On Thursday, April 4, 2024 10:12 PM, Peter Xu wrote:
> > > > >> On Thu, Apr 04, 2024 at 06:05:50PM +0800, Wei Wang wrote:
> > > > >>> Before loading the guest states, ensure that the preempt
> > > > >>> channel has been ready to use, as some of the states (e.g. via
> > > > >>> virtio_load) might trigger page faults that will be handled
> > > > >>> through the preempt channel. So yield to the main thread in
> > > > >>> the case that the channel create event has been dispatched.
> > > > >>>
> > > > >>> Originally-by: Lei Wang <lei4.wang@intel.com>
> > > > >>> Link: https://lore.kernel.org/all/9aa5d1be-7801-40dd-83fd-f7e041ced249@intel.com/T/
> > > > >>> Suggested-by: Peter Xu <peterx@redhat.com>
> > > > >>> Signed-off-by: Lei Wang <lei4.wang@intel.com>
> > > > >>> Signed-off-by: Wei Wang <wei.w.wang@intel.com>
> > > > >>> ---
> > > > >>>  migration/savevm.c | 17 +++++++++++++++++
> > > > >>>  1 file changed, 17 insertions(+)
> > > > >>>
> > > > >>> diff --git a/migration/savevm.c b/migration/savevm.c
> > > > >>> index 388d7af7cd..fbc9f2bdd4 100644
> > > > >>> --- a/migration/savevm.c
> > > > >>> +++ b/migration/savevm.c
> > > > >>> @@ -2342,6 +2342,23 @@ static int loadvm_handle_cmd_packaged(MigrationIncomingState *mis)
> > > > >>>
> > > > >>>      QEMUFile *packf = qemu_file_new_input(QIO_CHANNEL(bioc));
> > > > >>>
> > > > >>> +    /*
> > > > >>> +     * Before loading the guest states, ensure that the preempt
> > > > >>> +     * channel has been ready to use, as some of the states
> > > > >>> +     * (e.g. via virtio_load) might trigger page faults that
> > > > >>> +     * will be handled through the preempt channel. So yield to
> > > > >>> +     * the main thread in the case that the channel create
> > > > >>> +     * event has been dispatched.
> > > > >>> +     */
> > > > >>> +    do {
> > > > >>> +        if (!migrate_postcopy_preempt() || !qemu_in_coroutine() ||
> > > > >>> +            mis->postcopy_qemufile_dst) {
> > > > >>> +            break;
> > > > >>> +        }
> > > > >>> +
> > > > >>> +        aio_co_schedule(qemu_get_current_aio_context(),
> > > > >>> +                        qemu_coroutine_self());
> > > > >>> +        qemu_coroutine_yield();
> > > > >>> +    } while (!qemu_sem_timedwait(&mis->postcopy_qemufile_dst_done, 1));
> > > > >>
> > > > >> I think we need s/!// here, so the same mistake I made?  I
> > > > >> think we need to rework the retval of qemu_sem_timedwait() at
> > > > >> some point later..
> > > > >
> > > > > No. qemu_sem_timedwait returns false when timeout, which means
> > > > > sem isn't posted yet. So it needs to go back to the loop. (the
> > > > > patch was tested)
> > > >
> > > > When timeout, qemu_sem_timedwait() will return -1. I think the
> > > > patch test passed maybe because you will always have at least one
> > > > yield (the first yield in the do ... while ...) when in
> > > > loadvm_handle_cmd_packaged()?
> > >
> > > My guess is that here the kick will work and qemu_sem_timedwait()
> > > later will hit ETIMEDOUT -> qemu_sem_timedwait() returns -1, then
> > > the loop just breaks.  That aio schedule should make sure anyway
> > > that the file is ready; the preempt thread must run before this to
> > > not hang that thread.
> >
> > Yes, misread of the return value. It still worked because the loop
> > broke at the "if (mis->postcopy_qemufile_dst)" check.
> >
> > Even below will work:
> >
> > do {
> >     if (mis->postcopy_qemufile_dst) {
> >         break;
> >     }
> >     ...
> > } while (1);
> >
> > I still don't see the value of using the postcopy_qemufile_dst_done
> > sem in the code though.  It simply blocks the main thread from
> > creating the preempt channel for 1ms, regardless of whether the sem
> > has been posted or not.  (We add it for the case where it is not
> > posted and we need to go back to the loop.)
> 
> I think it used to only wait() in the preempt thread, so that is needed.
> It's also needed when postcopy is interrupted and need a recover, see
> loadvm_postcopy_handle_resume(), in that case it's the postcopy ram load
> thread that waits for it rather than the main thread or preempt thread.
> 
> Indeed if we move channel creation out of the preempt thread then it
> seems we don't need the sem in this path.  However the other path will
> still need it, and then when the new channel is created
> (postcopy_preempt_new_channel) we'll need to distinguish a "switch to
> postcopy" case from a "postcopy recovery" case, and only post the sem
> in the former.  I think it might complicate the code; I'll think again
> tomorrow after a sleep so my brain will work better, but I doubt this
> is what we want to do at rc3.

Yes, it's a bit rushed (no need to remove it completely at rc3).

> 
> If you feel comfortable, please feel free to send a version that you
> think is the most correct so far (if you prefer no timedwait it's
> fine), and make sure the test works the best on your side.  Then I'll
> smoke it a bit during the weekend.  Please keep in mind that if it is
> for rc3 it should be a non-intrusive change, or we keep it slow for
> 9.1 and we don't need to rush.
> 
Sounds good.
