RE: [PATCH v1] migration: refactor migration_completion


From: Wang, Wei W
Subject: RE: [PATCH v1] migration: refactor migration_completion
Date: Thu, 27 Jul 2023 14:52:44 +0000

On Thursday, July 27, 2023 1:10 AM, Peter Xu wrote:
> On Fri, Jul 21, 2023 at 11:14:55AM +0000, Wang, Wei W wrote:
> > On Friday, July 21, 2023 4:38 AM, Peter Xu wrote:
> > > Looks good to me, after addressing Isaku's comments.
> > >
> > > The current_active_state is very unfortunate, along with most of the
> > > calls to migrate_set_state() - I bet most of the code will definitely
> > > go wrong if that cmpxchg didn't succeed inside of migrate_set_state(),
> > > IOW in most cases we simply always want:
> >
> > Can you share examples where it could be wrong?
> > (If it has bugs, we need to fix)
> 
> Nope.  What I meant is that in most cases we want to set the state without
> caring much about the old state, so at least we can have a helper like below
> and simply call migrate_set_state(s, STATE) where we don't care about the
> old state.
> 
> >
> > >
> > >   migrate_set_state(&s->state, s->state, XXX);
> > >
> > > Not sure whether one pre-requisite patch is good to have so we can
> > > rename migrate_set_state() to something like __migrate_set_state(),
> > > then:
> > >
> > >   migrate_set_state(s, XXX) {
> > >     __migrate_set_state(&s->state, s->state, XXX);
> > >   }
> > >
> > > I don't even know whether there's any call site that will need
> > > __migrate_set_state() for real..
> > >
> >
> > Seems this would break the use of "MIGRATION_STATUS_CANCELLING".
> > For example,
> > - In migration_maybe_pause:
> >    migrate_set_state(&s->state, MIGRATION_STATUS_PRE_SWITCHOVER,
> >                      new_state);
> > If the current s->state isn't MIGRATION_STATUS_PRE_SWITCHOVER (could be
> > MIGRATION_STATUS_CANCELLING), then s->state won’t be updated to
> > new_state.
> > - Then, in migration_completion, the following update to s->state
> > won't succeed:
> >    migrate_set_state(&s->state, current_active_state,
> >                      MIGRATION_STATUS_COMPLETED);
> >
> > - Finally, when reaching migration_iteration_finish(), s->state is
> > MIGRATION_STATUS_CANCELLING, instead of MIGRATION_STATUS_COMPLETED.
> 
> The whole state changes are just flaky to me in general, even with the help of
> old_state cmpxchg.

Yes, the design/implementation of the migration state transitions can be
improved (it looks fragile to me). I think this should be done in a separate
patchset, though. For this patch, we can keep it free of functional changes.

> 
> E.g., I'm wondering whether below race can happen, assuming we're starting
> with ACTIVE state and just about to complete migration:
> 
>           main thread                            migration thread
>           -----------                            ----------------
> 
> migration_maybe_pause(current_active_state==ACTIVE)
>                                              if (s->state != MIGRATION_STATUS_CANCELLING)
>                                                --> true, keep setting state
>                                                qemu_mutex_unlock_iothread();
>     qemu_mutex_lock_iothread();
>     migrate_fd_cancel()
>       if (old_state == MIGRATION_STATUS_PRE_SWITCHOVER)
>         --> false, not posting to pause_sem
>       set state to MIGRATION_STATUS_CANCELLING
>                                               migrate_set_state(&s->state, *current_active_state,
>                                                                 MIGRATION_STATUS_PRE_SWITCHOVER);
>                                                 --> false, cmpxchg fail
>                                               qemu_sem_wait(&s->pause_sem);
>                                                 --> hang death?

In that case a "migrate continue" is still needed to unblock the migration
thread. We should probably document that PAUSE_BEFORE_SWITCHOVER always
requires the user to issue an explicit "migrate continue" (even after
migration is cancelled).
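
For reference, the interleaving quoted above can be replayed deterministically
in a single thread. This is only a model under stated assumptions: pause_sem
is a plain counter, the wait is a non-blocking "would it block?" check, the
cancel path is reduced to the two steps drawn in the diagram, and all names
(mig_model, model_cancel, replay_race) are hypothetical:

```c
#include <stdatomic.h>
#include <stdbool.h>

enum { ST_ACTIVE, ST_PRE_SWITCHOVER, ST_CANCELLING };

struct mig_model {
    _Atomic int state;
    int pause_sem;          /* pending posts; 0 means a wait would block */
};

/* Non-blocking stand-in for a semaphore wait: false means "would block". */
bool model_sem_trywait(struct mig_model *m)
{
    if (m->pause_sem == 0) {
        return false;
    }
    m->pause_sem--;
    return true;
}

/* Cancel as drawn in the diagram: post only if we were already paused. */
void model_cancel(struct mig_model *m)
{
    int old_state = atomic_load(&m->state);

    if (old_state == ST_PRE_SWITCHOVER) {
        m->pause_sem++;
    }
    atomic_store(&m->state, ST_CANCELLING);
}

/* Replays the race; returns true if the migration thread would block. */
bool replay_race(struct mig_model *m)
{
    int expected = ST_ACTIVE;

    atomic_store(&m->state, ST_ACTIVE);
    m->pause_sem = 0;

    /* migration thread: state is not CANCELLING yet, so it proceeds */
    if (atomic_load(&m->state) == ST_CANCELLING) {
        return false;
    }
    /* main thread: cancel runs in the window, without posting */
    model_cancel(m);
    /* migration thread: ACTIVE -> PRE_SWITCHOVER cmpxchg fails */
    atomic_compare_exchange_strong(&m->state, &expected, ST_PRE_SWITCHOVER);
    /* migration thread: waits on pause_sem with nothing posted */
    return !model_sem_trywait(m);
}
```

In this model the wait only completes once something posts pause_sem after
the cancel, which is why an explicit "migrate continue" would still be needed.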