qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the


From: Zhanghailiang
Subject: RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place
Date: Fri, 15 May 2020 01:45:52 +0000

> -----Original Message-----
> From: Lukas Straub [mailto:address@hidden]
> Sent: Thursday, May 14, 2020 10:31 PM
> To: Zhanghailiang <address@hidden>
> Cc: qemu-devel <address@hidden>; Zhang Chen
> <address@hidden>; Juan Quintela <address@hidden>; Dr. David
> Alan Gilbert <address@hidden>
> Subject: Re: [PATCH 6/6] migration/colo.c: Move
> colo_notify_compares_event to the right place
> 
> On Thu, 14 May 2020 13:27:30 +0000
> Zhanghailiang <address@hidden> wrote:
> 
> > Cc: Zhang Chen <address@hidden>
> >
> > >
> > > If the secondary has to failover during checkpointing, it still is
> > > in the old state (i.e. different state than primary). Thus we can't
> > > expose the primary state until after the checkpoint is sent.
> > >
> >
> > Hmm, do you mean we should not flush the net packages to client
> > connection until checkpointing Process almost success because it may fail
> during checkpointing ?
> 
> No.
> If the primary fails/crashes during checkpointing, the secondary is still in
> different state than the primary because it didn't receive the full 
> checkpoint.
> We can release the miscompared packets only after both primary and
> secondary are in the same state.
> 
> Example:
> 1. Client opens a TCP connection, sends SYN.
> 2. Primary accepts the connection with SYN-ACK, but due to
> nondeterministic execution the secondary is delayed.
> 3. Checkpoint happens, primary releases the SYN-ACK packet but then
> crashes while sending the checkpoint.
> 4. The Secondary fails over. At this point it is still in the old state where 
> it
> hasn't sent the SYN-ACK packet.
> 5. The client responds with ACK to the SYN-ACK packet.
> 6. Because it doesn't know the connection, the secondary responds with RST,
> connection reset.
> 

Good example. For this patch, it is OK, I will add reviewed-by in your origin 
patch.


> Regards,
> Lukas Straub
> 
> > > This fixes sporadic connection reset of client connections during 
> > > failover.
> > >
> > > Signed-off-by: Lukas Straub <address@hidden>
> > > ---
> > >  migration/colo.c | 12 ++++++------
> > >  1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c index
> > > a69782efc5..a3fc21e86e 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -430,12 +430,6 @@ static int
> > > colo_do_checkpoint_transaction(MigrationState *s,
> > >          goto out;
> > >      }
> > >
> > > -    qemu_event_reset(&s->colo_checkpoint_event);
> > > -    colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > > &local_err);
> > > -    if (local_err) {
> > > -        goto out;
> > > -    }
> > > -
> > >      /* Disable block migration */
> > >      migrate_set_block_enabled(false, &local_err);
> > >      qemu_mutex_lock_iothread();
> > > @@ -494,6 +488,12 @@ static int
> > > colo_do_checkpoint_transaction(MigrationState *s,
> > >          goto out;
> > >      }
> > >
> > > +    qemu_event_reset(&s->colo_checkpoint_event);
> > > +    colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > > &local_err);
> > > +    if (local_err) {
> > > +        goto out;
> > > +    }
> > > +
> > >      colo_receive_check_message(s->rp_state.from_dst_file,
> > >                         COLO_MESSAGE_VMSTATE_LOADED,
> &local_err);
> > >      if (local_err) {
> > > --
> > > 2.20.1




reply via email to

[Prev in Thread] Current Thread [Next in Thread]