[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the
From: |
Zhanghailiang |
Subject: |
RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place |
Date: |
Fri, 15 May 2020 01:45:52 +0000 |
> -----Original Message-----
> From: Lukas Straub [mailto:address@hidden]
> Sent: Thursday, May 14, 2020 10:31 PM
> To: Zhanghailiang <address@hidden>
> Cc: qemu-devel <address@hidden>; Zhang Chen
> <address@hidden>; Juan Quintela <address@hidden>; Dr. David
> Alan Gilbert <address@hidden>
> Subject: Re: [PATCH 6/6] migration/colo.c: Move
> colo_notify_compares_event to the right place
>
> On Thu, 14 May 2020 13:27:30 +0000
> Zhanghailiang <address@hidden> wrote:
>
> > Cc: Zhang Chen <address@hidden>
> >
> > >
> > > If the secondary has to failover during checkpointing, it still is
> > > in the old state (i.e. different state than primary). Thus we can't
> > > expose the primary state until after the checkpoint is sent.
> > >
> >
> > Hmm, do you mean we should not flush the net packages to client
> > connection until checkpointing Process almost success because it may fail
> during checkpointing ?
>
> No.
> If the primary fails/crashes during checkpointing, the secondary is still in
> different state than the primary because it didn't receive the full
> checkpoint.
> We can release the miscompared packets only after both primary and
> secondary are in the same state.
>
> Example:
> 1. Client opens a TCP connection, sends SYN.
> 2. Primary accepts the connection with SYN-ACK, but due to
> nondeterministic execution the secondary is delayed.
> 3. Checkpoint happens, primary releases the SYN-ACK packet but then
> crashes while sending the checkpoint.
> 4. The Secondary fails over. At this point it is still in the old state where
> it
> hasn't sent the SYN-ACK packet.
> 5. The client responds with ACK to the SYN-ACK packet.
> 6. Because it doesn't know the connection, the secondary responds with RST,
> connection reset.
>
Good example. For this patch, it is OK, I will add reviewed-by in your origin
patch.
> Regards,
> Lukas Straub
>
> > > This fixes sporadic connection reset of client connections during
> > > failover.
> > >
> > > Signed-off-by: Lukas Straub <address@hidden>
> > > ---
> > > migration/colo.c | 12 ++++++------
> > > 1 file changed, 6 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/migration/colo.c b/migration/colo.c index
> > > a69782efc5..a3fc21e86e 100644
> > > --- a/migration/colo.c
> > > +++ b/migration/colo.c
> > > @@ -430,12 +430,6 @@ static int
> > > colo_do_checkpoint_transaction(MigrationState *s,
> > > goto out;
> > > }
> > >
> > > - qemu_event_reset(&s->colo_checkpoint_event);
> > > - colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > > &local_err);
> > > - if (local_err) {
> > > - goto out;
> > > - }
> > > -
> > > /* Disable block migration */
> > > migrate_set_block_enabled(false, &local_err);
> > > qemu_mutex_lock_iothread();
> > > @@ -494,6 +488,12 @@ static int
> > > colo_do_checkpoint_transaction(MigrationState *s,
> > > goto out;
> > > }
> > >
> > > + qemu_event_reset(&s->colo_checkpoint_event);
> > > + colo_notify_compares_event(NULL, COLO_EVENT_CHECKPOINT,
> > > &local_err);
> > > + if (local_err) {
> > > + goto out;
> > > + }
> > > +
> > > colo_receive_check_message(s->rp_state.from_dst_file,
> > > COLO_MESSAGE_VMSTATE_LOADED,
> &local_err);
> > > if (local_err) {
> > > --
> > > 2.20.1
- Re: [PATCH 2/6] migration/colo.c: Use cpu_synchronize_all_states(), (continued)
- [PATCH 3/6] migration/colo.c: Flush ram cache only after receiving device state, Lukas Straub, 2020/05/11
- [PATCH 4/6] migration/colo.c: Relaunch failover even if there was an error, Lukas Straub, 2020/05/11
- [PATCH 5/6] migration/qemu-file.c: Don't ratelimit a shutdown fd, Lukas Straub, 2020/05/11
- [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place, Lukas Straub, 2020/05/11
- RE: [PATCH 6/6] migration/colo.c: Move colo_notify_compares_event to the right place, Zhanghailiang, 2020/05/14