Subject: Re: [PATCH v4 0/4] migration/postcopy: Sync faulted addresses after network recovered
From: Dr. David Alan Gilbert
Date: Mon, 12 Oct 2020 12:23:07 +0100
User-agent: Mutt/1.14.6 (2020-07-11)
* Dr. David Alan Gilbert (dgilbert@redhat.com) wrote:
> Queued
Hi Peter,
I've had to unqueue this again unfortunately.
There's something going on with big-endian hosts; on a PPC BE host,
it reliably hangs in the recovery test with this set.
(Although I can't see anything obviously relevant by eye.)
Dave
P.S. I can point you at an installed host
> * Peter Xu (peterx@redhat.com) wrote:
> > v4:
> > - use "void */ulong" instead of "uint64_t" where proper in patch 3/4 [Dave]
> >
> > v3:
> > - fix build on 32bit hosts & rebase
> > - remove r-bs for the last 2 patches for Dave due to the changes
> >
> > v2:
> > - add r-bs for Dave
> > - add patch "migration: Properly destroy variables on incoming side" as
> > patch 1
> > - destroy page_request_mutex in migration_incoming_state_destroy() too
> > [Dave]
> > - use WITH_QEMU_LOCK_GUARD in two places where we can [Dave]
> >
> > We've seen intermittent guest hangs on the destination VM after postcopy
> > recovered. However, the hang resolves itself after a few minutes.
> >
> > The problem is: after a postcopy recovery, the prioritized postcopy queue
> > on the source VM is actually missing. So all the threads that faulted
> > before the postcopy recovery will remain halted until (accidentally) the
> > page gets copied by the background precopy migration stream.
> >
> > The solution is to also refresh this information after postcopy recovery.
> > To achieve this, we need to maintain a list of faulted addresses on the
> > destination node, so that we can resend the list when necessary. This work
> > is done via patches 2-3.
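The destination-side bookkeeping described above can be sketched roughly as follows. This is a hedged, simplified stand-in: QEMU's real implementation keeps this state inside MigrationIncomingState, and the names here (`FaultTracker`, `fault_tracker_add`, `fault_tracker_remove`) and the fixed-size array are purely illustrative, not the actual data structure:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative tracker for guest page addresses that have faulted on the
 * destination but have not yet been filled in.  A fixed array keeps the
 * sketch self-contained; the real code would use a proper map keyed by
 * host address. */
#define MAX_FAULTED 1024

typedef struct FaultTracker {
    uint64_t addrs[MAX_FAULTED];
    size_t count;
} FaultTracker;

/* Record a faulted address once; duplicates are dropped so the list
 * resent after recovery stays minimal. */
static bool fault_tracker_add(FaultTracker *t, uint64_t addr)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->addrs[i] == addr) {
            return false;   /* already tracked */
        }
    }
    if (t->count == MAX_FAULTED) {
        return false;       /* tracker full; real code would not cap this */
    }
    t->addrs[t->count++] = addr;
    return true;
}

/* Once the page is actually copied in (e.g. after UFFDIO_COPY succeeds),
 * forget the address: it no longer needs to be re-requested. */
static void fault_tracker_remove(FaultTracker *t, uint64_t addr)
{
    for (size_t i = 0; i < t->count; i++) {
        if (t->addrs[i] == addr) {
            t->addrs[i] = t->addrs[--t->count];  /* swap-remove, order unimportant */
            return;
        }
    }
}
```

The key property is that the tracker only ever holds addresses still waiting on a page, so replaying it after recovery requests exactly the pages that blocked threads are halted on.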
> >
> > With that, the last thing we need to do is to send this extra information
> > to the source VM after recovery. Very luckily, this synchronization can be
> > "emulated" by sending a bunch of page requests (although these pages have
> > been sent previously!) to the source VM, just as when we get a page fault.
> > Even the 1st version of the postcopy code handled duplicated pages well,
> > so this fix does not even need a new capability bit, and it'll work
> > smoothly on old QEMUs when we migrate from them to new QEMUs.
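The recovery-time sync then amounts to replaying one page request per tracked address. A minimal sketch, with the caveat that `send_page_req` and `resync_faulted_pages` are made-up names standing in for QEMU's real return-path request machinery, and that the safety argument (duplicate requests are harmless) comes from the source side, not from this loop:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Callback abstracting "ask the source to (re)send this page".  In QEMU
 * this would be a return-path page-request message; here it is just a
 * function pointer so the sketch is testable in isolation. */
typedef int (*page_req_fn)(uint64_t addr);

/* After the migration channel recovers, walk every address that faulted
 * before the network dropped and re-issue a request for it.  Requesting a
 * page the source already sent is a no-op on the source side, so replaying
 * the whole list is safe even against an older QEMU source. */
static int resync_faulted_pages(const uint64_t *addrs, size_t n,
                                page_req_fn send_page_req)
{
    int sent = 0;
    for (size_t i = 0; i < n; i++) {
        if (send_page_req(addrs[i]) == 0) {
            sent++;
        }
    }
    return sent;   /* number of requests successfully re-issued */
}
```

Because the sync is expressed entirely in terms of ordinary page requests, no new wire-protocol message or capability bit is needed, which is what lets the fix interoperate with older QEMUs.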
> >
> > Please review, thanks.
> >
> > Peter Xu (4):
> > migration: Pass incoming state into qemu_ufd_copy_ioctl()
> > migration: Introduce migrate_send_rp_message_req_pages()
> > migration: Maintain postcopy faulted addresses
> > migration: Sync requested pages after postcopy recovery
> >
> > migration/migration.c | 49 ++++++++++++++++++++++++++++++++--
> > migration/migration.h | 21 ++++++++++++++-
> > migration/postcopy-ram.c | 25 +++++++++++++-----
> > migration/savevm.c | 57 ++++++++++++++++++++++++++++++++++++++++
> > migration/trace-events | 3 +++
> > 5 files changed, 146 insertions(+), 9 deletions(-)
> >
> > --
> > 2.26.2
> >
> >
> >
> --
> Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK