qemu-devel


From: Dr. David Alan Gilbert
Subject: Re: [PATCH v6 0/6] migration/postcopy: Sync faulted addresses after network recovered
Date: Mon, 26 Oct 2020 13:57:56 +0000
User-agent: Mutt/1.14.6 (2020-07-11)

* Peter Xu (peterx@redhat.com) wrote:
> v6:
> - fix page mask to use ramblock psize [Dave]
> 
> v5:
> - added one test patch for easier debugging for migration-test
> - added one fix patch [1] for another postcopy race
> - fixed a bug that could trigger when host/guest page size differs
> 
> v4:
> - use "void */ulong" instead of "uint64_t" where appropriate in patch 3/4 [Dave]
> 
> v3:
> - fix build on 32bit hosts & rebase
> - remove r-bs for the last 2 patches for Dave due to the changes
> 
> v2:
> - add r-bs for Dave
> - add patch "migration: Properly destroy variables on incoming side" as patch 1
> - destroy page_request_mutex in migration_incoming_state_destroy() too [Dave]
> - use WITH_QEMU_LOCK_GUARD in two places where we can [Dave]
> 
> We've seen occasional guest hangs on the destination VM after postcopy
> recovers.  However, the hang resolves itself after a few minutes.
> 
> The problem is: after a postcopy recovery, the prioritized postcopy
> page-request queue on the source VM is lost.  So all the threads that
> faulted before the postcopy recovery will stay halted until the page
> (accidentally) gets copied over by the background precopy migration stream.
> 
> The solution is to also refresh this information after postcopy recovery.
> To achieve this, we need to maintain a list of faulted addresses on the
> destination node, so that we can resend the list when necessary.  This is
> done in patches 2-5.
> 
> With that, the last thing we need to do is send this extra information to
> the source VM after recovery.  Very luckily, this synchronization can be
> "emulated" by sending a bunch of page requests (even though these pages
> were sent previously!) to the source VM, just as if a page fault had
> occurred.  Even the 1st version of the postcopy code handles duplicated
> pages well, so this fix does not even need a new capability bit, and it
> works smoothly when migrating from old QEMUs to the new ones.
> 
> Please review, thanks.

Queued

Dave

> 
> Peter Xu (6):
>   migration: Pass incoming state into qemu_ufd_copy_ioctl()
>   migration: Introduce migrate_send_rp_message_req_pages()
>   migration: Maintain postcopy faulted addresses
>   migration: Sync requested pages after postcopy recovery
>   migration/postcopy: Release fd before going into 'postcopy-pause'
>   migration-test: Only hide error if !QTEST_LOG
> 
>  migration/migration.c        | 55 ++++++++++++++++++++++++++++++----
>  migration/migration.h        | 21 ++++++++++++-
>  migration/postcopy-ram.c     | 25 ++++++++++++----
>  migration/savevm.c           | 57 ++++++++++++++++++++++++++++++++++++
>  migration/trace-events       |  3 ++
>  tests/qtest/migration-test.c |  6 +++-
>  6 files changed, 154 insertions(+), 13 deletions(-)
> 
> -- 
> 2.26.2
> 
> 
> 
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



