
From: Dr. David Alan Gilbert
Subject: Re: [RFC]migration: stop/start device at the end of live migration concurrently
Date: Mon, 1 Mar 2021 16:02:23 +0000
User-agent: Mutt/2.0.5 (2021-01-21)

* Wangxin (Alexander) (wangxinxin.wang@huawei.com) wrote:
> Hi all,

(copying in Michael as vhost-user maintainer).

> We found that the migration downtime reaches several seconds when live
> migrating a huge VM with 224 vCPUs/180 GiB/16 vhost-user NICs (x32 queues)/
> 24 vhost-user-blk disks (x4 queues); most of the time is spent stopping
> the devices on the source and starting them on the destination.

I suspect that's more vhost-user devices than anyone else has run on a
single VM!

> Our idea is to stop the devices using multiple threads at the end of
> migration. To be more specific, we create a thread pool at the beginning of
> live migration; when the migration thread calls the virtio_vmstate_change
> callback to stop or start a device in vm_state_notify, it submits a request
> to the thread pool so the callbacks are handled concurrently.
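
If I'm reading the description right, the scheme is roughly the following
shape - this is only a sketch with plain pthreads and made-up names
(worker_pool_submit(), worker_pool_drain(), DeviceStateReq), not the actual
patch:

/* Sketch only: a plain pthread worker pool; worker_pool_submit(),
 * worker_pool_drain() and DeviceStateReq are made-up names.  Each
 * request wraps one device's stop/start callback. */
#include <pthread.h>
#include <stdlib.h>

typedef struct DeviceStateReq {
    void (*fn)(void *opaque);         /* wraps e.g. the vmstate-change callback */
    void *opaque;                     /* the device it applies to */
    struct DeviceStateReq *next;
} DeviceStateReq;

static pthread_mutex_t pool_lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t work_cond = PTHREAD_COND_INITIALIZER;   /* queue non-empty */
static pthread_cond_t done_cond = PTHREAD_COND_INITIALIZER;   /* pending == 0 */
static DeviceStateReq *queue;
static int pending;

static void *worker(void *arg)
{
    (void)arg;
    for (;;) {
        pthread_mutex_lock(&pool_lock);
        while (!queue) {
            pthread_cond_wait(&work_cond, &pool_lock);
        }
        DeviceStateReq *req = queue;
        queue = req->next;
        pthread_mutex_unlock(&pool_lock);

        req->fn(req->opaque);         /* stop or start one device */
        free(req);

        pthread_mutex_lock(&pool_lock);
        if (--pending == 0) {
            pthread_cond_broadcast(&done_cond);
        }
        pthread_mutex_unlock(&pool_lock);
    }
    return NULL;
}

/* Created once, at the beginning of migration. */
static void worker_pool_init(int nthreads)
{
    for (int i = 0; i < nthreads; i++) {
        pthread_t tid;
        pthread_create(&tid, NULL, worker, NULL);
        pthread_detach(tid);
    }
}

/* Called from vm_state_notify() instead of running the callback inline. */
static void worker_pool_submit(void (*fn)(void *), void *opaque)
{
    DeviceStateReq *req = malloc(sizeof(*req));

    req->fn = fn;
    req->opaque = opaque;
    pthread_mutex_lock(&pool_lock);
    req->next = queue;
    queue = req;
    pending++;
    pthread_cond_signal(&work_cond);
    pthread_mutex_unlock(&pool_lock);
}

/* The migration thread blocks here until every callback has finished. */
static void worker_pool_drain(void)
{
    pthread_mutex_lock(&pool_lock);
    while (pending > 0) {
        pthread_cond_wait(&done_cond, &pool_lock);
    }
    pthread_mutex_unlock(&pool_lock);
}
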
> 
> We live migrated the VM and measured the time spent at the different stages
> of stopping/starting the devices.
> 
>                                         Cost: Original   Concurrent
>          disk   get vring base                  36ms        18ms
>                 disable guest notify            48ms        32ms
>  Src            disable host notify            300ms       120ms
>          net    get vring base                1376ms       294ms
>                 disable host notify           1011ms       116ms
>                 disable guest notify            59ms        40ms
>
>          net    enable guest notify            310ms        97ms
>                 set mem table                   48ms        20ms
>  Dst            enable host notify            2022ms       114ms
>          disk   enable host notify             312ms        78ms
>                 enable guest notify             32ms        23ms
>                 set mem table                   16ms        10ms
>
>  Total Downtime                               5600ms       962ms
> 
> However, there are some side effects:
> 1. When the host notify or guest notify is disabled concurrently, the VM
> crashes because the same notifier gets disabled from different threads. We
> now add two different locks to solve this problem, but this is hacky and may
> cause other problems.
> 
> 2. As the QEMU BQL is held by the migration thread before stopping the
> devices in migration_completion, there will be a deadlock in the following
> scenario:
> migration_thread                          [thread 1]
>   set_up_multithread
>   ...
>   migration_completion()                  # takes the QEMU BQL
>     qemu_mutex_lock_iothread()
>     vm_stop_force_state()
>     ...
>       submit stop-device request to
>       the thread pool
>                                           virtio_vmstate_change
>                                             virtio_set_status
>                                             ...
>                                               memory_region_transaction_begin
>                                               ...
>                                                 prepare_mmio_access
>                                                   qemu_mutex_iothread_locked()  # returns false
>                                                   qemu_mutex_lock_iothread()    # deadlock
> 
> Now we add another lock to replace the BQL in this scenario, but we think
> this is not reliable enough and carries the risk that other code paths will
> still take the QEMU BQL while the devices are being stopped. My question is:
> how can we deal with the conflict with the QEMU BQL properly?
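
Just to restate that deadlock as a standalone sketch: the plain mutex and
per-thread flag below stand in for the real qemu_mutex_lock_iothread()/
qemu_mutex_iothread_locked() pair, and running it simply hangs:

/* The main thread plays the migration thread, the worker plays the
 * pool thread running the device-stop callback. */
#include <pthread.h>
#include <stdbool.h>

static pthread_mutex_t bql = PTHREAD_MUTEX_INITIALIZER;
static __thread bool bql_held;     /* per-thread, like iothread_locked */

static void bql_lock(void)   { pthread_mutex_lock(&bql); bql_held = true; }
static void bql_unlock(void) { bql_held = false; pthread_mutex_unlock(&bql); }

/* Shape of prepare_mmio_access(): take the BQL if this thread doesn't
 * already hold it. */
static void touch_mmio(void)
{
    bool release = false;

    if (!bql_held) {               /* false in the pool thread ...   */
        bql_lock();                /* ... so it blocks here forever  */
        release = true;
    }
    /* ... the actual MMIO access would go here ... */
    if (release) {
        bql_unlock();
    }
}

static void *stop_device_worker(void *arg)
{
    (void)arg;
    touch_mmio();                  /* virtio_set_status() ends up here */
    return NULL;
}

int main(void)
{
    pthread_t worker;

    bql_lock();                               /* migration_completion() */
    pthread_create(&worker, NULL, stop_device_worker, NULL);
    pthread_join(worker, NULL);               /* waits for the pool: deadlock */
    bql_unlock();
    return 0;
}
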
> 
> Any advice will be appreciated, thanks.

To me it feels like the other way to do this would be to explicitly split
each of these stages into two: one where it sends the request to the
vhost-user device, and the other where it waits for the response from the
vhost-user device (i.e. in the vhost_user case after the vhost_user_write but
before the vhost_user_read) - so instead of parallelising everything in
threads, you'd parallelise all of the corresponding operations, e.g. all of
the get_vring_base's happen at the same time.
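
Roughly this call pattern; the _begin/_end split below is entirely made up
(vhost_user_get_vring_base_begin()/_end() don't exist today), it's just to
show the shape:

/* Hypothetical split of the get_vring_base VhostOps callback: *_begin()
 * only does the vhost_user_write() side, *_end() the matching
 * vhost_user_read(). */

struct vhost_dev;                  /* opaque for this sketch */
struct vhost_vring_state;

int vhost_user_get_vring_base_begin(struct vhost_dev *dev,
                                    struct vhost_vring_state *ring);
int vhost_user_get_vring_base_end(struct vhost_dev *dev,
                                  struct vhost_vring_state *ring);

/* Stop n rings: send every request first, then collect every reply, so
 * the backends all handle GET_VRING_BASE at the same time instead of
 * QEMU serialising one full round trip per queue. */
static int get_all_vring_bases(struct vhost_dev **devs,
                               struct vhost_vring_state **rings, int n)
{
    int i, r;

    for (i = 0; i < n; i++) {
        r = vhost_user_get_vring_base_begin(devs[i], rings[i]);
        if (r < 0) {
            return r;
        }
    }
    for (i = 0; i < n; i++) {
        r = vhost_user_get_vring_base_end(devs[i], rings[i]);
        if (r < 0) {
            return r;
        }
    }
    return 0;
}
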

Michael: Would it make sense to change VhostOps get_vring_base (and many of
the others) into two-part operations like this?
(or maybe coroutines with a yield in???)

Dave
-- 
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK



