qemu-devel
Re: How to improve downtime of Live-Migration caused by bdrv_drain_all()


From: Stefan Hajnoczi
Subject: Re: How to improve downtime of Live-Migration caused by bdrv_drain_all()
Date: Thu, 2 Jan 2020 15:07:47 +0000

On Thu, Dec 26, 2019 at 05:40:22PM +0800, 张海斌 wrote:
> Stefan Hajnoczi <address@hidden> wrote on Fri, Mar 29, 2019 at 1:08 AM:
> >
> > On Thu, Mar 28, 2019 at 05:53:34PM +0800, 张海斌 wrote:
> > > hi, stefan
> > >
> > > I have run into the same problem you described in
> > > https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg04025.html
> > >
> > > Reproduce as follows:
> > > 1. Clone qemu code from https://git.qemu.org/git/qemu.git, add some
> > > debug information and compile
> > > 2. Start a new VM
> > > 3. In the VM, use fio randwrite to put write pressure on the disk
> > > 4. Live migrate
> > >
> > > The log shows the following:
> > > [2019-03-28 15:10:40.206] /data/qemu/cpus.c:1086: enter do_vm_stop
> > > [2019-03-28 15:10:40.212] /data/qemu/cpus.c:1097: call bdrv_drain_all
> > > [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1099: call replay_disable_events
> > > [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1101: call bdrv_flush_all
> > > [2019-03-28 15:10:41.004] /data/qemu/cpus.c:1104: done do_vm_stop
> > >
> > > Calling bdrv_drain_all() costs 792 milliseconds.
> > > I added an extra bdrv_drain_all() at the start of do_vm_stop(), before
> > > pause_all_vcpus(), but it didn't help.
> > > Is there any way to improve the live-migration downtime caused by
> > > bdrv_drain_all()?

I believe there were ideas about throttling storage controller devices
during the later phases of live migration to reduce the number of
pending I/Os.

In other words, if QEMU's virtio-blk/scsi emulation code reduces the
queue depth as live migration nears the handover point, bdrv_drain_all()
should become cheaper because fewer I/O requests will be in-flight.
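
For illustration, here is a rough sketch of how the emulation code could
enforce such a cap when dispatching requests.  Every identifier in it
(migration_converging, vq_pop_request(), submit_request()) is a placeholder
name for this sketch, not an existing QEMU interface:

/* Sketch only: cap the number of requests dispatched to the backend
 * while migration is converging.  All identifiers below are
 * placeholders, not actual QEMU APIs. */
#include <stdbool.h>
#include <stddef.h>

#define NORMAL_QUEUE_DEPTH    128
#define MIGRATION_QUEUE_DEPTH 1   /* e.g. depth 1 while converging */

typedef struct Request Request;

extern bool migration_converging;         /* set near the handover point */
extern size_t inflight_requests;          /* decremented on completion */
extern Request *vq_pop_request(void);     /* placeholder: dequeue from virtqueue */
extern void submit_request(Request *req); /* placeholder: issue to backend */

static size_t current_depth_limit(void)
{
    return migration_converging ? MIGRATION_QUEUE_DEPTH
                                : NORMAL_QUEUE_DEPTH;
}

/* Stop dequeuing once the in-flight count hits the limit; the rest
 * stays in the virtqueue until completions free up slots, so fewer
 * requests are pending when bdrv_drain_all() runs. */
void process_virtqueue(void)
{
    while (inflight_requests < current_depth_limit()) {
        Request *req = vq_pop_request();
        if (!req) {
            break;
        }
        inflight_requests++;
        submit_request(req);
    }
}

Requests beyond the limit simply wait in the virtqueue until completions
make room, so the guest sees a temporarily slower disk rather than errors.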

A simple solution would reduce the queue depth during live migration
(e.g. queue depth 1).  A smart solution would look at I/O request
latency to decide what queue depth is acceptable.  For example, if
requests are taking 4 ms to complete then we might allow 2 or 3 requests
to achieve a ~10 ms bdrv_drain_all() downtime target.
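
The "smart" variant boils down to a small calculation.  A self-contained
sketch, assuming we already track the average completion latency and that
in-flight requests drain roughly one after another:

/* Sketch: choose a queue depth that keeps the drain time within a
 * downtime budget, assuming in-flight requests complete roughly
 * serially at the measured average latency. */
#include <stdio.h>
#include <stdint.h>

static unsigned int allowed_queue_depth(uint64_t avg_latency_ns,
                                        uint64_t downtime_budget_ns)
{
    if (avg_latency_ns == 0) {
        return 1;
    }
    uint64_t depth = downtime_budget_ns / avg_latency_ns;
    return depth ? (unsigned int)depth : 1;  /* always allow at least 1 */
}

int main(void)
{
    /* 4 ms average latency against a ~10 ms drain budget -> depth 2 */
    printf("allowed depth: %u\n",
           allowed_queue_depth(4ULL * 1000 * 1000, 10ULL * 1000 * 1000));
    return 0;
}

In practice the average latency would have to be sampled from recent
completions and the limit re-evaluated periodically as migration converges.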

As far as I know this has not been implemented.

Do you want to try implementing this?

Stefan
