qemu-devel
Re: How to improve downtime of Live-Migration caused by bdrv_drain_all()


From: Stefan Hajnoczi
Subject: Re: How to improve downtime of Live-Migration caused by bdrv_drain_all()
Date: Thu, 2 Jan 2020 15:07:47 +0000

On Thu, Dec 26, 2019 at 05:40:22PM +0800, 张海斌 wrote:
> Stefan Hajnoczi <address@hidden> wrote on Fri, Mar 29, 2019 at 1:08 AM:
> >
> > On Thu, Mar 28, 2019 at 05:53:34PM +0800, 张海斌 wrote:
> > > hi, stefan
> > >
> > > I have run into the same problem you described in
> > > https://lists.gnu.org/archive/html/qemu-devel/2016-08/msg04025.html
> > >
> > > Reproduce as follows:
> > > 1. Clone qemu code from https://git.qemu.org/git/qemu.git, add some
> > > debug information and compile
> > > 2. Start a new VM
> > > 3. In the VM, use fio randwrite to put write pressure on the disk
> > > 4. Live migrate
> > >
> > > The log shows the following:
> > > [2019-03-28 15:10:40.206] /data/qemu/cpus.c:1086: enter do_vm_stop
> > > [2019-03-28 15:10:40.212] /data/qemu/cpus.c:1097: call bdrv_drain_all
> > > [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1099: call replay_disable_events
> > > [2019-03-28 15:10:40.989] /data/qemu/cpus.c:1101: call bdrv_flush_all
> > > [2019-03-28 15:10:41.004] /data/qemu/cpus.c:1104: done do_vm_stop
> > >
> > > Calling bdrv_drain_all() costs 792 milliseconds.
> > > I added an extra bdrv_drain_all() at the start of do_vm_stop(), before
> > > pause_all_vcpus(), but it didn't help.
> > > Is there any way to improve the live-migration downtime caused by
> > > bdrv_drain_all()?

I believe there were ideas about throttling storage controller devices
during the later phases of live migration to reduce the number of
pending I/Os.

In other words, if QEMU's virtio-blk/scsi emulation code reduces the
queue depth as live migration nears the handover point, bdrv_drain_all()
should become cheaper because fewer I/O requests will be in-flight.
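
For illustration, here is a rough sketch of how the emulation code could
enforce such a cap when dispatching requests.  Every identifier in it
(migration_converging, vq_pop_request(), submit_request()) is a placeholder
name for this sketch, not an existing QEMU interface:

/* Sketch only: cap the number of requests dispatched to the backend
 * while migration is converging.  All identifiers below are
 * placeholders, not actual QEMU APIs. */
#include <stdbool.h>
#include <stddef.h>

#define NORMAL_QUEUE_DEPTH    128
#define MIGRATION_QUEUE_DEPTH 1   /* e.g. depth 1 while converging */

typedef struct Request Request;

extern bool migration_converging;         /* set near the handover point */
extern size_t inflight_requests;          /* decremented on completion */
extern Request *vq_pop_request(void);     /* placeholder: dequeue from virtqueue */
extern void submit_request(Request *req); /* placeholder: issue to backend */

static size_t current_depth_limit(void)
{
    return migration_converging ? MIGRATION_QUEUE_DEPTH
                                : NORMAL_QUEUE_DEPTH;
}

/* Stop dequeuing once the in-flight count hits the limit; the rest
 * stays in the virtqueue until completions free up slots, so fewer
 * requests are pending when bdrv_drain_all() runs. */
void process_virtqueue(void)
{
    while (inflight_requests < current_depth_limit()) {
        Request *req = vq_pop_request();
        if (!req) {
            break;
        }
        inflight_requests++;
        submit_request(req);
    }
}

Requests beyond the limit simply wait in the virtqueue until completions
make room, so the guest sees a temporarily slower disk rather than errors.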

A simple solution would reduce the queue depth during live migration
(e.g. queue depth 1).  A smart solution would look at I/O request
latency to decide what queue depth is acceptable.  For example, if
requests are taking 4 ms to complete then we might allow 2 or 3 requests
to achieve a ~10 ms bdrv_drain_all() downtime target.
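
The "smart" variant boils down to a small calculation.  A self-contained
sketch, assuming we already track the average completion latency and that
in-flight requests drain roughly one after another:

/* Sketch: choose a queue depth that keeps the drain time within a
 * downtime budget, assuming in-flight requests complete roughly
 * serially at the measured average latency. */
#include <stdio.h>
#include <stdint.h>

static unsigned int allowed_queue_depth(uint64_t avg_latency_ns,
                                        uint64_t downtime_budget_ns)
{
    if (avg_latency_ns == 0) {
        return 1;
    }
    uint64_t depth = downtime_budget_ns / avg_latency_ns;
    return depth ? (unsigned int)depth : 1;  /* always allow at least 1 */
}

int main(void)
{
    /* 4 ms average latency against a ~10 ms drain budget -> depth 2 */
    printf("allowed depth: %u\n",
           allowed_queue_depth(4ULL * 1000 * 1000, 10ULL * 1000 * 1000));
    return 0;
}

In practice the average latency would have to be sampled from recent
completions and the limit re-evaluated periodically as migration converges.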

As far as I know this has not been implemented.

Do you want to try implementing this?

Stefan
