From: Eric Blake
Subject: Re: [Qemu-devel] [Qemu-block] Request for clarification on qemu-img convert behavior zeroing target host_device
Date: Thu, 13 Dec 2018 09:05:43 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1

On 12/13/18 8:49 AM, Kevin Wolf wrote:

>>> We observe that in Fedora 29, qemu-img fully zeroes the disk before
>>> imaging it. Taking the disk size into account, the whole process now
>>> takes 35 minutes instead of 50 seconds. This causes the
>>> ironic-python-agent operation to time out. The Fedora 27 qemu-img
>>> doesn't do that.

>> Known issue; Nir and Rich have posted a previous thread on the topic,
>> and the conclusion is that we need to make qemu-img smarter about NOT
>> requesting pre-zeroing of devices where that is more expensive than
>> just zeroing as we go.
>> https://lists.gnu.org/archive/html/qemu-devel/2018-11/msg01182.html

> Yes, we should be careful to avoid the fallback in this case.
>
> However, how could this ever go from 50 seconds for writing the whole
> image to 35 minutes?! Even if you end up writing the whole image twice
> because you write zeros first and then overwrite them everywhere with
> data, shouldn't the maximum be doubling the time, i.e. 100 seconds?
>
> Why is the write_zeroes fallback _that_ slow? It will also hit guests
> that request write_zeroes, so I feel this is worth investigating a bit
> more nevertheless.
>
> Can you check with strace which operation actually succeeds writing
> zeros to /dev/sda? The first thing we try is fallocate with
> FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE. This should always be fast,
> so I suppose this fails in your case. The next thing is BLKZEROOUT,
> which I think can do a fallback in the kernel. Does this return success?
> Otherwise we have another fallback mechanism inside of QEMU, which would
> use normal pwrite calls with a zeroed buffer.
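
Running something along the lines of

  strace -f -e trace=fallocate,ioctl,pwrite64 qemu-img convert ...

should show which of those paths is actually taken. As a rough
standalone cross-check, the sketch below tries the same mechanisms in
the same order against a given device. It is hypothetical test code,
not anything from QEMU, and it really does zero the requested range, so
only point it at a scratch device:

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <linux/falloc.h>   /* FALLOC_FL_PUNCH_HOLE, FALLOC_FL_KEEP_SIZE */
#include <linux/fs.h>       /* BLKZEROOUT */
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Hypothetical probe, not QEMU code: try the zeroing mechanisms in the
 * order described above and report which one the kernel accepts.
 * WARNING: this really does zero the given range of the device. */
int main(int argc, char **argv)
{
    if (argc != 4) {
        fprintf(stderr, "usage: %s <device> <offset> <length>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_WRONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    uint64_t range[2] = { strtoull(argv[2], NULL, 0),
                          strtoull(argv[3], NULL, 0) };

    /* 1. fallocate(PUNCH_HOLE | KEEP_SIZE): should be fast if supported */
    if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
                  range[0], range[1]) == 0) {
        printf("fallocate(PUNCH_HOLE) succeeded\n");
        return 0;
    }
    fprintf(stderr, "fallocate(PUNCH_HOLE): %s\n", strerror(errno));

    /* 2. BLKZEROOUT ioctl: the kernel may fall back to writing zeroes */
    if (ioctl(fd, BLKZEROOUT, range) == 0) {
        printf("ioctl(BLKZEROOUT) succeeded\n");
        return 0;
    }
    fprintf(stderr, "ioctl(BLKZEROOUT): %s\n", strerror(errno));

    /* 3. If both fail, the only option left is a pwrite() loop with a
     * zeroed buffer, which is QEMU's internal last resort. */
    printf("both kernel mechanisms failed; only pwrite() of zeroes is left\n");
    return 0;
}

If the first two steps already fail on this kernel/device combination,
that by itself explains why we end up in the slow pwrite() path.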

It may also be a case of poor lseek(SEEK_HOLE) performance on the source
(a known issue with at least some versions of tmpfs). The way qemu-img
queries for block status, it ends up repeatedly hammering on lseek(),
and if lseek() is already O(n) instead of O(1) in behavior, that
explodes into some O(n^2) scaling because qemu-img isn't caching the
answers it got previously.
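
To rule that in or out, a throwaway scan like the sketch below (again
hypothetical, not code from QEMU) walks the source file with the same
SEEK_DATA/SEEK_HOLE pattern qemu-img relies on and times the whole
pass; if the time per extent clearly grows with the size of the file,
the lseek() side is the culprit:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical sketch: count data extents via SEEK_DATA/SEEK_HOLE and
 * report how long the scan takes on the given source image. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <source-image>\n", argv[0]);
        return 1;
    }

    int fd = open(argv[1], O_RDONLY);
    if (fd < 0) {
        perror("open");
        return 1;
    }

    struct timespec t0, t1;
    clock_gettime(CLOCK_MONOTONIC, &t0);

    long extents = 0;
    off_t pos = 0;
    for (;;) {
        off_t data = lseek(fd, pos, SEEK_DATA);
        if (data < 0) {
            break;              /* ENXIO: no data past this offset */
        }
        off_t hole = lseek(fd, data, SEEK_HOLE);
        if (hole < 0) {
            break;
        }
        extents++;
        pos = hole;             /* continue the scan after this extent */
    }

    clock_gettime(CLOCK_MONOTONIC, &t1);
    double secs = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) / 1e9;
    printf("%ld data extents scanned in %.3f s\n", extents, secs);
    return 0;
}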


> Once we know which mechanism is used, we can look into why it is so
> abysmally slow.

Indeed, performance traces are important for issues like this.

--
Eric Blake, Principal Software Engineer
Red Hat, Inc.           +1-919-301-3266
Virtualization:  qemu.org | libvirt.org


