
From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [Qemu-block] do not lseek in qcow2 block_status
Date: Tue, 25 Dec 2018 09:45:14 +0000

24.12.2018 23:42, Nir Soffer wrote:
> On Mon, Dec 24, 2018 at 5:50 PM Vladimir Sementsov-Ogievskiy
> <address@hidden> wrote:
> 
>     Hi all!
> 
>     bdrv_co_block_status digs into bs->file for an additional, more accurate
>     search for holes inside regions reported as DATA by bs, ever since
> 
>         commit 5daa74a6ebce7543aaad178c4061dc087bb4c705
>         Author: Paolo Bonzini <address@hidden>
>         Date:   Wed Sep 4 19:00:38 2013 +0200
> 
>             block: look for zero blocks in bs->file
> 
>     This accuracy is not free: assume we have a qcow2 disk. qcow2 already
>     knows where the holes are and where the data is, yet every block_status
>     request additionally calls lseek. Recently (and not for the first time)
>     we have faced a situation where lseek is slow. Of course, that is mostly
>     about the kernel implementation, but why involve lseek at all if we have
>     the information at the qcow2 level? Assume a big disk full of data: in
>     any iterative copying block job (or img convert) we will call
>     lseek(SEEK_HOLE) on every iteration, and each of these lseeks will have
>     to iterate through all the metadata up to the end of the file. That is
>     obviously inefficient.
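> 
>     (Just to illustrate the call in question: per block_status request the
>     raw file layer ends up doing roughly the following, shown here as a
>     small Python sketch; the file name and offset are made up.)
> 
>         import os
> 
>         fd = os.open("disk.img", os.O_RDONLY)
>         offset = 0          # start of a region the format layer calls DATA
>         # Find the next hole at or after 'offset'.  On some filesystems this
>         # must walk the file's extent metadata before it can answer, which
>         # is exactly the cost we pay on every iteration.
>         next_hole = os.lseek(fd, offset, os.SEEK_HOLE)
>         print("data up to", next_hole)
>         os.close(fd)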
> 
>     I'm thinking about how to properly work around this, and I have some
>     questions.
> 
>     In which cases do we really need this digging for zeros in bs->file?
> 
>     Why does mirror care about discard vs write_zeroes, keeping in mind that
>     block_status will return ZERO for unallocated regions anyway in some
>     cases that are unrelated to the guest view of the disk
>     (bdrv_unallocated_blocks_are_zero, backing file smaller than
>     request.start)?
> 
>     And finally, what should we do about the problem?
> 
>     Obviously, the simplest thing is to just revert 5daa74a. What would that
>     break?
> 
>     Otherwise, I need to "revert" it only in some cases, and maybe it should
>     be a user-defined parameter. Should it? And what are those cases? We need
>     to free at least qcow2 from 5daa74a. But again, should it be a
>     format-specific option, or something more generic?
> 
>     Then, if it is a separate parameter, how should it interact with
>     want_zero, or should it even be want_zero itself, reworked a bit into
>     want_lseek?
> 
> 
> How about having 2 passes:
> 1. seek DATA/HOLE for entire image, keep result in memory
> 2. use the result when modifying ranges
Yes, I think this is possible, but in the case of qcow2, again, we don't need
lseek at all, and the most optimal thing is simply not to call it.
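
That said, for a raw file the two-pass idea is simple enough. An untested
sketch in Python: pass 1 scans the whole file once with SEEK_DATA/SEEK_HOLE
and keeps the extent list in memory, pass 2 answers block_status-like queries
from that list without touching the file again.

    import bisect
    import os

    def scan_extents(path):
        """Pass 1: one forward scan, remember (start, length, is_data)."""
        fd = os.open(path, os.O_RDONLY)
        try:
            size = os.lseek(fd, 0, os.SEEK_END)
            extents = []
            pos = 0
            while pos < size:
                try:
                    data = os.lseek(fd, pos, os.SEEK_DATA)
                except OSError:              # ENXIO: only a hole up to EOF
                    extents.append((pos, size - pos, False))
                    break
                if data > pos:
                    extents.append((pos, data - pos, False))
                hole = os.lseek(fd, data, os.SEEK_HOLE)
                extents.append((data, hole - data, True))
                pos = hole
            return extents, size
        finally:
            os.close(fd)

    def is_data(extents, offset):
        """Pass 2: answer from memory, no lseek."""
        starts = [start for start, _, _ in extents]
        i = bisect.bisect_right(starts, offset) - 1
        return extents[i][2]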

> 
> This can also help allocation of clusters in a destination image:
> 1. seek DATA/HOLE in source image
> 2. allocate blocks in destination
> 3. convert using out of order writes *without* causing fragmentation in the 
> destination image.

Interesting idea, I've thought about something like this too. It should work
for img convert, as the disk is static, but not for mirror.
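
For the static (convert) case, steps 2 and 3 of the idea above could look
roughly like this, reusing the extent list from the previous sketch:
preallocate the whole raw destination first, then let the data writes land in
any order without fragmenting it (the sketch reads each extent whole, which a
real implementation would of course chunk).

    import os
    from concurrent.futures import ThreadPoolExecutor

    def copy_data_extents(src_path, dst_path, extents, size, workers=4):
        src = os.open(src_path, os.O_RDONLY)
        dst = os.open(dst_path, os.O_RDWR | os.O_CREAT, 0o644)
        try:
            if size:
                # Step 2: allocate the destination up front, so the writes
                # below cannot fragment it regardless of the order they land.
                os.posix_fallocate(dst, 0, size)

            def copy_one(start, length):
                buf = os.pread(src, length, start)
                os.pwrite(dst, buf, start)

            # Step 3: submit data extents; they may complete out of order.
            with ThreadPoolExecutor(max_workers=workers) as pool:
                futures = [pool.submit(copy_one, start, length)
                           for start, length, is_data in extents if is_data]
                for f in futures:
                    f.result()           # propagate any I/O error
        finally:
            os.close(src)
            os.close(dst)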

OK, for a non-static disk we could implement a cache filter driver for caching
block_status results (and to cache lseek results we would need to allow
block_status to return more information than requested, like we do in the NBD
protocol), but again, for qcow2 we already have an additional metadata layer,
which is qcow2 itself, so lseek is redundant.
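
The caching part itself is not complicated. Something like this hypothetical
wrapper (Python, not a real filter driver, and it ignores invalidation on
writes, which a live disk would need): it remembers the extents a slow status
function returns and only falls back to it on a miss.

    import bisect

    class BlockStatusCache:
        """Cache of (start, length, flags) extents returned by status_fn.

        status_fn(offset) is the slow path, e.g. a wrapper around lseek; it
        must return the (start, length, flags) extent containing 'offset'.
        """

        def __init__(self, status_fn):
            self.status_fn = status_fn
            self.extents = []             # sorted, non-overlapping

        def lookup(self, offset):
            starts = [s for s, _, _ in self.extents]
            i = bisect.bisect_right(starts, offset) - 1
            if i >= 0:
                start, length, flags = self.extents[i]
                if start <= offset < start + length:
                    return flags          # hit: no lseek at all
            extent = self.status_fn(offset)
            bisect.insort(self.extents, extent)   # miss: remember it
            return extent[2]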

> 
> Here is an example using this approach when copying images to qemu-nbd. This
> is a pretty simplistic single-threaded implementation in Python, using
> "qemu-img map" to find the data and zero ranges. It is faster than
> qemu-img convert :-)
> 
> https://github.com/oVirt/ovirt-imageio/blob/master/examples/nbd-client
> https://github.com/oVirt/ovirt-imageio/commit/bec7cb677e33c6d0a7645c367af3d56b70b666db
> 
> Nir
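
(For reference, the core of that approach, extracting the data/zero ranges
from "qemu-img map --output=json", boils down to something like the sketch
below; the real code is behind the links above.)

    import json
    import subprocess

    def image_map(path):
        """Extents reported by qemu-img map: dicts with at least
        "start", "length", "data" and "zero" keys."""
        out = subprocess.check_output(
            ["qemu-img", "map", "--output=json", path])
        return json.loads(out)

    for entry in image_map("disk.qcow2"):
        if entry["data"]:
            print("copy %12d +%d" % (entry["start"], entry["length"]))
        elif entry["zero"]:
            print("zero %12d +%d" % (entry["start"], entry["length"]))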


-- 
Best regards,
Vladimir
