qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Tue, 19 Nov 2019 12:17:49 +0000

19.11.2019 15:05, Kevin Wolf wrote:
> Am 16.11.2019 um 17:34 hat Vladimir Sementsov-Ogievskiy geschrieben:
>> Hi all!
>>
>> I wanted to understand, what is the real difference between
>> bdrv_block_status_above and bdrv_is_allocated_above, IMHO
>> bdrv_is_allocated_above should work through bdrv_block_status_above..
>>
>> And I found the problem: bdrv_is_allocated_above considers space after
>> EOF as UNALLOCATED for intermediate nodes..
>>
>> UNALLOCATED is not about allocation at fs level, but about should we
>> go to backing or not.. And it seems incorrect for me, as in case of
>> short backing file, we'll read zeroes after EOF, instead of going
>> further by backing chain.
> 
> We actually have documentation what it means:
> 
>   * BDRV_BLOCK_ALLOCATED: the content of the block is determined by this
>   *                       layer rather than any backing, set by block layer
> 
> Say we have a short overlay, like this:
> 
> base.qcow2:     AAAAAAAA
> overlay.qcow2:  BBBB
> 
> Then of course the content of block 5 (the one after EOF of
> overlay.qcow2) is still determined by overlay.qcow2, which can be easily
> verified by reading it from overlay.qcow2 (produces zeros) and from
> base.qcow2 (produces As).
> 
> So the correct result when querying the block status of block 5 on
> overlay.qcow2 is BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO.
> 
> Interestingly, we already fixed the opposite case (large overlay over
> short backing file) in commit e88ae2264d9 from May 2014 according to the
> same logic.
> 
>> This leads to the following effect:
>>
>> ./qemu-img create -f qcow2 base.qcow2 2M
>> ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
>>
>> ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
>> ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
>>
>> Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
>> ./qemu-io -c "read -P 0 1M 1M" top.qcow2
>> read 1048576/1048576 bytes at offset 1048576
>> 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
>>
>> But after commit guest visible state is changed, which seems wrong for me:
>> ./qemu-img commit top.qcow2 -b mid.qcow2
>>
>> ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
>> Pattern verification failed at offset 1048576, 1048576 bytes
>> read 1048576/1048576 bytes at offset 1048576
>> 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
>>
>> ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
>> read 1048576/1048576 bytes at offset 1048576
>> 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
>>
>>
>> I don't know, is it a real bug, as I don't know, do we support backing
>> file larger than its parent. Still, I'm not sure that this behavior of
>> bdrv_is_allocated_above don't lead to other problems.
> 
> I agree it's a bug.
> 
> Your fix doesn't look right to me, though. You leave the buggy behaviour
> of bdrv_co_block_status() as it is and then add four patches to work
> around it in some (but not all) callers of it.
> 
> All that it should take to fix this is making the bs->backing check
> independent from want_zero and let it set ALLOCATED. What I expected
> would be something like the below patch.
> 
> But it doesn't seem to fully fix the problem (though 'alloc 1M 1M' in
> qemu-io shows that the range is now considered allocated), so probably
> there is still a separate bug in bdrv_is_allocated_above().
> 
> And I think we'll want an iotests case for both cases (short overlay,
> short backing file).
> 
> Kevin
> 
> 
> diff --git a/block/io.c b/block/io.c
> index f75777f5ea..5eafcff01a 100644
> --- a/block/io.c
> +++ b/block/io.c
> @@ -2359,16 +2359,17 @@ static int coroutine_fn 
> bdrv_co_block_status(BlockDriverState *bs,
>   
>       if (ret & (BDRV_BLOCK_DATA | BDRV_BLOCK_ZERO)) {
>           ret |= BDRV_BLOCK_ALLOCATED;
> -    } else if (want_zero) {
> -        if (bdrv_unallocated_blocks_are_zero(bs)) {
> -            ret |= BDRV_BLOCK_ZERO;
> -        } else if (bs->backing) {
> -            BlockDriverState *bs2 = bs->backing->bs;
> -            int64_t size2 = bdrv_getlength(bs2);
> -
> -            if (size2 >= 0 && offset >= size2) {
> +    } else if (want_zero && bdrv_unallocated_blocks_are_zero(bs)) {
> +        ret |= BDRV_BLOCK_ZERO;
> +    } else if (bs->backing) {
> +        BlockDriverState *bs2 = bs->backing->bs;
> +        int64_t size2 = bdrv_getlength(bs2);
> +
> +        if (size2 >= 0 && offset >= size2) {
> +            if (want_zero) {
>                   ret |= BDRV_BLOCK_ZERO;
>               }
> +            ret |= BDRV_BLOCK_ALLOCATED;

No, I think it's wrong, as it's not allocated at bs, but at bs->backing->bs.


So, bdrv_co_block_status works correct, what we can change about it, is not
to return pnum=0 if requested range after eof, but return pnum=bytes, together
with BDRV_BLOCK_ALLOCATED | BDRV_BLOCK_ZERO | BDRV_BLOCK_EOF.

And actual problem is in bdrv_block_status_above and bdrv_is_allocated_above, 
which
I'm fixing.



-- 
Best regards,
Vladimir



reply via email to

[Prev in Thread] Current Thread [Next in Thread]