qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above


From: Kevin Wolf
Subject: Re: [PATCH 0/4] fix & merge block_status_above and is_allocated_above
Date: Wed, 20 Nov 2019 12:44:08 +0100
User-agent: Mutt/1.12.1 (2019-06-15)

Am 20.11.2019 um 11:20 hat Vladimir Sementsov-Ogievskiy geschrieben:
> 16.11.2019 19:34, Vladimir Sementsov-Ogievskiy wrote:
> > Hi all!
> > 
> > I wanted to understand, what is the real difference between 
> > bdrv_block_status_above
> > and bdrv_is_allocated_above, IMHO bdrv_is_allocated_above should work 
> > through
> > bdrv_block_status_above..
> > 
> > And I found the problem: bdrv_is_allocated_above considers space after EOF 
> > as
> > UNALLOCATED for intermediate nodes..
> > 
> > UNALLOCATED is not about allocation at fs level, but about should we go to 
> > backing or
> > not.. And it seems incorrect for me, as in case of short backing file, 
> > we'll read
> > zeroes after EOF, instead of going further by backing chain.
> > 
> > This leads to the following effect:
> > 
> > ./qemu-img create -f qcow2 base.qcow2 2M
> > ./qemu-io -c "write -P 0x1 0 2M" base.qcow2
> > 
> > ./qemu-img create -f qcow2 -b base.qcow2 mid.qcow2 1M
> > ./qemu-img create -f qcow2 -b mid.qcow2 top.qcow2 2M
> > 
> > Region 1M..2M is shadowed by short middle image, so guest sees zeroes:
> > ./qemu-io -c "read -P 0 1M 1M" top.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (22.795 GiB/sec and 23341.5807 ops/sec)
> > 
> > But after commit guest visible state is changed, which seems wrong for me:
> > ./qemu-img commit top.qcow2 -b mid.qcow2
> > 
> > ./qemu-io -c "read -P 0 1M 1M" mid.qcow2
> > Pattern verification failed at offset 1048576, 1048576 bytes
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (4.981 GiB/sec and 5100.4794 ops/sec)
> > 
> > ./qemu-io -c "read -P 1 1M 1M" mid.qcow2
> > read 1048576/1048576 bytes at offset 1048576
> > 1 MiB, 1 ops; 00.00 sec (3.365 GiB/sec and 3446.1606 ops/sec)
> > 
> > 
> > I don't know, is it a real bug, as I don't know, do we support backing file 
> > larger than
> > its parent. Still, I'm not sure that this behavior of 
> > bdrv_is_allocated_above don't lead
> > to other problems.
> > 
> > =====
> > 
> > Hmm, bdrv_block_allocated_above behaves strange too:
> > 
> > with want_zero=true, it may report unallocated zeroes because of short 
> > backing files, which
> > are actually "allocated" in POV of backing chains. But I see this may 
> > influence only
> > qemu-img compare, and I don't see can it trigger some bug..
> > 
> > with want_zero=false, it may do no progress because of short backing file. 
> > Moreover it may
> > report EOF in the middle!! But want_zero=false used only in 
> > bdrv_is_allocated, which considers
> > onlyt top layer, so it seems OK.
> > 
> > =====
> > 
> > So, I propose these series, still I'm not sure is there a real bug.
> > 
> > Vladimir Sementsov-Ogievskiy (4):
> >    block/io: fix bdrv_co_block_status_above
> >    block/io: bdrv_common_block_status_above: support include_base
> >    block/io: bdrv_common_block_status_above: support bs == base
> >    block/io: fix bdrv_is_allocated_above
> > 
> >   block/io.c                 | 104 ++++++++++++++++++-------------------
> >   tests/qemu-iotests/154.out |   4 +-
> >   2 files changed, 53 insertions(+), 55 deletions(-)
> > 
> 
> 
> Interesting that the problem illustrated here is not fixed by the series, 
> it's actually
> relates to the fact that mirror does truncation with PREALLOC_MODE_OFF, which 
> leads
> to unallocated qcow2 clusters, which I think should be fixed too.

Yes, this is what I posted yesterday. (With a suggested quick fix, but
it turns out it was not quite correct, see below.)

> To illustrate the problem fixed by the series, we should commit to base:
> 
> # ./qemu-img commit top.qcow2 -b base.qcow2
> Image committed.
> # ./qemu-io -c "read -P 0 1M 1M" base.qcow2
> Pattern verification failed at offset 1048576, 1048576 bytes
> read 1048576/1048576 bytes at offset 1048576
> 1 MiB, 1 ops; 00.00 sec (5.366 GiB/sec and 5494.4149 ops/sec)

Ok, I'll try that later.

> Hmm, but how to fix the problem about truncate? I think truncate must
> not make underlying backing available for read.. Discard operation
> doesn't do it.
> 
> So, actually on PREALLOC_MODE_OFF we must allocated L2 tables and mark
> new clusters as ZERO?

Yes, we need to write zeroes to the new area if the backing file covers
it. We need to do this not only in mirror/commit/bdrv_commit(), but in
fact for all truncate operations: Berto mentioned on IRC yesterday that
you can get into the same situation with 'block_resize' monitor
commands.

So I tried to fix this yesterday, and I thought that I had a fix, when I
noticed that bdrv_co_do_zero_pwritev() takes a 32 bit bytes parameter.
So I'll still need to fix this. Other than that, I suppose the following
fix should work (but is probably a bit too invasive for -rc3).

Kevin

diff --git a/block/io.c b/block/io.c
index f75777f5ea..4118bf0118 100644
--- a/block/io.c
+++ b/block/io.c
@@ -3382,6 +3382,32 @@ int coroutine_fn bdrv_co_truncate(BdrvChild *child, 
int64_t offset, bool exact,
         goto out;
     }

+    /*
+     * If the image has a backing file that is large enough that it would
+     * provide data for the new area, we cannot leave it unallocated because
+     * then the backing file content would become visible. Instead, zero-fill
+     * the area where backing file and new area overlap.
+     */
+    if (new_bytes && bs->backing && prealloc == PREALLOC_MODE_OFF) {
+        int64_t backing_len;
+
+        backing_len = bdrv_getlength(backing_bs(bs));
+        if (backing_len < 0) {
+            ret = backing_len;
+            goto out;
+        }
+
+        if (backing_len > old_size) {
+            /* FIXME bytes parameter is 32 bits */
+            ret = bdrv_co_do_zero_pwritev(child, old_size,
+                                          MIN(new_bytes, backing_len - 
old_size),
+                                          BDRV_REQ_ZERO_WRITE | 
BDRV_REQ_MAY_UNMAP, &req);
+            if (ret < 0) {
+                goto out;
+            }
+        }
+    }
+
     ret = refresh_total_sectors(bs, offset >> BDRV_SECTOR_BITS);
     if (ret < 0) {
         error_setg_errno(errp, -ret, "Could not refresh total sector count");




reply via email to

[Prev in Thread] Current Thread [Next in Thread]