On Thu, Jul 23, 2020 at 6:12 PM Arik Hadas <ahadas@redhat.com> wrote:
The best place for this question is qemu-discuss, and CC Kevin and Stefan
(author of qemu-img measure).
> @Nir Soffer does the following make any sense to you:
>
> [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img info 73dde1fc-71c1-431a-8762-c2e71ec4cb93
> image: 73dde1fc-71c1-431a-8762-c2e71ec4cb93
> file format: raw
> virtual size: 15 GiB (16106127360 bytes)
> disk size: 8.65 GiB
>
> [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93
> required size: 16108814336
> fully allocated size: 16108814336
This means the file system does not report sparseness info, and without
that information qemu-img has to assume the entire image is data, so
the only safe estimate is the fully allocated size.
I can reproduce this on NFS 3:
$ mount | grep export/2
nfs1:/export/2 on /rhev/data-center/mnt/nfs1:_export_2 type nfs
(rw,relatime,vers=3,rsize=262144,wsize=262144,namlen=255,soft,nolock,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,mountaddr=192.168.122.30,mountvers=3,mountport=20048,mountproto=udp,local_lock=all,addr=192.168.122.30)
$ cd /rhev/data-center/mnt/nfs1:_export_2
$ truncate -s 1g empty.img
$ qemu-img measure -O qcow2 empty.img
required size: 1074135040
fully allocated size: 1074135040
$ qemu-img map --output json empty.img
[{ "start": 0, "length": 1073741824, "depth": 0, "zero": false,
"data": true, "offset": 0}]
If we run qemu-img measure with strace, we can see:
$ strace qemu-img measure -O qcow2 empty.img 2>&1 | grep SEEK_HOLE
lseek(9, 0, SEEK_HOLE) = 1073741824
This means the byte range from 0 to 1073741824 is data.
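To illustrate what the strace output shows, here is a minimal Python
sketch (not qemu's code; the function name data_extents is mine) that
walks a file with lseek() using SEEK_DATA and SEEK_HOLE on Linux. On a
file system that does not report holes, it returns a single extent
covering the whole file, which matches the qemu-img map output above.

```python
import errno
import os


def data_extents(path):
    """Return (start, length) byte ranges the file system reports as data."""
    extents = []
    fd = os.open(path, os.O_RDONLY)
    try:
        size = os.fstat(fd).st_size
        offset = 0
        while offset < size:
            try:
                # Find the next byte that is reported as data.
                start = os.lseek(fd, offset, os.SEEK_DATA)
            except OSError as e:
                if e.errno == errno.ENXIO:
                    break  # only a hole remains until end of file
                raise
            # Find where this data extent ends. Without hole support
            # this is simply the end of the file.
            end = os.lseek(fd, start, os.SEEK_HOLE)
            extents.append((start, end - start))
            offset = end
    finally:
        os.close(fd)
    return extents
```

Running this on empty.img on the NFS 3 mount should report one extent
from 0 to the file size, since lseek(fd, 0, SEEK_HOLE) returned the
file size in the strace output.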
If we do the same on NFS 4.2:
$ mount | grep export/1
nfs1:/export/1 on /rhev/data-center/mnt/nfs1:_export_1 type nfs4
(rw,relatime,vers=4.2,rsize=262144,wsize=262144,namlen=255,soft,nosharecache,proto=tcp,timeo=100,retrans=3,sec=sys,clientaddr=192.168.122.23,local_lock=none,addr=192.168.122.30)
$ cd /rhev/data-center/mnt/nfs1\:_export_1
$ qemu-img measure -O qcow2 empty.img
required size: 393216
fully allocated size: 1074135040
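As a sanity check on these numbers: the difference between the fully
allocated size and the virtual size is the qcow2 metadata overhead, and
for this empty image the required size reported on NFS 4.2 is exactly
that overhead. A small Python check using only values quoted in this
mail (the variable names are mine):

```python
# Values copied from the qemu-img output quoted in this thread.
virtual_size = 1073741824     # truncate -s 1g empty.img
fully_allocated = 1074135040  # qemu-img measure, NFS 3 and NFS 4.2
required_nfs42 = 393216       # qemu-img measure, NFS 4.2

# Metadata overhead of a fully allocated qcow2 image of this size.
overhead = fully_allocated - virtual_size

print(overhead)                    # 393216
print(overhead == required_nfs42)  # True
```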
Unfortunately the oVirt default is not NFS 4.2 yet, and we even warn about
changing the stupid default.
> qemu-img convert -f raw -O qcow2 73dde1fc-71c1-431a-8762-c2e71ec4cb93 /tmp/arik.qcow2
qemu-img convert detects zeros in the input file, so it can cope with
no sparseness info.
This is not free of course; copying this image is much slower when we
have to read the entire image.
It would be great if 'measure' could take zeros into account the way 'convert' does,
even if that means a longer execution time. Otherwise, when we export VMs to OVAs on such file systems, we may end up allocating the full virtual size within the OVA (at least when the base volume is a raw volume).
> [root@lion01 8c98c94d-bc14-4e24-b89c-4d96c820056d]# qemu-img measure -O qcow2 /tmp/arik.qcow2
> required size: 9359720448
> fully allocated size: 16108814336
Now we have a qcow2 image, so we don't depend on the file system capabilities.
This is the advantage of using an advanced file format.
> shouldn't the 'measure' command be a bit smarter than that? :)
I think it cannot be smarter, but maybe qemu folks have a better answer.
To measure, qemu-img needs to know how the data is laid out on disk, to compute
the number of clusters in the qcow2 image. Without help from the
filesystem the only way to do this is to read the entire image.
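To make the cost concrete, here is a rough Python sketch of measuring
by reading: it scans the whole file and counts the clusters that
contain any non-zero byte. This is not how qemu-img is implemented; the
function name is mine, and 64 KiB is the default qcow2 cluster size.

```python
CLUSTER_SIZE = 64 * 1024  # default qcow2 cluster size


def count_nonzero_clusters(path, cluster_size=CLUSTER_SIZE):
    """Count clusters containing non-zero data by reading the whole file."""
    zero = bytes(cluster_size)
    nonzero = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(cluster_size)
            if not chunk:
                break
            # The last chunk may be shorter than a full cluster.
            if chunk != zero[:len(chunk)]:
                nonzero += 1
    return nonzero
```

Multiplying the result by the cluster size gives only the guest data
part of the image; qcow2 metadata comes on top of that. Every byte of
the image has to be read to get this number.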
The solution in oVirt is to allocate the required size (possibly
overallocating) and, after the conversion has finished, reduce the
volume to the required size using:
http://ovirt.github.io/ovirt-engine-sdk/4.4/services.m.html#ovirtsdk4.services.StorageDomainDiskService.reduce
This is much faster than reading the entire image twice.
That's sort of what we started with: creating temporary volumes that were then copied to the OVA.
But this took a long time and consumed space on the storage domains, so at some point we switched to the 'measure' command, thinking it would give us the same result as if it were invoked on the 'collapsed' qcow2 volume...
I guess the apparent size of the 'collapsed' qcow2 volume will be closer to the disk size than to the virtual size. Would it make more sense to allocate the space within the OVA according to the disk size (with some buffer)?
Nir