Subject: Re: [Qemu-devel] [PULL v2 6/8] vmdk: Add read-only support for seSparse snapshots
From: Max Reitz
Date: Tue, 2 Jul 2019 21:46:15 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.7.0
On 02.07.19 21:13, Peter Maydell wrote:
> On Mon, 24 Jun 2019 at 15:48, Max Reitz <address@hidden> wrote:
>>
>> From: Sam Eiderman <address@hidden>
>>
>> Until ESXi 6.5 VMware used the vmfsSparse format for snapshots (VMDK3 in
>> QEMU).
>>
>> This format was lacking in the following:
>>
>> * Grain directory (L1) and grain table (L2) entries were 32-bit,
>> allowing access to only 2TB (slightly less) of data.
>> * The grain size (default) was 512 bytes - leading to data
>> fragmentation and many grain tables.
>> * For space reclamation purposes, it was necessary to find all the
>> grains which are not pointed to by any grain table - so a reverse
>> mapping of "offset of grain in vmdk" to "grain table" must be
>> constructed - which takes large amounts of CPU/RAM.
>>
>
> Hi; this change has confused Coverity a bit (CID 1402786):
>
>> @@ -481,7 +529,7 @@ static int vmdk_init_tables(BlockDriverState *bs, VmdkExtent *extent,
>>      int i;
>>
>>      /* read the L1 table */
>> -    l1_size = extent->l1_size * sizeof(uint32_t);
>> +    l1_size = extent->l1_size * extent->entry_size;
>>      extent->l1_table = g_try_malloc(l1_size);
>>      if (l1_size && extent->l1_table == NULL) {
>>          return -ENOMEM;
>
>>      }
>>
>>      if (extent->l1_backup_table_offset) {
>> +        assert(!extent->sesparse);
>>          extent->l1_backup_table = g_try_malloc(l1_size);
>>          if (l1_size && extent->l1_backup_table == NULL) {
>>              ret = -ENOMEM;
>
> Here we allocate extent->l1_backup_table via g_try_malloc(),
> and our "ENOMEM" guard checks l1_size. However later on we do:
>
>     for (i = 0; i < extent->l1_size; i++) {
>         le32_to_cpus(&extent->l1_backup_table[i]);
>     }
>
> which is a dereference of l1_backup_table whose guard
> is looking at extent->l1_size. Previously Coverity could
> (probably) see this was OK because we were setting
> l1_size = extent->l1_size * sizeof(uint32_t)
> so l1_size is 0 if and only if extent->l1_size is zero.
> But now there's another possibility because we have changed
> the initialization to
> l1_size = extent->l1_size * extent->entry_size;
> so if extent->entry_size is zero then l1_size could be 0
> (getting us past the ENOMEM check) but extent->l1_size
> non-zero (causing us to try to dereference a NULL l1_backup_table
> pointer).
>
> This can't in fact happen because if extent->l1_size is
> non-zero we'll assert that extent->entry_size is
> either 8 or 4, but the logic has become sufficiently
> convoluted that it's no longer clear to Coverity that this
> is an impossible case. I could mark this as a false positive,
> but maybe there's a way of rephrasing this code that makes
> the logic a little clearer to both humans and robots?
There is also the fact that entry_size is initialized to 4
(sizeof(uint32_t)) and updated to 8 (sizeof(uint64_t)) when the image is
in the seSparse subformat. So entry_size can only ever be 4 or 8.
The only substantial thing I could imagine that might in some way make
it clearer would be to drop entry_size and always decide based on
extent->sesparse (because extent->entry_size == (extent->sesparse ? 8 : 4)).
But to me personally, that would not really be clearer. I think it’s
reasonable to store the size of L1 entries in an extent attribute, and
remember that it’s always 4 or 8.
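Just to illustrate what I mean by that alternative, the L1 conversion
loop would then look something like this (a rough sketch only, not
compiled; the loop shape follows the code you quote in your
supplementary point (1) below):

    /* Rough sketch: drop entry_size and branch on extent->sesparse
     * directly when byteswapping the L1 table in vmdk_init_tables(). */
    for (i = 0; i < extent->l1_size; i++) {
        if (extent->sesparse) {
            /* seSparse: 64-bit L1 entries */
            le64_to_cpus((uint64_t *)extent->l1_table + i);
        } else {
            /* every other subformat: 32-bit L1 entries */
            le32_to_cpus((uint32_t *)extent->l1_table + i);
        }
    }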
Of course, there is also the less substantial thing, which is adding a
comment “Size of L1 entries, can only be sizeof(uint32_t) or
sizeof(uint64_t)” above the entry_size definition.
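Something like this, I mean (a sketch; the field type is what I
remember from the patch, so take it with a grain of salt):

    /* In the VmdkExtent definition: */

    /* Size of one L1 (grain directory) entry;
     * can only be sizeof(uint32_t) or sizeof(uint64_t). */
    int entry_size;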
I think Coverity is actually confused by aliasing. The report says this:
> 1. Condition l1_size, taking false branch.
So l1_size must be 0, which means either extent->l1_size or
extent->entry_size must be 0.
It then says:
> 3. Condition i < extent->l1_size, taking true branch.
So extent->l1_size must be greater than 0. (To be exact, Coverity
prints this condition twice, so extent->l1_size must be 2.)
This is followed by:
> 4. Condition extent->entry_size == 8UL /* sizeof (uint64_t) */, taking true
> branch.
Er, so, extent->entry_size is 8?
That actually makes l1_size 16. So somewhere between conditions 1 and
3/4, *extent must have changed its contents.
So it looks to me like Coverity just thinks that *extent may be used
concurrently. Short of adding a “restrict”, I don’t know what to do
about this other than to close the report as a false positive.
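(For completeness, the “restrict” idea would just be a qualifier on the
extent parameter, roughly as below; the trailing parameters are my
assumption rather than copied from the patch, and I doubt it is worth
doing:)

    /* Hypothetical only: promise the compiler (and, with luck, Coverity)
     * that nothing else aliases *extent while the function runs. */
    static int vmdk_init_tables(BlockDriverState *bs,
                                VmdkExtent *restrict extent,
                                Error **errp);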
> Supplementary:
>
> (1) this code:
>     for (i = 0; i < extent->l1_size; i++) {
>         if (extent->entry_size == sizeof(uint64_t)) {
>             le64_to_cpus((uint64_t *)extent->l1_table + i);
>         } else {
>             assert(extent->entry_size == sizeof(uint32_t));
>             le32_to_cpus((uint32_t *)extent->l1_table + i);
>         }
>     }
> only asserts that extent->entry_size is sane if the l1_size
> is non-zero, and it does it inside the loop so we assert
> once for each entry. Hiding the assert like this might be
> part of what's confusing Coverity.
>
> (2) we allocate l1_backup_table with a size of l1_size,
> which will differ depending on extent->entry_size; but then
> we do the endianness conversion on it using
>     for (i = 0; i < extent->l1_size; i++) {
>         le32_to_cpus(&extent->l1_backup_table[i]);
>     }
> which always assumes 32-bit entries. Is that correct?
Yes, because 64-bit entries only occur for the seSparse format. At
least our current (RO) implementation does not use a backup table when
accessing seSparse images. (Hence the assertion against that at the
beginning of the enclosing if block.)
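To spell it out with comments (sketch only; reading the backup table
from the file and the error path are elided):

    if (extent->l1_backup_table_offset) {
        /* seSparse extents never have a backup L1 table, so from here
         * on the entries are known to be 32 bits wide. */
        assert(!extent->sesparse);
        extent->l1_backup_table = g_try_malloc(l1_size);
        if (l1_size && extent->l1_backup_table == NULL) {
            ret = -ENOMEM;
            /* error path elided */
        }

        /* ... reading the backup table from the file elided ... */

        for (i = 0; i < extent->l1_size; i++) {
            /* A 32-bit byteswap is correct here: !sesparse implies
             * extent->entry_size == sizeof(uint32_t). */
            le32_to_cpus(&extent->l1_backup_table[i]);
        }
    }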
Max