qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates


From: Denis V. Lunev
Subject: Re: [PATCH v2 03/21] preallocate: Don't poll during permission updates
Date: Mon, 9 Oct 2023 11:28:13 +0200
User-agent: Mozilla Thunderbird

On 10/6/23 20:10, Vladimir Sementsov-Ogievskiy wrote:
On 06.10.23 11:56, Kevin Wolf wrote:
Am 05.10.2023 um 21:55 hat Vladimir Sementsov-Ogievskiy geschrieben:
On 11.09.23 12:46, Kevin Wolf wrote:
When the permission related BlockDriver callbacks are called, we are in the middle of an operation traversing the block graph. Polling in such a
place is a very bad idea because the graph could change in unexpected
ways. In the future, callers will also hold the graph lock, which is
likely to turn polling into a deadlock.

So we need to get rid of calls to functions like bdrv_getlength() or
bdrv_truncate() there as these functions poll internally. They are
currently used so that when no parent has write/resize permissions on
the image any more, the preallocate filter drops the extra preallocated
area in the image file and gives up write/resize permissions itself.

In order to achieve this without polling in .bdrv_check_perm, don't
immediately truncate the image, but only schedule a BH to do so. The
filter keeps the write/resize permissions a bit longer now until the BH
has executed.

There is one case in which delaying doesn't work: Reopening the image
read-only. In this case, bs->file will likely be reopened read-only,
too, so keeping write permissions a bit longer on it doesn't work. But
we can already cover this case in preallocate_reopen_prepare() and not
rely on the permission updates for it.

Hmm, now I found one more "future" case.

I now try to rebase my "[PATCH v7 0/7] blockdev-replace"
https://patchew.org/QEMU/20230421114102.884457-1-vsementsov@yandex-team.ru/

And it breaks after this commit.

By accident, blockdev-replace series uses exactly "preallocate" filter
to test insertion/removing of filters. And removing is broken now.

Removing is done as follows:

1. We have filter inserted: disk0 -file-> filter -file-> file0

2. blockdev-replace, replaces file child of disk0, so we should get the picture*: disk0 -file-> file0 <-file- filter

3. blockdev-del filter


But step [2] fails, as now preallocate filter doesn't drop permissions
during the operation (postponing this for a while) and the picture* is
impossible. Permission check fails.

Hmmm... Any idea how blockdev-replace and preallocate filter should
work :) ? Maybe, doing truncation in .drain_begin() will help? Will
try

Hm... What preallocate tries to do is really tricky...

Of course, the error is correct, this is an invalid configuration if
preallocate can still resize the image. So it would have to truncate the
file earlier, but the first time that preallocate knows of the change is
already too late to run requests.

Truncating on drain_begin feels more like a hack, but as long as it does
the job... Of course, this will have the preallocation truncated away on
events that have nothing to do with removing the filter. It's not
necessarily a disaster because preallocation is only an optimisation,
but it doesn't feel great.

Hmm, yes, that's not good.


Maybe let's take a step back: Which scenario is the preallocate driver
meant for and why do we even need to truncate the image file after
removing the filter? I suppose the filter doesn't make sense with raw
images because these are fixed size anyway, and pretty much any other
image format should be able to tolerate a permanently rounded up file
size. As long as you don't write to the preallocated area, it shouldn't
take space either on any sane filesystem.

Hmm, actually both VHD and VMDK can have footers, better avoid it with
those... But if truncating the image file on close is critical, what do
you do on crashes? Maybe preallocate should just not be considered
compatible with these formats?


Originally preallocate filter was made to be used with qcow2, on some proprietary storage, where:

1. Allocating of big chunk works a lot fater than allocating several smaller chunks 2. Holes are not free and/or file length is not free, so we really want to truncate the file back on close

Den, correct me if I'm wrong.

1. Absolutely correct. This is true when the file attributes
    are stored in a centralized place aka metadata storage
    and requests to it does not scale well.

2. This is at my opinion has different meaning. We have
    tried to make local storage behavior and distributed
    storage behavior to be the same when VM is off, i.e.
    the file should be in the same state (no free blocks
    at the end of the file).


Good thing is that in this scenario we don't need to remove the filter in runtime, so there is no problem.

Yes, this filter is not dynamic in that respect. It is either
here or not here.



Now I think that the generic solution is just add a new handler .bdrv_pre_replace, so blockdev-replace may work as follows:

drain_begin

call .bdrv_pre_replace for all affected nodes

do the replace

drain_end

And prellocate filter would do truncation in this .bdrv_pre_replace handler and set some flag, that we have nothing to trunctate (the flag is automatically cleared on .drained_end handler). Then during permission update if we see that nothing-to-truncate flag, we can drop permissions immediately.

But this difficulty may be postponed, and I can just document that preallocate filter doesn't support removing in runtime (and modify the test to use another filter, or just not to remove preallocate filter).


Den



reply via email to

[Prev in Thread] Current Thread [Next in Thread]