qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-devel] [PATCH v3 1/9] block: add .bdrv_need_rw_file_child_duri


From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-devel] [PATCH v3 1/9] block: add .bdrv_need_rw_file_child_during_reopen_rw handler
Date: Fri, 2 Aug 2019 16:41:50 +0000

02.08.2019 19:23, Vladimir Sementsov-Ogievskiy wrote:
> 02.08.2019 18:42, Kevin Wolf wrote:
>> Am 31.07.2019 um 14:09 hat Max Reitz geschrieben:
>>> On 25.07.19 11:18, Vladimir Sementsov-Ogievskiy wrote:
>>>> On reopen to rw parent may need rw access to child in .prepare, for
>>>> example qcow2 needs to write IN_USE flags into stored bitmaps
>>>> (currently it is done in a hacky way after commit and don't work).
>>>> So, let's introduce such logic.
>>>>
>>>> The drawback is that in worst case bdrv_reopen_set_read_only may finish
>>>> with error and in some intermediate state: some nodes reopened RW and
>>>> some are not. But this is a way to fix bug around reopening qcow2
>>>> bitmaps in the following commits.
>>>
>>> This commit message doesn’t really explain what this patch does.
>>>
>>>> Signed-off-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>> ---
>>>>   include/block/block_int.h |   2 +
>>>>   block.c                   | 144 ++++++++++++++++++++++++++++++++++----
>>>>   2 files changed, 133 insertions(+), 13 deletions(-)
>>>>
>>>> diff --git a/include/block/block_int.h b/include/block/block_int.h
>>>> index 3aa1e832a8..7bd6fd68dd 100644
>>>> --- a/include/block/block_int.h
>>>> +++ b/include/block/block_int.h
>>>> @@ -531,6 +531,8 @@ struct BlockDriver {
>>>>                                uint64_t parent_perm, uint64_t 
>>>> parent_shared,
>>>>                                uint64_t *nperm, uint64_t *nshared);
>>>> +     bool (*bdrv_need_rw_file_child_during_reopen_rw)(BlockDriverState 
>>>> *bs);
>>>> +
>>>>       /**
>>>>        * Bitmaps should be marked as 'IN_USE' in the image on reopening 
>>>> image
>>>>        * as rw. This handler should realize it. It also should unset 
>>>> readonly
>>>> diff --git a/block.c b/block.c
>>>> index cbd8da5f3b..3c8e1c59b4 100644
>>>> --- a/block.c
>>>> +++ b/block.c
>>>> @@ -1715,10 +1715,12 @@ static void 
>>>> bdrv_get_cumulative_perm(BlockDriverState *bs, uint64_t *perm,
>>>>                                        uint64_t *shared_perm);
>>>>   typedef struct BlockReopenQueueEntry {
>>>> -     bool prepared;
>>>> -     bool perms_checked;
>>>> -     BDRVReopenState state;
>>>> -     QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
>>>> +    bool reopened_file_child_rw;
>>>> +    bool changed_file_child_perm_rw;
>>>> +    bool prepared;
>>>> +    bool perms_checked;
>>>> +    BDRVReopenState state;
>>>> +    QSIMPLEQ_ENTRY(BlockReopenQueueEntry) entry;
>>>>   } BlockReopenQueueEntry;
>>>>   /*
>>>> @@ -3421,6 +3423,105 @@ BlockReopenQueue 
>>>> *bdrv_reopen_queue(BlockReopenQueue *bs_queue,
>>>>                                      keep_old_opts);
>>>>   }
>>>> +static int bdrv_reopen_set_read_only_drained(BlockDriverState *bs,
>>>> +                                             bool read_only,
>>>> +                                             Error **errp)
>>>> +{
>>>> +    BlockReopenQueue *queue;
>>>> +    QDict *opts = qdict_new();
>>>> +
>>>> +    qdict_put_bool(opts, BDRV_OPT_READ_ONLY, read_only);
>>>> +
>>>> +    queue = bdrv_reopen_queue(NULL, bs, opts, true);
>>>> +
>>>> +    return bdrv_reopen_multiple(queue, errp);
>>>> +}
>>>> +
>>>> +/*
>>>> + * handle_recursive_reopens
>>>> + *
>>>> + * On fail it needs rollback_recursive_reopens to be called.
>>>
>>> It would be nice if this description actually said anything about what
>>> the function is supposed to do.
>>>
>>>> + */
>>>> +static int handle_recursive_reopens(BlockReopenQueueEntry *current,
>>>> +                                    Error **errp)
>>>> +{
>>>> +    int ret;
>>>> +    BlockDriverState *bs = current->state.bs;
>>>> +
>>>> +    /*
>>>> +     * We use the fact that in reopen-queue children are always following
>>>> +     * parents.
>>>> +     * TODO: Switch BlockReopenQueue to be QTAILQ and use
>>>> +     *       QTAILQ_FOREACH_REVERSE.
>>>
>>> Why don’t you do that first?  It would make the code more obvious at
>>> least to me.
>>>
>>>> +     */
>>>> +    if (QSIMPLEQ_NEXT(current, entry)) {
>>>> +        ret = handle_recursive_reopens(QSIMPLEQ_NEXT(current, entry), 
>>>> errp);
>>>> +        if (ret < 0) {
>>>> +            return ret;
>>>> +        }
>>>> +    }
>>>> +
>>>> +    if ((current->state.flags & BDRV_O_RDWR) && bs->file && bs->drv &&
>>>> +        bs->drv->bdrv_need_rw_file_child_during_reopen_rw &&
>>>> +        bs->drv->bdrv_need_rw_file_child_during_reopen_rw(bs))
>>>> +    {
>>>> +        if (!bdrv_is_writable(bs->file->bs)) {
>>>> +            ret = bdrv_reopen_set_read_only_drained(bs->file->bs, false, 
>>>> errp);
>>>
>>> Hm.  Sorry, I find this all a bit hard to understand.  (No comments and
>>> all.)
>>>
>>> I understand that this is for an RO -> RW transition?  Everything is
>>> still RO, but the parent will need an RW child before it transitions to
>>> RW itself.
>>>
>>>
>>> I’m going to be honest up front, I don’t like this very much.  But I
>>> think it may be a reasonable solution for now.
>>>
>>> As I remember, the problem was that when reopening a qcow2 node from RO
>>> to RW, we need to write something in .prepare() (because it can fail),
>>> but naturally no .prepare() is called after any .commit(), so no matter
>>> the order of nodes in the ReopenQueue, the child node will never be RW
>>> by this point.
>>>
>>> Hm.  To me that mostly means that making the whole reopen process a
>>> transaction was just a dream that turns out not to work.
>>
>> This patch already looks somewhat complicated to me, and what you're
>> proposing below sounds another order of magnitude more complex.
> 
> Agree :) However, at this point I've almost implemented it (it's not a reason
> to chose more complex way, of course).
> 
>>
>> But I think the important point is your last sentence above. Once we
>> acknowledge that we can't possibly maintain full transaction semantics,
>> I'll ask this naive question: What prevents us from just keeping the
>> additional write in .commit?
> 
> Hmm, it's what we've started with. The only thing to do is to reverse order
> of commits, to commit file child before parent (and this way it works in
> Virtuozzo now).
> And I proposed it long ago:
> https://lists.gnu.org/archive/html/qemu-devel/2018-06/msg04528.html
> 
> 
>>
>> It is expected to work in the common case, so we're only talking about
>> the behaviour in error cases anyway. If something fails and we can't do
>> things in a transactionable way, we need to decide what the surprise
>> result will look like, because we can neither guarantee a rollback nor
>> successful completion.
>>
>> If the write fails unexpectedly, and we end up with a qcow2 image that
>> is opened r/w, but has read-only bitmaps - wouldn't that be a reasonable
>> result? It seems much easier to explain than some dependency subchain
>> already being committed and the rest rolled back.
> 
> And this is how it works now (except it doesn't work because of commit order).
> In our long ago conversation (link above) you pointed that the problem here
> is that we don't return an error from actually failed commit and it is
> ignored..
> 
>>
>> So, I guess my question is, what is the specifc scenario you're trying
>> to fix with this series (why isn't the final patch a test case that
>> would answer this question?), and are we sure that the cure isn't worse
>> than the disease?
>>
> 
> Test appears at 03 and tests what already works, and lacks test-cases which
> are broken, and they are added in 08 when all is fixed.
> 
> And here are two things to fix:
> First is that we lose bitmaps on reopening to RO and it is described and
> fixed in 06.
> Second is that we cant set IN_USE flag when reopening to RW and it is fixed
> finally in 08.
> 
> 
> =====
> 
> So, if we decide to keep things simple, what to do? Just reorder commits to
> satisfy dependencies, if any?
> 
> Should we add return code to commit, which should always be 0 except very rare
> case?
> 
> 


And another idea is to postpone IN_USE setting until we really need to modify 
the
bitmap.. Not sure that it is simpler.

-- 
Best regards,
Vladimir

reply via email to

[Prev in Thread] Current Thread [Next in Thread]