From: Vladimir Sementsov-Ogievskiy
Subject: Re: [Qemu-stable] [Qemu-block] [Qemu-devel] [PATCH] block/backup: install notifier during creation
Date: Thu, 19 Sep 2019 07:11:51 +0000
18.09.2019 23:31, John Snow wrote:
>
>
> On 9/10/19 9:23 AM, John Snow wrote:
>>
>>
>> On 9/10/19 4:19 AM, Stefan Hajnoczi wrote:
>>> On Wed, Aug 21, 2019 at 04:01:52PM -0400, John Snow wrote:
>>>>
>>>>
>>>> On 8/21/19 10:41 AM, Vladimir Sementsov-Ogievskiy wrote:
>>>>> 09.08.2019 23:13, John Snow wrote:
>>>>>> Backup jobs may yield prior to installing their handler, because of the
>>>>>> job_co_entry shim which guarantees that a job won't begin work until
>>>>>> we are ready to start an entire transaction.
>>>>>>
>>>>>> Unfortunately, this makes proving correctness about transactional
>>>>>> points-in-time for backup hard to reason about. Make it explicitly clear
>>>>>> by moving the handler registration to creation time, and changing the
>>>>>> write notifier to a no-op until the job is started.
>>>>>>
>>>>>> Reported-by: Vladimir Sementsov-Ogievskiy <address@hidden>
>>>>>> Signed-off-by: John Snow <address@hidden>
>>>>>> ---
>>>>>>   block/backup.c     | 32 +++++++++++++++++++++++---------
>>>>>>   include/qemu/job.h |  5 +++++
>>>>>>   job.c              |  2 +-
>>>>>>   3 files changed, 29 insertions(+), 10 deletions(-)
>>>>>>
>>>>>> diff --git a/block/backup.c b/block/backup.c
>>>>>> index 07d751aea4..4df5b95415 100644
>>>>>> --- a/block/backup.c
>>>>>> +++ b/block/backup.c
>>>>>> @@ -344,6 +344,13 @@ static int coroutine_fn backup_before_write_notify(
>>>>>>      assert(QEMU_IS_ALIGNED(req->offset, BDRV_SECTOR_SIZE));
>>>>>>      assert(QEMU_IS_ALIGNED(req->bytes, BDRV_SECTOR_SIZE));
>>>>>>
>>>>>> +    /* The handler is installed at creation time; the actual point-in-time
>>>>>> +     * starts at job_start(). Transactions guarantee those two points are
>>>>>> +     * the same point in time. */
>>>>>> +    if (!job_started(&job->common.job)) {
>>>>>> +        return 0;
>>>>>> +    }
>>>>>
>>>>> Hmm, sorry if it is a stupid question; I'm not good with multiprocessing
>>>>> or with QEMU iothreads...
>>>>>
>>>>> job_started just reads job->co. If bs runs in an iothread, and therefore
>>>>> the write notifier runs in that iothread, while job_start is called from
>>>>> the main thread... is it guaranteed that the write notifier will see the
>>>>> change to job->co early enough not to miss a guest write? Shouldn't
>>>>> job->co be volatile, for example, or something like that?
>>>>>
>>>>> Setting that question aside, the patch looks good to me.
>>>>>
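[To spell out the concern with a rough sketch: with this patch the two sides look
like the snippet below. Whether the plain store and load are guaranteed to be
ordered well enough across threads, or whether something like the atomic_read()/
atomic_set() helpers from qemu/atomic.h or an explicit barrier is needed, is
exactly the question.]

    /* main thread, in job_start() (Job *job) */
    job->co = qemu_coroutine_create(job_co_entry, job);    /* plain store */

    /* iothread, in backup_before_write_notify() (BackupBlockJob *job) */
    if (!job_started(&job->common.job)) {                  /* plain load of job->co */
        /* If the store above is not yet visible here, this guest write
         * slips through without being copied out first. */
        return 0;
    }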
>>>>
>>>> You know, it's a really good question.
>>>> So good, in fact, that I have no idea.
>>>>
>>>> ¯\_(ツ)_/¯
>>>>
>>>> I'm fairly certain that IO will not come in until the .clean phase of a
>>>> qmp_transaction, because bdrv_drained_begin(bs) is called during
>>>> .prepare, and we activate the handler (by starting the job) in .commit.
>>>> We do not end the drained section until .clean.
>>>>
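[For concreteness, the ordering described above, paraphrased per transaction
phase. This is a sketch of the flow, not the literal blockdev.c code; 'state',
'arg' and 'txn' stand in for the per-action transaction state and arguments.]

    /* .prepare: guest I/O to bs is quiesced and the job (and, with this
     * patch, its write notifier) is created */
    bdrv_drained_begin(bs);
    state->job = do_drive_backup(arg, txn, errp);

    /* .commit: the job is started, so the notifier stops being a no-op */
    job_start(&state->job->job);

    /* .clean: only now can guest writes reach bs again */
    bdrv_drained_end(bs);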
>>>> I'm not fully clear on what threading guarantees we have otherwise,
>>>> though; is it possible that "Thread A" would somehow lift the bdrv_drain
>>>> on an IO thread ("Thread B") and, after that, "Thread B" would somehow
>>>> still be able to see an outdated version of job->co that was set by
>>>> "Thread A"?
>>>>
>>>> I doubt it; but I can't prove it.
>>>
>>> In the qmp_backup() case (not qmp_transaction()) there is:
>>>
>>> void qmp_drive_backup(DriveBackup *arg, Error **errp)
>>> {
>>>     BlockJob *job;
>>>
>>>     job = do_drive_backup(arg, NULL, errp);
>>>     if (job) {
>>>         job_start(&job->job);
>>>     }
>>> }
>>>
>>> job_start() is called without any thread synchronization, which is
>>> usually fine because the coroutine doesn't run until job_start() calls
>>> aio_co_enter().
>>>
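[In other words, simplified down to the two lines that matter for this thread;
the real job_start() in job.c does more bookkeeping than this:]

    void job_start(Job *job)
    {
        /* ... */
        job->co = qemu_coroutine_create(job_co_entry, job);
        /* ... */
        aio_co_enter(job->aio_context, job->co);   /* coroutine first runs here */
    }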
>>> Now that the before write notifier has been installed early, there is
>>> indeed a race between job_start() and the write notifier accessing
>>> job->co from an IOThread.
>>>
>>> The before write notifier might see job->co != NULL before job_start()
>>> has finished. This could lead to issues if the write notifier invokes
>>> job_*() APIs and accesses an in-between job state.
>>>
>>
>> I see. I think in this case, as long as it sees job->co != NULL, the
>> notifier is actually safe to run. I agree that this might be confusing
>> to verify and could bite us in the future. The worry we had, though, is
>> more the opposite: will it see NULL for too long? We want to make sure
>> that it registers as true *before the first yield*.
>>
>>> A safer approach is to set a BackupBlockJob variable at the beginning of
>>> backup_run() and check it from the before write notifier.
>>>
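[If I read that suggestion right, it would look roughly like this; the
'run_started' field is a made-up name for illustration:]

    typedef struct BackupBlockJob {
        BlockJob common;
        /* ... existing fields ... */
        bool run_started;        /* hypothetical: set once backup_run() has begun */
    } BackupBlockJob;

    static int coroutine_fn backup_run(Job *job, Error **errp)
    {
        BackupBlockJob *s = container_of(job, BackupBlockJob, common.job);

        s->run_started = true;   /* only reached after job_co_entry's first yield */
        /* ... */
    }

backup_before_write_notify() would then test s->run_started instead of
job_started().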
>>
>> That's too late, for reasons below.
>>
>>> That said, I don't understand the benefit of this patch and IMO it makes
>>> the code harder to understand because now we need to think about the
>>> created but not started state too.
>>>
>>> Stefan
>>>
>>
>> It's always possible I've hyped myself up into believing there's a
>> problem where there isn't one, but the fear is this:
>>
>> The point-in-time of a QMP transaction covers both the job's creation and
>> its start, but when we start the job it will actually yield before we get
>> to backup_run -- and there is no guarantee that the handler gets installed
>> synchronously, so the point-in-time can end before the handler activates.
>>
>
> i.e., the handler might get installed AFTER the critical region of a
> transaction. We could drop initial writes if we were unlucky.
>
> (I think.)
>
>> The yield occurs in job_co_entry as an intentional feature of forcing a
>> yield and pause point at run time -- so it's harder to write a job that
>> accidentally hogs the thread during initialization.
>>
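[From memory, that shim has roughly this shape -- simplified; see job.c for the
exact body:]

    static void coroutine_fn job_co_entry(void *opaque)
    {
        Job *job = opaque;

        /* pause/yield point hit before the driver's .run (backup_run) starts */
        job_pause_point(job);
        job->ret = job->driver->run(job, &job->err);
        /* ... completion handling ... */
    }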
>> This is an attempt to get the handler installed earlier, so that the
>> point-in-time stays synchronized with creation time and we provide a
>> stronger transactional guarantee.
>>
>
> Squeaky wheel gets the grease. Any comment?
>
Hmm, this is all getting difficult. I'd prefer not to worry about it and
instead wait for the backup-top filter to be applied.
--
Best regards,
Vladimir