[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] backup bug or question
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: [Qemu-devel] backup bug or question |
Date: |
Sat, 10 Aug 2019 11:17:51 +0000 |
09.08.2019 23:13, John Snow wrote:
>
>
> On 8/9/19 9:18 AM, Vladimir Sementsov-Ogievskiy wrote:
>> Hi!
>>
>> Hmm, hacking around backup I have a question:
>>
>> What prevents guest write request after job_start but before setting
>> write notifier?
>>
>> code path:
>>
>> qmp_drive_backup or transaction with backup
>>
>> job_start
>> aio_co_enter(job_co_entry) /* may only schedule execution, isn't it
>> ? */
>>
>> ....
>>
>> job_co_entry
>> job_pause_point() /* it definitely yields, isn't it bad? */
>> job->driver->run() /* backup_run */
>>
>> ----
>>
>> backup_run()
>> bdrv_add_before_write_notifier()
>>
>> ...
>>
>
> I think you're right... :(
>
>
> We create jobs like this:
>
> job->paused = true;
> job->pause_count = 1;
>
>
> And then job_start does this:
>
> job->co = qemu_coroutine_create(job_co_entry, job);
> job->pause_count--;
> job->busy = true;
> job->paused = false;
>
>
> Which means that job_co_entry is being called before we lift the pause:
>
> assert(job && job->driver && job->driver->run);
> job_pause_point(job);
> job->ret = job->driver->run(job, &job->err);
>
> ...Which means that we are definitely yielding in job_pause_point.
>
> Yeah, that's a race condition waiting to happen.
>
>> And what guarantees we give to the user? Is it guaranteed that write
>> notifier is
>> set when qmp command returns?
>>
>> And I guess, if we start several backups in a transaction it should be
>> guaranteed
>> that the set of backups is consistent and correspond to one point in time...
>>
>
> I would have hoped that maybe the drain_all coupled with the individual
> jobs taking drain_start and drain_end would save us, but I guess we
> simply don't have a guarantee that all backup jobs WILL have installed
> their handler by the time the transaction ends.
>
> Or, if there is that guarantee, I don't know what provides it, so I
> think we shouldn't count on it accidentally working anymore.
>
>
>
> I think we should do two things:
>
> 1. Move the handler installation to creation time.
> 2. Modify backup_before_write_notify to return without invoking
> backup_do_cow if the job isn't started yet.
>
Hmm, I don't see, how it helps.. No-op write-notifier will not save as from
guest write, is it?
--
Best regards,
Vladimir