qemu-stable
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Qemu-stable] [Qemu-block] [Qemu-devel] [PATCH v0 0/2] Postponed act


From: Denis Plotnikov
Subject: Re: [Qemu-stable] [Qemu-block] [Qemu-devel] [PATCH v0 0/2] Postponed actions
Date: Tue, 14 Aug 2018 10:08:10 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1



On 13.08.2018 19:30, Kevin Wolf wrote:
Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
Ping ping!

On 16.07.2018 21:59, John Snow wrote:


On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
Ping!


I never saw a reply to Stefan's question on July 2nd, did you reply
off-list?

--js
Yes, I did. I talked to Stefan why the patch set appeared.

The rest of us still don't know the answer. I had the same question.

Kevin
Yes, that's my fault. I should have post it earlier.

I reviewed the problem once again and come up with the following explanation. Indeed, if the global lock has been taken by the main thread the vCPU threads won't be able to execute mmio ide.
But, if the main thread will release the lock then nothing will prevent
vCPU threads form execution what they want, e.g writing to the block device.

In case of running the mirroring it is possible. Let's take a look
at the following snippet of mirror_run. This is a part the mirroring completion part.

            bdrv_drained_begin(bs);
            cnt = bdrv_get_dirty_count(s->dirty_bitmap);
>>>>>>      if (cnt > 0 || mirror_flush(s) < 0) {
                bdrv_drained_end(bs);
                continue;
            }

(X) >>>>    assert(QLIST_EMPTY(&bs->tracked_requests));

mirror_flush here can yield the current coroutine so nothing more can be executed. We could end up with the situation when the main loop have to revolve to poll for another timer/bh to process. While revolving it releases the global lock. If the global lock is waited for by a vCPU (any other) thread, the waiting thread will get the lock and make what it intends.

This is something that I can observe:

mirror_flush yields coroutine, the main thread revolves and locks because a vCPU was waiting for the lock. Now the vCPU thread owns the lock and the main thread waits for the lock releasing.
The vCPU thread does cmd_write_dma and releases the lock. Then, the main
thread gets the lock and continues to run eventually proceeding with the coroutine yeiled. If the vCPU requests aren't completed by the moment we will assert at (X). If the vCPU requests are completed we won't even notice that we had some writes while in the drained section.

Denis

On 29.06.2018 15:40, Denis Plotnikov wrote:
There are cases when a request to a block driver state shouldn't have
appeared producing dangerous race conditions.
This misbehaviour is usually happens with storage devices emulated
without eventfd for guest to host notifications like IDE.

The issue arises when the context is in the "drained" section
and doesn't expect the request to come, but request comes from the
device not using iothread and which context is processed by the main
loop.

The main loop apart of the iothread event loop isn't blocked by the
"drained" section.
The request coming and processing while in "drained" section can spoil
the
block driver state consistency.

This behavior can be observed in the following KVM-based case:

1. Setup a VM with an IDE disk.
2. Inside a VM start a disk writing load for the IDE device
     e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
3. On the host create a mirroring block job for the IDE device
     e.g: drive_mirror <your_IDE> <your_path>
4. On the host finish the block job
     e.g: block_job_complete <your_IDE>
    Having done the 4th action, you could get an assert:
assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
On my setup, the assert is 1/3 reproducible.

The patch series introduces the mechanism to postpone the requests
until the BDS leaves "drained" section for the devices not using
iothreads.
Also, it modifies the asynchronous block backend infrastructure to use
that mechanism to release the assert bug for IDE devices.

Denis Plotnikov (2):
     async: add infrastructure for postponed actions
     block: postpone the coroutine executing if the BDS's is drained

    block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
    include/block/aio.h   | 63 +++++++++++++++++++++++++++++++++++++++++++
    util/async.c          | 33 +++++++++++++++++++++++
    3 files changed, 142 insertions(+), 12 deletions(-)



--
Best,
Denis

--
Best,
Denis



reply via email to

[Prev in Thread] Current Thread [Next in Thread]