Re: [Qemu-stable] [Qemu-block] [Qemu-devel] [PATCH v0 0/2] Postponed act
From: Denis Plotnikov
Subject: Re: [Qemu-stable] [Qemu-block] [Qemu-devel] [PATCH v0 0/2] Postponed actions
Date: Tue, 14 Aug 2018 10:08:10 +0300
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1
On 13.08.2018 19:30, Kevin Wolf wrote:
Am 13.08.2018 um 10:32 hat Denis Plotnikov geschrieben:
Ping ping!
On 16.07.2018 21:59, John Snow wrote:
On 07/16/2018 11:01 AM, Denis Plotnikov wrote:
Ping!
I never saw a reply to Stefan's question on July 2nd, did you reply
off-list?
--js
Yes, I did. I talked to Stefan why the patch set appeared.
The rest of us still don't know the answer. I had the same question.
Kevin
Yes, that's my fault. I should have posted it earlier.
I reviewed the problem once again and came up with the following
explanation.
Indeed, if the global lock has been taken by the main thread, the vCPU
threads won't be able to execute IDE MMIO.
But if the main thread releases the lock, then nothing prevents the
vCPU threads from executing what they want, e.g. writing to the block device.
This is possible while mirroring is running. Let's take a look
at the following snippet of mirror_run. It is part of the mirroring
completion path.
           bdrv_drained_begin(bs);
           cnt = bdrv_get_dirty_count(s->dirty_bitmap);
    >>>>>> if (cnt > 0 || mirror_flush(s) < 0) {
               bdrv_drained_end(bs);
               continue;
           }
(X) >>>>   assert(QLIST_EMPTY(&bs->tracked_requests));
mirror_flush here can yield the current coroutine, so nothing more in it
is executed until it is resumed.
We could end up in a situation where the main loop has to iterate again to
poll for another timer/bh to process. While iterating, it releases the
global lock. If a vCPU (or any other) thread is waiting for the global
lock, the waiting thread will get the lock and do what it intends.
This is what I can observe:
mirror_flush yields the coroutine, the main thread iterates and blocks on
the lock because a vCPU thread was waiting for it. Now the vCPU thread owns
the lock and the main thread waits for the lock to be released.
The vCPU thread does cmd_write_dma and releases the lock. Then the main
thread gets the lock and continues to run, eventually resuming the
yielded coroutine.
If the vCPU requests aren't completed by that moment, we will assert at
(X). If the vCPU requests are completed, we won't even notice that we had
some writes while in the drained section.
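To make the sequence above easier to follow, here is a minimal single-threaded sketch of it. It is only a toy model under my own assumptions: every toy_* name is hypothetical and is not a QEMU API, and the lock handover to the vCPU thread is reduced to a flag.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model only: none of these names exist in QEMU. */
struct toy_bds {
    bool drained;         /* inside a "drained" section */
    int tracked_requests; /* stands in for bs->tracked_requests */
};

/* A vCPU thread submits a write (like cmd_write_dma). Nothing here looks
 * at the drained flag, which is the problem described above. */
static void toy_vcpu_write(struct toy_bds *bs)
{
    bs->tracked_requests++;
}

/* The request is completed later on. */
static void toy_complete_one(struct toy_bds *bs)
{
    assert(bs->tracked_requests > 0);
    bs->tracked_requests--;
}

/* Stand-in for mirror_flush() yielding: the main loop iterates, releases
 * the global lock, and a waiting vCPU thread gets to run. */
static void toy_flush_yield(struct toy_bds *bs, bool vcpu_waiting,
                            bool completes_in_time)
{
    if (vcpu_waiting) {
        toy_vcpu_write(bs);
        if (completes_in_time) {
            toy_complete_one(bs);
        }
    }
}

/* Returns true if the check at (X) would pass. */
static bool toy_mirror_completion(bool vcpu_waiting, bool completes_in_time)
{
    struct toy_bds bs = { .drained = false, .tracked_requests = 0 };

    bs.drained = true;                     /* bdrv_drained_begin() */
    toy_flush_yield(&bs, vcpu_waiting, completes_in_time);
    return bs.tracked_requests == 0;       /* the condition at (X) */
}
```

In this model toy_mirror_completion(true, false) returns false, which corresponds to hitting the assert at (X), while toy_mirror_completion(true, true) returns true even though a write happened inside the drained section.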
Denis
On 29.06.2018 15:40, Denis Plotnikov wrote:
There are cases when a request to a block driver state shouldn't have
appeared, producing dangerous race conditions.
This misbehaviour usually happens with storage devices emulated
without eventfd for guest-to-host notifications, like IDE.
The issue arises when the context is in the "drained" section
and doesn't expect a request to come, but a request comes from a
device that does not use an iothread and whose context is processed by
the main loop.
The main loop, unlike the iothread event loop, isn't blocked by the
"drained" section.
A request arriving and being processed while in the "drained" section can
break the block driver state consistency.
This behavior can be observed in the following KVM-based case:
1. Set up a VM with an IDE disk.
2. Inside the VM, start a disk write load on the IDE device
   e.g: dd if=<file> of=<file> bs=X count=Y oflag=direct
3. On the host, create a mirroring block job for the IDE device
   e.g: drive_mirror <your_IDE> <your_path>
4. On the host, finish the block job
   e.g: block_job_complete <your_IDE>
Having done the 4th action, you could get an assert:
assert(QLIST_EMPTY(&bs->tracked_requests)) from mirror_run.
On my setup, the assert reproduces about one time in three.
The patch series introduces a mechanism to postpone requests
until the BDS leaves the "drained" section, for devices not using
iothreads.
It also modifies the asynchronous block backend infrastructure to use
that mechanism to fix the assert bug for IDE devices.
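As a rough illustration of the postponing idea only (this is not the code from the patches, and every toy_* name and the fixed-size queue are assumptions made for the sketch), an action submitted while a context is drained can be queued and run once the last drained section ends:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names throughout; not the API added by this series. */
#define TOY_MAX_POSTPONED 16

typedef void (*toy_action_fn)(void *opaque);

struct toy_ctx {
    int drained_count;                  /* nesting depth of drained sections */
    size_t n_postponed;                 /* number of queued actions */
    toy_action_fn fn[TOY_MAX_POSTPONED];
    void *opaque[TOY_MAX_POSTPONED];
};

/* Run the action now if the context is not drained, otherwise queue it.
 * Returns 0 on success, -1 if the toy queue is full. */
static int toy_submit(struct toy_ctx *c, toy_action_fn fn, void *opaque)
{
    if (c->drained_count == 0) {
        fn(opaque);
        return 0;
    }
    if (c->n_postponed == TOY_MAX_POSTPONED) {
        return -1;
    }
    c->fn[c->n_postponed] = fn;
    c->opaque[c->n_postponed] = opaque;
    c->n_postponed++;
    return 0;
}

static void toy_drained_begin(struct toy_ctx *c)
{
    c->drained_count++;
}

/* Leaving the outermost drained section runs the postponed actions
 * in submission order. */
static void toy_drained_end(struct toy_ctx *c)
{
    size_t i, n;

    assert(c->drained_count > 0);
    if (--c->drained_count > 0) {
        return;
    }
    n = c->n_postponed;
    c->n_postponed = 0;
    for (i = 0; i < n; i++) {
        c->fn[i](c->opaque[i]);
    }
}

/* Example action: increments the int that opaque points to. */
static void toy_count(void *opaque)
{
    ++*(int *)opaque;
}
```

With this shape, a toy_submit() issued between toy_drained_begin() and toy_drained_end() does not touch the drained state; it only runs after the section is left, so a check like (X) would not see it as an in-flight request.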
Denis Plotnikov (2):
async: add infrastructure for postponed actions
block: postpone the coroutine executing if the BDS's is drained
block/block-backend.c | 58 ++++++++++++++++++++++++++++++---------
include/block/aio.h | 63 +++++++++++++++++++++++++++++++++++++++++++
util/async.c | 33 +++++++++++++++++++++++
3 files changed, 142 insertions(+), 12 deletions(-)
--
Best,
Denis
--
Best,
Denis