qemu-devel

Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread


From: Hanna Czenczek
Subject: Re: [PULL 11/33] scsi: only access SCSIDevice->requests from one thread
Date: Thu, 25 Jan 2024 10:06:51 +0100
User-agent: Mozilla Thunderbird

On 24.01.24 22:53, Stefan Hajnoczi wrote:
On Wed, Jan 24, 2024 at 01:12:47PM +0100, Hanna Czenczek wrote:
On 23.01.24 18:10, Kevin Wolf wrote:
On 23.01.2024 at 17:40, Hanna Czenczek wrote:
On 21.12.23 22:23, Kevin Wolf wrote:
From: Stefan Hajnoczi <stefanha@redhat.com>

Stop depending on the AioContext lock and instead access
SCSIDevice->requests from only one thread at a time:
- When the VM is running only the BlockBackend's AioContext may access
     the requests list.
- When the VM is stopped only the main loop may access the requests
     list.

These constraints protect the requests list without the need for locking
in the I/O code path.

Note that multiple IOThreads are not supported yet because the code
assumes all SCSIRequests are executed from a single AioContext. Leave
that as future work.

Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
Reviewed-by: Eric Blake <eblake@redhat.com>
Message-ID: <20231204164259.1515217-2-stefanha@redhat.com>
Signed-off-by: Kevin Wolf <kwolf@redhat.com>
---
    include/hw/scsi/scsi.h |   7 +-
    hw/scsi/scsi-bus.c     | 181 ++++++++++++++++++++++++++++-------------
    2 files changed, 131 insertions(+), 57 deletions(-)
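
For orientation, here is a minimal sketch of the dispatch pattern the commit message describes: work on SCSIDevice->requests is deferred to a bottom half running in the BlockBackend's AioContext, so the list is only ever touched from that one thread while the VM is running. This is an illustration of the idea, not the code from the patch; the struct name and field accesses (SCSIDeviceForEachReqAsyncData, s->conf.blk, the requests list's next link) are assumed from context.

    /* Sketch only -- illustrates the single-AioContext access pattern,
     * not the actual hw/scsi/scsi-bus.c code. */
    #include "qemu/osdep.h"
    #include "block/aio.h"
    #include "sysemu/block-backend.h"
    #include "hw/scsi/scsi.h"

    typedef struct {
        SCSIDevice *s;
        void (*fn)(SCSIRequest *req, void *opaque);
        void *fn_opaque;
    } SCSIDeviceForEachReqAsyncData;

    /* Runs in the BlockBackend's AioContext, the only thread that may
     * walk s->requests while the VM is running. */
    static void scsi_device_for_each_req_async_bh(void *opaque)
    {
        SCSIDeviceForEachReqAsyncData *data = opaque;
        SCSIRequest *req, *next_req;

        QTAILQ_FOREACH_SAFE(req, &data->s->requests, next, next_req) {
            data->fn(req, data->fn_opaque);
        }
        g_free(data);
    }

    void scsi_device_for_each_req_async(SCSIDevice *s,
                                        void (*fn)(SCSIRequest *, void *),
                                        void *opaque)
    {
        SCSIDeviceForEachReqAsyncData *data =
            g_new(SCSIDeviceForEachReqAsyncData, 1);

        data->s = s;
        data->fn = fn;
        data->fn_opaque = opaque;

        /* Defer the walk to the AioContext that owns the requests list. */
        aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
                                scsi_device_for_each_req_async_bh, data);
    }
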
My reproducer for https://issues.redhat.com/browse/RHEL-3934 now breaks more
often because of this commit than because of the original bug, i.e. when
repeatedly hot-plugging and unplugging a virtio-scsi and a scsi-hd device,
this tends to happen when unplugging the scsi-hd:

{"execute":"device_del","arguments":{"id":"stg0"}}
{"return": {}}
qemu-system-x86_64: ../block/block-backend.c:2429: blk_get_aio_context:
Assertion `ctx == blk->ctx' failed.
[...]

I don't know anything about the problem either, but since you already
speculated about the cause, let me speculate about the solution:
Can we hold the graph writer lock for the tran_commit() call in
bdrv_try_change_aio_context()? And of course take the reader lock for
blk_get_aio_context(), but that should be completely unproblematic.
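
Concretely, the narrow version of that idea would take the writer lock only around the commit of the transaction that bdrv_try_change_aio_context() builds up, along these lines (purely a sketch, assuming the bdrv_graph_wrlock()/bdrv_graph_wrunlock() helpers; whether this interacts safely with the surrounding drained section is what the rest of the thread discusses):

    /* Sketch: inside bdrv_try_change_aio_context(), once the
     * AioContext-change transaction has been assembled. */
    bdrv_graph_wrlock();
    tran_commit(tran);
    bdrv_graph_wrunlock();
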
I tried this, and it’s not easy taking the lock just for tran_commit(),
because some callers of bdrv_try_change_aio_context() already hold the write
lock (specifically bdrv_attach_child_common(),
bdrv_attach_child_common_abort(), and bdrv_root_unref_child()[1]), and
qmp_x_blockdev_set_iothread() holds the read lock.  Other callers don’t hold
any lock[2].

So I’m not sure whether we should mark all of bdrv_try_change_aio_context()
as GRAPH_WRLOCK and then make all callers take the lock, or really only take
it for tran_commit(), and have callers release the lock around
bdrv_try_change_aio_context(). The former sounds better to naïve me.

(In any case, FWIW, having blk_set_aio_context() take the write lock, and
scsi_device_for_each_req_async_bh() take the read lock[3], does fix the
assertion failure.)
I wonder if a simpler solution is blk_inc_in_flight() in
scsi_device_for_each_req_async() and blk_dec_in_flight() in
scsi_device_for_each_req_async_bh() so that drain
waits for the BH.
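
Relative to the sketch after the commit message above, that would just bracket the bottom half with the BlockBackend's in-flight counter, roughly like this (again only a sketch; blk_inc_in_flight()/blk_dec_in_flight() are the existing helpers in block/block-backend.c, the rest is illustrative):

    /* Runs in the BlockBackend's AioContext, as in the sketch above. */
    static void scsi_device_for_each_req_async_bh(void *opaque)
    {
        SCSIDeviceForEachReqAsyncData *data = opaque;
        SCSIRequest *req, *next_req;

        QTAILQ_FOREACH_SAFE(req, &data->s->requests, next, next_req) {
            data->fn(req, data->fn_opaque);
        }

        /* Drop the in-flight reference taken before scheduling the BH. */
        blk_dec_in_flight(data->s->conf.blk);
        g_free(data);
    }

    void scsi_device_for_each_req_async(SCSIDevice *s,
                                        void (*fn)(SCSIRequest *, void *),
                                        void *opaque)
    {
        SCSIDeviceForEachReqAsyncData *data =
            g_new(SCSIDeviceForEachReqAsyncData, 1);

        data->s = s;
        data->fn = fn;
        data->fn_opaque = opaque;

        /* Make blk_drain() (and thus blk_set_aio_context()) wait for the
         * pending BH instead of racing with it. */
        blk_inc_in_flight(s->conf.blk);
        aio_bh_schedule_oneshot(blk_get_aio_context(s->conf.blk),
                                scsi_device_for_each_req_async_bh, data);
    }
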

There is a drain around the AioContext change, so as long as
scsi_device_for_each_req_async() was called before blk_set_aio_context()
and not _during_ aio_poll(), we would prevent inconsistent BB vs BDS
aio_contexts.

Actually, Kevin has suggested on IRC that we drop the whole drain. :)

Dropping the write lock in or outside of bdrv_try_change_aio_context() for callers that have already taken it seems unsafe, so the only option would be to make the whole function write-lock-able.  The drained section can cause problems with that if it ends up wanting to reorganize the graph, so AFAIU drain should never be done while under a write lock.  This is already a problem now because there are three callers that do call bdrv_try_change_aio_context() while under a write lock, so it seems like we shouldn’t keep the drain as-is.

So, Kevin suggested just dropping that drain, because I/O requests are no longer supposed to care about a BDS’s native AioContext anyway, so it seems like the need for the drain has gone away with multiqueue.  Then we could make the whole function GRAPH_WRLOCK.
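
For illustration, making the whole function a graph-writer-lock function would roughly mean annotating the prototype and moving the locking out to the callers (a sketch only; the annotation placement and helper names follow the usual graph-lock conventions, and as discussed above the drain inside the function would have to go away first):

    /* include/block/block-global-state.h, sketch */
    int GRAPH_WRLOCK bdrv_try_change_aio_context(BlockDriverState *bs,
                                                 AioContext *ctx,
                                                 BdrvChild *ignore_child,
                                                 Error **errp);

    /* A caller that does not already hold the graph lock, sketch. */
    bdrv_graph_wrlock();
    ret = bdrv_try_change_aio_context(bs, new_ctx, NULL, errp);
    bdrv_graph_wrunlock();
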

Hanna



