[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: aio_wait_bh_oneshot() thread-safety question
From: |
Vladimir Sementsov-Ogievskiy |
Subject: |
Re: aio_wait_bh_oneshot() thread-safety question |
Date: |
Tue, 24 May 2022 16:46:21 +0300 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.8.1 |
On 5/24/22 15:40, Kevin Wolf wrote:
Am 24.05.2022 um 09:08 hat Paolo Bonzini geschrieben:
On 5/23/22 18:04, Vladimir Sementsov-Ogievskiy wrote:
I have a doubt about how aio_wait_bh_oneshot() works. Exactly, I see
that data->done is not accessed atomically, and doesn't have any barrier
protecting it..
Is following possible:
main-loop iothread
|
aio_wait_bh_oneshot() |
aio_bh_schedule_oneshot() |
| handle bh:
| 1. set data->done = true
| 2. call aio_wait_kick(), inserting the
| dummy bh into main context
|
... in AIO_WAIT_WHILE():
handle dummy bh, go to next
iteration, but still read
data->done=false due to some
processor data reordering,
go to next iteration of polling
and hang
Yes, barriers are missing:
https://lore.kernel.org/qemu-devel/You6FburTi7gVyxy@stefanha-x1.localdomain/T/#md97146c6eae1fce2ddd687fdc3f2215eee03f6f4
It seems like the issue was never observed, at least on x86.
Why is the barrier in aio_bh_enqueue() not enough? Is the comment there
wrong?
aio_notify() has another barrier. This is a little bit too late, but if
I misunderstood the aio_bh_enqueue() one, it could explain why it was
never observed.
Kevin
I'd consider two cases:
1. aio_wait_kick() reads num_waiters as 0 and don't schedule any BH into main
ctx.
In this case aio_wait_kick() only do one atomic operation:
qatomic_read(&global_aio_wait.num_waiters), which is not a barrier as I
understand.
So, data->done=true may be reordered with this operation.
main-loop iothread
aio_wait_bh_oneshot() |
aio_bh_schedule_oneshot() |
| atomic read num_waiters = 0 => don't kick
AIO_WAIT_WHILE |
atomic inc num_waiters |
read done = false, go |
into blocking aio_poll() |
| set data->done = true # reordered to the
end
| - but that doesn't help to wake main loop
For this case, iothread just don't call aio_bh_enqueue() and aio_notify(), so
any barriers in them doesn't help
2. aio_wait_kick() reads num_waiters>0 and do schedule BH
In this case it seems you are right: if main-loop dequeued dummy BH, it should be
guaranteed that after handling this BH the main loop will see data->done=true..
That's if the comment is correct, hope it is. At least it corresponds to what I've
read here : https://www.kernel.org/doc/Documentation/atomic_t.txt . How much
generic this information is - I don't know.
In 2.12 there was no enque() deque() functions, but there was smp_wmb() in
aio_bh_schedule_oneshot(), paired with atomic_xchg() in aio_bh_poll(), with
similar comment about implicit barrier.
--
Best regards,
Vladimir