Subject: Re: [PATCH v3 2/4] block-backend: fix edge case in bdrv_next() where BDS associated to BB changes
From: Fiona Ebner
Date: Tue, 4 Jun 2024 09:58:08 +0200
User-agent: Mozilla Thunderbird
On 03.06.24 18:21, Kevin Wolf wrote:
> On 03.06.2024 16:17, Fiona Ebner wrote:
>> On 26.03.24 13:44, Kevin Wolf wrote:
>>>
>>> The fix for bdrv_flush_all() is probably to make it bdrv_co_flush_all()
>>> with a coroutine wrapper so that the graph lock is held for the whole
>>> function. Then calling bdrv_co_flush() while iterating the list is safe
>>> and doesn't allow concurrent graph modifications.
>>
>> The second issue is that iotest 255 ran into an assertion failure upon QMP 'quit':
>>
>>> ../block/graph-lock.c:113: bdrv_graph_wrlock: Assertion
>>> `!qemu_in_coroutine()' failed.
>>
>> Looking at the backtrace:
>>
>>> #5 0x0000762a90cc3eb2 in __GI___assert_fail
>>> (assertion=0x5afb07991e7d "!qemu_in_coroutine()", file=0x5afb07991e00
>>> "../block/graph-lock.c", line=113, function=0x5afb07991f20
>>> <__PRETTY_FUNCTION__.4> "bdrv_graph_wrlock")
>>> at ./assert/assert.c:101
>>> #6 0x00005afb07585311 in bdrv_graph_wrlock () at ../block/graph-lock.c:113
>>> #7 0x00005afb07573a36 in blk_remove_bs (blk=0x5afb0af99420) at
>>> ../block/block-backend.c:901
>>> #8 0x00005afb075729a7 in blk_delete (blk=0x5afb0af99420) at
>>> ../block/block-backend.c:487
>>> #9 0x00005afb07572d88 in blk_unref (blk=0x5afb0af99420) at
>>> ../block/block-backend.c:547
>>> #10 0x00005afb07572fe8 in bdrv_next (it=0x762a852fef00) at
>>> ../block/block-backend.c:618
>>> #11 0x00005afb0758cd65 in bdrv_co_flush_all () at ../block/io.c:2347
>>> #12 0x00005afb0753ba37 in bdrv_co_flush_all_entry (opaque=0x7ffff12c6050)
>>> at block/block-gen.c:1391
>>> #13 0x00005afb0773bf41 in coroutine_trampoline (i0=168365184, i1=23291)
>>
>> So I guess calling bdrv_next() is not safe from a coroutine, because
>> the function doing the iteration could end up holding the last
>> reference to the BB.
>
> Does your bdrv_co_flush_all() take the graph (reader) lock? If so, this
> is surprising, because while we hold the graph lock, no reference should
> be able to go away - you need the writer lock for that and you won't get
> it as long as bdrv_co_flush_all() locks the graph. So whatever had a
> reference before the bdrv_next() loop must still have it now. Do you
> know where it gets dropped?
>
AFAICT, yes, it does hold the graph reader lock. The generated code is:
> static void coroutine_fn bdrv_co_flush_all_entry(void *opaque)
> {
>     BdrvFlushAll *s = opaque;
>
>     bdrv_graph_co_rdlock();
>     s->ret = bdrv_co_flush_all();
>     bdrv_graph_co_rdunlock();
>     s->poll_state.in_progress = false;
>
>     aio_wait_kick();
> }
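For context, the declarations driving this generated wrapper would look
roughly like the following (a sketch based on the usual co_wrapper
annotations, not the exact hunk from the patch):

/* Sketch: a co_wrapper_mixed_bdrv_rdlock annotation is what makes
 * scripts/block-coroutine-wrapper.py emit the bdrv_co_flush_all_entry()
 * above, including the bdrv_graph_co_rdlock()/rdunlock() pair. */
int co_wrapper_mixed_bdrv_rdlock bdrv_flush_all(void);
int coroutine_fn GRAPH_RDLOCK bdrv_co_flush_all(void);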
Apparently, when the mirror job is aborted/exits, which can happen
during the polling for bdrv_co_flush_all_entry(), a reference can go
away without the write lock being held (at least my breakpoints didn't
trigger):
> #0 blk_unref (blk=0x5cdefe943d20) at ../block/block-backend.c:537
> #1 0x00005cdefb26697e in mirror_exit_common (job=0x5cdefeb53000) at
> ../block/mirror.c:710
> #2 0x00005cdefb263575 in mirror_abort (job=0x5cdefeb53000) at
> ../block/mirror.c:823
> #3 0x00005cdefb2248a6 in job_abort (job=0x5cdefeb53000) at ../job.c:825
> #4 0x00005cdefb2245f2 in job_finalize_single_locked (job=0x5cdefeb53000) at
> ../job.c:855
> #5 0x00005cdefb223852 in job_completed_txn_abort_locked (job=0x5cdefeb53000)
> at ../job.c:958
> #6 0x00005cdefb223714 in job_completed_locked (job=0x5cdefeb53000) at
> ../job.c:1065
> #7 0x00005cdefb224a8b in job_exit (opaque=0x5cdefeb53000) at ../job.c:1088
> #8 0x00005cdefb4134fc in aio_bh_call (bh=0x5cdefe7487c0) at
> ../util/async.c:171
> #9 0x00005cdefb4136ce in aio_bh_poll (ctx=0x5cdefd9cd750) at
> ../util/async.c:218
> #10 0x00005cdefb3efdfd in aio_poll (ctx=0x5cdefd9cd750, blocking=true) at
> ../util/aio-posix.c:722
> #11 0x00005cdefb20435e in bdrv_poll_co (s=0x7ffe491621d8) at
> ../block/block-gen.h:43
> #12 0x00005cdefb206a33 in bdrv_flush_all () at block/block-gen.c:1410
> #13 0x00005cdefae5c8ed in do_vm_stop (state=RUN_STATE_SHUTDOWN,
> send_stop=false)
> at ../system/cpus.c:297
> #14 0x00005cdefae5c850 in vm_shutdown () at ../system/cpus.c:308
> #15 0x00005cdefae6d892 in qemu_cleanup (status=0) at ../system/runstate.c:871
> #16 0x00005cdefb1a7e78 in qemu_default_main () at ../system/main.c:38
> #17 0x00005cdefb1a7eb8 in main (argc=34, argv=0x7ffe491623a8) at
> ../system/main.c:48
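To illustrate, the refcount drop itself is not guarded by the graph
lock. Paraphrased sketch of blk_unref() from block/block-backend.c (not
the verbatim code):

/* Paraphrased sketch of blk_unref(): the refcount decrement happens
 * without taking any graph lock; the writer lock only comes into play
 * once the last reference is gone. */
void blk_unref(BlockBackend *blk)
{
    if (blk) {
        assert(blk->refcnt > 0);
        if (blk->refcnt > 1) {
            blk->refcnt--;
        } else {
            blk->refcnt = 0;
            blk_delete(blk); /* -> blk_remove_bs() -> bdrv_graph_wrlock() */
        }
    }
}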
Looking at the code in mirror_exit_common(), I don't see it acquiring
the write lock before dropping the reference:
> bdrv_graph_rdunlock_main_loop();
>
> /*
>  * Remove target parent that still uses BLK_PERM_WRITE/RESIZE before
>  * inserting target_bs at s->to_replace, where we might not be able to get
>  * these permissions.
>  */
> blk_unref(s->target);
> s->target = NULL;
The write lock is taken in blk_remove_bs() when the refcount drops to 0
and the BB is actually removed:
> bdrv_graph_wrlock();
> bdrv_root_unref_child(root);
> bdrv_graph_wrunlock();
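That also matches the first backtrace: taking the writer lock is only
allowed outside of coroutine context. Paraphrased sketch of the entry
of bdrv_graph_wrlock() in block/graph-lock.c:

/* Paraphrased sketch: this is the assertion at graph-lock.c:113 that
 * fires when blk_remove_bs() is reached from coroutine context via
 * bdrv_co_flush_all() -> bdrv_next() -> blk_unref(). */
void bdrv_graph_wrlock(void)
{
    GLOBAL_STATE_CODE();
    assert(!qemu_in_coroutine());
    /* ... drain and wait for readers before modifying the graph ... */
}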
Best Regards,
Fiona