|
From: | Daniel Wielandt |
Subject: | Re: [PATCH] block-backend: protect setting block root to NULL with block graph write lock |
Date: | Thu, 9 Jan 2025 04:52:19 -0600 |
Suggestions hold pattern.. programmers obviously understand control logic. Systems built with operational flaws. Will freeze up when u start repairing it while it's running.
Am 08.01.2025 um 13:46 hat Fiona Ebner geschrieben:
> Setting blk->root is a graph change operation and thus needs to be
> protected by the block graph write lock in blk_remove_bs(). The
> assignment to blk->root in blk_insert_bs() is already protected by
> the block graph write lock.
Hm, if that's the case, then we should also enforce this in the
declaration of BlockBackend:
BdrvChild * GRAPH_RDLOCK_PTR root;
However, this results in more compiler failures that we need to fix. You
caught the only remaining writer, but the lock is only fully effective
if all readers take it, too.
> In particular, the graph read lock in blk_co_do_flush() could
> previously not ensure that blk_bs(blk) would always return the same
> value during the locked section, which could lead to a segfault [0] in
> combination with migration [1].
>
> From the user-provided backtraces in the forum thread [1], it seems
> like blk_co_do_flush() managed to get past the
> blk_co_is_available(blk) check, meaning that blk_bs(blk) returned a
> non-NULL value during the check, but then, when calling
> bdrv_co_flush(), blk_bs(blk) returned NULL.
>
> [0]:
>
> > 0 bdrv_primary_child (bs=bs@entry=0x0) at ../block.c:8287
> > 1 bdrv_co_flush (bs=0x0) at ../block/io.c:2948
> > 2 bdrv_co_flush_entry (opaque=0x7a610affae90) at block/block-gen.c:901
>
> [1]: https://forum.proxmox.com/threads/158072
>
> Cc: qemu-stable@nongnu.org
> Signed-off-by: Fiona Ebner <f.ebner@proxmox.com>
> ---
> block/block-backend.c | 2 +-
> 1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/block/block-backend.c b/block/block-backend.c
> index c93a7525ad..9678615318 100644
> --- a/block/block-backend.c
> +++ b/block/block-backend.c
> @@ -887,9 +887,9 @@ void blk_remove_bs(BlockBackend *blk)
> */
> blk_drain(blk);
> root = blk->root;
> - blk->root = NULL;
>
> bdrv_graph_wrlock();
> + blk->root = NULL;
> bdrv_root_unref_child(root);
> bdrv_graph_wrunlock();
> }
I think the 'root = blk->root' needs to be inside the locked section,
too. Otherwise blk->root could change during bdrv_graph_wrlock() (which
has a nested event loop) and root would be stale. I assume clang would
complain about this with the added GRAPH_RDLOCK_PTR.
Kevin
[Prev in Thread] | Current Thread | [Next in Thread] |