qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] block-backend: protect setting block root to NULL with block


From: Fiona Ebner
Subject: Re: [PATCH] block-backend: protect setting block root to NULL with block graph write lock
Date: Fri, 10 Jan 2025 17:37:33 +0100
User-agent: Mozilla Thunderbird

Am 09.01.25 um 11:47 schrieb Kevin Wolf:
> Am 08.01.2025 um 13:46 hat Fiona Ebner geschrieben:
>> Setting blk->root is a graph change operation and thus needs to be
>> protected by the block graph write lock in blk_remove_bs(). The
>> assignment to blk->root in blk_insert_bs() is already protected by
>> the block graph write lock.
> 
> Hm, if that's the case, then we should also enforce this in the
> declaration of BlockBackend:
> 
>     BdrvChild * GRAPH_RDLOCK_PTR root;
> 
> However, this results in more compiler failures that we need to fix. You
> caught the only remaining writer, but the lock is only fully effective
> if all readers take it, too.

I started giving this a try, but quickly ran into some issues/questions:

1. For global state code, is it preferred to use
GRAPH_RDLOCK_GUARD_MAINLOOP() to cover the whole function or better to
use bdrv_graph_rd(un)lock_main_loop() to keep the locked section as
small as necessary? I feel like the former can be more readable, e.g. in
blk_insert_bs(), blk_new_open(), where blk->root is used in conditionals.

2. In particular, protecting blk->root means that blk_bs() needs to have
the read lock. In fact, blk_bs() is reading blk->root twice in a row, so
it seems like it could suffer from a potential NULL pointer dereference
(or I guess after compiler optimization a potential use-after-free)?

Since blk_bs() is IO_CODE() and not a coroutine, I tried to mark it
GRAPH_RDLOCK and move on to the callers.

However, one caller is blk_nb_sectors() which itself is called by
blk_get_geometry(). Both of these are manually-written coroutine wrappers:

> commit 81f730d4d0e8af9c0211c3fedf406df0046341a9
> Author: Paolo Bonzini <pbonzini@redhat.com>
> Date:   Fri Apr 7 17:33:03 2023 +0200
> 
>     block, block-backend: write some hot coroutine wrappers by hand
>     
>     The introduction of the graph lock is causing blk_get_geometry, a hot 
> function
>     used in the I/O path, to create a coroutine.  However, the only part that 
> really
>     needs to run in coroutine context is the call to 
> bdrv_co_refresh_total_sectors,
>     which in turn only happens in the rare case of host CD-ROM devices.
>     
>     So, write by hand the three wrappers on the path from blk_co_get_geometry 
> to
>     bdrv_co_refresh_total_sectors, so that the coroutine wrapper is only 
> created
>     if bdrv_nb_sectors actually calls bdrv_refresh_total_sectors.

Both the blk_bs() and blk_nb_sectors() functions are IO_CODE(), but not
coroutines, and callers of blk_get_geometry are already in the device
code. I'm not sure how to proceed here, happy to hear suggestions :)

---snip---

>> diff --git a/block/block-backend.c b/block/block-backend.c
>> index c93a7525ad..9678615318 100644
>> --- a/block/block-backend.c
>> +++ b/block/block-backend.c
>> @@ -887,9 +887,9 @@ void blk_remove_bs(BlockBackend *blk)
>>       */
>>      blk_drain(blk);
>>      root = blk->root;
>> -    blk->root = NULL;
>>  
>>      bdrv_graph_wrlock();
>> +    blk->root = NULL;
>>      bdrv_root_unref_child(root);
>>      bdrv_graph_wrunlock();
>>  }
> 
> I think the 'root = blk->root' needs to be inside the locked section,
> too. Otherwise blk->root could change during bdrv_graph_wrlock() (which
> has a nested event loop) and root would be stale. I assume clang would
> complain about this with the added GRAPH_RDLOCK_PTR.

Oh I see, good catch!

Best Regards,
Fiona




reply via email to

[Prev in Thread] Current Thread [Next in Thread]