[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] block-backend: protect setting block root to NULL with block
From: |
Kevin Wolf |
Subject: |
Re: [PATCH] block-backend: protect setting block root to NULL with block graph write lock |
Date: |
Thu, 16 Jan 2025 16:52:38 +0100 |
Am 10.01.2025 um 17:37 hat Fiona Ebner geschrieben:
> Am 09.01.25 um 11:47 schrieb Kevin Wolf:
> > Am 08.01.2025 um 13:46 hat Fiona Ebner geschrieben:
> >> Setting blk->root is a graph change operation and thus needs to be
> >> protected by the block graph write lock in blk_remove_bs(). The
> >> assignment to blk->root in blk_insert_bs() is already protected by
> >> the block graph write lock.
> >
> > Hm, if that's the case, then we should also enforce this in the
> > declaration of BlockBackend:
> >
> > BdrvChild * GRAPH_RDLOCK_PTR root;
> >
> > However, this results in more compiler failures that we need to fix. You
> > caught the only remaining writer, but the lock is only fully effective
> > if all readers take it, too.
>
> I started giving this a try, but quickly ran into some issues/questions:
>
> 1. For global state code, is it preferred to use
> GRAPH_RDLOCK_GUARD_MAINLOOP() to cover the whole function or better to
> use bdrv_graph_rd(un)lock_main_loop() to keep the locked section as
> small as necessary? I feel like the former can be more readable, e.g. in
> blk_insert_bs(), blk_new_open(), where blk->root is used in conditionals.
Yes, I think we generally use GRAPH_RDLOCK_GUARD_MAINLOOP() if there is
no read not to (e.g. if you need to take a writer lock in another part
of the same function). It's essentially only an assertion anyway, so it
doesn't even actually hold a real lock for longer than necessary.
> 2. In particular, protecting blk->root means that blk_bs() needs to have
> the read lock. In fact, blk_bs() is reading blk->root twice in a row, so
> it seems like it could suffer from a potential NULL pointer dereference
> (or I guess after compiler optimization a potential use-after-free)?
>
> Since blk_bs() is IO_CODE() and not a coroutine, I tried to mark it
> GRAPH_RDLOCK and move on to the callers.
It looks like blk_bs() is tricky in general... blk_bs() is used in
iothreads in devices. Not sure how easy it would be to trigger a bug
there, but from the locking alone, it seems entirely possible to have
the device iothread race with the main thread changing the root node.
> However, one caller is blk_nb_sectors() which itself is called by
> blk_get_geometry(). Both of these are manually-written coroutine wrappers:
>
> > commit 81f730d4d0e8af9c0211c3fedf406df0046341a9
> > Author: Paolo Bonzini <pbonzini@redhat.com>
> > Date: Fri Apr 7 17:33:03 2023 +0200
> >
> > block, block-backend: write some hot coroutine wrappers by hand
> >
> > The introduction of the graph lock is causing blk_get_geometry, a hot
> > function
> > used in the I/O path, to create a coroutine. However, the only part
> > that really
> > needs to run in coroutine context is the call to
> > bdrv_co_refresh_total_sectors,
> > which in turn only happens in the rare case of host CD-ROM devices.
> >
> > So, write by hand the three wrappers on the path from
> > blk_co_get_geometry to
> > bdrv_co_refresh_total_sectors, so that the coroutine wrapper is only
> > created
> > if bdrv_nb_sectors actually calls bdrv_refresh_total_sectors.
>
> Both the blk_bs() and blk_nb_sectors() functions are IO_CODE(), but not
> coroutines, and callers of blk_get_geometry are already in the device
> code. I'm not sure how to proceed here, happy to hear suggestions :)
Being in device code is not necessarily a problem, we could be doing
stuff already there if we wanted.
The important question is, what would be the right behaviour? Can any of
the calls run while the writer lock is held?
If not, we can probably assert that and just take a reader lock
(bdrv_graph_co_rdlock() has to run in coroutine context only because it
might have to wait for a writer - if we know that there is none, we
could do it outside of a coroutine).
But if yes, then we either have a real bug that needs to be solved or
the protection is provided by something other than the graph lock. Maybe
the answer is that because of how devices access the field, the graph
lock isn't the right mechanism for protecting BlockBackend.root and we
should rely on draining instead?
We already try to do this:
blk_drain(blk);
root = blk->root;
blk->root = NULL;
It would make more sense for this to be a drained_begin/end section that
quiesces devices while we're changing the root node, otherwise we can
get new requests before blk_drain() returns.
Do you know if the I/O request dereferencing a NULL pointer came from a
device? (The stack trace in your commit message is shortened too much to
tell me a lot, and the link to your forum doesn't allow me to view the
logs without registering an account. Please include more information in
the commit message in v2.)
Kevin
Re: [PATCH] block-backend: protect setting block root to NULL with block graph write lock, Fiona Ebner, 2025/01/10
- Re: [PATCH] block-backend: protect setting block root to NULL with block graph write lock,
Kevin Wolf <=