qemu-block
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC] block-backend: prevent dangling BDS pointer in blk_drain()


From: Stefan Hajnoczi
Subject: Re: [RFC] block-backend: prevent dangling BDS pointer in blk_drain()
Date: Thu, 9 Dec 2021 16:32:54 +0000

On Thu, Dec 09, 2021 at 04:45:13PM +0100, Hanna Reitz wrote:
> On 09.12.21 15:23, Stefan Hajnoczi wrote:
> > The BlockBackend root child can change during bdrv_drained_begin() when
> > aio_poll() is invoked. In fact the BlockDriverState can reach refcnt 0
> > and blk_drain() is left with a dangling BDS pointer.
> > 
> > One example is scsi_device_purge_requests(), which calls blk_drain() to
> > wait for in-flight requests to cancel. If the backup blockjob is active,
> > then the BlockBackend root child is a temporary filter BDS owned by the
> > blockjob. The blockjob can complete during bdrv_drained_begin() and the
> > last reference to the BDS is released when the temporary filter node is
> > removed. This results in a use-after-free when blk_drain() calls
> > bdrv_drained_end(bs) on the dangling pointer.
> > 
> > The general problem is that a function and its callers must not assume
> > that bs is still valid across aio_poll(). Explicitly hold a reference to
> > bs in blk_drain() to avoid the dangling pointer.
> > 
> > Signed-off-by: Stefan Hajnoczi <stefanha@redhat.com>
> > ---
> > I found that BDS nodes are sometimes deleted with bs->quiesce_counter >
> > 0 (at least when running "make check"), so it is currently not possible
> > to put the bdrv_ref/unref() calls in bdrv_do_drained_begin() and
> > bdrv_do_drained_end() because they will be unbalanced. That would have
> > been a more general solution than only fixing blk_drain().
> 
> Deleting nodes that have a `quiesce_counter > 0` doesn’t seem wrong to me –
> deleting only depends on strong references, and so I’d expect that anything
> that increases the quiesce_counter also has a strong reference to the node
> if the former wants the latter to stay around.
> 
> I suppose we could make it so that both the quiesce_counter and the refcnt
> need to be 0 before a BDS is deleted (and then deletion can happen both from
> bdrv_unref() and drained_end), but I don’t know whether that’s really
> necessary.  I’d rather leave it to the caller to ensure they keep a strong
> reference throughout the drain.
> 
> The question is, how often do we have a situation like this, where we take a
> weak reference for draining, because we assume there’s a strong reference
> backing us up (namely the one through blk->root), but that strong reference
> then can go away due to draining...
> 
> > Any suggestions for a better fix?
> 
> The fix makes sense to me.

Okay. My concern was that this is a whole class of bugs and my patch
only fixes blk_drain(). I have audited the code some more in the
meantime.

bdrv_insert_node() may be unsafe in the case where bs is a temporary
filter node that is unref'd during bdrv_drained_begin():

  BlockDriverState *bdrv_insert_node(BlockDriverState *bs, QDict *options,
                                     int flags, Error **errp)
  {
      ERRP_GUARD();
      int ret;
      BlockDriverState *new_node_bs = NULL;
      const char *drvname, *node_name;
      BlockDriver *drv;
  
      drvname = qdict_get_try_str(options, "driver");
      if (!drvname) {
          error_setg(errp, "driver is not specified");
          goto fail;
      }
  
      drv = bdrv_find_format(drvname);
      if (!drv) {
          error_setg(errp, "Unknown driver: '%s'", drvname);
          goto fail;
      }
  
      node_name = qdict_get_try_str(options, "node-name");
  
      new_node_bs = bdrv_new_open_driver_opts(drv, node_name, options, flags,
                                              errp);
      options = NULL; /* bdrv_new_open_driver() eats options */
      if (!new_node_bs) {
          error_prepend(errp, "Could not create node: ");
          goto fail;
      }
  
      bdrv_drained_begin(bs);
      ^^^^^^^^^^^^^^^^^^^^^^^ <--- bs can be dangling pointer
      ret = bdrv_replace_node(bs, new_node_bs, errp);
      bdrv_drained_end(bs);

The fix isn't as simple as blk_drain() because we don't want to insert
the new node before the now-deleted node. I think the correct way to
insert a node is against BdrvChild, not BlockDriverState. That way we
can be sure the new node will be inserted into a graph that is reachable
via BdrvChild (e.g. BlockBackend) instead of a detached BDS.

bdrv_set_aio_context_ignore() and blk_io_limits_disable() need to ref bs
like blk_drain() in this patch.

There are some other bdrv_drained_begin() calls that I'm assuming are
safe because they are during creation/deletion so I think we have strong
references there or nothing else knows about our BDS yet.

Do you agree with extending this patch series to cover the functions I
mentioned above?

> One alternative that comes to my mind is to instead re-fetch `bs =
> blk_bs(blk);` after the AIO_WAIT_WHILE() loop.  But that might be wrong,
> because if the node attached to the BB changed (i.e. isn’t `bs`, and isn’t
> `NULL`), then we’d end the drain on the wrong node.

Yes.

Stefan

Attachment: signature.asc
Description: PGP signature


reply via email to

[Prev in Thread] Current Thread [Next in Thread]