qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 1/2] memory: Update inline documentation


From: Peter Xu
Subject: Re: [PATCH v7 1/2] memory: Update inline documentation
Date: Fri, 17 Jan 2025 12:46:38 -0500

On Fri, Jan 17, 2025 at 03:24:34PM +0900, Akihiko Odaki wrote:
> On 2025/01/16 23:33, Peter Xu wrote:
> > On Thu, Jan 16, 2025 at 02:37:38PM +0900, Akihiko Odaki wrote:
> > > On 2025/01/16 1:14, Peter Xu wrote:
> > > > On Thu, Jan 16, 2025 at 12:52:56AM +0900, Akihiko Odaki wrote:
> > > > > Functionally, the ordering of container/subregion finalization 
> > > > > matters if
> > > > > some device tries to a container during finalization. In such a case,
> > > >                         |
> > > >                         ^ something is missing here, feel free to 
> > > > complete this.
> > > 
> > > Oops, I meant: functionally, the ordering of container/subregion
> > > finalization matters if some device tries to use a container during
> > > finalization.
> > 
> > This is true, though if we keep the concept of "all the MRs share the same
> > lifecycle of the owner" idea, another fix of such is simply moving the
> > container access before any detachment of MRs.
> > 
> > > 
> > > > 
> > > > > removing subregions from the container at random timing can result in 
> > > > > an
> > > > > unexpected behavior. There is little chance to have such a scenario 
> > > > > but we
> > > > > should stay the safe side if possible.
> > > > 
> > > > It sounds like a future feature, and I'm not sure we'll get there, so I
> > > > don't worry that much.  Keeping refcount core idea simple is still very
> > > > attractive to me.  I still prefer we have complete MR refcounting iff 
> > > > when
> > > > necessary.  It's also possible it'll never happen to QEMU.
> > > > 
> > > 
> > > It's not just about the future but also about compatibility with the 
> > > current
> > > device implementations. I will not be surprised even if the random 
> > > ordering
> > > of subregion finalization breaks one of dozens of devices we already have.
> > > We should pay attention the details as we are touching the core
> > > infrastructure.
> > 
> > Yes, if we can find any such example that we must follow the order of MR
> > destruction, I think that could justify your approach will be required but
> > not optional.  It's just that per my understanding there should be none,
> > and even if there're very few outliers, it can still be trivially fixed as
> > mentioned above.
> 
> It can be fixed but that means we need auditing the code of devices or wait
> until we get a bug report.

We'd better have a solid example.

And for this specific question, IIUC we can have such problem even if
internal-ref start to use MR refcounts.

It's because we have a not very straightforward way of finalize() an
object, which is freeing all properties before its own finalize()..

static void object_finalize(void *data)
{
    ...
    object_property_del_all(obj);
    object_deinit(obj, ti);
    ...
}

I think it used to be the other way round (which will be easier to
understand to me..), but changed after 76a6e1cc7cc.  It could boil down to
two dependencies: (1) children's unparent() callback wanting to have the
parent being present and valid, and (2) parent's finalize() callback
wanting to have all children being present and valid.  I guess we chose (1)
as of now.

So I suppose it means even with your patch, it won't help either as long as
MRs are properties, and they can already all be gone in a device finalize()
even with your new patch.

>From that POV, qdev unrealize() could be a good place for such cleanups
while making sure all properties are present.

> 
> > 
> > My gut feeling is when we need serious MR refcounting (I'd expect due to
> > the current heavy code base it might be easier to start a new project if
> > that's required.. that's why I was thinking maybe it will not happen.. but
> > if it will..), we'll do more than your change, and that also means
> > memory_region_ref() must start to refcount MRs, because a serious MR
> > separate refcounting should mean MR can go on the fly before the owner.
> 
> Actually there is one example: virtio_gpu_virgl_map_resource_blob() in
> hw/display/virtio-gpu-virgl.c creates a MR that can be removed before the
> device. To make this possible, it specifies MRs themselves as their and let
> them refcount without help of the device.

.. I am definitely surprised that we have code that assigns obj->parent to
be itself.

Putting the self parenting aside.. and to the topic: I don't think this is
an example for internal-MR use case?

When the owner is itself, then it's not sharing the owner of the parent MR
(which is b->hostmem in this case, which should probably be owned by
VirtIOGPU object instead).  So IIUC no matter which way we choose on the
current discussion, it won't affect the result.

Not to mention, the MRs are always properly detached always when the
resource is unmapped:

virtio_gpu_virgl_unmap_resource_blob():
        /* memory region owns self res->mr object and frees it by itself */
        memory_region_set_enabled(mr, false);
        memory_region_del_subregion(&b->hostmem, mr);

So from that POV it's a very good citizen.. it doesn't need anything to be
auto detached.

Below can be off topic, but since we're discussing this use case..

My gut feeling is that it can cause trouble at some point having MR itself
as parent.

For example, I will not be surprised that some day QMP command
qom-list-properties provide a --deep pararmeter, then this will hang that
command trying to look at child forever in a loop.

It can easily break in other ways too, e.g. when we add an assertion
anywhere for "obj->parent != obj" (which should really make sense.. from a
object model POV), or when we want to take a ref to parent (for obj->parent
reference) then it'll be a "real" circular reference, comparing to what
we're discussing on auto-detach so far (which is IIUC a pretty "weak"
circular ref; IOW, if del_subregion is always properly done, this patch not
needed).

Meanwhile the other free() overwrite:

    OBJECT(mr)->free = virtio_gpu_virgl_hostmem_region_free;

I suppose it can be error prone too whenever we want to provide a common
free() for MRs and this one can be easily overlooked..

I'm not familiar with GPU stuff, but reading it closer I do feel like it's
kind of a slight abuse on all above..

If to stick with "owner shares the same lifecycle for its MRs" idea and fix
all above, IMHO we could make virtio_gpu_virgl_hostmem_region to be a QOM
object, then it can be a proper parent for the MR.

-- 
Peter Xu




reply via email to

[Prev in Thread] Current Thread [Next in Thread]