Re: [PATCH v7 1/2] memory: Update inline documentation

qemu-devel
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v7 1/2] memory: Update inline documentation

From:	Peter Xu
Subject:	Re: [PATCH v7 1/2] memory: Update inline documentation
Date:	Wed, 15 Jan 2025 08:43:28 -0500
On Wed, Jan 15, 2025 at 01:46:29PM +0900, Akihiko Odaki wrote:
> On 2025/01/15 2:02, Peter Xu wrote:
> > On Tue, Jan 14, 2025 at 05:43:09PM +0900, Akihiko Odaki wrote:
> > > memory_region_finalize() is not a function to tell the owner is leaving, 
> > > but
> > > the memory region itself is being destroyed.
> > 
> > It is when the lifecycle of the MR is the same as the owner.  That holds
> > true I suppose if without this patch, and that's why I don't prefer this
> > patch because it makes that part more complicated.
> 
> The lifecycle of the MR is not the same as the owner. The MR gets finalized
> during the finalization of the owner, and the owner is still alive at the
> moment. It is something you should always care when having a child object.

What is the benefit of having such explicit layering of different lifecycle
between the owner and the MRs that it owns?

To ask in another way, what's the functional benefit that we order the
destruction of MRs within the same owner, paying that with explicit two
refcounts concept in memory core?

AFAICT, that's the only purpose MR->refcount is servicing for in this
patchset besides the property link.

Currently, memory_region_ref() takes the refcount _only_ from the host.
Considering that's the only memory API to take a reference on a MR, it kind
of implies to everyone that the MR and the owner shares the lifetime.

In reality, it's not 100% shared indeed, but almost.  We even have those
document for dynamic MRs to make sure that is true even there.

Then it's about the "virtual lifecycle" which triggers a finalize(), or
"real lifecycle" which triggers a free() that may make a difference to a
MR.  And that's the part on whether we should try to not expose too much at
all on these.  I want to keep the concept simple if possible that we stick
with sharing lifetime between owner and all MRs underneath.  I want to see
whether we can avoid complicating that part.

I can see why you want to clearly separate the lifetimes, because it's
cleaner to you.  But IMHO we already made a decision from that starting
from when memory_region_ref() does not take MR->refcount, otherwise you
should at least need something like this to make the lifecycle completely
separate in your this patch:

diff --git a/system/memory.c b/system/memory.c
index b17b5538ff..d4b88c389a 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1843,15 +1843,23 @@ void memory_region_ref(MemoryRegion *mr)
      * Memory regions without an owner are supposed to never go away;
      * we do not ref/unref them because it slows down DMA sensibly.
      */
-    if (mr && mr->owner) {
-        object_ref(mr->owner);
+    if (mr) {
+        /* The MR has its own lifecycle.. even if in most cases, virtually */
+        object_ref(mr);
+        if (mr->owner) {
+            object_ref(mr->owner);
+        }
     }
 }
 
 void memory_region_unref(MemoryRegion *mr)
 {
-    if (mr && mr->owner) {
-        object_unref(mr->owner);
+    if (mr) {
+        /* The MR has its own lifecycle.. even if in most cases, virtually */
+        object_unref(mr);
+        if (mr->owner) {
+            object_unref(mr->owner);
+        }
     }
 }

To me, QEMU already went the other way.  So I sincerely don't know how that
extra mr->refcount usage it could bring us.  It only makes it harder to
understand to me.

> 
> > 
> > > It should not happen when a container is still referencing it. That is
> > > also why it has memory_region_ref(subregion) in
> > > memory_region_update_container_subregions() and assert(!mr->container) in
> > > memory_region_finalize().
> > 
> > Again, the line I added was sololy for what you said "automation" elsewhere
> > and only should work within MR-links within the same owner.  Otherwise
> > anyone referencing the MR would hold the owner ref then this finalize()
> > will never happen.
> > 
> > Now, if I could go back to your original purpose of this work, quotting
> > from your cover letter:
> > 
> > > I saw various sanitizer errors when running check-qtest-ppc64. While
> > > I could just turn off sanitizers, I decided to tackle them this time.
> > > 
> > > Unfortunately, GLib versions older than 2.81.0 do not free test data in
> > > some cases so some sanitizer errors remain. All sanitizer errors will be
> > > gone with this patch series combined with the following change for GLib:
> > > https://gitlab.gnome.org/GNOME/glib/-/merge_requests/4120
> > 
> > Is check-qtest-ppc64 the only one that will trigger this issue?  Does it
> > mean that most of the devices will do proper removal of device-owned
> > subregions (hence, not prone to circular reference of owner refcount)
> > except some devices in ppc64?
> > 
> 
> Searching for memory_region_add_subregion() gives 1078 instances where there
> are 142 instances of memory_region_del_subregion(). This is a rough estimate
> but there are potentially 936 instances of subregions without explicit
> deletion.
> 
> For example, hw/audio/intel-hda.c adds subregions immediately after their
> containers never deletes the subregions. I think that's fine because their
> lifetimes are obvious with reference counters.

OK, let's try to figure out a best way to move forward then.

Let me try to summarize the two approaches so far.

So in general I think I don't prefer this patch because this patch is kind
of in the middle of something.

It neither provides 100% separation of MR lifecycle: as discussed above, on
not referencing MR->refcount on memory_region_ref/unref at least yet so far
together in this patch, but suddenly started considering it in MR links.
To me, that's abuse if ordering of such finalize() is not justified.

Nor it provides best efficiency: needing to take a MR->refcount when
linking two MRs, even if we essentially don't need to guarded by the fact
that owner must exist already, which must hold true anyway for QEMU to work
so far.

What I think the best is we either go one way or another: either we make MR
lifecycle clearly separate, or we make it clearly efficient (meanwhile we
still keep the concept easy, and we at least try to always stick with one
refcount which is easier to maintain too).

IMHO that's what the other older patch does (plus my fixup squashed in):

https://lore.kernel.org/all/ZsenKpu1czQGYz7m@x1n/

That avoids taking a refcount for internal MRs, always stick with owner
shares the same lifecycle with MRs, just like the same assumption we have
already had in memory_region_ref().  The bad side effect is we need
something slightly hackish in mr finalize(), but we can provide some better
doc, and keep the comlexity there only (which I think is better than always
having two refcounts all over).

If we worry about removal of that container assertion, we could assert
instead on the owner.  I've attached a slightly modified full version of
such alternative patch below, with the best comment I see suite.

diff --git a/system/memory.c b/system/memory.c
index b17b5538ff..7b2d91ca6b 100644
--- a/system/memory.c
+++ b/system/memory.c
@@ -1803,7 +1803,6 @@ static void memory_region_finalize(Object *obj)
 {
     MemoryRegion *mr = MEMORY_REGION(obj);
 
-    assert(!mr->container);
 
     /* We know the region is not visible in any address space (it
      * does not have a container and cannot be a root either because
@@ -1813,6 +1812,17 @@ static void memory_region_finalize(Object *obj)
      */
     mr->enabled = false;
     memory_region_transaction_begin();
+    if (mr->container) {
+        /*
+         * If this happens, it must be when MRs share the same owner,
+         * because only share-owner-ed links doesn't take a refcount.  In
+         * this specific case, we allow random order of finalize() on the
+         * MRs the owner owns, so it's possible the child finalize()s
+         * before a parent.  When it happens, unlink from the child.
+         */
+        assert(mr->container->owner == mr->owner);
+        memory_region_del_subregion(mr->container, mr);
+    }
     while (!QTAILQ_EMPTY(&mr->subregions)) {
         MemoryRegion *subregion = QTAILQ_FIRST(&mr->subregions);
         memory_region_del_subregion(mr, subregion);
@@ -2644,7 +2654,15 @@ static void 
memory_region_update_container_subregions(MemoryRegion *subregion)
 
     memory_region_transaction_begin();
 
-    memory_region_ref(subregion);
+    if (mr->owner != subregion->owner) {
+        /*
+         * MRs always have the same lifecycle of its owner, so that when
+         * adding a subregion that shares the same owner of the parent, we
+         * don't need any refcounting, because the two MRs share the
+         * lifecycle with owner, so they share between each other too.
+         */
+        memory_region_ref(subregion);
+    }
     QTAILQ_FOREACH(other, &mr->subregions, subregions_link) {
         if (subregion->priority >= other->priority) {
             QTAILQ_INSERT_BEFORE(other, subregion, subregions_link);
@@ -2702,7 +2720,10 @@ void memory_region_del_subregion(MemoryRegion *mr,
         assert(alias->mapped_via_alias >= 0);
     }
     QTAILQ_REMOVE(&mr->subregions, subregion, subregions_link);
-    memory_region_unref(subregion);
+    /* See the corresponding comment in add subregion path */
+    if (mr->owner != subregion->owner) {
+        memory_region_unref(subregion);
+    }
     memory_region_update_pending |= mr->enabled && subregion->enabled;
     memory_region_transaction_commit();
 }
-- 
2.47.0


-- 
Peter Xu
[Prev in Thread]
Current Thread
[Next in Thread]
Re: [PATCH v7 1/2] memory: Update inline documentation, (continued)
Prev by Date: Re: [PATCH v2] hw/i386/cpu: remove default_cpu_version and simplify
Next by Date: Re: [PATCH v3] hw/cxl: Fix msix_notify: Assertion `vector < dev->msix_entries_nr`
Previous by thread: Re: [PATCH v7 1/2] memory: Update inline documentation
Next by thread: Re: [PATCH v7 1/2] memory: Update inline documentation
Index(es):
- Date
- Thread