Re: [PATCH v7 1/2] memory: Update inline documentation

qemu-devel

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 1/2] memory: Update inline documentation

From:	Akihiko Odaki
Subject:	Re: [PATCH v7 1/2] memory: Update inline documentation
Date:	Fri, 17 Jan 2025 15:19:32 +0900
User-agent:	Mozilla Thunderbird

On 2025/01/17 1:13, BALATON Zoltan wrote:

On Thu, 16 Jan 2025, Peter Maydell wrote:
On Tue, 14 Jan 2025 at 19:12, Peter Xu <peterx@redhat.com> wrote:
On Tue, Jan 14, 2025 at 05:42:57PM +0000, Peter Maydell wrote:
There's at least one test in the arm qtests that will hit this.
I suspect that you'll find that most architectures except x86
(where we don't have models of complex SoCs and the few
machines we do have tend to be old code that is less QOMified)
will hit similar issues. I think there's a general issue here,
this isn't just "some particular ppc device is wrongly coded".
I see. Do you know how many of them would be important memory leaksthat
we should fix immediately?
None of these are important memory leaks, because the device is
almost always present for the lifetime of the simulation. The
only case you'd actually get a visible leak would be if you
could hot-unplug the device, and even then you'd have to
deliberately sit there doing hot-plug-then-unplug cycles to
leak an interesting amount of memory.

The main reason to want to fix them is that it lets us run
"make check" under the sanitizer and catch other, more
interesting leaks.
I mean, we have known memory leaks in QEMU in many places I assume.I am
curious how important this problem is, and whether such would justify a
memory API change that is not reaching a quorum state (and, imho, add
complexity to memory core and of course that spreads to x86 too evenif itwas not affected) to be merged. Or perhaps we can fix the importantones
first from the device model directly instead.
The problem is generic, and the problem is that we have not actually
nailed down how this is supposed to work, i.e:
* what are the reference counts counting?
It would be very unintuitive if ref counts not counted the number ofreferences to the object that contains this ref count. If I understoodcorrectly that's the reason for this discussion that in this case theref count of the owner is counting the MRs instead of its own referencesand the MR's ref count is not used which is confusing.
* if a device has this kind of memory region inside another,
  how is it supposed to be coded so as to not leak memory?

If we can figure out how the lifecycle and memory management
is supposed to work, then yes, we can fix the relevant device
models so that they follow whatever the rules are. But it seems
If looking for rules to follow one proven and relatively simple set iswhat Cocoa uses on macOS. There's a very short introduction here:
https://www.peachpit.com/articles/article.aspx?p=377302&seqNum=2
and a bit longer more complete here:
https://www.tomdalling.com/blog/cocoa/an-in-depth-look-at-manual-memory-management-in-objective-c/
I think this case is one which is shortly mentioned at the end of thesecond link which is solved by not retaining the contained objects sothis is closer to what PeterX suggests just dropping the refcounting onthe owner and consider the MRs owned by the superregions once addedthere. This means add subregion would pass the reference to the ownerand remove subregion passes it back to who called it so the MR's refcount needs no change but then who added the MR should not use it afterpassing it to the superregion and who called remove subregion shouldrelease it. But I don't really understans the problem so maybe it's morecomplex.

Let me explain my and Peter Xu's solutions by listing object references.It will abstract the details away and allow applying the analogy ofother reference counting systems like Objective-C.


"Real" reference relationship
-----------------------------

First, I'll introduce one basic rule:

Rule 1. If A needs B, let A refer to B.

Now, There is an owner and a memory region. The owner needs the memoryregion to function. The memory region needs the owner to provide toallocate itself and e.g., to provide MemoryRegionOps formemory_region_init_io(). So we will make a bi-directional referencebetween the owner and the memory region.


O (Owner) -> MR (Memory Region)
MR -> O

Now, let's think the case where there is a container and subregion, andthey are owned by different objects. The container needs the subregionto satisfy its accesses, and the subregion needs the container formemory_region_find(). So this relationship is also bi-directional.


CO (Container's Owner) -> C (Container)
C -> CO
SO (Subregions' Owner) -> S (Subregion)
S -> SO
C -> S
S -> C

Let's think someone took a reference to the container e.g., for DMA.This mean we add the following reference:


DMA Controller -> Container

This will ensure all of Container, Container's Owner, Subregion,Subregion's Owner are alive during the DMA.


"Weak" references
-----------------

Next, I'll impose the restriction that prevents circular references. Toavoid circular references, we'll stop counting references for onedirection. In other words, we'll make some references "weak". Weakreferences are prone to be dangling (in Objective-C or other platforms,such references will be usually replaced with nil-like values).


Rule 2. Make a reference from a memory region to its owner weak
Rule 3. Make a reference from a subregion to its container weak

CO -> C
C -> CO (weak)
SO -> S
S -> SO (weak)
C -> S
S -> C (weak)
D (DMA Controller) -> C

This will fix circular references, but unfortunately it will not workbecause DMA Controller requires weak references to reach to Container'sOwner or Subregion's Owner with "strong" references; it means they canbe dangling at any time.


To solve this problem, we'll introduce another rule:

Rule 4. Replace references to memory regions with references to theirowners *except for the references from the owner to the memory region*.


CO -> C
C -> CO (weak)
SO -> S
S -> SO (weak)
C -> SO
S -> C (weak)
D -> CO

With this change, D can again reach all of CO, C, SO, and S.

Problem
-------

What if Container's Owner and Subregion's Owner are identical?

O -> C
C -> O (weak)
O -> S
S -> O (weak)
C -> O
S -> C (weak)
D -> O

Now there is a circular reference: O -> C -> O

My patch
--------

My patch amends rule 4:

Rule 4'. Replace references to memory regions with references to theirowners *unless the reference originates from the owner*.


O -> C
C -> O (weak)
O -> S
S -> O (weak)
C -> S (this used to be C -> O)
S -> C (weak)
D -> O

Now the circular reference is gone.

Peter Xu's patch
----------------

Peter Xu's patch adds two new rules:

Rule 5. Make references from a container to its subregion weak if theyare owned by the same object.


O -> C
C -> O (weak)
O -> S
S -> O (weak)
C -> O (weak [this used to be strong])
S -> C (weak)
D -> O

Now the circular reference is gone and D can still reach O, C, and S.

However, there is one problem: C requires a weak reference to reach toS. More concretely the pointer from C to S will be dangling when thereference O -> S gets dropped during the finalization of O.


Peter Xu's patch adds yet another rule to overcome this problem:

Rule 6. When a subregion gets finalized, drop the weak reference fromits container to itself.


The dangling reference will be gone with this rule.

Comparison between the two patches
----------------------------------

I'll compare them by showing pros and cons of Peter Xu's patch.

Pro:

a. Will not add yet another direct reference to memory regions wherecurrently there can be only one reference from the owner to each memoryregion.


Cons:
b. It requires two new rules.

c. C will see its reference to S to disappear at random timing due to aweak reference.


So, which is better do you think?

[Prev in Thread]

Current Thread

[Next in Thread]

Re: [PATCH v7 1/2] memory: Update inline documentation, (continued)

Prev by Date: [PULL 50/50] hw/char/riscv_htif: Convert HTIF_DEBUG() to trace events
Next by Date: Re: [PATCH v2] pci: ensure valid link status bits for downstream ports
Previous by thread: Re: [PATCH v7 1/2] memory: Update inline documentation
Next by thread: Re: [PATCH v7 1/2] memory: Update inline documentation
Index(es):
- Date
- Thread