qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v7 1/2] memory: Update inline documentation


From: Akihiko Odaki
Subject: Re: [PATCH v7 1/2] memory: Update inline documentation
Date: Fri, 17 Jan 2025 15:19:32 +0900
User-agent: Mozilla Thunderbird

On 2025/01/17 1:13, BALATON Zoltan wrote:
On Thu, 16 Jan 2025, Peter Maydell wrote:
On Tue, 14 Jan 2025 at 19:12, Peter Xu <peterx@redhat.com> wrote:

On Tue, Jan 14, 2025 at 05:42:57PM +0000, Peter Maydell wrote:
There's at least one test in the arm qtests that will hit this.
I suspect that you'll find that most architectures except x86
(where we don't have models of complex SoCs and the few
machines we do have tend to be old code that is less QOMified)
will hit similar issues. I think there's a general issue here,
this isn't just "some particular ppc device is wrongly coded".

I see.  Do you know how many of them would be important memory leaks that
we should fix immediately?

None of these are important memory leaks, because the device is
almost always present for the lifetime of the simulation. The
only case you'd actually get a visible leak would be if you
could hot-unplug the device, and even then you'd have to
deliberately sit there doing hot-plug-then-unplug cycles to
leak an interesting amount of memory.

The main reason to want to fix them is that it lets us run
"make check" under the sanitizer and catch other, more
interesting leaks.

I mean, we have known memory leaks in QEMU in many places I assume. I am
curious how important this problem is, and whether such would justify a
memory API change that is not reaching a quorum state (and, imho, add
complexity to memory core and of course that spreads to x86 too even if it was not affected) to be merged.  Or perhaps we can fix the important ones
first from the device model directly instead.

The problem is generic, and the problem is that we have not actually
nailed down how this is supposed to work, i.e:
* what are the reference counts counting?

It would be very unintuitive if ref counts not counted the number of references to the object that contains this ref count. If I understood correctly that's the reason for this discussion that in this case the ref count of the owner is counting the MRs instead of its own references and the MR's ref count is not used which is confusing.

* if a device has this kind of memory region inside another,
  how is it supposed to be coded so as to not leak memory?

If we can figure out how the lifecycle and memory management
is supposed to work, then yes, we can fix the relevant device
models so that they follow whatever the rules are. But it seems

If looking for rules to follow one proven and relatively simple set is what Cocoa uses on macOS. There's a very short introduction here:
https://www.peachpit.com/articles/article.aspx?p=377302&seqNum=2
and a bit longer more complete here:
https://www.tomdalling.com/blog/cocoa/an-in-depth-look-at-manual-memory- management-in-objective-c/

I think this case is one which is shortly mentioned at the end of the second link which is solved by not retaining the contained objects so this is closer to what PeterX suggests just dropping the refcounting on the owner and consider the MRs owned by the superregions once added there. This means add subregion would pass the reference to the owner and remove subregion passes it back to who called it so the MR's ref count needs no change but then who added the MR should not use it after passing it to the superregion and who called remove subregion should release it. But I don't really understans the problem so maybe it's more complex.

Let me explain my and Peter Xu's solutions by listing object references. It will abstract the details away and allow applying the analogy of other reference counting systems like Objective-C.

"Real" reference relationship
-----------------------------

First, I'll introduce one basic rule:

Rule 1. If A needs B, let A refer to B.

Now, There is an owner and a memory region. The owner needs the memory region to function. The memory region needs the owner to provide to allocate itself and e.g., to provide MemoryRegionOps for memory_region_init_io(). So we will make a bi-directional reference between the owner and the memory region.

O (Owner) -> MR (Memory Region)
MR -> O

Now, let's think the case where there is a container and subregion, and they are owned by different objects. The container needs the subregion to satisfy its accesses, and the subregion needs the container for memory_region_find(). So this relationship is also bi-directional.

CO (Container's Owner) -> C (Container)
C -> CO
SO (Subregions' Owner) -> S (Subregion)
S -> SO
C -> S
S -> C

Let's think someone took a reference to the container e.g., for DMA. This mean we add the following reference:

DMA Controller -> Container

This will ensure all of Container, Container's Owner, Subregion, Subregion's Owner are alive during the DMA.

"Weak" references
-----------------

Next, I'll impose the restriction that prevents circular references. To avoid circular references, we'll stop counting references for one direction. In other words, we'll make some references "weak". Weak references are prone to be dangling (in Objective-C or other platforms, such references will be usually replaced with nil-like values).

Rule 2. Make a reference from a memory region to its owner weak
Rule 3. Make a reference from a subregion to its container weak

CO -> C
C -> CO (weak)
SO -> S
S -> SO (weak)
C -> S
S -> C (weak)
D (DMA Controller) -> C

This will fix circular references, but unfortunately it will not work because DMA Controller requires weak references to reach to Container's Owner or Subregion's Owner with "strong" references; it means they can be dangling at any time.

To solve this problem, we'll introduce another rule:

Rule 4. Replace references to memory regions with references to their owners *except for the references from the owner to the memory region*.

CO -> C
C -> CO (weak)
SO -> S
S -> SO (weak)
C -> SO
S -> C (weak)
D -> CO

With this change, D can again reach all of CO, C, SO, and S.

Problem
-------

What if Container's Owner and Subregion's Owner are identical?

O -> C
C -> O (weak)
O -> S
S -> O (weak)
C -> O
S -> C (weak)
D -> O

Now there is a circular reference: O -> C -> O

My patch
--------

My patch amends rule 4:

Rule 4'. Replace references to memory regions with references to their owners *unless the reference originates from the owner*.

O -> C
C -> O (weak)
O -> S
S -> O (weak)
C -> S (this used to be C -> O)
S -> C (weak)
D -> O

Now the circular reference is gone.

Peter Xu's patch
----------------

Peter Xu's patch adds two new rules:

Rule 5. Make references from a container to its subregion weak if they are owned by the same object.

O -> C
C -> O (weak)
O -> S
S -> O (weak)
C -> O (weak [this used to be strong])
S -> C (weak)
D -> O

Now the circular reference is gone and D can still reach O, C, and S.

However, there is one problem: C requires a weak reference to reach to S. More concretely the pointer from C to S will be dangling when the reference O -> S gets dropped during the finalization of O.

Peter Xu's patch adds yet another rule to overcome this problem:

Rule 6. When a subregion gets finalized, drop the weak reference from its container to itself.

The dangling reference will be gone with this rule.

Comparison between the two patches
----------------------------------

I'll compare them by showing pros and cons of Peter Xu's patch.

Pro:
a. Will not add yet another direct reference to memory regions where currently there can be only one reference from the owner to each memory region.

Cons:
b. It requires two new rules.
c. C will see its reference to S to disappear at random timing due to a weak reference.

So, which is better do you think?



reply via email to

[Prev in Thread] Current Thread [Next in Thread]