On Tue, 14 Jan 2025 at 19:12, Peter Xu <peterx@redhat.com> wrote:
> On Tue, Jan 14, 2025 at 05:42:57PM +0000, Peter Maydell wrote:
>> There's at least one test in the arm qtests that will hit this.
>> I suspect that you'll find that most architectures except x86
>> (where we don't have models of complex SoCs and the few
>> machines we do have tend to be old code that is less QOMified)
>> will hit similar issues. I think there's a general issue here;
>> this isn't just "some particular ppc device is wrongly coded".
> I see. Do you know how many of them would be important memory leaks that
> we should fix immediately?
None of these are important memory leaks, because the device is
almost always present for the lifetime of the simulation. The
only case you'd actually get a visible leak would be if you
could hot-unplug the device, and even then you'd have to
deliberately sit there doing hot-plug-then-unplug cycles to
leak an interesting amount of memory.
The main reason to want to fix them is that it lets us run
"make check" under the sanitizer and catch other, more
interesting leaks.
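For reference, the sanitizer run described here is typically done by configuring QEMU with the sanitizers enabled and then running the test suite. A sketch, assuming an out-of-tree build and an Arm target; the exact configure flag may vary between QEMU versions:

```shell
# Build QEMU with ASan/UBSan instrumentation and run "make check";
# leaked allocations (including MemoryRegions) show up in the
# LeakSanitizer report when each test binary exits.
mkdir build && cd build
../configure --enable-sanitizers --target-list=arm-softmmu
make -j"$(nproc)"
make check
```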
> I mean, we have known memory leaks in QEMU in many places, I assume. I am
> curious how important this problem is, and whether it would justify
> merging a memory API change that has not reached consensus (and which,
> imho, adds complexity to the memory core that spreads to x86 too, even
> though x86 was not affected). Or perhaps we can fix the important ones
> first in the device models directly instead.
The problem is generic: we have not actually nailed down how
this is supposed to work, i.e.:
* what are the reference counts counting?
* if a device has this kind of memory region inside another,
how is it supposed to be coded so as to not leak memory?
If we can figure out how the lifecycle and memory management
are supposed to work, then yes, we can fix the relevant device
models so that they follow whatever the rules are. But it seems