qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH v6 1/2] qom: new object to associate device to numa node


From: Dan Williams
Subject: Re: [PATCH v6 1/2] qom: new object to associate device to numa node
Date: Wed, 10 Jan 2024 15:19:05 -0800

David Hildenbrand wrote:
> On 09.01.24 17:52, Jonathan Cameron wrote:
> > On Thu, 4 Jan 2024 10:39:41 -0700
> > Alex Williamson <alex.williamson@redhat.com> wrote:
> > 
> >> On Thu, 4 Jan 2024 16:40:39 +0000
> >> Ankit Agrawal <ankita@nvidia.com> wrote:
> >>
> >>> Had a discussion with RH folks, summary follows:
> >>>
> >>> 1. To align with the current spec description pointed by Jonathan, we 
> >>> first do
> >>>       a separate object instance per GI node as suggested by Jonathan. 
> >>> i.e.
> >>>       a acpi-generic-initiator would only link one node to the device. To
> >>>       associate a set of nodes, those number of object instances should be
> >>>       created.
> >>> 2. In parallel, we work to get the spec updated. After the update, we 
> >>> switch
> >>>      to the current implementation to link a PCI device with a set of NUMA
> >>>      nodes.
> >>>
> >>> Alex/Jonathan, does this sound fine?
> >>>    
> >>
> >> Yes, as I understand Jonathan's comments, the acpi-generic-initiator
> >> object should currently define a single device:node relationship to
> >> match the ACPI definition.
> > 
> > Doesn't matter for this, but it's a many_device:single_node
> > relationship as currently defined. We should be able to support that
> > in any new interfaces for QEMU.
> > 
> >>   Separately a clarification of the spec
> >> could be pursued that could allow us to reinstate a node list option
> >> for the acpi-generic-initiator object.  In the interim, a user can
> >> define multiple 1:1 objects to create the 1:N relationship that's
> >> ultimately required here.  Thanks,
> > 
> > Yes, a spec clarification would work, probably needs some text
> > to say a GI might not be an initiator as well - my worry is
> > theoretical backwards compatibility with a (probably
> > nonexistent) OS that assumes the N:1 mapping. So you may be in
> > new SRAT entry territory.
> > 
> > Given that, an alternative proposal that I think would work
> > for you would be to add a 'placeholder' memory node definition
> > in SRAT (so allow 0 size explicitly - might need a new SRAT
> > entry to avoid backwards compat issues).
> 
> Putting all the PCI/GI/... complexity aside, I'll just raise again that 
> for virtio-mem something simple like that might be helpful as well, IIUC.
> 
>       -numa node,nodeid=2 \
>       ...
>       -device virtio-mem-pci,node=2,... \
> 
> All we need is the OS to prepare for an empty node that will get 
> populated with memory later.
> 
> So if that's what a "placeholder" node definition in srat could achieve 
> as well, even without all of the other acpi-generic-initiator stuff, 
> that would be great.

Please no "placeholder" definitions in SRAT. One of the main thrusts of
CXL is to move away from static ACPI tables describing vendor-specific
memory topology, towards an industry standard device enumeration.

Platform firmware enumerates the platform CXL "windows" (ACPI CEDT
CFMWS) and the relative performance of the CPU access a CXL port (ACPI
HMAT Generic Port), everything else is CXL standard enumeration.

It is strictly OS policy about how many NUMA nodes it imagines it wants
to define within that playground. The current OS policy is one node per
"window". If a solution believes Linux should be creating more than that
I submit that's a discussion with OS policy developers, not a trip to
the BIOS team to please sprinkle in more placeholders. Linux can fully
own the policy here. The painful bit is just that it never had to
before.



reply via email to

[Prev in Thread] Current Thread [Next in Thread]