
From: Gavin Shan
Subject: Re: [PATCH] hw/arm/virt: Fix CPU's default NUMA node ID
Date: Thu, 17 Feb 2022 10:14:45 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0

On 1/26/22 5:14 PM, Igor Mammedov wrote:
On Wed, 26 Jan 2022 13:24:10 +0800
Gavin Shan <gshan@redhat.com> wrote:

The default CPU-to-NUMA association is given by mc->get_default_cpu_node_id()
when it isn't provided explicitly. However, the CPU topology isn't fully
considered in the default association, and this causes CPU topology
warnings when booting a Linux guest.

For example, the following warning messages are observed when a Linux guest
is booted with this command line:

   /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
   -accel kvm -machine virt,gic-version=host               \
   -cpu host                                               \
   -smp 6,sockets=2,cores=3,threads=1                      \
   -m 1024M,slots=16,maxmem=64G                            \
   -object memory-backend-ram,id=mem0,size=128M            \
   -object memory-backend-ram,id=mem1,size=128M            \
   -object memory-backend-ram,id=mem2,size=128M            \
   -object memory-backend-ram,id=mem3,size=128M            \
   -object memory-backend-ram,id=mem4,size=128M            \
   -object memory-backend-ram,id=mem5,size=384M            \
   -numa node,nodeid=0,memdev=mem0                         \
   -numa node,nodeid=1,memdev=mem1                         \
   -numa node,nodeid=2,memdev=mem2                         \
   -numa node,nodeid=3,memdev=mem3                         \
   -numa node,nodeid=4,memdev=mem4                         \
   -numa node,nodeid=5,memdev=mem5
          :
   alternatives: patching kernel code
   BUG: arch topology borken
   the CLS domain not a subset of the MC domain
   <the above error log repeats>
   BUG: arch topology borken
   the DIE domain not a subset of the NODE domain

With the current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
are associated with NODE#0 to NODE#5 respectively. That's incorrect because
CPU#0/1/2 should be associated with the same NUMA node, since they sit in the
same socket.

This fixes the issue by taking the socket into account when the default
CPU-to-NUMA association is computed. With this applied, no more CPU topology
warnings are seen from the Linux guest. The 6 CPUs are associated with
NODE#0/1, and no CPUs are associated with NODE#2/3/4/5.
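
For illustration, the following minimal standalone sketch (not QEMU code; the
topology values are hard-coded from the -smp line above) prints the node ID
each scheme assigns:

    #include <stdio.h>

    int main(void)
    {
        /*
         * From the example: -smp 6,sockets=2,cores=3,threads=1 with six
         * NUMA nodes; dies and clusters default to 1.
         */
        int num_nodes = 6;
        int cpus_per_socket = 1 * 1 * 3 * 1; /* dies * clusters * cores * threads */

        for (int idx = 0; idx < 6; idx++) {
            printf("CPU#%d: old node %d, new node %d\n",
                   idx,
                   idx % num_nodes,        /* old mapping */
                   idx / cpus_per_socket); /* new mapping: one node per socket */
        }

        return 0;
    }

It prints old node IDs 0..5 but new node IDs 0,0,0,1,1,1, i.e. one NUMA node
per socket.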

From a migration point of view it looks fine to me, and it doesn't need a
compat knob, since the NUMA data (on virt-arm) is only used to construct ACPI
tables (and we don't version those unless something is broken by it).


Signed-off-by: Gavin Shan <gshan@redhat.com>
---
  hw/arm/virt.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 141350bf21..b4a95522d3 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2499,7 +2499,7 @@ virt_cpu_index_to_props(MachineState *ms, unsigned cpu_index)
 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % ms->numa_state->num_nodes;
+    return idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores * ms->smp.threads);
 }

I'd like ARM folks to confirm whether the above is correct
(i.e. whether the socket is the NUMA node boundary, and also whether
the above topology variables could hold odd values. Don't take the
horribly complicated x86 as an example, but it showed that vendors
can stash pretty much anything there, so we should consider that here
as well and maybe forbid it in the virt-arm smp parser).


After doing some investigation, I don't think the socket is necessarily the
NUMA node boundary. Unfortunately, I couldn't find it documented that way
anywhere after checking the device-tree specification and the Linux CPU
topology and NUMA binding documents.

However, there are two options here according to the Linux (guest) kernel
code: (A) the socket is the NUMA node boundary; (B) the CPU die is the NUMA
node boundary. They are equivalent because CPU dies aren't supported on the
arm/virt machine. Besides, a one-to-one association between socket and NUMA
node sounds natural and simple, so I think (A) is the best way to go.

Another thing I want to explain here is how the changes affect memory
allocation in the Linux guest. Taking the command line included in the commit
log as an example, the first two NUMA nodes are bound to CPUs while the other
4 NUMA nodes are remote to all CPUs. A remote NUMA node won't receive memory
allocations until memory on the near (local) NUMA node is exhausted; without
explicit memory binding, exactly where an allocation lands is otherwise
uncertain.
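
As an aside, explicit binding from inside the guest can be illustrated with
the standard libnuma API; the sketch below is just illustrative (node 2, one
of the CPU-less nodes from the example, is picked arbitrarily; build with
-lnuma):

    #include <numa.h>
    #include <stdio.h>

    int main(void)
    {
        if (numa_available() < 0) {
            fprintf(stderr, "NUMA is not supported on this system\n");
            return 1;
        }

        /*
         * Force 128 MiB onto the remote, CPU-less node 2 instead of
         * letting the kernel fill the local node first.
         */
        size_t size = 128UL << 20;
        void *buf = numa_alloc_onnode(size, 2);
        if (buf == NULL) {
            perror("numa_alloc_onnode");
            return 1;
        }

        numa_free(buf, size);
        return 0;
    }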

Besides, I think the code should be improved as below, so that the derived
node index can't run past ms->numa_state->num_nodes.

 static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
-    return idx % ms->numa_state->num_nodes;
+    int node_idx;
+
+    node_idx = idx / (ms->smp.dies * ms->smp.clusters * ms->smp.cores *
+                      ms->smp.threads);
+    return node_idx % ms->numa_state->num_nodes;
 }
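
With the example topology the modulo is a no-op (node_idx can only be 0 or 1
against 6 nodes); it only matters when the guest is configured with more
sockets than NUMA nodes. For instance, sockets=4 with 2 NUMA nodes would
otherwise yield node IDs 2 and 3, which don't exist.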




Thanks,
Gavin



