[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID
From: |
Igor Mammedov |
Subject: |
Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID |
Date: |
Fri, 18 Mar 2022 10:56:56 +0100 |
On Fri, 18 Mar 2022 14:23:34 +0800
"wangyanan (Y)" <wangyanan55@huawei.com> wrote:
> Hi Gavin,
>
> On 2022/3/3 11:11, Gavin Shan wrote:
> > The default CPU-to-NUMA association is given by
> > mc->get_default_cpu_node_id()
> > when it isn't provided explicitly. However, the CPU topology isn't fully
> > considered in the default association and it causes CPU topology broken
> > warnings on booting Linux guest.
> >
> > For example, the following warning messages are observed when the Linux
> > guest
> > is booted with the following command lines.
> >
> > /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
> > -accel kvm -machine virt,gic-version=host \
> > -cpu host \
> > -smp 6,sockets=2,cores=3,threads=1 \
> > -m 1024M,slots=16,maxmem=64G \
> > -object memory-backend-ram,id=mem0,size=128M \
> > -object memory-backend-ram,id=mem1,size=128M \
> > -object memory-backend-ram,id=mem2,size=128M \
> > -object memory-backend-ram,id=mem3,size=128M \
> > -object memory-backend-ram,id=mem4,size=128M \
> > -object memory-backend-ram,id=mem4,size=384M \
> > -numa node,nodeid=0,memdev=mem0 \
> > -numa node,nodeid=1,memdev=mem1 \
> > -numa node,nodeid=2,memdev=mem2 \
> > -numa node,nodeid=3,memdev=mem3 \
> > -numa node,nodeid=4,memdev=mem4 \
> > -numa node,nodeid=5,memdev=mem5
> > :
> > alternatives: patching kernel code
> > BUG: arch topology borken
> > the CLS domain not a subset of the MC domain
> > <the above error log repeats>
> > BUG: arch topology borken
> > the DIE domain not a subset of the NODE domain
> >
> > With current implementation of mc->get_default_cpu_node_id(), CPU#0 to CPU#5
> > are associated with NODE#0 to NODE#5 separately. That's incorrect because
> > CPU#0/1/2 should be associated with same NUMA node because they're seated
> > in same socket.
> >
> > This fixes the issue by populating the CPU topology in
> > virt_possible_cpu_arch_ids()
> > and considering the socket index when default CPU-to-NUMA association is
> > given
> > in virt_possible_cpu_arch_ids(). With this applied, no more CPU topology
> > broken
> > warnings are seen from the Linux guest. The 6 CPUs are associated with
> > NODE#0/1,
> > but there are no CPUs associated with NODE#2/3/4/5.
> It may be better to split this patch into two. One extends
> virt_possible_cpu_arch_ids,
> and the other fixes the numa node ID issue.
> >
> > Signed-off-by: Gavin Shan <gshan@redhat.com>
> > ---
> > hw/arm/virt.c | 17 ++++++++++++++++-
> > 1 file changed, 16 insertions(+), 1 deletion(-)
> >
> > diff --git a/hw/arm/virt.c b/hw/arm/virt.c
> > index 46bf7ceddf..dee02b60fc 100644
> > --- a/hw/arm/virt.c
> > +++ b/hw/arm/virt.c
> > @@ -2488,7 +2488,9 @@ virt_cpu_index_to_props(MachineState *ms, unsigned
> > cpu_index)
> >
> > static int64_t virt_get_default_cpu_node_id(const MachineState *ms, int
> > idx)
> > {
> > - return idx % ms->numa_state->num_nodes;
> > + int64_t socket_id = ms->possible_cpus->cpus[idx].props.socket_id;
> > +
> > + return socket_id % ms->numa_state->num_nodes;
> > }
> >
> > static const CPUArchIdList *virt_possible_cpu_arch_ids(MachineState *ms)
> > @@ -2496,6 +2498,7 @@ static const CPUArchIdList
> > *virt_possible_cpu_arch_ids(MachineState *ms)
> > int n;
> > unsigned int max_cpus = ms->smp.max_cpus;
> > VirtMachineState *vms = VIRT_MACHINE(ms);
> > + MachineClass *mc = MACHINE_GET_CLASS(vms);
> >
> > if (ms->possible_cpus) {
> > assert(ms->possible_cpus->len == max_cpus);
> > @@ -2509,6 +2512,18 @@ static const CPUArchIdList
> > *virt_possible_cpu_arch_ids(MachineState *ms)
> > ms->possible_cpus->cpus[n].type = ms->cpu_type;
> > ms->possible_cpus->cpus[n].arch_id =
> > virt_cpu_mp_affinity(vms, n);
> > +
> > + ms->possible_cpus->cpus[n].props.has_socket_id = true;
> > + ms->possible_cpus->cpus[n].props.socket_id =
> > + n / (ms->smp.dies * ms->smp.clusters *
> > + ms->smp.cores * ms->smp.threads);
> > + if (mc->smp_props.dies_supported) {
> > + ms->possible_cpus->cpus[n].props.has_die_id = true;
> > + ms->possible_cpus->cpus[n].props.die_id =
> > + n / (ms->smp.clusters * ms->smp.cores * ms->smp.threads);
> > + }
> I still don't think we need to consider dies if it's certainly not
> supported yet, IOW, we will never come into the if-branch.
> We are populating arm-specific topo info instead of the generic,
> we can probably uniformly update this part together with other
> necessary places when we decide to support dies for arm virt
> machine in the future. :)
it seems we do support dies and they are supposed to be numa boundary too,
so perhaps we should account for it when generating node-id.
> > + ms->possible_cpus->cpus[n].props.has_core_id = true;
> > + ms->possible_cpus->cpus[n].props.core_id = n / ms->smp.threads;
> > ms->possible_cpus->cpus[n].props.has_thread_id = true;
> > ms->possible_cpus->cpus[n].props.thread_id = n;
> > }
> Maybe we should use the same algorithm in x86_topo_ids_from_idx
> to populate the IDs, so that scope of socket-id will be [0, total_sockets),
> scope of thread-id is [0, threads_per_core), and so on. Then with a
> group of socket/cluster/core/thread-id, we determine a CPU.
>
> Suggestion: For the long term, is it necessary now to add similar topo
> info infrastructure for ARM, such as X86CPUTopoInfo, X86CPUTopoIDs,
> x86_topo_ids_from_idx?
>
> Thanks,
> Yanan
>
- [PATCH v2 0/3] hw/arm/virt: Fix CPU's default NUMA node ID, Gavin Shan, 2022/03/02
- [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, Gavin Shan, 2022/03/02
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, wangyanan (Y), 2022/03/18
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID,
Igor Mammedov <=
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, wangyanan (Y), 2022/03/18
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, Igor Mammedov, 2022/03/18
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, wangyanan (Y), 2022/03/20
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, Gavin Shan, 2022/03/22
- Re: [PATCH v2 1/3] hw/arm/virt: Fix CPU's default NUMA node ID, Gavin Shan, 2022/03/22
[PATCH v2 2/3] hw/acpi/aml-build: Use existing CPU topology to build PPTT table, Gavin Shan, 2022/03/02
[PATCH v2 3/3] hw/arm/virt: Unify ACPI processor ID in MADT and SRAT table, Gavin Shan, 2022/03/02