qemu-arm
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] hw/arm/boot: Use NUMA node ID in memory node name


From: Gavin Shan
Subject: Re: [PATCH] hw/arm/boot: Use NUMA node ID in memory node name
Date: Wed, 23 Jun 2021 14:43:49 +1000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0

Hi Drew,

On 6/22/21 5:13 PM, Andrew Jones wrote:
On Tue, Jun 22, 2021 at 06:53:41PM +1000, Gavin Shan wrote:
On 6/3/21 2:48 PM, Gavin Shan wrote:
On 6/2/21 9:36 PM, Andrew Jones wrote:
On Wed, Jun 02, 2021 at 11:09:32AM +1000, Gavin Shan wrote:
On 6/1/21 5:50 PM, Andrew Jones wrote:
On Tue, Jun 01, 2021 at 03:30:04PM +0800, Gavin Shan wrote:
We possibly populate empty nodes where memory isn't included and might
be hot added at late time. The FDT memory nodes can't be created due
to conflicts on their names if multiple empty nodes are specified.
For example, the VM fails to start with the following error messages.

     /home/gavin/sandbox/qemu.main/build/qemu-system-aarch64          \
     -accel kvm -machine virt,gic-version=host                        \
     -cpu host -smp 4,sockets=2,cores=2,threads=1 -m 1024M,maxmem=64G \
     -object memory-backend-ram,id=mem0,size=512M                     \
     -object memory-backend-ram,id=mem1,size=512M                     \
     -numa node,nodeid=0,cpus=0-1,memdev=mem0                         \
     -numa node,nodeid=1,cpus=2-3,memdev=mem1                         \
     -numa node,nodeid=2                                              \
     -numa node,nodeid=3                                              \
       :
     -device virtio-balloon-pci,id=balloon0,free-page-reporting=yes

     qemu-system-aarch64: FDT: Failed to create subnode /memory@80000000: \
                          FDT_ERR_EXISTS

This fixes the issue by using NUMA node ID or zero in the memory node
name to avoid the conflicting memory node names. With this applied, the
VM can boot successfully with above command lines.

Signed-off-by: Gavin Shan <gshan@redhat.com>
---
    hw/arm/boot.c | 7 ++++++-
    1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index d7b059225e..3169bdf595 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -432,7 +432,12 @@ static int fdt_add_memory_node(void *fdt, uint32_t acells, 
hwaddr mem_base,
        char *nodename;
        int ret;
-    nodename = g_strdup_printf("/memory@%" PRIx64, mem_base);
+    if (numa_node_id >= 0) {
+        nodename = g_strdup_printf("/memory@%d", numa_node_id);
+    } else {
+        nodename = g_strdup("/memory@0");
+    }
+
        qemu_fdt_add_subnode(fdt, nodename);
        qemu_fdt_setprop_string(fdt, nodename, "device_type", "memory");
        ret = qemu_fdt_setprop_sized_cells(fdt, nodename, "reg", acells, 
mem_base,

[...]


I've sent one separate mail to check with Rob Herring. Hopefully he have
ideas as he is maintaining linux FDT subsystem. You have been included to
that thread. I didn't find something meaningful to this thread after doing
some google search either.

Yes, I agree with you we need to follow the specification strictly. It seems
it's uncertain about the 'physical memory map' bus binding requirements.


I didn't get expected answers from device-tree experts.

What response did you get? Can you please provide a link to the discussion?


Sorry about the confusion. I meant "no response" by "expected answers". Here
is the mail sent to Rob before, no reply so far:

https://lkml.org/lkml/2021/6/2/1446

After rethinking about it,
I plan to fix this like this way, but please let me know if it sounds sensible
to you.

The idea is to assign a (not overlapped) dummy base address to each memory
node in the device-tree. The dummy is (last_valid_memory_address + NUMA ID).
The 'length' of the 'reg' property in the device-tree nodes, corresponding
to empty NUMA nodes, is still zero. This ensures the nodes are still invalid
until memory is added to these nodes.

I had the temporary patch for the implementation. It works fine and VM can
boot up successfully.

I would like to be sure that the kernel developers for NUMA, memory
hotplug, and devicetree specifications are all happy with the approach
before adding it to QEMU.


As I understood, it won't break anything from perspectives of NUMA
and device-tree specification. First of all, NUMA cares the so-called
distance map and 'numa-node-id' property in the individual device-tree
nodes. The device-tree specification doesn't say 'length' in 'reg'
property of memory node can't be zero. Also, the linux device-tree
implementation has the check on 'length', the memory block won't be
added if it's zero.

Documentation/devicetree/bindings/numa.txt has more details about
the required device-tree NUMA properties.

I'm not sure about memory hotplug. I tried memory hot add and it seems
working finely. Memory hot-add/remove are working without issues:

/home/gavin/sandbox/qemu.main/build/qemu-system-aarch64 \
-accel kvm -machine virt,gic-version=host               \
-cpu host -smp 4,sockets=2,cores=2,threads=1            \
-m 4096M,slots=16,maxmem=64G                            \
-object memory-backend-ram,id=mem0,size=2048M           \
-object memory-backend-ram,id=mem1,size=2048M           \
-numa node,nodeid=0,cpus=0-1,memdev=mem0                \
-numa node,nodeid=1,cpus=2-3,memdev=mem1                \
-numa node,nodeid=2                                     \
-numa node,nodeid=3
   :

Memory hot-add
===============

# cat /proc/meminfo  | grep MemTotal
MemTotal:        4027472 kB
# cat /sys/devices/system/node/node2/meminfo | grep MemTotal
Node 2 MemTotal:       0 kB

(qemu) object_add memory-backend-ram,id=hp-mem0,size=1G
(qemu) device_add pc-dimm,id=dimm0,memdev=hp-mem0,node=3

# cat /proc/meminfo  | grep MemTotal
MemTotal:        5076048 kB
# cat /sys/devices/system/node/node2/meminfo | grep MemTotal
Node 2 MemTotal: 1048576 kB

Memory hot-remove
=================

(qemu) device_del dimm0
(qemu) object_del hp-mem0

# cat /proc/meminfo  | grep MemTotal
MemTotal:        4027472 kB
# cat /sys/devices/system/node/node2/meminfo | grep MemTotal
cat: can't open '/sys/devices/system/node/node2/meminfo': No such file or 
directory

After this point, the memory can be added again without issues with
"object_add" and "device_add". So it sounds reasonable to remove
the empty NUMA node during memory hot-remove.

Thanks,
Gavin




reply via email to

[Prev in Thread] Current Thread [Next in Thread]