[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] hw/riscv: split RAM into low and high memory
From: |
Wu, Fei |
Subject: |
Re: [PATCH] hw/riscv: split RAM into low and high memory |
Date: |
Fri, 8 Sep 2023 08:16:22 +0800 |
User-agent: |
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0 Thunderbird/102.15.0 |
On 9/7/2023 11:46 PM, Anup Patel wrote:
> On Tue, Aug 1, 2023 at 4:16 AM Daniel Henrique Barboza
> <dbarboza@ventanamicro.com> wrote:
>>
>>
>>
>> On 7/30/23 22:53, Fei Wu wrote:
>>> riscv virt platform's memory started at 0x80000000 and
>>> straddled the 4GiB boundary. Curiously enough, this choice
>>> of a memory layout will prevent from launching a VM with
>>> a bit more than 2000MiB and PCIe pass-thru on an x86 host, due
>>> to identity mapping requirements for the MSI doorbell on x86,
>>> and these (APIC/IOAPIC) live right below 4GiB.
>>>
>>> So just split the RAM range into two portions:
>>> - 1 GiB range from 0x80000000 to 0xc0000000.
>>> - The remainder at 0x100000000
>>>
>>> ...leaving a hole between the ranges.
>>
>> I am afraid this breaks some existing distro setups, like Ubuntu. After this
>> patch
>> this emulation stopped working:
>>
>> ~/work/qemu/build/qemu-system-riscv64 \
>> -machine virt -nographic -m 8G -smp 8 \
>> -kernel ./uboot-ubuntu/usr/lib/u-boot/qemu-riscv64_smode/uboot.elf \
>> -drive file=snapshot.img,format=qcow2,if=virtio \
>> -netdev bridge,id=bridge1,br=virbr0 -device
>> virtio-net-pci,netdev=bridge1
>>
>>
>> This is basically a guest created via the official Canonical tutorial:
>>
>> https://wiki.ubuntu.com/RISC-V/QEMU
>>
>> The error being thrown:
>>
>> =================
>>
>> Boot HART ID : 4
>> Boot HART Domain : root
>> Boot HART Priv Version : v1.12
>> Boot HART Base ISA : rv64imafdch
>> Boot HART ISA Extensions : time,sstc
>> Boot HART PMP Count : 16
>> Boot HART PMP Granularity : 4
>> Boot HART PMP Address Bits: 54
>> Boot HART MHPM Count : 16
>> Boot HART MIDELEG : 0x0000000000001666
>> Boot HART MEDELEG : 0x0000000000f0b509
>>
>>
>> U-Boot 2022.07+dfsg-1ubuntu4.2 (Nov 24 2022 - 18:47:41 +0000)
>>
>> CPU:
>> rv64imafdch_zicbom_zicboz_zicsr_zifencei_zihintpause_zawrs_zfa_zca_zcd_zba_zbb_zbc_zbs_sstc_svadu
>> Model: riscv-virtio,qemu
>> DRAM: Unhandled exception: Store/AMO access fault
>> EPC: 00000000802018b8 RA: 00000000802126a0 TVAL: 00000000ff733f90
>>
>> Code: b823 06b2 bc23 06b2 b023 08b2 b423 08b2 (b823 08b2)
>>
>>
>> resetting ...
>> System reset not supported on this platform
>> ### ERROR ### Please RESET the board ###
>> QEMU 8.0.90 monitor - type 'help' for more infor
>> =================
>
> Can you try again after setting CONFIG_NR_DRAM_BANKS=2 in
> qemu-riscv64_smode_defconfig and qemu-riscv64_spl_defconfig ?
>
Yes, I made a u-boot patch to change this setting and also use
fdtdec_setup_mem_size_base_lowest() instead fdtdec_setup_mem_size_base()
in dram_init(), the latter is also necessary. The patch has been posted
to u-boot mailing list but got no reply yet:
https://lists.denx.de/pipermail/u-boot/2023-September/529729.html
Thanks,
Fei.
> Regards,
> Anup
>
>>
>>
>> Based on the change made I can make an educated guess on what is going wrong.
>> We have another board with a similar memory topology you're making here, the
>> Microchip Polarfire (microchip_pfsoc.c). We were having some problems with
>> this
>> board while trying to consolidate the boot process between all boards in
>> hw/riscv/boot.c because of its non-continuous RAM bank. The full story can be
>> read in the commit message of 4b402886ac89 ("hw/riscv: change
>> riscv_compute_fdt_addr()
>> semantics") but the short version can be seen in riscv_compute_fdt_addr()
>> from boot.c:
>>
>> - if ram_start is less than 3072MiB, the FDT will be put at the lowest
>> value
>> between 3072 MiB and the end of that RAM bank;
>>
>> - if ram_start is higher than 3072 MiB the FDT will be put at the end of the
>> RAM bank.
>>
>> So, after this patch, since riscv_compute_fdt_addr() is being used with the
>> now
>> lower RAM bank, the fdt is being put in LOW_MEM - fdt_size for any setup
>> that has
>> more than 1Gb RAM, and this breaks assumptions made by uboot and Ubuntu and
>> possibly
>> others that are trying to retrieve the FDT from the gap that you created
>> between
>> low and hi mem in this patch.
>>
>> In fact, this same Ubuntu guest I mentioned above will boot if I put only 1
>> Gb of RAM
>> (-m 1Gb). If I try with -m 1.1Gb I reproduce this error. This can be a
>> validation of
>> the guess I'm making here: Ubuntu is trying to fetch stuff (probably the
>> fdt) from
>> the gap between the memory areas.
>>
>> This change on top of this patch doesn't work either:
>>
>> $ git diff
>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>> index 8fbdc7220c..dfff48d849 100644
>> --- a/hw/riscv/virt.c
>> +++ b/hw/riscv/virt.c
>> @@ -1335,9 +1335,16 @@ static void virt_machine_done(Notifier *notifier,
>> void *data)
>> kernel_start_addr, true, NULL);
>> }
>>
>> - fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base,
>> + if (machine->ram_size < memmap[VIRT_DRAM].size) {
>> + fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM].base,
>> memmap[VIRT_DRAM].size,
>> machine);
>> + } else {
>> + fdt_load_addr = riscv_compute_fdt_addr(memmap[VIRT_DRAM_HIGH].base,
>> + memmap[VIRT_DRAM_HIGH].size,
>> + machine);
>> + }
>> +
>>
>>
>> This would put the fdt at the end of the HI RAM for guests with more than
>> 1Gb of RAM.
>> This change in fact makes the situation even worse, breaking setups that
>> were working
>> before with this patch.
>>
>> There's a chance that reducing the gap between the RAM banks can make Ubuntu
>> work
>> reliably again but now I'm a little cold feet with this change.
>>
>>
>> I think we'll need some kernel/Opensbi folks to weight in here to see if
>> there's a
>> failsafe memory setup that won't break distros out there and allow your
>> passthrough
>> to work.
>>
>>
>>
>> Thanks,
>>
>> Daniel
>>
>>>
>>> Signed-off-by: Andrei Warkentin <andrei.warkentin@intel.com>
>>> Signed-off-by: Fei Wu <fei2.wu@intel.com>
>>> ---
>>> hw/riscv/virt.c | 74 ++++++++++++++++++++++++++++++++++++-----
>>> include/hw/riscv/virt.h | 1 +
>>> 2 files changed, 66 insertions(+), 9 deletions(-)
>>>
>>> diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
>>> index d90286dc46..8fbdc7220c 100644
>>> --- a/hw/riscv/virt.c
>>> +++ b/hw/riscv/virt.c
>>> @@ -75,7 +75,9 @@
>>> #error "Can't accomodate all IMSIC groups in address space"
>>> #endif
>>>
>>> -static const MemMapEntry virt_memmap[] = {
>>> +#define LOW_MEM (1 * GiB)
>>> +
>>> +static MemMapEntry virt_memmap[] = {
>>> [VIRT_DEBUG] = { 0x0, 0x100 },
>>> [VIRT_MROM] = { 0x1000, 0xf000 },
>>> [VIRT_TEST] = { 0x100000, 0x1000 },
>>> @@ -96,6 +98,7 @@ static const MemMapEntry virt_memmap[] = {
>>> [VIRT_PCIE_ECAM] = { 0x30000000, 0x10000000 },
>>> [VIRT_PCIE_MMIO] = { 0x40000000, 0x40000000 },
>>> [VIRT_DRAM] = { 0x80000000, 0x0 },
>>> + [VIRT_DRAM_HIGH] = { 0x100000000, 0x0 },
>>> };
>>>
>>> /* PCIe high mmio is fixed for RV32 */
>>> @@ -295,15 +298,12 @@ static void create_fdt_socket_cpus(RISCVVirtState *s,
>>> int socket,
>>> }
>>> }
>>>
>>> -static void create_fdt_socket_memory(RISCVVirtState *s,
>>> - const MemMapEntry *memmap, int socket)
>>> +static void create_fdt_socket_mem_range(RISCVVirtState *s, uint64_t addr,
>>> + uint64_t size, int socket)
>>> {
>>> char *mem_name;
>>> - uint64_t addr, size;
>>> MachineState *ms = MACHINE(s);
>>>
>>> - addr = memmap[VIRT_DRAM].base + riscv_socket_mem_offset(ms, socket);
>>> - size = riscv_socket_mem_size(ms, socket);
>>> mem_name = g_strdup_printf("/memory@%lx", (long)addr);
>>> qemu_fdt_add_subnode(ms->fdt, mem_name);
>>> qemu_fdt_setprop_cells(ms->fdt, mem_name, "reg",
>>> @@ -313,6 +313,34 @@ static void create_fdt_socket_memory(RISCVVirtState *s,
>>> g_free(mem_name);
>>> }
>>>
>>> +static void create_fdt_socket_memory(RISCVVirtState *s,
>>> + const MemMapEntry *memmap, int socket)
>>> +{
>>> + uint64_t addr, size;
>>> + MachineState *mc = MACHINE(s);
>>> + uint64_t sock_offset = riscv_socket_mem_offset(mc, socket);
>>> + uint64_t sock_size = riscv_socket_mem_size(mc, socket);
>>> +
>>> + if (sock_offset < memmap[VIRT_DRAM].size) {
>>> + uint64_t low_mem_end = memmap[VIRT_DRAM].base +
>>> memmap[VIRT_DRAM].size;
>>> +
>>> + addr = memmap[VIRT_DRAM].base + sock_offset;
>>> + size = MIN(low_mem_end - addr, sock_size);
>>> + create_fdt_socket_mem_range(s, addr, size, socket);
>>> +
>>> + size = sock_size - size;
>>> + if (size > 0) {
>>> + create_fdt_socket_mem_range(s, memmap[VIRT_DRAM_HIGH].base,
>>> + size, socket);
>>> + }
>>> + } else {
>>> + addr = memmap[VIRT_DRAM_HIGH].base +
>>> + sock_offset - memmap[VIRT_DRAM].size;
>>> +
>>> + create_fdt_socket_mem_range(s, addr, sock_size, socket);
>>> + }
>>> +}
>>> +
>>> static void create_fdt_socket_clint(RISCVVirtState *s,
>>> const MemMapEntry *memmap, int socket,
>>> uint32_t *intc_phandles)
>>> @@ -1334,10 +1362,12 @@ static void virt_machine_done(Notifier *notifier,
>>> void *data)
>>>
>>> static void virt_machine_init(MachineState *machine)
>>> {
>>> - const MemMapEntry *memmap = virt_memmap;
>>> + MemMapEntry *memmap = virt_memmap;
>>> RISCVVirtState *s = RISCV_VIRT_MACHINE(machine);
>>> MemoryRegion *system_memory = get_system_memory();
>>> MemoryRegion *mask_rom = g_new(MemoryRegion, 1);
>>> + MemoryRegion *ram_below_4g, *ram_above_4g;
>>> + uint64_t ram_size_low, ram_size_high;
>>> char *soc_name;
>>> DeviceState *mmio_irqchip, *virtio_irqchip, *pcie_irqchip;
>>> int i, base_hartid, hart_count;
>>> @@ -1448,6 +1478,17 @@ static void virt_machine_init(MachineState *machine)
>>> }
>>> }
>>>
>>> + if (machine->ram_size > LOW_MEM) {
>>> + ram_size_high = machine->ram_size - LOW_MEM;
>>> + ram_size_low = LOW_MEM;
>>> + } else {
>>> + ram_size_high = 0;
>>> + ram_size_low = machine->ram_size;
>>> + }
>>> +
>>> + memmap[VIRT_DRAM].size = ram_size_low;
>>> + memmap[VIRT_DRAM_HIGH].size = ram_size_high;
>>> +
>>> if (riscv_is_32bit(&s->soc[0])) {
>>> #if HOST_LONG_BITS == 64
>>> /* limit RAM size in a 32-bit system */
>>> @@ -1460,7 +1501,8 @@ static void virt_machine_init(MachineState *machine)
>>> virt_high_pcie_memmap.size = VIRT32_HIGH_PCIE_MMIO_SIZE;
>>> } else {
>>> virt_high_pcie_memmap.size = VIRT64_HIGH_PCIE_MMIO_SIZE;
>>> - virt_high_pcie_memmap.base = memmap[VIRT_DRAM].base +
>>> machine->ram_size;
>>> + virt_high_pcie_memmap.base = memmap[VIRT_DRAM_HIGH].base +
>>> + memmap[VIRT_DRAM_HIGH].size;
>>> virt_high_pcie_memmap.base =
>>> ROUND_UP(virt_high_pcie_memmap.base,
>>> virt_high_pcie_memmap.size);
>>> }
>>> @@ -1468,8 +1510,22 @@ static void virt_machine_init(MachineState *machine)
>>> s->memmap = virt_memmap;
>>>
>>> /* register system main memory (actual RAM) */
>>> + ram_below_4g = g_malloc(sizeof(*ram_below_4g));
>>> + memory_region_init_alias(ram_below_4g, NULL, "ram-below-4g",
>>> machine->ram,
>>> + 0, memmap[VIRT_DRAM].size);
>>> memory_region_add_subregion(system_memory, memmap[VIRT_DRAM].base,
>>> - machine->ram);
>>> + ram_below_4g);
>>> +
>>> + if (memmap[VIRT_DRAM_HIGH].size) {
>>> + ram_above_4g = g_malloc(sizeof(*ram_above_4g));
>>> + memory_region_init_alias(ram_above_4g, NULL, "ram-above-4g",
>>> + machine->ram,
>>> + memmap[VIRT_DRAM].size,
>>> + memmap[VIRT_DRAM_HIGH].size);
>>> + memory_region_add_subregion(system_memory,
>>> + memmap[VIRT_DRAM_HIGH].base,
>>> + ram_above_4g);
>>> + }
>>>
>>> /* boot rom */
>>> memory_region_init_rom(mask_rom, NULL, "riscv_virt_board.mrom",
>>> diff --git a/include/hw/riscv/virt.h b/include/hw/riscv/virt.h
>>> index e5c474b26e..36004eb6ef 100644
>>> --- a/include/hw/riscv/virt.h
>>> +++ b/include/hw/riscv/virt.h
>>> @@ -79,6 +79,7 @@ enum {
>>> VIRT_IMSIC_S,
>>> VIRT_FLASH,
>>> VIRT_DRAM,
>>> + VIRT_DRAM_HIGH,
>>> VIRT_PCIE_MMIO,
>>> VIRT_PCIE_PIO,
>>> VIRT_PLATFORM_BUS,
>>