qemu-devel

Re: [PATCH] mem/x86: add processor address space check for VM memory


From: Ani Sinha
Subject: Re: [PATCH] mem/x86: add processor address space check for VM memory
Date: Thu, 14 Sep 2023 11:23:42 +0530


> On 12-Sep-2023, at 9:04 PM, David Hildenbrand <david@redhat.com> wrote:
> 
> [...]
> 
>>> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
>>> index 54838c0c41..d187890675 100644
>>> --- a/hw/i386/pc.c
>>> +++ b/hw/i386/pc.c
>>> @@ -908,9 +908,12 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
>>> {
>>>     X86CPU *cpu = X86_CPU(first_cpu);
>>> 
>>> -    /* 32-bit systems don't have hole64 thus return max CPU address */
>>> -    if (cpu->phys_bits <= 32) {
>>> -        return ((hwaddr)1 << cpu->phys_bits) - 1;
>>> +    /*
>>> +     * 32-bit systems don't have hole64, but we might have a region for
>>> +     * memory hotplug.
>>> +     */
>>> +    if (!(cpu->env.features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM)) {
>>> +        return pc_pci_hole64_start() - 1;
>> Ok this is very confusing! I am looking at pc_pci_hole64_start() function. I 
>> have a few questions …
>> (a) pc_get_device_memory_range() returns the size of the device memory as 
>> the difference between ram_size and maxram_size. But from what I understand, 
>> ram_size is the actual size of the ram present and maxram_size is the max 
>> size of ram *after* hot plugging additional memory. How can we assume that 
>> the additional available space is already occupied by hot plugged memory?
> 
> Let's take a look at an example:
> 
> $ ./build/qemu-system-x86_64 -m 8g,maxmem=16g,slots=1 \
>  -object memory-backend-ram,id=mem0,size=1g \
>  -device pc-dimm,memdev=mem0 \
>  -nodefaults -nographic -S -monitor stdio
> 
> (qemu) info mtree
> ...
> memory-region: system
>  0000000000000000-ffffffffffffffff (prio 0, i/o): system
>    0000000000000000-00000000bfffffff (prio 0, ram): alias ram-below-4g @pc.ram 0000000000000000-00000000bfffffff
>    0000000000000000-ffffffffffffffff (prio -1, i/o): pci
>      00000000000c0000-00000000000dffff (prio 1, rom): pc.rom
>      00000000000e0000-00000000000fffff (prio 1, rom): alias isa-bios @pc.bios 0000000000020000-000000000003ffff
>      00000000fffc0000-00000000ffffffff (prio 0, rom): pc.bios
>    00000000000a0000-00000000000bffff (prio 1, i/o): alias smram-region @pci 00000000000a0000-00000000000bffff
>    00000000000c0000-00000000000c3fff (prio 1, i/o): alias pam-pci @pci 00000000000c0000-00000000000c3fff
>    00000000000c4000-00000000000c7fff (prio 1, i/o): alias pam-pci @pci 00000000000c4000-00000000000c7fff
>    00000000000c8000-00000000000cbfff (prio 1, i/o): alias pam-pci @pci 00000000000c8000-00000000000cbfff
>    00000000000cc000-00000000000cffff (prio 1, i/o): alias pam-pci @pci 00000000000cc000-00000000000cffff
>    00000000000d0000-00000000000d3fff (prio 1, i/o): alias pam-pci @pci 00000000000d0000-00000000000d3fff
>    00000000000d4000-00000000000d7fff (prio 1, i/o): alias pam-pci @pci 00000000000d4000-00000000000d7fff
>    00000000000d8000-00000000000dbfff (prio 1, i/o): alias pam-pci @pci 00000000000d8000-00000000000dbfff
>    00000000000dc000-00000000000dffff (prio 1, i/o): alias pam-pci @pci 00000000000dc000-00000000000dffff
>    00000000000e0000-00000000000e3fff (prio 1, i/o): alias pam-pci @pci 00000000000e0000-00000000000e3fff
>    00000000000e4000-00000000000e7fff (prio 1, i/o): alias pam-pci @pci 00000000000e4000-00000000000e7fff
>    00000000000e8000-00000000000ebfff (prio 1, i/o): alias pam-pci @pci 00000000000e8000-00000000000ebfff
>    00000000000ec000-00000000000effff (prio 1, i/o): alias pam-pci @pci 00000000000ec000-00000000000effff
>    00000000000f0000-00000000000fffff (prio 1, i/o): alias pam-pci @pci 00000000000f0000-00000000000fffff
>    00000000fec00000-00000000fec00fff (prio 0, i/o): ioapic
>    00000000fed00000-00000000fed003ff (prio 0, i/o): hpet
>    00000000fee00000-00000000feefffff (prio 4096, i/o): apic-msi
>    0000000100000000-000000023fffffff (prio 0, ram): alias ram-above-4g @pc.ram 00000000c0000000-00000001ffffffff
>    0000000240000000-000000047fffffff (prio 0, i/o): device-memory
>      0000000240000000-000000027fffffff (prio 0, ram): mem0
> 
> 
> We requested 8G of boot memory, which is split between "<4G" memory and 
> ">=4G" memory.
> 
> We only place exactly 3G (0x0->0xbfffffff) under 4G, starting at address 0.

I can’t reconcile this with this code for q35:

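    if (machine->ram_size >= 0xb0000000) {
        lowmem = 0x80000000; /* RAM below 4G ends at 0x7fffffff, i.e. 2 GiB */
    } else {
        lowmem = 0xb0000000; /* RAM below 4G ends at 0xafffffff, i.e. 2.75 GiB */
    }

You assigned 8 GiB to RAM, which is >= 0xb0000000 (2.75 GiB), so from this code
I would expect 2 GiB below 4G, not the 3 GiB shown in your mtree output.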
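
To make the question concrete, here is a minimal standalone sketch of the split I read out of that snippet. RamSplit and q35_ram_split() are names made up for illustration, not QEMU functions, and the sketch assumes nothing else in the q35 init code changes lowmem afterwards.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical type for illustration; not part of QEMU. */
typedef struct {
    uint64_t below_4g;  /* bytes of RAM mapped starting at address 0 */
    uint64_t above_4g;  /* bytes of RAM mapped starting at 4 GiB */
} RamSplit;

/* Mirrors only the threshold logic of the snippet quoted above. */
static RamSplit q35_ram_split(uint64_t ram_size)
{
    uint64_t lowmem = (ram_size >= 0xb0000000ULL) ? 0x80000000ULL
                                                  : 0xb0000000ULL;
    RamSplit s;

    if (ram_size >= lowmem) {
        s.below_4g = lowmem;
        s.above_4g = ram_size - lowmem;
    } else {
        /* Everything fits below 4G. */
        s.below_4g = ram_size;
        s.above_4g = 0;
    }
    /* Sanity check: the split must account for all of the RAM. */
    assert(s.below_4g + s.above_4g == ram_size);
    return s;
}
```

With ram_size = 8 GiB this gives 2 GiB below 4G and 6 GiB above, which is what I cannot match against the 3 GiB / 5 GiB split in the mtree output.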


> 
> We leave the remainder (1G) of the <4G addresses available for I/O devices 
> (32bit PCI hole).
> 
> So we end up with 5G (0x100000000->0x23fffffff) of memory starting exactly at 
> address 4G.
> 
> "maxram_size - ram_size"=8G is the maximum amount of memory you can hotplug. 
> We use it to size the
> "device-memory" region:
> 
> 0x47fffffff - 0x240000000+1 = 0x240000000
> -> 9 GiB
> 
> We requested to hotplug a maximum of "8 GiB", and sized the area slightly 
> larger to allow for some flexibility
> when it comes to placing DIMMs in that "device-memory" area.

Right, but in this example you do not hotplug memory while the VM is running.
Yes, we can hotplug up to 8G, but that memory may not physically exist yet
(and may never exist). How can we use this arithmetic to size device-memory
when the memory behind it may not exist?
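
For reference, the arithmetic in the quoted example can be reproduced with a few lines of standalone C. This is a sketch, not QEMU code: DeviceMemRange, device_memory_range() and round_up() are illustrative names, and the "one extra GiB per slot" slack and the 1 GiB base alignment are my assumptions from reading pc_get_device_memory_range(). They do match the 9 GiB in the example.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/* Hypothetical type for illustration; not part of QEMU. */
typedef struct {
    uint64_t base;  /* first guest-physical address of the region */
    uint64_t size;  /* size of the region in bytes */
} DeviceMemRange;

/*
 * Assumption: the region starts at the end of the above-4G RAM rounded up
 * to 1 GiB, and is sized as the hotpluggable amount plus 1 GiB per slot.
 */
static DeviceMemRange device_memory_range(uint64_t above_4g_end,
                                          uint64_t ram_size,
                                          uint64_t maxram_size,
                                          unsigned slots)
{
    DeviceMemRange r;

    assert(maxram_size >= ram_size);
    r.base = round_up(above_4g_end, GiB);
    r.size = (maxram_size - ram_size) + (uint64_t)slots * GiB;
    return r;
}
```

For -m 8g,maxmem=16g,slots=1 with above-4G RAM ending at 0x240000000, this yields base 0x240000000 and size 0x240000000, i.e. the 0x240000000-0x47fffffff range from the mtree. The sketch only computes a range of guest-physical addresses from the command-line values; it says nothing about whether any memory backs that range.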

> 
> We place that area for memory devices after the RAM. So it starts after the 
> 5G of ">=4G" boot memory.
> 
> 
> Long story short, based on the initial RAM size and the maximum RAM size, you
> can construct the layout above and exactly know
> a) How much memory is below 4G, starting at address 0 -> leaving 1G for the 
> 32bit PCI hole
> b) How much memory is above 4G, starting at address 4g.
> c) Where the region for memory devices starts (aligned after b) ) and how big 
> it is.
> d) Where the 64bit PCI hole is (after c) )
> 
>> (b) Another question is, in pc_pci_hole64_start(), why are we adding this 
>> size to the start address?
>> } else if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) {
>>      pc_get_device_memory_range(pcms, &hole64_start, &size);
>>         if (!pcmc->broken_reserved_end) {
>>             hole64_start += size;
> 
> The 64bit PCI hole starts after "device-memory" above.
> 
> Apparently, we have to take care of some layout issues before QEMU 2.5. You 
> can assume that nowadays,
> "pcmc->broken_reserved_end" is never set. So the PCI64 hole is always after 
> the device-memory region.
> 
>> I think this is trying to put the hole after the device memory. But if the 
>> ram size is <=maxram_size then the hole is after the above_4G memory? Why?
> 
> I didn't quite get what the concern is, can you elaborate?

Oh, I made a typo there. I meant the else branch below, which implies
ram_size == maxram_size:

    } else {
        hole64_start = pc_above_4g_end(pcms);
    }

So in this case, there is no device-memory region?!

Another thing I do not understand: for 32-bit, above_4g_mem_start is 4 GiB
and above_4g_mem_size = ram_size - lowmem.
So we are placing "above-4G" RAM beyond the address space of the processor?!
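
Here is my reading of the two branches as a standalone sketch. It is a simplification under assumptions: hole64_start_sketch() and round_up() are illustrative names, it ignores has_reserved_memory, broken_reserved_end and whatever branch precedes the "else if", and both the 1 GiB per slot slack and the final 1 GiB round-up are my assumptions about the real function.

```c
#include <assert.h>
#include <stdint.h>

#define GiB (1ULL << 30)

/* Round x up to a multiple of align; align must be a power of two. */
static uint64_t round_up(uint64_t x, uint64_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    return (x + align - 1) & ~(align - 1);
}

/*
 * Simplified model of where the 64-bit PCI hole starts:
 *  - ram_size < maxram_size: after the device-memory region
 *  - otherwise (ram_size == maxram_size): right after the above-4G RAM
 */
static uint64_t hole64_start_sketch(uint64_t above_4g_end,
                                    uint64_t ram_size,
                                    uint64_t maxram_size,
                                    unsigned slots)
{
    uint64_t start;

    assert(maxram_size >= ram_size);
    if (ram_size < maxram_size) {
        uint64_t base = round_up(above_4g_end, GiB);
        uint64_t size = (maxram_size - ram_size) + (uint64_t)slots * GiB;

        start = base + size;
    } else {
        /* No hotpluggable memory, so no device-memory region to skip. */
        start = above_4g_end;
    }
    return round_up(start, GiB);
}
```

With the values from the example (above-4G RAM ending at 0x240000000, 8 GiB hotpluggable, one slot) the hole would start at 0x480000000. With ram_size == maxram_size it would start at 0x240000000, directly after the above-4G RAM.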

> 
>> (c) in your above change, what does long mode have anything to do with all 
>> of this?
> 
> According to my understanding, 32bit (i386) doesn't have a 64bit hole. And 
> 32bit vs.
> 64bit (i386 vs. x86_64) is decided based on LM, not on the address bits (as 
> we learned, PSE36, and PAE).
> 
> But really, I just did what x86_cpu_realizefn() does to decide 32bit vs. 
> 64bit ;)
> 
>    /* For 64bit systems think about the number of physical bits to present.
>     * ideally this should be the same as the host; anything other than 
> matching
>     * the host can cause incorrect guest behaviour.
>     * QEMU used to pick the magic value of 40 bits that corresponds to
>     * consumer AMD devices but nothing else.
>     *
>     * Note that this code assumes features expansion has already been done
>     * (as it checks for CPUID_EXT2_LM), and also assumes that potential
>     * phys_bits adjustments to match the host have been already done in
>     * accel-specific code in cpu_exec_realizefn.
>     */
>    if (env->features[FEAT_8000_0001_EDX] & CPUID_EXT2_LM) {
>    ...
>    } else {
>        /* For 32 bit systems don't use the user set value, but keep
>         * phys_bits consistent with what we tell the guest.
>         */

Ah, I see, I missed this. But I still can't understand why, for 32-bit,
pc_pci_hole64_start() would give the right address for the max used GPA.
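
What I am ultimately after is a check along the following lines. This is a rough sketch under assumptions, not the patch itself: cpu_is_64bit(), max_addressable_gpa() and gpa_fits() are hypothetical helpers, and the only hard fact used is that long mode is reported in CPUID leaf 0x80000001, EDX bit 29 (what QEMU calls CPUID_EXT2_LM).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CPUID leaf 0x80000001, EDX bit 29: long mode (64-bit) support. */
#define EXT2_LM_BIT (1u << 29)

/* 32-bit vs. 64-bit is decided by the LM feature bit, not by phys_bits. */
static bool cpu_is_64bit(uint32_t ext2_edx)
{
    return (ext2_edx & EXT2_LM_BIT) != 0;
}

/* Highest guest-physical address reachable with phys_bits address bits. */
static uint64_t max_addressable_gpa(unsigned phys_bits)
{
    assert(phys_bits >= 1 && phys_bits <= 64);
    return phys_bits == 64 ? UINT64_MAX : ((1ULL << phys_bits) - 1);
}

/* True if the highest GPA used by the VM layout is reachable by the CPU. */
static bool gpa_fits(uint64_t max_used_gpa, unsigned phys_bits)
{
    return max_used_gpa <= max_addressable_gpa(phys_bits);
}
```

For the layout in the example, the device-memory region ends at 0x47fffffff. That does not fit in 32 physical address bits but does fit in 36, so the answer depends entirely on which value we treat as the max used GPA for a CPU without LM.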

>    ...
> 
> 
> But that was just my quick attempt at fixing pc_max_used_gpa().
> 
> *Maybe* there is a 64bit PCI hole on 32bit i386 with 36bit addresses?
> 
> I'm the wrong person to ask, but I kind-of doubt it. :)
> 
> 
> -- 
> Cheers,
> 
> David / dhildenb
> 



