|
From: | Gavin Shan |
Subject: | Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions |
Date: | Fri, 5 Aug 2022 18:36:19 +1000 |
User-agent: | Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.2.0 |
Hi Eric, On 8/4/22 5:19 PM, Eric Auger wrote:
On 8/4/22 04:47, Gavin Shan wrote:On 8/3/22 10:52 PM, Eric Auger wrote:On 8/3/22 15:02, Gavin Shan wrote:On 8/3/22 5:01 PM, Marc Zyngier wrote:On Wed, 03 Aug 2022 04:01:04 +0100, Gavin Shan <gshan@redhat.com> wrote:On 8/2/22 7:41 PM, Eric Auger wrote:On 8/2/22 08:45, Gavin Shan wrote:There are 3 highmem IO regions as below. They can be disabled in two situations: (a) The specific region is disabled by user. (b) The specific region doesn't fit in the PA space. However, the base address and highest_gpa are still updated no matter if the region is enabled or disabled. It's incorrectly incurring waste in the PA space.If I am not wrong highmem_redists and highmem_mmio are not user selectable Only highmem ecam depends on machine type & ACPI setup. But I would say that in server use case it is always set. So is that optimization really needed?There are two other cases you missed. - highmem_ecam is enabled after virt-2.12, meaning it stays disabled before that.I don't get this. The current behaviour is to disable highmem_ecam if it doesn't fit in the PA space. I can't see anything that enables it if it was disabled the first place.There are several places or conditions where vms->highmem_ecam can be disabled: - virt_instance_init() where vms->highmem_ecam is inherited from !vmc->no_highmem_ecam. The option is set to true after virt-2.12 in virt_machine_2_12_options(). - machvirt_init() where vms->highmem_ecam can be disable if we have 32-bits vCPUs and failure on loading firmware. - Another place is where we're talking about. It's address assignment to fit the PA space.- The high memory region can be disabled if user is asking large (normal) memory space through 'maxmem=' option. When the requested memory by 'maxmem=' is large enough, the high memory regions are disabled. It means the normal memory has higher priority than those high memory regions. This is the case I provided in (b) of the commit log.Why is that a problem? It matches the expected behaviour, as the highmem IO region is floating and is pushed up by the memory region.Eric thought that VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO regions aren't user selectable. I tended to explain why it's not true. 'maxmem=' can affect the outcome. When 'maxmem=' value is big enough, there will be no free area in the PA space to hold those two regions.In the commit log, I was supposed to say something like below for (a): - The specific high memory region can be disabled through changing the code by user or developer. For example, 'vms->highmem_mmio' is changed from true to false in virt_instance_init().Huh. By this principle, the user can change anything. Why is it important?Still like above. I was explaining the possible cases where those 3 switches can be turned on/off by users or developers. Our code needs to be consistent and comprehensive. vms->highmem_redists vms->highmem_ecam vms->mmioImprove address assignment for highmem IO regions to avoid the waste in the PA space by putting the logic into virt_memmap_fits().I guess that this is what I understand the least. What do you mean by "wasted PA space"? Either the regions fit in the PA space, and computing their addresses in relevant, or they fall outside of it and what we stick in memap[index].base is completely irrelevant.It's possible that we run into the following combination. we should have enough PA space to enable VIRT_HIGH_PCIE_MMIO region. However, the region is disabled in the original implementation because VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is unecessary and waste in the PA space.each region's base is aligned on its size.Yes.static MemMapEntry extended_memmap[] = { [VIRT_HIGH_GIC_REDIST2] = { 0x0, 64 * MiB }, [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB }, [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not imply any amount of RAM. This depends on the address space. I };Yes. Prior to the start of system memory, there is 1GB used by various regions either.yesIPA_LIMIT = (1UL << 40) '-maxmem' = 511GB /* Memory starts from 1GB */ '-slots' = 0 vms->highmem_rdist2 = falseHow can this happen? the only reason for highmem_redists to be reset is if it does not fit into map_ipa. So if mmio fits, highmem_redists does too. What do I miss?The example is having "vms->highmem_rdist2 = flase" BEFORE the address assignment, it's possible that developer changes the code to disable it intentionally. The point is the original implementation isn't comprehensive because it has the wrong assumption that vms->highmem_{rdist2, ecam, mmio} all true before the address assignment. With the wrong assumption, the base address is always increased, even the previous region is disabled, during the address assignment in virt_set_memmap().Yes we currently always provision space for those functionalities, independently on whether they are used. But that's true for many other regions in the address map (although smaller) ;-)
Yep :)
vms->highmem_ecam = falsevms->highmem_mmio = true I am still skeptic about the relevance of such optim. Isn't it sensible to expect the host to support large IPA if you want to use such amount of memory. I think we should rather better document the memory map from a user point of view. Besides if you change the memory map, you need to introduce yet another option to avoid breaking migration, no?These 3 high memory regions consumes 513GB with alignment considered when all of them are enabled. The point is those 3 high memory regions, or part of them can be squeezed or smashed to accommodate '-maxmem' by design. I think there are two options I can think of. I personally prefer to keep the capability. With this, users gain broader range for their '-maxmem'. Please let me know your preference. - Revert the capability of squeezing or smashing those high memory regions to accommodate '-maxmem'. It means vms->high_{redist2, ecam, mmio} can't be disable on conflicts to the PA space. - Keep the capability, with this optimization applied to make the implementation comprehensive. I think it's worthy to add something about this limitation in doc/system/arm/virt.rst.indeed that's worth in any case.I don't think I need introduce another option to avoid migration breakage. Could you explain why I need the extra option?I think you do. Before and after this patch the QEMU memory regions associated to those devices won't be a the same location in the memory so if you migrate from an old version to a newer one, the guest won't be able to access them OK I have given my own opinion about those potential changes. Let's wait for others' ;-)
Thank you for your comments. Yeah, I would hold to post v2 to collect more comments :) I'm still don't understand how it's affecting migration. If I understand correct, the changed address based doesn't affect migration, as explained like below. It took me while to looking the source code to figure out how VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} is linked to GIC and PCIe host and migration. For VIRTH_HIGH_GIC_REDIST2 region, it's used by TYPE_ARM_GICV3 or TYPE_KVM_ARM_GICV3. TYPE_KVM_ARM_GICV3. For both cases, we do NOT migrate the region directly. Instead, the registers are saved to GICv3CPUState. GICv3CPUState is migrate and the registers are restored from the instance. For VIRT_HIGH_PCIE_ECAM, the registers are saved to PCIDevice::config. The buffer is migrated and PCI's config space is restored from it. In hw/net/e1000e.c, e1000e_vmstate has something like below embedded: VMSTATE_PCI_DEVICE(parent_obj, E1000EState), --- I also did one experiment by having different address bases on the source and destination. Migration succeeds. address regions from source --------------------------- virt_memmap_fits: HIGH_GIC_REDIST2: [0000004000000000 0000000004000000] virt_memmap_fits: HIGH_GIC_ECAM: [0000004010000000 0000000010000000] virt_memmap_fits: HIGH_GIC_MMIO: [0000008000000000 0000008000000000] address regions from destination -------------------------------- virt_memmap_fits: HIGH_GIC_REDIST2: [0000004040000000 0000000004000000] virt_memmap_fits: HIGH_GIC_ECAM: [0000004050000000 0000000010000000] virt_memmap_fits: HIGH_GIC_MMIO: [0000008000000000 0000008000000000]
Signed-off-by: Gavin Shan <gshan@redhat.com> --- hw/arm/virt.c | 54 +++++++++++++++++++++++++++++---------------------- 1 file changed, 31 insertions(+), 23 deletions(-) diff --git a/hw/arm/virt.c b/hw/arm/virt.c index 9633f822f3..bc0cd218f9 100644 --- a/hw/arm/virt.c +++ b/hw/arm/virt.c @@ -1688,6 +1688,34 @@ static uint64_t virt_cpu_mp_affinity(VirtMachineState *vms, int idx) return arm_cpu_mp_affinity(idx, clustersz); } +static void virt_memmap_fits(VirtMachineState *vms, int index, + bool *enabled, hwaddr *base, int pa_bits) +{ + hwaddr size = extended_memmap[index].size; + + /* The region will be disabled if its size isn't given */ + if (!*enabled || !size) {In which case do you have !size?Yeah, we don't have !size and the condition should be removed.+ *enabled = false; + vms->memmap[index].base = 0; + vms->memmap[index].size = 0;It looks dangerous to me to reset the region's base and size like that. for instance fdt_add_gic_node() will add dummy data in the dt.I would guess you missed that the high memory regions won't be exported through device-tree if they have been disabled. We have a check there, which is "if (nb_redist_regions == 1)"+ return; + } + + /* + * Check if the memory region fits in the PA space. The memory map + * and highest_gpa are updated if it fits. Otherwise, it's disabled. + */ + *enabled = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits));using a 'fits' local variable would make the code more obvious I thinkLets confirm if you're suggesting something like below? bool fits; fits = (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)); if (fits) { /* update *base, memory mapping, highest_gpa */ } else { *enabled = false; } I guess we can simply do if (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)) { /* update *base, memory mapping, highest_gpa */ } else { *enabled = false; } Please let me know which one looks best to you.Why should the 'enabled' flag be updated by this function, instead of returning the value and keeping it as an assignment in the caller function? It is purely stylistic though.The idea to put the logic, address assignment for those 3 high memory regions or updating the flags (vms->high_redist2/ecam/mmio), to the newly introduced function, to make virt_set_memmap() a bit simplified. Eric tends to agree the changes with minor adjustments. So I'm going to keep it as of being, which doesn't mean the stylistic thought is a bad one :) Lastly, I need to rewrite the comit log completely because it seems causing confusions to Eric and you.
Thanks, Gavin
[Prev in Thread] | Current Thread | [Next in Thread] |