[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO r
From: |
Eric Auger |
Subject: |
Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions |
Date: |
Fri, 5 Aug 2022 10:00:24 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.11.0 |
Hi Gavin,
On 8/5/22 10:36, Gavin Shan wrote:
> Hi Eric,
>
> On 8/4/22 5:19 PM, Eric Auger wrote:
>> On 8/4/22 04:47, Gavin Shan wrote:
>>> On 8/3/22 10:52 PM, Eric Auger wrote:
>>>> On 8/3/22 15:02, Gavin Shan wrote:
>>>>> On 8/3/22 5:01 PM, Marc Zyngier wrote:
>>>>>> On Wed, 03 Aug 2022 04:01:04 +0100,
>>>>>> Gavin Shan <gshan@redhat.com> wrote:
>>>>>>> On 8/2/22 7:41 PM, Eric Auger wrote:
>>>>>>>> On 8/2/22 08:45, Gavin Shan wrote:
>>>>>>>>> There are 3 highmem IO regions as below. They can be disabled in
>>>>>>>>> two situations: (a) The specific region is disabled by user. (b)
>>>>>>>>> The specific region doesn't fit in the PA space. However, the
>>>>>>>>> base
>>>>>>>>> address and highest_gpa are still updated no matter if the region
>>>>>>>>> is enabled or disabled. It's incorrectly incurring waste in
>>>>>>>>> the PA
>>>>>>>>> space.
>>>>>>>> If I am not wrong highmem_redists and highmem_mmio are not user
>>>>>>>> selectable
>>>>>>>>
>>>>>>>> Only highmem ecam depends on machine type & ACPI setup. But I
>>>>>>>> would
>>>>>>>> say
>>>>>>>> that in server use case it is always set. So is that optimization
>>>>>>>> really
>>>>>>>> needed?
>>>>>>>
>>>>>>> There are two other cases you missed.
>>>>>>>
>>>>>>> - highmem_ecam is enabled after virt-2.12, meaning it stays
>>>>>>> disabled
>>>>>>> before that.
>>>>>>
>>>>>> I don't get this. The current behaviour is to disable
>>>>>> highmem_ecam if
>>>>>> it doesn't fit in the PA space. I can't see anything that enables it
>>>>>> if it was disabled the first place.
>>>>>>
>>>>>
>>>>> There are several places or conditions where vms->highmem_ecam can be
>>>>> disabled:
>>>>>
>>>>> - virt_instance_init() where vms->highmem_ecam is inherited from
>>>>> !vmc->no_highmem_ecam. The option is set to true after virt-2.12
>>>>> in virt_machine_2_12_options().
>>>>>
>>>>> - machvirt_init() where vms->highmem_ecam can be disable if we have
>>>>> 32-bits vCPUs and failure on loading firmware.
>>>>>
>>>>> - Another place is where we're talking about. It's address assignment
>>>>> to fit the PA space.
>>>>>
>>>>>>>
>>>>>>> - The high memory region can be disabled if user is asking large
>>>>>>> (normal) memory space through 'maxmem=' option. When the
>>>>>>> requested
>>>>>>> memory by 'maxmem=' is large enough, the high memory
>>>>>>> regions are
>>>>>>> disabled. It means the normal memory has higher priority than
>>>>>>> those
>>>>>>> high memory regions. This is the case I provided in (b) of the
>>>>>>> commit log.
>>>>>>
>>>>>> Why is that a problem? It matches the expected behaviour, as the
>>>>>> highmem IO region is floating and is pushed up by the memory region.
>>>>>>
>>>>>
>>>>> Eric thought that VIRT_HIGH_GIC_REDIST2 and VIRT_HIGH_PCIE_MMIO
>>>>> regions
>>>>> aren't user selectable. I tended to explain why it's not true.
>>>>> 'maxmem='
>>>>> can affect the outcome. When 'maxmem=' value is big enough, there
>>>>> will be
>>>>> no free area in the PA space to hold those two regions.
>>>>>
>>>>>>>
>>>>>>> In the commit log, I was supposed to say something like below for
>>>>>>> (a):
>>>>>>>
>>>>>>> - The specific high memory region can be disabled through changing
>>>>>>> the code by user or developer. For example,
>>>>>>> 'vms->highmem_mmio'
>>>>>>> is changed from true to false in virt_instance_init().
>>>>>>
>>>>>> Huh. By this principle, the user can change anything. Why is it
>>>>>> important?
>>>>>>
>>>>>
>>>>> Still like above. I was explaining the possible cases where those
>>>>> 3 switches can be turned on/off by users or developers. Our code
>>>>> needs to be consistent and comprehensive.
>>>>>
>>>>> vms->highmem_redists
>>>>> vms->highmem_ecam
>>>>> vms->mmio
>>>>>
>>>>>>>
>>>>>>>>>
>>>>>>>>> Improve address assignment for highmem IO regions to avoid the
>>>>>>>>> waste
>>>>>>>>> in the PA space by putting the logic into virt_memmap_fits().
>>>>>>
>>>>>> I guess that this is what I understand the least. What do you
>>>>>> mean by
>>>>>> "wasted PA space"? Either the regions fit in the PA space, and
>>>>>> computing their addresses in relevant, or they fall outside of it
>>>>>> and
>>>>>> what we stick in memap[index].base is completely irrelevant.
>>>>>>
>>>>>
>>>>> It's possible that we run into the following combination. we should
>>>>> have enough PA space to enable VIRT_HIGH_PCIE_MMIO region. However,
>>>>> the region is disabled in the original implementation because
>>>>> VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} regions consumed 1GB, which is
>>>>> unecessary and waste in the PA space.
>>>> each region's base is aligned on its size.
>>>
>>> Yes.
>>>
>>>>>
>>>>> static MemMapEntry extended_memmap[] = {
>>>>> [VIRT_HIGH_GIC_REDIST2] = { 0x0, 64 * MiB },
>>>>> [VIRT_HIGH_PCIE_ECAM] = { 0x0, 256 * MiB },
>>>>> [VIRT_HIGH_PCIE_MMIO] = { 0x0, 512 * GiB },
>>>> so anyway MMIO is at least at 512GB. Having a 1TB IPA space does not
>>>> imply any amount of RAM. This depends on the address space.
>>>> I };
>>>
>>> Yes. Prior to the start of system memory, there is 1GB used by
>>> various regions either.
>> yes
>>>
>>>>>
>>>>> IPA_LIMIT = (1UL << 40)
>>>>> '-maxmem' = 511GB /* Memory starts from
>>>>> 1GB */
>>>>> '-slots' = 0
>>>>> vms->highmem_rdist2 = false
>>>> How can this happen? the only reason for highmem_redists to be
>>>> reset is
>>>> if it does not fit into map_ipa. So if mmio fits, highmem_redists does
>>>> too. What do I miss?
>>>
>>> The example is having "vms->highmem_rdist2 = flase" BEFORE the address
>>> assignment, it's possible that developer changes the code to disable
>>> it intentionally. The point is the original implementation isn't
>>> comprehensive
>>> because it has the wrong assumption that vms->highmem_{rdist2, ecam,
>>> mmio} all
>>> true before the address assignment. With the wrong assumption, the
>>> base address
>>> is always increased, even the previous region is disabled, during the
>>> address assignment in virt_set_memmap().
>> Yes we currently always provision space for those functionalities,
>> independently on whether they are used. But that's true for many other
>> regions in the address map (although smaller) ;-)
>
> Yep :)
>
>>>
>>>
>>>>> vms->highmem_ecam = false
>>>> vms->highmem_mmio = true
>>>> I am still skeptic about the relevance of such optim. Isn't it
>>>> sensible
>>>> to expect the host to support large IPA if you want to use such amount
>>>> of memory.
>>>> I think we should rather better document the memory map from a user
>>>> point of view.
>>>> Besides if you change the memory map, you need to introduce yet
>>>> another
>>>> option to avoid breaking migration, no?
>>>>
>>>
>>> These 3 high memory regions consumes 513GB with alignment considered
>>> when
>>> all of them are enabled. The point is those 3 high memory regions, or
>>> part
>>> of them can be squeezed or smashed to accommodate '-maxmem' by
>>> design. I
>>> think there are two options I can think of. I personally prefer to keep
>>> the capability. With this, users gain broader range for their
>>> '-maxmem'.
>>> Please let me know your preference.
>>>
>>> - Revert the capability of squeezing or smashing those high memory
>>> regions
>>> to accommodate '-maxmem'. It means vms->high_{redist2, ecam, mmio}
>>> can't
>>> be disable on conflicts to the PA space.
>>>
>>> - Keep the capability, with this optimization applied to make the
>>> implementation
>>> comprehensive.
>>>
>>> I think it's worthy to add something about this limitation in
>>> doc/system/arm/virt.rst.
>> indeed that's worth in any case.
>>>
>>> I don't think I need introduce another option to avoid migration
>>> breakage.
>>> Could you explain why I need the extra option?
>> I think you do. Before and after this patch the QEMU memory regions
>> associated to those devices won't be a the same location in the memory
>> so if you migrate from an old version to a newer one, the guest won't be
>> able to access them
>>
>> OK I have given my own opinion about those potential changes. Let's wait
>> for others' ;-)
>>
>
> Thank you for your comments. Yeah, I would hold to post v2 to collect
> more comments :)
>
> I'm still don't understand how it's affecting migration. If I understand
> correct, the changed address based doesn't affect migration, as explained
> like below. It took me while to looking the source code to figure out
> how VIRT_HIGH_{GIC_REDIST2, PCIE_ECAM} is linked to GIC and PCIe host and
> migration.
>
> For VIRTH_HIGH_GIC_REDIST2 region, it's used by TYPE_ARM_GICV3 or
> TYPE_KVM_ARM_GICV3.
> TYPE_KVM_ARM_GICV3. For both cases, we do NOT migrate the region
> directly. Instead,
> the registers are saved to GICv3CPUState. GICv3CPUState is migrate and
> the registers
> are restored from the instance.
>
> For VIRT_HIGH_PCIE_ECAM, the registers are saved to PCIDevice::config.
> The
> buffer is migrated and PCI's config space is restored from it. In
> hw/net/e1000e.c,
> e1000e_vmstate has something like below embedded:
>
> VMSTATE_PCI_DEVICE(parent_obj, E1000EState),
Actually I was more thinking about PCI MMIO region. Effectively for
regions that are saved/restored from regs it sounds OK (RDIST).
For ECAM I do not know how migration is handled but better to double
check with PCI/migration experts.
Thanks
Eric
>
> ---
>
> I also did one experiment by having different address bases on the
> source and
> destination. Migration succeeds.
>
> address regions from source
> ---------------------------
> virt_memmap_fits: HIGH_GIC_REDIST2: [0000004000000000 0000000004000000]
> virt_memmap_fits: HIGH_GIC_ECAM: [0000004010000000 0000000010000000]
> virt_memmap_fits: HIGH_GIC_MMIO: [0000008000000000 0000008000000000]
>
> address regions from destination
> --------------------------------
> virt_memmap_fits: HIGH_GIC_REDIST2: [0000004040000000 0000000004000000]
> virt_memmap_fits: HIGH_GIC_ECAM: [0000004050000000 0000000010000000]
> virt_memmap_fits: HIGH_GIC_MMIO: [0000008000000000 0000008000000000]
>
>
>>>
>>>>>
>>>>>>>>>
>>>>>>>>> Signed-off-by: Gavin Shan <gshan@redhat.com>
>>>>>>>>> ---
>>>>>>>>> hw/arm/virt.c | 54
>>>>>>>>> +++++++++++++++++++++++++++++----------------------
>>>>>>>>> 1 file changed, 31 insertions(+), 23 deletions(-)
>>>>>>>>>
>>>>>>>>> diff --git a/hw/arm/virt.c b/hw/arm/virt.c
>>>>>>>>> index 9633f822f3..bc0cd218f9 100644
>>>>>>>>> --- a/hw/arm/virt.c
>>>>>>>>> +++ b/hw/arm/virt.c
>>>>>>>>> @@ -1688,6 +1688,34 @@ static uint64_t
>>>>>>>>> virt_cpu_mp_affinity(VirtMachineState *vms, int idx)
>>>>>>>>> return arm_cpu_mp_affinity(idx, clustersz);
>>>>>>>>> }
>>>>>>>>> +static void virt_memmap_fits(VirtMachineState *vms, int
>>>>>>>>> index,
>>>>>>>>> + bool *enabled, hwaddr *base, int
>>>>>>>>> pa_bits)
>>>>>>>>> +{
>>>>>>>>> + hwaddr size = extended_memmap[index].size;
>>>>>>>>> +
>>>>>>>>> + /* The region will be disabled if its size isn't given */
>>>>>>>>> + if (!*enabled || !size) {
>>>>>>>> In which case do you have !size?
>>>>>>>
>>>>>>> Yeah, we don't have !size and the condition should be removed.
>>>>>>>
>>>>>>>>> + *enabled = false;
>>>>>>>>> + vms->memmap[index].base = 0;
>>>>>>>>> + vms->memmap[index].size = 0;
>>>>>>>> It looks dangerous to me to reset the region's base and size like
>>>>>>>> that.
>>>>>>>> for instance fdt_add_gic_node() will add dummy data in the dt.
>>>>>>>
>>>>>>> I would guess you missed that the high memory regions won't be
>>>>>>> exported
>>>>>>> through device-tree if they have been disabled. We have a check
>>>>>>> there,
>>>>>>> which is "if (nb_redist_regions == 1)"
>>>>>>>
>>>>>>>>> + return;
>>>>>>>>> + }
>>>>>>>>> +
>>>>>>>>> + /*
>>>>>>>>> + * Check if the memory region fits in the PA space. The
>>>>>>>>> memory map
>>>>>>>>> + * and highest_gpa are updated if it fits. Otherwise, it's
>>>>>>>>> disabled.
>>>>>>>>> + */
>>>>>>>>> + *enabled = (ROUND_UP(*base, size) + size <=
>>>>>>>>> BIT_ULL(pa_bits));
>>>>>>>> using a 'fits' local variable would make the code more obvious I
>>>>>>>> think
>>>>>>>
>>>>>>> Lets confirm if you're suggesting something like below?
>>>>>>>
>>>>>>> bool fits;
>>>>>>>
>>>>>>> fits = (ROUND_UP(*base, size) + size <=
>>>>>>> BIT_ULL(pa_bits));
>>>>>>>
>>>>>>> if (fits) {
>>>>>>> /* update *base, memory mapping, highest_gpa */
>>>>>>> } else {
>>>>>>> *enabled = false;
>>>>>>> }
>>>>>>>
>>>>>>> I guess we can simply do
>>>>>>>
>>>>>>> if (ROUND_UP(*base, size) + size <= BIT_ULL(pa_bits)) {
>>>>>>> /* update *base, memory mapping, highest_gpa */
>>>>>>> } else {
>>>>>>> *enabled = false;
>>>>>>> }
>>>>>>>
>>>>>>> Please let me know which one looks best to you.
>>>>>>
>>>>>> Why should the 'enabled' flag be updated by this function,
>>>>>> instead of
>>>>>> returning the value and keeping it as an assignment in the caller
>>>>>> function? It is purely stylistic though.
>>>>>>
>>>>>
>>>>> The idea to put the logic, address assignment for those 3 high memory
>>>>> regions or updating the flags (vms->high_redist2/ecam/mmio), to the
>>>>> newly introduced function, to make virt_set_memmap() a bit
>>>>> simplified.
>>>>> Eric tends to agree the changes with minor adjustments. So I'm going
>>>>> to keep it as of being, which doesn't mean the stylistic thought is
>>>>> a bad one :)
>>>>>
>>>>> Lastly, I need to rewrite the comit log completely because it seems
>>>>> causing confusions to Eric and you.
>>>>>
>>>
>
> Thanks,
> Gavin
>
- [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, (continued)
- [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/02
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Eric Auger, 2022/08/02
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/02
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Marc Zyngier, 2022/08/03
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Eric Auger, 2022/08/03
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/03
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Eric Auger, 2022/08/03
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/03
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Eric Auger, 2022/08/04
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/05
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions,
Eric Auger <=
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Marc Zyngier, 2022/08/08
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/11
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Marc Zyngier, 2022/08/11
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Eric Auger, 2022/08/03
- Re: [PATCH 1/2] hw/arm/virt: Improve address assignment for highmem IO regions, Gavin Shan, 2022/08/03