Re: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM vi

qemu-arm

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM vi

From:	Laszlo Ersek
Subject:	Re: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support)
Date:	Tue, 24 Sep 2019 17:52:30 +0200
User-agent:	Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1

On 09/20/19 19:04, Shameerali Kolothum Thodi wrote:
> Hi Laszlo/Igor,
>
> I spend some time to debug this further as I was rebasing the nvdimm
> hot-add support patches on top of the ongoing pc-dimm hot add ones.
>
> Just to refresh the memory:
>
> https://patchwork.kernel.org/cover/10783589/
>
> " It is observed that hot adding nvdimm will results in guest reboot
> failure. EDK2 fails to build the ACPI tables on reboot. Please find
> below EDK2 log on Guest reboot after nvdimm hot-add,
>
> ProcessCmdAddChecksum: invalid checksum range in "etc/acpi/tables"
> OnRootBridgesConnected: InstallAcpiTables: Protocol Error
> "
>
> Please find below,
>
>> -----Original Message-----
>> From: Laszlo Ersek [mailto:address@hidden]
>> Sent: 05 March 2019 12:15
>> To: Igor Mammedov <address@hidden>
>> Cc: Shameerali Kolothum Thodi <address@hidden>;
>> Auger Eric <address@hidden>; address@hidden;
>> address@hidden; address@hidden; address@hidden;
>> xuwei (O) <address@hidden>; Linuxarm <address@hidden>; Ard
>> Biesheuvel <address@hidden>; Leif Lindholm (Linaro
>> address) <address@hidden>
>> Subject: Re: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support
>>
>> On 03/01/19 18:39, Igor Mammedov wrote:
>>> On Fri, 1 Mar 2019 14:49:45 +0100
>>> Laszlo Ersek <address@hidden> wrote:
>>>
>>>> On 02/28/19 15:02, Shameerali Kolothum Thodi wrote:
>>>>
>>>>> Ah..I missed the fact that, firmware indeed sees an update in the
>>>>> blob len here (rounded or not) after reboot. So don't think x86
>>>>> has the same issue and padding is not the right solution as Igor
>>>>> explained in his reply.
>>>>>
>>>>> I will try to debug this further. Any pointers welcome.
>>>>
>>>> How about this.
>>>>
>>>> (1) The firmware looks up the fw_cfg file called "etc/table-loader"
>>>> in the fw_cfg file directory (identified by constant selector key
>>>> 0x0019, FW_CFG_FILE_DIR).
>>>>
>>>> (2) The directory entry, once found, tells the firmware two things
>>>> simultaneously. The selector key, and the size of the blob.
>>>>
>>>> (3) The firmware selects the selector key from step (2).
>>>>
>>>> (4) QEMU regenerates the ACPI payload (as a select callback).
>>>>
>>>> (5) The firmware reads the number of bytes from the fw_cfg blob
>>>> that it learned in step (2).
>>>>
>>>> Here's the problem. As long as QEMU used to perform step (4) only
>>>> for the purpose of refreshing PCI resources in the ACPI payload,
>>>> step (4) wouldn't *resize* the blob.
>>>>
>>>> However, if step (4) enlarges the blob, then the byte count that
>>>> step (5) uses -- from step (2) -- for reading, is obsolete.
>>
>>> I've thought that was a problem with IO based fw_cfg, as reading
>>> size/content were separates steps and that it was solved by DMA
>>> based fw_cfg file read.
>>
>> The DMA backend is not relevant for this question, for two reasons:
>>
>> (a) The question whether the fw_cfg transfer takes places with port
>> IO vs. DMA is hidden from the fw_cfg client code; that code goes
>> through an abstract library API.
>>
>> (b) While the DMA method indeed lets the firmware specify the details
>> of the transfer with one action, the issue is with the number of
>> bytes that the firmware requests (that is, not with *how* the
>> firmware requests the transfer). The firmware has to know the size of
>> the transfer before it can initiate the transfer (regardless of port
>> IO vs. DMA).
>>
>>
>> My question is: assume the firmware item in question is selected, and
>> the QEMU-side select callback runs (regenerating the ACPI payload).
>> Does this action update the blob size in the fw_cfg file directory as
>> well?
>
> I think it doesn't update the blob size on select callback which is
> the root cause of this issue. And the reason looks like,
> qemu_ram_resize() function returns without invoking the callback to
> update the blob size.
>
> On boot up, Qemu builds the table and exposes it to guest,
>       virt_acpi_setup()
>         acpi_add_rom_blob()
>           rom_add_blob()
>             rom_set_mr()  --> mr is allocated here and ram_block used_length 
> = HOST_PAGE_ALIGN(blob size);
>             fw_cfg_add_file_callback()
>               fw_cfg_add_bytes_callback() --> This uses the blob size passed 
> into it.
>
> On select callback path,
>
> virt_acpi_build_update()
>    acpi_ram_update()
>     memory_region_ram_resize()
>       qemu_ram_resize() -->. Here the newsize gets aligned to HOST_PAGE and 
> callback is only called used_length != newsize.
>
> https://github.com/qemu/qemu/blob/master/exec.c#L2180
>
> Debug logs:
> Initial boot:
> ##QEMU_DEBUG## rom_add_blob: file etc/acpi/tables size 0x64f7
> ##QEMU_DEBUG## fw_cfg_add_bytes_callback: key 0x21 len 0x64f7
> ........
> ........
> ###UEFI#### InstallQemuFwCfgTables: "etc/table-loader" has FwCfgItem 0x27 
> size 0xD00
> ##QEMU_DEBUG## virt_acpi_build_update:
> ##QEMU_DEBUG## acpi_ram_update: size 0x64f7
> ##QEMU_DEBUG## qemu_ram_resize: idstr /rom@etc/acpi/tables used_length  
> 0x7000 newsize 0x7000 --> No callback.
> .....
> ######UEFI###### ProcessCmdAllocate: QemuFwCfgFindFile("etc/acpi/tables"): 
> size 0x64F7 --> UEFI get the actual size, which is fine for now.
>
> Hot-add nvdimms and reboot.
>
> root@ubuntu:/# reboot
> .......
> ........
> ###UEFI#### InstallQemuFwCfgTables: "etc/table-loader" has FwCfgItem 0x27 
> size 0xD00
> ##QEMU_DEBUG## virt_acpi_build_update:
> ##QEMU_DEBUG## acpi_ram_update: size 0x6667 --> Size changed.
> ##QEMU_DEBUG## qemu_ram_resize: idstr /rom@etc/acpi/tables used_length  
> 0x7000 newsize 0x7000 --> newsize is still aligned to 0x7000 and no callback 
> to update.
> ......
> ######UEFI###### ProcessCmdAllocate: QemuFwCfgFindFile("etc/acpi/tables"): 
> size 0x64F7 -->UEFI still sees the old value and fails.
>
> This can be fixed by,
>
> diff --git a/hw/arm/virt-acpi-build.c b/hw/arm/virt-acpi-build.c index 
> f3bd45675b..79da3fd35d 100644
> --- a/hw/arm/virt-acpi-build.c
> +++ b/hw/arm/virt-acpi-build.c
> @@ -854,6 +854,9 @@ void virt_acpi_build(VirtMachineState *vms, 
> AcpiBuildTables *tables)
>          build_rsdp(tables->rsdp, tables->linker, &rsdp_data);
>      }
>
> +    g_array_set_size(tables_blob,
> +                    TARGET_PAGE_ALIGN(acpi_data_len(tables_blob)));
> +
>      /* Cleanup memory that's no longer used. */
>      g_array_free(table_offsets, true);
>  }
>
> But I am not sure this is the best to way fix this issue (Or I am
> missing something here).
>
> Please let me know.

The above QEMU patch, for virt_acpi_build(), may be necessary, but I
don't think it is sufficient.

For the firmware to see the updated (enlarged) blob, two things are
required:

(a) QEMU to update the blob size in the fw_cfg directory entry.

Note: to the firmware, it is totally irrelevant if QEMU updates some
*other* value or field that reflects the fresh blob size. The only thing
the firmware can see is the entry in the FW_CFG_FILE_DIR blob.

To illustrate the field I'm referring to, see:

    s->files->f[index].size   = cpu_to_be32(len);

in fw_cfg_add_file_callback().

See also:

            s->files->f[i].size   = cpu_to_be32(len);

in fw_cfg_modify_file().

That "size" field is what the firmware can see.

Note: *all* relevant fw_cfg files must have their "size" fields updated
in the directory blob (FW_CFG_FILE_DIR). I.e. the requirement applies to
both the linker-loader script, and to all blobs that are referenced (by
name) by the commands in the linker-loader script.


(b) The firmware to re-read the size from the directory, after selecting
the key for the sake of ACPI regeneration.

I wrote:

>> If it does, then I can work around the problem in the firmware. I can
>> add a re-lookup to the code after the item selection, in order to get
>> the fresh blob size from the fw_cfg file directory. Then we can use
>> that size for the actual transfer.
>>
>> This won't help old firmware on new QEMU, but at least new firmware
>> on old QEMU will not be hurt (the re-fetching of the fw_cfg file
>> directory will come with a small performance penalty, but
>> functionally it will be a no-op).

Thus, the firmware patch in question would be:

| diff --git a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c 
b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
| index bc1a891dbaf1..07f70ffe158a 100644
| --- a/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
| +++ b/OvmfPkg/AcpiPlatformDxe/QemuFwCfgAcpi.c
| @@ -975,6 +975,24 @@ InstallQemuFwCfgTables (
|    ORDERED_COLLECTION       *SeenPointers;
|    ORDERED_COLLECTION_ENTRY *SeenPointerEntry, *SeenPointerEntry2;
|
| +  Status = QemuFwCfgFindFile ("etc/table-loader", &FwCfgItem, &FwCfgSize);
| +  if (EFI_ERROR (Status)) {
| +    return Status;
| +  }
| +
| +  //
| +  // By selecting "FwCfgItem", ask QEMU to regenerate the ACPI payload, with
| +  // all PCI devices decoding their resources. Note: further selections
| +  // of the same key will not repeat the patching.
| +  //
| +  EnablePciDecoding (&OriginalPciAttributes, &OriginalPciAttributesCount);
| +  QemuFwCfgSelectItem (FwCfgItem);
| +  RestorePciDecoding (OriginalPciAttributes, OriginalPciAttributesCount);
| +
| +  //
| +  // The size of the script may have changed, possibly due to platform 
devices
| +  // having been hot-plugged before platform reset. Re-read the size.
| +  //
|    Status = QemuFwCfgFindFile ("etc/table-loader", &FwCfgItem, &FwCfgSize);
|    if (EFI_ERROR (Status)) {
|      return Status;
| @@ -989,10 +1007,8 @@ InstallQemuFwCfgTables (
|    if (LoaderStart == NULL) {
|      return EFI_OUT_OF_RESOURCES;
|    }
| -  EnablePciDecoding (&OriginalPciAttributes, &OriginalPciAttributesCount);
|    QemuFwCfgSelectItem (FwCfgItem);
|    QemuFwCfgReadBytes (FwCfgSize, LoaderStart);
| -  RestorePciDecoding (OriginalPciAttributes, OriginalPciAttributesCount);
|    LoaderEnd = LoaderStart + FwCfgSize / sizeof *LoaderEntry;
|
|    AllocationsRestrictedTo32Bit = NULL;

But, again, this only makes sense if QEMU implements (a).

Thanks
Laszlo

[Prev in Thread]

Current Thread

[Next in Thread]

Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support), Shameerali Kolothum Thodi, 2019/09/20
- Re: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support), Laszlo Ersek <=
  - RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support), Shameerali Kolothum Thodi, 2019/09/24
  - RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support), Shameerali Kolothum Thodi, 2019/09/26
    - Re: Invalid blob size on NVDIMM hot-add, Laszlo Ersek, 2019/09/26

Prev by Date: Re: [RFC v2 0/9] error: auto propagated local_err
Next by Date: Re: [RFC v2 0/9] error: auto propagated local_err
Previous by thread: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support)
Next by thread: RE: Invalid blob size on NVDIMM hot-add (was: RE: [RFC PATCH 0/4] ARM virt: ACPI memory hotplug support)
Index(es):
- Date
- Thread