qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadca


From: Laszlo Ersek
Subject: Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use
Date: Tue, 14 Jul 2020 12:56:50 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Firefox/52.0 Thunderbird/52.9.1

On 07/10/20 18:17, Igor Mammedov wrote:
> There were reports of guest crash on CPU hotplug, when using q35 machine
> type and QVMF with Secure Boot, due to hotplugged CPU trying to process SMI

(1) typo: s/QVMF/OVMF/ please

(2) Please replace "Secure Boot" with "SMM". In everyday practice it's
OK to use them interchangeably, but in this commit message I'd like us
to be more precise.

> at default SMI handler location without it being relocated by firmware first.
>
> Fix it by refusing hotplug if firmware hasn't negotiatiad CPU hotplug SMI

(3) s/negotiatiad/negotiated/

> support while SMI broadcast is in use.
>
> Signed-off-by: Igor Mammedov <imammedo@redhat.com>
> ---
>  hw/acpi/ich9.c | 12 +++++++++++-
>  hw/i386/pc.c   | 11 +++++++++++
>  2 files changed, 22 insertions(+), 1 deletion(-)
>
> diff --git a/hw/acpi/ich9.c b/hw/acpi/ich9.c
> index 2d204babc6..a22b434e0b 100644
> --- a/hw/acpi/ich9.c
> +++ b/hw/acpi/ich9.c
> @@ -408,10 +408,20 @@ void ich9_pm_device_pre_plug_cb(HotplugHandler 
> *hotplug_dev, DeviceState *dev,
>      ICH9LPCState *lpc = ICH9_LPC_DEVICE(hotplug_dev);
>
>      if (object_dynamic_cast(OBJECT(dev), TYPE_PC_DIMM) &&
> -        !lpc->pm.acpi_memory_hotplug.is_enabled)
> +        !lpc->pm.acpi_memory_hotplug.is_enabled) {
>          error_setg(errp,
>                     "memory hotplug is not enabled: %s.memory-hotplug-support 
> "
>                     "is not set", object_get_typename(OBJECT(lpc)));
> +    } else if (object_dynamic_cast(OBJECT(dev), TYPE_CPU)) {
> +        uint64_t negotiated = lpc->smi_negotiated_features;

Wow, this is a relief. I thought it would be a difficult problem to
access the ICH9-LPC object cleanly, on the call stack of the device_add
command. I didn't imagine it would be at our disposal immediately.

> +
> +        if (negotiated & BIT_ULL(ICH9_LPC_SMI_F_BROADCAST_BIT) &&
> +            !(negotiated & BIT_ULL(ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT))) {
> +            error_setg(errp, "cpu hotplug SMI was not enabled by firmware");

(4) Please let's call this

  cpu hotplug *with* SMI

not just

  cpu hotplug SMI

(Emphasis added on "with" just for the sake of this discussion; no need
to embed the asterisks in the message.)

Because:

In my thinking, the feature that the firmware negotiates is not:

  SMI or no SMI, on CPU hotplug

Instead, the firmware negotiates:

  CPU hotplug with SMI, or no CPU hotplug

IOW, "SMI-or-no-SMI" is not a sub-feature of CPU hotplug; the feature
being negotiated, when SMI broadcast is enabled, is CPU hotplug as a
whole. That's exactly what this patch implements.


> +            error_append_hint(errp, "update machine type to newer than 5.0 "
> +                "and firmware that suppors CPU hotplug in Secure Boot mode");

(5) Please replace

  "in Secure Boot mode"

with

  "with SMM"

(for "firmware that suppors CPU hotplug with SMM")

> +        }
> +    }
>  }
>
>  void ich9_pm_device_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> diff --git a/hw/i386/pc.c b/hw/i386/pc.c
> index 6fe80c84d7..dc1e9157d7 100644
> --- a/hw/i386/pc.c
> +++ b/hw/i386/pc.c
> @@ -1508,6 +1508,17 @@ static void pc_cpu_pre_plug(HotplugHandler 
> *hotplug_dev,
>          return;
>      }
>
> +    if (pcms->acpi_dev) {
> +        Error *local_err = NULL;
> +
> +        hotplug_handler_pre_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev,
> +                                 &local_err);
> +        if (local_err) {
> +            error_propagate(errp, local_err);
> +            return;
> +        }
> +    }
> +
>      init_topo_info(&topo_info, x86ms);
>
>      env->nr_dies = x86ms->smp_dies;
>

(6) This looks sane to me, but I have a question for the *pre-patch*
code.

I notice that hotplug_handler_pre_plug() is already called from the
(completely unrelated) function pc_memory_pre_plug().

In pc_memory_pre_plug(), we have the following snippet:

    /*
     * When -no-acpi is used with Q35 machine type, no ACPI is built,
     * but pcms->acpi_dev is still created. Check !acpi_enabled in
     * addition to cover this case.
     */
    if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
        error_setg(errp,
                   "memory hotplug is not enabled: missing acpi device or acpi 
disabled");
        return;
    }

Whereas in pc_cpu_pre_plug(), the present patch only adds a
"pcms->acpi_dev" nullity check.

Should pc_cpu_pre_plug() check for ACPI enablement similarly to
pc_memory_pre_plug()?

I'm asking for two reasons:

(6a) for the feature at hand (CPU hotplug with SMI), maybe we should
write:

    if (pcms->acpi_dev && x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {

(6b) or maybe more strictly, copy the check from memory hotplug (just
update the error message):

    if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
        error_setg(errp,
                   "CPU hotplug is not enabled: missing acpi device or acpi 
disabled");
        return;
    }

Because CPU hotplug depends on ACPI too, just like memory hotplug,
regardless of firmware, and regardless of guest-SMM. Am I correct to
think that?

Basically, I'm asking if we should replicate original commit
8cd91acec8df ("pc: fail memory hot-plug/unplug with -no-acpi and Q35
machine type", 2018-01-12) for CPU hotplug first (in a separate patch!),
before dealing with "lpc->smi_negotiated_features" in this patch.

Hmm... I'm getting confused. I *do* see similar checks in pc_cpu_plug()
and pc_cpu_unplug_request_cb(). But:

- I don't understand what determines whether we put the ACPI check in
*PRE* plug functions, or the plug functions,

- and I don't understand why pc_cpu_plug() and
pc_cpu_unplug_request_cb() only check "pcms->acpi_dev", and not
x86_machine_is_acpi_enabled().


(7) According to my request under patch#1, I propose that we should
implement a similar rejection for CPU hot-unplug, in this series. (Can
be a separate patch, of course.)

Thanks!
Laszlo




reply via email to

[Prev in Thread] Current Thread [Next in Thread]