qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadca


From: Igor Mammedov
Subject: Re: [RFC 2/3] x86: cphp: prevent guest crash on CPU hotplug when broadcast SMI is in use
Date: Fri, 17 Jul 2020 14:57:59 +0200

On Tue, 14 Jul 2020 12:56:50 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/10/20 18:17, Igor Mammedov wrote:
[...]

> > @@ -1508,6 +1508,17 @@ static void pc_cpu_pre_plug(HotplugHandler 
> > *hotplug_dev,
> >          return;
> >      }
> >
> > +    if (pcms->acpi_dev) {
> > +        Error *local_err = NULL;
> > +
> > +        hotplug_handler_pre_plug(HOTPLUG_HANDLER(pcms->acpi_dev), dev,
> > +                                 &local_err);
> > +        if (local_err) {
> > +            error_propagate(errp, local_err);
> > +            return;
> > +        }
> > +    }
> > +
> >      init_topo_info(&topo_info, x86ms);
> >
> >      env->nr_dies = x86ms->smp_dies;
> >  
> 
> (6) This looks sane to me, but I have a question for the *pre-patch*
> code.

(not so short introduction)

plug callbacks were designed for wiring up hotplugged device, hence
it has check for acpi_dev which is one of HW connections needed to make
it work. Later on coldplug was consolidated with it, so plug callbacks
are also handling coldplugged devices.

then later plug callback was split on pre_ and plug_, where pre_
part is called right before device_foo.realize() and plug_ right after.
Split was intended to make graceful failure easier, where pre_ is taking
care of checking already set properties and optionally setting additional
properties and can/should fail without side-effects, and plug_ part
should not fail (maybe there is still cleanup to do there) and used to
finish wiring after the device had been realized.


> 
> I notice that hotplug_handler_pre_plug() is already called from the
> (completely unrelated) function pc_memory_pre_plug().
> 
> In pc_memory_pre_plug(), we have the following snippet:
> 
>     /*
>      * When -no-acpi is used with Q35 machine type, no ACPI is built,
>      * but pcms->acpi_dev is still created. Check !acpi_enabled in
>      * addition to cover this case.
>      */
>     if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
>         error_setg(errp,
>                    "memory hotplug is not enabled: missing acpi device or 
> acpi disabled");
>         return;
>     }
> 
> Whereas in pc_cpu_pre_plug(), the present patch only adds a
> "pcms->acpi_dev" nullity check.
> 
> Should pc_cpu_pre_plug() check for ACPI enablement similarly to
> pc_memory_pre_plug()?
extra check is not must have in pc_memory_pre_plug() as it should not break 
anything
(modulo MMIO memhp interface, which went unnoticed since probably nobody
uses MMIO memhp registers directly (i.e. QEMU's ACPI tables is sole user))
 
> I'm asking for two reasons:
> 
> (6a) for the feature at hand (CPU hotplug with SMI), maybe we should
> write:
> 
>     if (pcms->acpi_dev && x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
pretty harmless in the same sense as in pc_memory_pre_plug(),
but I'd rather avoid checks that are not must have.


> (6b) or maybe more strictly, copy the check from memory hotplug (just
> update the error message):
> 
>     if (!pcms->acpi_dev || !x86_machine_is_acpi_enabled(X86_MACHINE(pcms))) {
>         error_setg(errp,
>                    "CPU hotplug is not enabled: missing acpi device or acpi 
> disabled");
can't do it as it will break CPU clodplug when -no-cpi is used

>         return;
>     }
> 
> Because CPU hotplug depends on ACPI too, just like memory hotplug,
> regardless of firmware, and regardless of guest-SMM. Am I correct to
> think that?

We have pcms->acpi_dev check in x86 code because isapc/piix4 machines
will/might not create the pm device (which implements acpi hw). 

(1) if (pcmc->pci_enabled && x86_machine_is_acpi_enabled(X86_MACHINE(pcms)))

if that weren't the case, calls to acpi_dev would be unconditional

I'm adding check into pre_plug so we can gracefully reject device_add
in case firmware is not prepared for handling hotplug SMI. Since
the later might crash the guest. It's for the sake of better user
experience since QEMU might easily run older firmware.

If we don't care about that, we can drop negotiation and the check,
send SMI on hotplug when SMI broadcast is enabled, that might
crash guest since it might be not supported by used fw, but that
was already the case for quite a while and I've heard only few
complaints. (I guess most users have sense not to use something
not impl./supported)


> Basically, I'm asking if we should replicate original commit
> 8cd91acec8df ("pc: fail memory hot-plug/unplug with -no-acpi and Q35
> machine type", 2018-01-12) for CPU hotplug first (in a separate patch!),
> before dealing with "lpc->smi_negotiated_features" in this patch.

I'd rather leave hw part decoupled from acpi tables part,
nothing prevents users implementing their own fw/os which uses
our cpuhp interface directly without ACPI.

> Hmm... I'm getting confused. I *do* see similar checks in pc_cpu_plug()
> and pc_cpu_unplug_request_cb(). But:
> 
> - I don't understand what determines whether we put the ACPI check in
> *PRE* plug functions, or the plug functions,
I beleive [1] answers that

> - and I don't understand why pc_cpu_plug() and
> pc_cpu_unplug_request_cb() only check "pcms->acpi_dev", and not
> x86_machine_is_acpi_enabled().

x86_machine_is_acpi_enabled() is not must have in this case as
callbacks implement only hw part of cpuhp protocol (QEMU part),
what guest uses to handle it (fw tables(qemu generated),
or some custom code) is another story.


> (7) According to my request under patch#1, I propose that we should
> implement a similar rejection for CPU hot-unplug, in this series. (Can
> be a separate patch, of course.)
> 
> Thanks!
> Laszlo
> 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]