
Re: [RFC 3/3] x86: acpi: trigger SMI before scanning for hotplugged CPUs


From: Igor Mammedov
Subject: Re: [RFC 3/3] x86: acpi: trigger SMI before scanning for hotplugged CPUs
Date: Tue, 14 Jul 2020 17:19:35 +0200

On Tue, 14 Jul 2020 14:41:28 +0200
Laszlo Ersek <lersek@redhat.com> wrote:

> On 07/14/20 14:28, Laszlo Ersek wrote:
> > (CC'ing Peter Krempa due to virsh setvcpu (singular) / setvcpus (plural)
> > references)
> > 
> > On 07/10/20 18:17, Igor Mammedov wrote:  
> >> In case firmware has negotiated CPU hotplug SMI feature, generate
> >> AML to describe SMI IO port region and send SMI to firmware
> >> on each CPU hotplug SCI.
> >>
> >> It might not be really usable yet, but it should serve as a starting point to
> >> discuss how to better deal with the split hotplug sequence during hot-add
> >> (
> >> ex scenario where it will break is:
> >>    hot-add  
> >>       -> (QEMU) add CPU in hotplug regs
> >>       -> (QEMU) SCI  
> >>            -1-> (OS) scan
> >>                -1-> (OS) SMI
> >>                -1-> (FW) pull in CPU1 ***
> >>                -1-> (OS) start iterating hotplug regs
> >>    hot-add  
> >>       -> (QEMU) add CPU in hotplug regs
> >>       -> (QEMU) SCI  
> >>             -2-> (OS) scan (blocked on mutex till previous scan is finished)
> >>                -1-> (OS) 1st added CPU1 send device check event -> INIT/SIPI
> >>                -1-> (OS) 2nd added CPU2 send device check event -> INIT/SIPI
> >>                        that's where it explodes, since FW didn't see CPU2
> >>                        when SMI was called
> >> )
> >>
> >> Hot remove will throw in yet another set of problems, so let's discuss
> >> both here and see whether we can really share the hotplug register block
> >> between FW and AML, or whether we should do something else with it.
> > 
> > This issue is generally triggered by management applications such as
> > libvirt that issue device_add commands in quick succession. For libvirt,
> > the command is "virsh setvcpus" (plural) with multiple CPUs specified
> > for plugging. The singular "virsh setvcpu" command, which is more
> > friendly towards guest NUMA, does not run into the symptom.
> > 
> > The scope of the scan method lock is not large enough, with SMI in the
> > picture.
> > 
> > I suggest that we not uproot the existing AML code or the hotplug
> > register block. Instead, I suggest that we add serialization at a higher
> > level, with sufficient scope.
> > 
> > QEMU:
> > 
> > - introduce a new flag standing for "CPU plug operation in progress"
> > 
> > - if ICH9_LPC_SMI_F_BROADCAST_BIT has been negotiated:
> > 
> >   - "device_add" and "device_del" should enforce
> >     ICH9_LPC_SMI_F_CPU_HOTPLUG_BIT and
> >     ICH9_LPC_SMI_F_CPU_HOT_UNPLUG_BIT, respectively
> > 
> >   - both device_add and device_del (for VCPUs) should check the
> >     "in progress" flag.
> > 
> >     - If set, reject the request synchronously
> > 
> >     - Otherwise, set the flag, and commence the operation
> > 
> >   - in cpu_hotplug_wr(), where we emit the ACPI_DEVICE_OST event with
> >     qapi_event_send_acpi_device_ost(), clear the "in-progress" flag.
> > 
> > - If QEMU executes the QMP command processing and the cpu_hotplug_wr()
> > function on different (host OS) threads, then perhaps we should use an
> > atomic type for the flag. (Not sure about locking between QEMU threads,
> > sorry.) I don't really expect race conditions, but in case we ever get
> > stuck with the flag, we should make sure that the stuck state is "in
> > progress", and not "not in progress". (The former state can prevent
> > further plug operations, but cannot cause the guest to lose state.)  
> 
> Furthermore, the "CPU plug operation in progress" flag should be:
> - either migrated,
> - or a migration blocker.
> 
> Because on the destination host, device_add should be possible if and
> only if the plug operation completed (either still on the source host,
> or on the destination host).
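
Just to make sure I'm reading the proposal right, the gating you describe
would look roughly like the sketch below (the type, field and function names
are invented for illustration; this is not existing QEMU code):

  /* Minimal sketch of the "plug operation in progress" gate; all names
   * here are made up for illustration and do not exist in QEMU. */
  #include <stdatomic.h>
  #include <stdbool.h>

  typedef struct CPUPlugGate {
      atomic_bool in_progress;    /* "CPU plug operation in progress" */
  } CPUPlugGate;

  /* device_add/device_del path: start an operation, or reject it synchronously */
  static bool cpu_plug_try_begin(CPUPlugGate *gate)
  {
      bool expected = false;

      /* set the flag only if it was clear; if an operation is pending, fail */
      return atomic_compare_exchange_strong(&gate->in_progress, &expected, true);
  }

  /* cpu_hotplug_wr() path: guest reported _OST, the operation is complete */
  static void cpu_plug_end(CPUPlugGate *gate)
  {
      atomic_store(&gate->in_progress, false);
  }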

I have a much simpler alternative idea, though, which doesn't involve libvirt.

We can change the AML to
  1. cache hotplugged CPUs from the controller
  2. send the SMI
  3. send a Notify event to the OS to online the cached CPUs
This way we never INIT/SIPI CPUs that the FW hasn't seen yet.
As for the FW, it may relocate an extra CPU that arrived after step #1;
that won't cause any harm, as on the next SCI the AML will pick up those
CPUs and the SMI upcall will just be a NOP.
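
Roughly, the reordered scan method could be generated along the lines of the
sketch below, using the existing aml_*() helpers (CSCN, CSEL, CINS, CTFY,
SMIC, the CPU count and the SMI command value are placeholders for
illustration, not necessarily what hw/acpi/cpu.c ends up using):

  /* Rough sketch only: cache inserted CPUs, do one SMI upcall, then Notify. */
  #include "qemu/osdep.h"
  #include "hw/acpi/aml-build.h"

  static Aml *build_scan_sketch(void)
  {
      int max_cpus = 8;                      /* placeholder machine limit */
      Aml *method = aml_method("CSCN", 0, AML_SERIALIZED);
      Aml *idx = aml_local(0);               /* current CPU selector */
      Aml *found = aml_local(1);             /* package caching new CPUs */
      Aml *cnt = aml_local(2);               /* number of cached CPUs */
      Aml *while_ctx, *if_ctx;

      aml_append(method, aml_store(aml_package(max_cpus), found));
      aml_append(method, aml_store(aml_int(0), idx));
      aml_append(method, aml_store(aml_int(0), cnt));

      /* 1. cache hotplugged CPUs from the hotplug controller */
      while_ctx = aml_while(aml_lless(idx, aml_int(max_cpus)));
      aml_append(while_ctx, aml_store(idx, aml_name("CSEL")));    /* select CPU */
      if_ctx = aml_if(aml_equal(aml_name("CINS"), aml_int(1)));   /* insert event? */
      aml_append(if_ctx, aml_store(idx, aml_index(found, cnt)));  /* remember it */
      aml_append(if_ctx, aml_increment(cnt));
      aml_append(while_ctx, if_ctx);
      aml_append(while_ctx, aml_increment(idx));
      aml_append(method, while_ctx);

      /* 2. one SMI upcall, so FW relocates every CPU cached above */
      if_ctx = aml_if(aml_lgreater(cnt, aml_int(0)));
      aml_append(if_ctx, aml_store(aml_int(4), aml_name("SMIC"))); /* placeholder cmd */
      aml_append(method, if_ctx);

      /* 3. only now tell the OS; CTFY is a placeholder helper that maps a
       *    selector to its CPU object and issues Notify(..., 1) on it */
      aml_append(method, aml_store(aml_int(0), idx));
      while_ctx = aml_while(aml_lless(idx, cnt));
      aml_append(while_ctx,
                 aml_call2("CTFY", aml_derefof(aml_index(found, idx)), aml_int(1)));
      aml_append(while_ctx, aml_increment(idx));
      aml_append(method, while_ctx);

      return method;
  }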

I'll post a patch on top of this series here for you to try
(without any of your comments addressed yet, as it was already written
and I have been testing it for a while to make sure it won't explode
with various Windows versions).

> I guess that the "migration blocker" option is easier.
> 
> Anyway, I assume this is already handled with memory hotplug somehow
> (i.e., migration attempt between device_add and ACPI_DEVICE_OST).
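
If we go the blocker route, I'd expect something along the lines of the sketch
below; migrate_add_blocker()/migrate_del_blocker() are the existing helpers,
everything else is named just for illustration:

  /* Sketch only: make the in-progress window a migration blocker, so a
   * migration attempt between device_add and ACPI_DEVICE_OST fails cleanly. */
  #include "qemu/osdep.h"
  #include "qapi/error.h"
  #include "migration/blocker.h"

  static Error *cpu_plug_blocker;

  /* call when the plug operation starts (device_add accepted) */
  static int cpu_plug_block_migration(Error **errp)
  {
      error_setg(&cpu_plug_blocker, "CPU hot(un)plug operation in progress");
      if (migrate_add_blocker(cpu_plug_blocker, errp) < 0) {
          error_free(cpu_plug_blocker);
          cpu_plug_blocker = NULL;
          return -1;
      }
      return 0;
  }

  /* call when ACPI_DEVICE_OST is emitted from cpu_hotplug_wr() */
  static void cpu_plug_unblock_migration(void)
  {
      if (cpu_plug_blocker) {
          migrate_del_blocker(cpu_plug_blocker);
          error_free(cpu_plug_blocker);
          cpu_plug_blocker = NULL;
      }
  }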

Thanks for the comments.
I'll need some time to ponder the other comments and do some
palaeontology research to answer the questions
(a.k.a. I need to make up excuses for the code I wrote :) )
 
> Thanks,
> Laszlo



