Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SMI feature

From: Laszlo Ersek
Subject: Re: [Qemu-devel] [PATCH v6 wave 2 2/3] hw/isa/lpc_ich9: add broadcast SMI feature
Date: Wed, 18 Jan 2017 11:19:49 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101 Thunderbird/45.6.0
On 01/18/17 11:03, Igor Mammedov wrote:
> On Tue, 17 Jan 2017 19:53:21 +0100
> Laszlo Ersek <address@hidden> wrote:
>
>> On 01/17/17 15:20, Igor Mammedov wrote:
>>> On Tue, 17 Jan 2017 14:46:05 +0100
>>> Laszlo Ersek <address@hidden> wrote:
>>>
>>>> On 01/17/17 14:20, Igor Mammedov wrote:
>>>>> On Tue, 17 Jan 2017 13:06:13 +0100
>>>>> Laszlo Ersek <address@hidden> wrote:
>>>>>
>>>>>> On 01/17/17 12:21, Igor Mammedov wrote:
>>>>>>> On Mon, 16 Jan 2017 21:46:55 +0100
>>>>>>> Laszlo Ersek <address@hidden> wrote:
>>>>>>>
>>>>>>>> On 01/13/17 14:09, Igor Mammedov wrote:
>>>>>>>>> On Thu, 12 Jan 2017 19:24:45 +0100
>>>>>>>>> Laszlo Ersek <address@hidden> wrote:
>>>>>>>>>
>>>>>>>>>> The generic edk2 SMM infrastructure prefers
>>>>>>>>>> EFI_SMM_CONTROL2_PROTOCOL.Trigger() to inject an SMI on each processor.
>>>>>>>>>> If Trigger() only brings the current processor into SMM, then edk2
>>>>>>>>>> handles it in the following ways:
>>>>>>>>>>
>>>>>>>>>> (1) If Trigger() is executed by the BSP (which is guaranteed before
>>>>>>>>>>     ExitBootServices(), but is not necessarily true at runtime), then:
>>>>>>>>>>
>>>>>>>>>>     (a) If edk2 has been configured for "traditional" SMM
>>>>>>>>>>         synchronization, then the BSP sends directed SMIs to the APs
>>>>>>>>>>         with APIC delivery, bringing them into SMM individually. Then
>>>>>>>>>>         the BSP runs the SMI handler / dispatcher.
>>>>>>>>>>
>>>>>>>>>>     (b) If edk2 has been configured for "relaxed" SMM synchronization,
>>>>>>>>>>         then the APs that are not already in SMM are not brought in,
>>>>>>>>>>         and the BSP runs the SMI handler / dispatcher.
>>>>>>>>>>
>>>>>>>>>> (2) If Trigger() is executed by an AP (which is possible after
>>>>>>>>>>     ExitBootServices(), and can be forced e.g. by "taskset -c 1
>>>>>>>>>>     efibootmgr"), then the AP in question brings in the BSP with a
>>>>>>>>>>     directed SMI, and the BSP runs the SMI handler / dispatcher.
>>>>>>>>>>
>>>>>>>>>> The smaller problem with (1a) and (2) is that the BSP and AP
>>>>>>>>>> synchronization is slow. For example, the "taskset -c 1 efibootmgr"
>>>>>>>>>> command from (2) can take more than 3 seconds to complete, because
>>>>>>>>>> efibootmgr accesses non-volatile UEFI variables intensively.
>>>>>>>>>>
>>>>>>>>>> The larger problem is that QEMU's current behavior diverges from the
>>>>>>>>>> behavior usually seen on physical hardware, and that keeps exposing
>>>>>>>>>> obscure corner cases, race conditions and other instabilities in edk2,
>>>>>>>>>> which generally expects / prefers a software SMI to affect all CPUs
>>>>>>>>>> at once.
>>>>>>>>>>
>>>>>>>>>> Therefore introduce the "broadcast SMI" feature that causes QEMU to
>>>>>>>>>> inject the SMI on all VCPUs.
>>>>>>>>>>
>>>>>>>>>> While the original posting of this patch
>>>>>>>>>> <http://lists.nongnu.org/archive/html/qemu-devel/2015-10/msg05658.html>
>>>>>>>>>> only intended to speed up (2), based on our recent "stress testing"
>>>>>>>>>> of SMM this patch actually provides functional improvements.
>>>>>>>>>>
>>>>>>>>>> Cc: "Michael S. Tsirkin" <address@hidden>
>>>>>>>>>> Cc: Gerd Hoffmann <address@hidden>
>>>>>>>>>> Cc: Igor Mammedov <address@hidden>
>>>>>>>>>> Cc: Paolo Bonzini <address@hidden>
>>>>>>>>>> Signed-off-by: Laszlo Ersek <address@hidden>
>>>>>>>>>> Reviewed-by: Michael S. Tsirkin <address@hidden>
>>>>>>>>>> ---
>>>>>>>>>>
>>>>>>>>>> Notes:
>>>>>>>>>>     v6:
>>>>>>>>>>     - no changes, pick up Michael's R-b
>>>>>>>>>>
>>>>>>>>>>     v5:
>>>>>>>>>>     - replace the ICH9_LPC_SMI_F_BROADCAST bit value with the
>>>>>>>>>>       ICH9_LPC_SMI_F_BROADCAST_BIT bit position (necessary for
>>>>>>>>>>       DEFINE_PROP_BIT() in the next patch)
>>>>>>>>>>
>>>>>>>>>>  include/hw/i386/ich9.h |  3 +++
>>>>>>>>>>  hw/isa/lpc_ich9.c      | 10 +++++++++-
>>>>>>>>>>  2 files changed, 12 insertions(+), 1 deletion(-)
>>>>>>>>>>
>>>>>>>>>> diff --git a/include/hw/i386/ich9.h b/include/hw/i386/ich9.h
>>>>>>>>>> index da1118727146..18dcca7ebcbf 100644
>>>>>>>>>> --- a/include/hw/i386/ich9.h
>>>>>>>>>> +++ b/include/hw/i386/ich9.h
>>>>>>>>>> @@ -250,4 +250,7 @@ Object *ich9_lpc_find(void);
>>>>>>>>>>  #define ICH9_SMB_HST_D1                     0x06
>>>>>>>>>>  #define ICH9_SMB_HOST_BLOCK_DB              0x07
>>>>>>>>>>
>>>>>>>>>> +/* bit positions used in fw_cfg SMI feature negotiation */
>>>>>>>>>> +#define ICH9_LPC_SMI_F_BROADCAST_BIT        0
>>>>>>>>>> +
>>>>>>>>>>  #endif /* HW_ICH9_H */
>>>>>>>>>> diff --git a/hw/isa/lpc_ich9.c b/hw/isa/lpc_ich9.c
>>>>>>>>>> index 376b7801a42c..ced6f803a4f2 100644
>>>>>>>>>> --- a/hw/isa/lpc_ich9.c
>>>>>>>>>> +++ b/hw/isa/lpc_ich9.c
>>>>>>>>>> @@ -437,7 +437,15 @@ static void ich9_apm_ctrl_changed(uint32_t val, void *arg)
>>>>>>>>>>
>>>>>>>>>>      /* SMI_EN = PMBASE + 30. SMI control and enable register */
>>>>>>>>>>      if (lpc->pm.smi_en & ICH9_PMIO_SMI_EN_APMC_EN) {
>>>>>>>>>> -        cpu_interrupt(current_cpu, CPU_INTERRUPT_SMI);
>>>>>>>>>> +        if (lpc->smi_negotiated_features &
>>>>>>>>>> +            (UINT64_C(1) << ICH9_LPC_SMI_F_BROADCAST_BIT)) {
>>>>>>>>>> +            CPUState *cs;
>>>>>>>>>> +            CPU_FOREACH(cs) {
>>>>>>>>>> +                cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>>>>>>>>>> +            }
>>>>>>>>> Shouldn't CPUs with default SMI base be excluded from broadcast?
>>>>>>>>>
>>>>>>>>
>>>>>>>> I don't think so; as far as I recall, in edk2 the initial SMM entry
>>>>>>>> code -- before SMBASE relocation -- accommodates all VCPUs entering the
>>>>>>>> same code area for execution. The VCPUs are told apart from each other
>>>>>>>> by Initial APIC ID (that is, "who am I"), which is used as an index or
>>>>>>>> search criterion into a global shared array. Once they find their
>>>>>>>> respective private data blocks, the VCPUs don't step on each other's
>>>>>>>> toes.
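For illustration only, here is a minimal sketch in plain C of the "who am I"
lookup pattern described above. This is not edk2's actual code; the names
below (MAX_CPUS, CpuSmmSlot, g_slots, find_my_slot) are all hypothetical.

#include <stddef.h>
#include <stdint.h>

#define MAX_CPUS 8                       /* arbitrary size for the sketch */

/* one slot per CPU in a shared, pre-populated array */
typedef struct {
    uint32_t initial_apic_id;            /* the "who am I" key */
    void *private_block;                 /* per-CPU private data area */
} CpuSmmSlot;

static CpuSmmSlot g_slots[MAX_CPUS];
static unsigned g_slot_count;

/* every CPU runs the same entry code and calls this with its own APIC ID */
static CpuSmmSlot *find_my_slot(uint32_t my_initial_apic_id)
{
    unsigned i;

    for (i = 0; i < g_slot_count; i++) {
        if (g_slots[i].initial_apic_id == my_initial_apic_id) {
            return &g_slots[i];          /* found this CPU's private block */
        }
    }
    return NULL;                         /* array was not set up for this CPU */
}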
>>>>>>> That's what I'm not sure about.
>>>>>>> If a broadcast SMI is sent before relocation, all CPUs will use
>>>>>>> the same SMBASE and, as a result, save their state in the same
>>>>>>> memory (even if it's the state after RESET/POWER ON), racing with /
>>>>>>> overwriting each other's saved state
>>>>>>
>>>>>> Good point!
>>>>>>
>>>>>>> and then on exit from the SMI they will all restore whatever state
>>>>>>> that race produced, so it seems like plainly the wrong thing to do.
>>>>>>>
>>>>>>> Are you sure that edk2 uses broadcast for SMI initialization, or does
>>>>>>> it use directed SMIs?
>>>>>>
>>>>>> Hmmm, you are right. The SmmRelocateBases() function in
>>>>>> "UefiCpuPkg/PiSmmCpuDxeSmm/PiSmmCpuDxeSmm.c" first relocates SMBASE for
>>>>>> each individual AP in succession, by sending SMI IPIs to them, one by
>>>>>> one. Then it does the same for the BSP, by sending itself a similar SMI
>>>>>> IPI.
>>>>>>
>>>>>> The above QEMU code is only exercised much later, after the SMBASE
>>>>>> relocation, with the SMM stack up and running. It is used when a
>>>>>> (preferably broadcast) SMI is triggered for firmware services (for
>>>>>> example, the UEFI variable services).
>>>>>>
>>>>>> So, you are right.
>>>>>>
>>>>>> Considering edk2 only, the difference shouldn't matter -- when this code
>>>>>> is invoked (via an IO port write), the SMBASEs will all have been
>>>>>> relocated.
>>>>>>
>>>>>> Also, I've been informed that on real hardware, writing to APM_CNT
>>>>>> triggers an SMI on all CPUs, regardless of the individual SMBASE values.
>>>>>> So I believe the above code should be closest to real hardware, and the
>>>>>> pre-patch code was only written in unicast form for SeaBIOS's sake.
>>>>>>
>>>>>> I think the code is safe above. If the guest injects an SMI via APM_CNT
>>>>>> after negotiating SMI broadcast, it should have not left any VCPUs
>>>>>> without an SMI handler by that time. edk2 / OVMF conforms to this. In
>>>>>> the future we can tweak this further, if necessary; we have 63 more
>>>>>> bits, and we'll be able to reject invalid combinations of bits.
>>>>>>
>>>>>> Do you feel that we should skip VCPUs whose SMBASE has not been changed
>>>>>> from the default?
>>>>>>
>>>>>> If so, doesn't that run the risk of missing a VCPU that has an actual
>>>>>> SMI handler installed, and it just happens to be placed at the default
>>>>>> SMBASE location?
>>>>>>
>>>>>> ... I wouldn't mind skipping VCPUs with apparently unrelocated SMBASEs,
>>>>>> but I think (a) that's not what real hardware does, and (b) it is risky
>>>>>> if a VCPU's actual SMI handler has been installed while keeping the
>>>>>> default SMBASE value.
>>>>>>
>>>>>> What do you think?
>>>>> (a) probably doesn't consider hotplug, so it's totally fine for the case
>>>>> where the firmware is the only one who wakes up / initializes CPUs.
>>>>> However, consider 2 CPUs being hotplugged and then a broadcast SMI
>>>>> happening to serve some other task (if there isn't an 'unpark' feature).
>>>>> Even if real hw does it, it's behavior not defined by the SDM, with
>>>>> possibly bad consequences.
>>>>>
>>>>> (b) alone is risky, so I'd skip based on a combination of
>>>>>
>>>>>     if (default SMBASE && CPU is in reset state)
>>>>>         skip;
>>>>>
>>>>> that should be safe, and it leaves open the possibility of avoiding
>>>>> adding an 'unpark' feature to the CPU.
>>>>
>>>> The above attributes ("SMBASE" and "CPU is in reset state") are specific
>>>> to x86 VCPUs. Should I use some dynamic QOM casting for this? Like the
>>>> X86_CPU() macro?
>>>>
>>>> Can I assume here that the macro will never return NULL? (I think so,
>>>> LPC is only used with x86.)
>>> yep, that should work.
>>>
>>>>
>>>> ... I guess SMBASE can be accessed as
>>>>
>>>> X86_CPU(cs)->env.smbase
>>> preferred style:
>>> X86CPU *cpu = X86_CPU(cs);
>>> cpu->...
>>>
>>>>
>>>> How do I figure out if the CPU is in reset state ("waiting for first
>>>> INIT") though? Note that this state should be distinguished from simply
>>>> "halted" (i.e., after a HLT). After a HLT, the SMI should be injected
>>>> and delivered; see commit a9bad65d2c1f. Edk2 regularly uses HLT to send
>>>> APs to sleep.
>>>
>>> how about using the EIP reset value?
>>>   x86_cpu_reset()
>>>       ...
>>>       env->eip = 0xfff0;
>>
>> Okay, so I tried this. It doesn't work.
>>
>> This is the code I wrote (extracting a new ich9_apm_broadcast_smi()
>> function from ich9_apm_ctrl_changed(), due to size reasons):
>>
>>> static void ich9_apm_broadcast_smi(void)
>>> {
>>>     CPUState *cs;
>>>
>>>     cpu_synchronize_all_states(); /* <--------- mark this */
> it probably should be executed after pause_all_vcpus(),
> maybe it will help.
>
>>>     CPU_FOREACH(cs) {
>>>         X86CPU *cpu = X86_CPU(cs);
>>>         CPUX86State *env = &cpu->env;
>>>
>>>         if (env->smbase == 0x30000 && env->eip == 0xfff0) {
>>>             CPUClass *k = CPU_GET_CLASS(cs);
>>>             uint64_t cpu_arch_id = k->get_arch_id(cs);
>>>
>>>             trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
>>>             continue;
>>>         }
>>>
>>>         cpu_interrupt(cs, CPU_INTERRUPT_SMI);
>>>     }
>>> }
>>
> [...]
>> (b) If I add the cpu_synchronize_all_states() call before the loop, then
>> the incorrect filter matches go away; QEMU sees the KVM VCPU states
>> correctly, and the SMI is broadcast.
>>
>> However, in this case, the boot slows to a *crawl*. It's unbearable. All
>> VCPUs are pegged at 100%, and you can easily follow the OVMF debug log
>> with the naked eye, almost line by line.
>> I didn't expect that cpu_synchronize_all_states() would be so costly,
>> but it makes the idea of checking VCPU state in the APM_CNT handler a
>> non-starter.
> I wonder where the bottleneck is that slows the guest down so much.
> What is the actual cost of calling cpu_synchronize_all_states()?
>
> Maybe calling pause_all_vcpus() before cpu_synchronize_all_states()
> would help.
If I prepend just a pause_all_vcpus() function call, at the top, then
the VM freezes (quite understandably) when the first SMI is raised via
APM_CNT.

If I, instead, bracket the function body with pause_all_vcpus() and
resume_all_vcpus(), like this:
static void ich9_apm_broadcast_smi(void)
{
    CPUState *cs;

    pause_all_vcpus();
    cpu_synchronize_all_states();
    CPU_FOREACH(cs) {
        X86CPU *cpu = X86_CPU(cs);
        CPUX86State *env = &cpu->env;

        if (env->smbase == 0x30000 && env->eip == 0xfff0) {
            CPUClass *k = CPU_GET_CLASS(cs);
            uint64_t cpu_arch_id = k->get_arch_id(cs);

            trace_ich9_apm_broadcast_smi_skip(cpu_arch_id);
            continue;
        }

        cpu_interrupt(cs, CPU_INTERRUPT_SMI);
    }
    resume_all_vcpus();
}
then I see the following symptoms:

- rather than 4 VCPUs being pegged at 100%, only one VCPU is pegged at
  100%

- the boot process, as observed from the OVMF log, is just as slow
  (= crawling) as without the VCPU pausing/resuming (i.e., like with
  cpu_synchronize_all_states() only); so ultimately the
  pausing/resuming doesn't help.
Thanks
Laszlo