qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH for-6.0 0/8] spapr: Address the confusion between IPI numbers


From: Greg Kurz
Subject: Re: [PATCH for-6.0 0/8] spapr: Address the confusion between IPI numbers and vCPU ids
Date: Mon, 23 Nov 2020 11:07:29 +0100

On Mon, 23 Nov 2020 09:04:42 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> On 11/20/20 6:46 PM, Greg Kurz wrote:
> > A regression was recently fixed in the sPAPR XIVE code for QEMU 5.2
> > RC3 [1]. It boiled down to a confusion between IPI numbers and vCPU
> > ids, which happen to be numerically equal in general, but are really
> > different entities that can diverge in some setups. When this happens,
> > we end up misconfiguring XIVE in a way that is almost fatal for the
> > guest.
> > 
> > The confusion comes from XICS which has historically assumed equality
> > between interrupt server numbers and vCPU ids, as mentionned in a
> > comment back from 2011 in the linux kernel icp_native_init_one_node()
> > function:
> > 
> >     /* This code does the theorically broken assumption that the interrupt
> >      * server numbers are the same as the hard CPU numbers.
> >      * This happens to be the case so far but we are playing with fire...
> >      * should be fixed one of these days. -BenH.
> >      */
> > 
> > This assumption crept into QEMU through the "ibm,interrupt-server-ranges"
> > property of the "interrupt-controller" node in the DT. This property
> > contains ranges of consecutive vCPU ids defined as (first id, # of ids).
> > In the case of QEMU, we define a single range starting from 0 up to the
> > highest vCPU id, as returned by spapr_max_server_number(). This has
> > always been associated to the "nr_servers" wording when naming variables
> > or function arguments. When XIVE got added, we introduced an sPAPR IRQ
> > abstraction to be able to control several interrupt controller backends.
> > The sPAPR IRQ base class provides a dt() handler used to populate the
> > "interrupt-controller" node in the DT. This handler takes an "nr_server"
> > argument inherited from XICS and we ended up using it to populate the
> > "ibm,xive-lisn-ranges" property used with XIVE, which has completely
> > different semantics. It contain ranges of interrupt numbers that the
> > guest can use for any purpose. Since one obvious purpose is IPI, its
> > first range should just be able to accomodate all possible vCPUs.
> 
> To clarify, PAPR says it is a requirement :
> 
>   "The first range will contain at least one per possible thread in the 
>    partition."
> 
> The regression showed that we were not initializing correctly this range 
> in QEMU/KVM. I an not even sure it had the correct size either since
> we were anyhow initializing 4096 IPIs.
> 

The bad thing was that each online vCPU would reset it's IPI in
KVM using a bogus IPI number (the vCPU id), and thus doesn't reset
the interrupt the guest is really going to use for the IPI.

> > In the case of QEMU, we define a single range starting from 0 up
> > to "nr_server" but we should rather size it to the number of vCPUs
> > actually (ie. smp.max_cpus).
> 
> ah. And so spapr_max_server_number(spapr) is crap ? This is starting
> to be complex to follow :/
>  

No. spapr_max_server_number(spapr) gives the highest vCPU id that
we end over to KVM in order to optimize VP id allocation in the HW.
But it definitely has nothing to do with "ibm,xive-lisn-ranges".

David suggested in some other mail that we could maybe pass
both spapr_max_server_number(spapr) and smp.max_cpus to the
activate() handler.

> > This series aims at getting rid of the "nr_server" argument in the
> > sPAPR IC handlers. Since both the highest possible vCPU id and the
> > maximum number of vCPUs are invariants for XICS and XIVE respectively,
> 
> What XIVE cares about is the number of possible IPIs and the number
> of vCPUs since we deduced from that the number of event queue 
> descriptors, which is another XIVE structure.
> 
> > let's make them device properties to be configured by the machine when
> > it creates the interrupt controllers and use them where needed.
> > 
> > This doesn't cause any visible change to setups using the default
> > VSMT machine settings. This changes "ibm,xive-lisn-ranges" for
> > setups that mess with VSMT, but this is acceptable since linux
> > only allocates one interrupt per vCPU and the higher part of the
> > range was never used.
> 
> This range is only used for vCPUs. 
> 
> C.
> 
> > [1] 
> > https://git.qemu.org/?p=qemu.git;a=commit;h=6d24795ee7e3199d199d3c415312c93382ad1807
> > 
> > Greg Kurz (8):
> >   spapr/xive: Turn some sanity checks into assertions
> >   spapr/xive: Introduce spapr_xive_nr_ends()
> >   spapr/xive: Add "nr-servers" property
> >   spapr/xive: Add "nr-ipis" property
> >   spapr/xics: Drop unused argument to xics_kvm_has_broken_disconnect()
> >   spapr/xics: Add "nr-servers" property
> >   spapr: Drop "nr_servers" argument of the sPAPR IC activate() operation
> >   spapr: Drop "nr_servers" argument of the sPAPR IC dt() operation
> > 
> >  include/hw/ppc/spapr.h      |  4 +--
> >  include/hw/ppc/spapr_irq.h  |  9 ++---
> >  include/hw/ppc/spapr_xive.h | 25 ++++++++++++-
> >  include/hw/ppc/xics_spapr.h | 23 +++++++++---
> >  hw/intc/spapr_xive.c        | 72 ++++++++++++++++++++++---------------
> >  hw/intc/spapr_xive_kvm.c    |  4 +--
> >  hw/intc/xics_kvm.c          |  4 +--
> >  hw/intc/xics_spapr.c        | 45 ++++++++++++++---------
> >  hw/ppc/spapr.c              |  7 ++--
> >  hw/ppc/spapr_irq.c          | 27 +++++++-------
> >  10 files changed, 141 insertions(+), 79 deletions(-)
> > 
> 




reply via email to

[Prev in Thread] Current Thread [Next in Thread]