Re: [PATCH for-6.0 2/8] spapr/xive: Introduce spapr_xive_nr_ends()


From: Greg Kurz
Subject: Re: [PATCH for-6.0 2/8] spapr/xive: Introduce spapr_xive_nr_ends()
Date: Tue, 24 Nov 2020 18:01:20 +0100

On Tue, 24 Nov 2020 14:54:38 +0100
Cédric Le Goater <clg@kaod.org> wrote:

> On 11/23/20 12:16 PM, Greg Kurz wrote:
> > On Mon, 23 Nov 2020 10:46:38 +0100
> > Cédric Le Goater <clg@kaod.org> wrote:
> > 
> >> On 11/20/20 6:46 PM, Greg Kurz wrote:
> >>> We're going to kill the "nr_ends" field in a subsequent patch.
> >>
> >> why? it is one of the tables of the controller and it's part of
> >> the main XIVE concepts. Conceptually, we could let the machine
> >> dimension it with an arbitrary value as OPAL does. The controller
> >> would fail when the table is fully used.
> >>
> > 
> > The idea is that the sPAPR machine's only true need is to create a
> > controller that can accommodate up to a certain number of vCPU ids.
> > It doesn't really need to know about the END itself IMHO.
> >
> > This being said, if we decide to pass both spapr_max_server_number()
> > and smp.max_cpus down to the backends as function arguments, we won't
> > have to change "nr_ends" at all.
> 
> I would prefer that but I am still not sure what they represent. 
> 
> Looking at the sPAPR XIVE code, we deal with numbers/ranges in the 
> following places today.
> 
>  * spapr_xive_dt() 
> 
>    It defines a range of interrupt numbers to be used by the guest 
>    for the thread/vCPU IPIs. It's a subset of the interrupt numbers
>    in:
> 
>               [ SPAPR_IRQ_IPI - SPAPR_IRQ_EPOW [
> 
>    These are not vCPU ids.
> 
>    Since these interrupt numbers will be considered as free to use
>    by the OS, it makes sense to pre-claim them. But claiming an 
>    interrupt number in the guest can potentially set up, through 
>    the KVM device, a mapping on the host and in HW. See below why
>    this can be a problem.
> 
>  * kvmppc_xive_cpu_connect()   
> 
>    This sizes the NVT tables in OPAL for the guest. This is the  
>    max number of vCPUs of the guest (not vCPU ids)
> 

I guess you're talking about KVM_DEV_XIVE_NR_SERVERS in
kvmppc_xive_connect() actually. We're currently passing
spapr_max_server_number() (vCPU id) but you might be
right.

I need to re-read the story around VSMT and XIVE.

commit 1e175d2e07c71d9574f5b1c74523abca54e2654f
Author: Sam Bobroff <sam.bobroff@au1.ibm.com>
Date:   Wed Jul 25 16:12:02 2018 +1000

    KVM: PPC: Book3S HV: Pack VCORE IDs to access full VCPU ID space
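
For context, here is a minimal sketch (not the actual patch) of where that
number lands on the KVM side. It assumes the KVM_DEV_XIVE_GRP_CTRL /
KVM_DEV_XIVE_NR_SERVERS device attribute and QEMU's kvm_device_access()
helper; the wrapper function itself is purely illustrative:

/*
 * Illustrative sketch only.  Whatever value we decide to pass down
 * (spapr_max_server_number(), i.e. a max vCPU id, or smp.max_cpus,
 * i.e. a max number of vCPUs) ends up sizing the NVT space the
 * hypervisor allocates for the guest.
 */
static int kvmppc_xive_set_nr_servers(SpaprXive *xive, uint32_t nr_servers,
                                      Error **errp)
{
    return kvm_device_access(xive->fd, KVM_DEV_XIVE_GRP_CTRL,
                             KVM_DEV_XIVE_NR_SERVERS, &nr_servers,
                             true, errp);
}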

>  * spapr_irq_init()
> 
>    This is where the IPI interrupt numbers are claimed today. 
>    Directly in QEMU and KVM if the machine is running XIVE only,
>    indirectly if it's dual: first in QEMU and then in KVM when
>    the machine switches interrupt mode at CAS.
> 
>    The problem is that the underlying XIVE resources in HW are
>    allocated where the QEMU process is running, which is not the
>    best option when the vCPUs are pinned on different chips.
> 
>    My patchset was trying to improve that by claiming the IPI on 
>    demand when the vCPU is connected to the KVM device. But it 
>    was using the vCPU id as the IPI interrupt number, which is
>    utterly wrong: the guest OS could use any number in the range
>    exposed in the DT.
>    
>    The last patch you sent was going in the right direction, I think,
>    that is, to claim the IPI when the guest OS requests it.
> 
>    
>    http://patchwork.ozlabs.org/project/qemu-devel/patch/160528045027.804522.6161091782230763832.stgit@bahia.lan/
>    
>    But I don't understand why it was so complex. It should be like
>    the MSIs claimed by PCI devices.
> 

The difference here is that the guest doesn't claim IPIs. They are
supposedly pre-claimed in "ibm,xive-lisn-ranges". And this is actually
the case in QEMU.

The IPI setup sequence in the guest is basically:
1) grab a free irq from the bitmap, i.e. "ibm,xive-lisn-ranges"
2) call H_INT_GET_SOURCE_INFO, i.e. populate_irq_data()
3) call H_INT_SET_SOURCE_CONFIG, i.e. configure_irq()

If we want an IPI to be claimed by the appropriate vCPU, we
can only do this from under H_INT_SET_SOURCE_CONFIG. And
until the guest eventually configures the IPI, KVM and QEMU
are out of sync.
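
To make the ordering concrete, here is a rough guest-side view of that
sequence; all helper names (alloc_irq_from_lisn_ranges(),
h_int_get_source_info(), h_int_set_source_config()) are made up for
illustration and don't match the real guest or QEMU code:

/*
 * Rough guest-side view of the sequence above.  Only the hcall
 * ordering matters here.
 */
static int guest_setup_ipi(uint32_t target, uint8_t prio)
{
    uint32_t lisn;

    /* 1) pick a free interrupt number from "ibm,xive-lisn-ranges" */
    lisn = alloc_irq_from_lisn_ranges();

    /* 2) H_INT_GET_SOURCE_INFO: retrieve the ESB pages of the source */
    if (h_int_get_source_info(lisn)) {
        return -1;
    }

    /*
     * 3) H_INT_SET_SOURCE_CONFIG: route the source to a target vCPU
     *    and priority.  This is the first point where QEMU/KVM learns
     *    which vCPU the IPI belongs to, hence the idea of claiming it
     *    from this hcall.
     */
    return h_int_set_source_config(lisn, target, prio);
}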

This complicates migration because we have to guess at post-load
whether we should claim the IPI in KVM or not. The simple presence
of the vCPU isn't enough: we need to know whether the guest actually
configured the IPI.
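
A sketch of what that post-load guessing could look like;
spapr_xive_ipi_is_configured() and claim_ipi_in_kvm() are hypothetical
stand-ins for whatever state check and claim path the real fix uses:

/*
 * Hypothetical post-load pass: only re-claim an IPI in KVM if the
 * guest had actually configured it before migration.
 */
static void claim_configured_ipis(SpaprXive *xive, uint32_t nr_ipis)
{
    uint32_t lisn;

    for (lisn = 0; lisn < nr_ipis; lisn++) {
        /*
         * The mere presence of the vCPU isn't enough; check whether
         * the source was routed by H_INT_SET_SOURCE_CONFIG.
         */
        if (spapr_xive_ipi_is_configured(xive, lisn)) {
            claim_ipi_in_kvm(xive, lisn);
        }
    }
}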

> 
> All this to say that we need to better size the range in the
> "ibm,xive-lisn-ranges" property if it's broken for vSMT.
> 

Sizing the range to smp.max_cpus as proposed in this series
is fine, no matter what the VSMT is.

> Then, I think the IPIs can be treated just like the PCI MSIs
> but they need to be claimed first. That's the ugly part. 
> 

Yeah, that's the big difference. For PCI MSIs, QEMU owns the
bitmap and the guest can claim (or release) a number of
MSIs through the "ibm,change-msi" RTAS interface. There's no
such thing for IPIs: they are supposedly already claimed.

> Should we add a special check in h_int_set_source_config to
> deal with unclaimed IPIs that are being configured?
> 

This is what my tentative fix does.
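
For the record, the shape of such a check could be something like the
sketch below, called from the H_INT_SET_SOURCE_CONFIG handler;
lisn_is_ipi(), spapr_xive_ipi_is_claimed() and spapr_xive_claim_ipi()
are hypothetical helpers, not the actual code of the fix:

/*
 * Hypothetical check for the H_INT_SET_SOURCE_CONFIG path: IPIs are
 * never explicitly claimed by the guest, so claim them in KVM the
 * first time the guest routes them to a vCPU.
 */
static int claim_ipi_if_needed(SpaprXive *xive, uint32_t lisn,
                               uint32_t target, Error **errp)
{
    /* PCI MSIs are already claimed through "ibm,change-msi" */
    if (!lisn_is_ipi(xive, lisn)) {
        return 0;
    }

    if (!spapr_xive_ipi_is_claimed(xive, lisn)) {
        /* 'target' tells us which vCPU the IPI belongs to */
        return spapr_xive_claim_ipi(xive, lisn, target, errp);
    }

    return 0;
}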

> 
> C.



