From: Cédric Le Goater
Subject: Re: [Qemu-ppc] [PATCH v3 04/35] spapr/xive: introduce a XIVE interrupt controller for sPAPR
Date: Fri, 4 May 2018 15:05:08 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2
On 05/04/2018 05:33 AM, David Gibson wrote:
> On Thu, May 03, 2018 at 06:50:09PM +0200, Cédric Le Goater wrote:
>> On 05/03/2018 07:22 AM, David Gibson wrote:
>>> On Thu, Apr 26, 2018 at 12:43:29PM +0200, Cédric Le Goater wrote:
>>>> On 04/26/2018 06:20 AM, David Gibson wrote:
>>>>> On Tue, Apr 24, 2018 at 11:46:04AM +0200, Cédric Le Goater wrote:
>>>>>> On 04/24/2018 08:51 AM, David Gibson wrote:
>>>>>>> On Thu, Apr 19, 2018 at 02:43:00PM +0200, Cédric Le Goater wrote:
>>>>>>>> sPAPRXive is a model for the XIVE interrupt controller device of the
>>>>>>>> sPAPR machine. It holds the XIVE routing table, the Interrupt
>>>>>>>> Virtualization Entry (IVE) table, which associates interrupt source
>>>>>>>> numbers with targets.
>>>>>>>>
>>>>>>>> Also extend the XiveFabric with an accessor to the IVT. This will be
>>>>>>>> needed by the routing algorithm.
>>>>>>>>
>>>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Maybe we should introduce a XiveRouter model to hold the IVT. To be
>>>>>>>> discussed.
>>>>>>>
>>>>>>> Yeah, maybe. Am I correct in thinking that on pnv there could be more
>>>>>>> than one XiveRouter?
>>>>>>
>>>>>> There is only one, the main IC.
>>>>>
>>>>> Ok, that's what I thought originally. In that case some of the stuff
>>>>> in the patches really doesn't make sense to me.
>>>>
>>>> well, there is one IC per chip on powernv, but we haven't reached that
>>>> part yet.
>>>
>>> Hmm. There are some things we can delay dealing with, but I don't think
>>> this is one of them. I think we need to understand how multichip is
>>> going to work in order to come up with a sane architecture. Otherwise
>>> I fear we'll end up with something that we either need to horribly
>>> bastardize for multichip, or have to rework things dramatically
>>> leading to migration nightmares.
>>
>> It is all controlled by MMIO, so we should be fine on that part.
>> As for the internal tables, they are all configured by firmware, using
>> a chip identifier (block). I need to check how the remote XIVEs are
>> accessed. I think this is by MMIO.
>
> Right, but for powernv we execute OPAL inside the VM, rather than
> emulating its effects. So we still need to model the actual hardware
> interfaces. OPAL hides the details from the kernel, but not from us
> on the other side.
Yes. This is the case in the current model. I took a look today and
I have a few fixes for the MMIO layout of P9 chips which I will send.
As for XIVE, the model needs to be a little more complex to support
VSD_MODE_FORWARD tables, which describe how to forward a notification
to another XIVE IC on another chip. They contain an address on which
to load; this is another hop in the notification chain.
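To make the extra hop concrete, here is a minimal sketch of what the model
could look like. All names (XiveVsd, VSD_MODE_FORWARD values, xive_ic_load)
and the field layout are hypothetical, for illustration only, not the actual
QEMU model or the P9 register layout:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum {
    VSD_MODE_SHARED    = 1,
    VSD_MODE_EXCLUSIVE = 2,
    VSD_MODE_FORWARD   = 3,
};

typedef struct XiveVsd {
    uint8_t  mode;   /* exclusive (local table) or forward */
    uint64_t addr;   /* table base, or the MMIO address to load from
                      * when mode == VSD_MODE_FORWARD */
} XiveVsd;

/* Stand-in for an MMIO load on the remote IC: the load itself is the
 * notification; the returned value is ignored. */
static int remote_hops;

static uint64_t xive_ic_load(uint64_t addr)
{
    (void)addr;
    remote_hops++;
    return 0;
}

/* Notify for a given block: a forward VSD adds another hop in the
 * notification chain by loading on the remote chip's IC. */
static bool xive_notify(const XiveVsd *vsd)
{
    if (vsd->mode == VSD_MODE_FORWARD) {
        xive_ic_load(vsd->addr);  /* hop to the remote XIVE IC */
        return false;             /* not handled locally */
    }
    return true;                  /* local routing continues */
}
```

So a notification for a remote block never touches the local tables; it is
simply turned into a load on the remote IC, which then routes it locally.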
>> I haven't looked at multichip XIVE support but I am not too worried as
>> the framework is already in place for the machine.
>>
>>>>>>> If we did have a XiveRouter, I'm not sure we'd need the XiveFabric
>>>>>>> interface, possibly its methods could just be class methods of
>>>>>>> XiveRouter.
>>>>>>
>>>>>> Yes. We could introduce a XiveRouter to share the IVT between
>>>>>> the sPAPRXive and the PnvXIVE models, the interrupt controllers of
>>>>>> the machines. Methods would provide ways to get the IVE/EQ/NVT
>>>>>> objects required for routing. I need to add a set_eq() to push the
>>>>>> EQ data.
>>>>>
>>>>> Hrm. Well, to add some more clarity, let's say the XiveRouter is the
>>>>> object which owns the IVT.
>>>>
>>>> OK, that would be a model with some state and not an interface.
>>>
>>> Yes. For the sPAPR variant it would have the whole IVT contents as its
>>> state. For the powernv one, just the registers telling it where to find
>>> the IVT in RAM.
>>>
>>>>> It may or may not do other stuff as well.
>>>>
>>>> Its only task would be to do the final event routing: get the IVE,
>>>> get the EQ, push the EQ DATA in the OS event queue, notify the CPU.
>>>
>>> That seems like a lot of steps. Up to push the EQ DATA, certainly.
>>> And I guess it'll have to ping an NVT somehow, but I'm not sure it
>>> should know about CPUs as such.
>>
>> For PowerNV, the concept could be generalized, yes. An NVT can
>> contain the interrupt state of a logical server, but the common
>> case for QEMU is baremetal without guests, and so we have an NVT
>> per CPU.
>
> Hmm. We eventually want to support a kernel running guests under
> qemu/powernv though, right?
argh, an emulated hypervisor! OK, let's say this is a long-term goal :)
> So even if we don't allow it right now,
> we don't want allowing that to require major surgery to our
> architecture.
That I agree on.
>> PowerNV will have some limitation but we can make it better than
>> today for sure. It boots.
>>
>> We can improve some of the NVT notification process, the way NVTs
>> are matched eventually, and maybe support remote engines if the
>> NVT is not local. I have not looked at the details.
>>
>>> I'm not sure at this stage what should own the EQD table.
>>
>> The EQDT is in RAM.
>
> Not for spapr, it's not.
yeah ok. It's in QEMU/KVM.
> And even when it is in RAM, something needs
> to own the register that gives its base address.
It's more complex than registers on powernv. There is a procedure
to define the XIVE tables using XIVE table descriptors, which contain
their characteristics: size, direct vs. indirect, local vs. remote.
OPAL/skiboot defines all of these to configure the HW, and the model
necessarily needs to support the same interface. This is the case
for a single chip.
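Roughly, the procedure could be modeled along these lines. This is a rough
sketch only; XiveVstDesc, xive_ic_set_vst, the table-type names and the
field layout are made-up for illustration, not skiboot's or the model's
actual interface:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Firmware describes each XIVE table with a descriptor per table
 * type and per block (chip). */
enum { VST_IVT, VST_EQDT, VST_NVTT, VST_MAX };   /* table types */
#define XIVE_MAX_BLOCKS 16

typedef struct XiveVstDesc {
    uint64_t base;      /* table base address in RAM */
    uint64_t size;      /* table size */
    bool     indirect;  /* direct vs. indirect layout */
    bool     remote;    /* local vs. remote (forwarded) */
} XiveVstDesc;

typedef struct XiveIc {
    /* one descriptor per table type and per block */
    XiveVstDesc vst[VST_MAX][XIVE_MAX_BLOCKS];
} XiveIc;

/* What a write to the modeled configuration registers would record:
 * where the table for this type/block lives and how it is laid out. */
static void xive_ic_set_vst(XiveIc *ic, int type, int blk,
                            XiveVstDesc desc)
{
    assert(type < VST_MAX && blk < XIVE_MAX_BLOCKS);
    ic->vst[type][blk] = desc;
}
```

The point is that the model keeps one descriptor per (table type, block)
pair, which is what the firmware-driven configuration sequence fills in.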
C.
>>> In the multichip case is there one EQD table for every IVT?
>>
>> There is one EQDT per chip, same for the IVT. They are in RAM,
>> identified with a block ID.
>>
>>> I'm guessing
>>> not - I figure the EQD table must be effectively global so that any
>>> chip's router can send events to any EQ in the whole system.
>>>>>> Now IIUC, on pnv the IVT lives in main system memory.
>>>>
>>>> yes. It is allocated by skiboot in RAM and fed to the HW using some
>>>> IC configuration registers. Then, each entry is configured with OPAL
>>>> calls and the HW is updated using cache scrub registers.
>>>
>>> Right. At least for the first pass we should be able to treat the
>>> cache scrub registers as no-ops and just not cache anything in the
>>> qemu implementation.
>>
>> The model currently supports the cache scrub registers; we need them
>> to update some values. It's not too complex.
>
> Ok.
>
>>>>> Under PAPR is the IVT in guest memory, or is it outside (updated by
>>>>> hypercalls/rtas)?
>>>>
>>>> Under sPAPR, the IVT is updated by the H_INT_SET_SOURCE_CONFIG hcall
>>>> which configures the targeting of an IRQ. It's not in the guest
>>>> memory.
>>>
>>> Right.
>>>
>>>> Under the hood, the IVT is still configured by OPAL under KVM and
>>>> by QEMU when kernel_irqchip=off
>>>
>>> Sure. Even with kernel_irqchip=on there's still logically a guest IVT
>>> (or "IVT view", I guess), even if its actual entries are stored
>>> distributed across various places in the host's IVTs.
>>
>> yes. The XIVE KVM device caches the info. This is used to dump the
>> state without doing OPAL calls.
>>
>> C.
>>
>>
>>>>>> The XiveRouter would also be a XiveFabric (or some other name) to
>>>>>> let the internal sources of the interrupt controller forward events.
>>>>>
>>>>> The further we go here, the less sure I am that XiveFabric even makes
>>>>> sense as a concept.
>>>>
>>>> See previous email.
>>>
>>
>