[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt p
From: |
Cédric Le Goater |
Subject: |
Re: [Qemu-ppc] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt presenter model |
Date: |
Wed, 9 May 2018 09:27:17 +0200 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2 |
On 05/05/2018 06:27 AM, David Gibson wrote:
> On Fri, May 04, 2018 at 03:11:57PM +0200, Cédric Le Goater wrote:
>> On 05/04/2018 06:51 AM, David Gibson wrote:
>>> On Thu, May 03, 2018 at 06:06:14PM +0200, Cédric Le Goater wrote:
>>>> On 05/03/2018 07:35 AM, David Gibson wrote:
>>>>> On Thu, Apr 26, 2018 at 11:27:21AM +0200, Cédric Le Goater wrote:
>>>>>> On 04/26/2018 09:11 AM, David Gibson wrote:
>>>>>>> On Thu, Apr 19, 2018 at 02:43:02PM +0200, Cédric Le Goater wrote:
>>>>>>>> The XIVE presenter engine uses a set of registers to handle priority
>>>>>>>> management and interrupt acknowledgment among other things. The most
>>>>>>>> important ones being :
>>>>>>>>
>>>>>>>> - Interrupt Priority Register (PIPR)
>>>>>>>> - Interrupt Pending Buffer (IPB)
>>>>>>>> - Current Processor Priority (CPPR)
>>>>>>>> - Notification Source Register (NSR)
>>>>>>>>
>>>>>>>> There is one set of registers per level of privilege, four in all :
>>>>>>>> HW, HV pool, OS and User. These are called rings. All registers are
>>>>>>>> accessible through a specific MMIO region called the Thread Interrupt
>>>>>>>> Management Areas (TIMA) but, depending on the privilege level of the
>>>>>>>> CPU, the view of the TIMA is filtered. The sPAPR machine runs at the
>>>>>>>> OS privilege and therefore can only accesses the OS and the User
>>>>>>>> rings. The others are for hypervisor levels.
>>>>>>>>
>>>>>>>> The CPU interrupt state is modeled with a XiveNVT object which stores
>>>>>>>> the values of the different registers. The different TIMA views are
>>>>>>>> mapped at the same address for each CPU and 'current_cpu' is used to
>>>>>>>> retrieve the XiveNVT holding the ring registers.
>>>>>>>>
>>>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>>>>>> ---
>>>>>>>>
>>>>>>>> Changes since v2 :
>>>>>>>>
>>>>>>>> - introduced the XiveFabric interface
>>>>>>>>
>>>>>>>> hw/intc/spapr_xive.c | 25 ++++
>>>>>>>> hw/intc/xive.c | 279
>>>>>>>> ++++++++++++++++++++++++++++++++++++++++++++
>>>>>>>> include/hw/ppc/spapr_xive.h | 5 +
>>>>>>>> include/hw/ppc/xive.h | 31 +++++
>>>>>>>> include/hw/ppc/xive_regs.h | 84 +++++++++++++
>>>>>>>> 5 files changed, 424 insertions(+)
>>>>>>>>
>>>>>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>>>>>> index 90cde8a4082d..f07832bf0a00 100644
>>>>>>>> --- a/hw/intc/spapr_xive.c
>>>>>>>> +++ b/hw/intc/spapr_xive.c
>>>>>>>> @@ -13,6 +13,7 @@
>>>>>>>> #include "target/ppc/cpu.h"
>>>>>>>> #include "sysemu/cpus.h"
>>>>>>>> #include "monitor/monitor.h"
>>>>>>>> +#include "hw/ppc/spapr.h"
>>>>>>>> #include "hw/ppc/spapr_xive.h"
>>>>>>>> #include "hw/ppc/xive.h"
>>>>>>>> #include "hw/ppc/xive_regs.h"
>>>>>>>> @@ -95,6 +96,22 @@ static void spapr_xive_realize(DeviceState *dev,
>>>>>>>> Error **errp)
>>>>>>>>
>>>>>>>> /* Allocate the Interrupt Virtualization Table */
>>>>>>>> xive->ivt = g_new0(XiveIVE, xive->nr_irqs);
>>>>>>>> +
>>>>>>>> + /* The Thread Interrupt Management Area has the same address for
>>>>>>>> + * each chip. On sPAPR, we only need to expose the User and OS
>>>>>>>> + * level views of the TIMA.
>>>>>>>> + */
>>>>>>>> + xive->tm_base = XIVE_TM_BASE;
>>>>>>>
>>>>>>> The constant should probably have PAPR in the name somewhere, since
>>>>>>> it's just for PAPR machines (same for the ESB mappings, actually).
>>>>>>
>>>>>> ok.
>>>>>>
>>>>>> I have also made 'tm_base' a property, like 'vc_base' for ESBs, in
>>>>>> case we want to change the value when the guest is instantiated.
>>>>>> I doubt it but this is an address in the global address space, so
>>>>>> letting the machine have control is better I think.
>>>>>
>>>>> I agree.
>>>>>
>>>>>>>> +
>>>>>>>> + memory_region_init_io(&xive->tm_mmio_user, OBJECT(xive),
>>>>>>>> + &xive_tm_user_ops, xive, "xive.tima.user",
>>>>>>>> + 1ull << TM_SHIFT);
>>>>>>>> + sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_mmio_user);
>>>>>>>> +
>>>>>>>> + memory_region_init_io(&xive->tm_mmio_os, OBJECT(xive),
>>>>>>>> + &xive_tm_os_ops, xive, "xive.tima.os",
>>>>>>>> + 1ull << TM_SHIFT);
>>>>>>>> + sysbus_init_mmio(SYS_BUS_DEVICE(dev), &xive->tm_mmio_os);
>>>>>>>> }
>>>>>>>>
>>>>>>>> static XiveIVE *spapr_xive_get_ive(XiveFabric *xf, uint32_t lisn)
>>>>>>>> @@ -104,6 +121,13 @@ static XiveIVE *spapr_xive_get_ive(XiveFabric
>>>>>>>> *xf, uint32_t lisn)
>>>>>>>> return lisn < xive->nr_irqs ? &xive->ivt[lisn] : NULL;
>>>>>>>> }
>>>>>>>>
>>>>>>>> +static XiveNVT *spapr_xive_get_nvt(XiveFabric *xf, uint32_t server)
>>>>>>>> +{
>>>>>>>> + PowerPCCPU *cpu = spapr_find_cpu(server);
>>>>>>>> +
>>>>>>>> + return cpu ? XIVE_NVT(cpu->intc) : NULL;
>>>>>>>> +}
>>>>>>>
>>>>>>> So this is a bit of a tangent, but I've been thinking of implementing
>>>>>>> a scheme where there's an opaque pointer in the cpu structure for the
>>>>>>> use of the machine. I'm planning for that to replace the intc pointer
>>>>>>> (which isn't really used directly by the cpu). That would allow us to
>>>>>>> have spapr put a structure there and have both xics and xive pointers
>>>>>>> which could be useful later on.
>>>>>>
>>>>>> ok. That should simplify the patchset at the end, in which we need to
>>>>>> switch the 'intc' pointer.
>>>>>>
>>>>>>> I think we'd need something similar to correctly handle migration of
>>>>>>> the VPA state, which is currently horribly broken.
>>>>>>>
>>>>>>>> +
>>>>>>>> static const VMStateDescription vmstate_spapr_xive_ive = {
>>>>>>>> .name = TYPE_SPAPR_XIVE "/ive",
>>>>>>>> .version_id = 1,
>>>>>>>> @@ -143,6 +167,7 @@ static void spapr_xive_class_init(ObjectClass
>>>>>>>> *klass, void *data)
>>>>>>>> dc->vmsd = &vmstate_spapr_xive;
>>>>>>>>
>>>>>>>> xfc->get_ive = spapr_xive_get_ive;
>>>>>>>> + xfc->get_nvt = spapr_xive_get_nvt;
>>>>>>>> }
>>>>>>>>
>>>>>>>> static const TypeInfo spapr_xive_info = {
>>>>>>>> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
>>>>>>>> index dccad0318834..5691bb9474e4 100644
>>>>>>>> --- a/hw/intc/xive.c
>>>>>>>> +++ b/hw/intc/xive.c
>>>>>>>> @@ -14,7 +14,278 @@
>>>>>>>> #include "sysemu/cpus.h"
>>>>>>>> #include "sysemu/dma.h"
>>>>>>>> #include "monitor/monitor.h"
>>>>>>>> +#include "hw/ppc/xics.h" /* for ICP_PROP_CPU */
>>>>>>>> #include "hw/ppc/xive.h"
>>>>>>>> +#include "hw/ppc/xive_regs.h"
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * XIVE Interrupt Presenter
>>>>>>>> + */
>>>>>>>> +
>>>>>>>> +static uint64_t xive_nvt_accept(XiveNVT *nvt)
>>>>>>>> +{
>>>>>>>> + return 0;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +static void xive_nvt_set_cppr(XiveNVT *nvt, uint8_t cppr)
>>>>>>>> +{
>>>>>>>> + if (cppr > XIVE_PRIORITY_MAX) {
>>>>>>>> + cppr = 0xff;
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + nvt->ring_os[TM_CPPR] = cppr;
>>>>>>>
>>>>>>> Surely this needs to recheck if we should be interrupting the cpu?
>>>>>>
>>>>>> yes. In patch 9, when we introduce the nvt notify routine.
>>>>>
>>>>> Ok.
>>>>>
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +/*
>>>>>>>> + * OS Thread Interrupt Management Area MMIO
>>>>>>>> + */
>>>>>>>> +static uint64_t xive_tm_read_special(XiveNVT *nvt, hwaddr offset,
>>>>>>>> + unsigned size)
>>>>>>>> +{
>>>>>>>> + uint64_t ret = -1;
>>>>>>>> +
>>>>>>>> + if (offset == TM_SPC_ACK_OS_REG && size == 2) {
>>>>>>>> + ret = xive_nvt_accept(nvt);
>>>>>>>> + } else {
>>>>>>>> + qemu_log_mask(LOG_GUEST_ERROR, "XIVE: invalid TIMA read @%"
>>>>>>>> + HWADDR_PRIx" size %d\n", offset, size);
>>>>>>>> + }
>>>>>>>> +
>>>>>>>> + return ret;
>>>>>>>> +}
>>>>>>>> +
>>>>>>>> +#define TM_RING(offset) ((offset) & 0xf0)
>>>>>>>> +
>>>>>>>> +static uint64_t xive_tm_os_read(void *opaque, hwaddr offset,
>>>>>>>> + unsigned size)
>>>>>>>> +{
>>>>>>>> + PowerPCCPU *cpu = POWERPC_CPU(current_cpu);
>>>>>>>
>>>>>>> So, as I said on a previous version of this, we can actually correctly
>>>>>>> represent different mappings in different cpu spaces, by exploiting
>>>>>>> cpu->as and not just having them all point to &address_space_memory.
>>>>>>
>>>>>> Yes, you did and I haven't studied the question yet. For the next
>>>>>> version.
>>>>>
>>>>> So, it's possible that using the cpu->as thing will be more trouble
>>>>> that it's worth.
>>>>
>>>> One of the trouble is the number of memory regions to use, one per cpu,
>>>
>>> Well, we're already going to have an NVT object for each cpu, yes? So
>>> a memory region per-cpu doesn't seem like a big stretch.
>>>
>>>> and the KVM support.
>>>
>>> And I really don't see how the memory regions impacts KVM.
>>
>> The TIMA is setup when the KVM device is initialized using some specific
>> ioctl to get an fd on a MMIO region from the host. It is then passed to
>> the guest as a 'ram_device', same for the ESBs.
>
> Ah, good point.
>
>> This is not a common region.
>
> I'm not sure what you mean by that.
I meant by that 'out of the ordinary', 'unusual'. Specially when under KVM.
C.
>>>> Having a single region is much easier.
>>>>
>>>>> I am a little concerned about using current_cpu though.
>>>>> First, will it work with KVM with kernel_irqchip=off - the
>>>>> cpus are running truly concurrently,
>>>>
>>>> FWIW, I didn't see any issue yet while stressing.
>>>
>>> Ok.
>>>
>>>>> but we still need to work out who's poking at the TIMA.
>>>>
>>>> I understand. The registers are accessed by the current cpu to set the
>>>> CPPR and to ack an interrupt. But when we route an event, we also access
>>>> and modify the registers. Do you suggest some locking ? I am not sure
>>>> how are protected the TIMA region accesses vs. the routing, which is
>>>> necessarily initiated by an ESB MMIO though.
>>>
>>> Locking isn't really the issue. I mean, we do need locking, but the
>>> BQL should provide that. The issue is what exactly does "current"
>>> mean in the context of multiple concurrently running cpus. Does it
>>> always mean what we need it to mean in every context we might call
>>> this from.
>>
>> I would say so.
>
> Ok.
>
Re: [Qemu-ppc] [PATCH v3 06/35] spapr/xive: introduce a XIVE interrupt presenter model, David Gibson, 2018/05/03