From: Cédric Le Goater
Subject: Re: [Qemu-devel] [PATCH v7 09/19] spapr: add device tree support for the XIVE exploitation mode
Date: Wed, 12 Dec 2018 08:37:26 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:60.0) Gecko/20100101 Thunderbird/60.3.1
On 12/12/18 1:19 AM, David Gibson wrote:
> On Tue, Dec 11, 2018 at 10:06:46AM +0100, Cédric Le Goater wrote:
>> On 12/11/18 1:38 AM, David Gibson wrote:
>>> On Mon, Dec 10, 2018 at 08:53:17AM +0100, Cédric Le Goater wrote:
>>>> On 12/10/18 7:39 AM, David Gibson wrote:
>>>>> On Sun, Dec 09, 2018 at 08:46:00PM +0100, Cédric Le Goater wrote:
>>>>>> The XIVE interface for the guest is described in the device tree under
>>>>>> the "interrupt-controller" node. A couple of new properties are
>>>>>> specific to XIVE:
>>>>>>
>>>>>> - "reg"
>>>>>>
>>>>>> contains the base address and size of the thread interrupt
>>>>>> management areas (TIMA), for the User level and for the Guest OS
>>>>>> level. Only the Guest OS level is taken into account today.
>>>>>>
>>>>>> - "ibm,xive-eq-sizes"
>>>>>>
>>>>>> the sizes of the event queues. One cell per supported size,
>>>>>> containing the log2 of the size, in ascending order.
>>>>>>
>>>>>> - "ibm,xive-lisn-ranges"
>>>>>>
>>>>>> the interrupt number ranges assigned to the guest for the IPIs.
>>>>>>
>>>>>> and also under the root node:
>>>>>>
>>>>>> - "ibm,plat-res-int-priorities"
>>>>>>
>>>>>> contains a list of priorities that the hypervisor has reserved for
>>>>>> its own use. OPAL uses the priority 7 queue to automatically
>>>>>> escalate interrupts for all other queues (DD2.X POWER9). So only
>>>>>> priorities [0..6] are allowed for the guest.
>>>>>>
>>>>>> Extend the sPAPR IRQ backend with a new handler to populate the DT
>>>>>> with the appropriate "interrupt-controller" node.
>>>>>>
>>>>>> Signed-off-by: Cédric Le Goater <address@hidden>
>>>>>> ---
>>>>>> include/hw/ppc/spapr_irq.h | 2 ++
>>>>>> include/hw/ppc/spapr_xive.h | 2 ++
>>>>>> include/hw/ppc/xics.h | 4 +--
>>>>>> hw/intc/spapr_xive.c | 64 +++++++++++++++++++++++++++++++++++++
>>>>>> hw/intc/xics_spapr.c | 3 +-
>>>>>> hw/ppc/spapr.c | 3 +-
>>>>>> hw/ppc/spapr_irq.c | 3 ++
>>>>>> 7 files changed, 77 insertions(+), 4 deletions(-)
>>>>>>
>>>>>> diff --git a/include/hw/ppc/spapr_irq.h b/include/hw/ppc/spapr_irq.h
>>>>>> index 23cdb51b879e..e51e9f052f63 100644
>>>>>> --- a/include/hw/ppc/spapr_irq.h
>>>>>> +++ b/include/hw/ppc/spapr_irq.h
>>>>>> @@ -39,6 +39,8 @@ typedef struct sPAPRIrq {
>>>>>> void (*free)(sPAPRMachineState *spapr, int irq, int num);
>>>>>> qemu_irq (*qirq)(sPAPRMachineState *spapr, int irq);
>>>>>> void (*print_info)(sPAPRMachineState *spapr, Monitor *mon);
>>>>>> + void (*dt_populate)(sPAPRMachineState *spapr, uint32_t nr_servers,
>>>>>> + void *fdt, uint32_t phandle);
>>>>>> } sPAPRIrq;
>>>>>>
>>>>>> extern sPAPRIrq spapr_irq_xics;
>>>>>> diff --git a/include/hw/ppc/spapr_xive.h b/include/hw/ppc/spapr_xive.h
>>>>>> index 9506a8f4d10a..728a5e8dc163 100644
>>>>>> --- a/include/hw/ppc/spapr_xive.h
>>>>>> +++ b/include/hw/ppc/spapr_xive.h
>>>>>> @@ -45,5 +45,7 @@ qemu_irq spapr_xive_qirq(sPAPRXive *xive, uint32_t lisn);
>>>>>> typedef struct sPAPRMachineState sPAPRMachineState;
>>>>>>
>>>>>> void spapr_xive_hcall_init(sPAPRMachineState *spapr);
>>>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>>>> +                   uint32_t phandle);
>>>>>>
>>>>>> #endif /* PPC_SPAPR_XIVE_H */
>>>>>> diff --git a/include/hw/ppc/xics.h b/include/hw/ppc/xics.h
>>>>>> index 9958443d1984..14afda198cdb 100644
>>>>>> --- a/include/hw/ppc/xics.h
>>>>>> +++ b/include/hw/ppc/xics.h
>>>>>> @@ -181,8 +181,6 @@ typedef struct XICSFabricClass {
>>>>>> ICPState *(*icp_get)(XICSFabric *xi, int server);
>>>>>> } XICSFabricClass;
>>>>>>
>>>>>> -void spapr_dt_xics(int nr_servers, void *fdt, uint32_t phandle);
>>>>>> -
>>>>>> ICPState *xics_icp_get(XICSFabric *xi, int server);
>>>>>>
>>>>>> /* Internal XICS interfaces */
>>>>>> @@ -204,6 +202,8 @@ void icp_resend(ICPState *ss);
>>>>>>
>>>>>> typedef struct sPAPRMachineState sPAPRMachineState;
>>>>>>
>>>>>> +void spapr_dt_xics(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>>>> +                   uint32_t phandle);
>>>>>> int xics_kvm_init(sPAPRMachineState *spapr, Error **errp);
>>>>>> void xics_spapr_init(sPAPRMachineState *spapr);
>>>>>>
>>>>>> diff --git a/hw/intc/spapr_xive.c b/hw/intc/spapr_xive.c
>>>>>> index 982ac6e17051..a6d854b07690 100644
>>>>>> --- a/hw/intc/spapr_xive.c
>>>>>> +++ b/hw/intc/spapr_xive.c
>>>>>> @@ -14,6 +14,7 @@
>>>>>> #include "target/ppc/cpu.h"
>>>>>> #include "sysemu/cpus.h"
>>>>>> #include "monitor/monitor.h"
>>>>>> +#include "hw/ppc/fdt.h"
>>>>>> #include "hw/ppc/spapr.h"
>>>>>> #include "hw/ppc/spapr_xive.h"
>>>>>> #include "hw/ppc/xive.h"
>>>>>> @@ -1381,3 +1382,66 @@ void spapr_xive_hcall_init(sPAPRMachineState *spapr)
>>>>>> spapr_register_hypercall(H_INT_SYNC, h_int_sync);
>>>>>> spapr_register_hypercall(H_INT_RESET, h_int_reset);
>>>>>> }
>>>>>> +
>>>>>> +void spapr_dt_xive(sPAPRMachineState *spapr, uint32_t nr_servers, void *fdt,
>>>>>> +                   uint32_t phandle)
>>>>>> +{
>>>>>> + sPAPRXive *xive = spapr->xive;
>>>>>> + int node;
>>>>>> + uint64_t timas[2 * 2];
>>>>>> + /* Interrupt number ranges for the IPIs */
>>>>>> + uint32_t lisn_ranges[] = {
>>>>>> + cpu_to_be32(0),
>>>>>> + cpu_to_be32(nr_servers),
>>>>>> + };
>>>>>> + uint32_t eq_sizes[] = {
>>>>>> + cpu_to_be32(12), /* 4K */
>>>>>> + cpu_to_be32(16), /* 64K */
>>>>>> + cpu_to_be32(21), /* 2M */
>>>>>> + cpu_to_be32(24), /* 16M */
>>>>>
>>>>> For KVM, are we going to need to clamp this list based on the
>>>>> pagesizes the guest can use?
>>>>
>>>> I would say so. Is there a KVM service for that ?
>>>
>>> I'm not sure what you mean by a KVM service here.
>>
>> I meant a routine giving a list of the possible backing pagesizes.
>
> I'm still not really sure what you mean by that.
>
>>>> Today, the OS scans the list and picks the size fitting its current
>>>> PAGE_SIZE/SHIFT. But I suppose it would be better to advertise
>>>> only the page sizes that QEMU/KVM supports. Or should we play safe
>>>> and only export : 4K, 64K ?
>>>
>>> So, I'm guessing the limitation here is that when we use the KVM XIVE
>>> acceleration, the EQ will need to be host-contiguous?
>>
>> The EQ should be a single page (from the specs). So we don't have
>> a contiguity problem.
>
> A single page according to whom, though? A RPT guest can use 2MiB
> pages, even if it is backed with smaller pages on the host, but I'm
> guessing it would not work to have its EQ be a 2MiB page, since it
> won't be host-contiguous.
OK, yes. The HW won't be able to handle that.
The specs are not very clear on that topic. The Qsize field is 4 bits,
the page size being 2^(n+12), so that allows for many different page
sizes, but the implementation only refers to 4K and 64K.
C.
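The Qsize encoding described above can be sketched with a couple of helpers. These names are hypothetical, for illustration only; the mapping (page size = 2^(Qsize + 12)) is taken from the discussion:

```c
#include <stdint.h>

/* The XIVE END "Qsize" field is 4 bits; the event queue size is
 * 2^(Qsize + 12) bytes, so Qsize 0 -> 4K, 4 -> 64K, 9 -> 2M, 12 -> 16M.
 * These helper names are hypothetical, not from the patch. */
static inline uint64_t xive_eq_size(unsigned qsize)
{
    return 1ull << (qsize + 12);
}

/* Inverse: the log2 sizes advertised in "ibm,xive-eq-sizes"
 * (12, 16, 21, 24 in the patch) map back to Qsize by subtracting 12. */
static inline unsigned xive_eq_qsize(unsigned log2_size)
{
    return log2_size - 12;
}
```

This makes it easy to see why the advertised cells 12/16/21/24 correspond to the 4K/64K/2M/16M comments in the quoted diff.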
>>> So, we can't have EQs larger than the backing pagesize for guest
>>> memory.
>>>
>>> We try to avoid having guest visible differences based on whether
>>> we're using KVM or not, so we don't want to change this list depending
>>> on whether KVM is active etc. - or on whether we're backing the guest
>>> with hugepages.
>>>
>>> We could reuse the cap-hpt-max-page-size machine parameter which also
>>> relates to backing page size, but that might be confusing since it's
>>> explicitly about HPT only whereas this limitation would apply to RPT
>>> guests too, IIUC.
>>>
>>> I think our best bet is probably to limit to 64kiB queues
>>> unconditionally.
>>
>> OK. That's the default. We can refine this list of page sizes later on
>> when we introduce KVM support if needed.
>>
>>> AIUI, that still gives us 8192 slots in the queue,
>>
>> The entries are 32 bits. So that's 16384 entries.
>
> Even better.
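The arithmetic behind the 16384 figure is simple: a 64KiB queue of 32-bit entries holds 65536 / 4 entries. A quick sketch (helper name hypothetical):

```c
#include <stdint.h>

/* Number of 32-bit event queue entries in a queue of 2^log2_size bytes.
 * For the 64KiB (log2_size = 16) queue discussed here: 65536 / 4 = 16384. */
static inline uint32_t xive_eq_entries(unsigned log2_size)
{
    return (1u << log2_size) / sizeof(uint32_t);
}
```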
>
>>> which is enough to store one of every possible irq, which should be
>>> enough.
>>
>> yes.
>>
>> I don't think Linux measures how many entries are dequeued at once.
>> It would give us a feel for how much pressure these EQs can sustain.
>>
>> C.
>>
>>> There's still an issue if we have a 4kiB pagesize host kernel, but we
>>> could just disable kernel_irqchip in that situation, since we don't
>>> expect people to be using that much in practice.
>>>
>>>>>> + };
>>>>>> + /* The following array is in sync with the reserved priorities
>>>>>> + * defined by the 'spapr_xive_priority_is_reserved' routine.
>>>>>> + */
>>>>>> + uint32_t plat_res_int_priorities[] = {
>>>>>> + cpu_to_be32(7), /* start */
>>>>>> + cpu_to_be32(0xf8), /* count */
>>>>>> + };
>>>>>> + gchar *nodename;
>>>>>> +
>>>>>> +    /* Thread Interrupt Management Area : User (ring 3) and OS (ring 2) */
>>>>>> + timas[0] = cpu_to_be64(xive->tm_base +
>>>>>> + XIVE_TM_USER_PAGE * (1ull << TM_SHIFT));
>>>>>> + timas[1] = cpu_to_be64(1ull << TM_SHIFT);
>>>>>> + timas[2] = cpu_to_be64(xive->tm_base +
>>>>>> + XIVE_TM_OS_PAGE * (1ull << TM_SHIFT));
>>>>>> + timas[3] = cpu_to_be64(1ull << TM_SHIFT);
>>>>>
>>>>> So this gives the MMIO address of the TIMA.
>>>>
>>>> Yes. It is considered to be a fixed "address"
>>>>
>>>>> Where does the guest get the MMIO address for the ESB pages from?
>>>>
>>>> with the hcall H_INT_GET_SOURCE_INFO.
>>>
>>> Ah, right.
>>>
>>
>
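The "reg" property layout built by the patch's timas[] array can be sketched as two (address, size) pairs, one per TIMA ring. The constant values below (TM_SHIFT = 16 for 64KiB TIMA pages, page indices 0x3 for User and 0x2 for OS) are assumptions drawn from QEMU's XIVE code, not stated in the quoted diff:

```c
#include <stdint.h>

/* Assumed values, for illustration: QEMU uses 64KiB TIMA pages and
 * places the User and OS rings at page indices 3 and 2 of the TIMA
 * MMIO region. */
#define TM_SHIFT          16
#define XIVE_TM_OS_PAGE   0x2
#define XIVE_TM_USER_PAGE 0x3

/* Fill regs[] as the patch fills timas[]: { user_addr, user_size,
 * os_addr, os_size }, before byte-swapping to big-endian for the DT. */
static void xive_tima_regs(uint64_t tm_base, uint64_t regs[4])
{
    regs[0] = tm_base + ((uint64_t)XIVE_TM_USER_PAGE << TM_SHIFT);
    regs[1] = 1ull << TM_SHIFT;
    regs[2] = tm_base + ((uint64_t)XIVE_TM_OS_PAGE << TM_SHIFT);
    regs[3] = 1ull << TM_SHIFT;
}
```

As the thread notes, only these two fixed TIMA pages appear in "reg"; the per-source ESB page addresses are obtained dynamically via H_INT_GET_SOURCE_INFO.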