Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE
From: Cédric Le Goater
Subject: Re: [Qemu-ppc] [PATCH v2 02/19] spapr: introduce a skeleton for the XIVE interrupt controller
Date: Thu, 12 Apr 2018 10:51:02 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.5.2
On 04/12/2018 07:15 AM, David Gibson wrote:
> On Wed, Jan 17, 2018 at 03:39:46PM +0100, Cédric Le Goater wrote:
>> On 01/17/2018 12:10 PM, Benjamin Herrenschmidt wrote:
>>> On Wed, 2018-01-17 at 10:18 +0100, Cédric Le Goater wrote:
>>>>>>> Also, have we decided how the process of switching between XICS and
>>>>>>> XIVE will work vs. CAS ?
>>>>>>
>>>>>> That's how it is described in the architecture. The current choice is
>>>>>> to create both XICS and XIVE objects and choose at CAS which one to
>>>>>> use. It relies today on the capability of the pseries machine to
>>>>>> allocate IRQ numbers for both interrupt controller backends. These
>>>>>> patches have been merged in QEMU.
>>>>>>
>>>>>> A change of interrupt mode results in a reset. The device tree is
>>>>>> populated accordingly and the ICPs are switched for the model in
>>>>>> use.
>>>>>
>>>>> For KVM we need to only instantiate one of them though.
>>>>
>>>> Hmm,
>>>>
>>>> How would we handle a guest rebooting on a kernel without XIVE support ?
>>>
>>> It will do CAS again and we can change the devices.
>>
>> So, we would destroy the previous QEMU ICS object and create a new one
>> in the CAS hcall. That would probably work. There might be some issues
>> in creating and destroying the ICS KVM device, but that can be studied
>> without XIVE.
>
> Adding and removing devices at runtime based on guest requests like
> this will get really hairy in qemu.
I confirm ...
> As I've said before for the first cut, I think we want to select just
> one as a machine option to avoid this confusion.
OK
> Looking further ahead, I think we'll be better off having both the
> XIVE and XICS models always present (at least minimally) in qemu, but
> with only one "active" at any given time.
Under emulation it is not too complex to support both modes.
XIVE and XICS objects are both created but spapr->ov5_cas
filters their usage.
However, syncing the change in KVM is more complex.
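A minimal sketch of the "both created, only one used" arrangement under
emulation. All names here (Machine, IrqBackend, cas_xive, machine_cas) are
hypothetical stand-ins, not the actual QEMU types; cas_xive only plays the
role of the bit that spapr->ov5_cas would carry after negotiation:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one interrupt controller backend. */
typedef struct IrqBackend {
    const char *name;
    bool active;    /* true while this backend serves the guest */
    int last_irq;   /* last IRQ number raised, -1 if none */
} IrqBackend;

/* Hypothetical machine: both backends always exist. */
typedef struct Machine {
    IrqBackend xics;
    IrqBackend xive;
    bool cas_xive;  /* stand-in for the mode negotiated at CAS */
} Machine;

/* Create both backends; the machine starts in XICS (legacy) mode. */
static void machine_init(Machine *m)
{
    m->xics = (IrqBackend){ "xics", true, -1 };
    m->xive = (IrqBackend){ "xive", false, -1 };
    m->cas_xive = false;
}

/* The negotiated mode filters which backend is used. */
static IrqBackend *machine_active_backend(Machine *m)
{
    return m->cas_xive ? &m->xive : &m->xics;
}

/* Record the outcome of CAS; neither object is created or destroyed. */
static void machine_cas(Machine *m, bool guest_wants_xive)
{
    m->cas_xive = guest_wants_xive;
    m->xive.active = guest_wants_xive;
    m->xics.active = !guest_wants_xive;
}

/* Route an interrupt to whichever backend is currently active. */
static void machine_raise_irq(Machine *m, int irq)
{
    IrqBackend *b = machine_active_backend(m);

    assert(b->active);
    b->last_irq = irq;
}
```

The sketch deliberately leaves out the reset that follows a mode change and
everything related to KVM, which is the harder part.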
> Note that having the inactive one destroy and clean up the
> corresponding KVM devices is fine, as is deallocating as much of its
> runtime state as we can without changing the notional QOM tree.
Yes. I will try to send a patchset organized this way:
- spapr XIVE emulated mode (both modes supported)
- XIVE KVM in an exclusive way; the machine will need to be
  restarted from the command line to change interrupt mode
- support for changing the interrupt mode under KVM
- powernv device model (rough)
C.
>> It used to be considered ugly to create a QEMU device at reset time, so
>> I wonder if this is still the case, because when the machine reaches CAS,
>> we really are beyond reset.
>>
>> If this is OK, then the next "issue" is to keep in sync the allocated
>> IRQ numbers. The IRQ allocator is now merged at the machine level, so
>> the synchronization is obvious to do when both backend QEMU objects
>> are available. That's the path I took. If both QEMU objects are not
>> available, then we need to scan the IRQ number space in the current
>> interrupt mode and allocate the same IRQs in the newly negotiated mode.
>> Probably OK. I don't see major problems with the current code.
>>
>> Migration is a problem. We will need both backend QEMU objects to be
>> available anyhow if we want to migrate. So we are back to the current
>> solution creating both QEMU objects but we can try to defer some of the
>> KVM inits and create the KVM device on demand at CAS time.
>>
>> The next problem is the ICP object that currently needs the KVM device
>> fd to connect the vcpus ... So, we will need to change that also.
>> That is probably the biggest problem today. We need a way to disconnect
>> the vcpu from the KVM device and see how we can defer the connection.
>> I need to make sure this is possible, I can check that without XIVE
>> I think.
>>
>>>> Are you suggesting to create the XICS or XIVE device in the CAS
>>>> negotiation process ? So, the machine would not have any interrupt
>>>> controller before CAS. That seems really late to me. grub uses the
>>>> console for instance.
>>>
>>> We start with XICS by default.
>>
>> yes.
>>
>>>> I think it should prepare for both options, start in XIVE legacy mode,
>>>> which is XICS, then possibly switch to XIVE exploitation mode.
>>>>
>>>>>>> And how that will interact with KVM ?
>>>>>>
>>>>>> I expect we will do the same, which is to create two KVM devices to
>>>>>> be able to handle both interrupt controller backends depending on the
>>>>>> mode negotiated by the guest.
>>>>>
>>>>> That will be an ungodly mess, I'd rather we only instantiate the right
>>>>> one.
>>>>
>>>> It's rather transparent currently in the emulated version. There are two
>>>> sets of objects in QEMU, switching is done in CAS. KVM support should not
>>>> change anything in that area.
>>>>
>>>> I expect the 'xive-kvm' object to get/set states for migration, just like
>>>> for XICS and to setup the ESB+TIMA memory regions, which is new.
>>>
>>> But both XICS and XIVE are completely different kernel KVM devices that
>>> will need to "hook" into the same set of internal hooks for things like
>>> interrupts being passed through, RTAS calls etc...
>>>
>>> How does KVM know which one to "activate" ?
>>
>> Can't we add an extra IRQ type and use vcpu->arch.irq_type for that ?
>> I haven't studied all the low level details though.
>>
>>> I don't think the kernel should have both.
>>
>> I hear that. From a QEMU perspective, it is much easier to put everything
>> in place for both interrupt modes and let the guest decide what it wants
>> to use.
>>
>> If we choose not to, we will need to find a solution to defer the KVM inits
>> and to disconnect/reconnect the vcpus. For the latter, we could add a
>> KVM_DISABLE_CAP ioctl or maybe better add a new capability like
>> KVM_CAP_IRQ_XIVE to perform the switch.
>>
>>
>> C.
>>
>
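
For the case mentioned in the quoted text where both backend objects are
not available at the same time, the "scan the IRQ number space and allocate
the same IRQs in the newly negotiated mode" step could look roughly like
the sketch below. IrqMap, irq_claim, irq_replay and NR_IRQS are
hypothetical names for illustration only, not the machine-level allocator
that was merged:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define NR_IRQS 16  /* illustrative size of the IRQ number space */

/* Hypothetical per-mode allocation map: one flag per IRQ number. */
typedef struct IrqMap {
    bool claimed[NR_IRQS];
} IrqMap;

/* Claim one IRQ number; fails if out of range or already claimed. */
static bool irq_claim(IrqMap *map, size_t irq)
{
    if (irq >= NR_IRQS || map->claimed[irq]) {
        return false;
    }
    map->claimed[irq] = true;
    return true;
}

/*
 * Scan the IRQ number space of the current mode and claim the same
 * numbers in the newly negotiated mode. Returns how many were claimed.
 */
static size_t irq_replay(const IrqMap *cur, IrqMap *next)
{
    size_t n = 0;

    assert(cur != next);
    for (size_t irq = 0; irq < NR_IRQS; irq++) {
        if (cur->claimed[irq] && irq_claim(next, irq)) {
            n++;
        }
    }
    return n;
}
```

With both backend objects always present, this replay is unnecessary
because a single allocator at the machine level serves both modes.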