qemu-ppc
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE


From: Mahesh J Salgaonkar
Subject: Re: [PATCH] spapr: Modify ibm, get-config-addr-info2 to set DEVNUM in PE config address.
Date: Thu, 29 Apr 2021 14:32:23 +0530

On 2021-04-28 22:33:45 Wed, Oliver O'Halloran wrote:
> On Tue, Apr 27, 2021 at 9:56 PM Mahesh Salgaonkar <mahesh@linux.ibm.com> 
> wrote:
> >
> > With upstream kernel, especially after commit 98ba956f6a389
> > ("powerpc/pseries/eeh: Rework device EEH PE determination") we see that KVM
> > guest isn't able to enable EEH option for PCI pass-through devices anymore.
> 
> How are you passing the devices through to the guest?

I am using libvirt with below xml section to add pass-through:

    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0033' bus='0x01' slot='0x00' function='0x0'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x0' 
multifunction='on'/>
    </hostdev>
    <hostdev mode='subsystem' type='pci' managed='yes'>
      <driver name='vfio'/>
      <source>
        <address domain='0x0033' bus='0x01' slot='0x00' function='0x1'/>
      </source>
      <address type='pci' domain='0x0000' bus='0x01' slot='0x01' function='0x1' 
multifunction='on'/>
    </hostdev>

Looks like libvirt does not allow pass through device in slot zero, and
throws following error.

error: XML error: Invalid PCI address 0000:01:00.0. slot must be >= 1
Failed. Try again? [y,n,i,f,?]:

> 
> > [root@atest-guest ~]# dmesg | grep EEH
> > [    0.032337] EEH: pSeries platform initialized
> > [    0.298207] EEH: No capable adapters found: recovery disabled.
> > [root@atest-guest ~]#
> >
> > So far the linux kernel was assuming pe_config_addr equal to device's
> > config_addr and using it to enable EEH on the PE through ibm,set-eeh-option
> > RTAS call. Which wasn't the correct way as per PAPR. The linux kernel
> > commit 98ba956f6a389 fixed this flow. With that fixed, linux now gets the
> > PE config address first using the ibm,get-config-addr-info2 RTAS call, and
> > then uses the found address as argument to ibm,set-eeh-option RTAS call to
> > enable EEH on the PCI device.
> 
> That's not really correct. EEH being enabled or disabled is a per-PE
> setting rather than a per-device setting. The fact there's not a 1-1
> correspondence between devices and PEs is a large part of why the
> get-config-addr-info2 RTAS call exists in the first place.
> Unfortunately, the initial implementation of EEH support in linux
> conflated the two because in the past there was typically a single
> device within a PE. However, that assumption was never really correct
> and it has long outlived its usefulness.
> 
> > This works on PowerVM lpar but fails in qemu
> > KVM guests. This is because ibm,set-eeh-option RTAS call in qemu expects PE
> > config address bits 16:20 be populated with device number (DEVNUM).
> >
> > The rtas call ibm,get-config-addr-info2 in qemu always returns the PE
> > config address in form of "00BB0001" (i.e.  <00><BUS><DEVFN><REG>) where
> > "BB" represents the bus number of PE's primary bus and with device number
> > information always set to zero. However until commit 98ba956f6a389 this
> > return value wasn't used to enable EEH on the PCI device.
> >
> > Now in qemu guest with recent kernel the ibm,set-eeh-option RTAS call fails
> > with -3 return value indicating that there is no PCI device exist for the
> > specified pe config address. The rtas_ibm_set_eeh_option call uses
> > pci_find_device() to get the PC device that matches specific bus and devfn
> > extracted from PE config address passed as argument. Since the DEVFN part
> > of PE config always contains zero, pci_find_device() fails to find the
> > specific PCI device and hence fails to enable the EEH capability.
> >
> > hw/ppc/spapr_pci_vfio.c: spapr_phb_vfio_eeh_set_option()
> >    case RTAS_EEH_ENABLE: {
> >         PCIHostState *phb;
> >         PCIDevice *pdev;
> >
> >         /*
> >          * The EEH functionality is enabled on basis of PCI device,
> >          * instead of PE. We need check the validity of the PCI
> >          * device address.
> >          */
> >         phb = PCI_HOST_BRIDGE(sphb);
> >         pdev = pci_find_device(phb->bus,
> >                                (addr >> 16) & 0xFF, (addr >> 8) & 0xFF);
> >         if (!pdev || !object_dynamic_cast(OBJECT(pdev), "vfio-pci")) {
> >             return RTAS_OUT_PARAM_ERROR;
> >         }
> >
> > This patch fixes this issue by populating DEVNUM field (bits 16:20) of PE
> > config address with device number.
> 
> I don't think this is a good idea and I'm fairly sure you're
> introducing some subtle breakage here. As an example, say that on the
> host side you have two devices on the same bus:
> 
> 0000:00:00.0 - card 1
> 0000:00:01.0 - card 2
> 
> On PowerNV (i.e. the hypervisor) these would be placed into the same
> hardware PE since they're on the same bus. Now, if I take both devices
> and pass them through to the guest on the same PHB and bus (use
> 0005:ff) we'll have:
> 
> 0005:ff:00.0 - card 1
> 0005:ff:01.0 - card 2

It looks like libvirt does not support pass through device in slot zero.
Hence these appears as below in guest:

 0005:ff:01.0 - card 1
 0005:ff:02.0 - card 2

And with get-config-addr-info2 returning 00BB0001, ibm,set-eeh-option
RTAS call tries to check if device is present on the bus of spapr_phb at
bus->devices[devfn] where devfn is 0. And since qemu does not support
pass through device on slot zero bus->devices[0] is always NULL. And
hence it fails to enable EEH.

> 
> With this patch applied get-config-addr-info2 returns 00BBD001, so the
> PE we get for each device will be:
> 
> card 1 - 00ff0001
> card 2 - 00ff1001
> 
> Which implies the two are in different PEs. As a result, if the guest
> requests a reset of card 1's PE then the guest will see an unexpected
> reset of card 2 as well. From the hypervisor's point of view the two
> are in the same PE so this is a legitimate thing to do, but due to
> this patch the guest doesn't know that.

Agree. I realize my fix is not correctly handling this. The current code
under ibm,set-eeh-option is checking for individual PCI device presence.
Better fix should be to check if there is any PCI device (vfio-pci)
present under specified bus and enable the EEH if found. And no change
in return value of get-config-addr-info2. What do you say ?

-- 
Mahesh J Salgaonkar



reply via email to

[Prev in Thread] Current Thread [Next in Thread]