qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not


From: Marcel Apfelbaum
Subject: Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Date: Thu, 22 Oct 2020 17:10:43 +0300



On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> Hi David, Michael,
>
> On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
>
>     On Thu, 22 Oct 2020 08:06:55 -0400
>     "Michael S. Tsirkin" <mst@redhat.com> wrote:
>
>     > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
>     > > From: Marcel Apfelbaum <marcel@redhat.com>
>     > >
>     > > During PCIe Root Port's transition from Power-Off to Power-ON (or
>     vice-versa)
>     > > the "Slot Control Register" has the "Power Indicator Control"
>     > > set to "Blinking" expressing a "power transition" mode.
>     > >
>     > > Any hotplug operation during the "power transition" mode is not
>     permitted
>     > > or at least not expected by the Guest OS leading to strange failures.
>     > >
>     > > Detect and refuse hotplug operations in such case.
>     > >
>     > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
>     > > ---
>     > >  hw/pci/pcie.c | 7 +++++++
>     > >  1 file changed, 7 insertions(+)
>     > >
>     > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
>     > > index 5b48bae0f6..2fe5c1473f 100644
>     > > --- a/hw/pci/pcie.c
>     > > +++ b/hw/pci/pcie.c
>     > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
>     *hotplug_dev, DeviceState *dev,
>     > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
>     > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->
>     exp.exp_cap;
>     > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
>     > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
>     > > 
>     > >      /* Check if hot-plug is disabled on the slot */
>     > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
>     > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler
>     *hotplug_dev, DeviceState *dev,
>     > >          return;
>     > >      }
>     > > 
>     > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK)
>     {
>     > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
>     > > +                   DEVICE(hotplug_pdev)->id);
>     > > +        return;
>     > > +    }
>     > > +
>     > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
>     > >  } 
>     >
>     > Probably the only way to handle for existing machine types.
>
>
> I agree
>  
>
>     > For new ones, can't we queue it in host memory somewhere?
>
>
>
> I am not sure I understand what will be the flow.
>   - The user asks for a hotplug operation.
>   -  QEMU deferred operation.
> After that the operation may still fail, how would the user know if the
> operation
> succeeded or not?


How can it fail? It's just a button press ...


Currently we have "Hotplug unsupported."
With this change we have "Guest/System not ready"

 
>  
>
>     I'm not actually convinced we can't do that even for existing machine
>     types. 
>
>
> Is a Guest visible change, I don't think we can do it.
>  
>
>     So I'm a bit hesitant to suggest going ahead with this without
>     looking a bit closer at whether we can implement a wait-for-ready in
>     qemu, rather than forcing every user of qemu (human or machine) to do
>     so.
>
>
> While I agree it is a pain from the usability point of view, hotplug operations
> are allowed to fail. This is not more than a corner case, ensuring the right
> response (gracefully erroring out) may be enough.
>
> Thanks,
> Marcel
>


I don't think they ever failed in the past so management is unlikely
to handle the failure by retrying ...

That would require some management handling, yes.
But even without a "retry", failing is better than strange OS behavior.

Trying a better alternative like deferring the operation for new machines
would make sense, however is out of the scope of this patch that simply
detects the error leaving us in a slightly better state than today.

Thanks,
Marcel
 

>
>
>
>     --
>     David Gibson <dgibson@redhat.com>
>     Principal Software Engineer, Virtualization, Red Hat
>


reply via email to

[Prev in Thread] Current Thread [Next in Thread]