From: Michael S. Tsirkin
Subject: Re: [PATCH] pci: Refuse to hotplug PCI Devices when the Guest OS is not ready
Date: Tue, 27 Oct 2020 07:30:51 -0400
On Fri, Oct 23, 2020 at 09:26:48AM +0300, Marcel Apfelbaum wrote:
> Hi Michael,
>
> On Thu, Oct 22, 2020 at 6:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
>
> On Thu, Oct 22, 2020 at 05:50:51PM +0300, Marcel Apfelbaum wrote:
> >
> >
> > On Thu, Oct 22, 2020 at 5:33 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> >
> > > On Thu, Oct 22, 2020 at 05:10:43PM +0300, Marcel Apfelbaum wrote:
> > > >
> > > > On Thu, Oct 22, 2020 at 5:01 PM Michael S. Tsirkin <mst@redhat.com> wrote:
> > > >
> > > > > On Thu, Oct 22, 2020 at 04:55:10PM +0300, Marcel Apfelbaum wrote:
> > > > > > Hi David, Michael,
> > > > > >
> > > > > > On Thu, Oct 22, 2020 at 3:56 PM David Gibson <dgibson@redhat.com> wrote:
> > > > > >
> > > > > > > On Thu, 22 Oct 2020 08:06:55 -0400
> > > > > > > "Michael S. Tsirkin" <mst@redhat.com> wrote:
> > > > > > >
> > > > > > > > On Thu, Oct 22, 2020 at 02:40:26PM +0300, Marcel Apfelbaum wrote:
> > > > > > > > > From: Marcel Apfelbaum <marcel@redhat.com>
> > > > > > > > >
> > > > > > > > > During a PCIe Root Port's transition from Power-Off to Power-On (or
> > > > > > > > > vice versa), the "Power Indicator Control" field of the "Slot Control
> > > > > > > > > Register" is set to "Blinking", signaling a "power transition" mode.
> > > > > > > > >
> > > > > > > > > Any hotplug operation during the "power transition" mode is not
> > > > > > > > > permitted, or at least not expected by the Guest OS, and leads to
> > > > > > > > > strange failures.
> > > > > > > > >
> > > > > > > > > Detect and refuse hotplug operations in such a case.
> > > > > > > > >
> > > > > > > > > Signed-off-by: Marcel Apfelbaum <marcel.apfelbaum@gmail.com>
> > > > > > > > > ---
> > > > > > > > >  hw/pci/pcie.c | 7 +++++++
> > > > > > > > >  1 file changed, 7 insertions(+)
> > > > > > > > >
> > > > > > > > > diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> > > > > > > > > index 5b48bae0f6..2fe5c1473f 100644
> > > > > > > > > --- a/hw/pci/pcie.c
> > > > > > > > > +++ b/hw/pci/pcie.c
> > > > > > > > > @@ -410,6 +410,7 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > > > > > > >      PCIDevice *hotplug_pdev = PCI_DEVICE(hotplug_dev);
> > > > > > > > >      uint8_t *exp_cap = hotplug_pdev->config + hotplug_pdev->exp.exp_cap;
> > > > > > > > >      uint32_t sltcap = pci_get_word(exp_cap + PCI_EXP_SLTCAP);
> > > > > > > > > +    uint32_t sltctl = pci_get_word(exp_cap + PCI_EXP_SLTCTL);
> > > > > > > > >
> > > > > > > > >      /* Check if hot-plug is disabled on the slot */
> > > > > > > > >      if (dev->hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
> > > > > > > > > @@ -418,6 +419,12 @@ void pcie_cap_slot_pre_plug_cb(HotplugHandler *hotplug_dev, DeviceState *dev,
> > > > > > > > >          return;
> > > > > > > > >      }
> > > > > > > > >
> > > > > > > > > +    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
> > > > > > > > > +        error_setg(errp, "Hot-plug failed: %s is in Power Transition",
> > > > > > > > > +                   DEVICE(hotplug_pdev)->id);
> > > > > > > > > +        return;
> > > > > > > > > +    }
> > > > > > > > > +
> > > > > > > > >      pcie_cap_slot_plug_common(PCI_DEVICE(hotplug_dev), dev, errp);
> > > > > > > > >  }
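The check the patch adds can be sketched as a standalone C function for readers outside QEMU. This is a minimal sketch, not QEMU code: the register offsets and masks are the values from Linux's include/uapi/linux/pci_regs.h, and `get_word()`, `PlugCheck` and `pre_plug_check()` are hypothetical names standing in for QEMU's `pci_get_word()` and the pre-plug callback.

```c
#include <stdbool.h>
#include <stdint.h>

/* Values as defined in Linux include/uapi/linux/pci_regs.h; offsets are
 * relative to the start of the PCI Express capability. */
#define PCI_EXP_SLTCAP               0x14   /* Slot Capabilities */
#define PCI_EXP_SLTCAP_HPC           0x0040 /* Hot-Plug Capable */
#define PCI_EXP_SLTCTL               0x18   /* Slot Control */
#define PCI_EXP_SLTCTL_PIC           0x0300 /* Power Indicator Control */
#define PCI_EXP_SLTCTL_PWR_IND_ON    0x0100
#define PCI_EXP_SLTCTL_PWR_IND_BLINK 0x0200
#define PCI_EXP_SLTCTL_PWR_IND_OFF   0x0300

/* Hypothetical stand-in for QEMU's pci_get_word(): PCI config space is
 * little-endian regardless of host byte order. */
static uint16_t get_word(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

typedef enum {
    PLUG_OK,
    PLUG_NOT_HOTPLUG_CAPABLE, /* the existing "Hotplug unsupported" case */
    PLUG_POWER_TRANSITION,    /* the new error the patch introduces */
} PlugCheck;

/* Mirrors the two checks in the patched pre-plug callback. */
static PlugCheck pre_plug_check(const uint8_t *exp_cap, bool hotplugged)
{
    uint16_t sltcap = get_word(exp_cap + PCI_EXP_SLTCAP);
    uint16_t sltctl = get_word(exp_cap + PCI_EXP_SLTCTL);

    if (hotplugged && (sltcap & PCI_EXP_SLTCAP_HPC) == 0) {
        return PLUG_NOT_HOTPLUG_CAPABLE;
    }
    /* Only the "Blinking" encoding means a power transition is in
     * progress; "On" (0x0100) and "Off" (0x0300) both pass. */
    if ((sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK) {
        return PLUG_POWER_TRANSITION;
    }
    return PLUG_OK;
}
```

Note that the Power Indicator Control is a 2-bit field, so the code masks with `PCI_EXP_SLTCTL_PIC` and compares against the "Blinking" encoding rather than testing a single bit.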
> > > > > > > >
> > > > > > > > Probably the only way to handle this for existing machine types.
> > > > > >
> > > > > >
> > > > > > I agree
> > > > > >
> > > > > >
> > > > > > > > For new ones, can't we queue it in host memory somewhere?
> > > > > >
> > > > > > I am not sure I understand what the flow would be.
> > > > > >  - The user asks for a hotplug operation.
> > > > > >  - QEMU defers the operation.
> > > > > > After that, the operation may still fail; how would the user know
> > > > > > whether the operation succeeded or not?
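One way to picture the deferral asked about here ("queue it in host memory"), and to answer how the user would learn the outcome, is a per-slot pending request that is replayed once the guest stops blinking the Power Indicator. A minimal sketch under stated assumptions: `Slot`, `slot_request_plug()`, `slot_control_write()` and the `PlugDoneFn` completion hook are hypothetical, and a real implementation would presumably report completion through an asynchronous event rather than a C callback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PCI_EXP_SLTCTL_PIC           0x0300 /* Power Indicator Control */
#define PCI_EXP_SLTCTL_PWR_IND_BLINK 0x0200

/* Hypothetical completion hook: how the requester would learn that a
 * deferred hot-plug was finally carried out. */
typedef void (*PlugDoneFn)(const char *dev_id, void *opaque);

typedef struct {
    uint16_t sltctl;     /* last Slot Control value written by the guest */
    const char *pending; /* deferred device id, or NULL */
    unsigned plugged;    /* number of plugs actually delivered */
    PlugDoneFn done;     /* optional completion hook */
    void *opaque;
} Slot;

static bool slot_in_transition(const Slot *s)
{
    return (s->sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK;
}

/* Stand-in for the real plug path (the "button press") plus notification. */
static void do_plug(Slot *s, const char *dev_id)
{
    s->plugged++;
    if (s->done) {
        s->done(dev_id, s->opaque);
    }
}

/* Plugs immediately when the slot is idle, otherwise remembers the request.
 * Returns false only if a request is already pending for this slot. */
static bool slot_request_plug(Slot *s, const char *dev_id)
{
    if (s->pending) {
        return false;
    }
    if (slot_in_transition(s)) {
        s->pending = dev_id;
        return true;
    }
    do_plug(s, dev_id);
    return true;
}

/* Called on every guest write to Slot Control: replays a deferred request
 * once the Power Indicator has stopped blinking. */
static void slot_control_write(Slot *s, uint16_t sltctl)
{
    s->sltctl = sltctl;
    if (s->pending && !slot_in_transition(s)) {
        const char *dev_id = s->pending;
        s->pending = NULL;
        do_plug(s, dev_id);
    }
}
```

Because a PCIe slot holds at most one device, the sketch keeps a single pending entry and rejects a second request instead of growing an unbounded queue.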
> > > >
> > > > > How can it fail? It's just a button press ...
> > > >
> > > > Currently we have "Hotplug unsupported."
> > > > With this change we have "Guest/System not ready".
> >
> > > "Hotplug unsupported" is not an error that can trigger with
> > > well-behaved management software such as libvirt.
> >
> > > >
> > > > > >
> > > > > > > I'm not actually convinced we can't do that even for existing machine
> > > > > > > types.
> > > > > >
> > > > > > It is a guest-visible change; I don't think we can do it.
> > > > > >
> > > > > > > So I'm a bit hesitant to suggest going ahead with this without
> > > > > > > looking a bit closer at whether we can implement a wait-for-ready in
> > > > > > > QEMU, rather than forcing every user of QEMU (human or machine) to do
> > > > > > > so.
> > > > > >
> > > > > > While I agree it is a pain from the usability point of view, hotplug
> > > > > > operations are allowed to fail. This is no more than a corner case;
> > > > > > ensuring the right response (gracefully erroring out) may be enough.
> > > > > >
> > > > > > Thanks,
> > > > > > Marcel
> > > > > >
> > > >
> > > > > I don't think they ever failed in the past, so management is unlikely
> > > > > to handle the failure by retrying ...
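If the refuse-while-blinking behavior is kept, the retry has to live on the management side. A minimal sketch of a bounded retry loop, assuming the management layer can tell a transient "not ready" error apart from a permanent "unsupported" one; `HotplugResult`, `hotplug_with_retry()`, the `BackoffFn` hook and the `FakeSlot` test double are hypothetical and do not correspond to libvirt or QMP APIs.

```c
#include <stddef.h>

typedef enum {
    HP_OK,
    HP_UNSUPPORTED, /* permanent: the slot is not hot-plug capable */
    HP_NOT_READY,   /* transient: the guest is in a power transition */
} HotplugResult;

/* Hypothetical: issues one hot-plug request and returns its outcome. */
typedef HotplugResult (*HotplugFn)(void *opaque);
/* Hypothetical: waits between attempts (may be NULL for no delay). */
typedef void (*BackoffFn)(int attempt);

/* Retries only the transient error, at most max_attempts times in total. */
static HotplugResult hotplug_with_retry(HotplugFn fn, void *opaque,
                                        int max_attempts, BackoffFn backoff)
{
    HotplugResult r = HP_NOT_READY;

    for (int attempt = 0; attempt < max_attempts; attempt++) {
        r = fn(opaque);
        if (r != HP_NOT_READY) {
            break; /* success or a permanent failure: stop retrying */
        }
        if (backoff && attempt + 1 < max_attempts) {
            backoff(attempt);
        }
    }
    return r;
}

/* Test double: reports "not ready" for the first busy_calls requests, as
 * if the guest were still blinking the Power Indicator. */
typedef struct {
    int busy_calls;
    int calls;
} FakeSlot;

static HotplugResult fake_hotplug(void *opaque)
{
    FakeSlot *f = opaque;
    return f->calls++ < f->busy_calls ? HP_NOT_READY : HP_OK;
}
```

Only the transient error is retried and the attempt count is capped, so a guest stuck in a power transition eventually yields an error instead of an endless loop.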
> > > >
> > > > That would require some management handling, yes.
> > > > But even without a "retry", failing is better than strange OS behavior.
> > > >
> > > > Trying a better alternative, like deferring the operation for new
> > > > machines, would make sense; however, it is out of the scope of this patch.
> >
> > > Expand the scope, please. The scope should be "solve problem xx", not
> > > "solve problem xx by doing abc".
> >
> > The scope is detecting a hotplug error early instead of passing to the
> > Guest OS a hotplug operation that we know will fail.
> >
>
> Right. After detecting it, just failing unconditionally is a bit too
> simplistic IMHO.
>
>
>
> Simplistic does not mean wrong or incorrect.
> I fail to see why it is not enough.
The failure patch requires management to retry later.
A more elaborate scheme would fix the bug without the need for management
changes.
> What can QEMU do better? Wait an unbounded time for the blinking to finish?
> What if we have a buggy guest with a kernel stuck in blinking?
Then it won't ever see the new device, but does it even matter? It's
stuck ... I'd ack adding a query command to see what is going
on with the device. It can be generic, implementable on top of ACPI too.
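A query command of the kind proposed here could expose slot readiness generically, with each hotplug backend (native PCIe, ACPI) filling in the same state. A minimal sketch, assuming a two-state model; `SlotReadiness`, `pcie_slot_readiness()` and `format_slot_status()` are hypothetical, and the output format is illustrative only, not an actual QMP reply.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

#define PCI_EXP_SLTCTL_PIC           0x0300 /* Power Indicator Control */
#define PCI_EXP_SLTCTL_PWR_IND_BLINK 0x0200

/* Backend-independent view of a slot, as a generic query might report it. */
typedef enum {
    SLOT_READY,
    SLOT_IN_TRANSITION,
} SlotReadiness;

/* Native PCIe backend: derive readiness from the Power Indicator Control
 * field. An ACPI-based backend would fill the same enum from its own state. */
static SlotReadiness pcie_slot_readiness(uint16_t sltctl)
{
    return (sltctl & PCI_EXP_SLTCTL_PIC) == PCI_EXP_SLTCTL_PWR_IND_BLINK
           ? SLOT_IN_TRANSITION : SLOT_READY;
}

/* Formats a one-line status report; returns snprintf()'s result. */
static int format_slot_status(char *buf, size_t len, const char *slot_id,
                              SlotReadiness r, const char *pending_dev)
{
    return snprintf(buf, len, "slot %s: %s, pending=%s", slot_id,
                    r == SLOT_IN_TRANSITION ? "power-transition" : "ready",
                    pending_dev ? pending_dev : "none");
}
```

A management tool could poll such a report before retrying, or show it to an operator who wonders why a device has not appeared yet.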
> Is it QEMU's responsibility to emulate the operator itself? Because the
> operator is the one who is supposed to wait.
I think these details are immaterial for users. They don't read the PCI
spec.
>
> Thanks,
> Marcel
>
> [...]