[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-ppc] [RFC PATCH 0/5] hotplug: fix premature rebinding of VFIO
From: |
Laine Stump |
Subject: |
Re: [Qemu-ppc] [RFC PATCH 0/5] hotplug: fix premature rebinding of VFIO devices to host |
Date: |
Thu, 29 Jun 2017 14:50:15 -0400 |
User-agent: |
Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.2.0 |
On 06/28/2017 08:24 PM, Michael Roth wrote:
> Hi everyone. Hoping to get some feedback on this approach, or some
> alternatives proposed below, to the following issue:
>
> Currently libvirt immediately attempts to rebind a managed device back to the
> host driver when it receives a DEVICE_DELETED event from QEMU. This is
> problematic for 2 reasons:
>
> 1) If multiple devices from a group are attached to a guest, this can move
> the group into a "non-viable" state where some devices are assigned to
> the host and some to the guest.
Since we don't support hotplug with managed='yes' of individual (or
multiple) functions of a multifunction host device, I don't know that
it's very useful to support hot *un*plug of it - it would only be useful
if the multi-function device were present in the guest when it was
started, and then was hot-unplugged later. And this is all a lot of
extra complexity, though, so it would be useful to know what are the
scenarios where it would actually be used (i.e. is this a legitimate
need, or just an interesting exercise?)
>
> 2) When QEMU emits the DEVICE_DELETED event, there's still a "finalize" phase
> where additional cleanup occurs. In most cases libvirt can ignore this
> cleanup, but in the case of VFIO devices this is where closing of a VFIO
> group FD occurs, and failing to wait before rebinding the device to the
> host driver can result in unexpected behavior. In the case of powernv
> hosts at least, this can lead to a host driver crashing due to the default
> DMA windows not having been fully-restored yet. The window between this is
> and the initial DEVICE_DELETED seems to be ~6 seconds in practice. We've
> seen host dumps with Mellanox CX4 VFs being rebound to host driver during
> this period (on powernv hosts).
I agree with Dan that the situation described here should be considered
a qemu bug - according to my understanding (from back at the time
DEVICE_DELETED was added to qemu (I think at libvirt's request) qemu
should never emit the DEVICE_DELETED event until *everything* related to
the device is finished - that was the whole point of adding the event in
the first palce. Covering up this bug with a bunch of extra libvirt
complexity is just creating the potential for even more bugs in the more
complex code.
>
> Patches 1-4 address 1) by deferring rebinding of a hostdev to the host driver
> until all the devices in the group have been detached, at which point all
> the hostdevs are rebound as a group. Until that point, the devices are traced
> by the drvManager's inactiveList in a similar manner to hostdevs that are
> assigned to VFIO via the nodedev-detach interface.
What happens if libvirtd is restarted during this period? Is the
inactiveList rebuilt with all the info necessary to assure that the
nodedev-reattach does/doesn't happen (as appropriate) for all devices?
>
> Patch 5 addresses 2) by adding an additional check that, when the last device
> from a group is detached, polls /proc for open FDs referencing the VFIO group
> path in /dev/vfio/<iommu_group> and waiting for the FD to be closed. If we
> time out, we abandon rebinding the hostdevs back to the host.
>
> There are a couple alternatives to Patch 5 that might be worth considering:
>
> a) Add a DEVICE_FINALIZED event to QEMU and wait for that instead of
> DEVICE_DELETED.
Is the "device is *almost* completely deleted" event (current
DEVIE_DELETED) really something that anyone wants/needs to know about?
Or is the only useful event just the one that you're calling
DEVICE_FINALIZED? If the latter, then I think it would be better to just
change when DEVICE_DELETED is emitted.
Paired with patches 1-4 this would let us drop patch 5 in
> favor of minimal changes to libvirt's event handlers.
>
> The downsides are:
> - that we'd incur some latency for all device-detach calls, but it's not
> apparent to me whether this delay is significant for anything outside
> of VFIO.
> - there may be cases where finalization after DEVICE_DELETE/unparent are
> is not guaranteed, and I'm not sure QOM would encourage such
> expectations even if that's currently the case.
>
> b) Add a GROUP_DELETED event to VFIO's finalize callback. This is the most
> direct solution. With this we could completely separate out the handling
> of rebinding to host driver based on receival of this event.
>
> The downsides are:
> - this would only work for newer versions of QEMU, though we could use
> the poll-wait in patch 5 as a fallback.
I don't think we should add such a complex patch as a fallback to
support older versions of qemu that don't have the bug fixed. Instead,
just tell people to upgrade qemu (after all, they're going to need to
update *something* (either libvirt or qemu); no need to update libvirt
just in order to avoid updating qemu).
> - synchronizing sync/async device-detach threads with sync/async
> handlers for this would be a bit hairy, but I have a WIP in progress
> that seems *fairly reasonable*
>
> c) Take the approach in Patch 5, either as a precursor to implementing b) or
> something else, or just sticking with that for now.
>
> d) ???
>
> Personally I'm leaning toward c), but I'm wondering if that's "good enough"
> for now, or if we should pursue something more robust from the start, or
> perhaps something else entirely?
>
> Any feedback is greatly appreciated!
>
> src/libvirt_private.syms | 5 ++
> src/qemu/qemu_hostdev.c | 16 +++++
> src/qemu/qemu_hostdev.h | 4 ++
> src/qemu/qemu_hotplug.c | 56 ++++++++++++++----
> src/util/virfile.c | 52 +++++++++++++++++
> src/util/virfile.h | 1 +
> src/util/virhostdev.c | 173
> ++++++++++++++++++++++++++++++++++++++++++++++++++-----
> src/util/virhostdev.h | 16 +++++
> src/util/virpci.c | 69 ++++++++++++++++++----
> src/util/virpci.h | 4 ++
> 10 files changed, 360 insertions(+), 36 deletions(-)
>
>
- [Qemu-ppc] [RFC PATCH 2/5] qemu_hotplug: squash qemuDomainRemovePCIHostDevice into caller, (continued)
- [Qemu-ppc] [RFC PATCH 2/5] qemu_hotplug: squash qemuDomainRemovePCIHostDevice into caller, Michael Roth, 2017/06/28
- [Qemu-ppc] [RFC PATCH 3/5] virpci: introduce virPCIIOMMUGroupIterate(), Michael Roth, 2017/06/28
- [Qemu-ppc] [RFC PATCH 1/5] virhostdev: factor release out from reattach and export it for use later, Michael Roth, 2017/06/28
- [Qemu-ppc] [RFC PATCH 5/5] qemu: hotplug: wait for VFIO group FD close before unbind, Michael Roth, 2017/06/28
- [Qemu-ppc] [RFC PATCH 4/5] qemu: hotplug: unbind VFIO devices as a group, Michael Roth, 2017/06/28
- Re: [Qemu-ppc] [RFC PATCH 0/5] hotplug: fix premature rebinding of VFIO devices to host, Daniel P. Berrange, 2017/06/29
- Re: [Qemu-ppc] [RFC PATCH 0/5] hotplug: fix premature rebinding of VFIO devices to host,
Laine Stump <=