From: Michael S. Tsirkin
Subject: Re: [Qemu-devel] [RFC 0/2] Attempt to implement the standby feature for assigned network devices
Date: Wed, 5 Dec 2018 12:26:02 -0500

On Wed, Dec 05, 2018 at 05:18:18PM +0000, Daniel P. Berrangé wrote:
> On Thu, Oct 25, 2018 at 05:06:29PM +0300, Sameeh Jubran wrote:
> > From: Sameeh Jubran <address@hidden>
> >
> > Hi all,
> >
> > Background:
> >
> > There have been a few attempts to implement the standby feature for vfio
> > assigned devices, which aims to enable the migration of such devices. This
> > is another attempt.
> >
> > The series implements an infrastructure for hiding devices from the bus
> > upon boot. What it does is the following:
> >
> > * In the first patch the infrastructure for hiding the device is added
> > for the qbus and qdev APIs. A "hidden" boolean is added to the device
> > state and it is set based on a callback to the standby device which
> > registers itself for handling the assessment: "should the primary device
> > be hidden?" by cross validating the ids of the devices.
> >
> > * In the second patch the virtio-net uses the API to hide the vfio
> > device and unhides it when the feature is acked.
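The hide/unhide flow described in the two patches above can be modelled with a small toy sketch. This is not QEMU code: `Bus`, `Device`, `StandbyNic`, `register_hide_check` and `ack_standby_feature` are hypothetical names made up for illustration, and the device ids are borrowed from the command line example in this mail. It only assumes what the cover letter states: a "hidden" flag on the device, a callback registered by the standby device that matches on the primary's id, and unhiding once the feature is acked.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class Device:
    dev_id: str
    hidden: bool = False


@dataclass
class Bus:
    """Toy bus: asks registered callbacks whether a new device should be hidden."""
    devices: Dict[str, Device] = field(default_factory=dict)
    hide_checks: List[Callable[[str], bool]] = field(default_factory=list)

    def register_hide_check(self, check: Callable[[str], bool]) -> None:
        self.hide_checks.append(check)

    def add_device(self, dev_id: str) -> Device:
        # The device is hidden if any registered callback claims it.
        hidden = any(check(dev_id) for check in self.hide_checks)
        dev = Device(dev_id, hidden)
        self.devices[dev_id] = dev
        return dev

    def visible_ids(self) -> List[str]:
        return [d.dev_id for d in self.devices.values() if not d.hidden]


class StandbyNic:
    """Toy standby device (hypothetical): hides its primary until the feature is acked."""

    def __init__(self, bus: Bus, dev_id: str, primary_id: str) -> None:
        self.bus = bus
        self.primary_id = primary_id
        # Cross-validate ids: only the configured primary is hidden.
        bus.register_hide_check(lambda candidate: candidate == primary_id)
        bus.add_device(dev_id)

    def ack_standby_feature(self) -> None:
        # Guest acknowledged the feature bit: unhide the primary if it exists.
        primary = self.bus.devices.get(self.primary_id)
        if primary is not None:
            primary.hidden = False
```

In this model the standby device has to be created before its primary is added, otherwise the callback is not yet registered when the primary appears on the bus. Whether the real patches have the same ordering requirement is not stated in the cover letter.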
>
> IIUC, the general idea is that we want to provide a pair of associated NIC
> devices to the guest, one emulated, one physical PCI device. The guest would
> put them in a bonded pair. Before migration the PCI device is unplugged, and a
> new PCI device is plugged in on the target after migration. The guest traffic
> continues without interruption thanks to the emulated device.
>
> This kind of conceptual approach can already be implemented today by
> management apps. The only hard problem that exists today is how the guest OS
> can figure out that a particular pair of devices it has are intended to be
> used together.
>
> With this series, IIUC, the virtio-net device is getting a given property
> which defines the qdev ID of the associated VFIO device. When the guest OS
> activates the virtio-net device and acknowledges the STANDBY feature bit, qdev
> then unhides the associated VFIO device.
>
> AFAICT the guest has to infer that the device which suddenly appears is the
> one associated with the virtio-net device it just initialized, for purposes of
> setting up the NIC bonding. There doesn't appear to be any explicit association
> between the devices exposed to the guest.
>
> This feels pretty fragile for a guest needing to match up devices when there
> are many pairs of devices exposed to a single guest.
>
> Unless I'm mis-reading the patches, it looks like the VFIO device always has
> to be available at the time QEMU is started. There's no way to boot a guest
> and then later hotplug a VFIO device to accelerate the existing virtio-net
> NIC.
That should be supported.
> Or similarly after migration there might not be any VFIO device available
> initially when QEMU is started to accept the incoming migration. So it might
> need to run in degraded mode for an extended period of time until one becomes
> available for hotplugging.
That should work too.
> The use of qdev IDs makes this troublesome, as the
> qdev ID of the future VFIO device would need to be decided upfront before it
> even exists.
I agree this sounds problematic.
>
> So overall I'm not really a fan of the dynamic hiding/unhiding of devices.
Dynamic hiding is an orthogonal issue though. It's needed for
error handling in case of migration failure: we do not
want to close the VFIO device, but we do need to
hide it from the guest. libvirt should not be involved in
this aspect though.
> I
> would much prefer to see some way to expose an explicit relationship between
> the devices to the guest.
>
> > Disclaimers:
> >
> > * I have only scratch tested this and from qemu side, it seems to be
> > working.
> > * This is an RFC so it lacks some proper error handling in a few cases
> > and proper resource freeing. I wanted to get some feedback first
> > before it is finalized.
> >
> > Command line example:
> >
> > /home/sameeh/Builds/failover/qemu/x86_64-softmmu/qemu-system-x86_64 \
> > -netdev tap,id=hostnet0,script=world_bridge_standalone.sh,downscript=no,ifname=cc1_71 \
> > -netdev tap,vhost=on,id=hostnet1,script=world_bridge_standalone.sh,downscript=no,ifname=cc1_72,queues=4 \
> > -device virtio-net,host_mtu=1500,netdev=hostnet1,id=cc1_72,vectors=10,mq=on,primary=cc1_71 \
> > -device e1000,netdev=hostnet0,id=cc1_71,standby=cc1_72 \
> >
> > Migration support:
> >
> > Before migration, or during the setup phase of the migration, we should
> > send an unplug request to the guest to unplug the primary device. I haven't
> > had the chance to implement that part yet but should do so soon. Do you know
> > what the best approach is? I wanted to have a callback to the virtio-net
> > device which tries to send an unplug request to the guest; if that
> > succeeds, the migration continues. It needs to handle the case where the
> > migration fails, in which case it has to replug the primary device.
>
> Having QEMU do this internally gets into a world of pain when you have
> multiple devices in the guest.
>
> Consider if we have 2 pairs of devices. We unplug one VFIO device, but
> unplugging the second VFIO device fails, thus we try to replug the first
> VFIO device but this now fails too. We don't even get as far as starting
> the migration before we have to return an error.
>
> The mgmt app will just see that the migration failed, but it will not
> be sure which devices are now actually exposed to the guest OS correctly.
>
> A similar problem hits if we started the migration data stream, but
> then had to abort, and so tried to replug on the source but
> failed for some reason.
>
> Doing the VFIO device plugging/unplugging explicitly from the mgmt app
> gives that mgmt app direct information about which devices have been
> successfully made available to the guest at all times, because the mgmt
> app can see the errors from each step of the process. Trying to do
> this inside QEMU doesn't achieve anything the mgmt app can't already
> do, but it obscures what happens during failures. The same applies at
> the libvirt level too, which is why mgmt apps today will do the VFIO
> unplug/replug either side of migration themselves.
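The point about per-step visibility can be illustrated with a minimal sketch of what a mgmt app might do. Nothing here is a libvirt or QEMU API: `migrate_with_vfio`, the `unplug`/`migrate`/`replug` callables and the state labels are all hypothetical, and failures are assumed to surface as `RuntimeError`. The sketch only shows that driving each step explicitly leaves the caller with a record of which device ended up in which state.

```python
from typing import Callable, Dict, List, Tuple


def _replug_all(state: Dict[str, str], replug: Callable[[str], None]) -> None:
    """Try to replug every device we unplugged, recording each outcome."""
    for dev, status in state.items():
        if status == "unplugged":
            try:
                replug(dev)
                state[dev] = "plugged"
            except RuntimeError:
                state[dev] = "replug-failed"


def migrate_with_vfio(
    devices: List[str],
    unplug: Callable[[str], None],
    migrate: Callable[[], None],
    replug: Callable[[str], None],
) -> Tuple[bool, Dict[str, str]]:
    """Unplug all VFIO devices, migrate, and roll back on any failure.

    Returns (migrated, per-device state) so the caller always knows
    which devices the guest can currently see on the source.
    """
    state = {dev: "plugged" for dev in devices}
    for dev in devices:
        try:
            unplug(dev)
        except RuntimeError:
            state[dev] = "unplug-failed"
            _replug_all(state, replug)
            return False, state
        state[dev] = "unplugged"
    try:
        migrate()
    except RuntimeError:
        _replug_all(state, replug)
        return False, state
    return True, state
```

With two devices where the second unplug fails and the rollback replug of the first fails too, the caller gets back `False` together with a state map naming both failures, rather than a bare "migration failed".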
>
>
> Regards,
> Daniel
> --
> |: https://berrange.com -o- https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o- https://fstop138.berrange.com :|
> |: https://entangle-photo.org -o- https://www.instagram.com/dberrange :|