[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support
From: |
Dr. David Alan Gilbert |
Subject: |
Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support |
Date: |
Thu, 30 May 2019 20:08:23 +0100 |
User-agent: |
Mutt/1.11.4 (2019-03-13) |
* Michael S. Tsirkin (address@hidden) wrote:
> On Thu, May 30, 2019 at 07:00:23PM +0100, Dr. David Alan Gilbert wrote:
> > * Michael S. Tsirkin (address@hidden) wrote:
> > > On Thu, May 30, 2019 at 04:56:45PM +0200, Jens Freimann wrote:
> > > > Hi David,
> > > >
> > > > sorry for the delayed reply.
> > > >
> > > > On Tue, May 28, 2019 at 11:04:15AM -0400, Michael S. Tsirkin wrote:
> > > > > On Tue, May 21, 2019 at 10:45:05AM +0100, Dr. David Alan Gilbert
> > > > > wrote:
> > > > > > * Jens Freimann (address@hidden) wrote:
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque);
> > > > > > > +
> > > > > > > static void virtio_net_set_link_status(NetClientState *nc)
> > > > > > > {
> > > > > > > VirtIONet *n = qemu_get_nic_opaque(nc);
> > > > > > > @@ -786,6 +796,14 @@ static void
> > > > > > > virtio_net_set_features(VirtIODevice *vdev, uint64_t features)
> > > > > > > } else {
> > > > > > > memset(n->vlans, 0xff, MAX_VLAN >> 3);
> > > > > > > }
> > > > > > > +
> > > > > > > + if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> > > > > > > + atomic_set(&n->primary_should_be_hidden, false);
> > > > > > > + if (n->primary_device_timer)
> > > > > > > + timer_mod(n->primary_device_timer,
> > > > > > > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > + 4000);
> > > > > > > + }
> > > > > >
> > > > > > What's this magic timer constant and why?
> > > >
> > > > To be honest it's a leftover from previous versions (before I took
> > > > over) of the patches and I'm not sure why the timer is there.
> > > > I removed it and so far see no reason to keep it.
> > > >
> > > > > >
> > > > > > > }
> > > > > > >
> > > > > > > static int virtio_net_handle_rx_mode(VirtIONet *n, uint8_t cmd,
> > > > > > > @@ -2626,6 +2644,87 @@ void
> > > > > > > virtio_net_set_netclient_name(VirtIONet *n, const char *name,
> > > > > > > n->netclient_type = g_strdup(type);
> > > > > > > }
> > > > > > >
> > > > > > > +static void virtio_net_primary_plug_timer(void *opaque)
> > > > > > > +{
> > > > > > > + VirtIONet *n = opaque;
> > > > > > > + Error *err = NULL;
> > > > > > > +
> > > > > > > + if (n->primary_device_dict)
> > > > > > > + n->primary_device_opts =
> > > > > > > qemu_opts_from_qdict(qemu_find_opts("device"),
> > > > > > > + n->primary_device_dict, &err);
> > > > > > > + if (n->primary_device_opts) {
> > > > > > > + n->primary_dev = qdev_device_add(n->primary_device_opts,
> > > > > > > &err);
> > > > > > > + error_setg(&err, "virtio_net: couldn't plug in primary
> > > > > > > device");
> > > > > > > + return;
> > > > > > > + }
> > > > > > > + if (!n->primary_device_dict && err) {
> > > > > > > + if (n->primary_device_timer) {
> > > > > > > + timer_mod(n->primary_device_timer,
> > > > > > > + qemu_clock_get_ms(QEMU_CLOCK_VIRTUAL) +
> > > > > > > + 100);
> > > > > >
> > > > > > same here.
> > > >
> > > > see above
> > > >
> > > > > >
> > > > > >
> > > > > > > + }
> > > > > > > + }
> > > > > > > +}
> > > > > > > +
> > > > > > > +static void virtio_net_handle_migration_primary(VirtIONet *n,
> > > > > > > + MigrationState
> > > > > > > *s)
> > > > > > > +{
> > > > > > > + Error *err = NULL;
> > > > > > > + bool should_be_hidden =
> > > > > > > atomic_read(&n->primary_should_be_hidden);
> > > > > > > +
> > > > > > > + n->primary_dev = qdev_find_recursive(sysbus_get_default(),
> > > > > > > + n->primary_device_id);
> > > > > > > + if (!n->primary_dev) {
> > > > > > > + error_setg(&err, "virtio_net: couldn't find primary
> > > > > > > device");
> > > > > >
> > > > > > There's something broken with the error handling in this function -
> > > > > > the
> > > > > > 'err' never goes anywhere - I don't think it ever gets printed or
> > > > > > reported or stops the migration.
> > > >
> > > > yes, I'll fix it.
> > > >
> > > > > > > + }
> > > > > > > + if (migration_in_setup(s) && !should_be_hidden &&
> > > > > > > n->primary_dev) {
> > > > > > > + qdev_unplug(n->primary_dev, &err);
> > > > > >
> > > > > > Not knowing unplug well; can you just explain - is that device hard
> > > > > > unplugged and it's gone by the time this function returns or is it
> > > > > > still
> > > > > > hanging around for some indeterminate time?
> > > >
> > > > Qemu will trigger an unplug request via pcie attention button in which
> > > > case
> > > > there could be a delay by the guest operating system. We could give it
> > > > some
> > > > amount of time and if nothing happens try surpise removal or handle the
> > > > error otherwise.
> > > >
> > > >
> > > > regards,
> > > > Jens
> > >
> > > That's a subject for another day. Let's get the basic thing
> > > working.
> >
> > Well no, we need to know this thing isn't going to hang in the migration
> > setup phase, or if it does how we recover.
>
>
> This thing is *supposed* to be stuck in migration startup phase
> if guest is malicious.
>
> If migration does not progress management needs
> a way to detect this and cancel.
>
> Some more documentation about how this is supposed to happen
> would be helpful.
I want to see that first; because I want to convinced it's just a
documentation problem and that we actually really have a method of
recovering.
> > This patch series is very
> > odd precisely because it's trying to do the unplug itself in the
> > migration phase rather than let the management layer do it - so unless
> > it's nailed down how to make sure that's really really bullet proof
> > then we've got to go back and ask the question about whether we should
> > really fix it so it can be done by the management layer.
> >
> > Dave
>
> management already said they can't because files get closed and
> resources freed on unplug and so they might not be able to re-add device
> on migration failure. We do it in migration because that is
> where failures can happen and we can recover.
I find this explanation confusing - I can kind of see where it's coming
from, but we've got a pretty clear separation between a NIC and the
netdev that backs it; those files and resources should be associated
with the netdev and not the NIC. So does hot-removing the NIC really
clean up the netdev? (I guess maybe this is a different in vfio
which is the problem)
Dave
> > > --
> > > MST
> > --
> > Dr. David Alan Gilbert / address@hidden / Manchester, UK
--
Dr. David Alan Gilbert / address@hidden / Manchester, UK
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, (continued)
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Eduardo Habkost, 2019/05/30
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Michael S. Tsirkin, 2019/05/30
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Eduardo Habkost, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Michael S. Tsirkin, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Eduardo Habkost, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Dr. David Alan Gilbert, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Alex Williamson, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Michael S. Tsirkin, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Michael S. Tsirkin, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Eduardo Habkost, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support,
Dr. David Alan Gilbert <=
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Michael S. Tsirkin, 2019/05/30
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Dr. David Alan Gilbert, 2019/05/31
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Eduardo Habkost, 2019/05/30
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Dr. David Alan Gilbert, 2019/05/30
- Re: [Qemu-devel] [PATCH 3/4] net/virtio: add failover support, Eduardo Habkost, 2019/05/31
- [Qemu-devel] [PATCH 2/4] qdev/qbus: Add hidden device support, Jens Freimann, 2019/05/17
- [Qemu-devel] [PATCH 4/4] vfio/pci: unplug failover primary device before migration, Jens Freimann, 2019/05/17
- Re: [Qemu-devel] [PATCH 0/4] add failover feature for assigned network devices, Alex Williamson, 2019/05/20