From: Jason Wang
Subject: Re: [RFC 0/2] disable the configuration interrupt for the unsupported device
Date: Fri, 29 Mar 2024 11:27:54 +0800

On Fri, Mar 29, 2024 at 11:02 AM Cindy Lu <lulu@redhat.com> wrote:
>
> On Thu, Mar 28, 2024 at 12:12 PM Jason Wang <jasowang@redhat.com> wrote:
> >
> > On Wed, Mar 27, 2024 at 5:33 PM Cindy Lu <lulu@redhat.com> wrote:
> > >
> > > On Wed, Mar 27, 2024 at 5:12 PM Jason Wang <jasowang@redhat.com> wrote:
> > > >
> > > > On Wed, Mar 27, 2024 at 4:28 PM Cindy Lu <lulu@redhat.com> wrote:
> > > > >
> > > > > On Wed, Mar 27, 2024 at 3:54 PM Jason Wang <jasowang@redhat.com>
> > > > > wrote:
> > > > > >
> > > > > > On Wed, Mar 27, 2024 at 2:03 PM Cindy Lu <lulu@redhat.com> wrote:
> > > > > > >
> > > > > > > On Wed, Mar 27, 2024 at 11:05 AM Jason Wang <jasowang@redhat.com>
> > > > > > > wrote:
> > > > > > > >
> > > > > > > > Hi Cindy:
> > > > > > > >
> > > > > > > > On Wed, Mar 27, 2024 at 9:29 AM Cindy Lu <lulu@redhat.com>
> > > > > > > > wrote:
> > > > > > > > >
> > > > > > > > > > we met a crash in a non-standard image; here is the Jira for
> > > > > > > > > > this: https://issues.redhat.com/browse/RHEL-28522
> > > > > > > > > > The root cause of the issue is that an IRQFD was used without
> > > > > > > > > > initialization.
> > > > > > > > >
> > > > > > > > > During the booting process of the Vyatta image, the behavior
> > > > > > > > > of the called function in qemu is as follows:
> > > > > > > > >
> > > > > > > > > > 1. vhost_net_stop() was called; this calls
> > > > > > > > > > virtio_pci_set_guest_notifiers() with assign = false, and
> > > > > > > > > > virtio_pci_set_guest_notifiers() releases the irqfd for
> > > > > > > > > > vector 0
> > > > > > > >
> > > > > > > > Before vhost_net_stop(), do we know which vector is used by
> > > > > > > > which queue?
> > > > > > > >
> > > > > > > > Before this stop, vdev->config_vector is set through
> > > > > > > > virtio_pci_common_read/virtio_pci_common_write;
> > > > > > > > it was set to vector 0.
> > > > > >
> > > > > > > I basically meant to ask whether vector 0 is shared with some virtqueues here.
> > > > > >
> > > > > > Really sorry for this. The vq vectors are 1 and 2, and they are not
> > > > > > shared with the configure vector.
> > > > > > > > >
> > > > > > > > > > 2. virtio_reset() was called --> sets the configure vector to
> > > > > > > > > > VIRTIO_NO_VECTOR
> > > > > > > > >
> > > > > > > > > > 3. vhost_net_start() was called (at this time the configure
> > > > > > > > > > vector is still VIRTIO_NO_VECTOR) and calls
> > > > > > > > > > virtio_pci_set_guest_notifiers() with assign = true, so the
> > > > > > > > > > irqfd for vector 0 was not initialized during this process
> > > > > > > >
> > > > > > > > How does the configure vector differ from the virtqueue vector
> > > > > > > > here?
> > > > > > > >
> > > > > > > > All the vectors are VIRTIO_NO_VECTOR (including the vq vectors). Any
> > > > > > > > call to msix_fire_vector_notifier() will cause the crash at this
> > > > > > > > time.
> > > > > >
> > > > > > > Won't virtio_pci_set_guest_notifiers() try to allocate an irqfd
> > > > > > > when assign is true?
> > > > > >
> > > > > > It will allocate, but the vector is VIRTIO_NO_VECTOR (0xffff).
> > > > > >
> > > > > > Then it will call kvm_virtio_pci_vector_use_one().
> > > > > >
> > > > > > In this function, there is a check:
> > > > > >
> > > > > > if (vector >= msix_nr_vectors_allocated(dev)) {
> > > > > >     return 0;
> > > > > > }
> > > > > >
> > > > > > So it will return without setting up the irqfd.
> > > >
> > > > How about let's just fix this?
> > > >
> > > > Btw, it's better to explain in detail like the above in the next
> > > > version.
> > > >
> > > > Thanks
> > > >
> > > > The problem is that I think the behavior here is correct. The vector here is
> > > > VIRTIO_NO_VECTOR and we should return.
> >
> > So if I understand correctly, the configure vector is configured after
> > DRIVER_OK?
> >
> Sorry, I didn't get your point. Do you mean set_guest_notifiers()?
> This was called during the system boot,
> but the value of vdev->config_vector/vq vector is changed
> by virtio_pci_common_read/virtio_pci_common_write,
> and these functions do not check whether DRIVER_OK is set.
I basically mean that QEMU behaves based on the guest's behaviour.
So what you've described looks like a guest trying to configure the
config vector after it sets DRIVER_OK, so QEMU tries to use the irqfd
without initialization.
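
To make the sequence concrete, here is a toy model in C of what I think is
happening. It is only a sketch, not QEMU code: struct toy_dev, the toy_*
helpers and TOY_NR_VECTORS are made up for illustration, and only the
out-of-range guard mirrors the check quoted above from
kvm_virtio_pci_vector_use_one().

```c
#include <stdbool.h>
#include <stdint.h>

#define TOY_NO_VECTOR  0xffff  /* same value as VIRTIO_NO_VECTOR in the thread */
#define TOY_NR_VECTORS 3       /* assumption: 0 = config, 1 and 2 = virtqueues */

/* Hypothetical device state; only the config vector is modelled. */
struct toy_dev {
    uint16_t config_vector;
    bool irqfd_ready[TOY_NR_VECTORS];
};

/* Mirrors the quoted guard: an out-of-range vector is silently skipped. */
static int toy_vector_use_one(struct toy_dev *d, uint16_t vector)
{
    if (vector >= TOY_NR_VECTORS) {
        return 0;
    }
    d->irqfd_ready[vector] = true;
    return 0;
}

static void toy_vector_release_one(struct toy_dev *d, uint16_t vector)
{
    if (vector < TOY_NR_VECTORS) {
        d->irqfd_ready[vector] = false;
    }
}

static void toy_set_guest_notifiers(struct toy_dev *d, bool assign)
{
    if (assign) {
        toy_vector_use_one(d, d->config_vector);
    } else {
        toy_vector_release_one(d, d->config_vector);
    }
}

/* Replays steps 1-4; returns true if the unmask finds no irqfd. */
bool toy_replay_sequence(void)
{
    struct toy_dev d = { .config_vector = 0,
                         .irqfd_ready = { true, true, true } };

    toy_set_guest_notifiers(&d, false);  /* 1. stop: irqfd for vector 0 freed */
    d.config_vector = TOY_NO_VECTOR;     /* 2. reset */
    toy_set_guest_notifiers(&d, true);   /* 3. start: 0xffff is skipped */
    d.config_vector = 0;                 /* guest writes the vector late */
    return !d.irqfd_ready[0];            /* 4. unmask of vector 0 */
}
```

In this model toy_replay_sequence() reports that vector 0 has no irqfd at
the time of the unmask, which is the state I suspect we reach before the
crash.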
> > The spec doesn't forbid this, so this is something we need to support.
> >
> > It looks to me like the correct fix is to call kvm_virtio_pci_vector_use_one()
> > when the guest writes to msix_vector after DRIVER_OK?
> >
> If I understand correctly, do you mean that
> when virtio_pci_common_read/virtio_pci_common_write is called
> we need to check the value of vdev->config_vector/vq vector, and if it
> has changed and DRIVER_OK is also set,
> then we need to call virtio_pci_set_guest_notifiers() to re-init the irqfd?
It is not a re-init, as the irqfd has been freed.
A quick fix would be to call kvm_virtio_pci_vector_use/unuse_one() when
a guest assigns/deassigns a vector after DRIVER_OK.
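
Roughly, the idea could look like the sketch below. This is a toy model and
not a patch: struct sketch_dev and the sketch_* helpers are hypothetical
stand-ins for the real device state and for
kvm_virtio_pci_vector_use/unuse_one(), and the number of vectors is an
assumption.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_NO_VECTOR  0xffff  /* same value as VIRTIO_NO_VECTOR */
#define SKETCH_NR_VECTORS 3       /* assumption for this sketch */

/* Hypothetical device state; only the config vector is modelled. */
struct sketch_dev {
    uint16_t config_vector;
    bool driver_ok;
    bool irqfd_ready[SKETCH_NR_VECTORS];
};

/* Stand-in for the "use" helper: sets up the irqfd for a valid vector. */
static void sketch_use_one(struct sketch_dev *d, uint16_t vector)
{
    if (vector < SKETCH_NR_VECTORS) {
        d->irqfd_ready[vector] = true;
    }
}

/* Stand-in for the "unuse" helper: frees the irqfd for a valid vector. */
static void sketch_unuse_one(struct sketch_dev *d, uint16_t vector)
{
    if (vector < SKETCH_NR_VECTORS) {
        d->irqfd_ready[vector] = false;
    }
}

/*
 * Guest write to the config msix_vector. Before DRIVER_OK only the value
 * is recorded; after DRIVER_OK the old vector is dropped and the new one
 * is set up immediately, so a late assignment never leaves a stale irqfd.
 */
void sketch_write_config_vector(struct sketch_dev *d, uint16_t new_vector)
{
    uint16_t old_vector = d->config_vector;

    if (d->driver_ok && old_vector != new_vector) {
        sketch_unuse_one(d, old_vector);
        sketch_use_one(d, new_vector);
    }
    d->config_vector = new_vector;
}
```

With driver_ok set, writing vector 0 makes irqfd_ready[0] true, and writing
SKETCH_NO_VECTOR afterwards clears it again. The same handling would be
needed for the virtqueue vectors.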
Thanks
> Thanks
> cindy
> > Thanks
> >
> > > > A fix that could maybe work is to find out whether the vector was changed
> > > > from another value
> > > > and use that one? This seems strange.
> > > Thanks
> > > cindy
> > > > >
> > > > > > > So I think this should
> > > > > > > be a bug in this guest image
> > > > > >
> > > > > > The point is Qemu should not crash even if the guest driver is
> > > > > > buggy.
> > > > > >
> > > > > > It would be nice if we can have a qtest for this on top.
> > > > > >
> > > > > > Thanks
> > > > > >
> > > > > > Sure, got it. I have run the qtest, and it passed;
> > > > > > here is the result:
> > > > >
> > > > > Ok: 794
> > > > > Expected Fail: 0
> > > > > Fail: 0
> > > > > Unexpected Pass: 0
> > > > > Skipped: 32
> > > > > Timeout: 0
> > > > >
> > > > > > > > >
> > > > > > > > > > 4. The system continues to boot, and
> > > > > > > > > > msix_fire_vector_notifier() was
> > > > > > > > > > called to unmask vector 0 and then hit the crash
> > > > > > > > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 1
> > > > > > > > > [msix_fire_vector_notifier] 112 called vector 0 is_masked 0
> > > > > > > > >
> > > > > > > > > > The reason this does not reproduce in a RHEL/Fedora guest
> > > > > > > > > > image is that RHEL/Fedora doesn't call
> > > > > > > > > > vhost_net_stop and then virtio_reset, and also won't call
> > > > > > > > > > msix_fire_vector_notifier for vector 0 during system boot.
> > > > > > > > >
> > > > > > > > > > The reason this did not reproduce before configure interrupt
> > > > > > > > > > support was added is that
> > > > > > > > > > vector 0 is for the configure interrupt; before the support for
> > > > > > > > > > configure interrupts, the notifier process did not handle
> > > > > > > > > > vector 0.
> > > > > > > > >
> > > > > > > > > > The device that Vyatta uses doesn't support configure
> > > > > > > > > > interrupts at all, so we plan to disable configure
> > > > > > > > > > interrupts for unsupported devices
> > > > > > > >
> > > > > > > > Btw, let's tweak the changelog, it's a little bit hard to
> > > > > > > > understand.
> > > > > > > >
> > > > > > > sure will do
> > > > > > > thanks
> > > > > > > Cindy
> > > > > > > > Thanks
> > > > > > > >
> > > > > > > > >
> > > > > > > > > Signed-off-by: Cindy Lu <lulu@redhat.com>
> > > > > > > > >
> > > > > > > > > Cindy Lu (2):
> > > > > > > > > virtio-net: disable the configure interrupt for not support
> > > > > > > > > device
> > > > > > > > > virtio-pci: check if the configure interrupt enable
> > > > > > > > >
> > > > > > > > > > hw/net/virtio-net.c        |  5 ++++-
> > > > > > > > > > hw/virtio/virtio-pci.c     | 41 +++++++++++++++++++++-----------------
> > > > > > > > > > hw/virtio/virtio.c         |  1 +
> > > > > > > > > > include/hw/virtio/virtio.h |  1 +
> > > > > > > > > > 4 files changed, 29 insertions(+), 19 deletions(-)
> > > > > > > > >
> > > > > > > > > --
> > > > > > > > > 2.43.0
> > > > > > > > >
> > > > > > > >
> > > > > > >
> > > > > >
> > > > >
> > > >
> > >
> >
>