[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets dis
From: |
Jason Wang |
Subject: |
Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled |
Date: |
Fri, 24 Jan 2025 09:48:04 +0800 |
On Thu, Jan 23, 2025 at 4:31 PM Eric Auger <eric.auger@redhat.com> wrote:
>
> Hi Jason,
>
>
> On 1/23/25 2:34 AM, Jason Wang wrote:
> > On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
> >> Hi Jason,
> >>
> >>
> >> On 1/22/25 8:17 AM, Jason Wang wrote:
> >>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
> >>>> Hi Jason,
> >>>>
> >>>> On 1/21/25 4:27 AM, Jason Wang wrote:
> >>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com>
> >>>>> wrote:
> >>>>>> When a guest exposed with a vhost device and protected by an
> >>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
> >>>>>>
> >>>>>> Fail to lookup the translated address ffffe000
> >>>>>>
> >>>>>> We observe that the IOMMU gets disabled through a write to the global
> >>>>>> command register (CMAR_GCMD.TE) before the vhost device gets stopped.
> >>>>>> When this warning happens it can be observed an inflight IOTLB
> >>>>>> miss occurs after the IOMMU disable and before the vhost stop. In
> >>>>>> that case a flat translation occurs and the check in
> >>>>>> vhost_memory_region_lookup() fails.
> >>>>>>
> >>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
> >>>>>> unregistered.
> >>>>>>
> >>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
> >>>>>> ---
> >>>>>> hw/virtio/vhost.c | 4 ++++
> >>>>>> 1 file changed, 4 insertions(+)
> >>>>>>
> >>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> >>>>>> index 6aa72fd434..128c2ab094 100644
> >>>>>> --- a/hw/virtio/vhost.c
> >>>>>> +++ b/hw/virtio/vhost.c
> >>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener
> >>>>>> *listener,
> >>>>>> break;
> >>>>>> }
> >>>>>> }
> >>>>>> + if (QLIST_EMPTY(&dev->iommu_list) &&
> >>>>>> + dev->vhost_ops->vhost_set_iotlb_callback) {
> >>>>>> + dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
> >>>>>> + }
> >>>>> So the current code assumes:
> >>>>>
> >>>>> 1) IOMMU is enabled before vhost starts
> >>>>> 2) IOMMU is disabled after vhost stops
> >>>>>
> >>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
> >>>>> IOMMU enabled after vhost starts?
> >>>> sorry I initially misunderstood the above comment. Indeed in the reboot
> >>>> case assumption 2) happens to be wrong. However what I currently do is:
> >>>> stop listening to iotlb miss requests from the kernel because my
> >>>> understanding is those requests are just spurious ones, generate
> >>>> warnings and we do not care since we are rebooting the system.
> >>>>
> >>>> However I do not claim this could handle the case where the IOMMU MR
> >>>> would be turned off and then turned on. I think in that case we should
> >>>> also flush the kernel IOTLB and this is not taken care of in this patch.
> >>>> Is it a relevant use case?
> >>> Not sure.
> >>>
> >>>> wrt removing assumption 1) and allow IOMMU enabled after vhost start. Is
> >>>> that a valid use case as the virtio driver is using the dma api?
> >>> It should not be but we can't assume the behaviour of the guest. It
> >>> could be buggy or even malicious.
> >> agreed
> >>> Btw, we had the following codes while handling te:
> >>>
> >>> /* Handle Translation Enable/Disable */
> >>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
> >>> {
> >>> if (s->dmar_enabled == en) {
> >>> return;
> >>> }
> >>>
> >>> trace_vtd_dmar_enable(en);
> >>>
> >>> ...
> >>>
> >>> vtd_reset_caches(s);
> >>> vtd_address_space_refresh_all(s);
> >>> }
> >>>
> >>> vtd_address_space_refresh_all() will basically disable the iommu
> >>> memory region. It looks not sufficient to trigger the region_del
> >>> callback, maybe we should delete the region or introduce listener
> >>> callback?
> >> This is exactly the code path which is entered in my use case.
> >>
> >> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
> >> given the current implement of this latter the IOTLB callback is not unset
> >> and the kernel IOTLB is not refreshed. Also as I pointed out the
> >> hdev->mem->regions are not updated? shouldn't they. Can you explain what
> >> they correspond to?
> > Adding Peter for more ideas.
> >
> > I think it's better to find a way to trigger the listener here, probably:
> >
> > 1) add/delete the memory regions instead of enable/disable
> sorry I don't understand what you mean. The vhost_iommu_region_del call
> stack is provided below [1]. Write to the intel iommu GCMD TE bit
> induces a call to vhost_iommu_region_del. This happens before the
> vhost_dev_stop whose call stack is provided below [2] and originates
> from a bus reset.
Right, I see.
>
> We may have inflight IOTLB miss requests happening between both.
>
> If this happens, vhost_device_iotlb_miss() fails because the IOVA is not
> translated anymore by the IOMMU and the iotlb.translated_addr returned
> by address_space_get_iotlb_entry() is not within the registered
> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
> the "Fail to lookup the translated address" message.
>
> It sounds wrong that vhost keeps on using IOVAs that are not translated
> anymore. It looks we have a reset ordering issue and this patch is just
> removing the sympton and not the cause.
Exactly.
>
> At the moment I don't really get what is initiating the intel iommu TE
> bit write. This is a guest action but is it initiated from a misordered
> qemu event?
It could also be a guest bug which reset the vIOMMU before the device.
Do we have a calltrace in the guest?
Adding more vtd maintiners for more thoughts.
Thanks
>
> It could also relate to
> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices
>
> Thanks
>
> Eric
>
>
>
>
> [1]
> #0 vhost_iommu_region_del (listener=0x55555708a388,
> section=0x7ffff62bf030) at ../hw/virtio/vhost.c:927
> #1 0x0000555555c851b4 in address_space_update_topology_pass
> (as=0x55555822e4f0, old_view=0x7fffe8e81850, new_view=0x55555723dfa0,
> adding=false) at ../system/memory.c:977
> #2 0x0000555555c857a0 in address_space_set_flatview (as=0x55555822e4f0)
> at ../system/memory.c:1079
> #3 0x0000555555c85986 in memory_region_transaction_commit () at
> ../system/memory.c:1132
> #4 0x0000555555c89f25 in memory_region_set_enabled (mr=0x555557dc0d30,
> enabled=false) at ../system/memory.c:2686
> #5 0x0000555555b5edb1 in vtd_switch_address_space (as=0x555557dc0c60)
> at ../hw/i386/intel_iommu.c:1735
> #6 0x0000555555b5ee6f in vtd_switch_address_space_all
> (s=0x555557db1500) at ../hw/i386/intel_iommu.c:1779
> #7 0x0000555555b64533 in vtd_address_space_refresh_all
> (s=0x555557db1500) at ../hw/i386/intel_iommu.c:4006
> #8 0x0000555555b60770 in vtd_handle_gcmd_te (s=0x555557db1500,
> en=false) at ../hw/i386/intel_iommu.c:2419
> #9 0x0000555555b60885 in vtd_handle_gcmd_write (s=0x555557db1500) at
> ../hw/i386/intel_iommu.c:2449
> #10 0x0000555555b61db2 in vtd_mem_write (opaque=0x555557db1500, addr=24,
> val=100663296, size=4) at ../hw/i386/intel_iommu.c:2991
> #11 0x0000555555c833e0 in memory_region_write_accessor
> (mr=0x555557db1840, addr=24, value=0x7ffff62bf3d8, size=4, shift=0,
> mask=4294967295, attrs=...) at ../system/memory.c:497
> #12 0x0000555555c83711 in access_with_adjusted_size
> (addr=24, value=0x7ffff62bf3d8, size=4, access_size_min=4,
> access_size_max=8, access_fn=0x555555c832ea
> <memory_region_write_accessor>, mr=0x555557db1840, attrs=...)
> at ../system/memory.c:573
> #13 0x0000555555c8698b in memory_region_dispatch_write
> (mr=0x555557db1840, addr=24, data=100663296, op=MO_32, attrs=...) at
> ../system/memory.c:1521
> #14 0x0000555555c95619 in flatview_write_continue_step (attrs=...,
> buf=0x7ffff7fbb028 "", len=4, mr_addr=24, l=0x7ffff62bf4c0,
> mr=0x555557db1840) at ../system/physmem.c:2803
> #15 0x0000555555c956eb in flatview_write_continue (fv=0x7fffe852d370,
> addr=4275634200, attrs=..., ptr=0x7ffff7fbb028, len=4, mr_addr=24, l=4,
> mr=0x555557db1840) at ../system/physmem.c:2833
> #16 0x0000555555c957f9 in flatview_write (fv=0x7fffe852d370,
> addr=4275634200, attrs=..., buf=0x7ffff7fbb028, len=4) at
> ../system/physmem.c:2864
> #17 0x0000555555c95c39 in address_space_write (as=0x555556ff1720
> <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028,
> len=4) at ../system/physmem.c:2984
> #18 0x0000555555c95cb1 in address_space_rw (as=0x555556ff1720
> <address_space_memory>, addr=4275634200, attrs=..., buf=0x7ffff7fbb028,
> len=4, is_write=true) at ../system/physmem.c:2994
> #19 0x0000555555ceeb56 in kvm_cpu_exec (cpu=0x55555727e950) at
> ../accel/kvm/kvm-all.c:3188
> #20 0x0000555555cf1ea6 in kvm_vcpu_thread_fn (arg=0x55555727e950) at
> ../accel/kvm/kvm-accel-ops.c:50
> #21 0x0000555555f6de20 in qemu_thread_start (args=0x555557288960) at
> ../util/qemu-thread-posix.c:541
> #22 0x00007ffff7289e92 in start_thread () at /lib64/libc.so.6
>
> [2]
> #0 vhost_dev_stop (hdev=0x55555708a2c0, vdev=0x555558234cb0,
> vrings=false) at ../hw/virtio/vhost.c:2222
> #1 0x0000555555990266 in vhost_net_stop_one (net=0x55555708a2c0,
> dev=0x555558234cb0) at ../hw/net/vhost_net.c:350
> #2 0x00005555559906ea in vhost_net_stop (dev=0x555558234cb0,
> ncs=0x55555826f968, data_queue_pairs=1, cvq=0) at ../hw/net/vhost_net.c:462
> #3 0x0000555555c1ad0a in virtio_net_vhost_status (n=0x555558234cb0,
> status=0 '\000') at ../hw/net/virtio-net.c:318
> #4 0x0000555555c1afaf in virtio_net_set_status (vdev=0x555558234cb0,
> status=0 '\000') at ../hw/net/virtio-net.c:393
> #5 0x0000555555c591bd in virtio_set_status (vdev=0x555558234cb0, val=0
> '\000') at ../hw/virtio/virtio.c:2242
> #6 0x0000555555c595eb in virtio_reset (opaque=0x555558234cb0) at
> ../hw/virtio/virtio.c:2325
> #7 0x0000555555a0d1e1 in virtio_bus_reset (bus=0x555558234c30) at
> ../hw/virtio/virtio-bus.c:109
> #8 0x0000555555a13d51 in virtio_pci_reset (qdev=0x55555822c830) at
> ../hw/virtio/virtio-pci.c:2282
> #9 0x0000555555a14016 in virtio_pci_bus_reset_hold (obj=0x55555822c830,
> type=RESET_TYPE_COLD) at ../hw/virtio/virtio-pci.c:2322
> #10 0x0000555555d08831 in resettable_phase_hold (obj=0x55555822c830,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:180
> #11 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557ffa3c0,
> cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
> type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #12 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20,
> obj=0x555557ffa3c0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #13 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ffa3c0,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #14 0x0000555555d0543b in device_reset_child_foreach
> (obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
> #15 0x0000555555d084d8 in resettable_child_foreach (rc=0x55555715a090,
> obj=0x555557ff98e0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #16 0x0000555555d08792 in resettable_phase_hold (obj=0x555557ff98e0,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #17 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x555557445120,
> cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
> type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #18 0x0000555555d084d8 in resettable_child_foreach (rc=0x555557146f20,
> obj=0x555557445120, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #19 0x0000555555d08792 in resettable_phase_hold (obj=0x555557445120,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #20 0x0000555555d0543b in device_reset_child_foreach
> (obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/qdev.c:275
> #21 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555570cf410,
> obj=0x5555572d0a00, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #22 0x0000555555d08792 in resettable_phase_hold (obj=0x5555572d0a00,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #23 0x0000555555d00fc5 in bus_reset_child_foreach (obj=0x55555723d270,
> cb=0x555555d086d5 <resettable_phase_hold>, opaque=0x0,
> type=RESET_TYPE_COLD) at ../hw/core/bus.c:97
> #24 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555571bfde0,
> obj=0x55555723d270, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #25 0x0000555555d08792 in resettable_phase_hold (obj=0x55555723d270,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #26 0x0000555555d06d6d in resettable_container_child_foreach
> (obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resetcontainer.c:54
> #27 0x0000555555d084d8 in resettable_child_foreach (rc=0x5555572180f0,
> obj=0x55555724a8a0, cb=0x555555d086d5 <resettable_phase_hold>,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:92
> #28 0x0000555555d08792 in resettable_phase_hold (obj=0x55555724a8a0,
> opaque=0x0, type=RESET_TYPE_COLD) at ../hw/core/resettable.c:169
> #29 0x0000555555d0838d in resettable_assert_reset (obj=0x55555724a8a0,
> type=RESET_TYPE_COLD) at ../hw/core/resettable.c:58
> #30 0x0000555555d082e5 in resettable_reset (obj=0x55555724a8a0,
> type=RESET_TYPE_COLD) at ../hw/core/resettable.c:45
> #31 0x00005555558db5c8 in qemu_devices_reset
> (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/core/reset.c:179
> #32 0x0000555555b6f5b2 in pc_machine_reset (machine=0x555557234490,
> reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../hw/i386/pc.c:1877
> #33 0x0000555555a5a0a2 in qemu_system_reset
> (reason=SHUTDOWN_CAUSE_GUEST_RESET) at ../system/runstate.c:516
> #34 0x0000555555a5a891 in main_loop_should_exit (status=0x7fffffffd574)
> at ../system/runstate.c:792
> #35 0x0000555555a5a992 in qemu_main_loop () at ../system/runstate.c:825
> #36 0x0000555555e9cced in qemu_default_main () at ../system/main.c:37
> #37 0x0000555555e9cd2a in main (argc=58, argv=0x7fffffffd6d8) at
> ../system/main.c:48
>
>
> >
> > or
> >
> > 2) introduce new listener ops that can be triggered when a region is
> > enabled or disabled
> >
> > Thanks
> >
> >> Thanks
> >>
> >> Eric
> >>
> >>> Thanks
> >>>
> >>>> Eric
> >>>>
> >>>>
> >>>>> Thanks
> >>>>>
> >>>>>> }
> >>>>>>
> >>>>>> void vhost_toggle_device_iotlb(VirtIODevice *vdev)
> >>>>>> --
> >>>>>> 2.47.1
> >>>>>>
>
- [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Eric Auger, 2025/01/20
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/20
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Eric Auger, 2025/01/21
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Eric Auger, 2025/01/21
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/22
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Eric Auger, 2025/01/22
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/22
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Eric Auger, 2025/01/23
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled,
Jason Wang <=
- RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Duan, Zhenzhong, 2025/01/23
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/23
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/23
- RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Duan, Zhenzhong, 2025/01/23
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/24
- RE: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Duan, Zhenzhong, 2025/01/24
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Eric Auger, 2025/01/24
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Peter Xu, 2025/01/24
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/26
- Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled, Jason Wang, 2025/01/26