From: Eric Auger
Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
Date: Fri, 24 Jan 2025 18:56:15 +0100
User-agent: Mozilla Thunderbird

Hi,


On 1/24/25 4:41 AM, Jason Wang wrote:
> On Fri, Jan 24, 2025 at 11:30 AM Jason Wang <jasowang@redhat.com> wrote:
>> On Fri, Jan 24, 2025 at 10:44 AM Duan, Zhenzhong
>> <zhenzhong.duan@intel.com> wrote:
>>>
>>>
>>>> -----Original Message-----
>>>> From: Eric Auger <eric.auger@redhat.com>
>>>> Subject: Re: [PATCH] hw/virtio/vhost: Disable IOTLB callbacks when IOMMU gets disabled
>>>>
>>>> Hi Jason,
>>>>
>>>>
>>>> On 1/23/25 2:34 AM, Jason Wang wrote:
>>>>> On Wed, Jan 22, 2025 at 3:55 PM Eric Auger <eric.auger@redhat.com> wrote:
>>>>>> Hi Jason,
>>>>>>
>>>>>>
>>>>>> On 1/22/25 8:17 AM, Jason Wang wrote:
>>>>>>> On Wed, Jan 22, 2025 at 12:25 AM Eric Auger <eric.auger@redhat.com> wrote:
>>>>>>>> Hi Jason,
>>>>>>>>
>>>>>>>> On 1/21/25 4:27 AM, Jason Wang wrote:
>>>>>>>>> On Tue, Jan 21, 2025 at 1:33 AM Eric Auger <eric.auger@redhat.com> wrote:
>>>>>>>>>> When a guest exposed to a vhost device and protected by an
>>>>>>>>>> intel IOMMU gets rebooted, we sometimes observe a spurious warning:
>>>>>>>>>>
>>>>>>>>>> Fail to lookup the translated address ffffe000
>>>>>>>>>>
>>>>>>>>>> We observe that the IOMMU gets disabled through a write to the global
>>>>>>>>>> command register (DMAR_GCMD.TE) before the vhost device gets stopped.
>>>>>>>>>> When this warning happens, it can be observed that an in-flight IOTLB
>>>>>>>>>> miss occurs after the IOMMU is disabled and before vhost is stopped. In
>>>>>>>>>> that case a flat translation occurs and the check in
>>>>>>>>>> vhost_memory_region_lookup() fails.
>>>>>>>>>>
>>>>>>>>>> Let's disable the IOTLB callbacks when all IOMMU MRs have been
>>>>>>>>>> unregistered.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Eric Auger <eric.auger@redhat.com>
>>>>>>>>>> ---
>>>>>>>>>>  hw/virtio/vhost.c | 4 ++++
>>>>>>>>>>  1 file changed, 4 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
>>>>>>>>>> index 6aa72fd434..128c2ab094 100644
>>>>>>>>>> --- a/hw/virtio/vhost.c
>>>>>>>>>> +++ b/hw/virtio/vhost.c
>>>>>>>>>> @@ -931,6 +931,10 @@ static void vhost_iommu_region_del(MemoryListener *listener,
>>>>>>>>>>              break;
>>>>>>>>>>          }
>>>>>>>>>>      }
>>>>>>>>>> +    if (QLIST_EMPTY(&dev->iommu_list) &&
>>>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback) {
>>>>>>>>>> +        dev->vhost_ops->vhost_set_iotlb_callback(dev, false);
>>>>>>>>>> +    }
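(For context: the enable side of this callback lives in vhost_dev_start(); a simplified excerpt from memory, so the exact guard and surrounding code may differ:

    if (vhost_dev_has_iommu(hdev) &&
        hdev->vhost_ops->vhost_set_iotlb_callback) {
        /* start receiving IOTLB miss requests from the kernel backend */
        hdev->vhost_ops->vhost_set_iotlb_callback(hdev, true);
    }

The hunk above mirrors this with "false" once the last IOMMU MR has been removed from dev->iommu_list.)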
>>>>>>>>> So the current code assumes:
>>>>>>>>>
>>>>>>>>> 1) IOMMU is enabled before vhost starts
>>>>>>>>> 2) IOMMU is disabled after vhost stops
>>>>>>>>>
>>>>>>>>> This patch seems to fix 2) but not 1). Do we need to deal with the
>>>>>>>>> IOMMU being enabled after vhost starts?
>>>>>>>> sorry, I initially misunderstood the above comment. Indeed, in the reboot
>>>>>>>> case assumption 2) happens to be wrong. However, what I currently do is
>>>>>>>> stop listening to iotlb miss requests from the kernel, because my
>>>>>>>> understanding is that those requests are spurious: they only generate
>>>>>>>> warnings and we do not care about them since we are rebooting the system.
>>>>>>>>
>>>>>>>> However I do not claim this could handle the case where the IOMMU MR
>>>>>>>> would be turned off and then turned on. I think in that case we should
>>>>>>>> also flush the kernel IOTLB and this is not taken care of in this 
>>>>>>>> patch.
>>>>>>>> Is it a relevant use case?
>>>>>>> Not sure.
>>>>>>>
>>>>>>>> Regarding removing assumption 1) and allowing the IOMMU to be enabled
>>>>>>>> after vhost starts: is that a valid use case, given that the virtio
>>>>>>>> driver is using the DMA API?
>>>>>>> It should not be but we can't assume the behaviour of the guest. It
>>>>>>> could be buggy or even malicious.
>>>>>> agreed
>>>>>>> Btw, we had the following codes while handling te:
>>>>>>>
>>>>>>> /* Handle Translation Enable/Disable */
>>>>>>> static void vtd_handle_gcmd_te(IntelIOMMUState *s, bool en)
>>>>>>> {
>>>>>>>     if (s->dmar_enabled == en) {
>>>>>>>         return;
>>>>>>>     }
>>>>>>>
>>>>>>>     trace_vtd_dmar_enable(en);
>>>>>>>
>>>>>>> ...
>>>>>>>
>>>>>>>     vtd_reset_caches(s);
>>>>>>>     vtd_address_space_refresh_all(s);
>>>>>>> }
>>>>>>>
>>>>>>> vtd_address_space_refresh_all() will basically disable the iommu
>>>>>>> memory region. That looks insufficient to trigger the region_del
>>>>>>> callback; maybe we should delete the region or introduce a listener
>>>>>>> callback?
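(For readers following along: "disable the iommu memory region" refers, if I read the code correctly, to vtd_switch_address_space() flipping the enabled state of the two alias regions, roughly (simplified from hw/i386/intel_iommu.c, details may differ):

    if (use_iommu) {
        /* DMAR enabled: route DMA through the IOMMU MR */
        memory_region_set_enabled(&as->nodmar, false);
        memory_region_set_enabled(MEMORY_REGION(&as->iommu), true);
    } else {
        /* DMAR disabled: fall back to the flat, non-translated MR */
        memory_region_set_enabled(MEMORY_REGION(&as->iommu), false);
        memory_region_set_enabled(&as->nodmar, true);
    }

which is what makes the flatview drop the IOMMU MR and fire region_del on the vhost memory listener, as confirmed just below.)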
>>>>>> This is exactly the code path which is entered in my use case.
>>>>>>
>>>>>> vtd_address_space_refresh_all(s) induces the vhost_iommu_region_del. But
>>>>>> given the current implementation of the latter, the IOTLB callback is not
>>>>>> unset and the kernel IOTLB is not refreshed. Also, as I pointed out, the
>>>>>> hdev->mem->regions are not updated; shouldn't they be? Can you explain
>>>>>> what they correspond to?
>>>>> Adding Peter for more ideas.
>>>>>
>>>>> I think it's better to find a way to trigger the listener here, probably:
>>>>>
>>>>> 1) add/delete the memory regions instead of enable/disable
>>>> sorry, I don't understand what you mean. The vhost_iommu_region_del call
>>>> stack is provided below [1]. A write to the intel iommu GCMD TE bit
>>>> induces a call to vhost_iommu_region_del. This happens before
>>>> vhost_dev_stop, whose call stack is provided below [2] and which
>>>> originates from a bus reset.
>>>> We may have in-flight IOTLB miss requests happening between the two.
>>>>
>>>> If this happens, vhost_device_iotlb_miss() fails because the IOVA is no
>>>> longer translated by the IOMMU: the iotlb.translated_addr returned by
>>>> address_space_get_iotlb_entry() is not within the registered
>>>> vhost_memory_regions looked up in vhost_memory_region_lookup(), hence
>>>> the "Fail to lookup the translated address" message.
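(For reference, the failing check is roughly the lookup in hw/virtio/vhost.c; a sketch from memory, so details may differ:

    iotlb = address_space_get_iotlb_entry(dev->vdev->dma_as, iova, write,
                                          MEMTXATTRS_UNSPECIFIED);
    if (iotlb.target_as != NULL) {
        /* map the translated GPA back to a vhost userspace address */
        ret = vhost_memory_region_lookup(dev, iotlb.translated_addr,
                                         &uaddr, &len);
        if (ret) {
            error_report("Fail to lookup the translated address "
                         "%"PRIx64, iotlb.translated_addr);
            ...
        }
    }

Once the IOMMU MR is disabled, address_space_get_iotlb_entry() returns a flat, untranslated entry, so translated_addr no longer falls inside any registered vhost_memory_region and the lookup fails.)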
>>>>
>>>> It sounds wrong that vhost keeps on using IOVAs that are no longer
>>>> translated. It looks like we have a reset ordering issue, and this patch is
>>>> just removing the symptom and not the cause.
>>>>
>>>> At the moment I don't really understand what initiates the intel iommu TE
>>>> bit write. This is a guest action, but is it triggered by a misordered
>>>> qemu event?
>>> During reboot, native_machine_shutdown() calls x86_platform.iommu_shutdown()
>>> to disable the iommu before reset. So Peter's patch will not address your issue.
>>>
>>> Before the iommu shutdown, device_shutdown() is called to shut down all devices.
>>> It is not clear why vhost is still active.
>> It might be because neither the virtio bus nor virtio-net provides a
>> shutdown method.
>>
>> There used to be requests to provide those to unbreak kexec.
> More could be seen at https://issues.redhat.com/browse/RHEL-331

Cool, we have a culprit now :-) I see the ticket is closed; I will
reopen it. Are there known implementation challenges that caused the fix
to be postponed, or do we "just" need someone with free cycles to carry
out the task? By the way, FYI, we have other open tickets related to
vSMMU and virtio devices that also show up during reboot.

Thanks

Eric
> This looks like exactly the same issue.
>
> Thanks
>
>> A quick try might be to provide a .driver.shutdown callback in the
>> virtio_net_driver structure and reset the device there as a start.
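(A rough sketch of what that could look like, purely illustrative; virtnet_shutdown is a hypothetical name and a real fix may need to quiesce the queues more carefully:

    /* drivers/net/virtio_net.c (sketch): quiesce DMA before the IOMMU is
     * shut down on reboot/kexec by resetting the device. */
    static void virtnet_shutdown(struct device *dev)
    {
            struct virtio_device *vdev = dev_to_virtio(dev);

            virtio_reset_device(vdev);
    }

    static struct virtio_driver virtio_net_driver = {
            /* ... existing fields ... */
            .driver.shutdown = virtnet_shutdown,
    };

device_shutdown() falls back to the driver's .shutdown when the bus does not provide one, so this would run before x86_platform.iommu_shutdown().)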
>>
>> Thanks
>>
>>> Thanks
>>> Zhenzhong
>>>
>>>> It could also relate to
>>>> [PATCH 0/4] intel_iommu: Reset vIOMMU after all the rest of devices
>>>> https://lore.kernel.org/all/?q=s%3Aintel_iommu%3A+Reset+vIOMMU+after+all+the+rest+of+devices



