qemu-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility chec


From: Duan, Zhenzhong
Subject: RE: [PATCH v2 3/5] intel_iommu: Add a framework to do compatibility check with host IOMMU cap/ecap
Date: Thu, 25 Apr 2024 08:46:19 +0000

Hi Cédric,

>-----Original Message-----
>From: Cédric Le Goater <clg@redhat.com>
>Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>compatibility check with host IOMMU cap/ecap
>
>Hello Zhenzhong,
>
>On 4/18/24 10:42, Duan, Zhenzhong wrote:
>> Hi Cédric,
>>
>>> -----Original Message-----
>>> From: Cédric Le Goater <clg@redhat.com>
>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>> compatibility check with host IOMMU cap/ecap
>>>
>>> Hello Zhenzhong
>>>
>>> On 4/17/24 11:24, Duan, Zhenzhong wrote:
>>>>
>>>>
>>>>> -----Original Message-----
>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>> compatibility check with host IOMMU cap/ecap
>>>>>
>>>>> On 4/17/24 06:21, Duan, Zhenzhong wrote:
>>>>>>
>>>>>>
>>>>>>> -----Original Message-----
>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>
>>>>>>> Hello,
>>>>>>>
>>>>>>> On 4/16/24 09:09, Duan, Zhenzhong wrote:
>>>>>>>> Hi Cédric,
>>>>>>>>
>>>>>>>>> -----Original Message-----
>>>>>>>>> From: Cédric Le Goater <clg@redhat.com>
>>>>>>>>> Subject: Re: [PATCH v2 3/5] intel_iommu: Add a framework to do
>>>>>>>>> compatibility check with host IOMMU cap/ecap
>>>>>>>>>
>>>>>>>>> On 4/8/24 10:44, Zhenzhong Duan wrote:
>>>>>>>>>> From: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>>
>>>>>>>>>> If check fails, the host side device(either vfio or vdpa device)
>should
>>>>> not
>>>>>>>>>> be passed to guest.
>>>>>>>>>>
>>>>>>>>>> Implementation details for different backends will be in
>following
>>>>>>> patches.
>>>>>>>>>>
>>>>>>>>>> Signed-off-by: Yi Liu <yi.l.liu@intel.com>
>>>>>>>>>> Signed-off-by: Yi Sun <yi.y.sun@linux.intel.com>
>>>>>>>>>> Signed-off-by: Zhenzhong Duan <zhenzhong.duan@intel.com>
>>>>>>>>>> ---
>>>>>>>>>>       hw/i386/intel_iommu.c | 35
>>>>>>>>> +++++++++++++++++++++++++++++++++++
>>>>>>>>>>       1 file changed, 35 insertions(+)
>>>>>>>>>>
>>>>>>>>>> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
>>>>>>>>>> index 4f84e2e801..a49b587c73 100644
>>>>>>>>>> --- a/hw/i386/intel_iommu.c
>>>>>>>>>> +++ b/hw/i386/intel_iommu.c
>>>>>>>>>> @@ -35,6 +35,7 @@
>>>>>>>>>>       #include "sysemu/kvm.h"
>>>>>>>>>>       #include "sysemu/dma.h"
>>>>>>>>>>       #include "sysemu/sysemu.h"
>>>>>>>>>> +#include "sysemu/iommufd.h"
>>>>>>>>>>       #include "hw/i386/apic_internal.h"
>>>>>>>>>>       #include "kvm/kvm_i386.h"
>>>>>>>>>>       #include "migration/vmstate.h"
>>>>>>>>>> @@ -3819,6 +3820,32 @@ VTDAddressSpace
>>>>>>>>> *vtd_find_add_as(IntelIOMMUState *s, PCIBus *bus,
>>>>>>>>>>           return vtd_dev_as;
>>>>>>>>>>       }
>>>>>>>>>>
>>>>>>>>>> +static int vtd_check_legacy_hdev(IntelIOMMUState *s,
>>>>>>>>>> +                                 HostIOMMUDevice *hiod,
>>>>>>>>>> +                                 Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>>>>>>>>> +                                  HostIOMMUDevice *hiod,
>>>>>>>>>> +                                  Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    return 0;
>>>>>>>>>> +}
>>>>>>>>>> +
>>>>>>>>>> +static int vtd_check_hdev(IntelIOMMUState *s,
>>>>>>> VTDHostIOMMUDevice
>>>>>>>>> *vtd_hdev,
>>>>>>>>>> +                          Error **errp)
>>>>>>>>>> +{
>>>>>>>>>> +    HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>>>>>>>> +
>>>>>>>>>> +    if (object_dynamic_cast(OBJECT(hiod),
>TYPE_HIOD_IOMMUFD))
>>> {
>>>>>>>>>> +        return vtd_check_iommufd_hdev(s, hiod, errp);
>>>>>>>>>> +    }
>>>>>>>>>> +
>>>>>>>>>> +    return vtd_check_legacy_hdev(s, hiod, errp);
>>>>>>>>>> +}
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I think we should be using the .get_host_iommu_info() class
>handler
>>>>>>>>> instead. Can we refactor the code slightly to avoid this check on
>>>>>>>>> the type ?
>>>>>>>>
>>>>>>>> There is some difficulty ini avoiding this check, the behavior of
>>>>>>> vtd_check_legacy_hdev
>>>>>>>> and vtd_check_iommufd_hdev are different especially after
>nesting
>>>>>>> support introduced.
>>>>>>>> vtd_check_iommufd_hdev() has much wider check over cap/ecap
>bits
>>>>>>> besides aw_bits.
>>>>>>>
>>>>>>> I think it is important to fully separate the vIOMMU model from the
>>>>>>> host IOMMU backing device.
>>>
>>> This comment is true for the structures also.
>>>
>>>>>>> Could we introduce a new HostIOMMUDeviceClass
>>>>>>> handler .check_hdev() handler, which would
>>> call .get_host_iommu_info() ?
>>>
>>> This means that HIOD_LEGACY_INFO and HIOD_IOMMUFD_INFO should
>be
>>> a common structure 'HostIOMMUDeviceInfo' holding all attributes
>>> for the different backends. Each .get_host_iommu_info() implementation
>>> would translate the specific host iommu device data presentation
>>> into the common 'HostIOMMUDeviceInfo', this is true for host_aw_bits.
>>
>> I see, it's just not easy to define the unified elements in
>HostIOMMUDeviceInfo
>> so that they maps to bits or fields in host return IOMMU info.
>
>The proposal is adding a vIOMMU <-> HostIOMMUDevice interface and a
>new
>API needs to be completely defined for it. The IOMMU backend
>implementation
>could be anything, legacy, iommufd, iommufd v2, some other framework
>and
>the vIOMMU shouldn't be aware of its implementation.
>
>Exposing the kernel structures as done below should be avoided because
>they are part of the QEMU <-> kernel IOMMUFD interface.
>
>
>> Different platform returned host IOMMU info is platform specific.
>> For vtd and siommu:
>>
>> struct iommu_hw_info_vtd {
>>          __u32 flags;
>>          __u32 __reserved;
>>          __aligned_u64 cap_reg;
>>          __aligned_u64 ecap_reg;
>> };
>>
>> struct iommu_hw_info_arm_smmuv3 {
>>         __u32 flags;
>>         __u32 __reserved;
>>         __u32 idr[6];
>>         __u32 iidr;
>>         __u32 aidr;
>> };
>>
>> I can think of two kinds of declaration of HostIOMMUDeviceInfo:
>>
>> struct HostIOMMUDeviceInfo {
>>      uint8_t aw_bits;
>>      enum iommu_hw_info_type type;
>>      union {
>>          struct iommu_hw_info_vtd vtd;
>>          struct iommu_hw_info_arm_smmuv3;
>>          ......
>>      } data;
>> }
>>
>> or
>>
>> struct HostIOMMUDeviceInfo {
>>      uint8_t aw_bits;
>>      enum iommu_hw_info_type type;
>>      __u32 flags;
>>      __aligned_u64 cap_reg;
>>      __aligned_u64 ecap_reg;
>>      __u32 idr[6];
>>      __u32 iidr;
>>      __u32 aidr;
>>     ......
>> }
>>
>> Not clear if any is your expected format.
>>
>>> 'type' could be handled the same way, with a 'HostIOMMUDeviceInfo'
>>> type attribute and host iommu device type definitions, or as you
>>> suggested with a QOM interface. This is more complex however. In
>>> this case, I would suggest to implement a .compatible() handler to
>>> compare the host iommu device type with the vIOMMU type.
>>>
>>> The resulting check_hdev routine would look something like :
>>>
>>> static int vtd_check_hdev(IntelIOMMUState *s, VTDHostIOMMUDevice
>>> *vtd_hdev,
>>>                            Error **errp)
>>> {
>>>      HostIOMMUDevice *hiod = vtd_hdev->dev;
>>>      HostIOMMUDeviceClass *hiodc =
>>> HOST_IOMMU_DEVICE_GET_CLASS(hiod);
>>>      HostIOMMUDevice info;
>>>      int host_aw_bits, ret;
>>>
>>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>>      if (ret) {
>>>          return ret;
>>>      }
>>>
>>>      ret = hiodc->is_compatible(hiod, VIOMMU_INTERFACE(s));
>>>      if (ret) {
>>>          return ret;
>>>      }
>>>
>>>      if (s->aw_bits > info.aw_bits) {
>>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>>                     s->aw_bits, info.aw_bits);
>>>          return -EINVAL;
>>>      }
>>> }
>>>
>>> and the HostIOMMUDeviceClass::is_compatible() handler would call a
>>> vIOMMUInterface::compatible() handler simply returning
>>> IOMMU_HW_INFO_TYPE_INTEL_VTD. How does that sound ?
>>
>> Not quite get what HostIOMMUDeviceClass::is_compatible() does.
>
>HostIOMMUDeviceClass::is_compatible() calls in the host IOMMU backend
>to determine which IOMMU types are exposed by the host, then calls the
>vIOMMUInterface::compatible() handler to do the compare. API is to be
>defined.
>
>As a refinement, we could introduce in the vIOMMU <-> HostIOMMUDevice
>interface capabilities, or features, to check more precisely the level
>of compatibility between the vIOMMU and the host IOMMU device. This is
>similar to what is done between QEMU and KVM.
>
>If you think this is too complex, include type in HostIOMMUDeviceInfo.
>
>> Currently legacy and IOMMUFD host device has different check logic, how
>it can help
>> in merging vtd_check_legacy_hdev() and vtd_check_iommufd_hdev() into
>a single vtd_check_hdev()?
>
>IMHO, IOMMU shouldn't be aware of the IOMMU backend implementation,
>but
>if you think the Intel vIOMMU should access directly the iommufd backend
>when available, then we should drop this proposal and revisit the design
>to take a different approach.

I implemented a draft following your suggestions so we could explore further.
See https://github.com/yiliu1765/qemu/tree/zhenzhong/iommufd_nesting_preq_v3_tmp

In this draft, it uses .check_cap() to query HOST_IOMMU_DEVICE_CAP_xxx
just like KVM CAPs.
A common HostIOMMUDeviceCaps structure is introduced to be used by
both legacy and iommufd backend.

It indeed is cleaner. Only problem is I failed to implement .compatible()
as all the check could go ahead by just calling check_cap().
Could you help a quick check to see if I misunderstood any of your suggestion?

Thanks
Zhenzhong

>
>> Below is the two functions after nesting series, for your easy reference:
>>
>> static int vtd_check_legacy_hdev()
>> {
>>      if (s->scalable_modern) {
>>          /* Modern vIOMMU and legacy backend */
>>          error_setg(errp, "Need IOMMUFD backend in scalable modern
>mode");
>>          return -EINVAL;
>>      }
>
>This part would typically go in the compatible() handler.
>
>>
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (s->aw_bits > info.aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, info.aw_bits);
>>          return -EINVAL;
>>      }
>>
>>      return 0;
>> }
>>
>> static int vtd_check_iommufd_hdev(IntelIOMMUState *s,
>>                                    VTDHostIOMMUDevice *vtd_hdev,
>>                                    Error **errp)
>> {
>>      ret = hiodc->get_host_iommu_info(hiod, &info, sizeof(info), errp);
>>      if (ret) {
>>          return ret;
>>      }
>>
>>      if (info.type != IOMMU_HW_INFO_TYPE_INTEL_VTD) {
>>          error_setg(errp, "IOMMU hardware is not compatible");
>>          return -EINVAL;
>>      }
>>
>>      vtd = &info.data.vtd;
>>      host_aw_bits = VTD_MGAW_FROM_CAP(vtd->cap_reg) + 1;
>>      if (s->aw_bits > host_aw_bits) {
>>          error_setg(errp, "aw-bits %d > host aw-bits %d",
>>                     s->aw_bits, host_aw_bits);
>>          return -EINVAL;
>>      }
>>
>>      if (!s->scalable_modern) {
>>          goto done;
>>      }
>>
>>      if (!(vtd->ecap_reg & VTD_ECAP_NEST)) {
>>          error_setg(errp, "Host IOMMU doesn't support nested translation");
>>          return -EINVAL;
>>      }
>>
>>      if (s->fl1gp && !(vtd->cap_reg & VTD_CAP_FL1GP)) {
>>          error_setg(errp, "Stage-1 1GB huge page is unsupported by host
>IOMMU");
>>          return -EINVAL;
>>      }
>
>These checks above would typically go in the compatible() handler also.
>
>Now, the question is how useful will that framework be if hotplugging
>devices always fail because the vIOMMU and host IOMMU devices have
>incompatible settings/capabilities ? Shouldn't we also add properties
>at the vIOMMU level ?
>
>
>Thanks,
>
>C.


reply via email to

[Prev in Thread] Current Thread [Next in Thread]