Re: [PATCH rfcv1 00/23] intel_iommu: Enable stage-1 translation
From: Jason Wang
Subject: Re: [PATCH rfcv1 00/23] intel_iommu: Enable stage-1 translation
Date: Mon, 22 Jan 2024 12:29:51 +0800
On Mon, Jan 15, 2024 at 6:39 PM Zhenzhong Duan <zhenzhong.duan@intel.com> wrote:
>
> Hi,
>
> This series enables stage-1 translation support in intel iommu, which
> we call "modern" mode. In this mode, we don't shadow the guest page
> table for passthrough devices; instead, we pass the stage-1 page table
> to the host side to construct a nested domain. We also support emulated
> devices by translating the stage-1 page table. There was an earlier
> effort to enable this feature; see [1] for details.
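For emulated devices the vIOMMU has to walk the guest's stage-1 table itself. VT-d first-stage tables use the x86-64 paging format, so a minimal sketch of how a GIOVA splits into per-level table indices could look like this. It assumes 4-level paging with 4 KiB pages only (no large pages, no 5-level paging); `fs_level_index()` and `fs_page_offset()` are hypothetical helpers, not QEMU functions.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers, not QEMU code: 4-level first-stage walk, 4 KiB pages only. */
#define FS_PAGE_SHIFT 12u   /* 4 KiB pages */
#define FS_LEVEL_BITS 9u    /* a 4 KiB table holds 512 entries of 8 bytes */
#define FS_INDEX_MASK ((1u << FS_LEVEL_BITS) - 1u)

/* Index into the table at @level (4 = top-level table, 1 = leaf table). */
static unsigned fs_level_index(uint64_t giova, unsigned level)
{
    assert(level >= 1 && level <= 4);
    unsigned shift = FS_PAGE_SHIFT + (level - 1) * FS_LEVEL_BITS;
    return (unsigned)((giova >> shift) & FS_INDEX_MASK);
}

/* Byte offset inside the final 4 KiB page. */
static uint64_t fs_page_offset(uint64_t giova)
{
    return giova & ((1ull << FS_PAGE_SHIFT) - 1);
}
```

Each level consumes 9 address bits, so a 4-level walk covers bits 47:12 and the low 12 bits are the in-page offset.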
>
> The key design is to utilize the dual-stage IOMMU translation
> (also known as IOMMU nested translation) capability in the host IOMMU.
> As the diagram below shows, the guest I/O page table pointer in GPA
> (guest physical address) is passed to the host and used to perform
> the stage-1 address translation. Along with it, modifications to
> present mappings in the guest I/O page table should be followed
> by an IOTLB invalidation.
>
>     .-------------.  .---------------------------.
>     |   vIOMMU    |  | Guest I/O page table      |
>     |             |  '---------------------------'
>     .----------------/
>     | PASID Entry |--- PASID cache flush --+
>     '-------------'                        |
>     |             |                        V
>     |             |           I/O page table pointer in GPA
>     '-------------'
> Guest
> ------| Shadow |---------------------------|--------
>       v        v                           v
> Host
>     .-------------.  .------------------------.
>     |   pIOMMU    |  |  FS for GIOVA->GPA     |
>     |             |  '------------------------'
>     .----------------/  |
>     | PASID Entry |     V (Nested xlate)
>     '----------------\.----------------------------------.
>     |             |   | SS for GPA->HPA, unmanaged domain|
>     |             |   '----------------------------------'
>     '-------------'
> Where:
> - FS = First stage page tables
> - SS = Second stage page tables
> <Intel VT-d Nested translation>
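The composition in the diagram can be modelled with a toy sketch, assuming flat page-number lookup tables instead of real multi-level VT-d tables: stage 1 (guest-owned) maps GIOVA to GPA, and stage 2 (host-owned) maps GPA to HPA. `struct toy_map`, `toy_lookup()` and `nested_translate()` are hypothetical names. The sketch also ignores that, in real nested translation, the stage-1 walk's own table pointers are GPAs that go through stage 2 as well.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_PAGE_SHIFT 12
#define TOY_PAGE_MASK  ((1ull << TOY_PAGE_SHIFT) - 1)

/* Hypothetical flat table mapping input page numbers to output page numbers. */
struct toy_map {
    const uint64_t *in_pfn;
    const uint64_t *out_pfn;
    size_t n;
};

/* Translate one address through one stage; false means "not mapped". */
static bool toy_lookup(const struct toy_map *m, uint64_t addr, uint64_t *out)
{
    uint64_t pfn = addr >> TOY_PAGE_SHIFT;

    for (size_t i = 0; i < m->n; i++) {
        if (m->in_pfn[i] == pfn) {
            *out = (m->out_pfn[i] << TOY_PAGE_SHIFT) | (addr & TOY_PAGE_MASK);
            return true;
        }
    }
    return false;
}

/* Stage 1 (GIOVA->GPA, guest table), then stage 2 (GPA->HPA, host table). */
static bool nested_translate(const struct toy_map *fs, const struct toy_map *ss,
                             uint64_t giova, uint64_t *hpa)
{
    uint64_t gpa;

    assert(fs && ss && hpa);
    if (!toy_lookup(fs, giova, &gpa)) {
        return false;                  /* stage-1 miss */
    }
    return toy_lookup(ss, gpa, hpa);   /* a stage-2 miss also yields false */
}
```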
>
> There are some interactions between VFIO and vIOMMU:
> * vIOMMU registers PCIIOMMUOps with the PCI subsystem, which VFIO can
>   use to register/unregister an IOMMUDevice object.
> * VFIO registers an IOMMUFDDevice object with vIOMMU at the vfio device
>   realize stage; this is implemented in a prerequisite series [2].
> * vIOMMU calls the IOMMUFDDevice interface callbacks (IOMMUFDDeviceOps)
>   to bind/unbind the device to IOMMUFD-backed domains, whether nested
>   or not.
>
> See below diagram:
>
>     VFIO Device                                  Intel IOMMU
> .-----------------.                         .-------------------.
> |                 |                         |                   |
> |       .---------|PCIIOMMUOps              |.-------------.    |
> |       | IOMMUFD |(set_iommu_device)       || IOMMUFD     |    |
> |       | Device  |------------------------>|| Device list |    |
> |       .---------|(unset_iommu_device)     |.-------------.    |
> |                 |                         |       |           |
> |                 |                         |       V           |
> |       .---------|         IOMMUFDDeviceOps|  .---------.      |
> |       | IOMMUFD |            (attach_hwpt)|  | IOMMUFD |      |
> |       |  link   |<------------------------|  | Device  |      |
> |       .---------|            (detach_hwpt)|  .---------.      |
> |                 |                         |       |           |
> |                 |                         |      ...          |
> .-----------------.                         .-------------------.
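A minimal sketch of the two callback directions in the diagram, assuming heavily simplified stand-in types: `toy_pci_iommu_ops` plays the role of PCIIOMMUOps (VFIO hands its device to the vIOMMU), and `toy_iommufd_device_ops` plays the role of IOMMUFDDeviceOps (the vIOMMU asks VFIO to attach/detach a hwpt). All `toy_*` names and signatures are hypothetical; the real QEMU types carry more state and error reporting.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; real QEMU types and signatures differ. */
struct toy_iommufd_device;

struct toy_iommufd_device_ops {              /* cf. IOMMUFDDeviceOps */
    int (*attach_hwpt)(struct toy_iommufd_device *dev, unsigned hwpt_id);
    void (*detach_hwpt)(struct toy_iommufd_device *dev);
};

struct toy_iommufd_device {
    const struct toy_iommufd_device_ops *ops;
    unsigned attached_hwpt;                  /* 0 = not attached */
};

struct toy_viommu {                          /* holds the "IOMMUFD Device list" */
    struct toy_iommufd_device *devs[8];
    size_t ndevs;
};

struct toy_pci_iommu_ops {                   /* cf. PCIIOMMUOps */
    int (*set_iommu_device)(struct toy_viommu *s, struct toy_iommufd_device *d);
    void (*unset_iommu_device)(struct toy_viommu *s, struct toy_iommufd_device *d);
};

/* VFIO side: real code would issue iommufd ioctls here. */
static int toy_attach_hwpt(struct toy_iommufd_device *dev, unsigned hwpt_id)
{
    dev->attached_hwpt = hwpt_id;
    return 0;
}

static void toy_detach_hwpt(struct toy_iommufd_device *dev)
{
    dev->attached_hwpt = 0;
}

static const struct toy_iommufd_device_ops toy_vfio_dev_ops = {
    .attach_hwpt = toy_attach_hwpt,
    .detach_hwpt = toy_detach_hwpt,
};

/* vIOMMU side: record devices handed over at realize time. */
static int toy_set_iommu_device(struct toy_viommu *s, struct toy_iommufd_device *d)
{
    if (s->ndevs == sizeof(s->devs) / sizeof(s->devs[0])) {
        return -1;
    }
    s->devs[s->ndevs++] = d;
    return 0;
}

static void toy_unset_iommu_device(struct toy_viommu *s, struct toy_iommufd_device *d)
{
    for (size_t i = 0; i < s->ndevs; i++) {
        if (s->devs[i] == d) {
            d->ops->detach_hwpt(d);
            s->devs[i] = s->devs[--s->ndevs];  /* swap-remove */
            return;
        }
    }
}

static const struct toy_pci_iommu_ops toy_viommu_pci_ops = {
    .set_iommu_device = toy_set_iommu_device,
    .unset_iommu_device = toy_unset_iommu_device,
};

/* vIOMMU binds a registered device to a hwpt through the device's ops. */
static int toy_viommu_bind(struct toy_viommu *s, size_t idx, unsigned hwpt_id)
{
    assert(idx < s->ndevs);
    struct toy_iommufd_device *d = s->devs[idx];
    return d->ops->attach_hwpt(d, hwpt_id);
}
```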
>
> Based on Yi's suggestion, we reworked the design for managing ioas and
> hwpt so that it supports multiple iommufd objects and the ERRATA_772415
> case, while sharing ioas and hwpt whenever possible.
>
> A stage-2 page table can be shared by different devices if there is
> no conflict and the devices link to the same iommufd object, i.e.
> devices under the same host IOMMU can share the same stage-2 page
> table. If there is a conflict, e.g. one device is in non-cache-coherent
> (non-CC) mode unlike the others, it requires a separate stage-2 page
> table in non-CC mode.
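The sharing rule above amounts to a find-or-create lookup keyed by iommufd object and cache coherency, with a reference count so the table is freed when its last device detaches. A sketch under that assumption, with a hypothetical `struct toy_s2_hwpt` standing in for VTDS2Hwpt:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical model of the sharing rule; not QEMU's VTDS2Hwpt. */
struct toy_s2_hwpt {
    int iommufd;            /* iommufd object the hwpt was allocated from */
    bool cache_coherent;    /* CC vs. non-CC stage-2 table */
    int users;              /* devices attached to this hwpt */
    struct toy_s2_hwpt *next;
};

/* Reuse a stage-2 hwpt when nothing conflicts, otherwise allocate a new one. */
static struct toy_s2_hwpt *toy_get_s2_hwpt(struct toy_s2_hwpt **list,
                                           int iommufd, bool cache_coherent)
{
    for (struct toy_s2_hwpt *h = *list; h; h = h->next) {
        if (h->iommufd == iommufd && h->cache_coherent == cache_coherent) {
            h->users++;
            return h;
        }
    }
    struct toy_s2_hwpt *h = calloc(1, sizeof(*h));
    if (!h) {
        return NULL;
    }
    h->iommufd = iommufd;
    h->cache_coherent = cache_coherent;
    h->users = 1;
    h->next = *list;
    *list = h;
    return h;
}

/* Drop one user; unlink and free the hwpt when the last user goes away. */
static void toy_put_s2_hwpt(struct toy_s2_hwpt **list, struct toy_s2_hwpt *h)
{
    assert(h->users > 0);
    if (--h->users > 0) {
        return;
    }
    for (struct toy_s2_hwpt **p = list; *p; p = &(*p)->next) {
        if (*p == h) {
            *p = h->next;
            free(h);
            return;
        }
    }
}
```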
>
> The SPR platform has ERRATA_772415, which requires that there be no
> read-only mappings in the stage-2 page table. This series supports
> creating a VTDIOASContainer with no read-only mappings. I'm not sure
> whether there is a rare case where only some IOMMUs on a multi-IOMMU
> host have ERRATA_772415, but this design survives even in that case.
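A hypothetical sketch of how a device could be matched to a VTDIOASContainer by (iommufd, errata) and how an errata container could leave out read-only sections. It assumes read-only sections are simply skipped rather than mapped writable; `toy_find_container()` and `toy_should_map()` are illustrative names only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of VTDIOASContainer selection; not QEMU's code. */
struct toy_ioas_container {
    int iommufd;
    bool errata_772415;   /* true: container holds RW mappings only */
};

/* Pick the container matching a device's iommufd and errata status. */
static struct toy_ioas_container *
toy_find_container(struct toy_ioas_container *c, size_t n, int iommufd, bool errata)
{
    assert(c != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (c[i].iommufd == iommufd && c[i].errata_772415 == errata) {
            return &c[i];
        }
    }
    return NULL;   /* caller would allocate a new container */
}

/* Assumption: an errata container skips read-only guest RAM sections. */
static bool toy_should_map(const struct toy_ioas_container *c, bool readonly)
{
    return !(c->errata_772415 && readonly);
}
```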
>
> See below example diagram for a full view:
>
>    IntelIOMMUState
>           |
>           V
> .------------------.    .------------------.    .-------------------.
> | VTDIOASContainer |--->| VTDIOASContainer |--->| VTDIOASContainer  |-->...
> | (iommufd0,RW&RO) |    | (iommufd1,RW&RO) |    | (iommufd0,RW only)|
> .------------------.    .------------------.    .-------------------.
>           |                       |                          |
>           |                       .-->...                    |
>           V                                                  V
> .-------------------.    .-------------------.       .---------------.
> |   VTDS2Hwpt(CC)   |--->| VTDS2Hwpt(non-CC) |-->... | VTDS2Hwpt(CC) |-->...
> .-------------------.    .-------------------.       .---------------.
>       |          |                 |                         |
>       |          |                 |                         |
> .-----------. .-----------.  .------------.            .------------.
> | IOMMUFD   | | IOMMUFD   |  | IOMMUFD    |            | IOMMUFD    |
> | Device(CC)| | Device(CC)|  | Device     |            | Device(CC) |
> | (iommufd0)| | (iommufd0)|  | (non-CC)   |            | (errata)   |
> |           | |           |  | (iommufd0) |            | (iommufd0) |
> .-----------. .-----------.  .------------.            .------------.
>
> This series is also prerequisite work for vSVA, i.e. sharing the
> guest application address space with passthrough devices.
>
> To enable "modern" mode, just add "x-scalable-mode=modern",
> i.e. -device intel-iommu,x-scalable-mode=modern,...
>
> Passthrough devices should use the iommufd backend to work in "modern"
> mode, i.e. -object iommufd,id=iommufd0 -device vfio-pci,iommufd=iommufd0,...
>
> If the host doesn't support nested translation, QEMU will fail
> and report that it is unsupported.
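A sketch of the kind of host capability check this implies, assuming the host's VT-d extended capability register value is available (for instance from iommufd's IOMMU_GET_HW_INFO, whose VT-d payload carries ecap_reg). Bit 26 is NEST, matching Linux's `ecap_nest()` macro; `toy_check_modern_mode()` is hypothetical and its message is illustrative, not QEMU's.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* ECAP_REG bit 26 = NEST (nested translation support), as in Linux ecap_nest(). */
#define TOY_ECAP_NEST_BIT 26u

static bool toy_ecap_has(uint64_t ecap, unsigned bit)
{
    return (ecap >> bit) & 1u;
}

/* Hypothetical check: refuse "modern" mode unless the host IOMMU reports nesting. */
static int toy_check_modern_mode(uint64_t host_ecap)
{
    if (!toy_ecap_has(host_ecap, TOY_ECAP_NEST_BIT)) {
        fprintf(stderr, "x-scalable-mode=modern: host IOMMU lacks nested translation\n");
        return -1;
    }
    return 0;
}
```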
>
> Tests done:
> - device hotplug/unplug
> - different devices linked to different iommufds
>
> PATCH1-2: Preparatory work to update headers and the IOMMUFD uAPI
> PATCH3-4: Initialize the vfio IOMMUFDDevice interface and pass it to vIOMMU
> PATCH5: Introduce a placeholder variable for scalable modern mode
> PATCH6: Sync host cap/ecap with vIOMMU default cap/ecap in modern mode
> PATCH7-22: Implement first stage page table for passthrough and emulated
> devices
Can we split the series and start from the emulated devices (and have
a qtest for that)? This might help with reviewing.
Thanks
- [PATCH rfcv1 16/23] intel_iommu: rename slpte in iotlb_entry to pte, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 17/23] intel_iommu: implement firt level translation, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 18/23] intel_iommu: fix the fault reason report, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 21/23] intel_iommu: invalidate piotlb when flush pasid, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 20/23] intel_iommu: piotlb invalidation should notify unmap, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 19/23] intel_iommu: introduce pasid iotlb cache, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 22/23] intel_iommu: refresh pasid bind after pasid cache force reset, Zhenzhong Duan, 2024/01/15
- [PATCH rfcv1 23/23] intel_iommu: modify x-scalable-mode to be string option, Zhenzhong Duan, 2024/01/15
- Re: [PATCH rfcv1 00/23] intel_iommu: Enable stage-1 translation, Jason Wang