
Re: [PATCH v4 00/15] vfio: VFIO migration support with vIOMMU


From: Joao Martins
Subject: Re: [PATCH v4 00/15] vfio: VFIO migration support with vIOMMU
Date: Tue, 21 Jan 2025 16:42:46 +0000

On 28/11/2024 18:29, Joao Martins wrote:
> On 28/11/2024 03:19, Zhangfei Gao wrote:
>> Hi, Joao
>>
>> On Fri, Jun 23, 2023 at 5:51 AM Joao Martins <joao.m.martins@oracle.com> 
>> wrote:
>>>
>>> Hey,
>>>
>>> This series introduces support for vIOMMU with VFIO device migration,
>>> particularly related to how we do the dirty page tracking.
>>>
>>> Today vIOMMUs serve two purposes: 1) enabling interrupt remapping and 2)
>>> providing DMA translation services for guests, to support some form of
>>> guest-kernel-managed DMA, e.g. for nested-virt-based usage; (1) is
>>> especially required for big VMs with VFs and more than 255 vCPUs. We
>>> tackle both and remove the migration blocker when a vIOMMU is present,
>>> provided the conditions are met. I have both use cases here in one series,
>>> but I am happy to tackle them in separate series.
>>>
>>> As I found out, we don't necessarily need to expose the whole vIOMMU
>>> functionality in order to just support interrupt remapping. x86 IOMMUs
>>> on Windows Server 2018[2] and Linux >=5.10, with QEMU 7.1+ (or really
>>> Linux guests with commit c40aaaac10 and since QEMU commit 8646d9c773d8),
>>> can instantiate an IOMMU just for interrupt remapping without needing to
>>> advertise/support DMA translation. The AMD IOMMU can in theory provide
>>> the same, but Linux doesn't quite support the IR-only part there yet;
>>> only intel-iommu does.
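
For reference, an interrupt-remapping-only guest configuration looks roughly
like this; 'dma-translation' is the x86-iommu property mentioned further
below, and the exact syntax may vary with the QEMU version:

    qemu-system-x86_64 \
        -machine q35,kernel-irqchip=split \
        -device intel-iommu,intremap=on,dma-translation=off \
        ...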
>>>
>>> The series is organized as following:
>>>
>>> Patches 1-5: Today we can't gather vIOMMU details before the guest
>>> establishes its first DMA mapping via the vIOMMU, so these patches add
>>> a way for vIOMMUs to be asked about their properties at start of day.
>>> I chose the least-churn approach for now (as opposed to a treewide
>>> conversion) and allow easy conversion a posteriori. As suggested by
>>> Peter Xu[7], I have resurrected Yi's patches[5][6], which allow us to
>>> fetch the attributes of the vIOMMU backing a PCI device, without
>>> necessarily tying the caller (VFIO or anyone else) to an IOMMU MR like I
>>> was doing in v3.
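
For illustration, a caller such as VFIO could then query the vIOMMU backing a
PCI device along these lines; the exact signature of
pci_device_iommu_get_attr() and its return convention are assumptions based on
the patch titles rather than quotes from the patches:

    /* Sketch only: query properties of the vIOMMU backing @pdev. */
    static void query_viommu_attrs(PCIDevice *pdev)
    {
        uint64_t max_iova = 0;        /* 0 is kept to mean "not advertised" */
        bool dma_translation = true;  /* assume enabled unless reported off */

        /* Hypothetical PCI-level wrapper over the IOMMU MR get_attr() hook */
        pci_device_iommu_get_attr(pdev, IOMMU_ATTR_MAX_IOVA, &max_iova);
        pci_device_iommu_get_attr(pdev, IOMMU_ATTR_DMA_TRANSLATION,
                                  &dma_translation);
    }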
>>>
>>> Patches 6-8: Handle configs with vIOMMU interrupt remapping but without
>>> DMA translation allowed. Today the 'dma-translation' attribute is
>>> x86-iommu only, but the way this series is structured nothing stops
>>> other vIOMMUs from supporting it too, as long as they use
>>> pci_setup_iommu_ops() and the necessary IOMMU MR get_attr() attributes
>>> are handled. The blocker is thus relaxed when vIOMMUs are able to
>>> toggle/report the DMA_TRANSLATION attribute. With the patches up to this
>>> point, we've tackled item (1) of the second paragraph.
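
Roughly, the relaxation amounts to the following; the variable names here are
illustrative only, not the ones used in the patches:

    /*
     * Sketch only: an IR-only vIOMMU (DMA translation reported as disabled)
     * is treated, for the purpose of the migration blocker, as if no vIOMMU
     * were present, since there is no guest-managed DMA remapping to track.
     */
    if (viommu_present && !dma_translation) {
        viommu_present = false;
    }
    if (viommu_present) {
        /* existing vIOMMU migration blocker logic stays in place */
    }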
>>>
>>> Patches 9-15: Simplified a lot from v2 (patch 9) to only track the complete
>>> IOVA address space, leveraging the logic we use to compose the dirty ranges.
>>> The blocker is once again relaxed for vIOMMUs that advertise their IOVA
>>> addressing limits. This tackles item (2). So far I mainly use it with
>>> intel-iommu, although I have a small set of patches for virtio-iommu per
>>> Alex's suggestion in v2.
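
The idea, very roughly (field names are illustrative, not the ones from the
patches):

    /*
     * Sketch only: instead of tracking each guest IOVA mapping individually,
     * set up device dirty tracking over the complete IOVA space the vIOMMU
     * can address, derived from its advertised limit (IOMMU_ATTR_MAX_IOVA,
     * e.g. something like MAKE_64BIT_MASK(0, aw_bits) on intel-iommu).
     */
    if (space->max_iova) {
        range->min_iova = 0;
        range->max_iova = space->max_iova;
    } else {
        /* no advertised limit: migration stays blocked (see patch 15) */
    }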
>>>
>>> Comments, suggestions welcome. Thanks for the review!
>>>
>>> Regards,
>>>         Joao
>>>
>>> Changes since v3[8]:
>>> * Pick up Yi's patches[5][6], and rework the first four patches.
>>>   These are split a bit better, and make the new iommu_ops *optional*
>>>   as opposed to a treewide conversion. Rather than returning an IOMMU MR
>>>   and letting VFIO operate on it to fetch attributes, we instead let the
>>>   underlying IOMMU driver fetch the desired IOMMU MR and ask it for the
>>>   desired IOMMU attribute. Callers only care about the attributes of the
>>>   vIOMMU backing a PCI device, regardless of its topology/association.
>>>   (Peter Xu) These patches are split a bit better than the original ones;
>>>   I've kept the same authorship and noted the changes from the originals
>>>   where applicable.
>>> * Because of the rework of the first four patches, switch to
>>>   individual attributes in the VFIOSpace that track dma_translation
>>>   and the max_iova. All are expected to be unused when zero to retain
>>>   the defaults of today in common code.
>>> * Improve the migration blocker message of the last patch to be
>>>   more obvious that vIOMMU migration blocker is added when no vIOMMU
>>>   address space limits are advertised. (Patch 15)
>>> * Cast to uintptr_t in IOMMUAttr data in intel-iommu (Philippe).
>>> * Switch to MAKE_64BIT_MASK() instead of plain left shift (Philippe).
>>> * Change diffstat of patches with scripts/git.orderfile (Philippe).
>>>
>>> Changes since v2[3]:
>>> * New patches 1-9 to be able to handle vIOMMUs without DMA translation, and
>>>   to introduce ways to query various IOMMU model attributes via the IOMMU MR.
>>>   This is partly meant to address a comment in previous versions, namely that
>>>   we can't access the IOMMU MR prior to the DMA mapping happening; before
>>>   this series, the vfio giommu_list only tracks 'mapped GIOVA', and that is
>>>   controlled by the guest. It also better tackles IOMMU usage for
>>>   interrupt-remapping-only purposes.
>>> * Dropped Peter Xu's ack on patch 9 given that the code changed a bit.
>>> * Adjusted patch 14 to account for the VFIO bitmaps no longer being pointers.
>>> * The patches that existed in v2 of vIOMMU dirty tracking are mostly
>>>   untouched, except patch 12, which was greatly simplified.
>>>
>>> Changes since v1[4]:
>>> - Rebased on latest master branch. As part of it, made some changes in
>>>   pre-copy to adjust it to Juan's new patches:
>>>   1. Added a new patch that passes threshold_size parameter to
>>>      .state_pending_{estimate,exact}() handlers.
>>>   2. Added a new patch that refactors vfio_save_block().
>>>   3. Changed the pre-copy patch to cache and report pending pre-copy
>>>      size in the .state_pending_estimate() handler.
>>> - Removed unnecessary P2P code. This should be added later on when P2P
>>>   support is added. (Alex)
>>> - Moved the dirty sync to be after the DMA unmap in vfio_dma_unmap()
>>>   (patch #11). (Alex)
>>> - Stored vfio_devices_all_device_dirty_tracking()'s value in a local
>>>   variable in vfio_get_dirty_bitmap() so it can be re-used (patch #11).
>>> - Refactored the viommu device dirty tracking ranges creation code to
>>>   make it clearer (patch #15).
>>> - Changed overflow check in vfio_iommu_range_is_device_tracked() to
>>>   emphasize that we specifically check for 2^64 wrap around (patch #15).
>>> - Added R-bs / Acks.
>>>
>>> [0] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
>>> [1] https://lore.kernel.org/qemu-devel/c66d2d8e-f042-964a-a797-a3d07c260a3b@oracle.com/
>>> [2] https://learn.microsoft.com/en-us/windows-hardware/design/device-experiences/oem-kernel-dma-protection
>>> [3] https://lore.kernel.org/qemu-devel/20230222174915.5647-1-avihaih@nvidia.com/
>>> [4] https://lore.kernel.org/qemu-devel/20230126184948.10478-1-avihaih@nvidia.com/
>>> [5] https://lore.kernel.org/all/20210302203827.437645-5-yi.l.liu@intel.com/
>>> [6] https://lore.kernel.org/all/20210302203827.437645-6-yi.l.liu@intel.com/
>>> [7] https://lore.kernel.org/qemu-devel/ZH9Kr6mrKNqUgcYs@x1n/
>>> [8] https://lore.kernel.org/qemu-devel/20230530175937.24202-1-joao.m.martins@oracle.com/
>>>
>>> Avihai Horon (4):
>>>   memory/iommu: Add IOMMU_ATTR_MAX_IOVA attribute
>>>   intel-iommu: Implement IOMMU_ATTR_MAX_IOVA get_attr() attribute
>>>   vfio/common: Extract vIOMMU code from vfio_sync_dirty_bitmap()
>>>   vfio/common: Optimize device dirty page tracking with vIOMMU
>>>
>>> Joao Martins (7):
>>>   memory/iommu: Add IOMMU_ATTR_DMA_TRANSLATION attribute
>>>   intel-iommu: Implement get_attr() method
>>>   vfio/common: Track whether DMA Translation is enabled on the vIOMMU
>>>   vfio/common: Relax vIOMMU detection when DMA translation is off
>>>   vfio/common: Move dirty tracking ranges update to helper
>>>   vfio/common: Support device dirty page tracking with vIOMMU
>>>   vfio/common: Block migration with vIOMMUs without address width limits
>>>
>>> Yi Liu (4):
>>>   hw/pci: Add a pci_setup_iommu_ops() helper
>>>   hw/pci: Refactor pci_device_iommu_address_space()
>>>   hw/pci: Introduce pci_device_iommu_get_attr()
>>>   intel-iommu: Switch to pci_setup_iommu_ops()
>>>
>>
>> Would you mind pointing to the GitHub address?
>> I have some conflicts, and the GitHub tree would be very helpful.
> 
> Yeap, I have a series -- picking up from Cedric's rebase since the 9.1 soft
> freeze -- but testing is still in progress.
> 
> Give me a couple of days and I'll respond here, as there are a few more
> changes on top (now that we have IOMMUFD support) that will go into v5.

Here is the WIP (there are still 2 wrinkles left):

        https://github.com/jpemartins/qemu/commits/vfio-migration-viommu/

The first four patches relax the live-migration blocking with a vIOMMU when
IOMMUFD dirty tracking is in use. The rest is roughly this series, which
optimizes things a bit, though it is mostly useful for VF dirty tracking.


