On Fri, Jan 10, 2025 at 02:45:39PM +0100, David Hildenbrand wrote:
> In your commit I read:
> "Implement the cut operation to be hitless, changes to the page table
> during cutting must cause zero disruption to any ongoing DMA. This is the
> expectation of the VFIO type 1 uAPI. Hitless requires HW support, it is
> incompatible with HW requiring break-before-make."
> So I guess that would mean that, depending on HW support, one could avoid
> disabling large pages to still allow for atomic cuts / partial unmaps that
> don't affect concurrent DMA.
Yes. Most x86 server HW will do this, though ARM support is a bit newish.
> What would be your suggestion here to avoid the "map each 4k page
> individually so we can unmap it individually"? I didn't completely grasp
> that, sorry.
Map in large ranges in the VMM, let's say 1G of shared memory as a
single mapping (called an iommufd area).
When the guest makes a 2M chunk of it private, you do an ioctl to
iommufd to split the area into three, leaving the 2M chunk as a
separate area.
The new iommufd ioctl to split areas will go down into the iommu driver
and atomically cut the 1G PTEs into smaller PTEs as necessary so that
no PTE spans the edges of the 2M area.
Then userspace can unmap the 2M area and leave the remainder of the 1G
area mapped.
All of this would be fully hitless to ongoing DMA.
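The split described above is mostly interval bookkeeping. A minimal model of it, with made-up helper names (the real interface is ioctl-based, this is just the arithmetic):

```python
# Model of iommufd "area" bookkeeping when a sub-range is split out.
# Names here are illustrative, not the real uAPI.

GiB = 1 << 30
MiB = 1 << 20

def split_area(area, cut_start, cut_end):
    """Split (start, end) into up to three areas so that
    [cut_start, cut_end) becomes its own area."""
    start, end = area
    assert start <= cut_start < cut_end <= end
    pieces = []
    if start < cut_start:
        pieces.append((start, cut_start))
    pieces.append((cut_start, cut_end))
    if cut_end < end:
        pieces.append((cut_end, end))
    return pieces

# A 1G shared mapping; the guest converts the 2M chunk at 10M to private.
areas = split_area((0, GiB), 10 * MiB, 12 * MiB)
# Three areas remain; userspace can now unmap only the middle one
# and leave the rest of the original 1G mapped.
```

The point is that userspace never remaps anything: the original area object is divided, and only the 2M piece is subsequently unmapped.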
The iommufd code is there to do this, assuming the areas are mapped at
4k; what is missing is the iommu driver side to atomically resize
large PTEs.
> From "IIRC you can only trigger split using the VFIO type 1 legacy API. We
> would need to formalize split as an IOMMUFD native ioctl.
> Nobody should use this stuff through the legacy type 1 API!!!!"
> I assume you mean that we can only avoid the 4k map/unmap if we add proper
> support as a native IOMMUFD ioctl, and not try making it fly somehow with
> the legacy type 1 API?
The thread was talking about the built-in support in iommufd to split
mappings. That built-in support is only accessible through legacy APIs
and should never be used in new qemu code. To use that built-in
support in new code we need to build new APIs. The advantage of the
built-in support is qemu can map in large regions (which is more
efficient) and the kernel will break it down to 4k for the iommu
driver.
Mapping 4k at a time through the uAPI would be outrageously
inefficient.
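A back-of-envelope count makes the inefficiency concrete (map-call count only; each call also carries per-ioctl kernel overhead):

```python
# Number of map calls through the uAPI to populate 1 GiB.

GiB = 1 << 30

calls_large = 1                 # one map call for the whole 1G range
calls_4k = GiB // (4 * 1024)    # one map call per 4 KiB page

print(calls_4k)  # 262144 calls instead of 1
```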