
Re: Multiple vIOMMU instance support in QEMU?


From: Jason Gunthorpe
Subject: Re: Multiple vIOMMU instance support in QEMU?
Date: Thu, 18 May 2023 17:19:20 -0300

On Thu, May 18, 2023 at 03:45:24PM -0400, Peter Xu wrote:
> On Thu, May 18, 2023 at 11:56:46AM -0300, Jason Gunthorpe wrote:
> > On Thu, May 18, 2023 at 10:16:24AM -0400, Peter Xu wrote:
> > 
> > > What you mentioned above makes sense to me from the POV that 1 vIOMMU may
> > > not suffice, but that's at least a totally new area to me because I have
> > > never used >1 IOMMU even on bare metal (excluding the case where I'm aware
> > > that e.g. a GPU could have its own IOMMU-like DMA translator).
> > 
> > Even x86 systems are multi-iommu, one iommu per physical CPU socket.
> 
> I tried to look at a 2-node system on hand and I indeed got two dmars:
> 
> [    4.444788] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
> [    4.459673] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
> 
> Though the devices do not seem to be spread evenly between them.  E.g.,
> most of the devices on this host are attached to dmar1, while only two
> devices are attached to dmar0:

Yeah, I expect it has to do with physical topology. PCIe devices
physically connected to each socket should use the socket local iommu
and the socket local caches.

i.e. it would be foolish to take an IO in socket A, forward it to socket B
to perform the IOMMU translation, and then forward it back to socket A to
land in memory.
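
As a quick way to see that split on a running host, something like the
sketch below (assuming the per-device "iommu" symlink that recent kernels
expose in sysfs; treat it as illustrative, not authoritative) prints which
IOMMU unit each PCI device is attached to:

/* Sketch: resolve each PCI device's sysfs "iommu" symlink to see which
 * IOMMU instance (e.g. dmar0/dmar1) it is attached to. */
#include <dirent.h>
#include <limits.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
    DIR *d = opendir("/sys/bus/pci/devices");
    struct dirent *de;

    if (!d)
        return 1;

    while ((de = readdir(d)) != NULL) {
        char link[PATH_MAX], target[PATH_MAX];
        ssize_t n;

        if (de->d_name[0] == '.')
            continue;

        snprintf(link, sizeof(link), "/sys/bus/pci/devices/%s/iommu",
                 de->d_name);
        n = readlink(link, target, sizeof(target) - 1);
        if (n < 0)
            continue;       /* no IOMMU behind this device */
        target[n] = '\0';

        /* the last path component is the IOMMU instance name */
        printf("%s -> %s\n", de->d_name,
               strrchr(target, '/') ? strrchr(target, '/') + 1 : target);
    }
    closedir(d);
    return 0;
}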

> > I'm not sure how they model this though - Kevin do you know? Do we get
> > multiple iommu instances in Linux or is all the broadcasting of
> > invalidates and sharing of tables hidden?
> > 
> > > What's the system layout of your multi-vIOMMU world?  Is there still a
> > > central vIOMMU, or can multi-vIOMMUs run fully in parallel, so that e.g.
> > > we can have DEV1,DEV2 under vIOMMU1 and DEV3,DEV4 under vIOMMU2?
> > 
> > Just like physical, each viommu is parallel and independent. Each has
> > its own caches, ASIDs, DIDs/etc and thus invalidation domains.
> > 
> > The separate caches are the motivating reason to do this, as something
> > like vCMDQ is a direct command channel for invalidations targeting only
> > the caches of a single IOMMU block.
> 
> From a cache invalidation POV, shouldn't the best granularity be per-device
> (like dev-iotlb in VT-d?  No idea for ARM)?

There are many caches and different cache tag schemes in an iommu. All
of them are local to the IOMMU block.

Consider where we might have a single vDID but the devices using that
DID are spread across two physical IOMMUs. When the VM asks to
invalidate the vDID the system has to generate two physical pDID
invalidations.

This can't be done without a software mediation layer in the VMM.

The better solution is to make the pDID and vDID 1:1 so the VM itself
replicates the invalidations. The VM has better knowledge of when
replication is needed so it is overall more efficient.
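
To make the trade-off concrete, a rough sketch (entirely made-up structures
and names, not any real QEMU or kernel API) of what the two models look
like:

/* Hypothetical illustration only. */
struct piommu { int id; };

struct did_map {
    int vdid;                /* domain ID as the guest sees it      */
    int nr;                  /* number of physical IOMMUs involved  */
    struct piommu *unit[2];  /* which physical units back this vDID */
    int pdid[2];             /* the pDID used on each of them       */
};

/* issue a domain-selective invalidation on one physical IOMMU */
static void piommu_inv_did(struct piommu *u, int pdid) { /* ... */ }

/* Mediated model: one guest invalidation fans out to nr physical ones,
 * and the VMM has to know the mapping in order to do it. */
static void vmm_invalidate_vdid(struct did_map *m)
{
    for (int i = 0; i < m->nr; i++)
        piommu_inv_did(m->unit[i], m->pdid[i]);
}

/* 1:1 model: each vIOMMU maps to exactly one physical IOMMU and vDID ==
 * pDID, so the guest's own invalidation already names the right unit and
 * can be forwarded (or, with something like vCMDQ, issued by the guest
 * directly) with no fan-out logic in the VMM at all. */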

> I see that Intel is already copied here (at least Yi and Kevin), so I assume
> there is already some synchronization between the multi-vIOMMU work and the
> recent work on the Intel side, which is definitely nice and can avoid
> conflicting efforts.

I actually don't know that.  Intel sees multiple DMAR blocks in SW and has
kernel-level replication of invalidations.  Intel doesn't have a HW fast
path yet, so they can rely on mediation to fix it.  Thus I expect there is
no HW replication of invalidations here.  Kevin?

Remember, the VFIO API hides all of this: when you change the VFIO
container it automatically generates all the required invalidations in the
kernel.
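
For reference, an unmap on a type1 container is just this (minimal sketch,
error handling omitted, 'container' assumed to be an already-configured VFIO
container fd with the range previously mapped); whatever IOTLB invalidations
the hardware needs, on however many physical IOMMUs, happen inside the
ioctl:

#include <linux/vfio.h>
#include <string.h>
#include <sys/ioctl.h>

static int unmap_range(int container, __u64 iova, __u64 size)
{
    struct vfio_iommu_type1_dma_unmap unmap;

    memset(&unmap, 0, sizeof(unmap));
    unmap.argsz = sizeof(unmap);
    unmap.iova  = iova;    /* IO virtual address previously mapped */
    unmap.size  = size;

    /* the kernel tears down the mappings and issues all the required
     * invalidations itself; userspace never sees them */
    return ioctl(container, VFIO_IOMMU_UNMAP_DMA, &unmap);
}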

I also heard AMD has a HW fast path and is also multi-IOMMU, but I don't
really know the details.

Jason


