qemu-devel
From: Alex Williamson
Subject: Re: [PATCH 0/4] Allow to pass pre-created VFIO container/group to QEMU
Date: Wed, 26 Oct 2022 11:22:20 -0600

On Wed, 26 Oct 2022 15:07:32 +0300
Andrey Ryabinin <arbn@yandex-team.com> wrote:

> On 10/17/22 18:21, Alex Williamson wrote:
> > On Mon, 17 Oct 2022 13:54:03 +0300
> > Andrey Ryabinin <arbn@yandex-team.com> wrote:
> >   
> >> These patches add the possibility of passing a VFIO device to QEMU using
> >> file descriptors of the VFIO container/group, instead of having QEMU create
> >> those. This allows taking the permission to open /dev/vfio/* away from QEMU
> >> and delegating it to a management layer like libvirt.
> >>
> >> The VFIO API doesn't allow passing just the fd of the device, since we also
> >> need a VFIO container and group. So these patches allow passing a pre-created
> >> VFIO container/group to QEMU via command line/QMP, e.g. like this:
> >>             -object vfio-container,id=ct,fd=5 \
> >>             -object vfio-group,id=grp,fd=6,container=ct \
> >>             -device vfio-pci,host=05:00.0,group=grp  
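(One way a launcher might hand over such fds, as a minimal sketch only: the fd
values and the group number 14 are assumptions chosen to match the example
above, not taken from the patches. The launcher opens the nodes itself and
leaves the fds open across exec, so QEMU never needs permission on /dev/vfio/*.)

        exec 5<>/dev/vfio/vfio        # container fd, referenced as fd=5 below
        exec 6<>/dev/vfio/14          # group fd for 05:00.0 (group 14 assumed)
        exec qemu-system-x86_64 \
                -object vfio-container,id=ct,fd=5 \
                -object vfio-group,id=grp,fd=6,container=ct \
                -device vfio-pci,host=05:00.0,group=grp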
> > 
> > This suggests that management tools need to become intimately familiar
> > with container and group association restrictions for implicit
> > dependencies, such as device AddressSpace.  We had considered this
> > before and intentionally chosen to allow QEMU to manage that
> > relationship.  Things like PCI bus type and presence of a vIOMMU factor
> > into these relationships.
> >   
> 
> This is already the case. These patches don't change much.
> QEMU doesn't allow adding devices from one group to several address spaces.
> So the management tool needs to know whether devices are in the same group
> and whether QEMU will create separate address spaces for these devices.
> 
> E.g.
> qemu-system-x86_64 -nodefaults -M q35,accel=kvm,kernel-irqchip=split \
>         -device intel-iommu,intremap=on,caching-mode=on \
>         -device vfio-pci,host=00:1f.3 \
>         -device vfio-pci,host=00:1f.4 
> qemu-system-x86_64: -device vfio-pci,host=00:1f.4: vfio 0000:00:1f.4: group 14 used in multiple address spaces
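(For contrast, and assuming the same hardware: without the vIOMMU both devices
live in the shared address_space_memory, so the same pair is accepted with a
single group and container:

        qemu-system-x86_64 -nodefaults -M q35,accel=kvm \
                -device vfio-pci,host=00:1f.3 \
                -device vfio-pci,host=00:1f.4
)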

Obviously QEMU fails this configuration.  It must.  But how does that
suggest that a management tool, like libvirt, is already aware of this
requirement?  In fact, libvirt will happily validate XML creating such
a configuration.  The point was that tools like libvirt would need to
provide these group and container file descriptors, and they currently
impose no restrictions on, or working knowledge of, the relationship
between devices, groups, containers, and address spaces.

> > In the above example, what happens in a mixed environment, for example
> > if we then add '-device vfio-pci,host=06:00.0' to the command line?
> > Isn't QEMU still going to try to re-use the container if it exists in
> > the same address space? Potentially this device could also be a member
> > of the same group.  How would the management tool know when to expect
> > the provided fds be released?
> >   
> 
> Valid point, the container will indeed be reused and the second device will
> occupy it. But we could make a new container instead. Using several
> containers in one address space won't be a problem, right?
> Of course, several devices from the same group won't be allowed to be added
> in a mixed way.

Potentially, yes, that is a problem.  Each container represents a
separate IOMMU context, separate DMA map and unmap operations, and
separate locked page accounting.  So if libvirt chooses the more
trivial solution to impose a new container for every group, that
translates to space, time, and process accounting overhead.
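(A rough, hypothetical illustration of that accounting cost: a 16 GiB guest
whose assigned devices fall into 3 groups, each forced into its own container,
can have its RAM pinned and charged once per container, i.e. up to
3 * 16 GiB = 48 GiB counted against RLIMIT_MEMLOCK, so a one-container-per-group
launcher would need memlock headroom on that order before exec'ing QEMU:

        ulimit -l $((3 * 16 * 1024 * 1024))    # RLIMIT_MEMLOCK, in KiB
)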

> > We also have an outstanding RFC for iommufd that already proposes an fd
> > passing interface, where iommufd removes many of the issues of the vfio
> > container by supporting multiple address spaces within a single fd
> > context, avoiding the duplicate locked page accounting issues between
> > containers, and proposing a direct device fd interface for vfio.  Why at
> > this point in time would we choose to expand the QEMU vfio interface in
> > this way?  Thanks,
> >   
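(For comparison, the interface shape that proposal moves toward is a single fd
for the IOMMU context plus a per-device fd, roughly as below; the option names
follow the posted series and later QEMU support and are shown only as an
illustration, where fd 22 would be an open /dev/iommu and fd 23 the device's
vfio character device:

        -object iommufd,id=iommufd0,fd=22 \
        -device vfio-pci,iommufd=iommufd0,fd=23
)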
> 
> It sounds nice, but iommufd is a new API which doesn't exist in any kernel yet.
> These patches are something that can be used on existing, already deployed
> kernels.

OTOH, we expect iommufd in the near term; non-RFC patches are posted.
The vfio kernel modules have undergone significant churn in recent
kernels to align with the development goals of iommufd.  QEMU support
for accepting file descriptors for "legacy" implementations of vfio is
only the beginning; the next step would require the management tools
to be sufficiently enlightened to implement file descriptor passing.
All of that suggests development and maintenance effort for something
we're actively trying to replace.  Thanks,

Alex