qemu-devel
Re: DMA region abruptly removed from PCI device


From: Alex Williamson
Subject: Re: DMA region abruptly removed from PCI device
Date: Mon, 6 Jul 2020 08:20:04 -0600

On Mon, 6 Jul 2020 10:55:00 +0000
Thanos Makatos <thanos.makatos@nutanix.com> wrote:

> We have an issue when using the VFIO-over-socket libmuser PoC
> (https://www.mail-archive.com/qemu-devel@nongnu.org/msg692251.html) instead of
> the VFIO kernel module: we notice that DMA regions used by the emulated device
> can be abruptly removed while the device is still using them.
> 
> The PCI device we've implemented is an NVMe controller using SPDK, so it polls
> the submission queues for new requests. We use the latest SeaBIOS where it
> tries to boot from the NVMe controller. Several DMA regions are registered
> (VFIO_IOMMU_MAP_DMA) and then the admin and a submission queues are created.
> From this point SPDK polls both queues. Then, the DMA region where the
> submission queue lies is removed (VFIO_IOMMU_UNMAP_DMA) and then re-added at
> the same IOVA but at a different offset. SPDK crashes soon after as it
> accesses invalid memory. There is no other event (e.g. some PCI config space
> or NVMe register write) happening between unmapping and mapping the DMA
> region. My guess is that this behavior is legitimate and that this is solved
> in the VFIO kernel module by releasing the DMA region only after all
> references to it have been released, which is handled by
> vfio_pin/unpin_pages, correct? If this is the case then I suppose we need to
> implement the same logic in libmuser, but I just want to make sure I'm not
> missing anything as this is a substantial change.

The vfio_{pin,unpin}_pages() interface only comes into play for mdev
devices and even then it's an announcement that a given mapping is
going away and the vendor driver is required to release references.
For normal PCI device assignment, vfio-pci is (aside from a few quirks)
device agnostic and the IOMMU container mappings are independent of the
device.  We do not have any device specific knowledge to know if DMA
pages still have device references.  The user's unmap request is
absolute: it cannot fail (aside from invalid usage), and upon return
there must be no residual mappings of or references to the pages.
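
To illustrate the unmap semantics described above, here is a minimal
sketch of how a userspace device model might track DMA regions so that an
unmap is absolute.  This is not libmuser, SPDK, or kernel code: every name
in it (dma_table, dma_map, dma_unmap, dma_translate) is hypothetical, the
table is fixed-size, and it assumes a single thread.  Treating a
zero-length range as the "invalid usage" case is also an assumption of the
sketch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_REGIONS 8

/* Hypothetical bookkeeping for one mapped DMA region (not a libmuser type). */
struct dma_region {
    uint64_t iova;   /* start of the range in the device's address space */
    uint64_t size;   /* length in bytes */
    uint8_t *vaddr;  /* where the backing memory is mapped in this process */
    bool     in_use;
};

struct dma_table {
    struct dma_region regions[MAX_REGIONS];
};

/* Record a mapping. Returns 0 on success, -1 on bad input or a full table. */
int dma_map(struct dma_table *t, uint64_t iova, uint64_t size, uint8_t *vaddr)
{
    assert(t != NULL);
    if (size == 0 || vaddr == NULL) {
        return -1;
    }
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        if (!t->regions[i].in_use) {
            t->regions[i] = (struct dma_region){ iova, size, vaddr, true };
            return 0;
        }
    }
    return -1;
}

/* Translate [iova, iova + len) to a host pointer, or NULL if the range is
 * not fully covered by one region. Written to avoid 64-bit overflow. */
void *dma_translate(const struct dma_table *t, uint64_t iova, uint64_t len)
{
    assert(t != NULL);
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        const struct dma_region *r = &t->regions[i];
        if (r->in_use && iova >= r->iova && len <= r->size &&
            iova - r->iova <= r->size - len) {
            return r->vaddr + (iova - r->iova);
        }
    }
    return NULL;
}

/* Drop every region fully contained in [iova, iova + size). Returns the
 * number of regions removed, or -1 for a zero-length (invalid) request.
 * Once this returns, dma_translate() can no longer reach those pages. */
int dma_unmap(struct dma_table *t, uint64_t iova, uint64_t size)
{
    assert(t != NULL);
    if (size == 0) {
        return -1;
    }
    int removed = 0;
    for (size_t i = 0; i < MAX_REGIONS; i++) {
        struct dma_region *r = &t->regions[i];
        if (r->in_use && r->iova >= iova && r->size <= size &&
            r->iova - iova <= size - r->size) {
            /* A real model would quiesce pollers and munmap r->vaddr here. */
            *r = (struct dma_region){ 0 };
            removed++;
        }
    }
    return removed;
}
```

The point of the design is that dma_unmap() never waits on or consults
device state: the device side has to cope with dma_translate() returning
NULL afterwards, rather than the unmap being deferred until the device is
done.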

If you say there's no config space write, e.g. clearing bus master in
the command register, then something like turning on a vIOMMU might
cause a change in the entire address space accessible by the device.
This would cause the identity map of IOVA to GPA to be replaced by a
new one, perhaps another identity map if iommu=pt or a more restricted
mapping if the vIOMMU is used for isolation.
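
One way a device model could survive the address space being replaced
underneath it is to never hold a host pointer across a remap.  The sketch
below keeps only the guest-programmed queue IOVA as the stable value and
re-translates whenever a generation counter shows the mappings changed.
It is an illustration under assumptions, not what libmuser or SPDK
actually do: addr_space, sq_poller, and the functions around them are
hypothetical names, there is a single mapped window, and no locking is
shown, so a threaded poller would need real synchronization.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical single-window address space; a real model holds many regions. */
struct addr_space {
    uint64_t iova;
    uint64_t size;
    uint8_t *vaddr;       /* NULL while nothing is mapped */
    uint64_t generation;  /* bumped on every map or unmap */
};

/* Hypothetical poller state for one submission queue. */
struct sq_poller {
    uint64_t sq_iova;     /* queue base as programmed by the guest */
    uint64_t sq_len;      /* queue length in bytes */
    uint8_t *sq_host;     /* cached translation, valid only for seen_gen */
    uint64_t seen_gen;    /* generation the cache was computed against */
};

void as_unmap(struct addr_space *as)
{
    assert(as != NULL);
    as->vaddr = NULL;
    as->generation++;
}

void as_map(struct addr_space *as, uint64_t iova, uint64_t size, uint8_t *vaddr)
{
    assert(as != NULL);
    as->iova = iova;
    as->size = size;
    as->vaddr = vaddr;
    as->generation++;
}

void sq_poller_init(struct sq_poller *p, uint64_t sq_iova, uint64_t sq_len)
{
    assert(p != NULL);
    p->sq_iova = sq_iova;
    p->sq_len = sq_len;
    p->sq_host = NULL;
    p->seen_gen = UINT64_MAX; /* forces a translation on first use */
}

/* Return a pointer to the queue that is valid for the current mappings,
 * or NULL if the queue's IOVA range is not mapped right now. */
uint8_t *sq_poller_get(struct sq_poller *p, const struct addr_space *as)
{
    assert(p != NULL && as != NULL);
    if (p->seen_gen != as->generation) {
        p->seen_gen = as->generation;
        p->sq_host = NULL;
        if (as->vaddr != NULL && p->sq_iova >= as->iova &&
            p->sq_len <= as->size &&
            p->sq_iova - as->iova <= as->size - p->sq_len) {
            p->sq_host = as->vaddr + (p->sq_iova - as->iova);
        }
    }
    return p->sq_host;
}
```

With this shape, an unmap followed by a map at the same IOVA but with
different backing memory makes the poller skip work while the window is
gone and then pick up the new host address on its next iteration.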

It sounds like you have an incomplete device model.  Physical devices
have their address space adjusted by an IOMMU independent of, but
hopefully in collaboration with, a device driver.  If a physical device
manages to bridge this transition, do what it does.  Thanks,

Alex