[Qemu-commits] [qemu/qemu] f1334d: memory/iommu: Add get_attr()


From: GitHub
Subject: [Qemu-commits] [qemu/qemu] f1334d: memory/iommu: Add get_attr()
Date: Wed, 07 Feb 2018 08:23:39 -0800

  Branch: refs/heads/master
  Home:   https://github.com/qemu/qemu
  Commit: f1334de60b2a43102d2d47918463e6a2cdcfcdeb
      
https://github.com/qemu/qemu/commit/f1334de60b2a43102d2d47918463e6a2cdcfcdeb
  Author: Alexey Kardashevskiy <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M include/exec/memory.h
    M memory.c

  Log Message:
  -----------
  memory/iommu: Add get_attr()

This adds get_attr() to IOMMUMemoryRegionClass, like
iommu_ops::domain_get_attr in the Linux kernel.

This defines the first attribute - IOMMU_ATTR_SPAPR_TCE_FD - which
will be used between the pSeries machine and VFIO-PCI.

Signed-off-by: Alexey Kardashevskiy <address@hidden>
Acked-by: Paolo Bonzini <address@hidden>
Acked-by: David Gibson <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: 07bc681a331311e9c51d1cd8933739a80cd57af8
      
https://github.com/qemu/qemu/commit/07bc681a331311e9c51d1cd8933739a80cd57af8
  Author: Alexey Kardashevskiy <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/common.c
    M hw/vfio/trace-events

  Log Message:
  -----------
  vfio/spapr: Use iommu memory region's get_attr()

In order to enable TCE operations support in KVM, we have to inform
KVM about VFIO groups being attached to specific LIOBNs. KVM
already knows about VFIO groups; the only bit missing is which
in-kernel TCE table (the one with user-visible TCEs) should update
the attached groups. There is a KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE
attribute of the VFIO KVM device which receives a groupfd/tablefd pair.

This uses a new memory_region_iommu_get_attr() helper to get the IOMMU fd
and calls KVM to establish the link.

As get_attr() is not implemented yet, this should cause no behavioural
change.

Signed-off-by: Alexey Kardashevskiy <address@hidden>
Acked-by: Paolo Bonzini <address@hidden>
Acked-by: David Gibson <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: 9ded780c4cc92d15a977dba589d64862e25a340e
      
https://github.com/qemu/qemu/commit/9ded780c4cc92d15a977dba589d64862e25a340e
  Author: Alexey Kardashevskiy <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/ppc/spapr_iommu.c
    M target/ppc/kvm.c
    M target/ppc/kvm_ppc.h

  Log Message:
  -----------
  spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device

In order to enable TCE operations support in KVM, we have to inform
KVM about VFIO groups being attached to specific LIOBNs;
the necessary bits are implemented already by IOMMU MR and VFIO.

This defines get_attr() for the SPAPR TCE IOMMU MR, which makes VFIO
call the KVM_DEV_VFIO_GROUP_SET_SPAPR_TCE ioctl and establish the
LIOBN-to-IOMMU link.

This changes spapr_tce_set_need_vfio() to avoid TCE table reallocation
if the kernel supports the TCE acceleration.

Signed-off-by: Alexey Kardashevskiy <address@hidden>
Acked-by: Paolo Bonzini <address@hidden>
Acked-by: David Gibson <address@hidden>
[aw - remove unnecessary sys/ioctl.h include]
Signed-off-by: Alex Williamson <address@hidden>


  Commit: edd09278932ac24adbf23ca7f7329bebaa7d9741
      
https://github.com/qemu/qemu/commit/edd09278932ac24adbf23ca7f7329bebaa7d9741
  Author: Alex Williamson <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/pci.h

  Log Message:
  -----------
  vfio/pci: Fixup VFIOMSIXInfo comment

The fields were removed in the referenced commit, but the comment
still mentions them.

Fixes: 2fb9636ebf24 ("vfio-pci: Remove unused fields from VFIOMSIXInfo")
Tested-by: Alexey Kardashevskiy <address@hidden>
Reviewed-by: Eric Auger <address@hidden>
Tested-by: Eric Auger <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: 3a286732d1563bdb440718d4e68137e06af785dd
      
https://github.com/qemu/qemu/commit/3a286732d1563bdb440718d4e68137e06af785dd
  Author: Alex Williamson <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/pci.c
    M hw/vfio/pci.h

  Log Message:
  -----------
  vfio/pci: Add base BAR MemoryRegion

Add one more layer to our stack of MemoryRegions.  This base region
allows us to register BARs independently of the vfio region or to
extend the size of BARs which do map to a region.  This will be
useful when we want hypervisor defined BARs or sections of BARs,
for purposes such as relocating MSI-X emulation.  We therefore call
msix_init() based on this new base MemoryRegion, while the quirks,
which only modify regions, still operate on those sub-MemoryRegions.

Signed-off-by: Alex Williamson <address@hidden>


  Commit: 04f336b05ff54f53234b391e444226d8c2481fb7
      
https://github.com/qemu/qemu/commit/04f336b05ff54f53234b391e444226d8c2481fb7
  Author: Alex Williamson <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/pci.c

  Log Message:
  -----------
  vfio/pci: Emulate BARs

The kernel provides similar emulation of PCI BAR register access to
QEMU, so up until now we've used that for things like BAR sizing and
storing the BAR address.  However, if we intend to resize BARs or add
BARs that don't exist on the physical device, we need to switch to the
pure QEMU emulation of the BAR.

Tested-by: Alexey Kardashevskiy <address@hidden>
Reviewed-by: Eric Auger <address@hidden>
Tested-by: Eric Auger <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: c3bbbdbf4b0fcb116ed9b6bae35971e354ab7e42
      
https://github.com/qemu/qemu/commit/c3bbbdbf4b0fcb116ed9b6bae35971e354ab7e42
  Author: Alex Williamson <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/core/qdev-properties.c
    M include/hw/qdev-properties.h
    M qapi/common.json

  Log Message:
  -----------
  qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR

Add an option which allows the user to specify a PCI BAR number,
including an 'off' and 'auto' selection.

Cc: Markus Armbruster <address@hidden>
Cc: Eric Blake <address@hidden>
Tested-by: Alexey Kardashevskiy <address@hidden>
Reviewed-by: Eric Auger <address@hidden>
Tested-by: Eric Auger <address@hidden>
Reviewed-by: Markus Armbruster <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: 89d5202edc5053e167c97f8e2341b2b9aa03a5c2
      
https://github.com/qemu/qemu/commit/89d5202edc5053e167c97f8e2341b2b9aa03a5c2
  Author: Alex Williamson <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/pci.c
    M hw/vfio/pci.h
    M hw/vfio/trace-events

  Log Message:
  -----------
  vfio/pci: Allow relocating MSI-X MMIO

Recently proposed vfio-pci kernel changes (v4.16) remove the
restriction preventing userspace from mmap'ing PCI BARs in areas
overlapping the MSI-X vector table.  This change is primarily intended
to benefit host platforms which make use of system page sizes larger
than the PCI spec recommendation for alignment of MSI-X data
structures (i.e. not x86_64).  In the case of POWER systems, the SPAPR
spec requires the VM to program MSI-X using hypercalls, rendering the
MSI-X vector table unused in the VM view of the device.  However,
ARM64 platforms also support 64KB pages and rely on QEMU emulation of
MSI-X.  Regardless of the kernel driver allowing mmaps overlapping
the MSI-X vector table, emulation of the MSI-X vector table also
prevents direct mapping of device MMIO spaces overlapping this page.
Thanks to the fact that PCI devices have a standard self discovery
mechanism, we can try to resolve this by relocating the MSI-X data
structures, either by creating a new PCI BAR or extending an existing
BAR and updating the MSI-X capability for the new location.  There's
even a very slim chance that this could benefit devices which do not
adhere to the PCI spec alignment guidelines on x86_64 systems.

This new x-msix-relocation option accepts the following choices:

  off: Disable MSI-X relocation, use native device config (default)
  auto: Use a known good combination for the platform/device (none yet)
  bar0..bar5: Specify the target BAR for MSI-X data structures

If compatible, the target BAR will either be created or extended and
the new portion will be used for MSI-X emulation.
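
The accepted values can be illustrated with a small validator.  This is
only a sketch of the value grammar listed above, not QEMU's property
parser; the function name parse_msix_relocation is hypothetical.

```python
def parse_msix_relocation(value):
    """Validate an x-msix-relocation value (illustrative only).

    Returns "off" or "auto" unchanged, or the target BAR index (0..5)
    for "bar0".."bar5".  Anything else raises ValueError.
    """
    if value in ("off", "auto"):
        return value
    if len(value) == 4 and value.startswith("bar") and value[3] in "012345":
        return int(value[3])
    raise ValueError(f"invalid x-msix-relocation value: {value!r}")
```

On a QEMU command line the option would be given as a vfio-pci device
property, for example x-msix-relocation=bar5 appended to the -device
vfio-pci,host=... argument.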

The first obvious user question with this option is how to determine
whether a given platform and device might benefit from this option.
In most cases, the answer is that it won't, especially on x86_64.
Devices often dedicate an entire BAR to MSI-X and therefore no
performance sensitive registers overlap the MSI-X area.  Take for
example:

# lspci -vvvs 0a:00.0
0a:00.0 Ethernet controller: Intel Corporation I350 Gigabit Network Connection
        ...
        Region 0: Memory at db680000 (32-bit, non-prefetchable) [size=512K]
        Region 3: Memory at db7f8000 (32-bit, non-prefetchable) [size=16K]
        ...
        Capabilities: [70] MSI-X: Enable+ Count=10 Masked-
                Vector table: BAR=3 offset=00000000
                PBA: BAR=3 offset=00002000

This device uses the 16K bar3 for MSI-X with the vector table at
offset zero and the pending bits array at offset 8K, fully honoring
the PCI spec alignment guidance.  The data sheet specifically refers
to this as an MSI-X BAR.  This device would not see a benefit from
MSI-X relocation regardless of the platform or the page size.

However, here's another example:

# lspci -vvvs 02:00.0
02:00.0 Serial Attached SCSI controller: xxxxxxxx
        ...
        Region 0: I/O ports at c000 [size=256]
        Region 1: Memory at ef640000 (64-bit, non-prefetchable) [size=64K]
        Region 3: Memory at ef600000 (64-bit, non-prefetchable) [size=256K]
        ...
        Capabilities: [c0] MSI-X: Enable+ Count=16 Masked-
                Vector table: BAR=1 offset=0000e000
                PBA: BAR=1 offset=0000f000

Here the MSI-X data structures are placed on separate 4K pages at the
end of a 64KB BAR.  If our host page size is 4K, we're likely fine,
but at 64KB page size, MSI-X emulation at that location prevents the
entire BAR from being directly mapped into the VM address space.
Overlapping performance sensitive registers then starts to be a very
likely scenario on such a platform.  At this point, the user could
enable tracing on vfio_region_read and vfio_region_write to determine
more conclusively if device accesses are being trapped through QEMU.
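
The page-size effect described above can be checked with a little
arithmetic.  The following is a rough sketch, not QEMU code: it assumes
that every host page touched by the vector table or the PBA is trapped
and that the BAR starts on a host page boundary.  The function name and
parameters are hypothetical; the 16-byte vector table entry and the
one-pending-bit-per-vector PBA, rounded up to 64-bit words, follow the
MSI-X layout.

```python
def directly_mappable_bytes(bar_size, page_size, table_off, pba_off, vectors):
    """Bytes of a BAR that remain directly mappable around MSI-X.

    Assumes any host page overlapping the vector table or PBA must be
    emulated, and that the BAR begins on a host page boundary.
    """
    table_len = vectors * 16                 # 16 bytes per MSI-X entry
    pba_len = ((vectors + 63) // 64) * 8     # 1 bit per vector, in qwords

    trapped = set()
    for off, length in ((table_off, table_len), (pba_off, pba_len)):
        first = off // page_size
        last = (off + length - 1) // page_size
        trapped.update(range(first, last + 1))

    total_pages = -(-bar_size // page_size)  # ceiling division
    mappable = 0
    for idx in range(total_pages):
        if idx not in trapped:
            mappable += min(page_size, bar_size - idx * page_size)
    return mappable


# Second example device: 64K bar1, table at 0xe000, PBA at 0xf000, 16 vectors
with_4k_pages = directly_mappable_bytes(0x10000, 4096, 0xe000, 0xf000, 16)
with_64k_pages = directly_mappable_bytes(0x10000, 65536, 0xe000, 0xf000, 16)
```

With 4K pages only the last two pages of the BAR are trapped; with 64KB
pages the single page covering the BAR is trapped and nothing remains
directly mappable.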

Upon finding a device and platform in need of MSI-X relocation, the
next problem is how to choose the target PCI BAR to host the MSI-X
data structures.  A few key rules to keep in mind for this selection
include:

 * There are only 6 BAR slots, bar0..bar5
 * 64-bit BARs occupy two BAR slots, 'lspci -vvv' lists the first slot
 * PCI BARs are always a power of 2 in size, extending == doubling
 * The maximum size of a 32-bit BAR is 2GB
 * MSI-X data structures must reside in an MMIO BAR

Using these rules, we can evaluate each BAR of the second example
device above as follows:

 bar0: I/O port BAR, incompatible with MSI-X tables
 bar1: BAR could be extended, incurring another 64KB of MMIO
 bar2: Unavailable, bar1 is 64-bit, this register is used by bar1
 bar3: BAR could be extended, incurring another 256KB of MMIO
 bar4: Unavailable, bar3 is 64-bit, this register is used by bar3
 bar5: Available, empty BAR, minimum additional MMIO

A secondary optimization we might wish to make in relocating MSI-X
is to minimize the additional MMIO required for the device, therefore
we might test the available choices in order of preference as bar5,
bar1, and finally bar3.  The original proposal for this feature
included an 'auto' option which would choose bar5 in this case, but
various drivers have been found that make assumptions about the
properties of the "first" BAR or the size of BARs such that there
appears to be no foolproof automatic selection available, requiring
known good combinations to be sourced from users.  This patch is
pre-enabled for an 'auto' selection making use of a validated lookup
table, but no entries are yet identified.
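
The evaluation and ordering above can be sketched as follows.  This is an
illustration of the listed rules only, not the selection logic in
hw/vfio/pci.c; the Bar type, the function names, and the 4K size assumed
for a newly created BAR are all hypothetical.

```python
from dataclasses import dataclass

MAX_32BIT_BAR = 2 << 30  # 2GB limit for a 32-bit BAR


@dataclass
class Bar:
    kind: str  # "io", "mem32" or "mem64"
    size: int  # bytes, always a power of 2


def relocation_costs(bars, new_bar_size):
    """Map each usable BAR slot to the additional MMIO it would cost.

    bars maps slot number -> Bar for the BARs lspci lists; slots that
    are absent are empty unless they hold the upper half of a 64-bit BAR.
    """
    costs = {}
    for slot in range(6):
        prev = bars.get(slot - 1)
        if prev is not None and prev.kind == "mem64":
            continue                      # register used by the 64-bit BAR below
        bar = bars.get(slot)
        if bar is None:
            costs[slot] = new_bar_size    # empty slot: create a new BAR
        elif bar.kind == "io":
            continue                      # MSI-X must live in an MMIO BAR
        elif bar.kind == "mem64" or bar.size * 2 <= MAX_32BIT_BAR:
            costs[slot] = bar.size        # extending == doubling
    return costs


def preference_order(costs):
    """Slots ordered by least additional MMIO first."""
    return sorted(costs, key=lambda slot: costs[slot])


# Second example device; 4096 for a new BAR is an assumed placeholder
example = {0: Bar("io", 256), 1: Bar("mem64", 64 << 10), 3: Bar("mem64", 256 << 10)}
costs = relocation_costs(example, new_bar_size=4096)
order = preference_order(costs)
```

For the example device this yields bar5, bar1, bar3, matching the order
of preference given above.  It does not model the driver assumptions
that make a fully automatic choice unreliable.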

Tested-by: Alexey Kardashevskiy <address@hidden>
Reviewed-by: Eric Auger <address@hidden>
Tested-by: Eric Auger <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: 89202c6fa87d4f181111901bb08dcd1538f8ab35
      
https://github.com/qemu/qemu/commit/89202c6fa87d4f181111901bb08dcd1538f8ab35
  Author: Eric Auger <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/platform.c

  Log Message:
  -----------
  hw/vfio/platform: Init the interrupt mutex

Add the initialization of the mutex protecting the interrupt list.

Signed-off-by: Eric Auger <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: a5b04f7c5380340342ad5623b34c57fe3bab9b29
      
https://github.com/qemu/qemu/commit/a5b04f7c5380340342ad5623b34c57fe3bab9b29
  Author: Alexey Kardashevskiy <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/common.c

  Log Message:
  -----------
  vfio/common: Remove redundant copy of local variable

There is already @hostwin in vfio_listener_region_add() so there is no
point in having the other one.

Fixes: 2e4109de8e58 ("vfio/spapr: Create DMA window dynamically (SPAPR IOMMU v2)")
Signed-off-by: Alexey Kardashevskiy <address@hidden>
Signed-off-by: Alex Williamson <address@hidden>


  Commit: db32d0f43839627f54a1a7f8eee17baa770f52d2
      
https://github.com/qemu/qemu/commit/db32d0f43839627f54a1a7f8eee17baa770f52d2
  Author: Alex Williamson <address@hidden>
  Date:   2018-02-06 (Tue, 06 Feb 2018)

  Changed paths:
    M hw/vfio/pci-quirks.c
    M hw/vfio/pci.c
    M hw/vfio/pci.h

  Log Message:
  -----------
  vfio/pci: Add option to disable GeForce quirks

These quirks are necessary for GeForce, but not for Quadro/GRID/Tesla
assignment.  Leaving them enabled is fully functional and provides the
most compatibility, but due to the unique NVIDIA MSI ACK behavior[1],
it also introduces latency in re-triggering the MSI interrupt.  This
overhead is typically negligible, but has been shown to adversely
affect some (very) high interrupt rate applications.  This adds the
vfio-pci device option "x-no-geforce-quirks=" which can be set to
"on" to disable this additional overhead.

A follow-on optimization for GeForce might be to make use of an
ioeventfd to allow KVM to trigger an irqfd in the kernel vfio-pci
driver, avoiding the bounce through userspace to handle this device
write.

[1] Background: the NVIDIA driver has been observed to issue a write
to the MMIO mirror of PCI config space in BAR0 in order to allow the
MSI interrupt for the device to retrigger.  Older reports indicated a
write of 0xff to the (read-only) MSI capability ID register, while
more recently a write of 0x0 is observed at config space offset 0x704,
non-architected, extended config space of the device (BAR0 offset
0x88704).  Virtualization of this range is only required for GeForce.
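
The offsets quoted in [1] imply that the config space mirror starts at
BAR0 offset 0x88000 (0x88704 - 0x704).  A minimal sketch of the address
translation, assuming the mirror spans the 4KB of PCIe extended config
space; the names below are hypothetical and this is not the quirk code
in hw/vfio/pci-quirks.c.

```python
NVIDIA_BAR0_CONFIG_MIRROR = 0x88000  # inferred from 0x88704 - 0x704
CONFIG_SPACE_SIZE = 0x1000           # assumed: 4KB PCIe extended config space


def bar0_to_config_offset(bar0_offset):
    """Return the config space offset mirrored at a BAR0 offset, or None."""
    rel = bar0_offset - NVIDIA_BAR0_CONFIG_MIRROR
    if 0 <= rel < CONFIG_SPACE_SIZE:
        return rel
    return None
```

A write to BAR0 offset 0x88704 therefore lands on config space offset
0x704, which is the access the quirk has to intercept for GeForce.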

Signed-off-by: Alex Williamson <address@hidden>


  Commit: ea62da0913d20338b8a47bbfaef2e8f2763ee13f
      
https://github.com/qemu/qemu/commit/ea62da0913d20338b8a47bbfaef2e8f2763ee13f
  Author: Peter Maydell <address@hidden>
  Date:   2018-02-07 (Wed, 07 Feb 2018)

  Changed paths:
    M hw/core/qdev-properties.c
    M hw/ppc/spapr_iommu.c
    M hw/vfio/common.c
    M hw/vfio/pci-quirks.c
    M hw/vfio/pci.c
    M hw/vfio/pci.h
    M hw/vfio/platform.c
    M hw/vfio/trace-events
    M include/exec/memory.h
    M include/hw/qdev-properties.h
    M memory.c
    M qapi/common.json
    M target/ppc/kvm.c
    M target/ppc/kvm_ppc.h

  Log Message:
  -----------
  Merge remote-tracking branch 'remotes/awilliam/tags/vfio-update-20180206.0' into staging

VFIO updates 2018-02-06

 - SPAPR in-kernel TCE acceleration (Alexey Kardashevskiy)

 - MSI-X relocation (Alex Williamson)

 - Add missing platform mutex init (Eric Auger)

 - Redundant variable cleanup (Alexey Kardashevskiy)

 - Option to disable GeForce quirks (Alex Williamson)

# gpg: Signature made Tue 06 Feb 2018 18:21:22 GMT
# gpg:                using RSA key 239B9B6E3BB08B22
# gpg: Good signature from "Alex Williamson <address@hidden>"
# gpg:                 aka "Alex Williamson <address@hidden>"
# gpg:                 aka "Alex Williamson <address@hidden>"
# gpg:                 aka "Alex Williamson <address@hidden>"
# Primary key fingerprint: 42F6 C04E 540B D1A9 9E7B  8A90 239B 9B6E 3BB0 8B22

* remotes/awilliam/tags/vfio-update-20180206.0:
  vfio/pci: Add option to disable GeForce quirks
  vfio/common: Remove redundant copy of local variable
  hw/vfio/platform: Init the interrupt mutex
  vfio/pci: Allow relocating MSI-X MMIO
  qapi: Create DEFINE_PROP_OFF_AUTO_PCIBAR
  vfio/pci: Emulate BARs
  vfio/pci: Add base BAR MemoryRegion
  vfio/pci: Fixup VFIOMSIXInfo comment
  spapr/iommu: Enable in-kernel TCE acceleration via VFIO KVM device
  vfio/spapr: Use iommu memory region's get_attr()
  memory/iommu: Add get_attr()

Signed-off-by: Peter Maydell <address@hidden>


Compare: https://github.com/qemu/qemu/compare/0833df03f420...ea62da0913d2
