Hey Alex,
Thanks for the detailed explanation!
First, to answer your question:
Is your "program" just doing a memcpy() with an mmap() of the PCI BAR
acquired through pci-sysfs or a userspace vfio-pci driver within the
guest?
My program uses a userspace vfio-pci driver within the guest OS, for both the emulated device and the assigned device.
The program is written in Rust, and I am using std::ptr::write_volatile to do the memory copy.
I tried "x-no-mmap=on" for the assigned device and, as you suspected, it behaved the same as the emulated device.
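For reference, here is a minimal Rust sketch of a copy loop that keeps every device access a fixed 32 bits wide. It assumes the device accepts aligned 32-bit writes (an assumption, not something established in this thread). The function name mmio_write_u32s is hypothetical, and an ordinary Vec stands in for the mmap()ed BAR so the sketch runs anywhere. Because each write_volatile targets a single u32, LLVM (which rustc uses) aims not to split or merge volatile stores of a natively supported width, so in practice this should compile to one 4-byte store per word rather than a wide vmovdqu.

```rust
use std::ptr;

/// Copies `src` into a device region one 32-bit word at a time.
///
/// Each `write_volatile` on a `*mut u32` is its own 4-byte volatile store,
/// so the compiler should not combine several words into one wide vector
/// store, as it may for memcpy() or a volatile write of a large array.
///
/// # Safety
/// `dst` must be valid for `src.len()` properly aligned `u32` writes.
unsafe fn mmio_write_u32s(dst: *mut u32, src: &[u32]) {
    for (i, &word) in src.iter().enumerate() {
        // SAFETY: the caller guarantees dst..dst + src.len() is valid and aligned.
        unsafe { ptr::write_volatile(dst.add(i), word) };
    }
}

fn main() {
    // Stand-in for an mmap()ed PCI BAR: plain memory, so the sketch runs anywhere.
    let mut fake_bar = vec![0u32; 8];
    let data = [
        0x1111_1111u32, 0x2222_2222, 0x3333_3333, 0x4444_4444,
        0x5555_5555, 0x6666_6666, 0x7777_7777, 0x8888_8888,
    ];
    unsafe { mmio_write_u32s(fake_bar.as_mut_ptr(), &data) };
    assert_eq!(fake_bar, data);
    println!("copied {} words", data.len());
}
```

Against a real BAR, dst would come from the driver's mapping of the region instead of a Vec.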
I also suspect this:
Possibly KVM doesn't emulate vmovdqu relative to an MMIO access, but honestly I'm not positive that AVX instructions are meant to work on MMIO space.
Do you have any suggestions for verifying this, or a code pointer where I could check?
Thanks,
Xu
On Mar 4, 2024, at 4:59 PM, Alex Williamson <alex.williamson@redhat.com> wrote:
On Sun, 3 Mar 2024 22:20:33 +0000, Xu Liu <liuxu@meta.com> wrote:
Hello,
Recently I have been running my programs in QEMU (x86_64) with "-accel=kvm".
The QEMU version is 6.0.0.
I run my programs in two ways:
1. I pass my device through to QEMU with vfio-pci; this works well.
2. I wrote an emulated PCI device for QEMU and run my programs against it. This crashes when the code tries to copy more than 16 bytes of data to the PCI device, while the passthrough device works well in the same situation.
After dumping the assembly code, I noticed that when the data is <= 16 bytes, a mov instruction is chosen, and it works well.
When the data is > 16 bytes, vmovdqu is chosen, and it crashes with "illegal operand".
Given that the code and data are exactly the same for both the passthrough device and the emulated device, I am curious why this happens.
After turning on KVM kernel tracing with echo 'kvm:*' > /sys/kernel/debug/tracing/set_event and rerunning QEMU and my code for both the passthrough device and the emulated device, I noticed that:
1) For the passthrough device, I didn't see any trace events related to my GVA and GPA. This makes me think that the memory copy to the PCI device goes through a different code path: it is handled within the guest without a VM exit.
2) For the emulated device, if I use the compiler flag target-feature=-avx,-avx2 to force the compiler to use mov, I can see the memory copy go through KVM_EXIT_MMIO, and everything works well. If I don't force mov, the compiler chooses vmovdqu, which crashes the program, and no KVM_EXIT_MMIO related to my memory copy appears in the trace events. It looks like the guest OS handles the crash.
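A quick way to confirm which build is actually running in the guest is to have the binary report its compile-time target features. This is a minimal Rust sketch; the function names are illustrative. cfg!(target_feature = ...) reflects the flags the binary was compiled with, so with -avx,-avx2 in effect both should print false, while is_x86_feature_detected! reports what the guest's virtual CPU advertises at runtime. As a hedged aside: the default x86_64 Rust targets only assume SSE2, so seeing vmovdqu suggests AVX is being enabled somewhere in the build configuration.

```rust
/// Whether this binary was *compiled* with AVX / AVX2 enabled.
/// When both are false, ordinary Rust code in this crate should not be
/// compiled to AVX instructions such as vmovdqu (inline asm and functions
/// marked #[target_feature(enable = "avx")] aside).
fn compiled_with_avx() -> (bool, bool) {
    (cfg!(target_feature = "avx"), cfg!(target_feature = "avx2"))
}

/// Whether the (virtual) CPU advertises AVX at runtime.
#[cfg(any(target_arch = "x86", target_arch = "x86_64"))]
fn cpu_has_avx() -> bool {
    is_x86_feature_detected!("avx")
}

/// Fallback so the sketch still builds on non-x86 hosts.
#[cfg(not(any(target_arch = "x86", target_arch = "x86_64")))]
fn cpu_has_avx() -> bool {
    false
}

fn main() {
    let (avx, avx2) = compiled_with_avx();
    println!(
        "compiled with avx={avx}, avx2={avx2}; cpu reports avx={}",
        cpu_has_avx()
    );
}
```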
Any clue why vmovdqu works for the passthrough device but not for the emulated device?
For an assigned device, the device MMIO space will be directly mapped into the VM address space (assuming the PCI BAR is at least PAGE_SIZE), so there's no emulation of the access. You can disable this with the x-no-mmap=on option for the vfio-pci device, in which case I'd guess this behaves the same as your emulated device (assuming we really don't reach QEMU for the access).
Since you're not seeing a KVM_EXIT_MMIO, I'd guess this is more of a KVM issue than QEMU (Cc kvm list). Possibly KVM doesn't emulate vmovdqu relative to an MMIO access, but honestly I'm not positive that AVX instructions are meant to work on MMIO space. I'll let x86 KVM experts more familiar with specific opcode semantics weigh in on that.
Is your "program" just doing a memcpy() with an mmap() of the PCI BAR acquired through pci-sysfs or a userspace vfio-pci driver within the guest?
In QEMU commit 4a2e242bbb30 ("memory: Don't use memcpy for ram_device regions") we resolved an issue[1] where QEMU itself was doing a memcpy() to assigned device MMIO space, breaking the device's functionality. IIRC memcpy() was using an SSE instruction that didn't fault, but didn't work correctly relative to MMIO space either.
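If the source data is an arbitrary byte buffer, one way to avoid handing MMIO writes to memcpy() is to pack the bytes into fixed-width words first and then write them one at a time with a volatile store. This is a minimal Rust sketch. It assumes the device expects little-endian 32-bit words and tolerates zero padding in the final partial word; both are assumptions, and pack_le_u32 is a hypothetical helper name.

```rust
/// Packs a byte buffer into little-endian 32-bit words, zero-padding the
/// final partial word. Writing the result with one volatile u32 store per
/// word keeps the device access width under the driver's control instead
/// of leaving it to whatever instruction memcpy() happens to pick.
fn pack_le_u32(bytes: &[u8]) -> Vec<u32> {
    bytes
        .chunks(4)
        .map(|chunk| {
            let mut word = [0u8; 4];
            word[..chunk.len()].copy_from_slice(chunk);
            u32::from_le_bytes(word)
        })
        .collect()
}

fn main() {
    let payload = b"twenty-one byte input"; // 21 bytes -> 6 words
    let words = pack_le_u32(payload);
    assert_eq!(words.len(), 6);
    assert_eq!(words[0], u32::from_le_bytes(*b"twen"));
    println!("{:x?}", words);
}
```

Whether zero padding is harmless depends entirely on the device's register semantics, so a real driver may need to handle the tail differently.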
So I also wouldn't rule out that the program is inherently misbehaving by using memcpy() and thereby ignoring the nature of the device MMIO access semantics. Thanks,
Alex
[1] https://bugs.launchpad.net/qemu/+bug/1384892