[Top][All Lists]
[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device
From: |
Alex Williamson |
Subject: |
Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files |
Date: |
Mon, 26 Sep 2011 12:34:52 -0600 |
On Mon, 2011-09-26 at 12:04 +0200, Alexander Graf wrote:
> Am 26.09.2011 um 09:51 schrieb David Gibson <address@hidden>:
>
> > On Fri, Sep 09, 2011 at 08:11:54AM -0500, Stuart Yoder wrote:
> >> Based on the discussions over the last couple of weeks
> >> I have updated the device fd file layout proposal and
> >> tried to specify it a bit more formally.
> >>
> >> ===============================================================
> >>
> >> 1. Overview
> >>
> >> This specification describes the layout of device files
> >> used in the context of vfio, which gives user space
> >> direct access to I/O devices that have been bound to
> >> vfio.
> >>
> >> When a device fd is opened and read, offset 0x0 contains
> >> a fixed sized header followed by a number of variable length
> >> records that describe different characteristics
> >> of the device-- addressable regions, interrupts, etc.
> >>
> >> 0x0 +-------------+-------------+
> >> | magic | u32 // identifies this as a vfio
> >> device file
> >> +---------------------------+ and identifies the type of bus
> >> | version | u32 // specifies the version of this
> >> +---------------------------+
> >> | flags | u32 // encodes any flags
> >> +---------------------------+
> >> | dev info record 0 |
> >> | type | u32 // type of record
> >> | rec_len | u32 // length in bytes of record
> >> | | (including record header)
> >> | flags | u32 // type specific flags
> >> | ...content... | // record content, which could
> >> +---------------------------+ // include sub-records
> >> | dev info record 1 |
> >> +---------------------------+
> >> | dev info record N |
> >> +---------------------------+
> >
> > I really should have chimed in on this earlier, but I've been very
> > busy.
> >
> > Um, not to put too fine a point on it, this is madness.
> >
> > Yes, it's very flexible and can thereby cover a very wide range of
> > cases. But it's much, much too complex. Userspace has to parse a
> > complex, multilayered data structure, with variable length elements
> > just to get an address at which to do IO. I can pretty much guarantee
> > that if we went with this, most userspace programs using this
> > interface would just ignore this metadata and directly map the
> > offsets at which they happen to know the kernel will put things for
> > the type of device they care about.
> >
> > _At least_ for PCI, I think the original VFIO layout of each BAR at a
> > fixed, well known offset is much better. Despite its limitations,
> > just advertising a "device type" ID which describes one of a few fixed
> > layouts would be preferable to this. I'm still hoping, that we can do
> > a bit better than that. But we should try really hard to at the very
> > least force the metadata into a simple array of resources each with a
> > fixed size record describing it, even if it means some space wastage
> > with occasionally-used fields. Anything more complex than that and
> > userspace is just never going to use it properly.
>
> We will have 2 different types of user space. One wants to be as
> generic as possible and needs all this dynamic information. QEMU would
> fall into this category.
>
> The other one is device specific drivers in user space. Here
> hardcoding might make sense.
>
> For the generic interface, we need something that us as verbose as
> possible and lets us enumerate all the device properties, so we can
> properly map and forward them to the guest.
>
> However, nothing keeps us from mapping BARs always at static offsets
> into the file. If you don't need the generic info, then you don't need
> it.
>
> Also, if you can come up with an interface that does not have variable
> length descriptors but is still able to export all the required
> generic information, please send a proposal to the list :)
>
Hi,
The other obvious possibility is a pure ioctl interface. To match what
this proposal is trying to describe, plus the runtime interfaces, we'd
need something like:
/* :0 - PCI devices, :1 - Devices path device, 63:2 - reserved */
#define VFIO_DEVICE_GET_FLAGS _IOR(, , u64)
/* Return number of mmio/iop/config regions.
* For PCI this is always 8 (BAR0-5 + ROM + Config) */
#define VFIO_DEVICE_GET_NUM_REGIONS _IOR(, , int)
/* Return length for region index (may be zero) */
#define VFIO_DEVICE_GET_REGION_LEN _IOWR(, , u64)
/* Return flags for region index
* :0 - mmap'able, :1 - read-only, 63:2 - reserved */
#define VFIO_DEVICE_GET_REGION_FLAGS _IOR(, , u64)
/* Return file offset for region index */
#define VFIO_DEVICE_GET_REGION_OFFSET _IOWR(, , u64)
/* Return physical address for region index - not implemented for PCI */
#define VFIO_DEVICE_GET_REGION_PHYS_ADDR _IOWR(, , u64)
/* Return number of IRQs (Not including MSI/MSI-X for PCI) */
#define VFIO_DEVICE_GET_NUM_IRQ _IOR(, , int)
/* Set IRQ eventfd for IRQ index, arg[0] = index, arg[1] = fd */
#define VFIO_DEVICE_SET_IRQ_EVENTFD _IOW(, , int)
/* Unmask IRQ index */
#define VFIO_DEVICE_UNMASK_IRQ _IOW(, , int)
/* Set unmask eventfd for index, arg[0] = index, arg[1] = fd */
#define VFIO_DEVICE_SET_UNMASK_IRQ_EVENTFD _IOW(, , int)
/* Return the device tree path for type/index into the user
* allocated buffer */
struct dtpath {
u32 type; (0 = region, 1 = IRQ)
u32 index;
u32 buf_len;
char *buf;
};
#define VFIO_DEVICE_GET_DTPATH _IOWR(, , struct dtpath)
/* Return the device tree index for type/index */
struct dtindex {
u32 type; (0 = region, 1 = IRQ)
u32 index;
u32 prop_type;
u32 prop_index;
};
#define VFIO_DEVICE_GET_DTINDEX _IOWR(, , struct dtindex)
/* Reset the device */
#define VFIO_DEVICE_RESET _IO(, ,)
/* PCI MSI setup, arg[0] = #, arg[1-n] = eventfds */
#define VFIO_DEVICE_PCI_SET_MSI_EVENTFDS _IOW(, , int)
#define VFIO_DEVICE_PCI_SET_MSIX_EVENTFDS _IOW(, , int)
Hope that covers it. Something I prefer about this interface is that
everything can easily be generated on the fly, whereas reading out a
table from the device means we really need to have that table somewhere
in kernel memory to easily support reading random offsets. Thoughts?
Thanks,
Alex
- [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Stuart Yoder, 2011/09/09
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, David Gibson, 2011/09/26
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Alexander Graf, 2011/09/26
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Scott Wood, 2011/09/26
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Alex Williamson, 2011/09/26
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Scott Wood, 2011/09/27
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Alex Williamson, 2011/09/27
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, Alexander Graf, 2011/09/28
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, David Gibson, 2011/09/30
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, David Gibson, 2011/09/30
- Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files, David Gibson, 2011/09/30