From: Alex Williamson
Subject: Re: [Qemu-devel] RFC [v2]: vfio / device assignment -- layout of device fd files
Date: Mon, 19 Sep 2011 09:16:00 -0600

On Fri, 2011-09-09 at 08:11 -0500, Stuart Yoder wrote:
> Based on the discussions over the last couple of weeks
> I have updated the device fd file layout proposal and
> tried to specify it a bit more formally.
>
> ===============================================================
>
> 1. Overview
>
> This specification describes the layout of device files
> used in the context of vfio, which gives user space
> direct access to I/O devices that have been bound to
> vfio.
>
> When a device fd is opened and read, offset 0x0 contains
> a fixed-size header followed by a number of variable-length
> records that describe different characteristics
> of the device -- addressable regions, interrupts, etc.
>
> 0x0  +---------------------------+
>      | magic                     | u32 // identifies this as a vfio
>      |                           |     // device file and identifies
>      |                           |     // the type of bus
>      +---------------------------+
>      | version                   | u32 // specifies the version of this
>      +---------------------------+
>      | flags                     | u32 // encodes any flags
>      +---------------------------+
>      | dev info record 0         |
>      |   type                    | u32 // type of record
>      |   rec_len                 | u32 // length in bytes of record
>      |                           |     // (including record header)
>      |   flags                   | u32 // type specific flags
>      |   ...content...           |     // record content, which could
>      |                           |     // include sub-records
>      +---------------------------+
>      | dev info record 1         |
>      +---------------------------+
>      | dev info record N         |
>      +---------------------------+
>
> The device info records following the file header may have
> the following record types, each with content encoded in
> a record-specific way:
>
> -----------------+------+--------------------------------------------------
>                  | type |
> Record           | num  | Description
> -----------------+------+--------------------------------------------------
> REGION           |  1   | describes an addressable address range for the device
> DTPATH           |  2   | describes the device tree path for the device
> DTINDEX          |  3   | describes the index into the related device tree
>                  |      | property (reg, ranges, interrupts, interrupt-map)
> INTERRUPT        |  4   | describes an interrupt for the device
> PCI_CONFIG_SPACE |  5   | property identifying a region as PCI config space
> PCI_BAR_INDEX    |  6   | describes the BAR index for a PCI region
> PHYS_ADDR        |  7   | describes the physical address of the region
> -----------------+------+--------------------------------------------------
>
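A minimal sketch of how user space might walk this layout. It assumes the whole description has already been read into a buffer, that fields are in host byte order, and that rec_len also covers any sub-records; the proposal does not spell out any of these. The names devfd_rec_hdr and devfd_count_records are hypothetical and exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* File header from the proposal (uint32_t stands in for __u32). */
struct devfd_header {
    uint32_t magic;
    uint32_t version;
    uint32_t flags;
};

/* Hypothetical common prefix of every record: type + length. */
struct devfd_rec_hdr {
    uint32_t type;
    uint32_t rec_len;   /* bytes, including this prefix */
};

/*
 * Count the top-level records of a given type in a buffer that holds
 * the header followed by the records. Returns -1 on a malformed record.
 */
static int devfd_count_records(const uint8_t *buf, size_t len, uint32_t type)
{
    size_t off = sizeof(struct devfd_header);
    int count = 0;

    if (len < off)
        return -1;

    while (off < len) {
        struct devfd_rec_hdr rec;

        if (len - off < sizeof(rec))
            return -1;
        memcpy(&rec, buf + off, sizeof(rec));   /* avoid unaligned loads */

        /* rec_len must cover the prefix and stay inside the buffer. */
        if (rec.rec_len < sizeof(rec) || rec.rec_len > len - off)
            return -1;

        if (rec.type == type)
            count++;
        off += rec.rec_len;   /* skips content and any sub-records */
    }
    return count;
}
```

Skipping by rec_len is what lets a parser step over record types it does not understand, which is the main benefit of carrying the length in every record.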
> 2. Header
>
> The header is located at offset 0x0 in the device fd
> and has the following format:
>
> struct devfd_header {
>     __u32 magic;
>     __u32 version;
>     __u32 flags;
> };
>
> The 'magic' field contains a magic value that identifies
> the type of bus the device is on. Valid values
> are:
>
> 0x70636900 // "pci" - PCI device
> 0x64740000 // "dt" - device tree (system bus)
>
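The two magic values are the ASCII tags "pci" and "dt" packed into the high bytes of a u32. A small sketch of decoding them follows; the macro, enum, and function names are made up for illustration, and only the two constants come from the proposal.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEVFD_MAGIC_PCI 0x70636900u   /* "pci" */
#define DEVFD_MAGIC_DT  0x64740000u   /* "dt"  */

/* Hypothetical bus identifiers derived from the magic. */
enum devfd_bus { DEVFD_BUS_UNKNOWN, DEVFD_BUS_PCI, DEVFD_BUS_DT };

static enum devfd_bus devfd_bus_from_magic(uint32_t magic)
{
    switch (magic) {
    case DEVFD_MAGIC_PCI: return DEVFD_BUS_PCI;
    case DEVFD_MAGIC_DT:  return DEVFD_BUS_DT;
    default:              return DEVFD_BUS_UNKNOWN;
    }
}

/* Unpack the magic into its ASCII tag, most significant byte first. */
static void devfd_magic_to_str(uint32_t magic, char out[5])
{
    for (int i = 0; i < 4; i++)
        out[i] = (char)((magic >> (24 - 8 * i)) & 0xff);
    out[4] = '\0';   /* unused low bytes are already NUL */
}
```

Note that this treats the magic as a numeric value. If the header is read as raw bytes, the byte order of the file would also have to be defined, which the proposal does not yet do.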
> 3. Region
>
> A REGION record describes an addressable region for the device.
>
> struct devfd_region {
>     __u32 type;        // must be 0x1
>     __u32 record_len;
>     __u32 flags;
>     __u64 offset;      // seek offset to region from beginning
>                        // of file
>     __u64 len;         // length of the region
> };
>
> The 'flags' field supports one flag:
>
> IS_MMAPABLE
>
> 4. Device Tree Path (DTPATH)
>
> A DTPATH record is a sub-record of a REGION and describes
> the path to a device tree node for the region.
Can we better distinguish sub-records from records? I assume we're
trying to be as versatile as possible by having a single "type" address
space, but is this going to lead to implementation problems? A DTPATH
as a record, an INTERRUPT as a sub-record, etc. Should we instead have
a "subtype" address space per "type" and per device type? For a "dt"
device, it looks like we really have:
  * REGION (type 0)
      * DTPATH (subtype 0)
      * DTINDEX (subtype 1)
      * PHYS_ADDR (subtype 2)
  * INTERRUPT (type 1)
      * DTPATH (subtype 0)
      * DTINDEX (subtype 1)
While "pci" is:
  * REGION (type 0)
      * PCI_CONFIG_SPACE (subtype 0)
      * PCI_BAR_INDEX (subtype 1)
  * INTERRUPT (type 1)
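
To make the per-type subtype space concrete, here is a rough sketch of how it could be encoded and checked. All identifiers are hypothetical; the numbering simply follows the two lists above and is not part of the proposal.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bus kinds and top-level record types. */
enum devfd_bus_kind { DEVFD_BUS_DT = 0, DEVFD_BUS_PCI = 1 };
enum devfd_type     { DEVFD_REGION = 0, DEVFD_INTERRUPT = 1 };

/* Subtypes are only meaningful relative to a bus kind (and type). */
enum { DEVFD_DT_DTPATH = 0, DEVFD_DT_DTINDEX = 1, DEVFD_DT_PHYS_ADDR = 2 };
enum { DEVFD_PCI_CONFIG_SPACE = 0, DEVFD_PCI_BAR_INDEX = 1 };

/* Number of subtypes defined for [bus][type], per the lists above. */
static const uint32_t devfd_nr_subtypes[2][2] = {
    [DEVFD_BUS_DT]  = { [DEVFD_REGION] = 3, [DEVFD_INTERRUPT] = 2 },
    [DEVFD_BUS_PCI] = { [DEVFD_REGION] = 2, [DEVFD_INTERRUPT] = 0 },
};

/* Return 1 if subtype is defined under (bus, type), else 0. */
static int devfd_subtype_valid(enum devfd_bus_kind bus, uint32_t type,
                               uint32_t subtype)
{
    if ((unsigned)bus > DEVFD_BUS_PCI || type > DEVFD_INTERRUPT)
        return 0;
    return subtype < devfd_nr_subtypes[bus][type];
}
```

With this split, a parser can reject a PCI_BAR_INDEX under a "dt" device or a PHYS_ADDR under an INTERRUPT with one table lookup, which a single flat type space cannot express.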
> struct devfd_dtpath {
>     __u32 type;        // must be 0x2
>     __u32 record_len;
>     char path[];       // device tree path for the region
> };
>
> 5. Device Tree Index (DTINDEX)
>
> A DTINDEX record is a sub-record of a REGION and specifies
> the index into the resource list encoded in the associated
> device tree property-- "reg", "ranges", "interrupts", or
> "interrupt-map".
>
> struct devfd_dtindex {
>     __u32 type;        // must be 0x3
>     __u32 record_len;
>     __u32 prop_type;
>     __u32 prop_index;  // index into the resource list
> };
>
> prop_type must have one of the following values:
>    1 // "reg" property
>    2 // "ranges" property
>    3 // "interrupts" property
>    4 // "interrupt-map" property
>
> Note: prop_index is not the byte offset into the property,
> but the logical index.
>
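A small sketch of mapping prop_type back to the device tree property name. It assumes value 4 is meant to denote "interrupt-map", since the section's opening sentence lists four distinct properties; the function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Map a DTINDEX prop_type to its device tree property name.
 * Returns NULL for values the proposal does not define.
 * Assumption: 4 means "interrupt-map".
 */
static const char *devfd_prop_name(uint32_t prop_type)
{
    switch (prop_type) {
    case 1:  return "reg";
    case 2:  return "ranges";
    case 3:  return "interrupts";
    case 4:  return "interrupt-map";
    default: return NULL;
    }
}
```

User space would then look up that property in the node named by the DTPATH record and take entry prop_index from it, counting whole entries rather than bytes.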
> 6. Interrupts (INTERRUPT)
>
> An INTERRUPT record describes one of a device's interrupts.
> The handle field is an argument to VFIO_DEVICE_GET_IRQ_FD
> which user space can use to receive device interrupts.
>
> struct devfd_interrupts {
>     __u32 type;        // must be 0x4
>     __u32 record_len;
>     __u32 flags;
>     __u32 handle;      // parameter to VFIO_DEVICE_GET_IRQ_FD
> };
I'm still on the fence about whether we should implement INTERRUPT for
PCI at all, or only assume handle 0x0, or maybe assume handle ==
interrupt pin.
>
> 7. PCI Config Space (PCI_CONFIG_SPACE)
>
> A PCI_CONFIG_SPACE record is a sub-record of a REGION record
> and identifies the region as PCI configuration space.
>
> struct devfd_cfgspace {
>     __u32 type;        // must be 0x5
>     __u32 record_len;
>     __u32 flags;
> };
>
> 8. PCI Bar Index (PCI_BAR_INDEX)
>
> A PCI_BAR_INDEX record is a sub-record of a REGION record
> and identifies the PCI BAR index for the region.
>
> struct devfd_barindex {
>     __u32 type;        // must be 0x6
>     __u32 record_len;
>     __u32 flags;
>     __u32 bar_index;
> };
I suppose we're more concerned with easy parsing and alignment than
compactness, so a u32 to differentiate 6 BARs + 1 ROM is probably ok.
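
As a sketch of what the u32 buys: with four 32-bit fields the record is 16 bytes with no padding, so it can be overlaid directly on the buffer. The index assignment below (0-5 for the BARs, 6 for the ROM) is an assumption for illustration, not something the proposal defines, and the helper name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Same layout as the proposal, with uint32_t standing in for __u32. */
struct devfd_barindex {
    uint32_t type;        /* must be 0x6 */
    uint32_t record_len;
    uint32_t flags;
    uint32_t bar_index;
};

/* Four naturally aligned words: no padding, fixed field offsets. */
_Static_assert(sizeof(struct devfd_barindex) == 16, "unexpected padding");
_Static_assert(offsetof(struct devfd_barindex, bar_index) == 12,
               "unexpected field offset");

/* Assumption: indexes 0-5 are the BARs and 6 is the expansion ROM. */
#define DEVFD_BAR_INDEX_ROM 6u

static int devfd_bar_index_valid(uint32_t bar_index)
{
    return bar_index <= DEVFD_BAR_INDEX_ROM;
}
```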
>
> 9. Physical Address (PHYS_ADDR)
>
> A PHYS_ADDR record is a sub-record of a REGION record
> and specifies the physical address of the region.
>
> struct devfd_physaddr {
>     __u32 type;        // must be 0x7
>     __u32 record_len;
>     __u32 flags;
>     __u64 phys_addr;
> };
Thanks,
Alex