[Qemu-devel] [PATCH v8 01/13] vfio: KABI for migration interface
From: Kirti Wankhede
Subject: [Qemu-devel] [PATCH v8 01/13] vfio: KABI for migration interface
Date: Tue, 27 Aug 2019 00:25:41 +0530

- Defined MIGRATION region type and sub-type.
- Used 3 bits to define VFIO device states.
    Bit 0 => _RUNNING
    Bit 1 => _SAVING
    Bit 2 => _RESUMING
  Combination of these bits defines VFIO device's state during migration:
    _STOPPED => All bits 0 indicates VFIO device stopped.
    _RUNNING => Normal VFIO device running state.
    _SAVING | _RUNNING => vCPUs are running, VFIO device is running but start
                          saving state of device, i.e. pre-copy state
    _SAVING => vCPUs are stopped, VFIO device should be stopped, and
               save device state, i.e. stop-n-copy state
    _RESUMING => VFIO device resuming state.
    _SAVING | _RESUMING => Invalid state if _SAVING and _RESUMING bits are set
  Bits 3 - 31 are reserved for future use. User should perform
  read-modify-write operation on this field.
- Defined vfio_device_migration_info structure which will be placed at 0th
  offset of migration region to get/set VFIO device related information.
  Defined members of structure and usage on read/write access:
    * device_state: (read/write)
        To convey VFIO device state to be transitioned to. Only 3 bits are used
        as of now, Bits 3 - 31 are reserved for future use.
    * pending_bytes: (read only)
        To get pending bytes yet to be migrated for VFIO device.
    * data_offset: (read only)
        To get the offset in the migration region from which data should be
        read during _SAVING state, to which data should be written by the user
        space application during _RESUMING state, and from which the dirty
        pages bitmap should be read.
    * data_size: (read/write)
        To get and set size of data copied in migration region during _SAVING
        and _RESUMING state.
    * start_pfn, page_size, total_pfns: (write only)
        To get bitmap of dirty pages from vendor driver from given
        start address for total_pfns.
    * copied_pfns: (read only)
        To get number of pfns for which bitmap is copied in migration region.
        Vendor driver should copy the bitmap with bits set only for
        pages to be marked dirty in migration region. Vendor driver
        should return VFIO_DEVICE_DIRTY_PFNS_NONE if there are 0 pages dirty in
        requested range. Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_ALL
        to mark all pages in the section as dirty.
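
The device state bits listed above can be exercised with a small helper. This
is only a sketch: the DEV_STATE_* macros are local copies of the bit values
from this patch under shortened, hypothetical names, and the helper functions
are not part of the proposed interface. The patch does not describe
_RUNNING | _RESUMING, so the sketch reports it as "unlisted" rather than
guessing.

```c
#include <stdbool.h>
#include <stdint.h>

/* Local copies of the bit values defined in this patch (names are hypothetical). */
#define DEV_STATE_RUNNING  (1u << 0)
#define DEV_STATE_SAVING   (1u << 1)
#define DEV_STATE_RESUMING (1u << 2)
#define DEV_STATE_MASK     (DEV_STATE_RUNNING | DEV_STATE_SAVING | DEV_STATE_RESUMING)

/* _SAVING and _RESUMING set together is the combination the patch calls invalid. */
static bool dev_state_is_valid(uint32_t state)
{
    uint32_t both = DEV_STATE_SAVING | DEV_STATE_RESUMING;

    return (state & both) != both;
}

/* Read-modify-write: replace bits 0-2 and leave reserved bits 3-31 untouched. */
static uint32_t dev_state_update(uint32_t current, uint32_t wanted)
{
    return (current & ~DEV_STATE_MASK) | (wanted & DEV_STATE_MASK);
}

/* Map the low three bits to the phase names used in the commit message. */
static const char *dev_state_name(uint32_t state)
{
    if (!dev_state_is_valid(state))
        return "invalid";

    switch (state & DEV_STATE_MASK) {
    case 0:
        return "stopped";
    case DEV_STATE_RUNNING:
        return "running";
    case DEV_STATE_SAVING | DEV_STATE_RUNNING:
        return "pre-copy";
    case DEV_STATE_SAVING:
        return "stop-and-copy";
    case DEV_STATE_RESUMING:
        return "resuming";
    default:
        return "unlisted";
    }
}
```

A caller would read device_state, pass it through dev_state_update() with the
wanted bits, and write the result back, so that reserved bits survive the
transition.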
Migration region looks like:
 ------------------------------------------------------------------
|vfio_device_migration_info|    data section                      |
|                          |     ///////////////////////////////  |
 ------------------------------------------------------------------
 ^                              ^                              ^
 offset 0-trapped part        data_offset                 data_size
Data section always follows the vfio_device_migration_info
structure in the region, so data_offset will always be non-0.
Offset from where data is copied is decided by kernel driver; data
section can be trapped or mapped depending on how kernel driver
defines data section. If mmapped, then data_offset should be page
aligned, whereas the initial section which contains the
vfio_device_migration_info structure might not end at an offset which is
page aligned.
Data_offset can be same or different for device data and dirty pages bitmap.
Vendor driver should decide whether to partition data section and how to
partition the data section. Vendor driver should return data_offset
accordingly.
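
The layout constraints above could be checked on the user side roughly as
follows. This is a sketch under assumptions: data_offset_is_sane() is a
hypothetical helper, and the requirement that data_offset lies at or beyond the
end of the structure is inferred from "data section always follows the
structure", not stated as a rule by the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sanity check for a data_offset reported by the vendor driver.
 * header_size: size of vfio_device_migration_info (64 bytes as defined here).
 * page_size:   host page size, only consulted when the data section is mmapped.
 */
static bool data_offset_is_sane(uint64_t data_offset, uint64_t header_size,
                                uint64_t region_size, bool mmapped,
                                uint64_t page_size)
{
    /* Data section follows the header, so the offset is never 0. */
    if (data_offset < header_size || data_offset >= region_size)
        return false;

    /* An mmapped data section has to start on a page boundary. */
    if (mmapped && (page_size == 0 || data_offset % page_size != 0))
        return false;

    return true;
}
```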
For user application, data is opaque. User should write data in the same
order as received.
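
To illustrate the "same order as received" rule, here is a sketch of the
_SAVING and _RESUMING sequences from the header comment, run against a mock
vendor driver kept entirely in memory. Everything named mock_dev, DATA_OFFSET
and CHUNK is an assumption made for the example; a real client would pread()
and pwrite() the fields of vfio_device_migration_info in the migration region
instead of calling these accessors.

```c
#include <stdint.h>
#include <string.h>

#define REGION_SIZE 256u
#define DATA_OFFSET 64u  /* assumed: data section starts right after the header */
#define CHUNK       16u  /* assumed staging capacity per iteration */

/* Mock vendor driver: 'state' stands in for opaque device state. */
struct mock_dev {
    uint8_t  region[REGION_SIZE];
    uint8_t  state[40];
    uint64_t cursor;  /* bytes saved or restored so far */
    uint64_t staged;  /* size of the chunk currently in the data section */
};

static uint64_t read_pending_bytes(struct mock_dev *d)
{
    return sizeof(d->state) - d->cursor;
}

/* Reading data_offset while saving tells the driver to stage the next chunk. */
static uint64_t read_data_offset_saving(struct mock_dev *d)
{
    uint64_t left = sizeof(d->state) - d->cursor;

    d->staged = left < CHUNK ? left : CHUNK;
    memcpy(d->region + DATA_OFFSET, d->state + d->cursor, d->staged);
    d->cursor += d->staged;
    return DATA_OFFSET;
}

static uint64_t read_data_size(struct mock_dev *d)
{
    return d->staged;
}

/* Writing data_size while resuming tells the driver the chunk can be consumed. */
static void write_data_size_resuming(struct mock_dev *d, uint64_t size)
{
    if (size > CHUNK || d->cursor + size > sizeof(d->state))
        return;  /* a real driver would fail the write instead */
    memcpy(d->state + d->cursor, d->region + DATA_OFFSET, size);
    d->cursor += size;
}

/* _SAVING sequence, steps a to f: drain the source device into 'out'. */
static uint64_t save_device(struct mock_dev *src, uint8_t *out)
{
    uint64_t total = 0;

    while (read_pending_bytes(src) > 0) {              /* a */
        uint64_t off = read_data_offset_saving(src);   /* b */
        uint64_t size = read_data_size(src);           /* c */

        memcpy(out + total, src->region + off, size);  /* d, e */
        total += size;
    }                                                  /* f */
    return total;
}

/* _RESUMING sequence: replay the stream in the order it was received. */
static void resume_device(struct mock_dev *dst, const uint8_t *in, uint64_t len)
{
    uint64_t done = 0;

    while (done < len) {
        uint64_t size = len - done < CHUNK ? len - done : CHUNK;
        uint64_t off = DATA_OFFSET;                    /* a: read data_offset */

        memcpy(dst->region + off, in + done, size);    /* b: write data */
        write_data_size_resuming(dst, size);           /* c: write data_size */
        done += size;
    }
}
```

The client never interprets the bytes it moves; it only preserves chunk order,
which is what lets the destination driver rebuild the state.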
Signed-off-by: Kirti Wankhede <address@hidden>
Reviewed-by: Neo Jia <address@hidden>
---
linux-headers/linux/vfio.h | 148 +++++++++++++++++++++++++++++++++++++++++++++
1 file changed, 148 insertions(+)
diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 24f505199f83..4bc0236b0898 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -372,6 +372,154 @@ struct vfio_region_gfx_edid {
*/
#define VFIO_REGION_SUBTYPE_IBM_NVLINK2_ATSD (1)
+/* Migration region type and sub-type */
+#define VFIO_REGION_TYPE_MIGRATION (3)
+#define VFIO_REGION_SUBTYPE_MIGRATION (1)
+
+/**
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *      To indicate to vendor driver the state VFIO device should be
+ *      transitioned to. If device state transition fails, write on this field
+ *      returns error. It consists of 3 bits:
+ *      - If bit 0 set, indicates _RUNNING state. When it is reset, that
+ *        indicates _STOPPED state. When device is changed to _STOPPED, driver
+ *        should stop device before write() returns.
+ *      - If bit 1 set, indicates _SAVING state.
+ *      - If bit 2 set, indicates _RESUMING state.
+ *      Bits 3 - 31 are reserved for future use. User should perform
+ *      read-modify-write operation on this field.
+ *      _SAVING and _RESUMING bits set at the same time is invalid state.
+ *
+ * pending_bytes: (read only)
+ *      Number of pending bytes yet to be migrated from vendor driver
+ *
+ * data_offset: (read only)
+ *      User application should read data_offset in migration region from where
+ *      user application should read device data during _SAVING state or write
+ *      device data during _RESUMING state or read dirty pages bitmap. See below
+ *      for detail of sequence to be followed.
+ *
+ * data_size: (read/write)
+ *      User application should read data_size to get size of data copied in
+ *      migration region during _SAVING state and write size of data copied in
+ *      migration region during _RESUMING state.
+ *
+ * start_pfn: (write only)
+ *      Start address pfn to get bitmap of dirty pages from vendor driver during
+ *      _SAVING state.
+ *
+ * page_size: (write only)
+ *      User application should write the page_size of pfn.
+ *
+ * total_pfns: (write only)
+ *      Total pfn count from start_pfn for which dirty bitmap is requested.
+ *
+ * copied_pfns: (read only)
+ *      pfn count for which dirty bitmap is copied to migration region.
+ *      Vendor driver should copy the bitmap with bits set only for pages to be
+ *      marked dirty in migration region.
+ *      - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_NONE if none of the
+ *        pages are dirty in requested range or rest of the range.
+ *      - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_ALL to mark all
+ *        pages dirty in the given range or rest of the range.
+ *      - Vendor driver should return pfn count for which bitmap is written in
+ *        the region.
+ *
+ * Migration region looks like:
+ *  ------------------------------------------------------------------
+ * |vfio_device_migration_info|    data section                      |
+ * |                          |     ///////////////////////////////  |
+ *  ------------------------------------------------------------------
+ *   ^                              ^                              ^
+ *   offset 0-trapped part        data_offset                 data_size
+ *
+ * Data section always follows vfio_device_migration_info structure
+ * in the region, so data_offset will always be non-0. Offset from where data
+ * is copied is decided by kernel driver, data section can be trapped or
+ * mapped or partitioned, depending on how kernel driver defines data section.
+ * Data section partition can be defined as mapped by sparse mmap capability.
+ * If mmapped, then data_offset should be page aligned, whereas initial section
+ * which contains vfio_device_migration_info structure might not end at offset
+ * which is page aligned.
+ * Data_offset can be same or different for device data and dirty pages bitmap.
+ * Vendor driver should decide whether to partition data section and how to
+ * partition the data section. Vendor driver should return data_offset
+ * accordingly.
+ *
+ * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy phase
+ * and for _SAVING device state or stop-and-copy phase:
+ * a. read pending_bytes. If pending_bytes > 0, go through below steps.
+ * b. read data_offset, indicates kernel driver to write data to staging buffer.
+ * c. read data_size, amount of data in bytes written by vendor driver in
+ *    migration region.
+ * d. read data_size bytes of data from data_offset in the migration region.
+ * e. process data.
+ * f. Loop through a to e.
+ *
+ * To copy system memory content during migration, vendor driver should be able
+ * to report system memory pages which are dirtied by that driver. For such
+ * dirty page reporting, user application should query for a range of GFNs
+ * relative to device address space (IOVA), then vendor driver should provide
+ * the bitmap of pages from this range which are dirtied by it through
+ * migration region where each bit represents a page and bit set to 1 represents
+ * that the page is dirty.
+ * User space application should take care of copying content of system memory
+ * for those pages.
+ *
+ * Steps to get dirty page bitmap:
+ * a. write start_pfn, page_size and total_pfns.
+ * b. read copied_pfns. Vendor driver should take one of the below actions:
+ *    - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_NONE if driver
+ *      doesn't have any page to report dirty in given range or rest of the
+ *      range. Exit the loop.
+ *    - Vendor driver should return VFIO_DEVICE_DIRTY_PFNS_ALL to mark all
+ *      pages dirty for given range or rest of the range. User space
+ *      application marks all pages in the range as dirty and exits the loop.
+ *    - Vendor driver should return copied_pfns and provide bitmap for
+ *      copied_pfns in migration region.
+ * c. read data_offset, where vendor driver has written bitmap.
+ * d. read bitmap from the migration region from data_offset.
+ * e. Iterate through steps a to d while (total copied_pfns < total_pfns)
+ *
+ * Sequence to be followed while _RESUMING device state:
+ * While data for this device is available, repeat below steps:
+ * a. read data_offset from where user application should write data.
+ * b. write data of data_size to migration region from data_offset.
+ * c. write data_size which indicates to vendor driver that data is written in
+ *    staging buffer.
+ *
+ * For user application, data is opaque. User should write data in the same
+ * order as received.
+ */
+
+struct vfio_device_migration_info {
+	__u32 device_state;         /* VFIO device state */
+#define VFIO_DEVICE_STATE_RUNNING   (1 << 0)
+#define VFIO_DEVICE_STATE_SAVING    (1 << 1)
+#define VFIO_DEVICE_STATE_RESUMING  (1 << 2)
+#define VFIO_DEVICE_STATE_MASK      (VFIO_DEVICE_STATE_RUNNING | \
+				     VFIO_DEVICE_STATE_SAVING | \
+				     VFIO_DEVICE_STATE_RESUMING)
+#define VFIO_DEVICE_STATE_INVALID   (VFIO_DEVICE_STATE_SAVING | \
+				     VFIO_DEVICE_STATE_RESUMING)
+	__u32 reserved;
+	__u64 pending_bytes;
+	__u64 data_offset;
+	__u64 data_size;
+	__u64 start_pfn;
+	__u64 page_size;
+	__u64 total_pfns;
+	__u64 copied_pfns;
+#define VFIO_DEVICE_DIRTY_PFNS_NONE     (0)
+#define VFIO_DEVICE_DIRTY_PFNS_ALL      (~0ULL)
+} __attribute__((packed));
+
/*
* The MSIX mappable capability informs that MSIX data of a BAR can be mmapped
* which allows direct access to non-MSIX registers which happened to be within
--
2.7.0
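
The "Steps to get dirty page bitmap" in the header comment above can be
sketched from the user side as follows, again against an in-memory mock rather
than a real device. The mock_tracker type, PFNS_PER_QUERY, and the LSB-first
bit order within each bitmap byte are assumptions made for the example; the
patch does not specify the bit order, and the page_size write from step a is
omitted because the mock tracks pfns directly.

```c
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define DIRTY_PFNS_NONE 0ULL     /* mirrors VFIO_DEVICE_DIRTY_PFNS_NONE */
#define DIRTY_PFNS_ALL  (~0ULL)  /* mirrors VFIO_DEVICE_DIRTY_PFNS_ALL */
#define MAX_PFNS        64u
#define PFNS_PER_QUERY  16u      /* assumed capacity of the bitmap staging area */

/* Mock vendor driver: one dirty flag per pfn plus the last staged bitmap. */
struct mock_tracker {
    bool    dirty[MAX_PFNS];
    uint8_t bitmap[PFNS_PER_QUERY / 8];
};

/* Emulates steps a and b: write start_pfn/total_pfns, then read copied_pfns. */
static uint64_t query_copied_pfns(struct mock_tracker *t, uint64_t start,
                                  uint64_t total)
{
    bool any = false;
    uint64_t n = total < PFNS_PER_QUERY ? total : PFNS_PER_QUERY;

    for (uint64_t i = 0; i < total; i++)
        any |= t->dirty[start + i];
    if (!any)
        return DIRTY_PFNS_NONE;  /* nothing dirty in the rest of the range */

    memset(t->bitmap, 0, sizeof(t->bitmap));
    for (uint64_t i = 0; i < n; i++)
        if (t->dirty[start + i])
            t->bitmap[i / 8] |= (uint8_t)(1u << (i % 8));
    return n;
}

/*
 * Steps a to e: fill out[0..total_pfns) with dirty flags for the range that
 * starts at start_pfn and return the number of dirty pages found.
 * The caller must keep start_pfn + total_pfns within MAX_PFNS.
 */
static uint64_t collect_dirty(struct mock_tracker *t, uint64_t start_pfn,
                              uint64_t total_pfns, bool *out)
{
    uint64_t done = 0, count = 0;

    memset(out, 0, total_pfns * sizeof(*out));
    while (done < total_pfns) {
        uint64_t copied = query_copied_pfns(t, start_pfn + done,
                                            total_pfns - done);

        if (copied == DIRTY_PFNS_NONE)
            break;                       /* rest of the range is clean */
        if (copied == DIRTY_PFNS_ALL) {  /* rest of the range is dirty */
            for (uint64_t i = done; i < total_pfns; i++) {
                out[i] = true;
                count++;
            }
            break;
        }
        /* c, d: read the bitmap the driver staged for 'copied' pfns. */
        for (uint64_t i = 0; i < copied; i++) {
            if (t->bitmap[i / 8] & (1u << (i % 8))) {
                out[done + i] = true;
                count++;
            }
        }
        done += copied;                  /* e: continue with the remainder */
    }
    return count;
}
```

The mock never returns DIRTY_PFNS_ALL; the branch is kept so that the client
loop covers all three outcomes listed in step b.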