From: Kirti Wankhede
Subject: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Thu, 29 Oct 2020 11:23:11 +0530
Document the interfaces used for VFIO device migration and add the flow of
state changes during live migration with a VFIO device.
Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
---
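The document added below tracks the VFIO device state as a combination of the _RUNNING, _SAVING and _RESUMING flags. The following is a minimal sketch of those transitions, not code from QEMU: the bit values are assumed to mirror the ``VFIO_DEVICE_STATE_*`` definitions next to ``vfio_device_migration_info`` in linux-headers/linux/vfio.h and should be checked against that header, and every ``sketch_`` / ``SKETCH_`` name is hypothetical.

```c
#include <stdint.h>

/*
 * Assumed to mirror VFIO_DEVICE_STATE_* from linux/vfio.h; redefined under
 * hypothetical SKETCH_ names so the example builds without kernel headers.
 */
#define SKETCH_STATE_STOP     0u
#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)

/* Hypothetical helper: clear the bits in `clear`, then set the bits in `set`. */
static inline uint32_t sketch_set_state(uint32_t state, uint32_t clear,
                                        uint32_t set)
{
    return (state & ~clear) | set;
}

/* Save path: .save_setup() adds _SAVING while the device keeps running. */
static inline uint32_t sketch_save_setup(uint32_t state)
{
    return sketch_set_state(state, 0u, SKETCH_STATE_SAVING);
}

/* Save path: vCPUs stop, so _RUNNING is cleared and only _SAVING remains. */
static inline uint32_t sketch_vm_stopped(uint32_t state)
{
    return sketch_set_state(state, SKETCH_STATE_RUNNING, 0u);
}

/* Save path: all pending data has been copied, the device ends up stopped. */
static inline uint32_t sketch_save_done(uint32_t state)
{
    return sketch_set_state(state, SKETCH_STATE_SAVING, 0u);
}

/* Resume path: .load_setup() puts a stopped device into _RESUMING. */
static inline uint32_t sketch_load_setup(uint32_t state)
{
    return sketch_set_state(state, SKETCH_STATE_RUNNING | SKETCH_STATE_SAVING,
                            SKETCH_STATE_RESUMING);
}

/* Resume path: loading is finished and vCPUs start, so the device runs. */
static inline uint32_t sketch_load_done(uint32_t state)
{
    return sketch_set_state(state, SKETCH_STATE_RESUMING,
                            SKETCH_STATE_RUNNING);
}
```

Starting from ``SKETCH_STATE_RUNNING``, calling ``sketch_save_setup``, ``sketch_vm_stopped`` and ``sketch_save_done`` in order walks through the same three device states as the save path in the document: _RUNNING|_SAVING, _SAVING, _STOPPED.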
MAINTAINERS | 1 +
docs/devel/vfio-migration.rst | 119 ++++++++++++++++++++++++++++++++++++++++++
2 files changed, 120 insertions(+)
create mode 100644 docs/devel/vfio-migration.rst
diff --git a/MAINTAINERS b/MAINTAINERS
index 6a197bd358d6..6f3fcffc6b3d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1728,6 +1728,7 @@ M: Alex Williamson <alex.williamson@redhat.com>
S: Supported
F: hw/vfio/*
F: include/hw/vfio/
+F: docs/devel/vfio-migration.rst
vfio-ccw
M: Cornelia Huck <cohuck@redhat.com>
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
new file mode 100644
index 000000000000..dab9127825e4
--- /dev/null
+++ b/docs/devel/vfio-migration.rst
@@ -0,0 +1,119 @@
+=====================
+VFIO device Migration
+=====================
+
+VFIO devices use an iterative approach for migration because certain VFIO
+devices (e.g. GPU) have a large amount of data to be transferred. The iterative
+pre-copy phase of migration allows the guest to continue running whilst the
+VFIO device state is transferred to the destination, which helps to reduce the
+total downtime of the VM. VFIO devices can choose to skip the pre-copy phase of
+migration by returning pending_bytes as zero during the pre-copy phase.
+
+The VFIO device migration UAPI is described in detail in the comment above the
+``vfio_device_migration_info`` structure in linux-headers/linux/vfio.h.
+
+VFIO device hooks for the iterative approach:
+
+- A ``save_setup`` function that sets up the migration region, sets the _SAVING
+  flag in the device state and tells the VFIO IOMMU module to track dirty pages.
+
+- A ``load_setup`` function that sets up the migration region on the destination
+  and sets the _RESUMING flag in the VFIO device state.
+
+- A ``save_live_pending`` function that reads pending_bytes from the vendor
+  driver, which indicates how much more data the vendor driver has yet to save
+  for the VFIO device.
+
+- A ``save_live_iterate`` function that reads the VFIO device's data from the
+  vendor driver through the migration region during the iterative phase.
+
+- A ``save_live_complete_precopy`` function that clears the _RUNNING flag in the
+  VFIO device state, saves the device config space, if any, and iteratively
+  copies the remaining data for the VFIO device until pending_bytes returned by
+  the vendor driver is zero.
+
+- A ``load_state`` function that loads the config section and data sections
+  generated by the save functions above.
+
+- ``cleanup`` functions for both save and load that unmap the migration region.
+
+A VM state change handler is registered to change the VFIO device state based on
+VM state changes.
+
+Similarly, a migration state change notifier is added to get a notification on
+migration state changes. These states are translated to the VFIO device state
+and conveyed to the vendor driver.
+
+System memory dirty pages tracking
+----------------------------------
+
+A ``log_sync`` memory listener callback is added to mark system memory pages
+used for DMA by the VFIO device as dirty. The dirty pages bitmap is queried per
+container. All pages pinned by the vendor driver through the vfio_pin_pages()
+external API have to be marked as dirty during migration. When there are CPU
+writes, CPU dirty page tracking can identify dirtied pages, but any page pinned
+by the vendor driver can also be written by the device. No device currently has
+hardware support for dirty page tracking, so all pages pinned by the vendor
+driver are considered dirty.
+Dirty pages are tracked only in the stop-and-copy phase: if pages were marked
+dirty during the pre-copy phase and their content transferred from source to
+destination, there would be no way to know which pages were newly dirtied
+between that copy and the point where the device stops. To avoid copying the
+same content repeatedly, pinned pages are marked dirty only in that phase.
+
+System memory dirty pages tracking when vIOMMU is enabled
+---------------------------------------------------------
+With vIOMMU, an IO virtual address range can get unmapped while in the pre-copy
+phase of migration. In that case, the unmap ioctl returns the pages pinned in
+that range and QEMU reports the corresponding guest physical pages as dirty.
+During the stop-and-copy phase, an IOMMU notifier is used to get a callback for
+mapped pages, and then the dirty pages bitmap is fetched from the VFIO IOMMU
+module for those mapped ranges.
+
+Flow of state changes during Live migration
+===========================================
+Below is the flow of state changes during live migration. The states in brackets
+represent the VM state, the migration state and the VFIO device state as:
+ (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
+
+Live migration save path
+------------------------
+ QEMU normal running state
+ (RUNNING, _NONE, _RUNNING)
+ |
+ migrate_init spawns migration_thread
+ Migration thread then calls each device's .save_setup()
+ (RUNNING, _SETUP, _RUNNING|_SAVING)
+ |
+ (RUNNING, _ACTIVE, _RUNNING|_SAVING)
+ If device is active, get pending_bytes by .save_live_pending()
+ if total pending_bytes >= threshold_size, call .save_live_iterate()
+ Data of VFIO device for pre-copy phase is copied
+ Iterate till total pending bytes converge and are less than threshold
+ |
+ On migration completion, vCPUs stop and .save_live_complete_precopy is called
+ for each active device. The VFIO device is then transitioned to _SAVING state
+ (FINISH_MIGRATE, _DEVICE, _SAVING)
+ |
+For VFIO device, iterate in .save_live_complete_precopy until pending data is 0
+ (FINISH_MIGRATE, _DEVICE, _STOPPED)
+ |
+ (FINISH_MIGRATE, _COMPLETED, _STOPPED)
+ Migration thread schedules cleanup bottom half and exits
+
+Live migration resume path
+--------------------------
+
+ Incoming migration calls .load_setup for each device
+ (RESTORE_VM, _ACTIVE, _STOPPED)
+ |
+ For each device, .load_state is called for that device section data
+ (RESTORE_VM, _ACTIVE, _RESUMING)
+ |
+ At the end, .load_cleanup is called for each device and vCPUs are started
+ |
+ (RUNNING, _NONE, _RUNNING)
+
+Postcopy
+========
+Postcopy migration is not supported for VFIO devices.
--
2.7.0
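
The save path in the document above iterates in pre-copy while pending_bytes is at or above the threshold, then drains the remainder in ``.save_live_complete_precopy``. A minimal sketch of that control flow follows. It is not QEMU code: ``struct sketch_device`` and all ``sketch_`` functions are hypothetical stand-ins for the vendor driver and the save hooks, and it assumes the guest does not dirty more device state while pre-copy runs, which a real device may do.

```c
#include <stdint.h>

/* Hypothetical stand-in for a vendor driver behind the migration region. */
struct sketch_device {
    uint64_t pending; /* bytes the driver still has to save */
    uint64_t chunk;   /* most bytes handed out per iteration */
};

/* Plays the role of .save_live_pending(): report the remaining bytes. */
static inline uint64_t sketch_save_live_pending(const struct sketch_device *d)
{
    return d->pending;
}

/* Plays the role of .save_live_iterate(): copy one chunk, return its size. */
static inline uint64_t sketch_save_live_iterate(struct sketch_device *d)
{
    uint64_t n = d->pending < d->chunk ? d->pending : d->chunk;

    d->pending -= n;
    return n;
}

/*
 * Pre-copy phase: iterate while pending bytes are at or above the threshold.
 * A device that reports zero pending bytes skips this phase for any non-zero
 * threshold. Returns the number of bytes copied.
 */
static inline uint64_t sketch_precopy(struct sketch_device *d,
                                      uint64_t threshold)
{
    uint64_t copied = 0;

    while (sketch_save_live_pending(d) >= threshold) {
        uint64_t n = sketch_save_live_iterate(d);

        if (n == 0) {
            break; /* no progress, do not spin */
        }
        copied += n;
    }
    return copied;
}

/*
 * Stop-and-copy phase: drain until pending bytes reach zero, as described
 * for .save_live_complete_precopy. Returns the number of bytes copied.
 */
static inline uint64_t sketch_complete_precopy(struct sketch_device *d)
{
    uint64_t copied = 0;

    while (sketch_save_live_pending(d) > 0) {
        uint64_t n = sketch_save_live_iterate(d);

        if (n == 0) {
            break; /* no progress, do not spin */
        }
        copied += n;
    }
    return copied;
}
```

With 1000 pending bytes, a chunk size of 300 and a threshold of 200, ``sketch_precopy`` copies 900 bytes in three iterations and ``sketch_complete_precopy`` copies the remaining 100 once the guest is stopped.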