qemu-devel

[PATCH v1] docs/devel: Add VFIO device migration documentation


From: Kirti Wankhede
Subject: [PATCH v1] docs/devel: Add VFIO device migration documentation
Date: Thu, 29 Oct 2020 11:23:11 +0530

Document the interfaces used for VFIO device migration and add the flow of
state changes during live migration with a VFIO device.

Signed-off-by: Kirti Wankhede <kwankhede@nvidia.com>
---
 MAINTAINERS                   |   1 +
 docs/devel/vfio-migration.rst | 119 ++++++++++++++++++++++++++++++++++++++++++
 2 files changed, 120 insertions(+)
 create mode 100644 docs/devel/vfio-migration.rst

diff --git a/MAINTAINERS b/MAINTAINERS
index 6a197bd358d6..6f3fcffc6b3d 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1728,6 +1728,7 @@ M: Alex Williamson <alex.williamson@redhat.com>
 S: Supported
 F: hw/vfio/*
 F: include/hw/vfio/
+F: docs/devel/vfio-migration.rst
 
 vfio-ccw
 M: Cornelia Huck <cohuck@redhat.com>
diff --git a/docs/devel/vfio-migration.rst b/docs/devel/vfio-migration.rst
new file mode 100644
index 000000000000..dab9127825e4
--- /dev/null
+++ b/docs/devel/vfio-migration.rst
@@ -0,0 +1,119 @@
+=====================
+VFIO device migration
+=====================
+
+VFIO devices use an iterative approach for migration because certain VFIO
+devices (e.g. GPUs) have a large amount of data to be transferred. The
+iterative pre-copy phase of migration allows the guest to continue running
+while the VFIO device state is transferred to the destination, which helps to
+reduce the total downtime of the VM. VFIO devices can choose to skip the
+pre-copy phase of migration by returning pending_bytes as zero during the
+pre-copy phase.
+
+A detailed description of the UAPI for VFIO device migration is in the comment
+above the ``vfio_device_migration_info`` structure definition in the header
+file linux-headers/linux/vfio.h.
+
+VFIO device hooks for the iterative approach:
+
+- A ``save_setup`` function that sets up the migration region, sets the _SAVING
+  flag in the VFIO device state and informs the VFIO IOMMU module to start
+  dirty page tracking.
+
+- A ``load_setup`` function that sets up the migration region on the
+  destination and sets the _RESUMING flag in the VFIO device state.
+
+- A ``save_live_pending`` function that reads pending_bytes from the vendor
+  driver, which indicates how much more data the vendor driver has yet to save
+  for the VFIO device.
+
+- A ``save_live_iterate`` function that reads the VFIO device's data from the
+  vendor driver through the migration region during the iterative phase.
+
+- A ``save_live_complete_precopy`` function that resets the _RUNNING flag in
+  the VFIO device state, saves the device config space, if any, and
+  iteratively copies the remaining data for the VFIO device until the
+  pending_bytes returned by the vendor driver is zero.
+
+- A ``load_state`` function that loads the config section and data sections
+  generated by the above save functions.
+
+- ``cleanup`` functions for both save and load that unmap the migration
+  region.
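The ``save_live_complete_precopy`` step above can be sketched in C as follows. This is a minimal illustration under stated assumptions, not QEMU's implementation: ``struct sketch_vendor_dev`` and ``sketch_read_chunk()`` are hypothetical stand-ins for the vendor driver's migration region, and the ``SKETCH_STATE_*`` values are intended to mirror the ``VFIO_DEVICE_STATE_*`` bits of the v1 migration UAPI in linux-headers/linux/vfio.h.

```c
#include <assert.h>
#include <stdint.h>

/* Intended to mirror VFIO_DEVICE_STATE_* in the v1 migration UAPI. */
#define SKETCH_STATE_STOP     0u
#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)

/* Hypothetical stand-in for a vendor driver's migration region. */
struct sketch_vendor_dev {
    uint32_t device_state;
    uint64_t pending_bytes;   /* data the vendor driver has yet to save */
    uint64_t max_chunk;       /* bytes the driver exposes per read */
};

/* Hypothetical read of one chunk from the migration region: consumes up to
 * max_chunk pending bytes and returns how many were copied. */
static uint64_t sketch_read_chunk(struct sketch_vendor_dev *dev)
{
    uint64_t n = dev->pending_bytes < dev->max_chunk ? dev->pending_bytes
                                                     : dev->max_chunk;
    dev->pending_bytes -= n;
    return n;
}

/* Sketch of save_live_complete_precopy: returns the number of bytes that
 * would be written to the migration stream. */
static uint64_t sketch_save_complete_precopy(struct sketch_vendor_dev *dev)
{
    uint64_t total = 0;

    assert(dev->max_chunk > 0);
    /* 1. Reset _RUNNING; _SAVING stays set so the driver keeps saving. */
    dev->device_state &= ~SKETCH_STATE_RUNNING;
    /* 2. The device config space would be saved here (omitted). */
    /* 3. Copy the remaining data until pending_bytes reaches zero. */
    while (dev->pending_bytes > 0) {
        total += sketch_read_chunk(dev);
    }
    return total;
}
```

Looping on pending_bytes rather than a precomputed size lets the vendor driver decide how much data remains at each read.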
+
+A VM state change handler is registered to change the VFIO device state based
+on VM state changes.
+
+Similarly, a migration state change notifier is registered to get a
+notification on migration state changes. These states are translated to VFIO
+device states and conveyed to the vendor driver.
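A sketch of how VM and migration states might be translated into VFIO device-state bits, using the tuples from the flow of state changes in this document. The ``enum sketch_phase`` and ``sketch_device_state()`` names are hypothetical simplifications; in QEMU the translation is split between the VM state change handler and the migration state notifier described above, and the ``SKETCH_STATE_*`` values are assumed to mirror ``VFIO_DEVICE_STATE_*``.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Intended to mirror VFIO_DEVICE_STATE_* in the v1 migration UAPI. */
#define SKETCH_STATE_STOP     0u
#define SKETCH_STATE_RUNNING  (1u << 0)
#define SKETCH_STATE_SAVING   (1u << 1)
#define SKETCH_STATE_RESUMING (1u << 2)

/* Hypothetical, simplified migration phases. */
enum sketch_phase {
    SKETCH_NO_MIGRATION,
    SKETCH_SAVE_SETUP_OR_ACTIVE,   /* pre-copy on the source */
    SKETCH_SAVE_STOP_AND_COPY,     /* vCPUs stopped, draining device data */
    SKETCH_SAVE_COMPLETED,
    SKETCH_LOAD_ACTIVE,            /* destination loading device data */
};

/* Map (VM running?, migration phase) to VFIO device-state bits. */
static uint32_t sketch_device_state(bool vm_running, enum sketch_phase phase)
{
    switch (phase) {
    case SKETCH_SAVE_SETUP_OR_ACTIVE:
        return SKETCH_STATE_RUNNING | SKETCH_STATE_SAVING;
    case SKETCH_SAVE_STOP_AND_COPY:
        return SKETCH_STATE_SAVING;
    case SKETCH_SAVE_COMPLETED:
        return SKETCH_STATE_STOP;
    case SKETCH_LOAD_ACTIVE:
        return SKETCH_STATE_RESUMING;
    case SKETCH_NO_MIGRATION:
    default:
        /* Outside migration the device simply follows the VM. */
        return vm_running ? SKETCH_STATE_RUNNING : SKETCH_STATE_STOP;
    }
}
```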
+
+System memory dirty pages tracking
+----------------------------------
+
+A ``log_sync`` memory listener callback is added to mark system memory pages
+that are used for DMA by the VFIO device as dirty. The dirty pages bitmap is
+queried per container. All pages pinned by the vendor driver through the
+vfio_pin_pages() external API have to be marked as dirty during migration.
+When there are CPU writes, CPU dirty page tracking can identify dirtied pages,
+but any page pinned by the vendor driver can also be written by the device.
+There is currently no device which has hardware support for dirty page
+tracking, so all pages which are pinned by the vendor driver are considered
+dirty.
+
+Dirty pages are tracked when the device is in the stop-and-copy phase. If
+pages were marked dirty during the pre-copy phase and their content
+transferred from source to destination, there would be no way to know which
+of them were dirtied again after being copied until the device stops. To avoid
+repeatedly copying the same content, pinned pages are marked dirty only during
+the stop-and-copy phase.
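The rule that pinned pages are reported dirty only during stop-and-copy can be sketched as follows, assuming a hypothetical 64-page guest with a one-word dirty bitmap. The bitmap may already hold pages dirtied by CPU writes; the hypothetical ``sketch_vfio_log_sync()`` only adds the pages pinned by the vendor driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_NPAGES 64u   /* hypothetical guest size, in pages */

/* One bit per guest page; bit i set means page i is dirty. */
struct sketch_bitmap {
    uint64_t bits;
};

static void sketch_set_dirty(struct sketch_bitmap *bm, size_t pfn)
{
    assert(pfn < SKETCH_NPAGES);
    bm->bits |= UINT64_C(1) << pfn;
}

/* Without hardware dirty tracking, every pinned page may have been written
 * by the device, so all of them are marked dirty, but only once the device
 * has stopped, to avoid resending them on every pre-copy pass. */
static void sketch_vfio_log_sync(struct sketch_bitmap *bm,
                                 const size_t *pinned, size_t npinned,
                                 bool stop_and_copy)
{
    if (!stop_and_copy) {
        return;
    }
    for (size_t i = 0; i < npinned; i++) {
        sketch_set_dirty(bm, pinned[i]);
    }
}
```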
+
+System memory dirty pages tracking when vIOMMU is enabled
+---------------------------------------------------------
+With vIOMMU, an IO virtual address range can get unmapped while in the
+pre-copy phase of migration. In that case, the unmap ioctl returns the pages
+pinned in that range and QEMU reports the corresponding guest physical pages
+as dirty.
+
+During the stop-and-copy phase, an IOMMU notifier is used to get a callback
+for mapped pages, and then the dirty pages bitmap is fetched from the VFIO
+IOMMU module for those mapped ranges.
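For the pre-copy unmap case, a hypothetical sketch of turning the pinned-page bitmap reported for an unmapped IOVA range into guest page frame numbers, assuming the range was backed by one contiguous guest-physical region and 4 KiB pages. ``struct sketch_unmap`` is illustrative only and is not the UAPI's layout for the unmap ioctl.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u   /* assumed page size */

/* Hypothetical description of one unmapped IOVA range. */
struct sketch_unmap {
    uint64_t iova;            /* start of the unmapped IOVA range */
    uint64_t gpa;             /* guest physical address it translated to */
    size_t npages;            /* range length in pages (at most 64 here) */
    uint64_t pinned_bitmap;   /* bit i: page at iova + i * page size pinned */
};

/* Write the guest page frame numbers of pinned pages to out_gfns; these are
 * the pages to report as dirty. Returns how many were written. */
static size_t sketch_report_unmap_dirty(const struct sketch_unmap *u,
                                        uint64_t *out_gfns, size_t max_out)
{
    size_t n = 0;

    assert(u->npages <= 64);
    for (size_t i = 0; i < u->npages && n < max_out; i++) {
        if (u->pinned_bitmap & (UINT64_C(1) << i)) {
            out_gfns[n++] = u->gpa / SKETCH_PAGE_SIZE + i;
        }
    }
    return n;
}
```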
+
+Flow of state changes during live migration
+===========================================
+Below is the flow of state changes during live migration, where the states in
+brackets represent the VM state, the migration state and the VFIO device state
+as::
+
+                (VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)
+
+Live migration save path
+------------------------
+
+::
+
+                        QEMU normal running state
+                        (RUNNING, _NONE, _RUNNING)
+                                    |
+                       migrate_init spawns migration_thread
+                Migration thread then calls each device's .save_setup()
+                        (RUNNING, _SETUP, _RUNNING|_SAVING)
+                                    |
+                        (RUNNING, _ACTIVE, _RUNNING|_SAVING)
+            If device is active, get pending_bytes by .save_live_pending()
+         if total pending_bytes >= threshold_size, call .save_live_iterate()
+                  Data of VFIO device for pre-copy phase is copied
+     Iterate till total pending bytes converge and are less than threshold
+                                    |
+   On migration completion, vCPUs are stopped and .save_live_complete_precopy
+   is called for each active device. The VFIO device is then transitioned to
+   the _SAVING state
+                    (FINISH_MIGRATE, _DEVICE, _SAVING)
+                                    |
+ For the VFIO device, iterate in .save_live_complete_precopy until pending data is 0
+                    (FINISH_MIGRATE, _DEVICE, _STOPPED)
+                                    |
+                    (FINISH_MIGRATE, _COMPLETED, _STOPPED)
+                Migration thread schedules cleanup bottom half and exits
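The pre-copy convergence check in the save path can be modelled as below. In this sketch ``struct sketch_dev`` and ``sketch_precopy()`` are hypothetical, and, unlike a real migration, the guest does not re-dirty data between rounds, so the loop always converges. A device that reports zero pending bytes is never iterated, matching the option to skip pre-copy.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-device model. */
struct sketch_dev {
    uint64_t pending;    /* what .save_live_pending() would report */
    uint64_t per_iter;   /* bytes one .save_live_iterate() call sends */
};

static uint64_t sketch_total_pending(const struct sketch_dev *devs, size_t n)
{
    uint64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        total += devs[i].pending;
    }
    return total;
}

/* Iterate while the total pending bytes are at least threshold_size.
 * Returns the number of iteration rounds performed. */
static unsigned sketch_precopy(struct sketch_dev *devs, size_t n,
                               uint64_t threshold_size)
{
    unsigned rounds = 0;

    assert(threshold_size > 0);
    for (size_t i = 0; i < n; i++) {
        assert(devs[i].per_iter > 0);   /* guarantees forward progress */
    }
    while (sketch_total_pending(devs, n) >= threshold_size) {
        for (size_t i = 0; i < n; i++) {
            uint64_t sent = devs[i].pending < devs[i].per_iter
                                ? devs[i].pending : devs[i].per_iter;
            devs[i].pending -= sent;   /* devices with 0 pending send nothing */
        }
        rounds++;
    }
    return rounds;
}
```

Whatever remains below ``threshold_size`` after the loop is what the stop-and-copy phase has to transfer via ``.save_live_complete_precopy``.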
+
+Live migration resume path
+--------------------------
+
+::
+
+             Incoming migration calls .load_setup for each device
+                        (RESTORE_VM, _ACTIVE, _STOPPED)
+                                    |
+    For each device, .load_state is called for that device section data
+                        (RESTORE_VM, _ACTIVE, _RESUMING)
+                                    |
+    At the end, .load_cleanup is called for each device and vCPUs are started
+                                    |
+                        (RUNNING, _NONE, _RUNNING)
+
+
+Postcopy
+========
+Postcopy migration is not supported for VFIO devices.
-- 
2.7.0



