From: Avihai Horon
Subject: Re: [PATCH for-8.2 v3 1/6] vfio/migration: Move from STOP_COPY to STOP in vfio_save_cleanup()
Date: Tue, 8 Aug 2023 09:23:09 +0300
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Thunderbird/102.14.0
On 07/08/2023 18:53, Cédric Le Goater wrote:
> [ Adding Juan and Peter for their awareness ]
>
> On 8/2/23 10:14, Avihai Horon wrote:
>> Changing the device state from STOP_COPY to STOP can take time as the
>> device may need to free resources and do other operations as part of the
>> transition. Currently, this is done in vfio_save_complete_precopy() and
>> therefore it is counted in the migration downtime.
>>
>> To avoid this, change the device state from STOP_COPY to STOP in
>> vfio_save_cleanup(), which is called after migration has completed and
>> thus is not part of migration downtime.
>
> What bothers me is that this looks like a device specific optimization

True, currently it helps mlx5, but this change is based on the assumption that, in general, VFIO devices are likely to free resources when transitioning from STOP_COPY to STOP.
So I think this is a good change to have in any case.

> and we are losing the error part.

I don't think we lose the error part.
AFAIU, the crucial part is transitioning to STOP_COPY and sending the final data.
If that's done successfully, then migration is successful.
The STOP_COPY->STOP transition is done as part of the cleanup flow, after the migration is completed, i.e., a failure in it does not affect the success of migration. Furthermore, if there is an error in the STOP_COPY->STOP transition, it is reported by vfio_migration_set_state().
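The ordering argument above can be sketched as a toy C model. It assumes only what the thread states: the migration result is decided once the final data has been sent in STOP_COPY, and the cleanup handler runs afterwards. All `toy_*` names are hypothetical. In the actual patch the failure is reported by vfio_migration_set_state() and the device is reset via the ERROR recover state; this sketch reduces that to an error counter.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical, simplified states named after the ones in the patch. */
enum toy_state { TOY_RUNNING, TOY_STOP_COPY, TOY_STOP };

struct toy_dev {
    enum toy_state state;
    bool fail_stop;              /* inject a STOP_COPY -> STOP failure */
    unsigned int cleanup_errors; /* failures reported during cleanup   */
};

/* Stand-in for the final-data step: the migration result is decided here. */
static int toy_complete_precopy(struct toy_dev *d, bool final_data_ok)
{
    assert(d->state == TOY_STOP_COPY);
    return final_data_ok ? 0 : -1;
}

/*
 * Stand-in for the cleanup handler. It runs after the result is known and
 * returns void, so a failed STOP_COPY -> STOP transition can only be
 * reported, not turned into a migration failure. (The real patch also
 * resets the device on failure; that is not modelled here.)
 */
static void toy_save_cleanup(struct toy_dev *d)
{
    if (d->state != TOY_STOP_COPY) {
        return;
    }
    if (d->fail_stop) {
        fprintf(stderr, "toy: STOP_COPY -> STOP failed (reported only)\n");
        d->cleanup_errors++;
        return;
    }
    d->state = TOY_STOP;
}

/* Drive one simplified migration; the return value is fixed before cleanup. */
static int toy_migrate(struct toy_dev *d, bool final_data_ok)
{
    d->state = TOY_STOP_COPY;
    int ret = toy_complete_precopy(d, final_data_ok);
    toy_save_cleanup(d);
    return ret;
}
```

A device with `fail_stop` set still yields a successful `toy_migrate()` when the final data was sent, with the failure visible only in `cleanup_errors`, which mirrors the point that cleanup errors are reported but do not change the outcome.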

> I wonder if we could use the PRECOPY_NOTIFY_CLEANUP notifier instead and
> modify qemu_savevm_state_cleanup() to return the error which could then be
> handled by the caller.

qemu_savevm_state_cleanup() is called as part of the cleanup flow, after the migration result has already been determined, so I don't think modifying it to return the error will give us added value.
Unless I missed something?

> No need to resend the whole series. I think 2-6 are good for merge, I will
> probably push them on vfio-next when -rc3 is out.

Great, thanks!

>> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
>> ---
>>  hw/vfio/migration.c | 19 +++++++++++++------
>>  1 file changed, 13 insertions(+), 6 deletions(-)
>>
>> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
>> index 2674f4bc47..8acd182a8b 100644
>> --- a/hw/vfio/migration.c
>> +++ b/hw/vfio/migration.c
>> @@ -383,6 +383,19 @@ static void vfio_save_cleanup(void *opaque)
>>      VFIODevice *vbasedev = opaque;
>>      VFIOMigration *migration = vbasedev->migration;
>>
>> +    /*
>> +     * Changing device state from STOP_COPY to STOP can take time. Do it here,
>> +     * after migration has completed, so it won't increase downtime.
>> +     */
>> +    if (migration->device_state == VFIO_DEVICE_STATE_STOP_COPY) {
>> +        /*
>> +         * If setting the device in STOP state fails, the device should be
>> +         * reset. To do so, use ERROR state as a recover state.
>> +         */
>> +        vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> +                                 VFIO_DEVICE_STATE_ERROR);
>> +    }
>> +
>>      g_free(migration->data_buffer);
>>      migration->data_buffer = NULL;
>>      migration->precopy_init_size = 0;
>> @@ -508,12 +521,6 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
>>          return ret;
>>      }
>>
>> -    /*
>> -     * If setting the device in STOP state fails, the device should be reset.
>> -     * To do so, use ERROR state as a recover state.
>> -     */
>> -    ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_STOP,
>> -                                   VFIO_DEVICE_STATE_ERROR);
>>      trace_vfio_save_complete_precopy(vbasedev->name, ret);
>>
>>      return ret;
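The comments in the patch rely on vfio_migration_set_state() taking both a target state and a recover state. What follows is a hedged C model of those semantics, not QEMU's implementation. It assumes the helper tries the target state first, falls back to the recover state on failure, treats ERROR as "reset the device instead", and that a reset leaves the device RUNNING. The type `toy_vdev`, the function `toy_set_state()` and the failure-injection flags are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical states named after the VFIO migration states in the patch. */
enum toy_mig { TM_RUNNING, TM_STOP, TM_STOP_COPY, TM_ERROR };

struct toy_vdev {
    enum toy_mig state;
    bool fail_new;       /* inject a failure entering new_state     */
    bool fail_recover;   /* inject a failure entering recover_state */
    unsigned int resets; /* number of device resets performed       */
};

/* Hypothetical device reset: assumed to leave the device RUNNING. */
static void toy_reset(struct toy_vdev *d)
{
    d->resets++;
    d->state = TM_RUNNING;
}

/*
 * Sketch of "enter new_state, otherwise fall back to recover_state".
 * A recover_state of TM_ERROR means "do not try to recover, reset instead".
 * Returns 0 if new_state was reached, -1 otherwise.
 */
static int toy_set_state(struct toy_vdev *d, enum toy_mig new_state,
                         enum toy_mig recover_state)
{
    /* ERROR is never a valid target state in this model. */
    assert(new_state != TM_ERROR);

    if (!d->fail_new) {
        d->state = new_state;
        return 0;
    }
    if (recover_state == TM_ERROR || d->fail_recover) {
        toy_reset(d);
        return -1;
    }
    d->state = recover_state;
    return -1;
}
```

Calling `toy_set_state(d, TM_STOP, TM_ERROR)` on a device whose transition fails leaves it reset and RUNNING. The patch's cleanup path likewise ignores the return value, so in this model such a failure is handled by the reset rather than by failing the migration.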