[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH] migration: Don't serialize migration while can't switchover
From: |
Wang, Lei |
Subject: |
Re: [PATCH] migration: Don't serialize migration while can't switchover |
Date: |
Tue, 27 Feb 2024 11:16:38 +0800 |
User-agent: |
Mozilla Thunderbird |
On 2/22/2024 23:56, Avihai Horon wrote:
> Currently, migration code serializes device data sending during pre-copy
> iterative phase. As noted in the code comment, this is done to prevent
> faster changing device from sending its data over and over.
>
> However, with switchover-ack capability enabled, this behavior can be
> problematic and may prevent migration from converging. The problem lies
> in the fact that an earlier device may never finish sending its data and
> thus block other devices from sending theirs.
>
> This bug was observed in several VFIO migration scenarios where some
> workload on the VM prevented RAM from ever reaching a hard zero, not
> allowing VFIO initial pre-copy data to be sent, and thus destination
> could not ack switchover. Note that the same scenario, but without
> switchover-ack, would converge.
>
> Fix it by not serializing device data sending during pre-copy iterative
> phase if switchover was not acked yet.
Hi Avihai,
Can this bug be solved by ordering the priority of different device's handlers?
>
> Fixes: 1b4adb10f898 ("migration: Implement switchover ack logic")
> Signed-off-by: Avihai Horon <avihaih@nvidia.com>
> ---
> migration/savevm.h | 2 +-
> migration/migration.c | 4 ++--
> migration/savevm.c | 22 +++++++++++++++-------
> 3 files changed, 18 insertions(+), 10 deletions(-)
>
> diff --git a/migration/savevm.h b/migration/savevm.h
> index 74669733dd6..d4a368b522b 100644
> --- a/migration/savevm.h
> +++ b/migration/savevm.h
> @@ -36,7 +36,7 @@ void qemu_savevm_state_setup(QEMUFile *f);
> bool qemu_savevm_state_guest_unplug_pending(void);
> int qemu_savevm_state_resume_prepare(MigrationState *s);
> void qemu_savevm_state_header(QEMUFile *f);
> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy);
> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool
> can_switchover);
> void qemu_savevm_state_cleanup(void);
> void qemu_savevm_state_complete_postcopy(QEMUFile *f);
> int qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only,
> diff --git a/migration/migration.c b/migration/migration.c
> index ab21de2cadb..d8bfe1fb1b9 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -3133,7 +3133,7 @@ static MigIterateState
> migration_iteration_run(MigrationState *s)
> }
>
> /* Just another iteration step */
> - qemu_savevm_state_iterate(s->to_dst_file, in_postcopy);
> + qemu_savevm_state_iterate(s->to_dst_file, in_postcopy, can_switchover);
> return MIG_ITERATE_RESUME;
> }
>
> @@ -3216,7 +3216,7 @@ static MigIterateState
> bg_migration_iteration_run(MigrationState *s)
> {
> int res;
>
> - res = qemu_savevm_state_iterate(s->to_dst_file, false);
> + res = qemu_savevm_state_iterate(s->to_dst_file, false, true);
> if (res > 0) {
> bg_migration_completion(s);
> return MIG_ITERATE_BREAK;
> diff --git a/migration/savevm.c b/migration/savevm.c
> index d612c8a9020..3a012796375 100644
> --- a/migration/savevm.c
> +++ b/migration/savevm.c
> @@ -1386,7 +1386,7 @@ int qemu_savevm_state_resume_prepare(MigrationState *s)
> * 0 : We haven't finished, caller have to go again
> * 1 : We have finished, we can go to complete phase
> */
> -int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy)
> +int qemu_savevm_state_iterate(QEMUFile *f, bool postcopy, bool
> can_switchover)
> {
> SaveStateEntry *se;
> int ret = 1;
> @@ -1430,12 +1430,20 @@ int qemu_savevm_state_iterate(QEMUFile *f, bool
> postcopy)
> "%d(%s): %d",
> se->section_id, se->idstr, ret);
> qemu_file_set_error(f, ret);
> + return ret;
> }
> - if (ret <= 0) {
> - /* Do not proceed to the next vmstate before this one reported
> - completion of the current stage. This serializes the migration
> - and reduces the probability that a faster changing state is
> - synchronized over and over again. */
> +
> + if (ret == 0 && can_switchover) {
> + /*
> + * Do not proceed to the next vmstate before this one reported
> + * completion of the current stage. This serializes the migration
> + * and reduces the probability that a faster changing state is
> + * synchronized over and over again.
> + * Do it only if migration can switchover. If migration can't
> + * switchover yet, do proceed to let other devices send their
> data
> + * too, as this may be required for switchover to be acked and
> + * migration to converge.
> + */
> break;
> }
> }
> @@ -1724,7 +1732,7 @@ static int qemu_savevm_state(QEMUFile *f, Error **errp)
> qemu_savevm_state_setup(f);
>
> while (qemu_file_get_error(f) == 0) {
> - if (qemu_savevm_state_iterate(f, false) > 0) {
> + if (qemu_savevm_state_iterate(f, false, true) > 0) {
> break;
> }
> }