Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs
From: Fabiano Rosas
Subject: Re: [PATCH v3 10/10] tests/migration-test: Add a test for postcopy hangs during RECOVER
Date: Thu, 05 Oct 2023 10:24:54 -0300
Peter Xu <peterx@redhat.com> writes:
> From: Fabiano Rosas <farosas@suse.de>
>
> To do so, create two paired sockets, but make them not provide real data.
> Feed those fake sockets to src/dst QEMUs for recovery to let them go into
> RECOVER stage without coming out. Test that we can always kick them out and
> recover again with the right ports.
>
> This patch is based on Fabiano's version here:
>
> https://lore.kernel.org/r/877cowmdu0.fsf@suse.de
>
> Signed-off-by: Fabiano Rosas <farosas@suse.de>
> [peterx: write commit message, remove case 1, fix bugs, and more]
> Signed-off-by: Peter Xu <peterx@redhat.com>
> ---
> tests/qtest/migration-test.c | 94 ++++++++++++++++++++++++++++++++++++
> 1 file changed, 94 insertions(+)
>
> diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
> index 46f1c275a2..fb7a3765e4 100644
> --- a/tests/qtest/migration-test.c
> +++ b/tests/qtest/migration-test.c
> @@ -729,6 +729,7 @@ typedef struct {
>      /* Postcopy specific fields */
>      void *postcopy_data;
>      bool postcopy_preempt;
> +    bool postcopy_recovery_test_fail;
>  } MigrateCommon;
>
>  static int test_migrate_start(QTestState **from, QTestState **to,
> @@ -1381,6 +1382,78 @@ static void test_postcopy_preempt_tls_psk(void)
>  }
>  #endif
>
> +static void wait_for_postcopy_status(QTestState *one, const char *status)
> +{
> +    wait_for_migration_status(one, status,
> +                              (const char * []) { "failed", "active",
> +                                                  "completed", NULL });
> +}
> +
> +static void postcopy_recover_fail(QTestState *from, QTestState *to)
> +{
> +    int ret, pair1[2], pair2[2];
> +    char c;
> +
> +    /* Create two unrelated socketpairs */
> +    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair1);
> +    g_assert_cmpint(ret, ==, 0);
> +
> +    ret = qemu_socketpair(PF_LOCAL, SOCK_STREAM, 0, pair2);
> +    g_assert_cmpint(ret, ==, 0);
> +
> +    /*
> +     * Give the guests unpaired ends of the sockets, so they'll all block
> +     * on reading. This mimics a wrong channel being established.
> +     */
> +    qtest_qmp_fds_assert_success(from, &pair1[0], 1,
> +                                 "{ 'execute': 'getfd',"
> +                                 " 'arguments': { 'fdname': 'fd-mig' }}");
> +    qtest_qmp_fds_assert_success(to, &pair2[0], 1,
> +                                 "{ 'execute': 'getfd',"
> +                                 " 'arguments': { 'fdname': 'fd-mig' }}");
> +
> +    /*
> +     * Write the 1st byte as QEMU_VM_COMMAND (0x8) for the dest socket, to
> +     * emulate the 1st byte of a real recovery, but stop there to keep
> +     * dest QEMU in RECOVER. This is needed so that we can kick off the
> +     * recover process on dest QEMU (by triggering the G_IO_IN event).
> +     *
> +     * NOTE: this trick is not needed on src QEMUs, because src doesn't
> +     * rely on a pre-existing G_IO_IN event, so it will always trigger the
> +     * upcoming recovery anyway even if it can read nothing.
> +     */
> +#define QEMU_VM_COMMAND 0x08
> +    c = QEMU_VM_COMMAND;
> +    ret = send(pair2[1], &c, 1, 0);
> +    g_assert_cmpint(ret, ==, 1);
> +
> +    migrate_recover(to, "fd:fd-mig");
> +    migrate_qmp(from, "fd:fd-mig", "{'resume': true}");
> +
> +    /*
> +     * Make sure both QEMU instances will go into RECOVER stage, then test
> +     * kicking them out using migrate-pause.
> +     */
> +    wait_for_postcopy_status(from, "postcopy-recover");
> +    wait_for_postcopy_status(to, "postcopy-recover");
Is this wait out of place? I think we're trying to resume too fast after
migrate_recover():
# {
# "error": {
# "class": "GenericError",
# "desc": "Cannot resume if there is no paused migration"
# }
# }
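A minimal sketch of the wait semantics involved, assuming wait_for_migration_status() keeps polling query-migrate until the goal status appears and fails the test if one of the listed "ungoal" statuses ("failed", "active", "completed" in wait_for_postcopy_status()) shows up first. This is a self-contained model, not the qtest helper itself: FakeMigration, fake_query_status() and wait_for_status_model() are hypothetical names, and the scripted status list stands in for real query-migrate replies. It does not model the ordering of migrate_recover() and the resume command, which is what the error above is about.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical stand-in for polling 'query-migrate': replays a fixed,
 * NULL-terminated, non-empty list of statuses, then keeps reporting the
 * last one.
 */
typedef struct {
    const char *const *statuses;
    size_t pos;
} FakeMigration;

static const char *fake_query_status(FakeMigration *m)
{
    const char *s = m->statuses[m->pos];

    assert(s != NULL);
    if (m->statuses[m->pos + 1] != NULL) {
        m->pos++; /* advance, but stick on the final status */
    }
    return s;
}

static bool status_in(const char *s, const char *const *list)
{
    for (size_t i = 0; list[i] != NULL; i++) {
        if (strcmp(s, list[i]) == 0) {
            return true;
        }
    }
    return false;
}

/*
 * Simplified model of waiting for a migration status: returns 0 once
 * 'goal' is observed, -1 if one of 'ungoals' is seen first, and -2 if
 * 'goal' is not seen within 'max_polls' polls (the bound is a
 * simplification; a real test would wait until the qtest timeout).
 */
static int wait_for_status_model(FakeMigration *m, const char *goal,
                                 const char *const *ungoals, int max_polls)
{
    assert(max_polls > 0);
    for (int i = 0; i < max_polls; i++) {
        const char *s = fake_query_status(m);

        if (strcmp(s, goal) == 0) {
            return 0;
        }
        if (status_in(s, ungoals)) {
            return -1;
        }
    }
    return -2;
}
```

With a scripted sequence of "postcopy-paused" followed by "postcopy-recover", waiting for "postcopy-recover" returns 0; if "failed" appears before it, the wait returns -1 instead of succeeding.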
> +
> +    /*
> +     * This would be issued by the admin upon noticing the hang; we should
> +     * make sure we're able to kick this out.
> +     */
> +    migrate_pause(from);
> +    wait_for_postcopy_status(from, "postcopy-paused");
> +
> +    /* Do the same test on dest */
> +    migrate_pause(to);
> +    wait_for_postcopy_status(to, "postcopy-paused");
> +
> +    close(pair1[0]);
> +    close(pair1[1]);
> +    close(pair2[0]);
> +    close(pair2[1]);
> +}
> +
>  static void test_postcopy_recovery_common(MigrateCommon *args)
>  {
>      QTestState *from, *to;
> @@ -1420,6 +1493,15 @@ static void test_postcopy_recovery_common(MigrateCommon *args)
>                                (const char * []) { "failed", "active",
>                                                    "completed", NULL });
>
> +    if (args->postcopy_recovery_test_fail) {
> +        /*
> +         * Test when a wrong socket is specified for recovery, and then the
> +         * ability to kick it out, and continue with a correct socket.
> +         */
> +        postcopy_recover_fail(from, to);
> +        /* continue with a good recovery */
> +    }
> +
>      /*
>       * Create a new socket to emulate a new channel that is different
>       * from the broken migration channel; tell the destination to
> @@ -1459,6 +1541,15 @@ static void test_postcopy_recovery_compress(void)
>      test_postcopy_recovery_common(&args);
>  }
>
> +static void test_postcopy_recovery_double_fail(void)
> +{
> +    MigrateCommon args = {
> +        .postcopy_recovery_test_fail = true,
> +    };
> +
> +    test_postcopy_recovery_common(&args);
> +}
> +
>  #ifdef CONFIG_GNUTLS
>  static void test_postcopy_recovery_tls_psk(void)
>  {
> @@ -2841,6 +2932,9 @@ int main(int argc, char **argv)
>              qtest_add_func("/migration/postcopy/recovery/compress/plain",
>                             test_postcopy_recovery_compress);
>          }
> +        qtest_add_func("/migration/postcopy/recovery/double-failures",
> +                       test_postcopy_recovery_double_fail);
> +
>      }
>
>      qtest_add_func("/migration/bad_dest", test_baddest);
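The fd trick in postcopy_recover_fail() can be reproduced outside QEMU with plain POSIX calls. The sketch below is an illustration under stated assumptions, not part of the patch: it calls socketpair() and poll() directly instead of qemu_socketpair() and the GLib main loop, and treats POLLIN readiness as the analogue of the G_IO_IN event mentioned in the patch comment. FakeChannelProbe, has_pending_data() and probe_fake_channels() are hypothetical names. The src end of pair1 never becomes readable, while the dst end of pair2 becomes readable once, for the single QEMU_VM_COMMAND byte, and then goes quiet.

```c
#include <assert.h>
#include <poll.h>
#include <stdbool.h>
#include <sys/socket.h>
#include <unistd.h>

/* Same value the patch writes as the first byte of a recovery stream. */
#define QEMU_VM_COMMAND 0x08

/* What a reader on each "QEMU" end of the fake channels would observe. */
typedef struct {
    bool src_readable;       /* src end (pair1[0]) has data queued */
    bool dst_readable;       /* dst end (pair2[0]) has data queued */
    int dst_first_byte;      /* byte read from the dst end, or -1 */
    bool dst_readable_after; /* dst end still has data after that byte */
} FakeChannelProbe;

/* Non-blocking check for POLLIN, roughly what G_IO_IN reports in GLib. */
static bool has_pending_data(int fd)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };

    return poll(&pfd, 1, 0) == 1 && (pfd.revents & POLLIN);
}

/*
 * Hypothetical recreation of the fd setup in postcopy_recover_fail():
 * two unrelated socketpairs, with one command byte written only towards
 * the dst end. Returns 0 on success, -1 if a syscall fails.
 */
static int probe_fake_channels(FakeChannelProbe *out)
{
    int pair1[2], pair2[2];
    int ret = -1;
    char c = QEMU_VM_COMMAND;

    assert(out != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair1) < 0) {
        return -1;
    }
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, pair2) < 0) {
        goto close_pair1;
    }

    /* Emulate only the first byte of a real recovery on the dst side. */
    if (send(pair2[1], &c, 1, 0) != 1) {
        goto close_pair2;
    }

    /* Nobody ever writes to pair1[1], so the src end stays idle. */
    out->src_readable = has_pending_data(pair1[0]);

    /* The dst end sees exactly one byte and then nothing more. */
    out->dst_readable = has_pending_data(pair2[0]);
    out->dst_first_byte = -1;
    if (out->dst_readable && recv(pair2[0], &c, 1, 0) == 1) {
        out->dst_first_byte = (unsigned char)c;
    }
    out->dst_readable_after = has_pending_data(pair2[0]);
    ret = 0;

close_pair2:
    close(pair2[0]);
    close(pair2[1]);
close_pair1:
    close(pair1[0]);
    close(pair1[1]);
    return ret;
}
```

On a POSIX system, I'd expect probe_fake_channels() to report src_readable false, dst_readable true, dst_first_byte equal to 0x08 and dst_readable_after false, which matches the "one byte, then stall" behaviour the patch relies on to keep dest QEMU in RECOVER.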