qemu-devel

Re: [Qemu-devel] [PATCH RFC 0/2] Fix migration issues


From: Fei Li
Subject: Re: [Qemu-devel] [PATCH RFC 0/2] Fix migration issues
Date: Thu, 25 Oct 2018 17:04:00 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.9.1



On 10/25/2018 05:27 AM, Peter Xu wrote:
> On Mon, Oct 22, 2018 at 07:08:52PM +0800, Fei Li wrote:
>> Hi,
>> These two patches fix live migration issues. The first is about
>> multifd, and the second fixes some error handling.
>>
>> But I have a question about using multifd migration.
>> In the current code, when multifd is used during migration and an
>> error occurs before the destination has received all the new channels
>> (I mean via multifd_recv_new_channel(ioc)), the destination does not
>> exit but keeps waiting (it hangs in recvmsg() in
>> qio_channel_socket_readv()) until the source exits.
>>
>> My question is about the state of the destination host if it fails
>> during this period. I did a test: after applying patch [1/2], if
>> multifd_new_send_channel_async() fails, the destination host hangs for
>> a while and then pops up a window saying
>>      "'QEMU (...) [stopped]' is not responding.
>>      You may choose to wait a short while for it to continue or force
>>      the application to quit entirely."
>> But after I close the window, the QEMU on the destination still hangs
>> there until I explicitly kill the QEMU on the source.
>>
>> The source host keeps running as expected, but I suspect the hang on
>> the destination is not the right behavior.
>> Could someone kindly give some suggestions on this? Thanks a lot.
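The hang described above matches the ordinary semantics of a blocking stream-socket read: recvmsg() returns only when data arrives, when the peer closes its end (return value 0), or when an error occurs. The C sketch below illustrates this with plain POSIX sockets rather than QEMU's QIOChannel layer. The function name read_with_timeout(), its result enum, and the use of an SO_RCVTIMEO receive timeout are assumptions for illustration only; neither patch in this thread does this.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/time.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <unistd.h>

/* Hypothetical classification of one bounded read on a stream socket. */
enum read_result {
    READ_GOT_DATA,     /* at least one byte arrived */
    READ_PEER_CLOSED,  /* recvmsg() returned 0: the peer closed its end */
    READ_TIMED_OUT,    /* no data before the receive timeout expired */
    READ_ERROR         /* any other failure */
};

/*
 * Wait at most timeout_sec seconds for data on fd (illustrative helper).
 * Without SO_RCVTIMEO, the recvmsg() below would block until data arrives
 * or the peer closes, which is analogous to the destination-side hang.
 */
static enum read_result read_with_timeout(int fd, char *buf, size_t len,
                                          int timeout_sec)
{
    struct timeval tv = { .tv_sec = timeout_sec, .tv_usec = 0 };
    struct iovec iov = { .iov_base = buf, .iov_len = len };
    struct msghdr msg;
    ssize_t n;

    assert(buf != NULL && len > 0);
    if (setsockopt(fd, SOL_SOCKET, SO_RCVTIMEO, &tv, sizeof(tv)) < 0) {
        return READ_ERROR;
    }

    memset(&msg, 0, sizeof(msg));
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* Retry if a signal interrupts the wait. */
    do {
        n = recvmsg(fd, &msg, 0);
    } while (n < 0 && errno == EINTR);

    if (n > 0) {
        return READ_GOT_DATA;
    }
    if (n == 0) {
        return READ_PEER_CLOSED;
    }
    if (errno == EAGAIN || errno == EWOULDBLOCK) {
        return READ_TIMED_OUT;
    }
    return READ_ERROR;
}
```

For example, on an AF_UNIX socketpair() whose peer stays open but silent, a call with a 1-second timeout returns READ_TIMED_OUT, and once the peer closes it returns READ_PEER_CLOSED. Without the timeout, the first of those calls would simply block, much like the destination waiting until the source exits.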
> Note that KVM Forum is in progress (it ends this week), so responses
> from anyone might be slow.
Thanks for the kind reminder. :)
> I think what you described is normal: we can't guarantee that the
> network is always stable. Normally I would expect the migration to
> fail, but that doesn't matter much since this is precopy, so we lose
> nothing. So I'm curious: when the error you mentioned happens (e.g.,
> the total number of channels is N but only M channels got connected,
> with M < N), could you simply kill the destination? Then, AFAIU, the
> source can just continue to run, right?
Yes. For the M < N situation, IMO the destination can simply be killed
by calling exit(EXIT_FAILURE) when it fails to receive a packet on some
channel. I have tested the code below: the source continues to run and
the destination exits.
If the code/idea below is acceptable, I'd like to send it as a separate
patch to fix the hang.

@@ -1325,22 +1325,24 @@ bool multifd_recv_all_channels_created(void)
 /* Return true if multifd is ready for the migration, otherwise false */
 bool multifd_recv_new_channel(QIOChannel *ioc)
 {
+    MigrationIncomingState *mis = migration_incoming_get_current();
     MultiFDRecvParams *p;
     Error *local_err = NULL;
     int id;

     id = multifd_recv_initial_packet(ioc, &local_err);
     if (id < 0) {
-        multifd_recv_terminate_threads(local_err);
-        return false;
+        error_prepend(&local_err,
+                      "failed to receive packet via multifd channel %d: ",
+                      multifd_recv_state->count);
+        goto fail;
     }

     p = &multifd_recv_state->params[id];
     if (p->c != NULL) {
         error_setg(&local_err, "multifd: received id '%d' already setup'",
                    id);
-        multifd_recv_terminate_threads(local_err);
-        return false;
+        goto fail;
     }
     p->c = ioc;
     object_ref(OBJECT(ioc));
@@ -1352,6 +1354,12 @@ bool multifd_recv_new_channel(QIOChannel *ioc)
                        QEMU_THREAD_JOINABLE);
     atomic_inc(&multifd_recv_state->count);
     return multifd_recv_state->count == migrate_multifd_channels();
+fail:
+    multifd_recv_terminate_threads(local_err);
+    error_report_err(local_err);
+    qemu_fclose(mis->from_src_file);
+    mis->from_src_file = NULL;
+    exit(EXIT_FAILURE);
 }
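To make the M < N case concrete, here is a hypothetical, heavily simplified C model of the bookkeeping that multifd_recv_new_channel() performs: each incoming channel carries an id, a duplicate or invalid id is an error, and the destination is ready only once all N expected channels have registered. The struct recv_state type, register_channel(), MAX_CHANNELS and the id range check are illustrative assumptions, not QEMU code. Where the patch terminates the threads, closes the main channel and calls exit(EXIT_FAILURE), this sketch only returns CHAN_FAILED so the caller can decide how to tear down.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

#define MAX_CHANNELS 16  /* hypothetical upper bound for this sketch */

/* Hypothetical stand-in for the destination's multifd receive state. */
struct recv_state {
    int expected;             /* N: channels the source is expected to open */
    int count;                /* M: channels successfully registered so far */
    bool seen[MAX_CHANNELS];  /* ids that have already been set up */
};

enum chan_status {
    CHAN_WAITING,    /* registered, but still M < N */
    CHAN_ALL_READY,  /* M == N: all channels are connected */
    CHAN_FAILED      /* invalid or duplicate id: caller must tear down */
};

/*
 * Register a channel whose initial packet carried `id`; a negative id
 * models a failed initial-packet read.  Both error branches jump to the
 * single `fail` label, mirroring the structure of the proposed patch.
 */
static enum chan_status register_channel(struct recv_state *s, int id)
{
    assert(s->expected > 0 && s->expected <= MAX_CHANNELS);

    if (id < 0 || id >= s->expected) {
        fprintf(stderr, "multifd: bad or missing channel id %d after %d channels\n",
                id, s->count);
        goto fail;
    }
    if (s->seen[id]) {
        fprintf(stderr, "multifd: received id '%d' already set up\n", id);
        goto fail;
    }

    s->seen[id] = true;
    s->count++;
    return s->count == s->expected ? CHAN_ALL_READY : CHAN_WAITING;

fail:
    /* The patch terminates threads, closes the main channel and exits here;
     * this sketch only reports the failure to its caller. */
    return CHAN_FAILED;
}
```

With expected = 2, registering id 0 yields CHAN_WAITING, registering id 0 again yields CHAN_FAILED, and registering id 1 then yields CHAN_ALL_READY. Funnelling both error branches through one fail label, as the patch does, keeps the teardown sequence in a single place.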

Have a nice day, thanks a lot
Fei

>> Fei Li (2):
>>    migration: fix the multifd code
>>    migration: fix some error handling
>>
>>   migration/migration.c    |  5 +----
>>   migration/postcopy-ram.c |  3 +++
>>   migration/ram.c          | 33 +++++++++++++++++++++++----------
>>   migration/ram.h          |  2 +-
>>   4 files changed, 28 insertions(+), 15 deletions(-)
>>
>> --
>> 2.13.7
>>
> Regards,




