Re: [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content o

From: Li Zhijian
Subject: Re: [Qemu-devel] [TCG only][Migration Bug? ] Occasionally, the content of VM's memory is inconsistent between Source and Destination of migration
Date: Thu, 3 Dec 2015 18:23:02 +0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:38.0) Gecko/20100101 Thunderbird/38.4.0

On 12/03/2015 05:24 PM, Dr. David Alan Gilbert wrote:
* Li Zhijian (address@hidden) wrote:
Hi all,

Does anybody remember the similar issue posted by hailiang months ago?
At least two migration bugs have been fixed since then.
Yes, I wondered what happened to that.

And now we have found the same issue on a TCG VM (KVM is fine): after migration,
the content of the VM's memory is inconsistent.
Hmm, TCG only - I don't know much about that; but I guess something must
be accessing memory without using the proper macros/functions so
it doesn't mark it as dirty.
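
The suspected failure mode can be illustrated with a toy model. This is only a sketch and not QEMU code: the names `toy_ram`, `toy_write_tracked`, `toy_write_untracked` and `toy_migrate_pass` are hypothetical, and the tiny page size is an assumption chosen for readability. It shows that a write which bypasses the dirty-marking path leaves the destination with a stale page, even though the migration pass itself behaves correctly.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE 16   /* assumption: tiny pages, for illustration only */
#define TOY_NPAGES    4

/* Hypothetical guest RAM with one dirty flag per page. */
struct toy_ram {
    unsigned char mem[TOY_NPAGES * TOY_PAGE_SIZE];
    bool dirty[TOY_NPAGES];
};

/* Correct path: update memory and mark the containing page dirty. */
void toy_write_tracked(struct toy_ram *r, size_t addr, unsigned char v)
{
    assert(addr < sizeof(r->mem));
    r->mem[addr] = v;
    r->dirty[addr / TOY_PAGE_SIZE] = true;
}

/* Buggy path: update memory but never set the dirty flag. */
void toy_write_untracked(struct toy_ram *r, size_t addr, unsigned char v)
{
    assert(addr < sizeof(r->mem));
    r->mem[addr] = v;
}

/*
 * One migration pass: copy every dirty page to dst and clear its flag.
 * Returns the number of pages sent.
 */
size_t toy_migrate_pass(struct toy_ram *src, unsigned char *dst)
{
    size_t sent = 0;

    for (size_t p = 0; p < TOY_NPAGES; p++) {
        if (src->dirty[p]) {
            memcpy(dst + p * TOY_PAGE_SIZE,
                   src->mem + p * TOY_PAGE_SIZE, TOY_PAGE_SIZE);
            src->dirty[p] = false;
            sent++;
        }
    }
    return sent;
}
```

In this model, a page written through `toy_write_untracked` is never picked up by `toy_migrate_pass`, so source and destination diverge without any error being reported.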

We added a patch to check the memory content; you can find it appended below.

Steps to reproduce:
1) apply the patch and re-build qemu
2) prepare the ubuntu guest and run memtest in grub.
source side:
x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine

destination side:
x86_64-softmmu/qemu-system-x86_64 -netdev tap,id=hn0 -device
e1000,id=net-pci0,netdev=hn0,mac=52:54:00:12:34:65 -boot c -drive
if=none,file=/home/lizj/ubuntu.raw,id=drive-virtio-disk0 -device
-vnc :7 -m 128 -smp 1 -device piix3-usb-uhci -device usb-tablet -qmp
tcp::4444,server,nowait -monitor stdio -cpu qemu64 -machine
pc-i440fx-2.3,accel=tcg,usb=off -incoming tcp:0:8881

3) start migration
With a 1000 Mbit/s NIC, migration finishes within 3 minutes.

at source:
(qemu) migrate tcp:
after saving ram complete
qemu-system-x86_64: end ram md5

at destination:
Completed load of VM with exit code 0 seq iteration 1264
Completed load of VM with exit code 0 seq iteration 1265
Completed load of VM with exit code 0 seq iteration 1266
qemu-system-x86_64: after loading state section id 2(ram)
qemu-system-x86_64: end ram md5
qemu-system-x86_64: qemu_loadvm_state: after cpu_synchronize_all_post_init

qemu-system-x86_64: end ram md5

This occurs occasionally and only on TCG machines. It seems that
some pages dirtied on the source side are not transferred to the destination.
This problem can be reproduced even if we disable virtio.

Is it OK for some pages not to be transferred to the destination during
migration? Or is it a bug?
I'm pretty sure that means it's a bug.  Hard to find though, I guess
at least memtest is smaller than a big OS.  I think I'd dump the whole
of memory on both sides, hexdump and diff them  - I'd guess it would
just be one byte/word different, maybe that would offer some idea what
wrote it.
I tried dumping and comparing them: more than 10 pages differ.
On the source side they contain random values, whereas on the destination they always contain 'FF' 'FB' 'EF' 'BF'...
Not all of the differing pages are contiguous.
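
The dump-and-compare step discussed above can be sketched as a small page-wise diff. This is a minimal sketch, not the tool actually used here: `diff_pages`, `print_page_ranges` and `DUMP_PAGE_SIZE` are hypothetical names, the 4 KiB page size is an assumption for an x86 guest, and the two buffers are assumed to hold equally sized raw RAM dumps from each side (obtained, for example, with the monitor's `pmemsave` command).

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DUMP_PAGE_SIZE 4096   /* assumption: 4 KiB guest pages */

/*
 * Compare two RAM images page by page.
 * Indices of differing pages are stored in out[] (at most max_out entries).
 * Returns the total number of differing pages, which may exceed max_out.
 */
size_t diff_pages(const unsigned char *src, const unsigned char *dst,
                  size_t len, size_t *out, size_t max_out)
{
    size_t ndiff = 0;

    assert(len % DUMP_PAGE_SIZE == 0);
    for (size_t off = 0; off < len; off += DUMP_PAGE_SIZE) {
        if (memcmp(src + off, dst + off, DUMP_PAGE_SIZE) != 0) {
            if (ndiff < max_out) {
                out[ndiff] = off / DUMP_PAGE_SIZE;
            }
            ndiff++;
        }
    }
    return ndiff;
}

/* Print differing pages, merging consecutive page indices into ranges. */
void print_page_ranges(const size_t *pages, size_t n)
{
    for (size_t i = 0; i < n; ) {
        size_t j = i;

        while (j + 1 < n && pages[j + 1] == pages[j] + 1) {
            j++;
        }
        printf("pages %zu-%zu (start offset 0x%zx)\n",
               pages[i], pages[j], pages[i] * DUMP_PAGE_SIZE);
        i = j + 1;
    }
}
```

Printing ranges rather than single pages makes it easier to see whether the differing pages cluster together or are scattered across guest memory.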



Any idea...

=================md5 check patch=============================

diff --git a/Makefile.target b/Makefile.target
index 962d004..e2cb8e9 100644
--- a/Makefile.target
+++ b/Makefile.target
@@ -139,7 +139,7 @@ obj-y += memory.o cputlb.o
  obj-y += memory_mapping.o
  obj-y += dump.o
  obj-y += migration/ram.o migration/savevm.o
-LIBS := $(libs_softmmu) $(LIBS)
+LIBS := $(libs_softmmu) $(LIBS) -lplumb

  # xen support
  obj-$(CONFIG_XEN) += xen-common.o
diff --git a/migration/ram.c b/migration/ram.c
index 1eb155a..3b7a09d 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2513,7 +2513,7 @@ static int ram_load(QEMUFile *f, void *opaque, int

-    DPRINTF("Completed load of VM with exit code %d seq iteration "
+    fprintf(stderr, "Completed load of VM with exit code %d seq iteration "
              "%" PRIu64 "\n", ret, seq_iter);
      return ret;
diff --git a/migration/savevm.c b/migration/savevm.c
index 0ad1b93..3feaa61 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -891,6 +891,29 @@ void qemu_savevm_state_header(QEMUFile *f)


+#include "exec/ram_addr.h"
+#include "qemu/rcu_queue.h"
+#include <clplumbing/md5.h>
+#define MD5_DIGEST_LENGTH 16
+static void check_host_md5(void)
+{
+    int i;
+    unsigned char md[MD5_DIGEST_LENGTH];
+    rcu_read_lock();
+    RAMBlock *block = QLIST_FIRST_RCU(&ram_list.blocks); /* Only check 'pc.ram' block */
+    rcu_read_unlock();
+    MD5(block->host, block->used_length, md);
+    for(i = 0; i < MD5_DIGEST_LENGTH; i++) {
+        fprintf(stderr, "%02x", md[i]);
+    }
+    fprintf(stderr, "\n");
+    error_report("end ram md5");
+}
+
  void qemu_savevm_state_begin(QEMUFile *f,
                               const MigrationParams *params)
@@ -1056,6 +1079,10 @@ void qemu_savevm_state_complete_precopy(QEMUFile *f, bool iterable_only)
          save_section_header(f, se, QEMU_VM_SECTION_END);

          ret = se->ops->save_live_complete_precopy(f, se->opaque);
+        fprintf(stderr, "after saving %s complete\n", se->idstr);
+        check_host_md5();
          trace_savevm_section_end(se->idstr, se->section_id, ret);
          save_section_footer(f, se);
          if (ret < 0) {
@@ -1791,6 +1818,11 @@ static int qemu_loadvm_state_main(QEMUFile *f, MigrationIncomingState *mis)
                               section_id, le->se->idstr);
                  return ret;
+            if (section_type == QEMU_VM_SECTION_END) {
+                error_report("after loading state section id %d(%s)",
+                             section_id, le->se->idstr);
+                check_host_md5();
+            }
              if (!check_section_footer(f, le)) {
                  return -EINVAL;
@@ -1901,6 +1933,8 @@ int qemu_loadvm_state(QEMUFile *f)

+    error_report("%s: after cpu_synchronize_all_post_init\n", __func__);
+    check_host_md5();

      return ret;

Dr. David Alan Gilbert / address@hidden / Manchester, UK


Best regards.
Li Zhijian (8555)
