qemu-devel

Re: recent flakiness (intermittent hangs) of migration-test


From: Peter Xu
Subject: Re: recent flakiness (intermittent hangs) of migration-test
Date: Thu, 29 Oct 2020 16:28:10 -0400

On Thu, Oct 29, 2020 at 07:34:33PM +0000, Dr. David Alan Gilbert wrote:
> > Here's qemu process 3514:
> > Thread 5 (Thread 0x3ff4affd910 (LWP 3628)):
> > #0  0x000003ff94c8d936 in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa26cd74dc)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ff94c8d936 in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa26cd7488, cond=0x2aa26cd74b0)
> >     at pthread_cond_wait.c:502
> > #2  0x000003ff94c8d936 in __pthread_cond_wait
> > (cond=cond@entry=0x2aa26cd74b0, mutex=mutex@entry=0x2aa26cd7488)
> >     at pthread_cond_wait.c:655
> > #3  0x000002aa2497072c in qemu_sem_wait (sem=sem@entry=0x2aa26cd7488)
> > at ../../util/qemu-thread-posix.c:328
> > #4  0x000002aa244f4a02 in postcopy_pause (s=0x2aa26cd7000) at
> > ../../migration/migration.c:3192

So for some reason the source never made it out of the postcopy-paused state;
the migration thread is still waiting in postcopy_pause() ...

> > #5  0x000002aa244f4a02 in migration_detect_error (s=0x2aa26cd7000) at
> > ../../migration/migration.c:3255
> > #6  0x000002aa244f4a02 in migration_thread
> > (opaque=opaque@entry=0x2aa26cd7000) at
> > ../../migration/migration.c:3564
> > #7  0x000002aa2496fa3a in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:521
> > #8  0x000003ff94c87aa8 in start_thread (arg=0x3ff4affd910) at
> > pthread_create.c:463
> > #9  0x000003ff94b7a896 in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

[...]

> > And here's 3528:
> > Thread 6 (Thread 0x3ff6ccfd910 (LWP 3841)):
> > #0  0x000003ffb1b8d936 in futex_wait_cancelable (private=<optimized
> > out>, expected=0, futex_word=0x2aa387a6aac)
> >     at ../sysdeps/unix/sysv/linux/futex-internal.h:88
> > #1  0x000003ffb1b8d936 in __pthread_cond_wait_common (abstime=0x0,
> > mutex=0x2aa387a6a58, cond=0x2aa387a6a80)
> >     at pthread_cond_wait.c:502
> > #2  0x000003ffb1b8d936 in __pthread_cond_wait
> > (cond=cond@entry=0x2aa387a6a80, mutex=mutex@entry=0x2aa387a6a58)
> >     at pthread_cond_wait.c:655
> > #3  0x000002aa36bf072c in qemu_sem_wait (sem=sem@entry=0x2aa387a6a58)
> > at ../../util/qemu-thread-posix.c:328
> > #4  0x000002aa366c369a in postcopy_pause_incoming (mis=<optimized
> > out>) at ../../migration/savevm.c:2541

Same on the destination side.

> > #5  0x000002aa366c369a in qemu_loadvm_state_main
> > (f=f@entry=0x2aa38897930, mis=mis@entry=0x2aa387a6820)
> >     at ../../migration/savevm.c:2615
> > #6  0x000002aa366c44fa in postcopy_ram_listen_thread
> > (opaque=opaque@entry=0x0) at ../../migration/savevm.c:1830
> > #7  0x000002aa36befa3a in qemu_thread_start (args=<optimized out>) at
> > ../../util/qemu-thread-posix.c:521
> > #8  0x000003ffb1b87aa8 in start_thread (arg=0x3ff6ccfd910) at
> > pthread_create.c:463
> > #9  0x000003ffb1a7a896 in thread_start () at
> > ../sysdeps/unix/sysv/linux/s390/s390-64/clone.S:65

Peter, could you set QTEST_LOG=1 for your future migration-test runs and
capture the stderr?  With commit a47295014d ("migration-test: Only hide error
if !QTEST_LOG", 2020-10-26), the test should then dump quite a bit of helpful
information to narrow down the issue.

I'll also try to find another s390 host and reproduce it on my side.
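
Since the hang is intermittent, a small wrapper that reruns the test with
QTEST_LOG=1 and keeps the stderr of the first bad run may save some time.
Below is a rough sketch only, not something from the tree: the helper names
(run_once, repeat), the log file naming and the timeout handling are all my
own assumptions, and the test command is whatever you normally run.

```python
import os
import subprocess


def run_once(cmd, log_path, timeout_s, extra_env=None):
    """Run cmd once with QTEST_LOG=1 and write its stderr to log_path.

    Returns "pass" (exit status 0), "fail" (non-zero exit) or "hang"
    (timeout_s seconds passed without the command exiting).
    """
    env = dict(os.environ, QTEST_LOG="1")
    env.update(extra_env or {})
    with open(log_path, "wb") as log:
        try:
            proc = subprocess.run(cmd, env=env, stdout=subprocess.DEVNULL,
                                  stderr=log, timeout=timeout_s)
        except subprocess.TimeoutExpired:
            # subprocess.run() kills the command itself on timeout; any
            # processes that command spawned are not cleaned up here.
            return "hang"
    return "pass" if proc.returncode == 0 else "fail"


def repeat(cmd, iterations, timeout_s, log_prefix="migration-test"):
    """Rerun cmd until it fails or hangs; return that run's log path.

    Logs of passing runs are deleted. Returns None if every run passed.
    """
    for i in range(iterations):
        log_path = f"{log_prefix}.{i}.stderr"
        result = run_once(cmd, log_path, timeout_s)
        print(f"run {i}: {result}")
        if result != "pass":
            return log_path
        os.remove(log_path)
    return None
```

It could be called as, e.g., repeat(["./tests/qtest/migration-test"], 100, 600)
from the build directory, with QTEST_QEMU_BINARY already exported in the
environment; the path, iteration count and timeout here are only placeholders.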

Thanks,

-- 
Peter Xu



