[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsiv
From: |
Fabiano Rosas |
Subject: |
Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle |
Date: |
Thu, 12 Sep 2024 12:37:18 -0300 |
Peter Xu <peterx@redhat.com> writes:
> On Thu, Sep 12, 2024 at 09:13:16AM +0100, Peter Maydell wrote:
>> On Wed, 11 Sept 2024 at 22:26, Fabiano Rosas <farosas@suse.de> wrote:
>> > I don't think we're discussing total CI time at this point, so the math
>> > doesn't really add up. We're not looking into making the CI finish
>> > faster. We're looking into making migration-test finish faster. That
>> > would reduce timeouts in CI, speed-up make check and reduce the chance
>> > of random race conditions* affecting other people/staging runs.
>>
>> Right. The reason migration-test appears on my radar is because
>> it is very frequently the thing that shows up as "this sometimes
>> just fails or just times out and if you hit retry it goes away
>> again". That might not be migration-test's fault specifically,
>> because those retries tend to be certain CI configs (s390,
>> the i686-tci one), and I have some theories about what might be
>> causing it (e.g. build system runs 4 migration-tests in parallel,
>> which means 8 QEMU processes which is too many for the number
>> of host CPUs). But right now I look at CI job failures and my reaction
>> is "oh, it's the migration-test failing yet again" :-(
>>
>> For some examples from this week:
>>
>> https://gitlab.com/qemu-project/qemu/-/jobs/7802183144
>> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373 <--------[1]
>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152 <--------[2]
>> https://gitlab.com/qemu-project/qemu/-/jobs/7786579155
>
> Ah right, the TIMEOUT is unfortunate, especially if tests can be run in
> parallel. It indeed sounds like no good way to finally solve.. I don't
> also see how speeding up / reducing tests in migration test would help, as
> that's (from some degree..) is the same as tuning the timeout value bigger.
> When the tests are less it'll fit into 480s window, but maybe it's too
> quick now we wonder whether we should shrink it to e.g. 90s, but then it
> can timeout again when on a busy host with less capability of concurrency.
>
> But indeed there're two ERRORs ([1,2] above).. I collected some more info
> here before the log expires:
>
> =================================8<================================
>
> *** /i386/migration/multifd/tcp/plain/cancel, qtest-i386 on s390 host
>
> https://gitlab.com/qemu-project/qemu/-/jobs/7799842373
>
> 101/953 qemu:qtest+qtest-i386 / qtest-i386/migration-test
> ERROR 144.32s killed by signal 6 SIGABRT
>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>>>> G_TEST_DBUS_DAEMON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>>>
>>>> PYTHON=/home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/pyvenv/bin/python3
>>>> QTEST_QEMU_IMG=./qemu-img MALLOC_PERTURB_=144
>>>> QTEST_QEMU_BINARY=./qemu-system-i386
>>>> /home/gitlab-runner/builds/zEr9wY_L/0/qemu-project/qemu/build/tests/qtest/migration-test
>>>> --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> warning: fd: migration to a file is deprecated. Use file: instead.
> warning: fd: migration to a file is deprecated. Use file: instead.
> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11
> (Segmentation fault) (core dumped)
> (test program exited with status code -6)
> TAP parsing error: Too few tests run (expected 53, got 39)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> # Start of plain tests
> # Running /i386/migration/multifd/tcp/plain/cancel
> # Using machine type: pc-i440fx-9.2
> # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0
> -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel
> tcg -machine pc-i440fx-9.2, -name source,debug-threads=on -m 150M -serial
> file:/tmp/migration-test-4112T2/src_serial -drive
> if=none,id=d0,file=/tmp/migration-test-4112T2/bootsect,format=raw -device
> ide-hd,drive=d0,secs=1,cyls=1,heads=1 2>/dev/null -accel qtest
> # starting QEMU: exec ./qemu-system-i386 -qtest unix:/tmp/qtest-3273509.sock
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-3273509.qmp,id=char0
> -mon chardev=char0,mode=control -display none -audio none -accel kvm -accel
> tcg -machine pc-i440fx-9.2, -name target,debug-threads=on -m 150M -serial
> file:/tmp/migration-test-4112T2/dest_serial -incoming defer -drive
> if=none,id=d0,file=/tmp/migration-test-4112T2/bootsect,format=raw -device
> ide-hd,drive=d0,secs=1,cyls=1,heads=1 2>/dev/null -accel qtest
> ----------------------------------- stderr -----------------------------------
> warning: fd: migration to a file is deprecated. Use file: instead.
> warning: fd: migration to a file is deprecated. Use file: instead.
> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11
> (Segmentation fault) (core dumped)
>
> *** /ppc64/migration/multifd/tcp/plain/cancel, qtest-ppc64 on i686 host
>
> https://gitlab.com/qemu-project/qemu/-/jobs/7786579152
>
> 174/315 qemu:qtest+qtest-ppc64 / qtest-ppc64/migration-test
> ERROR 381.00s killed by signal 6 SIGABRT
>>>> PYTHON=/builds/qemu-project/qemu/build/pyvenv/bin/python3
>>>> QTEST_QEMU_IMG=./qemu-img
>>>> G_TEST_DBUS_DAEMON=/builds/qemu-project/qemu/tests/dbus-vmstate-daemon.sh
>>>> QTEST_QEMU_BINARY=./qemu-system-ppc64 MALLOC_PERTURB_=178
>>>> QTEST_QEMU_STORAGE_DAEMON_BINARY=./storage-daemon/qemu-storage-daemon
>>>> /builds/qemu-project/qemu/build/tests/qtest/migration-test --tap -k
> ――――――――――――――――――――――――――――――――――――― ✀ ―――――――――――――――――――――――――――――――――――――
> stderr:
> qemu-system-ppc64: Cannot read from TLS channel: The TLS connection was
> non-properly terminated.
> warning: fd: migration to a file is deprecated. Use file: instead.
> warning: fd: migration to a file is deprecated. Use file: instead.
> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11
> (Segmentation fault) (core dumped)
> (test program exited with status code -6)
> TAP parsing error: Too few tests run (expected 61, got 47)
> ――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――――
>
> # Start of plain tests
> # Running /ppc64/migration/multifd/tcp/plain/cancel
> # Using machine type: pseries-9.2
> # starting QEMU: exec ./qemu-system-ppc64 -qtest unix:/tmp/qtest-40766.sock
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-40766.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg
> -machine pseries-9.2,vsmt=8 -name source,debug-threads=on -m 256M -serial
> file:/tmp/migration-test-H0Z1T2/src_serial -nodefaults -machine
> cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off, -bios
> /tmp/migration-test-H0Z1T2/bootsect 2>/dev/null -accel qtest
> # starting QEMU: exec ./qemu-system-ppc64 -qtest unix:/tmp/qtest-40766.sock
> -qtest-log /dev/null -chardev socket,path=/tmp/qtest-40766.qmp,id=char0 -mon
> chardev=char0,mode=control -display none -audio none -accel kvm -accel tcg
> -machine pseries-9.2,vsmt=8 -name target,debug-threads=on -m 256M -serial
> file:/tmp/migration-test-H0Z1T2/dest_serial -incoming defer -nodefaults
> -machine cap-cfpc=broken,cap-sbbc=broken,cap-ibs=broken,cap-ccf-assist=off,
> -bios /tmp/migration-test-H0Z1T2/bootsect 2>/dev/null -accel qtest
> ----------------------------------- stderr -----------------------------------
> qemu-system-ppc64: Cannot read from TLS channel: The TLS connection was
> non-properly terminated.
> warning: fd: migration to a file is deprecated. Use file: instead.
> warning: fd: migration to a file is deprecated. Use file: instead.
> ../tests/qtest/libqtest.c:205: kill_qemu() detected QEMU death from signal 11
> (Segmentation fault) (core dumped)
>
> (test program exited with status code -6)
> =================================8<================================
>
> So.. it's the same test (multifd/tcp/plain/cancel) that is failing on
> different host / arch being tested. What is more weird is the two failures
> are different, the 2nd failure throw out a TLS error even though the test
> doesn't yet have tls involved.
I think that's just a parallel test being cancelled prematurely, either
due to the crash or due to the timeout.
>
> Fabiano, is this the issue you're looking at?
Yes. I can reproduce locally by running 2 processes in parallel: 1 loop
with make -j$(nproc) check and another loop with tcp/plain/cancel. It
takes ~1h to hit. I've seen crashes with ppc64, s390 and
aarch64.
> Peter, do you think it'll be helpful if we temporarily mark this test as
> "slow" too so it's not run in CI (so we still run it ourselves when prepare
> migration PR, with the hope that it can reproduce)?
>
> Thanks,
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, (continued)
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/11
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Xu, 2024/09/11
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/11
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Maydell, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Maydell, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Xu, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Maydell, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Xu, 2024/09/13
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle,
Fabiano Rosas <=
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/12
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Xu, 2024/09/13
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/13
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/13
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Peter Xu, 2024/09/13
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Fabiano Rosas, 2024/09/13
- Re: [PATCH RFC 10/10] tests/migration-tests: Add test case for responsive CPU throttle, Yong Huang, 2024/09/09