qemu-devel

Re: [PATCH v4 05/24] Revert "replay: stop us hanging in rr_wait_io_event"


From: Nicholas Piggin
Subject: Re: [PATCH v4 05/24] Revert "replay: stop us hanging in rr_wait_io_event"
Date: Thu, 14 Mar 2024 15:19:08 +1000

On Wed Mar 13, 2024 at 7:03 AM AEST, Alex Bennée wrote:
> "Nicholas Piggin" <npiggin@gmail.com> writes:
>
> > On Tue Mar 12, 2024 at 11:33 PM AEST, Alex Bennée wrote:
> >> Nicholas Piggin <npiggin@gmail.com> writes:
> >>
> >> > This reverts commit 1f881ea4a444ef36a8b6907b0b82be4b3af253a2.
> >> >
> >> > That commit causes reverse_debugging.py test failures, and does
> >> > not seem to solve the root cause of the problem x86-64 still
> >> > hangs in record/replay tests.
> >>
> >> I'm still finding the reverse debugging tests failing with this series.
> >
> > :(
> >
> > In gitlab CI or your own testing? What are you running exactly?
>
> My own - my mistake I didn't get a clean build because of the format
> bug. However I'm seeing new failures:
>
>   env QEMU_TEST_FLAKY_TESTS=1 AVOCADO_TIMEOUT_EXPECTED=1 ./pyvenv/bin/avocado run ./tests/avocado/reverse_debugging.py
>   Fetching asset from ./tests/avocado/reverse_debugging.py:ReverseDebugging_AArch64.test_aarch64_virt
>   JOB ID     : bd4b29f7afaa24dc6e32933ea9bc5e46bbc3a5a4
>   JOB LOG    : /home/alex/avocado/job-results/job-2024-03-12T20.58-bd4b29f/job.log
>    (1/5) ./tests/avocado/reverse_debugging.py:ReverseDebugging_X86_64.test_x86_64_pc: PASS (4.49 s)
>    (2/5) ./tests/avocado/reverse_debugging.py:ReverseDebugging_X86_64.test_x86_64_q35: PASS (4.50 s)
>    (3/5) ./tests/avocado/reverse_debugging.py:ReverseDebugging_AArch64.test_aarch64_virt: FAIL: Invalid PC (read ffff2d941e4d7f28 instead of ffff2d941e4d7f2c) (3.06 s)

Okay, this is the new test I added. It runs for 1 second then
reverse-steps from the end of the trace. aarch64 is flaky -- pc is at a
different place at the same icount after the reverse-step (which is
basically the second replay). This indicates some non-determinism in
execution, or something in machine reset or migration is not restoring
the state exactly.
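
As a rough illustration of that check, here is a toy sketch. It is not
QEMU code: ToyMachine, reverse_step and check_reverse_step are
hypothetical names, and the "machine" is just a PC and an instruction
counter. It records the PC at each icount on a first run, emulates a
reverse-step by resetting to the initial state and replaying to
icount - 1, then verifies the PC matches what was seen at that icount.

```python
class ToyMachine:
    """Toy stand-in for a replayed guest: state is only (pc, icount)."""

    def __init__(self, pc=0x1000):
        self.initial_pc = pc
        self.reset()

    def reset(self):
        # Stands in for machine reset plus reloading the initial snapshot.
        self.pc = self.initial_pc
        self.icount = 0

    def step(self):
        # Assume fixed 4-byte instructions and straight-line execution.
        self.pc += 4
        self.icount += 1

    def run_to(self, icount):
        while self.icount < icount:
            self.step()


def reverse_step(machine):
    """Emulate reverse-step: restart from the snapshot, replay to icount - 1."""
    target = machine.icount - 1
    machine.reset()
    machine.run_to(target)


def check_reverse_step(machine, end_icount):
    """Run to end_icount recording PCs, reverse-step once, compare the PC."""
    machine.reset()
    pcs = {machine.icount: machine.pc}
    while machine.icount < end_icount:
        machine.step()
        pcs[machine.icount] = machine.pc

    reverse_step(machine)
    expected = pcs[machine.icount]
    if machine.pc != expected:
        raise AssertionError(
            "Invalid PC (read %x instead of %x)" % (machine.pc, expected))
    return machine.icount, machine.pc
```

In this toy the check always passes because reset() restores the state
exactly. If reset() left any state behind, the second run could reach
the same icount with a different PC, which is the kind of mismatch the
test reports.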

aarch64 ran okay a few times, including in gitlab CI, before I posted
the series, but it turns out it does break quite often too.

x86 has a problem with this too so I disabled it there. I'll disable it
for aarch64 too for now.
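
One possible way to disable such a test by default is to gate it on the
QEMU_TEST_FLAKY_TESTS environment variable that appears in the command
line quoted above. The sketch below uses only the standard unittest
module. The class name, test name, skip message and the
flaky_tests_enabled helper are hypothetical, and this is not
necessarily how the series will do it.

```python
import os
import unittest


def flaky_tests_enabled(environ=os.environ):
    """Return True when QEMU_TEST_FLAKY_TESTS is set to a non-empty value."""
    return bool(environ.get("QEMU_TEST_FLAKY_TESTS"))


class ReverseDebuggingSketch(unittest.TestCase):
    # Hypothetical test: skipped unless flaky tests are explicitly enabled.
    @unittest.skipUnless(flaky_tests_enabled(),
                         "reverse-step from end of trace is flaky")
    def test_reverse_step_from_end(self):
        # Placeholder body; a real test would drive the guest here.
        pass
```

With this gating the test is reported as skipped in a normal run and
only executes when the variable is set, for example
QEMU_TEST_FLAKY_TESTS=1.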

x86 and aarch64 can run the replay_linux.py test quite well (after this
series), which is much longer and more complicated. The difference there
is that it is only a single replay: it never resets the machine or
loads the initial snapshot for reverse-debugging. So to me that
indicates that execution is probably deterministic, and it's the
reset/reload that has the problem.

Thanks,
Nick


