
Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding reverse debugging


From: Pavel Dovgalyuk
Subject: Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding reverse debugging
Date: Tue, 9 Oct 2018 14:26:52 +0300

Maybe this will help?

 

https://www.mail-archive.com/address@hidden/msg560780.html

 

Pavel Dovgalyuk

 

From: Artem Pisarenko [mailto:address@hidden]
Sent: Tuesday, October 09, 2018 2:24 PM
To: Pavel Dovgalyuk
Cc: address@hidden; address@hidden
Subject: Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding 
reverse debugging

 

(Since all previous patches are already merged into master, I'm running the tests 
against the (almost) latest version of the master branch. The following results are 
based on master commit dafd95053611aa14dda40266857608d12ddce658.)

 

Applying this patch made Tests 1 and 2 succeed (at least I wasn't able to 
trigger any failures in several attempts).

I've also tried a few tests without the sleep=off and/or -rtc base options. All of 
them succeed too, except for one case: removing sleep=off (regardless of the -rtc 
option's values, or whether it is present at all) causes qemu to hang hard in 
recording mode at the very start. The process has to be killed.

 

Some info from debugger:

    qemu-system-x86_64 [13231] [cores: 2,4,5,7]
        Thread #1 [qemu-system-x86] 13231 [core: 2] (Suspended : Container)
            __lll_lock_wait() at lowlevellock.S:135 0x7f00b116626d
            __GI___pthread_mutex_lock() at pthread_mutex_lock.c:80 0x7f00b115fdbd
            qemu_mutex_lock_impl() at qemu-thread-posix.c:66 0x947ac4
            replay_mutex_lock() at replay-internal.c:206 0x7f3dea
            os_host_main_loop_wait() at main-loop.c:235 0x94335e
            main_loop_wait() at main-loop.c:497 0x943429
            main_loop() at vl.c:1853 0x5be70f
            main() at vl.c:4575 0x5c56e0
        Thread #2 [qemu-system-x86] 13282 [core: 4] (Suspended : Container)
        Thread #3 [qemu-system-x86] 13283 [core: 5] (Suspended : Container)
        Thread #4 [qemu-system-x86] 13284 [core: 7] (Suspended : Step)
            cpu_get_icount_raw() at cpus.c:301 0x45a0a0
            replay_get_current_step() at replay.c:67 0x7f2f14
            replay_save_instructions() at replay-internal.c:225 0x7f3ea0
            replay_save_clock() at replay-time.c:24 0x7f483d
            icount_warp_rt() at cpus.c:512 0x45a745
            qemu_account_warp_timer() at cpus.c:690 0x45ad55
            qemu_tcg_rr_cpu_thread_fn() at cpus.c:1498 0x45c554
            qemu_thread_start() at qemu-thread-posix.c:504 0x9485cf
            start_thread() at pthread_create.c:333 0x7f00b115d6ba
            clone() at clone.S:109 0x7f00b0e9341d
    gdb (7.11.1)

 

Threads #2,3 are just waiting in poll or similar. Nothing extraordinary.

 

Thread #4 cycles inside the do {} while() loop of the cpu_get_icount_raw() function:

    do {
        start = seqlock_read_begin(&timers_state.vm_clock_seqlock);
        icount = cpu_get_icount_raw_locked();
    } while (seqlock_read_retry(&timers_state.vm_clock_seqlock, start));

 

Value of timers_state.vm_clock_seqlock.sequence is always 3.
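
To illustrate why a reader cannot leave that loop while the sequence stays at 3, here is a minimal single-threaded sketch. It assumes the usual seqlock convention: a writer increments the counter on entry and again on exit, so an odd value means a write is in progress, and a reader takes its snapshot with the low bit cleared. The toy_* names are hypothetical stand-ins, not QEMU's actual implementation, and the model has no memory barriers.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy single-threaded model of a seqlock; all names here are hypothetical. */
typedef struct {
    unsigned sequence;   /* odd while a writer is inside its critical section */
} ToySeqLock;

static void toy_write_begin(ToySeqLock *sl) { sl->sequence++; }
static void toy_write_end(ToySeqLock *sl)   { sl->sequence++; }

/* Reader snapshot: clear the low bit so an in-progress write forces a retry. */
static unsigned toy_read_begin(const ToySeqLock *sl)
{
    return sl->sequence & ~1u;
}

/* The reader must retry if the counter differs from its snapshot. */
static bool toy_read_retry(const ToySeqLock *sl, unsigned start)
{
    return sl->sequence != start;
}

/*
 * Run the read loop at most max_tries times.
 * Returns true if a consistent read completed, false if every try retried.
 */
static bool toy_read_completes(const ToySeqLock *sl, int max_tries)
{
    assert(max_tries > 0);
    for (int i = 0; i < max_tries; i++) {
        unsigned start = toy_read_begin(sl);
        /* ... the protected data would be read here ... */
        if (!toy_read_retry(sl, start)) {
            return true;
        }
    }
    return false;
}
```

In this model, with sequence == 3 the snapshot is always 2, so toy_read_retry() reports a mismatch on every pass; the loop can only end after something on the writer side moves the counter to an even value. That would be consistent with a reader running while a write section is still open (possibly one opened further up the same call chain), although the trace alone does not prove it.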

 

On Tue, 9 Oct 2018 at 15:04, Pavel Dovgalyuk <address@hidden> wrote:

Please try the following patch.

There was a problem with rtc option in record/replay mode.

 

diff --git a/vl.c b/vl.c
index 40d5d0f..afe1c20 100644
--- a/vl.c
+++ b/vl.c
@@ -2885,6 +2885,7 @@ int main(int argc, char **argv, char **envp)
     DisplayState *ds;
     QemuOpts *opts, *machine_opts;
     QemuOpts *icount_opts = NULL, *accel_opts = NULL;
+    QemuOpts *rtc_opts = NULL;
     QemuOptsList *olist;
     int optind;
     const char *optarg;
@@ -3691,12 +3692,11 @@ int main(int argc, char **argv, char **envp)
                 warn_report("This option is ignored and will be removed soon");
                 break;
             case QEMU_OPTION_rtc:
-                opts = qemu_opts_parse_noisily(qemu_find_opts("rtc"), optarg,
-                                               false);
-                if (!opts) {
+                rtc_opts = qemu_opts_parse_noisily(qemu_find_opts("rtc"),
+                                                   optarg, false);
+                if (!rtc_opts) {
                     exit(1);
                 }
-                configure_rtc(opts);
                 break;
             case QEMU_OPTION_tb_size:
 #ifndef CONFIG_TCG
@@ -3907,6 +3907,9 @@ int main(int argc, char **argv, char **envp)
     loc_set_none();
     replay_configure(icount_opts);
+    if (rtc_opts) {
+        configure_rtc(rtc_opts);
+    }
     if (incoming && !preconfig_exit_requested) {
         error_report("'preconfig' and 'incoming' options are "
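
The ordering idea in the patch above can be sketched independently of QEMU: during command-line parsing only remember the -rtc argument, and apply it after the replay step has run, whatever order the options were given in. This is a toy model; ToyOpts, ToyState, toy_replay_configure() and toy_configure_rtc() are hypothetical stand-ins, and the premise that RTC setup needs replay state to be initialised first is inferred from the diff rather than checked against QEMU's code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for parsed options; not QEMU's QemuOpts. */
typedef struct {
    const char *rtc_arg;     /* remembered -rtc argument, NULL if absent */
    const char *icount_arg;  /* remembered -icount argument, NULL if absent */
} ToyOpts;

typedef struct {
    bool replay_ready;          /* set once the replay step has run */
    bool rtc_configured;
    bool rtc_saw_replay_ready;  /* did RTC setup observe replay state? */
} ToyState;

/* Pass 1: only record what was requested; apply nothing yet. */
static ToyOpts toy_parse(int argc, const char **argv)
{
    ToyOpts o = { NULL, NULL };
    for (int i = 0; i + 1 < argc; i++) {
        if (strcmp(argv[i], "-rtc") == 0) {
            o.rtc_arg = argv[++i];
        } else if (strcmp(argv[i], "-icount") == 0) {
            o.icount_arg = argv[++i];
        }
    }
    return o;
}

static void toy_replay_configure(ToyState *s, const char *icount_arg)
{
    (void)icount_arg;        /* a real implementation would inspect this */
    s->replay_ready = true;
}

static void toy_configure_rtc(ToyState *s, const char *rtc_arg)
{
    assert(rtc_arg != NULL);
    s->rtc_configured = true;
    s->rtc_saw_replay_ready = s->replay_ready;
}

/* Pass 2: apply in dependency order, regardless of the argv order. */
static ToyState toy_apply(const ToyOpts *o)
{
    ToyState s = { false, false, false };
    toy_replay_configure(&s, o->icount_arg);
    if (o->rtc_arg) {
        toy_configure_rtc(&s, o->rtc_arg);
    }
    return s;
}
```

Calling toy_apply(&opts) on the result of toy_parse() gives the same outcome whether -rtc appears before or after -icount, which is the property the deferred configure_rtc() call appears to be after.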

 

Pavel Dovgalyuk

 

From: Artem Pisarenko [mailto:address@hidden]
Sent: Thursday, October 04, 2018 4:16 PM
To: dovgaluk
Cc: address@hidden; address@hidden
Subject: Re: [Qemu-devel] [PATCH v6 00/25] Fixing record/replay and adding 
reverse debugging

 

No, it didn't change the test results, at least for 
https://github.com/ispras/qemu/tree/rr-180911 . Even the step values it gets stuck 
on are the same for most runs.

Playing with master and my own branch gives different results for the tests without 
sleep=off and -rtc base. It seems that the patch you mentioned didn't change them 
very much.

The only thing that can be said for sure is that this patch does not fix the issues 
completely. But it MAY fix them partially or in some other specific cases...

 

On Wed, 3 Oct 2018 at 12:47, dovgaluk <address@hidden> wrote:

Can you try applying this patch?
https://www.mail-archive.com/address@hidden/msg563798.html

I also encountered problems with x86_64 replay and found a misprint in
the code, which was fixed after the series had been sent to the mailing
list.

Pavel Dovgalyuk


Artem Pisarenko wrote on 2018-10-02 10:02:
> I've added the "-monitor stdio" option to the command line of Test 1 and
> repeatedly entered the command during execution:
> 
>   QEMU 3.0.50 monitor - type 'help' for more information
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 311736195
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 318198367
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 324737211
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 329890795
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 607069789
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 607069789
>   (qemu) info replay
>   Replaying execution 'icount_rr_capture.bin': current step = 607069789
>   ...
> 
> Some notes on the step value it gets stuck on:
> - mostly it's the same (even across different record-replay pairs);
> - stressing the host during replay may cause it to change even for the
> same record-replay pair (i.e. different replay executions of the same
> recorded file).
> 
> This specific case seems to reproduce reliably.
> 
> On Tue, 2 Oct 2018 at 0:22, Artem Pisarenko
> <address@hidden> wrote:
> 
>> I've posted a bug report with extended tests (incl. the case without
>> sleep=off). You can find the guest image (kernel) in the bug description.
>> https://bugs.launchpad.net/qemu/+bug/1795369 [1]
>> 
>> The most annoying thing is that some issues are very hard to
>> reproduce. There are definitely race conditions somewhere in the qemu
>> code. Running the 'stress-ng' utility with CPU and I/O stressors in
>> parallel with qemu execution greatly reduces the number of attempts
>> needed to trigger some of the issues I encounter.
>> 
>> I'll try the 'info replay' command tomorrow, but there is no guarantee
>> that I'll be able to reproduce the issue again.
>> 
>> Speaking of '-nographic' and SDL... I've noticed that a UI greatly
>> reduces the likelihood of hanging (but doesn't avoid it completely) when
>> using icount in general, so this effect isn't rr-specific. I've
>> already reported this bug too.
>> 
>> On Mon, 1 Oct 2018, 20:14 dovgaluk <address@hidden> wrote:
>> 
>>> Artem Pisarenko wrote on 2018-09-30 14:01:
>>>> Feature still broken :(
>>> 
>>> Thanks for testing.
>>> 
>>>> 
>>>> Brief description of my tests.
>>>> 
>>>> Guest image is Linux, which just powers off after kernel boots
>>>> (instead of proceeding to user-space /init or /sbin/init).
>>>> Base cmdline:
>>>> qemu-system-x86_64 -nodefaults -machine pc,accel=tcg -m 2048 -cpu qemu64
>>>> -rtc clock=vm,base=2000-01-01T00:00:00 -kernel bzImage -initrd rootfs
>>>> -append 'nokaslr console=ttyS0 rdinit=/init_poweroff'
>>>> -nographic -serial SERIAL_VALUE
>>>> -icount 1,sleep=off,rr=RR_VALUE,rrfile=icount_rr_capture.bin
>>> 
>>> I've never tried it with sleep=off. Can you remove it and try
>>> again?
>>> 
>>> We have also seen a problem with '-nographic'. When we remove this
>>> option and QEMU runs with an SDL window, everything is OK. There is
>>> some problem with the main loop, which may sleep when there is no GUI
>>> to update, or something like that. We haven't been able to fix it yet.
>>> 
>>>> 
>>>> Test 1. When SERIAL_VALUE=none
>>>> Running with RR_VALUE=record completes successfully.
>>>> Running with RR_VALUE=replay doesn't complete. The qemu process just
>>>> eats ~100% CPU, and memory usage stops growing after some point. I
>>>> can't see what happens because of problem no. 2 (see below).
>>> 
>>> Try the 'info replay' monitor command. Does the instruction counter
>>> increase?
>>> 
>>>> 
>>>> Test 2. When SERIAL_VALUE=stdio
>>>> Running with RR_VALUE=record completes successfully.
>>>> 
>>>> Running with RR_VALUE=replay causes an exit with the error:
>>>> 
>>>> "qemu-system-x86_64: Missing character write event in the replay log"
>>>> 
>>>> These problems are the same with qemu 2.12 (both vanilla and with
>>>> previous versions of these patches applied). Furthermore, I consider
>>>> the whole icount mode broken and determinism unachievable.
>>>> The irony is that I don't actually need the record/replay feature. I
>>>> tried to use it only as an instrument to debug failing determinism in
>>>> qemu code. But since the record/replay feature itself relies on
>>>> determinism, which is broken, it's no wonder that it fails too (I just
>>>> hoped to bypass it).
>>>> 
>>>> Contact me if you need more details. I'm just very tired of trying to
>>>> get all these things working... Hope is leaving me...
>>> 
>>> Can you share the kernel in case icount is still broken?
>>> 
>>> Pavel Dovgalyuk
>> --
>> 
>> Best regards,
>> Artem Pisarenko
>  --
> 
> Best regards,
>   Artem Pisarenko
> 
> Links:
> ------
> [1] https://bugs.launchpad.net/qemu/+bug/1795369

-- 

Best regards,
  Artem Pisarenko

-- 

Best regards,
  Artem Pisarenko


