From: Claudio Fontana
Subject: Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed
Date: Thu, 16 Jul 2020 14:58:54 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.4.1
Small update on this,
On 7/15/20 1:10 PM, Claudio Fontana wrote:
> Hi Thomas,
>
> On 7/14/20 4:35 PM, Thomas Huth wrote:
>> On 14/07/2020 16.29, Claudio Fontana wrote:
>>> Hello,
>>>
>>> I have made some small progress in narrowing down this issue; it is
>>> possibly a qcow2 issue, still unclear, so I am involving Kevin Wolf
>>> and Max Reitz.
>>>
>>>
>>> The reproducer again:
>>>
>>>> --------------------------------------------cut-------------------------------------------
>>>> diff --git a/cpus.c b/cpus.c
>>>> index 41d1c5099f..443b88697a 100644
>>>> --- a/cpus.c
>>>> +++ b/cpus.c
>>>> @@ -643,7 +643,7 @@ static void qemu_account_warp_timer(void)
>>>>
>>>> static bool icount_state_needed(void *opaque)
>>>> {
>>>> - return use_icount;
>>>> + return 0;
>>>> }
>>>>
>>>> static bool warp_timer_state_needed(void *opaque)
>>>> --------------------------------------------cut-------------------------------------------
>>>
>>> This issue for now appears on s390 only:
>>>
>>> On s390 hardware, test 267 fails (both kvm and tcg) in the qcow2 backing
>>> file part, with broken migration stream data in the s390-skeys vmsave (old
>>> style).
>> [...]
>>> If someone has a good idea let me know - first attempts to reproduce on x86
>>> failed, but maybe more work could lead to it.
>>
>
> small update: in the GOOD case (enough padding added) a qcow_merge() is
> triggered for the last write of 16202 bytes.
> In the BAD case (not enough padding added) a qcow_merge() is not triggered
> for the last write of 16201 bytes.
>
> Note: manually flushing with qemu_fflush in s390-skeys vmsave also works
> (maybe got lost in the noise).
>
>
>> Two questions:
>>
>> 1) Can you also reproduce the issue manually, without running iotest
>> 267? ... I tried, but so far I failed.
>
> Thanks for the suggestion, will try.
I am currently trying to set up, by hand, an environment similar to that of
the test; so far I have not been able to reproduce the issue manually.
I am not very familiar with s390. I have been running with:

export QEMU=/home/cfontana/qemu-build/s390x-softmmu/qemu-system-s390x
$QEMU -nographic -monitor stdio -nodefaults -no-shutdown FILENAME

where FILENAME is the qcow2 produced by the test.
Let me know if you have a suggestion on how to set something simple up
properly.
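For reference, a sketch of what I have been trying. The backing-file setup
below is my approximation of the scenario in test 267, not the exact
commands the test runs, and the file names are placeholders:

```shell
# Approximate the iotest 267 backing-file scenario by hand
# (file names are placeholders, not the ones the test uses)
qemu-img create -f qcow2 base.qcow2 128M
qemu-img create -f qcow2 -b base.qcow2 -F qcow2 snap.qcow2

# Boot the overlay, then try a snapshot from the monitor:
#   (qemu) savevm snap0
#   (qemu) info snapshots
$QEMU -nographic -monitor stdio -nodefaults -no-shutdown snap.qcow2
```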
>
>>
>> 2) Since all the information so far sounds like the problem could be
>> elsewhere in the code, and the skeys just catch it by accident ... have
>> you tried running with valgrind? Maybe it catches something useful?
>
> Nothing yet, but will fiddle with the options a bit more.
Only thing I have seen so far:
+==33321==
+==33321== Warning: client switching stacks? SP change: 0x1ffeffe5e8 --> 0x5d9cf60
+==33321==          to suppress, use: --max-stackframe=137324009096 or greater
+==33321== Warning: client switching stacks? SP change: 0x5d9cd18 --> 0x1ffeffe5e8
+==33321==          to suppress, use: --max-stackframe=137324009680 or greater
+==33321== Warning: client switching stacks? SP change: 0x1ffeffe8b8 --> 0x5d9ce58
+==33321==          to suppress, use: --max-stackframe=137324010080 or greater
+==33321== further instances of this message will not be shown.
+==33321== Thread 4:
+==33321== Conditional jump or move depends on uninitialised value(s)
+==33321== at 0x3AEC70: process_queued_cpu_work (cpus-common.c:331)
+==33321== by 0x2753E1: qemu_wait_io_event_common (cpus.c:1213)
+==33321== by 0x2755CD: qemu_wait_io_event (cpus.c:1253)
+==33321== by 0x27596D: qemu_dummy_cpu_thread_fn (cpus.c:1337)
+==33321== by 0x725C87: qemu_thread_start (qemu-thread-posix.c:521)
+==33321== by 0x4D504E9: start_thread (in /lib64/libpthread-2.26.so)
+==33321== by 0x4E72BBD: ??? (in /lib64/libc-2.26.so)
+==33321==
+==33321== Conditional jump or move depends on uninitialised value(s)
+==33321== at 0x3AEC74: process_queued_cpu_work (cpus-common.c:331)
+==33321== by 0x2753E1: qemu_wait_io_event_common (cpus.c:1213)
+==33321== by 0x2755CD: qemu_wait_io_event (cpus.c:1253)
+==33321== by 0x27596D: qemu_dummy_cpu_thread_fn (cpus.c:1337)
+==33321== by 0x725C87: qemu_thread_start (qemu-thread-posix.c:521)
+==33321== by 0x4D504E9: start_thread (in /lib64/libpthread-2.26.so)
+==33321== by 0x4E72BBD: ??? (in /lib64/libc-2.26.so)
+==33321==
+==33321==
+==33321== HEAP SUMMARY:
+==33321== in use at exit: 2,138,442 bytes in 13,935 blocks
+==33321== total heap usage: 19,089 allocs, 5,154 frees, 5,187,670 bytes allocated
+==33321==
+==33321== LEAK SUMMARY:
+==33321== definitely lost: 0 bytes in 0 blocks
+==33321== indirectly lost: 0 bytes in 0 blocks
+==33321== possibly lost: 7,150 bytes in 111 blocks
+==33321== still reachable: 2,131,292 bytes in 13,824 blocks
+==33321== suppressed: 0 bytes in 0 blocks
+==33321== Rerun with --leak-check=full to see details of leaked memory
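Following the hints in the output above, the next run could suppress the
stack-switch warnings and get leak details. A command sketch, using the
largest --max-stackframe value valgrind suggested:

```shell
# Re-run under valgrind with the coroutine stack-switch warnings
# suppressed (value taken from the warnings above) and full leak info
valgrind --max-stackframe=137324010080 --leak-check=full \
    $QEMU -nographic -monitor stdio -nodefaults -no-shutdown FILENAME
```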
>
>>
>> Thomas
>>
>
> Ciao,
>
> Claudio
>
>
A more interesting update, I think, is the following.
I was able to "fix" the problem shown by the reproducer:

@@ -643,7 +643,7 @@ static void qemu_account_warp_timer(void)

static bool icount_state_needed(void *opaque)
{
- return use_icount;
+ return 0;
}
by just slowing down qcow2_co_pwritev_task_entry with some tight loops,
without changing any fields between runs (other than the icount field
removal from the reproducer).
I tried to insert the same slowdown in savevm.c at the end of
save_snapshot, but that does not work; it needs to be in the coroutine.
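To illustrate the write-size sensitivity seen earlier (16202 bytes merged,
16201 not), here is a toy model, NOT QEMU code: a buffered writer that only
flushes whole clusters, where a one-byte difference in the final write
decides whether the tail ever reaches the backing store unless something
flushes it explicitly.

```python
# Toy model (NOT QEMU code): a buffered writer that only flushes whole
# "clusters" to its backing store. A final write that falls short of a
# cluster boundary leaves a tail in the buffer; if nothing flushes that
# tail explicitly, it never reaches the backing store.

CLUSTER = 8192  # hypothetical cluster size, for illustration only

class BufferedWriter:
    def __init__(self):
        self.buffer = b""   # pending, not yet flushed
        self.flushed = b""  # what reached the backing store

    def write(self, data):
        self.buffer += data
        # flush only whole clusters; any partial tail stays buffered
        while len(self.buffer) >= CLUSTER:
            self.flushed += self.buffer[:CLUSTER]
            self.buffer = self.buffer[CLUSTER:]

    def close_without_final_flush(self):
        # simulates the suspected bug: the partial tail is dropped
        return self.flushed

w = BufferedWriter()
w.write(b"x" * 16384)  # ends exactly on a cluster boundary
print(len(w.close_without_final_flush()))  # 16384: everything flushed

w = BufferedWriter()
w.write(b"x" * 16383)  # one byte short of the boundary
print(len(w.close_without_final_flush()))  # 8192: the tail was lost
```

This only models the symptom; it says nothing about where in the
qcow2/migration code the missing flush actually is.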
Thanks,
Claudio