[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: migration: broken snapshot saves appear on s390 when small fields in
From: |
Bruce Rogers |
Subject: |
Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed |
Date: |
Tue, 28 Jul 2020 07:23:05 -0600 |
User-agent: |
Evolution 3.36.4 |
On Tue, 2020-07-28 at 11:43 +0300, Vladimir Sementsov-Ogievskiy wrote:
> 28.07.2020 02:09, Bruce Rogers wrote:
> > On Tue, 2020-07-21 at 10:22 +0200, Claudio Fontana wrote:
> > > On 7/20/20 8:24 PM, Claudio Fontana wrote:
> > > > I have now been able to reproduce this on X86 as well.
> > > >
> > > > It happens much more rarely, about once every 10 times.
> > > >
> > > > I will sort out the data and try to make it even more
> > > > reproducible,
> > > > then post my findings in detail.
> > > >
> > > > Overall I proceeded as follows:
> > > >
> > > > 1) hooked the savevm code to skip all fields with the exception
> > > > of
> > > > "s390-skeys". So only s390-skeys are actually saved.
> > > >
> > > > 2) reimplemented "s390-skeys" in a common implementation in
> > > > cpus.c,
> > > > used on both x86 and s390, modeling the behaviour of save/load
> > > > from
> > > > hw/s390
> > > >
> > > > 3) ran ./check -qcow2 267 on both x86 and s390.
> > > >
> > > > In the case of s390, failure seems to be reproducible 100% of
> > > > the
> > > > times.
> > > > On X86, it is as mentioned failing about 10% of the times.
> > > >
> > > > Ciao,
> > > >
> > > > Claudio
> > >
> > > And here is a small series of two patches that can be used to
> > > reproduce the problem.
> > >
> > > Clearly, this is not directly related to s390 or to skeys or to
> > > icount in particular, it is just an issue that happened to be
> > > more
> > > visible there.
> > >
> > > If you could help with this, please apply the attached patches.
> > >
> > > Patch 1 just adds a new "300" iotest. It is way easier to extract
> > > the
> > > relevant part out of test 267, which does a bit too much in the
> > > same
> > > file.
> > > Also this allows easier use of valgrind, since it does not
> > > "require"
> > > anything.
> > >
> > > Patch 2 hooks the savevm code to skip all fields during the
> > > snapshot
> > > with the exception of "s390-skeys", a new artificial field
> > > implemented to
> > > model what the real s390-skeys is doing.
> > >
> > > After applying patch 1 and patch 2, you can test (also on X86),
> > > with:
> > >
> > > ./check -qcow2 300
> > >
> > > On X86 many runs will be successful, but a certain % of them will
> > > instead fail like this:
> > >
> > >
> > > claudio@linux-ch70:~/git/qemu-pristine/qemu-build/tests/qemu-
> > > iotests>
> > > ./check -qcow2 300
> > > QEMU -- "/home/claudio/git/qemu-pristine/qemu-
> > > build/tests/qemu-iotests/../../x86_64-softmmu/qemu-system-x86_64"
> > > -nodefaults -display none -accel qtest
> > > QEMU_IMG -- "/home/claudio/git/qemu-pristine/qemu-
> > > build/tests/qemu-iotests/../../qemu-img"
> > > QEMU_IO -- "/home/claudio/git/qemu-pristine/qemu-
> > > build/tests/qemu-iotests/../../qemu-io" --cache writeback --aio
> > > threads -f qcow2
> > > QEMU_NBD -- "/home/claudio/git/qemu-pristine/qemu-
> > > build/tests/qemu-iotests/../../qemu-nbd"
> > > IMGFMT -- qcow2 (compat=1.1)
> > > IMGPROTO -- file
> > > PLATFORM -- Linux/x86_64 linux-ch70 4.12.14-lp151.28.36-
> > > default
> > > TEST_DIR -- /home/claudio/git/qemu-pristine/qemu-
> > > build/tests/qemu-iotests/scratch
> > > SOCK_DIR -- /tmp/tmp.gdcUu3l0SM
> > > SOCKET_SCM_HELPER -- /home/claudio/git/qemu-pristine/qemu-
> > > build/tests/qemu-iotests/socket_scm_helper
> > >
> > > 300 fail [10:14:05] [10:14:06] (last:
> > > 0s) output
> > > mismatch (see 300.out.bad)
> > > --- /home/claudio/git/qemu-pristine/qemu/tests/qemu-
> > > iotests/300.out 2020-07-21 10:03:54.468104764 +0200
> > > +++ /home/claudio/git/qemu-pristine/qemu-build/tests/qemu-
> > > iotests/300.out.bad 2020-07-21 10:14:06.098090543 +0200
> > > @@ -12,6 +12,9 @@
> > > ID TAG VM
> > > SIZE DATE VM
> > > CLOCK
> > > -- snap0 SIZE yyyy-mm-dd
> > > hh:mm:ss 00:00:00.000
> > > (qemu) loadvm snap0
> > > +Unexpected storage key data: 0
> > > +error while loading state for instance 0x0 of device 's390-
> > > skeys'
> > > +Error: Error -22 while loading VM state
> > > (qemu) quit
> > >
> > > *** done
> > > Failures: 300
> > > Failed 1 of 1 iotests
> > >
> > >
> > > At this point somebody more knowledgeable about QCOW2, coroutines
> > > and
> > > backing files could chime in?
> > >
> > <trim>
> >
> > I used the reproducer you provide here to do a git bisect as I
> > assume
> > whatever is now broken wasn't always broken, and it pointed to the
> > following commit:
> >
> > commit df893d25ceea3c0dcbe6d6b425309317fab6b22e (refs/bisect/bad)
> > Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> > Date: Tue Jun 4 19:15:13 2019 +0300
> >
> > block/qcow2: implement .bdrv_co_preadv_part
> >
> > Indeed, I am currently able to reliable reproduce the issue with
> > this
> > commit applied, and not reproduce it without it.
>
> Do you have a reproducer on top of df893d25ceea? I tried to apply the
> patches on it, there are a lot of conflicts and after solving them,
> the test not work for me.
>
> What do you mean by not reproduce without df893d25ceea3c0d? I also
> tried to do "git rebase df893d25ceea3c0d --onto df893d25ceea3c0d^",
> after applying the reproducing patches on master, but again, there
> are conflicts..
>
> How did you applied
>
> commit 88f468e54696cd9ffc28ecf96f18f04125402541
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date: Mon Sep 16 20:53:22 2019 +0300
>
> block/qcow2: refactor qcow2_co_preadv_part
>
> and, which is more important further
>
> commit d710cf575ad5fb3ab329204620de45bfe50caa53
> Author: Vladimir Sementsov-Ogievskiy <vsementsov@virtuozzo.com>
> Date: Mon Sep 16 20:53:24 2019 +0300
>
> block/qcow2: introduce parallel subrequest handling in read and
> write
>
> without df893d25ceea3c0d ?
>
>
> I can reproduce the bug on master with Claudio's two patches, but how
> to reproduce the relation with df893d25ceea3c0d?
>
>
Yeah, I didn't provide full reproducer info for the point where the
patch I mention got applied. No matter, it looks like a fix has been
identified in the mean time.
Thanks for the quick response, though.
Bruce
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, (continued)
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/14
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Thomas Huth, 2020/07/14
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/15
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/15
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/16
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/20
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/21
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Bruce Rogers, 2020/07/27
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Vladimir Sementsov-Ogievskiy, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Vladimir Sementsov-Ogievskiy, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed,
Bruce Rogers <=
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Max Reitz, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Vladimir Sementsov-Ogievskiy, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Paolo Bonzini, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Max Reitz, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Paolo Bonzini, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Vladimir Sementsov-Ogievskiy, 2020/07/28
- Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Claudio Fontana, 2020/07/28
Re: migration: broken snapshot saves appear on s390 when small fields in migration stream removed, Dr. David Alan Gilbert, 2020/07/13