Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
From: Dr. David Alan Gilbert
Subject: Re: multifd/tcp/zlib intermittent abort (was: Re: [PULL 00/18] migration queue)
Date: Tue, 15 Mar 2022 16:14:52 +0000
User-agent: Mutt/2.1.5 (2021-12-30)
* Peter Maydell (peter.maydell@linaro.org) wrote:
> On Tue, 15 Mar 2022 at 14:39, Peter Maydell <peter.maydell@linaro.org> wrote:
> >
> > On Mon, 14 Mar 2022 at 19:44, Peter Maydell <peter.maydell@linaro.org>
> > wrote:
> > > On Mon, 14 Mar 2022 at 18:58, Peter Maydell <peter.maydell@linaro.org>
> > > wrote:
> > > > I just hit the abort case, narrowing it down to the
> > > > /i386/migration/multifd/tcp/zlib case, which can hit this without
> > > > any other tests being run:
> > >
> > > > This test seems to fail fairly frequently. I'll try a bisect...
> > >
> > > On this s390 machine, this test has been intermittent since
> > > it was first added in commit 7ec2c2b3c1 ("multifd: Add zlib compression
> > > multifd support") in 2019.
> >
> > I have tried (on current master) runs of various of the other
> > migration tests, and:
> > * /i386/migration/multifd/tcp/zstd completed 1170 iterations without
> > failing
> > * /i386/migration/precopy/tcp completed 4669 iterations without
> > failing
> > * /i386/migration/multifd/tcp/zlib fails usually within the first
> > 10 iterations (the most I ever saw it manage was 32)
> >
> > So whatever this is, it seems like it might be specific to the
> > zlib code somehow ?
>
> Maybe we're running into this bug
> https://bugs.launchpad.net/ubuntu/+source/zlib/+bug/1961427
> ("zlib: compressBound() returns an incorrect result on z15") ?
The initial description of compressBound being wrong doesn't
feel like it would cause that; it claims it would trigger an error
(I'm not sure how good we are at spotting that!); but then later
in the description it says:
'Mistakes in dfltcc_free_window OF and especially DEFLATE_BOUND_COMPLEN,
(incl. the bit definitions), may cause various and unforseen defects'
Certainly looks like a 'various and unforseen defect'.
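One way to catch this kind of defect explicitly, rather than relying on zlib itself to report an error, is a compress/decompress round trip. Below is a minimal sketch using Python's standard `zlib` module. Each page is flushed with `Z_SYNC_FLUSH`, decoded again and compared with the input. The 4 KiB page size and the helper name `roundtrip_ok` are hypothetical. This is not how QEMU's multifd zlib code checks its data.

```python
import zlib

PAGE_SIZE = 4096  # assumed page size, for illustration only


def roundtrip_ok(data: bytes, level: int = 1) -> bool:
    """Compress data page by page, decompress it again, and compare.

    Hypothetical helper: every page is flushed with Z_SYNC_FLUSH so the
    decompressor can emit that page's bytes without waiting for the end
    of the stream.
    """
    comp = zlib.compressobj(level)
    decomp = zlib.decompressobj()
    out = bytearray()
    for off in range(0, len(data), PAGE_SIZE):
        page = data[off:off + PAGE_SIZE]
        chunk = comp.compress(page) + comp.flush(zlib.Z_SYNC_FLUSH)
        try:
            out += decomp.decompress(chunk)
        except zlib.error:
            # A corrupt stream is reported here instead of aborting later.
            return False
    # Any silent corruption shows up as a mismatch with the original input.
    return bytes(out) == data
```

On a working zlib, `roundtrip_ok(bytes(range(256)) * 64)` should return `True`. A `False` result would point at the compressor or decompressor, not at the transport.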
Dave
> That bug report claims it doesn't affect focal, though, which
> is what we're running on this box (specifically, the zlib1g
> package is version 1:1.2.11.dfsg-2ubuntu1.2).
>
> A run with DFLTCC=0 has made it past 60 iterations so far, which
> suggests that that does serve as a workaround for the bug.
>
> thanks
> -- PMM
>
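The iteration counts quoted above come from rerunning a single test until it fails. Below is a minimal sketch of such a loop in Python. The helper names are hypothetical. The actual test command is left to the caller. Passing `DFLTCC=0` through the environment as shown is an assumption about how the workaround run was done.

```python
import os
import subprocess
from typing import Callable, Mapping, Optional, Sequence


def iterations_until_failure(run_once: Callable[[], bool],
                             max_iters: int) -> Optional[int]:
    """Call run_once up to max_iters times.

    Returns the 1-based iteration of the first failure, or None if every
    run passed.
    """
    for i in range(1, max_iters + 1):
        if not run_once():
            return i
    return None


def run_command(argv: Sequence[str],
                env: Optional[Mapping[str, str]] = None) -> bool:
    """Run one test invocation; exit status 0 counts as a pass."""
    return subprocess.run(list(argv), env=env).returncode == 0


def env_with_dfltcc_disabled() -> dict:
    """Copy of the current environment with DFLTCC=0 added."""
    return dict(os.environ, DFLTCC="0")
```

For example, `iterations_until_failure(lambda: run_command(cmd, env_with_dfltcc_disabled()), 100)` runs up to 100 iterations with the workaround, where `cmd` is a placeholder for the zlib migration test invocation. Repeating the call with `env=None` gives the baseline count for comparison.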
--
Dr. David Alan Gilbert / dgilbert@redhat.com / Manchester, UK