lzip-bug

Re: [Lzip-bug] lzip vs. zstd


From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] lzip vs. zstd
Date: Thu, 20 Oct 2016 02:47:21 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

address@hidden wrote:
> do you already have numbers, opinions and maybe a comparison in
> reliability, speed, compression ratio etc. against the new zstd?

I have used unzcrash to test the ability of the zstd decoder to detect corruption by itself (without a checksum), and the results are not good. As an example, here are the results of repeatedly decompressing the file COPYING.zst (a copy of the GPLv3), inverting one bit each time so as to test every possible single-bit flip:

   11913 bytes tested
   95304 total decompressions
   56058 decompressions returned with zero status, of which
   56017 comparisons failed
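
For readers unfamiliar with this kind of test, the loop can be sketched as follows. This is only an illustration of the idea, not the unzcrash implementation: the function name `bit_flip_test`, the counter names, and the `decode` callback are hypothetical, and a decoder that raises an exception stands in for one that exits with nonzero status. The demo uses a raw deflate stream from Python's zlib module (which carries no checksum) purely because it is in the standard library; it says nothing about zstd or lzip.

```python
import zlib
from typing import Callable, Dict


def bit_flip_test(compressed: bytes, original: bytes,
                  decode: Callable[[bytes], bytes]) -> Dict[str, int]:
    """Flip every bit of `compressed` in turn and try to decode the result.

    `decode` is assumed to raise an exception when it detects corruption
    (the analogue of a decoder returning nonzero status).
    """
    counts = {"bytes_tested": len(compressed), "total": 0,
              "zero_status": 0, "failed_comparisons": 0}
    buf = bytearray(compressed)
    for pos in range(len(buf)):
        for bit in range(8):
            buf[pos] ^= 1 << bit          # invert one bit
            counts["total"] += 1
            try:
                out = decode(bytes(buf))
            except Exception:
                pass                      # corruption detected by the decoder
            else:
                counts["zero_status"] += 1
                if out != original:       # decoder reported success on bad data
                    counts["failed_comparisons"] += 1
            buf[pos] ^= 1 << bit          # restore the bit before moving on
    return counts


if __name__ == "__main__":
    text = b"This is only sample input for the sketch. " * 8
    # Raw deflate (wbits=-15): no header and no checksum.
    comp = zlib.compressobj(level=9, wbits=-15)
    stream = comp.compress(text) + comp.flush()
    result = bit_flip_test(stream, text,
                           lambda data: zlib.decompress(data, wbits=-15))
    for name, value in result.items():
        print(f"{value:8d} {name}")
```

A flip counted under "zero_status" but not under "failed_comparisons" is one that happened not to change the decompressed output.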

The zstd decoder detects the corruption less than half of the time. Compare this with the lzip decoder, which detects about 99.99995% of the bit flips even without the help of its 3-factor integrity checking.
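
The "less than half" figure follows directly from the counts listed above. The short calculation below spells it out, assuming that "returned with zero status" means the decoder reported success and that "comparisons failed" means the output differed from the original file; the variable names are mine.

```python
# Counts copied from the unzcrash results quoted above.
bytes_tested = 11913
total_decompressions = 95304
zero_status = 56058          # decoder reported success
failed_comparisons = 56017   # success reported, but output was wrong

# One decompression per bit: 8 flips for each byte of COPYING.zst.
assert bytes_tested * 8 == total_decompressions

detected = total_decompressions - zero_status      # decoder returned an error
harmless = zero_status - failed_comparisons        # output still correct

detected_rate = detected / total_decompressions
undetected_rate = failed_comparisons / total_decompressions

print(f"detected:   {detected} ({detected_rate:.1%})")
print(f"undetected: {failed_comparisons} ({undetected_rate:.1%})")
print(f"harmless:   {harmless}")
```

Under that reading, roughly 41% of the bit flips were detected by the decoder and roughly 59% produced wrong output with a success status.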

Using 'zstd --no-check' is significantly less safe than using 'xz --check=none'.

Even with integrity checking enabled, my guess is that a false negative (undetected corruption) is at least a million times more probable with zstd than with lzip.

The zstd file format has many of the defects of the xz format[1]: unprotected lengths, unprotected flags, unprotected dictionary IDs, optional integrity checking, optional file concatenation, and it does not seem to admit trailing data. Also, the current version of the zstd file format is 0.2.0, which may mean that changes in the format are expected.

Zstd is described as a "fast real-time compression algorithm". AFAIK, its author does not recommend zstd for long-term archiving.

So my advice is that you should not use zstd for long-term archiving.

[1] http://www.nongnu.org/lzip/xz_inadequate.html

Juan Francisco Cantero Hurtado asked me if I know why the tests of zstd take so long to finish.

It seems that 'make test' takes a lot of time (17 min) because it is a full regression test, not just a small test with a few files to verify that compilation went well, as is the case in most programs. The theoretical basis of zstd[2] seems more complicated than that of LZMA, and the author probably wants to make sure that any possible bug is caught early.

[2] https://arxiv.org/abs/1311.2540 Asymmetric numeral systems: entropy coding combining speed of Huffman coding with compression rate of arithmetic coding.


Best regards,
Antonio.


