lzip-bug

Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.


From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
Date: Wed, 18 Jul 2018 20:07:30 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Hi Ralph,

Ralph Corderoy wrote:
Having read http://lzip.nongnu.org/xz_inadequate.html I'm happy to move
away from xz(1), having been lured by coreutils adding it originally.
So I picked a random Gimp XCF file already xz'd and compared sizes.

     55,569138  gimp
     21,001368  xz -9
     23,299403  lzip -9  23,299403 / 21,001368 = 1.109
[...]
Is there a known reason why xz does noticeably better in some
situations like this one?

I have never tried to compress .xcf files, but there is a known reason why 'xz -9' compresses files larger than 32 MiB better than 'lzip -9'. It is explained at http://www.nongnu.org/lzip/lzip_benchmark.html#xz2

----------------------------------------------------------------------
"xz -9" uses a dictionary size twice as large as "lzip -9" (and twice as large as "lzma -9"). This makes it appear as if xz could compress large files a little more than lzip. To find the truth just pass to lzip the arguments equivalent to those of "xz -9" (or to xz the arguments equivalent to those of "lzip -9"), and lzip will usually compress more than xz:

  linux-libre-3.12.5-gnu.tar (size 535347200)
  "lzip -m64 -s64MiB"               74192464   9m16s
  "xz -9"                           74306080   9m 7s

  "lzip -9"                         74330266  10m53s
  "xz --lzma2=nice=273,dict=32MiB"  74563636  10m15s

Note that using plain "-9" on both compressors, lzip usually compresses large files about as much as xz, while using half the RAM to compress and requiring half the RAM to decompress.
----------------------------------------------------------------------

The large difference in compression ratio in this file may be caused, for example, by it containing two areas of similar data more than 32 MiB apart. This is why the lzip manual states that:

http://www.nongnu.org/lzip/manual/lzip_manual.html#Invoking-lzip
The bidimensional parameter space of LZMA can't be mapped to a linear scale optimal for all files. If your files are large, very repetitive, etc, you may need to use the '--dictionary-size' and '--match-length' options directly to achieve optimal performance.
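The effect of two similar areas lying farther apart than the dictionary size can be reproduced at a small scale. The following is only a sketch under stated assumptions: it uses Python's standard lzma module (which wraps xz's liblzma, not lzip), raw LZMA2 streams, and sizes scaled down from 32/64 MiB to 1/4 MiB so that it runs quickly. The helper names lzma2_size and make_sample are made up for this illustration.

```python
import lzma
import os

KIB = 1024
MIB = 1024 * KIB


def lzma2_size(data: bytes, dict_size: int) -> int:
    """Return the size of data compressed as raw LZMA2 with the given dictionary size."""
    filters = [{"id": lzma.FILTER_LZMA2, "preset": 6, "dict_size": dict_size}]
    return len(lzma.compress(data, format=lzma.FORMAT_RAW, filters=filters))


def make_sample(block_size: int, gap_size: int) -> bytes:
    """Build block + unrelated filler + the same block again.

    Random bytes are incompressible on their own, so the only possible
    saving is matching the second copy of the block against the first.
    """
    block = os.urandom(block_size)
    gap = os.urandom(gap_size)
    return block + gap + block


# The two copies start 2.25 MiB apart: out of reach of a 1 MiB dictionary,
# but within reach of a 4 MiB one (scaled-down stand-ins for 32 and 64 MiB).
sample = make_sample(256 * KIB, 2 * MIB)
small = lzma2_size(sample, 1 * MIB)
large = lzma2_size(sample, 4 * MIB)

print("input size:           ", len(sample))
print("1 MiB dictionary size:", small)
print("4 MiB dictionary size:", large)
```

With the smaller dictionary the second copy has to be stored again in full; with the larger one it shrinks to almost nothing, so the two outputs differ by roughly the size of the repeated block.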

I have just tried to compress a couple of small .xcf files from the gimp distribution, and lzip compresses both better than xz, so I guess that 'lzip -9 -s64MiB' should improve the compression ratio of this file.


Hope this helps,
Antonio.


