
Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.


From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Want to Jettison xz(1), But Size Matters.
Date: Sat, 21 Jul 2018 00:08:11 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.9.1.19) Gecko/20110420 SeaMonkey/2.0.14

Hi Ralph,

Ralph Corderoy wrote:
>> It is explained at http://www.nongnu.org/lzip/lzip_benchmark.html#xz2
>
> Thanks, it does indeed explain it.  I skipped that section before
> because of its heading: `Lzip compresses large tarballs more than xz'.
> That read like a claim to me rather than my `Why is xz compressing more
> than lzip' FAQ that's explained within.  :-)

Thanks for the hint. I have just reworded that heading and the next one, because both explain why xz seems to perform better than lzip in some circumstances.


> It now gives the expected results with my large XCF file.  Only 56 MiB
> was required for the dictionary.

As you can see, lzip adjusts the dictionary to the file size.


>      $ stat -c '%s  %n' * | sort -k1,1n -k2
>      20957117  foo.xcf.lzip-m64-s64MiB
>      21001368  foo.xcf.xz-9

You may perhaps obtain slightly better results with the shorter command 'lzip -9s26'.
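For reference, the numeric form of '-s' takes an exponent, so '-s26' requests a dictionary of 2^26 bytes. A quick arithmetic check (plain shell, no lzip invocation needed) that this is the same 64 MiB requested above with '-s 64MiB':

```shell
# -s26 specifies the dictionary size as an exponent: 2^26 bytes.
echo $((1 << 26))                      # 67108864 bytes
echo $(( (1 << 26) / (1024 * 1024) ))  # 64 (MiB), i.e. the same as -s 64MiB
```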


>          2^12 to 2^29 bytes. Note that dictionary sizes are quantized. If
>          the specified size does not match one of the valid sizes, it will
>          be rounded upwards by adding up to (BYTES / 8) to it.
>
> Could info's last sentence be extended slightly with a clue why?

It has nothing to do with lzip's algorithm. It is simply for efficiency. As the dictionary size is just the minimum size of the buffer needed to decompress a file, it does not hurt to allocate a slightly larger buffer. This allows the size to be coded in just one byte, instead of the four bytes used by lzma-alone.
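A rough sketch of that quantization, assuming the coding described in the lzip manual (a valid size is a power of 2, the base size, minus between 0/16 and 7/16 of that base size; the exact rounding code in lzip itself may differ):

```python
# Sketch of lzip's dictionary-size quantization.  Assumption: the single
# header byte stores log2(base size) plus the number of sixteenths of the
# base size subtracted from it, as the lzip manual describes.

def valid_dict_sizes(min_bits=12, max_bits=29):
    """All valid dictionary sizes between 2^12 and 2^29 bytes."""
    sizes = set()
    for n in range(min_bits, max_bits + 1):
        base = 1 << n
        for sixteenths in range(8):              # subtract 0..7 sixteenths
            size = base - sixteenths * (base // 16)
            if (1 << min_bits) <= size <= (1 << max_bits):
                sizes.add(size)
    return sorted(sizes)

def round_up(requested):
    """Round a requested size up to the next valid size."""
    for size in valid_dict_sizes():
        if size >= requested:
            return size
    raise ValueError("requested size above 2^29")

# The gap between consecutive valid sizes is base/16, at most about 1/8
# of the requested size, hence "rounded upwards by adding up to BYTES/8".
print(round_up(56 * 1024 * 1024))  # 56 MiB is already valid: 58720256
print(round_up(60_000_000))        # rounds up to 2^26 - 2^26/16 = 62914560
```

The 56 MiB dictionary mentioned above is exactly 2^26 minus 2/16 of 2^26, which is why lzip could use it as-is.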


> Also, I didn't think the info, or the man page which is always my first
> port of call, explicitly stated that the last setting wins, e.g. `-9 -s
> 64MiB' uses `-9's `-m' of 273, as I think the source currently shows.

Thanks. You are right. I'll make the info manual explicitly state this.


> Thanks for your help.  My ~/bin/toxz, formerly `tobz2', `togz', `toZ',
> for converting already compressed files, has become `tolz'.

My pleasure. :-)

BTW, do you know zutils' zupdate?
http://www.nongnu.org/zutils/zutils.html

"zupdate recompresses files from bzip2, gzip, and xz formats to lzip format. Each original is compared with the new file and then deleted. Only regular files with standard file name extensions are recompressed, other files are ignored. Compressed files are decompressed and then recompressed on the fly; no temporary files are created."


> BTW, I've added a reference to Wikipedia's `LZMA' page, that covers
> `LZMA2' too.  Hopefully, it will remain.
> https://en.wikipedia.org/w/index.php?title=Lempel%E2%80%93Ziv%E2%80%93Markov_chain_algorithm&diff=851144067&oldid=840725747

Thanks. Let's hope so. IIRC, a similar reference was deleted before.

I am surprised that the claim "LZMA2 supports arbitrarily scalable multithreaded compression and decompression" is still in Wikipedia, given the findings in http://www.nongnu.org/lzip/xz_inadequate.html and the fact that, after a decade, xz-utils still does not implement parallel LZMA2 decompression. I think all parallel xz (de)compressors use the same method as plzip (splitting the input file into independent LZMA/LZMA2 members/blocks/streams). None of them seem to use the claimed capabilities of LZMA2.


Best regards,
Antonio.


