lzip-bug
Re: [Lzip-bug] Re: performance: gzip, lzip, xz


From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] Re: performance: gzip, lzip, xz
Date: Tue, 13 Oct 2009 14:32:29 +0200
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905

Hello Jim,

Thanks for your interest in lzip. I hope I'll be able to convince you that lzip is better than you think. :-)


Jim Meyering wrote:
> Claiming that xz has no clear goal seems mildly libelous.

I am not trying to discredit anybody. I am only stating that the xz format is far from ready for general use.

Maybe xz has a clear goal, but I have been unable to discover what it could be. Perhaps its goal is to find out the limit between format flexibility and format security, given the number of times the xz format had to be changed due to security problems.

Clearly, long-term stability is not the goal of xz. Just read the README file for 4.999.9beta, line 51: "Since the .xz format allows adding new filter IDs, it is possible that some day there will be a filter that is, for example, much faster to compress than LZMA2 (but probably with worse compression ratio). Similarly, it is possible that some day there is a filter that will compress better than LZMA2".

Will the old filters be removed as new ones are added, leaving users without support for old files, or will xz become increasingly bloated by old filters that almost nobody uses?

In any case, one does not need to be an IBM engineer to notice that xz's goal is not as clear as lzip's:
http://lpar.ath0.com/2009/09/25/documentation-as-an-indicator-of-code-quality/
"Comparing the two, I see that xz has many more options. It has all kinds of tweaks to specify how much memory it uses, tweak various internal details of the LZMA algorithm, and filter the data. None of these options are adequately explained. To quote Ted Nelson quoting Roger Gregory, "An option means the programmer didn't have a clear idea of what the module was supposed to do." Or as Steve Krug puts it, "Don't make me think."

In contrast, lzip's user interface is much simpler, and closer to the Unix philosophy of "do one thing, and do it well". The only two tweaks to the LZMA algorithm lzip provides are adequately explained if you know the basics of how compression algorithms tend to work, and there's a table showing how they correspond to the compression levels -0 to -9. The only borderline gratuitous option is to split the compressed file into chunks, and that's at least a useful one. It also gets the SI units right.

So, lzip wins by a landslide on UI and documentation".


> The .xz format is in no way an archive-like format. You cannot store
> file names in .xz, and .xz supports even less metadata than .gz.

By archiver-like I mean it is way too complicated for a general purpose compressor and it includes features I have only found in archiver formats, like the subblock filter.


> Regarding the possibility of recovery, there are not many differences
> between .xz and .lz.

There is an important difference: in case of data corruption, the xz format can fail in a thousand more ways than the much simpler lzip format. This is the reason lzip already has a recovery tool and XZ Utils does not. Just compare the formats to see what I mean.
http://www.nongnu.org/lzip/manual/lzip_manual.html#File-Format
http://tukaani.org/xz/xz-file-format-1.0.4.txt

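One inconsistency that can make even the detection of data corruption in xz files difficult is that the format only requires implementations to support CRC32[1], while the xz tool uses CRC64 by default[2]. A decoder that implements only what the format requires therefore cannot verify files written with the tool's default settings.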
[1] see xz-file-format-1.0.4.txt, line 353.
[2] see "man xz", line 362.
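
To make the mismatch concrete, here is a minimal Python sketch (not code from xz or lzip). It reads the check type from the 12-byte .xz stream header and asks whether a decoder limited to CRC32 could verify the file. The header layout and check IDs are taken from the xz file format document linked above; the function names and the "CRC32-only decoder" are assumptions made for illustration.

```python
import struct
import zlib

XZ_MAGIC = b"\xfd7zXZ\x00"

# Check IDs as listed in the .xz file format document.
CHECK_NAMES = {0x00: "None", 0x01: "CRC32", 0x04: "CRC64", 0x0A: "SHA-256"}


def stream_check_id(header):
    """Return the check ID stored in a 12-byte .xz stream header."""
    if len(header) < 12 or header[:6] != XZ_MAGIC:
        raise ValueError("not an .xz stream header")
    flags = header[6:8]
    # The two stream-flag bytes are protected by their own little-endian CRC32.
    (stored_crc,) = struct.unpack("<I", header[8:12])
    if zlib.crc32(flags) != stored_crc:
        raise ValueError("corrupt stream flags")
    # The low four bits of the second flag byte select the check type.
    return flags[1] & 0x0F


def can_verify(header, supported=frozenset({0x01})):
    """Hypothetical decoder that supports only the listed check IDs."""
    return stream_check_id(header) in supported
```

With the default argument, `can_verify` returns False for a header whose check ID is 0x04 (CRC64), which is the case described above.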


> Claiming long-term stability of the .lz format is a stretch.

The lzip format is definitive. It offers the same capabilities as bzip2. If some day I discover a better compression algorithm and decide to implement it, I'll write a new compressor and format. Remember, "do one thing, and do it well".


> The file format has changed at least once (probably twice, but I'm
> not sure) since the first stable release.  Older versions of lzip
> cannot decompress new format files.  The same can and (I'm sure) will
> happen with .xz too, but in case of .lz, it has been about adding basic
> features that .xz had in the first place.

The lzip format has changed exactly once since the first released version. That change did two things: it added the "member size" field, to improve the recovery of undamaged members from multimember files, and it extended the coding of the dictionary size in the member header to support more fine-grained values.
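
Both fields are easy to show in a short Python sketch (an illustration, not code from lzip or its recovery tool). It assumes the layout given in the File Format section of the lzip manual linked above: a 6-byte header ("LZIP", version, coded dictionary size) and a 20-byte trailer (CRC32, data size, member size, all little-endian). The function names are hypothetical.

```python
import struct

HEADER_SIZE = 6    # "LZIP" magic, version byte, coded dictionary size byte
TRAILER_SIZE = 20  # CRC32 (4 bytes), data size (8 bytes), member size (8 bytes)


def dictionary_size(ds_byte):
    """Decode the coded dictionary size byte of a member header."""
    base = 1 << (ds_byte & 0x1F)      # bits 4-0: log2 of the base size
    wedges = (ds_byte >> 5) & 0x07    # bits 7-5: sixteenths of base to subtract
    return base - wedges * (base // 16)


def member_offsets(data):
    """Find the start of every member by walking backwards from the end,
    using only the member size stored in each trailer."""
    offsets = []
    end = len(data)
    while end > 0:
        if end < HEADER_SIZE + TRAILER_SIZE:
            raise ValueError("truncated member")
        _crc, _data_size, member_size = struct.unpack(
            "<IQQ", data[end - TRAILER_SIZE:end])
        if member_size < HEADER_SIZE + TRAILER_SIZE or member_size > end:
            raise ValueError("bad member size")
        start = end - member_size
        if data[start:start + 4] != b"LZIP":
            raise ValueError("member header not found")
        offsets.append(start)
        end = start
    return offsets[::-1]
```

For example, `dictionary_size(0xD3)` is 2^19 minus 6 wedges of 2^15 bytes, that is 320 KiB. Because every trailer records the total size of its own member, the member boundaries can be located without decompressing anything.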

I do not see those changes as "basic features", and certainly data recovery is not present in xz even now.


Regards,
Antonio.



