lzip-bug

Re: Fast lzma radix matchfinder


From: Adam Tuja
Subject: Re: Fast lzma radix matchfinder
Date: Wed, 15 Jun 2022 02:50:24 +0200

Hello,
 
> Speed gains depend on the nature of the source data
That is more or less true of every LZ compressor, and in the general case it doesn't change much. Some specific data compressed worse than with lzma and some compressed better, but the difference wasn't big and was only exceptionally noticeable.
 
> to achieve about the same ratio as 7-Zip requires double the dictionary size
In general, to stay "compatible" with lzma compression ratios, the author chose to increase the dictionary size. It's described in `man fxz`, under "Compression preset levels".
 
The same ratio could, more or less, be achieved by adjusting the match finder. In practice this doesn't work so well, and increasing the dictionary is the better way.
To illustrate this I used lzip's presets in xz and fastlzma2; I also increased the match length by 50% but, as it turned out, that didn't change much. [1]
Given that compressing with the increased dictionary is still 2 times faster, and that using more processors doesn't take much more memory either, it was the obvious choice.
 
A larger dictionary increases the decompression memory requirement, but that requirement is still 6 times smaller than what compression needs. And these days phones have 8+ times more memory than the highest preset (128MB).
 
Speaking of dictionary sizes and presets, I'm surprised that lzip's presets for levels 8 and 9 don't increase by 100% as the lower levels do, and are not 32M and 64M respectively.
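
The point about levels 8 and 9 can be made concrete with a minimal sketch. The dictionary sizes below are assumed from the preset table in the lzip manual (levels 5 to 9) and should be checked against the installed version; the helper names are hypothetical and exist only for this illustration.

```python
# Assumed lzip preset dictionary sizes in MiB for levels 5..9
# (taken from the lzip manual's preset table; verify for your version).
LZIP_DICT_MIB = {5: 4, 6: 8, 7: 16, 8: 24, 9: 32}


def growth_factors(sizes):
    """Return, for each level, its dictionary size divided by the previous level's."""
    levels = sorted(sizes)
    return {lvl: sizes[lvl] / sizes[prev] for prev, lvl in zip(levels, levels[1:])}


def doubled_from(sizes, last_kept_level):
    """Hypothetical sizes if every level above last_kept_level doubled its predecessor."""
    out = {}
    for lvl in sorted(sizes):
        if lvl <= last_kept_level or not out:
            out[lvl] = sizes[lvl]
        else:
            out[lvl] = out[lvl - 1] * 2
    return out


if __name__ == "__main__":
    print("growth per level:", growth_factors(LZIP_DICT_MIB))
    print("if doubling continued past level 7:", doubled_from(LZIP_DICT_MIB, 7))
```

Under these assumed sizes the growth factor is 2 up to level 7 and drops below 2 for levels 8 and 9; continuing the doubling would give the 32M and 64M mentioned above.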
 
> Also, having a level 11 that compresses less than level 9 is confusing to users.
Compressors that offer more than one algorithm distinguish between them in exactly this way. [2]
As long as it's stated in the manual/help, it should be no problem.
 
 
> Increasing the number of levels also hinders data recovery
How so? It produces an lzma stream that can be decompressed by an lzma decompressor. The decompressor neither knows nor cares about levels.
 
> options like -11 or -19 are not compatible with POSIX or GNU standards
Then maybe an option to choose the mode instead. There are two modes already, fast and normal. They are not selectable at the moment but, again, the decompressor neither knows nor cares about that; it only needs to know the dictionary size.
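
For context on the quoted objection: POSIX-style option syntax lets single-letter options without arguments be grouped behind one '-', so a getopt-style parser reads -11 as -1 given twice rather than as "level 11". A minimal sketch using Python's standard getopt module; the option string and the function name are assumptions for illustration, not lzip's actual parser.

```python
import getopt

# Hypothetical option string: ten flag options -0 .. -9, none taking an argument.
LEVEL_FLAGS = "0123456789"


def parse_levels(argv):
    """Return the level digits that a POSIX-style getopt sees in argv."""
    opts, _rest = getopt.getopt(argv, LEVEL_FLAGS)
    return [name.lstrip("-") for name, _value in opts]


if __name__ == "__main__":
    for args in (["-9"], ["-11"], ["-19"]):
        print(args, "->", parse_levels(args))
```

With this option string, "-11" comes back as two separate "1" options and "-19" as "1" followed by "9", which is why a two-digit level cannot be expressed this way without departing from the standard syntax.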
 
Anyway, you may find something useful there.
 
 
[1] https://pastebin.com/ckEv4Yc3
[2] for example: https://github.com/inikep/lizard
 


14.06.2022, 18:23, "Antonio Diaz Diaz" <antonio@gnu.org>:

Adam Tuja wrote:

> The comparison here would be the same as with lzma, that is slightly faster. [1]
> Bigger advantage, beside compression speed, is revealed in memory consumption
> for multiple threads - it's halved for single thread but 1/4 for 2 threads and
> 1/8 for 4 threads [1][2].


Very interesting. Thank you for bringing this to my attention. I expect to
look at it in depth when I find the time, but I guess it may be difficult
(or impossible) to integrate it meaningfully into plzip because it seems
very different from what plzip does. See for example
https://github.com/conor42/fast-lzma2#readme

"Speed gains depend on the nature of the source data."

"The largest caveat is that the match-finder is a block algorithm, and to
achieve about the same ratio as 7-Zip requires double the dictionary size,
which raises the decompression memory usage."

>> it is not worth the trouble of breaking lzip's reproducibility
> Don't know what you mean by "reproducibility"


Lzip is more than a compressor. It is a set of tools designed around a
format tuned for long-term archiving. It is important that the output of
lzip does not change frequently between versions because such changes may
hinder some kinds of data recovery. See for example
http://www.nongnu.org/lzip/manual/lziprecover_manual.html#Reproducing-one-sector

We need to think about the consequences of the consequences (sic) of any
change to the interface or to the algorithm.

> but I didn't mean to replace current encoder/s, rather complement them.
> If it was used it could be different compression levels, like 11-19.


Increasing the number of levels also hinders data recovery.

Moreover, options like -11 or -19 are not compatible with POSIX or GNU
standards. See
http://www.nongnu.org/arg-parser/manual/arg_parser_manual.html#Argument-syntax

Also, having a level 11 that compresses less than level 9 is confusing to users.

So these may also be difficult to integrate meaningfully into lzip.

Best regards,
Antonio.

