lzip-bug

Re: [Lzip-bug] reducing memory usage when decompressing


From: John Reiser
Subject: Re: [Lzip-bug] reducing memory usage when decompressing
Date: Tue, 09 Dec 2008 10:33:04 -0800
User-agent: Thunderbird 2.0.0.18 (X11/20081119)

Hi Antonio,

> Gzip also stores the uncompressed size in the trailer because counting
> the bytes is the only way of being sure the size is correct.

If the size (or other metadata, such as timestamps) of an input file
after compression differs from what it was before compression, then the
compressor has detected an inconsistency and should warn the user.  This
should happen rarely, but when it does happen it is very significant.
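
A minimal sketch of such a check, in Python rather than in lzip's own
code.  The function name and the use of zlib are assumptions made only
for illustration; the point is the comparison of stat() results taken
before and after the data has been read.

```python
import os
import zlib

def compress_with_consistency_check(path, chunk_size=65536):
    """Compress `path` in memory; report if the file changed meanwhile.

    Hypothetical helper, not part of lzip or gzip.
    Returns (compressed_bytes, warnings).
    """
    before = os.stat(path)               # metadata before reading
    compressor = zlib.compressobj()
    pieces = []
    bytes_read = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            bytes_read += len(chunk)     # count what was actually compressed
            pieces.append(compressor.compress(chunk))
    pieces.append(compressor.flush())
    after = os.stat(path)                # metadata after reading

    warnings = []
    if bytes_read != before.st_size:
        warnings.append("bytes read differ from size before compression")
    if after.st_size != before.st_size:
        warnings.append("file size changed during compression")
    if after.st_mtime_ns != before.st_mtime_ns:
        warnings.append("modification time changed during compression")
    return b"".join(pieces), warnings
```

A non-empty warnings list means the input was modified while it was
being compressed, so the output may not correspond to any single state
of the file.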

> It would be attractive if one could know the uncompressed size in
> advance. Also note that I have not found any file that needed an array
> more than 2 times its size to achieve maximum compression, so they are
> probably rare.

Can you give a concrete example of a file which requires close to 2 times
its size to achieve maximum compression?  Proving such a bound would be welcome.

> Allowing an array size independent of buffer size can make lzip slower
> and more complex.

The purpose is to control the buffer size that the decompressor requires.
The match detector could just suppress (or avoid discovering) any match
with a distance that exceeds a threshold.  This should have no effect
on the correctness of any other part of the compressor.  At least, there
is no documentation that mentions any requirement to discover all matches.
Such a requirement would preclude various heuristic match detectors.
In most other compressors (zlib/gzip, bzip2, lzma, ...) fewer matches
mean faster execution.
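
To illustrate, here is a toy greedy LZ77-style sketch in Python.  It is
not lzip's match finder; the names (find_longest_match, lz77_tokens,
lz77_decode, max_distance) and the brute-force search are assumptions
chosen for brevity.  It only shows that capping the match distance is
local to the match detector and leaves the decoder unchanged.

```python
def find_longest_match(data, pos, max_distance, min_length=3, max_length=258):
    """Return (length, distance) of the longest match at `pos`, or (0, 0).

    Candidates farther back than `max_distance` are never examined.
    """
    best_len, best_dist = 0, 0
    start = max(0, pos - max_distance)        # the distance threshold
    limit = min(max_length, len(data) - pos)
    for cand in range(start, pos):
        length = 0
        while length < limit and data[cand + length] == data[pos + length]:
            length += 1
        if length > 0 and length >= best_len:  # ties go to the nearer match
            best_len, best_dist = length, pos - cand
    if best_len >= min_length:
        return best_len, best_dist
    return 0, 0

def lz77_tokens(data, max_distance):
    """Greedy parse into literals (int) and matches ((length, distance))."""
    tokens = []
    pos = 0
    while pos < len(data):
        length, dist = find_longest_match(data, pos, max_distance)
        if length:
            tokens.append((length, dist))
            pos += length
        else:
            tokens.append(data[pos])
            pos += 1
    return tokens

def lz77_decode(tokens):
    """Decoder; it only ever looks back as far as the largest distance."""
    out = bytearray()
    for tok in tokens:
        if isinstance(tok, tuple):
            length, dist = tok
            for _ in range(length):
                out.append(out[-dist])        # byte-wise copy allows overlap
        else:
            out.append(tok)
    return bytes(out)
```

With max_distance set to, say, 8, every emitted distance is at most 8,
so a decoder for this toy format needs only an 8-byte history no matter
how large the input is.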

> I would like to see proof that larger arrays improve
> compression significantly before implementing it.

Well, your file that needs twice its size to achieve maximum compression
should be one example.  I'll look for more.

Regards,

-- 
John Reiser
