
Re: [Lzip-bug] reducing memory usage when decompressing


From: Antonio Diaz Diaz
Subject: Re: [Lzip-bug] reducing memory usage when decompressing
Date: Sat, 06 Dec 2008 19:04:01 +0100
User-agent: Mozilla/5.0 (X11; U; Linux i586; en-US; rv:1.7.11) Gecko/20050905

Hello John,

John Reiser wrote:
> When decompressing a compressed file, it seems to me that there is no benefit
> to having a decompression buffer that is larger than the original file,
> because any match distance (for copying) must be less.  The decompressor
> could save space by allocating only
>    min((1<<dictionary_bits), uncompressed_size)
> bytes for the buffer.  It is somewhat unfortunate that the uncompressed size
> appears not in the header of the compressed data, but only in the trailer.

Gzip also stores the uncompressed size in the trailer; counting the bytes as they are compressed is the only way of being sure the size is correct, and by then the header has already been written.
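
For illustration only, here is a minimal C++ sketch of the buffer-capping idea John describes; the function name and parameters are hypothetical, not lzip's actual code:

    #include <algorithm>
    #include <cstdint>
    #include <vector>

    // Hypothetical helper: allocate a decompression buffer no larger than
    // the data it must hold. No match distance can reach back beyond the
    // start of the uncompressed data, so capping at uncompressed_size is
    // safe whenever that size is known in advance.
    std::vector<uint8_t> make_decode_buffer( const unsigned dictionary_bits,
                                             const uint64_t uncompressed_size )
      {
      const uint64_t dict_size = uint64_t( 1 ) << dictionary_bits;
      return std::vector<uint8_t>( std::min( dict_size, uncompressed_size ) );
      }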


> The usage model of limiting the size of the decompression buffer, but
> still allowing the compressor to achieve tight compression by using
> larger arrays for the probability model, longer than
>    (1 << ceil(log2(uncompressed_size))) ,
> is also attractive.  However, lzip has coupled together the buffer size
> and the model size during compression.

It would be attractive if one could know the uncompressed size in advance. Also note that I have not found any file that needed an array larger than twice the file's own size to achieve maximum compression, so such files are probably rare.
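
Purely as an illustration of the formula John quotes, the coupled size could be computed as the smallest power of two covering the whole input; the minimum and maximum below are assumptions, not lzip's actual limits:

    #include <cstdint>

    // Smallest power of two >= uncompressed_size, i.e.
    // 1 << ceil(log2(uncompressed_size)). The minimum and maximum are
    // assumed values; the cap also keeps the loop from overflowing.
    uint64_t coupled_dictionary_size( const uint64_t uncompressed_size )
      {
      const uint64_t max_size = uint64_t( 1 ) << 29;  // assumed 512 MiB cap
      uint64_t size = uint64_t( 1 ) << 12;            // assumed 4 KiB minimum
      while( size < uncompressed_size && size < max_size ) size <<= 1;
      return size;
      }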


> What are your thoughts about reducing memory usage when decompressing,
> and allowing a model size that is independent of buffer size?

The only two ways I see of reducing memory usage when decompressing are being careful when compressing, or overriding the dictionary size stored in the file header with a command-line option.
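
A hypothetical sketch of the second option; the command-line plumbing is omitted and all names are invented for illustration. Note that decompressing with a smaller dictionary must fail cleanly if any match distance exceeds it:

    #include <cstdint>

    // Hypothetical: choose the dictionary size to use when decompressing.
    // 'user_override' would come from a command-line option; 0 means none.
    // The decoder must then reject any match distance >= the chosen size.
    uint64_t effective_dictionary_size( const uint64_t header_size,
                                        const uint64_t user_override )
      {
      if( user_override > 0 && user_override < header_size )
        return user_override;  // use less memory, at the risk of failure
      return header_size;
      }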

Allowing an array size independent of the buffer size would make lzip slower and more complex. I would like to see proof that larger arrays improve compression significantly before implementing it.


Best regards,
Antonio.



