bug-gzip

bug#48424: bug in "gzip -lv gzip-file"


From: Adler, Mark
Subject: bug#48424: bug in "gzip -lv gzip-file"
Date: Fri, 14 May 2021 19:53:22 +0000

Robert,

No, it’s not that the gzip utility implementation is using the wrong size 
integer. This is because the gzip utility is using the gzip-format trailer to 
guess at the uncompressed length. That trailer has a four-byte length, which is 
the uncompressed length of the last member modulo 2^32. Sometimes the guess is 
wrong.
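
To illustrate the point about the trailer, here is a minimal Python sketch, not what gzip itself does internally. It assumes the whole file fits in memory and uses a hypothetical helper name, read_trailer. The last eight bytes of a gzip member are the CRC-32 followed by the length, both little-endian 32-bit values. The final line shows the modulo arithmetic for a 5 GiB input.

```python
import gzip
import struct
import zlib


def read_trailer(gz_bytes):
    """Return (crc32, isize) taken from the last 8 bytes of a gzip stream.

    Hypothetical helper for illustration only; isize is the uncompressed
    length of the last member modulo 2**32.
    """
    crc32, isize = struct.unpack("<II", gz_bytes[-8:])
    return crc32, isize


# A small single-member file: the trailer matches the real data.
payload = b"hello, gzip\n" * 1000
blob = gzip.compress(payload)
crc32, isize = read_trailer(blob)
print(isize == len(payload), crc32 == zlib.crc32(payload))

# What a four-byte length field would hold for a 5 GiB input.
reported_for_5gib = (5 * 2**30) % 2**32
print(reported_for_5gib)
```

The last value printed is 1073741824, which is the figure in the "uncompressed" column of the original report.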

The only way around this limitation, which is built into the gzip format, 
would be to decode the entire file to determine the actual uncompressed length. 
pigz will do this on request with the -lt option.

There is no way to both rapidly and reliably get the uncompressed length.
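
As a rough sketch of the slow but reliable approach (this is not how pigz -lt is implemented, and the function name actual_uncompressed_length is made up for the example), Python's gzip module can decode the whole stream in chunks and count the bytes that come out:

```python
import gzip
import io


def actual_uncompressed_length(fileobj, chunk_size=1 << 20):
    """Decode an entire gzip stream and return the total bytes produced.

    Reads in chunks so the uncompressed data is never held in memory
    all at once. The cost is a full decompression pass over the file.
    """
    total = 0
    with gzip.GzipFile(fileobj=fileobj, mode="rb") as gz:
        while True:
            chunk = gz.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total


# Two members concatenated into one stream; both are counted.
blob = gzip.compress(b"x" * 1000) + gzip.compress(b"y" * 234)
total_length = actual_uncompressed_length(io.BytesIO(blob))
print(total_length)
```

The count is kept in an ordinary Python integer, so it is not limited to 32 bits.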

What’s more, an uncompressed length of 4 GiB or more is not the only way for 
gzip -l to be wrong. gzip streams can consist of multiple members, in which 
case gzip -l will report the length from only the last member. Here is an 
example, first correctly enumerated by pigz -ltv:

% pigz -ltv mult.gz
method    check    timestamp    compressed   original reduced  name
gzip 8  66007dba  Mar 21  2005       54405     152089   64.2%  alice
gzip 8  b56c3f9d  Mar 21  2005          13         14    7.1%  <...>
gzip 8  8efc3b00  Mar 21  2005       71667     296960   75.9%  <...>

gzip -lv will give information only from the last member:

% gzip -lv mult.gz
method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
defla 8efc3b00 Feb  2 09:30              126145              296960  57.5% mult

pigz -lv looks only at the trailer for the CRC and length, just like gzip, and 
also gets it wrong:

% pigz -lv mult.gz
method    check    timestamp    compressed   original reduced  name
gzip 8  8efc3b00  Mar 21  2005      126121     296960   57.5%  alice
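
The difference between the per-member listing and the trailer-only listing can be reproduced with a small Python sketch. The assumptions: member_lengths is a hypothetical helper, the three members are synthetic data chosen to have the same uncompressed lengths as in mult.gz, and the stream has no trailing garbage. It decodes one member at a time with zlib and compares the result with the four-byte length at the end of the file.

```python
import gzip
import io
import struct
import zlib


def member_lengths(stream, chunk_size=65536):
    """Return the uncompressed length of each gzip member in stream."""
    lengths = []
    decomp = zlib.decompressobj(wbits=31)  # wbits=31 selects the gzip wrapper
    count = 0
    pending = b""   # input left over after the end of a member
    fed = False     # True once the current member has received input
    while True:
        chunk = pending or stream.read(chunk_size)
        pending = b""
        if not chunk:
            break
        fed = True
        count += len(decomp.decompress(chunk))
        if decomp.eof:
            # Member finished; the rest of the input starts the next one.
            lengths.append(count)
            pending = decomp.unused_data
            decomp = zlib.decompressobj(wbits=31)
            count = 0
            fed = False
    if fed:
        raise ValueError("stream ends in the middle of a member")
    return lengths


# Synthetic stand-in for mult.gz: three members, same lengths as above.
blob = (gzip.compress(b"a" * 152089)
        + gzip.compress(b"b" * 14)
        + gzip.compress(b"c" * 296960))

per_member = member_lengths(io.BytesIO(blob))
trailer_length = struct.unpack("<I", blob[-4:])[0]
print(per_member, sum(per_member), trailer_length)
```

Only the full decode sees all three members. The trailer length is that of the last member alone.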

Mark


On May 14, 2021, at 11:01 AM, Robert Urban <robert.urban@stromasys.com> wrote:

Hello,

gzip (at least my version, v1.10 running on Fedora 33) apparently uses an
unsigned 32-bit value when displaying the uncompressed size of a gzipped file.

This demonstrates the problem:

Create a 5GiB test file:

   $ fallocate -l $((5*1024*1024*1024)) fatfile

Compress it:

   $ gzip -c fatfile > fatfile.gz

List the contents:

   $ gzip -lv fatfile.gz
   method  crc     date  time           compressed        uncompressed  ratio uncompressed_name
   defla 193838c3 May 14 19:53             5857306          1073741824  99.5% fatfile

As you can see, the value in the "uncompressed" column is exactly 1GiB.

Regards,
Robert Urban

Please cc me in replies, as I'm not a subscriber of the list


