--- Begin Message ---
Subject: gzip -l reports wrong size for decompressed files larger than 4GB
Date: Sun, 25 Mar 2018 10:42:42 +0200
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:49.0) Gecko/20100101 Firefox/49.0 SeaMonkey/2.46
Hello!
I am using gzip 1.6 from openSUSE Leap 42.3 with the latest patches:
$ file /usr/bin/gzip
/usr/bin/gzip: ELF 64-bit LSB executable, x86-64, version 1 (SYSV), dynamically linked, interpreter /lib64/ld-linux-x86-64.so.2, for GNU/Linux 3.0.0, BuildID[sha1]=7103d56e17e6f81a52db927e393dce601c3af0e1, stripped
There is a compressed file available at https://data.dnb.de/opendata/GND.rdf.gz which has a size of 1,232,465,678 bytes.
Uncompressed, it has a size of 19,465,374,298 bytes.
The problem is:
$ gzip -l GND.rdf.gz
         compressed        uncompressed  ratio uncompressed_name
         1232465678          2285505114  46.1% GND.rdf
The reported number, 2285505114, is just the lower 32 bits of the real size (about 19 GB), that is, the real size minus 16 GiB:
$ echo "19465374298-16*1024*1024*1024" | bc
2285505114
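
For illustration, the wrap-around can be reproduced with a short Python sketch. It assumes the standard gzip layout from RFC 1952, where the last four bytes of a member hold the uncompressed size modulo 2^32 as a little-endian integer. The helper names trailer_isize and wrapped_size are hypothetical and are not part of gzip.

```python
import gzip
import struct

def trailer_isize(gz_bytes: bytes) -> int:
    """Return the ISIZE field: the last 4 bytes of a gzip member, little-endian."""
    return struct.unpack("<I", gz_bytes[-4:])[0]

def wrapped_size(real_size: int) -> int:
    """Size that a 32-bit ISIZE field can represent for a given real size."""
    return real_size % 2**32

# Small payload: the stored field matches the real size.
small = gzip.compress(b"x" * 1000)
print(trailer_isize(small))        # 1000

# Size from this report: only the low 32 bits survive.
print(wrapped_size(19465374298))   # 2285505114
```

Any tool that only reads this trailer field cannot tell a 2.1 GiB file from one that is a multiple of 4 GiB larger, regardless of whether the tool itself is a 32-bit or 64-bit build.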
Such behaviour is acceptable for 32-bit software, but a 64-bit build should
show the correct number.
Thanks
Wolfgang
--- End Message ---
--- Begin Message ---
Subject: Re: bug#17804: RFC: fixing the 32-bit size and time limits in gzip file format
Date: Wed, 15 Dec 2021 18:33:12 -0800
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101 Thunderbird/91.3.1
On 6/18/14 15:12, Paul Eggert wrote:
> One simple way forward would be to implement what pigz -tl does, namely,
> decompress the input stream and discard the output, but print its size.
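
A minimal Python sketch of that idea, not the code from gzip or pigz: decompress everything, throw the data away, and only count the bytes. The function name count_uncompressed_bytes is hypothetical; it relies on Python's standard gzip module, which also reads files made of several concatenated members.

```python
import gzip
import io

def count_uncompressed_bytes(source, chunk_size=1 << 20):
    """Decompress a gzip file (path or binary file object) and return the byte count.

    The decompressed data is discarded chunk by chunk, so memory use stays
    bounded by chunk_size no matter how large the result is.
    """
    total = 0  # Python integers do not wrap at 2**32
    with gzip.open(source, "rb") as stream:
        while True:
            chunk = stream.read(chunk_size)
            if not chunk:
                break
            total += len(chunk)
    return total

# Demo on an in-memory file with two concatenated gzip members.
two_members = gzip.compress(b"a" * 10) + gzip.compress(b"b" * 20)
print(count_uncompressed_bytes(io.BytesIO(two_members)))  # 30
```

The cost of this approach is that the whole file has to be decompressed, so reporting the size takes about as long as decompressing the file.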
I finally got around to implementing that suggestion:
https://git.savannah.gnu.org/cgit/gzip.git/commit/?id=cf26200380585019e927fe3cf5c0ecb7c8b3ef14
https://git.savannah.gnu.org/cgit/gzip.git/commit/?id=32fef43b442c7abc70414863d64718cd06f6477a
So I am closing this old bug report.
--- End Message ---