bug-coreutils

bug#38621: gdu showing different sizes


From: Bernhard Voelker
Subject: bug#38621: gdu showing different sizes
Date: Sun, 15 Dec 2019 22:19:29 +0100
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:68.0) Gecko/20100101 Thunderbird/68.3.0

tag 38621 notabug
close 38621
stop

On 2019-12-15 06:15, TJ Luoma wrote:
> I ended up with two version of the same file
> 'StreamDeck-4.4.2.12189.pkg' and 'Stream_Deck_4.4.2.12189.pkg' and
> wanted to check to see if they were the same file.
> 
> I checked the size with `gdu` like so:
> 
> % /usr/local/bin/gdu --si -s *pkg
> 101M     StreamDeck-4.4.2.12189.pkg
> 102M     Stream_Deck_4.4.2.12189.pkg
> 
> Which led me to think they were different files / sizes. But when I
> used `ls -l` I was surprised to see this:
> 
> % command ls -l *pkg
> -rw-r--r--  1 tjluoma  staff  88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma  staff  88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg
> 
> So they _are_ the same size. Are they the same file? I used `md5` to check
> 
> % command md5 -r *pkg
> 98ac563a36386ca3aa87f62893302b4f StreamDeck-4.4.2.12189.pkg
> 98ac563a36386ca3aa87f62893302b4f Stream_Deck_4.4.2.12189.pkg
> 
> OK, so these are exactly the same file. So… why did `gdu` tell me they
> are different sizes?
> 
> %  gdu --version
> du (GNU coreutils) 8.31
> Copyright (C) 2019 Free Software Foundation, Inc.
> License GPLv3+: GNU GPL version 3 or later <https://gnu.org/licenses/gpl.html>.
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> 
> Written by Torbjorn Granlund, David MacKenzie, Paul Eggert,
> and Jim Meyering.
> 
> I'm using Mac OS X 10.14.6 (18G2022) with `coreutils` installed via `brew`.
> 
> Any help would be appreciated.

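This is a "sparse" file, i.e., a file containing long runs of zeroes
which the file system can store more efficiently on disk.
Any application reading the data still gets the correct number of zeroes,
while some disk space is saved.  By default, du(1) reports the space a file
occupies on disk, not the number of bytes an application would read, so the
two figures can differ.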

E.g., the following creates a 300M file in which the first and the last 100M
hold random data and the 100M in between is a "hole":

  # Write the 1st 100M (as usual).
  $ dd bs=1M count=100 if=/dev/urandom of=f
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 0.466356 s, 225 MB/s

  # Write another 100M, but starting at an offset of 200M,
  # thus leaving a hole (read back as zeroes) in between.
  $ dd bs=1M seek=200 count=100 if=/dev/urandom of=f
  100+0 records in
  100+0 records out
  104857600 bytes (105 MB, 100 MiB) copied, 0.462072 s, 227 MB/s

  $ ls -logh f
  -rw-r--r-- 1 300M Dec 15 18:17 f

  $ du -h f  # shows the space occupied on disk.
  200M  f

  $ du --apparent-size -h f  # shows the size applications would read.
  300M  f
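
The same comparison can be scripted without GNU-specific options.  The
following is only a minimal sketch, assuming a POSIX-like shell with mktemp
available; the scratch file name "sparse_demo" is made up for illustration,
and the disk usage reported depends on whether the file system supports holes:

```shell
# Sketch: compare apparent size with allocated disk space for a sparse file.
# "sparse_demo" is a hypothetical scratch file name, not from the bug report.
f=$(mktemp "${TMPDIR:-/tmp}/sparse_demo.XXXXXX")

# Write a single byte at offset 10 MiB - 1; everything before it is a hole
# on file systems that support sparse files.
dd if=/dev/zero of="$f" bs=1 count=1 seek=10485759 2>/dev/null

# Size applications would read, in bytes.
apparent_bytes=$(wc -c < "$f" | tr -d ' ')
# Space occupied on disk, in KiB.
disk_kib=$(du -k "$f" | awk '{print $1}')

echo "apparent: $apparent_bytes bytes"
echo "on disk:  $disk_kib KiB"

rm -f "$f"
```

On a file system with sparse-file support, the "on disk" figure should be far
below the 10240 KiB apparent size; elsewhere the two may be about equal.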

See the documentation of 'cp' and 'du':
https://www.gnu.org/software/coreutils/cp  (the --sparse option)
https://www.gnu.org/software/coreutils/du  (the --apparent-size option)
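
To check whether two files have the same content regardless of how much disk
space each occupies, comparing the bytes directly is more reliable than
comparing du output.  A minimal sketch, assuming cmp and mktemp are available;
the directory and file names below are placeholders, not the files from the
report:

```shell
# Sketch: check whether two files have identical content, independent of
# how much disk space each one occupies. File names are hypothetical.
dir=$(mktemp -d "${TMPDIR:-/tmp}/cmp_demo.XXXXXX")
printf 'same payload\n' > "$dir/a.pkg"
cp "$dir/a.pkg" "$dir/b.pkg"

# cmp -s prints nothing and exits 0 only if the files are byte-for-byte equal.
if cmp -s "$dir/a.pkg" "$dir/b.pkg"; then
  result=identical
else
  result=different
fi
echo "$result"

# wc -c reports the byte count applications would read (the apparent size).
size_a=$(wc -c < "$dir/a.pkg" | tr -d ' ')
size_b=$(wc -c < "$dir/b.pkg" | tr -d ' ')
echo "$size_a $size_b"

rm -rf "$dir"
```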

As this is not a bug in du(1), I'm marking it as such and closing the ticket
in our bug tracker.  The discussion can continue, of course.

Have a nice day,
Berny