bug-coreutils

bug#38621: gdu showing different sizes


From: Bob Proulx
Subject: bug#38621: gdu showing different sizes
Date: Mon, 16 Dec 2019 13:51:38 -0700

TJ Luoma wrote:
> AHA! Ok, now I understand a little better. I have seen the difference
> between "size" and "size on disk" and did not realize that applied
> here.
>
> I'm still not 100% clear on _why_ two "identical" files would have
> different results for "size on disk" (it _seems_ like those should be
> identical) but I suspect that the answer is probably of a technical
> nature that would be "over my head" so to speak, and truthfully, all I
> really need to know is "sometimes that happens" rather than
> understanding the technical details of why.

I think the confusion began right at the start, because the two
commands are named for the different things they were intended to
show.

  'du' is named for showing disk usage

  'ls' is named for listing files

And those are rather different things!  Let's dig into the details.

The ls documentation for the long format says:

  ‘-l’
  ‘--format=long’
  ‘--format=verbose’
       In addition to the name of each file, print the file type, file
       mode bits, number of hard links, owner name, group name, size, and
       timestamp (*note Formatting file timestamps::), normally the
       modification timestamp (the mtime, *note File timestamps::).  Print
       question marks for information that cannot be determined.

So we know that ls lists the size of the file.  But let me
specifically say that this is tagged to the *file*.  It's file
centric.  There is also the -s option.

  ‘-s’
  ‘--size’
       Print the disk allocation of each file to the left of the file
       name.  This is the amount of disk space used by the file, which is
       usually a bit more than the file’s size, but it can be less if the
       file has holes.

This displays how much disk space the file consumes rather than the
size of the file.  The two are different things.
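
To make the distinction concrete, here is a minimal sketch in Python,
assuming a POSIX system where stat() reports st_blocks in 512-byte
units.  The helper name size_and_allocation is mine, for illustration
only.  It reads the two fields that correspond to the file size and to
the disk allocation.

```python
import os
import tempfile


def size_and_allocation(path):
    """Return (size in bytes, allocated bytes) for path."""
    st = os.stat(path)
    # st_size is the byte count of the file content (the 'ls -l' size).
    # st_blocks counts 512-byte units actually allocated on disk
    # (assumption: POSIX semantics; the basis for 'ls -s' and 'du').
    return st.st_size, st.st_blocks * 512


with tempfile.NamedTemporaryFile() as f:
    f.write(b"x" * 1000)
    f.flush()
    os.fsync(f.fileno())  # ask the file system to commit the data
    size, allocated = size_and_allocation(f.name)
    print("size:", size, "allocated:", allocated)
```

The size is always 1000 here.  The allocated figure depends on the
file system holding the temporary file, so no particular value should
be expected.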

And then the 'du' documentation says:

  ‘du’ reports the amount of disk space used by the set of specified files

So du reports the disk space used by the file.  But as we know, the
amount of disk used depends on the file system holding the file.
Different file systems have different storage methods, so the amount
of disk space consumed by a file will vary and is only loosely related
to the size of the file.  The disk space consumed to hold the file
could be larger or smaller than the file size.

In particular, if the file is sparse then there are "holes" in the
middle that are all zero data and do not need to be stored, thereby
saving the space.  In that case the disk usage will be smaller.  Or,
since files are stored in blocks, the final block will have some
fragment of space at the end that is past the end of the file but too
small to be used for other files.  In that case it will be larger.
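
The sparse case can be tried with a small Python sketch.  It assumes a
POSIX system with 512-byte st_blocks units, and make_sparse is a
made-up helper name.  Whether the hole is really left unallocated
depends on the file system, so treat the allocated figure as
informational.

```python
import os
import tempfile


def make_sparse(path, length):
    """Create a file of the given length without writing any data."""
    with open(path, "wb") as f:
        # Extending an empty file with truncate() leaves a hole that
        # reads back as zeros; no data blocks need to be written.
        f.truncate(length)


with tempfile.TemporaryDirectory() as d:
    path = os.path.join(d, "sparse.bin")
    make_sparse(path, 10 * 1024 * 1024)
    st = os.stat(path)
    sparse_size = st.st_size               # content size
    sparse_allocated = st.st_blocks * 512  # disk space actually used
    print("size:", sparse_size, "allocated:", sparse_allocated)
```

On file systems that support holes the allocated figure is typically
far smaller than the 10 MiB size.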

Therefore it is not surprising that the numbers displayed for disk
usage are not the same as the file content size.  They would really
only line up exactly if the file content size is a multiple of the
file system storage block size and every block is fully represented on
disk.  Otherwise they will always differ at least somewhat.
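
As a worked example of the block rounding, here is a sketch that
assumes a 4096-byte block size and a file with no holes.  The block
size is my assumption for illustration; the file system in the report
may use something else.  The size is the one from the ls -l listing.

```python
def allocated_bytes(size, block_size=4096):
    """Bytes needed to hold `size` bytes in whole blocks (no holes)."""
    blocks = -(-size // block_size)  # ceiling division
    return blocks * block_size


size = 88885047                # file size from the ls -l listing
alloc = allocated_bytes(size)  # assumes 4096-byte blocks
print(alloc, alloc - size)     # prints: 88887296 2249
```

Under that assumption the file occupies 21701 blocks, and the last
block carries 2249 bytes that are past the end of the file.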

As long as I am here I should mention 'df', which shows disk free
space information.  One sometimes thinks that adding up the file
content sizes should give the du disk usage size, but it doesn't.  And
one sometimes thinks that adding up all of the du disk usage sizes
should match the df numbers, but it doesn't.  That is for a similar
reason.  File systems reserve a min-free amount of space for superuser
level processes to ensure continued operation even if the disk is
filling up from non-privileged processes.  Also, file system
efficiency and performance drop dramatically as the file system fills
up.  Therefore the file system reports space with the min-free
reserved space in mind.  And once again this is different on different
file systems.
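
The reserved space can be glimpsed through statvfs(), which
distinguishes free blocks from blocks available to unprivileged
users.  This is a minimal Python sketch assuming a POSIX system;
free_space is a made-up helper name, and "/" is just an example mount
point.

```python
import os


def free_space(path="/"):
    """Return (free bytes, bytes available to unprivileged users)."""
    vfs = os.statvfs(path)
    free = vfs.f_bfree * vfs.f_frsize        # all free blocks
    available = vfs.f_bavail * vfs.f_frsize  # free blocks minus reserve
    return free, available


free, available = free_space("/")
print("free:", free, "available:", available)
```

Where the file system keeps a reserve, the difference between the two
numbers is that reserve.  On file systems without one the two numbers
may simply be equal.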

But let me return to your first bit of information.  The ls long
listing of the files.  Your version of ls gave an indication that
something was different about the second file.

> % command ls -l *pkg
> -rw-r--r--  1 tjluoma  staff  88885047 Dec 15 00:00 StreamDeck-4.4.2.12189.pkg
> -rw-r--r--@ 1 tjluoma  staff  88885047 Dec 15 00:02 Stream_Deck_4.4.2.12189.pkg

See that '@' in that position?  The GNU coreutils 8.30 ls
documentation I am looking at says:

     Following the file mode bits is a single character that specifies
     whether an alternate access method such as an access control list
     applies to the file.  When the character following the file mode
     bits is a space, there is no alternate access method.  When it is a
     printing character, then there is such a method.

     GNU ‘ls’ uses a ‘.’ character to indicate a file with a security
     context, but no other alternate access method.

     A file with any other combination of alternate access methods is
     marked with a ‘+’ character.

I did not see anything that documents what an '@' means.  Therefore it
is likely something applied in a downstream patch, probably a
modification specific to a software distribution.  But I don't really
know.  I live under a rock and don't get out much.  It likely means
that the second file, the one listed with '@' after the file mode, is
not stored on disk in a typical way.  That's probably the first clue
that it is different.  But I do not actually know, as I do not see
files listed that way here.

Bob




