Patchset pertaining to --si option of df, du, ls


From: Glenn Golden
Subject: Patchset pertaining to --si option of df, du, ls
Date: Tue, 8 Sep 2020 10:27:02 -0600
User-agent: Mutt/1.10.1 (2018-07-13)

The attached patchset addresses a minor issue with program behavior vs.
documentation of the df, du, and ls tools from coreutils-8.32, when using
the --si option.

It resurrects an issue that was brought up in 2014 [3] and eventually closed
in 2018 [4] with a wontfix (after minimal discussion in the intervening time).


Summary
-------

Output from the df, du, and ls tools with the --si option displays sizes using
single-letter unit suffixes "k", "M", "G", etc., rather than "kB", "MB", "GB".

The authoritative documentation of the expected behavior with --si is
self-contradictory: the program behavior is consistent with the subsections of
coreutils.info pertaining to those individual tools, but is directly
contradicted by Section 2.3, which specifically describes how the various
block size options behave.

The patchset brings the behavior into accordance with the behavior documented
in Section 2.3.

Examples:


#
# Behavior with unmodified coreutils-8.32:
#

  $ df --si  /mnt/test
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sdb5       500M  282M  214M  57% /mnt/test

  $ du --si  /mnt/test/foo
  40k     /mnt/test/foo

  $ ls --si -l /mnt/test/foo
  -rwxr-xr-x 1 root root 40k Sep  8 07:42 /mnt/test/foo


#
# Behavior with attached patchset applied to coreutils-8.32:
#

  $ df --si  /mnt/test
  Filesystem      Size  Used Avail Use% Mounted on
  /dev/sdb5      500MB 282MB 214MB  57% /mnt/test

  $ du --si  /mnt/test/foo
  40kB    /mnt/test/foo

  $ ls --si -l /mnt/test/foo
  -rwxr-xr-x 1 root root 40kB Sep  8 07:42 /mnt/test/foo



Background and history
----------------------

In what follows, "M" (mega) is used as an example unit of measurement; the
same applies to the other suffix units, k, G, etc.

"SI option" means the behavior observed using any of the following:

  Option --si 
  Option --block-size=si 
  Environment variable BLOCKSIZE=si 
  Environment variable BLOCK_SIZE=si
  Environment variable DF_BLOCK_SIZE=si

Doc sections cited below refer to coreutils.info from coreutils-8.32.

The main doc vs. behavior discrepancies are as follows:

* Section 2.3, which is an overview discussion of the semantics of the various
  block size options and nomenclature, states unequivocally that when the SI
  option is specified, the results are expressed using suffix MB, that MB
  means 1000^2, and that bare M means 1024^2:

     "With human-readable formats, output sizes are followed by a size
      letter such as ‘M’ for megabytes.  ‘BLOCK_SIZE=human-readable’ uses
      powers of 1024; ‘M’ stands for 1,048,576 bytes.  ‘BLOCK_SIZE=si’ is
      similar, but uses powers of 1000 and appends ‘B’; ‘MB’ stands for
      1,000,000 bytes."

* Sections 10.1.2, 14.1, and 14.2 (the subsections pertaining specifically
  to ls, df, and du) state just the opposite: That the SI option uses bare
  ("B-less") suffixes, and that the underlying representation base implied
  by the bare suffixes is decimal:  

     "--si
       Append an SI-style abbreviation to each size, such as ‘M’ for
       megabytes.  Powers of 1000 are used, not 1024; ‘M’ stands for
       1,000,000 bytes."

* Subsection 26.2 (which pertains specifically to numfmt) further confuses
  the issue by giving an example ("e.g. ‘4G’ ↦ ‘4,000,000,000’"), which
  contradicts Section 2.3 by implying that a bare suffix means decimal base.

* The "coreutils gotchas" blurb [2] (which is linked from [1], hence can
  presumably be considered authoritative) agrees with coreutils.info
  Section 2.3 in the semantics of M vs. MB, but doesn't specifically say
  anything about the SI option.

There is no dispute (known to me) that the numerical values displayed when
using the SI option are indeed computed in powers of 1000, which everyone
seems to agree is what is desired for that option.  The issue is solely
whether the string suffix applied to the numerical values ought to be M or MB.
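
To make the two conventions in the Section 2.3 excerpt concrete, here is a
minimal sketch of the arithmetic.  The function name mega_to_bytes is
hypothetical and is not part of coreutils; it only multiplies out the two
definitions quoted above (MB as 1000^2, bare M as 1024^2).

```shell
# Hypothetical helper (not part of coreutils): show how many bytes "<n>M"
# denotes under each of the two conventions in the Section 2.3 excerpt.
mega_to_bytes() {
    n=$1
    echo "${n}MB = $((n * 1000 * 1000)) bytes (powers of 1000)"
    echo "${n}M  = $((n * 1024 * 1024)) bytes (powers of 1024)"
}

mega_to_bytes 500
# 500MB = 500000000 bytes (powers of 1000)
# 500M  = 524288000 bytes (powers of 1024)
```

Under the Section 2.3 reading, then, a bare "500M" would overstate a
decimal 500 MB quantity by roughly 4.9%.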

As was pointed out in the original thread [4], the numfmt tool provides a
workaround for this issue. But since the issue exists in its own right as
an inconsistency between program behavior vs. doc (and between various docs,
regardless of which behavior is deemed correct) it seems like addressing it
in some form or another ought to be at least considered as an option, despite
the numfmt workaround.
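
For reference, one plausible form of the numfmt workaround is sketched
below.  This is an assumption about what such a workaround could look like,
not a quotation from the earlier thread: the wrapper name si_fields is
hypothetical, and it relies only on the documented numfmt options --header,
--field, --to=si, and --suffix.  The case of the kilo letter may differ
between numfmt versions.

```shell
# Hypothetical wrapper (not part of coreutils): read df-style output with
# raw byte counts on stdin and rewrite columns 2-4 as SI values with a
# trailing "B".  The first line is passed through as a header.
si_fields() {
    numfmt --header --field=2-4 --to=si --suffix=B
}

# Intended usage (assumes GNU df, and the /mnt/test mount from the
# examples above):
#   df -B1 /mnt/test | si_fields
```

This leaves the tools themselves unchanged, which is why it does not resolve
the inconsistency between program behavior and documentation described above.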


Effect of patch on build tests
------------------------------

The proposed patch does cause one du build-time test to fail ("test/du/inodes")
but this is simply because that test expects the SI option to produce output
with a bare suffix rather than with the B-appended suffix as specified by
coreutils.info Section 2.3 (which is what the patch hews to). So if the
proposed patch is accepted, that test would also have to be updated to agree
with the changed expected semantics.


Comments
--------

This is surely a minor issue; my only motivation for bringing it up again is
simply that I just got burned by it, and during the figuring-out-why phase of
looking thru the code and doc, it seemed like a reasonably simple patch might
be able to take care of it for all three involved programs (df, du, ls) without
causing too much side-effect grief, so figured might as well submit it and see
if you agree.  There may of course be subtleties I've missed that make this
simple-seeming fix unworkable.

And of course an important consideration for "fixing" output formats of tools
that are as widely used as these is how much global breakage would result to
the numerous scripts in the wild that scrape output from them.  The other side
of that is that the proposed patch affects behavior only when the SI option is
used, which (I'm guessing) is probably not very often.

I fully appreciate that breakage concern, so I am not advocating strongly
that it ought to be "fixed" along the lines suggested by the patch, only that
it should be reconsidered as an option and discussed, rather than wontfix-ing
it right off the bat just because that was how it was previously handled [4].
I suspect that some of the documentation inconsistencies pointed out above
were either not present or not appreciated when the issue was
wontfixed/closed in 2018.
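
On the script-breakage concern raised above: a script that scrapes these
sizes can be written to accept both suffix forms.  The following is only a
sketch under that assumption; si_to_bytes is a hypothetical helper, not
something provided by coreutils or by the patchset, and it treats every
suffix as a power of 1000 (i.e. it is meant for SI-option output only).

```shell
# Hypothetical helper (not part of coreutils): convert an SI-style size
# such as "40k", "40kB", "1.5M" or "1.5MB" to a byte count, treating the
# trailing "B" as optional.
si_to_bytes() {
    printf '%s\n' "$1" | awk '
        {
            s = $0
            sub(/B$/, "", s)                # drop an optional trailing B
            unit = substr(s, length(s))     # last remaining character
            p = index("kMGTPEZY", unit)     # exponent of 1000; 0 if no prefix
            if (unit == "K") p = 1          # tolerate an uppercase kilo
            if (p > 0) s = substr(s, 1, length(s) - 1)
            printf "%.0f\n", s * 1000 ^ p
        }'
}

si_to_bytes 40k     # prints 40000
si_to_bytes 40kB    # prints 40000
```

A parser of this shape would give the same result before and after the
proposed change in suffix.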

If there is agreement to accept the patchset -- which presently patches only 
the code behavior, not the doc or the build tests -- let me know, and I'll be
glad to propose an updated patchset that attempts to address the associated
documentation as well, i.e. brings the various subsections of coreutils.info
into a self-consistent state, and modifies the failing du test appropriately.


References
----------

[1] "Coreutils - GNU core utilities", top-level coreutils page,
     https://www.gnu.org/software/coreutils/

[2] "Coreutils gotchas", subsection on "Unit representations",
     https://www.pixelbeat.org/docs/coreutils-gotchas.html 

[3]  Prior thread from Aug. 2014, same topic: 
     https://lists.gnu.org/archive/html/bug-coreutils/2014-08/msg00022.html

[4]  Prior thread from Oct. 2018, same topic: 
     https://lists.gnu.org/archive/html/bug-coreutils/2018-10/msg00131.html

Attachment: patchset-gdg1.txt
Description: Text document

