Re: wc annoyances

bug-coreutils

From:	Paul Eggert
Subject:	Re: wc annoyances
Date:	20 Jul 2003 01:54:50 -0700
User-agent:	Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Daniel Russell <address@hidden> writes:

> [ `cat autolist-next.num` -gt `wc -l autolist.txt` ]

That's easy enough to fix by using `wc -l <autolist.txt` instead.

However, I agree with you that the leading spaces are annoying; this
has bitten me as well, in other examples.  I looked into this, and it
turns out that the coreutils documentation is incorrect about what
POSIX requires for 'wc'.  When this misunderstanding is corrected, 'wc
-l <autolist.txt' can output no leading spaces.  While I was at it, I
fixed another problem: wc's columns sometimes don't line up for files
bigger than 9,999,999 bytes.  The fix handles only the case of regular
files, but that's an important case.
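
A minimal sketch of the width heuristic, assuming the input sizes are known up front and the files do not grow while being read. The names `input_info`, `decimal_digits` and `field_width` are illustrative only; the patch itself uses `struct fstatus` and `compute_number_width`. The idea is that no count can exceed the total size of the regular files, so the digit count of that total is a safe width, and inputs of unknown size fall back to the traditional minimum of 7.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one input: its size in bytes, and whether
   it is a regular file (so that the size is known in advance).  */
struct input_info
{
  uintmax_t size;
  bool is_regular;
};

/* Return the number of decimal digits needed to print N.  */
int
decimal_digits (uintmax_t n)
{
  int digits = 1;
  for (; 10 <= n; n /= 10)
    digits++;
  return digits;
}

/* Return a field width suitable for the N inputs described by IN.
   Regular files contribute their size to a running total; any other
   kind of input forces a minimum width of 7.  */
int
field_width (struct input_info const *in, size_t n)
{
  uintmax_t regular_total = 0;
  int minimum_width = 1;
  size_t i;

  assert (n > 0);
  for (i = 0; i < n; i++)
    {
      if (in[i].is_regular)
        regular_total += in[i].size;
      else
        minimum_width = 7;
    }

  int width = decimal_digits (regular_total);
  return width < minimum_width ? minimum_width : width;
}
```

For example, two regular files of 10,000,000 and 5 bytes give a width of 8, so their columns and the total line stay aligned.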

This patch fixes a couple of other minor related bugs that I noticed
while implementing this change: there was a confusion between "" and NULL,
and a possible arithmetic overflow.
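
A rough sketch of these two fixes, using hypothetical helper names (`diagnostic_name`, `remaining_bytes`) that do not appear in the patch. A null pointer, rather than an empty string, stands for standard input, and the two file offsets are compared before subtracting, so the subtraction happens only when its result is known to be nonnegative.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the name to show in diagnostics.  A null FILE denotes standard
   input; an empty string is treated as an ordinary file name.  */
char const *
diagnostic_name (char const *file)
{
  return file ? file : "standard input";
}

/* Return the number of bytes between CURRENT_POS and END_POS, or 0 if
   the current position is already past the end.  Both offsets must be
   nonnegative, i.e. the caller has already rejected error returns.  */
uintmax_t
remaining_bytes (intmax_t current_pos, intmax_t end_pos)
{
  assert (0 <= current_pos && 0 <= end_pos);
  return end_pos < current_pos ? 0 : (uintmax_t) (end_pos - current_pos);
}
```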

2003-07-20  Paul Eggert  <address@hidden>

        wc count field widths now are heuristically adjusted depending
        on the input size, if known.  If only one count is printed, it
        is guaranteed to be printed without leading spaces.

        Previously, wc did not align the count fields if
        POSIXLY_CORRECT was set, but POSIX did not actually require
        this undesirable behavior, so it has been removed.

        * NEWS: Document this.
        * doc/coreutils.texi (wc invocation): Likewise.

        * src/wc.c (number_width): New var.
        (posixly_correct): Remove.
        (struct fstatus): New struct.
        (write_counts): Output fields of width number_width.
        Do not worry about POSIXLY_CORRECT.
        Use null file, not empty-string file, to denote stdin,
        since "" is a valid file name on some hosts.
        (wc, wc_file): New arg fstatus.  Use it to avoid invoking fstat
        if possible.
        (wc):  Avoid problems if end_pos - current_pos overflows.
        Do not print odd message if stdin has a read error.
        (get_input_fstatus, compute_number_width): New functions.
        (main): Use them to implement the new behavior.
        Ignore POSIXLY_CORRECT.

        * tests/wc/Test.pm: Adjust to the new output widths.
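
The output side of the change can be sketched as follows. This is an illustration only: `format_counts` is a hypothetical helper that writes into a buffer with `snprintf` and `%ju`, whereas the patch prints directly with `printf` and `umaxtostr`. It keeps the same trick of starting one byte into the format string, so that only the second and later fields get a separating space.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Format the N counts in COUNTS into BUF (of size BUFSIZE), each
   right-justified in a field of at least WIDTH columns, with a single
   space between fields and none before the first.  Return the length
   of the resulting string, or -1 if it did not fit or formatting
   failed.  */
int
format_counts (char *buf, size_t bufsize, int width,
               uintmax_t const *counts, size_t n)
{
  static char const format_sp_int[] = " %*ju";
  /* Skip the leading space for the first field only.  */
  char const *format_int = format_sp_int + 1;
  size_t used = 0;
  size_t i;

  assert (buf && 0 < bufsize);
  buf[0] = '\0';
  for (i = 0; i < n; i++)
    {
      int len = snprintf (buf + used, bufsize - used, format_int,
                          width, counts[i]);
      if (len < 0 || bufsize - used <= (size_t) len)
        return -1;
      used += len;
      format_int = format_sp_int;
    }
  return (int) used;
}
```

With a width of 1 and a single count, the result has no leading spaces, which is the guarantee described in the ChangeLog entry above.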

Index: NEWS
===================================================================
RCS file: /cvsroot/coreutils/coreutils/NEWS,v
retrieving revision 1.112
diff -p -u -r1.112 NEWS
--- NEWS        18 Jul 2003 08:53:32 -0000      1.112
+++ NEWS        20 Jul 2003 07:33:23 -0000
@@ -1,4 +1,14 @@
 GNU coreutils NEWS                                    -*- outline -*-
+
+* wc count field widths now are heuristically adjusted depending on the input
+  size, if known.  If only one count is printed, it is guaranteed to
+  be printed without leading spaces.
+
+* Previously, wc did not align the count fields if POSIXLY_CORRECT was set,
+  but POSIX did not actually require this undesirable behavior, so it
+  has been removed.
+
+
 * Major changes in release 5.0.2:
 
 ** Bug fixes
Index: doc/coreutils.texi
===================================================================
RCS file: /cvsroot/coreutils/coreutils/doc/coreutils.texi,v
retrieving revision 1.121
diff -p -u -r1.121 coreutils.texi
--- doc/coreutils.texi  18 Jul 2003 07:50:39 -0000      1.121
+++ doc/coreutils.texi  20 Jul 2003 07:34:20 -0000
@@ -2532,17 +2532,17 @@ wc address@hidden@dots{} address@hidden@do
 @end example
 
 @cindex total counts
-@vindex POSIXLY_CORRECT
 @command{wc} prints one line of counts for each file, and if the file was
 given as an argument, it prints the file name following the counts.  If
 more than one @var{file} is given, @command{wc} prints a final line
 containing the cumulative counts, with the file name @file{total}.  The
 counts are printed in this order: newlines, words, characters, bytes.
-By default, each count is output right-justified in a 7-byte field with
-one space between fields so that the numbers and file names line up nicely
-in columns.  However, @acronym{POSIX} requires that there be exactly one space
-separating columns.  You can make @command{wc} use the @acronym{POSIX}-mandated
-output format by setting the @env{POSIXLY_CORRECT} environment variable.
+Each count is printed right-justified in a field with at least one
+space between fields so that the numbers and file names normally line
+up nicely in columns.  The width of the count fields varies depending
+on the inputs, so you should not depend on a particular field width.
+However, as a @acronym{GNU} extension, if only one count is printed,
+it is guaranteed to be printed without leading spaces.
 
 By default, @command{wc} prints three counts: the newline, words, and byte
 counts.  Options can specify that only certain counts be printed.
Index: src/wc.c
===================================================================
RCS file: /cvsroot/coreutils/coreutils/src/wc.c,v
retrieving revision 1.86
diff -p -u -r1.86 wc.c
--- src/wc.c    17 Jun 2003 18:13:24 -0000      1.86
+++ src/wc.c    20 Jul 2003 08:49:04 -0000
@@ -92,15 +92,26 @@ static uintmax_t max_line_length;
 static int print_lines, print_words, print_chars, print_bytes;
 static int print_linelength;
 
+/* The print width of each count.  */
+static int number_width;
+
 /* Nonzero if we have ever read the standard input. */
 static int have_read_stdin;
 
 /* The error code to return to the system. */
 static int exit_status;
 
-/* If nonzero, do not line up columns but instead separate numbers by
-   a single space as specified in Single Unix Specification and POSIX. */
-static int posixly_correct;
+/* The result of calling fstat or stat on a file descriptor or file.  */
+struct fstatus
+{
+  /* If positive, fstat or stat has not been called yet.  Otherwise,
+     this is the value returned from fstat or stat.  */
+  int failed;
+
+  /* If FAILED is zero, this is the file's status.  */
+  struct stat st;
+};
+
 
 static struct option const longopts[] =
 {
@@ -153,42 +164,41 @@ write_counts (uintmax_t lines,
              uintmax_t linelength,
              const char *file)
 {
+  static char const format_sp_int[] = " %*s";
+  char const *format_int = format_sp_int + 1;
   char buf[INT_BUFSIZE_BOUND (uintmax_t)];
-  char const *space = "";
-  char const *format_int = (posixly_correct ? "%s" : "%7s");
-  char const *format_sp_int = (posixly_correct ? "%s%s" : "%s%7s");
 
   if (print_lines)
     {
-      printf (format_int, umaxtostr (lines, buf));
-      space = " ";
+      printf (format_int, number_width, umaxtostr (lines, buf));
+      format_int = format_sp_int;
     }
   if (print_words)
     {
-      printf (format_sp_int, space, umaxtostr (words, buf));
-      space = " ";
+      printf (format_int, number_width, umaxtostr (words, buf));
+      format_int = format_sp_int;
     }
   if (print_chars)
     {
-      printf (format_sp_int, space, umaxtostr (chars, buf));
-      space = " ";
+      printf (format_int, number_width, umaxtostr (chars, buf));
+      format_int = format_sp_int;
     }
   if (print_bytes)
     {
-      printf (format_sp_int, space, umaxtostr (bytes, buf));
-      space = " ";
+      printf (format_int, number_width, umaxtostr (bytes, buf));
+      format_int = format_sp_int;
     }
   if (print_linelength)
     {
-      printf (format_sp_int, space, umaxtostr (linelength, buf));
+      printf (format_int, number_width, umaxtostr (linelength, buf));
     }
-  if (*file)
+  if (file)
     printf (" %s", file);
   putchar ('\n');
 }
 
 static void
-wc (int fd, const char *file)
+wc (int fd, char const *file, struct fstatus *fstatus)
 {
   char buf[BUFFER_SIZE + 1];
   size_t bytes_read;
@@ -229,16 +239,17 @@ wc (int fd, const char *file)
   if (count_bytes && !count_chars && !print_lines && !count_complicated)
     {
       off_t current_pos, end_pos;
-      struct stat stats;
 
-      if (fstat (fd, &stats) == 0 && S_ISREG (stats.st_mode)
+      if (0 < fstatus->failed)
+       fstatus->failed = fstat (fd, &fstatus->st);
+
+      if (! fstatus->failed && S_ISREG (fstatus->st.st_mode)
          && (current_pos = lseek (fd, (off_t) 0, SEEK_CUR)) != -1
          && (end_pos = lseek (fd, (off_t) 0, SEEK_END)) != -1)
        {
-         off_t diff;
          /* Be careful here.  The current position may actually be
             beyond the end of the file.  As in the example above.  */
-         bytes = (diff = end_pos - current_pos) < 0 ? 0 : diff;
+         bytes = end_pos < current_pos ? 0 : end_pos - current_pos;
        }
       else
        {
@@ -246,7 +257,7 @@ wc (int fd, const char *file)
            {
              if (bytes_read == SAFE_READ_ERROR)
                {
-                 error (0, errno, "%s", file);
+                 error (0, errno, "%s", file ? file : _("standard input"));
                  exit_status = 1;
                  break;
                }
@@ -264,7 +275,7 @@ wc (int fd, const char *file)
 
          if (bytes_read == SAFE_READ_ERROR)
            {
-             error (0, errno, "%s", file);
+             error (0, errno, "%s", file ? file : _("standard input"));
              exit_status = 1;
              break;
            }
@@ -493,12 +504,12 @@ wc (int fd, const char *file)
 }
 
 static void
-wc_file (const char *file)
+wc_file (char const *file, struct fstatus *fstatus)
 {
   if (STREQ (file, "-"))
     {
       have_read_stdin = 1;
-      wc (0, file);
+      wc (STDIN_FILENO, file, fstatus);
     }
   else
     {
@@ -509,7 +520,7 @@ wc_file (const char *file)
          exit_status = 1;
          return;
        }
-      wc (fd, file);
+      wc (fd, file, fstatus);
       if (close (fd))
        {
          error (0, errno, "%s", file);
@@ -518,11 +529,74 @@ wc_file (const char *file)
     }
 }
 
+/* Return the file status for the NFILES files addressed by FILE.
+   Optimize the case where only one number is printed, for just one
+   file; in that case we can use a print width of 1, so we don't need
+   to stat the file.  */
+
+static struct fstatus *
+get_input_fstatus (int nfiles, char * const *file)
+{
+  struct fstatus *fstatus = xmalloc (nfiles * sizeof *fstatus);
+
+  if (nfiles == 1
+      && ((print_lines + print_words + print_chars
+          + print_bytes + print_linelength)
+         == 1))
+    fstatus[0].failed = 1;
+  else
+    {
+      int i;
+
+      for (i = 0; i < nfiles; i++)
+       fstatus[i].failed = (file[i] && strcmp (file[i], "-") != 0
+                            ? stat (file[i], &fstatus[i].st)
+                            : fstat (STDIN_FILENO, &fstatus[i].st));
+    }
+
+  return fstatus;
+}
+
+/* Return a print width suitable for the NFILES files whose status is
+   recorded in FSTATUS.  Optimize the same special case that
+   get_input_fstatus optimizes.  */
+
+static int
+compute_number_width (int nfiles, struct fstatus const *fstatus)
+{
+  int width = 1;
+
+  if (fstatus[0].failed <= 0)
+    {
+      int minimum_width = 1;
+      uintmax_t regular_total = 0;
+      int i;
+
+      for (i = 0; i < nfiles; i++)
+       if (! fstatus[i].failed)
+         {
+           if (S_ISREG (fstatus[i].st.st_mode))
+             regular_total += fstatus[i].st.st_size;
+           else
+             minimum_width = 7;
+         }
+
+      for (; 10 <= regular_total; regular_total /= 10)
+       width++;
+      if (width < minimum_width)
+       width = minimum_width;
+    }
+
+  return width;
+}
+
+
 int
 main (int argc, char **argv)
 {
   int optc;
   int nfiles;
+  struct fstatus *fstatus;
 
   initialize_main (&argc, &argv);
   program_name = argv[0];
@@ -533,7 +607,6 @@ main (int argc, char **argv)
   atexit (close_stdout);
 
   exit_status = 0;
-  posixly_correct = (getenv ("POSIXLY_CORRECT") != NULL);
   print_lines = print_words = print_chars = print_bytes = print_linelength = 0;
   total_lines = total_words = total_chars = total_bytes = max_line_length = 0;
 
@@ -576,16 +649,21 @@ main (int argc, char **argv)
     print_lines = print_words = print_bytes = 1;
 
   nfiles = argc - optind;
+  nfiles += (nfiles == 0);
+
+  fstatus = get_input_fstatus (nfiles, argv + optind);
+  number_width = compute_number_width (nfiles, fstatus);
 
-  if (nfiles == 0)
+  if (! argv[optind])
     {
       have_read_stdin = 1;
-      wc (0, "");
+      wc (STDIN_FILENO, NULL, &fstatus[0]);
     }
   else
     {
-      for (; optind < argc; ++optind)
-       wc_file (argv[optind]);
+      int i;
+      for (i = 0; i < nfiles; i++)
+       wc_file (argv[optind + i], &fstatus[i]);
 
       if (nfiles > 1)
        write_counts (total_lines, total_words, total_chars, total_bytes,
Index: tests/wc/Test.pm
===================================================================
RCS file: /cvsroot/coreutils/coreutils/tests/wc/Test.pm,v
retrieving revision 1.5
diff -p -u -r1.5 Test.pm
--- tests/wc/Test.pm    2 Nov 1997 14:28:10 -0000       1.5
+++ tests/wc/Test.pm    20 Jul 2003 07:39:19 -0000
@@ -6,22 +6,22 @@ $Test::input_via_stdin = 1;
 
 my @tv = (
 # test flags  input                 expected output        expected return code
-['a0', '-c',  '',                   "      0\n",                    0],
-['a1', '-l',  '',                   "      0\n",                    0],
-['a2', '-w',  '',                   "      0\n",                    0],
-['a3', '-c',  'x',                  "      1\n",                    0],
-['a4', '-w',  'x',                  "      1\n",                    0],
-['a5', '-w',  "x y\n",              "      2\n",                    0],
-['a6', '-w',  "x y\nz",             "      3\n",                    0],
+['a0', '-c',  '',                   "0\n",                          0],
+['a1', '-l',  '',                   "0\n",                          0],
+['a2', '-w',  '',                   "0\n",                          0],
+['a3', '-c',  'x',                  "1\n",                          0],
+['a4', '-w',  'x',                  "1\n",                          0],
+['a5', '-w',  "x y\n",              "2\n",                          0],
+['a6', '-w',  "x y\nz",             "3\n",                          0],
 # Remember, -l counts *newline* bytes, not logical lines.
-['a7', '-l',  "x y",                "      0\n",                    0],
-['a8', '-l',  "x y\n",              "      1\n",                    0],
-['a9', '-l',  "x\ny\n",             "      2\n",                    0],
-['b0', '',    "",                   "      0       0       0\n",    0],
-['b1', '',    "a b\nc\n",           "      2       3       6\n",    0],
-['c0', '-L',  "1\n12\n",            "      2\n",                    0],
-['c1', '-L',  "1\n123\n1\n",        "      3\n",                    0],
-['c2', '-L',  "\n123456",           "      6\n",                    0],
+['a7', '-l',  "x y",                "0\n",                          0],
+['a8', '-l',  "x y\n",              "1\n",                          0],
+['a9', '-l',  "x\ny\n",             "2\n",                          0],
+['b0', '',    "",                   "0 0 0\n",                      0],
+['b1', '',    "a b\nc\n",           "2 3 6\n",                      0],
+['c0', '-L',  "1\n12\n",            "2\n",                          0],
+['c1', '-L',  "1\n123\n1\n",        "3\n",                          0],
+['c2', '-L',  "\n123456",           "6\n",                          0],
 );
 
 sub test_vector