bug-coreutils
Re: sort -o x -o y


From: Paul Eggert
Subject: Re: sort -o x -o y
Date: 02 Sep 2003 16:03:29 -0700
User-agent: Gnus/5.09 (Gnus v5.9.0) Emacs/21.3

Dan Jacobson <address@hidden> writes:

> $ echo a|sort -o x -o y
> $ ls
> y

POSIX allows this behavior, but it's admittedly weird.

I think that option order should not matter, unless POSIX or the
documentation explicitly says otherwise.  While looking into this
problem I noticed that sort's -t option doesn't let you specify a NUL
as a field separator (this is a related issue, since 'sort' uses 0 to
represent "no option specified yet").  Also, the documentation and
usage strings incorrectly say "white space" in several places where
they should say "blanks".  Here's a patch for all three problems.
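
To illustrate the NUL issue outside of sort.c, here is a minimal
standalone sketch of the out-of-range sentinel idea.  It is not the
patch itself: the names SEP_UNSET, SEP_ERROR, parse_separator and
merge_separator are hypothetical, and errors are returned as a second
sentinel rather than reported through error ().

```c
#include <limits.h>
#include <string.h>

/* Hypothetical sentinels.  Both lie outside the range of char, so
   neither can collide with a real separator byte, including NUL.  */
enum { SEP_UNSET = CHAR_MAX + 1, SEP_ERROR = CHAR_MAX + 2 };

/* Parse a -t style argument.  Return the separator byte, or
   SEP_ERROR if ARG is empty or is longer than one character
   (other than the two-character string \0, which means NUL).  */
static int
parse_separator (char const *arg)
{
  if (arg[0] == '\0')
    return SEP_ERROR;           /* empty argument */
  if (arg[1] == '\0')
    return arg[0];              /* ordinary one-byte separator */
  if (strcmp (arg, "\\0") == 0)
    return '\0';                /* backslash followed by 0 */
  return SEP_ERROR;             /* multi-character separator */
}

/* Combine the current setting CUR with a newly parsed NEWSEP.
   Repeating the same separator is harmless; two different
   separators conflict, whatever order they were given in.  */
static int
merge_separator (int cur, int newsep)
{
  if (newsep == SEP_ERROR)
    return SEP_ERROR;
  if (cur != SEP_UNSET && cur != newsep)
    return SEP_ERROR;
  return newsep;
}
```

Starting from SEP_UNSET, merge_separator (SEP_UNSET, parse_separator ("\\0"))
yields 0, which stays distinct from "no separator given" only because
the sentinel is not itself a valid char value.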

2003-09-02  Paul Eggert  <address@hidden>

        * NEWS: sort -t '\0' now uses a NUL tab.
        sort option order no longer matters, unless POSIX requires it.
        * doc/coreutils.texi (sort invocation): -d now overrides -i.
        "whitespace" -> "blanks"; "whitespace" isn't correct.
        -t '\0' now specifies a NUL tab.
        * src/sort.c (usage): Say "blanks" instead of "whitespace".
        Similar fixes for many comments.
        (TAB_DEFAULT): New constant, so that we can support NUL as
        the field separator.
        (tab): Now int, not char.  Initialize to TAB_DEFAULT.
        (specify_sort_size): If multiple sizes are specified, use the largest.
        (begfield, limfield): Support NUL tab char.
        (set_ordering): Do not let -i override -d.
        (main): Report an error if incompatible -o or -t options are given.
        Report an error for "-t ''".  Allow "-t '\0'" to specify a NUL tab.
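
The order-independence rules in the entries above (largest -S wins,
-d outranks -i, differing -o names are an error) can be sketched in
isolation as follows.  This is only an illustration under assumed
names: struct opts, the IGNORE_* levels, and the set_* helpers are
hypothetical and do not appear in sort.c, which stores these settings
differently.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical strength ordering: a higher value is the stronger
   option, so dictionary order (-d) outranks ignore-nonprinting (-i).  */
enum { IGNORE_NONE, IGNORE_NONPRINTING, IGNORE_NONDICTIONARY };

/* Hypothetical option state, for illustration only.  */
struct opts
{
  size_t buffer_size;           /* 0 means "not specified" */
  char const *outfile;          /* NULL means "not specified" */
  int ignore;                   /* one of the IGNORE_* levels */
};

/* -S: keep the maximum, so "-S 1M -S 2M" equals "-S 2M -S 1M".  */
static void
set_buffer_size (struct opts *o, size_t n)
{
  if (o->buffer_size < n)
    o->buffer_size = n;
}

/* -d / -i: a weaker option never overrides a stronger one.  */
static void
set_ignore (struct opts *o, int level)
{
  if (o->ignore < level)
    o->ignore = level;
}

/* -o: repeating the same name is fine; two different names are
   incompatible.  Return false on conflict and leave O unchanged.  */
static bool
set_outfile (struct opts *o, char const *name)
{
  if (o->outfile && strcmp (o->outfile, name) != 0)
    return false;
  o->outfile = name;
  return true;
}
```

Each setter is commutative over its inputs (a maximum, or an equality
check), which is what makes the final state independent of the order
in which the options appear.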

Index: NEWS
===================================================================
RCS file: /cvsroot/coreutils/coreutils/NEWS,v
retrieving revision 1.124
diff -p -u -r1.124 NEWS
--- NEWS        27 Aug 2003 09:18:28 -0000      1.124
+++ NEWS        2 Sep 2003 22:50:50 -0000
@@ -13,6 +13,12 @@ GNU coreutils NEWS                      
   timestamps to their full nanosecond resolution; microsecond
   resolution is the best we can do right now.
 
+  sort now supports the zero byte (NUL) as a field separator; use -t '\0'.
+  The -t '' option, which formerly had no effect, is now an error.
+
+  sort option order no longer matters for the options -S, -d, -i, -o, and -t.
+  Stronger options override weaker, and incompatible options are diagnosed.
+
 ** Bug fixes
 
   stat no longer overruns a buffer for format strings ending in `%'
Index: doc/coreutils.texi
===================================================================
RCS file: /cvsroot/coreutils/coreutils/doc/coreutils.texi,v
retrieving revision 1.130
diff -p -u -r1.130 coreutils.texi
--- doc/coreutils.texi  17 Aug 2003 17:10:25 -0000      1.130
+++ doc/coreutils.texi  2 Sep 2003 22:51:09 -0000
@@ -2969,6 +2969,8 @@ converting to floating point.
 @vindex LC_CTYPE
 Ignore nonprinting characters.
 The @env{LC_CTYPE} locale determines character types.
+This option has no effect if the stronger @option{--dictionary-order}
+(@option{-d}) option is also given.
 
 @item -M
 @itemx --month-sort
@@ -2976,7 +2978,7 @@ The @env{LC_CTYPE} locale determines cha
 @opindex --month-sort
 @cindex months, sorting by
 @vindex LC_TIME
-An initial string, consisting of any amount of whitespace, followed
+An initial string, consisting of any amount of blanks, followed
 by a month name abbreviation, is folded to UPPER case and
 compared in the order @samp{JAN} < @samp{FEB} < @dots{} < @samp{DEC}.
 Invalid names compare low to valid names.  The @env{LC_TIME} locale
@@ -2989,7 +2991,7 @@ category determines the month spellings.
 @cindex numeric sort
 @vindex LC_NUMERIC
 Sort numerically: the number begins each line; specifically, it consists
-of optional whitespace, an optional @samp{-} sign, and zero or more
+of optional blanks, an optional @samp{-} sign, and zero or more
 digits possibly separated by thousands separators, optionally followed
 by a decimal-point character and zero or more digits.  The @env{LC_NUMERIC}
 locale specifies the decimal-point character and thousands separator.
@@ -3085,7 +3087,7 @@ than @var{size}.
 @cindex field separator character
 Use character @var{separator} as the field separator when finding the
 sort keys in each line.  By default, fields are separated by the empty
-string between a non-whitespace character and a whitespace character.
+string between a non-blank character and a blank character.
 That is, given the input line @w{@samp{ foo bar}}, @command{sort} breaks it
 into fields @w{@samp{ foo}} and @w{@samp{ bar}}.  The field separator is
 not considered to be part of either the field preceding or the field
@@ -3093,6 +3095,10 @@ following.  But note that sort fields th
 as @option{-k 2}, or sort fields consisting of a range, as @option{-k 2,3},
 retain the field separators present between the endpoints of the range.
 
+To specify a zero byte (@acronym{ASCII} @sc{nul} (Null) character) as
+the field separator, use the two-character string @samp{\0}, e.g.,
+@samp{sort -t '\0'}.
+
 @item -T @var{tempdir}
 @itemx --temporary-directory=@var{tempdir}
 @opindex -T
@@ -3218,7 +3224,7 @@ field-end part of the key specifier.
 
 @item
 Sort the password file on the fifth field and ignore any
-leading white space.  Sort lines with equal values in field five
+leading blanks.  Sort lines with equal values in field five
 on the numeric user ID in field three.
 
 @example
@@ -3242,7 +3248,7 @@ The use of @option{-print0}, @option{-z}
 that pathnames that contain Line Feed characters will not get broken up
 by the sort operation.
 
-Finally, to ignore both leading and trailing white space, you
+Finally, to ignore both leading and trailing blanks, you
 could have applied the @samp{b} modifier to the field-end specifier
 for the first key,
 
Index: src/sort.c
===================================================================
RCS file: /cvsroot/coreutils/coreutils/src/sort.c,v
retrieving revision 1.267
diff -p -u -r1.267 sort.c
--- src/sort.c  4 Aug 2003 08:55:44 -0000       1.267
+++ src/sort.c  2 Sep 2003 22:56:17 -0000
@@ -146,8 +146,8 @@ struct keyfield
   size_t echar;                        /* Additional characters in field. */
   bool const *ignore;          /* Boolean array of characters to ignore. */
   char const *translate;       /* Translation applied to characters. */
-  bool skipsblanks;            /* Skip leading white space at start. */
-  bool skipeblanks;            /* Skip trailing white space at finish. */
+  bool skipsblanks;            /* Skip leading blanks at start. */
+  bool skipeblanks;            /* Skip trailing blanks at finish. */
   bool numeric;                        /* Flag for numeric comparison.  Handle
                                   strings of digits with optional decimal
                                   point, but no exponential notation. */
@@ -173,7 +173,7 @@ char *program_name;
    internally, but doing this with good performance is a bit
    tricky.  */
 
-/* Table of white space. */
+/* Table of blanks.  */
 static bool blanks[UCHAR_LIM];
 
 /* Table of non-printing characters. */
@@ -243,10 +243,13 @@ static bool reverse;
    they were read if all keys compare equal.  */
 static bool stable;
 
-/* Tab character separating fields.  If NUL, then fields are separated
-   by the empty string between a non-whitespace character and a whitespace
+/* If TAB has this value, blanks separate fields.  */
+enum { TAB_DEFAULT = CHAR_MAX + 1 };
+
+/* Tab character separating fields.  If TAB_DEFAULT, then fields are
+   separated by the empty string between a non-blank character and a blank
    character. */
-static char tab;
+static int tab = TAB_DEFAULT;
 
 /* Flag to remove consecutive duplicate lines from the output.
    Only the last of a sequence of equal lines will be output. */
@@ -305,7 +308,7 @@ Other options:\n\
   -S, --buffer-size=SIZE    use SIZE for main memory buffer\n\
 "), stdout);
       printf (_("\
-  -t, --field-separator=SEP use SEP instead of non- to whitespace transition\n\
+  -t, --field-separator=SEP use SEP instead of non-blank to blank transition\n\
   -T, --temporary-directory=DIR  use DIR for temporaries, not $TMPDIR or %s\n\
                               multiple options specify multiple directories\n\
   -u, --unique              with -c: check for strict ordering\n\
@@ -618,6 +621,11 @@ specify_sort_size (char const *s)
 
   if (e == LONGINT_OK)
     {
+      /* If multiple sort sizes are specified, take the maximum, so
+        that option order does not matter.  */
+      if (n < sort_size)
+       return;
+
       sort_size = n;
       if (sort_size == n)
        {
@@ -769,7 +777,7 @@ begfield (const struct line *line, const
   /* The leading field separator itself is included in a field when -t
      is absent.  */
 
-  if (tab)
+  if (tab != TAB_DEFAULT)
     while (ptr < lim && sword--)
       {
        while (ptr < lim && *ptr != tab)
@@ -817,7 +825,7 @@ limfield (const struct line *line, const
      `beginning' is the first character following the delimiting TAB.
      Otherwise, leave PTR pointing at the first `blank' character after
      the preceding field.  */
-  if (tab)
+  if (tab != TAB_DEFAULT)
     while (ptr < lim && eword--)
       {
        while (ptr < lim && *ptr != tab)
@@ -866,7 +874,7 @@ limfield (const struct line *line, const
      */
 
   /* Make LIM point to the end of (one byte past) the current field.  */
-  if (tab)
+  if (tab != TAB_DEFAULT)
     {
       char *newlim;
       newlim = memchr (ptr, tab, lim - ptr);
@@ -2159,7 +2167,10 @@ set_ordering (register const char *s, st
          key->general_numeric = true;
          break;
        case 'i':
-         key->ignore = nonprinting;
+         /* Option order should not matter, so don't let -i override
+            -d.  -d implies -i, but -i does not imply -d.  */
+         if (! key->ignore)
+           key->ignore = nonprinting;
          break;
        case 'M':
          key->month = true;
@@ -2428,6 +2439,8 @@ main (int argc, char **argv)
          break;
 
        case 'o':
+         if (outfile != minus && strcmp (outfile, optarg) != 0)
+           error (SORT_FAILURE, 0, _("multiple output files specified"));
          outfile = optarg;
          break;
 
@@ -2440,15 +2453,28 @@ main (int argc, char **argv)
          break;
 
        case 't':
-         tab = optarg[0];
-         if (tab && optarg[1])
-           {
-             /* Provoke with `sort -txx'.  Complain about
-                "multi-character tab" instead of "multibyte tab", so
-                that the diagnostic's wording does not need to be
-                changed once multibyte characters are supported.  */
-             error (SORT_FAILURE, 0, _("multi-character tab `%s'"), optarg);
-           }
+         {
+           int newtab = optarg[0];
+           if (! newtab)
+             error (SORT_FAILURE, 0, _("empty tab"));
+           if (optarg[1])
+             {
+               if (strcmp (optarg, "\\0") == 0)
+                 newtab = '\0';
+               else
+                 {
+                   /* Provoke with `sort -txx'.  Complain about
+                      "multi-character tab" instead of "multibyte tab", so
+                      that the diagnostic's wording does not need to be
+                      changed once multibyte characters are supported.  */
+                   error (SORT_FAILURE, 0, _("multi-character tab `%s'"),
+                          optarg);
+                 }
+             }
+           if (tab != TAB_DEFAULT && tab != newtab)
+             error (SORT_FAILURE, 0, _("incompatible tabs"));
+           tab = newtab;
+         }
          break;
 
        case 'T':




