bug-textutils

Re: tab as sort's field-separator


From: Bob Proulx
Subject: Re: tab as sort's field-separator
Date: Mon, 17 Jun 2002 17:33:20 -0600
User-agent: Mutt/1.3.28i

> > Meanwhile, try this:
> > 
> >   sort -t"\t" -u -k2,2 -k1,1 /tmp/x

> BTW, your suggestion of using "\t" didn't work.

That worked for me with some versions of bash but not with others.
It looks like the behavior was changed so that bash conforms to the
standard.

I decided to RTFM a little.  Please read along with me and see what we
can learn.  The bash manual says this:

       There  are three quoting mechanisms: the escape character,
       single quotes, and double quotes.

       A non-quoted backslash (\) is the  escape  character.   It
       preserves  the  literal  value  of the next character that
       follows, with the exception of <newline>.  If a \<newline>
       pair  appears, and the backslash is not itself quoted, the
       \<newline> is treated as a line continuation (that is,  it
       is removed from the input stream and effectively ignored).

       Enclosing characters in single quotes preserves the literal
       value of each character within the quotes.  A single quote
       may not occur between single quotes, even when preceded by
       a backslash.

       Enclosing characters in double quotes preserves the literal
       value of all characters within the quotes, with the
       exception of $, `, and \.  The characters $ and ` retain
       their special meaning within double quotes.  The backslash
       retains its special meaning only when followed by one of
       the following characters: $, `, ", \, or <newline>.  A
       double quote may be quoted within double quotes by
       preceding it with a backslash.

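According to the bash manual, "\t" inside double quotes is nothing
special, since t is not one of the characters listed after the
backslash.  The shell passes both characters, the backslash and the t,
through unchanged.  The SUSv2 specification backs this up as well.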
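
A quick way to check the quoting rule quoted above is to look at the
length of the resulting string.  This is only a sketch; the variable
names are made up for illustration.

```shell
# Inside double quotes the backslash is special only before $, `, ", \
# and <newline>, so "\t" stays as two characters: backslash and t.
dq="\t"
dq_len=${#dq}
printf 'double-quoted length: %s\n' "$dq_len"

# Single quotes preserve every character literally, so this is also
# two characters.
sq='\t'
sq_len=${#sq}
printf 'single-quoted length: %s\n' "$sq_len"
```

Both lengths come out as 2, so in either case sort is handed a
two-character separator rather than a tab.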

       The special parameters * and @ have special  meaning  when
       in double quotes (see PARAMETERS below).

       Words  of  the  form $'string' are treated specially.  The

Aha!  This is what we need.

       word expands to string, with backslash-escaped  characters
       replaced  as  specified  by the ANSI C standard.  Backslash
       escape sequences, if present, are decoded as follows:
              \a     alert (bell)
              \b     backspace
              \e     an escape character
              \f     form feed
              \n     new line
              \r     carriage return
              \t     horizontal tab
              \v     vertical tab
              \\     backslash
              \'     single quote
              \nnn   the eight-bit character whose value  is  the
                     octal value nnn (one to three digits)
              \xHH   the  eight-bit  character whose value is the
                     hexadecimal value HH (one or two hex digits)

       The  expanded  result  is  single-quoted, as if the dollar
       sign had not been present.

       A double-quoted string preceded by a dollar sign ($)  will
       cause the string to be translated according to the current
       locale.  If the current locale is C or POSIX,  the  dollar
       sign   is  ignored.   If  the  string  is  translated  and
       replaced, the replacement is double-quoted.

Let's try what the manual suggests:

  sort -t$'\t' -u -k2,2 -k1,1 /dev/null

That looks good so far and works with bash, HP-UX /bin/sh, and AIX
/bin/sh, which are the shells I tested it on.  If three different
shells behave the same way then there must be a reason.
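
To see the $'\t' form do real work, here is a small sketch that sorts
tab-separated data instead of /dev/null.  The sample data and the
temporary file are invented for illustration, and it assumes a shell
that understands $'...' quoting, such as bash.

```shell
# Requires bash or another shell that supports $'...' quoting.
# Hypothetical sample data: two tab-separated fields per line.
tmpfile=$(mktemp)
printf 'b\tx\na\ty\na\tx\nc\tx\n' > "$tmpfile"

# $'\t' expands to one real tab, so sort gets a single-character
# field separator.  LC_ALL=C pins the collation order.
sorted=$(LC_ALL=C sort -t$'\t' -u -k2,2 -k1,1 "$tmpfile")
printf '%s\n' "$sorted"

rm -f "$tmpfile"
```

The lines come back ordered by the second field first and then by the
first field, so the three "x" lines precede the "y" line.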

I perused the standards documentation for the shell here.

  http://www.opengroup.org/onlinepubs/007908799/xcu/chap2.html

But unfortunately I did not find anything that required this.  Perhaps
I missed it and someone can point me to the relevant passages.
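
If a shell turns out not to support $'...', one possible workaround
is to let printf produce the tab and pass it to sort through a
variable.  This is a sketch under that assumption, not something
taken from the manual excerpts above; the variable name and sample
data are made up.

```shell
# Build a one-character tab without relying on $'...' quoting.
# printf interprets \t in its format string.
tab=$(printf '\t')

# Hypothetical sample data: two tab-separated fields per line.
tmpfile=$(mktemp)
printf 'b\tx\na\ty\na\tx\nc\tx\n' > "$tmpfile"

# Quote the variable so the shell does not drop the tab as whitespace.
sorted=$(LC_ALL=C sort -t "$tab" -u -k2,2 -k1,1 "$tmpfile")
printf '%s\n' "$sorted"

rm -f "$tmpfile"
```

Command substitution strips only trailing newlines, so the tab
survives in the variable.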

Bob


