bug-textutils

Re: Question on "sort" command


From: Bob Proulx
Subject: Re: Question on "sort" command
Date: Tue, 25 Feb 2003 21:07:17 -0700
User-agent: Mutt/1.3.28i

address@hidden wrote:
> 
> I have some problems with the sort command:
> How can I specify the field separator "\t" on the sort command?

You are actually experiencing problems with the shell.  This has been
discussed before.  But gnu.org seems to have lost all of the mailing
list archives from before this year.  A terrible loss if the data is
not recovered.

But if you search Google for the following string and open Google's
cached copies, you can read the previous thread of discussion about
this.

  site:gnu.org tab as sort field-separator

But since the archive is down I will recreate the discussion here.

> I am trying to sort a file whose field separator is a tab.
> I wrote: cat toto.file | sort -t \t -k 2n > result.txt
> but the second field (a numeric field) in result.txt is not sorted.
> 
> I tried some others command line like these:
> cat toto.file | sort -t "\t" -k 2n > result.txt
> cat toto.file | sort -t"\t" -k 2n > result.txt
> cat toto.file | sort -t\t -k 2n > result.txt
> cat toto.file | sort -t=\t -k 2n > result.txt
> cat toto.file | sort -t="\t" -k 2n > result.txt

Use 'echo' to see what you are telling sort to do.

  echo sort -t"\t"
  sort -t\t

sort is not seeing a tab; it is seeing a backslash followed by the
letter t, which is not the same thing at all.  Getting the tab into
the string with current shells is tricky.
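
If you want to check every variant from your message at once, a small
sketch along these lines may help.  The helper name show_args is made
up for this illustration; it just prints each argument it receives in
brackets, so you can see exactly what the shell would pass to sort.

```shell
# Print each argument the shell hands over, one per line, wrapped in
# brackets so that invisible or unexpected characters stand out.
# (show_args is a hypothetical helper, not a standard command.)
show_args() {
  for arg in "$@"; do
    printf '[%s]\n' "$arg"
  done
}

show_args -t \t      # unquoted: the backslash is removed, leaving t
show_args -t "\t"    # double quotes: backslash and t both survive
show_args -t '\t'    # single quotes: the same two characters
show_args -t"\t"     # glued to the option: still backslash plus t
```

None of the four forms produces a tab character.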

Here is what the bash manual says:

       There  are three quoting mechanisms: the escape character,
       single quotes, and double quotes.

       A non-quoted backslash (\) is the  escape  character.   It
       preserves  the  literal  value  of the next character that
       follows, with the exception of <newline>.  If a \<newline>
       pair  appears, and the backslash is not itself quoted, the
       \<newline> is treated as a line continuation (that is,  it
       is removed from the input stream and effectively ignored).  

       Enclosing characters in single quotes preserves the literal
       value of each character within the quotes.  A single quote
       may not occur between single quotes, even when preceded by
       a backslash.

       Enclosing characters in double quotes preserves the literal
       value of all characters within the quotes, with the
       exception of $, `, and \.  The characters $ and ` retain
       their special meaning within double quotes.  The backslash
       retains its special meaning only when followed by one of
       the following characters: $, `, ", \, or <newline>.  A
       double quote may be quoted within double quotes by
       preceding it with a backslash.

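In other words, the bash manual says that "\t" is nothing special
inside double quotes, since t is not one of the listed characters: the
backslash stays a literal backslash.  Unquoted, as in -t \t, the
backslash is simply removed and sort sees a plain t.  The SUSv2
specification says the same thing.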

Previous suggestions go like this.  If you want a maximally portable
solution, use awk.  Paul Eggert suggested this:

  tab=`awk 'BEGIN {print "\t"; exit}'`
  sort -t"$tab"

But I am okay with cutting loose machines with operating systems from
before 1992, when printf was first standardized.  Therefore I use
printf for a slightly simpler solution.

  tab=$(printf "\t")
  sort -t"$tab"

Or I suppose you could combine them into a one-liner.  But people
reading your script later will hurt you for it.

  sort -t"$(printf "\t")"
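
Putting it together for a case like yours, here is a minimal sketch.
The sample data and the variable names sample and sorted are invented
for illustration; substitute your own toto.file.

```shell
# Build a small tab-separated sample (hypothetical data standing in
# for toto.file).
sample=$(printf 'pear\t10\napple\t2\nfig\t33\n')

# Capture a literal tab.  Command substitution strips only trailing
# newlines, so the tab itself survives.
tab=$(printf '\t')

# Sort numerically on the second tab-delimited field.
sorted=$(printf '%s\n' "$sample" | sort -t"$tab" -k 2n)
printf '%s\n' "$sorted"
```

The lines should come out ordered by the number in the second column
(2, 10, 33) rather than by the name in the first.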

Paul Eggert wrote on this subject:
> It is POSIX standard and it is fairly safe nowadays, but it won't work
> on older hosts.  I believe the "printf" command was first standardized
> by XPG4 (dated 1992), and many older hosts do not have it.  In
> contrast, the solution with Awk should work all the way back to Unix
> Version 7 (dated 1978).  But if you're using $(...) instead of `...`
> then I guess you're not worried about older hosts anyway....
> 
> On Solaris 9 "printf" is part of the SUNWloc package, and this package
> is occasionally not installed on some bare-bones Solaris hosts; e.g. see
> <http://groups.google.com/groups?selm=38F5B805.E1026EBC%40ks.sel.alcatel.de>.
> In contrast, "awk" is in SUNWesu, which is almost always installed.
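
If you need to cover both kinds of hosts, one possible approach (my
own sketch, not something from the earlier thread) is to try printf
first and fall back to awk when it is missing.  It uses backquotes
rather than $(...) so as not to assume a newer shell.

```shell
# Try printf first; if it is missing or yields nothing, fall back to
# awk.  The exit status of a plain assignment is the exit status of
# the command substitution, so a missing printf takes the else branch.
if tab=`printf '\t' 2>/dev/null` && [ -n "$tab" ]; then
  :   # printf worked; tab now holds a literal tab
else
  tab=`awk 'BEGIN {print "\t"; exit}'`
fi

# Then use it exactly as before, for example:
#   sort -t"$tab" -k 2n toto.file > result.txt
```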

Hope that helps...

Bob



