bug-textutils

Re: tab as sort's field-separator


From: Andrew D Jewell
Subject: Re: tab as sort's field-separator
Date: Mon, 17 Jun 2002 15:55:11 -0400

     As for the merging of the alexa extensions into the mainline :
     I hereby beg the appropriate persons to accept my changes.
     It seems unlikely, however that it will happen.
     Why can't you just distribute the Alexa version?

If you really want to prove to yourself that it's the shell and not sort, test with this program :

#include <stdio.h>
int main(int argc, char **argv)  /* print each argument exactly as received */
{int i; for (i=1; i<argc; ++i) printf("<%s>\n", argv[i]); return 0;}

It will show you that the parameters received by the program are never quite what you expect.
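
If you don't want to compile anything, the same check can be sketched as a shell function. This is only an illustration: the name showargs is made up here, and it assumes a Bourne-style shell (sh, bash), not tcsh.

```shell
# Hypothetical helper (not part of the original message):
# print each argument in <...>, exactly as the shell passed it.
showargs() {
    for arg in "$@"; do
        printf '<%s>\n' "$arg"   # %s prints the bytes as-is, no escape expansion
    done
}

showargs -t'\t' foo              # single quotes: backslash and t, two characters
showargs -t"\t" foo              # double quotes: still backslash and t in sh/bash
showargs "-t$(printf '\t')" foo  # printf makes a real tab; the quotes keep it
```

Only the last call hands over a one-character delimiter.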


There are three ways for the shell to interpret the command, only one of which will work. Let TAB mean the tab character.

1) sort -tTAB foo
2) sort -t\t foo
3) sort "-tTAB" foo

Number 1) won't work, because the tab is just more white space that is discarded between parameters, and you get <-t> <foo>, which uses foo as the delimiter instead of the file.

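Number 2) won't work, because the shell never turns \t into a tab. A Bourne-style shell strips the unquoted backslash, so sort sees <-tt> <foo>; written as '\t', sort sees <-t\t> <foo>, that is <\> and <t>. Either way it is again a multi-character delimiter.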

Number 3) works, but the shell makes it really difficult to express. You need sort to see the parameters as <-tTAB> <foo>.
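
One way to spell number 3) without typing a literal tab, sketched for a Bourne-style shell (sh, bash). The variable name TAB and the sample data are made up for the example.

```shell
# printf expands \t in its format string, so TAB holds one real tab character.
TAB=$(printf '\t')

# Hypothetical sample data: two tab-separated fields per line.
# Quoting "$TAB" stops the shell from discarding it as white space.
sorted=$(printf 'b\t2\na\t3\nc\t1\n' | sort -t"$TAB" -k2,2)
printf '%s\n' "$sorted"
```

The lines come back ordered by the second field (c, b, a). In bash, -t$'\t' should be an equivalent spelling, but that quoting form is not available in every shell.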

I suppose, if we were nice, we would interpret escaped characters like \n \t \r and the like, the way the paste command does already. General question : would this be good or bad? Would it break anything?


Good luck,
adj

At 4:24 AM -0400 6/17/02, address@hidden wrote:
Thanks for the reply.  I'm still not convinced it's (just) a shell
problem.  From the following example it appears to me that tcsh is
expanding the literal '\t' correctly to a single tab character.
What's not clear is why sort complains about '\t' in the directly
executed case and '/etc/motd' in the indirect cases.  I think only
reading the source will tell us for sure and I don't have the correct
version with me here.

BTW, your suggestion of using "\t" didn't work.

Jim


$ echo $version
tcsh 6.10.00 (Astron) 2000-11-19 (i386-intel-linux) ...

$ echo sort -t'\t' /etc/motd > /tmp/t
$ od -tx1 /tmp/t
0000000 73 6f 72 74 20 2d 74 09 20 2f 65 74 63 2f 6d 6f
0000020 74 64 0a
0000023

$ od -ta /tmp/t
0000000   s   o   r   t  sp   -   t  ht  sp   /   e   t   c   /   m   o
0000020   t   d  nl
0000023

$ od -tc /tmp/t
0000000   s   o   r   t       -   t  \t       /   e   t   c   /   m   o
0000020   t   d  \n
0000023

$ bash /tmp/t
sort: multi-character tab `/etc/motd'

$ tcsh /tmp/t
sort: multi-character tab `/etc/motd'

$ sort -t'\t' /etc/motd
sort: multi-character tab `\t'

$ bash --version
GNU bash, version 2.05.8(1)-release (i386-redhat-linux-gnu)


-----Original Message-----
Date: Mon, 17 Jun 2002 01:04:36 -0600
To: address@hidden
Cc: Andrew D Jewell <address@hidden>, address@hidden
Subject: Re: tab as sort's field-separator
From: address@hidden (Bob Proulx)

 Clearly the workaround not working with tcsh is a shell problem.  But
 I don't see how not being able to specify a literal for a tab as an
 option to sort is a shell problem.

I believe it is a shell problem as well.

 Did you see the two examples of using tab with sort at the very end
 of my original message (and below) after my signature?  The error is
 independent of the shell used.

Then both shells are causing you the trouble.  But the problem as we
shall see is with your quoting.

 >RH 7.2, textutils-2.0.14-2
 >
 >$ sort -t'\t' -u -k2,2 -k1,1 /tmp/x
 >sort: multi-character tab `\t'

Try using 'echo' to see what the sort command is seeing.  This is what
I get on linux using different versions of bash.

bash-2.04:

  echo sort -t'\t' -u -k2,2 -k1,1 /tmp/x
  sort -t  -u -k2,2 -k1,1 /tmp/x

bash-2.05:

  echo sort -t'\t' -u -k2,2 -k1,1 /tmp/x
  sort -t\t -u -k2,2 -k1,1 /tmp/x

See that literal '\t' there as two characters in the bash-2.05
example?  Sort is complaining that only one character is allowed as an
option.  It is seeing two.  It is seeing a 'backslash' followed by a
't'.  You need to make it see only one character.  Which you appear to
be aware of because you posted a workaround.
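
A caveat on using echo as the probe: depending on the shell and its settings, echo may itself expand \t, so its output does not always show what sort received. A sketch of a probe that avoids this, assuming a Bourne-style shell (the function name probe is made up here):

```shell
# printf '%s' writes the argument bytes unchanged; od -tx1 dumps them in hex.
probe() {
    printf '%s' "$1" | od -An -tx1
}

probe '\t'                 # 5c 74: backslash and t, two characters
probe "$(printf '\t')"     # 09: a single tab character
```

If the dump shows more than one byte, sort will reject the value as a multi-character tab.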

 >I can work around this in bash using:
 >   TAB=`echo -e "\t"`
 >   sort -t"$TAB" ...

You seem to see that this is just an issue with your shell quoting.
You can see that \t is expanded inside of "" but not in ''.  So you
are using "\t" in the echo to expand to be a tab.  But then later you
are trying to use '\t' which works with some shells and apparently not
in the newer bash.

This seems to be different in different versions of bash.  I don't see
this problem when using the hpux /bin/sh posix shell, for example.  If
someone knows what the standards say about quoting and backslash
expansion I would be interested in hearing about it.  Likely this is a
change required by the Austin group or some such.  I will give them
the benefit of the doubt until someone says otherwise.

 $ strings /bin/sort | grep multi
 ...
 multi-character tab `%s'

That is the error message code in sort which is triggered when the -t
option has more than one character.  We are in agreement there, right?

 Thanks for the pointer to the improved textutils.  I'll need to wait
 for them to be in the released baseline since I'll be distributing the
 script I'm working on.  Any idea when that will be?

The alexautils are a code fork off of the GNU tools.  They implement
their own set of features and options.  As far as I know there are no
plans to merge those into the main GNU source.  Friendly competition
is a good thing and working examples of alternative implementations
are always welcome.

Meanwhile, try this:

  sort -t"\t" -u -k2,2 -k1,1 /tmp/x

As a "\t" I believe the shell will expand the tab.  As in your posted
workaround.

Bob



