Re: "cut" field delimeter

From: Bob Proulx
Subject: Re: "cut" field delimeter
Date: Sun, 15 Jun 2003 14:52:14 -0600
User-agent: Mutt/1.3.28i

Mark Fenbers wrote:
> I surely wish that the "cut" utility would make use of regular
> expressions [regexp(5)] as field delimeters.  So many times I have
> to use "awk" where "cut" would be simpler and quicker, but my fields
> are separated by more than one character.  (Awk allows regexps in
> the -F switch.)  Something like:
>     cut -d"[\t ,]+" -f1-3 myfile
> would make "cut" infinitely more useful.  The example above would
> tell "cut" that the fields are delimited by 1 or more consecutive
> spaces and/or tabs and/or commas.

First let me say thanks for the suggestion.  I understand and
sympathize with your request.  But I don't think that is the best
thing for cut.  Let me voice one opinion.

Adding this kind of awk functionality invites the argument that other
awk functionality would be useful too.  And if awk is good then perl
is even better.  Like the camel that pushes his nose into the
merchant's tent, where do you stop?  Especially since sed, awk, perl
and ruby all provide this capability already.  Wouldn't the logical
conclusion be the same functionality that has already matured into
the present day advanced commands?[1]

Are these so much worse in comparison to your suggested one?

  awk -F"[\t ,]+" '{print $1,$2,$3}'

  perl -F"[\t ,]+" -lane 'print join(" ",@F[0..2])'

  ruby -F"[\t ,]+" -lane 'print $F[0..2].join(" ")'
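
To make the comparison concrete, here is the awk variant run against
one made-up input line.  The sample data is purely an assumption for
illustration; it is not from the original request.

```shell
# Sample line (invented): fields separated by a mix of commas, spaces
# and a TAB.  printf expands \t into a real TAB character.
out=$(printf 'alpha,  beta\tgamma, delta\n' |
  awk -F"[\t ,]+" '{print $1,$2,$3}')

# Show the first three fields, joined by awk's default output separator.
echo "$out"
```

With that input the three fields come back separated by single spaces.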

Programs should be simple and modular.  Common behavior should be
modularized into a common location where it can be used by other
programs.  More complicated programs are created by chaining together
simpler programs.  This is part of the subtle but beautiful design of
the UNIX system.  Applying that to this case gives the following: use
sed for the REs and cut for the field splitting.  Of course the TAB
and space are awkward to write literally on the command line, so
let's use the modern character class [:space:] to make this work.

  sed "s/[[:space:],]\+/ /g" | cut -d" " -f1-3
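
As a rough sketch of how that pipeline behaves, the snippet below
wraps it in a shell function and feeds it one invented line.  The
function name first3 and the sample data are assumptions made up for
illustration.  It also spells the repetition as \{1,\}, which is the
portable basic-RE form; \+ is a GNU sed extension.

```shell
# first3 (hypothetical name): squeeze every run of whitespace and/or
# commas down to one space, then let cut pick the first three fields.
first3() {
  sed 's/[[:space:],]\{1,\}/ /g' | cut -d' ' -f1-3
}

# Invented sample line mixing commas, a TAB and repeated spaces.
printf 'a,\t b  c,,d\n' | first3
```

Note that a line which begins with a delimiter will yield an empty
first field, since sed leaves a single leading space for cut to split
on.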

The core utilities are required in /bin and /usr/bin on all systems.
Therefore extra care needs to be taken that they don't bloat too
much, since bloat reduces their ability to be useful on a wide range
of platforms.  The existence of programs such as busybox already
implies that the core utilities are too bloated.  And if RE
functionality were added to cut, it would imply that other commands
needed the same functionality as well, multiplying the problem.


[1] I used that argument since it is what defeated putting classes
into ANSI C.  If ANSI C had become C with Classes, which is how C++
started out, then wouldn't the logical progression have led to what
already exists as C++?  And so we do not have classes in present day
C.
