Re: cut -c -b (UNCLASSIFIED)

bug-textutils



From:	Bob Proulx
Subject:	Re: cut -c -b (UNCLASSIFIED)
Date:	Thu, 29 Jul 2004 21:45:54 -0600
User-agent:	Mutt/1.3.28i

Kirby, Jason B Mr NISA-DC/RABA Techhnologies wrote:
> Classification:  UNCLASSIFIED 
> Caveats: NONE

What's all of that about?

> users of cut -c17-18,20-22,28-33 need output to come out with tabs as
> abc   def     ghi   
> or spaces
> abc def ghi
> or user defined -d:
> abc:def:ghi
> not
> abcdefghi
> as it does now with the fields *pasted* together.

Oh, I don't know.  It is more of a rectangular cut program, which it
does quite well.  Considering all of the other programs available for
dealing with text as text, I would say cut should stick to rectangular
cutting, and a different program should be used for text parsing.
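
As a small illustration of that rectangular behavior, the sketch below cuts three character ranges out of one line.  The variable names and the sample line are made up for the example; only the cut -c usage comes from the thread.

```shell
# Hypothetical sample line: digits in columns 1-10, letters from column 11 on.
line='0123456789abcdefghijklmnopqrstuvwxyz'

# Select three separate column ranges.  cut emits them back to back,
# with nothing inserted between the ranges.
pasted=$(printf '%s\n' "$line" | cut -c1-3,11-13,21-23)
printf '%s\n' "$pasted"
```

The three ranges come out joined as a single string, which is the "pasted together" result described in the quoted message.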

> Example:
> 0200(ECKD) at ( 94:  0) is dasda      : active at blocksize: 4096, 600840
> blocks, 2347 MB
> 129e(ECKD) at ( 94:100) is dasdz      : active at blocksize: 4096, 601020
> blocks, 2347 MB

I assume those are supposed to be two lines and not four.  Never word
wrap example lines.  Examples should be verbatim.

> I want "94" as field one, "0" and "100" as field two, and "dasda" and
> "dasdz" as field three.
> I'd like to see:
> 94: 92:dasdx
> 94: 96:dasdy
> 94:100:dasdz
> 94:104:dasdaa
> 94:108:dasdab

You are parsing output which was obviously meant to be human readable
and not machine readable.  Therefore you will just have to peel the
onion and brute force through it.  For this case I would use perl,
ruby or python.  I favor ruby.  But perl is popular and quite
pervasive.

This:

  perl -ne 'print "$1:$2:$3\n" if m/\(\s*(\d+)\s*:\s*(\d+)\s*\)\s+\S+\s+(\S+)/;'

Produces:

  94:0:dasda
  94:100:dasdz

I think that is just what you asked for.

To split that perl expression apart: the -n wraps a loop around the
input and does not print by default.  The m/something/ is matched
against each input line.  The first (something) sets $1, the second
(something) sets $2, and so forth.  The \s is whitespace, \S is
non-whitespace, \d is any digit.  The * is zero or more, and the + is
one or more, of the immediately preceding item.  So basically the flow
is to match the line using regular expressions and then print sections
of it.

Bob
