bug-coreutils

Re: Undocumented cut feature


From: Bob Proulx
Subject: Re: Undocumented cut feature
Date: Fri, 26 Oct 2007 12:54:55 -0600
User-agent: Mutt/1.5.13 (2006-08-11)

Gambs, David (CONT) wrote:
> From vi w/set list I have the following -
> 
> file3:
> optvg^M$
> 4$
> 3171$

That shows that the carriage return was already in the file before
'cut' processed it.  That is the source of the issue.

> file2 (the one that I cut on):

But your previous example showed that you were cutting file3 into file1.

>  optvg$
>  4 MB$
>  3171 / 12.39 GB$
> 
> The command:
> cut -f2 -d' ' ~/file2 > ~/file3

Okay.  No carriage returns going in.

> Your suggested command gives:
> $ cut -f2 -d' ' file2 | od -tx1 -c
> 0000000 6f 70 74 76 67 0d 0a 34 0a 33 31 37 31 0a
>           o   p   t   v   g  \r  \n   4  \n   3   1   7   1  \n
                               ^^ A carriage return.

I cannot recreate this behavior on a RHEL3 machine.  Can you double
check your input files?  I believe there may be a mixup about which
file is which.  Your first example in the previous message showed
you using file3, and the above shows that file3 contains carriage
returns in the data.

Note that cut prints the entire line if no delimiter is present.

  `-f FIELD-LIST'
  `--fields=FIELD-LIST'
       Select for printing only the fields listed in FIELD-LIST.  Fields
       are separated by a TAB character by default.  Also print any line
       that contains no delimiter character, unless the
       `--only-delimited' (`-s') option is specified.

I believe what is happening is that your original input data contains
a carriage return.  The optvg line is the only line without any
delimiters and is therefore passed through unchanged by cut.  This
is why you are seeing the carriage return in the output.
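
As a rough sketch of that pass-through behavior, assuming a made-up
three-line input rather than your real data (the file contents and the
variable names below are only for illustration), something like this
should show the delimiter-less line coming through with its carriage
return, and -s dropping it:

```shell
# Made-up input: line 1 has no space and ends in CR LF; lines 2 and 3
# each contain one space.
sample=$(mktemp)
printf 'optvg\r\nsize 4\nfree 3171\n' > "$sample"

# Default behavior: the line without a delimiter is printed whole,
# carriage return included.
cut -f2 -d' ' "$sample" | od -c

# With -s (--only-delimited) that line is suppressed, so no carriage
# return reaches the output.
cut -s -f2 -d' ' "$sample" | od -c

# Count the lines that survive in each case.
default_lines=$(cut -f2 -d' ' "$sample" | grep -c '')
delimited_lines=$(cut -s -f2 -d' ' "$sample" | grep -c '')
rm -f "$sample"
```

Whether -s is appropriate depends on whether you want the optvg line
kept, so treat it as a diagnostic rather than a fix.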

> And I have found differences within RedHat on the vgdisplay. This
> vgdisplay is in /sbin and not linked to anything. On the system where
> the problem does not happen (newer coreutils & OS release) the command
> is /usr/sbin/vgdisplay and is linked to lvm. Don't know where that would
> make a difference though.

You should be able to run 'rpm -qf FILE', where FILE is /sbin/vgdisplay
or /usr/sbin/vgdisplay, to determine which package contains each file.
I don't think vgdisplay should output carriage returns.

> cd ~
> /sbin/vgdisplay | egrep -e Name -e "PE Size" -e Free | cut -b 22- | cut
> -f2 -d' ' > file1
> rm file.out
> touch file.out
> gawk '
> { line0 = /[:alpha:]/ }
> { printf "%s ", $line0 >> "file.out" }
> { getline }
> { line1 = /[:print:]/ }
> { printf "%i ", $1 >> "file.out" }
> { getline }
> { line2 = /[:print:]/ }
> { print $1 >> "file.out" } ' file1
> rm file1

That is a very unconventional awk script!  Unfortunately I do not have
the time right now to look at what it is doing in detail.

> In the gawk script when you output line0, the ^M puts the cursor at the
> beginning of the line. The next print lines then overwrite what was
> there. In this case optvg is completely overwritten. A longer vg name
> would have some of it left.

Overwriting would only happen to a terminal.  The character stream
would still contain all of the characters.
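
A small sketch of the difference, using a made-up string that imitates
what the gawk script writes (the variable names are hypothetical):

```shell
# Made-up line: the name, a carriage return, then the two numbers.
line=$(printf 'optvg\r 4 3171')

# On a terminal this displays as " 4 3171": the CR moves the cursor back
# to the start of the line and the following text overwrites "optvg".
printf '%s\n' "$line"

# Through od every byte is visible, including "optvg" and the \r.
printf '%s\n' "$line" | od -c

# The name is still in the stream: delete the CR and it reappears.
visible=$(printf '%s\n' "$line" | tr -d '\r')
printf '%s\n' "$visible"
```

So 'cat file.out' to a terminal hides the name, but piping the same
file through od -c should show it.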

> $ cat file.out
>  4 3171

I think that if you can work out why CRs are in the vgdisplay output
and remove them there, everything else will flow through normally.
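
Until the source is found, one possible stopgap is to strip the
carriage returns right after the command that emits them.  This is only
a sketch: printf stands in for the vgdisplay pipeline, the data is made
up, and I have not tried it against your vgdisplay.

```shell
# printf stands in for the vgdisplay pipeline; the data is made up.
cleaned=$(printf 'optvg\r\n4\n3171\n' | tr -d '\r')

# Check the result: count the lines that still contain a carriage return.
# grep -c exits non-zero when the count is 0, hence the "|| true".
cr_lines=$(printf '%s\n' "$cleaned" | grep -c "$(printf '\r')" || true)
echo "lines still containing CR: $cr_lines"
```

Note that tr -d '\r' removes every carriage return, which is fine for
this data but would be wrong for input where a CR is meaningful.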

> And all this started on HP-UX. The script works just fine there. It was
> when I brought it over to Linux that problems arose and modifications
> had to be made.

Most scripts only start to become portable about the time they have
been ported to three different systems.  :-)

Bob




