bug-textutils

Re: A bug in "tr" command ???


From: Bob Proulx
Subject: Re: A bug in "tr" command ???
Date: Sat, 4 Oct 2003 19:23:23 -0600
User-agent: Mutt/1.3.28i

address@hidden wrote:
> Hello.  I hope this is not a bug, that I'm just doing something
> wrong.  Anyway, here's how this "journey" started.  I had a file
> with carriage return characters (^M) in it.

DOS has a CR-NL end of line convention.  UNIX has a NL end of line
convention.  A classic conversion problem.
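
One way to see the difference is to dump the bytes.  This is only a
sketch: the sample text and the helper name dos_sample are made up for
illustration and are not from the original report.

```shell
# Sample data only: two short lines written with DOS (CR-NL) endings.
dos_sample() { printf 'one\r\ntwo\r\n'; }

# od -c prints each byte; carriage returns show up as \r, newlines as \n.
dos_sample | od -c

# Count the carriage returns by deleting every other byte.
dos_sample | tr -cd '\015' | wc -c
```

The same two pipelines can be pointed at a real file with "< file" in
place of the dos_sample call.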

> The file was one LONG record, and I wanted newline characters where
> the ^M's were.  I thought I could just set awk's RS variable to ^M,
> and that would do it.  But, I needed a way to "create" the ^M
> character.

I would use other commands myself, such as tr -d.  Try this:

  tr -d "\015"

But please continue with your story.

> Somewhere on the Internet I found this:
> 
> cm=`echo m | tr 'm' '\015'`
> 
> but, that did not seem to work.  Seemed like "cm" ended up being
> null.

Works for me!
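
A bare CR printed to a terminal is invisible, so a variable that holds
one can look empty.  A minimal sketch of how to check, assuming a POSIX
shell on a UNIX-like system (results on Cygwin may differ):

```shell
# Same construction as the quoted line, using $(...) instead of backticks.
cm=$(echo m | tr 'm' '\015')

# ${#cm} is the length in characters; a variable holding one CR should report 1.
echo "length: ${#cm}"

# Dump the byte so it is visible instead of printing it raw.
printf '%s' "$cm" | od -c
```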

> To test if the syntax of the command correct, I did the following:
> 
> Script 1 (a file called ASCII):
> 
> cat /dev/null > asc
> cat /dev/null > asc.txt

The 'cat' programs in the above are not needed.

  true > asc
or
  : > asc
or
  > asc

All do the same thing without the extra program.  (Sorry, but extra
'cat' processes are a common scripting mistake and a pet peeve of
mine.)  Here is a simple howto on common shell mistakes.

  http://www.greenend.org.uk/rjk/2001/04/shell.html

> for i in 000 001 002 003 004 005 006 007 \
>          010 011 012 013 014 015 016 017 \
>          020 021 022 023 024 025 026 027 \
>          030 031 032
> do
>    echo "x=\`echo x | tr 'x' '\\${i}'\`" >> asc
>    echo "echo \"\${x}\" >>asc.txt" >>asc
> done
> 
> bash asc
> 
> The execution of script 1 (ASCII) created a file called asc
> (which was executed from within the ASCII file).
> 
> asc file:
> 
> x=`echo x | tr 'x' '\000'`
> if [[ "${x}" == "" ]]; then echo "x is null"; else echo "x is not null"; fi
> #Note: the above "if" statement was not created by the script.  I edited it in afterwards,

You have to be careful that your editor does not change any of the
characters.  In particular some editors will silently delete null (000)
characters.
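
To check whether a file still has its null bytes after an editing
session, count them before and after.  A sketch only; count_nuls is a
name invented here, and the temporary file just stands in for a real one.

```shell
# count_nuls FILE: print how many NUL (octal 000) bytes FILE contains.
count_nuls() {
    tr -cd '\000' < "$1" | wc -c | tr -d ' '
}

# Demonstration on a throwaway file that holds exactly two NUL bytes.
tmp=$(mktemp)
printf 'a\000b\000\n' > "$tmp"
count_nuls "$tmp"
rm -f "$tmp"
```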

> #and re-executed the asc file manually
> echo "${x}" >>asc.txt
> x=`echo x | tr 'x' '\001'`
> if [[ "${x}" == "" ]]; then echo "x is null"; else echo "x is not null"; fi
> #Same for that "if" statement, too
> echo "${x}" >>asc.txt
> x=`echo x | tr 'x' '\002'`
> echo "${x}" >>asc.txt
>     .
>     . (several lines left out for brevity)
>     .
> x=`echo x | tr 'x' '\031'`
> echo "${x}" >>asc.txt
> x=`echo x | tr 'x' '\032'`
> echo "${x}" >>asc.txt
> 
> And, the result of that execution was a file, asc.txt
> (this is how it looked when viewed with vi):
> 
> (a null character, OK, i.e., expected)
> ^A
> ^B
> ^C
> ^D
> ^E
> ^F
> ^G
> ^H
> (a tab character, OK, i.e., expected)
> (a null character, not expected)

I can't recreate that.  I don't see the null.

> ^K
> ^L
> (a null character, not expected, at least I had hoped it would be a ^M)

I can't recreate that.  I don't see the null.

What version of tr are you using?

  tr --version

> ^N
> .
> .(several lines left out for brevity)
> .
> ^Z
> 
> Note 1:  where ^I would be is a tab character (OK)
>          where ^J would be is a null character
>          where ^M would be is a null character
> 
> Note 2:  I went back and edited in the following line to the
>          2nd file (asc):
> 
> if [[ "${x}" == "" ]]; then echo "x is null"; else echo "x is not null"; fi
> 
> and inserted it after the "000", "001", "012", "013", "015", and "016" lines 
> to test.  The character created by the "012", and "015" lines from the asc 
> file is null.  :(

You can probably do this more easily with:

  echo x | tr x "\\015" | od -c

or even

  for i in $(seq -w 0 32);do echo x | tr x "\\$i" | od -c;done
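
A slightly longer sketch of the same idea that walks the octal codes 000
through 032, as in the original script.  seq -w 0 32 counts in decimal,
so it also yields values such as 08 and 19 that are not octal; printf
'%03o' is used here instead.  The helper name show_byte is made up.

```shell
# show_byte OCTAL: print the bytes tr emits when x is mapped to \OCTAL.
show_byte() {
    echo x | tr x "\\$1" | od -An -c
}

# Decimal 0 to 26 is octal 000 to 032.
i=0
while [ "$i" -le 26 ]; do
    oct=$(printf '%03o' "$i")
    printf '%s:' "$oct"
    show_byte "$oct"
    i=$((i + 1))
done
```

Piping straight into od also avoids command substitution, which strips
trailing newlines.  That means x=`echo x | tr 'x' '\012'` leaves x empty
in any POSIX shell, independent of tr.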

> Note 3:  GNU bash, version 2.05.0(8)-release (i686-pc-cygwin)

Cygwin?  You are probably running afoul of the DOS end of line
conventions.  Probably the program is doing its own conversions.  Can
you recreate this on a UN*X like machine?  I don't think anyone on
this list uses Cygwin.  So if it is a Cygwin specific problem then you
would need to take this to the cygwin list.

> Note 4: I finally just made a copy of a file that had ^M's in it,
> edited out everything but one ^M character, and then edited the
> following around the ^M:
> 
> BEGIN { RS = "^M" }
> { print }
> 
> and then used that to process my file with the ^M's in it:
> 
> cat ctrl-Ms_file | awk -f RS_is_ctrl-M.awk > newlines_file

Try 'tr -d "\015"' as the classic way to delete CRs from files.
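
Deleting suits CR-NL files.  For a file that is one long record with
bare CRs as separators, translating CR to NL is the matching tr form.  A
sketch of both, with invented helper names and sample input:

```shell
# strip_cr: delete every CR (for files with CR-NL line endings).
strip_cr() { tr -d '\015'; }

# cr_to_nl: replace every CR with NL (for files that use bare CRs as separators).
cr_to_nl() { tr '\015' '\012'; }

printf 'a\r\nb\r\n' | strip_cr
printf 'a\rb\rc\r' | cr_to_nl
```

Both read standard input, so "cr_to_nl < ctrl-Ms_file > newlines_file"
would do the conversion without the awk script.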

> Yucky thing is that I would have to keep that "RS_is_ctrl-M.awk"
> file around (or create it as needed using vi) since I can't create a
> ^M character "on the fly".  :(

Sure you can!

  tr -d "\015"

  printf "\r"

  tr -d "$(printf "\r")"

  CR=$(printf "\r")

  tr -d "$CR"
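
The same generated CR can be handed to awk, so the RS_is_ctrl-M.awk file
does not need to be kept around.  A sketch, assuming an awk that accepts
-v assignments (POSIX awk does); split_on_cr is an invented name.

```shell
# split_on_cr: treat CR as the record separator and print one record per line.
split_on_cr() {
    awk -v RS="$(printf '\r')" '{ print }'
}

# Sample input only: three records separated by bare CRs.
printf 'one\rtwo\rthree\r' | split_on_cr
```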

I think 'perl -l' might convert end of line conventions too.  Not
sure.  I don't have a way to test this on DOS.  But it is worth a test
on Cygwin.

  perl -lne 'print'

Bob



