bug#12192: tr - bytes vs characters

bug-coreutils

From:	Jim Meyering
Subject:	bug#12192: tr - bytes vs characters
Date:	Sat, 15 Sep 2012 12:28:54 +0200

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi gnu folks,
>
> as already known, tr cannot handle multibyte-encodings like utf-8:
>
>> address@hidden:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> the manpage of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> and that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context is one
> character on display, regardless of which encoding is used or how many
> bytes are used to represent it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it does not
> translate "characters" but "bytes". So I see two ways:
>
> - add multibyte-encoding support to tr
> or
> - change the manpage and help text to say "bytes" instead of "characters"
>
> Since it doesn't seem that anybody wants to add the support to tr, an
> update of the manpage would be the easier way to ensure consistency.

Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.
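
For illustration only, here is a minimal sketch in Python (not tr itself) of the
bytes-versus-characters distinction described in the report. It assumes UTF-8
input and that surplus bytes in the second set are ignored, which is consistent
with the output quoted above; the helper names tr_bytes and tr_chars are
hypothetical and do not come from coreutils.

```python
def tr_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Translate byte by byte, as the report observes tr doing."""
    # Assumption: set2 is at least as long as set1; extra bytes are dropped.
    table = bytes.maketrans(set1, set2[:len(set1)])
    return data.translate(table)


def tr_chars(text: str, set1: str, set2: str) -> str:
    """Translate character by character (code point by code point)."""
    # Assumption: set1 and set2 contain the same number of characters.
    return text.translate(str.maketrans(set1, set2))


if __name__ == "__main__":
    # "ö" is two bytes in UTF-8 (0xC3 0xB6); only the first one is paired with "o".
    out = tr_bytes("foo".encode("utf-8"), "o".encode("utf-8"), "ö".encode("utf-8"))
    print(out)                    # b'f\xc3\xc3' (no longer valid UTF-8)
    print(out.decode("latin-1"))  # fÃÃ, the garbled output shown in the report
    print(tr_chars("foo", "o", "ö"))  # föö
```

The byte-wise version truncates the multibyte replacement to its first byte,
so the result is not valid UTF-8; the character-wise version shows what a
multibyte-aware translation would produce.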
