bug-coreutils

bug#12192: tr - bytes vs characters


From: Jim Meyering
Subject: bug#12192: tr - bytes vs characters
Date: Sat, 15 Sep 2012 12:28:54 +0200

forcemerge 12192 9365
thanks

Michael Stummvoll wrote:
> Hi GNU folks,
>
> As is already known, tr cannot handle multibyte encodings such as UTF-8:
>
>> address@hidden:~$ echo "foo" | tr o ö
>> fÃÃ
>
> I know that multibyte encoding support is not needed for
> POSIX compliance, BUT:
>
> the manpage of tr says the following:
>
>> Translate, squeeze, and/or delete characters from standard input,
>> writing to standard output.
>
> and that's the inconsistency, IMHO.
>
> The typical interpretation of "character" in such a context is one
> character as displayed, regardless of which encoding is used or how many
> bytes are needed to represent it. So, if tr really translates "characters",
> it should preserve the encoding. If it doesn't, it translates not
> "characters" but "bytes". So I see two options:
>
> - add multibyte encoding support to tr
> or
> - change the manpage and help text to say "bytes" instead of "characters"
>
> Since it doesn't seem that anybody wants to add that support to tr,
> updating the manpage would be the easier way to ensure consistency.

Thanks for the report.
I'm merging this issue with the others that relate to tr
and multi-byte support.
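
For readers following the example in the report, here is a minimal sketch
in Python of the difference between byte-wise and character-wise
translation. It is only an illustration, not tr's actual code: the names
translate_bytes and translate_chars are hypothetical, and the sketch assumes
that when SET2 is longer than SET1 the excess is ignored, which matches the
output shown in the report.

```python
def translate_bytes(data: bytes, set1: bytes, set2: bytes) -> bytes:
    """Map each byte of set1 to the byte at the same position in set2.

    Assumption for this sketch: bytes of set2 beyond len(set1) are dropped.
    """
    table = bytes.maketrans(set1, set2[:len(set1)])
    return data.translate(table)


def translate_chars(text: str, set1: str, set2: str) -> str:
    """Map each character of set1 to the character at the same position in set2."""
    table = str.maketrans(set1, set2[:len(set1)])
    return text.translate(table)


# "ö" is two bytes in UTF-8 (0xC3 0xB6), so a byte-wise mapping pairs
# "o" with 0xC3 alone and the result is no longer valid UTF-8.
byte_result = translate_bytes("foo".encode("utf-8"),
                              "o".encode("utf-8"),
                              "ö".encode("utf-8"))

# A character-wise mapping pairs "o" with the whole character "ö".
char_result = translate_chars("foo", "o", "ö")

print(byte_result)                  # raw bytes of the byte-wise result
print(char_result.encode("utf-8"))  # raw bytes of the character-wise result
```

Viewed as Latin-1, the byte-wise result is the "fÃÃ" quoted in the report,
while the character-wise result is "föö".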