bug-coreutils
bug#20114: tr does not support multibyte characters in the first argument


From: Pádraig Brady
Subject: bug#20114: tr does not support multibyte characters in the first argument
Date: Mon, 16 Mar 2015 12:15:07 +0000
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:31.0) Gecko/20100101 Thunderbird/31.4.0

On 16/03/15 02:30, Bruno Haible wrote:
> POSIX [1] specifies that the recognition of characters in 'tr' depends on
> the environment variables LANG, etc.
> 
> But trying to replace a multibyte character by another character does not
> work:
> 
> $ echo $LANG
> de_DE.UTF-8
> $ enspace=`printf '\u2002'`
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 20 20 59
> 0000005
> 
> Expected output would be:
> $ echo -n "X${enspace}Y" | tr "${enspace}" ' ' | od -t x1
> 0000000 58 20 59
> 0000003
> 
> With 'sed' it works:
> 
> $ echo -n "X${enspace}Y" | sed -e "s/${enspace}/ /g" | od -t x1
> 0000000 58 20 59
> 0000003
> 
> Bruno
> 
> [1] http://pubs.opengroup.org/onlinepubs/9699919799/utilities/tr.html

Yes, you're right, Bruno.
Multi-byte support in coreutils in general has languished,
but we hope to start improving support in the next major release (9?),
after the imminent 8.24 stable release.
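
For illustration, here is a minimal sketch of what seems to be happening in the report above. It assumes a byte-oriented tr like the one described there. The en space is written as its three UTF-8 bytes (octal escapes) so the script does not rely on the shell's \u support, and a trailing newline is added to the input, so the od output ends in 0a, unlike the original commands.

```shell
#!/bin/sh
# U+2002 (EN SPACE) as its UTF-8 encoding: bytes e2 80 82.
enspace=$(printf '\342\200\202')

# The "character" is three bytes long.
printf '%s' "$enspace" | wc -c

# A byte-oriented tr sees SET1 as three separate bytes and pads SET2
# (' ') to the same length, so each byte is translated on its own.
printf 'X%sY\n' "$enspace" | tr "$enspace" ' ' | od -An -t x1

# sed substitutes the byte sequence as a unit, giving a single space.
printf 'X%sY\n' "$enspace" | sed -e "s/${enspace}/ /g" | od -An -t x1
```

With a byte-oriented tr, the second command should show three 20 bytes between 58 and 59, matching the report. The sed form can serve as a workaround in the meantime.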

To that end I've put together a plan:
http://www.pixelbeat.org/docs/coreutils_i18n/

cheers,
Pádraig.
