bug-gnu-utils


Re: multibyte locale patches for GNU utils available


From: Pablo Saratxaga
Subject: Re: multibyte locale patches for GNU utils available
Date: Tue, 8 May 2001 17:22:49 +0200

Kaixo!

On Tue, May 08, 2001 at 04:18:19PM +0200, Bruno Haible wrote:
 
> Many GNU text processing utilities don't work correctly in multibyte
> locales.

BTW, the following is not related to UTF-8 or to encodings, but it is related
to i18n. It may not even be considered a bug; however, a large share of users
will find its result undesirable (I do, at least).

In the old days, when only ASCII existed, there were (and still are) two
string comparison functions used for sorting: strcmp and strcasecmp. The first
is case sensitive and the second is case insensitive.

That got people accustomed to using [A-Z] and [a-z] in regular expressions
as two very different things.

However, in an i18n environment there is now *only one* such function:
strcoll. And it is case insensitive; there is no case sensitive equivalent
(in the standard at least).

The result is that when you set your locale to anything other than 'C',
then both [A-Z] and [a-z] become the same thing as [A-Za-z].

That is a very annoying situation.
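
To make the difference concrete, here is a minimal sketch (not taken from any
of the affected programs) of a one-character range test built on strcmp and
on strcoll. The helper names in_range_bytes and in_range_collate are made up
for illustration. In the C locale both give the same answer; in other locales
the strcoll variant follows the locale's collation order, which may interleave
upper and lower case letters, so that 'B' can fall inside [a-z].

```c
#include <string.h>

/* Hypothetical helper: is c inside [lo-hi] when ordered by byte value,
   which is how strcmp orders one-character strings? */
int in_range_bytes(char c, char lo, char hi)
{
    char s[2] = { c, '\0' };
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    return strcmp(l, s) <= 0 && strcmp(s, h) <= 0;
}

/* Hypothetical helper: the same test, but ordered by the collation rules
   of the current LC_COLLATE locale, via strcoll. */
int in_range_collate(char c, char lo, char hi)
{
    char s[2] = { c, '\0' };
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    return strcoll(l, s) <= 0 && strcoll(s, h) <= 0;
}
```

A program would call setlocale(LC_ALL, "") first so that strcoll picks up the
user's locale; without that call the program stays in the C locale and the two
helpers agree.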


The solution is quite simple: implement a case sensitive version of strcoll
(see the attached file, a small patch I did for bash).
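
The attached patch is not reproduced here. As a rough idea of what "a case
sensitive version of strcoll" could look like for range matching, here is a
sketch under my own assumptions: single-byte characters only, case decides
first (uppercase before everything else, as in ASCII), and strcoll only
orders characters of the same case class. The names case_sensitive_collate
and in_range_case_sensitive are hypothetical and are not the ones used in the
patch.

```c
#include <ctype.h>
#include <string.h>

/* Hypothetical sketch, NOT the attached patch: compare two single-byte
   characters so that case decides first and locale collation second. */
int case_sensitive_collate(unsigned char a, unsigned char b)
{
    int a_upper = isupper(a) != 0;
    int b_upper = isupper(b) != 0;

    /* Different case class: uppercase sorts before everything else. */
    if (a_upper != b_upper)
        return a_upper ? -1 : 1;

    /* Same case class: let the locale decide the order. */
    char sa[2] = { (char)a, '\0' };
    char sb[2] = { (char)b, '\0' };
    return strcoll(sa, sb);
}

/* Range test for [lo-hi] using the comparison above. */
int in_range_case_sensitive(unsigned char c, unsigned char lo, unsigned char hi)
{
    return case_sensitive_collate(lo, c) <= 0
        && case_sensitive_collate(c, hi) <= 0;
}
```

With this ordering [a-z] can no longer match an uppercase letter, whatever the
locale's collation says, while lowercase letters are still ordered by strcoll.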


If some people think the odd behaviour has to be provided too, then at least
let the user choose (through a command line option or an environment
variable) between case sensitive and case insensitive behaviour.
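
Such a switch could be as small as the following sketch. The variable name
EXAMPLE_RANGE_CASE and the function range_case_sensitive_enabled are invented
for this example; no existing program reads that variable. The default is the
traditional, case sensitive behaviour.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical switch: returns 1 when ranges should be case sensitive,
   0 when the user asked for the case insensitive behaviour.
   EXAMPLE_RANGE_CASE is a made-up variable name. */
int range_case_sensitive_enabled(void)
{
    const char *value = getenv("EXAMPLE_RANGE_CASE");

    if (value == NULL)
        return 1;  /* unset: keep the traditional case sensitive ranges */

    /* Only the exact value "insensitive" turns case sensitivity off. */
    return strcmp(value, "insensitive") != 0;
}
```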

But since Unix is case sensitive in its file system, and the old behaviour
(C only) was case sensitive, it would be logical to keep case sensitivity;
breaking it is a very bad thing, and a lot of people consider it a bug.

Programs currently hurt by this problem are 'bash' and 'grep', and probably
others.

Thanks

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/            PGP Key available, key ID: 0x8F0E4975

Attachment: my-strcoll.diff.bz2
Description: BZip2 compressed data

