Re: multibyte locale patches for GNU utils available
From: Pablo Saratxaga
Subject: Re: multibyte locale patches for GNU utils available
Date: Tue, 8 May 2001 17:22:49 +0200
Hello!
On Tue, May 08, 2001 at 04:18:19PM +0200, Bruno Haible wrote:
> Many GNU text processing utilities don't work correctly in multibyte
> locales.
BTW, the following is not related to UTF-8 or to encodings, but it is related to i18n.
It may not even be considered a bug; however, a large share of users will find
the result undesirable (I do, at least).
In the old days, when only ASCII existed, there were (and still are)
two string comparison functions, strcmp and strcasecmp: one is case sensitive
and the other is case insensitive.
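For illustration, a minimal sketch of the two classic comparisons. The wrapper names same_cs and same_ci are hypothetical; strcmp is ISO C, and strcasecmp is POSIX, declared in <strings.h>.

```c
#include <string.h>   /* strcmp */
#include <strings.h>  /* strcasecmp (POSIX) */

/* Hypothetical wrapper: nonzero if a and b are equal, case sensitive. */
int same_cs(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Hypothetical wrapper: nonzero if a and b are equal, ignoring case. */
int same_ci(const char *a, const char *b)
{
    return strcasecmp(a, b) == 0;
}
```

With these, same_cs("Make", "make") is 0 while same_ci("Make", "make") is 1: the programmer picks the behaviour explicitly.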
That got people accustomed to treating [A-Z] and [a-z] in regular expressions
as two very different things.
However, now in an i18n environment there is *only one* such function:
strcoll.
And it is case insensitive; there is no case sensitive equivalent (at least
not in the standard).
The result is that when you set your locale to anything other than 'C',
both [A-Z] and [a-z] become the same thing as [A-Za-z].
That is a very annoying situation.
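A sketch of how a bracket range such as [a-z] is typically tested by a locale-aware matcher, assuming single-byte characters. The name in_range_coll is hypothetical; this is not code taken from bash or grep.

```c
#include <string.h>  /* strcoll */

/* Hypothetical helper: nonzero if c lies between lo and hi (inclusive)
   in the current locale's collation order, as set by setlocale(). */
int in_range_coll(char c, char lo, char hi)
{
    char s[2] = { c, '\0' };
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };

    return strcoll(l, s) <= 0 && strcoll(s, h) <= 0;
}
```

In the C locale strcoll orders like strcmp, so 'M' is outside [a-z]. In a locale whose collation order interleaves the cases (a A b B ...), the same test would accept most uppercase letters for [a-z], which is the behaviour described above.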
The solution is quite simple: implement a case sensitive version of strcoll
(see the attached file, a small patch I wrote for bash).
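The attached patch is not reproduced here. As a rough sketch of the idea only (not the patch itself), assuming single-byte characters and a hypothetical helper name, a range test can keep the locale's ordering but reject letters of the opposite case when both range endpoints share a case:

```c
#include <ctype.h>   /* islower, isupper */
#include <string.h>  /* strcoll */

/* Hypothetical helper: like a plain strcoll range test, but [a-z]-style
   ranges (both ends lowercase) never match uppercase letters, and
   [A-Z]-style ranges (both ends uppercase) never match lowercase ones. */
int in_range_case_sensitive(char c, char lo, char hi)
{
    unsigned char uc = (unsigned char) c;
    unsigned char ulo = (unsigned char) lo;
    unsigned char uhi = (unsigned char) hi;
    char s[2] = { c, '\0' };
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };

    /* Locale-aware ordering test. */
    if (strcoll(l, s) > 0 || strcoll(s, h) > 0)
        return 0;

    /* Case filter: only letters of the endpoints' case may match. */
    if (islower(ulo) && islower(uhi) && isupper(uc))
        return 0;
    if (isupper(ulo) && isupper(uhi) && islower(uc))
        return 0;
    return 1;
}
```

In the C locale this gives the same answers as a plain strcmp-based test; in a locale with interleaved case ordering, the extra filter is what would keep 'Q' out of [a-z]. It only addresses case; other characters that collate inside a range are still accepted.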
If some people think the odd behaviour has to be provided too,
then at least give the user a way to choose (through a command line option
or an environment variable) between case sensitive and case insensitive
behaviour.
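One way such a user choice could look, as a sketch only: the environment variable name RANGE_CASE_SENSITIVE and both function names are invented for illustration, and neither bash nor grep is known to read this variable. Defaulting to case sensitive mirrors the traditional C-locale behaviour.

```c
#include <stdlib.h>  /* getenv */
#include <string.h>  /* strcmp */

/* HYPOTHETICAL variable name, invented for this sketch. */
#define CASE_ENV_VAR "RANGE_CASE_SENSITIVE"

/* Interpret the variable's value: unset, or anything other than "0",
   means case sensitive ranges. */
int case_sensitive_from_value(const char *value)
{
    return value == NULL || strcmp(value, "0") != 0;
}

/* Read the user's preference from the environment. A command line
   option, if a program offered one, would normally override this. */
int case_sensitive_ranges(void)
{
    return case_sensitive_from_value(getenv(CASE_ENV_VAR));
}
```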
But since Unix file systems are case sensitive, and the old behaviour
(C locale only) was case sensitive, it would be logical to keep case
sensitivity; breaking it is a very bad thing, and many people consider
it a bug.
Programs currently hurt by this problem include 'bash' and 'grep', and
probably others.
Thanks
--
May all go well for you,
Pablo Saratxaga
http://www.srtxg.easynet.be/ PGP Key available, key ID: 0x8F0E4975
my-strcoll.diff.bz2
Description: BZip2 compressed data