Re: multibyte locale patches for GNU utils available


From: Pablo Saratxaga
Subject: Re: multibyte locale patches for GNU utils available
Date: Tue, 8 May 2001 19:58:07 +0200

Kaixo!

On Tue, May 08, 2001 at 08:38:46AM -0700, Ulrich Drepper wrote:

> Pablo Saratxaga <address@hidden> writes:
> 
>> Programs currently hurt by that problem are 'bash' and 'grep'; 
>> but probably others.
> 
> Crap.  These programs are simply broken as is.

So you agree there is a problem.

> strcoll is fine

For sorting in a case insensitive way, yes.
I never said that strcoll() is broken, I say that it is not adapted to
this case.

> (your expectations are broken).

I'm not alone in thinking that [A-Z], [a-z] and [A-Za-z] should have different
effects.

> With a correct implementation of glob()

Well, the solution can be done differently from what I proposed; that is not
the problem.
What I pointed out is that replacing strcmp() with strcoll() adds i18n
support but gives unexpected case-insensitive results. That is indeed a
problem (and you seem to acknowledge it).

> With a correct implementation of glob()
> which cannot be provided by Chet bash is just fine.  This will be in
> bash 2.06.

So, trying to fix it in a temporary, but satisfactory, way, until a proper
solution using a correct glob() implementation is out, is a bad thing?

> Stay away from making statements like you did if you don't understand
> the issues.

I don't understand why you are so harsh.

I simply saw that there is a problem (and so did several other people; search
the internet and you will find plenty of texts saying "bash is bugged"), and
I traced it to the use of strcoll() to determine whether a given char
falls within a [...] regexp-like interval. The problem is the
case-insensitivity of strcoll(). strcmp() is case sensitive, but it doesn't
deal with chars, only with bytes (that is, with numbers; the notion of letter
is unknown to strcmp()), so the i18n support is lost (there is no
way to use an [A-Z]-like interval for Cyrillic letters using strcmp(), for
example).
A solution (it works, it is a solution, maybe not the best, but surely
not "crap") is to replace the use of strcoll() (in that case) with a
case-sensitive strcoll()-like function.
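
To make the difference concrete, here is a minimal sketch of the two ways of
testing whether a char lies inside an interval such as [a-z]. The helper names
in_range_bytes() and in_range_collate() are hypothetical, chosen only for
illustration; this is not the code used in bash, grep, or the patches, and it
handles single-byte chars only.

```c
#include <string.h>

/* Hypothetical helpers for illustration; not taken from bash or grep. */

/* Byte-wise test: compares raw byte values via strcmp(), so it is
 * case sensitive but knows nothing about the locale's letters. */
int in_range_bytes(char c, char lo, char hi)
{
    char s[2] = { c, '\0' };
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    return strcmp(l, s) <= 0 && strcmp(s, h) <= 0;
}

/* Collation-based test: compares via strcoll(), so the outcome depends
 * on the LC_COLLATE category of the locale currently in effect. */
int in_range_collate(char c, char lo, char hi)
{
    char s[2] = { c, '\0' };
    char l[2] = { lo, '\0' };
    char h[2] = { hi, '\0' };
    return strcoll(l, s) <= 0 && strcoll(s, h) <= 0;
}
```

In the default "C" locale both functions agree: 'b' is inside a-z and 'B' is
not. After a call such as setlocale(LC_COLLATE, ""), in a locale whose
collation order interleaves upper and lower case (a, A, b, B, ...),
in_range_collate('B', 'a', 'z') can return true, which is the surprising
behaviour described above. Whether it does depends on the locale data
installed on the system.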

If it can be done better, no problem, I'll be very happy.

But I don't agree that the problem doesn't exist, or that my expectation
that [a-z] and [A-Z] should be different is wrong. A lot of people have
just that expectation; it's common sense, and it's backed by more than
20 years of current practice in the Unix world.
 
I just pointed out the problem, in the hope it would be addressed; that's 
all I had to say.

> 
> -- 
> ---------------.                          ,-.   1325 Chesapeake Terrace
> Ulrich Drepper  \    ,-------------------'   \  Sunnyvale, CA 94089 USA
> Red Hat          `--' drepper at redhat.com   `------------------------

-- 
Ki ça vos våye bén,
Pablo Saratxaga

http://www.srtxg.easynet.be/            PGP Key available, key ID: 0x8F0E4975


