bug-gnu-utils

Re: multibyte locale patches for GNU utils available


From: Paul Eggert
Subject: Re: multibyte locale patches for GNU utils available
Date: Tue, 8 May 2001 10:04:24 -0700 (PDT)

> From: Pablo Saratxaga <address@hidden>
> Date: Tue, 8 May 2001 17:22:49 +0200

> The result is that when you set your locale to anything other than 'C',
> then both [A-Z] and [a-z] become the same thing as [A-Za-z].

When you set your locale to anything other than 'C', then [A-Z] and
[a-z] each become whatever the locale wants them to mean.  They do not
necessarily mean the same thing as [A-Za-z].  It could be, for
example, that [A-Z] is an invalid regular expression, because 'Z'
comes before 'A' in this locale.
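
To make the point concrete, here is a minimal sketch of how a range
expression depends on collation order.  The three orderings below are
hypothetical and are not taken from any real locale definition, and
expand_range is an illustrative helper, not part of any regex library.

```python
import string

# Hypothetical collation sequences, for illustration only.
# Code-point order, as in the C locale: A..Z followed by a..z.
CODEPOINT_ORDER = string.ascii_uppercase + string.ascii_lowercase
# Interleaved order, where each letter sorts next to its other case: aAbB...zZ.
INTERLEAVED_ORDER = "".join(
    lo + up for lo, up in zip(string.ascii_lowercase, string.ascii_uppercase)
)
# An order in which 'Z' happens to sort before 'A'.
REVERSED_ORDER = CODEPOINT_ORDER[::-1]


def expand_range(start, end, order):
    """Return the set of characters that [start-end] covers under `order`."""
    i, j = order.index(start), order.index(end)
    if i > j:
        # The range end sorts before the range start, so the range is invalid.
        raise ValueError(f"invalid range [{start}-{end}] in this ordering")
    return set(order[i:j + 1])


if __name__ == "__main__":
    print(sorted(expand_range("A", "Z", CODEPOINT_ORDER)))
    print(sorted(expand_range("A", "Z", INTERLEAVED_ORDER)))
```

Under the code-point ordering [A-Z] covers only the upper case letters.
Under the interleaved ordering it covers every letter except 'a', which
is neither "upper case only" nor the same thing as [A-Za-z].  Under the
reversed ordering the helper rejects the range outright.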

You are supposed to use [[:lower:]] if you want lower case letters in
the current locale.
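
The same idea, selecting by character class rather than by range, can
be sketched in Python.  This is only an analogy: Python's re module
does not support the POSIX [[:lower:]] syntax, and str.islower() below
consults Unicode character properties rather than the current locale.
The helper name lowercase_letters is made up for this example.

```python
def lowercase_letters(text):
    """Keep only the characters classified as lower case letters."""
    # Classification by property, independent of any collation order.
    return "".join(ch for ch in text if ch.islower())


if __name__ == "__main__":
    print(lowercase_letters("Hello World"))
```

Because the test is a property of each character, the result does not
change when the sort order of 'A' and 'a' changes.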

> That is a very annoying situation.

True.

> The solution is quite simple: implement a case sensitive version of strcoll

I'm afraid that the solution is much more complicated than that.
Your case-sensitive strcoll is incorrect for several reasons (it
assumes 8-bit characters, among other things).

> But Unix being case sensitive in its file system; and old behaviour (C only)
> being case sensitive; it would be logical to continue to keep case
> sensitivity; breaking it is a very bad thing,

One way around the problem is to define your locales so that [A-Z]
means just upper case characters.  POSIX allows this to be done even
for locales where 'A' and 'a' should sort together, and many modern
POSIX systems (including recent versions of GNU/Linux and Solaris) do
it that way.  That solves much of your problem without having to
rewrite applications.


