Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
From: Eric Blake
Subject: Re: locale specific ordering in EN_US -- why is a<A<b<B<y<Y<z<Z?
Date: Mon, 21 May 2012 20:02:52 -0600
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:12.0) Gecko/20120430 Thunderbird/12.0.1
On 05/21/2012 05:42 PM, Linda Walsh wrote:
>> POSIX explicitly undefined ranges for all but the C locale. _Other
>> standards_, such as Unicode, are free to add range requirements on top
>> of what POSIX requires, but alas, Unicode collation order does NOT
>> currently specify anything about regular expression or glob range
>> matching, so it is out of scope for Unicode to say what [A-Z] expands to.
>
>
> ----
>
> I think this is the problem.
>
> A-Z in regular expressions is defined to expand to those characters
> that are _in collating order_, >A, and <Z...
Only in POSIX 1992 or in the C locale. In POSIX 2001 and POSIX 2008, a
range such as [A-Z] in any non-C locale is explicitly undefined, because
defining it as "the characters that collate between A and Z" did not
work out.
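To make the POSIX 1992 rule concrete, here is a minimal Python sketch that treats a range as "every character whose position in the collation sequence lies between the two endpoints, inclusive." The collation sequence is a hypothetical one built from the a<A<b<B<...<z<Z order in this thread's subject line, not glibc's actual en_US table, and `collation_range` is an illustrative name. It shows why that rule makes [A-Z] pick up nearly every lowercase letter as well.

```python
import string

# Hypothetical collation sequence modeled on the subject line:
# a < A < b < B < ... < z < Z (an assumption, not a real locale table).
COLLATION = [c for pair in zip(string.ascii_lowercase, string.ascii_uppercase)
             for c in pair]
RANK = {c: i for i, c in enumerate(COLLATION)}


def collation_range(lo: str, hi: str) -> set[str]:
    """Characters that collate between lo and hi, inclusive (POSIX 1992 style)."""
    return {c for c in COLLATION if RANK[lo] <= RANK[c] <= RANK[hi]}


if __name__ == "__main__":
    members = collation_range("A", "Z")
    # 'a' sorts before 'A', so it is excluded; 'b'..'z' sort after it and are included.
    print("a" in members, "b" in members, "Z" in members)
```

Under this interleaved order, [A-Z] covers 51 of the 52 letters, everything except "a", which is exactly the surprise that started this thread.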
>
> Without a collating order that expression in RE's would never have made any
> sense. It requires a collating order and is dependent on it.
Ranges still don't make portable sense in any locale except C, because
POSIX no longer requires them to follow collating order.
> The regex(7) man page says that [xx-xx] uses **collating order**:
The regex(7) man page _of which system_? Just because _some_ systems
(like glibc, which picked the POSIX 1992 semantics) have well-defined
semantics doesn't mean that all systems share them.
According to POSIX, you cannot portably assume ANY semantics for ranges
except in the C locale. And if Rational Range Interpretation (RRI) gains
traction, you will be able to assume ASCII order for ranges across ALL
locales; but that order differs from the collation order of a non-C
locale, and it is also a GNU extension not guaranteed by POSIX.
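For contrast, a minimal sketch of the RRI idea: a range is decided purely by numeric character value (which coincides with ASCII order for ASCII characters), with no reference to the locale's collation table. The function name `rri_range` is hypothetical and the sketch assumes single-character endpoints; it is an illustration, not any tool's actual implementation.

```python
import string


def rri_range(lo: str, hi: str) -> set[str]:
    """Characters whose code points lie between lo and hi, inclusive (RRI style)."""
    # Locale-independent: only the numeric values of the endpoints matter.
    return {chr(cp) for cp in range(ord(lo), ord(hi) + 1)}


if __name__ == "__main__":
    members = rri_range("A", "Z")
    # Only the 26 uppercase ASCII letters, in every locale.
    print(sorted(members) == list(string.ascii_uppercase))
```

Because the answer never depends on a collation table, [A-Z] means the same 26 letters everywhere; the trade-off is that it no longer matches the order a given locale uses for sorting.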
> ----
> Seems pretty clear -- regex's aren't exempt from collating order, they
> depend on it...
Only on platforms where libc has chosen to provide an extension beyond
POSIX, and where GNU programs have not further overridden range handling
to avoid the unexpected glibc semantics.
--
Eric Blake eblake@redhat.com +1-919-301-3266
Libvirt virtualization library http://libvirt.org