Re: is it a bug?
From: Eric Blake
Subject: Re: is it a bug?
Date: Tue, 02 Mar 2010 06:11:03 -0700
User-agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.23) Gecko/20090812 Thunderbird/2.0.0.23 Mnenhy/0.7.6.666
According to Voelker, Bernhard on 3/2/2010 1:34 AM:
> I understand that the sort order depends on the locale, i.e. LC_ALL,
> but this doesn't explain the differences I get on Solaris 5.10, SLES 10.1,
> and Cygwin (given that sort didn't change about this point in the past).
The difference is that all three use different locale installations.
>
> # === Solaris SunOS 5.10, sort 6.10 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=C sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=POSIX sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h
C and POSIX are strictly identical, on all machines. If they ever behave
differently from one another, on the same machine, or when comparing two
machines, then you have found a bug and should report it to that vendor.
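A quick way to check this on any one machine is to sort the same input under both names and compare the results. This is only a sketch; the variable names are mine, and the input is the sample from the report (with a trailing newline added):

```shell
# Sort the same input under both locale names and compare the results.
input='ru.unix /h
ru.unix.ftn /h
ru.unix.prog /h'

sorted_c=$(printf '%s\n' "$input" | LC_ALL=C sort)
sorted_posix=$(printf '%s\n' "$input" | LC_ALL=POSIX sort)

if [ "$sorted_c" = "$sorted_posix" ]; then
    echo "C and POSIX agree"
else
    echo "C and POSIX differ: worth reporting to the vendor"
fi
```

In the C locale the comparison is bytewise, so the space (0x20) sorts before the period (0x2e) and "ru.unix /h" comes first.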
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h
That just means that Solaris' rules for en_US don't ignore punctuation.
You can use locale(1) to learn more about the collation rules that will be
selected when you enable that locale.
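For example, something along these lines shows which collation locale is actually in effect. The helper name collate_name is my own invention, and how much detail `locale -k LC_COLLATE` prints varies between implementations, so treat this as a rough sketch:

```shell
# Print the name of the collation locale currently in effect, taken
# from the LC_COLLATE line that locale(1) prints with no arguments.
collate_name() {
    locale | sed -n 's/^LC_COLLATE="\{0,1\}\([^"]*\)"\{0,1\}$/\1/p'
}

collate_name

# Ask for the keywords of the LC_COLLATE category; the output is
# implementation-specific and may be sparse on some systems.
locale -k LC_COLLATE || true
```

Running it as `LC_ALL=en_US collate_name` on each machine also tells you whether that locale is installed there at all, since a missing locale usually produces a warning.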
> # === SLES 10.1, kernel 2.6.16.60-0.23-smp, sort 5.93 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix.ftn /h
> ru.unix /h
> ru.unix.prog /h
Yep, glibc's locale installation ignores punctuation for en_US. And
glibc's locale installation is probably the most complete one out there.
> $ sort --version
> sort (GNU coreutils) 5.93
Time to consider upgrading - the latest stable version is 8.4, and there
have been some bugs fixed in sort in the meantime.
> # === Cygwin on XPSP3, CYGWIN_NT-5.1 1.7.1(0.218/5/3), sort 7.0 ===
> $ printf "ru.unix /h\nru.unix.ftn /h\nru.unix.prog /h" | LC_ALL=en_US sort
> ru.unix /h
> ru.unix.ftn /h
> ru.unix.prog /h
Yep, cygwin 1.7.1 silently treats LC_COLLATE as the C locale no matter
which locale you request (basically, no one had implemented the internals
to convert the Windows notion of collation over to the POSIX API); it will
improve for cygwin 1.7.2. But cygwin is still different from glibc; it
only supports locales known to Windows, rather than the glibc approach of
letting you install your own locales to a specific directory.
> It seems that sort doesn't depend on LC_ALL on Solaris and Cygwin,
> but it does on Linux. Besides LC_ALL, what does the sort order depend
> on? Build settings?
LC_ALL takes precedence. But if LC_ALL is unset or empty, then it is up to
LC_COLLATE; and if that is unset or empty, then LANG; and if that is unset
or empty too, then it is system-specific.
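To make that lookup order concrete, here is a small sketch of it as a shell function. The name effective_collate is hypothetical, and it only mirrors the precedence of LC_ALL, then LC_COLLATE, then LANG; it does not check that the named locale is installed:

```shell
# Print the locale name that governs collation, following the
# LC_ALL > LC_COLLATE > LANG precedence. Empty values count as unset.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        printf '%s\n' "(system default)"
    fi
}

effective_collate
```

So with LC_ALL=C in the environment, nothing you put in LC_COLLATE or LANG will change how sort orders its input.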
--
Don't work too hard, make some time for fun as well!
Eric Blake address@hidden