Re: fileutils/textutils LC_COLLATE support
From: Paul Eggert
Subject: Re: fileutils/textutils LC_COLLATE support
Date: Thu, 11 Oct 2001 18:59:41 -0700 (PDT)
> Date: Thu, 11 Oct 2001 18:57:28 +0100 (BST)
> From: Corin Hartland-Swann <address@hidden>
>
> What I think would be the best solution is if ls(1) and sort(1) (and
> possibly other programs in textutils) were designed to sort by
> byte-ordering by default, and were given an option to use the locale-based
> collation.
POSIX specifies the behavior that you're objecting to, and most modern
Unix implementations conform to the standard in this respect. If
there were a good reason to depart from the standard, I suppose we
could do so -- but I haven't heard a good reason yet.
> Which all suggest that the intent behind the sort program is to do byte-
> ordering unless otherwise directed.
That may have been true in 1977, but the modern intent behind the sort
program is to use LC_COLLATE. This has been true for a decade in
Unix, and GNU/Linux is catching up with the rest of the world.
> The --ignore-case option, for instance, is now meaningless under ISO
> 8859-1 because LC_COLLATE makes upper and lower-cased letters
> equivalent.
I think you're confusing the collation sequence with the encoding
here. On most platforms, the letters are not equivalent, just next to
each other. For example, on Solaris 8 with LC_ALL=en_US.UTF-8, I get:

    $ (echo a; echo A) | sort -u
    A
    a
    $ (echo a; echo A) | sort -u -f
    a

so -f (a.k.a. --ignore-case) is not meaningless. I get the same
result for both GNU textutils 2.0.16 sort and Solaris 8 sort.
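
The same distinction can be checked without depending on which locales are installed. The following is only a sketch: it pins the C locale for portability, so it shows that -u keeps "a" and "A" apart unless -f is given, but it does not show the letters collating next to each other as they do under en_US.UTF-8. The helper name count_unique is hypothetical.

```shell
# Hypothetical helper: count the lines that "sort -u" keeps for the
# two-line input "a", "A". Any arguments are passed through to sort.
count_unique() {
    # LC_ALL=C is an assumption made for portability; it forces byte order.
    # tr strips the padding that some wc implementations emit.
    printf 'a\nA\n' | LC_ALL=C sort -u "$@" | wc -l | tr -d ' '
}

count_unique      # prints 2: "a" and "A" are distinct keys
count_unique -f   # prints 1: -f folds case, so the two keys compare equal
```

Which of the two lines survives under -u -f can vary between implementations, so the sketch counts lines rather than comparing the output text.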
> I'm sure I'm not the only one who has assumed that byte ordering would
> remain the default action.
Yes. That is unfortunately true.
> I believe that this is the right thing to do because it preserves the
> existing and expected behaviour, but allows the user to specify locale-
> based collation if they want to. I think that this is something that
> should be specified explicitly.
But the default behavior is still the traditional one: with no locale
variables set, sort runs in the C locale and compares bytes. The user
(or perhaps the system administrator) must set LC_ALL, LC_COLLATE, or
LANG to get locale-based collation.
(Maybe you should complain to your system administrator. :-)
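
To see which of those three variables is actually in effect, the POSIX precedence (LC_ALL first, then LC_COLLATE, then LANG) can be sketched as a small shell function. This is an illustration only, not how sort itself is implemented: effective_collate is a hypothetical name, and the fallback to "C" is an assumption, since POSIX leaves the default locale implementation-defined.

```shell
# Hypothetical helper: print the locale name that governs collation,
# following the POSIX precedence LC_ALL > LC_COLLATE > LANG.
# An empty value is treated the same as an unset variable.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumed default: nothing is set, so byte-order collation applies.
        printf 'C\n'
    fi
}

# Nothing set: the traditional behavior.
( unset LC_ALL LC_COLLATE LANG; effective_collate )            # prints C
# LC_COLLATE overrides LANG when LC_ALL is unset.
( unset LC_ALL; LANG=C; LC_COLLATE=POSIX; effective_collate )  # prints POSIX
```

Running effective_collate in a login shell shows whether the distribution or the administrator has already chosen a locale on the user's behalf.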