bug-textutils

Re: fileutils/textutils LC_COLLATE support


From: Paul Eggert
Subject: Re: fileutils/textutils LC_COLLATE support
Date: Thu, 11 Oct 2001 18:59:41 -0700 (PDT)

> Date: Thu, 11 Oct 2001 18:57:28 +0100 (BST)
> From: Corin Hartland-Swann <address@hidden>
> 
> What I think would be the best solution is if ls(1) and sort(1) (and
> possibly other programs in textutils) were designed to sort by
> byte-ordering by default, and were given an option to use the locale-based
> collation.

POSIX specifies the behavior that you're objecting to, and most modern
Unix implementations conform to the standard in this respect.  If
there were a good reason to depart from the standard, I suppose we
could do so -- but I haven't heard a good reason yet.


> Which all suggest that the intent behind the sort program is to do byte-
> ordering unless otherwise directed.

That may have been true in 1977, but the modern intent behind the sort
program is to use LC_COLLATE.  This has been true for a decade in
Unix, and GNU/Linux is catching up with the rest of the world.


> The --ignore-case option, for instance, is now meaningless under ISO
> 8859-1 because LC_COLLATE makes upper and lower-cased letters
> equivalent.

I think you're confusing the collation sequence with the encoding
here.  On most platforms, the letters are not equivalent, just next to
each other.  For example, on Solaris 8 with LC_ALL=en_US.UTF-8, I get:

$ (echo a; echo A) | sort -u
A
a
$ (echo a; echo A) | sort -u -f
a

so -f (a.k.a. --ignore-case) is not meaningless.  I get the same
result for both GNU textutils 2.0.16 sort and Solaris 8 sort.
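The same distinction can be checked without a UTF-8 locale installed.
The following is only a sketch, assuming a POSIX shell and a sort that
accepts -u and -f; it forces the C locale so the result does not depend
on the local setup, and the variable names `distinct` and `folded` are
made up for illustration:

```shell
# Force the C locale so the outcome does not depend on installed locales.
# "distinct" and "folded" are illustrative names, not part of any tool.

# Without -f, "a" and "A" compare as different lines, so -u keeps both.
distinct=$(( $(printf 'a\nA\n' | LC_ALL=C sort -u | wc -l) ))

# With -f, case is folded before comparing, so -u keeps only one line.
folded=$(( $(printf 'a\nA\n' | LC_ALL=C sort -u -f | wc -l) ))

echo "sort -u keeps $distinct line(s); sort -u -f keeps $folded"
```

The counts are used instead of the lines themselves because which of
the two equal lines survives -u -f is not something to rely on here.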


> I'm sure I'm not the only one who has assumed that byte ordering would
> remain the default action.

Yes.  That is unfortunately true.


> I believe that this is the right thing to do because it preserves the
> existing and expected behaviour, but allows the user to specify locale-
> based collation if they want to. I think that this is something that
> should be specified explicitly.

But the default behavior is the traditional one: with no locale
variables set, the programs run in the C locale and sort by byte
value.  The user (or perhaps the system administrator) must set
LC_ALL, LC_COLLATE, or LANG to get the new behavior.

(Maybe you should complain to your system administrator.  :-)
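For anyone who wants byte ordering no matter what the environment
says, overriding the locale for a single command is enough, since
LC_ALL takes precedence over LC_COLLATE, which in turn takes precedence
over LANG.  A minimal sketch, assuming a POSIX shell; the variable name
`bytewise` is made up for illustration:

```shell
# LC_ALL=C applies to this one sort invocation only and overrides any
# inherited LANG or LC_COLLATE setting.
# "bytewise" is an illustrative name; tr joins the lines for display.
bytewise=$(printf 'b\nA\na\nB\n' | LC_ALL=C sort | tr '\n' ' ')

# In the C locale all uppercase ASCII letters sort before lowercase ones.
echo "$bytewise"
```

Setting LC_ALL=C in a script's environment has the same effect for
every command the script runs, which may be more than is wanted if
other tools in the script depend on the user's locale.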


