bug-fileutils

Re: fileutils/textutils LC_COLLATE support


From: Corin Hartland-Swann
Subject: Re: fileutils/textutils LC_COLLATE support
Date: Fri, 19 Oct 2001 11:08:01 +0100 (BST)

Hi Paul,

On Thu, 11 Oct 2001, Paul Eggert wrote:
> > What I think would be the best solution is if ls(1) and sort(1) (and
> > possibly other programs in textutils) were designed to sort by
> > byte-ordering by default, and were given an option to use the locale-based
> > collation.
> 
> POSIX specifies the behavior that you're objecting to, and most modern
> Unix implementations conform to the standard in this respect.  If
> there were a good reason to depart from the standard, I suppose we
> could do so -- but I haven't heard a good reason yet.

Fair enough - if it is so written, so it shall be :)

> > Which all suggest that the intent behind the sort program is to do byte-
> > ordering unless otherwise directed.
> 
> That may have been true in 1977, but the modern intent behind the sort
> program is to use LC_COLLATE.  This has been true for a decade in
> Unix, and GNU/Linux is catching up with the rest of the world.

I just hadn't ever seen the particular symptoms I was describing (upper-
and lower-case characters treated as equivalent, and dashes ignored) on
any other OS or Linux distribution. I don't know whether that means my
Linux distribution has broken locale settings, or whether everything else
did...

> $ (echo a; echo A) | sort -u
> A
> a

But what do you get for:

$ (echo a; echo B) | sort -u

Would you still expect all the uppercase letters to come first by default?
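
To make the question concrete, here is a minimal sketch of the comparison
I have in mind. The LC_ALL=C override is added purely for illustration: it
pins sort to plain byte order, where every uppercase ASCII letter sorts
before every lowercase one. The second command uses whatever collation the
environment selects, so I make no claim about its output.

```shell
# Byte order (C locale): 'B' (0x42) sorts before 'a' (0x61).
(echo a; echo B) | LC_ALL=C sort -u

# Collation taken from the environment (LC_ALL, LC_COLLATE, or LANG);
# the order printed here depends on how those variables are set.
(echo a; echo B) | sort -u
```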

> > I'm sure I'm not the only one who has assumed that byte ordering would
> > remain the default action.
> 
> Yes.  That is unfortunately true.
> 
> > I believe that this is the right thing to do because it preserves the
> > existing and expected behaviour, but allows the user to specify locale-
> > based collation if they want to. I think that this is something that
> > should be specified explicitly.
> 
> But the default behavior is the traditional one.  The user (or perhaps
> the system administrator) must set LC_ALL, LC_COLLATE, or LANG to get
> the new behavior.
> 
> (Maybe you should complain to your system administrator.  :-)

That would be me, then :)

OK - you've convinced me. It sounds like this is the way it should be.

However, can I persuade you to add a --binary option (or similar) to force
straight byte-order sorting? This would let users keep the default locale
definition for collation (so that things are displayed in the correct
order in lists) while forcing sort(1) not to use it.
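
As a rough sketch of the behaviour I am after, a shell wrapper along these
lines gives byte-order sorting for a single command while leaving the
locale of the rest of the session alone. The name byte_sort is
hypothetical (it is not an existing command or option), and this is only
an approximation of what a built-in option would do:

```shell
# Hypothetical helper: run sort in the C locale for this one invocation
# only, so the caller's LC_ALL/LC_COLLATE/LANG settings are not changed.
byte_sort() {
    LC_ALL=C sort "$@"
}

# Usage: arguments are passed straight through to sort.
(echo a; echo B; echo A) | byte_sort -u
```

Because the variable assignment is scoped to that one sort process, ls(1)
and everything else would carry on using the locale's collation.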

Many Thanks,

Corin

/------------------------+-------------------------------------\
| Corin Hartland-Swann   |    Tel: +44 (0) 20 7491 2000        |
| Commerce Internet Ltd  |    Fax: +44 (0) 20 7491 2010        |
| 22 Cavendish Buildings | Mobile: +44 (0) 79 5854 0027        | 
| Gilbert Street         |                                     |
| Mayfair                |    Web: http://www.commerce.uk.net/ |
| London W1K 5HJ         | E-Mail: address@hidden        |
\------------------------+-------------------------------------/



