From: Bob Proulx
Subject: Re: What is necessary and sufficient to let 'sort' sort as if strcmp in C is used?
Date: Sat, 1 Feb 2014 15:03:27 -0700
User-agent: Mutt/1.5.21 (2010-09-15)

Peng Yu wrote:
> man sort says "Set LC_ALL=C to get the traditional sort order that
> uses native byte values."
Yes.
> man comm says "Note, comparisons honor the rules specified by 'LC_COLLATE'."
Yes. No. Almost. It honors LANG if neither LC_COLLATE nor LC_ALL is
set. It honors LC_COLLATE if LC_ALL is not set. If LC_ALL is set
then it honors LC_ALL. LC_ALL also overrides LC_CTYPE.
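To make that precedence concrete, here is a minimal sketch in POSIX
shell. The function name effective_collate is made up for this
illustration. It only inspects the three variables, and it assumes the
C locale as the fallback when none of them is set, which is what GNU
libc does.

```shell
# Hypothetical helper: print the locale name that governs collation,
# following the precedence LC_ALL, then LC_COLLATE, then LANG.
# A variable that is set but empty is treated as if it were unset.
effective_collate() {
    if [ -n "${LC_ALL:-}" ]; then
        printf '%s\n' "$LC_ALL"
    elif [ -n "${LC_COLLATE:-}" ]; then
        printf '%s\n' "$LC_COLLATE"
    elif [ -n "${LANG:-}" ]; then
        printf '%s\n' "$LANG"
    else
        # Assumption: nothing set means the C locale (GNU libc default).
        printf '%s\n' "C"
    fi
}

# Show what applies in the current environment.
effective_collate
```

With LANG=en_US.UTF-8 and LC_COLLATE=C and no LC_ALL it prints C. As
soon as LC_ALL is set to something else, that value wins.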
> My test shows that it seems LC_COLLATE=C is sufficient to make sort
> using native byte values. Is it so?
Yes. No. Almost. LC_ALL overrides LC_COLLATE. In order of increasing
precedence the three variables are LANG, then LC_COLLATE, then LC_ALL.
LC_ALL also overrides LC_CTYPE.
Setting LC_COLLATE mostly works fine. I always set this in my environment.
  export LANG=en_US.UTF-8
  export LC_COLLATE=C
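A quick way to check the effect on a given system is sketched below.
The four sample words are made up for the test and are plain ASCII, so
this says nothing about multibyte encodings. LC_ALL is unset in a
subshell for the second run because a value inherited from the
environment would otherwise override LC_COLLATE.

```shell
# Illustrative input mixing upper and lower case.
words=$(printf '%s\n' b A a B)

# LC_ALL=C forces comparison by byte value whatever else is set.
bytewise=$(printf '%s\n' "$words" | LC_ALL=C sort)

# LC_COLLATE=C does the same only while LC_ALL is not set, so drop
# LC_ALL inside the command substitution before relying on it.
collate_only=$(unset LC_ALL; printf '%s\n' "$words" | LC_COLLATE=C sort)

printf '%s\n' "$bytewise"
# prints A, B, a, b on separate lines (uppercase bytes sort first)

if [ "$bytewise" = "$collate_only" ]; then
    echo "LC_COLLATE=C matched LC_ALL=C for this input"
fi
```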
But while setting LC_COLLATE=C works for typical western locales there
is concern about others. What will be the interaction with the Chinese
Big5 encoding for characters? It probably won't behave in a desirable
way. LC_ALL=C is probably required then to override LC_CTYPE.
Therefore while using LC_COLLATE alone works for some character
encodings it can't be definitively stated as working for all cases as
a general rule. Setting LC_ALL can be stated as a general rule
because LC_ALL overrides LC_CTYPE while LC_COLLATE does not.
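As a sketch of that general rule applied to the two commands from the
original question: sort both inputs with LC_ALL=C and run comm with
LC_ALL=C as well, so that both tools agree on the order. The file names
and the sample lines are invented for the example.

```shell
# Scratch directory for the two illustrative input files.
tmp=$(mktemp -d)

# Sort each list by byte value.
printf '%s\n' b A a | LC_ALL=C sort > "$tmp/one"
printf '%s\n' a B A | LC_ALL=C sort > "$tmp/two"

# comm must use the same collation the inputs were sorted with.
# -12 suppresses columns 1 and 2, leaving only the common lines.
common=$(LC_ALL=C comm -12 "$tmp/one" "$tmp/two")
printf '%s\n' "$common"
# prints A then a

rm -rf "$tmp"
```

If the files were sorted under one locale and compared under another,
comm could see them as out of order.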
The locale behavior is controlled by libc. If you have GNU libc
installed then the installed manual will match your system.
  info -f libc 'Locale Categories'
The most recent version is available on the web at the project site:
http://www.gnu.org/software/libc/manual/html_node/Locale-Categories.html#Locale-Categories
Bob