Re: gettext locale environment variable documentation
From: Bruno Haible
Subject: Re: gettext locale environment variable documentation
Date: Sun, 3 Jun 2007 12:44:29 +0200
User-agent: KMail/1.5.4
Karl Berry wrote:
> Third, also in the gettext grok node, there is this item:
> 3. `LC_xxx', according to selected locale
> I don't understand why the phrase "according to the selected locale" is
> there. I surmised that xxx meant names like "MESSAGES", e.g.,
> "LC_MESSAGES". How is that locale-dependent? Does "MESSAGES" get
> translated (LC_ANZEIGEN, babelfish tells me :)?! Can you explain,
> please?
No wonder that you don't understand this: the manual confuses the terms
"locale" and "locale category". And moreover, a "locale category" is not
a category of locales... Find attached a doc change that should make it
clearer.
Thanks for reporting this major doc bug.
> P.S. Maybe also worth mentioning that "POSIX" is the same as "C" for a
> locale name?
I don't think it's worth mentioning: I've never seen people using the
"POSIX" locale. They all prefer to use the "C" locale, since it's identical
and shorter to write.
Bruno
--- gettext.texi 27 May 2007 21:07:56 -0000 1.124
+++ gettext.texi 3 Jun 2007 10:35:44 -0000
@@ -724,7 +724,7 @@
termed the country's locale. The locale represents the knowledge
needed to support the country's native attributes.
address@hidden locale facets
address@hidden locale categories
There are a few major areas which may vary between countries and
hence, define what a locale must describe. The following list helps
putting multi-lingual messages into the proper context of other tasks
@@ -736,7 +736,7 @@
@cindex codeset
@cindex encoding
@cindex character encoding
address@hidden locale facet, LC_CTYPE
address@hidden locale category, LC_CTYPE
The codeset most commonly used through out the USA and most English
speaking parts of the world is the ASCII codeset. However, there are
@@ -751,7 +751,7 @@
@item Currency
@cindex currency symbols
address@hidden locale facet, LC_MONETARY
address@hidden locale category, LC_MONETARY
The symbols used vary from country to country as does the position
used by the symbol. Software needs to be able to transparently
@@ -759,7 +759,7 @@
@item Dates
@cindex date format
address@hidden locale facet, LC_TIME
address@hidden locale category, LC_TIME
The format of date varies between locales. For example, Christmas day
in 1994 is written as 12/25/94 in the USA and as 25/12/94 in Australia.
@@ -772,7 +772,7 @@
@item Numbers
@cindex number format
address@hidden locale facet, LC_NUMERIC
address@hidden locale category, LC_NUMERIC
Numbers can be represented differently in different locales.
For example, the following numbers are all written correctly for
@@ -791,7 +791,7 @@
@item Messages
@cindex messages
address@hidden locale facet, LC_MESSAGES
address@hidden locale category, LC_MESSAGES
The most obvious area is the language support within a locale. This is
where GNU @code{gettext} provides the means for developers and users to
@@ -800,6 +800,17 @@
@end table
address@hidden locale categories
+These areas of cultural conventions are called @emph{locale categories}.
+It is an unfortunate term; @emph{locale aspects} or @emph{locale feature
+categories} would be a better term, because each ``locale category''
+describes an area or task that requires localization. The concrete data
+that describes the cultural conventions for such an area and for a particular
+culture is also called a @emph{locale category}. In this sense, a locale
+is composed of several locale categories: the locale category describing
+the codeset, the locale category describing the formatting of numbers,
+the locale category containing the translated messages, and so on.
+
@cindex Linux
Components of locale outside of message handling are standardized in
the ISO C standard and the SUSV2 specification. GNU @code{libc}
@@ -1584,11 +1595,11 @@
@file{config.h} or by the Makefile. For now consult the @code{gettext}
or @code{hello} sources for more information.
address@hidden locale facet, LC_ALL
address@hidden locale facet, LC_CTYPE
address@hidden locale category, LC_ALL
address@hidden locale category, LC_CTYPE
The use of @code{LC_ALL} might not be appropriate for you.
@code{LC_ALL} includes all locale categories and especially
address@hidden This later category is responsible for determining
address@hidden This latter category is responsible for determining
character classes with the @code{isalnum} etc. functions from
@file{ctype.h} which could especially for programs, which process some
kind of input language, be wrong. For example this would mean that a
@@ -1596,8 +1607,8 @@
France but not in the U.S.
Some systems also have problems with parsing numbers using the
address@hidden functions if an other but the @code{LC_ALL} locale is used.
-The standards say that additional formats but the one known in the
address@hidden functions if an other but the @code{LC_ALL} locale category is
+used. The standards say that additional formats but the one known in the
@code{"C"} locale might be recognized. But some systems seem to reject
numbers in the @code{"C"} locale format. In some situation, it might
also be a problem with the notation itself which makes it impossible to
@@ -1621,13 +1632,13 @@
@end group
@end example
address@hidden locale facet, LC_CTYPE
address@hidden locale facet, LC_COLLATE
address@hidden locale facet, LC_MONETARY
address@hidden locale facet, LC_NUMERIC
address@hidden locale facet, LC_TIME
address@hidden locale facet, LC_MESSAGES
address@hidden locale facet, LC_RESPONSES
address@hidden locale category, LC_CTYPE
address@hidden locale category, LC_COLLATE
address@hidden locale category, LC_MONETARY
address@hidden locale category, LC_NUMERIC
address@hidden locale category, LC_TIME
address@hidden locale category, LC_MESSAGES
address@hidden locale category, LC_RESPONSES
@noindent
On all POSIX conformant systems the locale categories @code{LC_CTYPE},
@code{LC_MESSAGES}, @code{LC_COLLATE}, @code{LC_MONETARY},
@@ -5281,8 +5292,8 @@
returned. If the argument is @code{NULL} the result is undefined.
One thing which should come into mind is that no explicit dependency to
-the used domain is given. The current value of the domain for the
address@hidden locale is used. If this changes between two
+the used domain is given. The current value of the domain is used.
+If this changes between two
executions of the same @code{gettext} call in the program, both calls
reference a different message catalog.
@@ -5322,7 +5333,7 @@
Both take an additional argument at the first place, which corresponds
to the argument of @code{textdomain}. The third argument of
address@hidden allows to use another locale but @code{LC_MESSAGES}.
address@hidden allows to use another locale category but @code{LC_MESSAGES}.
But I really don't know where this can be useful. If the
@var{domain_name} is @code{NULL} or @var{category} has an value beside
the known ones, the result is undefined. It should also be noted that
@@ -5364,8 +5375,8 @@
files. The way usually used in Unix environments is have this encoding
in the file name. This is also done here. The directory name given in
@code{bindtextdomain}s second argument (or the default directory),
-followed by the value and name of the locale and the domain name are
-concatenated:
+followed by the name of the locale, the locale category, and the domain name
+are concatenated:
@example
@var{dir_name}/@var{locale}/address@hidden/@var{domain_name}.mo
@@ -5378,18 +5389,19 @@
@end example
@noindent
address@hidden is the value of the locale whose name is this
address@hidden is the name of the locale category which is designated by
@address@hidden For @code{gettext} and @code{dgettext} this
@address@hidden is always @address@hidden
system, eg Ultrix, don't have @code{LC_MESSAGES}. Here we use a more or
less arbitrary value for it, namely 1729, the smallest positive integer
which can be represented in two different ways as the sum of two cubes.}
-The value of the locale is determined through
+The name of the locale category is determined through
@code{setlocale (address@hidden, NULL)}.
@footnote{When the system does not support @code{setlocale} its behavior
in setting the locale values is simulated by looking at the environment
variables.}
address@hidden specifies the locale category by the third argument.
+When using the function @code{dcgettext}, you can specify the locale category
+through the third argument.
@node Charset conversion, Contexts, Locating Catalogs, gettext
@subsection How to specify the output character set @code{gettext} uses
@@ -5510,7 +5522,7 @@
These are generalizations of @code{pgettext}. They behave similarly to
@code{dgettext} and @code{dcgettext}, respectively. The @var{domain_name}
argument defines the translation domain. The @var{category} argument
-allows to use another locale facet than @code{LC_MESSAGES}.
+allows to use another locale category than @code{LC_MESSAGES}.
As as example consider the following fictional situation. A GUI program
has a menu bar with the following entries:
@@ -6254,7 +6266,7 @@
@vindex address@hidden, environment variable}
@vindex address@hidden, environment variable}
@vindex address@hidden, environment variable}
address@hidden @code{LC_xxx}, according to selected locale
address@hidden @code{LC_xxx}, according to selected locale category
@vindex address@hidden, environment variable}
@item @code{LANG}
@end enumerate
@@ -6394,7 +6406,7 @@
@code{libintl} code with their software.
Message catalog support is however only the tip of the iceberg.
-What about the data for the other locale categories. They also have
+What about the data for the other locale categories? They also have
a number of deficiencies. Are we going to abandon them as well and
develop another duplicate set of routines (should @code{libintl}
expand beyond message catalog support)?
@@ -8262,7 +8274,7 @@
@code{dcgettext}, @code{dcngettext} available from within the language.
These functions are less often used, but are nevertheless necessary for
particular purposes: @code{ngettext} for correct plural handling, and
address@hidden and @code{dcngettext} for obeying other locale
address@hidden and @code{dcngettext} for obeying other locale-related
environment variables than @code{LC_MESSAGES}, such as @code{LC_TIME} or
@code{LC_MONETARY}. For these latter functions, you need to make the
@code{LC_*} constants, available in the C header @code{<locale.h>},
@@ -8281,7 +8293,7 @@
You should either perform a @code{setlocale (LC_ALL, "")} call during
the startup of your language runtime, or allow the programmer to do so.
Remember that gettext will act as a no-op if the @code{LC_MESSAGES} and
address@hidden locale facets are not both set.
address@hidden locale categories are not both set.
@item
A programmer should have a way to extract translatable strings from a
@@ -8419,7 +8431,7 @@
translation of @code{"%d"} can be @code{"%Id"}. The effect of this flag,
on systems with GNU @code{libc}, is that in the output, the ASCII digits are
replaced with the @samp{outdigits} defined in the @code{LC_CTYPE} locale
-facet. On other systems, the @code{gettext} function removes this flag,
+category. On other systems, the @code{gettext} function removes this flag,
so that it has no effect.
Note that the programmer should @emph{not} put this flag into the