bug-gettext

POSIX gettext() and the installation directories for .mo files


From: Bruno Haible
Subject: POSIX gettext() and the installation directories for .mo files
Date: Tue, 04 May 2021 00:42:38 +0200
User-agent: KMail/5.1.3 (Linux/4.4.0-206-generic; KDE/5.18.0; x86_64; ; )

https://posix.rhansen.org/p/gettext_split
says (line 77..79)

  "For each locale name in LANGUAGE, or if LANGUAGE is not set or is
   empty, or no suitable messages object is found in processing LANGUAGE,
   the pathname used to locate the messages object shall be
   dirname/localename/categoryname/textdomainname.mo, where:
   ...
   For the LANGUAGE search, the localename part is each locale name from
   LANGUAGE in turn.  For the single-locale search, the localename part
   is the name of the current locale, or the locale specified in an *_l()
   function call, for the category named by categoryname."
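
Read literally, the quoted rule yields exactly one pathname per locale name. A minimal sketch of that reading follows. The function names and the "hello" text domain are made up for illustration, and the colon-separated LANGUAGE syntax is assumed from the GNU convention.

```python
import posixpath


def posix_messages_path(dirname, localename, categoryname, textdomainname):
    """Build the single pathname dirname/localename/categoryname/textdomainname.mo."""
    return posixpath.join(dirname, localename, categoryname, textdomainname + ".mo")


def posix_search_paths(dirname, language, current_locale, categoryname, textdomainname):
    """One candidate per LANGUAGE element, then one for the current locale.

    Assumption: LANGUAGE is a colon-separated list; empty elements are skipped.
    """
    names = [name for name in (language or "").split(":") if name]
    names.append(current_locale)  # the single-locale search comes last
    return [
        posix_messages_path(dirname, name, categoryname, textdomainname)
        for name in names
    ]


if __name__ == "__main__":
    # "hello" is a hypothetical text domain.
    for path in posix_search_paths(
        "/usr/share/locale", "de_AT", "de_AT.UTF-8", "LC_MESSAGES", "hello"
    ):
        print(path)
```

Under this reading, LANGUAGE=de_AT never causes a lookup in the de directory.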

This is NOT how GNU gettext behaves. If POSIX standardizes it like this,
GNU libc and GNU gettext will have to choose between
  (a) looking in different (and fewer) directories than they do today,
      breaking i18n for many users until those users have set up
      lots of symbolic links between directories, or
  (b) violating POSIX on this point.

I will vote for (b).

Specifically, GNU gettext looks in SEVERAL (not ONE) directories
per LANGUAGE element.

The localename parts of these directories are constructed from the language
identifier (element of LANGUAGE) or locale name. For example:

* The language identifier 'de' gives rise to the localename part
    de

* The language identifier 'de_AT' gives rise to the localename parts
    de_AT
    de

* The locale name 'de_AT.UTF-8' gives rise to the localename parts
    de_AT.UTF-8
    de_AT.utf8
    de_AT
    de.UTF-8
    de.utf8
    de

* The locale name 'uz_UZ.UTF-8@cyrillic' gives rise to the localename parts
    uz_UZ.UTF-8@cyrillic
    uz_UZ.utf8@cyrillic
    uz_UZ@cyrillic
    uz.UTF-8@cyrillic
    uz.utf8@cyrillic
    uz@cyrillic
    uz_UZ.UTF-8
    uz_UZ.utf8
    uz_UZ
    uz.UTF-8
    uz.utf8
    uz

This list of directories is important for people who live in communities
which often (but not always) have translations of their own but can read
translations for other locales. In the examples above:

  * A user in Austria prefers translations for Austrian German, but can
    also read German with no problem.

  * A user in Uzbekistan may prefer translations in Cyrillic but can also
    read translations in Latin. [1]
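
The localename expansion shown in the examples above can be sketched as follows. This is only an illustration that reproduces the four lists in this message, not the actual GNU gettext code. The function names are made up, and the codeset normalization (lowercase, drop non-alphanumeric characters) is a simplifying assumption.

```python
import re


def normalize_codeset(codeset):
    # Simplifying assumption: lowercase and drop non-alphanumerics,
    # so that "UTF-8" becomes "utf8".
    return re.sub(r"[^0-9a-z]", "", codeset.lower())


def localename_parts(name):
    """Expand ll[_CC][.codeset][@modifier] into localename parts, most specific first."""
    base, _, modifier = name.partition("@")
    base, _, codeset = base.partition(".")
    language, _, territory = base.partition("_")

    # For each component: the given form first, then the form without it.
    modifiers = ["@" + modifier, ""] if modifier else [""]
    territories = ["_" + territory, ""] if territory else [""]
    codesets = []
    if codeset:
        codesets.append("." + codeset)
        normalized = "." + normalize_codeset(codeset)
        if normalized not in codesets:
            codesets.append(normalized)
    codesets.append("")

    # Modifier varies slowest, then territory, then codeset.
    return [
        language + t + c + m
        for m in modifiers
        for t in territories
        for c in codesets
    ]


if __name__ == "__main__":
    for example in ("de", "de_AT", "de_AT.UTF-8", "uz_UZ.UTF-8@cyrillic"):
        print(example, "->", " ".join(localename_parts(example)))
```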

If the above text were adopted, it would have the following consequences:

  1) Many symbolic links are needed in /usr/share/locale/. Solaris 11.4
     is a system that implements gettext() as described in the above text,
     and it has the links shown below [2].

  2) Users who want to create a new locale (e.g. for English in Australia)
     will have to create a symlink
     /usr/share/locale/en_AU -> /usr/share/locale/en
     and so on for each custom locale.

  3) Users who install packages in non-privileged directories (for GNU
     programs, that's the --prefix=PREFIX option) will have to create the
     same amount of symbolic links in their PREFIX/share/locale/ directory.

  4) Users will have to spell out the fallback logic in their LANGUAGE
     environment variable

       LANGUAGE=de_AT:de_DE

     instead of having it built-in:

       LANGUAGE=de_AT
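
The difference in point 4 can be illustrated with a small sketch. The set of installed directories is hypothetical, the function names are made up, and the built-in fallback is simplified to language identifiers of the form ll_CC.

```python
def first_match(language, installed, expand):
    """Return the first installed localename found for a LANGUAGE value, or None."""
    for element in language.split(":"):
        for candidate in expand(element):
            if candidate in installed:
                return candidate
    return None


def strict(element):
    # Quoted rule: the localename part is the LANGUAGE element itself.
    return [element]


def with_fallback(element):
    # Built-in fallback for language identifiers: ll_CC first, then ll.
    language, sep, _ = element.partition("_")
    return [element, language] if sep else [element]


if __name__ == "__main__":
    # Hypothetical installation: a 'de' directory plus a 'de_DE' symlink to it.
    installed = {"de", "de_DE"}
    print(first_match("de_AT", installed, strict))         # nothing found
    print(first_match("de_AT:de_DE", installed, strict))   # found via explicit list
    print(first_match("de_AT", installed, with_fallback))  # found via built-in fallback
```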

This is BAD, BAD, BAD.

Bruno

[1] https://en.wikipedia.org/wiki/Uzbek_alphabet
[2]
$ ls -l /usr/share/locale
total 102
drwxr-xr-x   3 root     other          3 Oct 13  2018 C
drwxr-xr-x   3 root     other          4 Oct 13  2018 de
lrwxrwxrwx   1 root     root           2 Oct 13  2018 de_DE -> de
lrwxrwxrwx   1 root     root           2 Oct 13  2018 de_DE.ISO8859-1 -> de
lrwxrwxrwx   1 root     root           2 Oct 13  2018 de_DE.ISO8859-15 -> de
lrwxrwxrwx   1 root     root           2 Oct 13  2018 de_DE.UTF-8 -> de
lrwxrwxrwx   1 root     root           2 Oct 13  2018 de.ISO8859-15 -> de
drwxr-xr-x   3 root     other          3 Oct 13  2018 de.us-ascii
lrwxrwxrwx   1 root     root           2 Oct 13  2018 de.UTF-8 -> de
drwxr-xr-x   3 root     other          3 Oct 13  2018 en
drwxr-xr-x   3 root     other          3 Oct 13  2018 en_US
drwxr-xr-x   3 root     other          3 Oct 13  2018 en@boldquot
drwxr-xr-x   3 root     other          3 Oct 13  2018 en@quot
drwxr-xr-x   3 root     other          3 Oct 13  2018 en@shaw
drwxr-xr-x   3 root     other          4 Oct 13  2018 es
drwxr-xr-x   3 root     other          3 Oct 13  2018 es_ES
lrwxrwxrwx   1 root     root           2 Oct 13  2018 es_ES.ISO8859-1 -> es
lrwxrwxrwx   1 root     root           2 Oct 13  2018 es_ES.ISO8859-15 -> es
lrwxrwxrwx   1 root     root           2 Oct 13  2018 es_ES.UTF-8 -> es
lrwxrwxrwx   1 root     root           2 Oct 13  2018 es.ISO8859-15 -> es
lrwxrwxrwx   1 root     root           2 Oct 13  2018 es.UTF-8 -> es
drwxr-xr-x   3 root     other          4 Oct 13  2018 fr
lrwxrwxrwx   1 root     root           2 Oct 13  2018 fr_FR -> fr
lrwxrwxrwx   1 root     root           2 Oct 13  2018 fr_FR.ISO8859-1 -> fr
lrwxrwxrwx   1 root     root           2 Oct 13  2018 fr_FR.ISO8859-15 -> fr
lrwxrwxrwx   1 root     root           2 Oct 13  2018 fr_FR.UTF-8 -> fr
lrwxrwxrwx   1 root     root           2 Oct 13  2018 fr.ISO8859-15 -> fr
lrwxrwxrwx   1 root     root           2 Oct 13  2018 fr.UTF-8 -> fr
drwxr-xr-x   3 root     other          4 Oct 13  2018 it
lrwxrwxrwx   1 root     root           2 Oct 13  2018 it_IT -> it
lrwxrwxrwx   1 root     root           2 Oct 13  2018 it_IT.ISO8859-1 -> it
lrwxrwxrwx   1 root     root           2 Oct 13  2018 it_IT.ISO8859-15 -> it
lrwxrwxrwx   1 root     root           2 Oct 13  2018 it_IT.UTF-8 -> it
lrwxrwxrwx   1 root     root           2 Oct 13  2018 it.ISO8859-15 -> it
lrwxrwxrwx   1 root     root           2 Oct 13  2018 it.UTF-8 -> it
drwxr-xr-x   3 root     other          4 Oct 13  2018 ja
lrwxrwxrwx   1 root     root           2 Oct 13  2018 ja_JP.eucJP -> ja
lrwxrwxrwx   1 root     root           2 Oct 13  2018 ja_JP.PCK -> ja
lrwxrwxrwx   1 root     root           2 Oct 13  2018 ja_JP.UTF-8 -> ja
drwxr-xr-x   3 root     other          4 Oct 13  2018 ko
lrwxrwxrwx   1 root     root           2 Oct 13  2018 ko_KR.EUC -> ko
lrwxrwxrwx   1 root     root           2 Oct 13  2018 ko_KR.UTF-8 -> ko
lrwxrwxrwx   1 root     root           2 Oct 13  2018 ko.UTF-8 -> ko
drwxr-xr-x   3 root     other          4 Oct 13  2018 pt
drwxr-xr-x   3 root     other          4 Oct 13  2018 pt_BR
lrwxrwxrwx   1 root     root           5 Oct 13  2018 pt_BR.ISO8859-1 -> pt_BR
drwxr-xr-x   3 root     other          3 Oct 13  2018 pt_BR.us-ascii
lrwxrwxrwx   1 root     root           5 Oct 13  2018 pt_BR.UTF-8 -> pt_BR
lrwxrwxrwx   1 root     root           2 Oct 13  2018 pt.ISO8859-15 -> pt
drwxr-xr-x   3 root     other          3 Oct 13  2018 pt.us-ascii
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh -> zh_CN
drwxr-xr-x   3 root     other          4 Oct 13  2018 zh_CN
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_CN.EUC -> zh_CN
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_CN.GB18030 -> zh_CN
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_CN.GBK -> zh_CN
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_CN.UTF-8 -> zh_CN
drwxr-xr-x   3 root     other          4 Oct 13  2018 zh_TW
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_TW.BIG5 -> zh_TW
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_TW.EUC -> zh_TW
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh_TW.UTF-8 -> zh_TW
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh.GBK -> zh_CN
lrwxrwxrwx   1 root     root           5 Oct 13  2018 zh.UTF-8 -> zh_CN



