emacs-devel

Re: Emacs i18n


From: Bruno Haible
Subject: Re: Emacs i18n
Date: Wed, 20 Mar 2019 12:59:32 +0100

Richard Stallman wrote in
<https://lists.gnu.org/archive/html/emacs-devel/2019-03/msg00328.html>:

> I can envision something like this:
>
>       "russian-nom:%d байт%| скопирован%|, %s, %s"
>
> where the 'russian-nom' operator would replace the two %| sequences
> with the appropriate declensional suffixes for the nominative case.

It is, of course, tempting to try to do morphological analysis in an
algorithmic way, based on our background as algorithm hackers. François
Pinard and others considered this, back in 1995 when they started i18n in GNU.

The reason this approach was not chosen is still valid today:

When you design a translation system, you have two personas:
  - the programmer,
  - the translator.

The translation system defines
  1) which information flows from the programmer to the translator,
     and in which format,
  2) which information flows back from the translator to the programmer,
     and in which format.

And it has to cope with the assumed skills of these personas:

  - The programmer, you can assume, can write and understand algorithms,
    but does not master the grammar of more than one language (usually).

  - The translator, you can assume, can translate sentences and knows
    about the different meanings of words in different contexts. But they
    can neither write nor understand algorithms. Many translators, in fact,
    don't see the grammar as a set of rules.

You may find some people in the intersection, such as a Russian hacker,
but it is hard to find people with both skills for languages such as
Vietnamese, Slovenian, or Basque. So you had better design the system in
such a way that no person is assumed to have both skills.

The challenge is to define these formats 1) and 2) in a way that

  * Programmers can do their job with their skills (i.e. don't need to
    understand Russian).

  * Translators can do their job with their skills (i.e. don't need to
    understand algorithms).

In the gettext approach (where 1) are POT files and 2) are PO files) we
added plural form handling, which is just a small morphological variation,
and it required a significant amount of documentation and education for
translators. I would say it is at the limit of what we can make translators
grok.
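
To make the plural-form mechanism concrete, here is a minimal sketch in
Python of what the translator has to supply for Russian: a rule that maps a
count to a form index, plus one string per index. The rule mirrors the
Plural-Forms expression commonly used in Russian PO files. The function
names and the three sample strings are illustrative assumptions, not taken
from any real catalog, and a native speaker should check the strings.

```python
# Sketch of gettext-style plural selection for Russian (3 plural forms).
# Assumption: the rule below corresponds to the commonly used header
#   nplurals=3; plural=(n%10==1 && n%100!=11 ? 0 :
#                       n%10>=2 && n%10<=4 && (n%100<10 || n%100>=20) ? 1 : 2);

def russian_plural_index(n: int) -> int:
    """Return the index (0, 1 or 2) of the plural form to use for count n."""
    if n % 10 == 1 and n % 100 != 11:
        return 0          # 1, 21, 31, ... (but not 11)
    if 2 <= n % 10 <= 4 and not (10 <= n % 100 < 20):
        return 1          # 2-4, 22-24, ... (but not 12-14)
    return 2              # everything else: 0, 5-20, 25-30, ...

# Hypothetical translations, one per form index (illustrative only).
FORMS = ("%d байт", "%d байта", "%d байт")

def pick_form(n: int, forms=FORMS) -> str:
    """Select the form for n and substitute the count into it."""
    return forms[russian_plural_index(n)] % n

if __name__ == "__main__":
    for n in (1, 2, 5, 11, 21, 22, 112):
        print(n, russian_plural_index(n), pick_form(n))
```

Even in this small case the translator has to pick the right rule and then
keep three strings in sync with it, which is the part that needed the
documentation effort.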

Now, when you give a translator a string

   "russian-nom:%d байт%| скопирован%|, %s, %s"

you need to think about the appropriate tooling that will make the
translator understand
  - what 'russian-nom' means,
  - what the '|' characters mean,
  - what the '%' characters mean.
Either the translator tool should somehow highlight these characters
and present on-line help, or it should present it as a sequence of
strings to translate:

  Rule: russian-nom
  "%d байт"
  " скопирован"
  ", %s, %s"
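
As a rough illustration of what such a tool would have to do before it can
show this view, the following Python sketch splits the proposed string into
the rule name and the segments listed above. The 'rule:' prefix and the
'%|' marker come from the proposal quoted at the top; the function name and
the assumption that the first ':' always ends the rule name are mine.

```python
# Sketch only: the "rule:" prefix and "%|" marker are the hypothetical
# syntax from the quoted proposal, not an existing gettext feature.

def split_for_translator(template: str, marker: str = "%|"):
    """Split 'rule:text%|text%|text' into (rule, [segments]).

    Assumption: if the template contains a ':', everything before the
    first one is the rule name. Templates without ':' have no rule.
    """
    rule, sep, rest = template.partition(":")
    if not sep:
        # No ':' found, so the whole template is plain text.
        rule, rest = None, template
    return rule, rest.split(marker)

if __name__ == "__main__":
    rule, segments = split_for_translator(
        "russian-nom:%d байт%| скопирован%|, %s, %s")
    print("Rule:", rule)
    for segment in segments:
        print('"%s"' % segment)
```

The splitting itself is trivial. The cost lies in every translation tool
having to implement it and explain the result to its users.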

It is important to realize that each such case of morphological variation
requires translator tooling support. Unfortunately, several such tools
exist, and every translator has their preferred one. For the plural form
handling alone, it took several years until the main tools had support for
it in their UI.

Bruno



