bug-bison

Grammatical forms in translatable texts


From: Frank Heckenbach
Subject: Grammatical forms in translatable texts
Date: Sat, 18 Apr 2020 17:09:11 +0200

Hi,

when I played a bit with the new i18n features in Bison 3.5.90,
I noticed some grammatical issues with the generated texts. I think
this belongs on a Bison list, since it affects not only the
translations themselves, but also the translatable texts Bison
emits.

E.g.:

  [en] msgid "syntax error, unexpected %s"
  [de] msgstr "Syntaxfehler, unerwartetes %s"
  [fr] msgstr "erreur de syntaxe, %s inattendu"

I know (for de) and think (for fr) that "unerwartetes"/"inattendu"
must take different forms depending on the grammatical gender of
whatever replaces %s.
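
To make the gender problem concrete, here is a minimal sketch in Python (chosen only for illustration; Bison itself does nothing like this). The table names, the gender tags and the function `unexpected_de` are all hypothetical. The sketch only shows that the adjective has to be picked per token, which a single msgstr with a fixed "unerwartetes" cannot do.

```python
# Hypothetical illustration, not Bison's API: the German adjective for
# "unexpected" in the nominative, keyed by the grammatical gender of the
# token name that follows it.
UNEXPECTED_DE = {
    "m": "unerwarteter",  # masculine
    "f": "unerwartete",   # feminine
    "n": "unerwartetes",  # neuter
}

# Hypothetical token table: translated name plus its grammatical gender.
TOKENS_DE = {
    "letter": ("Buchstabe", "m"),
    "number": ("Zahl", "f"),
    "character": ("Zeichen", "n"),
}


def unexpected_de(token):
    """Build the German message with an adjective that agrees with the token."""
    name, gender = TOKENS_DE[token]
    return f"Syntaxfehler, {UNEXPECTED_DE[gender]} {name}"
```

With the fixed msgstr, only the neuter case ("unerwartetes Zeichen") comes out right; the sketch also yields "unerwarteter Buchstabe" and "unerwartete Zahl".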

Another case:

  [en] msgid "syntax error, unexpected %s, expecting %s or %s"
  [de] msgstr "Syntaxfehler, unerwartetes %s, hatte %s oder %s erwartet"
  [fr] msgstr "erreur de syntaxe, %s inattendu, attendait %s ou %s"

As worded (other wordings are possible), the first %s must be in the
nominative (subject) case, but the others in the accusative (object)
case. That holds in de, I think also in fr, and theoretically even in
en, though there the two cases differ only for a few words like
"I"/"me", which are unlikely to occur in token names.

Luckily in de, these forms are often the same -- but not always,
e.g. "Buchstabe"(nom.) ("letter") / "Buchstaben"(acc.). They're
often different for adjectives which might easily be part of token
names, and for articles -- which brings us to another point:

The first %s requires no article in de just like in en; the other
ones, strictly speaking, do require an article (though in a short
message like this, it might be barely acceptable to omit them, in de
somewhat less so than in en).

Seeing that de is rather closely related to en, compared to most
other languages, other languages might have even more grammatical
issues.

As a complex example, using these token names:

  "Cyrillic letter" -> "kyrillischer Buchstabe"
  "Latin letter" -> "lateinischer Buchstabe"
  "Greek letter" -> "griechischer Buchstabe"

a correctly translated message in de would look like this:

  "Syntaxfehler, unerwarteter(nom.masc.) kyrillischer(nom.masc.)
  Buchstabe(nom.), hatte einen(article/acc.masc.)
  lateinischen(acc.masc.) Buchstaben(acc.) oder [einen](same
  article, optional) griechischen(acc.masc.) Buchstaben(acc.)
  erwartet"

Of course, you might consider this nitpicking. I bring it up because
with the current wording of the translatable texts, it's basically
impossible to produce grammatically correct translations in all
cases.

Also, as currently worded, the token names themselves would need to
be translated differently in different contexts, so Bison users
would have to be aware of that. I've done something similar in
another program of mine where I needed two forms (only two,
luckily), and defined a "|" in the translations to separate them
(with no "|" meaning the same form for both), e.g.:

  msgid "Cyrillic letter"
  msgstr "kyrillischer Buchstabe|kyrillischen Buchstaben"

Of course, this requires (and would require in Bison) either the
caller of "_" to parse this, or a parameter passed to "_" so that "_"
itself can parse it (more flexible, and those who don't care about it
could just ignore that parameter).

Another option would be rather roundabout wordings that make sure the
token names always occur in the same case and without an article, but
these would generally be less readable (and I'm not sure they're even
possible in every language), something like:

  "syntax error, the token \"%s\" was unexpected, expected one of
  the following tokens: %s, ..."
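
As a rough sketch of how such a message could be assembled (Python for illustration only; `roundabout_message` is a hypothetical helper, not anything Bison provides), each token name is inserted as a quoted or listed item, so only one form per name is ever needed:

```python
def roundabout_message(unexpected, expected):
    """Build the roundabout wording from token names in citation form.

    `unexpected` is a single token name, `expected` a list of token names.
    The names are never inflected: one is quoted, the rest are listed.
    """
    listed = ", ".join(expected)
    return (
        f'syntax error, the token "{unexpected}" was unexpected, '
        f"expected one of the following tokens: {listed}"
    )
```

For instance, `roundabout_message("Cyrillic letter", ["Latin letter", "Greek letter"])` gives a message whose translation would need only a single form of each token name.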

Regards,
Frank


