Re: Run-time internationalized messages

bug-bison

From:	Tim Van Holder
Subject:	Re: Run-time internationalized messages
Date:	Sun, 4 May 2003 18:19:50 +0200

> I think that the approach in mind here was a static approach.
> This should not even be very difficult to do, in view of
> Bison's current M4 approach:
> Simply replace all output strings that the parser uses with
> macros, then merely supply a special string macro file. If
> you then want to change the parser strings, merely change
> that output.

Hmm - that sounds like you decide the output language when bison
is run.  That does not seem like a great idea.

> Now you evidently want a dynamic approach. One approach might
> be to put all the default strings in character arrays, which
> easily can be changed at runtime, if the names of the strings
> are known. If the strings are already in M4 macros, the only
> thing that would be needed is a special M4 skeleton file.

Why reinvent the wheel?  Many systems already have some form of
gettext, which was made exactly for this purpose: translating
messages.  All bison needs to do is wrap its translatable strings
in some macro (be it the canonical '_', or some other one), and
provide a default no-op version of that macro (much like most GNU
apps do now).
Perhaps the message strings could be named variables, to make a
custom translation routine easier.  This would require a second
macro that marks these strings as translatable without translating
them yet.

Example:

[bison output]

#if !defined(YYTRANSLATE)
# define YYTRANSLATE(msg) msg
#endif

// This probably never needs a "real" definition - it's only here
// to enable xgettext-like tools to find & extract translatable
// strings.
//#if !defined(YYTRANSLATABLE)
# define YYTRANSLATABLE(msg) msg
//#endif


...

const char* bison_syntax_error = YYTRANSLATABLE("syntax error");

...
     if (error)
        yyerror(YYTRANSLATE(bison_syntax_error));
...


[user grammar - using gettext]
...
#define YYTRANSLATE(msg)    gettext(msg)
...

-> now it is enough to run xgettext specifying YYTRANSLATE and
   YYTRANSLATABLE as keywords to check against, and it will extract
   the messages, and those can be prepared for use in the canonical
   way.
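
To make the flow concrete, here is a minimal self-contained sketch of
the two macros working together.  It is only an illustration under
stated assumptions: demo_gettext(), last_error and
report_syntax_error() are hypothetical names made up for this example,
and demo_gettext() merely stands in for a real gettext() call so the
snippet runs without a message catalog.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for gettext(): looks the msgid up in a tiny
   built-in "catalog" and falls back to the untranslated string. */
static const char *demo_gettext(const char *msgid)
{
    assert(msgid != NULL);
    if (strcmp(msgid, "syntax error") == 0)
        return "Syntaxfehler";
    return msgid;
}

/* --- user grammar prologue: choose the translation function --- */
#define YYTRANSLATE(msg) demo_gettext(msg)

/* --- what the generated parser would contain --- */
#ifndef YYTRANSLATE
# define YYTRANSLATE(msg) msg        /* default: no translation */
#endif
#ifndef YYTRANSLATABLE
# define YYTRANSLATABLE(msg) msg     /* marker only, for extraction tools */
#endif

static const char *const bison_syntax_error = YYTRANSLATABLE("syntax error");

/* Hypothetical yyerror() that just records what it was given. */
static const char *last_error;
static void yyerror(const char *msg) { last_error = msg; }

/* The string is marked at its definition and translated at the
   point of use. */
static void report_syntax_error(void)
{
    yyerror(YYTRANSLATE(bison_syntax_error));
}
```

After a call to report_syntax_error(), last_error points to the
translated text.  Removing the user's #define makes the same code fall
back to the no-op default and report the English string.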

[user grammar - custom functions]
...
#define YYTRANSLATE(msg) my_translation(current_language, msg)
%parse-param { int current_language }
...
%%

// One row per language, one column per message.
const char* my_messages[][2] = {
 { "erreur syntactical", "le foo" },
 { "Syntaxfehler", "das foo" },
 { "syntaxfout", "de foo" },
 { "errorio syntactico", "il foo" }
};
const int maxlang = 3;  // highest valid language index

const char*
my_translation(int language, const char* message)
{
  if (language < 0 || language > maxlang)
    return message;
  // Comparing pointers is enough: YYTRANSLATABLE is a no-op, so
  // message is the very pointer bison defined.
  if (message == bison_syntax_error)
    return my_messages[language][0];
  else if (message == bison_foo)
    return my_messages[language][1];
  return message;
}


This way, both methods of translation are supported.  Personally,
I feel supporting gettext (or a gettext-like API) alone is sufficient,
but having the option of doing it yourself is nice I suppose.

> As for the question of making the thing platform independent,
> there is no such a thing with respect to output languages like C/C++.
> So there you are left out in the cold. When I discussed it in a C++
> newsgroup, the best thing that people really needing this feature
> (as those writing WWW browsers/servers and such) currently could find
> was to give names to each character according to some encoding, and
> then use that. For example, using
> Unicode:
>    unsigned LATIN_CAPITAL_LETTER_A = 0x0041;
>    ...
> or
>    #define LATIN_CAPITAL_LETTER_A 0x0041
>    ...
> Then use LATIN_CAPITAL_LETTER_A instead of "A". One can probably easily
> produce such list of characters by taking down the Unicode Namelist and
> convert to C format via a suitable small program.
>
> All remains is finding someone willing to do the job. :-)

I don't see how that helps much, except with character sets.  But
handling character sets is an issue for whoever writes the
translation function (gettext already handles this through iconv),
not for bison.  I would even count emitting the English strings in
UTF-8 as a translation (from en.ISO-8859-1 to en.UTF-8).

For C++ the main problem is that the
  stream << "text" << number << "more text" << endl;
idiom lends itself very badly to i18n (because you are stuck with the
word order, have to translate fragments instead of sentences, and have
no way to deal with plural forms).  The result is that you have to use
a printf-like function in C++, which isn't natural, but is the only way
to have sane i18n.  C++ really needs a good text formatting system that
does not lose coherence as the current system does - but for that it
would probably have to go fully OO with a to_string() primitive for all
objects.
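
To illustrate the word-order point, here is a small sketch of the
printf-like approach.  It assumes a C library that supports POSIX
positional conversions ("%1$s", "%2$d"), which are not part of ISO C.
The format table and format_error_count() are hypothetical names for
this example only; in practice the formats would come from a message
catalog.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical catalog: one complete sentence per language.  The
   positional conversions let each translation put the arguments in
   whatever order its grammar needs. */
static const char *const error_count_format[] = {
    "%1$s: %2$d errors found",       /* English */
    "%2$d Fehler in %1$s gefunden"   /* German  */
};

/* Formats the whole sentence in one call, so the translator sees a
   sentence rather than fragments.  Returns what snprintf() returns. */
static int format_error_count(char *buf, size_t size, int language,
                              const char *file, int count)
{
    assert(buf != NULL && file != NULL);
    assert(language >= 0 && language < 2);
    return snprintf(buf, size, error_count_format[language], file, count);
}
```

The caller always passes (file, count) in the same order; only the
format string differs per language.  Plural forms are not handled
here; with gettext that is the job of ngettext().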

For non-C/C++ languages (Java, C#, etc) other problems arise (no macros,
for one).  In some cases there is built-in i18n support (e.g. Java, C#),
but it will definitely not be the same across languages.
