bug-gnu-utils

Re: Feature: msgfmt should not include messages if msgid==msgstr


From: Simos Xenitellis
Subject: Re: Feature: msgfmt should not include messages if msgid==msgstr
Date: Mon, 10 Sep 2007 15:05:07 +0100

On Sun, 2007-09-09 at 23:33 +0200, Bruno Haible wrote:
> Simos Xenitellis wrote on 2007-07-19:
...
> > The problem then comes with msgfmt which compiles in messages where
> > msgid == msgstr. 
> >
> > What should be done is for msgfmt to check if a translation really needs
> > to be included (whether msgid==msgstr), and try not to include those
> > messages.
> > ...
> > To provide some figures, the en_GB translations in Ubuntu Linux take up
> > ~26MB, though only a fraction of that size is really necessary.
> 
> There are two reasons why this cannot work:
> 
> 1) The 'en_GB' translations inherit from the 'en' translations. But
>    translation teams are managed independently. A translation team
>    will never grant 100% trust and authority to another translation
>    team. 'en_GB' translators will never refer to 'en' translations
>    without checking the messages themselves.
> 
>    To take a concrete example, say a message contains the msgid "apologise".
>    An en_GB translator sees this is perfect Oxford English, and copies it
>    into the msgstr "apologise". Later on, the 'en' team decides to follow
>    U.S. orthography and puts "apologize" into the en.po catalog. If the
>    en_GB.mo file had omitted the msgstr "apologise", the inheritance rules
>    would now produce the translation "apologize" (taken from en.mo) for
>    users in the en_GB locale - despite the explicitly specified different
>    spelling by the en_GB translators.
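To make the inheritance argument concrete, here is a toy Python model of the lookup order. It is a sketch, not gettext's actual code. Catalogs are plain dicts searched in order (en_GB first, then en), and the msgid itself is returned when no catalog has an entry. The catalog contents are hypothetical and follow the "apologise" example above.

```python
from typing import Dict, List

Catalog = Dict[str, str]  # hypothetical model: msgid -> msgstr


def lookup(msgid: str, chain: List[Catalog]) -> str:
    """Return the first translation found along the search chain,
    or the untranslated msgid if no catalog has an entry for it."""
    for catalog in chain:
        if msgid in catalog:
            return catalog[msgid]
    return msgid


# Hypothetical catalogs following the "apologise" example.
en = {"apologise": "apologize"}          # 'en' team switched to U.S. spelling
en_gb_full = {"apologise": "apologise"}  # en_GB translator copied msgid to msgstr
en_gb_stripped: Catalog = {}             # same catalog with msgid == msgstr entries removed

print(lookup("apologise", [en_gb_full, en]))      # apologise
print(lookup("apologise", [en_gb_stripped, en]))  # apologize
```

Once the copied entry is stripped, the en_GB user silently inherits the 'en' spelling.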

Thanks for the reply. I can see your point here.

The way I see it, a --no-copied option would give those who build
distributions a way to reduce space at their discretion.

While disk space is cheap on desktops and laptops, Linux is also used
in smaller devices with limited disk space and memory. While a user
could set an elaborate LANGUAGE string, there are cases where only one
language is wanted or can be used (think, for example, of the N800,
where the locale can only be set from a menu). In addition, the
developer of a (non-installable) live CD may choose to support only one
value in LANGUAGE.
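The single-language case can be checked with the same kind of toy model. This sketch assumes the en_GB catalog is the *only* catalog installed, so there is no en.mo to fall back to. A missing entry then falls back to the msgid itself, which is exactly what a copied entry contained. The catalog contents and helper names are hypothetical.

```python
from typing import Dict, List

Catalog = Dict[str, str]  # hypothetical model: msgid -> msgstr


def lookup(msgid: str, chain: List[Catalog]) -> str:
    """Return the first translation along the chain, else the msgid itself."""
    for catalog in chain:
        if msgid in catalog:
            return catalog[msgid]
    return msgid


def strip_copied(catalog: Catalog) -> Catalog:
    """Drop entries whose msgstr is identical to their msgid."""
    return {msgid: msgstr for msgid, msgstr in catalog.items() if msgid != msgstr}


# Hypothetical en_GB catalog on a device where it is the ONLY catalog installed.
en_gb = {"Open": "Open", "Color": "Colour", "File": "File"}
stripped = strip_copied(en_gb)

# With a single catalog, stripping never changes what the user sees.
for msgid in ["Open", "Color", "File", "Quit"]:
    assert lookup(msgid, [en_gb]) == lookup(msgid, [stripped])

print(stripped)  # {'Color': 'Colour'}
```

Bruno's objection applies only when another catalog (such as en.mo) sits behind en_GB in the lookup order.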

Whether to use --no-copied is up to the person who produces packages
for the distribution, and the stripping is one-way only (you "strip"
the PO files just before compiling them to MO files; the stripped PO
files are then discarded).
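Since --no-copied is only a proposal here, the following is a hypothetical stand-alone sketch of the stripping step in Python. It is not msgattrib's behaviour. It assumes a simplified PO layout with entries separated by blank lines and msgid/msgstr each on a single line. Entries it cannot judge are kept, as a conservative choice: plural forms, multi-line strings, and the header entry with an empty msgid all stay in the output.

```python
import re

_MSGID = re.compile(r'^msgid "(.*)"$', re.MULTILINE)
_MSGSTR = re.compile(r'^msgstr "(.*)"$', re.MULTILINE)


def strip_copied_po(po_text: str) -> str:
    """Drop PO entries whose msgstr is identical to their msgid (simplified)."""
    kept = []
    for block in po_text.strip().split("\n\n"):
        msgid = _MSGID.search(block)
        msgstr = _MSGSTR.search(block)
        copied = (
            msgid is not None
            and msgstr is not None
            and msgid.group(1) != ""  # never drop the header entry
            and msgid.group(1) == msgstr.group(1)
        )
        if not copied:
            kept.append(block)
    return "\n\n".join(kept) + "\n"


sample = '''msgid ""
msgstr "Content-Type: text/plain; charset=UTF-8\\n"

#: weather.c:10
msgid "London"
msgstr "London"

#: weather.c:11
msgid "Color"
msgstr "Colour"
'''

print(strip_copied_po(sample))  # header and "Color" entry remain
```

The output would then be fed to msgfmt as usual and discarded afterwards.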

There are other specific cases where a maintainer may choose to do this
"stripping" before processing the PO files. The weather applet in GNOME
has over 4000 airports/cities in its database, and for many languages
(i.e., translation coordinators) most of these names are copied
unchanged into msgstr. This case alone can shave 7MB of uncompressed
data off certain live CDs. Again, it's up to the distribution whether
to accept this or not.
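For a rough sense of where such savings come from, the sketch below estimates the bytes one MO file loses when copied entries are dropped. It assumes the MO layout described in the GNU gettext manual. Each entry has an 8-byte (length, offset) descriptor in both the original-string and translated-string tables, and both strings are stored NUL-terminated. The hash table and any string alignment are ignored, so this is only an approximation and does not attempt to reproduce the 7MB figure. The function name is hypothetical.

```python
from typing import Iterable, Tuple


def mo_bytes_saved(pairs: Iterable[Tuple[str, str]]) -> int:
    """Rough estimate of MO bytes saved by dropping msgid == msgstr entries.

    Counts two 8-byte table descriptors per entry plus both NUL-terminated
    strings; ignores the hash table and alignment padding.
    """
    saved = 0
    for msgid, msgstr in pairs:
        if msgid and msgid == msgstr:
            saved += 2 * 8                              # descriptors in both tables
            saved += len(msgid.encode("utf-8")) + 1     # original string + NUL
            saved += len(msgstr.encode("utf-8")) + 1    # translated string + NUL
    return saved


# Hypothetical weather-applet entries: two copied city names, one real translation.
entries = [("London", "London"), ("Color", "Colour"), ("Paris", "Paris")]
print(mo_bytes_saved(entries))  # 58
```

Multiplied over thousands of city names and dozens of languages, the per-entry overhead adds up quickly.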

All in all, I think the --no-copied option belongs in msgattrib only
(not in msgfmt), so that those who build distributions can choose
whether or not to apply this optimisation. Without this feature in
gettext-tools, distributions do not have that option and need to rely
on tools outside the standard tool chain.

Sorry for the atrocious patch. It looks somewhat better at
http://blogs.gnome.org/simos/2007/07/23/important-mo-file-optimisation-for-en_-locales-and-partly-others/

Simos





