bug-gnu-utils

Re: Gettext MO format version numbers


From: Dwayne Bailey
Subject: Re: Gettext MO format version numbers
Date: Mon, 16 Feb 2009 10:59:11 +0200

On Sun, 2009-02-15 at 22:28 +0100, Bruno Haible wrote:
> Hi,
> 
> Dwayne Bailey wrote:
> > 1) The Gettext documentation states that we are at version 0 of the
> > format.
> 
> This is not up to date. I'm fixing it as below.

Thanks.

> > Yet I have observed some files with a version number of 1, in 
> > the wild.  I was able to parse them correctly by simply ignoring the
> > version information.  Is there such a version?
> 
> Yes, see file intl/gmo.h:
> 
> /* Revision number of the currently used .mo (binary) file format.  */
> #define MO_REVISION_NUMBER 0
> #define MO_REVISION_NUMBER_WITH_SYSDEP_I 1
> 
> > 2) .mo files for certain RTL languages have different version number.
> > msgfmt and msgunfmt are able to read these files but when opened in a
> > hex editor the first few bytes are as follows:
> > 
> > DE12 0495 0100 0100
> 
> This means: major revision number is 1 (meaning that some format string
> translations use "I" for internationalized output digits [Farsi]), 

Ah.  I've seen these.  Since I'm using these files for translation
recovery and not to actually render the text, I think I'm safe.  I
assume it will be a problem for the MO files that we produce.

> and
> minor revision number is 1 (meaning that some strings have substrings
> whose expansion depends on the system type).

Could you maybe explain what this means?  By system type, do you mean
some arbitrary list of types?  It seems from the code that there is a
list of possible types, separate from the actual message pairs.
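
For reference, the eight header bytes quoted above can be decoded
roughly as follows.  This is only a sketch: it assumes the revision
word carries the major revision in its upper 16 bits and the minor
revision in its lower 16 bits (my reading of the gettext sources, not
something stated in this thread), and parse_mo_revision is a name made
up for the example.

```python
import struct

MAGIC = 0x950412DE  # GNU MO magic number, as seen in native byte order


def parse_mo_revision(header: bytes):
    """Return (byte_order, major, minor) from the first 8 bytes of an MO file.

    byte_order is "<" (little-endian) or ">" (big-endian).
    Assumption: major = upper 16 bits, minor = lower 16 bits of the
    revision word.
    """
    if len(header) < 8:
        raise ValueError("need at least 8 bytes of header")
    for order in ("<", ">"):
        magic, revision = struct.unpack(order + "II", header[:8])
        if magic == MAGIC:
            return order, revision >> 16, revision & 0xFFFF
    raise ValueError("not a GNU MO file")


# The bytes from the hex dump: DE12 0495 0100 0100
print(parse_mo_revision(bytes.fromhex("DE12049501000100")))
```

Trying both byte orders covers the two magic values mentioned in the
documentation, since a byte-swapped file shows up as 0xde120495.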

> > Simply ignoring the version information allowed me to read these files
> > correctly.  But I would like to know the cause so that we can continue
> > to produce correct .mo files.
> 
> It's better to support these other major and minor versions, but OTOH it's
> a lot of code for rarely used features. In order to support all kinds of
> MO file versions, without copying all the hairy stuff from gettext's
> write-mo.c and read-mo.c, it is better to simply invoke 'msgfmt' and
> 'msgunfmt' when creating or reading MO files, respectively.

In some use cases I've needed to avoid invoking the commands directly.
My guess is that I can get away with what I have, as I'm mostly reading
MO files, but writing might not be as forgiving.
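
Where calling the tools is acceptable, the reading side could look
something like the sketch below.  It relies only on msgunfmt writing
the PO text to standard output when given an MO file; the helper names
(msgunfmt_command, read_mo_as_po) are invented for the example.

```python
import shutil
import subprocess


def msgunfmt_command(mo_path):
    """Build the argument list for converting an MO file back to PO text."""
    return ["msgunfmt", mo_path]


def read_mo_as_po(mo_path):
    """Return the PO text for mo_path as bytes, produced by msgunfmt.

    Bytes are returned undecoded because the PO charset is declared in
    the file's own header entry.
    """
    if shutil.which("msgunfmt") is None:
        raise RuntimeError("msgunfmt not found on PATH")
    result = subprocess.run(
        msgunfmt_command(mo_path),
        capture_output=True,
        check=True,  # raise CalledProcessError if msgunfmt rejects the file
    )
    return result.stdout
```

This way msgunfmt does all the revision handling, at the cost of
requiring gettext to be installed wherever the code runs.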

Now if I could do reading and writing like I do with libgettextpo that
would help :)  But I guess that might be a lot of work for a single
user.

Thanks, this has helped.

I reviewed the doc change below. Some comments:

1) It would be good to refer people to gmo.h if they need further
information.
2) The page has an illustration of the structure of an MO file.  Your
diff doesn't add information about system type entries to that picture.
That would be a lot of work, I assume, so maybe telling people to look
in gmo.h is the correct approach.

> 
> Bruno
> 
> 
> 2009-02-15  Bruno Haible  <address@hidden>
> 
>       * gettext.texi (MO Files): Update w.r.t. the maximum revision in use.
>       Reported by Dwayne Bailey <address@hidden>.
> 
> diff -u -r1.160 gettext.texi
> --- gettext.texi        28 Jan 2009 01:55:03 -0000      1.160
> +++ gettext.texi        15 Feb 2009 21:20:32 -0000      1.162
> @@ -5285,15 +5285,23 @@
>  The first two words serve the identification of the file.  The magic
>  number will always signal GNU MO files.  The number is stored in the
>  byte order of the generating machine, so the magic number really is
> -two numbers: @code{0x950412de} and @code{0xde120495}.  The second
> -word describes the current revision of the file format.  For now the
> -revision is 0.  This might change in future versions, and ensures
> -that the readers of MO files can distinguish new formats from old
> -ones, so that both can be handled correctly.  The version is kept
> +two numbers: @code{0x950412de} and @code{0xde120495}.
> +
> +The second word describes the current revision of the file format,
> +composed of a major and a minor revision number.  The revision numbers
> +ensure that the readers of MO files can distinguish new formats from
> +old ones and handle their contents, as far as possible.  For now the
> +major revision is 0 or 1, and the minor revision is also 0 or 1.  More
> +revisions might be added in the future.  A program seeing an unexpected
> +major revision number should stop reading the MO file entirely; whereas
> +an unexpected minor revision number means that the file can be read but
> +will not reveal its full contents, when parsed by a program that
> +supports only smaller minor revision numbers.
> +
> +The version is kept
>  separate from the magic number, instead of using different magic
>  numbers for different formats, mainly because @file{/etc/magic} is
> -not updated often.  It might be better to have magic separated from
> -internal format version identification.
> +not updated often.
>  
>  Follow a number of pointers to later tables in the file, allowing
>  for the extension of the prefix part of MO files without having to
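
For a reader that parses MO files itself, the rule in the new text
(stop on an unexpected major revision, keep reading on an unexpected
minor revision) might be expressed like this.  It is only a sketch of
that policy; check_revision and its defaults are hypothetical and
describe a reader that understands major 0-1 and minor 0-1.

```python
def check_revision(major, minor, supported_majors=(0, 1), max_minor=1):
    """Apply the major/minor revision policy described in the doc change.

    Returns "full" if the reader understands everything in the file,
    "partial" if the file can be read but may not reveal its full
    contents.  Raises ValueError if reading should stop entirely.
    """
    if major not in supported_majors:
        # Unexpected major revision: stop reading the MO file.
        raise ValueError("unsupported MO major revision %d" % major)
    if minor > max_minor:
        # Unexpected minor revision: readable, but some contents are hidden.
        return "partial"
    return "full"


print(check_revision(1, 1))               # reader that knows both revisions
print(check_revision(0, 1, max_minor=0))  # reader without sysdep support
```
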
-- 
Dwayne Bailey
Associate                                      +27 12 460 1095 (w)
Translate.org.za                               +27 83 443 7114 (c)

Recent blog posts:
* Fixes for Skype Video, Webcam on Fedora
http://www.translate.org.za/blogs/dwayne/en/content/fixes-skype-video-webcam-fedora
* libtranslate, TM plugins and Virtaal
* Localisation Information Language - preventing mistakes and increasing the 
richness of localisation

Stop Digital Apartheid! - http://www.digitalapartheid.com
Firefox web browser in Afrikaans - http://af.www.mozilla.com/af/
African Network for Localisation (ANLoc) - http://africanlocalisation.net/





