Re: Question regarding gettext behavior on iconv failure

bug-gettext

From:	Bruno Haible
Subject:	Re: Question regarding gettext behavior on iconv failure
Date:	Mon, 03 May 2021 23:37:41 +0200
User-agent:	KMail/5.1.3 (Linux/4.4.0-206-generic; KDE/5.18.0; x86_64; ; )

Hi Eric,

> The example in question set up several .po files and a specific
> environment to test various pluralization/transcoding fallbacks, and
> concludes with a snippet where a string with an encoding error in
> ISO-8859-1 is output in spite of an iconv failure, rather than the
> string passed in to ngettext():
> 
> 
>     n_recipients = 1;
>     // The following outputs "1 Empfänger" encoded in UTF-8:
>     printf("%s\n", ngettext("recipient", "recipients", n_recipients));
> 
>     bind_textdomain_codeset("mail", "ASCII");
> 
>     n_recipients = 1;
>     // The following outputs "recipient" with the same encoding as the
>     // "recipient" argument to ngettext (remember, the system is assumed
>     // to not support conversion from ISO/IEC 8859-1 to ASCII):
>     printf("%s\n", ngettext("recipient", "recipients", n_recipients));
>     // On GNU gettext, "1 Empfänger" is output in ISO-8859-1 here (i.e.
>     // no conversion is done). I think we already agreed on considering
>     // this behavior a bug,

I cannot reproduce this. Find attached my (complete) test case.

GNU gettext calls iconv_open() with arguments indicating that a conversion
that is not 1:1 (e.g. transliteration) is preferable to a failure.
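
For illustration, here is a minimal sketch of what such an iconv_open() call
can look like. The "//TRANSLIT" suffix on the target encoding is the way
glibc and GNU libiconv accept a request for transliteration. The helper name
to_ascii_translit and the fixed ISO-8859-1 source encoding are assumptions
made for this sketch; this is not code taken from GNU gettext.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert the NUL-terminated ISO-8859-1 string `in`
   to ASCII, asking iconv to transliterate rather than fail.
   Returns 0 on success, -1 if the conversion is unavailable or fails. */
int to_ascii_translit(const char *in, char *out, size_t outsize)
{
    if (outsize == 0)
        return -1;

    /* "//TRANSLIT" requests approximation of unrepresentable characters. */
    iconv_t cd = iconv_open("ASCII//TRANSLIT", "ISO-8859-1");
    if (cd == (iconv_t)-1)
        return -1;

    char *inptr = (char *)in;      /* iconv() does not modify the input */
    size_t inleft = strlen(in);
    char *outptr = out;
    size_t outleft = outsize - 1;  /* reserve room for the terminator */

    size_t res = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (res != (size_t)-1)
        res = iconv(cd, NULL, NULL, &outptr, &outleft); /* flush state */
    iconv_close(cd);

    if (res == (size_t)-1)
        return -1;
    *outptr = '\0';
    return 0;
}
```

What the output buffer holds for a string such as "1 Empf\xe4nger" is up to
the iconv implementation, which is why the results differ between platforms.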

The result thus depends on the iconv implementation. For GNU gettext
the recommended iconv implementations are:
  - on glibc systems: GNU libc,
  - otherwise: GNU libiconv.
Therefore here are the results on GNU libc (2.32) and on some other OS
(FreeBSD 13) with GNU libiconv:

With a mail.po that contains only umlauts:

Output on glibc systems (e.g. 2.32):
1 Empfänger
1 Empfaenger

Output on non-glibc systems with GNU libiconv:
1 Empfänger
1 Empf"anger

With a mail-utf8.po that also contains Hanzi characters:

Output on glibc systems (e.g. 2.32):
1 Empfänger Chinese (中文,普通话,汉语)      你好
1 Empfaenger Chinese (??,???,??)      ??

Output on non-glibc systems with GNU libiconv:
1 Empfänger Chinese (中文,普通话,汉语)      你好
recipient

As you can see:

  * For the first line of output, since the output encoding is UTF-8,
    iconv() never needed transliteration and never failed.

  * For the second line of output, in the first three cases, iconv()
    did transliteration, and the result was always an ASCII string.
    (The quality of glibc's transliteration of Hanzi characters to
    question marks can be debated, though.)

  * In the last case, iconv() failed, and thus GNU gettext output
    the corresponding argument to ngettext() untranslated.
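
The fallback described in the last point can be sketched roughly as follows.
This is only an illustration of the observable behavior under stated
assumptions, not GNU gettext's actual implementation: the function
convert_or_fallback and its parameters are hypothetical names.

```c
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical sketch: try to convert `translation` from `fromcode` to
   `tocode` into `buf`. If the converter cannot be opened or the
   conversion fails, return `msgid` (the untranslated argument) instead. */
const char *convert_or_fallback(const char *msgid, const char *translation,
                                const char *fromcode, const char *tocode,
                                char *buf, size_t bufsize)
{
    if (bufsize == 0)
        return msgid;

    iconv_t cd = iconv_open(tocode, fromcode);
    if (cd == (iconv_t)-1)
        return msgid;                 /* conversion not supported */

    char *inptr = (char *)translation;
    size_t inleft = strlen(translation);
    char *outptr = buf;
    size_t outleft = bufsize - 1;     /* reserve room for the terminator */

    size_t res = iconv(cd, &inptr, &inleft, &outptr, &outleft);
    if (res != (size_t)-1)
        res = iconv(cd, NULL, NULL, &outptr, &outleft); /* flush state */
    iconv_close(cd);

    if (res == (size_t)-1)
        return msgid;                 /* iconv() failed: fall back */
    *outptr = '\0';
    return buf;
}
```

With a target encoding name that the iconv implementation does not know,
iconv_open() fails and the caller gets the msgid back unchanged.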

> This raises a few questions: does the GNU gettext team agree that this
> can be considered a bug

No. Please provide a reproducible test case that produces wrong results
on an interesting platform. NetBSD 3.0 or IRIX 6.5, for example, don't
count.

Bruno

Attachments (text data): foo.c, mail.po, mail-utf8.po
