bug-gnu-libiconv



From: Mingye Wang (Arthur2e5)
Subject: [bug-gnu-libiconv] GB2312 incompatible with GB18030; violation of GB 18030 "principles"
Date: Thu, 29 Sep 2016 02:33:51 -0400
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:38.0) Gecko/20100101 Thunderbird/38.7.2

Hello,

I am not sure if someone has brought this up before, as what I am reporting is, in fact, a well-documented issue. [1]
  [1]: https://en.wikipedia.org/wiki/GB_2312#Two_implementations_of_GB2312

iconv maps the GB byte sequences A1A4 and A1AA to different Unicode code points under GB 2312 and GB 18030:

bytes   gb2312  gb18030
-----   ------  -------
A1A4    U+30FB  U+00B7
A1AA    U+2015  U+2014
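
The mapping difference can also be checked outside iconv. What follows is a minimal sketch that assumes Python's built-in `gb2312` and `gb18030` codecs as a stand-in; they are not libiconv and their tables may not match it exactly. The helper names `decode_codepoint` and `mapping_table` are hypothetical. It prints the code point each codec yields for the two byte sequences.

```python
# Compare how two codecs decode the two disputed GB byte sequences.
# Assumption: Python's built-in codecs stand in for iconv here; their
# tables are maintained independently of libiconv.

DISPUTED = [b"\xa1\xa4", b"\xa1\xaa"]
CODECS = ["gb2312", "gb18030"]


def decode_codepoint(raw: bytes, codec: str) -> int:
    """Decode a two-byte sequence and return its single code point."""
    text = raw.decode(codec)
    if len(text) != 1:
        raise ValueError(f"expected one character, got {text!r}")
    return ord(text)


def mapping_table() -> dict:
    """Map each disputed byte sequence to {codec name: code point}."""
    return {
        raw: {codec: decode_codepoint(raw, codec) for codec in CODECS}
        for raw in DISPUTED
    }


if __name__ == "__main__":
    for raw, row in mapping_table().items():
        cells = "  ".join(f"{codec}=U+{cp:04X}" for codec, cp in row.items())
        print(f"{raw.hex().upper()}  {cells}")
```

If the two columns of a printed row differ, the codecs disagree on that byte sequence.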

This slight difference breaks compatibility between the two encodings, which is a stated principle of the mandatory GB 18030[^1] standard:
  [^1]: Both the -2000 and -2005 editions. The 2000 edition says "de facto internal encoding".

> 3. Principles
> =============
>
> This standard is backwards compatible with the internal encoding
> defined in GB 2312.
> ...
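
One way to read this principle is that bytes valid in GB 2312 should survive being treated as GB 18030. A minimal sketch of that check, assuming Python's built-in codecs as a stand-in for iconv (the helper name `survives_reencoding` is hypothetical): decode with the older codec, re-encode with the newer one, and compare the bytes.

```python
# Byte-level compatibility check between an older and a newer codec.
# Assumption: Python's built-in codecs are used instead of libiconv.


def survives_reencoding(raw: bytes, old: str = "gb2312", new: str = "gb18030") -> bool:
    """True if decoding with `old` and re-encoding with `new` gives back `raw`.

    A decoding or encoding failure is also counted as not surviving.
    """
    try:
        return raw.decode(old).encode(new) == raw
    except UnicodeError:
        return False


if __name__ == "__main__":
    # The two disputed sequences, plus D6D0 as an ordinary hanzi for contrast.
    for raw in (b"\xa1\xa4", b"\xa1\xaa", b"\xd6\xd0"):
        print(raw.hex().upper(), survives_reencoding(raw))
```

Under a fully backwards-compatible pair of tables, every line would report `True`.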


This violation of the standard's principles is not rare in the FOSS world, according to [1]. Someone submitted a similar bug to Python [2], but it was marked "wontfix" to preserve compatibility with "the rest of the FOSS world" as well as round-trip safety (in case of a Ruby-like normalization[^2]). I am submitting this bug in the hope that changes in libiconv, an important reference implementation for "the rest of the FOSS world", can lead to revisions in other libraries.
  [2]: https://bugs.python.org/issue24036
  [^2]: Ruby uses a GB 18030-compatible implementation internally, but still accepts the Unicode code points produced by the incompatible mapping.

--
Regards,

Arthur2e5
