bug-gnu-emacs

bug#66760: 29.1; [BUG] GB18030 Incorrect Encoding


From: Andreas Schwab
Subject: bug#66760: 29.1; [BUG] GB18030 Incorrect Encoding
Date: Thu, 26 Oct 2023 16:20:59 +0200
User-agent: Gnus/5.13 (Gnus v5.13)

On Oct 26 2023, Ruijie Yu wrote:

> I have noticed that in GB18030 encoding, certain ranges of characters
> have incorrect encodings.
>
> One example is U+217A (SMALL ROMAN NUMERAL ELEVEN).  The expected
> encoding is 81 36 C5 30 (as can be seen from the GB18030 standard [1]
> and verified from other programs such as iconv and MySQL), whereas the
> observed encoding within Emacs is 81 36 C4 39, with a 1-codepoint
> offset.

This is a bug in the generation of GB180304.map.  The gb180304.awk
script assumes that the 4-byte encodings of GB18030 fill the holes in
the sequence of characters with a 2-byte encoding, in Unicode codepoint
order, but there are some places where codepoints from the PUA are
inserted into the sequence.  For example, U+1E3E maps to 81 35 F4 36;
the next codepoint not mapped to a 2-byte code is U+1E40, but that maps
to 81 35 F4 38, whereas 81 35 F4 37 is the encoding of U+E7C7.  So the
generated output gets out of sync.
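
The off-by-one can be reproduced with a little arithmetic.  The sketch
below is only an illustration, not the awk script itself: the helper
names are made up, and assign_sequentially() is a toy model of the
"fill the holes in codepoint order" assumption.  It relies on the
4-byte code layout of GB18030 (bytes 1 and 3 in 81..FE, bytes 2 and 4
in 30..39), which lets each 4-byte code be treated as a linear index.

```python
def four_byte_to_index(code: bytes) -> int:
    """Linear position of a GB18030 4-byte code, counting from 81 30 81 30."""
    b1, b2, b3, b4 = code
    return (((b1 - 0x81) * 10 + (b2 - 0x30)) * 126 + (b3 - 0x81)) * 10 + (b4 - 0x30)


def index_to_four_byte(index: int) -> bytes:
    """Inverse of four_byte_to_index."""
    index, d4 = divmod(index, 10)
    index, d3 = divmod(index, 126)
    d1, d2 = divmod(index, 10)
    return bytes([0x81 + d1, 0x30 + d2, 0x81 + d3, 0x30 + d4])


def assign_sequentially(start: bytes, codepoints: list[int]) -> dict[int, bytes]:
    """Toy model (assumption): consecutive unmapped codepoints get consecutive codes."""
    first = four_byte_to_index(start)
    return {cp: index_to_four_byte(first + n) for n, cp in enumerate(codepoints)}


# U+1E3E and U+1E40 are consecutive codepoints without a 2-byte code.
naive = assign_sequentially(bytes.fromhex("8135F436"), [0x1E3E, 0x1E40])
expected_1e40 = bytes.fromhex("8135F438")  # value given in this message

# The naive model hands U+1E40 the slot that actually belongs to U+E7C7.
print(naive[0x1E40].hex(" "))
print(four_byte_to_index(expected_1e40) - four_byte_to_index(naive[0x1E40]))

# The same one-slot gap separates the observed and expected codes for U+217A.
observed_217a = bytes.fromhex("8136C439")
expected_217a = bytes.fromhex("8136C530")
print(four_byte_to_index(expected_217a) - four_byte_to_index(observed_217a))
```

Under this model U+1E40 lands on 81 35 F4 37, one slot short of the
value given above, and the codes for U+217A in the original report
differ by the same single slot.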

-- 
Andreas Schwab, SUSE Labs, schwab@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."