bug#65996: 29.1; UCS normalization is wrong

bug-gnu-emacs

From:	Eli Zaretskii
Subject:	bug#65996: 29.1; UCS normalization is wrong
Date:	Sat, 16 Sep 2023 12:21:42 +0300

> From: awrhygty@outlook.com
> Date: Fri, 15 Sep 2023 21:49:38 +0900
> 
> 
> UCS normalization is wrong for some characters.
> 
> (1) NFD/NFKD decomposition is not done
>     U+1112E 𑄮 CHAKMA VOWEL SIGN O
>     U+1112F 𑄯 CHAKMA VOWEL SIGN AU
>     U+1134B 𑍋 GRANTHA VOWEL SIGN OO
>     U+1134C 𑍌 GRANTHA VOWEL SIGN AU
>     U+114BB 𑒻 TIRHUTA VOWEL SIGN AI
>     U+114BC 𑒼 TIRHUTA VOWEL SIGN O
>     U+114BE 𑒾 TIRHUTA VOWEL SIGN AU
>     U+115BA 𑖺 SIDDHAM VOWEL SIGN O
>     U+115BB 𑖻 SIDDHAM VOWEL SIGN AU
>     U+11938 𑤸 DIVES AKURU VOWEL SIGN O
> 
>     (let ((s "\U0001112E\U0001112F\U0001134B\U0001134C\
>     \U000114BB\U000114BC\U000114BE\U000115BA\U000115BB\U00011938"))
>       (require 'ucs-normalize)
>       (list (equal s (ucs-normalize-NFD-string s))
>             (equal s (ucs-normalize-NFKD-string s))))
>     =>(t t)
> 
> (2) NFKC/NFKD replacement is not done
>     U+1E030..U+1E06D Cyrillic MODIFIER LETTER or SUBSCRIPT
>     U+1EE00..U+1EEBB ARABIC MATHEMATICAL *
>     U+1FBF0..U+1FBF9 SEGMENTED DIGIT *
> 
>     (let* ((f (lambda (cell)
>                 (apply #'string (number-sequence (car cell) (cdr cell)))))
>            (s (mapconcat f '((#x1E030 . #x1E06D)
>                              (#x1EE00 . #x1EEBB)
>                              (#x1FBF0 . #x1FBF9)))))
>       (require 'ucs-normalize)
>       (list (equal s (ucs-normalize-NFKC-string s))
>             (equal s (ucs-normalize-NFKD-string s))))
>     =>(t t)

Thanks, fixed on the emacs-29 branch.

Once again, if (as I'm guessing) you found these problems by examining
the data in ucs-normalize.el, it would have greatly helped if you'd
pointed to the problematic data in your report.  Reverse-engineering
the sources of the problem from the behavior takes time, especially
when the relevant code is not trivial and was written by someone else.
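
As an illustration of the kind of pointer asked for above, here is a minimal sketch, not taken from the report or from the fix, that compares Emacs's own Unicode database with the output of ucs-normalize.  It assumes that `get-char-code-property' with the `decomposition' property returns a list whose first element is a symbol for compatibility mappings and a character for canonical ones; the helper name `my-nfd-mismatches' is hypothetical.

```elisp
;; Sketch only: `my-nfd-mismatches' is a hypothetical helper, not part
;; of ucs-normalize.el.
(require 'seq)
(require 'ucs-normalize)

(defun my-nfd-mismatches (chars)
  "Return members of CHARS that NFD leaves unchanged despite a decomposition.
A character qualifies when the Unicode database lists a canonical
decomposition for it, yet `ucs-normalize-NFD-string' returns it as is."
  (seq-filter
   (lambda (ch)
     (let ((decomp (get-char-code-property ch 'decomposition)))
       (and (not (equal decomp (list ch))) ; some decomposition is listed
            (integerp (car decomp))        ; canonical: no compatibility tag
            (equal (string ch)
                   (ucs-normalize-NFD-string (string ch))))))
   chars))

;; The ten characters from item (1) of the report.
(my-nfd-mismatches
 '(#x1112E #x1112F #x1134B #x1134C #x114BB
   #x114BC #x114BE #x115BA #x115BB #x11938))
```

On a build with the fix this call would be expected to return nil; on an affected build it should presumably list the characters from item (1), provided that build's Unicode database knows them.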
