bug#65997: 29.1; ?\N{char_name} reference is wrong
From: Robert Pluim
Subject: bug#65997: 29.1; ?\N{char_name} reference is wrong
Date: Fri, 15 Sep 2023 17:33:41 +0200
>>>>> On Fri, 15 Sep 2023 22:02:37 +0900, awrhygty@outlook.com said:
awrhygty> S-exps in the form of ?\N{char_name} return wrong values for some
awrhygty> characters.
awrhygty> The S-exp below inserts a whole list of such characters.
awrhygty> (dotimes (u (1+ (max-char 'ucs)))
awrhygty>   (let* ((name (get-char-code-property u 'name)))
awrhygty>     (when (and name (not (<= #xD800 u #xDFFF)))
awrhygty>       (let ((u2 (condition-case err
awrhygty>                     (read (format "?\\N{%s}" name))
awrhygty>                   (error 0))))
awrhygty>         (unless (eq u u2)
awrhygty>           (insert (format "%X\t%s\t%X\t%s\n" u name u2
awrhygty>                           (if (= 0 u2)
awrhygty>                               "error"
awrhygty>                             (get-char-code-property u2 'name)))))))))
For a minute there I thought our hash tables were broken :-). Stefan,
it only took 9 years, but this is no longer true:
lisp/international/mule-cmds.el:
;; In theory this code could end up pushing an "old-name" that
;; shadows a "new-name" but in practice every time an
;; `old-name' conflicts with a `new-name', the newer one has a
;; higher code, so it gets pushed later!
The patch below fixes that issue.
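To make the shadowing concrete, here is a minimal sketch of the same loop shape on made-up data. The function `my-build-names`, its GUARD argument, and the names "ALPHA" and "BETA" are hypothetical and are not taken from mule-cmds.el or from the Unicode data; only the `puthash'/`gethash' pattern mirrors the code being patched.

```elisp
;; Hypothetical miniature of the `ucs-names' loop.  Each entry is
;; (CODE NEW-NAME OLD-NAME), visited in ascending code order.
(defun my-build-names (entries guard)
  "Return a name->code hash table built from ENTRIES.
When GUARD is non-nil, an old name never overwrites an existing entry."
  (let ((names (make-hash-table :test #'equal)))
    (dolist (entry entries)
      (let ((c (nth 0 entry))
            (new-name (nth 1 entry))
            (old-name (nth 2 entry)))
        (if new-name (puthash new-name c names))
        (when (and old-name
                   (or (not guard)
                       (not (gethash old-name names))))
          (puthash old-name c names))))
    names))

(defvar my-sample-entries
  '((#x10 "ALPHA" nil)      ; new name "ALPHA" at the lower code
    (#x20 "BETA" "ALPHA"))  ; old name "ALPHA" at the higher code
  "Made-up data, not real Unicode names.")

;; Unguarded: the old name of #x20 overwrites the new name of #x10.
(gethash "ALPHA" (my-build-names my-sample-entries nil))
;; Guarded: the entry for #x10 is kept.
(gethash "ALPHA" (my-build-names my-sample-entries t))
```

The unguarded call maps "ALPHA" to #x20, which is the kind of wrong answer the report describes; the guarded call keeps #x10.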
awrhygty> output(TANGUT COMPONENTs are omitted):
I donʼt know why the ranges in `ucs-names' donʼt cover these
code-points. Itʼs easy enough to change them, but theyʼre
explicitly commented out.
awrhygty> 16FE4 KHITAN SMALL SCRIPT FILLER 0 error
awrhygty> 16FF0 VIETNAMESE ALTERNATE READING MARK CA 0 error
awrhygty> 16FF1 VIETNAMESE ALTERNATE READING MARK NHAY 0 error
awrhygty> 1B132 HIRAGANA LETTER SMALL KO 0 error
And similarly for these 4.
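One way to check whether a given code point is reachable by name is to compare its `name' property with what the `ucs-names' table returns. This is only a diagnostic sketch; the helper `my-name-lookup-status` is hypothetical, and what it reports depends on the Emacs version and its Unicode data.

```elisp
;; Hypothetical helper: report whether CODE's name is present in the
;; table returned by `ucs-names'.
(defun my-name-lookup-status (code)
  "Return a list (CODE NAME FOUND) for the character CODE.
NAME is the `name' char-code property, or nil if there is none.
FOUND is the code that `ucs-names' maps NAME to, or nil if missing."
  (let* ((name (get-char-code-property code 'name))
         (found (and name (gethash name (ucs-names)))))
    (list code name found)))

;; The four code points from the quoted output above.
(mapcar #'my-name-lookup-status '(#x16FE4 #x16FF0 #x16FF1 #x1B132))
```

An entry whose FOUND is nil while NAME is non-nil has a name that the table does not cover.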
Robert
--
diff --git a/lisp/international/mule-cmds.el b/lisp/international/mule-cmds.el
index c26898f7649..254ecae5bd5 100644
--- a/lisp/international/mule-cmds.el
+++ b/lisp/international/mule-cmds.el
@@ -3135,7 +3135,9 @@ ucs-names
             ;; `old-name' conflicts with a `new-name', the newer one has a
             ;; higher code, so it gets pushed later!
             (if new-name (puthash new-name c names))
-            (if old-name (puthash old-name c names))
+            (when (and old-name
+                       (not (gethash old-name names)))
+              (puthash old-name c names))
             ;; Unicode uses the spelling "lamda" in character
             ;; names, instead of "lambda", due to "preferences
             ;; expressed by the Greek National Body" (Bug#30513).