bug#51733: 27.1; Detect impossible email addresses better
From: Lars Ingebrigtsen
Subject: bug#51733: 27.1; Detect impossible email addresses better
Date: Mon, 17 Jan 2022 21:22:58 +0100
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/29.0.50 (gnu/linux)

I'm not quite sure I understand this bit here:
https://www.unicode.org/reports/tr39/#Confusable_Detection
---
For an input string X, define skeleton(X) to be the following transformation on
the string:
 1. Convert X to NFD format, as described in [UAX15].
 2. Concatenate the prototypes for each character in X according to the
    specified data, producing a string of exemplar characters.
 3. Reapply NFD.
---
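
The three quoted steps can be sketched outside Emacs in a few lines of Python. This is only a minimal illustration: the `PROTOTYPES` table is a tiny hand-picked assumption standing in for the real TR39 confusables data, and the names `skeleton` and `confusable` are chosen here for the sketch.

```python
import unicodedata

# Illustrative prototype table (an assumption for this sketch, NOT the real
# TR39 data file): a few Cyrillic letters mapped to Latin look-alikes.
PROTOTYPES = {
    "\u0430": "a",  # CYRILLIC SMALL LETTER A
    "\u043e": "o",  # CYRILLIC SMALL LETTER O
    "\u0435": "e",  # CYRILLIC SMALL LETTER IE
}


def skeleton(s, prototypes=PROTOTYPES):
    """Apply NFD, map each character to its prototype, then reapply NFD."""
    nfd = unicodedata.normalize("NFD", s)
    mapped = "".join(prototypes.get(ch, ch) for ch in nfd)
    return unicodedata.normalize("NFD", mapped)


def confusable(x, y, prototypes=PROTOTYPES):
    """Treat two strings as confusable when their skeletons are equal."""
    return skeleton(x, prototypes) == skeleton(y, prototypes)


# "paypal" with a Cyrillic first "a" versus the all-Latin spelling.
print(confusable("p\u0430ypal", "paypal"))
```

With this toy table the mixed-script spelling and the plain Latin one share a skeleton; anything not covered by the table or by NFD passes through unchanged.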
I mean, that sounds OK in and of itself, but then:
---
X and Y are single-script confusables if and only if they are confusable, and
their resolved script sets have at least one element in common.
Examples: “ǉeto” and “ljeto” in Latin (the Croatian word for “summer”),
where the first word uses only four codepoints, the first of which is U+01C9
(ǉ) LATIN SMALL LETTER LJ.
---
But:
(ucs-normalize-NFD-string "ǉeto")
=> "ǉeto"
So according to that algo "ǉeto" and "ljeto" are not confusable.
But if we use NFKD instead, they are:
(ucs-normalize-NFKD-string "ǉeto")
=> "ljeto"
It seems unlikely to be a typo in this document, surely? But NFKD seems
to make a whole lot more sense than NFD for this usage. I must be
missing or misreading something.
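
As a cross-check outside Emacs, the same NFD versus NFKD comparison can be reproduced with Python's standard `unicodedata` module. This is a minimal sketch; the variable names are just for illustration, and the first string is written with an explicit U+01C9 escape so the precomposed character cannot be lost in transit.

```python
import unicodedata

precomposed = "\u01c9eto"  # U+01C9 LATIN SMALL LETTER LJ + "eto": four code points
spelled_out = "ljeto"      # plain "l" + "j" + "eto": five code points

nfd = unicodedata.normalize("NFD", precomposed)
nfkd = unicodedata.normalize("NFKD", precomposed)

# NFD applies only canonical decompositions, so U+01C9 is left alone.
print(nfd == spelled_out)
# NFKD also applies compatibility decompositions, so U+01C9 becomes "lj".
print(nfkd == spelled_out)
```

This only compares the normalization step in isolation; it does not apply the prototype mapping from the second step of skeleton().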
--
(domestic pets only, the antidote for overdose, milk.)
bloggy blog: http://lars.ingebrigtsen.no