From: Simon Josefsson
Subject: Unicode 15 support - using UTC instead of IANA as table source? On U+19DA
Date: Tue, 18 Oct 2022 21:13:27 +0200
User-agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Hi,
I am considering switching to UTC (the Unicode Technical Committee) as
the source of our derived IDNA2008 tables, for simple support of
Unicode > 12. For Unicode <= 12 this makes no difference except for
U+19DA, which UTC has as PVALID and IANA as DISALLOWED. This means idn2
behaviour changes from:
jas@latte:~$ echo ᧚|idn2
idn2: toAscii: string contains a disallowed character
into
jas@latte:~/src/libidn2/src$ echo ᧚|./idn2
xn--pkf
This actually goes back to libidn2 0.11 behaviour, which also resulted
in xn--pkf since it used Unicode < 6.0.0:
jas@latte:~/src/libidn2-0.11/src$ ./idn2 --version|head -1
idn2 (idn2) 0.11
jas@latte:~/src/libidn2-0.11/src$ echo ᧚|./idn2
xn--pkf
jas@latte:~/src/libidn2-0.11/src$
The xn--pkf output is consistent with some other IDNA2008
implementations:
https://icu4c-demos.unicode.org/icu-bin/idnbrowser?t=xn--th5h
https://idnaconv.net/try-it.html?encoded=xn--th5h&decode=%3C%3C+Decode
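For reference, the xn--pkf label itself is just the Punycode form of U+19DA with the ACE prefix added; whether an implementation emits it depends only on the derived property it assigns to that code point. A minimal sketch using Python's built-in punycode codec follows. The helper names to_ace and from_ace are made up for illustration, and the code performs no IDNA2008 validity checks at all:

```python
ACE_PREFIX = "xn--"


def to_ace(label: str) -> str:
    """Punycode-encode a single label and add the ACE prefix.

    Hypothetical helper: no IDNA2008 property check, no normalization,
    no mapping. It only shows where the xn--pkf string comes from.
    """
    return ACE_PREFIX + label.encode("punycode").decode("ascii")


def from_ace(ace: str) -> str:
    """Reverse of to_ace for a label that carries the ACE prefix."""
    if not ace.startswith(ACE_PREFIX):
        raise ValueError("not an ACE label")
    return ace[len(ACE_PREFIX):].encode("ascii").decode("punycode")


print(to_ace("\u19da"))                 # prints xn--pkf
print(f"U+{ord(from_ace('xn--pkf')):04X}")  # prints U+19DA
```

So the difference between the old and new idn2 behaviour above is purely the PVALID versus DISALLOWED decision, not the encoding step.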
There may be other differences between UTC derived values and IANA
derived values for Unicode > 12 and <= 15 once IANA gets around to
publishing tables, but we can't tell until that happens, and I'm not
holding my breath since they haven't published anything for 12.1.0
(2019-03), 13.0.0 (2019-11), 14.0.0 (2021) or 15.0.0 (2022-05).
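Once both sources exist for a given Unicode version, finding such differences is a mechanical comparison. Here is a rough sketch of how that could look, assuming both tables have already been parsed into (first, last, property) triples. The function names and the sample data are illustrative only: the samples are tiny hand-written excerpts, not the real tables, and parsing the actual UTC and IANA file formats is left out.

```python
def expand(ranges):
    """Expand (first, last, property) triples into {code_point: property}."""
    table = {}
    for first, last, prop in ranges:
        for cp in range(first, last + 1):
            table[cp] = prop
    return table


def diff_tables(a, b):
    """Return sorted (code_point, prop_in_a, prop_in_b) where the tables differ.

    A code point missing from one table is reported with None on that side.
    """
    ta, tb = expand(a), expand(b)
    return [
        (cp, ta.get(cp), tb.get(cp))
        for cp in sorted(ta.keys() | tb.keys())
        if ta.get(cp) != tb.get(cp)
    ]


# Hand-written sample rows for illustration; only the U+19DA values
# are taken from the discussion above.
utc_sample = [
    (0x0041, 0x0041, "DISALLOWED"),
    (0x0061, 0x0061, "PVALID"),
    (0x19DA, 0x19DA, "PVALID"),
]
iana_sample = [
    (0x0041, 0x0041, "DISALLOWED"),
    (0x0061, 0x0061, "PVALID"),
    (0x19DA, 0x19DA, "DISALLOWED"),
]

for cp, utc_prop, iana_prop in diff_tables(utc_sample, iana_sample):
    print(f"U+{cp:04X}: UTC={utc_prop} IANA={iana_prop}")
# prints: U+19DA: UTC=PVALID IANA=DISALLOWED
```

Expanding every range into a dict is wasteful for the full code space, but it keeps the comparison obvious; a real tool would probably walk the two sorted range lists in parallel instead.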
I don't have a strong opinion on this, but some of the factors involved
are:
1) consistency with other implementations
2) importance of U+19DA (which is rare) and practical problems resulting
from this change (apparently few)
3) supporting Unicode > 12 now (most important of these factors IMO)
4) domain name stability: once derived for a code point, the property
shouldn't change in the future. Thus, the change in 0.12 could be
considered the bug here. I believe I agreed with the approach used by
RFC 6452 at the time it was published, but revisiting this issue today I
find myself in the opposite camp. It is a subjective judgement call,
and there are good arguments for both sides.
If you want to provide feedback on this, please respond here or to this
issue:
https://gitlab.com/libidn/libidn2/-/issues/112
/Simon