Hello,
I'm from security at Google. I'm working on a differential fuzzer
between libidn2 and the Python idna package. (Essentially, I've written
a program that rapidly feeds the same inputs to libidn2 and Python idna
and checks that both produce the same result.) I wrote this to find
bugs in the Python idna package, but I think I've found three bugs in
libidn2 instead, and I'm reaching out to report them.
In all of these cases, libidn2 refuses to encode the specified domain
name and returns an error, while Python idna encodes it fine. Also, in
all of these cases, libidn2 will happily /decode/ the punycode generated
by Python idna back into the same input that it refuses to encode.
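The comparison described above can be sketched roughly as follows. This is a minimal illustration, not the actual fuzzer: `strict_encode` and `lenient_encode` are hypothetical stand-ins invented for this example, and in a real harness the two callables would wrap libidn2 and the Python idna package.

```python
from typing import Callable, Optional, Tuple

# An outcome is ("ok", encoded_bytes) or ("error", None).
Outcome = Tuple[str, Optional[bytes]]


def outcome(encode: Callable[[str], bytes], name: str) -> Outcome:
    """Run one encoder and normalise its result so outcomes can be compared."""
    try:
        return ("ok", encode(name))
    except Exception:
        # Error messages differ between libraries, so only the fact of
        # failure is recorded, not the message.
        return ("error", None)


def disagree(encode_a: Callable[[str], bytes],
             encode_b: Callable[[str], bytes],
             name: str) -> bool:
    """True when the two encoders do not produce the same outcome."""
    return outcome(encode_a, name) != outcome(encode_b, name)


# Hypothetical stand-ins for the two libraries, for illustration only.
def strict_encode(name: str) -> bytes:
    # Raises UnicodeEncodeError on any non-ASCII input.
    return name.encode("ascii")


def lenient_encode(name: str) -> bytes:
    # Punycode-encodes each non-ASCII label with the standard-library codec.
    return b".".join(
        label.encode("ascii") if label.isascii()
        else b"xn--" + label.encode("punycode")
        for label in name.split(".")
    )


if __name__ == "__main__":
    for candidate in ["example.com", "\u0e90.xyz"]:
        print(repr(candidate), disagree(strict_encode, lenient_encode, candidate))
```

Every input for which `disagree` returns true is a candidate for manual triage, which is how the three cases below were singled out.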
The first input causes libidn2 to report the error "domain name longer
than 255 characters". However, the punycode form of the domain name is
only 146 characters long.
* Domain name:
髦暩晦晦晦獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳筳獳
싂.퐀쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓄쓼쓄쓄쓄쓄쓄쓄쓄쓄쓄㻄쓄쓄럄䄀싂.뼀猀獳獳
獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳獳ⱁ㩁
* Domain name hex codepoints:
['9ae6', '66a9', '6666', '6666', '6666', '7373', '7373', '7373',
'7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
'7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
'7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
'7b73', '7373', 'c2c2', '2e', 'd400', 'c4c4', 'c4c4', 'c4c4',
'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
'c4fc', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4', 'c4c4',
'c4c4', 'c4c4', '3ec4', 'c4c4', 'c4c4', 'b7c4', '4100', 'c2c2',
'2e', 'bf00', '7300', '7373', '7373', '7373', '7373', '7373',
'7373', '7373', '7373', '7373', '7373', '7373', '7373', '7373',
'7373', '7373', '7373', '7373', '7373', '2c41', '3a41']
* Punycode:
xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
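Since the error is about length, it may help to measure the name in each of its representations. The sketch below uses only the Python standard library (the `punycode` codec) and the function names are made up for this example. It reports the length of the punycode form, the longest label, the number of code points, and the UTF-8 byte length. Which of these quantities libidn2 compares against 255 is not something this report establishes; the DNS limits themselves apply to the encoded form (63 octets per label, 255 octets for the whole name on the wire).

```python
def to_unicode(a_name: str) -> str:
    """Decode every "xn--" label with the standard-library punycode codec."""
    labels = []
    for label in a_name.split("."):
        if label.lower().startswith("xn--"):
            label = label[4:].encode("ascii").decode("punycode")
        labels.append(label)
    return ".".join(labels)


def length_report(a_name: str) -> dict:
    """Lengths of one domain name in several representations."""
    u_name = to_unicode(a_name)
    return {
        "a_label_chars": len(a_name),
        "longest_a_label": max(len(label) for label in a_name.split(".")),
        "code_points": len(u_name),
        "utf8_bytes": len(u_name.encode("utf-8")),
    }


# The punycode string from this report, copied verbatim.
PUNYCODE = (
    "xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk."
    "xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u."
    "xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b"
)

if __name__ == "__main__":
    for key, value in length_report(PUNYCODE).items():
        print(f"{key}: {value}")
```

If the UTF-8 byte count turns out to exceed 255 while the punycode form does not, that would be one place to look, but that is a guess and not a confirmed diagnosis.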
The second input causes libidn2 to fail encoding with the error "string
has forbidden bi-directional properties". To determine which library was
wrong, I implemented the Bidi rule myself, and I believe this input
should be valid.
* Domain name:
ਗ਼.ÿ߽̃̃̃
* Domain name hex codepoints:
['a17', 'a3c', '2e', 'ff', '7fd', '303', '303', '303']
* Punycode:
xn--lkvaa9xr87caaaaaaaaaaaaaaaaaaaaaaaaaaa7968dcp2n7tvk.xn--p9mx3db62rwgjlncaaaaaaaaaaaaaaaaaaaba41m468u.xn--bfj606ben8bfnaaaaaaaaaaaaaaaaaa79563b
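For reference, a check along these lines can be sketched as below. This is not the exact code used for the report; it is a minimal reading of the six conditions of the Bidi rule in RFC 5893, section 2, using the bidi classes from Python's `unicodedata` module. All function names are invented for this example. The rule is only applied when the domain contains at least one label with a character of class R, AL, or AN.

```python
import unicodedata

RTL_TRIGGER = {"R", "AL", "AN"}
RTL_ALLOWED = {"R", "AL", "AN", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}
LTR_ALLOWED = {"L", "EN", "ES", "CS", "ET", "ON", "BN", "NSM"}


def bidi_classes(label: str) -> list:
    """Bidi class of every character, per the interpreter's Unicode tables."""
    return [unicodedata.bidirectional(ch) for ch in label]


def last_non_nsm(classes: list) -> str:
    """Bidi class of the last character that is not a nonspacing mark."""
    for cls in reversed(classes):
        if cls != "NSM":
            return cls
    return ""


def label_satisfies_bidi_rule(label: str) -> bool:
    classes = bidi_classes(label)
    # Condition 1: the first character must be L, R, or AL.
    if not classes or classes[0] not in {"L", "R", "AL"}:
        return False
    if classes[0] in {"R", "AL"}:
        # Conditions 2-4 (RTL label): allowed classes, ending, EN/AN mix.
        if not set(classes) <= RTL_ALLOWED:
            return False
        if last_non_nsm(classes) not in {"R", "AL", "EN", "AN"}:
            return False
        return not ("EN" in classes and "AN" in classes)
    # Conditions 5-6 (LTR label): allowed classes and ending.
    if not set(classes) <= LTR_ALLOWED:
        return False
    return last_non_nsm(classes) in {"L", "EN"}


def domain_passes_bidi(domain: str) -> bool:
    labels = domain.split(".")
    is_bidi_domain = any(RTL_TRIGGER & set(bidi_classes(l)) for l in labels)
    if not is_bidi_domain:
        return True  # the rule only applies to Bidi domain names
    return all(label_satisfies_bidi_rule(l) for l in labels)


# The code points from this report.
REPORTED = "".join(
    chr(int(h, 16))
    for h in ['a17', 'a3c', '2e', 'ff', '7fd', '303', '303', '303']
)
print(unicodedata.unidata_version, bidi_classes(REPORTED),
      domain_passes_bidi(REPORTED))
```

The classes printed for the reported name depend on the Unicode data bundled with the interpreter, so the version is printed alongside the result; an implementation with different tables can reach a different verdict for the same input.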
The third input causes libidn2 to report a disallowed character. This
appears to be not a "bug" as such, but rather out-of-date Unicode tables
in libidn2. The offending character (U+0E90)
<https://www.fileformat.info/info/unicode/char/0e90/index.htm> was only
added to Unicode in 2019.
* Domain name:
ຐ.xyz
* Domain name hex codepoints:
['e90', '2e', '78', '79', '7a']
* Punycode:
xn--46c.xyz
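To illustrate how the verdict on a character depends on the version of the Unicode tables, the sketch below looks up each code point of this name in the tables bundled with the Python interpreter. Note that this inspects Python's own data, not libidn2's; the `describe` helper is invented for this example. A category of "Cn" means the code point is unassigned as far as those tables are concerned.

```python
import unicodedata


def describe(code_point: int) -> dict:
    """Summarise what the local Unicode tables know about one code point."""
    ch = chr(code_point)
    return {
        "code_point": f"U+{code_point:04X}",
        "category": unicodedata.category(ch),  # "Cn" means unassigned here
        "name": unicodedata.name(ch, "<not in these tables>"),
    }


print("Unicode data version:", unicodedata.unidata_version)
for hex_cp in ['e90', '2e', '78', '79', '7a']:
    print(describe(int(hex_cp, 16)))

# The punycode itself does not depend on the tables: it is pure arithmetic
# on the code point values.
print(b"xn--" + "\u0e90".encode("punycode"))
```

An interpreter whose tables predate the character would report it as unassigned, which is the same kind of mismatch suspected here.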