help-libidn

Re: libidn2 0.13


From: Tim Rühsen
Subject: Re: libidn2 0.13
Date: Sat, 07 Jan 2017 19:48:42 +0100
User-agent: KMail/5.2.3 (Linux/4.8.0-2-amd64; KDE/5.28.0; x86_64; ; )

On Tuesday, 3 January 2017 10:00:53 CET Nikos Mavrogiannopoulos wrote:
> On Mon, Jan 2, 2017 at 10:17 PM, Tim Rühsen <address@hidden> wrote:
> >> * APIs more like libidn's that take a full domain name and do proper
> >>   operations on them.  In several forms, UTF-8, USC-32, locale encoded,
> >>   etc.
> >>
> >> * APIs to decode an IDNA2008 domain from ACE to Unicode format.  That is
> >>   not described by the IDNA2008 RFCs, interestingly enough, but I
> >>   suspect people will want it, hah!
> > 
> > Wget used to use ACE decoding from libidn, but only for logging/display
> > purposes. Since we switched to libidn2, the UTF-8/locale names are no
> > longer displayed :-). With such a function I would reactivate the
> > logging code.
> 
> For gnutls unfortunately the reverse is really necessary and that's
> the reason we are stuck with libidn. We need to be able to print the
> actual name of the certificate and not only the punycode which is
> non-human readable for most languages.

Then let's define a function.

Let me start with a suggestion to get the ball rolling:
        int idn2_fromASCII (const uint8_t *src, uint8_t **dst)

'src' is a UTF-8 encoded string (domain name).
'dst' is the punycode-decoded output, also UTF-8.

Examples:
foo.bar -> foo.bar
übel.de -> übel.de
xn--bel-goa.de -> übel.de
xn--bel-goa.größer.de -> übel.größer.de

Casing: we leave the input as it is. Only domain labels that start with xn--
will be converted, without any casing check.
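
To make the proposal concrete, here is a rough, self-contained sketch of the
per-label behaviour described above. It is not libidn2 code: the name
sketch_from_ascii is a hypothetical stand-in for the proposed
idn2_fromASCII, it uses char instead of uint8_t for brevity, and it assumes
that only a literal lowercase "xn--" prefix marks a label for decoding. The
decoder follows the generic Punycode algorithm from RFC 3492 and does no
IDNA2008 validation of the result.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Punycode parameters from RFC 3492, section 5; MAX_CP is this sketch's cap. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 128, MAX_CP = 64 };

/* Bias adaptation, RFC 3492 section 6.1. */
static uint32_t adapt(uint32_t delta, uint32_t numpoints, int first)
{
    uint32_t k = 0;
    delta = first ? delta / DAMP : delta / 2;
    delta += delta / numpoints;
    for (; delta > ((BASE - TMIN) * TMAX) / 2; k += BASE)
        delta /= BASE - TMIN;
    return k + (BASE * delta) / (delta + SKEW);
}

/* Map a punycode digit to 0..35, or -1 if the character is not a digit. */
static int digit_value(unsigned char c)
{
    if (c >= 'a' && c <= 'z') return c - 'a';
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= '0' && c <= '9') return c - '0' + 26;
    return -1;
}

/* Decode the part of a label after "xn--" into code points (RFC 3492, 6.2).
   cp must hold MAX_CP entries. Returns the count, or -1 on malformed input. */
static int puny_decode(const char *in, size_t len, uint32_t *cp)
{
    uint32_t n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
    size_t out = 0, b = 0, pos;

    for (size_t j = 0; j < len; j++)
        if (in[j] == '-') b = j;                 /* position of last delimiter */
    if (b > MAX_CP) return -1;
    for (size_t j = 0; j < b; j++) {             /* copy the basic code points */
        if ((unsigned char)in[j] >= 0x80) return -1;
        cp[out++] = (unsigned char)in[j];
    }
    pos = b > 0 ? b + 1 : 0;
    while (pos < len) {
        uint32_t oldi = i, w = 1;
        for (uint32_t k = BASE;; k += BASE) {    /* read one variable-length integer */
            if (pos >= len) return -1;
            int d = digit_value((unsigned char)in[pos++]);
            if (d < 0 || (uint32_t)d > (UINT32_MAX - i) / w) return -1;
            i += (uint32_t)d * w;
            uint32_t t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
            if ((uint32_t)d < t) break;
            if (w > UINT32_MAX / (BASE - t)) return -1;
            w *= BASE - t;
        }
        bias = adapt(i - oldi, (uint32_t)out + 1, oldi == 0);
        if (i / (out + 1) > UINT32_MAX - n) return -1;
        n += (uint32_t)(i / (out + 1));
        i %= (uint32_t)(out + 1);
        if (out >= MAX_CP || n > 0x10FFFF || (n >= 0xD800 && n <= 0xDFFF))
            return -1;
        memmove(cp + i + 1, cp + i, (out - i) * sizeof *cp);
        cp[i++] = n;                             /* insert n at position i */
        out++;
    }
    return (int)out;
}

/* Write one code point as UTF-8; returns the number of bytes written. */
static size_t put_utf8(uint32_t c, char *p)
{
    if (c < 0x80) { p[0] = (char)c; return 1; }
    if (c < 0x800) {
        p[0] = (char)(0xC0 | c >> 6); p[1] = (char)(0x80 | (c & 0x3F));
        return 2;
    }
    if (c < 0x10000) {
        p[0] = (char)(0xE0 | c >> 12); p[1] = (char)(0x80 | (c >> 6 & 0x3F));
        p[2] = (char)(0x80 | (c & 0x3F));
        return 3;
    }
    p[0] = (char)(0xF0 | c >> 18); p[1] = (char)(0x80 | (c >> 12 & 0x3F));
    p[2] = (char)(0x80 | (c >> 6 & 0x3F)); p[3] = (char)(0x80 | (c & 0x3F));
    return 4;
}

/* Hypothetical stand-in for the proposed idn2_fromASCII(): decodes labels that
   start with "xn--" and copies every other label unchanged.
   Returns 0 on success, -1 on error; the caller frees *dst. */
int sketch_from_ascii(const char *src, char **dst)
{
    /* Each input byte yields at most one code point, so 4 bytes each suffice. */
    char *out = malloc(4 * strlen(src) + 1), *p = out;
    if (!out) return -1;
    for (const char *label = src;;) {
        size_t len = strcspn(label, ".");
        if (len > 4 && memcmp(label, "xn--", 4) == 0) {
            uint32_t cp[MAX_CP];
            int count = puny_decode(label + 4, len - 4, cp);
            if (count < 0) { free(out); return -1; }
            for (int j = 0; j < count; j++) p += put_utf8(cp[j], p);
        } else {
            memcpy(p, label, len);
            p += len;
        }
        if (label[len] == '\0') break;
        *p++ = '.';
        label += len + 1;
    }
    *p = '\0';
    *dst = out;
    return 0;
}
```

Calling sketch_from_ascii("xn--bel-goa.de", &out) leaves the UTF-8 string
"übel.de" in out, while "foo.bar" and "übel.de" come back unchanged, which
matches the examples above.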

Why UTF-8 for both input and output?
- Most applications already work with UTF-8 internally.
- It is easy to convert to UTF-16/UTF-32 (UCS-2/UCS-4).
- Leave charset transcoding out of the library.
- ...
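
To illustrate the "easy to convert" point, a minimal sketch of a UTF-8 to
UTF-32 conversion on the application side could look like this. The function
name utf8_to_utf32 is made up for this example and is not part of libidn or
libidn2. It only checks the byte structure and does not reject overlong forms
or surrogates, which a real converter should do.

```c
#include <stddef.h>
#include <stdint.h>

/* Convert a NUL-terminated UTF-8 string to UTF-32 code points.
   Returns the number of code points written, or -1 if the input is
   structurally malformed or does not fit into 'cap' entries. */
long utf8_to_utf32(const char *s, uint32_t *out, size_t cap)
{
    const unsigned char *p = (const unsigned char *)s;
    size_t n = 0;

    while (*p) {
        uint32_t c;
        int extra;                      /* number of continuation bytes */

        if (*p < 0x80)                { c = *p;        extra = 0; }
        else if ((*p & 0xE0) == 0xC0) { c = *p & 0x1F; extra = 1; }
        else if ((*p & 0xF0) == 0xE0) { c = *p & 0x0F; extra = 2; }
        else if ((*p & 0xF8) == 0xF0) { c = *p & 0x07; extra = 3; }
        else return -1;                 /* stray continuation or invalid lead byte */
        p++;

        while (extra-- > 0) {
            if ((*p & 0xC0) != 0x80) return -1;   /* also catches early NUL */
            c = c << 6 | (*p++ & 0x3F);
        }
        if (n == cap) return -1;
        out[n++] = c;
    }
    return (long)n;
}
```

The reverse direction is similarly short, which is one argument for keeping
the library interface UTF-8 only.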

Do we need an additional 'flags' argument for future use? Why not.

If we want charset transcoding, we also need the input and output charsets,
and maybe the language as well (e.g. think of Turkish i/I casing). Do we want
that?

Regards, Tim


