Re: libidn2 next steps
From: Simon Josefsson
Subject: Re: libidn2 next steps
Date: Wed, 28 Dec 2016 09:02:40 +0100
User-agent: Gnus/5.130014 (Ma Gnus v0.14) Emacs/24.4 (gnu/linux)

Tim Rühsen <address@hidden> writes:
> On Dienstag, 27. Dezember 2016 12:26:44 CET Simon Josefsson wrote:
>> Now that Tim implemented TR46 and there is a release with it out, I'm
>> pondering some next steps for libidn2, which may include:
>>
>> * Better APIs to simplify conversion for applications
>> - Compare libidn APIs that take various string forms.
>> - In particular, the API should take entire domain names
>> instead of only labels.
>> - Also in particular, there should be an API for decoding.
>> * Other language bindings? Compare libidn.
>>
>> More ideas are welcome!
>>
>> Continuous integration on gitlab would be nice. It has been some time
>> since I last played with it... things have likely changed.
>
> * Reducing static table size (tr46map.*)
> - using a trie for idna_map would allow compact storage of codepoints /
> codepoint ranges while still giving fast access/search
> - detect and eliminate duplicates in mapdata
> - compact storage of mapdata
>
> By 'compact storage' I mean using a continuation bit:
>   1 byte:  0x0-0x7f              -> 0xxxxxxx
>   2 bytes: 0x80-0x3fff           -> 1xxxxxxx 0xxxxxxx
>   3 bytes: 0x4000-0x1fffff       -> 1xxxxxxx 1xxxxxxx 0xxxxxxx
>   4 bytes: 0x200000-0xfffffff    -> 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
>   5 bytes: 0x10000000-0xffffffff -> 1xxxxxxx 1xxxxxxx 1xxxxxxx 1xxxxxxx 0xxxxxxx
Yeah, there is a lot of room for optimization in a library like this --
it should be relatively easy to try different storage algorithms and
benchmark them to see what works best in reality. As you've noticed,
I've taken a really naïve approach initially to get simple (but slow)
code. It is not always clear beforehand which approach will actually
perform better.
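
To make the continuation-bit layout quoted above concrete, here is a
minimal C sketch of one way it could be encoded and decoded. The names
varint_encode and varint_decode are hypothetical and are not libidn2
functions. The sketch assumes the most significant 7-bit group is
stored first, which the bit patterns in the quoted table do not spell
out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a libidn2 API): encode v into out using
   7 payload bits per byte. Every byte except the last has its high
   (continuation) bit set. out must have room for 5 bytes.
   Assumption: most significant group first. Returns bytes written. */
size_t varint_encode(uint32_t v, uint8_t *out)
{
    size_t n = 1, i;
    uint32_t t;

    assert(out != NULL);

    /* Count how many 7-bit groups are needed (1..5). */
    for (t = v >> 7; t != 0; t >>= 7)
        n++;

    for (i = 0; i < n; i++) {
        uint8_t group = (uint8_t)((v >> (7 * (n - 1 - i))) & 0x7f);
        out[i] = (i + 1 < n) ? (uint8_t)(group | 0x80) : group;
    }
    return n;
}

/* Hypothetical helper: decode one value from in[0..len-1] into *v.
   Returns the number of bytes consumed, or 0 if no terminating byte
   (high bit clear) is found within min(len, 5) bytes. A 5-byte input
   whose first group exceeds 0x0f silently loses its upper bits. */
size_t varint_decode(const uint8_t *in, size_t len, uint32_t *v)
{
    uint32_t acc = 0;
    size_t i;

    assert(in != NULL && v != NULL);

    for (i = 0; i < len && i < 5; i++) {
        acc = (acc << 7) | (uint32_t)(in[i] & 0x7f);
        if ((in[i] & 0x80) == 0) {
            *v = acc;
            return i + 1;
        }
    }
    return 0;
}
```

With this layout, values up to 0x7f take one byte and values up to
0x3fff take two. Whether that saves space in practice depends on how
the values in mapdata are distributed, so it would need to be measured.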
/Simon