UTF-8/non-ASCII chars in keys (was Re: [Sks-devel] 1.0.8 patches)
From: Jason Harris
Subject: UTF-8/non-ASCII chars in keys (was Re: [Sks-devel] 1.0.8 patches)
Date: Tue, 19 Oct 2004 18:10:30 -0400
User-agent: Mutt/1.4.2.1i
On Tue, Oct 19, 2004 at 11:33:41AM -0400, Jason Harris wrote:
> This seems to work on pks servers whether they send UTF-8 or not.
> For Noèl Koethe's keys, I can use ALT-h to generate è and get
> back both 0x307D56ED and 0x0986B74D on keyserver.kjsl.com:11371.
> This also works from the iso-8859-1 (assumed) search pages at
> stinkfoot.org (using elinks and lynx, anyway), which returns UTF-8
> results, and at dtype.org, which returns iso-8859-1 (assumed) results.
>
> On noreply.org, I only get 0x307D56ED, however. The links:
>
>
> http://keyserver.noreply.org/pks/lookup?search=no%C3%A8l+koethe&fingerprint=on&op=index
>
> http://keyserver.kjsl.com:11371/pks/lookup?search=no%C3%A8l+koethe&fingerprint=on&op=index
[self-reply]
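Actually, that worked only by luck on pks. pks uses isalnum(3) in
kd_add_userid_to_wordlist() to tokenize userid strings, so (in the C
locale, at least) every non-ASCII byte is treated as a word separator.
Testing with ispunct(3) and isspace(3) instead would seem a better choice
in the presence of non-ASCII characters.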
For key 0x0986B74D, whose userid is No\xe8\x6c K\xf6\x74he <noel koethe.net>
(Noèl Köthe, encoded in ISO 8859-1 rather than UTF-8), pks currently stores
the following "words" from the userid:
koethe
net
no
noel
the
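A minimal sketch of the isalnum(3)-based splitting described above, written as a standalone C function rather than the actual kd_add_userid_to_wordlist() code. The names list_words_isalnum() and append_word() and the space-separated output buffer are hypothetical; the sketch assumes the default "C" locale, where isalnum() is false for every byte above 0x7f, and keeps only the "> 1 char" rule from the pks loop. Run on the userid of 0x0986B74D, it yields the five words listed above, in input order rather than sorted.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append one lowercased word to the space-separated
 * output buffer, mirroring the "store it if it's > 1 char" rule. */
static void append_word(char *out, size_t outsz,
                        const unsigned char *start, const unsigned char *end)
{
    size_t used = strlen(out);
    size_t len = (size_t)(end - start);

    if (len <= 1)
        return;                              /* single characters are dropped */
    if (used + (used ? 1 : 0) + len + 1 > outsz)
        return;                              /* no room left; skip the word */
    if (used)
        out[used++] = ' ';
    for (size_t i = 0; i < len; i++)         /* lowercases ASCII only in the C locale */
        out[used++] = (char)tolower(start[i]);
    out[used] = '\0';
}

/* Old pks-style tokenizer (sketch): anything that is not isalnum() separates
 * words, so in the C locale every non-ASCII byte splits a word in two. */
void list_words_isalnum(const char *userid_str, char *out, size_t outsz)
{
    const unsigned char *userid = (const unsigned char *)userid_str;
    size_t userid_len = strlen(userid_str);
    const unsigned char *start, *end = userid;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (end < userid + userid_len) {
        start = end;                         /* find beginning of word */
        while (start < userid + userid_len && !isalnum(*start))
            start++;
        end = start;                         /* find end of word */
        while (end < userid + userid_len && isalnum(*end))
            end++;
        append_word(out, outsz, start, end);
    }
}
```

With "No\xe8l K\xf6the <noel koethe.net>" as input, the output buffer holds "no the noel koethe net": the "l" of Noèl and the "K" of Köthe are dropped as single characters.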
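Note that "no" and "the" are merely fragments of "Noèl" and "Köthe", split
at the non-ASCII bytes (the leftover "l" and "K" are dropped as single
characters), while "noel" and "koethe" come only from the email address.
With the following changes to kd_add_userid_to_wordlist() in kd_generic.c: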
   while (end < userid+userid_len) {
      /* find beginning of word */
      start = end;
-     while ((start < userid+userid_len) && !isalnum(*start))
+     while ((start < userid+userid_len) &&
+            (ispunct(*start) || isspace (*start)))
         start++;
      /* find end of word */
      end = start;
-     while ((end < userid+userid_len) && isalnum(*end))
+     while ((end < userid+userid_len) &&
+            (!ispunct(*end) && !isspace (*end)))
         end++;
      /* store it if it's > 1 char */
pks instead stores the following complete words from the userid
(non-ASCII bytes printed as hex escapes):
koethe
k\f6the
net
noel
no\e8l
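The patched loops can be sketched as a standalone C function too. list_words_patched() and append_word() are hypothetical helpers, not pks code, and the sketch assumes the C locale, where ispunct() and isspace() are false for bytes above 0x7f, so those bytes stay inside words. On the userid of 0x0986B74D it yields the five words listed above, in input order.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: append one lowercased word (> 1 char) to the
 * space-separated output buffer; non-ASCII bytes are copied unchanged. */
static void append_word(char *out, size_t outsz,
                        const unsigned char *start, const unsigned char *end)
{
    size_t used = strlen(out);
    size_t len = (size_t)(end - start);

    if (len <= 1 || used + (used ? 1 : 0) + len + 1 > outsz)
        return;                              /* too short, or no room left */
    if (used)
        out[used++] = ' ';
    for (size_t i = 0; i < len; i++)
        out[used++] = (char)tolower(start[i]);
    out[used] = '\0';
}

/* Patched tokenizer (sketch): only punctuation and whitespace separate
 * words, so raw 8-bit and UTF-8 bytes remain part of the word. */
void list_words_patched(const char *userid_str, char *out, size_t outsz)
{
    const unsigned char *userid = (const unsigned char *)userid_str;
    size_t userid_len = strlen(userid_str);
    const unsigned char *start, *end = userid;

    assert(out != NULL && outsz > 0);
    out[0] = '\0';
    while (end < userid + userid_len) {
        start = end;                         /* skip separators */
        while (start < userid + userid_len &&
               (ispunct(*start) || isspace(*start)))
            start++;
        end = start;                         /* scan to the next separator */
        while (end < userid + userid_len &&
               !ispunct(*end) && !isspace(*end))
            end++;
        append_word(out, outsz, start, end);
    }
}
```

The sketch walks the userid as unsigned char on purpose: passing a negative char value (a byte above 0x7f where plain char is signed) to ispunct() or isspace() is undefined behaviour. Whether the real pks code needs such a cast depends on how userid is declared there, which the excerpt above does not show.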
This seems fine, but elinks (using the ISO 8859-1 charset) and lynx send
the query string:
http://localhost:11371/pks/lookup?op=index&search=no%C3%A8l&fingerprint=on
with è encoded as UTF-8 (%C3%A8), instead of:
http://localhost:11371/pks/lookup?op=index&search=no%e8l&fingerprint=on
with è as the single ISO 8859-1 byte %e8, which is what is needed to find
no\e8l on 0x0986B74D. The first query string does, however, return
0x307D56ED, another of Noèl's keys, because that key's userid is itself
encoded in UTF-8.
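To make the byte-level difference concrete, here is a small sketch with hypothetical helpers (latin1_to_utf8() and percent_encode(), not part of pks or any browser) that converts an ISO 8859-1 string to UTF-8 and percent-encodes it for a query string. It assumes the RFC 3986 "unreserved" set is left unescaped. It shows why the two URLs differ: è is the single byte 0xE8 in ISO 8859-1 but the two bytes 0xC3 0xA8 in UTF-8, so a raw-byte match in worddb succeeds only when the query uses the same encoding as the stored userid.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert ISO 8859-1 bytes to UTF-8.  Each Latin-1
 * byte is code point U+0000..U+00FF; bytes >= 0x80 become two UTF-8 bytes. */
void latin1_to_utf8(const char *in_str, char *out, size_t outsz)
{
    const unsigned char *in = (const unsigned char *)in_str;
    size_t o = 0;

    assert(out != NULL && outsz > 0);
    for (; *in != '\0'; in++) {
        if (*in < 0x80) {
            if (o + 1 >= outsz) break;
            out[o++] = (char)*in;
        } else {
            if (o + 2 >= outsz) break;
            out[o++] = (char)(0xC0 | (*in >> 6));    /* 0xE8 -> 0xC3 */
            out[o++] = (char)(0x80 | (*in & 0x3F));  /* 0xE8 -> 0xA8 */
        }
    }
    out[o] = '\0';
}

/* Hypothetical helper: percent-encode every byte outside the RFC 3986
 * unreserved set (ALPHA, DIGIT, "-", ".", "_", "~") using uppercase hex. */
void percent_encode(const char *in_str, char *out, size_t outsz)
{
    static const char hex[] = "0123456789ABCDEF";
    const unsigned char *in = (const unsigned char *)in_str;
    size_t o = 0;

    assert(out != NULL && outsz > 0);
    for (; *in != '\0'; in++) {
        unsigned char c = *in;
        int unreserved = (c >= 'A' && c <= 'Z') || (c >= 'a' && c <= 'z') ||
                         (c >= '0' && c <= '9') || c == '-' || c == '.' ||
                         c == '_' || c == '~';
        if (unreserved) {
            if (o + 1 >= outsz) break;
            out[o++] = (char)c;
        } else {
            if (o + 3 >= outsz) break;
            out[o++] = '%';
            out[o++] = hex[c >> 4];
            out[o++] = hex[c & 0x0F];
        }
    }
    out[o] = '\0';
}
```

Encoding "no\xe8l" directly gives "no%E8l", while converting it to UTF-8 first gives "no%C3%A8l". The hex digits in a percent-escape are case-insensitive, so %E8 and %e8 request the same byte.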
Therefore, the above patch seems to tokenize both older 8-bit and UTF-8
userids properly and stores the words as raw bytes in worddb. elinks and
lynx, at least, send UTF-8 query strings, which match newer keys whose
userids are encoded in UTF-8. Older keys can still be found by
percent-encoding their raw 8-bit bytes (e.g. %e8), when necessary.
NB: For full effect, anyone using this patch to more fully support UTF-8
needs to make a keydump and rebuild their pks database(s) from scratch.
I imagine a similar fix is necessary for SKS.
--
Jason Harris | NIC: JH329, PGP: This _is_ PGP-signed, isn't it?
address@hidden _|_ web: http://keyserver.kjsl.com/~jharris/
Got photons? (TM), (C) 2004