Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.


From: Alex Shinn
Subject: Re: [Chicken-users] [Q] uri-common has problem with UTF-8 uri.
Date: Tue, 15 Jan 2013 14:44:08 +0900

On Tue, Jan 15, 2013 at 2:23 PM, Ivan Raikov <address@hidden> wrote:
> Hi again,
>
>    I have now extended the utf8 code in uri-generic, so that UTF-8 sequences are percent-encoded as lists of the form '(% h1 h2 [% h3 h4 ...]). The percent-decoding routine will not decode sequences of more than one byte, so percent-encoding normalization no longer interferes with encoded UTF-8 sequences. I have also renamed the iri->uri routine to utf8-string->uri. I think its behavior is now compliant with both RFC 3986 and RFC 3987:
>
> (utf8-string->uri "http://example.com/삼계탕") =>
>
> #(URI scheme=http authority=#(URIAuth host="example.com" port=#f) path=(/ "%EC%82%BC%EA%B3%84%ED%83%95") query=#f fragment=#f)

This result looks broken.  As I noted in my previous mail, the URI representation
already handles non-ASCII characters and escapes on output:

$ csi -R uri-common
#;1> (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕"))
#<URI-common: scheme="http" port=#f host="127.0.0.1" path=(/ "삼계탕") query=#f fragment=#f>
#;2> (uri->string (make-uri scheme: "http" host: "127.0.0.1" path: '(/ "삼계탕")))
"http://127.0.0.1/82%BCB3%8483%95"

If you put percent escapes _inside_ the internal path representation,
you'll get double escaping.
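
The double-escaping effect can be shown independently of uri-common. The following is a minimal sketch in Python, where the standard library's `urllib.parse.quote` stands in for whatever escaping `uri->string` applies on output. It illustrates the general effect only and is not uri-common's code.

```python
from urllib.parse import quote

segment = "삼계탕"

# Escaping once, at output time, yields the percent-encoded UTF-8 bytes.
escaped_once = quote(segment, safe="")
print(escaped_once)   # %EC%82%BC%EA%B3%84%ED%83%95

# If the already-escaped text is stored as the internal path segment and
# escaped again on output, every "%" is itself encoded as "%25".
escaped_twice = quote(escaped_once, safe="")
print(escaped_twice)  # %25EC%2582%25BC%25EA%25B3%2584%25ED%2583%2595
```

A server receiving the second form would decode it to the literal text `%EC%82%BC...` rather than to the original characters.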

Parsing is a separate matter: utf8-string->uri should return
the URI object without error, but with unescaped values in the
path and query, matching the result of the make-uri call above.
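
One way to picture this split between parsing and output is sketched below, again in Python with only the standard library. The helper names `parse_path_segments` and `serialize_path` are hypothetical and exist only for this illustration; they do not correspond to uri-common or uri-generic procedures.

```python
from urllib.parse import urlsplit, quote, unquote

def parse_path_segments(uri):
    """Return the path segments of `uri` in unescaped form."""
    path = urlsplit(uri).path
    # Split on "/" first, then decode each segment, so that an escaped
    # "%2F" inside a segment is not mistaken for a separator.
    return [unquote(seg) for seg in path.split("/") if seg]

def serialize_path(segments):
    """Escape each segment exactly once, at output time."""
    return "/" + "/".join(quote(seg, safe="") for seg in segments)

# Escaped and raw UTF-8 input both parse to the same internal value.
from_escaped = parse_path_segments("http://example.com/%EC%82%BC%EA%B3%84%ED%83%95")
from_raw = parse_path_segments("http://example.com/삼계탕")
print(from_escaped == from_raw)      # True
print(serialize_path(from_escaped))  # /%EC%82%BC%EA%B3%84%ED%83%95
```

Keeping the internal representation unescaped means escaping happens in one place only, so the round trip cannot accumulate extra `%25` sequences.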

Separately, the actual escaped output above looks buggy: some
characters, such as the leading "%EC%", appear to be dropped.
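
The pattern of missing characters can be checked against the expected encoding. The sketch below, in Python with the standard library, compares the path reported in the transcript with the percent-encoded UTF-8 bytes of each character. It only describes what is missing and makes no claim about the cause inside uri-common.

```python
from urllib.parse import quote

text = "삼계탕"
observed = "82%BCB3%8483%95"  # path portion of the uri->string output above

# Expected escaping of each character: three UTF-8 bytes, each as "%XX".
per_char = [quote(ch, safe="") for ch in text]
print(per_char)  # ['%EC%82%BC', '%EA%B3%84', '%ED%83%95']

# Dropping the first four characters of each group (the lead byte "%XX"
# plus the following "%") reproduces the observed output exactly.
reconstructed = "".join(group[4:] for group in per_char)
print(reconstructed == observed)  # True
```

So the observed string is consistent with the lead byte of every three-byte sequence, together with the next "%", being lost, not just the first "%EC%".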

-- 
Alex

