guile-devel



From: Vivien Kraus
Subject: Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Thu, 02 Nov 2023 21:48:55 +0100
User-agent: Evolution 3.46.4

Hello Nathan!

On Thursday, 02 November 2023 at 16:00 -0400, Nathan wrote:
> There is a problem and I fixed it by rewriting a bunch of code myself
> because I need similar code.

Thank you!

> remove-dot-segments:
> You cannot split-and-decode-uri-path and then
> encode-and-join-uri-path.
> Those are terrible functions that don't work on all URIs.
> URI schemes are allowed to specify that certain reserved characters
> (sub-delims) are special.
> In that case, a sub-delim that IS escaped is different from a
> sub-delim that IS NOT escaped.
> 
> Example input to your remove-dot-segments:
> (resolve-relative-reference
>  (string->uri-reference "/")
>  (string->uri-reference "excitement://a.com/a!a!%21!"))
> Your wrong output:
> excitement://a.com/a%21a%21%21%21

I see.
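
To make the problem concrete, here is a minimal sketch of the round
trip, assuming only the split-and-decode-uri-path and
encode-and-join-uri-path procedures exported by (web uri). The
round-trip helper is a hypothetical name used for illustration; it is
not part of the patch.

```scheme
(use-modules (web uri))

;; Hypothetical helper: decode a path into segments, then re-encode it.
;; Decoding turns "%21" into "!", and re-encoding escapes every
;; character outside the unreserved set, so both spellings end up as
;; "%21".
(define (round-trip path)
  (encode-and-join-uri-path (split-and-decode-uri-path path)))

(display (round-trip "/a!a!%21!"))
(newline)
;; prints: a%21a%21%21%21
```

After the round trip there is no way to tell which "!" was escaped in
the input, which is the loss of information described above.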

> 
> One solution would be to only percent-decode dots. Because dot is
> unreserved, that solution doesn't have any URI equivalence issues.
> But I still think decoding dots automatically is a bad, unexpected
> side-effect to have.
> I rewrote this function so that it:
> - works on both escaped and unescaped dots
> - doesn't unescape any unnecessary characters

This pushes the limits of my understanding of URIs: I did not know
that we had to treat '%2E%2E' the same as '..'. However, the RFC is
not very explicit about this:

2.3: Unreserved Characters:
   For consistency, percent-encoded octets in the ranges of ALPHA
   (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
   underscore (%5F), or tilde (%7E) should not be created by URI
   producers and, when found in a URI, should be decoded to their
   corresponding unreserved characters by URI normalizers.

5.2.1: Pre-parse the Base URI:
   Normalization of the base URI, as described in Sections 6.2.2 and
   6.2.3, is optional.  A URI reference must be transformed to its
   target URI before it can be normalized.

Did you find something more precise than that?  In any case, decoding
the dots is probably the least unsafe thing to do.
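
For illustration, here is a rough sketch of dot-segment removal that
recognizes escaped dots without decoding anything. It is not the code
from Nathan's patch. The names dot-kind and remove-dot-segments/sketch
are hypothetical, and the sketch assumes an absolute path (one that
starts with "/"); relative paths are not handled.

```scheme
;; Hypothetical helper: classify SEGMENT as 'dot, 'dot-dot or #f,
;; reading "%2E" / "%2e" as "." without decoding anything else.
(define (dot-kind segment)
  (let ((s (string-downcase segment)))
    (cond ((member s '("." "%2e")) 'dot)
          ((member s '(".." ".%2e" "%2e." "%2e%2e")) 'dot-dot)
          (else #f))))

;; Sketch of RFC 3986 section 5.2.4 for an absolute PATH.  Segments
;; that are not dot segments are copied verbatim, so "!" and "%21"
;; stay distinct.
(define (remove-dot-segments/sketch path)
  (let loop ((in (cdr (string-split path #\/))) ; drop the "" before the leading "/"
             (out '()))                         ; kept segments, newest first
    (if (null? in)
        (string-append "/" (string-join (reverse out) "/"))
        (let ((kind (dot-kind (car in)))
              (last? (null? (cdr in))))
          (if (not kind)
              (loop (cdr in) (cons (car in) out))
              (let ((kept (if (and (eq? kind 'dot-dot) (pair? out))
                              (cdr out) ; ".." removes the previous segment
                              out)))
                ;; A dot segment at the very end leaves a trailing "/".
                (loop (cdr in) (if last? (cons "" kept) kept))))))))

(display (remove-dot-segments/sketch "/a/b/c/./../../g"))
(newline)
;; prints: /a/g
(display (remove-dot-segments/sketch "/a/%2E%2E/b!%21"))
(newline)
;; prints: /b!%21
```

The second call shows the point of the approach: the escaped ".." is
honoured, while "!" and "%21" come out exactly as they went in.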

> 
> The test suite no longer needs to check for incorrect output either:
> > ;; The test suite checks for ';' characters, but Guile escapes
> > ;; them in URIs. Same for '='.
> 
> ----
> 
> resolve-relative-reference:
> I rewrote this procedure so it is shorter.
> I also added #:strict? to toggle "strict parser" as mentioned in the
> RFC.

As far as I understand, your code is correct. The tests pass.

Thank you again!

Best regards,

Vivien


