From: Nathan
Subject: Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Fri, 03 Nov 2023 13:49:37 -0400
Hi Vivien,
> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:
I wasn't able to find anything that MANDATED any normalization at all, either
before or after relative resolution. It is possible that treating %2E as a
literal dot in resolve-relative-reference counts as unwanted normalization.
But it's a safe operation in terms of URI equivalence*, and I think users
would be less confused to see %2E%2E disappear than to see it remain.
Also, what if the resolve-relative-reference procedure didn't treat %2E as a
dot? There isn't a uri-normalize procedure users could call afterwards to fix
that, and there isn't a version of uri-decode that selectively decodes JUST
the dot characters. Users would have to write a lot of code themselves to get
proper relative resolution, so we should do it for them.
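To give a sense of the amount of code involved, here is a minimal sketch in Python (not Guile, and not the code from the patch; `remove_dot_segments` and `_canonical_dots` are hypothetical names) of the RFC 3986 Section 5.2.4 algorithm, extended so that a segment spelled with %2E counts as a dot-segment. Only segments made entirely of (possibly escaped) dots are rewritten before the standard algorithm runs, so every other escape, such as %21, passes through untouched.

```python
# Hypothetical helper names; a sketch of RFC 3986 Section 5.2.4 that also
# recognizes percent-encoded dots ("%2E", any case) in complete segments.
DOT = (".", "%2e")
DOT_DOT = ("..", ".%2e", "%2e.", "%2e%2e")


def _canonical_dots(path: str) -> str:
    """Rewrite only segments made entirely of (possibly escaped) dots."""
    def fix(segment: str) -> str:
        lowered = segment.lower()
        if lowered in DOT:
            return "."
        if lowered in DOT_DOT:
            return ".."
        return segment  # any other escape (e.g. %21) is left alone
    return "/".join(fix(s) for s in path.split("/"))


def remove_dot_segments(path: str) -> str:
    """RFC 3986 Section 5.2.4, after canonicalizing escaped dot-segments."""
    inp = _canonical_dots(path)
    out = []  # output buffer: one entry per moved segment, with its "/"
    while inp:
        if inp.startswith("../"):        # step 2A
            inp = inp[3:]
        elif inp.startswith("./"):       # step 2A
            inp = inp[2:]
        elif inp.startswith("/./"):      # step 2B: "/./x" -> "/x"
            inp = inp[2:]
        elif inp == "/.":                # step 2B
            inp = "/"
        elif inp.startswith("/../"):     # step 2C: "/../x" -> "/x"
            inp = inp[3:]
            if out:
                out.pop()
        elif inp == "/..":               # step 2C
            inp = "/"
            if out:
                out.pop()
        elif inp in (".", ".."):         # step 2D
            inp = ""
        else:                            # step 2E: move the first segment
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            if end == -1:
                end = len(inp)
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


print(remove_dot_segments("/a/b/%2E%2E/c"))  # /a/c
print(remove_dot_segments("/a!a!%21!"))      # /a!a!%21!
```

Because the standard algorithm removes every complete "." and ".." segment, the dots decoded by `_canonical_dots` never reach the output; a segment like "%2Efoo" stays escaped.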
- Nathan
*References (RFC 3986) for the claim that treating %2E as a literal dot is
always okay:
- Section 2.3: percent-encoded unreserved characters are always equivalent to
  their decoded forms.
- Section 2.4: unreserved characters can be percent-decoded at any time.
- Section 6.2.2.3: dot-segments should be removed during normalization even
  when they appear outside a relative reference.
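A small Python illustration of why these sections matter, using Python's urllib.parse only as a stand-in for a decode-then-re-encode round trip (it is not Guile's split-and-decode-uri-path / encode-and-join-uri-path). A full round trip merges "!" and "%21" and then escapes both, while decoding only percent-encoded unreserved characters (Section 2.3) keeps the distinction. `decode_unreserved` is a hypothetical helper.

```python
import re
from urllib.parse import quote, unquote

# RFC 3986 Section 2.3: unreserved = ALPHA / DIGIT / "-" / "." / "_" / "~"
UNRESERVED = frozenset(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-._~"
)
PCT_ENCODED = re.compile(r"%([0-9A-Fa-f]{2})")


def decode_unreserved(text: str) -> str:
    """Decode only %XX escapes of unreserved characters; keep all others."""
    def replace(match):
        char = chr(int(match.group(1), 16))
        return char if char in UNRESERVED else match.group(0)
    return PCT_ENCODED.sub(replace, text)


path = "/a!a!%21!"

# Full round trip: unquote() turns "%21" into "!", so all four "!" look the
# same, and quote() (with only "/" marked safe) then escapes every one.
round_trip = quote(unquote(path), safe="/")
print(round_trip)                        # /a%21a%21%21%21

# Selective decoding: "!" is reserved, so "%21" stays escaped.
print(decode_unreserved(path))           # /a!a!%21!
print(decode_unreserved("/a/%2E%2E/b"))  # /a/../b
```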
Vivien Kraus <vivien@planete-kraus.eu> writes:
> Hello Nathan!
>
> On Thursday, 2 November 2023 at 16:00 -0400, Nathan wrote:
>> There is a problem and I fixed it by rewriting a bunch of code myself
>> because I need similar code.
>
> Thank you!
>
>> remove-dot-segments:
>> You cannot split-and-decode-uri-path and then encode-and-join-uri-path.
>> Those are terrible functions that don't work on all URIs.
>> URI schemes are allowed to specify that certain reserved characters
>> (sub-delims) are special.
>> In that case, a sub-delim that IS escaped is different from a sub-delim
>> that IS NOT escaped.
>>
>> Example input to your remove-dot-segments:
>> (resolve-relative-reference (string->uri-reference "/")
>>                             (string->uri-reference "excitement://a.com/a!a!%21!"))
>> Your wrong output:
>> excitement://a.com/a%21a%21%21%21
>
> I see.
>
>>
>> One solution would be to only percent-decode dots. Because dot is
>> unreserved, that solution doesn't have any URI equivalence issues.
>> But I still think decoding dots automatically is a bad, unexpected
>> side-effect to have.
>> I rewrote this function so that it:
>> - works on both escaped and unescaped dots
>> - doesn't unescape any unnecessary characters
>
> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:
>
> 2.3: Unreserved Characters:
> For consistency, percent-encoded octets in the ranges of ALPHA
> (%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
> underscore (%5F), or tilde (%7E) should not be created by URI
> producers and, when found in a URI, should be decoded to their
> corresponding unreserved characters by URI normalizers.
>
> 5.2.1: Pre-parse the Base URI:
> Normalization of the base URI, as described in Sections 6.2.2 and
> 6.2.3, is optional. A URI reference must be transformed to its
> target URI before it can be normalized.
>
> Did you find something more precise than that? In any case, decoding
> the dots is probably the least unsafe thing to do.
>
>>
>> The test suite no longer needs to check for incorrect output either:
>> > ;; The test suite checks for ';' characters, but Guile escapes
>> > ;; them in URIs. Same for '='.
>>
>> ----
>>
>> resolve-relative-reference:
>> I rewrote this procedure so it is shorter.
>> I also added #:strict? to toggle "strict parser" as mentioned in the
>> RFC.
>
> As far as I understand, your code is correct. The tests pass.
>
> Thank you again!
>
> Best regards,
>
> Vivien
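For reference, a minimal Python sketch of the reference transform in RFC 3986 Section 5.2.2 (with merge from 5.2.3 and the plain, literal-dot remove_dot_segments from 5.2.4), including the strict/non-strict choice that the #:strict? keyword is described as toggling. This is not the patch's Scheme code; `Ref`, `resolve`, `merge`, and `recompose` are hypothetical names, and references are built by hand rather than parsed.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ref:
    """URI reference components; None means "undefined" as in RFC 3986."""
    scheme: Optional[str] = None
    authority: Optional[str] = None
    path: str = ""
    query: Optional[str] = None
    fragment: Optional[str] = None


def remove_dot_segments(path: str) -> str:
    """Plain RFC 3986 Section 5.2.4 (literal dots only)."""
    inp, out = path, []
    while inp:
        if inp.startswith("../"):
            inp = inp[3:]
        elif inp.startswith("./") or inp.startswith("/./"):
            inp = inp[2:]
        elif inp == "/.":
            inp = "/"
        elif inp.startswith("/../") or inp == "/..":
            inp = "/" + inp[4:]  # "/../x" -> "/x", "/.." -> "/"
            if out:
                out.pop()
        elif inp in (".", ".."):
            inp = ""
        else:  # move the first segment (with its leading "/") to the output
            end = inp.find("/", 1 if inp.startswith("/") else 0)
            end = len(inp) if end == -1 else end
            out.append(inp[:end])
            inp = inp[end:]
    return "".join(out)


def merge(base: Ref, ref_path: str) -> str:
    """RFC 3986 Section 5.2.3."""
    if base.authority is not None and base.path == "":
        return "/" + ref_path
    return base.path[: base.path.rfind("/") + 1] + ref_path


def resolve(base: Ref, r: Ref, strict: bool = True) -> Ref:
    """RFC 3986 Section 5.2.2; strict=False ignores a scheme equal to base's."""
    scheme = r.scheme
    if not strict and scheme == base.scheme:
        scheme = None
    if scheme is not None:
        return Ref(scheme, r.authority, remove_dot_segments(r.path),
                   r.query, r.fragment)
    if r.authority is not None:
        return Ref(base.scheme, r.authority, remove_dot_segments(r.path),
                   r.query, r.fragment)
    if r.path == "":
        query = r.query if r.query is not None else base.query
        return Ref(base.scheme, base.authority, base.path, query, r.fragment)
    if r.path.startswith("/"):
        path = remove_dot_segments(r.path)
    else:
        path = remove_dot_segments(merge(base, r.path))
    return Ref(base.scheme, base.authority, path, r.query, r.fragment)


def recompose(r: Ref) -> str:
    """RFC 3986 Section 5.3."""
    s = "" if r.scheme is None else r.scheme + ":"
    if r.authority is not None:
        s += "//" + r.authority
    s += r.path
    if r.query is not None:
        s += "?" + r.query
    if r.fragment is not None:
        s += "#" + r.fragment
    return s


base = Ref("http", "a", "/b/c/d;p", "q")
print(recompose(resolve(base, Ref(path="../g"))))              # http://a/b/g
print(recompose(resolve(base, Ref(scheme="http", path="g"))))  # http:g
print(recompose(resolve(base, Ref(scheme="http", path="g"), strict=False)))
```

The last call prints http://a/b/c/g: a non-strict parser drops a scheme that matches the base and resolves "http:g" as if it were "g", which is the backward-compatibility behavior listed among the abnormal examples in RFC 3986 Section 5.4.2.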