>From 655d3e61fa99bb5ddf5388c0843f498d0bf6f789 Mon Sep 17 00:00:00 2001 From: Nathan Date: Thu, 2 Nov 2023 15:42:30 -0400 Subject: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2. * module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4. (merge-paths): Implement algorithm 5.2.3. (resolve-relative-reference): Implement algorithm 5.2.2. (module): Export resolve-relative-reference. * NEWS: Reference it here. * doc/ref/web.texi (URIs): Document it here. (Subtypes of URI): Add a @node declaration to cross-reference it. (HTTP Headers) [location]: Point to the section for different URI types. (Web Client) [http-request]: Indicate that no redirection is performed, and warn about blindly following them. * AUTHORS: Mention Vivien Kraus. * THANKS: Thank Vivien Kraus. --- AUTHORS | 8 +++ NEWS | 7 ++ THANKS | 1 + doc/ref/web.texi | 43 +++++++++++- module/web/uri.scm | 126 +++++++++++++++++++++++++++++++++- test-suite/tests/web-uri.test | 61 ++++++++++++++++ 6 files changed, 244 insertions(+), 2 deletions(-) diff --git a/AUTHORS b/AUTHORS index d756a74ce..2a95d3b0b 100644 --- a/AUTHORS +++ b/AUTHORS @@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore: Gregory Marton: In the subdirectory test-suite/tests, changes to: hash.test + +Vivien Kraus: +In the subdirectory module/web, changes to: + uri.scm +In the subdirectory doc/ref, changes to: + web.texi +In the subdirectory test-suite/tests, changes to: + web-uri.test diff --git a/NEWS b/NEWS index b319404d7..bdf75cb3c 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9) * New interfaces and functionality +** New function in (web uri): resolve-relative-reference + +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may +be used to request a moved resource in case of a 301 or 302 HTTP +response, by resolving the Location value of the response on top of the +requested URI. + ** New warning: unused-module This analysis, enabled at `-W2', issues warnings for modules that appear diff --git a/THANKS b/THANKS index aa4877e95..a1f982f04 100644 --- a/THANKS +++ b/THANKS @@ -19,6 +19,7 @@ Contributors since the last release: Chris K Jester-Young David Kastrup Daniel Kraft + Vivien Kraus Daniel Krueger Noah Lavine Christopher Lemmer Webber diff --git a/doc/ref/web.texi b/doc/ref/web.texi index 607c855b6..c6923c23f 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes as @code{"scrambled%20eggs/biscuits%26gravy"}. @end deffn +@node Subtypes of URI @subsubheading Subtypes of URI As we noted above, not all URI objects have a scheme. You might have @@ -356,6 +357,34 @@ Parse @var{string} into a URI object, while asserting that no scheme is present. Return @code{#f} if the string could not be parsed. @end deffn +@cindex resolve URI reference +In order to get a URI object from a base URI and a relative reference, +one has to use a @dfn{relative URI reference resolution} algorithm. For +instance, given a base URI, @samp{https://example.com/over/here}, and a +relative reference, @samp{../no/there}, it may seem easy to get an +absolute URI as @samp{https://example.com/over/../no/there}. It is +possible that the server at @samp{https://example.com} could serve the +same resource under this URL as +@samp{https://example.com/no/there}. However, a web cache, or a linked +data processor, must understand that the relative reference resolution +leads to @samp{https://example.com/no/there}. + +@deffn {Scheme Procedure} resolve-relative-reference @var{base} @var{relative} @ + [#:strict?=@code{#t}] +Return a URI object representing @var{relative}, using the components of +@var{base} if missing, as defined in section 5.2 in RFC 3986. Both @var{base} +and @var{relative} may be full URI or relative URI references. If @var{strict?} +is true, the parser does not ignore the scheme in @var{relative} if it is +identical to the one in @var{base}. The name ``relative'' indicates the +argument’s relationship to @var{base}, not its type. This function cannot +return a relative reference (it can only return an absolute URI object), if +either @var{base} or @var{relative} is an absolute URI object. + +Please note that any part of @var{base} may be overriden by +@var{relative}. For instance, if @var{base} has a @code{https} URI +scheme, and if @var{relative} has a @code{http} scheme, then the result +will have a @code{http} scheme. +@end deffn @node HTTP @subsection The Hyper-Text Transfer Protocol @@ -1038,7 +1067,8 @@ The entity-tag of the resource. @deftypevr {HTTP Header} URI-reference location A URI reference on which a request may be completed. Used in combination with a redirecting status code to perform client-side -redirection. +redirection. @xref{Subtypes of URI, the distinction between types of +URI}, for more information on relative references. @example (parse-header 'location "http://example.com/other") @result{} # @@ -1501,6 +1531,17 @@ constants, such as @code{certificate-status/signer-not-found} or Connect to the server corresponding to @var{uri} and make a request over HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.). +@code{http-request} does not follow redirections. If a redirection is +required, @code{http-request} returns a response object with an adequate +response code (e.g. 301 or 302). + +Making web requests on a network where private servers are hosted comes +with potential security risks. A malicious public server might forge +its DNS record to point to your internal address. It might also +redirect you to your internal server. In the first case, or if you +follow the redirection of the second case, then you may accidentally +connect to your private server as if it were public. + The following keyword arguments allow you to modify the requests in various ways, for example attaching a body to the request, or setting specific headers. The following table lists the keyword arguments and diff --git a/module/web/uri.scm b/module/web/uri.scm index 8e0b9bee7..2280976b5 100644 --- a/module/web/uri.scm +++ b/module/web/uri.scm @@ -1,6 +1,7 @@ ;;;; (web uri) --- URI manipulation tools ;;;; ;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc. +;;;; Copyright (C) 2023 Vivien Kraus ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -47,7 +48,9 @@ uri-reference? relative-ref? build-uri-reference build-relative-ref - string->uri-reference string->relative-ref)) + string->uri-reference string->relative-ref + + resolve-relative-reference)) (define-record-type (make-uri scheme userinfo host port path query fragment) @@ -501,3 +504,124 @@ strings, and join the parts together with ‘/’ as a delimiter. For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’ encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’." (string-join (map uri-encode parts) "/")) + +(define (remove-dot-segments path) + "Remove the @samp{./} and @samp{../} segments in @var{path}, as +RFC3986, section 5.2.4. This procedure properly handles percent-encoded +dots, but does not percent-decode any unnecessary bytes." + (let lp ((input path) (out '())) + (define (get-dots-info) + "returns three values about the start of the current input string. +(values starts-with-slash? dots end-slash-index) +DOTS is the number of dot characters, including escaped ones. +If there are non-dot, non-slash characters too then DOTS will +instead be some meaningless number greater than two." + (let ((starts-with-slash (eqv? (string-ref input 0) #\/))) + (let dots-lp ((i (if starts-with-slash 1 0)) + (dots 0)) + (if (eqv? i (string-length input)) + (values starts-with-slash dots i) + (let ((c (string-ref input i))) + (cond + ((eqv? c #\/) + (values starts-with-slash dots i)) + ((string-prefix-ci? "%2E" input 0 3 i) + (dots-lp (+ i 3) (1+ dots))) + ((eqv? c #\.) + (dots-lp (1+ i) (1+ dots))) + (else + (dots-lp (1+ i) 3)))))))) + (if (string-null? input) + (apply string-append (reverse out)) + (call-with-values get-dots-info + (lambda (starts-with-slash? dots end-slash-pos) + (cond + ;; handle ../ ./ . .. + ((and (not starts-with-slash?) (or (eqv? dots 1) (eqv? dots 2))) + (lp (substring input (min (1+ end-slash-pos) + (string-length input))) out)) + ((and starts-with-slash? (eqv? dots 1)) + (lp + (if (eqv? end-slash-pos (string-length input)) + "/" ;; handle /. + (substring input end-slash-pos)) ;; handle /./ + out)) + ((and starts-with-slash? (eqv? dots 2)) + (lp + (if (eqv? end-slash-pos (string-length input)) + "/" ;; handle /.. + (substring input end-slash-pos)) ;; handle /../ + (if (null? out) out (cdr out)))) + (else + (lp + (substring input end-slash-pos) + (cons (substring input 0 end-slash-pos) out))))))))) + +(define (merge-paths base-has-authority? base relative) + "Return @samp{@var{base}/@var{relative}}, with the subtleties of +absolute paths explained in RFC3986, section 5.2.3. If the base URI has +an authority (userinfo, host, port), then the processing is a bit +different." + (if (and base-has-authority? + (equal? base "")) + (string-append "/" relative) + (let ((last-/ (string-rindex base #\/))) + (if last-/ + (string-append (substring base 0 last-/) "/" relative) + relative)))) + +(define* (resolve-relative-reference base relative #:key (strict? #t)) + "Resolve @var{relative} on top of @var{base}, as RFC3986, section +5.2. Both @var{relative} and @var{base} may be URI or relative +references. The name ``relative'' indicates the argument’s relationship +to @var{base}, not its type. Both @var{base} and @var{relative} may be +full URIs or relative references. The return value is a URI if either +@var{relative} or @var{base} is a URI. If @var{strict?} is true, the +default, the parser does not ignore the scheme in @var{relative} if it +is identical to the one in @var{base}." + (let ((b-scheme (uri-scheme base)) + (b-userinfo (uri-userinfo base)) + (b-host (uri-host base)) + (b-port (uri-port base)) + (b-path (uri-path base)) + (b-query (uri-query base)) + (b-fragment (uri-fragment base)) + (r-scheme (uri-scheme relative)) + (r-userinfo (uri-userinfo relative)) + (r-host (uri-host relative)) + (r-port (uri-port relative)) + (r-path (uri-path relative)) + (r-query (uri-query relative)) + (r-fragment (uri-fragment relative))) + (cond + ((or r-host (and r-scheme (or strict? (not (eq? r-scheme b-scheme))))) + (build-uri-reference + #:scheme (or r-scheme b-scheme) + #:userinfo r-userinfo + #:host r-host + #:port r-port + #:path (remove-dot-segments r-path) + #:query r-query + #:fragment r-fragment)) + ((string-null? r-path) + (build-uri-reference + #:scheme b-scheme + #:userinfo b-userinfo + #:host b-host + #:port b-port + #:path b-path + #:query (or r-query b-query) + #:fragment r-fragment)) + (else + (build-uri-reference + #:scheme b-scheme + #:userinfo b-userinfo + #:host b-host + #:port b-port + #:path + (remove-dot-segments + (if (string-prefix? "/" r-path) + r-path + (merge-paths b-host b-path r-path))) + #:query r-query + #:fragment r-fragment))))) diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test index 95fd82f16..b4d4b6cdb 100644 --- a/test-suite/tests/web-uri.test +++ b/test-suite/tests/web-uri.test @@ -693,3 +693,64 @@ (pass-if (equal? "foo%20bar" (uri-encode "foo bar"))) (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar"))) (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^")))) + +(with-test-prefix "resolve relative reference" + ;; Test suite in RFC3986, section 5.4. + (let ((base (string->uri "http://a/b/c/d;p?q"))) + (let ((resolve + (lambda (relative) + (let* ((relative-uri + (string->uri-reference relative)) + (resolved-uri + (resolve-relative-reference base relative-uri)) + (resolved (uri->string resolved-uri))) + resolved)))) + (pass-if "remove-dot-segments unnecessary escaping" + (equal? (resolve "%2e%2E/.%2e/%2E./g%2e%2E%2Fh%2e") "http://a/g%2e%2E%2Fh%2e")) + (with-test-prefix "normal" + (pass-if (equal? (resolve "g:h") "g:h")) + (pass-if (equal? (resolve "g") "http://a/b/c/g")) + (pass-if (equal? (resolve "./g") "http://a/b/c/g")) + (pass-if (equal? (resolve "g/") "http://a/b/c/g/")) + (pass-if (equal? (resolve "/g") "http://a/g")) + (pass-if (equal? (resolve "//g") "http://g")) + (pass-if (equal? (resolve "?y") "http://a/b/c/d;p?y")) + (pass-if (equal? (resolve "g?y") "http://a/b/c/g?y")) + (pass-if (equal? (resolve "#s") "http://a/b/c/d;p?q#s")) + (pass-if (equal? (resolve "g?y#s") "http://a/b/c/g?y#s")) + (pass-if (equal? (resolve ";x") "http://a/b/c/;x")) + (pass-if (equal? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s")) + (pass-if (equal? (resolve "") "http://a/b/c/d;p?q")) + (pass-if (equal? (resolve ".") "http://a/b/c/")) + (pass-if (equal? (resolve "./") "http://a/b/c/")) + (pass-if (equal? (resolve "..") "http://a/b/")) + (pass-if (equal? (resolve "../") "http://a/b/")) + (pass-if (equal? (resolve "../g") "http://a/b/g")) + (pass-if (equal? (resolve "../..") "http://a/")) + (pass-if (equal? (resolve "../../") "http://a/")) + (pass-if (equal? (resolve "../../g") "http://a/g"))) + (with-test-prefix "abnormal" + (pass-if (equal? (resolve "../../../g") "http://a/g")) + (pass-if (equal? (resolve "../../../../g") "http://a/g")) + (pass-if (equal? (resolve "/./g") "http://a/g")) + (pass-if (equal? (resolve "/../g") "http://a/g")) + (pass-if (equal? (resolve "g.") "http://a/b/c/g.")) + (pass-if (equal? (resolve ".g") "http://a/b/c/.g")) + (pass-if (equal? (resolve "g..") "http://a/b/c/g..")) + (pass-if (equal? (resolve "..g") "http://a/b/c/..g")) + (pass-if (equal? (resolve "./../g") "http://a/b/g")) + (pass-if (equal? (resolve "./g/.") "http://a/b/c/g/")) + (pass-if (equal? (resolve "g/./h") "http://a/b/c/g/h")) + (pass-if (equal? (resolve "g/../h") "http://a/b/c/h")) + (pass-if (equal? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y")) + (pass-if (equal? (resolve "g;x=1/../y") "http://a/b/c/y")) + (pass-if (equal? (resolve "g?y/./x") "http://a/b/c/g?y/./x")) + (pass-if (equal? (resolve "g?y/../x") "http://a/b/c/g?y/../x")) + (pass-if (equal? (resolve "g#s/./x") "http://a/b/c/g#s/./x")) + (pass-if (equal? (resolve "g#s/../x") "http://a/b/c/g#s/../x")) + (pass-if (equal? (resolve "http:g") "http:g")) + (pass-if "nonstrict relative resolve" + (equal? (uri->string (resolve-relative-reference + base (string->uri-reference "http:g") + #:strict? #f)) + "http://a/b/c/g")))))) -- 2.41.0