guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

[PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5


From: Vivien Kraus
Subject: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Mon, 25 Sep 2023 18:48:28 +0200
User-agent: Evolution 3.46.4

* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed.
---

I clarified the situation about redirections. I don’t think it’s Guile’s
job to do it. For permanent redirections (301), the application
developer is supposed to edit the pages that point to the now-moved
resource anyway. A handful of security issues must also be lurking in
the shadows, and I don’t think it should be a responsibility for the
Guile web client.

The specification uses the word "relative" both for the type of URI that
is most likely to be found, and to express the asymmetric relation
between both arguments of the algorithm. I think "base" and "dependent"
are clearer, what do you think?

The semicolon and equal sign are both reserved characters, so it’s
expected that Guile escapes them. If there’s a bug, it is in the 5.4
section of the RFC. However, I understand that it would be desirable for
the algorithm to accept such unescaped characters, since it works with
URIs in isolation and not in an HTTP frame or web page.

 NEWS                          |   7 ++
 doc/ref/web.texi              |  27 +++++-
 module/web/uri.scm            | 161 +++++++++++++++++++++++++++++++++-
 test-suite/tests/web-uri.test |  68 ++++++++++++++
 4 files changed, 261 insertions(+), 2 deletions(-)

diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
 
 * New interfaces and functionality
 
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
 ** New warning: unused-module
 
 This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..2267c9774 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" 
"biscuits&gravy")} encodes
 as @code{"scrambled%20eggs/biscuits%26gravy"}.
 @end deffn
 
+@node Subtypes of URI
 @subsubheading Subtypes of URI
 
 As we noted above, not all URI objects have a scheme.  You might have
@@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while asserting that 
no scheme is
 present.  Return @code{#f} if the string could not be parsed.
 @end deffn
 
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm.  For
+instance, given a base URI, @samp{https://example.com/over/here}, and a
+relative reference, @samp{../no/there}, it may seem easy to get an
+absolute URI as @samp{https://example.com/over/../no/there}. It is
+possible that the server at @samp{https://example.com} could serve the
+same resource under this URL as
+@samp{https://example.com/no/there}. However, a web cache, or a linked
+data processor, must understand that the relative reference resolution
+leads to @samp{https://example.com/no/there}.
+
+@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{dependent}
+Return a URI object representing @var{dependent}, using the components
+of @var{base} if missing, as defined in section 5.2 in RFC 3986. This
+function cannot return a relative reference (it can only return an
+absolute URI object), if either @var{base} or @var{dependent} is an
+absolute URI object.
+@end deffn
 
 @node HTTP
 @subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1058,8 @@ The entity-tag of the resource.
 @deftypevr {HTTP Header} URI-reference location
 A URI reference on which a request may be completed.  Used in
 combination with a redirecting status code to perform client-side
-redirection.
+redirection. @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
 @example
 (parse-header 'location "http://example.com/other";)
 @result{} #<uri ...>
@@ -1501,6 +1522,10 @@ constants, such as 
@code{certificate-status/signer-not-found} or
 Connect to the server corresponding to @var{uri} and make a request over
 HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
 
+@code{http-request} does not follow redirections. If a redirection is
+required, @code{http-request} returns a response object with an adequate
+response code (e.g. 301 or 302).
+
 The following keyword arguments allow you to modify the requests in
 various ways, for example attaching a body to the request, or setting
 specific headers.  The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..acec2d1e8 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -47,7 +47,9 @@
 
             uri-reference? relative-ref?
             build-uri-reference build-relative-ref
-            string->uri-reference string->relative-ref))
+            string->uri-reference string->relative-ref
+
+            resolve-relative-reference))
 
 (define-record-type <uri>
   (make-uri scheme userinfo host port path query fragment)
@@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’ as a 
delimiter.
 For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
 encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
   (string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+  "Remove the @samp{./} and @samp{../} segments in @var{path}, as
+ RFC3986, section 5.2.4."
+  (let scan ((input
+              (let ((components (split-and-decode-uri-path path)))
+                (if (string-suffix? "/" path)
+                    `(,@components "")
+                    components)))
+             (input-path-absolute? (string-prefix? "/" path))
+             (output '())
+             (output-absolute? #f)
+             (output-ends-in-/? (string-suffix? "/" path)))
+    (cond
+     ((and input-path-absolute?
+           (null? input))
+      ;; Transfer the initial "/" from the input to the end of the
+      ;; output.
+      (scan '() #f output output-absolute? #t))
+     ((null? input)
+      (string-append
+       (if output-absolute? "/" "")
+       (encode-and-join-uri-path
+        (reverse output))
+       (if output-ends-in-/? "/" "")))
+     ((and (not input-path-absolute?)
+           (or (equal? (car input) "..")
+               (equal? (car input) ".")))
+      (scan (cdr input) #f output output-absolute? output-ends-in-/?))
+     ((and input-path-absolute?
+           (equal? (car input) "."))
+      (scan (cdr input) #t output output-absolute? output-ends-in-/?))
+     ((and input-path-absolute?
+           (equal? (car input) ".."))
+      (scan (cdr input) #t
+            (if (null? output)
+                output
+                (cdr output))
+            ;; Remove the last segment, including the preceding /. So,
+            ;; if there is 0 or 1 segment, remove the root / too.
+            (if (or (null? output) (null? (cdr output)))
+                #f  ;; remove the /
+                #t) ;; keep it
+            #f))
+     (else
+      (scan (cdr input)
+            ;; If there is only 1 item in input, then it does not end in
+            ;; /, so the recursive call does not start with
+            ;; /. Otherwise, the recursive call starts with /.
+            (not (null? (cdr input)))
+            (cons (car input) output)
+            ;; If the output is empty and the input path is absolute,
+            ;; the / of the transferred path is transferred as well.
+            (or output-absolute?
+                (and (null? output)
+                     input-path-absolute?))
+            #f)))))
+
+(define (merge-paths base-has-authority? base dependent)
+  "Return @samp{@var{base}/@var{dependent}}, with the subtelties of absolute
+ paths explained in RFC3986, section 5.2.3. If the base URI has an
+authority (userinfo, host, port), then the processing is a bit
+different."
+  (if (and base-has-authority?
+           (equal? base ""))
+      (string-append "/" dependent)
+      (let ((last-/ (string-rindex base #\/)))
+        (if last-/
+            (string-append (substring base 0 last-/) "/" dependent)
+            dependent))))
+
+(define (resolve-relative-reference base dependent)
+  "Resolve @var{dependent} on top of @var{base}, as RFC3986, section
+5.2. Both @var{dependent} and @var{base} may be URI or relative
+references. The return value is a URI if either @var{dependent} or
+@var{base} is a URI."
+  ;; As opposed to RFC 3986, we use "dependent" instead of "relative" to
+  ;; avoid confusion between "URI" and "relative reference", the
+  ;; dependent URI may be either.
+  (let ((b-scheme (uri-scheme base))
+        (b-userinfo (uri-userinfo base))
+        (b-host (uri-host base))
+        (b-port (uri-port base))
+        (b-path (uri-path base))
+        (b-query (uri-query base))
+        (b-fragment (uri-fragment base))
+        (r-scheme (uri-scheme dependent))
+        (r-userinfo (uri-userinfo dependent))
+        (r-host (uri-host dependent))
+        (r-port (uri-port dependent))
+        (r-path (uri-path dependent))
+        (r-query (uri-query dependent))
+        (r-fragment (uri-fragment dependent))
+        (t-scheme #f)
+        (t-userinfo #f)
+        (t-host #f)
+        (t-port #f)
+        (t-path "")
+        (t-query #f)
+        (t-fragment #f))
+    ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
+
+    ;;The programming style uses mutations to better adhere to the
+    ;;algorithm specification.
+    (if r-scheme
+        (begin
+          (set! t-scheme r-scheme)
+          (set! t-userinfo r-userinfo)
+          (set! t-host r-host)
+          (set! t-port r-port)
+          (set! t-path (remove-dot-segments r-path))
+          (set! t-query r-query))
+        ;; r-scheme is not defined:
+        (begin
+          (if r-host
+              (begin
+                (set! t-userinfo r-userinfo)
+                (set! t-host r-host)
+                (set! t-port r-port)
+                (set! t-path (remove-dot-segments r-path))
+                (set! t-query r-query))
+              ;; r-scheme is not defined, r-authority is not defined:
+              (begin
+                (if (equal? r-path "")
+                    (begin
+                      (set! t-path b-path)
+                      (if r-query
+                          ;; r-scheme, r-authority, r-path are not
+                          ;; defined:
+                          (set! t-query r-query)
+                          ;; r-scheme, r-authority, r-path, r-query are
+                          ;; not defined:
+                          (set! t-query b-query)))
+                    ;; r-scheme, r-authority not defined, r-path defined:
+                    (begin
+                      (if (string-prefix? "/" r-path)
+                          ;; r-scheme, r-authority not defined, r-path
+                          ;; absolute:
+                          (set! t-path (remove-dot-segments r-path))
+                          ;; r-scheme, r-authority not defined, r-path
+                          ;; dependent:
+                          (set! t-path
+                                (remove-dot-segments
+                                 (merge-paths b-host b-path r-path))))
+                      (set! t-query r-query)))
+                (set! t-userinfo b-userinfo)
+                (set! t-host b-host)
+                (set! t-port b-port)))
+          (set! t-scheme b-scheme)))
+    (set! t-fragment r-fragment)
+    (build-uri-reference #:scheme t-scheme
+                         #:userinfo t-userinfo
+                         #:host t-host
+                         #:port t-port
+                         #:path t-path
+                         #:query t-query
+                         #:fragment t-fragment)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..c453bf60f 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -20,6 +20,7 @@
 (define-module (test-web-uri)
   #:use-module (web uri)
   #:use-module (ice-9 regex)
+  #:use-module (ice-9 string-fun)
   #:use-module (test-suite lib))
 
 
@@ -693,3 +694,70 @@
   (pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
   (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
   (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+  ;; Test suite in RFC3986, section 5.4.
+  (let ((base (string->uri "http://a/b/c/d;p?q";))
+        (equal/encoded?
+         ;; The test suite checks for ';' characters, but Guile escapes
+         ;; them in URIs. Same for '='.
+         (let ((escape-colon
+                (lambda (x)
+                  (string-replace-substring x ";" "%3B")))
+               (escape-equal
+                (lambda (x)
+                  (string-replace-substring x "=" "%3D"))))
+         (lambda (x y)
+           (equal? (escape-colon (escape-equal x))
+                   (escape-colon (escape-equal y)))))))
+    (let ((resolve
+           (lambda (relative)
+             (let* ((relative-uri
+                     (string->uri-reference relative))
+                    (resolved-uri
+                     (resolve-relative-reference base relative-uri))
+                    (resolved (uri->string resolved-uri)))
+               resolved))))
+      (with-test-prefix "normal"
+        (pass-if (equal/encoded? (resolve "g:h") "g:h"))
+        (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g";))
+        (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g";))
+        (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/";))
+        (pass-if (equal/encoded? (resolve "/g") "http://a/g";))
+        (pass-if (equal/encoded? (resolve "//g") "http://g";))
+        (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y";))
+        (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y";))
+        (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s";))
+        (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s";))
+        (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x";))
+        (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s";))
+        (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q";))
+        (pass-if (equal/encoded? (resolve ".") "http://a/b/c/";))
+        (pass-if (equal/encoded? (resolve "./") "http://a/b/c/";))
+        (pass-if (equal/encoded? (resolve "..") "http://a/b/";))
+        (pass-if (equal/encoded? (resolve "../") "http://a/b/";))
+        (pass-if (equal/encoded? (resolve "../g") "http://a/b/g";))
+        (pass-if (equal/encoded? (resolve "../..") "http://a/";))
+        (pass-if (equal/encoded? (resolve "../../") "http://a/";))
+        (pass-if (equal/encoded? (resolve "../../g") "http://a/g";)))
+      (with-test-prefix "abnormal"
+        (pass-if (equal/encoded? (resolve "../../../g") "http://a/g";))
+        (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g";))
+        (pass-if (equal/encoded? (resolve "/./g") "http://a/g";))
+        (pass-if (equal/encoded? (resolve "/../g") "http://a/g";))
+        (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g.";))
+        (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g";))
+        (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g..";))
+        (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g";))
+        (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g";))
+        (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/";))
+        (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h";))
+        (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h";))
+        (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y";))
+        (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y";))
+        (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x";))
+        (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x";))
+        (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x";))
+        (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x";))
+        (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
+        

base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb
-- 
2.41.0



reply via email to

[Prev in Thread] Current Thread [Next in Thread]