* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4. (merge-paths): Implement algorithm 5.2.3. (resolve-relative-reference): Implement algorithm 5.2.2. (module): Export resolve-relative-reference. * NEWS: Reference it here. * doc/ref/web.texi (URIs): Document it here. (Subtypes of URI): Add a @node declaration to cross-reference it. (HTTP Headers) [location]: Point to the section for different URI types. (Web Client) [http-request]: Indicate that no redirection is performed, and warn about blindly following them. * AUTHORS: Mention me. * THANKS: Thank me. --- AUTHORS | 8 ++ NEWS | 7 ++ THANKS | 1 + doc/ref/web.texi | 41 ++++++++- module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++- test-suite/tests/web-uri.test | 67 ++++++++++++++ 6 files changed, 283 insertions(+), 2 deletions(-) diff --git a/AUTHORS b/AUTHORS index d756a74ce..2a95d3b0b 100644 --- a/AUTHORS +++ b/AUTHORS @@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore: Gregory Marton: In the subdirectory test-suite/tests, changes to: hash.test + +Vivien Kraus: +In the subdirectory module/web, changes to: + uri.scm +In the subdirectory doc/ref, changes to: + web.texi +In the subdirectory test-suite/tests, changes to: + web-uri.test diff --git a/NEWS b/NEWS index b319404d7..bdf75cb3c 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9) * New interfaces and functionality +** New function in (web uri): resolve-relative-reference + +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may +be used to request a moved resource in case of a 301 or 302 HTTP +response, by resolving the Location value of the response on top of the +requested URI. + ** New warning: unused-module This analysis, enabled at `-W2', issues warnings for modules that appear diff --git a/THANKS b/THANKS index aa4877e95..a1f982f04 100644 --- a/THANKS +++ b/THANKS @@ -19,6 +19,7 @@ Contributors since the last release: Chris K Jester-Young David Kastrup Daniel Kraft + Vivien Kraus Daniel Krueger Noah Lavine Christopher Lemmer Webber diff --git a/doc/ref/web.texi b/doc/ref/web.texi index 607c855b6..d92a8d51a 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes as @code{"scrambled%20eggs/biscuits%26gravy"}. @end deffn +@node Subtypes of URI @subsubheading Subtypes of URI As we noted above, not all URI objects have a scheme. You might have @@ -356,6 +357,32 @@ Parse @var{string} into a URI object, while asserting that no scheme is present. Return @code{#f} if the string could not be parsed. @end deffn +@cindex resolve URI reference +In order to get a URI object from a base URI and a relative reference, +one has to use a @dfn{relative URI reference resolution} algorithm. For +instance, given a base URI, @samp{https://example.com/over/here}, and a +relative reference, @samp{../no/there}, it may seem easy to get an +absolute URI as @samp{https://example.com/over/../no/there}. It is +possible that the server at @samp{https://example.com} could serve the +same resource under this URL as +@samp{https://example.com/no/there}. However, a web cache, or a linked +data processor, must understand that the relative reference resolution +leads to @samp{https://example.com/no/there}. + +@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{relative} +Return a URI object representing @var{relative}, using the components of +@var{base} if missing, as defined in section 5.2 in RFC 3986. Both +@var{base} and @var{relative} may be full URI or relative URI +references. The name ``relative'' indicates the argument’s relationship +to @var{base}, not its type. This function cannot return a relative +reference (it can only return an absolute URI object), if either +@var{base} or @var{relative} is an absolute URI object. + +Please note that any part of @var{base} may be overriden by +@var{relative}. For instance, if @var{base} has a @code{https} URI +scheme, and if @var{relative} has a @code{http} scheme, then the result +will have a @code{http} scheme. +@end deffn @node HTTP @subsection The Hyper-Text Transfer Protocol @@ -1038,7 +1065,8 @@ The entity-tag of the resource. @deftypevr {HTTP Header} URI-reference location A URI reference on which a request may be completed. Used in combination with a redirecting status code to perform client-side -redirection. +redirection. @xref{Subtypes of URI, the distinction between types of +URI}, for more information on relative references. @example (parse-header 'location "http://example.com/other") @result{} # @@ -1501,6 +1529,17 @@ constants, such as @code{certificate-status/signer-not-found} or Connect to the server corresponding to @var{uri} and make a request over HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.). +@code{http-request} does not follow redirections. If a redirection is +required, @code{http-request} returns a response object with an adequate +response code (e.g. 301 or 302). + +Making web requests on a network where private servers are hosted comes +with potential security risks. A malicious public server might forge +its DNS record to point to your internal address. It might also +redirect you to your internal server. In the first case, or if you +follow the redirection of the second case, then you may accidentally +connect to your private server as if it were public. + The following keyword arguments allow you to modify the requests in various ways, for example attaching a body to the request, or setting specific headers. The following table lists the keyword arguments and diff --git a/module/web/uri.scm b/module/web/uri.scm index 8e0b9bee7..319010097 100644 --- a/module/web/uri.scm +++ b/module/web/uri.scm @@ -1,6 +1,7 @@ ;;;; (web uri) --- URI manipulation tools ;;;; ;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc. +;;;; Copyright (C) 2023 Vivien Kraus ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -47,7 +48,9 @@ uri-reference? relative-ref? build-uri-reference build-relative-ref - string->uri-reference string->relative-ref)) + string->uri-reference string->relative-ref + + resolve-relative-reference)) (define-record-type (make-uri scheme userinfo host port path query fragment) @@ -501,3 +504,159 @@ strings, and join the parts together with ‘/’ as a delimiter. For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’ encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’." (string-join (map uri-encode parts) "/")) + +(define (remove-dot-segments path) + "Remove the @samp{./} and @samp{../} segments in @var{path}, as + RFC3986, section 5.2.4." + (let scan ((input + (let ((components (split-and-decode-uri-path path))) + (if (string-suffix? "/" path) + `(,@components "") + components))) + (input-path-absolute? (string-prefix? "/" path)) + (output '()) + (output-absolute? #f) + (output-ends-in-/? (string-suffix? "/" path))) + (cond + ((and input-path-absolute? + (null? input)) + ;; Transfer the initial "/" from the input to the end of the + ;; output. + (scan '() #f output output-absolute? #t)) + ((null? input) + (string-append + (if output-absolute? "/" "") + (encode-and-join-uri-path + (reverse output)) + (if output-ends-in-/? "/" ""))) + ((and (not input-path-absolute?) + (or (equal? (car input) "..") + (equal? (car input) "."))) + (scan (cdr input) #f output output-absolute? output-ends-in-/?)) + ((and input-path-absolute? + (equal? (car input) ".")) + (scan (cdr input) #t output output-absolute? output-ends-in-/?)) + ((and input-path-absolute? + (equal? (car input) "..")) + (scan (cdr input) #t + (if (null? output) + output + (cdr output)) + ;; Remove the last segment, including the preceding /. So, + ;; if there is 0 or 1 segment, remove the root / too. + (if (or (null? output) (null? (cdr output))) + #f ;; remove the / + #t) ;; keep it + #f)) + (else + (scan (cdr input) + ;; If there is only 1 item in input, then it does not end in + ;; /, so the recursive call does not start with + ;; /. Otherwise, the recursive call starts with /. + (not (null? (cdr input))) + (cons (car input) output) + ;; If the output is empty and the input path is absolute, + ;; the / of the transferred path is transferred as well. + (or output-absolute? + (and (null? output) + input-path-absolute?)) + #f))))) + +(define (merge-paths base-has-authority? base relative) + "Return @samp{@var{base}/@var{relative}}, with the subtleties of +absolute paths explained in RFC3986, section 5.2.3. If the base URI has +an authority (userinfo, host, port), then the processing is a bit +different." + (if (and base-has-authority? + (equal? base "")) + (string-append "/" relative) + (let ((last-/ (string-rindex base #\/))) + (if last-/ + (string-append (substring base 0 last-/) "/" relative) + relative)))) + +(define (resolve-relative-reference base relative) + "Resolve @var{relative} on top of @var{base}, as RFC3986, section +5.2. Both @var{relative} and @var{base} may be URI or relative +references. The name ``relative'' indicates the argument’s relationship +to @var{base}, not its type. Both @var{base} and @var{relative} may be +full URIs or relative references. The return value is a URI if either +@var{relative} or @var{base} is a URI." + (let ((b-scheme (uri-scheme base)) + (b-userinfo (uri-userinfo base)) + (b-host (uri-host base)) + (b-port (uri-port base)) + (b-path (uri-path base)) + (b-query (uri-query base)) + (b-fragment (uri-fragment base)) + (r-scheme (uri-scheme relative)) + (r-userinfo (uri-userinfo relative)) + (r-host (uri-host relative)) + (r-port (uri-port relative)) + (r-path (uri-path relative)) + (r-query (uri-query relative)) + (r-fragment (uri-fragment relative)) + (t-scheme #f) + (t-userinfo #f) + (t-host #f) + (t-port #f) + (t-path "") + (t-query #f) + (t-fragment #f)) + ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2 + + ;;The programming style uses mutations to better adhere to the + ;;algorithm specification. + (if r-scheme + (begin + (set! t-scheme r-scheme) + (set! t-userinfo r-userinfo) + (set! t-host r-host) + (set! t-port r-port) + (set! t-path (remove-dot-segments r-path)) + (set! t-query r-query)) + ;; r-scheme is not defined: + (begin + (if r-host + (begin + (set! t-userinfo r-userinfo) + (set! t-host r-host) + (set! t-port r-port) + (set! t-path (remove-dot-segments r-path)) + (set! t-query r-query)) + ;; r-scheme is not defined, r-authority is not defined: + (begin + (if (equal? r-path "") + (begin + (set! t-path b-path) + (if r-query + ;; r-scheme, r-authority, r-path are not + ;; defined: + (set! t-query r-query) + ;; r-scheme, r-authority, r-path, r-query are + ;; not defined: + (set! t-query b-query))) + ;; r-scheme, r-authority not defined, r-path defined: + (begin + (if (string-prefix? "/" r-path) + ;; r-scheme, r-authority not defined, r-path + ;; absolute: + (set! t-path (remove-dot-segments r-path)) + ;; r-scheme, r-authority not defined, r-path + ;; relative: + (set! t-path + (remove-dot-segments + (merge-paths b-host b-path r-path)))) + (set! t-query r-query))) + (set! t-userinfo b-userinfo) + (set! t-host b-host) + (set! t-port b-port))) + (set! t-scheme b-scheme))) + (set! t-fragment r-fragment) + (build-uri-reference #:scheme t-scheme + #:userinfo t-userinfo + #:host t-host + #:port t-port + #:path t-path + #:query t-query + #:fragment t-fragment))) diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test index 95fd82f16..cdd0dc7b6 100644 --- a/test-suite/tests/web-uri.test +++ b/test-suite/tests/web-uri.test @@ -20,6 +20,7 @@ (define-module (test-web-uri) #:use-module (web uri) #:use-module (ice-9 regex) + #:use-module (ice-9 string-fun) #:use-module (test-suite lib)) @@ -693,3 +694,69 @@ (pass-if (equal? "foo%20bar" (uri-encode "foo bar"))) (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar"))) (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^")))) + +(with-test-prefix "resolve relative reference" + ;; Test suite in RFC3986, section 5.4. + (let ((base (string->uri "http://a/b/c/d;p?q")) + (equal/encoded? + ;; The test suite checks for ';' characters, but Guile escapes + ;; them in URIs. Same for '='. + (let ((escape-colon + (lambda (x) + (string-replace-substring x ";" "%3B"))) + (escape-equal + (lambda (x) + (string-replace-substring x "=" "%3D")))) + (lambda (x y) + (equal? (escape-colon (escape-equal x)) + (escape-colon (escape-equal y))))))) + (let ((resolve + (lambda (relative) + (let* ((relative-uri + (string->uri-reference relative)) + (resolved-uri + (resolve-relative-reference base relative-uri)) + (resolved (uri->string resolved-uri))) + resolved)))) + (with-test-prefix "normal" + (pass-if (equal/encoded? (resolve "g:h") "g:h")) + (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g")) + (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g")) + (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/")) + (pass-if (equal/encoded? (resolve "/g") "http://a/g")) + (pass-if (equal/encoded? (resolve "//g") "http://g")) + (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y")) + (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y")) + (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s")) + (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s")) + (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x")) + (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s")) + (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q")) + (pass-if (equal/encoded? (resolve ".") "http://a/b/c/")) + (pass-if (equal/encoded? (resolve "./") "http://a/b/c/")) + (pass-if (equal/encoded? (resolve "..") "http://a/b/")) + (pass-if (equal/encoded? (resolve "../") "http://a/b/")) + (pass-if (equal/encoded? (resolve "../g") "http://a/b/g")) + (pass-if (equal/encoded? (resolve "../..") "http://a/")) + (pass-if (equal/encoded? (resolve "../../") "http://a/")) + (pass-if (equal/encoded? (resolve "../../g") "http://a/g"))) + (with-test-prefix "abnormal" + (pass-if (equal/encoded? (resolve "../../../g") "http://a/g")) + (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g")) + (pass-if (equal/encoded? (resolve "/./g") "http://a/g")) + (pass-if (equal/encoded? (resolve "/../g") "http://a/g")) + (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g.")) + (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g")) + (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g..")) + (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g")) + (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g")) + (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/")) + (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h")) + (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h")) + (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y")) + (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y")) + (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x")) + (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x")) + (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x")) + (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x")) + (pass-if (equal/encoded? (resolve "http:g") "http:g")))))) -- 2.41.0