* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4. (merge-paths): Implement algorithm 5.2.3. (resolve-relative-reference): Implement algorithm 5.2.2. (module): Export resolve-relative-reference. * NEWS: Reference it here. * doc/ref/web.texi (URIs): Document it here. (Subtypes of URI): Add a @node declaration to cross-reference it. (HTTP Headers) [location]: Point to the section for different URI types. (Web Client) [http-request]: Indicate that no redirection is performed. --- I clarified the situation about redirections. I don’t think it’s Guile’s job to do it. For permanent redirections (301), the application developer is supposed to edit the pages that point to the now-moved resource anyway. A handful of security issues must also be lurking in the shadows, and I don’t think it should be a responsibility for the Guile web client. The specification uses the word "relative" both for the type of URI that is most likely to be found, and to express the asymmetric relation between both arguments of the algorithm. I think "base" and "dependent" are clearer, what do you think? The semicolon and equal sign are both reserved characters, so it’s expected that Guile escapes them. If there’s a bug, it is in the 5.4 section of the RFC. However, I understand that it would be desirable for the algorithm to accept such unescaped characters, since it works with URIs in isolation and not in an HTTP frame or web page. NEWS | 7 ++ doc/ref/web.texi | 27 +++++- module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++- test-suite/tests/web-uri.test | 68 ++++++++++++++ 4 files changed, 261 insertions(+), 2 deletions(-) diff --git a/NEWS b/NEWS index b319404d7..bdf75cb3c 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9) * New interfaces and functionality +** New function in (web uri): resolve-relative-reference + +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may +be used to request a moved resource in case of a 301 or 302 HTTP +response, by resolving the Location value of the response on top of the +requested URI. + ** New warning: unused-module This analysis, enabled at `-W2', issues warnings for modules that appear diff --git a/doc/ref/web.texi b/doc/ref/web.texi index 607c855b6..2267c9774 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes as @code{"scrambled%20eggs/biscuits%26gravy"}. @end deffn +@node Subtypes of URI @subsubheading Subtypes of URI As we noted above, not all URI objects have a scheme. You might have @@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while asserting that no scheme is present. Return @code{#f} if the string could not be parsed. @end deffn +@cindex resolve URI reference +In order to get a URI object from a base URI and a relative reference, +one has to use a @dfn{relative URI reference resolution} algorithm. For +instance, given a base URI, @samp{https://example.com/over/here}, and a +relative reference, @samp{../no/there}, it may seem easy to get an +absolute URI as @samp{https://example.com/over/../no/there}. It is +possible that the server at @samp{https://example.com} could serve the +same resource under this URL as +@samp{https://example.com/no/there}. However, a web cache, or a linked +data processor, must understand that the relative reference resolution +leads to @samp{https://example.com/no/there}. + +@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{dependent} +Return a URI object representing @var{dependent}, using the components +of @var{base} if missing, as defined in section 5.2 in RFC 3986. This +function cannot return a relative reference (it can only return an +absolute URI object), if either @var{base} or @var{dependent} is an +absolute URI object. +@end deffn @node HTTP @subsection The Hyper-Text Transfer Protocol @@ -1038,7 +1058,8 @@ The entity-tag of the resource. @deftypevr {HTTP Header} URI-reference location A URI reference on which a request may be completed. Used in combination with a redirecting status code to perform client-side -redirection. +redirection. @xref{Subtypes of URI, the distinction between types of +URI}, for more information on relative references. @example (parse-header 'location "http://example.com/other") @result{} # @@ -1501,6 +1522,10 @@ constants, such as @code{certificate-status/signer-not-found} or Connect to the server corresponding to @var{uri} and make a request over HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.). +@code{http-request} does not follow redirections. If a redirection is +required, @code{http-request} returns a response object with an adequate +response code (e.g. 301 or 302). + The following keyword arguments allow you to modify the requests in various ways, for example attaching a body to the request, or setting specific headers. The following table lists the keyword arguments and diff --git a/module/web/uri.scm b/module/web/uri.scm index 8e0b9bee7..acec2d1e8 100644 --- a/module/web/uri.scm +++ b/module/web/uri.scm @@ -47,7 +47,9 @@ uri-reference? relative-ref? build-uri-reference build-relative-ref - string->uri-reference string->relative-ref)) + string->uri-reference string->relative-ref + + resolve-relative-reference)) (define-record-type (make-uri scheme userinfo host port path query fragment) @@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’ as a delimiter. For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’ encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’." (string-join (map uri-encode parts) "/")) + +(define (remove-dot-segments path) + "Remove the @samp{./} and @samp{../} segments in @var{path}, as + RFC3986, section 5.2.4." + (let scan ((input + (let ((components (split-and-decode-uri-path path))) + (if (string-suffix? "/" path) + `(,@components "") + components))) + (input-path-absolute? (string-prefix? "/" path)) + (output '()) + (output-absolute? #f) + (output-ends-in-/? (string-suffix? "/" path))) + (cond + ((and input-path-absolute? + (null? input)) + ;; Transfer the initial "/" from the input to the end of the + ;; output. + (scan '() #f output output-absolute? #t)) + ((null? input) + (string-append + (if output-absolute? "/" "") + (encode-and-join-uri-path + (reverse output)) + (if output-ends-in-/? "/" ""))) + ((and (not input-path-absolute?) + (or (equal? (car input) "..") + (equal? (car input) "."))) + (scan (cdr input) #f output output-absolute? output-ends-in-/?)) + ((and input-path-absolute? + (equal? (car input) ".")) + (scan (cdr input) #t output output-absolute? output-ends-in-/?)) + ((and input-path-absolute? + (equal? (car input) "..")) + (scan (cdr input) #t + (if (null? output) + output + (cdr output)) + ;; Remove the last segment, including the preceding /. So, + ;; if there is 0 or 1 segment, remove the root / too. + (if (or (null? output) (null? (cdr output))) + #f ;; remove the / + #t) ;; keep it + #f)) + (else + (scan (cdr input) + ;; If there is only 1 item in input, then it does not end in + ;; /, so the recursive call does not start with + ;; /. Otherwise, the recursive call starts with /. + (not (null? (cdr input))) + (cons (car input) output) + ;; If the output is empty and the input path is absolute, + ;; the / of the transferred path is transferred as well. + (or output-absolute? + (and (null? output) + input-path-absolute?)) + #f))))) + +(define (merge-paths base-has-authority? base dependent) + "Return @samp{@var{base}/@var{dependent}}, with the subtelties of absolute + paths explained in RFC3986, section 5.2.3. If the base URI has an +authority (userinfo, host, port), then the processing is a bit +different." + (if (and base-has-authority? + (equal? base "")) + (string-append "/" dependent) + (let ((last-/ (string-rindex base #\/))) + (if last-/ + (string-append (substring base 0 last-/) "/" dependent) + dependent)))) + +(define (resolve-relative-reference base dependent) + "Resolve @var{dependent} on top of @var{base}, as RFC3986, section +5.2. Both @var{dependent} and @var{base} may be URI or relative +references. The return value is a URI if either @var{dependent} or +@var{base} is a URI." + ;; As opposed to RFC 3986, we use "dependent" instead of "relative" to + ;; avoid confusion between "URI" and "relative reference", the + ;; dependent URI may be either. + (let ((b-scheme (uri-scheme base)) + (b-userinfo (uri-userinfo base)) + (b-host (uri-host base)) + (b-port (uri-port base)) + (b-path (uri-path base)) + (b-query (uri-query base)) + (b-fragment (uri-fragment base)) + (r-scheme (uri-scheme dependent)) + (r-userinfo (uri-userinfo dependent)) + (r-host (uri-host dependent)) + (r-port (uri-port dependent)) + (r-path (uri-path dependent)) + (r-query (uri-query dependent)) + (r-fragment (uri-fragment dependent)) + (t-scheme #f) + (t-userinfo #f) + (t-host #f) + (t-port #f) + (t-path "") + (t-query #f) + (t-fragment #f)) + ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2 + + ;;The programming style uses mutations to better adhere to the + ;;algorithm specification. + (if r-scheme + (begin + (set! t-scheme r-scheme) + (set! t-userinfo r-userinfo) + (set! t-host r-host) + (set! t-port r-port) + (set! t-path (remove-dot-segments r-path)) + (set! t-query r-query)) + ;; r-scheme is not defined: + (begin + (if r-host + (begin + (set! t-userinfo r-userinfo) + (set! t-host r-host) + (set! t-port r-port) + (set! t-path (remove-dot-segments r-path)) + (set! t-query r-query)) + ;; r-scheme is not defined, r-authority is not defined: + (begin + (if (equal? r-path "") + (begin + (set! t-path b-path) + (if r-query + ;; r-scheme, r-authority, r-path are not + ;; defined: + (set! t-query r-query) + ;; r-scheme, r-authority, r-path, r-query are + ;; not defined: + (set! t-query b-query))) + ;; r-scheme, r-authority not defined, r-path defined: + (begin + (if (string-prefix? "/" r-path) + ;; r-scheme, r-authority not defined, r-path + ;; absolute: + (set! t-path (remove-dot-segments r-path)) + ;; r-scheme, r-authority not defined, r-path + ;; dependent: + (set! t-path + (remove-dot-segments + (merge-paths b-host b-path r-path)))) + (set! t-query r-query))) + (set! t-userinfo b-userinfo) + (set! t-host b-host) + (set! t-port b-port))) + (set! t-scheme b-scheme))) + (set! t-fragment r-fragment) + (build-uri-reference #:scheme t-scheme + #:userinfo t-userinfo + #:host t-host + #:port t-port + #:path t-path + #:query t-query + #:fragment t-fragment))) diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test index 95fd82f16..c453bf60f 100644 --- a/test-suite/tests/web-uri.test +++ b/test-suite/tests/web-uri.test @@ -20,6 +20,7 @@ (define-module (test-web-uri) #:use-module (web uri) #:use-module (ice-9 regex) + #:use-module (ice-9 string-fun) #:use-module (test-suite lib)) @@ -693,3 +694,70 @@ (pass-if (equal? "foo%20bar" (uri-encode "foo bar"))) (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar"))) (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^")))) + +(with-test-prefix "resolve relative reference" + ;; Test suite in RFC3986, section 5.4. + (let ((base (string->uri "http://a/b/c/d;p?q")) + (equal/encoded? + ;; The test suite checks for ';' characters, but Guile escapes + ;; them in URIs. Same for '='. + (let ((escape-colon + (lambda (x) + (string-replace-substring x ";" "%3B"))) + (escape-equal + (lambda (x) + (string-replace-substring x "=" "%3D")))) + (lambda (x y) + (equal? (escape-colon (escape-equal x)) + (escape-colon (escape-equal y))))))) + (let ((resolve + (lambda (relative) + (let* ((relative-uri + (string->uri-reference relative)) + (resolved-uri + (resolve-relative-reference base relative-uri)) + (resolved (uri->string resolved-uri))) + resolved)))) + (with-test-prefix "normal" + (pass-if (equal/encoded? (resolve "g:h") "g:h")) + (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g")) + (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g")) + (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/")) + (pass-if (equal/encoded? (resolve "/g") "http://a/g")) + (pass-if (equal/encoded? (resolve "//g") "http://g")) + (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y")) + (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y")) + (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s")) + (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s")) + (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x")) + (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s")) + (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q")) + (pass-if (equal/encoded? (resolve ".") "http://a/b/c/")) + (pass-if (equal/encoded? (resolve "./") "http://a/b/c/")) + (pass-if (equal/encoded? (resolve "..") "http://a/b/")) + (pass-if (equal/encoded? (resolve "../") "http://a/b/")) + (pass-if (equal/encoded? (resolve "../g") "http://a/b/g")) + (pass-if (equal/encoded? (resolve "../..") "http://a/")) + (pass-if (equal/encoded? (resolve "../../") "http://a/")) + (pass-if (equal/encoded? (resolve "../../g") "http://a/g"))) + (with-test-prefix "abnormal" + (pass-if (equal/encoded? (resolve "../../../g") "http://a/g")) + (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g")) + (pass-if (equal/encoded? (resolve "/./g") "http://a/g")) + (pass-if (equal/encoded? (resolve "/../g") "http://a/g")) + (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g.")) + (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g")) + (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g..")) + (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g")) + (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g")) + (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/")) + (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h")) + (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h")) + (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y")) + (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y")) + (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x")) + (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x")) + (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x")) + (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x")) + (pass-if (equal/encoded? (resolve "http:g") "http:g")))))) + base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb -- 2.41.0