From: Vivien Kraus <vivien@planete-kraus.eu>
To: guile-devel@gnu.org
Cc: Maxime Devos <maximedevos@telenet.be>
Subject: [PATCH 1/3] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Mon, 25 Sep 2023 18:48:28 +0200 [thread overview]
Message-ID: <d6f4e5597bb61744552e4062020000eba2270dad.1698483663.git.vivien@planete-kraus.eu> (raw)
In-Reply-To: <cover.1698483663.git.vivien@planete-kraus.eu>
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1: Type: text/plain, Size: 17963 bytes --]
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed,
and warn about blindly following them.
* AUTHORS: Mention me.
* THANKS: Thank me.
---
AUTHORS | 8 ++
NEWS | 7 ++
THANKS | 1 +
doc/ref/web.texi | 41 ++++++++-
module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 67 ++++++++++++++
6 files changed, 283 insertions(+), 2 deletions(-)
diff --git a/AUTHORS b/AUTHORS
index d756a74ce..2a95d3b0b 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore:
Gregory Marton:
In the subdirectory test-suite/tests, changes to:
hash.test
+
+Vivien Kraus:
+In the subdirectory module/web, changes to:
+ uri.scm
+In the subdirectory doc/ref, changes to:
+ web.texi
+In the subdirectory test-suite/tests, changes to:
+ web-uri.test
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
* New interfaces and functionality
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
** New warning: unused-module
This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/THANKS b/THANKS
index aa4877e95..a1f982f04 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Contributors since the last release:
Chris K Jester-Young
David Kastrup
Daniel Kraft
+ Vivien Kraus
Daniel Krueger
Noah Lavine
Christopher Lemmer Webber
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..d92a8d51a 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
as @code{"scrambled%20eggs/biscuits%26gravy"}.
@end deffn
+@node Subtypes of URI
@subsubheading Subtypes of URI
As we noted above, not all URI objects have a scheme. You might have
@@ -356,6 +357,32 @@ Parse @var{string} into a URI object, while asserting that no scheme is
present. Return @code{#f} if the string could not be parsed.
@end deffn
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm. For
+instance, given a base URI, @samp{https://example.com/over/here}, and a
+relative reference, @samp{../no/there}, it may seem easy to get an
+absolute URI as @samp{https://example.com/over/../no/there}. It is
+possible that the server at @samp{https://example.com} could serve the
+same resource under this URL as
+@samp{https://example.com/no/there}. However, a web cache, or a linked
+data processor, must understand that the relative reference resolution
+leads to @samp{https://example.com/no/there}.
+
+@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{relative}
+Return a URI object representing @var{relative}, using the components of
+@var{base} if missing, as defined in section 5.2 in RFC 3986. Both
+@var{base} and @var{relative} may be full URI or relative URI
+references. The name ``relative'' indicates the argument’s relationship
+to @var{base}, not its type. This function cannot return a relative
+reference (it can only return an absolute URI object), if either
+@var{base} or @var{relative} is an absolute URI object.
+
+Please note that any part of @var{base} may be overriden by
+@var{relative}. For instance, if @var{base} has a @code{https} URI
+scheme, and if @var{relative} has a @code{http} scheme, then the result
+will have a @code{http} scheme.
+@end deffn
@node HTTP
@subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1065,8 @@ The entity-tag of the resource.
@deftypevr {HTTP Header} URI-reference location
A URI reference on which a request may be completed. Used in
combination with a redirecting status code to perform client-side
-redirection.
+redirection. @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
@example
(parse-header 'location "http://example.com/other")
@result{} #<uri ...>
@@ -1501,6 +1529,17 @@ constants, such as @code{certificate-status/signer-not-found} or
Connect to the server corresponding to @var{uri} and make a request over
HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
+@code{http-request} does not follow redirections. If a redirection is
+required, @code{http-request} returns a response object with an adequate
+response code (e.g. 301 or 302).
+
+Making web requests on a network where private servers are hosted comes
+with potential security risks. A malicious public server might forge
+its DNS record to point to your internal address. It might also
+redirect you to your internal server. In the first case, or if you
+follow the redirection of the second case, then you may accidentally
+connect to your private server as if it were public.
+
The following keyword arguments allow you to modify the requests in
various ways, for example attaching a body to the request, or setting
specific headers. The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..319010097 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,7 @@
;;;; (web uri) --- URI manipulation tools
;;;;
;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2023 Vivien Kraus
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -47,7 +48,9 @@
uri-reference? relative-ref?
build-uri-reference build-relative-ref
- string->uri-reference string->relative-ref))
+ string->uri-reference string->relative-ref
+
+ resolve-relative-reference))
(define-record-type <uri>
(make-uri scheme userinfo host port path query fragment)
@@ -501,3 +504,159 @@ strings, and join the parts together with ‘/’ as a delimiter.
For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
(string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+ "Remove the @samp{./} and @samp{../} segments in @var{path}, as
+ RFC3986, section 5.2.4."
+ (let scan ((input
+ (let ((components (split-and-decode-uri-path path)))
+ (if (string-suffix? "/" path)
+ `(,@components "")
+ components)))
+ (input-path-absolute? (string-prefix? "/" path))
+ (output '())
+ (output-absolute? #f)
+ (output-ends-in-/? (string-suffix? "/" path)))
+ (cond
+ ((and input-path-absolute?
+ (null? input))
+ ;; Transfer the initial "/" from the input to the end of the
+ ;; output.
+ (scan '() #f output output-absolute? #t))
+ ((null? input)
+ (string-append
+ (if output-absolute? "/" "")
+ (encode-and-join-uri-path
+ (reverse output))
+ (if output-ends-in-/? "/" "")))
+ ((and (not input-path-absolute?)
+ (or (equal? (car input) "..")
+ (equal? (car input) ".")))
+ (scan (cdr input) #f output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) "."))
+ (scan (cdr input) #t output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) ".."))
+ (scan (cdr input) #t
+ (if (null? output)
+ output
+ (cdr output))
+ ;; Remove the last segment, including the preceding /. So,
+ ;; if there is 0 or 1 segment, remove the root / too.
+ (if (or (null? output) (null? (cdr output)))
+ #f ;; remove the /
+ #t) ;; keep it
+ #f))
+ (else
+ (scan (cdr input)
+ ;; If there is only 1 item in input, then it does not end in
+ ;; /, so the recursive call does not start with
+ ;; /. Otherwise, the recursive call starts with /.
+ (not (null? (cdr input)))
+ (cons (car input) output)
+ ;; If the output is empty and the input path is absolute,
+ ;; the / of the transferred path is transferred as well.
+ (or output-absolute?
+ (and (null? output)
+ input-path-absolute?))
+ #f)))))
+
+(define (merge-paths base-has-authority? base relative)
+ "Return @samp{@var{base}/@var{relative}}, with the subtleties of
+absolute paths explained in RFC3986, section 5.2.3. If the base URI has
+an authority (userinfo, host, port), then the processing is a bit
+different."
+ (if (and base-has-authority?
+ (equal? base ""))
+ (string-append "/" relative)
+ (let ((last-/ (string-rindex base #\/)))
+ (if last-/
+ (string-append (substring base 0 last-/) "/" relative)
+ relative))))
+
+(define (resolve-relative-reference base relative)
+ "Resolve @var{relative} on top of @var{base}, as RFC3986, section
+5.2. Both @var{relative} and @var{base} may be URI or relative
+references. The name ``relative'' indicates the argument’s relationship
+to @var{base}, not its type. Both @var{base} and @var{relative} may be
+full URIs or relative references. The return value is a URI if either
+@var{relative} or @var{base} is a URI."
+ (let ((b-scheme (uri-scheme base))
+ (b-userinfo (uri-userinfo base))
+ (b-host (uri-host base))
+ (b-port (uri-port base))
+ (b-path (uri-path base))
+ (b-query (uri-query base))
+ (b-fragment (uri-fragment base))
+ (r-scheme (uri-scheme relative))
+ (r-userinfo (uri-userinfo relative))
+ (r-host (uri-host relative))
+ (r-port (uri-port relative))
+ (r-path (uri-path relative))
+ (r-query (uri-query relative))
+ (r-fragment (uri-fragment relative))
+ (t-scheme #f)
+ (t-userinfo #f)
+ (t-host #f)
+ (t-port #f)
+ (t-path "")
+ (t-query #f)
+ (t-fragment #f))
+ ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
+
+ ;;The programming style uses mutations to better adhere to the
+ ;;algorithm specification.
+ (if r-scheme
+ (begin
+ (set! t-scheme r-scheme)
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined:
+ (begin
+ (if r-host
+ (begin
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined, r-authority is not defined:
+ (begin
+ (if (equal? r-path "")
+ (begin
+ (set! t-path b-path)
+ (if r-query
+ ;; r-scheme, r-authority, r-path are not
+ ;; defined:
+ (set! t-query r-query)
+ ;; r-scheme, r-authority, r-path, r-query are
+ ;; not defined:
+ (set! t-query b-query)))
+ ;; r-scheme, r-authority not defined, r-path defined:
+ (begin
+ (if (string-prefix? "/" r-path)
+ ;; r-scheme, r-authority not defined, r-path
+ ;; absolute:
+ (set! t-path (remove-dot-segments r-path))
+ ;; r-scheme, r-authority not defined, r-path
+ ;; relative:
+ (set! t-path
+ (remove-dot-segments
+ (merge-paths b-host b-path r-path))))
+ (set! t-query r-query)))
+ (set! t-userinfo b-userinfo)
+ (set! t-host b-host)
+ (set! t-port b-port)))
+ (set! t-scheme b-scheme)))
+ (set! t-fragment r-fragment)
+ (build-uri-reference #:scheme t-scheme
+ #:userinfo t-userinfo
+ #:host t-host
+ #:port t-port
+ #:path t-path
+ #:query t-query
+ #:fragment t-fragment)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..cdd0dc7b6 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -20,6 +20,7 @@
(define-module (test-web-uri)
#:use-module (web uri)
#:use-module (ice-9 regex)
+ #:use-module (ice-9 string-fun)
#:use-module (test-suite lib))
@@ -693,3 +694,69 @@
(pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
(pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
(pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q"))
+ (equal/encoded?
+ ;; The test suite checks for ';' characters, but Guile escapes
+ ;; them in URIs. Same for '='.
+ (let ((escape-colon
+ (lambda (x)
+ (string-replace-substring x ";" "%3B")))
+ (escape-equal
+ (lambda (x)
+ (string-replace-substring x "=" "%3D"))))
+ (lambda (x y)
+ (equal? (escape-colon (escape-equal x))
+ (escape-colon (escape-equal y)))))))
+ (let ((resolve
+ (lambda (relative)
+ (let* ((relative-uri
+ (string->uri-reference relative))
+ (resolved-uri
+ (resolve-relative-reference base relative-uri))
+ (resolved (uri->string resolved-uri)))
+ resolved))))
+ (with-test-prefix "normal"
+ (pass-if (equal/encoded? (resolve "g:h") "g:h"))
+ (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "//g") "http://g"))
+ (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y"))
+ (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y"))
+ (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s"))
+ (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+ (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
+ (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+ (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
+ (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "../..") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
+ (with-test-prefix "abnormal"
+ (pass-if (equal/encoded? (resolve "../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
+ (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
+ (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g.."))
+ (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g"))
+ (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h"))
+ (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h"))
+ (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+ (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y"))
+ (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+ (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+ (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+ (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+ (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
--
2.41.0
next prev parent reply other threads:[~2023-09-25 16:48 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-10-28 9:01 [PATCH 0/3] Parse the Link header Vivien Kraus
2023-09-25 16:48 ` Vivien Kraus [this message]
2023-10-28 8:51 ` [PATCH 3/3] Parse the HTTP " Vivien Kraus
2023-10-28 8:57 ` [PATCH 2/3] Update section comment in (web http) Vivien Kraus
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=d6f4e5597bb61744552e4062020000eba2270dad.1698483663.git.vivien@planete-kraus.eu \
--to=vivien@planete-kraus.eu \
--cc=guile-devel@gnu.org \
--cc=maximedevos@telenet.be \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).