* [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
From: Vivien Kraus @ 2023-09-25 16:48 UTC
To: guile-devel
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
---
Dear Guile developers,
When you request a URI such as https://example.com/resource and get
redirected to "here", you end up with two URI references:
- https://example.com/resource
- here
What should you request next? The answer is
"https://example.com/here". In this case, it seems evident how to get
from one to the other.
However, there are more subtle cases. What if you get redirected to
"../here", for instance?
RFC 3986 has you covered in section 5.2, which explains how to compute
the target URI from a base URI and a URI reference.
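For readers less familiar with section 5.2, the following is a minimal sketch of the two path helpers involved in the "../here" case, transcribed from the RFC's own steps (section 5.2.3 for merging, section 5.2.4 for removing dot segments). It is not the patch's implementation: the names rfc-merge-paths and rfc-remove-dot-segments are hypothetical, and, unlike the patch, this version works on raw path strings without decoding or re-encoding segments, so it assumes the paths are already percent-encoded.

```scheme
;; Hypothetical helpers, not part of (web uri) nor of this patch: a
;; direct transcription of RFC 3986, sections 5.2.3 and 5.2.4, working
;; on raw (still percent-encoded) path strings.

(define (rfc-merge-paths base-has-authority? base-path ref-path)
  "Merge REF-PATH onto BASE-PATH, as in RFC 3986, section 5.2.3."
  (if (and base-has-authority? (string-null? base-path))
      (string-append "/" ref-path)
      ;; Keep BASE-PATH up to and including its last "/", if any.
      (let ((last-slash (string-rindex base-path #\/)))
        (if last-slash
            (string-append (substring base-path 0 (+ last-slash 1))
                           ref-path)
            ref-path))))

(define (rfc-remove-dot-segments path)
  "Remove \".\" and \"..\" segments from PATH, as in RFC 3986, 5.2.4."
  ;; Drop the last segment of OUT together with its preceding "/".
  (define (drop-last-segment out)
    (let ((slash (string-rindex out #\/)))
      (if slash (substring out 0 slash) "")))
  (let loop ((in path) (out ""))
    (cond
     ((string-null? in) out)
     ;; Step A: strip a leading "../" or "./".
     ((string-prefix? "../" in) (loop (substring in 3) out))
     ((string-prefix? "./" in) (loop (substring in 2) out))
     ;; Step B: "/./" or a final "/." becomes "/".
     ((string-prefix? "/./" in) (loop (substring in 2) out))
     ((string=? in "/.") (loop "/" out))
     ;; Step C: "/../" or a final "/.." becomes "/", and the last
     ;; output segment is dropped.
     ((string-prefix? "/../" in)
      (loop (substring in 3) (drop-last-segment out)))
     ((string=? in "/..") (loop "/" (drop-last-segment out)))
     ;; Step D: a lone "." or "..".
     ((or (string=? in ".") (string=? in "..")) (loop "" out))
     ;; Step E: move the first segment (with its leading "/", if any)
     ;; from the input to the output.
     (else
      (let ((next (string-index in #\/ 1)))
        (if next
            (loop (substring in next)
                  (string-append out (substring in 0 next)))
            (loop "" (string-append out in))))))))

;; The "../here" redirection from https://example.com/resource:
(display (rfc-remove-dot-segments
          (rfc-merge-paths #t "/resource" "../here")))
(newline) ; prints /here
```

With the base path /resource and the reference ../here, merging gives /../here, and removing dot segments gives /here, hence https://example.com/here.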
What do you think?
Best regards,
Vivien
NEWS | 7 ++
module/web/uri.scm | 152 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 68 +++++++++++++++
3 files changed, 226 insertions(+), 1 deletion(-)
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
* New interfaces and functionality
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
** New warning: unused-module
This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..2b80c3847 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -47,7 +47,9 @@
uri-reference? relative-ref?
build-uri-reference build-relative-ref
- string->uri-reference string->relative-ref))
+ string->uri-reference string->relative-ref
+
+ resolve-relative-reference))
(define-record-type <uri>
(make-uri scheme userinfo host port path query fragment)
@@ -501,3 +503,151 @@ strings, and join the parts together with ‘/’ as a delimiter.
For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
(string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+ "Remove the @samp{./} and @samp{../} segments in @var{path}, as in
+ RFC 3986, section 5.2.4."
+ (let scan ((input
+ (let ((components (split-and-decode-uri-path path)))
+ (if (string-suffix? "/" path)
+ `(,@components "")
+ components)))
+ (input-path-absolute? (string-prefix? "/" path))
+ (output '())
+ (output-absolute? #f)
+ (output-ends-in-/? (string-suffix? "/" path)))
+ (cond
+ ((and input-path-absolute?
+ (null? input))
+ ;; Transfer the initial "/" from the input to the end of the
+ ;; output.
+ (scan '() #f output output-absolute? #t))
+ ((null? input)
+ (string-append
+ (if output-absolute? "/" "")
+ (encode-and-join-uri-path
+ (reverse output))
+ (if output-ends-in-/? "/" "")))
+ ((and (not input-path-absolute?)
+ (or (equal? (car input) "..")
+ (equal? (car input) ".")))
+ (scan (cdr input) #f output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) "."))
+ (scan (cdr input) #t output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) ".."))
+ (scan (cdr input) #t
+ (if (null? output)
+ output
+ (cdr output))
+ ;; Remove the last segment, including the preceding /. So,
+ ;; if there is 0 or 1 segment, remove the root / too.
+ (if (or (null? output) (null? (cdr output)))
+ #f ;; remove the /
+ #t) ;; keep it
+ #f))
+ (else
+ (scan (cdr input)
+ ;; If there is only 1 item in input, then it does not end in
+ ;; /, so the recursive call does not start with
+ ;; /. Otherwise, the recursive call starts with /.
+ (not (null? (cdr input)))
+ (cons (car input) output)
+ ;; If the output is empty and the input path is absolute,
+ ;; the / of the transferred path is transferred as well.
+ (or output-absolute?
+ (and (null? output)
+ input-path-absolute?))
+ #f)))))
+
+(define (merge-paths base-has-authority? base relative)
+ "Return @samp{@var{base}/@var{relative}}, with the subtleties of absolute
+ paths explained in RFC 3986, section 5.2.3: everything after the last
+@samp{/} of @var{base} is dropped, and if the base URI has an authority
+(userinfo, host, port) and an empty path, the result is @samp{/@var{relative}}."
+ (if (and base-has-authority?
+ (equal? base ""))
+ (string-append "/" relative)
+ (let ((last-/ (string-rindex base #\/)))
+ (if last-/
+ (string-append (substring base 0 last-/) "/" relative)
+ relative))))
+
+(define (resolve-relative-reference base relative)
+ "Resolve @var{relative} on top of @var{base}, as in RFC 3986, section 5.2."
+ (let ((b-scheme (uri-scheme base))
+ (b-userinfo (uri-userinfo base))
+ (b-host (uri-host base))
+ (b-port (uri-port base))
+ (b-path (uri-path base))
+ (b-query (uri-query base))
+ (b-fragment (uri-fragment base))
+ (r-scheme (uri-scheme relative))
+ (r-userinfo (uri-userinfo relative))
+ (r-host (uri-host relative))
+ (r-port (uri-port relative))
+ (r-path (uri-path relative))
+ (r-query (uri-query relative))
+ (r-fragment (uri-fragment relative))
+ (t-scheme #f)
+ (t-userinfo #f)
+ (t-host #f)
+ (t-port #f)
+ (t-path "")
+ (t-query #f)
+ (t-fragment #f))
+ ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
+ (if r-scheme
+ (begin
+ (set! t-scheme r-scheme)
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined:
+ (begin
+ (if r-host
+ (begin
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined, r-authority is not defined:
+ (begin
+ (if (equal? r-path "")
+ (begin
+ (set! t-path b-path)
+ (if r-query
+ ;; r-scheme, r-authority, r-path are not
+ ;; defined:
+ (set! t-query r-query)
+ ;; r-scheme, r-authority, r-path, r-query are
+ ;; not defined:
+ (set! t-query b-query)))
+ ;; r-scheme, r-authority not defined, r-path defined:
+ (begin
+ (if (string-prefix? "/" r-path)
+ ;; r-scheme, r-authority not defined, r-path
+ ;; absolute:
+ (set! t-path (remove-dot-segments r-path))
+ ;; r-scheme, r-authority not defined, r-path
+ ;; relative:
+ (set! t-path
+ (remove-dot-segments
+ (merge-paths b-host b-path r-path))))
+ (set! t-query r-query)))
+ (set! t-userinfo b-userinfo)
+ (set! t-host b-host)
+ (set! t-port b-port)))
+ (set! t-scheme b-scheme)))
+ (set! t-fragment r-fragment)
+ (build-uri-reference #:scheme t-scheme
+ #:userinfo t-userinfo
+ #:host t-host
+ #:port t-port
+ #:path t-path
+ #:query t-query
+ #:fragment t-fragment)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..c453bf60f 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -20,6 +20,7 @@
(define-module (test-web-uri)
#:use-module (web uri)
#:use-module (ice-9 regex)
+ #:use-module (ice-9 string-fun)
#:use-module (test-suite lib))
@@ -693,3 +694,70 @@
(pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
(pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
(pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q"))
+ (equal/encoded?
+ ;; The RFC test cases contain ';' characters, but Guile escapes
+ ;; them in the resolved URIs. Same for '='.
+ (let ((escape-semicolon
+ (lambda (x)
+ (string-replace-substring x ";" "%3B")))
+ (escape-equal
+ (lambda (x)
+ (string-replace-substring x "=" "%3D"))))
+ (lambda (x y)
+ (equal? (escape-semicolon (escape-equal x))
+ (escape-semicolon (escape-equal y)))))))
+ (let ((resolve
+ (lambda (relative)
+ (let* ((relative-uri
+ (string->uri-reference relative))
+ (resolved-uri
+ (resolve-relative-reference base relative-uri))
+ (resolved (uri->string resolved-uri)))
+ resolved))))
+ (with-test-prefix "normal"
+ (pass-if (equal/encoded? (resolve "g:h") "g:h"))
+ (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "//g") "http://g"))
+ (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y"))
+ (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y"))
+ (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s"))
+ (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+ (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
+ (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+ (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
+ (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "../..") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
+ (with-test-prefix "abnormal"
+ (pass-if (equal/encoded? (resolve "../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
+ (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
+ (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g.."))
+ (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g"))
+ (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h"))
+ (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h"))
+ (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+ (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y"))
+ (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+ (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+ (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+ (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+ (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
+
base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb
--
2.41.0
* [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
From: Vivien Kraus @ 2023-09-25 16:48 UTC
To: Maxime Devos, guile-devel
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed.
---
I clarified the situation regarding redirections. I don’t think
following them is Guile’s job. For permanent redirections (301), the
application developer is supposed to update the pages that point to the
moved resource anyway. A handful of security issues are also likely
lurking in the shadows, and I don’t think handling them should be the
responsibility of the Guile web client.
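As an illustration of the kind of policy an application could apply before following a redirection itself, here is a sketch of a hypothetical same-origin? predicate that accepts a target only if it has the same scheme, host and port as the original request. It relies only on string->uri, uri-scheme, uri-host and uri-port from the existing (web uri) module. The default-port table covering only http and https is an assumption, and the check does not, by itself, guard against a public host name whose DNS record points to an internal address.

```scheme
(use-modules (web uri))

;; Assumption: only the http and https default ports are known here.
(define (effective-port uri)
  "Return the explicit port of URI, or the default port of its scheme."
  (or (uri-port uri)
      (case (uri-scheme uri)
        ((http) 80)
        ((https) 443)
        (else #f))))

;; Hypothetical helper, not part of (web uri).
(define (same-origin? a b)
  "Return #t if URI objects A and B have the same scheme, host and port."
  (let ((host-a (uri-host a))
        (host-b (uri-host b)))
    (and (eq? (uri-scheme a) (uri-scheme b))
         host-a
         host-b
         ;; Host names are compared case-insensitively.
         (string-ci=? host-a host-b)
         (eqv? (effective-port a) (effective-port b)))))

(display (same-origin? (string->uri "https://example.com/old")
                       (string->uri "https://example.com:443/new")))
(newline)
```

An application would typically resolve the Location value against the requested URI first, and only then compare the result with the original URI.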
The specification uses the word "relative" both for a kind of URI
reference (a relative reference, which has no scheme) and for the
asymmetric relation between the two arguments of the algorithm. I think
"base" and "dependent" are clearer; what do you think?
The semicolon and equal sign are both reserved characters, so it’s
expected that Guile escapes them. If there’s a bug, it is in section 5.4
of the RFC. However, I understand that it would be desirable for the
algorithm to accept such unescaped characters, since it works on URIs in
isolation and not within an HTTP message or a web page.
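The escaping observed in the tests seems to come from the round trip that the patch's remove-dot-segments performs: it splits and decodes the path with split-and-decode-uri-path, then re-encodes it with encode-and-join-uri-path. The sketch below (the helper name decode-then-encode is hypothetical) reproduces that round trip with the existing (web uri) procedures. Note that RFC 3986, section 2.2, treats a reserved character and its percent-encoded form as not equivalent, so this round trip can change what a path means; it also drops leading and trailing slashes, which is why the patch tracks them separately.

```scheme
(use-modules (web uri))

;; Hypothetical helper: decode the path into segments, then
;; percent-encode and join them again, as remove-dot-segments does.
(define (decode-then-encode path)
  (encode-and-join-uri-path (split-and-decode-uri-path path)))

(display (decode-then-encode "g;x=1/y"))
(newline) ; prints g%3Bx%3D1/y
```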
NEWS | 7 ++
doc/ref/web.texi | 27 +++++-
module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 68 ++++++++++++++
4 files changed, 261 insertions(+), 2 deletions(-)
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
* New interfaces and functionality
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
** New warning: unused-module
This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..2267c9774 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
as @code{"scrambled%20eggs/biscuits%26gravy"}.
@end deffn
+@node Subtypes of URI
@subsubheading Subtypes of URI
As we noted above, not all URI objects have a scheme. You might have
@@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while asserting that no scheme is
present. Return @code{#f} if the string could not be parsed.
@end deffn
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm. For
+instance, given a base URI, @samp{https://example.com/over/here}, and a
+relative reference, @samp{../no/there}, it may seem easy to get an
+absolute URI as @samp{https://example.com/over/../no/there}. It is
+possible that the server at @samp{https://example.com} serves the
+same resource under this URL and under
+@samp{https://example.com/no/there}. However, a web cache or a linked
+data processor must understand that the relative reference resolution
+leads to @samp{https://example.com/no/there}.
+
+@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{dependent}
+Return a URI object representing @var{dependent}, taking any missing
+components from @var{base}, as defined in section 5.2 of RFC 3986. If
+either @var{base} or @var{dependent} is an absolute URI object, the
+result is also an absolute URI object: this function cannot return a
+relative reference in that case.
+@end deffn
@node HTTP
@subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1058,8 @@ The entity-tag of the resource.
@deftypevr {HTTP Header} URI-reference location
A URI reference on which a request may be completed. Used in
combination with a redirecting status code to perform client-side
-redirection.
+redirection. @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
@example
(parse-header 'location "http://example.com/other")
@result{} #<uri ...>
@@ -1501,6 +1522,10 @@ constants, such as @code{certificate-status/signer-not-found} or
Connect to the server corresponding to @var{uri} and make a request over
HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
+@code{http-request} does not follow redirections. If the server
+responds with a redirection, @code{http-request} returns that response,
+with its redirection status code (e.g. 301 or 302).
+
The following keyword arguments allow you to modify the requests in
various ways, for example attaching a body to the request, or setting
specific headers. The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..acec2d1e8 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -47,7 +47,9 @@
uri-reference? relative-ref?
build-uri-reference build-relative-ref
- string->uri-reference string->relative-ref))
+ string->uri-reference string->relative-ref
+
+ resolve-relative-reference))
(define-record-type <uri>
(make-uri scheme userinfo host port path query fragment)
@@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’ as a delimiter.
For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
(string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+ "Remove the @samp{./} and @samp{../} segments in @var{path}, as in
+ RFC 3986, section 5.2.4."
+ (let scan ((input
+ (let ((components (split-and-decode-uri-path path)))
+ (if (string-suffix? "/" path)
+ `(,@components "")
+ components)))
+ (input-path-absolute? (string-prefix? "/" path))
+ (output '())
+ (output-absolute? #f)
+ (output-ends-in-/? (string-suffix? "/" path)))
+ (cond
+ ((and input-path-absolute?
+ (null? input))
+ ;; Transfer the initial "/" from the input to the end of the
+ ;; output.
+ (scan '() #f output output-absolute? #t))
+ ((null? input)
+ (string-append
+ (if output-absolute? "/" "")
+ (encode-and-join-uri-path
+ (reverse output))
+ (if output-ends-in-/? "/" "")))
+ ((and (not input-path-absolute?)
+ (or (equal? (car input) "..")
+ (equal? (car input) ".")))
+ (scan (cdr input) #f output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) "."))
+ (scan (cdr input) #t output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) ".."))
+ (scan (cdr input) #t
+ (if (null? output)
+ output
+ (cdr output))
+ ;; Remove the last segment, including the preceding /. So,
+ ;; if there is 0 or 1 segment, remove the root / too.
+ (if (or (null? output) (null? (cdr output)))
+ #f ;; remove the /
+ #t) ;; keep it
+ #f))
+ (else
+ (scan (cdr input)
+ ;; If there is only 1 item in input, then it does not end in
+ ;; /, so the recursive call does not start with
+ ;; /. Otherwise, the recursive call starts with /.
+ (not (null? (cdr input)))
+ (cons (car input) output)
+ ;; If the output is empty and the input path is absolute,
+ ;; the / of the transferred path is transferred as well.
+ (or output-absolute?
+ (and (null? output)
+ input-path-absolute?))
+ #f)))))
+
+(define (merge-paths base-has-authority? base dependent)
+ "Return @samp{@var{base}/@var{dependent}}, with the subtleties of absolute
+ paths explained in RFC 3986, section 5.2.3: everything after the last
+@samp{/} of @var{base} is dropped, and if the base URI has an authority
+(userinfo, host, port) and an empty path, the result is @samp{/@var{dependent}}."
+ (if (and base-has-authority?
+ (equal? base ""))
+ (string-append "/" dependent)
+ (let ((last-/ (string-rindex base #\/)))
+ (if last-/
+ (string-append (substring base 0 last-/) "/" dependent)
+ dependent))))
+
+(define (resolve-relative-reference base dependent)
+ "Resolve @var{dependent} on top of @var{base}, as in RFC 3986, section
+5.2. Both @var{dependent} and @var{base} may be URIs or relative
+references. The return value is a URI if either @var{dependent} or
+@var{base} is a URI."
+ ;; Unlike RFC 3986, we use "dependent" instead of "relative" to
+ ;; avoid confusion between "URI" and "relative reference": the
+ ;; dependent URI may be either.
+ (let ((b-scheme (uri-scheme base))
+ (b-userinfo (uri-userinfo base))
+ (b-host (uri-host base))
+ (b-port (uri-port base))
+ (b-path (uri-path base))
+ (b-query (uri-query base))
+ (b-fragment (uri-fragment base))
+ (r-scheme (uri-scheme dependent))
+ (r-userinfo (uri-userinfo dependent))
+ (r-host (uri-host dependent))
+ (r-port (uri-port dependent))
+ (r-path (uri-path dependent))
+ (r-query (uri-query dependent))
+ (r-fragment (uri-fragment dependent))
+ (t-scheme #f)
+ (t-userinfo #f)
+ (t-host #f)
+ (t-port #f)
+ (t-path "")
+ (t-query #f)
+ (t-fragment #f))
+ ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
+
+ ;; This code uses mutation to follow the algorithm specification
+ ;; more closely.
+ (if r-scheme
+ (begin
+ (set! t-scheme r-scheme)
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined:
+ (begin
+ (if r-host
+ (begin
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined, r-authority is not defined:
+ (begin
+ (if (equal? r-path "")
+ (begin
+ (set! t-path b-path)
+ (if r-query
+ ;; r-scheme, r-authority, r-path are not
+ ;; defined:
+ (set! t-query r-query)
+ ;; r-scheme, r-authority, r-path, r-query are
+ ;; not defined:
+ (set! t-query b-query)))
+ ;; r-scheme, r-authority not defined, r-path defined:
+ (begin
+ (if (string-prefix? "/" r-path)
+ ;; r-scheme, r-authority not defined, r-path
+ ;; absolute:
+ (set! t-path (remove-dot-segments r-path))
+ ;; r-scheme, r-authority not defined, r-path
+ ;; relative:
+ (set! t-path
+ (remove-dot-segments
+ (merge-paths b-host b-path r-path))))
+ (set! t-query r-query)))
+ (set! t-userinfo b-userinfo)
+ (set! t-host b-host)
+ (set! t-port b-port)))
+ (set! t-scheme b-scheme)))
+ (set! t-fragment r-fragment)
+ (build-uri-reference #:scheme t-scheme
+ #:userinfo t-userinfo
+ #:host t-host
+ #:port t-port
+ #:path t-path
+ #:query t-query
+ #:fragment t-fragment)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..c453bf60f 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -20,6 +20,7 @@
(define-module (test-web-uri)
#:use-module (web uri)
#:use-module (ice-9 regex)
+ #:use-module (ice-9 string-fun)
#:use-module (test-suite lib))
@@ -693,3 +694,70 @@
(pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
(pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
(pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q"))
+ (equal/encoded?
+ ;; The RFC test cases contain ';' characters, but Guile escapes
+ ;; them in the resolved URIs. Same for '='.
+ (let ((escape-semicolon
+ (lambda (x)
+ (string-replace-substring x ";" "%3B")))
+ (escape-equal
+ (lambda (x)
+ (string-replace-substring x "=" "%3D"))))
+ (lambda (x y)
+ (equal? (escape-semicolon (escape-equal x))
+ (escape-semicolon (escape-equal y)))))))
+ (let ((resolve
+ (lambda (relative)
+ (let* ((relative-uri
+ (string->uri-reference relative))
+ (resolved-uri
+ (resolve-relative-reference base relative-uri))
+ (resolved (uri->string resolved-uri)))
+ resolved))))
+ (with-test-prefix "normal"
+ (pass-if (equal/encoded? (resolve "g:h") "g:h"))
+ (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "//g") "http://g"))
+ (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y"))
+ (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y"))
+ (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s"))
+ (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+ (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
+ (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+ (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
+ (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "../..") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
+ (with-test-prefix "abnormal"
+ (pass-if (equal/encoded? (resolve "../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
+ (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
+ (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g.."))
+ (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g"))
+ (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h"))
+ (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h"))
+ (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+ (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y"))
+ (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+ (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+ (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+ (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+ (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
+
base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb
--
2.41.0
* [PATCH v3] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
From: Vivien Kraus @ 2023-09-25 16:48 UTC
To: Maxime Devos, guile-devel
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed,
and warn about blindly following them.
* AUTHORS: Mention me.
* THANKS: Thank me.
---
Hello! I added a warning about following redirections, and reverted to
"relative", with a note explaining the name. "maybe-relative" is not a
term used in the RFC, and it may lead people to think that the "base"
cannot be a relative reference (it can).
I added my own copyright line, but I would prefer to assign the copyright to the FSF.
Best regards,
Vivien
AUTHORS | 8 ++
NEWS | 7 ++
THANKS | 1 +
doc/ref/web.texi | 43 ++++++++-
module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 67 ++++++++++++++
6 files changed, 285 insertions(+), 2 deletions(-)
diff --git a/AUTHORS b/AUTHORS
index d756a74ce..2a95d3b0b 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore:
Gregory Marton:
In the subdirectory test-suite/tests, changes to:
hash.test
+
+Vivien Kraus:
+In the subdirectory module/web, changes to:
+ uri.scm
+In the subdirectory doc/ref, changes to:
+ web.texi
+In the subdirectory test-suite/tests, changes to:
+ web-uri.test
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
* New interfaces and functionality
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
** New warning: unused-module
This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/THANKS b/THANKS
index aa4877e95..a1f982f04 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Contributors since the last release:
Chris K Jester-Young
David Kastrup
Daniel Kraft
+ Vivien Kraus
Daniel Krueger
Noah Lavine
Christopher Lemmer Webber
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..a97e4cc95 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
as @code{"scrambled%20eggs/biscuits%26gravy"}.
@end deffn
+@node Subtypes of URI
@subsubheading Subtypes of URI
As we noted above, not all URI objects have a scheme. You might have
@@ -356,6 +357,32 @@ Parse @var{string} into a URI object, while asserting that no scheme is
present. Return @code{#f} if the string could not be parsed.
@end deffn
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm. For
+instance, given a base URI, @samp{https://example.com/over/here}, and a
+relative reference, @samp{../no/there}, it may seem easy to get an
+absolute URI as @samp{https://example.com/over/../no/there}. It is
+possible that the server at @samp{https://example.com} serves the
+same resource under this URL and under
+@samp{https://example.com/no/there}. However, a web cache or a linked
+data processor must understand that the relative reference resolution
+leads to @samp{https://example.com/no/there}.
+
+@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{relative}
+Return a URI object representing @var{relative}, taking any missing
+components from @var{base}, as defined in section 5.2 of RFC 3986. Both
+@var{base} and @var{relative} may be full URIs or relative
+references. The name ``relative'' indicates the argument’s relationship
+to @var{base}, not its type. If either @var{base} or @var{relative} is
+an absolute URI object, the result is also an absolute URI object: this
+function cannot return a relative reference in that case.
+
+Please note that any part of @var{base} may be overridden by
+@var{relative}. For instance, if @var{base} has an @code{https} URI
+scheme and @var{relative} has an @code{http} scheme, then the result
+will have an @code{http} scheme.
+@end deffn
@node HTTP
@subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1065,8 @@ The entity-tag of the resource.
@deftypevr {HTTP Header} URI-reference location
A URI reference on which a request may be completed. Used in
combination with a redirecting status code to perform client-side
-redirection.
+redirection. @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
@example
(parse-header 'location "http://example.com/other")
@result{} #<uri ...>
@@ -1501,6 +1529,19 @@ constants, such as @code{certificate-status/signer-not-found} or
Connect to the server corresponding to @var{uri} and make a request over
HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
+@code{http-request} does not follow redirections. If the server
+responds with a redirection, @code{http-request} returns that response,
+with its redirection status code (e.g. 301 or 302).
+
+Be warned that if you are hosting a private HTTP(S) server on your
+system, you should be careful when sending requests to public
+servers. A malicious public server might make its DNS record point
+to your internal address, or it might redirect you to your internal
+server. In the first case, or if you follow the redirection in the
+second case, you could wrongly treat the response from your private
+server as public, since it would appear to originate from the public
+server.
+
The following keyword arguments allow you to modify the requests in
various ways, for example attaching a body to the request, or setting
specific headers. The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..319010097 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,7 @@
;;;; (web uri) --- URI manipulation tools
;;;;
;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2023 Vivien Kraus
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -47,7 +48,9 @@
uri-reference? relative-ref?
build-uri-reference build-relative-ref
- string->uri-reference string->relative-ref))
+ string->uri-reference string->relative-ref
+
+ resolve-relative-reference))
(define-record-type <uri>
(make-uri scheme userinfo host port path query fragment)
@@ -501,3 +504,159 @@ strings, and join the parts together with ‘/’ as a delimiter.
For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
(string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+ "Remove the @samp{./} and @samp{../} segments in @var{path}, as
+ specified in RFC 3986, section 5.2.4."
+ (let scan ((input
+ (let ((components (split-and-decode-uri-path path)))
+ (if (string-suffix? "/" path)
+ `(,@components "")
+ components)))
+ (input-path-absolute? (string-prefix? "/" path))
+ (output '())
+ (output-absolute? #f)
+ (output-ends-in-/? (string-suffix? "/" path)))
+ (cond
+ ((and input-path-absolute?
+ (null? input))
+ ;; Transfer the initial "/" from the input to the end of the
+ ;; output.
+ (scan '() #f output output-absolute? #t))
+ ((null? input)
+ (string-append
+ (if output-absolute? "/" "")
+ (encode-and-join-uri-path
+ (reverse output))
+ (if output-ends-in-/? "/" "")))
+ ((and (not input-path-absolute?)
+ (or (equal? (car input) "..")
+ (equal? (car input) ".")))
+ (scan (cdr input) #f output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) "."))
+ (scan (cdr input) #t output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) ".."))
+ (scan (cdr input) #t
+ (if (null? output)
+ output
+ (cdr output))
+ ;; Remove the last segment, including the preceding /. So,
+ ;; if there is 0 or 1 segment, remove the root / too.
+ (if (or (null? output) (null? (cdr output)))
+ #f ;; remove the /
+ #t) ;; keep it
+ #f))
+ (else
+ (scan (cdr input)
+ ;; If there is only 1 item in input, then it does not end in
+ ;; /, so the recursive call does not start with
+ ;; /. Otherwise, the recursive call starts with /.
+ (not (null? (cdr input)))
+ (cons (car input) output)
+ ;; If the output is empty and the input path is absolute,
+ ;; the / of the transferred path is transferred as well.
+ (or output-absolute?
+ (and (null? output)
+ input-path-absolute?))
+ #f)))))
+
+(define (merge-paths base-has-authority? base relative)
+ "Merge @var{relative} into @var{base}, as specified in RFC 3986,
+section 5.2.3: everything after the last @samp{/} of @var{base} (or all
+of @var{base}, if it has none) is replaced by @var{relative}. If the
+base URI has an authority and an empty path, return @samp{/@var{relative}}."
+ (if (and base-has-authority?
+ (equal? base ""))
+ (string-append "/" relative)
+ (let ((last-/ (string-rindex base #\/)))
+ (if last-/
+ (string-append (substring base 0 last-/) "/" relative)
+ relative))))
+
+(define (resolve-relative-reference base relative)
+ "Resolve @var{relative} on top of @var{base}, as specified in RFC 3986,
+section 5.2. Both @var{base} and @var{relative} are URI objects, and
+each may be a full URI or a relative reference. The name ``relative''
+indicates the argument’s relationship to @var{base}, not its type. The
+return value is a full URI if either @var{relative} or @var{base} is a
+full URI."
+ (let ((b-scheme (uri-scheme base))
+ (b-userinfo (uri-userinfo base))
+ (b-host (uri-host base))
+ (b-port (uri-port base))
+ (b-path (uri-path base))
+ (b-query (uri-query base))
+ (b-fragment (uri-fragment base))
+ (r-scheme (uri-scheme relative))
+ (r-userinfo (uri-userinfo relative))
+ (r-host (uri-host relative))
+ (r-port (uri-port relative))
+ (r-path (uri-path relative))
+ (r-query (uri-query relative))
+ (r-fragment (uri-fragment relative))
+ (t-scheme #f)
+ (t-userinfo #f)
+ (t-host #f)
+ (t-port #f)
+ (t-path "")
+ (t-query #f)
+ (t-fragment #f))
+ ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
+
+ ;; This procedure uses mutation so as to follow the algorithm
+ ;; specification more closely.
+ (if r-scheme
+ (begin
+ (set! t-scheme r-scheme)
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined:
+ (begin
+ (if r-host
+ (begin
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined, r-authority is not defined:
+ (begin
+ (if (equal? r-path "")
+ (begin
+ (set! t-path b-path)
+ (if r-query
+ ;; r-scheme, r-authority, r-path are not
+ ;; defined:
+ (set! t-query r-query)
+ ;; r-scheme, r-authority, r-path, r-query are
+ ;; not defined:
+ (set! t-query b-query)))
+ ;; r-scheme, r-authority not defined, r-path defined:
+ (begin
+ (if (string-prefix? "/" r-path)
+ ;; r-scheme, r-authority not defined, r-path
+ ;; absolute:
+ (set! t-path (remove-dot-segments r-path))
+ ;; r-scheme, r-authority not defined, r-path
+ ;; relative:
+ (set! t-path
+ (remove-dot-segments
+ (merge-paths b-host b-path r-path))))
+ (set! t-query r-query)))
+ (set! t-userinfo b-userinfo)
+ (set! t-host b-host)
+ (set! t-port b-port)))
+ (set! t-scheme b-scheme)))
+ (set! t-fragment r-fragment)
+ (build-uri-reference #:scheme t-scheme
+ #:userinfo t-userinfo
+ #:host t-host
+ #:port t-port
+ #:path t-path
+ #:query t-query
+ #:fragment t-fragment)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..cdd0dc7b6 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -20,6 +20,7 @@
(define-module (test-web-uri)
#:use-module (web uri)
#:use-module (ice-9 regex)
+ #:use-module (ice-9 string-fun)
#:use-module (test-suite lib))
@@ -693,3 +694,69 @@
(pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
(pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
(pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q"))
+ (equal/encoded?
+ ;; The RFC test suite expects literal ';' characters, but the
+ ;; resolved paths may contain them percent-escaped. Same for '='.
+ (let ((escape-semicolon
+ (lambda (x)
+ (string-replace-substring x ";" "%3B")))
+ (escape-equal
+ (lambda (x)
+ (string-replace-substring x "=" "%3D"))))
+ (lambda (x y)
+ (equal? (escape-semicolon (escape-equal x))
+ (escape-semicolon (escape-equal y)))))))
+ (let ((resolve
+ (lambda (relative)
+ (let* ((relative-uri
+ (string->uri-reference relative))
+ (resolved-uri
+ (resolve-relative-reference base relative-uri))
+ (resolved (uri->string resolved-uri)))
+ resolved))))
+ (with-test-prefix "normal"
+ (pass-if (equal/encoded? (resolve "g:h") "g:h"))
+ (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "//g") "http://g"))
+ (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y"))
+ (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y"))
+ (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s"))
+ (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+ (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
+ (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+ (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
+ (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "../..") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
+ (with-test-prefix "abnormal"
+ (pass-if (equal/encoded? (resolve "../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
+ (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
+ (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g.."))
+ (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g"))
+ (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h"))
+ (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h"))
+ (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+ (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y"))
+ (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+ (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+ (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+ (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+ (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb
--
2.41.0
* [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-10 21:44 ` Maxime Devos
@ 2023-09-25 16:48 ` Vivien Kraus
2023-11-02 20:00 ` Nathan via Developers list for Guile, the GNU extensibility library
0 siblings, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-09-25 16:48 UTC (permalink / raw)
To: Maxime Devos, guile-devel
[-- Attachment #1: Type: text/plain, Size: 18074 bytes --]
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed,
and warn about blindly following them.
* AUTHORS: Mention me.
* THANKS: Thank me.
---
Hello,
Is this wording better?
Best regards,
Vivien
AUTHORS | 8 ++
NEWS | 7 ++
THANKS | 1 +
doc/ref/web.texi | 41 ++++++++-
module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 67 ++++++++++++++
6 files changed, 283 insertions(+), 2 deletions(-)
diff --git a/AUTHORS b/AUTHORS
index d756a74ce..2a95d3b0b 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore:
Gregory Marton:
In the subdirectory test-suite/tests, changes to:
hash.test
+
+Vivien Kraus:
+In the subdirectory module/web, changes to:
+ uri.scm
+In the subdirectory doc/ref, changes to:
+ web.texi
+In the subdirectory test-suite/tests, changes to:
+ web-uri.test
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
* New interfaces and functionality
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
** New warning: unused-module
This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/THANKS b/THANKS
index aa4877e95..a1f982f04 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Contributors since the last release:
Chris K Jester-Young
David Kastrup
Daniel Kraft
+ Vivien Kraus
Daniel Krueger
Noah Lavine
Christopher Lemmer Webber
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..d92a8d51a 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
as @code{"scrambled%20eggs/biscuits%26gravy"}.
@end deffn
+@node Subtypes of URI
@subsubheading Subtypes of URI
As we noted above, not all URI objects have a scheme. You might have
@@ -356,6 +357,32 @@ Parse @var{string} into a URI object, while asserting that no scheme is
present. Return @code{#f} if the string could not be parsed.
@end deffn
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm. For
+instance, given the base URI @samp{https://example.com/over/here} and
+the relative reference @samp{../no/there}, it may seem enough to
+produce the absolute URI @samp{https://example.com/over/../no/there}.
+The server at @samp{https://example.com} may well serve the same
+resource under that URL as under @samp{https://example.com/no/there}.
+However, a web cache or a linked data processor must understand that
+the relative reference resolves to
+@samp{https://example.com/no/there}.
+
+@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{relative}
+Return a URI object representing @var{relative}, using the components of
+@var{base} where they are missing, as defined in section 5.2 of RFC
+3986. Both @var{base} and @var{relative} may be full URIs or relative
+references. The name ``relative'' indicates the argument’s relationship
+to @var{base}, not its type. If either @var{base} or @var{relative} is
+a full URI (that is, it has a scheme), the result is a full URI too,
+never a relative reference.
+
+Note that any component of @var{base} may be overridden by
+@var{relative}. For instance, if @var{base} has an @code{https} URI
+scheme and @var{relative} has an @code{http} scheme, the result
+has an @code{http} scheme.
+@end deffn
@node HTTP
@subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1065,8 @@ The entity-tag of the resource.
@deftypevr {HTTP Header} URI-reference location
A URI reference on which a request may be completed. Used in
combination with a redirecting status code to perform client-side
-redirection.
+redirection. @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
@example
(parse-header 'location "http://example.com/other")
@result{} #<uri ...>
@@ -1501,6 +1529,17 @@ constants, such as @code{certificate-status/signer-not-found} or
Connect to the server corresponding to @var{uri} and make a request over
HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
+@code{http-request} does not follow redirects. If the server responds
+with a redirect, @code{http-request} returns that response as is, with
+its redirecting status code (e.g. 301 or 302).
+
+Making web requests on a network where private servers are hosted comes
+with potential security risks. A malicious public server might point
+its DNS record at one of your internal addresses. It might also
+redirect you to one of your internal servers. In the first case, or in
+the second case if you follow the redirect, you may unknowingly connect
+to a private server while believing it is a public one.
+
The following keyword arguments allow you to modify the requests in
various ways, for example attaching a body to the request, or setting
specific headers. The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..319010097 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,7 @@
;;;; (web uri) --- URI manipulation tools
;;;;
;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2023 Vivien Kraus
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -47,7 +48,9 @@
uri-reference? relative-ref?
build-uri-reference build-relative-ref
- string->uri-reference string->relative-ref))
+ string->uri-reference string->relative-ref
+
+ resolve-relative-reference))
(define-record-type <uri>
(make-uri scheme userinfo host port path query fragment)
@@ -501,3 +504,159 @@ strings, and join the parts together with ‘/’ as a delimiter.
For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
(string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+ "Remove the @samp{./} and @samp{../} segments in @var{path}, as
+ specified in RFC 3986, section 5.2.4."
+ (let scan ((input
+ (let ((components (split-and-decode-uri-path path)))
+ (if (string-suffix? "/" path)
+ `(,@components "")
+ components)))
+ (input-path-absolute? (string-prefix? "/" path))
+ (output '())
+ (output-absolute? #f)
+ (output-ends-in-/? (string-suffix? "/" path)))
+ (cond
+ ((and input-path-absolute?
+ (null? input))
+ ;; Transfer the initial "/" from the input to the end of the
+ ;; output.
+ (scan '() #f output output-absolute? #t))
+ ((null? input)
+ (string-append
+ (if output-absolute? "/" "")
+ (encode-and-join-uri-path
+ (reverse output))
+ (if output-ends-in-/? "/" "")))
+ ((and (not input-path-absolute?)
+ (or (equal? (car input) "..")
+ (equal? (car input) ".")))
+ (scan (cdr input) #f output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) "."))
+ (scan (cdr input) #t output output-absolute? output-ends-in-/?))
+ ((and input-path-absolute?
+ (equal? (car input) ".."))
+ (scan (cdr input) #t
+ (if (null? output)
+ output
+ (cdr output))
+ ;; Remove the last segment, including the preceding /. So,
+ ;; if there is 0 or 1 segment, remove the root / too.
+ (if (or (null? output) (null? (cdr output)))
+ #f ;; remove the /
+ #t) ;; keep it
+ #f))
+ (else
+ (scan (cdr input)
+ ;; If there is only 1 item in input, then it does not end in
+ ;; /, so the recursive call does not start with
+ ;; /. Otherwise, the recursive call starts with /.
+ (not (null? (cdr input)))
+ (cons (car input) output)
+ ;; If the output is empty and the input path is absolute,
+ ;; the / of the transferred path is transferred as well.
+ (or output-absolute?
+ (and (null? output)
+ input-path-absolute?))
+ #f)))))
+
+(define (merge-paths base-has-authority? base relative)
+ "Merge @var{relative} into @var{base}, as specified in RFC 3986,
+section 5.2.3: everything after the last @samp{/} of @var{base} (or all
+of @var{base}, if it has none) is replaced by @var{relative}. If the
+base URI has an authority and an empty path, return @samp{/@var{relative}}."
+ (if (and base-has-authority?
+ (equal? base ""))
+ (string-append "/" relative)
+ (let ((last-/ (string-rindex base #\/)))
+ (if last-/
+ (string-append (substring base 0 last-/) "/" relative)
+ relative))))
+
+(define (resolve-relative-reference base relative)
+ "Resolve @var{relative} on top of @var{base}, as specified in RFC 3986,
+section 5.2. Both @var{base} and @var{relative} are URI objects, and
+each may be a full URI or a relative reference. The name ``relative''
+indicates the argument’s relationship to @var{base}, not its type. The
+return value is a full URI if either @var{relative} or @var{base} is a
+full URI."
+ (let ((b-scheme (uri-scheme base))
+ (b-userinfo (uri-userinfo base))
+ (b-host (uri-host base))
+ (b-port (uri-port base))
+ (b-path (uri-path base))
+ (b-query (uri-query base))
+ (b-fragment (uri-fragment base))
+ (r-scheme (uri-scheme relative))
+ (r-userinfo (uri-userinfo relative))
+ (r-host (uri-host relative))
+ (r-port (uri-port relative))
+ (r-path (uri-path relative))
+ (r-query (uri-query relative))
+ (r-fragment (uri-fragment relative))
+ (t-scheme #f)
+ (t-userinfo #f)
+ (t-host #f)
+ (t-port #f)
+ (t-path "")
+ (t-query #f)
+ (t-fragment #f))
+ ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
+
+ ;; This procedure uses mutation so as to follow the algorithm
+ ;; specification more closely.
+ (if r-scheme
+ (begin
+ (set! t-scheme r-scheme)
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined:
+ (begin
+ (if r-host
+ (begin
+ (set! t-userinfo r-userinfo)
+ (set! t-host r-host)
+ (set! t-port r-port)
+ (set! t-path (remove-dot-segments r-path))
+ (set! t-query r-query))
+ ;; r-scheme is not defined, r-authority is not defined:
+ (begin
+ (if (equal? r-path "")
+ (begin
+ (set! t-path b-path)
+ (if r-query
+ ;; r-scheme, r-authority, r-path are not
+ ;; defined:
+ (set! t-query r-query)
+ ;; r-scheme, r-authority, r-path, r-query are
+ ;; not defined:
+ (set! t-query b-query)))
+ ;; r-scheme, r-authority not defined, r-path defined:
+ (begin
+ (if (string-prefix? "/" r-path)
+ ;; r-scheme, r-authority not defined, r-path
+ ;; absolute:
+ (set! t-path (remove-dot-segments r-path))
+ ;; r-scheme, r-authority not defined, r-path
+ ;; relative:
+ (set! t-path
+ (remove-dot-segments
+ (merge-paths b-host b-path r-path))))
+ (set! t-query r-query)))
+ (set! t-userinfo b-userinfo)
+ (set! t-host b-host)
+ (set! t-port b-port)))
+ (set! t-scheme b-scheme)))
+ (set! t-fragment r-fragment)
+ (build-uri-reference #:scheme t-scheme
+ #:userinfo t-userinfo
+ #:host t-host
+ #:port t-port
+ #:path t-path
+ #:query t-query
+ #:fragment t-fragment)))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..cdd0dc7b6 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -20,6 +20,7 @@
(define-module (test-web-uri)
#:use-module (web uri)
#:use-module (ice-9 regex)
+ #:use-module (ice-9 string-fun)
#:use-module (test-suite lib))
@@ -693,3 +694,69 @@
(pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
(pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
(pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q"))
+ (equal/encoded?
+ ;; The RFC test suite expects literal ';' characters, but the
+ ;; resolved paths may contain them percent-escaped. Same for '='.
+ (let ((escape-semicolon
+ (lambda (x)
+ (string-replace-substring x ";" "%3B")))
+ (escape-equal
+ (lambda (x)
+ (string-replace-substring x "=" "%3D"))))
+ (lambda (x y)
+ (equal? (escape-semicolon (escape-equal x))
+ (escape-semicolon (escape-equal y)))))))
+ (let ((resolve
+ (lambda (relative)
+ (let* ((relative-uri
+ (string->uri-reference relative))
+ (resolved-uri
+ (resolve-relative-reference base relative-uri))
+ (resolved (uri->string resolved-uri)))
+ resolved))))
+ (with-test-prefix "normal"
+ (pass-if (equal/encoded? (resolve "g:h") "g:h"))
+ (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
+ (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "//g") "http://g"))
+ (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y"))
+ (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y"))
+ (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s"))
+ (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+ (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
+ (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+ (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
+ (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
+ (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
+ (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "../..") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../") "http://a/"))
+ (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
+ (with-test-prefix "abnormal"
+ (pass-if (equal/encoded? (resolve "../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
+ (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
+ (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
+ (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g.."))
+ (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g"))
+ (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
+ (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/"))
+ (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h"))
+ (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h"))
+ (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+ (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y"))
+ (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+ (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+ (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+ (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+ (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
base-commit: 79e836b8cc601a1259c934000a953a8d739ddd6f
--
2.41.0
* Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-09-25 16:48 [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2 Vivien Kraus
@ 2023-09-25 20:46 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v2] " Vivien Kraus
2023-10-03 20:03 ` [PATCH] " Vivien Kraus
0 siblings, 2 replies; 21+ messages in thread
From: Maxime Devos @ 2023-09-25 20:46 UTC (permalink / raw)
To: Vivien Kraus, guile-devel
[-- Attachment #1.1.1: Type: text/plain, Size: 5390 bytes --]
Op 25-09-2023 om 18:48 schreef Vivien Kraus:
> * module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
> (merge-paths): Implement algorithm 5.2.3.
> (resolve-relative-reference): Implement algorithm 5.2.2.
> (module): Export resolve-relative-reference.
> * NEWS: Reference it here.
> ---
> Dear Guile developers,
>
> When you request https://example.com/resource an URI and get redirected
> to "here", you end up with 2 URI references:
>
> - https://example.com/resource
> - here
>
> What should you request next? The answer is,
> "https://example.com/here". It seems evident how we go from one to the
> other.
>
> However, there are more subtle cases. What if you get redirected to
> "../here", for instance?
>
> RFC 3986 has you covered, in section 5.2. It explains how we go from a
> base URI and a URI reference to the new URI.
> What do you think?
>
> Best regards,
Sounds very useful. However, there are also some dangers in doing this
thing -- the ‘external’ page https://example.com/data.json could
redirect to
http://localhost/unsecured-secret-but-its-localhost-only-so-it-is-safe.
I forgot the name of this attack, but there is probably a page somewhere
that documents the danger and how to mitigate it (I think I read some
Firefox documentation somewhere?). Perhaps there exists an informative
RFC about it? I think it's worth putting a warning about this somewhere
in the documentation.
(Another related problem is that example.com could have IP address ::1,
but that's a different problem.)
>
> Vivien
>
> NEWS | 7 ++
> module/web/uri.scm | 152 +++++++++++++++++++++++++++++++++-
> test-suite/tests/web-uri.test | 68 +++++++++++++++
You forgot to update the non-docstring documentation to properly
document the new procedure.
> 3 files changed, 226 insertions(+), 1 deletion(-)
Given the existence of resolve-relative-reference, it is easy to expect
http-request to follow redirects. I think it would be best to adjust
the documentation of http-request / http-get / ... to mention whether
automatic redirection is done or not.
> +(define (resolve-relative-reference base relative)
> + "Resolve @var{relative} on top of @var{base}, as RFC3986, section 5.2."
I don't like the mutation, but it's a completely deterministic procedure
(ignoring memory allocation) so it can't cause problems and hence I
suppose it's not worth rewriting (unless you or someone else really
wants to rewrite it, I suppose).
I suppose it also avoids the risk of accidentally deviating from the RFC
it is supposed to implement, which is a major advantage of sticking with
the mutation.
I like that you say _which_ resolution method you are using instead of
saying or implying that this is always the _right_ way of resolving
relative references, because some URI schemes are rather quirky.
(I don't know any quirkiness w.r.t. relative references, but wouldn't be
surprised if it exists.)
Also, I think it's worth stating that both base and relative are URI
objects -- with the current docstring, passing a plain string, as in
(resolve-... uri "./whatever"), looks like a reasonable thing to do.
IIUC, there currently is nothing preventing
(resolve-... (uri object for "https://example.com/a")
(uri object for "https://example.com/b")).
IIUC, this is supposed to return (uri object for
"https://example.com/b"), but that could be more explicit with a change
of variable name.
(define (resolve-... base maybe-relative)
[...])
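The point about both arguments being URI objects can be checked with predicates that (web uri) already exports. In the sketch below (the example URIs are illustrative), a relative reference such as "./whatever" is parsed by string->uri-reference into the same kind of record as a full URI, just without a scheme. A caller holding a plain string would therefore have to convert it first.

```scheme
(use-modules (web uri))

;; A full URI: it has a scheme.
(define base (string->uri "https://example.com/a"))

;; A relative reference, parsed into the same kind of record as BASE
;; but without a scheme.  The raw string "./whatever" is not a URI
;; object, so it would have to be converted like this first.
(define ref (string->uri-reference "./whatever"))

(uri-reference? base)  ; #t: every parsed reference is a URI record
(uri? base)            ; true: BASE has a scheme
(relative-ref? ref)    ; true: REF has no scheme
(uri? ref)             ; #f
(uri-scheme ref)       ; #f
```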
> +(with-test-prefix "resolve relative reference"
> + ;; Test suite in RFC3986, section 5.4.
> + (let ((base (string->uri "http://a/b/c/d;p?q"))
> + (equal/encoded?
> + ;; The test suite checks for ';' characters, but Guile escapes
> + ;; them in URIs. Same for '='.
IIUC, that's a bug!
6.2.2.2. Percent-Encoding Normalization
The percent-encoding mechanism (Section 2.1) is a frequent source of
variance among otherwise identical URIs. In addition to the case
normalization issue noted above, some URI producers percent-encode
octets that do not require percent-encoding, resulting in URIs that
are equivalent to their non-encoded counterparts. __These URIs
should be normalized by decoding any percent-encoded octet that
corresponds to an unreserved character, as described in
Section 2.3.__
(Emphasis added.)
I am assuming here that ; is an unreserved character; if it isn't, there
isn't a bug here.
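For reference, ';' is not an unreserved character in RFC 3986: like '=', it is one of the sub-delims, which belong to the reserved set. The escaping that the test suite works around can be reproduced with procedures that (web uri) already exports. The patch's remove-dot-segments splits and decodes the path with split-and-decode-uri-path, then re-joins it with encode-and-join-uri-path, which relies on uri-encode. By default, uri-encode leaves only unreserved characters unescaped. The sample path below is taken from the RFC 3986, section 5.4 examples.

```scheme
(use-modules (web uri))

;; Split the path on "/" and percent-decode each segment; empty
;; segments (here, the one before the leading "/") are dropped.
(define segments (split-and-decode-uri-path "/b/c/g;x=1/y"))
;; segments => ("b" "c" "g;x=1" "y")

;; Percent-encode each segment with uri-encode and join them with "/".
;; ';' and '=' are sub-delims rather than unreserved characters, so
;; uri-encode escapes them by default.
(define rejoined (encode-and-join-uri-path segments))
;; rejoined => "b/c/g%3Bx%3D1/y"
```

Note that the round trip also loses the leading "/", which the patch restores separately.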
However, I sense a lack of normalisation in resolve-relative-reference,
so unless Guile already does normalisation elsewhere (perhaps it does
during the URI object construction?), there might be a bug here. OK,
technically perhaps not a bug, because the docstring only mentions
‘implements section 5.2’, and 5.2 doesn't seem to mention section 6
(and section 6 says ‘should’, not ‘shall/must’, though some people use
‘should’ as ‘shall/must’, so I don't know). But in that case it's still
a potential pitfall that could be mentioned in the docstring. It could be
as simple as "No normalisation is performed.".
I guess it shouldn't do normalisation, but guesswork is better avoided
(confirmed or disconfirmed) when possible.
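One way to sidestep the encoding question is to apply the algorithm of RFC 3986, section 5.2.4 directly to the path string. Segments are then never decoded, re-encoded or dropped. The sketch below assumes the input is already a valid percent-encoded URI path and applies the RFC's rules A to E in order. It is a hypothetical helper: remove-dot-segments/string is not part of Guile or of the patch.

```scheme
;; Hypothetical helper (not part of Guile or of the patch): RFC 3986,
;; section 5.2.4, applied to the raw path string.  Segments are never
;; decoded or re-encoded, so reserved characters such as ';' and '='
;; and empty segments survive unchanged.
(define (remove-dot-segments/string path)
  ;; Remove the last segment of OUT and its preceding "/", if any.
  (define (drop-last-segment out)
    (let ((slash (string-rindex out #\/)))
      (if slash (substring out 0 slash) "")))
  (let loop ((in path) (out ""))
    (cond
     ((string-null? in) out)
     ;; Rule A: strip a leading "../" or "./".
     ((string-prefix? "../" in) (loop (substring in 3) out))
     ((string-prefix? "./" in) (loop (substring in 2) out))
     ;; Rule B: replace a leading "/./", or a final "/.", with "/".
     ((string-prefix? "/./" in) (loop (substring in 2) out))
     ((string=? in "/.") (loop "/" out))
     ;; Rule C: like rule B for "/../" or "/..", also popping a segment.
     ((string-prefix? "/../" in)
      (loop (substring in 3) (drop-last-segment out)))
     ((string=? in "/..") (loop "/" (drop-last-segment out)))
     ;; Rule D: a lone "." or ".." is dropped.
     ((or (string=? in ".") (string=? in "..")) (loop "" out))
     ;; Rule E: move the first segment, with its leading "/" if any,
     ;; up to (but not including) the next "/".
     (else
      (let ((end (or (string-index in #\/ 1) (string-length in))))
        (loop (substring in end)
              (string-append out (substring in 0 end))))))))
```

With the merged path from the RFC's section 5.4 examples, (remove-dot-segments/string "/b/c/g;x=1/./y") returns "/b/c/g;x=1/y" with ';' and '=' left literal. A resolver built on such a helper would presumably not need the escaping workaround in the test suite for this case.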
Best regards,
Maxime Devos.
* Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-09-25 16:48 ` [PATCH v2] " Vivien Kraus
@ 2023-10-02 16:32 ` Vivien Kraus
2023-10-03 18:49 ` Maxime Devos
0 siblings, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-10-02 16:32 UTC (permalink / raw)
To: Maxime Devos, guile-devel
Hi!
Are there other things to fix?
Best regards,
Vivien
On Monday 25 September 2023 at 18:48 +0200, Vivien Kraus wrote:
> * module/web/uri.scm (remove-dot-segments): Implement algorithm
> 5.2.4.
> (merge-paths): Implement algorithm 5.2.3.
> (resolve-relative-reference): Implement algorithm 5.2.2.
> (module): Export resolve-relative-reference.
> * NEWS: Reference it here.
> * doc/ref/web.texi (URIs): Document it here.
> (Subtypes of URI): Add a @node declaration to cross-reference it.
> (HTTP Headers) [location]: Point to the section for different URI
> types.
> (Web Client) [http-request]: Indicate that no redirection is
> performed.
> ---
>
> I clarified the situation about redirections. I don’t think it’s
> Guile’s job to follow them. For permanent redirections (301), the
> application developer is supposed to edit the pages that point to the
> now-moved resource anyway. A handful of security issues must also be
> lurking in the shadows, and I don’t think they should be the
> responsibility of the Guile web client.
>
> The specification uses the word "relative" both for the type of URI
> that is most likely to be found, and to express the asymmetric
> relation between both arguments of the algorithm. I think "base" and
> "dependent" are clearer. What do you think?
>
> The semicolon and equal sign are both reserved characters, so it’s
> expected that Guile escapes them. If there’s a bug, it is in section
> 5.4 of the RFC. However, I understand that it would be desirable for
> the algorithm to accept such unescaped characters, since it works
> with URIs in isolation and not in an HTTP frame or web page.
>
> NEWS | 7 ++
> doc/ref/web.texi | 27 +++++-
> module/web/uri.scm | 161 +++++++++++++++++++++++++++++++++-
> test-suite/tests/web-uri.test | 68 ++++++++++++++
> 4 files changed, 261 insertions(+), 2 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index b319404d7..bdf75cb3c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
>
> * New interfaces and functionality
>
> +** New function in (web uri): resolve-relative-reference
> +
> +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
> +be used to request a moved resource in case of a 301 or 302 HTTP
> +response, by resolving the Location value of the response on top of the
> +requested URI.
> +
> ** New warning: unused-module
>
> This analysis, enabled at `-W2', issues warnings for modules that appear
> diff --git a/doc/ref/web.texi b/doc/ref/web.texi
> index 607c855b6..2267c9774 100644
> --- a/doc/ref/web.texi
> +++ b/doc/ref/web.texi
> @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
> as @code{"scrambled%20eggs/biscuits%26gravy"}.
> @end deffn
>
> +@node Subtypes of URI
> @subsubheading Subtypes of URI
>
> As we noted above, not all URI objects have a scheme. You might have
> @@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while asserting that no scheme is
> present. Return @code{#f} if the string could not be parsed.
> @end deffn
>
> +@cindex resolve URI reference
> +In order to get a URI object from a base URI and a relative reference,
> +one has to use a @dfn{relative URI reference resolution} algorithm. For
> +instance, given a base URI, @samp{https://example.com/over/here}, and a
> +relative reference, @samp{../no/there}, it may seem easy to get an
> +absolute URI as @samp{https://example.com/over/../no/there}. It is
> +possible that the server at @samp{https://example.com} could serve the
> +same resource under this URL as
> +@samp{https://example.com/no/there}. However, a web cache, or a linked
> +data processor, must understand that the relative reference resolution
> +leads to @samp{https://example.com/no/there}.
> +
> +@deffn {Scheme procedure} resolve-relative-reference @var{base} @var{dependent}
> +Return a URI object representing @var{dependent}, using the components
> +of @var{base} if missing, as defined in section 5.2 in RFC 3986. This
> +function cannot return a relative reference (it can only return an
> +absolute URI object), if either @var{base} or @var{dependent} is an
> +absolute URI object.
> +@end deffn
>
> @node HTTP
> @subsection The Hyper-Text Transfer Protocol
> @@ -1038,7 +1058,8 @@ The entity-tag of the resource.
> @deftypevr {HTTP Header} URI-reference location
> A URI reference on which a request may be completed. Used in
> combination with a redirecting status code to perform client-side
> -redirection.
> +redirection. @xref{Subtypes of URI, the distinction between types of
> +URI}, for more information on relative references.
> @example
> (parse-header 'location "http://example.com/other")
> @result{} #<uri ...>
> @@ -1501,6 +1522,10 @@ constants, such as @code{certificate-status/signer-not-found} or
> Connect to the server corresponding to @var{uri} and make a request over
> HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
>
> +@code{http-request} does not follow redirections. If a redirection is
> +required, @code{http-request} returns a response object with an adequate
> +response code (e.g. 301 or 302).
> +
> The following keyword arguments allow you to modify the requests in
> various ways, for example attaching a body to the request, or setting
> specific headers. The following table lists the keyword arguments and
> diff --git a/module/web/uri.scm b/module/web/uri.scm
> index 8e0b9bee7..acec2d1e8 100644
> --- a/module/web/uri.scm
> +++ b/module/web/uri.scm
> @@ -47,7 +47,9 @@
>
> uri-reference? relative-ref?
> build-uri-reference build-relative-ref
> - string->uri-reference string->relative-ref))
> + string->uri-reference string->relative-ref
> +
> + resolve-relative-reference))
>
> (define-record-type <uri>
> (make-uri scheme userinfo host port path query fragment)
> @@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’ as a delimiter.
> For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
> encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
> (string-join (map uri-encode parts) "/"))
> +
> +(define (remove-dot-segments path)
> + "Remove the @samp{./} and @samp{../} segments in @var{path}, as
> + RFC3986, section 5.2.4."
> + (let scan ((input
> + (let ((components (split-and-decode-uri-path path)))
> + (if (string-suffix? "/" path)
> + `(,@components "")
> + components)))
> + (input-path-absolute? (string-prefix? "/" path))
> + (output '())
> + (output-absolute? #f)
> + (output-ends-in-/? (string-suffix? "/" path)))
> + (cond
> + ((and input-path-absolute?
> + (null? input))
> + ;; Transfer the initial "/" from the input to the end of the
> + ;; output.
> + (scan '() #f output output-absolute? #t))
> + ((null? input)
> + (string-append
> + (if output-absolute? "/" "")
> + (encode-and-join-uri-path
> + (reverse output))
> + (if output-ends-in-/? "/" "")))
> + ((and (not input-path-absolute?)
> + (or (equal? (car input) "..")
> + (equal? (car input) ".")))
> + (scan (cdr input) #f output output-absolute? output-ends-in-/?))
> + ((and input-path-absolute?
> + (equal? (car input) "."))
> + (scan (cdr input) #t output output-absolute? output-ends-in-/?))
> + ((and input-path-absolute?
> + (equal? (car input) ".."))
> + (scan (cdr input) #t
> + (if (null? output)
> + output
> + (cdr output))
> + ;; Remove the last segment, including the preceding /. So,
> + ;; if there is 0 or 1 segment, remove the root / too.
> + (if (or (null? output) (null? (cdr output)))
> + #f ;; remove the /
> + #t) ;; keep it
> + #f))
> + (else
> + (scan (cdr input)
> + ;; If there is only 1 item in input, then it does not end in
> + ;; /, so the recursive call does not start with
> + ;; /. Otherwise, the recursive call starts with /.
> + (not (null? (cdr input)))
> + (cons (car input) output)
> + ;; If the output is empty and the input path is absolute,
> + ;; the / of the transferred path is transferred as well.
> + (or output-absolute?
> + (and (null? output)
> + input-path-absolute?))
> + #f)))))
> +
> +(define (merge-paths base-has-authority? base dependent)
> + "Return @samp{@var{base}/@var{dependent}}, with the subtelties of absolute
> + paths explained in RFC3986, section 5.2.3. If the base URI has an
> +authority (userinfo, host, port), then the processing is a bit
> +different."
> + (if (and base-has-authority?
> + (equal? base ""))
> + (string-append "/" dependent)
> + (let ((last-/ (string-rindex base #\/)))
> + (if last-/
> + (string-append (substring base 0 last-/) "/" dependent)
> + dependent))))
> +
> +(define (resolve-relative-reference base dependent)
> + "Resolve @var{dependent} on top of @var{base}, as RFC3986, section
> +5.2. Both @var{dependent} and @var{base} may be URI or relative
> +references. The return value is a URI if either @var{dependent} or
> +@var{base} is a URI."
> + ;; As opposed to RFC 3986, we use "dependent" instead of "relative" to
> + ;; avoid confusion between "URI" and "relative reference", the
> + ;; dependent URI may be either.
> + (let ((b-scheme (uri-scheme base))
> + (b-userinfo (uri-userinfo base))
> + (b-host (uri-host base))
> + (b-port (uri-port base))
> + (b-path (uri-path base))
> + (b-query (uri-query base))
> + (b-fragment (uri-fragment base))
> + (r-scheme (uri-scheme dependent))
> + (r-userinfo (uri-userinfo dependent))
> + (r-host (uri-host dependent))
> + (r-port (uri-port dependent))
> + (r-path (uri-path dependent))
> + (r-query (uri-query dependent))
> + (r-fragment (uri-fragment dependent))
> + (t-scheme #f)
> + (t-userinfo #f)
> + (t-host #f)
> + (t-port #f)
> + (t-path "")
> + (t-query #f)
> + (t-fragment #f))
> + ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
> +
> + ;;The programming style uses mutations to better adhere to the
> + ;;algorithm specification.
> + (if r-scheme
> + (begin
> + (set! t-scheme r-scheme)
> + (set! t-userinfo r-userinfo)
> + (set! t-host r-host)
> + (set! t-port r-port)
> + (set! t-path (remove-dot-segments r-path))
> + (set! t-query r-query))
> + ;; r-scheme is not defined:
> + (begin
> + (if r-host
> + (begin
> + (set! t-userinfo r-userinfo)
> + (set! t-host r-host)
> + (set! t-port r-port)
> + (set! t-path (remove-dot-segments r-path))
> + (set! t-query r-query))
> + ;; r-scheme is not defined, r-authority is not defined:
> + (begin
> + (if (equal? r-path "")
> + (begin
> + (set! t-path b-path)
> + (if r-query
> + ;; r-scheme, r-authority, r-path are not
> + ;; defined:
> + (set! t-query r-query)
> + ;; r-scheme, r-authority, r-path, r-query are
> + ;; not defined:
> + (set! t-query b-query)))
> + ;; r-scheme, r-authority not defined, r-path defined:
> + (begin
> + (if (string-prefix? "/" r-path)
> + ;; r-scheme, r-authority not defined, r-path
> + ;; absolute:
> + (set! t-path (remove-dot-segments r-path))
> + ;; r-scheme, r-authority not defined, r-path
> + ;; dependent:
> + (set! t-path
> + (remove-dot-segments
> + (merge-paths b-host b-path r-path))))
> + (set! t-query r-query)))
> + (set! t-userinfo b-userinfo)
> + (set! t-host b-host)
> + (set! t-port b-port)))
> + (set! t-scheme b-scheme)))
> + (set! t-fragment r-fragment)
> + (build-uri-reference #:scheme t-scheme
> + #:userinfo t-userinfo
> + #:host t-host
> + #:port t-port
> + #:path t-path
> + #:query t-query
> + #:fragment t-fragment)))
> diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
> index 95fd82f16..c453bf60f 100644
> --- a/test-suite/tests/web-uri.test
> +++ b/test-suite/tests/web-uri.test
> @@ -20,6 +20,7 @@
> (define-module (test-web-uri)
> #:use-module (web uri)
> #:use-module (ice-9 regex)
> + #:use-module (ice-9 string-fun)
> #:use-module (test-suite lib))
>
>
> @@ -693,3 +694,70 @@
> (pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
> (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
> (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
> +
> +(with-test-prefix "resolve relative reference"
> + ;; Test suite in RFC3986, section 5.4.
> + (let ((base (string->uri "http://a/b/c/d;p?q"))
> + (equal/encoded?
> + ;; The test suite checks for ';' characters, but Guile escapes
> + ;; them in URIs. Same for '='.
> + (let ((escape-colon
> + (lambda (x)
> + (string-replace-substring x ";" "%3B")))
> + (escape-equal
> + (lambda (x)
> + (string-replace-substring x "=" "%3D"))))
> + (lambda (x y)
> + (equal? (escape-colon (escape-equal x))
> + (escape-colon (escape-equal y)))))))
> + (let ((resolve
> + (lambda (relative)
> + (let* ((relative-uri
> + (string->uri-reference relative))
> + (resolved-uri
> + (resolve-relative-reference base relative-uri))
> + (resolved (uri->string resolved-uri)))
> + resolved))))
> + (with-test-prefix "normal"
> + (pass-if (equal/encoded? (resolve "g:h") "g:h"))
> + (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
> + (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
> + (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
> + (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "//g") "http://g"))
> + (pass-if (equal/encoded? (resolve "?y") "http://a/b/c/d;p?y"))
> + (pass-if (equal/encoded? (resolve "g?y") "http://a/b/c/g?y"))
> + (pass-if (equal/encoded? (resolve "#s") "http://a/b/c/d;p?q#s"))
> + (pass-if (equal/encoded? (resolve "g?y#s") "http://a/b/c/g?y#s"))
> + (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
> + (pass-if (equal/encoded? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
> + (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
> + (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
> + (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
> + (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
> + (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
> + (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
> + (pass-if (equal/encoded? (resolve "../..") "http://a/"))
> + (pass-if (equal/encoded? (resolve "../../") "http://a/"))
> + (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
> + (with-test-prefix "abnormal"
> + (pass-if (equal/encoded? (resolve "../../../g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "../../../../g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
> + (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
> + (pass-if (equal/encoded? (resolve "g..") "http://a/b/c/g.."))
> + (pass-if (equal/encoded? (resolve "..g") "http://a/b/c/..g"))
> + (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
> + (pass-if (equal/encoded? (resolve "./g/.") "http://a/b/c/g/"))
> + (pass-if (equal/encoded? (resolve "g/./h") "http://a/b/c/g/h"))
> + (pass-if (equal/encoded? (resolve "g/../h") "http://a/b/c/h"))
> + (pass-if (equal/encoded? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
> + (pass-if (equal/encoded? (resolve "g;x=1/../y") "http://a/b/c/y"))
> + (pass-if (equal/encoded? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
> + (pass-if (equal/encoded? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
> + (pass-if (equal/encoded? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
> + (pass-if (equal/encoded? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
> + (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
> +
>
> base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb
* Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-02 16:32 ` Vivien Kraus
@ 2023-10-03 18:49 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v3] " Vivien Kraus
2023-10-03 18:56 ` [PATCH v2] " Dale Mellor
0 siblings, 2 replies; 21+ messages in thread
From: Maxime Devos @ 2023-10-03 18:49 UTC (permalink / raw)
To: Vivien Kraus, guile-devel
On 02-10-2023 at 18:32, Vivien Kraus wrote:
> Hi!
>
> Are there other things to fix?
You forgot to include the warning about potential security issues in the
documentation -- I don't mean that Guile should fix the issues (it
can't), but rather that its documentation should inform the user that
there exists a potential issue to fix.
>> +(define (merge-paths base-has-authority? base dependent)
>> + "Return @samp{@var{base}/@var{dependent}}, with the subtelties of absolute
subtle spelling error: subtleties -> subtleties
Also, if changing the variable name makes it deviate from the RFC, then
I'm not sure whether it is better. (I was thinking of relative
-> maybe-relative myself, which sticks close to the RFC.) Rather
bikesheddy, though.
>> diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
>> index 95fd82f16..c453bf60f 100644
>> --- a/test-suite/tests/web-uri.test
>> +++ b/test-suite/tests/web-uri.test
>> @@ -20,6 +20,7 @@
>> (define-module (test-web-uri)
>> #:use-module (web uri)
>> #:use-module (ice-9 regex)
>> + #:use-module (ice-9 string-fun)
>> #:use-module (test-suite lib))
The copyright lines need updating (2020 -> 2023), or a new copyright
line needs to be added if there is no assignment to the FSF. (If you
want to assign copyright to the FSF, the process is started by the
maintainer(s) -- I'm not a Guile maintainer.)
Also: new entry in AUTHORS (TODO: HACKING implies this is only if you
assigned copyright to FSF, and only for new files, but, err, nope,
copyright != author and likewise initial author != all authors) (*).
Also, according to HACKING, you should be self-congratulatory, i.e., add
yourself to THANKS.
Other than that, I have no remarks.
Best regards,
Maxime Devos
(*) It says (paraphrasing) ‘see maintain.texi for what should go in
there’, but then the name of the file is extremely misleading, because
the authors that maintain.texi requires a list of are not all authors.
* Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-03 18:49 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v3] " Vivien Kraus
@ 2023-10-03 18:56 ` Dale Mellor
2023-10-03 19:04 ` Maxime Devos
1 sibling, 1 reply; 21+ messages in thread
From: Dale Mellor @ 2023-10-03 18:56 UTC (permalink / raw)
To: guile-devel
On Tue, 2023-10-03 at 20:49 +0200, Maxime Devos wrote:
>
> subtle spelling error: subtleties -> subtleties
That's pretty subtle!
* Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-03 18:56 ` [PATCH v2] " Dale Mellor
@ 2023-10-03 19:04 ` Maxime Devos
0 siblings, 0 replies; 21+ messages in thread
From: Maxime Devos @ 2023-10-03 19:04 UTC (permalink / raw)
To: guile-rc34xq, guile-devel
On 03-10-2023 at 20:56, Dale Mellor wrote:
> On Tue, 2023-10-03 at 20:49 +0200, Maxime Devos wrote:
>> subtle spelling error: subtleties -> subtleties
> That's pretty subtle!
>
Oops, I corrected the spelling error too early:
subtelties -> subtleties
* Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-09-25 20:46 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v2] " Vivien Kraus
@ 2023-10-03 20:03 ` Vivien Kraus
2023-10-03 22:22 ` Maxime Devos
1 sibling, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-10-03 20:03 UTC (permalink / raw)
To: Maxime Devos, guile-devel
Hello!
On Monday 25 September 2023 at 22:46 +0200, Maxime Devos wrote:
> However, there are also some dangers on doing this
> thing -- the ‘external’ page https://example.com/data.json could
> redirect to
> http://localhost/unsecured-secret-but-its-localhost-only-so-it-is-safe.
Since you want to warn users about this problem, I have to first
understand why it is a problem at all. Your example literally ends with
"so it is safe", and I fail to see why it would be a problem. Could you
elaborate?
Best regards,
Vivien
* Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-03 20:03 ` [PATCH] " Vivien Kraus
@ 2023-10-03 22:22 ` Maxime Devos
2023-10-03 22:30 ` Maxime Devos
0 siblings, 1 reply; 21+ messages in thread
From: Maxime Devos @ 2023-10-03 22:22 UTC (permalink / raw)
To: Vivien Kraus, guile-devel
On 03-10-2023 at 22:03, Vivien Kraus wrote:
> Hello!
>
> On Monday 25 September 2023 at 22:46 +0200, Maxime Devos wrote:
>> However, there are also some dangers on doing this
>> thing -- the ‘external’ page https://example.com/data.json could
>> redirect to
>> http://localhost/unsecured-secret-but-its-localhost-only-so-it-is-safe.
> Since you want to warn users about this problem, I have to first
> understand why it is a problem at all. Your example literally ends with
> "so it is safe", and I fail to see why it would be a problem. Could you
> elaborate?
The ‘so-it-is-safe’ is ironic, as in
‘so-naively-you-might-have-thought-it-was-safe-but-it-isnt’.
More concretely, here is how things could be used for a data
exfiltration attack:
Let's say that user U has a local web server (localhost), on which
passwords are saved.
Furthermore, let's say that U locally runs software that periodically
downloads https://secretly-evil-or-compromised.com/page and uploads a
copy to https://somewhere-public.org. (For example, maybe U wants to
know how secretly-evil-or-compromised.com/page evolves over time, and
shares the results with other people.)
The attacker, who controls secretly-evil-or-compromised.com, adjusts the
server to redirect https://secretly-evil-or-compromised.com/page to
http://localhost/passwords.
Now, next time the software is run, the software notices the redirect
https://secretly-evil-or-compromised.com/page ->
http://localhost/passwords, so it naively follows the redirect and now
the passwords are copied to a public web page that the attackers can
access, even though supposedly the passwords were localhost-only!
For another attack, see
<https://about.gitlab.com/blog/2021/09/07/why-are-developers-vulnerable-to-driveby-attacks/>.
Best practices against this are two-fold:
* redirections are not followed across ‘scopes’
(e.g., global internet website -> localhost website is forbidden).
This can be quite complex, but at least some cases are handled by
Firefox, IIRC.
In the case of browsers, there are also XSS prevention things, etc.
The best prevention is not allowing redirects at all or only
allowing redirections that keep the hostname intact -- while an
option for much software, it isn't an option for web browsers.
* web servers don't rely on not being directly accessible from the
network for security; instead, they also require things like
cookies, logins, not-technically-a-cookie-but-?secret=something-long.
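A rough sketch of the first practice, for a client built on (web uri), might look like this. The procedure name is hypothetical and the policy is deliberately conservative: a redirect is only followed when the host and any explicit port stay the same, and an https URI is never left for another scheme. As the follow-up message in this thread points out, an unchanged host name is still not a complete defence, because DNS decides where that name actually leads.

```scheme
(use-modules (web uri))

;; Hypothetical policy check, not part of Guile: may a client follow a
;; redirect from FROM to TO?  Both must be absolute URIs (see uri?).
(define (same-scope-redirect? from to)
  (let ((from-host (and (uri? from) (uri-host from)))
        (to-host (and (uri? to) (uri-host to))))
    (and from-host to-host
         ;; Host names compare case-insensitively.
         (string-ci=? from-host to-host)
         ;; Explicit ports must match (#f means "default port").
         (equal? (uri-port from) (uri-port to))
         ;; Never leave https for anything else.
         (not (and (eq? (uri-scheme from) 'https)
                   (not (eq? (uri-scheme to) 'https)))))))
```

With this check, the redirect from https://secretly-evil-or-compromised.com/page to http://localhost/passwords in the scenario above is refused, while http -> https on the same host is still allowed.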
Best regards,
Maxime.
* Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-03 22:22 ` Maxime Devos
@ 2023-10-03 22:30 ` Maxime Devos
2023-10-04 5:29 ` Vivien Kraus
0 siblings, 1 reply; 21+ messages in thread
From: Maxime Devos @ 2023-10-03 22:30 UTC (permalink / raw)
To: Vivien Kraus, guile-devel
> The best prevention is not allowing redirects at all or only
> allowing redirections that keep the hostname intact -- while an
> option for much software, it isn't an option for web browsers.
Partially scratch that -- restricting to ‘keeping hostname intact’ is
insufficient, because there could be a DNS record that points 'website
via http' to 127.0.0.1, and hence a redirect from https://website -->
http://website can change IP addresses from global Internet to local
computer.
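One way to cover the DNS case is to check the address that is actually contacted, rather than the host name. A sketch for IPv4 only, with hypothetical helper names. The blocks listed are an illustrative, incomplete selection of non-public ranges ("this network", private, loopback and link-local), and IPv6 is not handled at all:

```scheme
(use-modules (srfi srfi-1))   ; any

;; Hypothetical helper: is the dotted-quad ADDRESS inside BLOCK/PREFIX-LEN?
(define (ipv4-in-block? address block prefix-len)
  (let ((mask (logand #xFFFFFFFF (ash #xFFFFFFFF (- 32 prefix-len)))))
    (= (logand (inet-pton AF_INET address) mask)
       (logand (inet-pton AF_INET block) mask))))

;; Illustrative, not exhaustive.  IPv6 (::1, fc00::/7, ...) is not covered.
(define non-public-ipv4-blocks
  '(("0.0.0.0" . 8) ("10.0.0.0" . 8) ("127.0.0.0" . 8)
    ("169.254.0.0" . 16) ("172.16.0.0" . 12) ("192.168.0.0" . 16)))

(define (non-public-ipv4? address)
  (any (lambda (block)
         (ipv4-in-block? address (car block) (cdr block)))
       non-public-ipv4-blocks))
```

A client would resolve the redirect target (for instance with Guile's getaddrinfo), refuse it if any resulting address is non-public, and then connect to that same checked address; resolving the name a second time for the actual connection would reopen the DNS problem described above.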
Best regards,
Maxime Devos.
* Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-03 22:30 ` Maxime Devos
@ 2023-10-04 5:29 ` Vivien Kraus
2023-10-10 21:44 ` Maxime Devos
0 siblings, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-10-04 5:29 UTC (permalink / raw)
To: Maxime Devos, guile-devel
On Wednesday 04 October 2023 at 00:30 +0200, Maxime Devos wrote:
>
> > The best prevention is not allowing redirects at all or only
> > allowing redirections that keep the hostname intact -- while an
> > option for much software, it isn't an option for web browsers.
>
> Partially scratch that -- restricting to ‘keeping hostname intact’ is
> insufficient, because there could be a DNS record that points 'website
> via http' to 127.0.0.1, and hence a redirect from https://website -->
> http://website can change IP addresses from global Internet to local
> computer.
But then, it is not a problem with resolve-relative-reference, and not
even a risk with redirections; if the DNS changes before you query the
page, then the secret page leaks anyway, no redirection needed.
We could add a warning in the "http-request" method documentation,
like:
Be warned that if you are hosting a private HTTP(s) server on your
system, a DNS change for a public target URI to your internal IP
address, or following a redirection from a public target URI to your
private server, may lead you to consider the response originating from
your private server as public.
Would that be a good summary?
Vivien
* Re: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-10-04 5:29 ` Vivien Kraus
@ 2023-10-10 21:44 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v4] " Vivien Kraus
0 siblings, 1 reply; 21+ messages in thread
From: Maxime Devos @ 2023-10-10 21:44 UTC (permalink / raw)
To: Vivien Kraus, guile-devel
On 04-10-2023 at 07:29, Vivien Kraus wrote:
> On Wednesday 04 October 2023 at 00:30 +0200, Maxime Devos wrote:
>>
>>> The best prevention is not allowing redirects at all or only
>>> allowing redirections that keep the hostname intact -- while an
>>> option for much software, it isn't an option for web browsers.
>>
>> Partially scratch that -- restricting to ‘keeping hostname intact’ is
>> insufficient, because there could be a DNS record that points 'website
>> via http' to 127.0.0.1, and hence a redirect from https://website -->
>> http://website can change IP addresses from global Internet to local
>> computer.
>
> But then, it is not a problem with resolve-relative-reference, and not
> even a risk with redirections; if the DNS changes before you query the
> page, then the secret page leaks anyway, no redirection needed.
>
> We could add a warning in the "http-request" method documentation,
> like:
>
> Be warned that if you are hosting a private HTTP(s) server on your
> system, a DNS change for a public target URI to your internal IP
> address, or following a redirection from a public target URI to your
> private server, may lead you to consider the response originating from
> your private server as public.
>
> Would that be a good summary?
Hum, I was thinking from the perspective of the client, whereas this
considers things from the perspective of the server, but that works too,
I suppose.
However, I am confused by the phrasing of the last sentence:
> [...], may lead you to consider the response originating from
> your private server as public.
After reading it again, I think I understand it, but I would instead
propose:
> may allow you to accidentally contact your private server as if it
> were public. Depending on the application, an attacker could exploit this.
because:
* while the responses can be a problem, another problem is that
a request is sent in the first place -- as a problematic example,
the private server could have a buffer overflow somewhere that
is remotely triggered by the attacker via a redirection to a long
URL triggering the buffer overflow (with an attacking payload!).
For another example, consider PUT/POST/DELETE/... instead of GET.
* to make it clear this is a potential security problem.
‘Be warned’ doesn't cut it IMO.
* another reason I can't word properly, something about the
'as public'.
Best regards,
Maxime Devos.
* [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-09-25 16:48 ` [PATCH v4] " Vivien Kraus
@ 2023-11-02 20:00 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-02 20:48 ` Vivien Kraus
2023-11-27 17:10 ` Vivien Kraus
0 siblings, 2 replies; 21+ messages in thread
From: Nathan via Developers list for Guile, the GNU extensibility library @ 2023-11-02 20:00 UTC (permalink / raw)
To: vivien, guile-devel
I found a problem, and fixed it by rewriting a fair amount of the code myself, since I need similar code.
remove-dot-segments:
You cannot split-and-decode-uri-path and then encode-and-join-uri-path.
Those are terrible functions that don't work on all URIs.
URI schemes are allowed to specify that certain reserved characters (sub-delims) are special.
In that case, a sub-delim that IS escaped is different from a sub-delim that IS NOT escaped.
Example input to your remove-dot-segments:
(resolve-relative-reference (string->uri-reference "/") (string->uri-reference "excitement://a.com/a!a!%21!"))
Your wrong output:
excitement://a.com/a%21a%21%21%21
One solution would be to only percent-decode dots. Because dot is unreserved, that solution doesn't have any URI equivalence issues.
But I still think decoding dots automatically is a bad, unexpected side-effect to have.
I rewrote this function so that it:
- works on both escaped and unescaped dots
- doesn't unescape any unnecessary characters
The test suite no longer needs to check for incorrect output either:
> ;; The test suite checks for ';' characters, but Guile escapes
> ;; them in URIs. Same for '='.
----
resolve-relative-reference:
I rewrote this procedure so it is shorter.
I also added #:strict? to toggle the "strict parser" behaviour mentioned in the RFC (section 5.2.2).
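To illustrate both points on plain strings rather than Guile URI objects, here is a rough, self-contained sketch of RFC 3986, section 5.2. The procedure names are hypothetical, and this is not the code in the attached patch. Components are split with the regular expression from appendix B. remove-dot-segments works on raw segments and only recognises "." and ".." (also when spelled with %2E), so sub-delims such as '!' and ';' keep whatever escaping they had. #:strict? #f drops a reference scheme that equals the base scheme, as section 5.2.2 allows for non-strict parsers.

```scheme
(use-modules (ice-9 match)        ; match
             (ice-9 regex)        ; make-regexp, regexp-exec, match:substring
             (ice-9 string-fun))  ; string-replace-substring

;; RFC 3986, appendix B.  Groups 2, 4, 5, 7 and 9 are scheme, authority,
;; path, query and fragment; a group that did not match gives #f.
(define reference-rx
  (make-regexp "^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\\?([^#]*))?(#(.*))?$"))

(define (split-reference str)
  (let ((m (regexp-exec reference-rx str)))
    (map (lambda (n) (match:substring m n)) '(2 4 5 7 9))))

;; 'dot, 'dot-dot or #f; "%2E" counts as ".", nothing is decoded.
(define (dot-kind segment)
  (let ((s (string-replace-substring (string-downcase segment) "%2e" ".")))
    (cond ((string=? s ".") 'dot)
          ((string=? s "..") 'dot-dot)
          (else #f))))

;; Section 5.2.4, on a list of raw segments.
(define (remove-dot-segments path)
  (let* ((parts (string-split path #\/))
         (absolute? (string-prefix? "/" path)))
    (let loop ((segs (if absolute? (cdr parts) parts))
               (out '())                 ; output segments, reversed
               (absolute? absolute?))
      (match segs
        (() (string-append (if absolute? "/" "")
                           (string-join (reverse out) "/")))
        ((seg . rest)
         (let ((last? (null? rest)))
           (case (dot-kind seg)
             ((dot) (loop rest (if last? (cons "" out) out) absolute?))
             ((dot-dot)
              (let ((popped (if (pair? out) (cdr out) out)))
                ;; Step C read literally: if this empties a relative
                ;; output, the "/" left in the input makes it absolute.
                (loop rest (if last? (cons "" popped) popped)
                      (or absolute? (and (pair? out) (null? popped))))))
             (else (loop rest (cons seg out) absolute?)))))))))

;; Section 5.2.3.
(define (merge-paths base-authority base-path ref-path)
  (cond ((and base-authority (string-null? base-path))
         (string-append "/" ref-path))
        ((string-rindex base-path #\/)
         => (lambda (i) (string-append (substring base-path 0 (+ i 1)) ref-path)))
        (else ref-path)))

;; Section 5.3.
(define (recompose scheme authority path query fragment)
  (string-append (if scheme (string-append scheme ":") "")
                 (if authority (string-append "//" authority) "")
                 path
                 (if query (string-append "?" query) "")
                 (if fragment (string-append "#" fragment) "")))

;; Section 5.2.2 (hypothetical name; strings in, string out).
(define* (resolve-reference-string base ref #:key (strict? #t))
  (match (list (split-reference base) (split-reference ref))
    (((b-scheme b-auth b-path b-query _)
      (r-scheme r-auth r-path r-query r-frag))
     (let ((r-scheme (if (and r-scheme (not strict?)
                              b-scheme (string-ci=? r-scheme b-scheme))
                         #f
                         r-scheme)))
       (cond
        (r-scheme
         (recompose r-scheme r-auth (remove-dot-segments r-path) r-query r-frag))
        (r-auth
         (recompose b-scheme r-auth (remove-dot-segments r-path) r-query r-frag))
        ((string-null? r-path)
         (recompose b-scheme b-auth b-path (or r-query b-query) r-frag))
        ((string-prefix? "/" r-path)
         (recompose b-scheme b-auth (remove-dot-segments r-path) r-query r-frag))
        (else
         (recompose b-scheme b-auth
                    (remove-dot-segments (merge-paths b-auth b-path r-path))
                    r-query r-frag)))))))
```

For example, (resolve-reference-string "http://a/b/c/d;p?q" "http:g") gives "http:g", while adding #:strict? #f gives "http://a/b/c/g"; the excitement://a.com/a!a!%21! example above comes back with its path exactly as written.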
- Nathan
[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 16708 bytes --]
From 655d3e61fa99bb5ddf5388c0843f498d0bf6f789 Mon Sep 17 00:00:00 2001
From: Nathan <nathan_mail@nborghese.com>
Date: Thu, 2 Nov 2023 15:42:30 -0400
Subject: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986
5.2.
* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed,
and warn about blindly following them.
* AUTHORS: Mention Vivien Kraus.
* THANKS: Thank Vivien Kraus.
---
AUTHORS | 8 +++
NEWS | 7 ++
THANKS | 1 +
doc/ref/web.texi | 43 +++++++++++-
module/web/uri.scm | 126 +++++++++++++++++++++++++++++++++-
test-suite/tests/web-uri.test | 61 ++++++++++++++++
6 files changed, 244 insertions(+), 2 deletions(-)
diff --git a/AUTHORS b/AUTHORS
index d756a74ce..2a95d3b0b 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore:
Gregory Marton:
In the subdirectory test-suite/tests, changes to:
hash.test
+
+Vivien Kraus:
+In the subdirectory module/web, changes to:
+ uri.scm
+In the subdirectory doc/ref, changes to:
+ web.texi
+In the subdirectory test-suite/tests, changes to:
+ web-uri.test
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
* New interfaces and functionality
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
** New warning: unused-module
This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/THANKS b/THANKS
index aa4877e95..a1f982f04 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Contributors since the last release:
Chris K Jester-Young
David Kastrup
Daniel Kraft
+ Vivien Kraus
Daniel Krueger
Noah Lavine
Christopher Lemmer Webber
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..c6923c23f 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
as @code{"scrambled%20eggs/biscuits%26gravy"}.
@end deffn
+@node Subtypes of URI
@subsubheading Subtypes of URI
As we noted above, not all URI objects have a scheme. You might have
@@ -356,6 +357,34 @@ Parse @var{string} into a URI object, while asserting that no scheme is
present. Return @code{#f} if the string could not be parsed.
@end deffn
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm. For
+instance, given a base URI, @samp{https://example.com/over/here}, and a
+relative reference, @samp{../no/there}, it may be tempting to simply
+join them into @samp{https://example.com/over/../no/there}. The server
+at @samp{https://example.com} may well serve the same resource under
+that URL as under @samp{https://example.com/no/there}. However, a web
+cache or a linked data processor must understand that relative
+reference resolution leads to
+@samp{https://example.com/no/there}.
+
+@deffn {Scheme Procedure} resolve-relative-reference @var{base} @var{relative} @
+ [#:strict?=@code{#t}]
+Return a URI object representing @var{relative}, using the components of
+@var{base} if missing, as defined in section 5.2 of RFC 3986. Both @var{base}
+and @var{relative} may be full URIs or relative URI references. If @var{strict?}
+is true, the parser does not ignore the scheme in @var{relative} if it is
+identical to the one in @var{base}. The name ``relative'' indicates the
+argument’s relationship to @var{base}, not its type. If either @var{base} or
+@var{relative} is an absolute URI object, the result is also an absolute
+URI object, never a relative reference.
+
+Please note that any part of @var{base} may be overridden by
+@var{relative}. For instance, if @var{base} has an @code{https} URI
+scheme, and if @var{relative} has an @code{http} scheme, then the result
+will have an @code{http} scheme.
+@end deffn
@node HTTP
@subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1067,8 @@ The entity-tag of the resource.
@deftypevr {HTTP Header} URI-reference location
A URI reference on which a request may be completed. Used in
combination with a redirecting status code to perform client-side
-redirection.
+redirection. @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
@example
(parse-header 'location "http://example.com/other")
@result{} #<uri ...>
@@ -1501,6 +1531,17 @@ constants, such as @code{certificate-status/signer-not-found} or
Connect to the server corresponding to @var{uri} and make a request over
HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
+@code{http-request} does not follow redirections. If a redirection is
+required, @code{http-request} returns a response object with an adequate
+response code (e.g. 301 or 302).
+
+Making web requests on a network where private servers are hosted comes
+with potential security risks. A malicious public server might forge
+its DNS record to point to your internal address. It might also
+redirect you to your internal server. In the first case, or in the
+second case if you follow the redirection, you may accidentally connect
+to your private server as if it were public.
+
The following keyword arguments allow you to modify the requests in
various ways, for example attaching a body to the request, or setting
specific headers. The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..2280976b5 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,7 @@
;;;; (web uri) --- URI manipulation tools
;;;;
;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2023 Vivien Kraus
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -47,7 +48,9 @@
uri-reference? relative-ref?
build-uri-reference build-relative-ref
- string->uri-reference string->relative-ref))
+ string->uri-reference string->relative-ref
+
+ resolve-relative-reference))
(define-record-type <uri>
(make-uri scheme userinfo host port path query fragment)
@@ -501,3 +504,124 @@ strings, and join the parts together with ‘/’ as a delimiter.
For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
(string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+ "Remove the @samp{./} and @samp{../} segments in @var{path}, as
+described in RFC 3986, section 5.2.4. Percent-encoded dots (@samp{%2E})
+are recognized in dot-segments, but nothing is ever percent-decoded."
+ (let lp ((input path) (out '()))
+ (define (get-dots-info)
+ "Return three values describing the segment at the start of the input:
+(values starts-with-slash? dots end-slash-index), where END-SLASH-INDEX
+is the index of the next @samp{/} (or the input length) and DOTS is the
+number of dot characters, escaped or not. If the segment also contains
+non-dot, non-slash characters, DOTS is instead some number greater than two."
+ (let ((starts-with-slash (eqv? (string-ref input 0) #\/)))
+ (let dots-lp ((i (if starts-with-slash 1 0))
+ (dots 0))
+ (if (eqv? i (string-length input))
+ (values starts-with-slash dots i)
+ (let ((c (string-ref input i)))
+ (cond
+ ((eqv? c #\/)
+ (values starts-with-slash dots i))
+ ((string-prefix-ci? "%2E" input 0 3 i)
+ (dots-lp (+ i 3) (1+ dots)))
+ ((eqv? c #\.)
+ (dots-lp (1+ i) (1+ dots)))
+ (else
+ (dots-lp (1+ i) 3))))))))
+ (if (string-null? input)
+ (apply string-append (reverse out))
+ (call-with-values get-dots-info
+ (lambda (starts-with-slash? dots end-slash-pos)
+ (cond
+ ;; handle ../ ./ . ..
+ ((and (not starts-with-slash?) (or (eqv? dots 1) (eqv? dots 2)))
+ (lp (substring input (min (1+ end-slash-pos)
+ (string-length input))) out))
+ ((and starts-with-slash? (eqv? dots 1))
+ (lp
+ (if (eqv? end-slash-pos (string-length input))
+ "/" ;; handle /.
+ (substring input end-slash-pos)) ;; handle /./
+ out))
+ ((and starts-with-slash? (eqv? dots 2))
+ (lp
+ (if (eqv? end-slash-pos (string-length input))
+ "/" ;; handle /..
+ (substring input end-slash-pos)) ;; handle /../
+ (if (null? out) out (cdr out))))
+ (else
+ (lp
+ (substring input end-slash-pos)
+ (cons (substring input 0 end-slash-pos) out)))))))))
+
+(define (merge-paths base-has-authority? base relative)
+ "Merge @var{relative} into the path @var{base} (RFC 3986, section 5.2.3).
+If the base URI has an authority and @var{base} is empty, return
+@samp{/@var{relative}}. Otherwise, replace everything after the last @samp{/}
+of @var{base} with @var{relative}, or return @var{relative} if there is none."
+ (if (and base-has-authority?
+ (equal? base ""))
+ (string-append "/" relative)
+ (let ((last-/ (string-rindex base #\/)))
+ (if last-/
+ (string-append (substring base 0 last-/) "/" relative)
+ relative))))
+
+(define* (resolve-relative-reference base relative #:key (strict? #t))
+ "Resolve @var{relative} on top of @var{base}, as described in RFC 3986,
+section 5.2. Both @var{base} and @var{relative} may be full URIs or
+relative references; the name ``relative'' indicates the argument’s
+relationship to @var{base}, not its type. The return value is a URI if
+either @var{relative} or @var{base} is a URI. If @var{strict?} is true
+(the default), the scheme in @var{relative} is kept even when it is
+identical to the one in @var{base}; otherwise such a scheme is ignored,
+as RFC 3986, section 5.2.2, allows for backward compatibility."
+ (let ((b-scheme (uri-scheme base))
+ (b-userinfo (uri-userinfo base))
+ (b-host (uri-host base))
+ (b-port (uri-port base))
+ (b-path (uri-path base))
+ (b-query (uri-query base))
+ (b-fragment (uri-fragment base))
+ (r-scheme (uri-scheme relative))
+ (r-userinfo (uri-userinfo relative))
+ (r-host (uri-host relative))
+ (r-port (uri-port relative))
+ (r-path (uri-path relative))
+ (r-query (uri-query relative))
+ (r-fragment (uri-fragment relative)))
+ (cond
+ ((or r-host (and r-scheme (or strict? (not (eq? r-scheme b-scheme)))))
+ (build-uri-reference
+ #:scheme (or r-scheme b-scheme)
+ #:userinfo r-userinfo
+ #:host r-host
+ #:port r-port
+ #:path (remove-dot-segments r-path)
+ #:query r-query
+ #:fragment r-fragment))
+ ((string-null? r-path)
+ (build-uri-reference
+ #:scheme b-scheme
+ #:userinfo b-userinfo
+ #:host b-host
+ #:port b-port
+ #:path b-path
+ #:query (or r-query b-query)
+ #:fragment r-fragment))
+ (else
+ (build-uri-reference
+ #:scheme b-scheme
+ #:userinfo b-userinfo
+ #:host b-host
+ #:port b-port
+ #:path
+ (remove-dot-segments
+ (if (string-prefix? "/" r-path)
+ r-path
+ (merge-paths b-host b-path r-path)))
+ #:query r-query
+ #:fragment r-fragment)))))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..b4d4b6cdb 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -693,3 +693,64 @@
(pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
(pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
(pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+ ;; Test suite in RFC3986, section 5.4.
+ (let ((base (string->uri "http://a/b/c/d;p?q")))
+ (let ((resolve
+ (lambda (relative)
+ (let* ((relative-uri
+ (string->uri-reference relative))
+ (resolved-uri
+ (resolve-relative-reference base relative-uri))
+ (resolved (uri->string resolved-uri)))
+ resolved))))
+ (pass-if "remove-dot-segments unnecessary escaping"
+ (equal? (resolve "%2e%2E/.%2e/%2E./g%2e%2E%2Fh%2e") "http://a/g%2e%2E%2Fh%2e"))
+ (with-test-prefix "normal"
+ (pass-if (equal? (resolve "g:h") "g:h"))
+ (pass-if (equal? (resolve "g") "http://a/b/c/g"))
+ (pass-if (equal? (resolve "./g") "http://a/b/c/g"))
+ (pass-if (equal? (resolve "g/") "http://a/b/c/g/"))
+ (pass-if (equal? (resolve "/g") "http://a/g"))
+ (pass-if (equal? (resolve "//g") "http://g"))
+ (pass-if (equal? (resolve "?y") "http://a/b/c/d;p?y"))
+ (pass-if (equal? (resolve "g?y") "http://a/b/c/g?y"))
+ (pass-if (equal? (resolve "#s") "http://a/b/c/d;p?q#s"))
+ (pass-if (equal? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+ (pass-if (equal? (resolve ";x") "http://a/b/c/;x"))
+ (pass-if (equal? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+ (pass-if (equal? (resolve "") "http://a/b/c/d;p?q"))
+ (pass-if (equal? (resolve ".") "http://a/b/c/"))
+ (pass-if (equal? (resolve "./") "http://a/b/c/"))
+ (pass-if (equal? (resolve "..") "http://a/b/"))
+ (pass-if (equal? (resolve "../") "http://a/b/"))
+ (pass-if (equal? (resolve "../g") "http://a/b/g"))
+ (pass-if (equal? (resolve "../..") "http://a/"))
+ (pass-if (equal? (resolve "../../") "http://a/"))
+ (pass-if (equal? (resolve "../../g") "http://a/g")))
+ (with-test-prefix "abnormal"
+ (pass-if (equal? (resolve "../../../g") "http://a/g"))
+ (pass-if (equal? (resolve "../../../../g") "http://a/g"))
+ (pass-if (equal? (resolve "/./g") "http://a/g"))
+ (pass-if (equal? (resolve "/../g") "http://a/g"))
+ (pass-if (equal? (resolve "g.") "http://a/b/c/g."))
+ (pass-if (equal? (resolve ".g") "http://a/b/c/.g"))
+ (pass-if (equal? (resolve "g..") "http://a/b/c/g.."))
+ (pass-if (equal? (resolve "..g") "http://a/b/c/..g"))
+ (pass-if (equal? (resolve "./../g") "http://a/b/g"))
+ (pass-if (equal? (resolve "./g/.") "http://a/b/c/g/"))
+ (pass-if (equal? (resolve "g/./h") "http://a/b/c/g/h"))
+ (pass-if (equal? (resolve "g/../h") "http://a/b/c/h"))
+ (pass-if (equal? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+ (pass-if (equal? (resolve "g;x=1/../y") "http://a/b/c/y"))
+ (pass-if (equal? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+ (pass-if (equal? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+ (pass-if (equal? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+ (pass-if (equal? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+ (pass-if (equal? (resolve "http:g") "http:g"))
+ (pass-if "nonstrict relative resolve"
+ (equal? (uri->string (resolve-relative-reference
+ base (string->uri-reference "http:g")
+ #:strict? #f))
+ "http://a/b/c/g"))))))
--
2.41.0
* Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-11-02 20:00 ` Nathan via Developers list for Guile, the GNU extensibility library
@ 2023-11-02 20:48 ` Vivien Kraus
2023-11-03 17:49 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-27 17:10 ` Vivien Kraus
1 sibling, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-11-02 20:48 UTC
To: Nathan, guile-devel
Hello Natan!
On Thursday 02 November 2023 at 16:00 -0400, Nathan wrote:
> There is a problem and I fixed it by rewriting a bunch of code myself
> because I need similar code.
Thank you!
> remove-dot-segments:
> You cannot split-and-decode-uri-path and then encode-and-join-uri-path.
> Those are terrible functions that don't work on all URIs.
> URI schemes are allowed to specify that certain reserved characters
> (sub-delims) are special.
> In that case, a sub-delim that IS escaped is different from a
> sub-delim that IS NOT escaped.
>
> Example input to your remove-dot-segments:
> (resolve-relative-reference (string->uri-reference "/")
>   (string->uri-reference "excitement://a.com/a!a!%21!"))
> Your wrong output:
> excitement://a.com/a%21a%21%21%21
I see.
>
> One solution would be to only percent-decode dots. Because dot is
> unreserved, that solution doesn't have any URI equivalence issues.
> But I still think decoding dots automatically is a bad, unexpected
> side-effect to have.
> I rewrote this function so that it:
> - works on both escaped and unescaped dots
> - doesn't unescape any unnecessary characters
This pushes the limits of my understanding of URIs, as I did not know
we had to consider '%2E%2E' the same as '..'. However, the RFC is not
very clear:
2.3: Unreserved Characters:
For consistency, percent-encoded octets in the ranges of ALPHA
(%41-%5A and %61-%7A), DIGIT (%30-%39), hyphen (%2D), period (%2E),
underscore (%5F), or tilde (%7E) should not be created by URI
producers and, when found in a URI, should be decoded to their
corresponding unreserved characters by URI normalizers.
5.2.1: Pre-parse the Base URI:
Normalization of the base URI, as described in Sections 6.2.2 and
6.2.3, is optional. A URI reference must be transformed to its
target URI before it can be normalized.
Did you find something more precise than that? In any case, decoding
the dots is probably the least unsafe thing to do.
>
> The test suite no longer needs to check for incorrect output either:
> > ;; The test suite checks for ';' characters, but Guile escapes
> > ;; them in URIs. Same for '='.
>
> ----
>
> resolve-relative-reference:
> I rewrote this procedure so it is shorter.
> I also added #:strict? to toggle "strict parser" as mentioned in the
> RFC.
As far as I understand, your code is correct. The tests pass.
Thank you again!
Best regards,
Vivien
* Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-11-02 20:48 ` Vivien Kraus
@ 2023-11-03 17:49 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-03 18:19 ` Vivien Kraus
0 siblings, 1 reply; 21+ messages in thread
From: Nathan via Developers list for Guile, the GNU extensibility library @ 2023-11-03 17:49 UTC
To: Vivien Kraus; +Cc: Nathan, guile-devel
Hi Vivien,
> This pushes the limits of my understanding of URIs, as I did not know
> we had to consider '%2E%2E' the same as '..'. However, the RFC is not
> very clear:
I wasn't able to find anything that MANDATED any normalization at all, either before or after Relative Resolution. It is possible that treating %2E as a literal dot in resolve-relative-reference could count as unwanted normalization. But it's a safe operation in terms of URI equivalence* and I think users would be less confused to have %2E%2E disappear than to have it remain.
Also, what if the resolve-relative-reference procedure didn't treat %2E as a dot?
There isn't a uri-normalize procedure users can call afterwards to fix that.
And there isn't a version of uri-decode that allows selectively decoding JUST the dot characters.
Users would have to write a lot of code themselves to get proper relative-resolution, so we should do it for them.
- Nathan
*References for the claim that treating %2E as a literal dot is always okay:
- Section 2.3: percent-encoded unreserved characters are always equivalent to decoded ones.
- Section 2.4: unreserved characters can be percent-decoded at any time.
- Section 6.2.2.3: dot-segments should be removed during normalization even if found outside of a relative-reference.
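The missing piece described above, decoding only the dots, can be sketched in a few lines. The name `decode-dots-only` is hypothetical and is not a (web uri) procedure; `uri-decode` is shown for comparison because it decodes every escape, not just the dots.

```scheme
(use-modules (web uri))

;; Hypothetical helper, not part of (web uri): decode only "%2E"/"%2e"
;; to ".", copying every other character (including other escapes) as-is.
(define (decode-dots-only str)
  (let loop ((i 0) (acc '()))
    (cond
     ((= i (string-length str))
      (list->string (reverse acc)))
     ((string-prefix-ci? "%2E" str 0 3 i)
      (loop (+ i 3) (cons #\. acc)))
     (else
      (loop (1+ i) (cons (string-ref str i) acc))))))

(decode-dots-only "%2e%2E/a%21")  ; => "../a%21"
;; By contrast, uri-decode decodes every escape, including "%21":
(uri-decode "%2e%2E/a%21")        ; => "../a!"
```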
* Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-11-03 17:49 ` Nathan via Developers list for Guile, the GNU extensibility library
@ 2023-11-03 18:19 ` Vivien Kraus
0 siblings, 0 replies; 21+ messages in thread
From: Vivien Kraus @ 2023-11-03 18:19 UTC
To: Nathan; +Cc: guile-devel
On Friday 03 November 2023 at 13:49 -0400, Nathan wrote:
> > This pushes the limits of my understanding of URIs, as I did not
> > know
> > we had to consider '%2E%2E' the same as '..'. However, the RFC is
> > not
> > very clear:
>
> I wasn't able to find anything that MANDATED any normalization at
> all, either before or after Relative Resolution. It is possible that
> treating %2E as a literal dot in resolve-relative-reference could
> count as unwanted normalization. But it's a safe operation in terms
> of URI equivalence* and I think users would be less confused to have
> %2E%2E disappear than to have it remain.
>
> Also, what if the resolve-relative-reference procedure didn't treat
> %2E as a dot?
> There isn't a uri-normalize procedure users can call afterwards to
> fix that.
> And there isn't a version of uri-decode that allows selectively
> decoding JUST the dot characters.
> Users would have to write a lot of code themselves to get proper
> relative-resolution, so we should do it for them.
First of all, I just noticed I spelled your name wrong, sorry.
I’m convinced the decision you made to consider '.' and '%2E'
equivalent is the correct one, in case that was not clear to everyone.
Best regards,
Vivien
* Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-11-02 20:00 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-02 20:48 ` Vivien Kraus
@ 2023-11-27 17:10 ` Vivien Kraus
2023-11-27 17:15 ` Vivien Kraus
1 sibling, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-11-27 17:10 UTC
To: guile-devel; +Cc: Nathan
Dear Guile developers,
Some time ago, Nathan wrote this patch as a better version of mine, and
I would really like to see it merged.
Does it need more improvement? I consider it pretty good.
Best regards,
Vivien
* Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-11-27 17:10 ` Vivien Kraus
@ 2023-11-27 17:15 ` Vivien Kraus
2023-11-29 1:08 ` Nathan via Developers list for Guile, the GNU extensibility library
0 siblings, 1 reply; 21+ messages in thread
From: Vivien Kraus @ 2023-11-27 17:15 UTC
To: guile-devel; +Cc: Nathan
On Monday 27 November 2023 at 18:10 +0100, Vivien Kraus wrote:
> Dear Guile developers,
>
> Some time ago, Nathan wrote this patch as a better version of mine,
> and
> I would really like to see it merged.
>
> Does it need more improvement? I consider it pretty good.
Maybe I spoke too soon: the AUTHORS file and the copyright file header
should not mention me for uri.scm, but Nathan, as this version did not
retain significant work by me.
Vivien
* Re: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
2023-11-27 17:15 ` Vivien Kraus
@ 2023-11-29 1:08 ` Nathan via Developers list for Guile, the GNU extensibility library
0 siblings, 0 replies; 21+ messages in thread
From: Nathan via Developers list for Guile, the GNU extensibility library @ 2023-11-29 1:08 UTC
To: Vivien Kraus; +Cc: guile-devel
[-- Attachment #1: Type: text/plain, Size: 554 bytes --]
The HACKING file says that we always have to go through FSF copyright assignment beforehand, so I was just waiting for that. If they don't care about that anymore, then my name would go in THANKS instead. For completeness, I've attached patches for either case.
(In case I'm still wrong, this would all quickly be sorted out if this patch were to actually be committed.)
I've been using my own fork of the uri.scm library for now that includes many fixes to make it closer to the RFC's specification.
I could contribute those next if this gets merged.
[-- Attachment #2: add me to authors --]
[-- Type: text/x-patch, Size: 1446 bytes --]
From 311cd5759b10537ee4641a48afa9d79e7a6c6055 Mon Sep 17 00:00:00 2001
From: Nathan <nathan_mail@nborghese.com>
Date: Tue, 28 Nov 2023 19:52:57 -0500
Subject: [PATCH 2/2] Add Nathan Borghese to AUTHORS
* AUTHORS: Add Nathan Borghese
* module/web/uri.scm: Update FSF copyright date
---
AUTHORS | 6 ++++--
module/web/uri.scm | 3 +--
2 files changed, 5 insertions(+), 4 deletions(-)
diff --git a/AUTHORS b/AUTHORS
index 2a95d3b0b..10464407b 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -372,9 +372,11 @@ In the subdirectory test-suite/tests, changes to:
hash.test
Vivien Kraus:
-In the subdirectory module/web, changes to:
- uri.scm
In the subdirectory doc/ref, changes to:
web.texi
In the subdirectory test-suite/tests, changes to:
web-uri.test
+
+Nathan Borghese:
+In the subdirectory module/web, changes to:
+ uri.scm
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 2280976b5..9d3063652 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,7 +1,6 @@
;;;; (web uri) --- URI manipulation tools
;;;;
-;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
-;;;; Copyright (C) 2023 Vivien Kraus
+;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2023 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
--
2.41.0
[-- Attachment #3: add me to THANKS --]
[-- Type: text/x-patch, Size: 1780 bytes --]
From 7c659b58db0de5f4e229848e177fc47ee0e4bd11 Mon Sep 17 00:00:00 2001
From: Nathan <nathan_mail@nborghese.com>
Date: Tue, 28 Nov 2023 20:10:32 -0500
Subject: [PATCH] Add Nathan Borghese to THANKS
* THANKS: Add Nathan Borghese
* module/web/uri.scm: Add Nathan's copyright
---
AUTHORS | 8 --------
THANKS | 1 +
module/web/uri.scm | 2 +-
3 files changed, 2 insertions(+), 9 deletions(-)
diff --git a/AUTHORS b/AUTHORS
index 2a95d3b0b..d756a74ce 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -370,11 +370,3 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore:
Gregory Marton:
In the subdirectory test-suite/tests, changes to:
hash.test
-
-Vivien Kraus:
-In the subdirectory module/web, changes to:
- uri.scm
-In the subdirectory doc/ref, changes to:
- web.texi
-In the subdirectory test-suite/tests, changes to:
- web-uri.test
diff --git a/THANKS b/THANKS
index a1f982f04..e986bab20 100644
--- a/THANKS
+++ b/THANKS
@@ -2,6 +2,7 @@ Contributors since the last release:
Christopher Baines
Greg Benison
+ Nathan Borghese
Rob Browning
Tristan Colgate-McFarlane
Aleix Conchillo Flaqué
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 2280976b5..708205136 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,7 +1,7 @@
;;;; (web uri) --- URI manipulation tools
;;;;
;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
-;;;; Copyright (C) 2023 Vivien Kraus
+;;;; Copyright (C) 2023 Nathan Borghese
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
--
2.41.0
end of thread
Thread overview: 21+ messages
2023-09-25 16:48 [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2 Vivien Kraus
2023-09-25 20:46 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v2] " Vivien Kraus
2023-10-02 16:32 ` Vivien Kraus
2023-10-03 18:49 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v3] " Vivien Kraus
2023-10-03 18:56 ` [PATCH v2] " Dale Mellor
2023-10-03 19:04 ` Maxime Devos
2023-10-03 20:03 ` [PATCH] " Vivien Kraus
2023-10-03 22:22 ` Maxime Devos
2023-10-03 22:30 ` Maxime Devos
2023-10-04 5:29 ` Vivien Kraus
2023-10-10 21:44 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v4] " Vivien Kraus
2023-11-02 20:00 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-02 20:48 ` Vivien Kraus
2023-11-03 17:49 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-03 18:19 ` Vivien Kraus
2023-11-27 17:10 ` Vivien Kraus
2023-11-27 17:15 ` Vivien Kraus
2023-11-29 1:08 ` Nathan via Developers list for Guile, the GNU extensibility library