unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Nathan via "Developers list for Guile, the GNU extensibility library" <guile-devel@gnu.org>
To: vivien@planete-kraus.eu, guile-devel@gnu.org
Subject: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Thu, 02 Nov 2023 16:00:51 -0400	[thread overview]
Message-ID: <87fs1n53de.fsf@nborghese.com> (raw)
In-Reply-To: <d6f4e5597bb61744552e4062020000eba2270dad.1698324123.git.vivien@planete-kraus.eu>

[-- Attachment #1: Type: text/plain, Size: 1323 bytes --]

There is a problem and I fixed it by rewriting a bunch of code myself because I need similar code.

remove-dot-segments:
You cannot split-and-decode-uri-path and then encode-and-join-uri-path.
Those are terrible functions that don't work on all URIs.
URI schemes are allowed to specify that certain reserved characters (sub-delims) are special.
In that case, a sub-delim that IS escaped is different from a sub-delim that IS NOT escaped.

Example input to your remove-dot-segments:
(resolve-relative-reference (string->uri-reference "/") (string->uri-reference "excitement://a.com/a!a!%21!"))
Your wrong output:
excitement://a.com/a%21a%21%21%21

One solution would be to only percent-decode dots. Because dot is unreserved, that solution doesn't have any URI equivalence issues.
But I still think decoding dots automatically is a bad, unexpected side-effect to have.
I rewrote this function so that it:
- works on both escaped and unescaped dots
- doesn't unescape any unnecessary characters

The test suite no longer needs to check for incorrect output either:
> ;; The test suite checks for ';' characters, but Guile escapes
> ;; them in URIs. Same for '='.

----

resolve-relative-reference:
I rewrote this procedure so it is shorter.
I also added #:strict? to toggle "strict parser" as mentioned in the RFC.

- Nathan

[-- Attachment #2: patch --]
[-- Type: text/x-patch, Size: 16708 bytes --]

From 655d3e61fa99bb5ddf5388c0843f498d0bf6f789 Mon Sep 17 00:00:00 2001
From: Nathan <nathan_mail@nborghese.com>
Date: Thu, 2 Nov 2023 15:42:30 -0400
Subject: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986
 5.2.

* module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4.
(merge-paths): Implement algorithm 5.2.3.
(resolve-relative-reference): Implement algorithm 5.2.2.
(module): Export resolve-relative-reference.
* NEWS: Reference it here.
* doc/ref/web.texi (URIs): Document it here.
(Subtypes of URI): Add a @node declaration to cross-reference it.
(HTTP Headers) [location]: Point to the section for different URI types.
(Web Client) [http-request]: Indicate that no redirection is performed,
and warn about blindly following them.
* AUTHORS: Mention Vivien Kraus.
* THANKS: Thank Vivien Kraus.
---
 AUTHORS                       |   8 +++
 NEWS                          |   7 ++
 THANKS                        |   1 +
 doc/ref/web.texi              |  43 +++++++++++-
 module/web/uri.scm            | 126 +++++++++++++++++++++++++++++++++-
 test-suite/tests/web-uri.test |  61 ++++++++++++++++
 6 files changed, 244 insertions(+), 2 deletions(-)

diff --git a/AUTHORS b/AUTHORS
index d756a74ce..2a95d3b0b 100644
--- a/AUTHORS
+++ b/AUTHORS
@@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratories and Bellcore:
 Gregory Marton:
 In the subdirectory test-suite/tests, changes to:
     hash.test
+
+Vivien Kraus:
+In the subdirectory module/web, changes to:
+    uri.scm
+In the subdirectory doc/ref, changes to:
+    web.texi
+In the subdirectory test-suite/tests, changes to:
+    web-uri.test
diff --git a/NEWS b/NEWS
index b319404d7..bdf75cb3c 100644
--- a/NEWS
+++ b/NEWS
@@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
 
 * New interfaces and functionality
 
+** New function in (web uri): resolve-relative-reference
+
+Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may
+be used to request a moved resource in case of a 301 or 302 HTTP
+response, by resolving the Location value of the response on top of the
+requested URI.
+
 ** New warning: unused-module
 
 This analysis, enabled at `-W2', issues warnings for modules that appear
diff --git a/THANKS b/THANKS
index aa4877e95..a1f982f04 100644
--- a/THANKS
+++ b/THANKS
@@ -19,6 +19,7 @@ Contributors since the last release:
         Chris K Jester-Young
           David Kastrup
          Daniel Kraft
+         Vivien Kraus
          Daniel Krueger
            Noah Lavine
     Christopher Lemmer Webber
diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 607c855b6..c6923c23f 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits&gravy")} encodes
 as @code{"scrambled%20eggs/biscuits%26gravy"}.
 @end deffn
 
+@node Subtypes of URI
 @subsubheading Subtypes of URI
 
 As we noted above, not all URI objects have a scheme.  You might have
@@ -356,6 +357,34 @@ Parse @var{string} into a URI object, while asserting that no scheme is
 present.  Return @code{#f} if the string could not be parsed.
 @end deffn
 
+@cindex resolve URI reference
+In order to get a URI object from a base URI and a relative reference,
+one has to use a @dfn{relative URI reference resolution} algorithm.  For
+instance, given a base URI, @samp{https://example.com/over/here}, and a
+relative reference, @samp{../no/there}, it may seem easy to get an
+absolute URI as @samp{https://example.com/over/../no/there}.  It is
+possible that the server at @samp{https://example.com} could serve the
+same resource under this URL as
+@samp{https://example.com/no/there}.  However, a web cache, or a linked
+data processor, must understand that the relative reference resolution
+leads to @samp{https://example.com/no/there}.
+
+@deffn {Scheme Procedure} resolve-relative-reference @var{base} @var{relative} @
+       [#:strict?=@code{#t}]
+Return a URI object representing @var{relative}, using the components of
+@var{base} if missing, as defined in section 5.2 in RFC 3986.  Both @var{base}
+and @var{relative} may be full URI or relative URI references.  If @var{strict?}
+is true, the parser does not ignore the scheme in @var{relative} if it is
+identical to the one in @var{base}.  The name ``relative'' indicates the
+argument’s relationship to @var{base}, not its type.  This function cannot
+return a relative reference (it can only return an absolute URI object), if
+either @var{base} or @var{relative} is an absolute URI object.
+
+Please note that any part of @var{base} may be overriden by
+@var{relative}.  For instance, if @var{base} has a @code{https} URI
+scheme, and if @var{relative} has a @code{http} scheme, then the result
+will have a @code{http} scheme.
+@end deffn
 
 @node HTTP
 @subsection The Hyper-Text Transfer Protocol
@@ -1038,7 +1067,8 @@ The entity-tag of the resource.
 @deftypevr {HTTP Header} URI-reference location
 A URI reference on which a request may be completed.  Used in
 combination with a redirecting status code to perform client-side
-redirection.
+redirection.  @xref{Subtypes of URI, the distinction between types of
+URI}, for more information on relative references.
 @example
 (parse-header 'location "http://example.com/other")
 @result{} #<uri ...>
@@ -1501,6 +1531,17 @@ constants, such as @code{certificate-status/signer-not-found} or
 Connect to the server corresponding to @var{uri} and make a request over
 HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.).
 
+@code{http-request} does not follow redirections.  If a redirection is
+required, @code{http-request} returns a response object with an adequate
+response code (e.g. 301 or 302).
+
+Making web requests on a network where private servers are hosted comes
+with potential security risks.  A malicious public server might forge
+its DNS record to point to your internal address.  It might also
+redirect you to your internal server.  In the first case, or if you
+follow the redirection of the second case, then you may accidentally
+connect to your private server as if it were public.
+
 The following keyword arguments allow you to modify the requests in
 various ways, for example attaching a body to the request, or setting
 specific headers.  The following table lists the keyword arguments and
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 8e0b9bee7..2280976b5 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -1,6 +1,7 @@
 ;;;; (web uri) --- URI manipulation tools
 ;;;;
 ;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2023 Vivien Kraus
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -47,7 +48,9 @@
 
             uri-reference? relative-ref?
             build-uri-reference build-relative-ref
-            string->uri-reference string->relative-ref))
+            string->uri-reference string->relative-ref
+
+            resolve-relative-reference))
 
 (define-record-type <uri>
   (make-uri scheme userinfo host port path query fragment)
@@ -501,3 +504,124 @@ strings, and join the parts together with ‘/’ as a delimiter.
 For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
 encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
   (string-join (map uri-encode parts) "/"))
+
+(define (remove-dot-segments path)
+  "Remove the @samp{./} and @samp{../} segments in @var{path}, as
+RFC3986, section 5.2.4. This procedure properly handles percent-encoded
+dots, but does not percent-decode any unnecessary bytes."
+  (let lp ((input path) (out '()))
+    (define (get-dots-info)
+      "returns three values about the start of the current input string.
+(values starts-with-slash? dots end-slash-index)
+DOTS is the number of dot characters, including escaped ones.
+If there are non-dot, non-slash characters too then DOTS will
+instead be some meaningless number greater than two."
+      (let ((starts-with-slash (eqv? (string-ref input 0) #\/)))
+        (let dots-lp ((i (if starts-with-slash 1 0))
+                      (dots 0))
+          (if (eqv? i (string-length input))
+              (values starts-with-slash dots i)
+              (let ((c (string-ref input i)))
+                (cond
+                 ((eqv? c #\/)
+                  (values starts-with-slash dots i))
+                 ((string-prefix-ci? "%2E" input 0 3 i)
+                  (dots-lp (+ i 3) (1+ dots)))
+                 ((eqv? c #\.)
+                  (dots-lp (1+ i) (1+ dots)))
+                 (else
+                  (dots-lp (1+ i) 3))))))))
+    (if (string-null? input)
+        (apply string-append (reverse out))
+        (call-with-values get-dots-info
+          (lambda (starts-with-slash? dots end-slash-pos)
+            (cond
+             ;; handle ../ ./ . ..
+             ((and (not starts-with-slash?) (or (eqv? dots 1) (eqv? dots 2)))
+              (lp (substring input (min (1+ end-slash-pos)
+                                        (string-length input))) out))
+             ((and starts-with-slash? (eqv? dots 1))
+              (lp
+               (if (eqv? end-slash-pos (string-length input))
+                   "/" ;; handle /.
+                   (substring input end-slash-pos)) ;; handle /./
+               out))
+             ((and starts-with-slash? (eqv? dots 2))
+              (lp
+               (if (eqv? end-slash-pos (string-length input))
+                   "/" ;; handle /..
+                   (substring input end-slash-pos)) ;; handle /../
+               (if (null? out) out (cdr out))))
+             (else
+              (lp
+               (substring input end-slash-pos)
+               (cons (substring input 0 end-slash-pos) out)))))))))
+
+(define (merge-paths base-has-authority? base relative)
+  "Return @samp{@var{base}/@var{relative}}, with the subtleties of
+absolute paths explained in RFC3986, section 5.2.3. If the base URI has
+an authority (userinfo, host, port), then the processing is a bit
+different."
+  (if (and base-has-authority?
+           (equal? base ""))
+      (string-append "/" relative)
+      (let ((last-/ (string-rindex base #\/)))
+        (if last-/
+            (string-append (substring base 0 last-/) "/" relative)
+            relative))))
+
+(define* (resolve-relative-reference base relative #:key (strict? #t))
+  "Resolve @var{relative} on top of @var{base}, as RFC3986, section
+5.2.  Both @var{relative} and @var{base} may be URI or relative
+references.  The name ``relative'' indicates the argument’s relationship
+to @var{base}, not its type.  Both @var{base} and @var{relative} may be
+full URIs or relative references.  The return value is a URI if either
+@var{relative} or @var{base} is a URI. If @var{strict?} is true, the
+default, the parser does not ignore the scheme in @var{relative} if it
+is identical to the one in @var{base}."
+  (let ((b-scheme (uri-scheme base))
+        (b-userinfo (uri-userinfo base))
+        (b-host (uri-host base))
+        (b-port (uri-port base))
+        (b-path (uri-path base))
+        (b-query (uri-query base))
+        (b-fragment (uri-fragment base))
+        (r-scheme (uri-scheme relative))
+        (r-userinfo (uri-userinfo relative))
+        (r-host (uri-host relative))
+        (r-port (uri-port relative))
+        (r-path (uri-path relative))
+        (r-query (uri-query relative))
+        (r-fragment (uri-fragment relative)))
+    (cond
+     ((or r-host (and r-scheme (or strict? (not (eq? r-scheme b-scheme)))))
+      (build-uri-reference
+       #:scheme (or r-scheme b-scheme)
+       #:userinfo r-userinfo
+       #:host r-host
+       #:port r-port
+       #:path (remove-dot-segments r-path)
+       #:query r-query
+       #:fragment r-fragment))
+     ((string-null? r-path)
+      (build-uri-reference
+       #:scheme b-scheme
+       #:userinfo b-userinfo
+       #:host b-host
+       #:port b-port
+       #:path b-path
+       #:query (or r-query b-query)
+       #:fragment r-fragment))
+     (else
+      (build-uri-reference
+       #:scheme b-scheme
+       #:userinfo b-userinfo
+       #:host b-host
+       #:port b-port
+       #:path
+       (remove-dot-segments
+        (if (string-prefix? "/" r-path)
+            r-path
+            (merge-paths b-host b-path r-path)))
+       #:query r-query
+       #:fragment r-fragment)))))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 95fd82f16..b4d4b6cdb 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -693,3 +693,64 @@
   (pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
   (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
   (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
+
+(with-test-prefix "resolve relative reference"
+  ;; Test suite in RFC3986, section 5.4.
+  (let ((base (string->uri "http://a/b/c/d;p?q")))
+    (let ((resolve
+           (lambda (relative)
+             (let* ((relative-uri
+                     (string->uri-reference relative))
+                    (resolved-uri
+                     (resolve-relative-reference base relative-uri))
+                    (resolved (uri->string resolved-uri)))
+               resolved))))
+      (pass-if "remove-dot-segments unnecessary escaping"
+        (equal? (resolve "%2e%2E/.%2e/%2E./g%2e%2E%2Fh%2e") "http://a/g%2e%2E%2Fh%2e"))
+      (with-test-prefix "normal"
+        (pass-if (equal? (resolve "g:h") "g:h"))
+        (pass-if (equal? (resolve "g") "http://a/b/c/g"))
+        (pass-if (equal? (resolve "./g") "http://a/b/c/g"))
+        (pass-if (equal? (resolve "g/") "http://a/b/c/g/"))
+        (pass-if (equal? (resolve "/g") "http://a/g"))
+        (pass-if (equal? (resolve "//g") "http://g"))
+        (pass-if (equal? (resolve "?y") "http://a/b/c/d;p?y"))
+        (pass-if (equal? (resolve "g?y") "http://a/b/c/g?y"))
+        (pass-if (equal? (resolve "#s") "http://a/b/c/d;p?q#s"))
+        (pass-if (equal? (resolve "g?y#s") "http://a/b/c/g?y#s"))
+        (pass-if (equal? (resolve ";x") "http://a/b/c/;x"))
+        (pass-if (equal? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s"))
+        (pass-if (equal? (resolve "") "http://a/b/c/d;p?q"))
+        (pass-if (equal? (resolve ".") "http://a/b/c/"))
+        (pass-if (equal? (resolve "./") "http://a/b/c/"))
+        (pass-if (equal? (resolve "..") "http://a/b/"))
+        (pass-if (equal? (resolve "../") "http://a/b/"))
+        (pass-if (equal? (resolve "../g") "http://a/b/g"))
+        (pass-if (equal? (resolve "../..") "http://a/"))
+        (pass-if (equal? (resolve "../../") "http://a/"))
+        (pass-if (equal? (resolve "../../g") "http://a/g")))
+      (with-test-prefix "abnormal"
+        (pass-if (equal? (resolve "../../../g") "http://a/g"))
+        (pass-if (equal? (resolve "../../../../g") "http://a/g"))
+        (pass-if (equal? (resolve "/./g") "http://a/g"))
+        (pass-if (equal? (resolve "/../g") "http://a/g"))
+        (pass-if (equal? (resolve "g.") "http://a/b/c/g."))
+        (pass-if (equal? (resolve ".g") "http://a/b/c/.g"))
+        (pass-if (equal? (resolve "g..") "http://a/b/c/g.."))
+        (pass-if (equal? (resolve "..g") "http://a/b/c/..g"))
+        (pass-if (equal? (resolve "./../g") "http://a/b/g"))
+        (pass-if (equal? (resolve "./g/.") "http://a/b/c/g/"))
+        (pass-if (equal? (resolve "g/./h") "http://a/b/c/g/h"))
+        (pass-if (equal? (resolve "g/../h") "http://a/b/c/h"))
+        (pass-if (equal? (resolve "g;x=1/./y") "http://a/b/c/g;x=1/y"))
+        (pass-if (equal? (resolve "g;x=1/../y") "http://a/b/c/y"))
+        (pass-if (equal? (resolve "g?y/./x") "http://a/b/c/g?y/./x"))
+        (pass-if (equal? (resolve "g?y/../x") "http://a/b/c/g?y/../x"))
+        (pass-if (equal? (resolve "g#s/./x") "http://a/b/c/g#s/./x"))
+        (pass-if (equal? (resolve "g#s/../x") "http://a/b/c/g#s/../x"))
+        (pass-if (equal? (resolve "http:g") "http:g"))
+        (pass-if "nonstrict relative resolve"
+          (equal? (uri->string (resolve-relative-reference
+                                base (string->uri-reference "http:g")
+                                #:strict? #f))
+                  "http://a/b/c/g"))))))
-- 
2.41.0


  reply	other threads:[~2023-11-02 20:00 UTC|newest]

Thread overview: 21+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-25 16:48 [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2 Vivien Kraus
2023-09-25 20:46 ` Maxime Devos
2023-09-25 16:48   ` [PATCH v2] " Vivien Kraus
2023-10-02 16:32     ` Vivien Kraus
2023-10-03 18:49       ` Maxime Devos
2023-09-25 16:48         ` [PATCH v3] " Vivien Kraus
2023-10-03 18:56         ` [PATCH v2] " Dale Mellor
2023-10-03 19:04           ` Maxime Devos
2023-10-03 20:03   ` [PATCH] " Vivien Kraus
2023-10-03 22:22     ` Maxime Devos
2023-10-03 22:30       ` Maxime Devos
2023-10-04  5:29         ` Vivien Kraus
2023-10-10 21:44           ` Maxime Devos
2023-09-25 16:48             ` [PATCH v4] " Vivien Kraus
2023-11-02 20:00               ` Nathan via Developers list for Guile, the GNU extensibility library [this message]
2023-11-02 20:48                 ` Vivien Kraus
2023-11-03 17:49                   ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-03 18:19                     ` Vivien Kraus
2023-11-27 17:10                 ` Vivien Kraus
2023-11-27 17:15                   ` Vivien Kraus
2023-11-29  1:08                     ` Nathan via Developers list for Guile, the GNU extensibility library

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87fs1n53de.fsf@nborghese.com \
    --to=guile-devel@gnu.org \
    --cc=nathan_mail@nborghese.com \
    --cc=vivien@planete-kraus.eu \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).