From: Vivien Kraus <vivien@planete-kraus.eu>
To: Maxime Devos <maximedevos@telenet.be>, guile-devel@gnu.org
Subject: Re: [PATCH v2] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2.
Date: Mon, 02 Oct 2023 18:32:50 +0200 [thread overview]
Message-ID: <090848be45623b187539be7826e924ddb05806a8.camel@planete-kraus.eu> (raw)
In-Reply-To: <dcd10e2d08251cdc6d19213098248fbea0b706ef.1695680018.git.vivien@planete-kraus.eu>
Hi!
Are there other things to fix?
Best regards,
Vivien
Le lundi 25 septembre 2023 à 18:48 +0200, Vivien Kraus a écrit :
> * module/web/uri.scm (remove-dot-segments): Implement algorithm
> 5.2.4.
> (merge-paths): Implement algorithm 5.2.3.
> (resolve-relative-reference): Implement algorithm 5.2.2.
> (module): Export resolve-relative-reference.
> * NEWS: Reference it here.
> * doc/ref/web.texi (URIs): Document it here.
> (Subtypes of URI): Add a @node declaration to cross-reference it.
> (HTTP Headers) [location]: Point to the section for different URI
> types.
> (Web Client) [http-request]: Indicate that no redirection is
> performed.
> ---
>
> I clarified the situation about redirections. I don’t think it’s
> Guile’s
> job to do it. For permanent redirections (301), the application
> developer is supposed to edit the pages that point to the now-moved
> resource anyway. A handful of security issues must also be lurking in
> the shadows, and I don’t think it should be a responsibility for the
> Guile web client.
>
> The specification uses the word "relative" both for the type of URI
> that
> is most likely to be found, and to express the asymmetric relation
> between both arguments of the algorithm. I think "base" and
> "dependent"
> are clearer, what do you think?
>
> The semicolon and equal sign are both reserved characters, so it’s
> expected that Guile escapes them. If there’s a bug, it is in the 5.4
> section of the RFC. However, I understand that it would be desirable
> for
> the algorithm to accept such unescaped characters, since it works
> with
> URIs in isolation and not in an HTTP frame or web page.
>
> NEWS | 7 ++
> doc/ref/web.texi | 27 +++++-
> module/web/uri.scm | 161
> +++++++++++++++++++++++++++++++++-
> test-suite/tests/web-uri.test | 68 ++++++++++++++
> 4 files changed, 261 insertions(+), 2 deletions(-)
>
> diff --git a/NEWS b/NEWS
> index b319404d7..bdf75cb3c 100644
> --- a/NEWS
> +++ b/NEWS
> @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9)
>
> * New interfaces and functionality
>
> +** New function in (web uri): resolve-relative-reference
> +
> +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It
> may
> +be used to request a moved resource in case of a 301 or 302 HTTP
> +response, by resolving the Location value of the response on top of
> the
> +requested URI.
> +
> ** New warning: unused-module
>
> This analysis, enabled at `-W2', issues warnings for modules that
> appear
> diff --git a/doc/ref/web.texi b/doc/ref/web.texi
> index 607c855b6..2267c9774 100644
> --- a/doc/ref/web.texi
> +++ b/doc/ref/web.texi
> @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs"
> "biscuits&gravy")} encodes
> as @code{"scrambled%20eggs/biscuits%26gravy"}.
> @end deffn
>
> +@node Subtypes of URI
> @subsubheading Subtypes of URI
>
> As we noted above, not all URI objects have a scheme. You might
> have
> @@ -356,6 +357,25 @@ Parse @var{string} into a URI object, while
> asserting that no scheme is
> present. Return @code{#f} if the string could not be parsed.
> @end deffn
>
> +@cindex resolve URI reference
> +In order to get a URI object from a base URI and a relative
> reference,
> +one has to use a @dfn{relative URI reference resolution} algorithm.
> For
> +instance, given a base URI, @samp{https://example.com/over/here},
> and a
> +relative reference, @samp{../no/there}, it may seem easy to get an
> +absolute URI as @samp{https://example.com/over/../no/there}. It is
> +possible that the server at @samp{https://example.com} could serve
> the
> +same resource under this URL as
> +@samp{https://example.com/no/there}. However, a web cache, or a
> linked
> +data processor, must understand that the relative reference
> resolution
> +leads to @samp{https://example.com/no/there}.
> +
> +@deffn {Scheme procedure} resolve-relative-reference @var{base}
> @var{dependent}
> +Return a URI object representing @var{dependent}, using the
> components
> +of @var{base} if missing, as defined in section 5.2 in RFC 3986.
> This
> +function cannot return a relative reference (it can only return an
> +absolute URI object), if either @var{base} or @var{dependent} is an
> +absolute URI object.
> +@end deffn
>
> @node HTTP
> @subsection The Hyper-Text Transfer Protocol
> @@ -1038,7 +1058,8 @@ The entity-tag of the resource.
> @deftypevr {HTTP Header} URI-reference location
> A URI reference on which a request may be completed. Used in
> combination with a redirecting status code to perform client-side
> -redirection.
> +redirection. @xref{Subtypes of URI, the distinction between types of
> +URI}, for more information on relative references.
> @example
> (parse-header 'location "http://example.com/other")
> @result{} #<uri ...>
> @@ -1501,6 +1522,10 @@ constants, such as @code{certificate-
> status/signer-not-found} or
> Connect to the server corresponding to @var{uri} and make a request
> over
> HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST},
> etc.).
>
> +@code{http-request} does not follow redirections. If a redirection
> is
> +required, @code{http-request} returns a response object with an
> adequate
> +response code (e.g. 301 or 302).
> +
> The following keyword arguments allow you to modify the requests in
> various ways, for example attaching a body to the request, or
> setting
> specific headers. The following table lists the keyword arguments
> and
> diff --git a/module/web/uri.scm b/module/web/uri.scm
> index 8e0b9bee7..acec2d1e8 100644
> --- a/module/web/uri.scm
> +++ b/module/web/uri.scm
> @@ -47,7 +47,9 @@
>
> uri-reference? relative-ref?
> build-uri-reference build-relative-ref
> - string->uri-reference string->relative-ref))
> + string->uri-reference string->relative-ref
> +
> + resolve-relative-reference))
>
> (define-record-type <uri>
> (make-uri scheme userinfo host port path query fragment)
> @@ -501,3 +503,160 @@ strings, and join the parts together with ‘/’
> as a delimiter.
> For example, the list ‘(\"scrambled eggs\" \"biscuits&gravy\")’
> encodes as ‘\"scrambled%20eggs/biscuits%26gravy\"’."
> (string-join (map uri-encode parts) "/"))
> +
> +(define (remove-dot-segments path)
> + "Remove the @samp{./} and @samp{../} segments in @var{path}, as
> + RFC3986, section 5.2.4."
> + (let scan ((input
> + (let ((components (split-and-decode-uri-path path)))
> + (if (string-suffix? "/" path)
> + `(,@components "")
> + components)))
> + (input-path-absolute? (string-prefix? "/" path))
> + (output '())
> + (output-absolute? #f)
> + (output-ends-in-/? (string-suffix? "/" path)))
> + (cond
> + ((and input-path-absolute?
> + (null? input))
> + ;; Transfer the initial "/" from the input to the end of the
> + ;; output.
> + (scan '() #f output output-absolute? #t))
> + ((null? input)
> + (string-append
> + (if output-absolute? "/" "")
> + (encode-and-join-uri-path
> + (reverse output))
> + (if output-ends-in-/? "/" "")))
> + ((and (not input-path-absolute?)
> + (or (equal? (car input) "..")
> + (equal? (car input) ".")))
> + (scan (cdr input) #f output output-absolute? output-ends-in-
> /?))
> + ((and input-path-absolute?
> + (equal? (car input) "."))
> + (scan (cdr input) #t output output-absolute? output-ends-in-
> /?))
> + ((and input-path-absolute?
> + (equal? (car input) ".."))
> + (scan (cdr input) #t
> + (if (null? output)
> + output
> + (cdr output))
> + ;; Remove the last segment, including the preceding /.
> So,
> + ;; if there is 0 or 1 segment, remove the root / too.
> + (if (or (null? output) (null? (cdr output)))
> + #f ;; remove the /
> + #t) ;; keep it
> + #f))
> + (else
> + (scan (cdr input)
> + ;; If there is only 1 item in input, then it does not
> end in
> + ;; /, so the recursive call does not start with
> + ;; /. Otherwise, the recursive call starts with /.
> + (not (null? (cdr input)))
> + (cons (car input) output)
> + ;; If the output is empty and the input path is
> absolute,
> + ;; the / of the transferred path is transferred as well.
> + (or output-absolute?
> + (and (null? output)
> + input-path-absolute?))
> + #f)))))
> +
> +(define (merge-paths base-has-authority? base dependent)
> + "Return @samp{@var{base}/@var{dependent}}, with the subtelties of
> absolute
> + paths explained in RFC3986, section 5.2.3. If the base URI has an
> +authority (userinfo, host, port), then the processing is a bit
> +different."
> + (if (and base-has-authority?
> + (equal? base ""))
> + (string-append "/" dependent)
> + (let ((last-/ (string-rindex base #\/)))
> + (if last-/
> + (string-append (substring base 0 last-/) "/" dependent)
> + dependent))))
> +
> +(define (resolve-relative-reference base dependent)
> + "Resolve @var{dependent} on top of @var{base}, as RFC3986, section
> +5.2. Both @var{dependent} and @var{base} may be URI or relative
> +references. The return value is a URI if either @var{dependent} or
> +@var{base} is a URI."
> + ;; As opposed to RFC 3986, we use "dependent" instead of
> "relative" to
> + ;; avoid confusion between "URI" and "relative reference", the
> + ;; dependent URI may be either.
> + (let ((b-scheme (uri-scheme base))
> + (b-userinfo (uri-userinfo base))
> + (b-host (uri-host base))
> + (b-port (uri-port base))
> + (b-path (uri-path base))
> + (b-query (uri-query base))
> + (b-fragment (uri-fragment base))
> + (r-scheme (uri-scheme dependent))
> + (r-userinfo (uri-userinfo dependent))
> + (r-host (uri-host dependent))
> + (r-port (uri-port dependent))
> + (r-path (uri-path dependent))
> + (r-query (uri-query dependent))
> + (r-fragment (uri-fragment dependent))
> + (t-scheme #f)
> + (t-userinfo #f)
> + (t-host #f)
> + (t-port #f)
> + (t-path "")
> + (t-query #f)
> + (t-fragment #f))
> + ;; https://www.rfc-editor.org/rfc/rfc3986#section-5.2
> +
> + ;;The programming style uses mutations to better adhere to the
> + ;;algorithm specification.
> + (if r-scheme
> + (begin
> + (set! t-scheme r-scheme)
> + (set! t-userinfo r-userinfo)
> + (set! t-host r-host)
> + (set! t-port r-port)
> + (set! t-path (remove-dot-segments r-path))
> + (set! t-query r-query))
> + ;; r-scheme is not defined:
> + (begin
> + (if r-host
> + (begin
> + (set! t-userinfo r-userinfo)
> + (set! t-host r-host)
> + (set! t-port r-port)
> + (set! t-path (remove-dot-segments r-path))
> + (set! t-query r-query))
> + ;; r-scheme is not defined, r-authority is not
> defined:
> + (begin
> + (if (equal? r-path "")
> + (begin
> + (set! t-path b-path)
> + (if r-query
> + ;; r-scheme, r-authority, r-path are not
> + ;; defined:
> + (set! t-query r-query)
> + ;; r-scheme, r-authority, r-path, r-query
> are
> + ;; not defined:
> + (set! t-query b-query)))
> + ;; r-scheme, r-authority not defined, r-path
> defined:
> + (begin
> + (if (string-prefix? "/" r-path)
> + ;; r-scheme, r-authority not defined, r-
> path
> + ;; absolute:
> + (set! t-path (remove-dot-segments r-path))
> + ;; r-scheme, r-authority not defined, r-
> path
> + ;; dependent:
> + (set! t-path
> + (remove-dot-segments
> + (merge-paths b-host b-path r-
> path))))
> + (set! t-query r-query)))
> + (set! t-userinfo b-userinfo)
> + (set! t-host b-host)
> + (set! t-port b-port)))
> + (set! t-scheme b-scheme)))
> + (set! t-fragment r-fragment)
> + (build-uri-reference #:scheme t-scheme
> + #:userinfo t-userinfo
> + #:host t-host
> + #:port t-port
> + #:path t-path
> + #:query t-query
> + #:fragment t-fragment)))
> diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-
> uri.test
> index 95fd82f16..c453bf60f 100644
> --- a/test-suite/tests/web-uri.test
> +++ b/test-suite/tests/web-uri.test
> @@ -20,6 +20,7 @@
> (define-module (test-web-uri)
> #:use-module (web uri)
> #:use-module (ice-9 regex)
> + #:use-module (ice-9 string-fun)
> #:use-module (test-suite lib))
>
>
> @@ -693,3 +694,70 @@
> (pass-if (equal? "foo%20bar" (uri-encode "foo bar")))
> (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar")))
> (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^"))))
> +
> +(with-test-prefix "resolve relative reference"
> + ;; Test suite in RFC3986, section 5.4.
> + (let ((base (string->uri "http://a/b/c/d;p?q"))
> + (equal/encoded?
> + ;; The test suite checks for ';' characters, but Guile
> escapes
> + ;; them in URIs. Same for '='.
> + (let ((escape-colon
> + (lambda (x)
> + (string-replace-substring x ";" "%3B")))
> + (escape-equal
> + (lambda (x)
> + (string-replace-substring x "=" "%3D"))))
> + (lambda (x y)
> + (equal? (escape-colon (escape-equal x))
> + (escape-colon (escape-equal y)))))))
> + (let ((resolve
> + (lambda (relative)
> + (let* ((relative-uri
> + (string->uri-reference relative))
> + (resolved-uri
> + (resolve-relative-reference base relative-uri))
> + (resolved (uri->string resolved-uri)))
> + resolved))))
> + (with-test-prefix "normal"
> + (pass-if (equal/encoded? (resolve "g:h") "g:h"))
> + (pass-if (equal/encoded? (resolve "g") "http://a/b/c/g"))
> + (pass-if (equal/encoded? (resolve "./g") "http://a/b/c/g"))
> + (pass-if (equal/encoded? (resolve "g/") "http://a/b/c/g/"))
> + (pass-if (equal/encoded? (resolve "/g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "//g") "http://g"))
> + (pass-if (equal/encoded? (resolve "?y")
> "http://a/b/c/d;p?y"))
> + (pass-if (equal/encoded? (resolve "g?y")
> "http://a/b/c/g?y"))
> + (pass-if (equal/encoded? (resolve "#s")
> "http://a/b/c/d;p?q#s"))
> + (pass-if (equal/encoded? (resolve "g?y#s")
> "http://a/b/c/g?y#s"))
> + (pass-if (equal/encoded? (resolve ";x") "http://a/b/c/;x"))
> + (pass-if (equal/encoded? (resolve "g;x?y#s")
> "http://a/b/c/g;x?y#s"))
> + (pass-if (equal/encoded? (resolve "") "http://a/b/c/d;p?q"))
> + (pass-if (equal/encoded? (resolve ".") "http://a/b/c/"))
> + (pass-if (equal/encoded? (resolve "./") "http://a/b/c/"))
> + (pass-if (equal/encoded? (resolve "..") "http://a/b/"))
> + (pass-if (equal/encoded? (resolve "../") "http://a/b/"))
> + (pass-if (equal/encoded? (resolve "../g") "http://a/b/g"))
> + (pass-if (equal/encoded? (resolve "../..") "http://a/"))
> + (pass-if (equal/encoded? (resolve "../../") "http://a/"))
> + (pass-if (equal/encoded? (resolve "../../g") "http://a/g")))
> + (with-test-prefix "abnormal"
> + (pass-if (equal/encoded? (resolve "../../../g")
> "http://a/g"))
> + (pass-if (equal/encoded? (resolve "../../../../g")
> "http://a/g"))
> + (pass-if (equal/encoded? (resolve "/./g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "/../g") "http://a/g"))
> + (pass-if (equal/encoded? (resolve "g.") "http://a/b/c/g."))
> + (pass-if (equal/encoded? (resolve ".g") "http://a/b/c/.g"))
> + (pass-if (equal/encoded? (resolve "g..")
> "http://a/b/c/g.."))
> + (pass-if (equal/encoded? (resolve "..g")
> "http://a/b/c/..g"))
> + (pass-if (equal/encoded? (resolve "./../g") "http://a/b/g"))
> + (pass-if (equal/encoded? (resolve "./g/.")
> "http://a/b/c/g/"))
> + (pass-if (equal/encoded? (resolve "g/./h")
> "http://a/b/c/g/h"))
> + (pass-if (equal/encoded? (resolve "g/../h")
> "http://a/b/c/h"))
> + (pass-if (equal/encoded? (resolve "g;x=1/./y")
> "http://a/b/c/g;x=1/y"))
> + (pass-if (equal/encoded? (resolve "g;x=1/../y")
> "http://a/b/c/y"))
> + (pass-if (equal/encoded? (resolve "g?y/./x")
> "http://a/b/c/g?y/./x"))
> + (pass-if (equal/encoded? (resolve "g?y/../x")
> "http://a/b/c/g?y/../x"))
> + (pass-if (equal/encoded? (resolve "g#s/./x")
> "http://a/b/c/g#s/./x"))
> + (pass-if (equal/encoded? (resolve "g#s/../x")
> "http://a/b/c/g#s/../x"))
> + (pass-if (equal/encoded? (resolve "http:g") "http:g"))))))
> +
>
> base-commit: 8441d8ff5671db690eb239cfea4dcfdee6d6dcdb
next prev parent reply other threads:[~2023-10-02 16:32 UTC|newest]
Thread overview: 21+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-09-25 16:48 [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2 Vivien Kraus
2023-09-25 20:46 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v2] " Vivien Kraus
2023-10-02 16:32 ` Vivien Kraus [this message]
2023-10-03 18:49 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v3] " Vivien Kraus
2023-10-03 18:56 ` [PATCH v2] " Dale Mellor
2023-10-03 19:04 ` Maxime Devos
2023-10-03 20:03 ` [PATCH] " Vivien Kraus
2023-10-03 22:22 ` Maxime Devos
2023-10-03 22:30 ` Maxime Devos
2023-10-04 5:29 ` Vivien Kraus
2023-10-10 21:44 ` Maxime Devos
2023-09-25 16:48 ` [PATCH v4] " Vivien Kraus
2023-11-02 20:00 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-02 20:48 ` Vivien Kraus
2023-11-03 17:49 ` Nathan via Developers list for Guile, the GNU extensibility library
2023-11-03 18:19 ` Vivien Kraus
2023-11-27 17:10 ` Vivien Kraus
2023-11-27 17:15 ` Vivien Kraus
2023-11-29 1:08 ` Nathan via Developers list for Guile, the GNU extensibility library
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/guile/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=090848be45623b187539be7826e924ddb05806a8.camel@planete-kraus.eu \
--to=vivien@planete-kraus.eu \
--cc=guile-devel@gnu.org \
--cc=maximedevos@telenet.be \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).