From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Nathan via "Developers list for Guile, the GNU extensibility library" Newsgroups: gmane.lisp.guile.devel Subject: [PATCH v4] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2. Date: Thu, 02 Nov 2023 16:00:51 -0400 Message-ID: <87fs1n53de.fsf@nborghese.com> References: Reply-To: Nathan Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="38258"; mail-complaints-to="usenet@ciao.gmane.io" To: vivien@planete-kraus.eu, guile-devel@gnu.org Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Thu Nov 02 23:14:43 2023 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qyfxb-0009lF-AF for guile-devel@m.gmane-mx.org; Thu, 02 Nov 2023 23:14:43 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qyfxO-0007Zt-60; Thu, 02 Nov 2023 18:14:30 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qydvN-00033l-Eb for guile-devel@gnu.org; Thu, 02 Nov 2023 16:04:17 -0400 Original-Received: from mail.nborghese.com ([207.148.28.48]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qydvJ-0003lO-6B for guile-devel@gnu.org; Thu, 02 Nov 2023 16:04:15 -0400 DKIM-Signature: v=1; a=rsa-sha256; c=simple/simple; s=062122; bh=rioITcqScSx82 nlJzPfHY6+OiLf55kyDWID1/etbosA=; h=date:in-reply-to:subject:to:from; d=nborghese.com; b=GpmPkZgjnFSJ9eO/S6uIL3EPV9cOHs8vMPE633gwTuQmexUNV/v x3xilZP/1qrYuOrO1R8lIRQbd+qt7DED7tH/R68o0U8zsNuvB6ZWfYNNA5AWkr1gbnwRCH WCFXcsAwjjrzvMtnvAHjC/uXgSz03kf/5Taa3+F/RCIYmovRLA= Original-Received: by nborghese.com (OpenSMTPD) with ESMTPSA id 2bba4998 (TLSv1.3:TLS_AES_256_GCM_SHA384:256:NO); Thu, 2 Nov 2023 20:04:08 +0000 (UTC) In-Reply-To: Received-SPF: pass client-ip=207.148.28.48; envelope-from=nathan_mail@nborghese.com; helo=mail.nborghese.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=ham autolearn_force=no X-Spam_action: no action X-Mailman-Approved-At: Thu, 02 Nov 2023 18:14:29 -0400 X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.lisp.guile.devel:22060 Archived-At: --=-=-= Content-Type: text/plain There is a problem and I fixed it by rewriting a bunch of code myself because I need similar code. remove-dot-segments: You cannot split-and-decode-uri-path and then encode-and-join-uri-path. Those are terrible functions that don't work on all URIs. URI schemes are allowed to specify that certain reserved characters (sub-delims) are special. In that case, a sub-delim that IS escaped is different from a sub-delim that IS NOT escaped. Example input to your remove-dot-segments: (resolve-relative-reference (string->uri-reference "/") (string->uri-reference "excitement://a.com/a!a!%21!")) Your wrong output: excitement://a.com/a%21a%21%21%21 One solution would be to only percent-decode dots. Because dot is unreserved, that solution doesn't have any URI equivalence issues. But I still think decoding dots automatically is a bad, unexpected side-effect to have. I rewrote this function so that it: - works on both escaped and unescaped dots - doesn't unescape any unnecessary characters The test suite no longer needs to check for incorrect output either: > ;; The test suite checks for ';' characters, but Guile escapes > ;; them in URIs. Same for '='. ---- resolve-relative-reference: I rewrote this procedure so it is shorter. I also added #:strict? to toggle "strict parser" as mentioned in the RFC. - Nathan --=-=-= Content-Type: text/x-patch; charset=utf-8 Content-Disposition: attachment; filename=0001-Add-resolve-relative-reference-in-web-uri-as-in-RFC-.patch Content-Transfer-Encoding: quoted-printable Content-Description: patch >From 655d3e61fa99bb5ddf5388c0843f498d0bf6f789 Mon Sep 17 00:00:00 2001 From: Nathan Date: Thu, 2 Nov 2023 15:42:30 -0400 Subject: [PATCH] Add resolve-relative-reference in (web uri), as in RFC 3986 5.2. * module/web/uri.scm (remove-dot-segments): Implement algorithm 5.2.4. (merge-paths): Implement algorithm 5.2.3. (resolve-relative-reference): Implement algorithm 5.2.2. (module): Export resolve-relative-reference. * NEWS: Reference it here. * doc/ref/web.texi (URIs): Document it here. (Subtypes of URI): Add a @node declaration to cross-reference it. (HTTP Headers) [location]: Point to the section for different URI types. (Web Client) [http-request]: Indicate that no redirection is performed, and warn about blindly following them. * AUTHORS: Mention Vivien Kraus. * THANKS: Thank Vivien Kraus. --- AUTHORS | 8 +++ NEWS | 7 ++ THANKS | 1 + doc/ref/web.texi | 43 +++++++++++- module/web/uri.scm | 126 +++++++++++++++++++++++++++++++++- test-suite/tests/web-uri.test | 61 ++++++++++++++++ 6 files changed, 244 insertions(+), 2 deletions(-) diff --git a/AUTHORS b/AUTHORS index d756a74ce..2a95d3b0b 100644 --- a/AUTHORS +++ b/AUTHORS @@ -370,3 +370,11 @@ John W. Eaton, based on code from AT&T Bell Laboratori= es and Bellcore: Gregory Marton: In the subdirectory test-suite/tests, changes to: hash.test + +Vivien Kraus: +In the subdirectory module/web, changes to: + uri.scm +In the subdirectory doc/ref, changes to: + web.texi +In the subdirectory test-suite/tests, changes to: + web-uri.test diff --git a/NEWS b/NEWS index b319404d7..bdf75cb3c 100644 --- a/NEWS +++ b/NEWS @@ -9,6 +9,13 @@ Changes in 3.0.10 (since 3.0.9) =20 * New interfaces and functionality =20 +** New function in (web uri): resolve-relative-reference + +Implement the /5.2. Relative Resolution/ algorithm in RFC 3986. It may +be used to request a moved resource in case of a 301 or 302 HTTP +response, by resolving the Location value of the response on top of the +requested URI. + ** New warning: unused-module =20 This analysis, enabled at `-W2', issues warnings for modules that appear diff --git a/THANKS b/THANKS index aa4877e95..a1f982f04 100644 --- a/THANKS +++ b/THANKS @@ -19,6 +19,7 @@ Contributors since the last release: Chris K Jester-Young David Kastrup Daniel Kraft + Vivien Kraus Daniel Krueger Noah Lavine Christopher Lemmer Webber diff --git a/doc/ref/web.texi b/doc/ref/web.texi index 607c855b6..c6923c23f 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -297,6 +297,7 @@ For example, the list @code{("scrambled eggs" "biscuits= &gravy")} encodes as @code{"scrambled%20eggs/biscuits%26gravy"}. @end deffn =20 +@node Subtypes of URI @subsubheading Subtypes of URI =20 As we noted above, not all URI objects have a scheme. You might have @@ -356,6 +357,34 @@ Parse @var{string} into a URI object, while asserting = that no scheme is present. Return @code{#f} if the string could not be parsed. @end deffn =20 +@cindex resolve URI reference +In order to get a URI object from a base URI and a relative reference, +one has to use a @dfn{relative URI reference resolution} algorithm. For +instance, given a base URI, @samp{https://example.com/over/here}, and a +relative reference, @samp{../no/there}, it may seem easy to get an +absolute URI as @samp{https://example.com/over/../no/there}. It is +possible that the server at @samp{https://example.com} could serve the +same resource under this URL as +@samp{https://example.com/no/there}. However, a web cache, or a linked +data processor, must understand that the relative reference resolution +leads to @samp{https://example.com/no/there}. + +@deffn {Scheme Procedure} resolve-relative-reference @var{base} @var{relat= ive} @ + [#:strict?=3D@code{#t}] +Return a URI object representing @var{relative}, using the components of +@var{base} if missing, as defined in section 5.2 in RFC 3986. Both @var{b= ase} +and @var{relative} may be full URI or relative URI references. If @var{st= rict?} +is true, the parser does not ignore the scheme in @var{relative} if it is +identical to the one in @var{base}. The name ``relative'' indicates the +argument=E2=80=99s relationship to @var{base}, not its type. This functio= n cannot +return a relative reference (it can only return an absolute URI object), if +either @var{base} or @var{relative} is an absolute URI object. + +Please note that any part of @var{base} may be overriden by +@var{relative}. For instance, if @var{base} has a @code{https} URI +scheme, and if @var{relative} has a @code{http} scheme, then the result +will have a @code{http} scheme. +@end deffn =20 @node HTTP @subsection The Hyper-Text Transfer Protocol @@ -1038,7 +1067,8 @@ The entity-tag of the resource. @deftypevr {HTTP Header} URI-reference location A URI reference on which a request may be completed. Used in combination with a redirecting status code to perform client-side -redirection. +redirection. @xref{Subtypes of URI, the distinction between types of +URI}, for more information on relative references. @example (parse-header 'location "http://example.com/other") @result{} # @@ -1501,6 +1531,17 @@ constants, such as @code{certificate-status/signer-n= ot-found} or Connect to the server corresponding to @var{uri} and make a request over HTTP, using @var{method} (@code{GET}, @code{HEAD}, @code{POST}, etc.). =20 +@code{http-request} does not follow redirections. If a redirection is +required, @code{http-request} returns a response object with an adequate +response code (e.g. 301 or 302). + +Making web requests on a network where private servers are hosted comes +with potential security risks. A malicious public server might forge +its DNS record to point to your internal address. It might also +redirect you to your internal server. In the first case, or if you +follow the redirection of the second case, then you may accidentally +connect to your private server as if it were public. + The following keyword arguments allow you to modify the requests in various ways, for example attaching a body to the request, or setting specific headers. The following table lists the keyword arguments and diff --git a/module/web/uri.scm b/module/web/uri.scm index 8e0b9bee7..2280976b5 100644 --- a/module/web/uri.scm +++ b/module/web/uri.scm @@ -1,6 +1,7 @@ ;;;; (web uri) --- URI manipulation tools ;;;; ;;;; Copyright (C) 1997,2001,2002,2010,2011,2012,2013,2014,2019-2021 Free = Software Foundation, Inc. +;;;; Copyright (C) 2023 Vivien Kraus ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -47,7 +48,9 @@ =20 uri-reference? relative-ref? build-uri-reference build-relative-ref - string->uri-reference string->relative-ref)) + string->uri-reference string->relative-ref + + resolve-relative-reference)) =20 (define-record-type (make-uri scheme userinfo host port path query fragment) @@ -501,3 +504,124 @@ strings, and join the parts together with =E2=80=98/= =E2=80=99 as a delimiter. For example, the list =E2=80=98(\"scrambled eggs\" \"biscuits&gravy\")=E2= =80=99 encodes as =E2=80=98\"scrambled%20eggs/biscuits%26gravy\"=E2=80=99." (string-join (map uri-encode parts) "/")) + +(define (remove-dot-segments path) + "Remove the @samp{./} and @samp{../} segments in @var{path}, as +RFC3986, section 5.2.4. This procedure properly handles percent-encoded +dots, but does not percent-decode any unnecessary bytes." + (let lp ((input path) (out '())) + (define (get-dots-info) + "returns three values about the start of the current input string. +(values starts-with-slash? dots end-slash-index) +DOTS is the number of dot characters, including escaped ones. +If there are non-dot, non-slash characters too then DOTS will +instead be some meaningless number greater than two." + (let ((starts-with-slash (eqv? (string-ref input 0) #\/))) + (let dots-lp ((i (if starts-with-slash 1 0)) + (dots 0)) + (if (eqv? i (string-length input)) + (values starts-with-slash dots i) + (let ((c (string-ref input i))) + (cond + ((eqv? c #\/) + (values starts-with-slash dots i)) + ((string-prefix-ci? "%2E" input 0 3 i) + (dots-lp (+ i 3) (1+ dots))) + ((eqv? c #\.) + (dots-lp (1+ i) (1+ dots))) + (else + (dots-lp (1+ i) 3)))))))) + (if (string-null? input) + (apply string-append (reverse out)) + (call-with-values get-dots-info + (lambda (starts-with-slash? dots end-slash-pos) + (cond + ;; handle ../ ./ . .. + ((and (not starts-with-slash?) (or (eqv? dots 1) (eqv? dots 2= ))) + (lp (substring input (min (1+ end-slash-pos) + (string-length input))) out)) + ((and starts-with-slash? (eqv? dots 1)) + (lp + (if (eqv? end-slash-pos (string-length input)) + "/" ;; handle /. + (substring input end-slash-pos)) ;; handle /./ + out)) + ((and starts-with-slash? (eqv? dots 2)) + (lp + (if (eqv? end-slash-pos (string-length input)) + "/" ;; handle /.. + (substring input end-slash-pos)) ;; handle /../ + (if (null? out) out (cdr out)))) + (else + (lp + (substring input end-slash-pos) + (cons (substring input 0 end-slash-pos) out))))))))) + +(define (merge-paths base-has-authority? base relative) + "Return @samp{@var{base}/@var{relative}}, with the subtleties of +absolute paths explained in RFC3986, section 5.2.3. If the base URI has +an authority (userinfo, host, port), then the processing is a bit +different." + (if (and base-has-authority? + (equal? base "")) + (string-append "/" relative) + (let ((last-/ (string-rindex base #\/))) + (if last-/ + (string-append (substring base 0 last-/) "/" relative) + relative)))) + +(define* (resolve-relative-reference base relative #:key (strict? #t)) + "Resolve @var{relative} on top of @var{base}, as RFC3986, section +5.2. Both @var{relative} and @var{base} may be URI or relative +references. The name ``relative'' indicates the argument=E2=80=99s relati= onship +to @var{base}, not its type. Both @var{base} and @var{relative} may be +full URIs or relative references. The return value is a URI if either +@var{relative} or @var{base} is a URI. If @var{strict?} is true, the +default, the parser does not ignore the scheme in @var{relative} if it +is identical to the one in @var{base}." + (let ((b-scheme (uri-scheme base)) + (b-userinfo (uri-userinfo base)) + (b-host (uri-host base)) + (b-port (uri-port base)) + (b-path (uri-path base)) + (b-query (uri-query base)) + (b-fragment (uri-fragment base)) + (r-scheme (uri-scheme relative)) + (r-userinfo (uri-userinfo relative)) + (r-host (uri-host relative)) + (r-port (uri-port relative)) + (r-path (uri-path relative)) + (r-query (uri-query relative)) + (r-fragment (uri-fragment relative))) + (cond + ((or r-host (and r-scheme (or strict? (not (eq? r-scheme b-scheme))))) + (build-uri-reference + #:scheme (or r-scheme b-scheme) + #:userinfo r-userinfo + #:host r-host + #:port r-port + #:path (remove-dot-segments r-path) + #:query r-query + #:fragment r-fragment)) + ((string-null? r-path) + (build-uri-reference + #:scheme b-scheme + #:userinfo b-userinfo + #:host b-host + #:port b-port + #:path b-path + #:query (or r-query b-query) + #:fragment r-fragment)) + (else + (build-uri-reference + #:scheme b-scheme + #:userinfo b-userinfo + #:host b-host + #:port b-port + #:path + (remove-dot-segments + (if (string-prefix? "/" r-path) + r-path + (merge-paths b-host b-path r-path))) + #:query r-query + #:fragment r-fragment))))) diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test index 95fd82f16..b4d4b6cdb 100644 --- a/test-suite/tests/web-uri.test +++ b/test-suite/tests/web-uri.test @@ -693,3 +693,64 @@ (pass-if (equal? "foo%20bar" (uri-encode "foo bar"))) (pass-if (equal? "foo%0A%00bar" (uri-encode "foo\n\x00bar"))) (pass-if (equal? "%3C%3E%5C%5E" (uri-encode "<>\\^")))) + +(with-test-prefix "resolve relative reference" + ;; Test suite in RFC3986, section 5.4. + (let ((base (string->uri "http://a/b/c/d;p?q"))) + (let ((resolve + (lambda (relative) + (let* ((relative-uri + (string->uri-reference relative)) + (resolved-uri + (resolve-relative-reference base relative-uri)) + (resolved (uri->string resolved-uri))) + resolved)))) + (pass-if "remove-dot-segments unnecessary escaping" + (equal? (resolve "%2e%2E/.%2e/%2E./g%2e%2E%2Fh%2e") "http://a/g%2e= %2E%2Fh%2e")) + (with-test-prefix "normal" + (pass-if (equal? (resolve "g:h") "g:h")) + (pass-if (equal? (resolve "g") "http://a/b/c/g")) + (pass-if (equal? (resolve "./g") "http://a/b/c/g")) + (pass-if (equal? (resolve "g/") "http://a/b/c/g/")) + (pass-if (equal? (resolve "/g") "http://a/g")) + (pass-if (equal? (resolve "//g") "http://g")) + (pass-if (equal? (resolve "?y") "http://a/b/c/d;p?y")) + (pass-if (equal? (resolve "g?y") "http://a/b/c/g?y")) + (pass-if (equal? (resolve "#s") "http://a/b/c/d;p?q#s")) + (pass-if (equal? (resolve "g?y#s") "http://a/b/c/g?y#s")) + (pass-if (equal? (resolve ";x") "http://a/b/c/;x")) + (pass-if (equal? (resolve "g;x?y#s") "http://a/b/c/g;x?y#s")) + (pass-if (equal? (resolve "") "http://a/b/c/d;p?q")) + (pass-if (equal? (resolve ".") "http://a/b/c/")) + (pass-if (equal? (resolve "./") "http://a/b/c/")) + (pass-if (equal? (resolve "..") "http://a/b/")) + (pass-if (equal? (resolve "../") "http://a/b/")) + (pass-if (equal? (resolve "../g") "http://a/b/g")) + (pass-if (equal? (resolve "../..") "http://a/")) + (pass-if (equal? (resolve "../../") "http://a/")) + (pass-if (equal? (resolve "../../g") "http://a/g"))) + (with-test-prefix "abnormal" + (pass-if (equal? (resolve "../../../g") "http://a/g")) + (pass-if (equal? (resolve "../../../../g") "http://a/g")) + (pass-if (equal? (resolve "/./g") "http://a/g")) + (pass-if (equal? (resolve "/../g") "http://a/g")) + (pass-if (equal? (resolve "g.") "http://a/b/c/g.")) + (pass-if (equal? (resolve ".g") "http://a/b/c/.g")) + (pass-if (equal? (resolve "g..") "http://a/b/c/g..")) + (pass-if (equal? (resolve "..g") "http://a/b/c/..g")) + (pass-if (equal? (resolve "./../g") "http://a/b/g")) + (pass-if (equal? (resolve "./g/.") "http://a/b/c/g/")) + (pass-if (equal? (resolve "g/./h") "http://a/b/c/g/h")) + (pass-if (equal? (resolve "g/../h") "http://a/b/c/h")) + (pass-if (equal? (resolve "g;x=3D1/./y") "http://a/b/c/g;x=3D1/y")) + (pass-if (equal? (resolve "g;x=3D1/../y") "http://a/b/c/y")) + (pass-if (equal? (resolve "g?y/./x") "http://a/b/c/g?y/./x")) + (pass-if (equal? (resolve "g?y/../x") "http://a/b/c/g?y/../x")) + (pass-if (equal? (resolve "g#s/./x") "http://a/b/c/g#s/./x")) + (pass-if (equal? (resolve "g#s/../x") "http://a/b/c/g#s/../x")) + (pass-if (equal? (resolve "http:g") "http:g")) + (pass-if "nonstrict relative resolve" + (equal? (uri->string (resolve-relative-reference + base (string->uri-reference "http:g") + #:strict? #f)) + "http://a/b/c/g")))))) --=20 2.41.0 --=-=-=--