* bug#12827: [PATCH] Tweak web modules, support relative URIs [not found] <CAN3veRcJ4EMJ53vWSRG0HXfwdXbhUmdvUu8EuLfVV7abjZEt1Q@mail.gmail.com> @ 2013-02-24 10:45 ` Mark H Weaver [not found] ` <87vc9i6ld2.fsf@tines.lan> 1 sibling, 0 replies; 9+ messages in thread From: Mark H Weaver @ 2013-02-24 10:45 UTC (permalink / raw) To: Daniel Hartwig; +Cc: 12827, Ludovic Courtès, guile-devel Hi Daniel, Daniel Hartwig <mandyke@gmail.com> writes: > * Terminology > > The terminology used in latest URI spec. (RFC 3986) is not widely used > elsewhere. Not by Guile, not by the HTTP spec., or other sources. > Specifically, it defines these terms: > > - URI: scheme rest ... [fragment] > - Absolute-URI: scheme rest ... [fragment] > - Relative-Ref: rest ... [fragment] > - URI-Reference: Absolute-URI | Relative-Ref > > where as HTTP and other sources use the terms from an earlier URI > spec. (RFC 2396): > > - Absolute-URI: scheme rest ... [fragment] > - Relative-URI: rest ... [fragment] > - URI, URI-Reference: Absolute-URI | Relative-URI > > With this patch I have opted to use the later, more common terms. > This has the advantage of not requiring massive renaming or > duplicating of most procedures to include, e.g. > ‘uri-reference-scheme’, as we just use the term ‘uri’ to refer to > either type. > > If this is undesired, it can easily be reworked to use the terminology > from RFC 3986. Thanks for your careful work on this, and especially for calling our attention to the terminology changes introduced in the latest URI spec. My preference would be to use the newer RFC 3986 terms. To my mind, the key question is: which type (Absolute-URI or URI-Reference) is more commonly appropriate in user code, and thus more deserving of the short term "URI". I would argue that Absolute-URIs are more often appropriate in typical user code. The reason is that outside of URI-handling libraries, most code that deals with URIs simply use them as universal pointers, i.e. they implicitly assume that each URI is sufficient by itself to identify any resource in universe. Working with URI-References is inherently trickier and more error-prone, because code that handles them must do some additional bookkeeping to associate each URI-Reference with its _context_. It is inconvenient to mix URI-References from different contexts, and they must be converted when moved from one context to another. For typical code, the simplest and safest strategy for dealing with URI-References is to convert them to Absolute-URIs as early as possible, preferably as the document is being read. (Of course, there are special cases such as editors where it is important to preserve the URI-References, but that is not the typical case). Therefore, I think that Absolute-URI is more deserving of the short term "URI", and furthermore that existing code outside of (web uri) that refers to URIs should, by default, be assumed to be talking about Absolute-URIs. Only after some thought about whether a procedure handles relative references properly should its type be changed to accept URI-References. > * API compatability > > Presently, all APIs work only with absolute URIs. You can not use > string->uri or build-uri to produce any relative URIs, neither are > other procedures (generally) expected to work correctly if given them. For the reasons given above, I think that it is a virtue, not a flaw, i.e. I think that latest URI spec (RFC 3986) got this right. It is important to clearly distinguish Absolute-URIs from URI-References. Despite their overlapping syntax, they are very different concepts, and must not be conflated. Here's what I suggest: instead of extending 'string->uri' and 'build-uri' to produce relative URIs, rename those extended procedures 'string->uri-reference' and 'build-uri-reference'. These are long names, but that's okay because users should think twice before using them, and that's seldom needed. Then, we extend 'string->uri' and 'build-uri' in a different way: we extend them to handle relative references in their *inputs*, but continue to provide absolute *outputs*, by adding an optional keyword argument '#:base-uri'. This would make it easy to implement the simplest and safest strategy outlined above with a minimum of code changes. What do you think? Mark ^ permalink raw reply [flat|nested] 9+ messages in thread
[parent not found: <87vc9i6ld2.fsf@tines.lan>]
* bug#12827: [PATCH] Tweak web modules, support relative URIs [not found] ` <87vc9i6ld2.fsf@tines.lan> @ 2013-02-24 12:31 ` Daniel Hartwig 2013-02-24 19:55 ` Mark H Weaver 2013-03-13 11:05 ` Andy Wingo 0 siblings, 2 replies; 9+ messages in thread From: Daniel Hartwig @ 2013-02-24 12:31 UTC (permalink / raw) To: Mark H Weaver; +Cc: 12827, Ludovic Courtès, guile-devel On 24 February 2013 18:45, Mark H Weaver <mhw@netris.org> wrote: > Hi Daniel, > > Daniel Hartwig <mandyke@gmail.com> writes: >> * Terminology >> >> The terminology used in latest URI spec. (RFC 3986) is not widely used >> elsewhere. Not by Guile, not by the HTTP spec., or other sources. >> Specifically, it defines these terms: >> >> - URI: scheme rest ... [fragment] >> - Absolute-URI: scheme rest ... [fragment] >> - Relative-Ref: rest ... [fragment] >> - URI-Reference: Absolute-URI | Relative-Ref >> >> where as HTTP and other sources use the terms from an earlier URI >> spec. (RFC 2396): >> >> - Absolute-URI: scheme rest ... [fragment] >> - Relative-URI: rest ... [fragment] >> - URI, URI-Reference: Absolute-URI | Relative-URI >> > My preference would be to use the newer RFC 3986 terms. To my mind, the > key question is: which type (Absolute-URI or URI-Reference) is more > commonly appropriate in user code, and thus more deserving of the short > term "URI". > > I would argue that Absolute-URIs are more often appropriate in typical > user code. The reason is that outside of URI-handling libraries, most > code that deals with URIs simply use them as universal pointers, > i.e. they implicitly assume that each URI is sufficient by itself to > identify any resource in universe. Right. RFC 3986 makes a convincing argument for the new terminology. These notes about usage also reflect the sentiment in that document. FWIW, I sat mostly on the fence, finally going away from URI-Reference due to these concerns I expressed in an earlier email: > The API seems less clean, and it is not immediately clear > that uri? is not the top of the URI-like type hierarchy. The other > functions only indicate “uri” in their name. I did not > wish to introduce parallel “build-uri-reference”, etc. for each of > these, and did consider adding #:reference? on some to select > weaker validation. and looking at some other Scheme URI modules. However, having read over your comments I think that we could reasonably get away with just introducing the procedures you mention below and not bother about renaming (or duplicating) the field getters to ‘uri-reference-path’ etc.. > Here's what I suggest: instead of extending 'string->uri' and > 'build-uri' to produce relative URIs, rename those extended procedures > 'string->uri-reference' and 'build-uri-reference'. These are long > names, but that's okay because users should think twice before using > them, and that's seldom needed. In your proposed solution, ‘uri?’ and ‘uri-reference?’ are the predicates and they respond according to the RFC rather than internal Guile details? That is: (uri? (string->uri-reference "http://example.net/")) => #t (uri-reference? (string->uri-reference "http://example.net/")) => #t (uri? (string->uri-reference "foo")) => #f or …? > Then, we extend 'string->uri' and 'build-uri' in a different way: we > extend them to handle relative references in their *inputs*, but > continue to provide absolute *outputs*, by adding an optional keyword > argument '#:base-uri'. This would make it easy to implement the > simplest and safest strategy outlined above with a minimum of code > changes. This strategy does reflect the recommendation of RFC 3986 to resolve the references as they are read. Also an elegant API, as it encourages immedately resolving uri-references and never creating (or considering to create) the context-sensitive relative-refs. > > What do you think? > I quite like it, particularly the last part about #:base-uri. Ludo, I think this is basically what you were suggesting in the first place? :-) . ^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2013-02-24 12:31 ` Daniel Hartwig @ 2013-02-24 19:55 ` Mark H Weaver 2013-03-13 11:05 ` Andy Wingo 1 sibling, 0 replies; 9+ messages in thread From: Mark H Weaver @ 2013-02-24 19:55 UTC (permalink / raw) To: Daniel Hartwig; +Cc: 12827, Ludovic Courtès, guile-devel Daniel Hartwig <mandyke@gmail.com> writes: > On 24 February 2013 18:45, Mark H Weaver <mhw@netris.org> wrote: >> I would argue that Absolute-URIs are more often appropriate in typical >> user code. The reason is that outside of URI-handling libraries, most >> code that deals with URIs simply use them as universal pointers, >> i.e. they implicitly assume that each URI is sufficient by itself to >> identify any resource in universe. > > Right. RFC 3986 makes a convincing argument for the new terminology. > These notes about usage also reflect the sentiment in that document. > > FWIW, I sat mostly on the fence, finally going away from URI-Reference > due to these concerns I expressed in an earlier email: >> The API seems less clean, and it is not immediately clear >> that uri? is not the top of the URI-like type hierarchy. The other >> functions only indicate “uri” in their name. I did not >> wish to introduce parallel “build-uri-reference”, etc. for each of >> these, and did consider adding #:reference? on some to select >> weaker validation. > > and looking at some other Scheme URI modules. > > However, having read over your comments I think that we could > reasonably get away with just introducing the procedures you mention > below and not bother about renaming (or duplicating) the field getters > to ‘uri-reference-path’ etc.. Hmm. The cleanest solution would probably be to duplicate the field getters, and make the 'uri-*' variants (e.g. 'uri-path') raise an error when applied to a relative reference. However, it's probably not that important, so if you think it's better to simply extend 'uri-path' etc to apply to all URI-References, I'm okay with that. >> Here's what I suggest: instead of extending 'string->uri' and >> 'build-uri' to produce relative URIs, rename those extended procedures >> 'string->uri-reference' and 'build-uri-reference'. These are long >> names, but that's okay because users should think twice before using >> them, and that's seldom needed. > > In your proposed solution, ‘uri?’ and ‘uri-reference?’ are the > predicates and they respond according to the RFC rather than internal > Guile details? What do you mean by "rather than internal Guile details"? Here's how I like to think about these types: URI-Reference is at the top of the type hierarchy, and URI (a.k.a. Absolute-URI) and Relative-Ref are subtypes. Furthermore, every URI-Reference is either an Absolute-URI or a Relative-Ref. In other words, if you create a URI-Reference that happens to be absolute, then you'll end up with a URI, in the same sense that if you create a complex number whose imaginary part happens to be exact zero, you'll end up with a real number. > That is: > > (uri? (string->uri-reference "http://example.net/")) > => #t > (uri-reference? (string->uri-reference "http://example.net/")) > => #t > (uri? (string->uri-reference "foo")) > => #f Yes. >> Then, we extend 'string->uri' and 'build-uri' in a different way: we >> extend them to handle relative references in their *inputs*, but >> continue to provide absolute *outputs*, by adding an optional keyword >> argument '#:base-uri'. This would make it easy to implement the >> simplest and safest strategy outlined above with a minimum of code >> changes. > > This strategy does reflect the recommendation of RFC 3986 to resolve > the references as they are read. Also an elegant API, as it > encourages immedately resolving uri-references and never creating (or > considering to create) the context-sensitive relative-refs. > >> >> What do you think? >> > > I quite like it, particularly the last part about #:base-uri. > > Ludo, I think this is basically what you were suggesting in the first place? :-) Excellent! BTW, to be clear, I suggest that 'string->uri' and 'build-uri' should be guaranteed to produce Absolute-URIs. In other words, they should raise an error if not given enough information to produce an Absolute-URI. Does that make sense? Thanks again for your work on this :) Mark ^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2013-02-24 12:31 ` Daniel Hartwig 2013-02-24 19:55 ` Mark H Weaver @ 2013-03-13 11:05 ` Andy Wingo 2013-03-16 14:25 ` Daniel Hartwig 1 sibling, 1 reply; 9+ messages in thread From: Andy Wingo @ 2013-03-13 11:05 UTC (permalink / raw) To: Daniel Hartwig; +Cc: Ludovic Courtès, 12827 What's the status here, Daniel? Would be nice to fix this bug one way or another for 2.0.8. Cheers, Andy -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2013-03-13 11:05 ` Andy Wingo @ 2013-03-16 14:25 ` Daniel Hartwig 2013-03-20 10:20 ` Andy Wingo 2016-06-20 19:52 ` Andy Wingo 0 siblings, 2 replies; 9+ messages in thread From: Daniel Hartwig @ 2013-03-16 14:25 UTC (permalink / raw) To: Andy Wingo; +Cc: Ludovic Courtès, 12827 [-- Attachment #1: Type: text/plain, Size: 1181 bytes --] On 13 March 2013 19:05, Andy Wingo <wingo@pobox.com> wrote: > What's the status here, Daniel? Would be nice to fix this bug one way > or another for 2.0.8. Latest work attached, updated as per discussion with Mark. Still missing #:base-uri (RFC 3986 #5.2) and some polish. For the docs, I believe it best to follow the RFC and leave the existing section on URIs as-is, followed by a new section introducing the other types. This will help avoid conflating the two concepts of URI and URI-reference. Regarding the interface. There is now an abundance of constructors and string converters, one for each specific type. It is also somewhat inconsistent in that there is no need for multiple accessors or ‘uri*->string’ procedures. An alternative interface might employ a single constructor similar to ‘make-time’, using a set of variables/symbols to represent the desired type: build-uri-reference arg ... [#:type=‘uri’] string->uri-reference str [type] where TYPE is one of ‘uri’, ‘uri-reference’, ‘relative-ref’, ‘absolute-uri’. Perhaps even have a single ‘build-uri’ with these semantics. Comments, ideas? [-- Attachment #2: 0001-web-add-support-for-URI-reference.patch --] [-- Type: application/octet-stream, Size: 25390 bytes --] From 26655a2ae8a2864ea867ed5240eff5d0bb916a49 Mon Sep 17 00:00:00 2001 From: Daniel Hartwig <mandyke@gmail.com> Date: Sat, 16 Mar 2013 21:18:34 +0800 Subject: [PATCH] web: add support for URI-reference * doc/ref/web.texi (URIs): Fragments are properly part of a URI, so remove the incorrect note. * module/web/uri.scm (uri-reference?): New base type predicate. (uri?, relative-ref?, absolute-uri?): Specific predicates. (validate-uri-reference): Strict validation. (validate-uri, validate-relative-ref, validate-absolute-uri): Specific validators. (build-uri-reference, build-relative-ref, build-absolute-uri): New constructors. (string->uri*): Add `validate' argument. (string->uri, string->uri-reference, string->relative-ref): (string->absolute-uri): Specific constructors. * module/web/http.scm (parse-request-uri): Use `build-uri-reference', and result is a URI-reference, not URI, object. No longer infer an absent `uri-scheme' is `http'. (write-uri): Do not display an absent `uri-scheme', however, do display the scheme even when `uri-host' is absent. Add note to look at using `uri->string'. (declare-absolute-uri-header!): Update. Rename from `declare-uri-header!'. (declare-uri-reference-header!): Update. Rename from `declare-relative-uri-header!'. * test-suite/tests/web-uri.test ("build-uri-reference"): ("string->uri-reference"): Add. ("uri->string"): Also tests for relative-refs. * test-suite/tests/web-http.test ("read-request-line"): ("write-request-line"): Update for no scheme in some URIs. ("entity headers", "request headers"): Content-location and referer should also parse relative-URIs. ("response headers"): Location should not parse relative-URIs. * test-suite/tests/web-request.test ("example-1"): Expect URI-reference with no scheme. --- doc/ref/web.texi | 8 -- module/web/http.scm | 47 ++++++----- module/web/uri.scm | 158 ++++++++++++++++++++++++++++++++++--- test-suite/tests/web-http.test | 54 ++++++++----- test-suite/tests/web-request.test | 5 +- test-suite/tests/web-uri.test | 66 +++++++++++++++- 6 files changed, 275 insertions(+), 63 deletions(-) diff --git a/doc/ref/web.texi b/doc/ref/web.texi index 0d41f9f..476151b 100644 --- a/doc/ref/web.texi +++ b/doc/ref/web.texi @@ -190,14 +190,6 @@ since passwords do not belong in URIs, the RFC does not want to condone this practice, so it calls anything before the @code{@@} sign @dfn{userinfo}. -Properly speaking, a fragment is not part of a URI. For example, when a -web browser follows a link to @indicateurl{http://example.com/#foo}, it -sends a request for @indicateurl{http://example.com/}, then looks in the -resulting page for the fragment identified @code{foo} reference. A -fragment identifies a part of a resource, not the resource itself. But -it is useful to have a fragment field in the URI record itself, so we -hope you will forgive the inconsistency. - @example (use-modules (web uri)) @end example diff --git a/module/web/http.scm b/module/web/http.scm index b5202b6..5c250d9 100644 --- a/module/web/http.scm +++ b/module/web/http.scm @@ -1023,7 +1023,8 @@ symbol, like ‘GET’." (define* (parse-request-uri str #:optional (start 0) (end (string-length str))) "Parse a URI from an HTTP request line. Note that URIs in requests do -not have to have a scheme or host name. The result is a URI object." +not have to have a scheme or host name. The result is a URI-reference +object." (cond ((= start end) (bad-request "Missing Request-URI")) @@ -1033,10 +1034,10 @@ not have to have a scheme or host name. The result is a URI object." (let* ((q (string-index str #\? start end)) (f (string-index str #\# start end)) (q (and q (or (not f) (< q f)) q))) - (build-uri 'http - #:path (substring str start (or q f end)) - #:query (and q (substring str (1+ q) (or f end))) - #:fragment (and f (substring str (1+ f) end))))) + (build-uri-reference + #:path (substring str start (or q f end)) + #:query (and q (substring str (1+ q) (or f end))) + #:fragment (and f (substring str (1+ f) end))))) (else (or (string->uri (substring str start end)) (bad-request "Invalid URI: ~a" (substring str start end)))))) @@ -1053,11 +1054,17 @@ three values: the method, the URI, and the version." (parse-http-version line (1+ d1) (string-length line))) (bad-request "Bad Request-Line: ~s" line)))) +;; FIXME: The validation here should be reconsidered and moved to +;; individual header validators if they do not already covered. Then +;; this procedure should be using uri->string. (define (write-uri uri port) - (if (uri-host uri) + (if (uri-scheme uri) (begin (display (uri-scheme uri) port) - (display "://" port) + (display #\: port))) + (if (uri-host uri) + (begin + (display "//" port) (if (uri-userinfo uri) (begin (display (uri-userinfo uri) port) @@ -1171,20 +1178,22 @@ treated specially, and is just returned as a plain string." (declare-header! name parse-non-negative-integer non-negative-integer? display)) -;; emacs: (put 'declare-uri-header! 'scheme-indent-function 1) -(define (declare-uri-header! name) +;; emacs: (put 'declare-absolute-uri-header! 'scheme-indent-function 1) +(define (declare-absolute-uri-header! name) (declare-header! name - (lambda (str) (or (string->uri str) (bad-header-component 'uri str))) - (@@ (web uri) absolute-uri?) + (lambda (str) + (or (string->absolute-uri str) + (bad-header-component 'absolute-uri str))) + absolute-uri? write-uri)) -;; emacs: (put 'declare-relative-uri-header! 'scheme-indent-function 1) -(define (declare-relative-uri-header! name) +;; emacs: (put 'declare-uri-reference-header! 'scheme-indent-function 1) +(define (declare-uri-reference-header! name) (declare-header! name (lambda (str) - (or ((@@ (web uri) string->uri*) str) - (bad-header-component 'uri str))) - uri? + (or (string->uri-reference str) + (bad-header-component 'uri-reference str))) + uri-reference? write-uri)) ;; emacs: (put 'declare-quality-list-header! 'scheme-indent-function 1) @@ -1449,7 +1458,7 @@ treated specially, and is just returned as a plain string." ;; Content-Location = ( absoluteURI | relativeURI ) ;; -(declare-relative-uri-header! "Content-Location") +(declare-uri-reference-header! "Content-Location") ;; Content-MD5 = <base64 of 128 bit MD5 digest as per RFC 1864> ;; @@ -1752,7 +1761,7 @@ treated specially, and is just returned as a plain string." ;; Referer = ( absoluteURI | relativeURI ) ;; -(declare-relative-uri-header! "Referer") +(declare-uri-reference-header! "Referer") ;; TE = #( t-codings ) ;; t-codings = "trailers" | ( transfer-extension [ accept-params ] ) @@ -1789,7 +1798,7 @@ treated specially, and is just returned as a plain string." ;; Location = absoluteURI ;; -(declare-uri-header! "Location") +(declare-absolute-uri-header! "Location") ;; Proxy-Authenticate = 1#challenge ;; diff --git a/module/web/uri.scm b/module/web/uri.scm index 7fe0100..8a8e1d9 100644 --- a/module/web/uri.scm +++ b/module/web/uri.scm @@ -40,11 +40,15 @@ string->uri uri->string uri-decode uri-encode split-and-decode-uri-path - encode-and-join-uri-path)) + encode-and-join-uri-path + + uri-reference? relative-ref? absolute-uri? + build-uri-reference build-relative-ref build-absolute-uri + string->uri-reference string->relative-ref string->absolute-uri)) (define-record-type <uri> (make-uri scheme userinfo host port path query fragment) - uri? + uri-reference? (scheme uri-scheme) (userinfo uri-userinfo) (host uri-host) @@ -53,8 +57,51 @@ (query uri-query) (fragment uri-fragment)) +;;; +;;; Predicates. +;;; +;;; These are quick, and assume rigid validation at construction time. + +;;; RFC 3986, #3. +;;; +;;; URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] +;;; +;;; hier-part = "//" authority path-abempty +;;; / path-absolute +;;; / path-rootless +;;; / path-empty + +(define (uri? obj) + (and (uri-reference? obj) + (uri-scheme obj))) + +;;; RFC 3986, #4.2. +;;; +;;; relative-ref = relative-part [ "?" query ] [ "#" fragment ] +;;; +;;; relative-part = "//" authority path-abempty +;;; / path-absolute +;;; / path-noscheme +;;; / path-empty + +(define (relative-ref? obj) + (and (uri-reference? obj) + (not (uri-scheme obj)))) + +;;; RFC 3986, #4.3. +;;; +;;; absolute-URI = scheme ":" hier-part [ "?" query ] + (define (absolute-uri? obj) - (and (uri? obj) (uri-scheme obj) #t)) + (and (uri-reference? obj) + (uri-scheme obj) + (not (uri-fragment obj)))) + +\f +;;; +;;; Constructors. +;;; +;;; Disable validation at your own peril! (define (uri-error message . args) (throw 'uri-error message args)) @@ -62,9 +109,13 @@ (define (positive-exact-integer? port) (and (number? port) (exact? port) (integer? port) (positive? port))) -(define (validate-uri scheme userinfo host port path query fragment) +(define* (validate-uri-reference scheme userinfo host port path query fragment + #:key scheme? no-scheme? no-fragment? + (relative-part? (not scheme))) (cond - ((not (symbol? scheme)) + ((and scheme no-scheme?) + (uri-error "Expected no scheme: ~s" scheme)) + ((and (or scheme? scheme) (not (symbol? scheme))) (uri-error "Expected a symbol for the URI scheme: ~s" scheme)) ((and (or userinfo port) (not host)) (uri-error "Expected a host, given userinfo or port")) @@ -76,9 +127,45 @@ (uri-error "Expected string for userinfo: ~s" userinfo)) ((not (string? path)) (uri-error "Expected string for path: ~s" path)) - ((and host (not (string-null? path)) - (not (eqv? (string-ref path 0) #\/))) - (uri-error "Expected path of absolute URI to start with a /: ~a" path)))) + ((and query (not (string? query))) + (uri-error "Expected string for query: ~s" query)) + ((and fragment no-fragment?) + (uri-error "Expected no fragment: ~s" fragment)) + ((and fragment (not (string? fragment))) + (uri-error "Expected string for fragment: ~s" fragment)) + ;; Strict validation of allowed paths, based on other components. + ;; Refer to RFC 3986 for the details. + ((not (string-null? path)) + (if host + (cond + ((not (eqv? (string-ref path 0) #\/)) + (uri-error + "Expected absolute path starting with \"/\": ~a" path))) + (cond + ((string-prefix? "//" path) + (uri-error + "Expected path not starting with \"//\" (no host): ~a" path)) + ((and relative-part? + (not (eqv? (string-ref path 0) #\/)) + (let ((colon (string-index path #\:))) + (and colon (not (string-index path #\/ 0 colon))))) + (uri-error + "Expected relative path's first segment without \":\": ~a" + path))))))) + +(define (validate-uri scheme userinfo host port path query fragment) + (validate-uri-reference scheme userinfo host port path query fragment + #:scheme? #t)) + +(define (validate-relative-ref scheme userinfo host port path query fragment) + (validate-uri-reference scheme userinfo host port path query fragment + #:no-scheme? #t + #:relative-part? #t)) + +(define (validate-absolute-uri scheme userinfo host port path query fragment) + (validate-uri-reference scheme userinfo host port path query fragment + #:scheme? #t + #:no-fragment? #t)) (define* (build-uri scheme #:key userinfo host port (path "") query fragment (validate? #t)) @@ -91,6 +178,38 @@ is valid." (validate-uri scheme userinfo host port path query fragment)) (make-uri scheme userinfo host port path query fragment)) +(define* (build-uri-reference #:key scheme userinfo host port + (path "") query fragment + (validate? #t)) + "Construct a URI-reference object. Fields are the same as for +‘build-uri’ except that SCHEME may also be ‘#f’." + (if validate? + (validate-uri-reference scheme userinfo host port path query fragment)) + (make-uri scheme userinfo host port path query fragment)) + +(define* (build-relative-ref #:key userinfo host port + (path "") query fragment + (validate? #t)) + "Construct an absolute-URI object. Fields are the same as for +‘build-uri’ except there is no scheme." + (if validate? + (validate-relative-ref #f userinfo host port path query fragment)) + (make-uri #f userinfo host port path query fragment)) + +(define* (build-absolute-uri #:key scheme userinfo host port + (path "") query + (validate? #t)) + "Construct an absolute-URI object. Fields are the same as for +‘build-uri’ except there is no fragment." + (if validate? + (validate-absolute-uri scheme userinfo host port path query #f)) + (make-uri scheme userinfo host port path query #f)) + +\f +;;; +;;; Converters. +;;; + ;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC ;; 3490), and non-ASCII host names. ;; @@ -173,9 +292,7 @@ is valid." (define uri-regexp (make-regexp uri-pat)) -(define (string->uri* string) - "Parse STRING into a URI object. Return ‘#f’ if the string -could not be parsed." +(define (string->uri* string validate) (% (let ((m (regexp-exec uri-regexp string))) (if (not m) (abort)) (let ((scheme (let ((str (match:substring m 2))) @@ -190,6 +307,7 @@ could not be parsed." (parse-authority authority abort) (values #f #f #f))) (lambda (userinfo host port) + (validate scheme userinfo host port path query fragment) (make-uri scheme userinfo host port path query fragment))))) (lambda (k) #f))) @@ -197,8 +315,22 @@ could not be parsed." (define (string->uri string) "Parse STRING into a URI object. Return ‘#f’ if the string could not be parsed." - (let ((uri (string->uri* string))) - (and uri (uri-scheme uri) uri))) + (string->uri* string validate-uri)) + +(define (string->uri-reference string) + "Parse STRING into a URI-reference object. Return ‘#f’ if the string +could not be parsed." + (string->uri* string validate-uri-reference)) + +(define (string->relative-ref string) + "Parse STRING into a relative-ref object. Return ‘#f’ if the string +could not be parsed." + (string->uri* string validate-relative-ref)) + +(define (string->absolute-uri string) + "Parse STRING into an absolute-URI object. Return ‘#f’ if the string +could not be parsed." + (string->uri* string validate-absolute-uri)) (define *default-ports* (make-hash-table)) diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test index 2913724..b836926 100644 --- a/test-suite/tests/web-http.test +++ b/test-suite/tests/web-http.test @@ -1,6 +1,6 @@ ;;;; web-uri.test --- URI library -*- mode: scheme; coding: utf-8; -*- ;;;; -;;;; Copyright (C) 2010, 2011 Free Software Foundation, Inc. +;;;; Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc. ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -132,32 +132,33 @@ (with-test-prefix "read-request-line" (pass-if-read-request-line "GET / HTTP/1.1" GET - (build-uri 'http - #:path "/") + (build-uri-reference + #:path "/") (1 . 1)) (pass-if-read-request-line "GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1" GET - (build-uri 'http - #:host "www.w3.org" - #:path "/pub/WWW/TheProject.html") + (build-uri-reference + #:scheme 'http + #:host "www.w3.org" + #:path "/pub/WWW/TheProject.html") (1 . 1)) (pass-if-read-request-line "GET /pub/WWW/TheProject.html HTTP/1.1" GET - (build-uri 'http - #:path "/pub/WWW/TheProject.html") + (build-uri-reference + #:path "/pub/WWW/TheProject.html") (1 . 1)) (pass-if-read-request-line "HEAD /etc/hosts?foo=bar HTTP/1.1" HEAD - (build-uri 'http - #:path "/etc/hosts" - #:query "foo=bar") + (build-uri-reference + #:path "/etc/hosts" + #:query "foo=bar") (1 . 1))) (with-test-prefix "write-request-line" (pass-if-write-request-line "GET / HTTP/1.1" GET - (build-uri 'http - #:path "/") + (build-uri-reference + #:path "/") (1 . 1)) ;;; FIXME: Test fails due to scheme, host always being removed. ;;; However, it should be supported to request these be present, and @@ -170,14 +171,14 @@ ;; (1 . 1)) (pass-if-write-request-line "GET /pub/WWW/TheProject.html HTTP/1.1" GET - (build-uri 'http - #:path "/pub/WWW/TheProject.html") + (build-uri-reference + #:path "/pub/WWW/TheProject.html") (1 . 1)) (pass-if-write-request-line "HEAD /etc/hosts?foo=bar HTTP/1.1" HEAD - (build-uri 'http - #:path "/etc/hosts" - #:query "foo=bar") + (build-uri-reference + #:path "/etc/hosts" + #:query "foo=bar") (1 . 1))) (with-test-prefix "read-response-line" @@ -252,6 +253,12 @@ (pass-if-parse content-length "010" 10) (pass-if-parse content-location "http://foo/" (build-uri 'http #:host "foo" #:path "/")) + (pass-if-parse content-location "//foo/" + (build-uri-reference #:host "foo" #:path "/")) + (pass-if-parse content-location "/etc/foo" + (build-uri-reference #:path "/etc/foo")) + (pass-if-parse content-location "foo" + (build-uri-reference #:path "foo")) (pass-if-parse content-range "bytes 10-20/*" '(bytes (10 . 20) *)) (pass-if-parse content-range "bytes */*" '(bytes * *)) (pass-if-parse content-range "bytes */30" '(bytes * 30)) @@ -319,6 +326,14 @@ (pass-if-parse range "bytes=-20,-30" '(bytes (#f . 20) (#f . 30))) (pass-if-parse referer "http://foo/bar?baz" (build-uri 'http #:host "foo" #:path "/bar" #:query "baz")) + (pass-if-parse referer "//foo/bar?baz" + (build-uri-reference #:host "foo" + #:path "/bar" + #:query "baz")) + (pass-if-parse referer "/etc/foo" + (build-uri-reference #:path "/etc/foo")) + (pass-if-parse referer "foo" + (build-uri-reference #:path "foo")) (pass-if-parse te "trailers" '((trailers))) (pass-if-parse te "trailers,foo" '((trailers) (foo))) (pass-if-parse user-agent "guile" "guile")) @@ -333,6 +348,9 @@ (pass-if-parse etag "W/\"foo\"" '("foo" . #f)) (pass-if-parse location "http://other-place" (build-uri 'http #:host "other-place")) + (pass-if-any-error location "//other-place") + (pass-if-any-error location "/etc/foo") + (pass-if-any-error location "foo") (pass-if-parse proxy-authenticate "Basic realm=\"guile\"" '((basic (realm . "guile")))) (pass-if-parse retry-after "Tue, 15 Nov 1994 08:12:31 GMT" diff --git a/test-suite/tests/web-request.test b/test-suite/tests/web-request.test index 8cf1c2e..68721d3 100644 --- a/test-suite/tests/web-request.test +++ b/test-suite/tests/web-request.test @@ -1,6 +1,6 @@ ;;;; web-request.test --- HTTP requests -*- mode: scheme; coding: utf-8; -*- ;;;; -;;;; Copyright (C) 2010, 2011 Free Software Foundation, Inc. +;;;; Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc. ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -53,7 +53,8 @@ Accept-Language: en-gb, en;q=0.9\r (pass-if (equal? (request-method r) 'GET)) - (pass-if (equal? (request-uri r) (build-uri 'http #:path "/qux"))) + (pass-if (equal? (request-uri r) + (build-uri-reference #:path "/qux"))) (pass-if (equal? (read-request-body r) #f)) diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test index 3f6e7e3..21d8044 100644 --- a/test-suite/tests/web-uri.test +++ b/test-suite/tests/web-uri.test @@ -1,6 +1,6 @@ ;;;; web-uri.test --- URI library -*- mode: scheme; coding: utf-8; -*- ;;;; -;;;; Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc. +;;;; Copyright (C) 2010, 2011, 2012, 2013 Free Software Foundation, Inc. ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public @@ -27,7 +27,7 @@ (define* (uri=? uri #:key scheme userinfo host port path query fragment) - (and (uri? uri) + (and (uri-reference? uri) (equal? (uri-scheme uri) scheme) (equal? (uri-userinfo uri) userinfo) (equal? (uri-host uri) host) @@ -123,6 +123,22 @@ "Expected.*host" (build-uri 'http #:userinfo "foo"))) +(with-test-prefix "build-uri-reference" + (pass-if "//host/etc/foo" + (uri=? (build-uri-reference #:host "host" + #:path "/etc/foo") + #:host "host" + #:path "/etc/foo")) + + (pass-if "/path/to/some/foo?query" + (uri=? (build-uri-reference #:path "/path/to/some/foo" + #:query "query") + #:path "/path/to/some/foo" + #:query "query")) + + (pass-if "nextdoc/foo" + (uri=? (build-uri-reference #:path "nextdoc/foo") + #:path "nextdoc/foo"))) (with-test-prefix "string->uri" (pass-if "ftp:" @@ -212,6 +228,30 @@ #:scheme 'file #:path "/etc/hosts"))) +(with-test-prefix "string->uri-reference" + (pass-if "/" + (uri=? (string->uri-reference "/") + #:path "/")) + + (pass-if "/path/to/foo" + (uri=? (string->uri-reference "/path/to/foo") + #:path "/path/to/foo")) + + (pass-if "//example.org" + (uri=? (string->uri-reference "//example.org") + #:host "example.org" + #:path "")) + + (pass-if "//bar@example.org/path/to/foo" + (uri=? (string->uri-reference "//bar@example.org/path/to/foo") + #:userinfo "bar" + #:host "example.org" + #:path "/path/to/foo")) + + (pass-if "nextdoc/foo" + (uri=? (string->uri-reference "nextdoc/foo") + #:path "nextdoc/foo"))) + (with-test-prefix "uri->string" (pass-if "ftp:" (equal? "ftp:" @@ -248,7 +288,27 @@ (pass-if "http://foo:/" (equal? "http://foo/" - (uri->string (string->uri "http://foo:/"))))) + (uri->string (string->uri "http://foo:/")))) + + (pass-if "/" + (equal? "/" + (uri->string (string->uri-reference "/")))) + + (pass-if "/path/to/foo" + (equal? "/path/to/foo" + (uri->string (string->uri-reference "/path/to/foo")))) + + (pass-if "//example.org" + (equal? "//example.org" + (uri->string (string->uri-reference "//example.org")))) + + (pass-if "//bar@example.org/path/to/foo" + (equal? "//bar@example.org/path/to/foo" + (uri->string (string->uri-reference "//bar@example.org/path/to/foo")))) + + (pass-if "nextdoc/foo" + (equal? "nextdoc/foo" + (uri->string (string->uri-reference "nextdoc/foo"))))) (with-test-prefix "decode" (pass-if "foo%20bar" -- 1.7.10.4 ^ permalink raw reply related [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2013-03-16 14:25 ` Daniel Hartwig @ 2013-03-20 10:20 ` Andy Wingo 2016-06-20 19:52 ` Andy Wingo 1 sibling, 0 replies; 9+ messages in thread From: Andy Wingo @ 2013-03-20 10:20 UTC (permalink / raw) To: Daniel Hartwig; +Cc: Ludovic Courtès, 12827 On Sat 16 Mar 2013 15:25, Daniel Hartwig <mandyke@gmail.com> writes: > On 13 March 2013 19:05, Andy Wingo <wingo@pobox.com> wrote: >> What's the status here, Daniel? Would be nice to fix this bug one way >> or another for 2.0.8. > > Latest work attached, updated as per discussion with Mark. Looks good but needs more docs for me to understand it, I think. A -- http://wingolog.org/ ^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2013-03-16 14:25 ` Daniel Hartwig 2013-03-20 10:20 ` Andy Wingo @ 2016-06-20 19:52 ` Andy Wingo 2016-06-21 13:22 ` Ludovic Courtès 1 sibling, 1 reply; 9+ messages in thread From: Andy Wingo @ 2016-06-20 19:52 UTC (permalink / raw) To: Daniel Hartwig; +Cc: 12827, Ludovic Courtès I would like to apply this patch, to master at least. Any objections? We need documentation for the new exports, is the only missing thing. Andy On Sat 16 Mar 2013 15:25, Daniel Hartwig <mandyke@gmail.com> writes: > On 13 March 2013 19:05, Andy Wingo <wingo@pobox.com> wrote: >> What's the status here, Daniel? Would be nice to fix this bug one way >> or another for 2.0.8. > > Latest work attached, updated as per discussion with Mark. > > Still missing #:base-uri (RFC 3986 #5.2) and some polish. > > For the docs, I believe it best to follow the RFC and leave the > existing section on URIs as-is, followed by a new section introducing > the other types. This will help avoid conflating the two concepts of > URI and URI-reference. > > Regarding the interface. There is now an abundance of constructors > and string converters, one for each specific type. It is also > somewhat inconsistent in that there is no need for multiple accessors > or âuri*->stringâ procedures. An alternative interface might employ a > single constructor similar to âmake-timeâ, using a set of > variables/symbols to represent the desired type: > > build-uri-reference arg ... [#:type=âuriâ] > string->uri-reference str [type] > > where TYPE is one of âuriâ, âuri-referenceâ, ârelative-refâ, > âabsolute-uriâ. Perhaps even have a single âbuild-uriâ with these > semantics. > > Comments, ideas? > > From 26655a2ae8a2864ea867ed5240eff5d0bb916a49 Mon Sep 17 00:00:00 2001 > From: Daniel Hartwig <mandyke@gmail.com> > Date: Sat, 16 Mar 2013 21:18:34 +0800 > Subject: [PATCH] web: add support for URI-reference > > * doc/ref/web.texi (URIs): Fragments are properly part of a URI, so > remove the incorrect note. > > * module/web/uri.scm (uri-reference?): New base type predicate. > (uri?, relative-ref?, absolute-uri?): Specific predicates. > > (validate-uri-reference): Strict validation. > (validate-uri, validate-relative-ref, validate-absolute-uri): > Specific validators. > > (build-uri-reference, build-relative-ref, build-absolute-uri): > New constructors. > > (string->uri*): Add `validate' argument. > (string->uri, string->uri-reference, string->relative-ref): > (string->absolute-uri): Specific constructors. > > * module/web/http.scm (parse-request-uri): Use `build-uri-reference', > and result is a URI-reference, not URI, object. No longer infer an > absent `uri-scheme' is `http'. > > (write-uri): Do not display an absent `uri-scheme', however, do > display the scheme even when `uri-host' is absent. Add note to look > at using `uri->string'. > > (declare-absolute-uri-header!): Update. Rename from > `declare-uri-header!'. > > (declare-uri-reference-header!): Update. Rename from > `declare-relative-uri-header!'. > > * test-suite/tests/web-uri.test ("build-uri-reference"): > ("string->uri-reference"): Add. > > ("uri->string"): Also tests for relative-refs. > > * test-suite/tests/web-http.test ("read-request-line"): > ("write-request-line"): Update for no scheme in some URIs. > > ("entity headers", "request headers"): Content-location and referer > should also parse relative-URIs. > ("response headers"): Location should not parse relative-URIs. > > * test-suite/tests/web-request.test ("example-1"): Expect URI-reference > with no scheme. > --- > doc/ref/web.texi | 8 -- > module/web/http.scm | 47 ++++++----- > module/web/uri.scm | 158 ++++++++++++++++++++++++++++++++++--- > test-suite/tests/web-http.test | 54 ++++++++----- > test-suite/tests/web-request.test | 5 +- > test-suite/tests/web-uri.test | 66 +++++++++++++++- > 6 files changed, 275 insertions(+), 63 deletions(-) > > diff --git a/doc/ref/web.texi b/doc/ref/web.texi > index 0d41f9f..476151b 100644 > --- a/doc/ref/web.texi > +++ b/doc/ref/web.texi > @@ -190,14 +190,6 @@ since passwords do not belong in URIs, the RFC does not want to condone > this practice, so it calls anything before the @code{@@} sign > @dfn{userinfo}. > > -Properly speaking, a fragment is not part of a URI. For example, when a > -web browser follows a link to @indicateurl{http://example.com/#foo}, it > -sends a request for @indicateurl{http://example.com/}, then looks in the > -resulting page for the fragment identified @code{foo} reference. A > -fragment identifies a part of a resource, not the resource itself. But > -it is useful to have a fragment field in the URI record itself, so we > -hope you will forgive the inconsistency. > - > @example > (use-modules (web uri)) > @end example > diff --git a/module/web/http.scm b/module/web/http.scm > index b5202b6..5c250d9 100644 > --- a/module/web/http.scm > +++ b/module/web/http.scm > @@ -1023,7 +1023,8 @@ symbol, like GET." > > (define* (parse-request-uri str #:optional (start 0) (end (string-length str))) > "Parse a URI from an HTTP request line. Note that URIs in requests do > -not have to have a scheme or host name. The result is a URI object." > +not have to have a scheme or host name. The result is a URI-reference > +object." > (cond > ((= start end) > (bad-request "Missing Request-URI")) > @@ -1033,10 +1034,10 @@ not have to have a scheme or host name. The result is a URI object." > (let* ((q (string-index str #\? start end)) > (f (string-index str #\# start end)) > (q (and q (or (not f) (< q f)) q))) > - (build-uri 'http > - #:path (substring str start (or q f end)) > - #:query (and q (substring str (1+ q) (or f end))) > - #:fragment (and f (substring str (1+ f) end))))) > + (build-uri-reference > + #:path (substring str start (or q f end)) > + #:query (and q (substring str (1+ q) (or f end))) > + #:fragment (and f (substring str (1+ f) end))))) > (else > (or (string->uri (substring str start end)) > (bad-request "Invalid URI: ~a" (substring str start end)))))) > @@ -1053,11 +1054,17 @@ three values: the method, the URI, and the version." > (parse-http-version line (1+ d1) (string-length line))) > (bad-request "Bad Request-Line: ~s" line)))) > > +;; FIXME: The validation here should be reconsidered and moved to > +;; individual header validators if they do not already covered. Then > +;; this procedure should be using uri->string. > (define (write-uri uri port) > - (if (uri-host uri) > + (if (uri-scheme uri) > (begin > (display (uri-scheme uri) port) > - (display "://" port) > + (display #\: port))) > + (if (uri-host uri) > + (begin > + (display "//" port) > (if (uri-userinfo uri) > (begin > (display (uri-userinfo uri) port) > @@ -1171,20 +1178,22 @@ treated specially, and is just returned as a plain string." > (declare-header! name > parse-non-negative-integer non-negative-integer? display)) > > -;; emacs: (put 'declare-uri-header! 'scheme-indent-function 1) > -(define (declare-uri-header! name) > +;; emacs: (put 'declare-absolute-uri-header! 'scheme-indent-function 1) > +(define (declare-absolute-uri-header! name) > (declare-header! name > - (lambda (str) (or (string->uri str) (bad-header-component 'uri str))) > - (@@ (web uri) absolute-uri?) > + (lambda (str) > + (or (string->absolute-uri str) > + (bad-header-component 'absolute-uri str))) > + absolute-uri? > write-uri)) > > -;; emacs: (put 'declare-relative-uri-header! 'scheme-indent-function 1) > -(define (declare-relative-uri-header! name) > +;; emacs: (put 'declare-uri-reference-header! 'scheme-indent-function 1) > +(define (declare-uri-reference-header! name) > (declare-header! name > (lambda (str) > - (or ((@@ (web uri) string->uri*) str) > - (bad-header-component 'uri str))) > - uri? > + (or (string->uri-reference str) > + (bad-header-component 'uri-reference str))) > + uri-reference? > write-uri)) > > ;; emacs: (put 'declare-quality-list-header! 'scheme-indent-function 1) > @@ -1449,7 +1458,7 @@ treated specially, and is just returned as a plain string." > > ;; Content-Location = ( absoluteURI | relativeURI ) > ;; > -(declare-relative-uri-header! "Content-Location") > +(declare-uri-reference-header! "Content-Location") > > ;; Content-MD5 = <base64 of 128 bit MD5 digest as per RFC 1864> > ;; > @@ -1752,7 +1761,7 @@ treated specially, and is just returned as a plain string." > > ;; Referer = ( absoluteURI | relativeURI ) > ;; > -(declare-relative-uri-header! "Referer") > +(declare-uri-reference-header! "Referer") > > ;; TE = #( t-codings ) > ;; t-codings = "trailers" | ( transfer-extension [ accept-params ] ) > @@ -1789,7 +1798,7 @@ treated specially, and is just returned as a plain string." > > ;; Location = absoluteURI > ;; > -(declare-uri-header! "Location") > +(declare-absolute-uri-header! "Location") > > ;; Proxy-Authenticate = 1#challenge > ;; > diff --git a/module/web/uri.scm b/module/web/uri.scm > index 7fe0100..8a8e1d9 100644 > --- a/module/web/uri.scm > +++ b/module/web/uri.scm > @@ -40,11 +40,15 @@ > string->uri uri->string > uri-decode uri-encode > split-and-decode-uri-path > - encode-and-join-uri-path)) > + encode-and-join-uri-path > + > + uri-reference? relative-ref? absolute-uri? > + build-uri-reference build-relative-ref build-absolute-uri > + string->uri-reference string->relative-ref string->absolute-uri)) > > (define-record-type <uri> > (make-uri scheme userinfo host port path query fragment) > - uri? > + uri-reference? > (scheme uri-scheme) > (userinfo uri-userinfo) > (host uri-host) > @@ -53,8 +57,51 @@ > (query uri-query) > (fragment uri-fragment)) > > +;;; > +;;; Predicates. > +;;; > +;;; These are quick, and assume rigid validation at construction time. > + > +;;; RFC 3986, #3. > +;;; > +;;; URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] > +;;; > +;;; hier-part = "//" authority path-abempty > +;;; / path-absolute > +;;; / path-rootless > +;;; / path-empty > + > +(define (uri? obj) > + (and (uri-reference? obj) > + (uri-scheme obj))) > + > +;;; RFC 3986, #4.2. > +;;; > +;;; relative-ref = relative-part [ "?" query ] [ "#" fragment ] > +;;; > +;;; relative-part = "//" authority path-abempty > +;;; / path-absolute > +;;; / path-noscheme > +;;; / path-empty > + > +(define (relative-ref? obj) > + (and (uri-reference? obj) > + (not (uri-scheme obj)))) > + > +;;; RFC 3986, #4.3. > +;;; > +;;; absolute-URI = scheme ":" hier-part [ "?" query ] > + > (define (absolute-uri? obj) > - (and (uri? obj) (uri-scheme obj) #t)) > + (and (uri-reference? obj) > + (uri-scheme obj) > + (not (uri-fragment obj)))) > + > +\f > +;;; > +;;; Constructors. > +;;; > +;;; Disable validation at your own peril! > > (define (uri-error message . args) > (throw 'uri-error message args)) > @@ -62,9 +109,13 @@ > (define (positive-exact-integer? port) > (and (number? port) (exact? port) (integer? port) (positive? port))) > > -(define (validate-uri scheme userinfo host port path query fragment) > +(define* (validate-uri-reference scheme userinfo host port path query fragment > + #:key scheme? no-scheme? no-fragment? > + (relative-part? (not scheme))) > (cond > - ((not (symbol? scheme)) > + ((and scheme no-scheme?) > + (uri-error "Expected no scheme: ~s" scheme)) > + ((and (or scheme? scheme) (not (symbol? scheme))) > (uri-error "Expected a symbol for the URI scheme: ~s" scheme)) > ((and (or userinfo port) (not host)) > (uri-error "Expected a host, given userinfo or port")) > @@ -76,9 +127,45 @@ > (uri-error "Expected string for userinfo: ~s" userinfo)) > ((not (string? path)) > (uri-error "Expected string for path: ~s" path)) > - ((and host (not (string-null? path)) > - (not (eqv? (string-ref path 0) #\/))) > - (uri-error "Expected path of absolute URI to start with a /: ~a" path)))) > + ((and query (not (string? query))) > + (uri-error "Expected string for query: ~s" query)) > + ((and fragment no-fragment?) > + (uri-error "Expected no fragment: ~s" fragment)) > + ((and fragment (not (string? fragment))) > + (uri-error "Expected string for fragment: ~s" fragment)) > + ;; Strict validation of allowed paths, based on other components. > + ;; Refer to RFC 3986 for the details. > + ((not (string-null? path)) > + (if host > + (cond > + ((not (eqv? (string-ref path 0) #\/)) > + (uri-error > + "Expected absolute path starting with \"/\": ~a" path))) > + (cond > + ((string-prefix? "//" path) > + (uri-error > + "Expected path not starting with \"//\" (no host): ~a" path)) > + ((and relative-part? > + (not (eqv? (string-ref path 0) #\/)) > + (let ((colon (string-index path #\:))) > + (and colon (not (string-index path #\/ 0 colon))))) > + (uri-error > + "Expected relative path's first segment without \":\": ~a" > + path))))))) > + > +(define (validate-uri scheme userinfo host port path query fragment) > + (validate-uri-reference scheme userinfo host port path query fragment > + #:scheme? #t)) > + > +(define (validate-relative-ref scheme userinfo host port path query fragment) > + (validate-uri-reference scheme userinfo host port path query fragment > + #:no-scheme? #t > + #:relative-part? #t)) > + > +(define (validate-absolute-uri scheme userinfo host port path query fragment) > + (validate-uri-reference scheme userinfo host port path query fragment > + #:scheme? #t > + #:no-fragment? #t)) > > (define* (build-uri scheme #:key userinfo host port (path "") query fragment > (validate? #t)) > @@ -91,6 +178,38 @@ is valid." > (validate-uri scheme userinfo host port path query fragment)) > (make-uri scheme userinfo host port path query fragment)) > > +(define* (build-uri-reference #:key scheme userinfo host port > + (path "") query fragment > + (validate? #t)) > + "Construct a URI-reference object. Fields are the same as for > +build-uri except that SCHEME may also be #f." > + (if validate? > + (validate-uri-reference scheme userinfo host port path query fragment)) > + (make-uri scheme userinfo host port path query fragment)) > + > +(define* (build-relative-ref #:key userinfo host port > + (path "") query fragment > + (validate? #t)) > + "Construct an absolute-URI object. Fields are the same as for > +build-uri except there is no scheme." > + (if validate? > + (validate-relative-ref #f userinfo host port path query fragment)) > + (make-uri #f userinfo host port path query fragment)) > + > +(define* (build-absolute-uri #:key scheme userinfo host port > + (path "") query > + (validate? #t)) > + "Construct an absolute-URI object. Fields are the same as for > +build-uri except there is no fragment." > + (if validate? > + (validate-absolute-uri scheme userinfo host port path query #f)) > + (make-uri scheme userinfo host port path query #f)) > + > +\f > +;;; > +;;; Converters. > +;;; > + > ;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC > ;; 3490), and non-ASCII host names. > ;; > @@ -173,9 +292,7 @@ is valid." > (define uri-regexp > (make-regexp uri-pat)) > > -(define (string->uri* string) > - "Parse STRING into a URI object. Return #f if the string > -could not be parsed." > +(define (string->uri* string validate) > (% (let ((m (regexp-exec uri-regexp string))) > (if (not m) (abort)) > (let ((scheme (let ((str (match:substring m 2))) > @@ -190,6 +307,7 @@ could not be parsed." > (parse-authority authority abort) > (values #f #f #f))) > (lambda (userinfo host port) > + (validate scheme userinfo host port path query fragment) > (make-uri scheme userinfo host port path query fragment))))) > (lambda (k) > #f))) > @@ -197,8 +315,22 @@ could not be parsed." > (define (string->uri string) > "Parse STRING into a URI object. Return #f if the string > could not be parsed." > - (let ((uri (string->uri* string))) > - (and uri (uri-scheme uri) uri))) > + (string->uri* string validate-uri)) > + > +(define (string->uri-reference string) > + "Parse STRING into a URI-reference object. Return #f if the string > +could not be parsed." > + (string->uri* string validate-uri-reference)) > + > +(define (string->relative-ref string) > + "Parse STRING into a relative-ref object. Return #f if the string > +could not be parsed." > + (string->uri* string validate-relative-ref)) > + > +(define (string->absolute-uri string) > + "Parse STRING into an absolute-URI object. Return #f if the string > +could not be parsed." > + (string->uri* string validate-absolute-uri)) > > (define *default-ports* (make-hash-table)) > > diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test > index 2913724..b836926 100644 > --- a/test-suite/tests/web-http.test > +++ b/test-suite/tests/web-http.test > @@ -1,6 +1,6 @@ > ;;;; web-uri.test --- URI library -*- mode: scheme; coding: utf-8; -*- > ;;;; > -;;;; Copyright (C) 2010, 2011 Free Software Foundation, Inc. > +;;;; Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc. > ;;;; > ;;;; This library is free software; you can redistribute it and/or > ;;;; modify it under the terms of the GNU Lesser General Public > @@ -132,32 +132,33 @@ > (with-test-prefix "read-request-line" > (pass-if-read-request-line "GET / HTTP/1.1" > GET > - (build-uri 'http > - #:path "/") > + (build-uri-reference > + #:path "/") > (1 . 1)) > (pass-if-read-request-line "GET http://www.w3.org/pub/WWW/TheProject.html HTTP/1.1" > GET > - (build-uri 'http > - #:host "www.w3.org" > - #:path "/pub/WWW/TheProject.html") > + (build-uri-reference > + #:scheme 'http > + #:host "www.w3.org" > + #:path "/pub/WWW/TheProject.html") > (1 . 1)) > (pass-if-read-request-line "GET /pub/WWW/TheProject.html HTTP/1.1" > GET > - (build-uri 'http > - #:path "/pub/WWW/TheProject.html") > + (build-uri-reference > + #:path "/pub/WWW/TheProject.html") > (1 . 1)) > (pass-if-read-request-line "HEAD /etc/hosts?foo=bar HTTP/1.1" > HEAD > - (build-uri 'http > - #:path "/etc/hosts" > - #:query "foo=bar") > + (build-uri-reference > + #:path "/etc/hosts" > + #:query "foo=bar") > (1 . 1))) > > (with-test-prefix "write-request-line" > (pass-if-write-request-line "GET / HTTP/1.1" > GET > - (build-uri 'http > - #:path "/") > + (build-uri-reference > + #:path "/") > (1 . 1)) > ;;; FIXME: Test fails due to scheme, host always being removed. > ;;; However, it should be supported to request these be present, and > @@ -170,14 +171,14 @@ > ;; (1 . 1)) > (pass-if-write-request-line "GET /pub/WWW/TheProject.html HTTP/1.1" > GET > - (build-uri 'http > - #:path "/pub/WWW/TheProject.html") > + (build-uri-reference > + #:path "/pub/WWW/TheProject.html") > (1 . 1)) > (pass-if-write-request-line "HEAD /etc/hosts?foo=bar HTTP/1.1" > HEAD > - (build-uri 'http > - #:path "/etc/hosts" > - #:query "foo=bar") > + (build-uri-reference > + #:path "/etc/hosts" > + #:query "foo=bar") > (1 . 1))) > > (with-test-prefix "read-response-line" > @@ -252,6 +253,12 @@ > (pass-if-parse content-length "010" 10) > (pass-if-parse content-location "http://foo/" > (build-uri 'http #:host "foo" #:path "/")) > + (pass-if-parse content-location "//foo/" > + (build-uri-reference #:host "foo" #:path "/")) > + (pass-if-parse content-location "/etc/foo" > + (build-uri-reference #:path "/etc/foo")) > + (pass-if-parse content-location "foo" > + (build-uri-reference #:path "foo")) > (pass-if-parse content-range "bytes 10-20/*" '(bytes (10 . 20) *)) > (pass-if-parse content-range "bytes */*" '(bytes * *)) > (pass-if-parse content-range "bytes */30" '(bytes * 30)) > @@ -319,6 +326,14 @@ > (pass-if-parse range "bytes=-20,-30" '(bytes (#f . 20) (#f . 30))) > (pass-if-parse referer "http://foo/bar?baz" > (build-uri 'http #:host "foo" #:path "/bar" #:query "baz")) > + (pass-if-parse referer "//foo/bar?baz" > + (build-uri-reference #:host "foo" > + #:path "/bar" > + #:query "baz")) > + (pass-if-parse referer "/etc/foo" > + (build-uri-reference #:path "/etc/foo")) > + (pass-if-parse referer "foo" > + (build-uri-reference #:path "foo")) > (pass-if-parse te "trailers" '((trailers))) > (pass-if-parse te "trailers,foo" '((trailers) (foo))) > (pass-if-parse user-agent "guile" "guile")) > @@ -333,6 +348,9 @@ > (pass-if-parse etag "W/\"foo\"" '("foo" . #f)) > (pass-if-parse location "http://other-place" > (build-uri 'http #:host "other-place")) > + (pass-if-any-error location "//other-place") > + (pass-if-any-error location "/etc/foo") > + (pass-if-any-error location "foo") > (pass-if-parse proxy-authenticate "Basic realm=\"guile\"" > '((basic (realm . "guile")))) > (pass-if-parse retry-after "Tue, 15 Nov 1994 08:12:31 GMT" > diff --git a/test-suite/tests/web-request.test b/test-suite/tests/web-request.test > index 8cf1c2e..68721d3 100644 > --- a/test-suite/tests/web-request.test > +++ b/test-suite/tests/web-request.test > @@ -1,6 +1,6 @@ > ;;;; web-request.test --- HTTP requests -*- mode: scheme; coding: utf-8; -*- > ;;;; > -;;;; Copyright (C) 2010, 2011 Free Software Foundation, Inc. > +;;;; Copyright (C) 2010, 2011, 2013 Free Software Foundation, Inc. > ;;;; > ;;;; This library is free software; you can redistribute it and/or > ;;;; modify it under the terms of the GNU Lesser General Public > @@ -53,7 +53,8 @@ Accept-Language: en-gb, en;q=0.9\r > > (pass-if (equal? (request-method r) 'GET)) > > - (pass-if (equal? (request-uri r) (build-uri 'http #:path "/qux"))) > + (pass-if (equal? (request-uri r) > + (build-uri-reference #:path "/qux"))) > > (pass-if (equal? (read-request-body r) #f)) > > diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test > index 3f6e7e3..21d8044 100644 > --- a/test-suite/tests/web-uri.test > +++ b/test-suite/tests/web-uri.test > @@ -1,6 +1,6 @@ > ;;;; web-uri.test --- URI library -*- mode: scheme; coding: utf-8; -*- > ;;;; > -;;;; Copyright (C) 2010, 2011, 2012 Free Software Foundation, Inc. > +;;;; Copyright (C) 2010, 2011, 2012, 2013 Free Software Foundation, Inc. > ;;;; > ;;;; This library is free software; you can redistribute it and/or > ;;;; modify it under the terms of the GNU Lesser General Public > @@ -27,7 +27,7 @@ > > > (define* (uri=? uri #:key scheme userinfo host port path query fragment) > - (and (uri? uri) > + (and (uri-reference? uri) > (equal? (uri-scheme uri) scheme) > (equal? (uri-userinfo uri) userinfo) > (equal? (uri-host uri) host) > @@ -123,6 +123,22 @@ > "Expected.*host" > (build-uri 'http #:userinfo "foo"))) > > +(with-test-prefix "build-uri-reference" > + (pass-if "//host/etc/foo" > + (uri=? (build-uri-reference #:host "host" > + #:path "/etc/foo") > + #:host "host" > + #:path "/etc/foo")) > + > + (pass-if "/path/to/some/foo?query" > + (uri=? (build-uri-reference #:path "/path/to/some/foo" > + #:query "query") > + #:path "/path/to/some/foo" > + #:query "query")) > + > + (pass-if "nextdoc/foo" > + (uri=? (build-uri-reference #:path "nextdoc/foo") > + #:path "nextdoc/foo"))) > > (with-test-prefix "string->uri" > (pass-if "ftp:" > @@ -212,6 +228,30 @@ > #:scheme 'file > #:path "/etc/hosts"))) > > +(with-test-prefix "string->uri-reference" > + (pass-if "/" > + (uri=? (string->uri-reference "/") > + #:path "/")) > + > + (pass-if "/path/to/foo" > + (uri=? (string->uri-reference "/path/to/foo") > + #:path "/path/to/foo")) > + > + (pass-if "//example.org" > + (uri=? (string->uri-reference "//example.org") > + #:host "example.org" > + #:path "")) > + > + (pass-if "//bar@example.org/path/to/foo" > + (uri=? (string->uri-reference "//bar@example.org/path/to/foo") > + #:userinfo "bar" > + #:host "example.org" > + #:path "/path/to/foo")) > + > + (pass-if "nextdoc/foo" > + (uri=? (string->uri-reference "nextdoc/foo") > + #:path "nextdoc/foo"))) > + > (with-test-prefix "uri->string" > (pass-if "ftp:" > (equal? "ftp:" > @@ -248,7 +288,27 @@ > > (pass-if "http://foo:/" > (equal? "http://foo/" > - (uri->string (string->uri "http://foo:/"))))) > + (uri->string (string->uri "http://foo:/")))) > + > + (pass-if "/" > + (equal? "/" > + (uri->string (string->uri-reference "/")))) > + > + (pass-if "/path/to/foo" > + (equal? "/path/to/foo" > + (uri->string (string->uri-reference "/path/to/foo")))) > + > + (pass-if "//example.org" > + (equal? "//example.org" > + (uri->string (string->uri-reference "//example.org")))) > + > + (pass-if "//bar@example.org/path/to/foo" > + (equal? "//bar@example.org/path/to/foo" > + (uri->string (string->uri-reference "//bar@example.org/path/to/foo")))) > + > + (pass-if "nextdoc/foo" > + (equal? "nextdoc/foo" > + (uri->string (string->uri-reference "nextdoc/foo"))))) > > (with-test-prefix "decode" > (pass-if "foo%20bar" ^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2016-06-20 19:52 ` Andy Wingo @ 2016-06-21 13:22 ` Ludovic Courtès 2017-05-21 12:05 ` Andy Wingo 0 siblings, 1 reply; 9+ messages in thread From: Ludovic Courtès @ 2016-06-21 13:22 UTC (permalink / raw) To: Andy Wingo; +Cc: 12827, Daniel Hartwig Andy Wingo <wingo@pobox.com> skribis: > I would like to apply this patch, to master at least. Any objections? > We need documentation for the new exports, is the only missing thing. On a quick glance that looks good. My only concern would be incompatibilities; for instance, the ‘content-location’ can now be an object that doesn’t pass ‘uri?’, IIUC. Not sure how much of a problem that is. Ludo’. ^ permalink raw reply [flat|nested] 9+ messages in thread
* bug#12827: [PATCH] Tweak web modules, support relative URIs 2016-06-21 13:22 ` Ludovic Courtès @ 2017-05-21 12:05 ` Andy Wingo 0 siblings, 0 replies; 9+ messages in thread From: Andy Wingo @ 2017-05-21 12:05 UTC (permalink / raw) To: Ludovic Courtès; +Cc: 12827-done, Daniel Hartwig On Tue 21 Jun 2016 15:22, ludo@gnu.org (Ludovic Courtès) writes: > Andy Wingo <wingo@pobox.com> skribis: > >> I would like to apply this patch, to master at least. Any objections? >> We need documentation for the new exports, is the only missing thing. > > On a quick glance that looks good. My only concern would be > incompatibilities; for instance, the ‘content-location’ can now be an > object that doesn’t pass ‘uri?’, IIUC. Not sure how much of a problem > that is. I have applied this patch with some modifications and added docs. Notably, the behavior of uri? is unchanged relative to master (though there's a deprecation note; see NEWS) and string->uri keeps its old behavior of not throwing exceptions. I also removed the "absolute-uri?" stuff because it was unused and it seems silly to reserve that useful name in that way -- RFC 3986 defines "absolute uri" simply as being a URI without a fragment. Weird definition. Thank you for the patch, Daniel, and thanks to Ludo and Mark for working through this bug. Andy ^ permalink raw reply [flat|nested] 9+ messages in thread
end of thread, other threads:[~2017-05-21 12:05 UTC | newest] Thread overview: 9+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- [not found] <CAN3veRcJ4EMJ53vWSRG0HXfwdXbhUmdvUu8EuLfVV7abjZEt1Q@mail.gmail.com> 2013-02-24 10:45 ` bug#12827: [PATCH] Tweak web modules, support relative URIs Mark H Weaver [not found] ` <87vc9i6ld2.fsf@tines.lan> 2013-02-24 12:31 ` Daniel Hartwig 2013-02-24 19:55 ` Mark H Weaver 2013-03-13 11:05 ` Andy Wingo 2013-03-16 14:25 ` Daniel Hartwig 2013-03-20 10:20 ` Andy Wingo 2016-06-20 19:52 ` Andy Wingo 2016-06-21 13:22 ` Ludovic Courtès 2017-05-21 12:05 ` Andy Wingo
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).