unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
From: Daniel Hartwig <mandyke@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: 12827@debbugs.gnu.org
Subject: bug#12827: [2.0.6] web client: fails to parse 404 header
Date: Sat, 24 Nov 2012 19:23:04 +0800	[thread overview]
Message-ID: <CAN3veRfzRovVKAxRcTJnPd2+hc-MpDPz3J1HAgc_L0Kmd3h89w@mail.gmail.com> (raw)
In-Reply-To: <87ip8wezdd.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 686 bytes --]

On 24 November 2012 06:19, Ludovic Courtès <ludo@gnu.org> wrote:
> Any update on that?  The plan is to release 2.0.7 next week, so it’d be
> great if this could be in.

I have made a first attempt at the doc strings and manual.  This
involved first syncronizing the two, as only the manual had been
receiving updates.

Some more tweaking to the code.

Personally I am not 100% on this, but I attach it for comment anyway.
I will not be able to work on it again for a short while.

A quick solution may be to silently introduce just enough to fix the
current bug, and worry about the extra predicates, uri-record-type vs.
rfc-definition-of-uri, etc. later.

Regards

[-- Attachment #2: 0001-syncronize-web-module-docstrings-with-manual.patch --]
[-- Type: application/octet-stream, Size: 21066 bytes --]

From fcc01b345b93d7a75980d7607684e4a5b3243daa Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Sat, 24 Nov 2012 14:10:12 +0800
Subject: [PATCH 1/3] syncronize web module docstrings with manual

* doc/ref/web.texi: Fix spacing.  Update with a few missing function
  descriptions.

* module/web/client.scm:
* module/web/http.scm:
* module/web/request.scm:
* module/web/server.scm:
* module/web/uri.scm: Update docstrings from manual.
---
 doc/ref/web.texi       |   35 ++++++++++++++---------------
 module/web/client.scm  |    9 ++++++++
 module/web/http.scm    |   41 ++++++++++++++++------------------
 module/web/request.scm |   14 ++++++++----
 module/web/server.scm  |   10 ++++-----
 module/web/uri.scm     |   57 +++++++++++++++++++++++++++++-------------------
 6 files changed, 96 insertions(+), 70 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 161a28d..e6e594e 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -431,8 +431,8 @@ from @code{header-writer}.
 @end deffn
 
 @deffn {Scheme Procedure} read-headers port
-Read the headers of an HTTP message from @var{port}, returning the
-headers as an ordered alist.
+Read the headers of an HTTP message from @var{port}, returning them
+as an ordered alist.
 @end deffn
 
 @deffn {Scheme Procedure} write-headers headers port
@@ -1368,6 +1368,7 @@ Return the given response header, or @var{default} if none was present.
 the lower-level HTTP, request, and response modules.
 
 @deffn {Scheme Procedure} open-socket-for-uri uri
+Return an open input/output port for a connection to URI.
 @end deffn
 
 @deffn {Scheme Procedure} http-get uri [#:port=(open-socket-for-uri uri)] [#:version='(1 . 1)] [#:keep-alive?=#f] [#:extra-headers='()] [#:decode-body?=#t]
@@ -1470,17 +1471,17 @@ the server socket.
 
 A user may define a server implementation with the following form:
 
-@deffn {Scheme Procedure} define-server-impl name open read write close
+@deffn {Scheme Syntax} define-server-impl name open read write close
 Make a @code{<server-impl>} object with the hooks @var{open},
 @var{read}, @var{write}, and @var{close}, and bind it to the symbol
 @var{name} in the current module.
 @end deffn
 
 @deffn {Scheme Procedure} lookup-server-impl impl
-Look up a server implementation. If @var{impl} is a server
-implementation already, it is returned directly. If it is a symbol, the
+Look up a server implementation.  If @var{impl} is a server
+implementation already, it is returned directly.  If it is a symbol, the
 binding named @var{impl} in the @code{(web server @var{impl})} module is
-looked up. Otherwise an error is signaled.
+looked up.  Otherwise an error is signaled.
 
 Currently a server implementation is a somewhat opaque type, useful only
 for passing to other procedures in this module, like @code{read-client}.
@@ -1494,7 +1495,7 @@ any access to the impl objects.
 
 @deffn {Scheme Procedure} open-server impl open-params
 Open a server for the given implementation.  Return one value, the new
-server object. The implementation's @code{open} procedure is applied to
+server object.  The implementation's @code{open} procedure is applied to
 @var{open-params}, which should be a list.
 @end deffn
 
@@ -1502,7 +1503,7 @@ server object. The implementation's @code{open} procedure is applied to
 Read a new client from @var{server}, by applying the implementation's
 @code{read} procedure to the server.  If successful, return three
 values: an object corresponding to the client, a request object, and the
-request body. If any exception occurs, return @code{#f} for all three
+request body.  If any exception occurs, return @code{#f} for all three
 values.
 @end deffn
 
@@ -1513,9 +1514,9 @@ The response and response body are produced by calling the given
 @var{handler} with @var{request} and @var{body} as arguments.
 
 The elements of @var{state} are also passed to @var{handler} as
-arguments, and may be returned as additional values. The new
+arguments, and may be returned as additional values.  The new
 @var{state}, collected from the @var{handler}'s return values, is then
-returned as a list. The idea is that a server loop receives a handler
+returned as a list.  The idea is that a server loop receives a handler
 from the user, along with whatever state values the user is interested
 in, allowing the user's handler to explicitly manage its state.
 @end deffn
@@ -1526,20 +1527,20 @@ given request.
 
 As a convenience to web handler authors, @var{response} may be given as
 an alist of headers, in which case it is used to construct a default
-response. Ensures that the response version corresponds to the request
-version. If @var{body} is a string, encodes the string to a bytevector,
-in an encoding appropriate for @var{response}. Adds a
+response.  Ensures that the response version corresponds to the request
+version.  If @var{body} is a string, encodes the string to a bytevector,
+in an encoding appropriate for @var{response}.  Adds a
 @code{content-length} and @code{content-type} header, as necessary.
 
 If @var{body} is a procedure, it is called with a port as an argument,
-and the output collected as a bytevector. In the future we might try to
+and the output collected as a bytevector.  In the future we might try to
 instead use a compressing, chunk-encoded port, and call this procedure
-later, in the write-client procedure. Authors are advised not to rely on
+later, in the write-client procedure.  Authors are advised not to rely on
 the procedure being called at any particular time.
 @end deffn
 
 @deffn {Scheme Procedure} write-client impl server client response body
-Write an HTTP response and body to @var{client}. If the server and
+Write an HTTP response and body to @var{client}.  If the server and
 client support persistent connections, it is the implementation's
 responsibility to keep track of the client thereafter, presumably by
 attaching it to the @var{server} argument somehow.
@@ -1572,7 +1573,7 @@ before sending back to the client.
 
 Additional arguments to @var{handler} are taken from @var{state}.
 Additional return values are accumulated into a new @var{state}, which
-will be used for subsequent requests. In this way a handler can
+will be used for subsequent requests.  In this way a handler can
 explicitly manage its state.
 @end deffn
 
diff --git a/module/web/client.scm b/module/web/client.scm
index cf7ea53..0991373 100644
--- a/module/web/client.scm
+++ b/module/web/client.scm
@@ -115,6 +115,15 @@
 (define* (http-get uri #:key (port (open-socket-for-uri uri))
                    (version '(1 . 1)) (keep-alive? #f) (extra-headers '())
                    (decode-body? #t))
+  "Connect to the server corresponding to @var{uri} and ask for the
+resource, using the @code{GET} method.  If you already have a port open,
+pass it as @var{port}.  The port will be closed at the end of the
+request unless @var{keep-alive?} is true.  Any extra headers in the
+alist @var{extra-headers} will be added to the request.
+
+If @var{decode-body?} is true, as is the default, the body of the
+response will be decoded to string, if it is a textual content-type.
+Otherwise it will be returned as a bytevector."
   (let ((req (build-request uri #:version version
                             #:headers (if keep-alive?
                                           extra-headers
diff --git a/module/web/http.scm b/module/web/http.scm
index cc5dd5a..3b78d08 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -100,12 +100,7 @@
                           validator
                           writer
                           #:key multiple?)
-  "Define a parser, validator, and writer for the HTTP header, @var{name}.
-
-@var{parser} should be a procedure that takes a string and returns a
-Scheme value.  @var{validator} is a predicate for whether the given
-Scheme value is valid for this header.  @var{writer} takes a value and a
-port, and writes the value to the port."
+  "Declare a parser, validator, and writer for a given header."
   (if (and (string? name) parser validator writer)
       (let ((decl (make-header-decl name parser validator writer multiple?)))
         (hashq-set! *declared-headers* (string->header name) decl)
@@ -120,27 +115,33 @@ port, and writes the value to the port."
         (string-titlecase (symbol->string sym)))))
 
 (define (known-header? sym)
-  "Return @code{#t} if there are parsers and writers registered for this
-header, otherwise @code{#f}."
+  "Return @code{#t} iff @var{sym} is a known header, with associated
+parsers and serialization procedures."
   (and (lookup-header-decl sym) #t))
 
 (define (header-parser sym)
-  "Returns a procedure to parse values for the given header."
+  "Return the value parser for headers named @var{sym}.  The result is a
+procedure that takes one argument, a string, and returns the parsed
+value.  If the header isn't known to Guile, a default parser is returned
+that passes through the string unchanged."
   (let ((decl (lookup-header-decl sym)))
     (if decl
         (header-decl-parser decl)
         (lambda (x) x))))
 
 (define (header-validator sym)
-  "Returns a procedure to validate values for the given header."
+  "Return a predicate which returns @code{#t} if the given value is valid
+for headers named @var{sym}.  The default validator for unknown headers
+is @code{string?}."
   (let ((decl (lookup-header-decl sym)))
     (if decl
         (header-decl-validator decl)
         string?)))
 
 (define (header-writer sym)
-  "Returns a procedure to write values for the given header to a given
-port."
+  "Return a procedure that writes values for headers named @var{sym} to a
+port.  The resulting procedure takes two arguments: a value and a port.
+The default writer is @code{display}."
   (let ((decl (lookup-header-decl sym)))
     (if decl
         (header-decl-writer decl)
@@ -196,10 +197,7 @@ body was reached (i.e., a blank line)."
 
 (define (parse-header sym val)
   "Parse @var{val}, a string, with the parser registered for the header
-named @var{sym}.
-
-Returns the parsed value.  If a parser was not found, the value is
-returned as a string."
+named @var{sym}.  Returns the parsed value."
   ((header-parser sym) val))
 
 (define (valid-header? sym val)
@@ -210,17 +208,16 @@ header with name @var{sym}."
       (error "header name not a symbol" sym)))
 
 (define (write-header sym val port)
-  "Writes the given header name and value to @var{port}.  If @var{sym}
-is a known header, uses the specific writer registered for that header.
-Otherwise the value is written using @code{display}."
+  "Write the given header name and value to @var{port}, using the writer
+from @code{header-writer}."
   (display (header->string sym) port)
   (display ": " port)
   ((header-writer sym) val port)
   (display "\r\n" port))
 
 (define (read-headers port)
-  "Read an HTTP message from @var{port}, returning the headers as an
-ordered alist."
+  "Read the headers of an HTTP message from @var{port}, returning them
+as an ordered alist."
   (let lp ((headers '()))
     (call-with-values (lambda () (read-header port))
       (lambda (k v)
@@ -230,7 +227,7 @@ ordered alist."
 
 (define (write-headers headers port)
   "Write the given header alist to @var{port}.  Doesn't write the final
-\\r\\n, as the user might want to add another header."
+@samp{\\r\\n}, as the user might want to add another header."
   (let lp ((headers headers))
     (if (pair? headers)
         (begin
diff --git a/module/web/request.scm b/module/web/request.scm
index 40d4a66..51ef473 100644
--- a/module/web/request.scm
+++ b/module/web/request.scm
@@ -195,7 +195,11 @@ metadata, @var{meta}.
 As a side effect, sets the encoding on @var{port} to
 ISO-8859-1 (latin-1), so that reading one character reads one byte.  See
 the discussion of character sets in \"HTTP Requests\" in the manual, for
-more information."
+more information.
+
+Note that the body is not part of the request.  Once you have read a
+request, you may read the body separately, and likewise for writing
+requests."
   (set-port-encoding! port "ISO-8859-1")
   (call-with-values (lambda () (read-request-line port))
     (lambda (method uri version)
@@ -205,7 +209,7 @@ more information."
 (define (write-request r port)
   "Write the given HTTP request to @var{port}.
 
-Returns a new request, whose @code{request-port} will continue writing
+Return a new request, whose @code{request-port} will continue writing
 on @var{port}, perhaps using some transfer encoding."
   (write-request-line (request-method r) (request-uri r)
                       (request-version r) port)
@@ -217,8 +221,8 @@ on @var{port}, perhaps using some transfer encoding."
                     (request-headers r) (request-meta r) port)))
 
 (define (read-request-body r)
-  "Reads the request body from @var{r}, as a bytevector.  Returns
-@code{#f} if there was no request body."
+  "Reads the request body from @var{r}, as a bytevector.  Return @code{#f}
+if there was no request body."
   (let ((nbytes (request-content-length r)))
     (and nbytes
          (let ((bv (get-bytevector-n (request-port r) nbytes)))
@@ -297,6 +301,8 @@ request @var{r}."
 
 ;; Misc accessors
 (define* (request-absolute-uri r #:optional default-host default-port)
+  "A helper routine to determine the absolute URI of a request, using the
+@code{host} header and the default host and port."
   (let ((uri (request-uri r)))
     (if (uri-host uri)
         uri
diff --git a/module/web/server.scm b/module/web/server.scm
index 42887e6..ed88329 100644
--- a/module/web/server.scm
+++ b/module/web/server.scm
@@ -143,7 +143,7 @@ for passing to other procedures in this module, like
 
 ;; -> server
 (define (open-server impl open-params)
-  "Open a server for the given implementation.  Returns one value, the
+  "Open a server for the given implementation.  Return one value, the
 new server object.  The implementation's @code{open} procedure is
 applied to @var{open-params}, which should be a list."
   (apply (server-impl-open impl) open-params))
@@ -151,9 +151,9 @@ applied to @var{open-params}, which should be a list."
 ;; -> (client request body | #f #f #f)
 (define (read-client impl server)
   "Read a new client from @var{server}, by applying the implementation's
-@code{read} procedure to the server.  If successful, returns three
+@code{read} procedure to the server.  If successful, return three
 values: an object corresponding to the client, a request object, and the
-request body.  If any exception occurs, returns @code{#f} for all three
+request body.  If any exception occurs, return @code{#f} for all three
 values."
   (call-with-error-handling
    (lambda ()
@@ -364,7 +364,7 @@ attaching it to the @var{server} argument somehow."
 ;; -> new-state
 (define (serve-one-client handler impl server state)
   "Read one request from @var{server}, call @var{handler} on the request
-and body, and write the response to the client.  Returns the new state
+and body, and write the response to the client.  Return the new state
 produced by the handler procedure."
   (debug-elapsed 'serve-again)
   (call-with-values
@@ -404,7 +404,7 @@ The response and body will be run through @code{sanitize-response}
 before sending back to the client.
 
 Additional arguments to @var{handler} are taken from
-@var{state}. Additional return values are accumulated into a new
+@var{state}.  Additional return values are accumulated into a new
 @var{state}, which will be used for subsequent requests.  In this way a
 handler can explicitly manage its state.
 
diff --git a/module/web/uri.scm b/module/web/uri.scm
index 78614a5..ddab7be 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -79,8 +79,10 @@
 
 (define* (build-uri scheme #:key userinfo host port (path "") query fragment
                     (validate? #t))
-  "Construct a URI object. If @var{validate?} is true, also run some
-consistency checks to make sure that the constructed URI is valid."
+  "Construct a URI object.  @var{scheme} should be a symbol, and the rest
+of the fields are either strings or @code{#f}.  If @var{validate?} is
+true, also run some consistency checks to make sure that the constructed
+URI is valid."
   (if validate?
       (validate-uri scheme userinfo host port path query fragment))
   (make-uri scheme userinfo host port path query fragment))
@@ -168,7 +170,7 @@ consistency checks to make sure that the constructed URI is valid."
   (make-regexp uri-pat))
 
 (define (string->uri string)
-  "Parse @var{string} into a URI object. Returns @code{#f} if the string
+  "Parse @var{string} into a URI object.  Return @code{#f} if the string
 could not be parsed."
   (% (let ((m (regexp-exec uri-regexp string)))
        (if (not m) (abort))
@@ -191,10 +193,7 @@ could not be parsed."
 (define *default-ports* (make-hash-table))
 
 (define (declare-default-port! scheme port)
-  "Declare a default port for the given URI scheme.
-
-Default ports are for printing URI objects: a default port is not
-printed."
+  "Declare a default port for the given URI scheme."
   (hashq-set! *default-ports* scheme port))
 
 (define (default-port? scheme port)
@@ -205,7 +204,9 @@ printed."
 (declare-default-port! 'https 443)
 
 (define (uri->string uri)
-  "Serialize @var{uri} to a string."
+  "Serialize @var{uri} to a string.  If the URI has a port that is the
+default port for its scheme, the port is not included in the
+serialization."
   (let* ((scheme-str (string-append
                       (symbol->string (uri-scheme uri)) ":"))
          (userinfo (uri-userinfo uri))
@@ -293,18 +294,24 @@ printed."
   (string->char-set "0123456789abcdefABCDEF"))
 
 (define* (uri-decode str #:key (encoding "utf-8"))
-  "Percent-decode the given @var{str}, according to @var{encoding}.
+  "Percent-decode the given @var{str}, according to @var{encoding},
+which should be the name of a character encoding.
 
 Note that this function should not generally be applied to a full URI
 string. For paths, use split-and-decode-uri-path instead. For query
 strings, split the query on @code{&} and @code{=} boundaries, and decode
 the components separately.
 
-Note that percent-encoded strings encode @emph{bytes}, not characters.
-There is no guarantee that a given byte sequence is a valid string
-encoding. Therefore this routine may signal an error if the decoded
-bytes are not valid for the given encoding. Pass @code{#f} for
-@var{encoding} if you want decoded bytes as a bytevector directly."
+Note also that percent-encoded strings encode @emph{bytes}, not
+characters.  There is no guarantee that a given byte sequence is a valid
+string encoding. Therefore this routine may signal an error if the
+decoded bytes are not valid for the given encoding. Pass @code{#f} for
+@var{encoding} if you want decoded bytes as a bytevector directly.
+@xref{Ports, @code{set-port-encoding!}}, for more information on
+character encodings.
+
+Returns a string of the decoded characters, or a bytevector if
+@var{encoding} was @code{#f}."
   (let* ((len (string-length str))
          (bv
           (call-with-output-bytevector*
@@ -358,10 +365,13 @@ bytes are not valid for the given encoding. Pass @code{#f} for
 ;;
 (define* (uri-encode str #:key (encoding "utf-8")
                      (unescaped-chars unreserved-chars))
-  "Percent-encode any character not in the character set, @var{unescaped-chars}.
+  "Percent-encode any character not in the character set,
+@var{unescaped-chars}.
 
-Percent-encoding first writes out the given character to a bytevector
-within the given @var{encoding}, then encodes each byte as
+The default character set includes alphanumerics from ASCII, as well as
+the special characters @samp{-}, @samp{.}, @samp{_}, and @samp{~}.  Any
+other character will be percent-encoded, by writing out the character to
+a bytevector within the given @var{encoding}, then encoding each byte as
 @code{%@var{HH}}, where @var{HH} is the hexadecimal representation of
 the byte."
   (define (needs-escaped? ch)
@@ -387,15 +397,18 @@ the byte."
       str))
 
 (define (split-and-decode-uri-path path)
-  "Split @var{path} into its components, and decode each
-component, removing empty components.
+  "Split @var{path} into its components, and decode each component,
+removing empty components.
 
-For example, @code{\"/foo/bar/\"} decodes to the two-element list,
-@code{(\"foo\" \"bar\")}."
+For example, @code{\"/foo/bar%20baz/\"} decodes to the two-element list,
+@code{(\"foo\" \"bar baz\")}."
   (filter (lambda (x) (not (string-null? x)))
           (map uri-decode (string-split path #\/))))
 
 (define (encode-and-join-uri-path parts)
   "URI-encode each element of @var{parts}, which should be a list of
-strings, and join the parts together with @code{/} as a delimiter."
+strings, and join the parts together with @code{/} as a delimiter.
+
+For example, the list @code{(\"scrambled eggs\" \"biscuits&gravy\")}
+encodes as @code{\"scrambled%20eggs/biscuits%26gravy\"}."
   (string-join (map uri-encode parts) "/"))
-- 
1.7.10.4


[-- Attachment #3: 0002-web-uri-document-that-uri-port-is-an-integer.patch --]
[-- Type: application/octet-stream, Size: 2827 bytes --]

From 50c5235ca50e356acfad709ae820c2963a8ff11c Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Sat, 24 Nov 2012 15:11:21 +0800
Subject: [PATCH 2/3] (web uri): document that uri-port is an integer

* doc/ref/web.texi (URIs):
* module/web/uri.scm (build-uri): Document that uri-port is an integer.
---
 doc/ref/web.texi   |   13 +++++++------
 module/web/uri.scm |    9 +++++----
 2 files changed, 12 insertions(+), 10 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index e6e594e..3e93bea 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -209,10 +209,11 @@ access to them.
 @deffn {Scheme Procedure} build-uri scheme [#:userinfo=@code{#f}] [#:host=@code{#f}] @
        [#:port=@code{#f}] [#:path=@code{""}] [#:query=@code{#f}] @
        [#:fragment=@code{#f}] [#:validate?=@code{#t}]
-Construct a URI object.  @var{scheme} should be a symbol, and the rest
-of the fields are either strings or @code{#f}.  If @var{validate?} is
-true, also run some consistency checks to make sure that the constructed
-URI is valid.
+Construct a URI object.  @var{scheme} should be a symbol, @var{port}
+either a positive, exact integer or @code{#f}, and the rest of the
+fields are either strings or @code{#f}.  If @var{validate?} is true,
+also run some consistency checks to make sure that the constructed URI
+is valid.
 @end deffn
 
 @deffn {Scheme Procedure} uri? x
@@ -224,8 +225,8 @@ URI is valid.
 @deffnx {Scheme Procedure} uri-query uri
 @deffnx {Scheme Procedure} uri-fragment uri
 A predicate and field accessors for the URI record type.  The URI scheme
-will be a symbol, and the rest either strings or @code{#f} if not
-present.
+will be a symbol, the port either a positive, exact integer or @code{#f},
+and the rest either strings or @code{#f} if not present.
 @end deffn
 
 @deffn {Scheme Procedure} string->uri string
diff --git a/module/web/uri.scm b/module/web/uri.scm
index ddab7be..e84bc03 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -79,10 +79,11 @@
 
 (define* (build-uri scheme #:key userinfo host port (path "") query fragment
                     (validate? #t))
-  "Construct a URI object.  @var{scheme} should be a symbol, and the rest
-of the fields are either strings or @code{#f}.  If @var{validate?} is
-true, also run some consistency checks to make sure that the constructed
-URI is valid."
+  "Construct a URI object.  @var{scheme} should be a symbol, @var{port}
+either a positive, exact integer or @code{#f}, and the rest of the
+fields are either strings or @code{#f}.  If @var{validator?} is true,
+also run some consistency checks to make sure that the constructed URI
+is valid."
   (if validate?
       (validate-uri scheme userinfo host port path query fragment))
   (make-uri scheme userinfo host port path query fragment))
-- 
1.7.10.4


[-- Attachment #4: 0003-uri-reference-support.patch --]
[-- Type: application/octet-stream, Size: 15823 bytes --]

From ebbcc923b776fd3fbfe28c5c050ee8df7b71529d Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Thu, 8 Nov 2012 13:11:52 +0800
Subject: [PATCH 3/3] uri-reference support

* module/web/uri.scm (build-uri, validate-uri): Also build
  relative-refs, which have no scheme.
  (string->uri, uri->string): Also support URI objects with no scheme.
  (uri-reference?, relative-ref?, absolute-uri?): New predicates to
  distinguish various URI-like objects.
  (uri?): Redefine so that semantics are unchanged; only return #t for
  objects previously built and validated by build-uri?.  Such objects
  always had a uri-scheme.
* module/web/http.scm (declare-uri-reference-header!): New header type
  accepting any URI-reference.
  ("Content-Location", "Referer"): Change to URI-reference type.
* doc/ref/web.texi (URIs): Document other URI-like syntaxes defined in
  RFC 3986.  Include brief discussion.  Update functions
* test-suite/tests/web-http.test:
* test-suite/tests/web-uri.test: Add relevant tests.
---
 doc/ref/web.texi               |   39 ++++++++++---
 module/web/http.scm            |   13 ++++-
 module/web/uri.scm             |   59 ++++++++++++++-----
 test-suite/tests/web-http.test |    4 ++
 test-suite/tests/web-uri.test  |  125 ++++++++++++++++++++++++++++++++++++++++
 5 files changed, 214 insertions(+), 26 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 3e93bea..d247080 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -173,8 +173,9 @@ Guile provides a standard data type for Universal Resource Identifiers
 The generic URI syntax is as follows:
 
 @example
-URI := scheme ":" ["//" [userinfo "@@"] host [":" port]] path \
-       [ "?" query ] [ "#" fragment ]
+URI := scheme ":" hier-part [ "?" query ] [ "#" fragment ]
+
+hier-part := ["//" [userinfo "@@"] host [":" port]] path
 @end example
 
 For example, in the URI, @indicateurl{http://www.gnu.org/help/}, the
@@ -198,6 +199,25 @@ fragment identifies a part of a resource, not the resource itself.  But
 it is useful to have a fragment field in the URI record itself, so we
 hope you will forgive the inconsistency.
 
+There are some additional URI-like syntaxes:
+
+@example
+URI-reference := URI / relative-ref
+
+relative-ref := hier-part [ "?" query ] [ "#" fragment ]
+
+absolute-URI := scheme ":" hier-part [ "?" query ]
+@end example
+
+These extra forms are useful in various situations.  For example,
+relative-refs are convenient in documents to refer to other parts of the
+same document, or resources located on the same site.
+
+A relative-ref must be considered in relation to a given base URI to
+correctly identify a resource.  The base URI is determined according to
+the context and properties of the document in which the relative-ref is
+located.  See section 5 of RFC 3986 for details.
+
 @example
 (use-modules (web uri))
 @end example
@@ -211,12 +231,14 @@ access to them.
        [#:fragment=@code{#f}] [#:validate?=@code{#t}]
 Construct a URI object.  @var{scheme} should be a symbol, @var{port}
 either a positive, exact integer or @code{#f}, and the rest of the
-fields are either strings or @code{#f}.  If @var{validate?} is true,
-also run some consistency checks to make sure that the constructed URI
-is valid.
+fields are strings.  If @var{validate?} is true, also run some
+consistency checks to make sure that the constructed URI is valid.
 @end deffn
 
 @deffn {Scheme Procedure} uri? x
+@deffnx {Scheme Procedure} uri-reference? x
+@deffnx {Scheme Procedure} relative-ref? x
+@deffnx {Scheme Procedure} absolute-uri? x
 @deffnx {Scheme Procedure} uri-scheme uri
 @deffnx {Scheme Procedure} uri-userinfo uri
 @deffnx {Scheme Procedure} uri-host uri
@@ -224,9 +246,10 @@ is valid.
 @deffnx {Scheme Procedure} uri-path uri
 @deffnx {Scheme Procedure} uri-query uri
 @deffnx {Scheme Procedure} uri-fragment uri
-A predicate and field accessors for the URI record type.  The URI scheme
-will be a symbol, the port either a positive, exact integer or @code{#f},
-and the rest either strings or @code{#f} if not present.
+Predicates and field accessors for the URI record type.  The URI scheme
+will be a symbol, the port a positive, exact integer, and the rest
+strings.  Any field other than @code{uri-path} may also be @code{#f} if
+not present.
 @end deffn
 
 @deffn {Scheme Procedure} string->uri string
diff --git a/module/web/http.scm b/module/web/http.scm
index 3b78d08..389880b 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -1179,12 +1179,19 @@ treated specially, and is just returned as a plain string."
     parse-non-negative-integer non-negative-integer? display))
 
 ;; emacs: (put 'declare-uri-header! 'scheme-indent-function 1)
-(define (declare-uri-header! name)
+(define* (declare-uri-header! name #:optional)
   (declare-header! name
     (lambda (str) (or (string->uri str) (bad-header-component 'uri str)))
     uri?
     write-uri))
 
+;; emacs: (put 'declare-uri-reference-header! 'scheme-indent-function 1)
+(define (declare-uri-reference-header! name)
+  (declare-header! name
+    (lambda (str) (or (string->uri str) (bad-header-component 'uri-reference str)))
+    uri-reference?
+    write-uri))
+
 ;; emacs: (put 'declare-quality-list-header! 'scheme-indent-function 1)
 (define (declare-quality-list-header! name)
   (declare-header! name
@@ -1437,7 +1444,7 @@ treated specially, and is just returned as a plain string."
 
 ;; Content-Location = ( absoluteURI | relativeURI )
 ;;
-(declare-uri-header! "Content-Location")
+(declare-uri-reference-header! "Content-Location")
 
 ;; Content-MD5 = <base64 of 128 bit MD5 digest as per RFC 1864>
 ;;
@@ -1726,7 +1733,7 @@ treated specially, and is just returned as a plain string."
 
 ;; Referer = ( absoluteURI | relativeURI )
 ;;
-(declare-uri-header! "Referer")
+(declare-uri-reference-header! "Referer")
 
 ;; TE = #( t-codings )
 ;; t-codings = "trailers" | ( transfer-extension [ accept-params ] )
diff --git a/module/web/uri.scm b/module/web/uri.scm
index e84bc03..e7990ad 100644
--- a/module/web/uri.scm
+++ b/module/web/uri.scm
@@ -31,7 +31,7 @@
   #:use-module (ice-9 control)
   #:use-module (rnrs bytevectors)
   #:use-module (ice-9 binary-ports)
-  #:export (uri?
+  #:export (uri? uri-reference? relative-ref? absolute-uri?
             uri-scheme uri-userinfo uri-host uri-port
             uri-path uri-query uri-fragment
 
@@ -44,7 +44,7 @@
 
 (define-record-type <uri>
   (make-uri scheme userinfo host port path query fragment)
-  uri?
+  uri-reference?
   (scheme uri-scheme)
   (userinfo uri-userinfo)
   (host uri-host)
@@ -53,15 +53,32 @@
   (query uri-query)
   (fragment uri-fragment))
 
+(define (uri? x)
+  (and (uri-reference? x)
+       (uri-scheme x)
+       #t))
+
+(define (relative-ref? x)
+  (and (uri-reference? x)
+       (not (uri-scheme x))
+       #t))
+
+(define (absolute-uri? x)
+  (and (uri-reference? x)
+       (uri-scheme x)
+       (not (uri-fragment x))
+       #t))
+
 (define (uri-error message . args)
   (throw 'uri-error message args))
 
 (define (positive-exact-integer? port)
   (and (number? port) (exact? port) (integer? port) (positive? port)))
 
-(define (validate-uri scheme userinfo host port path query fragment)
+(define (validate-uri scheme userinfo host port path query fragment reference?)
   (cond
-   ((not (symbol? scheme))
+   ((and (not (symbol? scheme))
+         (or (not reference?) scheme))
     (uri-error "Expected a symbol for the URI scheme: ~s" scheme))
    ((and (or userinfo port) (not host))
     (uri-error "Expected a host, given userinfo or port"))
@@ -85,7 +102,7 @@ fields are either strings or @code{#f}.  If @var{validator?} is true,
 also run some consistency checks to make sure that the constructed URI
 is valid."
   (if validate?
-      (validate-uri scheme userinfo host port path query fragment))
+      (validate-uri scheme userinfo host port path query fragment #t))
   (make-uri scheme userinfo host port path query fragment))
 
 ;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC
@@ -153,6 +170,17 @@ is valid."
 ;;;               / path-absolute
 ;;;               / path-rootless
 ;;;               / path-empty
+;;;
+;;; RFC 3986, #4.
+;;;
+;;;   URI-reference = URI / relative-ref
+;;;
+;;;   relative-ref  = relative-part [ "?" query ] [ "#" fragment ]
+;;;
+;;;   relative-part = "//" authority path-abempty
+;;;                 / path-absolute
+;;;                 / path-noscheme
+;;;                 / path-empty
 
 (define scheme-pat
   "[a-zA-Z][a-zA-Z0-9+.-]*")
@@ -165,7 +193,7 @@ is valid."
 (define fragment-pat
   ".*")
 (define uri-pat
-  (format #f "^(~a):(//~a)?(~a)(\\?(~a))?(#(~a))?$"
+  (format #f "^((~a):)?(//~a)?(~a)(\\?(~a))?(#(~a))?$"
           scheme-pat authority-pat path-pat query-pat fragment-pat))
 (define uri-regexp
   (make-regexp uri-pat))
@@ -175,12 +203,12 @@ is valid."
 could not be parsed."
   (% (let ((m (regexp-exec uri-regexp string)))
        (if (not m) (abort))
-       (let ((scheme (string->symbol
-                      (string-downcase (match:substring m 1))))
-             (authority (match:substring m 2))
-             (path (match:substring m 3))
-             (query (match:substring m 5))
-             (fragment (match:substring m 7)))
+       (let ((scheme (let ((s (match:substring m 2)))
+                       (and s (string->symbol (string-downcase s)))))
+             (authority (match:substring m 3))
+             (path (match:substring m 4))
+             (query (match:substring m 6))
+             (fragment (match:substring m 8)))
          (call-with-values
              (lambda ()
                (if authority
@@ -208,8 +236,7 @@ could not be parsed."
   "Serialize @var{uri} to a string.  If the URI has a port that is the
 default port for its scheme, the port is not included in the
 serialization."
-  (let* ((scheme-str (string-append
-                      (symbol->string (uri-scheme uri)) ":"))
+  (let* ((scheme (uri-scheme uri))
          (userinfo (uri-userinfo uri))
          (host (uri-host uri))
          (port (uri-port uri))
@@ -217,7 +244,9 @@ serialization."
          (query (uri-query uri))
          (fragment (uri-fragment uri)))
     (string-append
-     scheme-str
+     (if scheme
+         (string-append (symbol->string scheme) ":")
+         "")
      (if host
          (string-append "//"
                         (if userinfo (string-append userinfo "@")
diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test
index 97f5559..5d80fde 100644
--- a/test-suite/tests/web-http.test
+++ b/test-suite/tests/web-http.test
@@ -145,6 +145,8 @@
   (pass-if-parse content-length "010" 10)
   (pass-if-parse content-location "http://foo/"
                  (build-uri 'http #:host "foo" #:path "/"))
+  (pass-if-parse content-location "//foo"
+                 (build-uri #f #:host "foo"))
   (pass-if-parse content-range "bytes 10-20/*" '(bytes (10 . 20) *))
   (pass-if-parse content-range "bytes */*" '(bytes * *))
   (pass-if-parse content-range "bytes */30" '(bytes * 30))
@@ -208,6 +210,8 @@
   (pass-if-parse range "bytes=-20,-30" '(bytes (#f . 20) (#f . 30)))
   (pass-if-parse referer "http://foo/bar?baz"
                  (build-uri 'http #:host "foo" #:path "/bar" #:query "baz"))
+  (pass-if-parse referer "//foo/bar?baz"
+                 (build-uri #f #:host "foo" #:path "/bar" #:query "baz"))
   (pass-if-parse te "trailers" '((trailers)))
   (pass-if-parse te "trailers,foo" '((trailers) (foo)))
   (pass-if-parse user-agent "guile" "guile"))
diff --git a/test-suite/tests/web-uri.test b/test-suite/tests/web-uri.test
index 3f6e7e3..d4d44d6 100644
--- a/test-suite/tests/web-uri.test
+++ b/test-suite/tests/web-uri.test
@@ -78,6 +78,24 @@
            #:port 22
            #:path "/baz"))
 
+  (pass-if "foo"
+    (uri=? (build-uri #f #:path "foo")
+           #:path "foo"))
+
+  (pass-if "/foo"
+    (uri=? (build-uri #f #:path "/foo")
+           #:path "/foo"))
+
+  (pass-if "//foo/bar"
+    (uri=? (build-uri #f #:host "foo" #:path "/bar")
+           #:host "foo"
+           #:path "/bar"))
+
+  (pass-if "?foo"
+    (uri=? (build-uri #f #:query "foo")
+           #:path ""
+           #:query "foo"))
+
   (pass-if-uri-exception "non-symbol scheme"
                          "Expected.*symbol"
                          (build-uri "nonsym"))
@@ -123,6 +141,80 @@
                          "Expected.*host"
                          (build-uri 'http #:userinfo "foo")))
 
+(with-test-prefix "absolute-uri?"
+  (pass-if "ftp:"
+    (absolute-uri? (build-uri 'ftp)))
+
+  (pass-if "ftp:foo"
+    (absolute-uri? (build-uri 'ftp #:path "foo")))
+
+  (pass-if "ftp://foo/bar"
+    (absolute-uri? (build-uri 'ftp #:host "foo" #:path "/bar")))
+
+  (pass-if "ftp://foo@bar:22/baz"
+    (absolute-uri? (build-uri 'ftp #:userinfo "foo" #:host "bar" #:port 22 #:path "/baz")))
+
+  (expect-fail "foo"
+    (absolute-uri? (build-uri #f #:path "foo")))
+
+  (expect-fail "/foo"
+    (absolute-uri? (build-uri #f #:path "/foo")))
+
+  (expect-fail "//foo/bar"
+    (absolute-uri? (build-uri #f #:host "foo" #:path "/bar")))
+
+  (expect-fail "?foo"
+    (absolute-uri? (build-uri #f #:query "foo"))))
+
+(with-test-prefix "relative-ref?"
+  (expect-fail "ftp:"
+    (relative-ref? (build-uri 'ftp)))
+
+  (expect-fail "ftp:foo"
+    (relative-ref? (build-uri 'ftp #:path "foo")))
+
+  (expect-fail "ftp://foo/bar"
+    (relative-ref? (build-uri 'ftp #:host "foo" #:path "/bar")))
+
+  (expect-fail "ftp://foo@bar:22/baz"
+    (relative-ref? (build-uri 'ftp #:userinfo "foo" #:host "bar" #:port 22 #:path "/baz")))
+
+  (pass-if "foo"
+    (relative-ref? (build-uri #f #:path "foo")))
+
+  (pass-if "/foo"
+    (relative-ref? (build-uri #f #:path "/foo")))
+
+  (pass-if "//foo/bar"
+    (relative-ref? (build-uri #f #:host "foo" #:path "/bar")))
+
+  (pass-if "?foo"
+    (relative-ref? (build-uri #f #:query "foo"))))
+
+(with-test-prefix "uri-reference?"
+  (pass-if "ftp:"
+    (uri-reference? (build-uri 'ftp)))
+
+  (pass-if "ftp:foo"
+    (uri-reference? (build-uri 'ftp #:path "foo")))
+
+  (pass-if "ftp://foo/bar"
+    (uri-reference? (build-uri 'ftp #:host "foo" #:path "/bar")))
+
+  (pass-if "ftp://foo@bar:22/baz"
+    (uri-reference? (build-uri 'ftp #:userinfo "foo" #:host "bar" #:port 22 #:path "/baz")))
+
+  (pass-if "foo"
+    (uri-reference? (build-uri #f #:path "foo")))
+
+  (pass-if "/foo"
+    (uri-reference? (build-uri #f #:path "/foo")))
+
+  (pass-if "//foo/bar"
+    (uri-reference? (build-uri #f #:host "foo" #:path "/bar")))
+
+  (pass-if "?foo"
+    (uri-reference? (build-uri #f #:query "foo"))))
 
 (with-test-prefix "string->uri"
   (pass-if "ftp:"
@@ -149,6 +241,24 @@
            #:port 22
            #:path "/baz"))
 
+  (pass-if "foo"
+    (uri=? (string->uri "foo")
+           #:path "foo"))
+
+  (pass-if "/foo"
+    (uri=? (string->uri "/foo")
+           #:path "/foo"))
+
+  (pass-if "//foo/bar"
+    (uri=? (string->uri "//foo/bar")
+           #:host "foo"
+           #:path "/bar"))
+
+  (pass-if "?foo"
+    (uri=? (string->uri "?foo")
+           #:path ""
+           #:query "foo"))
+
   (pass-if "http://bad.host.1"
     (not (string->uri "http://bad.host.1")))
 
@@ -229,6 +339,21 @@
     (equal? "ftp://foo@bar:22/baz"
             (uri->string (string->uri "ftp://foo@bar:22/baz"))))
   
+  (pass-if "foo"
+    (equal? "foo"
+            (uri->string (string->uri "foo"))))
+
+  (pass-if "/foo"
+    (equal? "/foo" (uri->string (string->uri "/foo"))))
+
+  (pass-if "//foo/bar"
+    (equal? "//foo/bar"
+            (uri->string (string->uri "//foo/bar"))))
+
+  (pass-if "?foo"
+    (equal? "?foo"
+            (uri->string (string->uri "?foo"))))
+
   (when (memq 'socket *features*)
     (pass-if "http://192.0.2.1"
       (equal? "http://192.0.2.1"
-- 
1.7.10.4


  reply	other threads:[~2012-11-24 11:23 UTC|newest]

Thread overview: 18+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2012-11-07 20:40 bug#12827: [2.0.6] web client: fails to parse 404 header Ludovic Courtès
2012-11-08  5:52 ` Daniel Hartwig
2012-11-08 20:10   ` Ludovic Courtès
2012-11-09  0:39     ` Daniel Hartwig
2012-11-09 20:52       ` Ludovic Courtès
2012-11-10  1:45         ` Daniel Hartwig
2012-11-10 13:52           ` Ludovic Courtès
2012-11-23 22:19       ` Ludovic Courtès
2012-11-24 11:23         ` Daniel Hartwig [this message]
2012-11-24 15:10           ` Ludovic Courtès
2012-11-24 15:34             ` Daniel Hartwig
2012-11-26  0:15               ` Ludovic Courtès
2012-11-26 23:13                 ` Ludovic Courtès
2012-11-27  1:06                   ` Daniel Hartwig
2012-11-27 12:50                     ` Ludovic Courtès
2012-11-27 15:18                       ` Daniel Hartwig
2012-11-27 21:43                         ` Ludovic Courtès
2013-02-23  8:11 ` bug#12827: [PATCH] Tweak web modules, support relative URIs (was: bug#12827: [2.0.6] web client: fails to parse 404 header) Daniel Hartwig

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=CAN3veRfzRovVKAxRcTJnPd2+hc-MpDPz3J1HAgc_L0Kmd3h89w@mail.gmail.com \
    --to=mandyke@gmail.com \
    --cc=12827@debbugs.gnu.org \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).