unofficial mirror of guile-devel@gnu.org 
* documentation for (web ...)
@ 2010-12-14 21:01 Andy Wingo
  2010-12-16 23:10 ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Andy Wingo @ 2010-12-14 21:01 UTC
  To: Neil Jerram; +Cc: guile-devel

Hi Neil,

I was looking at documenting the recent web stuff. My idea is to make a
new section in the Guile Modules chapter, after POSIX, with the intro
that the web is the new POSIX (sorta). In that section I'll write
documentation for the various web modules. Let me know if you have a
better idea, or don't like my idea, or what, and we'll work things out.

Cheers,

Andy
-- 
http://wingolog.org/




* Re: documentation for (web ...)
  2010-12-14 21:01 documentation for (web ...) Andy Wingo
@ 2010-12-16 23:10 ` Ludovic Courtès
  2010-12-23 23:51   ` Neil Jerram
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2010-12-16 23:10 UTC
  To: guile-devel

Hello!

Andy Wingo <wingo@pobox.com> writes:

> I was looking at documenting the recent web stuff. My idea is to make a
> new section in the Guile Modules chapter, after POSIX, with the intro
> that the web is the new POSIX (sorta).

I’m not keen on the comparison to POSIX.  For one, POSIX is for
operating systems, and the web is no substitute for operating systems.
In addition, what makes it /look/ like an operating system, i.e., the
fact that many applications can run “on the web”, in the browser, is
largely software as a service (SaaS), which I’d rather not promote.

Mind you, I do like the idea of having ready-to-use web tools in Guile,
and prominent in the manual.

Anyway, my 2 opinionated ¢.  ;-)

Besides I enjoyed reading the paragraph about “web data types” and why
it matters.

One nitpick: could you use @deffn {Scheme Procedure} instead of
{Function}, for consistency?

Kudos for writing all this nice doc!

Ludo’.





* Re: documentation for (web ...)
  2010-12-16 23:10 ` Ludovic Courtès
@ 2010-12-23 23:51   ` Neil Jerram
  2010-12-26 17:45     ` Andy Wingo
  2011-01-11  6:52     ` Andy Wingo
  0 siblings, 2 replies; 7+ messages in thread
From: Neil Jerram @ 2010-12-23 23:51 UTC
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

> Hello!
>
> Andy Wingo <wingo@pobox.com> writes:
>
>> I was looking at documenting the recent web stuff. My idea is to make a
>> new section in the Guile Modules chapter, after POSIX, with the intro
>> that the web is the new POSIX (sorta).
>
> I’m not keen on the comparison to POSIX.  For one, POSIX is for
> operating systems, and the web is no substitute for operating systems.
> In addition, what makes it /look/ like an operating system, i.e., the
> fact that many applications can run “on the web”, in the browser, is
> largely software as a service (SaaS), which I’d rather not promote.
>
> Mind you, I do like the idea of having ready-to-use web tools in Guile,
> and prominent in the manual.
>
> Anyway, my 2 opinionated ¢.  ;-)

I agree with Ludo that the comparison with POSIX isn't quite right.  But
on the other hand I tend to discount that feeling, because I think it's
more fun for multiple "voices" to come through in different parts of the
manual.  Overall this is another lovely piece of doc from you; following
are some thoughts and comments that occurred to me on reading through.

> 7.3 HTTP, the Web, and All That
> ===============================
> 
> When Guile started back in the mid-nineties, the GNU system was still
> focused on producing a good POSIX implementation.  This is why Guile's
> POSIX support is good, and has been so for a while.
> 
>    But times change, and in a way these days the web is the new POSIX: a
> standard and a motley set of implementations on which much computing is
> done.  So today's Guile also supports the web at the programming
> language level, by defining common data types and operations for the
> technologies underpinning the web: URIs, HTTP, and XML.
> 
>    It is particularly important to define native web data types.  Though
> the web is text in motion, programming the web in text is like
> programming with `goto': muddy, and error-prone.  Most current security
> problems on the web are due to treating the web as text instead of as
> instances of the proper data types.

This is an interesting point of view, and I would certainly like more
exposition of it.  I wonder if that will be coming later?

[After reading all through: I'd say you've demonstrated that data types
are good, but haven't shown any link with security problems, so the hook
here remains dangling.]

>    In addition, common web data types help programmers to share code.

Also, I guess it's not totally clear at this point what you mean by web
data types, but perhaps that will become clear as we go on.

[It did]

>    Well.  That's all very nice and opinionated and such, but how do I
> use the thing?  Read on!
> 
> * Menu:
> 
> * URIs::                        Uniform Resource Identifiers.
> * HTTP::                        The Hyper-Text Transfer Protocol.
> * HTTP Headers::                How Guile represents specific header values.
> * Requests::                    HTTP requests.
> * Responses::                   HTTP responses.
> * Web Server::                  Serving HTTP to the internet.
> * Web Examples::                How to use this thing.
> 
> File: guile.info,  Node: URIs,  Next: HTTP,  Up: Web
> 
> 7.3.1 Uniform Resource Identifiers
> ----------------------------------
> 
> Guile provides a standard data type for Uniform Resource Identifiers
> (URIs), as defined in RFC 3986.
> 
>    The generic URI syntax is as follows:
> 
>      URI := scheme ":" ["//" [userinfo "@"] host [":" port]] path \
>             [ "?" query ] [ "#" fragment ]
> 
>    So, all URIs have a scheme and a path. Some URIs have a host, and
> some of those have ports and userinfo. Any URI might have a query part
> or a fragment.
> 
>    Userinfo is something of an abstraction, as some legacy URI schemes
> allowed userinfo of the form `USERNAME:PASSWD'.  Passwords don't belong
> in URIs, so the RFC does not want to condone this, but neither can it
> say that what is before the `@' sign is just a username, so the RFC
> punts on the issue and calls it "userinfo".
> 
>    Also, strictly speaking, a URI with a fragment is a "URI reference".
> A fragment is typically not serialized when sending a URI over the
> wire; that is, it is not part of the identifier of a resource.  It only
> identifies a part of a given resource.

I found that a bit tricky to understand.  I think an example of what you
mean is that a web browser would only request the URI up to and
excluding the #, and process the #... part itself (by scrolling to that
point in the page).  It might help to say that.
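
To make that concrete, here is a minimal sketch of what I have in mind.
It assumes the procedure spelling `string->uri', which is what released
versions of Guile provide; the draft quoted in this thread calls the
same operation `parse-uri'.  The URI itself is just an illustration.

```scheme
(use-modules (web uri))

;; string->uri is the released spelling; the draft under review
;; calls this procedure parse-uri.
(define u
  (string->uri "http://www.gnu.org/software/guile/manual/index.html#Web"))

;; The part a client would send in its request line.
(uri-path u)      ; => "/software/guile/manual/index.html"

;; The part a browser keeps to itself and uses to scroll the page.
(uri-fragment u)  ; => "Web"
```

So the record keeps the fragment in its own field, even though a client
would normally leave it out of what goes over the wire.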

>  But it's useful to have a field
> for it in the URI record itself, so we hope you will forgive the
> inconsistency.
> 
>      (use-modules (web uri))
> 
>    The following procedures can be found in the `(web uri)' module.
> Load it into your Guile, using a form like the above, to have access to
> them.
> 
>  -- Function: build-uri scheme [#:userinfo] [#:host] [#:port] [#:path]
>           [#:query] [#:fragment] [#:validate?]

Why is the path arg not mandatory?

>      Construct a URI object. If VALIDATE? is true, also run some
>      consistency checks to make sure that the constructed URI is valid.
> 
>  -- Function: uri? x
>  -- Function: uri-scheme uri
>  -- Function: uri-userinfo uri
>  -- Function: uri-host uri
>  -- Function: uri-port uri
>  -- Function: uri-path uri
>  -- Function: uri-query uri
>  -- Function: uri-fragment uri
>      A predicate and field accessors for the URI record type.
> 
>  -- Function: declare-default-port! scheme port
>      Declare a default port for the given URI scheme.
> 
>      Default ports are for printing URI objects: a default port is not
>      printed.

Does this really belong here?  Seems like mixing a bit of the `model'
into the `view'.  I'd expect a URI without an explicit port to give

  (uri-port uri) => #f

and that if I do

  (set! (uri-port uri) 80)

the :80 would be there in the string representation of the URI.

That's a mostly theoretical point though; I admit I haven't thought
through what is most _useful_.  Although, often what is most useful is
for an API to behave as most programmers would expect it to.
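
As a sketch of the `model' half of that expectation, assuming
`build-uri' takes the keyword arguments listed above (the host name and
port number here are made up for illustration):

```scheme
(use-modules (web uri))

;; No #:port given, so the port field is simply absent.
(define no-port
  (build-uri 'http #:host "example.org" #:path "/"))

;; An explicit port is stored exactly as given.
(define with-port
  (build-uri 'http #:host "example.org" #:port 8080 #:path "/"))

(uri-port no-port)    ; => #f
(uri-port with-port)  ; => 8080
```

Whether an explicit port that matches the declared default shows up in
the printed form is the `view' question, and this sketch deliberately
does not take a position on it.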

>  -- Function: parse-uri string
>      Parse STRING into a URI object. Returns `#f' if the string could
>      not be parsed.
> 
>  -- Function: unparse-uri uri
>      Serialize URI to a string.

Or uri->string ?  And I guess parse-uri could be string->uri.
Cf. string->number and number->string.

>  -- Function: uri-decode str [#:charset]
>      Percent-decode the given STR, according to CHARSET.
> 
>      Note that this function should not generally be applied to a full
>      URI string. For paths, use split-and-decode-uri-path instead. For
>      query strings, split the query on `&' and `=' boundaries, and
>      decode the components separately.
> 
>      Note that percent-encoded strings encode _bytes_, not characters.
>      There is no guarantee that a given byte sequence is a valid string
>      encoding. Therefore this routine may signal an error if the decoded
>      bytes are not valid for the given encoding. Pass `#f' for CHARSET
>      if you want decoded bytes as a bytevector directly.

So the return value is a bytevector if CHARSET is #f, and a string if
not?

>  -- Function: uri-encode str [#:charset] [#:unescaped-chars]
>      Percent-encode any character not in UNESCAPED-CHARS.

UNESCAPED-CHARS is a vector, a list, ...?

>      Percent-encoding first writes out the given character to a

s/character/string

>      bytevector within the given CHARSET, then encodes each byte as
>      `%HH', where HH is the hexadecimal representation of the byte.
> 
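
A small round trip might help the reader here.  This is only a sketch
using the default character set, so it says nothing about the
bytevector case; the sample string is an assumption chosen to contain
one space and one reserved character.

```scheme
(use-modules (web uri))

;; Space and `&' are not in the default set of unescaped characters,
;; so each one is written out as a %HH escape.
(define encoded (uri-encode "foo bar&baz"))   ; => "foo%20bar%26baz"

;; Decoding a single component gives back the original string.
(define decoded (uri-decode encoded))         ; => "foo bar&baz"
```
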
>  -- Function: split-and-decode-uri-path path
>      Split PATH into its components, and decode each component,
>      removing empty components.
> 
>      For example, `"/foo/bar/"' decodes to the two-element list,
>      `("foo" "bar")'.

Presumably this does % decoding too, so it would be good to give another
example to show that.
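
Something along these lines, perhaps.  This is a sketch of the kind of
example I mean, with a made-up path that contains one percent-encoded
space:

```scheme
(use-modules (web uri))

;; Splitting drops the empty components around the slashes and
;; percent-decodes each remaining one.
(split-and-decode-uri-path "/foo/bar%20baz/")  ; => ("foo" "bar baz")

;; Going the other way encodes each part and joins them with `/'.
(encode-and-join-uri-path '("foo" "bar baz"))  ; => "foo/bar%20baz"
```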

>  -- Function: encode-and-join-uri-path parts
>      URI-encode each element of PARTS, which should be a list of
>      strings, and join the parts together with `/' as a delimiter.
> 
> File: guile.info,  Node: HTTP,  Next: HTTP Headers,  Prev: URIs,  Up: Web
> 
> 7.3.2 The Hyper-Text Transfer Protocol
> --------------------------------------
> 
> The initial motivation for including web functionality in Guile, rather
> than rely on an external package, was to establish a standard base on
> which people can share code.  To that end, we continue the focus on data
> types by providing a number of low-level parsers and unparsers for
> elements of the HTTP protocol.
> 
>    If you want to skip the low-level details for now and move on to
> web pages, *note Web Server::.  Otherwise, load the HTTP module, and
> read on.
> 
>      (use-modules (web http))
> 
>    The focus of the `(web http)' module is to parse and unparse
> standard HTTP headers, representing them to Guile as native data
> structures.  For example, a `Date:' header will be represented as a
> SRFI-19 date record (*note SRFI-19::), rather than as a string.
> 
>    Guile tries to follow RFCs fairly strictly--the road to perdition
> being paved with compatibility hacks--though some allowances are made
> for not-too-divergent texts.
> 
>    The first bit is to define a registry of parsers, validators, and
> unparsers, keyed by header name.  That is the function of the
> `<header-decl>' object.
> 
>  -- Function: make-header-decl sym name multiple? parser validator
>           writer
>  -- Function: header-decl? x
>  -- Function: header-decl-sym decl
>  -- Function: header-decl-name decl
>  -- Function: header-decl-multiple? decl
>  -- Function: header-decl-parser decl
>  -- Function: header-decl-validator decl
>  -- Function: header-decl-writer decl
>      A constructor, predicate, and field accessors for the
>      `<header-decl>' type. The fields are as follows:
> 
>     `sym'
>           The symbol name for this header field, always in lower-case.
>           For example, `"Content-Length"' has a symbolic name of
>           `content-length'.
> 
>     `name'
>           The string name of the header, in its preferred
>           capitalization.
> 
>     `multiple?'
>           `#t' iff this header may appear multiple times in a message.
> 
>     `parser'
>           A procedure which takes a string and returns a parsed value.
> 
>     `validator'
>           A predicate, returning `#t' iff the value is valid for this
>           header.

Maybe say something here about validator function often being very
similar to parsing function?

>     `writer'
>           A writer, which writes a value to the port given in the
>           second argument.
> 
>  -- Function: declare-header! sym name [#:multiple?] [#:parser]
>           [#:validator] [#:writer]
>      Make a header declaration, as above, and register it by symbol and
>      by name.

Are the keyword args really optional?  If so, what are the defaults?

A possibly important point: what is the scope of the space in which
these header declarations are made?  My reason for asking is that this
infrastructure looks applicable for other HTTP-like protocols too, such
as SIP.  But the detailed rules for a given header in SIP may be
different from a header with the same name in HTTP, and hence different
header-decl objects would be needed.  Therefore, even though we claim no
other protocol support right now, perhaps we should anticipate that by
enhancing declare-header! so as to distinguish between HTTP-space and
other-protocol-spaces.

[After reading all through, I remain confused about exactly how general
this server infrastructure is intended to be]

>  -- Function: lookup-header-decl name
>      Return the HEADER-DECL object registered for the given NAME.
> 
>      NAME may be a symbol or a string. Strings are mapped to headers in
>      a case-insensitive fashion.
> 
>  -- Function: valid-header? sym val
>      Returns a true value iff VAL is a valid Scheme value for the
>      header with name SYM.

Note slight inconsistency in the two above deffns: "Return" vs
"Returns".

>    Now that we have a generic interface for reading and writing
> headers, we do just that.
> 
>  -- Function: read-header port
>      Reads one HTTP header from PORT. Returns two values: the header
>      name and the parsed Scheme value.

As multiple values?  Is that more helpful than as a cons?

> May raise an exception if the
>      header was known but the value was invalid.
> 
>      Returns #F for both values if the end of the message body was
>      reached (i.e., a blank line).

I'd find #<eof> more intuitive.

>  -- Function: parse-header name val
>      Parse VAL, a string, with the parser for the header named NAME.
> 
>      Returns two values, the header name and parsed value. If a parser
>      was found, the header name will be returned as a symbol. If a
>      parser was not found, both the header name and the value are
>      returned as strings.

Again, multiple values or a cons?
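
To show what the two conventions look like at the call site, here is a
sketch in plain Scheme.  Both `toy-parse-header/values' and
`toy-parse-header/pair' are hypothetical stand-ins that I made up for
this mail; neither is the `(web http)' implementation, and they do no
real parsing.

```scheme
;; Hypothetical stand-ins: they only downcase the name to a symbol
;; and pass the value through untouched.
(define (toy-parse-header/values name val)
  (values (string->symbol (string-downcase name)) val))

(define (toy-parse-header/pair name val)
  (cons (string->symbol (string-downcase name)) val))

;; Multiple values: the caller binds both results at once.
(define as-list
  (call-with-values
      (lambda () (toy-parse-header/values "Content-Type" "text/plain"))
    (lambda (sym parsed) (list sym parsed))))

;; Pair: the caller gets one object and takes it apart with car/cdr,
;; and can cons it straight onto an alist of headers.
(define as-pair
  (toy-parse-header/pair "Content-Type" "text/plain"))
```

The pair form drops directly into the header alist that `read-headers'
returns, which is why I keep asking.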

>  -- Function: write-header name val port
>      Writes the given header name and value to PORT. If NAME is a
>      symbol, looks up a declared header and uses that writer. Otherwise
>      the value is written using DISPLAY.
>
>  -- Function: read-headers port
>      Read an HTTP message from PORT, returning the headers as an
>      ordered alist.

s/Read/Read the headers of/  ?  i.e. Should the caller have already read
the request/response line?

>  -- Function: write-headers headers port
>      Write the given header alist to PORT. Doesn't write the final
>      \r\n, as the user might want to add another header.
> 
>    The `(web http)' module also has some utility procedures to read and
> write request and response lines.
> 
>  -- Function: parse-http-method str [start] [end]
>      Parse an HTTP method from STR. The result is an upper-case symbol,
>      like `GET'.
> 
>  -- Function: parse-http-version str [start] [end]
>      Parse an HTTP version from STR, returning it as a major-minor
>      pair. For example, `HTTP/1.1' parses as the pair of integers, `(1
>      . 1)'.
> 
>  -- Function: parse-request-uri str [start] [end]
>      Parse a URI from an HTTP request line. Note that URIs in requests
>      do not have to have a scheme or host name. The result is a URI
>      object.
> 
>  -- Function: read-request-line port
>      Read the first line of an HTTP request from PORT, returning three
>      values: the method, the URI, and the version.
> 
>  -- Function: write-request-line method uri version port
>      Write the first line of an HTTP request to PORT.
> 
>  -- Function: read-response-line port
>      Read the first line of an HTTP response from PORT, returning three
>      values: the HTTP version, the response code, and the "reason
>      phrase".
> 
>  -- Function: write-response-line version code reason-phrase port
>      Write the first line of an HTTP response to PORT.
> 
> File: guile.info,  Node: HTTP Headers,  Next: Requests,  Prev: HTTP,  Up: Web
> 
> 7.3.3 HTTP Headers
> ------------------
> 
> The `(web http)' module defines parsers and unparsers for all headers
> defined in the HTTP/1.1 standard.  This section describes the parsed
> format of the various headers.
> 
>    We cannot describe the function of all of these headers, however, in
> sufficient detail.

I don't get the point here.

>  The interested reader would do well to download a
> copy of RFC 2616 and have it on hand.
> 
>    To begin with, we should make a few definitions:
> 
> "key-value list"
>      A key-value list is a list of values.  Each value may be a string,
>      a symbol, or a pair.  Known keys are parsed to symbols; otherwise
>      keys are left as strings.  Keys with values are parsed to pairs,
>      the car of which is the symbol or string key, and the cdr is the
>      parsed value.  Parsed values for known keys have key-dependent
>      formats.  Parsed values for unknown keys are strings.
> 
> "param list"
>      A param list is a list of key-value lists.  When serialized to a
>      string, items in the inner lists are separated by semicolons.
>      Again, known keys are parsed to symbols.
> 
> "quality"
>      A number of headers have quality values in them, which are decimal
>      fractions between zero and one indicating a preference for various
>      kinds of responses, which the server may choose to heed.  Given
>      that only three digits are allowed in the fractional part, Guile
>      parses quality values to integers between 0 and 1000 instead of
>      inexact numbers between 0.0 and 1.0.
> 
> "quality list"
>      A list of pairs, the car of which is a quality value.
> 
> "entity tag"
>      A pair, the car of which is an opaque string, and the cdr of which
>      is true iff the entity tag is a "strong" entity tag.

A bit of a conceptual stack has built up at this point.  i.e. I have no
idea why you're telling me this....
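
For what it's worth, the "quality" definition is the easiest of these
to picture with a tiny example.  The helper below is hypothetical (the
name `toy-parse-quality' is mine, and this is not how `(web http)' does
it); it only illustrates the mapping from a decimal fraction with up to
three digits to an integer between 0 and 1000.

```scheme
;; Hypothetical helper, not part of (web http): scale the textual
;; quality by 1000 and round to an exact integer.
(define (toy-parse-quality str)
  (inexact->exact (round (* 1000 (string->number str)))))

(toy-parse-quality "0.8")  ; => 800
(toy-parse-quality "1")    ; => 1000
```

Keeping the value exact means two qualities can be compared with `='
without worrying about floating-point rounding.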

> 7.3.3.1 General Headers
> .......................
> 
> `cache-control'
>      A key-value list of cache-control directives. Known keys are
>      `max-age', `max-stale', `min-fresh', `must-revalidate',
>      `no-cache', `no-store', `no-transform', `only-if-cached',
>      `private', `proxy-revalidate', `public', and `s-maxage'.
> 
>      If present, parameters to `max-age', `max-stale', `min-fresh', and
>      `s-maxage' are all parsed as non-negative integers.
> 
>      If present, parameters to `private' and `no-cache' are parsed as
>      lists of header names, represented as symbols if they are known
>      headers or strings otherwise.

... but this is pretty quickly justifying the stuff above, so I think
the stack is actually OK.

> `connection'
>      A list of connection tokens.  A connection token is a string.
> 
> `date'
>      A SRFI-19 date record.
> 
> `pragma'
>      A key-value list of pragma directives.  `no-cache' is the only
>      known key.
> 
> `trailer'
>      A list of header names.  Known header names are parsed to symbols,
>      otherwise they are left as strings.
> 
> `transfer-encoding'
>      A param list of transfer codings.  `chunked' is the only known key.

OK, why a param list rather than key-value?  How are elements in the
second key-value list, say, different from elements in the first
key-value list?

> `upgrade'
>      A list of strings.
> 
> `via'
>      A list of strings.  There may be multiple `via' headers in one
>      message.
> 
> `warning'
>      A list of warnings.  Each warning is itself a list of four
>      elements: a code, as an exact integer between 0 and 1000, a host
>      as a string, the warning text as a string, and either `#f' or a
>      SRFI-19 date.
> 
>      There may be multiple `warning' headers in one message.
> 
> 7.3.3.2 Entity Headers
> ......................
> 
> `allow'
>      A list of methods, as strings.  Methods are parsed as strings
>      instead of `parse-http-method' so as to allow for new methods.
> 
> `content-encoding'
>      A list of content codings, as strings.
> 
> `content-language'
>      A list of language tags, as strings.
> 
> `content-length'
>      An exact, non-negative integer.
> 
> `content-location'
>      A URI record.
> 
> `content-md5'
>      A string.
> 
> `content-range'
>      A list of three elements: the symbol `bytes', either the symbol
>      `*' or a pair of integers, indicating the byte range, and either
>      `*' or an integer, for the instance length.
> 
> `content-type'
>      A pair, the car of which is the media type as a string, and the
>      cdr is an alist of parameters, with strings as keys and values.
> 
>      For example, `"text/plain"' parses as `("text/plain")', and
>      `"text/plain;charset=utf-8"' parses as `("text/plain" ("charset" .
>      "utf-8"))'.
> 
> `expires'
>      A SRFI-19 date.
> 
> `last-modified'
>      A SRFI-19 date.
> 
> 
> 7.3.3.3 Request Headers
> .......................
> 
> `accept'
>      A param list.  Each element in the list indicates one media-range
>      with accept-params.  The only known key is `q', whose value is
>      parsed as a quality value.
> 
> `accept-charset'
>      A quality-list of charsets, as strings.
> 
> `accept-encoding'
>      A quality-list of content codings, as strings.
> 
> `accept-language'
>      A quality-list of languages, as strings.
> 
> `authorization'
>      A string.
> 
> `expect'
>      A param list of expectations.  The only known key is
>      `100-continue'.
> 
> `from'
>      A string.
> 
> `host'
>      A pair of the host, as a string, and the port, as an integer. If
>      no port is given, port is `#f'.
> 
> `if-match'
>      Either the symbol `*', or a list of entity tags (see above).
> 
> `if-modified-since'
>      A SRFI-19 date.
> 
> `if-none-match'
>      Either the symbol `*', or a list of entity tags (see above).
> 
> `if-range'
>      Either an entity tag, or a SRFI-19 date.
> 
> `if-unmodified-since'
>      A SRFI-19 date.
> 
> `max-forwards'
>      An exact non-negative integer.
> 
> `proxy-authorization'
>      A string.
> 
> `range'
>      A pair whose car is the symbol `bytes', and whose cdr is a list of
>      pairs. Each element of the cdr indicates a range; the car is the
>      first byte position and the cdr is the last byte position, as
>      integers, or `#f' if not given.
> 
> `referer'
>      A URI.
> 
> `te'
>      A param list of transfer-codings.  The only known key is
>      `trailers'.
> 
> `user-agent'
>      A string.
> 
> 7.3.3.4 Response Headers
> ........................
> 
> `accept-ranges'
>      A list of strings.
> 
> `age'
>      An exact, non-negative integer.
> 
> `etag'
>      An entity tag.
> 
> `location'
>      A URI.
> 
> `proxy-authenticate'
>      A string.
> 
> `retry-after'
>      Either an exact, non-negative integer, or a SRFI-19 date.
> 
> `server'
>      A string.
> 
> `vary'
>      Either the symbol `*', or a list of headers, with known headers
>      parsed to symbols.
> 
> `www-authenticate'
>      A string.

Obviously there's lots of substructure there (in WWW-Authenticate) that
we just don't support yet.  Is there a clear compatibility story for
if/when Guile is enhanced to parse that out?

I guess yes; calling code will just need something like

  (if (string? val)
      ;; An older Guile that doesn't parse authentication fully.
      (do-application-own-parsing)
      ;; A newer Guile that does parse authentication.
      (use-the-parsed-authentication-object))


> File: guile.info,  Node: Requests,  Next: Responses,  Prev: HTTP Headers,  Up: Web
> 
> 7.3.4 HTTP Requests
> -------------------
> 
>      (use-modules (web request))
> 
>    The request module contains a data type for HTTP requests.  Note that
> the body is not part of the request, but the port is.  Once you have
> read a request, you may read the body separately, and likewise for
> writing requests.
> 
>  -- Function: build-request [#:method] [#:uri] [#:version] [#:headers]
>           [#:port] [#:meta] [#:validate-headers?]
>      Construct an HTTP request object. If VALIDATE-HEADERS? is true,
>      the headers are each run through their respective validators.
> 
>  -- Function: request?
>  -- Function: request-method
>  -- Function: request-uri
>  -- Function: request-version
>  -- Function: request-headers
>  -- Function: request-meta
>  -- Function: request-port
>      A predicate and field accessors for the request type.  The fields
>      are as follows:
>     `method'
>           The HTTP method, for example, `GET'.
> 
>     `uri'
>           The URI as a URI record.
> 
>     `version'
>           The HTTP version pair, like `(1 . 1)'.
> 
>     `headers'
>           The request headers, as an alist of parsed values.
> 
>     `meta'
>           An arbitrary alist of other data, for example information
>           returned in the `sockaddr' from `accept' (*note Network
>           Sockets and Communication::).
> 
>     `port'
>           The port on which to read or write a request body, if any.
> 
>  -- Function: read-request port [meta]
>      Read an HTTP request from PORT, optionally attaching the given
>      metadata, META.
> 
>      As a side effect, sets the encoding on PORT to ISO-8859-1
>      (latin-1), so that reading one character reads one byte. See the
>      discussion of character sets in "HTTP Requests" in the manual, for
>      more information.

That last sentence is OK for a docstring, but strange here _in_ the
manual.

And, where is that discussion?

>  -- Function: write-request r port
>      Write the given HTTP request to PORT.
> 
>      Returns a new request, whose `request-port' will continue writing
>      on PORT, perhaps using some transfer encoding.
> 
>  -- Function: read-request-body/latin-1 r
>      Reads the request body from R, as a string.
> 
>      Assumes that the request port has ISO-8859-1 encoding, so that the
>      number of characters to read is the same as the
>      `request-content-length'. Returns `#f' if there was no request
>      body.
> 
>  -- Function: write-request-body/latin-1 r body
>      Write BODY, a string encodable in ISO-8859-1, to the port
>      corresponding to the HTTP request R.
> 
>  -- Function: read-request-body/bytevector r
>      Reads the request body from R, as a bytevector. Returns `#f' if
>      there was no request body.
> 
>  -- Function: write-request-body/bytevector r bv
>      Write BV, a bytevector, to the port corresponding to the HTTP
>      request R.
> 
>    The various headers that are typically associated with HTTP requests
> may be accessed with these dedicated accessors.  *Note HTTP Headers::,
> for more information on the format of parsed headers.
> 
>  -- Function: request-accept request [default='()]
>  -- Function: request-accept-charset request [default='()]
>  -- Function: request-accept-encoding request [default='()]
>  -- Function: request-accept-language request [default='()]
>  -- Function: request-allow request [default='()]
>  -- Function: request-authorization request [default=#f]
>  -- Function: request-cache-control request [default='()]
>  -- Function: request-connection request [default='()]
>  -- Function: request-content-encoding request [default='()]
>  -- Function: request-content-language request [default='()]
>  -- Function: request-content-length request [default=#f]
>  -- Function: request-content-location request [default=#f]
>  -- Function: request-content-md5 request [default=#f]
>  -- Function: request-content-range request [default=#f]
>  -- Function: request-content-type request [default=#f]
>  -- Function: request-date request [default=#f]
>  -- Function: request-expect request [default='()]
>  -- Function: request-expires request [default=#f]
>  -- Function: request-from request [default=#f]
>  -- Function: request-host request [default=#f]
>  -- Function: request-if-match request [default=#f]
>  -- Function: request-if-modified-since request [default=#f]
>  -- Function: request-if-none-match request [default=#f]
>  -- Function: request-if-range request [default=#f]
>  -- Function: request-if-unmodified-since request [default=#f]
>  -- Function: request-last-modified request [default=#f]
>  -- Function: request-max-forwards request [default=#f]
>  -- Function: request-pragma request [default='()]
>  -- Function: request-proxy-authorization request [default=#f]
>  -- Function: request-range request [default=#f]
>  -- Function: request-referer request [default=#f]
>  -- Function: request-te request [default=#f]
>  -- Function: request-trailer request [default='()]
>  -- Function: request-transfer-encoding request [default='()]
>  -- Function: request-upgrade request [default='()]
>  -- Function: request-user-agent request [default=#f]
>  -- Function: request-via request [default='()]
>  -- Function: request-warning request [default='()]
>      Return the given request header, or DEFAULT if none was present.
> 
>  -- Function: request-absolute-uri r [default-host] [default-port]
>      A helper routine to determine the absolute URI of a request, using
>      the `host' header and the default host and port.

Hmm, I think the provision of this data type needs a bit more
motivation.  It doesn't appear to offer much additional value, compared
with reading or writing the components of a request individually, and on
the other hand it appears to bake in assumptions about charsets and
content length that might not always be true.

> File: guile.info,  Node: Responses,  Next: Web Server,  Prev: Requests,  Up: Web
> 
> 7.3.5 HTTP Responses
> --------------------
> 
>      (use-modules (web response))
> 
>    As with requests (*note Requests::), Guile offers a data type for
> HTTP responses.  Again, the body is represented separately from the
> response.
> 
>  -- Function: response?
>  -- Function: response-version
>  -- Function: response-code
>  -- Function: response-reason-phrase response
>  -- Function: response-headers
>  -- Function: response-port
>      A predicate and field accessors for the response type.  The fields
>      are as follows:
>     `version'
>           The HTTP version pair, like `(1 . 1)'.
> 
>     `code'
>           The HTTP response code, like `200'.
> 
>     `reason-phrase'
>           The reason phrase, or the standard reason phrase for the
>           response's code.
> 
>     `headers'
>           The response headers, as an alist of parsed values.
> 
>     `port'
>           The port on which to read or write a response body, if any.
> 
>  -- Function: read-response port
>      Read an HTTP response from PORT, optionally attaching the given
>      metadata, META.
> 
>      As a side effect, sets the encoding on PORT to ISO-8859-1
>      (latin-1), so that reading one character reads one byte. See the
>      discussion of character sets in "HTTP Responses" in the manual,
>      for more information.

As above.

>  -- Function: build-response [#:version] [#:code] [#:reason-phrase]
>           [#:headers] [#:port]
>      Construct an HTTP response object. If VALIDATE-HEADERS? is true,
>      the headers are each run through their respective validators.
> 
>  -- Function: extend-response r k v . additional
>      Extend an HTTP response by setting additional HTTP headers K, V.
>      Returns a new HTTP response.

What does the ADDITIONAL arg mean?

>  -- Function: adapt-response-version response version
>      Adapt the given response to a different HTTP version. Returns a
>      new HTTP response.
> 
>      The idea is that many applications might just build a response for
>      the default HTTP version, and this method could handle a number of
>      programmatic transformations to respond to older HTTP versions
>      (0.9 and 1.0). But currently this function is a bit heavy-handed,
>      just updating the version field.

Interesting, and adds more value to the idea of the response object.
Why not for the request object too - are you assuming that Guile will
usually be acting as the HTTP server?  (Which I'm sure is correct, but
"usually" is not "always".)

>  -- Function: write-response r port
>      Write the given HTTP response to PORT.
> 
>      Returns a new response, whose `response-port' will continue writing
>      on PORT, perhaps using some transfer encoding.
> 
>  -- Function: read-response-body/latin-1 r
>      Reads the response body from R, as a string.
> 
>      Assumes that the response port has ISO-8859-1 encoding, so that the
>      number of characters to read is the same as the
>      `response-content-length'. Returns `#f' if there was no response
>      body.
> 
>  -- Function: write-response-body/latin-1 r body
>      Write BODY, a string encodable in ISO-8859-1, to the port
>      corresponding to the HTTP response R.
> 
>  -- Function: read-response-body/bytevector r
>      Reads the response body from R, as a bytevector. Returns `#f' if
>      there was no response body.
> 
>  -- Function: write-response-body/bytevector r bv
>      Write BODY, a bytevector, to the port corresponding to the HTTP
>      response R.
> 
>    As with requests, the various headers that are typically associated
> with HTTP responses may be accessed with these dedicated accessors.
> *Note HTTP Headers::, for more information on the format of parsed
> headers.
> 
>  -- Function: response-accept-ranges response [default=#f]
>  -- Function: response-age response [default='()]
>  -- Function: response-allow response [default='()]
>  -- Function: response-cache-control response [default='()]
>  -- Function: response-connection response [default='()]
>  -- Function: response-content-encoding response [default='()]
>  -- Function: response-content-language response [default='()]
>  -- Function: response-content-length response [default=#f]
>  -- Function: response-content-location response [default=#f]
>  -- Function: response-content-md5 response [default=#f]
>  -- Function: response-content-range response [default=#f]
>  -- Function: response-content-type response [default=#f]
>  -- Function: response-date response [default=#f]
>  -- Function: response-etag response [default=#f]
>  -- Function: response-expires response [default=#f]
>  -- Function: response-last-modified response [default=#f]
>  -- Function: response-location response [default=#f]
>  -- Function: response-pragma response [default='()]
>  -- Function: response-proxy-authenticate response [default=#f]
>  -- Function: response-retry-after response [default=#f]
>  -- Function: response-server response [default=#f]
>  -- Function: response-trailer response [default='()]
>  -- Function: response-transfer-encoding response [default='()]
>  -- Function: response-upgrade response [default='()]
>  -- Function: response-vary response [default='()]
>  -- Function: response-via response [default='()]
>  -- Function: response-warning response [default='()]
>  -- Function: response-www-authenticate response [default=#f]
>      Return the given response header, or DEFAULT if none was present.
> 
> \x1f
> File: guile.info,  Node: Web Server,  Next: Web Examples,  Prev: Responses,  Up: Web
> 
> 7.3.6 Web Server
> ----------------
> 
> `(web server)' is a generic web server interface, along with a main
> loop implementation for web servers controlled by Guile.
> 
>      (use-modules (web server))
> 
>    The lowest layer is the `<server-impl>' object, which defines a set
> of hooks to open a server, read a request from a client, write a
> response to a client, and close a server.  These hooks - `open',
> `read', `write', and `close', respectively - are bound together in a
> `<server-impl>' object.  Procedures in this module take a
> `<server-impl>' object, if needed.
> 
>    A `<server-impl>' may also be looked up by name.  If you pass the
> `http' symbol to `run-server', Guile looks for a variable named `http'
> in the `(web server http)' module, which should be bound to a
> `<server-impl>' object.  Such a binding is made by instantiation of the
> `define-server-impl' syntax.  In this way the run-server loop can
> automatically load other backends if available.
> 
>    The life cycle of a server goes as follows:
> 
>   1. The `open' hook is called, to open the server. `open' takes 0 or
>      more arguments, depending on the backend,

How is that possible?  (immediate thought... perhaps it will be
explained later)

> and returns an opaque
>      server socket object, or signals an error.
> 
>   2. The `read' hook is called, to read a request from a new client.
>      The `read' hook takes one argument, the server socket.  It should

It feels surprising for the infrastructure to pass the server socket to
the read hook.  I'd expect the infrastructure to do the `accept' itself
and pass the client socket to the read hook.

Also, does the infrastructure assume that each client socket will only
be used for one request and response, and then closed?  Would it be hard
to remove that assumption, so that the <server-impl> idea is more
general?

>      return three values: an opaque client socket, the request, and the
>      request body. The request should be a `<request>' object, from
>      `(web request)'.  The body should be a string or a bytevector, or
>      `#f' if there is no body.
> 
>      If the read failed, the `read' hook may return #f for the client
>      socket, request, and body.
> 
>   3. A user-provided handler procedure is called, with the request and
>      body as its arguments.  The handler should return two values: the
>      response, as a `<response>' record from `(web response)', and the
>      response body as a string, bytevector, or `#f' if not present.  We
>      also allow the reponse to be simply an alist of headers, in which

s/reponse/response

>      case a default response object is constructed with those headers.

What about response status?  (perhaps represented as a "status" header,
a la modlisp)

>   4. The `write' hook is called with three arguments: the client
>      socket, the response, and the body.  The `write' hook returns no
>      values.
> 
>   5. At this point the request handling is complete. For a loop, we
>      loop back and try to read a new request.
> 
>   6. If the user interrupts the loop, the `close' hook is called on the
>      server socket.
> 
>    A user may define a server implementation with the following form:
> 
>  -- Function: define-server-impl name open read write close
>      Make a `<server-impl>' object with the hooks OPEN, READ, WRITE,
>      and CLOSE, and bind it to the symbol NAME in the current module.
> 
>  -- Function: lookup-server-impl impl
>      Look up a server implementation. If IMPL is a server
>      implementation already, it is returned directly. If it is a
>      symbol, the binding named IMPL in the `(web server IMPL)' module is
>      looked up. Otherwise an error is signaled.
> 
>      Currently a server implementation is a somewhat opaque type,
>      useful only for passing to other procedures in this module, like
>      `read-client'.
> 
>    The `(web server)' module defines a number of routines that use
> `<server-impl>' objects to implement parts of a web server.  Given that
> we don't expose the accessors for the various fields of a
> `<server-impl>', indeed these routines are the only procedures with any
> access to the impl objects. 

How general is <server-impl> hoping to be?  Correspondingly, is the (web
server) module name appropriate?

To me, "web" => "http", so (web server http) is a tautological name.
And in fact it sounds like you intend <server-impl> to cover more than
just web/HTTP, so I suppose it should be in a module like (server),
rather than (web server).

It seems we could do with some more server impls in order to validate
that the infrastructure is all defined correctly.  Time-permitting, I'd
like to play with writing modlisp support for this new system, analogous
to what I did already in guile-www.

>  -- Function: open-server impl open-params
>      Open a server for the given implementation. Returns one value, the
>      new server object. The implementation's `open' procedure is
>      applied to OPEN-PARAMS, which should be a list.
> 
>  -- Function: read-client impl server
>      Read a new client from SERVER, by applying the implementation's
>      `read' procedure to the server. If successful, returns three
>      values: an object corresponding to the client, a request object,
>      and the request body. If any exception occurs, returns `#f' for
>      all three values.

I think there's a one-request-per-connection assumption here, isn't
there?

>  -- Function: handle-request handler request body state
>      Handle a given request, returning the response and body.
> 
>      The response and response body are produced by calling the given
>      HANDLER with REQUEST and BODY as arguments.
> 
>      The elements of STATE are also passed to HANDLER as arguments, and
>      may be returned as additional values. The new STATE, collected
>      from the HANDLER's return values, is then returned as a list. The
>      idea is that a server loop receives a handler from the user, along
>      with whatever state values the user is interested in, allowing the
>      user's handler to explicitly manage its state.
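
If I've understood the STATE threading correctly, it works something
like the sketch below.  This is not the actual (web server) code:
`counting-handler', `handle-request*' and `simulate' are names made up
for illustration, and the "request" is just a placeholder symbol, so
the loop can be tried without opening a socket.

```scheme
;; Hypothetical handler: takes the request, the body, and one state
;; value (a counter); returns a response, a body, and the new counter.
(define (counting-handler request body count)
  (values '((content-type . ("text/plain")))
          (string-append "Request number " (number->string count))
          (+ count 1)))

;; Simplified stand-in for the handle-request described above: apply
;; HANDLER to REQUEST, BODY and the elements of STATE, then return the
;; response, the body, and the extra return values collected as a list.
(define (handle-request* handler request body state)
  (call-with-values
      (lambda () (apply handler request body state))
    (lambda (response response-body . new-state)
      (values response response-body new-state))))

;; Simulate N requests, feeding each call's new state into the next.
;; Returns the list of response bodies and the final state.
(define (simulate n)
  (let loop ((i 0) (state '(1)) (bodies '()))
    (if (= i n)
        (values (reverse bodies) state)
        (call-with-values
            (lambda ()
              (handle-request* counting-handler 'fake-request #f state))
          (lambda (response body new-state)
            (loop (+ i 1) new-state (cons body bodies)))))))
```

With the real thing, the initial state would presumably be passed as
the trailing arguments to `run-server', after IMPL and OPEN-PARAMS.
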
> 
>  -- Function: sanitize-response request response body
>      "Sanitize" the given response and body, making them appropriate
>      for the given request.
> 
>      As a convenience to web handler authors, RESPONSE may be given as
>      an alist of headers, in which case it is used to construct a
>      default response. Ensures that the response version corresponds to
>      the request version. If BODY is a string, encodes the string to a
>      bytevector, in an encoding appropriate for RESPONSE. Adds a
>      `content-length' and `content-type' header, as necessary.
> 
>      If BODY is a procedure, it is called with a port as an argument,
>      and the output collected as a bytevector. In the future we might
>      try to instead use a compressing, chunk-encoded port, and call
>      this procedure later, in the write-client procedure. Authors are
>      advised not to rely on the procedure being called at any
>      particular time.
> 
>  -- Function: write-client impl server client response body
>      Write an HTTP response and body to CLIENT. If the server and
>      client support persistent connections, it is the implementation's
>      responsibility to keep track of the client thereafter, presumably
>      by attaching it to the SERVER argument somehow.

Ah, interesting, I guess this is what removes the
one-request-per-connection assumption.

>  -- Function: close-server impl server
>      Release resources allocated by a previous invocation of
>      `open-server'.
> 
>    Given the procedures above, it is a small matter to make a web
> server:
> 
>  -- Function: serve-one-client handler impl server state
>      Read one request from SERVER, call HANDLER on the request and
>      body, and write the response to the client. Returns the new state
>      produced by the handler procedure.
> 
>  -- Function: run-server handler [impl] [open-params] . state
>      Run Guile's built-in web server.
> 
>      HANDLER should be a procedure that takes two or more arguments,
>      the HTTP request and request body, and returns two or more values,
>      the response and response body.
> 
>      For example, here is a simple "Hello, World!" server:
> 
>            (define (handler request body)
>              (values '((content-type . ("text/plain")))
>                      "Hello, World!"))
>            (run-server handler)
> 
>      The response and body will be run through `sanitize-response'
>      before sending back to the client.
> 
>      Additional arguments to HANDLER are taken from STATE.  Additional
>      return values are accumulated into a new STATE, which will be used
>      for subsequent requests. In this way a handler can explicitly
>      manage its state.
> 
>      The default server implementation is `http', which accepts
>      OPEN-PARAMS like `(#:port 8081)', among others. See "Web Server"
>      in the manual, for more information.

Last sentence should be removed from the manual version of the
docstring.

> \x1f
> File: guile.info,  Node: Web Examples,  Prev: Web Server,  Up: Web
> 
> 7.3.7 Web Examples
> ------------------
> 
> Well, enough about the tedious internals.  Let's make a web application!
> 
> 7.3.7.1 Hello, World!
> .....................
> 
> The first program we have to write, of course, is "Hello, World!".
> This means that we have to implement a web handler that does what we
> want.

The thunder here has been somewhat stolen by the fact that you already
presented this example above!

>    Now we define a handler, a function of two arguments and two return
> values:
> 
>      (define (handler request request-body)
>        (values RESPONSE RESPONSE-BODY))
> 
>    In this first example, we take advantage of a short-cut, returning an
> alist of headers instead of a proper response object. The response body
> is our payload:
> 
>      (define (hello-world-handler request request-body)
>        (values '((content-type . ("text/plain")))
>                "Hello World!"))
> 
>    Now let's test it, by running a server with this handler. Load up the
> web server module if you haven't yet done so, and run a server with this
> handler:
> 
>      (use-modules (web server))
>      (run-server hello-world-handler)
> 
>    By default, the web server listens for requests on `localhost:8080'.
> Visit that address in your web browser to test.  If you see the string,
> `Hello World!', sweet!
> 
> 7.3.7.2 Inspecting the Request
> ..............................
> 
> The Hello World program above is a general greeter, responding to all
> URIs.  To make a more exclusive greeter, we need to inspect the request
> object, and conditionally produce different results.  So let's load up
> the request, response, and URI modules, and do just that.
> 
>      (use-modules (web server)) ; you probably did this already
>      (use-modules (web request)
>                   (web response)
>                   (web uri))
> 
>      (define (request-path-components request)
>        (split-and-decode-uri-path (uri-path (request-uri request))))
> 
>      (define (hello-hacker-handler request body)
>        (if (equal? (request-path-components request)
>                    '("hacker"))
>            (values '((content-type . ("text/plain")))
>                    "Hello hacker!")
>            (not-found request)))
> 
>      (run-server hello-hacker-handler)
> 
>    Here we see that we have defined a helper to return the components of
> the URI path as a list of strings, and used that to check for a request
> to `/hacker/'. Then the success case is just as before - visit
> `http://localhost:8080/hacker/' in your browser to check.
> 
>    You should always match against URI path components as decoded by
> `split-and-decode-uri-path'. The above example will work for
> `/hacker/', `//hacker///', and `/h%61ck%65r'.
> 
>    But we forgot to define `not-found'!  If you are pasting these
> examples into a REPL, accessing any other URI in your web browser will
> drop your Guile console into the debugger:
> 
>      <unnamed port>:38:7: In procedure module-lookup:
>      <unnamed port>:38:7: Unbound variable: not-found
> 
>      Entering a new prompt.  Type `,bt' for a backtrace or `,q' to continue.
>      scheme@(guile-user) [1]>
> 
>    So let's define the function, right there in the debugger.  As you
> probably know, we'll want to return a 404 response.
> 
>      ;; Paste this in your REPL
>      (define (not-found request)
>        (values (build-response #:code 404)
>                (string-append "Resource not found: "
>                               (unparse-uri (request-uri request)))))
> 
>      ;; Now paste this to let the web server keep going:
>      ,continue

Cool, I didn't know Guile could do that!

>    Now if you access `http://localhost:8080/foo/', you get this error
> message.  (Note that some popular web browsers won't show
> server-generated 404 messages, showing their own instead, unless the 404
> message body is long enough.)
> 
> 7.3.7.3 Higher-Level Interfaces
> ...............................
> 
> The web handler interface is a common baseline that all kinds of Guile
> web applications can use.  You will usually want to build something on
> top of it, however, especially when producing HTML.  Here is a simple
> example that builds up HTML output using SXML (*note sxml simple::).
> 
>    First, load up the modules:
> 
>      (use-modules (web server)
>                   (web request)
>                   (web response)
>                   (sxml simple))
> 
>    Now we define a simple templating function that takes a list of HTML
> body elements, as SXML, and puts them in our super template:
> 
>      (define (templatize title body)
>        `(html (head (title ,title))
>               (body ,@body)))
> 
>    For example, the simplest Hello HTML can be produced like this:
> 
>      (sxml->xml (templatize "Hello!" '((b "Hi!"))))
>      -|
>      <html><head><title>Hello!</title></head><body><b>Hi!</b></body></html>
> 
>    Much better to work with Scheme data types than to work with HTML as
> strings. Now we define a little response helper:
> 
>      (define* (respond #:optional body #:key
>                        (status 200)
>                        (title "Hello hello!")
>                        (doctype "<!DOCTYPE html>\n")
>                        (content-type-params '(("charset" . "utf-8")))
>                        (content-type "text/html")
>                        (extra-headers '())
>                        (sxml (and body (templatize title body))))
>        (values (build-response
>                 #:code status
>                 #:headers `((content-type
>                              . (,content-type ,@content-type-params))
>                             ,@extra-headers))
>                (lambda (port)
>                  (if sxml
>                      (begin
>                        (if doctype (display doctype port))
>                        (sxml->xml sxml port))))))
> 
>    Here we see the power of keyword arguments with default
> initializers. By the time the arguments are fully parsed, the `sxml'
> local variable will hold the templated SXML, ready for sending out to
> the client.
> 
>    Instead of returning the body as a string, here we give a
>    procedure,

Insert "Also, " before "Instead"?  Otherwise this reads as moving onto a
new example.

> which will be called by the web server to write out the response to the
> client.
> 
>    Now, a simple example using this responder, which lays out the
> incoming headers in an HTML table.
> 
>      (define (debug-page request body)
>        (respond
>         `((h1 "hello world!")
>           (table
>            (tr (th "header") (th "value"))
>            ,@(map (lambda (pair)
>                     `(tr (td (tt ,(with-output-to-string
>                                     (lambda () (display (car pair))))))
>                          (td (tt ,(with-output-to-string
>                                     (lambda ()
>                                       (write (cdr pair))))))))
>                   (request-headers request))))))
> 
>      (run-server debug-page)
> 
>    Now if you visit any local address in your web browser, we actually
> see some HTML, finally.
> 
> 7.3.7.4 Conclusion
> ..................
> 
> Well, this is about as far as Guile's built-in web support goes, for
> now.  There are many ways to make a web application, but hopefully by
> standardizing the most fundamental data types, users will be able to
> choose the approach that suits them best, while also being able to
> switch between implementations of the server.  This is a relatively new
> part of Guile, so if you have feedback, let us know, and we can take it
> into account.  Happy hacking on the web!

Thanks, a fun read!

     Neil



* Re: documentation for (web ...)
  2010-12-23 23:51   ` Neil Jerram
@ 2010-12-26 17:45     ` Andy Wingo
  2011-01-21 23:05       ` Neil Jerram
  2011-01-11  6:52     ` Andy Wingo
  1 sibling, 1 reply; 7+ messages in thread
From: Andy Wingo @ 2010-12-26 17:45 UTC (permalink / raw)
  To: Neil Jerram; +Cc: Ludovic Courtès, guile-devel

Heya Neil,

Happy holidays, and thanks for the comments!

On Thu 23 Dec 2010 18:51, Neil Jerram <neil@ossau.uklinux.net> writes:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> I’m not keen on the comparison to POSIX.
>
> I agree with Ludo that the comparison with POSIX isn't quite right.

Sure.  It's truthy but not true, I think ;-)  I'll revise it.

> [After reading all through: I'd say you've demonstrated that data types
> are good, but haven't shown any link with security problems, so the hook
> here remains dangling.]

Will add some context.  Here are some examples though.

  * "Cross-site scripting" (XSS) is where a user submits something to a
    web site, which is incorporated into that website (like a
    comment). For example:

        (define (bad submission)
          (string-append "<b>You entered: " submission "</b>"))

        (define (good submission)
          `(b "You entered: " ,submission))

    In the first case, the application works with text. In the second,
    it works with SXML, and something in the continuation does a
    sxml->xml on the composed result.

    Both `bad' and `good' are the same for a submission of "Hello".

    But for a submission of "<i>foo</i>", `bad' yields "<b>You entered:
    <i>foo</i></b>", which effectively treats the submission with the
    same status as the template -- you can paste in anything.

    On the other hand, `good' would produce "<b>You entered:
    &lt;i&gt;foo&lt;/i&gt;</b>", when serialized.  The submission is in
    a textual context, so it is treated as text and not as HTML.

    This seems somewhat academic, but XSS vulnerabilities occur exactly
    for this reason: treating both template and input as text, instead
    of using data types to prove certain characteristics about your web
    application (in this case, that user-submitted text will never
    become javascript, executing within your domain's privileges).

    http://en.wikipedia.org/wiki/Cross-site_scripting

  * "Cross-site request forgery" (CSRF) often involves a dynamic
    payload, generated by an XSS attack. For example I might comment on
    your web log, exploiting an XSS vulnerability, adding some
    javascript on your web site which will then be run by all
    viewers. That javascript could then perform a CSRF attack.

    Anyway, XSS is often CSRF, and the above XSS arguments apply.

    http://en.wikipedia.org/wiki/Cross-site_request_forgery

  * URL encoding attacks: decoding and encoding URLs is tricky. Using a
    separate data type for URLs with a limited number of operations on
    it can help you to make sure you are doing the right
    thing. Furthermore, using proper data types to parse path and query
    components helps avoid a number of the ad-hoc string-parsing errors
    that one might have.

    http://www.technicalinfo.net/papers/URLEmbeddedAttacks.html was the
    first hit I found on the various issues.

    I wrote a short article on URLs here:
    http://wingolog.org/archives/2010/12/23/doing-it-wrong

  * Viewing header values as strings rather than instances of
    particular data types -> "HTTP response splitting"

    http://en.wikipedia.org/wiki/HTTP_response_splitting

    Note that we are not yet entirely "protected" from this
    issue. http://lwn.net/Articles/419350/ for a recent vulnerability.

Hey, thinking on this answer a bit: perhaps the key issue is
composability.  Strings don't compose nicely, because they are missing
information: do they need escaping or not, and if they do, what kind
of escaping?  Many vulnerabilities result from confusing this issue.
Using other data types (SXML, URI records, the request/response records)
allows the data to speak for themselves.  You can safely encode your
assumptions in types.
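
To make the XSS case concrete, here is a runnable version of the `bad'
and `good' procedures from above.  It is only a sketch: `render' is a
helper made up for this mail, and it assumes `sxml->xml' from (sxml
simple) writes to the current output port when no port is given.

```scheme
(use-modules (sxml simple))

;; Text-based templating: the submission is spliced in as markup.
(define (bad submission)
  (string-append "<b>You entered: " submission "</b>"))

;; SXML-based templating: the submission stays a text node.
(define (good submission)
  `(b "You entered: " ,submission))

;; Hypothetical helper: serialize an SXML tree to a string.
(define (render sxml)
  (with-output-to-string
    (lambda () (sxml->xml sxml))))

(define attack "<i>foo</i>")

(display (bad attack))
(newline)
;; prints: <b>You entered: <i>foo</i></b>

(display (render (good attack)))
(newline)
;; prints: <b>You entered: &lt;i&gt;foo&lt;/i&gt;</b>
```

The escaping happens once, at serialization time, so `good' does not
need to know whether its argument is trustworthy.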

>>    Also, strictly speaking, a URI with a fragment is a "URI reference".
>> A fragment is typically not serialized when sending a URI over the
>> wire; that is, it is not part of the identifier of a resource.  It only
>> identifies a part of a given resource.
>
> I found that a bit tricky to understand.  I think an example of what you
> mean is that a web browser would only request the URI up to and
> excluding the #, and process the #... part itself (by scrolling to that
> point in the page).  It might help to say that.

Yes, will do.

>>  -- Function: build-uri scheme [#:userinfo] [#:host] [#:port] [#:path]
>>           [#:query] [#:fragment] [#:validate?]
>
> Why is the path arg not mandatory?

Because it has a default value: "". The path arg in "my-foo-scheme:" is
"".

>>  -- Function: declare-default-port! scheme port
>>      Declare a default port for the given URI scheme.
>> 
>>      Default ports are for printing URI objects: a default port is not
>>      printed.
>
> Does this really belong here?  Seems like mixing a bit of the `model'
> into the `view'.  I'd expect a URI without an explicit port to give
>
>   (uri-port uri) => #f

This is the case.

> and that if I do
>
>   (set! (uri-port uri) 80)

Note that we actually don't export port accessors -- you have to make a
new URI.

We could export these accessors though.

> the :80 would be there in the string representation of the URI.

Quoth the RFC (3986):

    6.2.3.  Scheme-Based Normalization

       The syntax and semantics of URIs vary from scheme to scheme, as
       described by the defining specification for each scheme.
       Implementations may use scheme-specific rules, at further processing
       cost, to reduce the probability of false negatives.  For example,
       because the "http" scheme makes use of an authority component, has a
       default port of "80", and defines an empty path to be equivalent to
       "/", the following four URIs are equivalent:

          http://example.com
          http://example.com/
          http://example.com:/
          http://example.com:80/

       In general, a URI that uses the generic syntax for authority with an
       empty path should be normalized to a path of "/".  Likewise, an
       explicit ":port", for which the port is empty or the default for the
       scheme, is equivalent to one where the port and its ":" delimiter are
       elided and thus should be removed by scheme-based normalization.  For
       example, the second URI above is the normal form for the "http"
       scheme.

So this default port stuff is a poor-man's scheme-specific
normalization, to not display a port component in a serialization, if
the port is the default for the scheme.
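
Roughly, the idea is as in the sketch below.  This is not the (web uri)
code: the `default-ports' table, `default-port?' and
`serialize-authority' are made-up names, and only the scheme, host and
port are handled.

```scheme
;; Hypothetical table mapping URI schemes to their default ports.
(define default-ports '((http . 80) (https . 443)))

;; True if PORT is absent, or is the declared default for SCHEME.
(define (default-port? scheme port)
  (or (not port)
      (eqv? port (assq-ref default-ports scheme))))

;; Serialize scheme, host and optional port, eliding a default port.
(define (serialize-authority scheme host port)
  (string-append (symbol->string scheme) "://" host
                 (if (default-port? scheme port)
                     ""
                     (string-append ":" (number->string port)))
                 "/"))

(display (serialize-authority 'http "example.com" 80))
(newline)
;; prints: http://example.com/

(display (serialize-authority 'http "example.com" 8080))
(newline)
;; prints: http://example.com:8080/
```

The URI object itself keeps whatever port it was built with; only the
printed form is normalized.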

>>  -- Function: parse-uri string
>>      Parse STRING into a URI object. Returns `#f' if the string could
>>      not be parsed.
>> 
>>  -- Function: unparse-uri uri
>>      Serialize URI to a string.
>
> Or uri->string ?  And I guess parse-uri could be string->uri.
> Cf. string->number and number->string.

Sure. I think I will add a keyword arg to switch between throwing errors
and returning #f, also.

>>  -- Function: uri-decode str [#:charset]
>>      Percent-decode the given STR, according to CHARSET.
>
> So the return value is a bytevector if CHARSET is #f, and a string if
> not?

Yes.

>>  -- Function: uri-encode str [#:charset] [#:unescaped-chars]
>>      Percent-encode any character not in UNESCAPED-CHARS.
>
> UNESCAPED-CHARS is a vector, a list, ...?

A character set, actually... Will indicate.

>>      Percent-encoding first writes out the given character to a
>
> s/character/string

Actually this is correct, it's just the docstring which is odd:
"percent-encode any _character_ not in unescaped-chars". Will reword
though.

>>  -- Function: split-and-decode-uri-path path
>>      Split PATH into its components, and decode each component,
>>      removing empty components.
>> 
>>      For example, `"/foo/bar/"' decodes to the two-element list,
>>      `("foo" "bar")'.
>
> Presumably this does % decoding too, so it would be good to give another
> example to show that.

Good idea.
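
Something along these lines, say.  This assumes
`split-and-decode-uri-path' stays exported from (web uri) under that
name; `show' is just a helper for this mail.

```scheme
(use-modules (web uri))

;; Hypothetical helper: print the decoded components of PATH.
(define (show path)
  (write (split-and-decode-uri-path path))
  (newline))

(show "/foo/bar/")      ; prints: ("foo" "bar")
(show "/h%61ck%65r")    ; prints: ("hacker")
(show "//hacker///")    ; prints: ("hacker")
(show "/a%2Fb/c")       ; prints: ("a/b" "c")
```

The last case is the reason to split before decoding: an encoded slash
stays inside its component instead of creating a new one.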

>>     `parser'
>>           A procedure which takes a string and returns a parsed value.
>> 
>>     `validator'
>>           A predicate, returning `#t' iff the value is valid for this
>>           header.
>
> Maybe say something here about validator function often being very
> similar to parsing function?

They are not quite the same. A parser takes a string and produces a
Scheme value, and a validator takes a Scheme value and returns #t iff
the value is valid for that header. I will add an example.
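
To illustrate the difference, here is a sketch for a header whose value
is a non-negative integer, like Content-Length.  Both procedure names
are made up for this mail and are not what (web http) defines.

```scheme
;; Parser: takes the header text, returns a Scheme value.
;; Returns #f if the text is not a non-negative decimal integer.
(define (parse-non-negative-integer str)
  (let ((n (string->number (string-trim-both str) 10)))
    (and n (exact? n) (integer? n) (>= n 0) n)))

;; Validator: takes a Scheme value, returns a boolean.
(define (valid-non-negative-integer? val)
  (and (integer? val) (exact? val) (>= val 0)))

(write (parse-non-negative-integer " 42 "))     ; prints: 42
(newline)
(write (valid-non-negative-integer? 42))        ; prints: #t
(newline)
(write (valid-non-negative-integer? "42"))      ; prints: #f
(newline)
```

So the validator is what you would run on values supplied by a program
building a message, where no string was ever parsed.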

>>  -- Function: declare-header! sym name [#:multiple?] [#:parser]
>>           [#:validator] [#:writer]
>>      Make a header declaration, as above, and register it by symbol and
>>      by name.
>
> Are the keyword args really optional?  If so, what are the defaults?

Only `multiple?' is optional. Will indicate the defaults for these and
other kwargs. They are keyword args just to allow for extensibility, and
for declare-header! invocations to read better.

> A possibly important point: what is the scope of the space in which
> these header declarations are made?  My reason for asking is that this
> infrastructure looks applicable for other HTTP-like protocols too, such
> as SIP.  But the detailed rules for a given header in SIP may be
> different from a header with the same name in HTTP, and hence different
> header-decl objects would be needed.  Therefore, even though we claim no
> other protocol support right now, perhaps we should anticipate that by
> enhancing declare-header! so as to distinguish between HTTP-space and
> other-protocol-spaces.

It's a good question. HTTP is deliberately MIME-like, but specifies a
number of important differences (see appendix 19.4 of RFC 2616).

For now, the scope is limited to HTTP headers.

> [After reading all through, I remain confused about exactly how general
> this server infrastructure is intended to be]

The ultimate intention is to allow the "web handler" stuff I mentioned
at the end of the section, and to allow the web app author to not care
very much what server is being used. To do this, we have to allow all
kinds of "server" implementations -- CGI, direct HTTP to a socket,
zeromq messages, etc. Regardless of how the request comes and the
response goes, we need to be able to recognize and parse HTTP headers
into their various appropriate data types -- and (web http) is really
the middle, here. The (web server) stuff is a higher-level abstraction
-- not a necessary abstraction, but helpful, if you can use it.

>>  -- Function: lookup-header-decl name
>>      Return the HEADER-DECL object registered for the given NAME.
>> 
>>      NAME may be a symbol or a string. Strings are mapped to headers in
>>      a case-insensitive fashion.
>> 
>>  -- Function: valid-header? sym val
>>      Returns a true value iff VAL is a valid Scheme value for the
>>      header with name SYM.
>
> Note slight inconsistency in the two above deffns: "Return" vs
> "Returns".

Which is the right one? "Return"? Will change.

>>    Now that we have a generic interface for reading and writing
>> headers, we do just that.
>> 
>>  -- Function: read-header port
>>      Reads one HTTP header from PORT. Returns two values: the header
>>      name and the parsed Scheme value.
>
> As multiple values?  Is that more helpful than as a cons?

Yes, as multiple values. The advantage is that returning multiple values
from a Scheme procedure does not cause any allocation.
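
A rough sketch of what that looks like from the caller's side. The
helper `split-header-line' is hypothetical and only stands in for
something like read-header; the point is that the consumer receives two
values directly, without a pair being consed up just to carry them:

```scheme
;; Hypothetical helper: split "Name: value" into two return values.
(define (split-header-line line)
  (let ((colon (string-index line #\:)))
    (if (not colon)
        (error "Bad header line" line)
        (values (string-trim-both (substring line 0 colon))
                (string-trim-both
                 (substring line (+ colon 1) (string-length line)))))))

;; The consumer binds both values as ordinary arguments.
(define (describe-header line)
  (call-with-values (lambda () (split-header-line line))
    (lambda (name value)
      (string-append name " => " value))))
```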

>>      Returns #F for both values if the end of the message body was
>>      reached (i.e., a blank line).
>
> I'd find #<eof> more intuitive.

OK, will change.

>>  -- Function: parse-header name val
>>      Parse VAL, a string, with the parser for the header named NAME.
>> 
>>      Returns two values, the header name and parsed value. If a parser
>>      was found, the header name will be returned as a symbol. If a
>>      parser was not found, both the header name and the value are
>>      returned as strings.
>
> Again, multiple values or a cons?

Multiple values.

>>  -- Function: read-headers port
>>      Read an HTTP message from PORT, returning the headers as an
>>      ordered alist.
>
> s/Read/Read the headers of/  ?  i.e. Should the caller have already read
> the request/response line?

Indeed, the headers of. Will change.

>> The `(web http)' module defines parsers and unparsers for all headers
>> defined in the HTTP/1.1 standard.  This section describes the parsed
>> format of the various headers.
>> 
>>    We cannot describe the function of all of these headers, however, in
>> sufficient detail.
>
> I don't get the point here.

Do you mean that the reason is not apparent at this point in the
document? I don't think the intro is worded very well, and indeed it
appears to be a bit of buildup without knowing where you go... Maybe an
example in the beginning would be apropos?

Or should we give brief descriptions of the meanings of all of these
headers as well? That might be a good idea too.

>> `transfer-encoding'
>>      A param list of transfer codings.  `chunked' is the only known key.
>
> OK, why a param list rather than key-value?  How are elements in the
> second key-value list, say, different from elements in the first
> key-value list?

Well, some of these headers are quite unfortunate in their
construction.  In this case:

     Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding

So really, this is a list. But:

       transfer-coding         = "chunked" | transfer-extension
       transfer-extension      = token *( ";" parameter )
       parameter               = attribute "=" value
       attribute               = token
       value                   = token | quoted-string

Given that a transfer-extension is really a token with a number of
parameters, the thing gets complicated. You could have:

  Transfer-Encoding: chunked,abcd,newthing;foo="bar, baz; qux";xyzzy

which is hard to parse if you do it ad-hoc.  (web http) parses it as:

   (transfer-encoding . ((chunked) ("abcd") ("newthing" ("foo" . "bar, baz; qux") "xyzzy")))

Still complicated, but more uniform at least. Saying that `chunked' is
the only known key means that it's the only one that's translated to a
symbol; i.e. `abcd' is parsed to a string.  (This is to prevent attacks
that intern a bunch of symbols, though symbols can be gc'd in Guile.)
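
As a sketch of how calling code might pick such a value apart, assuming
the representation stays as in the example (known keys as symbols,
everything else as strings). The helper names are invented for
illustration:

```scheme
(use-modules (srfi srfi-1))

;; The parsed value from the example, written out as a quoted list.
(define codings
  '((chunked) ("abcd") ("newthing" ("foo" . "bar, baz; qux") "xyzzy")))

;; Known codings are symbols, so assq (eq?) is enough.
(define (chunked? codings)
  (and (assq 'chunked codings) #t))

;; Unknown codings and their parameter names stay strings, so they
;; have to be compared with equal?.  Returns the parameter value, or
;; #f if the coding or the parameter is absent.
(define (coding-param codings name key)
  (let ((coding (assoc name codings)))
    (and coding
         (let ((param (find (lambda (p)
                              (and (pair? p) (equal? (car p) key)))
                            (cdr coding))))
           (and param (cdr param))))))
```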

Does that help? I'll see about replacing usages of "param list" with "list
of key-value lists", as it's probably clearer, and we can save ourselves
a definition.

>> `www-authenticate'
>>      A string.
>
> Obviously there's lots of substructure there (in WWW-Authenticate) that
> we just don't support yet.  Is there a clear compatibility story for
> if/when Guile is enhanced to parse that out?
>
> I guess yes; calling code will just need something like
>
>   (if (string? val)
>       ;; An older Guile that doesn't parse authentication fully.
>       (do-application-own-parsing)
>       ;; A newer Guile that does parse authentication.
>       (use-the-parsed-authentication-object))

That's a very good question. The problem is that if we change the parsed
representation, then old code breaks. That's why I put in the effort to
give (hopefully) good representations for most headers, to avoid that
situation -- though you appear to have caught one laziness on my part
here, and in Authorization, Proxy-Authenticate, and Proxy-Authorization.

So maybe the right thing to do here is just to bite the bullet, parse as
the RFC says we should, and avoid this situation.

>>  -- Function: read-request port [meta]
>>      Read an HTTP request from PORT, optionally attaching the given
>>      metadata, META.
>> 
>>      As a side effect, sets the encoding on PORT to ISO-8859-1
>>      (latin-1), so that reading one character reads one byte. See the
>>      discussion of character sets in "HTTP Requests" in the manual, for
>>      more information.
>
> That last sentence is OK for a docstring, but strange here _in_ the
> manual.

Good point.

> And, where is that discussion?

Heh, good point :)

> Hmm, I think the provision of this data type needs a bit more
> motivation.  It doesn't appear to offer much additional value, compared
> with reading or writing the components of a request individually, and on
> the other hand it appears to bake in assumptions about charsets and
> content length that might not always be true.

I probably didn't explain it very well then. A request record holds the
data from a request -- the method, uri, headers, etc. Additionally it
can be read or written. It does not actually bake in assumptions about
character sets or the like. It's simply that the HTTP protocol is
flawed in this respect, that it mixes textual and binary data. We want
to be able to read and parse requests, responses, and their headers
using string and char routines, and that's fine as the character set for
HTTP messages is restricted to a subset of the lower ASCII set. But then
the body of an HTTP message is fundamentally binary -- the
content-length is specified in bytes, not characters.

So the right way to read off a body is as a bytevector of the specified
length (potentially with chunked transfer encoding of course, though we
don't do that yet). Then if you want text, you decode using the
character set specified in the request. If you are particularly lucky
and it is a textual type and the charset is not specified, you can read
it as a latin-1 string directly, otherwise you convert. Or you can deal
with the binary data as a string.

Setting the charset on the port is a bit of a hack, but it is the right
thing to do if you are reading HTTP. And it doesn't matter what the
charset is when you read the body as it's specified in bytes anyway and
should be read in bytes (and then, possibly, decoded).
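
A small sketch of "read bytes, then decode", using an in-memory port in
place of a socket. The helper name `read-body-bytes' is an assumption
for illustration, not something (web request) is claimed to export, and
UTF-8 merely stands in for whatever charset the message declares:

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Hypothetical helper: read exactly LEN bytes of body from PORT.
(define (read-body-bytes port len)
  (let ((bv (get-bytevector-n port len)))
    (if (or (eof-object? bv)
            (< (bytevector-length bv) len))
        (error "Body shorter than Content-Length" len)
        bv)))

;; h, e-acute, l, l, o: 5 characters, but 6 bytes in UTF-8.
(define text (string #\h (integer->char 233) #\l #\l #\o))
(define body-port (open-bytevector-input-port (string->utf8 text)))

(define body (read-body-bytes body-port 6)) ; length counted in bytes
(define decoded (utf8->string body))        ; decoding is a separate step
```

Here a content-length of 5 would have cut the last character in half,
which is why the length has to be treated as a byte count.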

Some more organized discussion should go in the manual... but what do
you think?

>>  -- Function: extend-response r k v . additional
>>      Extend an HTTP response by setting additional HTTP headers K, V.
>>      Returns a new HTTP response.
>
> What does the ADDITIONAL arg mean?

More k-v pairs. Will note in the manual.
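
Roughly this shape, sketched over a plain alist rather than a response
record. `extend-alist' is a made-up name, and this is not the actual
extend-response code:

```scheme
;; Add K, V and any ADDITIONAL key/value arguments to ALIST,
;; returning a new alist; the original is not modified.
(define (extend-alist alist k v . additional)
  (let lp ((alist (acons k v alist)) (rest additional))
    (cond
     ((null? rest) alist)
     ((null? (cdr rest))
      (error "Odd number of key/value arguments" rest))
     (else
      (lp (acons (car rest) (cadr rest) alist) (cddr rest))))))
```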

>>  -- Function: adapt-response-version response version
>>      Adapt the given response to a different HTTP version. Returns a
>>      new HTTP response.
>
> Interesting, and adds more value to the idea of the response object.
> Why not for the request object too - are you assuming that Guile will
> usually be acting as the HTTP server?  (Which I'm sure is correct, but
> "usually" is not "always".)

The thing is that the request initiates the transaction -- so it's the
requestor that makes the version decision. If you want to decide on
another version, presumably you do so when you build-request. But
perhaps for some sort of "request middleware", this could be
interesting.

>>   1. The `open' hook is called, to open the server. `open' takes 0 or
>>      more arguments, depending on the backend,
>
> How is that possible?  (immediate thought... perhaps it will be
> explained later)

Yes let's make sure we explain that later (we don't yet). "depending on
the backend" should make that clear.

>>   2. The `read' hook is called, to read a request from a new client.
>>      The `read' hook takes one argument, the server socket.  It should
>
> It feels surprising for the infrastructure to pass the server socket to
> the read hook.  I'd expect the infrastructure to do the `accept' itself
> and pass the client socket to the read hook.

It's the opaque "server socket object". Doing it this way has two
advantages:

  1) Works with other socket architectures (zeromq, particularly).

  2) Allows the server to make its own implementation of keepalive
  (or not).

Particularly the latter is interesting -- the http implementation makes
a big pollset (from (ice-9 poll), not yet documented), and polls on the
server socket and all the keepalive sockets.

> Also, does the infrastructure assume that each client socket will only
> be used for one request and response, and then closed?  Would it be hard
> to remove that assumption, so that the <server-impl> idea is more
> general?

I don't think it makes that assumption, no. Client lifecycle is totally
up to the implementation. I think the deal is you check out the client,
run the handler, then always call the "write" hook, even if there was
an error in the handler. If the request fails to be read, the read hook
should not return a socket.
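
A sketch of that lifecycle, under the assumption that the read hook
returns three values (client, request, body, or #f #f #f) and the
write hook takes server, client, response and body. The hooks are
passed in as plain procedures here, and the error "response" is just a
placeholder symbol, so this is not the real (web server) loop:

```scheme
(define (serve-one server read-hook handler write-hook)
  (call-with-values (lambda () (read-hook server))
    (lambda (client request body)
      ;; No client means the read failed; nothing to write.
      (and client
           (let ((result
                  (catch #t
                    (lambda ()
                      ;; Collect the handler's two values in a list.
                      (call-with-values
                          (lambda () (handler request body))
                        list))
                    (lambda (key . args)
                      ;; Placeholder; a real server would build a
                      ;; proper error response object here.
                      (list 'internal-server-error #f)))))
             ;; The write hook runs whether or not the handler failed.
             (write-hook server client (car result) (cadr result)))))))
```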

>>      case a default response object is constructed with those headers.
>
> What about response status?  (perhaps represented as a "status" header,
> a la modlisp)

The response object has that in it -- it will be 200 by default. If you
want another one, it's easy to build-response #:code xxx #:headers
header, no?

>>    The `(web server)' module defines a number of routines that use
>> `<server-impl>' objects to implement parts of a web server.  Given that
>> we don't expose the accessors for the various fields of a
>> `<server-impl>', indeed these routines are the only procedures with any
>> access to the impl objects. 
>
> How general is <server-impl> hoping to be?  Correspondingly, is the (web
> server) module name appropriate?
>
> To me, "web" => "http", so (web server http) is a tautological name.
> And in fact it sounds like you intend <server-impl> to cover more than
> just web/HTTP, so I suppose it should be in a module like (server),
> rather than (web server).
>
> It seems we could do with some more server impls in order to validate
> that the infrastructure is all defined correctly.  Time-permitting, I'd
> like to play with writing modlisp support for this new system, analogous
> to what I did already in guile-www.

I have written mod-lisp support. You can see it in tekuti/mod-lisp.scm,
attached here:


[-- Attachment #2: (tekuti mod-lisp) --]
[-- Type: text/plain, Size: 9341 bytes --]

;; Tekuti
;; Copyright (C) 2008, 2010 Andy Wingo <wingo at pobox dot com>

;; This program is free software; you can redistribute it and/or    
;; modify it under the terms of the GNU General Public License as   
;; published by the Free Software Foundation; either version 3 of   
;; the License, or (at your option) any later version.              
;;                                                                  
;; This program is distributed in the hope that it will be useful,  
;; but WITHOUT ANY WARRANTY; without even the implied warranty of   
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the    
;; GNU General Public License for more details.                     
;;                                                                  
;; You should have received a copy of the GNU General Public License
;; along with this program; if not, contact:
;;
;; Free Software Foundation           Voice:  +1-617-542-5942
;; 59 Temple Place - Suite 330        Fax:    +1-617-542-2652
;; Boston, MA  02111-1307,  USA       gnu@gnu.org

;;; Commentary:
;;
;; Web server implementation for mod-lisp.
;;
;;; Code:

(define-module (tekuti mod-lisp)
  #:use-module (ice-9 rdelim)
  #:use-module (system repl error-handling)
  #:use-module (srfi srfi-9)
  #:use-module (ice-9 poll)
  #:use-module (rnrs bytevectors)
  #:use-module (web http)
  #:use-module (web request)
  #:use-module (web response)
  #:use-module (web server))


;;; FIXME: ignore SIGPIPE, otherwise apache dying will kill us

(define *mod-lisp-headers* (make-hash-table))

(define (define-mod-lisp-header! sym name parser)
  (hash-set! *mod-lisp-headers* name (cons sym parser)))

(define (mod-lisp-sym-and-parser name)
  (hash-ref *mod-lisp-headers* name))

(define-mod-lisp-header! 'server-protocol
  "server-protocol"
  parse-http-version)

(define-mod-lisp-header! 'method
  "method"
  parse-http-method)

(define-mod-lisp-header! 'url
  "url"
  parse-request-uri)

(define-mod-lisp-header! 'server-ip-addr
  "server-ip-addr"
  identity)

(define-mod-lisp-header! 'server-ip-port
  "server-ip-port"
  string->number)

(define-mod-lisp-header! 'remote-ip-addr
  "remote-ip-addr"
  identity)

(define-mod-lisp-header! 'remote-ip-port
  "remote-ip-port"
  string->number)

(define-mod-lisp-header! 'server-id
  "server-id"
  identity)

(define-mod-lisp-header! 'server-ip-addr
  "server-ip-addr"
  identity)

(define-mod-lisp-header! 'server-baseversion
  "server-baseversion"
  identity)

(define-mod-lisp-header! 'modlisp-version
  "modlisp-version"
  identity)

(define-mod-lisp-header! 'modlisp-major-version
  "modlisp-major-version"
  string->number)

(define (read-headers/mod-lisp socket)
  (define (read-line*)
    (let ((line (read-line socket)))
      (if (eof-object? line)
          (error "unexpected eof")
          line)))
  (let lp ((headers '()) (meta '()))
    (let ((k (read-line*)))
      (if (string=? k "end")
          (values (reverse! headers) (reverse! meta))
          (let ((sym-and-parser (mod-lisp-sym-and-parser k))
                (v (read-line*)))
            (if sym-and-parser
                (lp headers
                    (acons (car sym-and-parser)
                           ((cdr sym-and-parser) v)
                           meta))
                (call-with-values (lambda () (parse-header k v))
                  (lambda (k v)
                    (lp (acons k v headers) meta)))))))))

(define (read-request/mod-lisp port)
  ;; See the note in (web request) regarding chars, bytes, and strings
  ;; for more notes on charsets.
  (set-port-encoding! port "ISO-8859-1")
  (call-with-values (lambda () (read-headers/mod-lisp port))
    (lambda (headers meta)
      (build-request
       #:method (assq-ref meta 'method)
       #:uri (assq-ref meta 'url)
       #:version (assq-ref meta 'server-protocol)
       #:headers headers
       #:meta meta
       #:port port))))

(define (write-header/mod-lisp name val port)
  (if (string? name)
      ;; assume that it's a header we don't know about...
      (begin
        (display name port) (newline port)
        (display val port) (newline port))
      (let ((decl (lookup-header-decl name)))
        (if (not decl)
            (error "Unknown header" name)
            (begin
              (display (header-decl-name decl) port) (newline port)
              ((header-decl-writer decl) val port) (newline port))))))

(define (write-response-line/mod-lisp code phrase port)
  (write-header/mod-lisp "Status"
                         (string-append (number->string code) " " phrase)
                         port))

(define (write-headers/mod-lisp headers port)
  (for-each
   (lambda (pair) 
     (write-header/mod-lisp (car pair) (cdr pair) port))
   headers))

(define (write-response/mod-lisp r port)
  (write-response-line/mod-lisp (response-code r)
                                (response-reason-phrase r) port)
  (write-headers/mod-lisp (response-headers r) port)
  (display "end" port) (newline port)
  (if (eq? port (response-port r))
      r
      (build-response #:version (response-version r)
                      #:code (response-code r)
                      #:reason-phrase (response-reason-phrase r)
                      #:headers (response-headers r)
                      #:port port)))

(define (make-default-socket family addr port)
  (let ((sock (socket PF_INET SOCK_STREAM 0)))
    (setsockopt sock SOL_SOCKET SO_REUSEADDR 1)
    (bind sock family addr port)
    sock))

(define-record-type <mod-lisp-server>
  (make-mod-lisp-server socket poll-idx poll-set)
  mod-lisp-server?
  (socket mod-lisp-socket)
  (poll-idx mod-lisp-poll-idx set-mod-lisp-poll-idx!)
  (poll-set mod-lisp-poll-set))

(define *error-events* (logior POLLHUP POLLERR))
(define *read-events* POLLIN)
(define *events* (logior *error-events* *read-events*))

;; -> server
(define* (mod-lisp-open #:key
                    (host #f)
                    (family AF_INET)
                    (addr (if host
                              (inet-pton family host)
                              INADDR_LOOPBACK))
                    (port 8080)
                    (socket (make-default-socket family addr port)))
  (listen socket 128)
  (sigaction SIGPIPE SIG_IGN)
  (let ((poll-set (make-empty-poll-set)))
    (poll-set-add! poll-set socket *events*)
    (make-mod-lisp-server socket 0 poll-set)))

;; -> (client request body | #f #f #f)
(define (mod-lisp-read server)
  (let* ((poll-set (mod-lisp-poll-set server)))
    (let lp ((idx (mod-lisp-poll-idx server)))
      (let ((revents (poll-set-revents poll-set idx)))
        (cond
         ((zero? idx)
          ;; The server socket, and the end of our downward loop.
          (cond
           ((zero? revents)
            ;; No client ready, and no error; poll and loop.
            (poll poll-set)
            (lp (1- (poll-set-nfds poll-set))))
           ((not (zero? (logand revents *error-events*)))
            ;; An error.
            (throw 'interrupt))
           (else
            ;; A new client. Add to set, poll, and loop.
            ;;
            ;; FIXME: preserve meta-info.
            (let ((client (accept (poll-set-port poll-set idx))))
              ;; Fully buffered.
              (setvbuf (car client) _IOFBF)
              ;; From "HOP, A Fast Server for the Diffuse Web", Serrano.
              (setsockopt (car client) SOL_SOCKET SO_SNDBUF (* 12 1024))
              (poll-set-add! poll-set (car client) *events*)
              (poll poll-set)
              (lp (1- (poll-set-nfds poll-set)))))))
         ((zero? revents)
          ;; Nothing on this port.
          (lp (1- idx)))
         ;; Otherwise, a client socket with some activity on
         ;; it. Remove it from the poll set.
         (else
          (let ((port (poll-set-remove! poll-set idx)))
            (cond
             ((eof-object? (peek-char port))
              ;; EOF.
              (close-port port)
              (lp (1- idx)))
             (else
              ;; Otherwise, try to read a request from this port.
              ;; Record the next index.
              (set-mod-lisp-poll-idx! server (1- idx))
              (with-throw-handler
               #t
               (lambda ()
                 (let ((req (read-request/mod-lisp port)))
                   (values port
                           req
                           (read-request-body/latin-1 req))))
               (lambda (k . args)
                 (false-if-exception (close-port port)))))))))))))

;; -> unspecified values
(define (mod-lisp-write server client response body)
  (let ((response (write-response/mod-lisp response client)))
    (cond
     ((not body))                       ; pass
     ((string? body)
      (write-response-body/latin-1 response body))
     ((bytevector? body)
      (write-response-body/bytevector response body))
     (else
      (error "Expected a string or bytevector for body" body)))
    (close-port (response-port response))))

;; -> unspecified values
(define (mod-lisp-close server)
  (let ((poll-set (mod-lisp-poll-set server)))
    (let lp ((n (poll-set-nfds poll-set)))
      (if (positive? n)
          (begin
            (close-port (poll-set-remove! poll-set (1- n)))
            (lp (1- n)))))))

(define-server-impl mod-lisp
  mod-lisp-open
  mod-lisp-read
  mod-lisp-write
  mod-lisp-close)

[-- Attachment #3: Type: text/plain, Size: 1401 bytes --]


Does that document help? This is only http, not other servers.

> I think there's a one-request-per-connection assumption here, isn't
> there?

Heh, no. See the (web server http) implementation. Perhaps misnamed
though -- should it be (web server socket)? But most servers have
sockets? I chose HTTP as it is the name of the wire protocol, though
other protocols are possible that have that same semantic content
(mod-lisp for example). Other suggestions welcome.

>>      The default server implementation is `http', which accepts
>>      OPEN-PARAMS like `(#:port 8081)', among others. See "Web Server"
>>      in the manual, for more information.
>
> Last sentence should be removed from the manual version of the
> docstring.

ACK.

>> 7.3.7.1 Hello, World!
>> .....................
>
> The thunder here has been somewhat stolen by the fact that you already
> presented this example above!

True! Well, we didn't actually run it, but hey... perhaps elide from the
previous section, and point people here?

>>    Instead of returning the body as a string, here we give a
>>    procedure,
>
> Insert "Also, " before "Instead"?  Otherwise this reads as moving onto a
> new example.

Good point!

Whew! Long mail, but it's a lot of new code and docs, so the feedback is
much appreciated. I've inserted notes in my web.texi, and will poke this
shortly.

Happy holidays,

Andy
-- 
http://wingolog.org/


* Re: documentation for (web ...)
  2010-12-23 23:51   ` Neil Jerram
  2010-12-26 17:45     ` Andy Wingo
@ 2011-01-11  6:52     ` Andy Wingo
  1 sibling, 0 replies; 7+ messages in thread
From: Andy Wingo @ 2011-01-11  6:52 UTC (permalink / raw)
  To: Neil Jerram; +Cc: Ludovic Courtès, guile-devel

Heya,

This web thing has really been a slog, but I think I'm done with it now.
Actually I know that _I_ am done with it ;)

I think I took care of all your concerns.  Let me know if you have
anything you really care about still pending.  But, for now, I am quite
tired of RFC 2616 :)

Andy
-- 
http://wingolog.org/




* Re: documentation for (web ...)
  2010-12-26 17:45     ` Andy Wingo
@ 2011-01-21 23:05       ` Neil Jerram
  2011-01-22 15:01         ` Andy Wingo
  0 siblings, 1 reply; 7+ messages in thread
From: Neil Jerram @ 2011-01-21 23:05 UTC (permalink / raw)
  To: Andy Wingo; +Cc: Ludovic Courtès, guile-devel

[-- Attachment #1: Type: text/plain, Size: 12794 bytes --]

Andy Wingo <wingo@pobox.com> writes:

> Heya Neil,

Hi Andy,

I've properly read all your responses on this now, and basically agree
with them all.  I've just added a few further comments on specific
points below.  I've also looked at the updated docs, and think they're
great.

I noticed a few minor glitches in the doc text, and a patch for those is
attached.  I've also attached patches for updating your (tekuti
mod-lisp) to the latest (web ...) API.

Thanks for working on this area; it's great to have this function in the
Guile core.

> So this default port stuff is a poor-man's scheme-specific
> normalization, to not display a port component in a serialization, if
> the port is the default for the scheme.

Thanks, I see that now.

>>>  -- Function: uri-decode str [#:charset]
>>>      Percent-decode the given STR, according to CHARSET.
>>
>> So the return value is a bytevector if CHARSET is #f, and a string if
>> not?
>
> Yes.

I'm still not completely sure here.  What if STR contains normal
characters as well as possible %XX sequences?  If I call uri-decode with
#:encoding #f, how is each normal character mapped into the resulting
bytevector?

>>>     `parser'
>>>           A procedure which takes a string and returns a parsed value.
>>> 
>>>     `validator'
>>>           A predicate, returning `#t' iff the value is valid for this
>>>           header.
>>
>> Maybe say something here about validator function often being very
>> similar to parsing function?
>
> They are not quite the same. A parser takes a string and produces a
> Scheme value, and a validator takes a Scheme value and returns #t iff
> the value is valid for that header. I will add an example.

Thanks, that makes better sense.  I was previously thinking that both
the validator and the parser acted on the raw header value.

I think there's still a glitch in the doc text, and have proposed an
update in the attached patch.

>> A possibly important point: what is the scope of the space in which
>> these header declarations are made?  My reason for asking is that this
>> infrastructure looks applicable for other HTTP-like protocols too, such
>> as SIP.  But the detailed rules for a given header in SIP may be
>> different from a header with the same name in HTTP, and hence different
>> header-decl objects would be needed.  Therefore, even though we claim no
>> other protocol support right now, perhaps we should anticipate that by
>> enhancing declare-header! so as to distinguish between HTTP-space and
>> other-protocol-spaces.
>
> It's a good question. HTTP is deliberately MIME-like, but specifies a
> number of important differences (see appendix 19.4 of RFC 2616).
>
> For now, the scope is limited to HTTP headers.

OK.

>> [After reading all through, I remain confused about exactly how general
>> this server infrastructure is intended to be]
>
> The ultimate intention is to allow the "web handler" stuff I mentioned
> at the end of the section, and to allow the web app author to not care
> very much what server is being used. To do this, we have to allow all
> kinds of "server" implementations -- CGI, direct HTTP to a socket,
> zeromq messages, etc. Regardless of how the request comes and the
> response goes, we need to be able to recognize and parse HTTP headers
> into their various appropriate data types -- and (web http) is really
> the middle, here. The (web server) stuff is a higher-level abstraction
> -- not a necessary abstraction, but helpful, if you can use it.

Thanks.  After playing with the mod-lisp code, I think I've finally
understood this.  The `web' in `(web ...)' means requests and responses
with the HTTP-defined structure - even if they might be delivered to
the application via something like modlisp or CGI.  Whereas the `http' in
`(web server http)' means delivery directly from/to a socket in HTTP
wire format.

Which is fine.  I daresay there might be a useful future extension to
something like SIP, but there's absolutely no need to try to engineer
that in now.

>>>  -- Function: lookup-header-decl name
>>>      Return the HEADER-DECL object registered for the given NAME.
>>> 
>>>      NAME may be a symbol or a string. Strings are mapped to headers in
>>>      a case-insensitive fashion.
>>> 
>>>  -- Function: valid-header? sym val
>>>      Returns a true value iff VAL is a valid Scheme value for the
>>>      header with name SYM.
>>
>> Note slight inconsistency in the two above deffns: "Return" vs
>> "Returns".
>
> Which is the right one? "Return"? Will change.

I very much doubt that Guile is globally consistent on this; but it was
quite noticeable here.

>>>    Now that we have a generic interface for reading and writing
>>> headers, we do just that.
>>> 
>>>  -- Function: read-header port
>>>      Reads one HTTP header from PORT. Returns two values: the header
>>>      name and the parsed Scheme value.
>>
>> As multiple values?  Is that more helpful than as a cons?
>
> Yes, as multiple values. The advantage is that returning multiple values
> from a Scheme procedure does not cause any allocation.

Ah, OK.

>>> The `(web http)' module defines parsers and unparsers for all headers
>>> defined in the HTTP/1.1 standard.  This section describes the parsed
>>> format of the various headers.
>>> 
>>>    We cannot describe the function of all of these headers, however, in
>>> sufficient detail.
>>
>> I don't get the point here.
>
> Do you mean that the reason is not apparent at this point in the
> document? I don't think the intro is worded very well, and indeed it
> appears to be a bit of buildup without knowing where you go... Maybe an
> example in the beginning would be apropos?

I meant that I didn't understand why "cannot" - rather than, say,
"don't" or "don't want to" - and the meaning of "sufficient detail" -
i.e. sufficient for what?

I think that the text now, "For full details on the meanings of all of
these headers, see the HTTP 1.1 standard, RFC 2616.", is better, and
covers these points.

> Or should we give brief descriptions of the meanings of all of these
> headers as well? That might be a good idea too.

No, I don't think that's needed.

>>> `transfer-encoding'
>>>      A param list of transfer codings.  `chunked' is the only known key.
>>
>> OK, why a param list rather than key-value?  How are elements in the
>> second key-value list, say, different from elements in the first
>> key-value list?
>
> Well, some of these headers are quite unfortunate in their
> construction.  In this case:
>
>      Transfer-Encoding       = "Transfer-Encoding" ":" 1#transfer-coding
>
> So really, this is a list. But:
>
>        transfer-coding         = "chunked" | transfer-extension
>        transfer-extension      = token *( ";" parameter )
>        parameter               = attribute "=" value
>        attribute               = token
>        value                   = token | quoted-string
>
> Given that a transfer-extension is really a token with a number of
> parameters, the thing gets complicated. You could have:
>
>   Transfer-Encoding: chunked,abcd,newthing;foo="bar, baz; qux";xyzzy
>
> which is hard to parse if you do it ad-hoc.  (web http) parses it as:
>
>    (transfer-encoding . ((chunked) ("abcd") ("newthing" ("foo" . "bar, baz; qux") "xyzzy")))
>
> Still complicated, but more uniform at least. Saying that `chunked' is
> the only known key means that it's the only one that's translated to a
> symbol; i.e. `abcd' is parsed to a string.  (This is to prevent attacks
> to intern a bunch of symbols; though symbols can be gc'd in guile.)
>
> Does that help? I'll see about replacing usages of "param list" with "list
> of key-value lists", as it's probably clearer, and we can save ourselves
> a definition.

Hmm.  I still don't feel I completely understand this; but on the other
hand it's too fiddly for me to want to go into it further now.  I think I'll
wait until I actually have to process something with these structures.

>>> `www-authenticate'
>>>      A string.
>>
>> Obviously there's lots of substructure there (in WWW-Authenticate) that
>> we just don't support yet.  Is there a clear compatibility story for
>> if/when Guile is enhanced to parse that out?
>>
>> I guess yes; calling code will just need something like
>>
>>   (if (string? val)
>>       ;; An older Guile that doesn't parse authentication fully.
>>       (do-application-own-parsing)
>>       ;; A newer Guile that does parse authentication.
>>       (use-the-parsed-authentication-object))
>
> That's a very good question. The problem is that if we change the parsed
> representation, then old code breaks. That's why I put in the effort to
> give (hopefully) good representations for most headers, to avoid that
> situation -- though you appear to have caught one laziness on my part
> here, and in Authorization, Proxy-Authenticate, and Proxy-Authorization.

I don't think I could ever think that you are lazy!

> So maybe the right thing to do here is just to bite the bullet, parse as
> the RFC says we should, and avoid this situation.

As long as it's a bounded problem, fine.  (And I see that the modules do
now crack out authorization and authenticate headers.)

>> Hmm, I think the provision of this data type needs a bit more
>> motivation.  It doesn't appear to offer much additional value, compared
>> with reading or writing the components of a request individually, and on
>> the other hand it appears to bake in assumptions about charsets and
>> content length that might not always be true.
>
> I probably didn't explain it very well then. A request record holds the
> data from a request -- the method, uri, headers, etc. Additionally it
> can be read or written. It does not actually bake in assumptions about
> character sets or the like. It's simply that the HTTP protocol is
> flawed in this respect, that it mixes textual and binary data. We want
> to be able to read and parse requests, responses, and their headers
> using string and char routines, and that's fine as the character set for
> HTTP messages is restricted to a subset of the lower ASCII set. But then
> the body of an HTTP message is fundamentally binary -- the
> content-length is specified in bytes, not characters.

Thanks.  I think I see the usefulness of request and response objects
now, given the "overall picture" above.

> So the right way to read off a body is as a bytevector of the specified
> length (potentially with chunked transfer encoding of course, though we
> don't do that yet). Then if you want text, you decode using the
> character set specified in the request. If you are particularly lucky
> and it is a textual type and the charset is not specified, you can read
> it as a latin-1 string directly, otherwise you convert. Or you can deal
> with the binary data as a string.
>
> Setting the charset on the port is a bit of a hack, but it is the right
> thing to do if you are reading HTTP. And it doesn't matter what the
> charset is when you read the body as it's specified in bytes anyway and
> should be read in bytes (and then, possibly, decoded).
>
> Some more organized discussion should go in the manual... but what do
> you think?

The discussion in `An Important Note on Character Sets' looks good to
me.
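
For what it's worth, here is a minimal sketch of the "read bytes, then
decode" flow as I understand it.  It assumes the (ice-9 iconv) module is
available; `decode-body' is a made-up helper name, not part of the
(web ...) modules, and the byte lists are hand-made sample data.

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 iconv))

;; Hypothetical helper: turn a body bytevector into a string.  CHARSET
;; is the charset named by the message, or #f if none was given, in
;; which case we fall back to latin-1 as discussed above.
(define (decode-body body charset)
  (bytevector->string body (or charset "ISO-8859-1")))

;; The same four-character text, encoded two different ways.
(define utf8-body   (u8-list->bytevector '(99 97 102 #xC3 #xA9)))
(define latin1-body (u8-list->bytevector '(99 97 102 #xE9)))

(define text (decode-body utf8-body "UTF-8"))

(string-length text)            ; 4 characters ...
(bytevector-length utf8-body)   ; ... but 5 bytes on the wire
```

The last two lines are the point: Content-Length has to count the 5
bytes, not the 4 characters, which is why the body is read as a
bytevector first.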

>>>  -- Function: adapt-response-version response version
>>>      Adapt the given response to a different HTTP version. Returns a
>>>      new HTTP response.
>>
>> Interesting, and adds more value to the idea of the response object.
>> Why not for the request object too - are you assuming that Guile will
>> usually be acting as the HTTP server?  (Which I'm sure is correct, but
>> "usually" is not "always".)
>
> The thing is that the request initiates the transaction -- so it's the
> requestor that makes the version decision.

Oh yes, of course.

>>>   2. The `read' hook is called, to read a request from a new client.
>>>      The `read' hook takes one argument, the server socket.  It should
>>
>> It feels surprising for the infrastructure to pass the server socket to
>> the read hook.  I'd expect the infrastructure to do the `accept' itself
>> and pass the client socket to the read hook.
>
> It's the opaque "server socket object". Doing it this way has two
> advantages:
>
>   1) Works with other socket architectures (zeromq, particularly).

I'm not familiar with that, but will take a look.

>   2) Allows the server to make its own implementation of keepalive
>   (or not).
>
> Particularly the latter is interesting -- the http implementation makes
> a big pollset (from (ice-9 poll), not yet documented), and polls on the
> server socket and all the keepalive sockets.

As does (tekuti mod-lisp) - so the duplication is a slight shame.  But I
agree that there's no reason why the work to unduplicate that should be
in (web server).

Regards,
        Neil


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-Fix-typos-in-web-.-doc.patch --]
[-- Type: text/x-diff, Size: 1603 bytes --]

From fe431700a6c3ddf3fb26fce9607b9c7866968949 Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 19:34:01 +0000
Subject: [PATCH] Fix typos in (web ...) doc

* doc/ref/web.texi (Types and the Web): "help" -> "helpful".
  (HTTP): Add closing paren.  Remove code that looks like a leftover.
---
 doc/ref/web.texi |   12 +++---------
 1 files changed, 3 insertions(+), 9 deletions(-)

diff --git a/doc/ref/web.texi b/doc/ref/web.texi
index 3c7e0cd..76aa510 100644
--- a/doc/ref/web.texi
+++ b/doc/ref/web.texi
@@ -59,8 +59,8 @@ valid dates.  Error handling for a number of basic cases, like invalid
 dates, occurs on the boundary in which we produce a SRFI 19 date record
 from other types, like strings.
 
-With regards to the web, data types are help in the two broad phases of
-HTTP messages: parsing and generation.
+With regards to the web, data types are helpful in the two broad phases
+of HTTP messages: parsing and generation.
 
 Consider a server, which has to parse a request, and produce a response.
 Guile will parse the request into an HTTP request object
@@ -339,7 +339,7 @@ For example:
 
 (string->header "FOO")
 @result{} foo
-(header->string 'foo
+(header->string 'foo)
 @result{} "Foo"
 @end example
 
@@ -387,12 +387,6 @@ leaving it as a string.  You could register this header with Guile's
 HTTP stack like this:
 
 @example
-(define (parse-ip str)
-  (inet-aton str)
-(define (validate-ip ip)
-(define (write-ip ip port)
-  (display (inet-ntoa ip) port))
-
 (declare-header! "X-Client-Address"
   (lambda (str)
     (inet-aton str))
-- 
1.7.1


[-- Attachment #3: 0001-Export-server-impl-so-that-applications-can-use-it.patch --]
[-- Type: text/x-diff, Size: 717 bytes --]

From 072f3606be11e1e00542261017be531d8f4ade9b Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 16:31:30 +0000
Subject: [PATCH 1/6] Export server impl so that applications can use it

---
 tekuti/mod-lisp.scm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 59002b7..c302f58 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -33,7 +33,8 @@
   #:use-module (web http)
   #:use-module (web request)
   #:use-module (web response)
-  #:use-module (web server))
+  #:use-module (web server)
+  #:export (mod-lisp))
 
 
 ;;; FIXME: ignore SIGPIPE, otherwise apache dying will kill us
-- 
1.7.1


[-- Attachment #4: 0002-Update-body-related-calls-to-new-API.patch --]
[-- Type: text/x-diff, Size: 1276 bytes --]

From 00b99fec62f8999f4ae151f8d71bc0f66de63a00 Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 16:33:44 +0000
Subject: [PATCH 2/6] Update body-related calls to new API

---
 tekuti/mod-lisp.scm |    6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index c302f58..f644643 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -249,7 +249,7 @@
                  (let ((req (read-request/mod-lisp port)))
                    (values port
                            req
-                           (read-request-body/latin-1 req))))
+                           (read-request-body req))))
                (lambda (k . args)
                  (false-if-exception (close-port port)))))))))))))
 
@@ -259,9 +259,9 @@
     (cond
      ((not body))                       ; pass
      ((string? body)
-      (write-response-body/latin-1 response body))
+      (write-response-body response (string->utf8 body)))
      ((bytevector? body)
-      (write-response-body/bytevector response body))
+      (write-response-body response body))
      (else
       (error "Expected a string or bytevector for body" body)))
     (close-port (response-port response))))
-- 
1.7.1


[-- Attachment #5: 0003-Avoid-using-lookup-header-decl-which-isn-t-exported.patch --]
[-- Type: text/x-diff, Size: 1161 bytes --]

From 32b9f7252de2d73681a6df47fe19fcb361e9d3a1 Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 16:34:10 +0000
Subject: [PATCH 3/6] Avoid using lookup-header-decl, which isn't exported

---
 tekuti/mod-lisp.scm |   11 +++++------
 1 files changed, 5 insertions(+), 6 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index f644643..3a8b254 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -136,12 +136,11 @@
       (begin
         (display name port) (newline port)
         (display val port) (newline port))
-      (let ((decl (lookup-header-decl name)))
-        (if (not decl)
-            (error "Unknown header" name)
-            (begin
-              (display (header-decl-name decl) port) (newline port)
-              ((header-decl-writer decl) val port) (newline port))))))
+      (if (not (known-header? name))
+	  (error "Unknown header" name)
+	  (begin
+	    (display (header->string name) port) (newline port)
+	    ((header-writer name) val port) (newline port)))))
 
 (define (write-response-line/mod-lisp code phrase port)
   (write-header/mod-lisp "Status"
-- 
1.7.1


[-- Attachment #6: 0004-Update-to-new-parse-header-signature-which-only-retu.patch --]
[-- Type: text/x-diff, Size: 1019 bytes --]

From beb163df99803dcebe8b7fcf0fb4481a1a0f7f8b Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 16:34:40 +0000
Subject: [PATCH 4/6] Update to new parse-header signature, which only returns one value

---
 tekuti/mod-lisp.scm |    7 ++++---
 1 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 3a8b254..f3e36a4 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -112,9 +112,10 @@
                     (acons (car sym-and-parser)
                            ((cdr sym-and-parser) v)
                            meta))
-                (call-with-values (lambda () (parse-header k v))
-                  (lambda (k v)
-                    (lp (acons k v headers) meta)))))))))
+		(lp (acons (string->header k)
+			   (parse-header (string->header k) v)
+			   headers)
+		    meta)))))))
 
 (define (read-request/mod-lisp port)
   ;; See the note in (web request) regarding chars, bytes, and strings
-- 
1.7.1


[-- Attachment #7: 0005-Update-to-new-build-request-signature.patch --]
[-- Type: text/x-diff, Size: 837 bytes --]

From f94c51e3e73e96b9f729af38a7ee98b193eaf9c5 Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 17:14:26 +0000
Subject: [PATCH 5/6] Update to new build-request signature

---
 tekuti/mod-lisp.scm |    3 +--
 1 files changed, 1 insertions(+), 2 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index f3e36a4..00a2921 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -123,9 +123,8 @@
   (set-port-encoding! port "ISO-8859-1")
   (call-with-values (lambda () (read-headers/mod-lisp port))
     (lambda (headers meta)
-      (build-request
+      (build-request (assq-ref meta 'url)
        #:method (assq-ref meta 'method)
-       #:uri (assq-ref meta 'url)
        #:version (assq-ref meta 'server-protocol)
        #:headers headers
        #:meta meta
-- 
1.7.1


[-- Attachment #8: 0006-Don-t-validate-headers-as-we-get-an-apparently-inval.patch --]
[-- Type: text/x-diff, Size: 769 bytes --]

From 7408a37c93caf6880302814058d2d445cd121891 Mon Sep 17 00:00:00 2001
From: Neil Jerram <neil@ossau.uklinux.net>
Date: Fri, 21 Jan 2011 17:15:15 +0000
Subject: [PATCH 6/6] Don't validate headers, as we get an apparently invalid Accept header
 from apache2/mod-lisp

---
 tekuti/mod-lisp.scm |    3 ++-
 1 files changed, 2 insertions(+), 1 deletions(-)

diff --git a/tekuti/mod-lisp.scm b/tekuti/mod-lisp.scm
index 00a2921..18a5649 100644
--- a/tekuti/mod-lisp.scm
+++ b/tekuti/mod-lisp.scm
@@ -128,7 +128,8 @@
        #:version (assq-ref meta 'server-protocol)
        #:headers headers
        #:meta meta
-       #:port port))))
+       #:port port
+       #:validate-headers? #f))))
 
 (define (write-header/mod-lisp name val port)
   (if (string? name)
-- 
1.7.1



* Re: documentation for (web ...)
  2011-01-21 23:05       ` Neil Jerram
@ 2011-01-22 15:01         ` Andy Wingo
  0 siblings, 0 replies; 7+ messages in thread
From: Andy Wingo @ 2011-01-22 15:01 UTC (permalink / raw)
  To: Neil Jerram; +Cc: Ludovic Courtès, guile-devel

Greets,

On Sat 22 Jan 2011 00:05, Neil Jerram <neil@ossau.uklinux.net> writes:

>>>>  -- Function: uri-decode str [#:charset]
>>>>      Percent-decode the given STR, according to CHARSET.
>
> I'm still not completely sure here.  What if STR contains normal
> characters as well as possible %XX sequences.  If I call uri-decode with
> #:encoding #f, how is each normal character mapped into the resulting
> bytevector?

From RFC 3986 1.2.1:

   A URI is a sequence of characters from a very limited set: the
   letters of the basic Latin alphabet, digits, and a few special
   characters.

So for non-encoded chars, if the char->integer of the char is less than
128, then we just write the byte out; and otherwise signal an error.  I
think that means that the encoding has to be 7-bit ascii-compatible.
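
To make that rule concrete, here is a rough sketch of the decoding step
down to a bytevector.  It is not the code in (web uri);
`percent-decode->bytevector' is a name I'm making up for illustration,
and it only shows the "byte or error" behaviour described above.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper, not the (web uri) implementation.
(define (percent-decode->bytevector str)
  (define len (string-length str))
  (let lp ((i 0) (acc '()))
    (cond
     ((= i len)
      (u8-list->bytevector (reverse acc)))
     ((char=? (string-ref str i) #\%)
      ;; A %XX escape contributes the byte XX, whatever its value.
      (let ((hex (and (<= (+ i 3) len)
                      (substring str (+ i 1) (+ i 3)))))
        (if (and hex (string-every char-set:hex-digit hex))
            (lp (+ i 3) (cons (string->number hex 16) acc))
            (error "Bad percent escape" str i))))
     (else
      ;; A non-encoded char is written out as its own byte, but only
      ;; if it is 7-bit; anything else is an error.
      (let ((n (char->integer (string-ref str i))))
        (if (< n 128)
            (lp (+ i 1) (cons n acc))
            (error "Non-ASCII character in URI" str i)))))))

;; The bytes are then decoded with whatever charset the caller wants:
(utf8->string (percent-decode->bytevector "caf%C3%A9"))
```

The charset only comes into play in the last step, once all the bytes
are in hand.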

>> Or should we give brief descriptions of the meanings of all of these
>> headers as well? That might be a good idea too.
>
> No, I don't think that's needed.

I did end up doing that, but if it's too much text, we can think of ways
to compress it...

Speaking of which, Peter Bex's "intarweb" egg for chicken is a fairly
mature, well-documented piece of code that does pretty much the same
thing.  I only learned about it after I had done most of the work on
Guile's web stack, but it's still useful as a repo of good ideas, docs,
and interfaces:

  http://wiki.call-cc.org/eggref/4/intarweb?action=show

>>>> `transfer-encoding'

The current text is:

 -- HTTP Header: List transfer-encoding
     A list of transfer codings, expressed as key-value lists.  The only
     transfer coding defined by the specification is `chunked'.
          (parse-header 'transfer-encoding "chunked")
          => (chunked)

And actually that's a bug: it parses to ((chunked)).  I'll push
a fix.
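
In practice that means caller code should treat the value as a list of
key-value lists and look the coding up by key.  A minimal sketch,
assuming the ((chunked)) shape; `chunked?' is just an illustrative name,
not something exported by (web http):

```scheme
(use-modules (web http))

;; Hypothetical helper: is `chunked' among the parsed transfer codings?
;; Each coding is itself a list whose car is the key, so assq works.
(define (chunked? codings)
  (and (assq 'chunked codings) #t))

(chunked? (parse-header 'transfer-encoding "chunked"))
```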

> Hmm.  I still don't feel I completely understand this; but on the other
> hand it's too fiddly for me to want to go into it further now.  I think I'll
> wait until I actually have to process something with these structures.

Yeah, it's not really a pleasant topic.  Hopefully this sleeping dog
will indeed lie.

>>>> `www-authenticate'
>>>>      A string.
>>>
>>> Obviously there's lots of substructure there (in WWW-Authenticate) that
>>> we just don't support yet.  Is there a clear compatibility story for
>>> if/when Guile is enhanced to parse that out?
>>>
>>> I guess yes; calling code will just need something like
>>>
>>>   (if (string? val)
>>>       ;; An older Guile that doesn't parse authentication fully.
>>>       (do-application-own-parsing)
>>>       ;; A newer Guile that does parse authentication.
>>>       (use-the-parsed-authentication-object))

So I did fix this one.  But the compatibility story for other headers is
that if the application is relying on the value of a header that is not
declared by Guile, the application should declare that header, and rely
on its parsers, validators, and writers.

That way if Guile comes along later and defines a parser for Set-Cookie,
the application that depends on Set-Cookie doesn't have to change,
because it already defined its own parser.
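
As a sketch of what "declare it yourself" looks like, here is an
application-side declaration using `declare-header!' from (web http).
The header name X-Request-Tags and its comma-separated format are
invented for the example; only the parser/validator/writer arrangement
is the point.

```scheme
(use-modules (web http)
             (srfi srfi-1))

;; X-Request-Tags is a made-up header: a comma-separated list of tags.
(declare-header! "X-Request-Tags"
  ;; Parser: header string -> list of trimmed strings.
  (lambda (str)
    (map string-trim-both (string-split str #\,)))
  ;; Validator: accept only a list of strings.
  (lambda (val)
    (and (list? val) (every string? val)))
  ;; Writer: list of strings -> header text on PORT.
  (lambda (val port)
    (display (string-join val ", ") port)))

(parse-header 'x-request-tags "alpha, beta")
;; => ("alpha" "beta")
```

The application then depends only on its own representation for that
header, so whatever Guile later decides to do by default does not
change what the application sees.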

>>   1) Works with other socket architectures (zeromq, particularly).
>
> I'm not familiar with that, but will take a look.

I really like the idea of mongrel2 -- http://mongrel2.org/home.
Mongrel2 is a web server that just parses the request, figures out where
to route the request, then forwards the request to some other process
via zeromq (zeromq.org).  Really seems like the Right Thing(TM) for a
web server to do.  A project for another day...

Thanks for all your helpful comments!

Andy
-- 
http://wingolog.org/




end of thread, other threads:[~2011-01-22 15:01 UTC | newest]

Thread overview: 7+ messages
2010-12-14 21:01 documentation for (web ...) Andy Wingo
2010-12-16 23:10 ` Ludovic Courtès
2010-12-23 23:51   ` Neil Jerram
2010-12-26 17:45     ` Andy Wingo
2011-01-21 23:05       ` Neil Jerram
2011-01-22 15:01         ` Andy Wingo
2011-01-11  6:52     ` Andy Wingo
