unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* new module: (web uri)
@ 2010-10-17 19:30 Andy Wingo
  2010-10-17 20:05 ` Mike Gran
  2010-10-17 20:33 ` Ludovic Courtès
  0 siblings, 2 replies; 5+ messages in thread
From: Andy Wingo @ 2010-10-17 19:30 UTC (permalink / raw)
  To: guile-devel

Hello,

I added a RFC 3986-compatible URI parser to Guile, as (web uri). It's
not documented yet unfortunately, but there it is.

The goal is to build up to having an HTTP client and a toy HTTP server
in Guile itself. Obviously this coincides with Guile-WWW in scope; I've
chosen the (web ...) namespace so as not to conflict.

Even though Guile-WWW is GPL, I think basing new modules on old
Guile-WWW is OK, as the FSF has copyright and can do the GPL -> LGPL
thing without problems. But don't incorporate code that has folks other
than the FSF in the copyright.

So I think the thing would be to implement a "request" object, a simple
client, and a simple server, and whatever else a client and server
need. But not, for example, all of the server-utils that are in
guile-www now; building real web servers is hard and not settled, so we
should punt for now.

(Why have a server at all, you ask? It keeps us honest, for one. Also
it's useful for prototyping. For example currenty I have a web
application I'm updating, and I don't know whether to install apache and
build mod_lisp, switch to fastcgi and implement that, or do a whole
server, or if i use fastcgi then do I do apache or nginx or gnu
myserver?

All these questions are besides the current thing I want to check, which
is, is this darn thing working? For that I need a simple server.)

Anyway, pasting (web uri) here at the end of the mail. Comments welcome!

Cheers,

Andy

;;;; (web uri) --- URI manipulation tools
;;;;
;;;; Copyright (C) 1997,2001,2002,2010 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
;;;; License as published by the Free Software Foundation; either
;;;; version 3 of the License, or (at your option) any later version.
;;;; 
;;;; This library is distributed in the hope that it will be useful,
;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
;;;; Lesser General Public License for more details.
;;;; 
;;;; You should have received a copy of the GNU Lesser General Public
;;;; License along with this library; if not, write to the Free Software
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
;;;;

;;; Commentary:

;; Based on (www url). To be documented.

;;; Code:

(define-module (web uri)
  #:export (uri?
            uri-scheme uri-userinfo uri-host uri-port
            uri-path uri-query uri-fragment

            build-uri
            parse-uri unparse-uri
            uri-decode uri-encode
            split-and-decode-uri-path
            encode-and-join-uri-path)
  #:use-module (srfi srfi-9)
  #:use-module (ice-9 regex)
  #:use-module (ice-9 control)
  #:use-module (rnrs bytevectors)
  #:use-module (rnrs io ports))

(define-record-type <uri>
  (make-uri scheme userinfo host port path query fragment)
  uri?
  (scheme uri-scheme)
  (userinfo uri-userinfo)
  (host uri-host)
  (port uri-port)
  (path uri-path)
  (query uri-query)
  (fragment uri-fragment))

(define (positive-exact-integer? port)
  (and (number? port) (exact? port) (integer? port) (positive? port)))

(define (validate-uri scheme userinfo host port path query fragment)
  (cond
   ((not (symbol? scheme))
    (error "expected a symbol for the URI scheme" scheme))
   ((and (or userinfo port) (not host))
    (error "expected host, given userinfo or port"))
   ((and port (not (positive-exact-integer? port)))
    (error "expected integer port" port))
   ((and host (or (not (string? host)) (not (valid-host? host))))
    (error "expected valid host" host))
   ((and userinfo (not (string? userinfo)))
    (error "expected string for userinfo" userinfo))
   ((not (string? path))
    (error "expected string for path" path))
   ((and host (not (string-null? path))
         (not (eqv? (string-ref path 0) #\/)))
    (error "expected path of absolute URI to start with a /" path))))

(define* (build-uri scheme #:key userinfo host port (path "") query fragment
                    (validate? #t))
  (if validate?
      (validate-uri scheme userinfo host port path query fragment))
  (make-uri scheme userinfo host port path query fragment))

;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC
;; 3490), and non-ASCII host names.
;;
(define ipv4-regexp
  (make-regexp "^([0-9.]+)"))
(define ipv6-regexp
  (make-regexp "^\\[([0-9a-fA-F:]+)\\]+"))
(define domain-label-regexp
  (make-regexp "^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))
(define top-label-regexp
  (make-regexp "^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?$"))

(define (valid-host? host)
  (cond
   ((regexp-exec ipv4-regexp host)
    => (lambda (m)
         (false-if-exception (inet-pton AF_INET (match:substring m 1)))))
   ((regexp-exec ipv6-regexp host)
    => (lambda (m)
         (false-if-exception (inet-pton AF_INET6 (match:substring m 1)))))
   (else
    (let ((labels (reverse (string-split host #\.))))
      (and (pair? labels)
           (regexp-exec top-label-regexp (car labels))
           (and-map (lambda (label)
                      (regexp-exec domain-label-regexp label))
                    (cdr labels)))))))

(define userinfo-pat
  "[a-zA-Z0-9_.!~*'();:&=+$,-]+")
(define host-pat
  "[a-zA-Z0-9.-]+")
(define port-pat
  "[0-9]*")
(define authority-regexp
  (make-regexp
   (format #f "^//((~a)@)?(~a)(:(~a))?$"
           userinfo-pat host-pat port-pat)))

(define (parse-authority authority fail)
  (let ((m (regexp-exec authority-regexp authority)))
    (if (and m (valid-host? (match:substring m 3)))
        (values (match:substring m 2)
                (match:substring m 3)
                (let ((port (match:substring m 5)))
                  (and port (not (string-null? port))
                       (string->number port))))
        (fail))))


;;; RFC 3986, #3.
;;;
;;;   URI         = scheme ":" hier-part [ "?" query ] [ "#" fragment ]
;;;
;;;   hier-part   = "//" authority path-abempty
;;;               / path-absolute
;;;               / path-rootless
;;;               / path-empty

(define scheme-pat
  "[a-zA-Z][a-zA-Z0-9+.-]*")
(define authority-pat
  "[^/?#]*")
(define path-pat
  "[^?#]*")
(define query-pat
  "[^#]*")
(define fragment-pat
  ".*")
(define uri-pat
  (format #f "^(~a):(//~a)?(~a)(\\?(~a))?(#(~a))?$"
          scheme-pat authority-pat path-pat query-pat fragment-pat))
(define uri-regexp
  (make-regexp uri-pat))

(define (parse-uri string)
  (% (let ((m (regexp-exec uri-regexp string)))
       (if (not m) (abort))
       (let ((scheme (string->symbol
                      (string-downcase (match:substring m 1))))
             (authority (match:substring m 2))
             (path (match:substring m 3))
             (query (match:substring m 5))
             (fragment (match:substring m 7)))
         (call-with-values
             (lambda ()
               (if authority
                   (parse-authority authority abort)
                   (values #f #f #f)))
           (lambda (userinfo host port)
             (make-uri scheme userinfo host port path query fragment)))))
     (lambda (k)
       #f)))

(define (unparse-uri uri)
  (let* ((scheme-str (string-append
                      (symbol->string (uri-scheme uri)) ":"))
         (userinfo (uri-userinfo uri))
         (host (uri-host uri))
         (port (uri-port uri))
         (path (uri-path uri))
         (query (uri-query uri))
         (fragment (uri-fragment uri)))
    (string-append
     scheme-str
     (if host
         (string-append "//"
                        (if userinfo (string-append userinfo "@")
                            "")
                        host
                        (if port
                            (string-append ":" (number->string port))
                            ""))
         "")
     path
     (if query
         (string-append "?" query)
         "")
     (if fragment
         (string-append "#" fragment)
         ""))))


;; A note on characters and bytes: URIs are defined to be sequences of
;; characters in a subset of ASCII. Those characters may encode a
;; sequence of bytes (octets), which in turn may encode sequences of
;; characters in other character sets.
;;

;; Return a new string made from uri-decoding @var{str}.  Specifically,
;; turn @code{+} into space, and hex-encoded @code{%XX} strings into
;; their eight-bit characters.
;;
(define hex-chars
  (string->char-set "0123456789abcdefABCDEF"))

(define* (uri-decode str #:key (charset 'utf-8))
  (let ((len (string-length str)))
    (call-with-values open-bytevector-output-port
      (lambda (port get-bytevector)
        (let lp ((i 0))
          (if (= i len)
              ((case charset
                 ((utf-8) utf8->string)
                 ((#f) (lambda (x) x)) ; raw bytevector
                 (else (error "unknown charset" charset)))
               (get-bytevector))
              (let ((ch (string-ref str i)))
                (cond
                 ((eqv? ch #\+)
                  (put-u8 port (char->integer #\space))
                  (lp (1+ i)))
                 ((and (< (+ i 2) len) (eqv? ch #\%)
                       (let ((a (string-ref str (+ i 1)))
                             (b (string-ref str (+ i 2))))
                         (and (char-set-contains? hex-chars a)
                              (char-set-contains? hex-chars b)
                              (string->number (string a b) 16))))
                  => (lambda (u8)
                       (put-u8 port u8)
                       (lp (+ i 3))))
                 ((< (char->integer ch) 128)
                  (put-u8 port (char->integer ch))
                  (lp (1+ i)))
                 (else
                  (error "invalid character in encoded URI" str ch))))))))))
  
(define ascii-alnum-chars
  (string->char-set
   "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"))

;; RFC 3986, #2.2.
(define gen-delims
  (string->char-set ":/?#[]@"))
(define sub-delims
  (string->char-set "!$&'()*+,l="))
(define reserved-chars
  (char-set-union gen-delims sub-delims))

;; RFC 3986, #2.3
(define unreserved-chars
  (char-set-union ascii-alnum-chars
                  (string->char-set "-._~")))

;; Return a new string made from uri-encoding @var{str}, unconditionally
;; transforming any characters not in @var{unescaped-chars}.
;;
(define* (uri-encode str #:key (charset 'utf-8)
                     (unescaped-chars unreserved-chars))
  (define (put-utf8 binary-port str)
    (put-bytevector binary-port (string->utf8 str)))

  ((case charset
     ((utf-8) utf8->string)
     ((#f) (lambda (x) x)) ; raw bytevector
     (else (error "unknown charset" charset)))
   (call-with-values open-bytevector-output-port
     (lambda (port get-bytevector)
       (string-for-each
        (lambda (ch)
          (if (char-set-contains? unescaped-chars ch)
              (put-utf8 port (string ch))
              (let* ((utf8 (string->utf8 (string ch)))
                     (len (bytevector-length utf8)))
                ;; Encode each byte.
                (let lp ((i 0))
                  (if (< i len)
                      (begin
                        (put-utf8 port (string #\%))
                        (put-utf8 port
                                  (number->string (bytevector-u8-ref utf8 i) 16))
                        (lp (1+ i))))))))
        str)
       (get-bytevector)))))

(define (split-and-decode-uri-path path)
  (filter (lambda (x) (not (string-null? x)))
          (map uri-decode (string-split path #\/))))

(define (encode-and-join-uri-path parts)
  (string-join (map uri-encode parts) "/"))


-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: new module: (web uri)
  2010-10-17 19:30 new module: (web uri) Andy Wingo
@ 2010-10-17 20:05 ` Mike Gran
  2010-10-19 18:41   ` Andy Wingo
  2010-10-17 20:33 ` Ludovic Courtès
  1 sibling, 1 reply; 5+ messages in thread
From: Mike Gran @ 2010-10-17 20:05 UTC (permalink / raw)
  To: Andy Wingo, guile-devel

> Hello,


> 
> The goal is to  build up to having an HTTP client and a toy HTTP server
> in Guile itself.  Obviously this coincides with Guile-WWW in scope; I've
> chosen the (web ...)  namespace so as not to conflict.

Hi Andy,

As you probably know, there are already a couple of guile webservers.
Last time I checked TTN's sizzweb was probably the best maintained.

If you're avoiding that for license reasons, I'd suggest wrapping 
up GNU libmicrohttpd.

Writing yet another webserver for fun is one thing, but, writing yet
another webserver for inclusion in Guile is the path of madness, because
you'd be implying that it is, in some sense, validated.  A webserver
is oh so deceptively simple...  But looking at the http11 parser
in Mongrel2, for example, shows how it is harder than it looks.

BTW, I haven't tried it yet, but, a Guile / Mongrel2 could make for
a great platform, with not too much glue code.

But, just my 2 cents.

Regard,

Mike Gran




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: new module: (web uri)
  2010-10-17 19:30 new module: (web uri) Andy Wingo
  2010-10-17 20:05 ` Mike Gran
@ 2010-10-17 20:33 ` Ludovic Courtès
  2010-10-19 18:45   ` Andy Wingo
  1 sibling, 1 reply; 5+ messages in thread
From: Ludovic Courtès @ 2010-10-17 20:33 UTC (permalink / raw)
  To: guile-devel

Guten Abend!

Andy Wingo <wingo@pobox.com> writes:

> I added a RFC 3986-compatible URI parser to Guile, as (web uri). It's
> not documented yet unfortunately, but there it is.

Great!

> Even though Guile-WWW is GPL,

Though if it made things easier, I’d be fine keeping it GPL’d.

> Anyway, pasting (web uri) here at the end of the mail. Comments welcome!

This looks nice to me, except for the lack of docstrings (which I’ve
come to value thanks to Geiser ;-)) and ‘userinfo’ -> ‘user-info’.  :-)

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: new module: (web uri)
  2010-10-17 20:05 ` Mike Gran
@ 2010-10-19 18:41   ` Andy Wingo
  0 siblings, 0 replies; 5+ messages in thread
From: Andy Wingo @ 2010-10-19 18:41 UTC (permalink / raw)
  To: Mike Gran; +Cc: guile-devel

Heya Mike,

On Sun 17 Oct 2010 22:05, Mike Gran <spk121@yahoo.com> writes:

> Writing yet another webserver for fun is one thing, but, writing yet
> another webserver for inclusion in Guile is the path of madness, because
> you'd be implying that it is, in some sense, validated.  A webserver
> is oh so deceptively simple...  But looking at the http11 parser
> in Mongrel2, for example, shows how it is harder than it looks.

Yeah I certainly don't want to bless a web server, or even write a
proper one. It would be nice though if we had a *toy* web server.

For example, as I mentioned I was setting up my web app, and I just
wanted to see if it worked. If I could run a toy server and check that
it worked, that would be great. Then I could choose mongrel2 or nginx or
whatever with fastcgi or mod_proxy or whatever, and I have some freedom
in that respect, because I know that somewhere there is a (lambda
(request) ...) that doesn't care much about how the actual web server is
implemented.

You could be right though, this might be a bad idea. But at the very
least we do need an HTTP client in Guile if we are ever to make an
ELPA-alike (or cpan-alike, if you prefer), and for that we need a URI
lib, an http client, perhaps request and response object, base64 and
mime encoding, etc.

> BTW, I haven't tried it yet, but, a Guile / Mongrel2 could make for
> a great platform, with not too much glue code.

Yeah that sounds like fun! Microhttpd also sounds interesting for toy
usage, will check it out.

Peace,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: new module: (web uri)
  2010-10-17 20:33 ` Ludovic Courtès
@ 2010-10-19 18:45   ` Andy Wingo
  0 siblings, 0 replies; 5+ messages in thread
From: Andy Wingo @ 2010-10-19 18:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel

On Sun 17 Oct 2010 22:33, ludo@gnu.org (Ludovic Courtès) writes:

>> Even though Guile-WWW is GPL,
>
> Though if it made things easier, I’d be fine keeping it GPL’d.

I would prefer to have Guile under one license, all things equal; any
serious server work, for example, would go outside of Guile and thus be
happily GPL-able, i would think.

>> Anyway, pasting (web uri) here at the end of the mail. Comments welcome!
>
> This looks nice to me, except for the lack of docstrings (which I’ve
> come to value thanks to Geiser ;-)) and ‘userinfo’ -> ‘user-info’.  :-)

OK :)

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2010-10-19 18:45 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2010-10-17 19:30 new module: (web uri) Andy Wingo
2010-10-17 20:05 ` Mike Gran
2010-10-19 18:41   ` Andy Wingo
2010-10-17 20:33 ` Ludovic Courtès
2010-10-19 18:45   ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).