From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!not-for-mail From: Andy Wingo Newsgroups: gmane.lisp.guile.devel Subject: new module: (web uri) Date: Sun, 17 Oct 2010 21:30:31 +0200 Message-ID: NNTP-Posting-Host: lo.gmane.org Mime-Version: 1.0 Content-Type: text/plain; charset=us-ascii X-Trace: dough.gmane.org 1287343634 4111 80.91.229.12 (17 Oct 2010 19:27:14 GMT) X-Complaints-To: usenet@dough.gmane.org NNTP-Posting-Date: Sun, 17 Oct 2010 19:27:14 +0000 (UTC) To: guile-devel Original-X-From: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Sun Oct 17 21:27:11 2010 Return-path: Envelope-to: guile-devel@m.gmane.org Original-Received: from lists.gnu.org ([199.232.76.165]) by lo.gmane.org with esmtp (Exim 4.69) (envelope-from ) id 1P7Ysw-0006Uf-4B for guile-devel@m.gmane.org; Sun, 17 Oct 2010 21:27:10 +0200 Original-Received: from localhost ([127.0.0.1]:48051 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P7Ysv-0000sc-4E for guile-devel@m.gmane.org; Sun, 17 Oct 2010 15:27:09 -0400 Original-Received: from [140.186.70.92] (port=48260 helo=eggs.gnu.org) by lists.gnu.org with esmtp (Exim 4.43) id 1P7Ysl-0000re-9A for guile-devel@gnu.org; Sun, 17 Oct 2010 15:27:00 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1P7Ysj-0001zO-Jx for guile-devel@gnu.org; Sun, 17 Oct 2010 15:26:59 -0400 Original-Received: from a-pb-sasl-quonix.pobox.com ([208.72.237.25]:37273 helo=sasl.smtp.pobox.com) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1P7Ysj-0001zB-GH for guile-devel@gnu.org; Sun, 17 Oct 2010 15:26:57 -0400 Original-Received: from sasl.smtp.pobox.com (unknown [127.0.0.1]) by a-pb-sasl-quonix.pobox.com (Postfix) with ESMTP id F14F2DFF70 for ; Sun, 17 Oct 2010 15:26:56 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha1; c=relaxed; d=pobox.com; h=from:to :subject:date:message-id:mime-version:content-type; s=sasl; bh=y a+U5cJAdoz0icWKjIoXX0RjsI8=; b=n2pS2r5ob+t3A55DeT3GiKZRSVexA67GY fUJD2vvwS9zv9YlRvtOg1U2atWqx5IbgG+A/gPQk21ufZ3e6ba7oHa7hPYX2wFXu 797YoiDVFe9uBZeW1devrQNjfdZ0qWTHrO3Tq+wSIRK8B64WS0dFvabEKlPG9Fwi nbSudWNa7Y= DomainKey-Signature: a=rsa-sha1; c=nofws; d=pobox.com; h=from:to:subject :date:message-id:mime-version:content-type; q=dns; s=sasl; b=ZUT CfRn4dtlTiYhGj0arEmKQo2d6AD0oNnHAfKe4kzj1hbEJsgPH/PLoZQcIt+VeUS0 EP8SkdQz/+Dt3NEa9DVLmm48lc2j6hTOZbpHT2S7LzgpsLKj2+CfbkOPL+zsaAAK ElFmrlAdLZJpSnKqNTBQsu4ChCNgTErKw4fk2PTg= Original-Received: from a-pb-sasl-quonix. (unknown [127.0.0.1]) by a-pb-sasl-quonix.pobox.com (Postfix) with ESMTP id EE7E7DFF6F for ; Sun, 17 Oct 2010 15:26:56 -0400 (EDT) Original-Received: from unquote.localdomain (unknown [79.156.147.212]) (using TLSv1 with cipher DHE-RSA-AES256-SHA (256/256 bits)) (No client certificate requested) by a-pb-sasl-quonix.pobox.com (Postfix) with ESMTPSA id DBE41DFF6E for ; Sun, 17 Oct 2010 15:26:55 -0400 (EDT) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/23.2 (gnu/linux) X-Pobox-Relay-ID: 816CF49C-DA24-11DF-BEED-030CEE7EF46B-02397024!a-pb-sasl-quonix.pobox.com X-detected-operating-system: by eggs.gnu.org: Solaris 10 (beta) X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.5 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Original-Sender: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Errors-To: guile-devel-bounces+guile-devel=m.gmane.org@gnu.org Xref: news.gmane.org gmane.lisp.guile.devel:11051 Archived-At: Hello, I added a RFC 3986-compatible URI parser to Guile, as (web uri). It's not documented yet unfortunately, but there it is. The goal is to build up to having an HTTP client and a toy HTTP server in Guile itself. Obviously this coincides with Guile-WWW in scope; I've chosen the (web ...) namespace so as not to conflict. Even though Guile-WWW is GPL, I think basing new modules on old Guile-WWW is OK, as the FSF has copyright and can do the GPL -> LGPL thing without problems. But don't incorporate code that has folks other than the FSF in the copyright. So I think the thing would be to implement a "request" object, a simple client, and a simple server, and whatever else a client and server need. But not, for example, all of the server-utils that are in guile-www now; building real web servers is hard and not settled, so we should punt for now. (Why have a server at all, you ask? It keeps us honest, for one. Also it's useful for prototyping. For example currenty I have a web application I'm updating, and I don't know whether to install apache and build mod_lisp, switch to fastcgi and implement that, or do a whole server, or if i use fastcgi then do I do apache or nginx or gnu myserver? All these questions are besides the current thing I want to check, which is, is this darn thing working? For that I need a simple server.) Anyway, pasting (web uri) here at the end of the mail. Comments welcome! Cheers, Andy ;;;; (web uri) --- URI manipulation tools ;;;; ;;;; Copyright (C) 1997,2001,2002,2010 Free Software Foundation, Inc. ;;;; ;;;; This library is free software; you can redistribute it and/or ;;;; modify it under the terms of the GNU Lesser General Public ;;;; License as published by the Free Software Foundation; either ;;;; version 3 of the License, or (at your option) any later version. ;;;; ;;;; This library is distributed in the hope that it will be useful, ;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU ;;;; Lesser General Public License for more details. ;;;; ;;;; You should have received a copy of the GNU Lesser General Public ;;;; License along with this library; if not, write to the Free Software ;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA ;;;; ;;; Commentary: ;; Based on (www url). To be documented. ;;; Code: (define-module (web uri) #:export (uri? uri-scheme uri-userinfo uri-host uri-port uri-path uri-query uri-fragment build-uri parse-uri unparse-uri uri-decode uri-encode split-and-decode-uri-path encode-and-join-uri-path) #:use-module (srfi srfi-9) #:use-module (ice-9 regex) #:use-module (ice-9 control) #:use-module (rnrs bytevectors) #:use-module (rnrs io ports)) (define-record-type (make-uri scheme userinfo host port path query fragment) uri? (scheme uri-scheme) (userinfo uri-userinfo) (host uri-host) (port uri-port) (path uri-path) (query uri-query) (fragment uri-fragment)) (define (positive-exact-integer? port) (and (number? port) (exact? port) (integer? port) (positive? port))) (define (validate-uri scheme userinfo host port path query fragment) (cond ((not (symbol? scheme)) (error "expected a symbol for the URI scheme" scheme)) ((and (or userinfo port) (not host)) (error "expected host, given userinfo or port")) ((and port (not (positive-exact-integer? port))) (error "expected integer port" port)) ((and host (or (not (string? host)) (not (valid-host? host)))) (error "expected valid host" host)) ((and userinfo (not (string? userinfo))) (error "expected string for userinfo" userinfo)) ((not (string? path)) (error "expected string for path" path)) ((and host (not (string-null? path)) (not (eqv? (string-ref path 0) #\/))) (error "expected path of absolute URI to start with a /" path)))) (define* (build-uri scheme #:key userinfo host port (path "") query fragment (validate? #t)) (if validate? (validate-uri scheme userinfo host port path query fragment)) (make-uri scheme userinfo host port path query fragment)) ;; See RFC 3986 #3.2.2 for comments on percent-encodings, IDNA (RFC ;; 3490), and non-ASCII host names. ;; (define ipv4-regexp (make-regexp "^([0-9.]+)")) (define ipv6-regexp (make-regexp "^\\[([0-9a-fA-F:]+)\\]+")) (define domain-label-regexp (make-regexp "^[a-zA-Z0-9]([a-zA-Z0-9-]*[a-zA-Z0-9])?$")) (define top-label-regexp (make-regexp "^[a-zA-Z]([a-zA-Z0-9-]*[a-zA-Z0-9])?$")) (define (valid-host? host) (cond ((regexp-exec ipv4-regexp host) => (lambda (m) (false-if-exception (inet-pton AF_INET (match:substring m 1))))) ((regexp-exec ipv6-regexp host) => (lambda (m) (false-if-exception (inet-pton AF_INET6 (match:substring m 1))))) (else (let ((labels (reverse (string-split host #\.)))) (and (pair? labels) (regexp-exec top-label-regexp (car labels)) (and-map (lambda (label) (regexp-exec domain-label-regexp label)) (cdr labels))))))) (define userinfo-pat "[a-zA-Z0-9_.!~*'();:&=+$,-]+") (define host-pat "[a-zA-Z0-9.-]+") (define port-pat "[0-9]*") (define authority-regexp (make-regexp (format #f "^//((~a)@)?(~a)(:(~a))?$" userinfo-pat host-pat port-pat))) (define (parse-authority authority fail) (let ((m (regexp-exec authority-regexp authority))) (if (and m (valid-host? (match:substring m 3))) (values (match:substring m 2) (match:substring m 3) (let ((port (match:substring m 5))) (and port (not (string-null? port)) (string->number port)))) (fail)))) ;;; RFC 3986, #3. ;;; ;;; URI = scheme ":" hier-part [ "?" query ] [ "#" fragment ] ;;; ;;; hier-part = "//" authority path-abempty ;;; / path-absolute ;;; / path-rootless ;;; / path-empty (define scheme-pat "[a-zA-Z][a-zA-Z0-9+.-]*") (define authority-pat "[^/?#]*") (define path-pat "[^?#]*") (define query-pat "[^#]*") (define fragment-pat ".*") (define uri-pat (format #f "^(~a):(//~a)?(~a)(\\?(~a))?(#(~a))?$" scheme-pat authority-pat path-pat query-pat fragment-pat)) (define uri-regexp (make-regexp uri-pat)) (define (parse-uri string) (% (let ((m (regexp-exec uri-regexp string))) (if (not m) (abort)) (let ((scheme (string->symbol (string-downcase (match:substring m 1)))) (authority (match:substring m 2)) (path (match:substring m 3)) (query (match:substring m 5)) (fragment (match:substring m 7))) (call-with-values (lambda () (if authority (parse-authority authority abort) (values #f #f #f))) (lambda (userinfo host port) (make-uri scheme userinfo host port path query fragment))))) (lambda (k) #f))) (define (unparse-uri uri) (let* ((scheme-str (string-append (symbol->string (uri-scheme uri)) ":")) (userinfo (uri-userinfo uri)) (host (uri-host uri)) (port (uri-port uri)) (path (uri-path uri)) (query (uri-query uri)) (fragment (uri-fragment uri))) (string-append scheme-str (if host (string-append "//" (if userinfo (string-append userinfo "@") "") host (if port (string-append ":" (number->string port)) "")) "") path (if query (string-append "?" query) "") (if fragment (string-append "#" fragment) "")))) ;; A note on characters and bytes: URIs are defined to be sequences of ;; characters in a subset of ASCII. Those characters may encode a ;; sequence of bytes (octets), which in turn may encode sequences of ;; characters in other character sets. ;; ;; Return a new string made from uri-decoding @var{str}. Specifically, ;; turn @code{+} into space, and hex-encoded @code{%XX} strings into ;; their eight-bit characters. ;; (define hex-chars (string->char-set "0123456789abcdefABCDEF")) (define* (uri-decode str #:key (charset 'utf-8)) (let ((len (string-length str))) (call-with-values open-bytevector-output-port (lambda (port get-bytevector) (let lp ((i 0)) (if (= i len) ((case charset ((utf-8) utf8->string) ((#f) (lambda (x) x)) ; raw bytevector (else (error "unknown charset" charset))) (get-bytevector)) (let ((ch (string-ref str i))) (cond ((eqv? ch #\+) (put-u8 port (char->integer #\space)) (lp (1+ i))) ((and (< (+ i 2) len) (eqv? ch #\%) (let ((a (string-ref str (+ i 1))) (b (string-ref str (+ i 2)))) (and (char-set-contains? hex-chars a) (char-set-contains? hex-chars b) (string->number (string a b) 16)))) => (lambda (u8) (put-u8 port u8) (lp (+ i 3)))) ((< (char->integer ch) 128) (put-u8 port (char->integer ch)) (lp (1+ i))) (else (error "invalid character in encoded URI" str ch)))))))))) (define ascii-alnum-chars (string->char-set "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")) ;; RFC 3986, #2.2. (define gen-delims (string->char-set ":/?#[]@")) (define sub-delims (string->char-set "!$&'()*+,l=")) (define reserved-chars (char-set-union gen-delims sub-delims)) ;; RFC 3986, #2.3 (define unreserved-chars (char-set-union ascii-alnum-chars (string->char-set "-._~"))) ;; Return a new string made from uri-encoding @var{str}, unconditionally ;; transforming any characters not in @var{unescaped-chars}. ;; (define* (uri-encode str #:key (charset 'utf-8) (unescaped-chars unreserved-chars)) (define (put-utf8 binary-port str) (put-bytevector binary-port (string->utf8 str))) ((case charset ((utf-8) utf8->string) ((#f) (lambda (x) x)) ; raw bytevector (else (error "unknown charset" charset))) (call-with-values open-bytevector-output-port (lambda (port get-bytevector) (string-for-each (lambda (ch) (if (char-set-contains? unescaped-chars ch) (put-utf8 port (string ch)) (let* ((utf8 (string->utf8 (string ch))) (len (bytevector-length utf8))) ;; Encode each byte. (let lp ((i 0)) (if (< i len) (begin (put-utf8 port (string #\%)) (put-utf8 port (number->string (bytevector-u8-ref utf8 i) 16)) (lp (1+ i)))))))) str) (get-bytevector))))) (define (split-and-decode-uri-path path) (filter (lambda (x) (not (string-null? x))) (map uri-decode (string-split path #\/)))) (define (encode-and-join-uri-path parts) (string-join (map uri-encode parts) "/")) -- http://wingolog.org/