unofficial mirror of guile-user@gnu.org 
* HTTP Request/Response questions
@ 2011-11-06  5:49 R. P. Dillon
  2011-11-06 11:18 ` Ian Price
  0 siblings, 1 reply; 7+ messages in thread
From: R. P. Dillon @ 2011-11-06  5:49 UTC
  To: guile-user

I'm currently working on a project to gather RSS data using Guile.  I've
been working with both the stable 2.0.3 version and the latest git
repository.  I'm fairly new to Guile, though, so I might be approaching
this the wrong way.

As a test, I wanted to make an HTTP request.  This is a series of commands
I executed in the REPL to accomplish this (using Geiser in Emacs 24):

(use-modules (web request) (web response) (web uri) (rnrs bytevectors))

(define port (socket PF_INET SOCK_STREAM 0))
(define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http"))))
(connect port address)
(define request (build-request (build-uri 'http #:host "www.google.com")))
(write-request request port)
(define response (read-response port))

(read-response ...) consistently fails with Google:

web/http.scm:754:6: In procedure parse-asctime-date:
web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'.

The expiration is set to -1 in the headers, and this seems to cause a
problem for the web libraries in Guile.
This same request seems to work well for my own domain (killring.org).

I attempted a very similar series of commands to get RSS data for Google
News:

(define port (socket PF_INET SOCK_STREAM 0))
(define address (addrinfo:addr (car (getaddrinfo "news.google.com"
"http"))))
(connect port address)
(define request (build-request (build-uri 'http #:host "news.google.com"
#:path "/news?pz=1&cf=all&ned=us&hl=en&output=rss")))
(write-request request port)
(define response (read-response port))
(define body-vec (read-response-body response))

In this case, the (read-response-body ...) returns #f, although when I
pulled the data manually, there was XML data present in the body of the
response.

Similarly, when getting RSS information from Slashdot:

(define port (socket PF_INET SOCK_STREAM 0))
(define address (addrinfo:addr (car (getaddrinfo "rss.slashdot.org"
"http"))))
(connect port address)
(define request (build-request (build-uri 'http #:host "rss.slashdot.org"
#:path "/Slashdot/slashdot")))
(write-request request port)
(define response (read-response port))

I get the following error when reading the response:

web/http.scm:814:12: In procedure parse-entity-tag:
web/http.scm:814:12: Throw to key `bad-header' with args `(qstring
"F+oOJMkOlp2n1IUbAJmq+7qCGuk")'.

which I haven't fully tracked down yet.

I have a feeling I'm using the API incorrectly, though I've pored over the
documentation the best I can to figure out how to make these requests and
parse the responses.  Short of writing my own implementation, is there
anything I should be doing to make this work?

Thanks,
Rick

* Re: HTTP Request/Response questions
  2011-11-06  5:49 HTTP Request/Response questions R. P. Dillon
@ 2011-11-06 11:18 ` Ian Price
  2011-11-06 16:55   ` R. P. Dillon
  2012-01-09 19:16   ` Andy Wingo
  0 siblings, 2 replies; 7+ messages in thread
From: Ian Price @ 2011-11-06 11:18 UTC
  To: R. P. Dillon; +Cc: guile-user

"R. P. Dillon" <rpdillon@gmail.com> writes:

> I'm currently working on a project to gather RSS data using Guile.  I've been
I've done that. I highly recommend sxpath for this job.

> working with both the stable 2.0.3 version and the latest git repository.  I'm
> fairly new to Guile, though, so I might be approaching this the wrong way.
>
> As a test, I wanted to make an HTTP request.  This is a series of commands I
> executed in the REPL to accomplish this (using Geiser in Emacs 24):
>
> (use-modules (web request) (web response) (web uri) (rnrs bytevectors))
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "www.google.com")))
> (write-request request port)
> (define response (read-response port))
>
> (read-response ...) consistently fails with Google:
>
> web/http.scm:754:6: In procedure parse-asctime-date:
> web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'.
I can confirm this with (call-with-input-string "Date: -1\r\n\r\n" parse-headers)

>
> The expiration is set to -1 in the headers, and this seems to cause a problem
> for the web libraries in Guile.
This is not, IIRC, a valid Date header, but is this a common value? If so,
it may be worth making an exception for it.

> This same request seems to work well for my own domain (killring.org).
>
> I attempted a very similar series of commands to get RSS data for Google News:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "news.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "news.google.com"
> #:path "/news?pz=1&cf=all&ned=us&hl=en&output=rss")))
> (write-request request port)
> (define response (read-response port))
> (define body-vec (read-response-body response))
>
> In this case, the (read-response-body ...) returns #f, although when I pulled
> the data manually, there was XML data present in the body of the response.
I have also experienced this problem. read-response-body returns #f if
there is no content-length header, which usually means chunked
encoding.

I have a patch to deal with this, but I have not received any
feedback on my proposed functions, so I haven't posted it
yet. Basically, I wanted to add 4 functions, including a
read-chunked-response-body, and to have the (web client) handle
chunked-encoding transparently.

>
> Similarly, when getting RSS information from Slashdot:
>
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "rss.slashdot.org" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "rss.slashdot.org"
> #:path "/Slashdot/slashdot")))
> (write-request request port)
> (define response (read-response port))
>
> I get the following error when reading the response:
>
> web/http.scm:814:12: In procedure parse-entity-tag:
> web/http.scm:814:12: Throw to key `bad-header' with args `(qstring
> "F+oOJMkOlp2n1IUbAJmq+7qCGuk")'.
>
> which I haven't fully tracked down yet.
I came across this issue already, and in my case it was because some servers
(gws, I think) don't quote their Etags. Feedburner was a common
culprit. All in all, not common, but a nuisance. Using 'declare-header!'
from the (web http) library, you can cause Etags not to be parsed by doing

(declare-header! "Etag" values string? display)

Although, I'd think it much nicer if guile were to expose
declare-opaque-header! directly for these sorts of circumstances.
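
For illustration, a helper of the kind wished for above could be a thin
wrapper over declare-header!. This is only a sketch: the name
declare-opaque-header!* is made up here (nothing in this thread defines
it), and it simply reuses the three arguments from the Etag workaround.

```scheme
(use-modules (web http))

;; Hypothetical helper (name invented for this sketch): register NAME as a
;; header whose value is kept as the raw string instead of being parsed.
(define (declare-opaque-header!* name)
  (declare-header! name
                   values    ; parser: return the header text unchanged
                   string?   ; validator: any string is acceptable
                   display)) ; writer: called as (display value port)

;; Same effect as the one-liner above, applied to the unquoted Etag case.
(declare-opaque-header!* "Etag")
(parse-header 'etag "F+oOJMkOlp2n1IUbAJmq+7qCGuk")
;; => "F+oOJMkOlp2n1IUbAJmq+7qCGuk"
```

The trade-off is that any code expecting a parsed entity tag from this
header will now receive a plain string.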

>
> I have a feeling I'm using the API incorrectly, though I've pored over the
> documentation the best I can to figure out how to make these requests and
> parse the responses.  Short of writing my own implementation, is there
> anything I should be doing to make this work?
No no, you're using it right :) Although the (web client) module will be
more convenient usually. For example,

scheme@(guile-user)> ,use (web client)
scheme@(guile-user)> http-get
$11 = #<procedure http-get (uri #:key port version keep-alive? extra-headers decode-body?)>
scheme@(guile-user)> (http-get (string->uri "http://www.google.com"))
$12 = #<<response> version: (1 . 1) code: 302 reason-phrase: "Found" headers: ((location . #<<uri> scheme: http userinfo: #f host: "www.google.co.uk" port: #f path: "/" query: #f fragment: #f>) (cache-control private) (content-type text/html (charset . "UTF-8")) (set-cookie . "PREF=ID=3c2c9fc50c288823:FF=0:TM=1320578334:LM=1320578334:S=Gtrhd05V1tRopJyZ; expires=Tue, 05-Nov-2013 11:18:54 GMT; path=/; domain=.google.com") (date . #<date nanosecond: 0 second: 54 minute: 18 hour: 11 day: 6 month: 11 year: 2011 zone-offset: 0>) (server . "gws") (content-length . 221) (x-xss-protection . "1; mode=block") (x-frame-options . "SAMEORIGIN") (connection close)) port: #<closed: file 0>>
$13 = "<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">
<TITLE>302 Moved</TITLE></HEAD><BODY>
<H1>302 Moved</H1>
The document has moved
<A HREF=\"http://www.google.co.uk/\">here</A>.\r
</BODY></HTML>\r
"
scheme@(guile-user)> 

>
> Thanks,
> Rick
>

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"



* Re: HTTP Request/Response questions
  2011-11-06 11:18 ` Ian Price
@ 2011-11-06 16:55   ` R. P. Dillon
  2011-11-06 21:04     ` Ian Price
  2012-01-09 19:16   ` Andy Wingo
  1 sibling, 1 reply; 7+ messages in thread
From: R. P. Dillon @ 2011-11-06 16:55 UTC
  To: Ian Price; +Cc: guile-user

Thanks for your response, Ian.  I don't know how I missed the (web client)
module, but it's right there in my info page.

I've been experimenting with it, but am having problems similar to those
outlined below.  I'm going to start reading some of the code, but my
initial impression is that the servers I'm testing on (Google, CNN)
interpret (or at least execute) the specs loosely, and that this is
what causes the errors, e.g.

(http-get (string->uri "http://www.cnn.com"))

yields:

web/client.scm:109:4: In procedure http-get:
web/client.scm:109:4: Throw to key `bad-response' with args `("EOF while
reading response body: ~a bytes of ~a" (18576 106274))'.

In web/client.scm:
    109:4  0 (http-get #<<uri> scheme: http userinfo: #f host: "www.cnn.com"
port: #f path: "" query: #f fragment: #f> #:port #<input-o…> …)

In your google.com web client example, the request seemed to return the
body of the document, but I'm still encountering the -1 expiration problem.
(Guile 2.0.3, though I think I'll go back to the git repo if I can work
around a recent compilation error that showed up).

It might be useful for me to see if I can make the parsing functions more
permissive, since they are (correctly) throwing errors for some common
servers.  Unfortunately, I don't know that much about the innards of HTTP,
but I'm sure I can look at where the errors are generated and short circuit
some of the logic and see what happens.  =)

Thanks for your help with this.

Rick

* Re: HTTP Request/Response questions
  2011-11-06 16:55   ` R. P. Dillon
@ 2011-11-06 21:04     ` Ian Price
  2011-11-07  8:24       ` Thien-Thi Nguyen
  0 siblings, 1 reply; 7+ messages in thread
From: Ian Price @ 2011-11-06 21:04 UTC
  To: R. P. Dillon; +Cc: guile-user

"R. P. Dillon" <rpdillon@gmail.com> writes:

> (http-get (string->uri "http://www.cnn.com"))
>
> yields:
>
> web/client.scm:109:4: In procedure http-get:
> web/client.scm:109:4: Throw to key `bad-response' with args `("EOF while
> reading response body: ~a bytes of ~a" (18576 106274))'.
>
> In web/client.scm:
>     109:4  0 (http-get #<<uri> scheme: http userinfo: #f host: "www.cnn.com"
> port: #f path: "" query: #f fragment: #f> #:port #<input-o…> …)
I see, http-get by default sends a "Connection: close" header, which is
probably responsible for this behaviour. Using the keep-alive keyword
argument should rectify this.

  (http-get (string->uri "http://www.cnn.com") #:keep-alive? #t)

> In your google.com web client example, the request seemed to return the body
> of the document, but I'm still encountering the -1 expiration problem. (Guile
> 2.0.3, though I think I'll go back to the git repo if I can work around a
> recent compilation error that showed up).
If you aren't needing the date header, then I'd suggest doing the same
for the date header as I did for the etag header. It's a band-aid, but
I'm not really sure why you'd be getting a -1 date.
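
As a sketch of that band-aid, assuming the header at fault is Expires
(Rick's description says "the expiration is set to -1"; substitute
"Date" if that turns out to be the real culprit), the declaration and a
quick check against a string port might look like this:

```scheme
(use-modules (web http))

;; Assumption: the offending header is Expires. Keep its value as a raw
;; string rather than parsing it as a date.
(declare-header! "Expires" values string? display)

;; Headers shaped like the problematic response now read without an error.
(define headers
  (call-with-input-string "Expires: -1\r\n\r\n" read-headers))

(assq-ref headers 'expires)
;; => "-1"
```

After this declaration the header's value is a string everywhere in the
process, so anything that wants a real expiry time has to parse it
itself.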

> Thanks for your help with this.
No problem.

I've also attached a patch for _reading_ chunk-encoded data. It will
also modify http-get to handle that for you.


Other Guilers,

If you use the web modules, _please_ comment on my suggestions for
chunked encoding support. See
http://article.gmane.org/gmane.lisp.guile.devel/12814 for details.

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"


[-- Attachment #2: patch for guile --]
[-- Type: text/x-patch, Size: 4784 bytes --]

From f58482fcae11690b23924334f7b89ba136a7fddc Mon Sep 17 00:00:00 2001
From: Ian Price <ianprice90@googlemail.com>
Date: Sun, 6 Nov 2011 20:42:25 +0000
Subject: [PATCH] Add support for transfer-encoded responses

---
 module/web/client.scm              |    4 ++-
 module/web/response.scm            |   46 ++++++++++++++++++++++++++++++++++++
 test-suite/tests/web-response.test |   25 +++++++++++++++++++
 3 files changed, 74 insertions(+), 1 deletions(-)

diff --git a/module/web/client.scm b/module/web/client.scm
index 6a04497..78d5201 100644
--- a/module/web/client.scm
+++ b/module/web/client.scm
@@ -107,7 +107,9 @@
     (if (not keep-alive?)
         (shutdown port 1))
     (let* ((res (read-response port))
-           (body (read-response-body res)))
+           (body (if (member '(chunked) (response-transfer-encoding res))
+                     (read-chunked-response-body res)
+                     (read-response-body res))))
       (if (not keep-alive?)
           (close-port port))
       (values res
diff --git a/module/web/response.scm b/module/web/response.scm
index 6283772..e24ac0b 100644
--- a/module/web/response.scm
+++ b/module/web/response.scm
@@ -20,6 +20,8 @@
 ;;; Code:
 
 (define-module (web response)
+  #:use-module (srfi srfi-1)
+  #:use-module (rnrs control)
   #:use-module (rnrs bytevectors)
   #:use-module (ice-9 binary-ports)
   #:use-module (ice-9 rdelim)
@@ -39,6 +41,7 @@
             read-response-body
             write-response-body
 
+            read-chunked-response-body
             ;; General headers
             ;;
             response-cache-control
@@ -230,6 +233,49 @@ on @var{port}, perhaps using some transfer encoding."
 response @var{r}."
   (put-bytevector (response-port r) bv))
 
+
+(define (read-chunk-header port)
+  (let* ((str (read-line port))
+         (extension-start (string-index str (lambda (c) (or (char=? c #\;)
+                                                       (char=? c #\return)))))
+         (size (string->number (if extension-start ; unnecessary?
+                                   (substring str 0 extension-start)
+                                   str)
+                               16)))
+    size))
+
+(define (read-chunk port)
+  (let ((size (read-chunk-header port)))
+    (read-chunk-body port size)))
+
+(define (read-chunk-body port size)
+  (let ((bv (get-bytevector-n port size)))
+    (get-u8 port)                       ; CR
+    (get-u8 port)                       ; LF
+    bv))
+
+(define (read-chunked-response-body r)
+  (let ((port (response-port r)))
+    (let loop ((chunks '()))
+      (let ((chunk (read-chunk port)))
+        (if (zero? (bytevector-length chunk))
+            (bytevector-concatenate (reverse! chunks))
+            (loop (cons chunk chunks)))))))
+
+(define (bytevector-concatenate bvs)
+  (let* ((total-length (fold (lambda (bv total)
+                               (+ (bytevector-length bv) total))
+                             0
+                             bvs))
+         (result (make-bytevector total-length)))
+    (let loop ((start 0) (bvs bvs))
+      (unless (null? bvs)
+        (let ((len (bytevector-length (car bvs))))
+          (bytevector-copy! (car bvs) 0 result start len)
+          (loop (+ start len) (cdr bvs)))))
+    result))
+
+
 (define-syntax define-response-accessor
   (lambda (x)
     (syntax-case x ()
diff --git a/test-suite/tests/web-response.test b/test-suite/tests/web-response.test
index a21a702..bc55704 100644
--- a/test-suite/tests/web-response.test
+++ b/test-suite/tests/web-response.test
@@ -40,6 +40,19 @@ Content-Type: text/html; charset=utf-8\r
 \r
 abcdefghijklmnopqrstuvwxyz0123456789")
 
+(define example-2
+  "HTTP/1.1 200 OK\r
+Transfer-Encoding: chunked\r
+Content-Type: text/plain\r
+\r
+1c\r
+Lorem ipsum dolor sit amet, \r
+1d\r
+consectetur adipisicing elit,\r
+43\r
+ sed do eiusmod tempor incididunt ut labore et dolore magna aliqua.\r
+0\r\n")
+
 (define (responses-equal? r1 body1 r2 body2)
   (and (equal? (response-version r1) (response-version r2))
        (equal? (response-code r1) (response-code r2))
@@ -100,3 +113,15 @@ abcdefghijklmnopqrstuvwxyz0123456789")
 
     (pass-if "by accessor"
       (equal? (response-content-encoding r) '(gzip)))))
+
+
+(with-test-prefix "example-2"
+  (let* ((r (read-response (open-input-string example-2)))
+         (b (read-chunked-response-body r)))
+    (pass-if (equal? '((chunked))
+                     (response-transfer-encoding r)))
+    (pass-if (equal? b
+                     (string->utf8
+                      (string-append
+                       "Lorem ipsum dolor sit amet, consectetur adipisicing elit,"
+                       " sed do eiusmod tempor incididunt ut labore et dolore magna aliqua."))))))
-- 
1.7.6.4


* Re: HTTP Request/Response questions
  2011-11-06 21:04     ` Ian Price
@ 2011-11-07  8:24       ` Thien-Thi Nguyen
  2011-11-08 18:46         ` Ian Price
  0 siblings, 1 reply; 7+ messages in thread
From: Thien-Thi Nguyen @ 2011-11-07  8:24 UTC
  To: Ian Price; +Cc: guile-user

() Ian Price <ianprice90@googlemail.com>
() Sun, 06 Nov 2011 21:04:31 +0000

   If you use the web modules, _please_ comment on my suggestions for
   chunked encoding support. See

I don't use those modules (yet?), but i did notice something:

   +         (extension-start (string-index str (lambda (c) (or (char=? c #\;)
   +                                                       (char=? c #\return)))))

SRFI 13 ‘string-index’ takes a character-set object as well as a predicate.
Going that way could be more efficient (if you pre-construct the charset).
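
A minimal sketch of that variant, for comparison with the quoted lines.
The names chunk-size-delimiters, chunk-size-field and parse-chunk-size
are invented for this example and are not part of the patch; the input
is assumed to be a chunk header line as read-line returns it (trailing
carriage return still attached).

```scheme
;; char-set (SRFI 14) and string-index (SRFI 13) are available in Guile
;; without importing anything.

;; Built once, then reused for every chunk header.
(define chunk-size-delimiters (char-set #\; #\return))

;; Return the hexadecimal size field, dropping any ";extension" part and
;; the trailing carriage return.
(define (chunk-size-field line)
  (let ((end (string-index line chunk-size-delimiters)))
    (if end
        (substring line 0 end)
        line)))

;; Chunk sizes are written in base 16.
(define (parse-chunk-size line)
  (string->number (chunk-size-field line) 16))

(parse-chunk-size "1c\r")             ; => 28
(parse-chunk-size "43;name=value\r")  ; => 67
```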



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: HTTP Request/Response questions
  2011-11-07  8:24       ` Thien-Thi Nguyen
@ 2011-11-08 18:46         ` Ian Price
  0 siblings, 0 replies; 7+ messages in thread
From: Ian Price @ 2011-11-08 18:46 UTC
  To: Thien-Thi Nguyen; +Cc: guile-user

Thien-Thi Nguyen <ttn@gnuvola.org> writes:

> SRFI 13 ‘string-index’ takes a character-set object as well as a predicate.
> Going that way could be more efficient (if you pre-construct the charset).
I'll make sure to try it out, but I don't think it would make much of
a difference.

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardry Compiled"



* Re: HTTP Request/Response questions
  2011-11-06 11:18 ` Ian Price
  2011-11-06 16:55   ` R. P. Dillon
@ 2012-01-09 19:16   ` Andy Wingo
  1 sibling, 0 replies; 7+ messages in thread
From: Andy Wingo @ 2012-01-09 19:16 UTC
  To: Ian Price; +Cc: guile-user

On Sun 06 Nov 2011 12:18, Ian Price <ianprice90@googlemail.com> writes:

> I have a patch to deal with this, but I have not received any
> feedback on my proposed functions, so I haven't posted it
> yet. Basically, I wanted to add 4 functions, including a
> read-chunked-response-body, and to have the (web client) handle
> chunked-encoding transparently.

How about not exporting read-chunked-response-body, but instead having
read-response-body check for chunked encoding itself?
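
One way to picture that dispatch, as a sketch only: chunked-response?
and read-body/auto are names invented here, and the two reader
procedures are passed in as arguments because the chunked reader exists
only in the patch posted earlier in this thread, not in a released
module.

```scheme
(use-modules (web response))

;; True when the response declares "Transfer-Encoding: chunked", using
;; the same membership test as the patch to (web client).
(define (chunked-response? r)
  (and (member '(chunked) (response-transfer-encoding r)) #t))

;; Pick a body reader from the response headers, so callers never have
;; to choose. READ-PLAIN and READ-CHUNKED each take the response object.
(define (read-body/auto r read-plain read-chunked)
  ((if (chunked-response? r) read-chunked read-plain) r))

;; Header-only responses are enough to exercise the dispatch.
(define chunked (build-response #:headers '((transfer-encoding . ((chunked))))))
(define plain   (build-response))

(chunked-response? chunked)  ; => #t
(chunked-response? plain)    ; => #f
```

Folding the same check into read-response-body itself would keep the
exported interface at a single procedure.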

>> (define request (build-request (build-uri 'http #:host "rss.slashdot.org"
>> #:path "/Slashdot/slashdot")))
>> (write-request request port)
>> (define response (read-response port))
>>
>> I get the following error when reading the response:
>>
>> web/http.scm:814:12: In procedure parse-entity-tag:
>> web/http.scm:814:12: Throw to key `bad-header' with args `(qstring
>> "F+oOJMkOlp2n1IUbAJmq+7qCGuk")'.

> I came across this issue already, and in my case it was because some servers
> (gws, I think) don't quote their Etags. Feedburner was a common
> culprit. All in all, not common, but a nuisance. Using 'declare-header!'
> from the (web http) library, you can cause Etags not to be parsed by doing
>
> (declare-header! "Etag" values string? display)

Should we fix the etag parser to allow unquoted strings?

> Although, I'd think it much nicer if guile were to expose
> declare-opaque-header! directly for these sorts of circumstances.

If you document it, sure.

Cheers,

Andy
-- 
http://wingolog.org/


