unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* header parsing
@ 2011-09-09 18:42 Ian Price
  2011-09-09 19:08 ` Ian Price
  2011-09-10 18:20 ` Andy Wingo
  0 siblings, 2 replies; 4+ messages in thread
From: Ian Price @ 2011-09-09 18:42 UTC (permalink / raw)
  To: bug-guile


Hi guilers,

Writing a quick script the other day I came across two issues when
guile parses http headers. The first is that 'parse-rfc-822-date' does
not allow for single digit days, which are allowed in rfc822/1123 [0].

Secondly, a non-guile problem, is that many sites don't quote their
etags. While I can (and did) account for this myself with
'declare-header!', I was wondering if it would be better to modify
'parse-qstring' to fallback to 'parse-opaque-string' if the start isn't
a #\" character. This would be in line with Postel's law[1], but the
commentary at the top of http.scm says there shouldn't be too many
compatibility hacks.

Patches are supplied for both.

[0]. https://tools.ietf.org/html/rfc822#section-5.1 (see 1*2DIGIT)
[1]. Be conservative in what you send; be liberal in what you accept.
-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"




^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: header parsing
  2011-09-09 18:42 header parsing Ian Price
@ 2011-09-09 19:08 ` Ian Price
  2011-09-10 18:20 ` Andy Wingo
  1 sibling, 0 replies; 4+ messages in thread
From: Ian Price @ 2011-09-09 19:08 UTC (permalink / raw)
  To: bug-guile

[-- Attachment #1: Type: text/plain, Size: 196 bytes --]


> Patches are supplied for both.

Apparently not. ;-)

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: date patch --]
[-- Type: text/x-patch, Size: 3128 bytes --]

From 8cde08a514ff1c0d4c09dbfd2d2ae50dc090db46 Mon Sep 17 00:00:00 2001
From: Ian Price <ianprice90@googlemail.com>
Date: Fri, 9 Sep 2011 20:02:34 +0100
Subject: [PATCH 1/2] RFC 822 allows single digit days of the month

* module/web/http.scm (parse-rfc-822-date): Add single digit day
  conditional.
* test-suite/tests/web-http.test("general headers"): Add test.
---
 module/web/http.scm            |   28 +++++++++++++++++++---------
 test-suite/tests/web-http.test |    3 +++
 2 files changed, 22 insertions(+), 9 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index 21874ee..70db813 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -702,15 +702,25 @@ ordered alist."
 ;; 0         1         2
 (define (parse-rfc-822-date str)
   ;; We could verify the day of the week but we don't.
-  (if (not (string-match? str "aaa, dd aaa dddd dd:dd:dd GMT"))
-      (bad-header 'date str))
-  (let ((date (parse-non-negative-integer str 5 7))
-        (month (parse-month str 8 11))
-        (year (parse-non-negative-integer str 12 16))
-        (hour (parse-non-negative-integer str 17 19))
-        (minute (parse-non-negative-integer str 20 22))
-        (second (parse-non-negative-integer str 23 25)))
-    (make-date 0 second minute hour date month year 0)))
+  (cond ((string-match? str "aaa, dd aaa dddd dd:dd:dd GMT")
+         (let ((date (parse-non-negative-integer str 5 7))
+               (month (parse-month str 8 11))
+               (year (parse-non-negative-integer str 12 16))
+               (hour (parse-non-negative-integer str 17 19))
+               (minute (parse-non-negative-integer str 20 22))
+               (second (parse-non-negative-integer str 23 25)))
+           (make-date 0 second minute hour date month year 0)))
+        ((string-match? str "aaa, d aaa dddd dd:dd:dd GMT")
+         (let ((date (parse-non-negative-integer str 5 6))
+               (month (parse-month str 7 10))
+               (year (parse-non-negative-integer str 11 15))
+               (hour (parse-non-negative-integer str 16 18))
+               (minute (parse-non-negative-integer str 19 21))
+               (second (parse-non-negative-integer str 22 24)))
+           (make-date 0 second minute hour date month year 0)))
+        (else
+         (bad-header 'date str)         ; prevent tail call
+         #f)))
 
 ;; RFC 850, updated by RFC 1036
 ;; Sunday, 06-Nov-94 08:49:37 GMT
diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test
index c191c6e..e4d6efb 100644
--- a/test-suite/tests/web-http.test
+++ b/test-suite/tests/web-http.test
@@ -89,6 +89,9 @@
   (pass-if-parse date "Tue, 15 Nov 1994 08:12:31 GMT"
                  (string->date "Tue, 15 Nov 1994 08:12:31 +0000"
                                "~a, ~d ~b ~Y ~H:~M:~S ~z"))
+  (pass-if-parse date "Wed, 7 Sep 2011 11:25:00 GMT"
+                 (string->date "Wed, 7 Sep 2011 11:25:00 +0000"
+                               "~a,~e ~b ~Y ~H:~M:~S ~z"))
   (pass-if-parse-error date "Tue, 15 Nov 1994 08:12:31 EST" date)
   (pass-if-any-error date "Tue, 15 Qux 1994 08:12:31 EST")
 
-- 
1.7.6


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #3: qstring patch --]
[-- Type: text/x-patch, Size: 3024 bytes --]

From 2899c848e7ef82cd03b3a65b48ed3e74c1caa4a0 Mon Sep 17 00:00:00 2001
From: Ian Price <ianprice90@googlemail.com>
Date: Fri, 9 Sep 2011 20:05:13 +0100
Subject: [PATCH 2/2] Allow unquoted Etags.

* module/web/http.scm (parse-qstring): If qstring doesn't start with a
  #\", then treat as opaque.
* test-suite/tests/web-http.test("response headers"): Add test.
---
 module/web/http.scm            |   34 ++++++++++++++++++----------------
 test-suite/tests/web-http.test |    1 +
 2 files changed, 19 insertions(+), 16 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index 70db813..82c1f87 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -317,22 +317,24 @@ ordered alist."
 (define* (parse-qstring str #:optional
                         (start 0) (end (trim-whitespace str start))
                         #:key incremental?)
-  (if (and (< start end) (eqv? (string-ref str start) #\"))
-      (let lp ((i (1+ start)) (qi 0) (escapes '()))
-        (if (< i end)
-            (case (string-ref str i)
-              ((#\\)
-               (lp (+ i 2) (1+ qi) (cons qi escapes)))
-              ((#\")
-               (let ((out (collect-escaped-string str (1+ start) qi escapes)))
-                 (if incremental?
-                     (values out (1+ i))
-                     (if (= (1+ i) end)
-                         out
-                         (bad-header-component 'qstring str)))))
-              (else
-               (lp (1+ i) (1+ qi) escapes)))
-            (bad-header-component 'qstring str)))
+  (if (and (< start end))
+      (if (eqv? (string-ref str start) #\")
+          (let lp ((i (1+ start)) (qi 0) (escapes '()))
+            (if (< i end)
+                (case (string-ref str i)
+                  ((#\\)
+                   (lp (+ i 2) (1+ qi) (cons qi escapes)))
+                  ((#\")
+                   (let ((out (collect-escaped-string str (1+ start) qi escapes)))
+                     (if incremental?
+                         (values out (1+ i))
+                         (if (= (1+ i) end)
+                             out
+                             (bad-header-component 'qstring str)))))
+                  (else
+                   (lp (1+ i) (1+ qi) escapes)))
+                (bad-header-component 'qstring str)))
+          (parse-opaque-string (substring str start end)))
       (bad-header-component 'qstring str)))
 
 (define (write-list l port write-item delim)
diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test
index e4d6efb..4591cd1 100644
--- a/test-suite/tests/web-http.test
+++ b/test-suite/tests/web-http.test
@@ -199,6 +199,7 @@
   (pass-if-parse accept-ranges "foo,bar" '(foo bar))
   (pass-if-parse age "30" 30)
   (pass-if-parse etag "\"foo\"" '("foo" . #t))
+  (pass-if-parse etag "1315389780000" '("1315389780000" . #t))
   (pass-if-parse etag "W/\"foo\"" '("foo" . #f))
   (pass-if-parse location "http://other-place"
                  (build-uri 'http #:host "other-place"))
-- 
1.7.6


^ permalink raw reply related	[flat|nested] 4+ messages in thread

* Re: header parsing
  2011-09-09 18:42 header parsing Ian Price
  2011-09-09 19:08 ` Ian Price
@ 2011-09-10 18:20 ` Andy Wingo
  2011-09-10 20:02   ` Ian Price
  1 sibling, 1 reply; 4+ messages in thread
From: Andy Wingo @ 2011-09-10 18:20 UTC (permalink / raw)
  To: Ian Price; +Cc: bug-guile

Hi Ian,

On Fri 09 Sep 2011 11:42, Ian Price <ianprice90@googlemail.com> writes:

> Writing a quick script the other day I came across two issues when
> guile parses http headers. The first is that 'parse-rfc-822-date' does
> not allow for single digit days, which are allowed in rfc822/1123 [0].

Applied, thanks.

> Secondly, a non-guile problem, is that many sites don't quote their
> etags. While I can (and did) account for this myself with
> 'declare-header!', I was wondering if it would be better to modify
> 'parse-qstring' to fallback to 'parse-opaque-string' if the start isn't
> a #\" character. This would be in line with Postel's law[1], but the
> commentary at the top of http.scm says there shouldn't be too many
> compatibility hacks.

I'm a bit hesitant on this one, as parse-qstring is used in so many
other places.  Perhaps change this one to add parse-opaque-or-qstring or
something, and call that from parse-etag?  Dunno.  Are there really many
sites that do this to their etags?

Cheers,

Andy
-- 
http://wingolog.org/



^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: header parsing
  2011-09-10 18:20 ` Andy Wingo
@ 2011-09-10 20:02   ` Ian Price
  0 siblings, 0 replies; 4+ messages in thread
From: Ian Price @ 2011-09-10 20:02 UTC (permalink / raw)
  To: Andy Wingo; +Cc: bug-guile

Andy Wingo <wingo@pobox.com> writes:

> I'm a bit hesitant on this one, as parse-qstring is used in so many
> other places.  Perhaps change this one to add parse-opaque-or-qstring or
> something, and call that from parse-etag?  Dunno.  Are there really many
> sites that do this to their etags?

With the script in question it seemed to happen about 10% of the time,
but, I decided to write a quick and dirty script to see how many sites
did this. I used the top 30 sites according to alexa.org, top 30 on
reddit as of now, and the contents of my RSS reader (not necessarily
representative, of course), but I actually only got 1 wrong etag. Maybe
I just got unlucky yesterday.*shrugs*. Best just to ignore the patch,
but thanks anyway.

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"



^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2011-09-10 20:02 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-09-09 18:42 header parsing Ian Price
2011-09-09 19:08 ` Ian Price
2011-09-10 18:20 ` Andy Wingo
2011-09-10 20:02   ` Ian Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).