unofficial mirror of bug-guile@gnu.org 
 help / color / mirror / Atom feed
* HTTP "Expires" header should handle non-date values
@ 2011-11-27 10:39 Daniel Hartwig
  2011-12-22  2:51 ` bug#10147: " Andy Wingo
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Hartwig @ 2011-11-27 10:39 UTC (permalink / raw)
  To: R. P. Dillon; +Cc: bug-guile, guile-user

[-- Attachment #1: Type: text/plain, Size: 1951 bytes --]

Package: guile
Version: 2.0.3
Tags: patch

On 6 November 2011 13:49, R. P. Dillon <rpdillon@gmail.com> wrote:
> (use-modules (web request) (web response) (web uri) (rnrs bytevectors))
> (define port (socket PF_INET SOCK_STREAM 0))
> (define address (addrinfo:addr (car (getaddrinfo "www.google.com" "http"))))
> (connect port address)
> (define request (build-request (build-uri 'http #:host "www.google.com")))
> (write-request request port)
> (define response (read-response port))
> (read-response ...) consistently fails with Google:
> web/http.scm:754:6: In procedure parse-asctime-date:
> web/http.scm:754:6: Throw to key `bad-header' with args `(date "-1")'.
> The expiration is set to -1 in the headers, and this seems to cause a
> problem for the web libraries in Guile.
> This same request seems to work well for my own domain (killring.org).

This is definitely a bug on Guile's part, HTTP/1.1 permits such values
for "Expires" headers [1], treating them as though they were a date in
the past:

   HTTP/1.1 clients and caches MUST treat other invalid date formats,
   especially including the value "0", as in the past (i.e., "already
   expired").

[1] http://tools.ietf.org/html/rfc2616#section-14.21

Attached patch permits non-date values for "Expires", leaving them as
strings (preferable, as such responses can be transparently forwarded
to other clients). The staleness of a response could be determined
quite crudely, e.g.

(define (response-stale? r)
  (let ((expires (response-expires r)))
    (and expires
         (or (not (date? expires)) ;; Indicates already expired.
             (time<=? (date->time-utc expires)
                      (current-time))))))

This approach completely ignores the recommended way of determining
whether a response has expired.  See section 13.2.4 of the RFC for
calculations involving various factors such as the time that a request
was sent, "Cache-Control" directives, etc.


Regards

Daniel

[-- Attachment #2: 0001-Permit-non-date-values-for-Expires-header.patch --]
[-- Type: text/x-patch, Size: 1833 bytes --]

From dcbcd94db89a8e73ebc3089a860575bbeb8d1708 Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Sun, 27 Nov 2011 18:25:55 +0800
Subject: [PATCH] Permit non-date values for "Expires" header.

* module/web/http.scm ("Expires"): Permit non-date values.
---
 module/web/http.scm            |   20 +++++++++++++++++++-
 test-suite/tests/web-http.test |    1 +
 2 files changed, 20 insertions(+), 1 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index dc742a1..5eb2e90 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -1500,7 +1500,25 @@ phrase\"."
 
 ;; Expires = HTTP-date
 ;;
-(declare-date-header! "Expires")
+;; ...
+;; HTTP/1.1 clients and caches MUST treat other invalid date formats,
+;; especially including the value "0", as in the past (i.e., "already
+;; expired").
+;;
+(declare-header! "Expires"
+  (lambda (str)
+    (or (false-if-exception (parse-date str))
+        str))
+  (lambda (val)
+    (or (string? val) (date? val)))
+  (lambda (val port)
+    (cond
+     ((string? val)
+      (display val port))
+     ((date? val)
+      (write-date val port))
+     (else
+      (bad-header-component 'expires val)))))
 
 ;; Last-Modified = HTTP-date
 ;;
diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test
index b6abbf3..c881223 100644
--- a/test-suite/tests/web-http.test
+++ b/test-suite/tests/web-http.test
@@ -134,6 +134,7 @@
   (pass-if-parse expires "Tue, 15 Nov 1994 08:12:31 GMT"
                  (string->date "Tue, 15 Nov 1994 08:12:31 +0000"
                          "~a, ~d ~b ~Y ~H:~M:~S ~z"))
+  (pass-if-parse expires "0" "0")
   (pass-if-parse last-modified "Tue, 15 Nov 1994 08:12:31 GMT"
                  (string->date "Tue, 15 Nov 1994 08:12:31 +0000"
                          "~a, ~d ~b ~Y ~H:~M:~S ~z")))
-- 
1.7.2.5


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* bug#10147: HTTP "Expires" header should handle non-date values
  2011-11-27 10:39 HTTP "Expires" header should handle non-date values Daniel Hartwig
@ 2011-12-22  2:51 ` Andy Wingo
  2011-12-22  4:28   ` Daniel Hartwig
  0 siblings, 1 reply; 6+ messages in thread
From: Andy Wingo @ 2011-12-22  2:51 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: R. P. Dillon, guile-user, 10147

Hi Daniel,

So sorry for the delay.

On Sun 27 Nov 2011 05:39, Daniel Hartwig <mandyke@gmail.com> writes:

> This is definitely a bug on Guile's part, HTTP/1.1 permits such values
> for "Expires" headers [1], treating them as though they were a date in
> the past:
>
>    HTTP/1.1 clients and caches MUST treat other invalid date formats,
>    especially including the value "0", as in the past (i.e., "already
>    expired").
>
> [1] http://tools.ietf.org/html/rfc2616#section-14.21

But that's right after saying

   The format is an absolute date and time as defined by HTTP-date in
   section 3.3.1; it MUST be in RFC 1123 date format:

      Expires = "Expires" ":" HTTP-date

But, pragmatism may rule, here...

> Attached patch permits non-date values for "Expires", leaving them as
> strings (preferable, as such responses can be transparently forwarded
> to other clients). The staleness of a response could be determined
> quite crudely, e.g.
>
> (define (response-stale? r)
>   (let ((expires (response-expires r)))
>     (and expires
>          (or (not (date? expires)) ;; Indicates already expired.
>              (time<=? (date->time-utc expires)
>                       (current-time))))))

Let us assume that it is a good idea to include this hack.  Wouldn't it
be better to keep the expires header as a date?  Would any date in the
past work fine?

Would it be best to allow some special cases like "0" or "-1" instead?

I'm just trying to limit the damage here :)  WDYT?

Andy
-- 
http://wingolog.org/





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#10147: HTTP "Expires" header should handle non-date values
  2011-12-22  2:51 ` bug#10147: " Andy Wingo
@ 2011-12-22  4:28   ` Daniel Hartwig
  2011-12-22 12:35     ` Andy Wingo
  0 siblings, 1 reply; 6+ messages in thread
From: Daniel Hartwig @ 2011-12-22  4:28 UTC (permalink / raw)
  To: Andy Wingo; +Cc: R. P. Dillon, guile-user, 10147

On 22 December 2011 10:51, Andy Wingo <wingo@pobox.com> wrote:
>
> On Sun 27 Nov 2011 05:39, Daniel Hartwig <mandyke@gmail.com> writes:
>
>> This is definitely a bug on Guile's part, HTTP/1.1 permits such values
>> for "Expires" headers [1], treating them as though they were a date in
>> the past:
>>
>>    HTTP/1.1 clients and caches MUST treat other invalid date formats,
>>    especially including the value "0", as in the past (i.e., "already
>>    expired").
>>
>> [1] http://tools.ietf.org/html/rfc2616#section-14.21
>
> But that's right after saying
>
>   The format is an absolute date and time as defined by HTTP-date in
>   section 3.3.1; it MUST be in RFC 1123 date format:
>
>      Expires = "Expires" ":" HTTP-date
>
> But, pragmatism may rule, here...
>

... especially given that players like Google are using "-1" ;-)

> ... Wouldn't it
> be better to keep the expires header as a date?  Would any date in the
> past work fine?
>

That is what I initially considered.  A great solution because it
requires no changes to existing code which would be expecting a date
value.  I think any date would do, ideally it would match the value of
the Date header, but the current parsing code does not allow for
access to it.  I considered using a single, well defined value for
date-in-the-past (Unix epoch).

The *only* concern I had with this approach is wrt implementing a
cache/proxy.  My idea was that by storing the non-date values as a
string you can store/forward these unmodified and still check for the
"already expired" condition.

Admitedly this is a very minor concern, as there is no change in
semantics at the protocol level -- both approaches result in the
client understanding that the content is already expired.

I think what I came up with was a solution in need of a problem
(which should be solved more generally across the whole module
any way).


> Would it be best to allow some special cases like "0" or "-1" instead?
>

Not sure precisely what you mean here.  Is it something like:

(or (false-if-exception (parse-date str))
    (and (memq str '("0" "-1")) str)
    date-in-the-past)

?

> I'm just trying to limit the damage here :)  WDYT?
>

I am certainly in favour of keeping the value as a date to achieve
this aim.





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#10147: HTTP "Expires" header should handle non-date values
  2011-12-22  4:28   ` Daniel Hartwig
@ 2011-12-22 12:35     ` Andy Wingo
  2011-12-27 15:49       ` Daniel Hartwig
       [not found]       ` <CAN3veRfgg+dDzjMy1L8xaAcaZ82dAuFM1dnNpGbzq-5ckoVsAA@mail.gmail.com>
  0 siblings, 2 replies; 6+ messages in thread
From: Andy Wingo @ 2011-12-22 12:35 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: R. P. Dillon, guile-user, 10147

On Wed 21 Dec 2011 23:28, Daniel Hartwig <mandyke@gmail.com> writes:

> On 22 December 2011 10:51, Andy Wingo <wingo@pobox.com> wrote:
>>
>> On Sun 27 Nov 2011 05:39, Daniel Hartwig <mandyke@gmail.com> writes:
>>
>>>    HTTP/1.1 clients and caches MUST treat other invalid date formats,
>>>    especially including the value "0", as in the past (i.e., "already
>>>    expired").
>>
>> But, pragmatism may rule, here...
>
> ... especially given that players like Google are using "-1" ;-)

Yes, indeed.

>> ... Wouldn't it
>> be better to keep the expires header as a date?  Would any date in the
>> past work fine?
>
> That is what I initially considered.  I considered using a single,
> well defined value for date-in-the-past (Unix epoch).

I think I would prefer this.  It makes user code easier, and with more
of a chance of being correct.  I think that Mozilla at least used to use
the beginnning of the epoch as this date.

>> Would it be best to allow some special cases like "0" or "-1" instead?
>>
>
> Not sure precisely what you mean here.  Is it something like:
>
> (or (false-if-exception (parse-date str))
>     (and (memq str '("0" "-1")) str)
>     date-in-the-past)

More like:

  (if (member str '("0" "-1"))
      date-in-the-past
      (parse-date str))

Then we can wait and see -- if only these two values are out there, then
we are good, and we keep the "validating" characteristic of our date
parser.  Otherwise we can fall back to the false-if-exception dance if
someone submits a bug report.

WDYT?  Want to send another patch? :-)

Andy
-- 
http://wingolog.org/





^ permalink raw reply	[flat|nested] 6+ messages in thread

* bug#10147: HTTP "Expires" header should handle non-date values
  2011-12-22 12:35     ` Andy Wingo
@ 2011-12-27 15:49       ` Daniel Hartwig
       [not found]       ` <CAN3veRfgg+dDzjMy1L8xaAcaZ82dAuFM1dnNpGbzq-5ckoVsAA@mail.gmail.com>
  1 sibling, 0 replies; 6+ messages in thread
From: Daniel Hartwig @ 2011-12-27 15:49 UTC (permalink / raw)
  To: Andy Wingo; +Cc: guile-user, 10147

[-- Attachment #1: Type: text/plain, Size: 1672 bytes --]

On 22 December 2011 20:35, Andy Wingo <wingo@pobox.com> wrote:
>> Not sure precisely what you mean here.  Is it something like:
>>
>> (or (false-if-exception (parse-date str))
>>     (and (memq str '("0" "-1")) str)
>>     date-in-the-past)
>
> More like:
>
>  (if (member str '("0" "-1"))
>      date-in-the-past
>      (parse-date str))
>
> Then we can wait and see -- if only these two values are out there, then
> we are good, and we keep the "validating" characteristic of our date
> parser.  Otherwise we can fall back to the false-if-exception dance if
> someone submits a bug report.

A rough check against ~2600 sites scraped from dmoz.org shows only a
handful with other values.  These two:

"Mon, 12 Jul 1996 1:00:00 GMT"
                  ^ misses leading `0'
"Thu, 01 Jan 1970 00:00:00 +0000"
                           ^ should be `GMT'

The second (use of `+0000') was also encountered amongst other
date-valued headers in ~1% of pages sampled.  There might be a case
here for relaxing `parse-date' as I don't think these should be
handled specifically for "Expires" headers.

There were three more like:

"{ts '2011-12-27 08:12:22'}"

which only appeared for "Expires" headers.  They look something like
server directives which should have been transformed to legit
expiration dates but haven't been, due to misconfiguration.  In this
case I'd rather throw an error than parse it (wrongly) to
date-in-the-past.

Given those points, I have attached a patch implementing the suggested
handling for "Expires" and will take a look at perhaps relaxing
parse-date (and others).  Anyone have ideas on that?


Daniel

[-- Attachment #2: 0001-permit-non-date-values-for-Expires-header.patch --]
[-- Type: text/x-patch, Size: 889 bytes --]

From 8b7eda0bd7b03467f6eef0ce6c99dedf8fd3ac0c Mon Sep 17 00:00:00 2001
From: Daniel Hartwig <mandyke@gmail.com>
Date: Tue, 27 Dec 2011 22:24:28 +0800
Subject: [PATCH] permit non-date values for Expires header

* module/web/http.scm ("Expires"): Permit (some) non-date values.
---
 module/web/http.scm |   10 +++++++++-
 1 files changed, 9 insertions(+), 1 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index afe70a7..9bb4449 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -1506,7 +1506,15 @@ phrase\"."
 
 ;; Expires = HTTP-date
 ;;
-(declare-date-header! "Expires")
+(define *date-in-the-past* (parse-date "Thu, 01 Jan 1970 00:00:00 GMT"))
+
+(declare-header! "Expires"
+  (lambda (str)
+    (if (member str '("0" "-1"))
+        *date-in-the-past*
+        (parse-date str)))
+  date?
+  write-date)
 
 ;; Last-Modified = HTTP-date
 ;;
-- 
1.7.5.4


^ permalink raw reply related	[flat|nested] 6+ messages in thread

* bug#10147: HTTP "Expires" header should handle non-date values
       [not found]       ` <CAN3veRfgg+dDzjMy1L8xaAcaZ82dAuFM1dnNpGbzq-5ckoVsAA@mail.gmail.com>
@ 2012-01-09 22:36         ` Andy Wingo
  0 siblings, 0 replies; 6+ messages in thread
From: Andy Wingo @ 2012-01-09 22:36 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-user, 10147-done

Hi Daniel,

Thanks very much for the thorough analysis!

On Tue 27 Dec 2011 16:49, Daniel Hartwig <mandyke@gmail.com> writes:

> Given those points, I have attached a patch implementing the suggested
> handling for "Expires" and will take a look at perhaps relaxing
> parse-date (and others).  Anyone have ideas on that?

I applied your patch, and I think some sensible parse-date relaxations
are a good idea too.

Regards,

Andy
-- 
http://wingolog.org/





^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2012-01-09 22:36 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2011-11-27 10:39 HTTP "Expires" header should handle non-date values Daniel Hartwig
2011-12-22  2:51 ` bug#10147: " Andy Wingo
2011-12-22  4:28   ` Daniel Hartwig
2011-12-22 12:35     ` Andy Wingo
2011-12-27 15:49       ` Daniel Hartwig
     [not found]       ` <CAN3veRfgg+dDzjMy1L8xaAcaZ82dAuFM1dnNpGbzq-5ckoVsAA@mail.gmail.com>
2012-01-09 22:36         ` Andy Wingo

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).