unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] web: http: Accept blank Content-Type headers.
@ 2015-07-24  2:17 David Thompson
  2015-07-24  2:24 ` David Thompson
  2015-07-27 21:05 ` Mark H Weaver
  0 siblings, 2 replies; 5+ messages in thread
From: David Thompson @ 2015-07-24  2:17 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 729 bytes --]

Hello,

I encountered a bug in the HTTP header parsing bug when trying to
download a file via Guix.  The response had a Content-Type header, but
with no value, like so:

    Content-Type: 

From reading the W3C spec[0], an unknown Content-Type header can be
treated as if it were an application/octet-stream type.  I'm unsure if
that means even Content-Type values that have invalid syntax should be
accepted or not, so I didn't try to handle them.  I'm not even sure if
we should be translating the empty string to application/octet-stream.
I know that it shouldn't throw an excpetion, but maybe it should just
return the empty list and the user can decide how to interpret it?

Anyway, here's a patch to get the ball rolling.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: 0001-web-http-Accept-blank-Content-Type-headers.patch --]
[-- Type: text/x-patch, Size: 1890 bytes --]

From c9994a7b94ab2c43adfea2980da99515e6292e16 Mon Sep 17 00:00:00 2001
From: David Thompson <dthompson2@worcester.edu>
Date: Thu, 23 Jul 2015 22:07:53 -0400
Subject: [PATCH] web: http: Accept blank Content-Type headers.

* module/web/http.scm (parse-media-type): Return
  'application/octet-stream' when given the empty string.
* test-suite/tests/web-http.test ("entity headers"): Add test.
---
 module/web/http.scm            | 10 +++++++---
 test-suite/tests/web-http.test |  1 +
 2 files changed, 8 insertions(+), 3 deletions(-)

diff --git a/module/web/http.scm b/module/web/http.scm
index f8aa8db..6f0f6fd 100644
--- a/module/web/http.scm
+++ b/module/web/http.scm
@@ -271,9 +271,13 @@ as an ordered alist."
     (and idx (= idx (string-rindex str #\/))
          (not (string-index str separators-without-slash)))))
 (define (parse-media-type str)
-  (if (validate-media-type str)
-      (string->symbol str)
-      (bad-header-component 'media-type str)))
+  (cond
+   ((string-null? str)
+    'application/octet-stream)
+   ((validate-media-type str)
+    (string->symbol str))
+   (else
+    (bad-header-component 'media-type str))))
 
 (define* (skip-whitespace str #:optional (start 0) (end (string-length str)))
   (let lp ((i start))
diff --git a/test-suite/tests/web-http.test b/test-suite/tests/web-http.test
index c59674f..0675dbc 100644
--- a/test-suite/tests/web-http.test
+++ b/test-suite/tests/web-http.test
@@ -267,6 +267,7 @@
   (pass-if-parse content-range "bytes */30" '(bytes * 30))
   (pass-if-parse content-type "foo/bar" '(foo/bar))
   (pass-if-parse content-type "foo/bar; baz=qux" '(foo/bar (baz . "qux")))
+  (pass-if-parse content-type "" '(application/octet-stream))
   (pass-if-parse expires "Tue, 15 Nov 1994 08:12:31 GMT"
                  (string->date "Tue, 15 Nov 1994 08:12:31 +0000"
                          "~a, ~d ~b ~Y ~H:~M:~S ~z"))
-- 
2.4.3


[-- Attachment #3: Type: text/plain, Size: 38 bytes --]


-- 
David Thompson
GPG Key: 0FF1D807

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* Re: [PATCH] web: http: Accept blank Content-Type headers.
  2015-07-24  2:17 [PATCH] web: http: Accept blank Content-Type headers David Thompson
@ 2015-07-24  2:24 ` David Thompson
  2015-07-27 21:05 ` Mark H Weaver
  1 sibling, 0 replies; 5+ messages in thread
From: David Thompson @ 2015-07-24  2:24 UTC (permalink / raw)
  To: David Thompson, guile-devel

Forgot the footnote:

[0] http://www.w3.org/Protocols/rfc1341/4_Content-Type.html

-- 
David Thompson
GPG Key: 0FF1D807



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] web: http: Accept blank Content-Type headers.
  2015-07-24  2:17 [PATCH] web: http: Accept blank Content-Type headers David Thompson
  2015-07-24  2:24 ` David Thompson
@ 2015-07-27 21:05 ` Mark H Weaver
  2015-07-28  0:06   ` Thompson, David
  2015-07-28  0:12   ` Thompson, David
  1 sibling, 2 replies; 5+ messages in thread
From: Mark H Weaver @ 2015-07-27 21:05 UTC (permalink / raw)
  To: David Thompson; +Cc: guile-devel

David Thompson <davet@gnu.org> writes:

> I encountered a bug in the HTTP header parsing bug when trying to
> download a file via Guix.  The response had a Content-Type header, but
> with no value, like so:
>
>     Content-Type: 
>
> From reading the W3C spec[0], an unknown Content-Type header can be
> treated as if it were an application/octet-stream type.

An empty string is not merely an "unknown" Content-Type header.  It is
blatantly invalid syntax.  It would be good to contact the web site
owner and ask them to fix it.

Since web clients seem to accept just about anything these days, and web
servers have adapted to this by producing garbage, it may be that we
need to add a "permissive" mode that sifts through the garbage and uses
heuristics to try to make some sense of it.

However, I'm not sure it makes sense to handle this particular case of
an empty Content-Type header specially, at that this place in the code.
Do we have any other examples of this particular error?

I realize that it's more work, but I would prefer to retain a mode that
reports errors (possibly making a few compromises for very widespread
errors), and then to somehow implement another mode that accepts
*anything* and does its best to make sense of it.

What do you think?

Thanks for working on it,

      Mark



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] web: http: Accept blank Content-Type headers.
  2015-07-27 21:05 ` Mark H Weaver
@ 2015-07-28  0:06   ` Thompson, David
  2015-07-28  0:12   ` Thompson, David
  1 sibling, 0 replies; 5+ messages in thread
From: Thompson, David @ 2015-07-28  0:06 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel, David Thompson

On Mon, Jul 27, 2015 at 5:05 PM, Mark H Weaver <mhw@netris.org> wrote:
> David Thompson <davet@gnu.org> writes:
>
>> I encountered a bug in the HTTP header parsing bug when trying to
>> download a file via Guix.  The response had a Content-Type header, but
>> with no value, like so:
>>
>>     Content-Type:
>>
>> From reading the W3C spec[0], an unknown Content-Type header can be
>> treated as if it were an application/octet-stream type.
>
> An empty string is not merely an "unknown" Content-Type header.  It is
> blatantly invalid syntax.  It would be good to contact the web site
> owner and ask them to fix it.

> Since web clients seem to accept just about anything these days, and web
> servers have adapted to this by producing garbage, it may be that we
> need to add a "permissive" mode that sifts through the garbage and uses
> heuristics to try to make some sense of it.
>
> However, I'm not sure it makes sense to handle this particular case of
> an empty Content-Type header specially, at that this place in the code.
> Do we have any other examples of this particular error?
>
> I realize that it's more work, but I would prefer to retain a mode that
> reports errors (possibly making a few compromises for very widespread
> errors), and then to somehow implement another mode that accepts
> *anything* and does its best to make sense of it.
>
> What do you think?
>
> Thanks for working on it,
>
>       Mark
>



^ permalink raw reply	[flat|nested] 5+ messages in thread

* Re: [PATCH] web: http: Accept blank Content-Type headers.
  2015-07-27 21:05 ` Mark H Weaver
  2015-07-28  0:06   ` Thompson, David
@ 2015-07-28  0:12   ` Thompson, David
  1 sibling, 0 replies; 5+ messages in thread
From: Thompson, David @ 2015-07-28  0:12 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel, David Thompson

Apologies for the previous email.  I seriously botched some keystrokes.

On Mon, Jul 27, 2015 at 5:05 PM, Mark H Weaver <mhw@netris.org> wrote:
> David Thompson <davet@gnu.org> writes:
>
>> I encountered a bug in the HTTP header parsing bug when trying to
>> download a file via Guix.  The response had a Content-Type header, but
>> with no value, like so:
>>
>>     Content-Type:
>>
>> From reading the W3C spec[0], an unknown Content-Type header can be
>> treated as if it were an application/octet-stream type.
>
> An empty string is not merely an "unknown" Content-Type header.  It is
> blatantly invalid syntax.  It would be good to contact the web site
> owner and ask them to fix it.

Yes, I have done so.  I haven't heard back yet.  I hope they take this
seriously.

> Since web clients seem to accept just about anything these days, and web
> servers have adapted to this by producing garbage, it may be that we
> need to add a "permissive" mode that sifts through the garbage and uses
> heuristics to try to make some sense of it.
>
> However, I'm not sure it makes sense to handle this particular case of
> an empty Content-Type header specially, at that this place in the code.
> Do we have any other examples of this particular error?

No, just this one.  I think getting rubygems.org to fix the issue will
be better than adding a special case to the HTTP header parser.

> I realize that it's more work, but I would prefer to retain a mode that
> reports errors (possibly making a few compromises for very widespread
> errors), and then to somehow implement another mode that accepts
> *anything* and does its best to make sense of it.
>
> What do you think?

Yes, that makes sense.  I don't have the drive to attempt that right
now, but thanks for sharing your thoughts about this.

Thanks!

- Dave



^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2015-07-28  0:12 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-07-24  2:17 [PATCH] web: http: Accept blank Content-Type headers David Thompson
2015-07-24  2:24 ` David Thompson
2015-07-27 21:05 ` Mark H Weaver
2015-07-28  0:06   ` Thompson, David
2015-07-28  0:12   ` Thompson, David

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).