unofficial mirror of guile-user@gnu.org 
* Uploading Word documents, PDFs, PNG files etc
@ 2009-05-10 16:21 Sebastian Tennant
  2009-05-11 15:11 ` Ludovic Courtès
  2009-05-12  3:15 ` Thien-Thi Nguyen
  0 siblings, 2 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-10 16:21 UTC (permalink / raw)
  To: guile-user

Hi Guilers,

The following works fine for plain text files but fails with Word
documents, PDFs, PNG files and no doubt other (binary?) file types.

This error msg, followed by the contents of the file, is dumped on
stderr each time:

 string contains #\nul character: "\x0d
 Content-Disposition: form-data; name=\"File-Upload\"; filename=\"eap_logo.png\"\x0d
 Content-Type: image/png\x0d
 \x0d
 .
 .


---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<

 (use-modules (www cgi))

 [...]

 (let* ((upload (cgi:upload "File-Upload"))
        (props-alist (object-property upload #:guile-www-cgi))
        ;;(object-property ...) is deprecated
        (upload-fname (transform-string
                       (assoc-ref props-alist #:filename)
                       #\  #\_)))
   (with-output-to-file (string-append USER-UPLOAD-DIR upload-fname)
     (lambda ()
       (display upload)))))

---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<---8<

How should I go about getting uploaded binary files out of memory and
onto disk?

Any help/advice/pointers much appreciated.

Seb

P.S. 'Writing' rather than 'displaying' the uploaded file makes no
     difference (and is not what I want).

-- 
Emacs' AlsaPlayer - Music Without Jolts
Lightweight, full-featured and mindful of your idyllic happiness.
http://home.gna.org/eap






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-10 16:21 Uploading Word documents, PDFs, PNG files etc Sebastian Tennant
@ 2009-05-11 15:11 ` Ludovic Courtès
  2009-05-11 15:55   ` Thien-Thi Nguyen
  2009-05-12  3:15 ` Thien-Thi Nguyen
  1 sibling, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2009-05-11 15:11 UTC (permalink / raw)
  To: guile-user

Hello,

Sebastian Tennant <sebyte@smolny.plus.com> writes:

>  string contains #\nul character: "\x0d

The problem is that Guile strings cannot contain null characters.  For
arbitrary binary data, other containers must be used, such as SRFI-4
u8vectors or R6RS bytevectors (from Guile-R6RS-Libs).

It may be a design flaw in Guile-WWW, which abuses strings to store
binary data.

Thanks,
Ludo'.






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-11 15:11 ` Ludovic Courtès
@ 2009-05-11 15:55   ` Thien-Thi Nguyen
  2009-05-11 23:17     ` Ludovic Courtès
  0 siblings, 1 reply; 22+ messages in thread
From: Thien-Thi Nguyen @ 2009-05-11 15:55 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-user

() ludo@gnu.org (Ludovic Courtès)
() Mon, 11 May 2009 17:11:40 +0200

   The problem is that Guile strings cannot contain null characters.

FWIW, Guile 1.4.x can:

string> (string #\nul)
"^@"

   It may be a design flaw in Guile-WWW, which abuses strings to
   store binary data.

I see it rather as using a Guile feature that was subsequently
removed (long after the design of Guile-WWW was done).  Looks
like we'll need to add a ./configure check to Guile-WWW for
such situations...

thi





* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-11 15:55   ` Thien-Thi Nguyen
@ 2009-05-11 23:17     ` Ludovic Courtès
  0 siblings, 0 replies; 22+ messages in thread
From: Ludovic Courtès @ 2009-05-11 23:17 UTC (permalink / raw)
  To: guile-user

Thien-Thi Nguyen <ttn@gnuvola.org> writes:

> FWIW, Guile 1.4.x can:
>
> string> (string #\nul)
> "^@"

Actually, it works as well with 1.8:

--8<---------------cut here---------------start------------->8---
guile> (string #\nul)
$1 = "\x00"
--8<---------------cut here---------------end--------------->8---

The "string contains #\nul character" message comes from
`scm_to_locale_stringn ()', which converts a string to a C `char *'.

It would be good to investigate how Guile-WWW reaches that code,
probably when calling some POSIX wrapper.
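
The point that the string type itself is not the limitation can be checked
with a minimal sketch like the one below.  It assumes only core string
procedures (`string', `string-length', `string-index'); the variable names
are made up for illustration.

```scheme
;; Build a 7-character string with an embedded NUL in the middle.
(define s (string #\f #\o #\o #\nul #\b #\a #\r))

;; Pure-Scheme string operations see all seven characters.
(define total-length (string-length s))

;; The NUL is an ordinary character as far as string-index is concerned.
(define nul-position (string-index s #\nul))
```

Trouble would only be expected once such a string is handed to a
procedure that converts it to a C `char *'.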

Thanks,
Ludo'.






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-10 16:21 Uploading Word documents, PDFs, PNG files etc Sebastian Tennant
  2009-05-11 15:11 ` Ludovic Courtès
@ 2009-05-12  3:15 ` Thien-Thi Nguyen
  2009-05-12 10:15   ` Sebastian Tennant
  2009-05-13 14:02   ` Sebastian Tennant
  1 sibling, 2 replies; 22+ messages in thread
From: Thien-Thi Nguyen @ 2009-05-12  3:15 UTC (permalink / raw)
  To: Sebastian Tennant; +Cc: guile-user

() Sebastian Tennant <sebyte@smolny.plus.com>
() Sun, 10 May 2009 16:21:29 +0000

      (with-output-to-file (string-append USER-UPLOAD-DIR upload-fname)
        (lambda ()
          (display upload)))

Perhaps you can convert the string in variable `upload' to a uniform vector
and write it out using `uniform-vector-write'.  It may be the case, too,
that the string can be passed to `uniform-vector-write' directly, e.g.:

      (with-output-to-file (string-append USER-UPLOAD-DIR upload-fname)
        (lambda ()
          (uniform-vector-write upload)))

thi





* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-12  3:15 ` Thien-Thi Nguyen
@ 2009-05-12 10:15   ` Sebastian Tennant
  2009-05-12 10:33     ` Ludovic Courtès
  2009-05-13 14:02   ` Sebastian Tennant
  1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-12 10:15 UTC (permalink / raw)
  To: guile-user

Quoth Thien-Thi Nguyen <ttn@gnuvola.org>:
> () Sebastian Tennant <sebyte@smolny.plus.com>
> () Sun, 10 May 2009 16:21:29 +0000
>
>       (with-output-to-file (string-append USER-UPLOAD-DIR upload-fname)
>         (lambda ()
>           (display upload)))
>
> Perhaps you can convert the string in variable `upload' to a uniform vector
> and write it out using `uniform-vector-write'.  It may be the case, too,
> that the string can be passed to `uniform-vector-write' directly, e.g.:
>
>       (with-output-to-file (string-append USER-UPLOAD-DIR upload-fname)
>         (lambda ()
>           (uniform-vector-write upload)))

Nope.

On closer inspection, it is the call to (cgi:upload ...) which throws
the error so there is no way of converting the result to a vector before
trying to write it to disk.

 cgi.scm:

 (define (upload name)
   (and=> (uploads name) car))

cgi.scm is so full of closures it makes my head swim but from what I can
gather the upload is extracted and stored in memory at the
initialisation stage (cgi:init) which would suggest to me that the
problem doesn't lie within the procedure parse-form-multipart as this is
called by (cgi:init).

Given that guile 1.8.6 can actually handle null chars in strings
(despite what the error message says) what could the problem be?

Seb

P.S. Documentation bug - the procedure 'and=>' is missing from the
     manual.  What's the difference between the way the upload procedure
     is written above and simply:

      (define (upload name)
        (car (uploads name)))


* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-12 10:15   ` Sebastian Tennant
@ 2009-05-12 10:33     ` Ludovic Courtès
  2009-05-12 11:16       ` Sebastian Tennant
  0 siblings, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2009-05-12 10:33 UTC (permalink / raw)
  To: guile-user

Sebastian Tennant <sebyte@smolny.plus.com> writes:

>  (define (upload name)
>    (and=> (uploads name) car))

[...]

> P.S. Documentation bug - the procedure 'and=>' is missing from the
>      manual.  What's the difference between the way the upload procedure
>      is written above and simply:
>
>       (define (upload name)
>         (car (uploads name)))

Using `and=>', `upload' returns `#f' if `uploads' returns `#f'.
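
A small sketch of the difference, assuming a stand-in `uploads' that looks
names up in a hypothetical alist (the real one in cgi.scm is a closure over
the parsed form data):

```scheme
;; Hypothetical table standing in for the parsed uploads.
(define uploads-table
  '(("File-Upload" "first-file" "second-file")))

;; Stand-in for cgi.scm's `uploads': a list of uploads, or #f if none.
(define (uploads name)
  (assoc-ref uploads-table name))

;; With and=>, car is applied only when `uploads' returned a true value.
(define (upload name)
  (and=> (uploads name) car))

;; Without it, a missing name ends up as (car #f), which signals an error.
(define (upload/unguarded name)
  (car (uploads name)))
```

Here (upload "File-Upload") yields the first entry and (upload "missing")
yields `#f', whereas (upload/unguarded "missing") raises a wrong-type error.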

Thanks,
Ludo'.






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-12 10:33     ` Ludovic Courtès
@ 2009-05-12 11:16       ` Sebastian Tennant
  0 siblings, 0 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-12 11:16 UTC (permalink / raw)
  To: guile-user

Quoth ludo@gnu.org (Ludovic Courtès):
> Sebastian Tennant <sebyte@smolny.plus.com> writes:
>
>>  (define (upload name)
>>    (and=> (uploads name) car))
>
> [...]
>
>> P.S. Documentation bug - the procedure 'and=>' is missing from the
>>      manual.  What's the difference between the way the upload procedure
>>      is written above and simply:
>>
>>       (define (upload name)
>>         (car (uploads name)))
>
> Using `and=>', `upload' returns `#f' if `uploads' returns `#f'.
>

Ah... nice.  Thanks.


* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-12  3:15 ` Thien-Thi Nguyen
  2009-05-12 10:15   ` Sebastian Tennant
@ 2009-05-13 14:02   ` Sebastian Tennant
  2009-05-13 15:02     ` Ludovic Courtès
  1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-13 14:02 UTC (permalink / raw)
  To: guile-user

Quoth Thien-Thi Nguyen <ttn@gnuvola.org>:
> Perhaps you can convert the string in variable `upload' to a uniform
> vector and write it out using `uniform-vector-write'.  It may be the
> case, too, that the string can be passed to `uniform-vector-write'
> directly

I've isolated the problem.

(info "(guile-1.8)Regexp Functions")

 "Zero bytes (`#\nul') cannot be used in regex patterns or input
  strings, since the underlying C functions treat that as the end of
  string.  If there's a zero byte an error is thrown."

This restriction must have been added in guile > 1.4 if binary file
uploads are possible in ttn's 1.4.x branch.  It (the restriction) breaks
the named let get-pair (part of parse-form-multipart) called via
cgi:init:

(define (parse-form-multipart)
 ...
 (let* ((segment (car segment-newstart))
        (try (lambda (rx extract)
               (and=> (regexp-exec rx segment)  ; <-- #\nuls cause this to fail
                      extract)))
        (name (or parent-name
                  (try name-exp  m1)))
        (value    (try value-exp match:suffix)) ; <-- when called from here
        (type     (try type-exp  m1)))
 ... )
 )
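
One way to keep the regexp call from throwing is to refuse NUL-containing
input up front.  The sketch below is only that: the wrapper name
`regexp-exec/no-nul' and the sample strings are invented for illustration,
and it merely avoids the error; it does not make binary parts parseable.

```scheme
(use-modules (ice-9 regex))

;; Run RX on STR only if STR has no NUL character; otherwise return #f
;; instead of letting regexp-exec signal an error.
(define (regexp-exec/no-nul rx str)
  (and (not (string-index str #\nul))
       (regexp-exec rx str)))

;; Same pattern as cgi.scm's name regexp.
(define name-rx (make-regexp "name=\"([^\"]*)\""))

(define clean "Content-Disposition: form-data; name=\"TABLE\"")
(define dirty (string-append clean (string #\nul) "payload"))

(define clean-match (regexp-exec/no-nul name-rx clean))
(define dirty-match (regexp-exec/no-nul name-rx dirty))
```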

Seb


* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-13 14:02   ` Sebastian Tennant
@ 2009-05-13 15:02     ` Ludovic Courtès
  2009-05-13 18:01       ` Sebastian Tennant
  2009-05-13 19:09       ` Sebastian Tennant
  0 siblings, 2 replies; 22+ messages in thread
From: Ludovic Courtès @ 2009-05-13 15:02 UTC (permalink / raw)
  To: guile-user

Hello,

Sebastian Tennant <sebyte@smolny.plus.com> writes:

> (info "(guile-1.8)Regexp Functions")
>
>  "Zero bytes (`#\nul') cannot be used in regex patterns or input
>   strings, since the underlying C functions treat that as the end of
>   string.  If there's a zero byte an error is thrown."

I think it makes sense to explicitly restrict regexps to actual text as
opposed to binary data.

Thanks,
Ludo'.





^ permalink raw reply	[flat|nested] 22+ messages in thread

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-13 15:02     ` Ludovic Courtès
@ 2009-05-13 18:01       ` Sebastian Tennant
  2009-05-13 19:09       ` Sebastian Tennant
  1 sibling, 0 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-13 18:01 UTC (permalink / raw)
  To: guile-user

Quoth ludo@gnu.org (Ludovic Courtès):
> Hello,
>
> Sebastian Tennant <sebyte@smolny.plus.com> writes:
>
>> (info "(guile-1.8)Regexp Functions")
>>
>>  "Zero bytes (`#\nul') cannot be used in regex patterns or input
>>   strings, since the underlying C functions treat that as the end of
>>   string.  If there's a zero byte an error is thrown."
>
> I think it makes sense to explicitly restrict regexps to actual text as
> opposed to binary data.

I agree.  I only wanted to clearly identify the problem.

Hopefully I will be able to 'upgrade' cgi.scm so that it uses an
alternative method of extracting the file upload string myself.

Seb

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-13 15:02     ` Ludovic Courtès
  2009-05-13 18:01       ` Sebastian Tennant
@ 2009-05-13 19:09       ` Sebastian Tennant
  2009-05-13 19:23         ` Linas Vepstas
  1 sibling, 1 reply; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-13 19:09 UTC (permalink / raw)
  To: guile-user

Quoth ludo@gnu.org (Ludovic Courtès):
> Hello,
>
> Sebastian Tennant <sebyte@smolny.plus.com> writes:
>
>> (info "(guile-1.8)Regexp Functions")
>>
>>  "Zero bytes (`#\nul') cannot be used in regex patterns or input
>>   strings, since the underlying C functions treat that as the end of
>>   string.  If there's a zero byte an error is thrown."
>
> I think it makes sense to explicitly restrict regexps to actual text as
> opposed to binary data.

On second thoughts, I'm not so sure... shouldn't users have the option
(either at compile time, or better still, in userland)?

Restricting regexps to actual text is fine... until you need to grep
binary data, or, as in this case, a combination of text and binary data.

I thought it was going to be trivial to replace the call to regexp-exec
in cgi.scm that extracted the uploaded (possibly binary) file, because
the pattern identifying the beginning of the file in the raw data string
is simple ("\r\n\r\n") - but I now realise that many calls to
regexp-exec in cgi.scm will need to be replaced, some with complex
matching patterns, so I can't see how this can be done without using
regexps, hence my changed opinion.

The only thing I can think of doing now is replacing calls to
regexp-exec with system calls to grep (which can accept binary data) -
clearly sub-optimal and non-trivial.

Anyone have any other ideas?  How easy would it be to build a guile with
a regex feature that doesn't implement this restriction on binary data?

Seb

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-13 19:09       ` Sebastian Tennant
@ 2009-05-13 19:23         ` Linas Vepstas
  2009-05-14  3:47           ` Keith Wright
  0 siblings, 1 reply; 22+ messages in thread
From: Linas Vepstas @ 2009-05-13 19:23 UTC (permalink / raw)
  To: Sebastian Tennant; +Cc: guile-user

2009/5/13 Sebastian Tennant <sebyte@smolny.plus.com>:

> Restricting regexps to actual text is fine... until you need to grep
> binary data, or, as in this case, a combination of text and binary data.

Last I looked, standard c-library posix/gnu/perl/java
regex only worked on strings, not on binary data.
You'll have trouble finding a binary-data regex
implementation in C (or any other language).

> in cgi.scm that extracted the uploaded (possibly binary) file, because
> the pattern identifying the beginning of the file in the raw data string
> is simple ("\r\n\r\n") -

No, this sounds somehow broken.  If I remember correctly,
binary MIME parts should have a Content-Length header
so you can skip over them.  If Content-Length is absent,
then the part should be ASCII-encoded (e.g. base64).
Yeah, grepping large blocks of ASCII sucks, which is
why Content-Length should be used.

-- linas





* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-13 19:23         ` Linas Vepstas
@ 2009-05-14  3:47           ` Keith Wright
  2009-05-14 12:49             ` Sebastian Tennant
  0 siblings, 1 reply; 22+ messages in thread
From: Keith Wright @ 2009-05-14  3:47 UTC (permalink / raw)
  To: guile-user

> From: Linas Vepstas <linasvepstas@gmail.com>
> Cc: guile-user@gnu.org
> 
> 2009/5/13 Sebastian Tennant <sebyte@smolny.plus.com>:
> 
> > Restricting regexps to actual text is fine... until
> > you need to grep binary data, or, as in this case,
> > a combination of text and binary data.
> 
> > in cgi.scm that extracted the uploaded (possibly
> > binary) file, because the pattern identifying the
> > beginning of the file in the raw data string is
> > simple ("\r\n\r\n") -
> 
> No, this sounds somehow broken.  If I remember correctly,
> binary MIME parts should have a Content-Length header
> so you can skip over them.  If Content-Length is absent,
> then the part should be ASCII-encoded (e.g. base64).
> Yeah, grepping large blocks of ASCII sucks, which is
> why Content-Length should be used.
> 
> -- linas

If the spec says a length indication followed by
a fixed length of arbitrary binary data, then it
is not just sucky, but incorrect to apply either
grep or regexp to the binary.  It will seem to
work until it hits binary data that "by
accident" contains the string you are looking
for.

The only correct algorithm is to make a preliminary
pass to somehow remove the binary data and
pseudo-concatenate the remaining strings.

  -- Keith






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-14  3:47           ` Keith Wright
@ 2009-05-14 12:49             ` Sebastian Tennant
  2009-05-14 13:13               ` Sebastian Tennant
  2009-05-17 21:55               ` Ludovic Courtès
  0 siblings, 2 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-14 12:49 UTC (permalink / raw)
  To: guile-user

Quoth Keith Wright <kwright@keithdiane.us>:
>> > Restricting regexps to actual text is fine... until
>> > you need to grep binary data, or, as in this case,
>> > a combination of text and binary data.
>> 
>> > in cgi.scm that extracted the uploaded (possibly
>> > binary) file, because the pattern identifying the
>> > beginning of the file in the raw data string is
>> > simple ("\r\n\r\n") -
>> 
>> No, this sounds somehow broken.  If I remember correctly,
>> binary MIME parts should have a Content-Length header
>> so you can skip over them.  If Content-Length is absent,
>> then the part should be ASCII-encoded (e.g. base64).
>> Yeah, grepping large blocks of ASCII sucks, which is
>> why Content-Length should be used.
>> 
>> -- linas
>
> If the spec says a length indication followed by
> a fixed length of arbitrary binary data, then it
> is not just sucky, but incorrect to apply either
> grep or regexp to the binary.  It will seem to
> work until it hits binary data that "by
> accident" contains the string you are looking
> for.
>
> The only correct algorithm is to make a preliminary
> pass to somehow remove the binary data and
> pseudo-concatenate the remaining strings.

Multipart/form-data comes with a Content-Length header that describes
the length of any and all parts combined, making it possible to extract
a string that looks like this:

 -----------------------------1307099961880952181245320094\x0d
 Content-Disposition: form-data; name=\"TABLE\"\x0d
 \x0d
 \x0d
 -----------------------------1307099961880952181245320094\x0d
 Content-Disposition: form-data; name=\"File-Upload\"; filename=\"null-char.txt\"\x0d
 Content-Type: text/plain\x0d
 \x0d
 foo^@bar
 \x0d
 -----------------------------1307099961880952181245320094\x0d
 Content-Disposition: form-data; name=\"Button\"\x0d
 \x0d
 Upload\x0d
 -----------------------------1307099961880952181245320094--\x0d

The boundary can be obtained from the Content-Type header and the above
string can then be broken into parts but how can one extract the name,
filename, type and value of each part without using regexps?

cgi.scm currently uses the following patterns and I can't think of an
alternative way of doing it:

 (let ((name-rx     (make-regexp "name=\"([^\"]*)\""))
       (filename-rx (make-regexp "filename=\"*([^\"\r]*)\"*"))
       (type-rx     (make-regexp "Content-Type: ([^\r]*)\r\n" regexp/icase))
       (value-rx    (make-regexp "\r\n\r\n")))
   ...)
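
For the simpler patterns, a regexp-free lookup seems feasible with
`string-contains' and `string-index' alone.  The following is a rough
sketch, not what cgi.scm does: the procedure name `header-param' is
invented, and it assumes the parameter is preceded by a space and its value
is always double-quoted (the filename regexp above is more lenient).

```scheme
;; Return the value of KEY="value" inside HEADERS, or #f if absent.
;; The leading space in the needle keeps `name' from matching inside
;; `filename'.
(define (header-param headers key)
  (let* ((needle (string-append " " key "=\""))
         (start  (string-contains headers needle)))
    (and start
         (let* ((from (+ start (string-length needle)))
                (to   (string-index headers #\" from)))
           (and to (substring headers from to))))))

(define sample-header
  "Content-Disposition: form-data; name=\"File-Upload\"; filename=\"eap_logo.png\"")

(define sample-name     (header-param sample-header "name"))
(define sample-filename (header-param sample-header "filename"))
```

Since neither procedure goes through the regex library, a NUL elsewhere in
the string should not matter to them.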

Seb

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-14 12:49             ` Sebastian Tennant
@ 2009-05-14 13:13               ` Sebastian Tennant
  2009-05-17 21:55               ` Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-14 13:13 UTC (permalink / raw)
  To: guile-user

Quoth Sebastian Tennant <sebyte@smolny.plus.com>:
> Multipart/form-data comes with a Content-Length header that describes

Correction - the above should read 'CONTENT_LENGTH environment variable'.

> The boundary can be obtained from the Content-Type header and the above

Similarly, the above should read 'CONTENT_TYPE environment variable'.

Seb

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-14 12:49             ` Sebastian Tennant
  2009-05-14 13:13               ` Sebastian Tennant
@ 2009-05-17 21:55               ` Ludovic Courtès
  2009-05-19  4:48                 ` Sebastian Tennant
  1 sibling, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2009-05-17 21:55 UTC (permalink / raw)
  To: guile-user

Hello,

Sebastian Tennant <sebyte@smolny.plus.com> writes:

> cgi.scm currently uses the following patterns and I can't think of an
> alternative way of doing it:
>
>  (let ((name-rx     (make-regexp "name=\"([^\"]*)\""))
>        (filename-rx (make-regexp "filename=\"*([^\"\r]*)\"*"))
>        (type-rx     (make-regexp "Content-Type: ([^\r]*)\r\n" regexp/icase))
>        (value-rx    (make-regexp "\r\n\r\n")))
>    ...)

Can't this be applied just to the header part of the blob rather than to
the whole blob, including binary data?

Thanks,
Ludo'.






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-17 21:55               ` Ludovic Courtès
@ 2009-05-19  4:48                 ` Sebastian Tennant
  2009-05-19  4:59                   ` Sebastian Tennant
  2009-05-19  7:50                   ` Ludovic Courtès
  0 siblings, 2 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-19  4:48 UTC (permalink / raw)
  To: guile-user

Hi Ludo,

Thanks for responding.  I know this isn't really your 'thing' (for want
of a better word).

Quoth ludo@gnu.org (Ludovic Courtès):
>> cgi.scm currently uses the following patterns and I can't think of an
>> alternative way of doing it:
>>
>>  (let ((name-rx     (make-regexp "name=\"([^\"]*)\""))
>>        (filename-rx (make-regexp "filename=\"*([^\"\r]*)\"*"))
>>        (type-rx     (make-regexp "Content-Type: ([^\r]*)\r\n" regexp/icase))
>>        (value-rx    (make-regexp "\r\n\r\n")))
>>    ...)
>
> Can't this be applied just to the header part of the blob rather than to
> the whole blob, including binary data?

The problem is that there's no way of being sure how many header lines
will precede the (possibly) binary blob in any given part (RFC 2388).

 -----------------------------1307099961880952181245320094\x0d
 Content-Disposition: form-data; name=\"TABLE\"\x0d
 \x0d
 \x0d
 -----------------------------1307099961880952181245320094\x0d
 Content-Disposition: form-data; name=\"File-Upload\"; filename=\"null-char.txt\"\x0d
 Content-Type: text/plain\x0d
 \x0d
 foo^@bar
 \x0d
 -----------------------------1307099961880952181245320094\x0d
 Content-Disposition: form-data; name=\"Button\"\x0d
 \x0d
 Upload\x0d
 -----------------------------1307099961880952181245320094--\x0d

Content-Disposition is mandatory, but Content-Type is optional
(defaulting to text/plain) as is Content-Transfer-Encoding, so the
"header part" of any given MIME part may be a single line or it may be
three.


Seb

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-19  4:48                 ` Sebastian Tennant
@ 2009-05-19  4:59                   ` Sebastian Tennant
  2009-05-19  7:50                   ` Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-19  4:59 UTC (permalink / raw)
  To: guile-user

Quoth Sebastian Tennant <sebyte@smolny.plus.com>:
> so the "header part" of any given MIME part may be a single line or it
> may be three.

Correction - ...it may be a single line, or it may be two, or three.

Seb

* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-19  4:48                 ` Sebastian Tennant
  2009-05-19  4:59                   ` Sebastian Tennant
@ 2009-05-19  7:50                   ` Ludovic Courtès
  2009-05-21  5:22                     ` Sebastian Tennant
  1 sibling, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2009-05-19  7:50 UTC (permalink / raw)
  To: guile-user

Hello,

Sebastian Tennant <sebyte@smolny.plus.com> writes:

> Content-Disposition is mandatory, but Content-Type is optional
> (defaulting to text/plain) as is Content-Transfer-Encoding, so the
> "header part" of any given MIME part may be a single line or it may be
> three.

Then I presume this could be read line-by-line as strings (using
`read-line' from `(ice-9 rdelim)') until the end-of-header marker is
reached.  The remaining data would be read using `uniform-vector-read!'
or some such.
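
A rough sketch of the header-reading half of that idea, assuming a port
positioned at the start of one MIME part.  The name `read-header-lines' and
the sample part are made up; reading the remaining body is left out.

```scheme
(use-modules (ice-9 rdelim))

;; Read lines from PORT until the empty line that ends the headers (or EOF).
;; read-line strips the trailing newline but not the carriage return, so
;; the latter is trimmed here.  Returns the header lines as a list.
(define (read-header-lines port)
  (let loop ((acc '()))
    (let ((line (read-line port)))
      (if (eof-object? line)
          (reverse acc)
          (let ((clean (string-trim-right line #\return)))
            (if (string-null? clean)
                (reverse acc)
                (loop (cons clean acc))))))))

(define sample-part
  (string-append "Content-Disposition: form-data; name=\"Button\"\r\n"
                 "Content-Type: text/plain\r\n"
                 "\r\n"
                 "Upload"))

(define sample-headers
  (call-with-input-string sample-part read-header-lines))
```

After the loop returns, the port is positioned at the first byte of the
part's value, so the number of header lines need not be known in advance.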

Disclaimer: I'm no MIME expert.  ;-)

Thanks,
Ludo'.






* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-19  7:50                   ` Ludovic Courtès
@ 2009-05-21  5:22                     ` Sebastian Tennant
  2009-05-21 10:47                       ` Thien-Thi Nguyen
  0 siblings, 1 reply; 22+ messages in thread
From: Sebastian Tennant @ 2009-05-21  5:22 UTC (permalink / raw)
  To: guile-user


Quoth ludo@gnu.org (Ludovic Courtès):
> Hello,
>
> Sebastian Tennant <sebyte@smolny.plus.com> writes:
>
>> Content-Disposition is mandatory, but Content-Type is optional
>> (defaulting to text/plain) as is Content-Transfer-Encoding, so the
>> "header part" of any given MIME part may be a single line or it may be
>> three.
>
> Then I presume this could be read line-by-line as strings (using
> `read-line' from `(ice-9 rdelim)') until the end-of-header marker is
> reached.  The remaining data would be read using `uniform-vector-read!'
> or some such.

Problem solved.  With this patch applied to cgi.scm in ttn's (www cgi)
module, uploading of binary data now works with Guile 1.8.

In the end it was simply a case of splitting each part into a header
section and value section using string-contains and substring rather
than match:prefix and match:suffix.
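
The splitting step boils down to something like the sketch below.  It is a
simplified illustration rather than the patch itself: `split-part' is a
made-up name, and unlike the patch it does not skip a leading CRLF before
the headers.

```scheme
;; Split SEGMENT at the first blank line (CRLF CRLF).  Returns a pair
;; (headers . value), or #f if no blank line is found.  No regexp is
;; involved, so NUL characters in the value are harmless.
(define (split-part segment)
  (let ((index (string-contains segment "\r\n\r\n")))
    (and index
         (cons (substring segment 0 index)
               (substring segment (+ index 4))))))

(define sample-segment
  (string-append
   "Content-Disposition: form-data; name=\"File-Upload\"; filename=\"null-char.txt\"\r\n"
   "Content-Type: text/plain\r\n\r\n"
   "foo" (string #\nul) "bar"))

(define sample-split (split-part sample-segment))
```

The regexps can then be run against the car of the pair only, which is
plain header text.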

Thanks for all your help.

Seb


[-- Attachment #2: Unified diff --]
[-- Type: text/x-diff, Size: 4518 bytes --]

--- cgi.scm	2007-10-04 10:35:38.000000000 +0000
+++ cgi-patched.scm	2009-05-21 04:48:58.210914642 +0000
@@ -212,10 +212,9 @@
                               (#:raw-mime-headers . ,raw-headers)))
       (set! u (updated-alist u name value)))
 
-    (let ((name-exp     (make-regexp "name=\"([^\"]*)\""))
-          (filename-exp (make-regexp "filename=\"*([^\"\r]*)\"*"))
-          (type-exp     (make-regexp "Content-Type: ([^\r]*)\r\n" regexp/icase))
-          (value-exp    (make-regexp "\r\n\r\n")))
+    (let ((name-rx     (make-regexp "name=\"([^\"]*)\""))
+          (filename-rx (make-regexp "filename=\"*([^\"\r]*)\"*"))
+          (type-rx     (make-regexp "Content-Type: ([^\r]*)$" regexp/icase)))
 
       (let level ((str raw-data)
                   (boundary (determine-boundary (env-look 'content-type)))
@@ -233,41 +232,56 @@
                                      (lambda (seg-finish)
                                        (cons (subs str seg-start (- seg-finish 2))
                                              seg-finish))))))
+		   ;; segment-newstart is a cons of the form
+		   ;; ("<header(s)>\r\n\r\n<value-of-part>\r\n"
+		   ;;                 .
+		   ;;  <position-reached-in-raw-data>)
                    (lambda (segment-newstart)
                      (let* ((segment (car segment-newstart))
-                            (try (lambda (rx extract)
-                                   (and=> (regexp-exec rx segment)
-                                          extract)))
-                            (name (or parent-name
-                                      (try name-exp  m1)))
-                            (value    (try value-exp match:suffix))
-                            (type     (try type-exp  m1)))
-                       (and name
+			    ;; segment splitter
+			    (seg-split
+			     (lambda (pattern string portion)
+			       (and=> (string-contains string pattern) portion)))
+			    ;; split segment into header(s) and value
+			    (headers (seg-split "\r\n\r\n" segment
+						(lambda (index)
+						  (substring segment 2 index))))
+			    (value (seg-split "\r\n\r\n" segment
+					      (lambda (index)
+						(substring segment (+ index 4)))))
+			    ;; extract data from header(s)
+                            (hdr-extract (lambda (rx extract)
+					   (and=> (regexp-exec rx headers)
+						  extract)))
+                            (name (or parent-name (hdr-extract name-rx m1)))
+                            (type (hdr-extract type-rx m1))
+			    (filename (hdr-extract filename-rx m1)))
+
+		       (and name
                             value
-                            (cond ((and type
-                                        (not parent-name) ; only recurse once
-                                        (string-match "multipart/mixed" type))
-                                   (level value
-                                          (determine-boundary type)
-                                          name))
-                                  ((and type (try filename-exp m1))
-                                   => (lambda (filename)
-                                        (stash-file-upload!
-                                         name filename type value
-                                         (subs (try value-exp match:prefix)
-                                               2))))
-                                  (else
-                                   (stash-form-variable! name value)))))
+			    (cond ((and type
+					(not parent-name) ; only recurse once
+					(string-match "multipart/mixed" type))
+				   (level value
+					  (determine-boundary type)
+					  name))
+				  ((and type (hdr-extract filename-rx m1))
+				   => (lambda (filename)
+					(stash-file-upload!
+					 name filename type value headers)))
+				  (else
+				   (stash-form-variable! name value)))
+			    ))
                      (get-pair (cdr segment-newstart))))))))
 
     (cons (reverse! v) (reverse! u))))
 
 (define (get-cookies raw)
   ;; Parse RAW (a string) for cookie-like frags.  Return an alist.
-  (let ((pair-exp (make-regexp "([^=; \t\n]+)=([^=; \t\n]+)"))
+  (let ((pair-rx (make-regexp "([^=; \t\n]+)=([^=; \t\n]+)"))
         (c (list)))
     (define (get-pair str)
-      (let ((pair-match (regexp-exec pair-exp str)))
+      (let ((pair-match (regexp-exec pair-rx str)))
         (if (not pair-match) '()
             (let ((name (match:substring pair-match 1))
                   (value (match:substring pair-match 2)))


* Re: Uploading Word documents, PDFs, PNG files etc
  2009-05-21  5:22                     ` Sebastian Tennant
@ 2009-05-21 10:47                       ` Thien-Thi Nguyen
  0 siblings, 0 replies; 22+ messages in thread
From: Thien-Thi Nguyen @ 2009-05-21 10:47 UTC (permalink / raw)
  To: guile-user

() Sebastian Tennant <sebyte@smolny.plus.com>
() Thu, 21 May 2009 05:22:15 +0000

   Problem solved.  With this patch applied to cgi.scm in ttn's
   (www cgi) module, uploading of binary data now works with Guile
   1.8.

   In the end it was simply a case of splitting each part into a
   header section and value section using string-contains and
   substring rather than match:prefix and match:suffix.

Thanks for writing this patch.  I will incorporate it into the
next Guile-WWW release.  Keep up the good work!

thi



