[PATCH] read-response-body should return received data when any break happens

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

* [PATCH] read-response-body should return received data when any break happens
@ 2012-03-11 15:35 Nala Ginrut
  2012-03-15 16:09 ` Daniel Hartwig
  2012-03-15 18:31 ` Ian Price
  0 siblings, 2 replies; 8+ messages in thread
From: Nala Ginrut @ 2012-03-11 15:35 UTC (permalink / raw)
  To: guile-devel


[-- Attachment #1.1: Type: text/plain, Size: 2210 bytes --]

I've been troubled with a weird problem in read-response-body for a long
time.
I think read-response-body never return the received data when any break
happens. No matter the break caused by connection problem or user
interruption.
The only possible read-response-body returns data is connection never down
and all the data have been received even if I want to download a 2G file.
Or there's no
chance to write any data to the disk. When break occurs, all the received
data will evaporate.
Considering my terrible network, I decide not to pray for good luck when I
establish connection with our web module.
So here's a patch to fix it.

The new read-response-body will add the received data to the exceptional
information which used by "throw", if read-response-body can't continue to
work anymore, the received data will return with throw.
And there's a useful helper function to write this data to the disk ==>
(output-received-response-body e port)

However, add received data to the exceptions will cause troubles when
tracing or REPL throw exceptional information, because received data(as a
bytevector) is usually huge. So there's another helper function to get rid
of the received data after you handle the received data in your way. It
will re-throw the original information in case other program need to catch
it. ==> (throw-from-response-body-break e)

It's easy to use them, but you must catch anything when your code contains
read-response-body:
----------------------------cut-------------------------
(catch #t
          (lambda ()
             .....do some work with read-response-body...)
           (lambda e
                 (output-received-response-body e port) ;; write
received-data to the disk or you may decide to store it to other place.
                 ...
                 ...
                 (throw-from-response-body-break e))) ;; re-throw the
original information in the last step.
---------------------------end---------------------------

Anyway, one may use it in his/her own way if he/she checkout their
implementation. The helper functions are not always necessary.
But I do think read-response-body should return the received data when it
breaks.

Any comments?

Regards.

[-- Attachment #1.2: Type: text/html, Size: 2459 bytes --]

[-- Attachment #2: 0001-read-response-body-should-return-received-data-in-an.patch --]
[-- Type: text/x-diff, Size: 3673 bytes --]

From 6b6aef2192769ce12a2962b02d103a019f4bc9c6 Mon Sep 17 00:00:00 2001
From: NalaGinrut <NalaGinrut@gmail.com>
Date: Sun, 11 Mar 2012 23:02:07 +0800
Subject: [PATCH] read-response-body should return received data in any break

---
 module/web/response.scm |   52 +++++++++++++++++++++++++++++++++++++++-------
 1 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/module/web/response.scm b/module/web/response.scm
index 07e1245..e3ea0a6 100644
--- a/module/web/response.scm
+++ b/module/web/response.scm
@@ -21,6 +21,7 @@
 
 (define-module (web response)
   #:use-module (rnrs bytevectors)
+  #:autoload (rnrs) (call-with-port)
   #:use-module (ice-9 binary-ports)
   #:use-module (ice-9 rdelim)
   #:use-module (srfi srfi-9)
@@ -38,6 +39,8 @@
 
             response-must-not-include-body?
             read-response-body
+            output-received-response-body
+            throw-from-response-body-break
             write-response-body
 
             ;; General headers
@@ -224,16 +227,49 @@ This is true for some response types, like those with code 304."
       (= (response-code r) 204)
       (= (response-code r) 304)))
 
-(define (read-response-body r)
+(define* (read-response-body r #:key (block 4096))
   "Reads the response body from @var{r}, as a bytevector.  Returns
 @code{#f} if there was no response body."
-  (let ((nbytes (response-content-length r)))
-    (and nbytes
-         (let ((bv (get-bytevector-n (response-port r) nbytes)))
-           (if (= (bytevector-length bv) nbytes)
-               bv
-               (bad-response "EOF while reading response body: ~a bytes of ~a"
-                            (bytevector-length bv) nbytes))))))
+  (let* ((nbytes (response-content-length r))
+         (bv (and nbytes (make-bytevector nbytes)))
+         (start 0))
+    (catch #t
+           (lambda ()
+             (let lp((buf (get-bytevector-n (response-port r) block)))
+                 (if (eof-object? buf)
+                     bv
+                     (let ((len (bytevector-length buf)))
+                       (cond
+                        ((<= len block)
+                         (bytevector-copy! buf 0 bv start len)
+                         (set! start (+ start len))
+                         (lp (get-bytevector-n (response-port r) block)))
+                        (else
+                         (bad-response "EOF while reading response body: ~a bytes of ~a"
+                                       start nbytes)))))))
+             (lambda (k . e)
+               (let ((received (call-with-port 
+                                (open-bytevector-input-port bv) 
+                                (lambda (port)
+                                  (get-bytevector-n port start)))))
+                 (throw k `(,@e (body ,@received))) ;; return the received data
+                 )))))
+
+;; output the received data if there is, or do nothing
+(define (output-received-response-body e port)
+  (let ((received (assoc-ref (cadr e) 'body)))
+    (if received
+        (begin
+          (put-bytevector port received) 
+          (force-output port)))))
+   
+;; Exceptional information contains the received bytevector added from the
+;; read-response-body if any exception had been caught.
+;; If received data ware huge(it always does), it'd be a trouble during the tracing.
+;; This helper function could get rid of the received data from exceptional info,
+;; and re-throw it.
+(define (throw-from-response-body-break e)
+  (throw (car e) (list-head (cdr e) (1- (length (cdr e))))))
 
 (define (write-response-body r bv)
   "Write @var{bv}, a bytevector, to the port corresponding to the HTTP
-- 
1.7.0.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-11 15:35 [PATCH] read-response-body should return received data when any break happens Nala Ginrut
@ 2012-03-15 16:09 ` Daniel Hartwig
  2012-03-15 18:37   ` Ian Price
  2012-03-15 18:31 ` Ian Price
  1 sibling, 1 reply; 8+ messages in thread
From: Daniel Hartwig @ 2012-03-15 16:09 UTC (permalink / raw)
  To: Nala Ginrut, guile-devel

On 11 March 2012 23:35, Nala Ginrut <nalaginrut@gmail.com> wrote:
>
> The new read-response-body will add the received data to the exceptional
> information which used by "throw", if read-response-body can't continue to
> work anymore, the received data will return with throw.
> And there's a useful helper function to write this data to the disk ==>
> (output-received-response-body e port)
>

> Any comments?
>

So this is an interesting start.  The idea of buffering the transfer
is great -- however, it falls short in this implementation because it
is internal to read-response-body.

Also, the whole business about passing the partial data out via an
exception is very messy.

What about also passing a bytevector to read-response-body?  The
exception then only needs to mention how many bytes were read because
the caller already has access to the bytevector aka the data.

Consider this quick hack:

(define* (read-response-body! r bv #:optional
                              (start 0)
                              (count (min (bytevector-length bv)
                                          (response-content-length r))))
  (and count
       (let ((read (get-bytevector-n! (response-port r) bv start count)))
         (if (= read count)
             bv
             (bad-response
              "EOF while reading response body: ~a bytes of ~a (buffer)"
              read count)))))

which has all the features of your solution yet is much smaller and
puts the caller in more explicit control of the buffering, which opens
up many scenarios.

For example, reusing the same bytevector and looping over
read-response-body! saving the results to disk each time.  This limits
the memory use to the size of the bytevector *and* removes the copy
operation from your implementation (bonus!).

This also facilitates the ongoing work with filtering streams, etc.
that some other people are tackling.

Some food for thought.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-11 15:35 [PATCH] read-response-body should return received data when any break happens Nala Ginrut
  2012-03-15 16:09 ` Daniel Hartwig
@ 2012-03-15 18:31 ` Ian Price
  1 sibling, 0 replies; 8+ messages in thread
From: Ian Price @ 2012-03-15 18:31 UTC (permalink / raw)
  To: Nala Ginrut; +Cc: guile-devel

Nala Ginrut <nalaginrut@gmail.com> writes:

> I've been troubled with a weird problem in read-response-body for a long time.
> I think read-response-body never return the received data when any break happens. No matter the break caused by connection problem or user interruption.
> The only possible read-response-body returns data is connection never down and all the data have been received even if I want to download a 2G file. Or there's no
> chance to write any data to the disk. When break occurs, all the received data will evaporate.
> Considering my terrible network, I decide not to pray for good luck when I establish connection with our web module.
> So here's a patch to fix it.
I don't think read-response-body is a good choice for downloading a 2GB
file in any case. Just as you wouldn't read a 2GB sorted file into
memory to perform a binary search, preferring to search directly on the
disk itself, I think people with expectations of treating large files
should write an appropriate handler for themselves using the lower
building blocks guile already provides.

> The new read-response-body will add the received data to the exceptional information which used by "throw", if read-response-body can't continue to work anymore, the received
> data will return with throw.
Yes, providing it at the point of error is best. The data is already
available to us and it won't affect normal returns.

This last part is particularly important to me, as I mentioned on IRC,
since programmers should not have to do anything special in the normal case.

> -(define (read-response-body r)
> +(define* (read-response-body r #:key (block 4096))
>    "Reads the response body from @var{r}, as a bytevector.  Returns
>  @code{#f} if there was no response body."
> -  (let ((nbytes (response-content-length r)))
> -    (and nbytes
> -         (let ((bv (get-bytevector-n (response-port r) nbytes)))
> -           (if (= (bytevector-length bv) nbytes)
> -               bv
> -               (bad-response "EOF while reading response body: ~a bytes of ~a"
> -                            (bytevector-length bv) nbytes))))))
> +  (let* ((nbytes (response-content-length r))
> +         (bv (and nbytes (make-bytevector nbytes)))
> +         (start 0))
> +    (catch #t
> +           (lambda ()
> +             (let lp((buf (get-bytevector-n (response-port r) block)))
> +                 (if (eof-object? buf)
> +                     bv
> +                     (let ((len (bytevector-length buf)))
> +                       (cond
> +                        ((<= len block)
> +                         (bytevector-copy! buf 0 bv start len)
> +                         (set! start (+ start len))
> +                         (lp (get-bytevector-n (response-port r) block)))
> +                        (else
> +                         (bad-response "EOF while reading response body: ~a bytes of ~a"
> +                                       start nbytes)))))))
The manual buffering is superfluous. get-bytevector-n already does this
under the hood, and much more efficiently. Even if it didn't, reading in
smaller chunks is only a win if you are also processing it in chunks,
since large blocks can make the gc interfere more often.

The only conceivable difference I could see is that you are choosing to
return an eof object rather of erroring. I'm not convinced of the
utility of that: #f or #vu8() I could understand, eof less so.

> +             (lambda (k . e)
> +               (let ((received (call-with-port 
> +                                (open-bytevector-input-port bv) 
> +                                (lambda (port)
> +                                  (get-bytevector-n port start)))))
> +                 (throw k `(,@e (body ,@received))) ;; return the received data
> +                 )))))
> +
yuck. This consing a body symbol is hideous IMO. From a programmatic
point of view it is unnecessary, and all it's adding is requiring the
user to perform an extra cadr. It would be much better to choose a
different key to throw to, rather than doing it this way.

> +;; output the received data if there is, or do nothing
> +(define (output-received-response-body e port)
> +  (let ((received (assoc-ref (cadr e) 'body)))
> +    (if received
> +        (begin
> +          (put-bytevector port received) 
> +          (force-output port)))))
> +   
> +;; Exceptional information contains the received bytevector added from the
> +;; read-response-body if any exception had been caught.
> +;; If received data ware huge(it always does), it'd be a trouble during the tracing.
> +;; This helper function could get rid of the received data from exceptional info,
> +;; and re-throw it.
> +(define (throw-from-response-body-break e)
> +  (throw (car e) (list-head (cdr e) (1- (length (cdr e))))))
>  
>  (define (write-response-body r bv)
>    "Write @var{bv}, a bytevector, to the port corresponding to the HTTP

I know I don't have any sort of authority, but I'd like to veto
these. Special procedures to deal with an ad-hoc protocol layered over
another more appropriate protocol is just ugly. As I already mentioned,
exceptions have tags, and this is what these are for.

Fundamentally, I think this patch could be simplified to checking for an
eof from get-bytevector-n and changing the bad-response to an
"incomplete-response" that provides the bytevector.

What does everyone else think?

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-15 16:09 ` Daniel Hartwig
@ 2012-03-15 18:37   ` Ian Price
  2012-03-15 18:48     ` Daniel Hartwig
  0 siblings, 1 reply; 8+ messages in thread
From: Ian Price @ 2012-03-15 18:37 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: guile-devel

Daniel Hartwig <mandyke@gmail.com> writes:

> So this is an interesting start.  The idea of buffering the transfer
> is great -- however, it falls short in this implementation because it
> is internal to read-response-body.
The buffering is useless, it's already performed by get-bytevector-n. In
this sense, it is also not internal to read-response-body

> Also, the whole business about passing the partial data out via an
> exception is very messy.
>
> What about also passing a bytevector to read-response-body?  The
> exception then only needs to mention how many bytes were read because
> the caller already has access to the bytevector aka the data.
>
> Consider this quick hack:
>
> (define* (read-response-body! r bv #:optional
>                               (start 0)
>                               (count (min (bytevector-length bv)
>                                           (response-content-length r))))
>   (and count
>        (let ((read (get-bytevector-n! (response-port r) bv start count)))
>          (if (= read count)
>              bv
>              (bad-response
>               "EOF while reading response body: ~a bytes of ~a (buffer)"
>               read count)))))
>
> which has all the features of your solution yet is much smaller and
> puts the caller in more explicit control of the buffering, which opens
> up many scenarios.
Removing the unnecessary buffering also makes it smaller. :P

> For example, reusing the same bytevector and looping over
> read-response-body! saving the results to disk each time.  This limits
> the memory use to the size of the bytevector *and* removes the copy
> operation from your implementation (bonus!).
If you wanted to do it that way, it'd be better to pass in the port
directly and cut out the middle man. 

-- 
Ian Price

"Programming is like pinball. The reward for doing it well is
the opportunity to do it again" - from "The Wizardy Compiled"



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-15 18:37   ` Ian Price
@ 2012-03-15 18:48     ` Daniel Hartwig
  2012-03-16  2:23       ` Nala Ginrut
  0 siblings, 1 reply; 8+ messages in thread
From: Daniel Hartwig @ 2012-03-15 18:48 UTC (permalink / raw)
  To: Ian Price; +Cc: guile-devel

On 16 March 2012 02:37, Ian Price <ianprice90@googlemail.com> wrote:
> Daniel Hartwig <mandyke@gmail.com> writes:
>> For example, reusing the same bytevector and looping over
>> read-response-body! saving the results to disk each time.  This limits
>> the memory use to the size of the bytevector *and* removes the copy
>> operation from your implementation (bonus!).
> If you wanted to do it that way, it'd be better to pass in the port
> directly and cut out the middle man.
>

Indeed.  The procedure shown is similar to one from one of my own
projects which features the write-to-disk internally.

I guess it pays to keep in mind that it is trivial to rearrange
procedures such as this to suit any particular situation.  The OP
appeared--to me--to be very over worked for the task.

Perhaps such error-tolerance/streaming capabilities can be tied in at
a level closer to get-bytevector-n...

Regards

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-15 18:48     ` Daniel Hartwig
@ 2012-03-16  2:23       ` Nala Ginrut
  2012-03-16  3:24         ` Daniel Hartwig
  2012-03-16  3:31         ` Nala Ginrut
  0 siblings, 2 replies; 8+ messages in thread
From: Nala Ginrut @ 2012-03-16  2:23 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: Ian Price, guile-devel

[-- Attachment #1: Type: text/plain, Size: 1805 bytes --]

Thanks for reply!
@ijp: Yes, I think return received data within exception is ugly. But I
have to do it because my consideration is to return received data when
*any* exception happens. So if it's not a common connection error, I have
to catch it to return the received data, then re-throw the same
exception(and cut the received data). Because other program may expect this
exception. But I don't claim that this is the best design. Anyway, just a
proposal.
@Daniel: I realized that seems make "get-bytevector-n" return the received
data rather than read-response-body is better. But I'm afraid that it'll
conflict with the definition "get-bytevector-n". Say, we ask for n bytes,
but it returned m bytes less than n. So the user maybe get confused with
the name "get-bytevector-n".

On Fri, Mar 16, 2012 at 2:48 AM, Daniel Hartwig <mandyke@gmail.com> wrote:

> On 16 March 2012 02:37, Ian Price <ianprice90@googlemail.com> wrote:
> > Daniel Hartwig <mandyke@gmail.com> writes:
> >> For example, reusing the same bytevector and looping over
> >> read-response-body! saving the results to disk each time.  This limits
> >> the memory use to the size of the bytevector *and* removes the copy
> >> operation from your implementation (bonus!).
> > If you wanted to do it that way, it'd be better to pass in the port
> > directly and cut out the middle man.
> >
>
> Indeed.  The procedure shown is similar to one from one of my own
> projects which features the write-to-disk internally.
>
> I guess it pays to keep in mind that it is trivial to rearrange
> procedures such as this to suit any particular situation.  The OP
> appeared--to me--to be very over worked for the task.
>
> Perhaps such error-tolerance/streaming capabilities can be tied in at
> a level closer to get-bytevector-n...
>
>
> Regards
>

[-- Attachment #2: Type: text/html, Size: 2323 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-16  2:23       ` Nala Ginrut
@ 2012-03-16  3:24         ` Daniel Hartwig
  2012-03-16  3:31         ` Nala Ginrut
  1 sibling, 0 replies; 8+ messages in thread
From: Daniel Hartwig @ 2012-03-16  3:24 UTC (permalink / raw)
  To: Nala Ginrut, guile-devel

On 16 March 2012 10:23, Nala Ginrut <nalaginrut@gmail.com> wrote:
> @Daniel: I realized that seems make "get-bytevector-n" return the received
> data rather than read-response-body is better. But I'm afraid that it'll
> conflict with the definition "get-bytevector-n". Say, we ask for n bytes,
> but it returned m bytes less than n. So the user maybe get confused with the
> name "get-bytevector-n".
>

The definition of get-bytevector-n states what happens when fewer than
'count' bytes are read. Also, get-bytevector-n! says it returns the
number of bytes read as an integer.  So... I don't see where the
conflict is?



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: [PATCH] read-response-body should return received data when any break happens
  2012-03-16  2:23       ` Nala Ginrut
  2012-03-16  3:24         ` Daniel Hartwig
@ 2012-03-16  3:31         ` Nala Ginrut
  1 sibling, 0 replies; 8+ messages in thread
From: Nala Ginrut @ 2012-03-16  3:31 UTC (permalink / raw)
  To: Daniel Hartwig; +Cc: Ian Price, guile-devel

[-- Attachment #1: Type: text/plain, Size: 2170 bytes --]

After a long long discussion with ijp, I realized that this could be an
easy patch because get-bytevector-n handles most of the work. So I'll raise
another thread about this topic for a new version patch.
Thanks for discussion!

On Fri, Mar 16, 2012 at 10:23 AM, Nala Ginrut <nalaginrut@gmail.com> wrote:

> Thanks for reply!
> @ijp: Yes, I think return received data within exception is ugly. But I
> have to do it because my consideration is to return received data when
> *any* exception happens. So if it's not a common connection error, I have
> to catch it to return the received data, then re-throw the same
> exception(and cut the received data). Because other program may expect this
> exception. But I don't claim that this is the best design. Anyway, just a
> proposal.
> @Daniel: I realized that seems make "get-bytevector-n" return the received
> data rather than read-response-body is better. But I'm afraid that it'll
> conflict with the definition "get-bytevector-n". Say, we ask for n bytes,
> but it returned m bytes less than n. So the user maybe get confused with
> the name "get-bytevector-n".
>
>
> On Fri, Mar 16, 2012 at 2:48 AM, Daniel Hartwig <mandyke@gmail.com> wrote:
>
>> On 16 March 2012 02:37, Ian Price <ianprice90@googlemail.com> wrote:
>> > Daniel Hartwig <mandyke@gmail.com> writes:
>> >> For example, reusing the same bytevector and looping over
>> >> read-response-body! saving the results to disk each time.  This limits
>> >> the memory use to the size of the bytevector *and* removes the copy
>> >> operation from your implementation (bonus!).
>> > If you wanted to do it that way, it'd be better to pass in the port
>> > directly and cut out the middle man.
>> >
>>
>> Indeed.  The procedure shown is similar to one from one of my own
>> projects which features the write-to-disk internally.
>>
>> I guess it pays to keep in mind that it is trivial to rearrange
>> procedures such as this to suit any particular situation.  The OP
>> appeared--to me--to be very over worked for the task.
>>
>> Perhaps such error-tolerance/streaming capabilities can be tied in at
>> a level closer to get-bytevector-n...
>>
>>
>> Regards
>>
>
>

[-- Attachment #2: Type: text/html, Size: 2962 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2012-03-16  3:31 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-03-11 15:35 [PATCH] read-response-body should return received data when any break happens Nala Ginrut
2012-03-15 16:09 ` Daniel Hartwig
2012-03-15 18:37   ` Ian Price
2012-03-15 18:48     ` Daniel Hartwig
2012-03-16  2:23       ` Nala Ginrut
2012-03-16  3:24         ` Daniel Hartwig
2012-03-16  3:31         ` Nala Ginrut
2012-03-15 18:31 ` Ian Price

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).