unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] Make get-bytevector-all suspendable.
@ 2023-07-20 14:01 Christopher Baines
  2024-06-01  9:11 ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Christopher Baines @ 2023-07-20 14:01 UTC (permalink / raw)
  To: guile-devel

I'm looking at this since it's used in (web response)
read-response-body.

* module/ice-9/suspendable-ports.scm (get-bytevector-all): New
procedure.
(port-bindings): Add it.
---
 module/ice-9/suspendable-ports.scm | 36 ++++++++++++++++++++++++++++++
 1 file changed, 36 insertions(+)

diff --git a/module/ice-9/suspendable-ports.scm b/module/ice-9/suspendable-ports.scm
index 9fac1df62..e5b3de982 100644
--- a/module/ice-9/suspendable-ports.scm
+++ b/module/ice-9/suspendable-ports.scm
@@ -50,6 +50,7 @@
 
 (define-module (ice-9 suspendable-ports)
   #:use-module (rnrs bytevectors)
+  #:use-module (rnrs bytevectors gnu)
   #:use-module (ice-9 ports internal)
   #:use-module (ice-9 match)
   #:export (current-read-waiter
@@ -342,6 +343,40 @@
                 (set-port-buffer-cur! buf (+ cur transfer-size))
                 transfer-size))))))
 
+(define (get-bytevector-all port)
+  (define %initial-length 4096)
+
+  (let read-loop ((total 0)
+                  (result-length %initial-length)
+                  (result (make-bytevector %initial-length)))
+    (call-with-values (lambda () (fill-input port 1 'binary))
+      (lambda (buf cur buffered)
+        (if (zero? buffered)
+            (begin
+              (set-port-buffer-has-eof?! buf #f)
+              (if (= total 0)
+                  the-eof-object
+                  (bytevector-slice result 0 total)))
+            (let* ((new-total (+ total buffered))
+                   (new-result-length
+                    (let loop ((new-result-length result-length))
+                      (if (< new-total new-result-length)
+                          new-result-length
+                          (loop (* 2 new-result-length)))))
+                   (new-result
+                    (if (= new-result-length result-length)
+                        result
+                        (let ((new-result (make-bytevector new-result-length)))
+                          (bytevector-copy! result 0 new-result 0 total)
+                          new-result))))
+              (bytevector-copy! (port-buffer-bytevector buf) cur
+                                new-result total
+                                buffered)
+              (set-port-buffer-cur! buf (+ cur buffered))
+              (read-loop new-total
+                         new-result-length
+                         new-result)))))))
+
 (define (put-u8 port byte)
   (let* ((buf (port-write-buffer port))
          (bv (port-buffer-bytevector buf))
@@ -754,6 +789,7 @@
     ((ice-9 binary-ports)
      get-u8 lookahead-u8 get-bytevector-n get-bytevector-n!
      get-bytevector-some get-bytevector-some!
+     get-bytevector-all
      put-u8 put-bytevector)
     ((ice-9 textual-ports)
      put-char put-string)
-- 
2.41.0




^ permalink raw reply related	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make get-bytevector-all suspendable.
  2023-07-20 14:01 [PATCH] Make get-bytevector-all suspendable Christopher Baines
@ 2024-06-01  9:11 ` Ludovic Courtès
  2024-06-01  9:40   ` Maxime Devos
  2024-06-01  9:44   ` Maxime Devos
  0 siblings, 2 replies; 6+ messages in thread
From: Ludovic Courtès @ 2024-06-01  9:11 UTC (permalink / raw)
  To: Christopher Baines; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

Hi Chris,

Christopher Baines <mail@cbaines.net> skribis:

> I'm looking at this since it's used in (web response)
> read-response-body.
>
> * module/ice-9/suspendable-ports.scm (get-bytevector-all): New
> procedure.
> (port-bindings): Add it.

Given that ‘get-bytevector-n!’ already has a variant in
suspendable-ports.scm, my preference would be to rewrite
‘get-bytevector-all’ in Scheme (patch attached).  That way, it would
naturally be suspendable.  (It’s also in line with the general strategy
of moving things to Scheme.)

I don’t expect significant performance difference compared to the C
implementation since that is dominated by allocations and I/O.

Thoughts?

Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 4277 bytes --]

diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
index 7c51bf617..ffa1e1b2b 100644
--- a/libguile/r6rs-ports.c
+++ b/libguile/r6rs-ports.c
@@ -1,4 +1,4 @@
-/* Copyright 2009-2011,2013-2015,2018-2019,2023
+/* Copyright 2009-2011,2013-2015,2018-2019,2023,2024
      Free Software Foundation, Inc.
 
    This file is part of Guile.
@@ -393,58 +393,23 @@ SCM_DEFINE (scm_get_bytevector_some_x, "get-bytevector-some!", 4, 0, 0,
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_get_bytevector_all, "get-bytevector-all", 1, 0, 0,
-	    (SCM port),
-	    "Read from @var{port}, blocking as necessary, until "
-	    "the end-of-file is reached.  Return either "
-	    "a new bytevector containing the data read or the "
-	    "end-of-file object (if no data were available).")
-#define FUNC_NAME s_scm_get_bytevector_all
-{
-  SCM result;
-  size_t c_len, c_count;
-  size_t c_read, c_total;
-
-  SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+static SCM get_bytevector_all_var;
 
-  c_len = c_count = 4096;
-  result = scm_c_make_bytevector (c_count);
-  c_total = c_read = 0;
-
-  do
-    {
-      if (c_read > c_len - c_total)
-	{
-	  /* Grow the bytevector.  */
-          SCM prev = result;
-
-          if (INT_ADD_OVERFLOW (c_len, c_len))
-            scm_num_overflow (FUNC_NAME);
-
-          result = scm_c_make_bytevector (c_len * 2);
-          memcpy (SCM_BYTEVECTOR_CONTENTS (result),
-                  SCM_BYTEVECTOR_CONTENTS (prev),
-                  c_total);
-	  c_count = c_len;
-	  c_len *= 2;
-	}
-
-      /* `scm_c_read ()' blocks until C_COUNT bytes are available or EOF is
-	 reached.  */
-      c_read = scm_c_read_bytes (port, result, c_total, c_count);
-      c_total += c_read, c_count -= c_read;
-    }
-  while (c_count == 0);
-
-  if (c_total == 0)
-    return SCM_EOF_VAL;
+static void
+init_bytevector_io_vars (void)
+{
+  get_bytevector_all_var =
+    scm_c_public_lookup ("ice-9 binary-port", "get-bytevector-all");
+}
 
-  if (c_len > c_total)
-    return scm_c_shrink_bytevector (result, c_total);
+SCM
+scm_get_bytevector_all (SCM port)
+{
+  static scm_i_pthread_once_t once = SCM_I_PTHREAD_ONCE_INIT;
+  scm_i_pthread_once (&once, init_bytevector_io_vars);
 
-  return result;
+  return scm_call_1 (scm_variable_ref (get_bytevector_all_var), port);
 }
-#undef FUNC_NAME
 
 
 \f
diff --git a/module/ice-9/binary-ports.scm b/module/ice-9/binary-ports.scm
index b7eddc93d..864d9ef9a 100644
--- a/module/ice-9/binary-ports.scm
+++ b/module/ice-9/binary-ports.scm
@@ -1,5 +1,5 @@
 ;;; binary-ports.scm --- Binary IO on ports
-;;; Copyright (C) 2009-2011,2013,2016,2019,2021,2023 Free Software Foundation, Inc.
+;;; Copyright (C) 2009-2011,2013,2016,2019,2021,2023,2024 Free Software Foundation, Inc.
 ;;;
 ;;; This library is free software: you can redistribute it and/or modify
 ;;; it under the terms of the GNU Lesser General Public License as
@@ -27,6 +27,7 @@
 
 (define-module (ice-9 binary-ports)
   #:use-module (rnrs bytevectors)
+  #:autoload   (rnrs bytevectors gnu) (bytevector-slice)
   #:use-module (ice-9 match)
   #:use-module (ice-9 custom-ports)
   #:export (eof-object
@@ -180,3 +181,29 @@ bytevector composed of the bytes written into the port is returned."
                     ;; FIXME: Instead default to current encoding, if
                     ;; someone reads text from this port.
                     #:encoding 'ISO-8859-1 #:conversion-strategy 'error))
+
+\f
+;;;
+;;; Binary input.
+;;;
+
+(define (get-bytevector-all port)
+  "Read from @var{port}, blocking as necessary, until
+the end-of-file is reached.  Return either a new bytevector containing
+the data read or the end-of-file object (if no data were available)."
+  (define initial-capacity 4096)
+
+  (let loop ((bv (make-bytevector initial-capacity))
+             (capacity initial-capacity)
+             (size 0))
+    (match (get-bytevector-n! port bv size (- capacity size))
+      ((? eof-object?)
+       (bytevector-slice bv 0 size))
+      (read
+       (let ((size (+ read size)))
+         (if (= capacity size)
+             (let* ((capacity (* capacity 2))
+                    (new (make-bytevector capacity)))
+               (bytevector-copy! bv 0 new 0 size)
+               (loop new capacity size))
+             (loop bv capacity size)))))))

^ permalink raw reply related	[flat|nested] 6+ messages in thread

* RE: [PATCH] Make get-bytevector-all suspendable.
  2024-06-01  9:11 ` Ludovic Courtès
@ 2024-06-01  9:40   ` Maxime Devos
  2024-06-16 13:07     ` Ludovic Courtès
  2024-06-01  9:44   ` Maxime Devos
  1 sibling, 1 reply; 6+ messages in thread
From: Maxime Devos @ 2024-06-01  9:40 UTC (permalink / raw)
  To: Ludovic Courtès, Christopher Baines; +Cc: guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1807 bytes --]

>Given that ‘get-bytevector-n!’ already has a variant in
suspendable-ports.scm, my preference would be to rewrite
‘get-bytevector-all’ in Scheme (patch attached).  That way, it would
naturally be suspendable.  (It’s also in line with the general strategy
of moving things to Scheme.)

It currently is difficult to write a correct implementation of get-bytevector-all in pure Scheme, because ‘get-bytevector-all’ needs to return a _fresh_ bytevector and could return twice (e.g. in case of system-async-mark + call-with-prompt shenanigans). I think the proposed implementation is incorrect in this way.

Could be resolved with call-with-blocked-asyncs, but given the title of the patch, that’s against the point.

However, given the sequential and stateful nature of ports, I don’t think it’s useful to implement support for these ‘return twice’ situations, so I think it would be best to simply adjust the documentation to disallow this situation.

Since it is not unique to get-bytevector-all, maybe it could be mentioned in https://www.gnu.org/software/guile/manual/html_node/Ports.html that it isn’t allowed to put any non-pure operation (where we include allocations as a side-effect, e.g. for open-input-bytevector etc, ) related to ports in such a situations?

+(define (get-bytevector-all port)
+  (define %initial-length 4096)
+
+  (let read-loop ((total 0)
+                  (result-length %initial-length)
+                  (result (make-bytevector %initial-length)))

I sense a lack of tests for the test suite ...
Also, to avoid avoidable allocations and copying, it would be nice if %initial-length could be overridden (optional #:initial-length keyword argument), in case the user has a better guess available.

Best regards,
Maxime Devos

[-- Attachment #2: Type: text/html, Size: 3664 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] Make get-bytevector-all suspendable.
  2024-06-01  9:11 ` Ludovic Courtès
  2024-06-01  9:40   ` Maxime Devos
@ 2024-06-01  9:44   ` Maxime Devos
  1 sibling, 0 replies; 6+ messages in thread
From: Maxime Devos @ 2024-06-01  9:44 UTC (permalink / raw)
  To: Ludovic Courtès, Christopher Baines; +Cc: guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 187 bytes --]

Adding to my previous response:

Most (all?) tests for suspendable/non-suspendable are shared, and run in both cases, so probably the suspendable get-bytevector-all already has tests.

[-- Attachment #2: Type: text/html, Size: 1237 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

* Re: [PATCH] Make get-bytevector-all suspendable.
  2024-06-01  9:40   ` Maxime Devos
@ 2024-06-16 13:07     ` Ludovic Courtès
  2024-06-16 13:23       ` Maxime Devos
  0 siblings, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2024-06-16 13:07 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-devel

Hi Maxime,

Maxime Devos <maximedevos@telenet.be> skribis:

> It currently is difficult to write a correct implementation of get-bytevector-all in pure Scheme, because ‘get-bytevector-all’ needs to return a _fresh_ bytevector and could return twice (e.g. in case of system-async-mark + call-with-prompt shenanigans). I think the proposed implementation is incorrect in this way.

Hmm I don’t see how it could return twice.  If an async runs while
‘get-bytevector-all’ is executed, it cannot cause ‘get-bytevector-all’
to abort to a prompt.  I think we’re fine, no?

> +(define (get-bytevector-all port)
> +  (define %initial-length 4096)
> +
> +  (let read-loop ((total 0)
> +                  (result-length %initial-length)
> +                  (result (make-bytevector %initial-length)))
>
> I sense a lack of tests for the test suite ...
> Also, to avoid avoidable allocations and copying, it would be nice if %initial-length could be overridden (optional #:initial-length keyword argument), in case the user has a better guess available.

Yeah, but since this interface is specified in R6RS, and since we’re
just porting the existing C implementation to Scheme, I’d rather not add
such a keyword argument.  We could provide an extension separately
though, as is done for (rnrs bytevectors gnu) for instance.

Regarding tests, you’re right: this is already covered.

I pushed this as 461ff313fa478d207a7668595e9d976a2ace9770 together with
a NEWS entry.

Thanks for your feedback!

Ludo’.



^ permalink raw reply	[flat|nested] 6+ messages in thread

* RE: [PATCH] Make get-bytevector-all suspendable.
  2024-06-16 13:07     ` Ludovic Courtès
@ 2024-06-16 13:23       ` Maxime Devos
  0 siblings, 0 replies; 6+ messages in thread
From: Maxime Devos @ 2024-06-16 13:23 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel@gnu.org

[-- Attachment #1: Type: text/plain, Size: 1513 bytes --]

>> It currently is difficult to write a correct implementation of get-bytevector-all in pure Scheme, because ‘get-bytevector-all’ needs to return a _fresh_ bytevector and could return twice (e.g. in case of system-async-mark + call-with-prompt shenanigans). I think the proposed implementation is incorrect in this way.

>Hmm I don’t see how it could return twice.  If an async runs while
>‘get-bytevector-all’ is executed, it cannot cause ‘get-bytevector-all’
>to abort to a prompt.  I think we’re fine, no?

I did not say that ‘get-bytevector-all’ aborts to a prompt. Rather, the async itself does the abort-to-prompt, and since the async is run inside the get-bytevector-all, as a result get-bytevector-all could return twice (depending on what the handler of call-with-prompt is doing with the delimited continuation).

I’m pretty sure it can do this, that’s how Fibers works (to be clear I’m referring to the abort-to-prompt from an async, not the return twice, Fibers doesn’t do return twice things) -- non-cooperative scheduling is implemented by aborting from an async.

Example (assuming bad/good timing):

Thread #1:
;; Consider the situation where the handler invokes the delimited continuation twice.
(call-with-prompt
  [tag + handler things]
  (lambda () (get-bytevector-all [...]))
  [tag + handler things])

Thread #2:
(system-async-mark
  (lambda ()
    (abort-to-prompt [tag + more arguments]))
  [Thread #1]))

Best regards,
Maxime Devos.

[-- Attachment #2: Type: text/html, Size: 3162 bytes --]

^ permalink raw reply	[flat|nested] 6+ messages in thread

end of thread, other threads:[~2024-06-16 13:23 UTC | newest]

Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-07-20 14:01 [PATCH] Make get-bytevector-all suspendable Christopher Baines
2024-06-01  9:11 ` Ludovic Courtès
2024-06-01  9:40   ` Maxime Devos
2024-06-16 13:07     ` Ludovic Courtès
2024-06-16 13:23       ` Maxime Devos
2024-06-01  9:44   ` Maxime Devos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).