unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] Add 'bytevector-slice'.
@ 2023-01-11 15:00 Ludovic Courtès
  2023-01-11 15:21 ` Thompson, David
                   ` (3 more replies)
  0 siblings, 4 replies; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-11 15:00 UTC (permalink / raw)
  To: guile-devel; +Cc: Andy Wingo, Ludovic Courtès

* module/rnrs/bytevectors/gnu.scm: New file.
* am/bootstrap.am (SOURCES): Add it.
* libguile/bytevectors.c (scm_bytevector_slice): New function.
* libguile/bytevectors.h (scm_bytevector_slice): New declaration.
* test-suite/tests/bytevectors.test ("bytevector-slice"): New tests.
* doc/ref/api-data.texi (Bytevector Slices): New node.
---
 am/bootstrap.am                   |  1 +
 doc/ref/api-data.texi             | 46 ++++++++++++++++++++-
 doc/ref/guile.texi                |  2 +-
 libguile/bytevectors.c            | 69 ++++++++++++++++++++++++++++++-
 libguile/bytevectors.h            |  3 +-
 module/rnrs/bytevectors/gnu.scm   | 24 +++++++++++
 test-suite/tests/bytevectors.test | 53 +++++++++++++++++++++++-
 7 files changed, 193 insertions(+), 5 deletions(-)
 create mode 100644 module/rnrs/bytevectors/gnu.scm

Hello!

This is an updated version of the ‘bytevector-slice’ primitive I used in
the linker/assembler patch series¹ that I think is ready to go.

I went to some length to do something sensible wrt. element type of the
input, when the input is a SRFI-4 uniform vector.  The other option would
be to make the output a pure bytevector unconditionally, but I thought
it would be more consistent and useful to preserve the input element type
when possible (see tests with an f32vector).

If there are no objections, I’ll push it to ‘main’ in the coming days.

Thanks,
Ludo’.

¹ https://lists.gnu.org/archive/html/guile-devel/2023-01/msg00013.html

diff --git a/am/bootstrap.am b/am/bootstrap.am
index 0257d53dc..53ee68315 100644
--- a/am/bootstrap.am
+++ b/am/bootstrap.am
@@ -249,6 +249,7 @@ SOURCES =					\
   rnrs/arithmetic/fixnums.scm			\
   rnrs/arithmetic/flonums.scm			\
   rnrs/bytevectors.scm				\
+  rnrs/bytevectors/gnu.scm			\
   rnrs/io/simple.scm				\
   rnrs/io/ports.scm				\
   rnrs/records/inspection.scm			\
diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index 8658b9785..fe2d2af50 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
-@c Copyright (C)  1996, 1997, 2000-2004, 2006-2017, 2019-2020, 2022
+@c Copyright (C)  1996, 1997, 2000-2004, 2006-2017, 2019-2020, 2022-2023
 @c   Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
 
@@ -6673,6 +6673,7 @@ Bytevectors can be used with the binary input/output primitives
 * Bytevectors as Strings::      Interpreting bytes as Unicode strings.
 * Bytevectors as Arrays::       Guile extension to the bytevector API.
 * Bytevectors as Uniform Vectors::  Bytevectors and SRFI-4.
+* Bytevector Slices::           Aliases for parts of a bytevector.
 @end menu
 
 @node Bytevector Endianness
@@ -7108,6 +7109,49 @@ Bytevectors may also be accessed with the SRFI-4 API. @xref{SRFI-4 and
 Bytevectors}, for more information.
 
 
+@node Bytevector Slices
+@subsubsection Bytevector Slices
+
+@cindex subset, of a bytevector
+@cindex slice, of a bytevector
+@cindex slice, of a uniform vector
+As an extension to the R6RS specification, the @code{(rnrs bytevectors
+gnu)} module provides the @code{bytevector-slice} procedure, which
+returns a bytevector aliasing part of an existing bytevector.
+
+@deffn {Scheme Procedure} bytevector-slice @var{bv} @var{offset} [@var{size}]
+@deffnx {C Function} scm_bytevector_slice (@var{bv}, @var{offset}, @var{size})
+Return the slice of @var{bv} starting at @var{offset} and counting
+@var{size} bytes.  When @var{size} is omitted, the slice covers all
+of @var{bv} starting from @var{offset}.  The returned slice shares
+storage with @var{bv}: changes to the slice are visible in @var{bv}
+and vice-versa.
+
+When @var{bv} is actually a SRFI-4 uniform vector, its element
+type is preserved unless @var{offset} and @var{size} are not aligned
+on its element type size.
+@end deffn
+
+Here is an example showing how to use it:
+
+@lisp
+(use-modules (rnrs bytevectors)
+             (rnrs bytevectors gnu))
+
+(define bv (u8-list->bytevector (iota 10)))
+(define slice (bytevector-slice bv 2 3))
+
+slice
+@result{} #vu8(2 3 4)
+
+(bytevector-u8-set! slice 0 77)
+slice
+@result{} #vu8(77 3 4)
+
+bv
+@result{} #vu8(0 1 77 3 4 5 6 7 8 9)
+@end lisp
+
 @node Arrays
 @subsection Arrays
 @tpindex Arrays
diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
index 6a81a0893..8414c3e2d 100644
--- a/doc/ref/guile.texi
+++ b/doc/ref/guile.texi
@@ -13,7 +13,7 @@
 @copying
 This manual documents Guile version @value{VERSION}.
 
-Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
+Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,
 Inc. @*
 Copyright (C) 2021 Maxime Devos
 
diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index bbc23f449..6b920c88a 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -1,4 +1,4 @@
-/* Copyright 2009-2015,2018-2019
+/* Copyright 2009-2015,2018-2019,2022-2023
      Free Software Foundation, Inc.
 
    This file is part of Guile.
@@ -325,6 +325,73 @@ scm_c_take_typed_bytevector (signed char *contents, size_t len,
   return ret;
 }
 
+SCM_DEFINE (scm_bytevector_slice, "bytevector-slice", 2, 1, 0,
+            (SCM bv, SCM offset, SCM size),
+            "Return the slice of @var{bv} starting at @var{offset} and counting\n"
+            "@var{size} bytes.  When @var{size} is omitted, the slice covers all\n"
+            "of @var{bv} starting from @var{offset}.  The returned slice shares\n"
+            "storage with @var{bv}: changes to the slice are visible in @var{bv}\n"
+            "and vice-versa.\n"
+            "\n"
+            "When @var{bv} is actually a SRFI-4 uniform vector, its element\n"
+            "type is preserved unless @var{offset} and @var{size} are not aligned\n"
+            "on its element type size.\n")
+#define FUNC_NAME s_scm_bytevector_slice
+{
+  SCM ret;
+  size_t c_offset, c_size;
+  scm_t_array_element_type element_type;
+
+  SCM_VALIDATE_BYTEVECTOR (1, bv);
+
+  /* FIXME: Until 3.0.8 included, the assembler would not set the
+     SCM_F_BYTEVECTOR_CONTIGUOUS flag on literals.  Thus, ignore it and
+     assume BV is contiguous (how could it not be anyway?).  */
+#if 0
+  if (!SCM_BYTEVECTOR_CONTIGUOUS_P (bv))
+    scm_wrong_type_arg_msg (FUNC_NAME, 0, bv, "contiguous bytevector");
+#endif
+
+  c_offset = scm_to_size_t (offset);
+
+  if (SCM_UNBNDP (size))
+    {
+      if (c_offset < SCM_BYTEVECTOR_LENGTH (bv))
+        c_size = SCM_BYTEVECTOR_LENGTH (bv) - c_offset;
+      else
+        c_size = 0;
+    }
+  else
+    c_size = scm_to_size_t (size);
+
+  if (c_offset + c_size > SCM_BYTEVECTOR_LENGTH (bv))
+    scm_out_of_range (FUNC_NAME, offset);
+
+  /* Preserve the element type of BV, unless we're not slicing on type
+     boundaries.  */
+  element_type = SCM_BYTEVECTOR_ELEMENT_TYPE (bv);
+  if ((c_offset % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0)
+      || (c_size % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0))
+    element_type = SCM_ARRAY_ELEMENT_TYPE_VU8;
+  else
+    c_size /= (scm_i_array_element_type_sizes[element_type] / 8);
+
+  ret = make_bytevector_from_buffer (c_size,
+                                     SCM_BYTEVECTOR_CONTENTS (bv) + c_offset,
+                                     element_type);
+  if (!SCM_MUTABLE_BYTEVECTOR_P (bv))
+    {
+      /* Preserve the immutability property.  */
+      scm_t_bits flags = SCM_BYTEVECTOR_FLAGS (ret);
+      SCM_SET_BYTEVECTOR_FLAGS (ret, flags | SCM_F_BYTEVECTOR_IMMUTABLE);
+    }
+
+  SCM_BYTEVECTOR_SET_PARENT (ret, bv);
+
+  return ret;
+}
+#undef FUNC_NAME
+
 /* Shrink BV to C_NEW_LEN (which is assumed to be smaller than its current
    size) and return the new bytevector (possibly different from BV).  */
 SCM
diff --git a/libguile/bytevectors.h b/libguile/bytevectors.h
index 980d6e267..6179bfd86 100644
--- a/libguile/bytevectors.h
+++ b/libguile/bytevectors.h
@@ -1,7 +1,7 @@
 #ifndef SCM_BYTEVECTORS_H
 #define SCM_BYTEVECTORS_H
 
-/* Copyright 2009,2011,2018
+/* Copyright 2009,2011,2018,2022
      Free Software Foundation, Inc.
 
    This file is part of Guile.
@@ -52,6 +52,7 @@ SCM_API uint8_t scm_c_bytevector_ref (SCM, size_t);
 SCM_API void scm_c_bytevector_set_x (SCM, size_t, uint8_t);
 
 SCM_API SCM scm_make_bytevector (SCM, SCM);
+SCM_API SCM scm_bytevector_slice (SCM, SCM, SCM);
 SCM_API SCM scm_native_endianness (void);
 SCM_API SCM scm_bytevector_p (SCM);
 SCM_API SCM scm_bytevector_length (SCM);
diff --git a/module/rnrs/bytevectors/gnu.scm b/module/rnrs/bytevectors/gnu.scm
new file mode 100644
index 000000000..ce97535a8
--- /dev/null
+++ b/module/rnrs/bytevectors/gnu.scm
@@ -0,0 +1,24 @@
+;;;; gnu.scm --- GNU extensions to the bytevector API.
+
+;;;;	Copyright (C) 2022 Free Software Foundation, Inc.
+;;;;
+;;;; This library is free software; you can redistribute it and/or
+;;;; modify it under the terms of the GNU Lesser General Public
+;;;; License as published by the Free Software Foundation; either
+;;;; version 3 of the License, or (at your option) any later version.
+;;;;
+;;;; This library is distributed in the hope that it will be useful,
+;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+;;;; Lesser General Public License for more details.
+;;;;
+;;;; You should have received a copy of the GNU Lesser General Public
+;;;; License along with this library; if not, write to the Free Software
+;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
+
+(define-module (rnrs bytevectors gnu)
+  #:version (6)
+  #:export (bytevector-slice))
+
+(define bytevector-slice
+  (@@ (rnrs bytevectors) bytevector-slice))
diff --git a/test-suite/tests/bytevectors.test b/test-suite/tests/bytevectors.test
index 732aadb3e..dc4b32370 100644
--- a/test-suite/tests/bytevectors.test
+++ b/test-suite/tests/bytevectors.test
@@ -1,6 +1,6 @@
 ;;;; bytevectors.test --- R6RS bytevectors. -*- mode: scheme; coding: utf-8; -*-
 ;;;;
-;;;; Copyright (C) 2009-2015, 2018, 2021 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009-2015, 2018, 2021, 2023 Free Software Foundation, Inc.
 ;;;;
 ;;;; Ludovic Courtès
 ;;;;
@@ -22,6 +22,7 @@
   #:use-module (test-suite lib)
   #:use-module (system base compile)
   #:use-module (rnrs bytevectors)
+  #:use-module (rnrs bytevectors gnu)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-4))
 
@@ -666,6 +667,56 @@
     exception:out-of-range
     (with-input-from-string "#vu8(0 256)" read)))
 
+\f
+(with-test-prefix "bytevector-slice"
+
+  (pass-if-exception "wrong size"
+      exception:out-of-range
+    (let ((b #vu8(1 2 3)))
+      (bytevector-slice b 1 3)))
+
+  (pass-if-equal "slices"
+      (list #vu8(1 2) #vu8(2 3)
+            #vu8(1)   #vu8(2) #vu8(3))
+    (let ((b #vu8(1 2 3)))
+      (list (bytevector-slice b 0 2)
+            (bytevector-slice b 1)
+            (bytevector-slice b 0 1)
+            (bytevector-slice b 1 1)
+            (bytevector-slice b 2))))
+
+  (pass-if-exception "immutable flag preserved"
+      exception:wrong-type-arg
+    (compile '(begin
+                (use-modules (rnrs bytevectors)
+                             (rnrs bytevectors gnu))
+
+                ;; The literal bytevector below is immutable.
+                (let ((bv #vu8(1 2 3)))
+                  (bytevector-u8-set! (bytevector-slice bv 1) 0 0)))
+
+             ;; Disable optimizations to invoke the full-blown
+             ;; 'scm_bytevector_u8_set_x' procedure, which checks for
+             ;; the SCM_F_BYTEVECTOR_IMMUTABLE flag.
+             #:optimization-level 0
+             #:to 'value))
+
+  (pass-if-equal "slice of f32vector"
+      '(8 2)
+    (let* ((v #f32(1.1 1.2 3.14))
+           (s (bytevector-slice v 4)))
+      (and (= (f32vector-ref s 0)
+              (f32vector-ref v 1))
+           (list (bytevector-length s)
+                 (f32vector-length s)))))
+
+  (pass-if-equal "unaligned slice of f32vector"
+      10
+    (let* ((v #f32(1.1 1.2 3.14))
+           (s (bytevector-slice v 2)))
+      (and (not (f32vector? s))
+           (bytevector-length s)))))
+
 \f
 (with-test-prefix "Arrays"
 
-- 
2.38.1




^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-11 15:00 [PATCH] Add 'bytevector-slice' Ludovic Courtès
@ 2023-01-11 15:21 ` Thompson, David
  2023-01-11 15:29 ` Jean Abou Samra
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 17+ messages in thread
From: Thompson, David @ 2023-01-11 15:21 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel, Andy Wingo

Hi Ludovic,

On Wed, Jan 11, 2023 at 10:00 AM Ludovic Courtès <ludo@gnu.org> wrote:
>
> +@node Bytevector Slices
> +@subsubsection Bytevector Slices
> +
> +@cindex subset, of a bytevector
> +@cindex slice, of a bytevector
> +@cindex slice, of a uniform vector
> +As an extension to the R6RS specification, the @code{(rnrs bytevectors
> +gnu)} module provides the @code{bytevector-slice} procedure, which
> +returns a bytevector aliasing part of an existing bytevector.
> +
> +@deffn {Scheme Procedure} bytevector-slice @var{bv} @var{offset} [@var{size}]
> +@deffnx {C Function} scm_bytevector_slice (@var{bv}, @var{offset}, @var{size})
> +Return the slice of @var{bv} starting at @var{offset} and counting
> +@var{size} bytes.  When @var{size} is omitted, the slice covers all
> +of @var{bv} starting from @var{offset}.  The returned slice shares
> +storage with @var{bv}: changes to the slice are visible in @var{bv}
> +and vice-versa.

Just wanted to chime in to say that this is a really great new
feature! Bytevector slices will be very useful for performance
sensitive code that could benefit from using a pre-allocated memory
arena.

Really really great stuff! Thank you!

- Dave



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-11 15:00 [PATCH] Add 'bytevector-slice' Ludovic Courtès
  2023-01-11 15:21 ` Thompson, David
@ 2023-01-11 15:29 ` Jean Abou Samra
  2023-01-11 17:34   ` Ludovic Courtès
  2023-01-11 17:39 ` The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’ Ludovic Courtès
  2023-01-12  0:10 ` [PATCH] Add 'bytevector-slice' Maxime Devos
  3 siblings, 1 reply; 17+ messages in thread
From: Jean Abou Samra @ 2023-01-11 15:29 UTC (permalink / raw)
  To: Ludovic Courtès, guile-devel; +Cc: Andy Wingo


[-- Attachment #1.1: Type: text/plain, Size: 1237 bytes --]

Hi Ludovic,

Le 11/01/2023 à 16:00, Ludovic Courtès a écrit :
> +@node Bytevector Slices
> +@subsubsection Bytevector Slices
> +
> +@cindex subset, of a bytevector
> +@cindex slice, of a bytevector
> +@cindex slice, of a uniform vector
> +As an extension to the R6RS specification, the @code{(rnrs bytevectors
> +gnu)} module provides the @code{bytevector-slice} procedure, which
> +returns a bytevector aliasing part of an existing bytevector.
> +
> +@deffn {Scheme Procedure} bytevector-slice @var{bv} @var{offset} [@var{size}]
> +@deffnx {C Function} scm_bytevector_slice (@var{bv}, @var{offset}, @var{size})
> +Return the slice of @var{bv} starting at @var{offset} and counting
> +@var{size} bytes.  When @var{size} is omitted, the slice covers all
> +of @var{bv} starting from @var{offset}.  The returned slice shares
> +storage with @var{bv}: changes to the slice are visible in @var{bv}
> +and vice-versa.



What do you think about making it more similar to substrings?
There the 'substring' procedure makes a copy-on-write substring,
and you have substring/shared if you really want shared mutation.
Would something like this be meaningful / feasible / useful
for bytevectors?

Best,
Jean



[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-11 15:29 ` Jean Abou Samra
@ 2023-01-11 17:34   ` Ludovic Courtès
  2023-01-11 17:37     ` [EXT] " Thompson, David
  0 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-11 17:34 UTC (permalink / raw)
  To: Jean Abou Samra; +Cc: guile-devel, Andy Wingo

Hi,

Jean Abou Samra <jean@abou-samra.fr> skribis:

> What do you think about making it more similar to substrings?
> There the 'substring' procedure makes a copy-on-write substring,
> and you have substring/shared if you really want shared mutation.
> Would something like this be meaningful / feasible / useful
> for bytevectors?

I’m not convinced there’s a need for copy-on-write bytevectors, and it
would be hard to implement, and to implement
efficiently—bytevector-mutating instructions would have to check whether
they’re accessing a CoW bytevector and DTRT.

What could be convenient though is ‘bytevector-copy’ (no bang), which
would combine ‘make-bytevector’ + ‘bytevector-copy!’.

Ludo’.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXT] Re: [PATCH] Add 'bytevector-slice'.
  2023-01-11 17:34   ` Ludovic Courtès
@ 2023-01-11 17:37     ` Thompson, David
  2023-01-11 19:05       ` [EXT] " lloda
  0 siblings, 1 reply; 17+ messages in thread
From: Thompson, David @ 2023-01-11 17:37 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Jean Abou Samra, guile-devel, Andy Wingo

On Wed, Jan 11, 2023 at 12:34 PM Ludovic Courtès <ludo@gnu.org> wrote:
>
> What could be convenient though is ‘bytevector-copy’ (no bang), which
> would combine ‘make-bytevector’ + ‘bytevector-copy!’.

'bytevector-copy' already exists, or do you mean some different
implementation of it?

- Dave



^ permalink raw reply	[flat|nested] 17+ messages in thread

* The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’
  2023-01-11 15:00 [PATCH] Add 'bytevector-slice' Ludovic Courtès
  2023-01-11 15:21 ` Thompson, David
  2023-01-11 15:29 ` Jean Abou Samra
@ 2023-01-11 17:39 ` Ludovic Courtès
  2023-01-11 18:51   ` lloda
  2023-01-12  0:10 ` [PATCH] Add 'bytevector-slice' Maxime Devos
  3 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-11 17:39 UTC (permalink / raw)
  To: guile-devel; +Cc: Andy Wingo

[-- Attachment #1: Type: text/plain, Size: 425 bytes --]

Ludovic Courtès <ludo@gnu.org> skribis:

> This is an updated version of the ‘bytevector-slice’ primitive I used in
> the linker/assembler patch series¹ that I think is ready to go.

While working on this, I noticed I might have to pay attention to
‘SCM_F_BYTEVECTOR_CONTIGUOUS’, as noted in the patch.

But it turns out that flag isn’t really used.  I found two places that
should add it and do not:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 1766 bytes --]

diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index 6b920c88a..fd7fdad0b 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -274,7 +274,8 @@ make_bytevector_from_buffer (size_t len, void *contents,
 
       c_len = len * (scm_i_array_element_type_sizes[element_type] / 8);
 
-      SCM_SET_BYTEVECTOR_FLAGS (ret, element_type);
+      SCM_SET_BYTEVECTOR_FLAGS (ret,
+                                element_type | SCM_F_BYTEVECTOR_CONTIGUOUS);
       SCM_BYTEVECTOR_SET_LENGTH (ret, c_len);
       SCM_BYTEVECTOR_SET_CONTENTS (ret, contents);
       SCM_BYTEVECTOR_SET_PARENT (ret, SCM_BOOL_F);
diff --git a/module/system/repl/debug.scm b/module/system/repl/debug.scm
diff --git a/module/system/vm/assembler.scm b/module/system/vm/assembler.scm
index 165976363..61e0460ff 100644
--- a/module/system/vm/assembler.scm
+++ b/module/system/vm/assembler.scm
@@ -1857,8 +1857,9 @@ should be .data or .rodata), and return the resulting linker object.
   (define tc7-program #x45)
 
   (define tc7-bytevector #x4d)
-  ;; This flag is intended to be left-shifted by 7 bits.
+  ;; These flags are intended to be left-shifted by 7 bits.
   (define bytevector-immutable-flag #x200)
+  (define bytevector-contiguous-flag #x100)
 
   (define tc7-array #x5d)
 
@@ -2026,6 +2027,7 @@ should be .data or .rodata), and return the resulting linker object.
                                ;; Bytevector immutable flag also shifted
                                ;; left.
                                (ash (logior bytevector-immutable-flag
+                                            bytevector-contiguous-flag
                                             (array-type-code obj))
                                     7)))))
           (case word-size

[-- Attachment #3: Type: text/plain, Size: 372 bytes --]


There are probably more.

Fundamentally, I’m not sure what this flag is supposed to mean.  AFAICS,
there’s no way to create a non-contiguous bytevector (or SRFI-4 vector).

This flag was added in 7ed54fd36d2e381aa46ef8a7d2fc13a6776b573a.  My
guess is that it was part of plan that wasn’t carried out in the end.

Andy, thoughts?  :-)

Thanks,
Ludo’.

^ permalink raw reply related	[flat|nested] 17+ messages in thread

* Re: The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’
  2023-01-11 17:39 ` The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’ Ludovic Courtès
@ 2023-01-11 18:51   ` lloda
  2023-01-11 19:00     ` lloda
  0 siblings, 1 reply; 17+ messages in thread
From: lloda @ 2023-01-11 18:51 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel@gnu.org, Andy Wingo

[-- Attachment #1: Type: text/plain, Size: 217 bytes --]


I think this flag has no use and you should just remove all references to it and deprecate it. Contiguity is a trivial check.

We removed a similar flag in the array type not long ago.

Regards

  Daniel



[-- Attachment #2: Type: text/html, Size: 764 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’
  2023-01-11 18:51   ` lloda
@ 2023-01-11 19:00     ` lloda
  0 siblings, 0 replies; 17+ messages in thread
From: lloda @ 2023-01-11 19:00 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel@gnu.org, Andy Wingo



> On 11 Jan 2023, at 19:51, lloda <lloda@sarc.name> wrote:
> 
> Contiguity is a trivial check.

I mean if bytevectors even had a stride field.




^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXT] [PATCH] Add 'bytevector-slice'.
  2023-01-11 17:37     ` [EXT] " Thompson, David
@ 2023-01-11 19:05       ` lloda
  2023-01-12 22:27         ` Ludovic Courtès
  0 siblings, 1 reply; 17+ messages in thread
From: lloda @ 2023-01-11 19:05 UTC (permalink / raw)
  To: Thompson, David
  Cc: Ludovic Courtès, Jean Abou Samra, guile-devel@gnu.org,
	Andy Wingo

[-- Attachment #1: Type: text/plain, Size: 821 bytes --]



> On 11 Jan 2023, at 18:37, Thompson, David <dthompson2@worcester.edu> wrote:
> 
> On Wed, Jan 11, 2023 at 12:34 PM Ludovic Courtès <ludo@gnu.org> wrote:
>> 
>> What could be convenient though is ‘bytevector-copy’ (no bang), which
>> would combine ‘make-bytevector’ + ‘bytevector-copy!’.
> 
> 'bytevector-copy' already exists, or do you mean some different
> implementation of it?
> 
> - Dave

The current bytevector-copy takes a single argument. I would extend it in the same way the srfi4 copy functions were extended recently in the following commit

http://git.savannah.gnu.org/gitweb/?p=guile.git;a=commitdiff;h=6be51f9bbf47692ee5747b2cac6b372df65de970 <http://git.savannah.gnu.org/gitweb/?p=guile.git;a=commitdiff;h=6be51f9bbf47692ee5747b2cac6b372df65de970>

Regards

  Daniel


[-- Attachment #2: Type: text/html, Size: 1588 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-11 15:00 [PATCH] Add 'bytevector-slice' Ludovic Courtès
                   ` (2 preceding siblings ...)
  2023-01-11 17:39 ` The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’ Ludovic Courtès
@ 2023-01-12  0:10 ` Maxime Devos
  2023-01-13 11:32   ` Ludovic Courtès
  2023-01-14 15:19   ` Ludovic Courtès
  3 siblings, 2 replies; 17+ messages in thread
From: Maxime Devos @ 2023-01-12  0:10 UTC (permalink / raw)
  To: Ludovic Courtès, guile-devel; +Cc: Andy Wingo


[-- Attachment #1.1.1: Type: text/plain, Size: 12595 bytes --]

This looks useful to me, especially once the optimiser recognises 
'bytevector-slice'.  (In Scheme-GNUnet, I defined a wrapper around 
bytevectors called 'bytevectors slices' to implement such a thing.)

The only thing missing for me, is a procedure 
'bytevector-slice/read-only' and 'bytevector-slice/write-only', then I 
could throw the module implementing the wrapping in Scheme-GNUnet and 
the overhead incurred by wrapping away.

On 11-01-2023 16:00, Ludovic Courtès wrote:
> +@deffn {Scheme Procedure} bytevector-slice @var{bv} @var{offset} [@var{size}]
> +@deffnx {C Function} scm_bytevector_slice (@var{bv}, @var{offset}, @var{size})
> +Return the slice of @var{bv} starting at @var{offset} and counting
> +@var{size} bytes.  When @var{size} is omitted, the slice covers all
> +of @var{bv} starting from @var{offset}.  The returned slice shares
> +storage with @var{bv}: changes to the slice are visible in @var{bv}
> +and vice-versa.
> +
> +When @var{bv} is actually a SRFI-4 uniform vector, its element
> +type is preserved unless @var{offset} and @var{size} are not aligned
> +on its element type size.
> +@end deffn
I wrote stuff here about 'What if offset or size unaligned?  This is 
undocumented.', and only later noticed this paragraph documenting 
exactly that.

> +
> +Here is an example showing how to use it:
> +
> +@lisp
> +(use-modules (rnrs bytevectors)
> +             (rnrs bytevectors gnu))

I thought that R6RS reserved (rnrs ...) to the RnRS process, so I would 
think that naming it (rnrs bytevectors gnu) would be 
standards-incompliant, though I cannot find in the specification, so 
maybe it isn't actually reserved.

(SRFI looks a bit looser to me, so I find the (srfi ... gnu) acceptable, 
but (rnrs ... gnu) looks weird to me, I would propose (ice-9 
bytevector-extensions) or such.).

> +
> diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
> index 6a81a0893..8414c3e2d 100644
> --- a/doc/ref/guile.texi
> +++ b/doc/ref/guile.texi
> @@ -13,7 +13,7 @@
>   @copying
>   This manual documents Guile version @value{VERSION}.
>   
> -Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
> +Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,


Where does this year 2022 come from?  Does a previous version of this 
patch predate the new year?


> diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
> index bbc23f449..6b920c88a 100644
> --- a/libguile/bytevectors.c
> +++ b/libguile/bytevectors.c
> @@ -1,4 +1,4 @@
> -/* Copyright 2009-2015,2018-2019
> +/* Copyright 2009-2015,2018-2019,2022-2023
>        Free Software Foundation, Inc.
>   
Ditto.

>      This file is part of Guile.
> @@ -325,6 +325,73 @@ scm_c_take_typed_bytevector (signed char *contents, size_t len,
>     return ret;
>   }
>   
> +SCM_DEFINE (scm_bytevector_slice, "bytevector-slice", 2, 1, 0,
> +            (SCM bv, SCM offset, SCM size),
> +            "Return the slice of @var{bv} starting at @var{offset} and counting\n"
> +            "@var{size} bytes.  When @var{size} is omitted, the slice covers all\n"
> +            "of @var{bv} starting from @var{offset}.  The returned slice shares\n"
> +            "storage with @var{bv}: changes to the slice are visible in @var{bv}\n"
> +            "and vice-versa.\n"
> +            "\n"
> +            "When @var{bv} is actually a SRFI-4 uniform vector, its element\n"
> +            "type is preserved unless @var{offset} and @var{size} are not aligned\n"
> +            "on its element type size.\n")
> +#define FUNC_NAME s_scm_bytevector_slice
> +{
> +  SCM ret;
> +  size_t c_offset, c_size;
> +  scm_t_array_element_type element_type;
> +
> +  SCM_VALIDATE_BYTEVECTOR (1, bv);
> +
> +  /* FIXME: Until 3.0.8 included, the assembler would not set the
> +     SCM_F_BYTEVECTOR_CONTIGUOUS flag on literals.  Thus, ignore it and
> +     assume BV is contiguous (how could it not be anyway?).  */
> +#if 0
> +  if (!SCM_BYTEVECTOR_CONTIGUOUS_P (bv))
> +    scm_wrong_type_arg_msg (FUNC_NAME, 0, bv, "contiguous bytevector");
> +#endif


I don't know what's up with that, I'm leaving this fragment of code for 
other to review.

> +
> +  c_offset = scm_to_size_t (offset);
> +
> +  if (SCM_UNBNDP (size))
> +    {
> +      if (c_offset < SCM_BYTEVECTOR_LENGTH (bv))
> +        c_size = SCM_BYTEVECTOR_LENGTH (bv) - c_offset;
> +      else
> +        c_size = 0; > +    }
> +  else
> +    c_size = scm_to_size_t (size);
> +
> +  if (c_offset + c_size > SCM_BYTEVECTOR_LENGTH (bv))
> +    scm_out_of_range (FUNC_NAME, offset);


If offset=SIZE_MAX-1 and size=1, this will overflow to 0 and hence not 
trigger the error reporting.  This bounds check needs to be rewritten, 
with corresponding additional tests.

> +
> +  /* Preserve the element type of BV, unless we're not slicing on type
> +     boundaries.  */
> +  element_type = SCM_BYTEVECTOR_ELEMENT_TYPE (bv);
> +  if ((c_offset % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0)
> +      || (c_size % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0))
> +    element_type = SCM_ARRAY_ELEMENT_TYPE_VU8;
> +  else
> +    c_size /= (scm_i_array_element_type_sizes[element_type] / 8);

I was worried about the alignment but it looks like it is handled.

> +  ret = make_bytevector_from_buffer (c_size,
> +                                     SCM_BYTEVECTOR_CONTENTS (bv) + c_offset,
> +                                     element_type);
> +  if (!SCM_MUTABLE_BYTEVECTOR_P (bv))
> +    {
> +      /* Preserve the immutability property.  */
> +      scm_t_bits flags = SCM_BYTEVECTOR_FLAGS (ret);
> +      SCM_SET_BYTEVECTOR_FLAGS (ret, flags | SCM_F_BYTEVECTOR_IMMUTABLE);
> +    }
> +
> +  SCM_BYTEVECTOR_SET_PARENT (ret, bv);


IIUC, if you use bytevector-slice iteratively, say:

(let loop ((bv some-initial-value)
            (n large-number))
   (if (> n 0)
       (loop (bytevector-slice bv 0 (bytevector-length bv))
             (- n 1))
       bv))

you will end up with a bytevector containing a reference to a bytevector 
containing a reference to ... containing a reference to the original 
reference, in a chain of length ≃ large-number.

This sounds rather inefficient to me (memory-wise).  Instead, I propose 
setting the parent of 'ret' to the parent of 'bv'.  (Its not really the 
'parent' anymore then, but AFAIK nothing cares about that, though you 
may want to rename the field then).

   (--)

Do you know what this 'parent' field even is for?  Going by some 
comments in 'libguile/bytevectors.c', it is for GC reasons, but going by 
the existence of the 'bytevector->pointer' + 'pointer->bytevector' trick 
which destroys 'parent' information, it seems unnecessary to me.

It appears to be added in '059a588fedf377ffd32cc1f1fee7ed829b263890', 
but it doesn't say why it is needed.  My guess is that the reason the 
'pointer->bytevector' tricks works at all is:

     * libguile/foreign.c (scm_pointer_to_bytevector): Use the parent field
       instead of registering a weak reference from bytevector to foreign
       pointer.

and hence 'parent' is needed after all, but this stuff isn't 
well-documented in the implementation.

> +
> +  return ret;
> +}
> +#undef FUNC_NAME
> +
>   /* Shrink BV to C_NEW_LEN (which is assumed to be smaller than its current
>      size) and return the new bytevector (possibly different from BV).  */
>   SCM
> diff --git a/libguile/bytevectors.h b/libguile/bytevectors.h
> index 980d6e267..6179bfd86 100644
> --- a/libguile/bytevectors.h
> +++ b/libguile/bytevectors.h
> @@ -1,7 +1,7 @@
>   #ifndef SCM_BYTEVECTORS_H
>   #define SCM_BYTEVECTORS_H
>   
> -/* Copyright 2009,2011,2018
> +/* Copyright 2009,2011,2018,2022
>        Free Software Foundation, Inc.
>   
>      This file is part of Guile.
> @@ -52,6 +52,7 @@ SCM_API uint8_t scm_c_bytevector_ref (SCM, size_t);
>   SCM_API void scm_c_bytevector_set_x (SCM, size_t, uint8_t);
>   
>   SCM_API SCM scm_make_bytevector (SCM, SCM);
> +SCM_API SCM scm_bytevector_slice (SCM, SCM, SCM);
>   SCM_API SCM scm_native_endianness (void);
>   SCM_API SCM scm_bytevector_p (SCM);
>   SCM_API SCM scm_bytevector_length (SCM);
> diff --git a/module/rnrs/bytevectors/gnu.scm b/module/rnrs/bytevectors/gnu.scm
> new file mode 100644
> index 000000000..ce97535a8
> --- /dev/null
> +++ b/module/rnrs/bytevectors/gnu.scm
> @@ -0,0 +1,24 @@
> +;;;; gnu.scm --- GNU extensions to the bytevector API.
> +
> +;;;;	Copyright (C) 2022 Free Software Foundation, Inc.
> +;;;;
> +;;;; This library is free software; you can redistribute it and/or
> +;;;; modify it under the terms of the GNU Lesser General Public
> +;;;; License as published by the Free Software Foundation; either
> +;;;; version 3 of the License, or (at your option) any later version.
> +;;;;
> +;;;; This library is distributed in the hope that it will be useful,
> +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;;;; Lesser General Public License for more details.
> +;;;;
> +;;;; You should have received a copy of the GNU Lesser General Public
> +;;;; License along with this library; if not, write to the Free Software
> +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Nowadays a https://.../ instead of a mail address, is more conventional 
and useful, I'd think.  Its even used in old files, e.g. 
libguile/r6rs-ports.c.

> +
> +(define-module (rnrs bytevectors gnu)
> +  #:version (6)
> +  #:export (bytevector-slice))
> +
> +(define bytevector-slice
> +  (@@ (rnrs bytevectors) bytevector-slice))
> diff --git a/test-suite/tests/bytevectors.test b/test-suite/tests/bytevectors.test
> index 732aadb3e..dc4b32370 100644
> --- a/test-suite/tests/bytevectors.test
> +++ b/test-suite/tests/bytevectors.test
> @@ -1,6 +1,6 @@
>   ;;;; bytevectors.test --- R6RS bytevectors. -*- mode: scheme; coding: utf-8; -*-
>   ;;;;
> -;;;; Copyright (C) 2009-2015, 2018, 2021 Free Software Foundation, Inc.
> +;;;; Copyright (C) 2009-2015, 2018, 2021, 2023 Free Software Foundation, Inc.
>   ;;;;
>   ;;;; Ludovic Courtès
>   ;;;;
> @@ -22,6 +22,7 @@
>     #:use-module (test-suite lib)
>     #:use-module (system base compile)
>     #:use-module (rnrs bytevectors)
> +  #:use-module (rnrs bytevectors gnu)
>     #:use-module (srfi srfi-1)
>     #:use-module (srfi srfi-4))
>   
> @@ -666,6 +667,56 @@
>       exception:out-of-range
>       (with-input-from-string "#vu8(0 256)" read)))
>   
> +\f
> +(with-test-prefix "bytevector-slice"
> +
> +  (pass-if-exception "wrong size"
> +      exception:out-of-range
> +    (let ((b #vu8(1 2 3)))
> +      (bytevector-slice b 1 3)))


> +
> +  (pass-if-equal "slices"
> +      (list #vu8(1 2) #vu8(2 3)
> +            #vu8(1)   #vu8(2) #vu8(3))
> +    (let ((b #vu8(1 2 3)))
> +      (list (bytevector-slice b 0 2)
> +            (bytevector-slice b 1)
> +            (bytevector-slice b 0 1)
> +            (bytevector-slice b 1 1)
> +            (bytevector-slice b 2))))
> +
> +  (pass-if-exception "immutable flag preserved"
> +      exception:wrong-type-arg
> +    (compile '(begin
> +                (use-modules (rnrs bytevectors)
> +                             (rnrs bytevectors gnu))
> +
> +                ;; The literal bytevector below is immutable.
> +                (let ((bv #vu8(1 2 3)))
> +                  (bytevector-u8-set! (bytevector-slice bv 1) 0 0)))
> +
> +             ;; Disable optimizations to invoke the full-blown
> +             ;; 'scm_bytevector_u8_set_x' procedure, which checks for
> +             ;; the SCM_F_BYTEVECTOR_IMMUTABLE flag.
> +             #:optimization-level 0
> +             #:to 'value))
> +
> +  (pass-if-equal "slice of f32vector"
> +      '(8 2)
> +    (let* ((v #f32(1.1 1.2 3.14))
> +           (s (bytevector-slice v 4)))
> +      (and (= (f32vector-ref s 0)
> +              (f32vector-ref v 1))
> +           (list (bytevector-length s)
> +                 (f32vector-length s))))) > +
> +  (pass-if-equal "unaligned slice of f32vector"
> +      10
> +    (let* ((v #f32(1.1 1.2 3.14))
> +           (s (bytevector-slice v 2)))
> +      (and (not (f32vector? s))
> +           (bytevector-length s)))))

A test is missing for the case where the size is unaligned instead of 
the offset.

>   \f
>   (with-test-prefix "Arrays"
>   

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXT] [PATCH] Add 'bytevector-slice'.
  2023-01-11 19:05       ` [EXT] " lloda
@ 2023-01-12 22:27         ` Ludovic Courtès
  2023-01-13  9:30           ` lloda
  0 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-12 22:27 UTC (permalink / raw)
  To: lloda; +Cc: Thompson, David, Jean Abou Samra, guile-devel@gnu.org, Andy Wingo

lloda <lloda@sarc.name> skribis:

>> On 11 Jan 2023, at 18:37, Thompson, David <dthompson2@worcester.edu> wrote:
>> 
>> On Wed, Jan 11, 2023 at 12:34 PM Ludovic Courtès <ludo@gnu.org> wrote:
>>> 
>>> What could be convenient though is ‘bytevector-copy’ (no bang), which
>>> would combine ‘make-bytevector’ + ‘bytevector-copy!’.
>> 
>> 'bytevector-copy' already exists, or do you mean some different
>> implementation of it?
>> 
>> - Dave
>
> The current bytevector-copy takes a single argument.

Right, what I had in mind is one that would take an offset and size; I
hadn’t realized the name was already taken.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [EXT] [PATCH] Add 'bytevector-slice'.
  2023-01-12 22:27         ` Ludovic Courtès
@ 2023-01-13  9:30           ` lloda
  0 siblings, 0 replies; 17+ messages in thread
From: lloda @ 2023-01-13  9:30 UTC (permalink / raw)
  To: Ludovic Courtès
  Cc: Thompson, David, Jean Abou Samra, guile-devel@gnu.org, Andy Wingo



> On 12 Jan 2023, at 23:27, Ludovic Courtès <ludo@gnu.org> wrote:
> 
> lloda <lloda@sarc.name> skribis:
> 
>>> On 11 Jan 2023, at 18:37, Thompson, David <dthompson2@worcester.edu> wrote:
>>> 
>>> On Wed, Jan 11, 2023 at 12:34 PM Ludovic Courtès <ludo@gnu.org> wrote:
>>>> 
>>>> What could be convenient though is ‘bytevector-copy’ (no bang), which
>>>> would combine ‘make-bytevector’ + ‘bytevector-copy!’.
>>> 
>>> 'bytevector-copy' already exists, or do you mean some different
>>> implementation of it?
>>> 
>>> - Dave
>> 
>> The current bytevector-copy takes a single argument.
> 
> Right, what I had in mind is one that would take an offset and size; I
> hadn’t realized the name was already taken.
> 
> Thanks,
> Ludo’.

Actually (scheme base) from r7rs already defines (bytevector-copy bytevector start end), this is of course r7rs's convention. This is hidden in Guile's manual, which only lists the 1-argument version from r6rs.





^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-12  0:10 ` [PATCH] Add 'bytevector-slice' Maxime Devos
@ 2023-01-13 11:32   ` Ludovic Courtès
  2023-01-13 11:56     ` Maxime Devos
  2023-01-14 15:19   ` Ludovic Courtès
  1 sibling, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-13 11:32 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-devel, Andy Wingo

Hi,

Maxime Devos <maximedevos@telenet.be> skribis:

> The only thing missing for me, is a procedure
> 'bytevector-slice/read-only' and 'bytevector-slice/write-only', then I
> could throw the module implementing the wrapping in Scheme-GNUnet and
> the overhead incurred by wrapping away.

I did consider the read-only part, but it’s not really implementable
currently as SCM_F_BYTEVECTOR_IMMUTABLE flag is even ignored by
instructions, and fixing it is far beyond the scope of this patch:

  https://issues.guix.gnu.org/60779

> On 11-01-2023 16:00, Ludovic Courtès wrote:

[...]

>> +(use-modules (rnrs bytevectors)
>> +             (rnrs bytevectors gnu))
>
> I thought that R6RS reserved (rnrs ...) to the RnRS process, so I
> would think that naming it (rnrs bytevectors gnu) would be
> standards-incompliant, though I cannot find in the specification, so
> maybe it isn't actually reserved.
>
> (SRFI looks a bit looser to me, so I find the (srfi ... gnu)
> acceptable, but (rnrs ... gnu) looks weird to me, I would propose
> (ice-9 bytevector-extensions) or such.).

I’ll stick to gnu.scm because that’s the convention we’ve used for
extensions so far, and “gnu” is arguably “reserved”.

>> +Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,
>
>
> Where does this year 2022 come from?  Does a previous version of this
> patch predate the new year?

I started working on it in December and my ‘write-file-hooks’ updated it.

>> +  c_offset = scm_to_size_t (offset);
>> +
>> +  if (SCM_UNBNDP (size))
>> +    {
>> +      if (c_offset < SCM_BYTEVECTOR_LENGTH (bv))
>> +        c_size = SCM_BYTEVECTOR_LENGTH (bv) - c_offset;
>> +      else
>> +        c_size = 0; > +    }
>> +  else
>> +    c_size = scm_to_size_t (size);
>> +
>> +  if (c_offset + c_size > SCM_BYTEVECTOR_LENGTH (bv))
>> +    scm_out_of_range (FUNC_NAME, offset);
>
>
> If offset=SIZE_MAX-1 and size=1, this will overflow to 0 and hence not
> trigger the error reporting.  This bounds check needs to be rewritten,
> with corresponding additional tests.

OK.

> IIUC, if you use bytevector-slice iteratively, say:
>
> (let loop ((bv some-initial-value)
>            (n large-number))
>   (if (> n 0)
>       (loop (bytevector-slice bv 0 (bytevector-length bv))
>             (- n 1))
>       bv))
>
> you will end up with a bytevector containing a reference to a
> bytevector containing a reference to ... containing a reference to the
> original reference, in a chain of length ≃ large-number.

The ‘parent’ word is here just so the backing memory isn’t reclaimed
while the slice is still alive.

Whether there’s a long chain of ‘parent’ links or not makes no
difference to GC performance nor to memory usage.

> Do you know what this 'parent' field even is for?  Going by some
> comments in 'libguile/bytevectors.c', it is for GC reasons, but going
> by the existence of the 'bytevector->pointer' + 'pointer->bytevector'
> trick which destroys 'parent' information, it seems unnecessary to me.

Like I wrote, it was introduced to keep backing memory alive, generally.
With ‘pointer->bytevector’, we want to make sure the original pointer
remains live as long as the bytevector is live (pointer objects can have
a finalizer, and that finalizer must not run while the bytevector is
live).

> Nowadays a https://.../ instead of a mail address, is more
> conventional and useful, I'd think.  Its even used in old files,
> e.g. libguile/r6rs-ports.c.

Noted.

> A test is missing for the case where the size is unaligned instead of
> the offset.

Noted.

I’ll come up with a second version taking this into account.

Thanks!

Ludo’.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-13 11:32   ` Ludovic Courtès
@ 2023-01-13 11:56     ` Maxime Devos
  2023-01-13 23:48       ` Ludovic Courtès
  0 siblings, 1 reply; 17+ messages in thread
From: Maxime Devos @ 2023-01-13 11:56 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel, Andy Wingo


[-- Attachment #1.1.1: Type: text/plain, Size: 1260 bytes --]

On 13-01-2023 12:32, Ludovic Courtès wrote:
>> IIUC, if you use bytevector-slice iteratively, say:
>>
>> (let loop ((bv some-initial-value)
>>             (n large-number))
>>    (if (> n 0)
>>        (loop (bytevector-slice bv 0 (bytevector-length bv))
>>              (- n 1))
>>        bv))
>>
>> you will end up with a bytevector containing a reference to a
>> bytevector containing a reference to ... containing a reference to the
>> original reference, in a chain of length ≃ large-number.
> 
> The ‘parent’ word is here just so the backing memory isn’t reclaimed
> while the slice is still alive.
> 
> Whether there’s a long chain of ‘parent’ links or not makes no
> difference to GC performance nor to memory usage.

This is false, the opposite holds: memory usage is at least linear in 
the length of the chain, because the objects in the chain are pairwise 
non-eq?: for a chain (O_1, O_2, ..., O_n) that is alive, each object O_i 
in the chain is kept alive (because that's what the 'parent' link is 
for).  Because bytevector-slice does a fresh allocation, there then are 
n different objects.  Hence, memory usage is at least 
'SCM_BYTEVECTOR_HEADER_SIZE * sizeof(scm_t_bits) * n'.

Greetings,
Maxime.

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-13 11:56     ` Maxime Devos
@ 2023-01-13 23:48       ` Ludovic Courtès
  2023-01-14 14:45         ` Maxime Devos
  0 siblings, 1 reply; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-13 23:48 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-devel, Andy Wingo

Maxime Devos <maximedevos@telenet.be> skribis:

> On 13-01-2023 12:32, Ludovic Courtès wrote:
>>> IIUC, if you use bytevector-slice iteratively, say:
>>>
>>> (let loop ((bv some-initial-value)
>>>             (n large-number))
>>>    (if (> n 0)
>>>        (loop (bytevector-slice bv 0 (bytevector-length bv))
>>>              (- n 1))
>>>        bv))
>>>
>>> you will end up with a bytevector containing a reference to a
>>> bytevector containing a reference to ... containing a reference to the
>>> original reference, in a chain of length ≃ large-number.
>> The ‘parent’ word is here just so the backing memory isn’t reclaimed
>> while the slice is still alive.
>> Whether there’s a long chain of ‘parent’ links or not makes no
>> difference to GC performance nor to memory usage.
>
> This is false, the opposite holds: memory usage is at least linear in
> the length of the chain, because the objects in the chain are pairwise
> non-eq?: for a chain (O_1, O_2, ..., O_n) that is alive, each object
> O_i in the chain is kept alive (because that's what the 'parent' link
> is for).  Because bytevector-slice does a fresh allocation, there then
> are n different objects.  Hence, memory usage is at least
> 'SCM_BYTEVECTOR_HEADER_SIZE * sizeof(scm_t_bits) * n'.

Ah yes, that’s right, I misunderstood the comment.

In the example above, where we’re only dealing with slices, we could
“skip” the parent (i.e., have each slice’s parent point to the “root” of
the hierarchy), but I don’t think we can assume this to be the case
generally.

Ludo’.



^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-13 23:48       ` Ludovic Courtès
@ 2023-01-14 14:45         ` Maxime Devos
  0 siblings, 0 replies; 17+ messages in thread
From: Maxime Devos @ 2023-01-14 14:45 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guile-devel, Andy Wingo


[-- Attachment #1.1.1: Type: text/plain, Size: 1444 bytes --]



On 14-01-2023 00:48, Ludovic Courtès wrote:
> Ah yes, that’s right, I misunderstood the comment.
> 
> In the example above, where we’re only dealing with slices, we could
> “skip” the parent (i.e., have each slice’s parent point to the “root” of
> the hierarchy), but I don’t think we can assume this to be the case
> generally.
> 
> Ludo’.

Why not?  I.e., where would things go wrong it the parent is "skipped" 
in other cases?  Something about GC I guess, but I don't follow what 
this "something" would be.

My guess is that you are thinking of the interaction with weak key-value 
maps, e.g. a map

    root-bytevector    -> stuff
    [other entries]

where the user might expect a slice of the root bytevector to keep the 
root itself intact -- which would be the case with your patch, but not 
with my "skip the parent" proposal.

I suppose there might be use cases for such things, in which case I 
wouldn't mind the original behaviour (though I'll have to look over 
Scheme-GNUnet code to verify no long chains are constructed), but if 
this is supposed to be supported, it should be documented. E.g.:

‘The returned slice keeps a reference to @var{bv} and not only to the 
underlying bytes of the bytevector.  Usually, this is of no importance, 
but this information is relevant when using GC data structures such as 
guardians and weak hash tables.’

Greetings,
Maxime.

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 17+ messages in thread

* Re: [PATCH] Add 'bytevector-slice'.
  2023-01-12  0:10 ` [PATCH] Add 'bytevector-slice' Maxime Devos
  2023-01-13 11:32   ` Ludovic Courtès
@ 2023-01-14 15:19   ` Ludovic Courtès
  1 sibling, 0 replies; 17+ messages in thread
From: Ludovic Courtès @ 2023-01-14 15:19 UTC (permalink / raw)
  To: Maxime Devos; +Cc: guile-devel, Andy Wingo

Hello!

I pushed the patch as commit e441c34f1666921f6b15597c1aa3a50596a129d7
with the following changes taking into account your comments:

  • adjusted copyright years;

  • removed comment about ‘SCM_F_BYTEVECTOR_CONTIGUOUS’ since it’s quite
    clear from the discussion in this thread that this flag is
    vestigial;

  • added an overflow check for ‘c_offset + c_size’ and a corresponding
    test (really glad you reported this one!);

  • added another missing test that you mentioned.

I did not update the license header as you suggested, but I think we
should run a script on all the repo to homogenize those but it’s a bit
messy right now.

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 17+ messages in thread

end of thread, other threads:[~2023-01-14 15:19 UTC | newest]

Thread overview: 17+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-11 15:00 [PATCH] Add 'bytevector-slice' Ludovic Courtès
2023-01-11 15:21 ` Thompson, David
2023-01-11 15:29 ` Jean Abou Samra
2023-01-11 17:34   ` Ludovic Courtès
2023-01-11 17:37     ` [EXT] " Thompson, David
2023-01-11 19:05       ` [EXT] " lloda
2023-01-12 22:27         ` Ludovic Courtès
2023-01-13  9:30           ` lloda
2023-01-11 17:39 ` The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’ Ludovic Courtès
2023-01-11 18:51   ` lloda
2023-01-11 19:00     ` lloda
2023-01-12  0:10 ` [PATCH] Add 'bytevector-slice' Maxime Devos
2023-01-13 11:32   ` Ludovic Courtès
2023-01-13 11:56     ` Maxime Devos
2023-01-13 23:48       ` Ludovic Courtès
2023-01-14 14:45         ` Maxime Devos
2023-01-14 15:19   ` Ludovic Courtès

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).