unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Maxime Devos <maximedevos@telenet.be>
To: "Ludovic Courtès" <ludo@gnu.org>, guile-devel@gnu.org
Cc: Andy Wingo <wingo@pobox.com>
Subject: Re: [PATCH] Add 'bytevector-slice'.
Date: Thu, 12 Jan 2023 01:10:57 +0100	[thread overview]
Message-ID: <57a2d974-0278-409c-678a-6daf7672ba81@telenet.be> (raw)
In-Reply-To: <20230111150015.10219-1-ludo@gnu.org>


[-- Attachment #1.1.1: Type: text/plain, Size: 12595 bytes --]

This looks useful to me, especially once the optimiser recognises 
'bytevector-slice'.  (In Scheme-GNUnet, I defined a wrapper around 
bytevectors called 'bytevectors slices' to implement such a thing.)

The only thing missing for me, is a procedure 
'bytevector-slice/read-only' and 'bytevector-slice/write-only', then I 
could throw the module implementing the wrapping in Scheme-GNUnet and 
the overhead incurred by wrapping away.

On 11-01-2023 16:00, Ludovic Courtès wrote:
> +@deffn {Scheme Procedure} bytevector-slice @var{bv} @var{offset} [@var{size}]
> +@deffnx {C Function} scm_bytevector_slice (@var{bv}, @var{offset}, @var{size})
> +Return the slice of @var{bv} starting at @var{offset} and counting
> +@var{size} bytes.  When @var{size} is omitted, the slice covers all
> +of @var{bv} starting from @var{offset}.  The returned slice shares
> +storage with @var{bv}: changes to the slice are visible in @var{bv}
> +and vice-versa.
> +
> +When @var{bv} is actually a SRFI-4 uniform vector, its element
> +type is preserved unless @var{offset} and @var{size} are not aligned
> +on its element type size.
> +@end deffn
I wrote stuff here about 'What if offset or size unaligned?  This is 
undocumented.', and only later noticed this paragraph documenting 
exactly that.

> +
> +Here is an example showing how to use it:
> +
> +@lisp
> +(use-modules (rnrs bytevectors)
> +             (rnrs bytevectors gnu))

I thought that R6RS reserved (rnrs ...) to the RnRS process, so I would 
think that naming it (rnrs bytevectors gnu) would be 
standards-incompliant, though I cannot find in the specification, so 
maybe it isn't actually reserved.

(SRFI looks a bit looser to me, so I find the (srfi ... gnu) acceptable, 
but (rnrs ... gnu) looks weird to me, I would propose (ice-9 
bytevector-extensions) or such.).

> +
> diff --git a/doc/ref/guile.texi b/doc/ref/guile.texi
> index 6a81a0893..8414c3e2d 100644
> --- a/doc/ref/guile.texi
> +++ b/doc/ref/guile.texi
> @@ -13,7 +13,7 @@
>   @copying
>   This manual documents Guile version @value{VERSION}.
>   
> -Copyright (C) 1996-1997, 2000-2005, 2009-2021 Free Software Foundation,
> +Copyright (C) 1996-1997, 2000-2005, 2009-2023 Free Software Foundation,


Where does this year 2022 come from?  Does a previous version of this 
patch predate the new year?


> diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
> index bbc23f449..6b920c88a 100644
> --- a/libguile/bytevectors.c
> +++ b/libguile/bytevectors.c
> @@ -1,4 +1,4 @@
> -/* Copyright 2009-2015,2018-2019
> +/* Copyright 2009-2015,2018-2019,2022-2023
>        Free Software Foundation, Inc.
>   
Ditto.

>      This file is part of Guile.
> @@ -325,6 +325,73 @@ scm_c_take_typed_bytevector (signed char *contents, size_t len,
>     return ret;
>   }
>   
> +SCM_DEFINE (scm_bytevector_slice, "bytevector-slice", 2, 1, 0,
> +            (SCM bv, SCM offset, SCM size),
> +            "Return the slice of @var{bv} starting at @var{offset} and counting\n"
> +            "@var{size} bytes.  When @var{size} is omitted, the slice covers all\n"
> +            "of @var{bv} starting from @var{offset}.  The returned slice shares\n"
> +            "storage with @var{bv}: changes to the slice are visible in @var{bv}\n"
> +            "and vice-versa.\n"
> +            "\n"
> +            "When @var{bv} is actually a SRFI-4 uniform vector, its element\n"
> +            "type is preserved unless @var{offset} and @var{size} are not aligned\n"
> +            "on its element type size.\n")
> +#define FUNC_NAME s_scm_bytevector_slice
> +{
> +  SCM ret;
> +  size_t c_offset, c_size;
> +  scm_t_array_element_type element_type;
> +
> +  SCM_VALIDATE_BYTEVECTOR (1, bv);
> +
> +  /* FIXME: Until 3.0.8 included, the assembler would not set the
> +     SCM_F_BYTEVECTOR_CONTIGUOUS flag on literals.  Thus, ignore it and
> +     assume BV is contiguous (how could it not be anyway?).  */
> +#if 0
> +  if (!SCM_BYTEVECTOR_CONTIGUOUS_P (bv))
> +    scm_wrong_type_arg_msg (FUNC_NAME, 0, bv, "contiguous bytevector");
> +#endif


I don't know what's up with that, I'm leaving this fragment of code for 
other to review.

> +
> +  c_offset = scm_to_size_t (offset);
> +
> +  if (SCM_UNBNDP (size))
> +    {
> +      if (c_offset < SCM_BYTEVECTOR_LENGTH (bv))
> +        c_size = SCM_BYTEVECTOR_LENGTH (bv) - c_offset;
> +      else
> +        c_size = 0; > +    }
> +  else
> +    c_size = scm_to_size_t (size);
> +
> +  if (c_offset + c_size > SCM_BYTEVECTOR_LENGTH (bv))
> +    scm_out_of_range (FUNC_NAME, offset);


If offset=SIZE_MAX-1 and size=1, this will overflow to 0 and hence not 
trigger the error reporting.  This bounds check needs to be rewritten, 
with corresponding additional tests.

> +
> +  /* Preserve the element type of BV, unless we're not slicing on type
> +     boundaries.  */
> +  element_type = SCM_BYTEVECTOR_ELEMENT_TYPE (bv);
> +  if ((c_offset % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0)
> +      || (c_size % SCM_BYTEVECTOR_TYPE_SIZE (bv) != 0))
> +    element_type = SCM_ARRAY_ELEMENT_TYPE_VU8;
> +  else
> +    c_size /= (scm_i_array_element_type_sizes[element_type] / 8);

I was worried about the alignment but it looks like it is handled.

> +  ret = make_bytevector_from_buffer (c_size,
> +                                     SCM_BYTEVECTOR_CONTENTS (bv) + c_offset,
> +                                     element_type);
> +  if (!SCM_MUTABLE_BYTEVECTOR_P (bv))
> +    {
> +      /* Preserve the immutability property.  */
> +      scm_t_bits flags = SCM_BYTEVECTOR_FLAGS (ret);
> +      SCM_SET_BYTEVECTOR_FLAGS (ret, flags | SCM_F_BYTEVECTOR_IMMUTABLE);
> +    }
> +
> +  SCM_BYTEVECTOR_SET_PARENT (ret, bv);


IIUC, if you use bytevector-slice iteratively, say:

(let loop ((bv some-initial-value)
            (n large-number))
   (if (> n 0)
       (loop (bytevector-slice bv 0 (bytevector-length bv))
             (- n 1))
       bv))

you will end up with a bytevector containing a reference to a bytevector 
containing a reference to ... containing a reference to the original 
reference, in a chain of length ≃ large-number.

This sounds rather inefficient to me (memory-wise).  Instead, I propose 
setting the parent of 'ret' to the parent of 'bv'.  (Its not really the 
'parent' anymore then, but AFAIK nothing cares about that, though you 
may want to rename the field then).

   (--)

Do you know what this 'parent' field even is for?  Going by some 
comments in 'libguile/bytevectors.c', it is for GC reasons, but going by 
the existence of the 'bytevector->pointer' + 'pointer->bytevector' trick 
which destroys 'parent' information, it seems unnecessary to me.

It appears to be added in '059a588fedf377ffd32cc1f1fee7ed829b263890', 
but it doesn't say why it is needed.  My guess is that the reason the 
'pointer->bytevector' tricks works at all is:

     * libguile/foreign.c (scm_pointer_to_bytevector): Use the parent field
       instead of registering a weak reference from bytevector to foreign
       pointer.

and hence 'parent' is needed after all, but this stuff isn't 
well-documented in the implementation.

> +
> +  return ret;
> +}
> +#undef FUNC_NAME
> +
>   /* Shrink BV to C_NEW_LEN (which is assumed to be smaller than its current
>      size) and return the new bytevector (possibly different from BV).  */
>   SCM
> diff --git a/libguile/bytevectors.h b/libguile/bytevectors.h
> index 980d6e267..6179bfd86 100644
> --- a/libguile/bytevectors.h
> +++ b/libguile/bytevectors.h
> @@ -1,7 +1,7 @@
>   #ifndef SCM_BYTEVECTORS_H
>   #define SCM_BYTEVECTORS_H
>   
> -/* Copyright 2009,2011,2018
> +/* Copyright 2009,2011,2018,2022
>        Free Software Foundation, Inc.
>   
>      This file is part of Guile.
> @@ -52,6 +52,7 @@ SCM_API uint8_t scm_c_bytevector_ref (SCM, size_t);
>   SCM_API void scm_c_bytevector_set_x (SCM, size_t, uint8_t);
>   
>   SCM_API SCM scm_make_bytevector (SCM, SCM);
> +SCM_API SCM scm_bytevector_slice (SCM, SCM, SCM);
>   SCM_API SCM scm_native_endianness (void);
>   SCM_API SCM scm_bytevector_p (SCM);
>   SCM_API SCM scm_bytevector_length (SCM);
> diff --git a/module/rnrs/bytevectors/gnu.scm b/module/rnrs/bytevectors/gnu.scm
> new file mode 100644
> index 000000000..ce97535a8
> --- /dev/null
> +++ b/module/rnrs/bytevectors/gnu.scm
> @@ -0,0 +1,24 @@
> +;;;; gnu.scm --- GNU extensions to the bytevector API.
> +
> +;;;;	Copyright (C) 2022 Free Software Foundation, Inc.
> +;;;;
> +;;;; This library is free software; you can redistribute it and/or
> +;;;; modify it under the terms of the GNU Lesser General Public
> +;;;; License as published by the Free Software Foundation; either
> +;;;; version 3 of the License, or (at your option) any later version.
> +;;;;
> +;;;; This library is distributed in the hope that it will be useful,
> +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
> +;;;; Lesser General Public License for more details.
> +;;;;
> +;;;; You should have received a copy of the GNU Lesser General Public
> +;;;; License along with this library; if not, write to the Free Software
> +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA

Nowadays a https://.../ instead of a mail address, is more conventional 
and useful, I'd think.  Its even used in old files, e.g. 
libguile/r6rs-ports.c.

> +
> +(define-module (rnrs bytevectors gnu)
> +  #:version (6)
> +  #:export (bytevector-slice))
> +
> +(define bytevector-slice
> +  (@@ (rnrs bytevectors) bytevector-slice))
> diff --git a/test-suite/tests/bytevectors.test b/test-suite/tests/bytevectors.test
> index 732aadb3e..dc4b32370 100644
> --- a/test-suite/tests/bytevectors.test
> +++ b/test-suite/tests/bytevectors.test
> @@ -1,6 +1,6 @@
>   ;;;; bytevectors.test --- R6RS bytevectors. -*- mode: scheme; coding: utf-8; -*-
>   ;;;;
> -;;;; Copyright (C) 2009-2015, 2018, 2021 Free Software Foundation, Inc.
> +;;;; Copyright (C) 2009-2015, 2018, 2021, 2023 Free Software Foundation, Inc.
>   ;;;;
>   ;;;; Ludovic Courtès
>   ;;;;
> @@ -22,6 +22,7 @@
>     #:use-module (test-suite lib)
>     #:use-module (system base compile)
>     #:use-module (rnrs bytevectors)
> +  #:use-module (rnrs bytevectors gnu)
>     #:use-module (srfi srfi-1)
>     #:use-module (srfi srfi-4))
>   
> @@ -666,6 +667,56 @@
>       exception:out-of-range
>       (with-input-from-string "#vu8(0 256)" read)))
>   
> +\f
> +(with-test-prefix "bytevector-slice"
> +
> +  (pass-if-exception "wrong size"
> +      exception:out-of-range
> +    (let ((b #vu8(1 2 3)))
> +      (bytevector-slice b 1 3)))


> +
> +  (pass-if-equal "slices"
> +      (list #vu8(1 2) #vu8(2 3)
> +            #vu8(1)   #vu8(2) #vu8(3))
> +    (let ((b #vu8(1 2 3)))
> +      (list (bytevector-slice b 0 2)
> +            (bytevector-slice b 1)
> +            (bytevector-slice b 0 1)
> +            (bytevector-slice b 1 1)
> +            (bytevector-slice b 2))))
> +
> +  (pass-if-exception "immutable flag preserved"
> +      exception:wrong-type-arg
> +    (compile '(begin
> +                (use-modules (rnrs bytevectors)
> +                             (rnrs bytevectors gnu))
> +
> +                ;; The literal bytevector below is immutable.
> +                (let ((bv #vu8(1 2 3)))
> +                  (bytevector-u8-set! (bytevector-slice bv 1) 0 0)))
> +
> +             ;; Disable optimizations to invoke the full-blown
> +             ;; 'scm_bytevector_u8_set_x' procedure, which checks for
> +             ;; the SCM_F_BYTEVECTOR_IMMUTABLE flag.
> +             #:optimization-level 0
> +             #:to 'value))
> +
> +  (pass-if-equal "slice of f32vector"
> +      '(8 2)
> +    (let* ((v #f32(1.1 1.2 3.14))
> +           (s (bytevector-slice v 4)))
> +      (and (= (f32vector-ref s 0)
> +              (f32vector-ref v 1))
> +           (list (bytevector-length s)
> +                 (f32vector-length s))))) > +
> +  (pass-if-equal "unaligned slice of f32vector"
> +      10
> +    (let* ((v #f32(1.1 1.2 3.14))
> +           (s (bytevector-slice v 2)))
> +      (and (not (f32vector? s))
> +           (bytevector-length s)))))

A test is missing for the case where the size is unaligned instead of 
the offset.

>   \f
>   (with-test-prefix "Arrays"
>   

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

  parent reply	other threads:[~2023-01-12  0:10 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-11 15:00 [PATCH] Add 'bytevector-slice' Ludovic Courtès
2023-01-11 15:21 ` Thompson, David
2023-01-11 15:29 ` Jean Abou Samra
2023-01-11 17:34   ` Ludovic Courtès
2023-01-11 17:37     ` [EXT] " Thompson, David
2023-01-11 19:05       ` [EXT] " lloda
2023-01-12 22:27         ` Ludovic Courtès
2023-01-13  9:30           ` lloda
2023-01-11 17:39 ` The mysterious ‘SCM_F_BYTEVECTOR_CONTIGUOUS’ Ludovic Courtès
2023-01-11 18:51   ` lloda
2023-01-11 19:00     ` lloda
2023-01-12  0:10 ` Maxime Devos [this message]
2023-01-13 11:32   ` [PATCH] Add 'bytevector-slice' Ludovic Courtès
2023-01-13 11:56     ` Maxime Devos
2023-01-13 23:48       ` Ludovic Courtès
2023-01-14 14:45         ` Maxime Devos
2023-01-14 15:19   ` Ludovic Courtès

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=57a2d974-0278-409c-678a-6daf7672ba81@telenet.be \
    --to=maximedevos@telenet.be \
    --cc=guile-devel@gnu.org \
    --cc=ludo@gnu.org \
    --cc=wingo@pobox.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).