unofficial mirror of guile-user@gnu.org 
* bytevector-string-ref
@ 2022-12-18 12:12 Sascha Ziemann
  2022-12-18 13:19 ` bytevector-string-ref tomas
                   ` (2 more replies)
  0 siblings, 3 replies; 12+ messages in thread
From: Sascha Ziemann @ 2022-12-18 12:12 UTC
  To: guile-user

I am wondering if something like bytevector-string-ref is missing in the API.
Or is there any other way to extract a string from a byte vector, without
copying the data twice?




* Re: bytevector-string-ref
  2022-12-18 12:12 bytevector-string-ref Sascha Ziemann
@ 2022-12-18 13:19 ` tomas
  2022-12-18 16:25 ` bytevector-string-ref Taylan Kammer
  2022-12-18 21:17 ` bytevector-string-ref Matt Wette
  2 siblings, 0 replies; 12+ messages in thread
From: tomas @ 2022-12-18 13:19 UTC
  To: guile-user


On Sun, Dec 18, 2022 at 01:12:57PM +0100, Sascha Ziemann wrote:
> I am wondering if something like bytevector-string-ref is missing in the API.
> Or is there any other way to extract a string from a byte vector, without
> copying the data twice?

Alas, it's more complicated than this. There's a whole colourful world
of encodings in between :-)

I think "6.6.5.13 Representing Strings as Bytes" in the Guile manual [1]
is the relevant part, here on the Intrawebs.

Cheers

[1] https://www.gnu.org/software/guile/manual/html_node/Representing-Strings-as-Bytes.html
-- 
t



* Re: bytevector-string-ref
  2022-12-18 12:12 bytevector-string-ref Sascha Ziemann
  2022-12-18 13:19 ` bytevector-string-ref tomas
@ 2022-12-18 16:25 ` Taylan Kammer
  2022-12-18 16:35   ` bytevector-string-ref Vijay Marupudi
  2022-12-18 16:38   ` bytevector-string-ref tomas
  2022-12-18 21:17 ` bytevector-string-ref Matt Wette
  2 siblings, 2 replies; 12+ messages in thread
From: Taylan Kammer @ 2022-12-18 16:25 UTC
  To: Sascha Ziemann, guile-user

On 18.12.2022 13:12, Sascha Ziemann wrote:
> I am wondering if something like bytevector-string-ref is missing in the API.
> Or is there any other way to extract a string from a byte vector, without
> copying the data twice?
> 

I don't think Guile currently has any way of giving you a string object that's
backed by the contents of a bytevector, instead of a privately held copy of those
bytevector contents.

AFAIK, there is only the utfX->string class of procedures, which give you a "newly
allocated" string from the bytevector's contents:

https://www.gnu.org/software/guile/manual/html_node/Bytevectors-as-Strings.html

That should only lead to the contents being copied once, however.  I'm not sure
why you asked "without copying the data twice."
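
The single copy can be observed at the REPL. The following is a minimal sketch that uses only procedures from (rnrs bytevectors); the variable names are illustrative. It mutates the bytevector after the conversion and checks that the string is unaffected:

```scheme
(use-modules (rnrs bytevectors))

(define bv (string->utf8 "hello"))   ; fresh bytevector: 104 101 108 108 111
(define s  (utf8->string bv))        ; newly allocated string, decoded once

;; Overwrite the first byte of the bytevector with #\j (106).
(bytevector-u8-set! bv 0 (char->integer #\j))

s                   ; => "hello"  (the string holds its own copy)
(utf8->string bv)   ; => "jello"  (a second conversion sees the change)
```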

-- 
Taylan





* Re: bytevector-string-ref
  2022-12-18 16:25 ` bytevector-string-ref Taylan Kammer
@ 2022-12-18 16:35   ` Vijay Marupudi
  2022-12-18 16:38   ` bytevector-string-ref tomas
  1 sibling, 0 replies; 12+ messages in thread
From: Vijay Marupudi @ 2022-12-18 16:35 UTC
  To: Taylan Kammer, Sascha Ziemann, guile-user


The 2 copies occur if you want a portion of the bytevector to be
converted to a string. The current API requires you to copy to a
bytevector that's the length of the slice, and then convert that copy to
a string.
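
As a rough sketch of that two-copy pattern, assuming only the stock (rnrs bytevectors) procedures (the helper name utf8-slice->string is hypothetical, not part of Guile):

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: decode bytes START (inclusive) to END (exclusive)
;; of BV as UTF-8.
(define (utf8-slice->string bv start end)
  (let ((tmp (make-bytevector (- end start))))
    ;; Copy 1: move the slice into a temporary bytevector.
    (bytevector-copy! bv start tmp 0 (- end start))
    ;; Copy 2: utf8->string allocates a fresh string from TMP.
    (utf8->string tmp)))

(utf8-slice->string (string->utf8 "Hello, World!") 7 12)   ; => "World"
```

The temporary bytevector is garbage right after the call; an optional start/end range on the conversion itself would avoid allocating it.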

I've sent patches to guile-devel to address this issue (by extending
utf{8,16,32}->string) in the past, but those patches have not gotten any
attention yet.

Attaching them again here to (hopefully) revive them.


[-- Attachment #2: 0001-Allow-utf8-string-utf16-string-utf32-string-to-take-.patch --]
[-- Type: text/x-patch, Size: 12490 bytes --]

From c6be127b4818d43a0244592c18a52de113d3ff08 Mon Sep 17 00:00:00 2001
From: Vijay Marupudi <vijay@vijaymarupudi.com>
Date: Thu, 20 Jan 2022 22:19:25 -0500
Subject: [PATCH 1/2] Allow utf8->string, utf16->string, utf32->string to take
 ranges

Added the following new functions, that behave like substring, but for
bytevector to string conversion.

scm_utf8_range_to_string (SCM, SCM, SCM);
scm_utf16_range_to_string (SCM, SCM, SCM, SCM);
scm_utf32_range_to_string (SCM, SCM, SCM, SCM);

* doc/ref/api-data.texi: Updated documentation to reflect new function
  and range constraints
* libguile/bytevectors.c: Added new function.
* libguile/bytevectors.h: Added new function declaration.
* test-suite/tests/bytevectors.test: Added tests for exceptions and
  behavior for edge cases
---
 doc/ref/api-data.texi             |  15 +++-
 libguile/bytevectors.c            | 144 +++++++++++++++++++++++-------
 libguile/bytevectors.h            |   3 +
 test-suite/tests/bytevectors.test |  37 ++++++++
 4 files changed, 164 insertions(+), 35 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index b6c2c4d61..44b64454f 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -7139,16 +7139,25 @@ UTF-32 (aka. UCS-4) encoding of @var{str}.  For UTF-16 and UTF-32,
 it defaults to big endian.
 @end deffn
 
-@deffn {Scheme Procedure} utf8->string utf
-@deffnx {Scheme Procedure} utf16->string utf [endianness]
-@deffnx {Scheme Procedure} utf32->string utf [endianness]
+@deffn {Scheme Procedure} utf8->string utf [start [end]]
+@deffnx {Scheme Procedure} utf16->string utf [endianness [start [end]]]
+@deffnx {Scheme Procedure} utf32->string utf [endianness [start [end]]]
 @deffnx {C Function} scm_utf8_to_string (utf)
+@deffnx {C Function} scm_utf8_range_to_string (utf, start, end)
 @deffnx {C Function} scm_utf16_to_string (utf, endianness)
+@deffnx {C Function} scm_utf16_range_to_string (utf, endianness, start, end)
 @deffnx {C Function} scm_utf32_to_string (utf, endianness)
+@deffnx {C Function} scm_utf32_range_to_string (utf, endianness, start, end)
+
 Return a newly allocated string that contains from the UTF-8-, UTF-16-,
 or UTF-32-decoded contents of bytevector @var{utf}.  For UTF-16 and UTF-32,
 @var{endianness} should be the symbol @code{big} or @code{little}; when omitted,
 it defaults to big endian.
+
+@var{start} and @var{end}, when provided, must be exact integers
+satisfying:
+
+0 <= @var{start} <= @var{end} <= @code{(bytevector-length @var{utf})}.
 @end deffn
 
 @node Bytevectors as Arrays
diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index f42fbb427..12d299042 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -2061,25 +2061,46 @@ SCM_DEFINE (scm_string_to_utf32, "string->utf32",
 
 /* Produce the body of a function that converts a UTF-encoded bytevector to a
    string.  */
-#define UTF_TO_STRING(_utf_width)					\
+#define UTF_TO_STRING(_utf_width, utf, endianness, start, end)          \
   SCM str = SCM_BOOL_F;							\
   int err;								\
   char *c_str = NULL;                                                   \
   char c_utf_name[MAX_UTF_ENCODING_NAME_LEN];				\
   char *c_utf;                                                          \
-  size_t c_strlen = 0, c_utf_len = 0;					\
+  size_t c_strlen = 0, c_utf_len, c_start, c_end;                       \
 									\
-  SCM_VALIDATE_BYTEVECTOR (1, utf);					\
-  if (scm_is_eq (endianness, SCM_UNDEFINED))                            \
-    endianness = sym_big;						\
+  SCM_VALIDATE_BYTEVECTOR (1, (utf));					\
+  if (scm_is_eq ((endianness), SCM_UNDEFINED))                          \
+    (endianness) = sym_big;						\
   else									\
-    SCM_VALIDATE_SYMBOL (2, endianness);				\
+    SCM_VALIDATE_SYMBOL (2, (endianness));				\
 									\
-  c_utf_len = SCM_BYTEVECTOR_LENGTH (utf);				\
-  c_utf = (char *) SCM_BYTEVECTOR_CONTENTS (utf);			\
-  utf_encoding_name (c_utf_name, (_utf_width), endianness);		\
+  c_utf_len = SCM_BYTEVECTOR_LENGTH ((utf));				\
+  c_utf = (char *) SCM_BYTEVECTOR_CONTENTS ((utf));			\
+  utf_encoding_name (c_utf_name, (_utf_width), (endianness));		\
+                                                                        \
+  if (!scm_is_eq ((start), SCM_UNDEFINED))                              \
+    {                                                                   \
+      c_start = scm_to_unsigned_integer ((start), 0, c_utf_len);        \
+    }                                                                   \
+  else                                                                  \
+    {                                                                   \
+      c_start = 0;                                                      \
+    }                                                                   \
+                                                                        \
+  if (!scm_is_eq ((end), SCM_UNDEFINED))                                \
+    {                                                                   \
+      c_end = scm_to_unsigned_integer ((end), 0, c_utf_len);            \
+    }                                                                   \
+  else                                                                  \
+    {                                                                   \
+      c_end = c_utf_len;                                                \
+    }                                                                   \
+                                                                        \
+  validate_bytevector_range(FUNC_NAME, c_utf_len, c_start, c_end);      \
+                                                                        \
 									\
-  err = mem_iconveh (c_utf, c_utf_len,					\
+  err = mem_iconveh (c_utf + c_start, c_end - c_start,                  \
 		     c_utf_name, "UTF-8",				\
 		     iconveh_question_mark, NULL,			\
 		     &c_str, &c_strlen);				\
@@ -2094,46 +2115,105 @@ SCM_DEFINE (scm_string_to_utf32, "string->utf32",
   return (str);
 
 
-SCM_DEFINE (scm_utf8_to_string, "utf8->string",
-	    1, 0, 0,
-	    (SCM utf),
-	    "Return a newly allocate string that contains from the UTF-8-"
-	    "encoded contents of bytevector @var{utf}.")
-#define FUNC_NAME s_scm_utf8_to_string
+static inline void
+validate_bytevector_range(const char* function_name, size_t len, size_t start, size_t end) {
+  if (SCM_UNLIKELY (start > len))
+    {
+      scm_out_of_range (function_name, scm_from_size_t(start));
+    }
+  if (SCM_UNLIKELY (end > len))
+    {
+      scm_out_of_range (function_name, scm_from_size_t(end));
+    }
+  if (SCM_UNLIKELY(end < start))
+    {
+      scm_out_of_range (function_name, scm_from_size_t(end));
+    }
+}
+
+
+SCM_DEFINE (scm_utf8_range_to_string, "utf8->string",
+            1, 2, 0,
+            (SCM utf, SCM start, SCM end),
+            "Return a newly allocated string that contains from the UTF-8-"
+            "encoded contents of bytevector @var{utf}.")
+#define FUNC_NAME s_scm_utf8_range_to_string
 {
   SCM str;
   const char *c_utf;
-  size_t c_utf_len = 0;
+  size_t c_start;
+  size_t c_end;
+  size_t c_len;
 
   SCM_VALIDATE_BYTEVECTOR (1, utf);
-
-  c_utf_len = SCM_BYTEVECTOR_LENGTH (utf);
   c_utf = (char *) SCM_BYTEVECTOR_CONTENTS (utf);
-  str = scm_from_utf8_stringn (c_utf, c_utf_len);
+  c_len = SCM_BYTEVECTOR_LENGTH(utf);
+
+  if (!scm_is_eq (start, SCM_UNDEFINED))
+    {
+      c_start = scm_to_unsigned_integer (start, 0, c_len);
+    }
+  else
+    {
+      c_start = 0;
+    }
 
+  if (!scm_is_eq (end, SCM_UNDEFINED))
+    {
+      c_end = scm_to_unsigned_integer (end, 0, c_len);
+    }
+  else
+    {
+      c_end = c_len;
+    }
+
+  validate_bytevector_range(FUNC_NAME, c_len, c_start, c_end);
+  str = scm_from_utf8_stringn (c_utf + c_start, c_end - c_start);
   return (str);
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_utf16_to_string, "utf16->string",
-	    1, 1, 0,
-	    (SCM utf, SCM endianness),
-	    "Return a newly allocate string that contains from the UTF-16-"
-	    "encoded contents of bytevector @var{utf}.")
+SCM
+scm_utf8_to_string(SCM utf)
+#define FUNC_NAME s_scm_utf8_to_string
+{
+  return scm_utf8_range_to_string(utf, SCM_UNDEFINED, SCM_UNDEFINED);
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_utf16_range_to_string, "utf16->string",
+            1, 3, 0,
+            (SCM utf, SCM endianness, SCM start, SCM end),
+            "Return a newly allocated string that contains from the UTF-16-"
+            "encoded contents of bytevector @var{utf}.")
+#define FUNC_NAME s_scm_utf16_range_to_string
+{
+  UTF_TO_STRING(16, utf, endianness, start, end);
+}
+#undef FUNC_NAME
+
+SCM scm_utf16_to_string (SCM utf, SCM endianness)
 #define FUNC_NAME s_scm_utf16_to_string
 {
-  UTF_TO_STRING (16);
+  return scm_utf16_range_to_string(utf, endianness, SCM_UNDEFINED, SCM_UNDEFINED);
 }
 #undef FUNC_NAME
 
-SCM_DEFINE (scm_utf32_to_string, "utf32->string",
-	    1, 1, 0,
-	    (SCM utf, SCM endianness),
-	    "Return a newly allocate string that contains from the UTF-32-"
-	    "encoded contents of bytevector @var{utf}.")
+SCM_DEFINE (scm_utf32_range_to_string, "utf32->string",
+            1, 3, 0,
+            (SCM utf, SCM endianness, SCM start, SCM end),
+            "Return a newly allocated string that contains from the UTF-32-"
+            "encoded contents of bytevector @var{utf}.")
+#define FUNC_NAME s_scm_utf32_range_to_string
+{
+  UTF_TO_STRING(32, utf, endianness, start, end);
+}
+#undef FUNC_NAME
+
+SCM scm_utf32_to_string (SCM utf, SCM endianness)
 #define FUNC_NAME s_scm_utf32_to_string
 {
-  UTF_TO_STRING (32);
+  return scm_utf32_range_to_string(utf, endianness, SCM_UNDEFINED, SCM_UNDEFINED);
 }
 #undef FUNC_NAME
 
diff --git a/libguile/bytevectors.h b/libguile/bytevectors.h
index 980d6e267..63d8e3119 100644
--- a/libguile/bytevectors.h
+++ b/libguile/bytevectors.h
@@ -113,8 +113,11 @@ SCM_API SCM scm_string_to_utf8 (SCM);
 SCM_API SCM scm_string_to_utf16 (SCM, SCM);
 SCM_API SCM scm_string_to_utf32 (SCM, SCM);
 SCM_API SCM scm_utf8_to_string (SCM);
+SCM_API SCM scm_utf8_range_to_string (SCM, SCM, SCM);
 SCM_API SCM scm_utf16_to_string (SCM, SCM);
+SCM_API SCM scm_utf16_range_to_string (SCM, SCM, SCM, SCM);
 SCM_API SCM scm_utf32_to_string (SCM, SCM);
+SCM_API SCM scm_utf32_range_to_string (SCM, SCM, SCM, SCM);
 
 
 \f
diff --git a/test-suite/tests/bytevectors.test b/test-suite/tests/bytevectors.test
index 732aadb3e..f8c6a8df1 100644
--- a/test-suite/tests/bytevectors.test
+++ b/test-suite/tests/bytevectors.test
@@ -558,6 +558,43 @@
       exception:decoding-error
     (utf8->string #vu8(104 105 239 191 50)))
 
+  (pass-if "utf8->string range: start provided"
+    (let* ((utf8 (string->utf8 "gnu guile"))
+           (str (utf8->string utf8 4)))
+      (string=? str "guile")))
+
+  (pass-if "utf8->string range: start and end provided"
+    (let* ((utf8 (string->utf8 "gnu guile"))
+           (str (utf8->string utf8 4 7)))
+      (string=? str "gui")))
+
+  (pass-if "utf8->string range: start = end = 0"
+    (let* ((utf8 (string->utf8 "gnu guile"))
+           (str (utf8->string utf8 0 0)))
+      (string=? str "")))
+
+  (pass-if-exception "utf8->string range: start > len"
+      exception:out-of-range
+    (let* ((utf8 (string->utf8 "four")))
+      ;; 4 as start is expected to return an empty string, in congruence
+      ;; with `substring'.
+      (utf8->string utf8 5)))
+
+  (pass-if-exception "utf8->string range: end < start"
+      exception:out-of-range
+      (let* ((utf8 (string->utf8 "gnu guile")))
+        (utf8->string utf8 1 0)))
+
+  (pass-if "utf8->string range: multibyte characters"
+    (string=? (utf8->string #vu8(195 169 67) 0 2) "é"))
+
+  (pass-if-exception "utf8->string range: decoding error for invalid range"
+      exception:decoding-error
+    (utf8->string #vu8(195 169) 0 1))
+
+  (pass-if "utf8->string range: null byte non-termination"
+    (string=? (utf8->string #vu8(0 32 196) 0 2) "\x00 "))
+
   (pass-if "utf16->string"
     (let* ((utf16  (uint-list->bytevector (map char->integer
                                                (string->list "hello, world"))
-- 
2.34.1




~ Vijay


* Re: bytevector-string-ref
  2022-12-18 16:25 ` bytevector-string-ref Taylan Kammer
  2022-12-18 16:35   ` bytevector-string-ref Vijay Marupudi
@ 2022-12-18 16:38   ` tomas
  1 sibling, 0 replies; 12+ messages in thread
From: tomas @ 2022-12-18 16:38 UTC
  To: guile-user


On Sun, Dec 18, 2022 at 05:25:16PM +0100, Taylan Kammer wrote:
> On 18.12.2022 13:12, Sascha Ziemann wrote:
> > I am wondering if something like bytevector-string-ref is missing in the API.
> > Or is there any other way to extract a string from a byte vector, without
> > copying the data twice?
> > 
> 
> I don't think Guile currently has any way of giving you a string object that's
> backed by the contents of a bytevector, instead of a privately held copy of those
> bytevector contents.
> 
> AFAIK, there is only the utfX->string class of procedures, which give you a "newly
> allocated" string from the bytevector's contents:
> 
> https://www.gnu.org/software/guile/manual/html_node/Bytevectors-as-Strings.html
> 
> That should only lead to the contents being copied once, however.  I'm not sure
> why you asked "without copying the data twice."

I think you have to copy anyway -- unless your string's encoding is
strictly the same as Guile's internal encoding. Currently (I don't
know whether this is part of the official interface) it is dual:
either one byte/char (for texts encodable in iso-8859-1) or four
bytes/char for all the others, representing Unicode code points (plus
some extra bits). I can't imagine it to be fun interfacing to that :-)

Cheers
-- 
t



* Re: bytevector-string-ref
  2022-12-18 12:12 bytevector-string-ref Sascha Ziemann
  2022-12-18 13:19 ` bytevector-string-ref tomas
  2022-12-18 16:25 ` bytevector-string-ref Taylan Kammer
@ 2022-12-18 21:17 ` Matt Wette
  2022-12-18 22:45   ` bytevector-string-ref Sascha Ziemann
  2 siblings, 1 reply; 12+ messages in thread
From: Matt Wette @ 2022-12-18 21:17 UTC
  To: guile-user

On 12/18/22 4:12 AM, Sascha Ziemann wrote:
> I am wondering if something like bytevector-string-ref is missing in the API.
> Or is there any other way to extract a string from a byte vector, without
> copying the data twice?
>

I sympathize with the struggle here.   I wonder if rlb is aware.
I believe he is working on an update to string handling in guile.
I posted a message to #guile@libera.chat asking if his work
includes moving to/from bytevectors.

Matt





* Re: bytevector-string-ref
  2022-12-18 21:17 ` bytevector-string-ref Matt Wette
@ 2022-12-18 22:45   ` Sascha Ziemann
  2022-12-19  5:01     ` bytevector-string-ref tomas
  0 siblings, 1 reply; 12+ messages in thread
From: Sascha Ziemann @ 2022-12-18 22:45 UTC
  To: guile-user

Maybe having a bytevector-slice-ref with shared memory would be more flexible.
The partial usage of a bytevector as a string is just one use case.
There may be others.




* Re: bytevector-string-ref
  2022-12-18 22:45   ` bytevector-string-ref Sascha Ziemann
@ 2022-12-19  5:01     ` tomas
  2022-12-21 15:49       ` bytevector-string-ref Sascha Ziemann
  0 siblings, 1 reply; 12+ messages in thread
From: tomas @ 2022-12-19  5:01 UTC
  To: guile-user


On Sun, Dec 18, 2022 at 11:45:49PM +0100, Sascha Ziemann wrote:
> Maybe having a bytevector-slice-ref with shared memory would be more flexible.
> The partial usage of a bytevector as a string is just one use case.
> There may be others.

Is that related to "shared arrays"? This seems to be even more general (the
affine part being the offset of the slice).

Cheers
-- 
t



* Re: bytevector-string-ref
  2022-12-19  5:01     ` bytevector-string-ref tomas
@ 2022-12-21 15:49       ` Sascha Ziemann
  2022-12-22  5:34         ` bytevector-string-ref tomas
  0 siblings, 1 reply; 12+ messages in thread
From: Sascha Ziemann @ 2022-12-21 15:49 UTC
  To: tomas; +Cc: guile-user

> Is that related to "shared arrays"? This seems to be even more general (the
> affine part being the offset of the slice).

Thanks for the hint. I was not aware of them.

But I am struggling to use them. This does not work:

(define str "Hello, World!")
(define bv (string->utf8 str))
(define sa (make-shared-array bv (lambda (i) (list (+ i 7))) '(0 4)))
(utf8->string sa)

Anything else I can try?




* Re: bytevector-string-ref
  2022-12-21 15:49       ` bytevector-string-ref Sascha Ziemann
@ 2022-12-22  5:34         ` tomas
  2022-12-22  8:58           ` bytevector-string-ref Sascha Ziemann
  0 siblings, 1 reply; 12+ messages in thread
From: tomas @ 2022-12-22  5:34 UTC
  To: Sascha Ziemann; +Cc: guile-user


On Wed, Dec 21, 2022 at 04:49:09PM +0100, Sascha Ziemann wrote:
> > Is that related to "shared arrays"? This seems to be even more general (the
> > affine part being the offset of the slice).
> 
> Thanks for the hint. I was not aware of them.
> 
> But I am struggling to use them. This does not work:
> 
> (define str "Hello, World!")
> (define bv (string->utf8 str))
> (define sa (make-shared-array bv (lambda (i) (list (+ i 7))) '(0 4)))

I think this should be

  (define sa (make-shared-array bv (lambda (i) (list (+ i 7))) 4))

since you have a one-dimensional vector. But perhaps I mis-read
your intentions.

Cheers
-- 
t



* Re: bytevector-string-ref
  2022-12-22  5:34         ` bytevector-string-ref tomas
@ 2022-12-22  8:58           ` Sascha Ziemann
  2022-12-22 12:24             ` bytevector-string-ref lloda
  0 siblings, 1 reply; 12+ messages in thread
From: Sascha Ziemann @ 2022-12-22  8:58 UTC
  To: guile-user

> > (define str "Hello, World!")
> > (define bv (string->utf8 str))
> > (define sa (make-shared-array bv (lambda (i) (list (+ i 7))) '(0 4)))
>
> I think this should be
>
>   (define sa (make-shared-array bv (lambda (i) (list (+ i 7))) 4))

This seems to be the same (equal?):
(make-shared-array bv (lambda (i) (list (+ i 7))) '(0 4))
(make-shared-array bv (lambda (i) (list (+ i 7))) 5)

And it does not work either:
In procedure utf8->string: Wrong type argument in position 1
(expecting bytevector): #1vu8(87 111 114 108)

#1vu8() and #vu8() seem to be different types. Btw, what is the difference?




* Re: bytevector-string-ref
  2022-12-22  8:58           ` bytevector-string-ref Sascha Ziemann
@ 2022-12-22 12:24             ` lloda
  0 siblings, 0 replies; 12+ messages in thread
From: lloda @ 2022-12-22 12:24 UTC
  To: Sascha Ziemann; +Cc: guile-user



> On 22 Dec 2022, at 09:58, Sascha Ziemann <ceving@gmail.com> wrote:
> 
>>> (define str "Hello, World!")
>>> (define bv (string->utf8 str))
>>> (define sa (make-shared-array bv (lambda (i) (list (+ i 7))) '(0 4)))
>> 
>> I think this should be
>> 
>>  (define sa (make-shared-array bv (lambda (i) (list (+ i 7))) 4))
> 
> This seems to be the same (equal?):
> (make-shared-array bv (lambda (i) (list (+ i 7))) '(0 4))
> (make-shared-array bv (lambda (i) (list (+ i 7))) 5)
> 
> And it does not work either:
> In procedure utf8->string: Wrong type argument in position 1
> (expecting bytevector): #1vu8(87 111 114 108)
> 
> #1vu8() and #vu8() seem to be different types. Btw, what is the difference?

#vu8() are containers. These objects consist of a piece of storage and its size.

#1vu8() are views (specifically, rank-1 views). These are more complicated objects that contain a pointer to a container plus size, start location, and stride. Internally, views are their own type ('arrays') and not bytevectors.

In Guile, functions that have 'array' in their name take either containers or views of any type, so array-ref for example will accept #1vu8() or #vu8() or #f32() or even #(). However, this generality makes them slower than type-specific functions such as bytevector-u8-ref or f32vector-ref and so on.

If you make a shared array of a bytevector that isn't the bytevector itself (start at 0, end at end, stride 1) you'll get a view and not a bytevector, because the bytevector object cannot represent such a shared array. Once you use make-shared-array, you're limited to the array-xxx functions.
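
A small sketch of what that means for the example from earlier in the thread. It assumes only stock Guile procedures; the helper name u8-view->string is hypothetical:

```scheme
(use-modules (rnrs bytevectors))

(define bv (string->utf8 "Hello, World!"))

;; Rank-1 view of 5 elements starting at offset 7 of BV (nothing is copied).
(define sa (make-shared-array bv (lambda (i) (list (+ i 7))) 5))

(bytevector? sa)   ; => #f, which is why utf8->string rejects it
(array-ref sa 0)   ; => 87, the byte for #\W

;; Hypothetical helper: copy the view's elements into a real bytevector,
;; then decode that bytevector as UTF-8.
(define (u8-view->string view)
  (utf8->string (u8-list->bytevector (array->list view))))

(u8-view->string sa)   ; => "World"
```

This round trip copies the data more than once (list, bytevector, string), so it only shows how to get from a view back to a string, not how to do it cheaply.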

It used to be the case that some typed vector functions worked on views as well. For example, we used to have vector-ref which worked on #1() and #(), and simple-vector-ref that worked only on #(). It was a bit of a mess. That was never the case of the bytevector functions, which were added later and only ever worked on bytevector objects.

I suppose it's possible to extend utf8 to accept a bytevector view (internally, an array of rank 1 and type vu8) since utf8 isn't itself a type (in the way you'd expect something called bytevector->xxx should be constrained to arguments that satisfy bytevector?). It also helps that there aren't a lot of functions utf8->xxx that you'd need to fix.

I hope this makes sense.

  Daniel




