[PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.

unofficial mirror of guile-devel@gnu.org 

* [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
@ 2017-06-18 23:28 Freja Nordsiek
  2017-06-21  1:13 ` Mark H Weaver
  2017-06-21  2:11 ` Mark H Weaver
  0 siblings, 2 replies; 8+ messages in thread
From: Freja Nordsiek @ 2017-06-18 23:28 UTC (permalink / raw)
  To: guile-devel


Was fiddling around with using Chibi's R7RS test-suite in Guile and found a
major R7RS syntax feature currently missing from Guile. The feature is R7RS
bytevector notation, which uses the #u8 prefix like SRFI-4 unsigned 8-bit
integer vectors instead of the R6RS prefix #vu8.

I wrote a patch for the r7rs-wip branch (attached) to add and implement
reader and print options to enable the use of R7RS bytevector syntax, as
well as add unit tests for the options and update the documentation. I made
a boolean option for both named 'r7rs-bytevectors to enable the R7RS syntax
(default is #f). The syntax options are enabled with

    (read-enable 'r7rs-bytevectors)
    (print-enable 'r7rs-bytevectors)

Turning this syntax option on does mean that SRFI-4 unsigned 8-bit integer
vectors cannot be created with the #u8 prefix and that they cannot be
distinguished from bytevectors when printed with write or display. The
patch adds warnings about this in the Bytevectors and SRFI-4 sections of
the documentation.
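
The printing ambiguity described above can be sketched in a few lines of
standalone C. This is only an illustration, not the patch's code: the enum
and the function names are hypothetical stand-ins for Guile's element types
and printer, and only three element types are modeled.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins for Guile's array element types. */
enum elem_type { ELEM_VU8, ELEM_U8, ELEM_U16 };

/* Tag the printer would use for each type when the option is off. */
static const char *
default_tag (enum elem_type t)
{
  switch (t)
    {
    case ELEM_VU8: return "vu8";
    case ELEM_U8:  return "u8";
    default:       return "u16";
    }
}

/* Write the printed prefix, e.g. "#u8(", into BUF.  With the option on,
   a VU8 bytevector borrows the "u8" tag; other types are untouched.  */
static void
print_prefix (char *buf, size_t size, enum elem_type t, bool r7rs_bytevectors)
{
  assert (buf != NULL && size > 0);
  const char *tag = (r7rs_bytevectors && t == ELEM_VU8)
                    ? "u8" : default_tag (t);
  snprintf (buf, size, "#%s(", tag);
}
```

With the option on, a VU8 bytevector and an SRFI-4 U8 vector produce the same
prefix, which is the loss of information the documentation warning refers to.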


Freja Nordsiek

[-- Attachment #2: 0001-Added-read-and-print-options-for-R7RS-bytevector-not.patch --]
[-- Type: text/x-patch, Size: 13410 bytes --]

From 88126627a01185c7a88a01269ef46f00c1466106 Mon Sep 17 00:00:00 2001
From: Freja Nordsiek <fnordsie@gmail.com>
Date: Mon, 19 Jun 2017 01:00:01 +0200
Subject: [PATCH] Added read and print options for R7RS bytevector notation.

* libguile/private-options.h: Added read and print options.
* libguile/read.c: Added and implemented R7RS bytevector reading option.
* libguile/print.c: Added R7RS bytevector print option.
* libguile/bytevectors.c (scm_i_print_bytevector): Implemented option to print
  bytevectors using R7RS notation.
* test-suite/tests/reader.test: Added tests for the read option.
* test-suite/tests/print.test: Added tests for the print option.
* doc/ref/api-evaluation.texi (Scheme Read and Scheme Write): Updated to
  reflect added read and print options.
* doc/ref/api-data.texi (Bytevectors): Updated to reflect added read and print
  options for bytevectors.
* doc/ref/srfi-modules.texi (SRFI-4): Added warning about the added read and
  print options conflicting with unsigned 8-bit integer vectors.
---
 doc/ref/api-data.texi        | 11 +++++++++++
 doc/ref/api-evaluation.texi  |  3 +++
 doc/ref/srfi-modules.texi    |  7 +++++++
 libguile/bytevectors.c       |  9 ++++++++-
 libguile/print.c             |  2 ++
 libguile/private-options.h   |  6 ++++--
 libguile/read.c              | 29 ++++++++++++++++++++++++-----
 test-suite/tests/print.test  | 17 ++++++++++++++++-
 test-suite/tests/reader.test | 13 +++++++++++++
 9 files changed, 88 insertions(+), 9 deletions(-)

diff --git a/doc/ref/api-data.texi b/doc/ref/api-data.texi
index acdf9ca..17f4c07 100644
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@@ -4572,6 +4572,17 @@ they do not need to be quoted:
 @result{} #vu8(1 53 204)
 @end lisp
 
+R7RS uses a different syntax for bytevectors, which uses the prefix @code{#u8}
+to make it more in line with SRFI-4 (@pxref{SRFI-4}).  This syntax can be
+enabled for reading and writing by enabling the @code{'r7rs-bytevectors} read
+option with @code{(read-enable 'r7rs-bytevectors)} (@pxref{Scheme Read})
+and print option with @code{(print-enable 'r7rs-bytevectors)}
+(@pxref{Scheme Write}) respectively.  Note that enabling these read and
+print options will mean that SRFI-4 unsigned 8-bit integer vectors (which are
+a separate type in Guile) cannot be created using the @code{#u8} prefix and
+that bytevectors cannot be distinguished from SRFI-4 unsigned 8-bit integer
+vectors by their printed forms.
+
 Bytevectors can be used with the binary input/output primitives of the
 R6RS (@pxref{R6RS I/O Ports}).
 
diff --git a/doc/ref/api-evaluation.texi b/doc/ref/api-evaluation.texi
index 565ccdb..a63a3dd 100644
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -343,6 +343,7 @@ hungry-eol-escapes no   In strings, consume leading whitespace after an
                         escaped end-of-line.
 curly-infix       no    Support SRFI-105 curly infix expressions.
 r7rs-symbols      no    Support R7RS |...| symbol notation.
+r7rs-bytevectors  no    Support R7RS #u8(...) bytevector notation in addition to R6RS #vu8(...).
 @end smalllisp
 
 Guile allows read options to be set on a per-port basis in one of two
@@ -465,6 +466,8 @@ escape-newlines           yes     Render newlines as \n when printing
                                   using `write'. 
 r7rs-symbols              no      Escape symbols using R7RS |...| symbol
                                   notation.
+r7rs-bytevectors          no      Print bytevectors using R7RS #u8(...) notation
+                                  instead of R6RS #vu8(...) notation.
 @end smalllisp
 
 These options may be modified with the print-set! syntax.
diff --git a/doc/ref/srfi-modules.texi b/doc/ref/srfi-modules.texi
index b1776c6..2532ec6 100644
--- a/doc/ref/srfi-modules.texi
+++ b/doc/ref/srfi-modules.texi
@@ -1438,6 +1438,13 @@ for a three element list @code{(1 #f 3)}, but for Guile @code{(1 #f3)}
 is invalid.  @code{(1 #f 3)} is almost certainly what one should write
 anyway to make the intention clear, so this is rarely a problem.
 
+Note that the read syntax for unsigned 8-bit integer vectors here conflicts
+with the R7RS read syntax of bytevectors.  When the @code{'r7rs-bytevectors}
+read option is set with @code{(read-enable 'r7rs-bytevectors)}, the @code{#u8}
+tag will make bytevectors instead of unsigned 8-bit integer vectors.  Similarly,
+the two types cannot be distinguished in printed output when the
+equivalent print option is set with @code{(print-enable 'r7rs-bytevectors)}.
+@xref{Bytevectors}, for more information.
 
 @node SRFI-4 API
 @subsubsection SRFI-4 - API
diff --git a/libguile/bytevectors.c b/libguile/bytevectors.c
index 5008d23..48a2dae 100644
--- a/libguile/bytevectors.c
+++ b/libguile/bytevectors.c
@@ -35,6 +35,7 @@
 #include "libguile/array-handle.h"
 #include "libguile/uniform.h"
 #include "libguile/srfi-4.h"
+#include "libguile/private-options.h"
 
 #include <byteswap.h>
 #include <striconveh.h>
@@ -404,7 +405,13 @@ scm_i_print_bytevector (SCM bv, SCM port, scm_print_state *pstate SCM_UNUSED)
   scm_array_get_handle (bv, &h);
 
   scm_putc ('#', port);
-  scm_write (scm_array_handle_element_type (&h), port);
+  /* VU8 bytevectors are printed with u8 when r7rs-bytevectors print option is
+     enabled. Otherwise, they are printed the default way (vu8). */
+  if (SCM_PRINT_R7RS_BYTEVECTORS_P
+      && SCM_BYTEVECTOR_ELEMENT_TYPE (bv) == SCM_ARRAY_ELEMENT_TYPE_VU8)
+    scm_puts ("u8", port);
+  else
+    scm_write (scm_array_handle_element_type (&h), port);
   scm_putc ('(', port);
   for (i = h.dims[0].lbnd, ubnd = h.dims[0].ubnd, inc = h.dims[0].inc;
        i <= ubnd; i += inc)
diff --git a/libguile/print.c b/libguile/print.c
index 8090c01..714fed0 100644
--- a/libguile/print.c
+++ b/libguile/print.c
@@ -119,6 +119,8 @@ scm_t_option scm_print_opts[] = {
     "Escape symbols using R7RS |...| symbol notation." },
   { SCM_OPTION_BOOLEAN, "datum-labels", 0,
     "Print cyclic data using SRFI-38 datum label notation." },
+  { SCM_OPTION_BOOLEAN, "r7rs-bytevectors", 0,
+    "Print bytevectors using R7RS #u8(...) notation instead of R6RS #vu8(...) notation."},
   { 0 },
 };
 
diff --git a/libguile/private-options.h b/libguile/private-options.h
index 5205dfb..885a307 100644
--- a/libguile/private-options.h
+++ b/libguile/private-options.h
@@ -54,7 +54,8 @@ SCM_INTERNAL scm_t_option scm_print_opts[];
 #define SCM_PRINT_ESCAPE_NEWLINES_P scm_print_opts[3].val
 #define SCM_PRINT_R7RS_SYMBOLS_P    scm_print_opts[4].val
 #define SCM_PRINT_DATUM_LABELS_P    scm_print_opts[5].val
-#define SCM_N_PRINT_OPTIONS 6
+#define SCM_PRINT_R7RS_BYTEVECTORS_P scm_print_opts[6].val
+#define SCM_N_PRINT_OPTIONS 7
 
 
 /*
@@ -71,7 +72,8 @@ SCM_INTERNAL scm_t_option scm_read_opts[];
 #define SCM_HUNGRY_EOL_ESCAPES_P scm_read_opts[6].val
 #define SCM_CURLY_INFIX_P      scm_read_opts[7].val
 #define SCM_R7RS_SYMBOLS_P     scm_read_opts[8].val
+#define SCM_R7RS_BYTEVECTORS_P scm_read_opts[9].val
 
-#define SCM_N_READ_OPTIONS 9
+#define SCM_N_READ_OPTIONS 10
 
 #endif  /* PRIVATE_OPTIONS */ 
diff --git a/libguile/read.c b/libguile/read.c
index f1adc8f..7dbf45b 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -90,6 +90,8 @@ scm_t_option scm_read_opts[] =
       "Support SRFI-105 curly infix expressions."},
     { SCM_OPTION_BOOLEAN, "r7rs-symbols", 0,
       "Support R7RS |...| symbol notation."},
+    { SCM_OPTION_BOOLEAN, "r7rs-bytevectors", 0,
+      "Support R7RS #u8(...) bytevector notation in addition to R6RS #vu8(...)."},
     { 0, },
   };
  
@@ -116,6 +118,7 @@ struct t_read_context
   unsigned int curly_infix_p        : 1;
   unsigned int neoteric_p           : 1;
   unsigned int r7rs_symbols_p       : 1;
+  unsigned int r7rs_bytevectors_p   : 1;
 
   SCM datum_label_table, datum_label_tag;
 };
@@ -1475,9 +1478,14 @@ static SCM
 scm_read_bytevector (scm_t_wchar chr, SCM port, scm_t_read_context *ctx,
                      long line, int column)
 {
-  chr = scm_getc (port);
-  if (chr != 'u')
-    goto syntax;
+  /* After "#v" (R6RS style) there is still a 'u' to read.  After "#u" (R7RS
+     style, only dispatched here when the option is on) it was already read. */
+  if (chr == 'v')
+    {
+      chr = scm_getc (port);
+      if (chr != 'u')
+        goto syntax;
+    }
 
   chr = scm_getc (port);
   if (chr != '8')
@@ -1796,13 +1804,19 @@ scm_read_sharp (scm_t_wchar chr, SCM port, scm_t_read_context *ctx,
     case '(':
       return (scm_read_vector (chr, port, ctx, line, column));
     case 's':
-    case 'u':
     case 'f':
     case 'c':
       /* This one may return either a boolean or an SRFI-4 vector.  */
       return (scm_read_srfi4_vector (chr, port, ctx, line, column));
     case 'v':
       return (scm_read_bytevector (chr, port, ctx, line, column));
+    case 'u':
+      /* Will be a bytevector if doing r7rs bytevectors, and an SRFI-4 vector
+         otherwise. */
+      if (ctx->r7rs_bytevectors_p)
+        return (scm_read_bytevector (chr, port, ctx, line, column));
+      else
+        return (scm_read_srfi4_vector (chr, port, ctx, line, column));
     case '*':
       return (scm_read_guile_bit_vector (chr, port, ctx, line, column));
     case 't':
@@ -2383,9 +2397,10 @@ SCM_SYMBOL (sym_port_read_options, "port-read-options");
 #define READ_OPTION_HUNGRY_EOL_ESCAPES_P  12
 #define READ_OPTION_CURLY_INFIX_P         14
 #define READ_OPTION_R7RS_SYMBOLS_P        16
+#define READ_OPTION_R7RS_BYTEVECTORS_P    18
 
 /* The total width in bits of the per-port overrides */
-#define READ_OPTIONS_NUM_BITS             18
+#define READ_OPTIONS_NUM_BITS             20
 
 #define READ_OPTIONS_INHERIT_ALL  ((1UL << READ_OPTIONS_NUM_BITS) - 1)
 #define READ_OPTIONS_MAX_VALUE    READ_OPTIONS_INHERIT_ALL
@@ -2421,6 +2436,7 @@ SCM_SYMBOL (sym_square_brackets, "square-brackets");
 SCM_SYMBOL (sym_hungry_eol_escapes, "hungry-eol-escapes");
 SCM_SYMBOL (sym_curly_infix, "curly-infix");
 SCM_SYMBOL (sym_r7rs_symbols, "r7rs-symbols");
+SCM_SYMBOL (sym_r7rs_bytevectors, "r7rs-bytevectors");
 
 /* Special 'inherit' value for 'set-port-read-option!'. */
 SCM_SYMBOL (sym_inherit, "inherit");
@@ -2469,6 +2485,8 @@ SCM_DEFINE (scm_set_port_read_option_x, "set-port-read-option!", 3, 0, 0,
         option_code = READ_OPTION_CURLY_INFIX_P;
       else if (scm_is_eq (option, sym_r7rs_symbols))
         option_code = READ_OPTION_R7RS_SYMBOLS_P;
+      else if (scm_is_eq (option, sym_r7rs_bytevectors))
+        option_code = READ_OPTION_R7RS_BYTEVECTORS_P;
       else
         scm_wrong_type_arg_msg ("set-port-read-option!", 2,
                                 option, "valid read option symbol");
@@ -2562,6 +2580,7 @@ init_read_context (SCM port, scm_t_read_context *ctx)
   RESOLVE_BOOLEAN_OPTION (HUNGRY_EOL_ESCAPES_P, hungry_eol_escapes_p);
   RESOLVE_BOOLEAN_OPTION (CURLY_INFIX_P,        curly_infix_p);
   RESOLVE_BOOLEAN_OPTION (R7RS_SYMBOLS_P,       r7rs_symbols_p);
+  RESOLVE_BOOLEAN_OPTION (R7RS_BYTEVECTORS_P,   r7rs_bytevectors_p);
 
 #undef RESOLVE_BOOLEAN_OPTION
 
diff --git a/test-suite/tests/print.test b/test-suite/tests/print.test
index 01bc994..5ced167 100644
--- a/test-suite/tests/print.test
+++ b/test-suite/tests/print.test
@@ -17,6 +17,7 @@
 ;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 (define-module (test-suite test-print)
+  #:use-module (rnrs bytevectors)
   #:use-module (ice-9 pretty-print)
   #:use-module (test-suite lib))
 
@@ -86,7 +87,21 @@
     (pass-if-equal "ends with backslash"
         "|foo\\x5c;|"
       (write-with-options '(r7rs-symbols)
-                          (string->symbol "foo\\")))))
+                          (string->symbol "foo\\"))))
+
+  (with-test-prefix "r7rs-bytevectors"
+
+    (pass-if-equal "off"
+        "#vu8(3 0 203 1)"
+      (write-with-options '() (u8-list->bytevector '(3 0 203 1))))
+
+    (pass-if-equal "on"
+        "#u8(0 6 255 103)"
+      (write-with-options '(r7rs-bytevectors) (u8-list->bytevector '(0 6 255 103))))
+
+    (pass-if-equal "on - doesn't affect other SRFI-4 types"
+        "#u16(0 6 255 103)"
+      (write-with-options '(r7rs-bytevectors) #u16(0 6 255 103)))))
 
 \f
 (with-test-prefix "pretty-print"
diff --git a/test-suite/tests/reader.test b/test-suite/tests/reader.test
index 18c0293..ae4fd5f 100644
--- a/test-suite/tests/reader.test
+++ b/test-suite/tests/reader.test
@@ -19,6 +19,7 @@
 ;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
 
 (define-module (test-suite reader)
+  :use-module (rnrs bytevectors)
   :use-module (srfi srfi-1)
   :use-module (test-suite lib))
 
@@ -243,6 +244,18 @@
     (with-read-options '(r7rs-symbols)
       (lambda ()
         (read-string "(a |H\\x65;llo, this is \\| a \"test\"| b)"))))
+  (pass-if "r7rs-bytevectors off"
+    (let ((bv1 (u8-list->bytevector '(1 2 3 200)))
+          (bv2 (with-read-options '()
+                    (lambda ()
+                      (read-string "#vu8(1 2 3 200)")))))
+      (and (bytevector=? bv1 bv2) (not (u8vector? bv2)))))
+  (pass-if "r7rs-bytevectors on"
+    (let ((bv1 (u8-list->bytevector '(1 2 3 200)))
+          (bv2 (with-read-options '(r7rs-bytevectors)
+                    (lambda ()
+                      (read-string "#u8(1 2 3 200)")))))
+      (and (bytevector=? bv1 bv2) (not (u8vector? bv2)))))
   (pass-if "prefix keywords"
     (eq? #:keyword
          (with-read-options '(keywords prefix case-insensitive)
-- 
2.9.4


^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-18 23:28 [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax Freja Nordsiek
@ 2017-06-21  1:13 ` Mark H Weaver
  2017-06-21  6:04   ` Freja Nordsiek
  2017-06-21  2:11 ` Mark H Weaver
  1 sibling, 1 reply; 8+ messages in thread
From: Mark H Weaver @ 2017-06-21  1:13 UTC (permalink / raw)
  To: Freja Nordsiek; +Cc: Andy Wingo, guile-devel

Hi Freja,

Freja Nordsiek <fnordsie@gmail.com> writes:
> Was fiddling around with using Chibi's R7RS test-suite in Guile and
> found a major R7RS syntax feature currently missing from Guile. The
> feature is R7RS bytevector notation, which uses the #u8 prefix like
> SRFI-4 unsigned 8-bit integer vectors instead of the R6RS prefix #vu8.

This is mostly not an issue, because in Guile, SRFI-4 vectors are
actually just bytevectors.  All of the bytevector operations work on
them, and 'bytevector?' returns true for them.  So, I expect that the
overwhelming majority of R7RS code will work.

To support this, Guile's bytevectors include an additional field called
the "element type", which tells what kind of elements are in the
bytevector, e.g. SCM_ARRAY_ELEMENT_TYPE_U8, SCM_ARRAY_ELEMENT_TYPE_F64,
etc.  See srfi-4.[ch] and bytevectors.[ch].

The only remaining issue is that, for some reason which I don't recall,
we have distinct element types for plain bytevectors (as produced by our
bytevector operations, and by reading #vu8(...)) and SRFI-4 unsigned
8-bit vectors (from reading #u8(...)), despite the fact that the
elements are actually the same type.  In the C code, they are:

  SCM_ARRAY_ELEMENT_TYPE_VU8
  SCM_ARRAY_ELEMENT_TYPE_U8

This affects how bytevectors are printed, and it also affects equality
testing on bytevectors:

  (bytevector=? #u8(1 2 3) #vu8(1 2 3))
    => #f

My preliminary attempt to mitigate this issue in 'r7rs-wip' was:

  commit 84aebcaecb78ac87b0039451becf9623e3ddcce4
  Author: Mark H Weaver <mhw@netris.org>
  Date:   Sun Jan 12 04:44:39 2014 -0500

  bytevector=?: #vu8(1 2 3) is equal to #u8(1 2 3).

  * libguile/bytevectors.c (scm_bytevector_eq_p): Treat VU8 and U8 element
  types as equivalent.

but I'm not sure it's the right solution.  This alone will still result
in R7RS code ending up with a mixture of U8 and VU8 bytevectors: the
former for bytevector literals, and the latter as the results of other
bytevector constructors.
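
To make the difference concrete, here is a small standalone C sketch of the
two comparison policies: strict (element types must match) and relaxed (U8
and VU8 are treated as one type). The struct and function names are
hypothetical and deliberately much simpler than what is in bytevectors.c;
this is not the code from the commit above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the element type tags. */
enum bv_type { BV_VU8, BV_U8, BV_F64 };

struct bytevector
{
  enum bv_type type;
  size_t len;
  const unsigned char *bytes;
};

/* Map U8 onto VU8 so the two compare as a single type. */
static enum bv_type
canonical_type (enum bv_type t)
{
  return t == BV_U8 ? BV_VU8 : t;
}

/* Strict policy: differing element types are never equal. */
static bool
bv_equal_strict (const struct bytevector *a, const struct bytevector *b)
{
  assert (a != NULL && b != NULL);
  return a->type == b->type && a->len == b->len
         && memcmp (a->bytes, b->bytes, a->len) == 0;
}

/* Relaxed policy: U8 and VU8 are interchangeable for equality. */
static bool
bv_equal_relaxed (const struct bytevector *a, const struct bytevector *b)
{
  assert (a != NULL && b != NULL);
  return canonical_type (a->type) == canonical_type (b->type)
         && a->len == b->len
         && memcmp (a->bytes, b->bytes, a->len) == 0;
}
```

The relaxed comparison only hides the distinction at the equality test. The
two values still carry different tags, which is why a mixture of U8 and VU8
bytevectors would remain visible elsewhere, for example when printing.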

Perhaps the more obvious solution would be to completely eliminate the
distinction between SRFI-4 u8 vectors and normal bytevectors by merging
their element types, but there may be problems with that idea as well.

Andy Wingo wrote the bytevector and SRFI-4 implementations in Guile.  I
remember talking to him about this issue several years ago, and I seem
to recall that he didn't like the idea of merging the element types, but
I don't remember his rationale.  In any case, we should not proceed
without his input.

Andy, what do you think?

     Regards,
       Mark


* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-18 23:28 [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax Freja Nordsiek
  2017-06-21  1:13 ` Mark H Weaver
@ 2017-06-21  2:11 ` Mark H Weaver
  2017-06-21  7:00   ` Freja Nordsiek
  1 sibling, 1 reply; 8+ messages in thread
From: Mark H Weaver @ 2017-06-21  2:11 UTC (permalink / raw)
  To: Freja Nordsiek; +Cc: guile-devel

Freja Nordsiek <fnordsie@gmail.com> writes:

> Was fiddling around with using Chibi's R7RS test-suite in Guile and
> found a major R7RS syntax feature currently missing from Guile. The
> feature is R7RS bytevector notation, which uses the #u8 prefix like
> SRFI-4 unsigned 8-bit integer vectors instead of the R6RS prefix #vu8.
>
> I wrote a patch for the r7rs-wip branch (attached) to add and
> implement reader and print options to enable the use of R7RS
> bytevector syntax, as well as add unit tests for the options and
> update the documentation. I made a boolean option for both named
> 'r7rs-bytevectors to enable the R7RS syntax (default is #f). The
> syntax options are enabled with
>
> (read-enable 'r7rs-bytevectors)
> (print-enable 'r7rs-bytevectors)
>
> Turning this syntax option on does mean that SRFI-4 unsigned 8-bit
> integer vectors cannot be created with the #u8 prefix and that they
> cannot be distinguished from bytevectors when printed with write or
> display. The patch adds warnings about this in the Bytevectors and
> SRFI-4 sections of the documentation.

To supplement my earlier reply, I suppose I should explain why I would
prefer to merge the #u8 and #vu8 bytevectors into a single type, or
somehow render the differences negligible, instead of adding these read
and print options as you suggest.

The main reason is that the reader and printer are global facilities in
Guile, whereas in general programs will be composed of a mixture of
R6RS, R7RS, and native Guile code.

Adding (read-enable 'r7rs-bytevectors) to the top of a source file is
not a good approach, because it sets the read option for the remainder
of the Guile session, and thus will affect the way future files are
read, even if they do not contain R7RS code.  Also, there's no guarantee
that (read-enable 'r7rs-bytevectors) will be evaluated before the rest
of the file is read.

One thing that we could do instead is to add a #!r7rs reader directive
for use at the top of source files.  See 'scm_read_shebang' in read.c,
which supports #!r6rs and other reader directives.  This would be
guaranteed to affect the way subsequent datums are read, but only on
that particular port.  I suppose it would be good to add this,
regardless of how we handle the issue with literal bytevectors.
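
For readers unfamiliar with how a per-port setting can coexist with the
global one, the following standalone C sketch shows one way to pack per-port
overrides, loosely modeled on the READ_OPTION_* constants visible in the
patch (two bits per option, with an all-ones word meaning "inherit
everything"). The names and the exact encoding here are assumptions for
illustration, not read.c's definitions.

```c
#include <assert.h>
#include <stdbool.h>

/* Two bits per option: 0 = off, 1 = on, 3 = inherit the global value.
   This encoding is an assumption made for the sketch.  */
#define OPT_INHERIT 3u
#define OPT_MASK    3u

/* Bit offsets; the last one mirrors the constant the patch adds. */
#define OPT_R7RS_SYMBOLS     16
#define OPT_R7RS_BYTEVECTORS 18
#define OPT_NUM_BITS         20

/* Every two-bit field set to 3, i.e. no per-port override at all. */
#define OPT_INHERIT_ALL ((1ul << OPT_NUM_BITS) - 1)

/* Return OVERRIDES with the field at SHIFT replaced by VALUE. */
static unsigned long
set_port_option (unsigned long overrides, unsigned shift, unsigned value)
{
  assert (value <= OPT_MASK);
  overrides &= ~((unsigned long) OPT_MASK << shift);
  return overrides | ((unsigned long) value << shift);
}

/* Effective value of an option: the port's own setting if it has one,
   otherwise the global setting.  */
static bool
resolve_option (unsigned long overrides, unsigned shift, bool global_value)
{
  unsigned field = (unsigned) ((overrides >> shift) & OPT_MASK);
  return field == OPT_INHERIT ? global_value : field != 0;
}
```

A reader directive would then amount to calling something like
set_port_option on the current port only, leaving the global options and
every other port untouched.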

However, it should be noted that unlike R6RS, which explicitly includes
#!r6rs in its formal syntax and specifies its meaning, the R7RS formal
syntax does not even allow for #!r7rs, so a compliant R7RS
implementation may reject a file that contains it.  So, it would be good
to avoid relying on this.

By reusing the SRFI-4 syntax for its bytevectors, R7RS effectively
requires that its implementations will treat SRFI-4 U8 vectors as
equivalent to bytevectors.  If there's no compelling reason to avoid
this, I think we should do it.

       Mark


* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-21  1:13 ` Mark H Weaver
@ 2017-06-21  6:04   ` Freja Nordsiek
  2017-06-21 15:58     ` Mark H Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Freja Nordsiek @ 2017-06-21  6:04 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Mark,


I saw the stuff in the bytevectors code where the SRFI-4 u8 vectors and bytevectors are essentially the same but with different flavors. The reader option was to make #u8 make bytevectors that are exactly the same as #vu8.

Merging the two types solves the reading problem completely since it doesn't matter how they are made, they will be the same type. But it does still leave the printing problem, since there are still two ways to print them, R6RS and R7RS. And it introduces the issue that SRFI-4 u8 vectors would print as #vu8 in R6RS code. One potential solution to this would be to have essentially two different print functions, an R6RS one and an R7RS one, where the one you get depends on which libraries you import from (R6RS or R7RS), though I can imagine the ugliness this would cause. It really is kind of a tough problem that R6RS didn't go with #u8 notation but R7RS did.


Freja Nordsiek



* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-21  2:11 ` Mark H Weaver
@ 2017-06-21  7:00   ` Freja Nordsiek
  2017-06-21 15:42     ` Mark H Weaver
  0 siblings, 1 reply; 8+ messages in thread
From: Freja Nordsiek @ 2017-06-21  7:00 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: guile-devel

Mark,


That is a good point about the global effect of the reader and print options. I saw that ports can have their own separate reader options. Wondering if there is a good way to set it on a per-file basis without using a #!r7rs line. It is easier with libraries, where if the r7rs syntax is used, the switches could be flipped (and the r6rs ones for the r6rs library syntax). But this means a default has to be chosen for Guile-style library declarations, and choosing for scripts would be really ugly. In principle, it could be based on what is imported, but that would fail the moment a script imports libraries both from r6rs and r7rs, or if one doesn't import anything.


Freja Nordsiek



* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-21  7:00   ` Freja Nordsiek
@ 2017-06-21 15:42     ` Mark H Weaver
  0 siblings, 0 replies; 8+ messages in thread
From: Mark H Weaver @ 2017-06-21 15:42 UTC (permalink / raw)
  To: Freja Nordsiek; +Cc: guile-devel

Hi Freja,

Freja Nordsiek <fnordsie@gmail.com> writes:

> That is a good point about the global effect of the reader and print
> options. I saw that ports can have their own separate reader
> options. Wondering if there is a good way to set it on a per file
> basis without using a #!r7rs line. It is easier with libraries where
> if the r7rs syntax is used, the switches could be flipped (and the
> r6rs ones for the r6rs library syntax). But this means a default has
> to be chosen for guile style library declarations and choosing for
> scripts would be really ugly. In principle, it could be based on what
> is imported

It's true that we could implement something like this, and it might even
be the best approach among the available options, none of which are
pleasant.  It would be rather gross to implement though :-(

> but that would fail the moment a script imports libraries
> both from r6rs and r7rs, or if one doesn't import anything.

In R6RS and R7RS, importing nothing is not an option.  Both of those
standards explicitly require that *every* program begin with at least
one 'import' form, which is needed anyway because in those languages,
there are no bindings at all, not even core syntax, until they are
imported.

Both of those languages include base libraries that must be imported
explicitly: R6RS has (rnrs base), and R7RS has (scheme base).  In
principle, a program might not import either of them, but rather some
other library that exports a different set of core bindings, but in
practice I suspect that's exceptionally rare.  So, for programs we could
make the decision based on which base library is imported.

For libraries it is easy to reliably detect R6RS or R7RS, based on
whether (library ...) or (define-library ...) is found.

If the first datum in a file or stream is not (import ...),
(library ...), or (define-library ...), then we can reliably conclude
that the file is not valid R6RS or R7RS code.
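
To make the detection step concrete, here is a rough sketch that only
classifies an already-read first datum. All three procedure names are made
up for illustration and nothing like them exists in Guile. It also makes a
simplifying assumption: an import set is judged by the first component of
its library name ('rnrs' or 'scheme'), so import modifiers such as
(only ...) or (prefix ...) are not handled.

```scheme
;; Hypothetical helpers, not part of Guile.

;; Classify one plain import set, e.g. (rnrs base) or (scheme base).
;; Returns 'r6rs, 'r7rs, or #f when the set says nothing about the dialect.
(define (import-set-dialect import-set)
  (cond ((not (pair? import-set)) #f)
        ((eq? (car import-set) 'rnrs) 'r6rs)
        ((eq? (car import-set) 'scheme) 'r7rs)
        (else #f)))

;; Walk the import sets of an (import ...) form; the first set that
;; identifies a dialect wins.  Returns 'unknown if none does.
(define (imports-dialect import-sets)
  (let loop ((sets import-sets))
    (cond ((not (pair? sets)) 'unknown)
          ((import-set-dialect (car sets)) => (lambda (dialect) dialect))
          (else (loop (cdr sets))))))

;; Classify the first datum of a file or stream.
;; Returns 'r6rs, 'r7rs, 'unknown, or 'not-rnrs.
(define (detect-dialect datum)
  (if (not (pair? datum))
      'not-rnrs
      (case (car datum)
        ((library) 'r6rs)
        ((define-library) 'r7rs)
        ((import) (imports-dialect (cdr datum)))
        (else 'not-rnrs))))
```

A program that imports from both families is classified by whichever
import set comes first, which is exactly the mixed-import weakness raised
above.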

The reason this would be gross to implement is that the auto-detection
logic would have to be integrated in the reader itself.  The reader
reads an entire datum at a time, but R6RS and R7RS libraries are fully
contained in only one datum.  So, at least for libraries, the reader
would have to be called with some kind of "auto-library-detect" flag,
and would need to switch its own reader options in the middle of reading
the datum.

Thoughts?

      Mark

* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-21  6:04   ` Freja Nordsiek
@ 2017-06-21 15:58     ` Mark H Weaver
  2017-06-27 17:19       ` Freja Nordsiek
  0 siblings, 1 reply; 8+ messages in thread
From: Mark H Weaver @ 2017-06-21 15:58 UTC (permalink / raw)
  To: Freja Nordsiek; +Cc: Andy Wingo, guile-devel

Freja Nordsiek <fnordsie@gmail.com> writes:

> I saw the stuff in the bytevectors code where the SRFI-4 u8 vectors
> and bytevectors are essentially the same but with different
> flavors. The reader option was to make #u8 make bytevectors that are
> exactly the same as #vu8.

Right.  If it turns out that merging the VU8 and U8 element types for
bytevectors is problematic, this is probably the next best option,
combined with the auto-detection idea discussed in the other branch of
this thread.

> Merging the two types solves the reading problem completely since it
> doesn't matter how they are made, they will be the same type. But, it
> does still leave the printing problem since there are still two ways
> to print them, R6RS and R7RS. And it introduces the issue that SRFI-4
> u8 vectors would print as #vu8 in R6RS code. One potential solution to
> this would be to have two different print functions essentially, an
> R6RS one and an R7RS one and which one one gets depends on which
> libraries one imports from (R6RS or R7RS), though I can imagine
> ugliness that this would cause. It really is kind of a tough problem
> that R6RS didn't go with #u8 notation but R7RS did.

Yeah, I find it uncomfortable, but it may be that having different
top-level 'write' procedures for each language variant is the best
available approach.  Unlike 'read', auto-detection is not an option
here.

At this point, I'm desperately seeking ways to avoid setting global
reader/printer options.  That approach seems likely to lead to bad
consequences in a world where programs are composed of a mixture of
R6RS, R7RS and native Guile code.  I sometimes wonder if we should
deprecate global reader options.

      Mark

* Re: [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax.
  2017-06-21 15:58     ` Mark H Weaver
@ 2017-06-27 17:19       ` Freja Nordsiek
  0 siblings, 0 replies; 8+ messages in thread
From: Freja Nordsiek @ 2017-06-27 17:19 UTC (permalink / raw)
  To: Mark H Weaver; +Cc: Andy Wingo, guile-devel

Mark,

Hmm, well, I have thought of one way to implement this. The main
underlying write/print function could take an extra keyword option that says
which syntax to print bytevectors/u8vectors in (it would be best to use the
symbols 'r6rs and 'r7rs so as not to program ourselves into a corner with
regard to future changes to Scheme), with the display and write functions
in the R6RS and R7RS modules setting the option to 'r6rs and 'r7rs
respectively. The only tricky thing here is deciding the default in the base
(guile) module. If bytevectors and u8vectors are unified, some level of
backwards compatibility with 2.2 and 2.0 will be broken, since either
u8vectors would get printed as #vu8 or bytevectors would get printed as #u8.
If the types are not unified, backwards compatibility is maintained, but
then using R7RS and SRFI-4 simultaneously becomes difficult unless we move
the R7RS modules entirely over to using SRFI-4 u8vectors. Moving R7RS over
to SRFI-4 u8vectors entirely then means we have two different kinds of
bytevectors, the R6RS and R7RS kinds, and the procedures from each set would
have to produce only their respective kind and in some way handle being
given the other kind of bytevector (procedures that set elements in place
probably shouldn't change the type, but for ones that return copies, it is
hard to say which option is best).
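
A rough sketch of the keyword idea, covering only how the literal would be
formatted. The name bytevector->literal-string and its #:syntax keyword are
made up for illustration; they are not existing Guile procedures or
options, and the real change would presumably live in the printer itself.
The default of 'r6rs here is an arbitrary placeholder, since the default
for the (guile) module is the open question.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper, not part of Guile: render a bytevector as a
;; literal in either R6RS (#vu8) or R7RS (#u8) notation.
(define* (bytevector->literal-string bv #:key (syntax 'r6rs))
  (let ((prefix (case syntax
                  ((r6rs) "#vu8")
                  ((r7rs) "#u8")
                  (else (error "unknown bytevector syntax" syntax))))
        (elements (map number->string (bytevector->u8-list bv))))
    (string-append prefix "(" (string-join elements " ") ")")))

;; Per-dialect entry points would then just fix the keyword.
(define (r6rs-display-bytevector bv port)
  (display (bytevector->literal-string bv #:syntax 'r6rs) port))

(define (r7rs-display-bytevector bv port)
  (display (bytevector->literal-string bv #:syntax 'r7rs) port))
```

Using symbols rather than a boolean means another notation could be added
later without changing the interface.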

It seems like none of these options are good.

Unifying the types and just having the display/write procedures be
different poses some backwards compatibility issues, and things will get
messy when SRFI-4 and R6RS are mixed together.

Making R7RS use SRFI-4 u8vectors as its bytevectors means that a lot more
work needs to be done on R7RS's bytevector-related functions: making them
produce the right type, figuring out how to handle cases where R6RS and
R7RS bytevectors get mixed together, and making sure that these R7RS
bytevectors behave reasonably with the existing bytevector-related
functions (a lot of this has already been done for u8vectors, since SRFI-4
has been in Guile quite a while).
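
For the mixed case, the simplest conceivable handling is explicit
conversion by copying. A sketch, assuming the (rnrs bytevectors) and
(srfi srfi-4) modules as shipped with Guile; the two helper names are
hypothetical and only illustrate what "handling the other kind" could mean
for procedures that return copies.

```scheme
(use-modules (rnrs bytevectors) (srfi srfi-4))

;; Hypothetical: copy an SRFI-4 u8vector into a fresh bytevector built
;; with the R6RS constructor.
(define (u8vector->r6rs-bytevector uv)
  (u8-list->bytevector (u8vector->list uv)))

;; Hypothetical: copy a bytevector into a fresh SRFI-4 u8vector.
(define (bytevector->srfi4-u8vector bv)
  (list->u8vector (bytevector->u8-list bv)))
```

Going through an intermediate list is wasteful for large vectors; it is
used here only to keep the sketch short.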

I think that both of these options are better than my original proposal,
which was reader and print options.

Freja

On Wed, Jun 21, 2017 at 5:58 PM, Mark H Weaver <mhw@netris.org> wrote:

> Freja Nordsiek <fnordsie@gmail.com> writes:
>
> > I saw the stuff in the bytevectors code where the SRFI-4 u8 vectors
> > and bytevectors are essentially the same but with different
> > flavors. The reader option was to make #u8 make bytevectors that are
> > exactly the same as #vu8.
>
> Right.  If it turns out that merging the VU8 and U8 element types for
> bytevectors is problematic, this is probably the next best option,
> combined with the auto-detection idea discussed in the other branch of
> this thread.
>
> > Merging the two types solves the reading problem completely since it
> > doesn't matter how they are made, they will be the same type. But, it
> > does still leave the printing problem since there are still two ways
> > to print them, R6RS and R7RS. And it introduces the issue that SRFI-4
> > u8 vectors would print as #vu8 in R6RS code. One potential solution to
> > this would be to have two different print functions essentially, an
> > R6RS one and an R7RS one and which one one gets depends on which
> > libraries one imports from (R6RS or R7RS), though I can imagine
> > ugliness that this would cause. It really is kind of a tough problem
> > that R6RS didn't go with #u8 notation but R7RS did.
>
> Yeah, I find it uncomfortable, but it may be that having different
> top-level 'write' procedures for each language variant is the best
> available approach.  Unlike 'read', auto-detection is not an option
> here.
>
> At this point, I'm desperately seeking ways to avoid setting global
> reader/printer options.  That approach seems likely to lead to bad
> consequences in a world where programs are composed of a mixture of
> R6RS, R7RS and native Guile code.  I sometimes wonder if we should
> deprecate global reader options.
>
>       Mark
>

end of thread, other threads:[~2017-06-27 17:19 UTC | newest]

Thread overview: 8+ messages
2017-06-18 23:28 [PATCH] r7rs-wip branch: Add reader and print options to support R7RS bytevector syntax Freja Nordsiek
2017-06-21  1:13 ` Mark H Weaver
2017-06-21  6:04   ` Freja Nordsiek
2017-06-21 15:58     ` Mark H Weaver
2017-06-27 17:19       ` Freja Nordsiek
2017-06-21  2:11 ` Mark H Weaver
2017-06-21  7:00   ` Freja Nordsiek
2017-06-21 15:42     ` Mark H Weaver
