* Fix reader options for R6RS `get-datum'
@ 2012-12-09 12:47 Andreas Rottmann
2012-12-09 12:47 ` [PATCH 1/3] Split r6rs-ports.c according to module boundaries Andreas Rottmann
` (3 more replies)
0 siblings, 4 replies; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-09 12:47 UTC (permalink / raw)
To: guile-devel
This patch series addresses the problem that `get-datum' is using the
global reader options, even for those options that have to have fixed
values to make the reader behave in an R6RS-compatible way.
* [PATCH 1/3] Split r6rs-ports.c according to module boundaries
Needed by the last patch, since we want to add a private helper
function for use in `(rnrs io ports)' there.
* [PATCH 2/3] Add internal API to specify reader options at reader
This provides the basic mechanism, a.k.a. `scm_i_read()'.
* [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics
And finally, this makes use of `scm_i_read()' to implement
`get-datum'.
^ permalink raw reply [flat|nested] 15+ messages in thread
* [PATCH 1/3] Split r6rs-ports.c according to module boundaries
2012-12-09 12:47 Fix reader options for R6RS `get-datum' Andreas Rottmann
@ 2012-12-09 12:47 ` Andreas Rottmann
2012-12-15 5:35 ` Mark H Weaver
2012-12-15 5:38 ` Mark H Weaver
2012-12-09 12:47 ` [PATCH 2/3] Add internal API to specify reader options at reader invocation Andreas Rottmann
` (2 subsequent siblings)
3 siblings, 2 replies; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-09 12:47 UTC (permalink / raw)
To: guile-devel
Currently, r6rs-ports.c provides procedures exported from `(ice-9
binary-ports)'. However, in addition, it also provides
`%make-transcoder', which is used by `(rnrs io ports)' via a "private
import".
This change splits r6rs-ports.c into two parts, one specific to `(ice-9
binary-ports)' and the other specific to `(rnrs io ports)'. This
provides a place to put internal C-implemented helpers needed only in
`(rnrs io ports)', thus allowing to eliminate the private import of
`%make-transcoder'.
* libguile/binary-port.c: New file, contains the most part of what has
been in libguile/r6rs-ports.c for now.
* libguile/binary-ports.h: Ditto for the header part.
* libguile/r6rs-ports.c: Now contains just transcoded ports, as this API
is not part of the `(ice-9 binary-ports)' module.
* libguile/r6rs-ports.h: Now includes "libguile/binary-ports.h", and
defines the extension registration and initialization functions.
* libguile/Makefile.am: Add new files.
---
libguile/Makefile.am | 2 +
libguile/binary-ports.c | 1152 +++++++++++++++++++++++++++++++++++++++++
libguile/binary-ports.h | 46 ++
libguile/r6rs-ports.c | 1111 +--------------------------------------
libguile/r6rs-ports.h | 21 +-
module/ice-9/binary-ports.scm | 7 +-
module/rnrs/io/ports.scm | 9 +-
7 files changed, 1211 insertions(+), 1137 deletions(-)
create mode 100644 libguile/binary-ports.c
create mode 100644 libguile/binary-ports.h
diff --git a/libguile/Makefile.am b/libguile/Makefile.am
index e216435..3c0fc8e 100644
--- a/libguile/Makefile.am
+++ b/libguile/Makefile.am
@@ -125,6 +125,7 @@ libguile_@GUILE_EFFECTIVE_VERSION@_la_SOURCES = \
async.c \
backtrace.c \
boolean.c \
+ binary-ports.c \
bitvectors.c \
bytevectors.c \
chars.c \
@@ -230,6 +231,7 @@ DOT_X_FILES = \
async.x \
backtrace.x \
boolean.x \
+ binary-ports.x \
bitvectors.x \
bytevectors.x \
chars.x \
diff --git a/libguile/binary-ports.c b/libguile/binary-ports.c
new file mode 100644
index 0000000..b58578b
--- /dev/null
+++ b/libguile/binary-ports.c
@@ -0,0 +1,1152 @@
+/* Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+
+#ifdef HAVE_UNISTD_H
+# include <unistd.h>
+#endif
+
+#include <string.h>
+#include <stdio.h>
+#include <assert.h>
+
+#include "libguile/_scm.h"
+#include "libguile/bytevectors.h"
+#include "libguile/chars.h"
+#include "libguile/eval.h"
+#include "libguile/r6rs-ports.h"
+#include "libguile/strings.h"
+#include "libguile/validate.h"
+#include "libguile/values.h"
+#include "libguile/vectors.h"
+
+
+\f
+/* Unimplemented features. */
+
+
+/* Transoders are currently not implemented since Guile 1.8 is not
+ Unicode-capable. Thus, most of the code here assumes the use of the
+ binary transcoder. */
+static inline void
+transcoders_not_implemented (void)
+{
+ fprintf (stderr, "%s: warning: transcoders not implemented\n",
+ PACKAGE_NAME);
+}
+
+\f
+/* End-of-file object. */
+
+SCM_DEFINE (scm_eof_object, "eof-object", 0, 0, 0,
+ (void),
+ "Return the end-of-file object.")
+#define FUNC_NAME s_scm_eof_object
+{
+ return (SCM_EOF_VAL);
+}
+#undef FUNC_NAME
+
+\f
+/* Input ports. */
+
+#ifndef MIN
+# define MIN(a,b) ((a) < (b) ? (a) : (b))
+#endif
+
+/* Bytevector input ports or "bip" for short. */
+static scm_t_bits bytevector_input_port_type = 0;
+
+static inline SCM
+make_bip (SCM bv)
+{
+ SCM port;
+ char *c_bv;
+ unsigned c_len;
+ scm_t_port *c_port;
+ const unsigned long mode_bits = SCM_OPN | SCM_RDNG;
+
+ scm_i_scm_pthread_mutex_lock (&scm_i_port_table_mutex);
+
+ port = scm_new_port_table_entry (bytevector_input_port_type);
+ c_port = SCM_PTAB_ENTRY (port);
+
+ /* Match the expectation of `binary-port?'. */
+ c_port->encoding = NULL;
+
+ /* Prevent BV from being GC'd. */
+ SCM_SETSTREAM (port, SCM_UNPACK (bv));
+
+ /* Have the port directly access the bytevector. */
+ c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
+ c_len = SCM_BYTEVECTOR_LENGTH (bv);
+
+ c_port->read_pos = c_port->read_buf = (unsigned char *) c_bv;
+ c_port->read_end = (unsigned char *) c_bv + c_len;
+ c_port->read_buf_size = c_len;
+
+ /* Mark PORT as open, readable and unbuffered (hmm, how elegant...). */
+ SCM_SET_CELL_TYPE (port, bytevector_input_port_type | mode_bits);
+
+ scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
+
+ return port;
+}
+
+static int
+bip_fill_input (SCM port)
+{
+ int result;
+ scm_t_port *c_port = SCM_PTAB_ENTRY (port);
+
+ if (c_port->read_pos >= c_port->read_end)
+ result = EOF;
+ else
+ result = (int) *c_port->read_pos;
+
+ return result;
+}
+
+static scm_t_off
+bip_seek (SCM port, scm_t_off offset, int whence)
+#define FUNC_NAME "bip_seek"
+{
+ scm_t_off c_result = 0;
+ scm_t_port *c_port = SCM_PTAB_ENTRY (port);
+
+ switch (whence)
+ {
+ case SEEK_CUR:
+ offset += c_port->read_pos - c_port->read_buf;
+ /* Fall through. */
+
+ case SEEK_SET:
+ if (c_port->read_buf + offset <= c_port->read_end)
+ {
+ c_port->read_pos = c_port->read_buf + offset;
+ c_result = offset;
+ }
+ else
+ scm_out_of_range (FUNC_NAME, scm_from_int (offset));
+ break;
+
+ case SEEK_END:
+ if (c_port->read_end - offset >= c_port->read_buf)
+ {
+ c_port->read_pos = c_port->read_end - offset;
+ c_result = c_port->read_pos - c_port->read_buf;
+ }
+ else
+ scm_out_of_range (FUNC_NAME, scm_from_int (offset));
+ break;
+
+ default:
+ scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
+ "invalid `seek' parameter");
+ }
+
+ return c_result;
+}
+#undef FUNC_NAME
+
+
+/* Instantiate the bytevector input port type. */
+static inline void
+initialize_bytevector_input_ports (void)
+{
+ bytevector_input_port_type =
+ scm_make_port_type ("r6rs-bytevector-input-port", bip_fill_input,
+ NULL);
+
+ scm_set_port_seek (bytevector_input_port_type, bip_seek);
+}
+
+
+SCM_DEFINE (scm_open_bytevector_input_port,
+ "open-bytevector-input-port", 1, 1, 0,
+ (SCM bv, SCM transcoder),
+ "Return an input port whose contents are drawn from "
+ "bytevector @var{bv}.")
+#define FUNC_NAME s_scm_open_bytevector_input_port
+{
+ SCM_VALIDATE_BYTEVECTOR (1, bv);
+ if (!SCM_UNBNDP (transcoder) && !scm_is_false (transcoder))
+ transcoders_not_implemented ();
+
+ return (make_bip (bv));
+}
+#undef FUNC_NAME
+
+\f
+/* Custom binary ports. The following routines are shared by input and
+ output custom binary ports. */
+
+#define SCM_CBP_GET_POSITION_PROC(_port) \
+ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 1)
+#define SCM_CBP_SET_POSITION_PROC(_port) \
+ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 2)
+#define SCM_CBP_CLOSE_PROC(_port) \
+ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 3)
+
+static scm_t_off
+cbp_seek (SCM port, scm_t_off offset, int whence)
+#define FUNC_NAME "cbp_seek"
+{
+ SCM result;
+ scm_t_off c_result = 0;
+
+ switch (whence)
+ {
+ case SEEK_CUR:
+ {
+ SCM get_position_proc;
+
+ get_position_proc = SCM_CBP_GET_POSITION_PROC (port);
+ if (SCM_LIKELY (scm_is_true (get_position_proc)))
+ result = scm_call_0 (get_position_proc);
+ else
+ scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
+ "R6RS custom binary port does not "
+ "support `port-position'");
+
+ offset += scm_to_int (result);
+ /* Fall through. */
+ }
+
+ case SEEK_SET:
+ {
+ SCM set_position_proc;
+
+ set_position_proc = SCM_CBP_SET_POSITION_PROC (port);
+ if (SCM_LIKELY (scm_is_true (set_position_proc)))
+ result = scm_call_1 (set_position_proc, scm_from_int (offset));
+ else
+ scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
+ "R6RS custom binary port does not "
+ "support `set-port-position!'");
+
+ /* Assuming setting the position succeeded. */
+ c_result = offset;
+ break;
+ }
+
+ default:
+ /* `SEEK_END' cannot be supported. */
+ scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
+ "R6RS custom binary ports do not "
+ "support `SEEK_END'");
+ }
+
+ return c_result;
+}
+#undef FUNC_NAME
+
+static int
+cbp_close (SCM port)
+{
+ SCM close_proc;
+
+ close_proc = SCM_CBP_CLOSE_PROC (port);
+ if (scm_is_true (close_proc))
+ /* Invoke the `close' thunk. */
+ scm_call_0 (close_proc);
+
+ return 1;
+}
+
+\f
+/* Custom binary input port ("cbip" for short). */
+
+static scm_t_bits custom_binary_input_port_type = 0;
+
+/* Size of the buffer embedded in custom binary input ports. */
+#define CBIP_BUFFER_SIZE 4096
+
+/* Return the bytevector associated with PORT. */
+#define SCM_CBIP_BYTEVECTOR(_port) \
+ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 4)
+
+/* Return the various procedures of PORT. */
+#define SCM_CBIP_READ_PROC(_port) \
+ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 0)
+
+
+static inline SCM
+make_cbip (SCM read_proc, SCM get_position_proc,
+ SCM set_position_proc, SCM close_proc)
+{
+ SCM port, bv, method_vector;
+ char *c_bv;
+ unsigned c_len;
+ scm_t_port *c_port;
+ const unsigned long mode_bits = SCM_OPN | SCM_RDNG;
+
+ /* Use a bytevector as the underlying buffer. */
+ c_len = CBIP_BUFFER_SIZE;
+ bv = scm_c_make_bytevector (c_len);
+ c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
+
+ /* Store the various methods and bytevector in a vector. */
+ method_vector = scm_c_make_vector (5, SCM_BOOL_F);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 4, bv);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 0, read_proc);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 1, get_position_proc);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 2, set_position_proc);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 3, close_proc);
+
+ scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
+
+ port = scm_new_port_table_entry (custom_binary_input_port_type);
+ c_port = SCM_PTAB_ENTRY (port);
+
+ /* Match the expectation of `binary-port?'. */
+ c_port->encoding = NULL;
+
+ /* Attach it the method vector. */
+ SCM_SETSTREAM (port, SCM_UNPACK (method_vector));
+
+ /* Have the port directly access the buffer (bytevector). */
+ c_port->read_pos = c_port->read_buf = (unsigned char *) c_bv;
+ c_port->read_end = (unsigned char *) c_bv;
+ c_port->read_buf_size = c_len;
+
+ /* Mark PORT as open, readable and unbuffered (hmm, how elegant...). */
+ SCM_SET_CELL_TYPE (port, custom_binary_input_port_type | mode_bits);
+
+ scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
+
+ return port;
+}
+
+static int
+cbip_fill_input (SCM port)
+#define FUNC_NAME "cbip_fill_input"
+{
+ int result;
+ scm_t_port *c_port = SCM_PTAB_ENTRY (port);
+
+ again:
+ if (c_port->read_pos >= c_port->read_end)
+ {
+ /* Invoke the user's `read!' procedure. */
+ unsigned c_octets;
+ SCM bv, read_proc, octets;
+
+ /* Use the bytevector associated with PORT as the buffer passed to the
+ `read!' procedure, thereby avoiding additional allocations. */
+ bv = SCM_CBIP_BYTEVECTOR (port);
+ read_proc = SCM_CBIP_READ_PROC (port);
+
+ /* The assumption here is that C_PORT's internal buffer wasn't changed
+ behind our back. */
+ assert (c_port->read_buf ==
+ (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv));
+ assert ((unsigned) c_port->read_buf_size
+ == SCM_BYTEVECTOR_LENGTH (bv));
+
+ octets = scm_call_3 (read_proc, bv, SCM_INUM0,
+ SCM_I_MAKINUM (CBIP_BUFFER_SIZE));
+ c_octets = scm_to_uint (octets);
+
+ c_port->read_pos = (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv);
+ c_port->read_end = (unsigned char *) c_port->read_pos + c_octets;
+
+ if (c_octets > 0)
+ goto again;
+ else
+ result = EOF;
+ }
+ else
+ result = (int) *c_port->read_pos;
+
+ return result;
+}
+#undef FUNC_NAME
+
+
+SCM_DEFINE (scm_make_custom_binary_input_port,
+ "make-custom-binary-input-port", 5, 0, 0,
+ (SCM id, SCM read_proc, SCM get_position_proc,
+ SCM set_position_proc, SCM close_proc),
+ "Return a new custom binary input port whose input is drained "
+ "by invoking @var{read_proc} and passing it a bytevector, an "
+ "index where octets should be written, and an octet count.")
+#define FUNC_NAME s_scm_make_custom_binary_input_port
+{
+ SCM_VALIDATE_STRING (1, id);
+ SCM_VALIDATE_PROC (2, read_proc);
+
+ if (!scm_is_false (get_position_proc))
+ SCM_VALIDATE_PROC (3, get_position_proc);
+
+ if (!scm_is_false (set_position_proc))
+ SCM_VALIDATE_PROC (4, set_position_proc);
+
+ if (!scm_is_false (close_proc))
+ SCM_VALIDATE_PROC (5, close_proc);
+
+ return (make_cbip (read_proc, get_position_proc, set_position_proc,
+ close_proc));
+}
+#undef FUNC_NAME
+
+
+/* Instantiate the custom binary input port type. */
+static inline void
+initialize_custom_binary_input_ports (void)
+{
+ custom_binary_input_port_type =
+ scm_make_port_type ("r6rs-custom-binary-input-port",
+ cbip_fill_input, NULL);
+
+ scm_set_port_seek (custom_binary_input_port_type, cbp_seek);
+ scm_set_port_close (custom_binary_input_port_type, cbp_close);
+}
+
+
+\f
+/* Binary input. */
+
+/* We currently don't support specific binary input ports. */
+#define SCM_VALIDATE_BINARY_INPUT_PORT SCM_VALIDATE_OPINPORT
+
+SCM_DEFINE (scm_get_u8, "get-u8", 1, 0, 0,
+ (SCM port),
+ "Read an octet from @var{port}, a binary input port, "
+ "blocking as necessary.")
+#define FUNC_NAME s_scm_get_u8
+{
+ SCM result;
+ int c_result;
+
+ SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+
+ c_result = scm_get_byte_or_eof (port);
+ if (c_result == EOF)
+ result = SCM_EOF_VAL;
+ else
+ result = SCM_I_MAKINUM ((unsigned char) c_result);
+
+ return result;
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_lookahead_u8, "lookahead-u8", 1, 0, 0,
+ (SCM port),
+ "Like @code{get-u8} but does not update @var{port} to "
+ "point past the octet.")
+#define FUNC_NAME s_scm_lookahead_u8
+{
+ int u8;
+ SCM result;
+
+ SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+
+ u8 = scm_peek_byte_or_eof (port);
+ if (u8 == EOF)
+ result = SCM_EOF_VAL;
+ else
+ result = SCM_I_MAKINUM ((scm_t_uint8) u8);
+
+ return result;
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_get_bytevector_n, "get-bytevector-n", 2, 0, 0,
+ (SCM port, SCM count),
+ "Read @var{count} octets from @var{port}, blocking as "
+ "necessary and return a bytevector containing the octets "
+ "read. If fewer bytes are available, a bytevector smaller "
+ "than @var{count} is returned.")
+#define FUNC_NAME s_scm_get_bytevector_n
+{
+ SCM result;
+ char *c_bv;
+ unsigned c_count;
+ size_t c_read;
+
+ SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+ c_count = scm_to_uint (count);
+
+ result = scm_c_make_bytevector (c_count);
+ c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (result);
+
+ if (SCM_LIKELY (c_count > 0))
+ /* XXX: `scm_c_read ()' does not update the port position. */
+ c_read = scm_c_read (port, c_bv, c_count);
+ else
+ /* Don't invoke `scm_c_read ()' since it may block. */
+ c_read = 0;
+
+ if ((c_read == 0) && (c_count > 0))
+ {
+ if (SCM_EOF_OBJECT_P (scm_peek_char (port)))
+ result = SCM_EOF_VAL;
+ else
+ result = scm_null_bytevector;
+ }
+ else
+ {
+ if (c_read < c_count)
+ result = scm_c_shrink_bytevector (result, c_read);
+ }
+
+ return result;
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_get_bytevector_n_x, "get-bytevector-n!", 4, 0, 0,
+ (SCM port, SCM bv, SCM start, SCM count),
+ "Read @var{count} bytes from @var{port} and store them "
+ "in @var{bv} starting at index @var{start}. Return either "
+ "the number of bytes actually read or the end-of-file "
+ "object.")
+#define FUNC_NAME s_scm_get_bytevector_n_x
+{
+ SCM result;
+ char *c_bv;
+ unsigned c_start, c_count, c_len;
+ size_t c_read;
+
+ SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+ SCM_VALIDATE_BYTEVECTOR (2, bv);
+ c_start = scm_to_uint (start);
+ c_count = scm_to_uint (count);
+
+ c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
+ c_len = SCM_BYTEVECTOR_LENGTH (bv);
+
+ if (SCM_UNLIKELY (c_start + c_count > c_len))
+ scm_out_of_range (FUNC_NAME, count);
+
+ if (SCM_LIKELY (c_count > 0))
+ c_read = scm_c_read (port, c_bv + c_start, c_count);
+ else
+ /* Don't invoke `scm_c_read ()' since it may block. */
+ c_read = 0;
+
+ if ((c_read == 0) && (c_count > 0))
+ {
+ if (SCM_EOF_OBJECT_P (scm_peek_char (port)))
+ result = SCM_EOF_VAL;
+ else
+ result = SCM_I_MAKINUM (0);
+ }
+ else
+ result = scm_from_size_t (c_read);
+
+ return result;
+}
+#undef FUNC_NAME
+
+
+SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
+ (SCM port),
+ "Read from @var{port}, blocking as necessary, until data "
+ "are available or and end-of-file is reached. Return either "
+ "a new bytevector containing the data read or the "
+ "end-of-file object.")
+#define FUNC_NAME s_scm_get_bytevector_some
+{
+ /* Read at least one byte, unless the end-of-file is already reached, and
+ read while characters are available (buffered). */
+
+ SCM result;
+ char *c_bv;
+ unsigned c_len;
+ size_t c_total;
+
+ SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+
+ c_len = 4096;
+ c_bv = (char *) scm_gc_malloc_pointerless (c_len, SCM_GC_BYTEVECTOR);
+ c_total = 0;
+
+ do
+ {
+ int c_chr;
+
+ if (c_total + 1 > c_len)
+ {
+ /* Grow the bytevector. */
+ c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_len * 2,
+ SCM_GC_BYTEVECTOR);
+ c_len *= 2;
+ }
+
+ /* We can't use `scm_c_read ()' since it blocks. */
+ c_chr = scm_getc (port);
+ if (c_chr != EOF)
+ {
+ c_bv[c_total] = (char) c_chr;
+ c_total++;
+ }
+ }
+ while ((scm_is_true (scm_char_ready_p (port)))
+ && (!SCM_EOF_OBJECT_P (scm_peek_char (port))));
+
+ if (c_total == 0)
+ {
+ result = SCM_EOF_VAL;
+ scm_gc_free (c_bv, c_len, SCM_GC_BYTEVECTOR);
+ }
+ else
+ {
+ if (c_len > c_total)
+ {
+ /* Shrink the bytevector. */
+ c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_total,
+ SCM_GC_BYTEVECTOR);
+ c_len = (unsigned) c_total;
+ }
+
+ result = scm_c_take_gc_bytevector ((signed char *) c_bv, c_len);
+ }
+
+ return result;
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_get_bytevector_all, "get-bytevector-all", 1, 0, 0,
+ (SCM port),
+ "Read from @var{port}, blocking as necessary, until "
+ "the end-of-file is reached. Return either "
+ "a new bytevector containing the data read or the "
+ "end-of-file object (if no data were available).")
+#define FUNC_NAME s_scm_get_bytevector_all
+{
+ SCM result;
+ char *c_bv;
+ unsigned c_len, c_count;
+ size_t c_read, c_total;
+
+ SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
+
+ c_len = c_count = 4096;
+ c_bv = (char *) scm_gc_malloc_pointerless (c_len, SCM_GC_BYTEVECTOR);
+ c_total = c_read = 0;
+
+ do
+ {
+ if (c_total + c_read > c_len)
+ {
+ /* Grow the bytevector. */
+ c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_len * 2,
+ SCM_GC_BYTEVECTOR);
+ c_count = c_len;
+ c_len *= 2;
+ }
+
+ /* `scm_c_read ()' blocks until C_COUNT bytes are available or EOF is
+ reached. */
+ c_read = scm_c_read (port, c_bv + c_total, c_count);
+ c_total += c_read, c_count -= c_read;
+ }
+ while (!SCM_EOF_OBJECT_P (scm_peek_char (port)));
+
+ if (c_total == 0)
+ {
+ result = SCM_EOF_VAL;
+ scm_gc_free (c_bv, c_len, SCM_GC_BYTEVECTOR);
+ }
+ else
+ {
+ if (c_len > c_total)
+ {
+ /* Shrink the bytevector. */
+ c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_total,
+ SCM_GC_BYTEVECTOR);
+ c_len = (unsigned) c_total;
+ }
+
+ result = scm_c_take_gc_bytevector ((signed char *) c_bv, c_len);
+ }
+
+ return result;
+}
+#undef FUNC_NAME
+
+
+\f
+/* Binary output. */
+
+/* We currently don't support specific binary input ports. */
+#define SCM_VALIDATE_BINARY_OUTPUT_PORT SCM_VALIDATE_OPOUTPORT
+
+
+SCM_DEFINE (scm_put_u8, "put-u8", 2, 0, 0,
+ (SCM port, SCM octet),
+ "Write @var{octet} to binary port @var{port}.")
+#define FUNC_NAME s_scm_put_u8
+{
+ scm_t_uint8 c_octet;
+
+ SCM_VALIDATE_BINARY_OUTPUT_PORT (1, port);
+ c_octet = scm_to_uint8 (octet);
+
+ scm_putc ((char) c_octet, port);
+
+ return SCM_UNSPECIFIED;
+}
+#undef FUNC_NAME
+
+SCM_DEFINE (scm_put_bytevector, "put-bytevector", 2, 2, 0,
+ (SCM port, SCM bv, SCM start, SCM count),
+ "Write the contents of @var{bv} to @var{port}, optionally "
+ "starting at index @var{start} and limiting to @var{count} "
+ "octets.")
+#define FUNC_NAME s_scm_put_bytevector
+{
+ char *c_bv;
+ unsigned c_start, c_count, c_len;
+
+ SCM_VALIDATE_BINARY_OUTPUT_PORT (1, port);
+ SCM_VALIDATE_BYTEVECTOR (2, bv);
+
+ c_len = SCM_BYTEVECTOR_LENGTH (bv);
+ c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
+
+ if (!scm_is_eq (start, SCM_UNDEFINED))
+ {
+ c_start = scm_to_uint (start);
+
+ if (!scm_is_eq (count, SCM_UNDEFINED))
+ {
+ c_count = scm_to_uint (count);
+ if (SCM_UNLIKELY (c_start + c_count > c_len))
+ scm_out_of_range (FUNC_NAME, count);
+ }
+ else
+ {
+ if (SCM_UNLIKELY (c_start >= c_len))
+ scm_out_of_range (FUNC_NAME, start);
+ else
+ c_count = c_len - c_start;
+ }
+ }
+ else
+ c_start = 0, c_count = c_len;
+
+ scm_c_write (port, c_bv + c_start, c_count);
+
+ return SCM_UNSPECIFIED;
+}
+#undef FUNC_NAME
+
+
+\f
+/* Bytevector output port ("bop" for short). */
+
+/* Implementation of "bops".
+
+ Each bop has an internal buffer, of type `scm_t_bop_buffer', attached to
+ it. The procedure returned along with the output port is actually an
+ applicable SMOB. The SMOB holds a reference to the port. When applied,
+ the SMOB swallows the port's internal buffer, turning it into a
+ bytevector, and resets it.
+
+ XXX: Access to a bop's internal buffer is not thread-safe. */
+
+static scm_t_bits bytevector_output_port_type = 0;
+
+SCM_SMOB (bytevector_output_port_procedure,
+ "r6rs-bytevector-output-port-procedure",
+ 0);
+
+#define SCM_GC_BOP "r6rs-bytevector-output-port"
+#define SCM_BOP_BUFFER_INITIAL_SIZE 4096
+
+/* Representation of a bop's internal buffer. */
+typedef struct
+{
+ size_t total_len;
+ size_t len;
+ size_t pos;
+ char *buffer;
+} scm_t_bop_buffer;
+
+
+/* Accessing a bop's buffer. */
+#define SCM_BOP_BUFFER(_port) \
+ ((scm_t_bop_buffer *) SCM_STREAM (_port))
+#define SCM_SET_BOP_BUFFER(_port, _buf) \
+ (SCM_SETSTREAM ((_port), (scm_t_bits) (_buf)))
+
+
+static inline void
+bop_buffer_init (scm_t_bop_buffer *buf)
+{
+ buf->total_len = buf->len = buf->pos = 0;
+ buf->buffer = NULL;
+}
+
+static inline void
+bop_buffer_grow (scm_t_bop_buffer *buf, size_t min_size)
+{
+ char *new_buf;
+ size_t new_size;
+
+ for (new_size = buf->total_len
+ ? buf->total_len : SCM_BOP_BUFFER_INITIAL_SIZE;
+ new_size < min_size;
+ new_size *= 2);
+
+ if (buf->buffer)
+ new_buf = scm_gc_realloc ((void *) buf->buffer, buf->total_len,
+ new_size, SCM_GC_BOP);
+ else
+ new_buf = scm_gc_malloc_pointerless (new_size, SCM_GC_BOP);
+
+ buf->buffer = new_buf;
+ buf->total_len = new_size;
+}
+
+static inline SCM
+make_bop (void)
+{
+ SCM port, bop_proc;
+ scm_t_port *c_port;
+ scm_t_bop_buffer *buf;
+ const unsigned long mode_bits = SCM_OPN | SCM_WRTNG;
+
+ scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
+
+ port = scm_new_port_table_entry (bytevector_output_port_type);
+ c_port = SCM_PTAB_ENTRY (port);
+
+ /* Match the expectation of `binary-port?'. */
+ c_port->encoding = NULL;
+
+ buf = (scm_t_bop_buffer *) scm_gc_malloc (sizeof (* buf), SCM_GC_BOP);
+ bop_buffer_init (buf);
+
+ c_port->write_buf = c_port->write_pos = c_port->write_end = NULL;
+ c_port->write_buf_size = 0;
+
+ SCM_SET_BOP_BUFFER (port, buf);
+
+ /* Mark PORT as open and writable. */
+ SCM_SET_CELL_TYPE (port, bytevector_output_port_type | mode_bits);
+
+ scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
+
+ /* Make the bop procedure. */
+ SCM_NEWSMOB (bop_proc, bytevector_output_port_procedure, buf);
+
+ return (scm_values (scm_list_2 (port, bop_proc)));
+}
+
+/* Write SIZE octets from DATA to PORT. */
+static void
+bop_write (SCM port, const void *data, size_t size)
+{
+ scm_t_bop_buffer *buf;
+
+ buf = SCM_BOP_BUFFER (port);
+
+ if (buf->pos + size > buf->total_len)
+ bop_buffer_grow (buf, buf->pos + size);
+
+ memcpy (buf->buffer + buf->pos, data, size);
+ buf->pos += size;
+ buf->len = (buf->len > buf->pos) ? buf->len : buf->pos;
+}
+
+static scm_t_off
+bop_seek (SCM port, scm_t_off offset, int whence)
+#define FUNC_NAME "bop_seek"
+{
+ scm_t_bop_buffer *buf;
+
+ buf = SCM_BOP_BUFFER (port);
+ switch (whence)
+ {
+ case SEEK_CUR:
+ offset += (scm_t_off) buf->pos;
+ /* Fall through. */
+
+ case SEEK_SET:
+ if (offset < 0 || (unsigned) offset > buf->len)
+ scm_out_of_range (FUNC_NAME, scm_from_int (offset));
+ else
+ buf->pos = offset;
+ break;
+
+ case SEEK_END:
+ if (offset < 0 || (unsigned) offset >= buf->len)
+ scm_out_of_range (FUNC_NAME, scm_from_int (offset));
+ else
+ buf->pos = buf->len - (offset + 1);
+ break;
+
+ default:
+ scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
+ "invalid `seek' parameter");
+ }
+
+ return buf->pos;
+}
+#undef FUNC_NAME
+
+/* Fetch data from a bop. */
+SCM_SMOB_APPLY (bytevector_output_port_procedure,
+ bop_proc_apply, 0, 0, 0, (SCM bop_proc))
+{
+ SCM bv;
+ scm_t_bop_buffer *buf, result_buf;
+
+ buf = (scm_t_bop_buffer *) SCM_SMOB_DATA (bop_proc);
+
+ result_buf = *buf;
+ bop_buffer_init (buf);
+
+ if (result_buf.len == 0)
+ bv = scm_c_take_gc_bytevector (NULL, 0);
+ else
+ {
+ if (result_buf.total_len > result_buf.len)
+ /* Shrink the buffer. */
+ result_buf.buffer = scm_gc_realloc ((void *) result_buf.buffer,
+ result_buf.total_len,
+ result_buf.len,
+ SCM_GC_BOP);
+
+ bv = scm_c_take_gc_bytevector ((signed char *) result_buf.buffer,
+ result_buf.len);
+ }
+
+ return bv;
+}
+
+SCM_DEFINE (scm_open_bytevector_output_port,
+ "open-bytevector-output-port", 0, 1, 0,
+ (SCM transcoder),
+ "Return two values: an output port and a procedure. The latter "
+ "should be called with zero arguments to obtain a bytevector "
+ "containing the data accumulated by the port.")
+#define FUNC_NAME s_scm_open_bytevector_output_port
+{
+ if (!SCM_UNBNDP (transcoder) && !scm_is_false (transcoder))
+ transcoders_not_implemented ();
+
+ return (make_bop ());
+}
+#undef FUNC_NAME
+
+static inline void
+initialize_bytevector_output_ports (void)
+{
+ bytevector_output_port_type =
+ scm_make_port_type ("r6rs-bytevector-output-port",
+ NULL, bop_write);
+
+ scm_set_port_seek (bytevector_output_port_type, bop_seek);
+}
+
+\f
+/* Custom binary output port ("cbop" for short). */
+
+static scm_t_bits custom_binary_output_port_type;
+
+/* Return the various procedures of PORT. */
+#define SCM_CBOP_WRITE_PROC(_port) \
+ SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 0)
+
+
+static inline SCM
+make_cbop (SCM write_proc, SCM get_position_proc,
+ SCM set_position_proc, SCM close_proc)
+{
+ SCM port, method_vector;
+ scm_t_port *c_port;
+ const unsigned long mode_bits = SCM_OPN | SCM_WRTNG;
+
+ /* Store the various methods and bytevector in a vector. */
+ method_vector = scm_c_make_vector (4, SCM_BOOL_F);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 0, write_proc);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 1, get_position_proc);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 2, set_position_proc);
+ SCM_SIMPLE_VECTOR_SET (method_vector, 3, close_proc);
+
+ scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
+
+ port = scm_new_port_table_entry (custom_binary_output_port_type);
+ c_port = SCM_PTAB_ENTRY (port);
+
+ /* Match the expectation of `binary-port?'. */
+ c_port->encoding = NULL;
+
+ /* Attach it the method vector. */
+ SCM_SETSTREAM (port, SCM_UNPACK (method_vector));
+
+ /* Have the port directly access the buffer (bytevector). */
+ c_port->write_buf = c_port->write_pos = c_port->write_end = NULL;
+ c_port->write_buf_size = c_port->read_buf_size = 0;
+
+ /* Mark PORT as open, writable and unbuffered. */
+ SCM_SET_CELL_TYPE (port, custom_binary_output_port_type | mode_bits);
+
+ scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
+
+ return port;
+}
+
+/* Write SIZE octets from DATA to PORT. */
+static void
+cbop_write (SCM port, const void *data, size_t size)
+#define FUNC_NAME "cbop_write"
+{
+ long int c_result;
+ size_t c_written;
+ SCM bv, write_proc, result;
+
+ /* XXX: Allocating a new bytevector at each `write' call is inefficient,
+ but necessary since (1) we don't control the lifetime of the buffer
+ pointed to by DATA, and (2) the `write!' procedure could capture the
+ bytevector it is passed. */
+ bv = scm_c_make_bytevector (size);
+ memcpy (SCM_BYTEVECTOR_CONTENTS (bv), data, size);
+
+ write_proc = SCM_CBOP_WRITE_PROC (port);
+
+ /* Since the `write' procedure of Guile's ports has type `void', it must
+ try hard to write exactly SIZE bytes, regardless of how many bytes the
+ sink can handle. */
+ for (c_written = 0;
+ c_written < size;
+ c_written += c_result)
+ {
+ result = scm_call_3 (write_proc, bv,
+ scm_from_size_t (c_written),
+ scm_from_size_t (size - c_written));
+
+ c_result = scm_to_long (result);
+ if (SCM_UNLIKELY (c_result < 0
+ || (size_t) c_result > (size - c_written)))
+ scm_wrong_type_arg_msg (FUNC_NAME, 0, result,
+ "R6RS custom binary output port `write!' "
+ "returned a incorrect integer");
+ }
+}
+#undef FUNC_NAME
+
+
+SCM_DEFINE (scm_make_custom_binary_output_port,
+ "make-custom-binary-output-port", 5, 0, 0,
+ (SCM id, SCM write_proc, SCM get_position_proc,
+ SCM set_position_proc, SCM close_proc),
+ "Return a new custom binary output port whose output is drained "
+ "by invoking @var{write_proc} and passing it a bytevector, an "
+ "index where octets should be written, and an octet count.")
+#define FUNC_NAME s_scm_make_custom_binary_output_port
+{
+ SCM_VALIDATE_STRING (1, id);
+ SCM_VALIDATE_PROC (2, write_proc);
+
+ if (!scm_is_false (get_position_proc))
+ SCM_VALIDATE_PROC (3, get_position_proc);
+
+ if (!scm_is_false (set_position_proc))
+ SCM_VALIDATE_PROC (4, set_position_proc);
+
+ if (!scm_is_false (close_proc))
+ SCM_VALIDATE_PROC (5, close_proc);
+
+ return (make_cbop (write_proc, get_position_proc, set_position_proc,
+ close_proc));
+}
+#undef FUNC_NAME
+
+
+/* Instantiate the custom binary output port type. */
+static inline void
+initialize_custom_binary_output_ports (void)
+{
+ custom_binary_output_port_type =
+ scm_make_port_type ("r6rs-custom-binary-output-port",
+ NULL, cbop_write);
+
+ scm_set_port_seek (custom_binary_output_port_type, cbp_seek);
+ scm_set_port_close (custom_binary_output_port_type, cbp_close);
+}
+
+\f
+/* Textual I/O */
+
+SCM_DEFINE (scm_get_string_n_x,
+ "get-string-n!", 4, 0, 0,
+ (SCM port, SCM str, SCM start, SCM count),
+ "Read up to @var{count} characters from @var{port} into "
+ "@var{str}, starting at @var{start}. If no characters "
+ "can be read before the end of file is encountered, the end "
+ "of file object is returned. Otherwise, the number of "
+ "characters read is returned.")
+#define FUNC_NAME s_scm_get_string_n_x
+{
+ size_t c_start, c_count, c_len, c_end, j;
+ scm_t_wchar c;
+
+ SCM_VALIDATE_OPINPORT (1, port);
+ SCM_VALIDATE_STRING (2, str);
+ c_len = scm_c_string_length (str);
+ c_start = scm_to_size_t (start);
+ c_count = scm_to_size_t (count);
+ c_end = c_start + c_count;
+
+ if (SCM_UNLIKELY (c_end > c_len))
+ scm_out_of_range (FUNC_NAME, count);
+
+ for (j = c_start; j < c_end; j++)
+ {
+ c = scm_getc (port);
+ if (c == EOF)
+ {
+ size_t chars_read = j - c_start;
+ return chars_read == 0 ? SCM_EOF_VAL : scm_from_size_t (chars_read);
+ }
+ scm_c_string_set_x (str, j, SCM_MAKE_CHAR (c));
+ }
+ return count;
+}
+#undef FUNC_NAME
+
+\f
+/* Initialization. */
+
+void
+scm_register_binary_ports (void)
+{
+ scm_c_register_extension ("libguile-" SCM_EFFECTIVE_VERSION,
+ "scm_init_binary_ports",
+ (scm_t_extension_init_func) scm_init_binary_ports,
+ NULL);
+}
+
+void
+scm_init_binary_ports (void)
+{
+#include "libguile/binary-ports.x"
+
+ initialize_bytevector_input_ports ();
+ initialize_custom_binary_input_ports ();
+ initialize_bytevector_output_ports ();
+ initialize_custom_binary_output_ports ();
+}
diff --git a/libguile/binary-ports.h b/libguile/binary-ports.h
new file mode 100644
index 0000000..efd7ed9
--- /dev/null
+++ b/libguile/binary-ports.h
@@ -0,0 +1,46 @@
+#ifndef SCM_BINARY_PORTS_H
+#define SCM_BINARY_PORTS_H
+
+/* Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
+ *
+ * This library is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU Lesser General Public License
+ * as published by the Free Software Foundation; either version 3 of
+ * the License, or (at your option) any later version.
+ *
+ * This library is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
+ * Lesser General Public License for more details.
+ *
+ * You should have received a copy of the GNU Lesser General Public
+ * License along with this library; if not, write to the Free Software
+ * Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA
+ * 02110-1301 USA
+ */
+
+\f
+
+#include "libguile/__scm.h"
+
+/* Binary I/O Ports. */
+
+SCM_API SCM scm_eof_object (void);
+SCM_API SCM scm_open_bytevector_input_port (SCM, SCM);
+SCM_API SCM scm_make_custom_binary_input_port (SCM, SCM, SCM, SCM, SCM);
+SCM_API SCM scm_get_u8 (SCM);
+SCM_API SCM scm_lookahead_u8 (SCM);
+SCM_API SCM scm_get_bytevector_n (SCM, SCM);
+SCM_API SCM scm_get_bytevector_n_x (SCM, SCM, SCM, SCM);
+SCM_API SCM scm_get_bytevector_some (SCM);
+SCM_API SCM scm_get_bytevector_all (SCM);
+SCM_API SCM scm_put_u8 (SCM, SCM);
+SCM_API SCM scm_put_bytevector (SCM, SCM, SCM, SCM);
+SCM_API SCM scm_open_bytevector_output_port (SCM);
+SCM_API SCM scm_make_custom_binary_output_port (SCM, SCM, SCM, SCM, SCM);
+SCM_API SCM scm_get_string_n_x (SCM, SCM, SCM, SCM);
+
+SCM_API void scm_init_binary_ports (void);
+SCM_INTERNAL void scm_register_binary_ports (void);
+
+#endif /* SCM_BINARY_PORTS_H */
diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
index e867429..19dea8d 100644
--- a/libguile/r6rs-ports.c
+++ b/libguile/r6rs-ports.c
@@ -1,4 +1,4 @@
-/* Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
+/* Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -20,1073 +20,8 @@
# include <config.h>
#endif
-#ifdef HAVE_UNISTD_H
-# include <unistd.h>
-#endif
-
-#include <string.h>
-#include <stdio.h>
-#include <assert.h>
-
#include "libguile/_scm.h"
-#include "libguile/bytevectors.h"
-#include "libguile/chars.h"
-#include "libguile/eval.h"
#include "libguile/r6rs-ports.h"
-#include "libguile/strings.h"
-#include "libguile/validate.h"
-#include "libguile/values.h"
-#include "libguile/vectors.h"
-
-
-\f
-/* Unimplemented features. */
-
-
-/* Transoders are currently not implemented since Guile 1.8 is not
- Unicode-capable. Thus, most of the code here assumes the use of the
- binary transcoder. */
-static inline void
-transcoders_not_implemented (void)
-{
- fprintf (stderr, "%s: warning: transcoders not implemented\n",
- PACKAGE_NAME);
-}
-
-\f
-/* End-of-file object. */
-
-SCM_DEFINE (scm_eof_object, "eof-object", 0, 0, 0,
- (void),
- "Return the end-of-file object.")
-#define FUNC_NAME s_scm_eof_object
-{
- return (SCM_EOF_VAL);
-}
-#undef FUNC_NAME
-
-\f
-/* Input ports. */
-
-#ifndef MIN
-# define MIN(a,b) ((a) < (b) ? (a) : (b))
-#endif
-
-/* Bytevector input ports or "bip" for short. */
-static scm_t_bits bytevector_input_port_type = 0;
-
-static inline SCM
-make_bip (SCM bv)
-{
- SCM port;
- char *c_bv;
- unsigned c_len;
- scm_t_port *c_port;
- const unsigned long mode_bits = SCM_OPN | SCM_RDNG;
-
- scm_i_scm_pthread_mutex_lock (&scm_i_port_table_mutex);
-
- port = scm_new_port_table_entry (bytevector_input_port_type);
- c_port = SCM_PTAB_ENTRY (port);
-
- /* Match the expectation of `binary-port?'. */
- c_port->encoding = NULL;
-
- /* Prevent BV from being GC'd. */
- SCM_SETSTREAM (port, SCM_UNPACK (bv));
-
- /* Have the port directly access the bytevector. */
- c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
- c_len = SCM_BYTEVECTOR_LENGTH (bv);
-
- c_port->read_pos = c_port->read_buf = (unsigned char *) c_bv;
- c_port->read_end = (unsigned char *) c_bv + c_len;
- c_port->read_buf_size = c_len;
-
- /* Mark PORT as open, readable and unbuffered (hmm, how elegant...). */
- SCM_SET_CELL_TYPE (port, bytevector_input_port_type | mode_bits);
-
- scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
-
- return port;
-}
-
-static int
-bip_fill_input (SCM port)
-{
- int result;
- scm_t_port *c_port = SCM_PTAB_ENTRY (port);
-
- if (c_port->read_pos >= c_port->read_end)
- result = EOF;
- else
- result = (int) *c_port->read_pos;
-
- return result;
-}
-
-static scm_t_off
-bip_seek (SCM port, scm_t_off offset, int whence)
-#define FUNC_NAME "bip_seek"
-{
- scm_t_off c_result = 0;
- scm_t_port *c_port = SCM_PTAB_ENTRY (port);
-
- switch (whence)
- {
- case SEEK_CUR:
- offset += c_port->read_pos - c_port->read_buf;
- /* Fall through. */
-
- case SEEK_SET:
- if (c_port->read_buf + offset <= c_port->read_end)
- {
- c_port->read_pos = c_port->read_buf + offset;
- c_result = offset;
- }
- else
- scm_out_of_range (FUNC_NAME, scm_from_int (offset));
- break;
-
- case SEEK_END:
- if (c_port->read_end - offset >= c_port->read_buf)
- {
- c_port->read_pos = c_port->read_end - offset;
- c_result = c_port->read_pos - c_port->read_buf;
- }
- else
- scm_out_of_range (FUNC_NAME, scm_from_int (offset));
- break;
-
- default:
- scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
- "invalid `seek' parameter");
- }
-
- return c_result;
-}
-#undef FUNC_NAME
-
-
-/* Instantiate the bytevector input port type. */
-static inline void
-initialize_bytevector_input_ports (void)
-{
- bytevector_input_port_type =
- scm_make_port_type ("r6rs-bytevector-input-port", bip_fill_input,
- NULL);
-
- scm_set_port_seek (bytevector_input_port_type, bip_seek);
-}
-
-
-SCM_DEFINE (scm_open_bytevector_input_port,
- "open-bytevector-input-port", 1, 1, 0,
- (SCM bv, SCM transcoder),
- "Return an input port whose contents are drawn from "
- "bytevector @var{bv}.")
-#define FUNC_NAME s_scm_open_bytevector_input_port
-{
- SCM_VALIDATE_BYTEVECTOR (1, bv);
- if (!SCM_UNBNDP (transcoder) && !scm_is_false (transcoder))
- transcoders_not_implemented ();
-
- return (make_bip (bv));
-}
-#undef FUNC_NAME
-
-\f
-/* Custom binary ports. The following routines are shared by input and
- output custom binary ports. */
-
-#define SCM_CBP_GET_POSITION_PROC(_port) \
- SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 1)
-#define SCM_CBP_SET_POSITION_PROC(_port) \
- SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 2)
-#define SCM_CBP_CLOSE_PROC(_port) \
- SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 3)
-
-static scm_t_off
-cbp_seek (SCM port, scm_t_off offset, int whence)
-#define FUNC_NAME "cbp_seek"
-{
- SCM result;
- scm_t_off c_result = 0;
-
- switch (whence)
- {
- case SEEK_CUR:
- {
- SCM get_position_proc;
-
- get_position_proc = SCM_CBP_GET_POSITION_PROC (port);
- if (SCM_LIKELY (scm_is_true (get_position_proc)))
- result = scm_call_0 (get_position_proc);
- else
- scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
- "R6RS custom binary port does not "
- "support `port-position'");
-
- offset += scm_to_int (result);
- /* Fall through. */
- }
-
- case SEEK_SET:
- {
- SCM set_position_proc;
-
- set_position_proc = SCM_CBP_SET_POSITION_PROC (port);
- if (SCM_LIKELY (scm_is_true (set_position_proc)))
- result = scm_call_1 (set_position_proc, scm_from_int (offset));
- else
- scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
- "R6RS custom binary port does not "
- "support `set-port-position!'");
-
- /* Assuming setting the position succeeded. */
- c_result = offset;
- break;
- }
-
- default:
- /* `SEEK_END' cannot be supported. */
- scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
- "R6RS custom binary ports do not "
- "support `SEEK_END'");
- }
-
- return c_result;
-}
-#undef FUNC_NAME
-
-static int
-cbp_close (SCM port)
-{
- SCM close_proc;
-
- close_proc = SCM_CBP_CLOSE_PROC (port);
- if (scm_is_true (close_proc))
- /* Invoke the `close' thunk. */
- scm_call_0 (close_proc);
-
- return 1;
-}
-
-\f
-/* Custom binary input port ("cbip" for short). */
-
-static scm_t_bits custom_binary_input_port_type = 0;
-
-/* Size of the buffer embedded in custom binary input ports. */
-#define CBIP_BUFFER_SIZE 4096
-
-/* Return the bytevector associated with PORT. */
-#define SCM_CBIP_BYTEVECTOR(_port) \
- SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 4)
-
-/* Return the various procedures of PORT. */
-#define SCM_CBIP_READ_PROC(_port) \
- SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 0)
-
-
-static inline SCM
-make_cbip (SCM read_proc, SCM get_position_proc,
- SCM set_position_proc, SCM close_proc)
-{
- SCM port, bv, method_vector;
- char *c_bv;
- unsigned c_len;
- scm_t_port *c_port;
- const unsigned long mode_bits = SCM_OPN | SCM_RDNG;
-
- /* Use a bytevector as the underlying buffer. */
- c_len = CBIP_BUFFER_SIZE;
- bv = scm_c_make_bytevector (c_len);
- c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
-
- /* Store the various methods and bytevector in a vector. */
- method_vector = scm_c_make_vector (5, SCM_BOOL_F);
- SCM_SIMPLE_VECTOR_SET (method_vector, 4, bv);
- SCM_SIMPLE_VECTOR_SET (method_vector, 0, read_proc);
- SCM_SIMPLE_VECTOR_SET (method_vector, 1, get_position_proc);
- SCM_SIMPLE_VECTOR_SET (method_vector, 2, set_position_proc);
- SCM_SIMPLE_VECTOR_SET (method_vector, 3, close_proc);
-
- scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
-
- port = scm_new_port_table_entry (custom_binary_input_port_type);
- c_port = SCM_PTAB_ENTRY (port);
-
- /* Match the expectation of `binary-port?'. */
- c_port->encoding = NULL;
-
- /* Attach it the method vector. */
- SCM_SETSTREAM (port, SCM_UNPACK (method_vector));
-
- /* Have the port directly access the buffer (bytevector). */
- c_port->read_pos = c_port->read_buf = (unsigned char *) c_bv;
- c_port->read_end = (unsigned char *) c_bv;
- c_port->read_buf_size = c_len;
-
- /* Mark PORT as open, readable and unbuffered (hmm, how elegant...). */
- SCM_SET_CELL_TYPE (port, custom_binary_input_port_type | mode_bits);
-
- scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
-
- return port;
-}
-
-static int
-cbip_fill_input (SCM port)
-#define FUNC_NAME "cbip_fill_input"
-{
- int result;
- scm_t_port *c_port = SCM_PTAB_ENTRY (port);
-
- again:
- if (c_port->read_pos >= c_port->read_end)
- {
- /* Invoke the user's `read!' procedure. */
- unsigned c_octets;
- SCM bv, read_proc, octets;
-
- /* Use the bytevector associated with PORT as the buffer passed to the
- `read!' procedure, thereby avoiding additional allocations. */
- bv = SCM_CBIP_BYTEVECTOR (port);
- read_proc = SCM_CBIP_READ_PROC (port);
-
- /* The assumption here is that C_PORT's internal buffer wasn't changed
- behind our back. */
- assert (c_port->read_buf ==
- (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv));
- assert ((unsigned) c_port->read_buf_size
- == SCM_BYTEVECTOR_LENGTH (bv));
-
- octets = scm_call_3 (read_proc, bv, SCM_INUM0,
- SCM_I_MAKINUM (CBIP_BUFFER_SIZE));
- c_octets = scm_to_uint (octets);
-
- c_port->read_pos = (unsigned char *) SCM_BYTEVECTOR_CONTENTS (bv);
- c_port->read_end = (unsigned char *) c_port->read_pos + c_octets;
-
- if (c_octets > 0)
- goto again;
- else
- result = EOF;
- }
- else
- result = (int) *c_port->read_pos;
-
- return result;
-}
-#undef FUNC_NAME
-
-
-SCM_DEFINE (scm_make_custom_binary_input_port,
- "make-custom-binary-input-port", 5, 0, 0,
- (SCM id, SCM read_proc, SCM get_position_proc,
- SCM set_position_proc, SCM close_proc),
- "Return a new custom binary input port whose input is drained "
- "by invoking @var{read_proc} and passing it a bytevector, an "
- "index where octets should be written, and an octet count.")
-#define FUNC_NAME s_scm_make_custom_binary_input_port
-{
- SCM_VALIDATE_STRING (1, id);
- SCM_VALIDATE_PROC (2, read_proc);
-
- if (!scm_is_false (get_position_proc))
- SCM_VALIDATE_PROC (3, get_position_proc);
-
- if (!scm_is_false (set_position_proc))
- SCM_VALIDATE_PROC (4, set_position_proc);
-
- if (!scm_is_false (close_proc))
- SCM_VALIDATE_PROC (5, close_proc);
-
- return (make_cbip (read_proc, get_position_proc, set_position_proc,
- close_proc));
-}
-#undef FUNC_NAME
-
-
-/* Instantiate the custom binary input port type. */
-static inline void
-initialize_custom_binary_input_ports (void)
-{
- custom_binary_input_port_type =
- scm_make_port_type ("r6rs-custom-binary-input-port",
- cbip_fill_input, NULL);
-
- scm_set_port_seek (custom_binary_input_port_type, cbp_seek);
- scm_set_port_close (custom_binary_input_port_type, cbp_close);
-}
-
-
-\f
-/* Binary input. */
-
-/* We currently don't support specific binary input ports. */
-#define SCM_VALIDATE_BINARY_INPUT_PORT SCM_VALIDATE_OPINPORT
-
-SCM_DEFINE (scm_get_u8, "get-u8", 1, 0, 0,
- (SCM port),
- "Read an octet from @var{port}, a binary input port, "
- "blocking as necessary.")
-#define FUNC_NAME s_scm_get_u8
-{
- SCM result;
- int c_result;
-
- SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
-
- c_result = scm_get_byte_or_eof (port);
- if (c_result == EOF)
- result = SCM_EOF_VAL;
- else
- result = SCM_I_MAKINUM ((unsigned char) c_result);
-
- return result;
-}
-#undef FUNC_NAME
-
-SCM_DEFINE (scm_lookahead_u8, "lookahead-u8", 1, 0, 0,
- (SCM port),
- "Like @code{get-u8} but does not update @var{port} to "
- "point past the octet.")
-#define FUNC_NAME s_scm_lookahead_u8
-{
- int u8;
- SCM result;
-
- SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
-
- u8 = scm_peek_byte_or_eof (port);
- if (u8 == EOF)
- result = SCM_EOF_VAL;
- else
- result = SCM_I_MAKINUM ((scm_t_uint8) u8);
-
- return result;
-}
-#undef FUNC_NAME
-
-SCM_DEFINE (scm_get_bytevector_n, "get-bytevector-n", 2, 0, 0,
- (SCM port, SCM count),
- "Read @var{count} octets from @var{port}, blocking as "
- "necessary and return a bytevector containing the octets "
- "read. If fewer bytes are available, a bytevector smaller "
- "than @var{count} is returned.")
-#define FUNC_NAME s_scm_get_bytevector_n
-{
- SCM result;
- char *c_bv;
- unsigned c_count;
- size_t c_read;
-
- SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
- c_count = scm_to_uint (count);
-
- result = scm_c_make_bytevector (c_count);
- c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (result);
-
- if (SCM_LIKELY (c_count > 0))
- /* XXX: `scm_c_read ()' does not update the port position. */
- c_read = scm_c_read (port, c_bv, c_count);
- else
- /* Don't invoke `scm_c_read ()' since it may block. */
- c_read = 0;
-
- if ((c_read == 0) && (c_count > 0))
- {
- if (SCM_EOF_OBJECT_P (scm_peek_char (port)))
- result = SCM_EOF_VAL;
- else
- result = scm_null_bytevector;
- }
- else
- {
- if (c_read < c_count)
- result = scm_c_shrink_bytevector (result, c_read);
- }
-
- return result;
-}
-#undef FUNC_NAME
-
-SCM_DEFINE (scm_get_bytevector_n_x, "get-bytevector-n!", 4, 0, 0,
- (SCM port, SCM bv, SCM start, SCM count),
- "Read @var{count} bytes from @var{port} and store them "
- "in @var{bv} starting at index @var{start}. Return either "
- "the number of bytes actually read or the end-of-file "
- "object.")
-#define FUNC_NAME s_scm_get_bytevector_n_x
-{
- SCM result;
- char *c_bv;
- unsigned c_start, c_count, c_len;
- size_t c_read;
-
- SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
- SCM_VALIDATE_BYTEVECTOR (2, bv);
- c_start = scm_to_uint (start);
- c_count = scm_to_uint (count);
-
- c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
- c_len = SCM_BYTEVECTOR_LENGTH (bv);
-
- if (SCM_UNLIKELY (c_start + c_count > c_len))
- scm_out_of_range (FUNC_NAME, count);
-
- if (SCM_LIKELY (c_count > 0))
- c_read = scm_c_read (port, c_bv + c_start, c_count);
- else
- /* Don't invoke `scm_c_read ()' since it may block. */
- c_read = 0;
-
- if ((c_read == 0) && (c_count > 0))
- {
- if (SCM_EOF_OBJECT_P (scm_peek_char (port)))
- result = SCM_EOF_VAL;
- else
- result = SCM_I_MAKINUM (0);
- }
- else
- result = scm_from_size_t (c_read);
-
- return result;
-}
-#undef FUNC_NAME
-
-
-SCM_DEFINE (scm_get_bytevector_some, "get-bytevector-some", 1, 0, 0,
- (SCM port),
- "Read from @var{port}, blocking as necessary, until data "
- "are available or and end-of-file is reached. Return either "
- "a new bytevector containing the data read or the "
- "end-of-file object.")
-#define FUNC_NAME s_scm_get_bytevector_some
-{
- /* Read at least one byte, unless the end-of-file is already reached, and
- read while characters are available (buffered). */
-
- SCM result;
- char *c_bv;
- unsigned c_len;
- size_t c_total;
-
- SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
-
- c_len = 4096;
- c_bv = (char *) scm_gc_malloc_pointerless (c_len, SCM_GC_BYTEVECTOR);
- c_total = 0;
-
- do
- {
- int c_chr;
-
- if (c_total + 1 > c_len)
- {
- /* Grow the bytevector. */
- c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_len * 2,
- SCM_GC_BYTEVECTOR);
- c_len *= 2;
- }
-
- /* We can't use `scm_c_read ()' since it blocks. */
- c_chr = scm_getc (port);
- if (c_chr != EOF)
- {
- c_bv[c_total] = (char) c_chr;
- c_total++;
- }
- }
- while ((scm_is_true (scm_char_ready_p (port)))
- && (!SCM_EOF_OBJECT_P (scm_peek_char (port))));
-
- if (c_total == 0)
- {
- result = SCM_EOF_VAL;
- scm_gc_free (c_bv, c_len, SCM_GC_BYTEVECTOR);
- }
- else
- {
- if (c_len > c_total)
- {
- /* Shrink the bytevector. */
- c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_total,
- SCM_GC_BYTEVECTOR);
- c_len = (unsigned) c_total;
- }
-
- result = scm_c_take_gc_bytevector ((signed char *) c_bv, c_len);
- }
-
- return result;
-}
-#undef FUNC_NAME
-
-SCM_DEFINE (scm_get_bytevector_all, "get-bytevector-all", 1, 0, 0,
- (SCM port),
- "Read from @var{port}, blocking as necessary, until "
- "the end-of-file is reached. Return either "
- "a new bytevector containing the data read or the "
- "end-of-file object (if no data were available).")
-#define FUNC_NAME s_scm_get_bytevector_all
-{
- SCM result;
- char *c_bv;
- unsigned c_len, c_count;
- size_t c_read, c_total;
-
- SCM_VALIDATE_BINARY_INPUT_PORT (1, port);
-
- c_len = c_count = 4096;
- c_bv = (char *) scm_gc_malloc_pointerless (c_len, SCM_GC_BYTEVECTOR);
- c_total = c_read = 0;
-
- do
- {
- if (c_total + c_read > c_len)
- {
- /* Grow the bytevector. */
- c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_len * 2,
- SCM_GC_BYTEVECTOR);
- c_count = c_len;
- c_len *= 2;
- }
-
- /* `scm_c_read ()' blocks until C_COUNT bytes are available or EOF is
- reached. */
- c_read = scm_c_read (port, c_bv + c_total, c_count);
- c_total += c_read, c_count -= c_read;
- }
- while (!SCM_EOF_OBJECT_P (scm_peek_char (port)));
-
- if (c_total == 0)
- {
- result = SCM_EOF_VAL;
- scm_gc_free (c_bv, c_len, SCM_GC_BYTEVECTOR);
- }
- else
- {
- if (c_len > c_total)
- {
- /* Shrink the bytevector. */
- c_bv = (char *) scm_gc_realloc (c_bv, c_len, c_total,
- SCM_GC_BYTEVECTOR);
- c_len = (unsigned) c_total;
- }
-
- result = scm_c_take_gc_bytevector ((signed char *) c_bv, c_len);
- }
-
- return result;
-}
-#undef FUNC_NAME
-
-
-\f
-/* Binary output. */
-
-/* We currently don't support specific binary input ports. */
-#define SCM_VALIDATE_BINARY_OUTPUT_PORT SCM_VALIDATE_OPOUTPORT
-
-
-SCM_DEFINE (scm_put_u8, "put-u8", 2, 0, 0,
- (SCM port, SCM octet),
- "Write @var{octet} to binary port @var{port}.")
-#define FUNC_NAME s_scm_put_u8
-{
- scm_t_uint8 c_octet;
-
- SCM_VALIDATE_BINARY_OUTPUT_PORT (1, port);
- c_octet = scm_to_uint8 (octet);
-
- scm_putc ((char) c_octet, port);
-
- return SCM_UNSPECIFIED;
-}
-#undef FUNC_NAME
-
-SCM_DEFINE (scm_put_bytevector, "put-bytevector", 2, 2, 0,
- (SCM port, SCM bv, SCM start, SCM count),
- "Write the contents of @var{bv} to @var{port}, optionally "
- "starting at index @var{start} and limiting to @var{count} "
- "octets.")
-#define FUNC_NAME s_scm_put_bytevector
-{
- char *c_bv;
- unsigned c_start, c_count, c_len;
-
- SCM_VALIDATE_BINARY_OUTPUT_PORT (1, port);
- SCM_VALIDATE_BYTEVECTOR (2, bv);
-
- c_len = SCM_BYTEVECTOR_LENGTH (bv);
- c_bv = (char *) SCM_BYTEVECTOR_CONTENTS (bv);
-
- if (!scm_is_eq (start, SCM_UNDEFINED))
- {
- c_start = scm_to_uint (start);
-
- if (!scm_is_eq (count, SCM_UNDEFINED))
- {
- c_count = scm_to_uint (count);
- if (SCM_UNLIKELY (c_start + c_count > c_len))
- scm_out_of_range (FUNC_NAME, count);
- }
- else
- {
- if (SCM_UNLIKELY (c_start >= c_len))
- scm_out_of_range (FUNC_NAME, start);
- else
- c_count = c_len - c_start;
- }
- }
- else
- c_start = 0, c_count = c_len;
-
- scm_c_write (port, c_bv + c_start, c_count);
-
- return SCM_UNSPECIFIED;
-}
-#undef FUNC_NAME
-
-
-\f
-/* Bytevector output port ("bop" for short). */
-
-/* Implementation of "bops".
-
- Each bop has an internal buffer, of type `scm_t_bop_buffer', attached to
- it. The procedure returned along with the output port is actually an
- applicable SMOB. The SMOB holds a reference to the port. When applied,
- the SMOB swallows the port's internal buffer, turning it into a
- bytevector, and resets it.
-
- XXX: Access to a bop's internal buffer is not thread-safe. */
-
-static scm_t_bits bytevector_output_port_type = 0;
-
-SCM_SMOB (bytevector_output_port_procedure,
- "r6rs-bytevector-output-port-procedure",
- 0);
-
-#define SCM_GC_BOP "r6rs-bytevector-output-port"
-#define SCM_BOP_BUFFER_INITIAL_SIZE 4096
-
-/* Representation of a bop's internal buffer. */
-typedef struct
-{
- size_t total_len;
- size_t len;
- size_t pos;
- char *buffer;
-} scm_t_bop_buffer;
-
-
-/* Accessing a bop's buffer. */
-#define SCM_BOP_BUFFER(_port) \
- ((scm_t_bop_buffer *) SCM_STREAM (_port))
-#define SCM_SET_BOP_BUFFER(_port, _buf) \
- (SCM_SETSTREAM ((_port), (scm_t_bits) (_buf)))
-
-
-static inline void
-bop_buffer_init (scm_t_bop_buffer *buf)
-{
- buf->total_len = buf->len = buf->pos = 0;
- buf->buffer = NULL;
-}
-
-static inline void
-bop_buffer_grow (scm_t_bop_buffer *buf, size_t min_size)
-{
- char *new_buf;
- size_t new_size;
-
- for (new_size = buf->total_len
- ? buf->total_len : SCM_BOP_BUFFER_INITIAL_SIZE;
- new_size < min_size;
- new_size *= 2);
-
- if (buf->buffer)
- new_buf = scm_gc_realloc ((void *) buf->buffer, buf->total_len,
- new_size, SCM_GC_BOP);
- else
- new_buf = scm_gc_malloc_pointerless (new_size, SCM_GC_BOP);
-
- buf->buffer = new_buf;
- buf->total_len = new_size;
-}
-
-static inline SCM
-make_bop (void)
-{
- SCM port, bop_proc;
- scm_t_port *c_port;
- scm_t_bop_buffer *buf;
- const unsigned long mode_bits = SCM_OPN | SCM_WRTNG;
-
- scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
-
- port = scm_new_port_table_entry (bytevector_output_port_type);
- c_port = SCM_PTAB_ENTRY (port);
-
- /* Match the expectation of `binary-port?'. */
- c_port->encoding = NULL;
-
- buf = (scm_t_bop_buffer *) scm_gc_malloc (sizeof (* buf), SCM_GC_BOP);
- bop_buffer_init (buf);
-
- c_port->write_buf = c_port->write_pos = c_port->write_end = NULL;
- c_port->write_buf_size = 0;
-
- SCM_SET_BOP_BUFFER (port, buf);
-
- /* Mark PORT as open and writable. */
- SCM_SET_CELL_TYPE (port, bytevector_output_port_type | mode_bits);
-
- scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
-
- /* Make the bop procedure. */
- SCM_NEWSMOB (bop_proc, bytevector_output_port_procedure, buf);
-
- return (scm_values (scm_list_2 (port, bop_proc)));
-}
-
-/* Write SIZE octets from DATA to PORT. */
-static void
-bop_write (SCM port, const void *data, size_t size)
-{
- scm_t_bop_buffer *buf;
-
- buf = SCM_BOP_BUFFER (port);
-
- if (buf->pos + size > buf->total_len)
- bop_buffer_grow (buf, buf->pos + size);
-
- memcpy (buf->buffer + buf->pos, data, size);
- buf->pos += size;
- buf->len = (buf->len > buf->pos) ? buf->len : buf->pos;
-}
-
-static scm_t_off
-bop_seek (SCM port, scm_t_off offset, int whence)
-#define FUNC_NAME "bop_seek"
-{
- scm_t_bop_buffer *buf;
-
- buf = SCM_BOP_BUFFER (port);
- switch (whence)
- {
- case SEEK_CUR:
- offset += (scm_t_off) buf->pos;
- /* Fall through. */
-
- case SEEK_SET:
- if (offset < 0 || (unsigned) offset > buf->len)
- scm_out_of_range (FUNC_NAME, scm_from_int (offset));
- else
- buf->pos = offset;
- break;
-
- case SEEK_END:
- if (offset < 0 || (unsigned) offset >= buf->len)
- scm_out_of_range (FUNC_NAME, scm_from_int (offset));
- else
- buf->pos = buf->len - (offset + 1);
- break;
-
- default:
- scm_wrong_type_arg_msg (FUNC_NAME, 0, port,
- "invalid `seek' parameter");
- }
-
- return buf->pos;
-}
-#undef FUNC_NAME
-
-/* Fetch data from a bop. */
-SCM_SMOB_APPLY (bytevector_output_port_procedure,
- bop_proc_apply, 0, 0, 0, (SCM bop_proc))
-{
- SCM bv;
- scm_t_bop_buffer *buf, result_buf;
-
- buf = (scm_t_bop_buffer *) SCM_SMOB_DATA (bop_proc);
-
- result_buf = *buf;
- bop_buffer_init (buf);
-
- if (result_buf.len == 0)
- bv = scm_c_take_gc_bytevector (NULL, 0);
- else
- {
- if (result_buf.total_len > result_buf.len)
- /* Shrink the buffer. */
- result_buf.buffer = scm_gc_realloc ((void *) result_buf.buffer,
- result_buf.total_len,
- result_buf.len,
- SCM_GC_BOP);
-
- bv = scm_c_take_gc_bytevector ((signed char *) result_buf.buffer,
- result_buf.len);
- }
-
- return bv;
-}
-
-SCM_DEFINE (scm_open_bytevector_output_port,
- "open-bytevector-output-port", 0, 1, 0,
- (SCM transcoder),
- "Return two values: an output port and a procedure. The latter "
- "should be called with zero arguments to obtain a bytevector "
- "containing the data accumulated by the port.")
-#define FUNC_NAME s_scm_open_bytevector_output_port
-{
- if (!SCM_UNBNDP (transcoder) && !scm_is_false (transcoder))
- transcoders_not_implemented ();
-
- return (make_bop ());
-}
-#undef FUNC_NAME
-
-static inline void
-initialize_bytevector_output_ports (void)
-{
- bytevector_output_port_type =
- scm_make_port_type ("r6rs-bytevector-output-port",
- NULL, bop_write);
-
- scm_set_port_seek (bytevector_output_port_type, bop_seek);
-}
-
-\f
-/* Custom binary output port ("cbop" for short). */
-
-static scm_t_bits custom_binary_output_port_type;
-
-/* Return the various procedures of PORT. */
-#define SCM_CBOP_WRITE_PROC(_port) \
- SCM_SIMPLE_VECTOR_REF (SCM_PACK (SCM_STREAM (_port)), 0)
-
-
-static inline SCM
-make_cbop (SCM write_proc, SCM get_position_proc,
- SCM set_position_proc, SCM close_proc)
-{
- SCM port, method_vector;
- scm_t_port *c_port;
- const unsigned long mode_bits = SCM_OPN | SCM_WRTNG;
-
- /* Store the various methods and bytevector in a vector. */
- method_vector = scm_c_make_vector (4, SCM_BOOL_F);
- SCM_SIMPLE_VECTOR_SET (method_vector, 0, write_proc);
- SCM_SIMPLE_VECTOR_SET (method_vector, 1, get_position_proc);
- SCM_SIMPLE_VECTOR_SET (method_vector, 2, set_position_proc);
- SCM_SIMPLE_VECTOR_SET (method_vector, 3, close_proc);
-
- scm_i_pthread_mutex_lock (&scm_i_port_table_mutex);
-
- port = scm_new_port_table_entry (custom_binary_output_port_type);
- c_port = SCM_PTAB_ENTRY (port);
-
- /* Match the expectation of `binary-port?'. */
- c_port->encoding = NULL;
-
- /* Attach it the method vector. */
- SCM_SETSTREAM (port, SCM_UNPACK (method_vector));
-
- /* Have the port directly access the buffer (bytevector). */
- c_port->write_buf = c_port->write_pos = c_port->write_end = NULL;
- c_port->write_buf_size = c_port->read_buf_size = 0;
-
- /* Mark PORT as open, writable and unbuffered. */
- SCM_SET_CELL_TYPE (port, custom_binary_output_port_type | mode_bits);
-
- scm_i_pthread_mutex_unlock (&scm_i_port_table_mutex);
-
- return port;
-}
-
-/* Write SIZE octets from DATA to PORT. */
-static void
-cbop_write (SCM port, const void *data, size_t size)
-#define FUNC_NAME "cbop_write"
-{
- long int c_result;
- size_t c_written;
- SCM bv, write_proc, result;
-
- /* XXX: Allocating a new bytevector at each `write' call is inefficient,
- but necessary since (1) we don't control the lifetime of the buffer
- pointed to by DATA, and (2) the `write!' procedure could capture the
- bytevector it is passed. */
- bv = scm_c_make_bytevector (size);
- memcpy (SCM_BYTEVECTOR_CONTENTS (bv), data, size);
-
- write_proc = SCM_CBOP_WRITE_PROC (port);
-
- /* Since the `write' procedure of Guile's ports has type `void', it must
- try hard to write exactly SIZE bytes, regardless of how many bytes the
- sink can handle. */
- for (c_written = 0;
- c_written < size;
- c_written += c_result)
- {
- result = scm_call_3 (write_proc, bv,
- scm_from_size_t (c_written),
- scm_from_size_t (size - c_written));
-
- c_result = scm_to_long (result);
- if (SCM_UNLIKELY (c_result < 0
- || (size_t) c_result > (size - c_written)))
- scm_wrong_type_arg_msg (FUNC_NAME, 0, result,
- "R6RS custom binary output port `write!' "
- "returned a incorrect integer");
- }
-}
-#undef FUNC_NAME
-
-
-SCM_DEFINE (scm_make_custom_binary_output_port,
- "make-custom-binary-output-port", 5, 0, 0,
- (SCM id, SCM write_proc, SCM get_position_proc,
- SCM set_position_proc, SCM close_proc),
- "Return a new custom binary output port whose output is drained "
- "by invoking @var{write_proc} and passing it a bytevector, an "
- "index where octets should be written, and an octet count.")
-#define FUNC_NAME s_scm_make_custom_binary_output_port
-{
- SCM_VALIDATE_STRING (1, id);
- SCM_VALIDATE_PROC (2, write_proc);
-
- if (!scm_is_false (get_position_proc))
- SCM_VALIDATE_PROC (3, get_position_proc);
-
- if (!scm_is_false (set_position_proc))
- SCM_VALIDATE_PROC (4, set_position_proc);
-
- if (!scm_is_false (close_proc))
- SCM_VALIDATE_PROC (5, close_proc);
-
- return (make_cbop (write_proc, get_position_proc, set_position_proc,
- close_proc));
-}
-#undef FUNC_NAME
-
-
-/* Instantiate the custom binary output port type. */
-static inline void
-initialize_custom_binary_output_ports (void)
-{
- custom_binary_output_port_type =
- scm_make_port_type ("r6rs-custom-binary-output-port",
- NULL, cbop_write);
-
- scm_set_port_seek (custom_binary_output_port_type, cbp_seek);
- scm_set_port_close (custom_binary_output_port_type, cbp_close);
-}
\f
/* Transcoded ports ("tp" for short). */
@@ -1240,46 +175,6 @@ SCM_DEFINE (scm_i_make_transcoded_port,
#undef FUNC_NAME
\f
-/* Textual I/O */
-
-SCM_DEFINE (scm_get_string_n_x,
- "get-string-n!", 4, 0, 0,
- (SCM port, SCM str, SCM start, SCM count),
- "Read up to @var{count} characters from @var{port} into "
- "@var{str}, starting at @var{start}. If no characters "
- "can be read before the end of file is encountered, the end "
- "of file object is returned. Otherwise, the number of "
- "characters read is returned.")
-#define FUNC_NAME s_scm_get_string_n_x
-{
- size_t c_start, c_count, c_len, c_end, j;
- scm_t_wchar c;
-
- SCM_VALIDATE_OPINPORT (1, port);
- SCM_VALIDATE_STRING (2, str);
- c_len = scm_c_string_length (str);
- c_start = scm_to_size_t (start);
- c_count = scm_to_size_t (count);
- c_end = c_start + c_count;
-
- if (SCM_UNLIKELY (c_end > c_len))
- scm_out_of_range (FUNC_NAME, count);
-
- for (j = c_start; j < c_end; j++)
- {
- c = scm_getc (port);
- if (c == EOF)
- {
- size_t chars_read = j - c_start;
- return chars_read == 0 ? SCM_EOF_VAL : scm_from_size_t (chars_read);
- }
- scm_c_string_set_x (str, j, SCM_MAKE_CHAR (c));
- }
- return count;
-}
-#undef FUNC_NAME
-
-\f
/* Initialization. */
void
@@ -1296,9 +191,5 @@ scm_init_r6rs_ports (void)
{
#include "libguile/r6rs-ports.x"
- initialize_bytevector_input_ports ();
- initialize_custom_binary_input_ports ();
- initialize_bytevector_output_ports ();
- initialize_custom_binary_output_ports ();
initialize_transcoded_ports ();
}
diff --git a/libguile/r6rs-ports.h b/libguile/r6rs-ports.h
index 2ae3e76..682abcf 100644
--- a/libguile/r6rs-ports.h
+++ b/libguile/r6rs-ports.h
@@ -1,7 +1,7 @@
#ifndef SCM_R6RS_PORTS_H
#define SCM_R6RS_PORTS_H
-/* Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
+/* Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -21,24 +21,7 @@
\f
-#include "libguile/__scm.h"
-
-/* R6RS I/O Ports. */
-
-SCM_API SCM scm_eof_object (void);
-SCM_API SCM scm_open_bytevector_input_port (SCM, SCM);
-SCM_API SCM scm_make_custom_binary_input_port (SCM, SCM, SCM, SCM, SCM);
-SCM_API SCM scm_get_u8 (SCM);
-SCM_API SCM scm_lookahead_u8 (SCM);
-SCM_API SCM scm_get_bytevector_n (SCM, SCM);
-SCM_API SCM scm_get_bytevector_n_x (SCM, SCM, SCM, SCM);
-SCM_API SCM scm_get_bytevector_some (SCM);
-SCM_API SCM scm_get_bytevector_all (SCM);
-SCM_API SCM scm_put_u8 (SCM, SCM);
-SCM_API SCM scm_put_bytevector (SCM, SCM, SCM, SCM);
-SCM_API SCM scm_open_bytevector_output_port (SCM);
-SCM_API SCM scm_make_custom_binary_output_port (SCM, SCM, SCM, SCM, SCM);
-SCM_API SCM scm_get_string_n_x (SCM, SCM, SCM, SCM);
+#include "libguile/binary-ports.h"
SCM_API void scm_init_r6rs_ports (void);
SCM_INTERNAL void scm_register_r6rs_ports (void);
diff --git a/module/ice-9/binary-ports.scm b/module/ice-9/binary-ports.scm
index c07900b..59450ac 100644
--- a/module/ice-9/binary-ports.scm
+++ b/module/ice-9/binary-ports.scm
@@ -1,6 +1,6 @@
;;;; binary-ports.scm --- Binary IO on ports
-;;;; Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -43,8 +43,5 @@
open-bytevector-output-port
make-custom-binary-output-port))
-;; Note that this extension also defines %make-transcoded-port, which is
-;; not exported but is used by (rnrs io ports).
-
(load-extension (string-append "libguile-" (effective-version))
- "scm_init_r6rs_ports")
+ "scm_init_binary_ports")
diff --git a/module/rnrs/io/ports.scm b/module/rnrs/io/ports.scm
index 7c17b0c..ee8c05a 100644
--- a/module/rnrs/io/ports.scm
+++ b/module/rnrs/io/ports.scm
@@ -1,6 +1,6 @@
;;;; ports.scm --- R6RS port API -*- coding: utf-8 -*-
-;;;; Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
+;;;; Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
@@ -118,6 +118,10 @@
guile:))
+(eval-when (load eval compile)
+ (load-extension (string-append "libguile-" (effective-version))
+ "scm_init_r6rs_ports"))
+
\f
;;;
;;; Auxiliary types
@@ -253,8 +257,7 @@ Guile."
"Return a new textual port based on @var{port}, using
@var{transcoder} to encode and decode data written to or
read from its underlying binary port @var{port}."
- ;; Hackily get at %make-transcoded-port.
- (let ((result ((@@ (ice-9 binary-ports) %make-transcoded-port) port)))
+ (let ((result (%make-transcoded-port port)))
(set-port-encoding! result (transcoder-codec transcoder))
(case (transcoder-error-handling-mode transcoder)
((raise)
--
1.7.10.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 2/3] Add internal API to specify reader options at reader invocation
2012-12-09 12:47 Fix reader options for R6RS `get-datum' Andreas Rottmann
2012-12-09 12:47 ` [PATCH 1/3] Split r6rs-ports.c according to module boundaries Andreas Rottmann
@ 2012-12-09 12:47 ` Andreas Rottmann
2013-01-21 20:44 ` Andy Wingo
2012-12-09 12:47 ` [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
2012-12-11 20:23 ` Fix reader options for R6RS `get-datum' Mark H Weaver
3 siblings, 1 reply; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-09 12:47 UTC (permalink / raw)
To: guile-devel
* libguile/private-options.h: Introduce a new enum indexing the read
options, and use its values as indices for scm_read_opts.
* libguile/read.c: Get rid of the bit field offsets #define-s, and
instead use the enum values to caculate them.
* libguile/read.c (enum t_keyword_style, scm_t_read_opts): Moved to
libguile/private-options.h, the names of the keyword style enum having
been prefixed with "SCM_" (use sites adapted accordingly). These have
been moved out of the C file so that they can be used in other parts
of libguile -- they are needed by the scm_i_read()'s interface.
* libguile/read.h (scm_i_read): New procedure, marked SCM_INTERNAL.
(scm_t_read_opts): Opaque typedef for the struct defined in
private-options.h.
* libguile/read.c (init_read_options): Add an additional argument
`preset', to allow the caller to specify reader options that are
preset and should not be changed.
(scm_i_read): New internal procedure, allowing to fix (part of) the
reader options.
---
libguile/private-options.h | 51 ++++++++++++++++----
libguile/read.c | 115 +++++++++++++++++++-------------------------
libguile/read.h | 6 ++-
3 files changed, 95 insertions(+), 77 deletions(-)
diff --git a/libguile/private-options.h b/libguile/private-options.h
index ed0f314..0c85d0c 100644
--- a/libguile/private-options.h
+++ b/libguile/private-options.h
@@ -4,7 +4,7 @@
* We put this in a private header, since layout of data structures
* is an implementation detail that we want to hide.
*
- * Copyright (C) 2007, 2009, 2010, 2011 Free Software Foundation, Inc.
+ * Copyright (C) 2007, 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -60,15 +60,46 @@ SCM_INTERNAL scm_t_option scm_print_opts[];
*/
SCM_INTERNAL scm_t_option scm_read_opts[];
-#define SCM_COPY_SOURCE_P scm_read_opts[0].val
-#define SCM_RECORD_POSITIONS_P scm_read_opts[1].val
-#define SCM_CASE_INSENSITIVE_P scm_read_opts[2].val
-#define SCM_KEYWORD_STYLE scm_read_opts[3].val
-#define SCM_R6RS_ESCAPES_P scm_read_opts[4].val
-#define SCM_SQUARE_BRACKETS_P scm_read_opts[5].val
-#define SCM_HUNGRY_EOL_ESCAPES_P scm_read_opts[6].val
-#define SCM_CURLY_INFIX_P scm_read_opts[7].val
+enum
+ {
+ SCM_READ_OPTION_COPY_SOURCE_P,
+ SCM_READ_OPTION_RECORD_POSITIONS_P,
+ SCM_READ_OPTION_CASE_INSENSITIVE_P,
+ SCM_READ_OPTION_KEYWORD_STYLE,
+ SCM_READ_OPTION_R6RS_ESCAPES_P,
+ SCM_READ_OPTION_SQUARE_BRACKETS_P,
+ SCM_READ_OPTION_HUNGRY_EOL_ESCAPES_P,
+ SCM_READ_OPTION_CURLY_INFIX_P,
+ SCM_N_READ_OPTIONS
+ };
-#define SCM_N_READ_OPTIONS 7
+typedef enum
+ {
+ SCM_KEYWORD_STYLE_HASH_PREFIX,
+ SCM_KEYWORD_STYLE_PREFIX,
+ SCM_KEYWORD_STYLE_POSTFIX
+ } scm_t_keyword_style;
+
+struct scm_struct_read_opts
+{
+ scm_t_keyword_style keyword_style;
+ unsigned int copy_source_p : 1;
+ unsigned int record_positions_p : 1;
+ unsigned int case_insensitive_p : 1;
+ unsigned int r6rs_escapes_p : 1;
+ unsigned int square_brackets_p : 1;
+ unsigned int hungry_eol_escapes_p : 1;
+ unsigned int curly_infix_p : 1;
+ unsigned int neoteric_p : 1;
+};
+
+#define SCM_COPY_SOURCE_P scm_read_opts[SCM_READ_OPTION_COPY_SOURCE_P].val
+#define SCM_RECORD_POSITIONS_P scm_read_opts[SCM_READ_OPTION_RECORD_POSITIONS_P].val
+#define SCM_CASE_INSENSITIVE_P scm_read_opts[SCM_READ_OPTION_CASE_INSENSITIVE_P].val
+#define SCM_KEYWORD_STYLE scm_read_opts[SCM_READ_OPTION_KEYWORD_STYLE].val
+#define SCM_R6RS_ESCAPES_P scm_read_opts[SCM_READ_OPTION_R6RS_ESCAPES_P].val
+#define SCM_SQUARE_BRACKETS_P scm_read_opts[SCM_READ_OPTION_SQUARE_BRACKETS_P].val
+#define SCM_HUNGRY_EOL_ESCAPES_P scm_read_opts[SCM_READ_OPTION_HUNGRY_EOL_ESCAPES_P].val
+#define SCM_CURLY_INFIX_P scm_read_opts[SCM_READ_OPTION_CURLY_INFIX_P].val
#endif /* PRIVATE_OPTIONS */
diff --git a/libguile/read.c b/libguile/read.c
index 222891b..754ea99 100644
--- a/libguile/read.c
+++ b/libguile/read.c
@@ -93,29 +93,6 @@ scm_t_option scm_read_opts[] =
from the global and per-port read options, and a pointer is passed
down to all helper functions. */
-enum t_keyword_style
- {
- KEYWORD_STYLE_HASH_PREFIX,
- KEYWORD_STYLE_PREFIX,
- KEYWORD_STYLE_POSTFIX
- };
-
-struct t_read_opts
-{
- enum t_keyword_style keyword_style;
- unsigned int copy_source_p : 1;
- unsigned int record_positions_p : 1;
- unsigned int case_insensitive_p : 1;
- unsigned int r6rs_escapes_p : 1;
- unsigned int square_brackets_p : 1;
- unsigned int hungry_eol_escapes_p : 1;
- unsigned int curly_infix_p : 1;
- unsigned int neoteric_p : 1;
-};
-
-typedef struct t_read_opts scm_t_read_opts;
-
-
/*
Give meaningful error messages for errors
@@ -737,7 +714,7 @@ scm_read_mixed_case_symbol (scm_t_wchar chr, SCM port, scm_t_read_opts *opts)
SCM result;
int ends_with_colon = 0;
size_t bytes_read;
- int postfix = (opts->keyword_style == KEYWORD_STYLE_POSTFIX);
+ int postfix = (opts->keyword_style == SCM_KEYWORD_STYLE_POSTFIX);
char local_buffer[READER_BUFFER_SIZE], *buffer;
scm_t_port *pt = SCM_PTAB_ENTRY (port);
SCM str;
@@ -1771,7 +1748,7 @@ read_inner_expression (SCM port, scm_t_read_opts *opts)
case EOF:
return SCM_EOF_VAL;
case ':':
- if (opts->keyword_style == KEYWORD_STYLE_PREFIX)
+ if (opts->keyword_style == SCM_KEYWORD_STYLE_PREFIX)
return scm_symbol_to_keyword (scm_read_expression (port, opts));
/* Fall through. */
@@ -1859,7 +1836,7 @@ scm_read_expression (SCM port, scm_t_read_opts *opts)
\f
/* Actual reader. */
-static void init_read_options (SCM port, scm_t_read_opts *opts);
+static void init_read_options (SCM port, scm_t_read_opts *opts, unsigned int mask);
SCM_DEFINE (scm_read, "read", 0, 1, 0,
(SCM port),
@@ -1875,7 +1852,7 @@ SCM_DEFINE (scm_read, "read", 0, 1, 0,
port = scm_current_input_port ();
SCM_VALIDATE_OPINPORT (1, port);
- init_read_options (port, &opts);
+ init_read_options (port, &opts, 0);
c = flush_ws (port, &opts, (char *) NULL);
if (EOF == c)
@@ -1887,6 +1864,21 @@ SCM_DEFINE (scm_read, "read", 0, 1, 0,
#undef FUNC_NAME
+SCM scm_i_read (SCM port, const scm_t_read_opts *opts, unsigned int preset)
+{
+ int c;
+ scm_t_read_opts local_opts = *opts;
+
+ init_read_options (port, &local_opts, preset);
+
+ c = flush_ws (port, &local_opts, (char *) NULL);
+ if (EOF == c)
+ return SCM_EOF_VAL;
+ scm_ungetc (c, port);
+
+ return (scm_read_expression (port, &local_opts));
+}
+
\f
/* Manipulate the read-hash-procedures alist. This could be written in
@@ -2158,18 +2150,8 @@ SCM_DEFINE (scm_file_encoding, "file-encoding", 1, 0, 0,
/* Key to read options in per-port alists. */
SCM_SYMBOL (sym_port_read_options, "port-read-options");
-/* Offsets of bit fields for each per-port override */
-#define READ_OPTION_COPY_SOURCE_P 0
-#define READ_OPTION_RECORD_POSITIONS_P 2
-#define READ_OPTION_CASE_INSENSITIVE_P 4
-#define READ_OPTION_KEYWORD_STYLE 6
-#define READ_OPTION_R6RS_ESCAPES_P 8
-#define READ_OPTION_SQUARE_BRACKETS_P 10
-#define READ_OPTION_HUNGRY_EOL_ESCAPES_P 12
-#define READ_OPTION_CURLY_INFIX_P 14
-
/* The total width in bits of the per-port overrides */
-#define READ_OPTIONS_NUM_BITS 16
+#define READ_OPTIONS_NUM_BITS (SCM_N_READ_OPTIONS * 2)
#define READ_OPTIONS_INHERIT_ALL ((1UL << READ_OPTIONS_NUM_BITS) - 1)
#define READ_OPTIONS_MAX_VALUE READ_OPTIONS_INHERIT_ALL
@@ -2191,8 +2173,8 @@ set_port_read_option (SCM port, int option, int new_value)
read_options = scm_to_uint (scm_read_options);
else
read_options = READ_OPTIONS_INHERIT_ALL;
- read_options &= ~(READ_OPTION_MASK << option);
- read_options |= new_value << option;
+ read_options &= ~(READ_OPTION_MASK << (option * 2));
+ read_options |= new_value << (option * 2);
scm_read_options = scm_from_uint (read_options);
alist = scm_assq_set_x (alist, sym_port_read_options, scm_read_options);
scm_hashq_set_x (scm_i_port_weak_hash, port, alist);
@@ -2205,7 +2187,7 @@ set_port_case_insensitive_p (SCM port, scm_t_read_opts *opts, int value)
{
value = !!value;
opts->case_insensitive_p = value;
- set_port_read_option (port, READ_OPTION_CASE_INSENSITIVE_P, value);
+ set_port_read_option (port, SCM_READ_OPTION_CASE_INSENSITIVE_P, value);
}
/* Set OPTS and PORT's square_brackets_p option according to VALUE. */
@@ -2214,7 +2196,7 @@ set_port_square_brackets_p (SCM port, scm_t_read_opts *opts, int value)
{
value = !!value;
opts->square_brackets_p = value;
- set_port_read_option (port, READ_OPTION_SQUARE_BRACKETS_P, value);
+ set_port_read_option (port, SCM_READ_OPTION_SQUARE_BRACKETS_P, value);
}
/* Set OPTS and PORT's curly_infix_p option according to VALUE. */
@@ -2223,13 +2205,13 @@ set_port_curly_infix_p (SCM port, scm_t_read_opts *opts, int value)
{
value = !!value;
opts->curly_infix_p = value;
- set_port_read_option (port, READ_OPTION_CURLY_INFIX_P, value);
+ set_port_read_option (port, SCM_READ_OPTION_CURLY_INFIX_P, value);
}
/* Initialize OPTS based on PORT's read options and the global read
options. */
static void
-init_read_options (SCM port, scm_t_read_opts *opts)
+init_read_options (SCM port, scm_t_read_opts *opts, unsigned int preset)
{
SCM alist, val, scm_read_options;
unsigned int read_options, x;
@@ -2244,28 +2226,29 @@ init_read_options (SCM port, scm_t_read_opts *opts)
else
read_options = READ_OPTIONS_INHERIT_ALL;
- x = READ_OPTION_MASK & (read_options >> READ_OPTION_KEYWORD_STYLE);
- if (x == READ_OPTION_INHERIT)
- {
- val = SCM_PACK (SCM_KEYWORD_STYLE);
- if (scm_is_eq (val, scm_keyword_prefix))
- x = KEYWORD_STYLE_PREFIX;
- else if (scm_is_eq (val, scm_keyword_postfix))
- x = KEYWORD_STYLE_POSTFIX;
- else
- x = KEYWORD_STYLE_HASH_PREFIX;
- }
- opts->keyword_style = x;
-
-#define RESOLVE_BOOLEAN_OPTION(NAME, name) \
- do \
- { \
- x = READ_OPTION_MASK & (read_options >> READ_OPTION_ ## NAME); \
- if (x == READ_OPTION_INHERIT) \
- x = !!SCM_ ## NAME; \
- opts->name = x; \
- } \
- while (0)
+ if ((preset & (1 << SCM_READ_OPTION_KEYWORD_STYLE)) == 0) {
+ x = READ_OPTION_MASK & (read_options >> (SCM_READ_OPTION_KEYWORD_STYLE * 2));
+ if (x == READ_OPTION_INHERIT)
+ {
+ val = SCM_PACK (SCM_KEYWORD_STYLE);
+ if (scm_is_eq (val, scm_keyword_prefix))
+ x = SCM_KEYWORD_STYLE_PREFIX;
+ else if (scm_is_eq (val, scm_keyword_postfix))
+ x = SCM_KEYWORD_STYLE_POSTFIX;
+ else
+ x = SCM_KEYWORD_STYLE_HASH_PREFIX;
+ }
+ opts->keyword_style = x;
+ }
+
+#define RESOLVE_BOOLEAN_OPTION(NAME, name) \
+ if ((preset & (1 << SCM_READ_OPTION_ ## NAME)) == 0) { \
+ x = (READ_OPTION_MASK \
+ & (read_options >> (SCM_READ_OPTION_ ## NAME * 2))); \
+ if (x == READ_OPTION_INHERIT) \
+ x = !!SCM_ ## NAME; \
+ opts->name = x; \
+ }
RESOLVE_BOOLEAN_OPTION (COPY_SOURCE_P, copy_source_p);
RESOLVE_BOOLEAN_OPTION (RECORD_POSITIONS_P, record_positions_p);
diff --git a/libguile/read.h b/libguile/read.h
index 3c47afd..c460d2f 100644
--- a/libguile/read.h
+++ b/libguile/read.h
@@ -3,7 +3,7 @@
#ifndef SCM_READ_H
#define SCM_READ_H
-/* Copyright (C) 1995,1996,2000, 2006, 2008, 2009 Free Software Foundation, Inc.
+/* Copyright (C) 1995,1996,2000, 2006, 2008, 2009, 2012 Free Software Foundation, Inc.
*
* This library is free software; you can redistribute it and/or
* modify it under the terms of the GNU Lesser General Public License
@@ -49,12 +49,16 @@
\f
+typedef struct scm_struct_read_opts scm_t_read_opts;
SCM_API SCM scm_sym_dot;
SCM_API SCM scm_read_options (SCM setting);
SCM_API SCM scm_read (SCM port);
SCM_API SCM scm_read_hash_extend (SCM chr, SCM proc);
+SCM_INTERNAL SCM scm_i_read(SCM port,
+ const scm_t_read_opts *opts,
+ unsigned int preset);
SCM_INTERNAL char *scm_i_scan_for_encoding (SCM port);
SCM_API SCM scm_file_encoding (SCM port);
--
1.7.10.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics
2012-12-09 12:47 Fix reader options for R6RS `get-datum' Andreas Rottmann
2012-12-09 12:47 ` [PATCH 1/3] Split r6rs-ports.c according to module boundaries Andreas Rottmann
2012-12-09 12:47 ` [PATCH 2/3] Add internal API to specify reader options at reader invocation Andreas Rottmann
@ 2012-12-09 12:47 ` Andreas Rottmann
2013-01-21 20:48 ` Andy Wingo
2012-12-11 20:23 ` Fix reader options for R6RS `get-datum' Mark H Weaver
3 siblings, 1 reply; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-09 12:47 UTC (permalink / raw)
To: guile-devel
With Guile's default reader options, R6RS hex escape and EOL escape
behavior is missing. This change enables the former via the
`r6rs-hex-escapes' option, and gets us closer to the latter by setting
`hungry-eol-escapes'.
* libguile/r6rs-ports.c (R6RS_READ_OPTION_MASK): New macro, defines
which reader options need fixed values according to R6RS.
(r6rs_read_options): New internal constant, defining the values of the
relevant reader options.
(scm_i_get_datum): New internal helper calling `scm_i_read' using the
R6RS reader options.
* module/rnrs/io/ports.scm (get-datum): Call `%get-datum' instead of
`read'.
* test-suite/tests/r6rs-ports.test ("8.2.9 Textual input")["get-datum"]:
New tests.
---
libguile/r6rs-ports.c | 29 ++++++++++++++++++++++++
module/rnrs/io/ports.scm | 2 +-
test-suite/tests/r6rs-ports.test | 45 ++++++++++++++++++++++++++++++++++++++
3 files changed, 75 insertions(+), 1 deletion(-)
diff --git a/libguile/r6rs-ports.c b/libguile/r6rs-ports.c
index 19dea8d..973953a 100644
--- a/libguile/r6rs-ports.c
+++ b/libguile/r6rs-ports.c
@@ -174,6 +174,35 @@ SCM_DEFINE (scm_i_make_transcoded_port,
}
#undef FUNC_NAME
+#define R6RS_READ_OPTION_MASK \
+ ((1 << SCM_READ_OPTION_KEYWORD_STYLE) \
+ | (1 << SCM_READ_OPTION_R6RS_ESCAPES_P) \
+ | (1 << SCM_READ_OPTION_CASE_INSENSITIVE_P) \
+ | (1 << SCM_READ_OPTION_SQUARE_BRACKETS_P) \
+ | (1 << SCM_READ_OPTION_HUNGRY_EOL_ESCAPES_P))
+
+static const scm_t_read_opts r6rs_read_options = {
+ SCM_KEYWORD_STYLE_HASH_PREFIX,
+ 0, /* copy_source_p, not relevant */
+ 0, /* record_positions_p, not relevant */
+ 0, /* case_insensitive_p */
+ 1, /* r6rs_escapes_p */
+ 1, /* square_brackets_p */
+ 1, /* hungry_eol_escapes_p */
+ 0, /* curly_infix_p, compatible extension */
+ 0, /* neoteric_p, not relevant */
+};
+
+SCM_DEFINE (scm_i_get_datum,
+ "%get-datum", 1, 0, 0,
+ (SCM port),
+ "Read a datum in R6RS syntax from @var{port}")
+#define FUNC_NAME s_scm_i_get_datum
+{
+ return scm_i_read(port, &r6rs_read_options, R6RS_READ_OPTION_MASK);
+}
+#undef FUNC_NAME
+
\f
/* Initialization. */
diff --git a/module/rnrs/io/ports.scm b/module/rnrs/io/ports.scm
index ee8c05a..b2828cd 100644
--- a/module/rnrs/io/ports.scm
+++ b/module/rnrs/io/ports.scm
@@ -458,7 +458,7 @@ return the characters accumulated in that port."
(with-textual-input-conditions port (read-char port)))
(define (get-datum port)
- (with-textual-input-conditions port (read port)))
+ (with-textual-input-conditions port (%get-datum port)))
(define (get-line port)
(with-textual-input-conditions port (read-line port 'trim)))
diff --git a/test-suite/tests/r6rs-ports.test b/test-suite/tests/r6rs-ports.test
index ed49598..6a92987 100644
--- a/test-suite/tests/r6rs-ports.test
+++ b/test-suite/tests/r6rs-ports.test
@@ -707,6 +707,16 @@
(eq? (error-handling-mode replace)
(transcoder-error-handling-mode t))))))
+;; FIXME: duplicated from reader.test
+(define (with-read-options opts thunk)
+ (let ((saved-options (read-options)))
+ (dynamic-wind
+ (lambda ()
+ (read-options opts))
+ thunk
+ (lambda ()
+ (read-options saved-options)))))
+
(with-test-prefix "8.2.9 Textual input"
(pass-if "get-string-n [short]"
@@ -724,6 +734,41 @@
(s (string-copy "Isn't XXX great?")))
(and (= 3 (get-string-n! port s 6 3))
(string=? s "Isn't GNU great?"))))
+ (with-test-prefix "get-datum"
+ (let ((string->datum
+ (lambda (s)
+ ;; We should check against all possible permutations of
+ ;; read options, but we just enable (and leave disabled)
+ ;; the ones that each would break R6RS individually.
+ (with-read-options '(keywords prefix case-insensitive hungry-eol-escapes)
+ (lambda () (get-datum (open-input-string s)))))))
+ (pass-if "symbol"
+ (eq? (string->datum "foo") 'foo))
+ (pass-if "symbol [starting with colon]"
+ (eq? ':foo (string->datum ":foo")))
+ (pass-if "symbol ending with colon"
+ (eq? 'foo: (string->datum "foo:")))
+ (pass-if "string"
+ (string=? "foo" (string->datum "\"foo\"")))
+ (pass-if "string [with hex escapes]"
+ (string=? "bar\nA" (string->datum "\"bar\\x0A;\\x41;\"")))
+ (pass-if "string [hungry EOL]"
+ (string=? "bar baz" (string->datum "\"bar \\\n baz\"")))
+ ;; FIXME: actually, R6RS demands an even more hungry EOL escape
+ ;; than the reader currently implements: also any whitespace
+ ;; between the backslash and the newline should vanish. Currently,
+ ;; the reader barfs on that.
+ (pass-if "string [hungry EOL, space also before newline]"
+ (throw 'unresolved)
+ (string=? "bar baz" (string->datum "\"bar \\ \n baz\"")))
+ (pass-if "number [decimal]"
+ (= (string->datum "42") 42))
+ (pass-if "number [hexadecimal]"
+ (= (string->datum "#x2A") 42))
+ (pass-if "number [octal]"
+ (= (string->datum "#o0777") 511))
+ (pass-if "number [binary]"
+ (= (string->datum "#b101010") 42))))
(with-test-prefix "read error"
(pass-if-condition "get-char" i/o-read-error?
--
1.7.10.4
^ permalink raw reply related [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-09 12:47 Fix reader options for R6RS `get-datum' Andreas Rottmann
` (2 preceding siblings ...)
2012-12-09 12:47 ` [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
@ 2012-12-11 20:23 ` Mark H Weaver
2012-12-12 21:32 ` Andreas Rottmann
3 siblings, 1 reply; 15+ messages in thread
From: Mark H Weaver @ 2012-12-11 20:23 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
Hi Andreas,
Andreas Rottmann <a.rottmann@gmx.at> writes:
> This patch series addresses the problem that `get-datum' is using the
> global reader options, even for those options that have to have fixed
> values to make the reader behave in an R6RS-compatible way.
I'm sorry to have not done so earlier, but I finally looked at the R6RS
specification for 'get-datum', and I don't see anything to suggest that
it should recognize a different notation than 'read' does. If the user
has enabled reader extensions for 'read', I don't see why 'get-datum'
shouldn't honor those extensions as well.
Maybe what we should have instead is a command-line option that sets
some(?) of the global read options to conform with R6RS.
What do you think?
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-11 20:23 ` Fix reader options for R6RS `get-datum' Mark H Weaver
@ 2012-12-12 21:32 ` Andreas Rottmann
2012-12-12 23:48 ` Mark H Weaver
0 siblings, 1 reply; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-12 21:32 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Mark H Weaver <mhw@netris.org> writes:
> Hi Andreas,
>
> Andreas Rottmann <a.rottmann@gmx.at> writes:
>> This patch series addresses the problem that `get-datum' is using the
>> global reader options, even for those options that have to have fixed
>> values to make the reader behave in an R6RS-compatible way.
>
> I'm sorry to have not done so earlier, but I finally looked at the R6RS
> specification for 'get-datum', and I don't see anything to suggest that
> it should recognize a different notation than 'read' does.
>
I think it does. But in the place where one would expect, namely the
docs on `get-datum' (8.2.6), it does not say it as clearly as one would
like:
Reads an external representation from textual-input-port and returns
the datum it represents. The get-datum procedure returns the next
datum that can be parsed from the given textual-input-port, updating
textual-input-port to point exactly past the end of the external
representation of the object.
Note that "external representation", referenced twice in the above text,
appears to be defined at the beginning of Chapter 4:
Syntactic data (also called external representations) double as a
notation for objects, and Scheme’s (rnrs io ports (6)) library [...]
provides the get-datum and put-datum procedures for reading and
writing syntactic data, converting between their textual
representation and the corresponding objects.
This leads me to conclude that `get-datum' should parse R6RS syntax, as
defined in R6RS 4.2.1. My proposed changes get us further in that
direction, by making sure we set all the knobs currently available to
enlarge the syntactic subset of R6RS we parse correctly, but it does not
get us to the finishing line.
> If the user has enabled reader extensions for 'read', I don't see why
> 'get-datum' shouldn't honor those extensions as well.
>
Well, there's difference between upwardly-compatible extensions, and
ones that are incompatible with R6RS syntax, such as colon-prefix or
colon-suffix keywords. Hash-colon keywords are ok, since they don't
conflict with the interpretation of legal R6RS code.
> Maybe what we should have instead is a command-line option that sets
> some(?) of the global read options to conform with R6RS.
>
> What do you think?
>
I disagree quite strongly -- IMO, `get-datum' must, for every valid
datum, according to R6RS lexical syntax, return the Scheme data denoted
by that external representation. By that rule, one must fix the values
of reader options which would otherwise lead to valid R6RS external
representations being read as a datum different from the one which would
result according to R6RS syntax.
Accepting lexical syntax which is not defined by R6RS is another matter,
and I'm fine with allowing it.
Regards, Rotty
--
Andreas Rottmann -- <http://rotty.xx.vu/>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-12 21:32 ` Andreas Rottmann
@ 2012-12-12 23:48 ` Mark H Weaver
2012-12-14 3:22 ` Andreas Rottmann
0 siblings, 1 reply; 15+ messages in thread
From: Mark H Weaver @ 2012-12-12 23:48 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
Hi Andreas,
Andreas Rottmann <a.rottmann@gmx.at> writes:
> Mark H Weaver <mhw@netris.org> writes:
>> Andreas Rottmann <a.rottmann@gmx.at> writes:
>>> This patch series addresses the problem that `get-datum' is using the
>>> global reader options, even for those options that have to have fixed
>>> values to make the reader behave in an R6RS-compatible way.
>>
>> I'm sorry to have not done so earlier, but I finally looked at the R6RS
>> specification for 'get-datum', and I don't see anything to suggest that
>> it should recognize a different notation than 'read' does.
>>
> I think it does. But in the place where one would expect, namely the
> docs on `get-datum' (8.2.6), it does not say it as clearly as one would
> like:
>
> Reads an external representation from textual-input-port and returns
> the datum it represents. The get-datum procedure returns the next
> datum that can be parsed from the given textual-input-port, updating
> textual-input-port to point exactly past the end of the external
> representation of the object.
Section 8.3 defines 'read' as follows:
Reads an external representation from textual-input-port and returns
the datum it represents. The read procedure operates in the same way
as get-datum, see section 8.2.9.
I believe this last sentence clearly confirms my belief that 'read' and
'get-datum' should recognize the same syntax.
> Note that "external representation", referenced twice in the above
> text, appears to be defined at the beginning of Chapter 4:
>
> Syntactic data (also called external representations) double as a
> notation for objects, and Scheme’s (rnrs io ports (6)) library [...]
> provides the get-datum and put-datum procedures for reading and
> writing syntactic data, converting between their textual
> representation and the corresponding objects.
>
> This leads me to conclude that `get-datum' should parse R6RS syntax, as
> defined in R6RS 4.2.1. My proposed changes get us further in that
> direction, by making sure we set all the knobs currently available to
> enlarge the syntactic subset of R6RS we parse correctly, but it does not
> get us to the finishing line.
>
>> If the user has enabled reader extensions for 'read', I don't see why
>> 'get-datum' shouldn't honor those extensions as well.
>>
> Well, there's difference between upwardly-compatible extensions, and
> ones that are incompatible with R6RS syntax, such as colon-prefix or
> colon-suffix keywords. Hash-colon keywords are ok, since they don't
> conflict with the interpretation of legal R6RS code.
>
>> Maybe what we should have instead is a command-line option that sets
>> some(?) of the global read options to conform with R6RS.
>>
>> What do you think?
>>
> I disagree quite strongly -- IMO, `get-datum' must, for every valid
> datum, according to R6RS lexical syntax, return the Scheme data denoted
> by that external representation. By that rule, one must fix the values
> of reader options which would otherwise lead to valid R6RS external
> representations being read as a datum different from the one which would
> result according to R6RS syntax.
I agree that there should be an easy way to set the reader options so
that all of the R6RS external representations are read as specified.
IMO, this needs to be done not only for 'get-datum' but also for 'read',
hence my suggestion to provide some way to set the global read options
appropriately.
On the flip side, if someone has enabled SRFI-105 curly-infix
expressions, or any other reader extension that does not conflict with
standard R6RS notation, then both 'get-datum' and 'read' should honor
that setting.
Does that make sense?
Regards,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-12 23:48 ` Mark H Weaver
@ 2012-12-14 3:22 ` Andreas Rottmann
2012-12-16 22:12 ` Mark H Weaver
0 siblings, 1 reply; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-14 3:22 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Mark H Weaver <mhw@netris.org> writes:
> Hi Andreas,
>
> Andreas Rottmann <a.rottmann@gmx.at> writes:
>> Mark H Weaver <mhw@netris.org> writes:
>>> Andreas Rottmann <a.rottmann@gmx.at> writes:
>>>> This patch series addresses the problem that `get-datum' is using the
>>>> global reader options, even for those options that have to have fixed
>>>> values to make the reader behave in an R6RS-compatible way.
>>>
>>> I'm sorry to have not done so earlier, but I finally looked at the R6RS
>>> specification for 'get-datum', and I don't see anything to suggest that
>>> it should recognize a different notation than 'read' does.
>>>
>> I think it does. But in the place where one would expect, namely the
>> docs on `get-datum' (8.2.6), it does not say it as clearly as one would
>> like:
>>
>> Reads an external representation from textual-input-port and returns
>> the datum it represents. The get-datum procedure returns the next
>> datum that can be parsed from the given textual-input-port, updating
>> textual-input-port to point exactly past the end of the external
>> representation of the object.
>
> Section 8.3 defines 'read' as follows:
>
> Reads an external representation from textual-input-port and returns
> the datum it represents. The read procedure operates in the same way
> as get-datum, see section 8.2.9.
>
> I believe this last sentence clearly confirms my belief that 'read' and
> 'get-datum' should recognize the same syntax.
>
Well yes, R6RS `read' and R6RS `get-datum' need to understand the same
syntax, but I thought you were talking about Guile `read' and R6RS
`get-datum'.
>>
>>> Maybe what we should have instead is a command-line option that sets
>>> some(?) of the global read options to conform with R6RS.
>>>
>>> What do you think?
>>>
>> I disagree quite strongly -- IMO, `get-datum' must, for every valid
>> datum, according to R6RS lexical syntax, return the Scheme data denoted
>> by that external representation. By that rule, one must fix the values
>> of reader options which would otherwise lead to valid R6RS external
>> representations being read as a datum different from the one which would
>> result according to R6RS syntax.
>
> I agree that there should be an easy way to set the reader options so
> that all of the R6RS external representations are read as specified.
>
Yes, but this cannot be a program-global switch -- it is simply defined
behavior `get-datum', which must *not* be affected by any global
switches. An R6RS syntax commandline switch would make sense regardless,
for dealing with e.g. script syntax, but this is a completely orthogonal
issue, and I'd like to concentrate on the `get-datum' correctness issue
for now.
> IMO, this needs to be done not only for 'get-datum' but also for 'read',
> hence my suggestion to provide some way to set the global read options
> appropriately.
>
Yup, R6RS `read' needs to be implemented in terms of `get-datum', not
only because of reader options, but also because of the required
exception behavior. This is how it's done already -- see
modules/rnrs/io/simple.scm.
> On the flip side, if someone has enabled SRFI-105 curly-infix
> expressions, or any other reader extension that does not conflict with
> standard R6RS notation, then both 'get-datum' and 'read' should honor
> that setting.
>
> Does that make sense?
>
It does, and I think this is also what my patch implements, if I
understood both the code and your words correctly :-).
Regards, Rotty
--
Andreas Rottmann -- <http://rotty.xx.vu/>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] Split r6rs-ports.c according to module boundaries
2012-12-09 12:47 ` [PATCH 1/3] Split r6rs-ports.c according to module boundaries Andreas Rottmann
@ 2012-12-15 5:35 ` Mark H Weaver
2012-12-15 5:38 ` Mark H Weaver
1 sibling, 0 replies; 15+ messages in thread
From: Mark H Weaver @ 2012-12-15 5:35 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
Hi Andreas,
This first patch mostly looks good to me, except for some minor problems
with the commit log (see below) and more importantly:
* binary-ports.h needs to be added to the list of headers included by
libguile.h.
Andreas Rottmann <a.rottmann@gmx.at> writes:
> Currently, r6rs-ports.c provides procedures exported from `(ice-9
> binary-ports)'. However, in addition, it also provides
> `%make-transcoder', which is used by `(rnrs io ports)' via a "private
> import".
"Currently" -> "Previously"
"provides" -> "provided" (two places)
"is used by" -> "was used by"
> This change splits r6rs-ports.c into two parts, one specific to `(ice-9
> binary-ports)' and the other specific to `(rnrs io ports)'. This
> provides a place to put internal C-implemented helpers needed only in
> `(rnrs io ports)', thus allowing to eliminate the private import of
> `%make-transcoder'.
>
> * libguile/binary-port.c: New file, contains the most part of what has
> been in libguile/r6rs-ports.c for now.
"for now" -> "before now" or "previously".
> * libguile/binary-ports.h: Ditto for the header part.
>
> * libguile/r6rs-ports.c: Now contains just transcoded ports, as this API
> is not part of the `(ice-9 binary-ports)' module.
> * libguile/r6rs-ports.h: Now includes "libguile/binary-ports.h", and
> defines the extension registration and initialization functions.
>
> * libguile/Makefile.am: Add new files.
> ---
Entries for binary-ports.scm, ports.scm, and libguile.h should be added
to the commit log. It would also be good to mention that
"scm_register_r6rs_ports" was renamed to "scm_register_binary_ports".
Other than that, it looks good to me.
Thanks!
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 1/3] Split r6rs-ports.c according to module boundaries
2012-12-09 12:47 ` [PATCH 1/3] Split r6rs-ports.c according to module boundaries Andreas Rottmann
2012-12-15 5:35 ` Mark H Weaver
@ 2012-12-15 5:38 ` Mark H Weaver
1 sibling, 0 replies; 15+ messages in thread
From: Mark H Weaver @ 2012-12-15 5:38 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
Hi Andreas, one more thing:
Andreas Rottmann <a.rottmann@gmx.at> writes:
> * libguile/binary-port.c: New file, contains the most part of what has
This filename is misspelled. It is missing the "s".
Regards,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-14 3:22 ` Andreas Rottmann
@ 2012-12-16 22:12 ` Mark H Weaver
2012-12-17 19:05 ` Andreas Rottmann
0 siblings, 1 reply; 15+ messages in thread
From: Mark H Weaver @ 2012-12-16 22:12 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
Andreas Rottmann <a.rottmann@gmx.at> writes:
> Mark H Weaver <mhw@netris.org> writes:
>
>> Section 8.3 defines 'read' as follows:
>>
>> Reads an external representation from textual-input-port and returns
>> the datum it represents. The read procedure operates in the same way
>> as get-datum, see section 8.2.9.
>>
>> I believe this last sentence clearly confirms my belief that 'read' and
>> 'get-datum' should recognize the same syntax.
>>
> Well yes, R6RS `read' and R6RS `get-datum' need to understand the same
> syntax, but I thought you were talking about Guile `read' and R6RS
> `get-datum'.
Ah, so you want R6RS 'read' to be different than Guile 'read'.
I think this would be a mistake.
I'd like to allow coherent systems to be built from a mixture of R6RS
code, R7RS code, native Guile code, etc. With this in mind, I think it
would be terribly confusing for users (and not particularly sensible)
for the notation recognized by 'read' to depend upon whether the code
that happens to call 'read' is in an R6RS library or a Guile module.
For example, the code that calls 'read' when compiling source files
happens to be in a Guile module. What does that have to do with the
language being read? Nothing.
> Yup, R6RS `read' needs to be implemented in terms of `get-datum', not
> only because of reader options, but also because of the required
> exception behavior. This is how it's done already -- see
> modules/rnrs/io/simple.scm.
I thought we agreed on IRC that this is an unworkable approach to
supporting R6RS exceptions in Guile. That path leads to a future where
there are two variants of every primitive procedure that might throw
exceptions. It also means duplicating every VM instruction that might
throw exceptions.
Those facts alone would be bad enough, but it gets worse. In a program
composed of a mixture of R6RS and native Guile code, an R6RS exception
handler should be able to properly catch an error that happened within
native Guile code, and vice versa. That won't work with this approach
of throwing R6RS-style exceptions from within R6RS primitives and
Guile-style exceptions within Guile primitives.
IMO, to create a coherent system that allows mixing of code, we need a
single unified exception system that is sufficiently fine-grained (and
provides enough information) to satisfy the needs of both R6RS exception
handlers and legacy Guile exception handlers.
At any given time, there might be exception handlers installed by both
Guile 'catch' and R6RS 'guard'. The code that throws an exception has
no way of knowing which kind of exception handler will catch it.
Therefore, the conversion to native R6RS conditions needs to happen
within the exception handler.
Does that make sense? I thought we discussed this on IRC and agreed on
this general approach.
>> On the flip side, if someone has enabled SRFI-105 curly-infix
>> expressions, or any other reader extension that does not conflict with
>> standard R6RS notation, then both 'get-datum' and 'read' should honor
>> that setting.
>>
>> Does that make sense?
>>
> It does, and I think this is also what my patch implements, if I
> understood both the code and your words correctly :-).
To make this more concrete, let's consider two of the reader options
that you'd apparently like to override within R6RS code:
*** Case insensitivity (you would force case-sensitive mode in R6RS):
R6RS appendix B specifies the following optional reader directives:
#!fold-case
#!no-fold-case
and Guile 2.0.7 now supports this. Your patch would break this when
'read' is used within R6RS code. Furthermore, it would break in a
strange way: #!fold-case or #!no-fold-case would take affect for the
immediately following datum (or the containing datum if the directive is
found within a list), but then the reader would revert to case-sensitive
mode for subsequent datums.
*** Keyword style (you would disallow this option in R6RS):
While it is true that ':' is one of the "extended alphabetic characters"
allowed in identifiers (and therefore the standard requires that :foo be
read as a normal symbol), this has _always_ been the case in every
Scheme standard since at least the R2RS. Nonetheless, some users want a
more convenient syntax for keywords, hence we have this reader option.
It is off by default, but some users prefer to have it on. I don't see
why this setting should be ignored if the code that calls 'read' happens
to be in an R6RS library.
Furthermore, I intend to add another reader directive to set the keyword
option. If you override this option, it will break in the same manner
as for #!fold-case as described above.
I have more to say about this issue, but this is enough for one email :)
Thoughts?
Regards,
Mark
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-16 22:12 ` Mark H Weaver
@ 2012-12-17 19:05 ` Andreas Rottmann
2012-12-17 19:30 ` Noah Lavine
0 siblings, 1 reply; 15+ messages in thread
From: Andreas Rottmann @ 2012-12-17 19:05 UTC (permalink / raw)
To: Mark H Weaver; +Cc: guile-devel
Mark H Weaver <mhw@netris.org> writes:
> Andreas Rottmann <a.rottmann@gmx.at> writes:
>
>> Mark H Weaver <mhw@netris.org> writes:
>>
>>> Section 8.3 defines 'read' as follows:
>>>
>>> Reads an external representation from textual-input-port and returns
>>> the datum it represents. The read procedure operates in the same way
>>> as get-datum, see section 8.2.9.
>>>
>>> I believe this last sentence clearly confirms my belief that 'read' and
>>> 'get-datum' should recognize the same syntax.
>>>
>> Well yes, R6RS `read' and R6RS `get-datum' need to understand the same
>> syntax, but I thought you were talking about Guile `read' and R6RS
>> `get-datum'.
>
> Ah, so you want R6RS 'read' to be different than Guile 'read'.
> I think this would be a mistake.
>
I think that's mandated by R6RS and the fact that Guile offers reader
options that plainly incompatible with the syntax described in R6RS.
> I'd like to allow coherent systems to be built from a mixture of R6RS
> code, R7RS code, native Guile code, etc. With this in mind, I think it
> would be terribly confusing for users (and not particularly sensible)
> for the notation recognized by 'read' to depend upon whether the code
> that happens to call 'read' is in an R6RS library or a Guile module.
>
>
Strictly speaking, it's not whether the code is in a "Guile module" or
"R6RS library" (both are actually "Guile modules" in Guile's
implementation of R6RS), but whether the binding imported for `read' is
Guile's core `scm_read' or the one from `(rnrs io simple)'.
> For example, the code that calls 'read' when compiling source files
> happens to be in a Guile module. What does that have to do with the
> language being read? Nothing.
>
>> Yup, R6RS `read' needs to be implemented in terms of `get-datum', not
>> only because of reader options, but also because of the required
>> exception behavior. This is how it's done already -- see
>> modules/rnrs/io/simple.scm.
>
> I thought we agreed on IRC that this is an unworkable approach to
> supporting R6RS exceptions in Guile. That path leads to a future where
> there are two variants of every primitive procedure that might throw
> exceptions. It also means duplicating every VM instruction that might
> throw exceptions.
>
Yeah, but until exception conversion in the `guard' (or `catch') is
implemented, `get-datum' & co. still need to adhere to the
specification, i.e. throw the exceptions mandated by R6RS in the
circustances described therin. I don't think it is necessary to pull
the implementation strategy for exceptions into this discussion (even if
I mistakenly started with it ;-). The issue of reader options is and
orthogonal, if related, one, IMO.
> Those facts alone would be bad enough, but it gets worse. In a program
> composed of a mixture of R6RS and native Guile code, an R6RS exception
> handler should be able to properly catch an error that happened within
> native Guile code, and vice versa. That won't work with this approach
> of throwing R6RS-style exceptions from within R6RS primitives and
> Guile-style exceptions within Guile primitives.
>
> IMO, to create a coherent system that allows mixing of code, we need a
> single unified exception system that is sufficiently fine-grained (and
> provides enough information) to satisfy the needs of both R6RS exception
> handlers and legacy Guile exception handlers.
>
> At any given time, there might be exception handlers installed by both
> Guile 'catch' and R6RS 'guard'. The code that throws an exception has
> no way of knowing which kind of exception handler will catch it.
> Therefore, the conversion to native R6RS conditions needs to happen
> within the exception handler.
>
> Does that make sense? I thought we discussed this on IRC and agreed on
> this general approach.
>
Yeah, we agreed that this is where we want to arrive at, but please
let's discuss reader options only in this thread.
>>> On the flip side, if someone has enabled SRFI-105 curly-infix
>>> expressions, or any other reader extension that does not conflict with
>>> standard R6RS notation, then both 'get-datum' and 'read' should honor
>>> that setting.
>>>
>>> Does that make sense?
>>>
>> It does, and I think this is also what my patch implements, if I
>> understood both the code and your words correctly :-).
>
> To make this more concrete, let's consider two of the reader options
> that you'd apparently like to override within R6RS code:
>
> *** Case insensitivity (you would force case-sensitive mode in R6RS):
>
> R6RS appendix B specifies the following optional reader directives:
>
> #!fold-case
> #!no-fold-case
>
> and Guile 2.0.7 now supports this. Your patch would break this when
> 'read' is used within R6RS code. Furthermore, it would break in a
> strange way: #!fold-case or #!no-fold-case would take affect for the
> immediately following datum (or the containing datum if the directive is
> found within a list), but then the reader would revert to case-sensitive
> mode for subsequent datums.
>
OK, this makes sense. If we have per-port reader options _actually set_
by the contents of that port, these sensibly should override R6RS
syntax, even if they conflict with the "R6RS standard syntax". However,
taking over *global* reader options that contradict behavior expected by
R6RS code makes no sense. Let me make my intents more clear with an
example as well. Assume you write this piece of R6RS code:
(define (text->datums text)
(call-with-port (open-string-input-port text)
(lambda (port)
(let loop ((lst '()))
(let ((datum (read port)))
(if (eof-object? datum)
(reverse lst)
(cons datum lst)))))))
Now, you call `text->datums' with an argument that is within the allowed
syntax for R6RS (whether optional or non-optional), I want to ensure
that the result of the invocation is conformant with the R6RS syntax.
> *** Keyword style (you would disallow this option in R6RS):
>
> While it is true that ':' is one of the "extended alphabetic characters"
> allowed in identifiers (and therefore the standard requires that :foo be
> read as a normal symbol), this has _always_ been the case in every
> Scheme standard since at least the R2RS. Nonetheless, some users want a
> more convenient syntax for keywords, hence we have this reader option.
>
> It is off by default, but some users prefer to have it on. I don't see
> why this setting should be ignored if the code that calls 'read' happens
> to be in an R6RS library.
>
I think this is my assumption that you seem to disagree on: by using the
binding of `read' from `(rnrs io simple)', instead of the one provided
by Guile's core, the writer of the code using that binding has declared
that he wishes `read' to adhere to R6RS. Your suggestion would break
that code for any users who like to set reader options incompatible with
R6RS. The same was true with R5RS read, but with R6RS, the problem is
sharpened by the presence of libraries (and thus a way to combine code
in modular, defined way).
Let's assume the code in question is `text->datums' as given above,
placed in some library/module, then `(symbol? (car (text->datums
":foo")))' has to hold true, no matter the global reader options.
Otherwise the `read' provided by `(rnrs io simple)' would fail to
implement R6RS. Allowing the user to override syntax on a global level
(as opposed to on an per-port one) means breaking perfectly fine code. I
don't think you can have it both ways.
> Furthermore, I intend to add another reader directive to set the keyword
> option. If you override this option, it will break in the same manner
> as for #!fold-case as described above.
>
That per-port options should override the "language defaults"
(i.e. R6RS-compatible or full range of Guile's global reader options) is
a good point and completely in accordance with R6RS, IMHO: If the text
read from a port contains directives not defined in R6RS, R6RS of course
cannot say anything about the resulting objects read and their
relationship to the source text (imagine a hypothetical #!brainfuck
directive ;-).
Regard, Rotty
--
Andreas Rottmann -- <http://rotty.xx.vu/>
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: Fix reader options for R6RS `get-datum'
2012-12-17 19:05 ` Andreas Rottmann
@ 2012-12-17 19:30 ` Noah Lavine
0 siblings, 0 replies; 15+ messages in thread
From: Noah Lavine @ 2012-12-17 19:30 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: Mark H Weaver, guile-devel
[-- Attachment #1: Type: text/plain, Size: 1709 bytes --]
Hello,
On Mon, Dec 17, 2012 at 2:05 PM, Andreas Rottmann <a.rottmann@gmx.at> wrote:
> I think this is my assumption that you seem to disagree on: by using the
> binding of `read' from `(rnrs io simple)', instead of the one provided
> by Guile's core, the writer of the code using that binding has declared
> that he wishes `read' to adhere to R6RS.
I actually disagree with this assumption too, but let me explain why. I
think there are some situations where it is right and some where it is
wrong.
A program will want strict R6RS semantics (as Rotty says) if it is, for
instance, reading from a data file in a strictly-defined format. In that
case, it might have written its parser assuming an R6RS-compliant `read'
function, and it could fail if it gets an extended `read'.
However, the R6RS reader would be wrong for a program that wants to, say,
read a file of Scheme code as S-expressions and manipulate it. The program
would have to be written using R6RS libraries to be portable, but the right
thing for it to do in any particular Scheme implementation is to read
exactly the same syntax as that implementation. (This is a bit of a
contrived example, but I hope you can see my point that some programs don't
want strict R6RS semantics as much as they want to integrate with whatever
extensions their host implementation has made to `read', and that what they
really want is a portable way to access the built-in `read'.)
If you believe both of these examples, then programs need to be able to
choose their own meaning. Perhaps the solution is a procedure like
`strict-r6rs-read-options' for programs that want that, or alternatively
`native-read-options' for programs that don't.
Regards,
Noah
[-- Attachment #2: Type: text/html, Size: 2091 bytes --]
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 2/3] Add internal API to specify reader options at reader invocation
2012-12-09 12:47 ` [PATCH 2/3] Add internal API to specify reader options at reader invocation Andreas Rottmann
@ 2013-01-21 20:44 ` Andy Wingo
0 siblings, 0 replies; 15+ messages in thread
From: Andy Wingo @ 2013-01-21 20:44 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
On Sun 09 Dec 2012 13:47, Andreas Rottmann <a.rottmann@gmx.at> writes:
> * libguile/private-options.h: Introduce a new enum indexing the read
> options, and use its values as indices for scm_read_opts.
Seems to define struct scm_read_opts as well?
> +SCM scm_i_read (SCM port, const scm_t_read_opts *opts, unsigned int preset)
Comment needed about the role of "preset"
> @@ -2191,8 +2173,8 @@ set_port_read_option (SCM port, int option, int new_value)
> read_options = scm_to_uint (scm_read_options);
> else
> read_options = READ_OPTIONS_INHERIT_ALL;
> - read_options &= ~(READ_OPTION_MASK << option);
> - read_options |= new_value << option;
> + read_options &= ~(READ_OPTION_MASK << (option * 2));
> + read_options |= new_value << (option * 2);
This is getting super-nasty. Some kind of abstraction is needed here,
perhaps a static function.
> @@ -2244,28 +2226,29 @@ init_read_options (SCM port, scm_t_read_opts *opts)
> else
> read_options = READ_OPTIONS_INHERIT_ALL;
>
> + if ((preset & (1 << SCM_READ_OPTION_KEYWORD_STYLE)) == 0) {
> + x = READ_OPTION_MASK & (read_options >> (SCM_READ_OPTION_KEYWORD_STYLE * 2));
Why is this option special? (I have a guess, but a comment seems to be
lacking)
> \f
> +typedef struct scm_struct_read_opts scm_t_read_opts;
No need to infix "struct" into the name; struct tag namespaces are
disjoint from type namespaces.
I have no idea if this patch is good or not; just a drive-by.
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
* Re: [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics
2012-12-09 12:47 ` [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
@ 2013-01-21 20:48 ` Andy Wingo
0 siblings, 0 replies; 15+ messages in thread
From: Andy Wingo @ 2013-01-21 20:48 UTC (permalink / raw)
To: Andreas Rottmann; +Cc: guile-devel
On Sun 09 Dec 2012 13:47, Andreas Rottmann <a.rottmann@gmx.at> writes:
> With Guile's default reader options, R6RS hex escape and EOL escape
> behavior is missing. This change enables the former via the
> `r6rs-hex-escapes' option, and gets us closer to the latter by setting
> `hungry-eol-escapes'.
I would consider exposing an optional second argument to `read', for
read options. Probably Ludo and Mark disagree? But I think for Scheme
the cyclomatic complexity is not that bad, as we are replacing globals
with a parameter that we pass around anyway. I would expose
(current-reader-options) as a parameter, and have it be the default
value. Updating the current read options would update this parameter
somehow. Could we consider a read-enable FOO as updating the binding in
the current dynamic state only?
Maybe I'm totally wrong here :)
Andy
--
http://wingolog.org/
^ permalink raw reply [flat|nested] 15+ messages in thread
end of thread, other threads:[~2013-01-21 20:48 UTC | newest]
Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-12-09 12:47 Fix reader options for R6RS `get-datum' Andreas Rottmann
2012-12-09 12:47 ` [PATCH 1/3] Split r6rs-ports.c according to module boundaries Andreas Rottmann
2012-12-15 5:35 ` Mark H Weaver
2012-12-15 5:38 ` Mark H Weaver
2012-12-09 12:47 ` [PATCH 2/3] Add internal API to specify reader options at reader invocation Andreas Rottmann
2013-01-21 20:44 ` Andy Wingo
2012-12-09 12:47 ` [PATCH 3/3] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
2013-01-21 20:48 ` Andy Wingo
2012-12-11 20:23 ` Fix reader options for R6RS `get-datum' Mark H Weaver
2012-12-12 21:32 ` Andreas Rottmann
2012-12-12 23:48 ` Mark H Weaver
2012-12-14 3:22 ` Andreas Rottmann
2012-12-16 22:12 ` Mark H Weaver
2012-12-17 19:05 ` Andreas Rottmann
2012-12-17 19:30 ` Noah Lavine
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).