unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* [PATCH] Make `get-datum' conform more closely to R6RS semantics
@ 2012-11-04 23:59 Andreas Rottmann
  2012-11-05  5:07 ` Mark H Weaver
  0 siblings, 1 reply; 7+ messages in thread
From: Andreas Rottmann @ 2012-11-04 23:59 UTC (permalink / raw
  To: guile-devel

* module/rnrs/io/ports.scm (get-datum): Set reader options to be more
  compatible with R6RS syntax.

  With Guile's default reader options, R6RS hex escape and EOL escape
  behavior is missing.  This change enables the former via the
  `r6rs-hex-escapes' option, and gets us closer to the latter by setting
  `hungry-eol-escapes'.

* test-suite/tests/r6rs-ports.test ("8.2.9 Textual input")["get-datum"]:
  New tests.
---
 module/rnrs/io/ports.scm         |   13 +++++++++++--
 test-suite/tests/r6rs-ports.test |   28 ++++++++++++++++++++++++++++
 2 files changed, 39 insertions(+), 2 deletions(-)

diff --git a/module/rnrs/io/ports.scm b/module/rnrs/io/ports.scm
index fddb491..6e6a66d 100644
--- a/module/rnrs/io/ports.scm
+++ b/module/rnrs/io/ports.scm
@@ -1,6 +1,6 @@
 ;;;; ports.scm --- R6RS port API                    -*- coding: utf-8 -*-
 
-;;;;	Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
+;;;;	Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
@@ -437,7 +437,16 @@ return the characters accumulated in that port."
   (with-textual-input-conditions port (read-char port)))
 
 (define (get-datum port)
-  (with-textual-input-conditions port (read port)))
+  (with-textual-input-conditions port
+    (let ((saved-options (read-options)))
+      (dynamic-wind
+        (lambda () (read-options '(positions
+                                   keywords #f
+                                   square-brackets
+                                   r6rs-hex-escapes
+                                   hungry-eol-escapes)))
+        (lambda () (read port))
+        (lambda () (read-options saved-options))))))
 
 (define (get-line port)
   (with-textual-input-conditions port (read-line port 'trim)))
diff --git a/test-suite/tests/r6rs-ports.test b/test-suite/tests/r6rs-ports.test
index 46da67f..6a7386e 100644
--- a/test-suite/tests/r6rs-ports.test
+++ b/test-suite/tests/r6rs-ports.test
@@ -719,6 +719,34 @@
       (and (= 3 (get-string-n! port s 6 3))
            (string=? s "Isn't GNU great?"))))
 
+  (with-test-prefix "get-datum"
+    (let ((string->datum (lambda (s) (get-datum (open-input-string s)))))
+      (pass-if "symbol"
+        (eq? (string->datum "foo") 'foo))
+      (pass-if "symbol [starting with colon]"
+        (eq? ':foo (string->datum ":foo")))
+      (pass-if "string"
+        (string=? "foo" (string->datum "\"foo\"")))
+      (pass-if "string [with hex escapes]"
+        (string=? "bar\nA" (string->datum "\"bar\\x0A;\\x41;\"")))
+      (pass-if "string [hungry EOL]"
+        (string=? "bar baz" (string->datum "\"bar \\\n   baz\"")))
+      ;; FIXME: actually, R6RS demands an even more hungry EOL escape
+      ;; than the reader currently implements: also any whitespace
+      ;; between the backslash and the newline should vanish. Currently,
+      ;; the reader barfs on that.
+      (pass-if "string [hungry EOL, space also before newline]"
+        (throw 'unresolved)
+        (string=? "bar baz" (string->datum "\"bar \\  \n   baz\"")))
+      (pass-if "number [decimal]"
+        (= (string->datum "42") 42))
+      (pass-if "number [hexadecimal]"
+        (= (string->datum "#x2A") 42))
+      (pass-if "number [octal]"
+        (= (string->datum "#o0777") 511))
+      (pass-if "number [binary]"
+        (= (string->datum "#b101010") 42))))
+
   (with-test-prefix "read error"
     (pass-if-condition "get-char" i/o-read-error?
       (get-char (make-failing-port)))
-- 
1.7.10.4




^ permalink raw reply related	[flat|nested] 7+ messages in thread

* Re: [PATCH] Make `get-datum' conform more closely to R6RS semantics
  2012-11-04 23:59 [PATCH] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
@ 2012-11-05  5:07 ` Mark H Weaver
  2012-11-05 17:52   ` Ludovic Courtès
  2012-11-06 19:55   ` [PATCH] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
  0 siblings, 2 replies; 7+ messages in thread
From: Mark H Weaver @ 2012-11-05  5:07 UTC (permalink / raw
  To: Andreas Rottmann; +Cc: guile-devel

Hi Andreas,

Andreas Rottmann <a.rottmann@gmx.at> writes:
> * module/rnrs/io/ports.scm (get-datum): Set reader options to be more
>   compatible with R6RS syntax.
>
>   With Guile's default reader options, R6RS hex escape and EOL escape
>   behavior is missing.  This change enables the former via the
>   `r6rs-hex-escapes' option, and gets us closer to the latter by setting
>   `hungry-eol-escapes'.
>
> * test-suite/tests/r6rs-ports.test ("8.2.9 Textual input")["get-datum"]:
>   New tests.
> ---
>  module/rnrs/io/ports.scm         |   13 +++++++++++--
>  test-suite/tests/r6rs-ports.test |   28 ++++++++++++++++++++++++++++
>  2 files changed, 39 insertions(+), 2 deletions(-)
>
> diff --git a/module/rnrs/io/ports.scm b/module/rnrs/io/ports.scm
> index fddb491..6e6a66d 100644
> --- a/module/rnrs/io/ports.scm
> +++ b/module/rnrs/io/ports.scm
> @@ -1,6 +1,6 @@
>  ;;;; ports.scm --- R6RS port API                    -*- coding: utf-8 -*-
>  
> -;;;;	Copyright (C) 2009, 2010, 2011 Free Software Foundation, Inc.
> +;;;;	Copyright (C) 2009, 2010, 2011, 2012 Free Software Foundation, Inc.
>  ;;;;
>  ;;;; This library is free software; you can redistribute it and/or
>  ;;;; modify it under the terms of the GNU Lesser General Public
> @@ -437,7 +437,16 @@ return the characters accumulated in that port."
>    (with-textual-input-conditions port (read-char port)))
>  
>  (define (get-datum port)
> -  (with-textual-input-conditions port (read port)))
> +  (with-textual-input-conditions port
> +    (let ((saved-options (read-options)))
> +      (dynamic-wind
> +        (lambda () (read-options '(positions
> +                                   keywords #f
> +                                   square-brackets
> +                                   r6rs-hex-escapes
> +                                   hungry-eol-escapes)))
> +        (lambda () (read port))
> +        (lambda () (read-options saved-options))))))

The problem with the approach above is that it sets the read options
globally, which is obviously a bad idea in a multithreaded program.

Until very recently there was no other practical option, but now 'read'
starts by building a private struct 'scm_t_read_opts' and passes it down
to all the helper functions explicitly.  This was partly what enable
per-port read options, which are now supported internally and accessible
using reader directives such as #!fold-case, #!no-fold-case, and
#!curly-infix.  It also means that it is now feasible to provide another
'read' procedure that accepts a set of read options explicitly.

I've been avoiding adding a public API for this, because I feel that the
current 'read-options' API is poorly-designed and I'd rather take the
opportunity to come up with a clean design.

For now, I suggest that we add 'get-datum' as a C function in read.c,
which initializes the 'scm_t_read_opts' as needed for R6RS.

Also, while we're on the subject, now that we have per-port read
options, perhaps #!r6rs ought to set some of them instead of being a
no-op.  See 'scm_read_shebang' in read.c.

What do you think?

    Regards,
      Mark



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Make `get-datum' conform more closely to R6RS semantics
  2012-11-05  5:07 ` Mark H Weaver
@ 2012-11-05 17:52   ` Ludovic Courtès
  2012-11-06  7:36     ` Improving the API for read options Mark H Weaver
  2012-11-06 19:55   ` [PATCH] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
  1 sibling, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2012-11-05 17:52 UTC (permalink / raw
  To: guile-devel

Hello!

Mark H Weaver <mhw@netris.org> skribis:

> The problem with the approach above is that it sets the read options
> globally, which is obviously a bad idea in a multithreaded program.
>
> Until very recently there was no other practical option, but now 'read'
> starts by building a private struct 'scm_t_read_opts' and passes it down
> to all the helper functions explicitly. 

Yep, it’s good to see that this work is already beneficial to other
areas.  :-)

> I've been avoiding adding a public API for this, because I feel that the
> current 'read-options' API is poorly-designed and I'd rather take the
> opportunity to come up with a clean design.

What about just mapping the existing ‘read-options’ to something like:

  (set-port-read-options! PORT OPTIONS)

where OPTIONS would be a list of symbols, as for ‘read-options’?  This
seems to me like the obvious API extension, but maybe I’m overlooking
something.

> For now, I suggest that we add 'get-datum' as a C function in read.c,
> which initializes the 'scm_t_read_opts' as needed for R6RS.

Yes, there’s still this possibility if the above one doesn’t work.

Thanks,
Ludo’.




^ permalink raw reply	[flat|nested] 7+ messages in thread

* Improving the API for read options
  2012-11-05 17:52   ` Ludovic Courtès
@ 2012-11-06  7:36     ` Mark H Weaver
  2012-11-06 19:01       ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Mark H Weaver @ 2012-11-06  7:36 UTC (permalink / raw
  To: Ludovic Courtès; +Cc: guile-devel

Hi Ludovic,

ludo@gnu.org (Ludovic Courtès) writes:
>> I've been avoiding adding a public API for this, because I feel that the
>> current 'read-options' API is poorly-designed and I'd rather take the
>> opportunity to come up with a clean design.
>
> What about just mapping the existing ‘read-options’ to something like:
>
>   (set-port-read-options! PORT OPTIONS)
>
> where OPTIONS would be a list of symbols, as for ‘read-options’?  This
> seems to me like the obvious API extension, but maybe I’m overlooking
> something.

Yes, that would be the obvious minimal change to the existing API, but
there are some problems with the existing API that I'd prefer not to
propagate the existing API as-is.

One problem I have with the existing API has to do with its treatment of
square brackets.  There are multiple things that one might want to do
with square brackets, but since 'square-brackets' is a boolean option,
there is no nice way to add additional meanings of square brackets.  We
could add more boolean options, but that raises the question of what to
do when multiple booleans pertaining to square brackets are enabled.
Which one takes precedence?  And there will forever be this wart on the
API that one particular use has the privileged name 'square-brackets'.

One might make a similar complaint about the 'curly-infix' option, but
in that case it's not so simple, because curly-infix affects more than
just curly braces.  Although it allows us to do (almost) whatever we
want to square brackets, it requires that square brackets be delimiters,
and furthermore assigns a meaning to square brackets in one particular
context: within curly braces, when the open square bracket immediately
follows another expression without intervening whitespace.

This makes me wonder if it might be worthwhile to introduce an
additional layer of abstraction between the read options that the user
sees and manipulates, and the low-level read options that the reader
consults.  It might make sense to make options like 'curly-infix' or
'kawa-style-square-brackets' into higher-level views on the low-level
options.

This in turn suggests that the existing API that allows the user to
query the complete list of read options (which correspond precisely to
the options that they directly set) is a poor model.

We might take a lesson from Scheme's treatment of types.  In particular,
Scheme does not provide a way to query the type of an object.  Although
the Scheme standards guarantee that a certain set of types are disjoint,
in general the philosophy is that an object might be of more than one
type.  Therefore, only type predicates are provided, with the
understanding that more than one type predicate might return true for a
given object.

I haven't yet had time to think this through, but my gut instinct is
that I would prefer an API closer to this:

* We provide an opaque type 'read-options' which we reserve the right to
  change at any time, and a set of predefined read-options objects such
  as 'inherit-all-read-options', 'guile-default-read-options',
  'r6rs-read-options', etc.

* We provide procedures to set! and retrieve the read-options object
  to/from a port, and perhaps to/from a global setting.  We might also
  provide a parameter.

* For each user-visible (high-level) read option, we provide a set of
  predicates and mutators for 'read-options' objects.

* We provide a way to explicitly pass a 'read-options' object to the
  reader.

* We define the order in which the 'read-options' objects are checked,
  e.g. first from the explicit parameter (if any), then the per-port
  options, then the fluid (if we add one), then the globals.

* We provide a way for user-defined token readers to accept the
  'read-options' object from the reader, so that it can be propagated
  properly to subordinate readers.

* We need to think about whether (and how) to expose inheritance to the
  user.  We might want to provide ways to reset an option to "inherit"
  mode and to test whether an option is in "inherit" mode.  However, in
  other contexts the user may just want to know about the applicable
  option.

This kind of API would give us more freedom to enhance and generalize
the reader in the future, while providing an easy-to-use and stable API
that users can rely upon.

Obviously, churning through all these issues is a big job, and I don't
think it's realistic to do this job properly in time for 2.0.7.
Therefore, I'd prefer to avoid exposing this API for now.

What do you think?

    Mark



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Improving the API for read options
  2012-11-06  7:36     ` Improving the API for read options Mark H Weaver
@ 2012-11-06 19:01       ` Ludovic Courtès
  2012-11-11  7:56         ` Mark H Weaver
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2012-11-06 19:01 UTC (permalink / raw
  To: Mark H Weaver; +Cc: guile-devel

Mark H Weaver <mhw@netris.org> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>>> I've been avoiding adding a public API for this, because I feel that the
>>> current 'read-options' API is poorly-designed and I'd rather take the
>>> opportunity to come up with a clean design.
>>
>> What about just mapping the existing ‘read-options’ to something like:
>>
>>   (set-port-read-options! PORT OPTIONS)
>>
>> where OPTIONS would be a list of symbols, as for ‘read-options’?  This
>> seems to me like the obvious API extension, but maybe I’m overlooking
>> something.

[...]

> One problem I have with the existing API has to do with its treatment of
> square brackets.  There are multiple things that one might want to do
> with square brackets, but since 'square-brackets' is a boolean option,

For that particular problem, I’d propose:

  (set-port-read-options! PORT OPTIONS)

but with OPTIONS being an list of option/value pairs, or:

  (set-port-read-option! PORT OPTION VALUE)

where OPTION is a symbol.

Nothing fancy, but that should do the job while being reasonably
future-proof, no?

[...]

> I haven't yet had time to think this through, but my gut instinct is
> that I would prefer an API closer to this:
>
> * We provide an opaque type 'read-options' which we reserve the right to
>   change at any time, and a set of predefined read-options objects such
>   as 'inherit-all-read-options', 'guile-default-read-options',
>   'r6rs-read-options', etc.
>
> * We provide procedures to set! and retrieve the read-options object
>   to/from a port, and perhaps to/from a global setting.  We might also
>   provide a parameter.
>
> * For each user-visible (high-level) read option, we provide a set of
>   predicates and mutators for 'read-options' objects.
>
> * We provide a way to explicitly pass a 'read-options' object to the
>   reader.
>
> * We define the order in which the 'read-options' objects are checked,
>   e.g. first from the explicit parameter (if any), then the per-port
>   options, then the fluid (if we add one), then the globals.
>
> * We provide a way for user-defined token readers to accept the
>   'read-options' object from the reader, so that it can be propagated
>   properly to subordinate readers.
>
> * We need to think about whether (and how) to expose inheritance to the
>   user.  We might want to provide ways to reset an option to "inherit"
>   mode and to test whether an option is in "inherit" mode.  However, in
>   other contexts the user may just want to know about the applicable
>   option.
>
> This kind of API would give us more freedom to enhance and generalize
> the reader in the future, while providing an easy-to-use and stable API
> that users can rely upon.

Right.  However, it seems to be addressing a lot more problems (more
like Guile-Reader) than what we were initially discussing, which is to
provide a way for users to set per-port options.

I was really hoping that a dumb solution as I proposed would be enough.
:-)

WDYT?

(Regarding reader extensibility, I’ve become dubious about the whole
idea of reusable and composable token readers.  I think readers should
rather be generated from SILex-style declarations, and we could have
several of them pre-generated.  And the ‘scm_t_read opts’ may not scale
well in terms of cyclomatic complexity.)

Thanks,
Ludo’.



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: [PATCH] Make `get-datum' conform more closely to R6RS semantics
  2012-11-05  5:07 ` Mark H Weaver
  2012-11-05 17:52   ` Ludovic Courtès
@ 2012-11-06 19:55   ` Andreas Rottmann
  1 sibling, 0 replies; 7+ messages in thread
From: Andreas Rottmann @ 2012-11-06 19:55 UTC (permalink / raw
  To: Mark H Weaver; +Cc: guile-devel

Mark H Weaver <mhw@netris.org> writes:

> Hi Andreas,
>
> Andreas Rottmann <a.rottmann@gmx.at> writes:
>> * module/rnrs/io/ports.scm (get-datum): Set reader options to be more
>>   compatible with R6RS syntax.
>>
>>   With Guile's default reader options, R6RS hex escape and EOL escape
>>   behavior is missing.  This change enables the former via the
>>   `r6rs-hex-escapes' option, and gets us closer to the latter by setting
>>   `hungry-eol-escapes'.
>>
>> * test-suite/tests/r6rs-ports.test ("8.2.9 Textual input")["get-datum"]:
>>   New tests.
>> ---
>>  module/rnrs/io/ports.scm         |   13 +++++++++++--
>>  test-suite/tests/r6rs-ports.test |   28 ++++++++++++++++++++++++++++
>>  2 files changed, 39 insertions(+), 2 deletions(-)
>>
>> [... patch elided ... ]

>
> The problem with the approach above is that it sets the read options
> globally, which is obviously a bad idea in a multithreaded program.
>
Oops, yes. I was under the impression that the read options are stored
in a fluid, but I didn't verify that assumption.

> Until very recently there was no other practical option, but now 'read'
> starts by building a private struct 'scm_t_read_opts' and passes it down
> to all the helper functions explicitly.  This was partly what enable
> per-port read options, which are now supported internally and accessible
> using reader directives such as #!fold-case, #!no-fold-case, and
> #!curly-infix.  It also means that it is now feasible to provide another
> 'read' procedure that accepts a set of read options explicitly.
>
> I've been avoiding adding a public API for this, because I feel that the
> current 'read-options' API is poorly-designed and I'd rather take the
> opportunity to come up with a clean design.
>
> For now, I suggest that we add 'get-datum' as a C function in read.c,
> which initializes the 'scm_t_read_opts' as needed for R6RS.
>
Yup, I've seen that change while skimming over recent changes in
stable-2.0, and have come to the conclusion that a solution that would
directly call `read' with an R6RS set of options would be better. My
patch has originated some way back -- I'll rework it along the lines of
your proposal.

> Also, while we're on the subject, now that we have per-port read
> options, perhaps #!r6rs ought to set some of them instead of being a
> no-op.  See 'scm_read_shebang' in read.c.
>
Yep, I think that's a good idea. This can also spare us from setting
them as part of an --r6rs command-line option, which I suggested in the
%load-extensions thread
<http://lists.gnu.org/archive/html/guile-devel/2012-11/msg00018.html>.

Regards, Rotty
-- 
Andreas Rottmann -- <http://rotty.yi.org/>



^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Improving the API for read options
  2012-11-06 19:01       ` Ludovic Courtès
@ 2012-11-11  7:56         ` Mark H Weaver
  0 siblings, 0 replies; 7+ messages in thread
From: Mark H Weaver @ 2012-11-11  7:56 UTC (permalink / raw
  To: Ludovic Courtès; +Cc: guile-devel

ludo@gnu.org (Ludovic Courtès) writes:

> Mark H Weaver <mhw@netris.org> skribis:
>
>> One problem I have with the existing API has to do with its treatment of
>> square brackets.  There are multiple things that one might want to do
>> with square brackets, but since 'square-brackets' is a boolean option,
>
> For that particular problem, I’d propose:
>
>   (set-port-read-options! PORT OPTIONS)
>
> but with OPTIONS being an list of option/value pairs, or:
>
>   (set-port-read-option! PORT OPTION VALUE)
>
> where OPTION is a symbol.
>
> Nothing fancy, but that should do the job while being reasonably
> future-proof, no?

Okay, if we limit the API to setting per-port options for now (and
leaving any unspecified options unchanged), I guess that's fine.

I'll work on it.

    Mark



^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2012-11-11  7:56 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2012-11-04 23:59 [PATCH] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann
2012-11-05  5:07 ` Mark H Weaver
2012-11-05 17:52   ` Ludovic Courtès
2012-11-06  7:36     ` Improving the API for read options Mark H Weaver
2012-11-06 19:01       ` Ludovic Courtès
2012-11-11  7:56         ` Mark H Weaver
2012-11-06 19:55   ` [PATCH] Make `get-datum' conform more closely to R6RS semantics Andreas Rottmann

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).