unofficial mirror of bug-guile@gnu.org 
* bug#38398: non-obvious SCM_EOF_VAL rationale
@ 2019-11-27  7:44 Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language
  2019-11-27  8:55 ` John Cowan
  0 siblings, 1 reply; 4+ messages in thread
From: Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language @ 2019-11-27  7:44 UTC (permalink / raw)
  To: 38398

The part of the Guile manual on the representation of immediate objects
says:

# -- Macro: SCM SCM_EOF_VAL
#     The Scheme end-of-file value.  It has no standard written
#     representation, for obvious reasons.

I disagree with the manual: the reasons for the EOF value having no
s-expression representation are not at all obvious.  It's fairly obvious
that it's a value that can't be returned by read-char, and therefore is
not itself a character, but that's quite a different matter.  The lack of
s-expression representation actually comes from the entirely unobvious,
and undocumented in Guile, use of the EOF value with the read function.

In the RnRS series, the concept of an EOF object appears in R2RS, and
remains essentially unchanged from there.  (The only difference is that
R6RS specifies that there is one EOF object, whereas all others allow
for multiple EOF objects.)  They all specify that if the read function
encounters EOF then it will return an EOF object, and in order to support
that usage they also specify that EOF objects can never be returned
by read.  This poor design precludes RnRS specifying read syntax for any
EOF object.  The relationship here is fairly obvious, but only once one
is aware of this rather surprising use of EOF objects by read.

The situation in Guile is more muddied.  Because Guile supports the
"#." syntax for read-time evaluation, it actually *is* possible for the
read function to return an EOF object without having reached EOF:

$ echo '#.(eof-object)' | guile-2.2 -c '(fluid-set! read-eval? #t) (use-modules (rnrs io simple)) (write (read)) (newline)'
#<eof>

This is technically a violation of RnRS, but I have no complaint
about breaking such an onerous rule in these circumstances where it's
necessitated only by such a poor design decision.  Anyway, it means
that the RnRS rationale for having no s-expression representation for
the EOF object *doesn't apply* to Guile.  There's also precedent, in
"#nil", for Guile extending read syntax beyond RnRS for immediate objects.
So it seems to me that you are quite free to invent some readable syntax
such as "#eof" for the EOF object.

So, to resolve this, firstly you should add to the documentation of
the read function some text about its behaviour on EOF (on which it
is currently silent).  Perhaps also add some text about the ambiguity
of read returning the EOF object.  Then you should remove the ", for
obvious reasons" part of the SCM_EOF_VAL documentation.  After that you
have a choice.  You could leave the lack of s-expression representation
unexplained.  Alternatively you could attempt an actual explanation, which
in the minimal form would be "so that without the use of the non-standard
read-time-evaluation facility it can't be returned by the read function
in non-end-of-file situations, which would cause an ambiguity".  For Guile
2.4 you could instead add a read syntax for it and document that.

-zefram


* bug#38398: non-obvious SCM_EOF_VAL rationale
  2019-11-27  7:44 bug#38398: non-obvious SCM_EOF_VAL rationale Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language
@ 2019-11-27  8:55 ` John Cowan
  2019-11-27 12:05   ` Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language
  0 siblings, 1 reply; 4+ messages in thread
From: John Cowan @ 2019-11-27  8:55 UTC (permalink / raw)
  To: Zefram; +Cc: 38398


On Wed, Nov 27, 2019 at 2:45 AM Zefram via Bug reports for GUILE, GNU's
Ubiquitous Extension Language <bug-guile@gnu.org> wrote:


> It's fairly obvious
> that it's a value that can't be returned by read-char, and therefore is
> not itself a character, but that's quite a different matter.


On the contrary:  the EOF object is not a character, but it *can* be
returned by read-char.  Indeed it *is* returned by read-char exactly when
read-char is called after the last character of its input port has been
read.  This makes it possible to distinguish between two cases: read-char
returns a character if there are any in the input port, and the EOF object
if there are none.

By the same token, read can return either a datum value or an EOF object.
It returns a datum value if the remaining characters in its input port
constitute at least one datum (what R6RS calls an "external
representation") or the EOF object if no characters are available, and
raises an exception if the available characters do not constitute a datum.
An input port containing just "(", for example, will not return an EOF
object; it will raise an exception.
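
A minimal sketch of these three outcomes, assuming Guile and string
ports.  The helper name read-outcome is hypothetical and not part of
Guile; it only classifies what the first call to read does on a given
input string.

```scheme
;; Hypothetical helper: classify the first `read` on TEXT as
;; (datum . value), 'eof, or 'error.
(define (read-outcome text)
  (catch #t
    (lambda ()
      (let ((x (call-with-input-string text read)))
        (if (eof-object? x)
            'eof
            (cons 'datum x))))
    ;; Any exception raised by the reader is reported as 'error.
    (lambda (key . args) 'error)))

(read-outcome "42")   ; => (datum . 42)   a complete datum
(read-outcome "")     ; => eof            no characters available
(read-outcome "(")    ; => error          incomplete datum
```

Catching every exception with #t keeps the sketch short; real code
would more likely catch only the reader's own error key.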


> The lack of
> s-expression representation actually comes from the entirely unobvious,
> and undocumented in Guile, use of the EOF value with the read function.
>

It's true that section 6.18.2 of the Guile 2.2.x manual is rather terse and
does not document this behavior.  However, section 4.1 says that Guile is
fully compliant with R5RS.  This means that it incorporates by reference
the R5RS specification, and in particular section 6.6.2, which restates at
greater length the rules I have given above.  The definition of read in
R6RS defers to the definition of get-datum (both are in library section
8.2.9), which is yet another restatement of the same rules.


> This poor design precludes RnRS specifying read syntax for any
> EOF object.


Why do you believe it to be a poor design?  It seems quite appropriate to
me for the EOF object not to be a datum value, for the same reason that it
should not be a character.  You nowhere state what purpose such a read
syntax would serve.  Do you wish to be able to use read to input a list of
EOF objects, for instance?  What would you do with them?



John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
Pour moi, les villes du Silmarillion ont plus de realite que Babylone.
                --Christopher Tolkien, as interviewed by Le Monde


* bug#38398: non-obvious SCM_EOF_VAL rationale
  2019-11-27  8:55 ` John Cowan
@ 2019-11-27 12:05   ` Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language
  2019-11-27 12:34     ` tomas
  0 siblings, 1 reply; 4+ messages in thread
From: Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language @ 2019-11-27 12:05 UTC (permalink / raw)
  To: John Cowan; +Cc: 38398

John Cowan wrote:
>On the contrary:  the EOF object is not a character, but it *can* be
>returned by read-char .

Bother.  Of course I meant "can't be returned by read-char in a non-EOF
situation".  I was alluding precisely to it being distinguishable from
characters for the purposes of that return convention.

>                                  However, section 4.1 says that Guile is
>fully compliant with R5RS.

And yet, as I noted, it's actually non-compliant, in a way that's directly
relevant to this issue.

>Why do you believe it to be a poor design?

Because it makes it impossible to distinguish between reaching EOF and
reading a value that is otherwise a perfectly good one.  Or, from the
other point of view, because it requires that read syntax be crippled
specifically to prevent this one value ever being a genuine result
of reading.  read-char is free to use a distinguished return value
for EOF because the things it can read in a non-EOF situation form an
obviously-constrained subset of values.  The nature of the read function,
however, is that it can read basically any value, so there is no obvious
place for a distinguished value for EOF.

Although the RnRS read syntax doesn't cover absolutely all values,
when extending the read syntax it's quite easy, even unintentionally,
to make it capable of reading types of object that RnRS doesn't imagine
being readable.  Indeed, not only does Guile have the occasionally-useful
"#.", which makes absolutely all values readable, it's also got the
read-hash-extend system, which invites casual extension, and does nothing
to prevent user extensions returning the EOF object.
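
A small sketch of that last point, assuming the reader passes a
read-hash-extend handler's return value through unchanged, as claimed
above.  The "#$" syntax and the name the-eof are made up for
illustration and are not existing Guile features.

```scheme
;; Obtain the EOF object without extra modules: read-char on an
;; empty string port returns it.
(define the-eof (read-char (open-input-string "")))

;; Hypothetical extension: make "#$" read as the EOF object.
(read-hash-extend #\$ (lambda (chr port) the-eof))

(define port (open-input-string "#$ 42"))

;; If the assumption holds, the first result satisfies eof-object?
;; even though the port still holds a further datum.
(define first-result (read port))
(define second-result (read port))
```

A caller that loops until read returns an EOF object would stop after
the first result here and never see the 42.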

So it makes much more sense to embrace the ability of read to read
any value whatsoever, and to use some other mechanism to signal EOF.
Common Lisp, for example, which has "#." as standard, specifies that
read is to signal an error by default if it's at EOF.

>                                            It seems quite appropriate to
>me for the EOF object not to be a datum value, for the same reason that it
>should not be a character.  You nowhere state what purpose such a read
>syntax would serve.

You're making a bit of a leap here, if there's meant to be some causal
connection between these two sentences.  By "such a read syntax" you seem
to be referring to my "#eof" suggestion, but the case against the RnRS
design of read doesn't depend at all on whether there's a read syntax
specifically for that object.

The use of a distinguished EOF return value from read, and the consequent
rationale for not having a specific read syntax for the EOF object, is
founded on the idea that read can't return the EOF object *at all* in a
non-EOF situation.  This is undermined for Guile by the already-existing
"#." and read-hash-extend, without any need to invent new syntax.

To answer the second sentence in isolation: it would serve about the
same use as "#nil", making it easier to reference this useful object,
and extending the scope within which write-read round-tripping works.
I don't have strong feelings about having a specific read syntax,
it's just that this kind of distinguished object usually does have
specific syntax ("()", "#t", "#nil").  However, not every other object
like this has a read syntax; Guile's `unspecified' value is another one
that doesn't.  (Tangent: the unspecified value could equally well do
with a read syntax, but through testing with "#.*unspecified*" I note
that at present weird behaviour results from actually reading it.)

>                     Do you wish to be able to use read to input a list of
>EOF objects, for instance?  What would you do with them?

In code, I can imagine using a quoted EOF object in order to return
it from a function that's following something like read-char's return
convention, or to pass it to a function that expects values following
a similar convention.  Also to pass it to something like memq, for the
purposes of testing a value that could be the EOF object.  (A quoted
EOF object currently works in the interpreter but not in the compiler.)
In data, I imagine the EOF object would appear because of much the
same situations: it got returned from something like read-char, or it's
going to be fed to something that expects to occasionally receive the
EOF object.  Stick them in a list?  Sure, a list of values on its way
from A to B could well include an EOF object.
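
As a rough illustration of the memq case, here is a sketch that builds
the list at run time, since there is no read syntax to quote.  The names
stop-values and stop-value? are hypothetical; eof-object is taken from
(rnrs io simple), as in the transcript in my first message.

```scheme
(use-modules ((rnrs io simple) #:select (eof-object)))

;; With no read syntax for the EOF object, build the list at run time
;; instead of writing it as a quoted literal.
(define stop-values (list (eof-object) #\newline))

;; memq compares with eq?, which is enough in Guile for immediate
;; objects such as the EOF object and characters.
(define (stop-value? x)
  (and (memq x stop-values) #t))

(stop-value? (read-char (open-input-string "")))  ; EOF from a real port
(stop-value? #\a)
```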

But please don't get sidetracked.  This wasn't a feature request for
"#eof"; that's just an idea that idly arose from consideration of the
rationale in question.  The issue that I'm seeking to get resolved is
that the documentation says the reason for the EOF object having no
specific read syntax is obvious, when in context it's really not.

-zefram


* bug#38398: non-obvious SCM_EOF_VAL rationale
  2019-11-27 12:05   ` Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language
@ 2019-11-27 12:34     ` tomas
  0 siblings, 0 replies; 4+ messages in thread
From: tomas @ 2019-11-27 12:34 UTC (permalink / raw)
  To: 38398


On Wed, Nov 27, 2019 at 12:05:34PM +0000, Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language wrote:

[...]

> But please don't get sidetracked.  This wasn't a feature request for
> "#eof" [...]

To be fair, you contributed strongly to this side-tracking. By waving
a big red flag:

  "This poor design precludes RnRS specifying read syntax for any
   EOF object [...]"

you yourself drew attention to the underlying issues of the design
instead of keeping things focused on the documentation.

I agree that the doc could improve in this case...

Cheers
-- tomás
