* bug#38398: non-obvious SCM_EOF_VAL rationale
From: Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language @ 2019-11-27 7:44 UTC
To: 38398

The part of the Guile manual on the representation of immediate objects says:

#  -- Macro: SCM SCM_EOF_VAL
#      The Scheme end-of-file value.  It has no standard written
#      representation, for obvious reasons.

I disagree with the manual: the reasons for the EOF value having no s-expression representation are not at all obvious. It's fairly obvious that it's a value that can't be returned by read-char, and therefore is not itself a character, but that's quite a different matter. The lack of s-expression representation actually comes from the entirely unobvious, and undocumented in Guile, use of the EOF value with the read function.

In the RnRS series, the concept of an EOF object appears in R2RS, and remains essentially unchanged from there. (The only difference is that R6RS specifies that there is one EOF object, whereas all others allow for multiple EOF objects.) They all specify that if the read function encounters EOF then it will return an EOF object, and in order to support that usage they also specify that EOF objects can never be returned by read. This poor design precludes RnRS specifying read syntax for any EOF object. The relationship here is fairly obvious, but only once one is aware of this rather surprising use of EOF objects by read.

The situation in Guile is more muddied. Because Guile supports the "#." syntax for read-time evaluation, it actually *is* possible for the read function to return an EOF object without having reached EOF:

    $ echo '#.(eof-object)' | guile-2.2 -c '(fluid-set! read-eval? #t) (use-modules (rnrs io simple)) (write (read)) (newline)'
    #<eof>

This is technically a violation of RnRS, but I have no complaint about breaking such an onerous rule in these circumstances where it's necessitated only by such a poor design decision. Anyway, it means that the RnRS rationale for having no s-expression representation for the EOF object *doesn't apply* to Guile. There's also precedent, in "#nil", for Guile extending read syntax beyond RnRS for immediate objects. So it seems to me that you are quite free to invent some readable syntax such as "#eof" for the EOF object.

So, to resolve this, firstly you should add to the documentation of the read function some text about its behaviour on EOF (on which it is currently silent). Perhaps also add some text about the ambiguity of read returning the EOF object. Then you should remove the ", for obvious reasons" part of the SCM_EOF_VAL documentation. After that you have a choice. You could leave the lack of s-expression representation unexplained. Alternatively you could attempt an actual explanation, which in the minimal form would be "so that without the use of the non-standard read-time-evaluation facility it can't be returned by the read function in non-end-of-file situations, which would cause an ambiguity". For Guile 2.4 you could instead add a read syntax for it and document that.

-zefram

* bug#38398: non-obvious SCM_EOF_VAL rationale
From: John Cowan @ 2019-11-27 8:55 UTC
To: Zefram; +Cc: 38398

On Wed, Nov 27, 2019 at 2:45 AM Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language <bug-guile@gnu.org> wrote:

> It's fairly obvious
> that it's a value that can't be returned by read-char, and therefore is
> not itself a character, but that's quite a different matter.

On the contrary: the EOF object is not a character, but it *can* be returned by read-char. Indeed it *is* returned by read-char just in case read-char is called after the last character of its input port has been read. This makes it possible to distinguish between two cases: read-char returns a character if there are any in the input port, and the EOF object if there are none.

By the same token, read can return either a datum value or an EOF object. It returns a datum value if the remaining characters in its input port constitute at least one datum (what R6RS calls an "external representation") or the EOF object if no characters are available, and raises an exception if the available characters do not constitute a datum. An input port containing just "(", for example, will not return an EOF object; it will raise an exception.

> The lack of
> s-expression representation actually comes from the entirely unobvious,
> and undocumented in Guile, use of the EOF value with the read function.

It's true that section 6.18.2 of the Guile 2.2.x manual is rather terse and does not document this behavior. However, section 4.1 says that Guile is fully compliant with R5RS. This means that it incorporates by reference the R5RS specification, and in particular section 6.6.2, which restates at greater length the rules I have given above. The definition of read in R6RS defers to the definition of get-datum (both are in library section 8.2.9), which is yet another restatement of the same rules.

> This poor design precludes RnRS specifying read syntax for any
> EOF object.

Why do you believe it to be a poor design? It seems quite appropriate to me for the EOF object not to be a datum value, for the same reason that it should not be a character. You nowhere state what purpose such a read syntax would serve. Do you wish to be able to use read to input a list of EOF objects, for instance? What would you do with them?

John Cowan          http://vrici.lojban.org/~cowan        cowan@ccil.org
For me, the cities of the Silmarillion have more reality than Babylon.
        --Christopher Tolkien, as interviewed by Le Monde

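To make the return convention described in the message above concrete, here is a minimal sketch in Guile Scheme. The helper name read-all is hypothetical (it does not come from the thread or the Guile manual); the sketch assumes only the standard procedures read, eof-object? and call-with-input-string.

```scheme
;; Hypothetical helper (not from the thread): collect every datum that
;; `read' yields from PORT, stopping at the first EOF object.
(define (read-all port)
  (let loop ((acc '()))
    (let ((datum (read port)))
      (if (eof-object? datum)
          (reverse acc)               ; EOF object: treat as end of input
          (loop (cons datum acc))))))

;; Three datums, then end of input.
(write (call-with-input-string "1 (a b) \"c\"" read-all))
(newline)
```

Run as a script, this should print (1 (a b) "c"). Because the loop treats any EOF object as end of input, it cannot tell a genuine end of file from a reader extension that hands back the EOF object mid-stream, such as the "#." case raised in this thread; that is the ambiguity under discussion.
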
* bug#38398: non-obvious SCM_EOF_VAL rationale
From: Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language @ 2019-11-27 12:05 UTC
To: John Cowan; +Cc: 38398

John Cowan wrote:
>On the contrary: the EOF object is not a character, but it *can* be
>returned by read-char .

Bother. Of course I meant "can't be returned by read-char in a non-EOF situation". I was alluding precisely to it being distinguishable from characters for the purposes of that return convention.

>                             However, section 4.1 says that Guile is
>fully compliant with R5RS.

And yet, as I noted, it's actually non-compliant, in a way that's directly relevant to this issue.

>Why do you believe it to be a poor design?

Because it makes it impossible to distinguish between reaching EOF and reading a value that is otherwise a perfectly good one. Or, from the other point of view, because it requires that read syntax be crippled specifically to prevent this one value ever being a genuine result of reading.

read-char is free to use a distinguished return value for EOF because the things it can read in a non-EOF situation form an obviously-constrained subset of values. The nature of the read function, however, is that it can read basically any value, so there is no obvious place for a distinguished value for EOF. Although the RnRS read syntax doesn't cover absolutely all values, when extending the read syntax it's quite easy, even unintentionally, to make it capable of reading types of object that RnRS doesn't imagine being readable. Indeed, not only does Guile have the occasionally-useful "#.", which makes absolutely all values readable, it's also got the read-hash-extend system, which invites casual extension, and does nothing to prevent user extensions returning the EOF object.

So it makes much more sense to embrace the ability of read to read any value whatsoever, and to use some other mechanism to signal EOF. Common Lisp, for example, which has "#." as standard, specifies that read is to signal an error by default if it's at EOF.

>                                          It seems quite appropriate to
>me for the EOF object not to be a datum value, for the same reason that it
>should not be a character.  You nowhere state what purpose such a read
>syntax would serve.

You're making a bit of a leap here, if there's meant to be some causal connection between these two sentences. By "such a read syntax" you seem to be referring to my "#eof" suggestion, but the case against the RnRS design of read doesn't depend at all on whether there's a read syntax specifically for that object. The use of a distinguished EOF return value from read, and the consequent rationale for not having a specific read syntax for the EOF object, is founded on the idea that read can't return the EOF object *at all* in a non-EOF situation. This is undermined for Guile by the already-existing "#." and read-hash-extend, without any need to invent new syntax.

To answer the second sentence in isolation: it would serve about the same use as "#nil", making it easier to reference this useful object, and extending the scope within which write-read round-tripping works. I don't have strong feelings about having a specific read syntax, it's just that this kind of distinguished object usually does have specific syntax ("()", "#t", "#nil"). However, not every other object like this has a read syntax; Guile's `unspecified' value is another one that doesn't. (Tangent: the unspecified value could equally well do with a read syntax, but through testing with "#.*unspecified*" I note that at present weird behaviour results from actually reading it.)

>                     Do you wish to be able to use read to input a list of
>EOF objects, for instance?  What would you do with them?

In code, I can imagine using a quoted EOF object in order to return it from a function that's following something like read-char's return convention, or to pass it to a function that expects values following a similar convention. Also to pass it to something like memq, for the purposes of testing a value that could be the EOF object. (A quoted EOF object currently works in the interpreter but not in the compiler.)

In data, I imagine the EOF object would appear because of much the same situations: it got returned from something like read-char, or it's going to be fed to something that expects to occasionally receive the EOF object. Stick them in a list? Sure, a list of values on its way from A to B could well include an EOF object.

But please don't get sidetracked. This wasn't a feature request for "#eof"; that's just an idea that idly arose from consideration of the rationale in question. The issue that I'm seeking to get resolved is that the documentation says the reason for the EOF object having no specific read syntax is obvious, when in context it's really not.

-zefram

* bug#38398: non-obvious SCM_EOF_VAL rationale
From: tomas @ 2019-11-27 12:34 UTC
To: 38398

On Wed, Nov 27, 2019 at 12:05:34PM +0000, Zefram via Bug reports for GUILE, GNU's Ubiquitous Extension Language wrote:

[...]

> But please don't get sidetracked.  This wasn't a feature request for
> "#eof" [...]

To be fair, you contributed strongly to this side-tracking. By waving a big red flag:

  "This poor design precludes RnRS specifying read syntax for any
   EOF object [...]"

you yourself drew attention to the underlying issues of the design instead of keeping things focused to the documentation.

I agree that the doc could improve in this case...

Cheers
-- tomás