Hi,

On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
> On Sun, 26 Feb 2023 at 08:46, wrote:
> > Message: 1
> > Date: Sun, 26 Feb 2023 02:45:12 -0500
> > From: "Philip McGrath"
> > To: "Maxime Devos" , Ludovic Courtès
> >     , "Matt Wette" ,
> >     guile-devel@gnu.org
> > Cc: "Christine Lemmer-Webber"
> > Subject: Re: [PATCH] add language/wisp to Guile?
> > Message-ID: <981b0e74-96c0-4430-b693-7fc8026e3ead@app.fastmail.com>
> > Content-Type: text/plain;charset=utf-8
> >
> > [...]
>
> I would like to make two remarks, which I think are essential to get
> the semantics right.
>
> The R6RS comments of the form "#!r6rs" are defined to modify the
> lexical syntax of the reader; possibly, they don't change the language
> semantics (after reading). In particular, "#!r6rs" also applies to
> data files but does not affect the interpretation of the data after it
> is read. It cannot because the reader otherwise ignores and does not
> report comments.
>
> Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
> is not a substitute for Racket's "#lang" (or a similar mechanism).
> Guile shouldn't confuse these two different levels of meaning.
>

I agree that it's important to distinguish between lexical syntax
(`read`) and the semantics of what is read. However, Racket's `#lang` in
fact operates entirely at the level of `read`. (Racketeers contribute to
confusion on this point by using `#lang` as a shorthand for Racket's
entire language-creation infrastructure, when in fact `#lang`
specifically has a fairly small, though important, role.) When `read`
encounters `#lang something`, it looks up a reader extension procedure in
the module indicated by `something` and uses that procedure to continue
parsing the input stream into data.
Importantly, while syntax objects may be used to attach source location
information, there is no "lexical context" or binding information at this
stage, as one familiar with syntax objects from macro writing might
expect: those semantics come after `read` has finished parsing the input
stream from bytes to values. Nonetheless, because the reader extension
procedure has complete control over what data is read, a `#lang`
language's reader can produce data with the semantics the language's
author intends.

Before I go on to give some examples, I would note that of course R6RS
gives implementations great freedom in how they use
`#!implementation-specific` to extend the lexical syntax, and some ways
that some implementations use it would be awkward at best to express by
treating `#!` as an alternate spelling for `#lang `: Chez Scheme's
`#!base-rtd` and `#!bwp` come to mind in that category.

Now some examples, where `with-module-reading-parameterization`
configures `read` to the right state for reading a module. (When reading
generic data, reader extension procedures are turned off by default.)

```
philip@bastet:~$ racket
Welcome to Racket v8.8 [cs].
-> (require syntax/modread)
-> (define (read-module str)
     (with-module-reading-parameterization
      (lambda ()
        (read (open-input-string str)))))
-> (read-module "#lang racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket
   (#%module-begin
    (eqv? 1.5 (/ 3 2))))
-> (read-module "#lang exact-decimal racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket
   (#%module-begin
    (eqv? 3/2 (/ 3 2))))
```

Here the `exact-decimal` meta-language parameterizes the call to the base
language's reader, changing lexical syntax like "1.5" to be read as an
exact number instead of an inexact number. Chez Scheme's `#!fold-case`
and `#!no-fold-case` directives are somewhat similar in spirit.

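A similar effect can be sketched with nothing but `racket/base`. This is
only an illustration: it assumes that a meta-language like
`exact-decimal` works by setting the standard reader parameter
`read-decimal-as-inexact`, which may not be how the package is actually
implemented, and `read-with-decimals` is a hypothetical helper name.

```racket
#lang racket/base

;; Read one datum from `str`, choosing how decimals like "1.5" are parsed.
;; `read-decimal-as-inexact` is a standard reader parameter in racket/base;
;; `read-with-decimals` is a hypothetical helper name for this sketch.
(define (read-with-decimals str inexact?)
  (parameterize ([read-decimal-as-inexact inexact?])
    (read (open-input-string str))))

(define text "(eqv? 1.5 (/ 3 2))")

;; Same characters, two different data:
(define inexact-datum (read-with-decimals text #t)) ; second element is 1.5
(define exact-datum   (read-with-decimals text #f)) ; second element is 3/2

;; Only after `read` is done does evaluation give the data a meaning.
(define ns (make-base-namespace))
(eval inexact-datum ns) ; #f: 1.5 is inexact, (/ 3 2) is exact
(eval exact-datum ns)   ; #t: both are the exact rational 3/2
```

The reader setting changes only which datum comes out of `read`; the
different results of `eqv?` follow from ordinary evaluation of that
datum.
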
Most Racket `#lang` languages are more like this example:

```
-> (read-module "#lang racket (lambda (x) (+ 1 x))")
'(module anonymous-module racket
   (#%module-begin
    (lambda (x) (+ 1 x))))
-> (read-module "#lang typed/racket (lambda (x) (+ 1 x))")
'(module anonymous-module typed/racket
   (#%module-begin
    (lambda (x) (+ 1 x))))
```

Here, the lexical syntax is the same: the only difference is whether the
resulting `module` form has `racket` or `typed/racket` as its "initial
import" or "language". However, the semantics are starkly different: the
`#lang racket` example is a perfectly good program, but attempting to
macro-expand the `#lang typed/racket` program fails with the following
error:

```
Type Checker: type mismatch
  expected: Number
  given: Any
  in: x
```

Much of the power of Racket's language infrastructure comes from each
`module` form specifying its "language"/"initial import", which dates
from "Composable and Compilable Macros: You Want it When?" (ICFP 2002)
[1], long before the addition of `#lang`. You can write a `module` form
explicitly, without using a `#lang`-based reader extension. Doing this
sort of thing is less elegant and convenient with R6RS `library` forms,
but it's certainly possible to produce an R6RS library form where
`lambda` and `+` have semantics different than the versions exported from
the `(rnrs)` library.

For a variation on the theme, consider:

```
-> (read-module "#lang web-server
(require racket/serialize)
(serialize (λ (x) x))")
'(module anonymous-module web-server/lang
   (#%module-begin
    (require racket/serialize)
    (serialize (λ (x) x))))
```

First, note that the resulting datum refers to `web-server/lang`, not
`web-server`: in this particular case, the mismatch is a historical
accident that confused me once (so I remember it), but it does illustrate
how `#lang` maintains the distinction between the reader and the
expander. The example also illustrates semantic change.
The `#%module-begin` form from the web server language performs a
whole-module transformation on its subforms to make continuations
serializable, thus producing "Automatically RESTful Web Applications"
[2]. Likewise, its `lambda` form, unlike the version from `racket/base`,
creates serializable closures.

Finally, here's a more dramatic example:

```
-> (define (read-module* str)
     (with-module-reading-parameterization
      (lambda ()
        ;; avoid a bug
        (syntax->datum
         (read-syntax #f (open-input-string str))))))
-> (read-module* "#lang datalog
parent(creusa, ascanius).
parent(anchises, aeneas).
parent(aeneas, ascanius).
parent(hecuba, creusa).
parent(priam, creusa).
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, C), ancestor(C, B).
ancestor(A, ascanius)?")
'(module anonymous-module datalog/sexp/lang
   (#%module-begin
    (! (parent creusa ascanius))
    (! (parent anchises aeneas))
    (! (parent aeneas ascanius))
    (! (parent hecuba creusa))
    (! (parent priam creusa))
    (! (:- (ancestor A B) (parent A B)))
    (! (:- (ancestor A B) (parent A C) (ancestor C B)))
    (? (ancestor A ascanius))))
```

Even when `#lang` is working at the fullest extent of its powers, its job
is merely to `read` the input byte stream and produce an S-expression.

> The second comment concerns the shebang line in R6RS scripts (as
> described in the non-normative appendices). The shebang line is not a
> comment in the R6RS lexical syntax; it does not even reach the reader
> - at least, conceptionally. The Scheme reader only sees the lines
> following the shebang line.
>
> For example, a conforming R6RS implementation must raise an exception
> when trying to read (using get-datum, for example) a file that begins
> with a shebang line.
>
> Thus, the shebang line doesn't need to be considered when discussing
> comment formats in lexical syntax.
>

This is a very persuasive account of the R6RS appendices. I just find the
approach somewhat unsatisfying.
An R6RS implementation with script support must have a procedure
`not-quite-read` that handles a potential shebang line before calling
`read`. I wish this `not-quite-read` procedure were made available from
some Scheme library (and perhaps somewhat more explicitly specified), and
I'd probably find it most beautiful for this `not-quite-read` to be
unified with `read`. But that's not really relevant per se.

>
> Best,
>
> Marc

Thank you for these thought-provoking remarks!

Philip

[1]: https://www-old.cs.utah.edu/plt/publications/macromod.pdf
[2]: https://jeapostrophe.github.io/home/static/icfp065-mccarthy.pdf