Hi,

On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
> Am So., 26. Feb. 2023 um 08:46 Uhr schrieb <guile-devel-request@gnu.org>:
> > Message: 1
> > Date: Sun, 26 Feb 2023 02:45:12 -0500
> > From: "Philip McGrath" <philip@philipmcgrath.com>
> > To: "Maxime Devos" <maximedevos@telenet.be>, Ludovic Courtès
> >         <ludo@gnu.org>, "Matt Wette" <matt.wette@gmail.com>,
> >         guile-devel@gnu.org
> > Cc: "Christine Lemmer-Webber" <cwebber@dustycloud.org>
> > Subject: Re: [PATCH] add language/wisp to Guile?
> > Message-ID: <981b0e74-96c0-4430-b693-7fc8026e3ead@app.fastmail.com>
> > Content-Type: text/plain;charset=utf-8
> 
> [...]
> 
> I would like to make two remarks, which I think are essential to get
> the semantics right.
> 
> The R6RS comments of the form "#!r6rs" are defined to modify the
> lexical syntax of the reader; possibly, they don't change the language
> semantics (after reading).  In particular, "#!r6rs" also applies to
> data files but does not affect the interpretation of the data after it
> is read. It cannot because the reader otherwise ignores and does not
> report comments.
> 
> Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
> is not a substitute for Racket's "#lang" (or a similar mechanism).
> Guile shouldn't confuse these two different levels of meaning.
> 

I agree that it's important to distinguish between lexical syntax (`read`) and 
the semantics of what is read.

However, Racket's `#lang` in fact operates entirely at the level of `read`. 
(Racketeers contribute to confusion on this point by using `#lang` as a 
shorthand for Racket's entire language-creation infrastructure, when in fact 
`#lang` specifically has a fairly small, though important, role.) When `read` 
encounters `#lang something`, it looks up a reader extension procedure in the 
module indicated by `something` and uses that procedure to continue parsing 
the input stream into data. Importantly, while syntax objects may be used to 
attach source location information, there is no "lexical context" or binding 
information at this stage, as one familiar with syntax objects from macro 
writing might expect: those semantics come after `read` has finished parsing 
the input stream from bytes to values.

Nonetheless, because the reader extension procedure has complete control over 
what data is read, a `#lang` language's reader can produce data with the 
semantics the language's author intends.

Before I go on to give some examples, I would note that of course R6RS gives 
implementations great freedom in how they use `#!implementation-specific` to 
extend the lexical syntax, and some ways that some implementations use it 
would be awkward at best to express by treating `#!` as an alternate spelling 
for `#lang `: Chez Scheme's `#!base-rtd` and `#!bwp` come to mind in that 
category. 

Now some examples, where `with-module-reading-parameterization` configures 
`read` to the right state for reading a module. (When reading generic data, 
reader extension procedures are turned off by default.)

```
philip@bastet:~$ racket 
Welcome to Racket v8.8 [cs].
-> (require syntax/modread)
-> (define (read-module str)              
     (with-module-reading-parameterization
       (lambda ()                         
         (read (open-input-string str)))))
-> (read-module "#lang racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket (#%module-begin (eqv? 1.5 (/ 3 2))))
-> (read-module "#lang exact-decimal racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket (#%module-begin (eqv? 3/2 (/ 3 2))))
```

Here the `exact-decimal` meta-language parameterizes the call to the base 
language's reader, changing lexical syntax like "1.5" to be read as an exact 
number instead of an inexact number. Chez Scheme's `#!fold-case` and 
`#!no-fold-case` directives are somewhat similar in spirit.

Most Racket `#lang` languages are more like this example:

```
-> (read-module "#lang racket (lambda (x) (+ 1 x))")
'(module anonymous-module racket (#%module-begin (lambda (x) (+ 1 x))))
-> (read-module "#lang typed/racket (lambda (x) (+ 1 x))")
'(module anonymous-module typed/racket (#%module-begin (lambda (x) (+ 1 x))))
```

Here, the lexical syntax is the same: the only difference is whether the 
resulting `module` form has `racket` or `typed/racket` as its "initial import" 
or "language". However, the semantics are starkly different: the `#lang racket` 
example is a perfectly good program, but attempting to macro-expand the `#lang 
typed/racket` program fails with the following error:

```
Type Checker: type mismatch
  expected: Number
  given: Any in: x
```

Much of the power of Racket's language infrastructure comes from each 
`module` form specifying its "language"/"initial import", which dates from 
"Composable and Compilable Macros: You Want it When?" (ICFP 2002) [1], long 
before the addition of `#lang`. You can write a `module` form explicitly, 
without using a `#lang`-based reader extension.
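
As a minimal sketch of that last point (the module names `backwards` and 
`client` are made up for illustration), the following forms can be entered at 
the Racket REPL. The first module acts as a tiny "language" that re-exports 
`racket/base` with `+` replaced; the second is an explicit `module` form that 
names it as its initial import:

```racket
;; Hypothetical "language": everything from racket/base, except that
;; the binding exported under the name `+` actually subtracts.
(module backwards racket/base
  (provide (except-out (all-from-out racket/base) +)
           (rename-out [minus-in-disguise +]))
  (define (minus-in-disguise . args)
    (apply - args)))

;; An explicit `module` form: no `#lang`, no reader extension.
;; Its initial import is the REPL-declared module above.
(module client 'backwards
  (provide result)
  (define result (+ 3 2)))

(require 'client)
```

Evaluating `result` afterwards should give 1 rather than 5: the reader never 
saw anything unusual, and the different meaning of `+` comes entirely from the 
initial import.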

Doing this sort of thing is less elegant and convenient with R6RS `library` 
forms, but it's certainly possible to produce an R6RS library form where 
`lambda` and `+` have semantics different than the versions exported from the 
`(rnrs)` library.

For a variation on the theme, consider:

```
-> (read-module "#lang web-server
    (require racket/serialize)
    (serialize (λ (x) x))")
'(module anonymous-module web-server/lang
   (#%module-begin (require racket/serialize) (serialize (λ (x) x))))
```

First, note that the resulting datum refers to `web-server/lang`, not 
`web-server`: in this particular case, the mismatch is a historical accident 
that confused me once (so I remember it), but it does illustrate how `#lang` 
maintains the distinction between the reader and the expander.

The example also illustrates semantic change. The `#%module-begin` form from 
the web server language performs a whole-module transformation on its subforms 
to make continuations serializable, thus producing "Automatically RESTful Web 
Applications" [2]. Likewise, its `lambda` form, unlike the version from 
`racket/base`, creates serializable closures.

Finally, here's a more dramatic example:

```
-> (define (read-module* str)
     (with-module-reading-parameterization
       (lambda ()
         ;; avoid a bug
         (syntax->datum (read-syntax #f (open-input-string str))))))
-> (read-module* "#lang datalog
   parent(creusa, ascanius).  
   parent(anchises, aeneas).
   parent(aeneas, ascanius).
   parent(hecuba, creusa).  
   parent(priam, creusa). 
   ancestor(A, B) :- parent(A, B).
   ancestor(A, B) :- parent(A, C), ancestor(C, B).
   ancestor(A, ascanius)?")
'(module anonymous-module datalog/sexp/lang
   (#%module-begin
    (! (parent creusa ascanius))
    (! (parent anchises aeneas))
    (! (parent aeneas ascanius))
    (! (parent hecuba creusa))
    (! (parent priam creusa))
    (! (:- (ancestor A B) (parent A B)))
    (! (:- (ancestor A B) (parent A C) (ancestor C B)))
    (? (ancestor A ascanius))))
```

Even when `#lang` is working at the fullest extent of its powers, its job is 
merely to `read` the input byte stream and produce an S-expression.

> The second comment concerns the shebang line in R6RS scripts (as
> described in the non-normative appendices).  The shebang line is not a
> comment in the R6RS lexical syntax; it does not even reach the reader
> - at least, conceptionally.  The Scheme reader only sees the lines
> following the shebang line.
> 
> For example, a conforming R6RS implementation must raise an exception
> when trying to read (using get-datum, for example) a file that begins
> with a shebang line.
> 
> Thus, the shebang line doesn't need to be considered when discussing
> comment formats in lexical syntax.
> 

This is a very persuasive account of the R6RS appendices. I just find the 
approach somewhat unsatisfying. An R6RS implementation with script support 
must have a procedure `not-quite-read` that handles a potential shebang line 
before calling `read`. I wish this `not-quite-read` procedure were made 
available from some Scheme library (and perhaps somewhat more explicitly 
specified), and I'd probably find it most beautiful for this `not-quite-read` to 
be unified with `read`. But that's not really relevant per se.
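
To make the wish concrete, here is a hypothetical sketch in Racket of what 
such a procedure might look like. The name `not-quite-read` is just my 
placeholder from above, not an existing library export, and the check for 
`#!/` or `#! ` at the very start of the input is my reading of the script 
appendix rather than anything normative. (Racket's own `read` already treats 
those two prefixes as line comments, so this is purely illustrative.)

```racket
;; Hypothetical helper: skip a leading shebang line, if there is one,
;; and then hand the rest of the port to the ordinary `read`.
(define (not-quite-read in)
  ;; Peek at up to three characters without consuming them.
  (define prefix (peek-string 3 0 in))
  ;; `peek-string` returns eof on an empty port, hence the `string?` check.
  (when (and (string? prefix)
             (member prefix '("#!/" "#! ")))
    (read-line in)) ; discard the whole first line
  (read in))

(not-quite-read
 (open-input-string "#!/usr/bin/env scheme-script\n(display 42)"))
```

The call at the end returns the datum `(display 42)`; the shebang line never 
reaches `read`.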

> 
> Best,
> 
> Marc

Thank you for these thought-provoking remarks!

Philip

[1]: https://www-old.cs.utah.edu/plt/publications/macromod.pdf
[2]: https://jeapostrophe.github.io/home/static/icfp065-mccarthy.pdf