Date: Sun, 26 Feb 2023 18:22:39 -0500
Message-ID: <3517394.V25eIC5XRa@bastet>

Hi,

On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
> On Sun, 26 Feb 2023 at 08:46, <> wrote:
> > Message: 1
> > Date: Sun, 26 Feb 2023 02:45:12 -0500
> > From: "Philip McGrath" <>
> > To: "Maxime Devos" <>, Ludovic Courtès <>, "Matt Wette" <>,
> > Cc: "Christine Lemmer-Webber" <>
> > Subject: Re: [PATCH] add language/wisp to Guile?
> > Message-ID: <>
> > Content-Type: text/plain;charset=utf-8
>
> [...]
>
> I would like to make two remarks, which I think are essential to get
> the semantics right.
>
> The R6RS comments of the form "#!r6rs" are defined to modify the
> lexical syntax of the reader; possibly, they don't change the language
> semantics (after reading). In particular, "#!r6rs" also applies to
> data files but does not affect the interpretation of the data after it
> is read. It cannot because the reader otherwise ignores and does not
> report comments.
>
> Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
> is not a substitute for Racket's "#lang" (or a similar mechanism).
> Guile shouldn't confuse these two different levels of meaning.
>

I agree that it's important to distinguish between lexical syntax (`read`)
and the semantics of what is read.

However, Racket's `#lang` in fact operates entirely at the level of `read`.
(Racketeers contribute to confusion on this point by using `#lang` as a
shorthand for Racket's entire language-creation infrastructure, when in fact
`#lang` specifically has a fairly small, though important, role.) When `read`
encounters `#lang something`, it looks up a reader extension procedure in the
module indicated by `something` and uses that procedure to continue parsing
the input stream into data.
Importantly, while syntax objects may be used to
attach source location information, there is no "lexical context" or binding
information at this stage, as one familiar with syntax objects from macro
writing might expect: those semantics come after `read` has finished parsing
the input stream from bytes to values.

Nonetheless, because the reader extension procedure has complete control over
what data is read, a `#lang` language's reader can produce data with the
semantics the language's author intends.

Before I go on to give some examples, I would note that of course R6RS gives
implementations great freedom in how they use `#!implementation-specific` to
extend the lexical syntax, and some ways that some implementations use it
would be awkward at best to express by treating `#!` as an alternate spelling
for `#lang `: Chez Scheme's `#!base-rtd` and `#!bwp` come to mind in that
category.

Now some examples, where `with-module-reading-parameterization` configures
`read` to the right state for reading a module. (When reading generic data,
reader extension procedures are turned off by default.)

```
philip@bastet:~$ racket
Welcome to Racket v8.8 [cs].
-> (require syntax/modread)
-> (define (read-module str)
     (with-module-reading-parameterization
       (lambda ()
         (read (open-input-string str)))))
-> (read-module "#lang racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket (#%module-begin (eqv? 1.5 (/ 3 2))))
-> (read-module "#lang exact-decimal racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket (#%module-begin (eqv? 3/2 (/ 3 2))))
```

Here the `exact-decimal` meta-language parameterizes the call to the base
language's reader, changing lexical syntax like "1.5" to be read as an exact
number instead of an inexact number. Chez Scheme's `#!fold-case` and
`#!no-fold-case` directives are somewhat similar in spirit.

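As a rough sketch of the kind of reader parameterization involved (an
assumption about the mechanism, not a quote of the `exact-decimal` source),
Racket's `read-decimal-as-inexact` parameter can be set around a plain call
to `read`. The helper name `read-exact` is made up for illustration:

```racket
#lang racket/base
;; Hypothetical helper (not part of any library): read one datum from STR
;; with decimal literals parsed as exact rationals.
(define (read-exact str)
  ;; `read-decimal-as-inexact` is a reader parameter from racket/base;
  ;; setting it to #f makes "1.5" read as 3/2 instead of the flonum 1.5.
  (parameterize ([read-decimal-as-inexact #f])
    (read (open-input-string str))))

(read-exact "(eqv? 1.5 (/ 3 2))") ; the datum contains 3/2, not 1.5
```

Only the lexical syntax changes here: the result is still ordinary data, and
what `eqv?` and `/` mean is decided later, by whatever language the data is
expanded in.
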
Most Racket `#lang` languages are more like this example:

```
-> (read-module "#lang racket (lambda (x) (+ 1 x))")
'(module anonymous-module racket (#%module-begin (lambda (x) (+ 1 x))))
-> (read-module "#lang typed/racket (lambda (x) (+ 1 x))")
'(module anonymous-module typed/racket (#%module-begin (lambda (x) (+ 1 x))))
```

Here, the lexical syntax is the same: the only difference is whether the
resulting `module` form has `racket` or `typed/racket` as its "initial import"
or "language". However, the semantics are starkly different: the `#lang racket`
example is a perfectly good program, but attempting to macro-expand the
`#lang typed/racket` program fails with the following error:

```
Type Checker: type mismatch
  expected: Number
  given: Any
  in: x
```

Much of the power of Racket's language infrastructure comes from each
`module` form specifying its "language"/"initial import", which dates from
"Composable and Compilable Macros: You Want it When?" (ICFP 2002) [1], long
before the addition of `#lang`. You can write a `module` form explicitly,
without using a `#lang`-based reader extension.

Doing this sort of thing is less elegant and convenient with R6RS `library`
forms, but it's certainly possible to produce an R6RS library form where
`lambda` and `+` have semantics different than the versions exported from the
`(rnrs)` library.

For a variation on the theme, consider:

```
-> (read-module "#lang web-server
(require racket/serialize)
(serialize (λ (x) x))")
'(module anonymous-module web-server/lang
   (#%module-begin
     (require racket/serialize)
     (serialize (λ (x) x))))
```

First, note that the resulting datum refers to `web-server/lang`, not
`web-server`: in this particular case, the mismatch is a historical accident
that confused me once (so I remember it), but it does illustrate how `#lang`
maintains the distinction between the reader and the expander.

The example also illustrates semantic change. The `#%module-begin` form from
the web server language performs a whole-module transformation on its subforms
to make continuations serializable, thus producing "Automatically RESTful Web
Applications" [2]. Likewise, its `lambda` form, unlike the version from
`racket/base`, creates serializable closures.

Finally, here's a more dramatic example:

```
-> (define (read-module* str)
     (with-module-reading-parameterization
       (lambda ()
         ;; avoid a bug
         (syntax->datum
           (read-syntax #f (open-input-string str))))))
-> (read-module* "#lang datalog
parent(creusa, ascanius).
parent(anchises, aeneas).
parent(aeneas, ascanius).
parent(hecuba, creusa).
parent(priam, creusa).
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, C), ancestor(C, B).
ancestor(A, ascanius)?")
'(module anonymous-module datalog/sexp/lang
   (#%module-begin
     (! (parent creusa ascanius))
     (! (parent anchises aeneas))
     (! (parent aeneas ascanius))
     (! (parent hecuba creusa))
     (! (parent priam creusa))
     (! (:- (ancestor A B) (parent A B)))
     (! (:- (ancestor A B) (parent A C) (ancestor C B)))
     (? (ancestor A ascanius))))
```

Even when `#lang` is working at the fullest extent of its powers, its job is
merely to `read` the input byte stream and produce an S-expression.

> The second comment concerns the shebang line in R6RS scripts (as
> described in the non-normative appendices). The shebang line is not a
> comment in the R6RS lexical syntax; it does not even reach the reader
> - at least, conceptionally. The Scheme reader only sees the lines
> following the shebang line.
>
> For example, a conforming R6RS implementation must raise an exception
> when trying to read (using get-datum, for example) a file that begins
> with a shebang line.
>
> Thus, the shebang line doesn't need to be considered when discussing
> comment formats in lexical syntax.
>

This is a very persuasive account of the R6RS appendices.
I just find the
approach somewhat unsatisfying. An R6RS implementation with script support
must have a procedure `not-quite-read` that handles a potential shebang line
before calling `read`. I wish this `not-quite-read` procedure were made
available from some Scheme library (and perhaps somewhat more explicitly
specified), and I'd probably find it most beautiful for this `not-quite-read`
to be unified with `read`. But that's not really relevant per se.

>
> Best,
>
> Marc

Thank you for these thought-provoking remarks!

Philip

[1]:
[2]: