* Re: [PATCH] add language/wisp to Guile?
From: Marc Nieper-Wißkirchen @ 2023-02-26 11:02 UTC
To: guile-devel

On Sun, 26 Feb 2023 at 08:46, <guile-devel-request@gnu.org> wrote:

> Message: 1
> Date: Sun, 26 Feb 2023 02:45:12 -0500
> From: "Philip McGrath" <philip@philipmcgrath.com>
> To: "Maxime Devos" <maximedevos@telenet.be>, Ludovic Courtès
>         <ludo@gnu.org>, "Matt Wette" <matt.wette@gmail.com>,
>         guile-devel@gnu.org
> Cc: "Christine Lemmer-Webber" <cwebber@dustycloud.org>
> Subject: Re: [PATCH] add language/wisp to Guile?
> Message-ID: <981b0e74-96c0-4430-b693-7fc8026e3ead@app.fastmail.com>
> Content-Type: text/plain;charset=utf-8

[...]

I would like to make two remarks, which I think are essential to get
the semantics right.

The R6RS comments of the form "#!r6rs" are defined to modify the
lexical syntax of the reader; possibly, they don't change the language
semantics (after reading). In particular, "#!r6rs" also applies to
data files but does not affect the interpretation of the data after it
is read. It cannot because the reader otherwise ignores and does not
report comments.

Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
is not a substitute for Racket's "#lang" (or a similar mechanism).
Guile shouldn't confuse these two different levels of meaning.

The second comment concerns the shebang line in R6RS scripts (as
described in the non-normative appendices). The shebang line is not a
comment in the R6RS lexical syntax; it does not even reach the reader,
at least conceptually. The Scheme reader only sees the lines
following the shebang line.

For example, a conforming R6RS implementation must raise an exception
when trying to read (using get-datum, for example) a file that begins
with a shebang line.
Thus, the shebang line doesn't need to be considered when discussing
comment formats in lexical syntax.

This also shows how a mechanism like "#lang" can be made compatible
with R6RS, namely similar to scripts starting with a shebang line: Let
us define a "multi-language script" whose first line may be a shebang
and whose second line may be of the form "#lang ...". If "..." is,
say, "r6rs", the following lines are then fed to the reader and
interpreted as an R6RS top-level program.

Best,

Marc
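The "multi-language script" layout proposed above can be illustrated
with a small R6RS sketch. Everything here is hypothetical: the names
`starts-with?` and `split-script-header` are made up for illustration
and exist in neither R6RS, Guile, nor Racket. The sketch also assumes
that the script has already been split into text lines, and that a
"#lang" line may appear first when there is no shebang; the message
does not settle either point.

```scheme
#!r6rs
(import (rnrs))

;; #t when STR begins with PREFIX.
(define (starts-with? prefix str)
  (let ((n (string-length prefix)))
    (and (<= n (string-length str))
         (string=? prefix (substring str 0 n)))))

;; Hypothetical helper, not part of any standard or implementation.
;; LINES is the script as a list of strings, one per line.
;; Returns two values: the language name taken from a "#lang" line
;; (or #f if there is none) and the remaining lines, which would be
;; handed to that language's reader.
(define (split-script-header lines)
  (let* ((shebang? (and (pair? lines)
                        (or (starts-with? "#!/" (car lines))
                            (starts-with? "#! /" (car lines)))))
         (rest (if shebang? (cdr lines) lines)))
    (if (and (pair? rest) (starts-with? "#lang " (car rest)))
        (values (substring (car rest) 6 (string-length (car rest)))
                (cdr rest))
        (values #f rest))))

(let-values (((lang body)
              (split-script-header
               '("#!/usr/bin/env guile"
                 "#lang r6rs"
                 "(import (rnrs))"
                 "(display 1)"))))
  (write lang) (newline)    ; prints "r6rs"
  (write body) (newline))   ; prints ("(import (rnrs))" "(display 1)")
```

The point of the split is that neither header line ever reaches the
Scheme reader: only the body lines do, so the reader's lexical syntax
needs no extension for this scheme to work.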
* Re: [PATCH] add language/wisp to Guile?
From: Philip McGrath @ 2023-02-26 23:22 UTC
To: guile-devel; +Cc: Marc Nieper-Wißkirchen

Hi,

On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
> [...]
>
> I would like to make two remarks, which I think are essential to get
> the semantics right.
>
> The R6RS comments of the form "#!r6rs" are defined to modify the
> lexical syntax of the reader; possibly, they don't change the language
> semantics (after reading). In particular, "#!r6rs" also applies to
> data files but does not affect the interpretation of the data after it
> is read. It cannot because the reader otherwise ignores and does not
> report comments.
>
> Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
> is not a substitute for Racket's "#lang" (or a similar mechanism).
> Guile shouldn't confuse these two different levels of meaning.

I agree that it's important to distinguish between lexical syntax
(`read`) and the semantics of what is read.

However, Racket's `#lang` in fact operates entirely at the level of
`read`.
(Racketeers contribute to confusion on this point by using `#lang` as a
shorthand for Racket's entire language-creation infrastructure, when in
fact `#lang` specifically has a fairly small, though important, role.)
When `read` encounters `#lang something`, it looks up a reader
extension procedure in the module indicated by `something` and uses
that procedure to continue parsing the input stream into data.
Importantly, while syntax objects may be used to attach source location
information, there is no "lexical context" or binding information at
this stage, as one familiar with syntax objects from macro writing
might expect: those semantics come after `read` has finished parsing
the input stream from bytes to values.

Nonetheless, because the reader extension procedure has complete
control over what data is read, a `#lang` language's reader can produce
data with the semantics the language's author intends.

Before I go on to give some examples, I would note that of course R6RS
gives implementations great freedom in how they use
`#!implementation-specific` to extend the lexical syntax, and some ways
that some implementations use it would be awkward at best to express by
treating `#!` as an alternate spelling for `#lang `: Chez Scheme's
`#!base-rtd` and `#!bwp` come to mind in that category.

Now some examples, where `with-module-reading-parameterization`
configures `read` to the right state for reading a module. (When
reading generic data, reader extension procedures are turned off by
default.)

```
philip@bastet:~$ racket
Welcome to Racket v8.8 [cs].
-> (require syntax/modread)
-> (define (read-module str)
     (with-module-reading-parameterization
      (lambda ()
        (read (open-input-string str)))))
-> (read-module "#lang racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket
   (#%module-begin (eqv? 1.5 (/ 3 2))))
-> (read-module "#lang exact-decimal racket (eqv? 1.5 (/ 3 2))")
'(module anonymous-module racket
   (#%module-begin (eqv? 3/2 (/ 3 2))))
```

Here the `exact-decimal` meta-language parameterizes the call to the
base language's reader, changing lexical syntax like "1.5" to be read
as an exact number instead of an inexact number. Chez Scheme's
`#!fold-case` and `#!no-fold-case` directives are somewhat similar in
spirit.

Most Racket `#lang` languages are more like this example:

```
-> (read-module "#lang racket (lambda (x) (+ 1 x))")
'(module anonymous-module racket
   (#%module-begin (lambda (x) (+ 1 x))))
-> (read-module "#lang typed/racket (lambda (x) (+ 1 x))")
'(module anonymous-module typed/racket
   (#%module-begin (lambda (x) (+ 1 x))))
```

Here, the lexical syntax is the same: the only difference is whether
the resulting `module` form has `racket` or `typed/racket` as its
"initial import" or "language". However, the semantics are starkly
different: the `#lang racket` example is a perfectly good program, but
attempting to macro-expand the `#lang typed/racket` program fails with
the following error:

```
Type Checker: type mismatch
  expected: Number
  given: Any
  in: x
```

Much of the power of Racket's language infrastructure comes from each
`module` form specifying its "language"/"initial import", which dates
from "Composable and Compilable Macros: You Want it When?" (ICFP 2002)
[1], long before the addition of `#lang`. You can write a `module` form
explicitly, without using a `#lang`-based reader extension. Doing this
sort of thing is less elegant and convenient with R6RS `library` forms,
but it's certainly possible to produce an R6RS library form where
`lambda` and `+` have semantics different than the versions exported
from the `(rnrs)` library.
For a variation on the theme, consider:

```
-> (read-module "#lang web-server (require racket/serialize) (serialize (λ (x) x))")
'(module anonymous-module web-server/lang
   (#%module-begin
    (require racket/serialize)
    (serialize (λ (x) x))))
```

First, note that the resulting datum refers to `web-server/lang`, not
`web-server`: in this particular case, the mismatch is a historical
accident that confused me once (so I remember it), but it does
illustrate how `#lang` maintains the distinction between the reader and
the expander. The example also illustrates semantic change. The
`#%module-begin` form from the web server language performs a
whole-module transformation on its subforms to make continuations
serializable, thus producing "Automatically RESTful Web Applications"
[2]. Likewise, its `lambda` form, unlike the version from
`racket/base`, creates serializable closures.

Finally, here's a more dramatic example:

```
-> (define (read-module* str)
     (with-module-reading-parameterization
      (lambda ()
        ;; avoid a bug
        (syntax->datum
         (read-syntax #f (open-input-string str))))))
-> (read-module* "#lang datalog
parent(creusa, ascanius).
parent(anchises, aeneas).
parent(aeneas, ascanius).
parent(hecuba, creusa).
parent(priam, creusa).
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, C), ancestor(C, B).
ancestor(A, ascanius)?")
'(module anonymous-module datalog/sexp/lang
   (#%module-begin
    (! (parent creusa ascanius))
    (! (parent anchises aeneas))
    (! (parent aeneas ascanius))
    (! (parent hecuba creusa))
    (! (parent priam creusa))
    (! (:- (ancestor A B) (parent A B)))
    (! (:- (ancestor A B) (parent A C) (ancestor C B)))
    (? (ancestor A ascanius))))
```

Even when `#lang` is working at the fullest extent of its powers, its
job is merely to `read` the input byte stream and produce an
S-expression.

> The second comment concerns the shebang line in R6RS scripts (as
> described in the non-normative appendices). The shebang line is not a
> comment in the R6RS lexical syntax; it does not even reach the reader,
> at least conceptually. The Scheme reader only sees the lines
> following the shebang line.
>
> For example, a conforming R6RS implementation must raise an exception
> when trying to read (using get-datum, for example) a file that begins
> with a shebang line.
>
> Thus, the shebang line doesn't need to be considered when discussing
> comment formats in lexical syntax.

This is a very persuasive account of the R6RS appendices. I just find
the approach somewhat unsatisfying. An R6RS implementation with script
support must have a procedure `not-quite-read` that handles a potential
shebang line before calling `read`. I wish this `not-quite-read`
procedure were made available from some Scheme library (and perhaps
somewhat more explicitly specified), and I'd probably find it most
beautiful for this `not-quite-read` to be unified with `read`. But
that's not really relevant per se.

> Best,
>
> Marc

Thank you for these thought-provoking remarks!

Philip

[1]: https://www-old.cs.utah.edu/plt/publications/macromod.pdf
[2]: https://jeapostrophe.github.io/home/static/icfp065-mccarthy.pdf
* Re: [PATCH] add language/wisp to Guile?
From: Dr. Arne Babenhauserheide @ 2023-02-27 5:00 UTC
To: Philip McGrath; +Cc: Marc Nieper-Wißkirchen, guile-devel

Philip McGrath <philip@philipmcgrath.com> writes:

> On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
>> On Sun, 26 Feb 2023 at 08:46, <guile-devel-request@gnu.org> wrote:
>> I would like to make two remarks, which I think are essential to get
>> the semantics right.
>>
>> The R6RS comments of the form "#!r6rs" are defined to modify the
>> lexical syntax of the reader; possibly, they don't change the language
>> semantics (after reading). In particular, "#!r6rs" also applies to
>> data files but does not affect the interpretation of the data after it
>> is read. It cannot because the reader otherwise ignores and does not
>> report comments.
>>
>> Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
>> is not a substitute for Racket's "#lang" (or a similar mechanism).
>> Guile shouldn't confuse these two different levels of meaning.
>
> I agree that it's important to distinguish between lexical syntax
> (`read`) and the semantics of what is read.
>
> However, Racket's `#lang` in fact operates entirely at the level of
> `read`.

I would like to stress here that we’re talking about Guile, where the
infrastructure works differently, so "this is how Racket does it" isn’t
that relevant to the question about the right way for Guile.

Guile uses explicit language specifications with reader, compiler,
decompiler, evaluator, printer and default environment. That is what we
need to find good solutions for.

Best wishes,
Arne
-- 
Being apolitical means being political without noticing it.
draketo.de
* Re: [PATCH] add language/wisp to Guile?
From: Marc Nieper-Wißkirchen @ 2023-02-27 7:26 UTC
To: Philip McGrath; +Cc: guile-devel

On Mon, 27 Feb 2023 at 00:22, Philip McGrath <philip@philipmcgrath.com> wrote:
>
> Hi,
>
> On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
> > [...]
> >
> > I would like to make two remarks, which I think are essential to get
> > the semantics right.
> >
> > The R6RS comments of the form "#!r6rs" are defined to modify the
> > lexical syntax of the reader; possibly, they don't change the language
> > semantics (after reading). In particular, "#!r6rs" also applies to
> > data files but does not affect the interpretation of the data after it
> > is read. It cannot because the reader otherwise ignores and does not
> > report comments.
> >
> > Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
> > is not a substitute for Racket's "#lang" (or a similar mechanism).
> > Guile shouldn't confuse these two different levels of meaning.
>
> I agree that it's important to distinguish between lexical syntax (`read`) and
> the semantics of what is read.
>
> However, Racket's `#lang` in fact operates entirely at the level of `read`.
> (Racketeers contribute to confusion on this point by using `#lang` as a
> shorthand for Racket's entire language-creation infrastructure, when in fact
> `#lang` specifically has a fairly small, though important, role.) When `read`
> encounters `#lang something`, it looks up a reader extension procedure in the
> module indicated by `something` and uses that procedure to continue parsing
> the input stream into data. Importantly, while syntax objects may be used to
> attach source location information, there is no "lexical context" or binding
> information at this stage, as one familiar with syntax objects from macro
> writing might expect: those semantics come after `read` has finished parsing
> the input stream from bytes to values.

[...]

Thank you for the reminder on Racket's #lang mechanism; it has been a
long time since I wrote some #lang extensions myself when experimenting
with Racket.

Nevertheless, I am not sure whether it is relevant to the point I
tried to make. The "#!r6rs" does not indicate a particular language
(so tools scanning for "#!r6rs" cannot assume that the file is indeed
an R6RS program/library). In an implementation that supports, say,
R6RS and R7RS, "#!r6rs" can only switch the lexical syntax but cannot
introduce forms that make the implementation change the semantics from
R7RS to R6RS, e.g., in the case of unquoted vector literals. (It must
be compatible with calling the procedures "read" and "eval" directly,
so "#!r6rs" must not wrap everything in some module form, say.)
Racket's "#lang" mechanism has more freedom (regardless of how it is
implemented).

Of course, R6RS gives implementations the freedom to modify the reader
in whatever way after, say, "#!foo-baz" was read. Thus, "#!foo-baz"
could be defined to work like Racket's "#lang foo-baz," reading the
rest of the source as "(module ...)".
But as long as we stay within the confines of R6RS, this will only
raise an undefined exception because, in general, "module" is not
globally bound.

I don't want to contradict you; I just mean that a plain "#!r6rs"
without a top-level language where "module" is bound is not equivalent
to "#lang" and that trying to switch to, say, Elisp mode with
"#!elisp" would leave the boundaries of the Scheme reports (and when
this is done, this specific discussion is moot).

[...]

> > The second comment concerns the shebang line in R6RS scripts (as
> > described in the non-normative appendices). The shebang line is not a
> > comment in the R6RS lexical syntax; it does not even reach the reader,
> > at least conceptually. The Scheme reader only sees the lines
> > following the shebang line.
> >
> > For example, a conforming R6RS implementation must raise an exception
> > when trying to read (using get-datum, for example) a file that begins
> > with a shebang line.
> >
> > Thus, the shebang line doesn't need to be considered when discussing
> > comment formats in lexical syntax.
>
> This is a very persuasive account of the R6RS appendices. I just find the
> approach somewhat unsatisfying. An R6RS implementation with script support
> must have a procedure `not-quite-read` that handles a potential shebang line
> before calling `read`. I wish this `not-quite-read` procedure were made
> available from some Scheme library (and perhaps somewhat more explicitly
> specified), and I'd probably find it most beautiful for this `not-quite-read` to
> be unified with `read`. But that's not really relevant per se.

The R6RS approach is the sound one. The shebang line is interpreted by
the kernel, which only sees the binary file. The Scheme reader, on the
other hand, operates on a textual file. So the logically correct way
to implement script support is to open a binary port, check whether
the file starts with the bytes (!) corresponding to "#!/" or "#! /"
and, if so, skip bytes until #\newline is seen. Only then should the
binary port be changed into a textual port (using whatever encoding the
user may have specified) and handed to the Scheme reader.

This doesn't mean that there is no room for a procedure "read-script"
that takes a binary port/filename. It just cannot and shouldn't be
merged with "read". Another reason why this is not possible is that
Scheme's lexical syntax as defined in the R6RS does not include
shebangs as possible tokens ("#!<delimiter>" is not a valid token, for
example).

> > Best,
> >
> > Marc
>
> Thank you for these thought-provoking remarks!
>
> Philip
>
> [1]: https://www-old.cs.utah.edu/plt/publications/macromod.pdf
> [2]: https://jeapostrophe.github.io/home/static/icfp065-mccarthy.pdf
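The byte-level procedure described in this message can be sketched in
R6RS roughly as follows. This is only an illustration under stated
assumptions: `shebang-prefix?`, `after-first-line`,
`script-bytes->data` and `read-script` are hypothetical names, UTF-8 is
assumed as the encoding, and the whole file is read into a bytevector
for simplicity, where a real implementation might instead skip bytes on
the port and then call `transcoded-port`.

```scheme
#!r6rs
(import (rnrs))

;; #t if BV starts with the ASCII bytes of "#!/" or "#! /".
(define (shebang-prefix? bv)
  (let ((n (bytevector-length bv)))
    (define (byte-at? i b)
      (and (< i n) (= (bytevector-u8-ref bv i) b)))
    (and (byte-at? 0 35)                         ; #
         (byte-at? 1 33)                         ; !
         (or (byte-at? 2 47)                     ; /
             (and (byte-at? 2 32)                ; space
                  (byte-at? 3 47))))))           ; /

;; Index of the first byte after the first newline (byte 10),
;; or the length of BV if there is no newline.
(define (after-first-line bv)
  (let ((n (bytevector-length bv)))
    (let loop ((i 0))
      (cond ((= i n) n)
            ((= (bytevector-u8-ref bv i) 10) (+ i 1))
            (else (loop (+ i 1)))))))

;; Skip a shebang line at the byte level, then decode the remaining
;; bytes as UTF-8 and read every datum with the ordinary reader.
(define (script-bytes->data bv)
  (let* ((start (if (shebang-prefix? bv) (after-first-line bv) 0))
         (len (- (bytevector-length bv) start))
         (body (make-bytevector len)))
    (bytevector-copy! bv start body 0 len)
    (let ((port (open-bytevector-input-port
                 body (make-transcoder (utf-8-codec)))))
      (let loop ((acc '()))
        (let ((d (get-datum port)))
          (if (eof-object? d)
              (reverse acc)
              (loop (cons d acc))))))))

;; Hypothetical "read-script": takes a filename, not a textual port.
(define (read-script filename)
  (let ((bv (call-with-port (open-file-input-port filename)
                            get-bytevector-all)))
    (if (eof-object? bv) '() (script-bytes->data bv))))

(write (script-bytes->data
        (string->utf8 "#!/usr/bin/env scheme-script\n(display 1)")))
(newline)                                        ; prints ((display 1))
```

Keeping the shebang check on bytes means that `get-datum` itself never
has to accept a shebang, which is consistent with the position that
"read-script" and "read" stay separate procedures.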
* Re: [PATCH] add language/wisp to Guile?
From: Philip McGrath @ 2023-02-28 4:27 UTC
To: Marc Nieper-Wißkirchen; +Cc: guile-devel

Hi,

On Monday, February 27, 2023 2:26:47 AM EST Marc Nieper-Wißkirchen wrote:
> On Mon, 27 Feb 2023 at 00:22, Philip McGrath
> <philip@philipmcgrath.com> wrote:
> > Hi,
> >
> > On Sunday, February 26, 2023 6:02:04 AM EST Marc Nieper-Wißkirchen wrote:
> > > [...]
> > >
> > > I would like to make two remarks, which I think are essential to get
> > > the semantics right.
> > >
> > > The R6RS comments of the form "#!r6rs" are defined to modify the
> > > lexical syntax of the reader; possibly, they don't change the language
> > > semantics (after reading). In particular, "#!r6rs" also applies to
> > > data files but does not affect the interpretation of the data after it
> > > is read. It cannot because the reader otherwise ignores and does not
> > > report comments.
> > >
> > > Thus a comment of the form "#!r6rs" may be suitable for Wisp, but it
> > > is not a substitute for Racket's "#lang" (or a similar mechanism).
> > > Guile shouldn't confuse these two different levels of meaning.
> >
> > I agree that it's important to distinguish between lexical syntax (`read`)
> > and the semantics of what is read.
> >
> > However, Racket's `#lang` in fact operates entirely at the level of
> > `read`.
> > (Racketeers contribute to confusion on this point by using `#lang` as a
> > shorthand for Racket's entire language-creation infrastructure, when in
> > fact `#lang` specifically has a fairly small, though important, role.)
> > When `read` encounters `#lang something`, it looks up a reader extension
> > procedure in the module indicated by `something` and uses that procedure
> > to continue parsing the input stream into data. Importantly, while syntax
> > objects may be used to attach source location information, there is no
> > "lexical context" or binding information at this stage, as one familiar
> > with syntax objects from macro writing might expect: those semantics come
> > after `read` has finished parsing the input stream from bytes to values.
>
> [...]
>
> Thank you for the reminder on Racket's #lang mechanism; it has been a
> long time since I wrote some #lang extensions myself when experimenting
> with Racket.
>
> Nevertheless, I am not sure whether it is relevant to the point I
> tried to make. The "#!r6rs" does not indicate a particular language
> (so tools scanning for "#!r6rs" cannot assume that the file is indeed
> an R6RS program/library).

I think I had missed that some of your remarks are specifically about
the "#!r6rs" directive, not directives of the form "#!<identifier>"
more generally. I agree that implementations have more responsibilities
with respect to "#!r6rs", that the presence of "#!r6rs" in a file is
not enough to conclude that the file is an R6RS program/library, and
that a straightforward implementation of "#!r6rs" as reading like
"#lang r6rs" in the manner of my previous examples would not conform to
R6RS.
Also, on the broader question, my first preference would be for Guile
to implement `#lang language/wisp`, not least to avoid the confusing
subtleties here and the potential for humans to confuse
`#!language/wisp` with a shebang line. I raise the possibility of
`#!language/wisp` only as an alternative if people are more comfortable
using a mechanism that R6RS specifically designed for
implementation-defined extensions.

Nonetheless, I'll try to explain why I think "#!r6rs" can be handled,
and is handled by Racket, consistently with both "#lang r6rs" and the
behavior specified in the report.

> Of course, R6RS gives implementations the freedom to modify the reader
> in whatever way after, say, "#!foo-baz" was read. Thus, "#!foo-baz"
> could be defined to work like Racket's "#lang foo-baz," reading the
> rest of the source as "(module ...)". But as long as we stay within
> the confines of R6RS, this will only raise an undefined exception
> because, in general, "module" is not globally bound.

Before getting to the general point, specifically about "module" not
being bound: in Racket, a root-level `module` form is handled quite
similarly to the `library` form in R6RS, which says in 7.1 [1]:

>>>> The names `library`, `export`, `import`, [...] appearing in the
library syntax are part of the syntax and are not reserved, i.e., the
same names can be used for other purposes within the library or even
exported from or imported into a library with different meanings,
without affecting their use in the `library` form.

None of the libraries defined in R6RS export a binding for `library`:
instead, the implementation must recognize it somehow, whether by
handling it as a built-in or binding it in some environment not
standardized by R6RS.

(The `racket/base` library/language does in fact export a binding for
`module` which can be used to create submodules with the same syntax as
a root-level `module`, but that isn't relevant to the handling of a
root-level `module` form itself.)

> I don't want to contradict you; I just mean that a plain "#!r6rs"
> without a top-level language where "module" is bound is not equivalent
> to "#lang" and that trying to switch to, say, Elisp mode with
> "#!elisp" would leave the boundaries of the Scheme reports (and when
> this is done, this specific discussion is moot).
>
> [...]
>
> (It must be compatible with calling the procedures "read" and "eval"
> directly, so "#!r6rs" must not wrap everything in some module form,
> say.)

Now I'll try to sketch Racket's handling of "#!r6rs" from an R6RS
perspective. For the sake of a concrete example, let's consider this
program:

```
#!r6rs
(library (demo)
  (export x)
  (import (rnrs base))
  (define x (+ 1 #!r6rs 2)))
```

Using R6RS's `read`/`get-datum` and `write` on such input produces the
datum (with linebreaks for legibility):

```
(library (demo)
  (export x)
  (import (rnrs base))
  (define x (+ 1 2)))
```

Racket is an implementation of the sort contemplated by Appendix A [2]:

>>>> [T]he default mode offered by a Scheme implementation may be
non-conformant, and such a Scheme implementation may require special
settings or declarations to enter the report-conformant mode.

When Racket begins reading a module's source code, the reader is in a
non-conformant mode. The first "#!r6rs" lexeme is the required
"declaration[] to enter the report-conformant mode". From that point
on, the input is read with a reader as specified in R6RS, with no
extensions. Thus, the second "#!r6rs" lexeme, as the report specifies,
is treated as a comment. (Since the reader is already in strict R6RS
mode, it has no side-effect.)
Racket's reader (as noted, in `with-module-reading-parameterization`
mode) produces the following datum:

```
(module anonymous-module r6rs
  (#%module-begin
   (library (demo)
     (export x)
     (import (rnrs base))
     (define x (+ 1 2)))))
```

Racket's reader has adjusted the "declaration[] to enter the
report-conformant mode" to an explicitly-parenthesized form, but the
portion of the input read in report-conformant mode produced the same
datum as above.

The important point here is that the `read` and `eval` procedures from
`racket/base` are not the same as the `read` and `eval` from
`(rnrs io simple)` and `(rnrs eval)`, respectively. The R6RS version of
`read` does not introduce a `module` form, and the R6RS version of
`eval` happily evaluates the forms that the R6RS `read` produces.

It's a bit of a tangent here, but for the broader discussion about
`#lang` or similar it might be interesting to note that, in addition to
this "declaration[] to enter the report-conformant mode" that can be
written in-band in a report-conformant source file, Racket also has
out-of-band ways to enter R6RS "report-conformant mode". In particular,
Racket distributes an executable `plt-r6rs` that can run and compile
R6RS programs that do not necessarily start with `#!r6rs`. [3] Invoking
it instead with the form `plt-r6rs --install ‹libraries-file›` will
read `‹libraries-file›`, which need not begin with `#!r6rs`, with the
R6RS-conformant reader. The ‹libraries-file› should contain R6RS
library forms, each of which will be installed to its own file, located
where Racket would expect to load the R6RS library with its declared
name. In the process of installing the libraries, `plt-r6rs` adds a
`#!r6rs` directive at the beginning of each file. [4]

> In an implementation that supports, say,
> R6RS and R7RS, "#!r6rs" can only switch the lexical syntax but cannot
> introduce forms that make the implementation change the semantics from
> R7RS to R6RS, e.g., in the case of unquoted vector literals.
I'm not very familiar with R7RS, and, quickly skimming R7RS Small, I
didn't see a notion of directives other than `#!fold-case` and
`#!no-fold-case`. (That's a bit ironic, given that the R6RS editors
seem to have contemplated `#!r7rs` *before* they considered `#!r6rs`.)
I think a similar technique could work in this case, though.

From an R6RS perspective, at least, an implementation could implement a
directive such that the `read` from `(rnrs)` would parse:

>>>> #!r7rs #(1 2 3)

as:

>>>> (quote #(1 2 3))

The other direction is a bit trickier, but the R7RS specification for
`read` from `(scheme base)` does say that "implementations may support
extended syntax to represent record types or other types that do not
have datum representations." It seems an implementation could define a
type "non-self-evaluating-vector" and have `read` from `(scheme base)`
produce a value of that type when given:

>>>> #!r6rs #(1 2 3)

Presumably `eval` from `(scheme eval)` would raise an error if asked to
evaluate such a datum, as it does if asked to evaluate an unquoted (),
but `quote` from `(scheme base)` would arrange to replace such a datum
with a vector.

(I'm not at all sure that an implementation *should* do such a thing:
I'm only trying to explain why I don't think the Scheme reports
prohibit it.)

-Philip

[1]: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-10.html#node_sec_7.1
[2]: http://www.r6rs.org/final/html/r6rs-app/r6rs-app-Z-H-3.html#node_chap_A
[3]: https://docs.racket-lang.org/r6rs/Running_Top-Level_Programs.html
[4]: https://docs.racket-lang.org/r6rs/Installing_Libraries.html
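The claim in this message that an R6RS reader treats an embedded
"#!r6rs" as a comment can be checked with a short top-level program.
This is a sketch assuming an implementation whose `get-datum` follows
the report; the variable names `source` and `datum` are only
illustrative.

```scheme
#!r6rs
(import (rnrs))

;; The example program from the message, as one string. The "#!r6rs"
;; lexemes should be skipped like comments by a conforming reader.
(define source
  "#!r6rs (library (demo) (export x) (import (rnrs base)) (define x (+ 1 #!r6rs 2)))")

;; Read a single datum from the string with the R6RS reader.
(define datum (get-datum (open-string-input-port source)))

(write datum)
(newline)

;; Compare against the datum the message says should result.
(write (equal? datum
               '(library (demo)
                  (export x)
                  (import (rnrs base))
                  (define x (+ 1 2)))))
(newline)
```

If the reader follows the report, the comparison should print #t: the
directive affects how later text is tokenized but leaves no trace in
the datum, so nothing downstream of the reader can tell that it was
present.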
* Re: [PATCH] add language/wisp to Guile? 2023-02-28 4:27 ` Philip McGrath @ 2023-02-28 6:57 ` Marc Nieper-Wißkirchen 0 siblings, 0 replies; 56+ messages in thread From: Marc Nieper-Wißkirchen @ 2023-02-28 6:57 UTC (permalink / raw) To: Philip McGrath; +Cc: guile-devel Am Di., 28. Feb. 2023 um 05:27 Uhr schrieb Philip McGrath <philip@philipmcgrath.com>: > > Hi, > > On Monday, February 27, 2023 2:26:47 AM EST Marc Nieper-Wißkirchen wrote: [...] > > Nevertheless, I am not sure whether it is relevant to the point I > > tried to make. The "#!r6rs" does not indicate a particular language > > (so tools scanning for "#!r6rs" cannot assume that the file is indeed > > an R6RS program/library). > > I think I had missed that some of your remarks are specifically about the > "#!r6rs" directive, not directives of the form "#!<identifier>" more generally. > I agree that implementations have more responsibilities with respect to > "#!r6rs", that the presence of "#!r6rs" in a file is not enough to conclude > that the file is an R6RS program/library, and that a straightforward > implementation of "#!r6rs" as reading like "#lang r6rs" in the manner of my > previous examples would not conform to R6RS. Yes, this summarizes it well. > Also, on the broader question, my first preference would be for Guile to > implement `#lang language/wisp`, not least to avoid the confusing subtleties > here and the potential for humans to confuse `#!language/wisp` with a shebang > line. I raise the possibility of `#!language/wisp` only as an alternative if > people are more comfortable using a mechanism that R6RS specifically designed > for implementation-defined extensions. When wisp only changes the lexical syntax, `#!wisp` would be fine (and it cannot be confused with a shebang line IMO because a shebang line must begin with `#! ` or `#!/`. However, the authors of the R6RS clearly had minor changes of the lexical syntax in mind when they introduced comments like `#!r6rs` or `#!chezscheme` (e.g. 
Chez Scheme adds a syntax for gensyms). As wisp radically changes how the text is tokenized, something like `#!wisp` probably only follows the letter but not the spirit of R6RS.

> Nonetheless, I'll try to explain why I think "#!r6rs" can be handled, and is
> handled by Racket, consistently with both "#lang r6rs" and the behavior
> specified in the report.
>
> >
> > Of course, R6RS gives implementations the freedom to modify the reader
> > in whatever way after, say, "#!foo-baz" was read. Thus, "#!foo-baz"
> > could be defined to work like Racket's "#lang foo-baz," reading the
> > rest of the source as "(module ...)". But as long as we stay within
> > the confines of R6RS, this will only raise an undefined exception
> > because, in general, "module" is not globally bound.
> >
>
> Before getting to the general point, specifically about "module" not being
> bound: in Racket, a root-level `module` form is handled quite similarly to the
> `library` form in R6RS, which says in 7.1 [1]:
>
> >>>> The names `library`, `export`, `import`, [...] appearing in the library
> syntax are part of the syntax and are not reserved, i.e., the same names can
> be used for other purposes within the library or even exported from or
> imported into a library with different meanings, without affecting their use in
> the `library` form.
>
> None of the libraries defined in R6RS export a binding for `library`: instead,
> the implementation must recognize it somehow, whether by handling it as a
> built-in or binding it in some environment not standardized by R6RS.
>
> (The `racket/base` library/language does in fact export a binding for `module`
> which can be used to create submodules with the same syntax as a root-level
> `module`, but that isn't relevant to the handling of a `root-level` module
> form itself.)

Sure, but not relevant.
I didn't say that module is bound at the Racket top-level, only that an R6RS implementation wouldn't expect it (and cannot interpret it because it is not bound).

> > I don't want to contradict you; I just mean that a plain "#!r6rs"
> > without a top-level language where "module" is bound is not equivalent
> > to "#lang" and that trying to switch to, say, Elisp mode with
> > "#!elisp" would leave the boundaries of the Scheme reports (and when
> > this is done, this specific discussion is moot).
> >
> > [...]
> >
> > (It must be compatible with calling the procedures "read" and "eval"
> > directly, so "#!r6rs" must not wrap everything in some module form,
> > say.)
>
> Now I'll try to sketch Racket's handling of "#!r6rs" from an R6RS perspective.
> For the sake of a concrete example, let's consider this program:

It is obvious that one can do it; it is just outside the realm of R6RS because the "non-conformant mode" can be any mode (even a C-interpreting one) that can listen to whatever magic numbers there may be in the input. That said, the use of "#!r6rs" as such a magic marker is not in the spirit of the lexical syntax of R6RS (where it was introduced). This has been my original point. Doable, of course.

[...]

> > In an implementation that supports, say,
> > R6RS and R7RS, "#!r6rs" can only switch the lexical syntax but cannot
> > introduce forms that make the implementation change the semantics from
> > R7RS to R6RS, e.g., in the case of unquoted vector literals.
>
> I'm not very familiar with R7RS, and, quickly skimming R7RS Small, I didn't
> see a notion of directives other than `#!fold-case` and `#!no-fold-case`.
> (That's a bit ironic, given that the R6RS editors seem to have contemplated
> `#!r7rs` *before* they considered `#!r6rs`.) I think a similar technique could
> work in this case, though.
> From an R6RS perspective, at least, an
> implementation could implement a directive such that the `read` from `(rnrs)`
> would parse:
>
> >>>> #!r7rs #(1 2 3)
>
> as:
>
> >>>> (quote #(1 2 3))
>
> The other direction is a bit trickier, but the R7RS specification for `read`
> from `(scheme base)` does say that "implementations may support extended
> syntax to represent record types or other types that do not have datum
> representations." It seems an implementation could define a type
> "non-self-evaluating-vector" and have `read` from `(scheme base)` produce a
> value of that type when given:
>
> >>>> #!r6rs #(1 2 3)

This cannot work because macros can detect the presence of quotes before vector literals; the reader must not insert quotes that weren't there. The expander must be made aware of what the valid literal expressions are. In Racket, this can be done with #%datum, but this does not easily translate to R6RS (because of the presence of unwrapped syntax objects in R6RS).

> Presumably `eval` from `(scheme eval)` would raise an error if asked to
> evaluate such a datum, as it does if asked to evaluate an unquoted (), but
> `quote` from `(scheme base)` would arrange to replace such a datum with a
> vector.
>
> (I'm not at all sure that an implementation *should* do such a thing: I'm only
> trying to explain why I don't think the Scheme reports prohibit it.)

My point is that just switching the lexical syntax won't do it. Of course, you can interpret a leading `#!r7rs` as switching from conformant to non-conformant R6RS mode, but this would again be a misuse (I guess this is also the reason why Racket favors `#lang` over `#!...`). But even then, one has a problem because the change cannot be file-local. If a library in R7RS mode exports a macro that is used in a context with R6RS mode, the vector literals inserted by the macro should still be interpreted in R7RS mode. If we could have #%datum, this wouldn't be a problem.
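
The observation that macros can see whether a quote was present can be illustrated with a minimal sketch. It is not taken from the thread: `literal-kind` is a hypothetical name, and the code assumes a Guile-style `syntax-case` expander. The macro inspects its argument at expansion time and reports whether it received `(quote <datum>)` or a bare vector literal, which is exactly the distinction a quote-inserting reader would erase.

```scheme
;; Hypothetical illustration, not part of the wisp patch or of any report.
;; Assumes Guile's built-in syntax-case expander.
(define-syntax literal-kind
  (lambda (stx)
    (syntax-case stx (quote)
      ;; The argument was written with an explicit quote: '<datum>
      ((_ (quote datum))
       #'(quote quoted))
      ;; The argument is a bare vector literal such as #(1 2 3);
      ;; the fender checks the datum at expansion time.
      ((_ datum)
       (vector? (syntax->datum #'datum))
       #'(quote bare-vector))
      ;; Anything else.
      ((_ datum)
       #'(quote other)))))

;; The two spellings below are distinguishable by the macro, so a reader
;; that silently rewrote #(1 2 3) to '#(1 2 3) would change its result.
(display (literal-kind '#(1 2 3)))
(newline)
(display (literal-kind #(1 2 3)))
(newline)
```

Under these assumptions, any reader-level rewriting is observable by user macros, which is why the fix has to live in the expander rather than in the lexical syntax.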
Thank you a lot for your insights and your detailed explanations,

Marc

> -Philip
>
> [1]: http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-10.html#node_sec_7.1
> [2]: http://www.r6rs.org/final/html/r6rs-app/r6rs-app-Z-H-3.html#node_chap_A
> [3]: https://docs.racket-lang.org/r6rs/Running_Top-Level_Programs.html
> [4]: https://docs.racket-lang.org/r6rs/Installing_Libraries.html

^ permalink raw reply [flat|nested] 56+ messages in thread
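
The message above distinguishes three kinds of first line: a shebang line (which, per the message, must begin with `#! ` or `#!/`), a reader directive of the form `#!<identifier>` such as `#!r6rs` or `#!wisp`, and a Racket-style `#lang` line. A minimal sketch of that classification follows. The procedure name `classify-first-line` and the result symbols are hypothetical; nothing like this is claimed to exist in Guile, in R6RS, or in the wisp patch.

```scheme
;; Hypothetical helper for illustration only.
;; Classifies the first line of a source file by its prefix.
(define (classify-first-line line)
  (cond
   ;; Shebang: "#!" followed by a space or a slash.
   ((or (string-prefix? "#! " line)
        (string-prefix? "#!/" line))
    'shebang)
   ;; Racket-style language selection.
   ((string-prefix? "#lang " line)
    'lang-line)
   ;; Any other "#!..." is treated as a reader directive.
   ((string-prefix? "#!" line)
    'reader-directive)
   (else 'other)))

;; Print the classification for a few sample first lines.
(for-each
 (lambda (line)
   (format #t "~S => ~A~%" line (classify-first-line line)))
 '("#!/usr/bin/env guile" "#!wisp" "#!r6rs" "#lang r6rs" "(display 1)"))
```

The order of the `cond` clauses matters: the shebang test has to come before the generic `#!` test, otherwise every shebang line would be reported as a reader directive.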
* [PATCH] add language/wisp to Guile?
@ 2023-02-03 21:26 Dr. Arne Babenhauserheide
  2023-02-04 15:08 ` Maxime Devos
  2023-09-30 13:17 ` Christine Lemmer-Webber
  0 siblings, 2 replies; 56+ messages in thread
From: Dr. Arne Babenhauserheide @ 2023-02-03 21:26 UTC (permalink / raw)
  To: guile-devel

Hi,

Since (language wisp)¹ has been rock stable for years now and is used in the Guix Workflow Language and supported in the Chickadee and the Tsukundere game engines, I thought it could be a good time to merge Wisp into Guile itself.

So I prepared a patch that adds language/wisp, some texinfo for SRFI-119, and some tests.

Why add Wisp?

For Wisp: it is then available directly wherever Guile is available. This will make it much easier for people to follow tutorials.

For Guile:

- Wisp has proven to be good at giving people an entry point into Scheme² without pulling them out of the community.
- It has also been shown to enable people who are used to other programming languages to get a quick start with tools written in Guile.
- And it provides access to the full capabilities of Guile with minimal maintenance effort, because it is just the thinnest possible layer around Scheme. The last required change was in 2020, even though I have used it continuously.

The attached patch provides just the wisp reader, but not the wisp->scheme transformer, because the latter has known broken edge cases (and whoever needs the transformer can get it from the wisp repo and execute it directly with a Guile that then already supports wisp without any path adaptations).

So I’d like to ask: can we merge Wisp as a supported language into Guile?

Best wishes,
Arne

¹: https://www.draketo.de/software/wisp
²: »Wisp allows people to see code how Lispers perceive it.
Its structure becomes apparent.« — Ricardo Wurmus in IRC [-- Attachment #1.2: 0001-Add-language-wisp-wisp-tests-and-srfi-119-documentat.patch --] [-- Type: text/x-patch, Size: 48201 bytes --] From 4d4759f9fc67b01c40bde41b93e3998f7d64eabd Mon Sep 17 00:00:00 2001 From: Arne Babenhauserheide <arne_bab@web.de> Date: Fri, 3 Feb 2023 22:20:04 +0100 Subject: [PATCH] Add language/wisp, wisp tests, and srfi-119 documentation * doc/ref/srfi-modules.texi (srfi-119): add node * module/language/wisp.scm: New file. * module/language/wisp/spec.scm: New file. * test-suite/tests/srfi-119.test: New file. --- doc/ref/srfi-modules.texi | 30 ++ module/language/wisp.scm | 796 +++++++++++++++++++++++++++++++++ module/language/wisp/spec.scm | 107 +++++ test-suite/tests/srfi-119.test | 81 ++++ 4 files changed, 1014 insertions(+) create mode 100644 module/language/wisp.scm create mode 100644 module/language/wisp/spec.scm create mode 100644 test-suite/tests/srfi-119.test diff --git a/doc/ref/srfi-modules.texi b/doc/ref/srfi-modules.texi index 0ef136215..759e293ec 100644 --- a/doc/ref/srfi-modules.texi +++ b/doc/ref/srfi-modules.texi @@ -64,6 +64,7 @@ get the relevant SRFI documents from the SRFI home page * SRFI-98:: Accessing environment variables. * SRFI-105:: Curly-infix expressions. * SRFI-111:: Boxes. +* SRFI-119:: Wisp: simpler indentation-sensitive scheme. * SRFI-171:: Transducers @end menu @@ -5662,6 +5663,34 @@ Return the current contents of @var{box}. Set the contents of @var{box} to @var{value}. @end deffn +@node SRFI-119 +@subsection SRFI-119 Wisp: simpler indentation-sensitive scheme. +@cindex SRFI-119 +@cindex wisp + +The languages shipped in Guile include SRFI-119 (wisp), an encoding of +Scheme that allows replacing parentheses with equivalent indentation and +inline colons. See +@uref{http://srfi.schemers.org/srfi-119/srfi-119.html, the specification +of SRFI-119}. Some examples: + +@example +display "Hello World!" 
@result{} (display "Hello World!") +@end example + +@example +define : factorial n @result{} (define (factorial n) + if : zero? n @result{} (if (zero? n) + . 1 @result{} 1 + * n : factorial @{n - 1@} @result{} (* n (factorial @{n - 1@})))) +@end example + +To execute a file with wisp code, select the language and filename +extension @code{.w} vie @code{guile --language=wisp -x .w}. + +In files using Wisp, @xref{SRFI-105} (Curly Infix) is always activated. + + @node SRFI-171 @subsection Transducers @cindex SRFI-171 @@ -5705,6 +5734,7 @@ left-to-right, due to how transducers are initiated. * SRFI-171 Helpers:: Utilities for writing your own transducers @end menu + @node SRFI-171 General Discussion @subsubsection SRFI-171 General Discussion @cindex transducers discussion diff --git a/module/language/wisp.scm b/module/language/wisp.scm new file mode 100644 index 000000000..ba24f54c5 --- /dev/null +++ b/module/language/wisp.scm @@ -0,0 +1,796 @@ +;;; Wisp + +;; Copyright (C) 2013, 2017, 2018, 2020 Free Software Foundation, Inc. +;; Copyright (C) 2014--2023 Arne Babenhauserheide. + +;;;; This library is free software; you can redistribute it and/or +;;;; modify it under the terms of the GNU Lesser General Public +;;;; License as published by the Free Software Foundation; either +;;;; version 3 of the License, or (at your option) any later version. +;;;; +;;;; This library is distributed in the hope that it will be useful, +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;;;; Lesser General Public License for more details. 
+;;;; +;;;; You should have received a copy of the GNU Lesser General Public +;;;; License along with this library; if not, write to the Free Software +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +;;; Commentary: + +;; Scheme-only implementation of a wisp-preprocessor which output a +;; scheme code tree to feed to a scheme interpreter instead of a +;; preprocessed file. + +;; Limitations: +;; - only unescapes up to 12 leading underscores at line start (\____________) +;; - in some cases the source line information is missing in backtraces. +;; check for set-source-property! + +;;; Code: + +(define-module (language wisp) + #:export (wisp-scheme-read-chunk wisp-scheme-read-all + wisp-scheme-read-file-chunk wisp-scheme-read-file + wisp-scheme-read-string)) + +; use curly-infix by default +(read-enable 'curly-infix) + +(use-modules + (srfi srfi-1) + (srfi srfi-11 ); for let-values + (ice-9 rw ); for write-string/partial + (ice-9 match)) + + +;; Helper functions for the indent-and-symbols data structure: '((indent token token ...) ...) +(define (line-indent line) + (car line)) + +(define (line-real-indent line) + "Get the indentation without the comment-marker for unindented lines (-1 is treated as 0)." + (let (( indent (line-indent line))) + (if (= -1 indent) + 0 + indent))) + +(define (line-code line) + (let ((code (cdr line))) + ; propagate source properties + (when (not (null? code)) + (set-source-properties! code (source-properties line))) + code)) + +; literal values I need +(define readcolon + (string->symbol ":")) + +(define wisp-uuid "e749c73d-c826-47e2-a798-c16c13cb89dd") +; define an intermediate dot replacement with UUID to avoid clashes. +(define repr-dot ; . 
+ (string->symbol (string-append "REPR-DOT-" wisp-uuid))) + +; allow using reader additions as the first element on a line to prefix the list +(define repr-quote ; ' + (string->symbol (string-append "REPR-QUOTE-" wisp-uuid))) +(define repr-unquote ; , + (string->symbol (string-append "REPR-UNQUOTE-" wisp-uuid))) +(define repr-quasiquote ; ` + (string->symbol (string-append "REPR-QUASIQUOTE-" wisp-uuid))) +(define repr-unquote-splicing ; ,@ + (string->symbol (string-append "REPR-UNQUOTESPLICING-" wisp-uuid))) + +(define repr-syntax ; #' + (string->symbol (string-append "REPR-SYNTAX-" wisp-uuid))) +(define repr-unsyntax ; #, + (string->symbol (string-append "REPR-UNSYNTAX-" wisp-uuid))) +(define repr-quasisyntax ; #` + (string->symbol (string-append "REPR-QUASISYNTAX-" wisp-uuid))) +(define repr-unsyntax-splicing ; #,@ + (string->symbol (string-append "REPR-UNSYNTAXSPLICING-" wisp-uuid))) + +; TODO: wrap the reader to return the repr of the syntax reader +; additions + +(define (match-charlist-to-repr charlist) + (let + ((chlist (reverse charlist))) + (cond + ((equal? chlist (list #\.)) + repr-dot) + ((equal? chlist (list #\')) + repr-quote) + ((equal? chlist (list #\,)) + repr-unquote) + ((equal? chlist (list #\`)) + repr-quasiquote) + ((equal? chlist (list #\, #\@ )) + repr-unquote-splicing) + ((equal? chlist (list #\# #\' )) + repr-syntax) + ((equal? chlist (list #\# #\, )) + repr-unsyntax) + ((equal? chlist (list #\# #\` )) + repr-quasisyntax) + ((equal? chlist (list #\# #\, #\@ )) + repr-unsyntax-splicing) + (else + #f)))) + +(define (wisp-read port) + "wrap read to catch list prefixes." + (let ((prefix-maxlen 4)) + (let longpeek + ((peeked '()) + (repr-symbol #f)) + (cond + ((or (< prefix-maxlen (length peeked)) (eof-object? (peek-char port)) (equal? #\space (peek-char port)) (equal? #\newline (peek-char port)) ) + (if repr-symbol ; found a special symbol, return it. + ; TODO: Somehow store source-properties. The commented-out code below does not work. 
+ ; catch #t + ; lambda () + ; write : source-properties symbol-or-symbols + ; set-source-property! symbol-or-symbols 'filename : port-filename port + ; set-source-property! symbol-or-symbols 'line : 1+ : port-line port + ; set-source-property! symbol-or-symbols 'column : port-column port + ; write : source-properties symbol-or-symbols + ; lambda : key . arguments + ; . #f + repr-symbol + (let unpeek + ((remaining peeked)) + (cond + ((equal? '() remaining ) + (read port )); let read to the work + (else + (unread-char (car remaining) port) + (unpeek (cdr remaining))))))) + (else + (let* + ((next-char (read-char port)) + (peeked (cons next-char peeked))) + (longpeek + peeked + (match-charlist-to-repr peeked)))))))) + + + +(define (line-continues? line) + (equal? repr-dot (car (line-code line)))) + +(define (line-only-colon? line) + (and + (equal? ":" (car (line-code line))) + (null? (cdr (line-code line))))) + +(define (line-empty-code? line) + (null? (line-code line))) + +(define (line-empty? line) + (and + ; if indent is -1, we stripped a comment, so the line was not really empty. + (= 0 (line-indent line)) + (line-empty-code? line))) + +(define (line-strip-continuation line ) + (if (line-continues? 
line) + (append + (list + (line-indent line)) + (cdr (line-code line))) + line)) + +(define (line-strip-indentation-marker line) + "Strip the indentation markers from the beginning of the line" + (cdr line)) + +(define (indent-level-reduction indentation-levels level select-fun) + "Reduce the INDENTATION-LEVELS to the given LEVEL and return the value selected by SELECT-FUN" + (let loop + ((newlevels indentation-levels) + (diff 0)) + (cond + ((= level (car newlevels)) + (select-fun (list diff indentation-levels))) + ((< level (car newlevels)) + (loop + (cdr newlevels) + (1+ diff))) + (else + (throw 'wisp-syntax-error "Level ~A not found in the indentation-levels ~A."))))) + +(define (indent-level-difference indentation-levels level) + "Find how many indentation levels need to be popped off to find the given level." + (indent-level-reduction indentation-levels level + (lambda (x ); get the count + (car x)))) + +(define (indent-reduce-to-level indentation-levels level) + "Find how many indentation levels need to be popped off to find the given level." + (indent-level-reduction indentation-levels level + (lambda (x ); get the levels + (car (cdr x))))) + +(define (chunk-ends-with-period currentsymbols next-char) + "Check whether indent-and-symbols ends with a period, indicating the end of a chunk." + (and (not (null? currentsymbols)) + (equal? #\newline next-char) + (equal? repr-dot + (list-ref currentsymbols (- (length currentsymbols) 1))))) + +(define (wisp-scheme-read-chunk-lines port) + (let loop + ((indent-and-symbols (list )); '((5 "(foobar)" "\"yobble\"")(3 "#t")) + (inindent #t) + (inunderscoreindent (equal? #\_ (peek-char port))) + (incomment #f) + (currentindent 0) + (currentsymbols '()) + (emptylines 0)) + (cond + ((>= emptylines 2 ); the chunk end has to be checked + ; before we look for new chars in the + ; port to make execution in the REPL + ; after two empty lines work + ; (otherwise it shows one more line). 
+ indent-and-symbols) + (else + (let ((next-char (peek-char port))) + (cond + ((eof-object? next-char) + (append indent-and-symbols (list (append (list currentindent) currentsymbols)))) + ((and inindent (zero? currentindent) (not incomment) (not (null? indent-and-symbols)) (not inunderscoreindent) (not (or (equal? #\space next-char) (equal? #\newline next-char) (equal? (string-ref ";" 0) next-char)))) + (append indent-and-symbols )); top-level form ends chunk + ((chunk-ends-with-period currentsymbols next-char) + ; the line ends with a period. This is forbidden in + ; SRFI-119. Use it to end the line in the REPL without + ; showing continuation dots (...). + (append indent-and-symbols (list (append (list currentindent) (drop-right currentsymbols 1))))) + ((and inindent (equal? #\space next-char)) + (read-char port ); remove char + (loop + indent-and-symbols + #t ; inindent + #f ; inunderscoreindent + #f ; incomment + (1+ currentindent) + currentsymbols + emptylines)) + ((and inunderscoreindent (equal? #\_ next-char)) + (read-char port ); remove char + (loop + indent-and-symbols + #t ; inindent + #t ; inunderscoreindent + #f ; incomment + (1+ currentindent) + currentsymbols + emptylines)) + ; any char but whitespace *after* underscoreindent is + ; an error. This is stricter than the current wisp + ; syntax definition. TODO: Fix the definition. Better + ; start too strict. FIXME: breaks on lines with only + ; underscores which should be empty lines. + ((and inunderscoreindent (and (not (equal? #\space next-char)) (not (equal? #\newline next-char)))) + (throw 'wisp-syntax-error "initial underscores without following whitespace at beginning of the line after" (last indent-and-symbols))) + ((equal? #\newline next-char) + (read-char port ); remove the newline + ; The following two lines would break the REPL by requiring one char too many. + ; if : and (equal? #\newline next-char) : equal? #\return : peek-char port + ; read-char port ; remove a full \n\r. 
Damn special cases... + (let* ; distinguish pure whitespace lines and lines + ; with comment by giving the former zero + ; indent. Lines with a comment at zero indent + ; get indent -1 for the same reason - meaning + ; not actually empty. + ( + (indent + (cond + (incomment + (if (= 0 currentindent ); specialcase + -1 + currentindent )) + ((not (null? currentsymbols )); pure whitespace + currentindent) + (else + 0))) + (parsedline (append (list indent) currentsymbols)) + (emptylines + (if (not (line-empty? parsedline)) + 0 + (1+ emptylines)))) + (when (not (= 0 (length parsedline))) + ; set the source properties to parsedline so we can try to add them later. + (set-source-property! parsedline 'filename (port-filename port)) + (set-source-property! parsedline 'line (port-line port))) + ; TODO: If the line is empty. Either do it here and do not add it, just + ; increment the empty line counter, or strip it later. Replace indent + ; -1 by indent 0 afterwards. + (loop + (append indent-and-symbols (list parsedline)) + #t ; inindent + (if (<= 2 emptylines) + #f ; chunk ends here + (equal? #\_ (peek-char port ))); are we in underscore indent? + #f ; incomment + 0 + '() + emptylines))) + ((equal? #t incomment) + (read-char port ); remove one comment character + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #t ; incomment + currentindent + currentsymbols + emptylines)) + ((or (equal? #\space next-char) (equal? #\tab next-char) (equal? #\return next-char) ); remove whitespace when not in indent + (read-char port ); remove char + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #f ; incomment + currentindent + currentsymbols + emptylines)) + ; | cludge to appease the former wisp parser + ; | used for bootstrapping which has a + ; v problem with the literal comment char + ((equal? 
(string-ref ";" 0) next-char) + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #t ; incomment + currentindent + currentsymbols + emptylines)) + (else ; use the reader + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #f ; incomment + currentindent + ; this also takes care of the hashbang and leading comments. + (append currentsymbols (list (wisp-read port))) + emptylines)))))))) + + +(define (line-code-replace-inline-colons line) + "Replace inline colons by opening parens which close at the end of the line" + ; format #t "replace inline colons for line ~A\n" line + (let loop + ((processed '()) + (unprocessed line)) + (cond + ((null? unprocessed) + ; format #t "inline-colons processed line: ~A\n" processed + processed) + ; replace : . with nothing + ((and (<= 2 (length unprocessed)) (equal? readcolon (car unprocessed)) (equal? repr-dot (car (cdr unprocessed)))) + (loop + (append processed + (loop '() (cdr (cdr unprocessed)))) + '())) + ((equal? readcolon (car unprocessed)) + (loop + ; FIXME: This should turn unprocessed into a list. + (append processed + (list (loop '() (cdr unprocessed)))) + '())) + (else + (loop + (append processed + (list (car unprocessed))) + (cdr unprocessed)))))) + +(define (line-replace-inline-colons line) + (cons + (line-indent line) + (line-code-replace-inline-colons (line-code line)))) + +(define (line-strip-lone-colon line) + "A line consisting only of a colon is just a marked indentation level. We need to kill the colon before replacing inline colons." + (if + (equal? + (line-code line) + (list readcolon)) + (list (line-indent line)) + line)) + +(define (line-finalize line) + "Process all wisp-specific information in a line and strip it" + (let + ( + (l + (line-code-replace-inline-colons + (line-strip-indentation-marker + (line-strip-lone-colon + (line-strip-continuation line)))))) + (when (not (null? (source-properties line))) + (catch #t + (lambda () + (set-source-properties! 
l (source-properties line))) + (lambda (key . arguments) + #f))) + l)) + +(define (wisp-add-source-properties-from source target) + "Copy the source properties from source into the target and return the target." + (catch #t + (lambda () + (set-source-properties! target (source-properties source))) + (lambda (key . arguments) + #f)) + target) + +(define (wisp-propagate-source-properties code) + "Propagate the source properties from the sourrounding list into every part of the code." + (let loop + ((processed '()) + (unprocessed code)) + (cond + ((and (null? processed) (not (pair? unprocessed)) (not (list? unprocessed))) + unprocessed) + ((and (pair? unprocessed) (not (list? unprocessed))) + (cons + (wisp-propagate-source-properties (car unprocessed)) + (wisp-propagate-source-properties (cdr unprocessed)))) + ((null? unprocessed) + processed) + (else + (let ((line (car unprocessed))) + (if (null? (source-properties unprocessed)) + (wisp-add-source-properties-from line unprocessed) + (wisp-add-source-properties-from unprocessed line)) + (loop + (append processed (list (wisp-propagate-source-properties line))) + (cdr unprocessed))))))) + +(define* (wisp-scheme-indentation-to-parens lines) + "Add parentheses to lines and remove the indentation markers" + (when + (and + (not (null? lines)) + (not (line-empty-code? (car lines))) + (not (= 0 (line-real-indent (car lines ))))); -1 is a line with a comment + (if (= 1 (line-real-indent (car lines))) + ;; accept a single space as indentation of the first line (and ignore the indentation) to support meta commands + (set! lines + (cons + (cons 0 (cdr (car lines))) + (cdr lines))) + (throw 'wisp-syntax-error + (format #f "The first symbol in a chunk must start at zero indentation. 
Indentation and line: ~A" + (car lines))))) + (let loop + ((processed '()) + (unprocessed lines) + (indentation-levels '(0))) + (let* + ( + (current-line + (if (<= 1 (length unprocessed)) + (car unprocessed) + (list 0 ))); empty code + (next-line + (if (<= 2 (length unprocessed)) + (car (cdr unprocessed)) + (list 0 ))); empty code + (current-indentation + (car indentation-levels)) + (current-line-indentation (line-real-indent current-line))) + ; format #t "processed: ~A\ncurrent-line: ~A\nnext-line: ~A\nunprocessed: ~A\nindentation-levels: ~A\ncurrent-indentation: ~A\n\n" + ; . processed current-line next-line unprocessed indentation-levels current-indentation + (cond + ; the real end: this is reported to the outside world. + ((and (null? unprocessed) (not (null? indentation-levels)) (null? (cdr indentation-levels))) + ; display "done\n" + ; reverse the processed lines, because I use cons. + processed) + ; the recursion end-condition + ((and (null? unprocessed)) + ; display "last step\n" + ; this is the last step. Nothing more to do except + ; for rolling up the indentation levels. return the + ; new processed and unprocessed lists: this is a + ; side-recursion + (values processed unprocessed)) + ((null? indentation-levels) + ; display "indentation-levels null\n" + (throw 'wisp-programming-error "The indentation-levels are null but the current-line is null: Something killed the indentation-levels.")) + (else ; now we come to the line-comparisons and indentation-counting. + (cond + ((line-empty-code? current-line) + ; display "current-line empty\n" + ; We cannot process indentation without + ; code. Just switch to the next line. This should + ; only happen at the start of the recursion. + ; TODO: Somehow preserve the line-numbers. + (loop + processed + (cdr unprocessed) + indentation-levels)) + ((and (line-empty-code? next-line) (<= 2 (length unprocessed ))) + ; display "next-line empty\n" + ; TODO: Somehow preserve the line-numbers. 
+ ; take out the next-line from unprocessed. + (loop + processed + (cons current-line + (cdr (cdr unprocessed))) + indentation-levels)) + ((> current-indentation current-line-indentation) + ; display "current-indent > next-line\n" + ; this just steps back one level via the side-recursion. + (let ((previous-indentation (car (cdr indentation-levels)))) + (if (<= current-line-indentation previous-indentation) + (values processed unprocessed) + (begin ;; not yet used level! TODO: maybe throw an error here instead of a warning. + (let ((linenumber (- (length lines) (length unprocessed)))) + (format (current-error-port) ";;; WARNING:~A: used lower but undefined indentation level (line ~A of the current chunk: ~S). This makes refactoring much more error-prone, therefore it might become an error in a later version of Wisp.\n" (source-property current-line 'line) linenumber (cdr current-line))) + (loop + processed + unprocessed + (cons ; recursion via the indentation-levels + current-line-indentation + (cdr indentation-levels))))))) + ((= current-indentation current-line-indentation) + ; display "current-indent = next-line\n" + (let + ((line (line-finalize current-line)) + (next-line-indentation (line-real-indent next-line))) + (cond + ((>= current-line-indentation next-line-indentation) + ; simple recursiive step to the next line + ; display "current-line-indent >= next-line-indent\n" + (loop + (append processed + (if (line-continues? 
current-line) + line + (wisp-add-source-properties-from line (list line)))) + (cdr unprocessed ); recursion here + indentation-levels)) + ((< current-line-indentation next-line-indentation) + ; display "current-line-indent < next-line-indent\n" + ; format #t "line: ~A\n" line + ; side-recursion via a sublist + (let-values + ( + ((sub-processed sub-unprocessed) + (loop + line + (cdr unprocessed ); recursion here + indentation-levels))) + ; format #t "side-recursion:\n sub-processed: ~A\n processed: ~A\n\n" sub-processed processed + (loop + (append processed (list sub-processed)) + sub-unprocessed ; simply use the recursion from the sub-recursion + indentation-levels)))))) + ((< current-indentation current-line-indentation) + ; display "current-indent < next-line\n" + (loop + processed + unprocessed + (cons ; recursion via the indentation-levels + current-line-indentation + indentation-levels))) + (else + (throw 'wisp-not-implemented + (format #f "Need to implement further line comparison: current: ~A, next: ~A, processed: ~A." + current-line next-line processed))))))))) + + +(define (wisp-scheme-replace-inline-colons lines) + "Replace inline colons by opening parens which close at the end of the line" + (let loop + ((processed '()) + (unprocessed lines)) + (if (null? unprocessed) + processed + (loop + (append processed (list (line-replace-inline-colons (car unprocessed)))) + (cdr unprocessed))))) + + +(define (wisp-scheme-strip-indentation-markers lines) + "Strip the indentation markers from the beginning of the lines" + (let loop + ((processed '()) + (unprocessed lines)) + (if (null? unprocessed) + processed + (loop + (append processed (cdr (car unprocessed))) + (cdr unprocessed))))) + +(define (wisp-unescape-underscore-and-colon code) + "replace \\_ and \\: by _ and :" + (match code + ((a ...) 
+ (map wisp-unescape-underscore-and-colon a)) + ('\_ + '_) + ('\__ + '__) + ('\___ + '___) + ('\____ + '____) + ('\_____ + '_____) + ('\______ + '______) + ('\_______ + '_______) + ('\________ + '________) + ('\_________ + '_________) + ('\__________ + '__________) + ('\___________ + '___________) + ('\____________ + '____________) + ('\: + ':) + (a + a))) + + +(define (wisp-replace-empty-eof code) + "replace ((#<eof>)) by ()" + ; FIXME: Actually this is a hack which fixes a bug when the + ; parser hits files with only hashbang and comments. + (if (and (not (null? code)) (pair? (car code)) (eof-object? (car (car code))) (null? (cdr code)) (null? (cdr (car code)))) + (list) + code)) + + +(define (wisp-replace-paren-quotation-repr code) + "Replace lists starting with a quotation symbol by + quoted lists." + (match code + (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quote (map wisp-replace-paren-quotation-repr a))) + ((a ... 'REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ); this is the quoted empty list + (append + (map wisp-replace-paren-quotation-repr a) + (list (list 'quote (map wisp-replace-paren-quotation-repr b))))) + (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quasiquote (list 'unquote (map wisp-replace-paren-quotation-repr a)))) + (('REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unquote (map wisp-replace-paren-quotation-repr a))) + ((a ... 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ) + (append + (map wisp-replace-paren-quotation-repr a) + (list (list 'unquote (map wisp-replace-paren-quotation-repr b))))) + (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quasiquote (map wisp-replace-paren-quotation-repr a))) + ((a ... 
'REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ); this is the quoted empty list + (append + (map wisp-replace-paren-quotation-repr a) + (list (list 'quasiquote (map wisp-replace-paren-quotation-repr b))))) + (('REPR-UNQUOTESPLICING-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unquote-splicing (map wisp-replace-paren-quotation-repr a))) + (('REPR-SYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'syntax (map wisp-replace-paren-quotation-repr a))) + (('REPR-UNSYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unsyntax (map wisp-replace-paren-quotation-repr a))) + (('REPR-QUASISYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quasisyntax (map wisp-replace-paren-quotation-repr a))) + (('REPR-UNSYNTAXSPLICING-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unsyntax-splicing (map wisp-replace-paren-quotation-repr a))) + ;; literal array as start of a line: # (a b) c -> (#(a b) c) + ((#\# a ...) + (with-input-from-string ;; hack to defer to read + (string-append "#" + (with-output-to-string + (λ () + (write (map wisp-replace-paren-quotation-repr a) + (current-output-port))))) + read)) + ((a ...) + (map wisp-replace-paren-quotation-repr a)) + (a + a))) + +(define (wisp-make-improper code) + "Turn (a #{.}# b) into the correct (a . b). + +read called on a single dot creates a variable named #{.}# (|.| +in r7rs). Due to parsing the indentation before the list +structure is known, the reader cannot create improper lists +when it reads a dot. So we have to take another pass over the +code to recreate the improper lists. + +Match is awesome!" + (let + ( + (improper + (match code + ((a ... b 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd c) + (append (map wisp-make-improper a) + (cons (wisp-make-improper b) (wisp-make-improper c)))) + ((a ...) 
+ (map wisp-make-improper a)) + (a + a)))) + (define (syntax-error li msg) + (throw 'wisp-syntax-error (format #f "incorrect dot-syntax #{.}# in code: ~A: ~A" msg li))) + (if #t + improper + (let check + ((tocheck improper)) + (match tocheck + ; lists with only one member + (('REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd) + (syntax-error tocheck "list with the period as only member")) + ; list with remaining dot. + ((a ...) + (if (and (member repr-dot a)) + (syntax-error tocheck "leftover period in list") + (map check a))) + ; simple pair - this and the next do not work when parsed from wisp-scheme itself. Why? + (('REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd . c) + (syntax-error tocheck "dot as first element in already improper pair")) + ; simple pair, other way round + ((a . 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd) + (syntax-error tocheck "dot as last element in already improper pair")) + ; more complex pairs + ((? pair? a) + (let + ((head (drop-right a 1)) + (tail (last-pair a))) + (cond + ((equal? repr-dot (car tail)) + (syntax-error tocheck "equal? repr-dot : car tail")) + ((equal? repr-dot (cdr tail)) + (syntax-error tocheck "equal? repr-dot : cdr tail")) + ((member repr-dot head) + (syntax-error tocheck "member repr-dot head")) + (else + a)))) + (a + a)))))) + +(define (wisp-scheme-read-chunk port) + "Read and parse one chunk of wisp-code" + (let (( lines (wisp-scheme-read-chunk-lines port))) + (wisp-make-improper + (wisp-replace-empty-eof + (wisp-unescape-underscore-and-colon + (wisp-replace-paren-quotation-repr + (wisp-propagate-source-properties + (wisp-scheme-indentation-to-parens lines)))))))) + +(define (wisp-scheme-read-all port) + "Read all chunks from the given port" + (let loop + ((tokens '())) + (cond + ((eof-object? 
(peek-char port)) + tokens) + (else + (loop + (append tokens (wisp-scheme-read-chunk port))))))) + +(define (wisp-scheme-read-file path) + (call-with-input-file path wisp-scheme-read-all)) + +(define (wisp-scheme-read-file-chunk path) + (call-with-input-file path wisp-scheme-read-chunk)) + +(define (wisp-scheme-read-string str) + (call-with-input-string str wisp-scheme-read-all)) + +(define (wisp-scheme-read-string-chunk str) + (call-with-input-string str wisp-scheme-read-chunk)) + diff --git a/module/language/wisp/spec.scm b/module/language/wisp/spec.scm new file mode 100644 index 000000000..d5ea7abce --- /dev/null +++ b/module/language/wisp/spec.scm @@ -0,0 +1,107 @@ +;; Language interface for Wisp in Guile + +;;; adapted from guile-sweet: https://gitorious.org/nacre/guile-sweet/source/ae306867e371cb4b56e00bb60a50d9a0b8353109:sweet/common.scm + +;;; Copyright (C) 2005-2014 by David A. Wheeler and Alan Manuel K. Gloria +;;; Copyright (C) Arne Babenhauserheide (2014--2023). + +;;; Permission is hereby granted, free of charge, to any person +;;; obtaining a copy of this software and associated documentation +;;; files (the "Software"), to deal in the Software without +;;; restriction, including without limitation the rights to use, copy, +;;; modify, merge, publish, distribute, sublicense, and/or sell copies +;;; of the Software, and to permit persons to whom the Software is +;;; furnished to do so, subject to the following conditions: +;;; +;;; The above copyright notice and this permission notice shall be +;;; included in all copies or substantial portions of the Software. +;;; +;;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +;;; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +;;; MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +;;; NONINFRINGEMENT. 
IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS +;;; BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN +;;; ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +;;; CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +;;; SOFTWARE. + +; adapted from spec.scm: https://gitorious.org/nacre/guile-sweet/source/ae306867e371cb4b56e00bb60a50d9a0b8353109:sweet/spec.scm +(define-module (language wisp spec) + #:use-module (language wisp) + #:use-module (system base compile) + #:use-module (system base language) + #:use-module (language scheme compile-tree-il) + #:use-module (language scheme decompile-tree-il) + #:export (wisp)) + +; Set locale to something which supports unicode. Required to avoid using fluids. +(catch #t + (lambda () + (setlocale LC_ALL "")) + (lambda (key . parameters) + (let ((locale-fallback "en_US.UTF-8")) + (format (current-error-port) + (string-join + (list ";;; Warning: setlocale LC_ALL \"\" failed with ~A: ~A" + "switching to explicit ~A locale. Please setup your locale." + "If this fails, you might need glibc support for unicode locales.\n") + "\n;;; ") + key parameters locale-fallback) + (catch #t + (lambda () + (setlocale LC_ALL locale-fallback)) + (lambda (key . parameters) + (format (current-error-port) + (string-join + (list ";;; Warning: fallback setlocale LC_ALL ~A failed with ~A: ~A" + "Not switching to Unicode." + "You might need glibc support for unicode locales.\n") + "\n;;; ") + locale-fallback key parameters)))))) + +;;; +;;; Language definition +;;; + +(define wisp-pending-sexps (list)) + +(define (read-one-wisp-sexp port env) + ;; allow using "# foo" as #(foo). + (read-hash-extend #\# (λ (chr port) #\#)) + (cond + ((eof-object? (peek-char port)) + (read-char port )); return eof: we’re done + (else + (let ((chunk (wisp-scheme-read-chunk port))) + (cond + ((not (null? chunk)) + (car chunk)) + (else + #f)))))) + +(define-language wisp + #:title "Wisp Scheme Syntax. See SRFI-119 for details." 
+ ; . #:reader read-one-wisp-sexp + #:reader read-one-wisp-sexp ; : lambda (port env) : let ((x (read-one-wisp-sexp port env))) (display x)(newline) x ; + #:compilers `((tree-il . ,compile-tree-il)) + #:decompilers `((tree-il . ,decompile-tree-il)) + #:evaluator (lambda (x module) (primitive-eval x)) + #:printer write ; TODO: backtransform to wisp? Use source-properties? + #:make-default-environment + (lambda () + ;; Ideally we'd duplicate the whole module hierarchy so that `set!', + ;; `fluid-set!', etc. don't have any effect in the current environment. + (let ((m (make-fresh-user-module))) + ;; Provide a separate `current-reader' fluid so that + ;; compile-time changes to `current-reader' are + ;; limited to the current compilation unit. + (module-define! m 'current-reader (make-fluid)) + ;; Default to `simple-format', as is the case until + ;; (ice-9 format) is loaded. This allows + ;; compile-time warnings to be emitted when using + ;; unsupported options. + (module-set! m 'format simple-format) + m))) + + + diff --git a/test-suite/tests/srfi-119.test b/test-suite/tests/srfi-119.test new file mode 100644 index 000000000..a888df41d --- /dev/null +++ b/test-suite/tests/srfi-119.test @@ -0,0 +1,81 @@ +;;;; srfi-119.test --- Test suite for Guile's SRFI-119 reader. -*- scheme -*- +;;;; +;;;; Copyright (C) 2023 Free Software Foundation, Inc. +;;;; +;;;; This library is free software; you can redistribute it and/or +;;;; modify it under the terms of the GNU Lesser General Public +;;;; License as published by the Free Software Foundation; either +;;;; version 3 of the License, or (at your option) any later version. +;;;; +;;;; This library is distributed in the hope that it will be useful, +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;;;; Lesser General Public License for more details. 
+;;;; +;;;; You should have received a copy of the GNU Lesser General Public +;;;; License along with this library; if not, write to the Free Software +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +(define-module (test-srfi-119) + #:use-module (test-suite lib) + #:use-module (srfi srfi-1) + #:use-module (language wisp)) + +(define (read-string s) + (with-input-from-string s read)) + +(define (with-read-options opts thunk) + (let ((saved-options (read-options))) + (dynamic-wind + (lambda () + (read-options opts)) + thunk + (lambda () + (read-options saved-options))))) + +(define (wisp->list str) + (wisp-scheme-read-string str)) + +(with-test-prefix "wisp-read-simple" + (pass-if (equal? (wisp->list "<= n 5") '((<= n 5)))) + (pass-if (equal? (wisp->list ". 5") '(5))) + (pass-if (equal? (wisp->list "+ 1 : * 2 3") '((+ 1 (* 2 3)))))) +(with-test-prefix "wisp-read-complex" + (pass-if (equal? (wisp->list " +a b c d e + . f g h + . i j k + +concat \"I want \" + getwish from me + . \" - \" username +") '( +(a b c d e + f g h + i j k) + +(concat "I want " + (getwish from me) + " - " username)))) + + (pass-if (equal? (wisp->list " +define : a b c +_ d e +___ f +___ g h +__ . i + +define : _ +_ display \"hello\n\" + +\\_") '( +(define (a b c) + (d e + (f) + (g h) + i)) + +(define (_) + (display "hello\n")) + +(_))))) -- 2.39.1 [-- Attachment #1.3: Type: text/plain, Size: 81 bytes --] -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-03 21:26 Dr. Arne Babenhauserheide @ 2023-02-04 15:08 ` Maxime Devos 2023-02-04 15:46 ` Dr. Arne Babenhauserheide 2023-02-16 8:03 ` Dr. Arne Babenhauserheide 2023-09-30 13:17 ` Christine Lemmer-Webber 1 sibling, 2 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-04 15:08 UTC (permalink / raw) To: Dr. Arne Babenhauserheide, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 4872 bytes --] > Why add Wisp? > > For Wisp: it is then available directly wherever Guile is available. > This will make it much easier for people to follow tutorials. I'm not convinced of this argument, because package managers exist, but ... > For Guile: > > - Wisp has proven to be good at enabling people to get an > entrance to Scheme² without pulling them out of the community. > - [...] ... all good points, and the implementation of Wisp is tiny anyway. For an additional reason: Wisp is a SRFI (Scheme Requests for Implementation) and Guile is a Scheme implementation. > So I’d like to ask: can we merge Wisp as supported language into Guile? From some conversations elsewhere, I got the impression that (use-modules (foo)) will search for foo.scm and not in foo.w. I think you'll need to tweak the loading mechanism to also look for foo.w instead of only foo.scm, if not done already. Also, I think that when foo.go exists, but foo.scm doesn't, then Guile refuses to load foo.scm, though I'm less sure of that. If this is the case, I propose removing the requirement that the source code is available, or alternatively keep the 'source code available' requirement and also accept 'foo.w', if not done already. > +; Set locale to something which supports unicode. Required to avoid > using fluids. > +(catch #t * Why avoid fluids? * Assuming for sake of argument that fluids are to be avoided, what is the point of setting the locale to something supporting Unicode? 
As-is, it now becomes impossible to use 'gettext' to translate software to non-English locales when the software imports (language wisp), which seems unfortunate to me. If you elaborate on what your goal here is, maybe I have an alternative solution. > + ;; allow using "# foo" as #(foo). > + (read-hash-extend #\# (λ (chr port) #\#)) That's a rather Wisp-specific extension, but it appears you are extending things globally. Instead, I propose extending it temporarily, with the undocumented '%read-hash-procedures' fluid. > + (let > + ( > + (l Lonely parenthesis. + (not (= 0 (line-real-indent (car lines ))))); -1 is a line with a comment Superfluous space after 'lines'. > + ; simple recursiive step to the next line I think the convention is ';;', OTOH there exist multiple conventions. +(define (wisp-scheme-replace-inline-colons lines) + "Replace inline colons by opening parens which close at the end of the line" Too much space; convention is two spaces. (Similar style issues in other places.) "guix style" might be useful. > +(define (wisp-replace-paren-quotation-repr code) > + "Replace lists starting with a quotation symbol by > + quoted lists." > + (match code > + (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'quote (map wisp-replace-paren-quotation-repr a))) > [...] > +(define wisp-uuid "e749c73d-c826-47e2-a798-c16c13cb89dd") > +; define an intermediate dot replacement with UUID to avoid clashes. > +(define repr-dot ; . > + (string->symbol (string-append "REPR-DOT-" wisp-uuid))) There is a risk of collision -- e.g., suppose that someone translates your implementation of Wisp into Wisp. I imagine there might be a risk of misinterpreting the 'REPR-QUOTE-...' in wisp-replace-paren-quotation-repr, though I haven't tried it out. As such, assuming this actually works, I propose using uninterned symbols instead, e.g.: (define repr-dot (make-symbol "REPR-DOT")).
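To make the uninterned-symbol proposal concrete, here is a minimal sketch of how such a marker behaves in Guile. The names `repr-dot?` and `contains-repr-dot?` are illustrative assumptions and do not appear in the patch; only `make-symbol` and `symbol-interned?` are existing Guile procedures.

```scheme
;; Sketch only: `repr-dot?' and `contains-repr-dot?' are hypothetical
;; helper names, not part of the Wisp patch.
(define repr-dot (make-symbol "REPR-DOT"))

(define (repr-dot? x)
  ;; Identity comparison is the only way to recognise the marker.
  (eq? x repr-dot))

(define (contains-repr-dot? lst)
  ;; Check the top level of a proper list for the marker.
  (and (pair? lst)
       (or (repr-dot? (car lst))
           (contains-repr-dot? (cdr lst)))))

;; A symbol of the same name that comes from source text is interned,
;; so it is a different object and cannot collide with the marker.
(eq? repr-dot 'REPR-DOT)                            ; => #f
(symbol-interned? repr-dot)                         ; => #f
(contains-repr-dot? (list 'a repr-dot 'b))          ; => #t
(contains-repr-dot? (list 'a 'REPR-DOT 'b))         ; => #f
```

The consequence is that the marker can only be compared by identity; it cannot be written to a string and read back.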
If this change is done, you might need to replace + ;; literal array as start of a line: # (a b) c -> (#(a b) c) + ((#\# a ...) + (with-input-from-string ;; hack to defer to read + (string-append "#" + (with-output-to-string + (λ () + (write (map wisp-replace-paren-quotation-repr a) + (current-output-port))))) + read)) (unverified -- I think removing this is unneeded but I don't understand this REPR-... stuff well enough). Also, I wonder if you could just do something like (apply vector (map wisp-replace-paren-quotation-repr a)) instead of this 'hack to defer to read' thing. This seems simpler to me and equivalent. (AFAIK, these REPR-... symbols are never written to a port or turned into syntax, so I think that uninterned symbols would work here.) (Aside from the REPR-... thing, I'm assuming (language wisp) is alright -- the SRFI is in 'final' status and it has been stable for years now, after all.) Greetings, Maxime [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
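The `(apply vector ...)` suggestion in the message above can be compared with the write-then-read round trip from the patch in a small sketch. The list `items` is a made-up stand-in for the already-processed elements of a `# (a b) c` line.

```scheme
;; Illustration only: `items' stands in for the result of
;; (map wisp-replace-paren-quotation-repr a) in the patch.
(define items '(a b (c d)))

;; Direct construction, as proposed in the review.
(define direct (apply vector items))

;; Round trip through `write' and `read', as done in the patch.
(define via-read
  (with-input-from-string
      (string-append "#"
                     (with-output-to-string
                       (lambda () (write items))))
    read))

(equal? direct via-read)   ; => #t
```

For plain data both forms give the same vector. The direct form does not depend on every element having a readable printed representation, which matters if the REPR markers become uninterned symbols.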
* Re: [PATCH] add language/wisp to Guile? 2023-02-04 15:08 ` Maxime Devos @ 2023-02-04 15:46 ` Dr. Arne Babenhauserheide 2023-02-04 19:09 ` Maxime Devos 2023-02-16 8:03 ` Dr. Arne Babenhauserheide 1 sibling, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-04 15:46 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 7161 bytes --] Thank you for your review! Maxime Devos <maximedevos@telenet.be> writes: >> Why add Wisp? >> For Wisp: it is then available directly wherever Guile is available. >> This will make it much easier for people to follow tutorials. > > I'm not convinced of this argument, because package managers exist, but ... > >> For Guile: >> - Wisp has proven to be good at enabling people to get an >> entrance to Scheme² without pulling them out of the community. >> - [...] > > ... all good points, and the implementation of Wisp is tiny anyway. > For an additional reason: Wisp is a SRFI (Scheme Requests for > Implementation) and Guile is a Scheme implementation. That’s a good point — I should really have written it :-) >> So I’d like to ask: can we merge Wisp as supported language into Guile? > > From some conversations elsewhere, I got the impression that > > (use-modules (foo)) > > will search for foo.scm and not in foo.w. I think you'll need to > tweak the loading mechanism to also look for foo.w instead of only > foo.scm, if not done already. This needs an addition to the extensions via guile -x .w — I wrote that in the documentation. I didn’t want to do that unconditionally, because detecting a wisp file as scheme import would cause errors. Is there a way to only extend the loading mechanism to detect .w when language is changed to wisp? readable uses (set! %load-extensions (cons ".sscm" %load-extensions)) Would that be the correct way of doing this? > Also, I think that when foo.go exists, but foo.scm doesn't, then Guile > refuses to load foo.scm, though I'm less sure of that. 
If this is the > case, I propose removing the requirement that the source code is > available, or alternatively keep the 'source code available' > requirement and also accept 'foo.w', if not done already. I think accepting any extension supported by any language in Guile would be better. >> +; Set locale to something which supports unicode. Required to avoid >> using fluids. >> +(catch #t > > * Why avoid fluids? I’m not sure anymore. It has been years since I wrote that code … I think it was because I did not understand what that would mean for the program. And I actually still don’t know … How would I do that instead with fluids? > * Assuming for sake of argument that fluids are to be avoided, > what is the point of setting the locale to something supporting > Unicode? I had problems with reading unicode symbols. Things like define (Σ . args) : apply + args > As-is, it now becomes impossible to use 'gettext' to translate > software to non-English locales when the software imports (language > wisp), which seems unfortunate to me. That is very much not what I want. > If you elaborate on what your > goal here is, maybe I have an alternative solution. This is to ensure that Wisp files are always read as Unicode. Since it uses regular (read) as part of parsing, it must affect (read), too. >> + ;; allow using "# foo" as #(foo). >> + (read-hash-extend #\# (λ (chr port) #\#)) > > That's a rather Wisp-specific extension, but it appears you are > extending things globally. Instead, I propose extending it > temporarily, with the undocumented '%read-hash-procedures' fluid. > >> + (let >> + ( >> + (l > > Lonely parenthesis. Thank you! Will be fixed :-) > + (not (= 0 (line-real-indent (car lines ))))); -1 is a > line with a comment > > Superfluous space after 'lines'. > >> + ; simple recursiive step to the next line > > I think the convention is ';;', OTOH there exist multiple conventions.
> > +(define (wisp-scheme-replace-inline-colons lines) > + "Replace inline colons by opening parens which close at the > end of the line" > > Too much space; convention is two spaces. > (Similar styles issues in other places.) > "guix style" might be useful. I’ll do that … >> +(define (wisp-replace-paren-quotation-repr code) >> + "Replace lists starting with a quotation symbol by >> + quoted lists." >> + (match code >> + (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) >> + (list 'quote (map wisp-replace-paren-quotation-repr a))) >> [...] >> +(define wisp-uuid "e749c73d-c826-47e2-a798-c16c13cb89dd") >> +; define an intermediate dot replacement with UUID to avoid clashes. >> +(define repr-dot ; . >> + (string->symbol (string-append "REPR-DOT-" wisp-uuid))) > > There is a risk of collision -- e.g., suppose that someone translates > your implementation of Wisp into Wisp. I imagine there might be a > risk of misinterpreting the 'REPR-QUOTE-...' in > wisp-replace-parent-quotation-repr, though I haven't tried it out. This is actually auto-translated from wisp via wisp2lisp :-) > As such, assuming this actually works, I propose using uninterned > symbols instead, e.g.: > > (define repr-dot (make-symbol "REPR-DOT")). That looks better — does uninterned symbol mean it can’t be mis-interpreted? Can I (match l ...) on uninterned symbols? They are used to match on precisely these symbols later. Can I write it into a string and then read it back? When I see them, I have to turn them into a different representation that I can then write back into the string and allow it to be read by the normal reader. > If this change is done, you might need to replace > > + ;; literal array as start of a line: # (a b) c -> (#(a b) c) > + ((#\# a ...) 
> + (with-input-from-string ;; hack to defer to read > + (string-append "#" > + (with-output-to-string > + (λ () > + (write (map > wisp-replace-paren-quotation-repr a) > + (current-output-port))))) > + read)) > > > (unverified -- I think removing this is unneeded but I don't > understand this REPR-... stuff well enough). The REPR supports the syntactic sugar like '(...) for (quote ...) by turning (' ...) into '(...). Also it is needed to turn ((. a b c)) into (a b c). However the literal array is used to make it possible to define procedure properties which need a literal array. > Also, I wonder if you could just do something like > > (apply vector (map wisp-replace-paren-quotation-repr a)) > > instead of this 'hack to defer to read' thing. This seems simpler to > me and equivalent. That looks much cleaner. Thank you! > (AFAIK, these REPR-... symbols are never written to a port or turned > into syntax, so I think that uninterned symbols would work here.) They are unread into a string. > (Aside from the REPR-... thing, I'm assuming (language wisp) is > alright -- the SRFI is in 'final' status and it has been stable for > years now, after all.) Thank you! Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-04 15:46 ` Dr. Arne Babenhauserheide @ 2023-02-04 19:09 ` Maxime Devos 2023-02-04 21:35 ` Dr. Arne Babenhauserheide 0 siblings, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-04 19:09 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 12311 bytes --] On 04-02-2023 16:46, Dr. Arne Babenhauserheide wrote: > [...] >>> So I’d like to ask: can we merge Wisp as supported language into Guile? >> >> From some conversations elsewhere, I got the impression that >> >> (use-modules (foo)) >> >> will search for foo.scm and not in foo.w. I think you'll need to >> tweak the loading mechanism to also look for foo.w instead of only >> foo.scm, if not done already. > > This needs an addition to the extensions via guile -x .w — I wrote that > in the documentation. I didn’t want to do that unconditionally, because > detecting a wisp file as scheme import would cause errors. If done carefully, I don't think this situation would happen. More precisely: * .w would be in the file extensions list. * Instead of a list, it would actually be a map from extensions to languages: .scm -> scheme .w -> wisp With this change, (use-modules (foo)) will load 'foo.scm' as Scheme and 'foo.w' as Wisp. (Assuming that foo.go is out-of-date or doesn't exist.) (For backwards compatibility, I think %load-extensions needs to remain a list of strings, but a %extension-language variable could be defined.) * "guile --language=whatever foo" loads foo as whatever, regardless of the extension of 'foo' (if a specific language is requested, then the user knows best). * "guile foo" without --language will look up the extension of foo in the extension map. If an entry exists, it would use the corresponding language. If no entry exists, it would use a default language (scheme). With these changes, I don't think that Wisp code would be detected as Scheme or the other way around.
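The lookup rules listed above can be sketched in a few lines. Everything here is hypothetical: Guile has no `%extension-language` variable at the time of this thread, and `language-for-file` is an invented name. The sketch only shows the selection order (explicit request first, then the extension map, then the default).

```scheme
(use-modules (srfi srfi-1))

;; Hypothetical map from file extension to language name, following
;; the ".scm -> scheme, .w -> wisp" proposal. Not an existing Guile API.
(define %extension-language
  '((".scm" . scheme)
    (".w" . wisp)))

;; Hypothetical helper: an explicitly requested language wins, then the
;; extension map is consulted, then the default language is used.
(define* (language-for-file file-name
                            #:key (requested #f) (default 'scheme))
  (or requested
      (let ((entry (find (lambda (pair)
                           (string-suffix? (car pair) file-name))
                         %extension-language)))
        (and entry (cdr entry)))
      default))

(language-for-file "foo.w")                       ; => wisp
(language-for-file "foo.scm")                     ; => scheme
(language-for-file "foo")                         ; => scheme
(language-for-file "foo.scm" #:requested 'wisp)   ; => wisp
```

Keeping `%load-extensions` as a plain list of strings and adding the map next to it would match the backwards-compatibility remark above.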
> Is there a way to only extend the loading mechanism to detect .w when > language is changed to wisp? Regardless of whether it's technically possible, that sounds insufficient to me. Suppose someone writes a library 'Foo' in Wisp. Suppose I write a library 'Bar' in parenthese-y Scheme, that happens to use the Foo library as a dependency. Then when compiling Bar or running its tests, it will be done in the Scheme language, and additionally assuming that compiled .go are available for Foo, then the language will never be changed to Wisp, and hence .w will never be added to %load-extensions. As such, the Makefile.am or equivalent of Foo would need to be converted to Wisp, or '-x w' would need to be added. I don't care what language the library Foo is written in, and my library Bar isn't written in Wisp so it seems unreasonable to have to add -x w. (It wouldn't be too much trouble, but still not something that should have to be done _in Bar_, as the Wispyness of Foo is just an implementation detail of Foo, not Bar.) Worse, adding the Wispy library Foo as a dependency of the parenthese-y library Bar would be an incompatible change, as parenthese-y dependents of Foo would need to add '-x w' in places whereas they didn't need to previously. It's easily resolvable, but I think it would be very annoying as well. > readable uses This sentence appears to be incomplete; I might have misinterpreted it below (I don't know what you mean with 'readable' -- it's an adjective and you are using it as a noun?). > (set! %load-extensions (cons ".sscm" %load-extensions)) > > Would that be the correct way of doing this? I assume you meant ".w" instead of ".sscm". I don't quite see how this would be an answer to: Is there a way to only extend the loading mechanism to detect .w when language is changed to wisp? More precisely, I'm missing how it addresses 'only ... when the language is changed to wisp'.
FWIW, it appears to be an answer to the following unasked question: How to make Guile accept "foo.go" when "foo.w" exists and is up-to-date. >> Also, I think that when foo.go exists, but foo.scm doesn't, then Guile >> refuses to load foo.scm, though I'm less sure of that. If this is the >> case, I propose removing the requirement that the source code is >> available, or alternatively keep the 'source code available' >> requirement and also accept 'foo.w', if not done already. > > I think accepting any extension supported by any language in Guile would > be better. This sounds like the second proposal ('alternatively ...'), but the way it is written, you appear to be proposing it as a third proposal. Is this the case? (I mean, after this patch, Wisp is a supported language, so it seems equivalent to me.) >>> +; Set locale to something which supports unicode. Required to avoid >>> using fluids. >>> +(catch #t >> >> * Why avoid fluids? > > I’m not sure anymore. It has been years since I wrote that code … > > I think it was because I did not understand what that would mean for the > program. And I actually still don’t know … > > How would I do that instead with fluids? > >> * Assuming for sake of argument that fluids are to be avoided, >> what is the point of setting the locale to something supporting >> Unicode? > > I had problems with reading unicode symbols. Things like > define (Σ . args) : apply + args > [...]> > This is to ensure that Wisp files are always read as Unicode. Since it uses > regular (read) as part of parsing, it must affect (read), too. OK. So, Wisp files are supposed to be UTF-8, no matter the locale? AFAICT, the SRFI-119 document does not mention this UTF-8 (or UTF-16, or ...) requirement anywhere, this seems like an omission in <https://srfi.schemers.org/srfi-119/srfi-119.html> to me.
First, I would like to point out the following part of ‘(guile)The Top of a Script File’: • If this source code file is not ASCII or ISO-8859-1 encoded, a coding declaration such as ‘coding: utf-8’ should appear in a comment somewhere in the first five lines of the file: see *note Character Encoding of Source Files::. Going by this, it is already possible to ask Guile to read the Scheme files as UTF-8; presumably the relevant bits could be copied over to Wisp. (I don't know if this applies to non-script files, but I'd assume so.) It's not 'UTF-8 by default', but it can be 'close enough', and doing 'always UTF-8 even if coding: something-else' would be inconsistent with the Scheme language, so I ask you to consider whether it's worth it (and perhaps the answer is 'yes'). (OTOH, (guile)Character Encoding says 'In the absence of any hints, UTF-8 is assumed.' which appears to suffice for you, but it also contradicts "If this source file is not ASCII or ISO-8859-1 encoded, ...", so I don't know what precisely is going on here.) If you aren't going for the 'coding: ...' stuff or porting the encoding autodetection from Scheme to Wisp, here's an alternative solution: Keep in mind that encodings are a per-port property -- the locale might have a default encoding, and ports by default take the encoding from %default-port-encoding or the locale (I think), but you can override the port encoding: -- Scheme Procedure: set-port-encoding! port enc -- C Function: scm_set_port_encoding_x (port, enc) Sets the character encoding that will be used to interpret I/O to PORT. ENC is a string containing the name of an encoding. Valid encoding names are those defined by IANA (http://www.iana.org/assignments/character-sets), for example ‘"UTF-8"’ or ‘"ISO-8859-1"’. As such, I propose calling set-port-encoding! right in the beginning of read-one-wisp-sexp. Also, unrelated, I now noticed some dead code you can remove: +(define wisp-pending-sexps (list)) > [...]
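For the set-port-encoding! proposal above, a minimal sketch of the per-port approach might look as follows. The wrapper name `call-with-utf-8-input-file` is an assumption made for illustration; in the patch the call would go at the start of read-one-wisp-sexp instead, and whether that interacts well with a 'coding:' declaration is not checked here.

```scheme
;; Hypothetical wrapper: open PATH, force the port to UTF-8 regardless
;; of the locale, then hand the port to READER (any one-argument
;; procedure that reads from a port).
(define (call-with-utf-8-input-file path reader)
  (call-with-input-file path
    (lambda (port)
      ;; Only this port is affected; the locale and other ports keep
      ;; their settings, so gettext-based translation is not disturbed.
      (set-port-encoding! port "UTF-8")
      (reader port))))

;; Possible use with the reader from the patch (not run here):
;; (call-with-utf-8-input-file "example.w" wisp-scheme-read-all)
```

Because the encoding is set on the port, no call to setlocale is needed in the language module for this purpose.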
>>> +(define (wisp-replace-paren-quotation-repr code) >>> + "Replace lists starting with a quotation symbol by >>> + quoted lists." >>> + (match code >>> + (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) >>> + (list 'quote (map wisp-replace-paren-quotation-repr a))) >>> [...] >>> +(define wisp-uuid "e749c73d-c826-47e2-a798-c16c13cb89dd") >>> +; define an intermediate dot replacement with UUID to avoid clashes. >>> +(define repr-dot ; . >>> + (string->symbol (string-append "REPR-DOT-" wisp-uuid))) >> >> There is a risk of collision -- e.g., suppose that someone translates >> your implementation of Wisp into Wisp. I imagine there might be a >> risk of misinterpreting the 'REPR-QUOTE-...' in >> wisp-replace-paren-quotation-repr, though I haven't tried it out. > > This is actually auto-translated from wisp via wisp2lisp :-) > >> As such, assuming this actually works, I propose using uninterned >> symbols instead, e.g.: >> >> (define repr-dot (make-symbol "REPR-DOT")). > > That looks better -- does uninterned symbol mean it can’t be > mis-interpreted? Yes. This is because 'read' only reads interned symbols; uninterned symbols are unreadable: scheme@(guile-user)> (make-symbol "foo") $1 = #<uninterned-symbol foo 7f17efab7240> scheme@(guile-user)> #<uninterned-symbol foo 7f17efab7240> While reading expression: #<unknown port>:2:3: Unknown # object: "#<" Also: (eq? (make-symbol "stuff") 'stuff) -> #false. > Can I (match l ...) on uninterned symbols? They are used to match on > precisely these symbols later. Yes, but it's going to look different and more verbose: (define uninterned-symbol1 (make-symbol "foo1")) (define uninterned-symbol2 (make-symbol "foo2")) (match symbol ((? (lambda (x) (eq? x uninterned-symbol1))) stuff1) ((? (lambda (x) (eq? x uninterned-symbol2))) stuff2) [...]) -- basically, replace 'stuff by (? (lambda (x) ...)). > Can I write it into a string and then read it back? No.
If you could, then uninterned symbols wouldn't be uninterned anymore, but rather a separation of symbols in two kinds that pretty much behave the same, and then you would again have a (very low) risk of a collision: > When I see them, I have to turn them into a different representation > that I can then write back into the string and allow it to be read by > the normal reader. That's the case for the old code, but AFAIK it is only done in the following ... > >> If this change is done, you might need to replace >> >> + ;; literal array as start of a line: # (a b) c -> (#(a b) c) >> + ((#\# a ...) >> + (with-input-from-string ;; hack to defer to read >> + (string-append "#" >> + (with-output-to-string >> + (λ () >> + (write (map >> wisp-replace-paren-quotation-repr a) >> + (current-output-port))))) >> + read)) >> >> (unverified -- I think removing this is unneeded but I don't >> understand this REPR-... stuff well enough). ..., for which I proposed a replacement, so do you still need to turn it in a string & back? > > The REPR supports the syntactic sugar like '(...) for (quote ...) by turning > (' ...) into '(...). > > Also it is needed to turn ((. a b c)) into (a b c). > > However the literal array is used to make it possible to define > procedure properties which need a literal array. > >> Also, I wonder if you could just do something like >> >> (apply vector (map wisp-replace-paren-quotation-repr a)) >> >> instead of this 'hack to defer to read' thing. This seems simpler to >> me and equivalent. > > That looks much cleaner. Thank you! This sounds positive, but it is unclear to me if I have found a solution, because of your negative "However the literal array is used to make it possible to define procedure properties which need a literal array." comment. Do I need to look into solving the 'literal array and procedure properties' stuff, or does the (apply vector (map ...)) suffice as-is? 
(If there is 'literal array and procedure properties' stuff to be solved, you will need to elaborate on what you mean, because arrays aren't procedures and procedures aren't arrays -- maybe you meant 'object properties'?)

Greetings,
Maxime.
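Editorial sketch, not part of the thread: the uninterned-symbol matching discussed in this message can be tried out as follows. It is a minimal sketch under assumptions: the marker names and the procedure replace-repr-markers are hypothetical and only stand in for the REPR-... handling in the patch. Only make-symbol, eq? and (ice-9 match) are taken as given.

```scheme
(use-modules (ice-9 match))

;; Hypothetical markers: make-symbol returns a fresh uninterned symbol,
;; so nothing that 'read' produces can ever be eq? to these.
(define repr-quote (make-symbol "REPR-QUOTE"))
(define repr-dot (make-symbol "REPR-DOT"))

;; Named predicates keep the match clauses short.
(define (repr-quote? x) (eq? x repr-quote))
(define (repr-dot? x) (eq? x repr-dot))

;; Hypothetical stand-in for the replacement step: only the uninterned
;; markers trigger a rewrite; an interned symbol that merely has the
;; same name falls through to the last clause unchanged.
(define (replace-repr-markers code)
  (match code
    (((? repr-quote?) rest ...) (list 'quote rest))
    (((? repr-dot?) rest ...) rest)
    (other other)))
```

Calling (replace-repr-markers (list repr-quote 'a 'b)) rewrites the list, while (replace-repr-markers (list 'REPR-QUOTE 'a 'b)) leaves it alone, which is the collision-avoidance property asked about above. As noted in the thread, such markers cannot survive a write/read round trip.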
* Re: [PATCH] add language/wisp to Guile? 2023-02-04 19:09 ` Maxime Devos @ 2023-02-04 21:35 ` Dr. Arne Babenhauserheide 2023-02-05 15:08 ` Maxime Devos 0 siblings, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-04 21:35 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 10358 bytes --] Maxime Devos <maximedevos@telenet.be> writes: >> This needs an addition to the extensions via guile -x .w — I wrote >> that >> in the documentation. I didn’t want to do that unconditionally, because >> detecting a wisp file as scheme import would cause errors. > > If done carefully, I don't think this situations would happen. > More precisely: > > * .w would be in the file extensions list. > > * Instead of a list, it would actually be a map from extensions to > languages: > > .scm -> scheme > .w -> wisp > > With this change, (use-modules (foo)) will load 'foo.scm' as Scheme > and 'foo.w' as Wisp. (Assuming that foo.go is out-of-date or > doesn't exist.) > > (For backwards compatibility, I think %load-extensions needs to > remain a list of strings, but a %extension-language variable could > be defined.) > > * "guile --language=whatever foo" loads foo as whatever, regardless > of the extension of 'foo' (if a specific language is requested, > then the user knows best). > > * "guile foo" without --language will look up the extension of foo in > the extension map. If an entry exists, it would use the > corresponding language. If no entry exists, it would use > a default language (scheme). This sounds good, though a bit more complex than I think it should be. I think this should stick to only load Scheme if no language is detected to keep Scheme the default language for Guile — and also to avoid stumbling over files that just take that extension. Checking more files could slow down startup and I think having multiple languages fully equal would risk splintering the development community. 
Guile is first and foremost Scheme and fast startup time is essential. More complicated is what should be done if a *.go file is detected during import. There I could see Guile check if a file with any supported extension is up to date. >> Is there a way to only extend the loading mechanism to detect .w when >> language is changed to wisp? > I don't care what language the library Foo is written in, and my > library Bar isn't written in Wisp so it seems unreasonable to have to > add -x w. I think you’re right with that. For any already compiled library, the language should not matter. >> readable uses > > This sentence appears to be incomplete; I might have misinterpreted it > below (I don't know what you mean with 'readable' -- its an adjective > and you are using it as a noun?). readable is a noun, yes: the readable lisp project. >> (set! %load-extensions (cons ".sscm" %load-extensions)) >> Would that be the correct way of doing this? > FWIW, it appears to be an answer to the following unasked question: > > How to make Guile accept "foo.go" when "foo.w" exists and is > up-to-date. Yes, I think that is the most important question. If that is solved, guile provides a multi-language environment in which only the build tools of the libraries themselves have to know the languages used. > This sounds like the second proposal ('alternatively ...'), but the > way it is written, you appear to proposing it as a third proposal. Is > this the case? It only differs in details (keeping Scheme more central and only checking for non-scheme languages if a *.go file is detected). > (I mean, after this patch, Wisp is a supported language, so it seems > equivalent to me.) Pretty close, yes. >>>> +; Set locale to something which supports unicode. Required to avoid >>>> using fluids. >>>> +(catch #t >>> >>> * Why avoid fluids? >> I’m not sure anymore. It has been years since I wrote that code … >> I think it was because I did not understand what that would mean for >> the >> program. 
And I actually still don’t know … >> Hoow would I do that instead with fluids? >> >>> * Assuming for sake of argument that fluids are to be avoided, >>> what is the point of setting the locale to something supporting >>> Unicode? >> I had problems with reading unicode symbols. Things like >> define (Σ . args) : apply + args >> [...]> >> This is to ensure that Wisp are always read as Unicode. Since it uses >> regular (read) as part of parsing, it must affect (read), too. > > OK. So, Wisp files are supposed to be UTF-8, no matter the locale? > AFAICT, the SRFI-119 document does not mention this UTF-8 (or UTF-16, > or ...) requirement anywhere, this seems like an omission in > <https://srfi.schemers.org/srfi-119/srfi-119.html> to me. That’s an omission, yes … but since it was omitted (by me …), you’re right. Forcing UTF-8 is actually not the way. > First, I would like to point out the following part of > ‘(guile)The Top of a Script File’: > > • If this source code file is not ASCII or ISO-8859-1 encoded, a > coding declaration such as ‘coding: utf-8’ should appear in a > comment somewhere in the first five lines of the file: see *note > Character Encoding of Source Files::. … > (OTOH, (guile)Character Encoding says 'In the absence of any hints, > UTF-8 is assumed.' which appears to suffice for you, but it also > contradicts "If this source file is not ASCII or ISO-8859-1 encodes, > ...", so I don't know what precisely is going on here.) I think this inconsistency calls for calling in old timers who know why this is there. Maybe one of these is just a leftover? > Keep in mind that encodings are a per-port property -- the locale > might have a default encoding, and ports by default take the encoding > from %default-port-encoding or the locale (I think), but you can > override the port encoding: > > -- Scheme Procedure: set-port-encoding! 
port enc > -- C Function: scm_set_port_encoding_x (port, enc) > Sets the character encoding that will be used to interpret I/O to > PORT. ENC is a string containing the name of an encoding. Valid > encoding names are those defined by IANA > (http://www.iana.org/assignments/character-sets), for example > ‘"UTF-8"’ or ‘"ISO-8859-1"’. > > As such, I propose calling set-port-encoding! right in the beginning > of read-one-wisp-sexp. This sounds like the best way forward on the short term. > Also, unrelated, I now noticed some dead code you can remove: > > +(define wisp-pending-sexps (list)) You’re right, that was only needed in a previous iteration of wisp (last used more than 3 years ago, IIRC). Thank you! >>> (define repr-dot (make-symbol "REPR-DOT")). >> That looks better — does uninterned symbol mean it can’t be >> mis-interpreted? > > Yes. This is because 'read' only reads interned symbols; uninterned > symbols are unreadable: … >> Can I write it into a string and then read it back? > > No. If you could, then uninterned symbols wouldn't be uninterned > anymore, but rather a separation of symbols in two kinds that pretty > much behave the same, and then you would again have a (very low) risk > of a collision: This sounds like I cannot go that way, because there’s a necessary pre-processing step in wisp-read via (match-charlist-to-repr peeked): (define (wisp-read port) "wrap read to catch list prefixes." (let ((prefix-maxlen 4)) (let longpeek ((peeked '()) (repr-symbol #f)) (cond ((or (< prefix-maxlen (length peeked)) (eof-object? (peek-char port)) (equal? #\space (peek-char port)) (equal? #\newline (peek-char port))) (if repr-symbol ; found a special symbol, return it. repr-symbol (let unpeek ((remaining peeked)) (cond ((equal? 
'() remaining) (read port)); let read to the work (else (unread-char (car remaining) port) (unpeek (cdr remaining))))))) (else (let* ((next-char (read-char port)) (peeked (cons next-char peeked))) (longpeek peeked (match-charlist-to-repr peeked)))))))) This actually needs to be able to write the replacement symbols back into the port. > ..., for which I proposed a replacement, so do you still need to turn > it in a string & back? Sadly yes. Otherwise the normal reader will play tricks on the code, because it does not know where a symbol needs to be interpreted differently (i.e. where ` needs to be treated as `() even though that’s not in the string). >> The REPR supports the syntactic sugar like '(...) for (quote ...) by >> turning >> (' ...) into '(...). >> Also it is needed to turn ((. a b c)) into (a b c). >> However the literal array is used to make it possible to define >> procedure properties which need a literal array. >> >>> Also, I wonder if you could just do something like >>> >>> (apply vector (map wisp-replace-paren-quotation-repr a)) >>> >>> instead of this 'hack to defer to read' thing. This seems simpler to >>> me and equivalent. >> That looks much cleaner. Thank you! > > This sounds positive, but it is unclear to me if I have found a > solution, because of your negative "However the literal array is used > to make it possible to define procedure properties which need a > literal array." comment. > > Do I need to look into solving the 'literal array and procedure > properties' stuff, or does the (apply vector (map ...)) suffice as-is? > > (If there is 'literal array and procedure properties' stuff to be > solved, you will need to elaborate on what you mean, because arrays > aren't procedures and procedures aren't arrays -- maybe you meant > 'object properties'?) I meant this: (define (foo) #((bar . baz)) #f) (procedure-properties foo) => ((name . foo) (bar . 
baz))

I use that for doctests:

(define (A)
  #((tests (test-eqv 'A (A))
           (test-assert #t)))
  'A)
(define %this-module (current-module))
(define (main args)
  (doctests-testmod %this-module))

Best wishes,
Arne
--
Being apolitical means being political without noticing it.
draketo.de
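Editorial sketch, not part of the thread: the proposal to call set-port-encoding! before reading can be illustrated on an in-memory port. This is a minimal sketch under assumptions: read-datum-as-utf-8 is a hypothetical helper, not the patch's read-one-wisp-sexp, and hard-coding "UTF-8" ignores any 'coding:' declaration, which is exactly the trade-off raised earlier in the discussion.

```scheme
(use-modules (rnrs bytevectors)     ; string->utf8
             (ice-9 binary-ports))  ; open-bytevector-input-port

;; Hypothetical helper: override the per-port encoding first, then hand
;; the port to the ordinary reader.  The locale is not touched.
(define (read-datum-as-utf-8 port)
  (set-port-encoding! port "UTF-8")
  (read port))

;; Demonstration on raw UTF-8 bytes, so the port starts out without a
;; UTF-8 text encoding of its own.
(define sample-port
  (open-bytevector-input-port (string->utf8 "(Σ 1 2)")))

(define sample-datum (read-datum-as-utf-8 sample-port))
```

Because the encoding is a property of the port, this affects only the port being read and leaves other ports and the process locale alone.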
* Re: [PATCH] add language/wisp to Guile? 2023-02-04 21:35 ` Dr. Arne Babenhauserheide @ 2023-02-05 15:08 ` Maxime Devos 2023-02-14 8:32 ` Dr. Arne Babenhauserheide 0 siblings, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-05 15:08 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 6790 bytes --] On 04-02-2023 22:35, Dr. Arne Babenhauserheide wrote: > > Maxime Devos <maximedevos@telenet.be> writes: > >>> This needs an addition to the extensions via guile -x .w — I wrote >>> that >>> in the documentation. I didn’t want to do that unconditionally, because >>> detecting a wisp file as scheme import would cause errors. >> >> If done carefully, I don't think this situations would happen. >> More precisely: >> >> * .w would be in the file extensions list. >> >> * Instead of a list, it would actually be a map from extensions to >> languages: >> >> .scm -> scheme >> .w -> wisp >> >> With this change, (use-modules (foo)) will load 'foo.scm' as Scheme >> and 'foo.w' as Wisp. (Assuming that foo.go is out-of-date or >> doesn't exist.) >> >> (For backwards compatibility, I think %load-extensions needs to >> remain a list of strings, but a %extension-language variable could >> be defined.) >> >> * "guile --language=whatever foo" loads foo as whatever, regardless >> of the extension of 'foo' (if a specific language is requested, >> then the user knows best). >> >> * "guile foo" without --language will look up the extension of foo in >> the extension map. If an entry exists, it would use the >> corresponding language. If no entry exists, it would use >> a default language (scheme). > > This sounds good, though a bit more complex than I think it should be. > > I think this should stick to only load Scheme if no language is detected > to keep Scheme the default language for Guile To my knowledge, this is the case in my proposal. 
Detecting the language is done via the file extension, and if no known mapping exists, it defaults to Scheme. > — and also to avoid > stumbling over files that just take that extension. While I suppose it is theoretically possible someone will write a .w file that contains Scheme code instead of Wisp, I'm not convinced by this argument. It sounds very unlikely, and also a 'don't do that, then' situation. > Checking more files > could slow down startup and I think having multiple languages fully > equal would risk splintering the development community. > > Guile is first and foremost Scheme and fast startup time is essential. > > More complicated is what should be done if a *.go file is detected > during import. There I could see Guile check if a file with any > supported extension is up to date. Maybe the .go could contain some information on what the corresponding source code file name is, and Guile could read the .go without checking first checking for up-to-dateness. (But only reading; not loading yet!) (There is already debugging information with such information, but to my understanding that's for individual procedures, not the .go as a whole, and by using stuff like 'include' or macros there can be multiple source files.) Once read, it should be easy to look up the source code file name from the .go and then verify whether the .go is up to date, and proceed with actually loading the .go (as in, put stuff in the module system, run top-level code, ...). IIUC, that would be fully backwards compatible, and not cause any non-negligible slowdowns. I also have an alternative proposal, more complicated and backwards-incompatible -- I wouldn't recommend it, but for completeness: * when doing (use-module (foo)) and foo.go exists in the $GUILE_LOAD_COMPILED_PATH, load it, and don't bother checking whether foo.scm, foo.w or foo.whatever exists or is up-to-date. (If not done already in Guile.) 
That should solve the 'I don't care what language the library Foo is written in, and my library Bar isn't written in Wisp so it seems unreasonable to have to add -x w.’, and would also avoid the need for a 'extension -> language map' thing. It should also be a little faster than what we had before. That's for "make install", "apt-get install", "guix install" ...-like uses of compiled .go -- let's call them 'installed .go'. It won't work for ~/.cache/guile/ccache/3.0-LE-8-4 (‘cached .go’) as for that it's actually important to check up-to-dateness because, well, cache. Additionally, to support compiling software that is already installed, there needs to be an option to treat certain modules with the 'cache' behaviour even if not in the actual ~/.cache, maybe with some '--local-module=(stuff ...)' option (name pending). This would be backwards-incompatible, but it could be done. >>> readable uses >> >> This sentence appears to be incomplete; I might have misinterpreted it >> below (I don't know what you mean with 'readable' -- its an adjective >> and you are using it as a noun?). > > readable is a noun, yes: the readable lisp project. Looks like you meant this: <https://readable.sourceforge.io/>. >>> Can I write it into a string and then read it back? >> >> No. [...] > > This sounds like I cannot go that way, because there’s a necessary > pre-processing step in wisp-read via (match-charlist-to-repr peeked): > [...] > This actually needs to be able to write the replacement symbols back > into the port. > >> ..., for which I proposed a replacement, so do you still need to turn >> it in a string & back? > > Sadly yes. Otherwise the normal reader will play tricks on the code, > because it does not know where a symbol needs to be interpreted > differently (i.e. where ` needs to be treated as `() even though that’s > not in the string). OK, too bad. Looks like the REPR-... stuff is to stay for now. > I meant this: > > (define (foo) > #((bar . 
baz))
>   #f)
> (procedure-properties foo)
> => ((name . foo) (bar . baz))
>
> I use that for doctests: [...]

I didn't know that these literal vectors get turned into procedure properties ... looks interesting.

Also, about wisp-unescape-underscore-and-colon and the 'only unescapes up to 12 leading underscores at line start (\____________)' limitation: I have found a solution: you can use a combination of string-every, symbol->string, string->symbol, substring and string-ref:

(cond ((list? code) (map wisp-... code))
      ((eq? code '\:) ':)
      ;; Look for symbols like \____ and remove the \.
      ((symbol? code)
       (let ((as-string (symbol->string code)))
         (if (and (>= (string-length as-string) 2) ; at least a single underscore
                  (char=? (string-ref as-string 0) #\\)
                  (string-every #\_ (substring as-string 1)))
             (string->symbol (substring as-string 1))
             code)))
      (#true code))

Greetings,
Maxime.
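Editorial sketch, not part of the thread: the extension-to-language lookup proposed in this message (an explicit --language wins, otherwise the file extension decides, otherwise Scheme) could look roughly like this. Everything here is an assumption for illustration: %extension-language is only suggested as a possible name in the message and does not exist in Guile, and file-extension and language-for-file are hypothetical helpers.

```scheme
;; Hypothetical map from file extensions to language names.
(define %extension-language
  '((".scm" . scheme)
    (".w" . wisp)))

;; Return the extension of FILE-NAME including the dot, or #f if the
;; base name contains no dot.
(define (file-extension file-name)
  (let* ((base (basename file-name))
         (dot (string-rindex base #\.)))
    (and dot (substring base dot))))

;; REQUESTED models an explicit --language option; when it is given,
;; the extension is not consulted at all.
(define* (language-for-file file-name #:optional requested)
  (or requested
      (let ((ext (file-extension file-name)))
        (or (and ext (assoc-ref %extension-language ext))
            'scheme))))  ; unknown or missing extension: default to Scheme
```

With this shape, (language-for-file "foo.w") selects wisp, an unknown extension falls back to scheme, and %load-extensions could stay a plain list of strings for backwards compatibility, as the message suggests.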
* Re: [PATCH] add language/wisp to Guile? 2023-02-05 15:08 ` Maxime Devos @ 2023-02-14 8:32 ` Dr. Arne Babenhauserheide 2023-02-14 21:24 ` Dr. Arne Babenhauserheide 0 siblings, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-14 8:32 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1.1: Type: text/plain, Size: 6155 bytes --] Maxime Devos <maximedevos@telenet.be> writes: > On 04-02-2023 22:35, Dr. Arne Babenhauserheide wrote: >> Maxime Devos <maximedevos@telenet.be> writes: >> >>>> This needs an addition to the extensions via guile -x .w — I wrote >>>> that >>>> in the documentation. I didn’t want to do that unconditionally, because >>>> detecting a wisp file as scheme import would cause errors. >>> >>> If done carefully, I don't think this situations would happen. >>> More precisely: >>> >>> * .w would be in the file extensions list. >>> >>> * Instead of a list, it would actually be a map from extensions to >>> languages: >>> >>> .scm -> scheme >>> .w -> wisp >>> >>> With this change, (use-modules (foo)) will load 'foo.scm' as Scheme >>> and 'foo.w' as Wisp. (Assuming that foo.go is out-of-date or >>> doesn't exist.) >>> >>> (For backwards compatibility, I think %load-extensions needs to >>> remain a list of strings, but a %extension-language variable could >>> be defined.) >>> >>> * "guile --language=whatever foo" loads foo as whatever, regardless >>> of the extension of 'foo' (if a specific language is requested, >>> then the user knows best). >>> >>> * "guile foo" without --language will look up the extension of foo in >>> the extension map. If an entry exists, it would use the >>> corresponding language. If no entry exists, it would use >>> a default language (scheme). >> This sounds good, though a bit more complex than I think it should >> be. 
>> I think this should stick to only load Scheme if no language is
>> detected
>> to keep Scheme the default language for Guile
>
> To my knowledge, this is the case in my proposal. Detecting the
> language is done via the file extension, and if no known mapping
> exists, it defaults to Scheme.

I’ve been thinking about this for a while and I expect that this will take quite a bit of discussion, because it is a change that would affect every language already shipped by Guile, and it is a change that needs strategic decisions.

I like the idea but would like to separate out changes to how languages are treated.

>> -- and also to avoid
>> stumbling over files that just take that extension.
>
> While I suppose it is theoretically possible someone will write a .w
> file that contains Scheme code instead of Wisp, I'm not convinced by
> this argument. It sounds very unlikely, and also a 'don't do that,
> then' situation.

I don’t mean Scheme files, but rather some other tool using the extension for another file format.

> Maybe the .go could contain some information on what the corresponding
> source code file name is, and Guile could read the .go without
> first checking for up-to-dateness. (But only reading; not
> loading yet!)

That would be nice, but would require doing changes in a critical core part of Guile. It would change this addition from a risk-free added feature to a risky core change.

> Once read, it should be easy to look up the source code file name from
> the .go and then verify whether the .go is up to date, and proceed
> with actually loading the .go (as in, put stuff in the module system,
> run top-level code, ...).

That’s what I’d like to see, yes.

> IIUC, that would be fully backwards compatible, and not cause any
> non-negligible slowdowns.

It could actually speed up loading, because there are fewer locations for caching bytecode files than locations for source files.
> * when doing (use-module (foo)) and foo.go exists in the > $GUILE_LOAD_COMPILED_PATH, load it, and don't > bother checking whether foo.scm, foo.w or foo.whatever exists > or is up-to-date. (If not done already in Guile.) > > That should solve the 'I don't care what language the library Foo is > written in, and my library Bar isn't written in Wisp so it seems > unreasonable to have to add -x w.’, and would also avoid the need for > a 'extension -> language map' thing. It should also be a little > faster than what we had before. That would also enable shipping pre-compiled software without sourcecode, so there may be strategic reasons to avoid it. Always providing the sourcecode also makes compliance with automatic copyleft licenses automatic. >>>> readable uses >>> >>> This sentence appears to be incomplete; I might have misinterpreted it >>> below (I don't know what you mean with 'readable' -- its an adjective >>> and you are using it as a noun?). >> readable is a noun, yes: the readable lisp project. > > Looks like you meant this: <https://readable.sourceforge.io/>. >>>> Can I write it into a string and then read it back? >>> >>> No. [...] >> This sounds like I cannot go that way, because there’s a necessary >> pre-processing step in wisp-read via (match-charlist-to-repr peeked): >> [...] > This actually needs to be able to write the replacement symbols back >> into the port. >> >>> ..., for which I proposed a replacement, so do you still need to turn >>> it in a string & back? >> Sadly yes. Otherwise the normal reader will play tricks on the code, >> because it does not know where a symbol needs to be interpreted >> differently (i.e. where ` needs to be treated as `() even though that’s >> not in the string). > > OK, too bad. Looks like the REPR-... stuff is to stay for now. OK — thank you for your try to find a better way! > (cond ((list? code) (map wisp-... code)) > ((eq? code '\:) ':) > ;; Look for symbols like \____ and remove the \. > ((symbol? 
code) > (let ((as-string (symbol->string code))) > (if (and (>= (string-length as-string) 2) ; at least a single > underscore > (char=? (string-ref as-string 0) #\\) > (string-every #\_ (substring as-string 1))) > (string->symbol (substring as-string 1)) > code))) > (#true code)) This looks great — thank you! I’m attaching my changes to the original patch and a new squashed patch. [-- Attachment #1.2: 0002-Stylistic-changes-to-language-wisp.patch --] [-- Type: text/x-patch, Size: 16537 bytes --] From 0755a9bcb5f359c39cc5e91ff33e03e8b897b93d Mon Sep 17 00:00:00 2001 From: Arne Babenhauserheide <arne_bab@web.de> Date: Tue, 14 Feb 2023 09:41:24 +0100 Subject: [PATCH 2/3] Stylistic changes to language/wisp * module/language/wisp.scm: fix whitespace and remove commented-out code * module/language/wisp/spec.scm: remove unused variable Acked-by: Arne Babenhauserheide <arne_bab@web.de> --- module/language/wisp.scm | 101 ++++++++++++++-------------------- module/language/wisp/spec.scm | 2 - 2 files changed, 42 insertions(+), 61 deletions(-) diff --git a/module/language/wisp.scm b/module/language/wisp.scm index ba24f54c5..bec91e274 100644 --- a/module/language/wisp.scm +++ b/module/language/wisp.scm @@ -40,8 +40,8 @@ (use-modules (srfi srfi-1) - (srfi srfi-11 ); for let-values - (ice-9 rw ); for write-string/partial + (srfi srfi-11); for let-values + (ice-9 rw); for write-string/partial (ice-9 match)) @@ -106,15 +106,15 @@ repr-unquote) ((equal? chlist (list #\`)) repr-quasiquote) - ((equal? chlist (list #\, #\@ )) + ((equal? chlist (list #\, #\@)) repr-unquote-splicing) - ((equal? chlist (list #\# #\' )) + ((equal? chlist (list #\# #\')) repr-syntax) - ((equal? chlist (list #\# #\, )) + ((equal? chlist (list #\# #\,)) repr-unsyntax) - ((equal? chlist (list #\# #\` )) + ((equal? chlist (list #\# #\`)) repr-quasisyntax) - ((equal? chlist (list #\# #\, #\@ )) + ((equal? 
chlist (list #\# #\, #\@)) repr-unsyntax-splicing) (else #f)))) @@ -124,33 +124,23 @@ (let ((prefix-maxlen 4)) (let longpeek ((peeked '()) - (repr-symbol #f)) + (repr-symbol #f)) (cond - ((or (< prefix-maxlen (length peeked)) (eof-object? (peek-char port)) (equal? #\space (peek-char port)) (equal? #\newline (peek-char port)) ) + ((or (< prefix-maxlen (length peeked)) (eof-object? (peek-char port)) (equal? #\space (peek-char port)) (equal? #\newline (peek-char port))) (if repr-symbol ; found a special symbol, return it. - ; TODO: Somehow store source-properties. The commented-out code below does not work. - ; catch #t - ; lambda () - ; write : source-properties symbol-or-symbols - ; set-source-property! symbol-or-symbols 'filename : port-filename port - ; set-source-property! symbol-or-symbols 'line : 1+ : port-line port - ; set-source-property! symbol-or-symbols 'column : port-column port - ; write : source-properties symbol-or-symbols - ; lambda : key . arguments - ; . #f repr-symbol (let unpeek ((remaining peeked)) (cond - ((equal? '() remaining ) - (read port )); let read to the work + ((equal? '() remaining) + (read port)); let read to the work (else (unread-char (car remaining) port) (unpeek (cdr remaining))))))) (else (let* ((next-char (read-char port)) - (peeked (cons next-char peeked))) + (peeked (cons next-char peeked))) (longpeek peeked (match-charlist-to-repr peeked)))))))) @@ -174,7 +164,7 @@ (= 0 (line-indent line)) (line-empty-code? line))) -(define (line-strip-continuation line ) +(define (line-strip-continuation line) (if (line-continues? line) (append (list @@ -204,13 +194,13 @@ (define (indent-level-difference indentation-levels level) "Find how many indentation levels need to be popped off to find the given level." 
(indent-level-reduction indentation-levels level - (lambda (x ); get the count + (lambda (x); get the count (car x)))) (define (indent-reduce-to-level indentation-levels level) "Find how many indentation levels need to be popped off to find the given level." (indent-level-reduction indentation-levels level - (lambda (x ); get the levels + (lambda (x); get the levels (car (cdr x))))) (define (chunk-ends-with-period currentsymbols next-char) @@ -222,7 +212,7 @@ (define (wisp-scheme-read-chunk-lines port) (let loop - ((indent-and-symbols (list )); '((5 "(foobar)" "\"yobble\"")(3 "#t")) + ((indent-and-symbols (list)); '((5 "(foobar)" "\"yobble\"")(3 "#t")) (inindent #t) (inunderscoreindent (equal? #\_ (peek-char port))) (incomment #f) @@ -230,7 +220,7 @@ (currentsymbols '()) (emptylines 0)) (cond - ((>= emptylines 2 ); the chunk end has to be checked + ((>= emptylines 2); the chunk end has to be checked ; before we look for new chars in the ; port to make execution in the REPL ; after two empty lines work @@ -242,14 +232,14 @@ ((eof-object? next-char) (append indent-and-symbols (list (append (list currentindent) currentsymbols)))) ((and inindent (zero? currentindent) (not incomment) (not (null? indent-and-symbols)) (not inunderscoreindent) (not (or (equal? #\space next-char) (equal? #\newline next-char) (equal? (string-ref ";" 0) next-char)))) - (append indent-and-symbols )); top-level form ends chunk + (append indent-and-symbols)); top-level form ends chunk ((chunk-ends-with-period currentsymbols next-char) ; the line ends with a period. This is forbidden in ; SRFI-119. Use it to end the line in the REPL without ; showing continuation dots (...). (append indent-and-symbols (list (append (list currentindent) (drop-right currentsymbols 1))))) ((and inindent (equal? #\space next-char)) - (read-char port ); remove char + (read-char port); remove char (loop indent-and-symbols #t ; inindent @@ -259,7 +249,7 @@ currentsymbols emptylines)) ((and inunderscoreindent (equal? 
#\_ next-char)) - (read-char port ); remove char + (read-char port); remove char (loop indent-and-symbols #t ; inindent @@ -276,7 +266,7 @@ ((and inunderscoreindent (and (not (equal? #\space next-char)) (not (equal? #\newline next-char)))) (throw 'wisp-syntax-error "initial underscores without following whitespace at beginning of the line after" (last indent-and-symbols))) ((equal? #\newline next-char) - (read-char port ); remove the newline + (read-char port); remove the newline ; The following two lines would break the REPL by requiring one char too many. ; if : and (equal? #\newline next-char) : equal? #\return : peek-char port ; read-char port ; remove a full \n\r. Damn special cases... @@ -285,14 +275,13 @@ ; indent. Lines with a comment at zero indent ; get indent -1 for the same reason - meaning ; not actually empty. - ( - (indent + ((indent (cond (incomment - (if (= 0 currentindent ); specialcase + (if (= 0 currentindent); specialcase -1 - currentindent )) - ((not (null? currentsymbols )); pure whitespace + currentindent)) + ((not (null? currentsymbols)); pure whitespace currentindent) (else 0))) @@ -313,13 +302,13 @@ #t ; inindent (if (<= 2 emptylines) #f ; chunk ends here - (equal? #\_ (peek-char port ))); are we in underscore indent? + (equal? #\_ (peek-char port))); are we in underscore indent? #f ; incomment 0 '() emptylines))) ((equal? #t incomment) - (read-char port ); remove one comment character + (read-char port); remove one comment character (loop indent-and-symbols #f ; inindent @@ -328,8 +317,8 @@ currentindent currentsymbols emptylines)) - ((or (equal? #\space next-char) (equal? #\tab next-char) (equal? #\return next-char) ); remove whitespace when not in indent - (read-char port ); remove char + ((or (equal? #\space next-char) (equal? #\tab next-char) (equal? 
#\return next-char)); remove whitespace when not in indent + (read-char port); remove char (loop indent-and-symbols #f ; inindent @@ -406,10 +395,7 @@ (define (line-finalize line) "Process all wisp-specific information in a line and strip it" - (let - ( - (l - (line-code-replace-inline-colons + (let ((l (line-code-replace-inline-colons (line-strip-indentation-marker (line-strip-lone-colon (line-strip-continuation line)))))) @@ -459,7 +445,7 @@ (and (not (null? lines)) (not (line-empty-code? (car lines))) - (not (= 0 (line-real-indent (car lines ))))); -1 is a line with a comment + (not (= 0 (line-real-indent (car lines))))); -1 is a line with a comment (if (= 1 (line-real-indent (car lines))) ;; accept a single space as indentation of the first line (and ignore the indentation) to support meta commands (set! lines @@ -474,15 +460,14 @@ (unprocessed lines) (indentation-levels '(0))) (let* - ( - (current-line + ((current-line (if (<= 1 (length unprocessed)) (car unprocessed) - (list 0 ))); empty code + (list 0))); empty code (next-line (if (<= 2 (length unprocessed)) (car (cdr unprocessed)) - (list 0 ))); empty code + (list 0))); empty code (current-indentation (car indentation-levels)) (current-line-indentation (line-real-indent current-line))) @@ -517,7 +502,7 @@ processed (cdr unprocessed) indentation-levels)) - ((and (line-empty-code? next-line) (<= 2 (length unprocessed ))) + ((and (line-empty-code? next-line) (<= 2 (length unprocessed))) ; display "next-line empty\n" ; TODO: Somehow preserve the line-numbers. ; take out the next-line from unprocessed. @@ -555,18 +540,17 @@ (if (line-continues? 
current-line) line (wisp-add-source-properties-from line (list line)))) - (cdr unprocessed ); recursion here + (cdr unprocessed); recursion here indentation-levels)) ((< current-line-indentation next-line-indentation) ; display "current-line-indent < next-line-indent\n" ; format #t "line: ~A\n" line ; side-recursion via a sublist (let-values - ( - ((sub-processed sub-unprocessed) + (((sub-processed sub-unprocessed) (loop line - (cdr unprocessed ); recursion here + (cdr unprocessed); recursion here indentation-levels))) ; format #t "side-recursion:\n sub-processed: ~A\n processed: ~A\n\n" sub-processed processed (loop @@ -660,7 +644,7 @@ (match code (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) (list 'quote (map wisp-replace-paren-quotation-repr a))) - ((a ... 'REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ); this is the quoted empty list + ((a ... 'REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b); this is the quoted empty list (append (map wisp-replace-paren-quotation-repr a) (list (list 'quote (map wisp-replace-paren-quotation-repr b))))) @@ -668,13 +652,13 @@ (list 'quasiquote (list 'unquote (map wisp-replace-paren-quotation-repr a)))) (('REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) (list 'unquote (map wisp-replace-paren-quotation-repr a))) - ((a ... 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ) + ((a ... 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b) (append (map wisp-replace-paren-quotation-repr a) (list (list 'unquote (map wisp-replace-paren-quotation-repr b))))) (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) (list 'quasiquote (map wisp-replace-paren-quotation-repr a))) - ((a ... 'REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ); this is the quoted empty list + ((a ... 
'REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b); this is the quoted empty list (append (map wisp-replace-paren-quotation-repr a) (list (list 'quasiquote (map wisp-replace-paren-quotation-repr b))))) @@ -713,8 +697,7 @@ code to recreate the improper lists. Match is awesome!" (let - ( - (improper + ((improper (match code ((a ... b 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd c) (append (map wisp-make-improper a) diff --git a/module/language/wisp/spec.scm b/module/language/wisp/spec.scm index d5ea7abce..155292d9a 100644 --- a/module/language/wisp/spec.scm +++ b/module/language/wisp/spec.scm @@ -63,8 +63,6 @@ ;;; Language definition ;;; -(define wisp-pending-sexps (list)) - (define (read-one-wisp-sexp port env) ;; allow using "# foo" as #(foo). (read-hash-extend #\# (λ (chr port) #\#)) -- 2.39.1 [-- Warning: decoded text below may be mangled, UTF-8 assumed --] [-- Attachment #1.3: 0003-Remove-limit-of-_-prefix-in-language-wisp-thanks-to-.patch --] [-- Type: text/x-patch, Size: 2679 bytes --] From d8e688c567e70831d2d916154eb5fa245651a4ca Mon Sep 17 00:00:00 2001 From: Arne Babenhauserheide <arne_bab@web.de> Date: Tue, 14 Feb 2023 09:49:38 +0100 Subject: [PATCH 3/3] =?UTF-8?q?Remove=20limit=20of=20=5F-prefix=20in=20lan?= =?UTF-8?q?guage/wisp=20=E2=80=94=20thanks=20to=20Maxime=20Devos!?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit * module/language/wisp.scm: replace match by general cond Acked-by: Arne Babenhauserheide <arne_bab@web.de> --- module/language/wisp.scm | 45 +++++++++++----------------------------- 1 file changed, 12 insertions(+), 33 deletions(-) diff --git a/module/language/wisp.scm b/module/language/wisp.scm index bec91e274..1ce3ced1c 100644 --- a/module/language/wisp.scm +++ b/module/language/wisp.scm @@ -24,7 +24,6 @@ ;; preprocessed file. 
;; Limitations: -;; - only unescapes up to 12 leading underscores at line start (\____________) ;; - in some cases the source line information is missing in backtraces. ;; check for set-source-property! @@ -595,38 +594,18 @@ (cdr unprocessed))))) (define (wisp-unescape-underscore-and-colon code) - "replace \\_ and \\: by _ and :" - (match code - ((a ...) - (map wisp-unescape-underscore-and-colon a)) - ('\_ - '_) - ('\__ - '__) - ('\___ - '___) - ('\____ - '____) - ('\_____ - '_____) - ('\______ - '______) - ('\_______ - '_______) - ('\________ - '________) - ('\_________ - '_________) - ('\__________ - '__________) - ('\___________ - '___________) - ('\____________ - '____________) - ('\: - ':) - (a - a))) + "replace \\_ and \\: by _ and :" + (cond ((list? code) (map wisp-unescape-underscore-and-colon code)) + ((eq? code '\:) ':) + ;; Look for symbols like \____ and remove the \. + ((symbol? code) + (let ((as-string (symbol->string code))) + (if (and (>= (string-length as-string) 2) ; at least a single underscore + (char=? (string-ref as-string 0) #\\) + (string-every #\_ (substring as-string 1))) + (string->symbol (substring as-string 1)) + code))) + (#t code))) (define (wisp-replace-empty-eof code) -- 2.39.1 [-- Attachment #1.4: 0001-Add-language-wisp-wisp-tests-and-srfi-119-documentat-squashed.patch --] [-- Type: text/x-patch, Size: 46919 bytes --] From bb38072818ff754ba67533380982e993ffc0ff52 Mon Sep 17 00:00:00 2001 From: Arne Babenhauserheide <arne_bab@web.de> Date: Fri, 3 Feb 2023 22:20:04 +0100 Subject: [PATCH] Add language/wisp, wisp tests, and srfi-119 documentation * doc/ref/srfi-modules.texi (srfi-119): add node * module/language/wisp.scm: New file. * module/language/wisp/spec.scm: New file. * test-suite/tests/srfi-119.test: New file. 
--- doc/ref/srfi-modules.texi | 30 ++ module/language/wisp.scm | 758 +++++++++++++++++++++++++++++++++ module/language/wisp/spec.scm | 105 +++++ test-suite/tests/srfi-119.test | 81 ++++ 4 files changed, 974 insertions(+) create mode 100644 module/language/wisp.scm create mode 100644 module/language/wisp/spec.scm create mode 100644 test-suite/tests/srfi-119.test diff --git a/doc/ref/srfi-modules.texi b/doc/ref/srfi-modules.texi index 0ef136215..759e293ec 100644 --- a/doc/ref/srfi-modules.texi +++ b/doc/ref/srfi-modules.texi @@ -64,6 +64,7 @@ get the relevant SRFI documents from the SRFI home page * SRFI-98:: Accessing environment variables. * SRFI-105:: Curly-infix expressions. * SRFI-111:: Boxes. +* SRFI-119:: Wisp: simpler indentation-sensitive scheme. * SRFI-171:: Transducers @end menu @@ -5662,6 +5663,34 @@ Return the current contents of @var{box}. Set the contents of @var{box} to @var{value}. @end deffn +@node SRFI-119 +@subsection SRFI-119 Wisp: simpler indentation-sensitive scheme. +@cindex SRFI-119 +@cindex wisp + +The languages shipped in Guile include SRFI-119 (wisp), an encoding of +Scheme that allows replacing parentheses with equivalent indentation and +inline colons. See +@uref{http://srfi.schemers.org/srfi-119/srfi-119.html, the specification +of SRFI-119}. Some examples: + +@example +display "Hello World!" @result{} (display "Hello World!") +@end example + +@example +define : factorial n @result{} (define (factorial n) + if : zero? n @result{} (if (zero? n) + . 1 @result{} 1 + * n : factorial @{n - 1@} @result{} (* n (factorial @{n - 1@})))) +@end example + +To execute a file with wisp code, select the language and filename +extension @code{.w} via @code{guile --language=wisp -x .w}. + +In files using Wisp, @ref{SRFI-105} (Curly Infix) is always activated. + + @node SRFI-171 @subsection Transducers @cindex SRFI-171 @@ -5705,6 +5734,7 @@ left-to-right, due to how transducers are initiated.
* SRFI-171 Helpers:: Utilities for writing your own transducers @end menu + @node SRFI-171 General Discussion @subsubsection SRFI-171 General Discussion @cindex transducers discussion diff --git a/module/language/wisp.scm b/module/language/wisp.scm new file mode 100644 index 000000000..1ce3ced1c --- /dev/null +++ b/module/language/wisp.scm @@ -0,0 +1,758 @@ +;;; Wisp + +;; Copyright (C) 2013, 2017, 2018, 2020 Free Software Foundation, Inc. +;; Copyright (C) 2014--2023 Arne Babenhauserheide. + +;;;; This library is free software; you can redistribute it and/or +;;;; modify it under the terms of the GNU Lesser General Public +;;;; License as published by the Free Software Foundation; either +;;;; version 3 of the License, or (at your option) any later version. +;;;; +;;;; This library is distributed in the hope that it will be useful, +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;;;; Lesser General Public License for more details. +;;;; +;;;; You should have received a copy of the GNU Lesser General Public +;;;; License along with this library; if not, write to the Free Software +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +;;; Commentary: + +;; Scheme-only implementation of a wisp-preprocessor which outputs a +;; scheme code tree to feed to a scheme interpreter instead of a +;; preprocessed file. + +;; Limitations: +;; - in some cases the source line information is missing in backtraces. +;; check for set-source-property!
+ +;;; Code: + +(define-module (language wisp) + #:export (wisp-scheme-read-chunk wisp-scheme-read-all + wisp-scheme-read-file-chunk wisp-scheme-read-file + wisp-scheme-read-string)) + +; use curly-infix by default +(read-enable 'curly-infix) + +(use-modules + (srfi srfi-1) + (srfi srfi-11); for let-values + (ice-9 rw); for write-string/partial + (ice-9 match)) + + +;; Helper functions for the indent-and-symbols data structure: '((indent token token ...) ...) +(define (line-indent line) + (car line)) + +(define (line-real-indent line) + "Get the indentation without the comment-marker for unindented lines (-1 is treated as 0)." + (let (( indent (line-indent line))) + (if (= -1 indent) + 0 + indent))) + +(define (line-code line) + (let ((code (cdr line))) + ; propagate source properties + (when (not (null? code)) + (set-source-properties! code (source-properties line))) + code)) + +; literal values I need +(define readcolon + (string->symbol ":")) + +(define wisp-uuid "e749c73d-c826-47e2-a798-c16c13cb89dd") +; define an intermediate dot replacement with UUID to avoid clashes. +(define repr-dot ; . 
+ (string->symbol (string-append "REPR-DOT-" wisp-uuid))) + +; allow using reader additions as the first element on a line to prefix the list +(define repr-quote ; ' + (string->symbol (string-append "REPR-QUOTE-" wisp-uuid))) +(define repr-unquote ; , + (string->symbol (string-append "REPR-UNQUOTE-" wisp-uuid))) +(define repr-quasiquote ; ` + (string->symbol (string-append "REPR-QUASIQUOTE-" wisp-uuid))) +(define repr-unquote-splicing ; ,@ + (string->symbol (string-append "REPR-UNQUOTESPLICING-" wisp-uuid))) + +(define repr-syntax ; #' + (string->symbol (string-append "REPR-SYNTAX-" wisp-uuid))) +(define repr-unsyntax ; #, + (string->symbol (string-append "REPR-UNSYNTAX-" wisp-uuid))) +(define repr-quasisyntax ; #` + (string->symbol (string-append "REPR-QUASISYNTAX-" wisp-uuid))) +(define repr-unsyntax-splicing ; #,@ + (string->symbol (string-append "REPR-UNSYNTAXSPLICING-" wisp-uuid))) + +; TODO: wrap the reader to return the repr of the syntax reader +; additions + +(define (match-charlist-to-repr charlist) + (let + ((chlist (reverse charlist))) + (cond + ((equal? chlist (list #\.)) + repr-dot) + ((equal? chlist (list #\')) + repr-quote) + ((equal? chlist (list #\,)) + repr-unquote) + ((equal? chlist (list #\`)) + repr-quasiquote) + ((equal? chlist (list #\, #\@)) + repr-unquote-splicing) + ((equal? chlist (list #\# #\')) + repr-syntax) + ((equal? chlist (list #\# #\,)) + repr-unsyntax) + ((equal? chlist (list #\# #\`)) + repr-quasisyntax) + ((equal? chlist (list #\# #\, #\@)) + repr-unsyntax-splicing) + (else + #f)))) + +(define (wisp-read port) + "wrap read to catch list prefixes." + (let ((prefix-maxlen 4)) + (let longpeek + ((peeked '()) + (repr-symbol #f)) + (cond + ((or (< prefix-maxlen (length peeked)) (eof-object? (peek-char port)) (equal? #\space (peek-char port)) (equal? #\newline (peek-char port))) + (if repr-symbol ; found a special symbol, return it. + repr-symbol + (let unpeek + ((remaining peeked)) + (cond + ((equal? 
'() remaining) + (read port)); let read do the work + (else + (unread-char (car remaining) port) + (unpeek (cdr remaining))))))) + (else + (let* + ((next-char (read-char port)) + (peeked (cons next-char peeked))) + (longpeek + peeked + (match-charlist-to-repr peeked)))))))) + + + +(define (line-continues? line) + (equal? repr-dot (car (line-code line)))) + +(define (line-only-colon? line) + (and + (equal? ":" (car (line-code line))) + (null? (cdr (line-code line))))) + +(define (line-empty-code? line) + (null? (line-code line))) + +(define (line-empty? line) + (and + ; if indent is -1, we stripped a comment, so the line was not really empty. + (= 0 (line-indent line)) + (line-empty-code? line))) + +(define (line-strip-continuation line) + (if (line-continues? line) + (append + (list + (line-indent line)) + (cdr (line-code line))) + line)) + +(define (line-strip-indentation-marker line) + "Strip the indentation markers from the beginning of the line" + (cdr line)) + +(define (indent-level-reduction indentation-levels level select-fun) + "Reduce the INDENTATION-LEVELS to the given LEVEL and return the value selected by SELECT-FUN" + (let loop + ((newlevels indentation-levels) + (diff 0)) + (cond + ((= level (car newlevels)) + (select-fun (list diff indentation-levels))) + ((< level (car newlevels)) + (loop + (cdr newlevels) + (1+ diff))) + (else + (throw 'wisp-syntax-error (format #f "Level ~A not found in the indentation-levels ~A." level indentation-levels)))))) + +(define (indent-level-difference indentation-levels level) + "Find how many indentation levels need to be popped off to find the given level." + (indent-level-reduction indentation-levels level + (lambda (x); get the count + (car x)))) + +(define (indent-reduce-to-level indentation-levels level) + "Find how many indentation levels need to be popped off to find the given level."
+ (indent-level-reduction indentation-levels level + (lambda (x); get the levels + (car (cdr x))))) + +(define (chunk-ends-with-period currentsymbols next-char) + "Check whether indent-and-symbols ends with a period, indicating the end of a chunk." + (and (not (null? currentsymbols)) + (equal? #\newline next-char) + (equal? repr-dot + (list-ref currentsymbols (- (length currentsymbols) 1))))) + +(define (wisp-scheme-read-chunk-lines port) + (let loop + ((indent-and-symbols (list)); '((5 "(foobar)" "\"yobble\"")(3 "#t")) + (inindent #t) + (inunderscoreindent (equal? #\_ (peek-char port))) + (incomment #f) + (currentindent 0) + (currentsymbols '()) + (emptylines 0)) + (cond + ((>= emptylines 2); the chunk end has to be checked + ; before we look for new chars in the + ; port to make execution in the REPL + ; after two empty lines work + ; (otherwise it shows one more line). + indent-and-symbols) + (else + (let ((next-char (peek-char port))) + (cond + ((eof-object? next-char) + (append indent-and-symbols (list (append (list currentindent) currentsymbols)))) + ((and inindent (zero? currentindent) (not incomment) (not (null? indent-and-symbols)) (not inunderscoreindent) (not (or (equal? #\space next-char) (equal? #\newline next-char) (equal? (string-ref ";" 0) next-char)))) + (append indent-and-symbols)); top-level form ends chunk + ((chunk-ends-with-period currentsymbols next-char) + ; the line ends with a period. This is forbidden in + ; SRFI-119. Use it to end the line in the REPL without + ; showing continuation dots (...). + (append indent-and-symbols (list (append (list currentindent) (drop-right currentsymbols 1))))) + ((and inindent (equal? #\space next-char)) + (read-char port); remove char + (loop + indent-and-symbols + #t ; inindent + #f ; inunderscoreindent + #f ; incomment + (1+ currentindent) + currentsymbols + emptylines)) + ((and inunderscoreindent (equal? 
#\_ next-char)) + (read-char port); remove char + (loop + indent-and-symbols + #t ; inindent + #t ; inunderscoreindent + #f ; incomment + (1+ currentindent) + currentsymbols + emptylines)) + ; any char but whitespace *after* underscoreindent is + ; an error. This is stricter than the current wisp + ; syntax definition. TODO: Fix the definition. Better + ; start too strict. FIXME: breaks on lines with only + ; underscores which should be empty lines. + ((and inunderscoreindent (and (not (equal? #\space next-char)) (not (equal? #\newline next-char)))) + (throw 'wisp-syntax-error "initial underscores without following whitespace at beginning of the line after" (last indent-and-symbols))) + ((equal? #\newline next-char) + (read-char port); remove the newline + ; The following two lines would break the REPL by requiring one char too many. + ; if : and (equal? #\newline next-char) : equal? #\return : peek-char port + ; read-char port ; remove a full \n\r. Damn special cases... + (let* ; distinguish pure whitespace lines and lines + ; with comment by giving the former zero + ; indent. Lines with a comment at zero indent + ; get indent -1 for the same reason - meaning + ; not actually empty. + ((indent + (cond + (incomment + (if (= 0 currentindent); specialcase + -1 + currentindent)) + ((not (null? currentsymbols)); pure whitespace + currentindent) + (else + 0))) + (parsedline (append (list indent) currentsymbols)) + (emptylines + (if (not (line-empty? parsedline)) + 0 + (1+ emptylines)))) + (when (not (= 0 (length parsedline))) + ; set the source properties to parsedline so we can try to add them later. + (set-source-property! parsedline 'filename (port-filename port)) + (set-source-property! parsedline 'line (port-line port))) + ; TODO: If the line is empty. Either do it here and do not add it, just + ; increment the empty line counter, or strip it later. Replace indent + ; -1 by indent 0 afterwards. 
+ (loop + (append indent-and-symbols (list parsedline)) + #t ; inindent + (if (<= 2 emptylines) + #f ; chunk ends here + (equal? #\_ (peek-char port))); are we in underscore indent? + #f ; incomment + 0 + '() + emptylines))) + ((equal? #t incomment) + (read-char port); remove one comment character + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #t ; incomment + currentindent + currentsymbols + emptylines)) + ((or (equal? #\space next-char) (equal? #\tab next-char) (equal? #\return next-char)); remove whitespace when not in indent + (read-char port); remove char + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #f ; incomment + currentindent + currentsymbols + emptylines)) + ; | cludge to appease the former wisp parser + ; | used for bootstrapping which has a + ; v problem with the literal comment char + ((equal? (string-ref ";" 0) next-char) + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #t ; incomment + currentindent + currentsymbols + emptylines)) + (else ; use the reader + (loop + indent-and-symbols + #f ; inindent + #f ; inunderscoreindent + #f ; incomment + currentindent + ; this also takes care of the hashbang and leading comments. + (append currentsymbols (list (wisp-read port))) + emptylines)))))))) + + +(define (line-code-replace-inline-colons line) + "Replace inline colons by opening parens which close at the end of the line" + ; format #t "replace inline colons for line ~A\n" line + (let loop + ((processed '()) + (unprocessed line)) + (cond + ((null? unprocessed) + ; format #t "inline-colons processed line: ~A\n" processed + processed) + ; replace : . with nothing + ((and (<= 2 (length unprocessed)) (equal? readcolon (car unprocessed)) (equal? repr-dot (car (cdr unprocessed)))) + (loop + (append processed + (loop '() (cdr (cdr unprocessed)))) + '())) + ((equal? readcolon (car unprocessed)) + (loop + ; FIXME: This should turn unprocessed into a list. 
+ (append processed + (list (loop '() (cdr unprocessed)))) + '())) + (else + (loop + (append processed + (list (car unprocessed))) + (cdr unprocessed)))))) + +(define (line-replace-inline-colons line) + (cons + (line-indent line) + (line-code-replace-inline-colons (line-code line)))) + +(define (line-strip-lone-colon line) + "A line consisting only of a colon is just a marked indentation level. We need to kill the colon before replacing inline colons." + (if + (equal? + (line-code line) + (list readcolon)) + (list (line-indent line)) + line)) + +(define (line-finalize line) + "Process all wisp-specific information in a line and strip it" + (let ((l (line-code-replace-inline-colons + (line-strip-indentation-marker + (line-strip-lone-colon + (line-strip-continuation line)))))) + (when (not (null? (source-properties line))) + (catch #t + (lambda () + (set-source-properties! l (source-properties line))) + (lambda (key . arguments) + #f))) + l)) + +(define (wisp-add-source-properties-from source target) + "Copy the source properties from source into the target and return the target." + (catch #t + (lambda () + (set-source-properties! target (source-properties source))) + (lambda (key . arguments) + #f)) + target) + +(define (wisp-propagate-source-properties code) + "Propagate the source properties from the surrounding list into every part of the code." + (let loop + ((processed '()) + (unprocessed code)) + (cond + ((and (null? processed) (not (pair? unprocessed)) (not (list? unprocessed))) + unprocessed) + ((and (pair? unprocessed) (not (list? unprocessed))) + (cons + (wisp-propagate-source-properties (car unprocessed)) + (wisp-propagate-source-properties (cdr unprocessed)))) + ((null? unprocessed) + processed) + (else + (let ((line (car unprocessed))) + (if (null?
(source-properties unprocessed)) + (wisp-add-source-properties-from line unprocessed) + (wisp-add-source-properties-from unprocessed line)) + (loop + (append processed (list (wisp-propagate-source-properties line))) + (cdr unprocessed))))))) + +(define* (wisp-scheme-indentation-to-parens lines) + "Add parentheses to lines and remove the indentation markers" + (when + (and + (not (null? lines)) + (not (line-empty-code? (car lines))) + (not (= 0 (line-real-indent (car lines))))); -1 is a line with a comment + (if (= 1 (line-real-indent (car lines))) + ;; accept a single space as indentation of the first line (and ignore the indentation) to support meta commands + (set! lines + (cons + (cons 0 (cdr (car lines))) + (cdr lines))) + (throw 'wisp-syntax-error + (format #f "The first symbol in a chunk must start at zero indentation. Indentation and line: ~A" + (car lines))))) + (let loop + ((processed '()) + (unprocessed lines) + (indentation-levels '(0))) + (let* + ((current-line + (if (<= 1 (length unprocessed)) + (car unprocessed) + (list 0))); empty code + (next-line + (if (<= 2 (length unprocessed)) + (car (cdr unprocessed)) + (list 0))); empty code + (current-indentation + (car indentation-levels)) + (current-line-indentation (line-real-indent current-line))) + ; format #t "processed: ~A\ncurrent-line: ~A\nnext-line: ~A\nunprocessed: ~A\nindentation-levels: ~A\ncurrent-indentation: ~A\n\n" + ; . processed current-line next-line unprocessed indentation-levels current-indentation + (cond + ; the real end: this is reported to the outside world. + ((and (null? unprocessed) (not (null? indentation-levels)) (null? (cdr indentation-levels))) + ; display "done\n" + ; reverse the processed lines, because I use cons. + processed) + ; the recursion end-condition + ((and (null? unprocessed)) + ; display "last step\n" + ; this is the last step. Nothing more to do except + ; for rolling up the indentation levels. 
return the + ; new processed and unprocessed lists: this is a + ; side-recursion + (values processed unprocessed)) + ((null? indentation-levels) + ; display "indentation-levels null\n" + (throw 'wisp-programming-error "The indentation-levels are null but the current-line is null: Something killed the indentation-levels.")) + (else ; now we come to the line-comparisons and indentation-counting. + (cond + ((line-empty-code? current-line) + ; display "current-line empty\n" + ; We cannot process indentation without + ; code. Just switch to the next line. This should + ; only happen at the start of the recursion. + ; TODO: Somehow preserve the line-numbers. + (loop + processed + (cdr unprocessed) + indentation-levels)) + ((and (line-empty-code? next-line) (<= 2 (length unprocessed))) + ; display "next-line empty\n" + ; TODO: Somehow preserve the line-numbers. + ; take out the next-line from unprocessed. + (loop + processed + (cons current-line + (cdr (cdr unprocessed))) + indentation-levels)) + ((> current-indentation current-line-indentation) + ; display "current-indent > next-line\n" + ; this just steps back one level via the side-recursion. + (let ((previous-indentation (car (cdr indentation-levels)))) + (if (<= current-line-indentation previous-indentation) + (values processed unprocessed) + (begin ;; not yet used level! TODO: maybe throw an error here instead of a warning. + (let ((linenumber (- (length lines) (length unprocessed)))) + (format (current-error-port) ";;; WARNING:~A: used lower but undefined indentation level (line ~A of the current chunk: ~S). 
This makes refactoring much more error-prone, therefore it might become an error in a later version of Wisp.\n" (source-property current-line 'line) linenumber (cdr current-line))) + (loop + processed + unprocessed + (cons ; recursion via the indentation-levels + current-line-indentation + (cdr indentation-levels))))))) + ((= current-indentation current-line-indentation) + ; display "current-indent = next-line\n" + (let + ((line (line-finalize current-line)) + (next-line-indentation (line-real-indent next-line))) + (cond + ((>= current-line-indentation next-line-indentation) + ; simple recursive step to the next line + ; display "current-line-indent >= next-line-indent\n" + (loop + (append processed + (if (line-continues? current-line) + line + (wisp-add-source-properties-from line (list line)))) + (cdr unprocessed); recursion here + indentation-levels)) + ((< current-line-indentation next-line-indentation) + ; display "current-line-indent < next-line-indent\n" + ; format #t "line: ~A\n" line + ; side-recursion via a sublist + (let-values + (((sub-processed sub-unprocessed) + (loop + line + (cdr unprocessed); recursion here + indentation-levels))) + ; format #t "side-recursion:\n sub-processed: ~A\n processed: ~A\n\n" sub-processed processed + (loop + (append processed (list sub-processed)) + sub-unprocessed ; simply use the recursion from the sub-recursion + indentation-levels)))))) + ((< current-indentation current-line-indentation) + ; display "current-indent < next-line\n" + (loop + processed + unprocessed + (cons ; recursion via the indentation-levels + current-line-indentation + indentation-levels))) + (else + (throw 'wisp-not-implemented + (format #f "Need to implement further line comparison: current: ~A, next: ~A, processed: ~A."
+ current-line next-line processed))))))))) + + +(define (wisp-scheme-replace-inline-colons lines) + "Replace inline colons by opening parens which close at the end of the line" + (let loop + ((processed '()) + (unprocessed lines)) + (if (null? unprocessed) + processed + (loop + (append processed (list (line-replace-inline-colons (car unprocessed)))) + (cdr unprocessed))))) + + +(define (wisp-scheme-strip-indentation-markers lines) + "Strip the indentation markers from the beginning of the lines" + (let loop + ((processed '()) + (unprocessed lines)) + (if (null? unprocessed) + processed + (loop + (append processed (cdr (car unprocessed))) + (cdr unprocessed))))) + +(define (wisp-unescape-underscore-and-colon code) + "replace \\_ and \\: by _ and :" + (cond ((list? code) (map wisp-unescape-underscore-and-colon code)) + ((eq? code '\:) ':) + ;; Look for symbols like \____ and remove the \. + ((symbol? code) + (let ((as-string (symbol->string code))) + (if (and (>= (string-length as-string) 2) ; at least a single underscore + (char=? (string-ref as-string 0) #\\) + (string-every #\_ (substring as-string 1))) + (string->symbol (substring as-string 1)) + code))) + (#t code))) + + +(define (wisp-replace-empty-eof code) + "replace ((#<eof>)) by ()" + ; FIXME: Actually this is a hack which fixes a bug when the + ; parser hits files with only hashbang and comments. + (if (and (not (null? code)) (pair? (car code)) (eof-object? (car (car code))) (null? (cdr code)) (null? (cdr (car code)))) + (list) + code)) + + +(define (wisp-replace-paren-quotation-repr code) + "Replace lists starting with a quotation symbol by + quoted lists." + (match code + (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quote (map wisp-replace-paren-quotation-repr a))) + ((a ... 
'REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b); this is the quoted empty list + (append + (map wisp-replace-paren-quotation-repr a) + (list (list 'quote (map wisp-replace-paren-quotation-repr b))))) + (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quasiquote (list 'unquote (map wisp-replace-paren-quotation-repr a)))) + (('REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unquote (map wisp-replace-paren-quotation-repr a))) + ((a ... 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b) + (append + (map wisp-replace-paren-quotation-repr a) + (list (list 'unquote (map wisp-replace-paren-quotation-repr b))))) + (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quasiquote (map wisp-replace-paren-quotation-repr a))) + ((a ... 'REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b); this is the quoted empty list + (append + (map wisp-replace-paren-quotation-repr a) + (list (list 'quasiquote (map wisp-replace-paren-quotation-repr b))))) + (('REPR-UNQUOTESPLICING-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unquote-splicing (map wisp-replace-paren-quotation-repr a))) + (('REPR-SYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'syntax (map wisp-replace-paren-quotation-repr a))) + (('REPR-UNSYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unsyntax (map wisp-replace-paren-quotation-repr a))) + (('REPR-QUASISYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'quasisyntax (map wisp-replace-paren-quotation-repr a))) + (('REPR-UNSYNTAXSPLICING-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) + (list 'unsyntax-splicing (map wisp-replace-paren-quotation-repr a))) + ;; literal array as start of a line: # (a b) c -> (#(a b) c) + ((#\# a ...) 
+ (with-input-from-string ;; hack to defer to read + (string-append "#" + (with-output-to-string + (λ () + (write (map wisp-replace-paren-quotation-repr a) + (current-output-port))))) + read)) + ((a ...) + (map wisp-replace-paren-quotation-repr a)) + (a + a))) + +(define (wisp-make-improper code) + "Turn (a #{.}# b) into the correct (a . b). + +read called on a single dot creates a variable named #{.}# (|.| +in r7rs). Due to parsing the indentation before the list +structure is known, the reader cannot create improper lists +when it reads a dot. So we have to take another pass over the +code to recreate the improper lists. + +Match is awesome!" + (let + ((improper + (match code + ((a ... b 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd c) + (append (map wisp-make-improper a) + (cons (wisp-make-improper b) (wisp-make-improper c)))) + ((a ...) + (map wisp-make-improper a)) + (a + a)))) + (define (syntax-error li msg) + (throw 'wisp-syntax-error (format #f "incorrect dot-syntax #{.}# in code: ~A: ~A" msg li))) + (if #t + improper + (let check + ((tocheck improper)) + (match tocheck + ; lists with only one member + (('REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd) + (syntax-error tocheck "list with the period as only member")) + ; list with remaining dot. + ((a ...) + (if (and (member repr-dot a)) + (syntax-error tocheck "leftover period in list") + (map check a))) + ; simple pair - this and the next do not work when parsed from wisp-scheme itself. Why? + (('REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd . c) + (syntax-error tocheck "dot as first element in already improper pair")) + ; simple pair, other way round + ((a . 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd) + (syntax-error tocheck "dot as last element in already improper pair")) + ; more complex pairs + ((? pair? a) + (let + ((head (drop-right a 1)) + (tail (last-pair a))) + (cond + ((equal? repr-dot (car tail)) + (syntax-error tocheck "equal? repr-dot : car tail")) + ((equal? 
repr-dot (cdr tail)) + (syntax-error tocheck "equal? repr-dot : cdr tail")) + ((member repr-dot head) + (syntax-error tocheck "member repr-dot head")) + (else + a)))) + (a + a)))))) + +(define (wisp-scheme-read-chunk port) + "Read and parse one chunk of wisp-code" + (let (( lines (wisp-scheme-read-chunk-lines port))) + (wisp-make-improper + (wisp-replace-empty-eof + (wisp-unescape-underscore-and-colon + (wisp-replace-paren-quotation-repr + (wisp-propagate-source-properties + (wisp-scheme-indentation-to-parens lines)))))))) + +(define (wisp-scheme-read-all port) + "Read all chunks from the given port" + (let loop + ((tokens '())) + (cond + ((eof-object? (peek-char port)) + tokens) + (else + (loop + (append tokens (wisp-scheme-read-chunk port))))))) + +(define (wisp-scheme-read-file path) + (call-with-input-file path wisp-scheme-read-all)) + +(define (wisp-scheme-read-file-chunk path) + (call-with-input-file path wisp-scheme-read-chunk)) + +(define (wisp-scheme-read-string str) + (call-with-input-string str wisp-scheme-read-all)) + +(define (wisp-scheme-read-string-chunk str) + (call-with-input-string str wisp-scheme-read-chunk)) + diff --git a/module/language/wisp/spec.scm b/module/language/wisp/spec.scm new file mode 100644 index 000000000..155292d9a --- /dev/null +++ b/module/language/wisp/spec.scm @@ -0,0 +1,105 @@ +;; Language interface for Wisp in Guile + +;;; adapted from guile-sweet: https://gitorious.org/nacre/guile-sweet/source/ae306867e371cb4b56e00bb60a50d9a0b8353109:sweet/common.scm + +;;; Copyright (C) 2005-2014 by David A. Wheeler and Alan Manuel K. Gloria +;;; Copyright (C) Arne Babenhauserheide (2014--2023). 
+ +;;; Permission is hereby granted, free of charge, to any person +;;; obtaining a copy of this software and associated documentation +;;; files (the "Software"), to deal in the Software without +;;; restriction, including without limitation the rights to use, copy, +;;; modify, merge, publish, distribute, sublicense, and/or sell copies +;;; of the Software, and to permit persons to whom the Software is +;;; furnished to do so, subject to the following conditions: +;;; +;;; The above copyright notice and this permission notice shall be +;;; included in all copies or substantial portions of the Software. +;;; +;;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, +;;; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF +;;; MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND +;;; NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS +;;; BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN +;;; ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN +;;; CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE +;;; SOFTWARE. + +; adapted from spec.scm: https://gitorious.org/nacre/guile-sweet/source/ae306867e371cb4b56e00bb60a50d9a0b8353109:sweet/spec.scm +(define-module (language wisp spec) + #:use-module (language wisp) + #:use-module (system base compile) + #:use-module (system base language) + #:use-module (language scheme compile-tree-il) + #:use-module (language scheme decompile-tree-il) + #:export (wisp)) + +; Set locale to something which supports unicode. Required to avoid using fluids. +(catch #t + (lambda () + (setlocale LC_ALL "")) + (lambda (key . parameters) + (let ((locale-fallback "en_US.UTF-8")) + (format (current-error-port) + (string-join + (list ";;; Warning: setlocale LC_ALL \"\" failed with ~A: ~A" + "switching to explicit ~A locale. Please setup your locale." 
+ "If this fails, you might need glibc support for unicode locales.\n") + "\n;;; ") + key parameters locale-fallback) + (catch #t + (lambda () + (setlocale LC_ALL locale-fallback)) + (lambda (key . parameters) + (format (current-error-port) + (string-join + (list ";;; Warning: fallback setlocale LC_ALL ~A failed with ~A: ~A" + "Not switching to Unicode." + "You might need glibc support for unicode locales.\n") + "\n;;; ") + locale-fallback key parameters)))))) + +;;; +;;; Language definition +;;; + +(define (read-one-wisp-sexp port env) + ;; allow using "# foo" as #(foo). + (read-hash-extend #\# (λ (chr port) #\#)) + (cond + ((eof-object? (peek-char port)) + (read-char port )); return eof: we’re done + (else + (let ((chunk (wisp-scheme-read-chunk port))) + (cond + ((not (null? chunk)) + (car chunk)) + (else + #f)))))) + +(define-language wisp + #:title "Wisp Scheme Syntax. See SRFI-119 for details." + ; . #:reader read-one-wisp-sexp + #:reader read-one-wisp-sexp ; : lambda (port env) : let ((x (read-one-wisp-sexp port env))) (display x)(newline) x ; + #:compilers `((tree-il . ,compile-tree-il)) + #:decompilers `((tree-il . ,decompile-tree-il)) + #:evaluator (lambda (x module) (primitive-eval x)) + #:printer write ; TODO: backtransform to wisp? Use source-properties? + #:make-default-environment + (lambda () + ;; Ideally we'd duplicate the whole module hierarchy so that `set!', + ;; `fluid-set!', etc. don't have any effect in the current environment. + (let ((m (make-fresh-user-module))) + ;; Provide a separate `current-reader' fluid so that + ;; compile-time changes to `current-reader' are + ;; limited to the current compilation unit. + (module-define! m 'current-reader (make-fluid)) + ;; Default to `simple-format', as is the case until + ;; (ice-9 format) is loaded. This allows + ;; compile-time warnings to be emitted when using + ;; unsupported options. + (module-set! 
m 'format simple-format) + m))) + + + diff --git a/test-suite/tests/srfi-119.test b/test-suite/tests/srfi-119.test new file mode 100644 index 000000000..a888df41d --- /dev/null +++ b/test-suite/tests/srfi-119.test @@ -0,0 +1,81 @@ +;;;; srfi-119.test --- Test suite for Guile's SRFI-119 reader. -*- scheme -*- +;;;; +;;;; Copyright (C) 2023 Free Software Foundation, Inc. +;;;; +;;;; This library is free software; you can redistribute it and/or +;;;; modify it under the terms of the GNU Lesser General Public +;;;; License as published by the Free Software Foundation; either +;;;; version 3 of the License, or (at your option) any later version. +;;;; +;;;; This library is distributed in the hope that it will be useful, +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU +;;;; Lesser General Public License for more details. +;;;; +;;;; You should have received a copy of the GNU Lesser General Public +;;;; License along with this library; if not, write to the Free Software +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA + +(define-module (test-srfi-119) + #:use-module (test-suite lib) + #:use-module (srfi srfi-1) + #:use-module (language wisp)) + +(define (read-string s) + (with-input-from-string s read)) + +(define (with-read-options opts thunk) + (let ((saved-options (read-options))) + (dynamic-wind + (lambda () + (read-options opts)) + thunk + (lambda () + (read-options saved-options))))) + +(define (wisp->list str) + (wisp-scheme-read-string str)) + +(with-test-prefix "wisp-read-simple" + (pass-if (equal? (wisp->list "<= n 5") '((<= n 5)))) + (pass-if (equal? (wisp->list ". 5") '(5))) + (pass-if (equal? (wisp->list "+ 1 : * 2 3") '((+ 1 (* 2 3)))))) +(with-test-prefix "wisp-read-complex" + (pass-if (equal? (wisp->list " +a b c d e + . f g h + . i j k + +concat \"I want \" + getwish from me + . 
\" - \" username +") '( +(a b c d e + f g h + i j k) + +(concat "I want " + (getwish from me) + " - " username)))) + + (pass-if (equal? (wisp->list " +define : a b c +_ d e +___ f +___ g h +__ . i + +define : _ +_ display \"hello\n\" + +\\_") '( +(define (a b c) + (d e + (f) + (g h) + i)) + +(define (_) + (display "hello\n")) + +(_))))) -- 2.39.1 [-- Attachment #1.5: Type: text/plain, Size: 103 bytes --] Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-14 8:32 ` Dr. Arne Babenhauserheide @ 2023-02-14 21:24 ` Dr. Arne Babenhauserheide 2023-02-14 23:01 ` Maxime Devos 0 siblings, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-14 21:24 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 214 bytes --] PS: So what’s still missing here is to avoid setting the locale. Do you happen to have a hint how to actually do this right? -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-14 21:24 ` Dr. Arne Babenhauserheide @ 2023-02-14 23:01 ` Maxime Devos 2023-02-15 1:46 ` Matt Wette 2023-02-15 8:36 ` Dr. Arne Babenhauserheide 0 siblings, 2 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-14 23:01 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 4096 bytes --] > [...] > That would be nice, but would require doing changes in a critical core > part of Guile. It would change this addition from a risk-free added > feature to a risky core change. I maintain that a new language shouldn't be merged until the Scheme-specific load path stuff is fixed/extended to work for non-Scheme things (e.g. Wisp) too -- if this requires somewhat risky (*) changes to core parts, then that just means we'll have to do some risky stuff, then. I also expect that Guile maintainers will have the opposite opinion (i.e., ‘fixing the load path stuff isn't necessary for merging a new language implementation’). (*) FWIW I disagree on the 'risky' assessment -- it seems like a ‘if it runs, it will work’ thing to me. That it modifies a core part of Guile, makes it less risky IMO, as it would automatically be more tested. Aside from the (*) and the 'I also expect [...],', I don't have anything new to say about this, so I'll stop here. > [...] > That would also enable shipping pre-compiled software without > sourcecode, That can already be done -- besides legalities, nothing stops people from putting [^] or [^] .scm files in $GUILE_LOAD_PATH and putting .go in $GUILE_LOAD_COMPILED_PATH. [^]: Redacted to not give people ideas on how to circumvent stuff. I can elaborate by non-public e-mail if you like. > so there may be strategic reasons to avoid it. Always > providing the sourcecode also makes compliance with automatic copyleft > licenses automatic. Mm, yes, I guess. 
If only people weren't careless and didn't try to circumvent copyleft, then things would be easier ... On 14-02-2023 22:24, Dr. Arne Babenhauserheide wrote: > PS: So what’s still missing here is to avoid setting the locale. Do you > happen to have a hint how to actually do this right? I think you might have forgotten about this: > -- Scheme Procedure: set-port-encoding! port enc > -- C Function: scm_set_port_encoding_x (port, enc) > Sets the character encoding that will be used to interpret I/O to > PORT. ENC is a string containing the name of an encoding. Valid > encoding names are those defined by IANA > (http://www.iana.org/assignments/character-sets), for example > ‘"UTF-8"’ or ‘"ISO-8859-1"’. > > As such, I propose calling set-port-encoding! right in the beginning of read-one-wisp-sexp. More concretely, replace (define (read-one-wisp-sexp port env) ;; allow using "# foo" as #(foo). (read-hash-extend #\# (λ (chr port) #\#)) (cond ((eof-object? (peek-char port)) (read-char port )); return eof: we’re done (else (let ((chunk (wisp-scheme-read-chunk port))) (cond ((not (null? chunk)) (car chunk)) (else #f)))))) by (define (read-one-wisp-sexp port env) ;; Allow using "# foo" as #(foo). ;; Don't use the globally-acting read-hash-extend, because this ;; doesn't make much sense in parenthese-y (non-Wisp) Scheme. ;; Instead, use fluids to temporarily add the extension. (define %read-hash-procedures/parameter (fluid->parameter %read-hash-procedures)) (parameterize ((%read-hash-procedures/parameter `((#\# ,(λ (chr port) #\#)) ,@(%read-hash-procedures/parameter)))) ;; Read Wisp files as UTF-8, to support non-ASCII characters. ;; TODO: would be nice to support ';; coding: whatever' lines ;; like in parenthese-y Scheme. (set-port-encoding! port "UTF-8") (if (eof-object? (peek-char port)) (read-char port) ; return eof: we’re done (let ((chunk (wisp-scheme-read-chunk port))) (and (not (null? chunk)) ; <---- XXX: maybe (pair? chunk) (car chunk)))))) (untested). 
(I've also done the read-hash-extend stuff and simplified the 'cond' expressions.) Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
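A minimal sketch of the per-port idea above, assuming only modules that ship with Guile. The helper name `read-line-as-utf8` is made up for illustration and is not part of the patch. It is meant to show that set-port-encoding! changes how one port decodes its bytes, without calling setlocale or touching any other port:

```scheme
(use-modules (rnrs bytevectors)    ; string->utf8
             (ice-9 binary-ports)  ; open-bytevector-input-port
             (ice-9 rdelim))       ; read-line

;; Hypothetical helper (not from the patch): decode the first line of
;; the bytevector BV as UTF-8, whatever the process locale is.
(define (read-line-as-utf8 bv)
  (let ((port (open-bytevector-input-port bv)))
    ;; Only this port is affected; the locale stays as it was.
    (set-port-encoding! port "UTF-8")
    (read-line port)))

;; Usage: (read-line-as-utf8 (string->utf8 "λ (x) x"))
```

The same call at the start of read-one-wisp-sexp would, under that assumption, make the setlocale fallback chain in spec.scm unnecessary for reading non-ASCII Wisp source.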
* Re: [PATCH] add language/wisp to Guile? 2023-02-14 23:01 ` Maxime Devos @ 2023-02-15 1:46 ` Matt Wette 2023-02-16 21:38 ` Dr. Arne Babenhauserheide 2023-02-15 8:36 ` Dr. Arne Babenhauserheide 1 sibling, 1 reply; 56+ messages in thread From: Matt Wette @ 2023-02-15 1:46 UTC (permalink / raw) To: guile-devel You may be interested in the load-lang patch I generated a few years ago to allow file-extension based loading, in addition to "#lang elisp" type hooks. https://github.com/mwette/guile-contrib/blob/main/patch/3.0.8/load-lang.patch Matt ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-15 1:46 ` Matt Wette @ 2023-02-16 21:38 ` Dr. Arne Babenhauserheide 2023-02-17 1:26 ` Matt Wette 2023-02-17 23:06 ` Maxime Devos 0 siblings, 2 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-16 21:38 UTC (permalink / raw) To: Matt Wette; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 528 bytes --] Matt Wette <matt.wette@gmail.com> writes: > You may be interested in the load-lang patch I generated a few years ago > to allow file-extension based loading, in addition to '#lang elisp" > type hooks. > > https://github.com/mwette/guile-contrib/blob/main/patch/3.0.8/load-lang.patch @Maxime: Is this something you’d be interested in championing? @Matt: Who needs to ack your patch for it to go into the repo? Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-16 21:38 ` Dr. Arne Babenhauserheide @ 2023-02-17 1:26 ` Matt Wette 2023-02-23 11:36 ` Ludovic Courtès 2023-02-17 23:06 ` Maxime Devos 1 sibling, 1 reply; 56+ messages in thread From: Matt Wette @ 2023-02-17 1:26 UTC (permalink / raw) To: guile-devel On 2/16/23 1:38 PM, Dr. Arne Babenhauserheide wrote: > Matt Wette <matt.wette@gmail.com> writes: > >> You may be interested in the load-lang patch I generated a few years ago >> to allow file-extension based loading, in addition to '#lang elisp" >> type hooks. >> >> https://github.com/mwette/guile-contrib/blob/main/patch/3.0.8/load-lang.patch > @Maxime: Is this something you’d be interested in championing? > > @Matt: Who needs to ack your patch for it to go into the repo? > If by repo you mean main branch, I'm guessing this is a Ludo/Andy call. Maybe someone (you?) with write priv's could make a wip-branch for it? Matt ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-17 1:26 ` Matt Wette @ 2023-02-23 11:36 ` Ludovic Courtès 2023-02-23 17:48 ` Dr. Arne Babenhauserheide 2023-02-23 18:42 ` Maxime Devos 0 siblings, 2 replies; 56+ messages in thread From: Ludovic Courtès @ 2023-02-23 11:36 UTC (permalink / raw) To: guile-devel Hi! Sorry for the late reply. FWIW, I think it might be best to keep Wisp as a separate package: that allows it to evolve independently of Guile (and possibly more quickly :-)), and it might simplify maintenance in some way. Adding #lang support in Guile would be nice. As discussed on IRC, it can be experimented with in a WIP branch. We’ll then have to discuss when to incorporate it. My gut feeling is that it may have to wait until the next stable series (3.2.x), as this is quite a core change, but let’s see what people think. Thanks! Ludo’. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 11:36 ` Ludovic Courtès @ 2023-02-23 17:48 ` Dr. Arne Babenhauserheide 2023-02-23 18:42 ` Maxime Devos 1 sibling, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-23 17:48 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 4049 bytes --] Hi, Ludovic Courtès <ludo@gnu.org> writes: > FWIW, I think it might be best to keep Wisp as a separate package: that > allows it to evolve independently of Guile (and possibly more quickly > :-)), and it might simplify maintenance in some way. While this reasoning is true for most languages, I don’t think it applies to wisp: wisp will not evolve faster independently of Guile, because it is pretty much finished, and it should not need much maintenance (and if some should be needed, I can do that directly in Guile). It is such a thin layer that (different from other languages) there is no need to take care of changing syntax to follow or new semantics. It is defined such that its implementation can actually be finished. Every change in Scheme can automatically also be used with Wisp. While ecmascript, elisp, lokke, and python-on-guile must be kept up to date to stay useful, Wisp does not need that. The only larger changes in the past 4 years were in editor-support and some scripts, and these can stay separate, just like emacs-geiser-guile is separate from Guile. I propose adding Wisp now, because while I have been using Wisp for many projects in the past years, the changes to wisp itself since declaring 1.0 at FOSDEM 2019 were minimal, and the biggest were due to the review of Maxime here: wisp 1.0.10 (2023-02-16): - only extend the reader for ## while reading wisp. Thanks to Maxime Devos for help to use fluids! wisp 1.0.9 (2023-02-16): - remove limitation of the number of prefix underscores (_). Thanks to Maxime Devos for a much cleaner algorithm! 
- only set *port* encoding to UTF-8, do not change encoding for the application. Thanks to Maxime Devos! wisp 1.0.8 (2022-12-09): - wisp2lisp can now process stdin when called with - as filename. And it has help output. wisp 1.0.7 (2021-12-20): - fix: a lisp-style comment in the bash-script had broken the wisp REPL wisp 1.0.6 (2021-11-30): - allow (and ignore!) a single space indentation for the first line of a chunk to support meta-commands - ensure that (language wisp) is compiled in the wisp REPL wisp 1.0.5 (2021-05-02): - explicitly allow using wisp as language under the expat-license for easier embedding in Guile-using games like Tsukundere Shitsumon: https://gitlab.com/leoprikler/tsukundere-shitsumon/ wisp 1.0.4 (2021-02-08): - add one more setlocale fallback: If it cannot use unicode, wisp now proceeds with degraded operation rather than failing outright. wisp 1.0.3 (2020-09-15): - provide wisp script that wraps guile --language=wisp -x .w - add Guile 3.0 to supported versions - fix documentation: wisp allows up to 12 underscores - You can create wisp-projects with conf via `conf new -l wisp PROJNAME`. See https://hg.sr.ht/~arnebab/conf - wisp moved to sourcehut: https://hg.sr.ht/~arnebab/wisp wisp 1.0.2 (2019-04-09): - guild compile is missing the load path wisp 1.0.1 (2019-03-23): - fix install directory, thanks to James-Adam Renquinha Henri and Ludovic Courtès who both discovered a mistake in the paths: correct module path is /usr/share/guile/site/<version>/…, but I used /usr/share/guile/<version>/site - simplify install logic and pre-compile installed modules. - add beautiful make help wisp 1.0 (2019-02-08): - add FOSDEM 2019 slides: docs/fosdem2019.org - As presented at FOSDEM, wisp the language is complete. Tooling, documentation, and porting are still work in progress. I plan to then only install language/wisp from the wisp-repo when the local Guile does not provide wisp. 
The reasoning for adding wisp to Guile is at the start of the thread: https://lists.gnu.org/archive/html/guile-devel/2023-02/msg00012.html > Adding #lang support in Guile would be nice. As discussed on IRC, it > can be experimented with in a WIP branch. I now created the branch wip-load-lang with the two patches by Matt. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 11:36 ` Ludovic Courtès 2023-02-23 17:48 ` Dr. Arne Babenhauserheide @ 2023-02-23 18:42 ` Maxime Devos 2023-02-24 15:45 ` Ludovic Courtès 1 sibling, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-23 18:42 UTC (permalink / raw) To: Ludovic Courtès, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 1776 bytes --] On 23-02-2023 12:36, Ludovic Courtès wrote: > Hi! > > Sorry for the late reply. > > FWIW, I think it might be best to keep Wisp as a separate package: that > allows it to evolve independently of Guile (and possibly more quickly > :-)), and it might simplify maintenance in some way. To my understanding, Wisp is pretty much finished -- it is standardised as SRFI 119: <https://srfi.schemers.org/srfi-119/>, which is in final status. As such, there is no room for evolving (besides bugfixes, perhaps). The maintenance aspect (and also the evolving) is addressed in the cover letter: > - And it provides access to the full capabilities of Guile with minimal > maintenance effort, because it is just the thinnest possible layer > around Scheme. The last required change was in 2020 while I used it > continuously. There also were several other points in the cover letter for keeping Wisp as a _non-separate_ package. Why should Wisp be a separate package when other SRFIs are made part of Guile? Your point about maintenance and evolving applies equally to other SRFIs. > Adding #lang support in Guile would be nice. As discussed on IRC, it > can be experimented with in a WIP branch. Have you seen my messages on how the "#lang" construct is problematic for some languages, and how alternatives like "[comment delimiter] -*- stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be equally simple (*) and not have any downsides (**). (*) The port encoding detection supports "-*- coding: whatever -*-", presumably that functionality could be reused. 
(**) For compatibility with Racket, it's not like we couldn't implement both "#lang" and "-*- stuff: language -*-". Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 18:42 ` Maxime Devos @ 2023-02-24 15:45 ` Ludovic Courtès 2023-02-24 16:34 ` Dr. Arne Babenhauserheide 2023-02-24 23:48 ` Maxime Devos 0 siblings, 2 replies; 56+ messages in thread From: Ludovic Courtès @ 2023-02-24 15:45 UTC (permalink / raw) To: guile-devel Hello! Maxime Devos <maximedevos@telenet.be> skribis: > Why should Wisp be a separate package when other SRFIs are made part > of Guile? Your point about maintenance and evolving applies equally > to other SRFIs. That’s a good point. Making it available as (srfi srfi-119) would make sense I guess. I need to take a closer look… >> Adding #lang support in Guile would be nice. As discussed on IRC, it >> can be experimented with in a WIP branch. > > Have you seen my messages on how the "#lang" construct is problematic > for some languages, and how alternatives like "[comment delimiter] -*- > stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be > equally simple (*) and not have any downsides (**). > > (*) The port encoding detection supports "-*- coding: whatever -*-", > presumably that functionality could be reused. > > (**) For compatibility with Racket, it's not like we couldn't > implement both "#lang" and "-*- stuff: language -*-". I haven’t seen your messages yet, I just wanted to express support of the general idea. For years, we have discussed #lang support; I know Andy is enthusiastic about it, and while I was initially reluctant, I’ve come to appreciate the idea. What you point out is worth considering, but note that Guile already supports #!r6rs for instance. Thanks, Ludo’. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-24 15:45 ` Ludovic Courtès @ 2023-02-24 16:34 ` Dr. Arne Babenhauserheide 2023-03-08 10:34 ` Dr. Arne Babenhauserheide 2023-02-24 23:48 ` Maxime Devos 1 sibling, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-24 16:34 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 693 bytes --] Hi, Ludovic Courtès <ludo@gnu.org> writes: >> Why should Wisp be a separate package when other SRFIs are made part >> of Guile? Your point about maintenance and evolving applies equally >> to other SRFIs. > > That’s a good point. Making it available as (srfi srfi-119) would make > sense I guess. I need to take a closer look… That’s where the documentation and tests are located: - test-suite/tests/srfi-119.test - doc/ref/srfi-modules.texi::5666:@node SRFI-119 The language implementation is in (language wisp) because that’s in the language search path. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-24 16:34 ` Dr. Arne Babenhauserheide @ 2023-03-08 10:34 ` Dr. Arne Babenhauserheide 0 siblings, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-03-08 10:34 UTC (permalink / raw) To: Ludovic Courtès; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 1177 bytes --] Hi, "Dr. Arne Babenhauserheide" <arne_bab@web.de> writes: > Ludovic Courtès <ludo@gnu.org> writes: >>> Why should Wisp be a separate package when other SRFIs are made part >>> of Guile? Your point about maintenance and evolving applies equally >>> to other SRFIs. >> >> That’s a good point. Making it available as (srfi srfi-119) would make >> sense I guess. I need to take a closer look… > > That’s where the documentation and tests are located: > > - test-suite/tests/srfi-119.test > - doc/ref/srfi-modules.texi::5666:@node SRFI-119 > > The language implementation is in (language wisp) because that’s in the > language search path. Given the complexities in changing the way languages are handled (the required discussions, as we’ve seen in the not yet resolved discussion), would you be OK with keeping the question about adding support for SRFI-119 to Guile separate from the general discussion about language handling? Are there improvements needed besides the ones I did thanks to the review by Maxime or is this good to go? Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-24 15:45 ` Ludovic Courtès 2023-02-24 16:34 ` Dr. Arne Babenhauserheide @ 2023-02-24 23:48 ` Maxime Devos 2023-02-24 23:51 ` Maxime Devos 1 sibling, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-24 23:48 UTC (permalink / raw) To: Ludovic Courtès, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 1491 bytes --] On 24-02-2023 16:45, Ludovic Courtès wrote: >>> Adding #lang support in Guile would be nice. As discussed on IRC, it >>> can be experimented with in a WIP branch. >> Have you seen my messages on how the "#lang" construct is problematic >> for some languages, and how alternatives like "[comment delimiter] -*- >> stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be >> equally simple (*) and not have any downsides (**). >> >> (*) The port encoding detection supports "-*- coding: whatever -*-", >> presumably that functionality could be reused. >> >> (**) For compatibility with Racket, it's not like we couldn't >> implement both "#lang" and "-*- stuff: language -*-". > I haven’t seen your messages yet, I just wanted to express support of > the general idea. For years, we have discussed #lang support; I know > Andy is enthusiastic about it, and while I was initially reluctant, I’ve > come to appreciate the idea. > > What you point out is worth considering, but note that Guile already > supports #!r6rs for instance. While I initially thought that #!r6rs was problematic, it was pointed out to me that #!r6rs is actually in the R6RS (unlike #lang, and even if #lang were in the R6RS, it isn't in the de jure or de facto standards of many other languages). As such, #!r6rs is a different (unproblematic) situation compared to #lang -- #lang is not the same as #!r6rs from a compatibility perspective. Greetings, Maxime. 
[-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-24 23:48 ` Maxime Devos @ 2023-02-24 23:51 ` Maxime Devos 2023-02-25 0:15 ` Matt Wette 0 siblings, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-24 23:51 UTC (permalink / raw) To: Ludovic Courtès, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 403 bytes --] On 25-02-2023 00:48, Maxime Devos wrote: >>> (**) For compatibility with Racket, it's not like we couldn't >>> implement both "#lang" and "-*- stuff: language -*-". TBC, I mean ‘only support #lang' for values of 'lang' that Racket supports’, or alternatively ‘support #lang for compatibility, but recommend -*- ... -*- in the Guile manual.’. Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-24 23:51 ` Maxime Devos @ 2023-02-25 0:15 ` Matt Wette 2023-02-25 10:42 ` Maxime Devos 0 siblings, 1 reply; 56+ messages in thread From: Matt Wette @ 2023-02-25 0:15 UTC (permalink / raw) To: guile-devel On 2/24/23 3:51 PM, Maxime Devos wrote: > On 25-02-2023 00:48, Maxime Devos wrote: >>>> (**) For compatibility with Racket, it's not like we couldn't >>>> implement both "#lang" and "-*- stuff: language -*-". > > TBC, I mean ‘only support #lang' for values of 'lang' that Racket > supports’, or alternatively ‘support #lang for compatibility, but > recommend -*- ... -*- in the Guile manual.’. The point of this is to tell the compiler what language it needs to parse, if not the default (Scheme). Using `#lang pascal' or `#!lang pascal' allows the compiler to figure this out in the first few characters read. Using `-*- lang: pascal -*-' might require the compiler to read a lot of text (not knowing which text is a comment) to determine whether a language has been specified. For the `-*-' based design, is there a maximum number of characters to be read before deciding? Matt ^ permalink raw reply [flat|nested] 56+ messages in thread
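To make the "first few characters" argument concrete, here is a rough sketch of a `#lang` check. It is not taken from the load-lang patch; the procedure name `lang-from-first-line` is an assumption for illustration, and a real loader would also have to deal with a shebang line and with languages whose names need validating:

```scheme
(use-modules (ice-9 rdelim))  ; read-line

;; Hypothetical helper: read the first line of PORT.  If it has the
;; form "#lang NAME", return NAME as a symbol; otherwise return #f.
;; Note that the first line is consumed in both cases.
(define (lang-from-first-line port)
  (let ((line (read-line port)))
    (and (string? line)                    ; not the eof object
         (string-prefix? "#lang " line)
         (string->symbol
          (string-trim-both (substring line 6))))))

;; Usage:
;; (call-with-input-string "#lang wisp\ndisplay 1" lang-from-first-line)
```

Because the marker has to sit at the very start of the file, the decision needs at most one line of input, which is the property the paragraph above relies on.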
* Re: [PATCH] add language/wisp to Guile? 2023-02-25 0:15 ` Matt Wette @ 2023-02-25 10:42 ` Maxime Devos 0 siblings, 0 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-25 10:42 UTC (permalink / raw) To: Matt Wette, guile-devel On 25-02-2023 01:15, Matt Wette wrote: > On 2/24/23 3:51 PM, Maxime Devos wrote: >> On 25-02-2023 00:48, Maxime Devos wrote: >>>>> (**) For compatibility with Racket, it's not like we couldn't >>>>> implement both "#lang" and "-*- stuff: language -*-". >> >> TBC, I mean ‘only support #lang' for values of 'lang' that Racket >> supports’, or alternatively ‘support #lang for compatibility, but >> recommend -*- ... -*- in the Guile manual.’. > > The point of this is to tell the compiler what language it needs to parse, > if not the default (Scheme). Using `#lang pascal' or `#!lang pascal' > allows > the compiler to figure this out in the first few characters read. Using > `-*- lang: pascal -*-' might require the compiler to read a lot of text > (not knowing > if what text is comment or not) to determine if language has been > specified. > For the `-*-' based design, is there a maximum number of characters to be > read before deciding? Guile already searches for '-*- ... -*-' lines, as mentioned in a previous message: > (*) The port encoding detection supports "-*- coding: whatever -*-", > presumably that functionality could be reused. As such, it shouldn't incur an overly large cost to support '-*- something: modula -*-' too. On the character limit: according to the Emacs documentation, there is a line limit: the '-*- ... -*-' must be on the first line. (See ‘(emacs)Specifying File Variables’.) To me this seems the same situation as with '#lang', not ‘a lot of text’.
A more explicit character limit of 'first few hundred bytes' is mentioned in the documentation of 'file-encoding':

 -- Scheme Procedure: file-encoding port
 -- C Function: scm_file_encoding (port)
     Attempt to scan the **first few hundred bytes** from the PORT for hints about its character encoding. Return a string containing the encoding name or ‘#f’ if the encoding cannot be determined. The port is rewound.

     Currently, the only supported method is to look for an Emacs-like character coding declaration (*note how Emacs recognizes file encoding: (emacs)Recognize Coding.). The coding declaration is of the form ‘coding: XXXXX’ and must appear in a Scheme comment. Additional heuristics may be added in the future.

Greetings, Maxime.
^ permalink raw reply [flat|nested] 56+ messages in thread
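The bounded scan discussed in the message above could be sketched roughly as follows in Guile Scheme. This is only an illustration under stated assumptions: the procedure name `first-line-language` and the `language:` key are hypothetical (Guile's existing `file-encoding` looks only for `coding:`), and the sketch follows the Emacs convention of inspecting the first line only, so the amount of text read does not grow with the size of the file.

```scheme
(use-modules (ice-9 rdelim)   ; read-line
             (ice-9 regex))   ; string-match, match:substring

;; Hypothetical helper, not part of Guile: return the value of a
;; "language:" entry inside a "-*- ... -*-" block on the first line
;; of PORT, or #f if the first line carries no such declaration.
;; Exactly one line is read from PORT.
(define (first-line-language port)
  (let ((line (read-line port)))
    (and (string? line)               ; read-line yields the EOF object on empty input
         (let ((m (string-match
                   "-\\*-.*language:[[:space:]]*([A-Za-z0-9_/-]+).*-\\*-"
                   line)))
           (and m (match:substring m 1))))))

;; Example use on an in-memory port.
(display (call-with-input-string
             ";; -*- mode: scheme; language: wisp -*-\n(display 1)\n"
           first-line-language))
(newline)
```

Because only one line is consumed, a caller that wants to hand the port to a reader afterwards would have to rewind it or re-open the file, much as the `file-encoding` documentation says the port is rewound.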
* Re: [PATCH] add language/wisp to Guile? 2023-02-16 21:38 ` Dr. Arne Babenhauserheide 2023-02-17 1:26 ` Matt Wette @ 2023-02-17 23:06 ` Maxime Devos 2023-02-18 3:50 ` Philip McGrath 1 sibling, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-17 23:06 UTC (permalink / raw) To: Dr. Arne Babenhauserheide, Matt Wette; +Cc: guile-devel On 16-02-2023 22:38, Dr. Arne Babenhauserheide wrote: > > Matt Wette <matt.wette@gmail.com> writes: > >> You may be interested in the load-lang patch I generated a few years ago >> to allow file-extension based loading, in addition to '#lang elisp" >> type hooks. >> >> https://github.com/mwette/guile-contrib/blob/main/patch/3.0.8/load-lang.patch > > @Maxime: Is this something you’d be interested in championing? For the '#lang whatever' stuff, no: The '#lang whatever' stuff makes Scheme (*) files unportable between implementations, as '#lang scheme' is not a valid comment -- there exist Schemes beyond Guile and Racket. If it were changed to recognising '-*- mode: scheme -*-' or '-*- language: scheme -*-' or such, it would be better IMO, but insufficient, because (^). (*) The same argument applies to some, but not all, non-Scheme languages too. I'm assuming you don't mean the '%file-extension-map' stuff, because of your previous ‘[...] and also to avoid stumbling over files that just take that extension’ response to the proposal for such a thing. Even if you meant that, (^) also applies. (^) it doesn't integrate with the module system -- more concretely, (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments would solve that, but we agree that that would be unreasonable in many situations. (Alternatively one could place ECMAScript code in a file with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but ... no.) Aside from the '#lang ...'
-> '-*- language: scheme -*-' stuff, I think the idea behind the patch is good (**) -- it solves the problem it aims to solve, AFAICT. However, this problem is not the module system problem that Wisp currently has. As such, I suppose you could say that I would 'champion' the patch on its own, but not champion it in relation to '[PATCH] add language/wisp to Guile'. (**) It needs some documentation though. Greetings, Maxime. ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-17 23:06 ` Maxime Devos @ 2023-02-18 3:50 ` Philip McGrath 2023-02-18 15:58 ` Maxime Devos 2023-02-23 7:59 ` Maxime Devos 0 siblings, 2 replies; 56+ messages in thread From: Philip McGrath @ 2023-02-18 3:50 UTC (permalink / raw) To: guile-devel Hi, On Fri, Feb 17, 2023, at 6:06 PM, Maxime Devos wrote: > On 16-02-2023 22:38, Dr. Arne Babenhauserheide wrote: >> >> Matt Wette <matt.wette@gmail.com> writes: >> >>> You may be interested in the load-lang patch I generated a few years ago >>> to allow file-extension based loading, in addition to '#lang elisp" >>> type hooks. >>> >>> https://github.com/mwette/guile-contrib/blob/main/patch/3.0.8/load-lang.patch >> >> @Maxime: Is this something you’d be interested in championing? > > For the '#lang whatever stuff', no: > > The '#lang whatever' stuff makes Scheme (*) files unportable between > implementations, as '#lang scheme' is not a valid comment -- there exist > Schemes beyond Guile and Racket. If it were changed to recognising > '-*- mode: scheme -*-' or '-*- language: scheme -*-' or such, it would > be better IMO, but insufficient, because (^). > I haven't read the patch or this thread closely, but R6RS has an answer to any concerns about compatibility with `#lang`. At the beginning of Chapter 4, "Lexical and Datum Syntax" (<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) the report specifies: > An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax `#!<identifier>`, for any <identifier> (see section 4.2.4) that is not `r6rs`, as a syntax violation, and it may use specific `#!`-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax. The syntax `#!r6rs` may be used to signify that the input afterward is written with the lexical syntax and datum syntax described by this report. 
`#!r6rs` is otherwise treated as a comment; see section 4.2.3. Chez Scheme uses such comments to support extensions to lexical syntax, as documented in <https://cisco.github.io/ChezScheme/csug9.5/intro.html#./intro:h1>: > The Chez Scheme lexical extensions described above are disabled in an input stream after an `#!r6rs` comment directive has been seen, unless a `#!chezscheme` comment directive has been seen since. Each library loaded implicitly via import and each RNRS top-level program loaded via the `--program` command-line option, the `scheme-script` command, or the `load-program` procedure is treated as if it begins implicitly with an `#!r6rs` comment directive. > The case of symbol and character names is normally significant, as required by the Revised^6 Report. Names are folded, as if by string-foldcase, following a `#!fold-case` comment directive in the same input stream unless a `#!no-fold-case` has been seen since. Names are also folded if neither directive has been seen and the parameter `case-sensitive` has been set to `#f`. In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) This does not conflict with Racket's support for script shebangs: "A `#! ` (which is `#!` followed by a space) or `#!/` starts a line comment that can be continued to the next line by ending a line with `\`. This form of comment normally appears at the beginning of a Unix script file." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-comment%29>.) Furthermore, the lexical syntax for the rest of the file is entirely under control of the specified language. Most languages parameterize the reader to reject further uses of `#lang` or its `#!` alias. Some "meta-languages" chain-load another language but parameterize it in some way (e.g.
<https://docs.racket-lang.org/exact-decimal-lang/>). The `#!r6rs` language, of course, handles `#!` exactly as specified by R6RS, with no extensions. (Guile does not handle `#!r6rs` properly, presumably because of the legacy `#!`/`!#` block comments. I think this should be a surmountable obstacle, though, especially since Guile does support standard `#|`/`|#` block comments.) > > (^) it doesn't integrate with the module system -- more concretely, > (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments > would solve that, but we agree that that would be unreasonable in many > situations. (Alternatively one could place ECMAScript code in a file > with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but > ... no.) > Racket has a mechanism to enable additional source file extensions without needing explicit command-line arguments by defining `module-suffixes` or `doc-module-suffixes` in a metadata module that is consulted when the collection is "set up": <https://docs.racket-lang.org/raco/setup-info.html>. However, this mechanism is not widely used. Overall, the experience of the Racket community strongly suggests that a file should say what language it is written in. Furthermore, that language is a property of the code, not of its runtime environment, so environment variables, command-line options, and similar extralinguistic mechanisms are a particularly poor fit for controlling it. File extensions are not the worst possible mechanism, but they have similar problems: code in an unsaved editor buffer or in a blog post may not have a file extension. (For more on this theme, see the corresponding point of the Racket Manifesto: <https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-racket-manifesto/paper.pdf>) Actually writing the language into the source code has proven to work well.
To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>): > On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that. -Philip ^ permalink raw reply [flat|nested] 56+ messages in thread
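The Racket reader rule quoted in the message above (a space or `/` after `#!` starts a shebang-style line comment, while an ASCII alphanumeric, `+`, `-`, or `_` makes `#!` an alias for `#lang `) can be sketched as a small classifier. This is a minimal illustration in Guile Scheme of the quoted rule only; the procedure name `classify-hash-bang` and the result symbols are hypothetical, and it does not describe how Guile's own reader treats `#!`.

```scheme
;; Hypothetical classifier for a line that may start with "#!".
;; Returns one of the symbols shebang-comment, language-directive, other.
(define (classify-hash-bang line)
  (cond
   ((not (string-prefix? "#!" line)) 'other)
   ((= (string-length line) 2) 'other)          ; a bare "#!" decides nothing
   (else
    (let ((c (string-ref line 2)))
      (cond
       ;; "#! " and "#!/" start a line comment (script shebang).
       ((or (char=? c #\space) (char=? c #\/)) 'shebang-comment)
       ;; ASCII letters and digits, or one of + - _, select a language.
       ((or (and (< (char->integer c) 128)
                 (or (char-alphabetic? c) (char-numeric? c)))
            (memv c '(#\+ #\- #\_)))
        'language-directive)
       (else 'other))))))

;; The two leading lines of the kind of script discussed in the thread.
(for-each (lambda (line)
            (write (cons line (classify-hash-bang line)))
            (newline))
          '("#!/usr/bin/env scheme-script" "#!r6rs"))
```

Under this rule the kernel-level shebang and the `#!<identifier>` form never collide, because they are distinguished by the single character that follows `#!`.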
* Re: [PATCH] add language/wisp to Guile? 2023-02-18 3:50 ` Philip McGrath @ 2023-02-18 15:58 ` Maxime Devos 2023-02-18 19:56 ` Matt Wette 2023-02-26 7:45 ` Philip McGrath 2023-02-23 7:59 ` Maxime Devos 1 sibling, 2 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-18 15:58 UTC (permalink / raw) To: Philip McGrath, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 9006 bytes --] On 18-02-2023 04:50, Philip McGrath wrote: > I haven't read the patch or this thread closely, I'll assume you have read it non-closely. > but R6RS has an answer to any concerns about compatibility with `#lang`. At the beginning of Chapter 4, "Lexical and Datum Syntax" (<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) the report specifies: > >> An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax `#!<identifier>`, for any <identifier> (see section 4.2.4) that is not `r6rs`, as a syntax violation, and it may use specific `#!`-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax. The syntax `#!r6rs` may be used to signify that the input afterward is written with the lexical syntax and datum syntax described by this report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3. That is for '#!lang', not '#lang'. R6RS allows the former, but the patch does the latter. As such, R6RS does not have an answer about incompatibility with `#lang', unless you count ‘it's incompatible’ as an answer. > Chez Scheme uses such comments to support extensions to lexical syntax, as documented in <https://cisco.github.io/ChezScheme/csug9.5/intro.html#./intro:h1>: > >> The Chez Scheme lexical extensions described above are disabled in an input stream after an `#!r6rs` comment directive has been seen, unless a `#!chezscheme` comment directive has been seen since. 
Each library loaded implicitly via import and each RNRS top-level program loaded via the `--program` command-line option, the `scheme-script` command, or the `load-program` procedure is treated as if it begins implicitly with an `#!r6rs` comment directive. [...] Again, that's '#!whatever', not '#lang' -- Chez does the former, not the latter. > In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) [...] > (Guile does not handle `#!r6rs` properly, presumably because of the legacy `#!`/`!#` block comments. I think this should be a surmountable obstacle, though, especially since Guile does support standard `#|`/`|#` block comments.) ‘#! ... !#’ comments aren't legacy; they exist to allow putting the shebang in the first line of a script, and to pass additional arguments to the Guile interpreter (see: (guile)The Top of a Script File) (*). As such, you can't just replace them with #| ... |# (unless you patch the kernel to recognise "#| ..." as a shebang line). (*) Maybe they exist for other purposes too. Furthermore, according to the kernel, #!r6rs would mean that the script needs to be interpreted by a program named 'r6rs', but 'guile' is named 'guile', not 'r6rs'. (I assume this is in POSIX somewhere, though I couldn't find it.) (This is an incompatibility between R6RS and any system that has shebangs.) >> >> (^) it doesn't integrate with the module system -- more concretely, >> (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments >> would solve that, but we agree that that would be unreasonable in many >> situations. (Alternatively one could place ECMAScript code in a file >> with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but >> ... no.) 
>> > > Racket has a mechanism to enable additional source file extensions without needing explicit command-line arguments by defining `module-suffixes` or `doc-modules-suffixes` in a metadata module that is consulted when the collection is "set up": https://docs.racket-lang.org/raco/setup-info.html However, this mechanism is not widely used. I guess this is an improvement over the runtime 'guile -x extension'. However, if I'm understanding 'setup-info.html' correctly, the downside is that you now need a separate file containing compilation settings. I have previously proposed a mechanism that makes the '-x' + '--language' a compile-time thing (i.e., embed the source file extension in the compiled .go; see previous e-mails in this thread), without having to make a separate file containing compilation settings. How is Racket's method an improvement over my proposal? > Overall, the experience of the Racket community strongly suggests that a file should say what language it is written in. Furthermore, that language is a property of the code, not of its runtime environment, so environment variables, command-line options, and similar extralinguistic mechanism are a particularly poor fit for controlling it. Agreed on the 'no environment variables' thing, disagreed on the 'no command-line options'. In the past e-mails in this thread, there was agreement on the ‘embed the source file extension in the compiled .go or something like that; and add -x extension stuff _when compiling_ (not at runtime!) the software that uses the extension’. Do you have any particular issues with that proposal? AFAICT, it solves everything and is somewhat more straightforward than Racket's approach. > File extensions are not the worst possible mechanisms, but they have similar problems: code written in an unsaved editor or a blog post may not have a file extension. With the proposal I wrote, it remains possible to override any 'file extension -> language' mapping.
It's not in any way incompatible with "-*- lang: whatever -*-"-like comments. Additionally, Guile can only load files that exist (i.e., 'saved'); Guile is not an editor or blog reader, so these do not appear to me to be problems for Guile. If the editor needs to determine the language for syntax highlighting or such, then there exist constructs like ';; -*- mode: scheme -*-' that are valid Scheme, but that's not a Guile matter. > (For more on this theme, see the corresponding point of the Racket Manifesto: <https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-racket-manifesto/paper.pdf>) Actually writing the language into the source code has proven to work well. What is the corresponding point? I'm not finding any search results for 'file extension' or 'file name', and I'm not finding any relevant search results for 'editor'. Could you give me a page reference and a relevant quote? > To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>): > >> On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that. It was never in dispute that 'we need parallel installability'. This ‘parallel installability’ is also addressed by my proposal. For the second part: ‘Racket did a great job facilitating this with #lang and we should just adopt that’: * I previously pointed out some problems with that proposal -- i.e., '#lang whatever' is bogus Scheme / Wisp / ..., and 'the module system won't find it, because of the unexpected file extensions'. * I gave an alternate proposal (i.e., the 'embed source file extension in the compiled .go, and add something like '-x extension' and '--language ...' at compilation time), which doesn't have those problems.
* Andy Wingo hasn't addressed the pointed out problems with the Racket solution adapted to Guile. * Andy Wingo hasn't considered any alternative proposals in that blog post. Why are you taking Andy Wingo's proposal as an argument by authority, when I have already pointed out the problems with that proposal and proposed a different solution without those problems? ‘Arguments by authority’ stop being good when someone explains how the authority is wrong. As you appear to consider arguments by authority to be valid arguments even in the existence of evidence that the authority is _wrong_, here's an argument from authority that ‘arguments from authority’ can be refuted: > (From: https://en.wikipedia.org/wiki/Argument_from_authority) > Scientific knowledge is best established by evidence and experiment rather than argued through authority[18][19][20] as authority has no place in science.[19][21] Carl Sagan wrote of arguments from authority: "One of the great commandments of science is, 'Mistrust arguments from authority.' ... Too many such arguments have proved too painfully wrong. Authorities must prove their contentions like everybody else."[22] However, countering this it has been argued that science is fundamentally dependent on arguments from authority to progress because "they allow science to avoid forever revisiting the same ground".[23] Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
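One possible reading of the "'file extension -> language' mapping that can still be overridden" idea from the message above, sketched in Guile Scheme. Everything named here is an assumption made for illustration: Guile defines no `%extension-languages` table or `source-language` procedure, the extensions listed are examples only, and the precedence shown (an in-file declaration wins over the extension, with Scheme as the fallback) is just one way the override could work.

```scheme
;; Hypothetical table mapping source file extensions to language names.
(define %extension-languages
  '((".scm" . scheme)
    (".w"   . wisp)
    (".js"  . ecmascript)))

;; Return the extension of FILE-NAME including the dot, or #f.
(define (file-extension file-name)
  (let* ((base (basename file-name))
         (dot  (string-rindex base #\.)))
    (and dot (substring base dot))))

;; DECLARED is the language named inside the file, if any (for example
;; the result of scanning a "-*- ... -*-" line); it takes precedence
;; over the extension-based guess.
(define* (source-language file-name #:optional declared)
  (or declared
      (let ((ext (file-extension file-name)))
        (and ext (assoc-ref %extension-languages ext)))
      'scheme))

(display (source-language "lib/foo.js"))        (newline)
(display (source-language "lib/foo.js" 'wisp))  (newline)
```

The in-file declaration and the extension table stay independent in this sketch, which is the sense in which the two mechanisms are described above as not incompatible.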
* Re: [PATCH] add language/wisp to Guile? 2023-02-18 15:58 ` Maxime Devos @ 2023-02-18 19:56 ` Matt Wette 2023-02-21 12:09 ` Dr. Arne Babenhauserheide 2023-02-26 7:45 ` Philip McGrath 1 sibling, 1 reply; 56+ messages in thread From: Matt Wette @ 2023-02-18 19:56 UTC (permalink / raw) To: guile-devel On 2/18/23 7:58 AM, Maxime Devos wrote: > On 18-02-2023 04:50, Philip McGrath wrote: >> I haven't read the patch or this thread closely, > > I'll assume you have read it non-closely. > >> but R6RS has an answer to any concerns about compatibility with >> `#lang`. At the beginning of Chapter 4, "Lexical and Datum Syntax" >> (<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) >> the report specifies: >> >>> An implementation must not extend the lexical or datum syntax in >>> any way, with one exception: it need not treat the syntax >>> `#!<identifier>`, for any <identifier> (see section 4.2.4) that is >>> not `r6rs`, as a syntax violation, and it may use specific >>> `#!`-prefixed identifiers as flags indicating that subsequent input >>> contains extensions to the standard lexical or datum syntax. The >>> syntax `#!r6rs` may be used to signify that the input afterward is >>> written with the lexical syntax and datum syntax described by this >>> report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3. > > That is for '#!lang', not '#lang'. R6RS allows the former, but the > patch does the latter. As such, R6RS does not have an answer about > incompatibility with `#lang', unless you count ‘it's incompatible’ as > an answer. I just looked on the web and it appears that #! as first line is a comment in JavaScript also. https://developer.mozilla.org/en-US/docs/Web/JavaScript/Reference/Lexical_grammar#hashbang_comments ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-18 19:56 ` Matt Wette @ 2023-02-21 12:09 ` Dr. Arne Babenhauserheide 0 siblings, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-21 12:09 UTC (permalink / raw) To: Matt Wette; +Cc: guile-devel Hi Matt, Please tell me once you know exactly which patch you need a WIP branch for. Best wishes, Arne -- Being apolitical means being political without noticing it. draketo.de ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-18 15:58 ` Maxime Devos 2023-02-18 19:56 ` Matt Wette @ 2023-02-26 7:45 ` Philip McGrath 2023-02-26 15:42 ` Maxime Devos 2023-10-02 14:59 ` Christine Lemmer-Webber 1 sibling, 2 replies; 56+ messages in thread From: Philip McGrath @ 2023-02-26 7:45 UTC (permalink / raw) To: Maxime Devos, Ludovic Courtès, Matt Wette, guile-devel Cc: Christine Lemmer-Webber Hi, On Sat, Feb 18, 2023, at 10:58 AM, Maxime Devos wrote: > On 18-02-2023 04:50, Philip McGrath wrote: >> I haven't read the patch or this thread closely, > > I'll assume you have read it non-closely. > >> but R6RS has an answer to any concerns about compatibility with `#lang`. At the beginning of Chapter 4, "Lexical and Datum Syntax" (<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) the report specifies: >> >>> An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax `#!<identifier>`, for any <identifier> (see section 4.2.4) that is not `r6rs`, as a syntax violation, and it may use specific `#!`-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax. The syntax `#!r6rs` may be used to signify that the input afterward is written with the lexical syntax and datum syntax described by this report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3. > > That is for '#!lang', not '#lang'. R6RS allows the former, but the > patch does the latter. As such, R6RS does not have an answer about > incompatibility with `#lang', unless you count ‘it's incompatible’ as an > answer. > Let me try to be more concrete. If you want a portable, RnRS-standardized lexical syntax for `#lang`, use `#!<identifier>`, and systems that understand `#lang` will treat it (in appropriate contexts) as an alias for `#lang `. 
Alternatively, you could embrace that Guile (like every other Scheme system I'm aware of) starts by default in a mode with implementation-specific extensions. Indeed, R6RS Appendix A specifically recognizes that "the default mode offered by a Scheme implementation may be non-conformant, and such a Scheme implementation may require special settings or declarations to enter the report-conformant mode" [1]. Then you could just write `#lang` and worry about the non-portable block comments some other day. This is what I would personally prefer. >> In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) [...] > (Guile does not handle `#!r6rs` properly, presumably because of the > legacy `#!`/`!#` block comments. I think this should be a surmountable > obstacle, though, especially since Guile does support standard `#|`/`|#` > block comments.) > > ‘#! ... !#’ comments aren't legacy; they exist to allow putting the > shebang in the first line of a script, and to pass additional arguments > to the Guile interpreter (see: (guile)The Top of a Script File) (*). As > such, you can't just replace them with #| ... |# (unless you patch the > kernel to recognise "#| ..." as a shebang line). > > (*) Maybe they exist for other purposes too. According to "(guile)Block Comments", the `#!...!#` syntax existed before Guile 2.0 added support for `#|...|#` comments from SRFI 30 and R6RS. > > Furthermore, according to the kernel, #!r6rs would mean that the script > needs to be interpreted by a program named 'r6rs', but 'guile' is named > 'guile', not 'r6rs'. (I assume this is in POSIX somewhere, though I > couldn't find it.) > > (This is an incompatibility between R6RS and any system that has shebangs.) 
> This is not an incompatibility, because the `#!r6rs` lexeme (or `#!<identifier>`, more generally) is not the shebang line for the script. R6RS Appendix D [2] gives this example of a Scheme script:

```
#!/usr/bin/env scheme-script
#!r6rs
(import (rnrs base)
        (rnrs io ports)
        (rnrs programs))
(put-bytes (standard-output-port)
           (call-with-port
               (open-file-input-port
                 (cadr (command-line)))
             get-bytes-all))
```

The appendix says that, "if the first line of a script begins with `#!/` or `#!<space>`, implementations should ignore it on all platforms, even if it does not conform to the recommended syntax". Admittedly this is not handled as consistently as I would prefer: I wish they had just standardized `#!/` and `#! ` as special comment syntax, as Racket does, and clarified the interaction with `#!<identifier>`. But Matt points out that JavaScript also has very similar special treatment for a single initial shebang comment. Lua has a similar mechanism: my vague recollection is that many languages do. >>> >>> (^) it doesn't integrate with the module system -- more concretely, >>> (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments >>> would solve that, but we agree that that would be unreasonable in many >>> situations. (Alternatively one could place ECMAScript code in a file >>> with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but >>> ... no.) Generally I would use `.scm` (or `.rkt`), and certainly I would do so if there isn't some well-established other extension. If you are just using the file, you shouldn't necessarily have to care what language it's implemented in internally. In particular, I don't think the `#lang` concept should be conflated with editor configuration like `-*- mode: ecmascript -*-`. As an example, consider these two Racket programs:

```
#!datalog
parent(anchises, aeneas).
parent(aeneas, ascanius).
ancestor(A, B) :- parent(A, B).
ancestor(A, B) :- parent(A, C), ancestor(C, B).
ancestor(A, ascanius)?
```

```
#lang algol60
begin
  comment Credit to Rosetta Code;
  integer procedure fibonacci(n); value n; integer n;
  begin
    integer i, fn, fn1, fn2;
    fn2 := 1;
    fn1 := 0;
    fn := 0;
    for i := 1 step 1 until n do begin
      fn := fn1 + fn2;
      fn2 := fn1;
      fn1 := fn
    end;
    fibonacci := fn
  end;

  integer i;
  for i := 0 step 1 until 20 do printnln(fibonacci(i))
end
```

While I'm sure there are Emacs modes available for Datalog and Algol 60, and some people might want to use them for these programs, I would probably want to edit them both in racket-mode: because racket-mode supports the `#lang` protocol, it can obtain the syntax highlighting, indentation, and other support defined by each language, while also retaining the global features that all `#lang`-based languages get "for free", like a tool to rename variables that respects the actual model of scope. This is one of the value propositions of the `#lang` system. >> >> Racket has a mechanism to enable additional source file extensions without needing explicit command-line arguments by defining `module-suffixes` or `doc-modules-suffixes` in a metadata module that is consulted when the collection is "set up": https://docs.racket-lang.org/raco/setup-info.html However, this mechanism is not widely used. > > I guess this is an improvement over the runtime 'guile -x extension'. > However, if I'm understanding 'setup-info.html' correctly, the downside > is that you now need a separate file containing compilation settings. > > I have previously proposed a mechanism that makes the '-x' + > '--language' a compile-time thing (i.e., embed the source file extension > in the compiled .go; see previous e-mails in this thread), without > having to make a separate file containing compilation settings. > > How is Racket's method an improvement over my proposal? > My focus in this thread is explaining and advocating for `#lang`.
I see the whole business with file extensions as basically orthogonal to `#lang`, and my opinions about it are much less strong, but I'll try to answer your question. I think it would make sense for `.go` files to record the file extension of their corresponding source files: Racket's `.zo` files do likewise. I don't object to a command-line option *at compile-time* (as you said) to enable additional file extensions, and I agree that there isn't a huge difference between that and an approach with a separate configuration file, though I do find the configuration-file approach somewhat more declarative, which I prefer. What I was really trying to argue here is that the file extension should not determine the meaning of the program it contains: more on that below. >> Overall, the experience of the Racket community strongly suggests that a file should say what language it is written in. Furthermore, that language is a property of the code, not of its runtime environment, so environment variables, command-line options, and similar extralinguistic mechanism are a particularly poor fit for controlling it. > > Agreed on the 'no environment variables' thing, disagreed on the 'no > command-line options'. In the past e-mails in this thread, there was > agreement on the ‘embed the source file extension in the compiled .go or > something like that; and add -x extension stuff _when compiling_ (not > runtime!) the software that uses the extension’. > > Do you any particular issues with that proposal? AFAICT, it solves > everything and is somewhat more straightforward that Racket. > I don't have particular issues with a compile-time command-line option to determine which files to compile. I do object to using command-line options or file extensions to determine what language a file is written in. >> File extensions are not the worst possible mechanisms, but they have similar problems: code written in an unsaved editor or a blog post may not have a file extension. 
> > With the proposal I wrote, it remains possible to override any 'file > extension -> language' mapping. It's not in any way incompatible with > "-*- lang: whatever -*-"-like comments. > > Additionally, Guile can only load files that exist (i.e, 'saved'); Guile > is not an editor or blog reader, so these do not appear problems for > Guile to me. > While it's true that the only files Guile can load are "files that exist", it's not true that "Guile can only load files": consider procedures like `eval-string`, `compile`, and, ultimately, `read-syntax`. AFAICT, to the extent that Guile's current implementations of such procedures support multiple languages, they rely on out-of-band configuration, like an optional `#:language` argument, which is just as extra-linguistic as relying on command-line options, environment variables, or file extensions. What I'm trying to advocate is that programs should say in-band, as part of their source code, what language they are written in. > If the editor needs to determine the language for syntax highlighting or > such, then there exist constructs like ';; -*- mode: scheme -*-' that > are valid Scheme, but that's not a Guile matter. > See above for why the `#!language/wisp` option is perfectly valid R6RS Scheme and for some of my concerns about overloading editor configuration to determine the semantics of programs. More broadly, everyone who reads a piece of source code, including humans as well as editors and the `guile` executable, needs to know what language it's written in to hope to understand it. >> (For more on this theme, see the corresponding point of the Racket Manifesto: <https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-racket-manifesto/paper.pdf>) Actually writing the language into the source code has proven to work well. > > What is the corresponding point? I'm not finding any search results for > 'file extension' or 'file name', and I'm not finding any relevant search > results for 'editor'. 
Could you give me a page reference and a relevant > quote? > I was trying to refer to section 5, "Racket Internalizes Extra-Linguistic Mechanisms", which begins on p. 121 (p. 9 of the PDF). Admittedly, the connection between the main set of examples they discuss and this conversation is non-obvious. Maybe the most relevant quote is the last paragraph of that section, on p. 123 (PDF p. 11): "Finally, Racket also internalizes other aspects of its context. Dating back to the beginning, Racket programs can programmatically link modules and classes. In conventional languages, programmers must resort to extra-linguistic tools to abstract over such linguistic constructs; only ML-style languages and some scripting languages make modules and classes programmable, too." (Internal citations omitted.) >> To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>): >> Sorry, this was meant to be tongue-in-cheek, and it seems that didn't come across. "Argument from authority" is often considered a category of logical fallacy, and ending with a quote is sometimes considered to be bad style or to weaken a piece of persuasive writing. > * I previously pointed out some problems with that proposal > -- i.e., '#lang whatever' is bogus Scheme / Wisp / ..., I hope I've explained why something like `#!language/wisp` is perfectly within the bounds of R6RS. Also, given that Guile already starts with non-standard extensions enabled by default, I don't see any reason not to also support `#lang language/wisp`. In particular, the spelling of `#lang` proceeds directly from the Scheme tradition. 
This is from the R6RS Rationale document, chapter 4, "Lexical Syntax", section 3, "Future Extensions" [3]: >>>> The `#` is the prefix of several different kinds of syntactic entities: vectors, bytevectors, syntactic abbreviations related to syntax construction, nested comments, characters, `#!r6rs`, and implementation-specific extensions to the syntax that start with `#!`. In each case, the character following the `#` specifies what kind of syntactic datum follows. In the case of bytevectors, the syntax anticipates several different kinds of homogeneous vectors, even though R6RS specifies only one. The `u8` after the `#v` identifies the components of the vector as unsigned 8-bit entities or octets. > and > 'the module system won't find it, because of the unexpected > file extensions'. > This is indeed something that needs to be addressed, but it seems like a very solvable problem. Using the extension ".scm" for everything would be one trivial solution. Something like your proposal to enable file extensions based on a compile-time option could likewise be part of a solution. In general, I'll say that, while using Guile, I've often missed Racket's more flexible constructs for importing modules. I especially miss `(require "foo/bar.rkt")`, which imports a module at a path relative to the module where the `require` form appears: it makes it easy to organize small programs into multiple files without having to mess with a load path. More messages have come since I started writing this reply, so I'll try to address them, too. On Thu, Feb 23, 2023, at 1:04 PM, Maxime Devos wrote: > On 23-02-2023 09:51, Dr. Arne Babenhauserheide wrote: >>> Thinking a bit more about it, it should be possible to special-case >>> Guile's interpretation of "#!" such that "#!r6rs" doesn't require a >>> closing "!#". (Technically backwards-incompatible, but I don't think >>> people are writing #!r6rs ...!# in the wild.) >> Do you need the closing !# if you restrict yourself to the first line? 
> > I thought so at first, but doing a little experiment, it appears you > don't need to: > > $ guile > scheme@(guile-user)> #!r6rs > (display "hi") (newline) > > (output: hi) > > Apparently Guile already has the required behaviour. > All the `#!r6rs` examples I've tried since I got Ludo's mail have worked, but I remember some not working as I'd expected in the past. I'll see if I can come up with any problematic examples again. On Thu, Feb 23, 2023, at 1:42 PM, Maxime Devos wrote: > Have you seen my messages on how the "#lang" construct is problematic > for some languages, and how alternatives like "[comment delimiter] -*- > stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be > equally simple (*) and not have any downsides (**). > > (*) The port encoding detection supports "-*- coding: whatever -*-", > presumably that functionality could be reused. > IMO, the use of "-*- coding: whatever -*-" to detect encoding is an ugly hack and should not be extended further. I tried to raise some objections above to conflating editor configuration with syntax saying what a file's language is. More broadly, I find "magic comments" highly objectionable. The whole point of comments is to be able to communicate freely to human readers without affecting the interpreter/compiler/evaluator. Introducing magic comments means you must constantly think about whether what you are writing for humans might change the meaning of your program. Magic comments *without knowing a priori what is a comment* are even worse: now, you have to beware of accidental "magic" in ALL of the lexical syntax of your program. (Consider that something like `(define (-*- mode: c++ -*-) 14)` is perfectly good Scheme.) (It's not really relevant for the `#lang`-like case, but something I find especially ironic about encoding "magic comments" or, say, `<?xml version="1.0" encoding="UTF-8"?>`, is that suddenly if you encode the Unicode text in some other encoding it becomes a lie.)
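To make the `(define (-*- mode: c++ -*-) 14)` hazard concrete, here is a minimal sketch in Python (chosen only for illustration; it is not Guile's actual port-encoding or language-detection code). The `MAGIC` pattern and the `guess_from_magic` helper are hypothetical names, and the scanner is deliberately naive: it assumes the detector looks for `-*- key: value -*-` on the first line without knowing what counts as a comment in the language being read.

```python
import re

# Hypothetical, deliberately naive detector: it scans the first line for
# "-*- key: value -*-" without knowing the comment syntax of the language.
MAGIC = re.compile(r"-\*-\s*([\w-]+)\s*:\s*([^\s;]+)\s*-\*-")


def guess_from_magic(source):
    """Return (key, value) from a first-line "-*- ... -*-" marker, or None."""
    lines = source.splitlines()
    if not lines:
        return None
    match = MAGIC.search(lines[0])
    return (match.group(1), match.group(2)) if match else None


# A marker placed in a real Scheme comment, as an author would intend.
intended = ';; -*- mode: scheme -*-\n(display "hi")\n'
# The definition from the paragraph above: the same pattern, but inside code.
accidental = "(define (-*- mode: c++ -*-) 14)\n"

print(guess_from_magic(intended))
print(guess_from_magic(accidental))
```

Both calls report a match, so a detector of this shape cannot tell the deliberate marker from the accidental one unless it already knows the lexical syntax, which is exactly the information it was supposed to discover.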
On Fri, Feb 24, 2023, at 6:51 PM, Maxime Devos wrote: > On 25-02-2023 00:48, Maxime Devos wrote: >>>> (**) For compatibility with Racket, it's not like we couldn't >>>> implement both "#lang" and "-*- stuff: language -*-". > > TBC, I mean ‘only support #lang' for values of 'lang' that Racket > supports’ If I understand what you're proposing here, I don't think it's a viable option. The fundamental purpose of the `#lang` construct (however you spell it) is to provide an open, extensible protocol for defining languages. Thus, "values of 'lang' that Racket supports" are unbounded, provided that a module has been installed where the language specification says to look. From The Racket Reference [4]: >>>> The `#lang` reader form is similar to `#reader`, but more constrained: the `#lang` must be followed by a single space (ASCII 32), and then a non-empty sequence of alphanumeric ASCII, `+`, `-`, `_`, and/or `/` characters terminated by whitespace or an end-of-file. The sequence must not start or end with `/`. A sequence `#lang ‹name›` is equivalent to either `#reader (submod ‹name› reader)` or `#reader ‹name›/lang/reader`, where the former is tried first guarded by a `module-declared?` check (but after filtering by `current-reader-guard`, so both are passed to the value of `current-reader-guard` if the latter is used). Note that the terminating whitespace (if any) is not consumed before the external reading procedure is called. >>>> >>>> Finally, `#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`. Use of this alias is discouraged except as needed to construct programs that conform to certain grammars, such as that of R6RS [Sperber07]. (The rationale for the constraints, which Racketeers generally tend to chafe against, is that the syntax of `#lang‹name›` is the one and only thing that `#lang` doesn't give us a way to compatibly change. 
We can quickly get to a less constrained syntax by using a chaining "meta-language": see `#lang s-exp` and `#lang reader` on that page for two of many examples.) I expect reading this would raise more questions, because that page gives lots of details on Racket's `#lang` protocol. Do I really expect Guile to implement all of those details? If not, in what sense is what I'm advocating actually compatible with `#lang`? I am definitely **not** suggesting that Guile implement all the details of Racket's `#lang` implementation. What I do strongly advocate is that you design Guile's support for `#lang` (or `#!`) to leave open a pathway for compatibility in the future. I think the best way to explain how that would work is to take as an extended example Zuo, the tiny Scheme-like language created last year to replace the build scripts for Racket and Racket's branch of Chez Scheme. Zuo was initially prototyped in Racket as a `#lang` language. Since the goal was to use Zuo to build Racket, the primary implementation is an interpreter implemented in a single file of C code, avoiding bootstrapping issues. There isn't a working Zuo implementation as a Racket `#lang` at the moment. (There's a shim implementation, and there's some work in progress, as people have time and interest, to get a real implementation working again.) Zuo is based on `#lang`, but its protocol [5][6] is quite different than Racket's. Nevertheless, as I will explain, they are compatible. The C code in fact implements not `#lang zuo` or even `#lang zuo/base` but `#lang zuo/kernel`: the rest of `#lang zuo` is implemented in Zuo, building up to `#lang zuo` through a series of internal languages. A module written in `#lang zuo/kernel` is a single expression which produces an immutable symbol-keyed hash table, which is Zuo's core representation of a module.
When Zuo encounters `#lang whatever`, it looks up the symbol `'read-and-eval` in the hash table representing the module `whatever`: the result should be a procedure that, given a Zuo string (a Scheme bytevector) with the source of the module, returns a hash table to be used as the module's representation. An implementation of `#lang zuo/kernel` in Racket would bridge this protocol with Racket's `#lang` by synthesizing `reader` submodules implementing the procedures the Racket protocol expects by wrapping the procedure mapped to `'read-and-eval` in the Zuo-level hash table. The wrappers would propagate themselves, so a language implemented in a language implemented in `#lang zuo/kernel` would likewise be automatically bridged, and so on ad infinitum. Racket's submodules [7] make this work especially elegantly. In Guile, my experience with the tower of languages is limited, but AIUI many of the existing facilities are like `lookup-language` [8] in expecting language X to be implemented by a language object bound to X in the module `(language X spec)`. I'd suggest that Guile support `#lang language/X` (or `#!language/X`, if you prefer to spell it that way) by likewise looking up X in the `(language X spec)` module. One day, compatibility could be achieved by adding trivial bridge (sub)modules: for an illustration of how trivial this can be, see [9], a one-line module that makes SRFI 11 available as `(import (srfi :11))` for R6RS by wrapping its historical PLT Scheme location, `(require srfi/11)`. I would NOT suggest supporting arbitrary things after `#lang`, because one part of planning for compatibility is avoiding future namespace collisions. Happily, `language/` is not otherwise in use in the Racket world, so I suggest that Guile claim it.
I don't think this should be overly restrictive: if it seems worth-while to support languages from other modules, you could implement the "chaining meta-language" approach I mentioned above: imagine something like `#!language/other (@ (some other module) exported-language)`, where the `other` export of `(language other spec)` is responsible for reading the next datum and using it to obtain the language object to be used for the rest of the module. (Other kinds of potential namespace collisions are easier to manage: for example, we could imagine that `(use-modules (foo bar baz))` might not access the same module as `(require foo/bar/baz)`. This is in a way an example of where it makes sense to be constrained in the syntax of `#lang` itself and let `#lang` unlock endless possibilities.) I've sort of alluded above to my pipe dream of a grand unified future for Racket-and-Guile-on-Chez, Guile-and-Racket-on-the-Guile-VM, and endless other possibilities. I wrote about it in more detail on the guix-devel list at [10]. (These thoughts were inspired by conversations with Christine Lemmer-Webber, though she bears no responsibility for my zany imaginings.) Finally, I looked into the history of `#!` in R6RS a bit, and I'll leave a few pointers here for posterity. Will Clinger's 2015 Scheme Workshop paper [11] says in section 3.1 that "Kent Dybvig suggested the `#!r6rs` flag in May 2006", Clinger "formally proposed addition of Dybvig’s suggestion" [12], and, "less than six weeks later," `#!r6rs` was "in the R6RS editors’ status report". (I am not persuaded by all of the arguments about `#!r6rs` in that paper: in particular, the analysis doesn't seem to account for R6RS Appendix A [1].) 
As best as I can tell, the suggestion from Kent Dybvig is [13]: On Wed May 10 15:40:13 EDT 2006, Kent Dybvig wrote: > We already have (as of last week's meeting) a syntax for dealing with > implementation-dependent lexical exceptions, which is to allow for > #!<symbol-like-thing>, e.g.: > > #!mzsceheme > #!larceny > ... > > Perhaps we can plan on using the same tool for future extensions to the > syntax: > > #!r7rs > > We can even require #!r6rs to appear at the top of a library now, or at > least allow it to be included. > > This is a lot more concise than a MIME content-type line. > > Kent I haven't tracked down any older writing about `#!<symbol-like-thing>` for "implementation-dependent lexical exceptions": it may have been a conference call. -Philip [1]: http://www.r6rs.org/final/html/r6rs-app/r6rs-app-Z-H-3.html#node_chap_A [2]: http://www.r6rs.org/final/html/r6rs-app/r6rs-app-Z-H-6.html#node_chap_D [3]: http://www.r6rs.org/final/html/r6rs-rationale/r6rs-rationale-Z-H-6.html#node_chap_4 [4]: https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29 [5]: https://docs.racket-lang.org/zuo/Zuo_Overview.html#%28part._.Zuo_.Implementation_and_.Macros%29 [6]: https://docs.racket-lang.org/zuo/Zuo_Overview.html#%28part._module-protocol%29 [7]: https://www-old.cs.utah.edu/plt/publications/gpce13-f-color.pdf [8]: https://www.gnu.org/software/guile/manual/html_node/Compiler-Tower.html#index-lookup_002dlanguage [9]: https://github.com/racket/srfi/blob/25eb1c0e1ab8a1fa227750aa7f0689a2c531f8c8/srfi-lib/srfi/%253a11.rkt [10]: https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00010.html [11]: https://andykeep.com/SchemeWorkshop2015/papers/sfpw1-2015-clinger.pdf [12]: http://www.r6rs.org/r6rs-editors/2006-May/001251.html [13]: http://www.r6rs.org/r6rs-editors/2006-May/001248.html ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 7:45 ` Philip McGrath @ 2023-02-26 15:42 ` Maxime Devos 2023-02-26 16:14 ` Dr. Arne Babenhauserheide 2023-02-26 17:58 ` Matt Wette 2023-10-02 14:59 ` Christine Lemmer-Webber 1 sibling, 2 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-26 15:42 UTC (permalink / raw) To: Philip McGrath, Ludovic Courtès, Matt Wette, guile-devel Cc: Christine Lemmer-Webber [-- Attachment #1.1.1: Type: text/plain, Size: 37164 bytes --] Op 26-02-2023 om 08:45 schreef Philip McGrath: > Hi, > > On Sat, Feb 18, 2023, at 10:58 AM, Maxime Devos wrote: >> On 18-02-2023 04:50, Philip McGrath wrote: >>> I haven't read the patch or this thread closely, >> >> I'll assume you have read it non-closely. >> >>> but R6RS has an answer to any concerns about compatibility with `#lang`. At the beginning of Chapter 4, "Lexical and Datum Syntax" (<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) the report specifies: >>> >>>> An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax `#!<identifier>`, for any <identifier> (see section 4.2.4) that is not `r6rs`, as a syntax violation, and it may use specific `#!`-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax. The syntax `#!r6rs` may be used to signify that the input afterward is written with the lexical syntax and datum syntax described by this report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3. >> >> That is for '#!lang', not '#lang'. R6RS allows the former, but the >> patch does the latter. As such, R6RS does not have an answer about >> incompatibility with `#lang', unless you count ‘it's incompatible’ as an >> answer. >> > > Let me try to be more concrete. 
> > If you want a portable, RnRS-standardized lexical syntax for `#lang`, use `#!<identifier>`, and systems that understand `#lang` will treat it (in appropriate contexts) as an alias for `#lang `. RnRS only standardises #!r6rs, not #!<identifier>. Even if RnRS standardised #!<identifier> for values of <identifier> that aren't rnrs, the RnRS only holds sway for Scheme, and one of the main points of Guile's language system is to support more than only Scheme. > > Alternatively, you could embrace that Guile (like every other Scheme system I'm aware of) starts by default in a mode with implementation-specific extensions. Indeed, R6RS Appendix A specifically recognizes that "the default mode offered by a Scheme implementation may be non-conformant, and such a Scheme implementation may require special settings or declarations to enter the report-conformant mode" [1]. Then you could just write `#lang` and worry about the non-portable block comments some other day. This is what I would personally prefer. Emphasis on 'non-conformant'. The appendix states that Scheme implementations don't need to be R6RS by default; it doesn't state that non-conformant things are conformant with R6RS. Remember that this part of the discussion started with: ‘The '#lang whatever' stuff makes Scheme (*) files unportable between implementations, as '#lang scheme' is not a valid comment’. The R6RS might permit non-R6RS implementations, but this does not make non-R6RS constructs like '#lang scheme' portable. >>> In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) [...] > (Guile does not handle `#!r6rs` properly, presumably because of the >> legacy `#!`/`!#` block comments.
I think this should be a surmountable >> obstacle, though, especially since Guile does support standard `#|`/`|#` >> block comments.) >> >> ‘#! ... !#’ comments aren't legacy; they exist to allow putting the >> shebang in the first line of a script, and to pass additional arguments >> to the Guile interpreter (see: (guile)The Top of a Script File) (*). As >> such, you can't just replace them with #| ... |# (unless you patch the >> kernel to recognise "#| ..." as a shebang line). >> >> (*) Maybe they exist for other purposes too. > > According to "(guile)Block Comments", the `#!...!#` syntax existed before Guile 2.0 added support for `#|...|#` comments from SRFI 30 and R6RS. I agree, and I don't follow what your point is here. >> Furthermore, according to the kernel, #!r6rs would mean that the script >> needs to be interpreted by a program named 'r6rs', but 'guile' is named >> 'guile', not 'r6rs'. (I assume this is in POSIX somewhere, though I >> couldn't find it.) >> >> (This is an incompatibility between R6RS and any system that has shebangs.) >> > > This is not an incompatibility, because the `#!r6rs` lexeme (or `#!<identifier>`, more generally) is not the shebang line for the script. R6RS Appendix D [2] gives this example of a Scheme script: > > ``` > #!/usr/bin/env scheme-script > #!r6rs > (import (rnrs base) > (rnrs io ports) > (rnrs programs)) > (put-bytes (standard-output-port) > (call-with-port > (open-file-input-port > (cadr (command-line))) > get-bytes-all)) > ``` OK, didn't notice that appendix. Only covers Scheme, though. > -- > The appendix says that, "if the first line of a script begins with `#!/` or `#!<space>`, implementations should ignore it on all platforms, even if it does not conform to the recommended syntax". Admittedly this is not handled as consistently as I would prefer: I wish they had just standardized `#!/` and `#! ` as special comment syntax, as Racket does, and clarified the interaction with `#!<identifier>`. 
But Matt points out that JavaScript also has very similar special treatment for a single initial shebang comment. Lua has a similar mechanism: my vague recollection is that many languages do. I do not follow what your point is here -- I only (falsely) claimed that POSIX and R6RS are incompatible w.r.t. shebangs and "#!"; I did not make such claims for other languages -- some other languages don't even have "#!" (e.g. BASIC). >>>> >>>> (^) it doesn't integrate with the module system -- more concretely, >>>> (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments >>>> would solve that, but we agree that that would be unreasonable in many >>>> situations. (Alternatively one could place ECMAScript code in a file >>>> with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but >>>> ... no.) > > Generally I would use `.scm` (or `.rkt`), and certainly I would do so if there isn't some well-established other extension. If you are just using the file, you shouldn't necessarily have to care what language it's implemented in internally. Maybe you would, but Guile shouldn't require people to change the extension of source files to something invalid, as I pointed out with the ECMAScript example. .scm means Scheme, not ECMAScript. As such, support for non-.scm file extensions is required. > In particular, I don't think the `#lang` concept should be conflated with editor configuration like `-*- mode: ecmascript -*-`. > [...] Then don't do that, and use non-editor configuration like '-*- programming-language: ecmascript -*-' instead. While Emacs is the main user of '-*- ... -*-' lines, there is nothing stopping us from adding a few variables like e.g. 'programming-language' (*) that Emacs doesn't assign a meaning to. (*) I don't actually know if Emacs assigns a meaning to this variable or not. Some other word might perhaps be needed. For convenience, I would recommend supporting '-*- mode: ...
-*-' too, such that non-Scheme source files can sometimes be loaded without making any Guile-specific changes to the source files. If whoever writes or reads the source file wants to use another Emacs mode, or if the mode is ambiguous because it covers multiple languages, there is nothing stopping them from setting both 'mode: ...' and 'programming-language: ...': % -*- language: datalog; mode: racket -*- [...] As an example, consider these two Racket programs: > > ``` > #!datalog > parent(anchises, aeneas). > parent(aeneas, ascanius). > ancestor(A, B) :- parent(A, B). > ancestor(A, B) :- parent(A, C), ancestor(C, B). > ancestor(A, ascanius)? > ``` > > ``` > #lang algol60 > begin > comment Credit to Rosetta Code; > integer procedure fibonacci(n); value n; integer n; > begin > integer i, fn, fn1, fn2; > fn2 := 1; > fn1 := 0; > fn := 0; > for i := 1 step 1 until n do begin > fn := fn1 + fn2; > fn2 := fn1; > fn1 := fn > end; > fibonacci := fn > end; > > integer i; > for i := 0 step 1 until 20 do printnln(fibonacci(i)) > end > ``` > > While I'm sure there are Emacs modes available for Datalog and Algol 60, and some people might want to use them for these programs, I would probably want to edit them both in racket-mode: because racket-mode supports the `#lang` protocol, it can obtain the syntax highlighting, indentation, and other support defined by each language, while also retaining the global features that all `#lang`-based languages get "for free", like a tool to rename variables that respects the actual model of scope. This is one of the value propositions of the `#lang` system. As pointed out by my previous example, this is solved by '-*- ... -*-' too.
> >>> >>> Racket has a mechanism to enable additional source file extensions without needing explicit command-line arguments by defining `module-suffixes` or `doc-modules-suffixes` in a metadata module that is consulted when the collection is "set up": https://docs.racket-lang.org/raco/setup-info.html However, this mechanism is not widely used. >> >> I guess this is an improvement over the runtime 'guile -x extension'. >> However, if I'm understanding 'setup-info.html' correctly, the downside >> is that you now need a separate file containing compilation settings. >> >> I have previously proposed a mechanism that makes the '-x' + >> '--language' a compile-time thing (i.e., embed the source file extension >> in the compiled .go; see previous e-mails in this thread), without >> having to make a separate file containing compilation settings. >> >> How is Racket's method an improvement over my proposal? >> > > My focus in this thread is explaining and advocating for `#lang`. I see the whole business with file extensions as basically orthogonal to `#lang`, and my opinions about it are much less strong, but I'll try to answer your question. I think it would make sense for `.go` files to record the file extension of their corresponding source files: Racket's `.zo` files do likewise. I don't object to a command-line option *at compile-time* (as you said) to enable additional file extensions, and I agree that there isn't a huge difference between that and an approach with a separate configuration file, though I do find the configuration-file approach somewhat more declarative, which I prefer. '--language whatever' appears pretty declarative to me, as in it declares that the language is 'whatever'. > What I was really trying to argue here is that the file extension should not determine the meaning of the program it contains: more on that below. That's what the '--language whatever' compilation argument is for: it overrides the 'guess by file extension' fallback. 
>>> Overall, the experience of the Racket community strongly suggests that a file should say what language it is written in. Furthermore, that language is a property of the code, not of its runtime environment, so environment variables, command-line options, and similar extralinguistic mechanisms are a particularly poor fit for controlling it. >> >> Agreed on the 'no environment variables' thing, disagreed on the 'no >> command-line options'. In the past e-mails in this thread, there was >> agreement on the ‘embed the source file extension in the compiled .go or >> something like that; and add -x extension stuff _when compiling_ (not >> runtime!) the software that uses the extension’. >> >> Do you have any particular issues with that proposal? AFAICT, it solves >> everything and is somewhat more straightforward than Racket. >> > > I don't have particular issues with a compile-time command-line option to determine which files to compile. I do object to using command-line options or file extensions to determine what language a file is written in. > >>> File extensions are not the worst possible mechanisms, but they have similar problems: code written in an unsaved editor or a blog post may not have a file extension. >> >> With the proposal I wrote, it remains possible to override any 'file >> extension -> language' mapping. It's not in any way incompatible with >> "-*- lang: whatever -*-"-like comments. >> >> Additionally, Guile can only load files that exist (i.e, 'saved'); Guile >> is not an editor or blog reader, so these do not appear problems for >> Guile to me. >> > > While it's true that the only files Guile can load are "files that exist", it's not true that "Guile can only load files": consider procedures like `eval-string`, `compile`, and, ultimately, `read-syntax`. * read-syntax is for reading S-expressions -- it is only for Scheme, other languages are out-of-scope for that procedure. As such, read-syntax appears irrelevant here to me.
* For 'compile' and 'eval-string', I'd like to point out that they have "#:from" and "#:lang" arguments to set the language, as you appear to know going by your responses below. As such, even if Guile had an integrated editor, that editor could pass the language to Guile's compiler. I mean, if the editor is good, it has syntax highlighting, and to do syntax highlighting it needs to know the language, so it knows the language anyway (e.g. maybe it has separate "Write new Scheme" and "Write new ECMAScript" buttons, or maybe it has a 'mode: scheme' and 'mode: ecmascript' like Emacs and being an editor, it then knows how to convert that editor configuration into #:from/#:lang). * What I meant with 'Guile can only load files that exist', is that the files it loads are only those that exist. I did not mean that no loadable non-file things exist. The point here is that if you wrote a blog post that defines the (foo) module and you enter (use-modules (foo)) in a Guile REPL, it isn't going to surf to your blog to download the (foo) module. As Guile doesn't even know about your blog post, it has no use for any file extension or language declaration that your blog post about (foo) might or might not have. > AFAICT, to the extent that Guile's current implementations of such procedures support multiple languages, they rely on out-of-band configuration, like an optional `#:language` argument, which is just as extra-linguistic as relying on command-line options, environment variables, or file extensions. First, I never proposed relying on environment variables. I oppose using environment variables for these things. Why are you mentioning environment variables, when this has never been proposed? Second, the implicit argument here appears to be 'extra-linguistic is bad, so we shouldn't do these extra-linguistic things'. But what's the problem with being 'extra-linguistic'?
Some stuff like environment variables is plain bad here (no disagreement here), file extensions are bad to rely on but acceptable and convenient as a fallback. Third, I am not proposing to rely on command line options and file extensions -- I only propose _using_ them, not _relying_ on them -- if someone wants to implement an in-band (intra-linguistic?) override like '-*- ... -*-'/#lang for file-extension based detection, they can do that -- my '-*- ... -*-' is just a proposed improvement over '#lang'. Fourth, TBC, I'd like to point out that '-*- ... -*-' is as 'intra/extra-linguistic' as '#!lang' (see my response to 'magic comments' later), though I do know that's not the point you appear to be making right here. > What I'm trying to advocate is that programs should say in-band, as part of their source code, what language they are written in. That's done by '-*- ... -*-' too, and I haven't noticed any argumentation for ‘programs should say in-band what language they are written in’. Also, there is a gap between the following five statements, which you appear to sometimes be conflating: (A) ‘Programs should say in-band what language they are written in.’ (B) ‘Guile should use in-band information to determine what language a program is written in.’ (C) ‘Guile should use out-of-band information to determine what language a program is written in.’ (D) ‘Guile should exclusively use out-of-band information to determine what language a program is written in.’ (E) ‘Guile should exclusively use in-band information to determine what language a program is written in.’ I disagree with (A), because often it's perfectly clear from context (out-of-band) what language it is. Take for example Guile itself. Being Guile, of course everything under 'modules/' is Scheme code. Adding '#!r6rs' or '-*- language: scheme -*-' lines to every .scm isn't incorrect, but is rather silly. Likewise, I have written a Scheme library called 'Scheme-GNUnet'. 
From the name alone, it is clear that it's Scheme. More generally, usually it's pretty clear (for a human) which language it is by just looking at the code, and if not, probably the README mentions which language the software uses. I don't dispute (B), but neither do I find it particularly important given that adding a '--language=whatever' argument is trivial. I would like to point out that (A) does not imply (B) -- it is possible to consider it good practice to mention the language in-band, without any language implementations actually using this information. More to the point, to me (A) appears irrelevant to this thread. Sure, perhaps it's a good practice, but Guile is not a programmer; Guile is a language implementation. (A) is only relevant insofar as Guile would make use of this in-band information. > What I'm trying to advocate is that programs should say in-band, as > part of their source code, what language they are written in. This would be advocating for (A). But as mentioned above, (A) is irrelevant by itself, and it doesn't imply (B). It is also false -- you weren't advocating for (A), but for (B) -- (A) is just a means to (B) in your argumentation structure. Quoting one of your first messages: > To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>): > >> On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that. You are also advocating for '(E)/not (C)': > I do object to using command-line options or file extensions to determine what language a file is written in. You also appear to be thinking that I'm advocating for '(D)' -- while I agree with (D) (using a non-universal language construct (*) like '#lang' to determine the language something is written in, is rather circular), I'm not arguing for it. 
(*) Again, #lang is rather Racket-specific, whereas comments are mostly universal. >> If the editor needs to determine the language for syntax highlighting or >> such, then there exist constructs like ';; -*- mode: scheme -*-' that >> are valid Scheme, but that's not a Guile matter. >> > > See above for why the `#!language/wisp` option is perfectly valid R6RS Scheme Wisp isn't R6RS. Wisp code needs to be valid Wisp, not valid R6RS Scheme. There also exist languages beyond Wisp and Scheme. > and for some of my concerns about overloading editor configuration to determine the semantics of programs. See above replies. > More broadly, everyone who reads a piece of source code, including humans as well as editors and the `guile` executable, needs to know what language it's written in to hope to understand it. For programmers, this is covered by: * looking at the code -- even without any explicit in-band information like ';; -*- ... -*-' comments or "#lang", or out-of-band information like file extensions, a README or Makefile with compilation, it usually is pretty clear what language it is in. * usually source code is in files, which usually has file extensions. Usually there's a good map file extension->language, e.g. .scm files only contain Scheme, .js only contains ECMAScript, ... For editors, this is covered by: * Editor configuration like '-*- mode: scheme -*-'. * Language-specific declarations like #lang, #!r6rs, '-*- programming-language: scheme -*-' * File extensions. * If the editor guessed wrong, likely the syntax highlighting is wrong etc., so the programmer gives a hint to the editor (e.g. by adding a -*- mode: scheme -*- line, or #!r6rs, ...) For the Guile executable, this is covered by: * --language=.../#:from/#:lang arguments. * -*- ... -*- / #!r6rs lines (but not #lang except when needed for compatibility with Racket, otherwise Guile would create incompatibilities.) * File extensions. * Default to Scheme. 
* If guessing wrong, there will almost surely be some parsing error, in which case the programmer will intervene by modifying a single line in the Makefile or such to add "--language=" line, or if they per se want to spend much more time than needed, add "-*- programming-language: whatever -*-" comments to every single source file. >>> (For more on this theme, see the corresponding point of the Racket Manifesto: <https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-racket-manifesto/paper.pdf>) Actually writing the language into the source code has proven to work well. >> >> What is the corresponding point? I'm not finding any search results for >> 'file extension' or 'file name', and I'm not finding any relevant search >> results for 'editor'. Could you give me a page reference and a relevant >> quote? >> > > I was trying to refer to section 5, "Racket Internalizes Extra-Linguistic Mechanisms", which begins on p. 121 (p. 9 of the PDF). Admittedly, the connection between the main set of examples they discuss and this conversation is non-obvious. Maybe the most relevant quote is the last paragraph of that section, on p. 123 (PDF p. 11): "Finally, Racket also internalizes other aspects of its context. Dating back to the beginning, Racket programs can programmatically link modules and classes. In conventional languages, programmers must resort to extra-linguistic tools to abstract over such linguistic constructs; only ML-style languages and some scripting languages make modules and classes programmable, too." (Internal citations omitted.) This e-mail thread is about determining the language, not classes and modules. Trying to decode this vague paragraph, the relevant bit here appears ‘must resort to _extra-linguistic_ tools to abstract over such _linguistic constructs_’. As such, I assume that 'extra-linguistic' refers to file extensions (and other things, but it's the file extensions that are relevant here). 
Using this guess to unvaguify the phrasing, I get: ‘Programmers must resort to using file extensions to indicate which language a program is written in.’ However, that this is a bad thing appears to be the point that you were making in the first place, for which you gave the PDF as a source, so this doesn't explain anything. >>> To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>): >>> > > Sorry, this was meant to be tongue-in-cheek, and it seems that didn't come across. "Argument from authority" is often considered a category of logical fallacy, and ending with a quote is sometimes considered to be bad style or to weaken a piece of persuasive writing. > >> * I previously pointed out some problems with that proposal >> -- i.e., '#lang whatever' is bogus Scheme / Wisp / ..., > > I hope I've explained why something like `#!language/wisp` is perfectly within the bounds of R6RS. No, because Wisp is not R6RS -- R6RS is only relevant insofar as the Wisp standard delegates to R6RS. (TBC I'm not claiming that #!language/wisp is invalid Wisp, I'm only claiming that your argumentation has holes here.) Also, you forgot the '...' in 'Scheme / Wisp / ...' -- while R6RS is somewhat relevant to Wisp, there exist languages over which the R6RS has no sway, e.g. BASIC. > Also, given that Guile already starts with non-standard extensions enabled by default, I don't see any reason not to also support `#lang language/wisp`. Here is a reason for not adding non-standard extensions, from a previous reply of mine: > The '#lang whatever' stuff makes Scheme (*) files unportable between implementations, as '#lang scheme' is not a valid comment -- there exist Schemes beyond Guile and Racket. If it were changed to recognising > '-*- mode: scheme -*-' or '-*- language: scheme -*-' or such, it would be better IMO, but insufficient, because (^). 
> > (*) Same argument applies for some, but not all, other non-Scheme languages too. That Guile might have made some mistakes with non-standard enabled-by-default language extensions in the past, does not mean that it should make more mistakes in the present. > In particular, the spelling of `#lang` proceeds directly from the Scheme tradition. This is from the R6RS Rationale document, chapter 4, "Lexical Syntax", section 3, "Future Extensions" [3]: [...] Again, the Scheme tradition holds no sway over non-Scheme languages (except for situations like Wisp, perhaps), e.g. Pascal and BASIC. Guile does not limit itself to Scheme languages, e.g. it has some support for elisp, brainfuck and python (see: python-on-guile). >> and >> 'the module system won't find it, because of the unexpected >> file extensions'. >> > > This is indeed something that needs to be addressed, but it seems like a very solvable problem. Using the extension ".scm" for everything would be one trivial solution. Something like your proposal to enable file extensions based on a compile-time option could likewise be part of a solution. The problem with the 'use .scm for everything' solution is that you would need to use .scm for everything, even non-Scheme files, and even when the source code comes from a project that uses a non-Guile implementation and as such uses very different extensions, e.g. '.js'. > In general, I'll say that, while using Guile, I've often missed Racket's more flexible constructs for importing modules. I especially miss `(require "foo/bar.rkt")`, which imports a module at a path relative to the module where the `require` form appears: it makes it easy to organize small programs into multiple files without having to mess with a load path. I fail to see the relevancy of this comment. 
Also, 'include' already does something pretty close to this; presumably 'use-modules' could be modified to accept a #:relative-source-file-name argument:

    (define-module (baz)) ; /project/baz.scm
    ;; -> /project/foo/bar.rkt
    (use-modules ((foo bar) #:relative-source-file-name "foo/bar.rkt"))

> On Thu, Feb 23, 2023, at 1:42 PM, Maxime Devos wrote: >> Have you seen my messages on how the "#lang" construct is problematic >> for some languages, and how alternatives like "[comment delimiter] -*- >> stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be >> equally simple (*) and not have any downsides (**). >> >> (*) The port encoding detection supports "-*- coding: whatever -*-", >> presumably that functionality could be reused. >> > > IMO, the use of "-*- coding: whatever -*-" to detect encoding is an ugly hack and should not be extended further. > > I tried to raise some objections above to conflating editor configuration with syntax saying what a file's language is. > > More broadly, I find "magic comments" highly objectionable. The whole point of comments is to be able to communicate freely to human readers without affecting the interpreter/compiler/evaluator. Introducing magic comments means you must constantly think about whether what you are writing for humans might change the meaning of your program. Magic comments *without knowing a priori what is a comment* are even worse: now, you have to beware of accidental "magic" in ALL of the lexical syntax of your program. (Consider that something like `(define (-*- mode: c++ -*-) 14)` is perfectly good Scheme.) I object to the second claim -- while I can't account for aliens given the lack of them, I find it pointless to restrict the purpose of comments to human animals. The third and penultimate claims are false. If implemented correctly in Guile, only the first language declaration counts, it's not 'ALL of the lexical syntax of your program'. 
You previously claimed that programs should contain in-band information on which language something is written in. If this is followed, your example would actually look like:

    ;; -*- programming-language: scheme -*-
    ;; ^ or mode: c++, or #!r6rs, or an out-of-band --language=..., ...
    (define (-*- mode: c++ -*-) 14)

As the relevant '-*- ...: scheme -*-' precedes the irrelevant '-*- mode: c++ -*-', it's the relevant one that is picked up by Guile, not the irrelevant one. As such, as long as the programmer uses the '--language=' compilation option in the Makefile, or puts a 'real' language declaration in the beginning of the source file (as a 'magic comment', or #!r6rs, or #lang as far as required for compatibility with Racket), things will work out. Even if the programmer doesn't do any of that, it's still unproblematic, because of error messages at compilation / interpretation time -- different languages tend to have incompatible syntax, if you pass a Scheme program to a C++ parser you'll just get a stream of syntax errors. Surely, the programmer will pass the code to the compiler or interpreter at some point, right? Otherwise, the programming was pointless. Likewise, test suites (ought to) exist, which would catch these problems even if they weren't written to catch these problems. (If they don't exist, then the programmer has much worse problems than a super implausible '(define (-*- mode: c++ -*-) 14)' situation.) > > (It's not really relevant for the `#lang`-like case, but something I find especially ironic about encoding "magic comments" or, say, `<?xml version="1.0" encoding="UTF-8"?>`, is that suddenly if you encode the Unicode text in some other encoding it becomes a lie.) That sounds exactly the same situation as with #lang to me (and, as such, relevant). If you take a Scheme file

    #scheme
    ; ^ equivalent of <?xml version="1.0" encoding="UTF-8"?>
    [...]
    ; <- Scheme code

and then convert it to Wisp, but forget to adjust the "#lang":

    #scheme
    ; ^ equivalent of <?xml version="1.0" encoding="something-else"?>
    [...] ; <-- Wisp code

then you'll get a bunch of syntax errors. > > On Fri, Feb 24, 2023, at 6:51 PM, Maxime Devos wrote: >> On 25-02-2023 00:48, Maxime Devos wrote: >>>>> (**) For compatibility with Racket, it's not like we couldn't >>>>> implement both "#lang" and "-*- stuff: language -*-". >> >> TBC, I mean ‘only support '#lang' for values of 'lang' that Racket >> supports’ > > If I understand what you're proposing here, I don't think it's a viable option. > > The fundamental purpose of the `#lang` construct (however you spell it) is to provide an open, extensible protocol for defining languages. Thus, "values of 'lang' that Racket supports" are unbounded, provided that a module has been installed where the language specification says to look. From The Racket Reference [4]: The problem, as I wrote several times previously in different words, is that this 'open, extensible protocol' is not a standard protocol shared between languages. No language that precedes the existence of Racket acknowledges this protocol in its specification of its syntax, and, like I said before, if the language doesn't have "#" comments, then #lang is also contrary to the syntax of the language. Like I wrote about R6RS: Racket only holds sway over Racket; it has no authority on the syntax of, say, BASIC and Pascal. Also, being unbounded is not a problem, because unbounded!=infinite. At any point in time, Racket itself only supports a finite number of 'values of 'lang'', and at any point in time there are only a finite number of external modules that implement certain 'lang'. As such, at any version of Guile, Guile could have a finite list of 'lang' for which it recognises the Racket-specific #lang extension, which is incompatible with non-Racket, non-Guile implementations. > [...] 
> I am definitely **not** suggesting that Guile implement all the details of Racket's `#lang` implementation. What I do strongly advocate is that you design Guile's support for `#lang` (or `#!`) to leave open a pathway for compatibility in the future. [...] The problem with this advocating, is that I agree with you here (except for 'you design' (*)), so why are you repeating this again? I wrote something along the lines of ‘For __compatibility__ with Racket, __#lang should be recognised for values of 'lang' that are recognised by Racket__, but not for other languages’ (emphasis added). (*) Sure, someone could implement this compatibility, whatever, but we don't need this compatibility for Wisp. For Wisp, the more general and less problematic 'embed source file name in .go, + --language/file extension guessing' suffices. It's also rather pushy -- _you_ are demanding that _I_ paper over a source of incompatibility _introduced by others_ (Racket) (and furthermore _I_ consider that source of incompatibility _bad_), in the ML of a _volunteer project_, in a discussion that's ultimately about Wisp, not Racket, where _I_ (**) already have voluntarily designed a solution for Wisp? (**) And others maybe, I don't recall how much can be attributed to whom. > [...] > (Other kinds of potential namespace collisions are easier to manage: for example, we could imagine that `(use-modules (foo bar baz))` might not access the same module as `(require foo/bar/baz)`. [...] This is interesting but seems completely orthogonal; this e-mail thread is about detecting which language something is in, and finding source files with non-.scm modules, not about making the module system non-global. > [...] > I've sort of alluded above to my pipe dream of a grand unified future for Racket-and-Guile-on-Chez, Guile-and-Racket-on-the-Guile-VM, and endless other possibilities. I wrote about it in more detail on the guix-devel list at [10]. 
(These thoughts were inspired by conversations with Christine Lemmer-Webber, though she bears no responsibility for my zany imaginings.) OK, but what has this to do with this e-mail thread? This e-mail thread is about supporting additional languages, not about emulating Racket on top of Guile somehow (or perhaps you count Racket's dialect of Scheme as a language of its own to be implemented in Guile?). > Finally, I looked into the history of `#!` in R6RS a bit, and I'll leave a few pointers here for posterity. Will Clinger's 2015 Scheme Workshop paper [11] says in section 3.1 that "Kent Dybvig suggested the `#!r6rs` flag in May 2006", Clinger "formally proposed addition of Dybvig’s suggestion" [12], and, "less than six weeks later," `#!r6rs` was "in the R6RS editors’ status report". (I am not persuaded by all of the arguments about `#!r6rs` in that paper: in particular, the analysis doesn't seem to account for R6RS Appendix A [1].) As best as I can tell, the suggestion from Kent Dybvig is [13]: Again, how is RnRS relevant to _non-Scheme_ languages? Besides the 'shebangs actually are r6rs', I am disappointed by this discussion -- you keep repeating irrelevant points or points that were already addressed. (Again, R6RS and Racket are simply _irrelevant_ to non-Scheme languages that did not originate from Racket, and you are not giving arguments for them actually being relevant somehow.) As this line of discussion has proven to just be a pointless time sink, I will not read or respond to further replies by you in this line of discussion. Greetings, Maxime [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
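The lookup order listed in the message above for the guile executable (an explicit --language= argument, then an in-band declaration, then the file extension, then defaulting to Scheme) could be sketched roughly as follows. This is only an illustration in shell, not something Guile does today: the function name detect_lang, the 'programming-language:' key, the choice to scan just the first two lines, and the extension table (.w, .js, .el) are all assumptions pieced together from the proposals in this thread.

```shell
#!/bin/sh
# Hypothetical helper (not part of Guile): print the language to use for FILE.
# Usage: detect_lang EXPLICIT_LANG FILE
detect_lang() {
    explicit=$1
    file=$2

    # 1. An explicit, out-of-band choice (think --language=...) always wins.
    if [ -n "$explicit" ]; then
        printf '%s\n' "$explicit"
        return 0
    fi

    # 2. In-band declaration of the assumed form
    #    "-*- programming-language: NAME -*-". Only the first two lines are
    #    scanned (line one may be a shebang), so a later "-*- ... -*-"
    #    inside the program text is never treated as a declaration.
    if [ -r "$file" ]; then
        declared=$(head -n 2 "$file" |
            sed -n 's/.*-\*-.*programming-language:[[:space:]]*\([^[:space:]]*\).*-\*-.*/\1/p' |
            head -n 1)
        if [ -n "$declared" ]; then
            printf '%s\n' "$declared"
            return 0
        fi
    fi

    # 3. File extension as a fallback; 4. otherwise default to Scheme.
    case $file in
        *.w)  printf 'wisp\n' ;;
        *.js) printf 'ecmascript\n' ;;
        *.el) printf 'elisp\n' ;;
        *)    printf 'scheme\n' ;;
    esac
}
```

For example, detect_lang "" foo.w would print wisp for a file without any declaration, while detect_lang scheme foo.w prints scheme because the explicit argument takes precedence.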
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 15:42 ` Maxime Devos @ 2023-02-26 16:14 ` Dr. Arne Babenhauserheide 2023-02-26 17:58 ` Matt Wette 1 sibling, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-26 16:14 UTC (permalink / raw) To: Maxime Devos Cc: Philip McGrath, Ludovic Courtès, Matt Wette, Christine Lemmer-Webber, guile-devel [-- Attachment #1: Type: text/plain, Size: 1765 bytes --] Maxime Devos <maximedevos@telenet.be> writes: > On 26-02-2023 at 08:45, Philip McGrath wrote: >> What I'm trying to advocate is that programs should say in-band, as part of their source code, what language they are written in. In-band is not the same as in-language. I agree that it should be part of the source code, at least for executable files, but this is already possible. For many years I’ve been using shell indirection that ensures that files for the different languages are recognized and compiled with the correct language. It does not look as clean, but it allows solving a lot of problems. It ensures that when language/wisp.scm is not available, it first gets compiled as scheme, and then the wisp files are used as wisp, as in the file dryads-wake.w:

    #!/usr/bin/env bash
    # -*- wisp -*-
    # ensure that (language wisp) is pre-compiled
    if ! guile -L $(dirname $(realpath "$0")) -C $(dirname $(realpath "$0")) --language=wisp -c '' 2>/dev/null; then
        guile -L $(dirname $(realpath "$0")) -C $(dirname $(realpath "$0")) -c '(import (language wisp spec))' >/dev/null 2>&1
    fi
    # run dryads-wake as module to ensure it is used pre-compiled
    exec -a "$0" guile -L $(dirname $(realpath "$0")) -C $(dirname $(realpath "$0")) --language=wisp -x .w -e '(dryads-wake)' -c '' "${@}"
    ; !#
    ; … code follows …

This is an extra-linguistic feature, but it defines in the source-code what language is used for the different parts. And it has its own kind of elegance. 
Not in isolated language design, but in leveraging different parts of the GNU system to provide more capabilities than available from any of its parts. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 15:42 ` Maxime Devos 2023-02-26 16:14 ` Dr. Arne Babenhauserheide @ 2023-02-26 17:58 ` Matt Wette 2023-02-26 18:03 ` Dr. Arne Babenhauserheide 1 sibling, 1 reply; 56+ messages in thread From: Matt Wette @ 2023-02-26 17:58 UTC (permalink / raw) To: guile-devel With respect to file extensions, guile does not use file extensions: you can name a file containing Scheme code "foo.js" and "guile foo.js" will execute it. The code in the wip-load-lang branch provides file-extension support: 1) Currently ecmascript, with "js" extension, elisp, with "el" extension, and Scheme, with "scm" extension, are supported. 2) Additional languages can be added by calling add-lang-extension, from (system base compile): (add-lang-extension "m" 'nx-mlang) Matt ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 17:58 ` Matt Wette @ 2023-02-26 18:03 ` Dr. Arne Babenhauserheide 2023-02-26 18:20 ` Matt Wette 0 siblings, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-26 18:03 UTC (permalink / raw) To: Matt Wette; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 544 bytes --] Matt Wette <matt.wette@gmail.com> writes: > With respect to file extensions, guile does not use file extension: > You can name a file containing Scheme code "foo.js" and "guile foo.js" > will execute it. The module-system uses file extensions: If you (define-module (foo) #:export (main)) (define (main args) (display 'foo)) in foo.js, then guile -L . -e '(foo)' -c '' won’t find it. But it will find it in foo.scm. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 18:03 ` Dr. Arne Babenhauserheide @ 2023-02-26 18:20 ` Matt Wette 2023-02-26 21:39 ` Dr. Arne Babenhauserheide 0 siblings, 1 reply; 56+ messages in thread From: Matt Wette @ 2023-02-26 18:20 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel On 2/26/23 10:03 AM, Dr. Arne Babenhauserheide wrote: > Matt Wette <matt.wette@gmail.com> writes: > >> With respect to file extensions, guile does not use file extension: >> You can name a file containing Scheme code "foo.js" and "guile foo.js" >> will execute it. > The module-system uses file extensions: If you > > (define-module (foo) #:export (main)) > (define (main args) (display 'foo)) > > in foo.js, then > > guile -L . -e '(foo)' -c '' > > won’t find it. But it will find it in foo.scm. > Guile does not use file extensions consistently, then? ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 18:20 ` Matt Wette @ 2023-02-26 21:39 ` Dr. Arne Babenhauserheide 0 siblings, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-26 21:39 UTC (permalink / raw) To: Matt Wette; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 931 bytes --] Matt Wette <matt.wette@gmail.com> writes: > On 2/26/23 10:03 AM, Dr. Arne Babenhauserheide wrote: >> Matt Wette <matt.wette@gmail.com> writes: >> >>> You can name a file containing Scheme code "foo.js" and "guile foo.js" >> guile -L . -e '(foo)' -c '' >> >> won’t find it. But it will find it in foo.scm. >> > > Guile does not use file extensions consistently, then? Those are two different use-cases. With guile foo.js, guile does not have to search for the file: it just uses the one you passed. With -e '(foo)' it has to search for a matching module in the known / defined load paths. Note also guile -L . -e '(foo)' -c '' has lower startup time than guile foo.scm. If I understand it correctly that’s because it simply mmaps the compiled *.go file while guile foo.scm always reads the file. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-26 7:45 ` Philip McGrath 2023-02-26 15:42 ` Maxime Devos @ 2023-10-02 14:59 ` Christine Lemmer-Webber 1 sibling, 0 replies; 56+ messages in thread From: Christine Lemmer-Webber @ 2023-10-02 14:59 UTC (permalink / raw) To: Philip McGrath Cc: Maxime Devos, Ludovic Courtès, Matt Wette, guile-devel "Philip McGrath" <philip@philipmcgrath.com> writes: > I've sort of alluded above to my pipe dream of a grand unified future > for Racket-and-Guile-on-Chez, Guile-and-Racket-on-the-Guile-VM, and > endless other possibilities. I wrote about it in more detail on the > guix-devel list at [10]. (These thoughts were inspired by > conversations with Christine Lemmer-Webber, though she bears no > responsibility for my zany imaginings.) It's a long email so I'm only quoting the part that mentions me. ;) In general I think the hash-lang idea in Racket is a neat one. File extensions are also possible I suppose, but ultimately, when the program boots, there has to be some sort of way of finding the "current configuration" of languages. Note that #langs have an ambient authority problem, if we had a more ocap'y system, but... we're far from that, any module can do a mess of things, so I suppose if we're relying on the module system with ambient authority already, the same situation may apply. One thing that's worth noting is that languages do *two things* (or three): they provide a reader, and they provide an execution model (including an initial set of bindings). It would be nice if we could separate those two things as much as possible, when it is possible. I think readers are the least interesting part of language design usually. But obviously I support Wisp here, so... ;) - Christine ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-18 3:50 ` Philip McGrath 2023-02-18 15:58 ` Maxime Devos @ 2023-02-23 7:59 ` Maxime Devos 2023-02-23 8:51 ` Dr. Arne Babenhauserheide 1 sibling, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-23 7:59 UTC (permalink / raw) To: Philip McGrath, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 1803 bytes --] >> In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) [...] > (Guile does not handle `#!r6rs` properly, presumably because of the legacy `#!`/`!#` block comments. I think this should be a surmountable obstacle, though, especially since Guile does support standard `#|`/`|#` block comments.) > > ‘#! ... !#’ comments aren't legacy; they exist to allow putting the shebang in the first line of a script, and to pass additional arguments to the Guile interpreter (see: (guile)The Top of a Script File) (*). As such, you can't just replace them with #| ... |# (unless you patch the kernel to recognise "#| ..." as a shebang line). [...] > > Furthermore, according to the kernel, #!r6rs would mean that the script needs to be interpreted by a program named 'r6rs', but 'guile' is named 'guile', not 'r6rs'. (I assume this is in POSIX somewhere, though I couldn't find it.) > > (This is an incompatibility between R6RS and any system that has shebangs.) Thinking a bit more about it, it should be possible to special-case Guile's interpretation of "#!" such that "#!r6rs" doesn't require a closing "!#". (Technically backwards-incompatible, but I don't think people are writing #!r6rs ...!# in the wild.) Still doesn't really address the problem though, as Scheme scripts (or scripts in another language) may need to start with a shebang and "#!lang" or "#lang" is not a valid comment in all languages. 
(E.g., I don't think it's valid Pascal, though I only have read some Pascal code, I haven't looked at the specification.) Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 931 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 7:59 ` Maxime Devos @ 2023-02-23 8:51 ` Dr. Arne Babenhauserheide 2023-02-23 18:04 ` Maxime Devos 0 siblings, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-23 8:51 UTC (permalink / raw) To: Maxime Devos; +Cc: Philip McGrath, guile-devel [-- Attachment #1: Type: text/plain, Size: 2051 bytes --] Maxime Devos <maximedevos@telenet.be> writes: >> ‘#! ... !#’ comments aren't legacy; they exist to allow putting the >> shebang in the first line of a script, and to pass additional >> arguments to the Guile interpreter (see: (guile)The Top of a Script >> File) (*). This is awesome, by the way. It’s what allowed me to write wisp scripts that just work without having wisp shipped by starting as a bash script, pre-compiling the language files, and then exec'ing guile with the right arguments that interprets the file as module and runs the code inside. >> Furthermore, according to the kernel, #!r6rs would mean that the >> script needs to be interpreted by a program named 'r6rs', but >> 'guile' is named 'guile', not 'r6rs'. (I assume this is in POSIX >> somewhere, though I couldn't find it.) We could fix that by installing a binary named r6rs. > Thinking a bit more about it, it should be possible to special-case > Guile's interpretation of "#!" such that "#!r6rs" doesn't require a > closing "!#". (Technically backwards-incompatible, but I don't think > people are writing #!r6rs ...!# in the wild.) Do you need the closing !# if you restrict yourself to the first line? > Still doesn't really address the problem though, as Scheme scripts (or > scripts in another language) may need to start with a shebang and > "#!lang" or "#lang" is not a valid comment in all languages. (E.g., I > don't think it's valid Pascal, though I only have read some Pascal > code, I haven't looked at the specification.) I think it must be ignored in all languages that work as scripts in POSIX. 
So I would expect that support for ignoring #!... in the first line is very widespread. Also since the language implementation is in Guile, this could simply be added for Guile. That may prevent using this file from other implementations of the language, but it should work well enough as a first step. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 8:51 ` Dr. Arne Babenhauserheide @ 2023-02-23 18:04 ` Maxime Devos 2023-02-23 18:22 ` Maxime Devos ` (2 more replies) 0 siblings, 3 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-23 18:04 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: Philip McGrath, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 1595 bytes --] On 23-02-2023 09:51, Dr. Arne Babenhauserheide wrote:
>> Thinking a bit more about it, it should be possible to special-case
>> Guile's interpretation of "#!" such that "#!r6rs" doesn't require a
>> closing "!#". (Technically backwards-incompatible, but I don't think
>> people are writing #!r6rs ...!# in the wild.)
> Do you need the closing !# if you restrict yourself to the first line?

I thought so at first, but doing a little experiment, it appears you don't need to:

$ guile
scheme@(guile-user)> #!r6rs
(display "hi") (newline)

(output: hi)

Apparently Guile already has the required behaviour.

>>> Still doesn't really address the problem though, as Scheme scripts (or
>>> scripts in another language) may need to start with a shebang and
>>> "#!lang" or "#lang" is not a valid comment in all languages. (E.g., I
>>> don't think it's valid Pascal, though I only have read some Pascal
>>> code, I haven't looked at the specification.)
>> I think it must be ignored in all languages that work as scripts in
>> POSIX. So I would expect that support for ignoring #!... in the first
>> line is very widespread.

The problem is that not all languages were made with POSIX-style scripts in mind, e.g. Pascal, BASIC and Java (*).

Greetings, Maxime.

(*) Java actually allows "#!", but only in 'Shebang' files (see: https://openjdk.org/jeps/330#Shebang_files). It remains invalid to put a '#!java' line in files with a class definition that is supposed to be found by Java's class loaders and compiler (in Guile terms, the source code of a module). 
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 18:04 ` Maxime Devos @ 2023-02-23 18:22 ` Maxime Devos 2023-02-23 18:36 ` Maxime Devos 2023-02-23 18:37 ` Maxime Devos 2 siblings, 0 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-23 18:22 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: Philip McGrath, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 2231 bytes --] On 23-02-2023 19:04, Maxime Devos wrote: >>>> Still doesn't really address the problem though, as Scheme scripts (or >>>> scripts in another language) may need to start with a shebang and >>>> "#!lang" or "#lang" is not a valid comment in all languages. (E.g., I >>>> don't think it's valid Pascal, though I only have read some Pascal >>>> code, I haven't looked at the specification.) >>> I think itmust be ignored in all languages that work as scripts in >>> POSIX. So I would expect that support for ignoring #!... in the first >>> line is very widespread. > > The problem is that not all languages were made with POSIX-style scripts > in mind, e.g. Pascal, BASIC and Java (*). I forgot about the following: >> Also since the language implementation is in Guile, this could simply be >> added for Guile. That may prevent using this file from other >> implementations of the language, but it should work well enough as a >> first step. I disagree, because there is an alternative solution that should be straightforward and avoids the downside of making an extension to the language that could confuse other implementations. Quoting myself: > The '#lang whatever' stuff makes Scheme (*) files unportable between implementations, as '#lang scheme' is not a valid comment -- there exist Schemes beyond Guile and Racket. If it were changed to recognising > '-*- mode: scheme -*-' or '-*- language: scheme -*-' or such, it would be better IMO, [...] 
E.g.:

  ;; -*- insert-bikeshed-here: scheme -*-
  // -*- insert-bikeshed-here: c++ -*-
  # -*- insert-bikeshed-here: ecmascript -*-
  /* -*- insert-bikeshed-here: c -*- */
  {*** *- insert-bikeshed-here: pascal -*- ***}
  REM -*- insert-bikeshed-here: basic -*-

I suppose the decision could be made to add support for "#lang" for compatibility with Racket when 'lang' is some language Racket supports, despite the fact that "#lang" is invalid in some of those languages, but IMO we should avoid adding new extensions that confuse ‘native’ implementations when something compatible like "-*- ... -*-" can be done instead. Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 18:04 ` Maxime Devos 2023-02-23 18:22 ` Maxime Devos @ 2023-02-23 18:36 ` Maxime Devos 2023-02-23 18:37 ` Maxime Devos 2 siblings, 0 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-23 18:36 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: Philip McGrath, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 436 bytes --] On 23-02-2023 19:04, Maxime Devos wrote: > (*) Java actually allows "#!", but only in 'Shebang' files (see: > https://openjdk.org/jeps/330#Shebang_files). It remains invalid to put > a '#!java' line in files with a class definition that is supposed to be > found by Java's class loaders and compiler (in Guile terms, the source > code of a module). (Also, to my understanding, "#lang" is unconditionally invalid.) [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-23 18:04 ` Maxime Devos 2023-02-23 18:22 ` Maxime Devos 2023-02-23 18:36 ` Maxime Devos @ 2023-02-23 18:37 ` Maxime Devos 2 siblings, 0 replies; 56+ messages in thread From: Maxime Devos @ 2023-02-23 18:37 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: Philip McGrath, guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 539 bytes --] On 23-02-2023 19:04, Maxime Devos wrote: > (*) Java actually allows "#!", but only in 'Shebang' files (see: > https://openjdk.org/jeps/330#Shebang_files). It remains invalid to put > a '#!java' line in files with a class definition that is supposed to be > found by Java's class loaders and compiler (in Guile terms, the source > code of a module). (Also, #java is unconditionally invalid, to my understanding.) (I pressed the 'Send' button a little too early when composing the previous version of this e-mail.) [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-14 23:01 ` Maxime Devos 2023-02-15 1:46 ` Matt Wette @ 2023-02-15 8:36 ` Dr. Arne Babenhauserheide 2023-02-15 20:13 ` Maxime Devos 1 sibling, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-15 8:36 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 4556 bytes --] Maxime Devos <maximedevos@telenet.be> writes: >> [...] >> That would be nice, but would require doing changes in a critical core >> part of Guile. It would change this addition from a risk-free added >> feature to a risky core change. > > I maintain that a new language shouldn't be merged until the > Scheme-specific load path stuff is fixed/extended to work for > non-Scheme things (e.g. Wisp) too -- if this requires somewhat risky > (*) changes to core parts, then that just means we'll have to do some > risky stuff, then. > > I also expect that Guile maintainers will have the opposite opinion > (i.e., ‘fixing the load path stuff isn't necessary for merging a new > language implementation’). > > (*) FWIW I disagree on the 'risky' assessment -- it seems like a ‘if > it runs, it will work’ thing to me. That it modifies a core part of > Guile, makes it less risky IMO, as it would automatically be more > tested. > > Aside from the (*) and the 'I also expect [...],', I don't have > anything new to say about this, so I'll stop here. Thank you for your suggestions and contributions so far. They help me a lot! >> [...] >> That would also enable shipping pre-compiled software without >> sourcecode, > > That can already be done -- besides legalities, nothing stops people > from putting [^] or [^] .scm files in $GUILE_LOAD_PATH and putting .go > in $GUILE_LOAD_COMPILED_PATH. > > [^]: Redacted to not give people ideas on how to circumvent stuff. > I can elaborate by non-public e-mail if you like. Thank you! (for redacting) — I hope I’ll never need that :-) > On 14-02-2023 22:24, Dr. 
Arne Babenhauserheide wrote:
>> PS: So what’s still missing here is to avoid setting the locale. Do you
>> happen to have a hint how to actually do this right?
>
> I think you might have forgotten about this:
>
>> -- Scheme Procedure: set-port-encoding! port enc
>> -- C Function: scm_set_port_encoding_x (port, enc)
>>     Sets the character encoding that will be used to interpret I/O to
>>     PORT. ENC is a string containing the name of an encoding. Valid
>>     encoding names are those defined by IANA
>>     (http://www.iana.org/assignments/character-sets), for example
>>     ‘"UTF-8"’ or ‘"ISO-8859-1"’.
>> As such, I propose calling set-port-encoding! right in the beginning
>> of read-one-wisp-sexp.

Yikes, yes. I shouldn’t spend so much time thinking about implications when I haven’t yet applied all the clear and uncontested improvements. Thank you!

> More concretely, replace
>
>   (define (read-one-wisp-sexp port env)
>     ;; allow using "# foo" as #(foo).
>     (read-hash-extend #\# (λ (chr port) #\#))
>     (cond
>      ((eof-object? (peek-char port))
>       (read-char port )); return eof: we’re done
>      (else
>       (let ((chunk (wisp-scheme-read-chunk port)))
>         (cond
>          ((not (null? chunk))
>           (car chunk))
>          (else
>           #f))))))
>
> by
>
>   (define (read-one-wisp-sexp port env)
>     ;; Allow using "# foo" as #(foo).
>     ;; Don't use the globally-acting read-hash-extend, because this
>     ;; doesn't make much sense in parenthese-y (non-Wisp) Scheme.
>     ;; Instead, use fluids to temporarily add the extension.
>     (define %read-hash-procedures/parameter
>       (fluid->parameter %read-hash-procedures))
>     (parameterize ((%read-hash-procedures/parameter
>                     `((#\# ,(λ (chr port) #\#))
>                       ,@(%read-hash-procedures/parameter))))
>       ;; Read Wisp files as UTF-8, to support non-ASCII characters.
>       ;; TODO: would be nice to support ';; coding: whatever' lines
>       ;; like in parenthese-y Scheme.
>       (set-port-encoding! port "UTF-8")
>       (if (eof-object? (peek-char port))
>           (read-char port) ; return eof: we’re done
>           (let ((chunk (wisp-scheme-read-chunk port)))
>             (and (not (null? chunk)) ; <---- XXX: maybe (pair? chunk)
>                  (car chunk))))))
>
> (untested).
>
> (I've also done the read-hash-extend stuff and simplified the 'cond'
> expressions.)

Thank you again for that! Which begs an important question: How would you like to be attributed? I plan to also merge this back to the wisp repo and I’d like to attribute you there, too. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-15 8:36 ` Dr. Arne Babenhauserheide @ 2023-02-15 20:13 ` Maxime Devos 2023-02-16 7:01 ` Dr. Arne Babenhauserheide 0 siblings, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-15 20:13 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 509 bytes --] > [...] > Which begs an important question: How would you like to be attributed? I > plan to also merge this back to the wisp repo and I’d like to attribute > you there, too. You could add a ";; Copyright © 2023 Maxime Devos <maximedevos@telenet.be>" line next to yours in the file that contains the read-one-wisp-sexp I modified, a line like 'https://hg.sr.ht/~arnebab/wisp/browse/NEWS?rev=tip#L63' in the NEWS and an entry in 'Specific Contributions' in AUTHORS.in. Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-15 20:13 ` Maxime Devos @ 2023-02-16 7:01 ` Dr. Arne Babenhauserheide 0 siblings, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-16 7:01 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 851 bytes --] Maxime Devos <maximedevos@telenet.be> writes: >> [...] >> Which begs an important question: How would you like to be attributed? I >> plan to also merge this back to the wisp repo and I’d like to attribute >> you there, too. > > You could add a ";; Copyright © 2023 Maxime Devos > <maximedevos@telenet.be>" line next to yours in the file that contains > the read-one-wisp-sexp I modified, a line like > 'https://hg.sr.ht/~arnebab/wisp/browse/NEWS?rev=tip#L63' in the NEWS > and an entry in 'Specific Contributions' in AUTHORS.in. I implemented that now — thank you! In the wisp-repo I added the copyright line in both the source file for wisp.scm and spec.scm, because I see the unlimited underscores as significant, too. Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-04 15:08 ` Maxime Devos 2023-02-04 15:46 ` Dr. Arne Babenhauserheide @ 2023-02-16 8:03 ` Dr. Arne Babenhauserheide 2023-02-16 11:30 ` Maxime Devos 1 sibling, 1 reply; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-16 8:03 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1: Type: text/plain, Size: 1029 bytes --] Maxime Devos <maximedevos@telenet.be> writes:
>> + ;; allow using "# foo" as #(foo).
>> + (read-hash-extend #\# (λ (chr port) #\#))
>
> That's a rather Wisp-specific extension, but it appears you are
> extending things globally. Instead, I propose extending it
> temporarily, with the undocumented '%read-hash-procedures' fluid.

I tried the spec-example, but it didn’t work for me. Do you mean using it similar to this test?

(pass-if "R6RS/SRFI-30 block comment syntax overridden"
  ;; To be compatible with 1.8 and earlier, we should be able to override
  ;; this syntax.
  (with-fluids ((%read-hash-procedures (fluid-ref %read-hash-procedures)))
    (read-hash-extend #\| (lambda args 'not))
    (fold (lambda (x y result)
            (and result (eq? x y)))
          #t
          (read-string "(this is #| a comment)")
          `(this is not a comment))))

Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-16 8:03 ` Dr. Arne Babenhauserheide @ 2023-02-16 11:30 ` Maxime Devos 2023-02-16 21:35 ` Dr. Arne Babenhauserheide 0 siblings, 1 reply; 56+ messages in thread From: Maxime Devos @ 2023-02-16 11:30 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel [-- Attachment #1.1.1: Type: text/plain, Size: 1437 bytes --] On 16-02-2023 09:03, Dr. Arne Babenhauserheide wrote:
>
> Maxime Devos <maximedevos@telenet.be> writes:
>>> + ;; allow using "# foo" as #(foo).
>>> + (read-hash-extend #\# (λ (chr port) #\#))
>>
>> That's a rather Wisp-specific extension, but it appears you are
>> extending things globally. Instead, I propose extending it
>> temporarily, with the undocumented '%read-hash-procedures' fluid.
>
> I tried the spec-example, but it didn’t work for me.

Assuming that with 'spec-example', you meant (parameterize ((%read-hash-procedures/parameter `((#\# ,(λ (chr port) #\#)) ,@(%read-hash-procedures/parameter)))) [...]): I forgot to place a '.' between #\# and ,(λ ...).

> Do you mean using it similar to this test?
>
> (pass-if "R6RS/SRFI-30 block comment syntax overridden"
>   ;; To be compatible with 1.8 and earlier, we should be able to override
>   ;; this syntax.
>   (with-fluids ((%read-hash-procedures (fluid-ref %read-hash-procedures)))
>     (read-hash-extend #\| (lambda args 'not))
>     (fold (lambda (x y result)
>             (and result (eq? x y)))
>           #t
>           (read-string "(this is #| a comment)")
>           `(this is not a comment))))

That appears to me a valid (and slightly simpler and more robust) way of doing things, yes. Greetings, Maxime. [-- Attachment #1.1.2: OpenPGP public key --] [-- Type: application/pgp-keys, Size: 929 bytes --] [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 236 bytes --] ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-16 11:30 ` Maxime Devos @ 2023-02-16 21:35 ` Dr. Arne Babenhauserheide 0 siblings, 0 replies; 56+ messages in thread From: Dr. Arne Babenhauserheide @ 2023-02-16 21:35 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-devel [-- Attachment #1.1: Type: text/plain, Size: 961 bytes --] Maxime Devos <maximedevos@telenet.be> writes: > On 16-02-2023 09:03, Dr. Arne Babenhauserheide wrote: >> Do you mean using it similar to this test? >> (pass-if "R6RS/SRFI-30 block comment syntax overridden" >> ;; To be compatible with 1.8 and earlier, we should be able to override >> ;; this syntax. >> (with-fluids ((%read-hash-procedures (fluid-ref %read-hash-procedures))) >> (read-hash-extend #\| (lambda args 'not)) >> (fold (lambda (x y result) >> (and result (eq? x y))) >> #t >> (read-string "(this is #| a comment)") >> `(this is not a comment)))) > > That appears to me a valid (and slightly simpler and more robust) way > of doing things, yes. I finally got it working with passing tests — thank you! The reason that it did not work was that in the wisp repo there are tests that use wisp.scm directly without going through spec.scm. [-- Attachment #1.2: 0001-Only-extend-the-reader-while-reading-wisp.-Thanks-to.patch --] [-- Type: text/x-patch, Size: 2927 bytes --] From 12aa7314ad85f442f8bfe85839127bf1929be2ba Mon Sep 17 00:00:00 2001 From: Arne Babenhauserheide <arne_bab@web.de> Date: Thu, 16 Feb 2023 22:34:00 +0100 Subject: [PATCH] Only extend the reader while reading wisp. Thanks to Maxime Devos! 
* module/language/wisp/spec.scm (read-one-wisp-sexp): extend hash only in fluid * module/language/wisp.scm (wisp-scheme-read-chunk): extend hash in fluid --- module/language/wisp.scm | 18 ++++++++++-------- module/language/wisp/spec.scm | 8 ++------ 2 files changed, 12 insertions(+), 14 deletions(-) diff --git a/module/language/wisp.scm b/module/language/wisp.scm index 65ef8e65a..7a12e126a 100644 --- a/module/language/wisp.scm +++ b/module/language/wisp.scm @@ -725,14 +725,16 @@ Match is awesome!" a)))))) (define (wisp-scheme-read-chunk port) - "Read and parse one chunk of wisp-code" - (let (( lines (wisp-scheme-read-chunk-lines port))) - (wisp-make-improper - (wisp-replace-empty-eof - (wisp-unescape-underscore-and-colon - (wisp-replace-paren-quotation-repr - (wisp-propagate-source-properties - (wisp-scheme-indentation-to-parens lines)))))))) + "Read and parse one chunk of wisp-code" + (with-fluids ((%read-hash-procedures (fluid-ref %read-hash-procedures))) + (read-hash-extend #\# (lambda args #\#)) + (let ((lines (wisp-scheme-read-chunk-lines port))) + (wisp-make-improper + (wisp-replace-empty-eof + (wisp-unescape-underscore-and-colon + (wisp-replace-paren-quotation-repr + (wisp-propagate-source-properties + (wisp-scheme-indentation-to-parens lines))))))))) (define (wisp-scheme-read-all port) "Read all chunks from the given port" diff --git a/module/language/wisp/spec.scm b/module/language/wisp/spec.scm index 3ba248aa9..477036c71 100644 --- a/module/language/wisp/spec.scm +++ b/module/language/wisp/spec.scm @@ -45,12 +45,8 @@ ;; Don't use the globally-acting read-hash-extend, because this ;; doesn't make much sense in parenthese-y (non-Wisp) Scheme. ;; Instead, use fluids to temporarily add the extension. 
- (read-hash-extend #\# (λ (chr port) #\#)) - (define %read-hash-procedures/parameter - (fluid->parameter %read-hash-procedures)) - (parameterize ((%read-hash-procedures/parameter - `((#\# ,(λ (chr port) #\#)) - ,@(%read-hash-procedures/parameter)))) + (with-fluids ((%read-hash-procedures (fluid-ref %read-hash-procedures))) + (read-hash-extend #\# (lambda args #\# )) ;; Read Wisp files as UTF-8, to support non-ASCII characters. ;; TODO: would be nice to support ';; coding: whatever' lines ;; like in parenthese-y Scheme. -- 2.39.1 [-- Attachment #1.3: Type: text/plain, Size: 101 bytes --] Best wishes, Arne -- Unpolitisch sein heißt politisch sein, ohne es zu merken. draketo.de [-- Attachment #2: signature.asc --] [-- Type: application/pgp-signature, Size: 1125 bytes --] ^ permalink raw reply related [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile? 2023-02-03 21:26 Dr. Arne Babenhauserheide 2023-02-04 15:08 ` Maxime Devos @ 2023-09-30 13:17 ` Christine Lemmer-Webber 2023-09-30 20:09 ` Maxime Devos 1 sibling, 1 reply; 56+ messages in thread From: Christine Lemmer-Webber @ 2023-09-30 13:17 UTC (permalink / raw) To: Dr. Arne Babenhauserheide; +Cc: guile-devel Haven't fully caught up on this thread, but as a side note I have a mostly-finished implementation of a Wisp parser which takes a very different approach than Arne's, and was more understandable to me personally, a bit more functional and recursive-descent style. I could make it available if anyone is curious. Would love to see Wisp in Guile proper! "Dr. Arne Babenhauserheide" <arne_bab@web.de> writes: > [[PGP Signed Part:Undecided]] > Hi, > > Since (language wisp)¹ has been rock stable for years now and is used in > the Guix Workflow Language and supported in the Chickadee and the > Tsukundere game engines, I thought it coud be a good time to merge Wisp > into Guile itself. > > So I prepared a patch that adds language/wisp, some texinfo for > SRFI-119, and some tests. > > > Why add Wisp? > > For Wisp: it is then available directly wherever Guile is available. > This will make it much easier for people to follow tutorials. > > For Guile: > > - Wisp has proven to be good at enabling people to get an > entrance to Scheme² without pulling them out of the community. > > - It has also been shown to enable people who are used to other > programming languages to get a quick start at tools written in Guile. > > - And it provides access to the full capabilities of Guile with minimal > maintenance effort, because it is just the thinnest possible layer > around Scheme. The last required change was in 2020 while I used it > continuously. 
> > > The attached patch provides just the wisp reader, but not the > wisp->scheme transformer, because the latter has known broken edge-cases > (and who needs the transformer can get it from the wisp repo and execute > it directly with a Guile that then already supports wisp without any > path adaptions). > > > So I’d like to ask: can we merge Wisp as supported language into Guile? > > > Best wishes, > Arne > > > ¹: https://www.draketo.de/software/wisp > > ²: »Wisp allows people to see code how Lispers perceive it. Its > structure becomes apparent.« — Ricardo Wurmus in IRC > > From 4d4759f9fc67b01c40bde41b93e3998f7d64eabd Mon Sep 17 00:00:00 2001 > From: Arne Babenhauserheide <arne_bab@web.de> > Date: Fri, 3 Feb 2023 22:20:04 +0100 > Subject: [PATCH] Add language/wisp, wisp tests, and srfi-119 documentation > > * doc/ref/srfi-modules.texi (srfi-119): add node > * module/language/wisp.scm: New file. > * module/language/wisp/spec.scm: New file. > * test-suite/tests/srfi-119.test: New file. > --- > doc/ref/srfi-modules.texi | 30 ++ > module/language/wisp.scm | 796 +++++++++++++++++++++++++++++++++ > module/language/wisp/spec.scm | 107 +++++ > test-suite/tests/srfi-119.test | 81 ++++ > 4 files changed, 1014 insertions(+) > create mode 100644 module/language/wisp.scm > create mode 100644 module/language/wisp/spec.scm > create mode 100644 test-suite/tests/srfi-119.test > > diff --git a/doc/ref/srfi-modules.texi b/doc/ref/srfi-modules.texi > index 0ef136215..759e293ec 100644 > --- a/doc/ref/srfi-modules.texi > +++ b/doc/ref/srfi-modules.texi > @@ -64,6 +64,7 @@ get the relevant SRFI documents from the SRFI home page > * SRFI-98:: Accessing environment variables. > * SRFI-105:: Curly-infix expressions. > * SRFI-111:: Boxes. > +* SRFI-119:: Wisp: simpler indentation-sensitive scheme. > * SRFI-171:: Transducers > @end menu > > @@ -5662,6 +5663,34 @@ Return the current contents of @var{box}. > Set the contents of @var{box} to @var{value}. 
> @end deffn > > +@node SRFI-119 > +@subsection SRFI-119 Wisp: simpler indentation-sensitive scheme. > +@cindex SRFI-119 > +@cindex wisp > + > +The languages shipped in Guile include SRFI-119 (wisp), an encoding of > +Scheme that allows replacing parentheses with equivalent indentation and > +inline colons. See > +@uref{http://srfi.schemers.org/srfi-119/srfi-119.html, the specification > +of SRFI-119}. Some examples: > + > +@example > +display "Hello World!" @result{} (display "Hello World!") > +@end example > + > +@example > +define : factorial n @result{} (define (factorial n) > + if : zero? n @result{} (if (zero? n) > + . 1 @result{} 1 > + * n : factorial @{n - 1@} @result{} (* n (factorial @{n - 1@})))) > +@end example > + > +To execute a file with wisp code, select the language and filename > +extension @code{.w} vie @code{guile --language=wisp -x .w}. > + > +In files using Wisp, @xref{SRFI-105} (Curly Infix) is always activated. > + > + > @node SRFI-171 > @subsection Transducers > @cindex SRFI-171 > @@ -5705,6 +5734,7 @@ left-to-right, due to how transducers are initiated. > * SRFI-171 Helpers:: Utilities for writing your own transducers > @end menu > > + > @node SRFI-171 General Discussion > @subsubsection SRFI-171 General Discussion > @cindex transducers discussion > diff --git a/module/language/wisp.scm b/module/language/wisp.scm > new file mode 100644 > index 000000000..ba24f54c5 > --- /dev/null > +++ b/module/language/wisp.scm > @@ -0,0 +1,796 @@ > +;;; Wisp > + > +;; Copyright (C) 2013, 2017, 2018, 2020 Free Software Foundation, Inc. > +;; Copyright (C) 2014--2023 Arne Babenhauserheide. > + > +;;;; This library is free software; you can redistribute it and/or > +;;;; modify it under the terms of the GNU Lesser General Public > +;;;; License as published by the Free Software Foundation; either > +;;;; version 3 of the License, or (at your option) any later version. 
> +;;;; > +;;;; This library is distributed in the hope that it will be useful, > +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of > +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +;;;; Lesser General Public License for more details. > +;;;; > +;;;; You should have received a copy of the GNU Lesser General Public > +;;;; License along with this library; if not, write to the Free Software > +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + > +;;; Commentary: > + > +;; Scheme-only implementation of a wisp-preprocessor which output a > +;; scheme code tree to feed to a scheme interpreter instead of a > +;; preprocessed file. > + > +;; Limitations: > +;; - only unescapes up to 12 leading underscores at line start (\____________) > +;; - in some cases the source line information is missing in backtraces. > +;; check for set-source-property! > + > +;;; Code: > + > +(define-module (language wisp) > + #:export (wisp-scheme-read-chunk wisp-scheme-read-all > + wisp-scheme-read-file-chunk wisp-scheme-read-file > + wisp-scheme-read-string)) > + > +; use curly-infix by default > +(read-enable 'curly-infix) > + > +(use-modules > + (srfi srfi-1) > + (srfi srfi-11 ); for let-values > + (ice-9 rw ); for write-string/partial > + (ice-9 match)) > + > + > +;; Helper functions for the indent-and-symbols data structure: '((indent token token ...) ...) > +(define (line-indent line) > + (car line)) > + > +(define (line-real-indent line) > + "Get the indentation without the comment-marker for unindented lines (-1 is treated as 0)." > + (let (( indent (line-indent line))) > + (if (= -1 indent) > + 0 > + indent))) > + > +(define (line-code line) > + (let ((code (cdr line))) > + ; propagate source properties > + (when (not (null? code)) > + (set-source-properties! 
code (source-properties line))) > + code)) > + > +; literal values I need > +(define readcolon > + (string->symbol ":")) > + > +(define wisp-uuid "e749c73d-c826-47e2-a798-c16c13cb89dd") > +; define an intermediate dot replacement with UUID to avoid clashes. > +(define repr-dot ; . > + (string->symbol (string-append "REPR-DOT-" wisp-uuid))) > + > +; allow using reader additions as the first element on a line to prefix the list > +(define repr-quote ; ' > + (string->symbol (string-append "REPR-QUOTE-" wisp-uuid))) > +(define repr-unquote ; , > + (string->symbol (string-append "REPR-UNQUOTE-" wisp-uuid))) > +(define repr-quasiquote ; ` > + (string->symbol (string-append "REPR-QUASIQUOTE-" wisp-uuid))) > +(define repr-unquote-splicing ; ,@ > + (string->symbol (string-append "REPR-UNQUOTESPLICING-" wisp-uuid))) > + > +(define repr-syntax ; #' > + (string->symbol (string-append "REPR-SYNTAX-" wisp-uuid))) > +(define repr-unsyntax ; #, > + (string->symbol (string-append "REPR-UNSYNTAX-" wisp-uuid))) > +(define repr-quasisyntax ; #` > + (string->symbol (string-append "REPR-QUASISYNTAX-" wisp-uuid))) > +(define repr-unsyntax-splicing ; #,@ > + (string->symbol (string-append "REPR-UNSYNTAXSPLICING-" wisp-uuid))) > + > +; TODO: wrap the reader to return the repr of the syntax reader > +; additions > + > +(define (match-charlist-to-repr charlist) > + (let > + ((chlist (reverse charlist))) > + (cond > + ((equal? chlist (list #\.)) > + repr-dot) > + ((equal? chlist (list #\')) > + repr-quote) > + ((equal? chlist (list #\,)) > + repr-unquote) > + ((equal? chlist (list #\`)) > + repr-quasiquote) > + ((equal? chlist (list #\, #\@ )) > + repr-unquote-splicing) > + ((equal? chlist (list #\# #\' )) > + repr-syntax) > + ((equal? chlist (list #\# #\, )) > + repr-unsyntax) > + ((equal? chlist (list #\# #\` )) > + repr-quasisyntax) > + ((equal? 
chlist (list #\# #\, #\@ )) > + repr-unsyntax-splicing) > + (else > + #f)))) > + > +(define (wisp-read port) > + "wrap read to catch list prefixes." > + (let ((prefix-maxlen 4)) > + (let longpeek > + ((peeked '()) > + (repr-symbol #f)) > + (cond > + ((or (< prefix-maxlen (length peeked)) (eof-object? (peek-char port)) (equal? #\space (peek-char port)) (equal? #\newline (peek-char port)) ) > + (if repr-symbol ; found a special symbol, return it. > + ; TODO: Somehow store source-properties. The commented-out code below does not work. > + ; catch #t > + ; lambda () > + ; write : source-properties symbol-or-symbols > + ; set-source-property! symbol-or-symbols 'filename : port-filename port > + ; set-source-property! symbol-or-symbols 'line : 1+ : port-line port > + ; set-source-property! symbol-or-symbols 'column : port-column port > + ; write : source-properties symbol-or-symbols > + ; lambda : key . arguments > + ; . #f > + repr-symbol > + (let unpeek > + ((remaining peeked)) > + (cond > + ((equal? '() remaining ) > + (read port )); let read to the work > + (else > + (unread-char (car remaining) port) > + (unpeek (cdr remaining))))))) > + (else > + (let* > + ((next-char (read-char port)) > + (peeked (cons next-char peeked))) > + (longpeek > + peeked > + (match-charlist-to-repr peeked)))))))) > + > + > + > +(define (line-continues? line) > + (equal? repr-dot (car (line-code line)))) > + > +(define (line-only-colon? line) > + (and > + (equal? ":" (car (line-code line))) > + (null? (cdr (line-code line))))) > + > +(define (line-empty-code? line) > + (null? (line-code line))) > + > +(define (line-empty? line) > + (and > + ; if indent is -1, we stripped a comment, so the line was not really empty. > + (= 0 (line-indent line)) > + (line-empty-code? line))) > + > +(define (line-strip-continuation line ) > + (if (line-continues? 
line) > + (append > + (list > + (line-indent line)) > + (cdr (line-code line))) > + line)) > + > +(define (line-strip-indentation-marker line) > + "Strip the indentation markers from the beginning of the line" > + (cdr line)) > + > +(define (indent-level-reduction indentation-levels level select-fun) > + "Reduce the INDENTATION-LEVELS to the given LEVEL and return the value selected by SELECT-FUN" > + (let loop > + ((newlevels indentation-levels) > + (diff 0)) > + (cond > + ((= level (car newlevels)) > + (select-fun (list diff indentation-levels))) > + ((< level (car newlevels)) > + (loop > + (cdr newlevels) > + (1+ diff))) > + (else > + (throw 'wisp-syntax-error "Level ~A not found in the indentation-levels ~A."))))) > + > +(define (indent-level-difference indentation-levels level) > + "Find how many indentation levels need to be popped off to find the given level." > + (indent-level-reduction indentation-levels level > + (lambda (x ); get the count > + (car x)))) > + > +(define (indent-reduce-to-level indentation-levels level) > + "Find how many indentation levels need to be popped off to find the given level." > + (indent-level-reduction indentation-levels level > + (lambda (x ); get the levels > + (car (cdr x))))) > + > +(define (chunk-ends-with-period currentsymbols next-char) > + "Check whether indent-and-symbols ends with a period, indicating the end of a chunk." > + (and (not (null? currentsymbols)) > + (equal? #\newline next-char) > + (equal? repr-dot > + (list-ref currentsymbols (- (length currentsymbols) 1))))) > + > +(define (wisp-scheme-read-chunk-lines port) > + (let loop > + ((indent-and-symbols (list )); '((5 "(foobar)" "\"yobble\"")(3 "#t")) > + (inindent #t) > + (inunderscoreindent (equal? 
#\_ (peek-char port))) > + (incomment #f) > + (currentindent 0) > + (currentsymbols '()) > + (emptylines 0)) > + (cond > + ((>= emptylines 2 ); the chunk end has to be checked > + ; before we look for new chars in the > + ; port to make execution in the REPL > + ; after two empty lines work > + ; (otherwise it shows one more line). > + indent-and-symbols) > + (else > + (let ((next-char (peek-char port))) > + (cond > + ((eof-object? next-char) > + (append indent-and-symbols (list (append (list currentindent) currentsymbols)))) > + ((and inindent (zero? currentindent) (not incomment) (not (null? indent-and-symbols)) (not inunderscoreindent) (not (or (equal? #\space next-char) (equal? #\newline next-char) (equal? (string-ref ";" 0) next-char)))) > + (append indent-and-symbols )); top-level form ends chunk > + ((chunk-ends-with-period currentsymbols next-char) > + ; the line ends with a period. This is forbidden in > + ; SRFI-119. Use it to end the line in the REPL without > + ; showing continuation dots (...). > + (append indent-and-symbols (list (append (list currentindent) (drop-right currentsymbols 1))))) > + ((and inindent (equal? #\space next-char)) > + (read-char port ); remove char > + (loop > + indent-and-symbols > + #t ; inindent > + #f ; inunderscoreindent > + #f ; incomment > + (1+ currentindent) > + currentsymbols > + emptylines)) > + ((and inunderscoreindent (equal? #\_ next-char)) > + (read-char port ); remove char > + (loop > + indent-and-symbols > + #t ; inindent > + #t ; inunderscoreindent > + #f ; incomment > + (1+ currentindent) > + currentsymbols > + emptylines)) > + ; any char but whitespace *after* underscoreindent is > + ; an error. This is stricter than the current wisp > + ; syntax definition. TODO: Fix the definition. Better > + ; start too strict. FIXME: breaks on lines with only > + ; underscores which should be empty lines. > + ((and inunderscoreindent (and (not (equal? #\space next-char)) (not (equal? 
#\newline next-char)))) > + (throw 'wisp-syntax-error "initial underscores without following whitespace at beginning of the line after" (last indent-and-symbols))) > + ((equal? #\newline next-char) > + (read-char port ); remove the newline > + ; The following two lines would break the REPL by requiring one char too many. > + ; if : and (equal? #\newline next-char) : equal? #\return : peek-char port > + ; read-char port ; remove a full \n\r. Damn special cases... > + (let* ; distinguish pure whitespace lines and lines > + ; with comment by giving the former zero > + ; indent. Lines with a comment at zero indent > + ; get indent -1 for the same reason - meaning > + ; not actually empty. > + ( > + (indent > + (cond > + (incomment > + (if (= 0 currentindent ); specialcase > + -1 > + currentindent )) > + ((not (null? currentsymbols )); pure whitespace > + currentindent) > + (else > + 0))) > + (parsedline (append (list indent) currentsymbols)) > + (emptylines > + (if (not (line-empty? parsedline)) > + 0 > + (1+ emptylines)))) > + (when (not (= 0 (length parsedline))) > + ; set the source properties to parsedline so we can try to add them later. > + (set-source-property! parsedline 'filename (port-filename port)) > + (set-source-property! parsedline 'line (port-line port))) > + ; TODO: If the line is empty. Either do it here and do not add it, just > + ; increment the empty line counter, or strip it later. Replace indent > + ; -1 by indent 0 afterwards. > + (loop > + (append indent-and-symbols (list parsedline)) > + #t ; inindent > + (if (<= 2 emptylines) > + #f ; chunk ends here > + (equal? #\_ (peek-char port ))); are we in underscore indent? > + #f ; incomment > + 0 > + '() > + emptylines))) > + ((equal? #t incomment) > + (read-char port ); remove one comment character > + (loop > + indent-and-symbols > + #f ; inindent > + #f ; inunderscoreindent > + #t ; incomment > + currentindent > + currentsymbols > + emptylines)) > + ((or (equal? #\space next-char) (equal? 
#\tab next-char) (equal? #\return next-char) ); remove whitespace when not in indent > + (read-char port ); remove char > + (loop > + indent-and-symbols > + #f ; inindent > + #f ; inunderscoreindent > + #f ; incomment > + currentindent > + currentsymbols > + emptylines)) > + ; | cludge to appease the former wisp parser > + ; | used for bootstrapping which has a > + ; v problem with the literal comment char > + ((equal? (string-ref ";" 0) next-char) > + (loop > + indent-and-symbols > + #f ; inindent > + #f ; inunderscoreindent > + #t ; incomment > + currentindent > + currentsymbols > + emptylines)) > + (else ; use the reader > + (loop > + indent-and-symbols > + #f ; inindent > + #f ; inunderscoreindent > + #f ; incomment > + currentindent > + ; this also takes care of the hashbang and leading comments. > + (append currentsymbols (list (wisp-read port))) > + emptylines)))))))) > + > + > +(define (line-code-replace-inline-colons line) > + "Replace inline colons by opening parens which close at the end of the line" > + ; format #t "replace inline colons for line ~A\n" line > + (let loop > + ((processed '()) > + (unprocessed line)) > + (cond > + ((null? unprocessed) > + ; format #t "inline-colons processed line: ~A\n" processed > + processed) > + ; replace : . with nothing > + ((and (<= 2 (length unprocessed)) (equal? readcolon (car unprocessed)) (equal? repr-dot (car (cdr unprocessed)))) > + (loop > + (append processed > + (loop '() (cdr (cdr unprocessed)))) > + '())) > + ((equal? readcolon (car unprocessed)) > + (loop > + ; FIXME: This should turn unprocessed into a list. 
> + (append processed > + (list (loop '() (cdr unprocessed)))) > + '())) > + (else > + (loop > + (append processed > + (list (car unprocessed))) > + (cdr unprocessed)))))) > + > +(define (line-replace-inline-colons line) > + (cons > + (line-indent line) > + (line-code-replace-inline-colons (line-code line)))) > + > +(define (line-strip-lone-colon line) > + "A line consisting only of a colon is just a marked indentation level. We need to kill the colon before replacing inline colons." > + (if > + (equal? > + (line-code line) > + (list readcolon)) > + (list (line-indent line)) > + line)) > + > +(define (line-finalize line) > + "Process all wisp-specific information in a line and strip it" > + (let > + ( > + (l > + (line-code-replace-inline-colons > + (line-strip-indentation-marker > + (line-strip-lone-colon > + (line-strip-continuation line)))))) > + (when (not (null? (source-properties line))) > + (catch #t > + (lambda () > + (set-source-properties! l (source-properties line))) > + (lambda (key . arguments) > + #f))) > + l)) > + > +(define (wisp-add-source-properties-from source target) > + "Copy the source properties from source into the target and return the target." > + (catch #t > + (lambda () > + (set-source-properties! target (source-properties source))) > + (lambda (key . arguments) > + #f)) > + target) > + > +(define (wisp-propagate-source-properties code) > + "Propagate the source properties from the sourrounding list into every part of the code." > + (let loop > + ((processed '()) > + (unprocessed code)) > + (cond > + ((and (null? processed) (not (pair? unprocessed)) (not (list? unprocessed))) > + unprocessed) > + ((and (pair? unprocessed) (not (list? unprocessed))) > + (cons > + (wisp-propagate-source-properties (car unprocessed)) > + (wisp-propagate-source-properties (cdr unprocessed)))) > + ((null? unprocessed) > + processed) > + (else > + (let ((line (car unprocessed))) > + (if (null? 
(source-properties unprocessed)) > + (wisp-add-source-properties-from line unprocessed) > + (wisp-add-source-properties-from unprocessed line)) > + (loop > + (append processed (list (wisp-propagate-source-properties line))) > + (cdr unprocessed))))))) > + > +(define* (wisp-scheme-indentation-to-parens lines) > + "Add parentheses to lines and remove the indentation markers" > + (when > + (and > + (not (null? lines)) > + (not (line-empty-code? (car lines))) > + (not (= 0 (line-real-indent (car lines ))))); -1 is a line with a comment > + (if (= 1 (line-real-indent (car lines))) > + ;; accept a single space as indentation of the first line (and ignore the indentation) to support meta commands > + (set! lines > + (cons > + (cons 0 (cdr (car lines))) > + (cdr lines))) > + (throw 'wisp-syntax-error > + (format #f "The first symbol in a chunk must start at zero indentation. Indentation and line: ~A" > + (car lines))))) > + (let loop > + ((processed '()) > + (unprocessed lines) > + (indentation-levels '(0))) > + (let* > + ( > + (current-line > + (if (<= 1 (length unprocessed)) > + (car unprocessed) > + (list 0 ))); empty code > + (next-line > + (if (<= 2 (length unprocessed)) > + (car (cdr unprocessed)) > + (list 0 ))); empty code > + (current-indentation > + (car indentation-levels)) > + (current-line-indentation (line-real-indent current-line))) > + ; format #t "processed: ~A\ncurrent-line: ~A\nnext-line: ~A\nunprocessed: ~A\nindentation-levels: ~A\ncurrent-indentation: ~A\n\n" > + ; . processed current-line next-line unprocessed indentation-levels current-indentation > + (cond > + ; the real end: this is reported to the outside world. > + ((and (null? unprocessed) (not (null? indentation-levels)) (null? (cdr indentation-levels))) > + ; display "done\n" > + ; reverse the processed lines, because I use cons. > + processed) > + ; the recursion end-condition > + ((and (null? unprocessed)) > + ; display "last step\n" > + ; this is the last step. 
Nothing more to do except > + ; for rolling up the indentation levels. return the > + ; new processed and unprocessed lists: this is a > + ; side-recursion > + (values processed unprocessed)) > + ((null? indentation-levels) > + ; display "indentation-levels null\n" > + (throw 'wisp-programming-error "The indentation-levels are null but the current-line is null: Something killed the indentation-levels.")) > + (else ; now we come to the line-comparisons and indentation-counting. > + (cond > + ((line-empty-code? current-line) > + ; display "current-line empty\n" > + ; We cannot process indentation without > + ; code. Just switch to the next line. This should > + ; only happen at the start of the recursion. > + ; TODO: Somehow preserve the line-numbers. > + (loop > + processed > + (cdr unprocessed) > + indentation-levels)) > + ((and (line-empty-code? next-line) (<= 2 (length unprocessed ))) > + ; display "next-line empty\n" > + ; TODO: Somehow preserve the line-numbers. > + ; take out the next-line from unprocessed. > + (loop > + processed > + (cons current-line > + (cdr (cdr unprocessed))) > + indentation-levels)) > + ((> current-indentation current-line-indentation) > + ; display "current-indent > next-line\n" > + ; this just steps back one level via the side-recursion. > + (let ((previous-indentation (car (cdr indentation-levels)))) > + (if (<= current-line-indentation previous-indentation) > + (values processed unprocessed) > + (begin ;; not yet used level! TODO: maybe throw an error here instead of a warning. > + (let ((linenumber (- (length lines) (length unprocessed)))) > + (format (current-error-port) ";;; WARNING:~A: used lower but undefined indentation level (line ~A of the current chunk: ~S). 
This makes refactoring much more error-prone, therefore it might become an error in a later version of Wisp.\n" (source-property current-line 'line) linenumber (cdr current-line))) > + (loop > + processed > + unprocessed > + (cons ; recursion via the indentation-levels > + current-line-indentation > + (cdr indentation-levels))))))) > + ((= current-indentation current-line-indentation) > + ; display "current-indent = next-line\n" > + (let > + ((line (line-finalize current-line)) > + (next-line-indentation (line-real-indent next-line))) > + (cond > + ((>= current-line-indentation next-line-indentation) > + ; simple recursiive step to the next line > + ; display "current-line-indent >= next-line-indent\n" > + (loop > + (append processed > + (if (line-continues? current-line) > + line > + (wisp-add-source-properties-from line (list line)))) > + (cdr unprocessed ); recursion here > + indentation-levels)) > + ((< current-line-indentation next-line-indentation) > + ; display "current-line-indent < next-line-indent\n" > + ; format #t "line: ~A\n" line > + ; side-recursion via a sublist > + (let-values > + ( > + ((sub-processed sub-unprocessed) > + (loop > + line > + (cdr unprocessed ); recursion here > + indentation-levels))) > + ; format #t "side-recursion:\n sub-processed: ~A\n processed: ~A\n\n" sub-processed processed > + (loop > + (append processed (list sub-processed)) > + sub-unprocessed ; simply use the recursion from the sub-recursion > + indentation-levels)))))) > + ((< current-indentation current-line-indentation) > + ; display "current-indent < next-line\n" > + (loop > + processed > + unprocessed > + (cons ; recursion via the indentation-levels > + current-line-indentation > + indentation-levels))) > + (else > + (throw 'wisp-not-implemented > + (format #f "Need to implement further line comparison: current: ~A, next: ~A, processed: ~A." 
> + current-line next-line processed))))))))) > + > + > +(define (wisp-scheme-replace-inline-colons lines) > + "Replace inline colons by opening parens which close at the end of the line" > + (let loop > + ((processed '()) > + (unprocessed lines)) > + (if (null? unprocessed) > + processed > + (loop > + (append processed (list (line-replace-inline-colons (car unprocessed)))) > + (cdr unprocessed))))) > + > + > +(define (wisp-scheme-strip-indentation-markers lines) > + "Strip the indentation markers from the beginning of the lines" > + (let loop > + ((processed '()) > + (unprocessed lines)) > + (if (null? unprocessed) > + processed > + (loop > + (append processed (cdr (car unprocessed))) > + (cdr unprocessed))))) > + > +(define (wisp-unescape-underscore-and-colon code) > + "replace \\_ and \\: by _ and :" > + (match code > + ((a ...) > + (map wisp-unescape-underscore-and-colon a)) > + ('\_ > + '_) > + ('\__ > + '__) > + ('\___ > + '___) > + ('\____ > + '____) > + ('\_____ > + '_____) > + ('\______ > + '______) > + ('\_______ > + '_______) > + ('\________ > + '________) > + ('\_________ > + '_________) > + ('\__________ > + '__________) > + ('\___________ > + '___________) > + ('\____________ > + '____________) > + ('\: > + ':) > + (a > + a))) > + > + > +(define (wisp-replace-empty-eof code) > + "replace ((#<eof>)) by ()" > + ; FIXME: Actually this is a hack which fixes a bug when the > + ; parser hits files with only hashbang and comments. > + (if (and (not (null? code)) (pair? (car code)) (eof-object? (car (car code))) (null? (cdr code)) (null? (cdr (car code)))) > + (list) > + code)) > + > + > +(define (wisp-replace-paren-quotation-repr code) > + "Replace lists starting with a quotation symbol by > + quoted lists." > + (match code > + (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'quote (map wisp-replace-paren-quotation-repr a))) > + ((a ... 
'REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ); this is the quoted empty list > + (append > + (map wisp-replace-paren-quotation-repr a) > + (list (list 'quote (map wisp-replace-paren-quotation-repr b))))) > + (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'quasiquote (list 'unquote (map wisp-replace-paren-quotation-repr a)))) > + (('REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'unquote (map wisp-replace-paren-quotation-repr a))) > + ((a ... 'REPR-UNQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ) > + (append > + (map wisp-replace-paren-quotation-repr a) > + (list (list 'unquote (map wisp-replace-paren-quotation-repr b))))) > + (('REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'quasiquote (map wisp-replace-paren-quotation-repr a))) > + ((a ... 'REPR-QUASIQUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd b ); this is the quoted empty list > + (append > + (map wisp-replace-paren-quotation-repr a) > + (list (list 'quasiquote (map wisp-replace-paren-quotation-repr b))))) > + (('REPR-UNQUOTESPLICING-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'unquote-splicing (map wisp-replace-paren-quotation-repr a))) > + (('REPR-SYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'syntax (map wisp-replace-paren-quotation-repr a))) > + (('REPR-UNSYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'unsyntax (map wisp-replace-paren-quotation-repr a))) > + (('REPR-QUASISYNTAX-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'quasisyntax (map wisp-replace-paren-quotation-repr a))) > + (('REPR-UNSYNTAXSPLICING-e749c73d-c826-47e2-a798-c16c13cb89dd a ...) > + (list 'unsyntax-splicing (map wisp-replace-paren-quotation-repr a))) > + ;; literal array as start of a line: # (a b) c -> (#(a b) c) > + ((#\# a ...) 
> + (with-input-from-string ;; hack to defer to read > + (string-append "#" > + (with-output-to-string > + (λ () > + (write (map wisp-replace-paren-quotation-repr a) > + (current-output-port))))) > + read)) > + ((a ...) > + (map wisp-replace-paren-quotation-repr a)) > + (a > + a))) > + > +(define (wisp-make-improper code) > + "Turn (a #{.}# b) into the correct (a . b). > + > +read called on a single dot creates a variable named #{.}# (|.| > +in r7rs). Due to parsing the indentation before the list > +structure is known, the reader cannot create improper lists > +when it reads a dot. So we have to take another pass over the > +code to recreate the improper lists. > + > +Match is awesome!" > + (let > + ( > + (improper > + (match code > + ((a ... b 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd c) > + (append (map wisp-make-improper a) > + (cons (wisp-make-improper b) (wisp-make-improper c)))) > + ((a ...) > + (map wisp-make-improper a)) > + (a > + a)))) > + (define (syntax-error li msg) > + (throw 'wisp-syntax-error (format #f "incorrect dot-syntax #{.}# in code: ~A: ~A" msg li))) > + (if #t > + improper > + (let check > + ((tocheck improper)) > + (match tocheck > + ; lists with only one member > + (('REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd) > + (syntax-error tocheck "list with the period as only member")) > + ; list with remaining dot. > + ((a ...) > + (if (and (member repr-dot a)) > + (syntax-error tocheck "leftover period in list") > + (map check a))) > + ; simple pair - this and the next do not work when parsed from wisp-scheme itself. Why? > + (('REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd . c) > + (syntax-error tocheck "dot as first element in already improper pair")) > + ; simple pair, other way round > + ((a . 'REPR-DOT-e749c73d-c826-47e2-a798-c16c13cb89dd) > + (syntax-error tocheck "dot as last element in already improper pair")) > + ; more complex pairs > + ((? pair? 
a) > + (let > + ((head (drop-right a 1)) > + (tail (last-pair a))) > + (cond > + ((equal? repr-dot (car tail)) > + (syntax-error tocheck "equal? repr-dot : car tail")) > + ((equal? repr-dot (cdr tail)) > + (syntax-error tocheck "equal? repr-dot : cdr tail")) > + ((member repr-dot head) > + (syntax-error tocheck "member repr-dot head")) > + (else > + a)))) > + (a > + a)))))) > + > +(define (wisp-scheme-read-chunk port) > + "Read and parse one chunk of wisp-code" > + (let (( lines (wisp-scheme-read-chunk-lines port))) > + (wisp-make-improper > + (wisp-replace-empty-eof > + (wisp-unescape-underscore-and-colon > + (wisp-replace-paren-quotation-repr > + (wisp-propagate-source-properties > + (wisp-scheme-indentation-to-parens lines)))))))) > + > +(define (wisp-scheme-read-all port) > + "Read all chunks from the given port" > + (let loop > + ((tokens '())) > + (cond > + ((eof-object? (peek-char port)) > + tokens) > + (else > + (loop > + (append tokens (wisp-scheme-read-chunk port))))))) > + > +(define (wisp-scheme-read-file path) > + (call-with-input-file path wisp-scheme-read-all)) > + > +(define (wisp-scheme-read-file-chunk path) > + (call-with-input-file path wisp-scheme-read-chunk)) > + > +(define (wisp-scheme-read-string str) > + (call-with-input-string str wisp-scheme-read-all)) > + > +(define (wisp-scheme-read-string-chunk str) > + (call-with-input-string str wisp-scheme-read-chunk)) > + > diff --git a/module/language/wisp/spec.scm b/module/language/wisp/spec.scm > new file mode 100644 > index 000000000..d5ea7abce > --- /dev/null > +++ b/module/language/wisp/spec.scm > @@ -0,0 +1,107 @@ > +;; Language interface for Wisp in Guile > + > +;;; adapted from guile-sweet: https://gitorious.org/nacre/guile-sweet/source/ae306867e371cb4b56e00bb60a50d9a0b8353109:sweet/common.scm > + > +;;; Copyright (C) 2005-2014 by David A. Wheeler and Alan Manuel K. Gloria > +;;; Copyright (C) Arne Babenhauserheide (2014--2023). 
> + > +;;; Permission is hereby granted, free of charge, to any person > +;;; obtaining a copy of this software and associated documentation > +;;; files (the "Software"), to deal in the Software without > +;;; restriction, including without limitation the rights to use, copy, > +;;; modify, merge, publish, distribute, sublicense, and/or sell copies > +;;; of the Software, and to permit persons to whom the Software is > +;;; furnished to do so, subject to the following conditions: > +;;; > +;;; The above copyright notice and this permission notice shall be > +;;; included in all copies or substantial portions of the Software. > +;;; > +;;; THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, > +;;; EXPRESS OR IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF > +;;; MERCHANTABILITY, FITNESS FOR A PARTICULAR PURPOSE AND > +;;; NONINFRINGEMENT. IN NO EVENT SHALL THE AUTHORS OR COPYRIGHT HOLDERS > +;;; BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER LIABILITY, WHETHER IN AN > +;;; ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM, OUT OF OR IN > +;;; CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN THE > +;;; SOFTWARE. > + > +; adapted from spec.scm: https://gitorious.org/nacre/guile-sweet/source/ae306867e371cb4b56e00bb60a50d9a0b8353109:sweet/spec.scm > +(define-module (language wisp spec) > + #:use-module (language wisp) > + #:use-module (system base compile) > + #:use-module (system base language) > + #:use-module (language scheme compile-tree-il) > + #:use-module (language scheme decompile-tree-il) > + #:export (wisp)) > + > +; Set locale to something which supports unicode. Required to avoid using fluids. > +(catch #t > + (lambda () > + (setlocale LC_ALL "")) > + (lambda (key . parameters) > + (let ((locale-fallback "en_US.UTF-8")) > + (format (current-error-port) > + (string-join > + (list ";;; Warning: setlocale LC_ALL \"\" failed with ~A: ~A" > + "switching to explicit ~A locale. Please setup your locale." 
> + "If this fails, you might need glibc support for unicode locales.\n") > + "\n;;; ") > + key parameters locale-fallback) > + (catch #t > + (lambda () > + (setlocale LC_ALL locale-fallback)) > + (lambda (key . parameters) > + (format (current-error-port) > + (string-join > + (list ";;; Warning: fallback setlocale LC_ALL ~A failed with ~A: ~A" > + "Not switching to Unicode." > + "You might need glibc support for unicode locales.\n") > + "\n;;; ") > + locale-fallback key parameters)))))) > + > +;;; > +;;; Language definition > +;;; > + > +(define wisp-pending-sexps (list)) > + > +(define (read-one-wisp-sexp port env) > + ;; allow using "# foo" as #(foo). > + (read-hash-extend #\# (λ (chr port) #\#)) > + (cond > + ((eof-object? (peek-char port)) > + (read-char port )); return eof: we’re done > + (else > + (let ((chunk (wisp-scheme-read-chunk port))) > + (cond > + ((not (null? chunk)) > + (car chunk)) > + (else > + #f)))))) > + > +(define-language wisp > + #:title "Wisp Scheme Syntax. See SRFI-119 for details." > + ; . #:reader read-one-wisp-sexp > + #:reader read-one-wisp-sexp ; : lambda (port env) : let ((x (read-one-wisp-sexp port env))) (display x)(newline) x ; > + #:compilers `((tree-il . ,compile-tree-il)) > + #:decompilers `((tree-il . ,decompile-tree-il)) > + #:evaluator (lambda (x module) (primitive-eval x)) > + #:printer write ; TODO: backtransform to wisp? Use source-properties? > + #:make-default-environment > + (lambda () > + ;; Ideally we'd duplicate the whole module hierarchy so that `set!', > + ;; `fluid-set!', etc. don't have any effect in the current environment. > + (let ((m (make-fresh-user-module))) > + ;; Provide a separate `current-reader' fluid so that > + ;; compile-time changes to `current-reader' are > + ;; limited to the current compilation unit. > + (module-define! m 'current-reader (make-fluid)) > + ;; Default to `simple-format', as is the case until > + ;; (ice-9 format) is loaded. 
This allows > + ;; compile-time warnings to be emitted when using > + ;; unsupported options. > + (module-set! m 'format simple-format) > + m))) > + > + > + > diff --git a/test-suite/tests/srfi-119.test b/test-suite/tests/srfi-119.test > new file mode 100644 > index 000000000..a888df41d > --- /dev/null > +++ b/test-suite/tests/srfi-119.test > @@ -0,0 +1,81 @@ > +;;;; srfi-119.test --- Test suite for Guile's SRFI-119 reader. -*- scheme -*- > +;;;; > +;;;; Copyright (C) 2023 Free Software Foundation, Inc. > +;;;; > +;;;; This library is free software; you can redistribute it and/or > +;;;; modify it under the terms of the GNU Lesser General Public > +;;;; License as published by the Free Software Foundation; either > +;;;; version 3 of the License, or (at your option) any later version. > +;;;; > +;;;; This library is distributed in the hope that it will be useful, > +;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of > +;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU > +;;;; Lesser General Public License for more details. > +;;;; > +;;;; You should have received a copy of the GNU Lesser General Public > +;;;; License along with this library; if not, write to the Free Software > +;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA > + > +(define-module (test-srfi-119) > + #:use-module (test-suite lib) > + #:use-module (srfi srfi-1) > + #:use-module (language wisp)) > + > +(define (read-string s) > + (with-input-from-string s read)) > + > +(define (with-read-options opts thunk) > + (let ((saved-options (read-options))) > + (dynamic-wind > + (lambda () > + (read-options opts)) > + thunk > + (lambda () > + (read-options saved-options))))) > + > +(define (wisp->list str) > + (wisp-scheme-read-string str)) > + > +(with-test-prefix "wisp-read-simple" > + (pass-if (equal? (wisp->list "<= n 5") '((<= n 5)))) > + (pass-if (equal? (wisp->list ". 5") '(5))) > + (pass-if (equal? 
(wisp->list "+ 1 : * 2 3") '((+ 1 (* 2 3)))))) > +(with-test-prefix "wisp-read-complex" > + (pass-if (equal? (wisp->list " > +a b c d e > + . f g h > + . i j k > + > +concat \"I want \" > + getwish from me > + . \" - \" username > +") '( > +(a b c d e > + f g h > + i j k) > + > +(concat "I want " > + (getwish from me) > + " - " username)))) > + > + (pass-if (equal? (wisp->list " > +define : a b c > +_ d e > +___ f > +___ g h > +__ . i > + > +define : _ > +_ display \"hello\n\" > + > +\\_") '( > +(define (a b c) > + (d e > + (f) > + (g h) > + i)) > + > +(define (_) > + (display "hello\n")) > + > +(_))))) > -- > 2.39.1 ^ permalink raw reply [flat|nested] 56+ messages in thread
* Re: [PATCH] add language/wisp to Guile?
From: Maxime Devos @ 2023-09-30 20:09 UTC
To: Christine Lemmer-Webber, Dr. Arne Babenhauserheide; +Cc: guile-devel

On 30-09-2023 at 15:17, Christine Lemmer-Webber wrote:
> Haven't fully caught up on this thread, but as a side note I have a
> mostly-finished implementation of a Wisp parser which takes a very
> different approach than Arne's, and was more understandable to me
> personally, a bit more functional and recursive-descent style.
>
> I could make it available if anyone is curious.
>
> Would love to see Wisp in Guile proper!

I think I technically can't count as ‘curious’ here, as I don't think
I'll actually read this other implementation, but I would be interested
in it nonetheless (to potentially replace Arne's implementation later,
once it's complete), because it sounds like you might have avoided the
REPR (*) bug from Arne's implementation.

(*) Extremely unlikely to be a problem in practice, but still a bug.

Best regards,
Maxime Devos.
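For readers skimming the thread, here is a minimal sketch of what the REPR clash presumably refers to. It assumes the concern is that the patch's placeholder symbols (REPR-QUOTE-<uuid> and friends) are ordinary symbols that a program could also contain. `replace-repr` is a hypothetical, cut-down stand-in for the patch's `wisp-replace-paren-quotation-repr` that only handles quote; it is not the code from the patch.

```scheme
(use-modules (ice-9 match))

;; Placeholder symbol built the same way as repr-quote in the patch.
(define repr-quote
  (string->symbol "REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd"))

;; Hypothetical, simplified stand-in for wisp-replace-paren-quotation-repr:
;; a list that starts with the placeholder becomes a quoted list.
(define (replace-repr code)
  (match code
    (('REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd rest ...)
     (list 'quote (map replace-repr rest)))
    ((items ...)
     (map replace-repr items))
    (atom atom)))

;; Intended use: the reader wrapper emitted the placeholder for a leading quote.
(define from-prefix
  (replace-repr (list repr-quote 'a 'b)))

;; Clash: the same symbol arrives as ordinary data in the source text and
;; is rewritten in exactly the same way, because the pass cannot tell
;; the two cases apart.
(define from-user-code
  (replace-repr
   (call-with-input-string
       "(REPR-QUOTE-e749c73d-c826-47e2-a798-c16c13cb89dd a b)"
     read)))
```

Both `from-prefix` and `from-user-code` evaluate to the list `(quote (a b))`. A dedicated record type for such markers, instead of a symbol with an unlikely name, would make the two cases distinguishable.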
* Re: [PATCH] add language/wisp to Guile?
From: Christine Lemmer-Webber @ 2023-10-02 14:48 UTC
To: Maxime Devos; +Cc: Dr. Arne Babenhauserheide, guile-devel

Maxime Devos <maximedevos@telenet.be> writes:

> On 30-09-2023 at 15:17, Christine Lemmer-Webber wrote:
>> Haven't fully caught up on this thread, but as a side note I have a
>> mostly-finished implementation of a Wisp parser which takes a very
>> different approach than Arne's, and was more understandable to me
>> personally, a bit more functional and recursive-descent style.
>> I could make it available if anyone is curious.
>> Would love to see Wisp in Guile proper!
>
> I think I technically can't count as ‘curious’ here, as I don't think
> I'll actually read this other implementation, but I would be
> interested in it nonetheless (to potentially replace Arne's
> implementation later, once it's complete), because it sounds like you
> might have avoided the REPR (*) bug thing from Arne's implementation.
>
> (*) Extremely unlikely to be a problem in practice, but still a bug.
>
> Best regards,
> Maxime Devos.

Well, it looks like I did make it available already; I just forgot, and
didn't advertise it much:

  https://gitlab.com/dustyweb/rewisp/-/blob/main/rewisp.scm?ref_type=heads

It really isn't very complete, but here's some of the example data I
was using:

(define wisp-fac-noinfix-repeat
  "\
define : factorial n ; foo
__  if : zero? n
____   . 1
____   * n : factorial : - n 1

define : factorial n
__  if : zero? n
____   . 1
____   * n : factorial : - n 1")

scheme@(rewisp)> ,pp (parse-lines->sexp (call-with-input-string wisp-fac-noinfix-repeat read-wisp-lines))
$6 = ((define (factorial n)
        (if (zero? n) 1 (* n (factorial (- n 1)))))
      (define (factorial n)
        (if (zero? n) 1 (* n (factorial (- n 1))))))

What's kind of interesting is to look at it before the
parse-lines->sexp step though:

scheme@(rewisp)> ,pp (call-with-input-string wisp-fac-noinfix-repeat read-wisp-lines)
$7 = (#<<line> indent: 0 args: (#<syntax:unknown file:1:0 define> (#<syntax:unknown file:1:9 factorial> #<syntax:unknown file:1:19 n>))>
      #<<line> indent: 4 args: (#<syntax:unknown file:2:4 if> (#<syntax:unknown file:2:9 zero?> #<syntax:unknown file:2:15 n>))>
      #<<line> indent: 7 args: (#<<DOT> source-loc: #(#f 2 7)> #<syntax:unknown file:3:9 1>)>
      #<<line> indent: 7 args: (#<syntax:unknown file:4:7 *> #<syntax:unknown file:4:9 n> (#<syntax:unknown file:4:13 factorial> (#<syntax:unknown file:4:25 -> #<syntax:unknown file:4:27 n> #<syntax:unknown file:4:29 1>)))>
      #f
      #<<line> indent: 0 args: (#<syntax:unknown file:6:0 define> (#<syntax:unknown file:6:9 factorial> #<syntax:unknown file:6:19 n>))>
      #<<line> indent: 4 args: (#<syntax:unknown file:7:4 if> (#<syntax:unknown file:7:9 zero?> #<syntax:unknown file:7:15 n>))>
      #<<line> indent: 7 args: (#<<DOT> source-loc: #(#f 7 7)> #<syntax:unknown file:8:9 1>)>
      #<<line> indent: 7 args: (#<syntax:unknown file:9:7 *> #<syntax:unknown file:9:9 n> (#<syntax:unknown file:9:13 factorial> (#<syntax:unknown file:9:25 -> #<syntax:unknown file:9:27 n> #<syntax:unknown file:9:29 1>)))>)

So what rewisp does is build up a list of lines, parsing each line
individually and noting its indentation level and the "arguments" that
appear in it. It then has a separate step that assembles the lines into
a sexp structure by examining the indentation level and whether or not
it sees the special <DOT> record. Guile's syntax records are used so
that this can work correctly with the rest of Guile's tools.

So, that's it: first, read each line one by one, without trying to
figure out its relationship to the other lines, which gives a flat
structure. Next, turn that flat structure into a nested structure.

I thought the design was pretty good.
- Christine ^ permalink raw reply [flat|nested] 56+ messages in thread
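
The two-phase design described in the message above (first read every line into a flat record, then nest the records by indentation) can be sketched in a few lines of Guile. This is a minimal illustration under simplifying assumptions, not the rewisp code: the names `<flat-line>`, `read-flat-lines`, `nest-lines` and `wisp-string->sexps` are made up for this sketch, plain datums stand in for Guile syntax objects, a boolean flag stands in for the `<DOT>` record, and comments are stripped naively (a `;` inside a string literal would be misread).

```scheme
(use-modules (srfi srfi-1)      ; span
             (srfi srfi-9)      ; define-record-type
             (ice-9 rdelim)     ; read-line
             (ice-9 receive))   ; receive

;; Hypothetical stand-in for rewisp's line record: indentation, the
;; parsed arguments, and whether the line started with ".".
(define-record-type <flat-line>
  (make-flat-line indent args dot?)
  flat-line?
  (indent flat-line-indent)
  (args flat-line-args)
  (dot? flat-line-dot?))

;; Naive comment stripping: cut at the first ";" (ignores strings).
(define (strip-comment s)
  (let ((i (string-index s #\;)))
    (if i (substring s 0 i) s)))

;; Leading spaces and underscores both count as indentation.
(define (count-indent s)
  (or (string-index s (lambda (c) (not (memv c '(#\space #\_)))))
      (string-length s)))

;; "a : b : c d" => (a (b (c d))): an inline colon opens a list that
;; runs to the end of the line.  A "." in the middle is not handled.
(define (group-colons tokens)
  (cond ((null? tokens) '())
        ((string=? (car tokens) ":") (list (group-colons (cdr tokens))))
        (else (cons (call-with-input-string (car tokens) read)
                    (group-colons (cdr tokens))))))

;; Phase 1: each line becomes a flat record, or #f for a blank line.
(define (parse-line text)
  (let* ((code (strip-comment text))
         (indent (count-indent code))
         (tokens (string-tokenize (substring code indent))))
    (cond ((null? tokens) #f)
          ((string=? (car tokens) ".")
           (make-flat-line indent (group-colons (cdr tokens)) #t))
          (else (make-flat-line indent (group-colons tokens) #f)))))

(define (read-flat-lines port)
  (let loop ((acc '()))
    (let ((text (read-line port)))
      (if (eof-object? text)
          (reverse acc)
          (loop (cons (parse-line text) acc))))))

;; Phase 2: a line owns every following line that is indented deeper.
;; A normal line yields one list; a dot line splices its contents.
(define (nest-lines lines)
  (if (null? lines)
      '()
      (let ((head (car lines)))
        (receive (children rest)
            (span (lambda (l) (> (flat-line-indent l) (flat-line-indent head)))
                  (cdr lines))
          (let ((items (append (flat-line-args head) (nest-lines children))))
            (append (if (flat-line-dot? head) items (list items))
                    (nest-lines rest)))))))

(define (wisp-string->sexps str)
  ;; Blank lines (#f) are simply dropped in this sketch.
  (nest-lines (filter identity (call-with-input-string str read-flat-lines))))

;; Example, using the first definition from the message above:
(define sample
  "define : factorial n ; foo
__  if : zero? n
____   . 1
____   * n : factorial : - n 1")

(write (wisp-string->sexps sample))
(newline)
;; prints ((define (factorial n) (if (zero? n) 1 (* n (factorial (- n 1))))))
```

Keeping phase 1 free of any cross-line logic means each record can be inspected on its own, which is what the `,pp` output in the message shows; all decisions about nesting live in the single `nest-lines` procedure.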