On 26-02-2023 at 08:45, Philip McGrath wrote:
> Hi,
> 
> On Sat, Feb 18, 2023, at 10:58 AM, Maxime Devos wrote:
>> On 18-02-2023 04:50, Philip McGrath wrote:
>>> I haven't read the patch or this thread closely,
>>
>> I'll assume you have read it non-closely.
>>
>>> but R6RS has an answer to any concerns about compatibility with `#lang`. At the beginning of Chapter 4, "Lexical and Datum Syntax" (<http://www.r6rs.org/final/html/r6rs/r6rs-Z-H-7.html#node_chap_4>) the report specifies:
>>>
>>>>    An implementation must not extend the lexical or datum syntax in any way, with one exception: it need not treat the syntax `#!<identifier>`, for any <identifier> (see section 4.2.4) that is not `r6rs`, as a syntax violation, and it may use specific `#!`-prefixed identifiers as flags indicating that subsequent input contains extensions to the standard lexical or datum syntax. The syntax `#!r6rs` may be used to signify that the input afterward is written with the lexical syntax and datum syntax described by this report. `#!r6rs` is otherwise treated as a comment; see section 4.2.3.
>>
>> That is for '#!lang', not '#lang'.  R6RS allows the former, but the
>> patch does the latter.  As such, R6RS does not have an answer about
>> incompatibility with `#lang', unless you count ‘it's incompatible’ as an
>> answer.
>>
> 
> Let me try to be more concrete.
> 
> If you want a portable, RnRS-standardized lexical syntax for `#lang`, use `#!<identifier>`, and systems that understand `#lang` will treat it (in appropriate contexts) as an alias for `#lang `.

RnRS only standardises #!r6rs, not #!<identifier>.  Even if RnRS 
standardised #!<identifier> for values of <identifier> that aren't r6rs, 
the RnRS only holds sway for Scheme, and one of the main points of 
Guile's language system is to support more than only Scheme.

> 
> Alternatively, you could embrace that Guile (like every other Scheme system I'm aware of) starts by default in a mode with implementation-specific extensions. Indeed, R6RS Appendix A specifically recognizes that "the default mode offered by a Scheme implementation may be non-conformant, and such a Scheme implementation may require special settings or declarations to enter the report-conformant mode" [1]. Then you could just write `#lang` and worry about the non-portable block comments some other day. This is what I would personally prefer.

Emphasis on 'non-conformant'.  The appendix states that Scheme 
implementations don't need to be R6RS by default; it doesn't state that 
non-conformant things are conformant with R6RS.

Remember that this part of the discussion started with:

‘The '#lang whatever' stuff makes Scheme (*) files unportable between
implementations, as '#lang scheme' is not a valid comment’.

The R6RS might permit non-R6RS implementations, but this does not make 
non-R6RS constructs like '#lang scheme' portable.


>>> In Racket, in the initial configuration of the reader when reading a file, "`#!` is an alias for `#lang` followed by a space when `#!` is followed by alphanumeric ASCII, `+`, `-`, or `_`." (See <https://docs.racket-lang.org/reference/reader.html#%28part._parse-reader%29>.) [...]
>>> (Guile does not handle `#!r6rs` properly, presumably because of the
>>> legacy `#!`/`!#` block comments. I think this should be a surmountable
>>> obstacle, though, especially since Guile does support standard `#|`/`|#`
>>> block comments.)
>>
>> ‘#! ... !#’ comments aren't legacy; they exist to allow putting the
>> shebang in the first line of a script, and to pass additional arguments
>> to the Guile interpreter (see: (guile)The Top of a Script File) (*).  As
>> such, you can't just replace them with #| ... |# (unless you patch the
>> kernel to recognise "#| ..." as a shebang line).
>>
>> (*) Maybe they exist for other purposes too.
> 
> According to "(guile)Block Comments", the `#!...!#` syntax existed before Guile 2.0 added support for `#|...|#` comments from SRFI 30 and R6RS.

I agree, and I don't follow what your point is here.


>> Furthermore, according to the kernel, #!r6rs would mean that the script
>> needs to be interpreted by a program named 'r6rs', but 'guile' is named
>> 'guile', not 'r6rs'.  (I assume this is in POSIX somewhere, though I
>> couldn't find it.)
>>
>> (This is an incompatibility between R6RS and any system that has shebangs.)
>>
> 
> This is not an incompatibility, because the `#!r6rs` lexeme (or `#!<identifier>`, more generally) is not the shebang line for the script. R6RS Appendix D [2] gives this example of a Scheme script:
> 
> ```
> #!/usr/bin/env scheme-script
> #!r6rs
> (import (rnrs base)
>          (rnrs io ports)
>          (rnrs programs))
> (put-bytes (standard-output-port)
>             (call-with-port
>                 (open-file-input-port
>                   (cadr (command-line)))
>               get-bytes-all))
> ```

OK, didn't notice that appendix.  Only covers Scheme, though.
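
To make the distinction concrete, here is a minimal sketch (in Python, 
purely for illustration, not Guile code; the function name and the 
simplified identifier pattern are my own assumptions) of how a reader 
could tell a shebang line apart from an R6RS-style '#!<identifier>' flag, 
using the Appendix D convention that a first line starting with '#!/' or 
'#!<space>' is ignored:

```python
import re

# R6RS-style flag such as "#!r6rs".  The identifier pattern is simplified
# (an assumption for this sketch, not the full R6RS <identifier> grammar).
FLAG_RE = re.compile(r"#!([A-Za-z][A-Za-z0-9+/_-]*)")


def classify_first_line(line):
    """Classify the first line of a script (hypothetical helper).

    Returns ("shebang", None), ("flag", identifier) or ("other", None).
    """
    # Appendix D convention: "#!/" or "#!<space>" marks a shebang line.
    if line.startswith("#!/") or line.startswith("#! "):
        return ("shebang", None)
    match = FLAG_RE.match(line)
    if match:
        return ("flag", match.group(1))
    return ("other", None)
```

With this, '#!/usr/bin/env scheme-script' is skipped as a shebang, 
whereas '#!r6rs' on the next line is treated as a flag.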

> -- 
> The appendix says that, "if the first line of a script begins with `#!/` or `#!<space>`, implementations should ignore it on all platforms, even if it does not conform to the recommended syntax". Admittedly this is not handled as consistently as I would prefer: I wish they had just standardized `#!/` and `#! ` as special comment syntax, as Racket does, and clarified the interaction with `#!<identifier>`. But Matt points out that JavaScript also has very similar special treatment for a single initial shebang comment. Lua has a similar mechanism: my vague recollection is that many languages do.

I do not follow what your point is here -- I only (falsely) claimed that 
POSIX and R6RS are incompatible w.r.t. shebangs and "#!"; I did not make 
such claims for other languages -- some other languages don't even have 
"#!" (e.g. BASIC).

>>>>
>>>> (^) it doesn't integrate with the module system -- more concretely,
>>>> (use-modules (foo)) wouldn't try loading foo.js -- adding '-x' arguments
>>>> would solve that, but we agree that that would be unreasonable in many
>>>> situations.  (Alternatively one could place ECMAScript code in a file
>>>> with extension '.scm' with a '#lang' / '-*- mode: ecmascript -*-', but
>>>> ... no.)
> 
> Generally I would use `.scm` (or `.rkt`), and certainly I would do so if there isn't some well-established other extension. If you are just using the file, you shouldn't necessarily have to care what language it's implemented in internally.

Maybe you would, but Guile shouldn't require people to change the 
extension of source files to something invalid, as I pointed out with 
the ECMAScript example.  .scm means Scheme, not ECMAScript.

As such, support for non-.scm file extensions is required.

> In particular, I don't think the `#lang` concept should be conflated with editor configuration like `'-*- mode: ecmascript -*-`.
> [...]

Then don't do that, and use non-editor configuration like
'-*- programming-language: ecmascript -*-' instead.  While Emacs is the 
main user of '-*- ... -*-' lines, there is nothing stopping us from 
adding a few variables like e.g. 'programming-language' (*) that Emacs 
doesn't assign a meaning to.

(*) I don't actually know if Emacs assigns a meaning to this variable or 
not.  Some other word might perhaps be needed.

For convenience, I would recommend supporting '-*- mode: ... -*-' too, 
such that non-Scheme source files can sometimes be loaded without making 
any Guile-specific changes to the source files.  If whoever writes or 
reads the source file wants to use another Emacs mode, or if the mode 
is ambiguous because it covers multiple languages, there is nothing 
stopping them from setting both 'mode: ...' and 'programming-language: ...':

% -*- programming-language: datalog; mode: racket -*-
[...]

> As an example, consider these two Racket programs:
> 
> ```
> #!datalog
> parent(anchises, aeneas).
> parent(aeneas, ascanius).
> ancestor(A, B) :- parent(A, B).
> ancestor(A, B) :- parent(A, C), ancestor(C, B).
> ancestor(A, ascanius)?
> ```
> 
> ```
> #lang algol60
> begin
>      comment Credit to Rosetta Code;
>      integer procedure fibonacci(n); value n; integer n;
>      begin
>          integer i, fn, fn1, fn2;
>          fn2 := 1;
>          fn1 := 0;
>          fn  := 0;
>          for i := 1 step 1 until n do begin
>              fn  := fn1 + fn2;
>              fn2 := fn1;
>              fn1 := fn
>          end;
>          fibonacci := fn
>      end;
>   
>      integer i;
>      for i := 0 step 1 until 20 do printnln(fibonacci(i))
> end
> ```
> 
> While I'm sure there are Emacs modes available for Datalog and Algol 60, and some people might want to use them for these programs, I would probably want to edit them both in racket-mode: because racket-mode supports the `#lang` protocol, it can obtain the syntax highlighting, indentation, and other support defined by each language, while also retaining the global features that all `#lang`-based languages get "for free", like a tool to rename variables that respects the actual model of scope. This is one of the value propositions of the `#lang` system.

As pointed out by my previous example, this is solved by '-*- ... -*-' too.

> 
>>>
>>> Racket has a mechanism to enable additional source file extensions without needing explicit command-line arguments by defining `module-suffixes` or `doc-modules-suffixes` in a metadata module that is consulted when the collection is "set up": https://docs.racket-lang.org/raco/setup-info.html However, this mechanism is not widely used.
>>
>> I guess this is an improvement over the runtime 'guile -x extension'.
>> However, if I'm understanding 'setup-info.html' correctly, the downside
>> is that you now need a separate file containing compilation settings.
>>
>> I have previously proposed a mechanism that makes the '-x' +
>> '--language' a compile-time thing (i.e., embed the source file extension
>> in the compiled .go; see previous e-mails in this thread), without
>> having to make a separate file containing compilation settings.
>>
>> How is Racket's method an improvement over my proposal?
>>
> 
> My focus in this thread is explaining and advocating for `#lang`. I see the whole business with file extensions as basically orthogonal to `#lang`, and my opinions about it are much less strong, but I'll try to answer your question. I think it would make sense for `.go` files to record the file extension of their corresponding source files: Racket's `.zo` files do likewise. I don't object to a command-line option *at compile-time* (as you said) to enable additional file extensions, and I agree that there isn't a huge difference between that and an approach with a separate configuration file, though I do find the configuration-file approach somewhat more declarative, which I prefer.

'--language whatever' appears pretty declarative to me, as in it 
declares that the language is 'whatever'.

> What I was really trying to argue here is that the file extension should not determine the meaning of the program it contains: more on that below.

That's what the '--language whatever' compilation argument is for: it 
overrides the 'guess by file extension' fallback.

>>> Overall, the experience of the Racket community strongly suggests that a file should say what language it is written in. Furthermore, that language is a property of the code, not of its runtime environment, so environment variables, command-line options, and similar extralinguistic mechanism are a particularly poor fit for controlling it.
>>
>> Agreed on the 'no environment variables' thing, disagreed on the 'no
>> command-line options'.  In the past e-mails in this thread, there was
>> agreement on the ‘embed the source file extension in the compiled .go or
>> something like that; and add -x extension stuff _when compiling_ (not
>> runtime!) the software that uses the extension’.
>>
>> Do you have any particular issues with that proposal?  AFAICT, it solves
>> everything and is somewhat more straightforward than Racket.
>>
> 
> I don't have particular issues with a compile-time command-line option to determine which files to compile. I do object to using command-line options or file extensions to determine what language a file is written in.
> 
>>> File extensions are not the worst possible mechanisms, but they have similar problems: code written in an unsaved editor or a blog post may not have a file extension.
>>
>> With the proposal I wrote, it remains possible to override any 'file
>> extension -> language' mapping.  It's not in any way incompatible with
>> "-*- lang: whatever -*-"-like comments.
>>
>> Additionally, Guile can only load files that exist (i.e, 'saved'); Guile
>> is not an editor or blog reader, so these do not appear problems for
>> Guile to me.
>>
> 
> While it's true that the only files Guile can load are "files that exist", it's not true that "Guile can only load files": consider procedures like `eval-string`, `compile`, and, ultimately, `read-syntax`.
* read-syntax is for reading S-expressions -- it is only for Scheme,
   other languages are out-of-scope for that procedure.  As such,
   read-syntax appears irrelevant here to me.

* For 'compile' and 'eval-string', I'd like to point out that they
   have "#:from" and "#:lang" arguments to set the language, as
   you appear to know going by your responses below.  As such, even if
   Guile had an integrated editor, that editor could pass the language to
   Guile's compiler.

   I mean, if the editor is good, it has syntax highlighting, and to do
   syntax highlighting it needs to know the language, so it knows the
   language anyway (e.g. maybe it has separate "Write new Scheme" and
   "Write new ECMAScript" buttons, or maybe it has a 'mode: scheme' and
   'mode: ecmascript' setting like Emacs; being an editor, it then knows
   how to convert that editor configuration into #:from/#:lang).

* What I meant with 'Guile can only load files that exist',
   is that the files it loads are only those that exist.
   I did not mean that no loadable non-file things exist.

   The point here, is that if you wrote a blog post that defines the
   (foo) module and you enter (use-modules (foo)) in a Guile REPL, it
   isn't going to surf to your blog to download the (foo) module.  As
   Guile doesn't even know about your blog post, it has no use for any
   file extension or language declaration that your blog post about (foo)
   might or might not have.


> AFAICT, to the extent that Guile's current implementations of such procedures support multiple languages, they rely on out-of-band configuration, like an optional `#:language` argument, which is just as extra-linguistic as relying on command-line options, environment variables, or file extensions.

First, I never proposed relying on environment variables.  I oppose 
using environment variables for these things.  Why are you mentioning 
environment variables, when this has never been proposed?

Second, the implicit argument here appears to be 'extra-linguistic is 
bad, so we shouldn't do these extra-linguistic things'.  But what's the 
problem with being 'extra-linguistic'?  Some things, like environment 
variables, are plain bad here (no disagreement there), whereas file 
extensions are bad to rely on but acceptable and convenient as a fallback.

Third, I am not proposing to rely on command-line options and file 
extensions -- I only propose _using_ them, not _relying_ on them -- if 
someone wants to implement an in-band (intra-linguistic?) override like 
'-*- ... -*-'/#lang for file-extension based detection, they can do that 
-- my '-*- ... -*-' is just a proposed improvement over '#lang'.

Fourth, TBC, I'd like to point out that '-*- ... -*-' is equally 
'intra/extra-linguistic' as '#!lang' (see my response to 'magic 
comments' later), though I do know that's not the point you appear to be 
making right here.

> What I'm trying to advocate is that programs should say in-band, as part of their source code, what language they are written in.

That's done by '-*- ... -*-' too, and I haven't noticed any 
argumentation for ‘programs should say in-band what language they are 
written in’.

Also, there is a gap between the following five statements, which you 
appear to sometimes be conflating:

   (A) ‘Programs should say in-band what language they are written in.’
   (B) ‘Guile should use in-band information to determine what language a
       program is written in.’
   (C) ‘Guile should use out-of-band information to determine what
       language a program is written in.’
   (D) ‘Guile should exclusively use out-of-band information to determine
       what language a program is written in.’
   (E) ‘Guile should exclusively use in-band information to determine
       what language a program is written in.’

I disagree with (A), because often it's perfectly clear from context 
(out-of-band) what language it is.

Take for example Guile itself.  Being Guile, of course everything under 
'module/' is Scheme code.  Adding '#!r6rs' or '-*- language: scheme 
-*-' lines to every .scm isn't incorrect, but is rather silly. 
Likewise, I have written a Scheme library called 'Scheme-GNUnet'.  From 
the name alone, it is clear that it's Scheme.  More generally, usually 
it's pretty clear (for a human) which language it is by just looking at 
the code, and if not, probably the README mentions which language the 
software uses.

I don't dispute (B), but neither do I find it particularly important 
given that adding a '--language=whatever' argument is trivial.

I would like to point out that (A) does not imply (B) -- it is possible 
to consider it good practice to mention the language in-band, without 
any language implementations actually using this information.

More to the point, to me (A) appears irrelevant to this thread.  Sure, 
perhaps it's a good practice, but Guile is not a programmer; Guile is a 
language implementation.  (A) is only relevant insofar as Guile would make 
use of this in-band information.

> What I'm trying to advocate is that programs should say in-band, as
> part of their source code, what language they are written in.

This would be advocating for (A).  But as mentioned above, (A) is 
irrelevant by itself, and it doesn't imply (B).

It is also false -- you weren't advocating for (A), but for (B) -- (A) 
is just a means to (B) in your argumentation structure.  Quoting one of 
your first messages:

> To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>):
> 
>> On the change side, we need parallel installability for entire languages. Racket did a great job facilitating this with #lang and we should just adopt that.

You are also advocating for '(E)/not (C)':

> I do object to using command-line options or file extensions to determine what language a file is written in. 

You also appear to be thinking that I'm advocating for '(D)' -- while I 
agree with (D) (using a non-universal language construct (*) like 
'#lang' to determine the language something is written in, is rather 
circular), I'm not arguing for it.

(*) Again, #lang is rather Racket-specific, whereas comments are mostly 
universal.

>> If the editor needs to determine the language for syntax highlighting or
>> such, then there exist constructs like ';; -*- mode: scheme -*-' that
>> are valid Scheme, but that's not a Guile matter.
>>
> 
> See above for why the `#!language/wisp` option is perfectly valid R6RS Scheme

Wisp isn't R6RS.  Wisp code needs to be valid Wisp, not valid R6RS 
Scheme.   There also exist languages beyond Wisp and Scheme.

> and for some of my concerns about overloading editor configuration to determine the semantics of programs.

See above replies.

> More broadly, everyone who reads a piece of source code, including humans as well as editors and the `guile` executable, needs to know what language it's written in to hope to understand it.

For programmers, this is covered by:

   * looking at the code -- even without any explicit in-band information
     like ';; -*- ... -*-' comments or "#lang", or out-of-band
     information like file extensions, a README or a Makefile with
     compilation options, it usually is pretty clear what language it is in.

   * usually source code is in files, which usually have file extensions.
     Usually there's a good mapping from file extension to language,
     e.g. .scm files only contain Scheme, .js files only contain
     ECMAScript, ...

For editors, this is covered by:

   * Editor configuration like '-*- mode: scheme -*-'.
   * Language-specific declarations like #lang, #!r6rs,
     '-*- programming-language: scheme -*-'
   * File extensions.
   * If the editor guessed wrong, likely the syntax highlighting is
     wrong etc., so the programmer gives a hint to the editor
     (e.g. by adding a -*- mode: scheme -*- line, or #!r6rs, ...)

For the Guile executable, this is covered by:

   * --language=.../#:from/#:lang arguments.
   * -*- ... -*- / #!r6rs lines (but not #lang except when needed for
     compatibility with Racket, otherwise Guile would create
     incompatibilities.)
   * File extensions.
   * Default to Scheme.
   * If guessing wrong, there will almost surely be some parsing error,
     in which case the programmer will intervene by modifying a single
     line in the Makefile or such to add a "--language=" argument, or, if
     they insist on spending much more time than needed, by adding
     "-*- programming-language: whatever -*-" comments to every single
     source file.
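
As a rough illustration of that fallback chain, here is a minimal sketch 
in Python (not Guile code).  The function name, the extension table and 
the exact precedence between an explicit argument and an in-band 
declaration are assumptions on my part; the only thing taken from the 
list above is the general order 'explicit argument, in-band declaration, 
file extension, default to Scheme':

```python
import os

# Hypothetical extension -> language table; the entries are illustrative.
EXTENSION_LANGUAGES = {".scm": "scheme", ".js": "ecmascript", ".el": "elisp"}


def determine_language(file_name, explicit=None, declared=None):
    """Pick a language using the fallback order sketched above.

    explicit: value of a --language=... / #:from / #:lang argument, if any.
    declared: language named by an in-band declaration in the file, if any.
    """
    if explicit is not None:
        # 1. An explicit argument overrides everything else.
        return explicit
    if declared is not None:
        # 2. In-band declaration ('-*- ... -*-', #!r6rs, ...).
        return declared
    extension = os.path.splitext(file_name)[1]
    if extension in EXTENSION_LANGUAGES:
        # 3. Guess by file extension.
        return EXTENSION_LANGUAGES[extension]
    # 4. Default to Scheme.
    return "scheme"
```

For example, determine_language("foo.js") guesses ECMAScript from the 
extension, while determine_language("foo.js", explicit="scheme") lets the 
explicit argument win.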

>>> (For more on this theme, see the corresponding point of the Racket Manifesto: <https://cs.brown.edu/~sk/Publications/Papers/Published/fffkbmt-racket-manifesto/paper.pdf>) Actually writing the language into the source code has proven to work well.
>>
>> What is the corresponding point?  I'm not finding any search results for
>> 'file extension' or 'file name', and I'm not finding any relevant search
>> results for 'editor'.  Could you give me a page reference and a relevant
>> quote?
>>
> 
> I was trying to refer to section 5, "Racket Internalizes Extra-Linguistic Mechanisms", which begins on p. 121 (p. 9 of the PDF). Admittedly, the connection between the main set of examples they discuss and this conversation is non-obvious. Maybe the most relevant quote is the last paragraph of that section, on p. 123 (PDF p. 11): "Finally, Racket also internalizes other aspects of its context. Dating back to the beginning, Racket programs can programmatically link modules and classes. In conventional languages, programmers must resort to extra-linguistic tools to abstract over such linguistic constructs; only ML-style languages and some scripting languages make modules and classes programmable, too." (Internal citations omitted.)

This e-mail thread is about determining the language, not classes and 
modules.  Trying to decode this vague paragraph, the relevant bit here 
appears to be ‘must resort to _extra-linguistic_ tools to abstract over such 
_linguistic constructs_’.

As such, I assume that 'extra-linguistic' refers to file extensions (and 
other things, but it's the file extensions that are relevant here).
Using this guess to unvaguify the phrasing, I get:

   ‘Programmers must resort to using file extensions to indicate which 
language a program is written in.’

However, that this is a bad thing appears to be the point that you were 
making in the first place, for which you gave the PDF as a source, so 
this doesn't explain anything.

>>> To end with an argument from authority, this is from Andy Wingo's "lessons learned from guile, the ancient & spry" (<https://wingolog.org/archives/2020/02/07/lessons-learned-from-guile-the-ancient-spry>):
>>>
> 
> Sorry, this was meant to be tongue-in-cheek, and it seems that didn't come across. "Argument from authority" is often considered a category of logical fallacy, and ending with a quote is sometimes considered to be bad style or to weaken a piece of persuasive writing.
> 
>>     * I previously pointed out some problems with that proposal
>>       -- i.e., '#lang whatever' is bogus Scheme / Wisp / ...,
> 
> I hope I've explained why something like `#!language/wisp` is perfectly within the bounds of R6RS.

No, because Wisp is not R6RS -- R6RS is only relevant insofar as the Wisp 
standard delegates to R6RS. (TBC I'm not claiming that #!language/wisp 
is invalid Wisp, I'm only claiming that your argumentation has holes here.)

Also, you forgot the '...' in 'Scheme / Wisp / ...' -- while R6RS is 
somewhat relevant to Wisp, there exist languages over which the R6RS has 
no sway, e.g. BASIC.

> Also, given that Guile already starts with non-standard extensions enabled by default, I don't see any reason not to also support `#lang language/wisp`.

Here is a reason for not adding non-standard extensions, from a previous 
reply of mine:

> The '#lang whatever' stuff makes Scheme (*) files unportable between implementations, as '#lang scheme' is not a valid comment -- there exist Schemes beyond Guile and Racket.  If it were changed to recognising
> '-*- mode: scheme -*-' or '-*- language: scheme -*-' or such, it would be better IMO, but insufficient, because (^).
>
> (*) Same argument applies for some, but not all, other non-Scheme languages too.

That Guile might have made some mistakes with non-standard 
enabled-by-default language extensions in the past, does not mean that 
it should make more mistakes in the present.

> In particular, the spelling of `#lang` proceeds directly from the Scheme tradition. This is from the R6RS Rationale document, chapter 4, "Lexical Syntax", section 3, "Future Extensions" [3]: [...]

Again, the Scheme tradition holds no sway over non-Scheme languages 
(except for situations like Wisp, perhaps), e.g. Pascal and BASIC.
Guile does not limit itself to Scheme languages, e.g. it has some 
support for elisp, brainfuck and python (see: python-on-guile).

>>       and
>>       'the module system won't find it, because of the unexpected
>>       file extensions'.
>>
> 
> This is indeed something that needs to be addressed, but it seems like a very solvable problem. Using the extension ".scm" for everything would be one trivial solution. Something like your proposal to enable file extensions based on a compile-time option could likewise be part of a solution.

The problem with the 'use .scm for everything' solution is that you 
would need to use .scm for everything, even non-Scheme files, and even 
when the source code comes from a project that uses a non-Guile 
implementation and as such uses very different extensions, e.g. '.js'.

> In general, I'll say that, while using Guile, I've often missed Racket's more flexible constructs for importing modules. I especially miss `(require "foo/bar.rkt")`, which imports a module at a path relative to the module where the `require` form appears: it makes it easy to organize small programs into multiple files without having to mess with a load path.

I fail to see the relevance of this comment.  Also, 'include' already 
does something pretty close to this; presumably 'use-modules' could be 
modified to accept a (hypothetical) #:relative-source-file-name argument:

(define-module (baz)) ; /project/baz.scm
;; -> /project/foo/bar.rkt
(use-modules ((foo bar) #:relative-source-file-name "foo/bar.rkt"))

> On Thu, Feb 23, 2023, at 1:42 PM, Maxime Devos wrote:
>> Have you seen my messages on how the "#lang" construct is problematic
>> for some languages, and how alternatives like "[comment delimiter] -*-
>> stuff: scheme/ecmascript/... -*- [comment delimiter]" appear to be
>> equally simple (*) and not have any downsides (**).
>>
>> (*) The port encoding detection supports "-*- coding: whatever -*-",
>> presumably that functionality could be reused.
>>
> 
> IMO, the use of  "-*- coding: whatever -*-" to detect encoding is an ugly hack and should not be extended further.
> 
> I tried to raise some objections above to conflating editor configuration with syntax saying what a file's language is.
> 
> More broadly, I find "magic comments" highly objectionable. The whole point of comments is to be able to communicate freely to human readers without affecting the interpreter/compiler/evaluator. Introducing magic comments means must constantly think about whether what you are writing for humans might change the meaning of your program. Magic comments *without knowing a priori what is a comment* are even worse: now, you have to beware of accidental "magic" in ALL of the lexical syntax of your program. (Consider that something like `(define (-*- mode: c++ -*-) 14)` is perfectly good Scheme.)

I object to the second claim -- while I can't account for aliens given 
the lack of them, I find it pointless to restrict the purpose of 
comments to human animals.

The third and penultimate claims are false.  If implemented correctly in 
Guile, only the first language declaration counts; it's not 'ALL of the 
lexical syntax of your program'.

You previously claimed that programs should contain in-band information 
on which language something is written in.  If this is followed, your 
example would actually look like:

;; -*- programming-language: scheme -*-
;; ^ or mode: c++, or #!r6rs, or an out-of-band --language=..., ...
(define (-*- mode: c++ -*-) 14)

As the relevant '-*- ...: scheme -*-' precedes the irrelevant '-*- mode: 
c++ -*-', it's the relevant one that is picked up by Guile, not the 
irrelevant one.
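
To illustrate the 'only the first declaration counts' rule, here is a 
minimal sketch in Python (not Guile code; the function name and the 
simplified 'key: value; key: value' parsing are assumptions made for 
this example, not how Guile's existing 'coding:' detection is written):

```python
import re

# Matches one "-*- key: value; key: value -*-" block on a single line.
# Non-greedy, so only the text up to the nearest closing "-*-" is captured.
DECLARATION_RE = re.compile(r"-\*-(.*?)-\*-")


def first_declaration(source):
    """Return the variables of the first '-*- ... -*-' block as a dict.

    Later blocks are deliberately ignored.
    """
    match = DECLARATION_RE.search(source)
    if match is None:
        return {}
    variables = {}
    for item in match.group(1).split(";"):
        key, separator, value = item.partition(":")
        if separator:
            variables[key.strip()] = value.strip()
    return variables
```

Applied to the three-line example above, this returns only 
'programming-language: scheme'; the later '-*- mode: c++ -*-' inside the 
'define' form is never looked at.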

As such, as long as the programmer uses the '--language=' compilation 
option in the Makefile, or puts a 'real' language declaration in the 
beginning of the source file (as a 'magic comment',  or #!r6rs, or #lang 
as far as required for compatibility with Racket), things will work out.

Even if the programmer doesn't do any of that, it's still unproblematic, 
because of error messages at compilation / interpretation time -- 
different languages tend to have incompatible syntax, if you pass a 
Scheme program to a C++ parser you'll just get a stream of syntax errors.

Surely, the programmer will pass the code to the compiler or interpreter 
at some point, right?  Otherwise, the programming was pointless. 
Likewise, test suites (ought to) exist, which would catch these problems 
even if they weren't written to catch these problems.  (If they don't 
exist, then the programmer has much worse problems than a super 
implausible '(define (-*- mode: c++ -*-) 14)' situation.)

> 
> (It's not really relevant for the `#lang`-like case, but something I find especially ironic about encoding "magic comments" or, say, `<?xml version="1.0" encoding="UTF-8"?>`, is that suddenly if you encode the Unicode text in some other encoding it becomes a lie.)

That sounds exactly the same situation as with #lang to me (and, as 
such, relevant). If you take a Scheme file

   #lang scheme
   ; ^ equivalent of <?xml version="1.0" encoding="UTF-8"?>
   [...] ; <- Scheme code

and then convert it to Wisp, but forget to adjust the "#lang":

   #lang scheme
   ; ^ equivalent of <?xml version="1.0" encoding="something-else"?>
   [...] ; <- Wisp code

then you'll get a bunch of syntax errors.

> 
> On Fri, Feb 24, 2023, at 6:51 PM, Maxime Devos wrote:
>> On 25-02-2023 00:48, Maxime Devos wrote:
>>>>> (**) For compatibility with Racket, it's not like we couldn't
>>>>> implement both "#lang" and "-*- stuff: language -*-".
>>
>> TBC, I mean ‘only support #lang' for values of 'lang' that Racket
>> supports’
> 
> If I understand what you're proposing here, I don't think it's a viable option.
> 
> The fundamental purpose of the `#lang` construct (however you spell it) is to provide an open, extensible protocol for defining languages. Thus, "values of 'lang' that Racket supports" are unbounded, provided that a module has been installed where the language specification says to look. From The Racket Reference [4]:

The problem, as I wrote several times previously in different words, is 
that this 'open, extensible protocol' is not a standard protocol shared 
between languages.  No language that precedes the existence of Racket 
acknowledges this protocol in its specification of its syntax, and, like 
I said before, if the language doesn't have "#" comments, then #lang is 
also contrary to the syntax of the language.

Like I wrote about R6RS: Racket only holds sway over Racket; it has no 
authority on the syntax of, say, BASIC and Pascal.

Also, being unbounded is not a problem, because unbounded != infinite.  At 
any point in time, Racket itself only supports a finite number of 
"values of 'lang'", and at any point in time there are only a finite 
number of external modules that implement a certain 'lang'.

As such, in any given version of Guile, Guile could have a finite list of 
'lang' values for which it recognises the Racket-specific #lang extension, 
which is incompatible with non-Racket, non-Guile implementations.
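
To make the 'finite list' idea concrete, here is a minimal sketch in Python 
(for illustration only; it is not Guile code and not how Guile does it). 
Everything in it is an assumption: the function name, the allowlist 
entries, the 'language:' key inside the magic comment, and the extension 
table are all hypothetical.  It only shows one possible order of checks: an 
explicit option first, then #lang for allowlisted names only, then a magic 
comment, then the file extension.

```python
import re
from typing import Optional

# Hypothetical allowlist: the finite set of 'lang' values for which "#lang"
# is honoured. The entries are placeholders, not a real Guile or Racket list.
RACKET_LANGS = frozenset({"racket", "racket/base", "typed/racket"})

# Hypothetical extension table, used only as the last resort.
EXTENSIONS = {".scm": "scheme", ".w": "wisp"}

# Hypothetical magic comment of the form "-*- language: NAME -*-".
MAGIC = re.compile(r"-\*-.*?\blanguage:\s*([\w/+-]+).*?-\*-")


def detect_language(first_line: str, filename: str,
                    cli_language: Optional[str] = None) -> Optional[str]:
    """Return a language name, or None if no rule matched."""
    # 1. An explicit "--language=" style option always wins.
    if cli_language:
        return cli_language

    line = first_line.strip()

    # 2. "#lang NAME" is honoured only when NAME is on the allowlist.
    if line.startswith("#lang "):
        parts = line.split()
        if len(parts) >= 2 and parts[1] in RACKET_LANGS:
            return parts[1]

    # 3. A magic comment anywhere on the first line.
    match = MAGIC.search(line)
    if match:
        return match.group(1)

    # 4. Fall back to guessing from the file extension.
    for extension, language in EXTENSIONS.items():
        if filename.endswith(extension):
            return language

    return None
```

With this sketch, a first line of "#lang pascal" in "prog.pas" matches no 
rule and yields None, so the file would simply be handed to the default 
reader and fail there with ordinary syntax errors.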

> [...]
> I am definitely **not** suggesting that Guile implement all the details of Racket's `#lang` implementation. What I do strongly advocate is that you design Guile's support for `#lang` (or `#!`) to leave open a pathway for compatibility in the future. [...]

The problem with this advocacy is that I agree with you here (except 
for 'you design' (*)), so why are you repeating this?  I wrote 
something along the lines of ‘For __compatibility__ with Racket, __#lang 
should be recognised for values of 'lang' that are recognised by 
Racket__, but not for other languages’ (emphasis added).

(*) Sure, someone could implement this compatibility, whatever, but we 
don't need this compatibility for Wisp.  For Wisp, the more general and 
less problematic 'embed source file name in .go, + --language/file 
extension guessing' suffices.  It's also rather pushy -- _you_ are 
demanding that _I_ paper over a source of incompatibility _introduced by 
others_ (Racket) (and furthermore _I_ consider that source of 
incompatibility _bad_), in the ML of a _volunteer project_, in a 
discussion that's ultimately about Wisp, not Racket, where _I_ (**) 
have already voluntarily designed a solution for Wisp.

(**) And others maybe, I don't recall how much can be attributed to whom.

> [...]
> (Other kinds of potential namespace collisions are easier to manage: for example, we could imagine that `(use-modules (foo bar baz))` might not access the same module as `(require foo/bar/baz)`. [...]

This is interesting but seems completely orthogonal; this e-mail thread 
is about detecting which language something is in, and finding source 
files with non-.scm modules, not about making the module system non-global.

> [...]
> I've sort of alluded above to my pipe dream of a grand unified future for Racket-and-Guile-on-Chez, Guile-and-Racket-on-the-Guile-VM, and endless other possibilities. I wrote about it in more detail on the guix-devel list at [10]. (These thoughts were inspired by conversations with Christine Lemmer-Webber, though she bears no responsibility for my zany imaginings.)

OK, but what does this have to do with this e-mail thread?  This e-mail thread 
is about supporting additional languages, not about emulating Racket on 
top of Guile somehow (or perhaps you count Racket's dialect of Scheme as 
a language of its own to be implemented in Guile?).

> Finally, I looked into the history of `#!` in R6RS a bit, and I'll leave a few pointers here for posterity. Will Clinger's 2015 Scheme Workshop paper [11] says in section 3.1 that "Kent Dybvig suggested the `#!r6rs` flag in May 2006", Clinger "formally proposed addition of Dybvig’s suggestion" [12], and, "less than six weeks later," `#!r6rs` was "in the R6RS editors’ status report". (I am not persuaded by all of the arguments about `#!r6rs` in that paper: in particular, the analysis doesn't seem to account for R6RS Appendix A [1].) As best as I can tell, the suggestion from Kent Dybvig is [13]:

Again, how is RnRS relevant to _non-Scheme_ languages?

Apart from the 'shebangs actually are r6rs' point, I am disappointed by this 
discussion -- you keep repeating irrelevant points or points that were 
already addressed. (Again, R6RS and Racket are simply _irrelevant_ to 
non-Scheme languages that did not originate from Racket, and you are not 
giving arguments for them actually being relevant somehow.)

As this line of discussion has proven to just be a pointless time sink, 
I will not read or respond to further replies by you in this line of 
discussion.

Greetings,
Maxime