On Sun, Dec 29, 2024, 7:31 PM Yuan Fu <casouri@gmail.com> wrote:


> On Dec 29, 2024, at 3:29 PM, Björn Bidar <bjorn.bidar@thaodan.de> wrote:
>
> Daniel Colascione <dancol@dancol.org> writes:
>
>> Lynn Winebarger <owinebar@gmail.com> writes:
>>
>>> On Fri, Dec 27, 2024, 9:25 AM Daniel Colascione <dancol@dancol.org> wrote:
>>>
>>>>
>>>>
>>>> It's a shame there's no way to write TS grammars in plain elisp. I figure
>>>> vendoring both the source and the generated code would be best, as it'd
>>>> allow building Emacs anywhere but still make it convenient on systems with
>>>> needed tools (JS runtime, Rust, etc.) to update and modify the grammar. As
>>>> with any scheme involving checking in generated outputs, the source and
>>>> output can get out of sync, but I think there are build time guardrails we
>>>> can build to make sure it doesn't happen.
>>>>
>>>
>>> I looked into this last year.  The tree-sitter library provides a parsing
>>> engine that references a fairly standard LR type parsing table in binary
>>> form.  I got stuck in adding a generic primitive functionality for reading
>>> and writing arbitrary binary data structures based on a data description
>>> DSL, since I wouldn't want to tie the interpreter core to the data
>>> structures of an external, dynamically-loadable library.  But, I wasn't
>>> sure such an extension would be accepted into Emacs, as I am not an expert
>>> on the possible security implications.
>>>
>>> Other than that, Emacs already has the code for calculating (LA)LR parsing
>>> tables in the semantic packages.  The tree-sitter grammar compiler may have
>>> additional logic for providing multiple starting symbols, but the parsing
>>> engine should still function with a classic parsing table.
>>
>> Thanks.  Such an approach would let us treat tree-sitter grammars a lot
>> more like font-lock-keywords, and I think for some modes, that'd be a
>> good option.  (Of course, SHTDI.)
>>
>> Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
>> Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)

> I was wondering the same. How the hell? There have been some talks about
> supporting a more lightweight JavaScript interpreter as an alternative, but
> it hasn't gone anywhere, apparently for compatibility reasons. I don't see how
> node could be a dependency for these. Grammars are mostly without
> dependencies, except that some depend on other grammars at the
> source level; for example, the C++ grammar requires the C grammar.

I don’t think you need nodejs to build the grammar. You might need it to develop the grammar, but compiling grammar.js to parser.c only requires the tree-sitter CLI, which is written in Rust.

The grammar.js file is written in a Lisp-like style, and is interpreted by node to expand it into a JSON format. See the middle of https://tree-sitter.github.io/tree-sitter/5-implementation.html :

==========
Parsing a Grammar
First, Tree-sitter must evaluate the JavaScript code in grammar.js and convert the grammar to a JSON format. It does this by shelling out to node. The format of the grammars is formally specified by the JSON schema in grammar.schema.json. The parsing is implemented in parse_grammar.rs.
===========
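
To make the "expand it into a JSON format" step more concrete, here is a minimal Python sketch that mirrors a few of the DSL helpers (`seq`, `choice`, `repeat`, `optional`) and prints a toy rule set in the shape used by grammar.json. It is only an illustration under stated assumptions: the Python helper names, the `sym()` stand-in for the JS `$` object, and the toy `list` grammar are hypothetical, and only a small subset of the schema's node types (`SEQ`, `CHOICE`, `REPEAT`, `STRING`, `PATTERN`, `SYMBOL`, `BLANK`) is covered.

```python
import json

# Hypothetical Python stand-ins for a few helpers of tree-sitter's grammar DSL.
# Each returns a dict shaped like a rule node in grammar.json (subset only).

def _rule(x):
    # Bare strings in the DSL are literal tokens (STRING nodes).
    return {"type": "STRING", "value": x} if isinstance(x, str) else x

def sym(name):
    # Stand-in for `$.name` in grammar.js.
    return {"type": "SYMBOL", "name": name}

def pattern(regex):
    # Stand-in for a JS regex literal such as /[a-z]+/.
    return {"type": "PATTERN", "value": regex}

def seq(*members):
    return {"type": "SEQ", "members": [_rule(m) for m in members]}

def choice(*members):
    return {"type": "CHOICE", "members": [_rule(m) for m in members]}

def repeat(rule):
    return {"type": "REPEAT", "content": _rule(rule)}

def optional(rule):
    # As in tree-sitter's DSL, optional(x) becomes a choice between x and BLANK.
    return choice(rule, {"type": "BLANK"})

def grammar(name, rules):
    return {"name": name, "rules": rules}

# Toy grammar (hypothetical): a parenthesized, comma-separated identifier list.
g = grammar("toy_list", {
    "list": seq("(", optional(seq(sym("identifier"),
                                   repeat(seq(",", sym("identifier"))))), ")"),
    "identifier": pattern("[a-z]+"),
})

print(json.dumps(g, indent=2))
```

Nothing here needs a JS runtime once the grammar is written with plain function calls; node is only needed because grammar.js is real JavaScript that may do more than call these helpers.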

The resulting JSON representation of the grammar is then compiled by the parser (table) generator written in Rust.
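
The "classic parsing table" mentioned earlier in the thread can be illustrated with a deterministic LR driver. The sketch below hard-codes a hand-built SLR table for the toy grammar `E -> E + n | n` and runs the standard shift/reduce loop. It is an assumption-laden illustration, not tree-sitter's engine: tree-sitter's runtime is a GLR parser with its own compressed table encoding, and the table here was not produced by any generator.

```python
# Productions of the toy grammar, numbered as the table refers to them.
# Each entry: (left-hand side, length of right-hand side).
PRODUCTIONS = {
    1: ("E", 3),  # E -> E + n
    2: ("E", 1),  # E -> n
}

# ACTION[state][terminal] -> ("shift", state) | ("reduce", production) | ("accept",)
ACTION = {
    0: {"n": ("shift", 2)},
    1: {"+": ("shift", 3), "$": ("accept",)},
    2: {"+": ("reduce", 2), "$": ("reduce", 2)},
    3: {"n": ("shift", 4)},
    4: {"+": ("reduce", 1), "$": ("reduce", 1)},
}

# GOTO[state][nonterminal] -> state entered after a reduction.
GOTO = {0: {"E": 1}}

def lr_parse(tokens):
    """Return the production numbers applied, in order (a reversed rightmost derivation)."""
    stack = [0]
    tokens = list(tokens) + ["$"]  # end-of-input marker
    pos = 0
    reductions = []
    while True:
        state, lookahead = stack[-1], tokens[pos]
        action = ACTION[state].get(lookahead)
        if action is None:
            raise SyntaxError(f"unexpected {lookahead!r} at token {pos}")
        if action[0] == "shift":
            stack.append(action[1])
            pos += 1
        elif action[0] == "reduce":
            lhs, rhs_len = PRODUCTIONS[action[1]]
            del stack[len(stack) - rhs_len:]  # pop one state per RHS symbol
            stack.append(GOTO[stack[-1]][lhs])
            reductions.append(action[1])
        else:  # accept
            return reductions

print(lr_parse(["n", "+", "n", "+", "n"]))  # [2, 1, 1]
```

The driver itself knows nothing about the grammar; everything language-specific lives in `ACTION`, `GOTO` and `PRODUCTIONS`, which is why the same engine could in principle be fed tables computed elsewhere (for instance by the LALR code already in the semantic packages).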

The JavaScript form of the grammar could use only the functions defined by the tree-sitter node module (e.g. the "$" object, the "choice" function, etc.), which would be fairly trivial to transliterate into Lisp form, but a grammar.js file can also incorporate arbitrary JS code.
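
A rough sketch of what such a transliteration might look like: a tiny s-expression reader that turns a form such as `(seq "(" (choice (sym identifier) (blank)) ")")` into grammar.json rule nodes. The surface syntax (`seq`, `choice`, `repeat`, `blank`, `sym`, `pattern`) is a hypothetical choice made for illustration, not an existing Emacs or tree-sitter format; the reader does no escape handling in strings, and like any such translation it cannot cover grammars that rely on arbitrary JS code.

```python
import json
import re

# One token: "(", ")", a double-quoted string (no escapes), or a bare atom.
TOKEN = re.compile(r'\s*(?:(\()|(\))|"([^"]*)"|([^\s()"]+))')

def tokenize(text):
    tokens, pos, text = [], 0, text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"cannot read at offset {pos}")
        if m.group(1):
            tokens.append(("open", None))
        elif m.group(2):
            tokens.append(("close", None))
        elif m.group(3) is not None:
            tokens.append(("string", m.group(3)))
        else:
            tokens.append(("atom", m.group(4)))
        pos = m.end()
    return tokens

def parse(tokens):
    """Build nested lists; strings and atoms stay as (kind, value) tuples."""
    kind, value = tokens.pop(0)
    if kind == "open":
        items = []
        while tokens and tokens[0][0] != "close":
            items.append(parse(tokens))
        if not tokens:
            raise SyntaxError("missing )")
        tokens.pop(0)
        return items
    if kind == "close":
        raise SyntaxError("unexpected )")
    return (kind, value)

def to_rule(form):
    """Map a parsed form onto a grammar.json rule node (subset of node types)."""
    if isinstance(form, tuple):
        kind, value = form
        if kind == "string":
            return {"type": "STRING", "value": value}
        raise SyntaxError(f"bare atom {value!r} outside (sym ...)")
    head, args = form[0][1], form[1:]
    if head in ("seq", "choice"):
        return {"type": head.upper(), "members": [to_rule(a) for a in args]}
    if head == "repeat":
        return {"type": "REPEAT", "content": to_rule(args[0])}
    if head == "blank":
        return {"type": "BLANK"}
    if head == "sym":
        return {"type": "SYMBOL", "name": args[0][1]}
    if head == "pattern":
        return {"type": "PATTERN", "value": args[0][1]}
    raise SyntaxError(f"unknown form {head!r}")

src = '(seq "(" (choice (sym identifier) (blank)) ")")'
rule = to_rule(parse(tokenize(src)))
print(json.dumps(rule))
```

In Emacs itself the reader step would come for free, since `read` already produces the nested list; only the small `to_rule` mapping would need writing.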

Lynn