On Fri, Dec 27, 2024, 9:25 AM Daniel Colascione <dancol@dancol.org> wrote:


> It's a shame there's no way to write TS grammars in plain elisp. I figure vendoring both the source and the generated code would be best, as it'd allow building Emacs anywhere but still make it convenient on systems with needed tools (JS runtime, Rust, etc.) to update and modify the grammar. As with any scheme involving checking in generated outputs, the source and output can get out of sync, but I think there are build time guardrails we can build to make sure it doesn't happen.

I looked into this last year.  The tree-sitter library provides a parsing engine that references a fairly standard LR-type parsing table in binary form.  I got stuck on adding a generic primitive for reading and writing arbitrary binary data structures based on a data description DSL, since I didn't want to tie the interpreter core to the data structures of an external, dynamically loadable library.  I also wasn't sure such an extension would be accepted into Emacs, as I am not an expert on the possible security implications.
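
For illustration only, here is a minimal sketch of the general idea of a description-driven binary reader and writer, written in Python rather than as an Emacs primitive.  The HEADER layout and its field names are invented for the example; they are not tree-sitter's actual table format, and this is not the code from the experiment described above.

```python
import struct

# Hypothetical description: (field name, struct format code) pairs.
# The layout is made up for illustration; it is NOT tree-sitter's.
HEADER = [("version", "I"), ("symbol_count", "I"), ("state_count", "H")]


def _format(description):
    # "<" selects little-endian byte order with no alignment padding.
    return "<" + "".join(code for _, code in description)


def unpack_record(description, data, offset=0):
    """Read one record from DATA according to DESCRIPTION."""
    values = struct.unpack_from(_format(description), data, offset)
    return dict(zip((name for name, _ in description), values))


def pack_record(description, record):
    """Write RECORD to bytes according to DESCRIPTION."""
    values = [record[name] for name, _ in description]
    return struct.pack(_format(description), *values)
```

The point of the design is that the reading and writing code stays generic, and only the description changes when the external library's data structures change.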

Other than that, Emacs already has code for calculating (LA)LR parsing tables in the Semantic packages.  The tree-sitter grammar compiler may have additional logic for providing multiple start symbols, but the parsing engine should still work with a classic parsing table.
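
To make "classic parsing table" concrete, here is a minimal sketch of a table-driven LR parse loop in Python.  The grammar (E -> E + n | n), the hand-built ACTION and GOTO tables, and all names are assumptions chosen for the example; this is neither tree-sitter's engine nor the Semantic code, only the textbook shift/reduce loop that such tables drive.

```python
# Rules: 1: E -> E + n    2: E -> n
# Each entry maps a rule number to (left-hand side, right-hand side length).
RULES = {1: ("E", 3), 2: ("E", 1)}

# Hand-built LR table for the toy grammar; "$" marks end of input.
ACTION = {
    (0, "n"): ("shift", 1),
    (1, "+"): ("reduce", 2), (1, "$"): ("reduce", 2),
    (2, "+"): ("shift", 3), (2, "$"): ("accept", None),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 2}


def parse(tokens):
    """Return the rule numbers in the order they are reduced."""
    tokens = list(tokens) + ["$"]
    stack = [0]          # stack of parser states
    pos = 0
    reductions = []
    while True:
        action = ACTION.get((stack[-1], tokens[pos]))
        if action is None:
            raise SyntaxError(f"unexpected {tokens[pos]!r} at position {pos}")
        kind, arg = action
        if kind == "shift":
            stack.append(arg)
            pos += 1
        elif kind == "reduce":
            lhs, length = RULES[arg]
            del stack[-length:]                    # pop the rule's right-hand side
            stack.append(GOTO[(stack[-1], lhs)])   # follow the goto for the LHS
            reductions.append(arg)
        else:  # accept
            return reductions
```

For example, parse(["n", "+", "n"]) reduces rule 2 first and then rule 1.  The loop itself never looks at the grammar, only at the tables, so a table produced by any (LA)LR generator could in principle be plugged in.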

Lynn