From: Lynn Winebarger <owinebar@gmail.com>
To: Daniel Colascione <dancol@dancol.org>
Cc: Philip Kaludercic <philipk@posteo.net>,
emacs-devel <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org>,
Richard Stallman <rms@gnu.org>,
manphiz@gmail.com
Subject: Re: Tree-sitter maturity
Date: Tue, 31 Dec 2024 17:29:04 -0500
Message-ID: <CAM=F=bC-TXTB4_s764_c2wSAWkN7Sx9W+qiH+gA+Q7Lorjo18Q@mail.gmail.com>
In-Reply-To: <87msge8bv8.fsf@dancol.org>
On Sun, Dec 29, 2024 at 3:37 PM Daniel Colascione <dancol@dancol.org> wrote:
>
> Thanks. Such an approach would let us treat tree-sitter grammars a lot
> more like font-lock-keywords, and I think for some modes, that'd be a
> good option. (Of course, SHTDI.)
The main blocking point for me is the lack of a primitive facility for
describing machine-level binary data structures, and of operations for
manipulating data according to those specifications. The "bindat"
facility is a step in that direction, but its semantics lack pointers,
which is a big limitation for straightforward translation of C data
structures from source code.
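To make the pointer point concrete, here is a minimal sketch using
Python's ctypes module, purely as a stand-in for illustration (it is not
an Emacs facility). It describes a self-referential C struct, lets the
platform C ABI decide field offsets and padding, and follows a pointer
field. The struct and its fields are hypothetical.

```python
import ctypes

# Hypothetical C struct: struct node { int32_t value; struct node *next; };
class Node(ctypes.Structure):
    pass

# _fields_ is assigned after the class exists so the struct can refer to itself.
Node._fields_ = [
    ("value", ctypes.c_int32),
    ("next", ctypes.POINTER(Node)),
]

def walk(node):
    """Follow `next` pointers until a NULL pointer, collecting values."""
    values = []
    while True:
        values.append(node.value)
        if not node.next:          # NULL ctypes pointers are falsy
            return values
        node = node.next.contents  # dereference the pointer

tail = Node(2, None)               # None stores a NULL pointer
head = Node(1, ctypes.pointer(tail))
print(walk(head))                  # [1, 2]
print(Node.next.offset)            # offset chosen by the platform ABI
```

The layout description and the pointer dereference are what a
bindat-style description would need in order to mirror C declarations
directly.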
>
> Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
> Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)
They evidently decided to use JSON and a simple schema to specify the
concrete grammar, instead of creating a DSL for the purpose.
JavaScript is just a convenient way of generating that JSON
programmatically, the same way Lisp programmers use Lisp to generate
S-expressions. Once the JSON has been generated, JavaScript is not used.
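To illustrate that the JS layer only builds data, here is a minimal
Python sketch that builds a grammar in the same JSON shape. The helper
names mirror the grammar.js DSL but are hypothetical Python stand-ins;
only a few rule types (SEQ, CHOICE, REPEAT, BLANK, STRING, PATTERN,
SYMBOL) are shown, and real grammar.json files carry further top-level
fields such as extras and conflicts.

```python
import json

def _lift(x):
    # Bare strings become literal STRING tokens, as in grammar.js.
    return {"type": "STRING", "value": x} if isinstance(x, str) else x

# Hypothetical stand-ins for grammar.js functions; each returns a plain dict.
def seq(*members):
    return {"type": "SEQ", "members": [_lift(m) for m in members]}

def choice(*members):
    return {"type": "CHOICE", "members": [_lift(m) for m in members]}

def repeat(rule):
    return {"type": "REPEAT", "content": _lift(rule)}

def optional(rule):
    # optional(x) is expressed as a choice between x and the empty rule.
    return choice(rule, {"type": "BLANK"})

def sym(name):
    return {"type": "SYMBOL", "name": name}

grammar = {
    "name": "tiny_list",
    "rules": {
        # list: "[" [item ("," item)*] "]"
        "list": seq("[", optional(seq(sym("item"), repeat(seq(",", sym("item"))))), "]"),
        "item": {"type": "PATTERN", "value": "[0-9]+"},
    },
}

text = json.dumps(grammar, indent=2)
print(text)
```

Nothing downstream of `text` needs to know which language produced it.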
The rest of the project really consists of orthogonal components:
the GLR grammar compiler (written in Rust) and the run-time GLR
parsing engine (written in C). The grammar compiler emits the parsing
tables as C source code, which is compiled together with the library
into a single library per grammar. However, the C library does not
actually require the parsing tables to be known statically at compile
time, at least when I last looked, barring some really obscure
dependency. The procedural interface to the parser just takes a
pointer to the parse-table data structure at run time.
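The "tables are just data handed to the runtime" point can be sketched
with an ordinary deterministic LR driver in Python whose ACTION/GOTO
tables are arguments rather than compiled-in code. This illustrates the
interface only, not tree-sitter's algorithm: a GLR runtime additionally
forks and merges stacks where the tables contain conflicts. The grammar
and table encoding below are hypothetical. (In tree-sitter's C API the
analogous step is passing a `const TSLanguage *` to
`ts_parser_set_language`.)

```python
# SLR tables for the hypothetical grammar:
#   E -> E "+" "n"   (rule 1)
#   E -> "n"         (rule 2)
RULES = {1: ("E", 3), 2: ("E", 1)}   # rule number -> (lhs, rhs length)
ACTION = {
    (0, "n"): ("shift", 2),
    (1, "+"): ("shift", 3), (1, "$"): ("accept", None),
    (2, "+"): ("reduce", 2), (2, "$"): ("reduce", 2),
    (3, "n"): ("shift", 4),
    (4, "+"): ("reduce", 1), (4, "$"): ("reduce", 1),
}
GOTO = {(0, "E"): 1}

def lr_parse(tokens, action, goto, rules):
    """Deterministic LR driver; every table is supplied at run time."""
    stack = [0]                      # state stack
    values = []                      # parallel stack of tokens / subtrees
    tokens = list(tokens) + ["$"]    # "$" marks end of input
    pos = 0
    while True:
        tok = tokens[pos]
        entry = action.get((stack[-1], tok))
        if entry is None:
            raise SyntaxError(f"unexpected {tok!r} at token {pos}")
        kind, arg = entry
        if kind == "shift":
            stack.append(arg)
            values.append(tok)
            pos += 1
        elif kind == "reduce":
            lhs, n = rules[arg]
            children = values[len(values) - n:]
            del stack[len(stack) - n:]
            del values[len(values) - n:]
            values.append((lhs, children))
            stack.append(goto[(stack[-1], lhs)])
        else:                        # accept
            return values[-1]

print(lr_parse(["n", "+", "n"], ACTION, GOTO, RULES))
# ('E', [('E', ['n']), '+', 'n'])
```

Swapping in a different grammar means passing different dictionaries,
with no recompilation of the driver.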
Since GLR grammars are basically arbitrary (possibly ambiguous) LR(1)
grammars, the parser runtime has to implement a fairly sophisticated
algorithm (graph-structured stacks) to be efficient. Having implemented
an LALR parser generator at least three times in the last couple of
decades (just for my own use), I find that generating the parse tables
looks like a much simpler (and better-understood) problem than the GLR
runtime. More importantly, the efficiency of the grammar compiler is not
nearly as critical as that of the runtime.
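As a flavor of the grammar-analysis side, here is a minimal Python
sketch of one standard step in building LR(1)/LALR tables: the fixpoint
computation of nullable nonterminals and FIRST sets. The grammar
encoding is hypothetical; a full generator would go on to build item
sets and lookaheads, each of which is a similar, well-understood
fixpoint.

```python
def first_sets(grammar, terminals):
    """Compute FIRST sets and nullable nonterminals by iterating to a fixpoint.

    grammar maps each nonterminal to a list of productions (tuples of symbols);
    an empty tuple is an epsilon production.
    """
    first = {nt: set() for nt in grammar}
    nullable = set()
    changed = True
    while changed:
        changed = False
        for nt, productions in grammar.items():
            for prod in productions:
                prefix_nullable = True
                for sym in prod:
                    new = {sym} if sym in terminals else first[sym]
                    if not new <= first[nt]:
                        first[nt] |= new
                        changed = True
                    # Only look past sym if it can derive the empty string.
                    if sym in terminals or sym not in nullable:
                        prefix_nullable = False
                        break
                if prefix_nullable and nt not in nullable:
                    nullable.add(nt)
                    changed = True
    return first, nullable

# Classic expression grammar: E -> T E' ; E' -> "+" T E' | eps ; T -> "n" | "(" E ")"
EXPR = {
    "E": [("T", "E'")],
    "E'": [("+", "T", "E'"), ()],
    "T": [("n",), ("(", "E", ")")],
}
first, nullable = first_sets(EXPR, {"+", "n", "(", ")"})
print(sorted(first["E"]), sorted(nullable))  # ['(', 'n'] ["E'"]
```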
> Do you happen to know whether the subset of Rust that gccrs recognizes
> is sufficient to compile the tree sitter grammar compiler? If so, we
> could in principle combine gccrs with a bare-bones embedded JS
> interpreter like https://duckjs.org/ to produce a mechanism that would
> let us customize and rebuild tree sitter grammars as easily as we do
> elisp files, even on obscure platforms like DJGPP.
I have no idea. As I wrote above, replicating the calculation
performed by the grammar compiler is not that intimidating, provided we
had a way of writing out the parse tables as the in-memory structures
understood by the runtime's procedural interface. At least, replicating
the GLR grammar analysis is a lot simpler than implementing a compiler
for Rust, if that viewpoint makes sense. At worst, Emacs could move
from consuming the generated C files to consuming the JSON files.
There's no requirement that grammars even have a JS form; that's just
a convenience for grammar writers.
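Consuming the JSON directly might start with a lowering pass like the
following minimal Python sketch, which flattens grammar.json rule trees
into plain BNF productions that a conventional table generator could
take as input. It is a hypothetical simplification: it ignores
precedence, field names, aliases, tokens, extras and external scanners,
and it makes no claim to match what tree-sitter's own compiler does
internally. The `__repN` helper-nonterminal naming is invented for the
example.

```python
import itertools
import json

def lower(grammar_json):
    """Flatten grammar.json rule trees into {nonterminal: [symbol tuples]}."""
    rules = json.loads(grammar_json)["rules"]
    prods = {}
    counter = itertools.count(1)

    def alts(rule, owner):
        # Return the list of alternative symbol sequences for one rule.
        t = rule["type"]
        if t == "BLANK":
            return [()]
        if t == "STRING":
            return [(rule["value"],)]
        if t == "PATTERN":
            return [("/" + rule["value"] + "/",)]
        if t == "SYMBOL":
            return [(rule["name"],)]
        if t == "CHOICE":
            return [a for m in rule["members"] for a in alts(m, owner)]
        if t == "SEQ":
            parts = [alts(m, owner) for m in rule["members"]]
            return [sum(combo, ()) for combo in itertools.product(*parts)]
        if t in ("REPEAT", "REPEAT1"):
            # Fresh left-recursive helper: R -> eps | R x  (REPEAT1: R -> x | R x)
            name = f"{owner}__rep{next(counter)}"
            body = alts(rule["content"], owner)
            base = [()] if t == "REPEAT" else body
            prods[name] = base + [(name,) + b for b in body]
            return [(name,)]
        if t in ("PREC", "PREC_LEFT", "PREC_RIGHT", "PREC_DYNAMIC", "FIELD"):
            return alts(rule["content"], owner)  # sketch: drop prec/field info
        raise NotImplementedError(t)

    for name, rule in rules.items():
        prods[name] = alts(rule, name)
    return prods

TINY = json.dumps({"name": "tiny", "rules": {
    "list": {"type": "SEQ", "members": [
        {"type": "STRING", "value": "["},
        {"type": "REPEAT", "content": {"type": "SYMBOL", "name": "item"}},
        {"type": "STRING", "value": "]"}]},
    "item": {"type": "PATTERN", "value": "[0-9]+"}}})

for nt, alternatives in lower(TINY).items():
    print(nt, "->", alternatives)
```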
I mean, look at
https://github.com/tree-sitter/tree-sitter-cpp/blob/master/grammar.js
That file uses a fairly limited subset of JS with a straightforward
translation to Lisp types. If we don't require replicating every
corner case of JavaScript, then the existing JS tree-sitter library
could probably be used to produce an S-expression that a simple set of
macros could translate into an S-expression equivalent of the
corresponding grammar.json. If you had an Emacs-based grammar compiler
that could consume grammars in JSON format, together with a generic
tree-sitter dynamic library (no fixed parse tables), you could even
"bootstrap" using the existing JSON from
https://github.com/tree-sitter/tree-sitter-javascript/blob/master/src/grammar.json,
so that Rust is never involved (if that is important).
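In the same spirit, here is a minimal Python sketch of the
S-expression side: it renders grammar.json rules as Lisp-readable forms
such as (seq "[" (repeat item) "]"). The operator names and the
top-level (grammar ...) form are hypothetical placeholders for whatever
macros Emacs would define; string literals reuse JSON escaping, which
the Emacs Lisp reader accepts for simple cases.

```python
import json

def rule_to_sexp(rule):
    """Render one grammar.json rule as an S-expression string."""
    t = rule["type"]
    if t == "STRING":
        return json.dumps(rule["value"])     # double-quoted, backslash escapes
    if t == "SYMBOL":
        return rule["name"]
    if t == "BLANK":
        return "(blank)"
    if t == "PATTERN":
        items = ["pattern", json.dumps(rule["value"])]
    elif t in ("SEQ", "CHOICE"):
        items = [t.lower()] + [rule_to_sexp(m) for m in rule["members"]]
    elif t in ("REPEAT", "REPEAT1"):
        items = [t.lower(), rule_to_sexp(rule["content"])]
    elif t == "FIELD":
        items = ["field", rule["name"], rule_to_sexp(rule["content"])]
    elif t.startswith("PREC"):
        # e.g. PREC_LEFT -> (prec-left VALUE CONTENT)
        items = [t.lower().replace("_", "-"), json.dumps(rule["value"]),
                 rule_to_sexp(rule["content"])]
    else:
        raise NotImplementedError(t)
    return "(" + " ".join(items) + ")"

def grammar_to_sexp(grammar_json):
    """Render a whole grammar.json document as one (grammar NAME ...) form."""
    g = json.loads(grammar_json)
    rules = " ".join(f"({name} {rule_to_sexp(r)})" for name, r in g["rules"].items())
    return f"(grammar {g['name']} {rules})"

TINY = json.dumps({"name": "tiny", "rules": {
    "list": {"type": "SEQ", "members": [
        {"type": "STRING", "value": "["},
        {"type": "REPEAT", "content": {"type": "SYMBOL", "name": "item"}},
        {"type": "STRING", "value": "]"}]},
    "item": {"type": "PATTERN", "value": "[0-9]+"}}})

print(grammar_to_sexp(TINY))
# (grammar tiny (list (seq "[" (repeat item) "]")) (item (pattern "[0-9]+")))
```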
>
> Some Emacs modes could ship with .js grammars sourced from upstream
> editor-neutral projects. Other modes might just build tree sitter parse
> tables in elisp using something vaguely like SMIE syntax. Both styles
> of mode would be customizable by end users, and we'd (because, I'm a
> broken record, vendor vendor vendor) we'd maintain compatibility without
> mysterious AST-change-related breakages.
I agree, a generic grammar capturing the structures of most
programming languages would be useful. It is definitely possible to
extract the syntactic/semantic concepts from C++ and Python to create
such a grammar, if you are willing to allow appropriately delimited
nested grammars. For example, a constructor context would delimit an
expression in a data language, and that data expression may itself
contain delimited value contexts where the functional/procedural
grammar may appear, ad infinitum. The procedural and data grammars are
distinct but mutually recursive. That applies when the form appears in
an rvalue context. For lvalue expressions, the same constructor
delimiting syntax can become a binding form, at least, with
subexpressions of binding forms also being binding forms. As long as
the scanner is set dynamically according to the grammar context (and
recognizes/signals the closing delimiter), the grammar can be made
unambiguous, because a given character will produce
context-appropriate terminal symbols.
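The context-dependent scanning described above can be sketched in
Python as a scanner driven by a stack of grammar contexts: the same
character (here the comma) yields a different terminal depending on
whether it appears in a procedural or a data context, and each context
knows its own closing delimiter. The context names, delimiters, and
terminal names are hypothetical, and a real scanner would also handle
multi-character tokens.

```python
# Hypothetical contexts: per-context token tables, plus delimiters that open a
# nested context together with the delimiter that will close it.
CONTEXTS = {
    "proc": {"tokens": {",": "ARG_SEP", "+": "PLUS"}, "open": {"{": ("data", "}")}},
    "data": {"tokens": {",": "ITEM_SEP", ":": "KEY_SEP"}, "open": {"(": ("proc", ")")}},
}

def scan(text, start="proc"):
    """Tokenize text, choosing the token table from a stack of contexts."""
    stack = [(start, None)]              # (context name, closing delimiter)
    out = []
    for ch in text:
        if ch.isspace():
            continue
        ctx, closer = stack[-1]
        spec = CONTEXTS[ctx]
        if ch == closer:                 # closing delimiter ends this context
            stack.pop()
            out.append("CLOSE_" + ctx.upper())
        elif ch in spec["open"]:         # opening delimiter starts a nested one
            new_ctx, new_closer = spec["open"][ch]
            stack.append((new_ctx, new_closer))
            out.append("OPEN_" + new_ctx.upper())
        elif ch in spec["tokens"]:       # same char, context-specific terminal
            out.append(spec["tokens"][ch])
        elif ch.isalpha():
            out.append("ID")             # one-letter identifiers, for brevity
        else:
            raise SyntaxError(f"unexpected {ch!r} in {ctx} context")
    if len(stack) != 1:
        raise SyntaxError("unclosed delimiter")
    return out

print(scan("a, b"))    # ['ID', 'ARG_SEP', 'ID']
print(scan("{a, b}"))  # ['OPEN_DATA', 'ID', 'ITEM_SEP', 'ID', 'CLOSE_DATA']
```

Because the parser only ever sees ARG_SEP or ITEM_SEP, never a bare
comma, the two grammars can share characters without introducing
ambiguity.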
As for vendoring, I just doubt you will get much buy-in in this forum.
There are corporate-style free/open-source software projects that
prioritize uniform build environments and limiting the scope of bugs
that can arise from the build process and dependencies; those vendor
at the drop of a hat. Then there are "classic" free software projects
that have amalgamated the work of many individual contributors, and
those contributors often prioritize control of the software running on
their systems for various reasons (eliminating non-free software is
definitely one of them), and they often can and will contribute
patches for that purpose. The second camp *hates* vendoring because it
subverts their control of their computational resources. At least,
that's the dichotomy I see. There are probably finer points I'm
missing or mischaracterizing.
Lynn