From: Lynn Winebarger <owinebar@gmail.com>
To: "Björn Bidar" <bjorn.bidar@thaodan.de>
Cc: Daniel Colascione <dancol@dancol.org>,
	Philip Kaludercic <philipk@posteo.net>,
	 emacs-devel <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org>,
	Richard Stallman <rms@gnu.org>,
	manphiz@gmail.com
Subject: Re: Tree-sitter maturity
Date: Sat, 4 Jan 2025 11:15:22 -0500
Message-ID: <CAM=F=bDDHvY0qh19kc4fyu8pSfHW95wE-nenSZgCbrMbH6k6+w@mail.gmail.com>
In-Reply-To: <6775a459.170a0220.2f3d1e.1897SMTPIN_ADDED_BROKEN@mx.google.com>

On Wed, Jan 1, 2025 at 3:23 PM Björn Bidar <bjorn.bidar@thaodan.de> wrote:
> Lynn Winebarger <owinebar@gmail.com> writes:
> >> Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
> >> Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)
> >
> > They evidently decided to use JSON and a simple schema to specify the
> > concrete grammar, instead of creating a DSL for the purpose.
> > Javascript is just a convenient way for embedding code into JSON the
> > same way LISP programmers use lisp to generate S-expressions.  Once
> > you have the JSON format generated, javascript is not used.
> >
> > The rest of the project is really composed of orthogonal components,
> > the GLR grammar compiler (written in Rust) and the run-time GLR
> > parsing engine, written in C.  The grammar compiler produces the
> > parsing tables in the form of C source code that is compiled together
> > with the library for a single library per grammar, but the C library
> > does not actually require the parsing tables to be statically known at
> > compile-time, at least the last I looked, unless there is some really obscure
> > dependence.  The procedural interface to the parser just takes a
> > pointer to the parser table data structure at run-time.
> >
> > Since GLR grammars are basically arbitrary (ambiguous) LR(1) grammars,
> > the parser run-time has to implement a fairly sophisticated algorithm
> > (graph-stacks) to be efficient.  Having implemented the LALR parser
> > generator at least 3 times in the last couple of decades (just for my
> > own use), generating the parse tables looks like a much simpler (and
> > well-understood) problem to solve than the GLR run-time.  More
> > importantly, the efficiency of the grammar compiler is not all that
> > critical compared to the run-time.
> >
>
> Additional alternatives instead of Node are already a good alternative.
> Using WASM as the output format also does not sound bad assuming there
> is some abstraction from the tree-sitter library side.

I'm not sure why WASM would be interesting.  AFAICT, it's just another
set of bindings to the C library, maybe with the tables compiled into
a WASM binary module (or whatever the correct term should be - I'm not a
WASM expert).  In any case, AFAIK Emacs has no particular capability
for using WASM files as dynamic libraries in general.  That might change
if Emacs itself were compiled to WASM, in which case I suppose the
function for dynamically loading libraries would implicitly load such
modules.

OTOH, the generated WASM bindings might provide an example of using
the tree-sitter DLL with the in-memory parse table structure not
embedded in the tree-sitter DLL.  Is that what you meant?
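
To make the separation between engine and tables concrete, here is a
minimal sketch in Python.  It is a plain LR driver, not GLR, and it
assumes nothing about tree-sitter's actual table layout or API; the
names Table, parse and PARENS are made up for illustration.  The only
point is that the driver receives the tables as ordinary data at
run-time.

```python
from typing import NamedTuple


class Table(NamedTuple):
    """Hypothetical parse-table container (not tree-sitter's format)."""
    action: dict  # (state, terminal) -> ("shift", state) | ("reduce", rule) | ("accept",)
    goto: dict    # (state, nonterminal) -> state
    rules: list   # rule index -> (lhs, length of right-hand side)


def parse(table, tokens):
    """Generic LR driver: everything grammar-specific comes from `table`."""
    stack = [0]
    tokens = list(tokens) + ["$"]
    pos = 0
    while True:
        act = table.action.get((stack[-1], tokens[pos]))
        if act is None:
            return False                      # syntax error
        if act[0] == "shift":
            stack.append(act[1])
            pos += 1
        elif act[0] == "reduce":
            lhs, length = table.rules[act[1]]
            if length:
                del stack[-length:]           # pop one state per RHS symbol
            stack.append(table.goto[(stack[-1], lhs)])
        else:
            return True                       # accept


# Hand-built SLR tables for the toy grammar  S -> ( S ) | x
PARENS = Table(
    action={
        (0, "("): ("shift", 2), (0, "x"): ("shift", 3),
        (1, "$"): ("accept",),
        (2, "("): ("shift", 2), (2, "x"): ("shift", 3),
        (3, ")"): ("reduce", 2), (3, "$"): ("reduce", 2),
        (4, ")"): ("shift", 5),
        (5, ")"): ("reduce", 1), (5, "$"): ("reduce", 1),
    },
    goto={(0, "S"): 1, (2, "S"): 4},
    rules=[("S'", 1), ("S", 3), ("S", 1)],
)
```

Passing a different Table value to parse() changes the language
without touching the driver.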

> > I agree, a generic grammar capturing the structures of most
> > programming languages would be useful.  It is definitely possible to
> > extract the syntactic/semantic concepts from C++ and Python to create
> > such a grammar, if you are willing to allow nested grammars
> > appropriately delimited.  For example, a constructor context would
> > delimit an expression in a data language that is embedded in a
> > constructor context that may itself have delimited value contexts
> > where the functional/procedural grammar may appear, ad infinitum.  The
> > procedural and data grammars are distinct but mutually recursive.
> > That would be if the form appeared in an rvalue-context.  For l-value
> > expressions, the same constructor delimiting syntax can become a
> > binding form, at least, with subexpressions of binding forms also
> > being binding forms.  As long as the scanner is dynamically set
> > according to the grammar context (and recognizes/signals the closing
> > delimiter), the grammar can be made non-ambiguous because a given
> > character will produce context-appropriate terminal symbols.
>
> What kind of scanner are you referring to? Something that works like a
> binding generator but for AST?

A few years ago, I wanted a template system for this terrible
proprietary language I was working with, so I wrote this grammar that
could encompass that language (which, AFAICT, was only defined by
company programmers hacking additional patterns directly into their
hand-written parser, for which I reverse-engineered a LALR(1)
grammar), a shell-type interpolation sublanguage, and other languages
that stuck to the syntactic constructs allowed by Python and C++.  It
was a bear to work out, and I ended up throwing it away, anyway.  But
the point is, at the start of an interpolation context, the parser
would switch scanner and parser tables to the language assigned to the
scope of that interpolation context (associated with a particular
terminal introducing that context in the "current" parser table).  So
while parsing language A, "${" might introduce an interpolation
context for language B, "$!{" for language C, "$[" for language D,
etc.  As long as the new scanner or parser could discriminate the
closing terminal as ending the sublanguage program and returning to
language A context, it should work.
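
A rough sketch of the context switching, in Python.  This is not the
code I threw away; it only models the scanner side, with a stack of
active languages, and leaves out the per-language parser tables
entirely.  The LANGS table and the scan function are hypothetical, and
the choice of closing delimiters ("}" and "]") is an assumption made
for the example.

```python
# Hypothetical per-language scanner specs: which strings open a nested
# context (and for which language), and which string closes this one.
LANGS = {
    "A": {"open": {"${": "B", "$!{": "C", "$[": "D"}, "close": None},
    "B": {"open": {}, "close": "}"},
    "C": {"open": {"${": "B"}, "close": "}"},
    "D": {"open": {}, "close": "]"},
}


def scan(text, lang="A"):
    """Split `text` into a tree of (language, items) nodes.

    Items are either plain strings or nested (language, items) nodes.
    The spec of the innermost open context decides how each character
    is interpreted, which is the "switch scanner" step.
    """
    root = (lang, [])
    stack = [root]
    i = 0
    while i < len(text):
        name, items = stack[-1]
        spec = LANGS[name]
        close = spec["close"]
        # The closing terminal ends the sublanguage and returns to the
        # enclosing context.
        if close and len(stack) > 1 and text.startswith(close, i):
            stack.pop()
            i += len(close)
            continue
        # Try the openers valid in the current language, longest first.
        for opener in sorted(spec["open"], key=len, reverse=True):
            if text.startswith(opener, i):
                child = (spec["open"][opener], [])
                items.append(child)
                stack.append(child)
                i += len(opener)
                break
        else:
            # Ordinary character: extend the current run of plain text.
            if items and isinstance(items[-1], str):
                items[-1] += text[i]
            else:
                items.append(text[i])
            i += 1
    if len(stack) != 1:
        raise ValueError("unterminated interpolation context")
    return root
```

For example, scan("a ${b} c $[d]") gives a language A node whose items
are "a ", a B node holding "b", " c ", and a D node holding "d".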

Anyway, for that purpose, I wanted a grammar that would be flexible
enough that I could just switch the bindings for the actions and
mapping of terminals, not change the whole grammar, so I would only
need to do the grammar analysis once.  That being said, I never
actually showed it could be done with multiple real terminals for a
single meta-terminal.  That is, in the previous paragraph there might
have been a "meta-terminal" "START_INTERPOLATION_CONTEXT" that would
expand to 3 concrete terminals (in the grammar for language A)
"START_INTERPOLATION_B", "START_INTERPOLATION_C",
"START_INTERPOLATION_D", so the parser would have to know which of
those concrete terminals was being reduced to choose the right action.
I've been waiting for the details to rot from my memory so I can start
from scratch on a concrete grammar.
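
For what it's worth, the dispatch I had in mind looks roughly like the
Python sketch below.  It does not show that the grammar analysis
actually goes through with several concrete terminals per
meta-terminal; it only illustrates the lookup.  Token,
CONCRETE_TO_META, ACTIONS and reduce_interpolation are all
hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Token:
    concrete: str  # terminal name produced by the language A scanner
    text: str      # matched source text


# Binding layer: several concrete terminals share one meta-terminal.
CONCRETE_TO_META = {
    "START_INTERPOLATION_B": "START_INTERPOLATION_CONTEXT",
    "START_INTERPOLATION_C": "START_INTERPOLATION_CONTEXT",
    "START_INTERPOLATION_D": "START_INTERPOLATION_CONTEXT",
}


def meta_of(tok):
    """Terminal as seen by the (analysed-once) generic grammar."""
    return CONCRETE_TO_META.get(tok.concrete, tok.concrete)


# Actions are keyed by rule name *and* concrete terminal, so swapping
# this table rebinds the behaviour without re-analysing the grammar.
ACTIONS = {
    ("interpolation", "START_INTERPOLATION_B"): lambda body: ("B", body),
    ("interpolation", "START_INTERPOLATION_C"): lambda body: ("C", body),
    ("interpolation", "START_INTERPOLATION_D"): lambda body: ("D", body),
}


def reduce_interpolation(start, body):
    """Reduce the generic rule, choosing the action by concrete terminal."""
    if meta_of(start) != "START_INTERPOLATION_CONTEXT":
        raise ValueError("not an interpolation opener: " + start.concrete)
    return ACTIONS[("interpolation", start.concrete)](body)
```

The parser tables would only ever mention
START_INTERPOLATION_CONTEXT; the concrete terminal is consulted at
reduction time.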

Aside from being useful for generic templating purposes, such a
generic grammar would be of use for the purpose Daniel described, i.e.
a layer of abstraction usable for almost any modern language, even in
polyglot texts.

> > As for vendoring, I just doubt you will get much buy-in in this forum.
> > There are corporate-type free/open-source software projects that
> > prioritize uniformity in build environments and limiting the scope of
> > bugs that can arise from the build process/dependencies that vendor at
> > the drop of a hat.  Then there are "classic" free software projects
> > that have amalgamated the work of many individual contributors, and
> > those contributors often prioritize control of the software running on
> > their systems for whatever reason (but eliminating non-free software
> > is definitely one of them), and they often can/will contribute patches
> > for that purpose.  The second camp *hates* vendoring because it
> > subverts their control of their computational resources.  At least,
> > that's the dichotomy I see. There are probably finer points I'm
> > missing or mischaracterizing.
>
> From my point of view as a distribution packager there are several reasons
> why vendoring can be bad, or why in some contexts keeping them is the
> better decision.
>
> But in this context it complicates the build process as now each grammar
> has to be built for Emacs in addition to other editors.
> The Emacs package now pulls in more build dependencies at build time
> which complicates the build process as the number of dependencies grows.
>
> Besides, bundled dependencies are not allowed unless there's no way to
> avoid them. It is not about control or anything.

That sounds like something I would interpret as control.  Distro
creators/maintainers are prime candidates for wanting to maintain
control of the build/run-time environment, as they are responsible for
everything they bundle working together.  Perhaps "control of their
computational resources" is more specific than I intended in my
previous posting.

Lynn



