On January 4, 2025 1:57:15 PM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Cc: Björn Bidar <bjorn.bidar@thaodan.de>,  Philip
>>  Kaludercic
>>  <philipk@posteo.net>,  emacs-devel <emacs-devel@gnu.org>,  Eli Zaretskii
>>  <eliz@gnu.org>,  Richard Stallman <rms@gnu.org>,  manphiz@gmail.com
>> Date: Sat, 04 Jan 2025 12:39:44 -0500
>> 
>> The point I keep trying to make is that you can't safely update a
>> foo-ts-mode tree sitter grammar without updating the corresponding
>> foo-ts-mode Lisp.  They're tightly coupled.  They're not separate
>> programs.  Same goes for nvim or whatever using TS grammars.
>> Even distribution packagers understand the futility of consolidating
>> dependencies with unstable interfaces.
>> 
>> When it comes to Emacs, we either 1) treat grammars as part of Emacs and
>> build them with Emacs, or 2) try to take a runtime dependency on
>> grammars that can be updated independently of Emacs.
>> Compatibility considerations mean #2 can't work, so we're left with
>> doing #1 somehow.
>
>This is true in principle, but in practice incompatible changes in
>grammar libraries are rare. 

They are not rare. Emacs Lisp already carries several workarounds for grammar versions with different vocabularies. c++-ts-mode recently stopped recognizing certain language keywords ("virtual", I believe) when a grammar made an unannounced incompatible change, and such a workaround had to be added. These breakages will keep happening no matter how much one might wish grammar authors would consider stability guarantees.
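
To make the shape of such a workaround concrete, here is a minimal sketch, in Python rather than Emacs Lisp. Everything in it is hypothetical: the two node sets and the helpers pick_keyword_pattern and keyword_rules are illustrations, not Emacs or tree-sitter APIs. It assumes one grammar release exposes "virtual" as a named node and another only as an anonymous token; I have not checked which release did which.

```python
# Hypothetical node vocabularies for two releases of the same grammar.
# Assumption: the old release has `virtual` as a named node, the new one
# only as an anonymous token. Neither set is real grammar data.
OLD_NAMED_NODES = {"function_definition", "virtual", "identifier"}
NEW_NAMED_NODES = {"function_definition", "identifier"}


def pick_keyword_pattern(named_nodes, keyword):
    """Return a query pattern for KEYWORD that this grammar would accept.

    Tree-sitter queries match named nodes as (name) and anonymous
    tokens as a quoted string, so the mode has to pick one per grammar.
    """
    if keyword in named_nodes:
        return f"({keyword})"
    return f'"{keyword}"'


def keyword_rules(named_nodes, keywords=("virtual",)):
    """Build the keyword patterns the mode would install for this grammar."""
    return [pick_keyword_pattern(named_nodes, kw) for kw in keywords]


print(keyword_rules(OLD_NAMED_NODES))
print(keyword_rules(NEW_NAMED_NODES))
```

The Lisp side needs one such branch per vocabulary change it has to survive, and it can only grow a branch after somebody has noticed the breakage.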

> So in practice the same Lisp in
>foo-ts-mode can endure quite a few changes in the tree-sitter-foo
>grammar library

It's like cancer. Mutations can happen any time, and if you're unlucky, you'll get a harmful one without warning.

>> We're not talking about something like libpng, which
>> could in principle be updated without Emacs having to know about the
>> update
>
>Libraries like libpng also make incompatible ABI changes from time to
>time.  I agree that they do it less frequently than tree-sitter
>grammar libraries, but they still do.  And yet we don't distribute
>libpng with Emacs.

When a library like libpng makes an incompatible change, it gets a new major version. Consider GTK3 and GTK4. Often, several versions get maintained simultaneously. Breakages are telegraphed in advance, and versions are usually introspectable. Grammars have none of this version discipline.

Besides: updating libpng usually gives you some value in exchange for doing the update. A new version might fix a security problem, improve performance, or add a feature. These concerns aren't relevant for grammars: fixes and improvements usually involve changing the shape of the parse tree, and when you change the parse tree, you have to change the Lisp that consumes the parse tree to match.

I think we should vendor even libpng. Down with dynamic linking! Seriously. But I can at least sort of see the logic in loose coupling to libpng, especially if we consider the constraints of the boxed software and floppies beforetime. But grammars? I don't think it makes sense to depend on them dynamically even under a framework in which it makes sense to unbundle libpng.

>> The simplest possible way to implement #1 is to just check the grammars
>> into the Emacs repository and build them with Emacs using the normal
>> build system.  Trying to check in hashes and download the hash-named
>> grammar versions during the build and *then* build them with Emacs ---
>> why bother?  Because of the hash-locking, a download-at-build-time
>> scheme doesn't actually add any flexibility relative to just checking in
>> the code.
>
>This eliminates the need to keep the grammar in our repository (or
>have it sub-moduled)

And it creates the need to do code distribution in a bespoke way. How is that a net win?


> to say nothing of the legal aspects that are
>better avoided.  

Nobody has been able to describe these legal aspects. Grammars are free software. GPL compatible, too. That means we can put them in Emacs. That's what software freedom means.

> Also don't forget that we have at least two active
>branches at any given time, and the number of grammar libraries we are
>interested in is more than a handful.  So adding them to our
>repository is a significant addition to the maintenance burden.

Vendoring reduces, not increases, the maintenance burden. If you're vendoring or hash locking, when you cut a branch, you cut the grammars at the same time. If you check in the grammars or their hashes, this snapshotting happens automatically. The alternative would be bizarre: we don't try to combine cc-langs.el from master with cc-engine.el from a release branch!
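
A rough sketch of why a checked-in hash snapshots a grammar just as firmly as checked-in code does. The manifest layout, the verify_grammar helper, and the sample bytes are all made up for illustration; no such file exists in Emacs today.

```python
import hashlib


def sha256_hex(data: bytes) -> str:
    """Hex SHA-256 digest of DATA."""
    return hashlib.sha256(data).hexdigest()


# Stand-ins for two upstream grammar releases (made-up contents).
RELEASE_A = b"grammar source as of the branch point\n"
RELEASE_B = b"grammar source after an upstream change\n"

# Hypothetical per-branch manifest: language -> pinned hash.
# Cutting a branch copies this table along with the Lisp.
BRANCH_MANIFEST = {"cpp": sha256_hex(RELEASE_A)}


def verify_grammar(manifest, language, downloaded: bytes) -> bool:
    """Accept DOWNLOADED only if it matches the hash pinned for LANGUAGE."""
    return sha256_hex(downloaded) == manifest[language]


print(verify_grammar(BRANCH_MANIFEST, "cpp", RELEASE_A))
print(verify_grammar(BRANCH_MANIFEST, "cpp", RELEASE_B))
```

A build on that branch can only ever accept the bytes of RELEASE_A. That is the same guarantee a checked-in copy gives, plus a download step that can fail.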


>Other than that, yes, hash-locking is not much more flexible than
>bundling.  I tried to tell that to people who think hash-locking is a
>solution, but they still insisted.
> And since they also volunteered to
>maintain the DB of hashes, I don't see why I should reject that.  But
>I don't think it's a good solution.

Then these people should use git submodules instead of inventing a random custom thing that we have to maintain and that does the same thing as git submodules, except that it's less flexible, less familiar, and probably less robust.

>> It's just a more complicated and error-prone way of doing the
>> same thing as checking in the code.  The same goes for other forms of
>> downloading dependencies, e.g. via git submodules.
>
>The difference is that the RI changes.  And that's not something to
>ignore, from where I stand.

Huh? In what possible way could a bespoke downloader be a better engineering choice than submodules?