From mboxrd@z Thu Jan 1 00:00:00 1970
From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter maturity
Date: Sat, 04 Jan 2025 14:30:30 -0500
References: <86ldwdm7xg.fsf@gnu.org>
 <6765355b.c80a0220.1a6b24.3117SMTPIN_ADDED_BROKEN@mx.google.com>
 <00554790-CACA-4233-8846-9E091CF1F7AA@gmail.com>
 <86msgl2red.fsf@gnu.org> <87o710sr7y.fsf@debian-hx90.lan>
 <8734i9tmze.fsf@posteo.net> <86plldwb7w.fsf@gnu.org>
 <87ttapryxr.fsf@posteo.net>
 <0883EB00-3BB2-4BC8-95D1-45F4497C0526@dancol.org>
 <87msge8bv8.fsf@dancol.org>
 <6775a459.170a0220.2f3d1e.1897SMTPIN_ADDED_BROKEN@mx.google.com>
 <87h66emqan.fsf@dancol.org> <86ldvqbe5w.fsf@gnu.org>
In-Reply-To: <86ldvqbe5w.fsf@gnu.org>
To: Eli Zaretskii
Cc: owinebar@gmail.com, bjorn.bidar@thaodan.de, philipk@posteo.net,
 emacs-devel@gnu.org, rms@gnu.org, manphiz@gmail.com
User-Agent: K-9 Mail for Android
Xref: news.gmane.io gmane.emacs.devel:327680
Content-Type: text/plain; charset=utf-8

On January 4, 2025 1:57:15 PM EST, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Daniel Colascione <dancol@dancol.org>
>> Cc: Björn Bidar <bjorn.bidar@thaodan.de>,  Philip
>>  Kaludercic
>>  <philipk@posteo.net>,  emacs-devel <emacs-devel@gnu.org>,  Eli Zaretskii
>>  <eliz@gnu.org>,  Richard Stallman <rms@gnu.org>,  manphiz@gmail.com
>> Date: Sat, 04 Jan 2025 12:39:44 -0500
>>
>> The point I keep trying to make is that you can't safely update a
>> foo-ts-mode tree sitter grammar without updating the corresponding
>> foo-ts-mode Lisp.  They're tightly coupled.  They're not separate
>> programs.  Same goes for nvim or whatever using TS grammars.
>> Even distribution packagers understand the futility of consolidating
>> dependencies with unstable interfaces.
>>
>> When it comes to Emacs, we either 1) treat grammars as part of Emacs and
>> build them with Emacs, or 2) try to take a runtime dependency on
>> grammars that can be updated independently of Emacs.
>> Compatibility considerations mean #2 can't work, so we're left with
>> doing #1 somehow.
>
>This is true in principle, but in practice incompatible changes in
>grammar libraries are rare.

They are not rare. Emacs Lisp already carries several workarounds for
grammars whose different versions use different vocabularies.
c++-ts-mode recently stopped recognizing certain language keywords
("virtual", I believe) when a grammar made an unannounced incompatible
change, and such a workaround had to be added. These breakages will
keep happening no matter how much one might wish grammar authors would
consider stability guarantees.

> So in practice the same Lisp in
>foo-ts-mode can endure quite a few changes in the tree-sitter-foo
>grammar library

It's like cancer. Mutations can happen any time, and if you're
unlucky, you'll get a harmful one without warning.

>> We're not talking about something like libpng, which
>> could in principle be updated without Emacs having to know about the
>> update
>
>Libraries like libpng also make incompatible ABI changes from time to
>time.  I agree that they do it less frequently than tree-sitter
>grammar libraries, but they still do.  And yet we don't distribute
>libpng with Emacs.

When a library like libpng makes an incompatible change, it gets a new
major version. Consider GTK3 and GTK4. Often, several versions get
maintained simultaneously. Breakages are telegraphed in advance, and
versions are usually introspectable. Grammars have none of this
version discipline.

Besides: updating libpng usually gives you some value in exchange for
doing the update. A new version might fix a security problem, improve
performance, or add a feature. These concerns aren't relevant for
grammars: fixes and improvements usually involve changing the shape of
the parse tree, and when you change the parse tree, you have to change
the Lisp that consumes the parse tree to match.

I think we should vendor even libpng. Down with dynamic linking!
Seriously. But I can at least sort of see the logic in loose coupling
to libpng, especially if we consider the constraints of the boxed
software and floppies beforetime. But grammars? I don't think it makes
sense to depend on them dynamically even under a framework in which it
makes sense to unbundle libpng.

>> The simplest possible way to implement #1 is to just check the grammars
>> into the Emacs repository and build them with Emacs using the normal
>> build system.  Trying to check in hashes and download the hash-named
>> grammar versions during the build and *then* build them with Emacs ---
>> why bother?  Because of the hash-locking, a download-at-build-time
>> scheme doesn't actually add any flexibility relative to just checking in
>> the code.
>
>This eliminates the need to keep the grammar in our repository (or
>have it sub-moduled)

And it creates the need to do code distribution in a bespoke way. How
is that a net win?

> to say nothing of the legal aspects that are
>better avoided.

Nobody has been able to describe these legal aspects. Grammars are
free software. GPL compatible, too. That means we can put them in
Emacs. That's what software freedom means.

> Also don't forget that we have at least two active
>branches at any given time, and the number of grammar libraries we are
>interested in is more than a handful.  So adding them to our
>repository is a significant addition to the maintenance burden.

Vendoring reduces, not increases, the maintenance burden. If you're
vendoring or hash locking, when you cut a branch, you cut the grammars
at the same time. If you check in the grammars or their hashes, this
snapshotting happens automatically. The alternative would be bizarre:
we don't try to combine cc-langs.el from master with cc-engine.el from
a release branch!

>Other than that, yes, hash-locking is not much more flexible than
>bundling.  I tried to tell that to people who think hash-locking is a
>solution, but they still insisted.
> And since they also volunteered to
>maintain the DB of hashes, I don't see why I should reject that.  But
>I don't think it's a good solution.

Then these people should use git submodules instead of inventing a
random custom thing that we have to maintain that does the same thing
as git submodules, except less flexibly, less familiarly, and probably
less robustly.

>> It's just a more complicated and error-prone way of doing the
>> same thing as checking in the code.  The same goes for other forms of
>> downloading dependencies, e.g. via git submodules.
>
>The difference is that the RI changes.  And that's not something to
>ignore, from where I stand.

Huh? In what possible way could a bespoke downloader be a better
engineering choice than submodules?
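
To make the hash-locking point above concrete, here is a minimal Python
sketch. It is illustrative only: the grammar name, the byte strings, and
the helpers accept_download and vendored_copy are hypothetical and are
not part of Emacs or of any proposed build scheme. It only shows that a
download gated on a pinned SHA-256 can ever yield one exact snapshot,
which is the same snapshot a checked-in copy would provide.

```python
import hashlib

# Hypothetical lock table: grammar name -> SHA-256 of the exact source
# snapshot the branch was cut against (placeholder bytes, not a real grammar).
PINNED = {
    "tree-sitter-foo": hashlib.sha256(b"grammar source, snapshot A").hexdigest(),
}


def accept_download(name: str, payload: bytes) -> bytes:
    """Return payload only if it matches the pinned hash; otherwise raise."""
    digest = hashlib.sha256(payload).hexdigest()
    if digest != PINNED[name]:
        raise ValueError(f"{name}: got {digest}, expected {PINNED[name]}")
    return payload


def vendored_copy(name: str) -> bytes:
    """Stand-in for reading the checked-in copy of the same snapshot."""
    return b"grammar source, snapshot A"
```

Under these assumptions, any upstream change to the grammar makes
accept_download fail until someone updates the pinned hash in the
repository, so the download step contributes no flexibility beyond what
a checked-in copy or a submodule pointer already gives.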