From: Daniel Colascione
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter maturity
Date: Sun, 29 Dec 2024 15:36:59 -0500
Message-ID: <87msge8bv8.fsf@dancol.org>
In-Reply-To: (Lynn Winebarger's message of "Sun, 29 Dec 2024 09:36:23 -0500")
To: Lynn Winebarger
Cc: Philip Kaludercic, emacs-devel, Eli Zaretskii, Richard Stallman, manphiz@gmail.com

Lynn Winebarger writes:

> On Fri, Dec 27, 2024, 9:25 AM Daniel Colascione wrote:
>
>> It's a shame there's no way to write TS grammars in plain elisp. I figure
>> vendoring both the source and the generated code would be best, as it'd
>> allow building Emacs anywhere but still make it convenient on systems with
>> needed tools (JS runtime, Rust, etc.) to update and modify the grammar. As
>> with any scheme involving checking in generated outputs, the source and
>> output can get out of sync, but I think there are build time guardrails we
>> can build to make sure it doesn't happen.
>
> I looked into this last year. The tree-sitter library provides a parsing
> engine that references a fairly standard LR type parsing table in binary
> form. I got stuck in adding a generic primitive functionality for reading
> and writing arbitrary binary data structures based on a data description
> DSL, since I wouldn't want to tie the interpreter core to the data
> structures of an external, dynamically-loadable library. But, I wasn't
> sure such an extension would be accepted into emacs, as I am not an expert
> on the possible security implications.
>
> Other than that, emacs already has the code for calculating (LA)LR parsing
> tables in the semantic packages. The tree-sitter grammar compiler may have
> additional logic for providing multiple starting symbols, but the parsing
> engine should still function with a classic parsing table.

Thanks. Such an approach would let us treat tree-sitter grammars a lot
more like font-lock-keywords, and I think for some modes, that'd be
a good option. (Of course, SHTDI.)
Tree-sitter, as wonderful as it is, strikes me as a bit of a Rube
Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)

Do you happen to know whether the subset of Rust that gccrs recognizes
is sufficient to compile the tree-sitter grammar compiler? If so, we
could in principle combine gccrs with a bare-bones embedded JS
interpreter like https://duckjs.org/ to produce a mechanism that would
let us customize and rebuild tree-sitter grammars as easily as we do
elisp files, even on obscure platforms like DJGPP.

Some Emacs modes could ship with .js grammars sourced from upstream
editor-neutral projects. Other modes might just build tree-sitter parse
tables in elisp using something vaguely like SMIE syntax. Both styles
of mode would be customizable by end users, and (because, I'm a broken
record: vendor, vendor, vendor) we'd maintain compatibility without
mysterious AST-change-related breakages.
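
To make the "build time guardrails" idea from the quoted message a bit
more concrete, here is a minimal sketch in Python of one way such
a check might work: record a hash of the grammar source whenever the
parser is regenerated, and have the build fail if the recorded hash no
longer matches. Everything here is an assumption for illustration: the
grammar.js file name, the grammar.sha256 stamp file, and the function
names are hypothetical, and nothing like this exists in the Emacs build
today.

```python
import hashlib
from pathlib import Path

# Hypothetical names: a vendored grammar directory holding grammar.js
# (the source) plus a stamp file written when the parser was generated.
GRAMMAR_NAME = "grammar.js"
STAMP_NAME = "grammar.sha256"


def digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def write_stamp(grammar_dir: Path) -> None:
    """Record the grammar hash; meant to run right after regeneration."""
    stamp = grammar_dir / STAMP_NAME
    stamp.write_text(digest(grammar_dir / GRAMMAR_NAME) + "\n")


def is_in_sync(grammar_dir: Path) -> bool:
    """True if the recorded hash matches the current grammar source."""
    stamp = grammar_dir / STAMP_NAME
    if not stamp.exists():
        return False  # never stamped, so treat the output as stale
    return stamp.read_text().strip() == digest(grammar_dir / GRAMMAR_NAME)
```

A build rule could call is_in_sync on each vendored grammar directory
and stop with an error when it returns false, which catches the case
where someone edits the grammar but forgets to regenerate and check in
the output. It does not catch hand edits to the generated file itself;
stamping that file too would be the obvious extension.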