From: Björn Bidar
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter maturity
Date: Sun, 05 Jan 2025 01:21:23 +0200
Message-ID: <23250.1399282896$1736032971@news.gmane.org>
In-Reply-To: (Lynn Winebarger's message of "Sat, 4 Jan 2025 11:15:22 -0500")
To: Lynn Winebarger
Cc: Daniel Colascione, Philip Kaludercic, emacs-devel, Eli Zaretskii,
 Richard Stallman, manphiz@gmail.com

Lynn Winebarger writes:

> On Wed, Jan 1, 2025 at 3:23 PM Björn Bidar wrote:
>> Lynn Winebarger writes:
>> >> Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
>> >> Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)
>> >
>> > They evidently decided to use JSON and a simple schema to specify the
>> > concrete grammar, instead of creating a DSL for the purpose.
>> > JavaScript is just a convenient way for embedding code into JSON the
>> > same way LISP programmers use lisp to generate S-expressions.  Once
>> > you have the JSON format generated, JavaScript is not used.
>> >
>> > The rest of the project is really composed of orthogonal components,
>> > the GLR grammar compiler (written in Rust) and the run-time GLR
>> > parsing engine, written in C.
>> > The grammar compiler produces the parsing tables in the form of C
>> > source code that is compiled together with the library for a single
>> > library per grammar, but the C library does not actually require the
>> > parsing tables to be statically known at compile-time, at least the
>> > last I looked, unless there is some really obscure dependence.  The
>> > procedural interface to the parser just takes a pointer to the
>> > parser table data structure at run-time.
>> >
>> > Since GLR grammars are basically arbitrary (ambiguous) LR(1) grammars,
>> > the parser run-time has to implement a fairly sophisticated algorithm
>> > (graph-stacks) to be efficient.  Having implemented the LALR parser
>> > generator at least 3 times in the last couple of decades (just for my
>> > own use), generating the parse tables looks like a much simpler (and
>> > well-understood) problem to solve than the GLR run-time.  More
>> > importantly, the efficiency of the grammar compiler is not all that
>> > critical compared to the run-time.
>> >
>>
>> Additional alternatives to Node are already a good thing.
>> Using WASM as the output format also does not sound bad, assuming there
>> is some abstraction on the tree-sitter library side.
>
> I'm not sure why WASM would be interesting.  AFAICT, it's just another
> set of bindings to the C library, maybe with the tables compiled into
> a WASM binary module (or whatever the correct term should be - I'm not a
> WASM expert).  In any case, AFAIK Emacs has no particular capability
> for using WASM files as dynamic libraries in general.  Maybe if Emacs
> itself was compiled to WASM, in which case I suppose the function for
> dynamically loading libraries would implicitly load such modules.
>
> OTOH, the generated WASM bindings might provide an example of using
> the tree-sitter DLL with the in-memory parse table structure not
> embedded in the tree-sitter DLL.  Is that what you meant?

Maybe I misunderstood, but my assumption was that the newer WASM parsers
would be less prone to breakage.  But if it's just about compiling the
same generated code to WASM, then I don't see the benefit either.

>> > I agree, a generic grammar capturing the structures of most
>> > programming languages would be useful.  It is definitely possible to
>> > extract the syntactic/semantic concepts from C++ and Python to create
>> > such a grammar, if you are willing to allow nested grammars
>> > appropriately delimited.  For example, a constructor context would
>> > delimit an expression in a data language that is embedded in a
>> > constructor context that may itself have delimited value contexts
>> > where the functional/procedural grammar may appear, ad infinitum.  The
>> > procedural and data grammars are distinct but mutually recursive.
>> > That would be if the form appeared in an rvalue-context.  For l-value
>> > expressions, the same constructor delimiting syntax can become a
>> > binding form, at least, with subexpressions of binding forms also
>> > being binding forms.  As long as the scanner is dynamically set
>> > according to the grammar context (and recognizes/signals the closing
>> > delimiter), the grammar can be made non-ambiguous because a given
>> > character will produce context-appropriate terminal symbols.
>>
>> What kind of scanner are you referring to?  Something that works like a
>> binding generator but for AST?
>
> Aside from being useful for generic templating purposes, such a
> generic grammar would be of use for the purpose Daniel described, i.e.
> a layer of abstraction usable for almost any modern language, even in
> polyglot texts.
>

This is exactly what I was wondering too.  Some languages embed others
into themselves or are hybrids.  Good examples would be Python inside a
template, and QML, which is markup but also JavaScript depending on the
context.  A more flexible grammar system would help here.  Kinda like
reinventing Semantic again..
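
To make the "scanner set according to the grammar context" idea a bit
more concrete, here is a minimal sketch in Python.  It is not
tree-sitter code and not anything proposed in this thread: the function
name `scan`, the mode names, the terminal names and the `{{` / `}}`
delimiters (borrowed from common template syntaxes) are all made up for
illustration.  It only shows the one point quoted above: the same
character ("<") yields a different terminal depending on which context
is on top of the mode stack, and the closing delimiter is what switches
the scanner back.

```python
def scan(source):
    """Return a list of (terminal, lexeme) pairs.

    Hypothetical two-context scanner: "markup" is the outer language,
    "code" is an embedded expression language delimited by {{ and }}.
    """
    modes = ["markup"]  # stack of grammar contexts, innermost last
    tokens = []
    i = 0
    while i < len(source):
        if modes[-1] == "markup":
            if source.startswith("{{", i):
                # Opening delimiter: switch to the embedded grammar.
                tokens.append(("CODE_OPEN", "{{"))
                modes.append("code")
                i += 2
            else:
                # In markup context everything up to the next opening
                # delimiter is opaque text, including "<" and ">".
                j = source.find("{{", i)
                if j == -1:
                    j = len(source)
                tokens.append(("TEXT", source[i:j]))
                i = j
        else:  # "code" context
            ch = source[i]
            if source.startswith("}}", i):
                # Closing delimiter: return to the enclosing context.
                tokens.append(("CODE_CLOSE", "}}"))
                modes.pop()
                i += 2
            elif ch.isspace():
                i += 1
            elif ch in "<>":
                # The same characters are operators here, not text.
                tokens.append(("COMPARE", ch))
                i += 1
            elif ch.isalnum() or ch == "_":
                j = i
                while j < len(source) and (source[j].isalnum() or source[j] == "_"):
                    j += 1
                tokens.append(("NAME", source[i:j]))
                i = j
            else:
                raise ValueError(f"unexpected character {ch!r} at offset {i}")
    return tokens


if __name__ == "__main__":
    for terminal, lexeme in scan("<b>{{ a < b }}</b>"):
        print(terminal, repr(lexeme))
```

The list is used as a stack so that further nested contexts could be
pushed from inside the embedded language; this sketch only ever nests
one level deep and does not report an unterminated `{{`.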

>> > As for vendoring, I just doubt you will get much buy-in in this forum.
>> > There are corporate-type free/open-source software projects that
>> > prioritize uniformity in build environments and limiting the scope of
>> > bugs that can arise from the build process/dependencies that vendor at
>> > the drop of a hat.  Then there are "classic" free software projects
>> > that have amalgamated the work of many individual contributors, and
>> > those contributors often prioritize control of the software running on
>> > their systems for whatever reason (but eliminating non-free software
>> > is definitely one of them), and they often can/will contribute patches
>> > for that purpose.  The second camp *hates* vendoring because it
>> > subverts their control of their computational resources.  At least,
>> > that's the dichotomy I see.  There are probably finer points I'm
>> > missing or mischaracterizing.
>>
>> From my point of view as a distribution packager there are several
>> reasons why vendoring can be bad, or why in some contexts keeping the
>> vendored copies is the better decision.
>>
>> But in this context it complicates the build process, as now each
>> grammar has to be built for Emacs in addition to other editors.
>> The Emacs package now pulls in more build dependencies at build time,
>> which complicates the build process as the dependency list grows.
>>
>> Besides, bundled dependencies are not allowed unless there's no way to
>> avoid them.  It is not about control or anything.
>
> That sounds like something I would interpret as control.  Distro
> creators/maintainers are prime candidates for wanting to maintain
> control of the build/run-time environment, as they are responsible for
> everything they bundle working together.  Perhaps "control of their
> computational resources" is more specific than I intended in my
> previous posting.
>

Yeah, you are right.