From mboxrd@z Thu Jan 1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Lynn Winebarger
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter maturity
Date: Sun, 29 Dec 2024 22:20:59 -0500
References: <1ed88fca-788a-fe9f-b6c8-edb2f49751c9@mavit.org.uk>
 <67428b3d.c80a0220.2f3036.adbdSMTPIN_ADDED_BROKEN@mx.google.com>
 <86ldwdm7xg.fsf@gnu.org>
 <6765355b.c80a0220.1a6b24.3117SMTPIN_ADDED_BROKEN@mx.google.com>
 <00554790-CACA-4233-8846-9E091CF1F7AA@gmail.com>
 <86msgl2red.fsf@gnu.org>
 <87o710sr7y.fsf@debian-hx90.lan>
 <8734i9tmze.fsf@posteo.net>
 <86plldwb7w.fsf@gnu.org>
 <87ttapryxr.fsf@posteo.net>
 <0883EB00-3BB2-4BC8-95D1-45F4497C0526@dancol.org>
 <87msge8bv8.fsf@dancol.org>
 <6771db94.050a0220.386e00.e451SMTPIN_ADDED_BROKEN@mx.google.com>
 <77FBB3FF-A0F5-416C-AE35-39C0D818FBA9@gmail.com>
In-Reply-To: <77FBB3FF-A0F5-416C-AE35-39C0D818FBA9@gmail.com>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000fa7dc3062a744daf"
To: Yuan Fu
Cc: Björn Bidar, Daniel Colascione, Philip Kaludercic, emacs-devel,
 Eli Zaretskii, Richard Stallman, manphiz@gmail.com
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:327383

--000000000000fa7dc3062a744daf
Content-Type: text/plain; charset="UTF-8"


On Sun, Dec 29, 2024, 7:31 PM Yuan Fu <casouri@gmail.com> wrote:

> > On Dec 29, 2024, at 3:29 PM, Björn Bidar <bjorn.bidar@thaodan.de> wrote:
> >
> > Daniel Colascione <dancol@dancol.org> writes:
> >
> >> Lynn Winebarger <owinebar@gmail.com> writes:
> >>
> >>> On Fri, Dec 27, 2024, 9:25 AM Daniel Colascione <dancol@dancol.org> wrote:
> >>>
> >>>>
> >>>> It's a shame there's no way to write TS grammars in plain elisp. I figure
> >>>> vendoring both the source and the generated code would be best, as it'd
> >>>> allow building Emacs anywhere but still make it convenient on systems with
> >>>> needed tools (JS runtime, Rust, etc.) to update and modify the grammar. As
> >>>> with any scheme involving checking in generated outputs, the source and
> >>>> output can get out of sync, but I think there are build time guardrails we
> >>>> can build to make sure it doesn't happen.
> >>>>
> >>>
> >>> I looked into this last year.  The tree-sitter library provides a parsing
> >>> engine that references a fairly standard LR type parsing table in binary
> >>> form.  I got stuck in adding a generic primitive functionality for reading
> >>> and writing arbitrary binary data structures based on a data description
> >>> DSL, since I wouldn't want to tie the interpreter core to the data
> >>> structures of an external, dynamically-loadable library.  But, I wasn't
> >>> sure such an extension would be accepted into emacs, as I am not an expert
> >>> on the possible security implications.
> >>>
> >>> Other than that, emacs already has the code for calculating (LA)LR parsing
> >>> tables in the semantic packages.  The tree-sitter grammar compiler may have
> >>> additional logic for providing multiple starting symbols, but the parsing
> >>> engine should still function with a classic parsing table.
> >>
> >> Thanks.  Such an approach would let us treat tree-sitter grammars a lot
> >> more like font-lock-keywords, and I think for some modes, that'd be a
> >> good option.  (Of course, SHTDI.)
> >>
> >> Tree sitter, as wonderful as it is, strikes me as a bit of a Rube
> >> Goldberg machine architecturally: JS *and* Rust *and* C? Really? :-)
> >
> > I was wondering the same. How the hell? There had been some talk of
> > supporting a more lightweight JavaScript interpreter as an alternative,
> > but it hasn't gone anywhere, apparently for compatibility reasons. I
> > don't see how node could be a dependency for these. Grammars are mostly
> > without dependencies, except that some depend on other grammars at the
> > source level, such as the C++ grammar requiring the C grammar.
>
> I don’t think you need nodejs to build the grammar. You might need it to
> develop the grammar, but compiling grammar.js to parser.c only requires the
> tree-sitter CLI which is written in Rust.

The grammar.js is written in a lispy way, and is interpreted by node to
expand out to a JSON format.  See the middle of
https://tree-sitter.github.io/tree-sitter/5-implementation.html :

==========
Parsing a Grammar
First, Tree-sitter must evaluate the JavaScript code in grammar.js and
convert the grammar to a JSON format. It does this by shelling out to
node. The format of the grammars is formally specified by the JSON schema
in grammar.schema.json. The parsing is implemented in parse_grammar.rs.
==========

The resulting JSON representation of the grammar is then compiled by the
parser (table) generator written in Rust.

A grammar.js that uses only the functions defined by the tree-sitter node
module (e.g. the "$" object, the "choice" function, etc.) would be fairly
trivial to transliterate into lisp form, but the file can incorporate
arbitrary JS code as well.
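
To make the "expand out to a JSON format" step concrete, here is a rough
sketch of the idea in Python rather than JS or lisp.  It is only an
illustration: the helper names (Rules, seq, choice, expand) and the toy
grammar are hypothetical, and the node shapes ("SYMBOL", "STRING", "SEQ",
"CHOICE") are my assumption of what grammar.schema.json describes, not a
copy of the tree-sitter implementation.

```python
import json


class Rules:
    """Hypothetical stand-in for the "$" object: any attribute is a symbol reference."""

    def __getattr__(self, name):
        return {"type": "SYMBOL", "name": name}


def _norm(rule):
    # Bare strings stand for literal tokens, the way they read in grammar.js.
    return {"type": "STRING", "value": rule} if isinstance(rule, str) else rule


def seq(*members):
    # Sequence of sub-rules, expanded to a plain data node.
    return {"type": "SEQ", "members": [_norm(m) for m in members]}


def choice(*members):
    # Alternatives, expanded to a plain data node.
    return {"type": "CHOICE", "members": [_norm(m) for m in members]}


def expand(name, rules):
    """Call each rule function with the "$" stand-in and collect plain data."""
    dollar = Rules()
    return {"name": name, "rules": {k: _norm(f(dollar)) for k, f in rules.items()}}


# Hypothetical toy grammar, written the way a grammar.js rule table reads.
toy = expand("toy", {
    "expr": lambda S: choice(S.number, seq("(", S.expr, ")")),
    "number": lambda S: choice("0", "1"),
})

print(json.dumps(toy, indent=2))
```

Because the result is nothing but nested data, the same structure could
just as well be written down as a quoted lisp form.  The awkward cases are
the grammars that compute their rules with arbitrary JS.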

Lynn

--000000000000fa7dc3062a744daf--