From: Yuan Fu <casouri@gmail.com>
To: Lynn Winebarger <owinebar@gmail.com>
Cc: "\"Augustin Chéneau (BTuin)\"" <btuin@mailo.com>, emacs-devel@gnu.org
Subject: Re: Questions about tree-sitter
Date: Thu, 7 Sep 2023 16:42:44 -0700
Message-ID: <581816B0-2F41-42C9-B49A-70F7DD800212@gmail.com>
In-Reply-To: <CAM=F=bAwuhmzysqQUVYUuMDo1mq=K2O4BiZm-pOh+LYjJF774A@mail.gmail.com>
> On Sep 6, 2023, at 9:11 AM, Lynn Winebarger <owinebar@gmail.com> wrote:
>
> On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu <casouri@gmail.com> wrote:
>>> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>> I have a few questions about tree-sitter.
>>>
>>> I'm currently developing a grammar for GNU Bison alongside a tree-sitter
>>> major mode, it's a work in progress. The grammar is here:
>>> <https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
>>> far able to parse simple files, and the major mode prototype is
>>> attached to this message.
>>>
>>> So, the questions:
>>>
>>> 1. Is there a way to reload a grammar?
>>>
>>> Emacs is pretty nice as a playground for testing grammars, but once a
>>> grammar is loaded, it won't be loaded again until Emacs restarts (as far
>>> as I know).
>>> Is it possible to reload a grammar after modifying it?
>>
>> No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge or invalidate all the nodes, queries, and parsers using that grammar.
>
> Reviewing some generated "parser.c" files, and some of the available
> documentation, it appears the parser.c file basically creates a lexing
> function that adheres to a certain protocol in terms of
> producing/consuming a standard lexer state data structure, and an
> LR(1) parser table suitable for GLR parsing (i.e. allows ambiguous
> actions). These and definitions of the tokens and grammar symbols are
> bundled up in a language structure passed to the tree-sitter library.
> LALR(1) tables are essentially simplified/compressed LR(1) tables, and
> Emacs has code to calculate such tables directly in elisp.
> Therefore, given functionality to translate elisp data into the raw C
> structures, we should be able to dynamically create language data
> structures to pass to the tree-sitter library to create a parser.
> We would also need a table driven lexer framework in place of the
> generated lexer in the C file to completely avoid going through a C
> compiler.
> The other novel features of tree-sitter parsers appear to be
> implemented in the parser runtime, not in the table calculation.
>
> I've implemented LALR(1) parser generators two or three times in the
> last couple of decades; this might be a fun project for me while I am
> unambiguously able to contribute to GNU Emacs.
That would be great. But note that the parser structure has escape hatches: certain things can be implemented by arbitrary C functions. Also, tree-sitter allows grammars to use custom scanners [1].
[1] https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners
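To make the custom-scanner escape hatch concrete, here is a rough sketch of what such a scanner can look like. It follows the five-function naming pattern described in [1] (create, destroy, serialize, deserialize, scan), but everything else is an assumption for illustration: the language name "demo", the BRACED_CODE token, the reduced TSLexer stand-in (a real scanner would include tree_sitter/parser.h instead), and the string-backed StringLexer/demo_scan harness used to run it outside tree-sitter. The scanner matches a balanced { ... } block, which is the kind of thing a purely table-driven lexer cannot express.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the TSLexer from tree_sitter/parser.h, reduced to the
   members this sketch uses (assumption: a real scanner includes that header). */
typedef struct TSLexer TSLexer;
struct TSLexer {
  int32_t lookahead;
  uint16_t result_symbol;
  void (*advance)(TSLexer *, bool skip);
  void (*mark_end)(TSLexer *);
  bool (*eof)(const TSLexer *);
};

enum TokenType { BRACED_CODE }; /* hypothetical external token */

/* This scanner keeps no state, so the lifecycle hooks are trivial. */
void *tree_sitter_demo_external_scanner_create(void) { return NULL; }
void tree_sitter_demo_external_scanner_destroy(void *payload) { (void)payload; }
unsigned tree_sitter_demo_external_scanner_serialize(void *payload, char *buffer) {
  (void)payload; (void)buffer;
  return 0; /* zero bytes of state written */
}
void tree_sitter_demo_external_scanner_deserialize(void *payload, const char *buffer,
                                                   unsigned length) {
  (void)payload; (void)buffer; (void)length;
}

/* Arbitrary C code: consume a balanced "{ ... }" block by counting nesting. */
bool tree_sitter_demo_external_scanner_scan(void *payload, TSLexer *lexer,
                                            const bool *valid_symbols) {
  (void)payload;
  assert(lexer != NULL && valid_symbols != NULL);
  if (!valid_symbols[BRACED_CODE] || lexer->lookahead != '{') return false;
  unsigned depth = 0;
  while (!lexer->eof(lexer)) {
    int32_t c = lexer->lookahead;
    lexer->advance(lexer, false);
    if (c == '{') {
      depth++;
    } else if (c == '}' && --depth == 0) {
      lexer->mark_end(lexer);
      lexer->result_symbol = BRACED_CODE;
      return true;
    }
  }
  return false; /* input ended before the block was closed */
}

/* Hypothetical string-backed lexer, only for trying the scanner standalone. */
typedef struct {
  TSLexer base; /* first member, so a TSLexer* can be cast back */
  const char *text;
  size_t pos, end;
} StringLexer;

static void sl_advance(TSLexer *l, bool skip) {
  StringLexer *s = (StringLexer *)l;
  (void)skip;
  if (s->text[s->pos] != '\0') s->pos++;
  l->lookahead = (unsigned char)s->text[s->pos];
}
static void sl_mark_end(TSLexer *l) {
  StringLexer *s = (StringLexer *)l;
  s->end = s->pos;
}
static bool sl_eof(const TSLexer *l) {
  const StringLexer *s = (const StringLexer *)l;
  return s->text[s->pos] == '\0';
}

/* Returns the length of the matched block, or 0 if nothing matched. */
size_t demo_scan(const char *text) {
  StringLexer s = { .text = text, .pos = 0, .end = 0 };
  s.base.lookahead = (unsigned char)text[0];
  s.base.advance = sl_advance;
  s.base.mark_end = sl_mark_end;
  s.base.eof = sl_eof;
  const bool valid[] = { true };
  return tree_sitter_demo_external_scanner_scan(NULL, &s.base, valid) ? s.end : 0;
}
```

The point for a table-only approach is that the scan function is opaque C code, so any grammar that ships one still needs a compiled component, or an equivalent written some other way.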
Yuan