From: Ihor Radchenko <yantar92@posteo.net>
To: Yuan Fu <casouri@gmail.com>
Cc: emacs-devel <emacs-devel@gnu.org>,
"Danny Freeman" <danny@dfreeman.email>,
"Theodor Thornhill" <theo@thornhill.no>,
"Jostein Kjønigsen" <jostein@secure.kjonigsen.net>,
"Randy Taylor" <dev@rjt.dev>,
"Wilhelm Kirschbaum" <wkirschbaum@gmail.com>,
"Perry Smith" <pedz@easesoftware.com>,
"Dmitry Gutov" <dgutov@yandex.ru>
Subject: Re: Update on tree-sitter structure navigation
Date: Sat, 02 Sep 2023 06:52:41 +0000
Message-ID: <87h6odhxs6.fsf@localhost>
In-Reply-To: <5E7F2A94-4377-45C0-8541-7F59F3B54BA1@gmail.com>
Yuan Fu <casouri@gmail.com> writes:
> In the months after wrapping up tree-sitter stuff in emacs-29, I was
> thinking about how to implement structural navigation and extracting
> information from the parser with tree-sitter. In emacs-29 we have
> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
> was thinking maybe we can generalize this to support getting arbitrary
> “thing” at point, moving around them, and getting information like the
> name of a defun, its arglist, parent of a class, type of a variable
> declaration, etc, in a language-agnostic way.
Note that Org mode also does all of these using
https://orgmode.org/worg/dev/org-element-api.html
It would be nice if we could converge on a more consistent interface
across all modes. For example, by extending `thing-at-point' to handle
parsed elements, not just the simplistic regexp-based "thing" boundaries
that `thing-at-point' exposes now.
Org approaches getting name/begin/end/arguments using a common API:
(org-element-property :begin NODE)
(org-element-property :end NODE)
(org-element-property :contents-begin NODE)
(org-element-property :contents-end NODE)
(org-element-property :name NODE)
(org-element-property :args NODE)
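To make the accessor pattern concrete, here is a minimal sketch.
`org-element-at-point', `org-element-type', and `org-element-property'
are the existing Org API; the `my/' names and the sample buffer text
are made up for illustration.

```elisp
(require 'org)
(require 'org-element)

(defun my/org-describe-element-at-point ()
  "Return a plist describing the Org element at point.
Hypothetical helper: only the org-element-* calls are Org API."
  (let ((node (org-element-at-point)))
    (list :type (org-element-type node)
          :name (org-element-property :name node)
          :begin (org-element-property :begin node)
          :end (org-element-property :end node))))

;; Try it on a throwaway buffer holding one named source block.
(defvar my/org-demo-description
  (with-temp-buffer
    (org-mode)
    (insert "#+name: demo\n#+begin_src emacs-lisp\n(+ 1 2)\n#+end_src\n")
    (goto-char (point-min))
    (search-forward "(+ 1 2)")
    (my/org-describe-element-at-point))
  "Description of the sample source block, for illustration only.")
```

The same accessor works for every element type; only the set of
properties that happen to be non-nil differs between types.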
Language-agnostic "thing"s will certainly be welcome, especially given
that tree-sitter grammars use inconsistent naming schemes, which have to
be learned separately, and may even change with grammar versions.
I think that both NODE types and attributes can be standardized.
> Also, at the time, we only support defining things by a regexp
> matching a node’s type, which is often not enough.
>
> And it would be nice to somehow take advantage of the tree-sitter
> queries for the features I mentioned above. Tree-sitter query is what
> every other editor is using for virtually all tree-sitter related
> features. But in Emacs, we mostly only use it for font-lock.
I recall one user asking about something like Vim's textobjects
implemented via tree-sitter queries. Example:
https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm
> Here’s the progress as of now:
>
> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
> treesit-thing-at-point, treesit--navigate-thing, etc, support a richer
> set of predicates now. Besides regexp matching the type, the predicate
> can also be a predicate function, or (REGEXP . FUNC), or compound
> predicates like (or PRED PRED) or (not PRED).
Slightly unrelated, but do you have any idea if it can be faster to use
Emacs' regexp search combined with treesit-thing-at-point vs. pure
tree-sitter query?
> - There’s now a variable treesit-thing-settings, which holds
> definition for things. Then, instead of passing the predicate to the
> functions I mentioned above, you can save the predicate in
> treesit-thing-settings under a symbol, say ‘sexp', and pass the symbol
> instead, just like thing-at-point.el. (We’ll work on integrating with
> thing-at-point.el later.)
This sounds similar to the textobjects I linked above.
One question: how will it integrate with multiple parsers in one buffer?
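To check that I understand the proposal, here is a rough sketch of
what such a definition might look like. The (LANGUAGE (THING PRED)...)
layout and the node type names are my assumptions based on your
description, not a confirmed format, and every `my/' name is
hypothetical.

```elisp
(defun my/node-at-top-level-p (node)
  "Return non-nil if NODE's parent is the root \"module\" node.
Hypothetical predicate; \"module\" is assumed to be the root node type."
  (equal (treesit-node-type (treesit-node-parent node)) "module"))

(defvar my/python-thing-settings
  `((python
     ;; Plain regexp matched against the node type.
     (defun ,(rx bos (or "function_definition" "class_definition") eos))
     ;; (REGEXP . FUNC): the type must match and FUNC must return non-nil.
     (toplevel-defun ("function_definition" . my/node-at-top-level-p))
     ;; Compound predicates built with `or' and `not'.
     (text (or "string" "comment"))
     (code (not (or "string" "comment")))))
  "Assumed shape of a per-language thing table, for illustration only.")

;; In a buffer with a Python parser, one would presumably do:
;;   (setq-local treesit-thing-settings my/python-thing-settings)
;; and then pass a symbol such as `defun' where a predicate was
;; expected before.
```

If each language gets its own table like this, then the
multiple-parser case would presumably need to pick the table from the
parser that covers point.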
> - I can’t think of a good way to integrate tree-sitter queries with
> the navigation functions we have right now. Most importantly,
> tree-sitter queries always search top-down, and you can’t limit the
> depth they search. OTOH, our navigation functions work by traversing
> the tree node-to-node.
Could you elaborate on the difficulties you encountered?
> Some other things on the TODO list that people can take a jab at:
>
> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
Could we somehow get a hash of the library? That way, we can at least
detect if something has changed.
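A minimal sketch of what I mean, assuming the caller already knows
the path of the grammar's shared library (I am not aware of a public
function that reports it). The `my/' names are hypothetical;
`secure-hash' and `insert-file-contents-literally' are standard.

```elisp
(defun my/file-sha256 (file)
  "Return the SHA-256 hex digest of FILE's raw bytes."
  (with-temp-buffer
    ;; Read the bytes unchanged: no decoding, no format conversion.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (secure-hash 'sha256 (current-buffer))))

(defun my/grammar-changed-p (file known-hash)
  "Return non-nil if FILE no longer matches KNOWN-HASH.
KNOWN-HASH is a hex digest recorded when the queries were last tested."
  (not (string= (my/file-sha256 file) known-hash)))
```

A mode could record the digest of the grammar it was tested against
and print a short warning on mismatch instead of failing later with a
query error. This only detects that the library file changed; it
cannot tell whether the change actually breaks any query.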
> - Major mode fallback/inheritance: this has been discussed many times, but no good solution has emerged.
I think that integration of tree-sitter with navigation functions might
be a step towards solving this problem. If common Emacs commands can
automatically choose between tree-sitter and classic implementations, it
might become easier to unify foo-ts-mode with foo-mode.
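As a rough illustration of such a dispatch, here is a sketch of a
command that uses the tree-sitter implementation when the buffer has
a parser and defun settings, and the classic one otherwise. The
command name is hypothetical, and the guard conditions are my guess
at what "tree-sitter is usable here" should mean.

```elisp
(defun my/beginning-of-defun-dwim ()
  "Move to the beginning of the current defun.
Use tree-sitter when this buffer has a parser and a defun type
regexp; otherwise fall back to the classic `beginning-of-defun'."
  (interactive)
  (if (and (fboundp 'treesit-parser-list)   ; Emacs built with tree-sitter
           (treesit-parser-list)            ; this buffer has a parser
           (bound-and-true-p treesit-defun-type-regexp))
      (progn
        (require 'treesit)
        (treesit-beginning-of-defun))
    (beginning-of-defun)))
```

In practice `beginning-of-defun' already dispatches through
`beginning-of-defun-function', so the point is only to show the shape
of the choice, not to suggest adding a command like this one.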
> - Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
Do you mean that a single parser sees each subsequent block as a
continuation of the previous one?
--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>