Update on tree-sitter structure navigation

* Update on tree-sitter structure navigation
@ 2023-09-02  5:01 Yuan Fu
  2023-09-02  6:52 ` Ihor Radchenko
  2023-09-03  0:56 ` Dmitry Gutov
  0 siblings, 2 replies; 30+ messages in thread
From: Yuan Fu @ 2023-09-02  5:01 UTC
  To: emacs-devel
  Cc: Danny Freeman, Theodor Thornhill, Jostein Kjønigsen,
	Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

Hey guys,

In the months after wrapping up tree-sitter stuff in emacs-29, I was thinking about how to implement structural navigation and how to extract information from the parser with tree-sitter. In emacs-29 we have things like treesit-beginning/end-of-defun and treesit-defun-name. I was thinking maybe we can generalize this to support getting an arbitrary “thing” at point, moving around them, and getting information like the name of a defun, its arglist, the parent of a class, the type of a variable declaration, etc., in a language-agnostic way.

Also, at the time, we only supported defining things by a regexp matching a node’s type, which is often not enough.

And it would be nice to somehow take advantage of tree-sitter queries for the features I mentioned above. Tree-sitter queries are what every other editor uses for virtually all tree-sitter-related features, but in Emacs we mostly use them only for font-lock.

Here’s the progress as of now:

- Functions like treesit-search-forward, treesit-induce-sparse-tree, treesit-thing-at-point, treesit--navigate-thing, etc., now support a richer set of predicates. Besides a regexp matching the node type, a predicate can also be a predicate function, a cons (REGEXP . FUNC), or a compound predicate like (or PRED PRED) or (not PRED).

- There’s now a variable treesit-thing-settings, which holds definitions for things. Then, instead of passing the predicate to the functions I mentioned above, you can save the predicate in treesit-thing-settings under a symbol, say ‘sexp’, and pass the symbol instead, just like thing-at-point.el. (We’ll work on integrating with thing-at-point.el later.)

- I can’t think of a good way to integrate tree-sitter queries with the navigation functions we have right now. Most importantly, tree-sitter queries always search top-down, and you can’t limit the search depth. OTOH, our navigation functions work by traversing the tree node-to-node.

- There’s no progress on getting information like names and types in a language-agnostic way. I haven’t come up with a good interface and/or implementation, and I encourage interested folks to give it some thought. Bonus points for reusing the query files the neovim folks have accumulated :-)
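A toy model may make the predicate forms above easier to picture. This is a minimal sketch, not treesit.el's code: it works on mock nodes (plists carrying a :type string) instead of real tree-sitter nodes, and `my-thing-match-p' and the THINGS alist are hypothetical stand-ins for the real predicate matching and for one language's entry in treesit-thing-settings.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'cl-lib)

;; Toy model (hypothetical): a "node" is a plist like (:type "comment"),
;; and THINGS is an alist of (THING PRED) entries, loosely mirroring one
;; language's entry in `treesit-thing-settings'.
(defun my-thing-match-p (node pred things)
  "Return non-nil if mock NODE satisfies PRED.
Symbols naming a thing are resolved through the alist THINGS."
  (cond
   ;; REGEXP: unanchored match against the node type.
   ((stringp pred)
    (string-match-p pred (plist-get node :type)))
   ;; THING: a symbol defined in THINGS, resolved recursively.
   ((and (symbolp pred) (assq pred things))
    (my-thing-match-p node (cadr (assq pred things)) things))
   ;; FUNC: an arbitrary predicate called with the node.
   ((functionp pred)
    (funcall pred node))
   ;; (or PRED...): at least one sub-predicate matches.
   ((eq (car-safe pred) 'or)
    (cl-some (lambda (p) (my-thing-match-p node p things)) (cdr pred)))
   ;; (not PRED): the sub-predicate does not match.
   ((eq (car-safe pred) 'not)
    (not (my-thing-match-p node (cadr pred) things)))
   ;; (REGEXP . FUNC): the type regexp and the function must both match.
   ((and (consp pred) (stringp (car pred)))
    (and (string-match-p (car pred) (plist-get node :type))
         (funcall (cdr pred) node)))
   (t (error "Invalid predicate: %S" pred))))
```

For example, with THINGS bound to '((defun "\\`function_definition\\'") (sexp (not "\\`comment\\'"))), a mock comment node fails the `sexp' predicate, while a function_definition node satisfies both `defun' and `sexp'.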

Some other things on the TODO list that people can take a stab at:

- Query-based indentation (neovim’s implementation can be a source of inspiration)
- Improve c-ts-mode (indentation styles, other cc-mode features, etc.) and other tree-sitter modes
- Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
- Major mode fallback/inheritance: this has been discussed many times, but no good solution has emerged.
- Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.

Finally, feel free to email me, or write to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or if there are nice things in neovim and other editors that Emacs ought to have, too.

Yuan

* Re: Update on tree-sitter structure navigation
  2023-09-02  5:01 Update on tree-sitter structure navigation Yuan Fu
@ 2023-09-02  6:52 ` Ihor Radchenko
  2023-09-02  8:50   ` Hugo Thunnissen
  2023-09-02 22:09   ` Yuan Fu
  2023-09-03  0:56 ` Dmitry Gutov
  1 sibling, 2 replies; 30+ messages in thread
From: Ihor Radchenko @ 2023-09-02  6:52 UTC
  To: Yuan Fu
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

> In the months after wrapping up tree-sitter stuff in emacs-29, I was
> thinking about how to implement structural navigation and how to
> extract information from the parser with tree-sitter. In emacs-29 we
> have things like treesit-beginning/end-of-defun and
> treesit-defun-name. I was thinking maybe we can generalize this to
> support getting an arbitrary “thing” at point, moving around them,
> and getting information like the name of a defun, its arglist, the
> parent of a class, the type of a variable declaration, etc., in a
> language-agnostic way.

Note that Org mode also does all of these using
https://orgmode.org/worg/dev/org-element-api.html

It would be nice if we could converge on a more consistent interface
across all the modes. For example, by extending `thing-at-point' to handle
parsed elements, not just simplistic regexp-based "thing" boundaries
exposed by `thing-at-point' now.

Org approaches getting name/begin/end/arguments using a common API:

(org-element-property :begin NODE)
(org-element-property :end NODE)
(org-element-property :contents-begin NODE)
(org-element-property :contents-end NODE)
(org-element-property :name NODE)
(org-element-property :args NODE)

Language-agnostic "thing"s will certainly be welcome, especially given
that tree-sitter grammars use inconsistent naming schemes, which have to
be learned separately, and may even change with grammar versions.

I think that both NODE types and attributes can be standardized.

> Also, at the time, we only supported defining things by a regexp
> matching a node’s type, which is often not enough.
>
> And it would be nice to somehow take advantage of tree-sitter queries
> for the features I mentioned above. Tree-sitter queries are what every
> other editor uses for virtually all tree-sitter-related features, but
> in Emacs we mostly use them only for font-lock.

I recall one user asking about something like Vim's text objects via
tree-sitter queries. Example:
https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm

> Here’s the progress as of now:
>
> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
> treesit-thing-at-point, treesit--navigate-thing, etc., now support a
> richer set of predicates. Besides a regexp matching the node type, a
> predicate can also be a predicate function, a cons (REGEXP . FUNC),
> or a compound predicate like (or PRED PRED) or (not PRED).

Slightly unrelated, but do you have any idea whether it would be faster to
use Emacs' regexp search combined with treesit-thing-at-point vs. a pure
tree-sitter query?

> - There’s now a variable treesit-thing-settings, which holds
> definitions for things. Then, instead of passing the predicate to the
> functions I mentioned above, you can save the predicate in
> treesit-thing-settings under a symbol, say ‘sexp’, and pass the symbol
> instead, just like thing-at-point.el. (We’ll work on integrating with
> thing-at-point.el later.)

This sounds similar to textobjects I linked above.
One question: how will it integrate with multiple parsers in one buffer?

> - I can’t think of a good way to integrate tree-sitter queries with
> the navigation functions we have right now. Most importantly,
> tree-sitter queries always search top-down, and you can’t limit the
> search depth. OTOH, our navigation functions work by traversing
> the tree node-to-node.

Could you elaborate on the difficulties you encountered?

> Some other things on the TODO list that people can take a stab at:
>
> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.

Could we somehow get a hash of the library? That way, we could at least
detect whether something has changed.
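As a sketch of this idea, hashing the raw bytes of a compiled grammar is simple, assuming we already know where the library lives on disk (LIBRARY-FILE is an assumption supplied by the caller, and `my-grammar-fingerprint' is a hypothetical name). A changed digest only says that the binary changed; it cannot tell whether any query actually broke.

```elisp
(defun my-grammar-fingerprint (library-file)
  "Return the SHA-256 hex digest of the raw bytes of LIBRARY-FILE.
Hypothetical helper: LIBRARY-FILE stands for a compiled grammar such
as libtree-sitter-c.so, whose location the caller must supply."
  (with-temp-buffer
    ;; Unibyte buffer plus literal insertion: hash the bytes on disk,
    ;; with no decoding or EOL conversion.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally library-file)
    (secure-hash 'sha256 (current-buffer))))
```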

> - Major mode fallback/inheritance: this has been discussed many times, but no good solution has emerged.

I think that integration of tree-sitter with navigation functions might
be a step towards solving this problem. If common Emacs commands can
automatically choose between tree-sitter and classic implementations, it
might become easier to unify foo-ts-mode with foo-mode.

> - Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.

Do you mean that a single parser sees each subsequent block as a
continuation of the previous one?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Update on tree-sitter structure navigation
  2023-09-02  6:52 ` Ihor Radchenko
@ 2023-09-02  8:50   ` Hugo Thunnissen
  2023-09-02 22:12     ` Yuan Fu
  2023-09-02 22:09   ` Yuan Fu
  1 sibling, 1 reply; 30+ messages in thread
From: Hugo Thunnissen @ 2023-09-02  8:50 UTC
  To: Ihor Radchenko
  Cc: Yuan Fu, emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov

Ihor Radchenko <yantar92@posteo.net> writes:

> Yuan Fu <casouri@gmail.com> writes:
>
>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>> thinking about how to implement structural navigation and how to
>> extract information from the parser with tree-sitter. In emacs-29 we
>> have things like treesit-beginning/end-of-defun and
>> treesit-defun-name. I was thinking maybe we can generalize this to
>> support getting an arbitrary “thing” at point, moving around them,
>> and getting information like the name of a defun, its arglist, the
>> parent of a class, the type of a variable declaration, etc., in a
>> language-agnostic way.
>
> Note that Org mode also does all of these using
> https://orgmode.org/worg/dev/org-element-api.html
>
> It would be nice if we could converge on a more consistent interface
> across all the modes. For example, by extending `thing-at-point' to handle
> parsed elements, not just simplistic regexp-based "thing" boundaries
> exposed by `thing-at-point' now.
>
> Org approaches getting name/begin/end/arguments using a common API:
>
> (org-element-property :begin NODE)
> (org-element-property :end NODE)
> (org-element-property :contents-begin NODE)
> (org-element-property :contents-end NODE)
> (org-element-property :name NODE)
> (org-element-property :args NODE)
>
> Language-agnostic "thing"s will certainly be welcome, especially given
> that tree-sitter grammars use inconsistent naming schemes, which have to
> be learned separately, and may even change with grammar versions.
>
> I think that both NODE types and attributes can be standardized.
>

It would be great to see standardization that can work with more than
just tree-sitter.  Depending on how extensive such a generic NODE type
and accompanying API are, I could see standardization of a lot of things
that are currently implemented in major modes, to name a few:

- indentation
- fontification
- thing-at-point
- imenu
- simple forms of completion (variables, function names in buffer)

I have some idea of the underpinnings, but I have never implemented a
full major mode so it is hard for me to judge the practicality of
this. How much would be practical to standardize, without needlessly
complicated/resource-heavy abstractions?



* Re: Update on tree-sitter structure navigation
  2023-09-02  8:50   ` Hugo Thunnissen
@ 2023-09-02 22:12     ` Yuan Fu
  2023-09-06 11:37       ` Ihor Radchenko
  0 siblings, 1 reply; 30+ messages in thread
From: Yuan Fu @ 2023-09-02 22:12 UTC
  To: Hugo Thunnissen
  Cc: Ihor Radchenko, emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov



> On Sep 2, 2023, at 1:50 AM, Hugo Thunnissen <devel@hugot.nl> wrote:
> 
> Ihor Radchenko <yantar92@posteo.net> writes:
> 
>> Yuan Fu <casouri@gmail.com> writes:
>> 
>>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>>> thinking about how to implement structural navigation and how to
>>> extract information from the parser with tree-sitter. In emacs-29 we
>>> have things like treesit-beginning/end-of-defun and
>>> treesit-defun-name. I was thinking maybe we can generalize this to
>>> support getting an arbitrary “thing” at point, moving around them,
>>> and getting information like the name of a defun, its arglist, the
>>> parent of a class, the type of a variable declaration, etc., in a
>>> language-agnostic way.
>> 
>> Note that Org mode also does all of these using
>> https://orgmode.org/worg/dev/org-element-api.html
>> 
>> It would be nice if we could converge on a more consistent interface
>> across all the modes. For example, by extending `thing-at-point' to handle
>> parsed elements, not just simplistic regexp-based "thing" boundaries
>> exposed by `thing-at-point' now.
>> 
>> Org approaches getting name/begin/end/arguments using a common API:
>> 
>> (org-element-property :begin NODE)
>> (org-element-property :end NODE)
>> (org-element-property :contents-begin NODE)
>> (org-element-property :contents-end NODE)
>> (org-element-property :name NODE)
>> (org-element-property :args NODE)
>> 
>> Language-agnostic "thing"s will certainly be welcome, especially given
>> that tree-sitter grammars use inconsistent naming schemes, which have to
>> be learned separately, and may even change with grammar versions.
>> 
>> I think that both NODE types and attributes can be standardized.
>> 
> 
> It would be great to see standardization that can work with more than
> just tree-sitter.  Depending on how extensive such a generic NODE type
> and accompanying API are, I could see standardization of a lot of things
> that are currently implemented in major modes, to name a few:
> 
> - indentation
> - fontification
> - thing-at-point
> - imenu
> - simple forms of completion (variables, function names in buffer)
> 
> I have some idea of the underpinnings, but I have never implemented a
> full major mode so it is hard for me to judge the practicality of
> this. How much would be practical to standardize, without needlessly
> complicated/resource-heavy abstractions?

I don’t know which level of standardization you are thinking about, but aren’t they already standardized?

- indentation: indent-line/region-function
- fontification: font-lock-defaults
- thing-at-point: thing-at-point function
- imenu: imenu-create-index-function
- completion: completion-at-point-functions
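A minimal sketch of what plugging into those standard variables looks like; `my-lang-mode' and its helpers are hypothetical stubs. Whatever computes the results (regexps, tree-sitter, or an external parser), callers such as `indent-for-tab-command', imenu, and `completion-at-point' only go through these variables.

```elisp
;; Hypothetical stubs; a real mode would compute something useful here.
(defun my-lang-indent-line ()
  "Indent the current line (stub)."
  'noindent)

(defun my-lang-imenu-index ()
  "Return an imenu index alist for the buffer (stub)."
  nil)

(defun my-lang-completion-at-point ()
  "Return completion data for the symbol at point, or nil (stub)."
  nil)

(define-derived-mode my-lang-mode prog-mode "MyLang"
  "Hypothetical major mode wired into Emacs' standard extension points."
  (setq-local indent-line-function #'my-lang-indent-line)
  (setq-local imenu-create-index-function #'my-lang-imenu-index)
  ;; The trailing t adds the function to the buffer-local hook value only.
  (add-hook 'completion-at-point-functions #'my-lang-completion-at-point nil t)
  ;; Empty keyword list: no regexp-based highlighting (placeholder).
  (setq-local font-lock-defaults '(nil)))
```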

Yuan


* Re: Update on tree-sitter structure navigation
  2023-09-02 22:12     ` Yuan Fu
@ 2023-09-06 11:37       ` Ihor Radchenko
  2023-09-08  0:59         ` Yuan Fu
  0 siblings, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2023-09-06 11:37 UTC
  To: Yuan Fu
  Cc: Hugo Thunnissen, emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

> I don’t know which level of standardization you are thinking about, but aren’t they already standardized?
> ...
> - fontification: font-lock-defaults

AFAIU, tree-sitter-specific font-lock is configured separately from the
rest of the font-lock-keywords.

> - thing-at-point: thing-at-point function

Adding new "things" is not well-documented though.
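For reference, the commentary in thingatpt.el describes symbol properties a new thing can define (`forward-op', `bounds-of-thing-at-point', and others). A minimal sketch using the `bounds-of-thing-at-point' property, with `my-hex' and `my-hex-bounds' as hypothetical names:

```elisp
(require 'thingatpt)

(defun my-hex-bounds ()
  "Return (BEG . END) of a 0x-prefixed hex literal around point, or nil."
  ;; `thing-at-point-looking-at' sets the match data on success; the
  ;; DISTANCE argument limits the search to 40 characters around point.
  (when (thing-at-point-looking-at "0x[[:xdigit:]]+" 40)
    (cons (match-beginning 0) (match-end 0))))

;; Register the new thing: `bounds-of-thing-at-point' (and therefore
;; `thing-at-point') consults this symbol property.
(put 'my-hex 'bounds-of-thing-at-point #'my-hex-bounds)
```

With point inside "0x1F" in the text "foo 0x1F bar", (thing-at-point 'my-hex t) then returns "0x1F".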

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Update on tree-sitter structure navigation
  2023-09-06 11:37       ` Ihor Radchenko
@ 2023-09-08  0:59         ` Yuan Fu
  0 siblings, 0 replies; 30+ messages in thread
From: Yuan Fu @ 2023-09-08  0:59 UTC
  To: Ihor Radchenko
  Cc: Hugo Thunnissen, emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov



> On Sep 6, 2023, at 4:37 AM, Ihor Radchenko <yantar92@posteo.net> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> I don’t know which level of standardization you are thinking about, but aren’t they already standardized?
>> ...
>> - fontification: font-lock-defaults
> 
> AFAIU, tree-sitter-specific font-lock is configured separately from the
> rest of the font-lock-keywords.

The standard interface I’m referring to is the one tree-sitter uses, rather than the one it provides, i.e., font-lock-fontify-region-function, etc.

> 
>> - thing-at-point: thing-at-point function
> 
> Adding new "things" is not well-documented though.

That’s true. I didn’t investigate myself, either.

Yuan


* Re: Update on tree-sitter structure navigation
  2023-09-02  6:52 ` Ihor Radchenko
  2023-09-02  8:50   ` Hugo Thunnissen
@ 2023-09-02 22:09   ` Yuan Fu
  2023-09-06 11:57     ` Ihor Radchenko
  1 sibling, 1 reply; 30+ messages in thread
From: Yuan Fu @ 2023-09-02 22:09 UTC
  To: Ihor Radchenko
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov



> On Sep 1, 2023, at 11:52 PM, Ihor Radchenko <yantar92@posteo.net> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>> thinking about how to implement structural navigation and how to
>> extract information from the parser with tree-sitter. In emacs-29 we
>> have things like treesit-beginning/end-of-defun and
>> treesit-defun-name. I was thinking maybe we can generalize this to
>> support getting an arbitrary “thing” at point, moving around them,
>> and getting information like the name of a defun, its arglist, the
>> parent of a class, the type of a variable declaration, etc., in a
>> language-agnostic way.
> 
> Note that Org mode also does all of these using
> https://orgmode.org/worg/dev/org-element-api.html
> 
> It would be nice if we could converge on a more consistent interface
> across all the modes. For example, by extending `thing-at-point' to handle
> parsed elements, not just simplistic regexp-based "thing" boundaries
> exposed by `thing-at-point' now.
> 
> Org approaches getting name/begin/end/arguments using a common API:
> 
> (org-element-property :begin NODE)
> (org-element-property :end NODE)
> (org-element-property :contents-begin NODE)
> (org-element-property :contents-end NODE)
> (org-element-property :name NODE)
> (org-element-property :args NODE)
> 
> Language-agnostic "thing"s will certainly be welcome, especially given
> that tree-sitter grammars use inconsistent naming schemes, which have to
> be learned separately, and may even change with grammar versions.
> 
> I think that both NODE types and attributes can be standardized.

If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. Just need SomeOne to come up with it :-) But I don’t see how this interface can support semantic information like arglist of a defun, or type of a declaration—these things are not universal to all “nodes”.

> 
>> Also, at the time, we only supported defining things by a regexp
>> matching a node’s type, which is often not enough.
>> 
>> And it would be nice to somehow take advantage of tree-sitter queries
>> for the features I mentioned above. Tree-sitter queries are what every
>> other editor uses for virtually all tree-sitter-related features, but
>> in Emacs we mostly use them only for font-lock.
> 
> I recall one user asking about something like Vim's text objects via
> tree-sitter queries. Example:
> https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm

I think that’s something that can be implemented with thing definitions.


>> Here’s the progress as of now:
>> 
>> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
>> treesit-thing-at-point, treesit--navigate-thing, etc., now support a
>> richer set of predicates. Besides a regexp matching the node type, a
>> predicate can also be a predicate function, a cons (REGEXP . FUNC),
>> or a compound predicate like (or PRED PRED) or (not PRED).
> 
> Slightly unrelated, but do you have any idea whether it would be faster to
> use Emacs' regexp search combined with treesit-thing-at-point vs. a pure
> tree-sitter query?

Not really.

> 
>> - There’s now a variable treesit-thing-settings, which holds
>> definitions for things. Then, instead of passing the predicate to the
>> functions I mentioned above, you can save the predicate in
>> treesit-thing-settings under a symbol, say ‘sexp’, and pass the symbol
>> instead, just like thing-at-point.el. (We’ll work on integrating with
>> thing-at-point.el later.)
> 
> This sounds similar to textobjects I linked above.
> One question: how will it integrate with multiple parsers in one buffer?

This only concerns checking whether a node satisfies the definition of a “thing”, and doesn’t care how you get the node. Retrieving nodes through either treesit-node-at or other functions already works with multiple parsers.

Also the “thing” definition is language-specific.

> 
>> - I can’t think of a good way to integrate tree-sitter queries with
>> the navigation functions we have right now. Most importantly,
>> tree-sitter queries always search top-down, and you can’t limit the
>> search depth. OTOH, our navigation functions work by traversing
>> the tree node-to-node.
> 
> Could you elaborate on the difficulties you encountered?

Ideally I’d like to pass a query and a node to treesit-node-match-p, which would return t if the query matches the node. But queries don’t work like that: they search the node and return all the matches within it, which could be wasteful.
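The waste shows up in a minimal sketch of the naive approach. The function name is hypothetical; `treesit-query-capture' (called with its NODE-ONLY argument) and `treesit-node-eq' are Emacs 29 functions and need an Emacs built with tree-sitter support. It answers "does QUERY match NODE?" by capturing over NODE's whole subtree and then checking whether NODE itself is among the results, so every other capture in the subtree is computed only to be discarded.

```elisp
(require 'cl-lib)

(defun my-treesit-query-matches-node-p (node query)
  "Return non-nil if QUERY captures NODE itself.
Naive sketch: `treesit-query-capture' walks NODE's entire subtree and
returns every capture in it, even though only NODE matters here."
  (cl-some (lambda (captured) (treesit-node-eq captured node))
           ;; BEG and END nil: NODE's whole range; NODE-ONLY t: plain nodes.
           (treesit-query-capture node query nil nil t)))
```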

> 
>> Some other things on the TODO list that people can take a stab at:
>> 
>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
> 
> Could we somehow get a hash of the library? That way, we could at least
> detect whether something has changed.

All we get is a binary dynamic library. So I don’t think so.

> 
>> - Major mode fallback/inheritance: this has been discussed many times, but no good solution has emerged.
> 
> I think that integration of tree-sitter with navigation functions might
> be a step towards solving this problem. If common Emacs commands can
> automatically choose between tree-sitter and classic implementations, it
> might become easier to unify foo-ts-mode with foo-mode.

Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.

> 
>> - Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
> 
> Do you mean that a single parser sees each subsequent block as a
> continuation of the previous one?

Exactly.

Yuan


* Re: Update on tree-sitter structure navigation
  2023-09-02 22:09   ` Yuan Fu
@ 2023-09-06 11:57     ` Ihor Radchenko
  2023-09-06 12:58       ` Eli Zaretskii
  2023-09-08  1:06       ` Yuan Fu
  0 siblings, 2 replies; 30+ messages in thread
From: Ihor Radchenko @ 2023-09-06 11:57 UTC
  To: Yuan Fu
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

>> I think that both NODE types and attributes can be standardized.
>
> If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. Just need SomeOne to come up with it :-) But I don’t see how this interface can support semantic information like arglist of a defun, or type of a declaration—these things are not universal to all “nodes”.

For example, consider something like

(thing-slot 'arglist (thing-at-point 'defun)) ; => (ARGLIST_BEG . ARGLIST_END)
(thing-slot 'arglist (thing-at-point 'variable)) ; => nil
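One way to picture this, purely as a hypothetical sketch: `thing-slot' (named after the example above), `my-thing', and the slot names are not existing APIs, and today `thing-at-point' returns a string rather than a structured object. The assumption is that a backend (tree-sitter, Org's parser, or regexps) fills in whichever slots it knows, and absent slots simply read as nil.

```elisp
(require 'cl-lib)

(cl-defstruct (my-thing (:constructor my-thing-create))
  "Hypothetical structured result of a thing lookup."
  type     ; e.g. `defun' or `variable'
  beg end  ; bounds of the whole thing
  slots)   ; alist of (SLOT-NAME . (BEG . END)) filled in by a backend

(defun thing-slot (slot thing)
  "Return the (BEG . END) bounds of SLOT in THING, or nil if it has none."
  (alist-get slot (my-thing-slots thing)))
```

A defun object carrying an `arglist' entry then yields its bounds, while a variable object without one yields nil, mirroring the two calls above.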

>>> - I can’t think of a good way to integrate tree-sitter queries with
>>> the navigation functions we have right now. Most importantly,
>>> tree-sitter queries always search top-down, and you can’t limit the
>>> search depth. OTOH, our navigation functions work by traversing
>>> the tree node-to-node.
>> 
>> Could you elaborate on the difficulties you encountered?
>
> Ideally I’d like to pass a query and a node to treesit-node-match-p, which would return t if the query matches the node. But queries don’t work like that: they search the node and return all the matches within it, which could be wasteful.

Doesn't ts_query_cursor_next_match return only one match at a time?

>>> - Major mode fallback/inheritance: this has been discussed many times, but no good solution has emerged.
>> 
>> I think that integration of tree-sitter with navigation functions might
>> be a step towards solving this problem. If common Emacs commands can
>> automatically choose between tree-sitter and classic implementations, it
>> might become easier to unify foo-ts-mode with foo-mode.
>
> Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.

Any chance you have links to these discussions?

>>> - Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
>> 
>> Do you mean that a single parser sees each subsequent block as a
>> continuation of the previous one?
>
> Exactly.

Then I can see cases where we do, and also cases where we do _not_, want
separate parsers for different blocks. For example, literate programming
often uses source blocks in other languages that are meant to be continuous.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>



* Re: Update on tree-sitter structure navigation
  2023-09-06 11:57     ` Ihor Radchenko
@ 2023-09-06 12:58       ` Eli Zaretskii
  2023-09-08 12:03         ` Ihor Radchenko
  2023-09-08  1:06       ` Yuan Fu
  1 sibling, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2023-09-06 12:58 UTC
  To: Ihor Radchenko
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz, dgutov

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: emacs-devel <emacs-devel@gnu.org>, Danny Freeman <danny@dfreeman.email>,
>  Theodor Thornhill <theo@thornhill.no>, Jostein Kjønigsen
>  <jostein@secure.kjonigsen.net>, Randy Taylor <dev@rjt.dev>, Wilhelm
>  Kirschbaum <wkirschbaum@gmail.com>, Perry Smith <pedz@easesoftware.com>,
>  Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 06 Sep 2023 11:57:26 +0000
> 
> > Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.
> 
> Any chance you have links to these discussions?

Here's one:

  https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01251.html
  https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01293.html



* Re: Update on tree-sitter structure navigation
  2023-09-06 12:58       ` Eli Zaretskii
@ 2023-09-08 12:03         ` Ihor Radchenko
  2023-09-08 13:08           ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2023-09-08 12:03 UTC
  To: Eli Zaretskii
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> > Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.
>> 
>> Any chance you have links to these discussions?
>
> Here's one:
>
>   https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01251.html
>   https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01293.html

Thanks!

According to the discussion, the main problem is that interleaving
ts-related and ts-unrelated code in the same mode is risky. It is safer
to have a dedicated foo-ts-mode rather than modifying the existing
foo-mode.

However, separate *-ts- and *- modes create a problem: user config
tailored for the old, non-ts mode no longer works. For example,
c-ts-mode has `c-ts-mode-indent-offset', while cc-mode has
c-basic-offset in `c-style-alist'.

Ideally, user-facing API should be shared between the modes: defcustoms,
faces, and certain high-level functions like `c-set-style'.

One might slowly:
1. Add support for foo-mode's defcustoms to foo-ts-mode, when applicable
2. Create a shared API between foo-mode and foo-ts-mode that will call
   the appropriate implementation depending on which mode is active.
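Point 2 might look roughly like the sketch below. It only illustrates the dispatch idea: `my-c-set-indent-offset' is a hypothetical command that stores N in the variable each mode reads for its basic indentation step (`c-ts-mode-indent-offset' in c-ts-mode, `c-basic-offset' in CC Mode) and does not reindent anything or touch the finer-grained style settings, which differ between the two modes.

```elisp
(defun my-c-set-indent-offset (n)
  "Hypothetical shared command: set the basic indentation step to N.
Dispatches on the current major mode; existing code is not reindented."
  (interactive "nIndent offset: ")
  (cond
   ;; Check the tree-sitter mode first, in case newer Emacs versions
   ;; also report c-ts-mode as derived from c-mode.
   ((derived-mode-p 'c-ts-mode)
    (setq-local c-ts-mode-indent-offset n))
   ((derived-mode-p 'c-mode)
    (setq-local c-basic-offset n))
   (t (user-error "Not a C buffer"))))
```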

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

* Re: Update on tree-sitter structure navigation
  2023-09-08 12:03         ` Ihor Radchenko
@ 2023-09-08 13:08           ` Eli Zaretskii
  0 siblings, 0 replies; 30+ messages in thread
From: Eli Zaretskii @ 2023-09-08 13:08 UTC
  To: Ihor Radchenko
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz, dgutov

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>  theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>  wkirschbaum@gmail.com, pedz@easesoftware.com, dgutov@yandex.ru
> Date: Fri, 08 Sep 2023 12:03:58 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >   https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01251.html
> >   https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01293.html
> 
> Thanks!
> 
> According to the discussion, the main problem is that interleaving
> ts-related and ts-unrelated code in the same mode is risky. It is safer
> to have a dedicated foo-ts-mode rather than modifying the existing
> foo-mode.

No, that's the wrong conclusion.  The main reason is that mixing these
modes makes no sense in most cases, due to completely different
infrastructures they use.  The main aspects of a major mode --
font-lock, indentation, and defun- and expression-level navigation --
are based on such different grounds that you cannot possibly reuse
them.  And once those are implemented on a different basis, what is
left to share?

> However, separate *-ts- and *- modes create a problem when user config
> tailored for old, non-ts mode will no longer work.

There's no argument that this is a disadvantage that causes problems
to users.  The challenge is to find a good solution.  The basic
requirements from such a solution are:

  . as much as possible, provide the same or equivalent features
  . allow easy migration of customizations from an old mode to a TS mode
  . allow switching easily between the two kinds of modes for the same
    PL, in both directions (for example, to let users try the TS mode
    and switch back if they don't like it)
  . avoid complicating the maintenance too much

> For example, c-ts-mode has `c-ts-mode-indent-offset', while cc-mode
> has c-basic-offset in `c-style-alist'.

Yes, but CC Mode's indentation customizations cannot be ported to
c-ts-mode because they are based on a completely different
classification of syntactic elements, so what do you propose as the
solution for this particular schism?

As for c-style-alist, the elements of the style are also completely
different.  So for now, we provide a different variable for c-ts-mode
which supports the subset of built-in styles supported by CC Mode.  If
you have a concrete proposal for a better solution, let's hear it.

> Ideally, user-facing API should be shared between the modes: defcustoms,
> faces, and certain high-level functions like `c-set-style'.

Again, there's no argument about the ideal, and never was.  We just
couldn't find a way of implementing this ideal without bumping into
serious problems.  May I suggest studying the code of at least a few
pairs of modes to see what I'm talking about?

> One might slowly:
> 1. Add support for foo-mode's defcustoms to foo-ts-mode, when applicable
> 2. Create a shared API between foo-mode and foo-ts-mode that will call
>    the appropriate implementation depending on which mode is active.

This sounds great in the abstract, but in practice bumps into serious
implementation problems.  The names of the variables are the least of
our problems; the fact that we provide different names in the TS modes
is to make sure no one expects the non-TS customizations to work with
TS modes, because that's currently impossible: the internal structure
of the data of the variables, as well as the way the related internal
functions work, is too different.  As an exercise, try to create an
API for font-lock that could be shared by a TS and a non-TS mode.  If
you succeed, and if the result is significantly different from what we
already have, please present the solution, because maybe we have
missed something.


* Re: Update on tree-sitter structure navigation
  2023-09-06 11:57     ` Ihor Radchenko
  2023-09-06 12:58       ` Eli Zaretskii
@ 2023-09-08  1:06       ` Yuan Fu
  2023-09-08  9:09         ` Ihor Radchenko
  1 sibling, 1 reply; 30+ messages in thread
From: Yuan Fu @ 2023-09-08  1:06 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov



> On Sep 6, 2023, at 4:57 AM, Ihor Radchenko <yantar92@posteo.net> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> I think that both NODE types and attributes can be standardized.
>> 
>> If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. Just need SomeOne to come up with it :-) But I don’t see how this interface can support semantic information like arglist of a defun, or type of a declaration—these things are not universal to all “nodes”.
> 
> For example, consider something like
> 
> (thing-slot 'arglist (thing-at-point 'defun)) ; => (ARGLIST_BEG . ARGLIST_END)
> (thing-slot 'arglist (thing-at-point 'variable)) ; => nil
> 

Yeah, that makes sense.
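To make the `thing-slot' idea concrete, here is a minimal C sketch of the data it implies. Everything in it (`thing`, `slot`, `thing_slot`) is hypothetical and not an existing Emacs or tree-sitter API; it only assumes that each kind of thing carries the slots that make sense for it, so asking a variable for its arglist yields NULL, the analogue of nil.

```c
#include <stddef.h>
#include <string.h>

/* A named sub-range of a thing, e.g. the arglist of a defun. */
typedef struct {
    const char *name;   /* slot name, e.g. "arglist" */
    long beg, end;      /* buffer positions, like (BEG . END) */
} slot;

/* A thing at point: its kind, its own range, and the slots it supports. */
typedef struct {
    const char *kind;   /* e.g. "defun" or "variable" */
    long beg, end;      /* the (BEG . END) thing-at-point returns today */
    const slot *slots;  /* slots meaningful for this kind of thing */
    size_t n_slots;
} thing;

/* Return the slot called NAME, or NULL (nil) if THING has no such slot. */
const slot *thing_slot(const thing *t, const char *name)
{
    for (size_t i = 0; i < t->n_slots; i++)
        if (strcmp(t->slots[i].name, name) == 0)
            return &t->slots[i];
    return NULL;
}
```

A defun would carry slots such as its name and arglist, while a variable declaration might carry a type slot instead. Returning NULL rather than signaling an error lets language-agnostic callers probe for a slot without knowing which language produced the thing.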

>>>> - I can’t think of a good way to integrate tree-sitter queries with
>>>> the navigation functions we have right now. Most importantly,
>>>> tree-sitter queries always search top-down, and you can’t limit the
>>>> depth it searches. OTOH, our navigation functions work by traversing
>>>> the tree node-to-node.
>>> 
>>> May you elaborate about the difficulties you encountered?
>> 
>> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and return all the matches within that node, which could potentially be wasteful.
> 
> Isn't ts_query_cursor_next_match only searching a single match?

Seems so, that’s good. But there’s no guarantee that the first match will be the top node, even though implementation-wise, I think that’s probably the case. Maybe we can ask the tree-sitter developers to add such a guarantee.

>>>> - Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
>>> 
>>> Do you mean that a single parser sees subsequent block as a continuation
>>> of the previous?
>> 
>> Exactly.
> 
> Then, I can see cases when we do and also when we do _not_ want separate
> parsers for different blocks. For example, literate programming often
> uses other language blocks that are intended to be continuous.

Surprise, I added support for local parsers. Major mode authors can choose between global and local parsers.
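The difference between one global parser and per-block local parsers can be illustrated with a toy C sketch in which a running brace count stands in for a real parser. `net_braces`, `global_balanced`, and `local_unbalanced` are hypothetical helpers, not part of treesit.el. A block that opens a brace and a later block that closes it look fine to a single parser that reads the blocks as one continuous text, but each looks broken to a parser of its own.

```c
#include <stddef.h>

/* Stand-in for parser state: net number of unclosed braces in S. */
static int net_braces(const char *s)
{
    int depth = 0;
    for (; *s != '\0'; s++) {
        if (*s == '{')
            depth++;
        else if (*s == '}')
            depth--;
    }
    return depth;
}

/* One global parser: all N blocks are read as one continuous text, so a
   brace opened in one block may be closed in a later one.  Returns 1 if
   the concatenation has no net unclosed braces. */
int global_balanced(const char *const *blocks, size_t n)
{
    int depth = 0;
    for (size_t i = 0; i < n; i++)
        depth += net_braces(blocks[i]);
    return depth == 0;
}

/* One local parser per block: each block must stand on its own.
   Returns how many blocks leave braces unclosed (or over-closed). */
size_t local_unbalanced(const char *const *blocks, size_t n)
{
    size_t bad = 0;
    for (size_t i = 0; i < n; i++)
        if (net_braces(blocks[i]) != 0)
            bad++;
    return bad;
}
```

This is the trade-off Ihor describes: continuous literate-programming blocks want the global behaviour, while independent snippets want local parsers.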

Yuan



* Re: Update on tree-sitter structure navigation
  2023-09-08  1:06       ` Yuan Fu
@ 2023-09-08  9:09         ` Ihor Radchenko
  2023-09-08 16:46           ` Yuan Fu
  0 siblings, 1 reply; 30+ messages in thread
From: Ihor Radchenko @ 2023-09-08  9:09 UTC (permalink / raw)
  To: Yuan Fu
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

>>> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and return all the matches within that node, which could potentially be wasteful.
>> 
>> Isn't ts_query_cursor_next_match only searching a single match?
>
> Seems so, that’s good. But there’s no guarantee that the first match will be the top node, even though implementation-wise, I think that’s probably the case. Maybe we can ask the tree-sitter developers to add such a guarantee.

I have found several potentially useful things in the API
https://github.com/tree-sitter/tree-sitter/blob/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce/lib/include/tree_sitter/api.h

/**
 * Set the maximum start depth for a query cursor.
 *
 * This prevents cursors from exploring children nodes at a certain depth.
 * Note if a pattern includes many children, then they will still be checked.
 *
 * The zero max start depth value can be used as a special behavior and
 * it helps to destructure a subtree by staying on a node and using captures
 * for interested parts. Note that the zero max start depth only limit a search
 * depth for a pattern's root node but other nodes that are parts of the pattern
 * may be searched at any depth what defined by the pattern structure.
 *
 * Set to `UINT32_MAX` to remove the maximum start depth.
 */
void ts_query_cursor_set_max_start_depth(TSQueryCursor *self, uint32_t max_start_depth);

/**
 * Set the range of bytes or (row, column) positions in which the query
 * will be executed.
 */
void ts_query_cursor_set_byte_range(TSQueryCursor *self, uint32_t start_byte, uint32_t end_byte);
void ts_query_cursor_set_point_range(TSQueryCursor *self, TSPoint start_point, TSPoint end_point);
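To see why a zero max start depth helps with the treesit-node-match-p problem, the following toy C model mimics the behaviour described in the header comment. It does not call tree-sitter: `node`, `pattern`, `count_matches`, and `node_match_p` are hypothetical names, and a pattern is reduced to a root type plus one required direct child type. Only the pattern's root is depth-limited; the child check is not.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy syntax tree node (not tree-sitter's TSNode). */
typedef struct node {
    const char *type;
    const struct node *const *children;
    size_t n_children;
} node;

/* Toy pattern: a root type with one required direct child type,
   loosely like the query (ROOT (CHILD)). */
typedef struct {
    const char *root_type;
    const char *child_type;
} pattern;

/* Does the pattern match with N as its root? */
static int matches_at(const node *n, const pattern *p)
{
    if (strcmp(n->type, p->root_type) != 0)
        return 0;
    for (size_t i = 0; i < n->n_children; i++)
        if (strcmp(n->children[i]->type, p->child_type) == 0)
            return 1;
    return 0;
}

/* Count matches whose root lies at most MAX_START_DEPTH levels below
   the starting node (DEPTH is the current level, 0 at the start).
   UINT32_MAX means "no limit", as in the tree-sitter header. */
size_t count_matches(const node *n, const pattern *p,
                     uint32_t max_start_depth, uint32_t depth)
{
    if (depth > max_start_depth)
        return 0;
    size_t count = (size_t)matches_at(n, p);
    for (size_t i = 0; i < n->n_children; i++)
        count += count_matches(n->children[i], p, max_start_depth, depth + 1);
    return count;
}

/* With max start depth 0 the search becomes a predicate on N itself. */
int node_match_p(const node *n, const pattern *p)
{
    return count_matches(n, p, 0, 0) > 0;
}
```

With the limit removed (UINT32_MAX), a search from a program node finds every matching function definition in the tree; with the limit at 0, the same search only asks whether the starting node itself matches, which is the predicate Yuan wanted.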

>> Then, I can see cases when we do and also when we do _not_ want separate
>> parsers for different blocks. For example, literate programming often
>> uses other language blocks that are intended to be continuous.
>
> Surprise, I added support for local parsers. Major mode authors can choose between global and local parsers.

Thanks!

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>


* Re: Update on tree-sitter structure navigation
  2023-09-08  9:09         ` Ihor Radchenko
@ 2023-09-08 16:46           ` Yuan Fu
  0 siblings, 0 replies; 30+ messages in thread
From: Yuan Fu @ 2023-09-08 16:46 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith, Dmitry Gutov



> On Sep 8, 2023, at 2:09 AM, Ihor Radchenko <yantar92@posteo.net> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>>> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and return all the matches within that node, which could potentially be wasteful.
>>> 
>>> Isn't ts_query_cursor_next_match only searching a single match?
>> 
>> Seems so, that’s good. But there’s no guarantee that the first match will be the top node, even though implementation-wise, I think that’s probably the case. Maybe we can ask the tree-sitter developers to add such a guarantee.
> 
> I have found several potentially useful things in the API
> https://github.com/tree-sitter/tree-sitter/blob/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce/lib/include/tree_sitter/api.h
> 
> /**
> * Set the maximum start depth for a query cursor.
> *
> * This prevents cursors from exploring children nodes at a certain depth.
> * Note if a pattern includes many children, then they will still be checked.
> *
> * The zero max start depth value can be used as a special behavior and
> * it helps to destructure a subtree by staying on a node and using captures
> * for interested parts. Note that the zero max start depth only limit a search
> * depth for a pattern's root node but other nodes that are parts of the pattern
> * may be searched at any depth what defined by the pattern structure.
> *
> * Set to `UINT32_MAX` to remove the maximum start depth.
> */
> void ts_query_cursor_set_max_start_depth(TSQueryCursor *self, uint32_t max_start_depth);
> 
> /**
> * Set the range of bytes or (row, column) positions in which the query
> * will be executed.
> */
> void ts_query_cursor_set_byte_range(TSQueryCursor *self, uint32_t start_byte, uint32_t end_byte);
> void ts_query_cursor_set_point_range(TSQueryCursor *self, TSPoint start_point, TSPoint end_point);

That’s great. Seems like a new addition to the API. That solves every problem I had!

Yuan



* Re: Update on tree-sitter structure navigation
  2023-09-02  5:01 Update on tree-sitter structure navigation Yuan Fu
  2023-09-02  6:52 ` Ihor Radchenko
@ 2023-09-03  0:56 ` Dmitry Gutov
  2023-09-06  2:51   ` Danny Freeman
  2023-09-08  1:04   ` Yuan Fu
  1 sibling, 2 replies; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-03  0:56 UTC (permalink / raw)
  To: Yuan Fu, emacs-devel
  Cc: Danny Freeman, Theodor Thornhill, Jostein Kjønigsen,
	Randy Taylor, Wilhelm Kirschbaum, Perry Smith

Hi Yuan,

On 02/09/2023 08:01, Yuan Fu wrote:
> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.

I don't have a better idea than basically copying NeoVim and others: to 
maintain the urls to parser repositories and the ref of the latest known 
good revision, for the current version of the major mode. That info 
could be filled in by major modes themselves, e.g. in an autoload block 
(similarly to how auto-mode-alist is appended to).

> Finally, feel free to send me an email or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors and Emacs ought to have, too.

Something I mentioned previously: there is a notion of scopes in the
tree-sitter docs; see the Local Variables section here:
https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables

Basically to know which symbols are defined and for how long, the parser 
needs additional help from the major mode author.

Neovim's definition here: 
https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm




* Re: Update on tree-sitter structure navigation
  2023-09-03  0:56 ` Dmitry Gutov
@ 2023-09-06  2:51   ` Danny Freeman
  2023-09-06 12:47     ` Dmitry Gutov
  2023-09-08  1:04   ` Yuan Fu
  1 sibling, 1 reply; 30+ messages in thread
From: Danny Freeman @ 2023-09-06  2:51 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen,
	Randy Taylor, Wilhelm Kirschbaum, Perry Smith

Dmitry Gutov <dgutov@yandex.ru> writes:

> Hi Yuan,
>
> On 02/09/2023 08:01, Yuan Fu wrote:
>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version
>> number, so every time the author changes the grammar, our queries break, and loading the mode only
>> produces a giant error.
>
> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser
> repositories and the ref of the latest known good revision, for the current version of the major
> mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly
> to how auto-mode-alist is appended to).

clojure-ts-mode keeps a URL for the parser, but doesn't do anything
about the git revision. It easily could but I don't feel the need (yet)
since I am also a maintainer of the clojure grammar and know when we're
about to break grammar consumers.

It's not quite that simple though. Some distributions (nixos for
example) are already providing pre-compiled grammars. That is how I
discovered a couple of recent bugs in js-ts-mode, because the grammars
distributed with nixos 23.05 no longer worked on Emacs 30 after a patch
was applied that was supposed to be backwards compatible (a real pain to
verify in my experience).

With the way Emacs can load a grammar provided by the user's
distribution, keeping information about the version of the grammar in
the major mode doesn't help all that much. Even if we did, we would have
no idea what version might have been built and placed in the user's
.emacs.d/tree-sitter folder. That would require something like putting a
version number in the file name, or maybe applying a patch to the
grammar's C source that allowed us to get a version, a SHA, or something
similar at runtime.

I'm not so sure we can have a great way to do this without a change to
the tree-sitter libraries. I would love to see some kind of increasing
version number generated in the grammar's C source that we could then
access. It could be used to make decisions about what queries to use, or
to warn the user they need to use a different grammar (maybe offering to
install a compatible version).

Tree-sitter grammar changes are almost always breaking changes. Adding
nodes can break things, re-naming them and removing them definitely can.
I'm not sure any grammar consumer has a great way to deal with this
without always compiling the exact grammar they need and only ever using
it.

-- 
Danny Freeman


* Re: Update on tree-sitter structure navigation
  2023-09-06  2:51   ` Danny Freeman
@ 2023-09-06 12:47     ` Dmitry Gutov
  2023-09-07  3:18       ` Danny Freeman
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-06 12:47 UTC (permalink / raw)
  To: Danny Freeman
  Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen,
	Randy Taylor, Wilhelm Kirschbaum, Perry Smith

On 06/09/2023 05:51, Danny Freeman wrote:
> 
> Dmitry Gutov <dgutov@yandex.ru> writes:
> 
>> Hi Yuan,
>>
>> On 02/09/2023 08:01, Yuan Fu wrote:
>>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version
>>> number, so every time the author changes the grammar, our queries break, and loading the mode only
>>> produces a giant error.
>>
>> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser
>> repositories and the ref of the latest known good revision, for the current version of the major
>> mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly
>> to how auto-mode-alist is appended to).
> 
> clojure-ts-mode keeps a URL for the parser, but doesn't do anything
> about the git revision. It easily could but I don't feel the need (yet)
> since I am also a maintainer of the clojure grammar and know when we're
> about to break grammar consumers.

Sure, that's easy enough to do when the package is only in ELPA: upgrade 
the grammar, upgrade the package, all in lockstep.

Unless nixos or other distros start distributing it as well; then
you'll need to care about the recent clojure-ts-mode being loaded with
old versions of the grammar.

> It's not quite that simple though. Some distributions (nixos for
> example) are already providing pre-compiled grammars. That is how I
> discovered a couple of recent bugs in js-ts-mode, because the grammars
> distributed with nixos 23.05 no longer worked on Emacs 30 after a patch
> was applied that was supposed to be backwards compatible (a real pain to
> verify in my experience).

A helpful find. ;)

> With the way Emacs can load a grammar provided by the user's
> distribution, keeping information about the version of the grammar in
> the major mode doesn't help all that much. Even if we did, we would have
> no idea what version might have been built and placed in the user's
> .emacs.d/tree-sitter folder. That would require something like putting a
> version number in the file name, or maybe applying a patch to the
> grammar's C source that allowed us to get a version, a SHA, or something
> similar at runtime.

Well, it would at least allow the user to rebuild the grammar to the 
version best known to work. Also, perhaps if the mode tracks the changes 
in the hash over time, it could see whether the grammar needs to be 
rebuilt. Finally, treesit-install-language-grammar could track which 
revision was last compiled.

So there is *something* we could do for the users who upgrade their 
grammars from Git.

Grammars distributed from distros are more of a problem, because it's 
not always a good idea to abort with "wrong version". But perhaps we 
could do that and recommend installing from Git in such cases anyway?

Another problem is that grammars don't have good versioning, and even if 
they did, we'd have to sometimes update the "upper bound" (we'd need 
coarse ranges, right? rather than one fixed version requirement) more
frequently than Emacs is released. Less of a problem for modes in ELPA, 
though.

> I'm not so sure we can have a great way to do this without a change to
> the tree-sitter libraries. I would love to see some kind of increasing
> version number generated in the grammar's C source that we could then
> access. It could be used to make decisions about what queries to use, or
> to warn the user they need to use a different grammar (maybe offering to
> install a compatible version).

Yes, that would be an improvement, worth bringing up on the issue
tracker, maybe.

> Tree-sitter grammar changes are almost always breaking changes. Adding
> nodes can break things, re-naming them and removing them definitely can.
> I'm not sure any grammar consumer has a great way to deal with this
> without always compiling the exact grammar they need and only ever using
> it.

That's my conclusion as well for the time being.


* Re: Update on tree-sitter structure navigation
  2023-09-06 12:47     ` Dmitry Gutov
@ 2023-09-07  3:18       ` Danny Freeman
  2023-09-07 12:52         ` Dmitry Gutov
  0 siblings, 1 reply; 30+ messages in thread
From: Danny Freeman @ 2023-09-07  3:18 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen,
	Randy Taylor, Wilhelm Kirschbaum, Perry Smith

Dmitry Gutov <dgutov@yandex.ru> writes:

>> clojure-ts-mode keeps a URL for the parser, but doesn't do anything
>> about the git revision. It easily could but I don't feel the need (yet)
>> since I am also a maintainer of the clojure grammar and know when we're
>> about to break grammar consumers.
>
> Sure, that's easy enough to do when the package is only in ELPA: upgrade the grammar, upgrade the
> package, all in lockstep.

Yeah, soon after I sent that email I realized there is no reason for me
not to specify a version for the grammar so I pushed a change doing just
that.

> Unless nixos or other distros start distributing it as well; then you'll need to care
> about the recent clojure-ts-mode being loaded with old versions of the grammar.

Luckily the grammar has not changed since my package was released.
But you are right it will eventually become an issue.

>> With the way Emacs can load a grammar provided by the user's
>> distribution, keeping information about the version of the grammar in
>> the major mode doesn't help all that much. Even if we did, we would have
>> no idea what version might have been built and placed in the user's
>> .emacs.d/tree-sitter folder. That would require something like putting a
>> version number in the file name, or maybe applying a patch to the
>> grammar's C source that allowed us to get a version, a SHA, or something
>> similar at runtime.
>
> Well, it would at least allow the user to rebuild the grammar to the version best known to work.
> Also, perhaps if the mode tracks the changes in the hash over time, it could see whether the grammar
> needs to be rebuilt. Finally, treesit-install-language-grammar could track which revision was last
> compiled.
>
> So there is *something* we could do for the users who upgrade their grammars from Git.
>
> Grammars distributed from distros are more of a problem, because it's not always a good idea to
> abort with "wrong version". But perhaps we could do that and recommend installing from Git in such
> cases anyway?

In some cases, distros might place the grammars in a strange location
made accessible on `treesit-extra-load-path`, which takes precedence
over the grammars that are installed from git in the user's Emacs
directory. This is what nix does, but is probably an outlier. I would
guess more conventional distributions might just make them accessible
where dynamic libraries are normally located and the grammars installed
from git would take precedence.

> Another problem is that grammars don't have good versioning, and even if they did, we'd have to
> sometimes update the "upper bound" (we'd need coarse ranges, right? rather than one fixed version
> requirement) more frequently than Emacs is released. Less of a problem for modes in ELPA, though.

Yeah I think ranges would be right. It would be good to say, we tested
this with versions N through M, anything else might not work. There
would still need to be some checks and patches like what exists in
js-ts-mode now. That seems unavoidable, but it could be cleaner if we
had a good way to ID grammars. Not sure about how we'd keep up with
grammars. Maybe we just can't and would need to have users install older
versions. That seems okay?

>> I'm not so sure we can have a great way to do this without a change to
>> the tree-sitter libraries. I would love to see some kind of increasing
>> version number generated in the grammar's C source that we could then
>> access. It could be used to make decisions about what queries to use, or
>> to warn the user they need to use a different grammar (maybe offering to
>> install a compatible version).
>
> Yes, that would be an improvement, worth bringing up on the issue tracker, maybe.

Yeah, I think this is a good move. I opened up one here
https://github.com/tree-sitter/tree-sitter/issues/2611
Of course, anyone feel free to chime in.

Thank you,
-- 
Danny Freeman


* Re: Update on tree-sitter structure navigation
  2023-09-07  3:18       ` Danny Freeman
@ 2023-09-07 12:52         ` Dmitry Gutov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-07 12:52 UTC (permalink / raw)
  To: Danny Freeman
  Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen,
	Randy Taylor, Wilhelm Kirschbaum, Perry Smith

On 07/09/2023 06:18, Danny Freeman wrote:
> 
> Dmitry Gutov <dgutov@yandex.ru> writes:
> 
>>> clojure-ts-mode keeps a URL for the parser, but doesn't do anything
>>> about the git revision. It easily could but I don't feel the need (yet)
>>> since I am also a maintainer of the clojure grammar and know when we're
>>> about to break grammar consumers.
>>
>> Sure, that's easy enough to do when the package is only in ELPA: upgrade the grammar, upgrade the
>> package, all in lockstep.
> 
> Yeah, soon after I sent that email I realized there is no reason for me
> not to specify a version for the grammar so I pushed a change doing just
> that.

Nice.

>> Grammars distributed from distros are more of a problem, because it's not always a good idea to
>> abort with "wrong version". But perhaps we could do that and recommend installing from Git in such
>> cases anyway?
> 
> In some cases, distros might place the grammars in a strange location
> made accessible on `treesit-extra-load-path`, which takes precedence
> over the grammars that are installed from git in the user's Emacs
> directory. This is what nix does, but is probably an outlier. I would
> guess more conventional distributions might just make them accessible
> where dynamic libraries are normally located and the grammars installed
> from git would take precedence.

Perhaps the user's Emacs directory should take precedence over 
treesit-extra-load-path. Or treesit-install-language-grammar should pick 
a higher-priority place instead. It just makes sense that the 
user-installed grammar would be loaded first.
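A minimal C sketch of why the search order matters: the first directory that provides the grammar wins. The `current_order` list is assumed to follow the documented order for treesit-extra-load-path (that variable's directories first, then the tree-sitter subdirectory of user-emacs-directory, then the system locations); the directory names, `toy_fs`, and `resolve_grammar` are made up for illustration.

```c
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the filesystem: directories holding the grammar. */
typedef struct {
    const char *const *dirs_with_grammar;
    size_t n;
} toy_fs;

static int has_grammar(const toy_fs *fs, const char *dir)
{
    for (size_t i = 0; i < fs->n; i++)
        if (strcmp(fs->dirs_with_grammar[i], dir) == 0)
            return 1;
    return 0;
}

/* First directory in ORDER that provides the grammar, or NULL.
   The first hit wins, so ORDER alone decides which copy is loaded. */
const char *resolve_grammar(const toy_fs *fs,
                            const char *const *order, size_t n)
{
    for (size_t i = 0; i < n; i++)
        if (has_grammar(fs, order[i]))
            return order[i];
    return NULL;
}

/* Assumed current order: treesit-extra-load-path entries, then the
   tree-sitter subdirectory of user-emacs-directory, then system dirs.
   All paths are illustrative. */
static const char *const current_order[] = {
    "/nix/store/example-grammars/lib",
    "~/.emacs.d/tree-sitter",
    "/usr/lib",
};

/* Order suggested in this message: user-installed grammars first. */
static const char *const proposed_order[] = {
    "~/.emacs.d/tree-sitter",
    "/nix/store/example-grammars/lib",
    "/usr/lib",
};
```

With the same grammar present both in a Nix store directory listed on treesit-extra-load-path and in ~/.emacs.d/tree-sitter, the current order picks the distro copy; putting the user directory first picks the grammar the user installed with treesit-install-language-grammar.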

>> Another problem is that grammars don't have good versioning, and even if they did, we'd have to
>> sometimes update the "upper bound" (we'd need coarse ranges, right? rather than one fixed version
>> requirement) more frequently than Emacs is released. Less of a problem for modes in ELPA, though.
> 
> Yeah I think ranges would be right. It would be good to say, we tested
> this with versions N through M, anything else might not work. There
> would still need to be some checks and patches like what exists in
> js-ts-mode now. That seems unavoidable, but it could be cleaner if we
> had a good way to ID grammars. Not sure about how we'd keep up with
> grammars. Maybe we just can't and would need to have users install older
> versions. That seems okay?

Basically, yes: if the current available grammar is outside of the 
compatibility range (and/or we get query errors, I'm not sure where to 
put the balance: I suppose sometimes the query will succeed but it 
wouldn't match some elements which it matched before), we issue a 
warning to the user that they're recommended to use 
treesit-install-language-grammar - installing the last-known good hash, 
which might as well be older than the current installed grammar.
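A minimal C sketch of the policy described in this paragraph, under loud assumptions: grammars expose no version number today, so `version` is a hypothetical integer a future grammar might report, and `grammar_compat`, `check_grammar`, and `suggested_revision` are illustrative names, not treesit.el functions. Both an out-of-range version and failing queries lead to a warning that points at the last known good revision, which may be older than the installed grammar.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical per-mode compatibility record. */
typedef struct {
    unsigned min_tested;              /* oldest grammar version tested */
    unsigned max_tested;              /* newest grammar version tested */
    const char *known_good_revision;  /* ref to suggest reinstalling */
} grammar_compat;

typedef enum {
    GRAMMAR_OK,
    GRAMMAR_WARN_OUT_OF_RANGE,  /* outside the tested range */
    GRAMMAR_WARN_QUERY_ERROR    /* in range, but the queries failed */
} grammar_status;

/* Classify the installed grammar.  VERSION is an assumed integer the
   grammar would have to expose; QUERIES_OK says whether the mode's
   queries compiled against it. */
grammar_status check_grammar(const grammar_compat *c, unsigned version,
                             bool queries_ok)
{
    if (version < c->min_tested || version > c->max_tested)
        return GRAMMAR_WARN_OUT_OF_RANGE;
    if (!queries_ok)
        return GRAMMAR_WARN_QUERY_ERROR;
    return GRAMMAR_OK;
}

/* Revision to recommend for treesit-install-language-grammar, or NULL
   when no warning is needed.  It may be older than what is installed. */
const char *suggested_revision(const grammar_compat *c, grammar_status s)
{
    return s == GRAMMAR_OK ? NULL : c->known_good_revision;
}
```

Checking the range before the queries means a grammar known to be outside the tested range is reported even when its queries happen to compile, which addresses the case where a query succeeds but silently stops matching some elements.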

>>> I'm not so sure we can have a great way to do this without a change to
>>> the tree-sitter libraries. I would love to see some kind of increasing
>>> version number generated in the grammar's C source that we could then
>>> access. It could be used to make decisions about what queries to use, or
>>> to warn the user they need to use a different grammar (maybe offering to
>>> install a compatible version).
>>
>> Yes, that would be an improvement, worth bringing up on the issue tracker, maybe.
> 
> Yeah, I think this is a good move. I opened up one here
> https://github.com/tree-sitter/tree-sitter/issues/2611
> Of course, anyone feel free to chime in.

Thanks! I left a note too.




* Re: Update on tree-sitter structure navigation
  2023-09-03  0:56 ` Dmitry Gutov
  2023-09-06  2:51   ` Danny Freeman
@ 2023-09-08  1:04   ` Yuan Fu
  2023-09-08  6:40     ` Eli Zaretskii
  2023-09-08 21:05     ` Dmitry Gutov
  1 sibling, 2 replies; 30+ messages in thread
From: Yuan Fu @ 2023-09-08  1:04 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith



> On Sep 2, 2023, at 5:56 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> Hi Yuan,
> 
> On 02/09/2023 08:01, Yuan Fu wrote:
>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
> 
> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to).

That’ll probably be ideal for third-party modes. But knowing Eli, I don’t think builtin major modes can do that.

> 
>> Finally, feel free to send me an email or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors and Emacs ought to have, too.
> 
> Something I mentioned previously: there is a notion of scopes in the tree-sitter docs; see the Local Variables section here: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables
> 
> Basically to know which symbols are defined and for how long, the parser needs additional help from the major mode author.
> 
> Neovim's definition here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm

Good call. I’ll add it to my TODO list, but it’ll have a lower priority, since I’m personally not really interested in coloring variables in different colors. If someone is interested, please do give it a try.

Yuan



* Re: Update on tree-sitter structure navigation
  2023-09-08  1:04   ` Yuan Fu
@ 2023-09-08  6:40     ` Eli Zaretskii
  2023-09-08 20:52       ` Dmitry Gutov
  2023-09-08 21:05     ` Dmitry Gutov
  1 sibling, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2023-09-08  6:40 UTC (permalink / raw)
  To: Yuan Fu; +Cc: dgutov, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 7 Sep 2023 18:04:02 -0700
> Cc: emacs-devel <emacs-devel@gnu.org>, Danny Freeman <danny@dfreeman.email>,
>  Theodor Thornhill <theo@thornhill.no>,
>  Jostein Kjønigsen <jostein@secure.kjonigsen.net>,
>  Randy Taylor <dev@rjt.dev>, Wilhelm Kirschbaum <wkirschbaum@gmail.com>,
>  Perry Smith <pedz@easesoftware.com>
> 
> > I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to).
> 
> That’ll probably be ideal for third-party modes. But knowing Eli, I don’t think builtin major modes can do that.

FTR, I have nothing against this technique, I just said that it will
need volunteers to assume this non-trivial job for each major mode,
and therefore I personally don't believe this to be a reliable
solution in practice.  But if volunteers step forward to do this, I
don't object.



* Re: Update on tree-sitter structure navigation
  2023-09-08  6:40     ` Eli Zaretskii
@ 2023-09-08 20:52       ` Dmitry Gutov
  2023-09-09  6:32         ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-08 20:52 UTC (permalink / raw)
  To: Eli Zaretskii, Yuan Fu
  Cc: emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

On 08/09/2023 09:40, Eli Zaretskii wrote:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 7 Sep 2023 18:04:02 -0700
>> Cc: emacs-devel <emacs-devel@gnu.org>, Danny Freeman <danny@dfreeman.email>,
>>   Theodor Thornhill <theo@thornhill.no>,
>>   Jostein Kjønigsen <jostein@secure.kjonigsen.net>,
>>   Randy Taylor <dev@rjt.dev>, Wilhelm Kirschbaum <wkirschbaum@gmail.com>,
>>   Perry Smith <pedz@easesoftware.com>
>>
>>> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to).
>> That’ll probably be ideal for third-party modes. But knowing Eli, I don’t think builtin major modes can do that.
> FTR, I have nothing against this technique, I just said that it will
> need volunteers to assume this non-trivial job for each major mode,
> and therefore I personally don't believe this to be a reliable
> solution in practice.  But if volunteers step forward to do this, I
> don't object.

I don't see a way around it, if the grammars continue to add breaking 
changes.

We already have volunteers: when somebody works on a ts mode (adds a new 
feature or verifies that the current font-lock and indentation work 
fine), might as well put in the last-known-good commit hash. Or update 
it, if needed (e.g. the new feature requires that).

Adding version ranges if/when proper versions arrive might require more 
foresight, but the alternative seems to be "unsupporting" 
distro-packaged grammars.

What I'm saying is in this case not doing this job well (e.g. updating 
the commit hashes and font-lock/indent rules very rarely) might still be 
better than not doing it at all.
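
For reference, a hedged sketch of what recording a known-good revision could look like with the Emacs 29 installer. Entries in `treesit-language-source-alist` take the form (LANG . (URL REVISION SOURCE-DIR CC C++)), and `treesit-install-language-grammar` reads them. The revision string "vX.Y.Z" below is a hypothetical placeholder, not a tested release. One caveat worth checking before settling on commit hashes as the pinning format: the 29.1 docstring describes REVISION as a Git tag or branch, and the installer appears to clone with `git clone -b`, so a bare commit hash may not be accepted there.

```elisp
;; Sketch only: pin the JavaScript grammar to a known-good revision.
;; "vX.Y.Z" is a hypothetical placeholder; substitute the tag the major
;; mode was last tested against.
(require 'treesit)

(add-to-list 'treesit-language-source-alist
             '(javascript "https://github.com/tree-sitter/tree-sitter-javascript"
                          "vX.Y.Z"   ; REVISION: a Git tag or branch
                          "src"))    ; SOURCE-DIR: where parser.c lives

;; Then build and install it, interactively or from Lisp:
;;   M-x treesit-install-language-grammar RET javascript RET
;;   (treesit-install-language-grammar 'javascript)
```

A major mode could ship such an entry in an autoload block, which is roughly the mechanism proposed earlier in the thread.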



* Re: Update on tree-sitter structure navigation
  2023-09-08 20:52       ` Dmitry Gutov
@ 2023-09-09  6:32         ` Eli Zaretskii
  2023-09-09 10:24           ` Dmitry Gutov
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2023-09-09  6:32 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz

> Date: Fri, 8 Sep 2023 23:52:48 +0300
> Cc: emacs-devel@gnu.org, danny@dfreeman.email, theo@thornhill.no,
>  jostein@secure.kjonigsen.net, dev@rjt.dev, wkirschbaum@gmail.com,
>  pedz@easesoftware.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> > FTR, I have nothing against this technique, I just said that it will
> > need volunteers to assume this non-trivial job for each major mode,
> > and therefore I personally don't believe this to be a reliable
> > solution in practice.  But if volunteers step forward to do this, I
> > don't object.
> 
> I don't see a way around it, if the grammars continue to add breaking 
> changes.

If the only way we see is impractical, this doesn't help, does it?

> We already have volunteers: when somebody works on a ts mode (adds a new 
> feature or verifies that the current font-lock and indentation work 
> fine), might as well put in the last-known-good commit hash. Or update 
> it, if needed (e.g. the new feature requires that).

No, that'd be worse than what we have now: those commit hashes will
quickly become outdated (most grammar libraries are very actively
developed), and create the false impression that any later version will
not work.

The job is to track all the commits of the corresponding libraries and
keep the last commit known to work constantly up-to-date, with delays
that are at most days, not weeks or months.

> What I'm saying is in this case not doing this job well (e.g. updating 
> the commit hashes and font-lock/indent rules very rarely) might still be 
> better than not doing it at all.

I strongly disagree.  I think that doing this job not well is _worse_
than not doing it.  It takes just a few days, sometimes a couple of
weeks, from submission of the report about a breakage till a fix is
available, so people could install it soon enough and be done (since
these modes are not preloaded).  And we always strive to fix these
breakages in a way that makes the code more immune to further changes,
so there's hope that with time the frequency of these problems will
become lower.

* Re: Update on tree-sitter structure navigation
  2023-09-09  6:32         ` Eli Zaretskii
@ 2023-09-09 10:24           ` Dmitry Gutov
  2023-09-09 11:38             ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-09 10:24 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz

On 09/09/2023 09:32, Eli Zaretskii wrote:
>> Date: Fri, 8 Sep 2023 23:52:48 +0300
>> Cc: emacs-devel@gnu.org, danny@dfreeman.email, theo@thornhill.no,
>>   jostein@secure.kjonigsen.net, dev@rjt.dev, wkirschbaum@gmail.com,
>>   pedz@easesoftware.com
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> FTR, I have nothing against this technique, I just said that it will
>>> need volunteers to assume this non-trivial job for each major mode,
>>> and therefore I personally don't believe this to be a reliable
>>> solution in practice.  But if volunteers step forward to do this, I
>>> don't object.
>>
>> I don't see a way around it, if the grammars continue to add breaking
>> changes.
> 
> If the only way we see is impractical, this doesn't help, does it?

Is it definitely impractical if it's known to work for NeoVim?

>> We already have volunteers: when somebody works on a ts mode (adds a new
>> feature or verifies that the current font-lock and indentation work
>> fine), might as well put in the last-known-good commit hash. Or update
>> it, if needed (e.g. the new feature requires that).
> 
> No, that'd be worse than what we have now: those commit hashes will
> quickly become outdated (most grammar libraries are very actively
> developed), and create the false impression that any later version will
> not work.

But it's not the first thing the user sees, just internal information: 
we tested with this version last, it's known to work, so if you want to 
have a known well-working configuration, you will install this one. 
Might as well install the latest and try their luck, though.

Further, most important grammars seem to be in a reasonably complete 
state by now. So installing the known-to-work version shouldn't 
generally result in obvious omissions in language features supported. 
And, well, when a grammar adds support for new ones, we would likely 
have to update the major mode anyway (together with the hash).

> The job is to track all the commits of the corresponding libraries and
> keep the last commit known to work constantly up-to-date, with delays
> that are at most days, not weeks or months.

Consider that js-ts-mode is "broken" in Emacs 29.1 now with the latest 
grammar. If there was the last-known-working hash, we could offer the 
users a friendlier way to install it.

>> What I'm saying is in this case not doing this job well (e.g. updating
>> the commit hashes and font-lock/indent rules very rarely) might still be
>> better than not doing it at all.
> 
> I strongly disagree.  I think that doing this job not well is _worse_
> than not doing it.  It takes just a few days, sometimes a couple of
> weeks, from submission of the report about a breakage till a fix is
> available, so people could install it soon enough and be done (since
> these modes are not preloaded).

But the ts modes in an Emacs release (now in 29.1, later in 29.2, etc) 
remain incompatible with any future grammar changes, right?

> And we always strive to fix these
> breakages in a way that makes the code more immune to further changes,
> so there's hope that with time the frequency of these problems will
> become lower.

We can continue with this approach too (it's not incompatible with 
saving the last-known-hash anyway), and it could also be of benefit 
later if grammars grow proper versions, but it's also a bit of 
maintenance headache on its own.

* Re: Update on tree-sitter structure navigation
  2023-09-09 10:24           ` Dmitry Gutov
@ 2023-09-09 11:38             ` Eli Zaretskii
  2023-09-09 17:04               ` Dmitry Gutov
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2023-09-09 11:38 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz

> Date: Sat, 9 Sep 2023 13:24:46 +0300
> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>  theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>  wkirschbaum@gmail.com, pedz@easesoftware.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> On 09/09/2023 09:32, Eli Zaretskii wrote:
> >> Date: Fri, 8 Sep 2023 23:52:48 +0300
> >> Cc: emacs-devel@gnu.org, danny@dfreeman.email, theo@thornhill.no,
> >>   jostein@secure.kjonigsen.net, dev@rjt.dev, wkirschbaum@gmail.com,
> >>   pedz@easesoftware.com
> >> From: Dmitry Gutov <dgutov@yandex.ru>
> >>
> >>> FTR, I have nothing against this technique, I just said that it will
> >>> need volunteers to assume this non-trivial job for each major mode,
> >>> and therefore I personally don't believe this to be a reliable
> >>> solution in practice.  But if volunteers step forward to do this, I
> >>> don't object.
> >>
> >> I don't see a way around it, if the grammars continue to add breaking
> >> changes.
> > 
> > If the only way we see is impractical, this doesn't help, does it?
> 
> Is it definitely impractical if it's known to work for NeoVim?

NeoVim is a different editor, with (potentially) different user
audience, different development and release schedules, different
distribution practices, different downstream distros and upgrade
procedures, etc.  Without considering all of these aspects and
comparing them to ours, I don't think it's meaningful to say "works"
when we discuss what we should do in our case.

> > No, that'd be worse than what we have now: those commit hashes will
> > quickly become outdated (most grammar libraries are very actively
> > developed), and create the false impression that any later version will
> > not work.
> 
> But it's not the first thing the user sees, just internal information: 
> we tested with this version last, it's known to work, so if you want to 
> have a known well-working configuration, you will install this one. 
> Might as well install the latest and try their luck, though.

How is it useful to ask users to use, say, 2-year old versions of
grammar libraries, especially for languages where either the language
or the library (or both) change quickly?

> Further, most important grammars seem to be in a reasonably complete 
> state by now.

They add features and fix problems all the time.  So I disagree with
the "reasonably complete" part, and so, evidently, do the developers
of those libraries.

> > The job is to track all the commits of the corresponding libraries and
> > keep the last commit known to work constantly up-to-date, with delays
> > that are at most days, not weeks or months.
> 
> Consider that js-ts-mode is "broken" in Emacs 29.1 now with the latest 
> grammar. If there was the last-known-working hash, we could offer the 
> users a friendlier way to install it.

How is it friendlier to downgrade to an older version (which would
require fetching it, building it with a C compiler, and installing it)
than to patch a single Lisp file?  Actually, people don't even need to
patch their Emacs installations, they could instead have a fixed
version of the Lisp file in their home directories or in site-lisp.

> >> What I'm saying is in this case not doing this job well (e.g. updating
> >> the commit hashes and font-lock/indent rules very rarely) might still be
> >> better than not doing it at all.
> > 
> > I strongly disagree.  I think that doing this job not well is _worse_
> > than not doing it.  It takes just a few days, sometimes a couple of
> > weeks, from submission of the report about a breakage till a fix is
> > available, so people could install it soon enough and be done (since
> > these modes are not preloaded).
> 
> But the ts modes in an Emacs release (now in 29.1, later in 29.2, etc) 
> remain incompatible with any future grammar changes, right?

That depends on the change: not every change breaks our modes.  Only
changes that remove or rename the syntactic elements on which we rely
are breaking changes, from our POV.

> > And we always strive to fix these
> > breakages in a way that makes the code more immune to further changes,
> > so there's hope that with time the frequency of these problems will
> > become lower.
> 
> We can continue with this approach too (it's not incompatible with 
> saving the last-known-hash anyway), and it could also be of benefit 
> later if grammars grow proper versions, but it's also a bit of 
> maintenance headache on its own.

I agree it's a maintenance headache, but this issue will cause us
headaches no matter what we do.



* Re: Update on tree-sitter structure navigation
  2023-09-09 11:38             ` Eli Zaretskii
@ 2023-09-09 17:04               ` Dmitry Gutov
  2023-09-09 17:28                 ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-09 17:04 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz

On 09/09/2023 14:38, Eli Zaretskii wrote:
>>> No, that'd be worse than what we have now: those commit hashes will
>>> quickly become outdated (most grammar libraries are very actively
>>> developed), and create the false impression that any later version will
>>> not work.
>>
>> But it's not the first thing the user sees, just internal information:
>> we tested with this version last, it's known to work, so if you want to
>> have a known well-working configuration, you will install this one.
>> Might as well install the latest and try their luck, though.
> 
> How is it useful to ask users to use, say, 2-year old versions of
> grammar libraries, especially for languages where either the language
> or the library (or both) change quickly?

It would be better to use a 2-year-old grammar which works with our mode 
than a new grammar which breaks our mode anyway.

We could also take a slightly more advanced approach: first install the 
latest version (if the user goes for 'M-x 
treesit-install-language-grammar' right away), and then in case of query 
errors suggest the version of the grammar known to work. But that's 
extra complexity (and more actions on the part of the user as well), and 
the actual benefit is hard to foretell.

>> Further, most important grammars seem to be in a reasonably complete
>> state by now.
> 
> They add features and fix problems all the time.  So I disagree with
> the "reasonably complete" part, and so, evidently, do the developers
> of those libraries.

Depends on the individual language, of course.

>>> The job is to track all the commits of the corresponding libraries and
>>> keep the last commit known to work constantly up-to-date, with delays
>>> that are at most days, not weeks or months.
>>
>> Consider that js-ts-mode is "broken" in Emacs 29.1 now with the latest
>> grammar. If there was the last-known-working hash, we could offer the
>> users a friendlier way to install it.
> 
> How is it friendlier to downgrade to an older version (which would
> require fetching it, building it with a C compiler, and installing it)
> than to patch a single Lisp file?  Actually, people don't even need to
> patch their Emacs installations, they could instead have a fixed
> version of the Lisp file in their home directories or in site-lisp.

So we'll suggest they manually copy the latest version of xxx-js-mode.el 
from master over to their site-lisp? That will be our recommendation in 
case a grammar breaks?

I suppose we could publish all ts grammars in "core ELPA". Then the 
recommendation will be "just upgrade from ELPA" (keeping in mind 
the associated usability problems, like the one we discussed with Eglot). 
"Core ELPA" also imposes certain restrictions on how the code in the 
package is written (backward compatibility checks, etc).

* Re: Update on tree-sitter structure navigation
  2023-09-09 17:04               ` Dmitry Gutov
@ 2023-09-09 17:28                 ` Eli Zaretskii
  2023-09-12  0:36                   ` Yuan Fu
  0 siblings, 1 reply; 30+ messages in thread
From: Eli Zaretskii @ 2023-09-09 17:28 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum,
	pedz

> Date: Sat, 9 Sep 2023 20:04:07 +0300
> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>  theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>  wkirschbaum@gmail.com, pedz@easesoftware.com
> From: Dmitry Gutov <dgutov@yandex.ru>
> 
> > How is it useful to ask users to use, say, 2-year old versions of
> > grammar libraries, especially for languages where either the language
> > or the library (or both) change quickly?
> 
> It would be better to use a 2-year-old grammar which works with our mode 
> than a new grammar which breaks our mode anyway.

But worse than using a 6-month-old grammar that doesn't break the mode
and has a lot of improvements.

> > How is it friendlier to downgrade to an older version (which would
> > require fetching it, building it with a C compiler, and installing it)
> > than to patch a single Lisp file?  Actually, people don't even need to
> > patch their Emacs installations, they could instead have a fixed
> > version of the Lisp file in their home directories or in site-lisp.
> 
> So we'll suggest they manually copy the latest version of xxx-js-mode.el 
> from master over to their site-lisp? That will be our recommendation in 
> case a grammar breaks?

Something like that, yes.  Or applying the diffs from the fix.

> I suppose we could publish all ts grammars in "core ELPA".

Yes, that could be a good solution, if nothing better comes up.



* Re: Update on tree-sitter structure navigation
  2023-09-09 17:28                 ` Eli Zaretskii
@ 2023-09-12  0:36                   ` Yuan Fu
  2023-09-12 10:17                     ` Dmitry Gutov
  0 siblings, 1 reply; 30+ messages in thread
From: Yuan Fu @ 2023-09-12  0:36 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: Dmitry Gutov, emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, pedz



> On Sep 9, 2023, at 10:28 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> Date: Sat, 9 Sep 2023 20:04:07 +0300
>> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>> theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>> wkirschbaum@gmail.com, pedz@easesoftware.com
>> From: Dmitry Gutov <dgutov@yandex.ru>
>> 
>>> How is it useful to ask users to use, say, 2-year old versions of
>>> grammar libraries, especially for languages where either the language
>>> or the library (or both) change quickly?
>> 
>> It would be better to use a 2-year-old grammar which works with our mode 
>> than a new grammar which breaks our mode anyway.
> 
> But worse than using a 6-month-old grammar that doesn't break the mode
> and has a lot of improvements.
> 
>>> How is it friendlier to downgrade to an older version (which would
>>> require fetching it, building it with a C compiler, and installing it)
>>> than to patch a single Lisp file?  Actually, people don't even need to
>>> patch their Emacs installations, they could instead have a fixed
>>> version of the Lisp file in their home directories or in site-lisp.
>> 
>> So we'll suggest they manually copy the latest version of xxx-js-mode.el 
>> from master over to their site-lisp? That will be our recommendation in 
>> case a grammar breaks?
> 
> Something like that, yes.  Or applying the diffs from the fix.
> 
>> I suppose we could publish all ts grammars in "core ELPA".
> 
> Yes, that could be a good solution, if nothing better comes up.

Does “publish all ts grammars” mean the binary libraries?

Yuan




* Re: Update on tree-sitter structure navigation
  2023-09-12  0:36                   ` Yuan Fu
@ 2023-09-12 10:17                     ` Dmitry Gutov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-12 10:17 UTC (permalink / raw)
  To: Yuan Fu, Eli Zaretskii
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, pedz

On 12/09/2023 03:36, Yuan Fu wrote:
>>> I suppose we could publish all ts grammars in "core ELPA".
>> Yes, that could be a good solution, if nothing better comes up.
> Does “publish all ts grammars” mean the binary libraries?

I meant just the major modes, FWIW. Maybe that was poor wording.



* Re: Update on tree-sitter structure navigation
  2023-09-08  1:04   ` Yuan Fu
  2023-09-08  6:40     ` Eli Zaretskii
@ 2023-09-08 21:05     ` Dmitry Gutov
  1 sibling, 0 replies; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-08 21:05 UTC (permalink / raw)
  To: Yuan Fu
  Cc: emacs-devel, Danny Freeman, Theodor Thornhill,
	Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum,
	Perry Smith

On 08/09/2023 04:04, Yuan Fu wrote:
>> As I mentioned previously, there is a notion of scopes in the tree-sitter docs; see the Local Variables section here: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables
>>
>> Basically, to know which symbols are defined and for how long, the parser needs additional help from the major mode author.
>>
>> Neovim's definition here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm
> Good call. I’ll add it to my TODO list, but it’ll have a lower priority, since I’m personally not very interested in coloring variables in different colors. If someone is interested, do please give it a try.

Sure, it's probably more valuable in some languages than others.

In case you have some ideas for the implementation strategy, though, 
perhaps mention them in treesit.el's Commentary (it could also have 
a TODO block). Offhand, it doesn't quite fit what we do with 
font-lock. OTOH, I suppose I could go take a look at NVim's implementation.



end of thread

Thread overview: 30+ messages
2023-09-02  5:01 Update on tree-sitter structure navigation Yuan Fu
2023-09-02  6:52 ` Ihor Radchenko
2023-09-02  8:50   ` Hugo Thunnissen
2023-09-02 22:12     ` Yuan Fu
2023-09-06 11:37       ` Ihor Radchenko
2023-09-08  0:59         ` Yuan Fu
2023-09-02 22:09   ` Yuan Fu
2023-09-06 11:57     ` Ihor Radchenko
2023-09-06 12:58       ` Eli Zaretskii
2023-09-08 12:03         ` Ihor Radchenko
2023-09-08 13:08           ` Eli Zaretskii
2023-09-08  1:06       ` Yuan Fu
2023-09-08  9:09         ` Ihor Radchenko
2023-09-08 16:46           ` Yuan Fu
2023-09-03  0:56 ` Dmitry Gutov
2023-09-06  2:51   ` Danny Freeman
2023-09-06 12:47     ` Dmitry Gutov
2023-09-07  3:18       ` Danny Freeman
2023-09-07 12:52         ` Dmitry Gutov
2023-09-08  1:04   ` Yuan Fu
2023-09-08  6:40     ` Eli Zaretskii
2023-09-08 20:52       ` Dmitry Gutov
2023-09-09  6:32         ` Eli Zaretskii
2023-09-09 10:24           ` Dmitry Gutov
2023-09-09 11:38             ` Eli Zaretskii
2023-09-09 17:04               ` Dmitry Gutov
2023-09-09 17:28                 ` Eli Zaretskii
2023-09-12  0:36                   ` Yuan Fu
2023-09-12 10:17                     ` Dmitry Gutov
2023-09-08 21:05     ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
