* Update on tree-sitter structure navigation
From: Yuan Fu @ 2023-09-02 5:01 UTC
To: emacs-devel
Cc: Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

Hey guys,

In the months after wrapping up the tree-sitter work in emacs-29, I was thinking about how to implement structural navigation and how to extract information from the parser with tree-sitter. In emacs-29 we have things like treesit-beginning/end-of-defun and treesit-defun-name. I was thinking maybe we can generalize this to support getting an arbitrary “thing” at point, moving around things, and getting information like the name of a defun, its arglist, the parent of a class, the type of a variable declaration, etc., in a language-agnostic way.

Also, at the time, we only supported defining things by a regexp matching a node’s type, which is often not enough.

And it would be nice to somehow take advantage of tree-sitter queries for the features I mentioned above. Tree-sitter queries are what every other editor uses for virtually all tree-sitter related features. But in Emacs, we mostly only use them for font-lock.

Here’s the progress as of now:

- Functions like treesit-search-forward, treesit-induce-sparse-tree, treesit-thing-at-point, treesit--navigate-thing, etc., support a richer set of predicates now. Besides a regexp matching the type, the predicate can also be a predicate function, or (REGEXP . FUNC), or a compound predicate like (or PRED PRED) or (not PRED).

- There’s now a variable treesit-thing-settings, which holds definitions for things. Then, instead of passing the predicate to the functions I mentioned above, you can save the predicate in treesit-thing-settings under a symbol, say ‘sexp’, and pass the symbol instead, just like thing-at-point.el. (We’ll work on integrating with thing-at-point.el later.)

- I can’t think of a good way to integrate tree-sitter queries with the navigation functions we have right now. Most importantly, a tree-sitter query always searches top-down, and you can’t limit the depth it searches. OTOH, our navigation functions work by traversing the tree node-to-node.

- There’s no progress on getting information like name and type, etc., in a language-agnostic way. I haven’t come up with a good interface and/or implementation.

I encourage interested folks to give it some thought. Bonus points for reusing the query files the neovim folks have accumulated :-)

Some other things on the TODO list that people can take a jab at:

- Query-based indentation (neovim’s implementation can be a source of inspiration)

- Improve c-ts-mode (indentation styles, other cc-mode features, etc.) and other tree-sitter modes

- Solve the grammar versioning/breaking-change problem: tree-sitter grammars don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.

- Major mode fallback/inheritance. This has been discussed many times, and no good solution has emerged.

- Isolated ranges. For many embedded languages, each block should be independent of the others, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.

Finally, feel free to send me an email, or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors that Emacs ought to have, too.

Yuan
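Editorial sketch: to make the predicate forms listed in the message above concrete, here is a small toy model in Emacs Lisp. It is not the treesit.el implementation. Nodes are modeled as plain plists with a :type key (real tree-sitter nodes are opaque objects), and `toy-node-match-p' and `toy-named-p' are hypothetical names chosen for illustration. Only the forms named in the message are handled: a regexp, a function, (REGEXP . FUNC), (or PRED...) and (not PRED).

```elisp
(require 'seq)

;; Toy model only: a "node" is a plist such as
;; (:type "function_definition" :name "main").

(defun toy-node-match-p (node pred)
  "Return non-nil if toy NODE satisfies PRED.
PRED is a regexp string matched against the node type, a function
called with NODE, a cons (REGEXP . FUNC), (or PRED...) or (not PRED)."
  (cond
   ;; Plain regexp: match against the node type.
   ((stringp pred)
    (and (string-match-p pred (plist-get node :type)) t))
   ;; Predicate function: call it with the node.
   ((functionp pred)
    (and (funcall pred node) t))
   ;; (or PRED...): any sub-predicate may match.
   ((eq (car-safe pred) 'or)
    (and (seq-some (lambda (p) (toy-node-match-p node p)) (cdr pred)) t))
   ;; (not PRED): invert the sub-predicate.
   ((eq (car-safe pred) 'not)
    (not (toy-node-match-p node (cadr pred))))
   ;; (REGEXP . FUNC): the type must match and FUNC must accept the node.
   ((and (consp pred) (stringp (car pred)))
    (and (toy-node-match-p node (car pred))
         (toy-node-match-p node (cdr pred))))
   (t (error "Unknown predicate form: %S" pred))))

(defun toy-named-p (node)
  "Return non-nil if toy NODE carries a :name."
  (plist-get node :name))
```

For example, (toy-node-match-p NODE '("definition\\'" . toy-named-p)) accepts only nodes whose type ends in "definition" and that also carry a name, which is the kind of refinement a bare type regexp cannot express.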
* Re: Update on tree-sitter structure navigation
From: Ihor Radchenko @ 2023-09-02 6:52 UTC
To: Yuan Fu
Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

> In the months after wrapping up tree-sitter stuff in emacs-29, I was
> thinking about how to implement structural navigation and extracting
> information from the parser with tree-sitter. In emacs-29 we have
> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
> was thinking maybe we can generalize this to support getting arbitrary
> “thing” at point, move around them, and getting information like the
> name of a defun, its arglist, parent of a class, type of an variable
> declaration, etc, in a language-agnostic way.

Note that Org mode also does all of these using
https://orgmode.org/worg/dev/org-element-api.html

It would be nice if we could converge on a more consistent interface across all the modes. For example, by extending `thing-at-point' to handle parsed elements, not just the simplistic regexp-based "thing" boundaries exposed by `thing-at-point' now.

Org approaches getting name/begin/end/arguments using a common API:

    (org-element-property :begin NODE)
    (org-element-property :end NODE)
    (org-element-property :contents-begin NODE)
    (org-element-property :contents-end NODE)
    (org-element-property :name NODE)
    (org-element-property :args NODE)

Language-agnostic "thing"s will certainly be welcome, especially given that tree-sitter grammars use inconsistent naming schemes, which have to be learned separately, and may even change with grammar versions.

I think that both NODE types and attributes can be standardized.

> Also, at the time, we only support defining things by a regexp
> matching a node’s type, which is often not enough.
>
> And it would be nice to somehow take advantage of the tree-sitter
> queries for the features I mentioned above. Tree-sitter query is what
> every other editor are using for virtually all tree-sitter related
> features. But in Emacs, we mostly only use it for font-lock.

I recall one user asking about something like VIM's textobjects via tree-sitter queries. Example:
https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm

> Here’s the progress as of now:
>
> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
> treesit-thing-at-point, treesit--navigate-thing, etc, support a richer
> set of predicates now. Besides regexp matching the type, the predicate
> can also be a predication function, or (REGEXP . FUNC), or compound
> predicates like (or PRED PRED) or (not PRED).

Slightly unrelated, but do you have any idea whether it can be faster to use Emacs' regexp search combined with treesit-thing-at-point vs. a pure tree-sitter query?

> - There’s now a variable treesit-thing-settings, which holds
> definition for things. Then, instead of passing the predicate to the
> functions I mentioned above, you can save the predicate in
> treesit-thing-settings under a symbol, say ‘sexp', and pass the symbol
> instead, just like thing-at-point.el. (We’ll work on integrating with
> thing-at-point.el later.)

This sounds similar to the textobjects I linked above.
One question: how will it integrate with multiple parsers in one buffer?

> - I can’t think of a good way to integrate tree-sitter queries with
> the navigation functions we have right now. Most importantly,
> tree-sitter query always search top-down, and you can’t limit the
> depth it searches. OTOH, our navigation functions work by traversing
> the tree node-to-node.

Could you elaborate on the difficulties you encountered?

> Some other things on the TODO list that people can take a jab at:
>
> - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.

Could we somehow get a hash of the library? That way, we can at least detect if something has changed.

> - Major mode fallback/inheritance, this has been discussed many times, no good solution emerged.

I think that integration of tree-sitter with navigation functions might be a step towards solving this problem. If common Emacs commands can automatically choose between tree-sitter and classic implementations, it might become easier to unify foo-ts-mode with foo-mode.

> - Isolated ranges. For many embedded languages, each blocks should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.

Do you mean that a single parser sees each subsequent block as a continuation of the previous one?

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
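Editorial sketch: the library-hash idea raised above could look roughly like the following, assuming the caller already knows the path of the grammar's shared library (this sketch does not try to discover it). `toy-file-sha256' and `toy-grammar-changed-p' are hypothetical names. Note the limitation: a different digest only shows that the binary differs, which can also happen when the same grammar is rebuilt with another compiler, so this detects "something changed" rather than "the grammar changed incompatibly".

```elisp
(defun toy-file-sha256 (file)
  "Return the SHA-256 hex digest of the raw bytes of FILE."
  (with-temp-buffer
    ;; Read the file as unibyte data so no decoding alters the bytes.
    (set-buffer-multibyte nil)
    (insert-file-contents-literally file)
    (secure-hash 'sha256 (current-buffer))))

(defun toy-grammar-changed-p (file recorded-hash)
  "Return non-nil if FILE no longer hashes to RECORDED-HASH.
RECORDED-HASH is a digest saved earlier by `toy-file-sha256'."
  (not (string= (toy-file-sha256 file) recorded-hash)))
```

A mode could record the digest when its queries were last known to work and warn, instead of failing with a large error, when the digest differs at load time.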
* Re: Update on tree-sitter structure navigation 2023-09-02 6:52 ` Ihor Radchenko @ 2023-09-02 8:50 ` Hugo Thunnissen 2023-09-02 22:12 ` Yuan Fu 2023-09-02 22:09 ` Yuan Fu 1 sibling, 1 reply; 30+ messages in thread From: Hugo Thunnissen @ 2023-09-02 8:50 UTC (permalink / raw) To: Ihor Radchenko Cc: Yuan Fu, emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov Ihor Radchenko <yantar92@posteo.net> writes: > Yuan Fu <casouri@gmail.com> writes: > >> In the months after wrapping up tree-sitter stuff in emacs-29, I was >> thinking about how to implement structural navigation and extracting >> information from the parser with tree-sitter. In emacs-29 we have >> things like treesit-beginning/end-of-defun, and treesit-defun-name. I >> was thinking maybe we can generalize this to support getting arbitrary >> “thing” at point, move around them, and getting information like the >> name of a defun, its arglist, parent of a class, type of an variable >> declaration, etc, in a language-agnostic way. > > Note that Org mode also does all of these using > https://orgmode.org/worg/dev/org-element-api.html > > It would be nice if we could converge to more consistent interface > across all the modes. For example, by extending `thing-at-point' to handle > parsed elements, not just simplistic regexp-based "thing" boundaries > exposed by `thing-at-point' now. > > Org approaches getting name/begin/end/arguments using a common API: > > (org-element-property :begin NODE) > (org-element-property :end NODE) > (org-element-property :contents-begin NODE) > (org-element-property :contents-end NODE) > (org-element-property :name NODE) > (org-element-property :args NODE) > > Language-agnostic "thing"s will certainly be welcome, especially given > that tree-sitter grammars use inconsistent naming schemes, which have to > be learned separately, and may even change with grammar versions. 
> > I think that both NODE types and attributes can be standardized. > It would be great to see standardization that can work with more than just tree-sitter. Depending on how extensive such a generic NODE type and accompanying API are, I could see standardization of a lot of things that are currently implemented in major modes, to name a few: - indentation - fontification - thing-at-point - imenu - simple forms of completion (variables, function names in buffer) I have some idea of the underpinnings, but I have never implemented a full major mode so it is hard for me to judge the practicality of this. How much would be practical to standardize, without needlessly complicated/resource-heavy abstractions? ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Update on tree-sitter structure navigation 2023-09-02 8:50 ` Hugo Thunnissen @ 2023-09-02 22:12 ` Yuan Fu 2023-09-06 11:37 ` Ihor Radchenko 0 siblings, 1 reply; 30+ messages in thread From: Yuan Fu @ 2023-09-02 22:12 UTC (permalink / raw) To: Hugo Thunnissen Cc: Ihor Radchenko, emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov > On Sep 2, 2023, at 1:50 AM, Hugo Thunnissen <devel@hugot.nl> wrote: > > Ihor Radchenko <yantar92@posteo.net> writes: > >> Yuan Fu <casouri@gmail.com> writes: >> >>> In the months after wrapping up tree-sitter stuff in emacs-29, I was >>> thinking about how to implement structural navigation and extracting >>> information from the parser with tree-sitter. In emacs-29 we have >>> things like treesit-beginning/end-of-defun, and treesit-defun-name. I >>> was thinking maybe we can generalize this to support getting arbitrary >>> “thing” at point, move around them, and getting information like the >>> name of a defun, its arglist, parent of a class, type of an variable >>> declaration, etc, in a language-agnostic way. >> >> Note that Org mode also does all of these using >> https://orgmode.org/worg/dev/org-element-api.html >> >> It would be nice if we could converge to more consistent interface >> across all the modes. For example, by extending `thing-at-point' to handle >> parsed elements, not just simplistic regexp-based "thing" boundaries >> exposed by `thing-at-point' now. 
>> >> Org approaches getting name/begin/end/arguments using a common API: >> >> (org-element-property :begin NODE) >> (org-element-property :end NODE) >> (org-element-property :contents-begin NODE) >> (org-element-property :contents-end NODE) >> (org-element-property :name NODE) >> (org-element-property :args NODE) >> >> Language-agnostic "thing"s will certainly be welcome, especially given >> that tree-sitter grammars use inconsistent naming schemes, which have to >> be learned separately, and may even change with grammar versions. >> >> I think that both NODE types and attributes can be standardized. >> > > It would be great to see standardization that can work with more than > just tree-sitter. Depending on how extensive such a generic NODE type > and accompanying API are, I could see standardization of a lot of things > that are currently implemented in major modes, to name a few: > > - indentation > - fontification > - thing-at-point > - imenu > - simple forms of completion (variables, function names in buffer) > > I have some idea of the underpinnings, but I have never implemented a > full major mode so it is hard for me to judge the practicality of > this. How much would be practical to standardize, without needlessly > complicated/resource-heavy abstractions? I don’t know which level of standardization you are thinking about, but aren’t they already standardized? - indentation: indent-line/region-function - fontification: font-lock-defaults - thing-at-point: thing-at-point function - imenu: imenu-create-index-function - completion: completion-at-point-function Yuan ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Update on tree-sitter structure navigation
From: Ihor Radchenko @ 2023-09-06 11:37 UTC
To: Yuan Fu
Cc: Hugo Thunnissen, emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

> I don’t know which level of standardization you are thinking about, but aren’t they already standardized?
> ...
> - fontification: font-lock-defaults

AFAIU, tree-sitter-specific font-lock is configured separately from the rest of the font-lock-keywords.

> - thing-at-point: thing-at-point function

Adding new "things" is not well-documented though.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
* Re: Update on tree-sitter structure navigation
From: Yuan Fu @ 2023-09-08 0:59 UTC
To: Ihor Radchenko
Cc: Hugo Thunnissen, emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

> On Sep 6, 2023, at 4:37 AM, Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> I don’t know which level of standardization you are thinking about, but aren’t they already standardized?
>> ...
>> - fontification: font-lock-defaults
>
> AFAIU, tree-sitter-specific font-lock is configured separately from the
> rest of the font-lock-keywords.

The standard interface I’m referring to is what tree-sitter uses, rather than what tree-sitter provides, i.e., font-lock-fontify-region-function, etc.

>> - thing-at-point: thing-at-point function
>
> Adding new "things" is not well-documented though.

That’s true. I didn’t investigate it myself, either.

Yuan
* Re: Update on tree-sitter structure navigation
From: Yuan Fu @ 2023-09-02 22:09 UTC
To: Ihor Radchenko
Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

> On Sep 1, 2023, at 11:52 PM, Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> In the months after wrapping up tree-sitter stuff in emacs-29, I was
>> thinking about how to implement structural navigation and extracting
>> information from the parser with tree-sitter. In emacs-29 we have
>> things like treesit-beginning/end-of-defun, and treesit-defun-name. I
>> was thinking maybe we can generalize this to support getting arbitrary
>> “thing” at point, move around them, and getting information like the
>> name of a defun, its arglist, parent of a class, type of an variable
>> declaration, etc, in a language-agnostic way.
>
> Note that Org mode also does all of these using
> https://orgmode.org/worg/dev/org-element-api.html
>
> It would be nice if we could converge to more consistent interface
> across all the modes. For example, by extending `thing-at-point' to handle
> parsed elements, not just simplistic regexp-based "thing" boundaries
> exposed by `thing-at-point' now.
>
> Org approaches getting name/begin/end/arguments using a common API:
>
> (org-element-property :begin NODE)
> (org-element-property :end NODE)
> (org-element-property :contents-begin NODE)
> (org-element-property :contents-end NODE)
> (org-element-property :name NODE)
> (org-element-property :args NODE)
>
> Language-agnostic "thing"s will certainly be welcome, especially given
> that tree-sitter grammars use inconsistent naming schemes, which have to
> be learned separately, and may even change with grammar versions.
>
> I think that both NODE types and attributes can be standardized.

If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. We just need SomeOne to come up with it :-) But I don’t see how this interface can support semantic information like the arglist of a defun or the type of a declaration -- these things are not universal to all “nodes”.

>> Also, at the time, we only support defining things by a regexp
>> matching a node’s type, which is often not enough.
>>
>> And it would be nice to somehow take advantage of the tree-sitter
>> queries for the features I mentioned above. Tree-sitter query is what
>> every other editor are using for virtually all tree-sitter related
>> features. But in Emacs, we mostly only use it for font-lock.
>
> I recall one user asking about something like VIM's textobjects via
> tree-sitter queries. Example:
> https://github.com/nvim-treesitter/nvim-treesitter-textobjects/blob/master/queries/cpp/textobjects.scm

I think that’s something that can be implemented with thing definitions.

>> Here’s the progress as of now:
>>
>> - Functions like treesit-search-forward, treesit-induce-sparse-tree,
>> treesit-thing-at-point, treesit--navigate-thing, etc, support a richer
>> set of predicates now. Besides regexp matching the type, the predicate
>> can also be a predication function, or (REGEXP . FUNC), or compound
>> predicates like (or PRED PRED) or (not PRED).
>
> Slightly unrelated, but do you have any idea if it can be faster to use
> Emacs' regexp search combined with treesit-thing-at-point vs. pure
> tree-sitter query?

Not really.

>> - There’s now a variable treesit-thing-settings, which holds
>> definition for things. Then, instead of passing the predicate to the
>> functions I mentioned above, you can save the predicate in
>> treesit-thing-settings under a symbol, say ‘sexp', and pass the symbol
>> instead, just like thing-at-point.el. (We’ll work on integrating with
>> thing-at-point.el later.)
>
> This sounds similar to textobjects I linked above.
> One question: how will it integrate with multiple parsers in one buffer?

This is only concerned with checking whether a node satisfies the definition of a “thing”, and doesn’t care how you get the node. Retrieving a node through either treesit-node-at or other functions already works with multiple parsers. Also, the “thing” definition is language-specific.

>> - I can’t think of a good way to integrate tree-sitter queries with
>> the navigation functions we have right now. Most importantly,
>> tree-sitter query always search top-down, and you can’t limit the
>> depth it searches. OTOH, our navigation functions work by traversing
>> the tree node-to-node.
>
> May you elaborate about the difficulties you encountered?

Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and return all the matches within that node, which could be wasteful.

>> Some other things on the TODO list that people can take a jab at:
>>
>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error.
>
> May we somehow get a hash of the library? That way, we can at least
> detect if something has changed.

All we get is a binary dynamic library. So I don’t think so.

>> - Major mode fallback/inheritance, this has been discussed many times, no good solution emerged.
>
> I think that integration of tree-sitter with navigation functions might
> be a step towards solving this problem. If common Emacs commands can
> automatically choose between tree-sitter and classic implementations, it
> might become easier to unify foo-ts-mode with foo-mode.

Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between the two modes. We’ve had many discussions before with no fruitful conclusion.

>> - Isolated ranges. For many embedded languages, each blocks should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
>
> Do you mean that a single parser sees subsequent block as a continuation
> of the previous?

Exactly.

Yuan
* Re: Update on tree-sitter structure navigation
From: Ihor Radchenko @ 2023-09-06 11:57 UTC
To: Yuan Fu
Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

Yuan Fu <casouri@gmail.com> writes:

> I think that both NODE types and attributes can be standardized.
>
> If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. Just need SomeOne to come up with it :-) But I don’t see how this interface can support semantic information like arglist of a defun, or type of a declaration -- these things are not universal to all “nodes”.

For example, consider something like

    (thing-slot 'arglist (thing-at-point 'defun)) ; => (ARGLIST_BEG . ARGLIST_END)
    (thing-slot 'arglist (thing-at-point 'variable)) ; => nil

>>> - I can’t think of a good way to integrate tree-sitter queries with
>>> the navigation functions we have right now. Most importantly,
>>> tree-sitter query always search top-down, and you can’t limit the
>>> depth it searches. OTOH, our navigation functions work by traversing
>>> the tree node-to-node.
>>
>> May you elaborate about the difficulties you encountered?
>
> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and returns all the matches within that node, which could be potentially wasteful.

Doesn't ts_query_cursor_next_match search for only a single match?

>>> - Major mode fallback/inheritance, this has been discussed many times, no good solution emerged.
>>
>> I think that integration of tree-sitter with navigation functions might
>> be a step towards solving this problem. If common Emacs commands can
>> automatically choose between tree-sitter and classic implementations, it
>> might become easier to unify foo-ts-mode with foo-mode.
>
> Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.

Any chance you have links to these discussions?

>>> - Isolated ranges. For many embedded languages, each blocks should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
>>
>> Do you mean that a single parser sees subsequent block as a continuation
>> of the previous?
>
> Exactly.

Then I can see cases where we do, and also cases where we do _not_, want separate parsers for different blocks. For example, literate programming often uses blocks in another language that are intended to be continuous.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
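Editorial sketch: the `thing-slot' accessor proposed in the message above does not exist in Emacs. The toy below only illustrates the intended contract, assuming a "thing" is represented as a plist carrying an alist of slot bounds. `toy-thing-slot' and the sample data are hypothetical.

```elisp
;; Toy model only: a "thing" is a plist with :type, :bounds and :slots,
;; where :slots maps slot names to (BEG . END) buffer positions.

(defvar toy-sample-defun
  '(:type defun :bounds (1 . 80)
    :slots ((name . (7 . 10)) (arglist . (11 . 19))))
  "A made-up defun thing with name and arglist slots.")

(defvar toy-sample-variable
  '(:type variable :bounds (90 . 110)
    :slots ((name . (97 . 100))))
  "A made-up variable thing, which has no arglist slot.")

(defun toy-thing-slot (slot thing)
  "Return the (BEG . END) bounds of SLOT in THING.
Return nil if THING has no such slot."
  (alist-get slot (plist-get thing :slots)))
```

With this representation, asking for a slot that a kind of thing does not have simply yields nil, which is one way to handle attributes that are not universal to all nodes.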
* Re: Update on tree-sitter structure navigation
From: Eli Zaretskii @ 2023-09-06 12:58 UTC
To: Ihor Radchenko
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz, dgutov

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: emacs-devel <emacs-devel@gnu.org>, Danny Freeman <danny@dfreeman.email>,
> Theodor Thornhill <theo@thornhill.no>, Jostein Kjønigsen
> <jostein@secure.kjonigsen.net>, Randy Taylor <dev@rjt.dev>, Wilhelm
> Kirschbaum <wkirschbaum@gmail.com>, Perry Smith <pedz@easesoftware.com>,
> Dmitry Gutov <dgutov@yandex.ru>
> Date: Wed, 06 Sep 2023 11:57:26 +0000
>
> > Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.
>
> Any chance you have links to these discussions?

Here's one:

  https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01251.html
  https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01293.html
* Re: Update on tree-sitter structure navigation
From: Ihor Radchenko @ 2023-09-08 12:03 UTC
To: Eli Zaretskii
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz, dgutov

Eli Zaretskii <eliz@gnu.org> writes:

>> > Unifying tree-sitter and non-tree-sitter modes creates many problems. I’m rather thinking about some way to share some configuration between two modes. We’ve had many discussions before with no fruitful conclusion.
>>
>> Any chance you have links to these discussions?
>
> Here's one:
>
> https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01251.html
> https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01293.html

Thanks!

According to the discussion, the main problem is that interleaving ts-related and ts-unrelated code in the same mode is risky. It is safer to have a dedicated foo-ts-mode rather than modifying the existing foo-mode.

However, separate *-ts- and *- modes create a problem: user config tailored for the old, non-ts mode will no longer work.

For example, c-ts-mode has `c-ts-mode-indent-offset', while cc-mode has c-basic-offset in `c-style-alist'.

Ideally, the user-facing API should be shared between the modes: defcustoms, faces, and certain high-level functions like `c-set-style'.

One might slowly:
1. Add support for foo-mode's defcustoms to foo-ts-mode, when applicable
2. Create a shared API between foo-mode and foo-ts-mode that will call the appropriate implementation depending on which mode is active.

--
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>
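Editorial sketch: step 2 above, a shared entry point that dispatches on the active mode, might look like this for the simplest case of a scalar option. Everything here is hypothetical: `foo-mode', `foo-ts-mode', both offset variables and `foo-current-indent-offset' are made-up names, and the sketch says nothing about settings whose internal data structures differ between the two implementations.

```elisp
(defvar foo-basic-offset 4
  "Indentation step of the hypothetical classic `foo-mode'.")

(defvar foo-ts-mode-indent-offset 2
  "Indentation step of the hypothetical tree-sitter `foo-ts-mode'.")

(define-derived-mode foo-mode prog-mode "Foo"
  "Hypothetical classic major mode for the Foo language.")

(define-derived-mode foo-ts-mode prog-mode "Foo[ts]"
  "Hypothetical tree-sitter major mode for the Foo language.")

(defun foo-current-indent-offset ()
  "Return the indentation step for whichever foo mode is active.
Signal a `user-error' outside foo buffers."
  (cond
   ((derived-mode-p 'foo-ts-mode) foo-ts-mode-indent-offset)
   ((derived-mode-p 'foo-mode) foo-basic-offset)
   (t (user-error "Not in a foo buffer"))))
```

Code that only needs the number can call `foo-current-indent-offset' without knowing which implementation is running. A style system as rich as `c-style-alist' would need far more than this kind of accessor.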
* Re: Update on tree-sitter structure navigation 2023-09-08 12:03 ` Ihor Radchenko @ 2023-09-08 13:08 ` Eli Zaretskii 0 siblings, 0 replies; 30+ messages in thread From: Eli Zaretskii @ 2023-09-08 13:08 UTC (permalink / raw) To: Ihor Radchenko Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz, dgutov > From: Ihor Radchenko <yantar92@posteo.net> > Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email, > theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev, > wkirschbaum@gmail.com, pedz@easesoftware.com, dgutov@yandex.ru > Date: Fri, 08 Sep 2023 12:03:58 +0000 > > Eli Zaretskii <eliz@gnu.org> writes: > > > https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01251.html > > https://lists.gnu.org/archive/html/emacs-devel/2022-12/msg01293.html > > Thanks! > > According to the discussion, the main problem is that interleaving > ts-related and ts-unrelated code in the same mode is risky. It is safer > to have a dedicated foo-ts-mode rather than modifying the existing > foo-mode. No, that's the wrong conclusion. The main reason is that mixing these modes makes no sense in most cases, due to completely different infrastructures they use. The main aspects of a major mode -- font-lock, indentation, and defun- and expression-level navigation -- are based on such different grounds that you cannot possibly reuse them. And once those are implemented on a different basis, what is left to share? > However, separate *-ts- and *- modes create a problem when user config > tailored for old, non-ts mode will no longer work. There's no argument that this is a disadvantage that causes problems to users. The challenge is to find a good solution. The basic requirements from such a solution are: . as much as possible, provide the same or equivalent features . allow easy migration of customizations from an old mode to a TS mode . 
allow to switch easily between the two kinds of modes for the same PL, in both directions (for example, to let users try the TS mode and switch back if they don't like it) . avoid complicating the maintenance too much > For example, c-ts-mode has `c-ts-mode-indent-offset', while cc-mode > has c-basic-offset in `c-style-alist'. Yes, but CC Mode's indentation customizations cannot be ported to c-ts-mode because they are based on a completely different classification of syntactic elements, so what do you propose as the solution for this particular schism? As for c-style-alist, the elements of the style are also completely different. So for now, we provide a different variable for c-ts-mode which supports the subset of built-in styles supported by CC Mode. If you have a concrete proposal for a better solution, let's hear it. > Ideally, user-facing API should be shared between the modes: defcustoms, > faces, and certain high-level functions like `c-set-style'. Again, there's no argument about the ideal, and never was. We just couldn't find a way of implementing this ideal without bumping into serious problems. May I suggest to study the code of at least a few pairs of modes, and see what I'm talking about? > One might slowly: > 1. Add support of foo-mode's defcustoms to foo-ts-mode, when applicable > 2. Create a shared API between foo-mode and foo-ts-mode that will call > the appropriate implementation depending on which mode is active. This sounds great in the abstract, but in practice bumps into serious implementation problems. The names of the variables are the least of our problems; the fact that we provide different names in the TS modes is to make sure no one expects the non-TS customizations to work with TS modes, because that's currently impossible: the internal structure of the data of the variables, as well as the way the related internal functions work, is too different. 
As an exercise, try to create an API for font-lock that could be shared by a TS and a non-TS mode. If you succeed, and if the result is significantly different from what we already have, please present the solution, because maybe we have missed something.
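For the requirement of switching between the two kinds of modes in both directions, Emacs 29 and later provide `major-mode-remap-alist'. A minimal sketch of a toggle built on it follows; the command name `my/toggle-c-ts-mode' is hypothetical, and buffers that are already open keep their current mode until a mode is selected for them again.

```elisp
(defvar major-mode-remap-alist)         ; defined in Emacs 29 and later

(defun my/toggle-c-ts-mode ()
  "Toggle the remapping of `c-mode' to `c-ts-mode'.
The remapping lives in `major-mode-remap-alist'.  Return non-nil if
it is active after the call."
  (interactive)
  (if (assq 'c-mode major-mode-remap-alist)
      (progn
        ;; Remapping present: remove it, so `c-mode' means CC Mode again.
        (setq major-mode-remap-alist
              (assq-delete-all 'c-mode major-mode-remap-alist))
        nil)
    ;; Remapping absent: add it, so C files open in the tree-sitter mode.
    (push '(c-mode . c-ts-mode) major-mode-remap-alist)
    t))
```

This only addresses which mode gets selected; it does not migrate any customizations between the two modes.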
* Re: Update on tree-sitter structure navigation 2023-09-06 11:57 ` Ihor Radchenko 2023-09-06 12:58 ` Eli Zaretskii @ 2023-09-08 1:06 ` Yuan Fu 2023-09-08 9:09 ` Ihor Radchenko 1 sibling, 1 reply; 30+ messages in thread From: Yuan Fu @ 2023-09-08 1:06 UTC (permalink / raw) To: Ihor Radchenko Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov > On Sep 6, 2023, at 4:57 AM, Ihor Radchenko <yantar92@posteo.net> wrote: > > Yuan Fu <casouri@gmail.com> writes: > >> I think that both NODE types and attributes can be standardized. >> >> If we come up with a thing-at-point interface that provides more information than the current (BEG . END), tree-sitter surely can support it as a backend. Just need SomeOne to come up with it :-) But I don’t see how this interface can support semantic information like arglist of a defun, or type of a declaration—these things are not universal to all “nodes”. > > For example, consider something like > > (thing-slot 'arglist (thing-at-point 'defun)) ; => (ARGLIST_BEG . ARGLIST_END) > (thing-slot 'arglist (thing-at-point 'variable)) ; => nil > Yeah, that makes sense. >>>> - I can’t think of a good way to integrate tree-sitter queries with >>>> the navigation functions we have right now. Most importantly, >>>> tree-sitter query always search top-down, and you can’t limit the >>>> depth it searches. OTOH, our navigation functions work by traversing >>>> the tree node-to-node. >>> >>> May you elaborate about the difficulties you encountered? >> >> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and returns all the matches within that node, which could be potentially wasteful. > > Isn't ts_query_cursor_next_match only searching a single match? Seems so, that’s good. 
But there’s no guarantee that the first match will be the top node, even though implementation-wise, I think that’s probably the case. Maybe we can ask the tree-sitter developers to add such a promise.

>>>> - Isolated ranges. For many embedded languages, each blocks should be independent from another, but currently all the embedded blocks are connected together and parsed by a single parser. We probably need to spawn a parser for each block. I’ll probably work on this one next.
>>>
>>> Do you mean that a single parser sees subsequent block as a continuation
>>> of the previous?
>>
>> Exactly.
>
> Then, I can see cases when we do and also when we do _not_ want separate
> parsers for different blocks. For example, literate programming often
> uses other language blocks that are intended to be continuous.

Surprise, I added support for local parsers. Major mode authors can choose between global and local parsers.

Yuan
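The choice between one shared parser and one parser per block is made where a major mode declares its embedded ranges. A sketch of how that might look, assuming the `:local' keyword of `treesit-range-rules' as it appears in Emacs 30, an Emacs built with tree-sitter, and the HTML grammar's script_element and raw_text node names; the function name `my/html-js-range-rules' is hypothetical.

```elisp
(require 'treesit)

(defun my/html-js-range-rules (local)
  "Return range rules for JavaScript embedded in HTML <script> elements.
If LOCAL is nil, one JavaScript parser covers all blocks, so a later
block is parsed as a continuation of the earlier ones.  If LOCAL is t,
each block gets a parser of its own and is parsed in isolation."
  (treesit-range-rules
   :embed 'javascript
   :host 'html
   :local local
   '((script_element (raw_text) @capture))))
```

A major mode would store the result in `treesit-range-settings', for example with (setq-local treesit-range-settings (my/html-js-range-rules t)). A literate-programming mode whose blocks are meant to be continuous would pass nil instead.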
* Re: Update on tree-sitter structure navigation 2023-09-08 1:06 ` Yuan Fu @ 2023-09-08 9:09 ` Ihor Radchenko 2023-09-08 16:46 ` Yuan Fu 0 siblings, 1 reply; 30+ messages in thread From: Ihor Radchenko @ 2023-09-08 9:09 UTC (permalink / raw) To: Yuan Fu Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov Yuan Fu <casouri@gmail.com> writes: >>> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and returns all the matches within that node, which could be potentially wasteful. >> >> Isn't ts_query_cursor_next_match only searching a single match? > > Seems so, that’s good. But there’s no guarantee that the first match with be the top node, even thought implementation-wise, I think that’s probably the case. Maybe we can ask tree-sitter developer to add such a promise. I have found several potentially useful things in the ABI https://github.com/tree-sitter/tree-sitter/blob/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce/lib/include/tree_sitter/api.h /** * Set the maximum start depth for a query cursor. * * This prevents cursors from exploring children nodes at a certain depth. * Note if a pattern includes many children, then they will still be checked. * * The zero max start depth value can be used as a special behavior and * it helps to destructure a subtree by staying on a node and using captures * for interested parts. Note that the zero max start depth only limit a search * depth for a pattern's root node but other nodes that are parts of the pattern * may be searched at any depth what defined by the pattern structure. * * Set to `UINT32_MAX` to remove the maximum start depth. */ void ts_query_cursor_set_max_start_depth(TSQueryCursor *self, uint32_t max_start_depth); /** * Set the range of bytes or (row, column) positions in which the query * will be executed. 
*/ void ts_query_cursor_set_byte_range(TSQueryCursor *self, uint32_t start_byte, uint32_t end_byte); void ts_query_cursor_set_point_range(TSQueryCursor *self, TSPoint start_point, TSPoint end_point); >> Then, I can see cases when we do and also when we do _not_ want separate >> parsers for different blocks. For example, literate programming often >> uses other language blocks that are intended to be continuous. > > Surprise, I added support for local parsers. Major mode authors can choose between global and local parsers. Thanks! -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at <https://orgmode.org/>. Support Org development at <https://liberapay.com/org-mode>, or support my work at <https://liberapay.com/yantar92> ^ permalink raw reply [flat|nested] 30+ messages in thread
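To make the effect of a maximum start depth concrete, here is a toy model in Emacs Lisp. It is not tree-sitter's implementation and does not call its API: trees are plain lists of the form (TYPE CHILD...), a "pattern" is just a node type, and `my/match-roots' is a hypothetical helper. It shows how the limit bounds where a match may start: with a depth of 0, only the starting node itself can match.

```elisp
(defun my/match-roots (tree type max-start-depth &optional depth)
  "Return the subtrees of TREE whose type is TYPE.
Only subtrees rooted at most MAX-START-DEPTH levels below TREE are
considered.  DEPTH is the current level and is used for recursion."
  (let ((depth (or depth 0)))
    (append
     ;; Does the current node itself match?
     (and (eq (car tree) type) (list tree))
     ;; Descend only while we are above the start-depth limit.
     (and (< depth max-start-depth)
          (mapcan (lambda (child)
                    (my/match-roots child type max-start-depth (1+ depth)))
                  (cdr tree))))))
```

With the tree (program (function (block (function))) (function)), a limit of 0 finds nothing because the root is a program node, a limit of 1 finds the two top-level function nodes, and a limit of 3 also finds the nested one.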
* Re: Update on tree-sitter structure navigation 2023-09-08 9:09 ` Ihor Radchenko @ 2023-09-08 16:46 ` Yuan Fu 0 siblings, 0 replies; 30+ messages in thread From: Yuan Fu @ 2023-09-08 16:46 UTC (permalink / raw) To: Ihor Radchenko Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov > On Sep 8, 2023, at 2:09 AM, Ihor Radchenko <yantar92@posteo.net> wrote: > > Yuan Fu <casouri@gmail.com> writes: > >>>> Ideally I’d like to pass a query and a node to treesit-node-match-p, which returns t if the query matches the node. But queries don’t work like that. They search the node and returns all the matches within that node, which could be potentially wasteful. >>> >>> Isn't ts_query_cursor_next_match only searching a single match? >> >> Seems so, that’s good. But there’s no guarantee that the first match with be the top node, even thought implementation-wise, I think that’s probably the case. Maybe we can ask tree-sitter developer to add such a promise. > > I have found several potentially useful things in the ABI > https://github.com/tree-sitter/tree-sitter/blob/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce/lib/include/tree_sitter/api.h > > /** > * Set the maximum start depth for a query cursor. > * > * This prevents cursors from exploring children nodes at a certain depth. > * Note if a pattern includes many children, then they will still be checked. > * > * The zero max start depth value can be used as a special behavior and > * it helps to destructure a subtree by staying on a node and using captures > * for interested parts. Note that the zero max start depth only limit a search > * depth for a pattern's root node but other nodes that are parts of the pattern > * may be searched at any depth what defined by the pattern structure. > * > * Set to `UINT32_MAX` to remove the maximum start depth. 
> */
> void ts_query_cursor_set_max_start_depth(TSQueryCursor *self, uint32_t max_start_depth);
>
> /**
> * Set the range of bytes or (row, column) positions in which the query
> * will be executed.
> */
> void ts_query_cursor_set_byte_range(TSQueryCursor *self, uint32_t start_byte, uint32_t end_byte);
> void ts_query_cursor_set_point_range(TSQueryCursor *self, TSPoint start_point, TSPoint end_point);

That’s great. Seems like a new addition to the API. That solves every problem I had!

Yuan
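With the functions already exposed to Lisp, the check described above, whether a query matches a given node itself, can only be approximated: run the query on the node, then look for the node among the captures. A minimal sketch under that assumption follows. `my/treesit-node-matches-query-p' is a hypothetical name, and the function needs an Emacs built with tree-sitter and a parsed buffer to be of any use. It still collects every capture inside NODE, which is exactly the waste a start-depth limit would avoid.

```elisp
(require 'treesit)
(require 'seq)

(defun my/treesit-node-matches-query-p (node query)
  "Return non-nil if QUERY captures NODE itself.
QUERY is a tree-sitter query (string, sexp, or compiled query) that
captures the root node of its pattern, for example
\"(function_definition) @root\"."
  (seq-some (lambda (capture)
              ;; Each capture is a cons cell (CAPTURE-NAME . NODE).
              (treesit-node-eq (cdr capture) node))
            (treesit-query-capture node query)))
```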
* Re: Update on tree-sitter structure navigation 2023-09-02 5:01 Update on tree-sitter structure navigation Yuan Fu 2023-09-02 6:52 ` Ihor Radchenko @ 2023-09-03 0:56 ` Dmitry Gutov 2023-09-06 2:51 ` Danny Freeman 2023-09-08 1:04 ` Yuan Fu 1 sibling, 2 replies; 30+ messages in thread From: Dmitry Gutov @ 2023-09-03 0:56 UTC (permalink / raw) To: Yuan Fu, emacs-devel Cc: Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith Hi Yuan, On 02/09/2023 08:01, Yuan Fu wrote: > - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error. I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to). > Finally, feel free to send me an email or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors and Emacs ought to have, too. Something I mentioned previously, there is notion of scopes in tree-sitter docs, see the Local Variables section here: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables Basically to know which symbols are defined and for how long, the parser needs additional help from the major mode author. Neovim's definition here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm ^ permalink raw reply [flat|nested] 30+ messages in thread
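Emacs 29 already has a place where a repository URL and a revision can be recorded per language: `treesit-language-source-alist', which `treesit-install-language-grammar' consults. Below is a sketch of how a major mode might register its last known good revision there. The helper name is hypothetical, and the revision string is a placeholder, not a real tag or commit.

```elisp
(require 'treesit)

(defun my/register-grammar-source (lang url revision)
  "Record that the grammar for LANG should be built from URL at REVISION.
An existing entry for LANG is left alone, so user settings win."
  (unless (assq lang treesit-language-source-alist)
    (push (list lang url revision) treesit-language-source-alist)))

(my/register-grammar-source
 'ruby
 "https://github.com/tree-sitter/tree-sitter-ruby"
 "LAST-KNOWN-GOOD-REVISION")            ; placeholder, fill in a real ref
```

With a real revision in place, M-x treesit-install-language-grammar RET ruby RET would build the grammar from that revision.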
* Re: Update on tree-sitter structure navigation
  2023-09-03 0:56 ` Dmitry Gutov
@ 2023-09-06 2:51 ` Danny Freeman
  2023-09-06 12:47 ` Dmitry Gutov
  2023-09-08 1:04 ` Yuan Fu
  1 sibling, 1 reply; 30+ messages in thread
From: Danny Freeman @ 2023-09-06 2:51 UTC (permalink / raw)
  To: Dmitry Gutov
  Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith

Dmitry Gutov <dgutov@yandex.ru> writes:

> Hi Yuan,
>
> On 02/09/2023 08:01, Yuan Fu wrote:
>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version
>> number, so every time the author changes the grammar, our queries break, and loading the mode only
>> produces a giant error.
>
> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser
> repositories and the ref of the latest known good revision, for the current version of the major
> mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly
> to how auto-mode-alist is appended to).

clojure-ts-mode keeps a URL for the parser, but doesn't do anything about the git revision. It easily could, but I don't feel the need (yet) since I am also a maintainer of the clojure grammar and know when we're about to break grammar consumers.

It's not quite that simple though. Some distributions (nixos for example) are already providing pre-compiled grammars. That is how I discovered a couple of recent bugs in js-ts-mode, because the grammars distributed with nixos 23.05 no longer worked on Emacs 30 after a patch was applied that was supposed to be backwards compatible (a real pain to verify in my experience).

With the way Emacs can load a grammar provided by the user's distribution, keeping information about the version of the grammar in the major mode doesn't help all that much. Even if we did it, we have no idea what version might have been built in the user's .emacs.d/tree-sitter folder.
That would require something like putting a version number in the file name, or maybe applying a patch to the grammar's C source that allowed us to get a version, SHA, something at runtime. I'm not so sure we can have a great way to do this without a change to the tree-sitter libraries. I would love to see some kind of increasing version number generated in the grammar's C source that we could then access. It could be used to make decisions about what queries to use, or to warn the user they need to use a different grammar (maybe offering to install a compatible version). Tree-sitter grammar changes are almost always breaking changes. Adding nodes can break things, re-naming them and removing them definitely can. I'm not sure any grammar consumer has a great way to deal with this without always compiling the exact grammar they need and only ever using it. -- Danny Freeman ^ permalink raw reply [flat|nested] 30+ messages in thread
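Without a version number to consult, one workaround is to probe the installed grammar: try each known variant of a query and keep the first one the grammar accepts. A minimal sketch, assuming an Emacs built with tree-sitter and an installed grammar for the language; `my/treesit-first-valid-query' is a hypothetical helper.

```elisp
(require 'treesit)
(require 'seq)

(defun my/treesit-first-valid-query (language queries)
  "Return the first element of QUERIES that compiles for LANGUAGE, or nil.
QUERIES is a list of query strings or sexps, preferred variant first.
An error from loading the grammar itself is not caught here."
  (seq-find (lambda (query)
              (condition-case nil
                  ;; A non-nil EAGER argument forces compilation now, so
                  ;; an invalid query signals here, not on first use.
                  (progn (treesit-query-compile language query t) t)
                (treesit-query-error nil)))
            queries))
```

A probe like this notices renamed or removed node types. It cannot detect a query that still compiles but silently matches less than it did before.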
* Re: Update on tree-sitter structure navigation 2023-09-06 2:51 ` Danny Freeman @ 2023-09-06 12:47 ` Dmitry Gutov 2023-09-07 3:18 ` Danny Freeman 0 siblings, 1 reply; 30+ messages in thread From: Dmitry Gutov @ 2023-09-06 12:47 UTC (permalink / raw) To: Danny Freeman Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith On 06/09/2023 05:51, Danny Freeman wrote: > > Dmitry Gutov <dgutov@yandex.ru> writes: > >> Hi Yuan, >> >> On 02/09/2023 08:01, Yuan Fu wrote: >>> - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version >>> number, so every time the author changes the grammar, our queries break, and loading the mode only >>> produces a giant error. >> >> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser >> repositories and the ref of the latest known good revision, for the current version of the major >> mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly >> to how auto-mode-alist is appended to). > > clojure-ts-mode keeps a URL for the parser, but doesn't do anything > about the git revision. It easily could but I don't feel the need (yet) > since I am also a maintainer of the clojure grammar and know when we're > about to break grammar consumers. Sure, that's easy enough to do when the package is only in ELPA: upgrade the grammar, upgrade the package, all in lockstep. Unless nixos or other distros are going to start distributing it as well, and you'll need to care about having the recent clojure-ts-mode being loaded with old versions of the grammar. > It's not quite that simple though. Some distributions (nixos for > example) are already providing pre-compiled grammars. 
That is how I > discovered a couple recent bugs in js-ts-mode, because the grammars > distributed with nixos 23.05 no longer worked on Emacs 30 after a patch > was applied that was supposed to be backwards compatible (a real pain to > verify in my experience). A helpful find. ;) > With the way Emacs can load a grammar provided by the user's > distribution, keeping information about the version of the grammar in > the major mode doesn't help all that much. Even if we did it we have no > idea what version might be have been built used the user's > .emacs.d/tree-sitter folder. That would require something like putting a > version number in the file name, or maybe applying a patch to the > grammar's C source that allowed us to get a version, SHA, something at > runtime. Well, it would at least allow the user to rebuild the grammar to the version best known to work. Also, perhaps if the mode tracks the changes in the hash over time, it could see whether the grammar needs to be rebuilt. Finally, treesit-install-language-grammar could track which revision was last compiled. So there is *something* we could do for the users who upgrade their grammars from Git. Grammars distributed from distros are more of a problem, because it's not always a good idea to abort with "wrong version". But perhaps we could do that and recommend installing from Git in such cases anyway? Another problem is that grammars don't have good versioning, and even if they did, we'd have to sometimes update the "upper bound" (we'd need coarse ranges, right? rather that one fixed version requirement) more frequently than Emacs is released. Less of a problem for modes in ELPA, though. > I'm not so sure we can have a great way to do this without a change to > the tree-sitter libraries. I would love to see some kind of increasing > version number generated in the grammar's C source that we could then > access. 
It could be used to make decisions about what queries to use, or
> to warn the user they need to use a different grammar (maybe offering to
> install a compatible version).

Yes, that would be an improvement, worth bringing up on the issue tracker maybe.

> Tree-sitter grammar changes are almost always breaking changes. Adding
> nodes can break things, re-naming them and removing them definitely can.
> I'm not sure any grammar consumer has a great way to deal with this
> without always compiling the exact grammar they need and only ever using
> it.

That's my conclusion as well for the time being.
* Re: Update on tree-sitter structure navigation 2023-09-06 12:47 ` Dmitry Gutov @ 2023-09-07 3:18 ` Danny Freeman 2023-09-07 12:52 ` Dmitry Gutov 0 siblings, 1 reply; 30+ messages in thread From: Danny Freeman @ 2023-09-07 3:18 UTC (permalink / raw) To: Dmitry Gutov Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith Dmitry Gutov <dgutov@yandex.ru> writes: >> clojure-ts-mode keeps a URL for the parser, but doesn't do anything >> about the git revision. It easily could but I don't feel the need (yet) >> since I am also a maintainer of the clojure grammar and know when we're >> about to break grammar consumers. > > Sure, that's easy enough to do when the package is only in ELPA: upgrade the grammar, upgrade the > package, all in lockstep. Yeah, soon after I sent that email I realized there is no reason for me not to specify a version for the grammar so I pushed a change doing just that. > Unless nixos or other distros are going to start distributing it as well, and you'll need to care > about having the recent clojure-ts-mode being loaded with old versions of the grammar. Luckily the grammar has not changed since my package was released. But you are right it will eventually become an issue. >> With the way Emacs can load a grammar provided by the user's >> distribution, keeping information about the version of the grammar in >> the major mode doesn't help all that much. Even if we did it we have no >> idea what version might be have been built used the user's >> .emacs.d/tree-sitter folder. That would require something like putting a >> version number in the file name, or maybe applying a patch to the >> grammar's C source that allowed us to get a version, SHA, something at >> runtime. > > Well, it would at least allow the user to rebuild the grammar to the version best known to work. > Also, perhaps if the mode tracks the changes in the hash over time, it could see whether the grammar > needs to be rebuilt. 
Finally, treesit-install-language-grammar could track which revision was last > compiled. > > So there is *something* we could do for the users who upgrade their grammars from Git. > > Grammars distributed from distros are more of a problem, because it's not always a good idea to > abort with "wrong version". But perhaps we could do that and recommend installing from Git in such > cases anyway? In some cases, distros might place the grammars in a strange location made accessible on `treesit-extra-load-path`, which takes precedence over the grammars that are installed from git in the user's Emacs directory. This is what nix does, but is probably an outlier. I would guess more conventional distributions might just make them accessible where dynamic libraries are normally located and the grammars installed from git would take precedence. > Another problem is that grammars don't have good versioning, and even if they did, we'd have to > sometimes update the "upper bound" (we'd need coarse ranges, right? rather that one fixed version > requirement) more frequently than Emacs is released. Less of a problem for modes in ELPA, though. Yeah I think ranges would be right. It would be good to say, we tested this with versions N through M, anything else might not work. There would still need to be some checks and patches like what exists in js-ts-mode now. But that seems unavoidable, but could be cleaner if we had a good way to ID grammars. Not sure about how we'd keep up with grammars. Maybe we just can't and would need to have users install older versions. That seems okay? >> I'm not so sure we can have a great way to do this without a change to >> the tree-sitter libraries. I would love to see some kind of increasing >> version number generated in the grammar's C source that we could then >> access. It could be used to make decisions about what queries to use, or >> to warn the user they need to use a different grammar (maybe offering to >> install a compatible version). 
>
> Yes, that would be an improvement, worth being up on the issue tracker maybe.

Yeah, I think this is a good move. I opened up one here
https://github.com/tree-sitter/tree-sitter/issues/2611
Of course, anyone feel free to chime in.

Thank you,
--
Danny Freeman
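If grammars did expose a version, the "tested with versions N through M" check could be as small as the following. This is a sketch under that assumption: no such version exists today, the version strings a caller would pass are purely illustrative, and `my/grammar-compat-status' is a hypothetical name.

```elisp
(defun my/grammar-compat-status (version min max)
  "Classify grammar VERSION against the tested range MIN..MAX (inclusive).
All three arguments are version strings understood by `version<'.
Return one of the symbols `tested', `too-old', or `too-new'."
  (cond ((version< version min) 'too-old)
        ((version< max version) 'too-new)
        (t 'tested)))
```

A mode could call this once at startup and issue a warning, rather than an error, for anything other than `tested', since a grammar outside the range may still work.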
* Re: Update on tree-sitter structure navigation
  2023-09-07 3:18 ` Danny Freeman
@ 2023-09-07 12:52 ` Dmitry Gutov
  0 siblings, 0 replies; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-07 12:52 UTC (permalink / raw)
  To: Danny Freeman
  Cc: Yuan Fu, emacs-devel, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith

On 07/09/2023 06:18, Danny Freeman wrote:
>
> Dmitry Gutov <dgutov@yandex.ru> writes:
>
>>> clojure-ts-mode keeps a URL for the parser, but doesn't do anything
>>> about the git revision. It easily could but I don't feel the need (yet)
>>> since I am also a maintainer of the clojure grammar and know when we're
>>> about to break grammar consumers.
>>
>> Sure, that's easy enough to do when the package is only in ELPA: upgrade the grammar, upgrade the
>> package, all in lockstep.
>
> Yeah, soon after I sent that email I realized there is no reason for me
> not to specify a version for the grammar so I pushed a change doing just
> that.

Nice.

>> Grammars distributed from distros are more of a problem, because it's not always a good idea to
>> abort with "wrong version". But perhaps we could do that and recommend installing from Git in such
>> cases anyway?
>
> In some cases, distros might place the grammars in a strange location
> made accessible on `treesit-extra-load-path`, which takes precedence
> over the grammars that are installed from git in the user's Emacs
> directory. This is what nix does, but is probably an outlier. I would
> guess more conventional distributions might just make them accessible
> where dynamic libraries are normally located and the grammars installed
> from git would take precedence.

Perhaps the user's Emacs directory should take precedence over treesit-extra-load-path. Or treesit-install-language-grammar should pick a higher-priority place instead. It just makes sense that the user-installed grammar would be loaded first.
>> Another problem is that grammars don't have good versioning, and even if they did, we'd have to >> sometimes update the "upper bound" (we'd need coarse ranges, right? rather that one fixed version >> requirement) more frequently than Emacs is released. Less of a problem for modes in ELPA, though. > > Yeah I think ranges would be right. It would be good to say, we tested > this with versions N through M, anything else might not work. There > would still need to be some checks and patches like what exists in > js-ts-mode now. But that seems unavoidable, but could be cleaner if we > had a good way to ID grammars. Not sure about how we'd keep up with > grammars. Maybe we just can't and would need to have users install older > versions. That seems okay? Basically, yes: if the current available grammar is outside of the compatibility range (and/or we get query errors, I'm not sure where to put the balance: I suppose sometimes the query will succeed but it wouldn't match some elements which it matched before), we issue a warning to the user that they're recommended to use treesit-install-language-grammar - installing the last-known good hash, which might as well be older than the current installed grammar. >>> I'm not so sure we can have a great way to do this without a change to >>> the tree-sitter libraries. I would love to see some kind of increasing >>> version number generated in the grammar's C source that we could then >>> access. It could be used to make decisions about what queries to use, or >>> to warn the user they need to use a different grammar (maybe offering to >>> install a compatible version). >> >> Yes, that would be an improvement, worth being up on the issue tracker maybe. > > Yeah, I think this is a good move. I opened up one here > https://github.com/tree-sitter/tree-sitter/issues/2611 > Of course, anyone feel free to chime in. Thanks! I left a note too. ^ permalink raw reply [flat|nested] 30+ messages in thread
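The effect of search order on which copy of a grammar gets picked up can be shown with a small lookup helper. This is a sketch, not how treesit.c locates libraries: `my/grammar-file' is a hypothetical function, and it assumes grammars are files named libtree-sitter-LANG with a platform-specific suffix. Whichever directory comes first in DIRS wins, so listing the user's directory ahead of the distro's gives user-installed grammars priority.

```elisp
(defun my/grammar-file (lang dirs)
  "Return the first shared library for LANG found in DIRS, or nil.
LANG is a symbol such as `ruby'.  DIRS is a list of directories that
is searched in order, so earlier directories take precedence."
  (locate-file (format "libtree-sitter-%s" lang)
               dirs
               '(".so" ".dylib" ".dll")))
```

For example, (my/grammar-file 'ruby (list (locate-user-emacs-file "tree-sitter") "/usr/lib")) prefers a grammar in the user's Emacs directory over one in /usr/lib; the second directory is only an illustration of a system location.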
* Re: Update on tree-sitter structure navigation 2023-09-03 0:56 ` Dmitry Gutov 2023-09-06 2:51 ` Danny Freeman @ 2023-09-08 1:04 ` Yuan Fu 2023-09-08 6:40 ` Eli Zaretskii 2023-09-08 21:05 ` Dmitry Gutov 1 sibling, 2 replies; 30+ messages in thread From: Yuan Fu @ 2023-09-08 1:04 UTC (permalink / raw) To: Dmitry Gutov Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith > On Sep 2, 2023, at 5:56 PM, Dmitry Gutov <dgutov@yandex.ru> wrote: > > Hi Yuan, > > On 02/09/2023 08:01, Yuan Fu wrote: >> - Solve the grammar versioning/breaking-change problem: tree-sitter grammar don’t have a version number, so every time the author changes the grammar, our queries break, and loading the mode only produces a giant error. > > I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to). That’ll probably be ideal for third-party modes. But knowing Eli, I don’t think builtin major modes can do that. > >> Finally, feel free to send me an email or send to emacs-devel and CC me, if there are things treesit.c and treesit.el can do better, or when there are nice things in neovim and other editors and Emacs ought to have, too. > > Something I mentioned previously, there is notion of scopes in tree-sitter docs, see the Local Variables section here: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables > > Basically to know which symbols are defined and for how long, the parser needs additional help from the major mode author. > > Neovim's definition here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm Good call. 
I’ll add it to my TODO list, but it’ll have a lower priority, since I personally am not really interested in coloring variables in different colors. If someone is interested, please do give it a try.

Yuan
* Re: Update on tree-sitter structure navigation 2023-09-08 1:04 ` Yuan Fu @ 2023-09-08 6:40 ` Eli Zaretskii 2023-09-08 20:52 ` Dmitry Gutov 2023-09-08 21:05 ` Dmitry Gutov 1 sibling, 1 reply; 30+ messages in thread From: Eli Zaretskii @ 2023-09-08 6:40 UTC (permalink / raw) To: Yuan Fu; +Cc: dgutov, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz > From: Yuan Fu <casouri@gmail.com> > Date: Thu, 7 Sep 2023 18:04:02 -0700 > Cc: emacs-devel <emacs-devel@gnu.org>, Danny Freeman <danny@dfreeman.email>, > Theodor Thornhill <theo@thornhill.no>, > Jostein Kjønigsen <jostein@secure.kjonigsen.net>, > Randy Taylor <dev@rjt.dev>, Wilhelm Kirschbaum <wkirschbaum@gmail.com>, > Perry Smith <pedz@easesoftware.com> > > > I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to). > > That’ll probably be ideal for third-party modes. But knowing Eli, I don’t think builtin major modes can do that. FTR, I have nothing against this technique, I just said that it will need volunteers to assume this non-trivial job for each major mode, and therefore I personally don't believe this to be a reliable solution in practice. But if volunteers step forward to do this, I don't object. ^ permalink raw reply [flat|nested] 30+ messages in thread
* Re: Update on tree-sitter structure navigation
  2023-09-08 6:40 ` Eli Zaretskii
@ 2023-09-08 20:52 ` Dmitry Gutov
  2023-09-09 6:32 ` Eli Zaretskii
  0 siblings, 1 reply; 30+ messages in thread
From: Dmitry Gutov @ 2023-09-08 20:52 UTC (permalink / raw)
  To: Eli Zaretskii, Yuan Fu
  Cc: emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

On 08/09/2023 09:40, Eli Zaretskii wrote:
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Thu, 7 Sep 2023 18:04:02 -0700
>> Cc: emacs-devel <emacs-devel@gnu.org>, Danny Freeman <danny@dfreeman.email>,
>> Theodor Thornhill <theo@thornhill.no>,
>> Jostein Kjønigsen <jostein@secure.kjonigsen.net>,
>> Randy Taylor <dev@rjt.dev>, Wilhelm Kirschbaum <wkirschbaum@gmail.com>,
>> Perry Smith <pedz@easesoftware.com>
>>
>>> I don't have a better idea than basically copying NeoVim and others: to maintain the urls to parser repositories and the ref of the latest known good revision, for the current version of the major mode. That info could be filled in by major modes themselves, e.g. in an autoload block (similarly to how auto-mode-alist is appended to).
>> That’ll probably be ideal for third-party modes. But knowing Eli, I don’t think builtin major modes can do that.
> FTR, I have nothing against this technique, I just said that it will
> need volunteers to assume this non-trivial job for each major mode,
> and therefore I personally don't believe this to be a reliable
> solution in practice. But if volunteers step forward to do this, I
> don't object.

I don't see a way around it, if the grammars continue to add breaking changes.

We already have volunteers: when somebody works on a ts mode (adds a new feature or verifies that the current font-lock and indentation work fine), they might as well put in the last-known-good commit hash. Or update it, if needed (e.g. the new feature requires that).

Adding version ranges if/when proper versions arrive might require more foresight, but the alternative seems to be "unsupporting" distro-packaged grammars.
What I'm saying is in this case not doing this job well (e.g. updating the commit hashes and font-lock/indent rules very rarely) might still be better than not doing it at all. ^ permalink raw reply [flat|nested] 30+ messages in thread
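[Editorial illustration, not part of the original message.] The "last-known-good revision" idea above could be expressed with the variable treesit-language-source-alist that Emacs 29 already provides, whose entries have the form (LANG . (URL REVISION SOURCE-DIR CC C++)). The following is only a sketch under stated assumptions: the REVISION string is a placeholder rather than a tested value, and since the Emacs 29 documentation describes REVISION as a Git tag or branch, a raw commit hash may not be accepted there.

```elisp
;; A minimal sketch, assuming the grammar lives at the upstream
;; tree-sitter-javascript repository.  "KNOWN-GOOD-TAG" is a
;; placeholder: substitute a tag or branch that has actually been
;; tested against the major mode in question.
(require 'treesit)

(add-to-list 'treesit-language-source-alist
             '(javascript
               "https://github.com/tree-sitter/tree-sitter-javascript"
               "KNOWN-GOOD-TAG"))

;; With the entry in place, the grammar is built from that revision by:
;;   M-x treesit-install-language-grammar RET javascript RET
```

Whether such entries would be shipped by the major modes themselves, and who keeps them current, is exactly the open question in this thread; the sketch only shows that a place to record the information already exists.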
* Re: Update on tree-sitter structure navigation
From: Eli Zaretskii @ 2023-09-09 6:32 UTC
To: Dmitry Gutov
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

> Date: Fri, 8 Sep 2023 23:52:48 +0300
> Cc: emacs-devel@gnu.org, danny@dfreeman.email, theo@thornhill.no,
>  jostein@secure.kjonigsen.net, dev@rjt.dev, wkirschbaum@gmail.com,
>  pedz@easesoftware.com
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> > FTR, I have nothing against this technique, I just said that it will
> > need volunteers to assume this non-trivial job for each major mode,
> > and therefore I personally don't believe this to be a reliable
> > solution in practice. But if volunteers step forward to do this, I
> > don't object.
>
> I don't see a way around it, if the grammars continue to add breaking
> changes.

If the only way we see is impractical, this doesn't help, does it?

> We already have volunteers: when somebody works on a ts mode (adds a new
> feature or verifies that the current font-lock and indentation work
> fine), might as well put in the last-known-good commit hash. Or update
> it, if needed (e.g. the new feature requires that).

No, that'd be worse than what we have now: those commit hashes will
quickly become outdated (most grammar libraries are very actively
developed), and create the false impression that any later version will
not work.

The job is to track all the commits of the corresponding libraries and
keep the last commit known to work constantly up-to-date, with delays
that are at most days, not weeks or months.

> What I'm saying is in this case not doing this job well (e.g. updating
> the commit hashes and font-lock/indent rules very rarely) might still be
> better than not doing it at all.

I strongly disagree. I think that doing this job not well is _worse_
than not doing it. It takes just a few days, sometimes a couple of
weeks, from submission of the report about a breakage till a fix is
available, so people could install it soon enough and be done (since
these modes are not preloaded). And we always strive to fix these
breakages in a way that makes the code more immune to further changes,
so there's hope that with time the frequency of these problems will
become lower.

* Re: Update on tree-sitter structure navigation
From: Dmitry Gutov @ 2023-09-09 10:24 UTC
To: Eli Zaretskii
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

On 09/09/2023 09:32, Eli Zaretskii wrote:
>> Date: Fri, 8 Sep 2023 23:52:48 +0300
>> Cc: emacs-devel@gnu.org, danny@dfreeman.email, theo@thornhill.no,
>>   jostein@secure.kjonigsen.net, dev@rjt.dev, wkirschbaum@gmail.com,
>>   pedz@easesoftware.com
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> FTR, I have nothing against this technique, I just said that it will
>>> need volunteers to assume this non-trivial job for each major mode,
>>> and therefore I personally don't believe this to be a reliable
>>> solution in practice. But if volunteers step forward to do this, I
>>> don't object.
>>
>> I don't see a way around it, if the grammars continue to add breaking
>> changes.
>
> If the only way we see is impractical, this doesn't help, does it?

Is it definitely impractical if it's known to work for NeoVim?

>> We already have volunteers: when somebody works on a ts mode (adds a new
>> feature or verifies that the current font-lock and indentation work
>> fine), might as well put in the last-known-good commit hash. Or update
>> it, if needed (e.g. the new feature requires that).
>
> No, that'd be worse than what we have now: those commit hashes will
> quickly become outdated (most grammar libraries are very actively
> developed), and create the false impression that any later version will
> not work.

But it's not the first thing the user sees, just internal information:
we tested with this version last, it's known to work, so if you want to
have a known well-working configuration, you will install this one.
They might as well install the latest and try their luck, though.

Further, most important grammars seem to be in a reasonably complete
state by now. So installing the known-to-work version shouldn't
generally result in obvious omissions in language features supported.
And, well, when a grammar adds support for new ones, we would likely
have to update the major mode anyway (together with the hash).

> The job is to track all the commits of the corresponding libraries and
> keep the last commit known to work constantly up-to-date, with delays
> that are at most days, not weeks or months.

Consider that js-ts-mode is "broken" in Emacs 29.1 now with the latest
grammar. If there was the last-known-working hash, we could offer the
users a friendlier way to install it.

>> What I'm saying is in this case not doing this job well (e.g. updating
>> the commit hashes and font-lock/indent rules very rarely) might still be
>> better than not doing it at all.
>
> I strongly disagree. I think that doing this job not well is _worse_
> than not doing it. It takes just a few days, sometimes a couple of
> weeks, from submission of the report about a breakage till a fix is
> available, so people could install it soon enough and be done (since
> these modes are not preloaded).

But the ts modes in an Emacs release (now in 29.1, later in 29.2, etc)
remain incompatible with any future grammar changes, right?

> And we always strive to fix these
> breakages in a way that makes the code more immune to further changes,
> so there's hope that with time the frequency of these problems will
> become lower.

We can continue with this approach too (it's not incompatible with
saving the last-known-good hash anyway), and it could also be of benefit
later if grammars grow proper versions, but it's also a bit of
maintenance headache on its own.

* Re: Update on tree-sitter structure navigation
From: Eli Zaretskii @ 2023-09-09 11:38 UTC
To: Dmitry Gutov
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

> Date: Sat, 9 Sep 2023 13:24:46 +0300
> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>  theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>  wkirschbaum@gmail.com, pedz@easesoftware.com
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 09/09/2023 09:32, Eli Zaretskii wrote:
> >> Date: Fri, 8 Sep 2023 23:52:48 +0300
> >> Cc: emacs-devel@gnu.org, danny@dfreeman.email, theo@thornhill.no,
> >>   jostein@secure.kjonigsen.net, dev@rjt.dev, wkirschbaum@gmail.com,
> >>   pedz@easesoftware.com
> >> From: Dmitry Gutov <dgutov@yandex.ru>
> >>
> >>> FTR, I have nothing against this technique, I just said that it will
> >>> need volunteers to assume this non-trivial job for each major mode,
> >>> and therefore I personally don't believe this to be a reliable
> >>> solution in practice. But if volunteers step forward to do this, I
> >>> don't object.
> >>
> >> I don't see a way around it, if the grammars continue to add breaking
> >> changes.
> >
> > If the only way we see is impractical, this doesn't help, does it?
>
> Is it definitely impractical if it's known to work for NeoVim?

NeoVim is a different editor, with (potentially) different user
audience, different development and release schedules, different
distribution practices, different downstream distros and upgrade
procedures, etc. Without considering all of these aspects and
comparing them to ours, I don't think it's meaningful to say "works"
when we discuss what we should do in our case.

> > No, that'd be worse than what we have now: those commit hashes will
> > quickly become outdated (most grammar libraries are very actively
> > developed), and create the false impression that any later version will
> > not work.
>
> But it's not the first thing the user sees, just internal information:
> we tested with this version last, it's known to work, so if you want to
> have a known well-working configuration, you will install this one.
> They might as well install the latest and try their luck, though.

How is it useful to ask users to use, say, 2-year old versions of
grammar libraries, especially for languages where either the language
or the library (or both) change quickly?

> Further, most important grammars seem to be in a reasonably complete
> state by now.

They add features and fix problems all the time. So I disagree with
the "reasonably complete" part, and so do the developers of those
libraries, evidently.

> > The job is to track all the commits of the corresponding libraries and
> > keep the last commit known to work constantly up-to-date, with delays
> > that are at most days, not weeks or months.
>
> Consider that js-ts-mode is "broken" in Emacs 29.1 now with the latest
> grammar. If there was the last-known-working hash, we could offer the
> users a friendlier way to install it.

How is it friendlier to downgrade to an older version (which would
require fetching it, building it with a C compiler, and installing it)
than to patch a single Lisp file? Actually, people don't even need to
patch their Emacs installations, they could instead have a fixed
version of the Lisp file in their home directories or in site-lisp.

> >> What I'm saying is in this case not doing this job well (e.g. updating
> >> the commit hashes and font-lock/indent rules very rarely) might still be
> >> better than not doing it at all.
> >
> > I strongly disagree. I think that doing this job not well is _worse_
> > than not doing it. It takes just a few days, sometimes a couple of
> > weeks, from submission of the report about a breakage till a fix is
> > available, so people could install it soon enough and be done (since
> > these modes are not preloaded).
>
> But the ts modes in an Emacs release (now in 29.1, later in 29.2, etc)
> remain incompatible with any future grammar changes, right?

That depends on the change: not every change breaks our modes. Only
changes that remove or rename the syntactic elements on which we rely
are breaking changes, from our POV.

> > And we always strive to fix these
> > breakages in a way that makes the code more immune to further changes,
> > so there's hope that with time the frequency of these problems will
> > become lower.
>
> We can continue with this approach too (it's not incompatible with
> saving the last-known-hash anyway), and it could also be of benefit
> later if grammars grow proper versions, but it's also a bit of
> maintenance headache on its own.

I agree it's a maintenance headache, but this issue will cause us
headaches no matter what we do.

* Re: Update on tree-sitter structure navigation
From: Dmitry Gutov @ 2023-09-09 17:04 UTC
To: Eli Zaretskii
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

On 09/09/2023 14:38, Eli Zaretskii wrote:
>>> No, that'd be worse than what we have now: those commit hashes will
>>> quickly become outdated (most grammar libraries are very actively
>>> developed), and create the false impression that any later version will
>>> not work.
>>
>> But it's not the first thing the user sees, just internal information:
>> we tested with this version last, it's known to work, so if you want to
>> have a known well-working configuration, you will install this one.
>> They might as well install the latest and try their luck, though.
>
> How is it useful to ask users to use, say, 2-year old versions of
> grammar libraries, especially for languages where either the language
> or the library (or both) change quickly?

It would be better to use a 2-year-old grammar which works with our mode
than a new grammar which breaks our mode anyway.

We could also take a slightly more advanced approach: first install the
latest version (if the user goes for 'M-x
treesit-install-language-grammar' right away), and then in case of query
errors suggest the version of the grammar known to work. But that's
extra complexity (and more actions on the part of the user as well), and
the actual benefit is hard to foretell.

>> Further, most important grammars seem to be in a reasonably complete
>> state by now.
>
> They add features and fix problems all the time. So I disagree with
> the "reasonably complete" part, and so do the developers of those
> libraries, evidently.

Depends on the individual language, of course.

>>> The job is to track all the commits of the corresponding libraries and
>>> keep the last commit known to work constantly up-to-date, with delays
>>> that are at most days, not weeks or months.
>>
>> Consider that js-ts-mode is "broken" in Emacs 29.1 now with the latest
>> grammar. If there was the last-known-working hash, we could offer the
>> users a friendlier way to install it.
>
> How is it friendlier to downgrade to an older version (which would
> require fetching it, building it with a C compiler, and installing it)
> than to patch a single Lisp file? Actually, people don't even need to
> patch their Emacs installations, they could instead have a fixed
> version of the Lisp file in their home directories or in site-lisp.

So we'll suggest they manually copy the latest version of xxx-js-mode.el
from master over to their site-lisp? That will be our recommendation in
case a grammar breaks?

I suppose we could publish all ts grammars in "core ELPA". Then the
recommendation will be "just upgrade from ELPA" (though keeping in mind
the associated usability problems like the one we discussed with Eglot).

"Core ELPA" also imposes certain restrictions on how the code in the
package is written (backward compatibility checks, etc).

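[Editorial illustration, not part of the original message.] The "slightly more advanced approach" mentioned in the message above (try the installed grammar first, and only on a query error point the user at a revision known to work) might look roughly like the sketch below. The variable my/treesit-known-good-revisions and the function my/treesit-check-query are hypothetical names invented for this example, and the revision string is a placeholder; only treesit-query-compile and the treesit-query-error condition are existing Emacs 29 facilities.

```elisp
(require 'treesit)

;; Hypothetical table: language -> grammar revision last tested with
;; the corresponding major mode.  The value here is a placeholder.
(defvar my/treesit-known-good-revisions
  '((javascript . "KNOWN-GOOD-REVISION"))
  "Alist mapping a language symbol to a grammar revision known to work.")

(defun my/treesit-check-query (language query)
  "Return non-nil if QUERY compiles against the installed LANGUAGE grammar.
If compilation signals `treesit-query-error', report the revision
recorded in `my/treesit-known-good-revisions' and return nil."
  (condition-case err
      ;; A non-nil third argument compiles eagerly, so a query that no
      ;; longer matches the grammar's node names fails here rather
      ;; than later during fontification.
      (progn (treesit-query-compile language query t) t)
    (treesit-query-error
     (message "Query failed for %s: %S; grammar revision known to work: %s"
              language (cdr err)
              (or (alist-get language my/treesit-known-good-revisions)
                  "none recorded"))
     nil)))
```

A mode could call something like (my/treesit-check-query 'javascript '((function_declaration) @f)) at setup time. As the message notes, this adds complexity and extra steps for the user, so the sketch is meant to make the trade-off concrete rather than to recommend it.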
* Re: Update on tree-sitter structure navigation
From: Eli Zaretskii @ 2023-09-09 17:28 UTC
To: Dmitry Gutov
Cc: casouri, emacs-devel, danny, theo, jostein, dev, wkirschbaum, pedz

> Date: Sat, 9 Sep 2023 20:04:07 +0300
> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>  theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>  wkirschbaum@gmail.com, pedz@easesoftware.com
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> > How is it useful to ask users to use, say, 2-year old versions of
> > grammar libraries, especially for languages where either the language
> > or the library (or both) change quickly?
>
> It would be better to use a 2-year-old grammar which works with our mode
> than a new grammar which breaks our mode anyway.

But worse than using a 6-month-old grammar that doesn't break the mode
and has a lot of improvements.

> > How is it friendlier to downgrade to an older version (which would
> > require fetching it, building it with a C compiler, and installing it)
> > than to patch a single Lisp file? Actually, people don't even need to
> > patch their Emacs installations, they could instead have a fixed
> > version of the Lisp file in their home directories or in site-lisp.
>
> So we'll suggest they manually copy the latest version of xxx-js-mode.el
> from master over to their site-lisp? That will be our recommendation in
> case a grammar breaks?

Something like that, yes. Or applying the diffs from the fix.

> I suppose we could publish all ts grammars in "core ELPA".

Yes, that could be a good solution, if nothing better comes up.

* Re: Update on tree-sitter structure navigation
From: Yuan Fu @ 2023-09-12 0:36 UTC
To: Eli Zaretskii
Cc: Dmitry Gutov, emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, pedz

> On Sep 9, 2023, at 10:28 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> Date: Sat, 9 Sep 2023 20:04:07 +0300
>> Cc: casouri@gmail.com, emacs-devel@gnu.org, danny@dfreeman.email,
>> theo@thornhill.no, jostein@secure.kjonigsen.net, dev@rjt.dev,
>> wkirschbaum@gmail.com, pedz@easesoftware.com
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>>> How is it useful to ask users to use, say, 2-year old versions of
>>> grammar libraries, especially for languages where either the language
>>> or the library (or both) change quickly?
>>
>> It would be better to use a 2-year-old grammar which works with our mode
>> than a new grammar which breaks our mode anyway.
>
> But worse than using a 6-month-old grammar that doesn't break the mode
> and has a lot of improvements.
>
>>> How is it friendlier to downgrade to an older version (which would
>>> require fetching it, building it with a C compiler, and installing it)
>>> than to patch a single Lisp file? Actually, people don't even need to
>>> patch their Emacs installations, they could instead have a fixed
>>> version of the Lisp file in their home directories or in site-lisp.
>>
>> So we'll suggest they manually copy the latest version of xxx-js-mode.el
>> from master over to their site-lisp? That will be our recommendation in
>> case a grammar breaks?
>
> Something like that, yes. Or applying the diffs from the fix.
>
>> I suppose we could publish all ts grammars in "core ELPA".
>
> Yes, that could be a good solution, if nothing better comes up.

Does “publish all ts grammars” mean the binary libraries?

Yuan

* Re: Update on tree-sitter structure navigation
From: Dmitry Gutov @ 2023-09-12 10:17 UTC
To: Yuan Fu, Eli Zaretskii
Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, pedz

On 12/09/2023 03:36, Yuan Fu wrote:
>>> I suppose we could publish all ts grammars in "core ELPA".
>> Yes, that could be a good solution, if nothing better comes up.
> Does “publish all ts grammars” mean the binary libraries?

I meant just the major modes, FWIW. Maybe that was poor wording.

* Re: Update on tree-sitter structure navigation
From: Dmitry Gutov @ 2023-09-08 21:05 UTC
To: Yuan Fu
Cc: emacs-devel, Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor, Wilhelm Kirschbaum, Perry Smith

On 08/09/2023 04:04, Yuan Fu wrote:
>> Something I mentioned previously, there is a notion of scopes in tree-sitter docs, see the Local Variables section here: https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables
>>
>> Basically to know which symbols are defined and for how long, the parser needs additional help from the major mode author.
>>
>> Neovim's definition here: https://github.com/nvim-treesitter/nvim-treesitter/blob/master/queries/ruby/locals.scm
> Good call. I’ll add it to my TODO list, but it’ll have a lower priority, since I personally am not really interested in coloring variables different colors. If someone is interested, do please give it a try.

Sure, it's probably more valuable in some languages than others.

In case you have some ideas for the implementation strategy, though,
perhaps mention them inside treesit.el's Commentary (it could also have
a TODO block). Offhand, it doesn't quite fit with what we do with
font-lock.

OTOH, I suppose I could go take a look at NVim's implementation.
