* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
@ 2024-09-21 5:06 Mickey Petersen
2024-09-26 7:42 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Mickey Petersen @ 2024-09-21 5:06 UTC (permalink / raw)
To: 73404
Examples with javascript-mode. It holds for all modes i tested with a
TS equivalent. Let -!- be the starting point and ^N be the subsequent
position after a movement command.
-!-export const add = (a, b) => a + b;
Repeated `C-M-f' yields
export const add = (a, b) => a + b;
^1 ^2 ^3 ^4 ^5 ^6
In other words, it works as it always has.
Meanwhile, in `js-ts-mode':
export const add = (a, b) => a + b;
^1 ^2 ^3 ^4
From ^1 and back with `C-M-b'
export const add-!- = (a, b) => a + b;
export const add = (a, b) => a + b;
^1
At this point, `C-M-b' no longer goes back. It is stuck.
Another example:
-!-console.log("Addition result:", result1);
With `C-M-f':
console.log("Addition result:", result1);
^1 ^2
This affects every single -sexp function that uses either
`forward-sexp-function' or `transpose-sexp-function' to do its job.
Thanks.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-21 5:06 bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Mickey Petersen
@ 2024-09-26 7:42 ` Yuan Fu
2024-09-26 9:56 ` Mickey Petersen
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-09-26 7:42 UTC (permalink / raw)
To: Mickey Petersen; +Cc: 73404
> On Sep 20, 2024, at 10:06 PM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>
>
>
> Examples with javascript-mode. It holds for all modes i tested with a
> TS equivalent. Let -!- be the starting point and ^N be the subsequent
> position after a movement command.
>
> -!-export const add = (a, b) => a + b;
>
> Repeated `C-M-f' yields
>
> export const add = (a, b) => a + b;
>
> ^1 ^2 ^3 ^4 ^5 ^6
>
>
> In other words, it works as it always has.
>
> Meanwhile, in `js-ts-mode':
>
> export const add = (a, b) => a + b;
> ^1 ^2 ^3 ^4
>
> From ^1 and back with `C-M-b'
>
> export const add-!- = (a, b) => a + b;
>
> export const add = (a, b) => a + b;
> ^1
>
> At this point, `C-M-b' no longer goes back. It is stuck.
>
>
> Another example:
>
> -!-console.log("Addition result:", result1);
>
> With `C-M-f':
>
> console.log("Addition result:", result1);
>
> ^1 ^2
>
>
> This affects every single -sexp function that uses either
> `forward-sexp-function' or `transpose-sexp-function' to do its job.
>
> Thanks.
>
I’m aware of this problem and it’s quite inconvenient at times, but right now I don’t have a good solution for it. Ideas are welcome.
Basically tree-sitter’s sexp movement works on subtrees. It determines the position of the point in the whole parse tree and goes forward/back across the next subtree in the parse tree. If there’s no more sibling subtrees in the same level to move over, sexp movement stops like in lisp. The parse tree is invisible and often groups token in unexpected ways, so many times the sexp movement isn’t intuitive.
We might need to add a user option so people can easily turn off tree-sitter sexp movement, since it isn’t a strict upgrade from the generic sexp movement—it’s more of a different flavored sexp movement.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 7:42 ` Yuan Fu
@ 2024-09-26 9:56 ` Mickey Petersen
2024-09-26 10:53 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Mickey Petersen @ 2024-09-26 9:56 UTC (permalink / raw)
To: Yuan Fu; +Cc: 73404
Yuan Fu <casouri@gmail.com> writes:
>> On Sep 20, 2024, at 10:06 PM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>
>>
>>
>> Examples with javascript-mode. It holds for all modes i tested with a
>> TS equivalent. Let -!- be the starting point and ^N be the subsequent
>> position after a movement command.
>>
>> -!-export const add = (a, b) => a + b;
>>
>> Repeated `C-M-f' yields
>>
>> export const add = (a, b) => a + b;
>>
>> ^1 ^2 ^3 ^4 ^5 ^6
>>
>>
>> In other words, it works as it always has.
>>
>> Meanwhile, in `js-ts-mode':
>>
>> export const add = (a, b) => a + b;
>> ^1 ^2 ^3 ^4
>>
>> From ^1 and back with `C-M-b'
>>
>> export const add-!- = (a, b) => a + b;
>>
>> export const add = (a, b) => a + b;
>> ^1
>>
>> At this point, `C-M-b' no longer goes back. It is stuck.
>>
>>
>> Another example:
>>
>> -!-console.log("Addition result:", result1);
>>
>> With `C-M-f':
>>
>> console.log("Addition result:", result1);
>>
>> ^1 ^2
>>
>>
>> This affects every single -sexp function that uses either
>> `forward-sexp-function' or `transpose-sexp-function' to do its job.
>>
>> Thanks.
>>
>
> I’m aware of this problem and it’s quite inconvenient at times, but right now I don’t have a good solution for it. Ideas are welcome.
>
> Basically tree-sitter’s sexp movement works on subtrees. It determines
> the position of the point in the whole parse tree and goes
> forward/back across the next subtree in the parse tree. If there’s no
> more sibling subtrees in the same level to move over, sexp movement
> stops like in lisp. The parse tree is invisible and often groups token
> in unexpected ways, so many times the sexp movement isn’t intuitive.
>
Hi Yuan,
In my opinion, that's not what `sexp' movement is.
Sexp movement is movement by balanced expressions -- and a fallback to
word-like behaviour absent that -- and this is not that. It would be
better to relegate this sort of thing to its own set of keybindings.
> We might need to add a user option so people can easily turn off
> tree-sitter sexp movement, since it isn’t a strict upgrade from the
> generic sexp movement—it’s more of a different flavored sexp movement.
It should be opt-in, not opt-out.
>
> Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 9:56 ` Mickey Petersen
@ 2024-09-26 10:53 ` Eli Zaretskii
2024-09-26 12:13 ` Mickey Petersen
0 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-09-26 10:53 UTC (permalink / raw)
To: Mickey Petersen; +Cc: casouri, 73404
> Cc: 73404@debbugs.gnu.org
> From: Mickey Petersen <mickey@masteringemacs.org>
> Date: Thu, 26 Sep 2024 10:56:35 +0100
>
> In my opinion, that's not what `sexp' movement is.
>
> Sexp movement is movement by balanced expressions -- and a fallback to
> word-like behaviour absent that -- and this is not that. It would be
> better to relegate this sort of thing to its own set of keybindings.
The term "balanced expression" is not well defined in languages other
than Lisp and Lisp-like ones. It is clear what expected when point is
on a brace or a parenthesis, but entirely NOT clear when you start
from something else. For example:
int foo = bar + 2 * baz;
Suppose you start with point at "foo": what would you expect
forward-sexp to do? nothing?
> > We might need to add a user option so people can easily turn off
> > tree-sitter sexp movement, since it isn’t a strict upgrade from the
> > generic sexp movement—it’s more of a different flavored sexp movement.
>
> It should be opt-in, not opt-out.
I disagree. Moving by sub-trees is a natural generalization of sexp
movement for languages where parentheses and braces are rare and far
in-between.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 10:53 ` Eli Zaretskii
@ 2024-09-26 12:13 ` Mickey Petersen
2024-09-26 13:46 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Mickey Petersen @ 2024-09-26 12:13 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, 73404
Eli Zaretskii <eliz@gnu.org> writes:
>> Cc: 73404@debbugs.gnu.org
>> From: Mickey Petersen <mickey@masteringemacs.org>
>> Date: Thu, 26 Sep 2024 10:56:35 +0100
>>
>> In my opinion, that's not what `sexp' movement is.
>>
>> Sexp movement is movement by balanced expressions -- and a fallback to
>> word-like behaviour absent that -- and this is not that. It would be
>> better to relegate this sort of thing to its own set of keybindings.
>
> The term "balanced expression" is not well defined in languages other
> than Lisp and Lisp-like ones. It is clear what expected when point is
> on a brace or a parenthesis, but entirely NOT clear when you start
> from something else. For example:
>
> int foo = bar + 2 * baz;
>
> Suppose you start with point at "foo": what would you expect
> forward-sexp to do? nothing?
>
I expect it to behave as it presently does: default to word-like
behaviour such as M-@ / M-f etc.
Balanced expression is not well defined, de jure, but it is in
practical terms, making it de facto rather well understood and
supported. It behaves reasonably consistently across languages, and I
use *-sexp commands thousands of times a day in a wide range of major modes and
contexts, both in code and also prose.
Most people who use *-sexp (or *-word commands for that matter) in
major modes come to recognise how they work and know what happens to
the text/point in their buffer before they run them.
I would challenge anyone, given even small samples of code, to do the
same with the current TS only implementation.
>> > We might need to add a user option so people can easily turn off
>> > tree-sitter sexp movement, since it isn’t a strict upgrade from the
>> > generic sexp movement—it’s more of a different flavored sexp movement.
>>
>> It should be opt-in, not opt-out.
>
> I disagree. Moving by sub-trees is a natural generalization of sexp
> movement for languages where parentheses and braces are rare and far
> in-between.
Yes, if one can intuit the sub trees' structure, which is not so
simple; and if the selection of commands are sufficiently expressive
enough to let you navigate the tree. I am not sure they are.
The CSTs are deep, wide, and nodes' ranges frequently overlap; they
are multi-dimensional structures that map to a simple 2-dimensional
'grid' in your buffer. Making heads or tails of that is no easy feat.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 12:13 ` Mickey Petersen
@ 2024-09-26 13:46 ` Eli Zaretskii
2024-09-26 15:21 ` Mickey Petersen
0 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-09-26 13:46 UTC (permalink / raw)
To: Mickey Petersen; +Cc: casouri, 73404
> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NO_RECEIVED,
> NO_RELAYS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no
> version=3.4.2
> From: Mickey Petersen <mickey@masteringemacs.org>
> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
> Date: Thu, 26 Sep 2024 13:13:53 +0100
>
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > int foo = bar + 2 * baz;
> >
> > Suppose you start with point at "foo": what would you expect
> > forward-sexp to do? nothing?
> >
>
> I expect it to behave as it presently does: default to word-like
> behaviour such as M-@ / M-f etc.
Then we just lost an opportunity to have more useful commands, because
we already have M-f and M-@.
> Balanced expression is not well defined, de jure, but it is in
> practical terms, making it de facto rather well understood and
> supported. It behaves reasonably consistently across languages, and I
> use *-sexp commands thousands of times a day in a wide range of major modes and
> contexts, both in code and also prose.
I think the ability to move by parse sub-trees is also very useful.
> Most people who use *-sexp (or *-word commands for that matter) in
> major modes come to recognise how they work and know what happens to
> the text/point in their buffer before they run them.
>
> I would challenge anyone, given even small samples of code, to do the
> same with the current TS only implementation.
That's just a matter of getting used to the new semantics.
> > I disagree. Moving by sub-trees is a natural generalization of sexp
> > movement for languages where parentheses and braces are rare and far
> > in-between.
>
> Yes, if one can intuit the sub trees' structure, which is not so
> simple; and if the selection of commands are sufficiently expressive
> enough to let you navigate the tree. I am not sure they are.
There are enough situations where moving by words will also surprise
you. For example, did you know that M-f stops when it finds a
character from a different script? And yet we still use these
commands.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 13:46 ` Eli Zaretskii
@ 2024-09-26 15:21 ` Mickey Petersen
2024-09-26 15:45 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Mickey Petersen @ 2024-09-26 15:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, 73404
Eli Zaretskii <eliz@gnu.org> writes:
>> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NO_RECEIVED,
>> NO_RELAYS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no
>> version=3.4.2
>> From: Mickey Petersen <mickey@masteringemacs.org>
>> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
>> Date: Thu, 26 Sep 2024 13:13:53 +0100
>>
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > int foo = bar + 2 * baz;
>> >
>> > Suppose you start with point at "foo": what would you expect
>> > forward-sexp to do? nothing?
>> >
>>
>> I expect it to behave as it presently does: default to word-like
>> behaviour such as M-@ / M-f etc.
>
> Then we just lost an opportunity to have more useful commands, because
> we already have M-f and M-@.
>
>> Balanced expression is not well defined, de jure, but it is in
>> practical terms, making it de facto rather well understood and
>> supported. It behaves reasonably consistently across languages, and I
>> use *-sexp commands thousands of times a day in a wide range of major modes and
>> contexts, both in code and also prose.
>
> I think the ability to move by parse sub-trees is also very useful.
>
Agreed. What matters is whether the crop of new sexp commands, such as they
are, perform satisfactorily.
Do you think the examples I listed in the original bug report match
your expectations? If so, then it is probably OK to close the bug report.
>> Most people who use *-sexp (or *-word commands for that matter) in
>> major modes come to recognise how they work and know what happens to
>> the text/point in their buffer before they run them.
>>
>> I would challenge anyone, given even small samples of code, to do the
>> same with the current TS only implementation.
>
> That's just a matter of getting used to the new semantics.
>
>> > I disagree. Moving by sub-trees is a natural generalization of sexp
>> > movement for languages where parentheses and braces are rare and far
>> > in-between.
>>
>> Yes, if one can intuit the sub trees' structure, which is not so
>> simple; and if the selection of commands are sufficiently expressive
>> enough to let you navigate the tree. I am not sure they are.
>
> There are enough situations where moving by words will also surprise
> you. For example, did you know that M-f stops when it finds a
> character from a different script? And yet we still use these
> commands.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 15:21 ` Mickey Petersen
@ 2024-09-26 15:45 ` Eli Zaretskii
2024-09-27 5:43 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-09-26 15:45 UTC (permalink / raw)
To: Mickey Petersen; +Cc: casouri, 73404
> From: Mickey Petersen <mickey@masteringemacs.org>
> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
> Date: Thu, 26 Sep 2024 16:21:33 +0100
>
> > I think the ability to move by parse sub-trees is also very useful.
> >
>
> Agreed. What matters is whether the crop of new sexp commands, such as they
> are, perform satisfactorily.
>
> Do you think the examples I listed in the original bug report match
> your expectations? If so, then it is probably OK to close the bug report.
Yes, I do, but let's wait for others to chime in if they have opinions
on this.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-26 15:45 ` Eli Zaretskii
@ 2024-09-27 5:43 ` Yuan Fu
2024-09-29 16:56 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-09-27 5:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Mickey Petersen, 73404
> On Sep 26, 2024, at 8:45 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Mickey Petersen <mickey@masteringemacs.org>
>> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
>> Date: Thu, 26 Sep 2024 16:21:33 +0100
>>
>>> I think the ability to move by parse sub-trees is also very useful.
>>>
>>
>> Agreed. What matters is whether the crop of new sexp commands, such as they
>> are, perform satisfactorily.
Note that you can affect the behavior of tree-sitter sexp movement by defining the sexp “thing” in treesit-thing-settings. Js-ts-mode defines one (js--treesit-sexp-nodes) and it only consider some nodes as sexp. You might be able to tweak the sexp movement to your liking by changing it, or directly modifying the definition for `sexp’ in treesit-thing-settings.
>>
>> Do you think the examples I listed in the original bug report match
>> your expectations? If so, then it is probably OK to close the bug report.
>
> Yes, I do, but let's wait for others to chime in if they have opinions
> on this.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-27 5:43 ` Yuan Fu
@ 2024-09-29 16:56 ` Juri Linkov
2024-10-01 3:57 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-09-29 16:56 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404
> Note that you can affect the behavior of tree-sitter sexp movement by
> defining the sexp “thing” in treesit-thing-settings. Js-ts-mode defines one
> (js--treesit-sexp-nodes) and it only consider some nodes as sexp. You might
> be able to tweak the sexp movement to your liking by changing it, or
> directly modifying the definition for `sexp’ in treesit-thing-settings.
>
>>> Do you think the examples I listed in the original bug report match
>>> your expectations? If so, then it is probably OK to close the bug report.
>>
>> Yes, I do, but let's wait for others to chime in if they have opinions
>> on this.
Here are some ideas how to cover more use cases.
Suppose that a user wants to disable tree-sitter sexp movement
completely to use the default forward-sexp-default-function.
The natural way to do this would be set the list of nodes to nil:
(setq js--treesit-sexp-nodes nil)
However, this currently doesn't work, and requires a change like this:
@@ -2290,10 +2290,12 @@ treesit-forward-sexp
(treesit-node-at (point) (treesit-language-at (point)))))
(or (when (and node-at-point
;; Make sure point is strictly inside node.
- (< (treesit-node-start node-at-point)
- (point)
- (treesit-node-end node-at-point))
- (treesit-node-match-p node-at-point 'text t))
+ (<= (treesit-node-start node-at-point)
+ (point)
+ (treesit-node-end node-at-point))
+ (or (treesit-node-match-p node-at-point 'text t)
+ (not (treesit-node-match-p node-at-point 'sexp t))
+ ))
(forward-sexp-default-function arg)
t)
(if (> arg 0)
Now, the next case: what if the user wants to use the default
forward-sexp-default-function except for the 'binary_expression'
like "a + b" where `C-M-f' should move from "a" to the end of "b":
export const add = (a, b) => -!-a + b;
should move to
export const add = (a, b) => a + b;
^1
The best way for the user would be to customize:
(setq js--treesit-sexp-nodes '("binary_expression"))
But this is not yet handled by the condition above:
(not (treesit-node-match-p node-at-point 'sexp t))
because 'node-at-point' is "identifier".
So we need to use 'treesit-parent-until'
to check if all parent nodes match
'js--treesit-sexp-nodes'. Then it will find
the parent "binary_expression".
I believe something like this will make
treesit-forward-sexp more customizable.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-09-29 16:56 ` Juri Linkov
@ 2024-10-01 3:57 ` Yuan Fu
2024-10-01 17:49 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-10-01 3:57 UTC (permalink / raw)
To: Juri Linkov; +Cc: Eli Zaretskii, Mickey Petersen, 73404
> On Sep 29, 2024, at 9:56 AM, Juri Linkov <juri@linkov.net> wrote:
>
>> Note that you can affect the behavior of tree-sitter sexp movement by
>> defining the sexp “thing” in treesit-thing-settings. Js-ts-mode defines one
>> (js--treesit-sexp-nodes) and it only consider some nodes as sexp. You might
>> be able to tweak the sexp movement to your liking by changing it, or
>> directly modifying the definition for `sexp’ in treesit-thing-settings.
>>
>>>> Do you think the examples I listed in the original bug report match
>>>> your expectations? If so, then it is probably OK to close the bug report.
>>>
>>> Yes, I do, but let's wait for others to chime in if they have opinions
>>> on this.
>
> Here are some ideas how to cover more use cases.
>
> Suppose that a user wants to disable tree-sitter sexp movement
> completely to use the default forward-sexp-default-function.
> The natural way to do this would be set the list of nodes to nil:
>
> (setq js--treesit-sexp-nodes nil)
>
> However, this currently doesn't work, and requires a change like this:
>
> @@ -2290,10 +2290,12 @@ treesit-forward-sexp
> (treesit-node-at (point) (treesit-language-at (point)))))
> (or (when (and node-at-point
> ;; Make sure point is strictly inside node.
> - (< (treesit-node-start node-at-point)
> - (point)
> - (treesit-node-end node-at-point))
> - (treesit-node-match-p node-at-point 'text t))
> + (<= (treesit-node-start node-at-point)
> + (point)
> + (treesit-node-end node-at-point))
> + (or (treesit-node-match-p node-at-point 'text t)
> + (not (treesit-node-match-p node-at-point 'sexp t))
> + ))
> (forward-sexp-default-function arg)
> t)
> (if (> arg 0)
>
> Now, the next case: what if the user wants to use the default
> forward-sexp-default-function except for the 'binary_expression'
> like "a + b" where `C-M-f' should move from "a" to the end of "b":
>
> export const add = (a, b) => -!-a + b;
>
> should move to
>
> export const add = (a, b) => a + b;
>
> ^1
>
> The best way for the user would be to customize:
>
> (setq js--treesit-sexp-nodes '("binary_expression"))
>
> But this is not yet handled by the condition above:
>
> (not (treesit-node-match-p node-at-point 'sexp t))
>
> because 'node-at-point' is "identifier".
> So we need to use 'treesit-parent-until'
> to check if all parent nodes match
> 'js--treesit-sexp-nodes'. Then it will find
> the parent "binary_expression".
>
> I believe something like this will make
> treesit-forward-sexp more customizable.
The user can modify treesit-thing-settings to alter the behavior of sexp navigation, they don’t necessarily need to use js--treesit-sexp-nodes. Maybe we should add a test for (treesit-thing-defined-p 'sexp nil) in treesit-forward-sexp?
Your second example sounds useful, but right now the premise of tree-sitter sexp movement is to use the parse tree primarily, and only use the default sexp movement for comments and strings. What you envisioned seems to be the other way around: use default sexp movement by default, and only use tree-sitter movement under certain conditions. Is that few lines of change able to make such big difference in the logic?
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-10-01 3:57 ` Yuan Fu
@ 2024-10-01 17:49 ` Juri Linkov
2024-10-02 6:14 ` Yuan Fu
2024-12-05 18:52 ` Juri Linkov
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2024-10-01 17:49 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404
> The user can modify treesit-thing-settings to alter the behavior of
> sexp navigation, they don’t necessarily need to use
> js--treesit-sexp-nodes. Maybe we should add a test for
> (treesit-thing-defined-p 'sexp nil) in treesit-forward-sexp?
I tried to do something like this, and everything works nicely with:
@@ -2289,11 +2289,10 @@ treesit-forward-sexp
(node-at-point
(treesit-node-at (point) (treesit-language-at (point)))))
(or (when (and node-at-point
- ;; Make sure point is strictly inside node.
- (< (treesit-node-start node-at-point)
- (point)
- (treesit-node-end node-at-point))
- (treesit-node-match-p node-at-point 'text t))
+ (or (treesit-node-match-p node-at-point 'text t)
+ (not (treesit-thing-at
+ (if (> arg 0) (point) (1- (point)))
+ (treesit-thing-definition 'sexp nil)))))
(forward-sexp-default-function arg)
t)
(if (> arg 0)
The new logic is the following: if there is no sexp thing defined at point,
then fall back to 'forward-sexp-default-function'.
Then after (setq js--treesit-sexp-nodes '("binary_expression"))
'C-M-f' in e.g.
export const add = (a, b) => -!-a + b;
moves point to
export const add = (a, b) => a + b-!-;
The condition (if (> arg 0) (point) (1- (point))) above
is necessary to allow 'C-M-b' to move back to:
export const add = (a, b) => -!-a + b;
Also the condition to make sure point is strictly inside node
was removed to handle the case when point was at the beginning
of the buffer:
-!-
export const add = (a, b) => a + b;
to move after
export-!- const add = (a, b) => a + b;
by 'forward-sexp-default-function'.
> Your second example sounds useful, but right now the premise of tree-sitter
> sexp movement is to use the parse tree primarily, and only use the default
> sexp movement for comments and strings. What you envisioned seems to be the
> other way around: use default sexp movement by default, and only use
> tree-sitter movement under certain conditions. Is that few lines of change
> able to make such big difference in the logic?
I think we need to support both ways:
1. opt-out - where sexp-thing definition is used by default,
and only text-thing allows users to override it;
2. opt-in - where 'forward-sexp-default-function' is used by default,
and user can explicitly define what sexp-things are preferable
for navigation by treesit.
Then in the latter case the users could prefer to use
treesit sexp navigation only for constructions with
"invisible parens". For example, in Ruby there are
two interchangeable syntaxes for code blocks:
1. curly braces {...} that are already handled
by 'forward-sexp-default-function';
2. do...end that can't be handled by 'forward-sexp-default-function',
so treesit is coming to the rescue for the case of such
implicit braces.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-10-01 17:49 ` Juri Linkov
@ 2024-10-02 6:14 ` Yuan Fu
2024-12-05 18:52 ` Juri Linkov
1 sibling, 0 replies; 106+ messages in thread
From: Yuan Fu @ 2024-10-02 6:14 UTC (permalink / raw)
To: Juri Linkov; +Cc: Eli Zaretskii, Mickey Petersen, 73404
> On Oct 1, 2024, at 10:49 AM, Juri Linkov <juri@linkov.net> wrote:
>
>> The user can modify treesit-thing-settings to alter the behavior of
>> sexp navigation, they don’t necessarily need to use
>> js--treesit-sexp-nodes. Maybe we should add a test for
>> (treesit-thing-defined-p 'sexp nil) in treesit-forward-sexp?
>
> I tried to do something like this, and everything works nicely with:
>
> @@ -2289,11 +2289,10 @@ treesit-forward-sexp
> (node-at-point
> (treesit-node-at (point) (treesit-language-at (point)))))
> (or (when (and node-at-point
> - ;; Make sure point is strictly inside node.
> - (< (treesit-node-start node-at-point)
> - (point)
> - (treesit-node-end node-at-point))
> - (treesit-node-match-p node-at-point 'text t))
> + (or (treesit-node-match-p node-at-point 'text t)
> + (not (treesit-thing-at
> + (if (> arg 0) (point) (1- (point)))
> + (treesit-thing-definition 'sexp nil)))))
> (forward-sexp-default-function arg)
> t)
> (if (> arg 0)
>
> The new logic is the following: if there is no sexp thing defined at point,
> then fall back to 'forward-sexp-default-function'.
>
> Then after (setq js--treesit-sexp-nodes '("binary_expression"))
> 'C-M-f' in e.g.
>
> export const add = (a, b) => -!-a + b;
>
> moves point to
>
> export const add = (a, b) => a + b-!-;
>
> The condition (if (> arg 0) (point) (1- (point))) above
> is necessary to allow 'C-M-b' to move back to:
>
> export const add = (a, b) => -!-a + b;
>
> Also the condition to make sure point is strictly inside node
> was removed to handle the case when point was at the beginning
> of the buffer:
>
> -!-
> export const add = (a, b) => a + b;
>
> to move after
>
> export-!- const add = (a, b) => a + b;
>
> by 'forward-sexp-default-function'.
Sounds good. Feel free to install on master if you think it works well :-)
>
>> Your second example sounds useful, but right now the premise of tree-sitter
>> sexp movement is to use the parse tree primarily, and only use the default
>> sexp movement for comments and strings. What you envisioned seems to be the
>> other way around: use default sexp movement by default, and only use
>> tree-sitter movement under certain conditions. Is that few lines of change
>> able to make such big difference in the logic?
>
> I think we need to support both ways:
>
> 1. opt-out - where sexp-thing definition is used by default,
> and only text-thing allows users to override it;
>
> 2. opt-in - where 'forward-sexp-default-function' is used by default,
> and user can explicitly define what sexp-things are preferable
> for navigation by treesit.
>
> Then in the latter case the users could prefer to use
> treesit sexp navigation only for constructions with
> "invisible parens". For example, in Ruby there are
> two interchangeable syntaxes for code blocks:
>
> 1. curly braces {...} that are already handled
> by 'forward-sexp-default-function';
>
> 2. do...end that can't be handled by 'forward-sexp-default-function',
> so treesit is coming to the rescue for the case of such
> implicit braces.
Sounds good to me, I wonder if there are clever way to implement this. If there isn’t, we’d need to define two sets treesit-sexp functions and add a custom option to control which one to use. Seems a bit clunky to me.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-10-01 17:49 ` Juri Linkov
2024-10-02 6:14 ` Yuan Fu
@ 2024-12-05 18:52 ` Juri Linkov
2024-12-05 19:53 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-05 18:52 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404
> The new logic is the following: if there is no sexp thing defined at point,
> then fall back to 'forward-sexp-default-function'.
>
> Then after (setq js--treesit-sexp-nodes '("binary_expression"))
> 'C-M-f' in e.g.
>
> export const add = (a, b) => -!-a + b;
>
> moves point to
>
> export const add = (a, b) => a + b-!-;
Unfortunately, I still can't find a way to handle such case
that from
export const add = (a, b) -!- => a + b;
typing 'C-M-f' should jump to the end of the next sexp
(to the end of whole "binary_expression"):
export const add = (a, b) => a + b-!-;
since only tree-sitter knows about "binary_expression",
so 'forward-sexp-default-function' can't be used here.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-05 18:52 ` Juri Linkov
@ 2024-12-05 19:53 ` Juri Linkov
2024-12-10 17:20 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-05 19:53 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404
>> The new logic is the following: if there is no sexp thing defined at point,
>> then fall back to 'forward-sexp-default-function'.
>>
>> Then after (setq js--treesit-sexp-nodes '("binary_expression"))
>> 'C-M-f' in e.g.
>>
>> export const add = (a, b) => -!-a + b;
>>
>> moves point to
>>
>> export const add = (a, b) => a + b-!-;
>
> Unfortunately, I still can't find a way to handle such case
> that from
>
> export const add = (a, b) -!- => a + b;
>
> typing 'C-M-f' should jump to the end of the next sexp
> (to the end of whole "binary_expression"):
>
> export const add = (a, b) => a + b-!-;
>
> since only tree-sitter knows about "binary_expression",
> so 'forward-sexp-default-function' can't be used here.
Actually, I have one idea of possible heuristics:
1. first try 'forward-sexp-default-function'
2. if it crosses the boundary of sexp defined by 'treesit-thing-settings'
then use 'treesit-end-of-thing' instead
This should work. Ok, will try.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-05 19:53 ` Juri Linkov
@ 2024-12-10 17:20 ` Juri Linkov
2024-12-11 6:31 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-10 17:20 UTC (permalink / raw)
To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404
[-- Attachment #1: Type: text/plain, Size: 1619 bytes --]
>> export const add = (a, b) -!- => a + b;
>>
>> typing 'C-M-f' should jump to the end of the next sexp
>> (to the end of whole "binary_expression"):
>>
>> export const add = (a, b) => a + b-!-;
>>
>> since only tree-sitter knows about "binary_expression",
>> so 'forward-sexp-default-function' can't be used here.
>
> Actually, I have one idea of possible heuristics:
>
> 1. first try 'forward-sexp-default-function'
> 2. if it crosses the boundary of sexp defined by 'treesit-thing-settings'
> then use 'treesit-end-of-thing' instead
>
> This should work. Ok, will try.
This is implemented now in the attached patch, and it works nicely.
The main rule is the following: 'forward-sexp-default-function'
should not go out of the current thing, neither go inside a sibling.
So we use 'treesit-end-of-thing' in such cases. But when inside
a thing or outside a thing, use the default function.
This supposes that such things as "identifier" in js should be
removed from 'treesit-thing-settings' since identifiers should be
navigated the same way as such keywords as "export" and "const"
using 'forward-sexp-default-function'.
What should remain in 'treesit-thing-settings' are only grouping
constructs such as "parenthesized_expression" and "statement_block".
Removing "identifier" from 'treesit-thing-settings' exposed a problem
in 'treesit-navigate-thing'. This line
((and (null next) (null prev)) parent)
tries to go out of the current thing to its parent,
thus breaking the main principle that 'forward-sexp'
should move forward across siblings only. But removing
this line fixed the problem:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-forward-sexp.patch --]
[-- Type: text/x-diff, Size: 3264 bytes --]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index db8f7a7595d..4fcdbe7fc56 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2373,21 +2373,41 @@ treesit-forward-sexp
What constitutes as text and source code sexp is determined
by `text' and `sexp' in `treesit-thing-settings'."
(interactive "^p")
- (let ((arg (or arg 1))
- (pred (or treesit-sexp-type-regexp 'sexp))
- (node-at-point
- (treesit-node-at (point) (treesit-language-at (point)))))
- (or (when (and node-at-point
- ;; Make sure point is strictly inside node.
- (< (treesit-node-start node-at-point)
- (point)
- (treesit-node-end node-at-point))
- (treesit-node-match-p node-at-point 'text t))
- (forward-sexp-default-function arg)
- t)
- (if (> arg 0)
- (treesit-end-of-thing pred (abs arg) 'restricted)
- (treesit-beginning-of-thing pred (abs arg) 'restricted))
+ (let* ((arg (or arg 1))
+ (pred (or treesit-sexp-type-regexp 'sexp))
+ (current-thing (treesit-thing-at (point) pred t))
+ (default-pos
+ (condition-case _
+ (save-excursion
+ (forward-sexp-default-function arg)
+ (point))
+ (scan-error nil)))
+ (default-pos (unless (eq (point) default-pos) default-pos))
+ (sibling-pos
+ (save-excursion
+ (and (if (> arg 0)
+ (treesit-end-of-thing pred (abs arg) 'restricted)
+ (treesit-beginning-of-thing pred (abs arg) 'restricted))
+ (point))))
+ (sibling (when sibling-pos
+ (if (> arg 0)
+ (treesit-thing-prev sibling-pos pred)
+ (treesit-thing-next sibling-pos pred)))))
+
+ ;; 'forward-sexp-default-function' should not go out of the current thing,
+ ;; neither go inside the next thing, neither go over the next thing
+ (or (when (and default-pos
+ (or (null current-thing)
+ (if (> arg 0)
+ (< default-pos (treesit-node-end current-thing))
+ (> default-pos (treesit-node-start current-thing))))
+ (or (null sibling)
+ (if (> arg 0)
+ (< default-pos (treesit-node-start sibling))
+ (> default-pos (treesit-node-end sibling)))))
+ (goto-char default-pos))
+ (when sibling-pos
+ (goto-char sibling-pos))
;; If we couldn't move, we should signal an error and report
;; the obstacle, like `forward-sexp' does. If we couldn't
;; find a parent, we simply return nil without moving point,
@@ -2849,8 +2869,7 @@ treesit-navigate-thing
(if (eq tactic 'restricted)
(setq pos (funcall
advance
- (cond ((and (null next) (null prev)) parent)
- ((> arg 0) next)
+ (cond ((> arg 0) next)
(t prev))))
;; For `nested', it's a bit more work:
;; Move...
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-10 17:20 ` Juri Linkov
@ 2024-12-11 6:31 ` Yuan Fu
2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-19 7:34 ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
0 siblings, 2 replies; 106+ messages in thread
From: Yuan Fu @ 2024-12-11 6:31 UTC (permalink / raw)
To: Juri Linkov
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Stefan Monnier
> On Dec 10, 2024, at 9:20 AM, Juri Linkov <juri@linkov.net> wrote:
>
>>> export const add = (a, b) -!- => a + b;
>>>
>>> typing 'C-M-f' should jump to the end of the next sexp
>>> (to the end of whole "binary_expression"):
>>>
>>> export const add = (a, b) => a + b-!-;
>>>
>>> since only tree-sitter knows about "binary_expression",
>>> so 'forward-sexp-default-function' can't be used here.
>>
>> Actually, I have one idea of possible heuristics:
>>
>> 1. first try 'forward-sexp-default-function'
>> 2. if it crosses the boundary of sexp defined by 'treesit-thing-settings'
>> then use 'treesit-end-of-thing' instead
>>
>> This should work. Ok, will try.
>
> This is implemented now in the attached patch, and it works nicely.
>
> The main rule is the following: 'forward-sexp-default-function'
> should not go out of the current thing, neither go inside a sibling.
> So we use 'treesit-end-of-thing' in such cases. But when inside
> a thing or outside a thing, use the default function.
>
> This supposes that such things as "identifier" in js should be
> removed from 'treesit-thing-settings' since identifiers should be
> navigated the same way as such keywords as "export" and "const"
> using 'forward-sexp-default-function'.
>
> What should remain in 'treesit-thing-settings' are only grouping
> constructs such as "parenthesized_expression" and "statement_block".
Ah, this matches my idea of defining sexp in other languages as “repeatable construct/list-like construct”. We went with “every syntactic construct” at the time, which I didn’t object to, but I’m definitely happier with the repeatable construct approach. Including Stefan and Theo since they were part of the original sexp navigation discussion.
My only concern is that would the result be a bit unpredictable/confusing when we mix the result of two logic together in such an involved way? We can push to master and try it out for a while. I use tree-sitter sexp navigation for work every day, albeit strictly for navigating list-like constructs—I use forward/backward-word for smaller navigation.
>
> Removing "identifier" from 'treesit-thing-settings' exposed a problem
> in 'treesit-navigate-thing'. This line
>
> ((and (null next) (null prev)) parent)
>
> tries to go out of the current thing to its parent,
> thus breaking the main principle that 'forward-sexp'
> should move forward across siblings only. But removing
> this line fixed the problem:
Thanks, LGTM.
>
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index db8f7a7595d..4fcdbe7fc56 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -2373,21 +2373,41 @@ treesit-forward-sexp
> What constitutes as text and source code sexp is determined
> by `text' and `sexp' in `treesit-thing-settings'."
> (interactive "^p")
> - (let ((arg (or arg 1))
> - (pred (or treesit-sexp-type-regexp 'sexp))
> - (node-at-point
> - (treesit-node-at (point) (treesit-language-at (point)))))
> - (or (when (and node-at-point
> - ;; Make sure point is strictly inside node.
> - (< (treesit-node-start node-at-point)
> - (point)
> - (treesit-node-end node-at-point))
> - (treesit-node-match-p node-at-point 'text t))
> - (forward-sexp-default-function arg)
> - t)
> - (if (> arg 0)
> - (treesit-end-of-thing pred (abs arg) 'restricted)
> - (treesit-beginning-of-thing pred (abs arg) 'restricted))
> + (let* ((arg (or arg 1))
> + (pred (or treesit-sexp-type-regexp 'sexp))
> + (current-thing (treesit-thing-at (point) pred t))
> + (default-pos
> + (condition-case _
> + (save-excursion
> + (forward-sexp-default-function arg)
> + (point))
> + (scan-error nil)))
> + (default-pos (unless (eq (point) default-pos) default-pos))
> + (sibling-pos
> + (save-excursion
> + (and (if (> arg 0)
> + (treesit-end-of-thing pred (abs arg) 'restricted)
> + (treesit-beginning-of-thing pred (abs arg) 'restricted))
> + (point))))
> + (sibling (when sibling-pos
> + (if (> arg 0)
> + (treesit-thing-prev sibling-pos pred)
> + (treesit-thing-next sibling-pos pred)))))
> +
> + ;; 'forward-sexp-default-function' should not go out of the current thing,
> + ;; neither go inside the next thing, neither go over the next thing
> + (or (when (and default-pos
> + (or (null current-thing)
> + (if (> arg 0)
> + (< default-pos (treesit-node-end current-thing))
> + (> default-pos (treesit-node-start current-thing))))
> + (or (null sibling)
> + (if (> arg 0)
> + (< default-pos (treesit-node-start sibling))
> + (> default-pos (treesit-node-end sibling)))))
> + (goto-char default-pos))
> + (when sibling-pos
> + (goto-char sibling-pos))
> ;; If we couldn't move, we should signal an error and report
> ;; the obstacle, like `forward-sexp' does. If we couldn't
> ;; find a parent, we simply return nil without moving point,
> @@ -2849,8 +2869,7 @@ treesit-navigate-thing
> (if (eq tactic 'restricted)
> (setq pos (funcall
> advance
> - (cond ((and (null next) (null prev)) parent)
> - ((> arg 0) next)
> + (cond ((> arg 0) next)
> (t prev))))
> ;; For `nested', it's a bit more work:
> ;; Move...
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-11 6:31 ` Yuan Fu
@ 2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 15:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
` (2 more replies)
2024-12-19 7:34 ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
1 sibling, 3 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-11 15:12 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Juri Linkov
> Ah, this matches my idea of defining sexp in other languages as “repeatable
> construct/list-like construct”. We went with “every syntactic construct” at
> the time, which I didn’t object to, but I’m definitely happier with the
> repeatable construct approach. Including Stefan and Theo since they were
> part of the original sexp navigation discussion.
FWIW, we have both `forward-list` and `forward-list` and the new
behavior you suggest sounds closer to the historical behavior of
`forward-list` than `forward-sexp`.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-11 15:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 16:50 ` Mickey Petersen
2024-12-11 18:27 ` Yuan Fu
2 siblings, 0 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-11 15:29 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Juri Linkov
> FWIW, we have both `forward-list` and `forward-list` and the new
^^^^
sexp
Otherwise it sounds smarter than it is.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 15:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-11 16:50 ` Mickey Petersen
2024-12-11 18:27 ` Yuan Fu
2 siblings, 0 replies; 106+ messages in thread
From: Mickey Petersen @ 2024-12-11 16:50 UTC (permalink / raw)
To: Stefan Monnier
Cc: 73404, Yuan Fu, Theodor Thornhill, Eli Zaretskii, Juri Linkov
Stefan Monnier <monnier@iro.umontreal.ca> writes:
>> Ah, this matches my idea of defining sexp in other languages as “repeatable
>> construct/list-like construct”. We went with “every syntactic construct” at
>> the time, which I didn’t object to, but I’m definitely happier with the
>> repeatable construct approach. Including Stefan and Theo since they were
>> part of the original sexp navigation discussion.
>
> FWIW, we have both `forward-list` and `forward-list` and the new
> behavior you suggest sounds closer to the historical behavior of
> `forward-list` than `forward-sexp`.
>
Indeed, in Combobulate `<forward/backward>-list' is explicitly used
for sibling navigation, and `<forward/backward>-sexp' instead does
what it does in plain major modes, but tweaked ever so slightly.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 15:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 16:50 ` Mickey Petersen
@ 2024-12-11 18:27 ` Yuan Fu
2024-12-12 7:17 ` Juri Linkov
2 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-12-11 18:27 UTC (permalink / raw)
To: Stefan Monnier
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Juri Linkov
> On Dec 11, 2024, at 7:12 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> Ah, this matches my idea of defining sexp in other languages as “repeatable
>> construct/list-like construct”. We went with “every syntactic construct” at
>> the time, which I didn’t object to, but I’m definitely happier with the
>> repeatable construct approach. Including Stefan and Theo since they were
>> part of the original sexp navigation discussion.
>
> FWIW, we have both `forward-list` and `forward-list` and the new
> behavior you suggest sounds closer to the historical behavior of
> `forward-list` than `forward-sexp`.
>
>
> Stefan
>
Actually, what’s the difference between forward-list and forward-sexp? I always thought they are the same at least for Lisp.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-11 18:27 ` Yuan Fu
@ 2024-12-12 7:17 ` Juri Linkov
2024-12-12 7:40 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-12 7:17 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, Stefan Monnier,
73404
>>> Ah, this matches my idea of defining sexp in other languages as “repeatable
>>> construct/list-like construct”. We went with “every syntactic construct” at
>>> the time, which I didn’t object to, but I’m definitely happier with the
>>> repeatable construct approach. Including Stefan and Theo since they were
>>> part of the original sexp navigation discussion.
>>
>> FWIW, we have both `forward-list` and `forward-list` and the new
>> behavior you suggest sounds closer to the historical behavior of
>> `forward-list` than `forward-sexp`.
>
> Actually, what’s the difference between forward-list and forward-sexp?
> I always thought they are the same at least for Lisp.
forward-sexp moves over a balanced parenthetical group like
forward-list does. Plus forward-sexp also moves over an atom
such as a symbol, a number.
The problem is that treesit adds too much structural information
to such simple things as a symbol and a number. For example, in js
a simple keyword "export" gets the "(export_statement export" subtree,
Another keyword "const" gets "(lexical_declaration kind: const", etc.
Therefore for such symbols forward-sexp needs to bypass the structure
and use simpler syntactic information to move over them like on a flat list.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 7:17 ` Juri Linkov
@ 2024-12-12 7:40 ` Eli Zaretskii
2024-12-12 7:58 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-12-12 7:40 UTC (permalink / raw)
To: Juri Linkov; +Cc: theo, casouri, mickey, monnier, 73404
> From: Juri Linkov <juri@linkov.net>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>, Eli Zaretskii
> <eliz@gnu.org>, Mickey Petersen <mickey@masteringemacs.org>,
> 73404@debbugs.gnu.org, Theodor Thornhill <theo@thornhill.no>
> Date: Thu, 12 Dec 2024 09:17:44 +0200
>
> >>> Ah, this matches my idea of defining sexp in other languages as “repeatable
> >>> construct/list-like construct”. We went with “every syntactic construct” at
> >>> the time, which I didn’t object to, but I’m definitely happier with the
> >>> repeatable construct approach. Including Stefan and Theo since they were
> >>> part of the original sexp navigation discussion.
> >>
> >> FWIW, we have both `forward-list` and `forward-list` and the new
> >> behavior you suggest sounds closer to the historical behavior of
> >> `forward-list` than `forward-sexp`.
> >
> > Actually, what’s the difference between forward-list and forward-sexp?
> > I always thought they are the same at least for Lisp.
>
> forward-sexp moves over a balanced parenthetical group like
> forward-list does. Plus forward-sexp also moves over an atom
> such as a symbol, a number.
>
> The problem is that treesit adds too much structural information
> to such simple things as a symbol and a number. For example, in js
> a simple keyword "export" gets the "(export_statement export" subtree,
> Another keyword "const" gets "(lexical_declaration kind: const", etc.
>
> Therefore for such symbols forward-sexp needs to bypass the structure
> and use simpler syntactic information to move over them like on a flat list.
If you mean we should ignore the information provided by tree-sitter
and instead use our own syntactic information, then that sounds wrong
to me, FWIW. Why cannot we understand enough of the tree-sitter
structural information to move like we want? Presumably, the
structural information provided by tree-sitter is a portion of a parse
tree, which to me means we should be able to move between the parse
tree's nodes as long as we understand the tree and can interpret it in
our terms.
Aren't there some grammar-agnostic traits of tree-sitter nodes that
would allow us to interpret the nodes in language-independent terms?
If that is not available, then each major mode will have to provide
treesit.el with a way to interpret the tree-sitter nodes of the
corresponding grammar in a way that will allow sexp movement, thus
providing an abstraction layer that treesit.el could use for the
movement commands.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 7:40 ` Eli Zaretskii
@ 2024-12-12 7:58 ` Juri Linkov
2024-12-12 8:14 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-12 7:58 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: theo, casouri, mickey, monnier, 73404
>> forward-sexp moves over a balanced parenthetical group like
>> forward-list does. Plus forward-sexp also moves over an atom
>> such as a symbol, a number.
>>
>> The problem is that treesit adds too much structural information
>> to such simple things as a symbol and a number. For example, in js
>> a simple keyword "export" gets the "(export_statement export" subtree,
>> Another keyword "const" gets "(lexical_declaration kind: const", etc.
>>
>> Therefore for such symbols forward-sexp needs to bypass the structure
>> and use simpler syntactic information to move over them like on a flat list.
>
> If you mean we should ignore the information provided by tree-sitter
> and instead use our own syntactic information, then that sounds wrong
> to me, FWIW. Why cannot we understand enough of the tree-sitter
> structural information to move like we want? Presumably, the
> structural information provided by tree-sitter is a portion of a parse
> tree, which to me means we should be able to move between the parse
> tree's nodes as long as we understand the tree and can interpret it in
> our terms.
>
> Aren't there some grammar-agnostic traits of tree-sitter nodes that
> would allow us to interpret the nodes in language-independent terms?
> If that is not available, then each major mode will have to provide
> treesit.el with a way to interpret the tree-sitter nodes of the
> corresponding grammar in a way that will allow sexp movement, thus
> providing an abstraction layer that treesit.el could use for the
> movement commands.
Maybe it would be possible to use something like 'flatten-tree'
on the treesit's syntax tree? But this will require the addition
of a lot of rules to specify what nodes should be flattened.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 7:58 ` Juri Linkov
@ 2024-12-12 8:14 ` Juri Linkov
2024-12-12 16:31 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-12 8:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404
> Maybe it would be possible to use something like 'flatten-tree'
> on the treesit's syntax tree? But this will require the addition
> of a lot of rules to specify what nodes should be flattened.
A better idea: instead of 'sexp' in treesit-thing-settings
define separately 'list' and 'atom', e.g. replace
(setq-local treesit-thing-settings
`((javascript
(sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes)))))
with
(setq-local treesit-thing-settings
`((javascript
(list ,(js--regexp-opt-symbol js--treesit-list-nodes))
(atom ,(js--regexp-opt-symbol js--treesit-atom-nodes)))))
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 8:14 ` Juri Linkov
@ 2024-12-12 16:31 ` Juri Linkov
2024-12-12 17:49 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-12 16:31 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: theo, casouri, mickey, monnier, 73404
> A better idea: instead of 'sexp' in treesit-thing-settings
> define separately 'list' and 'atom', e.g. replace
>
> (setq-local treesit-thing-settings
> `((javascript
> (sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes)))))
>
> with
>
> (setq-local treesit-thing-settings
> `((javascript
> (list ,(js--regexp-opt-symbol js--treesit-list-nodes))
> (atom ,(js--regexp-opt-symbol js--treesit-atom-nodes)))))
Still the problem is that using the atoms from the tree
doesn't provide backward-compatibility with non-ts modes
and how C-M-f moves on non-list atoms there.
One way would be to extract anonymous text leaf nodes
such as "export" and "const" from
(export_statement "export" declaration:
(lexical_declaration kind: "const"
But still need to check symbol/word syntax to omit such nodes
as "+" from
(binary_expression left: (identifier) operator: "+"
Therefore to provide backward-compatibility with non-ts modes
in regard to C-M-f navigation, navigation on atoms should follow
the Sword/Ssymbol rules of 'scan_lists' with non-nil 'sexpflag'.
So an atom should be a syntactical entity, not structural.
This means that treesit-forward-sexp should use the 'list' thing
with syntactical atoms.
For example, for 'C-M-f' on
var p = {
case: 'zzzz',
-!-default: 'donkey',
tee: 'ornery'
};
in js-ts-mode it would be unexpected to move to
var p = {
case: 'zzzz',
default: 'donkey'-!-,
tee: 'ornery'
};
because js-mode moves to
var p = {
case: 'zzzz',
default-!-: 'donkey',
tee: 'ornery'
};
But anyway 'list' should be customizable as 'sexp' already is.
OTOH, transpose-sexps should use treesit 'sexp' with more fine-grained
lists that are not suitable for forward-sexp.
(And we have no transpose-lists.)
For example, it's expected for transpose-sexps to transpose
a pair of key:value defined by 'sexp':
var p = {
default: 'donkey',
-!-case: 'zzzz',
tee: 'ornery'
};
that will be a big improvement when comparing to js-mode.
This also will close bug#60655.
And if someone want to transpose adjacent atoms (symbols or words),
there is transpose-words.
In summary:
The current implementation in treesit-forward-sexp is more like forward-list.
So let's rename it to treesit-forward-list, then create a new implementation
of treesit-forward-sexp that uses the 'list' thing together with
syntactical atoms.
Another variant is to leave treesit-forward-sexp as is,
and create a new function treesit-forward-sexp-with-list
that uses the 'list' thing.
And anyway let's keep the current implementation
of treesit-transpose-sexps that uses the 'sexp' thing.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 16:31 ` Juri Linkov
@ 2024-12-12 17:49 ` Juri Linkov
2024-12-12 19:13 ` Eli Zaretskii
2024-12-18 7:37 ` Juri Linkov
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-12 17:49 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404
[-- Attachment #1: Type: text/plain, Size: 372 bytes --]
> Another variant is to leave treesit-forward-sexp as is,
> and create a new function treesit-forward-sexp-with-list
> that uses the 'list' thing.
This patch keep the current function treesit-forward-sexp,
and creates a new function treesit-forward-sexp-list
that uses the 'sexp-list' thing to navigate lists while
using forward-sexp-default-function to navigate atoms:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-forward-sexp-list.patch --]
[-- Type: text/x-diff, Size: 5601 bytes --]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index db8f7a7595d..f064be55b9c 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2400,6 +2400,68 @@ treesit-forward-sexp
(treesit-node-start boundary)
(treesit-node-end boundary)))))))
+(defun treesit-forward-sexp-list (&optional arg)
+ "Tree-sitter implementation for `forward-sexp-function'.
+
+ARG is described in the docstring of `forward-sexp-function'.
+
+If point is inside a text environment where tree-sitter is not
+supported, go forward a sexp using `forward-sexp-default-function'.
+If point is inside code, use tree-sitter functions with the
+following behavior. If there are no further sexps to move across,
+signal `scan-error' like `forward-sexp' does. If point is already
+at top-level, return nil without moving point.
+
+What constitutes as text and source code sexp is determined
+by `text' and `sexp' in `treesit-thing-settings'."
+ (interactive "^p")
+ (let* ((arg (or arg 1))
+ (pred (or treesit-sexp-type-regexp 'sexp-list))
+ (current-thing (treesit-thing-at (point) pred t))
+ (default-pos
+ (condition-case _
+ (save-excursion
+ (forward-sexp-default-function arg)
+ (point))
+ (scan-error nil)))
+ (default-pos (unless (eq (point) default-pos) default-pos))
+ (sibling-pos
+ (save-excursion
+ (and (if (> arg 0)
+ (treesit-end-of-thing pred (abs arg) 'restricted)
+ (treesit-beginning-of-thing pred (abs arg) 'restricted))
+ (point))))
+ (sibling (when sibling-pos
+ (if (> arg 0)
+ (treesit-thing-prev sibling-pos pred)
+ (treesit-thing-next sibling-pos pred)))))
+
+ ;; 'forward-sexp-default-function' should not go out of the current thing,
+ ;; neither go inside the next thing, neither go over the next thing
+ (or (when (and default-pos
+ (or (null current-thing)
+ (if (> arg 0)
+ (< default-pos (treesit-node-end current-thing))
+ (> default-pos (treesit-node-start current-thing))))
+ (or (null sibling)
+ (if (> arg 0)
+ (< default-pos (treesit-node-start sibling))
+ (> default-pos (treesit-node-end sibling)))))
+ (goto-char default-pos))
+ (when sibling-pos
+ (goto-char sibling-pos))
+ ;; If we couldn't move, we should signal an error and report
+ ;; the obstacle, like `forward-sexp' does. If we couldn't
+ ;; find a parent, we simply return nil without moving point,
+ ;; then functions like `up-list' will signal "at top level".
+ (when-let* ((parent (treesit-thing-at (point) pred t))
+ (boundary (if (> arg 0)
+ (treesit-node-child parent -1)
+ (treesit-node-child parent 0))))
+ (signal 'scan-error (list "No more sexp to move across"
+ (treesit-node-start boundary)
+ (treesit-node-end boundary)))))))
+
(defun treesit-transpose-sexps (&optional arg)
"Tree-sitter `transpose-sexps' function.
ARG is the same as in `transpose-sexps'.
@@ -2849,7 +2911,7 @@ treesit-navigate-thing
(if (eq tactic 'restricted)
(setq pos (funcall
advance
- (cond ((and (null next) (null prev)) parent)
+ (cond ((and (null next) (null prev) (not (eq thing 'sexp-list))) parent)
((> arg 0) next)
(t prev))))
;; For `nested', it's a bit more work:
@@ -3246,6 +3308,9 @@ treesit-major-mode-setup
(setq-local forward-sexp-function #'treesit-forward-sexp)
(setq-local transpose-sexps-function #'treesit-transpose-sexps))
+ (when (treesit-thing-defined-p 'sexp-list nil)
+ (setq-local forward-sexp-function #'treesit-forward-sexp-list))
+
(when (treesit-thing-defined-p 'sentence nil)
(setq-local forward-sentence-function #'treesit-forward-sentence))
diff --git a/lisp/progmodes/js.el b/lisp/progmodes/js.el
index dbf721e8d0f..c4d33564e80 100644
--- a/lisp/progmodes/js.el
+++ b/lisp/progmodes/js.el
@@ -3878,6 +3878,19 @@ js--treesit-sexp-nodes
"Nodes that designate sexps in JavaScript.
See `treesit-thing-settings' for more information.")
+(defvar js--treesit-sexp-list-nodes
+ '("formal_parameters"
+ "arguments"
+ "statement_block"
+ "parenthesized_expression"
+ "switch_body"
+ "array"
+ "object"
+ "string"
+ "regex")
+ "Nodes that designate lists in JavaScript.
+See `treesit-thing-settings' for more information.")
+
(defvar js--treesit-jsdoc-beginning-regexp (rx bos "/**")
"Regular expression matching the beginning of a jsdoc block comment.")
@@ -3921,6 +3934,7 @@ js-ts-mode
(setq-local treesit-thing-settings
`((javascript
(sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes))
+ (sexp-list ,(js--regexp-opt-symbol js--treesit-sexp-list-nodes))
(sentence ,(js--regexp-opt-symbol js--treesit-sentence-nodes))
(text ,(js--regexp-opt-symbol '("comment"
"string_fragment"))))))
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 17:49 ` Juri Linkov
@ 2024-12-12 19:13 ` Eli Zaretskii
2024-12-13 7:06 ` Juri Linkov
2024-12-18 7:37 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-12-12 19:13 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, monnier, 73404
> From: Juri Linkov <juri@linkov.net>
> Cc: theo@thornhill.no, casouri@gmail.com, mickey@masteringemacs.org,
> monnier@iro.umontreal.ca, 73404@debbugs.gnu.org
> Date: Thu, 12 Dec 2024 19:49:30 +0200
>
> > Another variant is to leave treesit-forward-sexp as is,
> > and create a new function treesit-forward-sexp-with-list
> > that uses the 'list' thing.
>
> This patch keep the current function treesit-forward-sexp,
> and creates a new function treesit-forward-sexp-list
> that uses the 'sexp-list' thing to navigate lists while
> using forward-sexp-default-function to navigate atoms:
Introducing a new command raises the question how should users
navigate using the two. Will C-M-<RIGHT> invoke forward-sexp or the
new command (or maybe one or the other in some dwin-ish way)?
What are the problems with teaching treesit-forward-sexp do what the
new command does? IOW, why do we need a new command?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 19:13 ` Eli Zaretskii
@ 2024-12-13 7:06 ` Juri Linkov
2024-12-14 11:02 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-13 7:06 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404
>> This patch keep the current function treesit-forward-sexp,
>> and creates a new function treesit-forward-sexp-list
>> that uses the 'sexp-list' thing to navigate lists while
>> using forward-sexp-default-function to navigate atoms:
>
> Introducing a new command raises the question how should users
> navigate using the two. Will C-M-<RIGHT> invoke forward-sexp or the
> new command (or maybe one or the other in some dwin-ish way)?
>
> What are the problems with teaching treesit-forward-sexp do what the
> new command does? IOW, why do we need a new command?
Actually, it's not a new command, but a new function for C-M-<RIGHT>
via forward-sexp-function. This is from treesit-major-mode-setup:
(when (treesit-thing-defined-p 'sexp nil)
(setq-local forward-sexp-function #'treesit-forward-sexp)
(setq-local transpose-sexps-function #'treesit-transpose-sexps))
(when (treesit-thing-defined-p 'sexp-list nil)
(setq-local forward-sexp-function #'treesit-forward-sexp-list))
When a ts-mode defines the 'sexp-list' thing, only then it's used.
Otherwise the current implementation with 'sexp' is used for C-M-<RIGHT>.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-13 7:06 ` Juri Linkov
@ 2024-12-14 11:02 ` Eli Zaretskii
2024-12-14 18:14 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-12-14 11:02 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, monnier, 73404
> From: Juri Linkov <juri@linkov.net>
> Cc: theo@thornhill.no, casouri@gmail.com, mickey@masteringemacs.org,
> monnier@iro.umontreal.ca, 73404@debbugs.gnu.org
> Date: Fri, 13 Dec 2024 09:06:04 +0200
>
> >> This patch keep the current function treesit-forward-sexp,
> >> and creates a new function treesit-forward-sexp-list
> >> that uses the 'sexp-list' thing to navigate lists while
> >> using forward-sexp-default-function to navigate atoms:
> >
> > Introducing a new command raises the question how should users
> > navigate using the two. Will C-M-<RIGHT> invoke forward-sexp or the
> > new command (or maybe one or the other in some dwin-ish way)?
> >
> > What are the problems with teaching treesit-forward-sexp do what the
> > new command does? IOW, why do we need a new command?
>
> Actually, it's not a new command, but a new function for C-M-<RIGHT>
> via forward-sexp-function.
It's an interactive function, so it's a command.
> This is from treesit-major-mode-setup:
>
> (when (treesit-thing-defined-p 'sexp nil)
> (setq-local forward-sexp-function #'treesit-forward-sexp)
> (setq-local transpose-sexps-function #'treesit-transpose-sexps))
>
> (when (treesit-thing-defined-p 'sexp-list nil)
> (setq-local forward-sexp-function #'treesit-forward-sexp-list))
>
> When a ts-mode defines the 'sexp-list' thing, only then it's used.
> Otherwise the current implementation with 'sexp' is used for C-M-<RIGHT>.
Then why is the new function interactive?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-14 11:02 ` Eli Zaretskii
@ 2024-12-14 18:14 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-14 18:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404
[-- Attachment #1: Type: text/plain, Size: 937 bytes --]
>> Actually, it's not a new command, but a new function for C-M-<RIGHT>
>> via forward-sexp-function.
>
> It's an interactive function, so it's a command.
>
>> When a ts-mode defines the 'sexp-list' thing, only then it's used.
>> Otherwise the current implementation with 'sexp' is used for C-M-<RIGHT>.
>
> Then why is the new function interactive?
Ah, I didn't notice it's interactive!
treesit-forward-sexp-list was based on treesit-forward-sexp
that has the interactive spec added by Theo in the commit 207901457c01.
I guess that Theo decided to make this function interactive
for such use case that users could use it separately
or bind it to a key.
What really needs to be interactive is the new function
treesit-forward-list added in the following patch
because there is no special variable forward-list-function for
forward-list like there is forward-sexp-function for forward-sexp.
So users might want to use it separately:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-forward-list.patch --]
[-- Type: text/x-diff, Size: 5315 bytes --]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 18200acf53f..a1c012b6d2f 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2366,6 +2366,11 @@ treesit-sexp-type-regexp
however, smaller in scope than sentences. This is used by
`treesit-forward-sexp' and friends.")
+(defun treesit-forward-list (&optional arg)
+ (interactive "^p")
+ (let ((treesit-sexp-type-regexp 'sexp-list))
+ (treesit-forward-sexp arg)))
+
(defun treesit-forward-sexp (&optional arg)
"Tree-sitter implementation for `forward-sexp-function'.
@@ -2382,18 +2387,8 @@ treesit-forward-sexp
by `text' and `sexp' in `treesit-thing-settings'."
(interactive "^p")
(let ((arg (or arg 1))
- (pred (or treesit-sexp-type-regexp 'sexp))
- (node-at-point
- (treesit-node-at (point) (treesit-language-at (point)))))
- (or (when (and node-at-point
- ;; Make sure point is strictly inside node.
- (< (treesit-node-start node-at-point)
- (point)
- (treesit-node-end node-at-point))
- (treesit-node-match-p node-at-point 'text t))
- (forward-sexp-default-function arg)
- t)
- (if (> arg 0)
+ (pred (or treesit-sexp-type-regexp 'sexp)))
+ (or (if (> arg 0)
(treesit-end-of-thing pred (abs arg) 'restricted)
(treesit-beginning-of-thing pred (abs arg) 'restricted))
;; If we couldn't move, we should signal an error and report
@@ -2408,6 +2403,63 @@ treesit-forward-sexp
(treesit-node-start boundary)
(treesit-node-end boundary)))))))
+(defun treesit-forward-sexp-list (&optional arg)
+ "Tree-sitter implementation for `forward-sexp-function'.
+
+ARG is described in the docstring of `forward-sexp-function'.
+
+If point is inside a text environment where tree-sitter is not
+supported, go forward a sexp using `forward-sexp-default-function'.
+If point is inside code, use tree-sitter functions with the
+following behavior. If there are no further sexps to move across,
+signal `scan-error' like `forward-sexp' does. If point is already
+at top-level, return nil without moving point.
+
+What constitutes as text and source code sexp is determined
+by `text' and `sexp' in `treesit-thing-settings'."
+ (interactive "^p")
+ (let* ((arg (or arg 1))
+ (pred 'sexp-list)
+ (default-pos
+ (condition-case _
+ (save-excursion
+ (forward-sexp-default-function arg)
+ (point))
+ (scan-error nil)))
+ (default-pos (unless (eq (point) default-pos) default-pos))
+ (sibling-pos
+ (when default-pos
+ (save-excursion
+ (and (if (> arg 0)
+ (treesit-end-of-thing pred (abs arg) 'restricted)
+ (treesit-beginning-of-thing pred (abs arg) 'restricted))
+ (point)))))
+ (sibling (when sibling-pos
+ (if (> arg 0)
+ (treesit-thing-prev sibling-pos pred)
+ (treesit-thing-next sibling-pos pred))))
+ (sibling (when (and sibling
+ (if (> arg 0)
+ (<= (point) (treesit-node-start sibling))
+ (>= (point) (treesit-node-start sibling))))
+ sibling))
+ (current-thing (when default-pos
+ (treesit-thing-at (point) pred t))))
+
+ ;; 'forward-sexp-default-function' should not go out of the current thing,
+ ;; neither go inside the next thing, neither go over the next thing
+ (or (when (and default-pos
+ (or (null current-thing)
+ (if (> arg 0)
+ (< default-pos (treesit-node-end current-thing))
+ (> default-pos (treesit-node-start current-thing))))
+ (or (null sibling)
+ (if (> arg 0)
+ (<= default-pos (treesit-node-start sibling))
+ (>= default-pos (treesit-node-end sibling)))))
+ (goto-char default-pos))
+ (treesit-forward-list arg))))
+
(defun treesit-transpose-sexps (&optional arg)
"Tree-sitter `transpose-sexps' function.
ARG is the same as in `transpose-sexps'.
@@ -2857,7 +2909,7 @@ treesit-navigate-thing
(if (eq tactic 'restricted)
(setq pos (funcall
advance
- (cond ((and (null next) (null prev)) parent)
+ (cond ((and (null next) (null prev) (not (eq thing 'sexp-list))) parent)
((> arg 0) next)
(t prev))))
;; For `nested', it's a bit more work:
@@ -3254,6 +3306,9 @@ treesit-major-mode-setup
(setq-local forward-sexp-function #'treesit-forward-sexp)
(setq-local transpose-sexps-function #'treesit-transpose-sexps))
+ (when (treesit-thing-defined-p 'sexp-list nil)
+ (setq-local forward-sexp-function #'treesit-forward-sexp-list))
+
(when (treesit-thing-defined-p 'sentence nil)
(setq-local forward-sentence-function #'treesit-forward-sentence))
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-12 17:49 ` Juri Linkov
2024-12-12 19:13 ` Eli Zaretskii
@ 2024-12-18 7:37 ` Juri Linkov
2024-12-19 4:04 ` Yuan Fu
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-18 7:37 UTC (permalink / raw)
To: Yuan Fu; +Cc: theo, mickey, monnier, 73404
While testing treesit-forward-sexp-list, I discovered that
thing-navigation functions are not restricted to named nodes.
I wonder if there a reason to find anonymous nodes as things?
The problem was found with the node "unless" in Ruby:
unless cond
a += 1
else
b -= 1
end
Here the named node 'unless' has exactly the same name
as the anonymous node with the text "unless":
(unless "unless" condition: (identifier)
Finding anonymous nodes breaks forward-sexp when point is on "unless":
un-!-less cond
a += 1
else
b -= 1
end
because (treesit-thing-at (point) 'sexp t) finds
#<treesit-node "unless" in 156-162>
instead of
#<treesit-node unless in 156-203>
Also this breaks backward-sexp and backward-up-list
because treesit--thing-sibling finds
the anonymous node "unless" as a previous sibling
instead of the named node 'unless' as a parent.
Would the right solution be to check if the found thing
is a named node? With something like:
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 18200acf53f..9ad879ee40c 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2711,6 +2774,7 @@ treesit--thing-sibling
(lambda (n) (>= (treesit-node-start n) pos))))
(iter-pred (lambda (node)
(and (treesit-node-match-p node thing t)
+ (treesit-node-check node 'named)
(funcall pos-pred node))))
(sibling nil))
(when cursor
@@ -2760,6 +2824,7 @@ treesit-thing-at
(let* ((cursor (treesit-node-at pos))
(iter-pred (lambda (node)
(and (treesit-node-match-p node thing t)
+ (treesit-node-check node 'named)
(if strict
(< (treesit-node-start node) pos)
(<= (treesit-node-start node) pos))
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-18 7:37 ` Juri Linkov
@ 2024-12-19 4:04 ` Yuan Fu
2024-12-19 7:14 ` Juri Linkov
2024-12-19 7:18 ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
0 siblings, 2 replies; 106+ messages in thread
From: Yuan Fu @ 2024-12-19 4:04 UTC (permalink / raw)
To: Juri Linkov; +Cc: theo, Mickey Petersen, monnier, 73404
> On Dec 17, 2024, at 11:37 PM, Juri Linkov <juri@linkov.net> wrote:
>
> While testing treesit-forward-sexp-list, I discovered that
> thing-navigation functions are not restricted to named nodes.
>
> I wonder if there a reason to find anonymous nodes as things?
We should rather ask is there any reason to not find anonymous nodes as things? Even ruby-ts-mode defines a bunch of anonymous nodes as sexp, no? In any case, excluding anonymous nodes from things doesn’t sound right.
>
> The problem was found with the node "unless" in Ruby:
>
> unless cond
> a += 1
> else
> b -= 1
> end
>
> Here the named node 'unless' has exactly the same name
> as the anonymous node with the text "unless":
>
> (unless "unless" condition: (identifier)
I feel like Ruby’s grammar should call the named node something else, like unless_statement.
>
> Finding anonymous nodes breaks forward-sexp when point is on "unless":
>
> un-!-less cond
> a += 1
> else
> b -= 1
> end
>
> because (treesit-thing-at (point) 'sexp t) finds
>
> #<treesit-node "unless" in 156-162>
>
> instead of
>
> #<treesit-node unless in 156-203>
>
> Also this breaks backward-sexp and backward-up-list
> because treesit--thing-sibling finds
> the anonymous node "unless" as a previous sibling
> instead of the named node 'unless' as a parent.
>
> Would the right solution be to check if the found thing
> is a named node? With something like:
>
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 18200acf53f..9ad879ee40c 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -2711,6 +2774,7 @@ treesit--thing-sibling
> (lambda (n) (>= (treesit-node-start n) pos))))
> (iter-pred (lambda (node)
> (and (treesit-node-match-p node thing t)
> + (treesit-node-check node 'named)
> (funcall pos-pred node))))
> (sibling nil))
> (when cursor
> @@ -2760,6 +2824,7 @@ treesit-thing-at
> (let* ((cursor (treesit-node-at pos))
> (iter-pred (lambda (node)
> (and (treesit-node-match-p node thing t)
> + (treesit-node-check node 'named)
> (if strict
> (< (treesit-node-start node) pos)
> (<= (treesit-node-start node) pos))
A better solution IMO is to add some way to distinguish between named and anonymous nodes. I can think of two ways, either add “and” and “named/anonymous” predicate, so (and named “unless”) only matches the named “unless” node; or we add a special syntax such that “(unless)” only matches named nodes, and “\”unless\”” only matches anonymous nodes.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-19 4:04 ` Yuan Fu
@ 2024-12-19 7:14 ` Juri Linkov
2024-12-19 7:18 ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
1 sibling, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-19 7:14 UTC (permalink / raw)
To: Yuan Fu; +Cc: theo, Mickey Petersen, monnier, 73404
> A better solution IMO is to add some way to distinguish between named and
> anonymous nodes. I can think of two ways, either add “and” and
> “named/anonymous” predicate, so (and named “unless”) only matches the named
> “unless” node; or we add a special syntax such that “(unless)” only matches
> named nodes, and “\”unless\”” only matches anonymous nodes.
I agree, so will create a new feature request.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-19 4:04 ` Yuan Fu
2024-12-19 7:14 ` Juri Linkov
@ 2024-12-19 7:18 ` Juri Linkov
2024-12-24 3:02 ` Yuan Fu
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-19 7:18 UTC (permalink / raw)
To: 74963; +Cc: Yuan Fu, Dmitry Gutov
[This is a separate bug report from bug#73404]
>> While testing treesit-forward-sexp-list, I discovered that
>> thing-navigation functions are not restricted to named nodes.
>>
>> I wonder if there a reason to find anonymous nodes as things?
>
> We should rather ask is there any reason to not find anonymous nodes
> as things? Even ruby-ts-mode defines a bunch of anonymous nodes as
> sexp, no? In any case, excluding anonymous nodes from things doesn’t
> sound right.
Indeed, there are many anonymous nodes used in ruby-ts-mode.
>> The problem was found with the node "unless" in Ruby:
>>
>> unless cond
>> a += 1
>> else
>> b -= 1
>> end
>>
>> Here the named node 'unless' has exactly the same name
>> as the anonymous node with the text "unless":
>>
>> (unless "unless" condition: (identifier)
>
> I feel like Ruby’s grammar should call the named node something else,
> like unless_statement.
Agreed, the problem is that nodes defined in Ruby’s grammar
are too ambiguous. There are more such nodes with the same name
for named and anonymous: "if", "while", "until", etc.
>> Finding anonymous nodes breaks forward-sexp when point is on "unless":
>>
>> un-!-less cond
>> a += 1
>> else
>> b -= 1
>> end
>>
>> because (treesit-thing-at (point) 'sexp t) finds
>>
>> #<treesit-node "unless" in 156-162>
>>
>> instead of
>>
>> #<treesit-node unless in 156-203>
>>
>> Also this breaks backward-sexp and backward-up-list
>> because treesit--thing-sibling finds
>> the anonymous node "unless" as a previous sibling
>> instead of the named node 'unless' as a parent.
>>
>> Would the right solution be to check if the found thing
>> is a named node? With something like:
>>
>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>> index 18200acf53f..9ad879ee40c 100644
>> --- a/lisp/treesit.el
>> +++ b/lisp/treesit.el
>> @@ -2711,6 +2774,7 @@ treesit--thing-sibling
>> (lambda (n) (>= (treesit-node-start n) pos))))
>> (iter-pred (lambda (node)
>> (and (treesit-node-match-p node thing t)
>> + (treesit-node-check node 'named)
>> (funcall pos-pred node))))
>> (sibling nil))
>> (when cursor
>> @@ -2760,6 +2824,7 @@ treesit-thing-at
>> (let* ((cursor (treesit-node-at pos))
>> (iter-pred (lambda (node)
>> (and (treesit-node-match-p node thing t)
>> + (treesit-node-check node 'named)
>> (if strict
>> (< (treesit-node-start node) pos)
>> (<= (treesit-node-start node) pos))
>
> A better solution IMO is to add some way to distinguish between named and
> anonymous nodes. I can think of two ways, either add “and” and
> “named/anonymous” predicate, so (and named “unless”) only matches the named
> “unless” node; or we add a special syntax such that “(unless)” only matches
> named nodes, and “\”unless\”” only matches anonymous nodes.
Either predicate or a special syntax is welcome.
This would be more handy than writing a lambda with implicit calls
of treesit-node-check.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-11 6:31 ` Yuan Fu
2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-19 7:34 ` Juri Linkov
2024-12-24 19:05 ` Juri Linkov
2024-12-30 7:15 ` Juri Linkov
1 sibling, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-19 7:34 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Stefan Monnier
>> What should remain in 'treesit-thing-settings' are only grouping
>> constructs such as "parenthesized_expression" and "statement_block".
>
> Ah, this matches my idea of defining sexp in other languages as “repeatable
> construct/list-like construct”. We went with “every syntactic construct” at
> the time, which I didn’t object to, but I’m definitely happier with the
> repeatable construct approach. Including Stefan and Theo since they were
> part of the original sexp navigation discussion.
>
> My only concern is that would the result be a bit unpredictable/confusing
> when we mix the result of two logic together in such an involved way? We
> can push to master and try it out for a while. I use tree-sitter sexp
> navigation for work every day, albeit strictly for navigating list-like
> constructs—I use forward/backward-word for smaller navigation.
>
>> tries to go out of the current thing to its parent,
>> thus breaking the main principle that 'forward-sexp'
>> should move forward across siblings only. But removing
>> this line fixed the problem:
>
> Thanks, LGTM.
Ok, so now pushed to master in such backwards-compatible way
that when a ts-mode doesn't define the 'sexp-list' thing,
then the existing 'sexp' is used.
Also added 'sexp-list' to c-ts-mode, js-ts-mode, ruby-ts-mode and
html-ts-mode. Addition of 'sexp-list' to other ts-modes is underway.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-19 7:18 ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
@ 2024-12-24 3:02 ` Yuan Fu
2024-12-24 7:17 ` Juri Linkov
2024-12-24 17:52 ` Juri Linkov
0 siblings, 2 replies; 106+ messages in thread
From: Yuan Fu @ 2024-12-24 3:02 UTC (permalink / raw)
To: Juri Linkov; +Cc: Dmitry Gutov, 74963
> On Dec 18, 2024, at 11:18 PM, Juri Linkov <juri@linkov.net> wrote:
>
> [This is a separate bug report from bug#73404]
>
>>> While testing treesit-forward-sexp-list, I discovered that
>>> thing-navigation functions are not restricted to named nodes.
>>>
>>> I wonder if there a reason to find anonymous nodes as things?
>>
>> We should rather ask is there any reason to not find anonymous nodes
>> as things? Even ruby-ts-mode defines a bunch of anonymous nodes as
>> sexp, no? In any case, excluding anonymous nodes from things doesn’t
>> sound right.
>
> Indeed, there are many anonymous nodes used in ruby-ts-mode.
>
>>> The problem was found with the node "unless" in Ruby:
>>>
>>> unless cond
>>> a += 1
>>> else
>>> b -= 1
>>> end
>>>
>>> Here the named node 'unless' has exactly the same name
>>> as the anonymous node with the text "unless":
>>>
>>> (unless "unless" condition: (identifier)
>>
>> I feel like Ruby’s grammar should call the named node something else,
>> like unless_statement.
>
> Agreed, the problem is that nodes defined in Ruby’s grammar
> are too ambiguous. There are more such nodes with the same name
> for named and anonymous: "if", "while", "until", etc.
>
>>> Finding anonymous nodes breaks forward-sexp when point is on "unless":
>>>
>>> un-!-less cond
>>> a += 1
>>> else
>>> b -= 1
>>> end
>>>
>>> because (treesit-thing-at (point) 'sexp t) finds
>>>
>>> #<treesit-node "unless" in 156-162>
>>>
>>> instead of
>>>
>>> #<treesit-node unless in 156-203>
>>>
>>> Also this breaks backward-sexp and backward-up-list
>>> because treesit--thing-sibling finds
>>> the anonymous node "unless" as a previous sibling
>>> instead of the named node 'unless' as a parent.
>>>
>>> Would the right solution be to check if the found thing
>>> is a named node? With something like:
>>>
>>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>>> index 18200acf53f..9ad879ee40c 100644
>>> --- a/lisp/treesit.el
>>> +++ b/lisp/treesit.el
>>> @@ -2711,6 +2774,7 @@ treesit--thing-sibling
>>> (lambda (n) (>= (treesit-node-start n) pos))))
>>> (iter-pred (lambda (node)
>>> (and (treesit-node-match-p node thing t)
>>> + (treesit-node-check node 'named)
>>> (funcall pos-pred node))))
>>> (sibling nil))
>>> (when cursor
>>> @@ -2760,6 +2824,7 @@ treesit-thing-at
>>> (let* ((cursor (treesit-node-at pos))
>>> (iter-pred (lambda (node)
>>> (and (treesit-node-match-p node thing t)
>>> + (treesit-node-check node 'named)
>>> (if strict
>>> (< (treesit-node-start node) pos)
>>> (<= (treesit-node-start node) pos))
>>
>> A better solution IMO is to add some way to distinguish between named and
>> anonymous nodes. I can think of two ways, either add “and” and
>> “named/anonymous” predicate, so (and named “unless”) only matches the named
>> “unless” node; or we add a special syntax such that “(unless)” only matches
>> named nodes, and “\”unless\”” only matches anonymous nodes.
>
> Either predicate or a special syntax is welcome.
>
> This would be more handy than writing a lambda with implicit calls
> of treesit-node-check.
I’ll go with the (and named “unless”) route because after thinking about it more, “(unless)” will be hard to work with because the string predicate is actually a regexp.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-24 3:02 ` Yuan Fu
@ 2024-12-24 7:17 ` Juri Linkov
2024-12-24 7:41 ` Juri Linkov
2024-12-25 3:25 ` Dmitry Gutov
2024-12-24 17:52 ` Juri Linkov
1 sibling, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-24 7:17 UTC (permalink / raw)
To: Yuan Fu; +Cc: Dmitry Gutov, 74963
>>> A better solution IMO is to add some way to distinguish between named and
>>> anonymous nodes. I can think of two ways, either add “and” and
>>> “named/anonymous” predicate, so (and named “unless”) only matches the named
>>> “unless” node; or we add a special syntax such that “(unless)” only matches
>>> named nodes, and “\”unless\”” only matches anonymous nodes.
>>
>> Either predicate or a special syntax is welcome.
>>
>> This would be more handy than writing a lambda with implicit calls
>> of treesit-node-check.
>
> I’ll go with the (and named “unless”) route because after thinking
> about it more, “(unless)” will be hard to work with because the string
> predicate is actually a regexp.
Thanks. While addition of '(and named "unless")' would be appreciated,
I see that currently it's possible to do this by proving a predicate
like there is 'ruby-ts--sexp-p' in
(setq-local treesit-thing-settings
`((ruby
(sexp ,(cons (rx
bol
(or
"class"
...
)
eol)
#'ruby-ts--sexp-p))
Then 'ruby-ts--sexp-p' could check for the named node "unless" as well.
But it seems such solution is less efficient than adding '(and named "unless")'.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-24 7:17 ` Juri Linkov
@ 2024-12-24 7:41 ` Juri Linkov
2024-12-25 3:25 ` Dmitry Gutov
1 sibling, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-24 7:41 UTC (permalink / raw)
To: Yuan Fu; +Cc: Dmitry Gutov, 74963
> (setq-local treesit-thing-settings
> `((ruby
> (sexp ,(cons (rx
> bol
> (or
> "class"
> ...
> )
> eol)
> #'ruby-ts--sexp-p))
BTW, I just fixed a bug in typescript-ts-mode
where "string_fragment" was mismatched by "string",
because its regexp-opt matched node names too widely,
so needed to enclose in regexp anchors.
I see that all ts-modes solve this common problem each in its own way
(here 'list' indicates a list of strings that should match node names):
c-ts-mode: (regexp-opt list 'symbols)
js-ts-mode: (concat "\\_<" (regexp-opt list t) "\\_>")
java-ts-mode: (rx (or list))
ruby-ts-mode: (rx bol (or list) eol)
Currently there is no uniform way to handle this frequent need.
'concat' like above looks too ugly, but 'regexp-opt' with the
'symbols' arg produces a strange regexp for matching symbols.
Maybe better would be create a new argument for 'regexp-opt', e.g.:
(regexp-opt list 'complete)
that will expand to:
(concat "^" list "$")
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-24 3:02 ` Yuan Fu
2024-12-24 7:17 ` Juri Linkov
@ 2024-12-24 17:52 ` Juri Linkov
2024-12-24 21:03 ` Yuan Fu
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-24 17:52 UTC (permalink / raw)
To: Yuan Fu; +Cc: Dmitry Gutov, 74963
> I’ll go with the (and named “unless”) route because after thinking
> about it more, “(unless)” will be hard to work with because the string
> predicate is actually a regexp.
Is it possible to mark all node names specified in treesit-thing-settings
as named?
I just discovered a new problem:
1. With typescript-ts-mode on the following snippet:
type NodeInfo =
| (BaseNode & {
subtypes: BaseNode[];
})
| (BaseNode & {
fields: { [name: string]: ChildNode };
children: ChildNode[];
});
You can move point inside "string" and type C-M-f or C-M-b.
But point doesn't move.
This is because treesit-thing-settings defines a named node "string".
But anonymous node has the same name "string":
(index_signature [ name: (identifier) :
index_type: (predefined_type string)
and (treesit-node-at (point)) returns
#<treesit-node "string" in 111-117>
This mismatched "string" in TypeScript is even more
unexpected than "unless" in Ruby.
So probably we need a way to mark all used nodes as named
to avoid such unexpected matches. Maybe matching anonymous nodes
should be opt-in, and by default match only named nodes.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-19 7:34 ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
@ 2024-12-24 19:05 ` Juri Linkov
2024-12-24 21:14 ` Yuan Fu
2024-12-25 17:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-30 7:15 ` Juri Linkov
1 sibling, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-24 19:05 UTC (permalink / raw)
To: Yuan Fu
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
> Ok, so now pushed to master in such backwards-compatible way
> that when a ts-mode doesn't define the 'sexp-list' thing,
> then the existing 'sexp' is used.
Also I'm looking into allowing more list-navigation commands
to be usable in ts-modes. E.g. instead of limiting list-navigation
only to the current 'forward-sexp', another useful command is
'down-list'.
Currently to get inside an HTML element in html-ts-mode
is possible by using 'M-e' (forward-sentence) to skip HTML tag.
This is not quite obvious.
More natural would be to support 'C-M-d' (down-list).
But this requires overriding more low-level 'scan-lists' and
'scan-sexps' in treesit, instead of overriding the current top-level
'forward-sexp-function'.
Also treesit-thing-settings for 'down-list' would require
pairs of node names: one node to define a list, and another node
to skip a node to get inside the list.
For html-ts-mode a list node is "element", and the node to skip
is "tag". So for example:
-!-text <html></html>
'C-M-f' will use node "element" to move to the beginning of "element":
text -!-<html></html>
then to get inside also need to skip the <html> tag:
text <html>-!-</html>
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-24 17:52 ` Juri Linkov
@ 2024-12-24 21:03 ` Yuan Fu
2024-12-25 7:49 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-12-24 21:03 UTC (permalink / raw)
To: Juri Linkov; +Cc: Dmitry Gutov, 74963
> On Dec 24, 2024, at 9:52 AM, Juri Linkov <juri@linkov.net> wrote:
>
>> I’ll go with the (and named “unless”) route because after thinking
>> about it more, “(unless)” will be hard to work with because the string
>> predicate is actually a regexp.
>
> Is it possible to mark all node names specified in treesit-thing-settings
> as named?
>
> I just discovered a new problem:
>
> 1. With typescript-ts-mode on the following snippet:
>
> type NodeInfo =
> | (BaseNode & {
> subtypes: BaseNode[];
> })
> | (BaseNode & {
> fields: { [name: string]: ChildNode };
> children: ChildNode[];
> });
>
> You can move point inside "string" and type C-M-f or C-M-b.
> But point doesn't move.
>
> This is because treesit-thing-settings defines a named node "string".
> But anonymous node has the same name "string":
>
> (index_signature [ name: (identifier) :
> index_type: (predefined_type string)
>
> and (treesit-node-at (point)) returns
> #<treesit-node "string" in 111-117>
>
> This mismatched "string" in TypeScript is even more
> unexpected than "unless" in Ruby.
>
> So probably we need a way to mark all used nodes as named
> to avoid such unexpected matches. Maybe matching anonymous nodes
> should be opt-in, and by default match only named nodes.
IMHO this is just an unfortunate bug that needs to be fixed. I agree that this type of bug are hard to avoid, which is a bad thing, but that doesn’t mean we should try to alleviate it at any cost. Making predicates named by default just adds complexity and inflexibility for not much benefit.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-24 19:05 ` Juri Linkov
@ 2024-12-24 21:14 ` Yuan Fu
2024-12-25 7:44 ` Juri Linkov
2024-12-25 17:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-12-24 21:14 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
> On Dec 24, 2024, at 11:05 AM, Juri Linkov <juri@linkov.net> wrote:
>
>> Ok, so now pushed to master in such backwards-compatible way
>> that when a ts-mode doesn't define the 'sexp-list' thing,
>> then the existing 'sexp' is used.
>
> Also I'm looking into allowing more list-navigation commands
> to be usable in ts-modes. E.g. instead of limiting list-navigation
> only to the current 'forward-sexp', another useful command is
> 'down-list'.
>
> Currently to get inside an HTML element in html-ts-mode
> is possible by using 'M-e' (forward-sentence) to skip HTML tag.
> This is not quite obvious.
>
> More natural would be to support 'C-M-d' (down-list).
>
> But this requires overriding more low-level 'scan-lists' and
> 'scan-sexps' in treesit, instead of overriding the current top-level
> 'forward-sexp-function'.
>
> Also treesit-thing-settings for 'down-list' would require
> pairs of node names: one node to define a list, and another node
> to skip a node to get inside the list.
>
> For html-ts-mode a list node is "element", and the node to skip
> is "tag". So for example:
>
> -!-text <html></html>
>
> 'C-M-f' will use node "element" to move to the beginning of "element":
>
> text -!-<html></html>
>
> then to get inside also need to skip the <html> tag:
>
> text <html>-!-</html>
Right, the thing navigation only supports going up/outside, but not going down/inside. We can add a new thing for beginning and end of balanced pairs. Then down-list will be going from the start of a balanced-pair-open to the end of it. Up-list will be going from the start of a balanced-pair-close to the end of it.
We might also want a way to jump from pair-open to pair-end. Going to pair-open’s parent’s end will be almost always correct. (Except for the grammars that do weird stuff, like tree-sitter-c’s for statement, the condition is not a node by itself, but split into opening parentheses, initializer, loop condition, increment, closing paren, all of which direct child of the for_statement, not grouped into a condition node.)
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-24 7:17 ` Juri Linkov
2024-12-24 7:41 ` Juri Linkov
@ 2024-12-25 3:25 ` Dmitry Gutov
2024-12-25 7:52 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Dmitry Gutov @ 2024-12-25 3:25 UTC (permalink / raw)
To: Juri Linkov, Yuan Fu; +Cc: 74963
Hi Juri,
On 24/12/2024 09:17, Juri Linkov wrote:
> While addition of '(and named "unless")' would be appreciated,
> I see that currently it's possible to do this by proving a predicate
> like there is 'ruby-ts--sexp-p' in
>
> (setq-local treesit-thing-settings
> `((ruby
> (sexp ,(cons (rx
> bol
> (or
> "class"
> ...
> )
> eol)
> #'ruby-ts--sexp-p))
>
> Then 'ruby-ts--sexp-p' could check for the named node "unless" as well.
>
> But it seems such solution is less efficient than adding '(and named "unless")'.
Given that we're already calling a predicate every time (in
ruby-ts-mode), we might as well add one more check. See the patch at the
end.
Speaking of tricky examples though, here's a definition:
module Bar
class Foo
def baz
end
end
end
If you move point inside the keyword "module" or "class", C-M-f wouldn't
move forward either as of the latest master. No such problem with "def".
Adding the check for "named" fixes the first two cases, but then C-M-f
inside "def" jumps to after "baaz". Could be worked around with a
special case, but I wonder what this difference comes from (haven't
properly debugged yet).
diff --git a/lisp/progmodes/ruby-ts-mode.el b/lisp/progmodes/ruby-ts-mode.el
index 4ef0cb18eae..4b15c6cbf27 100644
--- a/lisp/progmodes/ruby-ts-mode.el
+++ b/lisp/progmodes/ruby-ts-mode.el
@@ -1120,6 +1120,10 @@ ruby-ts--sexp-p
(equal (treesit-node-type (treesit-node-child node 0))
"(")))
+(defun ruby-ts--sexp-list-p (node)
+ (when (treesit-node-check node 'named)
+ (ruby-ts--sexp-p node)))
+
(defvar-keymap ruby-ts-mode-map
:doc "Keymap used in Ruby mode"
:parent prog-mode-map
@@ -1235,7 +1239,7 @@ ruby-ts-mode
"array"
"hash")
eol)
- #'ruby-ts--sexp-p))
+ #'ruby-ts--sexp-list-p))
(text ,(lambda (node)
(or (member (treesit-node-type node)
'("comment" "string_content"
"heredoc_content"))
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-24 21:14 ` Yuan Fu
@ 2024-12-25 7:44 ` Juri Linkov
2024-12-25 8:34 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-25 7:44 UTC (permalink / raw)
To: Yuan Fu
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
>> For html-ts-mode a list node is "element", and the node to skip
>> is "tag". So for example:
>>
>> -!-text <html></html>
>>
>> 'C-M-f' will use node "element" to move to the beginning of "element":
>>
>> text -!-<html></html>
>>
>> then to get inside also need to skip the <html> tag:
>>
>> text <html>-!-</html>
>
> Right, the thing navigation only supports going up/outside, but not going
> down/inside. We can add a new thing for beginning and end of balanced
> pairs. Then down-list will be going from the start of a balanced-pair-open
> to the end of it. Up-list will be going from the start of
> a balanced-pair-close to the end of it.
Probably we could use just such heuristics that 'down-list' should skip
the first node of the balanced pair. This should work for most ts-modes.
For example, for 'jsx_element' the first child to skip is 'jsx_opening_element'.
For 'argument_list' the first child to skip is an anonymous node "(".
For 'statement_block' the first child to skip is an anonymous node "{".
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-24 21:03 ` Yuan Fu
@ 2024-12-25 7:49 ` Juri Linkov
2024-12-25 9:11 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-25 7:49 UTC (permalink / raw)
To: Yuan Fu; +Cc: Dmitry Gutov, 74963
>> This mismatched "string" in TypeScript is even more
>> unexpected than "unless" in Ruby.
>>
>> So probably we need a way to mark all used nodes as named
>> to avoid such unexpected matches. Maybe matching anonymous nodes
>> should be opt-in, and by default match only named nodes.
>
> IMHO this is just an unfortunate bug that needs to be fixed. I agree that
> this type of bug are hard to avoid, which is a bad thing, but that doesn’t
> mean we should try to alleviate it at any cost. Making predicates named by
> default just adds complexity and inflexibility for not much benefit.
Not sure if a possible flexibility is better than unintended matches.
When the authors of a ts-mode carefully selected a list of named nodes to match,
why treesit should try to match some random and unintended anonymous nodes?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-25 3:25 ` Dmitry Gutov
@ 2024-12-25 7:52 ` Juri Linkov
2024-12-26 1:00 ` Dmitry Gutov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-25 7:52 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Yuan Fu, 74963
>> While addition of '(and named "unless")' would be appreciated,
>> I see that currently it's possible to do this by proving a predicate
>> like there is 'ruby-ts--sexp-p' in
>>
>> (setq-local treesit-thing-settings
>> `((ruby
>> (sexp ,(cons (rx
>> bol
>> (or
>> "class"
>> ...
>> )
>> eol)
>> #'ruby-ts--sexp-p))
>>
>> Then 'ruby-ts--sexp-p' could check for the named node "unless" as well.
>>
>> But it seems such solution is less efficient than adding '(and named "unless")'.
>
> Given that we're already calling a predicate every time (in
> ruby-ts-mode), we might as well add one more check. See the patch at the
> end.
Thanks, I tried the patch. It was broken, so needed to edit manually.
Also the new key 'w' doesn't work in diff buffers, need to fix it as well.
> Speaking of tricky examples though, here's a definition:
>
> module Bar
> class Foo
> def baz
> end
> end
> end
>
> If you move point inside the keyword "module" or "class", C-M-f wouldn't
> move forward either as of the latest master. No such problem with "def".
>
> Adding the check for "named" fixes the first two cases, but then C-M-f
> inside "def" jumps to after "baaz". Could be worked around with a
> special case, but I wonder what this difference comes from (haven't
> properly debugged yet).
I see no problems with your patch. Everything works nicely.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-25 7:44 ` Juri Linkov
@ 2024-12-25 8:34 ` Yuan Fu
2024-12-25 17:36 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-12-25 8:34 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
> On Dec 24, 2024, at 11:44 PM, Juri Linkov <juri@linkov.net> wrote:
>
>>> For html-ts-mode a list node is "element", and the node to skip
>>> is "tag". So for example:
>>>
>>> -!-text <html></html>
>>>
>>> 'C-M-f' will use node "element" to move to the beginning of "element":
>>>
>>> text -!-<html></html>
>>>
>>> then to get inside also need to skip the <html> tag:
>>>
>>> text <html>-!-</html>
>>
>> Right, the thing navigation only supports going up/outside, but not going
>> down/inside. We can add a new thing for beginning and end of balanced
>> pairs. Then down-list will be going from the start of a balanced-pair-open
>> to the end of it. Up-list will be going from the start of
>> a balanced-pair-close to the end of it.
>
> Probably we could use just such heuristics that 'down-list' should skip
> the first node of the balanced pair. This should work for most ts-modes.
>
> For example, for 'jsx_element' the first child to skip is 'jsx_opening_element'.
> For 'argument_list' the first child to skip is an anonymous node "(".
> For 'statement_block' the first child to skip is an anonymous node "{“.
Yes, that should work for the vast majority of grammars. I can’t think of an counter example other than for_statment in tree-sitter-c.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-25 7:49 ` Juri Linkov
@ 2024-12-25 9:11 ` Yuan Fu
2024-12-25 17:39 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-12-25 9:11 UTC (permalink / raw)
To: Juri Linkov; +Cc: Dmitry Gutov, 74963
> On Dec 24, 2024, at 11:49 PM, Juri Linkov <juri@linkov.net> wrote:
>
>>> This mismatched "string" in TypeScript is even more
>>> unexpected than "unless" in Ruby.
>>>
>>> So probably we need a way to mark all used nodes as named
>>> to avoid such unexpected matches. Maybe matching anonymous nodes
>>> should be opt-in, and by default match only named nodes.
>>
>> IMHO this is just an unfortunate bug that needs to be fixed. I agree that
>> this type of bug are hard to avoid, which is a bad thing, but that doesn’t
>> mean we should try to alleviate it at any cost. Making predicates named by
>> default just adds complexity and inflexibility for not much benefit.
>
> Not sure if a possible flexibility is better than unintended matches.
>
> When the authors of a ts-mode carefully selected a list of named nodes to match,
> why treesit should try to match some random and unintended anonymous nodes?
I don’t know and can’t prove how much the flexibility is worth, but the cost on complexity is real. If everywhere else uses thing predicates as-is, but sexp navigation auto-converts thing predicates into named predicate, that’s a cognitive burden and a special case that’s guaranteed to trip people over.
OTOH, what’s the downside of wrapping the sexp predicate with (and named …), if you only want named nodes to match?
I just think the cost outweighs the benefit, if there is any to begin with.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-24 19:05 ` Juri Linkov
2024-12-24 21:14 ` Yuan Fu
@ 2024-12-25 17:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-25 18:01 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-25 17:19 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404
> Also I'm looking into allowing more list-navigation commands
> to be usable in ts-modes. E.g. instead of limiting list-navigation
> only to the current 'forward-sexp', another useful command is
> 'down-list'.
Gentle reminder that `forward-sexp` is not a "list-navigation" function.
That would be `forward-list`. We very often use sexp commands and
functions to manipulate non-lists such as identifiers.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-25 8:34 ` Yuan Fu
@ 2024-12-25 17:36 ` Juri Linkov
2024-12-27 7:59 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-25 17:36 UTC (permalink / raw)
To: Yuan Fu
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
>> Probably we could use just such heuristics that 'down-list' should skip
>> the first node of the balanced pair. This should work for most ts-modes.
>>
>> For example, for 'jsx_element' the first child to skip is 'jsx_opening_element'.
>> For 'argument_list' the first child to skip is an anonymous node "(".
>> For 'statement_block' the first child to skip is an anonymous node "{“.
>
> Yes, that should work for the vast majority of grammars. I can’t think
> of an counter example other than for_statment in tree-sitter-c.
Indeed, the 'for_statement' is a hard problem, and I see
no good solution. I already encountered in different languages
the same problem that you described:
> We might also want a way to jump from pair-open to pair-end. Going to
> pair-open’s parent’s end will be almost always correct. (Except for the
> grammars that do weird stuff, like tree-sitter-c’s for statement, the
> condition is not a node by itself, but split into opening parentheses,
> initializer, loop condition, increment, closing paren, all of which direct
> child of the for_statement, not grouped into a condition node.)
Maybe indeed worth to try defining pair-open and pair-end as
some children nodes of the 'for_statement', i.e. only the opening paren
and the closing paren. Something like (completely untested):
(and (member (treesit-node-text node) '("(" ")"))
(equal (treesit-node-type (treesit-node-parent node)) "for_statement"))
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-25 9:11 ` Yuan Fu
@ 2024-12-25 17:39 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-25 17:39 UTC (permalink / raw)
To: Yuan Fu; +Cc: Dmitry Gutov, 74963
>> Not sure if a possible flexibility is better than unintended matches.
>>
>> When the authors of a ts-mode carefully selected a list of named nodes to match,
>> why treesit should try to match some random and unintended anonymous nodes?
>
> I don’t know and can’t prove how much the flexibility is worth, but the
> cost on complexity is real. If everywhere else uses thing predicates as-is,
> but sexp navigation auto-converts thing predicates into named predicate,
> that’s a cognitive burden and a special case that’s guaranteed to trip
> people over.
>
> OTOH, what’s the downside of wrapping the sexp predicate with (and named …),
> if you only want named nodes to match?
>
> I just think the cost outweighs the benefit, if there is any to begin with.
Actually, what I had in mind is not to enable named-only mode by default,
but only to allow the authors of ts-modes to specify this condition.
For example, if it will be possible to write
(setq-local treesit-thing-settings
`((typescript
(sexp (and named ,(regexp-opt typescript-ts-mode--sexp-nodes 'symbols))))))
this should be fine. This is similar to how the authors of ts-modes
decide whether to restrict matches to exact names by using
"^...$" with regexp-opt.
BTW, I'm thinking about adding such simple helper:
(defun treesit-regexp-opt (strings)
(concat "^" (regexp-opt strings) "$"))
to use like this:
(setq-local treesit-thing-settings
`((typescript
(sexp (and named ,(treesit-regexp-opt typescript-ts-mode--sexp-nodes))))))
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-25 17:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-25 18:01 ` Juri Linkov
2024-12-25 19:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-25 18:01 UTC (permalink / raw)
To: Stefan Monnier
Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404
>> Also I'm looking into allowing more list-navigation commands
>> to be usable in ts-modes. E.g. instead of limiting list-navigation
>> only to the current 'forward-sexp', another useful command is
>> 'down-list'.
>
> Gentle reminder that `forward-sexp` is not a "list-navigation" function.
> That would be `forward-list`. We very often use sexp commands and
> functions to manipulate non-lists such as identifiers.
Do you think it would be better to override low-level functions 'scan-lists'
and 'scan-sexps' with new variables like 'scan-lists-function'
and 'scan-sexps-function', instead of adding more variables for
overriding top-level commands such as a new variable 'forward-list-function'
and 'down-list-function', like the existing 'forward-sexp-function'?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-25 18:01 ` Juri Linkov
@ 2024-12-25 19:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-27 7:54 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-25 19:29 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404
>> Gentle reminder that `forward-sexp` is not a "list-navigation" function.
>> That would be `forward-list`. We very often use sexp commands and
>> functions to manipulate non-lists such as identifiers.
> Do you think it would be better to override low-level functions 'scan-lists'
> and 'scan-sexps' with new variables like 'scan-lists-function'
> and 'scan-sexps-function', instead of adding more variables for
> overriding top-level commands such as a new variable 'forward-list-function'
> and 'down-list-function', like the existing 'forward-sexp-function'?
Don't know.
What I do know is that in general we'd also want an `up-sexp` operation.
Currently we have an ugly kludge in `up-list` to try and use
`forward-sexp-function` (which is ugly both because
`forward-sexp-function` doesn't really provide the functionality we
need, and because it mixes up sexp and list navigation), and it would be
good to clean it up.
AFAIK `up-list` is the only place where I've needed sexp-based
navigation and where `forward-sexp-function` didn't do the job.
In theory I guess `down-list` is another, but I've never found a use
for it.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-25 7:52 ` Juri Linkov
@ 2024-12-26 1:00 ` Dmitry Gutov
2024-12-27 7:42 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Dmitry Gutov @ 2024-12-26 1:00 UTC (permalink / raw)
To: Juri Linkov; +Cc: Yuan Fu, 74963
On 25/12/2024 09:52, Juri Linkov wrote:
>> Given that we're already calling a predicate every time (in
>> ruby-ts-mode), we might as well add one more check. See the patch at the
>> end.
>
> Thanks, I tried the patch. It was broken, so needed to edit manually.
Maybe something regarding whitespace at the end?
> Also the new key 'w' doesn't work in diff buffers, need to fix it as well.
The binding for 'diff-kill-ring-save'? Seems to work here, as long as
the diff buffer is in read-only mode.
>> Adding the check for "named" fixes the first two cases, but then C-M-f
>> inside "def" jumps to after "baaz". Could be worked around with a
>> special case, but I wonder what this difference comes from (haven't
>> properly debugged yet).
>
> I see no problems with your patch. Everything works nicely.
Hmm, I can't reproduce it either anymore.
Thanks for testing, pushed to master now (unfortunately the commit
message refers to bug#73404).
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
2024-12-26 1:00 ` Dmitry Gutov
@ 2024-12-27 7:42 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-27 7:42 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: Yuan Fu, 74963
>>> Given that we're already calling a predicate every time (in
>>> ruby-ts-mode), we might as well add one more check. See the patch at the
>>> end.
>> Thanks, I tried the patch. It was broken, so needed to edit manually.
>
> Maybe something regarding whitespace at the end?
Something with whitespace, but not a big problem.
>> Also the new key 'w' doesn't work in diff buffers, need to fix it as well.
>
> The binding for 'diff-kill-ring-save'? Seems to work here, as long as the
> diff buffer is in read-only mode.
Yes, 'W' with 'diff-kill-ring-save'. Single keys are still a problem
in visited diff files.
>>> Adding the check for "named" fixes the first two cases, but then C-M-f
>>> inside "def" jumps to after "baaz". Could be worked around with a
>>> special case, but I wonder what this difference comes from (haven't
>>> properly debugged yet).
>> I see no problems with your patch. Everything works nicely.
>
> Hmm, I can't reproduce it either anymore.
>
> Thanks for testing, pushed to master now (unfortunately the commit message
> refers to bug#73404).
Thanks. Maybe a helper for other ts-modes will be handy:
(defun treesit-node-named (node)
(treesit-node-check node 'named))
to be used like this
(sexp ,(cons
(treesit-match-nodes strings)
'treesit-node-named))
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-25 19:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-27 7:54 ` Juri Linkov
2024-12-29 17:58 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-27 7:54 UTC (permalink / raw)
To: Stefan Monnier
Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404
[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]
>>> Gentle reminder that `forward-sexp` is not a "list-navigation" function.
>>> That would be `forward-list`. We very often use sexp commands and
>>> functions to manipulate non-lists such as identifiers.
>> Do you think it would be better to override low-level functions 'scan-lists'
>> and 'scan-sexps' with new variables like 'scan-lists-function'
>> and 'scan-sexps-function', instead of adding more variables for
>> overriding top-level commands such as a new variable 'forward-list-function'
>> and 'down-list-function', like the existing 'forward-sexp-function'?
>
> Don't know.
I tried to override `scan-lists` and `scan-sexps` with advices
(a proof of concept attached below), and it works nicely.
For example, `C-M-d` moves down to the HTML element in html-ts-mode.
But then I realized there are not too many places where such
overriding might be useful. In fact, there is only 1 place
in `show-paren--default` that uses `scan-sexps`. But even this
occurrence can't be used, so I created `treesit-show-paren-data`
for `show-paren-data-function` in bug#75122.
And overriding `scan-lists` is useful only in `forward-list`,
`down-list` and `up-list`. That's all. So clearly instead
of overriding `scan-lists` and `scan-sexps`, better would be
to add 3 new variables: `forward-list-function`,
`down-list-function` and `up-list-function`.
This gives the users more flexibility to choose for example navigation
for C-M-f with a limited number of treesit lists + syntax symbols, and
for C-M-n everything that is a list in treesit. This means that with
start_atimer (-!-enum atimer_type type, struct timespec timestamp)
C-M-f could skip only the next symbol and move to
start_atimer (enum-!- atimer_type type, struct timespec timestamp)
while C-M-n could skip the whole next `parameter_declaration`
start_atimer (enum atimer_type type-!-, struct timespec timestamp)
or vice versa depending on the values of all new options `...-function`
in ts-modes.
> What I do know is that in general we'd also want an `up-sexp` operation.
> Currently we have an ugly kludge in `up-list` to try and use
> `forward-sexp-function` (which is ugly both because
> `forward-sexp-function` doesn't really provide the functionality we
> need, and because it mixes up sexp and list navigation), and it would be
> good to clean it up.
Agreed, this distinction is required for treesit. Hopefully,
this can be achieved by separating `forward-sexp-function`
and `up-list-function` that in ts-modes could be set to new
functions either `treesit-up-list` or `treesit-up-sexp`.
PS: will try to refactor everything in this attachment to new
`forward-list-function`, `down-list-function` and `up-list-function`:
[-- Attachment #2: scan-lists-advice.el --]
[-- Type: application/emacs-lisp, Size: 1594 bytes --]
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-25 17:36 ` Juri Linkov
@ 2024-12-27 7:59 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-27 7:59 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Stefan Monnier
>> I can’t think of an counter example other than for_statment in
>> tree-sitter-c.
>
> Indeed, the 'for_statement' is a hard problem, and I see
> no good solution.
A possible solution would be to find a way to create "virtual" nodes.
This means transforming the existing syntax tree by inserting
addition nodes to it. For example, transforming
(for_statement "for" "("
condition: (assignment_expression left: (identifier) operator: "=" right: (number_literal))
;
body: (binary_expression left: (identifier) operator: "<" right: (number_literal))
;
(update_expression operator: "++" argument: (identifier))
")"
into
(for_statement "for"
(for_parameters "("
condition: (assignment_expression left: (identifier) operator: "=" right: (number_literal))
;
body: (binary_expression left: (identifier) operator: "<" right: (number_literal))
;
(update_expression operator: "++" argument: (identifier))
")"
by inserting the virtual node "for_parameters". Not sure
if such transformation is supported by tree-sitter.
Generally, we have two problems with syntax trees created
by the authors of tree-sitter grammars:
1. Insufficient structure
2. Excessive structure
For insufficient structure a possible solution would be to insert
virtual nodes like above. And for excessive structure we need
to flatten some nodes. For example, in c-ts-mode:
static -!-struct atimer *free_atimers;
'C-M-f' unexpectedly jumped not to the next symbol, but to
static struct atimer-!- *free_atimers;
because of too much structure built in the syntax tree:
(declaration
type: (storage_class_specifier "static")
declarator: (struct_specifier "struct" name: (type_identifier))
(pointer_declarator "*" declarator: (identifier))
";")
So the solution could be to flatten 'struct_specifier' to move
'type_identifier' to be a sibling in the same list inside 'declaration':
(declaration
type: (storage_class_specifier "static")
declarator: (struct_specifier "struct")
(type_identifier)
(pointer_declarator "*" declarator: (identifier))
";")
Also not sure if such thing is possible in tree-sitter.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-27 7:54 ` Juri Linkov
@ 2024-12-29 17:58 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-29 17:58 UTC (permalink / raw)
To: Stefan Monnier
Cc: Theodor Thornhill, Yuan Fu, Mickey Petersen, Eli Zaretskii, 73404
> So clearly instead of overriding `scan-lists` and `scan-sexps`,
> better would be to add 3 new variables: `forward-list-function`,
> `down-list-function` and `up-list-function`.
Ok, now these 3 functions are added in lisp.el and overridden in treesit.el.
>> What I do know is that in general we'd also want an `up-sexp` operation.
>> Currently we have an ugly kludge in `up-list` to try and use
>> `forward-sexp-function` (which is ugly both because
>> `forward-sexp-function` doesn't really provide the functionality we
>> need, and because it mixes up sexp and list navigation), and it would be
>> good to clean it up.
>
> Agreed, this distinction is required for treesit. Hopefully,
> this can be achieved by separating `forward-sexp-function`
> and `up-list-function` that in ts-modes could be set to new
> functions either `treesit-up-list` or `treesit-up-sexp`.
Like `treesit-forward-sexp-list` uses `forward-sexp-default-function`
to move between symbols inside lists, I'm going to add syntax-based
fallback in `treesit-down-list` and `treesit-up-list` as well.
This is useful in strings and comments. This is what `up-list`
already does. Unfortunately, currently there is such limitation
in `down-list`:
(when (ppss-comment-or-string-start (syntax-ppss))
(user-error "This command doesn't work in strings or comments"))
But this is not a problem for `treesit-down-list`.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-19 7:34 ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
2024-12-24 19:05 ` Juri Linkov
@ 2024-12-30 7:15 ` Juri Linkov
2024-12-30 8:00 ` Yuan Fu
` (2 more replies)
1 sibling, 3 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-30 7:15 UTC (permalink / raw)
To: Yuan Fu
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
> Also added 'sexp-list' to c-ts-mode, js-ts-mode, ruby-ts-mode and
> html-ts-mode. Addition of 'sexp-list' to other ts-modes is underway.
BTW, my initial intention was to add the thing 'list'.
But then I discovered that (treesit-thing-next (point) 'list)
uses the function 'list' instead of the thing 'list'.
However, the current 'sexp-list' as a two-word composite is too ugly.
Now I found a better replacement: 'group'. This word is already used
in lisp.el such as in the docstring of 'forward-list':
"Move forward across one balanced group of parentheses."
And the error messages: "No next group".
So exactly the same message will be used by
(format-message "No next %S" pred)
where 'pred' will be 'group'. IMHO, this looks better:
(setq-local treesit-thing-settings
`((html
(sexp ,(regexp-opt '("element" "text" "attribute" "value")))
(group ,(regexp-opt '("element"")))
(sentence "tag")
(text ,(regexp-opt '("comment" "text"))))))
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 7:15 ` Juri Linkov
@ 2024-12-30 8:00 ` Yuan Fu
2025-01-04 17:46 ` Juri Linkov
2024-12-30 13:40 ` Eli Zaretskii
2024-12-30 16:30 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2024-12-30 8:00 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
> On Dec 29, 2024, at 11:15 PM, Juri Linkov <juri@linkov.net> wrote:
>
>> Also added 'sexp-list' to c-ts-mode, js-ts-mode, ruby-ts-mode and
>> html-ts-mode. Addition of 'sexp-list' to other ts-modes is underway.
>
> BTW, my initial intention was to add the thing 'list'.
> But then I discovered that (treesit-thing-next (point) 'list)
> uses the function 'list' instead of the thing 'list'.
>
> However, the current 'sexp-list' as a two-word composite is too ugly.
> Now I found a better replacement: 'group'. This word is already used
> in lisp.el such as in the docstring of 'forward-list':
>
> "Move forward across one balanced group of parentheses."
>
> And the error messages: "No next group".
>
> So exactly the same message will be used by
>
> (format-message "No next %S" pred)
>
> where 'pred' will be 'group'. IMHO, this looks better:
>
> (setq-local treesit-thing-settings
> `((html
> (sexp ,(regexp-opt '("element" "text" "attribute" "value")))
> (group ,(regexp-opt '("element"")))
> (sentence "tag")
> (text ,(regexp-opt '("comment" "text"))))))
I don’t have an opinion on this. Groups sounds a bit abstract, but so does sexp-list. So as long as we do a good job explaining the concept, it should be fine.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 7:15 ` Juri Linkov
2024-12-30 8:00 ` Yuan Fu
@ 2024-12-30 13:40 ` Eli Zaretskii
2024-12-30 18:54 ` Juri Linkov
2024-12-30 16:30 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2024-12-30 13:40 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, 73404, monnier
> From: Juri Linkov <juri@linkov.net>
> Cc: Theodor Thornhill <theo@thornhill.no>, Eli Zaretskii <eliz@gnu.org>,
> Mickey Petersen <mickey@masteringemacs.org>, 73404@debbugs.gnu.org,
> Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Mon, 30 Dec 2024 09:15:42 +0200
>
> BTW, my initial intention was to add the thing 'list'.
> But then I discovered that (treesit-thing-next (point) 'list)
> uses the function 'list' instead of the thing 'list'.
>
> However, the current 'sexp-list' as a two-word composite is too ugly.
> Now I found a better replacement: 'group'. This word is already used
> in lisp.el such as in the docstring of 'forward-list':
>
> "Move forward across one balanced group of parentheses."
>
> And the error messages: "No next group".
"Group" is too abstract and too removed from the actual thing. I'm
quite sure we can do better.
What are the possible kinds of "groups", which are not balanced
parenthetical expressions? Can you show a more-or-less exhaustive
list of them? With that, we could try looking for a proper
terminology.
Thanks.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 7:15 ` Juri Linkov
2024-12-30 8:00 ` Yuan Fu
2024-12-30 13:40 ` Eli Zaretskii
@ 2024-12-30 16:30 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-30 18:59 ` Juri Linkov
2 siblings, 1 reply; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-30 16:30 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404
> BTW, my initial intention was to add the thing 'list'.
> But then I discovered that (treesit-thing-next (point) 'list)
> uses the function 'list' instead of the thing 'list'.
I don't have a good suggestion for the actual naming.
Regarding the fact that this arg can take a either symbol or a function
(which suffers from a risk of ambiguity, like you discovered), I think
it's very important to try and avoid the intersection of the two, and
a "standard" way to do that is to use keywords, like `:list`.
Stefan
PS: Tho, strictly speaking you can `(defun :list ...)`.
I just hope noone ever does that (although I plead guilty to getting
dangerously close to it when I suggested this very idea to Philip for
`setup.el`).
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 13:40 ` Eli Zaretskii
@ 2024-12-30 18:54 ` Juri Linkov
2024-12-30 19:36 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2024-12-30 18:54 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, 73404, monnier
>> BTW, my initial intention was to add the thing 'list'.
>> But then I discovered that (treesit-thing-next (point) 'list)
>> uses the function 'list' instead of the thing 'list'.
>>
>> However, the current 'sexp-list' as a two-word composite is too ugly.
>> Now I found a better replacement: 'group'. This word is already used
>> in lisp.el such as in the docstring of 'forward-list':
>>
>> "Move forward across one balanced group of parentheses."
>>
>> And the error messages: "No next group".
>
> "Group" is too abstract and too removed from the actual thing. I'm
> quite sure we can do better.
The proper name is "list". Since it can't be used directly,
the second best variant is "group".
> What are the possible kinds of "groups", which are not balanced
> parenthetical expressions?
All groups should be balanced. Even in languages that don't use
parens and brackets, e.g. "def" and "end" are balanced
while these are not parenthetical expressions.
> Can you show a more-or-less exhaustive list of them?
A exhaustive list is not yet finished. But basically most
groups are parenthetical expressions while some don't use parens.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 16:30 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-30 18:59 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2024-12-30 18:59 UTC (permalink / raw)
To: Stefan Monnier
Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404
>> BTW, my initial intention was to add the thing 'list'.
>> But then I discovered that (treesit-thing-next (point) 'list)
>> uses the function 'list' instead of the thing 'list'.
>
> I don't have a good suggestion for the actual naming.
>
> Regarding the fact that this arg can take a either symbol or a function
> (which suffers from a risk of ambiguity, like you discovered), I think
> it's very important to try and avoid the intersection of the two, and
> a "standard" way to do that is to use keywords, like `:list`.
Then for backward-compatibility for existing things could
support both variants, e.g. 'sentence' and ':sentence',
'text' and ':text' (luckily no one yet defined such functions).
And for the new thing to use ':list', e.g.
(setq-local treesit-thing-settings
`((html
(:sexp ,(regexp-opt '("element" "text" "attribute" "value")))
(:list ,(regexp-opt '("element"")))
(:sentence "tag")
(:text ,(regexp-opt '("comment" "text"))))))
Another variant to avoid ambiguity would be to use a special syntax
in calls, e.g. instead of (treesit-thing-next pos 'list)
to use (treesit-thing-next pos '(thing list)).
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 18:54 ` Juri Linkov
@ 2024-12-30 19:36 ` Eli Zaretskii
0 siblings, 0 replies; 106+ messages in thread
From: Eli Zaretskii @ 2024-12-30 19:36 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, 73404, monnier
> From: Juri Linkov <juri@linkov.net>
> Cc: casouri@gmail.com, theo@thornhill.no, mickey@masteringemacs.org,
> 73404@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Mon, 30 Dec 2024 20:54:39 +0200
>
> > "Group" is too abstract and too removed from the actual thing. I'm
> > quite sure we can do better.
>
> The proper name is "list". Since it can't be used directly,
> the second best variant is "group".
Let's delay this part of the discussion until we see enough examples
of those "lists" or "groups".
> > What are the possible kinds of "groups", which are not balanced
> > parenthetical expressions?
>
> All groups should be balanced. Even in languages that don't use
> parens and brackets, e.g. "def" and "end" are balanced
> while these are not parenthetical expressions.
>
> > Can you show a more-or-less exhaustive list of them?
>
> A exhaustive list is not yet finished. But basically most
> groups are parenthetical expressions while some don't use parens.
OK, maybe "exhaustive" is too much. Can you post a list that you have
now?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2024-12-30 8:00 ` Yuan Fu
@ 2025-01-04 17:46 ` Juri Linkov
2025-01-04 19:05 ` Eli Zaretskii
2025-01-05 7:37 ` Juri Linkov
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-04 17:46 UTC (permalink / raw)
To: Yuan Fu
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
Stefan Monnier
>> BTW, my initial intention was to add the thing 'list'.
>> But then I discovered that (treesit-thing-next (point) 'list)
>> uses the function 'list' instead of the thing 'list'.
>>
>> However, the current 'sexp-list' as a two-word composite is too ugly.
>> Now I found a better replacement: 'group'. This word is already used
>> in lisp.el such as in the docstring of 'forward-list':
>
> I don’t have an opinion on this. Groups sounds a bit abstract, but so
> does sexp-list.
Indeed, the right name is 'list'. So let's use it.
The minimal change that doesn't require updating
all existing definitions of treesit-thing-settings,
is just to add an exception for 'list':
diff --git a/src/treesit.c b/src/treesit.c
index b3214dad836..2d15e64180d 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -3618,7 +3618,7 @@ treesit_traverse_validate_predicate (Lisp_Object pred,
}
if (STRINGP (pred))
return true;
- else if (FUNCTIONP (pred))
+ else if (FUNCTIONP (pred) && !BASE_EQ (pred, Qlist))
return true;
else if (SYMBOLP (pred))
{
@@ -3722,7 +3722,7 @@ treesit_traverse_match_predicate (TSTreeCursor *cursor, Lisp_Object pred,
const char *type = ts_node_type (node);
return fast_c_string_match (pred, type, strlen (type)) >= 0;
}
- else if (FUNCTIONP (pred))
+ else if (FUNCTIONP (pred) && !BASE_EQ (pred, Qlist))
{
Lisp_Object lisp_node = make_treesit_node (parser, node);
return !NILP (CALLN (Ffuncall, pred, lisp_node));
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-04 17:46 ` Juri Linkov
@ 2025-01-04 19:05 ` Eli Zaretskii
2025-01-05 7:32 ` Juri Linkov
2025-01-05 7:37 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2025-01-04 19:05 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, 73404, monnier
> From: Juri Linkov <juri@linkov.net>
> Cc: Theodor Thornhill <theo@thornhill.no>, Eli Zaretskii <eliz@gnu.org>,
> Mickey Petersen <mickey@masteringemacs.org>, 73404@debbugs.gnu.org,
> Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sat, 04 Jan 2025 19:46:12 +0200
>
> >> BTW, my initial intention was to add the thing 'list'.
> >> But then I discovered that (treesit-thing-next (point) 'list)
> >> uses the function 'list' instead of the thing 'list'.
> >>
> >> However, the current 'sexp-list' as a two-word composite is too ugly.
> >> Now I found a better replacement: 'group'. This word is already used
> >> in lisp.el such as in the docstring of 'forward-list':
> >
> > I don’t have an opinion on this. Groups sounds a bit abstract, but so
> > does sexp-list.
>
> Indeed, the right name is 'list'. So let's use it.
Please don't change the terminology yet. I'd like to try these
commands with some tree-sitter modes and see what names we could use
for them. I didn't yet have time for that, but OTOH there's no rush.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-04 19:05 ` Eli Zaretskii
@ 2025-01-05 7:32 ` Juri Linkov
2025-01-05 11:46 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 13:02 ` Eli Zaretskii
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-05 7:32 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, 73404, monnier
>> >> BTW, my initial intention was to add the thing 'list'.
>> >> But then I discovered that (treesit-thing-next (point) 'list)
>> >> uses the function 'list' instead of the thing 'list'.
>> >>
>> >> However, the current 'sexp-list' as a two-word composite is too ugly.
>> >> Now I found a better replacement: 'group'. This word is already used
>> >> in lisp.el such as in the docstring of 'forward-list':
>> >
>> > I don’t have an opinion on this. Groups sounds a bit abstract, but so
>> > does sexp-list.
>>
>> Indeed, the right name is 'list'. So let's use it.
>
> Please don't change the terminology yet.
This doesn't change the terminology. The ugly name was added a week ago.
We need to fix it to the right name. The right name is 'list' to
conform to the function name 'forward-list'.
> I'd like to try these commands with some tree-sitter modes and see
> what names we could use for them.
The command is 'forward-list'. With 'list' it will do in ts-modes
the same that 'forward-list' already does in non-ts modes.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-04 17:46 ` Juri Linkov
2025-01-04 19:05 ` Eli Zaretskii
@ 2025-01-05 7:37 ` Juri Linkov
2025-01-05 14:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-05 7:37 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Stefan Monnier
[-- Attachment #1: Type: text/plain, Size: 455 bytes --]
> The minimal change that doesn't require updating
> all existing definitions of treesit-thing-settings,
> is just to add an exception for 'list':
> [...]
> @@ -3618,7 +3618,7 @@ treesit_traverse_validate_predicate (Lisp_Object pred,
> - else if (FUNCTIONP (pred))
> + else if (FUNCTIONP (pred) && !BASE_EQ (pred, Qlist))
I admit that hard-coding one symbol is not the right thing to do.
So here is a better patch that checks for the symbol property:
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-predicate.patch --]
[-- Type: text/x-diff, Size: 1593 bytes --]
diff --git a/lisp/treesit.el b/lisp/treesit.el
index e643eb48654..169607b150e 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2516,6 +2516,8 @@ treesit-forward-sexp-list
(treesit--scan-error pred arg)))
(setq cnt (- cnt inc)))))
+(put 'list 'treesit-predicate t)
+
(defun treesit-forward-list (&optional arg)
"Move forward across a list.
What constitutes a list is determined by `sexp-list' in
diff --git a/src/treesit.c b/src/treesit.c
index b3214dad836..4d2cd45e1b2 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -3618,7 +3618,7 @@ treesit_traverse_validate_predicate (Lisp_Object pred,
}
if (STRINGP (pred))
return true;
- else if (FUNCTIONP (pred))
+ else if (FUNCTIONP (pred) && !(SYMBOLP (pred) && Fget (pred, Qtreesit_predicate)))
return true;
else if (SYMBOLP (pred))
{
@@ -3722,7 +3722,7 @@ treesit_traverse_match_predicate (TSTreeCursor *cursor, Lisp_Object pred,
const char *type = ts_node_type (node);
return fast_c_string_match (pred, type, strlen (type)) >= 0;
}
- else if (FUNCTIONP (pred))
+ else if (FUNCTIONP (pred) && !(SYMBOLP (pred) && Fget (pred, Qtreesit_predicate)))
{
Lisp_Object lisp_node = make_treesit_node (parser, node);
return !NILP (CALLN (Ffuncall, pred, lisp_node));
@@ -4333,6 +4333,8 @@ syms_of_treesit (void)
DEFSYM (Qtreesit_invalid_predicate, "treesit-invalid-predicate");
DEFSYM (Qtreesit_predicate_not_found, "treesit-predicate-not-found");
+ DEFSYM (Qtreesit_predicate, "treesit-predicate");
+
DEFSYM (Qor, "or");
#ifdef WINDOWSNT
^ permalink raw reply related [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 7:32 ` Juri Linkov
@ 2025-01-05 11:46 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 12:18 ` Eli Zaretskii
2025-01-05 17:59 ` Juri Linkov
2025-01-05 13:02 ` Eli Zaretskii
1 sibling, 2 replies; 106+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 11:46 UTC (permalink / raw)
To: Juri Linkov, Eli Zaretskii; +Cc: casouri, mickey, 73404, monnier
Juri Linkov <juri@linkov.net> writes:
>>> >> BTW, my initial intention was to add the thing 'list'.
>>> >> But then I discovered that (treesit-thing-next (point) 'list)
>>> >> uses the function 'list' instead of the thing 'list'.
>>> >>
>>> >> However, the current 'sexp-list' as a two-word composite is too ugly.
>>> >> Now I found a better replacement: 'group'. This word is already used
>>> >> in lisp.el such as in the docstring of 'forward-list':
>>> >
>>> > I don’t have an opinion on this. Groups sounds a bit abstract, but so
>>> > does sexp-list.
>>>
>>> Indeed, the right name is 'list'. So let's use it.
>>
>> Please don't change the terminology yet.
>
> This doesn't change the terminology. The ugly name was added a week ago.
> We need to fix it to the right name. The right name is 'list' to
> conform to the function name 'forward-list'.
>
>> I'd like to try these commands with some tree-sitter modes and see
>> what names we could use for them.
>
> The command is 'forward-list'. With 'list' it will do in ts-modes
> the same that 'forward-list' already does in non-ts modes.
One issue I've had when exploring these things earlier is that I don't
know the "spec" for how navigation in programming languages should work
in Emacs. I mean, there are lots of examples in the source code, but no
true specification. Should we possibly try to define some sane defaults
here, so that it will be simpler to deal with idiosyncrasies of the tree
sitter parsers? Lisp isn't in my experience the best language to define
this, as it is just too simple. Also, many of the functions are related
to word processing, like sentence, word, paragraph etc, which aren't
really useful in programming, imo. What is a sentence in java, lisp, or
python, for example? Also, maybe sexp is too lisp/ast-like of a term, so
much so that it is hard to reason about what a sexp is or isn't.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 11:46 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 12:18 ` Eli Zaretskii
2025-01-05 12:30 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 17:59 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2025-01-05 12:18 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: casouri, mickey, 73404, monnier, juri
> X-Spam-Status: No, score=-1 tagged_above=-10 required=5 tests=[ALL_TRUSTED=-1]
> autolearn=ham autolearn_force=no
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: casouri@gmail.com, mickey@masteringemacs.org, 73404@debbugs.gnu.org,
> monnier@iro.umontreal.ca
> Date: Sun, 05 Jan 2025 12:46:03 +0100
>
> Juri Linkov <juri@linkov.net> writes:
>
> > The command is 'forward-list'. With 'list' it will do in ts-modes
> > the same that 'forward-list' already does in non-ts modes.
>
> One issue I've had when exploring these things earlier is that I don't
> know the "spec" for how navigation in programming languages should work
> in Emacs. I mean, there are lots of examples in the source code, but no
> true specification. Should we possibly try to define some sane defaults
> here, so that it will be simpler to deal with idiosyncrasies of the tree
> sitter parsers? Lisp isn't in my experience the best language to define
> this, as it is just too simple. Also, many of the functions are related
> to word processing, like sentence, word, paragraph etc, which aren't
> really useful in programming, imo. What is a sentence in java, lisp, or
> python, for example? Also, maybe sexp is too lisp/ast-like of a term, so
> much so that it is hard to reason about what a sexp is or isn't.
Is this goal achievable in practice,l given the sometimes radical
differences between languages? Maybe we only can have examples and
some dwim-ish behavior in most cases?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 12:18 ` Eli Zaretskii
@ 2025-01-05 12:30 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 16:59 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 12:30 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, mickey, 73404, monnier, juri
Eli Zaretskii <eliz@gnu.org> writes:
>> X-Spam-Status: No, score=-1 tagged_above=-10 required=5 tests=[ALL_TRUSTED=-1]
>> autolearn=ham autolearn_force=no
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: casouri@gmail.com, mickey@masteringemacs.org, 73404@debbugs.gnu.org,
>> monnier@iro.umontreal.ca
>> Date: Sun, 05 Jan 2025 12:46:03 +0100
>>
>> Juri Linkov <juri@linkov.net> writes:
>>
>> > The command is 'forward-list'. With 'list' it will do in ts-modes
>> > the same that 'forward-list' already does in non-ts modes.
>>
>> One issue I've had when exploring these things earlier is that I don't
>> know the "spec" for how navigation in programming languages should work
>> in Emacs. I mean, there are lots of examples in the source code, but no
>> true specification. Should we possibly try to define some sane defaults
>> here, so that it will be simpler to deal with idiosyncrasies of the tree
>> sitter parsers? Lisp isn't in my experience the best language to define
>> this, as it is just too simple. Also, many of the functions are related
>> to word processing, like sentence, word, paragraph etc, which aren't
>> really useful in programming, imo. What is a sentence in java, lisp, or
>> python, for example? Also, maybe sexp is too lisp/ast-like of a term, so
>> much so that it is hard to reason about what a sexp is or isn't.
>
> Is this goal achievable in practice,l given the sometimes radical
> differences between languages? Maybe we only can have examples and
> some dwim-ish behavior in most cases?
I'm not sure if it is achievable or not, but I think it'd be nice to
avoid any "hand-waviness" to confuse us while we talk about
sexps/lists/etc. I could try to document thoroughly how lisp-modes and
cc modes implement these and see if we have some glaring discrepancies
between just those two?
I'm just thinking that describing the as-is could be very beneficial, so
that we know we're talking the same abstractions. It could start just as
a scratch/movements.org file, not touching any code until you, stefan
and others agree that we're talking about the same stuff.
Then we can at least consult this document when we stumble upon things
like jsx-text, whitespace etc. After this we can proceed with mickey's
combobulate and friends to inform us on what the best way to deal with
it is? Maybe we can discover som simplifications in core too.
WDYT?
Theo
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 7:32 ` Juri Linkov
2025-01-05 11:46 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 13:02 ` Eli Zaretskii
2025-01-05 18:05 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2025-01-05 13:02 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, 73404, monnier
> From: Juri Linkov <juri@linkov.net>
> Cc: casouri@gmail.com, theo@thornhill.no, mickey@masteringemacs.org,
> 73404@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Sun, 05 Jan 2025 09:32:58 +0200
>
> >> >> BTW, my initial intention was to add the thing 'list'.
> >> >> But then I discovered that (treesit-thing-next (point) 'list)
> >> >> uses the function 'list' instead of the thing 'list'.
> >> >>
> >> >> However, the current 'sexp-list' as a two-word composite is too ugly.
> >> >> Now I found a better replacement: 'group'. This word is already used
> >> >> in lisp.el such as in the docstring of 'forward-list':
> >> >
> >> > I don’t have an opinion on this. Groups sounds a bit abstract, but so
> >> > does sexp-list.
> >>
> >> Indeed, the right name is 'list'. So let's use it.
> >
> > Please don't change the terminology yet.
>
> This doesn't change the terminology. The ugly name was added a week ago.
> We need to fix it to the right name. The right name is 'list' to
> conform to the function name 'forward-list'.
>
> > I'd like to try these commands with some tree-sitter modes and see
> > what names we could use for them.
>
> The command is 'forward-list'. With 'list' it will do in ts-modes
> the same that 'forward-list' already does in non-ts modes.
Thanks.
I tried this now in c-ts-mode, and it seems to behave strangely in
some cases.
First, AFAICT it supports "lists" enclosed in "(..)", "{..}" and
"<..>", but not "[..]". Is that expected? Why don't we support
"[..]"?
Next, it many times signals an error "No next group", which is less
useful than it could be. For example:
w32_get_internal_run_time (void)
{
if (get_process_times_fn)
{
FILETIME create, exit, kernel, user;
HANDLE proc = GetCurrentProcess ();
if ((*get_process_times_fn) (proc, &create, &exit, &kernel, &user))
{
return ltime (total.QuadPart);
} ^
}
return Fcurrent_time ();
}
If you put point on the semi-colon marked by "^", C-M-n signals an
error. Can't we instead move to the next "list" after
"Fcurrent_time"?
Also, it sometimes "misses" what looks like a "list". For example:
s_pfn_Open_Process_Token = (OpenProcessToken_Proc)
get_proc_addr (hm_advapi32, "OpenProcessToken");
If you put point at the beginning of the line, C-M-n moves to the end
of the 2nd parenthesized group, not to the end of the first group.
As for terminology: the Emacs user manual calls these "groupings
delimited by parentheses (or whatever else serves as delimiters in the
language you are working with)", or in short "parenthetical group".
Why cannot we keep this terminology? It sounds like it describes its
subject as accurate as we can come up with.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 7:37 ` Juri Linkov
@ 2025-01-05 14:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 18:10 ` Juri Linkov
2025-01-06 8:56 ` Yuan Fu
0 siblings, 2 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 14:40 UTC (permalink / raw)
To: Juri Linkov
Cc: Theodor Thornhill, Yuan Fu, Mickey Petersen, Eli Zaretskii, 73404
> I admit that hard-coding one symbol is not the right thing to do.
> So here is a better patch that checks for the symbol property:
That's better but I still wonder why we don't want to use a keyword.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 12:30 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 16:59 ` Eli Zaretskii
2025-01-05 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 21:18 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 2 replies; 106+ messages in thread
From: Eli Zaretskii @ 2025-01-05 16:59 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: casouri, mickey, 73404, monnier, juri
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: juri@linkov.net, casouri@gmail.com, mickey@masteringemacs.org,
> 73404@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Sun, 05 Jan 2025 13:30:31 +0100
>
> > Is this goal achievable in practice, given the sometimes radical
> > differences between languages? Maybe we only can have examples and
> > some dwim-ish behavior in most cases?
>
> I'm not sure if it is achievable or not, but I think it'd be nice to
> avoid any "hand-waviness" to confuse us while we talk about
> sexps/lists/etc. I could try to document thoroughly how lisp-modes and
> cc modes implement these and see if we have some glaring discrepancies
> between just those two?
They are not "discrepancies", and certainly not "glaring", but there
are differences. E.g., in c-ts-mode, we support "lists" enclosed in
(..), {..}, and <..> (but not [..], for some reason). You will not
find anything like that in Lisp-like languages.
So, if we want to have a broader picture, we will need to try more
modes. Can you or someone produce an exhaustive list of all TS modes
we have that implement these movement commands? Then we could try the
commands in each one of those modes and see if there are common
patterns that we can name and describe.
For now, my conclusion was that (almost) every parenthetical grouping
supported by the language grammar can be moved across with these
commands. But I only tried Lisp and c-ts-mode.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 11:46 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 12:18 ` Eli Zaretskii
@ 2025-01-05 17:59 ` Juri Linkov
2025-01-05 19:50 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-05 17:59 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: Eli Zaretskii, mickey, casouri, monnier, 73404
>> The command is 'forward-list'. With 'list' it will do in ts-modes
>> the same that 'forward-list' already does in non-ts modes.
>
> One issue I've had when exploring these things earlier is that I don't
> know the "spec" for how navigation in programming languages should work
> in Emacs. I mean, there are lots of examples in the source code, but no
> true specification. Should we possibly try to define some sane defaults
> here, so that it will be simpler to deal with idiosyncrasies of the tree
> sitter parsers? Lisp isn't in my experience the best language to define
> this, as it is just too simple.
Very simple: I'm striving to provide compatibility between ts-modes
with their predecessor non-ts modes. So the de-facto "spec" is the
existing non-ts modes that provide canonical reference implementation.
This helps the users to more smoothly switch to ts-modes.
> Also, many of the functions are related to word processing, like
> sentence, word, paragraph etc, which aren't really useful in
> programming, imo. What is a sentence in java, lisp, or python, for
> example? Also, maybe sexp is too lisp/ast-like of a term, so much so
> that it is hard to reason about what a sexp is or isn't.
Indeed, sentences and paragraphs are even more fuzzy concepts when applied
to programming languages. But some modes make sense of these things
nevertheless. For example, in both c-mode and c-ts-mode 'M-e'
moves to the end of statement that usually end with a semicolon;
so your intuition about these things looks correct.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 13:02 ` Eli Zaretskii
@ 2025-01-05 18:05 ` Juri Linkov
2025-01-05 19:54 ` Eli Zaretskii
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-05 18:05 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, 73404, monnier
>> The command is 'forward-list'. With 'list' it will do in ts-modes
>> the same that 'forward-list' already does in non-ts modes.
>
> Thanks.
>
> I tried this now in c-ts-mode, and it seems to behave strangely in
> some cases.
>
> First, AFAICT it supports "lists" enclosed in "(..)", "{..}" and
> "<..>", but not "[..]". Is that expected? Why don't we support
> "[..]"?
This looks like a bug. It would be nice if you posted an example.
> Next, it many times signals an error "No next group", which is less
> useful than it could be. For example:
>
> w32_get_internal_run_time (void)
> {
> if (get_process_times_fn)
> {
> FILETIME create, exit, kernel, user;
> HANDLE proc = GetCurrentProcess ();
> if ((*get_process_times_fn) (proc, &create, &exit, &kernel, &user))
> {
> return ltime (total.QuadPart);
> } ^
> }
>
> return Fcurrent_time ();
> }
>
> If you put point on the semi-colon marked by "^", C-M-n signals an
> error. Can't we instead move to the next "list" after
> "Fcurrent_time"?
'C-M-n' doesn't move to the next statement in c-mode,
since we need to support backward-compatibility. The key
that moves to the next statement ignoring nested lists
in C modes is 'M-e'.
> Also, it sometimes "misses" what looks like a "list". For example:
>
> s_pfn_Open_Process_Token = (OpenProcessToken_Proc)
> get_proc_addr (hm_advapi32, "OpenProcessToken");
>
> If you put point at the beginning of the line, C-M-n moves to the end
> of the 2nd parenthesized group, not to the end of the first group.
Welcome to the twisty maze of tree-sitter grammars. C-M-n is unable
to move to the first group because it's inside the node in the AST:
(cast_expression "("
type: (type_descriptor type: (type_identifier))
")"
value:
(call_expression function: (identifier)
arguments:
(argument_list "(" (identifier) ,
(string_literal " (string_content) ")
")")))
Both the first and the 2nd parenthesized groups are inside the
"cast_expression" node.
> As for terminology: the Emacs user manual calls these "groupings
> delimited by parentheses (or whatever else serves as delimiters in the
> language you are working with)", or in short "parenthetical group".
> Why cannot we keep this terminology? It sounds like it describes its
> subject as accurate as we can come up with.
So you prefer "group" over "list"? Then 'forward-list'
should move over "group"?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 14:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 18:10 ` Juri Linkov
2025-01-06 8:56 ` Yuan Fu
1 sibling, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-05 18:10 UTC (permalink / raw)
To: Stefan Monnier
Cc: Theodor Thornhill, Yuan Fu, Mickey Petersen, Eli Zaretskii, 73404
>> I admit that hard-coding one symbol is not the right thing to do.
>> So here is a better patch that checks for the symbol property:
>
> That's better but I still wonder why we don't want to use a keyword.
The downside of a keyword is that changing the syntax will require
updating all existing settings of treesit-thing-settings.
But if Yuan agrees to this change, this would be fine.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 16:59 ` Eli Zaretskii
@ 2025-01-05 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 21:18 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 106+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 18:20 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: casouri, mickey, 73404, monnier, juri
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: juri@linkov.net, casouri@gmail.com, mickey@masteringemacs.org,
>> 73404@debbugs.gnu.org, monnier@iro.umontreal.ca
>> Date: Sun, 05 Jan 2025 13:30:31 +0100
>>
>> > Is this goal achievable in practice, given the sometimes radical
>> > differences between languages? Maybe we only can have examples and
>> > some dwim-ish behavior in most cases?
>>
>> I'm not sure if it is achievable or not, but I think it'd be nice to
>> avoid any "hand-waviness" to confuse us while we talk about
>> sexps/lists/etc. I could try to document thoroughly how lisp-modes and
>> cc modes implement these and see if we have some glaring discrepancies
>> between just those two?
>
> They are not "discrepancies", and certainly not "glaring", but there
> are differences. E.g., in c-ts-mode, we support "lists" enclosed in
> (..), {..}, and <..> (but not [..], for some reason). You will not
> find anything like that in Lisp-like languages.
>
I didn't mean to sound harsh, my apoligies.
> So, if we want to have a broader picture, we will need to try more
> modes. Can you or someone produce an exhaustive list of all TS modes
> we have that implement these movement commands? Then we could try the
> commands in each one of those modes and see if there are common
> patterns that we can name and describe.
>
Sure!
> For now, my conclusion was that (almost) every parenthetical grouping
> supported by the language grammar can be moved across with these
> commands. But I only tried Lisp and c-ts-mode.
Yeah, absolutely. I think it is moving in the right direction. What I'm
thinking of is more in the line of standarization as a conscious effort,
not by consequence of convention. The reason being to help aid us when
discussing these issues. I feel we often attribute bugs to our
subjective meaning, perhaps because we are missing some clearer
structure in the expected behavior. I'm absolutely open to being wrong
here, though.
Theo
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 17:59 ` Juri Linkov
@ 2025-01-05 19:50 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 7:54 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 19:50 UTC (permalink / raw)
To: Juri Linkov; +Cc: Eli Zaretskii, mickey, casouri, monnier, 73404
Juri Linkov <juri@linkov.net> writes:
>>> The command is 'forward-list'. With 'list' it will do in ts-modes
>>> the same that 'forward-list' already does in non-ts modes.
>>
>> One issue I've had when exploring these things earlier is that I don't
>> know the "spec" for how navigation in programming languages should work
>> in Emacs. I mean, there are lots of examples in the source code, but no
>> true specification. Should we possibly try to define some sane defaults
>> here, so that it will be simpler to deal with idiosyncrasies of the tree
>> sitter parsers? Lisp isn't in my experience the best language to define
>> this, as it is just too simple.
>
> Very simple: I'm striving to provide compatibility between ts-modes
> with their predecessor non-ts modes. So the de-facto "spec" is the
> existing non-ts modes that provide canonical reference implementation.
> This helps the users to more smoothly switch to ts-modes.
>
Yes, I agree on this goal, absolutely. That was also my initial intent
and attempt when we started doing this. There are still many places
where either the canonical modes are inconsistent but has so much
history that we're so used to it and forget. If we blindly try to copy
everything one to one we might also adopt some stuff that doesn't really
make sense.
However, deciding on what makes sense is hard, which is why I'm
suggesting we try to define a "spec". It needn't really be much more
than reference suggestions for mode implementors at first, then maybe
some intersection of features crystallize across all modes.
>> Also, many of the functions are related to word processing, like
>> sentence, word, paragraph etc, which aren't really useful in
>> programming, imo. What is a sentence in java, lisp, or python, for
>> example? Also, maybe sexp is too lisp/ast-like of a term, so much so
>> that it is hard to reason about what a sexp is or isn't.
>
> Indeed, sentences and paragraphs are even more fuzzy concepts when applied
> to programming languages. But some modes make sense of these things
> nevertheless. For example, in both c-mode and c-ts-mode 'M-e'
> moves to the end of statement that usually end with a semicolon;
> so your intuition about these things looks correct.
Yes, but imo more importantly, disregarding any confusing naming we have
a very rich set of operations available to us that I think should be as
distinct as possible:
- C-M-f
- M-f
- M-a
- M-e
- C-f
- C-a
- C-e
- more
If we'll aim to get the absolute most out of the parse tree
possibilities we should strive to make these distinct, at least down the
line. Some of these are simpler to categorize than others, such as a
sentence meaning "some expression that ends that isn't a function or a
class definition".
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 18:05 ` Juri Linkov
@ 2025-01-05 19:54 ` Eli Zaretskii
2025-01-06 18:02 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Eli Zaretskii @ 2025-01-05 19:54 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, 73404, monnier
> From: Juri Linkov <juri@linkov.net>
> Cc: casouri@gmail.com, theo@thornhill.no, mickey@masteringemacs.org,
> 73404@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Sun, 05 Jan 2025 20:05:53 +0200
>
> >> The command is 'forward-list'. With 'list' it will do in ts-modes
> >> the same that 'forward-list' already does in non-ts modes.
> >
> > Thanks.
> >
> > I tried this now in c-ts-mode, and it seems to behave strangely in
> > some cases.
> >
> > First, AFAICT it supports "lists" enclosed in "(..)", "{..}" and
> > "<..>", but not "[..]". Is that expected? Why don't we support
> > "[..]"?
>
> This looks like a bug. It would be nice if you posted an example.
That's easy:
C-f C-f src/dispnew.c RET
M-x -c-ts-mode RET
C-s [ RET
This brings you to this function:
static void
check_rows (struct frame *f)
{
for (int y = 0; y < f->desired_matrix->nrows; ++y)
if (MATRIX_ROW_ENABLED_P (f->desired_matrix, y))
{
struct glyph_row *row = MATRIX_ROW (f->desired_matrix, y);
for (int x = 0; x < row->used[TEXT_AREA]; ++x)
eassert (row->glyphs[TEXT_AREA][x].frame != 0);
}
}
Move point to any of the [ brackets there and type C-M-n -- you will
be moved to the last ) of the function.
Another example on line 152 in dispnew.c:
struct redisplay_history
{
char trace[512 + 100];
};
Move point to the [ before 512: C-M-n will signal an error.
Any bracket in the file will trigger either the first or the second
behavior.
> > Next, it many times signals an error "No next group", which is less
> > useful than it could be. For example:
> >
> > w32_get_internal_run_time (void)
> > {
> > if (get_process_times_fn)
> > {
> > FILETIME create, exit, kernel, user;
> > HANDLE proc = GetCurrentProcess ();
> > if ((*get_process_times_fn) (proc, &create, &exit, &kernel, &user))
> > {
> > return ltime (total.QuadPart);
> > } ^
> > }
> >
> > return Fcurrent_time ();
> > }
> >
> > If you put point on the semi-colon marked by "^", C-M-n signals an
> > error. Can't we instead move to the next "list" after
> > "Fcurrent_time"?
>
> 'C-M-n' doesn't move to the next statement in c-mode,
> since we need to support backward-compatibility. The key
> that moves to the next statement ignoring nested lists
> in C modes is 'M-e'.
Well, then maybe we could have the more useful behavior as an opt-in
option?
> > Also, it sometimes "misses" what looks like a "list". For example:
> >
> > s_pfn_Open_Process_Token = (OpenProcessToken_Proc)
> > get_proc_addr (hm_advapi32, "OpenProcessToken");
> >
> > If you put point at the beginning of the line, C-M-n moves to the end
> > of the 2nd parenthesized group, not to the end of the first group.
>
> Welcome to the twisty maze of tree-sitter grammars. C-M-n is unable
> to move to the first group because it's inside the node in the AST:
>
> (cast_expression "("
> type: (type_descriptor type: (type_identifier))
> ")"
> value:
> (call_expression function: (identifier)
> arguments:
> (argument_list "(" (identifier) ,
> (string_literal " (string_content) ")
> ")")))
>
> Both the first and the 2nd parenthesized groups are inside the
> "cast_expression" node.
Sorry, I don't follow: the parentheses of the argument list are also
inside a node, and yet we do recognize them. What's the difference?
> > As for terminology: the Emacs user manual calls these "groupings
> > delimited by parentheses (or whatever else serves as delimiters in the
> > language you are working with)", or in short "parenthetical group".
> > Why cannot we keep this terminology? It sounds like it describes its
> > subject as accurate as we can come up with.
>
> So you prefer "group" over "list"?
As a shorthand; the better term is "parenthesized group".
> Then 'forward-list' should move over "group"?
I'm not sure we should rename the commands, given the legacy. But
the doc string should use "group" or "parenthesized group", IMO.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 16:59 ` Eli Zaretskii
2025-01-05 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-05 21:18 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-05 21:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, Theodor Thornhill, 73404, juri
> For now, my conclusion was that (almost) every parenthetical grouping
> supported by the language grammar can be moved across with these
> commands. But I only tried Lisp and c-ts-mode.
Agreed, "list" should probably be anything that makes up an AST node and
is delimited by an opening and a closing "keyword":
(...), [...], if...fi, begin...end, ...
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 19:50 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-06 7:54 ` Juri Linkov
2025-01-06 14:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-06 7:54 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: Eli Zaretskii, mickey, casouri, monnier, 73404
> However, deciding on what makes sense is hard, which is why I'm
> suggesting we try to define a "spec". It needn't really be much more
> than reference suggestions for mode implementors at first, then maybe
> some intersection of features crystallize across all modes.
I don't think it's possible to find a spec better than Stefan already
defined as an AST node delimited by an opening and a closing "keyword":
(...), [...], if...fi, begin...end, ...
>> Indeed, sentences and paragraphs are even more fuzzy concepts when applied
>> to programming languages. But some modes make sense of these things
>> nevertheless. For example, in both c-mode and c-ts-mode 'M-e'
>> moves to the end of statement that usually end with a semicolon;
>> so your intuition about these things looks correct.
>
> Yes, but imo more importantly, disregarding any confusing naming we have
> a very rich set of operations available to us that I think should be as
> distinct as possible:
>
> - C-M-f
> - M-f
> - M-a
> - M-e
> - C-f
> - C-a
> - C-e
> - more
Or when looking at the commands with the 'forward-' prefix:
forward-list (C-M-n)
forward-page (C-x ])
forward-paragraph (M-})
forward-sentence (M-e)
forward-sexp (C-M-f)
forward-symbol
forward-word (M-f)
Interesting that a potentially useful 'forward-symbol' has no binding.
> If we'll aim to get the absolute most out of the parse tree
> possibilities we should strive to make these distinct, at least down the
> line. Some of these are simpler to categorize than others, such as a
> sentence meaning "some expression that ends that isn't a function or a
> class definition".
In languages with statements a sentence corresponds to a statement.
Then a paragraph is a less useful concept since a group of statements
compose a block that corresponds to a list, not a paragraph. Then
probably the definition of a paragraph should remain as a group of lines
separated by empty lines.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 14:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 18:10 ` Juri Linkov
@ 2025-01-06 8:56 ` Yuan Fu
2025-01-07 17:56 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2025-01-06 8:56 UTC (permalink / raw)
To: Stefan Monnier
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Juri Linkov
> On Jan 5, 2025, at 6:40 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> I admit that hard-coding one symbol is not the right thing to do.
>> So here is a better patch that checks for the symbol property:
>
> That's better but I still wonder why we don't want to use a keyword.
>
>
> Stefan
>
My impression is that Elisp has “traditionally” used symbols? Like symbol properties, text properties, thing-at-point, etc. Keywords are really only seen in plists, face attributes, and keyword arguments, all of which involves a key-value relationship.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 7:54 ` Juri Linkov
@ 2025-01-06 14:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 17:40 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 14:14 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
>> However, deciding on what makes sense is hard, which is why I'm
>> suggesting we try to define a "spec". It needn't really be much more
>> than reference suggestions for mode implementors at first, then maybe
>> some intersection of features crystallize across all modes.
>
> I don't think it's possible to find a spec better than Stefan already
> defined as an AST node delimited by an opening and a closing "keyword":
> (...), [...], if...fi, begin...end, ...
IIUC, Theodor was talking more generally than just about "list" operations.
I think "list" operations are among the clearest case (BTW, I wonder if
someone has already tried to implement that functionality for Haskell or
Python, where the opening and a closing "keywords" are basically newlines).
> Or when looking at the commands with the 'forward-' prefix:
>
> forward-sexp (C-M-f)
> forward-list (C-M-n)
OK, we already talked about these two. We could probably start drafting
a kind of guideline.
> forward-symbol
This one doesn't look too hard either.
> forward-sentence (M-e)
This one is trickier. You just provided a reasonably good start with
"In languages with statements a sentence corresponds to a statement",
but there are some non-trivial questions about language without
statements, about use within comments, about how to handle nesting
(e.g. statements within statements) and more generally the use of `M-e`
from outside of statements.
> forward-paragraph (M-})
This one is also tied to filling, so it's important for it to behave
well w.r.t to comments to allow filling actual text paragraphs
within comments.
As for its behavior outside of comments ...
> forward-page (C-x ])
> forward-word (M-f)
I think these should be oblivious to the syntax (i.e. line `forward-line`
and `forward-char`).
Your list missed the "defun" unit of navigation.
One other potentially useful unit of navigation that depends on the
syntax would be "list element", which would do something like skip to
the next/previous separator delimiting elements of the nearest enclosing
paired delimiters. For {...;...;...} blocks, this would behave like
`forward-sentence`, except that when used from within the args of
a function call like a `foo (..., ..., ...)`, it would jump
between arguments instead of skipping all the way to the next statement.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 14:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-06 17:40 ` Juri Linkov
2025-01-06 18:04 ` Yuan Fu
2025-01-06 20:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-06 17:40 UTC (permalink / raw)
To: Stefan Monnier; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
>>> However, deciding on what makes sense is hard, which is why I'm
>>> suggesting we try to define a "spec". It needn't really be much more
>>> than reference suggestions for mode implementors at first, then maybe
>>> some intersection of features crystallize across all modes.
>>
>> I don't think it's possible to find a spec better than Stefan already
>> defined as an AST node delimited by an opening and a closing "keyword":
>> (...), [...], if...fi, begin...end, ...
>
> IIUC, Theodor was talking more generally than just about "list" operations.
> I think "list" operations are among the clearest case (BTW, I wonder if
> someone has already tried to implement that functionality for Haskell or
> Python,
Recently I looked at python-ts-mode, but then discovered that it
completely misses any use of treesit-thing-settings unlike many
other ts-modes. Also there is no treesit-thing-settings yet
in the new haskell-ts-mode.
> where the opening and a closing "keywords" are basically newlines).
The main problem with empty opening and closing keywords is that
nodes from different levels all end up at the same position,
and currently treesit prefers a higher-level node. This means
that the sequence of `C-M-f C-M-b` is not idempotent anymore:
`C-M-b` might jump to the start of a higher-level node.
>> Or when looking at the commands with the 'forward-' prefix:
>>
>> forward-sexp (C-M-f)
>> forward-list (C-M-n)
>
> OK, we already talked about these two. We could probably start drafting
> a kind of guideline.
Also in treesit.el, forward-sexp and forward-list
could share the same implementation.
>> forward-symbol
>
> This one doesn't look too hard either.
>
>> forward-sentence (M-e)
>
> This one is trickier. You just provided a reasonably good start with
> "In languages with statements a sentence corresponds to a statement",
> but there are some non-trivial questions about language without
> statements, about use within comments, about how to handle nesting
> (e.g. statements within statements) and more generally the use of `M-e`
> from outside of statements.
I've checked that `M-e` in text-mode currently doesn't support
sentences in nested lists. For example, `M-e` jumps inside
the list in:
Top sentence. (First inner sentence. Second inner sentence.)
>> forward-paragraph (M-})
>
> This one is also tied to filling, so it's important for it to behave
> well w.r.t to comments to allow filling actual text paragraphs
> within comments.
This provides an opportunity to move treesit-specific code
from `prog-fill-reindent-defun` to treesit.el.
> As for its behavior outside of comments ...
>
>> forward-page (C-x ])
>> forward-word (M-f)
>
> I think these should be oblivious to the syntax (i.e. line `forward-line`
> and `forward-char`).
>
> Your list missed the "defun" unit of navigation.
Because there is no `forward-defun` ;-)
But strange that "defun" still is defined by `treesit-defun-type-regexp`
instead of a "defun" thing in `treesit-thing-settings`.
> One other potentially useful unit of navigation that depends on the
> syntax would be "list element", which would do something like skip to
> the next/previous separator delimiting elements of the nearest enclosing
> paired delimiters. For {...;...;...} blocks, this would behave like
> `forward-sentence`, except that when used from within the args of
> a function call like a `foo (..., ..., ...)`, it would jump
> between arguments instead of skipping all the way to the next statement.
Indeed, there is a need to have an additional command to move
between more ATS nodes, besides the list nodes navigated by 'C-M-n'.
But the problem is to find free keys for more commands.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-05 19:54 ` Eli Zaretskii
@ 2025-01-06 18:02 ` Juri Linkov
2025-01-06 18:45 ` Eli Zaretskii
2025-01-06 20:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-06 18:02 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: mickey, casouri, theo, 73404, monnier
> Another example on line 152 in dispnew.c:
>
> struct redisplay_history
> {
> char trace[512 + 100];
> };
>
> Move point to the [ before 512: C-M-n will signal an error.
>
> Any bracket in the file will trigger either the first or the second
> behavior.
Sorry, I didn't foresee that you would expect C-M-n and C-M-f to behave
the same way regarding list motion. This is fixed now by syncing
treesit-forward-list with treesit-forward-sexp-list, so all your
test cases work correctly now.
>> > w32_get_internal_run_time (void)
>> > {
>> > if (get_process_times_fn)
>> > {
>> > FILETIME create, exit, kernel, user;
>> > HANDLE proc = GetCurrentProcess ();
>> > if ((*get_process_times_fn) (proc, &create, &exit, &kernel, &user))
>> > {
>> > return ltime (total.QuadPart);
>> > } ^
>> > }
>> >
>> > return Fcurrent_time ();
>> > }
>> >
>> > If you put point on the semi-colon marked by "^", C-M-n signals an
>> > error. Can't we instead move to the next "list" after
>> > "Fcurrent_time"?
>>
>> 'C-M-n' doesn't move to the next statement in c-mode,
>> since we need to support backward-compatibility. The key
>> that moves to the next statement ignoring nested lists
>> in C modes is 'M-e'.
>
> Well, then maybe we could have the more useful behavior as an opt-in
> option?
Do you also need an option to move in Lisp from 1 to 2 in:
(compound_statement
(if_statement
(compound_statement
(return_statement 1)))
(return_statement 2))
>> C-M-n is unable to move to the first group because
>> it's inside the node in the AST:
>>
>> (cast_expression "("
>> type: (type_descriptor type: (type_identifier))
>> ")"
>> value:
>> (call_expression function: (identifier)
>> arguments:
>> (argument_list "(" (identifier) ,
>> (string_literal " (string_content) ")
>> ")")))
>>
>> Both the first and the 2nd parenthesized groups are inside the
>> "cast_expression" node.
>
> Sorry, I don't follow: the parentheses of the argument list are also
> inside a node, and yet we do recognize them. What's the difference?
The problem is that ")" is not the final child of 'cast_expression'.
So C-M-f/C-M-n can't stop after ")".
All these problems stem from the imperfection of ts grammars.
It would be nice if someone will find a way to modify the grammars
after importing them to Emacs core, to be able to insert more nodes.
This will solve a whole class of problems.
Otherwise, if vendoring grammars is not feasible, an alternative
would be to define a new treesit-thing like e.g. "paren" that
will match anonymous nodes like:
(setq-local treesit-thing-settings
`((c
(paren ,(rx (or "{" "}" "[" "]" "(" ")")))
Then low-level treesit functions such as treesit-parent-until
will have to check if there a preceding sibling that matches "paren"
before using their real AST parent node. This means maintaining
a parallel virtual tree.
>> > As for terminology: the Emacs user manual calls these "groupings
>> > delimited by parentheses (or whatever else serves as delimiters in the
>> > language you are working with)", or in short "parenthetical group".
>> > Why cannot we keep this terminology? It sounds like it describes its
>> > subject as accurate as we can come up with.
>>
>> So you prefer "group" over "list"?
>
> As a shorthand; the better term is "parenthesized group".
By definition a list is a parenthesized group.
The term "list" is much shorter than its synonym
"parenthesized group", so I see no reasons not to use "list".
>> Then 'forward-list' should move over "group"?
>
> I'm not sure we should rename the commands, given the legacy. But
> the doc string should use "group" or "parenthesized group", IMO.
The doc string of 'forward-list' already uses this:
(forward-list &optional ARG INTERACTIVE)
Move forward across one balanced group of parentheses.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 17:40 ` Juri Linkov
@ 2025-01-06 18:04 ` Yuan Fu
2025-01-06 20:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 106+ messages in thread
From: Yuan Fu @ 2025-01-06 18:04 UTC (permalink / raw)
To: Juri Linkov
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
73404
>>
>> I think these should be oblivious to the syntax (i.e. line `forward-line`
>> and `forward-char`).
>>
>> Your list missed the "defun" unit of navigation.
>
> Because there is no `forward-defun` ;-)
>
> But strange that "defun" still is defined by `treesit-defun-type-regexp`
> instead of a "defun" thing in `treesit-thing-settings`.
treesit-defun-type-regexp is shipped in Emacs 29, so we can’t just remove it. Defun navigation functions in treesit.el does support the defun thing, but I made treesit-defun-type-regexp an override of the defun thing. OTOH treesit-sexp/sentence-type-regexp wasn’t shipped in Emacs 29 so I removed them, preferring the thing definition.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 18:02 ` Juri Linkov
@ 2025-01-06 18:45 ` Eli Zaretskii
2025-01-06 20:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 106+ messages in thread
From: Eli Zaretskii @ 2025-01-06 18:45 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, casouri, theo, 73404, monnier
> From: Juri Linkov <juri@linkov.net>
> Cc: casouri@gmail.com, theo@thornhill.no, mickey@masteringemacs.org,
> 73404@debbugs.gnu.org, monnier@iro.umontreal.ca
> Date: Mon, 06 Jan 2025 20:02:36 +0200
>
> > Another example on line 152 in dispnew.c:
> >
> > struct redisplay_history
> > {
> > char trace[512 + 100];
> > };
> >
> > Move point to the [ before 512: C-M-n will signal an error.
> >
> > Any bracket in the file will trigger either the first or the second
> > behavior.
>
> Sorry, I didn't foresee that you would expect C-M-n and C-M-f to behave
> the same way regarding list motion. This is fixed now by syncing
> treesit-forward-list with treesit-forward-sexp-list, so all your
> test cases work correctly now.
Thanks.
> >> > w32_get_internal_run_time (void)
> >> > {
> >> > if (get_process_times_fn)
> >> > {
> >> > FILETIME create, exit, kernel, user;
> >> > HANDLE proc = GetCurrentProcess ();
> >> > if ((*get_process_times_fn) (proc, &create, &exit, &kernel, &user))
> >> > {
> >> > return ltime (total.QuadPart);
> >> > } ^
> >> > }
> >> >
> >> > return Fcurrent_time ();
> >> > }
> >> >
> >> > If you put point on the semi-colon marked by "^", C-M-n signals an
> >> > error. Can't we instead move to the next "list" after
> >> > "Fcurrent_time"?
> >>
> >> 'C-M-n' doesn't move to the next statement in c-mode,
> >> since we need to support backward-compatibility. The key
> >> that moves to the next statement ignoring nested lists
> >> in C modes is 'M-e'.
> >
> > Well, then maybe we could have the more useful behavior as an opt-in
> > option?
>
> Do you also need an option to move in Lisp from 1 to 2 in:
>
> (compound_statement
> (if_statement
> (compound_statement
> (return_statement 1)))
> (return_statement 2))
It sounds more useful to me than signaling an error, yes.
> It would be nice if someone will find a way to modify the grammars
> after importing them to Emacs core, to be able to insert more nodes.
> This will solve a whole class of problems.
Yes, an ability to modify the grammars and recompile them into the
library would be great.
> >> So you prefer "group" over "list"?
> >
> > As a shorthand; the better term is "parenthesized group".
>
> By definition a list is a parenthesized group.
> The term "list" is much shorter than its synonym
> "parenthesized group", so I see no reasons not to use "list".
The problem with "list" is that it is not clear what it means in
languages that are not Lisp-like.
But we can keep that in the names of commands and variables, IMO.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 17:40 ` Juri Linkov
2025-01-06 18:04 ` Yuan Fu
@ 2025-01-06 20:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-07 18:05 ` Juri Linkov
2025-01-11 18:19 ` Juri Linkov
1 sibling, 2 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 20:07 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
> The main problem with empty opening and closing keywords is that
> nodes from different levels all end up at the same position,
> and currently treesit prefers a higher-level node. This means
> that the sequence of `C-M-f C-M-b` is not idempotent anymore:
> `C-M-b` might jump to the start of a higher-level node.
It's never been idempotent, tho, e.g. in cases like:
;; Starting from before this comment
;; `C-M-f C-M-b` will end up after it.
(hello world)
or
+ x
But indeed for list operations in indentation-based languages like
Haskell/Python/Coffeescript/YAML, we're faced again with the fact that
a closing LF can close several nested lists, so when going backward
there is some amount of ambiguity (and I'd argue that it's preferable
(and closer to the historical behavior) to choose to jump less far,
e.g. so that `C-M-n C-M-p` never moves backward).
> Also in treesit.el, forward-sexp and forward-list could share the
> same implementation.
[ I assume you mean "share *some* of the implementation", rather than
being literally aliases of each other. ]
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 18:02 ` Juri Linkov
2025-01-06 18:45 ` Eli Zaretskii
@ 2025-01-06 20:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 0 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-06 20:19 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, Eli Zaretskii, theo, casouri, 73404
> The problem is that ")" is not the final child of 'cast_expression'.
> So C-M-f/C-M-n can't stop after ")".
>
> All these problems stem from the imperfection of ts grammars.
I'm not sure if I'd call it "imperfection". There are tradeoffs.
> It would be nice if someone will find a way to modify the grammars
> after importing them to Emacs core, to be able to insert more nodes.
> This will solve a whole class of problems.
I think some way to massage the parse tree would be useful (maybe by
massaging the grammar, but not necessarily). The `cast_expression` is
one place where we could use it, but it would also help us deal with the
fact that grammars can change outside of our control and we may want our
major modes to support various versions of a grammar.
> Otherwise, if vendoring grammars is not feasible, an alternative
> would be to define a new treesit-thing like e.g. "paren" that
> will match anonymous nodes like:
>
> (setq-local treesit-thing-settings
> `((c
> (paren ,(rx (or "{" "}" "[" "]" "(" ")")))
>
> Then low-level treesit functions such as treesit-parent-until
> will have to check if there a preceding sibling that matches "paren"
> before using their real AST parent node. This means maintaining
> a parallel virtual tree.
I'd prefer something more targeted, i.e. some way for the major mode to
add "extra rules" to explain that some part of `cast_expression` should
also be treated as a `list`.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 8:56 ` Yuan Fu
@ 2025-01-07 17:56 ` Juri Linkov
2025-01-07 19:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-07 17:56 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, Stefan Monnier,
73404
>>> I admit that hard-coding one symbol is not the right thing to do.
>>> So here is a better patch that checks for the symbol property:
>>
>> That's better but I still wonder why we don't want to use a keyword.
>
> My impression is that Elisp has “traditionally” used symbols? Like
> symbol properties, text properties, thing-at-point, etc. Keywords are
> really only seen in plists, face attributes, and keyword arguments,
> all of which involves a key-value relationship.
Indeed, symbols are more traditional.
However, the problem arose only because
'treesit_traverse_validate_predicate' accepts a symbol
for both a function and a thing, that caused ambiguity.
So there are two variants to resolve this ambiguity:
1. use keywords like
(setq-local treesit-thing-settings
`((html
(:sexp ,(regexp-opt '("element" "text" "attribute" "value")))
(:list ,(regexp-opt '("element"")))
(:sentence "tag")
(:text ,(regexp-opt '("comment" "text"))))))
2. use a symbol property like (put 'list 'treesit-thing t)
You decide ;-)
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 20:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-07 18:05 ` Juri Linkov
2025-01-07 19:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-11 18:19 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-07 18:05 UTC (permalink / raw)
To: Stefan Monnier; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
> But indeed for list operations in indentation-based languages like
> Haskell/Python/Coffeescript/YAML, we're faced again with the fact that
> a closing LF can close several nested lists, so when going backward
> there is some amount of ambiguity (and I'd argue that it's preferable
> (and closer to the historical behavior) to choose to jump less far,
> e.g. so that `C-M-n C-M-p` never moves backward).
I had an idea to keep the currently navigated level in an
internal variable, so `C-M-n C-M-p` will always return back.
But this doesn't look reliable.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-07 18:05 ` Juri Linkov
@ 2025-01-07 19:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 0 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-07 19:19 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
> I had an idea to keep the currently navigated level in an
> internal variable, so `C-M-n C-M-p` will always return back.
> But this doesn't look reliable.
Not only it's not reliable, but it's not necessarily the behavior
we want. After all, if you want to go back, you can say so explicitly
by remembering the original position jumping back to it instead of hoping
against all evidence that the navigation operations are mutual inverses.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-07 17:56 ` Juri Linkov
@ 2025-01-07 19:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-08 2:27 ` Yuan Fu
0 siblings, 1 reply; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-07 19:25 UTC (permalink / raw)
To: Juri Linkov
Cc: Theodor Thornhill, Yuan Fu, Mickey Petersen, Eli Zaretskii, 73404
> Indeed, symbols are more traditional.
>
> However, the problem arose only because
> 'treesit_traverse_validate_predicate' accepts a symbol
> for both a function and a thing, that caused ambiguity.
>
> So there are two variants to resolve this ambiguity:
>
> 1. use keywords like
>
> (setq-local treesit-thing-settings
> `((html
> (:sexp ,(regexp-opt '("element" "text" "attribute" "value")))
> (:list ,(regexp-opt '("element"")))
> (:sentence "tag")
> (:text ,(regexp-opt '("comment" "text"))))))
>
> 2. use a symbol property like (put 'list 'treesit-thing t)
>
> You decide ;-)
Of course, there are more alternatives:
- Use a symbol property like (put 'list 'treesit--this-is-a-function t)
- Refuse functions represented as symbols (callers need to use an
eta-wrapper).
- Distinguish the two cases via a more verbose representation like
`(thing list)` vs `(function list)`.
- Use a string rather than a symbol.
- <I'm sure you can come up with more options>
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-07 19:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-08 2:27 ` Yuan Fu
2025-01-09 7:42 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2025-01-08 2:27 UTC (permalink / raw)
To: Stefan Monnier
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
Juri Linkov
> On Jan 7, 2025, at 11:25 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> Indeed, symbols are more traditional.
>>
>> However, the problem arose only because
>> 'treesit_traverse_validate_predicate' accepts a symbol
>> for both a function and a thing, that caused ambiguity.
>>
>> So there are two variants to resolve this ambiguity:
>>
>> 1. use keywords like
>>
>> (setq-local treesit-thing-settings
>> `((html
>> (:sexp ,(regexp-opt '("element" "text" "attribute" "value")))
>> (:list ,(regexp-opt '("element"")))
>> (:sentence "tag")
>> (:text ,(regexp-opt '("comment" "text"))))))
>>
>> 2. use a symbol property like (put 'list 'treesit-thing t)
>>
>> You decide ;-)
>
> Of course, there are more alternatives:
>
> - Use a symbol property like (put 'list 'treesit--this-is-a-function t)
> - Refuse functions represented as symbols (callers need to use an
> eta-wrapper).
> - Distinguish the two cases via a more verbose representation like
> `(thing list)` vs `(function list)`.
> - Use a string rather than a symbol.
> - <I'm sure you can come up with more options>
Thanks. IMHO it’s best to keep it simple and familiar, so let’s keep using symbols and use a symbol property to solve this edge case. I think this problem is rare enough that we don’t need any fancy solutions for it.
Juri, please feel free to apply your symbol property patch. For the symbol property name, I feel that something like treesit-symbol-predicate/treesit-use-as-symbol-predicate would be more descriptive. Since we don’t have docstrings for symbol properties (right?). Also it’d be nice to document this in the manual, maybe in the manual section for treesit-thing-settings. The docstring probably don’t need this niche detail.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-08 2:27 ` Yuan Fu
@ 2025-01-09 7:42 ` Juri Linkov
2025-01-09 18:39 ` Dmitry Gutov
2025-01-10 3:14 ` Yuan Fu
0 siblings, 2 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-09 7:42 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, Stefan Monnier,
73404
>>> 1. use keywords like
>>>
>>> (setq-local treesit-thing-settings
>>> `((html
>>> (:sexp ,(regexp-opt '("element" "text" "attribute" "value")))
>>> (:list ,(regexp-opt '("element"")))
>>> (:sentence "tag")
>>> (:text ,(regexp-opt '("comment" "text"))))))
>>>
>>> 2. use a symbol property like (put 'list 'treesit-thing t)
>>>
>>> You decide ;-)
>>
>> Of course, there are more alternatives:
>>
>> - Use a symbol property like (put 'list 'treesit--this-is-a-function t)
>> - Refuse functions represented as symbols (callers need to use an
>> eta-wrapper).
>> - Distinguish the two cases via a more verbose representation like
>> `(thing list)` vs `(function list)`.
>> - Use a string rather than a symbol.
>> - <I'm sure you can come up with more options>
>
> Thanks. IMHO it’s best to keep it simple and familiar, so let’s keep
> using symbols and use a symbol property to solve this edge case.
> I think this problem is rare enough that we don’t need any fancy
> solutions for it.
>
> Juri, please feel free to apply your symbol property patch. For the symbol
> property name, I feel that something like
> treesit-symbol-predicate/treesit-use-as-symbol-predicate would be more
> descriptive.
If we settle on the variant that keeps the current format with symbols,
wouldn't it better to keep a symbol property name shorter?
First I proposed 'treesit-predicate', but then realized it's wrong
since it's opposite to the predicate function.
'treesit-thing' was better since it indicates that it's used in
'treesit-thing-settings'.
> Since we don’t have docstrings for symbol properties (right?).
There are no docstrings for symbol properties indeed. But I think
the important property for the name is that it should be unique enough
to be easily greppable. But I tried to grep, and 'treesit-thing' finds
too many matches.
So probably the most suitable symbol name would be 'treesit-thing-symbol',
that currently has no occurrences in source code.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-09 7:42 ` Juri Linkov
@ 2025-01-09 18:39 ` Dmitry Gutov
2025-01-09 18:44 ` Juri Linkov
2025-01-10 3:14 ` Yuan Fu
1 sibling, 1 reply; 106+ messages in thread
From: Dmitry Gutov @ 2025-01-09 18:39 UTC (permalink / raw)
To: Juri Linkov, Yuan Fu
Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
73404
On 09/01/2025 09:42, Juri Linkov wrote:
>> Juri, please feel free to apply your symbol property patch. For the symbol
>> property name, I feel that something like
>> treesit-symbol-predicate/treesit-use-as-symbol-predicate would be more
>> descriptive.
> If we settle on the variant that keeps the current format with symbols,
> wouldn't it better to keep a symbol property name shorter?
>
> First I proposed 'treesit-predicate', but then realized it's wrong
> since it's opposite to the predicate function.
> 'treesit-thing' was better since it indicates that it's used in
> 'treesit-thing-settings'.
Is it too late to keep the current name, 'sexp-list'?
It it of course somewhat redundant, but not by much, and not needing an
extra indirection is a plus on its own.
Just my 2c.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-09 18:39 ` Dmitry Gutov
@ 2025-01-09 18:44 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-09 18:44 UTC (permalink / raw)
To: Dmitry Gutov
Cc: Yuan Fu, Theodor Thornhill, Mickey Petersen, 73404, Eli Zaretskii,
Stefan Monnier
>>> Juri, please feel free to apply your symbol property patch. For the symbol
>>> property name, I feel that something like
>>> treesit-symbol-predicate/treesit-use-as-symbol-predicate would be more
>>> descriptive.
>> If we settle on the variant that keeps the current format with symbols,
>> wouldn't it better to keep a symbol property name shorter?
>> First I proposed 'treesit-predicate', but then realized it's wrong
>> since it's opposite to the predicate function.
>> 'treesit-thing' was better since it indicates that it's used in
>> 'treesit-thing-settings'.
>
> Is it too late to keep the current name, 'sexp-list'?
No, not late. And it's too ugly.
> It it of course somewhat redundant, but not by much, and not needing an
> extra indirection is a plus on its own.
An extra indirection is needed in any case to avoid such ambiguity
in the future.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-09 7:42 ` Juri Linkov
2025-01-09 18:39 ` Dmitry Gutov
@ 2025-01-10 3:14 ` Yuan Fu
2025-01-10 7:34 ` Juri Linkov
1 sibling, 1 reply; 106+ messages in thread
From: Yuan Fu @ 2025-01-10 3:14 UTC (permalink / raw)
To: Juri Linkov
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, Stefan Monnier,
73404
> On Jan 8, 2025, at 11:42 PM, Juri Linkov <juri@linkov.net> wrote:
>
>>>> 1. use keywords like
>>>>
>>>> (setq-local treesit-thing-settings
>>>> `((html
>>>> (:sexp ,(regexp-opt '("element" "text" "attribute" "value")))
>>>> (:list ,(regexp-opt '("element"")))
>>>> (:sentence "tag")
>>>> (:text ,(regexp-opt '("comment" "text"))))))
>>>>
>>>> 2. use a symbol property like (put 'list 'treesit-thing t)
>>>>
>>>> You decide ;-)
>>>
>>> Of course, there are more alternatives:
>>>
>>> - Use a symbol property like (put 'list 'treesit--this-is-a-function t)
>>> - Refuse functions represented as symbols (callers need to use an
>>> eta-wrapper).
>>> - Distinguish the two cases via a more verbose representation like
>>> `(thing list)` vs `(function list)`.
>>> - Use a string rather than a symbol.
>>> - <I'm sure you can come up with more options>
>>
>> Thanks. IMHO it’s best to keep it simple and familiar, so let’s keep
>> using symbols and use a symbol property to solve this edge case.
>> I think this problem is rare enough that we don’t need any fancy
>> solutions for it.
>>
>> Juri, please feel free to apply your symbol property patch. For the symbol
>> property name, I feel that something like
>> treesit-symbol-predicate/treesit-use-as-symbol-predicate would be more
>> descriptive.
>
> If we settle on the variant that keeps the current format with symbols,
> wouldn't it better to keep a symbol property name shorter?
>
> First I proposed 'treesit-predicate', but then realized it's wrong
> since it's opposite to the predicate function.
> 'treesit-thing' was better since it indicates that it's used in
> 'treesit-thing-settings'.
>
>> Since we don’t have docstrings for symbol properties (right?).
>
> There are no docstrings for symbol properties indeed. But I think
> the important property for the name is that it should be unique enough
> to be easily greppable. But I tried to grep, and 'treesit-thing' finds
> too many matches.
>
> So probably the most suitable symbol name would be 'treesit-thing-symbol',
> that currently has no occurrences in source code.
Sounds good. Let’s go with it then.
Yuan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-10 3:14 ` Yuan Fu
@ 2025-01-10 7:34 ` Juri Linkov
0 siblings, 0 replies; 106+ messages in thread
From: Juri Linkov @ 2025-01-10 7:34 UTC (permalink / raw)
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, Stefan Monnier,
73404
close 73404 31.0.50
thanks
>>> Juri, please feel free to apply your symbol property patch. For the symbol
>>> property name, I feel that something like
>>> treesit-symbol-predicate/treesit-use-as-symbol-predicate would be more
>>> descriptive.
>>
>> If we settle on the variant that keeps the current format with symbols,
>> wouldn't it better to keep a symbol property name shorter?
>>
>> First I proposed 'treesit-predicate', but then realized it's wrong
>> since it's opposite to the predicate function.
>> 'treesit-thing' was better since it indicates that it's used in
>> 'treesit-thing-settings'.
>>
>>> Since we don’t have docstrings for symbol properties (right?).
>>
>> There are no docstrings for symbol properties indeed. But I think
>> the important property for the name is that it should be unique enough
>> to be easily greppable. But I tried to grep, and 'treesit-thing' finds
>> too many matches.
>>
>> So probably the most suitable symbol name would be 'treesit-thing-symbol',
>> that currently has no occurrences in source code.
>
> Sounds good. Let’s go with it then.
So now pushed together with updates in the manual.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-06 20:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-07 18:05 ` Juri Linkov
@ 2025-01-11 18:19 ` Juri Linkov
2025-01-11 19:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-11 18:19 UTC (permalink / raw)
To: Stefan Monnier; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
> But indeed for list operations in indentation-based languages like
> Haskell/Python/Coffeescript/YAML, we're faced again with the fact that
> a closing LF can close several nested lists, so when going backward
> there is some amount of ambiguity (and I'd argue that it's preferable
> (and closer to the historical behavior) to choose to jump less far,
> e.g. so that `C-M-n C-M-p` never moves backward).
I tried to implement list operations in yaml-ts-mode
on the nodes "block_mapping_pair" and "flow_sequence".
While testing everything looked so nice when `C-M-f`
and `C-M-b` moved on the whole key-value pairs.
But in actual use, it turned out to be completely unusable.
It feels this is because `C-M-f` is expected to move
to the next symbol unless it's at the beginning
of (...), [...], if...fi, begin...end.
Perhaps we should still support list navigation only
for these 4 keys: 'C-M-n', 'C-M-p', 'C-M-d', 'C-M-y',
but not for `C-M-f` and `C-M-b`.
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-11 18:19 ` Juri Linkov
@ 2025-01-11 19:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-12 7:34 ` Juri Linkov
0 siblings, 1 reply; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-11 19:14 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
> While testing everything looked so nice when `C-M-f`
> and `C-M-b` moved on the whole key-value pairs.
>
> But in actual use, it turned out to be completely unusable.
>
> It feels this is because `C-M-f` is expected to move
> to the next symbol unless it's at the beginning
> of (...), [...], if...fi, begin...end.
Maybe this can be tuned?
Let's take a piece of Haskell code like:
foo x y = do
a <- readsomething
x (bar y)
b <- readsomethingelse
x (baz y)
return (a, b)
A) If the cursor is right before `a <- ...`, I definitely expect `C-M-f` to
move just over the identifier.
B) If the cursor is right after `do` (i.e. looking-at EOL), I could see
`C-M-f` moving either to the end of the `do` block (i.e. the end of the
`foo` function, here), or to right after the closing paren of `(bar y)`.
C) If the cursor is right after the closing paren of `(bar y)`, then I'd
expect `C-M-f` to move to the closing paren of `(baz y)`.
If we make the block/list syntax more explicit (which Haskell allows),
the above code is:
foo x y = do{
a <- readsomething
x (bar y);
b <- readsomethingelse
x (baz y);
return (a, b)
}
In (A) it's clear we're already inside the block.
In (B), we have an ambiguity where the cursor can be considered to be
looking either at the whole {...} block, or just at the first statement
within it, but I think it's clearly not looking at just the `a` identifier.
In (C) what I propose is the behavior that SMIE uses in such cases.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-11 19:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2025-01-12 7:34 ` Juri Linkov
2025-01-12 15:08 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 106+ messages in thread
From: Juri Linkov @ 2025-01-12 7:34 UTC (permalink / raw)
To: Stefan Monnier; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
> Let's take a piece of Haskell code like:
>
> foo x y = do
> a <- readsomething
> x (bar y)
> b <- readsomethingelse
> x (baz y)
> return (a, b)
>
> A) If the cursor is right before `a <- ...`, I definitely expect `C-M-f` to
> move just over the identifier.
Is the cursor at the beginning of the line or after indentation?
It seems this scheme is whitespace-sensitive since moving the cursor
back over the newline to the end of the previous line with `do`
will change whether `C-M-f` should move over the next `bind` node
or just over the next symbol.
> B) If the cursor is right after `do` (i.e. looking-at EOL), I could see
> `C-M-f` moving either to the end of the `do` block (i.e. the end of the
> `foo` function, here), or to right after the closing paren of `(bar y)`.
Usually `do` designates the beginning of the node, so I'd expect the cursor
to be before `do` to be able to move to the end of the `do` block.
Unless explicit curly braces are used in which case the cursor
after `do` is expected to move after the closing curly brace.
> C) If the cursor is right after the closing paren of `(bar y)`, then I'd
> expect `C-M-f` to move to the closing paren of `(baz y)`.
This heuristics looks whitespace-sensitive too: if the cursor was
after the newline at the beginning of the next line then `C-M-f`
would move over the next symbol.
> If we make the block/list syntax more explicit (which Haskell allows),
> the above code is:
>
> foo x y = do{
> a <- readsomething
> x (bar y);
> b <- readsomethingelse
> x (baz y);
> return (a, b)
> }
>
> In (A) it's clear we're already inside the block.
Explicit syntax removes a lot of ambiguity indeed.
> In (B), we have an ambiguity where the cursor can be considered to be
> looking either at the whole {...} block, or just at the first statement
> within it, but I think it's clearly not looking at just the `a` identifier.
When the cursor is after `{` then it's inside the {...} block.
> In (C) what I propose is the behavior that SMIE uses in such cases.
Is SMIE sensitive to the position of the cursor inside whitespace region
including newlines?
^ permalink raw reply [flat|nested] 106+ messages in thread
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
2025-01-12 7:34 ` Juri Linkov
@ 2025-01-12 15:08 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 0 replies; 106+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2025-01-12 15:08 UTC (permalink / raw)
To: Juri Linkov; +Cc: mickey, Eli Zaretskii, Theodor Thornhill, casouri, 73404
>> A) If the cursor is right before `a <- ...`, I definitely expect `C-M-f` to
>> move just over the identifier.
> Is the cursor at the beginning of the line or after indentation?
I was thinking "after indentation".
I don't know what behavior I'd want if it's at the beginning of the line.
> This heuristics looks whitespace-sensitive too: if the cursor was
> after the newline at the beginning of the next line then `C-M-f`
> would move over the next symbol.
Of course: in such languages, LF is very much significant.
>> In (C) what I propose is the behavior that SMIE uses in such cases.
> Is SMIE sensitive to the position of the cursor inside whitespace region
> including newlines?
In those languages where LF is a statement separator
(e.g. `octave-mode`), yes.
Stefan
^ permalink raw reply [flat|nested] 106+ messages in thread
end of thread, other threads:[~2025-01-12 15:08 UTC | newest]
Thread overview: 106+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-21 5:06 bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Mickey Petersen
2024-09-26 7:42 ` Yuan Fu
2024-09-26 9:56 ` Mickey Petersen
2024-09-26 10:53 ` Eli Zaretskii
2024-09-26 12:13 ` Mickey Petersen
2024-09-26 13:46 ` Eli Zaretskii
2024-09-26 15:21 ` Mickey Petersen
2024-09-26 15:45 ` Eli Zaretskii
2024-09-27 5:43 ` Yuan Fu
2024-09-29 16:56 ` Juri Linkov
2024-10-01 3:57 ` Yuan Fu
2024-10-01 17:49 ` Juri Linkov
2024-10-02 6:14 ` Yuan Fu
2024-12-05 18:52 ` Juri Linkov
2024-12-05 19:53 ` Juri Linkov
2024-12-10 17:20 ` Juri Linkov
2024-12-11 6:31 ` Yuan Fu
2024-12-11 15:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 15:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 16:50 ` Mickey Petersen
2024-12-11 18:27 ` Yuan Fu
2024-12-12 7:17 ` Juri Linkov
2024-12-12 7:40 ` Eli Zaretskii
2024-12-12 7:58 ` Juri Linkov
2024-12-12 8:14 ` Juri Linkov
2024-12-12 16:31 ` Juri Linkov
2024-12-12 17:49 ` Juri Linkov
2024-12-12 19:13 ` Eli Zaretskii
2024-12-13 7:06 ` Juri Linkov
2024-12-14 11:02 ` Eli Zaretskii
2024-12-14 18:14 ` Juri Linkov
2024-12-18 7:37 ` Juri Linkov
2024-12-19 4:04 ` Yuan Fu
2024-12-19 7:14 ` Juri Linkov
2024-12-19 7:18 ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
2024-12-24 3:02 ` Yuan Fu
2024-12-24 7:17 ` Juri Linkov
2024-12-24 7:41 ` Juri Linkov
2024-12-25 3:25 ` Dmitry Gutov
2024-12-25 7:52 ` Juri Linkov
2024-12-26 1:00 ` Dmitry Gutov
2024-12-27 7:42 ` Juri Linkov
2024-12-24 17:52 ` Juri Linkov
2024-12-24 21:03 ` Yuan Fu
2024-12-25 7:49 ` Juri Linkov
2024-12-25 9:11 ` Yuan Fu
2024-12-25 17:39 ` Juri Linkov
2024-12-19 7:34 ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
2024-12-24 19:05 ` Juri Linkov
2024-12-24 21:14 ` Yuan Fu
2024-12-25 7:44 ` Juri Linkov
2024-12-25 8:34 ` Yuan Fu
2024-12-25 17:36 ` Juri Linkov
2024-12-27 7:59 ` Juri Linkov
2024-12-25 17:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-25 18:01 ` Juri Linkov
2024-12-25 19:29 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-27 7:54 ` Juri Linkov
2024-12-29 17:58 ` Juri Linkov
2024-12-30 7:15 ` Juri Linkov
2024-12-30 8:00 ` Yuan Fu
2025-01-04 17:46 ` Juri Linkov
2025-01-04 19:05 ` Eli Zaretskii
2025-01-05 7:32 ` Juri Linkov
2025-01-05 11:46 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 12:18 ` Eli Zaretskii
2025-01-05 12:30 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 16:59 ` Eli Zaretskii
2025-01-05 18:20 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 21:18 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 17:59 ` Juri Linkov
2025-01-05 19:50 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 7:54 ` Juri Linkov
2025-01-06 14:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-06 17:40 ` Juri Linkov
2025-01-06 18:04 ` Yuan Fu
2025-01-06 20:07 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-07 18:05 ` Juri Linkov
2025-01-07 19:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-11 18:19 ` Juri Linkov
2025-01-11 19:14 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-12 7:34 ` Juri Linkov
2025-01-12 15:08 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 13:02 ` Eli Zaretskii
2025-01-05 18:05 ` Juri Linkov
2025-01-05 19:54 ` Eli Zaretskii
2025-01-06 18:02 ` Juri Linkov
2025-01-06 18:45 ` Eli Zaretskii
2025-01-06 20:19 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 7:37 ` Juri Linkov
2025-01-05 14:40 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-05 18:10 ` Juri Linkov
2025-01-06 8:56 ` Yuan Fu
2025-01-07 17:56 ` Juri Linkov
2025-01-07 19:25 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2025-01-08 2:27 ` Yuan Fu
2025-01-09 7:42 ` Juri Linkov
2025-01-09 18:39 ` Dmitry Gutov
2025-01-09 18:44 ` Juri Linkov
2025-01-10 3:14 ` Yuan Fu
2025-01-10 7:34 ` Juri Linkov
2024-12-30 13:40 ` Eli Zaretskii
2024-12-30 18:54 ` Juri Linkov
2024-12-30 19:36 ` Eli Zaretskii
2024-12-30 16:30 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-30 18:59 ` Juri Linkov
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.