all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
@ 2024-09-21  5:06 Mickey Petersen
  2024-09-26  7:42 ` Yuan Fu
  0 siblings, 1 reply; 58+ messages in thread
From: Mickey Petersen @ 2024-09-21  5:06 UTC (permalink / raw)
  To: 73404



Examples with javascript-mode. It holds for all modes i tested with a
TS equivalent. Let -!- be the starting point and ^N be the subsequent
position after a movement command.

-!-export const add = (a, b) => a + b;

Repeated `C-M-f' yields

export const add = (a, b) => a + b;

      ^1    ^2  ^3       ^4   ^5  ^6


In other words, it works as it always has.

Meanwhile, in `js-ts-mode':

export const add = (a, b) => a + b;
                ^1       ^2   ^3  ^4

From ^1 and back with `C-M-b'

export const add-!- = (a, b) => a + b;

export const add = (a, b) => a + b;
             ^1

At this point, `C-M-b' no longer goes back. It is stuck.


Another example:

-!-console.log("Addition result:", result1);

With `C-M-f':

console.log("Addition result:", result1);

       ^1                               ^2


This affects every single -sexp function that uses either
`forward-sexp-function' or `transpose-sexp-function' to do its job.

Thanks.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-21  5:06 bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Mickey Petersen
@ 2024-09-26  7:42 ` Yuan Fu
  2024-09-26  9:56   ` Mickey Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-09-26  7:42 UTC (permalink / raw)
  To: Mickey Petersen; +Cc: 73404



> On Sep 20, 2024, at 10:06 PM, Mickey Petersen <mickey@masteringemacs.org> wrote:
> 
> 
> 
> Examples with javascript-mode. It holds for all modes i tested with a
> TS equivalent. Let -!- be the starting point and ^N be the subsequent
> position after a movement command.
> 
> -!-export const add = (a, b) => a + b;
> 
> Repeated `C-M-f' yields
> 
> export const add = (a, b) => a + b;
> 
>      ^1    ^2  ^3       ^4   ^5  ^6
> 
> 
> In other words, it works as it always has.
> 
> Meanwhile, in `js-ts-mode':
> 
> export const add = (a, b) => a + b;
>                ^1       ^2   ^3  ^4
> 
> From ^1 and back with `C-M-b'
> 
> export const add-!- = (a, b) => a + b;
> 
> export const add = (a, b) => a + b;
>             ^1
> 
> At this point, `C-M-b' no longer goes back. It is stuck.
> 
> 
> Another example:
> 
> -!-console.log("Addition result:", result1);
> 
> With `C-M-f':
> 
> console.log("Addition result:", result1);
> 
>       ^1                               ^2
> 
> 
> This affects every single -sexp function that uses either
> `forward-sexp-function' or `transpose-sexp-function' to do its job.
> 
> Thanks.
> 

I’m aware of this problem and it’s quite inconvenient at times, but right now I don’t have a good solution for it. Ideas are welcome.

Basically tree-sitter’s sexp movement works on subtrees. It determines the position of the point in the whole parse tree and goes forward/back across the next subtree in the parse tree. If there’s no more sibling subtrees in the same level to move over, sexp movement stops like in lisp. The parse tree is invisible and often groups token in unexpected ways, so many times the sexp movement isn’t intuitive.

We might need to add a user option so people can easily turn off tree-sitter sexp movement, since it isn’t a strict upgrade from the generic sexp movement—it’s more of a different flavored sexp movement.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26  7:42 ` Yuan Fu
@ 2024-09-26  9:56   ` Mickey Petersen
  2024-09-26 10:53     ` Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Mickey Petersen @ 2024-09-26  9:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: 73404


Yuan Fu <casouri@gmail.com> writes:

>> On Sep 20, 2024, at 10:06 PM, Mickey Petersen <mickey@masteringemacs.org> wrote:
>>
>>
>>
>> Examples with javascript-mode. It holds for all modes i tested with a
>> TS equivalent. Let -!- be the starting point and ^N be the subsequent
>> position after a movement command.
>>
>> -!-export const add = (a, b) => a + b;
>>
>> Repeated `C-M-f' yields
>>
>> export const add = (a, b) => a + b;
>>
>>      ^1    ^2  ^3       ^4   ^5  ^6
>>
>>
>> In other words, it works as it always has.
>>
>> Meanwhile, in `js-ts-mode':
>>
>> export const add = (a, b) => a + b;
>>                ^1       ^2   ^3  ^4
>>
>> From ^1 and back with `C-M-b'
>>
>> export const add-!- = (a, b) => a + b;
>>
>> export const add = (a, b) => a + b;
>>             ^1
>>
>> At this point, `C-M-b' no longer goes back. It is stuck.
>>
>>
>> Another example:
>>
>> -!-console.log("Addition result:", result1);
>>
>> With `C-M-f':
>>
>> console.log("Addition result:", result1);
>>
>>       ^1                               ^2
>>
>>
>> This affects every single -sexp function that uses either
>> `forward-sexp-function' or `transpose-sexp-function' to do its job.
>>
>> Thanks.
>>
>
> I’m aware of this problem and it’s quite inconvenient at times, but right now I don’t have a good solution for it. Ideas are welcome.
>
> Basically tree-sitter’s sexp movement works on subtrees. It determines
> the position of the point in the whole parse tree and goes
> forward/back across the next subtree in the parse tree. If there’s no
> more sibling subtrees in the same level to move over, sexp movement
> stops like in lisp. The parse tree is invisible and often groups token
> in unexpected ways, so many times the sexp movement isn’t intuitive.
>

Hi Yuan,

In my opinion, that's not what `sexp' movement is.

Sexp movement is movement by balanced expressions -- and a fallback to
word-like behaviour absent that -- and this is not that. It would be
better to relegate this sort of thing to its own set of keybindings.


> We might need to add a user option so people can easily turn off
> tree-sitter sexp movement, since it isn’t a strict upgrade from the
> generic sexp movement—it’s more of a different flavored sexp movement.

It should be opt-in, not opt-out.

>
> Yuan






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26  9:56   ` Mickey Petersen
@ 2024-09-26 10:53     ` Eli Zaretskii
  2024-09-26 12:13       ` Mickey Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-09-26 10:53 UTC (permalink / raw)
  To: Mickey Petersen; +Cc: casouri, 73404

> Cc: 73404@debbugs.gnu.org
> From: Mickey Petersen <mickey@masteringemacs.org>
> Date: Thu, 26 Sep 2024 10:56:35 +0100
> 
> In my opinion, that's not what `sexp' movement is.
> 
> Sexp movement is movement by balanced expressions -- and a fallback to
> word-like behaviour absent that -- and this is not that. It would be
> better to relegate this sort of thing to its own set of keybindings.

The term "balanced expression" is not well defined in languages other
than Lisp and Lisp-like ones.  It is clear what expected when point is
on a brace or a parenthesis, but entirely NOT clear when you start
from something else.  For example:

  int foo = bar + 2 * baz;

Suppose you start with point at "foo": what would you expect
forward-sexp to do? nothing?

> > We might need to add a user option so people can easily turn off
> > tree-sitter sexp movement, since it isn’t a strict upgrade from the
> > generic sexp movement—it’s more of a different flavored sexp movement.
> 
> It should be opt-in, not opt-out.

I disagree.  Moving by sub-trees is a natural generalization of sexp
movement for languages where parentheses and braces are rare and far
in-between.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26 10:53     ` Eli Zaretskii
@ 2024-09-26 12:13       ` Mickey Petersen
  2024-09-26 13:46         ` Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Mickey Petersen @ 2024-09-26 12:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 73404


Eli Zaretskii <eliz@gnu.org> writes:

>> Cc: 73404@debbugs.gnu.org
>> From: Mickey Petersen <mickey@masteringemacs.org>
>> Date: Thu, 26 Sep 2024 10:56:35 +0100
>>
>> In my opinion, that's not what `sexp' movement is.
>>
>> Sexp movement is movement by balanced expressions -- and a fallback to
>> word-like behaviour absent that -- and this is not that. It would be
>> better to relegate this sort of thing to its own set of keybindings.
>
> The term "balanced expression" is not well defined in languages other
> than Lisp and Lisp-like ones.  It is clear what expected when point is
> on a brace or a parenthesis, but entirely NOT clear when you start
> from something else.  For example:
>
>   int foo = bar + 2 * baz;
>
> Suppose you start with point at "foo": what would you expect
> forward-sexp to do? nothing?
>

I expect it to behave as it presently does: default to word-like
behaviour such as M-@ / M-f etc.

Balanced expression is not well defined, de jure, but it is in
practical terms, making it de facto rather well understood and
supported. It behaves reasonably consistently across languages, and I
use *-sexp commands thousands of times a day in a wide range of major modes and
contexts, both in code and also prose.

Most people who use *-sexp (or *-word commands for that matter) in
major modes come to recognise how they work and know what happens to
the text/point in their buffer before they run them.

I would challenge anyone, given even small samples of code, to do the
same with the current TS only implementation.

>> > We might need to add a user option so people can easily turn off
>> > tree-sitter sexp movement, since it isn’t a strict upgrade from the
>> > generic sexp movement—it’s more of a different flavored sexp movement.
>>
>> It should be opt-in, not opt-out.
>
> I disagree.  Moving by sub-trees is a natural generalization of sexp
> movement for languages where parentheses and braces are rare and far
> in-between.

Yes, if one can intuit the sub trees' structure, which is not so
simple; and if the selection of commands are sufficiently expressive
enough to let you navigate the tree. I am not sure they are.

The CSTs are deep, wide, and nodes' ranges frequently overlap; they
are multi-dimensional structures that map to a simple 2-dimensional
'grid' in your buffer. Making heads or tails of that is no easy feat.






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26 12:13       ` Mickey Petersen
@ 2024-09-26 13:46         ` Eli Zaretskii
  2024-09-26 15:21           ` Mickey Petersen
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-09-26 13:46 UTC (permalink / raw)
  To: Mickey Petersen; +Cc: casouri, 73404

> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NO_RECEIVED,
> 	NO_RELAYS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no
> 	version=3.4.2
> From: Mickey Petersen <mickey@masteringemacs.org>
> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
> Date: Thu, 26 Sep 2024 13:13:53 +0100
> 
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >   int foo = bar + 2 * baz;
> >
> > Suppose you start with point at "foo": what would you expect
> > forward-sexp to do? nothing?
> >
> 
> I expect it to behave as it presently does: default to word-like
> behaviour such as M-@ / M-f etc.

Then we just lost an opportunity to have more useful commands, because
we already have M-f and M-@.

> Balanced expression is not well defined, de jure, but it is in
> practical terms, making it de facto rather well understood and
> supported. It behaves reasonably consistently across languages, and I
> use *-sexp commands thousands of times a day in a wide range of major modes and
> contexts, both in code and also prose.

I think the ability to move by parse sub-trees is also very useful.

> Most people who use *-sexp (or *-word commands for that matter) in
> major modes come to recognise how they work and know what happens to
> the text/point in their buffer before they run them.
> 
> I would challenge anyone, given even small samples of code, to do the
> same with the current TS only implementation.

That's just a matter of getting used to the new semantics.

> > I disagree.  Moving by sub-trees is a natural generalization of sexp
> > movement for languages where parentheses and braces are rare and far
> > in-between.
> 
> Yes, if one can intuit the sub trees' structure, which is not so
> simple; and if the selection of commands are sufficiently expressive
> enough to let you navigate the tree. I am not sure they are.

There are enough situations where moving by words will also surprise
you.  For example, did you know that M-f stops when it finds a
character from a different script?  And yet we still use these
commands.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26 13:46         ` Eli Zaretskii
@ 2024-09-26 15:21           ` Mickey Petersen
  2024-09-26 15:45             ` Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Mickey Petersen @ 2024-09-26 15:21 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 73404


Eli Zaretskii <eliz@gnu.org> writes:

>> X-Spam-Status: No, score=-1.9 required=5.0 tests=BAYES_00,NO_RECEIVED,
>> 	NO_RELAYS,URIBL_BLOCKED autolearn=unavailable autolearn_force=no
>> 	version=3.4.2
>> From: Mickey Petersen <mickey@masteringemacs.org>
>> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
>> Date: Thu, 26 Sep 2024 13:13:53 +0100
>>
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> >   int foo = bar + 2 * baz;
>> >
>> > Suppose you start with point at "foo": what would you expect
>> > forward-sexp to do? nothing?
>> >
>>
>> I expect it to behave as it presently does: default to word-like
>> behaviour such as M-@ / M-f etc.
>
> Then we just lost an opportunity to have more useful commands, because
> we already have M-f and M-@.
>
>> Balanced expression is not well defined, de jure, but it is in
>> practical terms, making it de facto rather well understood and
>> supported. It behaves reasonably consistently across languages, and I
>> use *-sexp commands thousands of times a day in a wide range of major modes and
>> contexts, both in code and also prose.
>
> I think the ability to move by parse sub-trees is also very useful.
>

Agreed. What matters is whether the crop of new sexp commands, such as they
are, perform satisfactorily.

Do you think the examples I listed in the original bug report match
your expectations? If so, then it is probably OK to close the bug report.

>> Most people who use *-sexp (or *-word commands for that matter) in
>> major modes come to recognise how they work and know what happens to
>> the text/point in their buffer before they run them.
>>
>> I would challenge anyone, given even small samples of code, to do the
>> same with the current TS only implementation.
>
> That's just a matter of getting used to the new semantics.
>
>> > I disagree.  Moving by sub-trees is a natural generalization of sexp
>> > movement for languages where parentheses and braces are rare and far
>> > in-between.
>>
>> Yes, if one can intuit the sub trees' structure, which is not so
>> simple; and if the selection of commands are sufficiently expressive
>> enough to let you navigate the tree. I am not sure they are.
>
> There are enough situations where moving by words will also surprise
> you.  For example, did you know that M-f stops when it finds a
> character from a different script?  And yet we still use these
> commands.






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26 15:21           ` Mickey Petersen
@ 2024-09-26 15:45             ` Eli Zaretskii
  2024-09-27  5:43               ` Yuan Fu
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-09-26 15:45 UTC (permalink / raw)
  To: Mickey Petersen; +Cc: casouri, 73404

> From: Mickey Petersen <mickey@masteringemacs.org>
> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
> Date: Thu, 26 Sep 2024 16:21:33 +0100
> 
> > I think the ability to move by parse sub-trees is also very useful.
> >
> 
> Agreed. What matters is whether the crop of new sexp commands, such as they
> are, perform satisfactorily.
> 
> Do you think the examples I listed in the original bug report match
> your expectations? If so, then it is probably OK to close the bug report.

Yes, I do, but let's wait for others to chime in if they have opinions
on this.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-26 15:45             ` Eli Zaretskii
@ 2024-09-27  5:43               ` Yuan Fu
  2024-09-29 16:56                 ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-09-27  5:43 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Mickey Petersen, 73404



> On Sep 26, 2024, at 8:45 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: Mickey Petersen <mickey@masteringemacs.org>
>> Cc: casouri@gmail.com, 73404@debbugs.gnu.org
>> Date: Thu, 26 Sep 2024 16:21:33 +0100
>> 
>>> I think the ability to move by parse sub-trees is also very useful.
>>> 
>> 
>> Agreed. What matters is whether the crop of new sexp commands, such as they
>> are, perform satisfactorily.

Note that you can affect the behavior of tree-sitter sexp movement by defining the sexp “thing” in treesit-thing-settings. Js-ts-mode defines one (js--treesit-sexp-nodes) and it only consider some nodes as sexp. You might be able to tweak the sexp movement to your liking by changing it, or directly modifying the definition for `sexp’ in treesit-thing-settings.

>> 
>> Do you think the examples I listed in the original bug report match
>> your expectations? If so, then it is probably OK to close the bug report.
> 
> Yes, I do, but let's wait for others to chime in if they have opinions
> on this.






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-27  5:43               ` Yuan Fu
@ 2024-09-29 16:56                 ` Juri Linkov
  2024-10-01  3:57                   ` Yuan Fu
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-09-29 16:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404

> Note that you can affect the behavior of tree-sitter sexp movement by
> defining the sexp “thing” in treesit-thing-settings. Js-ts-mode defines one
> (js--treesit-sexp-nodes) and it only consider some nodes as sexp. You might
> be able to tweak the sexp movement to your liking by changing it, or
> directly modifying the definition for `sexp’ in treesit-thing-settings.
>
>>> Do you think the examples I listed in the original bug report match
>>> your expectations? If so, then it is probably OK to close the bug report.
>>
>> Yes, I do, but let's wait for others to chime in if they have opinions
>> on this.

Here are some ideas how to cover more use cases.

Suppose that a user wants to disable tree-sitter sexp movement
completely to use the default forward-sexp-default-function.
The natural way to do this would be set the list of nodes to nil:

  (setq js--treesit-sexp-nodes nil)

However, this currently doesn't work, and requires a change like this:

  @@ -2290,10 +2290,12 @@ treesit-forward-sexp
            (treesit-node-at (point) (treesit-language-at (point)))))
       (or (when (and node-at-point
                      ;; Make sure point is strictly inside node.
  -                   (< (treesit-node-start node-at-point)
  -                      (point)
  -                      (treesit-node-end node-at-point))
  -                   (treesit-node-match-p node-at-point 'text t))
  +                   (<= (treesit-node-start node-at-point)
  +                       (point)
  +                       (treesit-node-end node-at-point))
  +                   (or (treesit-node-match-p node-at-point 'text t)
  +                       (not (treesit-node-match-p node-at-point 'sexp t))
  +                       ))
             (forward-sexp-default-function arg)
             t)
           (if (> arg 0)

Now, the next case: what if the user wants to use the default
forward-sexp-default-function except for the 'binary_expression'
like "a + b" where `C-M-f' should move from "a" to the end of "b":

  export const add = (a, b) => -!-a + b;

should move to

  export const add = (a, b) => a + b;

                                    ^1

The best way for the user would be to customize:

  (setq js--treesit-sexp-nodes '("binary_expression"))

But this is not yet handled by the condition above:

  (not (treesit-node-match-p node-at-point 'sexp t))

because 'node-at-point' is "identifier".
So we need to use 'treesit-parent-until'
to check if all parent nodes match
'js--treesit-sexp-nodes'.  Then it will find
the parent "binary_expression".

I believe something like this will make
treesit-forward-sexp more customizable.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-09-29 16:56                 ` Juri Linkov
@ 2024-10-01  3:57                   ` Yuan Fu
  2024-10-01 17:49                     ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-10-01  3:57 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Eli Zaretskii, Mickey Petersen, 73404



> On Sep 29, 2024, at 9:56 AM, Juri Linkov <juri@linkov.net> wrote:
> 
>> Note that you can affect the behavior of tree-sitter sexp movement by
>> defining the sexp “thing” in treesit-thing-settings. Js-ts-mode defines one
>> (js--treesit-sexp-nodes) and it only consider some nodes as sexp. You might
>> be able to tweak the sexp movement to your liking by changing it, or
>> directly modifying the definition for `sexp’ in treesit-thing-settings.
>> 
>>>> Do you think the examples I listed in the original bug report match
>>>> your expectations? If so, then it is probably OK to close the bug report.
>>> 
>>> Yes, I do, but let's wait for others to chime in if they have opinions
>>> on this.
> 
> Here are some ideas how to cover more use cases.
> 
> Suppose that a user wants to disable tree-sitter sexp movement
> completely to use the default forward-sexp-default-function.
> The natural way to do this would be set the list of nodes to nil:
> 
>  (setq js--treesit-sexp-nodes nil)
> 
> However, this currently doesn't work, and requires a change like this:
> 
>  @@ -2290,10 +2290,12 @@ treesit-forward-sexp
>            (treesit-node-at (point) (treesit-language-at (point)))))
>       (or (when (and node-at-point
>                      ;; Make sure point is strictly inside node.
>  -                   (< (treesit-node-start node-at-point)
>  -                      (point)
>  -                      (treesit-node-end node-at-point))
>  -                   (treesit-node-match-p node-at-point 'text t))
>  +                   (<= (treesit-node-start node-at-point)
>  +                       (point)
>  +                       (treesit-node-end node-at-point))
>  +                   (or (treesit-node-match-p node-at-point 'text t)
>  +                       (not (treesit-node-match-p node-at-point 'sexp t))
>  +                       ))
>             (forward-sexp-default-function arg)
>             t)
>           (if (> arg 0)
> 
> Now, the next case: what if the user wants to use the default
> forward-sexp-default-function except for the 'binary_expression'
> like "a + b" where `C-M-f' should move from "a" to the end of "b":
> 
>  export const add = (a, b) => -!-a + b;
> 
> should move to
> 
>  export const add = (a, b) => a + b;
> 
>                                    ^1
> 
> The best way for the user would be to customize:
> 
>  (setq js--treesit-sexp-nodes '("binary_expression"))
> 
> But this is not yet handled by the condition above:
> 
>  (not (treesit-node-match-p node-at-point 'sexp t))
> 
> because 'node-at-point' is "identifier".
> So we need to use 'treesit-parent-until'
> to check if all parent nodes match
> 'js--treesit-sexp-nodes'.  Then it will find
> the parent "binary_expression".
> 
> I believe something like this will make
> treesit-forward-sexp more customizable.

The user can modify treesit-thing-settings to alter the behavior of sexp navigation, they don’t necessarily need to use js--treesit-sexp-nodes. Maybe we should add a test for (treesit-thing-defined-p 'sexp nil) in treesit-forward-sexp? 

Your second example sounds useful, but right now the premise of tree-sitter sexp movement is to use the parse tree primarily, and only use the default sexp movement for comments and strings. What you envisioned seems to be the other way around: use default sexp movement by default, and only use tree-sitter movement under certain conditions. Is that few lines of change able to make such big difference in the logic?

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-10-01  3:57                   ` Yuan Fu
@ 2024-10-01 17:49                     ` Juri Linkov
  2024-10-02  6:14                       ` Yuan Fu
  2024-12-05 18:52                       ` Juri Linkov
  0 siblings, 2 replies; 58+ messages in thread
From: Juri Linkov @ 2024-10-01 17:49 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404

> The user can modify treesit-thing-settings to alter the behavior of
> sexp navigation, they don’t necessarily need to use
> js--treesit-sexp-nodes.  Maybe we should add a test for
> (treesit-thing-defined-p 'sexp nil) in treesit-forward-sexp?

I tried to do something like this, and everything works nicely with:

  @@ -2289,11 +2289,10 @@ treesit-forward-sexp
           (node-at-point
            (treesit-node-at (point) (treesit-language-at (point)))))
       (or (when (and node-at-point
  -                   ;; Make sure point is strictly inside node.
  -                   (< (treesit-node-start node-at-point)
  -                      (point)
  -                      (treesit-node-end node-at-point))
  -                   (treesit-node-match-p node-at-point 'text t))
  +                   (or (treesit-node-match-p node-at-point 'text t)
  +                       (not (treesit-thing-at
  +                             (if (> arg 0) (point) (1- (point)))
  +                             (treesit-thing-definition 'sexp nil)))))
             (forward-sexp-default-function arg)
             t)
           (if (> arg 0)

The new logic is the following: if there is no sexp thing defined at point,
then fall back to 'forward-sexp-default-function'.

Then after (setq js--treesit-sexp-nodes '("binary_expression"))
'C-M-f' in e.g.

  export const add = (a, b) => -!-a + b;

moves point to

  export const add = (a, b) => a + b-!-;

The condition (if (> arg 0) (point) (1- (point))) above
is necessary to allow 'C-M-b' to move back to:

  export const add = (a, b) => -!-a + b;

Also the condition to make sure point is strictly inside node
was removed to handle the case when point was at the beginning
of the buffer:

  -!-
  export const add = (a, b) => a + b;

to move after

  export-!- const add = (a, b) => a + b;

by 'forward-sexp-default-function'.

> Your second example sounds useful, but right now the premise of tree-sitter
> sexp movement is to use the parse tree primarily, and only use the default
> sexp movement for comments and strings. What you envisioned seems to be the
> other way around: use default sexp movement by default, and only use
> tree-sitter movement under certain conditions. Is that few lines of change
> able to make such big difference in the logic?

I think we need to support both ways:

1. opt-out - where sexp-thing definition is used by default,
   and only text-thing allows users to override it;

2. opt-in - where 'forward-sexp-default-function' is used by default,
   and user can explicitly define what sexp-things are preferable
   for navigation by treesit.

Then in the latter case the users could prefer to use
treesit sexp navigation only for constructions with
"invisible parens".  For example, in Ruby there are
two interchangeable syntaxes for code blocks:

1. curly braces {...} that are already handled
   by 'forward-sexp-default-function';

2. do...end that can't be handled by 'forward-sexp-default-function',
   so treesit is coming to the rescue for the case of such
   implicit braces.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-10-01 17:49                     ` Juri Linkov
@ 2024-10-02  6:14                       ` Yuan Fu
  2024-12-05 18:52                       ` Juri Linkov
  1 sibling, 0 replies; 58+ messages in thread
From: Yuan Fu @ 2024-10-02  6:14 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Eli Zaretskii, Mickey Petersen, 73404



> On Oct 1, 2024, at 10:49 AM, Juri Linkov <juri@linkov.net> wrote:
> 
>> The user can modify treesit-thing-settings to alter the behavior of
>> sexp navigation, they don’t necessarily need to use
>> js--treesit-sexp-nodes.  Maybe we should add a test for
>> (treesit-thing-defined-p 'sexp nil) in treesit-forward-sexp?
> 
> I tried to do something like this, and everything works nicely with:
> 
>  @@ -2289,11 +2289,10 @@ treesit-forward-sexp
>           (node-at-point
>            (treesit-node-at (point) (treesit-language-at (point)))))
>       (or (when (and node-at-point
>  -                   ;; Make sure point is strictly inside node.
>  -                   (< (treesit-node-start node-at-point)
>  -                      (point)
>  -                      (treesit-node-end node-at-point))
>  -                   (treesit-node-match-p node-at-point 'text t))
>  +                   (or (treesit-node-match-p node-at-point 'text t)
>  +                       (not (treesit-thing-at
>  +                             (if (> arg 0) (point) (1- (point)))
>  +                             (treesit-thing-definition 'sexp nil)))))
>             (forward-sexp-default-function arg)
>             t)
>           (if (> arg 0)
> 
> The new logic is the following: if there is no sexp thing defined at point,
> then fall back to 'forward-sexp-default-function'.
> 
> Then after (setq js--treesit-sexp-nodes '("binary_expression"))
> 'C-M-f' in e.g.
> 
>  export const add = (a, b) => -!-a + b;
> 
> moves point to
> 
>  export const add = (a, b) => a + b-!-;
> 
> The condition (if (> arg 0) (point) (1- (point))) above
> is necessary to allow 'C-M-b' to move back to:
> 
>  export const add = (a, b) => -!-a + b;
> 
> Also the condition to make sure point is strictly inside node
> was removed to handle the case when point was at the beginning
> of the buffer:
> 
>  -!-
>  export const add = (a, b) => a + b;
> 
> to move after
> 
>  export-!- const add = (a, b) => a + b;
> 
> by 'forward-sexp-default-function'.

Sounds good. Feel free to install on master if you think it works well :-)

> 
>> Your second example sounds useful, but right now the premise of tree-sitter
>> sexp movement is to use the parse tree primarily, and only use the default
>> sexp movement for comments and strings. What you envisioned seems to be the
>> other way around: use default sexp movement by default, and only use
>> tree-sitter movement under certain conditions. Is that few lines of change
>> able to make such big difference in the logic?
> 
> I think we need to support both ways:
> 
> 1. opt-out - where sexp-thing definition is used by default,
>   and only text-thing allows users to override it;
> 
> 2. opt-in - where 'forward-sexp-default-function' is used by default,
>   and user can explicitly define what sexp-things are preferable
>   for navigation by treesit.
> 
> Then in the latter case the users could prefer to use
> treesit sexp navigation only for constructions with
> "invisible parens".  For example, in Ruby there are
> two interchangeable syntaxes for code blocks:
> 
> 1. curly braces {...} that are already handled
>   by 'forward-sexp-default-function';
> 
> 2. do...end that can't be handled by 'forward-sexp-default-function',
>   so treesit is coming to the rescue for the case of such
>   implicit braces.

Sounds good to me, I wonder if there are clever way to implement this. If there isn’t, we’d need to define two sets treesit-sexp functions and add a custom option to control which one to use. Seems a bit clunky to me.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-10-01 17:49                     ` Juri Linkov
  2024-10-02  6:14                       ` Yuan Fu
@ 2024-12-05 18:52                       ` Juri Linkov
  2024-12-05 19:53                         ` Juri Linkov
  1 sibling, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-05 18:52 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404

> The new logic is the following: if there is no sexp thing defined at point,
> then fall back to 'forward-sexp-default-function'.
>
> Then after (setq js--treesit-sexp-nodes '("binary_expression"))
> 'C-M-f' in e.g.
>
>   export const add = (a, b) => -!-a + b;
>
> moves point to
>
>   export const add = (a, b) => a + b-!-;

Unfortunately, I still can't find a way to handle such case
that from

    export const add = (a, b) -!- => a + b;

typing 'C-M-f' should jump to the end of the next sexp
(to the end of whole "binary_expression"):

    export const add = (a, b) => a + b-!-;

since only tree-sitter knows about "binary_expression",
so 'forward-sexp-default-function' can't be used here.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-05 18:52                       ` Juri Linkov
@ 2024-12-05 19:53                         ` Juri Linkov
  2024-12-10 17:20                           ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-05 19:53 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404

>> The new logic is the following: if there is no sexp thing defined at point,
>> then fall back to 'forward-sexp-default-function'.
>>
>> Then after (setq js--treesit-sexp-nodes '("binary_expression"))
>> 'C-M-f' in e.g.
>>
>>   export const add = (a, b) => -!-a + b;
>>
>> moves point to
>>
>>   export const add = (a, b) => a + b-!-;
>
> Unfortunately, I still can't find a way to handle such case
> that from
>
>     export const add = (a, b) -!- => a + b;
>
> typing 'C-M-f' should jump to the end of the next sexp
> (to the end of whole "binary_expression"):
>
>     export const add = (a, b) => a + b-!-;
>
> since only tree-sitter knows about "binary_expression",
> so 'forward-sexp-default-function' can't be used here.

Actually, I have one idea of possible heuristics:

1. first try 'forward-sexp-default-function'
2. if it crosses the boundary of sexp defined by 'treesit-thing-settings'
   then use 'treesit-end-of-thing' instead

This should work.  Ok, will try.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-05 19:53                         ` Juri Linkov
@ 2024-12-10 17:20                           ` Juri Linkov
  2024-12-11  6:31                             ` Yuan Fu
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-10 17:20 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, Mickey Petersen, 73404

[-- Attachment #1: Type: text/plain, Size: 1619 bytes --]

>>     export const add = (a, b) -!- => a + b;
>>
>> typing 'C-M-f' should jump to the end of the next sexp
>> (to the end of whole "binary_expression"):
>>
>>     export const add = (a, b) => a + b-!-;
>>
>> since only tree-sitter knows about "binary_expression",
>> so 'forward-sexp-default-function' can't be used here.
>
> Actually, I have one idea of possible heuristics:
>
> 1. first try 'forward-sexp-default-function'
> 2. if it crosses the boundary of sexp defined by 'treesit-thing-settings'
>    then use 'treesit-end-of-thing' instead
>
> This should work.  Ok, will try.

This is implemented now in the attached patch, and it works nicely.

The main rule is the following: 'forward-sexp-default-function'
should not go out of the current thing, neither go inside a sibling.
So we use 'treesit-end-of-thing' in such cases.  But when inside
a thing or outside a thing, use the default function.

This supposes that such things as "identifier" in js should be
removed from 'treesit-thing-settings' since identifiers should be
navigated the same way as such keywords as "export" and "const"
using 'forward-sexp-default-function'.

What should remain in 'treesit-thing-settings' are only grouping
constructs such as "parenthesized_expression" and "statement_block".

Removing "identifier" from 'treesit-thing-settings' exposed a problem
in 'treesit-navigate-thing'.  This line

  ((and (null next) (null prev)) parent)

tries to go out of the current thing to its parent,
thus breaking the main principle that 'forward-sexp'
should move forward across siblings only.  But removing
this line fixed the problem:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-forward-sexp.patch --]
[-- Type: text/x-diff, Size: 3264 bytes --]

diff --git a/lisp/treesit.el b/lisp/treesit.el
index db8f7a7595d..4fcdbe7fc56 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2373,21 +2373,41 @@ treesit-forward-sexp
 What constitutes as text and source code sexp is determined
 by `text' and `sexp' in `treesit-thing-settings'."
   (interactive "^p")
-  (let ((arg (or arg 1))
-        (pred (or treesit-sexp-type-regexp 'sexp))
-        (node-at-point
-         (treesit-node-at (point) (treesit-language-at (point)))))
-    (or (when (and node-at-point
-                   ;; Make sure point is strictly inside node.
-                   (< (treesit-node-start node-at-point)
-                      (point)
-                      (treesit-node-end node-at-point))
-                   (treesit-node-match-p node-at-point 'text t))
-          (forward-sexp-default-function arg)
-          t)
-        (if (> arg 0)
-            (treesit-end-of-thing pred (abs arg) 'restricted)
-          (treesit-beginning-of-thing pred (abs arg) 'restricted))
+  (let* ((arg (or arg 1))
+         (pred (or treesit-sexp-type-regexp 'sexp))
+         (current-thing (treesit-thing-at (point) pred t))
+         (default-pos
+          (condition-case _
+              (save-excursion
+                (forward-sexp-default-function arg)
+                (point))
+            (scan-error nil)))
+         (default-pos (unless (eq (point) default-pos) default-pos))
+         (sibling-pos
+          (save-excursion
+            (and (if (> arg 0)
+                     (treesit-end-of-thing pred (abs arg) 'restricted)
+                   (treesit-beginning-of-thing pred (abs arg) 'restricted))
+                 (point))))
+         (sibling (when sibling-pos
+                    (if (> arg 0)
+                        (treesit-thing-prev sibling-pos pred)
+                      (treesit-thing-next sibling-pos pred)))))
+
+    ;; 'forward-sexp-default-function' should not go out of the current thing,
+    ;; neither go inside the next thing, neither go over the next thing
+    (or (when (and default-pos
+                   (or (null current-thing)
+                       (if (> arg 0)
+                           (< default-pos (treesit-node-end current-thing))
+                         (> default-pos (treesit-node-start current-thing))))
+                   (or (null sibling)
+                       (if (> arg 0)
+                           (< default-pos (treesit-node-start sibling))
+                         (> default-pos (treesit-node-end sibling)))))
+          (goto-char default-pos))
+        (when sibling-pos
+          (goto-char sibling-pos))
         ;; If we couldn't move, we should signal an error and report
         ;; the obstacle, like `forward-sexp' does.  If we couldn't
         ;; find a parent, we simply return nil without moving point,
@@ -2849,8 +2869,7 @@ treesit-navigate-thing
           (if (eq tactic 'restricted)
               (setq pos (funcall
                          advance
-                         (cond ((and (null next) (null prev)) parent)
-                               ((> arg 0) next)
+                         (cond ((> arg 0) next)
                                (t prev))))
             ;; For `nested', it's a bit more work:
             ;; Move...

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-10 17:20                           ` Juri Linkov
@ 2024-12-11  6:31                             ` Yuan Fu
  2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-19  7:34                               ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
  0 siblings, 2 replies; 58+ messages in thread
From: Yuan Fu @ 2024-12-11  6:31 UTC (permalink / raw)
  To: Juri Linkov
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
	Stefan Monnier



> On Dec 10, 2024, at 9:20 AM, Juri Linkov <juri@linkov.net> wrote:
> 
>>>    export const add = (a, b) -!- => a + b;
>>> 
>>> typing 'C-M-f' should jump to the end of the next sexp
>>> (to the end of whole "binary_expression"):
>>> 
>>>    export const add = (a, b) => a + b-!-;
>>> 
>>> since only tree-sitter knows about "binary_expression",
>>> so 'forward-sexp-default-function' can't be used here.
>> 
>> Actually, I have one idea of possible heuristics:
>> 
>> 1. first try 'forward-sexp-default-function'
>> 2. if it crosses the boundary of sexp defined by 'treesit-thing-settings'
>>   then use 'treesit-end-of-thing' instead
>> 
>> This should work.  Ok, will try.
> 
> This is implemented now in the attached patch, and it works nicely.
> 
> The main rule is the following: 'forward-sexp-default-function'
> should not go out of the current thing, neither go inside a sibling.
> So we use 'treesit-end-of-thing' in such cases.  But when inside
> a thing or outside a thing, use the default function.
> 
> This supposes that such things as "identifier" in js should be
> removed from 'treesit-thing-settings' since identifiers should be
> navigated the same way as such keywords as "export" and "const"
> using 'forward-sexp-default-function'.
> 
> What should remain in 'treesit-thing-settings' are only grouping
> constructs such as "parenthesized_expression" and "statement_block".

Ah, this matches my idea of defining sexp in other languages as “repeatable construct/list-like construct”. We went with “every syntactic construct” at the time, which I didn’t object to, but I’m definitely happier with the repeatable construct approach. Including Stefan and Theo since they were part of the original sexp navigation discussion.

My only concern is that would the result be a bit unpredictable/confusing when we mix the result of two logic together in such an involved way? We can push to master and try it out for a while. I use tree-sitter sexp navigation for work every day, albeit strictly for navigating list-like constructs—I use forward/backward-word for smaller navigation.

> 
> Removing "identifier" from 'treesit-thing-settings' exposed a problem
> in 'treesit-navigate-thing'.  This line
> 
>  ((and (null next) (null prev)) parent)
> 
> tries to go out of the current thing to its parent,
> thus breaking the main principle that 'forward-sexp'
> should move forward across siblings only.  But removing
> this line fixed the problem:

Thanks, LGTM.

> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index db8f7a7595d..4fcdbe7fc56 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -2373,21 +2373,41 @@ treesit-forward-sexp
> What constitutes as text and source code sexp is determined
> by `text' and `sexp' in `treesit-thing-settings'."
>   (interactive "^p")
> -  (let ((arg (or arg 1))
> -        (pred (or treesit-sexp-type-regexp 'sexp))
> -        (node-at-point
> -         (treesit-node-at (point) (treesit-language-at (point)))))
> -    (or (when (and node-at-point
> -                   ;; Make sure point is strictly inside node.
> -                   (< (treesit-node-start node-at-point)
> -                      (point)
> -                      (treesit-node-end node-at-point))
> -                   (treesit-node-match-p node-at-point 'text t))
> -          (forward-sexp-default-function arg)
> -          t)
> -        (if (> arg 0)
> -            (treesit-end-of-thing pred (abs arg) 'restricted)
> -          (treesit-beginning-of-thing pred (abs arg) 'restricted))
> +  (let* ((arg (or arg 1))
> +         (pred (or treesit-sexp-type-regexp 'sexp))
> +         (current-thing (treesit-thing-at (point) pred t))
> +         (default-pos
> +          (condition-case _
> +              (save-excursion
> +                (forward-sexp-default-function arg)
> +                (point))
> +            (scan-error nil)))
> +         (default-pos (unless (eq (point) default-pos) default-pos))
> +         (sibling-pos
> +          (save-excursion
> +            (and (if (> arg 0)
> +                     (treesit-end-of-thing pred (abs arg) 'restricted)
> +                   (treesit-beginning-of-thing pred (abs arg) 'restricted))
> +                 (point))))
> +         (sibling (when sibling-pos
> +                    (if (> arg 0)
> +                        (treesit-thing-prev sibling-pos pred)
> +                      (treesit-thing-next sibling-pos pred)))))
> +
> +    ;; 'forward-sexp-default-function' should not go out of the current thing,
> +    ;; neither go inside the next thing, neither go over the next thing
> +    (or (when (and default-pos
> +                   (or (null current-thing)
> +                       (if (> arg 0)
> +                           (< default-pos (treesit-node-end current-thing))
> +                         (> default-pos (treesit-node-start current-thing))))
> +                   (or (null sibling)
> +                       (if (> arg 0)
> +                           (< default-pos (treesit-node-start sibling))
> +                         (> default-pos (treesit-node-end sibling)))))
> +          (goto-char default-pos))
> +        (when sibling-pos
> +          (goto-char sibling-pos))
>         ;; If we couldn't move, we should signal an error and report
>         ;; the obstacle, like `forward-sexp' does.  If we couldn't
>         ;; find a parent, we simply return nil without moving point,
> @@ -2849,8 +2869,7 @@ treesit-navigate-thing
>           (if (eq tactic 'restricted)
>               (setq pos (funcall
>                          advance
> -                         (cond ((and (null next) (null prev)) parent)
> -                               ((> arg 0) next)
> +                         (cond ((> arg 0) next)
>                                (t prev))))
>             ;; For `nested', it's a bit more work:
>             ;; Move...







^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-11  6:31                             ` Yuan Fu
@ 2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-11 15:29                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
                                                   ` (2 more replies)
  2024-12-19  7:34                               ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
  1 sibling, 3 replies; 58+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-11 15:12 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
	Juri Linkov

> Ah, this matches my idea of defining sexp in other languages as “repeatable
> construct/list-like construct”.  We went with “every syntactic construct” at
> the time, which I didn’t object to, but I’m definitely happier with the
> repeatable construct approach. Including Stefan and Theo since they were
> part of the original sexp navigation discussion.

FWIW, we have both `forward-list` and `forward-list` and the new
behavior you suggest sounds closer to the historical behavior of
`forward-list` than `forward-sexp`.


        Stefan






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-11 15:29                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-11 16:50                                 ` Mickey Petersen
  2024-12-11 18:27                                 ` Yuan Fu
  2 siblings, 0 replies; 58+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-11 15:29 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
	Juri Linkov

> FWIW, we have both `forward-list` and `forward-list` and the new
                                                 ^^^^
                                                 sexp

Otherwise it sounds smarter than it is.


        Stefan






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-11 15:29                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-11 16:50                                 ` Mickey Petersen
  2024-12-11 18:27                                 ` Yuan Fu
  2 siblings, 0 replies; 58+ messages in thread
From: Mickey Petersen @ 2024-12-11 16:50 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: 73404, Yuan Fu, Theodor Thornhill, Eli Zaretskii, Juri Linkov


Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> Ah, this matches my idea of defining sexp in other languages as “repeatable
>> construct/list-like construct”.  We went with “every syntactic construct” at
>> the time, which I didn’t object to, but I’m definitely happier with the
>> repeatable construct approach. Including Stefan and Theo since they were
>> part of the original sexp navigation discussion.
>
> FWIW, we have both `forward-list` and `forward-list` and the new
> behavior you suggest sounds closer to the historical behavior of
> `forward-list` than `forward-sexp`.
>

Indeed, in Combobulate `<forward/backward>-list' is explicitly used
for sibling navigation, and `<forward/backward>-sexp' instead does
what it does in plain major modes, but tweaked ever so slightly.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-11 15:29                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-11 16:50                                 ` Mickey Petersen
@ 2024-12-11 18:27                                 ` Yuan Fu
  2024-12-12  7:17                                   ` Juri Linkov
  2 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-12-11 18:27 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
	Juri Linkov



> On Dec 11, 2024, at 7:12 AM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>> Ah, this matches my idea of defining sexp in other languages as “repeatable
>> construct/list-like construct”.  We went with “every syntactic construct” at
>> the time, which I didn’t object to, but I’m definitely happier with the
>> repeatable construct approach. Including Stefan and Theo since they were
>> part of the original sexp navigation discussion.
> 
> FWIW, we have both `forward-list` and `forward-list` and the new
> behavior you suggest sounds closer to the historical behavior of
> `forward-list` than `forward-sexp`.
> 
> 
>        Stefan
> 

Actually, what’s the difference between forward-list and forward-sexp? I always thought they are the same at least for Lisp.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-11 18:27                                 ` Yuan Fu
@ 2024-12-12  7:17                                   ` Juri Linkov
  2024-12-12  7:40                                     ` Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-12  7:17 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, Stefan Monnier,
	73404

>>> Ah, this matches my idea of defining sexp in other languages as “repeatable
>>> construct/list-like construct”.  We went with “every syntactic construct” at
>>> the time, which I didn’t object to, but I’m definitely happier with the
>>> repeatable construct approach. Including Stefan and Theo since they were
>>> part of the original sexp navigation discussion.
>> 
>> FWIW, we have both `forward-list` and `forward-list` and the new
>> behavior you suggest sounds closer to the historical behavior of
>> `forward-list` than `forward-sexp`.
>
> Actually, what’s the difference between forward-list and forward-sexp?
> I always thought they are the same at least for Lisp.

forward-sexp moves over a balanced parenthetical group like
forward-list does.  Plus forward-sexp also moves over an atom
such as a symbol, a number.

The problem is that treesit adds too much structural information
to such simple things as a symbol and a number.  For example, in js
a simple keyword "export" gets the "(export_statement export" subtree,
Another keyword "const" gets "(lexical_declaration kind: const", etc.

Therefore for such symbols forward-sexp needs to bypass the structure
and use simpler syntactic information to move over them like on a flat list.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12  7:17                                   ` Juri Linkov
@ 2024-12-12  7:40                                     ` Eli Zaretskii
  2024-12-12  7:58                                       ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-12-12  7:40 UTC (permalink / raw)
  To: Juri Linkov; +Cc: theo, casouri, mickey, monnier, 73404

> From: Juri Linkov <juri@linkov.net>
> Cc: Stefan Monnier <monnier@iro.umontreal.ca>,  Eli Zaretskii
>  <eliz@gnu.org>,  Mickey Petersen <mickey@masteringemacs.org>,
>   73404@debbugs.gnu.org,  Theodor Thornhill <theo@thornhill.no>
> Date: Thu, 12 Dec 2024 09:17:44 +0200
> 
> >>> Ah, this matches my idea of defining sexp in other languages as “repeatable
> >>> construct/list-like construct”.  We went with “every syntactic construct” at
> >>> the time, which I didn’t object to, but I’m definitely happier with the
> >>> repeatable construct approach. Including Stefan and Theo since they were
> >>> part of the original sexp navigation discussion.
> >> 
> >> FWIW, we have both `forward-list` and `forward-list` and the new
> >> behavior you suggest sounds closer to the historical behavior of
> >> `forward-list` than `forward-sexp`.
> >
> > Actually, what’s the difference between forward-list and forward-sexp?
> > I always thought they are the same at least for Lisp.
> 
> forward-sexp moves over a balanced parenthetical group like
> forward-list does.  Plus forward-sexp also moves over an atom
> such as a symbol, a number.
> 
> The problem is that treesit adds too much structural information
> to such simple things as a symbol and a number.  For example, in js
> a simple keyword "export" gets the "(export_statement export" subtree,
> Another keyword "const" gets "(lexical_declaration kind: const", etc.
> 
> Therefore for such symbols forward-sexp needs to bypass the structure
> and use simpler syntactic information to move over them like on a flat list.

If you mean we should ignore the information provided by tree-sitter
and instead use our own syntactic information, then that sounds wrong
to me, FWIW.  Why cannot we understand enough of the tree-sitter
structural information to move like we want?  Presumably, the
structural information provided by tree-sitter is a portion of a parse
tree, which to me means we should be able to move between the parse
tree's nodes as long as we understand the tree and can interpret it in
our terms.

Aren't there some grammar-agnostic traits of tree-sitter nodes that
would allow us to interpret the nodes in language-independent terms?
If that is not available, then each major mode will have to provide
treesit.el with a way to interpret the tree-sitter nodes of the
corresponding grammar in a way that will allow sexp movement, thus
providing an abstraction layer that treesit.el could use for the
movement commands.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12  7:40                                     ` Eli Zaretskii
@ 2024-12-12  7:58                                       ` Juri Linkov
  2024-12-12  8:14                                         ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-12  7:58 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: theo, casouri, mickey, monnier, 73404

>> forward-sexp moves over a balanced parenthetical group like
>> forward-list does.  Plus forward-sexp also moves over an atom
>> such as a symbol, a number.
>> 
>> The problem is that treesit adds too much structural information
>> to such simple things as a symbol and a number.  For example, in js
>> a simple keyword "export" gets the "(export_statement export" subtree,
>> Another keyword "const" gets "(lexical_declaration kind: const", etc.
>> 
>> Therefore for such symbols forward-sexp needs to bypass the structure
>> and use simpler syntactic information to move over them like on a flat list.
>
> If you mean we should ignore the information provided by tree-sitter
> and instead use our own syntactic information, then that sounds wrong
> to me, FWIW.  Why cannot we understand enough of the tree-sitter
> structural information to move like we want?  Presumably, the
> structural information provided by tree-sitter is a portion of a parse
> tree, which to me means we should be able to move between the parse
> tree's nodes as long as we understand the tree and can interpret it in
> our terms.
>
> Aren't there some grammar-agnostic traits of tree-sitter nodes that
> would allow us to interpret the nodes in language-independent terms?
> If that is not available, then each major mode will have to provide
> treesit.el with a way to interpret the tree-sitter nodes of the
> corresponding grammar in a way that will allow sexp movement, thus
> providing an abstraction layer that treesit.el could use for the
> movement commands.

Maybe it would be possible to use something like 'flatten-tree'
on the treesit's syntax tree?  But this will require the addition
of a lot of rules to specify what nodes should be flattened.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12  7:58                                       ` Juri Linkov
@ 2024-12-12  8:14                                         ` Juri Linkov
  2024-12-12 16:31                                           ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-12  8:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404

> Maybe it would be possible to use something like 'flatten-tree'
> on the treesit's syntax tree?  But this will require the addition
> of a lot of rules to specify what nodes should be flattened.

A better idea: instead of 'sexp' in treesit-thing-settings
define separately 'list' and 'atom', e.g. replace

    (setq-local treesit-thing-settings
                `((javascript
                   (sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes)))))

with

    (setq-local treesit-thing-settings
                `((javascript
                   (list ,(js--regexp-opt-symbol js--treesit-list-nodes))
                   (atom ,(js--regexp-opt-symbol js--treesit-atom-nodes)))))





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12  8:14                                         ` Juri Linkov
@ 2024-12-12 16:31                                           ` Juri Linkov
  2024-12-12 17:49                                             ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-12 16:31 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: theo, casouri, mickey, monnier, 73404

> A better idea: instead of 'sexp' in treesit-thing-settings
> define separately 'list' and 'atom', e.g. replace
>
>     (setq-local treesit-thing-settings
>                 `((javascript
>                    (sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes)))))
>
> with
>
>     (setq-local treesit-thing-settings
>                 `((javascript
>                    (list ,(js--regexp-opt-symbol js--treesit-list-nodes))
>                    (atom ,(js--regexp-opt-symbol js--treesit-atom-nodes)))))

Still the problem is that using the atoms from the tree
doesn't provide backward-compatibility with non-ts modes
and how C-M-f moves on non-list atoms there.

One way would be to extract anonymous text leaf nodes
such as "export" and "const" from

 (export_statement "export" declaration: 
 (lexical_declaration kind: "const"

But still need to check symbol/word syntax to omit such nodes
as "+" from

  (binary_expression left: (identifier) operator: "+"

Therefore to provide backward-compatibility with non-ts modes
in regard to C-M-f navigation, navigation on atoms should follow
the Sword/Ssymbol rules of 'scan_lists' with non-nil 'sexpflag'.

So an atom should be a syntactical entity, not structural.
This means that treesit-forward-sexp should use the 'list' thing
with syntactical atoms.

For example, for 'C-M-f' on

  var p = {
    case: 'zzzz',
    -!-default: 'donkey',
    tee: 'ornery'
  };

in js-ts-mode it would be unexpected to move to

  var p = {
    case: 'zzzz',
    default: 'donkey'-!-,
    tee: 'ornery'
  };

because js-mode moves to

  var p = {
    case: 'zzzz',
    default-!-: 'donkey',
    tee: 'ornery'
  };

But anyway 'list' should be customizable as 'sexp' already is.

OTOH, transpose-sexps should use treesit 'sexp' with more fine-grained
lists that are not suitable for forward-sexp.
(And we have no transpose-lists.)

For example, it's expected for transpose-sexps to transpose
a pair of key:value defined by 'sexp':

  var p = {
    default: 'donkey',
    -!-case: 'zzzz',
    tee: 'ornery'
  };

that will be a big improvement when comparing to js-mode.
This also will close bug#60655.

And if someone want to transpose adjacent atoms (symbols or words),
there is transpose-words.

In summary:
The current implementation in treesit-forward-sexp is more like forward-list.
So let's rename it to treesit-forward-list, then create a new implementation
of treesit-forward-sexp that uses the 'list' thing together with
syntactical atoms.

Another variant is to leave treesit-forward-sexp as is,
and create a new function treesit-forward-sexp-with-list
that uses the 'list' thing.

And anyway let's keep the current implementation
of treesit-transpose-sexps that uses the 'sexp' thing.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12 16:31                                           ` Juri Linkov
@ 2024-12-12 17:49                                             ` Juri Linkov
  2024-12-12 19:13                                               ` Eli Zaretskii
  2024-12-18  7:37                                               ` Juri Linkov
  0 siblings, 2 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-12 17:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404

[-- Attachment #1: Type: text/plain, Size: 372 bytes --]

> Another variant is to leave treesit-forward-sexp as is,
> and create a new function treesit-forward-sexp-with-list
> that uses the 'list' thing.

This patch keep the current function treesit-forward-sexp,
and creates a new function treesit-forward-sexp-list
that uses the 'sexp-list' thing to navigate lists while
using forward-sexp-default-function to navigate atoms:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-forward-sexp-list.patch --]
[-- Type: text/x-diff, Size: 5601 bytes --]

diff --git a/lisp/treesit.el b/lisp/treesit.el
index db8f7a7595d..f064be55b9c 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2400,6 +2400,68 @@ treesit-forward-sexp
                                     (treesit-node-start boundary)
                                     (treesit-node-end boundary)))))))
 
+(defun treesit-forward-sexp-list (&optional arg)
+  "Tree-sitter implementation for `forward-sexp-function'.
+
+ARG is described in the docstring of `forward-sexp-function'.
+
+If point is inside a text environment where tree-sitter is not
+supported, go forward a sexp using `forward-sexp-default-function'.
+If point is inside code, use tree-sitter functions with the
+following behavior.  If there are no further sexps to move across,
+signal `scan-error' like `forward-sexp' does.  If point is already
+at top-level, return nil without moving point.
+
+What constitutes as text and source code sexp is determined
+by `text' and `sexp' in `treesit-thing-settings'."
+  (interactive "^p")
+  (let* ((arg (or arg 1))
+         (pred (or treesit-sexp-type-regexp 'sexp-list))
+         (current-thing (treesit-thing-at (point) pred t))
+         (default-pos
+          (condition-case _
+              (save-excursion
+                (forward-sexp-default-function arg)
+                (point))
+            (scan-error nil)))
+         (default-pos (unless (eq (point) default-pos) default-pos))
+         (sibling-pos
+          (save-excursion
+            (and (if (> arg 0)
+                     (treesit-end-of-thing pred (abs arg) 'restricted)
+                   (treesit-beginning-of-thing pred (abs arg) 'restricted))
+                 (point))))
+         (sibling (when sibling-pos
+                    (if (> arg 0)
+                        (treesit-thing-prev sibling-pos pred)
+                      (treesit-thing-next sibling-pos pred)))))
+
+    ;; 'forward-sexp-default-function' should not go out of the current thing,
+    ;; neither go inside the next thing, neither go over the next thing
+    (or (when (and default-pos
+                   (or (null current-thing)
+                       (if (> arg 0)
+                           (< default-pos (treesit-node-end current-thing))
+                         (> default-pos (treesit-node-start current-thing))))
+                   (or (null sibling)
+                       (if (> arg 0)
+                           (< default-pos (treesit-node-start sibling))
+                         (> default-pos (treesit-node-end sibling)))))
+          (goto-char default-pos))
+        (when sibling-pos
+          (goto-char sibling-pos))
+        ;; If we couldn't move, we should signal an error and report
+        ;; the obstacle, like `forward-sexp' does.  If we couldn't
+        ;; find a parent, we simply return nil without moving point,
+        ;; then functions like `up-list' will signal "at top level".
+        (when-let* ((parent (treesit-thing-at (point) pred t))
+                    (boundary (if (> arg 0)
+                                  (treesit-node-child parent -1)
+                                (treesit-node-child parent 0))))
+          (signal 'scan-error (list "No more sexp to move across"
+                                    (treesit-node-start boundary)
+                                    (treesit-node-end boundary)))))))
+
 (defun treesit-transpose-sexps (&optional arg)
   "Tree-sitter `transpose-sexps' function.
 ARG is the same as in `transpose-sexps'.
@@ -2849,7 +2911,7 @@ treesit-navigate-thing
           (if (eq tactic 'restricted)
               (setq pos (funcall
                          advance
-                         (cond ((and (null next) (null prev)) parent)
+                         (cond ((and (null next) (null prev) (not (eq thing 'sexp-list))) parent)
                                ((> arg 0) next)
                                (t prev))))
             ;; For `nested', it's a bit more work:
@@ -3246,6 +3308,9 @@ treesit-major-mode-setup
     (setq-local forward-sexp-function #'treesit-forward-sexp)
     (setq-local transpose-sexps-function #'treesit-transpose-sexps))
 
+  (when (treesit-thing-defined-p 'sexp-list nil)
+    (setq-local forward-sexp-function #'treesit-forward-sexp-list))
+
   (when (treesit-thing-defined-p 'sentence nil)
     (setq-local forward-sentence-function #'treesit-forward-sentence))
 
diff --git a/lisp/progmodes/js.el b/lisp/progmodes/js.el
index dbf721e8d0f..c4d33564e80 100644
--- a/lisp/progmodes/js.el
+++ b/lisp/progmodes/js.el
@@ -3878,6 +3878,19 @@ js--treesit-sexp-nodes
   "Nodes that designate sexps in JavaScript.
 See `treesit-thing-settings' for more information.")
 
+(defvar js--treesit-sexp-list-nodes
+  '("formal_parameters"
+    "arguments"
+    "statement_block"
+    "parenthesized_expression"
+    "switch_body"
+    "array"
+    "object"
+    "string"
+    "regex")
+  "Nodes that designate lists in JavaScript.
+See `treesit-thing-settings' for more information.")
+
 (defvar js--treesit-jsdoc-beginning-regexp (rx bos "/**")
   "Regular expression matching the beginning of a jsdoc block comment.")
 
@@ -3921,6 +3934,7 @@ js-ts-mode
     (setq-local treesit-thing-settings
                 `((javascript
                    (sexp ,(js--regexp-opt-symbol js--treesit-sexp-nodes))
+                   (sexp-list ,(js--regexp-opt-symbol js--treesit-sexp-list-nodes))
                    (sentence ,(js--regexp-opt-symbol js--treesit-sentence-nodes))
                    (text ,(js--regexp-opt-symbol '("comment"
                                                    "string_fragment"))))))

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12 17:49                                             ` Juri Linkov
@ 2024-12-12 19:13                                               ` Eli Zaretskii
  2024-12-13  7:06                                                 ` Juri Linkov
  2024-12-18  7:37                                               ` Juri Linkov
  1 sibling, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-12-12 19:13 UTC (permalink / raw)
  To: Juri Linkov; +Cc: mickey, casouri, theo, monnier, 73404

> From: Juri Linkov <juri@linkov.net>
> Cc: theo@thornhill.no,  casouri@gmail.com,  mickey@masteringemacs.org,
>   monnier@iro.umontreal.ca,  73404@debbugs.gnu.org
> Date: Thu, 12 Dec 2024 19:49:30 +0200
> 
> > Another variant is to leave treesit-forward-sexp as is,
> > and create a new function treesit-forward-sexp-with-list
> > that uses the 'list' thing.
> 
> This patch keep the current function treesit-forward-sexp,
> and creates a new function treesit-forward-sexp-list
> that uses the 'sexp-list' thing to navigate lists while
> using forward-sexp-default-function to navigate atoms:

Introducing a new command raises the question how should users
navigate using the two.  Will C-M-<RIGHT> invoke forward-sexp or the
new command (or maybe one or the other in some dwin-ish way)?

What are the problems with teaching treesit-forward-sexp do what the
new command does?  IOW, why do we need a new command?





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12 19:13                                               ` Eli Zaretskii
@ 2024-12-13  7:06                                                 ` Juri Linkov
  2024-12-14 11:02                                                   ` Eli Zaretskii
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-13  7:06 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404

>> This patch keep the current function treesit-forward-sexp,
>> and creates a new function treesit-forward-sexp-list
>> that uses the 'sexp-list' thing to navigate lists while
>> using forward-sexp-default-function to navigate atoms:
>
> Introducing a new command raises the question how should users
> navigate using the two.  Will C-M-<RIGHT> invoke forward-sexp or the
> new command (or maybe one or the other in some dwin-ish way)?
>
> What are the problems with teaching treesit-forward-sexp do what the
> new command does?  IOW, why do we need a new command?

Actually, it's not a new command, but a new function for C-M-<RIGHT>
via forward-sexp-function.  This is from treesit-major-mode-setup:

  (when (treesit-thing-defined-p 'sexp nil)
    (setq-local forward-sexp-function #'treesit-forward-sexp)
    (setq-local transpose-sexps-function #'treesit-transpose-sexps))

  (when (treesit-thing-defined-p 'sexp-list nil)
    (setq-local forward-sexp-function #'treesit-forward-sexp-list))

When a ts-mode defines the 'sexp-list' thing, only then it's used.
Otherwise the current implementation with 'sexp' is used for C-M-<RIGHT>.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-13  7:06                                                 ` Juri Linkov
@ 2024-12-14 11:02                                                   ` Eli Zaretskii
  2024-12-14 18:14                                                     ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Eli Zaretskii @ 2024-12-14 11:02 UTC (permalink / raw)
  To: Juri Linkov; +Cc: mickey, casouri, theo, monnier, 73404

> From: Juri Linkov <juri@linkov.net>
> Cc: theo@thornhill.no,  casouri@gmail.com,  mickey@masteringemacs.org,
>   monnier@iro.umontreal.ca,  73404@debbugs.gnu.org
> Date: Fri, 13 Dec 2024 09:06:04 +0200
> 
> >> This patch keep the current function treesit-forward-sexp,
> >> and creates a new function treesit-forward-sexp-list
> >> that uses the 'sexp-list' thing to navigate lists while
> >> using forward-sexp-default-function to navigate atoms:
> >
> > Introducing a new command raises the question how should users
> > navigate using the two.  Will C-M-<RIGHT> invoke forward-sexp or the
> > new command (or maybe one or the other in some dwin-ish way)?
> >
> > What are the problems with teaching treesit-forward-sexp do what the
> > new command does?  IOW, why do we need a new command?
> 
> Actually, it's not a new command, but a new function for C-M-<RIGHT>
> via forward-sexp-function.

It's an interactive function, so it's a command.

> This is from treesit-major-mode-setup:
> 
>   (when (treesit-thing-defined-p 'sexp nil)
>     (setq-local forward-sexp-function #'treesit-forward-sexp)
>     (setq-local transpose-sexps-function #'treesit-transpose-sexps))
> 
>   (when (treesit-thing-defined-p 'sexp-list nil)
>     (setq-local forward-sexp-function #'treesit-forward-sexp-list))
> 
> When a ts-mode defines the 'sexp-list' thing, only then it's used.
> Otherwise the current implementation with 'sexp' is used for C-M-<RIGHT>.

Then why is the new function interactive?





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-14 11:02                                                   ` Eli Zaretskii
@ 2024-12-14 18:14                                                     ` Juri Linkov
  0 siblings, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-14 18:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: mickey, casouri, theo, monnier, 73404

[-- Attachment #1: Type: text/plain, Size: 937 bytes --]

>> Actually, it's not a new command, but a new function for C-M-<RIGHT>
>> via forward-sexp-function.
>
> It's an interactive function, so it's a command.
>
>> When a ts-mode defines the 'sexp-list' thing, only then it's used.
>> Otherwise the current implementation with 'sexp' is used for C-M-<RIGHT>.
>
> Then why is the new function interactive?

Ah, I didn't notice it's interactive!

treesit-forward-sexp-list was based on treesit-forward-sexp
that has the interactive spec added by Theo in the commit 207901457c01.

I guess that Theo decided to make this function interactive
for such use case that users could use it separately
or bind it to a key.

What really needs to be interactive is the new function
treesit-forward-list added in the following patch
because there is no special variable forward-list-function for
forward-list like there is forward-sexp-function for forward-sexp.
So users might want to use it separately:


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: treesit-forward-list.patch --]
[-- Type: text/x-diff, Size: 5315 bytes --]

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 18200acf53f..a1c012b6d2f 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2366,6 +2366,11 @@ treesit-sexp-type-regexp
 however, smaller in scope than sentences.  This is used by
 `treesit-forward-sexp' and friends.")
 
+(defun treesit-forward-list (&optional arg)
+  (interactive "^p")
+  (let ((treesit-sexp-type-regexp 'sexp-list))
+    (treesit-forward-sexp arg)))
+
 (defun treesit-forward-sexp (&optional arg)
   "Tree-sitter implementation for `forward-sexp-function'.
 
@@ -2382,18 +2387,8 @@ treesit-forward-sexp
 by `text' and `sexp' in `treesit-thing-settings'."
   (interactive "^p")
   (let ((arg (or arg 1))
-        (pred (or treesit-sexp-type-regexp 'sexp))
-        (node-at-point
-         (treesit-node-at (point) (treesit-language-at (point)))))
-    (or (when (and node-at-point
-                   ;; Make sure point is strictly inside node.
-                   (< (treesit-node-start node-at-point)
-                      (point)
-                      (treesit-node-end node-at-point))
-                   (treesit-node-match-p node-at-point 'text t))
-          (forward-sexp-default-function arg)
-          t)
-        (if (> arg 0)
+        (pred (or treesit-sexp-type-regexp 'sexp)))
+    (or (if (> arg 0)
             (treesit-end-of-thing pred (abs arg) 'restricted)
           (treesit-beginning-of-thing pred (abs arg) 'restricted))
         ;; If we couldn't move, we should signal an error and report
@@ -2408,6 +2403,63 @@ treesit-forward-sexp
                                     (treesit-node-start boundary)
                                     (treesit-node-end boundary)))))))
 
+(defun treesit-forward-sexp-list (&optional arg)
+  "Tree-sitter implementation for `forward-sexp-function'.
+
+ARG is described in the docstring of `forward-sexp-function'.
+
+If point is inside a text environment where tree-sitter is not
+supported, go forward a sexp using `forward-sexp-default-function'.
+If point is inside code, use tree-sitter functions with the
+following behavior.  If there are no further sexps to move across,
+signal `scan-error' like `forward-sexp' does.  If point is already
+at top-level, return nil without moving point.
+
+What constitutes as text and source code sexp is determined
+by `text' and `sexp' in `treesit-thing-settings'."
+  (interactive "^p")
+  (let* ((arg (or arg 1))
+         (pred 'sexp-list)
+         (default-pos
+          (condition-case _
+              (save-excursion
+                (forward-sexp-default-function arg)
+                (point))
+            (scan-error nil)))
+         (default-pos (unless (eq (point) default-pos) default-pos))
+         (sibling-pos
+          (when default-pos
+            (save-excursion
+              (and (if (> arg 0)
+                       (treesit-end-of-thing pred (abs arg) 'restricted)
+                     (treesit-beginning-of-thing pred (abs arg) 'restricted))
+                   (point)))))
+         (sibling (when sibling-pos
+                    (if (> arg 0)
+                        (treesit-thing-prev sibling-pos pred)
+                      (treesit-thing-next sibling-pos pred))))
+         (sibling (when (and sibling
+                             (if (> arg 0)
+                                 (<= (point) (treesit-node-start sibling))
+                               (>= (point) (treesit-node-start sibling))))
+                    sibling))
+         (current-thing (when default-pos
+                          (treesit-thing-at (point) pred t))))
+
+    ;; 'forward-sexp-default-function' should not go out of the current thing,
+    ;; neither go inside the next thing, neither go over the next thing
+    (or (when (and default-pos
+                   (or (null current-thing)
+                       (if (> arg 0)
+                           (< default-pos (treesit-node-end current-thing))
+                         (> default-pos (treesit-node-start current-thing))))
+                   (or (null sibling)
+                       (if (> arg 0)
+                           (<= default-pos (treesit-node-start sibling))
+                         (>= default-pos (treesit-node-end sibling)))))
+          (goto-char default-pos))
+        (treesit-forward-list arg))))
+
 (defun treesit-transpose-sexps (&optional arg)
   "Tree-sitter `transpose-sexps' function.
 ARG is the same as in `transpose-sexps'.
@@ -2857,7 +2909,7 @@ treesit-navigate-thing
           (if (eq tactic 'restricted)
               (setq pos (funcall
                          advance
-                         (cond ((and (null next) (null prev)) parent)
+                         (cond ((and (null next) (null prev) (not (eq thing 'sexp-list))) parent)
                                ((> arg 0) next)
                                (t prev))))
             ;; For `nested', it's a bit more work:
@@ -3254,6 +3306,9 @@ treesit-major-mode-setup
     (setq-local forward-sexp-function #'treesit-forward-sexp)
     (setq-local transpose-sexps-function #'treesit-transpose-sexps))
 
+  (when (treesit-thing-defined-p 'sexp-list nil)
+    (setq-local forward-sexp-function #'treesit-forward-sexp-list))
+
   (when (treesit-thing-defined-p 'sentence nil)
     (setq-local forward-sentence-function #'treesit-forward-sentence))
 

^ permalink raw reply related	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-12 17:49                                             ` Juri Linkov
  2024-12-12 19:13                                               ` Eli Zaretskii
@ 2024-12-18  7:37                                               ` Juri Linkov
  2024-12-19  4:04                                                 ` Yuan Fu
  1 sibling, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-18  7:37 UTC (permalink / raw)
  To: Yuan Fu; +Cc: theo, mickey, monnier, 73404

While testing treesit-forward-sexp-list, I discovered that
thing-navigation functions are not restricted to named nodes.

I wonder if there a reason to find anonymous nodes as things?

The problem was found with the node "unless" in Ruby:

  unless cond
    a += 1
  else
    b -= 1
  end

Here the named node 'unless' has exactly the same name
as the anonymous node with the text "unless":

  (unless "unless" condition: (identifier)

Finding anonymous nodes breaks forward-sexp when point is on "unless":

  un-!-less cond
    a += 1
  else
    b -= 1
  end

because (treesit-thing-at (point) 'sexp t) finds

  #<treesit-node "unless" in 156-162>

instead of

  #<treesit-node unless in 156-203>

Also this breaks backward-sexp and backward-up-list
because treesit--thing-sibling finds
the anonymous node "unless" as a previous sibling
instead of the named node 'unless' as a parent.

Would the right solution be to check if the found thing
is a named node?  With something like:

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 18200acf53f..9ad879ee40c 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -2711,6 +2774,7 @@ treesit--thing-sibling
                      (lambda (n) (>= (treesit-node-start n) pos))))
          (iter-pred (lambda (node)
                       (and (treesit-node-match-p node thing t)
+                           (treesit-node-check node 'named)
                            (funcall pos-pred node))))
          (sibling nil))
     (when cursor
@@ -2760,6 +2824,7 @@ treesit-thing-at
   (let* ((cursor (treesit-node-at pos))
          (iter-pred (lambda (node)
                       (and (treesit-node-match-p node thing t)
+                           (treesit-node-check node 'named)
                            (if strict
                                (< (treesit-node-start node) pos)
                              (<= (treesit-node-start node) pos))





^ permalink raw reply related	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-18  7:37                                               ` Juri Linkov
@ 2024-12-19  4:04                                                 ` Yuan Fu
  2024-12-19  7:14                                                   ` Juri Linkov
  2024-12-19  7:18                                                   ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
  0 siblings, 2 replies; 58+ messages in thread
From: Yuan Fu @ 2024-12-19  4:04 UTC (permalink / raw)
  To: Juri Linkov; +Cc: theo, Mickey Petersen, monnier, 73404



> On Dec 17, 2024, at 11:37 PM, Juri Linkov <juri@linkov.net> wrote:
> 
> While testing treesit-forward-sexp-list, I discovered that
> thing-navigation functions are not restricted to named nodes.
> 
> I wonder if there a reason to find anonymous nodes as things?

We should rather ask is there any reason to not find anonymous nodes as things? Even ruby-ts-mode defines a bunch of anonymous nodes as sexp, no? In any case, excluding anonymous nodes from things doesn’t sound right.

> 
> The problem was found with the node "unless" in Ruby:
> 
>  unless cond
>    a += 1
>  else
>    b -= 1
>  end
> 
> Here the named node 'unless' has exactly the same name
> as the anonymous node with the text "unless":
> 
>  (unless "unless" condition: (identifier)

I feel like Ruby’s grammar should call the named node something else, like unless_statement.

> 
> Finding anonymous nodes breaks forward-sexp when point is on "unless":
> 
>  un-!-less cond
>    a += 1
>  else
>    b -= 1
>  end
> 
> because (treesit-thing-at (point) 'sexp t) finds
> 
>  #<treesit-node "unless" in 156-162>
> 
> instead of
> 
>  #<treesit-node unless in 156-203>
> 
> Also this breaks backward-sexp and backward-up-list
> because treesit--thing-sibling finds
> the anonymous node "unless" as a previous sibling
> instead of the named node 'unless' as a parent.
> 
> Would the right solution be to check if the found thing
> is a named node?  With something like:
> 
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 18200acf53f..9ad879ee40c 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -2711,6 +2774,7 @@ treesit--thing-sibling
>                      (lambda (n) (>= (treesit-node-start n) pos))))
>          (iter-pred (lambda (node)
>                       (and (treesit-node-match-p node thing t)
> +                           (treesit-node-check node 'named)
>                            (funcall pos-pred node))))
>          (sibling nil))
>     (when cursor
> @@ -2760,6 +2824,7 @@ treesit-thing-at
>   (let* ((cursor (treesit-node-at pos))
>          (iter-pred (lambda (node)
>                       (and (treesit-node-match-p node thing t)
> +                           (treesit-node-check node 'named)
>                            (if strict
>                                (< (treesit-node-start node) pos)
>                              (<= (treesit-node-start node) pos))

A better solution IMO is to add some way to distinguish between named and anonymous nodes. I can think of two ways, either add “and” and “named/anonymous” predicate, so (and named “unless”) only matches the named “unless” node; or we add a special syntax such that “(unless)” only matches named nodes, and “\”unless\”” only matches anonymous nodes.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-19  4:04                                                 ` Yuan Fu
@ 2024-12-19  7:14                                                   ` Juri Linkov
  2024-12-19  7:18                                                   ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
  1 sibling, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-19  7:14 UTC (permalink / raw)
  To: Yuan Fu; +Cc: theo, Mickey Petersen, monnier, 73404

> A better solution IMO is to add some way to distinguish between named and
> anonymous nodes. I can think of two ways, either add “and” and
> “named/anonymous” predicate, so (and named “unless”) only matches the named
> “unless” node; or we add a special syntax such that “(unless)” only matches
> named nodes, and “\”unless\”” only matches anonymous nodes.

I agree, so will create a new feature request.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-19  4:04                                                 ` Yuan Fu
  2024-12-19  7:14                                                   ` Juri Linkov
@ 2024-12-19  7:18                                                   ` Juri Linkov
  2024-12-24  3:02                                                     ` Yuan Fu
  1 sibling, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-19  7:18 UTC (permalink / raw)
  To: 74963; +Cc: Yuan Fu, Dmitry Gutov

[This is a separate bug report from bug#73404]

>> While testing treesit-forward-sexp-list, I discovered that
>> thing-navigation functions are not restricted to named nodes.
>> 
>> I wonder if there a reason to find anonymous nodes as things?
>
> We should rather ask is there any reason to not find anonymous nodes
> as things? Even ruby-ts-mode defines a bunch of anonymous nodes as
> sexp, no? In any case, excluding anonymous nodes from things doesn’t
> sound right.

Indeed, there are many anonymous nodes used in ruby-ts-mode.

>> The problem was found with the node "unless" in Ruby:
>> 
>>  unless cond
>>    a += 1
>>  else
>>    b -= 1
>>  end
>> 
>> Here the named node 'unless' has exactly the same name
>> as the anonymous node with the text "unless":
>> 
>>  (unless "unless" condition: (identifier)
>
> I feel like Ruby’s grammar should call the named node something else,
> like unless_statement.

Agreed, the problem is that nodes defined in Ruby’s grammar
are too ambiguous.  There are more such nodes with the same name
for named and anonymous: "if", "while", "until", etc.

>> Finding anonymous nodes breaks forward-sexp when point is on "unless":
>> 
>>  un-!-less cond
>>    a += 1
>>  else
>>    b -= 1
>>  end
>> 
>> because (treesit-thing-at (point) 'sexp t) finds
>> 
>>  #<treesit-node "unless" in 156-162>
>> 
>> instead of
>> 
>>  #<treesit-node unless in 156-203>
>> 
>> Also this breaks backward-sexp and backward-up-list
>> because treesit--thing-sibling finds
>> the anonymous node "unless" as a previous sibling
>> instead of the named node 'unless' as a parent.
>> 
>> Would the right solution be to check if the found thing
>> is a named node?  With something like:
>> 
>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>> index 18200acf53f..9ad879ee40c 100644
>> --- a/lisp/treesit.el
>> +++ b/lisp/treesit.el
>> @@ -2711,6 +2774,7 @@ treesit--thing-sibling
>>                      (lambda (n) (>= (treesit-node-start n) pos))))
>>          (iter-pred (lambda (node)
>>                       (and (treesit-node-match-p node thing t)
>> +                           (treesit-node-check node 'named)
>>                            (funcall pos-pred node))))
>>          (sibling nil))
>>     (when cursor
>> @@ -2760,6 +2824,7 @@ treesit-thing-at
>>   (let* ((cursor (treesit-node-at pos))
>>          (iter-pred (lambda (node)
>>                       (and (treesit-node-match-p node thing t)
>> +                           (treesit-node-check node 'named)
>>                            (if strict
>>                                (< (treesit-node-start node) pos)
>>                              (<= (treesit-node-start node) pos))
>
> A better solution IMO is to add some way to distinguish between named and
> anonymous nodes. I can think of two ways, either add “and” and
> “named/anonymous” predicate, so (and named “unless”) only matches the named
> “unless” node; or we add a special syntax such that “(unless)” only matches
> named nodes, and “\”unless\”” only matches anonymous nodes.

Either predicate or a special syntax is welcome.

This would be more handy than writing a lambda with implicit calls
of treesit-node-check.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-11  6:31                             ` Yuan Fu
  2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-19  7:34                               ` Juri Linkov
  2024-12-24 19:05                                 ` Juri Linkov
  1 sibling, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-19  7:34 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
	Stefan Monnier

>> What should remain in 'treesit-thing-settings' are only grouping
>> constructs such as "parenthesized_expression" and "statement_block".
>
> Ah, this matches my idea of defining sexp in other languages as “repeatable
> construct/list-like construct”. We went with “every syntactic construct” at
> the time, which I didn’t object to, but I’m definitely happier with the
> repeatable construct approach. Including Stefan and Theo since they were
> part of the original sexp navigation discussion.
>
> My only concern is that would the result be a bit unpredictable/confusing
> when we mix the result of two logic together in such an involved way? We
> can push to master and try it out for a while. I use tree-sitter sexp
> navigation for work every day, albeit strictly for navigating list-like
> constructs—I use forward/backward-word for smaller navigation.
>
>> tries to go out of the current thing to its parent,
>> thus breaking the main principle that 'forward-sexp'
>> should move forward across siblings only.  But removing
>> this line fixed the problem:
>
> Thanks, LGTM.

Ok, so now pushed to master in such backwards-compatible way
that when a ts-mode doesn't define the 'sexp-list' thing,
then the existing 'sexp' is used.

Also added 'sexp-list' to c-ts-mode, js-ts-mode, ruby-ts-mode and
html-ts-mode.  Addition of 'sexp-list' to other ts-modes is underway.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-19  7:18                                                   ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
@ 2024-12-24  3:02                                                     ` Yuan Fu
  2024-12-24  7:17                                                       ` Juri Linkov
  2024-12-24 17:52                                                       ` Juri Linkov
  0 siblings, 2 replies; 58+ messages in thread
From: Yuan Fu @ 2024-12-24  3:02 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Dmitry Gutov, 74963



> On Dec 18, 2024, at 11:18 PM, Juri Linkov <juri@linkov.net> wrote:
> 
> [This is a separate bug report from bug#73404]
> 
>>> While testing treesit-forward-sexp-list, I discovered that
>>> thing-navigation functions are not restricted to named nodes.
>>> 
>>> I wonder if there a reason to find anonymous nodes as things?
>> 
>> We should rather ask is there any reason to not find anonymous nodes
>> as things? Even ruby-ts-mode defines a bunch of anonymous nodes as
>> sexp, no? In any case, excluding anonymous nodes from things doesn’t
>> sound right.
> 
> Indeed, there are many anonymous nodes used in ruby-ts-mode.
> 
>>> The problem was found with the node "unless" in Ruby:
>>> 
>>> unless cond
>>>   a += 1
>>> else
>>>   b -= 1
>>> end
>>> 
>>> Here the named node 'unless' has exactly the same name
>>> as the anonymous node with the text "unless":
>>> 
>>> (unless "unless" condition: (identifier)
>> 
>> I feel like Ruby’s grammar should call the named node something else,
>> like unless_statement.
> 
> Agreed, the problem is that nodes defined in Ruby’s grammar
> are too ambiguous.  There are more such nodes with the same name
> for named and anonymous: "if", "while", "until", etc.
> 
>>> Finding anonymous nodes breaks forward-sexp when point is on "unless":
>>> 
>>> un-!-less cond
>>>   a += 1
>>> else
>>>   b -= 1
>>> end
>>> 
>>> because (treesit-thing-at (point) 'sexp t) finds
>>> 
>>> #<treesit-node "unless" in 156-162>
>>> 
>>> instead of
>>> 
>>> #<treesit-node unless in 156-203>
>>> 
>>> Also this breaks backward-sexp and backward-up-list
>>> because treesit--thing-sibling finds
>>> the anonymous node "unless" as a previous sibling
>>> instead of the named node 'unless' as a parent.
>>> 
>>> Would the right solution be to check if the found thing
>>> is a named node?  With something like:
>>> 
>>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>>> index 18200acf53f..9ad879ee40c 100644
>>> --- a/lisp/treesit.el
>>> +++ b/lisp/treesit.el
>>> @@ -2711,6 +2774,7 @@ treesit--thing-sibling
>>>                     (lambda (n) (>= (treesit-node-start n) pos))))
>>>         (iter-pred (lambda (node)
>>>                      (and (treesit-node-match-p node thing t)
>>> +                           (treesit-node-check node 'named)
>>>                           (funcall pos-pred node))))
>>>         (sibling nil))
>>>    (when cursor
>>> @@ -2760,6 +2824,7 @@ treesit-thing-at
>>>  (let* ((cursor (treesit-node-at pos))
>>>         (iter-pred (lambda (node)
>>>                      (and (treesit-node-match-p node thing t)
>>> +                           (treesit-node-check node 'named)
>>>                           (if strict
>>>                               (< (treesit-node-start node) pos)
>>>                             (<= (treesit-node-start node) pos))
>> 
>> A better solution IMO is to add some way to distinguish between named and
>> anonymous nodes. I can think of two ways, either add “and” and
>> “named/anonymous” predicate, so (and named “unless”) only matches the named
>> “unless” node; or we add a special syntax such that “(unless)” only matches
>> named nodes, and “\”unless\”” only matches anonymous nodes.
> 
> Either predicate or a special syntax is welcome.
> 
> This would be more handy than writing a lambda with implicit calls
> of treesit-node-check.

I’ll go with the (and named “unless”) route because after thinking about it more, “(unless)” will be hard to work with because the string predicate is actually a regexp.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-24  3:02                                                     ` Yuan Fu
@ 2024-12-24  7:17                                                       ` Juri Linkov
  2024-12-24  7:41                                                         ` Juri Linkov
  2024-12-25  3:25                                                         ` Dmitry Gutov
  2024-12-24 17:52                                                       ` Juri Linkov
  1 sibling, 2 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-24  7:17 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Dmitry Gutov, 74963

>>> A better solution IMO is to add some way to distinguish between named and
>>> anonymous nodes. I can think of two ways, either add “and” and
>>> “named/anonymous” predicate, so (and named “unless”) only matches the named
>>> “unless” node; or we add a special syntax such that “(unless)” only matches
>>> named nodes, and “\”unless\”” only matches anonymous nodes.
>> 
>> Either predicate or a special syntax is welcome.
>> 
>> This would be more handy than writing a lambda with implicit calls
>> of treesit-node-check.
>
> I’ll go with the (and named “unless”) route because after thinking
> about it more, “(unless)” will be hard to work with because the string
> predicate is actually a regexp.

Thanks.  While addition of '(and named "unless")' would be appreciated,
I see that currently it's possible to do this by proving a predicate
like there is 'ruby-ts--sexp-p' in

  (setq-local treesit-thing-settings
              `((ruby
                 (sexp ,(cons (rx
                               bol
                               (or
                                "class"
                                ...
                                )
                               eol)
                              #'ruby-ts--sexp-p))

Then 'ruby-ts--sexp-p' could check for the named node "unless" as well.

But it seems such solution is less efficient than adding '(and named "unless")'.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-24  7:17                                                       ` Juri Linkov
@ 2024-12-24  7:41                                                         ` Juri Linkov
  2024-12-25  3:25                                                         ` Dmitry Gutov
  1 sibling, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-24  7:41 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Dmitry Gutov, 74963

>   (setq-local treesit-thing-settings
>               `((ruby
>                  (sexp ,(cons (rx
>                                bol
>                                (or
>                                 "class"
>                                 ...
>                                 )
>                                eol)
>                               #'ruby-ts--sexp-p))

BTW, I just fixed a bug in typescript-ts-mode
where "string_fragment" was mismatched by "string",
because its regexp-opt matched node names too widely,
so needed to enclose in regexp anchors.

I see that all ts-modes solve this common problem each in its own way
(here 'list' indicates a list of strings that should match node names):

  c-ts-mode:    (regexp-opt list 'symbols)
  js-ts-mode:   (concat "\\_<" (regexp-opt list t) "\\_>")
  java-ts-mode: (rx (or list))
  ruby-ts-mode: (rx bol (or list) eol)

Currently there is no uniform way to handle this frequent need.
'concat' like above looks too ugly, but 'regexp-opt' with the
'symbols' arg produces a strange regexp for matching symbols.

Maybe better would be create a new argument for 'regexp-opt', e.g.:

  (regexp-opt list 'complete)

that will expand to:

  (concat "^" list "$")





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-24  3:02                                                     ` Yuan Fu
  2024-12-24  7:17                                                       ` Juri Linkov
@ 2024-12-24 17:52                                                       ` Juri Linkov
  2024-12-24 21:03                                                         ` Yuan Fu
  1 sibling, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-24 17:52 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Dmitry Gutov, 74963

> I’ll go with the (and named “unless”) route because after thinking
> about it more, “(unless)” will be hard to work with because the string
> predicate is actually a regexp.

Is it possible to mark all node names specified in treesit-thing-settings
as named?

I just discovered a new problem:

1. With typescript-ts-mode on the following snippet:

type NodeInfo =
  | (BaseNode & {
      subtypes: BaseNode[];
    })
  | (BaseNode & {
      fields: { [name: string]: ChildNode };
      children: ChildNode[];
    });

You can move point inside "string" and type C-M-f or C-M-b.
But point doesn't move.

This is because treesit-thing-settings defines a named node "string".
But anonymous node has the same name "string":

           (index_signature [ name: (identifier) :
            index_type: (predefined_type string)

and (treesit-node-at (point)) returns
#<treesit-node "string" in 111-117>

This mismatched "string" in TypeScript is even more
unexpected than "unless" in Ruby.

So probably we need a way to mark all used nodes as named
to avoid such unexpected matches.  Maybe matching anonymous nodes
should be opt-in, and by default match only named nodes.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-19  7:34                               ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
@ 2024-12-24 19:05                                 ` Juri Linkov
  2024-12-24 21:14                                   ` Yuan Fu
  2024-12-25 17:19                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-24 19:05 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
	Stefan Monnier

> Ok, so now pushed to master in such backwards-compatible way
> that when a ts-mode doesn't define the 'sexp-list' thing,
> then the existing 'sexp' is used.

Also I'm looking into allowing more list-navigation commands
to be usable in ts-modes.  E.g. instead of limiting list-navigation
only to the current 'forward-sexp', another useful command is
'down-list'.

Currently to get inside an HTML element in html-ts-mode
is possible by using 'M-e' (forward-sentence) to skip HTML tag.
This is not quite obvious.

More natural would be to support 'C-M-d' (down-list).

But this requires overriding more low-level 'scan-lists' and
'scan-sexps' in treesit, instead of overriding the current top-level
'forward-sexp-function'.

Also treesit-thing-settings for 'down-list' would require
pairs of node names: one node to define a list, and another node
to skip a node to get inside the list.

For html-ts-mode a list node is "element", and the node to skip
is "tag".  So for example:

 -!-text <html></html>

'C-M-f' will use node "element" to move to the beginning of "element":

 text -!-<html></html>

then to get inside also need to skip the <html> tag:

 text <html>-!-</html>





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-24 17:52                                                       ` Juri Linkov
@ 2024-12-24 21:03                                                         ` Yuan Fu
  2024-12-25  7:49                                                           ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-12-24 21:03 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Dmitry Gutov, 74963



> On Dec 24, 2024, at 9:52 AM, Juri Linkov <juri@linkov.net> wrote:
> 
>> I’ll go with the (and named “unless”) route because after thinking
>> about it more, “(unless)” will be hard to work with because the string
>> predicate is actually a regexp.
> 
> Is it possible to mark all node names specified in treesit-thing-settings
> as named?
> 
> I just discovered a new problem:
> 
> 1. With typescript-ts-mode on the following snippet:
> 
> type NodeInfo =
>  | (BaseNode & {
>      subtypes: BaseNode[];
>    })
>  | (BaseNode & {
>      fields: { [name: string]: ChildNode };
>      children: ChildNode[];
>    });
> 
> You can move point inside "string" and type C-M-f or C-M-b.
> But point doesn't move.
> 
> This is because treesit-thing-settings defines a named node "string".
> But anonymous node has the same name "string":
> 
>           (index_signature [ name: (identifier) :
>            index_type: (predefined_type string)
> 
> and (treesit-node-at (point)) returns
> #<treesit-node "string" in 111-117>
> 
> This mismatched "string" in TypeScript is even more
> unexpected than "unless" in Ruby.
> 
> So probably we need a way to mark all used nodes as named
> to avoid such unexpected matches.  Maybe matching anonymous nodes
> should be opt-in, and by default match only named nodes.

IMHO this is just an unfortunate bug that needs to be fixed. I agree that this type of bug are hard to avoid, which is a bad thing, but that doesn’t mean we should try to  alleviate it at any cost. Making predicates named by default just adds complexity and inflexibility for not much benefit.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-24 19:05                                 ` Juri Linkov
@ 2024-12-24 21:14                                   ` Yuan Fu
  2024-12-25  7:44                                     ` Juri Linkov
  2024-12-25 17:19                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-12-24 21:14 UTC (permalink / raw)
  To: Juri Linkov
  Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
	Stefan Monnier



> On Dec 24, 2024, at 11:05 AM, Juri Linkov <juri@linkov.net> wrote:
> 
>> Ok, so now pushed to master in such backwards-compatible way
>> that when a ts-mode doesn't define the 'sexp-list' thing,
>> then the existing 'sexp' is used.
> 
> Also I'm looking into allowing more list-navigation commands
> to be usable in ts-modes.  E.g. instead of limiting list-navigation
> only to the current 'forward-sexp', another useful command is
> 'down-list'.
> 
> Currently to get inside an HTML element in html-ts-mode
> is possible by using 'M-e' (forward-sentence) to skip HTML tag.
> This is not quite obvious.
> 
> More natural would be to support 'C-M-d' (down-list).
> 
> But this requires overriding more low-level 'scan-lists' and
> 'scan-sexps' in treesit, instead of overriding the current top-level
> 'forward-sexp-function'.
> 
> Also treesit-thing-settings for 'down-list' would require
> pairs of node names: one node to define a list, and another node
> to skip a node to get inside the list.
> 
> For html-ts-mode a list node is "element", and the node to skip
> is "tag".  So for example:
> 
> -!-text <html></html>
> 
> 'C-M-f' will use node "element" to move to the beginning of "element":
> 
> text -!-<html></html>
> 
> then to get inside also need to skip the <html> tag:
> 
> text <html>-!-</html>

Right, the thing navigation only supports going up/outside, but not going down/inside. We can add a new thing for beginning and end of balanced pairs. Then down-list will be going from the start of a balanced-pair-open to the end of it. Up-list will be going from the start of a balanced-pair-close to the end of it.

We might also want a way to jump from pair-open to pair-end. Going to pair-open’s parent’s end will be almost always correct. (Except for the grammars that do weird stuff, like tree-sitter-c’s for statement, the condition is not a node by itself, but split into opening parentheses, initializer, loop condition, increment, closing paren, all of which direct child of the for_statement, not grouped into a condition node.) 

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-24  7:17                                                       ` Juri Linkov
  2024-12-24  7:41                                                         ` Juri Linkov
@ 2024-12-25  3:25                                                         ` Dmitry Gutov
  2024-12-25  7:52                                                           ` Juri Linkov
  1 sibling, 1 reply; 58+ messages in thread
From: Dmitry Gutov @ 2024-12-25  3:25 UTC (permalink / raw)
  To: Juri Linkov, Yuan Fu; +Cc: 74963

Hi Juri,

On 24/12/2024 09:17, Juri Linkov wrote:
> While addition of '(and named "unless")' would be appreciated,
> I see that currently it's possible to do this by proving a predicate
> like there is 'ruby-ts--sexp-p' in
> 
>    (setq-local treesit-thing-settings
>                `((ruby
>                   (sexp ,(cons (rx
>                                 bol
>                                 (or
>                                  "class"
>                                  ...
>                                  )
>                                 eol)
>                                #'ruby-ts--sexp-p))
> 
> Then 'ruby-ts--sexp-p' could check for the named node "unless" as well.
> 
> But it seems such solution is less efficient than adding '(and named "unless")'.

Given that we're already calling a predicate every time (in 
ruby-ts-mode), we might as well add one more check. See the patch at the 
end.

Speaking of tricky examples though, here's a definition:

   module Bar
     class Foo
       def baz
       end
     end
   end

If you move point inside the keyword "module" or "class", C-M-f wouldn't 
move forward either as of the latest master. No such problem with "def".

Adding the check for "named" fixes the first two cases, but then C-M-f 
inside "def" jumps to after "baaz". Could be worked around with a 
special case, but I wonder what this difference comes from (haven't 
properly debugged yet).

diff --git a/lisp/progmodes/ruby-ts-mode.el b/lisp/progmodes/ruby-ts-mode.el
index 4ef0cb18eae..4b15c6cbf27 100644
--- a/lisp/progmodes/ruby-ts-mode.el
+++ b/lisp/progmodes/ruby-ts-mode.el
@@ -1120,6 +1120,10 @@ ruby-ts--sexp-p
        (equal (treesit-node-type (treesit-node-child node 0))
               "(")))

+(defun ruby-ts--sexp-list-p (node)
+  (when (treesit-node-check node 'named)
+    (ruby-ts--sexp-p node)))
+
  (defvar-keymap ruby-ts-mode-map
    :doc "Keymap used in Ruby mode"
    :parent prog-mode-map
@@ -1235,7 +1239,7 @@ ruby-ts-mode
                             "array"
                             "hash")
                            eol)
-                         #'ruby-ts--sexp-p))
+                         #'ruby-ts--sexp-list-p))
                   (text ,(lambda (node)
                            (or (member (treesit-node-type node)
                                        '("comment" "string_content" 
"heredoc_content"))






^ permalink raw reply related	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-24 21:14                                   ` Yuan Fu
@ 2024-12-25  7:44                                     ` Juri Linkov
  2024-12-25  8:34                                       ` Yuan Fu
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-25  7:44 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
	Stefan Monnier

>> For html-ts-mode a list node is "element", and the node to skip
>> is "tag".  So for example:
>> 
>> -!-text <html></html>
>> 
>> 'C-M-f' will use node "element" to move to the beginning of "element":
>> 
>> text -!-<html></html>
>> 
>> then to get inside also need to skip the <html> tag:
>> 
>> text <html>-!-</html>
>
> Right, the thing navigation only supports going up/outside, but not going
> down/inside. We can add a new thing for beginning and end of balanced
> pairs. Then down-list will be going from the start of a balanced-pair-open
> to the end of it. Up-list will be going from the start of
> a balanced-pair-close to the end of it.

Probably we could use just such heuristics that 'down-list' should skip
the first node of the balanced pair.  This should work for most ts-modes.

For example, for 'jsx_element' the first child to skip is 'jsx_opening_element'.
For 'argument_list' the first child to skip is an anonymous node "(".
For 'statement_block' the first child to skip is an anonymous node "{".





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-24 21:03                                                         ` Yuan Fu
@ 2024-12-25  7:49                                                           ` Juri Linkov
  2024-12-25  9:11                                                             ` Yuan Fu
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-25  7:49 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Dmitry Gutov, 74963

>> This mismatched "string" in TypeScript is even more
>> unexpected than "unless" in Ruby.
>> 
>> So probably we need a way to mark all used nodes as named
>> to avoid such unexpected matches.  Maybe matching anonymous nodes
>> should be opt-in, and by default match only named nodes.
>
> IMHO this is just an unfortunate bug that needs to be fixed. I agree that
> this type of bug are hard to avoid, which is a bad thing, but that doesn’t
> mean we should try to alleviate it at any cost. Making predicates named by
> default just adds complexity and inflexibility for not much benefit.

Not sure if a possible flexibility is better than unintended matches.

When the authors of a ts-mode carefully selected a list of named nodes to match,
why treesit should try to match some random and unintended anonymous nodes?





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-25  3:25                                                         ` Dmitry Gutov
@ 2024-12-25  7:52                                                           ` Juri Linkov
  2024-12-26  1:00                                                             ` Dmitry Gutov
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-25  7:52 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Yuan Fu, 74963

>> While addition of '(and named "unless")' would be appreciated,
>> I see that currently it's possible to do this by proving a predicate
>> like there is 'ruby-ts--sexp-p' in
>> 
>>    (setq-local treesit-thing-settings
>>                `((ruby
>>                   (sexp ,(cons (rx
>>                                 bol
>>                                 (or
>>                                  "class"
>>                                  ...
>>                                  )
>>                                 eol)
>>                                #'ruby-ts--sexp-p))
>> 
>> Then 'ruby-ts--sexp-p' could check for the named node "unless" as well.
>> 
>> But it seems such solution is less efficient than adding '(and named "unless")'.
>
> Given that we're already calling a predicate every time (in 
> ruby-ts-mode), we might as well add one more check. See the patch at the 
> end.

Thanks, I tried the patch.  It was broken, so needed to edit manually.
Also the new key 'w' doesn't work in diff buffers, need to fix it as well.

> Speaking of tricky examples though, here's a definition:
>
>    module Bar
>      class Foo
>        def baz
>        end
>      end
>    end
>
> If you move point inside the keyword "module" or "class", C-M-f wouldn't 
> move forward either as of the latest master. No such problem with "def".
>
> Adding the check for "named" fixes the first two cases, but then C-M-f 
> inside "def" jumps to after "baaz". Could be worked around with a 
> special case, but I wonder what this difference comes from (haven't 
> properly debugged yet).

I see no problems with your patch.  Everything works nicely.





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-25  7:44                                     ` Juri Linkov
@ 2024-12-25  8:34                                       ` Yuan Fu
  2024-12-25 17:36                                         ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-12-25  8:34 UTC (permalink / raw)
  To: Juri Linkov
  Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
	Stefan Monnier



> On Dec 24, 2024, at 11:44 PM, Juri Linkov <juri@linkov.net> wrote:
> 
>>> For html-ts-mode a list node is "element", and the node to skip
>>> is "tag".  So for example:
>>> 
>>> -!-text <html></html>
>>> 
>>> 'C-M-f' will use node "element" to move to the beginning of "element":
>>> 
>>> text -!-<html></html>
>>> 
>>> then to get inside also need to skip the <html> tag:
>>> 
>>> text <html>-!-</html>
>> 
>> Right, the thing navigation only supports going up/outside, but not going
>> down/inside. We can add a new thing for beginning and end of balanced
>> pairs. Then down-list will be going from the start of a balanced-pair-open
>> to the end of it. Up-list will be going from the start of
>> a balanced-pair-close to the end of it.
> 
> Probably we could use just such heuristics that 'down-list' should skip
> the first node of the balanced pair.  This should work for most ts-modes.
> 
> For example, for 'jsx_element' the first child to skip is 'jsx_opening_element'.
> For 'argument_list' the first child to skip is an anonymous node "(".
> For 'statement_block' the first child to skip is an anonymous node "{“.

Yes, that should work for the vast majority of grammars. I can’t think of an counter example other than for_statment in tree-sitter-c.

Yuan







^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-25  7:49                                                           ` Juri Linkov
@ 2024-12-25  9:11                                                             ` Yuan Fu
  2024-12-25 17:39                                                               ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Yuan Fu @ 2024-12-25  9:11 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Dmitry Gutov, 74963



> On Dec 24, 2024, at 11:49 PM, Juri Linkov <juri@linkov.net> wrote:
> 
>>> This mismatched "string" in TypeScript is even more
>>> unexpected than "unless" in Ruby.
>>> 
>>> So probably we need a way to mark all used nodes as named
>>> to avoid such unexpected matches.  Maybe matching anonymous nodes
>>> should be opt-in, and by default match only named nodes.
>> 
>> IMHO this is just an unfortunate bug that needs to be fixed. I agree that
>> this type of bug are hard to avoid, which is a bad thing, but that doesn’t
>> mean we should try to alleviate it at any cost. Making predicates named by
>> default just adds complexity and inflexibility for not much benefit.
> 
> Not sure if a possible flexibility is better than unintended matches.
> 
> When the authors of a ts-mode carefully selected a list of named nodes to match,
> why treesit should try to match some random and unintended anonymous nodes?

I don’t know and can’t prove how much the flexibility is worth, but the cost on complexity is real. If everywhere else uses thing predicates as-is, but sexp navigation auto-converts thing predicates into named predicate, that’s a cognitive burden and a special case that’s guaranteed to trip people over.

OTOH, what’s the downside of wrapping the sexp predicate with (and named …), if you only want named nodes to match?

I just think the cost outweighs the benefit, if there is any to begin with.

Yuan




^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-24 19:05                                 ` Juri Linkov
  2024-12-24 21:14                                   ` Yuan Fu
@ 2024-12-25 17:19                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-25 18:01                                     ` Juri Linkov
  1 sibling, 1 reply; 58+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-25 17:19 UTC (permalink / raw)
  To: Juri Linkov
  Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404

> Also I'm looking into allowing more list-navigation commands
> to be usable in ts-modes.  E.g. instead of limiting list-navigation
> only to the current 'forward-sexp', another useful command is
> 'down-list'.

Gentle reminder that `forward-sexp` is not a "list-navigation" function.
That would be `forward-list`.  We very often use sexp commands and
functions to manipulate non-lists such as identifiers.


        Stefan






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-25  8:34                                       ` Yuan Fu
@ 2024-12-25 17:36                                         ` Juri Linkov
  2024-12-27  7:59                                           ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-25 17:36 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Mickey Petersen, Eli Zaretskii, Theodor Thornhill, 73404,
	Stefan Monnier

>> Probably we could use just such heuristics that 'down-list' should skip
>> the first node of the balanced pair.  This should work for most ts-modes.
>> 
>> For example, for 'jsx_element' the first child to skip is 'jsx_opening_element'.
>> For 'argument_list' the first child to skip is an anonymous node "(".
>> For 'statement_block' the first child to skip is an anonymous node "{“.
>
> Yes, that should work for the vast majority of grammars. I can’t think
> of an counter example other than for_statment in tree-sitter-c.

Indeed, the 'for_statement' is a hard problem, and I see
no good solution.  I already encountered in different languages
the same problem that you described:

> We might also want a way to jump from pair-open to pair-end. Going to
> pair-open’s parent’s end will be almost always correct. (Except for the
> grammars that do weird stuff, like tree-sitter-c’s for statement, the
> condition is not a node by itself, but split into opening parentheses,
> initializer, loop condition, increment, closing paren, all of which direct
> child of the for_statement, not grouped into a condition node.)

Maybe indeed worth to try defining pair-open and pair-end as
some children nodes of the 'for_statement', i.e. only the opening paren
and the closing paren.  Something like (completely untested):

(and (member (treesit-node-text node) '("(" ")"))
     (equal (treesit-node-type (treesit-node-parent node)) "for_statement"))





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-25  9:11                                                             ` Yuan Fu
@ 2024-12-25 17:39                                                               ` Juri Linkov
  0 siblings, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-25 17:39 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Dmitry Gutov, 74963

>> Not sure if a possible flexibility is better than unintended matches.
>> 
>> When the authors of a ts-mode carefully selected a list of named nodes to match,
>> why treesit should try to match some random and unintended anonymous nodes?
>
> I don’t know and can’t prove how much the flexibility is worth, but the
> cost on complexity is real. If everywhere else uses thing predicates as-is,
> but sexp navigation auto-converts thing predicates into named predicate,
> that’s a cognitive burden and a special case that’s guaranteed to trip
> people over.
>
> OTOH, what’s the downside of wrapping the sexp predicate with (and named …),
> if you only want named nodes to match?
>
> I just think the cost outweighs the benefit, if there is any to begin with.

Actually, what I had in mind is not to enable named-only mode by default,
but only to allow the authors of ts-modes to specify this condition.
For example, if it will be possible to write

  (setq-local treesit-thing-settings
              `((typescript
                 (sexp (and named ,(regexp-opt typescript-ts-mode--sexp-nodes 'symbols))))))

this should be fine.  This is similar to how the authors of ts-modes
decide whether to restrict matches to exact names by using
"^...$" with regexp-opt.

BTW, I'm thinking about adding such simple helper:

  (defun treesit-regexp-opt (strings)
    (concat "^" (regexp-opt strings) "$"))

to use like this:

  (setq-local treesit-thing-settings
                `((typescript
                   (sexp (and named ,(treesit-regexp-opt typescript-ts-mode--sexp-nodes))))))





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-25 17:19                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-25 18:01                                     ` Juri Linkov
  2024-12-25 19:29                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 58+ messages in thread
From: Juri Linkov @ 2024-12-25 18:01 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404

>> Also I'm looking into allowing more list-navigation commands
>> to be usable in ts-modes.  E.g. instead of limiting list-navigation
>> only to the current 'forward-sexp', another useful command is
>> 'down-list'.
>
> Gentle reminder that `forward-sexp` is not a "list-navigation" function.
> That would be `forward-list`.  We very often use sexp commands and
> functions to manipulate non-lists such as identifiers.

Do you think it would be better to override low-level functions 'scan-lists'
and 'scan-sexps' with new variables like 'scan-lists-function'
and 'scan-sexps-function', instead of adding more variables for
overriding top-level commands such as a new variable 'forward-list-function'
and 'down-list-function', like the existing 'forward-sexp-function'?





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-25 18:01                                     ` Juri Linkov
@ 2024-12-25 19:29                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-12-27  7:54                                         ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-12-25 19:29 UTC (permalink / raw)
  To: Juri Linkov
  Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404

>> Gentle reminder that `forward-sexp` is not a "list-navigation" function.
>> That would be `forward-list`.  We very often use sexp commands and
>> functions to manipulate non-lists such as identifiers.
> Do you think it would be better to override low-level functions 'scan-lists'
> and 'scan-sexps' with new variables like 'scan-lists-function'
> and 'scan-sexps-function', instead of adding more variables for
> overriding top-level commands such as a new variable 'forward-list-function'
> and 'down-list-function', like the existing 'forward-sexp-function'?

Don't know.

What I do know is that in general we'd also want an `up-sexp` operation.
Currently we have an ugly kludge in `up-list` to try and use
`forward-sexp-function` (which is ugly both because
`forward-sexp-function` doesn't really provide the functionality we
need, and because it mixes up sexp and list navigation), and it would be
good to clean it up.

AFAIK `up-list` is the only place where I've needed sexp-based
navigation and where `forward-sexp-function` didn't do the job.
In theory I guess `down-list` is another, but I've never found a use
for it.


        Stefan






^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-25  7:52                                                           ` Juri Linkov
@ 2024-12-26  1:00                                                             ` Dmitry Gutov
  2024-12-27  7:42                                                               ` Juri Linkov
  0 siblings, 1 reply; 58+ messages in thread
From: Dmitry Gutov @ 2024-12-26  1:00 UTC (permalink / raw)
  To: Juri Linkov; +Cc: Yuan Fu, 74963

On 25/12/2024 09:52, Juri Linkov wrote:

>> Given that we're already calling a predicate every time (in
>> ruby-ts-mode), we might as well add one more check. See the patch at the
>> end.
> 
> Thanks, I tried the patch.  It was broken, so needed to edit manually.

Maybe something regarding whitespace at the end?

> Also the new key 'w' doesn't work in diff buffers, need to fix it as well.

The binding for 'diff-kill-ring-save'? Seems to work here, as long as 
the diff buffer is in read-only mode.

>> Adding the check for "named" fixes the first two cases, but then C-M-f
>> inside "def" jumps to after "baaz". Could be worked around with a
>> special case, but I wonder what this difference comes from (haven't
>> properly debugged yet).
> 
> I see no problems with your patch.  Everything works nicely.

Hmm, I can't reproduce it either anymore.

Thanks for testing, pushed to master now (unfortunately the commit 
message refers to bug#73404).





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode
  2024-12-26  1:00                                                             ` Dmitry Gutov
@ 2024-12-27  7:42                                                               ` Juri Linkov
  0 siblings, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-27  7:42 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Yuan Fu, 74963

>>> Given that we're already calling a predicate every time (in
>>> ruby-ts-mode), we might as well add one more check. See the patch at the
>>> end.
>> Thanks, I tried the patch.  It was broken, so needed to edit manually.
>
> Maybe something regarding whitespace at the end?

Something with whitespace, but not a big problem.

>> Also the new key 'w' doesn't work in diff buffers, need to fix it as well.
>
> The binding for 'diff-kill-ring-save'? Seems to work here, as long as the
> diff buffer is in read-only mode.

Yes, 'W' with 'diff-kill-ring-save'.  Single keys are still a problem
in visited diff files.

>>> Adding the check for "named" fixes the first two cases, but then C-M-f
>>> inside "def" jumps to after "baaz". Could be worked around with a
>>> special case, but I wonder what this difference comes from (haven't
>>> properly debugged yet).
>> I see no problems with your patch.  Everything works nicely.
>
> Hmm, I can't reproduce it either anymore.
>
> Thanks for testing, pushed to master now (unfortunately the commit message
> refers to bug#73404).

Thanks.  Maybe a helper for other ts-modes will be handy:

  (defun treesit-node-named (node)
    (treesit-node-check node 'named))

to be used like this

  (sexp ,(cons
          (treesit-match-nodes strings)
          'treesit-node-named))





^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-25 19:29                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-12-27  7:54                                         ` Juri Linkov
  0 siblings, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-27  7:54 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Mickey Petersen, Yuan Fu, Theodor Thornhill, Eli Zaretskii, 73404

[-- Attachment #1: Type: text/plain, Size: 2710 bytes --]

>>> Gentle reminder that `forward-sexp` is not a "list-navigation" function.
>>> That would be `forward-list`.  We very often use sexp commands and
>>> functions to manipulate non-lists such as identifiers.
>> Do you think it would be better to override low-level functions 'scan-lists'
>> and 'scan-sexps' with new variables like 'scan-lists-function'
>> and 'scan-sexps-function', instead of adding more variables for
>> overriding top-level commands such as a new variable 'forward-list-function'
>> and 'down-list-function', like the existing 'forward-sexp-function'?
>
> Don't know.

I tried to override `scan-lists` and `scan-sexps` with advices
(a proof of concept attached below), and it works nicely.
For example, `C-M-d` moves down to the HTML element in html-ts-mode.

But then I realized there are not too many places where such
overriding might be useful.  In fact, there is only 1 place
in `show-paren--default` that uses `scan-sexps`.  But even this
occurrence can't be used, so I created `treesit-show-paren-data`
for `show-paren-data-function` in bug#75122.

And overriding `scan-lists` is useful only in `forward-list`,
`down-list` and `up-list`.  That's all.  So clearly instead
of overriding `scan-lists` and `scan-sexps`, better would be
to add 3 new variables: `forward-list-function`,
`down-list-function` and `up-list-function`.

This gives the users more flexibility to choose for example navigation
for C-M-f with a limited number of treesit lists + syntax symbols, and
for C-M-n everything that is a list in treesit.  This means that with

  start_atimer (-!-enum atimer_type type, struct timespec timestamp)

C-M-f could skip only the next symbol and move to

  start_atimer (enum-!- atimer_type type, struct timespec timestamp)

while C-M-n could skip the whole next `parameter_declaration`

  start_atimer (enum atimer_type type-!-, struct timespec timestamp)

or vice versa depending on the values of all new options `...-function`
in ts-modes.

> What I do know is that in general we'd also want an `up-sexp` operation.
> Currently we have an ugly kludge in `up-list` to try and use
> `forward-sexp-function` (which is ugly both because
> `forward-sexp-function` doesn't really provide the functionality we
> need, and because it mixes up sexp and list navigation), and it would be
> good to clean it up.

Agreed, this distinction is required for treesit.  Hopefully,
this can be achieved by separating `forward-sexp-function`
and `up-list-function` that in ts-modes could be set to new
functions either `treesit-up-list` or `treesit-up-sexp`.

PS: will try to refactor everything in this attachment to new
`forward-list-function`, `down-list-function` and `up-list-function`:


[-- Attachment #2: scan-lists-advice.el --]
[-- Type: application/emacs-lisp, Size: 1594 bytes --]

^ permalink raw reply	[flat|nested] 58+ messages in thread

* bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes
  2024-12-25 17:36                                         ` Juri Linkov
@ 2024-12-27  7:59                                           ` Juri Linkov
  0 siblings, 0 replies; 58+ messages in thread
From: Juri Linkov @ 2024-12-27  7:59 UTC (permalink / raw)
  To: Yuan Fu
  Cc: Theodor Thornhill, Eli Zaretskii, Mickey Petersen, 73404,
	Stefan Monnier

>> I can’t think of an counter example other than for_statment in
>> tree-sitter-c.
>
> Indeed, the 'for_statement' is a hard problem, and I see
> no good solution.

A possible solution would be to find a way to create "virtual" nodes.

This means transforming the existing syntax tree by inserting
addition nodes to it.  For example, transforming

   (for_statement "for" "("
    condition: (assignment_expression left: (identifier) operator: "=" right: (number_literal))
    ;
    body: (binary_expression left: (identifier) operator: "<" right: (number_literal))
    ;
    (update_expression operator: "++" argument: (identifier))
    ")"

into

   (for_statement "for"
    (for_parameters "("
     condition: (assignment_expression left: (identifier) operator: "=" right: (number_literal))
     ;
     body: (binary_expression left: (identifier) operator: "<" right: (number_literal))
     ;
     (update_expression operator: "++" argument: (identifier))
     ")"

by inserting the virtual node "for_parameters".  Not sure
if such transformation is supported by tree-sitter.

Generally, we have two problems with syntax trees created
by the authors of tree-sitter grammars:

1. Insufficient structure
2. Excessive structure

For insufficient structure a possible solution would be to insert
virtual nodes like above.  And for excessive structure we need
to flatten some nodes.  For example, in c-ts-mode:

  static -!-struct atimer *free_atimers;

'C-M-f' unexpectedly jumped not to the next symbol, but to

  static struct atimer-!- *free_atimers;

because of too much structure built in the syntax tree:

  (declaration
   type: (storage_class_specifier "static")
   declarator: (struct_specifier "struct" name: (type_identifier))
   (pointer_declarator "*" declarator: (identifier))
   ";")

So the solution could be to flatten 'struct_specifier' to move
'type_identifier' to be a sibling in the same list inside 'declaration':

  (declaration
   type: (storage_class_specifier "static")
   declarator: (struct_specifier "struct")
   (type_identifier)
   (pointer_declarator "*" declarator: (identifier))
   ";")

Also not sure if such thing is possible in tree-sitter.





^ permalink raw reply	[flat|nested] 58+ messages in thread

end of thread, other threads:[~2024-12-27  7:59 UTC | newest]

Thread overview: 58+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2024-09-21  5:06 bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Mickey Petersen
2024-09-26  7:42 ` Yuan Fu
2024-09-26  9:56   ` Mickey Petersen
2024-09-26 10:53     ` Eli Zaretskii
2024-09-26 12:13       ` Mickey Petersen
2024-09-26 13:46         ` Eli Zaretskii
2024-09-26 15:21           ` Mickey Petersen
2024-09-26 15:45             ` Eli Zaretskii
2024-09-27  5:43               ` Yuan Fu
2024-09-29 16:56                 ` Juri Linkov
2024-10-01  3:57                   ` Yuan Fu
2024-10-01 17:49                     ` Juri Linkov
2024-10-02  6:14                       ` Yuan Fu
2024-12-05 18:52                       ` Juri Linkov
2024-12-05 19:53                         ` Juri Linkov
2024-12-10 17:20                           ` Juri Linkov
2024-12-11  6:31                             ` Yuan Fu
2024-12-11 15:12                               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 15:29                                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-11 16:50                                 ` Mickey Petersen
2024-12-11 18:27                                 ` Yuan Fu
2024-12-12  7:17                                   ` Juri Linkov
2024-12-12  7:40                                     ` Eli Zaretskii
2024-12-12  7:58                                       ` Juri Linkov
2024-12-12  8:14                                         ` Juri Linkov
2024-12-12 16:31                                           ` Juri Linkov
2024-12-12 17:49                                             ` Juri Linkov
2024-12-12 19:13                                               ` Eli Zaretskii
2024-12-13  7:06                                                 ` Juri Linkov
2024-12-14 11:02                                                   ` Eli Zaretskii
2024-12-14 18:14                                                     ` Juri Linkov
2024-12-18  7:37                                               ` Juri Linkov
2024-12-19  4:04                                                 ` Yuan Fu
2024-12-19  7:14                                                   ` Juri Linkov
2024-12-19  7:18                                                   ` bug#74963: Ambiguous treesit named and anonymous nodes in ruby-ts-mode Juri Linkov
2024-12-24  3:02                                                     ` Yuan Fu
2024-12-24  7:17                                                       ` Juri Linkov
2024-12-24  7:41                                                         ` Juri Linkov
2024-12-25  3:25                                                         ` Dmitry Gutov
2024-12-25  7:52                                                           ` Juri Linkov
2024-12-26  1:00                                                             ` Dmitry Gutov
2024-12-27  7:42                                                               ` Juri Linkov
2024-12-24 17:52                                                       ` Juri Linkov
2024-12-24 21:03                                                         ` Yuan Fu
2024-12-25  7:49                                                           ` Juri Linkov
2024-12-25  9:11                                                             ` Yuan Fu
2024-12-25 17:39                                                               ` Juri Linkov
2024-12-19  7:34                               ` bug#73404: 30.0.50; [forward/kill/etc]-sexp commands do not behave as expected in tree-sitter modes Juri Linkov
2024-12-24 19:05                                 ` Juri Linkov
2024-12-24 21:14                                   ` Yuan Fu
2024-12-25  7:44                                     ` Juri Linkov
2024-12-25  8:34                                       ` Yuan Fu
2024-12-25 17:36                                         ` Juri Linkov
2024-12-27  7:59                                           ` Juri Linkov
2024-12-25 17:19                                   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-25 18:01                                     ` Juri Linkov
2024-12-25 19:29                                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-12-27  7:54                                         ` Juri Linkov

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.