unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
@ 2023-09-09 19:26 Danny Freeman
  2023-09-10  1:15 ` Yuan Fu
  0 siblings, 1 reply; 8+ messages in thread
From: Danny Freeman @ 2023-09-09 19:26 UTC (permalink / raw)
  To: emacs-devel; +Cc: Yuan Fu

Hello,

I believe I have found a regression in the treesit package introduced by
commit ec4d29c4494f32acf0ff7c5632a1d951d957f084

I noticed that indentation does not work properly on Emacs master branch
in clojure-ts-mode. After doing a git bisect on Emacs source code I
landed on the above commit.



To reproduce, I build emacs with ec4d29c44 [0][1] and load up
clojure-ts-mode from the current main branch (commit 58f3c835 [2]).

I open up a new file, say test.clj, and run `M-x clojure-ts-mode`.

In that file I can paste the following contents

(defn foo [] 1)

The buffer should have the following tree structure, which can be seen
with M-x treesit-explore-mode <RET> clojure <RET>

(source
 (list_lit close: (
  (sym_lit name: (sym_name))
  (sym_lit (sym_name))
  (vec_lit [ ])
  (num_lit) )))

If I place my cursor directly before the 1 character like so, where | is
my cursor

(defn foo [] |1)

and then press <RET>, I would expect the following indentation

(defn foo []
  |1)

Instead I get

(defn foo []
|1)

and if `treesit--indent-verbose` is set to `t` then I see these messages
in *Messages*

  Matched rule: ((parent-is "source") parent-bol 0)
  PARENT is nil, not indenting

Before this change I would see

  Matched rule: ((parent-is "source") parent-bol 0)
  Matched rule: (clojure-ts--match-expression-in-body parent 2)

While my cursor is before the 1 character, running

M-x eval-expression <RET> (treesit-node-parent (treesit-node-at (point)))

will return nil. Before this change, it would return the parent list_lit
node as I would expect.


Please let me know if you have any question, or if there is something I
can do to make this easier to test. I see the commit the git bisect
pointed me to contains changes to the C code for treesit. I do not know
C well enough to know why this breaks, unfortunately. I can only say
something is not quite right.

[0] - the configure flags I use to build emacs
https://git.sr.ht/~dannyfreeman/emacs/tree/ba4a5f3678bf83a28ce382fc33611e0d7aed2a86/item/flake.nix#L63-81
[1] - The init file I use
https://git.sr.ht/~dannyfreeman/emacs/tree/main/item/init/init.el
[2] - clojure-ts-mode version used to reproduce this error:
https://github.com/clojure-emacs/clojure-ts-mode/commit/58f3c835aeafe0378ab9693731b95096827dbb24

Thank you,
-- 
Danny Freeman



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-09 19:26 Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084 Danny Freeman
@ 2023-09-10  1:15 ` Yuan Fu
  2023-09-10  4:00   ` Danny Freeman
  0 siblings, 1 reply; 8+ messages in thread
From: Yuan Fu @ 2023-09-10  1:15 UTC (permalink / raw)
  To: Danny Freeman; +Cc: emacs-devel



> On Sep 9, 2023, at 12:26 PM, Danny Freeman <danny@dfreeman.email> wrote:
> 
> Hello,
> 
> I believe I have found a regression in the treesit package introduced by
> commit ec4d29c4494f32acf0ff7c5632a1d951d957f084
> 
> I noticed that indentation does not work properly on Emacs master branch
> in clojure-ts-mode. After doing a git bisect on Emacs source code I
> landed on the above commit.
> 
> 
> 
> To reproduce, I build emacs with ec4d29c44 [0][1] and load up
> clojure-ts-mode from the current main branch (commit 58f3c835 [2]).
> 
> I open up a new file, say test.clj, and run `M-x clojure-ts-mode`.
> 
> In that file I can paste the following contents
> 
> (defn foo [] 1)
> 
> The buffer should have the following tree structure, which can be seen
> with M-x treesit-explore-mode <RET> clojure <RET>
> 
> (source
> (list_lit close: (
>  (sym_lit name: (sym_name))
>  (sym_lit (sym_name))
>  (vec_lit [ ])
>  (num_lit) )))
> 
> If I place my cursor directly before the 1 character like so, where | is
> my cursor
> 
> (defn foo [] |1)
> 
> and then press <RET>, I would expect the following indentation
> 
> (defn foo []
>  |1)
> 
> Instead I get
> 
> (defn foo []
> |1)
> 
> and if `treesit--indent-verbose` is set to `t` then I see these messages
> in *Messages*
> 
>  Matched rule: ((parent-is "source") parent-bol 0)
>  PARENT is nil, not indenting
> 
> Before this change I would see
> 
>  Matched rule: ((parent-is "source") parent-bol 0)
>  Matched rule: (clojure-ts--match-expression-in-body parent 2)
> 
> While my cursor is before the 1 character, running
> 
> M-x eval-expression <RET> (treesit-node-parent (treesit-node-at (point)))
> 
> will return nil. Before this change, it would return the parent list_lit
> node as I would expect.
> 
> 
> Please let me know if you have any question, or if there is something I
> can do to make this easier to test. I see the commit the git bisect
> pointed me to contains changes to the C code for treesit. I do not know
> C well enough to know why this breaks, unfortunately. I can only say
> something is not quite right.
> 
> [0] - the configure flags I use to build emacs
> https://git.sr.ht/~dannyfreeman/emacs/tree/ba4a5f3678bf83a28ce382fc33611e0d7aed2a86/item/flake.nix#L63-81
> [1] - The init file I use
> https://git.sr.ht/~dannyfreeman/emacs/tree/main/item/init/init.el
> [2] - clojure-ts-mode version used to reproduce this error:
> https://github.com/clojure-emacs/clojure-ts-mode/commit/58f3c835aeafe0378ab9693731b95096827dbb24
> 
> Thank you,
> -- 
> Danny Freeman

Hmmm, I can’t reproduce this. I’m getting

Matched rule: ((parent-is "source") parent-bol 0)
Matched rule: (clojure-ts--match-expression-in-body parent 2)

The commit you pointed to does change treesit-node-parent, so it does match with parent-is matcher not working.

Might be because I recently upgraded to a newer tree-sitter library. If you compile with the newest tree-sitter build (from their repo), do you still see this problem?

Yuan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-10  1:15 ` Yuan Fu
@ 2023-09-10  4:00   ` Danny Freeman
  2023-09-10  5:26     ` Yuan Fu
  0 siblings, 1 reply; 8+ messages in thread
From: Danny Freeman @ 2023-09-10  4:00 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel


Yuan Fu <casouri@gmail.com> writes:
>
> Hmmm, I can’t reproduce this. I’m getting
>
> Matched rule: ((parent-is "source") parent-bol 0)
> Matched rule: (clojure-ts--match-expression-in-body parent 2)
>
> The commit you pointed to does change treesit-node-parent, so it does match with parent-is matcher not working.
>
> Might be because I recently upgraded to a newer tree-sitter library. If you compile with the newest tree-sitter build (from their repo), do you still see this problem?

Before I was just using whatever tree-sitter version my distro packages.
Apparently v0.20.8 (built from the git tag of the same name)

I tried compiling with this version
https://github.com/tree-sitter/tree-sitter/commit/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce

which at the time of writing is the latest, and still experience the
same problem. I also made sure to re-compile my grammars.

I may have done something wrong though when building. I'm
going to try to build using a more traditional process rather than with
nix and see if that helps.

Thank you,
-- 
Danny Freeman



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-10  4:00   ` Danny Freeman
@ 2023-09-10  5:26     ` Yuan Fu
  2023-09-10  7:12       ` Yuan Fu
  0 siblings, 1 reply; 8+ messages in thread
From: Yuan Fu @ 2023-09-10  5:26 UTC (permalink / raw)
  To: Danny Freeman; +Cc: emacs-devel



> On Sep 9, 2023, at 9:00 PM, Danny Freeman <danny@dfreeman.email> wrote:
> 
> 
> Yuan Fu <casouri@gmail.com> writes:
>> 
>> Hmmm, I can’t reproduce this. I’m getting
>> 
>> Matched rule: ((parent-is "source") parent-bol 0)
>> Matched rule: (clojure-ts--match-expression-in-body parent 2)
>> 
>> The commit you pointed to does change treesit-node-parent, so it does match with parent-is matcher not working.
>> 
>> Might be because I recently upgraded to a newer tree-sitter library. If you compile with the newest tree-sitter build (from their repo), do you still see this problem?
> 
> Before I was just using whatever tree-sitter version my distro packages.
> Apparently v0.20.8 (built from the git tag of the same name)
> 
> I tried compiling with this version
> https://github.com/tree-sitter/tree-sitter/commit/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce
> 
> which at the time of writing is the latest, and still experience the
> same problem. I also made sure to re-compile my grammars.
> 
> I may have done something wrong though when building. I'm
> going to try to build using a more traditional process rather than with
> nix and see if that helps.
> 
> Thank you,
> -- 
> Danny Freeman

I take that back. It is indeed caused by that commit. I just had a wrong Emacs build when testing. In that commit, ts_tree_cursor_goto_first_child_for_byte seem to be the problem.

Yuan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-10  5:26     ` Yuan Fu
@ 2023-09-10  7:12       ` Yuan Fu
  2023-09-11 12:41         ` Danny Freeman
  0 siblings, 1 reply; 8+ messages in thread
From: Yuan Fu @ 2023-09-10  7:12 UTC (permalink / raw)
  To: Danny Freeman; +Cc: emacs-devel



> On Sep 9, 2023, at 10:26 PM, Yuan Fu <casouri@gmail.com> wrote:
> 
> 
> 
>> On Sep 9, 2023, at 9:00 PM, Danny Freeman <danny@dfreeman.email> wrote:
>> 
>> 
>> Yuan Fu <casouri@gmail.com> writes:
>>> 
>>> Hmmm, I can’t reproduce this. I’m getting
>>> 
>>> Matched rule: ((parent-is "source") parent-bol 0)
>>> Matched rule: (clojure-ts--match-expression-in-body parent 2)
>>> 
>>> The commit you pointed to does change treesit-node-parent, so it does match with parent-is matcher not working.
>>> 
>>> Might be because I recently upgraded to a newer tree-sitter library. If you compile with the newest tree-sitter build (from their repo), do you still see this problem?
>> 
>> Before I was just using whatever tree-sitter version my distro packages.
>> Apparently v0.20.8 (built from the git tag of the same name)
>> 
>> I tried compiling with this version
>> https://github.com/tree-sitter/tree-sitter/commit/524bf7e2c664d4a5dbd0c20d4d10f1e58f99e8ce
>> 
>> which at the time of writing is the latest, and still experience the
>> same problem. I also made sure to re-compile my grammars.
>> 
>> I may have done something wrong though when building. I'm
>> going to try to build using a more traditional process rather than with
>> nix and see if that helps.
>> 
>> Thank you,
>> -- 
>> Danny Freeman
> 
> I take that back. It is indeed caused by that commit. I just had a wrong Emacs build when testing. In that commit, ts_tree_cursor_goto_first_child_for_byte seem to be the problem.
> 
> Yuan

What I found is that, for some reason, ts_tree_cursor_goto_first_child_for_byte doesn’t always work as expected in clojure [1]. In your example, if I evaluate (treesit-node-parent (treesit-node-at (point))) with point at the beginning of  “defn”, “foo”, “[]”, and “1”, only “defn” correctly evaluates to the parent node. I tried this in other modes like c-ts-mode, but things seems to work fine in all the few cases I tried.

So now the question is, is there anything tree-sitter-clojure does differently? Not saying it does, but that seems like a starting point.

[1] Instead of returning the first child node of NODE that extends beyond POS, ts_tree_cursor_goto_first_child_for_byte (NODE, POS) returns false (meaning not found).

Yuan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-10  7:12       ` Yuan Fu
@ 2023-09-11 12:41         ` Danny Freeman
  2023-09-11 22:19           ` Yuan Fu
  0 siblings, 1 reply; 8+ messages in thread
From: Danny Freeman @ 2023-09-11 12:41 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel


Yuan Fu <casouri@gmail.com> writes:

>> I take that back. It is indeed caused by that commit. I just had a wrong Emacs build when testing. In that commit, ts_tree_cursor_goto_first_child_for_byte seem to be the problem.

Well I'm glad to know I am capable of building Emacs with the latest
tree sitter. Not as glad that the problem still exists lol


> What I found is that, for some reason, ts_tree_cursor_goto_first_child_for_byte doesn’t always work
> as expected in clojure [1]. In your example, if I evaluate (treesit-node-parent (treesit-node-at
> (point))) with point at the beginning of “defn”, “foo”, “[]”, and “1”, only “defn” correctly
> evaluates to the parent node. I tried this in other modes like c-ts-mode, but things seems to work
> fine in all the few cases I tried.
>
> So now the question is, is there anything tree-sitter-clojure does differently? Not saying it does, but that seems like a starting point.

Maybe there is. The grammar has not changed in a couple months, and
those things that did change were related to keyword literals, which are
not involved in the example I've been using to reproduce this.

Even knowing that. I will try to narrow the problem down even further,
and see if I can find another grammar the exhibits this behavior.

> [1] Instead of returning the first child node of NODE that extends beyond POS, ts_tree_cursor_goto_first_child_for_byte (NODE, POS) returns false (meaning not found).

I remember we found an issue with a similarly named function here

https://github.com/tree-sitter/tree-sitter/issues/2012
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60127

You ended up writing a workaround in emacs for this called
treesit_cursor_first_child_for_byte

I don't think there is overlap between that change and this ec4d29 one.
However, maybe there is overlap in the tree-sitter implementation? I'm
not sure how to tell right now, but maybe this is an upstream
tree-sitter problem that is just brought into the light by the commit in
question here.

-- 
Danny Freeman



^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-11 12:41         ` Danny Freeman
@ 2023-09-11 22:19           ` Yuan Fu
  2023-09-12  3:49             ` Danny Freeman
  0 siblings, 1 reply; 8+ messages in thread
From: Yuan Fu @ 2023-09-11 22:19 UTC (permalink / raw)
  To: Danny Freeman; +Cc: emacs-devel



> On Sep 11, 2023, at 5:41 AM, Danny Freeman <danny@dfreeman.email> wrote:
> 
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> I take that back. It is indeed caused by that commit. I just had a wrong Emacs build when testing. In that commit, ts_tree_cursor_goto_first_child_for_byte seem to be the problem.
> 
> Well I'm glad to know I am capable of building Emacs with the latest
> tree sitter. Not as glad that the problem still exists lol
> 
> 
>> What I found is that, for some reason, ts_tree_cursor_goto_first_child_for_byte doesn’t always work
>> as expected in clojure [1]. In your example, if I evaluate (treesit-node-parent (treesit-node-at
>> (point))) with point at the beginning of “defn”, “foo”, “[]”, and “1”, only “defn” correctly
>> evaluates to the parent node. I tried this in other modes like c-ts-mode, but things seems to work
>> fine in all the few cases I tried.
>> 
>> So now the question is, is there anything tree-sitter-clojure does differently? Not saying it does, but that seems like a starting point.
> 
> Maybe there is. The grammar has not changed in a couple months, and
> those things that did change were related to keyword literals, which are
> not involved in the example I've been using to reproduce this.
> 
> Even knowing that. I will try to narrow the problem down even further,
> and see if I can find another grammar the exhibits this behavior.
> 
>> [1] Instead of returning the first child node of NODE that extends beyond POS, ts_tree_cursor_goto_first_child_for_byte (NODE, POS) returns false (meaning not found).
> 
> I remember we found an issue with a similarly named function here
> 
> https://github.com/tree-sitter/tree-sitter/issues/2012
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60127
> 
> You ended up writing a workaround in emacs for this called
> treesit_cursor_first_child_for_byte
> 
> I don't think there is overlap between that change and this ec4d29 one.
> However, maybe there is overlap in the tree-sitter implementation? I'm
> not sure how to tell right now, but maybe this is an upstream
> tree-sitter problem that is just brought into the light by the commit in
> question here.

Duh, I can’t believe I completely forgot about this. Yes, this is exactly it, thank you for binging it up. Before tree-sitter can fix treesit_cursor_first_child_for_byte, I’ll just replicate the workaround and we should be golden. I’ve pushed a fix to emacs-29. Once master pulls from emacs-29, the problem should go away.

BTW, I should’ve opened a bug report and move the discussion there. But since we are mostly done here, and there’s another bug report to refer to, there’s no need to do that now.

Yuan


^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084
  2023-09-11 22:19           ` Yuan Fu
@ 2023-09-12  3:49             ` Danny Freeman
  0 siblings, 0 replies; 8+ messages in thread
From: Danny Freeman @ 2023-09-12  3:49 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel


Yuan Fu <casouri@gmail.com> writes:

> Duh, I can’t believe I completely forgot about this. Yes, this is exactly it, thank you for binging
> it up. Before tree-sitter can fix treesit_cursor_first_child_for_byte, I’ll just replicate the
> workaround and we should be golden. I’ve pushed a fix to emacs-29. Once master pulls from emacs-29,
> the problem should go away.

It took me a minute to connect the dots there, but that did seem to be
the cause. I just tried it against Emacs 30 with the commit in question
cherry picked and it works perfectly. Thank you for fixing it.

> BTW, I should’ve opened a bug report and move the discussion there. But since we are mostly done here, and there’s another bug report to refer to, there’s no need to do that now.

Understood. I probably should have started with a bug report but for
some reason sent my initial email to emacs-devel. I'll be better about
that next time I run into something in the core.

Thanks again for your help, all the best.
-- 
Danny Freeman



^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-09-12  3:49 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-09-09 19:26 Treesit Regression In ec4d29c4494f32acf0ff7c5632a1d951d957f084 Danny Freeman
2023-09-10  1:15 ` Yuan Fu
2023-09-10  4:00   ` Danny Freeman
2023-09-10  5:26     ` Yuan Fu
2023-09-10  7:12       ` Yuan Fu
2023-09-11 12:41         ` Danny Freeman
2023-09-11 22:19           ` Yuan Fu
2023-09-12  3:49             ` Danny Freeman

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).