unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
@ 2023-02-05  0:39 Dmitry Gutov
  2023-02-06  0:08 ` Yuan Fu
  0 siblings, 1 reply; 8+ messages in thread
From: Dmitry Gutov @ 2023-02-05  0:39 UTC
  To: 61285

This probably involves a parser bug and/or maybe a tree-sitter one. But 
I'm posting this here anyway because this might not be the only way to 
trigger this problem. Or it could give us some optimization insights.

Also, while it involves a node which is parsed to have a large number of 
descendants, the performance depends heavily on whether the node is at 
the top level of the program (then it's slow), or not.

To repro:

1. Visit test/lisp/progmodes/ruby-mode-resources/ruby.rb
2. add 'a = %w' (without quotes) as a separate new line before all of 
the existing code.
3. Notice the delay in redisplay after you type 'w'.

If you do that with a larger file, BTW, this delay may be on the order 
of a minute. Here's an example of such a file: 
https://github.com/rails/rails/blob/main/activerecord/lib/active_record/associations.rb

The superficial reason for this delay is that %w opens a new "array of 
strings" literal which parses every separate word in the rest of the 
buffer as a separate string. So we get a node with thousands of 
children, in the case of associations.rb. Or just ~1000 in the case of 
ruby.rb.
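
To illustrate the mechanism, here is a minimal Ruby sketch of what a %w literal does to the text it encloses. The helper `runaway_word_count` and the sample text are hypothetical, not taken from ruby.rb. The helper only approximates the number of string children by counting whitespace-separated words; the exact count depends on how the tree-sitter Ruby grammar tokenizes the buffer.

```ruby
# A terminated %w literal: each whitespace-separated word becomes its own String.
words = %w[if foo bar end]
p words # => ["if", "foo", "bar", "end"]

# Hypothetical helper (not from the thread): estimate how many string
# children an unterminated %w literal would get if it swallowed `source`.
def runaway_word_count(source)
  source.split.size
end

sample = <<~RUBY
  if something_wrong?
    foo
  end
RUBY

puts runaway_word_count(sample) # => 4
```

Applied to a whole file, the same count grows with the size of the buffer, which is in line with the ~1000 and 13319 children mentioned for ruby.rb and associations.rb.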

I also tried setting treesit--font-lock-fast-mode to t: no effect.

But! If we do the same below the top level -- say, put the 'a = %w' line 
after the 'foo' line inside the first 'if' statement (i.e. on line 7) -- 
the delay is much smaller: not noticeable in ruby.rb, and still 
apparent but much more bearable in associations.rb (you can put that 
statement right after 'module ActiveRecord'). That is even though the size 
of the tree changes minimally, and the number of child nodes for that 
"array of strings" still runs into the thousands (e.g. 13319).
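
For anyone who wants to script the two placements instead of typing them, a small Ruby sketch follows. `insert_line` is a hypothetical helper and the sample text merely stands in for the start of a Ruby file; it is not the contents of ruby.rb.

```ruby
# Hypothetical helper: return `source` with `new_line` inserted so that it
# becomes line number `lineno` (1-based).
def insert_line(source, new_line, lineno)
  lines = source.lines
  lines.insert(lineno - 1, "#{new_line}\n")
  lines.join
end

sample = "if something_wrong?\n  foo\nend\n"

top_level = insert_line(sample, "a = %w", 1)   # the slow placement
nested    = insert_line(sample, "  a = %w", 3) # the fast placement, after `foo`

puts top_level
puts nested
```

Writing each variant to a file and visiting it in ruby-ts-mode should reproduce the "bad" and "good" cases respectively, assuming the behavior described above.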

Perf report for the "bad" highlighting delay looks like this:

   61.19%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   30.88%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
    7.44%  emacs libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
    0.06%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    0.05%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    0.03%  emacs libtree-sitter.so.0.0  [.] ts_node_end_byte

And like this in the "good" case (with many type-backspace repetitions):

   32.10%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
    9.50%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    7.89%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    7.51%  emacs libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
    6.45%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
    1.87%  emacs libtree-sitter.so.0.0  [.] ts_node_start_point
    1.85%  emacs emacs  [.] process_mark_stack
    0.93%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node







* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Yuan Fu @ 2023-02-06  0:08 UTC
  To: Dmitry Gutov; +Cc: 61285


Dmitry Gutov <dgutov@yandex.ru> writes:

> This probably involves a parser bug and/or maybe a tree-sitter one.
> But I'm posting this here anyway because this might not be the only
> way to trigger this problem. Or it could give us some optimization
> insights.
>
> Also, while it involves a node which is parsed to have a large number
> of descendants, the performance depends heavily on whether the node is
> at the top level of the program (then it's slow), or not.
>
> To repro:
>
> 1. Visit test/lisp/progmodes/ruby-mode-resources/ruby.rb
> 2. add 'a = %w' (without quotes) as a separate new line before all of
> the existing code.
> 3. Notice the delay in redisplay after you type 'w'.
>
> If you do that with a larger file, BTW, this delay may be on the order
> of a minute. Here's an example of such a file:
> https://github.com/rails/rails/blob/main/activerecord/lib/active_record/associations.rb
>
> The superficial reason for this delay is that %w opens a new "array of
> strings" literal which parses every separate word in the rest of the
> buffer as a separate string. So we get a node with thousands of
> children, in the case of associations.rb. Or just ~1000 in the case of
> ruby.rb.
>
> I also tried setting treesit--font-lock-fast-mode to t: no effect.
>
> But! If we do the same not on top-level -- say, put the 'a = %w' line
> after the 'foo' line inside the first 'if' statement (i.e. on line 7),
> the delay is much smaller -- not noticeable in ruby.rb, and still
> apparent but much more bearable in associations.rb (you can put that
> statement right after 'module ActiveRecord') -- even though the size
> of the tree is changed minimally, and the number of children nodes for
> that "array of strings" still counts in the thousands (e.g. 13319).
>
> Perf report for the "bad" highlighting delay looks like this:
>
>   61.19%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>   30.88%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
>    7.44%  emacs libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
>    0.06%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    0.05%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>    0.03%  emacs libtree-sitter.so.0.0  [.] ts_node_end_byte
>
> And like this in the "good" case (with many type-backspace repetitions):
>
>   32.10%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>    9.50%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>    7.89%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>    7.51%  emacs libtree-sitter.so.0.0  [.] ts_language_symbol_metadata
>    6.45%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_parent_node
>    1.87%  emacs libtree-sitter.so.0.0  [.] ts_node_start_point
>    1.85%  emacs emacs  [.] process_mark_stack
>    0.93%  emacs libtree-sitter.so.0.0  [.] ts_tree_cursor_current_node

Interesting. Perhaps it has to do with how tree-sitter implements the
"incremental" part of the parser? But the profile doesn’t look like it’s
spending time parsing. I need to look at what
ts_tree_cursor_current_status actually does (maybe it’s used in parsing?)

Yuan






* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Dmitry Gutov @ 2023-02-06  1:03 UTC
  To: Yuan Fu; +Cc: 61285

On 06/02/2023 02:08, Yuan Fu wrote:
> Interesting. Perhaps it has to do with how tree-sitter implements the
> "incremental" part of the parser? But the profile doesn’t look like it’s
> spending time parsing, 

According to my tests, what gets slower are the treesit-query-capture 
calls. And I mean all of them (for every element in 
treesit-font-lock-settings), not just the first one, which I imagine 
would be the case if tree-sitter needed to finish parsing the current 
buffer contents.

If I just wrap the treesit-query-capture calls inside 
treesit-font-lock-fontify-region in benchmark-progn, the queries with %w 
inside the 'if' block are an order of magnitude faster than with it 
at top level.

E.g. in the ruby.rb example, the former look like

...
Elapsed time: 0.001648s
Elapsed time: 0.001498s
Elapsed time: 0.001211s
Elapsed time: 0.000949s
Elapsed time: 0.000950s
...

and the latter are like

...
Elapsed time: 0.006567s
Elapsed time: 0.006583s
Elapsed time: 0.007072s
Elapsed time: 0.006867s
Elapsed time: 0.006575s
Elapsed time: 0.006608s
...

Multiply that by 19 (the number of rules), and we get the perceived delay.

And for associations.rb, the query times are 0.004322s vs 1.083029s.
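
The arithmetic can be spelled out with a short Ruby sketch. It uses only the timings quoted above; since the thread elides the remaining samples, the means are rough estimates, and `estimated_pass` is a hypothetical helper that assumes every rule costs about the same as the sampled queries.

```ruby
# Per-query timings quoted above for ruby.rb (seconds); the thread elides
# the remaining samples, so these means are only rough estimates.
NESTED    = [0.001648, 0.001498, 0.001211, 0.000949, 0.000950]
TOP_LEVEL = [0.006567, 0.006583, 0.007072, 0.006867, 0.006575, 0.006608]
RULES     = 19 # number of font-lock rules, per the message

# Hypothetical helper: mean query time multiplied by the number of rules.
def estimated_pass(samples, rules = RULES)
  samples.sum / samples.size * rules
end

printf("ruby.rb, %%w nested:    %.3f s\n", estimated_pass(NESTED))
printf("ruby.rb, %%w top level: %.3f s\n", estimated_pass(TOP_LEVEL))

# associations.rb: a single query time in each case, as reported above.
printf("associations.rb:       %.3f s vs %.1f s\n",
       0.004322 * RULES, 1.083029 * RULES)
```

Under these assumptions, one fontification pass over ruby.rb comes to roughly 0.02 s with %w nested versus 0.13 s at top level, and for associations.rb to roughly 0.08 s versus 20.6 s.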

 > I need to look at what
 > ts_tree_cursor_current_status actually does (maybe it’s used in parsing?)

   // Private - Get various facts about the current node that are needed
   // when executing tree queries.
   void ts_tree_cursor_current_status(

https://github.com/tree-sitter/tree-sitter/blob/84c1c6a271cd0ab742ce0f46cd3576a6f6bf5b8c/lib/src/tree_cursor.c#L284

I see it is called by ts_query_cursor__advance, which is in turn called 
by ts_query_cursor_next_match and by ts_query_cursor_next_capture (the 
latter we don't seem to be using).

Here's an existing report on its tracker which might be relevant: 
https://github.com/tree-sitter/tree-sitter/issues/1972

Similar perf report, though not exactly the same.






* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Dmitry Gutov @ 2023-02-06  3:20 UTC
  To: Yuan Fu; +Cc: 61285

On 06/02/2023 03:03, Dmitry Gutov wrote:
> And for associations.rb, the query times are 0.004322s vs 1.083029s.

Similarly, using tree-sitter-cli,

   time tree-sitter query associations-query.scm associations.rb

with a simple query also reports ~1.3s here.






* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Dmitry Gutov @ 2023-02-21  3:12 UTC
  To: Yuan Fu; +Cc: 61285

On 06/02/2023 05:20, Dmitry Gutov wrote:
> On 06/02/2023 03:03, Dmitry Gutov wrote:
>> And for associations.rb, the query times are 0.004322s vs 1.083029s.
> 
> Similarly, using tree-sitter-cli,
> 
>    time tree-sitter query associations-query.scm associations.rb
> 
> with a simple query also reports ~1.3s here.

Looks like this got fixed in 
https://github.com/tree-sitter/tree-sitter/commit/0b817a609f7cd3d7309a81dbfe96287c6945a085, 
just a week ago.

And further improved (to the point where the delay seems entirely gone) 
in https://github.com/tree-sitter/tree-sitter/pull/2085.

Too bad it'll be a while until it makes it into popular distros.






* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Yuan Fu @ 2023-02-21  8:14 UTC
  To: Dmitry Gutov; +Cc: 61285



> On Feb 20, 2023, at 7:12 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> On 06/02/2023 05:20, Dmitry Gutov wrote:
>> On 06/02/2023 03:03, Dmitry Gutov wrote:
>>> And for associations.rb, the query times are 0.004322s vs 1.083029s.
>> Similarly, using tree-sitter-cli,
>>   time tree-sitter query associations-query.scm associations.rb
>> with a simple query also reports ~1.3s here.
> 
> Looks like this got fixed in https://github.com/tree-sitter/tree-sitter/commit/0b817a609f7cd3d7309a81dbfe96287c6945a085, just a week ago.

Great news!

> 
> And further improved (to the point where the delay seems entirely gone) in https://github.com/tree-sitter/tree-sitter/pull/2085.

Looks fantastic, this might render one of the reasons for using the “fast mode” obsolete.
> 
> Too bad it'll be a while until it makes it into popular distros.

See you in Emacs 30 ;-)

Yuan





* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Dmitry Gutov @ 2023-02-21  9:53 UTC
  To: Yuan Fu; +Cc: 61285

On 21/02/2023 10:14, Yuan Fu wrote:
>> Too bad it'll be a while until it makes it into popular distros.
> See you in Emacs 30 😉

I suppose we could also start bundling the latest libtree-sitter and/or 
offer a helper to compile the latest (recommended) version?

It seems to be even easier to build than the grammars.






* bug#61285: (Sometimes very) slow font-lock after %w in ruby-ts-mode
From: Yuan Fu @ 2023-02-27  0:37 UTC
  To: Dmitry Gutov; +Cc: 61285



> On Feb 21, 2023, at 1:53 AM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> On 21/02/2023 10:14, Yuan Fu wrote:
>>> Too bad it'll be a while until it makes it into popular distros.
>> See you in Emacs 30 😉
> 
> I suppose we could also start bundling the latest libtree-sitter and/or offer a helper to compile the latest (recommended) version?
> 
> It seems to be even easier to build than the grammars.

There have been many discussions about bundling libtree-sitter, but the decision was not to. Personally I think this new improvement is great, but it probably doesn’t justify including libtree-sitter.

Yuan





