From: Yuan Fu <casouri@gmail.com>
To: Dmitry Gutov <dgutov@yandex.ru>
Cc: 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Tue, 24 Jan 2023 19:13:29 -0800 [thread overview]
Message-ID: <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@gmail.com> (raw)
In-Reply-To: <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru>
> On Jan 23, 2023, at 8:04 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
>
> Cc-ing Yuan, just in case.
>
> On 20/01/2023 05:53, Dmitry Gutov wrote:
>> In my benchmarking -- using this form in test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:
>> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>> the rule added to its font-lock in commit d66ac5285f7
>> :language language
>> :feature 'builtin-functions
>> `((((identifier) @font-lock-builtin-face)
>> (:match ,ruby-ts--builtin-methods
>> @font-lock-builtin-face)))
>> ...seems to have made it 50% slower.
>> The profile looked like this:
>> 9454 84% - font-lock-fontify-region
>> 9454 84% - font-lock-default-fontify-region
>> 8862 79% - font-lock-fontify-syntactically-region
>> 8702 78% - treesit-font-lock-fontify-region
>> 128 1% treesit-fontify-with-override
>> 123 1% facep
>> 84 0% treesit--children-covering-range-recurse
>> 60 0% + ruby-ts--comment-font-lock
>> 4 0% + font-lock-unfontify-region
>> 568 5% + font-lock-fontify-keywords-region
>> 16 0% + font-lock-unfontify-region
>> So there's nothing on the Lisp level to look at.
>
> I've done some perf recordings now. It seems most/all of the difference comes down to garbage collection. Or more concretely, time spent inside process_mark_stack.
>
> Without the added query benchmark reports:
>
> (10.13723333 49 1.141649534999999)
>
> And the perf top5 is:
>
> 17.26% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status
> 10.83% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling
> 10.18% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
> 8.37% emacs emacs [.] process_mark_stack
> 4.63% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> With this simple query that colors everything:
>
> :language language
> :feature 'builtin-function
> `((((identifier) @font-lock-builtin-face)))
>
> I get:
>
> (11.993968995 82 1.9326509270000045)
>
> Note the jump in runtime that's larger than the jump in GC.
>
> 17.26% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status
> 10.83% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling
> 10.18% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
> 8.37% emacs emacs [.] process_mark_stack
> 4.63% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> The current query looks like this:
>
> :language language
> :feature 'builtin-function
> `((((identifier) @font-lock-builtin-face)
> (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))
>
> Benchmarking:
>
> (12.493614359 107 2.558609025999999)
>
> 15.30% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status
> 14.92% emacs emacs [.] process_mark_stack
> 9.75% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling
> 8.90% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
> 3.87% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> Here we get the same jump in runtime as in GC. Even though this rule ends up coloring much fewer (almost none) nodes in the current buffer. I interpret the results like this:
>
> - The jump in runtime of the previous query was probably related to the number of nodes needed to be processed, but not with the resulting highlighting, even though every identifier in the buffer ends up being colored.
>
> - The GC overhead created by the predicates is non-negligible.
>
> And the original query that I tried:
>
> :language language
> :feature 'builtin-function
> `((((identifier) @font-lock-builtin-face)
> (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))
>
> Benchmarking:
>
> (16.433451865000002 249 5.908674810000001)
>
> 23.72% emacs emacs [.] process_mark_stack
> 12.33% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status
> 7.96% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling
> 7.38% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child
> 3.37% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
>
> Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?
>
> I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.
>
FYI the predicates are not processed by tree-sitter, but by us. For example, the #equal predicate is handled by treesit_predicate_equal. For #match, right now we create a string with Fbuffer_substring and pass it to fast_string_match, so it definitely causes a lot of gc’s, as you observed. We can probably match the regexp in-place, just limit the match to the range of the node.
Yuan
next prev parent reply other threads:[~2023-01-25 3:13 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 3:53 bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-24 4:04 ` Dmitry Gutov
2023-01-25 3:13 ` Yuan Fu [this message]
2023-01-25 3:48 ` Dmitry Gutov
2023-01-25 12:49 ` Eli Zaretskii
2023-01-25 23:21 ` Dmitry Gutov
2023-01-26 6:50 ` Eli Zaretskii
2023-01-26 7:17 ` Yuan Fu
2023-01-26 8:10 ` Eli Zaretskii
2023-01-26 17:15 ` Dmitry Gutov
2023-01-26 18:24 ` Eli Zaretskii
2023-01-26 19:35 ` Dmitry Gutov
2023-01-26 20:01 ` Eli Zaretskii
2023-01-26 21:26 ` Dmitry Gutov
2023-01-30 0:49 ` Dmitry Gutov
2023-01-30 14:06 ` Eli Zaretskii
2023-01-30 14:47 ` Dmitry Gutov
2023-01-30 15:08 ` Eli Zaretskii
2023-01-30 17:15 ` Dmitry Gutov
2023-01-30 17:49 ` Eli Zaretskii
2023-01-30 18:20 ` Dmitry Gutov
2023-01-30 18:42 ` Eli Zaretskii
2023-01-30 19:01 ` Dmitry Gutov
2023-01-30 19:05 ` Eli Zaretskii
2023-01-30 19:58 ` Dmitry Gutov
2023-01-30 23:57 ` Yuan Fu
2023-01-31 0:44 ` Dmitry Gutov
2023-01-31 3:23 ` Eli Zaretskii
2023-01-31 18:16 ` Dmitry Gutov
2023-02-01 2:39 ` Dmitry Gutov
2023-02-01 13:39 ` Eli Zaretskii
2023-02-01 15:13 ` Dmitry Gutov
2023-02-01 21:20 ` Dmitry Gutov
2023-02-02 2:16 ` Yuan Fu
2023-02-02 6:34 ` Eli Zaretskii
2023-02-02 12:12 ` Dmitry Gutov
2023-02-02 14:23 ` Eli Zaretskii
2023-02-02 17:03 ` Dmitry Gutov
2023-02-02 17:26 ` Eli Zaretskii
2023-02-02 17:53 ` Dmitry Gutov
2023-02-02 18:03 ` Eli Zaretskii
2023-02-02 19:44 ` Dmitry Gutov
2023-02-01 13:10 ` Eli Zaretskii
2023-02-01 15:15 ` Dmitry Gutov
2023-01-26 17:12 ` Dmitry Gutov
2023-01-26 18:07 ` Dmitry Gutov
2023-01-26 20:46 ` Dmitry Gutov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@gmail.com \
--to=casouri@gmail.com \
--cc=60953@debbugs.gnu.org \
--cc=dgutov@yandex.ru \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).