all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Yuan Fu <casouri@gmail.com>
To: Dmitry Gutov <dgutov@yandex.ru>
Cc: 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Tue, 24 Jan 2023 19:13:29 -0800	[thread overview]
Message-ID: <AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@gmail.com> (raw)
In-Reply-To: <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru>



> On Jan 23, 2023, at 8:04 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> Cc-ing Yuan, just in case.
> 
> On 20/01/2023 05:53, Dmitry Gutov wrote:
>> In my benchmarking -- using this form in test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:
>>   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>> the rule added to its font-lock in commit d66ac5285f7
>>    :language language
>>    :feature 'builtin-functions
>>    `((((identifier) @font-lock-builtin-face)
>>       (:match ,ruby-ts--builtin-methods
>>        @font-lock-builtin-face)))
>> ...seems to have made it 50% slower.
>> The profile looked like this:
>>   9454  84%                   - font-lock-fontify-region
>>   9454  84%                    - font-lock-default-fontify-region
>>   8862  79%                     - font-lock-fontify-syntactically-region
>>   8702  78%                      - treesit-font-lock-fontify-region
>>    128   1%                         treesit-fontify-with-override
>>    123   1%                         facep
>>     84   0% treesit--children-covering-range-recurse
>>     60   0%                       + ruby-ts--comment-font-lock
>>      4   0%                       + font-lock-unfontify-region
>>    568   5%                     + font-lock-fontify-keywords-region
>>     16   0%                     + font-lock-unfontify-region
>> So there's nothing on the Lisp level to look at.
> 
> I've done some perf recordings now. It seems most/all of the difference comes down to garbage collection. Or more concretely, time spent inside process_mark_stack.
> 
> Without the added query benchmark reports:
> 
> (10.13723333 49 1.141649534999999)
> 
> And the perf top5 is:
> 
>  17.26%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>  10.83%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>  10.18%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>   8.37%  emacs         emacs                       [.] process_mark_stack
>   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> With this simple query that colors everything:
> 
>   :language language
>   :feature 'builtin-function
>   `((((identifier) @font-lock-builtin-face)))
> 
> I get:
> 
> (11.993968995 82 1.9326509270000045)
> 
> Note the jump in runtime that's larger than the jump in GC.
> 
>  17.26%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>  10.83%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>  10.18%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>   8.37%  emacs         emacs                       [.] process_mark_stack
>   4.63%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> The current query looks like this:
> 
>   :language language
>   :feature 'builtin-function
>   `((((identifier) @font-lock-builtin-face)
>      (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))
> 
> Benchmarking:
> 
> (12.493614359 107 2.558609025999999)
> 
>  15.30%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_current_status
>  14.92%  emacs         emacs                       [.] process_mark_stack
>   9.75%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_next_sibling
>   8.90%  emacs         libtree-sitter.so.0.0       [.] ts_tree_cursor_goto_first_child
>   3.87%  emacs         libtree-sitter.so.0.0       [.] ts_node_start_point
> 
> Here we get the same jump in runtime as in GC. Even though this rule ends up coloring much fewer (almost none) nodes in the current buffer. I interpret the results like this:
> 
> - The jump in runtime of the previous query was probably related to the number of nodes needed to be processed, but not with the resulting highlighting, even though every identifier in the buffer ends up being colored.
> 
> - The GC overhead created by the predicates is non-negligible.
> 
> And the original query that I tried:
> 
>   :language language
>   :feature 'builtin-function
>   `((((identifier) @font-lock-builtin-face)
>      (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))
> 
> Benchmarking:
> 
> (16.433451865000002 249 5.908674810000001)
> 
>  23.72%  emacs         emacs                    [.] process_mark_stack
>  12.33%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_current_status
>   7.96%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_next_sibling
>   7.38%  emacs         libtree-sitter.so.0.0    [.] ts_tree_cursor_goto_first_child
>   3.37%  emacs         libtree-sitter.so.0.0    [.] ts_node_start_point
> 
> Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?
> 
> I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.
> 

FYI the predicates are not processed by tree-sitter, but by us. For example, the #equal predicate is handled by treesit_predicate_equal. For #match, right now we create a string with Fbuffer_substring and pass it to fast_string_match, so it definitely causes a lot of gc’s, as you observed. We can probably match the regexp in-place, just limit the match to the range of the node.

Yuan






  reply	other threads:[~2023-01-25  3:13 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20  3:53 bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-24  4:04 ` Dmitry Gutov
2023-01-25  3:13   ` Yuan Fu [this message]
2023-01-25  3:48     ` Dmitry Gutov
2023-01-25 12:49       ` Eli Zaretskii
2023-01-25 23:21         ` Dmitry Gutov
2023-01-26  6:50           ` Eli Zaretskii
2023-01-26  7:17             ` Yuan Fu
2023-01-26  8:10               ` Eli Zaretskii
2023-01-26 17:15                 ` Dmitry Gutov
2023-01-26 18:24                   ` Eli Zaretskii
2023-01-26 19:35                     ` Dmitry Gutov
2023-01-26 20:01                       ` Eli Zaretskii
2023-01-26 21:26                         ` Dmitry Gutov
2023-01-30  0:49                           ` Dmitry Gutov
2023-01-30 14:06                             ` Eli Zaretskii
2023-01-30 14:47                               ` Dmitry Gutov
2023-01-30 15:08                                 ` Eli Zaretskii
2023-01-30 17:15                                   ` Dmitry Gutov
2023-01-30 17:49                                     ` Eli Zaretskii
2023-01-30 18:20                                       ` Dmitry Gutov
2023-01-30 18:42                                         ` Eli Zaretskii
2023-01-30 19:01                                           ` Dmitry Gutov
2023-01-30 19:05                                             ` Eli Zaretskii
2023-01-30 19:58                                               ` Dmitry Gutov
2023-01-30 23:57                                                 ` Yuan Fu
2023-01-31  0:44                                                   ` Dmitry Gutov
2023-01-31  3:23                                                 ` Eli Zaretskii
2023-01-31 18:16                                                   ` Dmitry Gutov
2023-02-01  2:39                                                     ` Dmitry Gutov
2023-02-01 13:39                                                       ` Eli Zaretskii
2023-02-01 15:13                                                         ` Dmitry Gutov
2023-02-01 21:20                                                         ` Dmitry Gutov
2023-02-02  2:16                                                           ` Yuan Fu
2023-02-02  6:34                                                           ` Eli Zaretskii
2023-02-02 12:12                                                             ` Dmitry Gutov
2023-02-02 14:23                                                               ` Eli Zaretskii
2023-02-02 17:03                                                                 ` Dmitry Gutov
2023-02-02 17:26                                                                   ` Eli Zaretskii
2023-02-02 17:53                                                                     ` Dmitry Gutov
2023-02-02 18:03                                                                       ` Eli Zaretskii
2023-02-02 19:44                                                                         ` Dmitry Gutov
2023-02-01 13:10                                                     ` Eli Zaretskii
2023-02-01 15:15                                                       ` Dmitry Gutov
2023-01-26 17:12               ` Dmitry Gutov
2023-01-26 18:07             ` Dmitry Gutov
2023-01-26 20:46               ` Dmitry Gutov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=AB9CD94C-CB3C-4D84-B4AA-22EDC206EB12@gmail.com \
    --to=casouri@gmail.com \
    --cc=60953@debbugs.gnu.org \
    --cc=dgutov@yandex.ru \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.