unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
@ 2023-01-20  3:53 Dmitry Gutov
  2023-01-24  4:04 ` Dmitry Gutov
  0 siblings, 1 reply; 47+ messages in thread
From: Dmitry Gutov @ 2023-01-20  3:53 UTC (permalink / raw)
  To: 60953

[-- Attachment #1: Type: text/plain, Size: 2831 bytes --]

In my benchmarking -- using this form in 
test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:

   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) 
(let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

    :language language
    :feature 'builtin-functions
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods
        @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

   9454  84%                   - font-lock-fontify-region
   9454  84%                    - font-lock-default-fontify-region
   8862  79%                     - font-lock-fontify-syntactically-region
   8702  78%                      - treesit-font-lock-fontify-region
    128   1%                         treesit-fontify-with-override
    123   1%                         facep
     84   0% 
treesit--children-covering-range-recurse
     60   0%                       + ruby-ts--comment-font-lock
      4   0%                       + font-lock-unfontify-region
    568   5%                     + font-lock-fontify-keywords-region
     16   0%                     + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

Looking at the code, apparently we get a cursor and basically iterate 
through all (identifier) nodes, running our predicate manually.

Without trying something more advanced like perf, I took a stab in the 
dark and tried to reduce string allocation in treesit_predicate_match 
(it currently ends up delegating to buffer-substring for every node), 
which seemed inefficient. But while my patch (attached) compiles and 
doesn't crash, it doesn't actually work (the rule's highlighting is 
missing), and the performance was unchanged.

This message was originally longer, but see commit d94dc606a09: I 
switched to using :pred -- thus avoiding embedding the 720-char long 
regexp in the query -- and the performance drop got reduced to like 20%.

As a baseline, this simplified query without predicates and colors every 
identifier in the buffer using the specified face, is still faster (just 
10% over the original):

    :language language
    :feature 'builtin-function
    `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:

(benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))

=> Elapsed time: 0.141681s

-- whereas the difference between the benchmarks is on the order of seconds.

I think the marshaling of the long regexp string back and forth could be 
the culprit. Would be nice to fix that somehow.

I also think that trying to reduce the string allocation overhead has 
potential, but so far all my experiments haven't moved the needle 
anywhere noticeable.

[-- Attachment #2: treesit_predicate_match.diff --]
[-- Type: text/x-patch, Size: 1440 bytes --]

diff --git a/src/treesit.c b/src/treesit.c
index 917db582676..7e294a0a66f 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,26 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
 	      build_string ("The second argument to `match' should "
 		            "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
-							     captures);
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name, captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  set_buffer_internal (old_buffer);
+
+  if (val >= 0)
     return true;
   else
     return false;

^ permalink raw reply related	[flat|nested] 47+ messages in thread

end of thread, other threads:[~2023-02-02 19:44 UTC | newest]

Thread overview: 47+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-01-20  3:53 bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-24  4:04 ` Dmitry Gutov
2023-01-25  3:13   ` Yuan Fu
2023-01-25  3:48     ` Dmitry Gutov
2023-01-25 12:49       ` Eli Zaretskii
2023-01-25 23:21         ` Dmitry Gutov
2023-01-26  6:50           ` Eli Zaretskii
2023-01-26  7:17             ` Yuan Fu
2023-01-26  8:10               ` Eli Zaretskii
2023-01-26 17:15                 ` Dmitry Gutov
2023-01-26 18:24                   ` Eli Zaretskii
2023-01-26 19:35                     ` Dmitry Gutov
2023-01-26 20:01                       ` Eli Zaretskii
2023-01-26 21:26                         ` Dmitry Gutov
2023-01-30  0:49                           ` Dmitry Gutov
2023-01-30 14:06                             ` Eli Zaretskii
2023-01-30 14:47                               ` Dmitry Gutov
2023-01-30 15:08                                 ` Eli Zaretskii
2023-01-30 17:15                                   ` Dmitry Gutov
2023-01-30 17:49                                     ` Eli Zaretskii
2023-01-30 18:20                                       ` Dmitry Gutov
2023-01-30 18:42                                         ` Eli Zaretskii
2023-01-30 19:01                                           ` Dmitry Gutov
2023-01-30 19:05                                             ` Eli Zaretskii
2023-01-30 19:58                                               ` Dmitry Gutov
2023-01-30 23:57                                                 ` Yuan Fu
2023-01-31  0:44                                                   ` Dmitry Gutov
2023-01-31  3:23                                                 ` Eli Zaretskii
2023-01-31 18:16                                                   ` Dmitry Gutov
2023-02-01  2:39                                                     ` Dmitry Gutov
2023-02-01 13:39                                                       ` Eli Zaretskii
2023-02-01 15:13                                                         ` Dmitry Gutov
2023-02-01 21:20                                                         ` Dmitry Gutov
2023-02-02  2:16                                                           ` Yuan Fu
2023-02-02  6:34                                                           ` Eli Zaretskii
2023-02-02 12:12                                                             ` Dmitry Gutov
2023-02-02 14:23                                                               ` Eli Zaretskii
2023-02-02 17:03                                                                 ` Dmitry Gutov
2023-02-02 17:26                                                                   ` Eli Zaretskii
2023-02-02 17:53                                                                     ` Dmitry Gutov
2023-02-02 18:03                                                                       ` Eli Zaretskii
2023-02-02 19:44                                                                         ` Dmitry Gutov
2023-02-01 13:10                                                     ` Eli Zaretskii
2023-02-01 15:15                                                       ` Dmitry Gutov
2023-01-26 17:12               ` Dmitry Gutov
2023-01-26 18:07             ` Dmitry Gutov
2023-01-26 20:46               ` Dmitry Gutov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).