all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Dmitry Gutov <dgutov@yandex.ru>
To: 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Fri, 20 Jan 2023 05:53:12 +0200	[thread overview]
Message-ID: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> (raw)

[-- Attachment #1: Type: text/plain, Size: 2831 bytes --]

In my benchmarking -- using this form in 
test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling ruby-ts-mode:

   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) 
(let (treesit--font-lock-fast-mode) (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

    :language language
    :feature 'builtin-functions
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods
        @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

   9454  84%                   - font-lock-fontify-region
   9454  84%                    - font-lock-default-fontify-region
   8862  79%                     - font-lock-fontify-syntactically-region
   8702  78%                      - treesit-font-lock-fontify-region
    128   1%                         treesit-fontify-with-override
    123   1%                         facep
     84   0% 
treesit--children-covering-range-recurse
     60   0%                       + ruby-ts--comment-font-lock
      4   0%                       + font-lock-unfontify-region
    568   5%                     + font-lock-fontify-keywords-region
     16   0%                     + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

Looking at the code, apparently we get a cursor and basically iterate 
through all (identifier) nodes, running our predicate manually.

Without trying something more advanced like perf, I took a stab in the 
dark and tried to reduce string allocation in treesit_predicate_match 
(it currently ends up delegating to buffer-substring for every node), 
which seemed inefficient. But while my patch (attached) compiles and 
doesn't crash, it doesn't actually work (the rule's highlighting is 
missing), and the performance was unchanged.

This message was originally longer, but see commit d94dc606a09: I 
switched to using :pred -- thus avoiding embedding the 720-char long 
regexp in the query -- and the performance drop got reduced to like 20%.

As a baseline, this simplified query without predicates and colors every 
identifier in the buffer using the specified face, is still faster (just 
10% over the original):

    :language language
    :feature 'builtin-function
    `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:

(benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))

=> Elapsed time: 0.141681s

-- whereas the difference between the benchmarks is on the order of seconds.

I think the marshaling of the long regexp string back and forth could be 
the culprit. Would be nice to fix that somehow.

I also think that trying to reduce the string allocation overhead has 
potential, but so far all my experiments haven't moved the needle 
anywhere noticeable.

[-- Attachment #2: treesit_predicate_match.diff --]
[-- Type: text/x-patch, Size: 1440 bytes --]

diff --git a/src/treesit.c b/src/treesit.c
index 917db582676..7e294a0a66f 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,26 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
 	      build_string ("The second argument to `match' should "
 		            "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
-							     captures);
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name, captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  set_buffer_internal (old_buffer);
+
+  if (val >= 0)
     return true;
   else
     return false;

             reply	other threads:[~2023-01-20  3:53 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20  3:53 Dmitry Gutov [this message]
2023-01-24  4:04 ` bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-25  3:13   ` Yuan Fu
2023-01-25  3:48     ` Dmitry Gutov
2023-01-25 12:49       ` Eli Zaretskii
2023-01-25 23:21         ` Dmitry Gutov
2023-01-26  6:50           ` Eli Zaretskii
2023-01-26  7:17             ` Yuan Fu
2023-01-26  8:10               ` Eli Zaretskii
2023-01-26 17:15                 ` Dmitry Gutov
2023-01-26 18:24                   ` Eli Zaretskii
2023-01-26 19:35                     ` Dmitry Gutov
2023-01-26 20:01                       ` Eli Zaretskii
2023-01-26 21:26                         ` Dmitry Gutov
2023-01-30  0:49                           ` Dmitry Gutov
2023-01-30 14:06                             ` Eli Zaretskii
2023-01-30 14:47                               ` Dmitry Gutov
2023-01-30 15:08                                 ` Eli Zaretskii
2023-01-30 17:15                                   ` Dmitry Gutov
2023-01-30 17:49                                     ` Eli Zaretskii
2023-01-30 18:20                                       ` Dmitry Gutov
2023-01-30 18:42                                         ` Eli Zaretskii
2023-01-30 19:01                                           ` Dmitry Gutov
2023-01-30 19:05                                             ` Eli Zaretskii
2023-01-30 19:58                                               ` Dmitry Gutov
2023-01-30 23:57                                                 ` Yuan Fu
2023-01-31  0:44                                                   ` Dmitry Gutov
2023-01-31  3:23                                                 ` Eli Zaretskii
2023-01-31 18:16                                                   ` Dmitry Gutov
2023-02-01  2:39                                                     ` Dmitry Gutov
2023-02-01 13:39                                                       ` Eli Zaretskii
2023-02-01 15:13                                                         ` Dmitry Gutov
2023-02-01 21:20                                                         ` Dmitry Gutov
2023-02-02  2:16                                                           ` Yuan Fu
2023-02-02  6:34                                                           ` Eli Zaretskii
2023-02-02 12:12                                                             ` Dmitry Gutov
2023-02-02 14:23                                                               ` Eli Zaretskii
2023-02-02 17:03                                                                 ` Dmitry Gutov
2023-02-02 17:26                                                                   ` Eli Zaretskii
2023-02-02 17:53                                                                     ` Dmitry Gutov
2023-02-02 18:03                                                                       ` Eli Zaretskii
2023-02-02 19:44                                                                         ` Dmitry Gutov
2023-02-01 13:10                                                     ` Eli Zaretskii
2023-02-01 15:15                                                       ` Dmitry Gutov
2023-01-26 17:12               ` Dmitry Gutov
2023-01-26 18:07             ` Dmitry Gutov
2023-01-26 20:46               ` Dmitry Gutov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru \
    --to=dgutov@yandex.ru \
    --cc=60953@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.