unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Dmitry Gutov <dgutov@yandex.ru>
To: Eli Zaretskii <eliz@gnu.org>
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 02:49:47 +0200	[thread overview]
Message-ID: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@yandex.ru> (raw)
In-Reply-To: <2da844d3-ea31-289e-2821-aa174e365ffd@yandex.ru>

[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]

On 26/01/2023 23:26, Dmitry Gutov wrote:
>>>> (But I thought you concluded that GC alone cannot explain the
>>>> difference in performance?)
>>> I'm inclined to think the difference is related to copying of the regexp
>>> string, but whether the time is spent in actually copying it, or
>>> scanning its copies for garbage later, it was harder to say. Seems like
>>> it's the latter, though.
>> If we can avoid the copying, I think it's desirable in any case.  They
>> are constant regexps, aren't they?
> 
> Yes, but how?
> 
> Memoization is one possible step, but then we only avoid re-creating the 
> predicate structures for each match. We still send a pretty large query 
> and, apparently, get it back..? Might be some copying involved there.
> 
> TBH the moderate success the memoization patch shows has me stumped.

Okay, I have cleaned up both experiments that I had. And when combined, 
they make the :match approach a little faster than the :pred one.

I'm still not sure why the difference is so little, given that the :pred 
one has Lisp funcalls and extra allocation, and :match does not.

Still, if nobody has any better ideas, I suggest we install both of 
these changes now. They are attached in separate patches.

memoize_vector.diff improves the performance of both cases. For :pred, 
it's roughly 10%; for :match, it's more.

treesit_predicate_match.diff improves the performance of the latter, 
though only a little: maybe 3-4%.

Code review welcome.

Is applying (and undoing) the narrowing this way legal enough? Or should 
I go through some error handlers, or ensure blocks, etc?

Speaking of pref, the profile looks like this now (very similar to what 
it was before the added rule):

   17.25%  emacs  libtree-sitter.so.0.0       [.] 
ts_tree_cursor_current_status
   10.93%  emacs  libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_next_sibling
    9.89%  emacs  libtree-sitter.so.0.0       [.] 
ts_tree_cursor_goto_first_child
    9.01%  emacs  emacs                       [.] process_mark_stack
    4.80%  emacs  libtree-sitter.so.0.0       [.] ts_node_start_point
    3.84%  emacs  emacs                       [.] re_match_2_internal
    3.82%  emacs  libtree-sitter.so.0.0       [.] ts_tree_cursor_parent_node
    3.06%  emacs  libtree-sitter.so.0.0       [.] 
ts_language_symbol_metadata

[-- Attachment #2: memoize_vector.diff --]
[-- Type: text/x-patch, Size: 1316 bytes --]

diff --git a/src/treesit.c b/src/treesit.c
index b210ec0923a..71aff3202ae 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2720,8 +2744,10 @@ DEFUN ("treesit-query-capture",
      every for loop and nconc it to RESULT every time.  That is indeed
      the initial implementation in which Yoav found nconc being the
      bottleneck (98.4% of the running time spent on nconc).  */
+  uint32_t patterns_count = ts_query_pattern_count(treesit_query);
   Lisp_Object result = Qnil;
   Lisp_Object prev_result = result;
+  Lisp_Object predicates_table = make_vector(patterns_count, Qt);
   while (ts_query_cursor_next_match (cursor, &match))
     {
       /* Record the checkpoint that we may roll back to.  */
@@ -2750,9 +2776,12 @@ DEFUN ("treesit-query-capture",
 	  result = Fcons (cap, result);
 	}
       /* Get predicates.  */
-      Lisp_Object predicates
-	= treesit_predicates_for_pattern (treesit_query,
-					  match.pattern_index);
+      Lisp_Object predicates = AREF(predicates_table, match.pattern_index);
+      if (EQ (predicates, Qt))
+	{
+	  predicates = treesit_predicates_for_pattern (treesit_query, 0);
+	  ASET(predicates_table, match.pattern_index, predicates);
+	}
 
       /* captures_lisp = Fnreverse (captures_lisp); */
       struct capture_range captures_range = { result, prev_result };

[-- Attachment #3: treesit_predicate_match.diff --]
[-- Type: text/x-patch, Size: 1633 bytes --]

diff --git a/src/treesit.c b/src/treesit.c
index b210ec0923a..3630db42f5e 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,34 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
 	      build_string ("The second argument to `match' should "
 		            "be a capture name, not a string"));
 
-  Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
-							     captures);
+  Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name, captures);
 
-  if (fast_string_match (regexp, text) >= 0)
+  struct buffer *old_buffer = current_buffer;
+  struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+  set_buffer_internal (buffer);
+
+  TSNode treesit_node = XTS_NODE (node)->node;
+  ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+  uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+  uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+  ptrdiff_t start_byte = visible_beg + start_byte_offset;
+  ptrdiff_t end_byte = visible_beg + end_byte_offset;
+  ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+  ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+  ptrdiff_t old_begv = BEGV;
+  ptrdiff_t old_zv = ZV;
+
+  SET_BUF_BEGV(buffer, start_pos);
+  SET_BUF_ZV(buffer, end_pos);
+
+  ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+  SET_BUF_BEGV(buffer, old_begv);
+  SET_BUF_ZV(buffer, old_zv);
+
+  set_buffer_internal (old_buffer);
+
+  if (val > 0)
     return true;
   else
     return false;

  reply	other threads:[~2023-01-30  0:49 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20  3:53 bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-24  4:04 ` Dmitry Gutov
2023-01-25  3:13   ` Yuan Fu
2023-01-25  3:48     ` Dmitry Gutov
2023-01-25 12:49       ` Eli Zaretskii
2023-01-25 23:21         ` Dmitry Gutov
2023-01-26  6:50           ` Eli Zaretskii
2023-01-26  7:17             ` Yuan Fu
2023-01-26  8:10               ` Eli Zaretskii
2023-01-26 17:15                 ` Dmitry Gutov
2023-01-26 18:24                   ` Eli Zaretskii
2023-01-26 19:35                     ` Dmitry Gutov
2023-01-26 20:01                       ` Eli Zaretskii
2023-01-26 21:26                         ` Dmitry Gutov
2023-01-30  0:49                           ` Dmitry Gutov [this message]
2023-01-30 14:06                             ` Eli Zaretskii
2023-01-30 14:47                               ` Dmitry Gutov
2023-01-30 15:08                                 ` Eli Zaretskii
2023-01-30 17:15                                   ` Dmitry Gutov
2023-01-30 17:49                                     ` Eli Zaretskii
2023-01-30 18:20                                       ` Dmitry Gutov
2023-01-30 18:42                                         ` Eli Zaretskii
2023-01-30 19:01                                           ` Dmitry Gutov
2023-01-30 19:05                                             ` Eli Zaretskii
2023-01-30 19:58                                               ` Dmitry Gutov
2023-01-30 23:57                                                 ` Yuan Fu
2023-01-31  0:44                                                   ` Dmitry Gutov
2023-01-31  3:23                                                 ` Eli Zaretskii
2023-01-31 18:16                                                   ` Dmitry Gutov
2023-02-01  2:39                                                     ` Dmitry Gutov
2023-02-01 13:39                                                       ` Eli Zaretskii
2023-02-01 15:13                                                         ` Dmitry Gutov
2023-02-01 21:20                                                         ` Dmitry Gutov
2023-02-02  2:16                                                           ` Yuan Fu
2023-02-02  6:34                                                           ` Eli Zaretskii
2023-02-02 12:12                                                             ` Dmitry Gutov
2023-02-02 14:23                                                               ` Eli Zaretskii
2023-02-02 17:03                                                                 ` Dmitry Gutov
2023-02-02 17:26                                                                   ` Eli Zaretskii
2023-02-02 17:53                                                                     ` Dmitry Gutov
2023-02-02 18:03                                                                       ` Eli Zaretskii
2023-02-02 19:44                                                                         ` Dmitry Gutov
2023-02-01 13:10                                                     ` Eli Zaretskii
2023-02-01 15:15                                                       ` Dmitry Gutov
2023-01-26 17:12               ` Dmitry Gutov
2023-01-26 18:07             ` Dmitry Gutov
2023-01-26 20:46               ` Dmitry Gutov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=6784f9e7-844b-374d-2a1e-8a61cebe0d7e@yandex.ru \
    --to=dgutov@yandex.ru \
    --cc=60953@debbugs.gnu.org \
    --cc=casouri@gmail.com \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).