From: Dmitry Gutov <dgutov@yandex.ru>
To: Eli Zaretskii <eliz@gnu.org>
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 02:49:47 +0200 [thread overview]
Message-ID: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@yandex.ru> (raw)
In-Reply-To: <2da844d3-ea31-289e-2821-aa174e365ffd@yandex.ru>
[-- Attachment #1: Type: text/plain, Size: 2323 bytes --]
On 26/01/2023 23:26, Dmitry Gutov wrote:
>>>> (But I thought you concluded that GC alone cannot explain the
>>>> difference in performance?)
>>> I'm inclined to think the difference is related to copying of the regexp
>>> string, but whether the time is spent in actually copying it, or
>>> scanning its copies for garbage later, it was harder to say. Seems like
>>> it's the latter, though.
>> If we can avoid the copying, I think it's desirable in any case. They
>> are constant regexps, aren't they?
>
> Yes, but how?
>
> Memoization is one possible step, but then we only avoid re-creating the
> predicate structures for each match. We still send a pretty large query
> and, apparently, get it back..? Might be some copying involved there.
>
> TBH the moderate success the memoization patch shows has me stumped.
Okay, I have cleaned up both experiments that I had. And when combined,
they make the :match approach a little faster than the :pred one.
I'm still not sure why the difference is so little, given that the :pred
one has Lisp funcalls and extra allocation, and :match does not.
Still, if nobody has any better ideas, I suggest we install both of
these changes now. They are attached in separate patches.
memoize_vector.diff improves the performance of both cases. For :pred,
it's roughly 10%; for :match, it's more.
treesit_predicate_match.diff improves the performance of the latter,
though only a little: maybe 3-4%.
Code review welcome.
Is applying (and undoing) the narrowing this way legal enough? Or should
I go through some error handlers, or ensure blocks, etc?
Speaking of pref, the profile looks like this now (very similar to what
it was before the added rule):
17.25% emacs libtree-sitter.so.0.0 [.]
ts_tree_cursor_current_status
10.93% emacs libtree-sitter.so.0.0 [.]
ts_tree_cursor_goto_next_sibling
9.89% emacs libtree-sitter.so.0.0 [.]
ts_tree_cursor_goto_first_child
9.01% emacs emacs [.] process_mark_stack
4.80% emacs libtree-sitter.so.0.0 [.] ts_node_start_point
3.84% emacs emacs [.] re_match_2_internal
3.82% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_parent_node
3.06% emacs libtree-sitter.so.0.0 [.]
ts_language_symbol_metadata
[-- Attachment #2: memoize_vector.diff --]
[-- Type: text/x-patch, Size: 1316 bytes --]
diff --git a/src/treesit.c b/src/treesit.c
index b210ec0923a..71aff3202ae 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2720,8 +2744,10 @@ DEFUN ("treesit-query-capture",
every for loop and nconc it to RESULT every time. That is indeed
the initial implementation in which Yoav found nconc being the
bottleneck (98.4% of the running time spent on nconc). */
+ uint32_t patterns_count = ts_query_pattern_count(treesit_query);
Lisp_Object result = Qnil;
Lisp_Object prev_result = result;
+ Lisp_Object predicates_table = make_vector(patterns_count, Qt);
while (ts_query_cursor_next_match (cursor, &match))
{
/* Record the checkpoint that we may roll back to. */
@@ -2750,9 +2776,12 @@ DEFUN ("treesit-query-capture",
result = Fcons (cap, result);
}
/* Get predicates. */
- Lisp_Object predicates
- = treesit_predicates_for_pattern (treesit_query,
- match.pattern_index);
+ Lisp_Object predicates = AREF(predicates_table, match.pattern_index);
+ if (EQ (predicates, Qt))
+ {
+ predicates = treesit_predicates_for_pattern (treesit_query, 0);
+ ASET(predicates_table, match.pattern_index, predicates);
+ }
/* captures_lisp = Fnreverse (captures_lisp); */
struct capture_range captures_range = { result, prev_result };
[-- Attachment #3: treesit_predicate_match.diff --]
[-- Type: text/x-patch, Size: 1633 bytes --]
diff --git a/src/treesit.c b/src/treesit.c
index b210ec0923a..3630db42f5e 100644
--- a/src/treesit.c
+++ b/src/treesit.c
@@ -2466,10 +2466,34 @@ treesit_predicate_match (Lisp_Object args, struct capture_range captures)
build_string ("The second argument to `match' should "
"be a capture name, not a string"));
- Lisp_Object text = treesit_predicate_capture_name_to_text (capture_name,
- captures);
+ Lisp_Object node = treesit_predicate_capture_name_to_node (capture_name, captures);
- if (fast_string_match (regexp, text) >= 0)
+ struct buffer *old_buffer = current_buffer;
+ struct buffer *buffer = XBUFFER (XTS_PARSER (XTS_NODE (node)->parser)->buffer);
+ set_buffer_internal (buffer);
+
+ TSNode treesit_node = XTS_NODE (node)->node;
+ ptrdiff_t visible_beg = XTS_PARSER (XTS_NODE (node)->parser)->visible_beg;
+ uint32_t start_byte_offset = ts_node_start_byte (treesit_node);
+ uint32_t end_byte_offset = ts_node_end_byte (treesit_node);
+ ptrdiff_t start_byte = visible_beg + start_byte_offset;
+ ptrdiff_t end_byte = visible_beg + end_byte_offset;
+ ptrdiff_t start_pos = buf_bytepos_to_charpos (buffer, start_byte);
+ ptrdiff_t end_pos = buf_bytepos_to_charpos (buffer, end_byte);
+ ptrdiff_t old_begv = BEGV;
+ ptrdiff_t old_zv = ZV;
+
+ SET_BUF_BEGV(buffer, start_pos);
+ SET_BUF_ZV(buffer, end_pos);
+
+ ptrdiff_t val = fast_looking_at (regexp, start_pos, start_byte, end_pos, end_byte, Qnil);
+
+ SET_BUF_BEGV(buffer, old_begv);
+ SET_BUF_ZV(buffer, old_zv);
+
+ set_buffer_internal (old_buffer);
+
+ if (val > 0)
return true;
else
return false;
next prev parent reply other threads:[~2023-01-30 0:49 UTC|newest]
Thread overview: 47+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-01-20 3:53 bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-24 4:04 ` Dmitry Gutov
2023-01-25 3:13 ` Yuan Fu
2023-01-25 3:48 ` Dmitry Gutov
2023-01-25 12:49 ` Eli Zaretskii
2023-01-25 23:21 ` Dmitry Gutov
2023-01-26 6:50 ` Eli Zaretskii
2023-01-26 7:17 ` Yuan Fu
2023-01-26 8:10 ` Eli Zaretskii
2023-01-26 17:15 ` Dmitry Gutov
2023-01-26 18:24 ` Eli Zaretskii
2023-01-26 19:35 ` Dmitry Gutov
2023-01-26 20:01 ` Eli Zaretskii
2023-01-26 21:26 ` Dmitry Gutov
2023-01-30 0:49 ` Dmitry Gutov [this message]
2023-01-30 14:06 ` Eli Zaretskii
2023-01-30 14:47 ` Dmitry Gutov
2023-01-30 15:08 ` Eli Zaretskii
2023-01-30 17:15 ` Dmitry Gutov
2023-01-30 17:49 ` Eli Zaretskii
2023-01-30 18:20 ` Dmitry Gutov
2023-01-30 18:42 ` Eli Zaretskii
2023-01-30 19:01 ` Dmitry Gutov
2023-01-30 19:05 ` Eli Zaretskii
2023-01-30 19:58 ` Dmitry Gutov
2023-01-30 23:57 ` Yuan Fu
2023-01-31 0:44 ` Dmitry Gutov
2023-01-31 3:23 ` Eli Zaretskii
2023-01-31 18:16 ` Dmitry Gutov
2023-02-01 2:39 ` Dmitry Gutov
2023-02-01 13:39 ` Eli Zaretskii
2023-02-01 15:13 ` Dmitry Gutov
2023-02-01 21:20 ` Dmitry Gutov
2023-02-02 2:16 ` Yuan Fu
2023-02-02 6:34 ` Eli Zaretskii
2023-02-02 12:12 ` Dmitry Gutov
2023-02-02 14:23 ` Eli Zaretskii
2023-02-02 17:03 ` Dmitry Gutov
2023-02-02 17:26 ` Eli Zaretskii
2023-02-02 17:53 ` Dmitry Gutov
2023-02-02 18:03 ` Eli Zaretskii
2023-02-02 19:44 ` Dmitry Gutov
2023-02-01 13:10 ` Eli Zaretskii
2023-02-01 15:15 ` Dmitry Gutov
2023-01-26 17:12 ` Dmitry Gutov
2023-01-26 18:07 ` Dmitry Gutov
2023-01-26 20:46 ` Dmitry Gutov
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://www.gnu.org/software/emacs/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=6784f9e7-844b-374d-2a1e-8a61cebe0d7e@yandex.ru \
--to=dgutov@yandex.ru \
--cc=60953@debbugs.gnu.org \
--cc=casouri@gmail.com \
--cc=eliz@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/emacs.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).