From: Dmitry Gutov <dgutov@yandex.ru>
To: Eli Zaretskii <eliz@gnu.org>
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 20:07:30 +0200
Message-ID: <31559c1f-1a12-691d-3d03-f566019a0aab@yandex.ru>
In-Reply-To: <838rhpg57n.fsf@gnu.org>
On 26/01/2023 08:50, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 01:21:08 +0200
>> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> Thank you. Unfortunately, the performance improvement from this patch is
>> still fairly negligible.
>
> This is quite strange, since all of the approaches basically use the
> same primitives under the hood. Perhaps the reason for the slowness
> is that the code which computes the text span of a node is slow?

That code seems to be the same between the two options:
treesit_predicate_capture_name_to_text basically does the same as
treesit-node-text (except in C) after iterating through a Lisp list to
find the node. ruby-ts--builtin-method-p calls treesit-node-text.

And treesit_predicate_pred does the same iteration, so the :pred option
should, if anything, be slower due to Lisp-related overhead (funcalls
and the like).

> Otherwise, I must be missing something here, since the rest of the
> code on the C level is basically the same, give or take some wrappers
> that should not change the overall picture.

The query object is smaller in the :pred case, though (it no longer
contains the large regexp). That's basically my only remaining
hypothesis.

>> Switching to using :pred with a function (like I did in commit
>> d94dc606a0934) which still uses buffer-substring inside is significantly
>> faster.
>
> If the performance issue is fixed, then the only aspect that we should
> perhaps try to improve is consing.

I wouldn't say it's "fixed", just improved. And it really should be
possible to make :match faster than :pred, since it'll probably be used
for similar cases (where many or most of the nodes match).

There seems to be a fair amount of consing going on inside
treesit-query-capture already: we wrap every TS node in our objects, we
turn the captured nodes into a Lisp alist, and we turn the predicates
into a list, turning the strings into "our" strings. The 'make_string'
function creates a new copy in memory, right?

One could hope to avoid recreating the list of predicates on every
match, but that seems to be a limitation of the TS API:
ts_query_predicates_for_pattern requires a second argument,
match.pattern_index. Maybe we could memoize that, though?

In any case, that seems to explain why adding or avoiding one
buffer-substring call per match isn't moving the needle very much.

> Consing a string each time you
> need to fontify increases the GC pressure, so if there's a good way of
> avoiding that without performance degradation, we should take it. Is
> it possible to use your :pred technique in a way that doesn't need to
> produce strings from buffer text?

The only version from which I managed to get some (very minor)
performance improvement is this:

(defun ruby-ts--builtin-method-p (node)
  (goto-char (treesit-node-start node))
  (let ((inhibit-changing-match-data t))
    (re-search-forward ruby-ts--builtin-methods (treesit-node-end node)
                       t)))

The improvement is about 200-300 ms, whereas the difference between
:match and :pred in this benchmark is several seconds.
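
For anyone who wants to reproduce timings of this kind, a minimal
sketch of a refontification benchmark might look like the following.
The name my-bench-fontification and the default repetition count are
made up for illustration, and this is not necessarily the benchmark
behind the numbers above; it only assumes the stock benchmark-run,
font-lock-flush and font-lock-ensure.

```elisp
(require 'benchmark)

;; Hypothetical helper, not part of Emacs or ruby-ts-mode.
(defun my-bench-fontification (&optional repetitions)
  "Refontify the whole current buffer REPETITIONS times (default 10).
Return the list (ELAPSED-SECONDS GC-COUNT GC-SECONDS) produced by
`benchmark-run'."
  (let ((n (or repetitions 10)))
    (benchmark-run n
      ;; Mark the whole buffer as needing fontification, then force
      ;; the fontification to happen right away.
      (font-lock-flush)
      (font-lock-ensure))))
```

Calling (my-bench-fontification 100) in a ruby-ts-mode buffer, once per
variant of the predicate, should give comparable elapsed times; the GC
count in the result is a rough indicator of how much consing went on.
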
And if I try to bring it back to 100% correctness, ensuring that the
whole of the node text is matched, I have to use narrowing (and
string-start and string-end anchors in the regexp):

(defvar ruby-ts--builtin-methods
  (format "\\`%s\\'" (regexp-opt (append ruby-builtin-methods-no-reqs
                                         ruby-builtin-methods-with-reqs)))
  "Ruby built-in methods.")

(defun ruby-ts--builtin-method-p (node)
  (save-restriction
    (goto-char (treesit-node-start node))
    (narrow-to-region (point) (treesit-node-end node))
    (let ((inhibit-changing-match-data t))
      (re-search-forward ruby-ts--builtin-methods nil t))))

And with that, the performance is again no better than the current
version. If I also add save-excursion, it's worse.
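
To illustrate why the anchors need the narrowing to do their job, here
is a small self-contained sketch that runs in a temporary buffer and
does not need tree-sitter. The method list and all the my-demo- names
are hypothetical stand-ins, not the real ruby-ts-mode definitions.

```elisp
;; Hypothetical stand-in for the real list of Ruby built-in methods.
(defvar my-demo-methods '("puts" "print" "p"))

;; The same regexp without and with the buffer-boundary anchors.
(defvar my-demo-unanchored (regexp-opt my-demo-methods))
(defvar my-demo-anchored (format "\\`%s\\'" my-demo-unanchored))

(defun my-demo-span-matches-p (regexp beg end)
  "Return non-nil if REGEXP matches somewhere between BEG and END.
The buffer is narrowed to that span first, so the string-start and
string-end anchors in REGEXP refer to the span boundaries."
  (save-excursion
    (save-restriction
      (narrow-to-region beg end)
      (goto-char (point-min))
      (re-search-forward regexp nil t))))

(defun my-demo-run ()
  "Try both regexps on the text \"putsch puts\" and return three booleans."
  (with-temp-buffer
    (insert "putsch puts")
    (list
     ;; Unanchored regexp on the span "putsch": finds the prefix "puts".
     (and (my-demo-span-matches-p my-demo-unanchored 1 7) t)
     ;; Anchored regexp on the span "putsch": the whole span must match.
     (and (my-demo-span-matches-p my-demo-anchored 1 7) t)
     ;; Anchored regexp on the span "puts".
     (and (my-demo-span-matches-p my-demo-anchored 8 12) t))))
```

Evaluating (my-demo-run) should show the unanchored regexp accepting
"putsch" (a false positive), and the anchored one rejecting it while
still accepting "puts". The sketch uses save-excursion for tidiness,
which, as noted above, has a cost of its own in the real predicate.
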