all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Dmitry Gutov <dgutov@yandex.ru>
To: Eli Zaretskii <eliz@gnu.org>
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 20:07:30 +0200	[thread overview]
Message-ID: <31559c1f-1a12-691d-3d03-f566019a0aab@yandex.ru> (raw)
In-Reply-To: <838rhpg57n.fsf@gnu.org>

On 26/01/2023 08:50, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 01:21:08 +0200
>> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
>> From: Dmitry Gutov <dgutov@yandex.ru>
>>
>> Thank you. Unfortunately, the performance improvement from this patch is
>> still fairly negligible.
> 
> This is quite strange, since all of the approaches basically use the
> same primitives under the hood.  Perhaps the reason for the slowness
> is that the code which computes the text span of a node is slow?

That code seems to be the same between the two options: 
treesit_predicate_capture_name_to_text basically does the same as 
treesit-node-text (except in C) after iterating through a Lisp list to 
find the node. ruby-ts--builtin-method-p calls treesit-node-text.

And treesit_predicate_pred does the same iteration, so the :pred option 
should just be slower, due to Lisp-related overhead. funcalls and stuff.

> Otherwise, I must be missing something here, since the rest of the
> code on the C level is basically the same, give or take some wrappers
> that should not change the overall picture.

The query object is smaller, though. That's basically my only remaining 
hypothesis.

>> Switching to using :pred with function (like I did in commit
>> d94dc606a0934) which still uses buffer-substring inside is significantly
>> faster.
> 
> If the performance issue is fixed, then the only aspect that we should
> perhaps try to improve is consing.

I wouldn't say it's "fixed", just improved. And :match really should be 
able to be made faster than :pred, since it'll probably be used for 
similar cases (where a lot/most of nodes match).

There seems to be a fair amount of consing going on inside 
treesit-query-capture already: we wrap every TS node in our objects, we 
turn the captured nodes into a Lisp alist, and we turn the predicates 
into a list, turning the strings into "our" strings. The 'make_string' 
function creates a new copy in the memory, right?

One could hope to avoid recreating the list of predicates on every 
match, but that seems to be a limitation of the TS API: 
ts_query_predicates_for_pattern requires a second argument, 
match.pattern_index. Maybe we could memoize that, though?

In any case, that seems to explain why adding or avoiding one 
buffer-substring call per match isn't moving the needle very much.

> Consing a string each time you
> need to fontify increases the GC pressure, so if there's a good way of
> avoiding that without performance degradation, we should take it.  Is
> it possible to use your :pred technique in a way that doesn't need to
> produce strings from buffer text?

The only version I managed to get some (very minor) performance 
improvement is this:

(defun ruby-ts--builtin-method-p (node)
   (goto-char (treesit-node-start node))
   (let ((inhibit-changing-match-data t))
     (re-search-forward ruby-ts--builtin-methods (treesit-node-end node) 
t)))

The improvement is like 200-300ms, whereas the difference between :match 
and :pred in this benchmark is several seconds.

And if I try to bring it back to 100% correctness, to ensure that the 
whole of node text is matched, I have to use narrowing (and string-start 
and string-end anchors in regexp):

(defvar ruby-ts--builtin-methods
   (format "\\`%s\\'" (regexp-opt (append ruby-builtin-methods-no-reqs
                                          ruby-builtin-methods-with-reqs)))
   "Ruby built-in methods.")

(defun ruby-ts--builtin-method-p (node)
   (save-restriction
     (goto-char (treesit-node-start node))
     (narrow-to-region (point) (treesit-node-end node))
     (let ((inhibit-changing-match-data t))
       (re-search-forward ruby-ts--builtin-methods nil t))))

And with that, the performance is again no better than the current 
version. If I also add save-excursion, it's worse.





  parent reply	other threads:[~2023-01-26 18:07 UTC|newest]

Thread overview: 47+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-01-20  3:53 bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Dmitry Gutov
2023-01-24  4:04 ` Dmitry Gutov
2023-01-25  3:13   ` Yuan Fu
2023-01-25  3:48     ` Dmitry Gutov
2023-01-25 12:49       ` Eli Zaretskii
2023-01-25 23:21         ` Dmitry Gutov
2023-01-26  6:50           ` Eli Zaretskii
2023-01-26  7:17             ` Yuan Fu
2023-01-26  8:10               ` Eli Zaretskii
2023-01-26 17:15                 ` Dmitry Gutov
2023-01-26 18:24                   ` Eli Zaretskii
2023-01-26 19:35                     ` Dmitry Gutov
2023-01-26 20:01                       ` Eli Zaretskii
2023-01-26 21:26                         ` Dmitry Gutov
2023-01-30  0:49                           ` Dmitry Gutov
2023-01-30 14:06                             ` Eli Zaretskii
2023-01-30 14:47                               ` Dmitry Gutov
2023-01-30 15:08                                 ` Eli Zaretskii
2023-01-30 17:15                                   ` Dmitry Gutov
2023-01-30 17:49                                     ` Eli Zaretskii
2023-01-30 18:20                                       ` Dmitry Gutov
2023-01-30 18:42                                         ` Eli Zaretskii
2023-01-30 19:01                                           ` Dmitry Gutov
2023-01-30 19:05                                             ` Eli Zaretskii
2023-01-30 19:58                                               ` Dmitry Gutov
2023-01-30 23:57                                                 ` Yuan Fu
2023-01-31  0:44                                                   ` Dmitry Gutov
2023-01-31  3:23                                                 ` Eli Zaretskii
2023-01-31 18:16                                                   ` Dmitry Gutov
2023-02-01  2:39                                                     ` Dmitry Gutov
2023-02-01 13:39                                                       ` Eli Zaretskii
2023-02-01 15:13                                                         ` Dmitry Gutov
2023-02-01 21:20                                                         ` Dmitry Gutov
2023-02-02  2:16                                                           ` Yuan Fu
2023-02-02  6:34                                                           ` Eli Zaretskii
2023-02-02 12:12                                                             ` Dmitry Gutov
2023-02-02 14:23                                                               ` Eli Zaretskii
2023-02-02 17:03                                                                 ` Dmitry Gutov
2023-02-02 17:26                                                                   ` Eli Zaretskii
2023-02-02 17:53                                                                     ` Dmitry Gutov
2023-02-02 18:03                                                                       ` Eli Zaretskii
2023-02-02 19:44                                                                         ` Dmitry Gutov
2023-02-01 13:10                                                     ` Eli Zaretskii
2023-02-01 15:15                                                       ` Dmitry Gutov
2023-01-26 17:12               ` Dmitry Gutov
2023-01-26 18:07             ` Dmitry Gutov [this message]
2023-01-26 20:46               ` Dmitry Gutov

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=31559c1f-1a12-691d-3d03-f566019a0aab@yandex.ru \
    --to=dgutov@yandex.ru \
    --cc=60953@debbugs.gnu.org \
    --cc=casouri@gmail.com \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.