From: Dmitry Gutov <dgutov@yandex.ru>
To: Yuan Fu <casouri@gmail.com>
Cc: Eli Zaretskii <eliz@gnu.org>, 60953@debbugs.gnu.org
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Tue, 31 Jan 2023 02:44:53 +0200
Message-ID: <a8dc0f23-92cc-69d7-c308-2ea970119d8e@yandex.ru>
In-Reply-To: <A9D3AD21-2057-4964-801C-B8966326F17F@gmail.com>
On 31/01/2023 01:57, Yuan Fu wrote:
>
>
>> On Jan 30, 2023, at 11:58 AM, Dmitry Gutov <dgutov@yandex.ru> wrote:
>>
>> On 30/01/2023 21:05, Eli Zaretskii wrote:
>>>> Date: Mon, 30 Jan 2023 21:01:02 +0200
>>>> Cc:casouri@gmail.com,60953@debbugs.gnu.org
>>>> From: Dmitry Gutov<dgutov@yandex.ru>
>>>>
>>>> But that doesn't answer the question "Could it?".
>>> I don't understand what you are asking. "Could" in what sense?
>>
>> Like, would it make sense to try to modify it that way, or extract a function that would do that, without writing it from scratch.
>>
>> Or create a new function which would reuse some common code.
>>
>> We would call the new function something like match_buffer_substring. Optionally, also expose it to Lisp.
>
> Another option is to change user/programmer’s expectation of the anchor: we could say that the regexp must match the entirety of the node text. IOW, \\` \\' are implied.

Huh, I guess that's an option too.

A couple of reasons not to do that would be:

- Potential breakage in all existing TS modes, a week (?) before we're
  going to release Emacs 29 pretest. Maybe that's okay, I can't say. But
  the breakage from that kind of change could be subtle.
- Compatibility reasons? People writing TS modes for Emacs might be
  coming from other editors/TS integrations.

While the tree-sitter docs say that predicates are not handled by the
library itself, they do show this example:
(#match? @constant "^[A-Z][A-Z_]+")
The use of '^' anchor seems to imply that the regexp doesn't have to
otherwise match the whole node text (OTOH it's not clear why the example
doesn't just say "^[A-Z]" or "^[A-Z][A-Z_]").
The doc also references the Rust crate and WebAssembly binding which
support #match?.
IIUC Rust uses "re.is_match", which is documented to use "implicit .*?
at the beginning and end", and that matches our current semantics.
WebAssembly uses "regex.test", which has the same effect.
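
To make the difference between the two semantics concrete, here is a minimal sketch in Python. Python's re module only stands in for the Emacs, Rust and JavaScript regexp engines, and the helper names match_unanchored and match_whole_node are hypothetical, not anything in treesit.c. It assumes that "current semantics" means an unanchored search and that the proposal means a match covering the entire node text.

```python
import re

# Regexp taken from the tree-sitter docs example quoted above.
PATTERN = re.compile(r"^[A-Z][A-Z_]+")


def match_unanchored(pattern, node_text):
    """Current semantics (assumed): the regexp may match any part of the text."""
    return pattern.search(node_text) is not None


def match_whole_node(pattern, node_text):
    """Proposed semantics (assumed): the regexp must cover the whole text."""
    return pattern.fullmatch(node_text) is not None


if __name__ == "__main__":
    # Print both results for a few sample node texts.
    for text in ["MAX_SIZE", "MAX_size", "max_SIZE"]:
        print(text, match_unanchored(PATTERN, text), match_whole_node(PATTERN, text))
```

A node text such as "MAX_size" is where the two diverge: the unanchored search accepts it because the prefix "MAX_" matches, while the whole-node match rejects it. Existing modes that rely on prefix matching of this kind are the ones that could break subtly.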