From: Eli Zaretskii
Newsgroups: gmane.emacs.bugs
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 19:49:45 +0200
Message-ID: <83bkmf52va.fsf@gnu.org>
To: Dmitry Gutov
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
In-Reply-To: <83e58a1b-2e4a-356a-36d8-c756ff105b62@yandex.ru> (message from Dmitry Gutov on Mon, 30 Jan 2023 19:15:07 +0200)

> Date: Mon, 30 Jan 2023 19:15:07 +0200
> Cc: casouri@gmail.com, 60953@debbugs.gnu.org
> From: Dmitry Gutov
>
> > fast_looking_at already does an anchored match, so I'm not sure I
> > follow.  I don't even understand why you need th \` part, when the
> > match will either always start from the first position or fail.
>
> The regexp might include the anchors, or it might not.
>
> It might also use a different anchor like ^ or $ or \b.

OK, but it always goes only forward, so narrowing to the beginning
shouldn't be necessary.  Right?  And you can use the LIMIT argument to
limit how far it goes forward, right?  So once again, why narrow?

> > And for \', just compare the length of the match returned by
> > fast_looking_at with the length of the text.
>
> This seems to work, i.e. even when before "carpet",
>
>    (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
>         (match-string 0))
>
> returns the full match.
> I was expecting that it could return just "car"
> -- not sure why it doesn't stop there.

Because regex search is greedy?

> One possible alternative, I suppose, would be to create a raw pointer to
> a part of the buffer text and call re_search directly specifying the
> known length of the node in bytes. If buffer text is one contiguous
> region in memory, that is.

It isn't, though: there's the gap.  Which is why doing this is not
recommended; instead, use something like search_buffer_re, which
already handles this complication for you.  (Except that
search_buffer_re is a static function, so only code in search.c can
use it.  So you'd need to make it non-static.)
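
To illustrate the gap problem, here is a toy sketch in C.  It is not
Emacs's actual buffer code: struct gap_buffer, gb_init, gb_char_at and
gb_copy are hypothetical names made up for this example, and the
fixed-size array and '_' filler are simplifying assumptions.  It only
shows that the logical text and the raw memory can differ when a gap
sits inside the range of interest.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Toy gap buffer (hypothetical, not Emacs internals).
   Memory layout: [text before gap][gap][text after gap].  */
struct gap_buffer
{
  char data[64];
  size_t gap_start;   /* index of the first byte of the gap */
  size_t gap_end;     /* index just past the last byte of the gap */
  size_t text_len;    /* number of logical characters, gap excluded */
};

/* Set up GB so that BEFORE precedes a gap of GAP_LEN bytes and AFTER
   follows it.  Unused bytes are filled with '_' to make them visible.  */
void
gb_init (struct gap_buffer *gb, const char *before, const char *after,
         size_t gap_len)
{
  size_t blen = strlen (before), alen = strlen (after);
  assert (blen + gap_len + alen <= sizeof gb->data);
  memset (gb->data, '_', sizeof gb->data);
  memcpy (gb->data, before, blen);
  gb->gap_start = blen;
  gb->gap_end = blen + gap_len;
  memcpy (gb->data + gb->gap_end, after, alen);
  gb->text_len = blen + alen;
}

/* Return the character at logical position POS (0-based), stepping
   over the gap when POS lies at or beyond it.  */
char
gb_char_at (const struct gap_buffer *gb, size_t pos)
{
  size_t gap_len = gb->gap_end - gb->gap_start;
  assert (pos < gb->text_len);
  return pos < gb->gap_start ? gb->data[pos] : gb->data[pos + gap_len];
}

/* Copy LEN logical characters starting at POS into OUT and
   NUL-terminate it.  OUT must have room for LEN + 1 bytes.  */
void
gb_copy (const struct gap_buffer *gb, size_t pos, size_t len, char *out)
{
  assert (pos + len <= gb->text_len);
  for (size_t i = 0; i < len; i++)
    out[i] = gb_char_at (gb, pos + i);
  out[len] = '\0';
}
```

With "car" before an 8-byte gap and "pet" after it, gb_copy (&gb, 0, 6,
out) produces "carpet", whereas the first 6 raw bytes of gb.data are
"car" followed by gap filler.  A raw pointer plus a byte length would
hand the latter to the matcher, which is the complication a gap-aware
search routine has to deal with.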