From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Mon, 30 Jan 2023 19:15:07 +0200
Message-ID: <83e58a1b-2e4a-356a-36d8-c756ff105b62@yandex.ru>
To: Eli Zaretskii
Cc: casouri@gmail.com, 60953@debbugs.gnu.org

On 30/01/2023 17:08, Eli Zaretskii wrote:
>> Date: Mon, 30 Jan 2023 16:47:01 +0200
>> Cc:casouri@gmail.com,60953@debbugs.gnu.org
>> From: Dmitry Gutov
>>
>> On 30/01/2023 16:06, Eli Zaretskii wrote:
>>
>>> But why do you need to narrow there? fast_looking_at will not go
>>> beyond end_pos/end_byte anyway, there's no need to restrict it.
>> The reason for that is to be able to support the \` and \' markers in
>> REGEXP. I haven't found any alternative approach that doesn't call
>> 'substring'.
> fast_looking_at already does an anchored match, so I'm not sure I
> follow. I don't even understand why you need the \` part, when the
> match will either always start from the first position or fail.

The regexp might include the anchors, or it might not. It might also use
a different anchor like ^ or $ or \b.

See these examples from the documentation:

  ((_) @bob (#match \"^B.b$\" @bob))

  '((
     (compound_expression :anchor (_) @@first (_) :* @@rest)
     (:match "love" @@first)
     ))

> And for \', just compare the length of the match returned by
> fast_looking_at with the length of the text.

This seems to work, i.e. even when point is before "carpet",

  (and (looking-at (regexp-opt '("car" "cardigan" "carpet")))
       (match-string 0))

returns the full match. I was expecting that it could return just "car"
-- not sure why it doesn't stop there.

But again, to find out whether we need to use the end anchor at all,
we'd have to parse the regexp, remove the actual anchor before calling
fast_looking_at, and then add the above check.

One possible alternative, I suppose, would be to create a raw pointer to
a part of the buffer text and call re_search directly, specifying the
known length of the node in bytes. If buffer text is one contiguous
region in memory, that is.
This way we would regexp test against a string (not a buffer), but without creating a separate string object.
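
To make the last idea concrete, here is a rough sketch of matching against a pointer-plus-length span without copying it into a separate string. It uses glibc's GNU regex interface (re_compile_pattern and re_search from <regex.h>), not Emacs's own regex-emacs.c, and it assumes the span is contiguous in memory. The helper name match_span and the choice of POSIX extended syntax are only for illustration. In Emacs itself the buffer text can be split by the gap, which is the case the two-string re_search_2 variant exists for.

```c
#define _GNU_SOURCE             /* expose the GNU re_* interface in glibc */
#include <assert.h>
#include <regex.h>
#include <string.h>

/* Hypothetical helper: return 1 if PATTERN matches somewhere within the
   LEN bytes starting at TEXT, 0 if it does not, and -1 if PATTERN fails
   to compile.  TEXT need not be NUL-terminated, so it may point into
   the middle of a larger buffer; no copy of the span is made.  */
static int
match_span (const char *pattern, const char *text, size_t len)
{
  assert (pattern != NULL && text != NULL);

  struct re_pattern_buffer pat;
  memset (&pat, 0, sizeof pat);  /* no preallocated buffer, fastmap or translate table */

  /* Assumption for this sketch: POSIX extended syntax.  */
  re_set_syntax (RE_SYNTAX_POSIX_EXTENDED);

  /* re_compile_pattern returns NULL on success, else an error message.  */
  if (re_compile_pattern (pattern, strlen (pattern), &pat) != NULL)
    return -1;

  /* Search only within [TEXT, TEXT + LEN).  The edges of the span act
     as the beginning and end of the string for anchoring purposes.  */
  regoff_t pos = re_search (&pat, text, (regoff_t) len, 0,
                            (regoff_t) len, NULL);
  regfree (&pat);
  return pos >= 0;
}
```

With the text "int Bob = 1;", match_span ("^B.b$", text + 4, 3) succeeds because the anchors apply to the edges of the three-byte span, while the same pattern against the whole text fails. That is the behaviour the narrowing (or the 'substring' call) was providing.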