From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 01:21:08 +0200
Message-ID: <01b5d074-fb12-6b1f-cbfb-5e759833b854@yandex.ru>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru> <83wn5ag4nc.fsf@gnu.org>
In-Reply-To: <83wn5ag4nc.fsf@gnu.org>
To: Eli Zaretskii
Cc: casouri@gmail.com, 60953@debbugs.gnu.org

On 25/01/2023 14:49, Eli Zaretskii wrote:
>> Cc: 60953@debbugs.gnu.org
>> Date: Wed, 25 Jan 2023 05:48:13 +0200
>> From: Dmitry Gutov
>>
>>> We can probably match the regexp in-place, just limit the match to the
>>> range of the node.
>>
>> That's what I tried to do in the patch attached to the first message:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#5
>>
>> But the effect on performance was surprisingly hard to notice. It also
>> broke the actual highlighting, but that's probably because the regexp
>> uses anchors \` and \', which don't really work for fast_looking_at
>> calls inside a buffer.
>
> The condition for a match in that patch is not correct, AFAIU:
>
>    if (val >= 0)
>      return true;
>    else
>      return false;
>
> It should be "if (val > 0)" instead, since fast_looking_at returns the
> number of characters that matched (unlike fast_string_match in the
> original code, which returns the _index_ of the match).

Thank you. Unfortunately, the performance improvement from this patch is
still fairly negligible, even though I got the highlighting to work by
removing the \` and \' anchors from ruby-ts--builtin-methods (reducing
the precision a little, but that's not important for the benchmark).

Switching to :pred with a function (as I did in commit d94dc606a0934),
which still uses buffer-substring inside, is significantly faster.

> Also, fast_string_match is capable of succeeding if the match begins
> not at the first character, whereas fast_looking_at does an anchored
> match. Do we expect the text to match from its beginning in this
> case? If not, I think the replacement didn't do what the original
> code does, and you should have used search_buffer or maybe
> search_buffer_re instead.

I suppose one could use a non-anchored regexp with :match, but that's
not the case with the regexp I'm using currently.

Anyway, that's only going to be important if we find something that I
missed here with this patch. Otherwise, the major bottleneck is
somewhere else.

If we do end up using it and try to get it to 100% correctness, I
suppose a combination of narrow-to-region (so that the \` and \' anchors
work) with re-search-forward can do the trick. Although I've tried using
that combination inside ruby-ts--builtin-method-p (to avoid the
buffer-substring call), and it wasn't much of an improvement in
performance either.
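
For reference, the narrow-to-region plus re-search-forward combination
mentioned above could look roughly like the sketch below. The function
name my/region-matches-p and its BEG/END arguments are hypothetical and
not taken from ruby-ts-mode.el; forcing a case-sensitive match is also an
assumption. Only standard Emacs Lisp primitives are used.

```elisp
;; Hypothetical helper; the name and signature are illustrative only.
;; Narrowing makes the \` and \' anchors in REGEXP refer to BEG and END
;; instead of the boundaries of the whole buffer.
(defun my/region-matches-p (regexp beg end)
  "Return t if REGEXP matches in the buffer text between BEG and END."
  (save-match-data
    (save-excursion
      (save-restriction
        (narrow-to-region beg end)
        (goto-char (point-min))
        ;; Assumption: the match should be case-sensitive.
        (let ((case-fold-search nil))
          ;; NOERROR is t, so a failed search returns nil
          ;; instead of signaling an error.
          (and (re-search-forward regexp nil t) t))))))
```

With a tree-sitter node, BEG and END would presumably come from
treesit-node-start and treesit-node-end. As noted above, avoiding the
buffer-substring call this way did not bring much of a speedup.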