From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 19:15:51 +0200
Message-ID: <6f318afc-ca71-8b7e-c822-52e6635b5718@yandex.ru>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru> <83wn5ag4nc.fsf@gnu.org> <01b5d074-fb12-6b1f-cbfb-5e759833b854@yandex.ru> <838rhpg57n.fsf@gnu.org> <5026D975-983F-4D18-8690-BE139C92825D@gmail.com> <83pmb1emxi.fsf@gnu.org>
To: Eli Zaretskii, Yuan Fu
Cc: 60953@debbugs.gnu.org
In-Reply-To: <83pmb1emxi.fsf@gnu.org>

On 26/01/2023 10:10, Eli Zaretskii wrote:
>> From: Yuan Fu
>> Date: Wed, 25 Jan 2023 23:17:25 -0800
>> Cc: Dmitry Gutov,
>>  60953@debbugs.gnu.org
>>
>>>> Switching to using :pred with function (like I did in commit
>>>> d94dc606a0934) which still uses buffer-substring inside is significantly
>>>> faster.
>>> If the performance issue is fixed, then the only aspect that we should
>>> perhaps try to improve is consing. Consing a string each time you
>>> need to fontify increases the GC pressure, so if there's a good way of
>>> avoiding that without performance degradation, we should take it. Is
>>> it possible to use your :pred technique in a way that doesn't need to
>>> produce strings from buffer text?
>> Why is :pred more performant though? They just use string-match-p. If anything, the :pred predicates should be more expensive, since they execute lisp functions and conses tree-sitter nodes into lisp objects.
> Yes, exactly my thoughts.
>
> Perhaps Dmitry could present comparison of profiles from perf which
> would allow us to understand the reason(s)?

I believe I did that in the second message in this thread: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8

To quote the specific profiles, it's

   15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   14.92%  emacs  emacs                  [.] process_mark_stack
    9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

for :pred vs.

   23.72%  emacs  emacs                  [.] process_mark_stack
   12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
    7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

for :match. And to continue the quote:

Here's a significant jump in GC time which is almost the same as the difference in runtime. And all of it is spent marking?
I suppose if the problem is allocation of a large string (many times over), the GC could be spending a lot of time scanning through the memory. Could this be avoided by passing some substitute handle to TS, instead of the full string? E.g. some kind of reference to it in the regexp cache.