From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.bugs Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Date: Thu, 26 Jan 2023 23:26:54 +0200 Message-ID: <2da844d3-ea31-289e-2821-aa174e365ffd@yandex.ru> References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru> <83wn5ag4nc.fsf@gnu.org> <01b5d074-fb12-6b1f-cbfb-5e759833b854@yandex.ru> <838rhpg57n.fsf@gnu.org> <5026D975-983F-4D18-8690-BE139C92825D@gmail.com> <83pmb1emxi.fsf@gnu.org> <6f318afc-ca71-8b7e-c822-52e6635b5718@yandex.ru> <83sffxcfxw.fsf@gnu.org> <83pmb1cbg5.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 7bit Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="27528"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Cc: casouri@gmail.com, 60953@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Thu Jan 26 22:28:21 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pL9nB-0006uW-82 for geb-bug-gnu-emacs@m.gmane-mx.org; Thu, 26 Jan 2023 22:28:21 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pL9mu-0000Y6-8r; Thu, 26 Jan 2023 16:28:04 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pL9ms-0000Xu-8p for bug-gnu-emacs@gnu.org; Thu, 26 Jan 2023 16:28:02 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pL9ms-0003lb-0I for bug-gnu-emacs@gnu.org; Thu, 26 Jan 2023 16:28:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pL9mr-00042j-IJ for bug-gnu-emacs@gnu.org; Thu, 26 Jan 2023 16:28:01 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Dmitry Gutov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Thu, 26 Jan 2023 21:28:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 60953 X-GNU-PR-Package: emacs Original-Received: via spool by 60953-submit@debbugs.gnu.org id=B60953.167476842515476 (code B ref 60953); Thu, 26 Jan 2023 21:28:01 +0000 Original-Received: (at 60953) by debbugs.gnu.org; 26 Jan 2023 21:27:05 +0000 Original-Received: from localhost ([127.0.0.1]:36340 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pL9lx-00041Y-1g for submit@debbugs.gnu.org; Thu, 26 Jan 2023 16:27:05 -0500 Original-Received: from mail-ej1-f43.google.com ([209.85.218.43]:42826) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pL9lu-000414-S1 for 60953@debbugs.gnu.org; Thu, 26 Jan 2023 16:27:03 -0500 Original-Received: by mail-ej1-f43.google.com with SMTP id bk15so8663703ejb.9 for <60953@debbugs.gnu.org>; Thu, 26 Jan 2023 13:27:02 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :sender:from:to:cc:subject:date:message-id:reply-to; bh=BS3DGF0sSLKQtAa0ql9SoQjS5x+j1bcnb6YbOErguac=; b=eDBmkJe/kjYLsTCejgZAThIFGHFumDuhOGE4MwdROUAtuIKWJcIv0HKaUFlOJIKkje laMnzTX0nn9Ge9yoiTDsssvG+n5mZQc/g9rtzncbN9p8aukbKXbtTwp+UabRiPNyvzvS tO4IPHOFP6IhmeURQHO2Er/ikBMkipr9K00ygMoCpJ5N74gxmRoqeyydXYEh5m/4lGAF B0oIFh/licKsQAM7cF0ndFcYuHyp4oHm6SlVdAW2syh1MQgdHv0agjjzrmoWvVO+scK8 bzm334vn2PvUKAdnMoVxag+7BVrKF734NwQtyWpYOjgnT348aBjWL0zy5rmoT8lEieoS incA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=content-transfer-encoding:in-reply-to:from:content-language :references:cc:to:subject:user-agent:mime-version:date:message-id :sender:x-gm-message-state:from:to:cc:subject:date:message-id :reply-to; bh=BS3DGF0sSLKQtAa0ql9SoQjS5x+j1bcnb6YbOErguac=; b=XacnzGBxpMQaa4wdRdnmECruNopGQfYKTkOdX7dTOvVb913UbLTyHxZsJoHngOEWFx bdBjLrPx22wFOu9h5V2OXkms6g0Q4df5qwdcTiS96VrHDaPu3BCM7hSsiZc3IIBGrS5x OFae1bqRn6YcaAg+CGVzvkAGUELH9mE5jL6AsHWyt5o8Y6DbNwMpqMKjLC8b0NMd20eH PAECCYgEO24mtYuEC6o0zG8OXBoqoeaWuze4GQOifCpg2BqIKvAHG2gcvPbN2LhPbN/Z m3pwqE26LtpzjUIjB4PciGUOTp486qdyWI9Vwjo/Q69FLuNvvxc8pqvOEMX+5Zgh04DU wTLQ== X-Gm-Message-State: AFqh2kqHXTEltVUcq7b7fIj5GskAdiudb6qg6+V84WYJ0wE44V2YZTCx V0DVc9S8mSb650QFPzkz2EA= X-Google-Smtp-Source: AMrXdXsmtwgb8B0cUUQv+2/LS3QZ6WipMm7C5J7hS8wYm6FSwNzSRQm+jn611kQKzsPQckAxzgQmGg== X-Received: by 2002:a17:906:175a:b0:877:6713:7e99 with SMTP id d26-20020a170906175a00b0087767137e99mr31234882eje.58.1674768416834; Thu, 26 Jan 2023 13:26:56 -0800 (PST) Original-Received: from [192.168.0.2] ([46.251.119.176]) by smtp.googlemail.com with ESMTPSA id w9-20020a170906184900b007c0f217aadbsm1121822eje.24.2023.01.26.13.26.55 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 26 Jan 2023 13:26:56 -0800 (PST) Content-Language: en-US In-Reply-To: <83pmb1cbg5.fsf@gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:254236 Archived-At: On 26/01/2023 22:01, Eli Zaretskii wrote: >> Date: Thu, 26 Jan 2023 21:35:55 +0200 >> Cc:casouri@gmail.com,60953@debbugs.gnu.org >> From: Dmitry Gutov >> >>> If you are saying that GC is responsible, then running the benchmark >>> with gc-cons-threshold set to most-positive-fixnum should produce a >>> more interesting profile and perhaps a more interesting comparison. >> That really helps: >> >> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) (let >> (treesit--font-lock-fast-mode) (font-lock-ensure)))) >> >> => (16.078430587 251 5.784299419999996) >> >> (let ((gc-cons-threshold most-positive-fixnum)) (benchmark-run 1000 >> (progn (font-lock-mode -1) (font-lock-mode 1) (let >> (treesit--font-lock-fast-mode) (font-lock-ensure))))) >> >> => (10.369389725 0 0.0) >> >> Do you want a perf profile for the latter? It might not be very useful. > I'd be interested in comparing the profiles of the two techniques, the > :pred and the :match, with GC disabled like that. Curiously, :pred is still faster, but the difference is much smaller: pred: (9.212951344 0 0.0) 18.23% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status 11.61% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 11.43% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child 5.00% emacs libtree-sitter.so.0.0 [.] ts_node_start_point 4.02% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_parent_node 3.97% emacs emacs [.] re_match_2_internal 3.36% emacs libtree-sitter.so.0.0 [.] ts_language_symbol_metadata 2.45% emacs emacs [.] parse_str_as_multibyte 1.95% emacs emacs [.] exec_byte_code 1.66% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_node 1.66% emacs libtree-sitter.so.0.0 [.] ts_node_end_point 1.30% emacs emacs [.] allocate_vectorlike 1.24% emacs emacs [.] find_interval match: (10.059083317 0 0.0) 19.23% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status 12.41% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 11.22% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child 5.21% emacs libtree-sitter.so.0.0 [.] ts_node_start_point 4.22% emacs emacs [.] re_match_2_internal 3.97% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_parent_node 3.64% emacs libtree-sitter.so.0.0 [.] ts_language_symbol_metadata 2.36% emacs emacs [.] exec_byte_code 1.66% emacs libtree-sitter.so.0.0 [.] ts_node_end_point 1.62% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_node 1.34% emacs libtree-sitter.so.0.0 [.] ts_node_end_byte 1.28% emacs emacs [.] allocate_vectorlike 0.95% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_parent This is with the current code and disabled GC. No additional changes to treesit.c. >>> (But I thought you concluded that GC alone cannot explain the >>> difference in performance?) >> I'm inclined to think the difference is related to copying of the regexp >> string, but whether the time is spent in actually copying it, or >> scanning its copies for garbage later, it was harder to say. Seems like >> it's the latter, though. > If we can avoid the copying, I think it's desirable in any case. They > are constant regexps, aren't they? Yes, but how? Memoization is one possible step, but then we only avoid re-creating the predicate structures for each match. We still send a pretty large query and, apparently, get it back..? Might be some copying involved there. TBH the moderate success the memoization patch shows has me stumped.