From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Tue, 24 Jan 2023 06:04:07 +0200
Message-ID: <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru>
References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru>
In-Reply-To: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru>
To: 60953@debbugs.gnu.org, Yuan Fu

Cc-ing Yuan, just in case.

On 20/01/2023 05:53, Dmitry Gutov wrote:
> In my benchmarking -- using this form in
> test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling
> ruby-ts-mode:
>
>   (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1)
>     (let (treesit--font-lock-fast-mode) (font-lock-ensure))))
>
> the rule added to its font-lock in commit d66ac5285f7
>
>    :language language
>    :feature 'builtin-functions
>    `((((identifier) @font-lock-builtin-face)
>       (:match ,ruby-ts--builtin-methods
>        @font-lock-builtin-face)))
>
> ...seems to have made it 50% slower.
>
> The profile looked like this:
>
>   9454  84%   - font-lock-fontify-region
>   9454  84%    - font-lock-default-fontify-region
>   8862  79%     - font-lock-fontify-syntactically-region
>   8702  78%      - treesit-font-lock-fontify-region
>    128   1%         treesit-fontify-with-override
>    123   1%         facep
>     84   0%         treesit--children-covering-range-recurse
>     60   0%       + ruby-ts--comment-font-lock
>      4   0%       + font-lock-unfontify-region
>    568   5%     + font-lock-fontify-keywords-region
>     16   0%     + font-lock-unfontify-region
>
> So there's nothing on the Lisp level to look at.

I've done some perf recordings now. It seems most/all of the difference
comes down to garbage collection. Or more concretely, time spent inside
process_mark_stack.

Without the added query, the benchmark reports:

(10.13723333 49 1.141649534999999)

And the perf top 5 is:

   17.26%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   10.83%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   10.18%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    8.37%  emacs  emacs                  [.] process_mark_stack
    4.63%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

With this simple query that colors everything:

    :language language
    :feature 'builtin-function
    `((((identifier) @font-lock-builtin-face)))

I get:

(11.993968995 82 1.9326509270000045)

Note the jump in runtime that's larger than the jump in GC.

   17.26%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   10.83%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
   10.18%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    8.37%  emacs  emacs                  [.] process_mark_stack
    4.63%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

The current query looks like this:

    :language language
    :feature 'builtin-function
    `((((identifier) @font-lock-builtin-face)
       (:pred ruby-ts--builtin-method-p @font-lock-builtin-face)))

Benchmarking:

(12.493614359 107 2.558609025999999)

   15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
   14.92%  emacs  emacs                  [.] process_mark_stack
    9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

Here we get the same jump in runtime as in GC, even though this rule
ends up coloring far fewer nodes (almost none) in the current buffer.

I interpret the results like this:

- The jump in runtime of the previous query was probably related to the
  number of nodes that need to be processed, not to the resulting
  highlighting, even though every identifier in the buffer ends up
  being colored.
- The GC overhead created by the predicates is non-negligible.

And the original query that I tried:

    :language language
    :feature 'builtin-function
    `((((identifier) @font-lock-builtin-face)
       (:match ,ruby-ts--builtin-methods
        @font-lock-builtin-face)))

Benchmarking:

(16.433451865000002 249 5.908674810000001)

   23.72%  emacs  emacs                  [.] process_mark_stack
   12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
    7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
    7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
    3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point

Here's a significant jump in GC time which is almost the same as the
difference in runtime. And all of it is spent marking?

I suppose if the problem is allocation of a large string (many times
over), the GC could be spending a lot of time scanning through the
memory.

Could this be avoided by passing some substitute handle to TS, instead
of the full string? E.g. some kind of reference to it in the regexp
cache.
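
For reference, the runtime/GC comparison can be made explicit with a
small sketch like the one below. It takes the four benchmark-run results
quoted above (elapsed seconds, number of GCs, GC seconds) and prints how
much each run grew relative to the one before it. The names
my/bench-results and my/bench-deltas are made up for illustration, and
comparing each run against the previous one is only an assumption about
how the "jumps" are meant to be read.

```elisp
;; Illustrative sketch; the my/ names are hypothetical, not part of Emacs.
(require 'cl-lib)

(defvar my/bench-results
  ;; (NAME ELAPSED-SECONDS GC-COUNT GC-SECONDS), copied from the
  ;; benchmark-run results reported in this message.
  '((baseline     10.13723333        49  1.141649534999999)
    (simple-query 11.993968995       82  1.9326509270000045)
    (pred-query   12.493614359       107 2.558609025999999)
    (match-query  16.433451865000002 249 5.908674810000001)))

(defun my/bench-deltas (results)
  "Return (NAME RUNTIME-DELTA GC-DELTA) for each entry of RESULTS.
Each delta is measured against the entry immediately before it."
  (cl-mapcar (lambda (prev cur)
               (list (car cur)
                     (- (nth 1 cur) (nth 1 prev))    ; elapsed time growth
                     (- (nth 3 cur) (nth 3 prev))))  ; GC time growth
             results (cdr results)))

;; Print one line per run so the two deltas can be compared side by side.
(dolist (d (my/bench-deltas my/bench-results))
  (message "%-12s runtime +%.2fs  GC +%.2fs"
           (nth 0 d) (nth 1 d) (nth 2 d)))
```

Read this way, the simple query adds about 1.86s of runtime but only
about 0.79s of GC, while the :match query adds about 3.94s of runtime
over the :pred one, of which about 3.35s is GC.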