From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Yuan Fu Newsgroups: gmane.emacs.bugs Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Date: Tue, 24 Jan 2023 19:13:29 -0800 Message-ID: References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru> Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.300.101.1.3\)) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="15851"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 60953@debbugs.gnu.org To: Dmitry Gutov Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed Jan 25 04:14:20 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pKWEt-0003wc-VM for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 25 Jan 2023 04:14:20 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pKWEi-000324-P8; Tue, 24 Jan 2023 22:14:08 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pKWEd-00031m-S4 for bug-gnu-emacs@gnu.org; Tue, 24 Jan 2023 22:14:03 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pKWEc-00035s-7U for bug-gnu-emacs@gnu.org; Tue, 24 Jan 2023 22:14:03 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pKWEc-0006ed-27 for bug-gnu-emacs@gnu.org; Tue, 24 Jan 2023 22:14:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Yuan Fu Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 25 Jan 2023 03:14:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 60953 X-GNU-PR-Package: emacs Original-Received: via spool by 60953-submit@debbugs.gnu.org id=B60953.167461643025548 (code B ref 60953); Wed, 25 Jan 2023 03:14:02 +0000 Original-Received: (at 60953) by debbugs.gnu.org; 25 Jan 2023 03:13:50 +0000 Original-Received: from localhost ([127.0.0.1]:57981 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pKWEP-0006e0-RD for submit@debbugs.gnu.org; Tue, 24 Jan 2023 22:13:50 -0500 Original-Received: from mail-pj1-f44.google.com ([209.85.216.44]:36564) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pKWEN-0006dk-Gr for 60953@debbugs.gnu.org; Tue, 24 Jan 2023 22:13:48 -0500 Original-Received: by mail-pj1-f44.google.com with SMTP id e10-20020a17090a630a00b0022bedd66e6dso647545pjj.1 for <60953@debbugs.gnu.org>; Tue, 24 Jan 2023 19:13:47 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=ZxnLshmRBdXkTCPmScFG4XWddbad1R3badRfjSwGSX8=; b=g5H2C1scYa2Uo71baaoVqV8jhRuqhNOVWcwBebrdGITKptG6wndHEgrWwOp0FEMYFy pgUZxtTl93aKJzq5zfDx2wwxyOzwXNMEKdRIsUyGoqtWYZnli35Sb+6nyq8w12fQyP/B wZIclQkykUPlBksexhaQMz38szuUMYTpD3Y0W+8TTrNtLxBIzfusrHe6k/F6TEL3jj0s 4QhrukA6TO2HhzgoLLKg6v/krZT7NMCsM29n+YkGD9zfY0pbq7PoqYTSbTOWu613DPKa I16RYhpCogfEdCNDXeqHFE1RH7HptM8FFt28taFSUDUf02924+tWW4E4kwhozebGyVGu 3oog== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=ZxnLshmRBdXkTCPmScFG4XWddbad1R3badRfjSwGSX8=; b=EZ1BZctgaLY8w7pLA91tjtOpNAUyylJ/lTHbX2KXnkqt2MbK3aP4aA42sbzCANSfij 0HgcmJfwOpOzYgPOkjOAArKMH4EW+69cW0JPQxzME6nn9D2VfPmcvYvoOLmUed211cyG dAF3OQ2u/F5EoLxvhQBvFdMsb3V+eWGI50pV6WEsPfb7QwTRZ2CvOt1gR1uRJjI+nSxj PDj8IpWS5VGbghfBopK8NXLBwZn3dbsz+5c0CdWCYfLNwDiY+NIlnrrWAFZ2kuamlkOD THwadr11yz1IhCRtZUuHDyApqBBf1Eo+wSRSCom131xTG/36LmlgKzM4GRSm2J9NSkex 91tA== X-Gm-Message-State: AFqh2kr2kVA8nE3LGeeS42JbT4dOCxDeYJxdXI/6KKCy4SjGtdEFKufu 1GGD0ibDi6eQMNuhcWqj+8Q= X-Google-Smtp-Source: AMrXdXuTH2iBpfccSedj0iTUa5ZxKzvV/dr8neaokw32AZHJrHErZdnBlQbEuDVeJoXA2ObCL7E4hQ== X-Received: by 2002:a05:6a21:788c:b0:b5:f6de:e299 with SMTP id bf12-20020a056a21788c00b000b5f6dee299mr40624567pzc.35.1674616421409; Tue, 24 Jan 2023 19:13:41 -0800 (PST) Original-Received: from smtpclient.apple (cpe-172-117-161-177.socal.res.rr.com. [172.117.161.177]) by smtp.gmail.com with ESMTPSA id x3-20020a63b343000000b004b1b9e23790sm2076551pgt.92.2023.01.24.19.13.40 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Tue, 24 Jan 2023 19:13:41 -0800 (PST) In-Reply-To: <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru> X-Mailer: Apple Mail (2.3731.300.101.1.3) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:254094 Archived-At: > On Jan 23, 2023, at 8:04 PM, Dmitry Gutov wrote: >=20 > Cc-ing Yuan, just in case. >=20 > On 20/01/2023 05:53, Dmitry Gutov wrote: >> In my benchmarking -- using this form in = test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling = ruby-ts-mode: >> (benchmark-run 1000 (progn (font-lock-mode -1) (font-lock-mode 1) = (let (treesit--font-lock-fast-mode) (font-lock-ensure)))) >> the rule added to its font-lock in commit d66ac5285f7 >> :language language >> :feature 'builtin-functions >> `((((identifier) @font-lock-builtin-face) >> (:match ,ruby-ts--builtin-methods >> @font-lock-builtin-face))) >> ...seems to have made it 50% slower. >> The profile looked like this: >> 9454 84% - font-lock-fontify-region >> 9454 84% - font-lock-default-fontify-region >> 8862 79% - = font-lock-fontify-syntactically-region >> 8702 78% - treesit-font-lock-fontify-region >> 128 1% treesit-fontify-with-override >> 123 1% facep >> 84 0% treesit--children-covering-range-recurse >> 60 0% + ruby-ts--comment-font-lock >> 4 0% + font-lock-unfontify-region >> 568 5% + font-lock-fontify-keywords-region >> 16 0% + font-lock-unfontify-region >> So there's nothing on the Lisp level to look at. >=20 > I've done some perf recordings now. It seems most/all of the = difference comes down to garbage collection. Or more concretely, time = spent inside process_mark_stack. >=20 > Without the added query benchmark reports: >=20 > (10.13723333 49 1.141649534999999) >=20 > And the perf top5 is: >=20 > 17.26% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_current_status > 10.83% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_next_sibling > 10.18% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_first_child > 8.37% emacs emacs [.] = process_mark_stack > 4.63% emacs libtree-sitter.so.0.0 [.] = ts_node_start_point >=20 > With this simple query that colors everything: >=20 > :language language > :feature 'builtin-function > `((((identifier) @font-lock-builtin-face))) >=20 > I get: >=20 > (11.993968995 82 1.9326509270000045) >=20 > Note the jump in runtime that's larger than the jump in GC. >=20 > 17.26% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_current_status > 10.83% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_next_sibling > 10.18% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_first_child > 8.37% emacs emacs [.] = process_mark_stack > 4.63% emacs libtree-sitter.so.0.0 [.] = ts_node_start_point >=20 > The current query looks like this: >=20 > :language language > :feature 'builtin-function > `((((identifier) @font-lock-builtin-face) > (:pred ruby-ts--builtin-method-p @font-lock-builtin-face))) >=20 > Benchmarking: >=20 > (12.493614359 107 2.558609025999999) >=20 > 15.30% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_current_status > 14.92% emacs emacs [.] = process_mark_stack > 9.75% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_next_sibling > 8.90% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_first_child > 3.87% emacs libtree-sitter.so.0.0 [.] = ts_node_start_point >=20 > Here we get the same jump in runtime as in GC. Even though this rule = ends up coloring much fewer (almost none) nodes in the current buffer. I = interpret the results like this: >=20 > - The jump in runtime of the previous query was probably related to = the number of nodes needed to be processed, but not with the resulting = highlighting, even though every identifier in the buffer ends up being = colored. >=20 > - The GC overhead created by the predicates is non-negligible. >=20 > And the original query that I tried: >=20 > :language language > :feature 'builtin-function > `((((identifier) @font-lock-builtin-face) > (:match ,ruby-ts--builtin-methods @font-lock-builtin-face))) >=20 > Benchmarking: >=20 > (16.433451865000002 249 5.908674810000001) >=20 > 23.72% emacs emacs [.] process_mark_stack > 12.33% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_current_status > 7.96% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_next_sibling > 7.38% emacs libtree-sitter.so.0.0 [.] = ts_tree_cursor_goto_first_child > 3.37% emacs libtree-sitter.so.0.0 [.] = ts_node_start_point >=20 > Here's a significant jump in GC time which is almost the same as the = difference in runtime. And all of it is spent marking? >=20 > I suppose if the problem is allocation of a large string (many times = over), the GC could be spending a lot of time scanning through the = memory. Could this be avoided by passing some substitute handle to TS, = instead of the full string? E.g. some kind of reference to it in the = regexp cache. >=20 FYI the predicates are not processed by tree-sitter, but by us. For = example, the #equal predicate is handled by treesit_predicate_equal. For = #match, right now we create a string with Fbuffer_substring and pass it = to fast_string_match, so it definitely causes a lot of gc=E2=80=99s, as = you observed. We can probably match the regexp in-place, just limit the = match to the range of the node. Yuan