From: Dmitry Gutov
Newsgroups: gmane.emacs.bugs
Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient
Date: Thu, 26 Jan 2023 21:35:55 +0200
To: Eli Zaretskii
Cc: casouri@gmail.com, 60953@debbugs.gnu.org
In-Reply-To: <83sffxcfxw.fsf@gnu.org>

On 26/01/2023 20:24, Eli Zaretskii wrote:
>> Date: Thu, 26 Jan 2023 19:15:51 +0200
>> Cc: 60953@debbugs.gnu.org
>> From: Dmitry Gutov
>>
>> On 26/01/2023 10:10, Eli Zaretskii wrote:
>>> Perhaps Dmitry could present comparison of profiles from perf which
>>> would allow us to understand the reason(s)?
>> I believe I did that in the second message in this thread:
>> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60953#8
>>
>> To quote the specific profiles, it's
>>
>>   15.30%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>>   14.92%  emacs  emacs                  [.] process_mark_stack
>>    9.75%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>>    8.90%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>>    3.87%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>>
>> for :pred vs.
>>
>>   23.72%  emacs  emacs                  [.] process_mark_stack
>>   12.33%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_current_status
>>    7.96%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_next_sibling
>>    7.38%  emacs  libtree-sitter.so.0.0  [.] ts_tree_cursor_goto_first_child
>>    3.37%  emacs  libtree-sitter.so.0.0  [.] ts_node_start_point
>>
>> for :match.
>>
>> And to continue the quote:
>>
>> Here's a significant jump in GC time which is almost the same as the
>> difference in runtime. And all of it is spent marking?
>>
>> I suppose if the problem is allocation of a large string (many times
>> over), the GC could be spending a lot of time scanning through the
>> memory. Could this be avoided by passing some substitute handle to TS,
>> instead of the full string? E.g. some kind of reference to it in the
>> regexp cache.
> If you are saying that GC is responsible, then running the benchmark
> with gc-cons-threshold set to most-positive-fixnum should produce a
> more interesting profile and perhaps a more interesting comparison.

That really helps:

(benchmark-run 1000
  (progn
    (font-lock-mode -1)
    (font-lock-mode 1)
    (let (treesit--font-lock-fast-mode)
      (font-lock-ensure))))

=> (16.078430587 251 5.784299419999996)

(let ((gc-cons-threshold most-positive-fixnum))
  (benchmark-run 1000
    (progn
      (font-lock-mode -1)
      (font-lock-mode 1)
      (let (treesit--font-lock-fast-mode)
        (font-lock-ensure)))))

=> (10.369389725 0 0.0)

Do you want a perf profile for the latter? It might not be very useful.

> (But I thought you concluded that GC alone cannot explain the
> difference in performance?)

I'm inclined to think the difference is related to copying of the regexp
string, but whether the time is spent in actually copying it, or
scanning its copies for garbage later, was harder to say. Seems like
it's the latter, though.
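
As a rough cross-check of the two results above, here is a minimal sketch that compares the wall-clock difference between the runs with the GC time reported for the first one. It assumes only that `benchmark-run' returns a list of the form (ELAPSED GC-COUNT GC-ELAPSED); the helper name `my/gc-share' is hypothetical and not part of Emacs.

```elisp
;; Sketch only: `my/gc-share' is a hypothetical helper, not an Emacs API.
;; Each argument is a `benchmark-run' result: (ELAPSED GC-COUNT GC-ELAPSED).
(defun my/gc-share (with-gc without-gc)
  "Return (DELTA GC-TIME RATIO) for two `benchmark-run' results.
WITH-GC is the run with the default `gc-cons-threshold';
WITHOUT-GC is the run with GC effectively disabled.
DELTA is the difference in elapsed time, GC-TIME is the GC time
reported for WITH-GC, and RATIO is GC-TIME divided by DELTA."
  (let* ((delta (- (nth 0 with-gc) (nth 0 without-gc)))
         (gc-time (nth 2 with-gc)))
    (list delta gc-time (/ gc-time delta))))

;; The two results quoted above.
(my/gc-share '(16.078430587 251 5.784299419999996)
             '(10.369389725 0 0.0))
```

With those inputs the elapsed-time difference comes to roughly 5.71 s against 5.78 s of reported GC time, a ratio close to 1, which is consistent with GC accounting for essentially all of the difference between the two runs.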