From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.bugs Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Date: Mon, 30 Jan 2023 02:49:47 +0200 Message-ID: <6784f9e7-844b-374d-2a1e-8a61cebe0d7e@yandex.ru> References: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> <04729838-b7d4-8a08-2b71-12536a28aebb@yandex.ru> <83wn5ag4nc.fsf@gnu.org> <01b5d074-fb12-6b1f-cbfb-5e759833b854@yandex.ru> <838rhpg57n.fsf@gnu.org> <5026D975-983F-4D18-8690-BE139C92825D@gmail.com> <83pmb1emxi.fsf@gnu.org> <6f318afc-ca71-8b7e-c822-52e6635b5718@yandex.ru> <83sffxcfxw.fsf@gnu.org> <83pmb1cbg5.fsf@gnu.org> <2da844d3-ea31-289e-2821-aa174e365ffd@yandex.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------FtNCE7uFJJvPd9u0gZl5ikKW" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34732"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 Cc: casouri@gmail.com, 60953@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Jan 30 01:50:16 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pMINE-0008rB-56 for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 30 Jan 2023 01:50:16 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pMIN2-0005Ij-KN; Sun, 29 Jan 2023 19:50:04 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pMIN0-0005IM-Vb for bug-gnu-emacs@gnu.org; Sun, 29 Jan 2023 19:50:02 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pMIN0-0002Qi-KB for bug-gnu-emacs@gnu.org; Sun, 29 Jan 2023 19:50:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pMIN0-0007sE-29 for bug-gnu-emacs@gnu.org; Sun, 29 Jan 2023 19:50:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Dmitry Gutov Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 30 Jan 2023 00:50:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 60953 X-GNU-PR-Package: emacs Original-Received: via spool by 60953-submit@debbugs.gnu.org id=B60953.167503979930254 (code B ref 60953); Mon, 30 Jan 2023 00:50:01 +0000 Original-Received: (at 60953) by debbugs.gnu.org; 30 Jan 2023 00:49:59 +0000 Original-Received: from localhost ([127.0.0.1]:45616 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pMIMx-0007ru-3i for submit@debbugs.gnu.org; Sun, 29 Jan 2023 19:49:59 -0500 Original-Received: from mail-wm1-f53.google.com ([209.85.128.53]:51140) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pMIMu-0007rg-Rw for 60953@debbugs.gnu.org; Sun, 29 Jan 2023 19:49:57 -0500 Original-Received: by mail-wm1-f53.google.com with SMTP id bg26so1142981wmb.0 for <60953@debbugs.gnu.org>; Sun, 29 Jan 2023 16:49:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=in-reply-to:references:cc:to:from:content-language:subject :user-agent:mime-version:date:message-id:sender:from:to:cc:subject :date:message-id:reply-to; bh=ccMJ3JxQbMNi8a0+irMIdsxaWvzxPV5RMJaE5jeftp0=; b=Qs9To49G1i0hEEl2RdkNgPCAcgi3rMNzz03fMYW9/NldMSoLudpp3FB4e22qmLHVVH zKbA3JeVGFg6y4wO9mbnzFO3hEDw3o5Y9/sSoe39+g1o9vcEeGVTwPnm8ik5fvFm7D1G krfTzRFueyWRg6ZlhWLIOBLg0Y45rfqqnZv8tXZ2CkBOfyRjbqrTNrwyYmMCvY4ZmzBm 1InLYTB8IuHzU40FsT0gsEHMQBwa0eoBV5Cfj2eYkZOl9zs+Z45sXgaHEN9Wkx+AcdMv HSe3sLWfcZSGRDa7bh+t+ycRH3MlsapA7gdGrCOuZxfyhGVyueXImJDydOnL24twJVEF nzfQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=in-reply-to:references:cc:to:from:content-language:subject :user-agent:mime-version:date:message-id:sender:x-gm-message-state :from:to:cc:subject:date:message-id:reply-to; bh=ccMJ3JxQbMNi8a0+irMIdsxaWvzxPV5RMJaE5jeftp0=; b=FiJN4M3tbj/kA7r0fAiqBWwPDLeEjCuJtaMd/KbrswvkXCsjBF792pyy9ZProw95b1 dy8AuC/JHv8IK7efsRks3KgK9W9phLi2G7jlA8/GTku+ukFh7tzsOP5UMZfwXOl9dcD1 zpjZr21a84lxincNrzM+QN5ypGFsFfHFM6+sf8hM7YfrxL9OY6qz9Z3KKG0F1lMlgKY9 LpO0YGAzJR2PIIWJo0OvXYCdtONfXRkv4IkU4MVVVuMQTaEjiNRw9+kv+LzjMMMi5Eun NYkWN+frJvH4x/VV7SqH8/6URb7Be4O02cJm2URSfx2Rg2MPh/P/chvdF6F9dDFF6+H6 Y4Uw== X-Gm-Message-State: AFqh2kqxG5RHbf2N2F2FU+76WmGvcl7bzIGH/61iyP0nq0e9yv7k0nuO DJ2OX9oOwbFpr7S30C6I1QE= X-Google-Smtp-Source: AMrXdXscN4xgE4HJaKMvzP3NYA33iT0iAVy/7/ksuuAseJtluurk0rMJl4vcZrH/ojnT16JtxOO39w== X-Received: by 2002:a05:600c:601c:b0:3d9:ee01:60a4 with SMTP id az28-20020a05600c601c00b003d9ee0160a4mr48658374wmb.1.1675039790753; Sun, 29 Jan 2023 16:49:50 -0800 (PST) Original-Received: from [192.168.0.2] ([46.251.119.176]) by smtp.googlemail.com with ESMTPSA id h15-20020a05600c2caf00b003d974076f13sm12525596wmc.3.2023.01.29.16.49.49 (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Sun, 29 Jan 2023 16:49:49 -0800 (PST) Content-Language: en-US In-Reply-To: <2da844d3-ea31-289e-2821-aa174e365ffd@yandex.ru> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:254387 Archived-At: This is a multi-part message in MIME format. --------------FtNCE7uFJJvPd9u0gZl5ikKW Content-Type: text/plain; charset=UTF-8; format=flowed Content-Transfer-Encoding: 8bit On 26/01/2023 23:26, Dmitry Gutov wrote: >>>> (But I thought you concluded that GC alone cannot explain the >>>> difference in performance?) >>> I'm inclined to think the difference is related to copying of the regexp >>> string, but whether the time is spent in actually copying it, or >>> scanning its copies for garbage later, it was harder to say. Seems like >>> it's the latter, though. >> If we can avoid the copying, I think it's desirable in any case.  They >> are constant regexps, aren't they? > > Yes, but how? > > Memoization is one possible step, but then we only avoid re-creating the > predicate structures for each match. We still send a pretty large query > and, apparently, get it back..? Might be some copying involved there. > > TBH the moderate success the memoization patch shows has me stumped. Okay, I have cleaned up both experiments that I had. And when combined, they make the :match approach a little faster than the :pred one. I'm still not sure why the difference is so little, given that the :pred one has Lisp funcalls and extra allocation, and :match does not. Still, if nobody has any better ideas, I suggest we install both of these changes now. They are attached in separate patches. memoize_vector.diff improves the performance of both cases. For :pred, it's roughly 10%; for :match, it's more. treesit_predicate_match.diff improves the performance of the latter, though only a little: maybe 3-4%. Code review welcome. Is applying (and undoing) the narrowing this way legal enough? Or should I go through some error handlers, or ensure blocks, etc? Speaking of pref, the profile looks like this now (very similar to what it was before the added rule): 17.25% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_current_status 10.93% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_next_sibling 9.89% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_goto_first_child 9.01% emacs emacs [.] process_mark_stack 4.80% emacs libtree-sitter.so.0.0 [.] ts_node_start_point 3.84% emacs emacs [.] re_match_2_internal 3.82% emacs libtree-sitter.so.0.0 [.] ts_tree_cursor_parent_node 3.06% emacs libtree-sitter.so.0.0 [.] ts_language_symbol_metadata --------------FtNCE7uFJJvPd9u0gZl5ikKW Content-Type: text/x-patch; charset=UTF-8; name="memoize_vector.diff" Content-Disposition: attachment; filename="memoize_vector.diff" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IGIyMTBl YzA5MjNhLi43MWFmZjMyMDJhZSAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z cmMvdHJlZXNpdC5jCkBAIC0yNzIwLDggKzI3NDQsMTAgQEAgREVGVU4gKCJ0cmVlc2l0LXF1 ZXJ5LWNhcHR1cmUiLAogICAgICBldmVyeSBmb3IgbG9vcCBhbmQgbmNvbmMgaXQgdG8gUkVT VUxUIGV2ZXJ5IHRpbWUuICBUaGF0IGlzIGluZGVlZAogICAgICB0aGUgaW5pdGlhbCBpbXBs ZW1lbnRhdGlvbiBpbiB3aGljaCBZb2F2IGZvdW5kIG5jb25jIGJlaW5nIHRoZQogICAgICBi b3R0bGVuZWNrICg5OC40JSBvZiB0aGUgcnVubmluZyB0aW1lIHNwZW50IG9uIG5jb25jKS4g ICovCisgIHVpbnQzMl90IHBhdHRlcm5zX2NvdW50ID0gdHNfcXVlcnlfcGF0dGVybl9jb3Vu dCh0cmVlc2l0X3F1ZXJ5KTsKICAgTGlzcF9PYmplY3QgcmVzdWx0ID0gUW5pbDsKICAgTGlz cF9PYmplY3QgcHJldl9yZXN1bHQgPSByZXN1bHQ7CisgIExpc3BfT2JqZWN0IHByZWRpY2F0 ZXNfdGFibGUgPSBtYWtlX3ZlY3RvcihwYXR0ZXJuc19jb3VudCwgUXQpOwogICB3aGlsZSAo dHNfcXVlcnlfY3Vyc29yX25leHRfbWF0Y2ggKGN1cnNvciwgJm1hdGNoKSkKICAgICB7CiAg ICAgICAvKiBSZWNvcmQgdGhlIGNoZWNrcG9pbnQgdGhhdCB3ZSBtYXkgcm9sbCBiYWNrIHRv LiAgKi8KQEAgLTI3NTAsOSArMjc3NiwxMiBAQCBERUZVTiAoInRyZWVzaXQtcXVlcnktY2Fw dHVyZSIsCiAJICByZXN1bHQgPSBGY29ucyAoY2FwLCByZXN1bHQpOwogCX0KICAgICAgIC8q IEdldCBwcmVkaWNhdGVzLiAgKi8KLSAgICAgIExpc3BfT2JqZWN0IHByZWRpY2F0ZXMKLQk9 IHRyZWVzaXRfcHJlZGljYXRlc19mb3JfcGF0dGVybiAodHJlZXNpdF9xdWVyeSwKLQkJCQkJ ICBtYXRjaC5wYXR0ZXJuX2luZGV4KTsKKyAgICAgIExpc3BfT2JqZWN0IHByZWRpY2F0ZXMg PSBBUkVGKHByZWRpY2F0ZXNfdGFibGUsIG1hdGNoLnBhdHRlcm5faW5kZXgpOworICAgICAg aWYgKEVRIChwcmVkaWNhdGVzLCBRdCkpCisJeworCSAgcHJlZGljYXRlcyA9IHRyZWVzaXRf cHJlZGljYXRlc19mb3JfcGF0dGVybiAodHJlZXNpdF9xdWVyeSwgMCk7CisJICBBU0VUKHBy ZWRpY2F0ZXNfdGFibGUsIG1hdGNoLnBhdHRlcm5faW5kZXgsIHByZWRpY2F0ZXMpOworCX0K IAogICAgICAgLyogY2FwdHVyZXNfbGlzcCA9IEZucmV2ZXJzZSAoY2FwdHVyZXNfbGlzcCk7 ICovCiAgICAgICBzdHJ1Y3QgY2FwdHVyZV9yYW5nZSBjYXB0dXJlc19yYW5nZSA9IHsgcmVz dWx0LCBwcmV2X3Jlc3VsdCB9Owo= --------------FtNCE7uFJJvPd9u0gZl5ikKW Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff" Content-Disposition: attachment; filename="treesit_predicate_match.diff" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IGIyMTBl YzA5MjNhLi4zNjMwZGI0MmY1ZSAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z cmMvdHJlZXNpdC5jCkBAIC0yNDY2LDEwICsyNDY2LDM0IEBAIHRyZWVzaXRfcHJlZGljYXRl X21hdGNoIChMaXNwX09iamVjdCBhcmdzLCBzdHJ1Y3QgY2FwdHVyZV9yYW5nZSBjYXB0dXJl cykKIAkgICAgICBidWlsZF9zdHJpbmcgKCJUaGUgc2Vjb25kIGFyZ3VtZW50IHRvIGBtYXRj aCcgc2hvdWxkICIKIAkJICAgICAgICAgICAgImJlIGEgY2FwdHVyZSBuYW1lLCBub3QgYSBz dHJpbmciKSk7CiAKLSAgTGlzcF9PYmplY3QgdGV4dCA9IHRyZWVzaXRfcHJlZGljYXRlX2Nh cHR1cmVfbmFtZV90b190ZXh0IChjYXB0dXJlX25hbWUsCi0JCQkJCQkJICAgICBjYXB0dXJl cyk7CisgIExpc3BfT2JqZWN0IG5vZGUgPSB0cmVlc2l0X3ByZWRpY2F0ZV9jYXB0dXJlX25h bWVfdG9fbm9kZSAoY2FwdHVyZV9uYW1lLCBjYXB0dXJlcyk7CiAKLSAgaWYgKGZhc3Rfc3Ry aW5nX21hdGNoIChyZWdleHAsIHRleHQpID49IDApCisgIHN0cnVjdCBidWZmZXIgKm9sZF9i dWZmZXIgPSBjdXJyZW50X2J1ZmZlcjsKKyAgc3RydWN0IGJ1ZmZlciAqYnVmZmVyID0gWEJV RkZFUiAoWFRTX1BBUlNFUiAoWFRTX05PREUgKG5vZGUpLT5wYXJzZXIpLT5idWZmZXIpOwor ICBzZXRfYnVmZmVyX2ludGVybmFsIChidWZmZXIpOworCisgIFRTTm9kZSB0cmVlc2l0X25v ZGUgPSBYVFNfTk9ERSAobm9kZSktPm5vZGU7CisgIHB0cmRpZmZfdCB2aXNpYmxlX2JlZyA9 IFhUU19QQVJTRVIgKFhUU19OT0RFIChub2RlKS0+cGFyc2VyKS0+dmlzaWJsZV9iZWc7Cisg IHVpbnQzMl90IHN0YXJ0X2J5dGVfb2Zmc2V0ID0gdHNfbm9kZV9zdGFydF9ieXRlICh0cmVl c2l0X25vZGUpOworICB1aW50MzJfdCBlbmRfYnl0ZV9vZmZzZXQgPSB0c19ub2RlX2VuZF9i eXRlICh0cmVlc2l0X25vZGUpOworICBwdHJkaWZmX3Qgc3RhcnRfYnl0ZSA9IHZpc2libGVf YmVnICsgc3RhcnRfYnl0ZV9vZmZzZXQ7CisgIHB0cmRpZmZfdCBlbmRfYnl0ZSA9IHZpc2li bGVfYmVnICsgZW5kX2J5dGVfb2Zmc2V0OworICBwdHJkaWZmX3Qgc3RhcnRfcG9zID0gYnVm X2J5dGVwb3NfdG9fY2hhcnBvcyAoYnVmZmVyLCBzdGFydF9ieXRlKTsKKyAgcHRyZGlmZl90 IGVuZF9wb3MgPSBidWZfYnl0ZXBvc190b19jaGFycG9zIChidWZmZXIsIGVuZF9ieXRlKTsK KyAgcHRyZGlmZl90IG9sZF9iZWd2ID0gQkVHVjsKKyAgcHRyZGlmZl90IG9sZF96diA9IFpW OworCisgIFNFVF9CVUZfQkVHVihidWZmZXIsIHN0YXJ0X3Bvcyk7CisgIFNFVF9CVUZfWlYo YnVmZmVyLCBlbmRfcG9zKTsKKworICBwdHJkaWZmX3QgdmFsID0gZmFzdF9sb29raW5nX2F0 IChyZWdleHAsIHN0YXJ0X3Bvcywgc3RhcnRfYnl0ZSwgZW5kX3BvcywgZW5kX2J5dGUsIFFu aWwpOworCisgIFNFVF9CVUZfQkVHVihidWZmZXIsIG9sZF9iZWd2KTsKKyAgU0VUX0JVRl9a VihidWZmZXIsIG9sZF96dik7CisKKyAgc2V0X2J1ZmZlcl9pbnRlcm5hbCAob2xkX2J1ZmZl cik7CisKKyAgaWYgKHZhbCA+IDApCiAgICAgcmV0dXJuIHRydWU7CiAgIGVsc2UKICAgICBy ZXR1cm4gZmFsc2U7Cg== --------------FtNCE7uFJJvPd9u0gZl5ikKW--