From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Dmitry Gutov Newsgroups: gmane.emacs.bugs Subject: bug#60953: The :match predicate with large regexp in tree-sitter font-lock seems inefficient Date: Fri, 20 Jan 2023 05:53:12 +0200 Message-ID: <7624dddc-4600-9a03-ac8b-d3c9e0ab618c@yandex.ru> Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="------------UQx0D8HGepFSU5bBQqKxNJmr" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="31855"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.4.2 To: 60953@debbugs.gnu.org Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri Jan 20 04:54:21 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pIiTt-00087k-It for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 20 Jan 2023 04:54:21 +0100 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pIiTd-0001lE-3T; Thu, 19 Jan 2023 22:54:05 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pIiTb-0001l0-70 for bug-gnu-emacs@gnu.org; Thu, 19 Jan 2023 22:54:03 -0500 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pIiTa-0005Hr-Sx for bug-gnu-emacs@gnu.org; Thu, 19 Jan 2023 22:54:02 -0500 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pIiTa-0002SM-K8 for bug-gnu-emacs@gnu.org; Thu, 19 Jan 2023 22:54:02 -0500 X-Loop: help-debbugs@gnu.org Resent-From: Dmitry Gutov 
Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 20 Jan 2023 03:54:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: report 60953 X-GNU-PR-Package: emacs X-Debbugs-Original-To: bug-gnu-emacs@gnu.org Original-Received: via spool by submit@debbugs.gnu.org id=B.16741868019389 (code B ref -1); Fri, 20 Jan 2023 03:54:02 +0000 Original-Received: (at submit) by debbugs.gnu.org; 20 Jan 2023 03:53:21 +0000 Original-Received: from localhost ([127.0.0.1]:45102 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pIiSu-0002RN-SV for submit@debbugs.gnu.org; Thu, 19 Jan 2023 22:53:21 -0500 Original-Received: from lists.gnu.org ([209.51.188.17]:59018) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pIiSs-0002RF-Kp for submit@debbugs.gnu.org; Thu, 19 Jan 2023 22:53:19 -0500 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pIiSs-0001i6-Cl for bug-gnu-emacs@gnu.org; Thu, 19 Jan 2023 22:53:18 -0500 Original-Received: from mail-ej1-x62d.google.com ([2a00:1450:4864:20::62d]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pIiSq-0005ET-BX for bug-gnu-emacs@gnu.org; Thu, 19 Jan 2023 22:53:18 -0500 Original-Received: by mail-ej1-x62d.google.com with SMTP id u19so10844727ejm.8 for ; Thu, 19 Jan 2023 19:53:15 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=subject:from:to:content-language:user-agent:mime-version:date :message-id:sender:from:to:cc:subject:date:message-id:reply-to; bh=8EUcOHn1OmXnlPRul+DiGBvti9B6k9GySpE5DXhQTzk=; b=qXsRHp5f9LA05Cjr3of2oAIGQtTl8plit3JKcLy9xt+VqF7KP8aKsvrXvKninIj70/ i2C+76Oo74Tq+feSg6Byuqx09g/iUDCYThRUJFpQttfJZTcAcK8DpotiWht1k4Gdxa2M f7k6Ys/uNw7ggUTxFhivwsEGxL5xuh1hrJiQUk6wW6q/5VRav7eHD0qg9dZHosq8II6+ 
5Bxzl1gygcCAcxFNkXZmI3hn/3OnOpXGla3DnA7cssATW1Ip/U6aOJvLKI7nJM4xRTZT 2tTYK/bSa72/dyGSHujuJg731jX2Zfsp0z+xtrSgwC8O6yJC/pN2oI9LaR+6viYOcRvy yFTg== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=subject:from:to:content-language:user-agent:mime-version:date :message-id:sender:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=8EUcOHn1OmXnlPRul+DiGBvti9B6k9GySpE5DXhQTzk=; b=JOCGTOCljhCW3ZUzEmvHcJL9EUTq9uuXBCOTs9O9arSZVapfam5+AlVuniux0YTpca ACizkAWPRpnMyjJXGOoRxSQcfNjrQDYO0gS+W38JVG2BvNwTc7J6HQagXqL77SYBGwhj jnRdVWExFMCO8siA8GrWO54Soq5lhbfokR+jV4IDUuMCfkh98/2sLhnkUqJ8jOVFAVMK CTWK0e43u/2akOFCCEm6V05DifFNPm0XJbiKBZlVhwPRYv9ivqSV6Ab/Ld8Jgv2VkEwa Tys5sWs/YNYZhemujExHgf8olY2K/jvaR/qUgrA+FutS1Tftir9KBc0tXg/64XYi4PIS Gm7g== X-Gm-Message-State: AFqh2kpN/ZV5mlZX9lUDSEd7scpTO6W6WcZqUtxoK/NH+2+G4cJ7Q5cO Lt0g+00ljiwkCxIQpBdDOiQspQyR7Kg= X-Google-Smtp-Source: AMrXdXtSdHAVHurP2uS2AYjbF0lpu2fqdi4EVWZDHHaDOaZQ7N9jw84Em4UMLZqbpMbFGulgdWIltQ== X-Received: by 2002:a17:906:1851:b0:86e:4067:b699 with SMTP id w17-20020a170906185100b0086e4067b699mr17804418eje.4.1674186794816; Thu, 19 Jan 2023 19:53:14 -0800 (PST) Original-Received: from [192.168.0.2] ([46.251.119.176]) by smtp.googlemail.com with ESMTPSA id k11-20020a1709062a4b00b0073022b796a7sm17567994eje.93.2023.01.19.19.53.13 for (version=TLS1_3 cipher=TLS_AES_128_GCM_SHA256 bits=128/128); Thu, 19 Jan 2023 19:53:14 -0800 (PST) Content-Language: en-US Received-SPF: pass client-ip=2a00:1450:4864:20::62d; envelope-from=raaahh@gmail.com; helo=mail-ej1-x62d.google.com X-Spam_score_int: -14 X-Spam_score: -1.5 X-Spam_bar: - X-Spam_report: (-1.5 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FORGED_FROMDOMAIN=0.25, FREEMAIL_FROM=0.001, HEADER_FROM_DIFFERENT_DOMAINS=0.249, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: debbugs-submit@debbugs.gnu.org 
X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:253742 Archived-At:

This is a multi-part message in MIME format.
--------------UQx0D8HGepFSU5bBQqKxNJmr
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit

In my benchmarking -- using this form in
test/lisp/progmodes/ruby-mode-resources/ruby.rb after enabling
ruby-ts-mode:

   (benchmark-run 1000
     (progn
       (font-lock-mode -1)
       (font-lock-mode 1)
       (let (treesit--font-lock-fast-mode)
         (font-lock-ensure))))

the rule added to its font-lock in commit d66ac5285f7

   :language language
   :feature 'builtin-functions
   `((((identifier) @font-lock-builtin-face)
      (:match ,ruby-ts--builtin-methods @font-lock-builtin-face)))

...seems to have made it 50% slower.

The profile looked like this:

   9454  84%  - font-lock-fontify-region
   9454  84%  - font-lock-default-fontify-region
   8862  79%  - font-lock-fontify-syntactically-region
   8702  78%  - treesit-font-lock-fontify-region
    128   1%    treesit-fontify-with-override
    123   1%    facep
     84   0%    treesit--children-covering-range-recurse
     60   0%  + ruby-ts--comment-font-lock
      4   0%  + font-lock-unfontify-region
    568   5%  + font-lock-fontify-keywords-region
     16   0%  + font-lock-unfontify-region

So there's nothing on the Lisp level to look at.

Looking at the code, apparently we get a cursor and basically iterate
through all (identifier) nodes, running our predicate manually.

Without trying something more advanced like perf, I took a stab in the
dark and tried to reduce string allocation in treesit_predicate_match:
it currently ends up delegating to buffer-substring for every node,
which seemed inefficient.
But while my patch (attached) compiles and doesn't crash, it doesn't
actually work (the rule's highlighting is missing), and the performance
was unchanged.

This message was originally longer, but see commit d94dc606a09: I
switched to using :pred -- thus avoiding embedding the 720-char long
regexp in the query -- and the performance drop shrank to about 20%.

As a baseline, this simplified query, which has no predicates and colors
every identifier in the buffer using the specified face, is still faster
(just 10% slower than the original):

   :language language
   :feature 'builtin-function
   `(((identifier) @font-lock-builtin-face))

The regexp matching itself doesn't seem to be the problem:

   (benchmark 354100 '(string-match-p ruby-ts--builtin-methods "gsub"))

   => Elapsed time: 0.141681s

-- whereas the difference between the benchmarks is on the order of
seconds.

I think the marshaling of the long regexp string back and forth could be
the culprit. Would be nice to fix that somehow.

I also think that trying to reduce the string allocation overhead has
potential, but so far none of my experiments have moved the needle
noticeably.
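
For reference, the :pred variant mentioned above could look roughly like
the sketch below. It is an illustration only, not the exact contents of
commit d94dc606a09: the function name my-ruby-ts--builtin-method-p is
made up for this example, and the snippet assumes Emacs 29+ built with
tree-sitter, an installed Ruby grammar, and that ruby-ts-mode provides
ruby-ts--builtin-methods as a regexp string.

```elisp
;; Sketch under the assumptions stated above; the predicate name is
;; hypothetical.
(require 'treesit)
(require 'ruby-ts-mode)   ; assumed to define `ruby-ts--builtin-methods'

(defun my-ruby-ts--builtin-method-p (node)
  "Return non-nil if the text of NODE matches `ruby-ts--builtin-methods'."
  ;; The second argument asks for the text without text properties.
  (string-match-p ruby-ts--builtin-methods
                  (treesit-node-text node t)))

;; The query now carries only the predicate's symbol, so the long
;; regexp string is no longer embedded in the query itself.
(treesit-font-lock-rules
 :language 'ruby
 :feature 'builtin-function
 '((((identifier) @font-lock-builtin-face)
    (:pred my-ruby-ts--builtin-method-p @font-lock-builtin-face))))
```

The predicate still extracts a string for every (identifier) node, so
this form addresses only the embedding of the regexp in the query, not
the per-node string allocation discussed above.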
--------------UQx0D8HGepFSU5bBQqKxNJmr Content-Type: text/x-patch; charset=UTF-8; name="treesit_predicate_match.diff" Content-Disposition: attachment; filename="treesit_predicate_match.diff" Content-Transfer-Encoding: base64 ZGlmZiAtLWdpdCBhL3NyYy90cmVlc2l0LmMgYi9zcmMvdHJlZXNpdC5jCmluZGV4IDkxN2Ri NTgyNjc2Li43ZTI5NGEwYTY2ZiAxMDA2NDQKLS0tIGEvc3JjL3RyZWVzaXQuYworKysgYi9z cmMvdHJlZXNpdC5jCkBAIC0yNDY2LDEwICsyNDY2LDI2IEBAIHRyZWVzaXRfcHJlZGljYXRl X21hdGNoIChMaXNwX09iamVjdCBhcmdzLCBzdHJ1Y3QgY2FwdHVyZV9yYW5nZSBjYXB0dXJl cykKIAkgICAgICBidWlsZF9zdHJpbmcgKCJUaGUgc2Vjb25kIGFyZ3VtZW50IHRvIGBtYXRj aCcgc2hvdWxkICIKIAkJICAgICAgICAgICAgImJlIGEgY2FwdHVyZSBuYW1lLCBub3QgYSBz dHJpbmciKSk7CiAKLSAgTGlzcF9PYmplY3QgdGV4dCA9IHRyZWVzaXRfcHJlZGljYXRlX2Nh cHR1cmVfbmFtZV90b190ZXh0IChjYXB0dXJlX25hbWUsCi0JCQkJCQkJICAgICBjYXB0dXJl cyk7CisgIExpc3BfT2JqZWN0IG5vZGUgPSB0cmVlc2l0X3ByZWRpY2F0ZV9jYXB0dXJlX25h bWVfdG9fbm9kZSAoY2FwdHVyZV9uYW1lLCBjYXB0dXJlcyk7CiAKLSAgaWYgKGZhc3Rfc3Ry aW5nX21hdGNoIChyZWdleHAsIHRleHQpID49IDApCisgIHN0cnVjdCBidWZmZXIgKm9sZF9i dWZmZXIgPSBjdXJyZW50X2J1ZmZlcjsKKyAgc3RydWN0IGJ1ZmZlciAqYnVmZmVyID0gWEJV RkZFUiAoWFRTX1BBUlNFUiAoWFRTX05PREUgKG5vZGUpLT5wYXJzZXIpLT5idWZmZXIpOwor ICBzZXRfYnVmZmVyX2ludGVybmFsIChidWZmZXIpOworCisgIFRTTm9kZSB0cmVlc2l0X25v ZGUgPSBYVFNfTk9ERSAobm9kZSktPm5vZGU7CisgIHB0cmRpZmZfdCB2aXNpYmxlX2JlZyA9 IFhUU19QQVJTRVIgKFhUU19OT0RFIChub2RlKS0+cGFyc2VyKS0+dmlzaWJsZV9iZWc7Cisg IHVpbnQzMl90IHN0YXJ0X2J5dGVfb2Zmc2V0ID0gdHNfbm9kZV9zdGFydF9ieXRlICh0cmVl c2l0X25vZGUpOworICB1aW50MzJfdCBlbmRfYnl0ZV9vZmZzZXQgPSB0c19ub2RlX2VuZF9i eXRlICh0cmVlc2l0X25vZGUpOworICBwdHJkaWZmX3Qgc3RhcnRfYnl0ZSA9IHZpc2libGVf YmVnICsgc3RhcnRfYnl0ZV9vZmZzZXQ7CisgIHB0cmRpZmZfdCBlbmRfYnl0ZSA9IHZpc2li bGVfYmVnICsgZW5kX2J5dGVfb2Zmc2V0OworICBwdHJkaWZmX3Qgc3RhcnRfcG9zID0gYnVm X2J5dGVwb3NfdG9fY2hhcnBvcyAoYnVmZmVyLCBzdGFydF9ieXRlKTsKKyAgcHRyZGlmZl90 IGVuZF9wb3MgPSBidWZfYnl0ZXBvc190b19jaGFycG9zIChidWZmZXIsIGVuZF9ieXRlKTsK KworICBwdHJkaWZmX3QgdmFsID0gZmFzdF9sb29raW5nX2F0IChyZWdleHAsIHN0YXJ0X3Bv 
cywgc3RhcnRfYnl0ZSwgZW5kX3BvcywgZW5kX2J5dGUsIFFuaWwpOworCisgIHNldF9idWZm ZXJfaW50ZXJuYWwgKG9sZF9idWZmZXIpOworCisgIGlmICh2YWwgPj0gMCkKICAgICByZXR1 cm4gdHJ1ZTsKICAgZWxzZQogICAgIHJldHVybiBmYWxzZTsK --------------UQx0D8HGepFSU5bBQqKxNJmr--