From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Yuan Fu Newsgroups: gmane.emacs.devel Subject: Re: Tree-sitter navigation time grows as sqrt(line-number) Date: Sat, 19 Aug 2023 19:01:25 -0700 Message-ID: References: <3E82D409-6903-4679-9031-939CA35791FF@gmail.com> <32507689-3b2c-ccbf-dd14-e7bf0bed1ac7@gutov.dev> <6db52945-5459-197c-405d-153ff395a824@gutov.dev> <1F7C956D-6D22-4CC1-8656-6E2A4D07D5FB@gmail.com> <69D18963-D94F-4792-9FF1-159897A99E50@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\)) Content-Type: multipart/mixed; boundary="Apple-Mail=_16D2F22E-327A-46FC-8A07-85E1F80AAEFB" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="34682"; mail-complaints-to="usenet@ciao.gmane.io" Cc: JD Smith , emacs-devel@gnu.org To: Dmitry Gutov Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Sun Aug 20 04:02:39 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qXXm2-0008pS-15 for ged-emacs-devel@m.gmane-mx.org; Sun, 20 Aug 2023 04:02:38 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qXXl9-0005ui-Sk; Sat, 19 Aug 2023 22:01:43 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qXXl7-0005t4-JQ for emacs-devel@gnu.org; Sat, 19 Aug 2023 22:01:41 -0400 Original-Received: from mail-pl1-x636.google.com ([2607:f8b0:4864:20::636]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qXXl5-0000iE-6J for emacs-devel@gnu.org; Sat, 19 Aug 2023 22:01:41 -0400 Original-Received: by mail-pl1-x636.google.com with SMTP id d9443c01a7336-1bda9207132so17422535ad.0 for ; Sat, 19 Aug 2023 19:01:38 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692496898; x=1693101698; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=xf9LQaZ7iIvpXHk9Vcb/4fOYuTIwSZ5yVLD8ahM8GZM=; b=bsh2Ojg4ZFbY0iBH101s3rhRxhdAC2uXfqYuU1F1PQkiLlOy4r94Hbj1rV49hXiR4m +mYFOosuPVq0tySzJFhRloT+NcYwzVg1R0kcb73uzueLAf4pk5BuFnUYlVlzIvG8PaPg Ie6KiyOTRw4snWULFdd198ZIzXIhIdoIRJMwAHWl/BG8Lm7mZNOIleuZtjX6KmRmbHzu LF+ohN1B9sT7TYEe4zuNn2ujmVXbXCE7Nov86czLKmQEtSssXm8tUn824GdBdIFT0RBc y6ndl2+iQbFEwR05bZJRc8hv2UjbzTV+fTxYLhNKMuKnXRcMtn04Ada3sELMZctU+tmy OnoA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692496898; x=1693101698; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=xf9LQaZ7iIvpXHk9Vcb/4fOYuTIwSZ5yVLD8ahM8GZM=; b=HIbxbmuSg3GxYzZK35qVQNOb7n1Shber3HPTrFJtjmn+3TottD/mPNLMepilO7Exme GN3A0Mb1uxlGc5VQrFGDEyWDJ3wHQiqhpqRlcd71YgDz2jHoaMfnZ+iaJ0vOIStKc2VC EocZFkPOUxggS+MpFp7c3MtFvhXh6+DGHBZ+oB4LYNL/Wj6Ntx7heuxGujk5B2zKA+sl d/faieiAFdH24YMSrmlrdewdpw3g36DOmmQB9zjf3qqwFNdaanOm8zwT9MDnoz1OR/fs qv9KwpzD36V/h2bgW9bcAPe+Gl5ARZnwlBmEMymUK9zM66pI853aFi8jbN8PKMVrCOs0 iP7A== X-Gm-Message-State: AOJu0YwkhR290rjjj0oH/Ktud2gjR9KsWDBDzujxeJ2AIBNsOIIPwTLF mp++CAkttadLsQbw0ADkM4o= X-Google-Smtp-Source: AGHT+IGE4CFfMrEMlmn+1cmx0JuxRrwk/Vu4Ti7yErShEalf39P8Hxe4KdthAFjgSXpvhHAmS19KPQ== X-Received: by 2002:a17:903:1208:b0:1b6:6625:d3a8 with SMTP id l8-20020a170903120800b001b66625d3a8mr3817955plh.16.1692496897585; Sat, 19 Aug 2023 19:01:37 -0700 (PDT) Original-Received: from smtpclient.apple (cpe-172-117-161-177.socal.res.rr.com. [172.117.161.177]) by smtp.gmail.com with ESMTPSA id jj19-20020a170903049300b001ba066c589dsm4259461plb.137.2023.08.19.19.01.36 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Sat, 19 Aug 2023 19:01:37 -0700 (PDT) In-Reply-To: X-Mailer: Apple Mail (2.3731.600.7) Received-SPF: pass client-ip=2607:f8b0:4864:20::636; envelope-from=casouri@gmail.com; helo=mail-pl1-x636.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:308945 Archived-At: --Apple-Mail=_16D2F22E-327A-46FC-8A07-85E1F80AAEFB Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Aug 19, 2023, at 5:39 PM, Dmitry Gutov wrote: >=20 > On 20/08/2023 03:18, JD Smith wrote: >> Great, thanks. I tried this patch out, and there is indeed about 10x = of improvement. Check the bottom of the gist. That said, node_parent = remains 10x faster yet (at worst, in a long file), so maybe there=E2=80=99= s room for further improvement? >=20 > Similarly, I also see an improvement from Yuan's patch in my testing = (about 2x), while the patch with ts_node_parent remains the fastest = anyway. Where my test looks like this: >=20 > (benchmark 1000 '(treesit-node-parent n)) >=20 > I looked around for the reasons for the difference. Built the latest = tree-sitter (didn't help) and found these two threads on GH: >=20 > = https://github.com/tree-sitter/tree-sitter/issues/567#issuecomment-5955641= 71 - Max Brunsfield says "There is some caching done in that method to = make sure it performs well in the common case of walking repeatedly up = the tree", but I haven't found where said caching resides so far. >=20 > https://github.com/tree-sitter/tree-sitter/discussions/878 - mentions = that mixing cursor and direct node apis leads to suboptimal results, and = just using the former gives an improvement. No "good" code example in = there. >=20 > > May be worth looking at how others are doing it, e.g. the python = API. >=20 > Apparently they have both a wrapper for a cursor API, and = node_get_parent which is implemented using ts_node_parent: = https://github.com/tree-sitter/py-tree-sitter/issues/34 >=20 > Leaving it to the callers to choose which one to use. Ok, I fiddled around a bit more, and this patch (applies to master) = should make the speed comparable to ts_node_parent. Yuan --Apple-Mail=_16D2F22E-327A-46FC-8A07-85E1F80AAEFB Content-Disposition: attachment; filename=node-parent.patch Content-Type: application/octet-stream; x-unix-mode=0644; name="node-parent.patch" Content-Transfer-Encoding: quoted-printable =46rom=2021d3e612d1d6819278621b956629f6c28a324145=20Mon=20Sep=2017=20= 00:00:00=202001=0AFrom:=20Yuan=20Fu=20=0ADate:=20Sat,=20= 19=20Aug=202023=2015:04:20=20-0700=0ASubject:=20[PATCH]=20Improve=20= performance=20of=20treesit_cursor_helper_1=0A=0A*=20src/treesit.c:=20= (treesit_cursor_helper_1):=20Use=0A= ts_tree_cursor_goto_first_child_for_byte=20to=20speed=20up=20traversing=20= among=0Asiblings.=20=20The=20"while=20(ts_node_end_byte=20(cursor_node)=20= <=20end_pos)"=20can=0Abe=20removed=20with=20the=20check=20added=20in=20= the=20loop=20below.=0A---=0A=20src/treesit.c=20|=2022=20= +++++++++-------------=0A=201=20file=20changed,=209=20insertions(+),=20= 13=20deletions(-)=0A=0Adiff=20--git=20a/src/treesit.c=20b/src/treesit.c=0A= index=201f694e47201..1017c64f899=20100644=0A---=20a/src/treesit.c=0A+++=20= b/src/treesit.c=0A@@=20-3023,7=20+3023,8=20@@=20treesit_assume_true=20= (bool=20val)=0A=20=20=20=20limit.=20=20*/=0A=20static=20bool=0A=20= treesit_cursor_helper_1=20(TSTreeCursor=20*cursor,=20TSNode=20*target,=0A= -=09=09=09=20uint32_t=20end_pos,=20ptrdiff_t=20limit)=0A+=09=09=09=20= uint32_t=20start_pos,=20uint32_t=20end_pos,=0A+=09=09=09=20ptrdiff_t=20= limit)=0A=20{=0A=20=20=20if=20(limit=20<=3D=200)=0A=20=20=20=20=20return=20= false;=0A@@=20-3032,23=20+3033,17=20@@=20treesit_cursor_helper_1=20= (TSTreeCursor=20*cursor,=20TSNode=20*target,=0A=20=20=20if=20(ts_node_eq=20= (cursor_node,=20*target))=0A=20=20=20=20=20return=20true;=0A=20=0A-=20=20= if=20(!ts_tree_cursor_goto_first_child=20(cursor))=0A+=20=20if=20= (ts_tree_cursor_goto_first_child_for_byte=20(cursor,=20start_pos)=20=3D=3D= =20-1)=0A=20=20=20=20=20return=20false;=0A=20=0A-=20=20/*=20Skip=20nodes=20= that=20definitely=20don't=20contain=20TARGET.=20=20*/=0A-=20=20while=20= (ts_node_end_byte=20(cursor_node)=20<=20end_pos)=0A-=20=20=20=20{=0A-=20=20= =20=20=20=20if=20(!ts_tree_cursor_goto_next_sibling=20(cursor))=0A-=09= break;=0A-=20=20=20=20=20=20cursor_node=20=3D=20= ts_tree_cursor_current_node=20(cursor);=0A-=20=20=20=20}=0A-=0A=20=20=20= /*=20Go=20through=20each=20sibling=20that=20could=20contain=20TARGET.=20=20= Because=20of=0A=20=20=20=20=20=20missing=20nodes=20(their=20width=20is=20= 0),=20there=20could=20be=20multiple=0A=20=20=20=20=20=20siblings=20that=20= could=20contain=20TARGET.=20=20*/=0A=20=20=20while=20(ts_node_start_byte=20= (cursor_node)=20<=3D=20end_pos)=0A=20=20=20=20=20{=0A-=20=20=20=20=20=20= if=20(treesit_cursor_helper_1=20(cursor,=20target,=20end_pos,=20limit=20= -=201))=0A+=20=20=20=20=20=20if=20(ts_node_end_byte=20(cursor_node)=20>=3D= =20end_pos=0A+=09=20=20&&=20treesit_cursor_helper_1=20(cursor,=20target,=20= start_pos,=20end_pos,=0A+=09=09=09=09=20=20=20=20=20=20limit=20-=201))=0A= =20=09return=20true;=0A=20=0A=20=20=20=20=20=20=20if=20= (!ts_tree_cursor_goto_next_sibling=20(cursor))=0A@@=20-3080,11=20= +3075,12=20@@=20treesit_cursor_helper_1=20(TSTreeCursor=20*cursor,=20= TSNode=20*target,=0A=20static=20bool=0A=20treesit_cursor_helper=20= (TSTreeCursor=20*cursor,=20TSNode=20node,=20Lisp_Object=20parser)=0A=20{=0A= +=20=20uint32_t=20start_pos=20=3D=20ts_node_start_byte=20(node);=0A=20=20= =20uint32_t=20end_pos=20=3D=20ts_node_end_byte=20(node);=0A=20=20=20= TSNode=20root=20=3D=20ts_tree_root_node=20(XTS_PARSER=20(parser)->tree);=0A= =20=20=20*cursor=20=3D=20ts_tree_cursor_new=20(root);=0A-=20=20bool=20= success=20=3D=20treesit_cursor_helper_1=20(cursor,=20&node,=20end_pos,=0A= -=09=09=09=09=09=20=20TREESIT_RECURSION_LIMIT);=0A+=20=20bool=20success=20= =3D=20treesit_cursor_helper_1=20(cursor,=20&node,=20start_pos,=0A+=09=09=09= =09=09=20=20end_pos,=20TREESIT_RECURSION_LIMIT);=0A=20=20=20if=20= (!success)=0A=20=20=20=20=20ts_tree_cursor_delete=20(cursor);=0A=20=20=20= return=20success;=0A--=20=0A2.41.0=0A=0A= --Apple-Mail=_16D2F22E-327A-46FC-8A07-85E1F80AAEFB Content-Transfer-Encoding: 7bit Content-Type: text/plain; charset=us-ascii --Apple-Mail=_16D2F22E-327A-46FC-8A07-85E1F80AAEFB--