From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: JD Smith Newsgroups: gmane.emacs.devel Subject: Re: Tree-sitter navigation time grows as sqrt(line-number) Date: Fri, 18 Aug 2023 00:19:02 -0400 Message-ID: <2112821D-6A84-4AE7-BF7F-D26B4463A419@gmail.com> References: <3E82D409-6903-4679-9031-939CA35791FF@gmail.com> <264EECA4-0920-4217-834C-19F9A58CEBBF@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6\)) Content-Type: multipart/alternative; boundary="Apple-Mail=_C2142055-6695-4BB6-A7AD-FD3A285C1256" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="2672"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel@gnu.org To: Yuan Fu Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Aug 18 06:20:31 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qWqyM-0000SE-Qy for ged-emacs-devel@m.gmane-mx.org; Fri, 18 Aug 2023 06:20:30 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qWqxJ-0000YL-6L; Fri, 18 Aug 2023 00:19:25 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qWqxE-0000Y2-V4 for emacs-devel@gnu.org; Fri, 18 Aug 2023 00:19:21 -0400 Original-Received: from mail-il1-x129.google.com ([2607:f8b0:4864:20::129]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qWqxB-00022f-Ss for emacs-devel@gnu.org; Fri, 18 Aug 2023 00:19:20 -0400 Original-Received: by mail-il1-x129.google.com with SMTP id e9e14a558f8ab-34bae88001dso1737455ab.2 for ; Thu, 17 Aug 2023 21:19:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692332355; x=1692937155; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:from:to:cc:subject:date:message-id:reply-to; bh=m6ljQxpzqwiRFpfk0wwxxQSNOeKenCGI9aWM6+D1Eyk=; b=Pq4TyRSK2vHoBTSqOnf1ay31YAQf4pRG1atIO5SwxS95rmY44C0LNDH2gxNbIm7Z8U nqrvf6hJmLEvIPzWvA4BM5FtQ6dJXuLHAd9MrN+n/rB/FJ+Jq7A01G8CnAPs0lUNyEMQ Y8YtGXbjZAz3WGelf3tTT4eX8QVZpZlr4M0cW7bp/tthIlQ7C5pFlc8SSGVSFwLfpIv1 Xk1ZjtG/W/PVFcIBKgpecTzhUaTsusMWBzBq1LoMvZQcyXcpUBFGKBaSkYGLtLlEG/3a wrDXNCz9PRgyVOIAZNTVfl1xiUgG70oWCweBpDTVZfxGs+mGMf4+9qLrt1p8gWeOoJ5Y oDOw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692332355; x=1692937155; h=references:to:cc:in-reply-to:date:subject:mime-version:message-id :from:x-gm-message-state:from:to:cc:subject:date:message-id:reply-to; bh=m6ljQxpzqwiRFpfk0wwxxQSNOeKenCGI9aWM6+D1Eyk=; b=KPrke10ZtWJD0pnjWmUCuzysA9CJXRkwYNqFHQth2zggo+GqW9loImF3Co/jL1R833 vKst0way4idfSN4GXNSNmrV0OVoHD31gS8V/DvAiuubYKzKXST9lJPVTecOj+qaCbGDF D6tvmCwUghgHgSHcoTVuC4MtxNiqM3jNJVfK7+E2qhiJUGrqnzA8+SFCOJ5yI5QqhnT7 c2KX2JIhutl2PPZIaZsNfLR6Qcpjau0gW13e9fehjTyscIZVav6kLtWp56mijd0PdLz6 lLRbX1b1mqtijQQTEqzcQjvsF3SKD/DIk0TxrgfgN1Q7bOfDnEEA37OVZ+GsEWGmBJHK Hyiw== X-Gm-Message-State: AOJu0YyT7edY/VVjs28no00N9YUbXc0dqCsxaWXWY8/N/c1w+E6ksKE/ MmUtKPzn5YspjTn1xtJ6LjQ= X-Google-Smtp-Source: AGHT+IEBOfzYNQMTtAmZcKHt3A8CW760QCIDKO6bTmfQZvuJapT2uV2hHUZkfrdVREHK1MK4crjs2Q== X-Received: by 2002:a92:c7c2:0:b0:349:311f:83aa with SMTP id g2-20020a92c7c2000000b00349311f83aamr1600232ilk.28.1692332354858; Thu, 17 Aug 2023 21:19:14 -0700 (PDT) Original-Received: from smtpclient.apple (cm-24-53-184-115.buckeyecom.net. [24.53.184.115]) by smtp.gmail.com with ESMTPSA id v16-20020a92c6d0000000b00345d6297aa7sm288027ilm.16.2023.08.17.21.19.13 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Aug 2023 21:19:13 -0700 (PDT) In-Reply-To: <264EECA4-0920-4217-834C-19F9A58CEBBF@gmail.com> X-Mailer: Apple Mail (2.3731.700.6) Received-SPF: pass client-ip=2607:f8b0:4864:20::129; envelope-from=jdtsmith@gmail.com; helo=mail-il1-x129.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:308882 Archived-At: --Apple-Mail=_C2142055-6695-4BB6-A7AD-FD3A285C1256 Content-Transfer-Encoding: quoted-printable Content-Type: text/plain; charset=utf-8 > On Aug 17, 2023, at 11:00 PM, Yuan Fu wrote: >>=20 >> Code and details here: >>=20 >> https://gist.github.com/jdtsmith/7fa6263a13559d587abb51827e6ae472 >=20 > I=E2=80=99m not entirely surprised. In the parse tree that tree-sitter = generates is a DAG, where the parent node has pointers to the child = nodes, but not the other way around. That means to go to the parent node = from a child node, what tree-sitter actually does is go down from the = root node until it hits the parent node. This process is linear to the = height of the tree. Thanks for this info, very interesting. =20 > to go to the parent node from a child node, what tree-sitter actually = does is go down from the root node until it hits the parent node. This = process is linear to the height of the tree. Do you mean linear in the width of the tree? I=E2=80=99ve color-coded = the plot by tree depth (i.e. how many ancestors a given node has back to = root). Or maybe you are thinking of the tree lying on its side (like = vundo ;)? > Also, getting the node at point isn=E2=80=99t free either. To get the = node at point, we actually iterates from the first child node of the = root node until reaching one that contains the point, then iterate from = the first child node of that node until reaching one that contains the = point, etc, until we reach a leaf node. So log(N) time complexity is = expected. I tested node-at-point on the same file, and it is quite fast, with = worst case performance only 30=C2=B5s (vs. 3ms for the full parent = navigation), and growing very slowly, between sqrt(log(N)) and log(N). = Check the gist for a new figure. Unless I am misunderstanding, for the common case of finding parent of = the node at point, it seems the algorithm you describe could be tweaked = to work well. I.e. instead of "stop when reaching a leaf node = containing point", just "stop when you reach a node containing point who = has the original node as a child=E2=80=9D? This should give = (hypothetical) =E2=80=9Cparent-of-node-at-point=E2=80=9D the same great = speed as node-at-point. =20 Then =E2=80=9Cparent-of-node-at-point-until=E2=80=9D could do something = quite clever: accumulate parent nodes all the way from root to the = child=E2=80=99s direct parent into a list (same low cost, modulo the = node storage). Then run the predicate on that list (in reverse), = returning the first match. Could of course return that full list too = (=E2=80=9Cancestors-of-node-at-point=E2=80=9D), for other uses. These = should all be quite fast compared to a full breadth and depth searching = of every nook and cranny in the syntax tree that node-parent seems to do = (having no positional information to wield). > I=E2=80=99m not too worried tho, because IIRC the absolute time is = very short. The 100x variability doesn=E2=80=99t mean much if the 100x = is still very fast. This is on a brand new fast machine, and 3ms is pretty slow for things = that need to run dozens of times per keystroke (e.g. font-lock). > These are fundamental limits of tree-sitter, unless it changes its = data structure.=20 That is an interesting limitation for sure. It seems that perhaps each = node's start..end information can be really helpful here for winnowing = the tree, when you are mostly concerned about nodes relevant to and = covering particular positions in the buffer.= --Apple-Mail=_C2142055-6695-4BB6-A7AD-FD3A285C1256 Content-Transfer-Encoding: quoted-printable Content-Type: text/html; charset=utf-8
On Aug 17, = 2023, at 11:00 PM, Yuan Fu <casouri@gmail.com> = wrote:

Code and details here:

https://gist.github.com/jdtsmith/7fa6263a13559d587abb51827e6ae472
=

I=E2=80=99m not entirely = surprised. In the parse tree that tree-sitter generates is a DAG, where = the parent node has pointers to the child nodes, but not the other way = around. That means to go to the parent node from a child node, what = tree-sitter actually does is go down from the root node until it hits = the parent node. This process is linear to the height of the = tree.

Thanks for this info, very = interesting.  

to = go to the parent node from a child node, what tree-sitter actually does = is go down from the root node until it hits the parent node. This = process is linear to the height of the = tree.

Do you mean linear in the width of = the tree?  I=E2=80=99ve color-coded the plot by tree depth (i.e. = how many ancestors a given node has back to root).  Or maybe you = are thinking of the tree lying on its side (like vundo = ;)?

Also, getting the = node at point isn=E2=80=99t free either. To get the node at point, we = actually iterates from the first child node of the root node until = reaching one that contains the point, then iterate from the first child = node of that node until reaching one that contains the point, etc, until = we reach a leaf node. So log(N) time complexity is = expected.

I tested node-at-point on the same = file, and it is quite fast, with worst case performance only 30=C2=B5s = (vs. 3ms for the full parent navigation), and growing very slowly, = between sqrt(log(N)) and log(N).  Check the gist for a new = figure.

Unless I am misunderstanding, for the = common case of finding parent of the node at point, it seems the = algorithm you describe could be tweaked to work well.  I.e. instead = of "stop when reaching a leaf node containing point", just "stop when = you reach a node containing point who has the original node as a = child=E2=80=9D?   This should give (hypothetical) = =E2=80=9Cparent-of-node-at-point=E2=80=9D the same great speed as = node-at-point.  

Then =E2=80=9Cparent-of-node-at-point-until=E2=80=9D = could do something quite clever: accumulate parent nodes all the way = from root to the child=E2=80=99s direct parent into a list (same low = cost, modulo the node storage).  Then run the predicate on that = list (in reverse), returning the first match.  Could of course = return that full list too (=E2=80=9Cancestors-of-node-at-point=E2=80=9D), = for other uses.  These should all be quite fast compared to a full = breadth and depth searching of every nook and cranny in the syntax tree = that node-parent seems to do (having no positional information to = wield).

I=E2=80=99= m not too worried tho, because IIRC the absolute time is very short. The = 100x variability doesn=E2=80=99t mean much if the 100x is still very = fast.

This is on a brand new fast = machine, and 3ms is pretty slow for things that need to run dozens of = times per keystroke (e.g. = font-lock).

These are = fundamental limits of tree-sitter, unless it changes its data = structure. 

That is an = interesting limitation for sure.  It seems that perhaps each node's = start..end information can be really helpful here for winnowing the = tree, when you are mostly concerned about nodes relevant to and covering = particular positions in the buffer.
= --Apple-Mail=_C2142055-6695-4BB6-A7AD-FD3A285C1256--