From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Yuan Fu Newsgroups: gmane.emacs.devel Subject: Re: Tree-sitter navigation time grows as sqrt(line-number) Date: Thu, 17 Aug 2023 22:20:55 -0700 Message-ID: References: <3E82D409-6903-4679-9031-939CA35791FF@gmail.com> <264EECA4-0920-4217-834C-19F9A58CEBBF@gmail.com> <2112821D-6A84-4AE7-BF7F-D26B4463A419@gmail.com> Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.600.7\)) Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="14272"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel@gnu.org To: JD Smith Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Aug 18 07:21:59 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qWrvr-0003VO-Il for ged-emacs-devel@m.gmane-mx.org; Fri, 18 Aug 2023 07:21:59 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qWrv6-0003aw-Ub; Fri, 18 Aug 2023 01:21:12 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qWrv5-0003am-N9 for emacs-devel@gnu.org; Fri, 18 Aug 2023 01:21:11 -0400 Original-Received: from mail-oi1-x22a.google.com ([2607:f8b0:4864:20::22a]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qWrv2-0006Dd-PF for emacs-devel@gnu.org; Fri, 18 Aug 2023 01:21:11 -0400 Original-Received: by mail-oi1-x22a.google.com with SMTP id 5614622812f47-3a741f46fadso381994b6e.0 for ; Thu, 17 Aug 2023 22:21:08 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20221208; t=1692336067; x=1692940867; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:from:to:cc:subject:date :message-id:reply-to; bh=1yzapqpw3lrrjrN5W1QoKxTNYAOH9nFfz0k7jjLRbYs=; b=q1NpL81Ixi/sp+kSfY/nO5hlOu8el2JkUDo+8r6/0mwaaG3CRxw6D7HbB0+QNvKTSL FvBYQ6JfrhaJbXXmUuBE+UOUMvoHXi0Tm3NuCQEFJNAlEug5uqXFTFkcSIuGXkG/pXB2 5HqaZ5P3T3g6loZv0Wjr0Eu7GXIDo4BvGMDMgrcMSBJNF+0XPwObitkClwhokltg04H5 xs/mlvXBPBUzOEGA21keEaLG3GWlF3emqU9zHxSYp1VUwjqXyDaPeTtvfZ4bs7/4qZEa 01Jxo7kt+MGPhgn1F2R1IsjvgN8COy4beYj1r06GHCT9Z+fi7ISTgkFT5nmz8nmdx9XC 8T1g== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20221208; t=1692336067; x=1692940867; h=to:references:message-id:content-transfer-encoding:cc:date :in-reply-to:from:subject:mime-version:x-gm-message-state:from:to:cc :subject:date:message-id:reply-to; bh=1yzapqpw3lrrjrN5W1QoKxTNYAOH9nFfz0k7jjLRbYs=; b=dhKR5CxcwUXuDMJNcx1BMHmcf2kZuiCSqzv8pMRjoGzIgutEyTkc21yua/KayT9tqR M0ow+SZlI91v0tptio2QnvF6i1Qtgsw31rxJpHeV4LGKv8neilnkxxjuNIPx6PYU/fqT lbabX4aUFTYUXvqaKDcBuhUB67DuTLwvHUp8Lm6+mRRe6HFcFENZ+gLK8/fq9efoyVNB yTNH5b3yzqO4AB/KhsB/dPkIiRz6fSC1sBMFvcDlZKA/HHueZXtJF3z6ZDYhtsn4n/tu uzE8nDwz3FnfYM3TBCDSfFxYurp4osY0pWE5QEAo5H2lyzNxlzbRA8uNrYl9dnwrHqRF e54w== X-Gm-Message-State: AOJu0YyS3nFej9Q+SREXlW6XCYl2uvWj61Bz08t7lsZfSV2Ozsbyy3kA fFhe78tDhnH9pIhFzzlTDAg= X-Google-Smtp-Source: AGHT+IGmAaO1mfifl1ivvQMGRAdC7JrnhcSJFwp89H6nwHLifBS+XA3hSDOL31iY9HLOXUm4Trlwpw== X-Received: by 2002:a05:6870:738c:b0:1c0:d0e8:8fda with SMTP id z12-20020a056870738c00b001c0d0e88fdamr1727052oam.16.1692336067166; Thu, 17 Aug 2023 22:21:07 -0700 (PDT) Original-Received: from smtpclient.apple (cpe-172-117-161-177.socal.res.rr.com. [172.117.161.177]) by smtp.gmail.com with ESMTPSA id y13-20020a17090aca8d00b002693505391csm2392855pjt.37.2023.08.17.22.21.06 (version=TLS1_2 cipher=ECDHE-ECDSA-AES128-GCM-SHA256 bits=128/128); Thu, 17 Aug 2023 22:21:06 -0700 (PDT) In-Reply-To: <2112821D-6A84-4AE7-BF7F-D26B4463A419@gmail.com> X-Mailer: Apple Mail (2.3731.600.7) Received-SPF: pass client-ip=2607:f8b0:4864:20::22a; envelope-from=casouri@gmail.com; helo=mail-oi1-x22a.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:308883 Archived-At: >> to go to the parent node from a child node, what tree-sitter actually = does is go down from the root node until it hits the parent node. This = process is linear to the height of the tree. >=20 > Do you mean linear in the width of the tree? I=E2=80=99ve color-coded = the plot by tree depth (i.e. how many ancestors a given node has back to = root). Or maybe you are thinking of the tree lying on its side (like = vundo ;)? Going down from the root node is proportion to the height of the tree, = no? >=20 >> Also, getting the node at point isn=E2=80=99t free either. To get the = node at point, we actually iterates from the first child node of the = root node until reaching one that contains the point, then iterate from = the first child node of that node until reaching one that contains the = point, etc, until we reach a leaf node. So log(N) time complexity is = expected. >=20 > I tested node-at-point on the same file, and it is quite fast, with = worst case performance only 30=C2=B5s (vs. 3ms for the full parent = navigation), and growing very slowly, between sqrt(log(N)) and log(N). = Check the gist for a new figure. >=20 > Unless I am misunderstanding, for the common case of finding parent of = the node at point, it seems the algorithm you describe could be tweaked = to work well. I.e. instead of "stop when reaching a leaf node = containing point", just "stop when you reach a node containing point who = has the original node as a child=E2=80=9D? This should give = (hypothetical) =E2=80=9Cparent-of-node-at-point=E2=80=9D the same great = speed as node-at-point. =20 I should=E2=80=99ve been more clear. For finding the parent of a node x, = we stop at the parent of x. We only go to the leaf node when finding the = node at point, which always returns a leaf node. >=20 > Then =E2=80=9Cparent-of-node-at-point-until=E2=80=9D could do = something quite clever: accumulate parent nodes all the way from root to = the child=E2=80=99s direct parent into a list (same low cost, modulo the = node storage). Then run the predicate on that list (in reverse), = returning the first match. Could of course return that full list too = (=E2=80=9Cancestors-of-node-at-point=E2=80=9D), for other uses. These = should all be quite fast compared to a full breadth and depth searching = of every nook and cranny in the syntax tree that node-parent seems to do = (having no positional information to wield). Sure, there hasn=E2=80=99t been a clever version because so far no one = have complained they can=E2=80=99t find the parent of a node fast = enough. Also I doubt the amount of gain we can get with a more clever = algorithm. You can try implement one and benchmark it. I=E2=80=99m = curious to know the result :-) >=20 >> I=E2=80=99m not too worried tho, because IIRC the absolute time is = very short. The 100x variability doesn=E2=80=99t mean much if the 100x = is still very fast. >=20 >=20 > This is on a brand new fast machine, and 3ms is pretty slow for things = that need to run dozens of times per keystroke (e.g. font-lock). Font-lock doesn=E2=80=99t need to find the parent of a node. So that = hasn=E2=80=99t been a problem. In use-cases where finding the parent of = a node is useful, 3ms hasn=E2=80=99t been a problem AFAIK. (Well, I = guess indentation could benefit from a faster parent-finding function.) >=20 >> These are fundamental limits of tree-sitter, unless it changes its = data structure.=20 >=20 > That is an interesting limitation for sure. It seems that perhaps = each node's start..end information can be really helpful here for = winnowing the tree, when you are mostly concerned about nodes relevant = to and covering particular positions in the buffer. > BTW, https://tree-sitter.github.io/tree-sitter/using-parsers says: > =E2=80=A2=20 > A TSNode represents a single node in the syntax tree. It tracks its = start and end positions in the source code, as well as its relation to = other nodes like its parent, siblings and children. I=E2=80=99m not entirely sure what exactly do you have in mind. A = node=E2=80=99s start and end position doesn=E2=80=99t really help us = finding it. Suppose you have an array of numbers, and you know one of = the numbers is 3. How do you find the index of the number 3 in the = array? While the documentation say it tracks its relation to other nodes, I = think it=E2=80=99s more pedagogical than factual. Behind the scenes, = tree-sitter=E2=80=99s own node API does the same thing as I described: = it goes from the root node and traverses down. Yuan