From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter navigation time grows as sqrt(line-number)
Date: Thu, 17 Aug 2023 20:00:50 -0700
Message-ID: <264EECA4-0920-4217-834C-19F9A58CEBBF@gmail.com>
References: <3E82D409-6903-4679-9031-939CA35791FF@gmail.com>
In-Reply-To: <3E82D409-6903-4679-9031-939CA35791FF@gmail.com>
To: JD Smith
Cc: emacs-devel@gnu.org

> On Aug 16, 2023, at 9:01 PM, JD Smith wrote:
>
> I recently posted about the high variability of Emacs 29’s
> tree-sitter navigation performance within a file. I decided to
> conduct a simple test on a large python file of about 8400 lines to
> see if I could learn more. The test is as follows: at the start of
> each line, locate the current syntax node, and starting from it,
> navigate up to the root node via `treesit-node-parent’.
>
> I was surprised to find that the time this takes grows as sqrt(N),
> for line number N. This leads to performance variability of >100x
> for code that needs to walk the local syntax tree in large files.
> Such variability can make performance projections and optimizations
> for latency-sensitive uses of tree-sitter (e.g. via font-lock)
> tricky.
>
> I’m unclear whether this is fundamental to the tree-sitter
> parse/tree algorithm, or if the scaling comes from Emacs’ TS
> implementation. It does vaguely remind me of similar scaling with an
> old line-numbering algorithm, where lines were always being counted
> from the beginning of the buffer, so very fast at the front, and
> very slow near the end.
>
> Code and details here:
>
> https://gist.github.com/jdtsmith/7fa6263a13559d587abb51827e6ae472

I’m not entirely surprised. The parse tree that tree-sitter generates
is a DAG in which each parent node has pointers to its child nodes,
but not the other way around. That means that to go from a child node
to its parent, what tree-sitter actually does is walk down from the
root node until it reaches the parent. This process is linear in the
height of the tree.

Getting the node at point isn’t free either. To get the node at
point, we iterate over the children of the root node, starting from
the first, until we reach one that contains point, then iterate over
that node’s children in the same way, and so on, until we reach a
leaf node. So log(N) time complexity is expected.

These are fundamental limits of tree-sitter, unless it changes its
data structure. I’m not too worried though, because IIRC the absolute
time is very short. The 100x variability doesn’t mean much if the
slowest case is still very fast.

Yuan
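
As a rough illustration of the two traversals described in this message, here is a toy Python model. It is only a sketch under stated assumptions, not tree-sitter's or Emacs's actual implementation: `Node`, `node_at`, and `parent_of` are hypothetical names, and the only property carried over from the message is that nodes point to their children and not to their parent.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Node:
    """Toy syntax node (hypothetical): has children, but no parent pointer."""
    kind: str
    start: int
    end: int  # exclusive
    children: List["Node"] = field(default_factory=list)


def node_at(root: Node, pos: int) -> Node:
    """Descend from the root, scanning children left to right at each level."""
    node = root
    while True:
        for child in node.children:
            if child.start <= pos < child.end:
                node = child
                break
        else:
            return node  # no child covers pos, so this is the smallest node


def parent_of(root: Node, target: Node) -> Optional[Node]:
    """Find target's parent by walking down from the root."""
    node = root
    while node is not target:
        for child in node.children:
            if child is target:
                return node
            if child.start <= target.start and target.end <= child.end:
                node = child  # target lies inside this child: go one level down
                break
        else:
            return None  # target is not in this tree
    return None  # target is the root, which has no parent


# A tiny tree: module -> [statement -> [identifier, identifier], statement]
leaf_a = Node("identifier", 0, 3)
leaf_b = Node("identifier", 4, 7)
stmt = Node("statement", 0, 7, [leaf_a, leaf_b])
later = Node("statement", 8, 12)
root = Node("module", 0, 12, [stmt, later])

# Walk from the node at position 5 up to the root, one parent lookup per step.
chain = []
node = node_at(root, 5)
while node is not None:
    chain.append(node.kind)
    node = parent_of(root, node)
```

In this model every call to `parent_of` restarts from the root, so walking all the way up from a node repeats the descent once per level. Each descent also scans siblings from the first child, which is one plausible reason positions later in a file could cost more, though the toy model does not establish that for real tree-sitter trees.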