From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Update on tree-sitter structure navigation
Date: Fri, 1 Sep 2023 22:01:54 -0700
Message-ID: <5E7F2A94-4377-45C0-8541-7F59F3B54BA1@gmail.com>
To: emacs-devel
Cc: Danny Freeman, Theodor Thornhill, Jostein Kjønigsen, Randy Taylor,
    Wilhelm Kirschbaum, Perry Smith, Dmitry Gutov

Hey guys,

In the months after wrapping up the tree-sitter work in emacs-29, I was
thinking about how to implement structural navigation and how to
extract information from the parser with tree-sitter. In emacs-29 we
have things like treesit-beginning/end-of-defun and treesit-defun-name.
I was thinking maybe we can generalize this to support getting an
arbitrary “thing” at point, moving around them, and getting information
like the name of a defun, its arglist, the parent of a class, the type
of a variable declaration, etc., in a language-agnostic way.

Also, at the time, we only supported defining things by a regexp
matching a node’s type, which is often not enough.

And it would be nice to somehow take advantage of tree-sitter queries
for the features I mentioned above.
Tree-sitter queries are what every other editor uses for virtually all
tree-sitter-related features. But in Emacs, we mostly use them only for
font-lock.

Here’s the progress as of now:

- Functions like treesit-search-forward, treesit-induce-sparse-tree,
  treesit-thing-at-point, treesit--navigate-thing, etc., support a
  richer set of predicates now. Besides a regexp matching the node
  type, the predicate can also be a predicate function, or
  (REGEXP . FUNC), or a compound predicate like (or PRED PRED) or
  (not PRED).

- There’s now a variable treesit-thing-settings, which holds the
  definitions of things. Instead of passing the predicate to the
  functions I mentioned above, you can save the predicate in
  treesit-thing-settings under a symbol, say ‘sexp’, and pass the
  symbol instead, just like thing-at-point.el. (We’ll work on
  integrating with thing-at-point.el later.)

- I can’t think of a good way to integrate tree-sitter queries with the
  navigation functions we have right now. Most importantly, a
  tree-sitter query always searches top-down, and you can’t limit the
  depth it searches. OTOH, our navigation functions work by traversing
  the tree node-to-node.

- There’s no progress on getting information like name and type, etc.,
  in a language-agnostic way. I haven’t come up with a good interface
  and/or implementation. I encourage interested folks to give it some
  thought.
  Bonus points for reusing the query files the neovim folks have
  accumulated :-)

Some other things on the TODO list that people can take a stab at:

- Query-based indentation (neovim’s implementation can be a source of
  inspiration).

- Improve c-ts-mode (indentation styles, other cc-mode features, etc.)
  and other tree-sitter modes.

- Solve the grammar versioning/breaking-change problem: tree-sitter
  grammars don’t have a version number, so every time the author
  changes the grammar, our queries break, and loading the mode only
  produces a giant error.

- Major mode fallback/inheritance: this has been discussed many times,
  but no good solution has emerged.

- Isolated ranges. For many embedded languages, each block should be
  independent of the others, but currently all the embedded blocks are
  connected together and parsed by a single parser. We probably need to
  spawn a parser for each block. I’ll probably work on this one next.

Finally, feel free to send me an email, or send to emacs-devel and CC
me, if there are things treesit.c and treesit.el can do better, or if
there are nice things in neovim and other editors that Emacs ought to
have, too.

Yuan
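
P.S. To make the predicate forms from the first two progress items
concrete, here is a toy model in Python. It is only a sketch of the
matching semantics, not Emacs Lisp and not how treesit.el implements
anything. Node, Or, Not, Thing, THINGS, matches and find_all are all
hypothetical names invented for the illustration; THINGS merely plays
the role that treesit-thing-settings plays, and the node types are
made up rather than taken from a real grammar.

```python
import re
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    """Hypothetical stand-in for a parse-tree node: a type plus children."""
    type: str
    children: List["Node"] = field(default_factory=list)


class Or:
    """Compound predicate, modelled on (or PRED PRED ...)."""
    def __init__(self, *preds):
        self.preds = preds


@dataclass
class Not:
    """Compound predicate, modelled on (not PRED)."""
    pred: object


@dataclass
class Thing:
    """Reference to a named thing, modelled on passing a symbol."""
    name: str


def matches(node, pred, things):
    """Return True if NODE satisfies PRED; THINGS maps names to predicates."""
    if isinstance(pred, str):            # regexp matched against the node type
        return re.search(pred, node.type) is not None
    if isinstance(pred, Thing):          # look the saved predicate up by name
        return matches(node, things[pred.name], things)
    if isinstance(pred, Or):
        return any(matches(node, p, things) for p in pred.preds)
    if isinstance(pred, Not):
        return not matches(node, pred.pred, things)
    if isinstance(pred, tuple):          # (REGEXP, FUNC): both must agree
        regexp, func = pred
        return re.search(regexp, node.type) is not None and bool(func(node))
    if callable(pred):                   # plain predicate function
        return bool(pred(node))
    raise TypeError(f"unsupported predicate: {pred!r}")


def find_all(root, pred, things):
    """Collect matching nodes by walking the tree node-to-node, depth-first."""
    found = [root] if matches(root, pred, things) else []
    for child in root.children:
        found.extend(find_all(child, pred, things))
    return found


# Hypothetical thing definitions, saved under names and reused below.
THINGS = {
    "defun": r"function_definition",
    "sexp": Not(r"^[{}(),;]$"),
    "long-defun": (r"function_definition", lambda n: len(n.children) >= 3),
    "defun-or-class": Or(Thing("defun"), r"class_definition"),
}

tree = Node("module", [
    Node("function_definition",
         [Node("identifier"), Node("parameters"), Node("block")]),
    Node("class_definition", [
        Node("identifier"),
        Node("block", [
            Node("function_definition", [Node("identifier"), Node("block")]),
        ]),
    ]),
    Node(";"),
])

if __name__ == "__main__":
    for name in THINGS:
        hits = find_all(tree, Thing(name), THINGS)
        print(name, [n.type for n in hits])
```

Callers pass Thing("defun") rather than the predicate itself, which is
the same idea as saving a predicate under a symbol and passing the
symbol. The walk in find_all visits nodes one by one, so a caller
could stop at any depth, which is the property a top-down query
search does not give us.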