From: Stephen Leake
To: Yuan Fu
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier,
    Clément Pit-Claudel, emacs-devel
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter api
Date: Wed, 18 Aug 2021 11:27:36 -0700
Message-ID: <86im02bobr.fsf@stephe-leake.org>
In-Reply-To: <735AF34C-FD18-4A6A-A99D-E5D8EB4DE4F3@gmail.com> (Yuan Fu's
    message of "Mon, 16 Aug 2021 23:18:19 -0700")

This looks very interesting, but I have a migraine right now, so I'll
have to look at it later.

You could try writing indent rules for Ada; current ada-mode code is in
https://savannah.nongnu.org/git/?group=ada-mode. See the test/ directory
for examples of known good indentation.

ada-mode takes the approach of embedding the indent rules directly in
the grammar, and the functions that do that provide a few more options
than yours. To see the definition of those functions, you'll have to
install the wisi package, and look in wisi.info, section Grammar
actions.

(It would be nice if that info/html file were linked from the GNU ELPA
package page; I'll start a new thread for that.)

Yuan Fu writes:

>>
>> I'm thinking of rules specified via a function that takes a TS node
>> (from which the function can explore the rest of the TS tree) and returns
>> the indentation to use, represented as a pair (POSITION . OFFSET)
>> (meaning to indent OFFSET columns further than the column position of
>> POSITION).
>>
>> The infrastructure would limit itself to making sure we have an up-to-date
>> tree (computed from a properly widened buffer), find the node
>> corresponding to point, pass it to the function and then turn the return
>> value into an actual column and indent the text accordingly (paying
>> attention to the usual difference between when point is "within the
>> indentation" vs "within the text").
>
> Okay, here is the (ad-hoc) infrastructure I came up with:
>
> We have a tree-sitter-simple-indent-function. Major-mode authors can
> set indent-line-function to it to use the simple-indent system.
> tree-sitter-simple-indent-function indents according to
> tree-sitter-simple-indent-rules. The doc string of
> tree-sitter-simple-indent-rules reads:
>
>     A list of indent rule settings.
>     Each indent rule setting should be (LANGUAGE . RULES),
>     where LANGUAGE is a language symbol, and RULES is a list of
>     (MATCHER ANCHOR OFFSET).
>
>     MATCHER determines whether this rule applies, ANCHOR and OFFSET
>     together determine which column to indent to.
>
>     A MATCHER is a function that takes three arguments (NODE PARENT
>     BOL). NODE is the largest (highest-in-tree) node starting at
>     point. PARENT is the parent of NODE. BOL is the point where we
>     are indenting: the beginning of line content, the position of the
>     first non-whitespace character.
>
>     If MATCHER returns non-nil, meaning the rule matches, Emacs then
>     uses ANCHOR to find an anchor, which should be a function that takes
>     the same arguments (NODE PARENT BOL) and returns a point.
>
>     Finally Emacs computes the column of that point returned by ANCHOR
>     and adds OFFSET to it, and indents the line to that column.
>
>     For MATCHER and ANCHOR, Emacs provides some convenient presets.
>     See `tree-sitter-simple-indent-presets'.
>
> And the doc string for tree-sitter-simple-indent-presets:
>
>     A list of presets.
>     These presets can be used as MATCHER and ANCHOR in
>     `tree-sitter-simple-indent-rules'.
>
>     MATCHER:
>
>     (match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
>
>         NODE-TYPE checks for node's type, PARENT-TYPE checks for
>         parent's type, NODE-FIELD checks for the field name of node
>         in the parent, NODE-INDEX-MIN and NODE-INDEX-MAX check for
>         the node's index in the parent. Therefore, to match the
>         first child where parent is \"argument_list\", use (match nil
>         \"argument_list\" nil nil 0 0).
>
>     no-node
>
>         Matches the case where node is nil, i.e., there is no node
>         that starts at point. This is the case when indenting an
>         empty line.
>
>     (node-at-point TYPE NAMED)
>
>         Check that the node at point -- not the largest node starting at
>         point -- has type TYPE. If NAMED is non-nil, check the named node
>         at point.
>
>     (parent-is TYPE)
>
>         Check that the parent has type TYPE.
>
>     (node-is TYPE)
>
>         Checks that the node has type TYPE.
>
>     (parent-match PATTERN)
>
>         Checks that the parent matches PATTERN, a query pattern.
>
>     (node-match PATTERN)
>
>         Checks that the node matches PATTERN, a query pattern.
>
>     ANCHOR:
>
>     first-child
>
>         Find the first child of the parent.
>
>     parent
>
>         Find the parent.
>
>     prev-sibling
>
>         Find node's previous sibling.
>
>     no-indent
>
>         Do nothing.
>
>     prev-line
>
>         Find the named node on previous line. This can be used when
>         indenting an empty line: just indent like the previous node.
>
> An example of using this facility can be found in
> ts-c-tree-sitter-indent-rules.
>
> For example,
>
>     ((match nil "function_definition" "body") parent 0)
>
> means: match the node whose parent's type is "function_definition" and
> whose field name is "body", and indent to the start of its parent.
> That indents the starting braces in
>
>     int main ()
>     {
>     }
>
>     ((parent-is "call_expression") parent 2)
>
> means: match the node whose parent's type is "call_expression", and
> indent to the start of its parent + 2.
> That indents the second line in
>
>     my_cool_function
>       (arg1, arg2, arg3)
>
> I've implemented some indentation rules for C in ts-c-mode as usual. I
> expect someone more knowledgeable in C to actually implement it later.
>
> So... do you think this is ok, or convoluted? In particular, is there a
> better way to implement those "presets"? I don't want to define them as
> normal functions, because then their names would be super long
> (parent-is -> tree-sitter-simple-indent-parent-is) and annoying to use
> when writing rules, but putting them in an alist
> (tree-sitter-simple-indent-presets) is a bit ad-hoc. I call these
> presets with tree-sitter--simple-apply, which basically looks up
> tree-sitter-simple-indent-presets, gets the function and applies it.
>
> You can find the latest version at https://github.com/casouri/emacs/tree/ts
> I.e., git clone https://github.com/casouri/emacs.git --branch ts
>
> Yuan
>
>

-- 
-- Stephe
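
For readers following the thread, here is a rough sketch of the rule
lookup described in the quoted message: walk a list of (MATCHER ANCHOR
OFFSET) rules, resolve preset names through an alist, and add OFFSET to
the column of the position that ANCHOR returns. It is not the code on
the ts branch. Every sketch- name is made up for illustration, nodes
are faked as plists so no tree-sitter library is needed, and taking the
first matching rule is an assumption on my part.

```elisp
;; Hypothetical sketch: every `sketch-' name is invented for this
;; example, and nodes are plain plists rather than tree-sitter nodes.
(require 'cl-lib)

(defun sketch--parent-is (_node parent _bol type)
  "MATCHER preset: non-nil if PARENT has type TYPE."
  (and parent (equal (plist-get parent :type) type)))

(defun sketch--parent-start (_node parent _bol)
  "ANCHOR preset: return the buffer position where PARENT starts."
  (plist-get parent :start))

(defvar sketch-presets
  '((parent-is . sketch--parent-is)
    (parent    . sketch--parent-start))
  "Alist mapping short preset names to the functions implementing them.")

(defun sketch-apply (form node parent bol)
  "Call the preset named by FORM with NODE, PARENT and BOL.
FORM is either a preset symbol or a list (NAME ARGS...)."
  (let* ((name (if (consp form) (car form) form))
         (args (if (consp form) (cdr form) nil))
         (fn (or (cdr (assq name sketch-presets))
                 (error "Unknown preset: %S" name))))
    (apply fn node parent bol args)))

(defun sketch-indent-column (rules node bol)
  "Return the column chosen by the first rule in RULES that matches.
Each rule is (MATCHER ANCHOR OFFSET).  Return nil if no rule matches."
  (let ((parent (plist-get node :parent)))
    (cl-loop for (matcher anchor offset) in rules
             when (sketch-apply matcher node parent bol)
             return (save-excursion
                      (goto-char (sketch-apply anchor node parent bol))
                      (+ (current-column) offset)))))

(defun sketch-demo ()
  "Compute the indentation of the argument list in a two-line call."
  (with-temp-buffer
    (insert "    my_cool_function\n(arg1, arg2, arg3)")
    (goto-char (point-min))
    (back-to-indentation)
    (let* ((call (list :type "call_expression" :start (point)))
           (bol (progn (forward-line 1) (point)))
           (arglist (list :type "argument_list" :start bol :parent call)))
      (sketch-indent-column '(((parent-is "call_expression") parent 2))
                            arglist bol))))

(sketch-demo) ; => 6: the call starts at column 4, plus OFFSET 2
```

Keeping the presets as data in an alist lets rules stay short
(parent-is rather than a long prefixed function name), at the cost of
one extra lookup per call; that is the trade-off raised at the end of
the quoted message.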