From: Stephen Leake
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Re: Tree-sitter api
Date: Tue, 24 Aug 2021 07:59:09 -0700
Message-ID: <86bl5m51oi.fsf@stephe-leake.org>
References: <83r1f7hydn.fsf@gnu.org> <95F37923-5BF9-4D81-B361-267CF119FBCA@gmail.com> <735AF34C-FD18-4A6A-A99D-E5D8EB4DE4F3@gmail.com> <86im02bobr.fsf@stephe-leake.org> <8B9072FF-02F5-48CC-844A-1E0C2FC42CE9@gmail.com>
In-Reply-To: <8B9072FF-02F5-48CC-844A-1E0C2FC42CE9@gmail.com> (Yuan Fu's message of "Sun, 22 Aug 2021 23:51:08 -0700")
To: Yuan Fu
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier, Clément Pit-Claudel, emacs-devel

Yuan Fu writes:

>>
>> ada-mode takes the approach of embedding the indent rules directly in
>> the grammar, and the functions that do that provide a few more options
>> than yours. To see the definition of those functions, you'll have to
>> install the wisi package, and look in wisi.info, section Grammar
>> actions. (it would be nice if that info/html file was linked from the
>> GNU ELPA package page; I'll start a new thread for that).
>
> I had a cursory look at the manual for indent in wisi and have some
> questions. Why does wisi indent from “low-level productions”?

The indent of every new-line must be specified; low level productions
can contain new-lines.

> (I think most indentation engine works line-by-line from the first
> line.) I don’t know much about how wisi works, but the indentation
> system seems to stem from circumstances quite different from that of
> tree-sitter.
> For example, wisi’s indent is devised alongside the
> grammar definition, while for tree-sitter, all the hard work of
> defining grammar is done for me and I’m merely a user of the grammar:
> that makes indenting with tree-sitter a much simpler job.

The Ada grammar is taken from the Ada Reference Manual; the indent
information is added after.

The indent information could be in a separate file, as in tree-sitter
(wisitoken does not currently support this; there would need to be a way
to specify which production the indent rule is associated with).

A tree-sitter based indent engine still has to specify the indent of
every new-line; it's the same amount of information.

Taking the examples from your email:

> ((match nil "function_definition" "body") parent 0)
> means “match the node which it’s parent’s type is
> “function_definition” and its field name is “body”, indent to the
> start of its parent. That indents the starting braces in
> int main ()
> {
> }

Referring to the tree-sitter-c grammar at
https://github.com/tree-sitter/tree-sitter-c/blob/master/grammar.js,
there is a C grammar production (in tree-sitter syntax):

    function_definition: $ => seq(
      optional($.ms_call_modifier),
      $._declaration_specifiers,
      field('declarator', $._declarator),
      field('body', $.compound_statement)
    ),

In wisitoken syntax, this is:

    function_definition
      : [ms_call_modifier] declaration_specifiers declarator=declarator body=compound_statement

(the current wisi user guide does not define the "=" syntax for
declaring token names, but it is supported; I'll add it to the user
guide)

The indent rule specifies the indent of the field named 'body', relative
to the start of the production. So in wisitoken, this would specify one
component of the indent action for this production:

    {(wisi-indent-action [nil nil nil (body . 0)])}

Presumably there are other rules that specify the indent of the other
tokens in that production, so they would not be 'nil', which in
wisitoken means "undefined"; it is an error for any new-line to have an
undefined indent after all indent actions are applied.

Next example:

    ((parent-is "call_expression") parent 2)

The production is:

    call_expression: $ => prec(PREC.CALL, seq(
      field('function', $._expression),
      field('arguments', $.argument_list)
    )),

In wisitoken syntax (note that wisitoken does not support precedence
declarations (yet)):

    call_expression
      : function=expression arguments=argument_list
        {(wisi-indent-action [nil (arguments . 2)])}

So your syntax for indent is much more verbose than the wisi syntax
(because each token gets a separate rule), but specifies the same
information.

Your syntax also requires naming each token that is referenced in an
indent rule; wisitoken can use token position to do that, which is the
main reason indent is specified directly in the grammar file; it's very
easy to associate each indent expression with the corresponding token,
without having to make up names for the tokens.

Here are the above wisitoken productions without the token names:

    function_definition
      : [ms_call_modifier] declaration_specifiers declarator compound_statement
        {(wisi-indent-action [nil nil nil 0])}

    call_expression
      : expression argument_list
        {(wisi-indent-action [nil 2])}

To be fair, we'd have to look at the other types of rules, to see if
this pattern holds up.

I think you were biased by the "matching" rules tree-sitter supports.
That approach is reasonable when you only want to specify information
for a few nodes in the tree. Wisi assumes you want to specify indent
information for most of the nodes in the tree, so it supports a
tree-traversal model instead. Tree-sitter does support tree traversal,
but doesn't provide an easy way to add information for each node, as the
wisi indent-action syntax does.

--
-- Stephe
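
To illustrate the claim that the two rule styles carry the same
information, here is a minimal Python sketch. It is a toy model only:
the data structures and the helper names (match_rules_to_vectors,
undefined_slots) are hypothetical and are not part of wisi, wisitoken or
tree-sitter. It also follows the reading used in this message, where the
"call_expression" rule is attached to the 'arguments' token.

```python
# Toy model of the two indent-rule styles compared in this message.
# All names here are hypothetical; nothing is a wisi or tree-sitter API.

# Each production lists its tokens in order, by field name.
PRODUCTIONS = {
    "function_definition": [
        "ms_call_modifier", "declaration_specifiers", "declarator", "body",
    ],
    "call_expression": ["function", "arguments"],
}

# Matching style: one rule per (parent type, field name), giving the
# offset from the start of the parent.
MATCH_RULES = [
    ("function_definition", "body", 0),
    ("call_expression", "arguments", 2),
]

# Positional style: one vector per production, one slot per token.
# None plays the role of 'nil' (undefined indent).
POSITIONAL = {
    "function_definition": [None, None, None, 0],
    "call_expression": [None, 2],
}


def match_rules_to_vectors(productions, rules):
    """Fold per-token matching rules into one indent vector per production."""
    vectors = {name: [None] * len(fields) for name, fields in productions.items()}
    for parent, field, offset in rules:
        position = productions[parent].index(field)
        vectors[parent][position] = offset
    return vectors


def undefined_slots(productions, vectors):
    """List (production, field) pairs whose indent is still undefined."""
    return [
        (name, field)
        for name, fields in productions.items()
        for field, offset in zip(fields, vectors[name])
        if offset is None
    ]


if __name__ == "__main__":
    vectors = match_rules_to_vectors(PRODUCTIONS, MATCH_RULES)
    print(vectors == POSITIONAL)
    print(undefined_slots(PRODUCTIONS, vectors))
```

The positional form needs no token names because the slot index
identifies the token; the matching form needs the names to find the same
slot. The undefined_slots helper mirrors the check described above,
where any indent left undefined after all actions are applied is treated
as an error.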