From mboxrd@z Thu Jan 1 00:00:00 1970
From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Tree-sitter api
Date: Thu, 26 Aug 2021 22:18:08 -0700
Message-ID: <3A182DFA-0CCA-4F2A-AE55-E829E76D717F@gmail.com>
References: <83r1f7hydn.fsf@gnu.org> <95F37923-5BF9-4D81-B361-267CF119FBCA@gmail.com> <735AF34C-FD18-4A6A-A99D-E5D8EB4DE4F3@gmail.com> <86im02bobr.fsf@stephe-leake.org> <8B9072FF-02F5-48CC-844A-1E0C2FC42CE9@gmail.com> <86bl5m51oi.fsf@stephe-leake.org>
In-Reply-To: <86bl5m51oi.fsf@stephe-leake.org>
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.13\))
Content-Type: text/plain; charset=utf-8
To: Stephen Leake
Cc: Eli Zaretskii, Theodor Thornhill, Stefan Monnier, Clément Pit-Claudel, emacs-devel
Xref: news.gmane.io gmane.emacs.devel:273113

Thank you very much for spending time on this :-)

> On Aug 24, 2021, at 7:59 AM, Stephen Leake wrote:
>
> Yuan Fu writes:
>
>>>
>>> ada-mode takes the approach of embedding the indent rules directly in
>>> the grammar, and the functions that do that provide a few more options
>>> than yours. To see the definition of those functions, you'll have to
>>> install the wisi package, and look in wisi.info, section Grammar
>>> actions. (it would be nice if that info/html file was linked from the
>>> GNU ELPA package page; I'll start a new thread for that).
>>
>> I had a cursory look at the manual for indent in wisi and have some
>> questions. Why does wisi indent from “low-level productions”?
>
> The indent of every new-line must be specified; low level productions
> can contain new-lines.

Ah, I see, what I did is to find the “largest” node that starts at BOL,
and try to match that. IIUC, wisi starts from the “smallest” entity, and
goes up (by getting its parent repeatedly) until there is a non-nil
indent rule for it?
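
The two ways of picking a node described above can be sketched outside Emacs. The following is a minimal Python illustration, not code from either package: the `Node` class, the hand-picked buffer positions, and the `rules` table are all hypothetical, and the second function only mirrors the description in the question (climb until some node has a rule), not wisi's actual implementation.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    kind: str                        # node type, e.g. "call_expression"
    start: int                       # buffer position where the node begins
    parent: Optional["Node"] = None


def child(parent: Node, kind: str, start: int) -> Node:
    """Create a node and link it to its parent."""
    return Node(kind, start, parent)


def largest_node_starting_at(leaf: Node) -> Node:
    """Climb while the parent begins at the same position as LEAF."""
    node = leaf
    while node.parent is not None and node.parent.start == leaf.start:
        node = node.parent
    return node


def nearest_node_with_rule(leaf: Node, rules: Dict[str, int]) -> Optional[Node]:
    """Climb from the smallest node until a node kind has a rule."""
    node: Optional[Node] = leaf
    while node is not None and node.kind not in rules:
        node = node.parent
    return node


# Hand-built tree with made-up positions (an assumption for illustration).
root = Node("translation_unit", 0)
func = child(root, "function_definition", 0)
body = child(func, "compound_statement", 11)
stmt = child(body, "expression_statement", 17)
call = child(stmt, "call_expression", 17)
callee = child(call, "identifier", 17)
args = child(call, "argument_list", 20)
arg = child(args, "identifier", 30)

# Hypothetical rule table: node kind -> indent offset.
rules = {"compound_statement": 2, "argument_list": 2}

print(largest_node_starting_at(callee).kind)
print(nearest_node_with_rule(arg, rules).kind)
print(nearest_node_with_rule(callee, rules).kind)
```

Starting from the callee identifier, the first function stops at the statement node because the enclosing block begins earlier in the buffer; the second function stops at whichever enclosing node first appears in the rule table.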
[snip]

> So your syntax for indent is much more verbose than the wisi syntax
> (because each token gets a separate rule), but specifies the same
> information.
>
> Your syntax also requires naming each token that is referenced in an
> indent rule; wisitoken can use token position to do that, which is the
> main reason indent is specified directly in the grammar file; it's very
> easy to associate each indent expression with the corresponding token,
> without having to make up names for the tokens.
> Here are the above
> wisitoken productions without the token names:
>
> function_definition : [ms_call_modifier] declaration_specifiers
>   declarator compound_statement
>   {(wisi-indent-action [nil nil nil 0])}
>
> call_expression : expression argument_list
>   {(wisi-indent-action [nil 2])}
>
> To be fair, we'd have to look at the other types of rules, to see if
> this pattern holds up.

I tried, and all rules can be translated into wisi’s style. However, the
result is as verbose as the previous one. My idea is to write out match
patterns (similar to those in wisi) and give names to the interesting
ones (so we use names as opposed to positions). Then, if any matched
node happens to be the node at point, use that node’s corresponding
indent rule to indent. In the indent rule, we can refer to other matched
nodes. For example, in the indent rule of list_rest, the anchor is
list_first.

Maybe there are better ways to implement this, but at its current stage
I don’t think this is better than tree-sitter-simple-indent.

I think part of the reason why wisi’s indent rules can be succinct is
that they are written alongside the grammar definition. It is hard to
make tree-sitter’s indent rules as succinct while keeping them easy to
understand.
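
To make the name-based idea concrete, here is a minimal Python sketch of how a rule of the form (capture name, anchor, offset) might be resolved to a column. Everything in it is an assumption for illustration: the `Node` class, the `indent_column` helper, treating the anchor `parent` as "the parent of the matched node", and resolving an anchor name to the first capture with that name (which is only adequate when the captures come from a single match).

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(eq=False)
class Node:
    kind: str
    column: int                      # column where the node starts
    parent: Optional["Node"] = None


Rule = Tuple[str, str, int]          # (capture name, anchor, offset)


def indent_column(node_at_point: Node,
                  captures: List[Tuple[str, Node]],
                  rules: List[Rule]) -> Optional[int]:
    """Return the target column for NODE_AT_POINT, or None if no rule applies."""
    # Simplification: remember only the first node seen for each capture name.
    by_name: Dict[str, Node] = {}
    for name, node in captures:
        by_name.setdefault(name, node)

    for name, node in captures:
        if node is not node_at_point:
            continue
        for capture, anchor, offset in rules:
            if capture != name:
                continue
            # Assumption: "parent" is a special anchor, any other anchor
            # names another captured node.
            anchor_node = node.parent if anchor == "parent" else by_name.get(anchor)
            if anchor_node is not None:
                return anchor_node.column + offset
    return None


# Made-up columns for an argument list such as "foo(a," followed by "b)".
arg_list = Node("argument_list", 3)
first = Node("identifier", 4, arg_list)
rest = Node("identifier", 0, arg_list)

captures = [("list_first", first), ("list_rest", rest)]
rules = [("list_first", "parent", 2), ("list_rest", "list_first", 0)]

print(indent_column(rest, captures, rules))
print(indent_column(first, captures, rules))
```

With these made-up columns, the node captured as list_rest is aligned to the column of list_first, while list_first itself is placed two columns past its parent.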
(defvar tree-sitter-query-indent-rules
  '((tree-sitter-c
     "(function_definition body: (_) @body)
      (field_declaration_list) @field_decl
      (call_expression (_) @call_child)
      (if_statement (condition) @if_cond
                    (consequence) @if_cons
                    (alternative) @if_alt
                    \"else\" @else)
      (switch_statement (condition) @switch_cond)
      (case_statement (_) @case-child) @case
      (compound_statement) @lbracket \"}\" @rbracket

      (compound_statement . (_) @list_first (_)* @list_rest)
      (initializer_list . (_) @list_first (_)* @list_rest)
      (argument_list . (_) @list_first (_)* @list_rest)
      (parameter_list . (_) @list_first (_)* @list_rest)
      (field_declaration_list . (_) @list_first (_)* @list_rest)
      "
     (body parent 0)
     (field_decl parent 0)
     (call_child parent 2)
     (if_cond parent 2)
     (if_cons parent 2)
     (if_alt parent 2)
     (switch_cond parent 2)
     (else parent 0)
     (case parent 0)
     (case-child parent 2)
     (lbracket parent 2)
     (rbracket parent 0)
     (list_first parent 2)
     (list_rest list_first 0)))
  "A list of indent rule settings.
Each indent rule setting should be

    (LANGUAGE PATTERN INDENT INDENT...)

where LANGUAGE is a language symbol, PATTERN is a query pattern
string, and each INDENT is a list

    (CAPTURE_NAME ANCHOR OFFSET)

If a captured node matches the node at point, Emacs looks for an
INDENT that has a matching CAPTURE_NAME, and uses the ANCHOR and
OFFSET of that INDENT to indent the current line.

ANCHOR should be a capture name; this capture name should capture
another node in PATTERN.  Emacs finds the column of that node, adds
OFFSET to it, and indents the current line to that column.

TODO: examples in manual")

>
> I think you were biased by the "matching" rules tree-sitter supports.
> That approach is reasonable when you only want to specify information
> for a few nodes in the tree. Wisi assumes you want to specify indent
> information for most of the nodes in the tree, so it supports a
> tree-traversal model instead.

I assumed that the indent rule for most nodes would be something basic,
like “same as previous line”, and that we only need to specify indent
rules for some “special” nodes.

IIUC, the tree-traversal method that you mentioned is like going
bottom-up, matching (in tree-sitter terms) at each level and
accumulating an indent delta for each matched indent rule, is that
right? Does wisi go all the way up to the top level?

> Tree-sitter does support tree traversal,
> but doesn't provide an easy way to add information for each node, as the
> wisi indent-action syntax does.

Yes, I would still need to use a match pattern and name each node that I
want to specify an indent delta for. There is no way to specify indent
by position in the match pattern without naming each node.

Yuan
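
For reference, the bottom-up accumulation paraphrased in the question above could look like the following Python sketch. It only illustrates that paraphrase: whether wisi actually works this way is exactly what the message asks, and the `Node` class, the `deltas` table, and the node kinds used are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(eq=False)
class Node:
    kind: str
    parent: Optional["Node"] = None


def accumulated_indent(node: Optional[Node], deltas: Dict[str, int]) -> int:
    """Walk from NODE up to the root, summing the delta of every node kind
    that has an entry in DELTAS (kinds without an entry contribute 0)."""
    total = 0
    while node is not None:
        total += deltas.get(node.kind, 0)
        node = node.parent
    return total


# Hand-built chain of ancestors, innermost node last.
func = Node("function_definition")
body = Node("compound_statement", func)
call = Node("call_expression", body)
args = Node("argument_list", call)
ident = Node("identifier", args)

# Hypothetical per-kind indent deltas.
deltas = {"compound_statement": 2, "argument_list": 2}

print(accumulated_indent(ident, deltas))
```

In this model a node nested inside both a block and an argument list picks up both deltas, whereas the capture-based rules in the defvar compute a column from a single anchor node.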