From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter api
Date: Mon, 16 Aug 2021 23:18:19 -0700
Message-ID: <735AF34C-FD18-4A6A-A99D-E5D8EB4DE4F3@gmail.com>
References: <83r1f7hydn.fsf@gnu.org> <95F37923-5BF9-4D81-B361-267CF119FBCA@gmail.com>
To: Stefan Monnier
Cc: Stephen Leake, Eli Zaretskii, Theodor Thornhill, Clément Pit-Claudel, emacs-devel

> I'm thinking of rules specified via a function that takes a TS node
> (from which the function can explore the rest of the TS tree) and return
> the indentation to use, represented as a pair (POSITION . OFFSET)
> (meaning to indent OFFSET columns further than the column position of
> POSITION).
>
> The infrastructure would limit itself to making sure we have an uptodate
> tree (computed from a properly widened buffer), find the node
> corresponding to point pass it to the function and then turn the return
> value into an actual column and indent the text accordingly (paying
> attention to the usual difference between when point is "within the
> indentation" vs "within the text").

Okay, here is the (ad-hoc) infrastructure I came up with:

We have a tree-sitter-simple-indent-function. Major-mode authors can set
indent-line-function to it to use the simple-indent system.
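For reference, the infrastructure described in the quoted proposal (get a (POSITION . OFFSET) pair back from a rule, turn it into a column, re-indent, and treat point differently depending on whether it was inside the indentation or inside the text) can be sketched outside Emacs. This is a hypothetical Python model, not Emacs code: `indent_line` and the `rule` callback are made-up names, the rule receives the position of the line's first non-blank character as a stand-in for a tree-sitter node, indentation is assumed to be spaces only, and POSITION is assumed to lie on an earlier line.

```python
from typing import Callable, Tuple


def indent_line(text: str, point: int,
                rule: Callable[[int], Tuple[int, int]]) -> Tuple[str, int]:
    """Re-indent the line containing POINT; return (new_text, new_point).

    Hypothetical model: RULE gets the position of the line's first
    non-blank character (standing in for a tree-sitter node) and returns
    (POSITION, OFFSET), meaning "OFFSET columns past POSITION's column".
    """
    bol = text.rfind("\n", 0, point) + 1           # start of the current line
    eol = text.find("\n", point)
    if eol == -1:
        eol = len(text)
    line = text[bol:eol]
    old_indent = len(line) - len(line.lstrip(" "))  # spaces only, no tabs
    content_start = bol + old_indent                # first non-blank character

    pos, offset = rule(content_start)
    # Column of POSITION (assumed to be on an earlier line), plus OFFSET.
    column = pos - (text.rfind("\n", 0, pos) + 1) + offset

    new_text = text[:bol] + " " * column + text[content_start:]
    if point <= content_start:
        # Point was within the indentation: leave it at the first non-blank char.
        new_point = bol + column
    else:
        # Point was within the text: keep it on the same character.
        new_point = point + (column - old_indent)
    return new_text, new_point
```

For example, `indent_line("foo (a,\nb)", 8, lambda bol: (4, 1))` aligns `b` one column past the `(` and returns `("foo (a,\n     b)", 13)`, with point moved onto `b` because it started in the indentation.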

tree-sitter-simple-indent-function indents according to
tree-sitter-simple-indent-rules. The doc string of
tree-sitter-simple-indent-rules reads:

    A list of indent rule settings.

    Each indent rule setting should be (LANGUAGE . RULES),
    where LANGUAGE is a language symbol, and RULES is a list of
    (MATCHER ANCHOR OFFSET).

    MATCHER determines whether this rule applies; ANCHOR and OFFSET
    together determine which column to indent to.

    A MATCHER is a function that takes three arguments (NODE PARENT
    BOL). NODE is the largest (highest-in-tree) node starting at
    point. PARENT is the parent of NODE. BOL is the point where we
    are indenting: the beginning of line content, the position of
    the first non-whitespace character.

    If MATCHER returns non-nil, meaning the rule matches, Emacs then
    uses ANCHOR to find an anchor. ANCHOR should be a function that
    takes the same arguments (NODE PARENT BOL) and returns a point.

    Finally, Emacs computes the column of the point returned by
    ANCHOR, adds OFFSET to it, and indents the line to that column.

    For MATCHER and ANCHOR, Emacs provides some convenient presets.
    See `tree-sitter-simple-indent-presets'.

And the doc string for tree-sitter-simple-indent-presets:

    A list of presets.
    These presets can be used as MATCHER and ANCHOR in
    `tree-sitter-simple-indent-rules'.

    MATCHER:

    (match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)

      NODE-TYPE checks for the node's type, PARENT-TYPE checks for
      the parent's type, NODE-FIELD checks for the field name of the
      node in the parent, and NODE-INDEX-MIN and NODE-INDEX-MAX check
      for the node's index in the parent. Therefore, to match the
      first child where the parent is "argument_list", use

        (match nil "argument_list" nil nil 0 0).

    no-node

      Matches the case where node is nil, i.e., there is no node
      that starts at point. This is the case when indenting an
      empty line.

    (node-at-point TYPE NAMED)

      Check that the node at point -- not the largest node starting
      at point -- has type TYPE.
      If NAMED is non-nil, check the named node at point.

    (parent-is TYPE)

      Check that the parent has type TYPE.

    (node-is TYPE)

      Check that the node has type TYPE.

    (parent-match PATTERN)

      Check that the parent matches PATTERN, a query pattern.

    (node-match PATTERN)

      Check that the node matches PATTERN, a query pattern.

    ANCHOR:

    first-child

      Find the first child of the parent.

    parent

      Find the parent.

    prev-sibling

      Find the node's previous sibling.

    no-indent

      Do nothing.

    prev-line

      Find the named node on the previous line. This can be used when
      indenting an empty line: just indent like the previous node.

An example of using these facilities can be found in
ts-c-tree-sitter-indent-rules. For example,

    ((match nil "function_definition" "body") parent 0)

means: match the node whose parent's type is "function_definition" and
whose field name is "body", and indent to the start of its parent. That
indents the opening brace in

    int main ()
    {
    }

And

    ((parent-is "call_expression") parent 2)

means: match the node whose parent's type is "call_expression", and
indent to the start of its parent + 2. That indents the second line in

    my_cool_function
      (arg1, arg2, arg3)

As usual, I've implemented some indentation rules for C in ts-c-mode. I
expect someone more knowledgeable about C to implement them properly
later.

So… do you think this is OK, or convoluted? In particular, is there a
better way to implement those "presets"? I don't want to define them as
normal functions, because then their names would be super long
(parent-is -> tree-sitter-simple-indent-parent-is) and annoying to use
when writing rules, but putting them in an alist
(tree-sitter-simple-indent-presets) is a bit ad hoc. I call these
presets with tree-sitter--simple-apply, which basically looks up
tree-sitter-simple-indent-presets, gets the function, and applies it.
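To make the MATCHER/ANCHOR/OFFSET dispatch and the preset lookup concrete, here is a small Python model of it. It is only a sketch under stated assumptions, not the branch's code: `Node`, `PRESETS`, `resolve`, and `indent_column` are hypothetical names, `resolve` plays roughly the role described for tree-sitter--simple-apply, only a few presets are modeled (the `match` preset without its index checks), positions are 0-based offsets without tabs, and the first matching rule is assumed to win.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Node:
    """Hypothetical stand-in for a tree-sitter node (not the Emacs API)."""
    type: str
    start: int                        # buffer offset where the node begins
    parent: Optional["Node"] = None
    field_name: Optional[str] = None  # this node's field name in its parent


def _match(node_type=None, parent_type=None, node_field=None):
    """Like the (match ...) preset; None is a wildcard. Index checks omitted."""
    def matcher(node, parent, bol):
        return ((node_type is None
                 or (node is not None and node.type == node_type))
                and (parent_type is None
                     or (parent is not None and parent.type == parent_type))
                and (node_field is None
                     or (node is not None and node.field_name == node_field)))
    return matcher


# Preset table, analogous to the tree-sitter-simple-indent-presets alist.
PRESETS = {
    "match": _match,
    "parent-is": lambda t: lambda node, parent, bol: (
        parent is not None and parent.type == t),
    "node-is": lambda t: lambda node, parent, bol: (
        node is not None and node.type == t),
    "no-node": lambda node, parent, bol: node is None,
    "parent": lambda node, parent, bol: parent.start,
}


def resolve(spec) -> Callable:
    """Turn a rule element into a function via the preset table."""
    if callable(spec):
        return spec                   # a user-supplied function is used as-is
    if isinstance(spec, str):
        return PRESETS[spec]          # bare preset such as "parent" or "no-node"
    name, *args = spec                # parameterized preset, e.g. ("parent-is", "x")
    return PRESETS[name](*args)


def column_at(text: str, pos: int) -> int:
    """Column of the 0-based offset POS (assumes no tabs)."""
    return pos - (text.rfind("\n", 0, pos) + 1)


def indent_column(text, rules, node, parent, bol) -> Optional[int]:
    """Target column from the first matching (MATCHER ANCHOR OFFSET) rule."""
    for matcher, anchor, offset in rules:
        if resolve(matcher)(node, parent, bol):
            return column_at(text, resolve(anchor)(node, parent, bol)) + offset
    return None                       # no rule matched
```

With `text = "my_cool_function\n  (arg1, arg2, arg3)"`, a `call_expression` node at 0 and its `argument_list` child at 19, the rule `(("parent-is", "call_expression"), "parent", 2)` yields column 2, mirroring the second example above. Keeping the presets in one table lets rules use short names without defining long, globally named functions, which is the trade-off raised in the question.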

You can find the latest version at https://github.com/casouri/emacs/tree/ts,
i.e.,

    git clone https://github.com/casouri/emacs.git --branch ts

Yuan