From: Stefan Monnier
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter api
Date: Tue, 24 Aug 2021 20:21:39 -0400
References: <83r1f7hydn.fsf@gnu.org> <95F37923-5BF9-4D81-B361-267CF119FBCA@gmail.com> <735AF34C-FD18-4A6A-A99D-E5D8EB4DE4F3@gmail.com>
In-Reply-To: <735AF34C-FD18-4A6A-A99D-E5D8EB4DE4F3@gmail.com> (Yuan Fu's message of "Mon, 16 Aug 2021 23:18:19 -0700")
To: Yuan Fu
Cc: Theodor Thornhill, Eli Zaretskii, Clément Pit-Claudel, Stephen Leake, emacs-devel

> Okay, here is the (ad-hoc) infrastructure I came up with:

It's more than what I proposed, but it looks fairly good.
See patch below which is the "side effect" of reading your code.
You'll see that I removed the "-function" from the function name (this
suffix is used for variables holding functions rather than for the
functions themselves) and I split that function into two: the outer one
(tree-sitter-indent) implementing basically what I suggested and the
inner one (tree-sitter-simple-indent) implementing the extra structure
you added to it, mediated by a new var `tree-sitter-indent-function`
which modes can set if they want to use another algorithm than the one
you implemented in `tree-sitter-simple-indent`.

The reason why I divided it this way is that my experience with
indentation code is that it can occasionally be useful to call the
indentation code recursively to know where a node *would* be indented.
This comes in handy when you want to be able to provide indentation
styles like:

    let myvariable = if (foo) {
        bar
      } else {
        baz
      }

where the bodies of the `if` branches need to be indented relative to
the position where the `if` itself would be indented if it were on its
own line.


        Stefan


PS: The patch also adds some space before
open-paren-in-column-0-in-strings to circumvent some problems with
outline-minor-mode incorrectly thinking those open-parens correspond to
actual top-level definitions :-(


diff --git a/lisp/tree-sitter.el b/lisp/tree-sitter.el
index 83aa2d0d123..2c5d103c42d 100644
--- a/lisp/tree-sitter.el
+++ b/lisp/tree-sitter.el
@@ -52,6 +52,8 @@ tree-sitter-should-enable-p
 
 ;;; Parser API supplement
 
+(defvar tree-sitter-parser-list)
+
 (defun tree-sitter-get-parser (language)
   "Find the first parser using LANGUAGE in `tree-sitter-parser-list'."
   (catch 'found
@@ -196,7 +198,7 @@ tree-sitter-simple-indent-rules
   "A list of indent rule settings.
 Each indent rule setting should be (LANGUAGE . RULES),
 where LANGUAGE is a language symbol, and RULES is a list of
-(MATCHER ANCHOR OFFSET).
+    (MATCHER ANCHOR OFFSET).
 
 MATCHER determines whether this rule applies, ANCHOR and OFFSET
 together determines which column to indent to.
@@ -289,7 +291,7 @@ tree-sitter-simple-indent-presets
 
 MATCHER:
 
-(match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
+    (match NODE-TYPE PARENT-TYPE NODE-FIELD NODE-INDEX-MIN NODE-INDEX-MAX)
 
 NODE-TYPE checks for node's type, PARENT-TYPE check for
 parent's type, NODE-FIELD checks for the filed name of node
@@ -304,25 +306,25 @@ tree-sitter-simple-indent-presets
 that starts at point.  This is the case when indenting an empty
 line.
 
-(node-at-point TYPE NAMED)
+    (node-at-point TYPE NAMED)
 
 Check that the node at point -- not the largest node at point,
 has type TYPE.  If NAMED non-nil, check the named node at point.
 
-(parent-is TYPE)
+    (parent-is TYPE)
 
 Check that the parent has type TYPE.
 
-(node-is TYPE)
+    (node-is TYPE)
 
 Checks that the node has type TYPE.
 
-(parent-match PATTERN)
+    (parent-match PATTERN)
 
 Checks that the parent matches PATTERN, a query pattern.
 
-(node-match PATTERN)
+    (node-match PATTERN)
 
 Checks that the node matches PATTERN, a query pattern.
 
@@ -356,7 +358,7 @@ tree-sitter--simple-apply
 If FN is a key in `tree-sitter-simple-indent-presets', use the
 corresponding value as the function."
-  (cond ((consp fn)
+  (cond ((consp fn) ;FIXME: This will mis-match for non-compiled lambdas!
          (apply (tree-sitter--simple-apply (car fn) (cdr fn))
                 args))
         ((and (symbolp fn)
@@ -366,21 +368,46 @@ tree-sitter--simple-apply
         ((functionp fn) (apply fn args))
         (t (error "Couldn't find appropriate function for FN"))))
 
-(defun tree-sitter-simple-indent-function ()
+(defvar tree-sitter-indent-function #'tree-sitter-simple-indent
+  "Document.")
+
+(defun tree-sitter-indent ()
   "Indent according to `tree-sitter-simple-indent-rules'."
-  (let* ((orig-pos (point))
-         (bol (save-excursion
+  (pcase-let*
+      ((orig-pos (point))
+       (bol (save-excursion
+              (beginning-of-line)
+              (skip-chars-forward " \t")
+              (point)))
+       (node (tree-sitter-parent-while
+              (cl-loop for parser in tree-sitter-parser-list
+                       for node = (tree-sitter-node-at
+                                   bol nil parser)
+                       if node return node)
+              (lambda (node)
+                (eq bol (tree-sitter-node-start node)))))
+       (parent (tree-sitter-node-parent node))
+       (`(,anchor . ,offset)
+        (funcall tree-sitter-indent-function node parent)))
+    (let ((col (+ (save-excursion
+                    (goto-char anchor)
+                    (current-column))
+                  offset)))
+      (if (< bol orig-pos)
+          (save-excursion
+            (indent-line-to col))
+        (indent-line-to col))
+      (when tree-sitter--indent-verbose
+        (message "indent to %S (%S position + %S)"
+                 col anchor offset)))))
+
+(defun tree-sitter-simple-indent (node parent)
+  (let* ((bol (save-excursion
                 (beginning-of-line)
                 (skip-chars-forward " \t")
                 (point)))
-         (node (tree-sitter-parent-while
-                (cl-loop for parser in tree-sitter-parser-list
-                         for node = (tree-sitter-node-at
-                                     bol nil parser)
-                         if node return node)
-                (lambda (node)
-                  (eq bol (tree-sitter-node-start node)))))
-         (parent (tree-sitter-node-parent node))
+         ;; FIXME: Can't we get the language from `node' rather than
+         ;; from `point'?
          (language (tree-sitter-language-at (point)))
          (rules (alist-get language tree-sitter-simple-indent-rules)))
     (cl-loop for rule in rules
@@ -388,20 +415,9 @@ tree-sitter-simple-indent-function
              for anchor = (nth 1 rule)
              for offset = (nth 2 rule)
              if (tree-sitter--simple-apply pred (list node parent bol))
-             do (let ((col (+ (save-excursion
-                                (goto-char
-                                 (tree-sitter--simple-apply
-                                  anchor (list node parent bol)))
-                                (current-column))
-                              offset)))
-                  (if (< bol orig-pos)
-                      (save-excursion
-                        (indent-line-to col))
-                    (indent-line-to col))
-                  (when tree-sitter--indent-verbose
-                    (message "matched %S\nindent to %s"
-                             pred col)))
-             and return nil)))
+             do `(,(tree-sitter--simple-apply
+                    anchor (list node parent bol))
+                  . ,offset))))
 
 ;;; Lab
 
@@ -435,7 +451,7 @@ ts-c-mode
                 (ignore t nil nil nil)
 
               indent-line-function
-              #'tree-sitter-simple-indent-function
+              #'tree-sitter-indent
               tree-sitter-simple-indent-rules
               ts-c-tree-sitter-indent-rules)
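
The split between an outer command and a mode-settable inner function
can be tried out without tree-sitter at all.  What follows is only a
minimal sketch of that shape, not the code from the patch: every
`demo-` name is hypothetical, the inner function takes no NODE/PARENT
arguments, and the "rule" is a toy that looks at the previous non-blank
line.  The part it shares with the patch is the contract: the inner
function returns (ANCHOR . OFFSET) and the outer code turns that into a
column.

```elisp
;; Hypothetical names throughout; none of these exist in tree-sitter.el.

(defvar demo-indent-function #'demo-simple-indent
  "Function called with point on the line to indent.
It should return (ANCHOR . OFFSET): a buffer position and a column delta.")

(defun demo-simple-indent ()
  "Toy rule standing in for a real rule table.
ANCHOR is the indentation position of the previous non-blank line;
OFFSET is 2 if that line ends in \"{\", else 0."
  (save-excursion
    (beginning-of-line)
    (skip-chars-backward " \t\n")   ; end of the previous non-blank line
    (if (bobp)
        (cons (point-min) 0)        ; nothing above: column 0
      (let ((opens (eq (char-before) ?{)))
        (back-to-indentation)
        (cons (point) (if opens 2 0))))))

(defun demo-indent-column ()
  "Return the column the current line *would* be indented to.
This does not modify the buffer, so other indentation code can call it."
  (pcase-let ((`(,anchor . ,offset) (funcall demo-indent-function)))
    (+ (save-excursion (goto-char anchor) (current-column))
       offset)))

(defun demo-indent-line ()
  "Indent the current line to the column from `demo-indent-column'."
  (interactive)
  (let ((col (demo-indent-column))
        (bol (save-excursion (back-to-indentation) (point))))
    ;; As in the patch: keep point where it is when it sits inside the
    ;; line's text, otherwise let it follow the indentation.
    (if (< bol (point))
        (save-excursion (indent-line-to col))
      (indent-line-to col))))
```

A mode that wants a different algorithm would bind or set
`demo-indent-function` and leave `demo-indent-line` alone.  Keeping the
column computation in its own side-effect-free function is what makes
the "where would this be indented?" question cheap to ask from inside
another rule.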