From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Philip Kaludercic Newsgroups: gmane.emacs.devel Subject: Re: New tree-sitter mode: bison-ts-mode Date: Fri, 22 Sep 2023 07:38:55 +0000 Message-ID: <871qeqwtbk.fsf@posteo.net> References: Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="16250"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel To: =?utf-8?Q?Augustin_Ch=C3=A9neau_=28BTuin=29?= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Sep 22 09:39:58 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qjala-0003yC-Et for ged-emacs-devel@m.gmane-mx.org; Fri, 22 Sep 2023 09:39:58 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qjakl-0002t2-Bo; Fri, 22 Sep 2023 03:39:07 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qjakj-0002sT-VP for emacs-devel@gnu.org; Fri, 22 Sep 2023 03:39:06 -0400 Original-Received: from mout01.posteo.de ([185.67.36.65]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qjakd-00064X-Gs for emacs-devel@gnu.org; Fri, 22 Sep 2023 03:39:05 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 0C124240027 for ; Fri, 22 Sep 2023 09:38:55 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1695368336; bh=VCMJjfI6azDzCg1ksUEYcXJAczWSsDoW6ixkxY/aCko=; h=From:To:Cc:Subject:Autocrypt:Date:Message-ID:MIME-Version: Content-Transfer-Encoding:From; b=KF+ZK/JGr+XctgBFGG2eRdi6Wv/50gTnWTsSAWNUF5mvly3uQjg6TcxRR5sWszJNm RPwNXElYQB8eSeT0PzWsapAX0IjNrRkV2HZMT0ioOl7jRymNOPSy9Lnupvdo+WQFH5 3bDn9RkUXAoRR7x/2yeNBgggfsp/VRzWAq4jMbIE1QoZEEU60lJKUdhG0E0YLAtjh8 qoxqV0kjVp+tvsTfgnvwa4g+Fw1wJ33b9I/cH+1kmGIBpD3g1iKGGo45SFMLEhHwYP MVAytj51FORCAbmtderLQxN08IS5LXN8Br7hsD5Fx281IDFuqm3Fw1/NIU3Xl4PXvF rH5dbJhXOlOXA== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4RsPKC4CKGz6v8r; Fri, 22 Sep 2023 09:38:55 +0200 (CEST) In-Reply-To: ("Augustin =?utf-8?Q?Ch=C3=A9neau?= (BTuin)"'s message of "Thu, 21 Sep 2023 22:15:45 +0200") Autocrypt: addr=philipk@posteo.net; keydata= mDMEZBBQQhYJKwYBBAHaRw8BAQdAHJuofBrfqFh12uQu0Yi7mrl525F28eTmwUDflFNmdui0QlBo aWxpcCBLYWx1ZGVyY2ljIChnZW5lcmF0ZWQgYnkgYXV0b2NyeXB0LmVsKSA8cGhpbGlwa0Bwb3N0 ZW8ubmV0PoiWBBMWCAA+FiEEDg7HY17ghYlni8XN8xYDWXahwukFAmQQUEICGwMFCQHhM4AFCwkI BwIGFQoJCAsCBBYCAwECHgECF4AACgkQ8xYDWXahwulikAEA77hloUiSrXgFkUVJhlKBpLCHUjA0 mWZ9j9w5d08+jVwBAK6c4iGP7j+/PhbkxaEKa4V3MzIl7zJkcNNjHCXmvFcEuDgEZBBQQhIKKwYB BAGXVQEFAQEHQI5NLiLRjZy3OfSt1dhCmFyn+fN/QKELUYQetiaoe+MMAwEIB4h+BBgWCAAmFiEE Dg7HY17ghYlni8XN8xYDWXahwukFAmQQUEICGwwFCQHhM4AACgkQ8xYDWXahwukm+wEA8cml4JpK NeAu65rg+auKrPOP6TP/4YWRCTIvuYDm0joBALw98AMz7/qMHvSCeU/hw9PL6u6R2EScxtpKnWof z4oM Received-SPF: pass client-ip=185.67.36.65; envelope-from=philipk@posteo.net; helo=mout01.posteo.de X-Spam_score_int: -43 X-Spam_score: -4.4 X-Spam_bar: ---- X-Spam_report: (-4.4 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:310941 Archived-At: "Augustin Ch=C3=A9neau (BTuin)" writes: A few comments on the proposed file: > ;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; = -*- > Could you add the usual header information here? ;; Copyright (C) 2022-2023 Free Software Foundation, Inc. --8<---------------cut here---------------start------------->8--- ;; Author: Augustin Ch=C3=A9neau ;; This file is part of GNU Emacs. ;; GNU Emacs is free software: you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation, either version 3 of the License, or ;; (at your option) any later version. ;; GNU Emacs is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs. If not, see . --8<---------------cut here---------------end--------------->8--- > ;;; Commentary: > > ;; This is a mode based on tree-sitter for Bison and Yacc files, tools to= generate parsers. Shouldn't you mention what tree sitter grammar is being being used here? > ;;; Code: > > (require 'treesit) > (require 'c-ts-common) > > (declare-function treesit-parser-create "treesit.c") > (declare-function treesit-induce-sparse-tree "treesit.c") > (declare-function treesit-node-child-by-field-name "treesit.c") > (declare-function treesit-search-subtree "treesit.c") > (declare-function treesit-node-parent "treesit.c") > (declare-function treesit-node-next-sibling "treesit.c") > (declare-function treesit-node-type "treesit.c") > (declare-function treesit-node-child "treesit.c") > (declare-function treesit-node-end "treesit.c") > (declare-function treesit-node-start "treesit.c") > (declare-function treesit-node-string "treesit.c") > (declare-function treesit-query-compile "treesit.c") > (declare-function treesit-query-capture "treesit.c") > (declare-function treesit-parser-add-notifier "treesit.c") > (declare-function treesit-parser-buffer "treesit.c") > (declare-function treesit-parser-list "treesit.c") > > > (defgroup bison nil bison or bison-ts? > "Support for Bison and Yacc." Shouldn't tree-sitter be mentioned here? > :group 'languages) > > (defcustom bison-ts-mode-indent-offset 2 > "Number of spaces for each indentation step in `bison-ts-mode'. > It has no effect in the epilogue part of the file." > :version "30.1" > :type 'integer > :safe 'integerp > :group 'bison) > > (defcustom bison-ts-mode-autodetect-language t > "Search for a %language directive in the file at initialization. > Changing the value of this directive in the file requires to reload the m= ode to > be effective. If `bison-ts-mode-buffer-language' is set by a file-local > variable, the auto-detection is not run." > :version "30.1" > :type 'boolean > :safe 'boolean > :group 'bison) > > (defvar-local bison-ts-mode-embedded-language nil > "Embedded language in Bison buffer.") > > (defun bison-ts-mode--merge-feature-lists (l1 l2) > "Merge the lists of lists L1 and L2. > The first sublist of L1 is merged with the first sublist of L2 and so on. > L1 and L2 don't need to have the same size." > (let ((res ())) > (while (or l1 l2) > (setq res (push (append (car l1) (car l2)) res)) > (setq l1 (cdr l1) l2 (cdr l2))) > (nreverse res))) Is this a generic function that should be extracted into some common file? > (defun bison-ts-mode--find-language-in-buffer (&optional buffer) > "Find and return the language set by the Bison directive %language. > If BUFFER is set, search in this buffer, otherwise search in the current > buffer." > (save-excursion > (when buffer > (switch-to-buffer buffer)) Or rather? (with-current-buffer (or buffer (current-buffer)) (save-excursion ...)) > (goto-char (point-min)) > (let ((pos-end > (re-search-forward > (rx > bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha "+"= )) "\"") > nil > t)) > (pos-beg nil)) > (when pos-end Using when-let might be nice here. > (goto-char (1- pos-end)) > (setq pos-beg (1+ (search-backward "\"" nil t))) > (buffer-substring-no-properties pos-beg (1- pos-end)))))) I'd use a single regular expression here with a group, then extract the right information using `match-string'. > > > (defun bison-ts-mode--detect-language (&optional buffer) > "Dectect the embedded language in a Bison buffer. > Known languages are C, C++, D, and Java, but D is not supported as there = is > no support for tree-sitter D in Emacs yet. > If BUFFER is set, search in this buffer, otherwise search in the current > buffer." > (if-let ((str (bison-ts-mode--find-language-in-buffer buffer))) > (pcase (downcase str) > ("c" 'c) > ("c++" 'cpp) > ("d" (progn (message "D language not yet supported") nil)) Each pcase case has an implicit progn. > ("java" 'java)) > (progn Use when-let to avoid this progn. > (message > "bison-ts-mode: %%language specification not found or invalid, def= aulting to C.") Is it necessary to prefix the message with the major mode name? > 'c))) > > > (defun bison-ts-mode--language-at-point-function (position) > "Return the language at POSITION." > (let* ((node (treesit-node-at position 'bison))) ^ let is enough > (if (equal (treesit-node-type node) > "embedded_code") There is no need to break the line here. > bison-ts-mode-embedded-language > 'bison))) > > (defun bison-ts-mode--font-lock-settings (language) > "Return the font-lock settings for Bison. > LANGUAGE should be set to \\=3D'bison." > (treesit-font-lock-rules > :language language > :feature 'bison-comment > '((comment) @font-lock-comment-face) > > :language language > :feature 'bison-declaration > '((declaration_name) @font-lock-keyword-face) > > :language language > :feature 'bison-type > '((type) @font-lock-type-face) > > :language language > :feature 'bison-grammar-rule-usage > '((grammar_rule_identifier) @font-lock-variable-use-face) > > :language language > :feature 'bison-grammar-rule-declaration > '((grammar_rule (grammar_rule_declaration) > @font-lock-variable-use-face)) > > :language language > :feature 'bison-string > :override t > '((string) @font-lock-string-face) > > :language language > :feature 'bison-literal > :override t > '((char_literal) @font-lock-keyword-face > (number_literal) @font-lock-number-face) > > :language language > :feature 'bison-directive-grammar-rule > :override t > '((grammar_rule (directive) @font-lock-keyword-face)) > > :language language > :feature 'bison-operator > :override t > '(["|"] @font-lock-operator-face) > > :language language > :feature 'bison-delimiter > :override t > '([";"] @font-lock-delimiter-face))) > > > (defvar bison-ts-mode--font-lock-feature-list > '(( bison-comment bison-declaration bison-type > bison-grammar-rule-usage bison-grammar-rule-declaration > bison-string bison-literal bison-directive-grammar-rule > bison-operator bison-delimiter))) > > > (defun bison-ts-mode--bison-matcher-action (root-name) > "Treesit matcher to check if NODE at BOL is not located in the epilogue. > ROOT-NAME is the highest-level node of the embedded language." > (lambda (node _parent bol &rest _) > (if (equal (treesit-node-type (treesit-node-parent node)) root-name) > (let* ((bison-node (treesit-node-at bol 'bison))) ^ here again, let is enough (if (equal > (treesit-node-type > (treesit-node-parent(treesit-node-parent bison-node))) "a= ction") Though you could bind the (treesit-node-type ...) expression under the above let. > t > nil))))) Why (if foo t nil) when foo would do the same job (equal only returns nil and t, so normalising the value isn't even necessary). > > (defun bison-ts-mode--bison-matcher-not-epilogue (root-name) > "Treesit matcher to check if NODE at BOL is not located in the epilogue. > ROOT-NAME is the highest-level node of the embedded language." > (lambda (node _parent bol &rest _) > (if (equal (treesit-node-type (treesit-node-parent node)) root-name) > (let* ((bison-node (treesit-node-at bol 'bison))) > (if (equal (treesit-node-type (treesit-node-parent bison-node))= "epilogue") > nil > t))))) Am I missing something, or couldn't these two functions be merged if you give them a third argument NODE-TYPE and pass it "action" or "epilogue". > > > (defun bison-ts-mode--bison-parent (_node _parent bol &rest _) > "Get the parent of the bison node at BOL." > (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison)))) > > > (defun bison-ts-mode--indent-rules () > "Indent rules supported by `bison-ts-mode'." > (let* > ((common > `(((node-is "^declaration$") > column-0 0) > ((and (parent-is "^declaration$") > (not (node-is "^code_block$"))) > column-0 2) > ((and (parent-is "comment") c-ts-common-looking-at-star) > c-ts-common-comment-start-after-first-star -1) > (c-ts-common-comment-2nd-line-matcher > c-ts-common-comment-2nd-line-anchor > 1) > ((parent-is "comment") prev-adaptive-prefix 0) > > ;; Opening and closing brackets "{}" of declarations > ((and (parent-is "^declaration$") > (node-is "^code_block$")) > column-0 0) > ((and (n-p-gp "}" "" "^declaration$")) > column-0 0) > ((parent-is "^declaration$") parent 2) > ((node-is "^grammar_rule$") column-0 0) > ((and > (parent-is "^grammar_rule$") > (node-is ";")) > column-0 bison-ts-mode-indent-offset) > ((and (parent-is "^grammar_rule$") > (node-is "|")) > column-0 bison-ts-mode-indent-offset) > ((and (parent-is "^grammar_rule$") > (not (node-is "^grammar_rule_declaration$")) > (not (node-is "^action$"))) > column-0 ,(+ bison-ts-mode-indent-offset 2)) > ((or > (node-is "^action$") > (node-is "^}$")) > column-0 12) > ;; Set '%%' at the beginning of the line > ((or > (and (parent-is "^grammar_rules_section$") > (node-is "%%")) > (node-is "^grammar_rules_section$")) > column-0 0) > (no-node parent-bol 0)))) > `((bison . ,common) > ;; Import and override embedded languages rules to add an offset > ,(pcase bison-ts-mode-embedded-language > ('c `(c > ((bison-ts-mode--bison-matcher-action "translation_unit") > bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset) > ((bison-ts-mode--bison-matcher-not-epilogue "translation_u= nit") > column-0 ,bison-ts-mode-indent-offset) > ,@(alist-get 'c (c-ts-mode--get-indent-style 'c)))) > ('cpp `(cpp > ((bison-ts-mode--bison-matcher-action "translation_unit") > bison-ts-mode--bison-parent ,bison-ts-mode-indent-offse= t) > ((bison-ts-mode--bison-matcher-not-epilogue "translation= _unit") > parent-0 ,bison-ts-mode-indent-offset) > ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp)))) > ('java `(java > ((bison-ts-mode--bison-matcher-action "program") > bison-ts-mode--bison-parent ,bison-ts-mode-indent-offs= et) > ((bison-ts-mode--bison-matcher-not-epilogue "program") > column-0 ,bison-ts-mode-indent-offset) > ,@java-ts-mode--indent-rules)))))) > > > (define-derived-mode bison-ts-mode prog-mode "Bison" > "A mode for Bison." ^ major-mode Also, mentioning tree-sitter seems like something worth doing. > (when (treesit-ready-p 'bison) > (when (not bison-ts-mode-embedded-language) > (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-langua= ge))) > > ;; Require only if needed, to avoid warnings if a grammar is not > ;; installed but not used. > (pcase bison-ts-mode-embedded-language Would a `pcase-exhaustive' be appropriate here? > ('c (require 'c-ts-mode)) > ('cpp (require 'c-ts-mode)) > ('java (require 'java-ts-mode))) > > (setq-local treesit-font-lock-settings > (append (bison-ts-mode--font-lock-settings 'bison) > (pcase bison-ts-mode-embedded-language > ('c (c-ts-mode--font-lock-settings 'c)) > ('cpp (c-ts-mode--font-lock-settings 'cpp)) > ('java java-ts-mode--font-lock-settings)))) > > (setq-local treesit-font-lock-feature-list > (if bison-ts-mode-embedded-language > (bison-ts-mode--merge-feature-lists > bison-ts-mode--font-lock-feature-list > (pcase bison-ts-mode-embedded-language > ('c c-ts-mode--feature-list) > ('cpp c-ts-mode--feature-list) > ('java java-ts-mode--feature-list))) > bison-ts-mode--font-lock-feature-list)) > > (setq-local treesit-simple-imenu-settings > `(("Grammar" > "\\`grammar_rule_declaration\\'" > nil > (lambda (node) (substring-no-properties (treesit-node-= text node)))))) > > (c-ts-common-comment-setup) > > (setq-local treesit-simple-indent-rules > (bison-ts-mode--indent-rules)) > > (setq-local treesit-language-at-point-function 'bison-ts-mode--langua= ge-at-point-function) > > (when bison-ts-mode-embedded-language > (setq-local treesit-range-settings > (treesit-range-rules > :embed bison-ts-mode-embedded-language > :host 'bison > :local t > '((embedded_code) @capture)))) > > (treesit-major-mode-setup))) > > (provide 'bison-ts-mode) > ;;; bison-ts-mode.el ends here Sorry for the number of comments, but there has been a discussion on the code-quality of tree-sitter major modes that has been less than optimal, so I hope that your contribution could help raise the bar.