From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Philip Kaludercic Newsgroups: gmane.emacs.devel Subject: Re: New tree-sitter mode: bison-ts-mode Date: Fri, 22 Sep 2023 20:40:36 +0000 Message-ID: <87zg1e7xh7.fsf@posteo.net> References: <871qeqwtbk.fsf@posteo.net> <97f90014-6507-44ac-a61c-4e74d058063f@mailo.com> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="33955"; mail-complaints-to="usenet@ciao.gmane.io" Cc: emacs-devel To: =?utf-8?Q?Augustin_Ch=C3=A9neau_=28BTuin=29?= Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Fri Sep 22 22:41:48 2023 Return-path: Envelope-to: ged-emacs-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qjmyC-0008dk-4B for ged-emacs-devel@m.gmane-mx.org; Fri, 22 Sep 2023 22:41:48 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qjmxK-0001v9-3A; Fri, 22 Sep 2023 16:40:54 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qjmxH-0001uq-Q2 for emacs-devel@gnu.org; Fri, 22 Sep 2023 16:40:51 -0400 Original-Received: from mout01.posteo.de ([185.67.36.65]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qjmxA-0002Jm-94 for emacs-devel@gnu.org; Fri, 22 Sep 2023 16:40:49 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 1F6FE240028 for ; Fri, 22 Sep 2023 22:40:40 +0200 (CEST) Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4RskgC3f8Mz6tsg; Fri, 22 Sep 2023 22:40:39 +0200 (CEST) In-Reply-To: <97f90014-6507-44ac-a61c-4e74d058063f@mailo.com> ("Augustin =?utf-8?Q?Ch=C3=A9neau?= (BTuin)"'s message of "Fri, 22 Sep 2023 16:53:08 +0200") Autocrypt: addr=philipk@posteo.net; keydata= mDMEZBBQQhYJKwYBBAHaRw8BAQdAHJuofBrfqFh12uQu0Yi7mrl525F28eTmwUDflFNmdui0QlBo aWxpcCBLYWx1ZGVyY2ljIChnZW5lcmF0ZWQgYnkgYXV0b2NyeXB0LmVsKSA8cGhpbGlwa0Bwb3N0 ZW8ubmV0PoiWBBMWCAA+FiEEDg7HY17ghYlni8XN8xYDWXahwukFAmQQUEICGwMFCQHhM4AFCwkI BwIGFQoJCAsCBBYCAwECHgECF4AACgkQ8xYDWXahwulikAEA77hloUiSrXgFkUVJhlKBpLCHUjA0 mWZ9j9w5d08+jVwBAK6c4iGP7j+/PhbkxaEKa4V3MzIl7zJkcNNjHCXmvFcEuDgEZBBQQhIKKwYB BAGXVQEFAQEHQI5NLiLRjZy3OfSt1dhCmFyn+fN/QKELUYQetiaoe+MMAwEIB4h+BBgWCAAmFiEE Dg7HY17ghYlni8XN8xYDWXahwukFAmQQUEICGwwFCQHhM4AACgkQ8xYDWXahwukm+wEA8cml4JpK NeAu65rg+auKrPOP6TP/4YWRCTIvuYDm0joBALw98AMz7/qMHvSCeU/hw9PL6u6R2EScxtpKnWof z4oM Received-SPF: pass client-ip=185.67.36.65; envelope-from=philipk@posteo.net; helo=mout01.posteo.de X-Spam_score_int: -41 X-Spam_score: -4.2 X-Spam_bar: ---- X-Spam_report: (-4.2 / 5.0 requ) BAYES_00=-1.9, RCVD_IN_DNSWL_MED=-2.3, RCVD_IN_MSPIKE_H5=0.001, RCVD_IN_MSPIKE_WL=0.001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Original-Sender: emacs-devel-bounces+ged-emacs-devel=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.devel:310989 Archived-At: "Augustin Ch=C3=A9neau (BTuin)" writes: > Le 22/09/2023 =C3=A0 09:38, Philip Kaludercic a =C3=A9crit=C2=A0: >> "Augustin Ch=C3=A9neau (BTuin)" writes: >> A few comments on the proposed file: >>=20 >>> ;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t= ; -*- >>> >> Could you add the usual header information here? >> ;; Copyright (C) 2022-2023 Free Software Foundation, Inc. >> --8<---------------cut here---------------start------------->8--- >> ;; Author: Augustin Ch=C3=A9neau >> ;; This file is part of GNU Emacs. >> ;; GNU Emacs is free software: you can redistribute it and/or modify >> ;; it under the terms of the GNU General Public License as published by >> ;; the Free Software Foundation, either version 3 of the License, or >> ;; (at your option) any later version. >> ;; GNU Emacs is distributed in the hope that it will be useful, >> ;; but WITHOUT ANY WARRANTY; without even the implied warranty of >> ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> ;; GNU General Public License for more details. >> ;; You should have received a copy of the GNU General Public License >> ;; along with GNU Emacs. If not, see . >> --8<---------------cut here---------------end--------------->8--- >>=20 >>> ;;; Commentary: >>> >>> ;; This is a mode based on tree-sitter for Bison and Yacc files, tools = to generate parsers. >> Shouldn't you mention what tree sitter grammar is being being used >> here? >>=20 >>> ;;; Code: >>> >>> (require 'treesit) >>> (require 'c-ts-common) >>> >>> (declare-function treesit-parser-create "treesit.c") >>> (declare-function treesit-induce-sparse-tree "treesit.c") >>> (declare-function treesit-node-child-by-field-name "treesit.c") >>> (declare-function treesit-search-subtree "treesit.c") >>> (declare-function treesit-node-parent "treesit.c") >>> (declare-function treesit-node-next-sibling "treesit.c") >>> (declare-function treesit-node-type "treesit.c") >>> (declare-function treesit-node-child "treesit.c") >>> (declare-function treesit-node-end "treesit.c") >>> (declare-function treesit-node-start "treesit.c") >>> (declare-function treesit-node-string "treesit.c") >>> (declare-function treesit-query-compile "treesit.c") >>> (declare-function treesit-query-capture "treesit.c") >>> (declare-function treesit-parser-add-notifier "treesit.c") >>> (declare-function treesit-parser-buffer "treesit.c") >>> (declare-function treesit-parser-list "treesit.c") >>> >>> >>> (defgroup bison nil >> bison or bison-ts? >>=20 > > As far as I know, no group is explicitly reserved for > tree-sitter. rust-ts-mode uses the group "rust", java-ts-mode uses > "java", ruby-ts-mode uses "ruby"... OK, that is fine then. > >>> "Support for Bison and Yacc." >> Shouldn't tree-sitter be mentioned here? >>=20 > Same as above. Sure? Above I was talking about using the group name "bison" or "bison-ts", this is about mentioning that tree sitter is required for the mode to work. >>> :group 'languages) >>> >>> (defcustom bison-ts-mode-indent-offset 2 >>> "Number of spaces for each indentation step in `bison-ts-mode'. >>> It has no effect in the epilogue part of the file." >>> :version "30.1" >>> :type 'integer >>> :safe 'integerp >>> :group 'bison) >>> >>> (defcustom bison-ts-mode-autodetect-language t >>> "Search for a %language directive in the file at initialization. >>> Changing the value of this directive in the file requires to reload the= mode to >>> be effective. If `bison-ts-mode-buffer-language' is set by a file-local >>> variable, the auto-detection is not run." >>> :version "30.1" >>> :type 'boolean >>> :safe 'boolean >>> :group 'bison) >>> >>> (defvar-local bison-ts-mode-embedded-language nil >>> "Embedded language in Bison buffer.") >>> >>> (defun bison-ts-mode--merge-feature-lists (l1 l2) >>> "Merge the lists of lists L1 and L2. >>> The first sublist of L1 is merged with the first sublist of L2 and so o= n. >>> L1 and L2 don't need to have the same size." >>> (let ((res ())) >>> (while (or l1 l2) >>> (setq res (push (append (car l1) (car l2)) res)) >>> (setq l1 (cdr l1) l2 (cdr l2))) >>> (nreverse res))) >> Is this a generic function that should be extracted into some common >> file? >>=20 > It probably should, it could be useful for other *-ts-mode that want > to mix multiple languages, unless there already is such a function in > Elisp. Then again, if you use `cl-loop', you can reproduce this specific behaviour using (cl-loop for i1 in '(1 2 3) collect i1 for i2 in '(a b c) collect i2) > >>> (defun bison-ts-mode--find-language-in-buffer (&optional buffer) >>> "Find and return the language set by the Bison directive %language. >>> If BUFFER is set, search in this buffer, otherwise search in the current >>> buffer." >>> (save-excursion >>> (when buffer >>> (switch-to-buffer buffer)) >> Or rather? >> (with-current-buffer (or buffer (current-buffer)) >> (save-excursion >> ...)) >>=20 >>> (goto-char (point-min)) >>> (let ((pos-end >>> (re-search-forward >>> (rx >>> bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha = "+")) "\"") >>> nil >>> t)) >>> (pos-beg nil)) >>> (when pos-end >> Using when-let might be nice here. >>=20 >>> (goto-char (1- pos-end)) >>> (setq pos-beg (1+ (search-backward "\"" nil t))) >>> (buffer-substring-no-properties pos-beg (1- pos-end)))))) >> I'd use a single regular expression here with a group, then extract >> the >> right information using `match-string'. >>=20 > Nice, I didn't know it was possible. >>> >>> >>> (defun bison-ts-mode--detect-language (&optional buffer) >>> "Dectect the embedded language in a Bison buffer. >>> Known languages are C, C++, D, and Java, but D is not supported as ther= e is >>> no support for tree-sitter D in Emacs yet. >>> If BUFFER is set, search in this buffer, otherwise search in the current >>> buffer." >>> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer))) >>> (pcase (downcase str) >>> ("c" 'c) >>> ("c++" 'cpp) >>> ("d" (progn (message "D language not yet supported") nil)) >> Each pcase case has an implicit progn. >>=20 >>> ("java" 'java)) >>> (progn >> Use when-let to avoid this progn. >>=20 >>> (message >>> "bison-ts-mode: %%language specification not found or invalid, = defaulting to C.") >> Is it necessary to prefix the message with the major mode name? >>=20 > If feared it would be a bit cryptic without. Anyway I modified this > function to only display a message if "%language" is present but the > language is invalid, so it probably isn't necessary anymore. OK. >>> 'c))) >>> >>> >>> (defun bison-ts-mode--language-at-point-function (position) >>> "Return the language at POSITION." >>> (let* ((node (treesit-node-at position 'bison))) >> ^ >> let is enough >>=20 >>> (if (equal (treesit-node-type node) >>> "embedded_code") >> There is no need to break the line here. >>=20 >>> bison-ts-mode-embedded-language >>> 'bison))) >>> >>> (defun bison-ts-mode--font-lock-settings (language) >>> "Return the font-lock settings for Bison. >>> LANGUAGE should be set to \\=3D'bison." >>> (treesit-font-lock-rules >>> :language language >>> :feature 'bison-comment >>> '((comment) @font-lock-comment-face) >>> >>> :language language >>> :feature 'bison-declaration >>> '((declaration_name) @font-lock-keyword-face) >>> >>> :language language >>> :feature 'bison-type >>> '((type) @font-lock-type-face) >>> >>> :language language >>> :feature 'bison-grammar-rule-usage >>> '((grammar_rule_identifier) @font-lock-variable-use-face) >>> >>> :language language >>> :feature 'bison-grammar-rule-declaration >>> '((grammar_rule (grammar_rule_declaration) >>> @font-lock-variable-use-face)) >>> >>> :language language >>> :feature 'bison-string >>> :override t >>> '((string) @font-lock-string-face) >>> >>> :language language >>> :feature 'bison-literal >>> :override t >>> '((char_literal) @font-lock-keyword-face >>> (number_literal) @font-lock-number-face) >>> >>> :language language >>> :feature 'bison-directive-grammar-rule >>> :override t >>> '((grammar_rule (directive) @font-lock-keyword-face)) >>> >>> :language language >>> :feature 'bison-operator >>> :override t >>> '(["|"] @font-lock-operator-face) >>> >>> :language language >>> :feature 'bison-delimiter >>> :override t >>> '([";"] @font-lock-delimiter-face))) >>> >>> >>> (defvar bison-ts-mode--font-lock-feature-list >>> '(( bison-comment bison-declaration bison-type >>> bison-grammar-rule-usage bison-grammar-rule-declaration >>> bison-string bison-literal bison-directive-grammar-rule >>> bison-operator bison-delimiter))) >>> >>> >>> (defun bison-ts-mode--bison-matcher-action (root-name) >>> "Treesit matcher to check if NODE at BOL is not located in the epilo= gue. >>> ROOT-NAME is the highest-level node of the embedded language." >>> (lambda (node _parent bol &rest _) >>> (if (equal (treesit-node-type (treesit-node-parent node)) root-nam= e) >>> (let* ((bison-node (treesit-node-at bol 'bison))) >> ^ >> here again, let is enough >> (if (equal >>> (treesit-node-type >>> (treesit-node-parent(treesit-node-parent bison-node)))= "action") >> Though you could bind the (treesit-node-type ...) expression under >> the >> above let. >>=20 >>> t >>> nil))))) >> Why (if foo t nil) when foo would do the same job (equal only >> returns >> nil and t, so normalising the value isn't even necessary). >>=20 > Because I was stupid. OK, I am glad I didn't miss something obvious ^^ >>> >>> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name) >>> "Treesit matcher to check if NODE at BOL is not located in the epilo= gue. >>> ROOT-NAME is the highest-level node of the embedded language." >>> (lambda (node _parent bol &rest _) >>> (if (equal (treesit-node-type (treesit-node-parent node)) root-nam= e) >>> (let* ((bison-node (treesit-node-at bol 'bison))) >>> (if (equal (treesit-node-type (treesit-node-parent bison-nod= e)) "epilogue") >>> nil >>> t))))) >> Am I missing something, or couldn't these two functions be merged if >> you >> give them a third argument NODE-TYPE and pass it "action" or "epilogue". >>=20 > No, bison-ts-mode--bison-matcher-action checks if the _grandparent_ is > an "action" node, while bison-ts-mode--bison-matcher-not-epilogue > checks if the _parent_ is an "epilogue" node. In that case, would adding another parameter and then binding the returned lambda expression via defalias be worth the effort?=20=20 >>> >>> >>> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _) >>> "Get the parent of the bison node at BOL." >>> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison= )))) >>> >>> >>> (defun bison-ts-mode--indent-rules () >>> "Indent rules supported by `bison-ts-mode'." >>> (let* >>> ((common >>> `(((node-is "^declaration$") >>> column-0 0) >>> ((and (parent-is "^declaration$") >>> (not (node-is "^code_block$"))) >>> column-0 2) >>> ((and (parent-is "comment") c-ts-common-looking-at-star) >>> c-ts-common-comment-start-after-first-star -1) >>> (c-ts-common-comment-2nd-line-matcher >>> c-ts-common-comment-2nd-line-anchor >>> 1) >>> ((parent-is "comment") prev-adaptive-prefix 0) >>> >>> ;; Opening and closing brackets "{}" of declarations >>> ((and (parent-is "^declaration$") >>> (node-is "^code_block$")) >>> column-0 0) >>> ((and (n-p-gp "}" "" "^declaration$")) >>> column-0 0) >>> ((parent-is "^declaration$") parent 2) >>> ((node-is "^grammar_rule$") column-0 0) >>> ((and >>> (parent-is "^grammar_rule$") >>> (node-is ";")) >>> column-0 bison-ts-mode-indent-offset) >>> ((and (parent-is "^grammar_rule$") >>> (node-is "|")) >>> column-0 bison-ts-mode-indent-offset) >>> ((and (parent-is "^grammar_rule$") >>> (not (node-is "^grammar_rule_declaration$")) >>> (not (node-is "^action$"))) >>> column-0 ,(+ bison-ts-mode-indent-offset 2)) >>> ((or >>> (node-is "^action$") >>> (node-is "^}$")) >>> column-0 12) >>> ;; Set '%%' at the beginning of the line >>> ((or >>> (and (parent-is "^grammar_rules_section$") >>> (node-is "%%")) >>> (node-is "^grammar_rules_section$")) >>> column-0 0) >>> (no-node parent-bol 0)))) >>> `((bison . ,common) >>> ;; Import and override embedded languages rules to add an offset >>> ,(pcase bison-ts-mode-embedded-language >>> ('c `(c >>> ((bison-ts-mode--bison-matcher-action "translation_unit= ") >>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offs= et) >>> ((bison-ts-mode--bison-matcher-not-epilogue "translatio= n_unit") >>> column-0 ,bison-ts-mode-indent-offset) >>> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c)))) >>> ('cpp `(cpp >>> ((bison-ts-mode--bison-matcher-action "translation_un= it") >>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-of= fset) >>> ((bison-ts-mode--bison-matcher-not-epilogue "translat= ion_unit") >>> parent-0 ,bison-ts-mode-indent-offset) >>> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))= )) >>> ('java `(java >>> ((bison-ts-mode--bison-matcher-action "program") >>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-o= ffset) >>> ((bison-ts-mode--bison-matcher-not-epilogue "program= ") >>> column-0 ,bison-ts-mode-indent-offset) >>> ,@java-ts-mode--indent-rules)))))) >>> >>> >>> (define-derived-mode bison-ts-mode prog-mode "Bison" >>> "A mode for Bison." >> ^ >> major-mode >> Also, mentioning tree-sitter seems like something worth doing. >>=20 >>> (when (treesit-ready-p 'bison) >>> (when (not bison-ts-mode-embedded-language) >>> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-lan= guage))) >>> >>> ;; Require only if needed, to avoid warnings if a grammar is not >>> ;; installed but not used. >>> (pcase bison-ts-mode-embedded-language >> Would a `pcase-exhaustive' be appropriate here? >>=20 > No, the language D is recognized but not supported yet in Emacs, so if > this language is detected it will not configure anything. There is a d-mode in NonGNU ELPA, but I guess that isn't enough since you need a d-ts-mode, right? >>> ('c (require 'c-ts-mode)) >>> ('cpp (require 'c-ts-mode)) >>> ('java (require 'java-ts-mode))) >>> >>> (setq-local treesit-font-lock-settings >>> (append (bison-ts-mode--font-lock-settings 'bison) >>> (pcase bison-ts-mode-embedded-language >>> ('c (c-ts-mode--font-lock-settings 'c)) >>> ('cpp (c-ts-mode--font-lock-settings 'cpp)) >>> ('java java-ts-mode--font-lock-settings)))) >>> >>> (setq-local treesit-font-lock-feature-list >>> (if bison-ts-mode-embedded-language >>> (bison-ts-mode--merge-feature-lists >>> bison-ts-mode--font-lock-feature-list >>> (pcase bison-ts-mode-embedded-language >>> ('c c-ts-mode--feature-list) >>> ('cpp c-ts-mode--feature-list) >>> ('java java-ts-mode--feature-list))) >>> bison-ts-mode--font-lock-feature-list)) >>> >>> (setq-local treesit-simple-imenu-settings >>> `(("Grammar" >>> "\\`grammar_rule_declaration\\'" >>> nil >>> (lambda (node) (substring-no-properties (treesit-no= de-text node)))))) >>> >>> (c-ts-common-comment-setup) >>> >>> (setq-local treesit-simple-indent-rules >>> (bison-ts-mode--indent-rules)) >>> >>> (setq-local treesit-language-at-point-function 'bison-ts-mode--lan= guage-at-point-function) >>> >>> (when bison-ts-mode-embedded-language >>> (setq-local treesit-range-settings >>> (treesit-range-rules >>> :embed bison-ts-mode-embedded-language >>> :host 'bison >>> :local t >>> '((embedded_code) @capture)))) >>> >>> (treesit-major-mode-setup))) >>> >>> (provide 'bison-ts-mode) >>> ;;; bison-ts-mode.el ends here >> Sorry for the number of comments, but there has been a discussion on >> the >> code-quality of tree-sitter major modes that has been less than optimal, >> so I hope that your contribution could help raise the bar. > > No problem, thank you for your review! [...] > > (defgroup bison-ts nil > "Support for Bison and Yacc." > :group 'languages) > > (defcustom bison-ts-mode-indent-offset 2 > "Number of spaces for each indentation step in `bison-ts-mode'. > It has no effect in the epilogue part of the file." > :version "30.1" > :type 'integer > :safe 'integerp > :group 'bison) The ":group" annotations here are not necessary in general, defcustoms can automatically detect the previous defgroup. > (defcustom bison-ts-mode-autodetect-language t > "Search for a %language directive in the file at initialization. > Changing the value of this directive in the file requires to reload the m= ode to > be effective. If `bison-ts-mode-buffer-language' is set by a file-local > variable, the auto-detection is not run." > :version "30.1" > :type 'boolean > :safe 'boolean > :group 'bison) > > (defvar-local bison-ts-mode-embedded-language nil > "Embedded language in Bison buffer. > Supported values are `c', `cpp', and `java'.") > ;;;###autoload > (put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp) > > > (defun bison-ts-mode--merge-feature-lists (l1 l2) > "Merge the lists of lists L1 and L2. > The first sublist of L1 is merged with the first sublist of L2 and so on. > L1 and L2 don't need to have the same size." > (let ((res ())) > (while (or l1 l2) > (setq res (push (seq-uniq (append (car l1) (car l2)) 'eq) res)) > (setq l1 (cdr l1) l2 (cdr l2))) > (nreverse res))) > > (defun bison-ts-mode--find-language-in-buffer (&optional buffer) > "Find and return the language set by the Bison directive %language. > If BUFFER is set, search in this buffer, otherwise search in the current > buffer." > (save-excursion > (with-current-buffer (or buffer (current-buffer)) > (goto-char (point-min)) > (when > (re-search-forward > (rx > bol (0+ blank) "%language" (0+ blank) "\"" (group (1+ (in alpha "+")= )) "\"") I'd say this regular expression is complex enough to be split into multiple lines. And you can use the fact that `rx' takes s-expressions to add comments inbetween. > nil > t))) > (substring-no-properties (match-string 1)))) Or `match-string-no-properties' > > > (defun bison-ts-mode--detect-language (&optional buffer) > "Dectect the embedded language in a Bison buffer. > Known languages are C, C++, D, and Java, but D is not supported as there = is > no support for tree-sitter D in Emacs yet. > If BUFFER is set, search in this buffer, otherwise search in the current > buffer." > (if-let ((str (bison-ts-mode--find-language-in-buffer buffer))) > (pcase-exhaustive (downcase str) > ("c" 'c) > ("c++" 'cpp) > ("d" (message "D language not yet supported") nil) > ("java" 'java) > (_ (message "%%language specification \"%s\" is invalid, defaulting to C= " str) 'c)))) No point in using `pcase-exhaustive' if you end with _ anyway? > > > (defun bison-ts-mode--language-at-point-function (position) > "Return the language at POSITION." > (let ((node (treesit-node-at position 'bison))) > (if (equal (treesit-node-type node) "embedded_code") > bison-ts-mode-embedded-language > 'bison))) > > (defun bison-ts-mode--font-lock-settings (language) > "Return the font-lock settings for Bison. > LANGUAGE should be set to \\=3D'bison." > (treesit-font-lock-rules > :language language > :feature 'comment > '((comment) @font-lock-comment-face) > > :language language > :feature 'declaration > '((declaration_name) @font-lock-keyword-face) > > :language language > :feature 'type > '((type) @font-lock-type-face) > > :language language > :feature 'variable > '((grammar_rule_identifier) @font-lock-variable-use-face) > > :language language > :feature 'grammar-declaration > '((grammar_rule (grammar_rule_declaration) > @font-lock-variable-use-face)) > > :language language > :feature 'string > :override t > '((string) @font-lock-string-face) > > :language language > :feature 'literal > :override t > '((char_literal) @font-lock-keyword-face > (number_literal) @font-lock-number-face) > > :language language > :feature 'directive-grammar-rule > :override t > '((grammar_rule (directive) @font-lock-keyword-face)) > > :language language > :feature 'operator > :override t > '(["|"] @font-lock-operator-face) > > :language language > :feature 'delimiter > :override t > '([";"] @font-lock-delimiter-face))) > > > (defvar bison-ts-mode--font-lock-feature-list I am not that familiar with the tree-sitter stuff, but would it be possible to use `defconst' here? > '(( comment declaration grammar-declaration) > ( type string directive-grammar-rule) > ( literal) > ( variable operator delimiter))) > > > (defun bison-ts-mode--bison-matcher-action (root-name) > "Treesit matcher to check if NODE at BOL is located in an action node. > ROOT-NAME is the highest-level node of the embedded language." > (lambda (node _parent bol &rest _) > (if (equal (treesit-node-type (treesit-node-parent node)) root-name) > (let ((bison-node (treesit-node-at bol 'bison))) > (equal > (treesit-node-type > (treesit-node-parent (treesit-node-parent bison-node))) > "action"))))) > > (defun bison-ts-mode--bison-matcher-not-epilogue (root-name) > "Treesit matcher to check if NODE at BOL is not located in the epilogue. > ROOT-NAME is the highest-level node of the embedded language." > (lambda (node _parent bol &rest _) > (if (equal (treesit-node-type (treesit-node-parent node)) root-name) > (let ((bison-node (treesit-node-at bol 'bison))) > (not (equal (treesit-node-type (treesit-node-parent bison-node)= ) "epilogue")))))) > > > (defun bison-ts-mode--bison-parent (_node _parent bol &rest _) > "Get the parent of the bison node at BOL." > (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison)))) > > > (defun bison-ts-mode--indent-rules () > "Indent rules supported by `bison-ts-mode'." > (let* > ((common > `(((node-is "^declaration$") > column-0 0) > ((and (parent-is "^declaration$") > (not (node-is "^code_block$"))) > column-0 2) > ((and (parent-is "comment") c-ts-common-looking-at-star) > c-ts-common-comment-start-after-first-star -1) > (c-ts-common-comment-2nd-line-matcher > c-ts-common-comment-2nd-line-anchor > 1) > ((parent-is "comment") prev-adaptive-prefix 0) > > ;; Opening and closing brackets "{}" of declarations > ((and (parent-is "^declaration$") > (node-is "^code_block$")) > column-0 0) > ((and (n-p-gp "}" "" "^declaration$")) > column-0 0) > ((parent-is "^declaration$") parent 2) > ((node-is "^grammar_rule$") column-0 0) > ((and > (parent-is "^grammar_rule$") > (node-is ";")) > column-0 bison-ts-mode-indent-offset) > ((and (parent-is "^grammar_rule$") > (node-is "|")) > column-0 bison-ts-mode-indent-offset) > ((and (parent-is "^grammar_rule$") > (not (node-is "^grammar_rule_declaration$")) > (not (node-is "^action$"))) > column-0 ,(+ bison-ts-mode-indent-offset 2)) > ((or > (node-is "^action$") > (node-is "^}$")) > column-0 12) > ;; Set '%%' at the beginning of the line > ((or > (and (parent-is "^grammar_rules_section$") > (node-is "%%")) > (node-is "^grammar_rules_section$")) > column-0 0) > (no-node parent-bol 0)))) > `((bison . ,common) > ;; Import and override embedded languages rules to add an offset > ,(pcase bison-ts-mode-embedded-language > ('c `(c > ((bison-ts-mode--bison-matcher-action "translation_unit") > bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset) > ((bison-ts-mode--bison-matcher-not-epilogue "translation_u= nit") > column-0 ,bison-ts-mode-indent-offset) > ,@(alist-get 'c (c-ts-mode--get-indent-style 'c)))) > ('cpp `(cpp > ((bison-ts-mode--bison-matcher-action "translation_unit") > bison-ts-mode--bison-parent ,bison-ts-mode-indent-offse= t) > ((bison-ts-mode--bison-matcher-not-epilogue "translation= _unit") > parent-0 ,bison-ts-mode-indent-offset) > ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp)))) > ('java `(java > ((bison-ts-mode--bison-matcher-action "program") > bison-ts-mode--bison-parent ,bison-ts-mode-indent-offs= et) > ((bison-ts-mode--bison-matcher-not-epilogue "program") > column-0 ,bison-ts-mode-indent-offset) > ,@java-ts-mode--indent-rules)))))) > > > (define-derived-mode bison-ts-mode prog-mode "Bison" > "A major-mode for Bison based on tree-sitter." > (when (treesit-ready-p 'bison) > (when (not bison-ts-mode-embedded-language) Or `unless' > (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-langua= ge))) > > ;; Require only if needed, to avoid warnings if a grammar is not > ;; installed but not used. > (pcase bison-ts-mode-embedded-language > ('c (require 'c-ts-mode)) > ('cpp (require 'c-ts-mode)) > ('java (require 'java-ts-mode))) > > (setq-local treesit-font-lock-settings > (append (bison-ts-mode--font-lock-settings 'bison) > (pcase bison-ts-mode-embedded-language > ('c (c-ts-mode--font-lock-settings 'c)) > ('cpp (c-ts-mode--font-lock-settings 'cpp)) > ('java java-ts-mode--font-lock-settings)))) > > (setq-local treesit-font-lock-feature-list > (if bison-ts-mode-embedded-language > (bison-ts-mode--merge-feature-lists > bison-ts-mode--font-lock-feature-list > (pcase bison-ts-mode-embedded-language > ('c c-ts-mode--feature-list) > ('cpp c-ts-mode--feature-list) > ('java java-ts-mode--feature-list))) > bison-ts-mode--font-lock-feature-list)) > > (setq-local treesit-simple-imenu-settings > `(("Grammar" > "\\`grammar_rule_declaration\\'" > nil > (lambda (node) (substring-no-properties (treesit-node-= text node)))))) The function `treesit-node-text' appears to take an optional NO-PROPERTY ar= gument. > > (c-ts-common-comment-setup) > > (setq-local treesit-simple-indent-rules > (bison-ts-mode--indent-rules)) > > (setq-local treesit-language-at-point-function 'bison-ts-mode--langua= ge-at-point-function) > > (when bison-ts-mode-embedded-language > (setq-local treesit-range-settings > (treesit-range-rules > :embed bison-ts-mode-embedded-language > :host 'bison > :local t > '((embedded_code) @capture)))) > > (treesit-major-mode-setup))) > > (provide 'bison-ts-mode) > ;;; bison-ts-mode.el ends here