* New tree-sitter mode: bison-ts-mode
From: Augustin Chéneau (BTuin) @ 2023-09-21 20:15 UTC
To: emacs-devel
Hello,
I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
This is a major mode for GNU Bison grammar files, and it supports the
embedded languages C, C++ and Java. Bison also supports D, but this
language is not yet supported in Emacs so this mode does not support it
either.
Since Bison is a superset of Yacc, it should work with Yacc without
modification.
The grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>
This mode auto-detects the embedded language in the file if the
directive "%language" is set.
In addition to the mode, three patches are attached (should I open a bug
report for those?).
The first one is to add some missing rules in c-ts-mode. Currently
there is no rule if an identifier in a declaration is on a new line.
In these two cases, "variable_name" is not indented:
static myttype *
variable_name;
static myttype
variable_name;
The new behavior is the same as c-mode.
The second one fixes an edge case with indentation in multi-language buffers.
Here is a simple example:
// language_A
{
// language_B
}
// language_A
Here, the brackets are managed by language_A, but everything inside them is
managed by language_B. This means that if point is at the beginning of the
third line (the one with the closing bracket), indentation uses language_B's
rules, but if point is on the bracket itself, it uses language_A's rules.
The patch looks up the language at bol rather than at point, so the result
no longer depends on where point is within the line.
The third patch simply extracts java-ts-mode's feature list into a
variable so that it can be reused.
Huge thanks to Yuan Fu for his help and his amazing work with tree-sitter.
I'll need to sign the copyright assignment.
[-- Attachment #2: 0001-Add-missing-indent-rules-in-c-ts-mode.patch --]
[-- Type: text/x-patch, Size: 998 bytes --]
From 2d5248d9612feb6e55bf9e22e5d8bc1b8f565ba8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Wed, 20 Sep 2023 12:23:06 +0200
Subject: [PATCH 1/3] Add missing indent rules in c-ts-mode
---
lisp/progmodes/c-ts-mode.el | 2 ++
1 file changed, 2 insertions(+)
diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index 70717a90caa..314f37f7ec5 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -407,6 +407,8 @@ c-ts-mode--indent-styles
((parent-is "preproc") c-ts-mode--anchor-prev-sibling 0)
((parent-is "function_definition") parent-bol 0)
+ ((parent-is "pointer_declarator") parent-bol 0)
+ ((parent-is "declaration") parent-bol 0)
((parent-is "conditional_expression") first-sibling 0)
((parent-is "assignment_expression") parent-bol c-ts-mode-indent-offset)
((parent-is "concatenated_string") first-sibling 0)
--
2.42.0
[-- Attachment #3: 0002-Fix-tree-sitter-indentation-conflict-with-multiple-l.patch --]
[-- Type: text/x-patch, Size: 1052 bytes --]
From a72df328a1195328a364be20612be189eb2dfd59 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Wed, 20 Sep 2023 12:23:22 +0200
Subject: [PATCH 2/3] Fix tree-sitter indentation conflict with multiple
languages
---
lisp/treesit.el | 4 ++--
1 file changed, 2 insertions(+), 2 deletions(-)
diff --git a/lisp/treesit.el b/lisp/treesit.el
index f0d6f32b421..0bb3aa6d653 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -1684,8 +1684,8 @@ treesit--indent-1
bol (car local-parsers)))
((eq 1 (length (treesit-parser-list nil nil t)))
(treesit-node-at bol))
- ((treesit-language-at (point))
- (treesit-node-at bol (treesit-language-at (point))))
+ ((treesit-language-at bol)
+ (treesit-node-at bol (treesit-language-at bol)))
(t (treesit-node-at bol))))
(root (treesit-parser-root-node
(treesit-node-parser smallest-node)))
--
2.42.0
[-- Attachment #4: 0003-Extract-feature-list-of-java-ts-mode-in-a-variable.patch --]
[-- Type: text/x-patch, Size: 1466 bytes --]
From 3a1e8eb01481423ad8309e52931d580704f62141 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Thu, 21 Sep 2023 18:29:24 +0200
Subject: [PATCH 3/3] Extract feature list of java-ts-mode in a variable
---
lisp/progmodes/java-ts-mode.el | 13 ++++++++-----
1 file changed, 8 insertions(+), 5 deletions(-)
diff --git a/lisp/progmodes/java-ts-mode.el b/lisp/progmodes/java-ts-mode.el
index 99d34789dcc..a6bc4c4bded 100644
--- a/lisp/progmodes/java-ts-mode.el
+++ b/lisp/progmodes/java-ts-mode.el
@@ -303,6 +303,13 @@ java-ts-mode--defun-name
(treesit-node-child-by-field-name node "name")
t))))
+
+(defvar java-ts-mode--feature-list
+ '(( comment definition )
+ ( constant keyword string type)
+ ( annotation expression literal)
+ ( bracket delimiter operator)))
+
;;;###autoload
(define-derived-mode java-ts-mode prog-mode "Java"
"Major mode for editing Java, powered by tree-sitter."
@@ -382,11 +389,7 @@ java-ts-mode
;; Font-lock.
(setq-local treesit-font-lock-settings java-ts-mode--font-lock-settings)
- (setq-local treesit-font-lock-feature-list
- '(( comment definition )
- ( constant keyword string type)
- ( annotation expression literal)
- ( bracket delimiter operator)))
+ (setq-local treesit-font-lock-feature-list java-ts-mode--feature-list)
;; Imenu.
(setq-local treesit-simple-imenu-settings
--
2.42.0
[-- Attachment #5: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 11858 bytes --]
;;; bison-ts-mode.el --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-
;;; Commentary:
;; This is a mode based on tree-sitter for Bison and Yacc files, tools to generate parsers.
;;; Code:
(require 'treesit)
(require 'c-ts-common)
(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")
(defgroup bison nil
"Support for Bison and Yacc."
:group 'languages)
(defcustom bison-ts-mode-indent-offset 2
"Number of spaces for each indentation step in `bison-ts-mode'.
It has no effect in the epilogue part of the file."
:version "30.1"
:type 'integer
:safe 'integerp
:group 'bison)
(defcustom bison-ts-mode-autodetect-language t
  "Search for a %language directive in the file at initialization.
Changing the value of this directive in the file requires reloading the mode
to take effect.  If `bison-ts-mode-embedded-language' is set by a file-local
variable, the auto-detection is not run."
  :version "30.1"
  :type 'boolean
  :safe 'booleanp
  :group 'bison)
(defvar-local bison-ts-mode-embedded-language nil
"Embedded language in Bison buffer.")
(defun bison-ts-mode--merge-feature-lists (l1 l2)
"Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
(let ((res ()))
(while (or l1 l2)
(setq res (push (append (car l1) (car l2)) res))
(setq l1 (cdr l1) l2 (cdr l2)))
(nreverse res)))
(defun bison-ts-mode--find-language-in-buffer (&optional buffer)
"Find and return the language set by the Bison directive %language.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(save-excursion
(when buffer
(switch-to-buffer buffer))
(goto-char (point-min))
(let ((pos-end
(re-search-forward
(rx
bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha "+")) "\"")
nil
t))
(pos-beg nil))
(when pos-end
(goto-char (1- pos-end))
(setq pos-beg (1+ (search-backward "\"" nil t)))
(buffer-substring-no-properties pos-beg (1- pos-end))))))
(defun bison-ts-mode--detect-language (&optional buffer)
  "Detect the embedded language in a Bison buffer.
Known languages are C, C++, D, and Java, but D is not supported as there is
no support for tree-sitter D in Emacs yet.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
(pcase (downcase str)
("c" 'c)
("c++" 'cpp)
("d" (progn (message "D language not yet supported") nil))
("java" 'java))
(progn
(message
"bison-ts-mode: %%language specification not found or invalid, defaulting to C.")
'c)))
(defun bison-ts-mode--language-at-point-function (position)
"Return the language at POSITION."
(let* ((node (treesit-node-at position 'bison)))
(if (equal (treesit-node-type node)
"embedded_code")
bison-ts-mode-embedded-language
'bison)))
(defun bison-ts-mode--font-lock-settings (language)
"Return the font-lock settings for Bison.
LANGUAGE should be set to \\='bison."
(treesit-font-lock-rules
:language language
:feature 'bison-comment
'((comment) @font-lock-comment-face)
:language language
:feature 'bison-declaration
'((declaration_name) @font-lock-keyword-face)
:language language
:feature 'bison-type
'((type) @font-lock-type-face)
:language language
:feature 'bison-grammar-rule-usage
'((grammar_rule_identifier) @font-lock-variable-use-face)
:language language
:feature 'bison-grammar-rule-declaration
'((grammar_rule (grammar_rule_declaration)
@font-lock-variable-use-face))
:language language
:feature 'bison-string
:override t
'((string) @font-lock-string-face)
:language language
:feature 'bison-literal
:override t
'((char_literal) @font-lock-keyword-face
(number_literal) @font-lock-number-face)
:language language
:feature 'bison-directive-grammar-rule
:override t
'((grammar_rule (directive) @font-lock-keyword-face))
:language language
:feature 'bison-operator
:override t
'(["|"] @font-lock-operator-face)
:language language
:feature 'bison-delimiter
:override t
'([";"] @font-lock-delimiter-face)))
(defvar bison-ts-mode--font-lock-feature-list
'(( bison-comment bison-declaration bison-type
bison-grammar-rule-usage bison-grammar-rule-declaration
bison-string bison-literal bison-directive-grammar-rule
bison-operator bison-delimiter)))
(defun bison-ts-mode--bison-matcher-action (root-name)
  "Treesit matcher to check if NODE at BOL is located in an action.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(if (equal (treesit-node-type (treesit-node-parent node)) root-name)
(let* ((bison-node (treesit-node-at bol 'bison)))
(if (equal
(treesit-node-type
(treesit-node-parent(treesit-node-parent bison-node))) "action")
t
nil)))))
(defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
"Treesit matcher to check if NODE at BOL is not located in the epilogue.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(if (equal (treesit-node-type (treesit-node-parent node)) root-name)
(let* ((bison-node (treesit-node-at bol 'bison)))
(if (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue")
nil
t)))))
(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
"Get the parent of the bison node at BOL."
(treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
(defun bison-ts-mode--indent-rules ()
"Indent rules supported by `bison-ts-mode'."
(let*
((common
`(((node-is "^declaration$")
column-0 0)
((and (parent-is "^declaration$")
(not (node-is "^code_block$")))
column-0 2)
((and (parent-is "comment") c-ts-common-looking-at-star)
c-ts-common-comment-start-after-first-star -1)
(c-ts-common-comment-2nd-line-matcher
c-ts-common-comment-2nd-line-anchor
1)
((parent-is "comment") prev-adaptive-prefix 0)
;; Opening and closing brackets "{}" of declarations
((and (parent-is "^declaration$")
(node-is "^code_block$"))
column-0 0)
((and (n-p-gp "}" "" "^declaration$"))
column-0 0)
((parent-is "^declaration$") parent 2)
((node-is "^grammar_rule$") column-0 0)
((and
(parent-is "^grammar_rule$")
(node-is ";"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(node-is "|"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(not (node-is "^grammar_rule_declaration$"))
(not (node-is "^action$")))
column-0 ,(+ bison-ts-mode-indent-offset 2))
((or
(node-is "^action$")
(node-is "^}$"))
column-0 12)
;; Set '%%' at the beginning of the line
((or
(and (parent-is "^grammar_rules_section$")
(node-is "%%"))
(node-is "^grammar_rules_section$"))
column-0 0)
(no-node parent-bol 0))))
`((bison . ,common)
;; Import and override embedded languages rules to add an offset
,(pcase bison-ts-mode-embedded-language
('c `(c
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
('cpp `(cpp
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
('java `(java
((bison-ts-mode--bison-matcher-action "program")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "program")
column-0 ,bison-ts-mode-indent-offset)
,@java-ts-mode--indent-rules))))))
(define-derived-mode bison-ts-mode prog-mode "Bison"
"A mode for Bison."
(when (treesit-ready-p 'bison)
(when (not bison-ts-mode-embedded-language)
(setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
;; Require only if needed, to avoid warnings if a grammar is not
;; installed but not used.
(pcase bison-ts-mode-embedded-language
('c (require 'c-ts-mode))
('cpp (require 'c-ts-mode))
('java (require 'java-ts-mode)))
(setq-local treesit-font-lock-settings
(append (bison-ts-mode--font-lock-settings 'bison)
(pcase bison-ts-mode-embedded-language
('c (c-ts-mode--font-lock-settings 'c))
('cpp (c-ts-mode--font-lock-settings 'cpp))
('java java-ts-mode--font-lock-settings))))
(setq-local treesit-font-lock-feature-list
(if bison-ts-mode-embedded-language
(bison-ts-mode--merge-feature-lists
bison-ts-mode--font-lock-feature-list
(pcase bison-ts-mode-embedded-language
('c c-ts-mode--feature-list)
('cpp c-ts-mode--feature-list)
('java java-ts-mode--feature-list)))
bison-ts-mode--font-lock-feature-list))
(setq-local treesit-simple-imenu-settings
`(("Grammar"
"\\`grammar_rule_declaration\\'"
nil
(lambda (node) (substring-no-properties (treesit-node-text node))))))
(c-ts-common-comment-setup)
(setq-local treesit-simple-indent-rules
(bison-ts-mode--indent-rules))
(setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
(when bison-ts-mode-embedded-language
(setq-local treesit-range-settings
(treesit-range-rules
:embed bison-ts-mode-embedded-language
:host 'bison
:local t
'((embedded_code) @capture))))
(treesit-major-mode-setup)))
(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here
* Re: New tree-sitter mode: bison-ts-mode
From: Stefan Kangas @ 2023-09-21 22:23 UTC
To: Augustin Chéneau (BTuin), emacs-devel
Augustin Chéneau (BTuin) <btuin@mailo.com> writes:
> I'll need to sign the copyright assignment.
Form sent off-list.
* Re: New tree-sitter mode: bison-ts-mode
From: Eli Zaretskii @ 2023-09-22 5:52 UTC
To: Augustin Chéneau, Yuan Fu; +Cc: emacs-devel
> Date: Thu, 21 Sep 2023 22:15:45 +0200
> From: Augustin Chéneau (BTuin) <btuin@mailo.com>
>
> I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
> This is a major mode for GNU Bison grammar files, and it supports the
> embedded languages C, C++ and Java. Bison also supports D, but this
> language is not yet supported in Emacs so this mode does not support it
> either.
>
> Since Bison is a superset of Yacc, it should work with Yacc without
> modification.
> The grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>
Thanks.
> In addition to the mode, three patches are attached (should I open a bug
> report for those?).
>
>
> The first one is to add some missing rules in c-ts-mode. Currently
> there is no rule if an identifier in a declaration is on a new line.
> In these two cases, "variable_name" is not indented:
>
> static myttype *
> variable_name;
>
> static myttype
> variable_name;
>
> The new behavior is the same as c-mode.
I think this patch should be installed on the emacs-29 branch. (It is
a very small change, so we can install it even before the legal
paperwork is completed.) Yuan, any objections or comments to this
one?
* Re: New tree-sitter mode: bison-ts-mode
From: Philip Kaludercic @ 2023-09-22 7:38 UTC
To: Augustin Chéneau (BTuin); +Cc: emacs-devel
"Augustin Chéneau (BTuin)" <btuin@mailo.com> writes:
A few comments on the proposed file:
> ;;; bison-ts-mode.el --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-
>
Could you add the usual header information here?
;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
--8<---------------cut here---------------start------------->8---
;; Author: Augustin Chéneau <btuin@mailo.com>
;; This file is part of GNU Emacs.
;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
--8<---------------cut here---------------end--------------->8---
> ;;; Commentary:
>
> ;; This is a mode based on tree-sitter for Bison and Yacc files, tools to generate parsers.
Shouldn't you mention which tree-sitter grammar is being used here?
> ;;; Code:
>
> (require 'treesit)
> (require 'c-ts-common)
>
> (declare-function treesit-parser-create "treesit.c")
> (declare-function treesit-induce-sparse-tree "treesit.c")
> (declare-function treesit-node-child-by-field-name "treesit.c")
> (declare-function treesit-search-subtree "treesit.c")
> (declare-function treesit-node-parent "treesit.c")
> (declare-function treesit-node-next-sibling "treesit.c")
> (declare-function treesit-node-type "treesit.c")
> (declare-function treesit-node-child "treesit.c")
> (declare-function treesit-node-end "treesit.c")
> (declare-function treesit-node-start "treesit.c")
> (declare-function treesit-node-string "treesit.c")
> (declare-function treesit-query-compile "treesit.c")
> (declare-function treesit-query-capture "treesit.c")
> (declare-function treesit-parser-add-notifier "treesit.c")
> (declare-function treesit-parser-buffer "treesit.c")
> (declare-function treesit-parser-list "treesit.c")
>
>
> (defgroup bison nil
bison or bison-ts?
> "Support for Bison and Yacc."
Shouldn't tree-sitter be mentioned here?
> :group 'languages)
>
> (defcustom bison-ts-mode-indent-offset 2
> "Number of spaces for each indentation step in `bison-ts-mode'.
> It has no effect in the epilogue part of the file."
> :version "30.1"
> :type 'integer
> :safe 'integerp
> :group 'bison)
>
> (defcustom bison-ts-mode-autodetect-language t
>   "Search for a %language directive in the file at initialization.
> Changing the value of this directive in the file requires reloading the mode
> to take effect.  If `bison-ts-mode-embedded-language' is set by a file-local
> variable, the auto-detection is not run."
>   :version "30.1"
>   :type 'boolean
>   :safe 'booleanp
>   :group 'bison)
>
> (defvar-local bison-ts-mode-embedded-language nil
> "Embedded language in Bison buffer.")
>
> (defun bison-ts-mode--merge-feature-lists (l1 l2)
> "Merge the lists of lists L1 and L2.
> The first sublist of L1 is merged with the first sublist of L2 and so on.
> L1 and L2 don't need to have the same size."
> (let ((res ()))
> (while (or l1 l2)
> (setq res (push (append (car l1) (car l2)) res))
> (setq l1 (cdr l1) l2 (cdr l2)))
> (nreverse res)))
Is this a generic function that should be extracted into some common file?
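For reference, here is a minimal sketch of what the function computes, written with an equivalent body that drops the redundant `setq' around `push'. The name my-merge-feature-lists is made up for this example and is not part of the proposed file.

```elisp
(defun my-merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 level by level.
Hypothetical stand-in for `bison-ts-mode--merge-feature-lists'."
  (let ((res ()))
    (while (or l1 l2)
      ;; (car nil) is nil, so the shorter list contributes nothing
      ;; once it runs out.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

;; A one-level list merged with a three-level list:
(my-merge-feature-lists '((a b)) '((x) (y) (z)))
;; => ((a b x) (y) (z))
```

Nothing in the body is specific to Bison or to font-lock features, which is why the question of moving it to a shared file comes up.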
> (defun bison-ts-mode--find-language-in-buffer (&optional buffer)
> "Find and return the language set by the Bison directive %language.
> If BUFFER is set, search in this buffer, otherwise search in the current
> buffer."
> (save-excursion
> (when buffer
> (switch-to-buffer buffer))
Or rather?
(with-current-buffer (or buffer (current-buffer))
(save-excursion
...))
> (goto-char (point-min))
> (let ((pos-end
> (re-search-forward
> (rx
> bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha "+")) "\"")
> nil
> t))
> (pos-beg nil))
> (when pos-end
Using when-let might be nice here.
> (goto-char (1- pos-end))
> (setq pos-beg (1+ (search-backward "\"" nil t)))
> (buffer-substring-no-properties pos-beg (1- pos-end))))))
I'd use a single regular expression here with a group, then extract the
right information using `match-string'.
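A rough sketch of that idea, assuming the same regular expression as in the proposed file with one capture group added. The function name my-bison-find-language is hypothetical, and the sketch has only been checked against simple inputs, not real grammar files.

```elisp
(defun my-bison-find-language (&optional buffer)
  "Return the language named by a %language directive in BUFFER, or nil.
BUFFER defaults to the current buffer.  Hypothetical helper, for
illustration only."
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (goto-char (point-min))
      ;; Group 1 captures the text between the double quotes, so no
      ;; backward search or position arithmetic is needed afterwards.
      (when (re-search-forward
             (rx bol (0+ blank) "%language" (0+ blank)
                 "\"" (group (1+ (in alpha "+"))) "\"")
             nil t)
        (match-string-no-properties 1)))))
```

In a buffer containing the line `%language "Java"`, evaluating (my-bison-find-language) returns the string "Java"; without such a directive it returns nil.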
>
>
> (defun bison-ts-mode--detect-language (&optional buffer)
>   "Detect the embedded language in a Bison buffer.
> Known languages are C, C++, D, and Java, but D is not supported as there is
> no support for tree-sitter D in Emacs yet.
> If BUFFER is set, search in this buffer, otherwise search in the current
> buffer."
> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
> (pcase (downcase str)
> ("c" 'c)
> ("c++" 'cpp)
> ("d" (progn (message "D language not yet supported") nil))
Each pcase case has an implicit progn.
> ("java" 'java))
> (progn
Use when-let to avoid this progn.
> (message
> "bison-ts-mode: %%language specification not found or invalid, defaulting to C.")
Is it necessary to prefix the message with the major mode name?
> 'c)))
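To illustrate the point about the implicit progn, here is a small sketch of the string-to-symbol mapping on its own. The helper name my-bison-language-symbol is invented for this example; it takes the directive's string directly and leaves out the buffer search and the default-to-C fallback.

```elisp
(defun my-bison-language-symbol (name)
  "Map NAME, the string from a %language directive, to a language symbol.
Return nil for D and for unknown names.  Hypothetical helper, for
illustration only."
  (pcase (downcase name)
    ("c" 'c)
    ("c++" 'cpp)
    ("java" 'java)
    ;; A clause body may contain several forms; the value of the last
    ;; one is returned, so no explicit `progn' is required.
    ("d" (message "D language not yet supported")
         nil)))
```

Because `pcase' returns nil when no clause matches, an unrecognised name and the unsupported "d" both yield nil here.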
>
>
> (defun bison-ts-mode--language-at-point-function (position)
> "Return the language at POSITION."
> (let* ((node (treesit-node-at position 'bison)))
      ^
      let is enough
> (if (equal (treesit-node-type node)
> "embedded_code")
There is no need to break the line here.
> bison-ts-mode-embedded-language
> 'bison)))
>
> (defun bison-ts-mode--font-lock-settings (language)
> "Return the font-lock settings for Bison.
> LANGUAGE should be set to \\='bison."
> (treesit-font-lock-rules
> :language language
> :feature 'bison-comment
> '((comment) @font-lock-comment-face)
>
> :language language
> :feature 'bison-declaration
> '((declaration_name) @font-lock-keyword-face)
>
> :language language
> :feature 'bison-type
> '((type) @font-lock-type-face)
>
> :language language
> :feature 'bison-grammar-rule-usage
> '((grammar_rule_identifier) @font-lock-variable-use-face)
>
> :language language
> :feature 'bison-grammar-rule-declaration
> '((grammar_rule (grammar_rule_declaration)
> @font-lock-variable-use-face))
>
> :language language
> :feature 'bison-string
> :override t
> '((string) @font-lock-string-face)
>
> :language language
> :feature 'bison-literal
> :override t
> '((char_literal) @font-lock-keyword-face
> (number_literal) @font-lock-number-face)
>
> :language language
> :feature 'bison-directive-grammar-rule
> :override t
> '((grammar_rule (directive) @font-lock-keyword-face))
>
> :language language
> :feature 'bison-operator
> :override t
> '(["|"] @font-lock-operator-face)
>
> :language language
> :feature 'bison-delimiter
> :override t
> '([";"] @font-lock-delimiter-face)))
>
>
> (defvar bison-ts-mode--font-lock-feature-list
> '(( bison-comment bison-declaration bison-type
> bison-grammar-rule-usage bison-grammar-rule-declaration
> bison-string bison-literal bison-directive-grammar-rule
> bison-operator bison-delimiter)))
>
>
> (defun bison-ts-mode--bison-matcher-action (root-name)
>   "Treesit matcher to check if NODE at BOL is located in an action.
> ROOT-NAME is the highest-level node of the embedded language."
> (lambda (node _parent bol &rest _)
> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
> (let* ((bison-node (treesit-node-at bol 'bison)))
      ^
      here again, let is enough
> (if (equal
> (treesit-node-type
> (treesit-node-parent(treesit-node-parent bison-node))) "action")
Though you could bind the (treesit-node-type ...) expression under the
above let.
> t
> nil)))))
Why (if foo t nil) when foo would do the same job? `equal' only returns
nil or t, so normalising the value isn't even necessary.
>
> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
> ROOT-NAME is the highest-level node of the embedded language."
> (lambda (node _parent bol &rest _)
> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
> (let* ((bison-node (treesit-node-at bol 'bison)))
> (if (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue")
> nil
> t)))))
Am I missing something, or couldn't these two functions be merged if you
gave them an additional argument NODE-TYPE and passed it "action" or
"epilogue"?
>
>
> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
> "Get the parent of the bison node at BOL."
> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
>
>
> (defun bison-ts-mode--indent-rules ()
> "Indent rules supported by `bison-ts-mode'."
> (let*
> ((common
> `(((node-is "^declaration$")
> column-0 0)
> ((and (parent-is "^declaration$")
> (not (node-is "^code_block$")))
> column-0 2)
> ((and (parent-is "comment") c-ts-common-looking-at-star)
> c-ts-common-comment-start-after-first-star -1)
> (c-ts-common-comment-2nd-line-matcher
> c-ts-common-comment-2nd-line-anchor
> 1)
> ((parent-is "comment") prev-adaptive-prefix 0)
>
> ;; Opening and closing brackets "{}" of declarations
> ((and (parent-is "^declaration$")
> (node-is "^code_block$"))
> column-0 0)
> ((and (n-p-gp "}" "" "^declaration$"))
> column-0 0)
> ((parent-is "^declaration$") parent 2)
> ((node-is "^grammar_rule$") column-0 0)
> ((and
> (parent-is "^grammar_rule$")
> (node-is ";"))
> column-0 bison-ts-mode-indent-offset)
> ((and (parent-is "^grammar_rule$")
> (node-is "|"))
> column-0 bison-ts-mode-indent-offset)
> ((and (parent-is "^grammar_rule$")
> (not (node-is "^grammar_rule_declaration$"))
> (not (node-is "^action$")))
> column-0 ,(+ bison-ts-mode-indent-offset 2))
> ((or
> (node-is "^action$")
> (node-is "^}$"))
> column-0 12)
> ;; Set '%%' at the beginning of the line
> ((or
> (and (parent-is "^grammar_rules_section$")
> (node-is "%%"))
> (node-is "^grammar_rules_section$"))
> column-0 0)
> (no-node parent-bol 0))))
> `((bison . ,common)
> ;; Import and override embedded languages rules to add an offset
> ,(pcase bison-ts-mode-embedded-language
> ('c `(c
> ((bison-ts-mode--bison-matcher-action "translation_unit")
> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
> column-0 ,bison-ts-mode-indent-offset)
> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
> ('cpp `(cpp
> ((bison-ts-mode--bison-matcher-action "translation_unit")
> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
> column-0 ,bison-ts-mode-indent-offset)
> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
> ('java `(java
> ((bison-ts-mode--bison-matcher-action "program")
> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> ((bison-ts-mode--bison-matcher-not-epilogue "program")
> column-0 ,bison-ts-mode-indent-offset)
> ,@java-ts-mode--indent-rules))))))
>
>
> (define-derived-mode bison-ts-mode prog-mode "Bison"
> "A mode for Bison."
     ^
     major-mode
Also, mentioning tree-sitter seems like something worth doing.
> (when (treesit-ready-p 'bison)
> (when (not bison-ts-mode-embedded-language)
> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
>
> ;; Require only if needed, to avoid warnings if a grammar is not
> ;; installed but not used.
> (pcase bison-ts-mode-embedded-language
Would a `pcase-exhaustive' be appropriate here?
> ('c (require 'c-ts-mode))
> ('cpp (require 'c-ts-mode))
> ('java (require 'java-ts-mode)))
>
> (setq-local treesit-font-lock-settings
> (append (bison-ts-mode--font-lock-settings 'bison)
> (pcase bison-ts-mode-embedded-language
> ('c (c-ts-mode--font-lock-settings 'c))
> ('cpp (c-ts-mode--font-lock-settings 'cpp))
> ('java java-ts-mode--font-lock-settings))))
>
> (setq-local treesit-font-lock-feature-list
> (if bison-ts-mode-embedded-language
> (bison-ts-mode--merge-feature-lists
> bison-ts-mode--font-lock-feature-list
> (pcase bison-ts-mode-embedded-language
> ('c c-ts-mode--feature-list)
> ('cpp c-ts-mode--feature-list)
> ('java java-ts-mode--feature-list)))
> bison-ts-mode--font-lock-feature-list))
>
> (setq-local treesit-simple-imenu-settings
> `(("Grammar"
> "\\`grammar_rule_declaration\\'"
> nil
> (lambda (node) (substring-no-properties (treesit-node-text node))))))
>
> (c-ts-common-comment-setup)
>
> (setq-local treesit-simple-indent-rules
> (bison-ts-mode--indent-rules))
>
> (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
>
> (when bison-ts-mode-embedded-language
> (setq-local treesit-range-settings
> (treesit-range-rules
> :embed bison-ts-mode-embedded-language
> :host 'bison
> :local t
> '((embedded_code) @capture))))
>
> (treesit-major-mode-setup)))
>
> (provide 'bison-ts-mode)
> ;;; bison-ts-mode.el ends here
Sorry for the number of comments, but there has been a discussion about the
code quality of tree-sitter major modes, which has been less than optimal,
so I hope that your contribution can help raise the bar.
* Re: New tree-sitter mode: bison-ts-mode
From: Stefan Kangas @ 2023-09-22 7:42 UTC
To: Augustin Chéneau (BTuin), emacs-devel
Augustin Chéneau (BTuin) <btuin@mailo.com> writes:
> I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
> This is a major mode for GNU Bison grammar files, and it supports the
> embedded languages C, C++ and Java. Bison also supports D, but this
> language is not yet supported in Emacs so this mode does not support it
> either.
>
> Since Bison is a superset of Yacc, it should work with Yacc without
> modification.
> The grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>
>
> This mode auto-detects the embedded language in the file if the
> directive "%language" is set.
This sounds like a good addition to Emacs.
Thanks for your contribution.
> In addition of the mode, three patches are joined (should I open a bug
> report for those?).
Please do, and thanks in advance. If possible, it's better to open
three different bug reports, since they seem unrelated to each other.
* Re: New tree-sitter mode: bison-ts-mode
2023-09-21 20:15 New tree-sitter mode: bison-ts-mode Augustin Chéneau (BTuin)
` (3 preceding siblings ...)
2023-09-22 7:42 ` Stefan Kangas
@ 2023-09-22 8:45 ` Yuan Fu
2023-09-24 21:10 ` Yuan Fu
5 siblings, 0 replies; 17+ messages in thread
From: Yuan Fu @ 2023-09-22 8:45 UTC (permalink / raw)
To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel
> On Sep 21, 2023, at 1:15 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>
> Hello,
> I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
> This is a major mode for GNU Bison grammar files, and it supports the embedded languages C, C++ and Java. Bison also supports D, but this language is not yet supported in Emacs so this mode does not support it either.
>
> Since Bison is a superset of Yacc, it should work with Yacc without modification.
> The grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>
>
> This mode auto-detects the embedded language in the file if the directive "%language" is set.
>
>
> In addition of the mode, three patches are joined (should I open a bug report for those?).
>
>
> The first one is to add some missing rules in c-ts-mode. Currently there is no rule if an identifier in a declaration is on a new line.
> In these two cases, "variable_name" is not indented:
>
> static myttype *
> variable_name;
>
> static myttype
> variable_name;
>
> The new behavior is the same as c-mode.
>
>
>
> The second one fix an edge case with multiple languages indentation. Here is a simple example:
>
> // language_A
> {
> // language_B
> }
> // language_A
>
>
> Here, brackets are managed by language_A, but everything inside is managed by language_B. Meaning that if the pointer is at the beginning of the third line (with the closing bracket), it will use language_B's indent rules. But if the point is located on the bracket, it will use language_A's rules.
>
>
>
> The third patch simply extracts java-ts-mode's feature list in a variable, to allow to reuse it.
>
>
> Huge thanks to Yuan Fu for his help and his amazing work with tree-sitter.
Thanks!
For now, I think we can just use the same feature names for different languages (so no need for prefixes anymore). That way, enabling a feature in the feature list would enable it for all languages. If a user desires more fine-grained control, they can use treesit-font-lock-recompute-features to enable/disable features for a particular language. (I upgraded that function in 04fd8b173bd)
Yuan
* Re: New tree-sitter mode: bison-ts-mode
2023-09-22 7:38 ` Philip Kaludercic
@ 2023-09-22 14:53 ` Augustin Chéneau (BTuin)
2023-09-22 20:40 ` Philip Kaludercic
0 siblings, 1 reply; 17+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-22 14:53 UTC (permalink / raw)
To: Philip Kaludercic; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 17082 bytes --]
Le 22/09/2023 à 09:38, Philip Kaludercic a écrit :
> "Augustin Chéneau (BTuin)" <btuin@mailo.com> writes:
>
> A few comments on the proposed file:
>
>> ;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-
>>
> Could you add the usual header information here?
>
> ;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
>
> --8<---------------cut here---------------start------------->8---
> ;; Author: Augustin Chéneau <btuin@mailo.com>
>
> ;; This file is part of GNU Emacs.
>
> ;; GNU Emacs is free software: you can redistribute it and/or modify
> ;; it under the terms of the GNU General Public License as published by
> ;; the Free Software Foundation, either version 3 of the License, or
> ;; (at your option) any later version.
>
> ;; GNU Emacs is distributed in the hope that it will be useful,
> ;; but WITHOUT ANY WARRANTY; without even the implied warranty of
> ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
> ;; GNU General Public License for more details.
>
> ;; You should have received a copy of the GNU General Public License
> ;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
> --8<---------------cut here---------------end--------------->8---
>
>> ;;; Commentary:
>>
>> ;; This is a mode based on tree-sitter for Bison and Yacc files, tools to generate parsers.
>
> Shouldn't you mention what tree sitter grammar is being being used here?
>
>> ;;; Code:
>>
>> (require 'treesit)
>> (require 'c-ts-common)
>>
>> (declare-function treesit-parser-create "treesit.c")
>> (declare-function treesit-induce-sparse-tree "treesit.c")
>> (declare-function treesit-node-child-by-field-name "treesit.c")
>> (declare-function treesit-search-subtree "treesit.c")
>> (declare-function treesit-node-parent "treesit.c")
>> (declare-function treesit-node-next-sibling "treesit.c")
>> (declare-function treesit-node-type "treesit.c")
>> (declare-function treesit-node-child "treesit.c")
>> (declare-function treesit-node-end "treesit.c")
>> (declare-function treesit-node-start "treesit.c")
>> (declare-function treesit-node-string "treesit.c")
>> (declare-function treesit-query-compile "treesit.c")
>> (declare-function treesit-query-capture "treesit.c")
>> (declare-function treesit-parser-add-notifier "treesit.c")
>> (declare-function treesit-parser-buffer "treesit.c")
>> (declare-function treesit-parser-list "treesit.c")
>>
>>
>> (defgroup bison nil
>
> bison or bison-ts?
>
As far as I know, no group is explicitly reserved for tree-sitter.
rust-ts-mode uses the group "rust", java-ts-mode uses "java",
ruby-ts-mode uses "ruby"...
>> "Support for Bison and Yacc."
>
> Shouldn't tree-sitter be mentioned here?
>
Same as above.
>> :group 'languages)
>>
>> (defcustom bison-ts-mode-indent-offset 2
>> "Number of spaces for each indentation step in `bison-ts-mode'.
>> It has no effect in the epilogue part of the file."
>> :version "30.1"
>> :type 'integer
>> :safe 'integerp
>> :group 'bison)
>>
>> (defcustom bison-ts-mode-autodetect-language t
>> "Search for a %language directive in the file at initialization.
>> Changing the value of this directive in the file requires to reload the mode to
>> be effective. If `bison-ts-mode-buffer-language' is set by a file-local
>> variable, the auto-detection is not run."
>> :version "30.1"
>> :type 'boolean
>> :safe 'boolean
>> :group 'bison)
>>
>> (defvar-local bison-ts-mode-embedded-language nil
>> "Embedded language in Bison buffer.")
>>
>> (defun bison-ts-mode--merge-feature-lists (l1 l2)
>> "Merge the lists of lists L1 and L2.
>> The first sublist of L1 is merged with the first sublist of L2 and so on.
>> L1 and L2 don't need to have the same size."
>> (let ((res ()))
>> (while (or l1 l2)
>> (setq res (push (append (car l1) (car l2)) res))
>> (setq l1 (cdr l1) l2 (cdr l2)))
>> (nreverse res)))
>
> Is this a generic function that should be extracted into some common file?
>
It probably should; it could be useful for other *-ts-modes that want to
mix multiple languages, unless such a function already exists in Elisp.
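To make the intended behaviour concrete, here is a minimal standalone sketch of such a level-by-level merge. The name `my/merge-feature-lists' and the sample feature lists are made up for illustration; only `seq-uniq' comes from Emacs itself.

```elisp
(require 'seq)

(defun my/merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 level by level.
The Nth sublist of L1 is appended to the Nth sublist of L2 and
duplicates are dropped.  L1 and L2 may have different lengths."
  (let (res)
    (while (or l1 l2)
      ;; (car nil) is nil, so the shorter list simply contributes nothing.
      (push (seq-uniq (append (car l1) (car l2))) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

(my/merge-feature-lists '((comment definition) (keyword string))
                        '((comment) (type) (operator)))
;; => ((comment definition) (keyword string type) (operator))
```

Each position in the result corresponds to one font-lock level, so a feature keeps the level it had in its original list.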
>> (defun bison-ts-mode--find-language-in-buffer (&optional buffer)
>> "Find and return the language set by the Bison directive %language.
>> If BUFFER is set, search in this buffer, otherwise search in the current
>> buffer."
>> (save-excursion
>> (when buffer
>> (switch-to-buffer buffer))
>
> Or rather?
>
> (with-current-buffer (or buffer (current-buffer))
> (save-excursion
> ...))
>
>> (goto-char (point-min))
>> (let ((pos-end
>> (re-search-forward
>> (rx
>> bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha "+")) "\"")
>> nil
>> t))
>> (pos-beg nil))
>> (when pos-end
>
> Using when-let might be nice here.
>
>> (goto-char (1- pos-end))
>> (setq pos-beg (1+ (search-backward "\"" nil t)))
>> (buffer-substring-no-properties pos-beg (1- pos-end))))))
>
> I'd use a single regular expression here with a group, then extract the
> right information using `match-string'.
>
Nice, I didn't know it was possible.
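For the record, a minimal sketch of that approach as I understand it. The function name `my/find-bison-language' and the sample buffer contents are only for illustration:

```elisp
(defun my/find-bison-language ()
  "Return the value of the first %language directive in the current buffer.
Return nil if there is no such directive."
  (save-excursion
    (goto-char (point-min))
    ;; The `group' form captures the text between the quotes as group 1.
    (when (re-search-forward
           (rx bol (0+ blank) "%language" (0+ blank)
               "\"" (group (1+ (in alpha "+"))) "\"")
           nil t)
      (match-string-no-properties 1))))

(with-temp-buffer
  (insert "%require \"3.8\"\n%language \"C++\"\n")
  (my/find-bison-language))
;; => "C++"
```

The match data is only read when the search succeeded, so a buffer without a %language directive simply yields nil.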
>>
>>
>> (defun bison-ts-mode--detect-language (&optional buffer)
>> "Dectect the embedded language in a Bison buffer.
>> Known languages are C, C++, D, and Java, but D is not supported as there is
>> no support for tree-sitter D in Emacs yet.
>> If BUFFER is set, search in this buffer, otherwise search in the current
>> buffer."
>> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
>> (pcase (downcase str)
>> ("c" 'c)
>> ("c++" 'cpp)
>> ("d" (progn (message "D language not yet supported") nil))
>
> Each pcase case has an implicit progn.
>
>> ("java" 'java))
>> (progn
>
> Use when-let to avoid this progn.
>
>> (message
>> "bison-ts-mode: %%language specification not found or invalid, defaulting to C.")
>
> Is it necessary to prefix the message with the major mode name?
>
I feared it would be a bit cryptic without it. Anyway, I modified this
function to only display a message if "%language" is present but the
language is invalid, so the prefix probably isn't necessary anymore.
>> 'c)))
>>
>>
>> (defun bison-ts-mode--language-at-point-function (position)
>> "Return the language at POSITION."
>> (let* ((node (treesit-node-at position 'bison)))
> ^
> let is enough
>
>> (if (equal (treesit-node-type node)
>> "embedded_code")
>
> There is no need to break the line here.
>
>> bison-ts-mode-embedded-language
>> 'bison)))
>>
>> (defun bison-ts-mode--font-lock-settings (language)
>> "Return the font-lock settings for Bison.
>> LANGUAGE should be set to \\='bison."
>> (treesit-font-lock-rules
>> :language language
>> :feature 'bison-comment
>> '((comment) @font-lock-comment-face)
>>
>> :language language
>> :feature 'bison-declaration
>> '((declaration_name) @font-lock-keyword-face)
>>
>> :language language
>> :feature 'bison-type
>> '((type) @font-lock-type-face)
>>
>> :language language
>> :feature 'bison-grammar-rule-usage
>> '((grammar_rule_identifier) @font-lock-variable-use-face)
>>
>> :language language
>> :feature 'bison-grammar-rule-declaration
>> '((grammar_rule (grammar_rule_declaration)
>> @font-lock-variable-use-face))
>>
>> :language language
>> :feature 'bison-string
>> :override t
>> '((string) @font-lock-string-face)
>>
>> :language language
>> :feature 'bison-literal
>> :override t
>> '((char_literal) @font-lock-keyword-face
>> (number_literal) @font-lock-number-face)
>>
>> :language language
>> :feature 'bison-directive-grammar-rule
>> :override t
>> '((grammar_rule (directive) @font-lock-keyword-face))
>>
>> :language language
>> :feature 'bison-operator
>> :override t
>> '(["|"] @font-lock-operator-face)
>>
>> :language language
>> :feature 'bison-delimiter
>> :override t
>> '([";"] @font-lock-delimiter-face)))
>>
>>
>> (defvar bison-ts-mode--font-lock-feature-list
>> '(( bison-comment bison-declaration bison-type
>> bison-grammar-rule-usage bison-grammar-rule-declaration
>> bison-string bison-literal bison-directive-grammar-rule
>> bison-operator bison-delimiter)))
>>
>>
>> (defun bison-ts-mode--bison-matcher-action (root-name)
>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>> ROOT-NAME is the highest-level node of the embedded language."
>> (lambda (node _parent bol &rest _)
>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>> (let* ((bison-node (treesit-node-at bol 'bison)))
> ^
> here again, let is enough
>
> (if (equal
>> (treesit-node-type
>> (treesit-node-parent(treesit-node-parent bison-node))) "action")
>
> Though you could bind the (treesit-node-type ...) expression under the
> above let.
>
>> t
>> nil)))))
>
> Why (if foo t nil) when foo would do the same job (equal only returns
> nil and t, so normalising the value isn't even necessary).
>
Because I was stupid.
>>
>> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>> ROOT-NAME is the highest-level node of the embedded language."
>> (lambda (node _parent bol &rest _)
>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>> (let* ((bison-node (treesit-node-at bol 'bison)))
>> (if (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue")
>> nil
>> t)))))
>
> Am I missing something, or couldn't these two functions be merged if you
> give them a third argument NODE-TYPE and pass it "action" or "epilogue".
>
No, bison-ts-mode--bison-matcher-action checks if the _grandparent_ is
an "action" node, while bison-ts-mode--bison-matcher-not-epilogue checks
if the _parent_ is an "epilogue" node.
>>
>>
>> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
>> "Get the parent of the bison node at BOL."
>> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
>>
>>
>> (defun bison-ts-mode--indent-rules ()
>> "Indent rules supported by `bison-ts-mode'."
>> (let*
>> ((common
>> `(((node-is "^declaration$")
>> column-0 0)
>> ((and (parent-is "^declaration$")
>> (not (node-is "^code_block$")))
>> column-0 2)
>> ((and (parent-is "comment") c-ts-common-looking-at-star)
>> c-ts-common-comment-start-after-first-star -1)
>> (c-ts-common-comment-2nd-line-matcher
>> c-ts-common-comment-2nd-line-anchor
>> 1)
>> ((parent-is "comment") prev-adaptive-prefix 0)
>>
>> ;; Opening and closing brackets "{}" of declarations
>> ((and (parent-is "^declaration$")
>> (node-is "^code_block$"))
>> column-0 0)
>> ((and (n-p-gp "}" "" "^declaration$"))
>> column-0 0)
>> ((parent-is "^declaration$") parent 2)
>> ((node-is "^grammar_rule$") column-0 0)
>> ((and
>> (parent-is "^grammar_rule$")
>> (node-is ";"))
>> column-0 bison-ts-mode-indent-offset)
>> ((and (parent-is "^grammar_rule$")
>> (node-is "|"))
>> column-0 bison-ts-mode-indent-offset)
>> ((and (parent-is "^grammar_rule$")
>> (not (node-is "^grammar_rule_declaration$"))
>> (not (node-is "^action$")))
>> column-0 ,(+ bison-ts-mode-indent-offset 2))
>> ((or
>> (node-is "^action$")
>> (node-is "^}$"))
>> column-0 12)
>> ;; Set '%%' at the beginning of the line
>> ((or
>> (and (parent-is "^grammar_rules_section$")
>> (node-is "%%"))
>> (node-is "^grammar_rules_section$"))
>> column-0 0)
>> (no-node parent-bol 0))))
>> `((bison . ,common)
>> ;; Import and override embedded languages rules to add an offset
>> ,(pcase bison-ts-mode-embedded-language
>> ('c `(c
>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>> column-0 ,bison-ts-mode-indent-offset)
>> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
>> ('cpp `(cpp
>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>> parent-0 ,bison-ts-mode-indent-offset)
>> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
>> ('java `(java
>> ((bison-ts-mode--bison-matcher-action "program")
>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> ((bison-ts-mode--bison-matcher-not-epilogue "program")
>> column-0 ,bison-ts-mode-indent-offset)
>> ,@java-ts-mode--indent-rules))))))
>>
>>
>> (define-derived-mode bison-ts-mode prog-mode "Bison"
>> "A mode for Bison."
> ^
> major-mode
>
> Also, mentioning tree-sitter seems like something worth doing.
>
>> (when (treesit-ready-p 'bison)
>> (when (not bison-ts-mode-embedded-language)
>> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
>>
>> ;; Require only if needed, to avoid warnings if a grammar is not
>> ;; installed but not used.
>> (pcase bison-ts-mode-embedded-language
>
> Would a `pcase-exhaustive' be appropriate here?
>
No, the language D is recognized but not supported yet in Emacs, so if
this language is detected it will not configure anything.
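A small sketch of the difference, using a made-up helper `my/language-symbol' rather than the code of the mode: a plain `pcase' with no matching clause just returns nil, whereas `pcase-exhaustive' signals an error.

```elisp
(defun my/language-symbol (name)
  "Map the %language string NAME to a tree-sitter language symbol.
Return nil for languages that are recognized but not handled."
  (pcase (downcase name)
    ("c" 'c)
    ("c++" 'cpp)
    ("java" 'java)))

(my/language-symbol "C++")  ; => cpp
(my/language-symbol "D")    ; => nil, nothing gets configured

;; With `pcase-exhaustive', the same unmatched value signals an error.
(condition-case nil
    (pcase-exhaustive "d"
      ("c" 'c)
      ("c++" 'cpp)
      ("java" 'java))
  (error 'no-clause-matched))
;; => no-clause-matched
```

So with plain `pcase', a buffer whose language is D falls through every clause and the mode stays in its Bison-only configuration.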
>> ('c (require 'c-ts-mode))
>> ('cpp (require 'c-ts-mode))
>> ('java (require 'java-ts-mode)))
>>
>> (setq-local treesit-font-lock-settings
>> (append (bison-ts-mode--font-lock-settings 'bison)
>> (pcase bison-ts-mode-embedded-language
>> ('c (c-ts-mode--font-lock-settings 'c))
>> ('cpp (c-ts-mode--font-lock-settings 'cpp))
>> ('java java-ts-mode--font-lock-settings))))
>>
>> (setq-local treesit-font-lock-feature-list
>> (if bison-ts-mode-embedded-language
>> (bison-ts-mode--merge-feature-lists
>> bison-ts-mode--font-lock-feature-list
>> (pcase bison-ts-mode-embedded-language
>> ('c c-ts-mode--feature-list)
>> ('cpp c-ts-mode--feature-list)
>> ('java java-ts-mode--feature-list)))
>> bison-ts-mode--font-lock-feature-list))
>>
>> (setq-local treesit-simple-imenu-settings
>> `(("Grammar"
>> "\\`grammar_rule_declaration\\'"
>> nil
>> (lambda (node) (substring-no-properties (treesit-node-text node))))))
>>
>> (c-ts-common-comment-setup)
>>
>> (setq-local treesit-simple-indent-rules
>> (bison-ts-mode--indent-rules))
>>
>> (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
>>
>> (when bison-ts-mode-embedded-language
>> (setq-local treesit-range-settings
>> (treesit-range-rules
>> :embed bison-ts-mode-embedded-language
>> :host 'bison
>> :local t
>> '((embedded_code) @capture))))
>>
>> (treesit-major-mode-setup)))
>>
>> (provide 'bison-ts-mode)
>> ;;; bison-ts-mode.el ends here
>
> Sorry for the number of comments, but there has been a discussion on the
> code-quality of tree-sitter major modes that has been less than optimal,
> so I hope that your contribution could help raise the bar.
No problem, thank you for your review!
You can find the updated mode attached.
I've added `bison-ts-mode-embedded-language' to `safe-local-variable' with:
(defvar-local bison-ts-mode-embedded-language nil
"Embedded language in Bison buffer.
Supported values are `c', `cpp', and `java'.")
;;;###autoload
(put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)
By the way I have sent the mail for the copyright assignment.
[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 12489 bytes --]
;;; bison-ts-mode.el --- Tree-sitter support for Bison -*- lexical-binding: t; -*-
;; Copyright (C) 2023 Free Software Foundation, Inc.
;; Author: Augustin Chéneau <btuin@mailo.com>
;; Keywords: bison yacc languages tree-sitter
;; This file is part of GNU Emacs.
;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
;;; Commentary:
;; This is a mode based on tree-sitter for Bison and Yacc files, tools to
;; generate parsers. The grammar used is available here:
;; https://gitlab.com/btuin2/tree-sitter-bison
;;; Code:
(require 'treesit)
(require 'c-ts-common)
(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")
(defgroup bison nil
"Support for Bison and Yacc."
:group 'languages)
(defcustom bison-ts-mode-indent-offset 2
"Number of spaces for each indentation step in `bison-ts-mode'.
It has no effect in the epilogue part of the file."
:version "30.1"
:type 'integer
:safe 'integerp
:group 'bison)
(defcustom bison-ts-mode-autodetect-language t
"Search for a %language directive in the file at initialization.
Changing the value of this directive in the file requires reloading the mode
to take effect. If `bison-ts-mode-embedded-language' is set by a file-local
variable, the auto-detection is not run."
:version "30.1"
:type 'boolean
:safe 'booleanp
:group 'bison)
(defvar-local bison-ts-mode-embedded-language nil
"Embedded language in Bison buffer.
Supported values are `c', `cpp', and `java'.")
;;;###autoload
(put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)
(defun bison-ts-mode--merge-feature-lists (l1 l2)
"Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
(let ((res ()))
(while (or l1 l2)
(push (seq-uniq (append (car l1) (car l2)) #'eq) res)
(setq l1 (cdr l1) l2 (cdr l2)))
(nreverse res)))
(defun bison-ts-mode--find-language-in-buffer (&optional buffer)
"Find and return the language set by the Bison directive %language.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(with-current-buffer (or buffer (current-buffer))
(save-excursion
(goto-char (point-min))
(when
(re-search-forward
(rx
bol (0+ blank) "%language" (0+ blank) "\"" (group (1+ (in alpha "+"))) "\"")
nil
t)
(match-string-no-properties 1)))))
(defun bison-ts-mode--detect-language (&optional buffer)
"Detect the embedded language in a Bison buffer.
Known languages are C, C++, D, and Java, but D is not supported as there is
no support for tree-sitter D in Emacs yet.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
(pcase-exhaustive (downcase str)
("c" 'c)
("c++" 'cpp)
("d" (message "D language not yet supported") nil)
("java" 'java)
(_ (message "%%language specification \"%s\" is invalid, defaulting to C" str) 'c))))
(defun bison-ts-mode--language-at-point-function (position)
"Return the language at POSITION."
(let ((node (treesit-node-at position 'bison)))
(if (equal (treesit-node-type node) "embedded_code")
bison-ts-mode-embedded-language
'bison)))
(defun bison-ts-mode--font-lock-settings (language)
"Return the font-lock settings for Bison.
LANGUAGE should be set to \\='bison."
(treesit-font-lock-rules
:language language
:feature 'comment
'((comment) @font-lock-comment-face)
:language language
:feature 'declaration
'((declaration_name) @font-lock-keyword-face)
:language language
:feature 'type
'((type) @font-lock-type-face)
:language language
:feature 'variable
'((grammar_rule_identifier) @font-lock-variable-use-face)
:language language
:feature 'grammar-declaration
'((grammar_rule (grammar_rule_declaration)
@font-lock-variable-use-face))
:language language
:feature 'string
:override t
'((string) @font-lock-string-face)
:language language
:feature 'literal
:override t
'((char_literal) @font-lock-keyword-face
(number_literal) @font-lock-number-face)
:language language
:feature 'directive-grammar-rule
:override t
'((grammar_rule (directive) @font-lock-keyword-face))
:language language
:feature 'operator
:override t
'(["|"] @font-lock-operator-face)
:language language
:feature 'delimiter
:override t
'([";"] @font-lock-delimiter-face)))
(defvar bison-ts-mode--font-lock-feature-list
'(( comment declaration grammar-declaration)
( type string directive-grammar-rule)
( literal)
( variable operator delimiter)))
(defun bison-ts-mode--bison-matcher-action (root-name)
"Treesit matcher to check if NODE at BOL is located in an action node.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(if (equal (treesit-node-type (treesit-node-parent node)) root-name)
(let ((bison-node (treesit-node-at bol 'bison)))
(equal
(treesit-node-type
(treesit-node-parent (treesit-node-parent bison-node)))
"action")))))
(defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
"Treesit matcher to check if NODE at BOL is not located in the epilogue.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(if (equal (treesit-node-type (treesit-node-parent node)) root-name)
(let ((bison-node (treesit-node-at bol 'bison)))
(not (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue"))))))
(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
"Get the parent of the bison node at BOL."
(treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
(defun bison-ts-mode--indent-rules ()
"Indent rules supported by `bison-ts-mode'."
(let*
((common
`(((node-is "^declaration$")
column-0 0)
((and (parent-is "^declaration$")
(not (node-is "^code_block$")))
column-0 2)
((and (parent-is "comment") c-ts-common-looking-at-star)
c-ts-common-comment-start-after-first-star -1)
(c-ts-common-comment-2nd-line-matcher
c-ts-common-comment-2nd-line-anchor
1)
((parent-is "comment") prev-adaptive-prefix 0)
;; Opening and closing brackets "{}" of declarations
((and (parent-is "^declaration$")
(node-is "^code_block$"))
column-0 0)
((and (n-p-gp "}" "" "^declaration$"))
column-0 0)
((parent-is "^declaration$") parent 2)
((node-is "^grammar_rule$") column-0 0)
((and
(parent-is "^grammar_rule$")
(node-is ";"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(node-is "|"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(not (node-is "^grammar_rule_declaration$"))
(not (node-is "^action$")))
column-0 ,(+ bison-ts-mode-indent-offset 2))
((or
(node-is "^action$")
(node-is "^}$"))
column-0 12)
;; Set '%%' at the beginning of the line
((or
(and (parent-is "^grammar_rules_section$")
(node-is "%%"))
(node-is "^grammar_rules_section$"))
column-0 0)
(no-node parent-bol 0))))
`((bison . ,common)
;; Import and override embedded languages rules to add an offset
,(pcase bison-ts-mode-embedded-language
('c `(c
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
('cpp `(cpp
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
('java `(java
((bison-ts-mode--bison-matcher-action "program")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "program")
column-0 ,bison-ts-mode-indent-offset)
,@java-ts-mode--indent-rules))))))
(define-derived-mode bison-ts-mode prog-mode "Bison"
"A major-mode for Bison based on tree-sitter."
(when (treesit-ready-p 'bison)
(when (not bison-ts-mode-embedded-language)
(setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
;; Require only if needed, to avoid warnings if a grammar is not
;; installed but not used.
(pcase bison-ts-mode-embedded-language
('c (require 'c-ts-mode))
('cpp (require 'c-ts-mode))
('java (require 'java-ts-mode)))
(setq-local treesit-font-lock-settings
(append (bison-ts-mode--font-lock-settings 'bison)
(pcase bison-ts-mode-embedded-language
('c (c-ts-mode--font-lock-settings 'c))
('cpp (c-ts-mode--font-lock-settings 'cpp))
('java java-ts-mode--font-lock-settings))))
(setq-local treesit-font-lock-feature-list
(if bison-ts-mode-embedded-language
(bison-ts-mode--merge-feature-lists
bison-ts-mode--font-lock-feature-list
(pcase bison-ts-mode-embedded-language
('c c-ts-mode--feature-list)
('cpp c-ts-mode--feature-list)
('java java-ts-mode--feature-list)))
bison-ts-mode--font-lock-feature-list))
(setq-local treesit-simple-imenu-settings
`(("Grammar"
"\\`grammar_rule_declaration\\'"
nil
(lambda (node) (substring-no-properties (treesit-node-text node))))))
(c-ts-common-comment-setup)
(setq-local treesit-simple-indent-rules
(bison-ts-mode--indent-rules))
(setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
(when bison-ts-mode-embedded-language
(setq-local treesit-range-settings
(treesit-range-rules
:embed bison-ts-mode-embedded-language
:host 'bison
:local t
'((embedded_code) @capture))))
(treesit-major-mode-setup)))
(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here
* Re: New tree-sitter mode: bison-ts-mode
2023-09-22 14:53 ` Augustin Chéneau (BTuin)
@ 2023-09-22 20:40 ` Philip Kaludercic
2023-09-22 23:21 ` Augustin Chéneau (BTuin)
0 siblings, 1 reply; 17+ messages in thread
From: Philip Kaludercic @ 2023-09-22 20:40 UTC (permalink / raw)
To: Augustin Chéneau (BTuin); +Cc: emacs-devel
"Augustin Chéneau (BTuin)" <btuin@mailo.com> writes:
> Le 22/09/2023 à 09:38, Philip Kaludercic a écrit :
>> "Augustin Chéneau (BTuin)" <btuin@mailo.com> writes:
>> A few comments on the proposed file:
>>
>>> ;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-
>>>
>> Could you add the usual header information here?
>> ;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
>> --8<---------------cut here---------------start------------->8---
>> ;; Author: Augustin Chéneau <btuin@mailo.com>
>> ;; This file is part of GNU Emacs.
>> ;; GNU Emacs is free software: you can redistribute it and/or modify
>> ;; it under the terms of the GNU General Public License as published by
>> ;; the Free Software Foundation, either version 3 of the License, or
>> ;; (at your option) any later version.
>> ;; GNU Emacs is distributed in the hope that it will be useful,
>> ;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>> ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>> ;; GNU General Public License for more details.
>> ;; You should have received a copy of the GNU General Public License
>> ;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
>> --8<---------------cut here---------------end--------------->8---
>>
>>> ;;; Commentary:
>>>
>>> ;; This is a mode based on tree-sitter for Bison and Yacc files, tools to generate parsers.
>> Shouldn't you mention what tree sitter grammar is being being used
>> here?
>>
>>> ;;; Code:
>>>
>>> (require 'treesit)
>>> (require 'c-ts-common)
>>>
>>> (declare-function treesit-parser-create "treesit.c")
>>> (declare-function treesit-induce-sparse-tree "treesit.c")
>>> (declare-function treesit-node-child-by-field-name "treesit.c")
>>> (declare-function treesit-search-subtree "treesit.c")
>>> (declare-function treesit-node-parent "treesit.c")
>>> (declare-function treesit-node-next-sibling "treesit.c")
>>> (declare-function treesit-node-type "treesit.c")
>>> (declare-function treesit-node-child "treesit.c")
>>> (declare-function treesit-node-end "treesit.c")
>>> (declare-function treesit-node-start "treesit.c")
>>> (declare-function treesit-node-string "treesit.c")
>>> (declare-function treesit-query-compile "treesit.c")
>>> (declare-function treesit-query-capture "treesit.c")
>>> (declare-function treesit-parser-add-notifier "treesit.c")
>>> (declare-function treesit-parser-buffer "treesit.c")
>>> (declare-function treesit-parser-list "treesit.c")
>>>
>>>
>>> (defgroup bison nil
>> bison or bison-ts?
>>
>
> As far as I know, no group is explicitly reserved for
> tree-sitter. rust-ts-mode uses the group "rust", java-ts-mode uses
> "java", ruby-ts-mode uses "ruby"...
OK, that is fine then.
>
>>> "Support for Bison and Yacc."
>> Shouldn't tree-sitter be mentioned here?
>>
> Same as above.
Sure? Above I was talking about using the group name "bison" or
"bison-ts"; this is about mentioning that tree-sitter is required for
the mode to work.
>>> :group 'languages)
>>>
>>> (defcustom bison-ts-mode-indent-offset 2
>>> "Number of spaces for each indentation step in `bison-ts-mode'.
>>> It has no effect in the epilogue part of the file."
>>> :version "30.1"
>>> :type 'integer
>>> :safe 'integerp
>>> :group 'bison)
>>>
>>> (defcustom bison-ts-mode-autodetect-language t
>>> "Search for a %language directive in the file at initialization.
>>> Changing the value of this directive in the file requires to reload the mode to
>>> be effective. If `bison-ts-mode-buffer-language' is set by a file-local
>>> variable, the auto-detection is not run."
>>> :version "30.1"
>>> :type 'boolean
>>> :safe 'boolean
>>> :group 'bison)
>>>
>>> (defvar-local bison-ts-mode-embedded-language nil
>>> "Embedded language in Bison buffer.")
>>>
>>> (defun bison-ts-mode--merge-feature-lists (l1 l2)
>>> "Merge the lists of lists L1 and L2.
>>> The first sublist of L1 is merged with the first sublist of L2 and so on.
>>> L1 and L2 don't need to have the same size."
>>> (let ((res ()))
>>> (while (or l1 l2)
>>> (setq res (push (append (car l1) (car l2)) res))
>>> (setq l1 (cdr l1) l2 (cdr l2)))
>>> (nreverse res)))
>> Is this a generic function that should be extracted into some common
>> file?
>>
> It probably should, it could be useful for other *-ts-mode that want
> to mix multiple languages, unless there already is such a function in
> Elisp.
Then again, if you use `cl-loop', you can reproduce this specific
behaviour using
(cl-loop for i1 in '(1 2 3) collect i1
for i2 in '(a b c) collect i2)
>
>>> (defun bison-ts-mode--find-language-in-buffer (&optional buffer)
>>> "Find and return the language set by the Bison directive %language.
>>> If BUFFER is set, search in this buffer, otherwise search in the current
>>> buffer."
>>> (save-excursion
>>> (when buffer
>>> (switch-to-buffer buffer))
>> Or rather?
>> (with-current-buffer (or buffer (current-buffer))
>> (save-excursion
>> ...))
>>
>>> (goto-char (point-min))
>>> (let ((pos-end
>>> (re-search-forward
>>> (rx
>>> bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha "+")) "\"")
>>> nil
>>> t))
>>> (pos-beg nil))
>>> (when pos-end
>> Using when-let might be nice here.
>>
>>> (goto-char (1- pos-end))
>>> (setq pos-beg (1+ (search-backward "\"" nil t)))
>>> (buffer-substring-no-properties pos-beg (1- pos-end))))))
>> I'd use a single regular expression here with a group, then extract the
>> right information using `match-string'.
>>
> Nice, I didn't know it was possible.
>>>
>>>
>>> (defun bison-ts-mode--detect-language (&optional buffer)
>>> "Detect the embedded language in a Bison buffer.
>>> Known languages are C, C++, D, and Java, but D is not supported as there is
>>> no support for tree-sitter D in Emacs yet.
>>> If BUFFER is set, search in this buffer, otherwise search in the current
>>> buffer."
>>> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
>>> (pcase (downcase str)
>>> ("c" 'c)
>>> ("c++" 'cpp)
>>> ("d" (progn (message "D language not yet supported") nil))
>> Each pcase case has an implicit progn.
>>
>>> ("java" 'java))
>>> (progn
>> Use when-let to avoid this progn.
>>
>>> (message
>>> "bison-ts-mode: %%language specification not found or invalid, defaulting to C.")
>> Is it necessary to prefix the message with the major mode name?
>>
> I feared it would be a bit cryptic without it. Anyway, I modified this
> function to only display a message if "%language" is present but the
> language is invalid, so it probably isn't necessary anymore.
OK.
>>> 'c)))
>>>
>>>
>>> (defun bison-ts-mode--language-at-point-function (position)
>>> "Return the language at POSITION."
>>> (let* ((node (treesit-node-at position 'bison)))
>> ^
>> let is enough
>>
>>> (if (equal (treesit-node-type node)
>>> "embedded_code")
>> There is no need to break the line here.
>>
>>> bison-ts-mode-embedded-language
>>> 'bison)))
>>>
>>> (defun bison-ts-mode--font-lock-settings (language)
>>> "Return the font-lock settings for Bison.
>>> LANGUAGE should be set to \\='bison."
>>> (treesit-font-lock-rules
>>> :language language
>>> :feature 'bison-comment
>>> '((comment) @font-lock-comment-face)
>>>
>>> :language language
>>> :feature 'bison-declaration
>>> '((declaration_name) @font-lock-keyword-face)
>>>
>>> :language language
>>> :feature 'bison-type
>>> '((type) @font-lock-type-face)
>>>
>>> :language language
>>> :feature 'bison-grammar-rule-usage
>>> '((grammar_rule_identifier) @font-lock-variable-use-face)
>>>
>>> :language language
>>> :feature 'bison-grammar-rule-declaration
>>> '((grammar_rule (grammar_rule_declaration)
>>> @font-lock-variable-use-face))
>>>
>>> :language language
>>> :feature 'bison-string
>>> :override t
>>> '((string) @font-lock-string-face)
>>>
>>> :language language
>>> :feature 'bison-literal
>>> :override t
>>> '((char_literal) @font-lock-keyword-face
>>> (number_literal) @font-lock-number-face)
>>>
>>> :language language
>>> :feature 'bison-directive-grammar-rule
>>> :override t
>>> '((grammar_rule (directive) @font-lock-keyword-face))
>>>
>>> :language language
>>> :feature 'bison-operator
>>> :override t
>>> '(["|"] @font-lock-operator-face)
>>>
>>> :language language
>>> :feature 'bison-delimiter
>>> :override t
>>> '([";"] @font-lock-delimiter-face)))
>>>
>>>
>>> (defvar bison-ts-mode--font-lock-feature-list
>>> '(( bison-comment bison-declaration bison-type
>>> bison-grammar-rule-usage bison-grammar-rule-declaration
>>> bison-string bison-literal bison-directive-grammar-rule
>>> bison-operator bison-delimiter)))
>>>
>>>
>>> (defun bison-ts-mode--bison-matcher-action (root-name)
>>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>>> ROOT-NAME is the highest-level node of the embedded language."
>>> (lambda (node _parent bol &rest _)
>>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>>> (let* ((bison-node (treesit-node-at bol 'bison)))
>> ^
>> here again, let is enough
>>> (if (equal
>>> (treesit-node-type
>>> (treesit-node-parent(treesit-node-parent bison-node))) "action")
>> Though you could bind the (treesit-node-type ...) expression under the
>> above let.
>>
>>> t
>>> nil)))))
>> Why (if foo t nil) when foo would do the same job (equal only returns
>> nil and t, so normalising the value isn't even necessary).
>>
> Because I was stupid.
OK, I am glad I didn't miss something obvious ^^
>>>
>>> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
>>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>>> ROOT-NAME is the highest-level node of the embedded language."
>>> (lambda (node _parent bol &rest _)
>>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>>> (let* ((bison-node (treesit-node-at bol 'bison)))
>>> (if (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue")
>>> nil
>>> t)))))
>> Am I missing something, or couldn't these two functions be merged if you
>> give them a third argument NODE-TYPE and pass it "action" or "epilogue"?
>>
> No, bison-ts-mode--bison-matcher-action checks if the _grandparent_ is
> an "action" node, while bison-ts-mode--bison-matcher-not-epilogue
> checks if the _parent_ is an "epilogue" node.
In that case, would adding another parameter and then binding the
returned lambda expression via defalias be worth the effort?
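To make that concrete, here is a rough sketch of what I have in mind.
The names `bison-ts-mode--bison-ancestor-matcher' and
`bison-ts-mode--c-in-action-p' are made up for illustration, as are the
LEVELS and NEGATE parameters; none of them is part of the proposed
patch, and I have not tried this against the Bison grammar. It needs
lexical binding so that the returned lambda captures its arguments.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical helper, not part of the proposed patch.
(defun bison-ts-mode--bison-ancestor-matcher (root-name levels type
                                                        &optional negate)
  "Return a matcher that inspects an ancestor of the Bison node at BOL.
The matcher only applies when the parent of NODE has type ROOT-NAME.
It walks LEVELS parents up from the Bison node at BOL and compares
that ancestor's type with TYPE.  If NEGATE is non-nil, the result of
the comparison is inverted."
  (lambda (node _parent bol &rest _)
    (when (equal (treesit-node-type (treesit-node-parent node)) root-name)
      (let ((ancestor (treesit-node-at bol 'bison)))
        ;; Climb LEVELS steps; `treesit-node-parent' returns nil for nil.
        (dotimes (_ levels)
          (setq ancestor (treesit-node-parent ancestor)))
        (let ((match (equal (treesit-node-type ancestor) type)))
          (if negate (not match) match))))))

;; Binding one instance to a name via `defalias' (hypothetical name):
;; the grandparent (2 levels up) must be an "action" node.
(defalias 'bison-ts-mode--c-in-action-p
  (bison-ts-mode--bison-ancestor-matcher "translation_unit" 2 "action"))
```

With this shape, the "not in epilogue" case would be the same call with
LEVELS set to 1, TYPE set to "epilogue" and NEGATE non-nil. Whether
that is clearer than two small functions is a judgment call.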
>>>
>>>
>>> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
>>> "Get the parent of the bison node at BOL."
>>> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
>>>
>>>
>>> (defun bison-ts-mode--indent-rules ()
>>> "Indent rules supported by `bison-ts-mode'."
>>> (let*
>>> ((common
>>> `(((node-is "^declaration$")
>>> column-0 0)
>>> ((and (parent-is "^declaration$")
>>> (not (node-is "^code_block$")))
>>> column-0 2)
>>> ((and (parent-is "comment") c-ts-common-looking-at-star)
>>> c-ts-common-comment-start-after-first-star -1)
>>> (c-ts-common-comment-2nd-line-matcher
>>> c-ts-common-comment-2nd-line-anchor
>>> 1)
>>> ((parent-is "comment") prev-adaptive-prefix 0)
>>>
>>> ;; Opening and closing brackets "{}" of declarations
>>> ((and (parent-is "^declaration$")
>>> (node-is "^code_block$"))
>>> column-0 0)
>>> ((and (n-p-gp "}" "" "^declaration$"))
>>> column-0 0)
>>> ((parent-is "^declaration$") parent 2)
>>> ((node-is "^grammar_rule$") column-0 0)
>>> ((and
>>> (parent-is "^grammar_rule$")
>>> (node-is ";"))
>>> column-0 bison-ts-mode-indent-offset)
>>> ((and (parent-is "^grammar_rule$")
>>> (node-is "|"))
>>> column-0 bison-ts-mode-indent-offset)
>>> ((and (parent-is "^grammar_rule$")
>>> (not (node-is "^grammar_rule_declaration$"))
>>> (not (node-is "^action$")))
>>> column-0 ,(+ bison-ts-mode-indent-offset 2))
>>> ((or
>>> (node-is "^action$")
>>> (node-is "^}$"))
>>> column-0 12)
>>> ;; Set '%%' at the beginning of the line
>>> ((or
>>> (and (parent-is "^grammar_rules_section$")
>>> (node-is "%%"))
>>> (node-is "^grammar_rules_section$"))
>>> column-0 0)
>>> (no-node parent-bol 0))))
>>> `((bison . ,common)
>>> ;; Import and override embedded languages rules to add an offset
>>> ,(pcase bison-ts-mode-embedded-language
>>> ('c `(c
>>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>>> column-0 ,bison-ts-mode-indent-offset)
>>> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
>>> ('cpp `(cpp
>>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>>> parent-0 ,bison-ts-mode-indent-offset)
>>> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
>>> ('java `(java
>>> ((bison-ts-mode--bison-matcher-action "program")
>>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>>> ((bison-ts-mode--bison-matcher-not-epilogue "program")
>>> column-0 ,bison-ts-mode-indent-offset)
>>> ,@java-ts-mode--indent-rules))))))
>>>
>>>
>>> (define-derived-mode bison-ts-mode prog-mode "Bison"
>>> "A mode for Bison."
>> ^
>> major-mode
>> Also, mentioning tree-sitter seems like something worth doing.
>>
>>> (when (treesit-ready-p 'bison)
>>> (when (not bison-ts-mode-embedded-language)
>>> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
>>>
>>> ;; Require only if needed, to avoid warnings if a grammar is not
>>> ;; installed but not used.
>>> (pcase bison-ts-mode-embedded-language
>> Would a `pcase-exhaustive' be appropriate here?
>>
> No, the language D is recognized but not supported yet in Emacs, so if
> this language is detected it will not configure anything.
There is a d-mode in NonGNU ELPA, but I guess that isn't enough since
you need a d-ts-mode, right?
>>> ('c (require 'c-ts-mode))
>>> ('cpp (require 'c-ts-mode))
>>> ('java (require 'java-ts-mode)))
>>>
>>> (setq-local treesit-font-lock-settings
>>> (append (bison-ts-mode--font-lock-settings 'bison)
>>> (pcase bison-ts-mode-embedded-language
>>> ('c (c-ts-mode--font-lock-settings 'c))
>>> ('cpp (c-ts-mode--font-lock-settings 'cpp))
>>> ('java java-ts-mode--font-lock-settings))))
>>>
>>> (setq-local treesit-font-lock-feature-list
>>> (if bison-ts-mode-embedded-language
>>> (bison-ts-mode--merge-feature-lists
>>> bison-ts-mode--font-lock-feature-list
>>> (pcase bison-ts-mode-embedded-language
>>> ('c c-ts-mode--feature-list)
>>> ('cpp c-ts-mode--feature-list)
>>> ('java java-ts-mode--feature-list)))
>>> bison-ts-mode--font-lock-feature-list))
>>>
>>> (setq-local treesit-simple-imenu-settings
>>> `(("Grammar"
>>> "\\`grammar_rule_declaration\\'"
>>> nil
>>> (lambda (node) (substring-no-properties (treesit-node-text node))))))
>>>
>>> (c-ts-common-comment-setup)
>>>
>>> (setq-local treesit-simple-indent-rules
>>> (bison-ts-mode--indent-rules))
>>>
>>> (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
>>>
>>> (when bison-ts-mode-embedded-language
>>> (setq-local treesit-range-settings
>>> (treesit-range-rules
>>> :embed bison-ts-mode-embedded-language
>>> :host 'bison
>>> :local t
>>> '((embedded_code) @capture))))
>>>
>>> (treesit-major-mode-setup)))
>>>
>>> (provide 'bison-ts-mode)
>>> ;;; bison-ts-mode.el ends here
>> Sorry for the number of comments, but there has been a discussion about
>> the code quality of tree-sitter major modes being less than optimal,
>> so I hope that your contribution can help raise the bar.
>
> No problem, thank you for your review!
[...]
>
> (defgroup bison-ts nil
> "Support for Bison and Yacc."
> :group 'languages)
>
> (defcustom bison-ts-mode-indent-offset 2
> "Number of spaces for each indentation step in `bison-ts-mode'.
> It has no effect in the epilogue part of the file."
> :version "30.1"
> :type 'integer
> :safe 'integerp
> :group 'bison)
In general, the ":group" annotations here are not necessary: a defcustom
automatically picks up the most recent defgroup in the same file.
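As a small illustration of that, here is a sketch with made-up names
(the `my-bison' prefix is hypothetical, chosen so it does not clash
with the real mode). Dropping ":group" would also sidestep the mismatch
between the `bison-ts' group defined above and the `bison' group the
options refer to. The sketch uses `booleanp' for ":safe", since
`boolean' is a customization type name and not a predicate function.

```elisp
;; Hypothetical names, for illustration only.
(defgroup my-bison nil
  "Support for Bison and Yacc, based on tree-sitter."
  :group 'languages)

;; No :group keyword: `defcustom' falls back to the last `defgroup'
;; evaluated in the same file.
(defcustom my-bison-indent-offset 2
  "Number of spaces for each indentation step."
  :type 'integer
  :safe #'integerp)

(defcustom my-bison-autodetect-language t
  "Non-nil means search for a %language directive at initialization."
  :type 'boolean
  :safe #'booleanp)
```

After evaluating this, both options should show up under the `my-bison'
group in `M-x customize-group'.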
> (defcustom bison-ts-mode-autodetect-language t
> "Search for a %language directive in the file at initialization.
> Changing the value of this directive in the file requires to reload the mode to
> be effective. If `bison-ts-mode-buffer-language' is set by a file-local
> variable, the auto-detection is not run."
> :version "30.1"
> :type 'boolean
> :safe 'boolean
> :group 'bison)
>
> (defvar-local bison-ts-mode-embedded-language nil
> "Embedded language in Bison buffer.
> Supported values are `c', `cpp', and `java'.")
> ;;;###autoload
> (put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)
>
>
> (defun bison-ts-mode--merge-feature-lists (l1 l2)
> "Merge the lists of lists L1 and L2.
> The first sublist of L1 is merged with the first sublist of L2 and so on.
> L1 and L2 don't need to have the same size."
> (let ((res ()))
> (while (or l1 l2)
> (setq res (push (seq-uniq (append (car l1) (car l2)) 'eq) res))
> (setq l1 (cdr l1) l2 (cdr l2)))
> (nreverse res)))
>
> (defun bison-ts-mode--find-language-in-buffer (&optional buffer)
> "Find and return the language set by the Bison directive %language.
> If BUFFER is set, search in this buffer, otherwise search in the current
> buffer."
> (save-excursion
> (with-current-buffer (or buffer (current-buffer))
> (goto-char (point-min))
> (when
> (re-search-forward
> (rx
> bol (0+ blank) "%language" (0+ blank) "\"" (group (1+ (in alpha "+"))) "\"")
I'd say this regular expression is complex enough to be split into
multiple lines. And you can use the fact that `rx' takes s-expressions
to add comments in between.
> nil
> t)))
> (substring-no-properties (match-string 1))))
Or `match-string-no-properties'
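Putting both remarks together, something along these lines is what I
mean. This is only a sketch: the function name `my-bison-find-language'
is made up, it works on the current buffer only, and it keeps the
`match-string-no-properties' call inside the `when' so that the match
data is read only after a successful search.

```elisp
;; Hypothetical name, for illustration only.
(defun my-bison-find-language ()
  "Return the language set by a %language directive, or nil.
Search the current buffer from the beginning, without moving point."
  (save-excursion
    (goto-char (point-min))
    (when (re-search-forward
           (rx bol (0+ blank)
               ;; The directive itself.
               "%language" (0+ blank)
               ;; The quoted language name, e.g. "c++"; group 1
               ;; captures the name without the quotes.
               "\"" (group (1+ (in alpha "+"))) "\"")
           nil t)
      (match-string-no-properties 1))))
```

Since `rx' is a macro that reads ordinary Lisp forms, the comments are
dropped by the reader and do not affect the resulting regexp.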
>
>
> (defun bison-ts-mode--detect-language (&optional buffer)
> "Detect the embedded language in a Bison buffer.
> Known languages are C, C++, D, and Java, but D is not supported as there is
> no support for tree-sitter D in Emacs yet.
> If BUFFER is set, search in this buffer, otherwise search in the current
> buffer."
> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
> (pcase-exhaustive (downcase str)
> ("c" 'c)
> ("c++" 'cpp)
> ("d" (message "D language not yet supported") nil)
> ("java" 'java)
> (_ (message "%%language specification \"%s\" is invalid, defaulting to C" str) 'c))))
Is there any point in using `pcase-exhaustive' if you end with a
catch-all `_' clause anyway?
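`pcase-exhaustive' only differs from `pcase' in that it signals an
error when no clause matches, and with a final `_' clause that cannot
happen. A sketch with plain `pcase' (the name
`my-bison-language-symbol' is made up, and it takes the string directly
instead of searching a buffer):

```elisp
;; Hypothetical name, for illustration only.
(defun my-bison-language-symbol (str)
  "Map the %language string STR to a language symbol.
Return nil for D, which is recognized but not supported, and fall
back to `c' for anything unknown."
  (pcase (downcase str)
    ("c" 'c)
    ("c++" 'cpp)
    ("java" 'java)
    ;; Each clause body is an implicit progn; the last form is returned.
    ("d" (message "D language not yet supported") nil)
    (_ (message "%%language specification \"%s\" is invalid, defaulting to C"
                str)
       'c)))
```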
>
>
> (defun bison-ts-mode--language-at-point-function (position)
> "Return the language at POSITION."
> (let ((node (treesit-node-at position 'bison)))
> (if (equal (treesit-node-type node) "embedded_code")
> bison-ts-mode-embedded-language
> 'bison)))
>
> (defun bison-ts-mode--font-lock-settings (language)
> "Return the font-lock settings for Bison.
> LANGUAGE should be set to \\='bison."
> (treesit-font-lock-rules
> :language language
> :feature 'comment
> '((comment) @font-lock-comment-face)
>
> :language language
> :feature 'declaration
> '((declaration_name) @font-lock-keyword-face)
>
> :language language
> :feature 'type
> '((type) @font-lock-type-face)
>
> :language language
> :feature 'variable
> '((grammar_rule_identifier) @font-lock-variable-use-face)
>
> :language language
> :feature 'grammar-declaration
> '((grammar_rule (grammar_rule_declaration)
> @font-lock-variable-use-face))
>
> :language language
> :feature 'string
> :override t
> '((string) @font-lock-string-face)
>
> :language language
> :feature 'literal
> :override t
> '((char_literal) @font-lock-keyword-face
> (number_literal) @font-lock-number-face)
>
> :language language
> :feature 'directive-grammar-rule
> :override t
> '((grammar_rule (directive) @font-lock-keyword-face))
>
> :language language
> :feature 'operator
> :override t
> '(["|"] @font-lock-operator-face)
>
> :language language
> :feature 'delimiter
> :override t
> '([";"] @font-lock-delimiter-face)))
>
>
> (defvar bison-ts-mode--font-lock-feature-list
I am not that familiar with the tree-sitter stuff, but would it be
possible to use `defconst' here?
> '(( comment declaration grammar-declaration)
> ( type string directive-grammar-rule)
> ( literal)
> ( variable operator delimiter)))
>
>
> (defun bison-ts-mode--bison-matcher-action (root-name)
> "Treesit matcher to check if NODE at BOL is located in an action node.
> ROOT-NAME is the highest-level node of the embedded language."
> (lambda (node _parent bol &rest _)
> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
> (let ((bison-node (treesit-node-at bol 'bison)))
> (equal
> (treesit-node-type
> (treesit-node-parent (treesit-node-parent bison-node)))
> "action")))))
>
> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
> ROOT-NAME is the highest-level node of the embedded language."
> (lambda (node _parent bol &rest _)
> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
> (let ((bison-node (treesit-node-at bol 'bison)))
> (not (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue"))))))
>
>
> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
> "Get the parent of the bison node at BOL."
> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
>
>
> (defun bison-ts-mode--indent-rules ()
> "Indent rules supported by `bison-ts-mode'."
> (let*
> ((common
> `(((node-is "^declaration$")
> column-0 0)
> ((and (parent-is "^declaration$")
> (not (node-is "^code_block$")))
> column-0 2)
> ((and (parent-is "comment") c-ts-common-looking-at-star)
> c-ts-common-comment-start-after-first-star -1)
> (c-ts-common-comment-2nd-line-matcher
> c-ts-common-comment-2nd-line-anchor
> 1)
> ((parent-is "comment") prev-adaptive-prefix 0)
>
> ;; Opening and closing brackets "{}" of declarations
> ((and (parent-is "^declaration$")
> (node-is "^code_block$"))
> column-0 0)
> ((and (n-p-gp "}" "" "^declaration$"))
> column-0 0)
> ((parent-is "^declaration$") parent 2)
> ((node-is "^grammar_rule$") column-0 0)
> ((and
> (parent-is "^grammar_rule$")
> (node-is ";"))
> column-0 bison-ts-mode-indent-offset)
> ((and (parent-is "^grammar_rule$")
> (node-is "|"))
> column-0 bison-ts-mode-indent-offset)
> ((and (parent-is "^grammar_rule$")
> (not (node-is "^grammar_rule_declaration$"))
> (not (node-is "^action$")))
> column-0 ,(+ bison-ts-mode-indent-offset 2))
> ((or
> (node-is "^action$")
> (node-is "^}$"))
> column-0 12)
> ;; Set '%%' at the beginning of the line
> ((or
> (and (parent-is "^grammar_rules_section$")
> (node-is "%%"))
> (node-is "^grammar_rules_section$"))
> column-0 0)
> (no-node parent-bol 0))))
> `((bison . ,common)
> ;; Import and override embedded languages rules to add an offset
> ,(pcase bison-ts-mode-embedded-language
> ('c `(c
> ((bison-ts-mode--bison-matcher-action "translation_unit")
> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
> column-0 ,bison-ts-mode-indent-offset)
> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
> ('cpp `(cpp
> ((bison-ts-mode--bison-matcher-action "translation_unit")
> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
> parent-0 ,bison-ts-mode-indent-offset)
> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
> ('java `(java
> ((bison-ts-mode--bison-matcher-action "program")
> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> ((bison-ts-mode--bison-matcher-not-epilogue "program")
> column-0 ,bison-ts-mode-indent-offset)
> ,@java-ts-mode--indent-rules))))))
>
>
> (define-derived-mode bison-ts-mode prog-mode "Bison"
> "A major-mode for Bison based on tree-sitter."
> (when (treesit-ready-p 'bison)
> (when (not bison-ts-mode-embedded-language)
Or `unless'
> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
>
> ;; Require only if needed, to avoid warnings if a grammar is not
> ;; installed but not used.
> (pcase bison-ts-mode-embedded-language
> ('c (require 'c-ts-mode))
> ('cpp (require 'c-ts-mode))
> ('java (require 'java-ts-mode)))
>
> (setq-local treesit-font-lock-settings
> (append (bison-ts-mode--font-lock-settings 'bison)
> (pcase bison-ts-mode-embedded-language
> ('c (c-ts-mode--font-lock-settings 'c))
> ('cpp (c-ts-mode--font-lock-settings 'cpp))
> ('java java-ts-mode--font-lock-settings))))
>
> (setq-local treesit-font-lock-feature-list
> (if bison-ts-mode-embedded-language
> (bison-ts-mode--merge-feature-lists
> bison-ts-mode--font-lock-feature-list
> (pcase bison-ts-mode-embedded-language
> ('c c-ts-mode--feature-list)
> ('cpp c-ts-mode--feature-list)
> ('java java-ts-mode--feature-list)))
> bison-ts-mode--font-lock-feature-list))
>
> (setq-local treesit-simple-imenu-settings
> `(("Grammar"
> "\\`grammar_rule_declaration\\'"
> nil
> (lambda (node) (substring-no-properties (treesit-node-text node))))))
The function `treesit-node-text' appears to take an optional NO-PROPERTY argument.
>
> (c-ts-common-comment-setup)
>
> (setq-local treesit-simple-indent-rules
> (bison-ts-mode--indent-rules))
>
> (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
>
> (when bison-ts-mode-embedded-language
> (setq-local treesit-range-settings
> (treesit-range-rules
> :embed bison-ts-mode-embedded-language
> :host 'bison
> :local t
> '((embedded_code) @capture))))
>
> (treesit-major-mode-setup)))
>
> (provide 'bison-ts-mode)
> ;;; bison-ts-mode.el ends here
* Re: New tree-sitter mode: bison-ts-mode
2023-09-22 20:40 ` Philip Kaludercic
@ 2023-09-22 23:21 ` Augustin Chéneau (BTuin)
From: Augustin Chéneau (BTuin) @ 2023-09-22 23:21 UTC (permalink / raw)
To: Philip Kaludercic; +Cc: emacs-devel
Le 22/09/2023 à 22:40, Philip Kaludercic a écrit :
> "Augustin Chéneau (BTuin)" <btuin@mailo.com> writes:
>
>> Le 22/09/2023 à 09:38, Philip Kaludercic a écrit :
>>> "Augustin Chéneau (BTuin)" <btuin@mailo.com> writes:
>>> A few comments on the proposed file:
>>>
>>>> ;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-
>>>>
>>> Could you add the usual header information here?
>>> ;; Copyright (C) 2022-2023 Free Software Foundation, Inc.
>>> --8<---------------cut here---------------start------------->8---
>>> ;; Author: Augustin Chéneau <btuin@mailo.com>
>>> ;; This file is part of GNU Emacs.
>>> ;; GNU Emacs is free software: you can redistribute it and/or modify
>>> ;; it under the terms of the GNU General Public License as published by
>>> ;; the Free Software Foundation, either version 3 of the License, or
>>> ;; (at your option) any later version.
>>> ;; GNU Emacs is distributed in the hope that it will be useful,
>>> ;; but WITHOUT ANY WARRANTY; without even the implied warranty of
>>> ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
>>> ;; GNU General Public License for more details.
>>> ;; You should have received a copy of the GNU General Public License
>>> ;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
>>> --8<---------------cut here---------------end--------------->8---
>>>
>>>> ;;; Commentary:
>>>>
>>>> ;; This is a mode based on tree-sitter for Bison and Yacc files, tools to generate parsers.
>>> Shouldn't you mention which tree-sitter grammar is being used here?
>>>
>>>> ;;; Code:
>>>>
>>>> (require 'treesit)
>>>> (require 'c-ts-common)
>>>>
>>>> (declare-function treesit-parser-create "treesit.c")
>>>> (declare-function treesit-induce-sparse-tree "treesit.c")
>>>> (declare-function treesit-node-child-by-field-name "treesit.c")
>>>> (declare-function treesit-search-subtree "treesit.c")
>>>> (declare-function treesit-node-parent "treesit.c")
>>>> (declare-function treesit-node-next-sibling "treesit.c")
>>>> (declare-function treesit-node-type "treesit.c")
>>>> (declare-function treesit-node-child "treesit.c")
>>>> (declare-function treesit-node-end "treesit.c")
>>>> (declare-function treesit-node-start "treesit.c")
>>>> (declare-function treesit-node-string "treesit.c")
>>>> (declare-function treesit-query-compile "treesit.c")
>>>> (declare-function treesit-query-capture "treesit.c")
>>>> (declare-function treesit-parser-add-notifier "treesit.c")
>>>> (declare-function treesit-parser-buffer "treesit.c")
>>>> (declare-function treesit-parser-list "treesit.c")
>>>>
>>>>
>>>> (defgroup bison nil
>>> bison or bison-ts?
>>>
>>
>> As far as I know, no group is explicitly reserved for
>> tree-sitter. rust-ts-mode uses the group "rust", java-ts-mode uses
>> "java", ruby-ts-mode uses "ruby"...
>
> OK, that is fine then.
>
>>
>>>> "Support for Bison and Yacc."
>>> Shouldn't tree-sitter be mentioned here?
>>>
>> Same as above.
>
> Sure? Above I was talking about using the group name "bison" or
> "bison-ts"; this is about mentioning that tree-sitter is required for
> the mode to work.
>
Oh OK, I modified it. Should I go into more detail?
>>>> :group 'languages)
>>>>
>>>> (defcustom bison-ts-mode-indent-offset 2
>>>> "Number of spaces for each indentation step in `bison-ts-mode'.
>>>> It has no effect in the epilogue part of the file."
>>>> :version "30.1"
>>>> :type 'integer
>>>> :safe 'integerp
>>>> :group 'bison)
>>>>
>>>> (defcustom bison-ts-mode-autodetect-language t
>>>> "Search for a %language directive in the file at initialization.
>>>> Changing the value of this directive in the file requires to reload the mode to
>>>> be effective. If `bison-ts-mode-buffer-language' is set by a file-local
>>>> variable, the auto-detection is not run."
>>>> :version "30.1"
>>>> :type 'boolean
>>>> :safe 'boolean
>>>> :group 'bison)
>>>>
>>>> (defvar-local bison-ts-mode-embedded-language nil
>>>> "Embedded language in Bison buffer.")
>>>>
>>>> (defun bison-ts-mode--merge-feature-lists (l1 l2)
>>>> "Merge the lists of lists L1 and L2.
>>>> The first sublist of L1 is merged with the first sublist of L2 and so on.
>>>> L1 and L2 don't need to have the same size."
>>>> (let ((res ()))
>>>> (while (or l1 l2)
>>>> (setq res (push (append (car l1) (car l2)) res))
>>>> (setq l1 (cdr l1) l2 (cdr l2)))
>>>> (nreverse res)))
>>> Is this a generic function that should be extracted into some common
>>> file?
>>>
>> It probably should, it could be useful for other *-ts-mode that want
>> to mix multiple languages, unless there already is such a function in
>> Elisp.
>
> Then again, if you use `cl-loop', you can reproduce this specific
> behaviour using
>
> (cl-loop for i1 in '(1 2 3) collect i1
> for i2 in '(a b c) collect i2)
>
Maybe I'm missing something, but this does not give the wanted result.
The goal is to merge two lists of lists, such as
'((1 2 3) (4 5 6) (7 8 9)) and '((a b c) (d e f) (h g i) (k l m)), to get
'((1 2 3 a b c) (4 5 6 d e f) (7 8 9 h g i) (k l m))
(the order inside the sub-lists is unimportant, '(1 c b 2 3 a) would
work too).
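For illustration, a minimal standalone version of that merge, under a
made-up name (`my-merge-feature-lists'). It follows the same loop as
the function in the patch, except that it relies on `push' updating
the variable itself, so the surrounding `setq' is not needed:

```elisp
;; Hypothetical name, for illustration only.
(defun my-merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 position by position.
The Nth sublist of the result is the Nth sublist of L1 followed by
the Nth sublist of L2.  L1 and L2 may have different lengths."
  (let (res)
    (while (or l1 l2)
      ;; (car nil) is nil, so the shorter list contributes nothing
      ;; once it is exhausted.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

(my-merge-feature-lists '((1 2 3) (4 5 6) (7 8 9))
                        '((a b c) (d e f) (h g i) (k l m)))
;; => ((1 2 3 a b c) (4 5 6 d e f) (7 8 9 h g i) (k l m))
```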
>>
>>>> (defun bison-ts-mode--find-language-in-buffer (&optional buffer)
>>>> "Find and return the language set by the Bison directive %language.
>>>> If BUFFER is set, search in this buffer, otherwise search in the current
>>>> buffer."
>>>> (save-excursion
>>>> (when buffer
>>>> (switch-to-buffer buffer))
>>> Or rather?
>>> (with-current-buffer (or buffer (current-buffer))
>>> (save-excursion
>>> ...))
>>>
>>>> (goto-char (point-min))
>>>> (let ((pos-end
>>>> (re-search-forward
>>>> (rx
>>>> bol (0+ blank) "%language" (0+ blank) "\"" (1+ (in alpha "+")) "\"")
>>>> nil
>>>> t))
>>>> (pos-beg nil))
>>>> (when pos-end
>>> Using when-let might be nice here.
>>>
>>>> (goto-char (1- pos-end))
>>>> (setq pos-beg (1+ (search-backward "\"" nil t)))
>>>> (buffer-substring-no-properties pos-beg (1- pos-end))))))
>>> I'd use a single regular expression here with a group, then extract the
>>> right information using `match-string'.
>>>
>> Nice, I didn't know it was possible.
>>>>
>>>>
>>>> (defun bison-ts-mode--detect-language (&optional buffer)
>>>> "Detect the embedded language in a Bison buffer.
>>>> Known languages are C, C++, D, and Java, but D is not supported as there is
>>>> no support for tree-sitter D in Emacs yet.
>>>> If BUFFER is set, search in this buffer, otherwise search in the current
>>>> buffer."
>>>> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
>>>> (pcase (downcase str)
>>>> ("c" 'c)
>>>> ("c++" 'cpp)
>>>> ("d" (progn (message "D language not yet supported") nil))
>>> Each pcase case has an implicit progn.
>>>
>>>> ("java" 'java))
>>>> (progn
>>> Use when-let to avoid this progn.
>>>
>>>> (message
>>>> "bison-ts-mode: %%language specification not found or invalid, defaulting to C.")
>>> Is it necessary to prefix the message with the major mode name?
>>>
>> I feared it would be a bit cryptic without it. Anyway, I modified this
>> function to only display a message if "%language" is present but the
>> language is invalid, so it probably isn't necessary anymore.
>
> OK.
>
>>>> 'c)))
>>>>
>>>>
>>>> (defun bison-ts-mode--language-at-point-function (position)
>>>> "Return the language at POSITION."
>>>> (let* ((node (treesit-node-at position 'bison)))
>>> ^
>>> let is enough
>>>
>>>> (if (equal (treesit-node-type node)
>>>> "embedded_code")
>>> There is no need to break the line here.
>>>
>>>> bison-ts-mode-embedded-language
>>>> 'bison)))
>>>>
>>>> (defun bison-ts-mode--font-lock-settings (language)
>>>> "Return the font-lock settings for Bison.
>>>> LANGUAGE should be set to \\='bison."
>>>> (treesit-font-lock-rules
>>>> :language language
>>>> :feature 'bison-comment
>>>> '((comment) @font-lock-comment-face)
>>>>
>>>> :language language
>>>> :feature 'bison-declaration
>>>> '((declaration_name) @font-lock-keyword-face)
>>>>
>>>> :language language
>>>> :feature 'bison-type
>>>> '((type) @font-lock-type-face)
>>>>
>>>> :language language
>>>> :feature 'bison-grammar-rule-usage
>>>> '((grammar_rule_identifier) @font-lock-variable-use-face)
>>>>
>>>> :language language
>>>> :feature 'bison-grammar-rule-declaration
>>>> '((grammar_rule (grammar_rule_declaration)
>>>> @font-lock-variable-use-face))
>>>>
>>>> :language language
>>>> :feature 'bison-string
>>>> :override t
>>>> '((string) @font-lock-string-face)
>>>>
>>>> :language language
>>>> :feature 'bison-literal
>>>> :override t
>>>> '((char_literal) @font-lock-keyword-face
>>>> (number_literal) @font-lock-number-face)
>>>>
>>>> :language language
>>>> :feature 'bison-directive-grammar-rule
>>>> :override t
>>>> '((grammar_rule (directive) @font-lock-keyword-face))
>>>>
>>>> :language language
>>>> :feature 'bison-operator
>>>> :override t
>>>> '(["|"] @font-lock-operator-face)
>>>>
>>>> :language language
>>>> :feature 'bison-delimiter
>>>> :override t
>>>> '([";"] @font-lock-delimiter-face)))
>>>>
>>>>
>>>> (defvar bison-ts-mode--font-lock-feature-list
>>>> '(( bison-comment bison-declaration bison-type
>>>> bison-grammar-rule-usage bison-grammar-rule-declaration
>>>> bison-string bison-literal bison-directive-grammar-rule
>>>> bison-operator bison-delimiter)))
>>>>
>>>>
>>>> (defun bison-ts-mode--bison-matcher-action (root-name)
>>>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>>>> ROOT-NAME is the highest-level node of the embedded language."
>>>> (lambda (node _parent bol &rest _)
>>>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>>>> (let* ((bison-node (treesit-node-at bol 'bison)))
>>> ^
>>> here again, let is enough
>>> (if (equal
>>>> (treesit-node-type
>>>> (treesit-node-parent(treesit-node-parent bison-node))) "action")
>>> Though you could bind the (treesit-node-type ...) expression under the
>>> above let.
>>>
>>>> t
>>>> nil)))))
>>> Why (if foo t nil) when foo would do the same job? (equal only returns
>>> nil and t, so normalising the value isn't even necessary.)
>>>
>> Because I was stupid.
>
> OK, I am glad I didn't miss something obvious ^^
>
>>>>
>>>> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
>>>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>>>> ROOT-NAME is the highest-level node of the embedded language."
>>>> (lambda (node _parent bol &rest _)
>>>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>>>> (let* ((bison-node (treesit-node-at bol 'bison)))
>>>> (if (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue")
>>>> nil
>>>> t)))))
>>> Am I missing something, or couldn't these two functions be merged if you
>>> give them a third argument NODE-TYPE and pass it "action" or "epilogue"?
>>>
>> No, bison-ts-mode--bison-matcher-action checks if the _grandparent_ is
>> an "action" node, while bison-ts-mode--bison-matcher-not-epilogue
>> checks if the _parent_ is an "epilogue" node.
>
> In that case, would adding another parameter and then binding the
> returned lambda expression via defalias be worth the effort?
>
I'm not sure. Everything inside the let body is different, so any
resemblance is coincidental. Merging them would probably add more
complexity than it would remove.
>>>>
>>>>
>>>> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
>>>> "Get the parent of the bison node at BOL."
>>>> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
>>>>
>>>>
>>>> (defun bison-ts-mode--indent-rules ()
>>>> "Indent rules supported by `bison-ts-mode'."
>>>> (let*
>>>> ((common
>>>> `(((node-is "^declaration$")
>>>> column-0 0)
>>>> ((and (parent-is "^declaration$")
>>>> (not (node-is "^code_block$")))
>>>> column-0 2)
>>>> ((and (parent-is "comment") c-ts-common-looking-at-star)
>>>> c-ts-common-comment-start-after-first-star -1)
>>>> (c-ts-common-comment-2nd-line-matcher
>>>> c-ts-common-comment-2nd-line-anchor
>>>> 1)
>>>> ((parent-is "comment") prev-adaptive-prefix 0)
>>>>
>>>> ;; Opening and closing brackets "{}" of declarations
>>>> ((and (parent-is "^declaration$")
>>>> (node-is "^code_block$"))
>>>> column-0 0)
>>>> ((and (n-p-gp "}" "" "^declaration$"))
>>>> column-0 0)
>>>> ((parent-is "^declaration$") parent 2)
>>>> ((node-is "^grammar_rule$") column-0 0)
>>>> ((and
>>>> (parent-is "^grammar_rule$")
>>>> (node-is ";"))
>>>> column-0 bison-ts-mode-indent-offset)
>>>> ((and (parent-is "^grammar_rule$")
>>>> (node-is "|"))
>>>> column-0 bison-ts-mode-indent-offset)
>>>> ((and (parent-is "^grammar_rule$")
>>>> (not (node-is "^grammar_rule_declaration$"))
>>>> (not (node-is "^action$")))
>>>> column-0 ,(+ bison-ts-mode-indent-offset 2))
>>>> ((or
>>>> (node-is "^action$")
>>>> (node-is "^}$"))
>>>> column-0 12)
>>>> ;; Set '%%' at the beginning of the line
>>>> ((or
>>>> (and (parent-is "^grammar_rules_section$")
>>>> (node-is "%%"))
>>>> (node-is "^grammar_rules_section$"))
>>>> column-0 0)
>>>> (no-node parent-bol 0))))
>>>> `((bison . ,common)
>>>> ;; Import and override embedded languages rules to add an offset
>>>> ,(pcase bison-ts-mode-embedded-language
>>>> ('c `(c
>>>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>>>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>>>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>>>> column-0 ,bison-ts-mode-indent-offset)
>>>> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
>>>> ('cpp `(cpp
>>>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>>>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>>>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>>>> parent-0 ,bison-ts-mode-indent-offset)
>>>> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
>>>> ('java `(java
>>>> ((bison-ts-mode--bison-matcher-action "program")
>>>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>>>> ((bison-ts-mode--bison-matcher-not-epilogue "program")
>>>> column-0 ,bison-ts-mode-indent-offset)
>>>> ,@java-ts-mode--indent-rules))))))
>>>>
>>>>
>>>> (define-derived-mode bison-ts-mode prog-mode "Bison"
>>>> "A mode for Bison."
>>> ^
>>> major-mode
>>> Also, mentioning tree-sitter seems like something worth doing.
>>>
>>>> (when (treesit-ready-p 'bison)
>>>> (when (not bison-ts-mode-embedded-language)
>>>> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
>>>>
>>>> ;; Require only if needed, to avoid warnings if a grammar is not
>>>> ;; installed but not used.
>>>> (pcase bison-ts-mode-embedded-language
>>> Would a `pcase-exhaustive' be appropriate here?
>>>
>> No, the language D is recognized but not supported yet in Emacs, so if
>> this language is detected it will not configure anything.
>
> There is a d-mode in NonGNU ELPA, but I guess that isn't enough since
> you need a d-ts-mode, right?
>
Yes, a d-ts-mode is what's needed here, because it needs tree-sitter
rules about indentation and font-locking.
>>>> ('c (require 'c-ts-mode))
>>>> ('cpp (require 'c-ts-mode))
>>>> ('java (require 'java-ts-mode)))
>>>>
>>>> (setq-local treesit-font-lock-settings
>>>> (append (bison-ts-mode--font-lock-settings 'bison)
>>>> (pcase bison-ts-mode-embedded-language
>>>> ('c (c-ts-mode--font-lock-settings 'c))
>>>> ('cpp (c-ts-mode--font-lock-settings 'cpp))
>>>> ('java java-ts-mode--font-lock-settings))))
>>>>
>>>> (setq-local treesit-font-lock-feature-list
>>>> (if bison-ts-mode-embedded-language
>>>> (bison-ts-mode--merge-feature-lists
>>>> bison-ts-mode--font-lock-feature-list
>>>> (pcase bison-ts-mode-embedded-language
>>>> ('c c-ts-mode--feature-list)
>>>> ('cpp c-ts-mode--feature-list)
>>>> ('java java-ts-mode--feature-list)))
>>>> bison-ts-mode--font-lock-feature-list))
>>>>
>>>> (setq-local treesit-simple-imenu-settings
>>>> `(("Grammar"
>>>> "\\`grammar_rule_declaration\\'"
>>>> nil
>>>> (lambda (node) (substring-no-properties (treesit-node-text node))))))
>>>>
>>>> (c-ts-common-comment-setup)
>>>>
>>>> (setq-local treesit-simple-indent-rules
>>>> (bison-ts-mode--indent-rules))
>>>>
>>>> (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
>>>>
>>>> (when bison-ts-mode-embedded-language
>>>> (setq-local treesit-range-settings
>>>> (treesit-range-rules
>>>> :embed bison-ts-mode-embedded-language
>>>> :host 'bison
>>>> :local t
>>>> '((embedded_code) @capture))))
>>>>
>>>> (treesit-major-mode-setup)))
>>>>
>>>> (provide 'bison-ts-mode)
>>>> ;;; bison-ts-mode.el ends here
>>> Sorry for the number of comments, but there has been a discussion about
>>> the code quality of tree-sitter major modes being less than optimal,
>>> so I hope that your contribution could help raise the bar.
>>
>> No problem, thank you for your review!
>
> [...]
>
>>
>> (defgroup bison-ts nil
>> "Support for Bison and Yacc."
>> :group 'languages)
>>
>> (defcustom bison-ts-mode-indent-offset 2
>> "Number of spaces for each indentation step in `bison-ts-mode'.
>> It has no effect in the epilogue part of the file."
>> :version "30.1"
>> :type 'integer
>> :safe 'integerp
>> :group 'bison)
>
> The ":group" annotations here are not necessary in general, defcustoms
> can automatically detect the previous defgroup.
>
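A small standalone sketch of the point about `:group', using made-up
names (`my-demo' and `my-demo-indent-offset' are hypothetical and not part
of the mode). It assumes both forms are evaluated from the same file, so
the option should land in the group defined just before it:

```elisp
;; Hypothetical names, for illustration only.
(defgroup my-demo nil
  "Demo group."
  :group 'languages)

;; No :group keyword here: the option is added to the group most
;; recently defined by `defgroup' in the same file, i.e. `my-demo'.
(defcustom my-demo-indent-offset 2
  "Number of spaces for each indentation step."
  :type 'integer
  :safe #'integerp)

;; The members of a group are recorded in its `custom-group' property,
;; so this returns a non-nil entry for the option.
(assq 'my-demo-indent-offset (get 'my-demo 'custom-group))
```

Note that the `:safe' value has to be a predicate function such as
`integerp' or `booleanp'.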
>> (defcustom bison-ts-mode-autodetect-language t
>> "Search for a %language directive in the file at initialization.
>> Changing the value of this directive in the file requires to reload the mode to
>> be effective. If `bison-ts-mode-buffer-language' is set by a file-local
>> variable, the auto-detection is not run."
>> :version "30.1"
>> :type 'boolean
>> :safe 'boolean
>> :group 'bison)
>>
>> (defvar-local bison-ts-mode-embedded-language nil
>> "Embedded language in Bison buffer.
>> Supported values are `c', `cpp', and `java'.")
>> ;;;###autoload
>> (put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)
>>
>>
>> (defun bison-ts-mode--merge-feature-lists (l1 l2)
>> "Merge the lists of lists L1 and L2.
>> The first sublist of L1 is merged with the first sublist of L2 and so on.
>> L1 and L2 don't need to have the same size."
>> (let ((res ()))
>> (while (or l1 l2)
>> (setq res (push (seq-uniq (append (car l1) (car l2)) 'eq) res))
>> (setq l1 (cdr l1) l2 (cdr l2)))
>> (nreverse res)))
>>
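For reference, a standalone sketch of what this merge does, written under
the hypothetical name `my-merge-feature-lists' so it can be evaluated
without the mode. The feature names in the usage example are arbitrary.
Since `push' already updates its place, the sketch drops the surrounding
`setq':

```elisp
(require 'seq)

;; Standalone copy of the merging logic under a hypothetical name.
(defun my-merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 level by level.
L1 and L2 don't need to have the same length."
  (let (res)
    (while (or l1 l2)
      ;; `push' already updates RES, so no surrounding `setq' is needed.
      (push (seq-uniq (append (car l1) (car l2)) #'eq) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

(my-merge-feature-lists '((comment declaration) (type string))
                        '((comment definition) (keyword string) (number)))
;; => ((comment declaration definition) (type string keyword) (number))
```

Duplicates are removed within each level only, and the longer list simply
contributes its remaining levels unchanged.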
>> (defun bison-ts-mode--find-language-in-buffer (&optional buffer)
>> "Find and return the language set by the Bison directive %language.
>> If BUFFER is set, search in this buffer, otherwise search in the current
>> buffer."
>> (save-excursion
>> (with-current-buffer (or buffer (current-buffer))
>> (goto-char (point-min))
>> (when
>> (re-search-forward
>> (rx
>> bol (0+ blank) "%language" (0+ blank) "\"" (group (1+ (in alpha "+"))) "\"")
>
> I'd say this regular expression is complex enough to be split into
> multiple lines. And you can use the fact that `rx' takes s-expressions
> to add comments in between.
>
>> nil
>> t)))
>> (substring-no-properties (match-string 1))))
>
> Or `match-string-no-properties'
>
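Both suggestions can be tried in isolation with a sketch like the one
below. The helper `my-find-bison-language' is hypothetical: it takes the
text as a string instead of searching a Bison buffer, and it reads the
match data inside the `when', so nothing stale is returned if the search
fails:

```elisp
(require 'rx)

;; Hypothetical standalone helper, for illustration only.
(defun my-find-bison-language (text)
  "Return the language named by a %language directive in TEXT, or nil."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    (when (re-search-forward
           (rx bol (0+ blank)
               "%language"
               (0+ blank)
               "\""
               ;; Language name, e.g. C, c++ or Java.
               (group (1+ (in alpha "+")))
               "\"")
           nil t)
      ;; Only read the match data when the search succeeded.
      (match-string-no-properties 1))))

(my-find-bison-language "%require \"3.2\"\n%language \"C++\"\n")
;; => "C++"
(my-find-bison-language "%token NUM\n")
;; => nil
```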
>>
>>
>> (defun bison-ts-mode--detect-language (&optional buffer)
>> "Dectect the embedded language in a Bison buffer.
>> Known languages are C, C++, D, and Java, but D is not supported as there is
>> no support for tree-sitter D in Emacs yet.
>> If BUFFER is set, search in this buffer, otherwise search in the current
>> buffer."
>> (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
>> (pcase-exhaustive (downcase str)
>> ("c" 'c)
>> ("c++" 'cpp)
>> ("d" (message "D language not yet supported") nil)
>> ("java" 'java)
>> (_ (message "%%language specification \"%s\" is invalid, defaulting to C" str) 'c))))
>
> No point in using `pcase-exhaustive' if you end with _ anyway?
>
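To spell that out: `pcase-exhaustive' only differs from `pcase' by
signaling an error when no clause matches, and a final `_' clause matches
everything. A minimal sketch of the mapping with plain `pcase', under the
hypothetical name `my-bison-language-symbol' and without the messages:

```elisp
;; Hypothetical helper; the mapping mirrors the one quoted above.
(defun my-bison-language-symbol (str)
  "Map the %language string STR to a language symbol, or nil for D."
  (pcase (downcase str)
    ("c" 'c)
    ("c++" 'cpp)
    ;; D is recognized but has no tree-sitter mode to delegate to.
    ("d" nil)
    ("java" 'java)
    ;; Catch-all: this clause already makes the match exhaustive.
    (_ 'c)))

(my-bison-language-symbol "C++")     ; => cpp
(my-bison-language-symbol "Java")    ; => java
(my-bison-language-symbol "fortran") ; => c
```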
>>
>>
>> (defun bison-ts-mode--language-at-point-function (position)
>> "Return the language at POSITION."
>> (let ((node (treesit-node-at position 'bison)))
>> (if (equal (treesit-node-type node) "embedded_code")
>> bison-ts-mode-embedded-language
>> 'bison)))
>>
>> (defun bison-ts-mode--font-lock-settings (language)
>> "Return the font-lock settings for Bison.
>> LANGUAGE should be set to \\='bison."
>> (treesit-font-lock-rules
>> :language language
>> :feature 'comment
>> '((comment) @font-lock-comment-face)
>>
>> :language language
>> :feature 'declaration
>> '((declaration_name) @font-lock-keyword-face)
>>
>> :language language
>> :feature 'type
>> '((type) @font-lock-type-face)
>>
>> :language language
>> :feature 'variable
>> '((grammar_rule_identifier) @font-lock-variable-use-face)
>>
>> :language language
>> :feature 'grammar-declaration
>> '((grammar_rule (grammar_rule_declaration)
>> @font-lock-variable-use-face))
>>
>> :language language
>> :feature 'string
>> :override t
>> '((string) @font-lock-string-face)
>>
>> :language language
>> :feature 'literal
>> :override t
>> '((char_literal) @font-lock-keyword-face
>> (number_literal) @font-lock-number-face)
>>
>> :language language
>> :feature 'directive-grammar-rule
>> :override t
>> '((grammar_rule (directive) @font-lock-keyword-face))
>>
>> :language language
>> :feature 'operator
>> :override t
>> '(["|"] @font-lock-operator-face)
>>
>> :language language
>> :feature 'delimiter
>> :override t
>> '([";"] @font-lock-delimiter-face)))
>>
>>
>> (defvar bison-ts-mode--font-lock-feature-list
>
> I am not that familiar with the tree-sitter stuff, but would it be
> possible to use `defconst' here?
>
I made it a defconst, but I'm not sure. Maybe some people would want to
modify it?
>> '(( comment declaration grammar-declaration)
>> ( type string directive-grammar-rule)
>> ( literal)
>> ( variable operator delimiter)))
>>
>>
>> (defun bison-ts-mode--bison-matcher-action (root-name)
>> "Treesit matcher to check if NODE at BOL is located in an action node.
>> ROOT-NAME is the highest-level node of the embedded language."
>> (lambda (node _parent bol &rest _)
>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>> (let ((bison-node (treesit-node-at bol 'bison)))
>> (equal
>> (treesit-node-type
>> (treesit-node-parent (treesit-node-parent bison-node)))
>> "action")))))
>>
>> (defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
>> "Treesit matcher to check if NODE at BOL is not located in the epilogue.
>> ROOT-NAME is the highest-level node of the embedded language."
>> (lambda (node _parent bol &rest _)
>> (if (equal (treesit-node-type (treesit-node-parent node)) root-name)
>> (let ((bison-node (treesit-node-at bol 'bison)))
>> (not (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue"))))))
>>
>>
>> (defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
>> "Get the parent of the bison node at BOL."
>> (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
>>
>>
>> (defun bison-ts-mode--indent-rules ()
>> "Indent rules supported by `bison-ts-mode'."
>> (let*
>> ((common
>> `(((node-is "^declaration$")
>> column-0 0)
>> ((and (parent-is "^declaration$")
>> (not (node-is "^code_block$")))
>> column-0 2)
>> ((and (parent-is "comment") c-ts-common-looking-at-star)
>> c-ts-common-comment-start-after-first-star -1)
>> (c-ts-common-comment-2nd-line-matcher
>> c-ts-common-comment-2nd-line-anchor
>> 1)
>> ((parent-is "comment") prev-adaptive-prefix 0)
>>
>> ;; Opening and closing brackets "{}" of declarations
>> ((and (parent-is "^declaration$")
>> (node-is "^code_block$"))
>> column-0 0)
>> ((and (n-p-gp "}" "" "^declaration$"))
>> column-0 0)
>> ((parent-is "^declaration$") parent 2)
>> ((node-is "^grammar_rule$") column-0 0)
>> ((and
>> (parent-is "^grammar_rule$")
>> (node-is ";"))
>> column-0 bison-ts-mode-indent-offset)
>> ((and (parent-is "^grammar_rule$")
>> (node-is "|"))
>> column-0 bison-ts-mode-indent-offset)
>> ((and (parent-is "^grammar_rule$")
>> (not (node-is "^grammar_rule_declaration$"))
>> (not (node-is "^action$")))
>> column-0 ,(+ bison-ts-mode-indent-offset 2))
>> ((or
>> (node-is "^action$")
>> (node-is "^}$"))
>> column-0 12)
>> ;; Set '%%' at the beginning of the line
>> ((or
>> (and (parent-is "^grammar_rules_section$")
>> (node-is "%%"))
>> (node-is "^grammar_rules_section$"))
>> column-0 0)
>> (no-node parent-bol 0))))
>> `((bison . ,common)
>> ;; Import and override embedded languages rules to add an offset
>> ,(pcase bison-ts-mode-embedded-language
>> ('c `(c
>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>> column-0 ,bison-ts-mode-indent-offset)
>> ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
>> ('cpp `(cpp
>> ((bison-ts-mode--bison-matcher-action "translation_unit")
>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
>> parent-0 ,bison-ts-mode-indent-offset)
>> ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
>> ('java `(java
>> ((bison-ts-mode--bison-matcher-action "program")
>> bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> ((bison-ts-mode--bison-matcher-not-epilogue "program")
>> column-0 ,bison-ts-mode-indent-offset)
>> ,@java-ts-mode--indent-rules))))))
>>
>>
>> (define-derived-mode bison-ts-mode prog-mode "Bison"
>> "A major-mode for Bison based on tree-sitter."
>> (when (treesit-ready-p 'bison)
>> (when (not bison-ts-mode-embedded-language)
>
> Or `unless'
>
>> (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
>>
>> ;; Require only if needed, to avoid warnings if a grammar is not
>> ;; installed but not used.
>> (pcase bison-ts-mode-embedded-language
>> ('c (require 'c-ts-mode))
>> ('cpp (require 'c-ts-mode))
>> ('java (require 'java-ts-mode)))
>>
>> (setq-local treesit-font-lock-settings
>> (append (bison-ts-mode--font-lock-settings 'bison)
>> (pcase bison-ts-mode-embedded-language
>> ('c (c-ts-mode--font-lock-settings 'c))
>> ('cpp (c-ts-mode--font-lock-settings 'cpp))
>> ('java java-ts-mode--font-lock-settings))))
>>
>> (setq-local treesit-font-lock-feature-list
>> (if bison-ts-mode-embedded-language
>> (bison-ts-mode--merge-feature-lists
>> bison-ts-mode--font-lock-feature-list
>> (pcase bison-ts-mode-embedded-language
>> ('c c-ts-mode--feature-list)
>> ('cpp c-ts-mode--feature-list)
>> ('java java-ts-mode--feature-list)))
>> bison-ts-mode--font-lock-feature-list))
>>
>> (setq-local treesit-simple-imenu-settings
>> `(("Grammar"
>> "\\`grammar_rule_declaration\\'"
>> nil
>> (lambda (node) (substring-no-properties (treesit-node-text node))))))
>
> The function `treesit-node-text' appears to take an optional NO-PROPERTY argument.
>
>>
>> (c-ts-common-comment-setup)
>>
>> (setq-local treesit-simple-indent-rules
>> (bison-ts-mode--indent-rules))
>>
>> (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
>>
>> (when bison-ts-mode-embedded-language
>> (setq-local treesit-range-settings
>> (treesit-range-rules
>> :embed bison-ts-mode-embedded-language
>> :host 'bison
>> :local t
>> '((embedded_code) @capture))))
>>
>> (treesit-major-mode-setup)))
>>
>> (provide 'bison-ts-mode)
>> ;;; bison-ts-mode.el ends here
I've also fixed a few issues and replaced some "if" forms with "when".
Sorry, you'll receive this message twice: I misclicked and sent the
previous message only to you.
(I also fixed an issue in the meantime.)
[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 12888 bytes --]
;;; bison-ts-mode.el --- Tree-sitter support for Bison -*- lexical-binding: t; -*-
;; Copyright (C) 2023 Free Software Foundation, Inc.
;; Author : Augustin Chéneau <btuin@mailo.com>
;; Keywords : bison yacc languages tree-sitter
;; This file is part of GNU Emacs.
;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
;;; Commentary:
;; This is a tree-sitter based major mode for Bison and Yacc, two parser
;; generators. The grammar used is available here:
;; https://gitlab.com/btuin2/tree-sitter-bison
;;; Code:
(require 'treesit)
(require 'c-ts-common)
(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")
(defgroup bison nil
"Support for Bison and Yacc using tree-sitter."
:group 'languages)
(defcustom bison-ts-mode-indent-offset 2
"Number of spaces for each indentation step in `bison-ts-mode'.
It has no effect in the epilogue part of the file."
:version "30.1"
:type 'integer
:safe 'integerp)
(defcustom bison-ts-mode-autodetect-language t
"Search for a %language directive in the file at initialization.
Changing the value of this directive in the file requires reloading the mode
to take effect. If `bison-ts-mode-embedded-language' is set by a file-local
variable, the auto-detection is not run."
:version "30.1"
:type 'boolean
:safe 'booleanp)
(defvar-local bison-ts-mode-embedded-language nil
"Embedded language in Bison buffer.
Supported values are `c', `cpp', and `java'.")
;;;###autoload
(put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)
(defun bison-ts-mode--merge-feature-lists (l1 l2)
"Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
(let ((res ()))
(while (or l1 l2)
(push (seq-uniq (append (car l1) (car l2)) #'eq) res)
(setq l1 (cdr l1) l2 (cdr l2)))
(nreverse res)))
(defun bison-ts-mode--find-language-in-buffer (&optional buffer)
"Find and return the language set by the Bison directive %language.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(with-current-buffer (or buffer (current-buffer))
(save-excursion
(goto-char (point-min))
(when
(re-search-forward
(rx
;; Only look for the directive at the beginning of a line. This may
;; not be strictly required by Bison, but in practice directives
;; always start on a new line.
bol
(0+ blank)
"%language"
(0+ blank)
"\""
;; Bison supports C, C++, Java, and D. They can be capitalized or not.
(group (1+ (in alpha "+")))
"\"")
nil
t)
;; Read the match data only if the search succeeded.
(match-string-no-properties 1)))))
(defun bison-ts-mode--detect-language (&optional buffer)
"Detect the embedded language in a Bison buffer.
Known languages are C, C++, D, and Java, but D is not supported as there is
no support for tree-sitter D in Emacs yet.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
(pcase (downcase str)
("c" 'c)
("c++" 'cpp)
("d" (message "D language not yet supported") nil)
("java" 'java)
(_ (message "%%language specification \"%s\" is invalid, defaulting to C" str) 'c))))
(defun bison-ts-mode--language-at-point-function (position)
"Return the language at POSITION."
(let ((node (treesit-node-at position 'bison)))
(if (equal (treesit-node-type node) "embedded_code")
bison-ts-mode-embedded-language
'bison)))
(defun bison-ts-mode--font-lock-settings (language)
"Return the font-lock settings for Bison.
LANGUAGE should be set to \\='bison."
(treesit-font-lock-rules
:language language
:feature 'comment
'((comment) @font-lock-comment-face)
:language language
:feature 'declaration
'((declaration_name) @font-lock-keyword-face)
:language language
:feature 'type
'((type) @font-lock-type-face)
:language language
:feature 'variable
'((grammar_rule_identifier) @font-lock-variable-use-face)
:language language
:feature 'grammar-declaration
'((grammar_rule (grammar_rule_declaration)
@font-lock-variable-use-face))
:language language
:feature 'string
:override t
'((string) @font-lock-string-face)
:language language
:feature 'literal
:override t
'((char_literal) @font-lock-keyword-face
(number_literal) @font-lock-number-face)
:language language
:feature 'directive-grammar-rule
:override t
'((grammar_rule (directive) @font-lock-keyword-face))
:language language
:feature 'operator
:override t
'(["|"] @font-lock-operator-face)
:language language
:feature 'delimiter
:override t
'([";"] @font-lock-delimiter-face)))
(defconst bison-ts-mode--font-lock-feature-list
'(( comment declaration grammar-declaration)
( type string directive-grammar-rule)
( literal)
( variable operator delimiter)))
(defun bison-ts-mode--bison-matcher-action (root-name)
"Treesit matcher to check if NODE at BOL is located in an action node.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(when (equal (treesit-node-type (treesit-node-parent node)) root-name)
(let ((bison-node (treesit-node-at bol 'bison)))
(equal
(treesit-node-type
(treesit-node-parent (treesit-node-parent bison-node)))
"action")))))
(defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
"Treesit matcher to check if NODE at BOL is not located in the epilogue.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(when (equal (treesit-node-type (treesit-node-parent node)) root-name)
(let ((bison-node (treesit-node-at bol 'bison)))
(not (equal (treesit-node-type (treesit-node-parent bison-node)) "epilogue"))))))
(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
"Get the parent of the bison node at BOL."
(treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
(defun bison-ts-mode--indent-rules ()
"Indent rules supported by `bison-ts-mode'."
(let*
((common
`(((node-is "^declaration$")
column-0 0)
((and (parent-is "^declaration$")
(not (node-is "^code_block$")))
column-0 2)
((and (parent-is "comment") c-ts-common-looking-at-star)
c-ts-common-comment-start-after-first-star -1)
(c-ts-common-comment-2nd-line-matcher
c-ts-common-comment-2nd-line-anchor
1)
((parent-is "comment") prev-adaptive-prefix 0)
;; Opening and closing brackets "{}" of declarations
((and (parent-is "^declaration$")
(node-is "^code_block$"))
column-0 0)
((and (n-p-gp "}" "" "^declaration$"))
column-0 0)
((parent-is "^declaration$") parent 2)
((node-is "^grammar_rule$") column-0 0)
((and
(parent-is "^grammar_rule$")
(node-is ";"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(node-is "|"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(not (node-is "^grammar_rule_declaration$"))
(not (node-is "^action$")))
column-0 ,(+ bison-ts-mode-indent-offset 2))
((or
(node-is "^action$")
(node-is "^}$"))
column-0 12)
;; Set '%%' at the beginning of the line
((or
(and (parent-is "^grammar_rules_section$")
(node-is "%%"))
(node-is "^grammar_rules_section$"))
column-0 0)
(no-node parent-bol 0))))
`((bison . ,common)
;; Import and override embedded languages rules to add an offset
,(pcase bison-ts-mode-embedded-language
('c `(c
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
('cpp `(cpp
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
('java `(java
((bison-ts-mode--bison-matcher-action "program")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "program")
column-0 ,bison-ts-mode-indent-offset)
,@java-ts-mode--indent-rules))))))
(define-derived-mode bison-ts-mode prog-mode "Bison"
"A major-mode for Bison based on tree-sitter."
(when (treesit-ready-p 'bison)
(unless bison-ts-mode-embedded-language
(setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
;; Require only if needed, to avoid warnings if a grammar is not
;; installed but not used.
(pcase bison-ts-mode-embedded-language
('c (require 'c-ts-mode))
('cpp (require 'c-ts-mode))
('java (require 'java-ts-mode)))
(setq-local treesit-font-lock-settings
(append (bison-ts-mode--font-lock-settings 'bison)
(pcase bison-ts-mode-embedded-language
('c (c-ts-mode--font-lock-settings 'c))
('cpp (c-ts-mode--font-lock-settings 'cpp))
('java java-ts-mode--font-lock-settings))))
(setq-local treesit-font-lock-feature-list
(if bison-ts-mode-embedded-language
(bison-ts-mode--merge-feature-lists
bison-ts-mode--font-lock-feature-list
(pcase bison-ts-mode-embedded-language
('c c-ts-mode--feature-list)
('cpp c-ts-mode--feature-list)
('java java-ts-mode--feature-list)))
bison-ts-mode--font-lock-feature-list))
(setq-local treesit-simple-imenu-settings
`(("Grammar"
"\\`grammar_rule_declaration\\'"
nil
(lambda (node) (treesit-node-text node t)))))
(c-ts-common-comment-setup)
(setq-local treesit-simple-indent-rules
(bison-ts-mode--indent-rules))
(setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
(if bison-ts-mode-embedded-language
(setq-local treesit-range-settings
(treesit-range-rules
:embed bison-ts-mode-embedded-language
:host 'bison
:local t
'((embedded_code) @capture)))
(treesit-parser-create 'bison))
(treesit-major-mode-setup)))
(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here
* Re: New tree-sitter mode: bison-ts-mode
2023-09-22 5:52 ` Eli Zaretskii
@ 2023-09-22 23:44 ` Yuan Fu
2023-09-23 5:52 ` Eli Zaretskii
0 siblings, 1 reply; 17+ messages in thread
From: Yuan Fu @ 2023-09-22 23:44 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: "Augustin Chéneau (BTuin)", emacs-devel
> On Sep 21, 2023, at 10:52 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> Date: Thu, 21 Sep 2023 22:15:45 +0200
>> From: Augustin Chéneau (BTuin) <btuin@mailo.com>
>>
>> I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
>> This is a major mode for GNU Bison grammar files, and it supports the
>> embedded languages C, C++ and Java. Bison also supports D, but this
>> language is not yet supported in Emacs so this mode does not support it
>> either.
>>
>> Since Bison is a superset of Yacc, it should work with Yacc without
>> modification.
>> The grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>
>
> Thanks.
>
>> In addition of the mode, three patches are joined (should I open a bug
>> report for those?).
>>
>>
>> The first one is to add some missing rules in c-ts-mode. Currently
>> there is no rule if an identifier in a declaration is on a new line.
>> In these two cases, "variable_name" is not indented:
>>
>> static myttype *
>> variable_name;
>>
>> static myttype
>> variable_name;
>>
>> The new behavior is the same as c-mode.
>
> I think this patch should be installed on the emacs-29 branch. (It is
> a very small change, so we can install it even before the legal
> paperwork is completed.) Yuan, any objections or comments to this
> one?
LGTM. How do you write the assignment exempt cookie? Is it PAPERWORK-EXEMPT: yes?
Yuan
* Re: New tree-sitter mode: bison-ts-mode
2023-09-22 23:44 ` Yuan Fu
@ 2023-09-23 5:52 ` Eli Zaretskii
2023-09-26 3:42 ` Yuan Fu
0 siblings, 1 reply; 17+ messages in thread
From: Eli Zaretskii @ 2023-09-23 5:52 UTC (permalink / raw)
To: Yuan Fu; +Cc: btuin, emacs-devel
> From: Yuan Fu <casouri@gmail.com>
> Date: Fri, 22 Sep 2023 16:44:35 -0700
> Cc: "Augustin Chéneau (BTuin)" <btuin@mailo.com>,
> emacs-devel@gnu.org
>
> > I think this patch should be installed on the emacs-29 branch. (It is
> > a very small change, so we can install it even before the legal
> > paperwork is completed.) Yuan, any objections or comments to this
> > one?
>
> LGTM. How do you write the assignment exempt cookie? Is it PAPERWORK-EXEMPT: yes?
Copyright-paperwork-exempt: yes
(It's in CONTRIBUTE, in case you forget it.)
* Re: New tree-sitter mode: bison-ts-mode
2023-09-21 20:15 New tree-sitter mode: bison-ts-mode Augustin Chéneau (BTuin)
` (4 preceding siblings ...)
2023-09-22 8:45 ` Yuan Fu
@ 2023-09-24 21:10 ` Yuan Fu
2023-09-26 11:52 ` Augustin Chéneau (BTuin)
5 siblings, 1 reply; 17+ messages in thread
From: Yuan Fu @ 2023-09-24 21:10 UTC (permalink / raw)
To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel
> On Sep 21, 2023, at 1:15 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>
> Hello,
> I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
> This is a major mode for GNU Bison grammar files, and it supports the embedded languages C, C++ and Java. Bison also supports D, but this language is not yet supported in Emacs so this mode does not support it either.
>
> Since Bison is a superset of Yacc, it should work with Yacc without modification.
> The grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>
>
> This mode auto-detects the embedded language in the file if the directive "%language" is set.
>
>
> In addition to the mode, three patches are attached (should I open a bug report for those?).
>
>
> The first one is to add some missing rules in c-ts-mode. Currently there is no rule if an identifier in a declaration is on a new line.
> In these two cases, "variable_name" is not indented:
>
> static myttype *
> variable_name;
>
> static myttype
> variable_name;
>
> The new behavior is the same as c-mode.
Actually, on second thought: IIUC the reason for this change is that without these rules, no indentation is done for variable names in code like

static myttype *
variable_name;

static myttype
variable_name;

This is fine in normal C code, since no indentation is the same as
zero indentation, which is what we want. But for C embedded in Bison,
no indentation is not the same as zero indentation. We want it to indent
to the top-level indentation of the embedded C. E.g.,

{
  static myttype
  variable_name;
}

rather than

{
  static myttype
variable_name;
}

Right? In that case, we should really add a fallback indent rule for the embedded C in Bison, such that it indents to the top level of the embedded C. This is more robust than trying to cover all cases in the C rules. Does that make sense?

Basically, add something like

(catch-all bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)

to the end of C’s indent rules.
Yuan
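
For illustration, here is a minimal sketch of where such a fallback would sit in a rule list of the shape used by `treesit-simple-indent-rules'. `catch-all' is one of the matcher presets of treesit's simple indentation engine. Every name starting with `my-' is a hypothetical stand-in rather than something from the proposed mode, and the anchor function assumes the buffer already has a `bison' parser.

```elisp
(require 'treesit)

;; Hypothetical stand-in for the mode's offset option.
(defvar my-embedded-indent-offset 2
  "Indentation step applied to embedded code (illustrative).")

(defun my-host-parent-start (_node _parent bol &rest _)
  "Return the start of the parent of the host-language node at BOL.
Illustrative anchor; assumes a `bison' parser exists in the buffer."
  (treesit-node-start
   (treesit-node-parent (treesit-node-at bol 'bison))))

(defvar my-embedded-c-indent-rules
  `((c
     ;; Language-specific rules come first: the first rule whose
     ;; matcher succeeds is used, so order matters.
     ((node-is "}") parent-bol 0)
     ;; Fallback: anything no earlier rule matched is indented
     ;; relative to the enclosing host-language node.
     (catch-all my-host-parent-start ,my-embedded-indent-offset)))
  "Illustrative rules in the format of `treesit-simple-indent-rules'.")
```

Because the fallback comes last, it only fires for nodes that none of the regular C rules handle, which is the case described above.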
* Re: New tree-sitter mode: bison-ts-mode
2023-09-23 5:52 ` Eli Zaretskii
@ 2023-09-26 3:42 ` Yuan Fu
0 siblings, 0 replies; 17+ messages in thread
From: Yuan Fu @ 2023-09-26 3:42 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: "Augustin Chéneau (BTuin)", emacs-devel
> On Sep 22, 2023, at 10:52 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Yuan Fu <casouri@gmail.com>
>> Date: Fri, 22 Sep 2023 16:44:35 -0700
>> Cc: "Augustin Chéneau (BTuin)" <btuin@mailo.com>,
>> emacs-devel@gnu.org
>>
>>> I think this patch should be installed on the emacs-29 branch. (It is
>>> a very small change, so we can install it even before the legal
>>> paperwork is completed.) Yuan, any objections or comments to this
>>> one?
>>
>> LGTM. How do you write the assignment exempt cookie? Is it PAPERWORK-EXEMPT: yes?
>
> Copyright-paperwork-exempt: yes
>
> (It's in CONTRIBUTE, in case you forget it.)
Thanks! I applied the patch.
Yuan
* Re: New tree-sitter mode: bison-ts-mode
2023-09-24 21:10 ` Yuan Fu
@ 2023-09-26 11:52 ` Augustin Chéneau (BTuin)
2023-09-28 7:03 ` Yuan Fu
0 siblings, 1 reply; 17+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-26 11:52 UTC (permalink / raw)
To: Yuan Fu; +Cc: emacs-devel
[-- Attachment #1: Type: text/plain, Size: 1126 bytes --]
On 24/09/2023 at 23:10, Yuan Fu wrote:
> Actually, on second thought: IIUC the reason for this change is that without these rules, no indentation is done for variable names in code like
>
> static myttype *
> variable_name;
>
> static myttype
> variable_name;
>
> This is fine in normal C code, since no indentation is the same as
> zero indentation, which is what we want. But for C embedded in Bison,
> no indentation is not the same as zero indentation. We want it to indent
> to the top-level indentation of the embedded C. E.g.,
>
> {
>   static myttype
>   variable_name;
> }
>
> rather than
>
> {
>   static myttype
> variable_name;
> }
>
> Right? In that case, we should really add a fallback indent rule for the embedded C in Bison, such that it indents to the top level of the embedded C. This is more robust than trying to cover all cases in the C rules. Does that make sense?
>
> Basically, add something like
>
> (catch-all bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>
> to the end of C’s indent rules.
>
> Yuan
It does make sense, thanks. I updated bison-ts-mode to do that.
[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 13950 bytes --]
;;; bison-ts-mode.el --- tree-sitter support for Bison -*- lexical-binding: t; -*-
;; Copyright (C) 2023 Free Software Foundation, Inc.
;; Author : Augustin Chéneau <btuin@mailo.com>
;; Keywords : bison yacc languages tree-sitter
;; This file is part of GNU Emacs.
;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.
;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
;; GNU General Public License for more details.
;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs. If not, see <https://www.gnu.org/licenses/>.
;;; Commentary:
;; This is a mode based on tree-sitter for Bison and Yacc files, tools to
;; generate parsers. The grammar used is available here:
;; https://gitlab.com/btuin2/tree-sitter-bison
;;; Code:
(require 'treesit)
(require 'c-ts-common)
(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")
(defgroup bison nil
"Support for Bison and Yacc using tree-sitter."
:group 'languages)
(defcustom bison-ts-mode-indent-offset 2
"Number of spaces for each indentation step in `bison-ts-mode'.
It has no effect in the epilogue part of the file."
:version "30.1"
:type 'integer
:safe 'integerp)
(defcustom bison-ts-mode-autodetect-language t
"Search for a %language directive in the file at initialization.
Changing the value of this directive in the file requires to reload the mode to
be effective. If `bison-ts-mode-buffer-language' is set by a file-local
variable, the auto-detection is not run."
:version "30.1"
:type 'boolean
:safe 'boolean)
(defvar-local bison-ts-mode-embedded-language nil
"Embedded language in Bison buffer.
Supported values are `c', `cpp', and `java'.")
;;;###autoload
(put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)
(defun bison-ts-mode--merge-feature-lists (l1 l2)
"Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
(let ((res ()))
(while (or l1 l2)
(setq res (push (seq-uniq (append (car l1) (car l2)) 'eq) res))
(setq l1 (cdr l1) l2 (cdr l2)))
(nreverse res)))
(defun bison-ts-mode--find-language-in-buffer (&optional buffer)
"Find and return the language set by the Bison directive %language.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(save-excursion
(with-current-buffer (or buffer (current-buffer))
(goto-char (point-min))
(when
(re-search-forward
(rx
;; Only check for a language directive at the beginning of a line
;; I'm not entirely sure it is actually mandatory, but in practice
;; directives are always on a new line.
bol
(0+ blank)
"%language"
(0+ blank)
"\""
;; Bison supports C, C++, Java, and D. They can be capitalized or not.
(group (1+ (in alpha "+")))
"\"")
nil
t)))
(match-string-no-properties 1)))
(defun bison-ts-mode--detect-language (&optional buffer)
"Detect the embedded language in a Bison buffer.
Known languages are C, C++, D, and Java, but D is not supported as there is
no support for tree-sitter D in Emacs yet.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
(if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
(pcase (downcase str)
("c" 'c)
("c++" 'cpp)
("d" (message "D language not yet supported") nil)
("java" 'java)
(_ (message "%%language specification \"%s\" is invalid, defaulting to C" str) 'c))))
(defun bison-ts-mode--language-at-point-function (position)
"Return the language at POSITION."
(let ((node (treesit-node-at position 'bison)))
(if (equal (treesit-node-type node) "embedded_code")
bison-ts-mode-embedded-language
'bison)))
(defun bison-ts-mode--font-lock-settings (language)
"Return the font-lock settings for Bison.
LANGUAGE should be set to \\='bison."
(treesit-font-lock-rules
:language language
:feature 'comment
'((comment) @font-lock-comment-face)
:language language
:feature 'declaration
'((declaration_name) @font-lock-keyword-face)
:language language
:feature 'type
'((type) @font-lock-type-face)
:language language
:feature 'variable
'((grammar_rule_identifier) @font-lock-variable-use-face)
:language language
:feature 'grammar-declaration
'((grammar_rule (grammar_rule_declaration)
@font-lock-variable-use-face))
:language language
:feature 'string
:override t
'((string) @font-lock-string-face)
:language language
:feature 'literal
:override t
'((char_literal) @font-lock-keyword-face
(number_literal) @font-lock-number-face)
:language language
:feature 'directive-grammar-rule
:override t
'((grammar_rule (directive) @font-lock-keyword-face))
:language language
:feature 'operator
:override t
'(["|"] @font-lock-operator-face)
:language language
:feature 'delimiter
:override t
'([";"] @font-lock-delimiter-face)))
(defconst bison-ts-mode--font-lock-feature-list
'(( comment declaration grammar-declaration)
( type string directive-grammar-rule)
( literal)
( variable operator delimiter)))
(defun bison-ts-mode--inside-p (type bol)
"Check if node at BOL is contained inside a Bison TYPE node."
(treesit-parent-until
(treesit-node-at bol 'bison)
(lambda (node) (equal (treesit-node-type node) type))))
(defun bison-ts-mode--catch-all (type)
"Treesit matcher to catch all nodes inside a Bison TYPE node."
(lambda (_node _parent bol &rest _)
(bison-ts-mode--inside-p type bol)))
(defun bison-ts-mode--bison-matcher-action (root-name)
"Treesit matcher to check if NODE at BOL is located in an action node.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(when (equal (treesit-node-type (treesit-node-parent node)) root-name)
(bison-ts-mode--inside-p "action" bol))))
(defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
"Treesit matcher to check if NODE at BOL is not located in the epilogue.
ROOT-NAME is the highest-level node of the embedded language."
(lambda (node _parent bol &rest _)
(when (equal (treesit-node-type (treesit-node-parent node)) root-name)
(not (bison-ts-mode--inside-p "epilogue" bol)))))
(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
"Get the parent of the bison node at BOL."
(treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))
(defun bison-ts-mode--indent-rules ()
"Indent rules supported by `bison-ts-mode'."
(let
((common
`(((node-is "^declaration$")
column-0 0)
((and (parent-is "^declaration$")
(not (node-is "^code_block$")))
column-0 2)
((and (parent-is "comment") (c-ts-common-looking-at ?*))
c-ts-common-comment-start-after-first-star -1)
((and (parent-is "comment") (c-ts-common-looking-at ?| ?\\ ?`) )
c-ts-common-comment-start-after-first-star -2)
(c-ts-common-comment-2nd-line-matcher
c-ts-common-comment-2nd-line-anchor
1)
((parent-is "comment") prev-adaptive-prefix 0)
;; Opening and closing brackets "{}" of declarations
((and (parent-is "^declaration$")
(node-is "^code_block$"))
column-0 0)
((and (n-p-gp "}" "" "^declaration$"))
column-0 0)
((parent-is "^declaration$") parent 2)
((node-is "^grammar_rule$") column-0 0)
((and
(parent-is "^grammar_rule$")
(node-is ";"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(node-is "|"))
column-0 bison-ts-mode-indent-offset)
((and (parent-is "^grammar_rule$")
(not (node-is "^grammar_rule_declaration$"))
(not (node-is "^action$")))
column-0 ,(+ bison-ts-mode-indent-offset 2))
((or
(node-is "^action$")
(node-is "^}$"))
column-0 12)
;; Set '%%' at the beginning of the line
((or
(and (parent-is "^grammar_rules_section$")
(node-is "%%"))
(node-is "^grammar_rules_section$"))
column-0 0)
(no-node parent-bol 0))))
`((bison . ,common)
;; Import and override embedded languages rules to add an offset
,(pcase bison-ts-mode-embedded-language
('c `(c
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'c (c-ts-mode--get-indent-style 'c))
((bison-ts-mode--catch-all "action")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((not (bison-ts-mode--catch-all "epilogue"))
column-0 ,bison-ts-mode-indent-offset)))
('cpp `(cpp
((bison-ts-mode--bison-matcher-action "translation_unit")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
column-0 ,bison-ts-mode-indent-offset)
,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))
((bison-ts-mode--catch-all "action")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((not (bison-ts-mode--catch-all "epilogue"))
column-0 ,bison-ts-mode-indent-offset)))
('java `(java
((bison-ts-mode--bison-matcher-action "program")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((bison-ts-mode--bison-matcher-not-epilogue "program")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
,@java-ts-mode--indent-rules
((bison-ts-mode--catch-all "action")
bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
((not (bison-ts-mode--catch-all "epilogue"))
column-0 ,bison-ts-mode-indent-offset)))))))
(define-derived-mode bison-ts-mode prog-mode "Bison"
"A major-mode for Bison based on tree-sitter."
(when (treesit-ready-p 'bison)
(unless bison-ts-mode-embedded-language
(setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))
;; Require only if needed, to avoid warnings about a grammar that is
;; neither installed nor used.
(pcase bison-ts-mode-embedded-language
('c (require 'c-ts-mode))
('cpp (require 'c-ts-mode))
('java (require 'java-ts-mode)))
(setq-local treesit-font-lock-settings
(append (bison-ts-mode--font-lock-settings 'bison)
(pcase bison-ts-mode-embedded-language
('c (c-ts-mode--font-lock-settings 'c))
('cpp (c-ts-mode--font-lock-settings 'cpp))
('java java-ts-mode--font-lock-settings))))
(setq-local treesit-font-lock-feature-list
(if bison-ts-mode-embedded-language
(bison-ts-mode--merge-feature-lists
bison-ts-mode--font-lock-feature-list
(pcase bison-ts-mode-embedded-language
('c c-ts-mode--feature-list)
('cpp c-ts-mode--feature-list)
('java java-ts-mode--feature-list)))
bison-ts-mode--font-lock-feature-list))
(setq-local treesit-simple-imenu-settings
`(("Grammar"
"\\`grammar_rule_declaration\\'"
nil
(lambda (node) (treesit-node-text node t)))))
(c-ts-common-comment-setup)
(setq-local treesit-simple-indent-rules
(bison-ts-mode--indent-rules))
(setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)
(if bison-ts-mode-embedded-language
(setq-local treesit-range-settings
(treesit-range-rules
:embed bison-ts-mode-embedded-language
:host 'bison
:local t
'((embedded_code) @capture)))
(treesit-parser-create 'bison))
(treesit-major-mode-setup)))
(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here
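
As a worked example of the feature-list merging the file relies on, the sketch below merges two font-lock feature lists level by level. `my-merge-feature-lists' is a hypothetical stand-alone helper written for this illustration (it follows the same idea as `bison-ts-mode--merge-feature-lists'), and the feature names in the sample call are made up.

```elisp
(require 'seq)

(defun my-merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 level by level.
Hypothetical helper; L1 and L2 may have different lengths."
  (let (res)
    (while (or l1 l2)
      ;; Concatenate the two sublists at this level and drop duplicates,
      ;; keeping the first occurrence of each symbol.
      (push (seq-uniq (append (car l1) (car l2)) #'eq) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

(my-merge-feature-lists '((comment) (type string))
                        '((comment definition) (keyword) (operator)))
;; => ((comment definition) (type string keyword) (operator))
```

Each position in the result corresponds to one font-lock decoration level, so the host and embedded features enabled at a given level end up in the same sublist.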
* Re: New tree-sitter mode: bison-ts-mode
2023-09-26 11:52 ` Augustin Chéneau (BTuin)
@ 2023-09-28 7:03 ` Yuan Fu
[not found] ` <b999a251-1778-49ac-90dc-ef8d78d36d53@mailo.com>
2023-09-29 14:13 ` Eli Zaretskii
0 siblings, 2 replies; 17+ messages in thread
From: Yuan Fu @ 2023-09-28 7:03 UTC (permalink / raw)
To: "Augustin Chéneau (BTuin)"
Cc: emacs-devel, Eli Zaretskii, Stefan Kangas
[-- Attachment #1: Type: text/plain, Size: 1415 bytes --]
> On Sep 26, 2023, at 4:52 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>
> On 24/09/2023 at 23:10, Yuan Fu wrote:
>> Actually, on second thought: IIUC the reason for this change is that without these rules, no indentation is done for variable names in code like
>> static myttype *
>> variable_name;
>> static myttype
>> variable_name;
>> This is fine in normal C code, since no indentation is the same as
>> zero indentation, which is what we want. But for C embedded in Bison,
>> no indentation is not the same as zero indentation. We want it to indent
>> to the top-level indentation of the embedded C. E.g.,
>> {
>>   static myttype
>>   variable_name;
>> }
>> rather than
>> {
>>   static myttype
>> variable_name;
>> }
>> Right? In that case, we should really add a fallback indent rule for the embedded C in Bison, such that it indents to the top level of the embedded C. This is more robust than trying to cover all cases in the C rules. Does that make sense?
>> Basically, add something like
>> (catch-all bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
>> to the end of C’s indent rules.
>> Yuan
>
> It does make sense, thanks. I updated bison-ts-mode to do that.
Thanks. LGTM except some minor byte-compile warnings. I attached a diff, please see if you like those changes.
Once the paperwork is done I think we are good to go.
Yuan
[-- Attachment #2: bison-comment.diff --]
[-- Type: application/octet-stream, Size: 1582 bytes --]
diff --git a/bison-ts-mode.el b/bison-ts-mode.el.1
index febd2d9..913c1fa 100644
--- a/bison-ts-mode.el
+++ b/bison-ts-mode.el.1
@@ -30,6 +30,8 @@
(require 'treesit)
(require 'c-ts-common)
+(require 'c-ts-mode) ; For ‘c-ts-mode--get-indent-style’.
+(require 'java-ts-mode) ; For ‘java-ts-mode--indent-rules’.
(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
@@ -62,9 +64,10 @@ It has no effect in the epilogue part of the file."
(defcustom bison-ts-mode-autodetect-language t
"Search for a %language directive in the file at initialization.
-Changing the value of this directive in the file requires to reload the mode to
-be effective. If `bison-ts-mode-buffer-language' is set by a file-local
- variable, the auto-detection is not run."
+Changing the value of this directive in the file requires to
+reload the mode to be effective. If
+`bison-ts-mode-buffer-language' is set by a file-local variable,
+the auto-detection is not run."
:version "30.1"
:type 'boolean
:safe 'boolean)
@@ -108,8 +111,8 @@ buffer."
(group (1+ (in alpha "+")))
"\"")
nil
- t)))
- (match-string-no-properties 1)))
+ t)
+ (match-string-no-properties 1)))))
(defun bison-ts-mode--detect-language (&optional buffer)
@@ -366,4 +369,5 @@ ROOT-NAME is the highest-level node of the embedded language."
(treesit-major-mode-setup)))
(provide 'bison-ts-mode)
+
;;; bison-ts-mode.el ends here
[-- Attachment #3: Type: text/plain, Size: 2 bytes --]
* Re: New tree-sitter mode: bison-ts-mode
[not found] ` <b999a251-1778-49ac-90dc-ef8d78d36d53@mailo.com>
@ 2023-09-29 1:26 ` Yuan Fu
0 siblings, 0 replies; 17+ messages in thread
From: Yuan Fu @ 2023-09-29 1:26 UTC (permalink / raw)
To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel
(It seems that you forgot to CC, I added emacs-devel back.)
> On Sep 28, 2023, at 3:41 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>
> On 28/09/2023 at 09:03, Yuan Fu wrote:
>> Thanks. LGTM except some minor byte-compile warnings. I attached a diff, please see if you like those changes.
>> Once the paperwork is done I think we are good to go.
>> Yuan
>
> diff --git a/bison-ts-mode.el b/bison-ts-mode.el.1
> index febd2d9..913c1fa 100644
> --- a/bison-ts-mode.el
> +++ b/bison-ts-mode.el.1
> @@ -30,6 +30,8 @@
>
> (require 'treesit)
> (require 'c-ts-common)
> +(require 'c-ts-mode) ; For ‘c-ts-mode--get-indent-style’.
> +(require 'java-ts-mode) ; For ‘java-ts-mode--indent-rules’.
>
> (declare-function treesit-parser-create "treesit.c")
> (declare-function treesit-induce-sparse-tree "treesit.c")
>
>
> I did that because if you don't have one of the grammars installed (for C, C++, or Java), it generates a warning about a missing grammar, such as:
>
> Warning (treesit): Cannot activate tree-sitter, because language grammar for java is unavailable (not-found)
>
> This is because of lines like (line 404 in java-ts-mode.el):
>
> (if (treesit-ready-p 'java)
> (add-to-list 'auto-mode-alist '("\\.java\\'" . java-ts-mode)))
>
> at the end of the files.
Makes sense! I would add a comment explaining that.
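
A minimal sketch of what such a commented, on-demand require could look like. `my-require-embedded-mode' is a hypothetical helper name; the library names are the ones the mode already uses, and the comment restates the reason given above.

```elisp
(defun my-require-embedded-mode (language)
  "Load the major mode that provides support for LANGUAGE, if any.
LANGUAGE is `c', `cpp', `java' or nil.  Hypothetical helper."
  ;; Loading `c-ts-mode' or `java-ts-mode' makes them check for their
  ;; grammar and warn when it is missing, so load only the library
  ;; that the current buffer actually needs.
  (pcase language
    ((or 'c 'cpp) (require 'c-ts-mode))
    ('java (require 'java-ts-mode))))
```

With LANGUAGE nil no clause matches, nothing is loaded, and the function returns nil.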
>
> By the way, shouldn't (treesit-ready-p 'java) here have its argument QUIET set to t? Why generate a warning here, since there is already a check in define-derived-mode?
>
Yeah, long story short: after considerable discussion, it was decided that (at least for Emacs 29) loading a tree-sitter major mode's file should automatically enable that major mode.
>
>
> @@ -108,8 +111,8 @@ buffer."
> (group (1+ (in alpha "+")))
> "\"")
> nil
> - t)))
> - (match-string-no-properties 1)))
> + t)
> + (match-string-no-properties 1)))))
> Thanks!
>
Yuan
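
To make the effect of the parenthesis change concrete, here is a stand-alone sketch of the directive search with the match read inside the `when' body, so nil is returned when no directive is found. `my-find-bison-language' is a hypothetical name; the regexp is the one from the attached file.

```elisp
(defun my-find-bison-language (&optional buffer)
  "Return the string given to a %language directive in BUFFER, or nil.
BUFFER defaults to the current buffer.  Hypothetical helper."
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (goto-char (point-min))
      ;; Read the match data only when the search succeeded.
      (when (re-search-forward
             (rx bol (0+ blank) "%language" (0+ blank)
                 "\"" (group (1+ (in alpha "+"))) "\"")
             nil t)
        (match-string-no-properties 1)))))

(with-temp-buffer
  (insert "%require \"3.8\"\n%language \"C++\"\n%%\n")
  (my-find-bison-language))
;; => "C++"
```

In this sketch `save-excursion' is placed inside `with-current-buffer', so point is restored in the buffer that was actually searched.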
* Re: New tree-sitter mode: bison-ts-mode
2023-09-28 7:03 ` Yuan Fu
[not found] ` <b999a251-1778-49ac-90dc-ef8d78d36d53@mailo.com>
@ 2023-09-29 14:13 ` Eli Zaretskii
1 sibling, 0 replies; 17+ messages in thread
From: Eli Zaretskii @ 2023-09-29 14:13 UTC (permalink / raw)
To: Yuan Fu; +Cc: btuin, emacs-devel, stefankangas
> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 28 Sep 2023 00:03:10 -0700
> Cc: emacs-devel <emacs-devel@gnu.org>, Eli Zaretskii <eliz@gnu.org>,
> Stefan Kangas <stefankangas@gmail.com>
>
> @@ -62,9 +64,10 @@ It has no effect in the epilogue part of the file."
>
> (defcustom bison-ts-mode-autodetect-language t
> "Search for a %language directive in the file at initialization.
> -Changing the value of this directive in the file requires to reload the mode to
> -be effective. If `bison-ts-mode-buffer-language' is set by a file-local
> - variable, the auto-detection is not run."
> +Changing the value of this directive in the file requires to
> +reload the mode to be effective. If
^^
Two spaces between sentences, please.
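
For reference, a sketch of the same option with sentences separated by two spaces, as Emacs docstring conventions ask. The `my-' names are hypothetical stand-ins, the wording is lightly adjusted, and `:safe #'booleanp' is an assumption of this sketch rather than part of the reviewed diff.

```elisp
(defcustom my-bison-autodetect-language t
  "Search for a %language directive in the file at initialization.
Changing the directive in the file requires reloading the mode to
take effect.  If `my-bison-embedded-language' is set by a
file-local variable, the auto-detection is not run."
  :type 'boolean
  :safe #'booleanp
  :group 'languages)
```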
Thread overview (17+ messages):
2023-09-21 20:15 New tree-sitter mode: bison-ts-mode Augustin Chéneau (BTuin)
2023-09-21 22:23 ` Stefan Kangas
2023-09-22 5:52 ` Eli Zaretskii
2023-09-22 23:44 ` Yuan Fu
2023-09-23 5:52 ` Eli Zaretskii
2023-09-26 3:42 ` Yuan Fu
2023-09-22 7:38 ` Philip Kaludercic
2023-09-22 14:53 ` Augustin Chéneau (BTuin)
2023-09-22 20:40 ` Philip Kaludercic
2023-09-22 23:21 ` Augustin Chéneau (BTuin)
2023-09-22 7:42 ` Stefan Kangas
2023-09-22 8:45 ` Yuan Fu
2023-09-24 21:10 ` Yuan Fu
2023-09-26 11:52 ` Augustin Chéneau (BTuin)
2023-09-28 7:03 ` Yuan Fu
[not found] ` <b999a251-1778-49ac-90dc-ef8d78d36d53@mailo.com>
2023-09-29 1:26 ` Yuan Fu
2023-09-29 14:13 ` Eli Zaretskii