unofficial mirror of emacs-devel@gnu.org 
* New tree-sitter mode: bison-ts-mode
@ 2023-09-21 20:15 Augustin Chéneau (BTuin)
  2023-09-21 22:23 ` Stefan Kangas
                   ` (5 more replies)
  0 siblings, 6 replies; 17+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-21 20:15 UTC
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1640 bytes --]

Hello,
I'd like to propose a new tree-sitter mode for Emacs, bison-ts-mode.
This is a major mode for GNU Bison grammar files, and it supports the 
embedded languages C, C++ and Java.  Bison also supports D, but Emacs 
has no tree-sitter support for D yet, so this mode does not support it 
either.

Since Bison is a superset of Yacc, the mode should also work with Yacc 
files without modification.
The tree-sitter grammar is available here: <https://gitlab.com/btuin2/tree-sitter-bison>

The mode auto-detects the embedded language from the "%language" 
directive if the file has one; otherwise it falls back to C.
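For illustration, here is a minimal, self-contained sketch of that detection step.  It assumes the directive has the form %language "NAME" at the start of a line (the same pattern the mode searches for) and works on a string instead of a buffer; `my-bison-language-from-string` is a hypothetical helper, not part of bison-ts-mode.

```elisp
;; Hypothetical helper, for illustration only: map the %language
;; directive found in TEXT to a tree-sitter language symbol.
(require 'rx)

(defun my-bison-language-from-string (text)
  "Return `c', `cpp' or `java' for the %language directive in TEXT.
Return nil if TEXT has no such directive or names another language."
  (with-temp-buffer
    (insert text)
    (goto-char (point-min))
    ;; %language "NAME" at the beginning of a line; NAME is captured.
    (when (re-search-forward
           (rx bol (0+ blank) "%language" (0+ blank)
               "\"" (group (1+ (in alpha "+"))) "\"")
           nil t)
      (pcase (downcase (match-string-no-properties 1))
        ("c" 'c)
        ("c++" 'cpp)
        ("java" 'java)))))

;; Example:
;; (my-bison-language-from-string "%language \"C++\"\n%%\n") => cpp
```

Unlike the mode, this sketch returns nil instead of falling back to C when no directive is found.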


In addition to the mode, three patches are attached (should I open a bug 
report for them?).


The first one adds some missing indentation rules to c-ts-mode.  
Currently there is no rule for the case where the identifier of a 
declaration is on its own line, so in these two cases "variable_name" 
is not indented:

static myttype *
variable_name;

static myttype
variable_name;

With the patch, the behavior is the same as in c-mode.



The second one fixes an edge case in indentation when multiple languages 
are involved.  Here is a simple example:

// language_A
     {
        // language_B
     }
// language_A


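Here, the braces are handled by language_A, but everything between them 
is handled by language_B.  As a result, if point is at the beginning of 
the line holding the closing brace, the indentation code uses 
language_B's rules, but if point is on the brace itself, it uses 
language_A's rules.  The patch looks up the language at BOL (the first 
non-whitespace character of the line) instead of at point, so the result 
no longer depends on where point is on the line.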
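The difference can be reproduced with a toy model that involves no tree-sitter at all: the two "languages" are plain position ranges, language_B owning everything strictly between the braces, and BOL is taken to be the first non-blank character of the line.  This is only a sketch; `my-toy-text`, `my-toy-language-at`, `my-toy-bol` and `my-toy-languages-on-closing-line` are hypothetical names made up for it, and real code would call `treesit-language-at` instead.

```elisp
;; Toy model of the conflict fixed by the second patch (hypothetical
;; names, no tree-sitter involved).
(defconst my-toy-text "// A\n     {\n        // B\n     }\n// A\n"
  "Sample text: language_B owns everything strictly between the braces.")

(defun my-toy-language-at (pos b-start b-end)
  "Return `language_B' if B-START <= POS < B-END, else `language_A'."
  (if (and (<= b-start pos) (< pos b-end)) 'language_B 'language_A))

(defun my-toy-bol (pos)
  "Return the position of the first non-blank character on POS's line."
  (save-excursion
    (goto-char pos)
    (forward-line 0)
    (skip-chars-forward " \t")
    (point)))

(defun my-toy-languages-on-closing-line (column)
  "Put point at COLUMN on the closing-brace line of `my-toy-text'.
Return (LANGUAGE-AT-POINT . LANGUAGE-AT-BOL)."
  (with-temp-buffer
    (insert my-toy-text)
    (goto-char (point-min))
    (let* ((b-start (search-forward "{"))      ; just after the "{"
           (b-end (1- (search-forward "}"))))  ; position of the "}"
      (goto-char b-end)
      (forward-line 0)
      (move-to-column column)
      (cons (my-toy-language-at (point) b-start b-end)
            (my-toy-language-at (my-toy-bol (point)) b-start b-end)))))
```

With point at column 0 of the closing-brace line, `(my-toy-languages-on-closing-line 0)` returns `(language_B . language_A)`; with point on the brace, `(my-toy-languages-on-closing-line 5)` returns `(language_A . language_A)`.  Only the BOL-based lookup gives the same answer in both cases.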



The third patch simply extracts java-ts-mode's font-lock feature list 
into a variable, java-ts-mode--feature-list, so that other modes (such 
as bison-ts-mode) can reuse it.
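The reason for the extraction is that bison-ts-mode merges its own feature list with the embedded language's list level by level.  A standalone sketch of that merge follows; `my-merge-feature-lists` is a hypothetical name, and the example data is java-ts-mode's list from the third patch.

```elisp
;; Hypothetical stand-alone version of the level-by-level merge.
(defun my-merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 level by level.
The Nth sublist of the result is the Nth sublist of L1 followed by the
Nth sublist of L2.  L1 and L2 may have different lengths."
  (let ((res nil))
    (while (or l1 l2)
      ;; (car nil) is nil, so a missing level simply contributes nothing.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

;; Example: one Bison level merged with java-ts-mode's four levels.
;; (my-merge-feature-lists
;;  '((bison-comment bison-string))
;;  '((comment definition)
;;    (constant keyword string type)
;;    (annotation expression literal)
;;    (bracket delimiter operator)))
;; => ((bison-comment bison-string comment definition)
;;     (constant keyword string type)
;;     (annotation expression literal)
;;     (bracket delimiter operator))
```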


Huge thanks to Yuan Fu for his help and his amazing work with tree-sitter.

I'll need to sign the copyright assignment.

[-- Attachment #2: 0001-Add-missing-indent-rules-in-c-ts-mode.patch --]
[-- Type: text/x-patch, Size: 998 bytes --]

From 2d5248d9612feb6e55bf9e22e5d8bc1b8f565ba8 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Wed, 20 Sep 2023 12:23:06 +0200
Subject: [PATCH 1/3] Add missing indent rules in c-ts-mode

---
 lisp/progmodes/c-ts-mode.el | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index 70717a90caa..314f37f7ec5 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -407,6 +407,8 @@ c-ts-mode--indent-styles
            ((parent-is "preproc") c-ts-mode--anchor-prev-sibling 0)
 
            ((parent-is "function_definition") parent-bol 0)
+           ((parent-is "pointer_declarator") parent-bol 0)
+           ((parent-is "declaration") parent-bol 0)
            ((parent-is "conditional_expression") first-sibling 0)
            ((parent-is "assignment_expression") parent-bol c-ts-mode-indent-offset)
            ((parent-is "concatenated_string") first-sibling 0)
-- 
2.42.0


[-- Attachment #3: 0002-Fix-tree-sitter-indentation-conflict-with-multiple-l.patch --]
[-- Type: text/x-patch, Size: 1052 bytes --]

From a72df328a1195328a364be20612be189eb2dfd59 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Wed, 20 Sep 2023 12:23:22 +0200
Subject: [PATCH 2/3] Fix tree-sitter indentation conflict with multiple
 languages

---
 lisp/treesit.el | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/lisp/treesit.el b/lisp/treesit.el
index f0d6f32b421..0bb3aa6d653 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -1684,8 +1684,8 @@ treesit--indent-1
                                 bol (car local-parsers)))
                 ((eq 1 (length (treesit-parser-list nil nil t)))
                  (treesit-node-at bol))
-                ((treesit-language-at (point))
-                 (treesit-node-at bol (treesit-language-at (point))))
+                ((treesit-language-at bol)
+                 (treesit-node-at bol (treesit-language-at bol)))
                 (t (treesit-node-at bol))))
          (root (treesit-parser-root-node
                 (treesit-node-parser smallest-node)))
-- 
2.42.0


[-- Attachment #4: 0003-Extract-feature-list-of-java-ts-mode-in-a-variable.patch --]
[-- Type: text/x-patch, Size: 1466 bytes --]

From 3a1e8eb01481423ad8309e52931d580704f62141 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Thu, 21 Sep 2023 18:29:24 +0200
Subject: [PATCH 3/3] Extract feature list of java-ts-mode in a variable

---
 lisp/progmodes/java-ts-mode.el | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lisp/progmodes/java-ts-mode.el b/lisp/progmodes/java-ts-mode.el
index 99d34789dcc..a6bc4c4bded 100644
--- a/lisp/progmodes/java-ts-mode.el
+++ b/lisp/progmodes/java-ts-mode.el
@@ -303,6 +303,13 @@ java-ts-mode--defun-name
       (treesit-node-child-by-field-name node "name")
       t))))
 
+
+(defvar java-ts-mode--feature-list
+  '(( comment definition )
+    ( constant keyword string type)
+    ( annotation expression literal)
+    ( bracket delimiter operator)))
+
 ;;;###autoload
 (define-derived-mode java-ts-mode prog-mode "Java"
   "Major mode for editing Java, powered by tree-sitter."
@@ -382,11 +389,7 @@ java-ts-mode
 
   ;; Font-lock.
   (setq-local treesit-font-lock-settings java-ts-mode--font-lock-settings)
-  (setq-local treesit-font-lock-feature-list
-              '(( comment definition )
-                ( constant keyword string type)
-                ( annotation expression literal)
-                ( bracket delimiter operator)))
+  (setq-local treesit-font-lock-feature-list java-ts-mode--feature-list)
 
   ;; Imenu.
   (setq-local treesit-simple-imenu-settings
-- 
2.42.0


[-- Attachment #5: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 11858 bytes --]

;;; bison-ts-mode.el --- Tree-sitter mode for Bison  -*- lexical-binding: t; -*-

;;; Commentary:

;; This is a tree-sitter based major mode for Bison and Yacc grammar
;; files.  Bison and Yacc are parser generators.

;;; Code:

(require 'treesit)
(require 'c-ts-common)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
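(declare-function treesit-parser-list "treesit.c")
(declare-function c-ts-mode--font-lock-settings "c-ts-mode")
(declare-function c-ts-mode--get-indent-style "c-ts-mode")
(defvar c-ts-mode--feature-list)
(defvar java-ts-mode--feature-list)
(defvar java-ts-mode--font-lock-settings)
(defvar java-ts-mode--indent-rules)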


(defgroup bison nil
  "Support for Bison and Yacc."
  :group 'languages)

(defcustom bison-ts-mode-indent-offset 2
  "Number of spaces for each indentation step in `bison-ts-mode'.
It has no effect in the epilogue part of the file."
  :version "30.1"
  :type 'integer
  :safe 'integerp
  :group 'bison)

(defcustom bison-ts-mode-autodetect-language t
  "Search for a %language directive in the file at initialization.
Changing the value of this directive in the file requires reloading the
mode for the change to take effect.  If `bison-ts-mode-embedded-language'
is set by a file-local variable, the auto-detection is not run."
  :version "30.1"
  :type 'boolean
  :safe 'booleanp
  :group 'bison)

(defvar-local bison-ts-mode-embedded-language nil
  "Embedded language in the current Bison buffer.
One of the symbols `c', `cpp' or `java', or nil if there is none.")

(defun bison-ts-mode--merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
  (let ((res ()))
    (while (or l1 l2)
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1) l2 (cdr l2)))
    (nreverse res)))

(defun bison-ts-mode--find-language-in-buffer (&optional buffer)
  "Find and return the language set by the Bison directive %language.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
  ;; Use `with-current-buffer' rather than `switch-to-buffer', which
  ;; would change the buffer displayed in the selected window.
  (with-current-buffer (or buffer (current-buffer))
    (save-excursion
      (goto-char (point-min))
      (when (re-search-forward
             (rx bol (0+ blank) "%language" (0+ blank)
                 "\"" (group (1+ (in alpha "+"))) "\"")
             nil t)
        (match-string-no-properties 1)))))


(defun bison-ts-mode--detect-language (&optional buffer)
  "Detect the embedded language in a Bison buffer.
Known languages are C, C++, D, and Java, but D is not supported as there is
no support for tree-sitter D in Emacs yet.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
  (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
      (pcase (downcase str)
        ("c" 'c)
        ("c++" 'cpp)
        ("d" (message "bison-ts-mode: D language not yet supported") nil)
        ("java" 'java))
    (message
     "bison-ts-mode: %%language specification not found or invalid, defaulting to C")
    'c))


(defun bison-ts-mode--language-at-point-function (position)
  "Return the language at POSITION."
  (let* ((node (treesit-node-at position 'bison)))
    (if (equal (treesit-node-type node)
               "embedded_code")
        bison-ts-mode-embedded-language
      'bison)))

(defun bison-ts-mode--font-lock-settings (language)
  "Return the font-lock settings for Bison.
LANGUAGE should be set to \\='bison."
  (treesit-font-lock-rules
   :language language
   :feature 'bison-comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'bison-declaration
   '((declaration_name) @font-lock-keyword-face)

   :language language
   :feature 'bison-type
   '((type) @font-lock-type-face)

   :language language
   :feature 'bison-grammar-rule-usage
   '((grammar_rule_identifier) @font-lock-variable-use-face)

   :language language
   :feature 'bison-grammar-rule-declaration
   '((grammar_rule (grammar_rule_declaration)
                   @font-lock-variable-name-face))

   :language language
   :feature 'bison-string
   :override t
   '((string) @font-lock-string-face)

   :language language
   :feature 'bison-literal
   :override t
   '((char_literal) @font-lock-keyword-face
     (number_literal) @font-lock-number-face)

   :language language
   :feature 'bison-directive-grammar-rule
   :override t
   '((grammar_rule (directive) @font-lock-keyword-face))

   :language language
   :feature 'bison-operator
   :override t
   '(["|"] @font-lock-operator-face)

   :language language
   :feature 'bison-delimiter
   :override t
   '([";"] @font-lock-delimiter-face)))


(defvar bison-ts-mode--font-lock-feature-list
  '(( bison-comment bison-declaration bison-type
      bison-grammar-rule-usage bison-grammar-rule-declaration
      bison-string bison-literal bison-directive-grammar-rule
      bison-operator bison-delimiter)))


(defun bison-ts-mode--bison-matcher-action (root-name)
  "Treesit matcher to check if NODE at BOL is located in an action.
ROOT-NAME is the highest-level node of the embedded language."
  (lambda (node _parent bol &rest _)
    (when (equal (treesit-node-type (treesit-node-parent node)) root-name)
      (let ((bison-node (treesit-node-at bol 'bison)))
        (equal (treesit-node-type
                (treesit-node-parent (treesit-node-parent bison-node)))
               "action")))))

(defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
  "Treesit matcher to check if NODE at BOL is not located in the epilogue.
ROOT-NAME is the highest-level node of the embedded language."
  (lambda (node _parent bol &rest _)
    (when (equal (treesit-node-type (treesit-node-parent node)) root-name)
      (let ((bison-node (treesit-node-at bol 'bison)))
        (not (equal (treesit-node-type (treesit-node-parent bison-node))
                    "epilogue"))))))


(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
  "Get the parent of the bison node at BOL."
  (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))


(defun bison-ts-mode--indent-rules ()
  "Indent rules supported by `bison-ts-mode'."
  (let*
      ((common
        `(((node-is "^declaration$")
           column-0 0)
          ((and (parent-is "^declaration$")
                (not (node-is "^code_block$")))
           column-0 2)
          ((and (parent-is "comment") c-ts-common-looking-at-star)
           c-ts-common-comment-start-after-first-star -1)
          (c-ts-common-comment-2nd-line-matcher
           c-ts-common-comment-2nd-line-anchor
           1)
          ((parent-is "comment") prev-adaptive-prefix 0)

          ;; Opening and closing brackets "{}" of declarations
          ((and (parent-is "^declaration$")
                (node-is "^code_block$"))
           column-0 0)
          ((and (n-p-gp "}" "" "^declaration$"))
           column-0 0)
          ((parent-is "^declaration$") parent 2)
          ((node-is "^grammar_rule$") column-0 0)
          ((and
            (parent-is "^grammar_rule$")
            (node-is ";"))
           column-0 bison-ts-mode-indent-offset)
          ((and (parent-is "^grammar_rule$")
                (node-is "|"))
           column-0 bison-ts-mode-indent-offset)
          ((and (parent-is "^grammar_rule$")
                (not (node-is "^grammar_rule_declaration$"))
                (not (node-is "^action$")))
           column-0 ,(+ bison-ts-mode-indent-offset 2))
          ((or
            (node-is "^action$")
            (node-is "^}$"))
           column-0 12)
          ;; Set '%%' at the beginning of the line
          ((or
            (and (parent-is "^grammar_rules_section$")
                 (node-is "%%"))
            (node-is "^grammar_rules_section$"))
           column-0 0)
          (no-node parent-bol 0))))
    `((bison . ,common)
      ;; Import and override embedded languages rules to add an offset
      ,(pcase bison-ts-mode-embedded-language
         ('c `(c
               ((bison-ts-mode--bison-matcher-action "translation_unit")
                bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
               ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
                column-0 ,bison-ts-mode-indent-offset)
               ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))
         ('cpp `(cpp
                 ((bison-ts-mode--bison-matcher-action "translation_unit")
                  bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                 ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
                  column-0 ,bison-ts-mode-indent-offset)
                 ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))))
         ('java `(java
                  ((bison-ts-mode--bison-matcher-action "program")
                   bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                  ((bison-ts-mode--bison-matcher-not-epilogue "program")
                   column-0 ,bison-ts-mode-indent-offset)
                  ,@(alist-get 'java java-ts-mode--indent-rules)))))))


(define-derived-mode bison-ts-mode prog-mode "Bison"
  "Major mode for editing Bison and Yacc files, powered by tree-sitter."
  (when (treesit-ready-p 'bison)
    (when (and bison-ts-mode-autodetect-language
               (not bison-ts-mode-embedded-language))
      (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))

    ;; Require the embedded language's mode only when it is needed, to
    ;; avoid warnings about grammars that are not installed and not used.
    (pcase bison-ts-mode-embedded-language
      ('c (require 'c-ts-mode))
      ('cpp (require 'c-ts-mode))
      ('java (require 'java-ts-mode)))

    (setq-local treesit-font-lock-settings
                (append (bison-ts-mode--font-lock-settings 'bison)
                        (pcase bison-ts-mode-embedded-language
                          ('c (c-ts-mode--font-lock-settings 'c))
                          ('cpp (c-ts-mode--font-lock-settings 'cpp))
                          ('java java-ts-mode--font-lock-settings))))

    (setq-local treesit-font-lock-feature-list
                (if bison-ts-mode-embedded-language
                    (bison-ts-mode--merge-feature-lists
                     bison-ts-mode--font-lock-feature-list
                     (pcase bison-ts-mode-embedded-language
                       ('c c-ts-mode--feature-list)
                       ('cpp c-ts-mode--feature-list)
                       ('java java-ts-mode--feature-list)))
                  bison-ts-mode--font-lock-feature-list))

    (setq-local treesit-simple-imenu-settings
                `(("Grammar"
                   "\\`grammar_rule_declaration\\'"
                   nil
                   ,(lambda (node) (treesit-node-text node t)))))

    (c-ts-common-comment-setup)

    (setq-local treesit-simple-indent-rules
                (bison-ts-mode--indent-rules))

    (setq-local treesit-language-at-point-function
                #'bison-ts-mode--language-at-point-function)

    (when bison-ts-mode-embedded-language
      (setq-local treesit-range-settings
                  (treesit-range-rules
                   :embed bison-ts-mode-embedded-language
                   :host 'bison
                   :local t
                   '((embedded_code) @capture))))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here



Thread overview: 17+ messages
2023-09-21 20:15 New tree-sitter mode: bison-ts-mode Augustin Chéneau (BTuin)
2023-09-21 22:23 ` Stefan Kangas
2023-09-22  5:52 ` Eli Zaretskii
2023-09-22 23:44   ` Yuan Fu
2023-09-23  5:52     ` Eli Zaretskii
2023-09-26  3:42       ` Yuan Fu
2023-09-22  7:38 ` Philip Kaludercic
2023-09-22 14:53   ` Augustin Chéneau (BTuin)
2023-09-22 20:40     ` Philip Kaludercic
2023-09-22 23:21       ` Augustin Chéneau (BTuin)
2023-09-22  7:42 ` Stefan Kangas
2023-09-22  8:45 ` Yuan Fu
2023-09-24 21:10 ` Yuan Fu
2023-09-26 11:52   ` Augustin Chéneau (BTuin)
2023-09-28  7:03     ` Yuan Fu
     [not found]       ` <b999a251-1778-49ac-90dc-ef8d78d36d53@mailo.com>
2023-09-29  1:26         ` Yuan Fu
2023-09-29 14:13       ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git
