unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Augustin Chéneau (BTuin)" <btuin@mailo.com>
To: Yuan Fu <casouri@gmail.com>
Cc: emacs-devel <emacs-devel@gnu.org>
Subject: Re: New tree-sitter mode: bison-ts-mode
Date: Tue, 26 Sep 2023 13:52:09 +0200	[thread overview]
Message-ID: <bb4ffedb-d786-4371-8dee-40c50b4e407f@mailo.com> (raw)
In-Reply-To: <C554276E-EFE5-4D24-943C-115D604672C7@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1126 bytes --]

Le 24/09/2023 à 23:10, Yuan Fu a écrit :
> Actually, a second thought. IIUC the reason for this change is that without these rules, no indentation is done for variable names in code like
> 
> static myttype *
> variable_name;
> 
> static myttype
> variable_name;
> 
> This is fine in normal C code, since no indentation is the same as
> zero indentation, which is what we want.  But for C embedded in Bison,
> no indentation is not the same as zero indentation.  We want it indent
> to the top-level indentation of the embedded C. E.g.,
> 
> {
>    static myttype
>    variable_name;
> }
> 
> rather than
> 
> {
>    static myttype
> variable_name;
> }
> 
> Right? In that case, we should really add a fallback indent rule for the embedded c in bison, such that it indents to the top-level of the embedded c. This is more robust than trying to cover all cases in the c rules. Does that make sense?
> 
> Basically, add something like
> 
> (catch-all bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
> 
> To the end of C’s indent rules.
> 
> Yuan

It does make sense, thanks.  I updated bison-ts-mode to do that.


[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 13950 bytes --]

;;; bison-ts-mode --- tree-sitter support for Bison -*- lexical-binding: t; -*-

;; Copyright (C) 2023 Free Software Foundation, Inc.

;; Author   : Augustin Chéneau <btuin@mailo.com>
;; Keywords : bison yacc languages tree-sitter

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.

;;; Commentary:

;; This is a mode based on tree-sitter for Bison and Yacc files, tools to
;; generate parsers.  The grammar used is available here:
;; https://gitlab.com/btuin2/tree-sitter-bison

;;; Code:

(require 'treesit)
(require 'c-ts-common)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")


(defgroup bison nil
  "Support for Bison and Yacc using tree-sitter."
  :group 'languages)

(defcustom bison-ts-mode-indent-offset 2
  "Number of spaces for each indentation step in `bison-ts-mode'.
It has no effect in the epilogue part of the file."
  :version "30.1"
  :type 'integer
  :safe 'integerp)

(defcustom bison-ts-mode-autodetect-language t
  "Search for a %language directive in the file at initialization.
Changing the value of this directive in the file requires to reload the mode to
be effective.  If `bison-ts-mode-buffer-language' is set by a file-local
 variable, the auto-detection is not run."
  :version "30.1"
  :type 'boolean
  :safe 'boolean)

(defvar-local bison-ts-mode-embedded-language nil
  "Embedded language in Bison buffer.
Supported values are `c', `cpp', and `java'.")
;;;###autoload
(put 'bison-ts-mode-embedded-language 'safe-local-variable 'symbolp)


(defun bison-ts-mode--merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
  (let ((res ()))
    (while (or l1 l2)
      (setq res (push (seq-uniq (append (car l1) (car l2)) 'eq) res))
      (setq l1 (cdr l1) l2 (cdr l2)))
    (nreverse res)))

(defun bison-ts-mode--find-language-in-buffer (&optional buffer)
  "Find and return the language set by the Bison directive %language.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
  (save-excursion
    (with-current-buffer (or buffer (current-buffer))
      (goto-char (point-min))
      (when
          (re-search-forward
           (rx
            ;; Only check for a language directive at the beginning of a line
            ;; I'm not entirely sure it is actually mandatory, but in practice
            ;; directives are always on a new line.
            bol
            (0+ blank)
            "%language"
            (0+ blank)
            "\""
            ;; Bison supports C, C++, Java, and D.  They can be capitalized or not.
            (group (1+ (in alpha "+")))
            "\"")
           nil
           t)))
    (match-string-no-properties 1)))


(defun bison-ts-mode--detect-language (&optional buffer)
  "Dectect the embedded language in a Bison buffer.
Known languages are C, C++, D, and Java, but D is not supported as there is
no support for tree-sitter D in Emacs yet.
If BUFFER is set, search in this buffer, otherwise search in the current
buffer."
  (if-let ((str (bison-ts-mode--find-language-in-buffer buffer)))
      (pcase (downcase str)
        ("c" 'c)
        ("c++" 'cpp)
        ("d" (message "D language not yet supported") nil)
        ("java" 'java)
        (_ (message "%%language specification \"%s\" is invalid, defaulting to C" str) 'c))))


(defun bison-ts-mode--language-at-point-function (position)
  "Return the language at POSITION."
  (let ((node (treesit-node-at position 'bison)))
    (if (equal (treesit-node-type node) "embedded_code")
        bison-ts-mode-embedded-language
      'bison)))

(defun bison-ts-mode--font-lock-settings (language)
  "Return the font-lock settings for Bison.
LANGUAGE should be set to \\='bison."
  (treesit-font-lock-rules
   :language language
   :feature 'comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'declaration
   '((declaration_name) @font-lock-keyword-face)

   :language language
   :feature 'type
   '((type) @font-lock-type-face)

   :language language
   :feature 'variable
   '((grammar_rule_identifier) @font-lock-variable-use-face)

   :language language
   :feature 'grammar-declaration
   '((grammar_rule (grammar_rule_declaration)
                   @font-lock-variable-use-face))

   :language language
   :feature 'string
   :override t
   '((string) @font-lock-string-face)

   :language language
   :feature 'literal
   :override t
   '((char_literal) @font-lock-keyword-face
     (number_literal) @font-lock-number-face)

   :language language
   :feature 'directive-grammar-rule
   :override t
   '((grammar_rule (directive) @font-lock-keyword-face))

   :language language
   :feature 'operator
   :override t
   '(["|"] @font-lock-operator-face)

   :language language
   :feature 'delimiter
   :override t
   '([";"] @font-lock-delimiter-face)))


(defconst bison-ts-mode--font-lock-feature-list
  '(( comment declaration grammar-declaration)
    ( type string directive-grammar-rule)
    ( literal)
    ( variable operator delimiter)))


(defun bison-ts-mode--inside-p (type bol)
  "Check if node at BOL is contained inside a Bison TYPE node."
  (treesit-parent-until
   (treesit-node-at bol 'bison)
   (lambda (node) (equal (treesit-node-type node) type))))


(defun bison-ts-mode--catch-all (type)
  "Treesit matcher to catch all nodes inside a Bison TYPE node."
  (lambda (_node _parent bol &rest _)
    (bison-ts-mode--inside-p type bol)))


(defun bison-ts-mode--bison-matcher-action (root-name)
  "Treesit matcher to check if NODE at BOL is located in an action node.
ROOT-NAME is the highest-level node of the embedded language."
  (lambda (node _parent bol &rest _)
    (when (equal (treesit-node-type (treesit-node-parent node)) root-name)
      (bison-ts-mode--inside-p "action" bol))))


(defun bison-ts-mode--bison-matcher-not-epilogue (root-name)
  "Treesit matcher to check if NODE at BOL is not located in the epilogue.
ROOT-NAME is the highest-level node of the embedded language."
  (lambda (node _parent bol &rest _)
    (when (equal (treesit-node-type (treesit-node-parent node)) root-name)
      (not (bison-ts-mode--inside-p "epilogue" bol)))))


(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
  "Get the parent of the bison node at BOL."
  (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))


(defun bison-ts-mode--indent-rules ()
  "Indent rules supported by `bison-ts-mode'."
  (let
      ((common
        `(((node-is "^declaration$")
           column-0 0)
          ((and (parent-is "^declaration$")
                (not (node-is "^code_block$")))
           column-0 2)
          ((and (parent-is "comment") (c-ts-common-looking-at ?*))
           c-ts-common-comment-start-after-first-star -1)
          ((and (parent-is "comment") (c-ts-common-looking-at ?| ?\\ ?`) )
           c-ts-common-comment-start-after-first-star -2)
          (c-ts-common-comment-2nd-line-matcher
           c-ts-common-comment-2nd-line-anchor
           1)
          ((parent-is "comment") prev-adaptive-prefix 0)

          ;; Opening and closing brackets "{}" of declarations
          ((and (parent-is "^declaration$")
                (node-is "^code_block$"))
           column-0 0)
          ((and (n-p-gp "}" "" "^declaration$"))
           column-0 0)
          ((parent-is "^declaration$") parent 2)
          ((node-is "^grammar_rule$") column-0 0)
          ((and
            (parent-is "^grammar_rule$")
            (node-is ";"))
           column-0 bison-ts-mode-indent-offset)
          ((and (parent-is "^grammar_rule$")
                (node-is "|"))
           column-0 bison-ts-mode-indent-offset)
          ((and (parent-is "^grammar_rule$")
                (not (node-is "^grammar_rule_declaration$"))
                (not (node-is "^action$")))
           column-0 ,(+ bison-ts-mode-indent-offset 2))
          ((or
            (node-is "^action$")
            (node-is "^}$"))
           column-0 12)
          ;; Set '%%' at the beginning of the line
          ((or
            (and (parent-is "^grammar_rules_section$")
                 (node-is "%%"))
            (node-is "^grammar_rules_section$"))
           column-0 0)
          (no-node parent-bol 0))))
    `((bison . ,common)
      ;; Import and override embedded languages rules to add an offset
      ,(pcase bison-ts-mode-embedded-language
         ('c `(c
               ((bison-ts-mode--bison-matcher-action "translation_unit")
                bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
               ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
                column-0 ,bison-ts-mode-indent-offset)
               ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))
               ((bison-ts-mode--catch-all "action")
                bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
               ((not (bison-ts-mode--catch-all "epilogue"))
                column-0 ,bison-ts-mode-indent-offset)))
         ('cpp `(cpp
                 ((bison-ts-mode--bison-matcher-action "translation_unit")
                  bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                 ((bison-ts-mode--bison-matcher-not-epilogue "translation_unit")
                  column-0 ,bison-ts-mode-indent-offset)
                 ,@(alist-get 'cpp (c-ts-mode--get-indent-style 'cpp))
                 ((bison-ts-mode--catch-all "action")
                  bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                 ((not (bison-ts-mode--catch-all "epilogue"))
                  column-0 ,bison-ts-mode-indent-offset)))
         ('java `(java
                  ((bison-ts-mode--bison-matcher-action "program")
                   bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                  ((bison-ts-mode--bison-matcher-not-epilogue "program")
                   bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                  ,@java-ts-mode--indent-rules
                  ((bison-ts-mode--catch-all "action")
                   bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
                  ((not (bison-ts-mode--catch-all "epilogue"))
                   column-0 ,bison-ts-mode-indent-offset)))))))


(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A major-mode for Bison based on tree-sitter."
  (when (treesit-ready-p 'bison)
    (unless bison-ts-mode-embedded-language
      (setq bison-ts-mode-embedded-language (bison-ts-mode--detect-language)))

    ;; Require only if needed, to avoid warnings if a grammar is not
    ;; installed but not used.
    (pcase bison-ts-mode-embedded-language
      ('c (require 'c-ts-mode))
      ('cpp (require 'c-ts-mode))
      ('java (require 'java-ts-mode)))

    (setq-local treesit-font-lock-settings
                (append (bison-ts-mode--font-lock-settings 'bison)
                        (pcase bison-ts-mode-embedded-language
                          ('c (c-ts-mode--font-lock-settings 'c))
                          ('cpp (c-ts-mode--font-lock-settings 'cpp))
                          ('java java-ts-mode--font-lock-settings))))

    (setq-local treesit-font-lock-feature-list
                (if bison-ts-mode-embedded-language
                    (bison-ts-mode--merge-feature-lists
                     bison-ts-mode--font-lock-feature-list
                     (pcase bison-ts-mode-embedded-language
                       ('c c-ts-mode--feature-list)
                       ('cpp c-ts-mode--feature-list)
                       ('java java-ts-mode--feature-list)))
                  bison-ts-mode--font-lock-feature-list))

    (setq-local treesit-simple-imenu-settings
                `(("Grammar"
                   "\\`grammar_rule_declaration\\'"
                   nil
                   (lambda (node) (treesit-node-text node t)))))

    (c-ts-common-comment-setup)

    (setq-local treesit-simple-indent-rules
                (bison-ts-mode--indent-rules))

    (setq-local treesit-language-at-point-function 'bison-ts-mode--language-at-point-function)


    (if bison-ts-mode-embedded-language
        (setq-local treesit-range-settings
                    (treesit-range-rules
                     :embed bison-ts-mode-embedded-language
                     :host 'bison
                     :local t
                     '((embedded_code) @capture)))
      (treesit-parser-create 'bison))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here

  reply	other threads:[~2023-09-26 11:52 UTC|newest]

Thread overview: 17+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-09-21 20:15 New tree-sitter mode: bison-ts-mode Augustin Chéneau (BTuin)
2023-09-21 22:23 ` Stefan Kangas
2023-09-22  5:52 ` Eli Zaretskii
2023-09-22 23:44   ` Yuan Fu
2023-09-23  5:52     ` Eli Zaretskii
2023-09-26  3:42       ` Yuan Fu
2023-09-22  7:38 ` Philip Kaludercic
2023-09-22 14:53   ` Augustin Chéneau (BTuin)
2023-09-22 20:40     ` Philip Kaludercic
2023-09-22 23:21       ` Augustin Chéneau (BTuin)
2023-09-22  7:42 ` Stefan Kangas
2023-09-22  8:45 ` Yuan Fu
2023-09-24 21:10 ` Yuan Fu
2023-09-26 11:52   ` Augustin Chéneau (BTuin) [this message]
2023-09-28  7:03     ` Yuan Fu
     [not found]       ` <b999a251-1778-49ac-90dc-ef8d78d36d53@mailo.com>
2023-09-29  1:26         ` Yuan Fu
2023-09-29 14:13       ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=bb4ffedb-d786-4371-8dee-40c50b4e407f@mailo.com \
    --to=btuin@mailo.com \
    --cc=casouri@gmail.com \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).