unofficial mirror of emacs-devel@gnu.org 
From: "João Paulo Labegalini de Carvalho" <jaopaulolc@gmail.com>
To: emacs-devel@gnu.org, Yuan Fu <casouri@gmail.com>,
	 Theodor Thornhill <theo@thornhill.no>,
	Eli Zaretskii <eliz@gnu.org>
Subject: Implementation direction for shell-script-mode with tree-sitter
Date: Tue, 25 Oct 2022 09:05:10 -0600
Message-ID: <CAGjvy2_a_BUUD_zBC_4_s3UN6DFbhoru_E5P2u2TX-fQ+Juewg@mail.gmail.com>


Hi,

The tree-sitter-bash grammar does not include many of the reserved words and
builtin commands that are currently fontified by the regex-based
fontification in shell-script-mode.

Here is a list of the ones that tree-sitter-bash does not recognize:

("time" "coproc" "type" "trap" "exit" "exec" "continue" "break" "return"
"logout" "bye")

According to the Bash Reference Manual, all of the above are reserved words.

Should I make a PR to tree-sitter-bash to incorporate the missing keywords,
or should I just filter them out of the list that I obtain from the
following (and from other variables in `shell-script-mode'):

(append (sh-feature sh-leading-keywords)
        (sh-feature sh-other-keywords))
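
A minimal sketch of the filtering option, for illustration only. The names
prefixed with `my-sh-' are hypothetical and not part of sh-script.el, and
the first list is a small stand-in for whatever the `sh-feature' calls
above return:

```elisp
(require 'seq)

;; Hypothetical stand-in for the list built from the `sh-feature' calls.
(defvar my-sh-all-keywords
  '("if" "then" "fi" "time" "coproc" "while" "exit")
  "Illustrative keyword list; an assumption, not the real value.")

;; The words listed earlier in this message.
(defvar my-sh-unsupported-keywords
  '("time" "coproc" "type" "trap" "exit" "exec" "continue" "break"
    "return" "logout" "bye")
  "Words that tree-sitter-bash is reported not to recognize.")

(defun my-sh-filter-keywords (keywords unsupported)
  "Return KEYWORDS without any member of UNSUPPORTED, preserving order."
  (seq-remove (lambda (kw) (member kw unsupported)) keywords))

;; Keep only the words that can safely be used in a tree-sitter query.
(my-sh-filter-keywords my-sh-all-keywords my-sh-unsupported-keywords)
```

With the stand-in lists above, the last form drops "time", "coproc", and
"exit" and keeps the remaining words in their original order.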

I am attaching the patch so everyone can see the code and better understand
what I did. I welcome all criticism and feedback.

PS.: I am looking at the tree-sitter-bash grammar, and it does not seem very
complicated to extend it to recognize the missing keywords. But I can
definitely keep working independently of that.
-- 
João Paulo L. de Carvalho
Ph.D Computer Science |  IC-UNICAMP | Campinas , SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carvalho@ic.unicamp.br
joao.carvalho@ualberta.ca

[-- Attachment #2: sh-script-treesit.patch --]
[-- Type: text/x-patch; charset="x-binaryenc"; name="sh-script-treesit.patch", Size: 5339 bytes --]

diff --git a/lisp/progmodes/sh-script.el b/lisp/progmodes/sh-script.el
index 558b62b20a..c7cc676843 100644
--- a/lisp/progmodes/sh-script.el
+++ b/lisp/progmodes/sh-script.el
@@ -148,6 +148,7 @@
   (require 'let-alist)
   (require 'subr-x))
 (require 'executable)
+(require 'treesit)
 
 (autoload 'comint-completion-at-point "comint")
 (autoload 'comint-filename-completion "comint")
@@ -170,6 +171,12 @@ sh-script
   :group 'sh
   :prefix "sh-")
 
+(defcustom sh-script-use-tree-sitter nil
+  "If non-nil, `sh-script-mode' tries to use tree-sitter.
+Currently `sh-script-mode' uses tree-sitter for font-locking, imenu,
+and movement functions."
+  :type 'boolean
+  :version "29.1")
 
 (defcustom sh-ancestor-alist
   '((ash . sh)
@@ -1534,13 +1541,24 @@ sh-mode
   ;; we can't look if previous line ended with `\'
   (setq-local comint-prompt-regexp "^[ \t]*")
   (setq-local imenu-case-fold-search nil)
-  (setq font-lock-defaults
-	`((sh-font-lock-keywords
-	   sh-font-lock-keywords-1 sh-font-lock-keywords-2)
-	  nil nil
-	  ((?/ . "w") (?~ . "w") (?. . "w") (?- . "w") (?_ . "w")) nil
-	  (font-lock-syntactic-face-function
-	   . ,#'sh-font-lock-syntactic-face-function)))
+
+  (if (and sh-script-use-tree-sitter
+           (treesit-can-enable-p))
+      (progn
+        (setq-local font-lock-keywords-only t)
+        (setq-local treesit-font-lock-feature-list
+                    '((basic) (moderate) (elaborate)))
+        (setq-local treesit-font-lock-settings
+                    sh-script--treesit-settings)
+        (treesit-font-lock-enable))
+    (setq font-lock-defaults
+          `((sh-font-lock-keywords
+             sh-font-lock-keywords-1 sh-font-lock-keywords-2)
+            nil nil
+            ((?/ . "w") (?~ . "w") (?. . "w") (?- . "w") (?_ . "w")) nil
+            (font-lock-syntactic-face-function
+             . ,#'sh-font-lock-syntactic-face-function))))
+
   (setq-local syntax-propertize-function #'sh-syntax-propertize-function)
   (add-hook 'syntax-propertize-extend-region-functions
             #'syntax-propertize-multiline 'append 'local)
@@ -3191,6 +3209,51 @@ sh-shellcheck-flymake
       (process-send-region sh--shellcheck-process (point-min) (point-max))
       (process-send-eof sh--shellcheck-process))))
 
-(provide 'sh-script)
+;;; Tree-sitter font-lock
+
+(defvar sh-script--treesit-bash-keywords
+  '("case" "do" "done" "elif" "else" "esac" "export" "fi" "for"
+    "function" "if" "in" "unset" "while" "then"))
+
+(defun sh-script--treesit-filtered-keywords (blacklist)
+  "Return the current shell's keywords that are not in BLACKLIST."
+  (let ((keywords (append (sh-feature sh-leading-keywords)
+                          (sh-feature sh-other-keywords)))
+        (filtered-list))
+    (dolist (item keywords filtered-list)
+      (if (not (member item blacklist))
+          (setq filtered-list (cons item filtered-list))
+        nil))))
+
+(defvar sh-script--treesit-blacklisted-keywords
+  '("time" "coproc" "type" "trap" "exit" "exec" "continue" "break"
+    "return" "logout" "bye")
+  "Keywords that the tree-sitter-bash grammar does not recognize.")
+
+(defvar sh-script--treesit-settings
+  (treesit-font-lock-rules
+   :language 'bash
+   :feature 'basic
+   '(;; Queries for function, strings, comments, and heredocs
+     (function_definition name: (word) @font-lock-function-name-face)
+     (comment) @font-lock-comment-face
+     [ (string) (raw_string)(heredoc_body) (heredoc_start) ] @font-lock-string-face)
+   :language 'bash
+   :feature 'moderate
+   :override t
+   `(;; Queries for keywords and builtin commands
+     [ ,@(sh-script--treesit-filtered-keywords sh-script--treesit-blacklisted-keywords) ] @font-lock-keyword-face
+     (command name: (command_name
+      ((word) @font-lock-builtin-face
+       (:match ,(let ((builtins (sh-feature sh-builtins)))
+                  (rx-to-string
+                   `(seq bol
+                         (or ,@builtins)
+                         eol)))
+               @font-lock-builtin-face))))
+     )
+   )
+  "Tree-sitter font-lock settings.")
 
+(provide 'sh-script)
 ;;; sh-script.el ends here
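
The `:match' predicate in the patch builds its regexp with `rx-to-string'.
The sketch below shows that step in isolation, as an illustration rather
than as part of the patch. The `my-sh-' names and the three builtin names
are assumptions standing in for the value of (sh-feature sh-builtins):

```elisp
(require 'rx)

;; Hypothetical stand-in for (sh-feature sh-builtins).
(defvar my-sh-example-builtins '("cd" "echo" "export")
  "Illustrative builtin names; an assumption, not the real list.")

(defun my-sh-builtin-regexp (builtins)
  "Return a regexp matching a line that is exactly one of BUILTINS."
  ;; Same shape as the form used in the patch: anchor at both ends so
  ;; that a longer word containing a builtin name does not match.
  (rx-to-string `(seq bol (or ,@builtins) eol)))

;; A builtin name matches from position 0.
(string-match-p (my-sh-builtin-regexp my-sh-example-builtins) "echo")

;; A longer word that merely starts with a builtin name does not match.
(string-match-p (my-sh-builtin-regexp my-sh-example-builtins) "echoes")
```

Anchoring both ends matters here because the predicate is applied to the
text of a whole `word' node, so a partial match would fontify commands
whose names only contain a builtin name.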

Thread overview: 10+ messages
2022-10-25 15:05 João Paulo Labegalini de Carvalho [this message]
2022-10-25 15:46 ` Implementation direction for shell-script-mode with tree-sitter Stefan Monnier
2022-10-25 16:26   ` João Paulo Labegalini de Carvalho
2022-10-26  0:52 ` Po Lu
2022-10-26 15:48   ` João Paulo Labegalini de Carvalho
2022-10-27  0:54     ` Po Lu
2022-10-27  6:06       ` Eli Zaretskii
2022-10-27 14:23         ` João Paulo Labegalini de Carvalho
2022-10-27 15:53           ` Eli Zaretskii
2022-10-27 14:22       ` João Paulo Labegalini de Carvalho
