From: Vincenzo Pupillo <v.pupillo@gmail.com>
To: Yuan Fu <casouri@gmail.com>
Cc: Eli Zaretskii <eliz@gnu.org>, 74610@debbugs.gnu.org
Subject: bug#74610: 31.0.50; Submitting mhtml-ts-mode, treesitter alternative to mhtml-mode
Date: Sat, 14 Dec 2024 11:37:01 +0100
Message-ID: <26683715.1r3eYUQgxm@fedora>
In-Reply-To: <0083FFEF-128F-4C5C-B6BC-92CF657416C4@gmail.com>

[-- Attachment #1: Type: text/plain, Size: 1529 bytes --]

On Wednesday, 11 December 2024 05:54:09 Central European Standard Time,
Yuan Fu wrote:
> > On Dec 3, 2024, at 6:29 AM, Vincenzo Pupillo <v.pupillo@gmail.com> wrote:
> > 
> > On Sunday, 1 December 2024 07:01:21 Central European Standard Time,
> > Yuan Fu wrote:
> >> It's not uncommon to see different indent offset for CSS and
> >> Javascript, so it's a good idea to have separate control for them.
> > 
> > Is the behavior the same as mhtml-mode, or would you like something like
> > this?
> > 
> >    <style>
> >                                  z {
> >                                      color: red;
> >                                  }
> >    </style>
> >    <script>
> >        function myFunction(p1, p2) {
> >            return p1 * p2;
> >        }
> >    </script>
> > 
> > The mhtml-ts-mode-js-css-indent-offset variable controls only the
> > indentation relative to the <style> and <script> tags.
> 
> Ah, I see, it’s the offset from the enclosing tag. In that case it should be
> fine to use a common variable.
> 
> Yuan
Thank you Yuan.
Attached is the revised patch following your previous comments.
As I already wrote to Dmitry, I am doing some tests to see if html-ts-mode can 
be extended and if there is a way to integrate one multi-language mode into 
another multi-language mode.
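
To make the offset discussion above concrete, here is a minimal sketch of how the body column follows from the tag column. The helper name `my/embedded-body-column' is hypothetical and not part of the patch; it only models the three cases documented for `mhtml-ts-mode-tag-relative-indent', and does not call any tree-sitter code.

```elisp
;; Hypothetical helper, not part of the patch: models the three
;; documented settings of `mhtml-ts-mode-tag-relative-indent'.
(defun my/embedded-body-column (tag-column offset tag-relative)
  "Return the column at which a <script> or <style> body starts.
TAG-COLUMN is the column of the enclosing tag, OFFSET plays the role
of `mhtml-ts-mode-js-css-indent-offset', and TAG-RELATIVE is t, nil
or `ignore'."
  (cond
   ((eq tag-relative 'ignore) 0)      ; body starts in the first column
   ((null tag-relative) tag-column)   ; body starts just below the tag
   (t (+ tag-column offset))))        ; body is offset from the tag
```

With the tag at column 2 and an offset of 2, the three settings give columns 4, 2 and 0, which matches the examples in the option's docstring.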

Vincenzo

[-- Attachment #2: 0001-Add-mhtml-ts-mode.patch --]
[-- Type: text/x-patch, Size: 19855 bytes --]

From 355075793eff5a58dac83756d96881b6932d5838 Mon Sep 17 00:00:00 2001
From: Vincenzo Pupillo <v.pupillo@gmail.com>
Date: Fri, 29 Nov 2024 22:48:45 +0100
Subject: [PATCH] Add mhtml-ts-mode.

New major mode alternative to mhtml-mode, based on tree-sitter, for
editing files containing HTML, JavaScript and CSS.

* etc/NEWS: Mention the new mode.
* lisp/textmodes/mhtml-ts-mode.el: New file.
---
 etc/NEWS                        |   8 +
 lisp/textmodes/mhtml-ts-mode.el | 429 ++++++++++++++++++++++++++++++++
 2 files changed, 437 insertions(+)
 create mode 100644 lisp/textmodes/mhtml-ts-mode.el

diff --git a/etc/NEWS b/etc/NEWS
index 4d2a2c893d0..8f9a04dcf01 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -797,6 +797,14 @@ destination window is chosen using 'display-buffer-alist'.  Example:
 \f
 * New Modes and Packages in Emacs 31.1
 
+** New major modes based on the tree-sitter library
+
++++
+*** New major mode 'mhtml-ts-mode'.
+An optional major mode based on the tree-sitter library for editing HTML
+files.  This mode handles indentation, fontification, and commenting for
+embedded JavaScript and CSS.
+
 \f
 * Incompatible Lisp Changes in Emacs 31.1
 
diff --git a/lisp/textmodes/mhtml-ts-mode.el b/lisp/textmodes/mhtml-ts-mode.el
new file mode 100644
index 00000000000..746300efc33
--- /dev/null
+++ b/lisp/textmodes/mhtml-ts-mode.el
@@ -0,0 +1,429 @@
+;;; mhtml-ts-mode.el --- Major mode for HTML using tree-sitter -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2024 Free Software Foundation, Inc.
+
+;; Author: Vincenzo Pupillo <v.pupillo@gmail.com>
+;; Maintainer: Vincenzo Pupillo <v.pupillo@gmail.com>
+;; Created: Nov 2024
+;; Keywords: HTML language tree-sitter
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+;;
+;; This package provides `mhtml-ts-mode' which is a major mode
+;; for editing HTML files with embedded JavaScript and CSS.
+;; Tree-sitter is used to parse each of these languages.
+;;
+;; Please note that this package requires `html-ts-mode', which
+;; registers itself as the major mode for editing HTML.
+;;
+;; This package is compatible and has been tested with the following
+;; tree-sitter grammars:
+;; * https://github.com/tree-sitter/tree-sitter-html
+;; * https://github.com/tree-sitter/tree-sitter-javascript
+;; * https://github.com/tree-sitter/tree-sitter-jsdoc
+;; * https://github.com/tree-sitter/tree-sitter-css
+;;
+;; Features
+;;
+;; * Indent
+;; * IMenu
+;; * Navigation
+;; * Which-function
+;; * Tree-sitter parser installation helper
+
+;;; Code:
+
+(require 'treesit)
+(require 'html-ts-mode)
+(require 'css-mode) ;; for embedding CSS into HTML
+(require 'js) ;; for embedding JavaScript into HTML
+
+(eval-when-compile
+  (require 'rx))
+
+;; This tells the byte-compiler where the functions are defined.
+;; It is only needed when a file must be byte-compilable
+;; in an Emacs not built with the tree-sitter library.
+(treesit-declare-unavailable-functions)
+
+;; In a multi-language major mode it can be useful to have an "installer"
+;; to simplify installing the grammars supported by the major mode.
+(defvar mhtml-ts-mode--language-source-alist
+  '((html . ("https://github.com/tree-sitter/tree-sitter-html"  "v0.23.0"))
+    (javascript . ("https://github.com/tree-sitter/tree-sitter-javascript" "v0.23.0"))
+    (jsdoc . ("https://github.com/tree-sitter/tree-sitter-jsdoc" "v0.23.0"))
+    (css . ("https://github.com/tree-sitter/tree-sitter-css" "v0.23.0")))
+  "Treesitter language parsers required by `mhtml-ts-mode'.
+You can customize this variable if you want to stick to a specific
+commit and/or use different parsers.")
+
+(defun mhtml-ts-mode-install-parsers ()
+  "Install all the required treesitter parsers.
+`mhtml-ts-mode--language-source-alist' defines which parsers to install."
+  (interactive)
+  (let ((treesit-language-source-alist mhtml-ts-mode--language-source-alist))
+    (dolist (item mhtml-ts-mode--language-source-alist)
+      (treesit-install-language-grammar (car item)))))
+
+;;; Custom variables
+
+(defgroup mhtml-ts-mode nil
+  "Major mode for editing HTML files, based on `html-ts-mode'.
+Embedded JS and CSS are handled using `js-ts-mode' and `css-ts-mode'."
+  :prefix "mhtml-ts-mode-"
+  ;; :group 'languages
+  :group 'html)
+
+(defcustom mhtml-ts-mode-js-css-indent-offset 2
+  "Indent offset of JavaScript and CSS relative to <script> and <style> tags.
+By default it should have the same value as `html-ts-mode-indent-offset'."
+  :tag "HTML javascript or css indent offset"
+  :version "31.1"
+  :type 'integer
+  :safe 'integerp)
+
+(defvar mhtml-ts-mode--js-css-indent-offset
+  mhtml-ts-mode-js-css-indent-offset
+  "Internal copy of `mhtml-ts-mode-js-css-indent-offset'.
+The value is changed by `mhtml-ts-mode--tag-relative-indent-offset'
+according to the value of `mhtml-ts-mode-tag-relative-indent'.")
+
+(defun mhtml-ts-mode--tag-relative-indent-offset (sym val)
+  "Custom setter for `mhtml-ts-mode-tag-relative-indent'.
+
+Set the default value of SYM to VAL, and update the buffer-local
+value of `mhtml-ts-mode--js-css-indent-offset' to match.  SYM should be
+`mhtml-ts-mode-tag-relative-indent', and VAL should be t, nil or
+`ignore'.  When SYM is `mhtml-ts-mode-tag-relative-indent', set
+`mhtml-ts-mode--js-css-indent-offset' to
+`mhtml-ts-mode-js-css-indent-offset' if VAL is t, otherwise to 0."
+  (set-default sym val)
+  (when (eq sym 'mhtml-ts-mode-tag-relative-indent)
+    (setq-local
+     mhtml-ts-mode--js-css-indent-offset
+     (if (eq val t)
+         mhtml-ts-mode-js-css-indent-offset
+       0))))
+
+(defcustom mhtml-ts-mode-tag-relative-indent t
+  "How <script> and <style> bodies are indented relative to the tag.
+
+When t, indentation looks like:
+
+  <script>
+    code();
+  </script>
+
+When nil, indentation of the script body starts just below the
+tag, like:
+
+  <script>
+  code();
+  </script>
+
+When `ignore', the script body starts in the first column, like:
+
+  <script>
+code();
+  </script>"
+  :type '(choice (const nil) (const t) (const ignore))
+  :safe 'symbolp
+  :set #'mhtml-ts-mode--tag-relative-indent-offset
+  :version "31.1")
+
+(defcustom mhtml-ts-mode-css-fontify-colors t
+  "Whether CSS colors should be fontified using the color as the background.
+If non-nil, text representing a CSS color will be fontified
+such that its background is the color itself.
+Works like `css--fontify-region'."
+  :tag "HTML colors the CSS property values"
+  :version "31.1"
+  :type 'boolean
+  :safe 'booleanp)
+
+;; To enable some basic tree-sitter functionality, you should define
+;; a function that recognizes which grammar is used at point.
+;; This function should be assigned to `treesit-language-at-point-function'.
+(defun mhtml-ts-mode--language-at-point (point)
+  "Return the language at POINT assuming the point is within an HTML buffer."
+  (let* ((node (treesit-node-at point 'html))
+         (parent (treesit-node-parent node))
+         (node-query (format "(%s (%s))"
+                             (treesit-node-type parent)
+                             (treesit-node-type node))))
+    (cond
+     ((string-equal "(script_element (raw_text))" node-query) 'javascript)
+     ((string-equal "(style_element (raw_text))" node-query) 'css)
+     (t 'html))))
+
+;; Custom font-lock function used to colorize CSS color values.
+;; Its signature must conform to what `treesit-font-lock-rules'
+;; requires of a function used as a capture name in a QUERY.
+(defun mhtml-ts-mode--colorize-css-value (node override start end &rest _)
+  "Colorize CSS property value like `css--fontify-region'.
+For NODE, OVERRIDE, START, and END, see `treesit-font-lock-rules'."
+  (if (and mhtml-ts-mode-css-fontify-colors
+           (string-equal "plain_value" (treesit-node-type node)))
+      (let ((color (css--compute-color start (treesit-node-text node t))))
+        (when color
+          (with-silent-modifications
+            (add-text-properties
+             (treesit-node-start node) (treesit-node-end node)
+             (list 'face (list :background color
+                               :foreground (readable-foreground-color
+                                            color)
+                               :box '(:line-width -1)))))))
+    (treesit-fontify-with-override
+     (treesit-node-start node) (treesit-node-end node)
+     'font-lock-variable-name-face
+     override start end)))
+
+;; Embedded languages should be indented according to the language
+;; that embeds them.
+;; This function's signature complies with the ANCHOR functions of
+;; `treesit-simple-indent-rules'.
+(defun mhtml-ts-mode--js-css-tag-bol (_node _parent &rest _)
+  "Return the position of the nearest preceding <script> or <style> tag.
+Return `line-beginning-position' when the language at point is HTML, or
+when `mhtml-ts-mode-tag-relative-indent' is `ignore'.
+NODE and PARENT are ignored."
+  (if (or (eq (treesit-language-at (point)) 'html)
+          (eq mhtml-ts-mode-tag-relative-indent 'ignore))
+      (line-beginning-position)
+    ;; Ok, we are in js or css block.
+    (save-excursion
+      (re-search-backward "<script.*>\\|<style.*>" nil t))))
+
+;; Treesit supports 4 levels of decoration; `treesit-font-lock-level'
+;; defines which level to use.  Major modes categorize their fontification
+;; features; these categories are defined by the `treesit-font-lock-rules'
+;; of each major mode using the :feature keyword.
+;; In a multi-language major mode it's a good idea to provide, for each
+;; level, the union of the :feature values of the same level.
+(defvar mhtml-ts-mode--feature-list
+  '(;; level 1
+    (;; common
+     comment definition
+     ;; JS specific
+     document
+     ;; CSS specific
+     query selector)
+    ;; level 2
+    (keyword name property string type)
+    ;; level 3
+    (;; common
+     attribute assignment constant escape-sequence
+     base-clause literal variable-name variable
+     ;; Javascript specific
+     jsx number pattern string-interpolation)
+    ;; level 4
+    (bracket delimiter error operator function)))
+
+;; In order to support `which-function-mode' we should define
+;; a function that returns the defun name.
+;; In a multi-language treesit mode, this can be implemented simply by
+;; calling language-specific functions.
+(defun mhtml-ts-mode--defun-name (node)
+  "Return the defun name of NODE.
+Return nil if there is no name or if NODE is not a defun node."
+  (let ((lang (mhtml-ts-mode--language-at-point (point))))
+    ;; Dispatch to the defun-name function of the language at point.
+    (cond
+     ((eq lang 'html) (html-ts-mode--defun-name node))
+     ((eq lang 'javascript) (js--treesit-defun-name node))
+     ((eq lang 'css) (css--treesit-defun-name node)))))
+
+(define-derived-mode mhtml-ts-mode html-mode
+  '("HTML+" (:eval (let ((lang (mhtml-ts-mode--language-at-point (point))))
+                     (cond ((eq lang 'html) "")
+                           ((eq lang 'javascript) "JS")
+                           ((eq lang 'css) "CSS")))))
+  "Major mode for editing HTML with embedded JavaScript and CSS.
+Powered by tree-sitter."
+  (if (not (and
+            (treesit-ready-p 'html)
+            (treesit-ready-p 'javascript)
+            (treesit-ready-p 'css)))
+      (error "Tree-sitter parsers for HTML, JavaScript and CSS \
+aren't available.  You can install them with M-x \
+`mhtml-ts-mode-install-parsers'")
+
+    ;; When a language is embedded, you should initialize some variables
+    ;; just like it's done in the original mode.
+
+    ;; Comment.
+    ;; Indentation settings for comments, as in js-ts-mode.
+    (c-ts-common-comment-setup)
+    (setq-local comment-multi-line t)
+
+    ;; Font-lock.
+
+    ;; There are two ways to handle embedded code:
+    ;; 1. Use a single parser for all the embedded code in the buffer. In
+    ;; this case, the embedded code blocks are concatenated together and are
+    ;; seen as a single continuous document to the parser.
+    ;; 2. Each embedded code block gets its own parser. Each parser only sees
+    ;; that particular code block.
+
+    ;; If you go with 2 for a language, the local parsers are created and
+    ;; destroyed automatically by Emacs. So don't create a global parser for
+    ;; that embedded language here.
+
+    ;; Create the parsers, only the global ones.
+    ;; jsdoc is a local parser, don't create a parser for it.
+    (treesit-parser-create 'css)
+    (treesit-parser-create 'javascript)
+
+    ;; Multi-language modes must set the primary parser.
+    (setq-local treesit-primary-parser (treesit-parser-create 'html))
+
+    (setq-local treesit-range-settings
+                (treesit-range-rules
+                 :embed 'javascript
+                 :host 'html
+                 :offset '(1 . -1)
+                 '((script_element
+                    (start_tag (tag_name))
+                    (raw_text) @cap))
+
+                 :embed 'css
+                 :host 'html
+                 :offset '(1 . -1)
+                 '((style_element
+                    (start_tag (tag_name))
+                    (raw_text) @cap))))
+
+    ;; jsdoc is not mandatory for js-ts-mode, so we respect this by
+    ;; adding jsdoc range rules only when jsdoc is available.
+    (when (treesit-ready-p 'jsdoc t)
+      (setq-local treesit-range-settings
+                  (append treesit-range-settings
+                          (treesit-range-rules
+                           :embed 'jsdoc
+                           :host 'javascript
+                           :local t
+                           `(((comment) @cap
+                              (:match ,js--treesit-jsdoc-beginning-regexp @cap))))))
+      (setq-local c-ts-common--comment-regexp
+            (rx (or "comment" "line_comment" "block_comment" "description"))))
+
+
+    ;; Many treesit functions need to know the language at point.
+    ;; So you should define such a function.
+    (setq-local treesit-language-at-point-function #'mhtml-ts-mode--language-at-point)
+
+    ;; Indent.
+
+    ;; Since mhtml-ts-mode inherits indentation rules from html-ts-mode, js
+    ;; and css, if you want to change the offset you have to act on the
+    ;; *-offset variables defined for those languages.
+
+    ;; JavaScript and CSS must be indented relative to their code block.
+    ;; This is done by inserting a special rule before the normal
+    ;; indentation rules of these languages.
+    ;; The value of mhtml-ts-mode--js-css-indent-offset changes based on
+    ;; mhtml-ts-mode-tag-relative-indent and can be used to indent
+    ;; JavaScript and CSS code relative to the HTML that contains them,
+    ;; just like in mhtml-mode.
+    (setq-local treesit-simple-indent-rules
+                (append html-ts-mode--indent-rules
+                        ;; Extended rules for js and css, to
+                        ;; indent appropriately when injected
+                        ;; into html
+                        `((javascript ((parent-is "program")
+                                       mhtml-ts-mode--js-css-tag-bol
+                                       mhtml-ts-mode--js-css-indent-offset)
+                                      ,@(cdr (car js--treesit-indent-rules))))
+                        `((css ((parent-is "stylesheet")
+                                mhtml-ts-mode--js-css-tag-bol
+                                mhtml-ts-mode--js-css-indent-offset)
+                               ,@(cdr (car css--treesit-indent-rules))))))
+    ;; Navigation.
+
+    ;; This is for finding the defun name; Imenu uses it by default
+    ;; when no specific function is defined.
+    (setq-local treesit-defun-name-function #'mhtml-ts-mode--defun-name)
+
+    ;; Define what the 'things' are for treesit.
+    ;; A 'thing' is a symbol representing a syntactic unit, like `defun',
+    ;; `sexp', or `sentence'.  As an alternative, if you only want defun,
+    ;; you can define a `treesit-defun-type-regexp'.
+    (setq-local treesit-thing-settings
+                `((html
+                   (defun "element")
+                   (sexp ,(regexp-opt '("element"
+                                        "text"
+                                        "attribute"
+                                        "value")))
+                   (sentence "tag")
+                   (text ,(regexp-opt '("comment" "text"))))
+                  (javascript
+                   (defun ,(rx (or "class_declaration"
+                                   "method_definition"
+                                   "function_declaration"
+                                   "lexical_declaration")))
+                   (sexp ,(regexp-opt js--treesit-sexp-nodes 'symbols))
+                   (sentence ,(regexp-opt js--treesit-sentence-nodes 'symbols))
+                   (text ,(regexp-opt
+                           '("comment"
+                             "string_fragment")
+                           'symbols)))
+                  (css
+                   (defun "rule_set"))))
+
+    ;; Font-lock.
+
+    ;; In a multi-language scenario, font lock settings are usually a
+    ;; concatenation of language rules. As you can see, it is possible
+    ;; to extend/modify the default rule or use a different set of
+    ;; rules. See `php-ts-mode--custom-html-font-lock-settings' for more
+    ;; advanced usage.
+    (setq-local treesit-font-lock-settings
+                (append html-ts-mode--font-lock-settings
+                        js--treesit-font-lock-settings
+                        (append
+                         ;; Rule for coloring CSS property values.
+                         ;; Placed before `css--treesit-settings'
+                         ;; to win against the same rule contained therein.
+                         (treesit-font-lock-rules
+                          :language 'css
+                          :override t
+                          :feature 'variable
+                          '((plain_value) @mhtml-ts-mode--colorize-css-value))
+                         css--treesit-settings)))
+
+    ;; Tells treesit the list of features to fontify.
+    (setq-local treesit-font-lock-feature-list mhtml-ts-mode--feature-list)
+
+    ;; Imenu
+
+    ;; Setup Imenu: if no function is specified, try to find an object
+    ;; using `treesit-defun-name-function'.
+    ;; TODO: we need to see if it is possible to extend Imenu to
+    ;; embedded languages as well.
+    (setq-local treesit-simple-imenu-settings
+                `(("Element" "\\`tag_name\\'" nil nil)))
+
+    (treesit-major-mode-setup)))
+
+(when (and (treesit-ready-p 'html) (treesit-ready-p 'javascript) (treesit-ready-p 'css))
+  (add-to-list
+   'auto-mode-alist '("\\.[sx]?html?\\(\\.[a-zA-Z_]+\\)?\\'" . mhtml-ts-mode)))
+
+(provide 'mhtml-ts-mode)
+;;; mhtml-ts-mode.el ends here
-- 
2.47.1

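The `auto-mode-alist' pattern at the end of the patch can be checked in isolation. The following is a small sketch that uses only `string-match-p'; the regexp is copied from the patch, while the names `my/mhtml-file-regexp' and `my/mhtml-file-p' and the file names below are made up for illustration.

```elisp
;; Regexp copied verbatim from the patch's `auto-mode-alist' entry.
(defconst my/mhtml-file-regexp
  "\\.[sx]?html?\\(\\.[a-zA-Z_]+\\)?\\'"
  "File-name pattern that the patch associates with `mhtml-ts-mode'.")

;; Hypothetical helper, not part of the patch.
(defun my/mhtml-file-p (file-name)
  "Return t if FILE-NAME matches `my/mhtml-file-regexp', else nil."
  (and (string-match-p my/mhtml-file-regexp file-name) t))

;; Illustrative file names: the first four match, the last one does not.
(mapcar #'my/mhtml-file-p
        '("index.html"      ; plain .html
          "page.xhtml"      ; optional s or x prefix
          "old.htm"         ; trailing l is optional
          "index.html.en"   ; optional alphabetic suffix
          "style.css"))
```

The optional suffix group accepts only letters and underscores, so a name such as "index.html.en" is covered while "notes.html5" is not.
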

Thread overview: 13+ messages
2024-11-29 21:57 bug#74610: 31.0.50; Submitting mhtml-ts-mode, treesitter alternative to mhtml-mode Vincenzo Pupillo
2024-12-01  6:01 ` Yuan Fu
2024-12-01  8:00   ` Eli Zaretskii
2024-12-01  8:18     ` Yuan Fu
2024-12-03 14:29   ` Vincenzo Pupillo
2024-12-11  4:54     ` Yuan Fu
2024-12-14 10:37       ` Vincenzo Pupillo [this message]
2024-12-16 17:37         ` Juri Linkov
2024-12-17 21:25           ` Vincenzo Pupillo
2024-12-04  1:27 ` Dmitry Gutov
2024-12-04 10:47   ` Vincenzo Pupillo
2024-12-05 16:51     ` Dmitry Gutov
2024-12-06 13:39       ` Vincenzo Pupillo
