* bug#66004: [PATCH] Offset ranges before applying embeded treesit parsers
From: Danny Freeman via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-15 15:45 UTC
To: 66004, casouri
[-- Attachment #1: Type: text/plain, Size: 701 bytes --]
Background: In clojure-ts-mode I've been capturing docstrings and
applying some limited syntax highlighting using an embedded markdown
parser. I'm only able to capture the full string, "quotes included". I
would like to be able to easily adjust the ranges captured to only
include the contents of the string, delimiters excluded. I have a
similar desire to capture the contents of a regular expression literal
and apply a nested regex grammar.
I've seen an offset mechanism used by the Neovim tree-sitter integration
for similar purposes.
I believe the JavaScript/TypeScript modes could take advantage of this
with template strings. I've included a small test in the patch that
demonstrates this.
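A minimal sketch of how a JavaScript major mode might use the `:offset' keyword that the patch below proposes for `treesit-range-rules'. It assumes an Emacs whose `treesit-range-rules' accepts `:offset', and that the javascript and html grammars are installed; the helper name `my/js-html-template-range-settings' is hypothetical, and the query is the one from the patch's own test:

```elisp
;;; -*- lexical-binding: t -*-
(require 'treesit)

(defun my/js-html-template-range-settings ()
  "Return range settings that embed HTML in html`...` template strings.
Hypothetical helper.  Assumes `treesit-range-rules' accepts `:offset'
and that the javascript and html grammars are installed."
  (treesit-range-rules
   :embed 'html
   :host 'javascript
   ;; Move each captured range's start one character forward and its
   ;; end one character back, so the two backticks are excluded.
   :offset '(1 . -1)
   ;; Captures whose names start with "_" are ignored as ranges; only
   ;; @capture (the template_string node) yields a range.
   '(((call_expression (identifier) @_html_template_fn
                       (template_string) @capture)
      (:equal "html" @_html_template_fn)))))

;; In a major mode body, after creating the host parser:
;;   (treesit-parser-create 'javascript)
;;   (setq-local treesit-range-settings
;;               (my/js-html-template-range-settings))
```

With (1 . -1), the embedded html parser would see only the text between the backticks rather than the whole template_string node.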
[-- Attachment #2: 0001-Offset-ranges-before-applying-embedded-tree-sitter-p.patch --]
[-- Type: text/x-patch, Size: 7316 bytes --]
From 61b89cf08ff8eb6e984d862519b9a0750f7f0cd0 Mon Sep 17 00:00:00 2001
From: Danny Freeman <danny@dfreeman.email>
Date: Fri, 15 Sep 2023 11:29:05 -0400
Subject: [PATCH] Offset ranges before applying embedded tree-sitter parser
* lisp/treesit.el
(treesit-query-range): Accept an optional OFFSET arg and apply the
offset to all returned ranges.
(treesit-range-rules): Accept an optional :offset keyword arg to adjust
the ranges an embedded parser is applied to.
(treesit-update-ranges): Forward the optional :offset setting from
`treesit-range-rules' to `treesit-query-range'.
* test/src/treesit-tests.el
(treesit-range-offset): Test the new offset functionality.
This feature would allow tree-sitter major modes to easily specify
offsets when using embedded parsers. A potential use case for this is
JavaScript template strings, where we want to apply a different parser
to the string's contents but do not want to include the template
string's delimiters.
---
lisp/treesit.el | 49 +++++++++++++++++++++++++++------------
test/src/treesit-tests.el | 14 +++++++++++
2 files changed, 48 insertions(+), 15 deletions(-)
diff --git a/lisp/treesit.el b/lisp/treesit.el
index 78bd149b7e2..0b257c93c44 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -449,21 +449,25 @@ treesit-query-string
(treesit-parser-root-node parser)
query))))
-(defun treesit-query-range (node query &optional beg end)
+(defun treesit-query-range (node query &optional beg end offset)
"Query the current buffer and return ranges of captured nodes.
QUERY, NODE, BEG, END are the same as in `treesit-query-capture'.
This function returns a list of (START . END), where START and
-END specifics the range of each captured node. Capture names
-generally don't matter, but names that starts with an underscore
-are ignored."
- (cl-loop for capture
- in (treesit-query-capture node query beg end)
- for name = (car capture)
- for node = (cdr capture)
- if (not (string-prefix-p "_" (symbol-name name)))
- collect (cons (treesit-node-start node)
- (treesit-node-end node))))
+END specifies the range of each captured node. OFFSET is an
+optional pair of numbers (START-OFFSET . END-OFFSET). The
+respective offset values are added to each (START . END) range
+being returned. Capture names generally don't matter, but names
+that start with an underscore are ignored."
+ (let ((offset-left (or (car offset) 0))
+ (offset-right (or (cdr offset) 0)))
+ (cl-loop for capture
+ in (treesit-query-capture node query beg end)
+ for name = (car capture)
+ for node = (cdr capture)
+ if (not (string-prefix-p "_" (symbol-name name)))
+ collect (cons (+ (treesit-node-start node) offset-left)
+ (+ (treesit-node-end node) offset-right)))))
;;; Range API supplement
@@ -509,6 +513,7 @@ treesit-range-rules
(treesit-range-rules
:embed \\='javascript
:host \\='html
+ :offset \\='(1 . -1)
\\='((script_element (raw_text) @cap)))
The `:embed' keyword specifies the embedded language, and the
@@ -521,13 +526,20 @@ treesit-range-rules
this QUERY is given a dedicated local parser. Otherwise, the
range shares the same parser with other ranges.
+If there's an `:offset' keyword with a pair of numbers, each
+captured range is offset by those numbers. For example, an
+offset of (1 . -1) will update a captured range of (2 . 8) to
+be (3 . 7). This can be used to exclude things like surrounding
+delimiters from being included in the range covered by an
+embedded parser.
+
QUERY can also be a function that takes two arguments, START and
END. If QUERY is a function, it doesn't need the :KEYWORD VALUE
pair preceding it. This function should set the ranges for
parsers in the current buffer in the region between START and
END. It is OK for this function to set ranges in a larger region
that encompasses the region between START and END."
- (let (host embed result local)
+ (let (host embed offset result local)
(while query-specs
(pcase (pop query-specs)
(:local (when (eq t (pop query-specs))
@@ -540,6 +552,12 @@ treesit-range-rules
(unless (symbolp embed-lang)
(signal 'treesit-error (list "Value of :embed option should be a symbol" embed-lang)))
(setq embed embed-lang)))
+ (:offset (let ((range-offset (pop query-specs)))
+ (unless (and (consp range-offset)
+ (numberp (car range-offset))
+ (numberp (cdr range-offset)))
+ (signal 'treesit-error (list "Value of :offset option should be a pair of numbers" range-offset)))
+ (setq offset range-offset)))
(query (if (functionp query)
(push (list query nil nil) result)
(when (null embed)
@@ -547,9 +565,9 @@ treesit-range-rules
(when (null host)
(signal 'treesit-error (list "Value of :host option cannot be omitted")))
(push (list (treesit-query-compile host query)
- embed local)
+ embed local offset)
result))
- (setq host nil embed nil))))
+ (setq host nil embed nil offset nil))))
(nreverse result)))
(defun treesit--merge-ranges (old-ranges new-ranges start end)
@@ -676,6 +694,7 @@ treesit-update-ranges
(let ((query (nth 0 setting))
(language (nth 1 setting))
(local (nth 2 setting))
+ (offset (nth 3 setting))
(beg (or beg (point-min)))
(end (or end (point-max))))
(cond
@@ -687,7 +706,7 @@ treesit-update-ranges
(parser (treesit-parser-create language))
(old-ranges (treesit-parser-included-ranges parser))
(new-ranges (treesit-query-range
- host-lang query beg end))
+ host-lang query beg end offset))
(set-ranges (treesit--clip-ranges
(treesit--merge-ranges
old-ranges new-ranges beg end)
diff --git a/test/src/treesit-tests.el b/test/src/treesit-tests.el
index 65994ce608f..4308e4048f6 100644
--- a/test/src/treesit-tests.el
+++ b/test/src/treesit-tests.el
@@ -662,6 +662,20 @@ treesit-range
;; TODO: More tests.
)))
+(ert-deftest treesit-range-offset ()
+ "Tests if range offsets work."
+ (skip-unless (treesit-language-available-p 'javascript))
+ (with-temp-buffer
+ (let ((query '(((call_expression (identifier) @_html_template_fn
+ (template_string) @capture)
+ (:equal "html" @_html_template_fn)))))
+ (progn
+ (insert "const x = html`<p>Hello</p>`;")
+ (treesit-parser-create 'javascript))
+ (should (equal '((15 . 29)) (treesit-query-range 'javascript query)))
+ (should (equal '((16 . 28)) (treesit-query-range
+ 'javascript query nil nil '(1 . -1)))))))
+
;;; Multiple language
(ert-deftest treesit-multi-lang ()
--
2.40.1
[-- Attachment #3: Type: text/plain, Size: 59 bytes --]
Let me know what you think.
Thank you,
--
Danny Freeman
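The offset arithmetic itself is small. The sketch below reproduces it as a standalone function so it can be evaluated without any tree-sitter grammar; the name `my/apply-range-offset' is hypothetical, and it only mirrors the loop the patch adds to `treesit-query-range':

```elisp
;;; -*- lexical-binding: t -*-
(require 'cl-lib)

(defun my/apply-range-offset (ranges offset)
  "Return RANGES with OFFSET added to each (START . END) pair.
RANGES is a list of (START . END) cons cells.  OFFSET is nil or a
cons (START-OFFSET . END-OFFSET); a nil component counts as 0.
Hypothetical helper mirroring the patched `treesit-query-range'."
  (let ((offset-left (or (car offset) 0))
        (offset-right (or (cdr offset) 0)))
    ;; Destructure each range and shift both ends independently.
    (cl-loop for (start . end) in ranges
             collect (cons (+ start offset-left)
                           (+ end offset-right)))))

;; The example from the patched `treesit-range-rules' docstring:
(my/apply-range-offset '((2 . 8)) '(1 . -1))
;; => ((3 . 7))

;; The template-string range from the patch's test:
(my/apply-range-offset '((15 . 29)) '(1 . -1))
;; => ((16 . 28))
```

As in the patch, a nil OFFSET leaves the ranges unchanged, and the result is not validated, so an offset larger than a captured node could produce a range whose start lies past its end.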
* bug#66004: [PATCH] Offset ranges before applying embeded treesit parsers
From: Eli Zaretskii @ 2023-09-17 10:07 UTC
To: Danny Freeman; +Cc: casouri, 66004
> Date: Fri, 15 Sep 2023 11:45:00 -0400
> From: Danny Freeman via "Bug reports for GNU Emacs,
> the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
>
> Background: In clojure-ts-mode I've been capturing docstrings and
> applying some limited syntax highlighting using an embedded markdown
> parser. I'm only able to capture the full string, "quotes included". I
> would like to be able to easily adjust the ranges captured to only
> include the contents of the string, delimiters excluded. I have a
> similar desire to capture the contents of a regular expression literal
> and apply a nested regex grammar.
>
> I've seen an offset mechanism used by the Neovim tree-sitter integration
> for similar purposes.
>
> I believe the JavaScript/TypeScript modes could take advantage of this
> with template strings. I've included a small test in the patch that
> demonstrates this.
Yuan, any comments?
* bug#66004: [PATCH] Offset ranges before applying embeded treesit parsers
From: Danny Freeman via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-09-17 12:05 UTC
To: Yuan Fu; +Cc: 66004
Yuan Fu <casouri@gmail.com> writes:
> This is a good idea, thanks! I believe you’ve signed the copyright assignment, right? If so, I’ll merge this and push to master.
Thank you, and yes I've done the copyright assignment.
> BTW, I don’t see this on debbugs. Did you get a confirmation that the bug report was created? It could be a problem with my email client, though.
I did get the email; here is the debbugs link:
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66004
I will CC the debbugs address. Your email came only to me.
--
Danny Freeman
* bug#66004: [PATCH] Offset ranges before applying embeded treesit parsers
From: Yuan Fu @ 2023-09-18 4:12 UTC
To: Danny Freeman; +Cc: 66004
> On Sep 17, 2023, at 5:05 AM, Danny Freeman <danny@dfreeman.email> wrote:
>
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> This is a good idea, thanks! I believe you’ve signed the copyright assignment, right? If so, I’ll merge this and push to master.
>
> Thank you, and yes I've done the copyright assignment.
>
>> BTW, I don’t see this on debbugs. Did you get a confirmation that the bug report was created? It could be a problem with my email client, though.
>
> I did get the email; here is the debbugs link:
> https://debbugs.gnu.org/cgi/bugreport.cgi?bug=66004
>
> I will CC the debbugs address. Your email came only to me.
I didn’t CC debbugs since I didn’t know the bug number. Anyway, it seems to be my email provider’s problem.
I made some minor changes and pushed to master, thanks again!
Yuan
* bug#66004: [PATCH] Offset ranges before applying embeded treesit parsers
From: Stefan Kangas @ 2023-09-18 22:52 UTC
To: Yuan Fu; +Cc: 66004-done, Danny Freeman
Yuan Fu <casouri@gmail.com> writes:
> I made some minor changes and pushed to master, thanks again!
It seems like the patch was installed, but was left open in the bug
tracker. I'm therefore closing it now.
* bug#66004: [PATCH] Offset ranges before applying embeded treesit parsers
From: Yuan Fu @ 2023-09-19 3:34 UTC
To: Stefan Kangas; +Cc: 66004-done, Danny Freeman
> On Sep 18, 2023, at 3:52 PM, Stefan Kangas <stefankangas@gmail.com> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>> I made some minor changes and pushed to master, thanks again!
>
> It seems like the patch was installed, but was left open in the bug
> tracker. I'm therefore closing it now.
Thank you, Stefan.
Yuan