unofficial mirror of bug-gnu-emacs@gnu.org 
* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
@ 2023-06-12 14:14 Mattias Engdegård
       [not found] ` <handler.64017.B.168657924917612.ack@debbugs.gnu.org>
  2023-06-15 22:08 ` Yuan Fu
  0 siblings, 2 replies; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-12 14:14 UTC (permalink / raw)
  To: 64017; +Cc: Basil Contovounesios, Yuan Fu

`treesit-pattern-expand` converts a query pattern into tree-sitter S-expression syntax, as a string. The conversion mostly rewrites certain keywords; the main problem is that it prints strings in Emacs syntax, which differs from that of tree-sitter.

As a consequence, :match regexps cannot contain newlines:

(treesit-query-capture
 'java
 '(((identifier) @font-lock-constant-face
    (:match "hello\n" @font-lock-constant-face))))

signals a syntax error.
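
The error follows from how the Lisp printer renders the string, and this can be checked without tree-sitter. The following is a minimal sketch assuming default printer settings; `my-printed-form' is a hypothetical helper name used only for illustration, not part of Emacs.

```elisp
;; Hypothetical helper: print STRING the way a plain `prin1-to-string'
;; would.  `print-escape-newlines' is bound to nil, which is its default.
(defun my-printed-form (string)
  "Return STRING as printed by `prin1-to-string' with default settings."
  (let ((print-escape-newlines nil))
    (prin1-to-string string)))

(let ((printed (my-printed-form "hello\n")))
  (list (length printed)
        ;; A literal newline character sits between the quotes...
        (and (string-search "\n" printed) t)
        ;; ...and the two-character escape backslash-n does not occur.
        (and (string-search "\\n" printed) t)))
;; ⇒ (8 t nil)
```

The printed form therefore carries a raw newline inside the quotes, which is the input that the tree-sitter query parser rejects.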

As far as I can tell the tree-sitter string syntax allows for the escape sequences:

\n = LF
\r = CR
\t = TAB
\0 = NUL  (only a single 0 -- no octal escapes!)
\X = the character X itself

Unescaped newlines result in a syntax error, as seen in the example above. NULs don't seem to go well either.

At the very least, the conversion should avoid literal newlines and NULs in the result (and probably CR and TAB). This cannot be done with a straight prin1-to-string.
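
One way such a conversion could look, written in Lisp for illustration: a minimal sketch that relies only on the escape sequences listed above (which are themselves stated "as far as I can tell"). The name `my-treesit-escape-string' is hypothetical, and this is not the code that Emacs uses.

```elisp
;; Hypothetical sketch: quote STRING for a tree-sitter query, using only
;; the escapes listed above (\n, \r, \t, \0, and \X for X itself).
(defun my-treesit-escape-string (string)
  "Return STRING as a double-quoted tree-sitter query string."
  (concat
   "\""
   (replace-regexp-in-string
    ;; Characters that must not appear literally: LF, CR, TAB, NUL,
    ;; the double quote and the backslash.
    (rx (any "\n\r\t\0\"\\"))
    (lambda (match)
      (pcase (aref match 0)
        (?\n "\\n")
        (?\r "\\r")
        (?\t "\\t")
        (?\0 "\\0")                 ; a single 0, not an octal escape
        (c (string ?\\ c))))        ; quote and backslash: \X = X
    ;; FIXEDCASE and LITERAL are both t so that the backslashes in the
    ;; replacement are inserted verbatim.
    string t t)
   "\""))
```

With this sketch, (my-treesit-escape-string "hello\n") yields the nine characters "hello\n" with a backslash followed by n, so no literal newline reaches the query parser.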

(By the way, why is the conversion written in C? Was Lisp too slow?)

Ideally we should not need to expose the tree-sitter s-exp query syntax at all. Surely Emacs s-exps should be preferable in every case?







* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
       [not found] ` <handler.64017.B.168657924917612.ack@debbugs.gnu.org>
@ 2023-06-15 10:45   ` Mattias Engdegård
  2023-06-15 22:13     ` Yuan Fu
  0 siblings, 1 reply; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-15 10:45 UTC (permalink / raw)
  To: 64017; +Cc: Basil Contovounesios, Yuan Fu

I also propose that we change the documentation to describe the (Elisp) sexp-based query syntax only, or at least first and foremost, since that is what all existing code uses and is more convenient. Currently the manual starts by describing the string syntax and only then the Elisp sexp syntax.







* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-12 14:14 bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax Mattias Engdegård
       [not found] ` <handler.64017.B.168657924917612.ack@debbugs.gnu.org>
@ 2023-06-15 22:08 ` Yuan Fu
  2023-06-16 11:25   ` Mattias Engdegård
  1 sibling, 1 reply; 13+ messages in thread
From: Yuan Fu @ 2023-06-15 22:08 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: contovob, 64017

Thanks for catching this.

> On Jun 12, 2023, at 7:14 AM, Mattias Engdegård <mattias.engdegard@gmail.com> wrote:
> 
> `treesit-pattern-expand` converts a query pattern into tree-sitter S-expression syntax, as a string. The conversion mostly rewrites certain keywords; the main problem is that it prints strings in Emacs syntax, which differs from that of tree-sitter.
> 
> As a consequence, :match regexps cannot contain newlines:
> 
> (treesit-query-capture
> 'java
> '(((identifier) @font-lock-constant-face
>    (:match "hello\n" @font-lock-constant-face))))
> 
> signals a syntax error.
> 
> As far as I can tell the tree-sitter string syntax allows for the escape sequences:
> 
> \n = LF
> \r = CR
> \t = TAB
> \0 = NUL  (only a single 0 -- no octal escapes!)
> \X = the character X itself
> 
> Unescaped newlines result in a syntax error, as seen in the example above. NULs don't seem to go well either.
> 
> At the very least, the conversion should avoid literal newlines and NULs in the result (and probably CR and TAB). This cannot be done with a straight prin1-to-string.
> 
> (By the way, why is the conversion written in C? Was Lisp too slow?)

Because I wasn't sure if it’s ok for C functions to rely on Lisp functions, plus the function is simple enough. Right now if one doesn’t load treesit.el, all the C functions work fine.

> 
> Ideally we should not need to expose the tree-sitter s-exp query syntax at all. Surely Emacs s-exps should be preferable in every case?
> 

It shouldn’t hurt to expose the tree-sitter sexp. Other editors mainly use the string syntax.

Yuan





* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-15 10:45   ` Mattias Engdegård
@ 2023-06-15 22:13     ` Yuan Fu
  0 siblings, 0 replies; 13+ messages in thread
From: Yuan Fu @ 2023-06-15 22:13 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Basil Contovounesios, 64017



> On Jun 15, 2023, at 3:45 AM, Mattias Engdegård <mattias.engdegard@gmail.com> wrote:
> 
> I also propose that we change the documentation to describe the (Elisp) sexp-based query syntax only, or at least first and foremost, since that is what all existing code uses and is more convenient. Currently the manual starts by describing the string syntax and only then the Elisp sexp syntax.
> 

The difference between tree-sitter syntax and Elisp sexp syntax is pretty small (anchor, predicates), so the text describing the tree-sitter syntax is basically describing Elisp sexp syntax. That said, if someone makes it describe Elisp sexp syntax first, I wouldn’t mind.

Yuan





* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-15 22:08 ` Yuan Fu
@ 2023-06-16 11:25   ` Mattias Engdegård
  2023-06-16 17:02     ` Mattias Engdegård
  2023-06-17 23:02     ` Yuan Fu
  0 siblings, 2 replies; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-16 11:25 UTC (permalink / raw)
  To: Yuan Fu; +Cc: contovob, 64017

On 16 June 2023 at 00:08, Yuan Fu <casouri@gmail.com> wrote:

>> (By the way, why is the conversion written in C? Was Lisp too slow?)
> 
> Because I wasn't sure if it’s ok for C functions to rely on Lisp functions, plus the function is simple enough. Right now if one doesn’t load treesit.el, all the C functions work fine.

All right, let's keep it there for now.
I fixed the string conversion bug in 8657afac77.

>> Ideally we should not need to expose the tree-sitter s-exp query syntax at all. Surely Emacs s-exps should be preferable in every case?

> It shouldn’t hurt to expose the tree-sitter sexp. Other editors mainly use the string syntax.

Most of them probably aren't written in Lisp. But fine, let's keep it as an alternative syntax.

> The difference between tree-sitter syntax and Elisp sexp syntax is pretty small (anchor, predicates), so the text describing the tree-sitter syntax is basically describing Elisp sexp syntax.

Yes, so it seemed to me, but reading the source code (lib/src/query.c) seems to indicate that what I thought were symbols -- *, +, ?, @thing, #thing -- are actually special postfix and prefix operators. (Ironically, there doesn't seem to be a grammar for this language anywhere, or am I mistaken?)

Thus a structurally correct Lispish translation of

  (teet "toot"* (#equal "fie" @fum))

should arguably be something like

  (teet (* "toot") ((# equal) "fie" (@ fum)))

rather than the current

  (teet "toot" :* (:equal "fie" @fum))

but I'm not demanding that it all be changed at this stage.
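
To make the current correspondence concrete, here is a minimal sketch of a translator from the Elisp form to the native form. It follows the mapping described in the manual (:anchor becomes ".", :?, :* and :+ lose their colon, predicates swap ":" for "#"). The function name `my-pattern-to-native' is hypothetical, and the sketch is an illustration rather than the C implementation in Emacs.

```elisp
;; Hypothetical sketch of the Elisp-to-native pattern translation.
(defun my-pattern-to-native (pattern)
  "Return PATTERN, an Elisp query pattern, as a native query string."
  (pcase pattern
    (:anchor ".")
    (:? "?")
    (:* "*")
    (:+ "+")
    ;; Any other keyword is taken to be a predicate: :equal -> #equal.
    ((pred keywordp)
     (concat "#" (substring (symbol-name pattern) 1)))
    ;; Node names, field names and captures are printed by name.
    ((pred symbolp) (symbol-name pattern))
    ;; Alternation is a vector in Elisp and square brackets natively.
    ((pred vectorp)
     (concat "[" (mapconcat #'my-pattern-to-native pattern " ") "]"))
    ((pred consp)
     (concat "(" (mapconcat #'my-pattern-to-native pattern " ") ")"))
    ;; Strings: plain printing is enough for this example, but it has
    ;; the newline problem reported at the start of this thread.
    (_ (prin1-to-string pattern))))

(my-pattern-to-native '(teet "toot" :* (:equal "fie" @fum)))
;; ⇒ "(teet \"toot\" * (#equal \"fie\" @fum))"
```

The output differs from the hand-written native form only in the space before the quantifier. The sketch also shows why the keywords behave as postfix markers in a flat list rather than as operators applied to their operand.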

> With that said if someone makes it describe Elisp sexp syntax first, I wouldn’t mind.

I'll have a look. Wouldn't it be reasonable to use the Elisp syntax, briefly state how it corresponds to the 'native' syntax, and refer to the official tree-sitter documentation for details about the latter?







* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-16 11:25   ` Mattias Engdegård
@ 2023-06-16 17:02     ` Mattias Engdegård
  2023-06-16 17:33       ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-06-17 23:02     ` Yuan Fu
  1 sibling, 1 reply; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-16 17:02 UTC (permalink / raw)
  To: Yuan Fu; +Cc: contovob, 64017

[-- Attachment #1: Type: text/plain, Size: 130 bytes --]

Here is a modification of the treesit manual to teach s-expressions first.
It's mostly a matter of straightforward substitution.


[-- Attachment #2: treesit-doc-sexp-patterns.diff --]
[-- Type: application/octet-stream, Size: 9427 bytes --]

diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
index b0824faaaa2..bd81ee3c535 100644
--- a/doc/lispref/parsing.texi
+++ b/doc/lispref/parsing.texi
@@ -1132,9 +1132,9 @@ Pattern Matching
 
 @defun treesit-query-capture node query &optional beg end node-only
 This function matches patterns in @var{query} within @var{node}.
-The argument @var{query} can be either a string, a s-expression, or a
-compiled query object.  For now, we focus on the string syntax;
-s-expression syntax and compiled query are described at the end of the
+The argument @var{query} can be either a s-expression, a string, or a
+compiled query object.  For now, we focus on the s-expression syntax;
+string syntax and compiled query are described at the end of the
 section.
 
 The argument @var{node} can also be a parser or a language symbol.  A
@@ -1165,8 +1165,8 @@ Pattern Matching
 @example
 @group
 (setq query
-      "(binary_expression
-        (number_literal) @@number-in-exp) @@biexp")
+      '((binary_expression
+         (number_literal) @@number-in-exp) @@biexp)
 @end group
 @end example
 
@@ -1187,8 +1187,8 @@ Pattern Matching
 @example
 @group
 (setq query
-      "(binary_expression) @@biexp
-       (number_literal)  @@number @@biexp")
+      '((binary_expression) @@biexp
+        (number_literal) @@number @@biexp)
 @end group
 @end example
 
@@ -1246,23 +1246,23 @@ Pattern Matching
 @subheading Quantify node
 
 @cindex quantify node, tree-sitter
-Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
-@samp{?}.  Their meanings are the same as in regular expressions:
-@samp{*} matches the preceding pattern zero or more times, @samp{+}
-matches one or more times, and @samp{?} matches zero or one time.
+Tree-sitter recognizes quantification operators @samp{:*}, @samp{:+} and
+@samp{:?}.  Their meanings are the same as in regular expressions:
+@samp{:*} matches the preceding pattern zero or more times, @samp{:+}
+matches one or more times, and @samp{:?} matches zero or one time.
 
 For example, the following pattern matches @code{type_declaration}
 nodes that has @emph{zero or more} @code{long} keyword.
 
 @example
-(type_declaration "long"*) @@long-type
+(type_declaration "long" :*) @@long-type
 @end example
 
 The following pattern matches a type declaration that has zero or one
 @code{long} keyword:
 
 @example
-(type_declaration "long"?) @@long-type
+(type_declaration "long" :?) @@long-type
 @end example
 
 @subheading Grouping
@@ -1272,15 +1272,15 @@ Pattern Matching
 express a comma separated list of identifiers, one could write
 
 @example
-(identifier) ("," (identifier))*
+(identifier) ("," (identifier)) :*
 @end example
 
 @subheading Alternation
 
-Again, similar to regular expressions, we can express ``match anyone
-from this group of patterns'' in a pattern.  The syntax is a list of
-patterns enclosed in square brackets.  For example, to capture some
-keywords in C, the pattern would be
+Again, similar to regular expressions, we can express ``match any one
+from this group of patterns'' in a pattern.  The syntax is a vector of
+patterns.  For example, to capture some keywords in C, the pattern
+would be
 
 @example
 @group
@@ -1295,7 +1295,7 @@ Pattern Matching
 
 @subheading Anchor
 
-The anchor operator @samp{.} can be used to enforce juxtaposition,
+The anchor operator @code{:anchor} can be used to enforce juxtaposition,
 i.e., to enforce two things to be directly next to each other.  The
 two ``things'' can be two nodes, or a child and the end of its parent.
 For example, to capture the first child, the last child, or two
@@ -1304,19 +1304,19 @@ Pattern Matching
 @example
 @group
 ;; Anchor the child with the end of its parent.
-(compound_expression (_) @@last-child .)
+(compound_expression (_) @@last-child :anchor)
 @end group
 
 @group
 ;; Anchor the child with the beginning of its parent.
-(compound_expression . (_) @@first-child)
+(compound_expression :anchor (_) @@first-child)
 @end group
 
 @group
 ;; Anchor two adjacent children.
 (compound_expression
  (_) @@prev-child
- .
+ :anchor
  (_) @@next-child)
 @end group
 @end example
@@ -1332,8 +1332,8 @@ Pattern Matching
 @example
 @group
 (
- (array . (_) @@first (_) @@last .)
- (#equal @@first @@last)
+ (array :anchor (_) @@first (_) @@last :anchor)
+ (:equal @@first @@last)
 )
 @end group
 @end example
@@ -1341,22 +1341,23 @@ Pattern Matching
 @noindent
 tree-sitter only matches arrays where the first element equals to the
 last element.  To attach a predicate to a pattern, we need to group
-them together.  A predicate always starts with a @samp{#}.  Currently
-there are three predicates, @code{#equal}, @code{#match}, and
-@code{#pred}.
+them together.  Currently
+there are three predicates, @code{:equal}, @code{:match}, and
+@code{:pred}.
 
-@deffn Predicate equal arg1 arg2
+@deffn Predicate :equal arg1 arg2
 Matches if @var{arg1} equals to @var{arg2}.  Arguments can be either
 strings or capture names.  Capture names represent the text that the
 captured node spans in the buffer.
 @end deffn
 
-@deffn Predicate match regexp capture-name
+@deffn Predicate :match regexp capture-name
 Matches if the text that @var{capture-name}'s node spans in the buffer
-matches regular expression @var{regexp}.  Matching is case-sensitive.
+matches regular expression @var{regexp}, given as a string literal.
+Matching is case-sensitive.
 @end deffn
 
-@deffn Predicate pred fn &rest nodes
+@deffn Predicate :pred fn &rest nodes
 Matches if function @var{fn} returns non-@code{nil} when passed each
 node in @var{nodes} as arguments.  The function runs with the current
 buffer set to the buffer of node being queried.
@@ -1366,23 +1367,23 @@ Pattern Matching
 the same pattern.  Indeed, it makes little sense to refer to capture
 names in other patterns.
 
-@heading S-expression patterns
+@heading String patterns
 
-@cindex tree-sitter patterns as sexps
-@cindex patterns, tree-sitter, in sexp form
-Besides strings, Emacs provides a s-expression based syntax for
-tree-sitter patterns.  It largely resembles the string-based syntax.
-For example, the following query
+@cindex tree-sitter patterns as strings
+@cindex patterns, tree-sitter, in string form
+Besides s-expressions, Emacs allows the tree-sitter's native query
+syntax to be used by writing them as strings.  It largely resembles
+the s-expression syntax.  For example, the following query
 
 @example
 @group
 (treesit-query-capture
- node "(addition_expression
-        left: (_) @@left
-        \"+\" @@plus-sign
-        right: (_) @@right) @@addition
+ node '((addition_expression
+         left: (_) @@left
+         "+" @@plus-sign
+         right: (_) @@right) @@addition
 
-        [\"return\" \"break\"] @@keyword")
+         ["return" "break"] @@keyword))
 @end group
 @end example
 
@@ -1392,52 +1393,52 @@ Pattern Matching
 @example
 @group
 (treesit-query-capture
- node '((addition_expression
-         left: (_) @@left
-         "+" @@plus-sign
-         right: (_) @@right) @@addition
+ node "(addition_expression
+        left: (_) @@left
+        \"+\" @@plus-sign
+        right: (_) @@right) @@addition
 
-         ["return" "break"] @@keyword))
+        [\"return\" \"break\"] @@keyword")
 @end group
 @end example
 
-Most patterns can be written directly as strange but nevertheless
-valid s-expressions.  Only a few of them needs modification:
+Most patterns can be written directly as s-expressions inside a string.
+Only a few of them need modification:
 
 @itemize
 @item
-Anchor @samp{.} is written as @code{:anchor}.
+Anchor @code{:anchor}. is written as @samp{.}
 @item
-@samp{?} is written as @samp{:?}.
+@samp{:?} is written as @samp{?}.
 @item
-@samp{*} is written as @samp{:*}.
+@samp{:*} is written as @samp{*}.
 @item
-@samp{+} is written as @samp{:+}.
+@samp{:+} is written as @samp{+}.
 @item
-@code{#equal} is written as @code{:equal}.  In general, predicates
-change their @samp{#} to @samp{:}.
+@code{:equal} is written as @code{#equal}.  In general, predicates
+change their @samp{:} to @samp{#}.
 @end itemize
 
 For example,
 
 @example
 @group
-"(
-  (compound_expression . (_) @@first (_)* @@rest)
-  (#match \"love\" @@first)
-  )"
+'((
+   (compound_expression :anchor (_) @@first (_) :* @@rest)
+   (:match "love" @@first)
+   ))
 @end group
 @end example
 
 @noindent
-is written in s-expression as
+is written in string form as
 
 @example
 @group
-'((
-   (compound_expression :anchor (_) @@first (_) :* @@rest)
-   (:match "love" @@first)
-   ))
+"(
+  (compound_expression . (_) @@first (_)* @@rest)
+  (#match \"love\" @@first)
+  )"
 @end group
 @end example
 
@@ -1461,7 +1462,7 @@ Pattern Matching
 @end defun
 
 @defun treesit-query-language query
-This function return the language of @var{query}.
+This function returns the language of @var{query}.
 @end defun
 
 @defun treesit-query-expand query
@@ -1653,7 +1654,7 @@ Multiple Languages
 (setq css-range
       (treesit-query-range
        'html
-       "(style_element (raw_text) @@capture)"))
+       '((style_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges css css-range)
 @end group
 
@@ -1662,7 +1663,7 @@ Multiple Languages
 (setq js-range
       (treesit-query-range
        'html
-       "(script_element (raw_text) @@capture)"))
+       '((script_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges js js-range)
 @end group
 @end example


* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-16 17:02     ` Mattias Engdegård
@ 2023-06-16 17:33       ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2023-06-17 10:47         ` Mattias Engdegård
  0 siblings, 1 reply; 13+ messages in thread
From: Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-06-16 17:33 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Yuan Fu, 64017

Mattias Engdegård [2023-06-16 19:02 +0200] wrote:

> Here is a modification of the treesit manual to teach s-expressions first.
> It's mostly a matter of straightforward substitution.

Generally LGTM, thanks.

> diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
> index b0824faaaa2..bd81ee3c535 100644
> --- a/doc/lispref/parsing.texi
> +++ b/doc/lispref/parsing.texi
> @@ -1132,9 +1132,9 @@ Pattern Matching
>  
>  @defun treesit-query-capture node query &optional beg end node-only
>  This function matches patterns in @var{query} within @var{node}.
> -The argument @var{query} can be either a string, a s-expression, or a
> -compiled query object.  For now, we focus on the string syntax;
> -s-expression syntax and compiled query are described at the end of the
> +The argument @var{query} can be either a s-expression, a string, or a
> +compiled query object.  For now, we focus on the s-expression syntax;
> +string syntax and compiled query are described at the end of the
>  section.

I recently tweaked some of these docs in emacs-29, so you may want to
merge into master before respinning your patch.

> @@ -1341,22 +1341,23 @@ Pattern Matching
>  @noindent
>  tree-sitter only matches arrays where the first element equals to the
>  last element.  To attach a predicate to a pattern, we need to group
> -them together.  A predicate always starts with a @samp{#}.  Currently
> -there are three predicates, @code{#equal}, @code{#match}, and
> -@code{#pred}.
> +them together.  Currently
> +there are three predicates, @code{:equal}, @code{:match}, and
> +@code{:pred}.

Do you intend to refill the paragraph before merging?

>  @itemize
>  @item
> -Anchor @samp{.} is written as @code{:anchor}.
> +Anchor @code{:anchor}. is written as @samp{.}
                        ^
Unladen European full stop migrated from eol.

-- 
Basil






* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-16 17:33       ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-06-17 10:47         ` Mattias Engdegård
  2023-06-17 12:57           ` Eli Zaretskii
  0 siblings, 1 reply; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-17 10:47 UTC (permalink / raw)
  To: Basil Contovounesios; +Cc: Yuan Fu, Eli Zaretskii, 64017

On 16 June 2023 at 19:33, Basil Contovounesios <contovob@tcd.ie> wrote:

> I recently tweaked some of these docs in emacs-29, so you may want to
> merge into master before respinning your patch.

Will do, thank you. Since this is only about documentation, perhaps it could be done in emacs-29?
Eli, would that be acceptable?

> Do you intend to refill the paragraph before merging?

I probably should (although it doesn't affect the output).

>> -Anchor @samp{.} is written as @code{:anchor}.
>> +Anchor @code{:anchor}. is written as @samp{.}
>                        ^
> Unladen European full stop migrated from eol.

So it tried to get away, that little rascal! Can't blame it for trying.







* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-17 10:47         ` Mattias Engdegård
@ 2023-06-17 12:57           ` Eli Zaretskii
  2023-06-17 13:30             ` Mattias Engdegård
  0 siblings, 1 reply; 13+ messages in thread
From: Eli Zaretskii @ 2023-06-17 12:57 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: contovob, casouri, 64017

> From: Mattias Engdegård <mattias.engdegard@gmail.com>
> Date: Sat, 17 Jun 2023 12:47:51 +0200
> Cc: Yuan Fu <casouri@gmail.com>,
>  64017@debbugs.gnu.org,
>  Eli Zaretskii <eliz@gnu.org>
> 
> On 16 June 2023 at 19:33, Basil Contovounesios <contovob@tcd.ie> wrote:
> 
> > I recently tweaked some of these docs in emacs-29, so you may want to
> > merge into master before respinning your patch.
> 
> Will do, thank you. Since this is only about documentation, perhaps it could be done in emacs-29?
> Eli, would that be acceptable?

If Yuan doesn't mind, yes.  But I'd like to hear from Yuan that he is
okay with these changes.






* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-17 12:57           ` Eli Zaretskii
@ 2023-06-17 13:30             ` Mattias Engdegård
  2023-06-17 22:55               ` Yuan Fu
  0 siblings, 1 reply; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-17 13:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: contovob, casouri, 64017

[-- Attachment #1: Type: text/plain, Size: 388 bytes --]

On 17 June 2023 at 14:57, Eli Zaretskii <eliz@gnu.org> wrote:

>> Will do, thank you. Since this is only about documentation, perhaps it could be done in emacs-29?
>> Eli, would that be acceptable?
> 
> If Yuan doesn't mind, yes.  But I'd like to hear from Yuan that he is
> okay with these changes.

Attached are the changes rebased to emacs-29 (fixing mistakes found by Basil).


[-- Attachment #2: treesit-doc-sexp-patterns-em29.diff --]
[-- Type: application/octet-stream, Size: 9367 bytes --]

diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
index 3906ca0118a..9e1df07d25c 100644
--- a/doc/lispref/parsing.texi
+++ b/doc/lispref/parsing.texi
@@ -1084,9 +1084,9 @@ Pattern Matching
 
 @defun treesit-query-capture node query &optional beg end node-only
 This function matches patterns in @var{query} within @var{node}.  The
-argument @var{query} can be either a string, an s-expression, or a
-compiled query object.  For now, we focus on the string syntax;
-s-expression syntax and compiled queries are described at the end of
+argument @var{query} can be either an s-expression, a string, or a
+compiled query object.  For now, we focus on the s-expression syntax;
+string syntax and compiled queries are described at the end of
 the section.
 
 The argument @var{node} can also be a parser or a language symbol.  A
@@ -1118,8 +1118,8 @@ Pattern Matching
 @example
 @group
 (setq query
-      "(binary_expression
-        (number_literal) @@number-in-exp) @@biexp")
+      '((binary_expression
+         (number_literal) @@number-in-exp) @@biexp)
 @end group
 @end example
 
@@ -1140,8 +1140,8 @@ Pattern Matching
 @example
 @group
 (setq query
-      "(binary_expression) @@biexp
-       (number_literal)  @@number @@biexp")
+      '((binary_expression) @@biexp
+        (number_literal) @@number @@biexp)
 @end group
 @end example
 
@@ -1199,23 +1199,23 @@ Pattern Matching
 @subheading Quantify node
 
 @cindex quantify node, tree-sitter
-Tree-sitter recognizes quantification operators @samp{*}, @samp{+},
-and @samp{?}.  Their meanings are the same as in regular expressions:
-@samp{*} matches the preceding pattern zero or more times, @samp{+}
-matches one or more times, and @samp{?} matches zero or one times.
+Tree-sitter recognizes quantification operators @samp{:*}, @samp{:+},
+and @samp{:?}.  Their meanings are the same as in regular expressions:
+@samp{:*} matches the preceding pattern zero or more times, @samp{:+}
+matches one or more times, and @samp{:?} matches zero or one times.
 
 For example, the following pattern matches @code{type_declaration}
 nodes that have @emph{zero or more} @code{long} keywords.
 
 @example
-(type_declaration "long"*) @@long-type
+(type_declaration "long" :*) @@long-type
 @end example
 
 The following pattern matches a type declaration that may or may not
 have a @code{long} keyword:
 
 @example
-(type_declaration "long"?) @@long-type
+(type_declaration "long" :?) @@long-type
 @end example
 
 @subheading Grouping
@@ -1225,15 +1225,14 @@ Pattern Matching
 express a comma-separated list of identifiers, one could write
 
 @example
-(identifier) ("," (identifier))*
+(identifier) ("," (identifier)) :*
 @end example
 
 @subheading Alternation
 
 Again, similar to regular expressions, we can express ``match any one
-of these patterns'' in a pattern.  The syntax is a list of patterns
-enclosed in square brackets.  For example, to capture some keywords in
-C, the pattern would be
+of these patterns'' in a pattern.  The syntax is a vector of patterns.
+For example, to capture some keywords in C, the pattern would be
 
 @example
 @group
@@ -1248,7 +1247,7 @@ Pattern Matching
 
 @subheading Anchor
 
-The anchor operator @samp{.} can be used to enforce juxtaposition,
+The anchor operator @code{:anchor} can be used to enforce juxtaposition,
 i.e., to enforce two things to be directly next to each other.  The
 two ``things'' can be two nodes, or a child and the end of its parent.
 For example, to capture the first child, the last child, or two
@@ -1257,19 +1256,19 @@ Pattern Matching
 @example
 @group
 ;; Anchor the child with the end of its parent.
-(compound_expression (_) @@last-child .)
+(compound_expression (_) @@last-child :anchor)
 @end group
 
 @group
 ;; Anchor the child with the beginning of its parent.
-(compound_expression . (_) @@first-child)
+(compound_expression :anchor (_) @@first-child)
 @end group
 
 @group
 ;; Anchor two adjacent children.
 (compound_expression
  (_) @@prev-child
- .
+ :anchor
  (_) @@next-child)
 @end group
 @end example
@@ -1285,8 +1284,8 @@ Pattern Matching
 @example
 @group
 (
- (array . (_) @@first (_) @@last .)
- (#equal @@first @@last)
+ (array :anchor (_) @@first (_) @@last :anchor)
+ (:equal @@first @@last)
 )
 @end group
 @end example
@@ -1294,22 +1293,22 @@ Pattern Matching
 @noindent
 tree-sitter only matches arrays where the first element is equal to
 the last element.  To attach a predicate to a pattern, we need to
-group them together.  A predicate always starts with a @samp{#}.
-Currently there are three predicates: @code{#equal}, @code{#match},
-and @code{#pred}.
+group them together.  Currently there are three predicates:
+@code{:equal}, @code{:match}, and @code{:pred}.
 
-@deffn Predicate equal arg1 arg2
+@deffn Predicate :equal arg1 arg2
 Matches if @var{arg1} is equal to @var{arg2}.  Arguments can be either
 strings or capture names.  Capture names represent the text that the
 captured node spans in the buffer.
 @end deffn
 
-@deffn Predicate match regexp capture-name
+@deffn Predicate :match regexp capture-name
 Matches if the text that @var{capture-name}'s node spans in the buffer
-matches regular expression @var{regexp}.  Matching is case-sensitive.
+matches regular expression @var{regexp}, given as a string literal.
+Matching is case-sensitive.
 @end deffn
 
-@deffn Predicate pred fn &rest nodes
+@deffn Predicate :pred fn &rest nodes
 Matches if function @var{fn} returns non-@code{nil} when passed each
 node in @var{nodes} as arguments.
 @end deffn
@@ -1318,23 +1317,23 @@ Pattern Matching
 the same pattern.  Indeed, it makes little sense to refer to capture
 names in other patterns.
 
-@heading S-expression patterns
+@heading String patterns
 
-@cindex tree-sitter patterns as sexps
-@cindex patterns, tree-sitter, in sexp form
-Besides strings, Emacs provides an s-expression based syntax for
-tree-sitter patterns.  It largely resembles the string-based syntax.
-For example, the following query
+@cindex tree-sitter patterns as strings
+@cindex patterns, tree-sitter, in string form
+Besides s-expressions, Emacs allows the tree-sitter's native query
+syntax to be used by writing them as strings.  It largely resembles
+the s-expression syntax.  For example, the following query
 
 @example
 @group
 (treesit-query-capture
- node "(addition_expression
-        left: (_) @@left
-        \"+\" @@plus-sign
-        right: (_) @@right) @@addition
+ node '((addition_expression
+         left: (_) @@left
+         "+" @@plus-sign
+         right: (_) @@right) @@addition
 
-        [\"return\" \"break\"] @@keyword")
+         ["return" "break"] @@keyword))
 @end group
 @end example
 
@@ -1344,52 +1343,53 @@ Pattern Matching
 @example
 @group
 (treesit-query-capture
- node '((addition_expression
-         left: (_) @@left
-         "+" @@plus-sign
-         right: (_) @@right) @@addition
+ node "(addition_expression
+        left: (_) @@left
+        \"+\" @@plus-sign
+        right: (_) @@right) @@addition
 
-         ["return" "break"] @@keyword))
+        [\"return\" \"break\"] @@keyword")
 @end group
 @end example
 
-Most patterns can be written directly as strange but nevertheless
-valid s-expressions.  Only a few of them need modification:
+Most patterns can be written directly as s-expressions inside a string.
+Only a few of them need modification:
 
 @itemize
 @item
-Anchor @samp{.} is written as @code{:anchor}.
+Anchor @code{:anchor} is written as @samp{.}.
 @item
-@samp{?} is written as @samp{:?}.
+@samp{:?} is written as @samp{?}.
 @item
-@samp{*} is written as @samp{:*}.
+@samp{:*} is written as @samp{*}.
 @item
-@samp{+} is written as @samp{:+}.
+@samp{:+} is written as @samp{+}.
 @item
-@code{#equal} is written as @code{:equal}.  In general, predicates
-change their @samp{#} to @samp{:}.
+@code{:equal}, @code{:match} and @code{:pred} are written as
+@code{#equal}, @code{#match} and @code{#pred}, respectively.
+In general, predicates change their @samp{:} to @samp{#}.
 @end itemize
 
 For example,
 
 @example
 @group
-"(
-  (compound_expression . (_) @@first (_)* @@rest)
-  (#match \"love\" @@first)
-  )"
+'((
+   (compound_expression :anchor (_) @@first (_) :* @@rest)
+   (:match "love" @@first)
+   ))
 @end group
 @end example
 
 @noindent
-is written in s-expression syntax as
+is written in string form as
 
 @example
 @group
-'((
-   (compound_expression :anchor (_) @@first (_) :* @@rest)
-   (:match "love" @@first)
-   ))
+"(
+  (compound_expression . (_) @@first (_)* @@rest)
+  (#match \"love\" @@first)
+  )"
 @end group
 @end example
 
@@ -1413,7 +1413,7 @@ Pattern Matching
 @end defun
 
 @defun treesit-query-language query
-This function return the language of @var{query}.
+This function returns the language of @var{query}.
 @end defun
 
 @defun treesit-query-expand query
@@ -1605,7 +1605,7 @@ Multiple Languages
 (setq css-range
       (treesit-query-range
        'html
-       "(style_element (raw_text) @@capture)"))
+       '((style_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges css css-range)
 @end group
 
@@ -1614,7 +1614,7 @@ Multiple Languages
 (setq js-range
       (treesit-query-range
        'html
-       "(script_element (raw_text) @@capture)"))
+       '((script_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges js js-range)
 @end group
 @end example


* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-17 13:30             ` Mattias Engdegård
@ 2023-06-17 22:55               ` Yuan Fu
  2023-06-18  8:47                 ` Mattias Engdegård
  0 siblings, 1 reply; 13+ messages in thread
From: Yuan Fu @ 2023-06-17 22:55 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Basil Contovounesios, Eli Zaretskii, 64017



> On Jun 17, 2023, at 6:30 AM, Mattias Engdegård <mattias.engdegard@gmail.com> wrote:
> 
> On 17 June 2023 at 14:57, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>>> Will do, thank you. Since this is only about documentation, perhaps it could be done in emacs-29?
>>> Eli, would that be acceptable?
>> 
>> If Yuan doesn't mind, yes.  But I'd like to hear from Yuan that he is
>> okay with these changes.
> 
> Attached are the changes rebased to emacs-29 (fixing mistakes found by Basil).
> 
> <treesit-doc-sexp-patterns-em29.diff>

LGTM!

Yuan





* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-16 11:25   ` Mattias Engdegård
  2023-06-16 17:02     ` Mattias Engdegård
@ 2023-06-17 23:02     ` Yuan Fu
  1 sibling, 0 replies; 13+ messages in thread
From: Yuan Fu @ 2023-06-17 23:02 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: contovob, 64017

> 
> Yes, so it seemed to me but reading the source code (lib/src/query.c) seems to indicate that what I thought were symbols -- *, +, ?, @thing, #thing -- appear to be special postfix and prefix operators. (Ironically, there doesn't seem to be a grammar for this language anywhere, or am I mistaken?)
> 
> Thus a structurally correct Lispish translation of
> 
>  (teet "toot"* (#equal "fie" @fum))
> 
> should arguably be something like
> 
>  (teet (* "toot") ((# equal) "fie" (@ fum)))
> 
> rather than the current
> 
>  (teet "toot" :* (:equal "fie" @fum))
> 
> but I'm not demanding that it all be changed at this stage.

IMHO the query syntax is already pretty far from the “proper sexp” that we expect, so changing these little things doesn’t have much benefit. For example, the field names and trailing capture names are not conventional; are we going to change them to be more sexpy too?

In a proper sexp they would have been wrapped too, like

(field-name: node) rather than field-name: node
(@fn node) rather than node @fn

Not to mention using colon and @ to distinguish field-names and capture names from nodes—not very conventional either.

Also, a more conventional sexp syntax would be much more verbose than the current one, and arguably harder to translate to the tree-sitter string syntax, which is ultimately what we feed to tree-sitter functions.
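
As a rough illustration of that translation step, the sketch below expands the manual's example query with `treesit-query-expand` and prints the resulting string form. It assumes Emacs 29 or later built with tree-sitter support; the variable name `sexp-query` is only for illustration, and the exact text of the output may differ between Emacs versions.

```elisp
;; Minimal sketch; assumes Emacs 29+ built with tree-sitter support.
;; Does nothing if tree-sitter is unavailable in this Emacs.
(when (and (fboundp 'treesit-available-p)
           (treesit-available-p))
  ;; `sexp-query' is a hypothetical name; the query itself is the
  ;; s-expression example from the manual text in the patch.
  (let ((sexp-query
         '(((compound_expression :anchor (_) @first (_) :* @rest)
            (:match "love" @first)))))
    ;; `treesit-query-expand' returns the query in tree-sitter's own
    ;; string syntax; no language grammar needs to be loaded for this.
    (message "%s" (treesit-query-expand sexp-query))))
```

Evaluating this in `*scratch*` shows how keywords such as `:anchor`, `:*` and `:match` are rewritten, which is the conversion this bug report is about.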

Yuan





* bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
  2023-06-17 22:55               ` Yuan Fu
@ 2023-06-18  8:47                 ` Mattias Engdegård
  0 siblings, 0 replies; 13+ messages in thread
From: Mattias Engdegård @ 2023-06-18  8:47 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Basil Contovounesios, Eli Zaretskii, 64017-done

On 18 June 2023 at 00:55, Yuan Fu <casouri@gmail.com> wrote:

> LGTM!

Thank you, these changes are now in emacs-29.

And we are done, closing the bug.







Thread overview: 13+ messages
2023-06-12 14:14 bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax Mattias Engdegård
     [not found] ` <handler.64017.B.168657924917612.ack@debbugs.gnu.org>
2023-06-15 10:45   ` Mattias Engdegård
2023-06-15 22:13     ` Yuan Fu
2023-06-15 22:08 ` Yuan Fu
2023-06-16 11:25   ` Mattias Engdegård
2023-06-16 17:02     ` Mattias Engdegård
2023-06-16 17:33       ` Basil Contovounesios via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-06-17 10:47         ` Mattias Engdegård
2023-06-17 12:57           ` Eli Zaretskii
2023-06-17 13:30             ` Mattias Engdegård
2023-06-17 22:55               ` Yuan Fu
2023-06-18  8:47                 ` Mattias Engdegård
2023-06-17 23:02     ` Yuan Fu
