From mboxrd@z Thu Jan 1 00:00:00 1970
From: Mattias =?UTF-8?Q?Engdeg=C3=A5rd?=
Newsgroups: gmane.emacs.bugs
Subject: bug#64017: Wrong conversion from Emacs to Tree-sitter S-expression syntax
Date: Fri, 16 Jun 2023 19:02:58 +0200
Message-ID: <04C45D03-D49B-4DE4-AD26-2606C94AF260@gmail.com>
In-Reply-To: <0CBD145C-0A92-4258-A5F3-6FC616E89ED8@gmail.com>
References: <43D49A55-2C3F-4EA4-8DF8-0CD9A516573E@gmail.com> <0CBD145C-0A92-4258-A5F3-6FC616E89ED8@gmail.com>
To: Yuan Fu
Cc: contovob@tcd.ie, 64017@debbugs.gnu.org
Mime-Version: 1.0 (Mac OS X Mail 14.0 \(3654.120.0.1.15\))
Content-Type: multipart/mixed; boundary="Apple-Mail=_DC59A527-37D8-4AC7-8BA2-A7B078AD7320"
List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors"
Xref: news.gmane.io gmane.emacs.bugs:263481

--Apple-Mail=_DC59A527-37D8-4AC7-8BA2-A7B078AD7320
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

Here is a modification of the treesit manual to teach s-expressions first.
It's mostly a matter of straightforward substitution.
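
To illustrate the substitution, here is a minimal sketch, assuming an Emacs
built with tree-sitter support. It uses treesit-query-expand (documented in
the same manual section) to turn an s-expression query into its string form.
The variable names are hypothetical, and the exact text printed may differ
between Emacs versions, so no expected output is given.

```elisp
;; Sketch only: needs an Emacs built with tree-sitter support (29 or later).
;; `my/sexp-query' and `my/string-query' are hypothetical names.
(defvar my/sexp-query
  '(((compound_expression :anchor (_) @first (_) :* @rest)
     (:match "love" @first)))
  "S-expression form of the last example query in the patch.")

(defvar my/string-query
  (when (fboundp 'treesit-query-expand)
    ;; Converts a query in s-expression form to the string form.
    (treesit-query-expand my/sexp-query))
  "String form of `my/sexp-query', or nil without tree-sitter support.")

;; Print the string form when it could be computed.
(when my/string-query
  (message "%s" my/string-query))
```

Note that inside Lisp code the capture names are written with a single @;
the doubled @@ in the patch is only Texinfo escaping.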
--Apple-Mail=_DC59A527-37D8-4AC7-8BA2-A7B078AD7320
Content-Disposition: attachment; filename=treesit-doc-sexp-patterns.diff
Content-Type: application/octet-stream; x-unix-mode=0644; name="treesit-doc-sexp-patterns.diff"
Content-Transfer-Encoding: 7bit

diff --git a/doc/lispref/parsing.texi b/doc/lispref/parsing.texi
index b0824faaaa2..bd81ee3c535 100644
--- a/doc/lispref/parsing.texi
+++ b/doc/lispref/parsing.texi
@@ -1132,9 +1132,9 @@ Pattern Matching
 
 @defun treesit-query-capture node query &optional beg end node-only
 This function matches patterns in @var{query} within @var{node}.
-The argument @var{query} can be either a string, a s-expression, or a
-compiled query object. For now, we focus on the string syntax;
-s-expression syntax and compiled query are described at the end of the
+The argument @var{query} can be either a s-expression, a string, or a
+compiled query object. For now, we focus on the s-expression syntax;
+string syntax and compiled query are described at the end of the
 section.
 
 The argument @var{node} can also be a parser or a language symbol. A
@@ -1165,8 +1165,8 @@ Pattern Matching
 @example
 @group
 (setq query
-      "(binary_expression
-        (number_literal) @@number-in-exp) @@biexp")
+      '((binary_expression
+         (number_literal) @@number-in-exp) @@biexp))
 @end group
 @end example
 
@@ -1187,8 +1187,8 @@ Pattern Matching
 @example
 @group
 (setq query
-      "(binary_expression) @@biexp
-       (number_literal) @@number @@biexp")
+      '((binary_expression) @@biexp
+        (number_literal) @@number @@biexp))
 @end group
 @end example
 
@@ -1246,23 +1246,23 @@ Pattern Matching
 
 @subheading Quantify node
 @cindex quantify node, tree-sitter
-Tree-sitter recognizes quantification operators @samp{*}, @samp{+} and
-@samp{?}. Their meanings are the same as in regular expressions:
-@samp{*} matches the preceding pattern zero or more times, @samp{+}
-matches one or more times, and @samp{?} matches zero or one time.
+Tree-sitter recognizes quantification operators @samp{:*}, @samp{:+} and
+@samp{:?}. Their meanings are the same as in regular expressions:
+@samp{:*} matches the preceding pattern zero or more times, @samp{:+}
+matches one or more times, and @samp{:?} matches zero or one time.
 
 For example, the following pattern matches @code{type_declaration}
 nodes that has @emph{zero or more} @code{long} keyword.
 
 @example
-(type_declaration "long"*) @@long-type
+(type_declaration "long" :*) @@long-type
 @end example
 
 The following pattern matches a type declaration that has zero or one
 @code{long} keyword:
 
 @example
-(type_declaration "long"?) @@long-type
+(type_declaration "long" :?) @@long-type
 @end example
 
 @subheading Grouping
@@ -1272,15 +1272,15 @@ Pattern Matching
 express a comma separated list of identifiers, one could write
 
 @example
-(identifier) ("," (identifier))*
+(identifier) ("," (identifier)) :*
 @end example
 
 @subheading Alternation
 
-Again, similar to regular expressions, we can express ``match anyone
-from this group of patterns'' in a pattern. The syntax is a list of
-patterns enclosed in square brackets. For example, to capture some
-keywords in C, the pattern would be
+Again, similar to regular expressions, we can express ``match any one
+from this group of patterns'' in a pattern. The syntax is a vector of
+patterns. For example, to capture some keywords in C, the pattern
+would be
 
 @example
 @group
@@ -1295,7 +1295,7 @@ Pattern Matching
 
 @subheading Anchor
 
-The anchor operator @samp{.} can be used to enforce juxtaposition,
+The anchor operator @code{:anchor} can be used to enforce juxtaposition,
 i.e., to enforce two things to be directly next to each other. The
 two ``things'' can be two nodes, or a child and the end of its parent.
 For example, to capture the first child, the last child, or two
@@ -1304,19 +1304,19 @@ Pattern Matching
 @example
 @group
 ;; Anchor the child with the end of its parent.
-(compound_expression (_) @@last-child .)
+(compound_expression (_) @@last-child :anchor)
 @end group
 
 @group
 ;; Anchor the child with the beginning of its parent.
-(compound_expression . (_) @@first-child)
+(compound_expression :anchor (_) @@first-child)
 @end group
 
 @group
 ;; Anchor two adjacent children.
 (compound_expression
   (_) @@prev-child
-  .
+  :anchor
   (_) @@next-child)
 @end group
 @end example
@@ -1332,8 +1332,8 @@ Pattern Matching
 @example
 @group
 (
-  (array . (_) @@first (_) @@last .)
-  (#equal @@first @@last)
+  (array :anchor (_) @@first (_) @@last :anchor)
+  (:equal @@first @@last)
 )
 @end group
 @end example
@@ -1341,22 +1341,23 @@ Pattern Matching
 @noindent
 tree-sitter only matches arrays where the first element equals to the
 last element. To attach a predicate to a pattern, we need to group
-them together. A predicate always starts with a @samp{#}. Currently
-there are three predicates, @code{#equal}, @code{#match}, and
-@code{#pred}.
+them together. Currently
+there are three predicates, @code{:equal}, @code{:match}, and
+@code{:pred}.
 
-@deffn Predicate equal arg1 arg2
+@deffn Predicate :equal arg1 arg2
 Matches if @var{arg1} equals to @var{arg2}. Arguments can be either
 strings or capture names. Capture names represent the text that the
 captured node spans in the buffer.
 @end deffn
 
-@deffn Predicate match regexp capture-name
+@deffn Predicate :match regexp capture-name
 Matches if the text that @var{capture-name}'s node spans in the buffer
-matches regular expression @var{regexp}. Matching is case-sensitive.
+matches regular expression @var{regexp}, given as a string literal.
+Matching is case-sensitive.
 @end deffn
 
-@deffn Predicate pred fn &rest nodes
+@deffn Predicate :pred fn &rest nodes
 Matches if function @var{fn} returns non-@code{nil} when passed each
 node in @var{nodes} as arguments. The function runs with the current
 buffer set to the buffer of node being queried.
@@ -1366,23 +1367,23 @@ Pattern Matching
 the same pattern. Indeed, it makes little sense to refer to capture
 names in other patterns.
 
-@heading S-expression patterns
+@heading String patterns
 
-@cindex tree-sitter patterns as sexps
-@cindex patterns, tree-sitter, in sexp form
-Besides strings, Emacs provides a s-expression based syntax for
-tree-sitter patterns. It largely resembles the string-based syntax.
-For example, the following query
+@cindex tree-sitter patterns as strings
+@cindex patterns, tree-sitter, in string form
+Besides s-expressions, Emacs also accepts tree-sitter's native query
+syntax, written as a string. It largely resembles
+the s-expression syntax. For example, the following query
 
 @example
 @group
 (treesit-query-capture
- node "(addition_expression
-        left: (_) @@left
-        \"+\" @@plus-sign
-        right: (_) @@right) @@addition
+ node '((addition_expression
+         left: (_) @@left
+         "+" @@plus-sign
+         right: (_) @@right) @@addition
 
-       [\"return\" \"break\"] @@keyword")
+        ["return" "break"] @@keyword))
 @end group
 @end example
 
@@ -1392,52 +1393,52 @@ Pattern Matching
 @example
 @group
 (treesit-query-capture
- node '((addition_expression
-         left: (_) @@left
-         "+" @@plus-sign
-         right: (_) @@right) @@addition
+ node "(addition_expression
+        left: (_) @@left
+        \"+\" @@plus-sign
+        right: (_) @@right) @@addition
 
-        ["return" "break"] @@keyword))
+       [\"return\" \"break\"] @@keyword")
 @end group
 @end example
 
-Most patterns can be written directly as strange but nevertheless
-valid s-expressions. Only a few of them needs modification:
+Most patterns can be written directly as s-expressions inside a string.
+Only a few of them need modification:
 
 @itemize
 @item
-Anchor @samp{.} is written as @code{:anchor}.
+Anchor @code{:anchor} is written as @samp{.}.
 @item
-@samp{?} is written as @samp{:?}.
+@samp{:?} is written as @samp{?}.
 @item
-@samp{*} is written as @samp{:*}.
+@samp{:*} is written as @samp{*}.
 @item
-@samp{+} is written as @samp{:+}.
+@samp{:+} is written as @samp{+}.
 @item
-@code{#equal} is written as @code{:equal}. In general, predicates
-change their @samp{#} to @samp{:}.
+@code{:equal} is written as @code{#equal}. In general, predicates
+change their @samp{:} to @samp{#}.
 @end itemize
 
 For example,
 
 @example
 @group
-"(
-  (compound_expression . (_) @@first (_)* @@rest)
-  (#match \"love\" @@first)
-  )"
+'((
+   (compound_expression :anchor (_) @@first (_) :* @@rest)
+   (:match "love" @@first)
+   ))
 @end group
 @end example
 
 @noindent
-is written in s-expression as
+is written in string form as
 
 @example
 @group
-'((
-   (compound_expression :anchor (_) @@first (_) :* @@rest)
-   (:match "love" @@first)
-   ))
+"(
+  (compound_expression . (_) @@first (_)* @@rest)
+  (#match \"love\" @@first)
+  )"
 @end group
 @end example
 
@@ -1461,7 +1462,7 @@ Pattern Matching
 @end defun
 
 @defun treesit-query-language query
-This function return the language of @var{query}.
+This function returns the language of @var{query}.
 @end defun
 
 @defun treesit-query-expand query
@@ -1653,7 +1654,7 @@ Multiple Languages
 (setq css-range
       (treesit-query-range
        'html
-       "(style_element (raw_text) @@capture)"))
+       '((style_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges css css-range)
 @end group
 
@@ -1662,7 +1663,7 @@ Multiple Languages
 (setq js-range
       (treesit-query-range
        'html
-       "(script_element (raw_text) @@capture)"))
+       '((script_element (raw_text) @@capture))))
 (treesit-parser-set-included-ranges js js-range)
 @end group
 @end example

--Apple-Mail=_DC59A527-37D8-4AC7-8BA2-A7B078AD7320--