From: João Paulo Labegalini de Carvalho
Newsgroups: gmane.emacs.devel
Subject: Re: Call for volunteers: add tree-sitter support to major modes
Date: Mon, 24 Oct 2022 09:41:19 -0600
To: Yuan Fu
Cc: Eli Zaretskii, emacs-devel@gnu.org
In-Reply-To: <41981BCB-4797-46C9-B31E-58BA17085207@gmail.com>

Thanks for pointing that out. I am still getting familiar with tree-sitter, so the error message seemed very generic to me.

According to the GNU Bash Reference Manual, "time" is a reserved word, but tree-sitter-bash does not include it in its grammar.

Should I submit a PR for tree-sitter-bash to add "time", and possibly other missing reserved words/keywords, or should I just fontify those recognized by tree-sitter-bash in shell-script-mode?

I believe the best course of action is to make tree-sitter-bash as complete as possible. That way the regex-based and tree-sitter fontification will highlight exactly the same things.

What do you guys think?
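
To make the two options concrete, here is a minimal sketch of how the keyword list could be split into the part the grammar already accepts and the part that would need an upstream change. The `my/` names are hypothetical and not part of sh-script.el, and the supported list is copied by hand from the keyword-like entries in the grammar.js listing quoted below, so it is an assumption rather than an authoritative inventory.

```elisp
;;; Sketch only: the `my/' names are hypothetical, not part of sh-script.el.
(require 'seq)

(defvar my/grammar-keywords
  '("case" "do" "done" "elif" "else" "esac" "export" "fi" "for"
    "function" "if" "in" "select" "then" "until" "unset" "while")
  "Keyword-like tokens copied by hand from the grammar.js listing (assumed).")

(defun my/split-keywords (wanted supported)
  "Split WANTED into (USABLE . MISSING) according to SUPPORTED.
USABLE holds the strings also found in SUPPORTED, in their original
order.  MISSING holds the rest, i.e. candidates for an upstream change."
  (cons (seq-filter (lambda (kw) (member kw supported)) wanted)
        (seq-remove (lambda (kw) (member kw supported)) wanted)))

;; A shortened stand-in for the value of (sh-feature sh-leading-keywords):
(my/split-keywords '("time" "do" "done" "if") my/grammar-keywords)
;; => (("do" "done" "if") "time")
```

The car of the result could feed the font-lock query today, while the cdr is a to-do list for a possible tree-sitter-bash PR.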
On Sun, Oct 23, 2022 at 10:20 PM Yuan Fu <casouri@gmail.com> wrote:

> On Oct 22, 2022, at 8:51 AM, João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> wrote:
>
> I am getting a query error but I don't understand why.
>
> The following query is fine:
>
> (defvar sh-script--treesit-bash-keywords
>   '("case" "do" "done" "elif" "else" "esac" "export" "fi" "for"
>     "function" "if" "in" "unset" "while" "then"))
>
> (treesit-validate-query 'bash `([ ,@sh-script--treesit-bash-keywords ] @font-lock-keyword-face))
>
> However, the following query is reported as INVALID by `treesit-validate-query':
> (treesit-validate-query 'bash `([ ,@(sh-feature sh-leading-keywords) ] @font-lock-keyword-face))
> Node type error at: 3
> ["time" "!" "do" "done" ...] @font-lock-keyword-face
>
> "time" is highlighted in the *tree-sitter check query* buffer.
>
> This happens even though the two forms below evaluate to equivalent structures:
> `([ ,@sh-script--treesit-bash-keywords] @font-lock-keyword-face)
> evaluates to:
> ([ "case" "do" "done" "elif" ... ] @font-lock-keyword-face)
>
> `([ ,@(sh-feature sh-leading-keywords) ] @font-lock-keyword-face)
> evaluates to:
> (["time" "!" "do" "done" ...] @font-lock-keyword-face)
>
>
> Any clues to what I am doing wrong?
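
A quick check that does not involve tree-sitter at all suggests the two backquote forms really are built the same way, so any difference in validation has to come from the strings inside the vector. The helper name `my/keyword-query` is hypothetical and only serves this illustration.

```elisp
;;; Sketch only: `my/keyword-query' is a hypothetical helper.
(defun my/keyword-query (keywords)
  "Return a query form matching any string in KEYWORDS as a keyword.
The backquote splices KEYWORDS into a vector, which is the alternation
syntax used by the queries in this thread."
  `([ ,@keywords ] @font-lock-keyword-face))

(my/keyword-query '("case" "do"))
;; => (["case" "do"] @font-lock-keyword-face)

(my/keyword-query '("time" "!"))
;; => (["time" "!"] @font-lock-keyword-face)
```

Both results have the same shape: a list holding a vector of strings followed by the capture symbol.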

It is saying that there is no “time” node in the bash grammar. You probably need to consult the grammar file of tree-sitter-bash to see which keywords it recognizes.

For example, running the following snippet

(require 'cl-lib)                       ; for `cl-remove-duplicates'

(let (collection)
  ;; Collect every single-quoted token of two or more characters.
  (goto-char (point-min))
  (while (re-search-forward "'[^ ][^ ]+?'" nil t)
    (push (match-string 0) collection))
  ;; Print the distinct tokens, one per line, in a separate buffer.
  (pop-to-buffer "*result*")
  (dolist (keyword (cl-remove-duplicates collection :test #'equal))
    (insert keyword "\n")))

in the grammar.js gives me

'\\\\'
'>('
'<('
'$('
':-'
':?'
'${'
')*'
'([^'
'[^'
'--'
'++'
'alternative'
'consequence'
'right'
'>='
'<='
'-='
'!='
'operator'
'left'
'<<<'
'destination'
'>|'
'>&'
'<&'
'&>>'
'&>'
'>>'
'descriptor'
'index'
'=='
'=~'
'argument'
'unsetenv'
'unset'
'local'
'readonly'
'export'
'typeset'
'declare'
']]'
'[['
'||'
'&&'
'|&'
'name'
'function'
';;&'
';&'
'fallthrough'
';;'
'termination'
'esac'
'case'
'else'
'elif'
'fi'
'then'
'if'
'done'
'do'
'until'
'while'
'))'
'update'
'condition'
'initializer'
'(('
'value'
'in'
'variable'
'select'
'for'
'redirect'
'body'
'\n'
'<<-'
'<<'
'+='
'bash'
'\\s'
'\\'
'\\]'
'\\['
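
The listing mixes operators, escape sequences, and words. As a rough sketch of a follow-up step, the helper below (hypothetical name `my/word-tokens`) keeps only the purely alphabetic entries and strips their quotes. This is a heuristic under a stated assumption: several alphabetic entries, such as 'body' or 'condition', look like field names rather than keywords, so the result would still need a manual pass.

```elisp
;;; Sketch only: `my/word-tokens' is a hypothetical helper.
(require 'seq)

(defun my/word-tokens (tokens)
  "Return the purely alphabetic entries of TOKENS without their quotes.
TOKENS is a list of strings such as \"'while'\" or \"'>='\", as collected
by the snippet above."
  (let ((case-fold-search nil))         ; match lowercase letters only
    (mapcar (lambda (tok) (substring tok 1 -1))
            (seq-filter (lambda (tok)
                          (string-match-p "\\`'[a-z]+'\\'" tok))
                        tokens))))

(my/word-tokens '("'>='" "'while'" "'body'" "'[['" "'fi'"))
;; => ("while" "body" "fi")
```

Note that "body" survives the filter in this example, which is exactly the kind of entry that has to be weeded out by hand.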


Yuan


--
João Paulo L. de Carvalho
Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada