From: João Paulo Labegalini de Carvalho
Newsgroups: gmane.emacs.devel
Subject: Re: Call for volunteers: add tree-sitter support to major modes
Date: Mon, 24 Oct 2022 09:46:41 -0600
To: Yuan Fu
Cc: Eli Zaretskii, emacs-devel@gnu.org

This is a better link to show all Bash's reserved words/keywords:
https://www.gnu.org/software/bash/manual/bash.html#Reserved-Words

I will reach out to tree-sitter-bash folks to add the missing things to the grammar.
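
Until the grammar is extended, one interim option would be to fontify only the keywords the grammar already knows. The following is a minimal sketch under stated assumptions, not what sh-script.el does: the names `my/bash-grammar-tokens` and `my/supported-keywords` are hypothetical, and the token list is a hand-copied subset of the grammar.js scan quoted further down in this thread. That scan only matched quoted tokens of two or more characters, so it says nothing about single-character tokens such as "!".

```elisp
;; Sketch only: keep the keywords that also appear in a list of tokens
;; assumed to be known to the grammar.  All names are hypothetical.
(require 'seq)

(defvar my/bash-grammar-tokens
  '("case" "do" "done" "elif" "else" "esac" "export" "fi" "for"
    "function" "if" "in" "select" "then" "unset" "until" "while")
  "Assumed subset of string tokens found in tree-sitter-bash's grammar.js.")

(defun my/supported-keywords (keywords)
  "Return the members of KEYWORDS that occur in `my/bash-grammar-tokens'."
  (seq-filter (lambda (kw) (member kw my/bash-grammar-tokens))
              keywords))

;; "time" and "!" are dropped because they are not in the token list above.
(my/supported-keywords '("time" "!" "do" "done"))
;; => ("do" "done")
```

The filtered list could then be spliced into the query in place of `(sh-feature sh-leading-keywords)`, at the cost of leaving "time" unfontified until the grammar supports it.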

On Mon, Oct 24, 2022 at 9:41 AM João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> wrote:

> Thanks for pointing that out. I am still getting familiar with tree-sitter
> thus the message sounded very generic.
>
> According to the GNU Bash Reference Manual, "time" is a reserved
> word. But tree-sitter does not include it in the grammar.
>
> Should I submit a PR for tree-sitter-bash to add "time", and possibly
> other missing reserved word/keywords, or should I just fontify those
> recognized by tree-sitter-bash in shell-script-mode?
>
> I believe the best course of action is to make tree-sitter-bash as
> complete as possible. That way the regex and tree-sitter fontification will
> fontify exactly the same things.
>
> What do you guys think?
>
> On Sun, Oct 23, 2022 at 10:20 PM Yuan Fu <casouri@gmail.com> wrote:
>
>> > On Oct 22, 2022, at 8:51 AM, João Paulo Labegalini de Carvalho <jaopaulolc@gmail.com> wrote:
>> >
>> > I am getting a query error but I don't understand why.
>> >
>> > The following query is fine:
>> >
>> > (defvar sh-script--treesit-bash-keywords
>> >   '("case" "do" "done" "elif" "else" "esac" "export" "fi" "for"
>> >     "function" "if" "in" "unset" "while" "then"))
>> >
>> > (treesit-validate-query 'bash `([ ,@sh-script--treesit-bash-keywords ] @font-lock-keyword-face))
>> >
>> > However, the following query is reported as INVALID by `treesit-validate-query':
>> > (treesit-validate-query 'bash `([ ,@(sh-feature sh-leading-keywords) ] @font-lock-keyword-face))
>> > Node type error at: 3
>> > ["time" "!" "do" "done" ...] @font-lock-keyword-face
>> >
>> > "time" is highlighted in the *tree-sitter check query* buffer.
>> >
>> > Even though the forms below evaluate to equivalent forms:
>> > `([ ,@sh-script--treesit-bash-keywords] @font-lock-keyword-face)
>> > evaluates to:
>> > ([ "case" "do" "done" "elif" ... ] @font-lock-keyword-face)
>> >
>> > `([ ,@(sh-feature sh-leading-keywords) ] @font-lock-keyword-face)
>> > evaluates to:
>> > (["time" "!" "do" "done" ...] @font-lock-keyword-face)
>> >
>> >
>> > Any clues to what I am doing wrong?
>>
>> It is saying that there is no “time” node in bash grammar. You probably
>> need to consult the grammar file of tree-sitter-bash to see what are the
>> keywords it recognizes.
>>
>> For example, running the following snippet
>>
>> (let (collection)
>>   (goto-char (point-min))
>>   (while (re-search-forward "'[^ ][^ ]+?'" nil t)
>>     (push (match-string 0) collection))
>>   (pop-to-buffer "*result*")
>>   (dolist (keyword (cl-remove-duplicates collection :test #'equal))
>>     (insert keyword "\n")))
>>
>> in the grammar.js gives me
>>
>> '\\\\'
>> '>('
>> '<('
>> '$('
>> ':-'
>> ':?'
>> '${'
>> ')*'
>> '([^'
>> '[^'
>> '--'
>> '++'
>> 'alternative'
>> 'consequence'
>> 'right'
>> '>='
>> '<='
>> '-='
>> '!='
>> 'operator'
>> 'left'
>> '<<<'
>> 'destination'
>> '>|'
>> '>&'
>> '<&'
>> '&>>'
>> '&>'
>> '>>'
>> 'descriptor'
>> 'index'
>> '=='
>> '=~'
>> 'argument'
>> 'unsetenv'
>> 'unset'
>> 'local'
>> 'readonly'
>> 'export'
>> 'typeset'
>> 'declare'
>> ']]'
>> '[['
>> '||'
>> '&&'
>> '|&'
>> 'name'
>> 'function'
>> ';;&'
>> ';&'
>> 'fallthrough'
>> ';;'
>> 'termination'
>> 'esac'
>> 'case'
>> 'else'
>> 'elif'
>> 'fi'
>> 'then'
>> 'if'
>> 'done'
>> 'do'
>> 'until'
>> 'while'
>> '))'
>> 'update'
>> 'condition'
>> 'initializer'
>> '(('
>> 'value'
>> 'in'
>> 'variable'
>> 'select'
>> 'for'
>> 'redirect'
>> 'body'
>> '\n'
>> '<<-'
>> '<<'
>> '+='
>> 'bash'
>> '\\s'
>> '\\'
>> '\\]'
>> '\\['
>>
>>
>> Yuan


> --
> João Paulo L. de Carvalho
> Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
> Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
> joao.carvalho@ic.unicamp.br
> joao.carvalho@ualberta.ca

-- 
João Paulo L. de Carvalho
Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carvalho@ic.unicamp.br
joao.carvalho@ualberta.ca