From mboxrd@z Thu Jan 1 00:00:00 1970
From: João Paulo Labegalini de Carvalho
Newsgroups: gmane.emacs.devel
Subject: Re: Implementation direction for shell-script-mode with tree-sitter
Date: Wed, 26 Oct 2022 09:48:10 -0600
References: <87mt9j1k50.fsf@yahoo.com>
In-Reply-To: <87mt9j1k50.fsf@yahoo.com>
To: Po Lu
Cc: emacs-devel@gnu.org, Yuan Fu, Theodor Thornhill, Eli Zaretskii
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="00000000000051b6c205ebf1f49d"
Xref: news.gmane.io gmane.emacs.devel:298555

--00000000000051b6c205ebf1f49d
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

> N.B that shell-script-mode supports shells other than bash, so maybe a
> tree-sitter-bash based variant should be spun off into its own
> `bash-mode', as opposed to possibly interfering with the support for all
> of rest?

That is a very good point, and it is the reason why I am pursuing an
implementation that reuses the work from the existing fontification code.

My goal is to use the `sh-feature' function to retrieve everything that
needs to be fontified for the buffer's shell variant.

This has worked well for most built-in commands (e.g. alias and source)
but not as well for reserved words like "time" and "coproc". As Stefan
pointed out, some reserved words complicate the grammar, which I believe
is why the tree-sitter-bash developers decided to keep them out of it.

This becomes a nuisance because I then need two queries to match
keywords: a simple one that matches the keywords the grammar recognizes,
and another that matches all commands but fontifies only those that
belong to the list of unrecognized keywords. To build that list, I have
to spell out the recognized keywords explicitly as literals so that they
can be excluded, which goes against my goal of reusing existing
functionality through `sh-feature'.

Built-in commands are less of a nuisance: although some of them (e.g.
local, declare, and typeset) are not command nodes but
declaration_command nodes in the tree-sitter-bash grammar, two queries
handle them and I do not need to create a variable per shell variant.
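
A minimal sketch of the two-query keyword split described above, written in Python rather than Emacs Lisp and not taken from any actual implementation. SHELL_KEYWORDS is a hypothetical stand-in for the keyword list `sh-feature' would return for a bash buffer, GRAMMAR_KEYWORDS is an assumed, hand-written subset of the keywords tree-sitter-bash parses as their own nodes, and both helper names are hypothetical.

```python
# Illustrative only: these lists are assumptions, not copied from
# sh-script.el or from the tree-sitter-bash grammar.

# Hypothetical stand-in for what `sh-feature' returns for bash.
SHELL_KEYWORDS = [
    "case", "do", "done", "elif", "else", "esac", "fi", "for",
    "function", "if", "in", "then", "until", "while",
    "time", "coproc", "select",
]

# Assumed subset of keywords the grammar exposes as their own nodes.
# This literal set is the hand-maintained duplication in question.
GRAMMAR_KEYWORDS = frozenset({
    "case", "do", "done", "elif", "else", "esac", "fi", "for",
    "function", "if", "in", "then", "until", "while",
})


def unrecognized_keywords(shell_keywords, grammar_keywords):
    """Return the keywords the grammar parses as ordinary command names,
    keeping the order of the shell's own list."""
    return [kw for kw in shell_keywords if kw not in grammar_keywords]


def keyword_command_names(command_names, unrecognized):
    """Filter applied by the second query: of all command_name texts,
    keep only those that are actually reserved words."""
    wanted = set(unrecognized)
    return [name for name in command_names if name in wanted]


if __name__ == "__main__":
    extra = unrecognized_keywords(SHELL_KEYWORDS, GRAMMAR_KEYWORDS)
    print(extra)  # ['time', 'coproc', 'select']
    sample = ["time", "ls", "coproc", "echo"]
    print(keyword_command_names(sample, extra))  # ['time', 'coproc']
```

The first query would handle everything in GRAMMAR_KEYWORDS directly; only the leftover words need the command_name filter. The set difference is what forces the literal list, since the grammar's keyword set is not available through `sh-feature'.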
Aside from that, I am trying to extract the keywords in the
`sh-font-lock-keywords-{var, var-1, var-2}' variables to replicate the
fontification based on the level selected by the user. Parsing those is
more intricate, because each of them yields a list whose elements take
one of two forms:

  - (regexp subexp face), where subexp is the number of the regexp group
    to highlight, or
  - (regexp list [list ...]), where each list is of the form
    (subexp face) or (subexp function function-args).

I believe the logic for interpreting these forms already exists in
font-lock.el or font-core.el, but I am not sure I could reuse the forms
above, because they call functions that depend on variables the user
might tweak. I also do not know whether the compiled queries in
*-treesit-settings would be recompiled when those variables change, which
would be needed to achieve the same flexibility as the existing
fontification code.

-- 
João Paulo L. de Carvalho
Ph.D Computer Science | IC-UNICAMP | Campinas, SP - Brazil
Postdoctoral Research Fellow | University of Alberta | Edmonton, AB - Canada
joao.carvalho@ic.unicamp.br
joao.carvalho@ualberta.ca
--00000000000051b6c205ebf1f49d--