From: Andrew De Angelis
Newsgroups: gmane.emacs.devel
To: emacs-devel@gnu.org
Subject: treesit: how to get it to parse multiple languages
Date: Sun, 3 Nov 2024 13:28:57 -0500
Hello everyone, and thanks for all your work!

I'm trying to get a better understanding of treesit.el, and I've stumbled on a couple of things that make me think the manual is either outdated/faulty, or just not entirely clear and I'm missing something. The latter is most likely, but I'd appreciate any help in figuring out what exactly is wrong in my approach/setup. I would be happy to contribute to the manual, if needed, to make it clearer.

This is the relevant section of the manual: https://www.gnu.org/software/emacs/manual/html_node/elisp/Multiple-Languages.html

I started out by simply trying to recreate the setup described in the manual, but I've run into some issues. Here's what I've done so far:

- I've defined a very simple `html-ts-mode`, using the Elisp functions from the manual: https://github.com/andrewdea/poc-html-ts-mode/blob/main/html-ts-mode.el
- I activate this mode when visiting the example.html file (which is also copied from the manual): https://github.com/andrewdea/poc-html-ts-mode/blob/main/example.html
- The queries seem to be working as expected: when I'm in a buffer visiting example.html, evaluating `(treesit-query-capture 'html css-query)` and `(treesit-query-capture 'html js-query)` returns the expected nodes.
- ISSUE: `treesit-update-ranges` doesn't seem to be working as expected. Even if I call it multiple times, the parser for the whole buffer still seems to be 'html: `(treesit-language-at (point))` always returns 'html, even when point is inside the nodes captured by the css-query or js-query.
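For anyone trying to reproduce this, here is a minimal sketch of the kind of setup the manual's "Multiple Languages" section describes. It assumes Emacs 29 or later built with tree-sitter and with the html, css, and javascript grammars installed. The names `poc-html-ts-mode` and `poc-html--language-at-point` are hypothetical and are not taken from the linked repository, whose contents may differ. One detail that may be relevant: according to its docstring, `treesit-language-at` returns the result of `treesit-language-at-point-function` when that variable is non-nil, and otherwise falls back to the language of the first parser in `treesit-parser-list`, which would be consistent with always getting 'html back.

```elisp
;;; Hypothetical sketch, not the linked html-ts-mode.el.
;;; Assumes Emacs 29+ with html, css and javascript grammars installed.
(require 'treesit)
(require 'seq)

(defun poc-html--language-at-point (pos)
  "Return the embedded language whose parser ranges cover POS, or `html'.
Hypothetical helper meant for `treesit-language-at-point-function'."
  (or (seq-some
       (lambda (parser)
         (let ((lang (treesit-parser-language parser)))
           ;; Only embedded parsers are checked; a parser with nil
           ;; included ranges (whole buffer) never matches here.
           (and (not (eq lang 'html))
                (seq-some (lambda (range)
                            (and (<= (car range) pos) (< pos (cdr range))))
                          (treesit-parser-included-ranges parser))
                lang)))
       (treesit-parser-list))
      'html))

(define-derived-mode poc-html-ts-mode prog-mode "POC-HTML"
  "Hypothetical minimal mode embedding CSS and JavaScript in HTML."
  (when (and (treesit-ready-p 'html)
             (treesit-ready-p 'css)
             (treesit-ready-p 'javascript))
    ;; One parser per language; the host (html) parser covers the buffer.
    (treesit-parser-create 'html)
    (treesit-parser-create 'css)
    (treesit-parser-create 'javascript)
    ;; Range rules from the manual: tell `treesit-update-ranges' where
    ;; each embedded language lives inside the host document.
    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'javascript
                 :host 'html
                 '((script_element (raw_text) @capture))
                 :embed 'css
                 :host 'html
                 '((style_element (raw_text) @capture))))
    ;; Without this, `treesit-language-at' falls back to the language
    ;; of the first parser in `treesit-parser-list'.
    (setq-local treesit-language-at-point-function
                #'poc-html--language-at-point)
    (treesit-major-mode-setup)))
```

With this sketch, after `M-x poc-html-ts-mode` in example.html and a call to `(treesit-update-ranges)`, evaluating `(treesit-language-at (point))` inside a `<style>` or `<script>` element should report the embedded language rather than 'html, assuming the range queries match.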
Some additional context: the reason I'm looking into tree-sitter (and its functionality for supporting multiple languages) is to potentially use it to fontify Markdown code blocks and to improve Emacs support for Python notebooks. For Markdown, I tried an approach similar to the HTML one described in the manual, but ran into similar issues: https://www.reddit.com/r/emacs/comments/1gcrv8k/syntaxhighlighting_codeblocks_in_markdown/
I'm just including this as context.

Let me know if any of this is not clear.

Thanks in advance for all your help!
Best,
Andrew