From: Andrew De Angelis <bobodeangelis@gmail.com>
To: Yuan Fu <casouri@gmail.com>
Cc: Eli Zaretskii <eliz@gnu.org>, emacs-devel@gnu.org
Subject: Re: treesit: how to get it to parse multiple languages
Date: Sun, 10 Nov 2024 09:35:40 -0500
Message-ID: <CAP5CrM11u3Nz1ZsPfui0i-U1kfyVNO5Gs=VFmh7skPEWv4pzvA@mail.gmail.com>
In-Reply-To: <5F722FF0-EE05-4259-A222-C69526C8C37F@gmail.com>
Thanks so much!
I took a look at the Emacs 30 manual and it's a lot clearer; it's perfect!
I think one thing that would truly be ideal is if there is a major mode out
there that already implements multiple-language functionalities using
treesitter. Seeing all the components in action would be quite helpful: the
simple HTML examples are very clarifying but they can only do so much.
Do you all know if such a mode exists? `(ripgrep-regexp "local-parser"
source-directory)` on the master branch only shows me matches in
`treesit.el` itself (and associated ChangeLog / manual).
If it doesn't exist yet, I'm happy to take a crack at it when implementing
the notebook mode. I might run into some more questions then :)
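
To make the local-parser idea concrete, here is a rough, untested sketch of
the range settings a notebook or markdown mode might use. It assumes the
`:local` keyword that Emacs 30 adds to `treesit-range-rules`, a markdown
grammar whose fenced blocks parse as `fenced_code_block` nodes containing
`code_fence_content`, and installed `markdown` and `python` grammars. The
function name `my-notebook--setup-ranges` is made up for illustration.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of Emacs.  Call it from a major mode body.
(defun my-notebook--setup-ranges ()
  "Give every fenced code block in the buffer its own Python parser."
  ;; Host parser covering the whole buffer.
  (treesit-parser-create 'markdown)
  (setq-local treesit-range-settings
              (treesit-range-rules
               :embed 'python
               :host 'markdown
               ;; Assumption: `:local t' (Emacs 30) asks for one parser
               ;; per captured range instead of one shared parser.
               :local t
               ;; Assumption: these node names match the markdown grammar
               ;; in use; adjust them if the grammar differs.
               '((fenced_code_block (code_fence_content) @capture)))))
```

The embedded language is hard-coded to `python` here; picking the language
from each block's info string is beyond what this sketch attempts.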
On Tue, Nov 5, 2024 at 1:47 AM Yuan Fu <casouri@gmail.com> wrote:
>
>
> > On Nov 4, 2024, at 4:02 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> >
> >> From: Andrew De Angelis <bobodeangelis@gmail.com>
> >> Date: Sun, 3 Nov 2024 13:28:57 -0500
> >>
> >> I'm trying to get a better understanding of treesit.el, and I've
> >> stumbled on a couple of things that make me think the manual is either
> >> outdated/faulty, or just not entirely clear and I'm missing something.
> >>
> >> The latter is most likely, but I'd appreciate any help in figuring out
> >> what exactly is wrong in my approach/setup. I would be happy to
> >> contribute to the manual, if needed, to ensure it is clearer.
> >>
> >> This is the relevant section of the manual:
> >>
> >> https://www.gnu.org/software/emacs/manual/html_node/elisp/Multiple-Languages.html
> >> I've started out with simply trying to recreate the setup described in
> >> the manual, but I've run into some issues.
> >> Here's what I've done so far:
> >> - I've defined a very simple `html-ts-mode`, using the elisp functions
> >>   from the manual:
> >>   https://github.com/andrewdea/poc-html-ts-mode/blob/main/html-ts-mode.el
> >> - I activate this mode when visiting the example.html file (which is
> >>   also copied from the manual):
> >>   https://github.com/andrewdea/poc-html-ts-mode/blob/main/example.html
> >> - the queries seem to be working as expected: when I'm in a buffer
> >>   visiting example.html, evaluating
> >>   `(treesit-query-capture 'html css-query)` and
> >>   `(treesit-query-capture 'html js-query)` return the expected nodes
> >> - ISSUE: `treesit-update-ranges` doesn't seem to be working as
> >>   expected: even if I call it multiple times, the parser for the whole
> >>   buffer seems to still be 'html. `(treesit-language-at (point))` always
> >>   returns 'html, even when I'm inside the nodes captured by the
> >>   css-query or js-query.
> >>
> >> Some additional context: the reason I'm looking into tree-sitter (and
> >> its functionalities to support multiple languages) is to potentially
> >> use it to fontify markdown code blocks and to improve emacs support for
> >> python notebooks. For markdown, I was trying a similar approach to the
> >> HTML one described in the manual, but ran into other similar issues:
> >> https://www.reddit.com/r/emacs/comments/1gcrv8k/syntaxhighlighting_codeblocks_in_markdown/
> >> I'm just including this as context.
> >>
> >> Let me know if any of this is not clear.
> >>
> >> Thanks in advance for all your help!
> >
> > Yuan, can you help Andrew?
>
> Ah yes, thanks for the ping. Andrew, I take it that your problem is with
> treesit-language-at, right? Specifically, it doesn’t return the expected
> results. That’s because for treesit-language-at to work, the major mode
> needs to define treesit-language-at-point-function.
>
> This confusion has come up a couple of times now; evidently
> treesit-language-at is not very intuitive. Hopefully it’ll be fixed by our
> updated manual for Emacs 30. In Emacs 30, we define
> treesit-language-at-point-function in the example code:
>
> Emacs automates this process in ‘treesit-update-ranges’. A
> multi-language major mode should set ‘treesit-range-settings’ so that
> ‘treesit-update-ranges’ knows how to perform this process automatically.
> Major modes should use the helper function ‘treesit-range-rules’ to
> generate a value that can be assigned to ‘treesit-range-settings’. The
> settings in the following example directly translate into operations
> shown above.
>
> (setq treesit-range-settings
>       (treesit-range-rules
>        :embed 'javascript
>        :host 'html
>        '((script_element (raw_text) @capture))
>        :embed 'css
>        :host 'html
>        '((style_element (raw_text) @capture))))
>
> ;; Major modes with multiple languages should always set
> ;; `treesit-language-at-point-function' (which see).
> (setq treesit-language-at-point-function
>       (lambda (pos)
>         (let* ((node (treesit-node-at pos 'html))
>                (parent (treesit-node-parent node)))
>           (cond
>            ((and node parent
>                  (equal (treesit-node-type node) "raw_text")
>                  (equal (treesit-node-type parent) "script_element"))
>             'javascript)
>            ((and node parent
>                  (equal (treesit-node-type node) "raw_text")
>                  (equal (treesit-node-type parent) "style_element"))
>             'css)
>            (t 'html)))))
>
> And FYI, in Emacs 30 we added local parsers, which might make implementing
> code/markdown blocks in a notebook easier.
>
> Yuan
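
For reference, the two settings quoted above could be wired into a mode
roughly as follows. This is only a sketch under stated assumptions: the
names `my-html-ts-mode` and `my-html-ts--language-at-point` are
hypothetical, the `html`, `css`, and `javascript` grammars are assumed to be
installed, and font-lock and indentation rules are left out entirely.

```elisp
(require 'treesit)

;; Hypothetical function name; the body follows the manual example.
(defun my-html-ts--language-at-point (pos)
  "Return the language at POS: `javascript', `css', or `html'."
  (let* ((node (treesit-node-at pos 'html))
         (parent (and node (treesit-node-parent node))))
    (cond
     ((and node parent
           (equal (treesit-node-type node) "raw_text")
           (equal (treesit-node-type parent) "script_element"))
      'javascript)
     ((and node parent
           (equal (treesit-node-type node) "raw_text")
           (equal (treesit-node-type parent) "style_element"))
      'css)
     (t 'html))))

;; Hypothetical mode name, for illustration only.
(define-derived-mode my-html-ts-mode prog-mode "HTML[ts]"
  "Sketch of a mode for HTML with embedded CSS and JavaScript."
  (when (and (treesit-ready-p 'html)
             (treesit-ready-p 'css)
             (treesit-ready-p 'javascript))
    ;; Create the host parser before the embedded ones are needed.
    (treesit-parser-create 'html)
    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'javascript
                 :host 'html
                 '((script_element (raw_text) @capture))
                 :embed 'css
                 :host 'html
                 '((style_element (raw_text) @capture))))
    ;; Without this, `treesit-language-at' has no way to tell the
    ;; embedded languages apart from the host language.
    (setq-local treesit-language-at-point-function
                #'my-html-ts--language-at-point)
    (treesit-major-mode-setup)))
```

With the mode enabled in a buffer visiting example.html, evaluating
`(treesit-language-at (point))` inside a script or style element should
then consult the function above rather than falling back to the host
language.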