From: Yuan Fu
To: Eli Zaretskii
Cc: Andrew De Angelis, emacs-devel@gnu.org
Newsgroups: gmane.emacs.devel
Subject: Re: treesit: how to get it to parse multiple languages
Date: Mon, 4 Nov 2024 22:46:50 -0800
Message-ID: <5F722FF0-EE05-4259-A222-C69526C8C37F@gmail.com>
In-Reply-To: <868qtzw6jh.fsf@gnu.org>

> On Nov 4, 2024, at 4:02 AM, Eli Zaretskii wrote:
>
>> From: Andrew De Angelis
>> Date: Sun, 3 Nov 2024 13:28:57 -0500
>>
>> I'm trying to get a better understanding of treesit.el, and I've
>> stumbled on a couple of things that make me think the manual is
>> either outdated/faulty, or just not entirely clear and I'm missing
>> something.
>>
>> The latter is most likely, but I'd appreciate any help in figuring
>> out what exactly is wrong in my approach/setup. I would be happy to
>> contribute to the manual, if needed, to ensure it is clearer.
>>
>> This is the relevant section of the manual:
>> https://www.gnu.org/software/emacs/manual/html_node/elisp/Multiple-Languages.html
>> I've started out with simply trying to recreate the setup described
>> in the manual, but I've run into some issues.
>> Here's what I've done so far:
>> - I've defined a very simple `html-ts-mode`, using the elisp
>>   functions from the manual:
>>   https://github.com/andrewdea/poc-html-ts-mode/blob/main/html-ts-mode.el
>> - I activate this mode when visiting the example.html file (which is
>>   also copied from the manual):
>>   https://github.com/andrewdea/poc-html-ts-mode/blob/main/example.html
>> - the queries seem to be working as expected: when I'm in a buffer
>>   visiting example.html, evaluating
>>   `(treesit-query-capture 'html css-query)` and
>>   `(treesit-query-capture 'html js-query)` return the expected nodes
>> - ISSUE: `treesit-update-ranges` doesn't seem to be working as
>>   expected: even if I call it multiple times, the parser for the
>>   whole buffer seems to still be 'html.
>>   `(treesit-language-at (point))` always returns 'html, even when
>>   I'm inside the nodes captured by the css-query or js-query.
>>
>> Some additional context: the reason I'm looking into tree-sitter (and
>> its functionalities to support multiple languages) is to potentially
>> use it to fontify markdown code blocks and to improve emacs support
>> for python notebooks. For markdown, I was trying a similar approach
>> to the HTML one described in the manual, but ran into other similar
>> issues:
>> https://www.reddit.com/r/emacs/comments/1gcrv8k/syntaxhighlighting_codeblocks_in_markdown/.
>> I'm just including this as context.
>>
>> Let me know if any of this is not clear.
>>
>> Thanks in advance for all your help!
>
> Yuan, can you help Andrew?

Ah yes, thanks for the ping.

Andrew, I take it that your problem is with treesit-language-at, right?
Specifically, it doesn’t return the expected results.
That’s because for treesit-language-at to work, the major mode needs to
define treesit-language-at-point-function.

This confusion has come up a couple of times now; evidently
treesit-language-at is not very intuitive. Hopefully it’ll be fixed by
our updated manual for Emacs 30. In Emacs 30, we define
treesit-language-at-point-function in the example code:

   Emacs automates this process in ‘treesit-update-ranges’.  A
multi-language major mode should set ‘treesit-range-settings’ so that
‘treesit-update-ranges’ knows how to perform this process automatically.
Major modes should use the helper function ‘treesit-range-rules’ to
generate a value that can be assigned to ‘treesit-range-settings’.  The
settings in the following example directly translate into operations
shown above.

     (setq treesit-range-settings
           (treesit-range-rules
            :embed 'javascript
            :host 'html
            '((script_element (raw_text) @capture))

            :embed 'css
            :host 'html
            '((style_element (raw_text) @capture))))

     ;; Major modes with multiple languages should always set
     ;; `treesit-language-at-point-function' (which see).
     (setq treesit-language-at-point-function
           (lambda (pos)
             (let* ((node (treesit-node-at pos 'html))
                    (parent (treesit-node-parent node)))
               (cond
                ((and node parent
                      (equal (treesit-node-type node) "raw_text")
                      (equal (treesit-node-type parent) "script_element"))
                 'javascript)
                ((and node parent
                      (equal (treesit-node-type node) "raw_text")
                      (equal (treesit-node-type parent) "style_element"))
                 'css)
                (t 'html)))))

And FYI, in Emacs 30 we added local parsers, which might make
implementing code/markdown blocks in a notebook easier.

Yuan