From: Yuan Fu
Newsgroups: gmane.emacs.devel
Subject: Re: Questions about tree-sitter
Date: Mon, 18 Sep 2023 21:00:06 -0700
Message-ID: <9AF2C9AC-40A8-4DCA-8680-F5433FA9F880@gmail.com>
References: <12fe5895-7d34-4f3e-b1cf-aa133b718c24@mailo.com>
 <52f09345-85c8-4049-b12d-bf8b84b08f75@mailo.com>
 <2B46C452-DC8B-4BD0-A64B-8773235C1FA8@gmail.com>
 <3b3f90e8-a318-4b63-915e-6477701de897@mailo.com>
 <1F227B69-6195-4115-A7B6-BD2F7EA08E1F@gmail.com>
 <2E2323FB-1A32-48B6-992E-66F474427199@gmail.com>
 <9fef551c-3a8d-4df8-ac1d-a9884b73da5d@mailo.com>
 <8ECA1B2A-CAB9-4500-A193-3A898A382C4C@gmail.com>
 <23661f5b-51bf-4e23-a134-51a52764fe26@mailo.com>
Mime-Version: 1.0 (Mac OS X Mail 16.0 \(3731.700.6\))
Content-Type: multipart/mixed; boundary="Apple-Mail=_24D06184-D951-4198-A962-0FC3EE5BA54E"
Cc: emacs-devel
To: "Augustin Chéneau (BTuin)"
In-Reply-To: <23661f5b-51bf-4e23-a134-51a52764fe26@mailo.com>
List-Id: "Emacs development discussions."
Xref: news.gmane.io gmane.emacs.devel:310735

--Apple-Mail=_24D06184-D951-4198-A962-0FC3EE5BA54E
Content-Type: text/plain; charset=us-ascii

>
> It indeed works much better, thanks!
>
> I found a bug and a way to replicate it (you'll need to update your Bison grammar):
> - Open the file "treesit-bug-highlighting-demo";
> - Enable bison-ts-mode;
> - At the beginning of the second line (the part managed by the embedded C parser, with "static void ..."), add a space;
>
> => The whole line loses its highlighting.
>
> If you add a space again, the highlighting works correctly again.
> Not a big issue, but pretty weird.

Thanks. Weird indeed. I found the bug and fixed it. Latest master should work fine now.

>
> Also, I have one (last?) question:
>
> Since the C code uses its own indentation, it's entirely independent of Bison's nodes positions.
> Is it possible to add an offset to the indentation of the embedded parts, relative to its container node?
>
> For instance, rather than:
>
> %%
> grammar_declaration:
>       grammar_rule
>             {
> int myvar;
>             }
>     ;
> %%
>
> I would like to get
>
> %%
> grammar_declaration:
>       grammar_rule
>             {
>                 int myvar;
>             }
>     ;
> %%
>
> ("int myvar;" is managed by a C parser).

Makes sense. You can use a custom anchor function to indent the top-level nodes in the C code. I modified your mode for a POC. The modified parts are marked with "!!!".

Yuan

--Apple-Mail=_24D06184-D951-4198-A962-0FC3EE5BA54E
Content-Disposition: attachment; filename=bison-ts-mode.el
Content-Type: application/octet-stream; x-unix-mode=0644; name="bison-ts-mode.el"
Content-Transfer-Encoding: 7bit

;;; bison-ts-mode.el --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-

;;; Commentary:

;;; Code:

(require 'treesit)
(require 'c-ts-mode)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")

(defgroup bison nil
  "Support for Bison and Flex."
  :group 'languages)

(defcustom bison-ts-mode-indent-offset 4
  "Number of spaces for each indentation step in `bison-ts-mode'."
  :version "30.1"
  :type 'integer
  :safe 'integerp
  :group 'bison)

(defun treesit--merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
  (let ((res ()))
    (while (or l1 l2)
      ;; `push' already updates RES, so no enclosing `setq' is needed.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

(defun bison-ts--language-at-point-function (position)
  "Return the language at POSITION."
  (let* ((node (treesit-node-at position 'bison)))
    (if (equal (treesit-node-type node) "embedded_code")
        'c
      'bison)))

(defun bison-ts--font-lock-settings (language)
  "Return the Bison font-lock settings, compiled for LANGUAGE."
  (treesit-font-lock-rules
   :language language
   :feature 'bison-comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'bison-declaration
   '((declaration (declaration_name) @font-lock-keyword-face))

   :language language
   :feature 'bison-type
   '((type) @font-lock-type-face)

   :language language
   :feature 'bison-grammar-rule-usage
   '((grammar_rule_identifier) @font-lock-variable-use-face)

   :language language
   :feature 'bison-grammar-rule-declaration
   '((grammar_rule (grammar_rule_declaration)
                   @font-lock-variable-use-face))

   :language language
   :feature 'bison-string
   :override t
   '((string) @font-lock-string-face)

   :language language
   :feature 'bison-literal
   :override t
   '((char_literal) @font-lock-keyword-face
     (number_literal) @font-lock-number-face)

   :language language
   :feature 'bison-directive-grammar-rule
   :override t
   '((grammar_rule (directive) @font-lock-keyword-face))

   :language language
   :feature 'bison-operator
   :override t
   '(["|"] @font-lock-operator-face)

   :language language
   :feature 'bison-delimiter
   :override t
   '([";"] @font-lock-delimiter-face)))

(treesit-query-validate
 'bison
 '((grammar_rule (grammar_rule_declaration)
                 @font-lock-variable-name-face)))

(defvar bison-ts-mode--font-lock-feature-list
  '((
     bison-comment bison-declaration bison-type
     bison-grammar-rule-usage bison-grammar-rule-declaration
     bison-string bison-literal bison-directive-grammar-rule
     bison-operator bison-delimiter)))

;; !!! New anchor function.
(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
  "Return the start position of the parent of the Bison node at BOL."
  (treesit-node-start
   (treesit-node-parent
    (treesit-node-at bol 'bison))))

(defun bison-ts--indent-rules ()
  "Indent rules supported by `bison-ts-mode'."
  (let* ((common
          `(((node-is "^declaration$")
             column-0 0)
            ((and (parent-is "^declaration$")
                  (not (node-is "^code_block$")))
             column-0 2)
            ((and (parent-is "^declaration$")
                  (node-is "^code_block$"))
             column-0 0)
            ((parent-is "^declaration$")
             parent 2)
            ((node-is "^grammar_rule$")
             column-0 0)
            ((and (parent-is "^grammar_rule$")
                  (node-is ";"))
             column-0 bison-ts-mode-indent-offset)
            ((and (parent-is "^grammar_rule$")
                  (node-is "|"))
             column-0 bison-ts-mode-indent-offset)
            ((and (parent-is "^grammar_rule$")
                  (not (node-is "^grammar_rule_declaration$"))
                  (not (node-is "^action$")))
             column-0 ,(+ bison-ts-mode-indent-offset 2))
            ((or (node-is "^action$")
                 (node-is "}"))
             column-0 12)
            ;; Set '%%' at the beginning of the line
            ((or (and (parent-is "^grammar_rules_section$")
                      (node-is "%%"))
                 (node-is "^grammar_rules_section$"))
             column-0 0)
            (no-node parent 0))))
    `((bison . ,common)
      ;; !!! Import and override C rules.
      (c ((parent-is "translation_unit")
          bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
         ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))))

(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A mode for Bison."
  (when (treesit-ready-p 'bison)
    (setq-local treesit-font-lock-settings
                (append (bison-ts--font-lock-settings 'bison)
                        (c-ts-mode--font-lock-settings 'c)))

    (setq-local treesit-font-lock-feature-list
                (treesit--merge-feature-lists
                 bison-ts-mode--font-lock-feature-list
                 c-ts-mode--feature-list))

    (setq-local treesit-simple-imenu-settings
                `(("Grammar"
                   "\\`grammar_rule_declaration\\'"
                   nil
                   (lambda (node) (treesit-node-text node)))))

    (setq-local treesit-simple-indent-rules
                ;; !!! C rules are imported already.
                (bison-ts--indent-rules))

    (setq-local treesit-language-at-point-function
                'bison-ts--language-at-point-function)

    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'c
                 :host 'bison
                 :local t
                 '((embedded_code) @capture)))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here

--Apple-Mail=_24D06184-D951-4198-A962-0FC3EE5BA54E
Content-Transfer-Encoding: 7bit
Content-Type: text/plain; charset=us-ascii

--Apple-Mail=_24D06184-D951-4198-A962-0FC3EE5BA54E--
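
The attached mode combines the Bison and C font-lock feature lists level by level, and that merge can be tried on its own without a tree-sitter build. Below is a minimal sketch of the same idea. The helper name `my-merge-feature-lists' and the feature symbols in the example call are made up for illustration; they are not taken from Emacs or from `c-ts-mode'.

```elisp
;;; -*- lexical-binding: t; -*-

;; Hypothetical stand-alone copy of the level-by-level merge;
;; the name is illustrative, not an Emacs API.
(defun my-merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2 level by level.
The Nth sublist of the result is the Nth sublist of L1 followed by
the Nth sublist of L2.  The two lists may have different lengths."
  (let (res)
    (while (or l1 l2)
      ;; (car nil) is nil, so a missing level contributes nothing.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1)
            l2 (cdr l2)))
    (nreverse res)))

;; Two Bison levels merged with three made-up C levels.
(my-merge-feature-lists '((bison-comment) (bison-string))
                        '((comment definition) (keyword) (bracket)))
;; => ((bison-comment comment definition) (bison-string keyword) (bracket))
```

Merging per level, rather than appending one list to the other, keeps each feature at the same decoration level it had in its original mode, so `treesit-font-lock-level' should affect the host and embedded language alike.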