unofficial mirror of emacs-devel@gnu.org 
* Questions about tree-sitter
@ 2023-08-29 21:26 Augustin Chéneau (BTuin)
  2023-08-30  7:03 ` Yuan Fu
  0 siblings, 1 reply; 21+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-08-29 21:26 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1477 bytes --]

Hello,

I have a few questions about tree-sitter.

I'm currently developing a grammar for GNU Bison alongside a tree-sitter
major mode; it's a work in progress.  The grammar is here:
<https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
far able to parse simple files, and the major mode prototype is
attached to this message.

So, the questions:

1. Is there a way to reload a grammar?

Emacs is pretty nice as a playground for testing grammars, but once a
grammar is loaded, it won't be loaded again until Emacs restarts (as far
as I know).
Is it possible to reload a grammar after modifying it?


2. How to mix multiple languages?

It would be very useful for Bison since it's mixed with C or other languages.
According to the documentation I need to use the function
`treesit-range-rules` to set the variable `treesit-range-settings`, but
it seems to have no effect.  The language in the selected nodes doesn't
change (as attested by `(treesit-language-at (point))`).

I did it that way (extracted from the attachment):

(setq-local treesit-range-settings
       (treesit-range-rules
        :embed 'c
        :host 'bison
        '((undelimited_code_block) @capture)))

Am I missing something?


3. Is it possible to trigger a hook when a node is modified?

Since Bison supports multiple languages (C, C++, Java and D), I'd like
to watch the declaration "%language LANGUAGE" to change the embedded
language when needed.
Is there a way to do that?


Thanks!

[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/plain, Size: 1768 bytes --]

;;; bison-ts-mode.el --- Tree-sitter mode for Bison  -*- lexical-binding: t; -*-

;;; Commentary:

;;; Code:

(require 'treesit)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")

(defun bison-ts--font-lock-settings (language)
  "Return the tree-sitter font-lock settings for LANGUAGE."
  (treesit-font-lock-rules
   :language language
   :feature 'comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'declaration
   '((declaration (declaration_name) @font-lock-keyword-face))))

(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A mode for Bison."
  (when (treesit-ready-p 'bison)
    (setq-local treesit-font-lock-settings (bison-ts--font-lock-settings 'bison))
    (setq-local treesit-font-lock-feature-list
                '((comment)
                  (declaration)))

    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'c
                 :host 'bison
                 '((undelimited_code_block) @capture)))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: Questions about tree-sitter
  2023-08-29 21:26 Questions about tree-sitter Augustin Chéneau (BTuin)
@ 2023-08-30  7:03 ` Yuan Fu
  2023-08-30 11:28   ` Augustin Chéneau (BTuin)
                     ` (2 more replies)
  0 siblings, 3 replies; 21+ messages in thread
From: Yuan Fu @ 2023-08-30  7:03 UTC (permalink / raw)
  To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel



> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
> 
> Hello,
> 
> I have a few questions about tree-sitter.
> 
> I'm currently developing a grammar for GNU Bison alongside a tree-sitter
> major mode, it's a work in progress.  The grammar is here:
> <https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
> far able to parse simple files, and the major mode prototype is
> attached to this message.
> 
> So, the questions:
> 
> 1. Is there a way to reload a grammar?
> 
> Emacs is pretty nice as a playground for testing grammars, but once a
> grammar is loaded, it won't be loaded again until Emacs restarts (as far
> as I know).
> Is it possible to reload a grammar after modifying it?

No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge/invalidate all the nodes/queries/parsers using that grammar.

> 2. How to mix multiple languages?
> 
> It would be very useful for Bison since its mixed with C or other languages.
> According to the documentation I need to use the function
> `treesit-range-rules` to set the variable `treesit-range-settings`, but
> it seems to have no effect.  The language in the selected nodes doesn't
> change (as attested by `(treesit-language-at (point))`).
> 
> I did it that way (extracted from the attachment):
> 
> (setq-local treesit-range-settings
>      (treesit-range-rules
>       :embed 'c
>       :host 'bison
>       '((undelimited_code_block) @capture)))
> 
> Am I missing something?

The ranges are set correctly, actually. But the C parser sees all those blocks stitched together as a whole, rather than as individual blocks, and the code it sees is obviously not syntactically correct.

We should really work on supporting isolated ranges; there have been multiple requests for it. I’ll try to work on that.
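
To make the stitching concrete, here is a rough sketch in Emacs Lisp. The helper name `my-treesit-stitched-text` is made up for illustration. It only approximates what the embedded parser works from, by joining the buffer text of each range; the (BEG . END) format is the one `treesit-parser-included-ranges` returns.

```elisp
(require 'treesit)

(defun my-treesit-stitched-text (ranges)
  "Return the text of RANGES in the current buffer, joined end to end.
RANGES is a list of (BEG . END) buffer positions.  This only
approximates the input that a parser restricted to RANGES works from."
  (mapconcat (lambda (range)
               (buffer-substring-no-properties (car range) (cdr range)))
             ranges
             ""))

;; In a Bison buffer (this needs the C grammar to be installed), one
;; could look at the ranges produced by `treesit-range-settings':
;;
;;   (my-treesit-stitched-text
;;    (treesit-parser-included-ranges (treesit-parser-create 'c)))
```

Two unrelated action blocks end up side by side in the result, which is why a single C parser is unlikely to see valid C.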

> 3. Is it possible to trigger a hook when a node is modified?
> 
> Since Bison supports multiple languages (C, C++, Java and D), I'd like
> to watch the declaration "%language LANGUAGE" to change the embedded
> language when needed.
> Is there a way to do that?

treesit-parser-add-notifier might be what you want.
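
A minimal sketch of how such a notifier could look, under some assumptions: the names `my-bison-embedded-language`, `my-bison-language-from-text` and `my-bison-language-notifier` are hypothetical, the declaration is found with a plain regexp instead of a query (the node names of the Bison grammar are not settled yet), and rescanning the whole buffer after every reparse is chosen for brevity, not speed.

```elisp
(require 'treesit)

(defvar-local my-bison-embedded-language 'c
  "Hypothetical variable: grammar symbol of the embedded language.")

(defun my-bison-language-from-text (text)
  "Return the language symbol named by a %language line in TEXT.
Return `c' when TEXT has no such declaration."
  (if (string-match "^[ \t]*%language[ \t]+\"\\([^\"]+\\)\"" text)
      (pcase (downcase (match-string 1 text))
        ("c++" 'cpp)
        ("java" 'java)
        ("d" 'd)
        (_ 'c))
    'c))

(defun my-bison-language-notifier (_ranges parser)
  "Update `my-bison-embedded-language' after PARSER has reparsed.
Notifiers are called with the changed ranges and the parser."
  (with-current-buffer (treesit-parser-buffer parser)
    (let ((new (my-bison-language-from-text
                (buffer-substring-no-properties (point-min) (point-max)))))
      (unless (eq new my-bison-embedded-language)
        (setq my-bison-embedded-language new)))))

;; In the major mode body (this needs the bison grammar):
;;   (treesit-parser-add-notifier (treesit-parser-create 'bison)
;;                                #'my-bison-language-notifier)
```

The sketch only records the new language; rebuilding `treesit-range-settings` for it is left out.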

Yuan





* Re: Questions about tree-sitter
  2023-08-30  7:03 ` Yuan Fu
@ 2023-08-30 11:28   ` Augustin Chéneau (BTuin)
  2023-09-06  4:07     ` Yuan Fu
  2023-09-01  2:39   ` Madhu
  2023-09-06 16:11   ` Lynn Winebarger
  2 siblings, 1 reply; 21+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-08-30 11:28 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On 30/08/2023 at 09:03, Yuan Fu wrote:
> [ ... ]

I see.  Thank you for your answers and for your great work on tree-sitter!






* Re: Questions about tree-sitter
  2023-08-30  7:03 ` Yuan Fu
  2023-08-30 11:28   ` Augustin Chéneau (BTuin)
@ 2023-09-01  2:39   ` Madhu
  2023-09-01  6:53     ` Eli Zaretskii
  2023-09-06 16:11   ` Lynn Winebarger
  2 siblings, 1 reply; 21+ messages in thread
From: Madhu @ 2023-09-01  2:39 UTC (permalink / raw)
  To: emacs-devel

* Yuan Fu <C3EFD02D-F02F-4BE8-A6F4-A2506A9EFC90 @gmail.com> :
Wrote on Wed, 30 Aug 2023 00:03:03 -0700:
>> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin @mailo.com> wrote:
>> 1. Is there a way to reload a grammar?
>> Emacs is pretty nice as a playground for testing grammars, but once a
>> grammar is loaded, it won't be loaded again until Emacs restarts (as
>> far as I know).  Is it possible to reload a grammar after modifying
>> it?
>
> No, and it’s probably not easy to implement either, since unloading
> the grammar would require Emacs to purge/invalid all the
> node/query/parsers using that grammar.

Does anyone else see this as a fundamental problem of the
infrastructure, as it now relates to "becoming emacs"?






* Re: Questions about tree-sitter
  2023-09-01  2:39   ` Madhu
@ 2023-09-01  6:53     ` Eli Zaretskii
  2023-09-01  9:15       ` Madhu
  0 siblings, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2023-09-01  6:53 UTC (permalink / raw)
  To: Madhu; +Cc: emacs-devel

> From: Madhu <enometh@meer.net>
> Date: Fri, 01 Sep 2023 08:09:27 +0530
> 
> * Yuan Fu <C3EFD02D-F02F-4BE8-A6F4-A2506A9EFC90 @gmail.com> :
> Wrote on Wed, 30 Aug 2023 00:03:03 -0700:
> >> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin @mailo.com> wrote:
> >> 1. Is there a way to reload a grammar?
> >> Emacs is pretty nice as a playground for testing grammars, but once a
> >> grammar is loaded, it won't be loaded again until Emacs restarts (as
> >> far as I know).  Is it possible to reload a grammar after modifying
> >> it?
> >
> > No, and it’s probably not easy to implement either, since unloading
> > the grammar would require Emacs to purge/invalid all the
> > node/query/parsers using that grammar.
> 
> Does anyone else see this as a fundamental problem of the
> infrastructure, as it now relates to "becoming emacs"?

I don't think the capability to unload and reload is a necessary
requirement from any Emacs feature.  In particular, unloading a
feature is not always supported in a way that leaves a clean slate.

It is a good thing to have that, no doubt.  But not a hard
requirement, IMO.  Especially when the grammar is a C library, not a
Lisp library.  People who are testing grammars are advised to use
scratch Emacs sessions which are restarted when the grammar changes.
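
That workflow can be scripted. The following is only a sketch: the file name `try-grammar.el`, the variable `my-grammar-dir`, the function `my-grammar-status` and the directory `~/src/tree-sitter-bison/` are all assumptions; it presumes the compiled grammar library was placed in that directory.

```elisp
;; try-grammar.el (hypothetical file name); start a fresh session with
;;   emacs -Q -l try-grammar.el sample.y
(require 'treesit)

(defvar my-grammar-dir (expand-file-name "~/src/tree-sitter-bison/")
  "Assumed location of the freshly built Bison grammar library.")

;; Let Emacs search the build directory for grammar libraries.
(setq treesit-extra-load-path (list my-grammar-dir))

(defun my-grammar-status (language)
  "Return a short string saying whether LANGUAGE's grammar can be loaded."
  (if (and (fboundp 'treesit-language-available-p)
           (treesit-language-available-p language))
      (format "%s grammar loaded" language)
    (format "%s grammar not found in %S" language treesit-extra-load-path)))

(message "%s" (my-grammar-status 'bison))
```

After rebuilding the grammar, quit that session and run the same command again.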




* Re: Questions about tree-sitter
  2023-09-01  6:53     ` Eli Zaretskii
@ 2023-09-01  9:15       ` Madhu
  2023-09-01 10:45         ` Dmitry Gutov
  2023-09-01 10:58         ` Eli Zaretskii
  0 siblings, 2 replies; 21+ messages in thread
From: Madhu @ 2023-09-01  9:15 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

*  Eli Zaretskii <834jkecrl1.fsf@gnu.org>
Wrote on Fri, 01 Sep 2023 09:53:14 +0300
>> From: Madhu <enometh@meer.net>
>> Date: Fri, 01 Sep 2023 08:09:27 +0530
>> * Yuan Fu <C3EFD02D-F02F-4BE8-A6F4-A2506A9EFC90 @gmail.com> :
>> Wrote on Wed, 30 Aug 2023 00:03:03 -0700:
>> >> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin @mailo.com> wrote:

>> >> 1. Is there a way to reload a grammar?
>> >> Emacs is pretty nice as a playground for testing grammars, but once a
>> >> grammar is loaded, it won't be loaded again until Emacs restarts (as
>> >> far as I know).  Is it possible to reload a grammar after modifying
>> >> it?
>> >
>> > No, and it’s probably not easy to implement either, since unloading
>> > the grammar would require Emacs to purge/invalid all the
>> > node/query/parsers using that grammar.
>> Does anyone else see this as a fundamental problem of the
>> infrastructure, as it now relates to "becoming emacs"?
> I don't think the capability to unload and reload is a necessary
> requirement from any Emacs feature.  In particular, unloading a
> feature is not always supported in a way that leaves a clean slate.
> 
> It is a good thing to have that, no doubt.  But not a hard
> requirement, IMO.  Especially when the grammar is a C library, not a
> Lisp library.  People who are testing grammars are advised to use
> scratch Emacs sessions which are restarted when the grammar changes.

So I take it that these are shipped as black boxes.  Presently, if I
have a problem with, say, cc-mode, I can attempt to patch and fix it.
Likewise, if I disagree with the package author about syntax (say,
whether I can get eldoc completion or evaluation within comments),
then because this is Emacs and elisp, I can change the way the syntax
is treated on the fly.

Am I right in apprehending that the move to tree-sitter changes this
aspect of Emacs: that the user becomes merely a user of the product
shipped by the LLVM investors, and that the consumption behaviour is
determined and dictated by the investors (who arrange to ship black
boxes), typically following the consumer patterns of the other
industry-standard editors?




* Re: Questions about tree-sitter
  2023-09-01  9:15       ` Madhu
@ 2023-09-01 10:45         ` Dmitry Gutov
  2023-09-01 10:58         ` Eli Zaretskii
  1 sibling, 0 replies; 21+ messages in thread
From: Dmitry Gutov @ 2023-09-01 10:45 UTC (permalink / raw)
  To: Madhu, eliz; +Cc: emacs-devel

On 01/09/2023 12:15, Madhu wrote:
> Am I right in apprehending that the move to tree-sitter changes this
> aspect of Emacs: that the user becomes merely a user of the product
> shipped by the LLVM investors, and that the consumption behaviour is
> determined and dictated by the investors (who arrange to ship black
> boxes), typically following the consumer patterns of the other
> industry-standard editors?

There is generally no relation between the tree-sitter grammars and the 
LLVM project.

The main author of tree-sitter also left GitHub and started his own 
editor project, which might actually be a concern in the other 
direction (longevity).




* Re: Questions about tree-sitter
  2023-09-01  9:15       ` Madhu
  2023-09-01 10:45         ` Dmitry Gutov
@ 2023-09-01 10:58         ` Eli Zaretskii
  2023-11-27  7:16           ` Madhu
  1 sibling, 1 reply; 21+ messages in thread
From: Eli Zaretskii @ 2023-09-01 10:58 UTC (permalink / raw)
  To: Madhu; +Cc: emacs-devel

> Date: Fri, 01 Sep 2023 14:45:31 +0530 (IST)
> Cc: emacs-devel@gnu.org
> From: Madhu <enometh@meer.net>
> 
> *  Eli Zaretskii <834jkecrl1.fsf@gnu.org>
> Wrote on Fri, 01 Sep 2023 09:53:14 +0300
> >> From: Madhu <enometh@meer.net>
> >> Date: Fri, 01 Sep 2023 08:09:27 +0530
> >> * Yuan Fu <C3EFD02D-F02F-4BE8-A6F4-A2506A9EFC90 @gmail.com> :
> >> Wrote on Wed, 30 Aug 2023 00:03:03 -0700:
> >> >> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin @mailo.com> wrote:
> 
> >> >> 1. Is there a way to reload a grammar?
> >> >> Emacs is pretty nice as a playground for testing grammars, but once a
> >> >> grammar is loaded, it won't be loaded again until Emacs restarts (as
> >> >> far as I know).  Is it possible to reload a grammar after modifying
> >> >> it?
> >> >
> >> > No, and it’s probably not easy to implement either, since unloading
> >> > the grammar would require Emacs to purge/invalid all the
> >> > node/query/parsers using that grammar.
> >> Does anyone else see this as a fundamental problem of the
> >> infrastructure, as it now relates to "becoming emacs"?
> > I don't think the capability to unload and reload is a necessary
> > requirement from any Emacs feature.  In particular, unloading a
> > feature is not always supported in a way that leaves a clean slate.
> > 
> > It is a good thing to have that, no doubt.  But not a hard
> > requirement, IMO.  Especially when the grammar is a C library, not a
> > Lisp library.  People who are testing grammars are advised to use
> > scratch Emacs sessions which are restarted when the grammar changes.
> 
> So I take it that these are shipped as black boxes.  Presently, if I
> have a problem with, say, cc-mode, I can attempt to patch and fix it.
> Likewise, if I disagree with the package author about syntax (say,
> whether I can get eldoc completion or evaluation within comments),
> then because this is Emacs and elisp, I can change the way the syntax
> is treated on the fly.

Yes.  Exactly like with other libraries we link against that are
maintained elsewhere: GnuTLS, the image libraries, libjansson,
HarfBuzz, etc.

> Am I right in apprehending that the move to tree-sitter changes this
> aspect of Emacs: that the user becomes merely a user of the product
> shipped by the LLVM investors, and that the consumption behaviour is
> determined and dictated by the investors (who arrange to ship black
> boxes), typically following the consumer patterns of the other
> industry-standard editors?

It is not a change, no.  See above: we already use quite a few
libraries for specific jobs related to important Emacs
functionalities.  For example, good support for sophisticated text
display and shaping features is unimaginable without HarfBuzz, and
some scripts cannot even be displayed in a reasonably legible way
without it.

But since some users clearly prefer the ability to make changes by
modifying Lisp over the advantages of features based on true parsing
of the programming language, we will not be removing the major modes
based entirely on Emacs Lisp any time soon.




* Re: Questions about tree-sitter
  2023-08-30 11:28   ` Augustin Chéneau (BTuin)
@ 2023-09-06  4:07     ` Yuan Fu
  2023-09-08 11:53       ` Augustin Chéneau (BTuin)
  0 siblings, 1 reply; 21+ messages in thread
From: Yuan Fu @ 2023-09-06  4:07 UTC (permalink / raw)
  To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2846 bytes --]



> On Aug 30, 2023, at 4:28 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
> 
> On 30/08/2023 at 09:03, Yuan Fu wrote:
>>> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>> 
>>> Hello,
>>> 
>>> I have a few questions about tree-sitter.
>>> 
>>> I'm currently developing a grammar for GNU Bison alongside a tree-sitter
>>> major mode, it's a work in progress.  The grammar is here:
>>> <https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
>>> far able to parse simple files, and the major mode prototype is
>>> attached to this message.
>>> 
>>> So, the questions:
>>> 
>>> 1. Is there a way to reload a grammar?
>>> 
>>> Emacs is pretty nice as a playground for testing grammars, but once a
>>> grammar is loaded, it won't be loaded again until Emacs restarts (as far
>>> as I know).
>>> Is it possible to reload a grammar after modifying it?
>> No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge/invalid all the node/query/parsers using that grammar.
>>> 2. How to mix multiple languages?
>>> 
>>> It would be very useful for Bison since its mixed with C or other languages.
>>> According to the documentation I need to use the function
>>> `treesit-range-rules` to set the variable `treesit-range-settings`, but
>>> it seems to have no effect.  The language in the selected nodes doesn't
>>> change (as attested by `(treesit-language-at (point))`).
>>> 
>>> I did it that way (extracted from the attachment):
>>> 
>>> (setq-local treesit-range-settings
>>>      (treesit-range-rules
>>>       :embed 'c
>>>       :host 'bison
>>>       '((undelimited_code_block) @capture)))
>>> 
>>> Am I missing something?
>> The ranges are set correctly, actually. But the C parse sees all those blocks stitched together as a whole, rather than individual blocks, and the code it sees is obviously not syntactically correct.
>> We should really work on supporting isolated ranges, there has been multiple requests for it. I’ll try to work on that.
>>> 3. Is it possible to trigger a hook when a node is modified?
>>> 
>>> Since Bison supports multiple languages (C, C++, Java and D), I'd like
>>> to watch the declaration "%language LANGUAGE" to change the embedded
>>> language when needed.
>>> Is there a way to do that?
>> treesit-parser-add-notifier might be what you want.
>> Yuan
> 
> I see.  Thank you for your answers and for your great work on tree-sitter!

I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up as an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
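
As a sketch of what such a function could look like (the name `my-bison-language-at-point` is hypothetical, and the node type is simply the one used in the range query; the real grammar may need more node types):

```elisp
(require 'treesit)

(defun my-bison-language-at-point (pos)
  "Return the language at POS: `c' inside a code block, else `bison'."
  ;; Passing 'bison explicitly makes `treesit-node-at' use the host
  ;; parser instead of asking `treesit-language-at' (and therefore
  ;; this very function) which language to use.
  (let ((node (treesit-node-at pos 'bison)))
    ;; Climb from the leaf at POS until we reach an embedded block or
    ;; run out of parents.
    (while (and node
                (not (equal (treesit-node-type node)
                            "undelimited_code_block")))
      (setq node (treesit-node-parent node)))
    (if node 'c 'bison)))

;; In the major mode body:
;;   (setq-local treesit-language-at-point-function
;;               #'my-bison-language-at-point)
```

Naming the host language in the `treesit-node-at` call matters: without it, the lookup would go through `treesit-language-at-point-function` again.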

Yuan


[-- Attachment #2: bison-ts-mode.el --]
[-- Type: application/octet-stream, Size: 2198 bytes --]

;;; bison-ts-mode.el --- Tree-sitter mode for Bison  -*- lexical-binding: t; -*-

;;; Commentary:

;;; Code:

(require 'treesit)
(require 'c-ts-mode)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")

(defun bison-ts--font-lock-settings (language)
  "Return the tree-sitter font-lock settings for LANGUAGE."
  (treesit-font-lock-rules
   :language language
   :feature 'comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'declaration
   '((declaration (declaration_name) @font-lock-keyword-face))))

(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A mode for Bison."
  (when (treesit-ready-p 'bison)
    (setq-local treesit-font-lock-settings
                (append (bison-ts--font-lock-settings 'bison)
                        (c-ts-mode--font-lock-settings 'c)))

    (setq-local treesit-font-lock-feature-list
                '((comment
                   ;; c-ts-mode
                   definition)
                  (declaration
                   ;; c-ts-mode
                   keyword preprocessor string type)
                  (
                   ;; c-ts-mode
                   assignment constant escape-sequence label literal)))

    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'c
                 :host 'bison
                 :local t
                 '((undelimited_code_block) @capture)))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here






* Re: Questions about tree-sitter
  2023-08-30  7:03 ` Yuan Fu
  2023-08-30 11:28   ` Augustin Chéneau (BTuin)
  2023-09-01  2:39   ` Madhu
@ 2023-09-06 16:11   ` Lynn Winebarger
  2023-09-07 23:42     ` Yuan Fu
  2 siblings, 1 reply; 21+ messages in thread
From: Lynn Winebarger @ 2023-09-06 16:11 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Augustin Chéneau (BTuin), emacs-devel

On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu <casouri@gmail.com> wrote:
> > On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
> > I have a few questions about tree-sitter.
> >
> > I'm currently developing a grammar for GNU Bison alongside a tree-sitter
> > major mode, it's a work in progress.  The grammar is here:
> > <https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
> > far able to parse simple files, and the major mode prototype is
> > attached to this message.
> >
> > So, the questions:
> >
> > 1. Is there a way to reload a grammar?
> >
> > Emacs is pretty nice as a playground for testing grammars, but once a
> > grammar is loaded, it won't be loaded again until Emacs restarts (as far
> > as I know).
> > Is it possible to reload a grammar after modifying it?
>
> No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge/invalid all the node/query/parsers using that grammar.

Reviewing some generated "parser.c" files, and some of the available
documentation, it appears the parser.c file basically creates a lexing
function that adheres to a certain protocol in terms of
producing/consuming a standard lexer state data structure, and an
LR(1) parser table suitable for GLR parsing (i.e. allows ambiguous
actions).  These and definitions of the tokens and grammar symbols are
bundled up in a language structure passed to the tree-sitter library.
LALR(1) tables are essentially simplified/compressed LR(1) tables, and
emacs has code to calculate such tables directly in elisp.
Therefore, given functionality to translate elisp data into the raw C
structures, we should be able to dynamically create language data
structures to pass to the tree-sitter library to create a parser.
We would also need a table driven lexer framework in place of the
generated lexer in the C file to completely avoid going through a C
compiler.
The other novel features of tree-sitter parsers appear to be
implemented in the parser runtime, not in the table calculation.

I've implemented LALR(1) parser generators two or three times in the
last couple of decades, this might be a fun project for me while I am
unambiguously able to contribute to GNU Emacs.

Regards,
Lynn




* Re: Questions about tree-sitter
  2023-09-06 16:11   ` Lynn Winebarger
@ 2023-09-07 23:42     ` Yuan Fu
  2023-09-08  0:11       ` Lynn Winebarger
  0 siblings, 1 reply; 21+ messages in thread
From: Yuan Fu @ 2023-09-07 23:42 UTC (permalink / raw)
  To: Lynn Winebarger; +Cc: "Augustin Chéneau (BTuin)", emacs-devel



> On Sep 6, 2023, at 9:11 AM, Lynn Winebarger <owinebar@gmail.com> wrote:
> 
> On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu <casouri@gmail.com> wrote:
>>> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>> I have a few questions about tree-sitter.
>>> 
>>> I'm currently developing a grammar for GNU Bison alongside a tree-sitter
>>> major mode, it's a work in progress.  The grammar is here:
>>> <https://gitlab.com/btuin2/tree-sitter-bison>, still incomplete but so
>>> far able to parse simple files, and the major mode prototype is
>>> attached to this message.
>>> 
>>> So, the questions:
>>> 
>>> 1. Is there a way to reload a grammar?
>>> 
>>> Emacs is pretty nice as a playground for testing grammars, but once a
>>> grammar is loaded, it won't be loaded again until Emacs restarts (as far
>>> as I know).
>>> Is it possible to reload a grammar after modifying it?
>> 
>> No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge/invalid all the node/query/parsers using that grammar.
> 
> Reviewing some generated "parser.c" files, and some of the available
> documentation, it appears the parser.c file basically creates a lexing
> function that adheres to a certain protocol in terms of
> producing/consuming a standard lexer state data structure, and an
> LR(1) parser table suitable for GLR parsing (i.e. allows ambiguous
> actions).  These and definitions of the tokens and grammar symbols are
> bundled up in a language structure passed to the tree-sitter library.
> LALR(1) tables are essentially simplified/compressed LR(1) tables, and
> emacs has code to calculate such tables directly in elisp.
> Therefore, given functionality to translate elisp data into the raw C
> structures, we should be able to dynamically create language data
> structures to pass to the tree-sitter library to create a library.
> We would also need a table driven lexer framework in place of the
> generated lexer in the C file to completely avoid going through a C
> compiler.
> The other novel features of tree-sitter parsers appear to be
> implemented in the parser runtime, not in the table calculation.
> 
> I've implemented LALR(1) parser generators two or three times in the
> last couple of decades, this might be a fun project for me while I am
> unambiguously able to contribute to GNU Emacs.

That’ll be great. But note that the parser structure has escape hatches: certain things can be implemented by arbitrary C functions. Also, tree-sitter allows grammars to use custom scanners [1].

[1] https://tree-sitter.github.io/tree-sitter/creating-parsers#external-scanners

Yuan



* Re: Questions about tree-sitter
  2023-09-07 23:42     ` Yuan Fu
@ 2023-09-08  0:11       ` Lynn Winebarger
  0 siblings, 0 replies; 21+ messages in thread
From: Lynn Winebarger @ 2023-09-08  0:11 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Augustin Chéneau (BTuin), emacs-devel

On Thu, Sep 7, 2023 at 7:42 PM Yuan Fu <casouri@gmail.com> wrote:
> > On Sep 6, 2023, at 9:11 AM, Lynn Winebarger <owinebar@gmail.com> wrote:
> >
> > On Wed, Aug 30, 2023 at 3:03 AM Yuan Fu <casouri@gmail.com> wrote:
> >>> Is it possible to reload a grammar after modifying it?
> >>
> >> No, and it’s probably not easy to implement either, since unloading the grammar would require Emacs to purge/invalid all the node/query/parsers using that grammar.
> >
> > [ ... ]
> > Therefore, given functionality to translate elisp data into the raw C
> > structures, we should be able to dynamically create language data
> > structures to pass to the tree-sitter library to create a library.
> > We would also need a table driven lexer framework in place of the
> > generated lexer in the C file to completely avoid going through a C
> > compiler.
> > The other novel features of tree-sitter parsers appear to be
> > implemented in the parser runtime, not in the table calculation.
> >
> > I've implemented LALR(1) parser generators two or three times in the
> > last couple of decades, this might be a fun project for me while I am
> > unambiguously able to contribute to GNU Emacs.
>
> That’ll be great. But note that the parser structure has escape hatches: certain things can be implemented by arbitrary C functions. Also, tree-sitter allows grammars to use custom scanners [1].
>
My primary interest is in using the tree-sitter parser framework with
the grammars and lexers constructed for Semantic in elisp.  That's the
strongest use-case.  That can be done by a single library implementing
a generic table-driven scanner function.

For other cases, it's a mixed bag.  If only the grammar changes, and
all C code is fixed, then modifications to the grammar could be
reloaded.  If this feature was really important to the user, they
could probably implement the C code to call Elisp functions that could
be updated dynamically, at least during development.

But you are correct that this will not solve the problem for arbitrary
tree-sitter language definitions with embedded C code.  For use in
Emacs, the user might implement any required functions in a dynamic
module that could be loaded and unloaded separately from the
tree-sitter language library.  But that will not happen with the
parser.c produced by the tree-sitter cli tool.
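
As a rough illustration of what "a generic table-driven scanner function" can mean, here is a minimal sketch in Emacs Lisp.  It is not tree-sitter's or Semantic's API: the names `my-scanner-table` and `my-scan-string` and the token types are hypothetical, and a real scanner would work on buffers and feed tree-sitter's lexer interface instead of returning a list.  The point is only that the scanning loop stays fixed while the table can be replaced at run time.

```elisp
;;; -*- lexical-binding: t; -*-

(defvar my-scanner-table
  '(("[ \t\n]+" . nil)                          ; whitespace, skipped
    ("[0-9]+" . number)
    ("[A-Za-z_][A-Za-z0-9_]*" . identifier)
    ("[-+*/=;]" . operator))
  "Hypothetical scanner table: an ordered list of (REGEXP . TOKEN-TYPE).
A TOKEN-TYPE of nil means the matched text is skipped.")

(defun my-scan-string (string table)
  "Split STRING into tokens using TABLE.
Return a list of (TOKEN-TYPE . TEXT) pairs in source order."
  (let ((pos 0)
        (tokens '())
        (case-fold-search nil))
    (while (< pos (length string))
      ;; Take the first rule whose regexp matches exactly at POS.
      (let ((rule (catch 'found
                    (dolist (r table)
                      (when (eq (string-match (car r) string pos) pos)
                        (throw 'found r))))))
        (unless rule
          (error "No scanner rule matches at position %d" pos))
        (let ((end (match-end 0)))
          ;; Guard against rules that match the empty string.
          (when (= end pos)
            (error "Scanner rule %S matched nothing" (car rule)))
          (when (cdr rule)
            (push (cons (cdr rule) (substring string pos end)) tokens))
          (setq pos end))))
    (nreverse tokens)))
```

Changing the grammar's lexical rules then only means rebinding the table, with no C compiler involved.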

Lynn




* Re: Questions about tree-sitter
  2023-09-06  4:07     ` Yuan Fu
@ 2023-09-08 11:53       ` Augustin Chéneau (BTuin)
  2023-09-08 16:43         ` Yuan Fu
  0 siblings, 1 reply; 21+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-08 11:53 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1690 bytes --]

Le 06/09/2023 à 06:07, Yuan Fu a écrit :
> 
> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
> 
> Yuan

Thanks a lot!

I did some tests and it's working pretty well.

Do you think it's a good idea to add a prefix to bison feature names in 
font-lock settings to avoid conflicts with C names (as I did)?

I have a few issues though:

- I first defined `treesit-language-at-point-function` using
`treesit-node-at`.  However, `treesit-node-at` itself uses
`treesit-language-at-point-function` which causes an infinite recursion.
So I instead used `treesit-local-parsers-at` to check if a local parser 
is used.  Is it a good solution?


- When I try to indent C code by using c-ts-mode indent rules, I get the 
following error:

Debugger entered--Lisp error: (wrong-type-argument treesit-node-p 
#<treesit-parser for c>)
   treesit-node-parser(#<treesit-parser for c>)
   treesit--indent-1()
   treesit-indent-region(1075 1176)
   indent-region(1075 1176)
   indent-for-tab-command(nil)
   funcall-interactively(indent-for-tab-command nil)
   call-interactively(indent-for-tab-command nil nil)
   command-execute(indent-for-tab-command)

There seems to be a mistake in `treesit--indent-1` in the `cond` at the 
line `(local-parsers (car local-parsers))`, since a parser is returned 
while it should be a node.



I attached a newer version of bison-ts-mode and a small patch to
c-ts-mode so I can reuse its `treesit-font-lock-feature-list`.

[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 4307 bytes --]

;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-

;;; Commentary:

;;; Code:

(require 'treesit)
(require 'c-ts-mode)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")


(defun treesit--merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size.
Neither L1 nor L2 is modified."
  (let ((res ()))
    (while (or l1 l2)
      ;; Use `append', not `nconc': the arguments are `defvar' values
      ;; and must not be modified destructively.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1) l2 (cdr l2)))
    (nreverse res)))


;; (defun bison-ts--language-at-point-function (position)
;;   "Return the language at POSITION."
;;   (let* ((node (treesit-node-at position)))
;; 	(if (treesit-parent-until
;; 		 node
;; 		 (lambda (n) (let ((type (treesit-node-type n)))
;; 			       (or (equal "code_block" type)
;; 				   (equal "undelimited_code_block" type))))
;; 		 t)
;; 	    'c
;; 	  'bison)))


(defun bison-ts--language-at-point-function (position)
  "Return the language at POSITION."
  (let* ((parser (treesit-local-parsers-at position)))
    (if parser
	'c
      'bison)))


(defun bison-ts--font-lock-settings (language)
  "Return the tree-sitter font-lock settings for Bison, using LANGUAGE."
  (treesit-font-lock-rules
   :language language
   :feature 'bison-comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'bison-declaration
   '((declaration (declaration_name) @font-lock-keyword-face))

   :language language
   :feature 'bison-type
   '((type) @font-lock-type-face)

   :language language
   :feature 'bison-grammar-rule-usage
   '((grammar_rule_identifier) @font-lock-variable-use-face)

   :language language
   :feature 'bison-grammar-rule-declaration
   :override t
   '((grammar_rule (grammar_rule_identifier)  @font-lock-variable-name-face))

   :language language
   :feature 'bison-string
   :override t
   '((string) @font-lock-string-face)

   :language language
   :feature 'bison-literal
   :override t
   '((char_literal) @font-lock-keyword-face
     (number_literal) @font-lock-number-face)

   :language language
   :feature 'bison-directive-grammar-rule
   :override t
   '((grammar_rule (directive_empty) @font-lock-keyword-face))

   :language language
   :feature 'bison-operator
   :override t
   '(["|"] @font-lock-operator-face)

   :language language
   :feature 'bison-delimiter
   :override t
   '([";"] @font-lock-delimiter-face)))


(defvar bison-ts-mode--font-lock-feature-list
  '(( bison-comment bison-declaration bison-type
      bison-grammar-rule-usage bison-grammar-rule-declaration
      bison-string bison-literal bison-directive-grammar-rule
      bison-operator bison-delimiter)))


(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A mode for Bison."
  (when (treesit-ready-p 'bison)
    (setq-local treesit-font-lock-settings
                (append (bison-ts--font-lock-settings 'bison)
                        (c-ts-mode--font-lock-settings 'c)))

    (setq-local treesit-font-lock-feature-list
                (treesit--merge-feature-lists
		 bison-ts-mode--font-lock-feature-list
		 c-ts-mode--font-lock-feature-list))

    (setq-local treesit-simple-indent-rules
                (c-ts-mode--get-indent-style 'c))

    (setq-local treesit-language-at-point-function 'bison-ts--language-at-point-function)

    (setq-local treesit-range-settings
                (treesit-range-rules
                 :embed 'c
                 :host 'bison
                 :local t
                 '((undelimited_code_block) @capture)))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here

[-- Attachment #3: 0001-Put-font-lock-features-of-c-ts-mode-in-a-variable-to.patch --]
[-- Type: text/x-patch, Size: 1580 bytes --]

From 5f8d4435a316855c60b9b2db5c0bb291a28d4b32 Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Fri, 8 Sep 2023 13:20:51 +0200
Subject: [PATCH] Put font-lock features of c-ts-mode in a variable to allow
 reuse

---
 lisp/progmodes/c-ts-mode.el | 13 ++++++++-----
 1 file changed, 8 insertions(+), 5 deletions(-)

diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index 5b698eb09f4..c551f9f06b4 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -1123,6 +1123,13 @@ c-ts-mode--emacs-set-ranges
     (setq-local c-ts-mode--for-each-tail-ranges set-ranges)
     (treesit-parser-set-included-ranges c-parser reversed-ranges)))
 
+(defvar c-ts-mode--font-lock-feature-list
+  '(( comment definition)
+    ( keyword preprocessor string type)
+    ( assignment constant escape-sequence label literal)
+    ( bracket delimiter error function operator property variable)))
+
+
 ;;; Modes
 
 (defvar-keymap c-ts-base-mode-map
@@ -1213,11 +1220,7 @@ c-ts-base-mode
                                 eos)
                    c-ts-mode--defun-for-class-in-imenu-p nil))))
 
-  (setq-local treesit-font-lock-feature-list
-              '(( comment definition)
-                ( keyword preprocessor string type)
-                ( assignment constant escape-sequence label literal)
-                ( bracket delimiter error function operator property variable))))
+  (setq-local treesit-font-lock-feature-list c-ts-mode--font-lock-feature-list))
 
 (defvar treesit-load-name-override-list)
 
-- 
2.42.0



* Re: Questions about tree-sitter
  2023-09-08 11:53       ` Augustin Chéneau (BTuin)
@ 2023-09-08 16:43         ` Yuan Fu
  2023-09-09 16:39           ` Augustin Chéneau (BTuin)
  0 siblings, 1 reply; 21+ messages in thread
From: Yuan Fu @ 2023-09-08 16:43 UTC (permalink / raw)
  To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel



> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
> 
> Le 06/09/2023 à 06:07, Yuan Fu a écrit :
>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>> Yuan
> 
> Thanks a lot!
> 
> I did some tests and it's working pretty well.

Awesome!

> Do you think it's a good idea to add a prefix to bison feature names in font-lock settings to avoid conflicts with C names (as I did)?

I should add some way to distinguish between languages in the feature name list. The tricky part is to make it backward compatible, not too confusing, and still convenient to use.

> I have a few issues though:
> 
> - I first defined `treesit-language-at-point-function` using
> `treesit-node-at`.  However, `treesit-node-at` itself uses
> `treesit-language-at-point-function` which causes an infinite recursion.
> So I instead used `treesit-local-parsers-at` to check if a local parser is used.  Is it a good solution?

No no, you should use the host language’s parser (bison) and see if point is in an undelimited_code_block, and return c or bison accordingly. I’ll highlight this in the docstring, thanks.

For now, you can use something like

(mapcar (lambda (rule)
          (list (nth 0 rule)
                (nth 1 rule)
                (intern (format "js-%s" (nth 2 rule)))
                (nth 3 rule)))
        js--treesit-font-lock-settings)

to add a prefix to the features of c-ts-mode’s font-lock rules (replace the js names above with the C settings and the prefix you want).
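
The same idea, written as a reusable helper, might look like the sketch below.  `my-prefix-font-lock-features` is a hypothetical name, and the code assumes, as the snippet above does, that each compiled setting is a list whose third element is the feature symbol.  That layout is an internal detail and may differ between Emacs versions, so treat this as an illustration and not as a supported interface.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-prefix-font-lock-features (settings prefix)
  "Return a copy of SETTINGS with PREFIX prepended to each feature name.
SETTINGS is assumed to be a list of lists whose third element is the
feature symbol; any other elements are copied unchanged.  PREFIX is a
string such as \"c\", so that feature `comment' becomes `c-comment'."
  (mapcar (lambda (setting)
            ;; Copy first so the original settings are left untouched.
            (let ((copy (copy-sequence setting)))
              (setcar (nthcdr 2 copy)
                      (intern (format "%s-%s" prefix (nth 2 copy))))
              copy))
          settings))
```

In bison-ts-mode this could be applied to the value of `(c-ts-mode--font-lock-settings 'c)` from the attachment; the names in the C feature list would then need the same prefix so that they still match.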

> 
> - When I try to indent C code by using c-ts-mode indent rules, I get the following error:
> 
> Debugger entered--Lisp error: (wrong-type-argument treesit-node-p #<treesit-parser for c>)
>  treesit-node-parser(#<treesit-parser for c>)
>  treesit--indent-1()
>  treesit-indent-region(1075 1176)
>  indent-region(1075 1176)
>  indent-for-tab-command(nil)
>  funcall-interactively(indent-for-tab-command nil)
>  call-interactively(indent-for-tab-command nil nil)
>  command-execute(indent-for-tab-command)
> 
> There seems to be a mistake in `treesit--indent-1` in the `cond` at the line `(local-parsers (car local-parsers))`, since a parser is returned while it should be a node.

Thanks, I fixed that on master. And c-ts-mode’s feature list is a separate variable now.

Yuan



* Re: Questions about tree-sitter
  2023-09-08 16:43         ` Yuan Fu
@ 2023-09-09 16:39           ` Augustin Chéneau (BTuin)
  2023-09-12  0:22             ` Yuan Fu
  0 siblings, 1 reply; 21+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-09 16:39 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1657 bytes --]

Le 08/09/2023 à 18:43, Yuan Fu a écrit :
> 
> 
>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>
>> Le 06/09/2023 à 06:07, Yuan Fu a écrit :
>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>> Yuan
>>
>> Thanks a lot!
>>
>> I did some tests and it's working pretty well.
> 
> Awesome!
> 


It seems I spoke a bit too soon  :(
When I edit the buffer, there is sometimes an offset between the text
and the nodes, or the syntax highlighting breaks in the C code.

I attached an example Bison file if needed.

> 
>> I have a few issues though:
>>
>> - I first defined `treesit-language-at-point-function` using
>> `treesit-node-at`.  However, `treesit-node-at` itself uses
>> `treesit-language-at-point-function` which causes an infinite recursion.
>> So I instead used `treesit-local-parsers-at` to check if a local parser is used.  Is it a good solution?
> 
> No no, you should use the host language’s parser (bison) and see if point is in an undelimited_code_block, and return c or bison accordingly. I’ll highlight this in the docstring, thanks.

So I need to call `treesit-node-at` with `'bison` as the value for 
PARSER-OR-LANG to see in which node I am?
Then I think there is a problem with `treesit-node-at`, because it
always calls `treesit-language-at` even if PARSER-OR-LANG is provided.
I propose a fix in the attached patch.
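
For reference, the host-parser approach described above could be sketched as follows.  It only avoids the recursion if `treesit-node-at` does not call `treesit-language-at` when PARSER-OR-LANG is given.  The `my-bison-` names are hypothetical, and the node types are the ones used in the commented-out function in bison-ts-mode.el; they depend on the grammar.

```elisp
;;; -*- lexical-binding: t; -*-

(defun my-bison--embedded-node-type-p (type)
  "Return non-nil if node type TYPE marks embedded code in the Bison grammar.
The two type names are taken from the bison-ts-mode.el attachment."
  (member type '("code_block" "undelimited_code_block")))

(defun my-bison--language-at-point (position)
  "Return the language at POSITION: `c' inside a code block, else `bison'.
Only the host (bison) parser is consulted, so this function does not
depend on the value of `treesit-language-at-point-function'."
  (let ((node (treesit-node-at position 'bison)))
    (if (and node
             (treesit-parent-until
              node
              (lambda (n)
                (my-bison--embedded-node-type-p (treesit-node-type n)))
              ;; Also test NODE itself, not only its ancestors.
              t))
        'c
      'bison)))
```

The mode would then set `treesit-language-at-point-function` to this function in place of the version based on `treesit-local-parsers-at`.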



[-- Attachment #2: 0001-Do-not-always-call-treesit-language-at-in-treesit-no.patch --]
[-- Type: text/x-patch, Size: 1512 bytes --]

From dda0b7a9cd5f8b325b401aa7ba44c6fbe103fb6a Mon Sep 17 00:00:00 2001
From: =?UTF-8?q?Augustin=20Ch=C3=A9neau?= <btuin@mailo.com>
Date: Sat, 9 Sep 2023 15:35:49 +0200
Subject: [PATCH] Do not always call `treesit-language-at` in 
 `treesit-node-at`

---
 lisp/treesit.el | 7 +++----
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/lisp/treesit.el b/lisp/treesit.el
index 1711446b40b..3d1ceda6d06 100644
--- a/lisp/treesit.el
+++ b/lisp/treesit.el
@@ -190,15 +190,14 @@ treesit-node-at
 is nil, try to guess the language at POS using `treesit-language-at'.
 
 If there's a local parser at POS, try to use that parser first."
-  (let* ((lang-at-point (treesit-language-at pos))
-         (root (if (treesit-parser-p parser-or-lang)
+  (let* ((root (if (treesit-parser-p parser-or-lang)
                    (treesit-parser-root-node parser-or-lang)
                  (or (when-let ((parser (car (treesit-local-parsers-at
                                               pos (or parser-or-lang
-                                                      lang-at-point)))))
+                                                      (treesit-language-at pos))))))
                        (treesit-parser-root-node parser))
                      (treesit-buffer-root-node
-                      (or parser-or-lang lang-at-point)))))
+                      (or parser-or-lang (treesit-language-at pos))))))
          (node root)
          (node-before root)
          (pos-1 (max (1- pos) (point-min)))
-- 
2.42.0


[-- Attachment #3: bison-example.y --]
[-- Type: text/plain, Size: 8044 bytes --]

/*                                                       -*- C -*-
  Copyright (C) 2020-2022 Free Software Foundation, Inc.

  This program is free software: you can redistribute it and/or modify
  it under the terms of the GNU General Public License as published by
  the Free Software Foundation, either version 3 of the License, or
  (at your option) any later version.

  This program is distributed in the hope that it will be useful,
  but WITHOUT ANY WARRANTY; without even the implied warranty of
  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
  GNU General Public License for more details.

  You should have received a copy of the GNU General Public License
  along with this program.  If not, see <https://www.gnu.org/licenses/>.
*/

/* Simplified C++ Type and Expression Grammar.
   Written by Paul Hilfinger for Bison's test suite.  */

%define api.pure
%header
%define api.header.include {"c++-types.h"}
%locations
%debug

/* Nice error messages with details. */
%define parse.error detailed

%code requires
{
  union node {
    struct {
      int is_nterm;
      int parents;
    } node_info;
    struct {
      int is_nterm; /* 1 */
      int parents;
      char const *form;
      union node *children[3];
    } nterm;
    struct {
      int is_nterm; /* 0 */
      int parents;
      char *text;
    } term;
  };
  typedef union node node_t;
}

%define api.value.type union

%code
{
  /* Portability issues for strdup. */
#ifndef _XOPEN_SOURCE
# define _XOPEN_SOURCE 600
#endif

#include <assert.h>
#include <ctype.h>
#include <stdarg.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

  static node_t *new_nterm (char const *, node_t *, node_t *, node_t *);
  static node_t *new_term (char *);
  static void free_node (node_t *);
  static char *node_to_string (const node_t *);
  static void node_print (FILE *, const node_t *);
  static node_t *stmt_merge (YYSTYPE x0, YYSTYPE x1);

  static void yyerror (YYLTYPE const * const loc, const char *msg);
  static yytoken_kind_t yylex (YYSTYPE *lval, YYLTYPE *lloc);
}

%expect-rr 1

%token
  TYPENAME "typename"
  ID "identifier"

%right '='
%left '+'

%glr-parser

%type <node_t *> stmt expr decl declarator TYPENAME ID
%destructor { free_node ($$); } <node_t *>
%printer { node_print (yyo, $$); } <node_t *>

%%

prog : %empty
     | prog stmt   {
                     YYLOCATION_PRINT (stdout, &@2);
                     fputs (": ", stdout);
                     node_print (stdout, $2);
                     putc ('\n', stdout);
                     fflush (stdout);
                     free_node ($2);
                   }
     ;

stmt : expr ';'  %merge <stmt_merge>     { $$ = $1; }
     | decl      %merge <stmt_merge>
     | error ';'        { $$ = new_nterm ("<error>", NULL, NULL, NULL); }
     ;

expr : ID
     | TYPENAME '(' expr ')'
                        { $$ = new_nterm ("<cast>(%s, %s)", $3, $1, NULL); }
     | expr '+' expr    { $$ = new_nterm ("+(%s, %s)", $1, $3, NULL); }
     | expr '=' expr    { $$ = new_nterm ("=(%s, %s)", $1, $3, NULL); }
     ;

decl : TYPENAME declarator ';'
                        { $$ = new_nterm ("<declare>(%s, %s)", $1, $2, NULL); }
     | TYPENAME declarator '=' expr ';'
                        { $$ = new_nterm ("<init-declare>(%s, %s, %s)", $1,
                                          $2, $4); }
     ;

declarator
     : ID
     | '(' declarator ')' { $$ = $2; }
     ;

%%

/* A C error reporting function.  */
static void
yyerror (YYLTYPE const * const loc, const char *msg)
{
  YYLOCATION_PRINT (stderr, loc);
  fprintf (stderr, ": %s\n", msg);
}

/* The input file. */
FILE * input = NULL;

yytoken_kind_t
yylex (YYSTYPE *lval, YYLTYPE *lloc)
{
  static int line_num = 1;
  static int col_num = 0;

  while (1)
    {
      int c;
      assert (!feof (input));
      c = getc (input);
      switch (c)
        {
        case EOF:
          return 0;
        case '\t':
          col_num = (col_num + 7) & ~7;
          break;
        case ' ': case '\f':
          col_num += 1;
          break;
        case '\n':
          line_num += 1;
          col_num = 0;
          break;
        default:
          {
            yytoken_kind_t tok;
            lloc->first_line = lloc->last_line = line_num;
            lloc->first_column = col_num;
            if (isalpha (c))
              {
                char buffer[256];
                unsigned i = 0;

                do
                  {
                    buffer[i++] = (char) c;
                    col_num += 1;
                    assert (i != sizeof buffer - 1);
                    c = getc (input);
                  }
                while (isalnum (c) || c == '_');

                ungetc (c, input);
                buffer[i++] = 0;
                if (isupper ((unsigned char) buffer[0]))
                  {
                    tok = TYPENAME;
                    lval->TYPENAME = new_term (strdup (buffer));
                  }
                else
                  {
                    tok = ID;
                    lval->ID = new_term (strdup (buffer));
                  }
              }
            else
              {
                col_num += 1;
                tok = c;
              }
            lloc->last_column = col_num;
            return tok;
          }
        }
    }
}

static node_t *
new_nterm (char const *form, node_t *child0, node_t *child1, node_t *child2)
{
  node_t *res = malloc (sizeof *res);
  res->nterm.is_nterm = 1;
  res->nterm.parents = 0;
  res->nterm.form = form;
  res->nterm.children[0] = child0;
  if (child0)
    child0->node_info.parents += 1;
  res->nterm.children[1] = child1;
  if (child1)
    child1->node_info.parents += 1;
  res->nterm.children[2] = child2;
  if (child2)
    child2->node_info.parents += 1;
  return res;
}

static node_t *
new_term (char *text)
{
  node_t *res = malloc (sizeof *res);
  res->term.is_nterm = 0;
  res->term.parents = 0;
  res->term.text = text;
  return res;
}

static void
free_node (node_t *node)
{
  if (!node)
    return;
  node->node_info.parents -= 1;
  /* Free only if 0 (last parent) or -1 (no parents).  */
  if (node->node_info.parents > 0)
    return;
  if (node->node_info.is_nterm == 1)
    {
      free_node (node->nterm.children[0]);
      free_node (node->nterm.children[1]);
      free_node (node->nterm.children[2]);
    }
  else
    free (node->term.text);
  free (node);
}

static char *
node_to_string (const node_t *node)
{
  char *res;
  if (!node)
    res = strdup ("");
  else if (node->node_info.is_nterm)
    {
      char *child0 = node_to_string (node->nterm.children[0]);
      char *child1 = node_to_string (node->nterm.children[1]);
      char *child2 = node_to_string (node->nterm.children[2]);
      res = malloc (strlen (node->nterm.form) + strlen (child0)
                    + strlen (child1) + strlen (child2) + 1);
      sprintf (res, node->nterm.form, child0, child1, child2);
      free (child2);
      free (child1);
      free (child0);
    }
  else
    res = strdup (node->term.text);
  return res;
}

static void
node_print (FILE *out, const node_t *n)
{
  char *str = node_to_string (n);
  fputs (str, out);
  free (str);
}


static node_t *
stmt_merge (YYSTYPE x0, YYSTYPE x1)
{
  return new_nterm ("<OR>(%s, %s)", x0.stmt, x1.stmt, NULL);
}

static int
process (const char *file)
{
  int is_stdin = !file || strcmp (file, "-") == 0;
  if (is_stdin)
    input = stdin;
  else
    input = fopen (file, "r");
  assert (input);
  int status = yyparse ();
  if (!is_stdin)
    fclose (input);
  return status;
}

int
main (int argc, char **argv)
{
  if (getenv ("YYDEBUG"))
    yydebug = 1;

  int ran = 0;
  for (int i = 1; i < argc; ++i)
    // Enable parse traces on option -p.
    if (strcmp (argv[i], "-p") == 0)
      yydebug = 1;
    else
      {
        int status = process (argv[i]);
        ran = 1;
        if (!status)
          return status;
      }

  if (!ran)
    {
      int status = process (NULL);
      if (!status)
        return status;
    }
  return 0;
}



* Re: Questions about tree-sitter
  2023-09-09 16:39           ` Augustin Chéneau (BTuin)
@ 2023-09-12  0:22             ` Yuan Fu
  2023-09-13 12:43               ` Augustin Chéneau (BTuin)
  0 siblings, 1 reply; 21+ messages in thread
From: Yuan Fu @ 2023-09-12  0:22 UTC (permalink / raw)
  To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel



> On Sep 9, 2023, at 9:39 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
> 
> Le 08/09/2023 à 18:43, Yuan Fu a écrit :
>>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>> 
>>> Le 06/09/2023 à 06:07, Yuan Fu a écrit :
>>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>>> Yuan
>>> 
>>> Thanks a lot!
>>> 
>>> I did some tests and it's working pretty well.
>> Awesome!
> 
> 
> It seems I spoke a bit too soon  :(
> When I edit the buffer, sometimes there is an offset between the text and the nodes after modifying the buffer, or the syntax highlighting breaks in C code.
> 
> I attached an example Bison file if needed.

Thanks. I was able to reproduce this, but then couldn’t. I’ll keep looking into it; if you find out anything new, please let me know.

> 
>>> I have a few issues though:
>>> 
>>> - I first defined `treesit-language-at-point-function` using
>>> `treesit-node-at`.  However, `treesit-node-at` itself uses
>>> `treesit-language-at-point-function` which causes an infinite recursion.
>>> So I instead used `treesit-local-parsers-at` to check if a local parser is used.  Is it a good solution?
>> No no, you should use the host language’s parser (bison) and see if point is in an undelimited_code_block, and return c or bison accordingly. I’ll highlight this in the docstring, thanks.
> 
> So I need to call `treesit-node-at` with `'bison` as the value for PARSER-OR-LANG to see in which node I am?
> Then I think there is a problem with `treesit-node-at`, because it always calls `treesit-language-at` even if PARSER-OR-LANG is provided.
> I propose a fix in the attached patch.

You are right. I applied a similar fix. It should be good now. Thanks!

Yuan



* Re: Questions about tree-sitter
  2023-09-12  0:22             ` Yuan Fu
@ 2023-09-13 12:43               ` Augustin Chéneau (BTuin)
  2023-09-14  4:11                 ` Yuan Fu
  0 siblings, 1 reply; 21+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-13 12:43 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

Le 12/09/2023 à 02:22, Yuan Fu a écrit :
> 
> 
>> On Sep 9, 2023, at 9:39 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>
>> Le 08/09/2023 à 18:43, Yuan Fu a écrit :
>>>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>>>
>>>> Le 06/09/2023 à 06:07, Yuan Fu a écrit :
>>>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>>>> Yuan
>>>>
>>>> Thanks a lot!
>>>>
>>>> I did some tests and it's working pretty well.
>>> Awesome!
>>
>>
>> It seems I spoke a bit too soon  :(
>> When I edit the buffer, sometimes there is an offset between the text and the nodes after modifying the buffer, or the syntax highlighting breaks in C code.
>>
>> I attached an example Bison file if needed.
> 
> Thanks. I was able to reproduce this, but then couldn’t. I’ll keep looking into it; if you find out anything new, please let me know.
> 

It may be unrelated, but I sometimes see this pop up in *Messages*:

Error during redisplay: (jit-lock-function 1410) signaled 
(treesit-load-language-error not-found ("libtree-sitter-nil" 
"libtree-sitter-nil.0" "libtree-sitter-nil.0.0" "libtree-sitter-nil.so" 
"libtree-sitter-nil.so.0" "libtree-sitter-nil.so.0.0") "No such file or 
directory")


>>
>>>> I have a few issues though:
>>>>
>>>> - I first defined `treesit-language-at-point-function` using
>>>> `treesit-node-at`.  However, `treesit-node-at` itself uses
>>>> `treesit-language-at-point-function` which causes an infinite recursion.
>>>> So I instead used `treesit-local-parsers-at` to check if a local parser is used.  Is it a good solution?
>>> No no, you should use the host language’s parser (bison) and see if point is in an undelimited_code_block, and return c or bison accordingly. I’ll highlight this in the docstring, thanks.
>>
>> So I need to call `treesit-node-at` with `'bison` as the value for PARSER-OR-LANG to see in which node I am?
>> Then I think there is a problem with `treesit-node-at`, because it always calls `treesit-language-at` even if PARSER-OR-LANG is provided.
>> I propose a fix in the attached patch.
> 
> You are right. I applied a similar fix. It should be good now. Thanks!

Thanks!







* Re: Questions about tree-sitter
  2023-09-13 12:43               ` Augustin Chéneau (BTuin)
@ 2023-09-14  4:11                 ` Yuan Fu
  2023-09-18 17:04                   ` Augustin Chéneau (BTuin)
  0 siblings, 1 reply; 21+ messages in thread
From: Yuan Fu @ 2023-09-14  4:11 UTC (permalink / raw)
  To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel



> On Sep 13, 2023, at 5:43 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
> 
> Le 12/09/2023 à 02:22, Yuan Fu a écrit :
>>> On Sep 9, 2023, at 9:39 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>> 
>>> Le 08/09/2023 à 18:43, Yuan Fu a écrit :
>>>>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>>>> 
>>>>> Le 06/09/2023 à 06:07, Yuan Fu a écrit :
>>>>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>>>>> Yuan
>>>>> 
>>>>> Thanks a lot!
>>>>> 
>>>>> I did some tests and it's working pretty well.
>>>> Awesome!
>>> 
>>> 
>>> It seems I spoke a bit too soon  :(
>>> When I edit the buffer, sometimes there is an offset between the text and the nodes after modifying the buffer, or the syntax highlighting breaks in C code.
>>> 
>>> I attached an example Bison file if needed.
>> Thanks. I was able to reproduce this, but then couldn’t. I’ll keep looking into it; if you find out anything new, please let me know.
> 
> It may be unrelated, but I have this popping in *Messages* sometimes:
> 
> Error during redisplay: (jit-lock-function 1410) signaled (treesit-load-language-error not-found ("libtree-sitter-nil" "libtree-sitter-nil.0" "libtree-sitter-nil.0.0" "libtree-sitter-nil.so" "libtree-sitter-nil.so.0" "libtree-sitter-nil.so.0.0") "No such file or directory")

Thanks. I’ve fixed that and some other problems. Please pull master and try it out. Now bison-ts-mode works pretty well for me. I can’t reproduce the offset problem anymore; maybe one of the fixes I made took care of it. Anyway, let me know if you observe it again.

Yuan






* Re: Questions about tree-sitter
  2023-09-14  4:11                 ` Yuan Fu
@ 2023-09-18 17:04                   ` Augustin Chéneau (BTuin)
  2023-09-19  4:00                     ` Yuan Fu
  0 siblings, 1 reply; 21+ messages in thread
From: Augustin Chéneau (BTuin) @ 2023-09-18 17:04 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2910 bytes --]

Le 14/09/2023 à 06:11, Yuan Fu a écrit :
> 
> 
>> On Sep 13, 2023, at 5:43 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>
>> On 12/09/2023 at 02:22, Yuan Fu wrote:
>>>> On Sep 9, 2023, at 9:39 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>>>
>>>> On 08/09/2023 at 18:43, Yuan Fu wrote:
>>>>>> On Sep 8, 2023, at 4:53 AM, Augustin Chéneau (BTuin) <btuin@mailo.com> wrote:
>>>>>>
>>>>>> On 06/09/2023 at 06:07, Yuan Fu wrote:
>>>>>>> I added local parser support to master. If everything goes right, you just need to add a :local t flag in treesit-range-rules. Check out the modified bison-ts-mode.el that I hacked up for an example. BTW, it’s vital that you define treesit-language-at-point-function for a multi-language mode.
>>>>>>> Yuan
>>>>>>
>>>>>> Thanks a lot!
>>>>>>
>>>>>> I did some tests and it's working pretty well.
>>>>> Awesome!
>>>>
>>>>
>>>> It seems I spoke a bit too soon  :(
>>>> When I edit the buffer, sometimes there is an offset between the text and the nodes after modifying the buffer, or the syntax highlighting breaks in C code.
>>>>
>>>> I attached an example Bison file if needed.
>>> Thanks. I was able to reproduce this, but then can’t. I’ll keep looking into this, if you found out something new please let me know.
>>
>> It may be unrelated, but I have this popping in *Messages* sometimes:
>>
>> Error during redisplay: (jit-lock-function 1410) signaled (treesit-load-language-error not-found ("libtree-sitter-nil" "libtree-sitter-nil.0" "libtree-sitter-nil.0.0" "libtree-sitter-nil.so" "libtree-sitter-nil.so.0" "libtree-sitter-nil.so.0.0") "No such file or directory")
> 
> Thanks. I’ve fixed that and some other problems. Please pull master and try it out. Now bison-ts-mode works pretty well for me. I can’t reproduce the offset problem anymore, maybe it’s fixed in some of the fixes I made. Anyway, let me know if you observe it again.
> 
> Yuan
> 

It indeed works much better, thanks!

I found a bug and a way to replicate it (you'll need to update your 
Bison grammar):
- Open the file "treesit-bug-highlighting-demo";
- Enable bison-ts-mode;
- At the beginning of the second line (the part managed by the embedded 
C parser, with "static void ..."), add a space;

=> The whole line loses its highlighting.

If you add a space again, the highlighting works correctly again.
Not a big issue, but pretty weird.


Also, I have one (last?) question:

Since the C code uses its own indentation, it's entirely independent of 
Bison's nodes positions.
Is it possible to add an offset to the indentation of the embedded 
parts, relative to its container node?

For instance, rather than:

%%
grammar_declaration:
       grammar_rule
	    {
int myvar;
	    }
     ;
%%



I would like to get



%%
grammar_declaration:
       grammar_rule
	    {
	      int myvar;
	    }
     ;
%%


("int myvar;" is managed by a C parser).

[-- Attachment #2: bison-ts-mode.el --]
[-- Type: text/x-emacs-lisp, Size: 5645 bytes --]

;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-

;;; Commentary:

;;; Code:

(require 'treesit)
(require 'c-ts-mode)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")


(defgroup bison nil
  "Support for Bison and Flex."
  :group 'languages)

(defcustom bison-ts-mode-indent-offset 4
  "Number of spaces for each indentation step in `bison-ts-mode'."
  :version "30.1"
  :type 'integer
  :safe 'integerp
  :group 'bison)


(defun treesit--merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
  (let ((res ()))
    (while (or l1 l2)
      ;; `push' already updates RES, so no surrounding `setq' is needed.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1) l2 (cdr l2)))
    (nreverse res)))


(defun bison-ts--language-at-point-function (position)
  "Return the language at POSITION."
  (let* ((node (treesit-node-at position 'bison)))
    (if (equal (treesit-node-type node)
	       "embedded_code")
	'c
      'bison)))

(defun bison-ts--font-lock-settings (language)
  "Return the tree-sitter font-lock settings for LANGUAGE."
  (treesit-font-lock-rules
   :language language
   :feature 'bison-comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'bison-declaration
   '((declaration (declaration_name) @font-lock-keyword-face))

   :language language
   :feature 'bison-type
   '((type) @font-lock-type-face)

   :language language
   :feature 'bison-grammar-rule-usage
   '((grammar_rule_identifier) @font-lock-variable-use-face)

   :language language
   :feature 'bison-grammar-rule-declaration
   '((grammar_rule (grammar_rule_declaration)
		   @font-lock-variable-use-face))

   :language language
   :feature 'bison-string
   :override t
   '((string) @font-lock-string-face)

   :language language
   :feature 'bison-literal
   :override t
   '((char_literal) @font-lock-keyword-face
     (number_literal) @font-lock-number-face)

   :language language
   :feature 'bison-directive-grammar-rule
   :override t
   '((grammar_rule (directive) @font-lock-keyword-face))

   :language language
   :feature 'bison-operator
   :override t
   '(["|"] @font-lock-operator-face)

   :language language
   :feature 'bison-delimiter
   :override t
   '([";"] @font-lock-delimiter-face)))


(treesit-query-validate 'bison '((grammar_rule (grammar_rule_declaration)  @font-lock-variable-name-face)))

(defvar bison-ts-mode--font-lock-feature-list
  '(( bison-comment bison-declaration bison-type
      bison-grammar-rule-usage bison-grammar-rule-declaration
      bison-string bison-literal bison-directive-grammar-rule
      bison-operator bison-delimiter)))



(defun bison-ts--indent-rules ()
  "Indent rules supported by `bison-ts-mode'."
  (let*
      ((common
	 `(

	   ((node-is "^declaration$")
	    column-0 0)

	   ((and (parent-is "^declaration$")
		 (not (node-is "^code_block$")))
	    column-0 2)

	   ((and (parent-is "^declaration$")
		 (node-is "^code_block$"))
	    column-0 0)

	   ((parent-is "^declaration$")
	    parent 2)

	   ((node-is "^grammar_rule$")
	    column-0 0)

	   ((and
	     (parent-is "^grammar_rule$")
	     (node-is ";"))
	    column-0 bison-ts-mode-indent-offset)

	   ((and (parent-is "^grammar_rule$")
		 (node-is "|"))
	    column-0 bison-ts-mode-indent-offset)

	   ((and (parent-is "^grammar_rule$")
		 (not (node-is "^grammar_rule_declaration$"))
		 (not (node-is "^action$")))
	    column-0 ,(+ bison-ts-mode-indent-offset 2))

	   ((or
	     (node-is "^action$")
	     (node-is "}"))
	    column-0 12)

	   ;; Set '%%' at the beginning of the line
	   ((or
	     (and (parent-is "^grammar_rules_section$")
		  (node-is "%%"))
	     (node-is "^grammar_rules_section$"))
	    column-0 0)

	   (no-node parent 0)
	   )
	 ))
    `((bison . ,common))))


(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A mode for Bison."
  (when (treesit-ready-p 'bison)
    (setq-local treesit-font-lock-settings
                (append (bison-ts--font-lock-settings 'bison)
                        (c-ts-mode--font-lock-settings 'c)))

    (setq-local treesit-font-lock-feature-list
                (treesit--merge-feature-lists
		 bison-ts-mode--font-lock-feature-list
		 c-ts-mode--feature-list))

    (setq-local treesit-simple-imenu-settings
		`(("Grammar"
		   "\\`grammar_rule_declaration\\'"
		   nil
		   (lambda (node) (treesit-node-text node) ))))

    (setq-local treesit-simple-indent-rules
                (append (c-ts-mode--get-indent-style 'c)
			(bison-ts--indent-rules)))

    (setq-local treesit-language-at-point-function 'bison-ts--language-at-point-function)

    (setq-local treesit-range-settings
		(treesit-range-rules
		 :embed 'c
		 :host 'bison
		 :local t
		 '((embedded_code) @capture)
		 ))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here

[-- Attachment #3: treesit-bug-highlighting-demo --]
[-- Type: text/plain, Size: 83 bytes --]

%{
   static void print_token (yytoken_kind_t token, YYSTYPE val);
%}

%%
rule: a;


* Re: Questions about tree-sitter
  2023-09-18 17:04                   ` Augustin Chéneau (BTuin)
@ 2023-09-19  4:00                     ` Yuan Fu
  0 siblings, 0 replies; 21+ messages in thread
From: Yuan Fu @ 2023-09-19  4:00 UTC (permalink / raw)
  To: "Augustin Chéneau (BTuin)"; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 1351 bytes --]

> 
> It indeed works much better, thanks!
> 
> I found a bug and a way to replicate it (you'll need to update your Bison grammar):
> - Open the file "treesit-bug-highlighting-demo";
> - Enable bison-ts-mode;
> - At the beginning of the second line (the part managed by the embedded C parser, with "static void ..."), add a space;
> 
> => The whole line loses its highlighting.
> 
> If you add a space again, the highlighting works correctly again.
> Not a big issue, but pretty weird.

Thanks. Weird indeed. I found the bug and fixed it. Latest master should work fine now.

> 
> Also, I have one (last?) question:
> 
> Since the C code uses its own indentation, it's entirely independent of Bison's nodes positions.
> Is it possible to add an offset to the indentation of the embedded parts, relative to its container node?
> 
> For instance, rather than:
> 
> %%
> grammar_declaration:
>      grammar_rule
>    {
> int myvar;
>    }
>    ;
> %%
> 
> 
> 
> I would like to get
> 
> 
> 
> %%
> grammar_declaration:
>      grammar_rule
>    {
>      int myvar;
>    }
>    ;
> %%
> 
> 
> ("int myvar;" is managed by a C parser).

Makes sense. You can use a custom anchor function to indent the top-level nodes in the C code. I modified your mode as a proof of concept; the modified parts are marked with "!!!".
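
As a minimal sketch of the idea (not the attached file itself): an indent rule has the shape (MATCHER ANCHOR OFFSET), and treesit-simple-indent uses the first rule that matches, so the override for top-level C nodes has to come before the imported C rules. The names prefixed with my/ are hypothetical and exist only for this illustration; the 'bison language symbol and the "translation_unit" node type are taken from the mode above.

```elisp
;; Hypothetical names (my/...) used for illustration only.
(defun my/host-parent-start (_node _parent bol &rest _)
  "Anchor: start position of the parent of the Bison node at BOL."
  (treesit-node-start
   (treesit-node-parent (treesit-node-at bol 'bison))))

(defun my/prepend-embedded-rule (lang offset base-rules)
  "Return indent rules for LANG with a top-level override before BASE-RULES.
The override anchors direct children of the embedded root node to the
host node's parent and shifts them by OFFSET columns."
  `(,lang
    ;; Must come first: the first matching rule wins.
    ((parent-is "translation_unit") my/host-parent-start ,offset)
    ,@base-rules))
```

With this helper, (my/prepend-embedded-rule 'c 4 RULES) yields one alist entry suitable for treesit-simple-indent-rules, where RULES stands for whatever C rules the mode imports.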

Yuan


[-- Attachment #2: bison-ts-mode.el --]
[-- Type: application/octet-stream, Size: 6305 bytes --]

;;; bison-ts-mode --- Tree-sitter mode for Bison -*- lexical-binding: t; -*-

;;; Commentary:

;;; Code:

(require 'treesit)
(require 'c-ts-mode)

(declare-function treesit-parser-create "treesit.c")
(declare-function treesit-induce-sparse-tree "treesit.c")
(declare-function treesit-node-child-by-field-name "treesit.c")
(declare-function treesit-search-subtree "treesit.c")
(declare-function treesit-node-parent "treesit.c")
(declare-function treesit-node-next-sibling "treesit.c")
(declare-function treesit-node-type "treesit.c")
(declare-function treesit-node-child "treesit.c")
(declare-function treesit-node-end "treesit.c")
(declare-function treesit-node-start "treesit.c")
(declare-function treesit-node-string "treesit.c")
(declare-function treesit-query-compile "treesit.c")
(declare-function treesit-query-capture "treesit.c")
(declare-function treesit-parser-add-notifier "treesit.c")
(declare-function treesit-parser-buffer "treesit.c")
(declare-function treesit-parser-list "treesit.c")


(defgroup bison nil
  "Support for Bison and Flex."
  :group 'languages)

(defcustom bison-ts-mode-indent-offset 4
  "Number of spaces for each indentation step in `bison-ts-mode'."
  :version "30.1"
  :type 'integer
  :safe 'integerp
  :group 'bison)


(defun treesit--merge-feature-lists (l1 l2)
  "Merge the lists of lists L1 and L2.
The first sublist of L1 is merged with the first sublist of L2 and so on.
L1 and L2 don't need to have the same size."
  (let ((res ()))
    (while (or l1 l2)
      ;; `push' already updates RES, so no surrounding `setq' is needed.
      (push (append (car l1) (car l2)) res)
      (setq l1 (cdr l1) l2 (cdr l2)))
    (nreverse res)))


(defun bison-ts--language-at-point-function (position)
  "Return the language at POSITION."
  (let* ((node (treesit-node-at position 'bison)))
    (if (equal (treesit-node-type node)
	           "embedded_code")
	    'c
      'bison)))

(defun bison-ts--font-lock-settings (language)
  "Return the tree-sitter font-lock settings for LANGUAGE."
  (treesit-font-lock-rules
   :language language
   :feature 'bison-comment
   '((comment) @font-lock-comment-face)

   :language language
   :feature 'bison-declaration
   '((declaration (declaration_name) @font-lock-keyword-face))

   :language language
   :feature 'bison-type
   '((type) @font-lock-type-face)

   :language language
   :feature 'bison-grammar-rule-usage
   '((grammar_rule_identifier) @font-lock-variable-use-face)

   :language language
   :feature 'bison-grammar-rule-declaration
   '((grammar_rule (grammar_rule_declaration)
		           @font-lock-variable-use-face))

   :language language
   :feature 'bison-string
   :override t
   '((string) @font-lock-string-face)

   :language language
   :feature 'bison-literal
   :override t
   '((char_literal) @font-lock-keyword-face
     (number_literal) @font-lock-number-face)

   :language language
   :feature 'bison-directive-grammar-rule
   :override t
   '((grammar_rule (directive) @font-lock-keyword-face))

   :language language
   :feature 'bison-operator
   :override t
   '(["|"] @font-lock-operator-face)

   :language language
   :feature 'bison-delimiter
   :override t
   '([";"] @font-lock-delimiter-face)))


(treesit-query-validate 'bison '((grammar_rule (grammar_rule_declaration)  @font-lock-variable-name-face)))

(defvar bison-ts-mode--font-lock-feature-list
  '(( bison-comment bison-declaration bison-type
      bison-grammar-rule-usage bison-grammar-rule-declaration
      bison-string bison-literal bison-directive-grammar-rule
      bison-operator bison-delimiter)))

;; !!! New anchor function
(defun bison-ts-mode--bison-parent (_node _parent bol &rest _)
  "Return the start position of the parent of the Bison node at BOL."
  (treesit-node-start (treesit-node-parent (treesit-node-at bol 'bison))))

(defun bison-ts--indent-rules ()
  "Indent rules supported by `bison-ts-mode'."
  (let*
      ((common
	    `(((node-is "^declaration$")
	       column-0 0)

	      ((and (parent-is "^declaration$")
		        (not (node-is "^code_block$")))
	       column-0 2)

	      ((and (parent-is "^declaration$")
		        (node-is "^code_block$"))
	       column-0 0)

	      ((parent-is "^declaration$")
	       parent 2)

	      ((node-is "^grammar_rule$")
	       column-0 0)

	      ((and
	        (parent-is "^grammar_rule$")
	        (node-is ";"))
	       column-0 bison-ts-mode-indent-offset)

	      ((and (parent-is "^grammar_rule$")
		        (node-is "|"))
	       column-0 bison-ts-mode-indent-offset)

	      ((and (parent-is "^grammar_rule$")
		        (not (node-is "^grammar_rule_declaration$"))
		        (not (node-is "^action$")))
	       column-0 ,(+ bison-ts-mode-indent-offset 2))

	      ((or
	        (node-is "^action$")
	        (node-is "}"))
	       column-0 12)

	      ;; Set '%%' at the beginning of the line
	      ((or
	        (and (parent-is "^grammar_rules_section$")
		         (node-is "%%"))
	        (node-is "^grammar_rules_section$"))
	       column-0 0)

	      (no-node parent 0)
	      )
	    ))
    `((bison . ,common)
      ;; !!! Import and override C rules.
      (c
       ((parent-is "translation_unit")
        bison-ts-mode--bison-parent ,bison-ts-mode-indent-offset)
       ,@(alist-get 'c (c-ts-mode--get-indent-style 'c))))))


(define-derived-mode bison-ts-mode prog-mode "Bison"
  "A mode for Bison."
  (when (treesit-ready-p 'bison)
    (setq-local treesit-font-lock-settings
                (append (bison-ts--font-lock-settings 'bison)
                        (c-ts-mode--font-lock-settings 'c)))

    (setq-local treesit-font-lock-feature-list
                (treesit--merge-feature-lists
		         bison-ts-mode--font-lock-feature-list
		         c-ts-mode--feature-list))

    (setq-local treesit-simple-imenu-settings
		        `(("Grammar"
		           "\\`grammar_rule_declaration\\'"
		           nil
		           (lambda (node) (treesit-node-text node) ))))

    (setq-local treesit-simple-indent-rules
                ;; !!! C rules are imported already.
                (bison-ts--indent-rules))

    (setq-local treesit-language-at-point-function 'bison-ts--language-at-point-function)

    (setq-local treesit-range-settings
		        (treesit-range-rules
		         :embed 'c
		         :host 'bison
		         :local t
		         '((embedded_code) @capture)
		         ))

    (treesit-major-mode-setup)))

(provide 'bison-ts-mode)
;;; bison-ts-mode.el ends here






* Re: Questions about tree-sitter
  2023-09-01 10:58         ` Eli Zaretskii
@ 2023-11-27  7:16           ` Madhu
  0 siblings, 0 replies; 21+ messages in thread
From: Madhu @ 2023-11-27  7:16 UTC (permalink / raw)
  To: eliz; +Cc: emacs-devel

*  Eli Zaretskii  <83zg26b1nz.fsf @gnu.org>
Wrote on Fri, 01 Sep 2023 13:58:24 +0300
>> Date: Fri, 01 Sep 2023 14:45:31 +0530 (IST)
>> Cc: emacs-devel@gnu.org
>> From: Madhu <enometh @meer.net>
>> *  Eli Zaretskii <834jkecrl1.fsf @gnu.org>
>> Wrote on Fri, 01 Sep 2023 09:53:14 +0300
>> >> From: Madhu <enometh @meer.net>
>> >> Date: Fri, 01 Sep 2023 08:09:27 +0530
>> >> * Yuan Fu <C3EFD02D-F02F-4BE8-A6F4-A2506A9EFC90 @gmail.com> :
>> >> Wrote on Wed, 30 Aug 2023 00:03:03 -0700:
>> >> >> On Aug 29, 2023, at 2:26 PM, Augustin Chéneau (BTuin) <btuin @mailo.com> wrote:
>> >> >> 1. Is there a way to reload a grammar?  Emacs is pretty nice
>> >> >> as a playground for testing grammars, but once a grammar is
>> >> >> loaded, it won't be loaded again until Emacs restarts (as far
>> >> >> as I know).  Is it possible to reload a grammar after
>> >> >> modifying it?
>> >> >
>> >> > No, and it’s probably not easy to implement either, since unloading
>> >> > the grammar would require Emacs to purge/invalidate all the
>> >> > node/query/parsers using that grammar.

I ran into this when a "wrong" tree-sitter DLL got loaded and had an
undefined symbol.  In this situation there is no invalid state to
purge, and all I needed was a way to call dlclose from Elisp.  The
shared library loading and unloading mechanism could have been exposed
to the user, in the spirit of the Lisp machine.

[There may be an argument for not letting the user shoot himself in the
foot, but it would not apply in this context; the security argument
only amounts to locking down the freedoms of the user, retaining
control with developers and consolidating power in a particular
direction.]

> It is not a change, no.  See above: we already use quite a few of
> libraries for specific jobs related to important Emacs
> functionalities.  For example, good support for sophisticated text
> display and shaping features is unimaginable without HarfBuzz, and
> some scripts cannot even be displayed in a reasonably legible way
> without it.

[But there is a fundamental sense in which the use of tree-sitter
modules ("plugins" which are designed to be loaded and unloaded) is
different from the base library examples you give, which I hope you
can appreciate.]





Thread overview: 21+ messages
2023-08-29 21:26 Questions about tree-sitter Augustin Chéneau (BTuin)
2023-08-30  7:03 ` Yuan Fu
2023-08-30 11:28   ` Augustin Chéneau (BTuin)
2023-09-06  4:07     ` Yuan Fu
2023-09-08 11:53       ` Augustin Chéneau (BTuin)
2023-09-08 16:43         ` Yuan Fu
2023-09-09 16:39           ` Augustin Chéneau (BTuin)
2023-09-12  0:22             ` Yuan Fu
2023-09-13 12:43               ` Augustin Chéneau (BTuin)
2023-09-14  4:11                 ` Yuan Fu
2023-09-18 17:04                   ` Augustin Chéneau (BTuin)
2023-09-19  4:00                     ` Yuan Fu
2023-09-01  2:39   ` Madhu
2023-09-01  6:53     ` Eli Zaretskii
2023-09-01  9:15       ` Madhu
2023-09-01 10:45         ` Dmitry Gutov
2023-09-01 10:58         ` Eli Zaretskii
2023-11-27  7:16           ` Madhu
2023-09-06 16:11   ` Lynn Winebarger
2023-09-07 23:42     ` Yuan Fu
2023-09-08  0:11       ` Lynn Winebarger
