Creating a paradigm for leveraging Tree Sitter's power

unofficial mirror of emacs-devel@gnu.org 

* Creating a paradigm for leveraging Tree Sitter's power
@ 2022-12-24  1:32 Perry Smith
  2022-12-24  9:09 ` Yuan Fu
  0 siblings, 1 reply; 8+ messages in thread
From: Perry Smith @ 2022-12-24  1:32 UTC (permalink / raw)
  To: emacs-devel

I've seen on this list talk about "adding" Tree Sitter concepts to
such things as mark-defun and other existing Emacs commands.

I've just spent a few days writing rtsn-mark-method, which I intend to
add to ruby-ts-mode.  It implements all the features of mark-defun,
but I wrote it from scratch.  This is mostly out of ignorance of how to
leverage existing Emacs features and facilities.

One more concrete reason is mark-defun will include the comments
before the defun.  I wanted the same for mark-method but (as far as I
can tell) the hooks back into mark-defun and its associated routines
are simple regular expressions and that, to me, walks away from the
power that Tree Sitter is providing.

I've got rtsn-mark-method working but I plan to rework it over the
next few days.  The "arg" as well as the "interactive" features of
mark-defun which I also put into rtsn-mark-method I believe can be
pulled out to a wrapper routine.  There are actually quite a few
features in the code that I did not know about.  For example, if
mark-defun is called with -1 (or -N) it marks N back.  Ok.  But now if
it is called again with no argument, mark-defun knows to go backwards
to add in the previous defun.  The same is true in the forward
direction as well.  (Although it does have a subtle bug in my opinion
but I digress.)

To repeat myself, I believe these higher level features could be
separated into a wrapper function so that all that would be needed for
the language specific piece is a routine that would be passed a point
and a direction.

I'll call this the "primitive routine".  The routine would be
responsible for returning a beginning and end (in a cons cell) and it
would be the routine's responsibility to make sure that the beginning
and end lie after (in the forward case) or before (in the backward
case) the point that is passed in.

The wrapper routine would then know how to properly adjust the mark
and point, execute counts, add to regions, catch errors, etc.
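
A minimal sketch of that split, written in Python rather than Emacs
Lisp so that it does not pretend to be any real Emacs or treesit API.
The names make_primitive and mark, and the hard-coded ranges standing
in for parser nodes, are assumptions made up for illustration.

```python
from typing import Callable, List, Optional, Tuple

Range = Tuple[int, int]
# A primitive is passed a point and a direction, and returns (beg, end) or None.
Primitive = Callable[[int, int], Optional[Range]]


def make_primitive(node_ranges: List[Range]) -> Primitive:
    """Build a toy primitive from sorted, non-overlapping (beg, end) ranges.

    The ranges stand in for the nodes a parser would report for one kind
    of thing (here: methods); they are hard-coded, hypothetical data.
    """
    def primitive(point: int, direction: int) -> Optional[Range]:
        if direction > 0:
            # Nearest range that lies entirely at or after point.
            ahead = [r for r in node_ranges if r[0] >= point]
            return ahead[0] if ahead else None
        # Nearest range that lies entirely at or before point.
        behind = [r for r in node_ranges if r[1] <= point]
        return behind[-1] if behind else None
    return primitive


def mark(primitive: Primitive, point: int, count: int = 1) -> Optional[Range]:
    """Generic wrapper: return the region covering |count| things,
    searching forward when count > 0 and backward when count < 0."""
    direction = 1 if count > 0 else -1
    region: Optional[Range] = None
    cursor = point
    for _ in range(abs(count)):
        found = primitive(cursor, direction)
        if found is None:
            break  # Ran out of things; keep whatever was marked so far.
        if region is None:
            region = found
        else:
            region = (min(region[0], found[0]), max(region[1], found[1]))
        # Continue the search from the far edge of what was just found.
        cursor = found[1] if direction > 0 else found[0]
    return region


methods = make_primitive([(10, 40), (50, 90), (100, 130)])
print(mark(methods, 45, 2))    # region over the next two methods
print(mark(methods, 45, -1))   # region over the previous method
```

The primitive knows nothing about counts or region growth; the wrapper
knows nothing about what a "method" is.  Remembering the direction
across repeated invocations is left out of this sketch.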

Broadening the picture: navigation, transpose, and other Emacs
commands could likewise call the same primitive routine to provide
transpose-method, forward-method, etc.  And, broadening the picture
once more: primitive routines could be written not just for methods
(defun) but also for statements, arguments, expressions, classes, etc.
In all cases, the primitive routine would be relatively simple.  It
would be given a point and return a ( begin . end ) cons cell leaving
all the harder work of expanding the region, remembering the
direction, etc. to the wrapper routines.

To be clear, there would be a wrapper routine for mark, a wrapper for
forward, transpose, etc.  We would end up with the classic:

For A + B number of routines we would have A x B number of commands.
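
The A + B versus A x B arithmetic can be sketched as a small table
lookup.  This is a Python model with made-up positions, not Emacs
code; next_in, forward, mark and the command names it generates are
hypothetical.

```python
from functools import partial
from typing import Callable, Dict, List, Optional, Tuple

Range = Tuple[int, int]


def next_in(ranges: List[Range], point: int) -> Optional[Range]:
    """Toy primitive body: first range starting at or after point."""
    return next((r for r in ranges if r[0] >= point), None)


# B primitives, one per kind of thing (positions are made up).
primitives: Dict[str, Callable[[int], Optional[Range]]] = {
    "method": partial(next_in, [(10, 40), (50, 90)]),
    "statement": partial(next_in, [(12, 20), (22, 38), (52, 88)]),
}


# A wrappers, one per action; each takes a primitive and a point.
def forward(primitive, point):
    found = primitive(point)
    return found[1] if found else point  # new point: end of the next thing


def mark(primitive, point):
    return primitive(point)  # region covering the next thing


wrappers = {"forward": forward, "mark": mark}

# A + B definitions yield A x B commands.
commands = {
    f"{action}-{thing}": partial(wrapper, primitive)
    for action, wrapper in wrappers.items()
    for thing, primitive in primitives.items()
}

print(sorted(commands))
print(commands["forward-statement"](21))
```

Two wrappers and two primitives give four commands; adding a third
primitive would add two more commands without touching the wrappers.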

Does this strike others as a good idea or insanity?

Perry


* Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24  1:32 Creating a paradigm for leveraging Tree Sitter's power Perry Smith
@ 2022-12-24  9:09 ` Yuan Fu
  2022-12-24 10:07   ` Theodor Thornhill
  2022-12-24 14:32   ` Perry Smith
  0 siblings, 2 replies; 8+ messages in thread
From: Yuan Fu @ 2022-12-24  9:09 UTC (permalink / raw)
  To: Perry Smith; +Cc: emacs-devel



> On Dec 23, 2022, at 5:32 PM, Perry Smith <pedz@easesoftware.com> wrote:
> 
> I've seen on this list talk about "adding" Tree Sitter concepts to
> such things as mark-defun and other existing Emacs commands.
> 
> I've just spent a few days writing rtsn-mark-method that I intend on
> adding to ruby-ts-mode which implements all the features of mark-defun
> but I did it from scratch.  This is mostly out of ignorance on how to
> leverage existing Emacs features and facilities.
> 
> One more concrete reason is mark-defun will include the comments
> before the defun.  I wanted the same for mark-method but (as far as I
> can tell) the hooks back into mark-defun and its associated routines
> are simple regular expressions and that, to me, walks away from the
> power that Tree Sitter is providing.
> 
> I've got rtsn-mark-method working but I plan to rework it over the
> next few days.  The "arg" as well as the "interactive" features of
> mark-defun which I also put into rtsn-mark-method I believe can be
> pulled out to a wrapper routine.  There are actually quite a few
> features in the code that I did not know about.  For example, if
> mark-defun is called with -1 (or -N) it marks N back.  Ok.  But now if
> it is called again with no argument, mark-defun knows to go backwards
> to add in the previous defun.  The same is true in the forward
> direction as well.  (Although it does have a subtle bug in my opinion
> but I digress.)
> 
> To repeat myself, I believe these higher level features could be
> separated into a wrapper function so that all that would be needed for
> the language specific piece is a routine that would be passed a point
> and a direction.

I think it makes a lot of sense.

> 
> I'll call this the "primitive routine".  The routine would be
> responsible for returning a beginning and end (in a cons cell) and it
> would be the routine's responsibility to make sure that the beginning
> and end lie after (in the forward case) or before (in the backward
> case) the point that is passed in.

You mean beginning and end of (symbol | string | statement | …)?

From my experience implementing defun navigation for tree-sitter, it might be more helpful to return three ranges: the thing before point, the thing at point, and the thing after point, any of which could be nil if no such thing exists. For nested things it can be prev-sibling, parent, next-sibling instead. The point is that the user can move back and forward and make decisions easily with this “field of view”.
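
A rough sketch of the three-range return value, in Python and not tied
to any treesit function.  FieldOfView, field_of_view and the sample
ranges are hypothetical names and data; only the flat (non-nested)
case is modeled.

```python
from typing import List, NamedTuple, Optional, Tuple

Range = Tuple[int, int]


class FieldOfView(NamedTuple):
    before: Optional[Range]  # closest thing ending at or before point
    at: Optional[Range]      # thing containing point, if any
    after: Optional[Range]   # closest thing starting after point


def field_of_view(ranges: List[Range], point: int) -> FieldOfView:
    """ranges: sorted, non-overlapping (beg, end) spans of one kind of thing."""
    before = at = after = None
    for r in ranges:
        beg, end = r
        if end <= point:
            before = r   # overwritten until the last one before point
        elif beg <= point:
            at = r       # beg <= point < end
        else:
            after = r    # first thing that starts after point
            break
    return FieldOfView(before, at, after)


defuns = [(10, 40), (50, 90), (100, 130)]
print(field_of_view(defuns, 60))
print(field_of_view(defuns, 45))
```

With all three ranges in hand, a wrapper can decide in either
direction without calling the primitive again.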

> 
> The wrapper routine would then know how to properly adjust the mark
> and point, execute counts, add to regions, catch errors, etc.
> 
> Broadening the picture: navigation, transpose, and other Emacs
> commands could likewise call the same primitive routine to provide
> transpose-method, forward-method, etc.  And, broadening the picture
> once more: primitive routines could be written not just for methods
> (defun) but also for statements, arguments, expressions, classes, etc.
> In all cases, the primitive routine would be relatively simple.  It
> would be given a point and return a ( begin . end ) cons cell leaving
> all the harder work of expanding the region, remembering the
> direction, etc. to the wrapper routines.
> 
> To be clear, there would be a wrapper routine for mark, a wrapper for
> forward, transpose, etc.  We would end up with the classic:
> 
> For A + B number of routines we would have A x B number of commands.
> 
> Does this strike others as a good idea or insanity?

Yeah it sounds pretty good. Probably no one needs/wants all the A x B combinations, but a framework that allows this mix and match is good, since it allows us to easily create the few combinations we care about. We need more brains thinking about it and more experiments to fully flesh it out.

We also had a related discussion in another thread “Plug treesit.el into other emacs constructs”.

Yuan



* Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24  9:09 ` Yuan Fu
@ 2022-12-24 10:07   ` Theodor Thornhill
  2022-12-24 14:57     ` Perry Smith
  2022-12-24 14:32   ` Perry Smith
  1 sibling, 1 reply; 8+ messages in thread
From: Theodor Thornhill @ 2022-12-24 10:07 UTC (permalink / raw)
  To: emacs-devel, Yuan Fu, Perry Smith; +Cc: emacs-devel



On 24 December 2022 10:09:17 CET, Yuan Fu <casouri@gmail.com> wrote:
>Yeah it sounds pretty good. Probably no one needs/wants all the A x B combinations, but a framework that allows this mix and match is good, since it allows us to easily create the few combinations we care about. We need more brains thinking about it and more experiments to fully flesh it out.
>
>We also had a related discussion in another thread “Plug treesit.el into other emacs constructs”.
>
>Yuan

Yeah. One shortcoming of tree-sitter imo is that the parser author decides what the nodes are named. So I think we need to create a framework so that every mode can map over ast-names to Emacs concepts. The goal must be for the normal Emacs things to require little to no changes, but get the benefits from treesit.
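
One possible shape for such a mapping, sketched in Python as a plain
lookup table.  CONCEPTS and things_of are hypothetical names, and the
node type strings are only examples of the kind of names a grammar
might use; each grammar would have to be checked for its actual names.

```python
from typing import Dict, List, Set, Tuple

# (node_type, beg, end) triples, as a parser might report them.
Node = Tuple[str, int, int]

# Per-language table from an editor-level concept to grammar node types.
CONCEPTS: Dict[str, Dict[str, Set[str]]] = {
    "ruby": {
        "defun": {"method", "singleton_method"},
        "class": {"class", "module"},
    },
    "python": {
        "defun": {"function_definition"},
        "class": {"class_definition"},
    },
}


def things_of(language: str, concept: str, nodes: List[Node]) -> List[Tuple[int, int]]:
    """Return the (beg, end) ranges of nodes that count as CONCEPT in LANGUAGE."""
    wanted = CONCEPTS[language][concept]
    return [(beg, end) for node_type, beg, end in nodes if node_type in wanted]


# Made-up parse result for a small Ruby buffer.
ruby_nodes = [
    ("class", 0, 120),
    ("method", 10, 40),
    ("singleton_method", 50, 90),
    ("comment", 92, 110),
]
print(things_of("ruby", "defun", ruby_nodes))
```

Generic commands would then ask for "defun" or "class" and never see
the grammar-specific names.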

I think we should just start doing that immediately on the master branch and allow for "big" changes going forward. We should settle on something good for Emacs 30, hopefully.

I'm a little worried we feel we need "complete" proposals too soon.  

Let's get all good ideas on the table, implemented and installed, then we can consolidate after we discover pain points etc.

I'm working on changing the forward/backward thing and transpose. Not only for tree-sitter, but for others as well :)

What do you think?

Happy holidays!

Theo




* Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24  9:09 ` Yuan Fu
  2022-12-24 10:07   ` Theodor Thornhill
@ 2022-12-24 14:32   ` Perry Smith
  2022-12-24 16:52     ` [External] : " Drew Adams
  1 sibling, 1 reply; 8+ messages in thread
From: Perry Smith @ 2022-12-24 14:32 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel


> On Dec 24, 2022, at 03:09, Yuan Fu <casouri@gmail.com> wrote:
> 
>> On Dec 23, 2022, at 5:32 PM, Perry Smith <pedz@easesoftware.com> wrote:
> 
>> 
>> I'll call this the "primitive routine".  The routine would be
>> responsible for returning a beginning and end (in a cons cell) and it
>> would be the routine's responsibility to make sure that the beginning
>> and end lie after (in the forward case) or before (in the backward
>> case) the point that is passed in.
> 
> You mean beginning and end of (symbol | string | statement | …)?

Yes.  A “simple” routine for each concept that the language has that would
return the beg / end of that construct.  e.g. foo-bar-statement would
return the beg / end of a statement.

> From my experience implementing defun navigation for tree-sitter, it might be more helpful to return three ranges: the thing before point, the thing at point, and the thing after point, any of which could be nil if no such thing exists. For nested things it can be prev-sibling, parent, next-sibling instead. The point is that the user can move back and forward and make decisions easily with this “field of view”.

Hmm… interesting idea.  As I work on things I’ll keep this idea in mind.  It
might indeed lead to simpler routines.

Perry



* Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24 10:07   ` Theodor Thornhill
@ 2022-12-24 14:57     ` Perry Smith
  2022-12-24 15:13       ` Theodor Thornhill
  0 siblings, 1 reply; 8+ messages in thread
From: Perry Smith @ 2022-12-24 14:57 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, Yuan Fu


> On Dec 24, 2022, at 04:07, Theodor Thornhill <theo@thornhill.no> wrote:
> 
> Yeah. One shortcoming of tree-sitter imo is that the parser author decides what the nodes are named. So I think we need to create a framework so that every mode can map over ast-names to Emacs concepts. The goal must be for the normal Emacs things to require little to no changes, but get the benefits from treesit.

To me, in my brain, Tree Sitter is far more expressive and powerful than existing concepts.  “Little to no changes” to me implies fitting a much larger concept into a smaller container and sacrificing the possible expressiveness and power.

> I think we should just start doing that immediately on the master branch and allow for "big" changes going forward. We should settle on something good for Emacs 30, hopefully.
> 
> I'm a little worried we feel we need "complete" proposals too soon.
> 
> Let's get all good ideas on the table, implemented and installed, then we can consolidate after we discover pain points etc.
> 
> I'm working on changing the forward/backward thing and transpose. Not only for tree-sitter, but for others as well :)
> 
> What do you think?

Yes.  I completely agree.  I guess for others, you can take my initial post as “this is the direction I’m exploring in” but it is helpful to get feedback of new ideas and experiences from past mistakes.




* Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24 14:57     ` Perry Smith
@ 2022-12-24 15:13       ` Theodor Thornhill
  0 siblings, 0 replies; 8+ messages in thread
From: Theodor Thornhill @ 2022-12-24 15:13 UTC (permalink / raw)
  To: Perry Smith; +Cc: emacs-devel, Yuan Fu



On 24 December 2022 15:57:30 CET, Perry Smith <pedz@easesoftware.com> wrote:
>
>> On Dec 24, 2022, at 04:07, Theodor Thornhill <theo@thornhill.no> wrote:
>> 
>> Yeah. One shortcoming of tree-sitter imo is that the parser author decides what the nodes are named. So I think we need to create a framework so that every mode can map over ast-names to Emacs concepts. The goal must be for the normal Emacs things to require little to no changes, but get the benefits from treesit.
>
>To me, in my brain, Tree Sitter is far more expressive and powerful than existing concepts.  “Little to no changes” to me implies fitting a much larger concept into a smaller container and sacrificing the possible expressiveness and power.
>

Yeah I agree, but for many things such as movement, there are some pretty advanced utilities already in Emacs. Would be a shame to just implement it all in an incompatible package just because we didn't try really hard to incorporate it :)

>> I think we should just start doing that immediately on the master branch and allow for "big" changes going forward. We should settle on something good for Emacs 30, hopefully.
>> 
>> I'm a little worried we feel we need "complete" proposals too soon.
>> 
>> Let's get all good ideas on the table, implemented and installed, then we can consolidate after we discover pain points etc.
>> 
>> I'm working on changing the forward/backward thing and transpose. Not only for tree-sitter, but for others as well :)
>> 
>> What do you think?
>
>Yes.  I completely agree.  I guess for others, you can take my initial post as “this is the direction I’m exploring in” but it is helpful to get feedback of new ideas and experiences from past mistakes.
>
>

Yeah I'm in no way objecting to anything you say. From experience (when it was mostly Yuan and me) these things solidify over time, and the more the merrier!

Theo




* RE: [External] : Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24 14:32   ` Perry Smith
@ 2022-12-24 16:52     ` Drew Adams
  2022-12-24 17:29       ` Perry Smith
  0 siblings, 1 reply; 8+ messages in thread
From: Drew Adams @ 2022-12-24 16:52 UTC (permalink / raw)
  To: Perry Smith, Yuan Fu; +Cc: emacs-devel

> >> I'll call this the "primitive routine".  The routine would be
> >> responsible for returning a beginning and end (in a cons cell) and it
> >> would be the routine's responsibility to make sure that the beginning
> >> and end lie after (in the forward case) or before (in the backward
> >> case) the point that is passed in.
> >
> > You mean beginning and end of (symbol | string | statement | …)?
> 
> Yes.  A “simple” routine for each concept that the language has that
> would return the beg / end of that construct.  e.g. foo-bar-statement
> would return the beg / end of a statement.

I guess your "concept" corresponds roughly
to what `thing-at-point' calls a THING.

Returning a cons with the "beg / end" of a
THING is what `bounds-of-thing-at-point' does.

Of course, it's based on functions such as
`forward-thing' and those named on properties
`beginning-op' and `end-op' of the THING.
(It's not based on Tree Sitter.)

If Tree Sitter were to define a `beginning-op'
and `end-op' for each "concept" (THING)...
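
The idea can be modeled outside Emacs as a registry of per-THING
operations backed by parser nodes.  This Python sketch is not
thingatpt.el: there the operations act on point in the current buffer,
whereas here they are simplified to functions from a position to a
position.  THINGS, define_thing_from_nodes and the sample ranges are
hypothetical.

```python
from typing import Callable, Dict, List, NamedTuple, Optional, Tuple

Range = Tuple[int, int]


class ThingOps(NamedTuple):
    beginning_op: Callable[[int], Optional[int]]
    end_op: Callable[[int], Optional[int]]


# Registry from THING name to its two operations.
THINGS: Dict[str, ThingOps] = {}


def define_thing_from_nodes(thing: str, node_ranges: List[Range]) -> None:
    """Register THING with operations derived from parsed (beg, end) ranges."""
    def enclosing(point: int) -> Optional[Range]:
        return next((r for r in node_ranges if r[0] <= point <= r[1]), None)

    def beginning_op(point: int) -> Optional[int]:
        r = enclosing(point)
        return r[0] if r else None

    def end_op(point: int) -> Optional[int]:
        r = enclosing(point)
        return r[1] if r else None

    THINGS[thing] = ThingOps(beginning_op, end_op)


def bounds_of_thing_at_point(thing: str, point: int) -> Optional[Range]:
    """Generic lookup: return (beg, end) of THING around point, or None."""
    ops = THINGS.get(thing)
    if ops is None:
        return None  # unknown THING
    beg, end = ops.beginning_op(point), ops.end_op(point)
    if beg is None or end is None:
        return None  # no such thing at point
    return (beg, end)


define_thing_from_nodes("statement", [(12, 20), (22, 38)])
print(bounds_of_thing_at_point("statement", 25))
```

The generic function stays the same for every THING; only the
registered operations change per language and per concept.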


* Re: [External] : Re: Creating a paradigm for leveraging Tree Sitter's power
  2022-12-24 16:52     ` [External] : " Drew Adams
@ 2022-12-24 17:29       ` Perry Smith
  0 siblings, 0 replies; 8+ messages in thread
From: Perry Smith @ 2022-12-24 17:29 UTC (permalink / raw)
  To: Drew Adams; +Cc: Yuan Fu, emacs-devel


> On Dec 24, 2022, at 10:52, Drew Adams <drew.adams@oracle.com> wrote:
> 
>>>> I'll call this the "primitive routine".  The routine would be
>>>> responsible for returning a beginning and end (in a cons cell) and it
>>>> would be the routine's responsibility to make sure that the beginning
>>>> and end lie after (in the forward case) or before (in the backward
>>>> case) the point that is passed in.
>>> 
>>> You mean beginning and end of (symbol | string | statement | …)?
>> 
>> Yes.  A “simple” routine for each concept that the language has that
>> would return the beg / end of that construct.  e.g. foo-bar-statement
>> would return the beg / end of a statement.
> 
> I guess your "concept" corresponds roughly
> to what `thing-at-point' calls a THING.
> 
> Returning a cons with the "beg / end" of a
> THING is what `bounds-of-thing-at-point' does.
> 
> Of course, it's based on functions such as
> `forward-thing' and those named on properties
> `beginning-op' and `end-op' of the THING.
> (It's not based on Tree Sitter.)
> 
> If Tree Sitter were to define a `beginning-op'
> and `end-op' for each "concept" (THING)…


Fascinating.  I will look into this.

Strangely, I’ve used GNU Emacs since it was first released around 1985
but I tend to stay within very small boundaries of all of its features
so I am mostly unaware of 99% of Emacs’ concepts and features.



