* Standardizing tree-sitter fontification features
@ 2022-11-24 22:16 Yuan Fu
  2022-11-25  1:13 ` Randy Taylor
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Yuan Fu @ 2022-11-24 22:16 UTC (permalink / raw)
  To: emacs-devel

For tree-sitter-based major modes, fontification rules are categorized into “features”, which can be individually turned on/off. I think it would be good to have a standardized list of common features and their precise meaning defined. We’ve been working on these fontification rules for some time and arrived at a reasonable baseline, and now it’s a good time to discuss and bless it, I think.

Right now we have:

Basic tokens:

delimiter       ,.;
operator        = != ||
bracket         []{}()

constant        true, false, null
number
keyword
comment
string
string-interpolation    f"text {variable}"
escape-sequence         "\n\t\\"
function                every function identifier
variable                every variable identifier
type                    every type identifier
property                a.b  <--- highlight b
key                     { a: b, c: d } <--- highlight a, c
error                   highlight parse error

More abstract ones:

assignment: the LHS of an assignment (thing being assigned to), eg:

a = b    <--- highlight a
a.b = c  <--- highlight b
a[1] = d <--- highlight a

definition: the thing being defined, eg:

int a(int b) { <--- highlight a
  return 0
}

int a;  <-- highlight a

struct a { <--- highlight a
  int b;   <--- highlight b
}

There are also language-specific features, but they are not the focus here.
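
To make the distinction concrete, here is a minimal sketch of how the definition, assignment and function features might be written for C with treesit-font-lock-rules. The variable name my-c-font-lock-settings is made up for illustration, the node and field names assume the tree-sitter C grammar, and the patterns are far from complete:

```elisp
;; Hypothetical sketch; assumes Emacs 29+ built with tree-sitter and
;; the C grammar installed.  `my-c-font-lock-settings' is an
;; illustrative name, not something that exists in Emacs.
(defvar my-c-font-lock-settings
  (treesit-font-lock-rules
   ;; definition: only the identifier being defined.
   :language 'c
   :feature 'definition
   '((function_declarator
      declarator: (identifier) @font-lock-function-name-face)
     (declaration
      declarator: (identifier) @font-lock-variable-name-face))

   ;; assignment: only the left-hand side of an assignment.
   :language 'c
   :feature 'assignment
   '((assignment_expression
      left: (identifier) @font-lock-variable-name-face))

   ;; function: function identifiers at call sites (the "busy" part).
   :language 'c
   :feature 'function
   '((call_expression
      function: (identifier) @font-lock-function-name-face))))
```

With rules split this way, someone who only wants structural cues could enable definition and assignment and leave function off, while the faces used stay the same.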

Once we agree on a list of standard features and their definition, the next step would be to figure out how a major mode should introduce its supported features to a user (major mode docstring + link to manual for standard features?).

Also, some of the features are very busy; it would be good if we could disable them by default. The default value of font-lock-maximum-decoration is t, meaning use everything, which is not very helpful...

Yuan



* Re: Standardizing tree-sitter fontification features
  2022-11-24 22:16 Standardizing tree-sitter fontification features Yuan Fu
@ 2022-11-25  1:13 ` Randy Taylor
  2022-11-25  6:15   ` Yuan Fu
  2022-11-25  8:13   ` Eli Zaretskii
  2022-11-25  2:56 ` Stefan Monnier
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 22+ messages in thread
From: Randy Taylor @ 2022-11-25  1:13 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On Thursday, November 24th, 2022 at 17:16, Yuan Fu <casouri@gmail.com> wrote:

> 
> For tree-sitter-based major modes, fontification rules are categorized into “features”, which can be individually turned on/off. I think it would be good to have a standardized list of common features and their precise meaning defined. We’ve been working on these fontification rules for some time and arrived at a reasonable baseline, and now it’s a good time to discuss and bless it, I think.
> 
> Right now we have:
> 
> Basic tokens:
> 
> delimiter ,.;
> operator = != ||
> bracket []{}()
> 
> constant true, false, null
> number
> keyword
> comment
> string
> string-interpolation f"text {variable}"
> escape-sequence "\n\t\\"
> function every function identifier
> variable every variable identifier
> type every type identifier
> property a.b <--- highlight b
> key { a: b, c: d } <--- highlight a, c
> error highlight parse error
> 
> More abstract ones:
> 
> assignment: the LHS of an assignment (thing being assigned to), eg:
> 
> a = b <--- highlight a
> a.b = c <--- highlight b
> a[1] = d <--- highlight a
> 
> definition: the thing being defined, eg:
> 
> int a(int b) { <--- highlight a
> return 0
> }
> 
> int a; <-- highlight a
> 
> struct a { <--- highlight a
> int b; <--- highlight b
> }
> 
> There are also language-specific features, but they are not the focus here.
> 
> Once we agree on a list of standard features and their definition, the next step would be to figure out how should a major mode introduce its supported features to a user (major mode docstring + link to manual for standard features?).
> 
> Also, some of the features are very busy, it would be good if we can disable they by default. The default value of font-lock-maximum-decoration is t, meaning use everything, which is not very helpful...
> 
> Yuan

Looks good!

key should be considered property IMO, and that's how we're highlighting things now.

I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.

I would also add:
- misc-punctuation, for anything not considered a delimiter or bracket. Most modes would use this for any special punctuation they've got.
- (maybe) literal instead of number? That way there is a group for chars too (and any other literals if there are any?). Or a char feature in addition to the existing number one. I'm undecided...

Maybe a slight tangent but I also suggest we alphabetize all of these; both the queries and the list of features. I'll send a patch to do that myself once things cool down a bit. Although anything that overrides will need to go at the bottom to make sure it gets applied.




* Re: Standardizing tree-sitter fontification features
  2022-11-24 22:16 Standardizing tree-sitter fontification features Yuan Fu
  2022-11-25  1:13 ` Randy Taylor
@ 2022-11-25  2:56 ` Stefan Monnier
  2022-11-25  6:34   ` Yuan Fu
  2022-11-26 14:03 ` Stephen Leake
       [not found] ` <2AEA8AB6-593E-4D89-AB05-0C8EB2BCE327@gmail.com>
  3 siblings, 1 reply; 22+ messages in thread
From: Stefan Monnier @ 2022-11-25  2:56 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

> Basic tokens:
>
> delimiter       ,.;
> operator        = != ||

From the grammar point of view, both of these are simply infix thingies.
Defining precisely the difference between them can be a bit more
delicate, tho.  I guess intuitively the idea is that "operator"
corresponds to some kind of run-time operation whereas "delimiter"
serves only to figure out where things start and end but doesn't do
anything in itself.

Not clear where the `=` used for definitions (as in "let x = foo in
bar") should fall.  Same for the "," used to construct pairs
or the conjunction/disjunction in Prolog written `,` and `;`
respectively :-)

> string-interpolation    f"text {variable}"
> escape-sequence         "\n\t\\"
> function                every function identifier

I think we definitely need to distinguish a reference/use of a function
from a definition of a function.  I do want my function identifiers
highlighted in my function definitions but *not* in my function calls.

> variable                every variable identifier

Same here.
[ Note that in many languages (e.g. Scheme and C), functions and
  "variables" are the same (i.e. Lisp-1 in the world of Lisp).  ]

> type                    every type identifier

And same here as well.
[ I will spare you the discussion of what should happen for dependently
  typed languages where types are "normal" values.  ]

> Also, some of the features are very busy, it would be good if we can disable
> they by default. The default value of font-lock-maximum-decoration is t,
> meaning use everything, which is not very helpful...

If you compare the "style" of font-lock rules used until now to those
provided in the new tree-sitter modes, I think it makes sense for
tree-sitter modes to default to "medium" decorations.


        Stefan





* Re: Standardizing tree-sitter fontification features
  2022-11-25  1:13 ` Randy Taylor
@ 2022-11-25  6:15   ` Yuan Fu
  2022-11-25 19:03     ` Randy Taylor
  2022-11-25  8:13   ` Eli Zaretskii
  1 sibling, 1 reply; 22+ messages in thread
From: Yuan Fu @ 2022-11-25  6:15 UTC (permalink / raw)
  To: Randy Taylor; +Cc: emacs-devel



> On Nov 24, 2022, at 5:13 PM, Randy Taylor <dev@rjt.dev> wrote:
> 
> On Thursday, November 24th, 2022 at 17:16, Yuan Fu <casouri@gmail.com> wrote:
> 
>> 
>> For tree-sitter-based major modes, fontification rules are categorized into “features”, which can be individually turned on/off. I think it would be good to have a standardized list of common features and their precise meaning defined. We’ve been working on these fontification rules for some time and arrived at a reasonable baseline, and now it’s a good time to discuss and bless it, I think.
>> 
>> Right now we have:
>> 
>> Basic tokens:
>> 
>> delimiter ,.;
>> operator = != ||
>> bracket []{}()
>> 
>> constant true, false, null
>> number
>> keyword
>> comment
>> string
>> string-interpolation f"text {variable}"
>> escape-sequence "\n\t\\"
>> function every function identifier
>> variable every variable identifier
>> type every type identifier
>> property a.b <--- highlight b
>> key { a: b, c: d } <--- highlight a, c
>> error highlight parse error
>> 
>> More abstract ones:
>> 
>> assignment: the LHS of an assignment (thing being assigned to), eg:
>> 
>> a = b <--- highlight a
>> a.b = c <--- highlight b
>> a[1] = d <--- highlight a
>> 
>> definition: the thing being defined, eg:
>> 
>> int a(int b) { <--- highlight a
>> return 0
>> }
>> 
>> int a; <-- highlight a
>> 
>> struct a { <--- highlight a
>> int b; <--- highlight b
>> }
>> 
>> There are also language-specific features, but they are not the focus here.
>> 
>> Once we agree on a list of standard features and their definition, the next step would be to figure out how should a major mode introduce its supported features to a user (major mode docstring + link to manual for standard features?).
>> 
>> Also, some of the features are very busy, it would be good if we can disable they by default. The default value of font-lock-maximum-decoration is t, meaning use everything, which is not very helpful...
>> 
>> Yuan
> 
> Looks good!
> 
> key should be considered property IMO, and that's how we're highlighting things now.

I agree.

> 
> I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.

They are definitely useful. They are the things we currently highlight, and for a reason. Personally I only want to highlight identifiers in definition and assignment, not every occurrence of them. Since so much of a program consists of variable and function identifiers, highlighting all of them looks almost like highlighting everything. I just want some visual cues on the program structure, not programming in skittles :-)

> 
> I would also add:
> - misc-punctuation, for anything not considered a delimiter or bracket. Most modes would use this for any special punctuation they've got.

Are there any examples? Maybe just merge delimiter and punctuation together?

> - (maybe) literal instead of number? That way there is a group for chars too (and any other literals if there are any?). Or a char feature in addition to the existing number one. I'm undecided...

Literal seems to encompass numbers, strings, chars, and constants, so I don’t know how it fits. We could add char to the string feature.

> 
> Maybe a slight tangent but I also suggest we alphabetize all of these; both the queries and the list of features. I'll send a patch to do that myself once things cool down a bit. Although anything that overrides will need to go at the bottom to make sure it gets applied.

Good idea :-) A tangent of your tangent: how did you alphabetize them? Did you use the sort-word package on EmacsWiki?

Yuan





* Re: Standardizing tree-sitter fontification features
  2022-11-25  2:56 ` Stefan Monnier
@ 2022-11-25  6:34   ` Yuan Fu
  2022-11-25 14:52     ` Stefan Monnier
  0 siblings, 1 reply; 22+ messages in thread
From: Yuan Fu @ 2022-11-25  6:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel



> On Nov 24, 2022, at 6:56 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>> Basic tokens:
>> 
>> delimiter       ,.;
>> operator        = != ||
> 
> From the grammar point of view, both of these are simply infix thingies.
> Defining precisely the difference between them can be a bit more
> delicate, tho.  I guess intuitively the idea is that "operator"
> corresponds to some kind of run-time operation whereas "delimiter"
> serves only to figure out where things start and end but doesn't do
> anything in itself.

Maybe think in semantics? If it produces some value, it’s an operator, otherwise it’s a delimiter/punctuation. And I should have used == instead of = in the example. I was thinking of ==.

> 
> Not clear where the `=` used for definitions (as in "let x = foo in
> bar") should fall.  Same for the "," used to construct pairs
> or the conjunction/disjunction in Prolog written `,` and `;`
> respectively :-)

I think stuff like `:` in Haskell (I only know Haskell) can be considered operator, but `::` would be punctuation. With that said, I didn’t put much thought into delimiters, and it seems to be a slippery slope (as shown in your `=` example). Maybe we should limit it to the narrowest scope for now, just things like , ; that truly are delimiting things.

> 
>> string-interpolation    f"text {variable}"
>> escape-sequence         "\n\t\\"
>> function                every function identifier
> 
> I think we definitely need to distinguish a reference/use of a function
> from a definition of a function.  I do want my function identifiers
> highlighted in my function definitions but *not* in my function calls.

That’s when assignment/definition come in. See my reply to Randy.

> 
>> variable                every variable identifier
> 
> Same here.
> [ Note that in many languages (e.g. Scheme and C), functions and
>  "variables" are the same (i.e. Lisp-1 in the world of Lisp).  ]

We can say that if the symbol is used as a function, highlight in function face, and if it is used as a value, highlight in variables face.

> 
>> type                    every type identifier
> 
> And same here as well.
> [ I will spare you the discussion of what should happen for dependently
>  typed languages where types are "normal" values.  ]

“Anything capitalized in type face”, I guess ;-)

> 
>> Also, some of the features are very busy, it would be good if we can disable
>> they by default. The default value of font-lock-maximum-decoration is t,
>> meaning use everything, which is not very helpful...
> 
> If you compare the "style" of font-lock rules used until now to those
> provide in the new tree-sitter modes, I think it makes sense for
> tree-sitter modes to default to "medium" decorations.

Like setting font-lock-maximum-decoration to a number in the major mode body, if its default value is t? That changes the meaning of t in font-lock-maximum-decoration. Can we change its default value to 3, and put the busy features at level 4? I need to survey existing major modes and see how many levels they have right now.

Yuan



* Re: Standardizing tree-sitter fontification features
  2022-11-25  1:13 ` Randy Taylor
  2022-11-25  6:15   ` Yuan Fu
@ 2022-11-25  8:13   ` Eli Zaretskii
  2022-11-25 19:14     ` Randy Taylor
  2022-11-26 14:07     ` Stephen Leake
  1 sibling, 2 replies; 22+ messages in thread
From: Eli Zaretskii @ 2022-11-25  8:13 UTC (permalink / raw)
  To: Randy Taylor; +Cc: casouri, emacs-devel

> Date: Fri, 25 Nov 2022 01:13:46 +0000
> From: Randy Taylor <dev@rjt.dev>
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> On Thursday, November 24th, 2022 at 17:16, Yuan Fu <casouri@gmail.com> wrote:
> 
> I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.

AFAIU, this is about the difference between defining a function and calling
it.  The distinction could be useful, at least in some cases.  We could make
this off by default, of course, but I don't think we should ignore the
distinction.




* Re: Standardizing tree-sitter fontification features
  2022-11-25  6:34   ` Yuan Fu
@ 2022-11-25 14:52     ` Stefan Monnier
  0 siblings, 0 replies; 22+ messages in thread
From: Stefan Monnier @ 2022-11-25 14:52 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

> Like setting font-lock-maximum-decoration to a number in major mode body, it
> its default value is t? That changes the meaning of t in
> font-lock-maximum-decoration. Can we change its default value to 3, and put
> the busy features at level 4? I need to survey existing major modes and see
> how many levels do they have right now. 

I would use a new `treesit-font-lock-(decoration|level|younameit)` and
ignore `font-lock-maximum-decoration` (or pay attention to it only if
it's not set to t).
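
A rough sketch of what such a level variable could look like, in plain Emacs Lisp. The names my-treesit-feature-levels, my-treesit-font-lock-level and my-treesit-enabled-features, and the assignment of features to levels, are all made up for illustration and are not an existing API:

```elisp
(require 'seq)

;; Hypothetical sketch, not an existing API.  Each sublist holds the
;; features that become active at that level (first sublist = level 1).
(defvar my-treesit-feature-levels
  '((comment definition)
    (keyword string type)
    (assignment constant number escape-sequence)
    (bracket delimiter operator function variable property))
  "Features grouped by decoration level, least busy first.")

(defvar my-treesit-font-lock-level 3
  "Decoration level; features up to and including this level are on.")

(defun my-treesit-enabled-features (level)
  "Return the features enabled at LEVEL.
If LEVEL is t, enable every feature."
  (let ((n (if (eq level t)
               (length my-treesit-feature-levels)
             level)))
    ;; Take the first N sublists and flatten them into one list.
    (apply #'append (seq-take my-treesit-feature-levels n))))

(my-treesit-enabled-features 2)
;; => (comment definition keyword string type)
```

Keeping the level in its own variable means font-lock-maximum-decoration can stay at t for existing modes while the busy features sit in the last sublist and remain off by default.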


        Stefan





* Re: Standardizing tree-sitter fontification features
  2022-11-25  6:15   ` Yuan Fu
@ 2022-11-25 19:03     ` Randy Taylor
  2022-11-25 20:55       ` Yuan Fu
  0 siblings, 1 reply; 22+ messages in thread
From: Randy Taylor @ 2022-11-25 19:03 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On Friday, November 25th, 2022 at 01:15, Yuan Fu <casouri@gmail.com> wrote:

> 
> > On Nov 24, 2022, at 5:13 PM, Randy Taylor dev@rjt.dev wrote:
> > 
> 
> > I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.
> 
> 
> They are definitely useful. They are the things we currently highlight, and for a reason. Personally I only want to highlight identifiers in definition and assignment, not every occurrence of them. Since so much of a program consists of variable and function identifiers, highlighting all of them looks almost like highlighting everything. I just want some visual cues on the program structure, not programming in skittles :-)

Fair enough. In that case, are we going to end up with duplication in the variable and function features? So long as I can program in skittles, I am happy :).

> 
> > I would also add:
> > - misc-punctuation, for anything not considered a delimiter or bracket. Most modes would use this for any special punctuation they've got.
> 
> 
> Is there any examples? Maybe just merge delimiter and punctuation together?
>

Yes, see sh-script.el. $ is used for misc-punctuation. neovim and the Emacs dynamic module tree-sitter implementation have more examples (they call it special punctuation).

I think we should keep them all separate.
 
> > - (maybe) literal instead of number? That way there is a group for chars too (and any other literals if there are any?). Or a char feature in addition to the existing number one. I'm undecided...
> 
> 
> Literal seems to encompass numbers, strings, chars, and constants. So I don’t know how does it fit. We could add char to string feature.
> 

Sounds good. I think I've seen char also end up in constant in some of our tree-sitter modes. Doesn't matter to me where it goes.

> > Maybe a slight tangent but I also suggest we alphabetize all of these; both the queries and the list of features. I'll send a patch to do that myself once things cool down a bit. Although anything that overrides will need to go at the bottom to make sure it gets applied.
> 
> 
> Good idea :-) A tangent of your tangent: how did you alphabetize them? Did you use the sort-word package on EmacsWiki?
> 
> Yuan

Manually ;) (except for the keyword and such lists, those I use the built-in sort-lines on). I really like saying the alphabet over and over.




* Re: Standardizing tree-sitter fontification features
  2022-11-25  8:13   ` Eli Zaretskii
@ 2022-11-25 19:14     ` Randy Taylor
  2022-11-26 14:07     ` Stephen Leake
  1 sibling, 0 replies; 22+ messages in thread
From: Randy Taylor @ 2022-11-25 19:14 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, emacs-devel

On Friday, November 25th, 2022 at 03:13, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> > Date: Fri, 25 Nov 2022 01:13:46 +0000
> 
> > From: Randy Taylor dev@rjt.dev
> > Cc: emacs-devel emacs-devel@gnu.org
> > 
> > On Thursday, November 24th, 2022 at 17:16, Yuan Fu casouri@gmail.com wrote:
> > 
> > I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.
> 
> 
> AFAIU, this is about the difference between defining a function and calling
> it. The distinction could be useful, at least in some cases. We could make
> this off by default, of course, but I don't think we should ignore the
> distinction.

Yes, in my mind the variable, function, etc. features would cover everything but it's clear people want more control over that aspect. We could give them different faces, but that would require making more faces which probably isn't desired. The only unfortunate thing, as I asked Yuan, is we may end up with duplication (depending on how we do it). Assignment and declaration will have a fair bit of overlap with the variable and function features.




* Re: Standardizing tree-sitter fontification features
  2022-11-25 19:03     ` Randy Taylor
@ 2022-11-25 20:55       ` Yuan Fu
  2022-11-26  3:35         ` Randy Taylor
  0 siblings, 1 reply; 22+ messages in thread
From: Yuan Fu @ 2022-11-25 20:55 UTC (permalink / raw)
  To: Randy Taylor; +Cc: emacs-devel



> On Nov 25, 2022, at 11:03 AM, Randy Taylor <dev@rjt.dev> wrote:
> 
> On Friday, November 25th, 2022 at 01:15, Yuan Fu <casouri@gmail.com> wrote:
> 
>> 
>>> On Nov 24, 2022, at 5:13 PM, Randy Taylor dev@rjt.dev wrote:
>>> 
>> 
>>> I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.
>> 
>> 
>> They are definitely useful. They are the things we currently highlight, and for a reason. Personally I only want to highlight identifiers in definition and assignment, not every occurrence of them. Since so much of a program consists of variable and function identifiers, highlighting all of them looks almost like highlighting everything. I just want some visual cues on the program structure, not programming in skittles :-)
> 
> Fair enough. In that case, are we going to end up with duplication in the variable and function features? So long as I can program in skittles, I am happy :).

I don’t think duplication would be a problem. You can disable them, or just leave them on: the face would be the same, the difference is only in _which_ ones to highlight.

> 
>> 
>>> I would also add:
>>> - misc-punctuation, for anything not considered a delimiter or bracket. Most modes would use this for any special punctuation they've got.
>> 
>> 
>> Is there any examples? Maybe just merge delimiter and punctuation together?
>> 
> 
> Yes, see sh-script.el. $ is used for misc-punctuation. neovim and the Emacs dynamic module tree-sitter implementation have more examples (they call it special punctuation).
> 
> I think we should keep them all separate.

Ok, then delimiter and misc-punctuation it is.

> 
>>> - (maybe) literal instead of number? That way there is a group for chars too (and any other literals if there are any?). Or a char feature in addition to the existing number one. I'm undecided...
>> 
>> 
>> Literal seems to encompass numbers, strings, chars, and constants. So I don’t know how does it fit. We could add char to string feature.
>> 
> 
> Sounds good. I think I've seen char also end up in constant in some of our tree-sitter modes. Doesn't matter to me where it goes.

Hmm, what do neovim and emacs-tree-sitter do? (If you happen to know)


>>> Maybe a slight tangent but I also suggest we alphabetize all of these; both the queries and the list of features. I'll send a patch to do that myself once things cool down a bit. Although anything that overrides will need to go at the bottom to make sure it gets applied.
>> 
>> 
>> Good idea :-) A tangent of your tangent: how did you alphabetize them? Did you use the sort-word package on EmacsWiki?
>> 
>> Yuan
> 
> Manually ;) (except for the keyword and such lists, those I use the built-in sort-lines on). I really like saying the alphabet over and over.

Well that’s a very convenient hobby :-)

Yuan





* Re: Standardizing tree-sitter fontification features
  2022-11-25 20:55       ` Yuan Fu
@ 2022-11-26  3:35         ` Randy Taylor
  2022-12-05 21:17           ` Yuan Fu
  0 siblings, 1 reply; 22+ messages in thread
From: Randy Taylor @ 2022-11-26  3:35 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On Friday, November 25th, 2022 at 15:55, Yuan Fu <casouri@gmail.com> wrote:
>
> > On Nov 25, 2022, at 11:03 AM, Randy Taylor dev@rjt.dev wrote:
> >
> > On Friday, November 25th, 2022 at 01:15, Yuan Fu casouri@gmail.com wrote:
> >
> > > > On Nov 24, 2022, at 5:13 PM, Randy Taylor dev@rjt.dev wrote:
> > >
> > > > I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.
> > >
> > > They are definitely useful. They are the things we currently highlight, and for a reason. Personally I only want to highlight identifiers in definition and assignment, not every occurrence of them. Since so much of a program consists of variable and function identifiers, highlighting all of them looks almost like highlighting everything. I just want some visual cues on the program structure, not programming in skittles :-)
> >
> > Fair enough. In that case, are we going to end up with duplication in the variable and function features? So long as I can program in skittles, I am happy :).
>
>
> I don’t think duplication would be a problem. You can disable them, or just left it on: the face would be the same, the difference is only one which ones to highlight.
>

I meant duplication in writing the queries. Variable and function queries would already cover most or all of what's in assignment and definition.

>
> > > > - (maybe) literal instead of number? That way there is a group for chars too (and any other literals if there are any?). Or a char feature in addition to the existing number one. I'm undecided...
> > >
> > > Literal seems to encompass numbers, strings, chars, and constants. So I don’t know how does it fit. We could add char to string feature.
> >
> > Sounds good. I think I've seen char also end up in constant in some of our tree-sitter modes. Doesn't matter to me where it goes.
>
>
> Hmm, what does neovim and emacs-tree-sitter do? (If you happen to know)
>

I'm not sure how the neovim stuff works, if it's faces they are defining or features or what. But they get pretty granular, so they've got their own thing for character (they even differentiate between conditionals and loops!!!). emacs-tree-sitter isn't consistent.

I think it makes sense to include it in string though, just so long as it doesn't have the same face. Perhaps the constant face is best for it?




* Re: Standardizing tree-sitter fontification features
  2022-11-24 22:16 Standardizing tree-sitter fontification features Yuan Fu
  2022-11-25  1:13 ` Randy Taylor
  2022-11-25  2:56 ` Stefan Monnier
@ 2022-11-26 14:03 ` Stephen Leake
  2022-11-26 14:29   ` [SPAM UNSURE] " Stephen Leake
       [not found] ` <2AEA8AB6-593E-4D89-AB05-0C8EB2BCE327@gmail.com>
  3 siblings, 1 reply; 22+ messages in thread
From: Stephen Leake @ 2022-11-26 14:03 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

Yuan Fu <casouri@gmail.com> writes:

> For tree-sitter-based major modes, fontification rules are categorized
> into “features”, which can be individually turned on/off. I think it
> would be good to have a standardized list of common features and their
> precise meaning defined. We’ve been working on these fontification
> rules for some time and arrived at a reasonable baseline, and now it’s
> a good time to discuss and bless it, I think.
>
> Right now we have:
>
> Basic tokens:
>
> delimiter       ,.;
> operator        = != ||
> bracket         []{}()
>
> constant        true, false, null
> number
> keyword
> comment
> string
> string-interpolation    f"text {variable}"
> escape-sequence         "\n\t\\"
> function                every function identifier
> variable                every variable identifier
> type                    every type identifier
> property                a.b  <--- highlight b

namespace would be useful; in some languages, like C++, the syntax
namespace::namespace::object.member makes it clear which identifiers are
namespaces. In others, like Ada the syntax is
namespace.namespace.object.member, so it's not clear.

-- 
-- Stephe




* Re: Standardizing tree-sitter fontification features
  2022-11-25  8:13   ` Eli Zaretskii
  2022-11-25 19:14     ` Randy Taylor
@ 2022-11-26 14:07     ` Stephen Leake
  1 sibling, 0 replies; 22+ messages in thread
From: Stephen Leake @ 2022-11-26 14:07 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Randy Taylor, casouri, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Fri, 25 Nov 2022 01:13:46 +0000
>> From: Randy Taylor <dev@rjt.dev>
>> Cc: emacs-devel <emacs-devel@gnu.org>
>> 
>> On Thursday, November 24th, 2022 at 17:16, Yuan Fu <casouri@gmail.com> wrote:
>> 
>> I wonder if assignment and definition are really worth having (and
>> would prefer to do without them), since they should be covered by
>> the variable, function, type and property features.
>
> AFAIU, this is about the difference between defining a function and calling
> it.  The distinction could be useful, at least in some cases.  We could make
> this off by default, of course, but I don't think we should ignore the
> distinction.

Language Server Protocol uses distinct "types" and "modifiers" for this
(https://microsoft.github.io/language-server-protocol/specifications/lsp/3.17/specification/#textDocument_semanticTokens);
an identifier can have "function" type, and an instance of it can have
"declaration" or "implementation" or no modifier.

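For comparison, LSP sends each token's modifiers as a bit set indexed into a legend announced by the server. A small sketch of decoding that in Emacs Lisp; the legend below is a made-up example using some of the standard modifier names, and my-lsp-decode-modifiers is a hypothetical helper, not part of any LSP client:

```elisp
;; Hypothetical example legend; a real server announces its own.
(defvar my-lsp-modifier-legend
  ["declaration" "definition" "readonly" "static"]
  "Example tokenModifiers legend.")

(defun my-lsp-decode-modifiers (bits)
  "Return the modifier names whose bit is set in BITS.
Bit I of BITS corresponds to element I of `my-lsp-modifier-legend'."
  (let (names)
    (dotimes (i (length my-lsp-modifier-legend))
      (unless (zerop (logand bits (ash 1 i)))
        (push (aref my-lsp-modifier-legend i) names)))
    (nreverse names)))

(my-lsp-decode-modifiers 3)
;; => ("declaration" "definition")
```

So a single "function" token type can carry a declaration bit at the definition site and no bits at a call site, which is the same split as the definition feature versus the function feature.
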
-- 
-- Stephe




* Re: [SPAM UNSURE] Re: Standardizing tree-sitter fontification features
  2022-11-26 14:03 ` Stephen Leake
@ 2022-11-26 14:29   ` Stephen Leake
  2022-11-26 22:05     ` [SPAM UNSURE] " Yuan Fu
  0 siblings, 1 reply; 22+ messages in thread
From: Stephen Leake @ 2022-11-26 14:29 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

Stephen Leake <stephen_leake@stephe-leake.org> writes:

>
> namespace would be useful; in some languages, like C++, the syntax
> namespace::namespace::object.member makes it clear which identifiers are
> namespaces. In others, like Ada the syntax is
> namespace.namespace.object.member, so it's not clear.

Never mind; for Ada, this requires name resolution, which is beyond what
tree-sitter can accomplish (same for the wisi parser). The
ada_language_server does do name resolution, so it can label namespace
accurately.

-- 
-- Stephe




* Re: [SPAM UNSURE] Standardizing tree-sitter fontification features
  2022-11-26 14:29   ` [SPAM UNSURE] " Stephen Leake
@ 2022-11-26 22:05     ` Yuan Fu
  0 siblings, 0 replies; 22+ messages in thread
From: Yuan Fu @ 2022-11-26 22:05 UTC (permalink / raw)
  To: Stephen Leake; +Cc: emacs-devel



> On Nov 26, 2022, at 6:29 AM, Stephen Leake <stephen_leake@stephe-leake.org> wrote:
> 
> Stephen Leake <stephen_leake@stephe-leake.org> writes:
> 
>> 
>> namespace would be useful; in some languages, like C++, the syntax
>> namespace::namespace::object.member makes it clear which identifiers are
>> namespaces. In others, like Ada the syntax is
>> namespace.namespace.object.member, so it's not clear.
> 
> Never mind; for Ada, this requires name resolution, which is beyond what
> tree-sitter can accomplish (same for the wisi parser). The
> ada_language_server does do name resolution, so it can label namespace
> accurately.

Indeed, I glanced at their types and modifiers, and they are very rich in semantics, whereas we only have the syntax to work with.

Yuan



* Re: Standardizing tree-sitter fontification features
       [not found] ` <2AEA8AB6-593E-4D89-AB05-0C8EB2BCE327@gmail.com>
@ 2022-12-03  1:12   ` Yuan Fu
  2022-12-03 14:34     ` Mattias Engdegård
  0 siblings, 1 reply; 22+ messages in thread
From: Yuan Fu @ 2022-12-03  1:12 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: emacs-devel

[Adding emacs-devel back]

> On Nov 25, 2022, at 3:20 AM, Mattias Engdegård <mattias.engdegard@gmail.com> wrote:
> 
>> Right now we have:
> [...]
>> comment
> 
> What about treating doc-comments as a separate case?
> (May also want to fontify various parts inside, such as doc mark-up elements.)

That makes me wonder what we should do with strings, docstrings, and string interpolation. Should we make them all separate features, put them all in a single “string” feature, or have some major mode variable to turn things on/off?

My thoughts:
- Docstrings are pretty standard for a given language, so they can probably be enabled by default and be part of the string feature.
- String interpolation could be part of the string feature, but since the “string” feature is enabled at a pretty low fontification level, I decided to make it separate and only enable it at higher fontification levels.
- doc-comment seems rather non-standard, and could even differ from project to project. So I tend to think it should be turned on/off by a major mode variable in major modes that support this feature. Users can then enable/disable it with dir-local or file-local variables.
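
To make the level idea concrete, here is a minimal sketch of how a mode might group features into levels. treesit-font-lock-feature-list is the real variable (each sublist is one level); the function name and the particular grouping are only assumptions for illustration, not what any existing mode does:

```elisp
;; Declare the variable so this sketch also byte-compiles when
;; treesit.el has not been loaded yet.
(defvar treesit-font-lock-feature-list)

(defun my-lang-ts-mode--setup-features ()
  "Group font-lock features into levels for the current buffer.
Hypothetical helper: the name and the grouping are assumptions."
  (setq-local treesit-font-lock-feature-list
              '(;; Level 1: bare minimum.
                (comment definition)
                ;; Level 2: plain strings are already on here.
                (keyword string type)
                ;; Level 3: interpolation is split out of `string'.
                (constant number string-interpolation escape-sequence)
                ;; Level 4: the busy features.
                (bracket delimiter operator variable function property))))
```

With a grouping like this, “string” is on from level 2, while string-interpolation only shows up from level 3.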

> 
> Otherwise I mostly agree with the proposal but fear that it may result in an overly busy scheme. Colour only really helps when it helps highlighting structure, not contents.

I agree, that’s why we don’t enable all the features by default. The default fontification should be pretty sane (IMO).

Yuan



* Re: Standardizing tree-sitter fontification features
  2022-12-03  1:12   ` Yuan Fu
@ 2022-12-03 14:34     ` Mattias Engdegård
  2022-12-05  8:58       ` Theodor Thornhill
  0 siblings, 1 reply; 22+ messages in thread
From: Mattias Engdegård @ 2022-12-03 14:34 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

3 dec. 2022 kl. 02.12 skrev Yuan Fu <casouri@gmail.com>:

> - doc-comment seems rather non-standard, and could even be different from project to project.

They actually tend to be quite well standardised for most languages. Main exceptions are C and C++, but even there Doxygen seems to dominate.

> So I tend to think it should be turned on/off by a major mode variable in major modes that support this feature. And user can enable/disable it with dir-local or file-local variables.

I definitely think doc comments should be painted differently from other comments by default, at least for languages where there is a clear standard.





* Re: Standardizing tree-sitter fontification features
  2022-12-03 14:34     ` Mattias Engdegård
@ 2022-12-05  8:58       ` Theodor Thornhill
  2022-12-05 10:26         ` Mattias Engdegård
  0 siblings, 1 reply; 22+ messages in thread
From: Theodor Thornhill @ 2022-12-05  8:58 UTC (permalink / raw)
  To: Mattias Engdegård, Yuan Fu; +Cc: emacs-devel

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 3 dec. 2022 kl. 02.12 skrev Yuan Fu <casouri@gmail.com>:
>
>> - doc-comment seems rather non-standard, and could even be different
>> from project to project.
>
> They actually tend to be quite well standardised for most
> languages. Main exceptions are C and C++, but even there Doxygen seems
> to dominate.
>
>> So I tend to think it should be turned on/off by a major mode
>> variable in major modes that support this feature. And user can
>> enable/disable it with dir-local or file-local variables.
>
> I definitely think doc comments should be painted differently from
> other comments by default, at least for languages where there is a
> clear standard.

I agree - but in most tree-sitter languages it seems like there usually
is no distinction between them.  We need to implement some heuristics to
locate a comment above method etc, if I'm not mistaken.

Theo




* Re: Standardizing tree-sitter fontification features
  2022-12-05  8:58       ` Theodor Thornhill
@ 2022-12-05 10:26         ` Mattias Engdegård
  2022-12-05 11:30           ` Theodor Thornhill
  0 siblings, 1 reply; 22+ messages in thread
From: Mattias Engdegård @ 2022-12-05 10:26 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: Yuan Fu, emacs-devel

5 dec. 2022 kl. 09.58 skrev Theodor Thornhill <theo@thornhill.no>:

> I agree - but in most tree-sitter languages it seems like there usually
> is no distinction between them.  We need to implement some heuristics to
> locate a comment above method etc, if I'm not mistaken.

At least distinguish doc comments by their special syntax, such as `-- !` or `/**`; it's better than nothing and only requires local analysis. A grammar tie-in to make sure they aren't misplaced is obviously better (and valuable) but it can be a later improvement.
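
As a rough sketch of that local analysis, assuming the comment text has already been extracted as a string: the helper name and the prefix list below are hypothetical, and the two markers are just the ones mentioned above.

```elisp
(require 'seq)

;; Hypothetical table: which openers count as doc-comment markers is
;; an assumption and would differ per language.
(defvar my-doc-comment-prefixes '("/**" "-- !")
  "Comment openers treated as doc-comment markers in this sketch.")

(defun my-doc-comment-text-p (text)
  "Return non-nil if comment TEXT starts with a doc-comment marker."
  (and (seq-some (lambda (prefix) (string-prefix-p prefix text))
                 my-doc-comment-prefixes)
       ;; "/**/" is an empty ordinary block comment, not a doc comment.
       (not (string-prefix-p "/**/" text))))
```

This only looks at the first few characters of the comment, so it says nothing about whether the comment is placed where a doc comment belongs.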






* Re: Standardizing tree-sitter fontification features
  2022-12-05 10:26         ` Mattias Engdegård
@ 2022-12-05 11:30           ` Theodor Thornhill
  2022-12-05 21:02             ` Yuan Fu
  0 siblings, 1 reply; 22+ messages in thread
From: Theodor Thornhill @ 2022-12-05 11:30 UTC (permalink / raw)
  To: Mattias Engdegård; +Cc: Yuan Fu, emacs-devel

Mattias Engdegård <mattias.engdegard@gmail.com> writes:

> 5 dec. 2022 kl. 09.58 skrev Theodor Thornhill <theo@thornhill.no>:
>
>> I agree - but in most tree-sitter languages it seems like there usually
>> is no distinction between them.  We need to implement some heuristics to
>> locate a comment above method etc, if I'm not mistaken.
>
> At least distinguish doc comments by their special syntax, such as `--
> !` or `/**`; it's better than nothing and only requires local
> analysis. A grammar tie-in to make sure they aren't misplaced is
> obviously better (and valuable) but it can be a later improvement.

Sure, but I don't think it's too hard.  We could do something like (on
emacs-29 branch):

diff --git a/lisp/progmodes/java-ts-mode.el b/lisp/progmodes/java-ts-mode.el
index 2c42505ac9..abf67a4c14 100644
--- a/lisp/progmodes/java-ts-mode.el
+++ b/lisp/progmodes/java-ts-mode.el
@@ -123,13 +123,24 @@ java-ts-mode--operators
     "|=" "~" ">>" ">>>" "<<" "::" "?" "&=")
   "C operators for tree-sitter font-locking.")
 
+(defun java-ts-mode--font-lock-comment (node override start end &rest _)
+  (when (or (equal (treesit-node-type node) "block_comment")
+            (equal (treesit-node-type node) "line_comment"))
+    (let ((face (if (equal (treesit-node-type (treesit-node-next-sibling node))
+                           "method_declaration")
+                    'font-lock-doc-face
+                  'font-lock-comment-face)))
+      (treesit-fontify-with-override
+       (treesit-node-start node) (treesit-node-end node)
+       face override start end))))
+
 (defvar java-ts-mode--font-lock-settings
   (treesit-font-lock-rules
    :language 'java
    :override t
    :feature 'comment
-   `((line_comment) @font-lock-comment-face
-     (block_comment) @font-lock-comment-face)
+   `((line_comment) @java-ts-mode--font-lock-comment
+     (block_comment) @java-ts-mode--font-lock-comment)
    :language 'java
    :override t
    :feature 'constant


This naive function will work for comments directly above a method.  It
won't try to fix annotations and do other smartness.  The local analysis
is actually a little more complex because you need to extract the
comment text and scan it.  Is a more robust variant of this of interest?

Theo




* Re: Standardizing tree-sitter fontification features
  2022-12-05 11:30           ` Theodor Thornhill
@ 2022-12-05 21:02             ` Yuan Fu
  0 siblings, 0 replies; 22+ messages in thread
From: Yuan Fu @ 2022-12-05 21:02 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: Mattias Engdegård, emacs-devel



> On Dec 5, 2022, at 3:30 AM, Theodor Thornhill <theo@thornhill.no> wrote:
> 
> Mattias Engdegård <mattias.engdegard@gmail.com> writes:
> 
>> 5 dec. 2022 kl. 09.58 skrev Theodor Thornhill <theo@thornhill.no>:
>> 
>>> I agree - but in most tree-sitter languages it seems like there usually
>>> is no distinction between them.  We need to implement some heuristics to
>>> locate a comment above method etc, if I'm not mistaken.
>> 
>> At least distinguish doc comments by their special syntax, such as `--
>> !` or `/**`; it's better than nothing and only requires local
>> analysis. A grammar tie-in to make sure they aren't misplaced is
>> obviously better (and valuable) but it can be a later improvement.
> 
> Sure, but I don't think it's too hard.  We could do something like (on
> emacs-29 branch):
> 
> diff --git a/lisp/progmodes/java-ts-mode.el b/lisp/progmodes/java-ts-mode.el
> index 2c42505ac9..abf67a4c14 100644
> --- a/lisp/progmodes/java-ts-mode.el
> +++ b/lisp/progmodes/java-ts-mode.el
> @@ -123,13 +123,24 @@ java-ts-mode--operators
>     "|=" "~" ">>" ">>>" "<<" "::" "?" "&=")
>   "C operators for tree-sitter font-locking.")
> 
> +(defun java-ts-mode--font-lock-comment (node override start end &rest _)
> +  (when (or (equal (treesit-node-type node) "block_comment")
> +            (equal (treesit-node-type node) "line_comment"))
> +    (let ((face (if (equal (treesit-node-type (treesit-node-next-sibling node))
> +                           "method_declaration")
> +                    'font-lock-doc-face
> +                  'font-lock-comment-face)))
> +      (treesit-fontify-with-override
> +       (treesit-node-start node) (treesit-node-end node)
> +       face override start end))))
> +
> (defvar java-ts-mode--font-lock-settings
>   (treesit-font-lock-rules
>    :language 'java
>    :override t
>    :feature 'comment
> -   `((line_comment) @font-lock-comment-face
> -     (block_comment) @font-lock-comment-face)
> +   `((line_comment) @java-ts-mode--font-lock-comment
> +     (block_comment) @java-ts-mode--font-lock-comment)
>    :language 'java
>    :override t
>    :feature 'constant
> 
> 
> This naive function will work for comments directly above a method.  It
> won't try to fix annotations and do other smartness.  The local analysis
> is actually a little more complex because you need to extract the
> comment text and scan it.  Is a more robust variant of this of interest?

Yeah! Throw in some checks for empty lines and `-- !` or `/**` (when applicable) and it’ll be good to go.
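
Something along these lines, as a sketch only. The my-java-ts-- names are hypothetical and not in java-ts-mode.el, and it assumes the Java grammar's comment nodes end before the trailing newline:

```elisp
(require 'treesit)

(defun my-java-ts--doc-comment-p (node)
  "Return non-nil if comment NODE looks like a Javadoc comment.
NODE must start with \"/**\" and be followed by a method
declaration, with no blank line in between."
  (let ((next (treesit-node-next-sibling node)))
    (and next
         (equal (treesit-node-type next) "method_declaration")
         (string-prefix-p "/**" (treesit-node-text node t))
         ;; A blank line between the comment and the method means the
         ;; comment is probably not attached to it.
         (not (string-match-p
               "\n[ \t]*\n"
               (buffer-substring-no-properties
                (treesit-node-end node) (treesit-node-start next)))))))

(defun my-java-ts--fontify-comment (node override start end &rest _)
  "Fontify comment NODE with the doc face or the plain comment face."
  (treesit-fontify-with-override
   (treesit-node-start node) (treesit-node-end node)
   (if (my-java-ts--doc-comment-p node)
       'font-lock-doc-face
     'font-lock-comment-face)
   override start end))
```

It would be hooked up the same way as in the patch quoted above, with the capture names pointing at my-java-ts--fontify-comment. It still only handles methods, not classes or fields.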

Yuan



* Re: Standardizing tree-sitter fontification features
  2022-11-26  3:35         ` Randy Taylor
@ 2022-12-05 21:17           ` Yuan Fu
  0 siblings, 0 replies; 22+ messages in thread
From: Yuan Fu @ 2022-12-05 21:17 UTC (permalink / raw)
  To: Randy Taylor; +Cc: emacs-devel



> On Nov 25, 2022, at 7:35 PM, Randy Taylor <dev@rjt.dev> wrote:
> 
> On Friday, November 25th, 2022 at 15:55, Yuan Fu <casouri@gmail.com> wrote:
>> 
>>> On Nov 25, 2022, at 11:03 AM, Randy Taylor dev@rjt.dev wrote:
>>> 
>>> On Friday, November 25th, 2022 at 01:15, Yuan Fu casouri@gmail.com wrote:
>>> 
>>>>> On Nov 24, 2022, at 5:13 PM, Randy Taylor dev@rjt.dev wrote:
>>>> 
>>>>> I wonder if assignment and definition are really worth having (and would prefer to do without them), since they should be covered by the variable, function, type and property features.
>>>> 
>>>> They are definitely useful. They are the things we currently highlight, and for a reason. Personally I only want to highlight identifiers in definition and assignment, not every occurrence of them. Since so much of a program consists of variable and function identifiers, highlighting all of them looks almost like highlighting everything. I just want some visual cues on the program structure, not programming in skittles :-)
>>> 
>>> Fair enough. In that case, are we going to end up with duplication in the variable and function features? So long as I can program in skittles, I am happy :).
>> 
>> 
>> I don’t think duplication would be a problem. You can disable them, or just leave them on: the face would be the same; the only difference is which ones get highlighted.
>> 
> 
> I meant duplication in writing the queries. Variable and function queries would already cover most or all of what's in assignment and definition.
> 
>> 
>>>>> - (maybe) literal instead of number? That way there is a group for chars too (and any other literals if there are any?). Or a char feature in addition to the existing number one. I'm undecided...
>>>> 
>>>> Literal seems to encompass numbers, strings, chars, and constants. So I don’t know how it fits. We could add char to the string feature.
>>> 
>>> Sounds good. I think I've seen char also end up in constant in some of our tree-sitter modes. Doesn't matter to me where it goes.
>> 
>> 
>> Hmm, what do neovim and emacs-tree-sitter do? (If you happen to know)
>> 
> 
> I'm not sure how the neovim stuff works, if it's faces they are defining or features or what. But they get pretty granular, so they've got their own thing for character (they even differentiate between conditionals and loops!!!). emacs-tree-sitter isn't consistent.
> 
> I think it makes sense to include it in string though, just so long as it doesn't have the same face. Perhaps the constant face is best for it?

(Sorry I missed this message)

C-mode uses string-face for chars, and I think it’s fine to use string-face for chars, too. They already use different syntax (single quotes), so they shouldn’t be hard to distinguish even with the same face.

Yuan



