unofficial mirror of emacs-devel@gnu.org 
* Status update of tree-sitter features
@ 2022-12-28  9:44 Yuan Fu
  2022-12-28 15:40 ` Mickey Petersen
  2022-12-28 23:27 ` Dmitry Gutov
  0 siblings, 2 replies; 15+ messages in thread
From: Yuan Fu @ 2022-12-28  9:44 UTC (permalink / raw)
  To: emacs-devel

Hi,

With the complete feature freeze approaching, this is probably the last set of features added to Emacs 29. I stuffed them in just in time ;-)

1. There is a new predicate in the query language, #pred. It’s like #equal and #match. Basically it allows you to filter the captured node with an arbitrary function. Right now there are some queries in the font-lock settings that match a little more than what we actually want. For example, for the property feature, we only want the “bb” in “aa.bb”, but not in “aa.bb(cc)”, because the latter is a method, not a property. The query usually matches both. With this new predicate we can use a function to filter out the methods.

If we can ensure that every query only captures the intended nodes, the font-lock queries can be reused for context extraction: using the query for the variable feature, I can find all the variables in a given region, etc.
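
A minimal sketch of the property example, assuming the tree-sitter JavaScript grammar (where a property is a property_identifier node and a call is a call_expression node) and the s-expression form of the predicate, :pred. The helper name my-property-not-method-p and the capture name @property are made up for illustration.

```elisp
;; Hypothetical filter function: NODE is the "bb" in "aa.bb".  Its
;; parent is the member expression "aa.bb"; if the grandparent is a
;; call expression, "bb" is being called as a method.
(defun my-property-not-method-p (node)
  "Return non-nil if NODE is a property that is not called as a method."
  (not (equal (treesit-node-type
               (treesit-node-parent (treesit-node-parent node)))
              "call_expression")))

;; Capture only the real properties in the current buffer.  Passing a
;; language symbol makes the query run on that language's root node.
(treesit-query-capture
 'javascript
 '(((property_identifier) @property
    (:pred my-property-not-method-p @property))))
```

The same pattern could be dropped into a font-lock rule by capturing with a face name instead of @property.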

2. We’ve had treesit-defun-type-regexp for a while; I recently generalized the idea into “things”. Now you can use treesit--things-around, treesit--navigate-thing, and treesit--thing-at-point to find and navigate arbitrary “things”. A “thing” is defined by a regexp that matches the node types, plus (optionally) a filter function.

3. Now there is imenu support. Major modes don’t need to define their own imenu functions anymore; they just need to set treesit-simple-imenu-settings. They also need to set treesit-defun-name-function, which is a function that finds out the name of a defun node. It is used by both imenu and add-log-entry.
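
As a rough illustration, a major mode might set the two variables like this. The node types function_definition and class_definition and the "name" field are assumed from the tree-sitter Python grammar, and the names my-ts-defun-name and my-ts-mode-setup-imenu are hypothetical.

```elisp
;; Hypothetical name function: return the text of NODE's "name" field,
;; or nil if NODE has no such field.
(defun my-ts-defun-name (node)
  "Return the name of the defun NODE, or nil."
  (let ((name (treesit-node-child-by-field-name node "name")))
    (and name (treesit-node-text name t))))

;; Hypothetical setup function, meant to be called from the major
;; mode's body before `treesit-major-mode-setup'.
(defun my-ts-mode-setup-imenu ()
  (setq-local treesit-defun-name-function #'my-ts-defun-name)
  ;; Each entry is (CATEGORY REGEXP PRED NAME-FN).  A nil PRED means
  ;; no extra filtering; a nil NAME-FN means the name comes from
  ;; `treesit-defun-name-function'.
  (setq-local treesit-simple-imenu-settings
              '(("Function" "\\`function_definition\\'" nil nil)
                ("Class" "\\`class_definition\\'" nil nil))))
```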

4. C-like modes now have adequate indentation and filling for block comments.

Lastly, I want to remind everyone to update the font-lock settings for your major mode to be more compliant with the standard list of features we decided on. This is not a hard requirement and major modes are free to extend upon it, but it’s nice to be consistent, especially among built-in modes.

Here is the list, for your reference. Among all the features, I think assignment is “nice to have”, it’s fine to leave it out if there isn’t enough time. Same goes for key: it may or may not apply to a language.

Basic tokens:

delimiter       ,.;      (delimit things)
operator        == != || (produces a value)
bracket         []{}()
misc-punctuation

constant        true, false, null
number
keyword
comment         (includes doc-comments)
string          (includes chars and docstrings)
string-interpolation    f"text {variable}"
escape-sequence         "\n\t\\"
function                every function identifier
variable                every variable identifier
type                    every type identifier
property                a.b  <--- highlight b
key                     { a: b, c: d } <--- highlight a, c
error                   highlight parse error

Abstract features:

assignment: the LHS of an assignment (thing being assigned to), eg:

a = b    <--- highlight a
a.b = c  <--- highlight b
a[1] = d <--- highlight a

definition: the thing being defined, eg:

int a(int b) { <--- highlight a
 return 0
}

int a;  <--- highlight a

struct a { <--- highlight a
 int b;   <--- highlight b
}

As for decoration levels, this is my suggestion:

'((comment definition)
  (keyword string type)
  (assignment builtin constant decorator
   escape-sequence key number property string-interpolation)
  (bracket delimiter function misc-punctuation operator variable))

Yuan 



* Re: Status update of tree-sitter features
  2022-12-28  9:44 Status update of tree-sitter features Yuan Fu
@ 2022-12-28 15:40 ` Mickey Petersen
  2022-12-29  0:15   ` Yuan Fu
  2022-12-28 23:27 ` Dmitry Gutov
  1 sibling, 1 reply; 15+ messages in thread
From: Mickey Petersen @ 2022-12-28 15:40 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel


Yuan Fu <casouri@gmail.com> writes:

> Hi,
>
> With the complete feature freeze approaching, this is probably the last set of features added to Emacs 29. I stuffed them in just in time ;-)
>
> 1. There is a new predicate in the query language, #pred. It’s like #equal and #match. Basically it allows you to filter the captured node with an arbitrary function. Right now there are some queries in the font-lock settings that match a little more than what we actually want. For example, for the property feature, we only want the “bb” in “aa.bb”, but not in “aa.bb(cc)”, because the latter is a method, not a property. The query usually matches both. With this new predicate we can use a function to filter out the methods.
>
> If we can ensure that every query only captures the intended nodes, the font-lock queries can be reused for context extraction: using the query for the variable feature, I can find all the variables in a given region, etc.
>

Looks useful. But this, I think, goes back to the issue I raised with you in an email: that queries are recursive, as each query is (I presume this is how it is done in tree-sitter) applied to each node in the tree. That makes it very hard to build a query that pattern matches against nested nodes only once. It is very hard, when you have recursive queries with self-similar matches, to filter out recursed matches. Particularly when you are writing generic code that merely acts on the output of arbitrary queries.

#pred is surely no different to applying `seq-filter' or similar to the query result, no?




* Re: Status update of tree-sitter features
  2022-12-28  9:44 Status update of tree-sitter features Yuan Fu
  2022-12-28 15:40 ` Mickey Petersen
@ 2022-12-28 23:27 ` Dmitry Gutov
  2022-12-29  0:23   ` Yuan Fu
  1 sibling, 1 reply; 15+ messages in thread
From: Dmitry Gutov @ 2022-12-28 23:27 UTC (permalink / raw)
  To: Yuan Fu, emacs-devel

Hi Yuan,

I'm late to the party, so apologies if any of the following has been 
discussed before.

On 28/12/2022 11:44, Yuan Fu wrote:

> Here is the list, for your reference. Among all the features, I think assignment is “nice to have”, it’s fine to leave it out if there isn’t enough time. Same goes for key: it may or may not apply to a language.
> 
> Basic tokens:
> 
> delimiter       ,.;      (delimit things)
> operator        == != || (produces a value)
> bracket         []{}()
> misc-punctuation
> 
> constant        true, false, null
> number
> keyword
> comment         (includes doc-comments)
> string          (includes chars and docstrings)
> string-interpolation    f"text {variable}"
> escape-sequence         "\n\t\\"
> function                every function identifier

The description makes it easy to mistake for function definitions as 
well. Whereas the place where this category is used is function/method 
calls, right? And perhaps some other references to methods inside the 
code where the language parser can distinguish those from property 
access, etc.

If it's only about calls, maybe call this category funcall?

> type                    every type identifier
> property                a.b  <--- highlight b

Do we think it's a good idea to have 'property' in the default 
highlighting? IIUC the default level is 3.

Code like foo.bar is easy to understand at a glance, so it seems like 
it'll lead to a lot of repeated highlights. If 'function' (or 'funcall') 
is a level 4, maybe 'property' should be there too?

 > variable                every variable identifier

'variable', so far, seems like the least useful. When enabled, it lights 
up every bit of text that remained from other matchers -- because 
identifiers are everywhere.

There is this more advanced prior art for highlighting variables, by 
tracking the scopes using custom annotations, see locals.scm here:

https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables

What's displayed under "Result" would be really handy to have in Ruby.

It's nothing urgent, of course. Maybe for Emacs 30?

For Emacs 29, though, I would discourage the use of 'variable'.

Thanks,
Dmitry.




* Re: Status update of tree-sitter features
  2022-12-28 15:40 ` Mickey Petersen
@ 2022-12-29  0:15   ` Yuan Fu
  0 siblings, 0 replies; 15+ messages in thread
From: Yuan Fu @ 2022-12-29  0:15 UTC (permalink / raw)
  To: Mickey Petersen; +Cc: emacs-devel

> 
> Looks useful. But this, I think, goes back to the issue I raised with you in an email: that queries are recursive, as each query is (I presume this is how it is done in tree-sitter) applied to each node in the tree. That makes it very hard to build a query that pattern matches against nested nodes only once. It is very hard, when you have recursive queries with self-similar matches, to filter out recursed matches. Particularly when you are writing generic code that merely acts on the output of arbitrary queries.
> 
> #pred is surely no different to applying `seq-filter' or similar to the query result, no?

No, I don’t think there’s any significant difference. #pred is just a bit nicer for other people to use the query, because the filter is embedded in the query.

Yuan





* Re: Status update of tree-sitter features
  2022-12-28 23:27 ` Dmitry Gutov
@ 2022-12-29  0:23   ` Yuan Fu
  2022-12-29  0:34     ` Dmitry Gutov
  2022-12-29  3:28     ` Stefan Monnier
  0 siblings, 2 replies; 15+ messages in thread
From: Yuan Fu @ 2022-12-29  0:23 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel



> On Dec 28, 2022, at 3:27 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> Hi Yuan,
> 
> I'm late to the party, so apologies if any of the following has been discussed before.
> 
> On 28/12/2022 11:44, Yuan Fu wrote:
> 
>> Here is the list, for your reference. Among all the features, I think assignment is “nice to have”, it’s fine to leave it out if there isn’t enough time. Same goes for key: it may or may not apply to a language.
>> Basic tokens:
>> delimiter       ,.;      (delimit things)
>> operator        == != || (produces a value)
>> bracket         []{}()
>> misc-punctuation
>> constant        true, false, null
>> number
>> keyword
>> comment         (includes doc-comments)
>> string          (includes chars and docstrings)
>> string-interpolation    f"text {variable}"
>> escape-sequence         "\n\t\\"
>> function                every function identifier
> 
> The description makes it easy to mistake for function definitions as well. Whereas the place where this category is used is function/method calls, right? And perhaps some other references to methods inside the code where the language parser can distinguish those from property access, etc.
> 
> If it's only about calls, maybe call this category funcall?

Function, property and variable are for every occurrence of them (the touted “consistent highlighting”). So there will be a bit of overlap between function, variable, and definition. 

> 
>> type                    every type identifier
>> property                a.b  <--- highlight b
> 
> Do we think it's a good idea to have 'property' in the default highlighting? IIUC the default level is 3.
> 
> Code like foo.bar is easy to understand at a glance, so it seems like it'll lead to a lot of repeated highlights. If 'function' (or 'funcall') is a level 4, maybe 'property' should be there too?

Ah yes property should be level 4, along with function and variable.

> 
> > variable                every variable identifier
> 
> 'variable', so far, seems like the least useful. When enabled, it lights up every bit of text that remained from other matchers -- because identifiers are everywhere.

Yes, but apparently people want it ;-)

> There is this more advanced prior art for highlighting variables, by tracking the scopes using custom annotations, see locals.scm here:
> 
> https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables
> 
> What's displayed under "Result" would be really handy to have in Ruby.
> 
> It's nothing urgent, of course. Maybe for Emacs 30?

Yeah, this requires some non-trivial addition to the current fontification code.

> For Emacs 29, though, I would discourage the use of 'variable’.

It’s on level 4, meaning not enabled by default, so I think it’s fine.






* Re: Status update of tree-sitter features
  2022-12-29  0:23   ` Yuan Fu
@ 2022-12-29  0:34     ` Dmitry Gutov
  2022-12-29  9:21       ` Yuan Fu
  2022-12-29  3:28     ` Stefan Monnier
  1 sibling, 1 reply; 15+ messages in thread
From: Dmitry Gutov @ 2022-12-29  0:34 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On 29/12/2022 02:23, Yuan Fu wrote:

>>> function                every function identifier
>>
>> The description makes it easy to mistake for function definitions as well. Whereas the place where this category is used is function/method calls, right? And perhaps some other references to methods inside the code where the language parser can distinguish those from property access, etc.
>>
>> If it's only about calls, maybe call this category funcall?
> 
> Function, property and variable are for every occurrence of them (the touted “consistent highlighting”). So there will be a bit of overlap between function, variable, and definition.

By "overlap", do you mean that font-lock rules should also have entries 
for variable/function definitions with category 'variable'/'function'?

In case somebody removes 'definition' from their list of enabled 
features but keeps 'variable' and 'function' there?

>>> variable                every variable identifier
>>
>> 'variable', so far, seems like the least useful. When enabled, it lights up every bit of text that remained from other matchers -- because identifiers are everywhere.
> 
> Yes, but apparently people want it ;-)

Well, if they really do.

I figured that people who added this maybe haven't tested this 
thoroughly. And that maybe they expected the effect of that "local 
variables highlights" feature that some editors showcase already.

>> There is this more advanced prior art for highlighting variables, by tracking the scopes using custom annotations, see locals.scm here:
>>
>> https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables
>>
>> What's displayed under "Result" would be really handy to have in Ruby.
>>
>> It's nothing urgent, of course. Maybe for Emacs 30?
> 
> Yeah, this requires some non-trivial addition to the current fontification code.

Thank you.

>> For Emacs 29, though, I would discourage the use of 'variable’.
> 
> It’s on level 4, meaning not enabled by default, so I think it’s fine.

Fair enough. If someone wants function/property but not variable, they 
could fine-tune the list.




* Re: Status update of tree-sitter features
  2022-12-29  0:23   ` Yuan Fu
  2022-12-29  0:34     ` Dmitry Gutov
@ 2022-12-29  3:28     ` Stefan Monnier
  2022-12-29  9:23       ` Yuan Fu
  2022-12-30 14:27       ` Jostein Kjønigsen
  1 sibling, 2 replies; 15+ messages in thread
From: Stefan Monnier @ 2022-12-29  3:28 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Dmitry Gutov, emacs-devel

>> If it's only about calls, maybe call this category funcall?
> Function, property and variable are for every occurrence of them (the touted
> “consistent highlighting”).

In my own use, the difference between a definition and a use is more
important than the difference between a variable name and a method name.
I'm not fundamentally opposed to highlighting funcalls, but it is
indispensable that funcalls get a different face from
function definitions.


        Stefan





* Re: Status update of tree-sitter features
  2022-12-29  0:34     ` Dmitry Gutov
@ 2022-12-29  9:21       ` Yuan Fu
  2022-12-29 16:38         ` Dmitry Gutov
  0 siblings, 1 reply; 15+ messages in thread
From: Yuan Fu @ 2022-12-29  9:21 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel



> On Dec 28, 2022, at 4:34 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> On 29/12/2022 02:23, Yuan Fu wrote:
> 
>>>> function                every function identifier
>>> 
>>> The description makes it easy to mistake for function definitions as well. Whereas the place where this category is used is function/method calls, right? And perhaps some other references to methods inside the code where the language parser can distinguish those from property access, etc.
>>> 
>>> If it's only about calls, maybe call this category funcall?
>> Function, property and variable are for every occurrence of them (the touted “consistent highlighting”). So there will be a bit of overlap between function, variable, and definition.
> 
> By "overlap", do you mean that font-lock rules should also have entries for variable/function definitions with category 'variable'/'function'?
> 
> In case somebody removes 'definition' from their list of enabled features but keeps 'variable' and 'function' there?

Basically, if you enable definition alone, you get highlighting for function/variable/class definitions. If you enable function/variable alone, you get highlighting for all occurrences of function/variable identifiers, which would include what definition highlights. Definition can be seen as the union of subsets of the function & variable features.
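
To make the overlap concrete, here is a rough sketch of what it could look like in a mode's font-lock rules, assuming the tree-sitter C grammar's function_declarator and call_expression nodes. The variable name my-c-font-lock-rules is hypothetical, and real modes may organize their rules differently.

```elisp
;; Hypothetical rule set: the function-definition pattern appears under
;; both features, so enabling either one alone highlights definitions.
(defvar my-c-font-lock-rules
  (treesit-font-lock-rules
   ;; definition: only the identifier being defined.
   :language 'c
   :feature 'definition
   '((function_declarator
      declarator: (identifier) @font-lock-function-name-face))

   ;; function: every function identifier, i.e. calls plus the same
   ;; definition pattern as above.
   :language 'c
   :feature 'function
   '((call_expression
      function: (identifier) @font-lock-function-name-face)
     (function_declarator
      declarator: (identifier) @font-lock-function-name-face))))
```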

> 
>>>> variable                every variable identifier
>>> 
>>> 'variable', so far, seems like the least useful. When enabled, it lights up every bit of text that remained from other matchers -- because identifiers are everywhere.
>> Yes, but apparently people want it ;-)
> 
> Well, if they really do.
> 
> I figured that people who added this maybe haven't tested this thoroughly. And that maybe they expected the effect of that "local variables highlights" feature that some editors showcase already.

The purpose of the standard list is to regulate features, so if a major mode wants to support a feature in the list, it uses the definition and name from that list (rather than creating a feature with same definition but different name, or same name but different definition). If a major mode really wants the variable feature, it can add it; if not, it doesn’t have to.

> 
>>> There is this more advanced prior art for highlighting variables, by tracking the scopes using custom annotations, see locals.scm here:
>>> 
>>> https://tree-sitter.github.io/tree-sitter/syntax-highlighting#local-variables
>>> 
>>> What's displayed under "Result" would be really handy to have in Ruby.
>>> 
>>> It's nothing urgent, of course. Maybe for Emacs 30?
>> Yeah, this requires some non-trivial addition to the current fontification code.
> 
> Thank you.
> 
>>> For Emacs 29, though, I would discourage the use of 'variable’.
>> It’s on level 4, meaning not enabled by default, so I think it’s fine.
> 
> Fair enough. If someone wants function/property but not variable, they could fine-tune the list.

Right. All the features in level 4 are pretty over-the-top IMO, so simply bumping to level 4 and enabling everything is probably not the way to go.

Yuan





* Re: Status update of tree-sitter features
  2022-12-29  3:28     ` Stefan Monnier
@ 2022-12-29  9:23       ` Yuan Fu
  2022-12-30 14:27       ` Jostein Kjønigsen
  1 sibling, 0 replies; 15+ messages in thread
From: Yuan Fu @ 2022-12-29  9:23 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Dmitry Gutov, emacs-devel



> On Dec 28, 2022, at 7:28 PM, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>>> If it's only about calls, maybe call this category funcall?
>> Function, property and variable are for every occurrence of them (the touted
>> “consistent highlighting”).
> 
> In my own use, the difference between a definition and a use is more
> important than the difference between a variable name and a method name.
> I'm not fundamentally opposed to highlighting funcalls, but it is
> indispensable that funcalls get a different face from
> function definitions.

The default behavior is that we only highlight definitions, and funcalls are not highlighted. So it’s function-face vs no face. More or less fits your bill, I guess.

Yuan



* Re: Status update of tree-sitter features
  2022-12-29  9:21       ` Yuan Fu
@ 2022-12-29 16:38         ` Dmitry Gutov
  2022-12-30 11:16           ` Yuan Fu
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Gutov @ 2022-12-29 16:38 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On 29/12/2022 11:21, Yuan Fu wrote:

>>>> If it's only about calls, maybe call this category funcall?
>>> Function, property and variable are for every occurrence of them (the touted “consistent highlighting”). So there will be a bit of overlap between function, variable, and definition.
>>
>> By "overlap", do you mean that font-lock rules should also have entries for variable/function definitions with category 'variable'/'function'?
>>
>> In case somebody removes 'definition' from their list of enabled features but keeps 'variable' and 'function' there?
> 
> Basically, if you enable definition alone, you get highlighting for function/variable/class definitions. If you enable function/variable alone, you get highlighting for all occurrences of function/variable identifiers, which would include what definition highlights. Definition can be seen as the union of subsets of the function & variable features.

Yes, but is that classification useful enough to justify the duplication 
in the font-lock rules' definitions?

FWIW, I "broke" python-ts-mode in 8503b370be1, according to this 
definition. Sorry?

But what do we get by this particular classification?

The user will be able to disable 'definitions' and still have all 
function definitions highlighted? But not, say, variable definitions?

That doesn't strike me as particularly more useful than e.g. disabling 
'definitions' and having all function references highlighted (but not 
definitions). Which we would make possible by limiting 'function' to 
non-definitions.

>>>>> variable                every variable identifier
>>>>
>>>> 'variable', so far, seems like the least useful. When enabled, it lights up every bit of text that remained from other matchers -- because identifiers are everywhere.
>>> Yes, but apparently people want it ;-)
>>
>> Well, if they really do.
>>
>> I figured that people who added this maybe haven't tested this thoroughly. And that maybe they expected the effect of that "local variables highlights" feature that some editors showcase already.
> 
> The purpose of the standard list is to regulate features, so if a major mode wants to support a feature in the list, it uses the definition and name from that list (rather than creating a feature with same definition but different name, or same name but different definition). If a major mode really wants the variable feature, it can add it; if not, it doesn’t have to.

Okay, I also have a different, somewhat related question.

Certain languages have a special syntax for "variables".

Ruby has @instance_variable and $global_variable -- the @ and $ are used 
both during assignment and for later reference.

Perl has $var and @array_var.

PHP has $local_var.

Up until now we've highlighted those with font-lock-variable-name-face. 
Except for @array_var in Perl, which has a separate derived face. Either 
way, we did highlight them.

What category should we use for them in ts-based modes? If it's 
'variable', then they won't be highlighted by default.

If 'variable' matchers in other existing ts modes didn't include all 
identifiers everywhere, we could put 'variable' into level 3.

Or we could add a separate category for all those, I guess.

>>>> For Emacs 29, though, I would discourage the use of 'variable’.
>>> It’s on level 4, meaning not enabled by default, so I think it’s fine.
>>
>> Fair enough. If someone wants function/property but not variable, they could fine-tune the list.
> 
> Right. All the features in level 4 are pretty over-the-top IMO, so simply bumping to level 4 and enabling everything is probably not the way to go.

I actually considered having level 4 enabled. The punctuation faces look 
like 'default' without customization anyway, so it doesn't turn into 
angry fruit salad right away. One can stop at customizing the "brackets" 
face, for example.

Though I'd probably also want 'function' and 'property' highlights to 
use other faces, distinct from 'function-name' and 'variable-name'.




* Re: Status update of tree-sitter features
  2022-12-29 16:38         ` Dmitry Gutov
@ 2022-12-30 11:16           ` Yuan Fu
  2022-12-30 23:41             ` Dmitry Gutov
  0 siblings, 1 reply; 15+ messages in thread
From: Yuan Fu @ 2022-12-30 11:16 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel



> On Dec 29, 2022, at 8:38 AM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> On 29/12/2022 11:21, Yuan Fu wrote:
> 
>>>>> If it's only about calls, maybe call this category funcall?
>>>> Function, property and variable are for every occurrence of them (the touted “consistent highlighting”). So there will be a bit of overlap between function, variable, and definition.
>>> 
>>> By "overlap", do you mean that font-lock rules should also have entries for variable/function definitions with category 'variable'/'function'?
>>> 
>>> In case somebody removes 'definition' from their list of enabled features but keeps 'variable' and 'function' there?
>> Basically, if you enable definition alone, you get highlighting for function/variable/class definitions. If you enable function/variable alone, you get highlighting for all occurrences of function/variable identifiers, which would include what definition highlights. Definition can be seen as the union of subsets of the function & variable features.
> 
> Yes, but is that classification useful enough to justify the duplication in the font-lock rules' definitions?
> 
> FWIW, I "broke" python-ts-mode in 8503b370be1, according to this definition. Sorry?
> 
> But what do we get by this particular classification?
> 
> The user will be able to disable 'definitions' and still have all function definitions highlighted? But not, say, variable definitions?
> 
> That doesn't strike me as particularly more useful than e.g. disabling 'definitions' and having all function references highlighted (but not definitions). Which we would make possible by limiting 'function' to non-definitions.

Right now you can choose to highlight:
1. Only definition identifiers, be it function, variable, class, etc
2. All function identifiers
3. All variable identifiers

There could be better features, but the existing ones still have their merits. If you want a feature that highlights only funcalls, maybe we can add it, if there is enough time & interest.

>>>>>> variable                every variable identifier
>>>>> 
>>>>> 'variable', so far, seems like the least useful. When enabled, it lights up every bit of text that remained from other matchers -- because identifiers are everywhere.
>>>> Yes, but apparently people want it ;-)
>>> 
>>> Well, if they really do.
>>> 
>>> I figured that people who added this maybe haven't tested this thoroughly. And that maybe they expected the effect of that "local variables highlights" feature that some editors showcase already.
>> The purpose of the standard list is to regulate features, so if a major mode wants to support a feature in the list, it uses the definition and name from that list (rather than creating a feature with same definition but different name, or same name but different definition). If a major mode really wants the variable feature, it can add it; if not, it doesn’t have to.
> 
> Okay, I also have a different, somewhat related question.
> 
> Certain languages have a special syntax for "variables".
> 
> Ruby has @instance_variable and $global_variable -- the @ and $ are used both during assignment and for later reference.
> 
> Perl has $var and @array_var.
> 
> PHP has $local_var.
> 
> Up until now we've highlighted those with font-lock-variable-name-face. Except for @array_var in Perl, which has separate derived face. Either way, we did highlight them.
> 
> What category should we use for them in ts-based modes? If it's 'variable', then won't be highlighted by default.
> 
> If 'variable' matchers in other existing ts modes didn't include all identifiers everywhere, we could put 'variable' into level 3.
> 
> Or we could add a separate category for all those, I guess.

If a major mode thinks highlighting all variables is the best default behavior, it can support the variable feature and put it at level 3. The standard list is just a guideline.

> 
>>>>> For Emacs 29, though, I would discourage the use of 'variable’.
>>>> It’s on level 4, meaning not enabled by default, so I think it’s fine.
>>> 
>>> Fair enough. If someone wants function/property but not variable, they could fine-tune the list.
>> Right. All the features in level 4 are pretty over-the-top IMO, so simply bumping to level 4 and enabling everything is probably not the way to go.
> 
> I actually considered having level 4 enabled. The punctuation faces look like 'default' without customization anyway, so it doesn't turn into angry fruit salad right away. One can stop at customizing the "brackets" face, for example.
> 
> Though I'd probably also want 'function' and 'property' highlights to use other faces, distinct from 'function-name' and 'variable-name’.

You could use level 4 and remove unwanted features, yes. I guess that works too :-)
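
A sketch of that setup in a user's init file, assuming the treesit-font-lock-level option and treesit-font-lock-recompute-features as they exist on the Emacs 29 branch. The function name and the choice of python-ts-mode-hook are only for illustration.

```elisp
;; Ask for the maximum decoration level.  Set this before the
;; tree-sitter modes are turned on.
(setq treesit-font-lock-level 4)

;; Hypothetical hook function: keep everything level 4 enables except
;; the `variable' feature.  The first argument lists features to add
;; (none here), the second lists features to remove.
(defun my-trim-level-4-features ()
  (treesit-font-lock-recompute-features nil '(variable)))

(add-hook 'python-ts-mode-hook #'my-trim-level-4-features)
```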

Yuan





* Re: Status update of tree-sitter features
  2022-12-29  3:28     ` Stefan Monnier
  2022-12-29  9:23       ` Yuan Fu
@ 2022-12-30 14:27       ` Jostein Kjønigsen
  2022-12-30 15:37         ` Eli Zaretskii
  1 sibling, 1 reply; 15+ messages in thread
From: Jostein Kjønigsen @ 2022-12-30 14:27 UTC (permalink / raw)
  To: Stefan Monnier, Yuan Fu, Eli Zaretskii; +Cc: Dmitry Gutov, emacs-devel

Hey everyone.

Sorry for being late to the party. As far as I can tell, there seems to 
be mostly consensus on the overall picture, while some different 
opinions wrt what to actually standardize.

On 29.12.2022 04:28, Stefan Monnier wrote:
>>> If it's only about calls, maybe call this category funcall?
>> Function, property and variable are for every occurrence of them (the touted
>> “consistent highlighting”).
> In my own use, the difference between a definition and a use is more
> important than the difference between a variable name and a method name.
> I'm not fundamentally opposed to highlighting funcalls, but it is
> indispensable that funcalls get a different face from
> function definitions.
>
>
>          Stefan
>
I know that some people prefer "sparse" syntax-highlighting, while 
others prefer more "complete" highlighting. Personally, I value being 
able to see function invocations, and love how tree-sitter makes that 
easy to highlight.

As long as we have ways to accommodate both needs ("fruit salad" vs more 
sparse fontification) I'm going to be happy, and we kind of already have 
that through font-lock levels in the modes already.

That said, some of the tree-sitter based major modes (half? most?) 
currently highlight function-invocations on level 3. If we are to make 
everyone happy, we need to make changes to that, and move that to 4.

Which leads me to my main point here.

Emacs 29 release is near. We should seriously limit the amount of 
changes we apply in order to fix the things we need.

Deciding at this point to rework all the features, which features we
should have, how they can be enabled/disabled, and what
enabling/disabling those should entail in practice... That essentially
amounts to suggesting we rewrite all those tree-sitter-based major modes.

I think a much more realistic strategy at this point is:

- to aim for what concrete things should be fontified at level 3 and 4 
(default vs max). We already know 1 thing we want to change here.

- ensure the major-modes adhere to this with a minimum of changes in the 
existing code.

- and most importantly: leave the discussion about adding the ability
for users to enable/disable specific features, and how that should work
in practice (overlap, precedence, etc.) for Emacs 30.

Doing so would leave the new tree-sitter based major-modes pretty much 
functionally on par with existing major-modes (or better, more 
accurate), and it buys us a full, new release-period to figure out how 
we can leverage tree-sitter to give end-users better customization
options, instead of coming up with ideas on the verge of release,
without having time to let those ideas mature.

This will end up being an API of sorts, and from my experience you don't 
want to rush those.

My 2 cents.

--
Jostein





* Re: Status update of tree-sitter features
  2022-12-30 14:27       ` Jostein Kjønigsen
@ 2022-12-30 15:37         ` Eli Zaretskii
  0 siblings, 0 replies; 15+ messages in thread
From: Eli Zaretskii @ 2022-12-30 15:37 UTC (permalink / raw)
  To: jostein; +Cc: monnier, casouri, dgutov, emacs-devel

> Date: Fri, 30 Dec 2022 15:27:16 +0100
> Cc: Dmitry Gutov <dgutov@yandex.ru>, emacs-devel <emacs-devel@gnu.org>
> From: Jostein Kjønigsen <jostein@secure.kjonigsen.net>
> 
> I know that some people prefer "sparse" syntax-highlighting, while 
> others prefer more "complete" highlighting. Personally, I value being 
> able to see function invocations, and love how tree-sitter makes that 
> easy to highlight.
> 
> As long as we have ways to accommodate both needs ("fruit salad" vs more
> sparse fontification) I'm going to be happy, and we kind of already have
> that through the font-lock levels in the modes.
> 
> That said, some of the tree-sitter based major modes (half? most?) 
> currently highlight function-invocations on level 3. If we are to make 
> everyone happy, we need to make changes to that, and move that to 4.
> 
> Which leads me to my main point here.
> 
> The Emacs 29 release is near. We should seriously limit the number of
> changes we make to those required to fix what needs fixing.
> 
> Deciding at this point to rework all the features, which features we
> should have, how they can be enabled/disabled, and what
> enabling/disabling those should entail in practice... That essentially
> amounts to suggesting we rewrite all those tree-sitter-based major modes.
> 
> I think a much more realistic strategy at this point is:
> 
> - to aim for what concrete things should be fontified at level 3 and 4 
> (default vs max). We already know 1 thing we want to change here.
> 
> - ensure the major-modes adhere to this with a minimum of changes in the 
> existing code.
> 
> - and most importantly: leave the discussion about adding the ability
> for users to enable/disable specific features, and how that should work
> in practice (overlap, precedence, etc.) for Emacs 30.

I agree.  Please post concrete proposals for making modes consistent
regarding the default fontification level.




* Re: Status update of tree-sitter features
  2022-12-30 11:16           ` Yuan Fu
@ 2022-12-30 23:41             ` Dmitry Gutov
  2022-12-31 22:15               ` Yuan Fu
  0 siblings, 1 reply; 15+ messages in thread
From: Dmitry Gutov @ 2022-12-30 23:41 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel

On 30/12/2022 13:16, Yuan Fu wrote:

>> Yes, but is that classification useful enough to justify the duplication in the font-lock rules' definitions?
>>
>> FWIW, I "broke" python-ts-mode in 8503b370be1, according to this definition. Sorry?
>>
>> But what do we get by this particular classification?
>>
>> The user will be able to disable 'definitions' and still have all function definitions highlighted? But not, say, variable definitions?
>>
>> That doesn't strike me as particularly more useful than e.g. disabling 'definitions' and having all function references highlighted (but not definitions). Which we would make possible by limiting 'function' to non-definitions.
> 
> Right now you can choose to highlight:
> 1. Only definition identifiers, be it function, variable, class, etc
> 2. All function identifiers
> 3. All variable identifiers
> 
> There could be better features, but the existing ones still have their merits. If you want a feature that highlights only funcalls, maybe we can add it, if there is enough time and interest.

I'm mostly worried about having to duplicate font-lock rules between 
categories at this point. It just looks impractical to me.

>> Okay, I also have a different, somewhat related question.
>>
>> Certain languages have a special syntax for "variables".
>>
>> Ruby has @instance_variable and $global_variable -- the @ and $ are used both during assignment and for later reference.
>>
>> Perl has $var and @array_var.
>>
>> PHP has $local_var.
>>
>> Up until now we've highlighted those with font-lock-variable-name-face. Except for @array_var in Perl, which has a separate derived face. Either way, we did highlight them.
>>
>> What category should we use for them in ts-based modes? If it's 'variable', then they won't be highlighted by default.
>>
>> If 'variable' matchers in other existing ts modes didn't include all identifiers everywhere, we could put 'variable' into level 3.
>>
>> Or we could add a separate category for all those, I guess.
> 
> If a major mode thinks highlighting all variables is the best default behavior, it can support the variable feature and put it at level 3. The standard list is just a guideline.

So you're suggesting ruby-ts-mode changes treesit-font-lock-feature-list 
locally, and either moves 'variable' to a lower level, or adds a new 
category for said vars.

We can do that, thank you.
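
A rough sketch of those two options, assuming a standard-looking feature
list. The helper my-ts-move-feature, the variable
my-ts-standard-features and the feature name sigil-variable are all
hypothetical names invented for this example; they are not part of
treesit.el or ruby-ts-mode.

```elisp
(defun my-ts-move-feature (feature-list feature level)
  "Return a copy of FEATURE-LIST with FEATURE placed at LEVEL (1-based).
FEATURE is removed from every other level.  If FEATURE was not in
FEATURE-LIST at all, it is simply added at LEVEL.  LEVEL must not
exceed the number of levels in FEATURE-LIST."
  (let ((n 0))
    (mapcar (lambda (features)
              (setq n (1+ n))
              (let ((rest (remq feature features)))
                (if (= n level)
                    (append rest (list feature))
                  rest)))
            feature-list)))

;; Illustrative feature list with `variable' at level 4.
(defvar my-ts-standard-features
  '((comment definition)
    (keyword string type)
    (assignment builtin constant number)
    (bracket delimiter function operator property variable)))

;; Option 1: move `variable' down to level 3 so it is on by default.
(defvar my-ts-features-option-1
  (my-ts-move-feature my-ts-standard-features 'variable 3))

;; Option 2: keep `variable' at level 4 and add a separate
;; (hypothetical) feature for sigil-prefixed variables at level 3.
(defvar my-ts-features-option-2
  (my-ts-move-feature my-ts-standard-features 'sigil-variable 3))
```

Either resulting list could then be assigned to
treesit-font-lock-feature-list with setq-local in the mode body.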




* Re: Status update of tree-sitter features
  2022-12-30 23:41             ` Dmitry Gutov
@ 2022-12-31 22:15               ` Yuan Fu
  0 siblings, 0 replies; 15+ messages in thread
From: Yuan Fu @ 2022-12-31 22:15 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: emacs-devel



> On Dec 30, 2022, at 3:41 PM, Dmitry Gutov <dgutov@yandex.ru> wrote:
> 
> On 30/12/2022 13:16, Yuan Fu wrote:
> 
>>> Yes, but is that classification useful enough to justify the duplication in the font-lock rules' definitions?
>>> 
>>> FWIW, I "broke" python-ts-mode in 8503b370be1, according to this definition. Sorry?
>>> 
>>> But what do we get by this particular classification?
>>> 
>>> The user will be able to disable 'definitions' and still have all function definitions highlighted? But not, say, variable definitions?
>>> 
>>> That doesn't strike me as particularly more useful than e.g. disabling 'definitions' and having all function references highlighted (but not definitions). Which we would make possible by limiting 'function' to non-definitions.
>> Right now you can choose to highlight:
>> 1. Only definition identifiers, be it function, variable, class, etc
>> 2. All function identifiers
>> 3. All variable identifiers
>> There could be better features, but the existing ones still have their merits. If you want a feature that highlights only funcalls, maybe we can add it, if there is enough time and interest.
> 
> I'm mostly worried about having to duplicate font-lock rules between categories at this point. It just looks impractical to me.

I’d say we judge it case-by-case. In this case it seems fine to me.

> 
>>> Okay, I also have a different, somewhat related question.
>>> 
>>> Certain languages have a special syntax for "variables".
>>> 
>>> Ruby has @instance_variable and $global_variable -- the @ and $ are used both during assignment and for later reference.
>>> 
>>> Perl has $var and @array_var.
>>> 
>>> PHP has $local_var.
>>> 
>>> Up until now we've highlighted those with font-lock-variable-name-face. Except for @array_var in Perl, which has a separate derived face. Either way, we did highlight them.
>>> 
>>> What category should we use for them in ts-based modes? If it's 'variable', then they won't be highlighted by default.
>>> 
>>> If 'variable' matchers in other existing ts modes didn't include all identifiers everywhere, we could put 'variable' into level 3.
>>> 
>>> Or we could add a separate category for all those, I guess.
>> If a major mode thinks highlighting all variables is the best default behavior, it can support the variable feature and put it at level 3. The standard list is just a guideline.
> 
> So you're suggesting ruby-ts-mode changes treesit-font-lock-feature-list locally, and either moves 'variable' to a lower level, or adds a new category for said vars.
> 
> We can do that, thank you.

Yeah, ultimately it’s your judgement.

Yuan

