unofficial mirror of emacs-devel@gnu.org 
* Recent updates to tree-sitter branch
@ 2022-09-25  4:27 Yuan Fu
  2022-09-25  6:17 ` Ihor Radchenko
  2022-09-29 10:13 ` Aurélien Aptel
  0 siblings, 2 replies; 14+ messages in thread
From: Yuan Fu @ 2022-09-25  4:27 UTC (permalink / raw)
  To: emacs-devel; +Cc: Theodor Thornhill

Hi,

I’ve recently made some breaking changes and added some goodies to the tree-sitter branch. If anyone has used that branch for … stuff, please have a look at these changes:

1. I reworked the tree-traversal functions, moving them from Lisp to C and changing their names and signatures:

treesit-traverse-depth-first   -> treesit-search-subtree
treesit-traverse-breadth-first ->
treesit-traverse-forward       -> treesit-search-forward
treesit-search-forward         -> treesit-search-forward-goto
treesit-search-beginning/end   -> treesit-search-forward-goto
                               -> treesit-induce-sparse-tree

treesit-induce-sparse-tree is good for quickly extracting a tree that contains only the nodes satisfying some condition. Maybe it should be called treesit-extract-sparse-tree? “Sparse tree” is a term I made up; I’m not aware of an established name for this kind of induced tree.
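
To make the idea of an induced "sparse tree" concrete, here is a toy model in plain Emacs Lisp. It is only a sketch of the concept and does not use the treesit API: a tree is a nested list of the form (LABEL . CHILDREN), and my-sparse-tree is a hypothetical helper. The real function works on parser nodes and is implemented in C, so its return format may well differ.

```elisp
;; Toy model only: a tree is (LABEL . CHILDREN), not a treesit node.
(defun my-sparse-tree (tree pred)
  "Return the sparse trees induced from TREE by PRED.
Nodes whose label satisfies PRED are kept.  Other nodes are dropped
and their surviving descendants are attached to the nearest kept
ancestor.  The result is a list because the root may be dropped."
  (let ((kids (mapcan (lambda (child) (my-sparse-tree child pred))
                      (cdr tree))))
    (if (funcall pred (car tree))
        (list (cons (car tree) kids))
      kids)))

(my-sparse-tree '(module (class (def) (if (def))) (assign))
                (lambda (label) (memq label '(class def))))
;; => ((class (def) (def)))
```

The "if" and "assign" nodes disappear, while the "def" that was nested under "if" moves up to its nearest kept ancestor, "class".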

2. Although treesit-font-lock-settings didn’t change, treesit-font-lock-defaults has been abandoned. You are now supposed to build the queries with treesit-font-lock-rules and assign the result to treesit-font-lock-settings. It is much cleaner than setting treesit-font-lock-settings manually.

Basically:

(setq treesit-font-lock-defaults ...)

|
V

(setq treesit-font-lock-settings
      (treesit-font-lock-rules
       ...))

3. I removed treesit-defun-query and treesit-beginning/end-of-defun.

That’s it! Happy hacking.

Yuan



* Re: Recent updates to tree-sitter branch
  2022-09-25  4:27 Recent updates to tree-sitter branch Yuan Fu
@ 2022-09-25  6:17 ` Ihor Radchenko
  2022-09-26  8:35   ` Yuan Fu
  2022-09-29 10:13 ` Aurélien Aptel
  1 sibling, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2022-09-25  6:17 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill

Yuan Fu <casouri@gmail.com> writes:

> 2. Although treesit-font-lock-settings didn’t change, treesit-font-lock-defaults is abandoned. You are also now supposed to use treesit-font-lock-rules to build the queries and set it to treesit-font-lock-settings. It is much cleaner than setting treesit-font-lock-settings manually.

I am not sure if it has been discussed, but may I ask a few questions
regarding treesit-font-lock-rules.

If my understanding is correct, the font-lock rules are somewhat
equivalent to font-lock-keywords, but much more limited.

font-lock-keywords elements can have a form of

 MATCHER
 (MATCHER . SUBEXP)
 (MATCHER . FACENAME)
 (MATCHER . HIGHLIGHT)
 (MATCHER HIGHLIGHT ...)
 (eval . FORM)

where MATCHER is either a regexp or a function.

treesit-font-lock-rules rules take a form of
(MATCHER FACENAME) or (MATCHER FUNCTION)

where MATCHER can only be a query.

Is there any reason why MATCHER in treesit-font-lock-rules cannot be a
function with access to the fontified node? It will allow more flexible
fontification, when programmatic query can be used to decide the
fontification.

Further, can the OVERRIDE flag of MATCH-HIGHLIGHT, as in
font-lock-keywords, be supported?

   "If OVERRIDE is t, existing fontification can be overwritten. If
    keep, only parts not already fontified are highlighted. If prepend or
    append, existing fontification is merged with the new, in which the new
    or existing fontification, respectively, takes precedence."

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92




* Re: Recent updates to tree-sitter branch
  2022-09-25  6:17 ` Ihor Radchenko
@ 2022-09-26  8:35   ` Yuan Fu
  2022-09-26  9:43     ` Ihor Radchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Yuan Fu @ 2022-09-26  8:35 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-devel, Theodor Thornhill



> On Sep 24, 2022, at 11:17 PM, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>> 2. Although treesit-font-lock-settings didn’t change, treesit-font-lock-defaults is abandoned. You are also now supposed to use treesit-font-lock-rules to build the queries and set it to treesit-font-lock-settings. It is much cleaner than setting treesit-font-lock-settings manually.
> 
> I am not sure if it has been discussed, but may I ask a few questions
> regarding treesit-font-lock-rules.
> 
> If my understanding is correct, the font-lock rules are somewhat
> equivalent to font-lock-keywords, but much more limited.
> 
> font-lock-keywords elements can have a form of
> 
> MATCHER
> (MATCHER . SUBEXP)
> (MATCHER . FACENAME)
> (MATCHER . HIGHLIGHT)
> (MATCHER HIGHLIGHT ...)
> (eval . FORM)
> 
> where MATCHER is either a regexp or a function.
> 
> treesit-font-lock-rules rules take a form of
> (MATCHER FACENAME) or (MATCHER FUNCTION)
> 
> where MATCHER can only be a query.
> 
> Is there any reason why MATCHER in treesit-font-lock-rules cannot be a
> function with access to the fontified node?

Hmm, I’m not sure what you mean. The whole thing passed to treesit-font-lock-rules is a single query, and we can’t really change the query syntax; that’s defined by tree-sitter. Basically, in a query you have patterns paired with capture names: if a pattern matches a node, that node is returned tagged with the corresponding capture name. For font-lock, we just use face names as capture names, and when a query returns captured nodes, we fontify each node with its capture name, i.e. a face (or a function).

> It will allow more flexible
> fontification, when programmatic query can be used to decide the
> fontification.
> 
> Further, can OVERRIDE FLAG of the MATCH-HIGHLIGHT as in
> font-lock-keywords be supported?
> 
>   "If OVERRIDE is t, existing fontification can be overwritten. If
>    keep, only parts not already fontified are highlighted. If prepend or
>    append, existing fontification is merged with the new, in which the new
>    or existing fontification, respectively, takes precedence.”

I can do that, but would it really be useful? Unlike regex font-lock, which is used for so many different things, tree-sitter font-lock is, IMO, only used to apply a base layer of language-specific highlighting. How would one use the override feature in this scenario?

Yuan





* Re: Recent updates to tree-sitter branch
  2022-09-26  8:35   ` Yuan Fu
@ 2022-09-26  9:43     ` Ihor Radchenko
  2022-09-27 22:28       ` Yuan Fu
  0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2022-09-26  9:43 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill

Yuan Fu <casouri@gmail.com> writes:

>> treesit-font-lock-rules rules take a form of
>> (MATCHER FACENAME) or (MATCHER FUNCTION)
>> 
>> where MATCHER can only be a query.
>> 
>> Is there any reason why MATCHER in treesit-font-lock-rules cannot be a
>> function with access to the fontified node?
>
> Hmm, I’m not sure what you mean. The whole thing passed to treesit-font-lock-rules is a single query, and we can’t really change the query syntax; that’s defined by tree-sitter. Basically, in a query you have patterns paired with capture names: if a pattern matches a node, that node is returned tagged with the corresponding capture name. For font-lock, we just use face names as capture names, and when a query returns captured nodes, we fontify each node with its capture name, i.e. a face (or a function).

What I am asking is an extra dynamic condition in addition to the query.
For example:
1. Only apply FACENAME for nodes matching QUERY, but only when Elisp
   variable is non-nil

2. Only apply FACENAME for nodes matching QUERY, which are in the second
   half of the buffer

3. Only apply FACENAME for nodes matching QUERY, which also have a field
   matching a dynamically assigned regexp.

Essentially any condition that is not covered by the QUERY, but can be
checked in Elisp given that node object is passed to the test function.

>> Further, can OVERRIDE FLAG of the MATCH-HIGHLIGHT as in
>> font-lock-keywords be supported?
>> 
>>   "If OVERRIDE is t, existing fontification can be overwritten. If
>>    keep, only parts not already fontified are highlighted. If prepend or
>>    append, existing fontification is merged with the new, in which the new
>>    or existing fontification, respectively, takes precedence.”
>
> I can do that, but would it be really useful? Unlike regex font-lock which is used for so many different things, tree-sitter font-lock is, IMO, only used to apply a base layer of language-specific highlight. How would one use the override feature in this scenario?

For example, consider a function definition with docstring field.
Imagine that you want the function definition to have gray background,
but the docstring to have yellow background. OVERRIDE t is how this is
usually implemented in font-lock-keywords.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92




* Re: Recent updates to tree-sitter branch
  2022-09-26  9:43     ` Ihor Radchenko
@ 2022-09-27 22:28       ` Yuan Fu
  2022-09-29  4:01         ` Ihor Radchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Yuan Fu @ 2022-09-27 22:28 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-devel, Theodor Thornhill



> On Sep 26, 2022, at 2:43 AM, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> treesit-font-lock-rules rules take a form of
>>> (MATCHER FACENAME) or (MATCHER FUNCTION)
>>> 
>>> where MATCHER can only be a query.
>>> 
>>> Is there any reason why MATCHER in treesit-font-lock-rules cannot be a
>>> function with access to the fontified node?
>> 
>> Hmm, I’m not sure what you mean. The whole thing passed to treesit-font-lock-rules is a single query, and we can’t really change the query syntax; that’s defined by tree-sitter. Basically, in a query you have patterns paired with capture names: if a pattern matches a node, that node is returned tagged with the corresponding capture name. For font-lock, we just use face names as capture names, and when a query returns captured nodes, we fontify each node with its capture name, i.e. a face (or a function).
> 
> What I am asking is an extra dynamic condition in addition to the query.
> For example:
> 1. Only apply FACENAME for nodes matching QUERY, but only when Elisp
>   variable is non-nil
> 
> 2. Only apply FACENAME for nodes matching QUERY, which are in the second
>   half of the buffer
> 
> 3. Only apply FACENAME for nodes matching QUERY, which also have a field
>   matching a dynamically assigned regexp.
> 
> Essentially any condition that is not covered by the QUERY, but can be
> checked in Elisp given that node object is passed to the test function.

These can be achieved by using a function, no? You do need to declare global functions for them, but it shouldn’t be a problem. Besides, as I said, the query syntax is not something we can change. The freedom we have is in how we use the capture names. We can’t extend the query with arbitrary Lisp.

> 
>>> Further, can OVERRIDE FLAG of the MATCH-HIGHLIGHT as in
>>> font-lock-keywords be supported?
>>> 
>>>  "If OVERRIDE is t, existing fontification can be overwritten. If
>>>   keep, only parts not already fontified are highlighted. If prepend or
>>>   append, existing fontification is merged with the new, in which the new
>>>   or existing fontification, respectively, takes precedence.”
>> 
>> I can do that, but would it be really useful? Unlike regex font-lock which is used for so many different things, tree-sitter font-lock is, IMO, only used to apply a base layer of language-specific highlight. How would one use the override feature in this scenario?
> 
> For example, consider a function definition with docstring field.
> Imagine that you want the function definition to have gray background,
> but the docstring to have yellow background. OVERRIDE t is how this is
> usually implemented in font-lock-keywords.

The pattern that comes after will override patterns that come before. By the nature of parse trees, for any node A and another smaller node B, B is either completely contained in A or completely outside A. So I think the override relationship is enough.

Yuan



* Re: Recent updates to tree-sitter branch
  2022-09-27 22:28       ` Yuan Fu
@ 2022-09-29  4:01         ` Ihor Radchenko
  2022-09-30 21:03           ` Yuan Fu
  0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2022-09-29  4:01 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill

Yuan Fu <casouri@gmail.com> writes:

>> What I am asking is an extra dynamic condition in addition to the query.
>> For example:
>> 1. Only apply FACENAME for nodes matching QUERY, but only when Elisp
>>   variable is non-nil
>> 
>> 2. Only apply FACENAME for nodes matching QUERY, which are in the second
>>   half of the buffer
>> 
>> 3. Only apply FACENAME for nodes matching QUERY, which also have a field
>>   matching a dynamically assigned regexp.
>> 
>> Essentially any condition that is not covered by the QUERY, but can be
>> checked in Elisp given that node object is passed to the test function.
>
> These can be achieved by using a function, no? You do need to declare global functions for them, but it shouldn’t be a problem. Besides, as I said, the query syntax is not something we can change. The freedom we have is in how we use the capture names. We can’t extend the query with arbitrary Lisp.

Will the currently matched node be passed to the function? Or should the
function run yet another query to determine the node it was called on?

>>>> Further, can OVERRIDE FLAG of the MATCH-HIGHLIGHT as in
>>>> font-lock-keywords be supported?
>>>> 
>>>>  "If OVERRIDE is t, existing fontification can be overwritten. If
>>>>   keep, only parts not already fontified are highlighted. If prepend or
>>>>   append, existing fontification is merged with the new, in which the new
>>>>   or existing fontification, respectively, takes precedence.”
>>> 
>>> I can do that, but would it be really useful? Unlike regex font-lock which is used for so many different things, tree-sitter font-lock is, IMO, only used to apply a base layer of language-specific highlight. How would one use the override feature in this scenario?
>> 
>> For example, consider a function definition with docstring field.
>> Imagine that you want the function definition to have gray background,
>> but the docstring to have yellow background. OVERRIDE t is how this is
>> usually implemented in font-lock-keywords.
>
> The pattern that comes after will override patterns that come before. By the nature of parse trees, for any node A and another smaller node B, B is either completely contained in A or completely outside A. So I think the override relationship is enough.

OVERRIDE can also be 'prepend or 'append to combine faces from multiple
nodes.

Also, OVERRIDE nil will not apply fontification on the already fontified
parts of the region. Note that the parent node might only fontify
a fraction of the text inside the child node. The parts not yet fontified
can make use of OVERRIDE nil.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92




* Re: Recent updates to tree-sitter branch
  2022-09-25  4:27 Recent updates to tree-sitter branch Yuan Fu
  2022-09-25  6:17 ` Ihor Radchenko
@ 2022-09-29 10:13 ` Aurélien Aptel
  1 sibling, 0 replies; 14+ messages in thread
From: Aurélien Aptel @ 2022-09-29 10:13 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill

On Sun, Sep 25, 2022 at 6:29 AM Yuan Fu <casouri@gmail.com> wrote:
> 1. I reworked the tree-traversal functions, moving them from Lisp to C and changing their names and signatures:

Did you see any perf improvements from doing this?




* Re: Recent updates to tree-sitter branch
  2022-09-29  4:01         ` Ihor Radchenko
@ 2022-09-30 21:03           ` Yuan Fu
  2022-10-01  4:20             ` Ihor Radchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Yuan Fu @ 2022-09-30 21:03 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-devel, Theodor Thornhill



> On Sep 28, 2022, at 9:01 PM, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> What I am asking is an extra dynamic condition in addition to the query.
>>> For example:
>>> 1. Only apply FACENAME for nodes matching QUERY, but only when Elisp
>>>  variable is non-nil
>>> 
>>> 2. Only apply FACENAME for nodes matching QUERY, which are in the second
>>>  half of the buffer
>>> 
>>> 3. Only apply FACENAME for nodes matching QUERY, which also have a field
>>>  matching a dynamically assigned regexp.
>>> 
>>> Essentially any condition that is not covered by the QUERY, but can be
>>> checked in Elisp given that node object is passed to the test function.
>> 
>> These can be achieved by using a function, no? You do need to declare global functions for them, but it shouldn’t be a problem. Besides, as I said, the query syntax is not something we can change. The freedom we have is in how we use the capture names. We can’t extend the query with arbitrary Lisp.
> 
> Will the currently matched node be passed to the function? Or should the
> function run yet another query to determine the node it was called on?

The matched node is passed to the function.
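
A minimal sketch of such a capture function follows, covering two of the conditions asked about above (a Lisp variable and "second half of the buffer"). The names my-highlight-enabled, my-second-half-p and my-fontify-identifier are hypothetical. The argument list is an assumption: it follows the convention of released Emacs 29, where the node comes first and further arguments follow, and revisions of the branch may have used a different order.

```elisp
;; Sketch only.  Assumes a tree-sitter enabled Emacs in which capture
;; functions receive the matched node as their first argument.
(defvar my-highlight-enabled t
  "Hypothetical user option checked before applying the face.")

(defun my-second-half-p (pos)
  "Return non-nil if POS lies in the second half of the current buffer."
  (> pos (/ (+ (point-min) (point-max)) 2)))

(defun my-fontify-identifier (node &rest _)
  "Fontify NODE only when the extra Lisp-side conditions hold."
  (let ((start (treesit-node-start node))
        (end (treesit-node-end node)))
    (when (and my-highlight-enabled
               (my-second-half-p start))
      (put-text-property start end 'face 'font-lock-variable-name-face))))
```

In a query, the function would then be used as a capture name in place of a face, for example (identifier) @my-fontify-identifier.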

> 
>>>>> Further, can OVERRIDE FLAG of the MATCH-HIGHLIGHT as in
>>>>> font-lock-keywords be supported?
>>>>> 
>>>>> "If OVERRIDE is t, existing fontification can be overwritten. If
>>>>>  keep, only parts not already fontified are highlighted. If prepend or
>>>>>  append, existing fontification is merged with the new, in which the new
>>>>>  or existing fontification, respectively, takes precedence.”
>>>> 
>>>> I can do that, but would it be really useful? Unlike regex font-lock which is used for so many different things, tree-sitter font-lock is, IMO, only used to apply a base layer of language-specific highlight. How would one use the override feature in this scenario?
>>> 
>>> For example, consider a function definition with docstring field.
>>> Imagine that you want the function definition to have gray background,
>>> but the docstring to have yellow background. OVERRIDE t is how this is
>>> usually implemented in font-lock-keywords.
>> 
>> The pattern that comes after will override patterns that come before. By the nature of parse trees, for any node A and another smaller node B, B is either completely contained in A or completely outside A. So I think the override relationship is enough.
> 
> OVERRIDE can also be 'prepend or 'append to combine faces from multiple
> nodes.

You can’t really prepend or append if the only face format we allow is a symbol.

> Also, OVERRIDE nil will not apply fontification on the already fontified
> parts of the region. Note that the parent node might only fontify
> a fraction of the text inside the child node. The parts not yet fontified
> can make use of OVERRIDE nil.

Ok, I guess it’s good to have options. But I think it is more intuitive and convenient to override by default.

Yuan



* Re: Recent updates to tree-sitter branch
  2022-09-30 21:03           ` Yuan Fu
@ 2022-10-01  4:20             ` Ihor Radchenko
  2022-10-02  3:46               ` Yuan Fu
  0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2022-10-01  4:20 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill

Yuan Fu <casouri@gmail.com> writes:

>> Will the currently matched node be passed to the function? Or should the
>> function run yet another query to determine the node it was called on?
>
> The matched node is passed to the function.

Thanks for the clarification! I missed this detail in the documentation.

>> OVERRIDE can also be 'prepend or 'append to combine faces from multiple
>> nodes.
>
> You can’t really prepend or append if the only face format we allow is a symbol.

Why?
'prepend implies that if there is an existing font-lock-face, the new
face will be prepended to it. Note the 'face text property may contain a
list of faces:

    ‘face’
         The ‘face’ property controls the appearance of the character (*note
         Faces::).  The value of the property can be the following:
    
     ...
             • A list of faces.  Each list element should be either a face
              name or an anonymous face.  This specifies a face which is an
              aggregate of the attributes of each of the listed faces.
              Faces occurring earlier in the list have higher priority.
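
As a small illustration of the face-list behaviour quoted above, the sketch below uses the existing font-lock helpers font-lock-prepend-text-property and font-lock-append-text-property in a temporary buffer. The wrapper my-merged-face is a hypothetical name used only for this demonstration; it is not part of font-lock or treesit.

```elisp
(require 'font-lock)

(defun my-merged-face (how)
  "Return the face list after merging a new face into an existing one.
HOW is `prepend' or `append'."
  (with-temp-buffer
    (insert "docstring")
    ;; An earlier rule has already put a single face on the text.
    (put-text-property (point-min) (point-max) 'face 'font-lock-string-face)
    ;; Merge a second face in front of or behind the existing one.
    (funcall (if (eq how 'prepend)
                 #'font-lock-prepend-text-property
               #'font-lock-append-text-property)
             (point-min) (point-max) 'face 'font-lock-doc-face)
    (get-text-property (point-min) 'face)))

(my-merged-face 'prepend) ;; => (font-lock-doc-face font-lock-string-face)
(my-merged-face 'append)  ;; => (font-lock-string-face font-lock-doc-face)
```

Because faces earlier in the list have higher priority, prepending lets the new face win wherever both faces set the same attribute, while appending lets the existing face win.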

>> Also, OVERRIDE nil will not apply fontification on the already fontified
>> parts of the region. Note that the parent node might only fontify
>> a fraction of the text inside the child node. The parts not yet fontified
>> can make use of OVERRIDE nil.
>
> Ok, I guess it’s good to have options. But I think it is more intuitive and convenient to override by default.

I disagree. The current default in font-lock-keywords is not to
override. If programmatic font-lock behaves differently, it will be
confusing.

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92




* Re: Recent updates to tree-sitter branch
  2022-10-01  4:20             ` Ihor Radchenko
@ 2022-10-02  3:46               ` Yuan Fu
  2022-10-02  7:33                 ` Ihor Radchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Yuan Fu @ 2022-10-02  3:46 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-devel, Theodor Thornhill



> On Sep 30, 2022, at 9:20 PM, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> Will the currently matched node be passed to the function? Or should the
>>> function run yet another query to determine the node it was called on?
>> 
>> The matched node is passed to the function.
> 
> Thanks for the clarification! I missed this detail in the documentation.
> 
>>> OVERRIDE can also be 'prepend or 'append to combine faces from multiple
>>> nodes.
>> 
>> You can’t really prepend or append if the only face format we allow is a symbol.
> 
> Why?
> 'prepend implies that if there is an existing font-lock-face, the new
> face will be prepended to it. Note the 'face text property may contain a
> list of faces:
> 
>    ‘face’
>         The ‘face’ property controls the appearance of the character (*note
>         Faces::).  The value of the property can be the following:
> 
>     ...
>             • A list of faces.  Each list element should be either a face
>              name or an anonymous face.  This specifies a face which is an
>              aggregate of the attributes of each of the listed faces.
>              Faces occurring earlier in the list have higher priority.

I see, yeah you are right.

>>> Also, OVERRIDE nil will not apply fontification on the already fontified
>>> parts of the region. Note that the parent node might only fontify
>>> a fraction of the text inside the child node. The parts not yet fontified
>>> can make use of OVERRIDE nil.
>> 
>> Ok, I guess it’s good to have options. But I think it is more intuitive and convenient to override by default.
> 
> I disagree. The current default in font-lock-keywords is not to
> override. If programmatic font-lock behaves differently, it will be
> confusing.

I think tree-sitter queries are different enough from font-lock keywords that this will not cause confusion. Furthermore, defaulting to override should make things easier, especially for delicate things like string interpolation or other nested constructs, where tree-sitter shines. Without override (the font-lock default), if the to-be-fontified region has any existing face, the whole fontification is given up instead of filling in the new fontification. That would IMO be confusing because users would think the match failed.

Also bear in mind that the override flag can only be applied to the whole query, rather than individual captured nodes.
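
The difference between the override values discussed here can be sketched with a toy model in plain Emacs Lisp. It assumes the font-lock-keywords semantics described in this thread (nil gives up if any part of the region already has a face, keep fills only the gaps, t overwrites). The names my-apply-face and my-override-demo are hypothetical, and this is not the tree-sitter implementation.

```elisp
(require 'font-lock)

(defun my-apply-face (start end face override)
  "Toy model of the OVERRIDE flag; handles only nil, t and `keep'."
  (cond
   ;; t: overwrite whatever is there.
   ((eq override t)
    (put-text-property start end 'face face))
   ;; keep: fontify only the parts that have no face yet.
   ((eq override 'keep)
    (font-lock-fillin-text-property start end 'face face))
   ;; nil: do nothing at all if any part already has a face.
   ((null override)
    (unless (text-property-not-all start end 'face nil)
      (put-text-property start end 'face face)))))

(defun my-override-demo (override)
  "Return the faces at positions 1 and 5 after applying OVERRIDE."
  (with-temp-buffer
    (insert "abcdef")
    (put-text-property 1 4 'face 'bold)   ; "abc" is already fontified
    (my-apply-face 1 7 'italic override)  ; the new match covers "abcdef"
    (list (get-text-property 1 'face) (get-text-property 5 'face))))

(my-override-demo nil)   ;; => (bold nil)
(my-override-demo t)     ;; => (italic italic)
(my-override-demo 'keep) ;; => (bold italic)
```

With nil, the unfontified "def" part stays unfontified even though the match covered it, which is the behaviour described above as potentially looking like a failed match.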

Yuan






* Re: Recent updates to tree-sitter branch
  2022-10-02  3:46               ` Yuan Fu
@ 2022-10-02  7:33                 ` Ihor Radchenko
  2022-10-02 22:54                   ` Yuan Fu
  0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2022-10-02  7:33 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill

Yuan Fu <casouri@gmail.com> writes:

>> I disagree. The current default in font-lock-keywords is not to
>> override. If programmatic font-lock behaves differently, it will be
>> confusing.
>
> I think tree-sitter queries are different enough from font-lock keywords that this will not cause confusion. Furthermore, defaulting to override should make things easier, especially for delicate things like string interpolation or other nested constructs, where tree-sitter shines. Without override (the font-lock default), if the to-be-fontified region has any existing face, the whole fontification is given up instead of filling in the new fontification. That would IMO be confusing because users would think the match failed.

I do agree that it may be confusing. Yet, it is how the default
fontification works. I do not think that tree-sitter matching is
conceptually different compared to regexp matching. (And this particular
area is not even limited to tree-sitter, AFAIU).

I do not insist on my idea being actually used, but wanted to leave a
data point to be considered.

> Also bear in mind that the override flag can only be applied to the whole query, rather than individual captured nodes.

How does it change anything? I may be misunderstanding something---can
you provide some illustrative example clarifying whole query vs.
individual nodes?

-- 
Ihor Radchenko,
Org mode contributor,
Learn more about Org mode at https://orgmode.org/.
Support Org development at https://liberapay.com/org-mode,
or support my work at https://liberapay.com/yantar92




* Re: Recent updates to tree-sitter branch
  2022-10-02  7:33                 ` Ihor Radchenko
@ 2022-10-02 22:54                   ` Yuan Fu
  2022-10-03  5:58                     ` Ihor Radchenko
  0 siblings, 1 reply; 14+ messages in thread
From: Yuan Fu @ 2022-10-02 22:54 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-devel, Theodor Thornhill, Stefan Monnier



> On Oct 2, 2022, at 12:33 AM, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>> I disagree. The current default in font-lock-keywords is not to
>>> override. If programmatic font-lock behaves differently, it will be
>>> confusing.
>> 
>> I think tree-sitter queries are different enough from font-lock keywords that this will not cause confusion. Furthermore, defaulting to override should make things easier, especially for delicate things like string interpolation or other nested constructs, where tree-sitter shines. Without override (the font-lock default), if the to-be-fontified region has any existing face, the whole fontification is given up instead of filling in the new fontification. That would IMO be confusing because users would think the match failed.
> 
> I do agree that it may be confusing. Yet, it is how the default
> fontification works. I do not think that tree-sitter matching is
> conceptually different compared to regexp matching. (And this particular
> area is not even limited to tree-sitter, AFAIU).

It’s not conceptually different from regex matching, but the tooling is different enough that it can afford to differ in some aspects without creating confusion. But I understand that this is purely subjective.

I cc’ed Stefan. Maybe he has a more educated opinion about this. (In other words: could you make a decision please, because I couldn’t.)

> 
> I do not insist on my idea being actually used, but wanted to leave a
> data point to be considered.

Thanks. I don’t think my opinion is definitely better, either. Both sides seem reasonable to me. I hope that through discussion we can explore the topic thoroughly.

> 
>> Also bear in mind that the override flag can only be applied to the whole query, rather than individual captured nodes.
> 
> How does it change anything? I may be misunderstanding something---can
> you provide some illustrative example clarifying whole query vs.
> individual nodes?

What I meant is that, for font-lock-keywords, one can set override flag for each individual match:

(string-regex font-lock-string-face t)
(function-name-regexp font-lock-function-name-face nil)
(class-name-regexp font-lock-type-face t)
...

But for tree-sitter, a query contains many matches and the flag is set for the whole query. So if I want to use different override flags for different matches, I need to split them into two queries:

(treesit-font-lock-rules
 :language 'python
 :override 'append
 '((string) @python--treesit-fontify-string
   ((string) @font-lock-doc-face
    (:match "^\"\"\"" @font-lock-doc-face))
   (interpolation (identifier) @font-lock-variable-name-face))

 :language 'python
 :override nil
 '((function_definition
    name: (identifier) @font-lock-function-name-face)

   (class_definition
    name: (identifier) @font-lock-type-face)

   ;; Comment and string.
   (comment) @font-lock-comment-face))


That means if we use override=nil as default, it is very likely that users need to explicitly set override to t for the whole query, or split the query into separate parts. Nothing serious, but it seems less convenient.

A real use-case for override is how I fontified Python strings above. I have three matches for (1) all strings (2) docstrings (3) variable names in string interpolations. IMO it’s intuitive and convenient for later more specific matches to override earlier more general matches.

Yuan






* Re: Recent updates to tree-sitter branch
  2022-10-02 22:54                   ` Yuan Fu
@ 2022-10-03  5:58                     ` Ihor Radchenko
  2022-10-04 16:58                       ` Yuan Fu
  0 siblings, 1 reply; 14+ messages in thread
From: Ihor Radchenko @ 2022-10-03  5:58 UTC (permalink / raw)
  To: Yuan Fu; +Cc: emacs-devel, Theodor Thornhill, Stefan Monnier

Yuan Fu <casouri@gmail.com> writes:

>>> Also bear in mind that the override flag can only be applied to the whole query, rather than individual captured nodes.
>> 
>> How does it change anything? I may be misunderstanding something---can
>> you provide some illustrative example clarifying whole query vs.
>> individual nodes?
>
> What I meant is that, for font-lock-keywords, one can set override flag for each individual match:
>
> (string-regex font-lock-string-face t)
> (function-name-regexp font-lock-function-name-face nil)
> (class-name-regexp font-lock-type-face t)
> ...
>
> But for tree-sitter, a query contains many matches and the flag is set for the whole query. So if I want to use different override flags for different matches, I need to split them into two queries:
>
> (treesit-font-lock-rules
>  :language 'python
>  :override 'append
>  '((string) @python--treesit-fontify-string
>    ((string) @font-lock-doc-face
>     (:match "^\"\"\"" @font-lock-doc-face))
>    (interpolation (identifier) @font-lock-variable-name-face))
>
>  :language 'python
>  :override nil
>  '((function_definition
>     name: (identifier) @font-lock-function-name-face)
>
>    (class_definition
>     name: (identifier) @font-lock-type-face)
>
>    ;; Comment and string.
>    (comment) @font-lock-comment-face))

> That means if we use override=nil as default, it is very likely that users need to explicitly set override to t for the whole query, or split the query into separate parts. Nothing serious, but it seems less convenient.

What about allowing (@python--treesit-fontify-string 'append) to specify
the override?

> A real use-case for override is how I fontified Python strings above. I have three matches for (1) all strings (2) docstrings (3) variable names in string interpolations. IMO it’s intuitive and convenient for later more specific matches to override earlier more general matches.

The current convention in font-lock-keywords is exactly the opposite:
earlier matches are more specific, and they are not replaced by later,
more general matches.
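
To illustrate that convention, here is a minimal sketch using plain
font-lock-keywords. The variable name and both regexps are hypothetical
and chosen only for this example. The specific rule comes first, and the
general rule that follows uses the `keep' override value, so it only
fills in text that has no face yet:

```elisp
;; Hypothetical keyword list, for illustration only.
(defvar my-demo-font-lock-keywords
  '(;; Specific rule first: highlight the word TODO.
    ("\\_<TODO\\_>" 0 'font-lock-warning-face)
    ;; General rule second.  OVERRIDE is `keep', so only text without
    ;; a face is fontified and the TODO face applied above survives.
    ("#.*$" 0 'font-lock-comment-face keep)
    ;; With OVERRIDE t, the general rule would instead replace the
    ;; earlier TODO face:
    ;; ("#.*$" 0 'font-lock-comment-face t)
    )
  "Demo keywords: specific match first, general match second.")
```

Something like (font-lock-add-keywords nil my-demo-font-lock-keywords)
in a test buffer should be enough to try it. Note that with OVERRIDE
omitted (nil), font-lock skips a highlight entirely when any part of the
matched text already has a face, which is why `keep' is used here.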

Also, for reference, I am currently developing parser-based
fontification for Org.

I am using a somewhat different approach (closer to font-lock-keywords):

((drawer property-drawer) ;; <- match node types
  (:begin-marker 'org-drawer t) ;; <- apply fontification to :begin-marker field inside 
  (:end-marker 'org-drawer t)) ;;  <- ...                    :end-marker ....
((headline inlinetask)
  (:title-line
   (if (org-element-match-property :archivedp) ;; <- Elisp matching of the node properties
       'org-archived
     (pcase (org-element-match-property :todo-type) ;; <- ....
       (`todo (when org-fontify-todo-headline 'org-headline-todo))
       (`done (when org-fontify-done-headline 'org-headline-done))
       (_ nil)))
   t)) ;; <- override
((bold italic underline verbatim code strike-through)
  (:full-no-blank '(face nil org-emphasis t))) ;; <- fontify contents of the matched node
   
Also, see https://github.com/yantar92/org/blob/feature/org-font-lock-element/lisp/org-font-lock.el#L574

From my experience re-implementing the vanilla fontification,
fontification order is important and may create subtle issues when not
designed carefully.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




* Re: Recent updates to tree-sitter branch
  2022-10-03  5:58                     ` Ihor Radchenko
@ 2022-10-04 16:58                       ` Yuan Fu
  0 siblings, 0 replies; 14+ messages in thread
From: Yuan Fu @ 2022-10-04 16:58 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: emacs-devel, Theodor Thornhill, Stefan Monnier



> On Oct 2, 2022, at 10:58 PM, Ihor Radchenko <yantar92@gmail.com> wrote:
> 
> Yuan Fu <casouri@gmail.com> writes:
> 
>>>> Also bear in mind that the override flag can only be applied to the whole query, rather than individual captured nodes.
>>> 
>>> How does it change anything? I may be misunderstanding something---can
>>> you provide some illustrative example clarifying whole query vs.
>>> individual nodes?
>> 
>> What I meant is that, for font-lock-keywords, one can set override flag for each individual match:
>> 
>> (string-regex font-lock-string-face t)
>> (function-name-regexp font-lock-function-name-face nil)
>> (class-name-regexp font-lock-type-face t)
>> ...
>> 
>> But for tree-sitter, a query contains many matches and the flag is set for the query. So if I want to use different override flag for different matches, I need to split them into two queries:
>> 
>> (treesit-font-lock-rules
>> :language 'python
>> :override 'append
>> '((string) @python--treesit-fontify-string
>>   ((string) @font-lock-doc-face
>>    (:match "^\"\"\"" @font-lock-doc-face))
>>   (interpolation (identifier) @font-lock-variable-name-face))
>> 
>> :language 'python
>> :override nil
>> '((function_definition
>>    name: (identifier) @font-lock-function-name-face)
>> 
>>   (class_definition
>>    name: (identifier) @font-lock-type-face)
>> 
>>   ;; Comment and string.
>>   (comment) @font-lock-comment-face))
> 
>> That means if we use override=nil as default, it is very likely that users need to explicitly set override to t for the whole query, or split the query into separate parts. Nothing serious, but it seems less convenient.
> 
> What about allowing (@python--treesit-fontify-string 'append) to specify
> the override?

That’s not impossible but I don’t think it’s worth it.

> 
>> A real use-case for override is how I fontified Python strings above. I have three matches for (1) all strings (2) docstrings (3) variable names in string interpolations. IMO it’s intuitive and convenient for later more specific matches to override earlier more general matches.
> 
> The current convention in font-lock-keywords is exactly the opposite:
> earlier matches are more specific, and they are not replaced by later,
> more general matches.
> 
> Also, for reference, I am currently developing parser-based
> fontification for Org.
> 
> I am using a somewhat different approach (closer to font-lock-keywords):
> 
> ((drawer property-drawer) ;; <- match node types
>  (:begin-marker 'org-drawer t) ;; <- apply fontification to :begin-marker field inside 
>  (:end-marker 'org-drawer t)) ;;  <- ...                    :end-marker ....
> ((headline inlinetask)
>  (:title-line
>   (if (org-element-match-property :archivedp) ;; <- Elisp matching of the node properties
>       'org-archived
>     (pcase (org-element-match-property :todo-type) ;; <- ....
>       (`todo (when org-fontify-todo-headline 'org-headline-todo))
>       (`done (when org-fontify-done-headline 'org-headline-done))
>       (_ nil)))
>   t)) ;; <- override
> ((bold italic underline verbatim code strike-through)
>   (:full-no-blank '(face nil org-emphasis t))) ;; <- fontify contents of the matched node
> 
> Also, see https://github.com/yantar92/org/blob/feature/org-font-lock-element/lisp/org-font-lock.el#L574
> 
> From my experience re-implementing the vanilla fontification,
> fontification order is important and may create subtle issues when not
> designed carefully.

Thinking more about it, it is not a tall order to ask those who want override by default to add :override t. That way we keep font-lock and tree-sitter consistent. I just need to document it clearly so no one misses it.
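
For illustration, a minimal sketch of what opting in could look like, assuming the treesit-font-lock-rules keyword syntax shown earlier in this thread. The query and faces are only examples, and the API on the branch may still change (later versions may, for instance, expect additional keywords such as :feature for each query):

```elisp
;; Sketch only: assumes the :language / :override keywords as used
;; earlier in this thread; not a definitive recipe.
(setq-local treesit-font-lock-settings
            (treesit-font-lock-rules
             :language 'python
             ;; Opt in explicitly: later matches in this query may
             ;; replace faces applied by earlier ones.
             :override t
             '((string) @font-lock-string-face
               ((string) @font-lock-doc-face
                (:match "^\"\"\"" @font-lock-doc-face)))))
```

Queries that do not specify :override would then behave like font-lock-keywords entries without an override flag, that is, they leave existing fontification alone.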

Yuan





end of thread, other threads:[~2022-10-04 16:58 UTC | newest]

Thread overview: 14+ messages
2022-09-25  4:27 Recent updates to tree-sitter branch Yuan Fu
2022-09-25  6:17 ` Ihor Radchenko
2022-09-26  8:35   ` Yuan Fu
2022-09-26  9:43     ` Ihor Radchenko
2022-09-27 22:28       ` Yuan Fu
2022-09-29  4:01         ` Ihor Radchenko
2022-09-30 21:03           ` Yuan Fu
2022-10-01  4:20             ` Ihor Radchenko
2022-10-02  3:46               ` Yuan Fu
2022-10-02  7:33                 ` Ihor Radchenko
2022-10-02 22:54                   ` Yuan Fu
2022-10-03  5:58                     ` Ihor Radchenko
2022-10-04 16:58                       ` Yuan Fu
2022-09-29 10:13 ` Aurélien Aptel
