* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 19:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-14 20:02 ` Eli Zaretskii
2023-02-14 20:21 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-14 20:59 ` Dmitry Gutov
2023-02-14 23:57 ` Dmitry Gutov
2 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-14 20:02 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: 61502@debbugs.gnu.org
> Date: Tue, 14 Feb 2023 20:41:04 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Keep typing whatever code you wan "int main" to include, and it will
> > auto-indent soon enough.
>
> Yeah, but.
My point is that what we are used to from CC mode does not necessarily
have to work the same way with tree-sitter based modes. As long as
the indentation fixes itself soon enough, we are still fine, I think.
> int
> main
> {
> for (;;)
> {|
> }
> ```
>
> If you press RET if point at | you'll see we indent immediately, even
> though there is no closing bracket. This is because of how
> treesit-indent defaults to treesit-node-on when there is no node at
> point. So in the example without the for loop the parent is then set to
> whatever treesit-node-on returns, which in this case is the root
> node. That means that the rule for translation_unit is selected, which
> is:
>
> `(((parent-is "translation_unit") point-min 0)
>
> However, what's interesting here is that treesit-indent selects an
> "unexisting" node as the "smallest-node". Specifically that is:
>
> #<treesit-node "}" in 13-13>
>
> This node in turn will return "compound_statement" if you look for its
> parent. It seems some parsers detects these nodes, so maybe we should
> add some handling for that? Some "block-closers" code in
> treesit-node-on, so that treesit-node-on doesn't default to the root
> node, but rather the compound_statement?
AFAIU, you are talking about hitting RET in the following situation
(where "|" stands for point):
int main ()
{|
}
However, the OP presented a slightly different situation:
int main ()
{|
That is, without the closing brace. In that case, there's no "}" in
the source. Are you saying that the tree-sitter's parser "invents"
such a node?
And why does treesit-indent select that "unexisting" node in the first
place?
> I'm not sure this explanation was easy to follow at all, but I'll add a
> hack in a diff to make the point hopefully a little clearer.
>
> What do you think?
How well did you test that? Does it fix similar problems with struct
definition at top-level? Are there any regressions elsewhere in the
indentation?
There are also other similar cases, but with code on deeper levels.
Try this, for example (where "|" again stands for point):
int
main
{
for (;;)|
}
Now press RET and observe the result:
int
main
{
for (;;)
|
}
instead of the expected
int
main
{
for (;;)
|
}
Why?
(Of course, as soon as you type ";", the code is automatically
reindented to yield the correct indentation. Which was my point.)
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 20:02 ` Eli Zaretskii
@ 2023-02-14 20:21 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 12:24 ` Eli Zaretskii
0 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-14 20:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: 61502@debbugs.gnu.org
>> Date: Tue, 14 Feb 2023 20:41:04 +0100
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > Keep typing whatever code you wan "int main" to include, and it will
>> > auto-indent soon enough.
>>
>> Yeah, but.
>
> My point is that what we are used to from CC mode does not necessarily
> have to work the same way with tree-sitter based modes. As long as
> the indentation fixes itself soon enough, we are still fine, I think.
>
>> int
>> main
>> {
>> for (;;)
>> {|
>> }
>> ```
>>
>> If you press RET if point at | you'll see we indent immediately, even
>> though there is no closing bracket. This is because of how
>> treesit-indent defaults to treesit-node-on when there is no node at
>> point. So in the example without the for loop the parent is then set to
>> whatever treesit-node-on returns, which in this case is the root
>> node. That means that the rule for translation_unit is selected, which
>> is:
>>
>> `(((parent-is "translation_unit") point-min 0)
>>
>> However, what's interesting here is that treesit-indent selects an
>> "unexisting" node as the "smallest-node". Specifically that is:
>>
>> #<treesit-node "}" in 13-13>
>>
>> This node in turn will return "compound_statement" if you look for its
>> parent. It seems some parsers detects these nodes, so maybe we should
>> add some handling for that? Some "block-closers" code in
>> treesit-node-on, so that treesit-node-on doesn't default to the root
>> node, but rather the compound_statement?
>
> AFAIU, you are talking about hitting RET in the following situation
> (where "|" stands for point):
>
> int main ()
> {|
> }
>
> However, the OP presented a slightly different situation:
>
> int main ()
> {|
>
> That is, without the closing brace. In that case, there's no "}" in
> the source. Are you saying that the tree-sitter's parser "invents"
> such a node?
That's correct. In tree-sitter-c at least that's the case.
>
> And why does treesit-indent select that "unexisting" node in the first
> place?
>
This code:
(smallest-node
(cond ((null (treesit-parser-list)) nil)
((eq 1 (length (treesit-parser-list)))
(treesit-node-at bol))
((treesit-language-at (point))
(treesit-node-at bol (treesit-language-at (point))))
(t (treesit-node-at bol))))
treesit-node-at selects the "invented" node.
>> I'm not sure this explanation was easy to follow at all, but I'll add a
>> hack in a diff to make the point hopefully a little clearer.
>>
>> What do you think?
>
> How well did you test that?
Not well at all. I just created that hack to make the example a little
clearer. I think the change probably should go into treesit-node-on.
> Does it fix similar problems with struct
> definition at top-level? Are there any regressions elsewhere in the
> indentation?
Not that I found, but I'll experiment some more.
>
> There are also other similar cases, but with code on deeper levels.
> Try this, for example (where "|" again stands for point):
>
> int
> main
> {
> for (;;)|
> }
>
> Now press RET and observe the result:
>
> int
> main
> {
> for (;;)
> |
> }
>
> instead of the expected
>
> int
> main
> {
> for (;;)
> |
> }
>
> Why?
If I'm not mistaken the same "problem". Treesit-node-on selects the
surrounding compound_statement, so it only indents one step from column 0.
>
> (Of course, as soon as you type ";", the code is automatically
> reindented to yield the correct indentation. Which was my point.)
Yeah, but consider the same example of yours without the closing brace:
```
int
main
{
for (;;)|
```
Now type RET
```
int
main
{
for (;;)
|
```
Now type {
```
int
main
{
for (;;)
{|
```
Now type RET
```
int
main
{
for (;;)
{
|
```
Which is what I consider a little confusing. We get different
indentation with and without the closed scope.
Theo
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 20:21 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 12:24 ` Eli Zaretskii
2023-02-15 12:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 12:24 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
> Date: Tue, 14 Feb 2023 21:21:33 +0100
>
> int
> main
> {
> for (;;)|
> ```
>
> Now type RET
>
> ```
> int
> main
> {
> for (;;)
> |
> ```
>
> Now type {
>
> ```
> int
> main
> {
> for (;;)
> {|
> ```
>
> Now type RET
>
> ```
> int
> main
> {
> for (;;)
> {
> |
> ```
>
> Which is what I consider a little confusing. We get different
> indentation with and without the closed scope.
Any idea why it doesn't choose the 'for' node? That's what a naïve
user would expect, I think.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 12:24 ` Eli Zaretskii
@ 2023-02-15 12:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 13:35 ` Dmitry Gutov
2023-02-15 14:03 ` Eli Zaretskii
0 siblings, 2 replies; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 12:41 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>> Date: Tue, 14 Feb 2023 21:21:33 +0100
>>
>> int
>> main
>> {
>> for (;;)|
>> ```
>>
>> Now type RET
>>
>> ```
>> int
>> main
>> {
>> for (;;)
>> |
>> ```
>>
>> Now type {
>>
>> ```
>> int
>> main
>> {
>> for (;;)
>> {|
>> ```
>>
>> Now type RET
>>
>> ```
>> int
>> main
>> {
>> for (;;)
>> {
>> |
>> ```
>>
>> Which is what I consider a little confusing. We get different
>> indentation with and without the closed scope.
>
> Any idea why it doesn't choose the 'for' node? That's what a naïve
> user would expect, I think.
That's what any user should expect, IMO. It's due to
treesit-node-on. See its docstring:
"Return the smallest node covering BEG to END.
BEWARE! Calling this function on an empty line that is not
inside any top-level construct (function definition, etc.) most
probably will give you the root node, because the root node is
the smallest node that covers that empty line. You probably want
to use `treesit-node-at' instead.
Return nil if none was found. If NAMED is non-nil, only look for
named node.
If PARSER-OR-LANG is a parser, use that parser; if PARSER-OR-LANG
is a language, find the first parser for that language in the
current buffer, or create one if none exists; If PARSER-OR-LANG
is nil, try to guess the language at BEG using `treesit-language-at'."
Rather selecting the first node "above" point sounds more reasonable?
Theo
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 12:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 13:35 ` Dmitry Gutov
2023-02-15 14:03 ` Eli Zaretskii
1 sibling, 0 replies; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-15 13:35 UTC (permalink / raw)
To: Theodor Thornhill, Eli Zaretskii; +Cc: 61502, casouri, pankaj
On 15/02/2023 14:41, Theodor Thornhill via Bug reports for GNU Emacs,
the Swiss army knife of text editors wrote:
> Rather selecting the first node "above" point sounds more reasonable?
treesit-node-on is the "precise" finding function. treesit-node-at is
the dwim-y one.
I think having treesit-node-on return a node that doesn't actually cover
BEG to END will be very counter-intuitive.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 12:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 13:35 ` Dmitry Gutov
@ 2023-02-15 14:03 ` Eli Zaretskii
2023-02-15 14:21 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 14:03 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
> Date: Wed, 15 Feb 2023 13:41:47 +0100
>
> "Return the smallest node covering BEG to END.
>
> BEWARE! Calling this function on an empty line that is not
> inside any top-level construct (function definition, etc.) most
> probably will give you the root node, because the root node is
> the smallest node that covers that empty line. You probably want
> to use `treesit-node-at' instead.
>
> Return nil if none was found. If NAMED is non-nil, only look for
> named node.
>
> If PARSER-OR-LANG is a parser, use that parser; if PARSER-OR-LANG
> is a language, find the first parser for that language in the
> current buffer, or create one if none exists; If PARSER-OR-LANG
> is nil, try to guess the language at BEG using `treesit-language-at'."
>
>
> Rather selecting the first node "above" point sounds more reasonable?
What happens if you indeed call treesit-node-at, as the doc string
suggests?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 14:03 ` Eli Zaretskii
@ 2023-02-15 14:21 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 14:27 ` Eli Zaretskii
0 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 14:21 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>> Date: Wed, 15 Feb 2023 13:41:47 +0100
>>
>> "Return the smallest node covering BEG to END.
>>
>> BEWARE! Calling this function on an empty line that is not
>> inside any top-level construct (function definition, etc.) most
>> probably will give you the root node, because the root node is
>> the smallest node that covers that empty line. You probably want
>> to use `treesit-node-at' instead.
>>
>> Return nil if none was found. If NAMED is non-nil, only look for
>> named node.
>>
>> If PARSER-OR-LANG is a parser, use that parser; if PARSER-OR-LANG
>> is a language, find the first parser for that language in the
>> current buffer, or create one if none exists; If PARSER-OR-LANG
>> is nil, try to guess the language at BEG using `treesit-language-at'."
>>
>>
>> Rather selecting the first node "above" point sounds more reasonable?
>
> What happens if you indeed call treesit-node-at, as the doc string
> suggests?
int
main
{
for (;;)
|
eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
Theo
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 14:21 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 14:27 ` Eli Zaretskii
2023-02-15 14:53 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 14:27 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
> Date: Wed, 15 Feb 2023 15:21:11 +0100
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > What happens if you indeed call treesit-node-at, as the doc string
> > suggests?
>
>
> int
> main
> {
> for (;;)
> |
>
> eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
I'm afraid I cannot interpret that. What does it mean?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 14:27 ` Eli Zaretskii
@ 2023-02-15 14:53 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 15:02 ` Eli Zaretskii
0 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 14:53 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
Eli Zaretskii <eliz@gnu.org> writes:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>> Date: Wed, 15 Feb 2023 15:21:11 +0100
>>
>> Eli Zaretskii <eliz@gnu.org> writes:
>>
>> > What happens if you indeed call treesit-node-at, as the doc string
>> > suggests?
>>
>>
>> int
>> main
>> {
>> for (;;)
>> |
>>
>> eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
>
> I'm afraid I cannot interpret that. What does it mean?
It returns the closing paren in "for (;;)", right before point. Which
may not be as useful, as it is a child of for_statement, IIRC. Making a
rule for that isn't too hard, but it complicates things.
Theo
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 14:53 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 15:02 ` Eli Zaretskii
2023-02-15 15:48 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 15:02 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> From: Theodor Thornhill <theo@thornhill.no>
> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
> Date: Wed, 15 Feb 2023 15:53:22 +0100
>
> >> int
> >> main
> >> {
> >> for (;;)
> >> |
> >>
> >> eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
> >
> > I'm afraid I cannot interpret that. What does it mean?
>
> It returns the closing paren in "for (;;)", right before point. Which
> may not be as useful, as it is a child of for_statement, IIRC. Making a
> rule for that isn't too hard, but it complicates things.
Hmm... this might make no sense, but: why are we asking about the node
at point? For indentation purposes, when RET is pressed, shouldn't we
ask about the node of the first non-whitespace character of the line
where we get RET?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 15:02 ` Eli Zaretskii
@ 2023-02-15 15:48 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 15:57 ` Dmitry Gutov
2023-02-15 17:09 ` Eli Zaretskii
0 siblings, 2 replies; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 15:48 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
On 15 February 2023 16:02:03 CET, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>> Date: Wed, 15 Feb 2023 15:53:22 +0100
>>
>> >> int
>> >> main
>> >> {
>> >> for (;;)
>> >> |
>> >>
>> >> eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
>> >
>> > I'm afraid I cannot interpret that. What does it mean?
>>
>> It returns the closing paren in "for (;;)", right before point. Which
>> may not be as useful, as it is a child of for_statement, IIRC. Making a
>> rule for that isn't too hard, but it complicates things.
>
>Hmm... this might make no sense, but: why are we asking about the node
>at point? For indentation purposes, when RET is pressed, shouldn't we
>ask about the node of the first non-whitespace character of the line
>where we get RET?
Yeah, but what then to do when there is only whitespace?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 15:48 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 15:57 ` Dmitry Gutov
2023-02-15 17:11 ` Eli Zaretskii
2023-02-15 17:09 ` Eli Zaretskii
1 sibling, 1 reply; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-15 15:57 UTC (permalink / raw)
To: Theodor Thornhill, Eli Zaretskii; +Cc: 61502, casouri, pankaj
On 15/02/2023 17:48, Theodor Thornhill via Bug reports for GNU Emacs,
the Swiss army knife of text editors wrote:
> For indentation purposes, when RET is pressed, shouldn't we
> ask about the node of the first non-whitespace character of the line
> where we get RET?
RET usually indents two lines now: the current one and the next.
The issue is what to do about indentation on the next (opened) line.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 15:57 ` Dmitry Gutov
@ 2023-02-15 17:11 ` Eli Zaretskii
2023-02-15 17:57 ` Dmitry Gutov
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 17:11 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 61502, casouri, theo, pankaj
> Date: Wed, 15 Feb 2023 17:57:51 +0200
> Cc: 61502@debbugs.gnu.org, casouri@gmail.com, pankaj@codeisgreat.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 15/02/2023 17:48, Theodor Thornhill via Bug reports for GNU Emacs,
> the Swiss army knife of text editors wrote:
> > For indentation purposes, when RET is pressed, shouldn't we
> > ask about the node of the first non-whitespace character of the line
> > where we get RET?
>
> RET usually indents two lines now: the current one and the next.
>
> The issue is what to do about indentation on the next (opened) line.
Yes, I tried to suggest that for the indentation of the next line we
find the node that corresponds to the beginning of the current line.
That assumes that we already know how to indent a line after "for" (or
"if" or "while" or...).
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 17:11 ` Eli Zaretskii
@ 2023-02-15 17:57 ` Dmitry Gutov
2023-02-15 18:11 ` Eli Zaretskii
0 siblings, 1 reply; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-15 17:57 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, theo, pankaj
On 15/02/2023 19:11, Eli Zaretskii wrote:
>> Date: Wed, 15 Feb 2023 17:57:51 +0200
>> Cc:61502@debbugs.gnu.org,casouri@gmail.com,pankaj@codeisgreat.org
>> From: Dmitry Gutov<dgutov@yandex.ru>
>>
>> On 15/02/2023 17:48, Theodor Thornhill via Bug reports for GNU Emacs,
>> the Swiss army knife of text editors wrote:
>>> For indentation purposes, when RET is pressed, shouldn't we
>>> ask about the node of the first non-whitespace character of the line
>>> where we get RET?
>> RET usually indents two lines now: the current one and the next.
>>
>> The issue is what to do about indentation on the next (opened) line.
> Yes, I tried to suggest that for the indentation of the next line we
> find the node that corresponds to the beginning of the current line.
> That assumes that we already know how to indent a line after "for" (or
> "if" or "while" or...).
If the current line is empty, we could indeed look for the node at the
end of the previous line.
But to determine whether that node is an opener, or a closer, or a
container that should have spanned the current line (or should be
considered to do so), I think will require grammar-specific logic.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 17:57 ` Dmitry Gutov
@ 2023-02-15 18:11 ` Eli Zaretskii
2023-02-15 18:18 ` Dmitry Gutov
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 18:11 UTC (permalink / raw)
To: Dmitry Gutov; +Cc: 61502, casouri, theo, pankaj
> Date: Wed, 15 Feb 2023 19:57:04 +0200
> Cc: 61502@debbugs.gnu.org, casouri@gmail.com, theo@thornhill.no,
> pankaj@codeisgreat.org
> From: Dmitry Gutov <dgutov@yandex.ru>
>
> On 15/02/2023 19:11, Eli Zaretskii wrote:
> >> Date: Wed, 15 Feb 2023 17:57:51 +0200
> >> Cc:61502@debbugs.gnu.org,casouri@gmail.com,pankaj@codeisgreat.org
> >> From: Dmitry Gutov<dgutov@yandex.ru>
> >>
> >> On 15/02/2023 17:48, Theodor Thornhill via Bug reports for GNU Emacs,
> >> the Swiss army knife of text editors wrote:
> >>> For indentation purposes, when RET is pressed, shouldn't we
> >>> ask about the node of the first non-whitespace character of the line
> >>> where we get RET?
> >> RET usually indents two lines now: the current one and the next.
> >>
> >> The issue is what to do about indentation on the next (opened) line.
> > Yes, I tried to suggest that for the indentation of the next line we
> > find the node that corresponds to the beginning of the current line.
> > That assumes that we already know how to indent a line after "for" (or
> > "if" or "while" or...).
>
> If the current line is empty, we could indeed look for the node at the
> end of the previous line.
>
> But to determine whether that node is an opener, or a closer, or a
> container that should have spanned the current line (or should be
> considered to do so), I think will require grammar-specific logic.
Sure, but I thought the problem was we were using the wrong nodes. I
presumed that, once the "right" node is found, we can thereafter use
the information of that node (which is grammar-specific) to take it
from there and determine the required indentation. You seem to be
saying there's more there than meets the eye?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 18:11 ` Eli Zaretskii
@ 2023-02-15 18:18 ` Dmitry Gutov
0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-15 18:18 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, theo, pankaj
On 15/02/2023 20:11, Eli Zaretskii wrote:
> Sure, but I thought the problem was we were using the wrong nodes. I
> presumed that, once the "right" node is found, we can thereafter use
> the information of that node (which is grammar-specific) to take it
> from there and determine the required indentation. You seem to be
> saying there's more there than meets the eye?
I think the method of finding the correct node will need to be
grammar-specific as well.
Note that this is mostly important for "incomplete" code. The users of
electric-pair-mode should experience adequate indentation behavior already.
The for/if/else statements without curlies seem to be one of the few
exceptions, but that the user also might be typing "{" next, in which
case the current indentation will be the correct one.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 15:48 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 15:57 ` Dmitry Gutov
@ 2023-02-15 17:09 ` Eli Zaretskii
2023-02-15 17:14 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
1 sibling, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 17:09 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> Date: Wed, 15 Feb 2023 16:48:33 +0100
> From: Theodor Thornhill <theo@thornhill.no>
> CC: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>
>
>
> On 15 February 2023 16:02:03 CET, Eli Zaretskii <eliz@gnu.org> wrote:
> >> From: Theodor Thornhill <theo@thornhill.no>
> >> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
> >> Date: Wed, 15 Feb 2023 15:53:22 +0100
> >>
> >> >> int
> >> >> main
> >> >> {
> >> >> for (;;)
> >> >> |
> >> >>
> >> >> eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
> >> >
> >> > I'm afraid I cannot interpret that. What does it mean?
> >>
> >> It returns the closing paren in "for (;;)", right before point. Which
> >> may not be as useful, as it is a child of for_statement, IIRC. Making a
> >> rule for that isn't too hard, but it complicates things.
> >
> >Hmm... this might make no sense, but: why are we asking about the node
> >at point? For indentation purposes, when RET is pressed, shouldn't we
> >ask about the node of the first non-whitespace character of the line
> >where we get RET?
>
> Yeah, but what then to do when there is only whitespace?
Look back for the first line that has something other than whitespace?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 17:09 ` Eli Zaretskii
@ 2023-02-15 17:14 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 17:30 ` Eli Zaretskii
0 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 17:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
On 15 February 2023 18:09:18 CET, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Wed, 15 Feb 2023 16:48:33 +0100
>> From: Theodor Thornhill <theo@thornhill.no>
>> CC: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>>
>>
>>
>> On 15 February 2023 16:02:03 CET, Eli Zaretskii <eliz@gnu.org> wrote:
>> >> From: Theodor Thornhill <theo@thornhill.no>
>> >> Cc: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>> >> Date: Wed, 15 Feb 2023 15:53:22 +0100
>> >>
>> >> >> int
>> >> >> main
>> >> >> {
>> >> >> for (;;)
>> >> >> |
>> >> >>
>> >> >> eval: (treesit-node-at (point)) ;; #<treesit-node ")" in 21-22>
>> >> >
>> >> > I'm afraid I cannot interpret that. What does it mean?
>> >>
>> >> It returns the closing paren in "for (;;)", right before point. Which
>> >> may not be as useful, as it is a child of for_statement, IIRC. Making a
>> >> rule for that isn't too hard, but it complicates things.
>> >
>> >Hmm... this might make no sense, but: why are we asking about the node
>> >at point? For indentation purposes, when RET is pressed, shouldn't we
>> >ask about the node of the first non-whitespace character of the line
>> >where we get RET?
>>
>> Yeah, but what then to do when there is only whitespace?
>
>Look back for the first line that has something other than whitespace?
Yeah, we could. But what then with
for (
;;)
|
?
Not saying it's impossible, but we will get complexity that may be hard to get into emacs-29, i think.
Theo
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 17:14 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 17:30 ` Eli Zaretskii
2023-02-15 17:52 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2023-02-15 17:30 UTC (permalink / raw)
To: Theodor Thornhill; +Cc: 61502, casouri, pankaj
> Date: Wed, 15 Feb 2023 18:14:46 +0100
> From: Theodor Thornhill <theo@thornhill.no>
> CC: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>
> >Look back for the first line that has something other than whitespace?
>
> Yeah, we could. But what then with
>
> for (
> ;;)
> |
>
> ?
>
> Not saying it's impossible, but we will get complexity that may be hard to get into emacs-29, i think.
At some point we will have to rely on electric characters to fix the
indentation later. We just need to try to fix the more blatant and
simple situations, I think, and punt (for now) where it becomes hairy.
I wonder, though: what do other IDEs which use tree-sitter do in these
cases?
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-15 17:30 ` Eli Zaretskii
@ 2023-02-15 17:52 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 0 replies; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 17:52 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: 61502, casouri, pankaj
Eli Zaretskii <eliz@gnu.org> writes:
>> Date: Wed, 15 Feb 2023 18:14:46 +0100
>> From: Theodor Thornhill <theo@thornhill.no>
>> CC: pankaj@codeisgreat.org, casouri@gmail.com, 61502@debbugs.gnu.org
>>
>> >Look back for the first line that has something other than whitespace?
>>
>> Yeah, we could. But what then with
>>
>> for (
>> ;;)
>> |
>>
>> ?
>>
>> Not saying it's impossible, but we will get complexity that may be hard to get into emacs-29, i think.
>
> At some point we will have to rely on electric characters to fix the
> indentation later. We just need to try to fix the more blatant and
> simple situations, I think, and punt (for now) where it becomes hairy.
>
> I wonder, though: what do other IDEs which use tree-sitter do in these
> cases?
Most of them avoid the whole emacs' tradition of guessing. They usually
just indent and deindent on block openers, otherwise just keep the
previous line indent level.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 19:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-14 20:02 ` Eli Zaretskii
@ 2023-02-14 20:59 ` Dmitry Gutov
2023-02-14 21:00 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-14 23:57 ` Dmitry Gutov
2 siblings, 1 reply; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-14 20:59 UTC (permalink / raw)
To: Theodor Thornhill, Eli Zaretskii, Pankaj Jangid, Yuan Fu; +Cc: 61502
On 14/02/2023 21:41, Theodor Thornhill via Bug reports for GNU Emacs,
the Swiss army knife of text editors wrote:
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 749781894b..300a703515 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -1418,6 +1418,8 @@ treesit--indent-1
> ;; encompass the whitespace.
> (parent (cond ((and node parser)
> (treesit-node-parent node))
> + ((equal (treesit-node-type smallest-node) "}")
> + (treesit-node-parent smallest-node))
> (t (treesit-node-on bol bol)))))
> (funcall treesit-indent-function node parent bol))))
Is it a good idea to add C-specific constants to generic code?
Other modes might not have a node called "}" at all.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 20:59 ` Dmitry Gutov
@ 2023-02-14 21:00 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-15 0:12 ` Dmitry Gutov
0 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-14 21:00 UTC (permalink / raw)
To: Dmitry Gutov, Eli Zaretskii, Pankaj Jangid, Yuan Fu; +Cc: 61502
On 14 February 2023 21:59:03 CET, Dmitry Gutov <dgutov@yandex.ru> wrote:
>On 14/02/2023 21:41, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors wrote:
>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>> index 749781894b..300a703515 100644
>> --- a/lisp/treesit.el
>> +++ b/lisp/treesit.el
>> @@ -1418,6 +1418,8 @@ treesit--indent-1
>> ;; encompass the whitespace.
>> (parent (cond ((and node parser)
>> (treesit-node-parent node))
>> + ((equal (treesit-node-type smallest-node) "}")
>> + (treesit-node-parent smallest-node))
>> (t (treesit-node-on bol bol)))))
>> (funcall treesit-indent-function node parent bol))))
>
>Is it a good idea to add C-specific constants to generic code?
>
>Other modes might not have a node called "}" at all.
Yeah this was merely an example. There may be some "block-ender" concept one could envision. I need to experiment with it, and it may not be feasible at all
Theo
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 21:00 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2023-02-15 0:12 ` Dmitry Gutov
0 siblings, 0 replies; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-15 0:12 UTC (permalink / raw)
To: Theodor Thornhill, Eli Zaretskii, Pankaj Jangid, Yuan Fu; +Cc: 61502
On 14/02/2023 23:00, Theodor Thornhill via Bug reports for GNU Emacs,
the Swiss army knife of text editors wrote:
>
> On 14 February 2023 21:59:03 CET, Dmitry Gutov<dgutov@yandex.ru> wrote:
>> On 14/02/2023 21:41, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors wrote:
>>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>>> index 749781894b..300a703515 100644
>>> --- a/lisp/treesit.el
>>> +++ b/lisp/treesit.el
>>> @@ -1418,6 +1418,8 @@ treesit--indent-1
>>> ;; encompass the whitespace.
>>> (parent (cond ((and node parser)
>>> (treesit-node-parent node))
>>> + ((equal (treesit-node-type smallest-node) "}")
>>> + (treesit-node-parent smallest-node))
>>> (t (treesit-node-on bol bol)))))
>>> (funcall treesit-indent-function node parent bol))))
>> Is it a good idea to add C-specific constants to generic code?
>>
>> Other modes might not have a node called "}" at all.
> Yeah this was merely an example. There may be some "block-ender" concept one could envision. I need to experiment with it, and it may not be feasible at all
I was thinking we could use a more/grammar-specific override to choose
which node to use to base the indentation on.
I.e. being able to override this piece of logic:
(treesit-parent-while
smallest-node
(lambda (node)
(and (eq bol (treesit-node-start node))
(not (treesit-node-eq node root)))))
It could also help with issues like
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=60602#8, to avoid
duplication in indentation rules.
AFAICT this is largely possible to do already by changing
treesit-indent-function to some mode-specific wrapper, but an override
like the above could be written more succinctly.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 19:41 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2023-02-14 20:02 ` Eli Zaretskii
2023-02-14 20:59 ` Dmitry Gutov
@ 2023-02-14 23:57 ` Dmitry Gutov
2023-02-15 6:07 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
2 siblings, 1 reply; 28+ messages in thread
From: Dmitry Gutov @ 2023-02-14 23:57 UTC (permalink / raw)
To: Theodor Thornhill, Eli Zaretskii, Pankaj Jangid, Yuan Fu; +Cc: 61502
On 14/02/2023 21:41, Theodor Thornhill via Bug reports for GNU Emacs,
the Swiss army knife of text editors wrote:
> What do you think?
>
> Theo
>
>
> diff --git a/lisp/treesit.el b/lisp/treesit.el
> index 749781894b..300a703515 100644
> --- a/lisp/treesit.el
> +++ b/lisp/treesit.el
> @@ -1418,6 +1418,8 @@ treesit--indent-1
> ;; encompass the whitespace.
> (parent (cond ((and node parser)
> (treesit-node-parent node))
> + ((equal (treesit-node-type smallest-node) "}")
> + (treesit-node-parent smallest-node))
> (t (treesit-node-on bol bol)))))
> (funcall treesit-indent-function node parent bol))))
Here's a counter-example for this patch:
int
main ()
{}|
press RET at |, and you'll see the next line indented by 2 spaces.
Whereas it shouldn't. This happens, among other things, because the
added code doesn't distinguish between "real" and "virtual" nodes.
BTW, in this example:
int
main
{
for (;;)
{|
}
the "}" node selected by treesit--indent-1 is not "unexisting": it
selects the closer on the next line, which is parsed to be the part of
the "for" node. Thanks to its presence, the parent compound_statement
node contains the point, and everything works out.
And this one
int
main
{
for (;;)|
}
isn't fixed with your patch because the "unexisting" node in place is a
different one: "expression_statement", and it has no closers. And it's
"virtually" placed at the end of the previous line by the parser.
So in most cases if the user has electric-pair-mode on, indentation
should work okay. Without a pairing solution, though, we see different
grammars handling incomplete code in different ways for different
syntactic elements: virtual nodes, container nodes that span after
point, container nodes that don't span after point, statements that
parse into a different node type (usually wrapped in ERROR). We could
report these one by one, hoping for the best. I'm curious how different
editors fare with indentation in these conditions -- perhaps they use a
different, more error-proof approach. But it could be that their uses
are less fussy about indentation than we are.
^ permalink raw reply [flat|nested] 28+ messages in thread
* bug#61502: 29.0.60; c-ts-mode auto-indent not working
2023-02-14 23:57 ` Dmitry Gutov
@ 2023-02-15 6:07 ` Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors
0 siblings, 0 replies; 28+ messages in thread
From: Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2023-02-15 6:07 UTC (permalink / raw)
To: Dmitry Gutov, Eli Zaretskii, Pankaj Jangid, Yuan Fu; +Cc: 61502
On 15 February 2023 00:57:17 CET, Dmitry Gutov <dgutov@yandex.ru> wrote:
>On 14/02/2023 21:41, Theodor Thornhill via Bug reports for GNU Emacs, the Swiss army knife of text editors wrote:
>> What do you think?
>>
>> Theo
>>
>>
>> diff --git a/lisp/treesit.el b/lisp/treesit.el
>> index 749781894b..300a703515 100644
>> --- a/lisp/treesit.el
>> +++ b/lisp/treesit.el
>> @@ -1418,6 +1418,8 @@ treesit--indent-1
>> ;; encompass the whitespace.
>> (parent (cond ((and node parser)
>> (treesit-node-parent node))
>> + ((equal (treesit-node-type smallest-node) "}")
>> + (treesit-node-parent smallest-node))
>> (t (treesit-node-on bol bol)))))
>> (funcall treesit-indent-function node parent bol))))
>
>Here's a counter-example for this patch:
>
>int
>main ()
>{}|
>
>press RET at |, and you'll see the next line indented by 2 spaces. Whereas it shouldn't. This happens, among other things, because the added code doesn't distinguish between "real" and "virtual" nodes.
>
>BTW, in this example:
>
>int
>main
>{
> for (;;)
> {|
>}
>
>the "}" node selected by treesit--indent-1 is not "unexisting": it selects the closer on the next line, which is parsed to be the part of the "for" node. Thanks to its presence, the parent compound_statement node contains the point, and everything works out.
>
>And this one
>
>int
>main
>{
> for (;;)|
>}
>
>isn't fixed with your patch because the "unexisting" node in place is a different one: "expression_statement", and it has no closers. And it's "virtually" placed at the end of the previous line by the parser.
>
>So in most cases if the user has electric-pair-mode on, indentation should work okay. Without a pairing solution, though, we see different grammars handling incomplete code in different ways for different syntactic elements: virtual nodes, container nodes that span after point, container nodes that don't span after point, statements that parse into a different node type (usually wrapped in ERROR). We could report these one by one, hoping for the best. I'm curious how different editors fare with indentation in these conditions -- perhaps they use a different, more error-proof approach. But it could be that their uses are less fussy about indentation than we are.
Sure, i agree - the patch was not a change request, merely an example to show why the root node was chosen rather than the closer compound_statement node.
^ permalink raw reply [flat|nested] 28+ messages in thread