unofficial mirror of emacs-devel@gnu.org 
* More Tree Sitter Questions / Problems.
@ 2022-12-14 20:43 Perry Smith
  2022-12-14 21:15 ` Stefan Monnier
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Perry Smith @ 2022-12-14 20:43 UTC (permalink / raw)
  To: emacs-devel



All three of ruby-mode, java-mode, and c-mode indent a simple arithmetic expression broken into lines like this:

foodog = 12 + 4 *
    18 * 99 + 8

I think this is the Java sample which has the indent set to 4.  I’ll call this “the old way”.

All three of ruby-ts-mode, java-ts-mode, and c-ts-mode indent it like this:

variable = 12 + 4 *
                18 * 99 + 8

In Ruby’s case, this rule is doing it:

           ((parent-is "binary") first-sibling 0)

If I comment that rule out, then no rule hits and so there is no indent (the line is left unchanged no matter how it is indented).
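To make the difference between the two styles concrete, here is a toy model in Ruby. It is only a sketch and not Emacs code: the helper names `old_way_column` and `first_sibling_column` are invented for illustration. The second helper assumes that the first sibling of `18` inside the `4 * 18` node is `4`, which is consistent with the 16-column indent shown above.

```ruby
# Toy model of the two continuation styles; not Emacs code.

# "Old way": indent the continuation line by a fixed offset from the
# indentation of the line that starts the statement.
def old_way_column(statement_line, offset)
  statement_indent = statement_line[/\A */].length
  statement_indent + offset
end

# "New way": align the continuation line with the first sibling of its
# first token.  We simply look up where that sibling starts on the
# previous line.
def first_sibling_column(previous_line, first_sibling_text)
  previous_line.index(first_sibling_text)
end

puts old_way_column("foodog = 12 + 4 *", 4)            # prints 4
puts first_sibling_column("variable = 12 + 4 *", "4")  # prints 16
```

The old way depends only on where the statement starts, while the new way depends on where a particular operand happens to sit on the previous line.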

While I think the new way is ultra cool… I am 100% positive I am in a small minority on this topic.  Most prefer to have it indented the old way.

I’ve developed two new rules but I believe these will not solve the issue 100%:

           ((ancestor-is "parenthesized_statements") (ancestor "parenthesized_statements") 1)
           ((ancestor-is "assignment") (ancestor "assignment") ruby-ts-mode-indent-offset)

I also wrote ancestor-is and ancestor so now I get:

eddie = (a + b *
         c * d + 12)
bobby = a + b *
  c * d + 12

I fear as I test and play with this more I’m going to need more rules to catch all the cases where a line starts with a term of an arithmetic expression.

Perry



* Re: More Tree Sitter Questions / Problems.
  2022-12-14 20:43 More Tree Sitter Questions / Problems Perry Smith
@ 2022-12-14 21:15 ` Stefan Monnier
  2022-12-14 23:22   ` Perry Smith
  2022-12-26 16:28   ` Dmitry Gutov
  2022-12-15  6:05 ` Yuri Khan
  2022-12-26 16:24 ` Dmitry Gutov
  2 siblings, 2 replies; 9+ messages in thread
From: Stefan Monnier @ 2022-12-14 21:15 UTC (permalink / raw)
  To: Perry Smith; +Cc: emacs-devel

> foodog = 12 + 4 *
>     18 * 99 + 8

[ Trying to provide some SMIE perspective:  ]

In the context of sh-mode, I've had requests to provide that kind of
"AST-oblivious" indentation.  The result is controlled by
`sh-indent-after-continuation`.

> variable = 12 + 4 *
>                 18 * 99 + 8

That's my favorite, yes.
[ Tho GNU style would recommend breaking the line just before the `*`
  rather than just after it.  ]

> I also wrote ancestor-is and ancestor so now I get:
>
> eddie = (a + b *
>          c * d + 12)

I think this one sucks.  Do we really need it?
Can we have

    eddie = (a + b *
                 c * d + 12)

instead?

> bobby = a + b *
>   c * d + 12
>
> I fear as I test and play with this more I’m going to need more rules
> to catch all the cases where a line starts with a term of an
> arithmetic expression.

I'm not sure how you're looking at it, but for me, I've found it
important to try and understand what those indentation choices "mean".

I can see two interpretations of

    foodog = 12 + 4 *
        18 * 99 + 8

one is that this is one logical line spread over several physical lines
and the syntactic structure should be ignored, so it leads to:

    foodog = (12 + 4 *
        18 * 99 + 8)

That's the interpretation I used in `sh-indent-after-continuation` and
which I found to be easier to understand (and hence define in code).

Another way to look at it is via what I call "virtual indentation" in
SMIE: while "12 + 4 *" in the above code is indented 9 columns deeper
than "foodog", we could decide that what follows a "=" assignment is always
"virtually indented" only 4 columns deeper than the var.  So we get

    foodog = 12 + 4 +
        18 * 99 + 8

because the "18" is aligned with (the virtual indentation of) "12".
Then we also get

    foodog = (12 + 4 +
              18 * 99 + 8)

because "18" is still aligned with "12" but while "(" is virtually indented
to +4, the virtual indentation of "12" is not special (it's the same as
its real indentation).

But if we want to obey the syntactic structure we still won't get

    foodog = 12 + 4 *
        18 * 99 + 8

because "18" shouldn't be aligned with "12" in this case.


        Stefan





* Re: More Tree Sitter Questions / Problems.
  2022-12-14 21:15 ` Stefan Monnier
@ 2022-12-14 23:22   ` Perry Smith
  2022-12-14 23:48     ` Yuan Fu
                       ` (2 more replies)
  2022-12-26 16:28   ` Dmitry Gutov
  1 sibling, 3 replies; 9+ messages in thread
From: Perry Smith @ 2022-12-14 23:22 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel





> On Dec 14, 2022, at 15:15, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> 
>> foodog = 12 + 4 *
>>    18 * 99 + 8
> 
> [ Trying to provide some SMIE perspective:  ]
> 
> In the context of sh-mode, I've had requests to provide that kind of
> "AST-oblivious" indentation.  The result is controlled by
> `sh-indent-after-continuation`.
> 
>> variable = 12 + 4 *
>>                18 * 99 + 8
> 
> That's my favorite, yes.

You might be misunderstanding my concern (but I do appreciate all of your examples and thoughts).

My concern is if Tree Sitter modes deviate too much from the old way, they may not catch on.  Perhaps I should not worry about that.  The old modes are not going anywhere so people can keep whichever they prefer.  But, that is where my worry is coming from.

On a slightly different topic but only slightly, I discovered that my first draft also does this:

if 12 * 18 +
   45 - 19
     frog = 12
end

Rather than this:

if 12 * 18 +
   45 - 19
  frog = 12
end

I can change the code but I mention it because I bet others will be making the same mistake.

“frog” (in the first example) is indented to the bol of parent.  The parent is “then” (not if) and the “then” is on the 2nd line, not the first.  So instead of indenting two spaces from the bol of the “if”, it is indented two spaces from the bol of the “then” which is the 45.

All this to say that now that I’m getting deeper into this, I plan to rethink things.  “Parent”, “grand parent”, “first-sibling”, etc. are likely not going to be good for anchor points because simply adding parens makes a node one level deeper.  I think what will be better in the case of the “if” will be to anchor to the closest “if”.

I’m also coming to the conclusion that “parent-bol” would be better if it was (bol (parent)) so that (bol (grand-parent)) and (bol (ancestor “if”)), etc could be easily done.
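A toy Ruby model of the two anchors may help show why the result differs. This is a sketch under stated assumptions, not the treesit API: `Node`, `BOL_COLUMN`, `bol`, and `ancestor` are all made up for illustration, and the node types are taken from the description above (“frog” is a child of “then”, which is a child of “if”).

```ruby
# Toy syntax-tree model; not the treesit API.
Node = Struct.new(:type, :line, :parent)

# Column of the first non-blank character on each line of the example:
# line 1 is "if 12 * 18 +", line 2 is "   45 - 19".
BOL_COLUMN = { 1 => 0, 2 => 3 }.freeze

# Indentation of the line on which NODE starts.
def bol(node)
  BOL_COLUMN.fetch(node.line)
end

# Nearest ancestor of NODE with the given type, or nil if there is none.
def ancestor(node, type)
  current = node.parent
  current = current.parent while current && current.type != type
  current
end

if_node   = Node.new("if", 1, nil)
then_node = Node.new("then", 2, if_node)   # starts at the "45" on line 2
frog      = Node.new("assignment", 3, then_node)

offset = 2
puts bol(frog.parent) + offset            # prints 5: anchored at bol of "then"
puts bol(ancestor(frog, "if")) + offset   # prints 2: anchored at bol of "if"
```

Searching by node type keeps working if an extra wrapper node is inserted between “frog” and the “if”, whereas a fixed number of parent hops does not.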

These are mostly “thinking out loud” type comments.  I plan on piddling with things more.

Perry




* Re: More Tree Sitter Questions / Problems.
  2022-12-14 23:22   ` Perry Smith
@ 2022-12-14 23:48     ` Yuan Fu
  2022-12-14 23:53     ` Stefan Monnier
  2022-12-15  6:56     ` Eli Zaretskii
  2 siblings, 0 replies; 9+ messages in thread
From: Yuan Fu @ 2022-12-14 23:48 UTC (permalink / raw)
  To: Perry Smith; +Cc: Stefan Monnier, emacs-devel



> On Dec 14, 2022, at 3:22 PM, Perry Smith <pedz@easesoftware.com> wrote:
> 
> 
> 
>> On Dec 14, 2022, at 15:15, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> 
>>> foodog = 12 + 4 *
>>>    18 * 99 + 8
>> 
>> [ Trying to provide some SMIE perspective:  ]
>> 
>> In the context of sh-mode, I've had requests to provide that kind of
>> "AST-oblivious" indentation.  The result is controlled by
>> `sh-indent-after-continuation`.
>> 
>>> variable = 12 + 4 *
>>>                18 * 99 + 8
>> 
>> That's my favorite, yes.
> 
> You might be misunderstanding my concern (but I do appreciate all of your examples and thoughts).
> 
> My concern is if Tree Sitter modes deviate too much from the old way, they may not catch on.  Perhaps I should not worry about that.  The old modes are not going anywhere so people can keep whichever they prefer.  But, that is where my worry is coming from.
> 
> On a slightly different topic but only slightly, I discovered that my first draft also does this:
> 
> if 12 * 18 +
>    45 - 19
>      frog = 12
> end
> 
> Rather than this:
> 
> if 12 * 18 +
>    45 - 19
>   frog = 12
> end
> 
> I can change the code but I mention it because I bet others will be making the same mistake.
> 
> “frog” (in the first example) is indented to the bol of parent.  The parent is “then” (not if) and the “then” is on the 2nd line, not the first.  So instead of indenting two spaces from the bol of the “if”, it is indented two spaces from the bol of the “then” which is the 45.

Yes, we’ve seen the same problem in other languages, like in bug#59686.

> 
> All this to say that now that I’m getting deeper into this, I plan to rethink things.  “Parent”, “grand parent”, “first-sibling”, etc. are likely not going to be good for anchor points because simply adding parens makes a node one level deeper.

Where exactly would you insert the parentheses? Because if you add them around frog = 12, i.e., the following:

if 12 * 18 +
   45 - 19
  (frog = 12)
end

then we are now indenting the (frog = 12), not frog = 12, so we are still in the same level, and using grand-parent-bol would still work. And there is no need for searching for the “if” node.

> I think what will be better in the case of the “if” will be to anchor to the closest “if”.
> 
> I’m also coming to the conclusion that “parent-bol” would be better if it was (bol (parent)) so that (bol (grand-parent)) and (bol (ancestor “if”)), etc could be easily done.

That’ll depend on how many combinations we end up needing. If we really only need (bol (parent)) and (bol (grand-parent)), simply adding them as parent-bol and grandparent-bol is better (for code complexity, for understanding, for documentation, etc).

Yuan



* Re: More Tree Sitter Questions / Problems.
  2022-12-14 23:22   ` Perry Smith
  2022-12-14 23:48     ` Yuan Fu
@ 2022-12-14 23:53     ` Stefan Monnier
  2022-12-15  6:56     ` Eli Zaretskii
  2 siblings, 0 replies; 9+ messages in thread
From: Stefan Monnier @ 2022-12-14 23:53 UTC (permalink / raw)
  To: Perry Smith; +Cc: emacs-devel

> You might be misunderstanding my concern (but I do appreciate all of
> your examples and thoughts).
>
> My concern is if Tree Sitter modes deviate too much from the old way,
> they may not catch on.  Perhaps I should not worry about that.
> The old modes are not going anywhere so people can keep whichever
> they prefer.  But, that is where my worry is coming from.

Makes sense.  There's a delicate balance sometimes.  Some indentation
styles don't "make sense" if you look at them from the point of view of
an AST, yet people like them and in some cases like them enough that
they'd rather use another tool for that.

We'll need to come up with ways to accommodate them, but accommodating
them still requires "making sense of those non-sensical styles".
My examples and thoughts are just trying to help you make sense of the
case you're bumping into.

> On a slightly different topic but only slightly, I discovered that my first draft also does this:
>
> if 12 * 18 +
>    45 - 19
>      frog = 12
> end
>
> Rather than this:
>
> if 12 * 18 +
>    45 - 19
>   frog = 12
> end
>
> I can change the code but I mention it because I bet others will be making the same mistake.
>
> “frog” (in the first example) is indented to the bol of parent.
> The parent is “then” (not if) and the “then” is on the 2nd line, not
> the first.  So instead of indenting two spaces from the bol of the
> “if”, it is indented two spaces from the bol of the “then” which is
> the 45.

I don't see a "then" keyword above, so I'll assume that the "then" is
implicit (can be taken as being represented by the newline or somesuch).

In SMIE this is usually handled so as to allow "frog" to be indented as follows:

    if 12 * 18 +
       45 - 19 then
      frog = 12
    end
or
    if 12 * 18 +
       45 - 19
    then
      frog = 12
    end
or
    if 12 * 18 +
       45 - 19
      then
        frog = 12
      end

IOW, "frog" here is *always* indented 2 columns deeper than "then".
This is true even for the first case above because in the first case
above, the *virtual* indentation of "then" is the column of "if" rather
than the column where "then" is actually placed in the file.  This is
defined by specifying in the indentation rules how "then" is (virtually)
indented when it is "hanging" (i.e. the last (and not only) token on
a line).
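As a rough illustration of the rule described above, here is a toy Ruby model of virtual indentation. It is only a sketch, not SMIE's implementation: `Token`, `hanging?`, `virtual_indent`, and `body_column` are names invented for this example, and the columns are measured relative to the "if".

```ruby
# Toy model of "virtual indentation"; not SMIE code.
Token = Struct.new(:text, :column, :first_on_line, :last_on_line)

# A token is "hanging" when it is the last, but not the only, token
# on its line.
def hanging?(token)
  token.last_on_line && !token.first_on_line
end

# A hanging token is treated as if it sat at its parent's column;
# otherwise its real column is used.
def virtual_indent(token, parent_column)
  hanging?(token) ? parent_column : token.column
end

# The body is always indented OFFSET columns deeper than "then".
def body_column(then_token, if_column, offset = 2)
  virtual_indent(then_token, if_column) + offset
end

if_column = 0
hanging_then  = Token.new("then", 11, false, true)  # "   45 - 19 then"
then_under_if = Token.new("then", 0, true, true)    # "then" alone, column 0
then_indented = Token.new("then", 2, true, true)    # "then" alone, column 2

puts body_column(hanging_then, if_column)   # prints 2
puts body_column(then_under_if, if_column)  # prints 2
puts body_column(then_indented, if_column)  # prints 4
```

The three calls correspond to the three layouts shown above; only the hanging case needs the virtual column, the other two use the real one.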

I've found this idea of virtual indentation to be very convenient,
arguably the most important idea behind SMIE's indentation rules.
For example, it lets me say that "fn x => ..." is virtually indented
like its parent when its parent is another "fn x =>", so that we get:

    fn x => fn y => fn z =>
      <BODY>

instead of

    fn x => fn y => fn z =>
                      <BODY>

[ In the above example I presumed that "end" is defined to be aligned
  with "then" but of course, another valid choice would be to align it
  with "if".  ]


        Stefan





* Re: More Tree Sitter Questions / Problems.
  2022-12-14 20:43 More Tree Sitter Questions / Problems Perry Smith
  2022-12-14 21:15 ` Stefan Monnier
@ 2022-12-15  6:05 ` Yuri Khan
  2022-12-26 16:24 ` Dmitry Gutov
  2 siblings, 0 replies; 9+ messages in thread
From: Yuri Khan @ 2022-12-15  6:05 UTC (permalink / raw)
  To: Perry Smith; +Cc: emacs-devel

On Thu, 15 Dec 2022 at 03:44, Perry Smith <pedz@easesoftware.com> wrote:

> All three of ruby-mode, java-mode, and c-mode indent a simple arithmetic expression broken into lines like this:
>
> foodog = 12 + 4 *
>     18 * 99 + 8

My opinion is that this is the wrong problem to solve. Assuming the
example is simplified and there is in fact reason to break lines, it
should be

    variable = 12 +
        4 * 18 * 99 +
        8

or

    variable = 12
        + 4 * 18 * 99
        + 8

or maybe

    variable =
        12 + 4 * 18 * 99 + 8

i.e. prefer breaking lines higher in the AST.

> All three of ruby-ts-mode, java-ts-mode, and c-ts-mode indent it like this:
>
> variable = 12 + 4 *
>                 18 * 99 + 8

Also, this style only makes sense if tab characters are not used.
Otherwise, the alignment will break for users whose tab width differs.
(Emacs’s source code suffers from this a lot.)




* Re: More Tree Sitter Questions / Problems.
  2022-12-14 23:22   ` Perry Smith
  2022-12-14 23:48     ` Yuan Fu
  2022-12-14 23:53     ` Stefan Monnier
@ 2022-12-15  6:56     ` Eli Zaretskii
  2 siblings, 0 replies; 9+ messages in thread
From: Eli Zaretskii @ 2022-12-15  6:56 UTC (permalink / raw)
  To: Perry Smith; +Cc: monnier, emacs-devel

> From: Perry Smith <pedz@easesoftware.com>
> Date: Wed, 14 Dec 2022 17:22:44 -0600
> Cc: emacs-devel <emacs-devel@gnu.org>
> 
> My concern is if Tree Sitter modes deviate too much from the old way, they may not catch on.  Perhaps I
> should not worry about that.  The old modes are not going anywhere so people can keep whichever they
> prefer.

Indeed.  Moreover, if you can easily allow both behaviors, add a user
option to select one of them, and move on.  Emacs 29.1 will be the
first release with these TS-based modes, and we will hopefully have a
lot of user feedback to guide us in making these decisions in the
future.  We cannot reasonably hope to solve all those dilemmas now, so
we must leave it for users to tell us their experiences and
preferences.

Please also keep in mind that I'd like to put out the first pretest of
Emacs 29.1 in about 2 months, so things should be becoming stable very
soon.

Thanks.




* Re: More Tree Sitter Questions / Problems.
  2022-12-14 20:43 More Tree Sitter Questions / Problems Perry Smith
  2022-12-14 21:15 ` Stefan Monnier
  2022-12-15  6:05 ` Yuri Khan
@ 2022-12-26 16:24 ` Dmitry Gutov
  2 siblings, 0 replies; 9+ messages in thread
From: Dmitry Gutov @ 2022-12-26 16:24 UTC (permalink / raw)
  To: Perry Smith, emacs-devel

Hi Perry,

On 14/12/2022 22:43, Perry Smith wrote:
> All three of ruby-mode, java-mode, and c-mode indent a simple arithmetic 
> expression broken into lines like this:
> 
> foodog = 12 + 4 *
>      18 * 99 + 8
> 
> I think this is the Java sample which has the indent set to 4.  I’ll 
> call this “the old way”.
> 
> All three of ruby-ts-mode, java-ts-mode, and c-ts-mode indent it like this:
> 
> variable = 12 + 4 *
>                  18 * 99 + 8

First of all, this is IMO not too terrible, as it shows the user how the 
code is parsed, which can be helpful on different occasions.

And as apparent from practice, people who don't like this behavior (e.g. 
Rubocop frowns on it) will just break the line after "=". Seems like the 
different indentation behaviors across different editors (when the team 
uses several ones) converge on this practice anyway.

> In Ruby’s case, this rule is doing it:
> 
>             ((parent-is "binary") first-sibling 0)
> 
> If I comment that rule out, then no rule hits and so there is no indent 
> (the line is left unchanged no matter how it is indented).

As luck would have it, I'm finishing work on a feature request for 
ruby-mode for a user option which switches indentation from this sort of 
AST-aware to simpler continuations. See the latest patch in debbugs#60186.

Further, have you looked into supporting/being more-or-less compatible 
with existing indentation-related options in ruby-mode? At least those 
that affect SMIE.

It should be helpful to synchronize, both to provide a straightforward 
upgrade path for existing users, and to ensure good experience for those 
who don't build tree-sitter yet, and for users of older Emacs.




* Re: More Tree Sitter Questions / Problems.
  2022-12-14 21:15 ` Stefan Monnier
  2022-12-14 23:22   ` Perry Smith
@ 2022-12-26 16:28   ` Dmitry Gutov
  1 sibling, 0 replies; 9+ messages in thread
From: Dmitry Gutov @ 2022-12-26 16:28 UTC (permalink / raw)
  To: Stefan Monnier, Perry Smith; +Cc: emacs-devel

On 14/12/2022 23:15, Stefan Monnier wrote:
> In the context of sh-mode, I've had requests to provide that kind of
> "AST-oblivious" indentation.  The result is controlled by
> `sh-indent-after-continuation`.

*-after-continuation is a cool naming example, would be nice to be able 
to reuse it.

But in the case of Ruby it seems like we're choosing between AST-based 
indentation (where the indentation offset can reach arbitrary nesting 
after several continuation-like breaks) and "continuation"-based 
indentation, where we basically have 0 or 1 levels of nesting, unless 
certain subexpressions are used (blocks or parentheses, mostly).

OTOH, ruby-indent-simplified (the name of the option I used in 
https://debbugs.gnu.org/60186), is even less semantic.


