* More Tree Sitter Questions / Problems.
From: Perry Smith @ 2022-12-14 20:43 UTC
To: emacs-devel

All three of ruby-mode, java-mode, and c-mode indent a simple arithmetic expression broken into lines like this:

foodog = 12 + 4 *
    18 * 99 + 8

I think this is the Java sample, which has the indent set to 4. I’ll call this “the old way”.

All three of ruby-ts-mode, java-ts-mode, and c-ts-mode indent it like this:

variable = 12 + 4 *
                18 * 99 + 8

In Ruby’s case, this rule is doing it:

((parent-is "binary") first-sibling 0)

If I comment that rule out, then no rule hits and so there is no indent (the line is left unchanged no matter how it is indented).

While I think the new way is ultra cool… I am 100% positive I am in a small minority on this topic. Most prefer to have it indented the old way.

I’ve developed two new rules, but I believe these will not solve the issue 100%:

((ancestor-is "parenthesized_statements") (ancestor "parenthesized_statements") 1)
((ancestor-is "assignment") (ancestor "assignment") ruby-ts-mode-indent-offset)

I also wrote ancestor-is and ancestor, so now I get:

eddie = (a + b *
         c * d + 12)

bobby = a + b *
  c * d + 12

I fear that as I test and play with this more, I’m going to need more rules to catch all the cases where a line starts with a term of an arithmetic expression.

Perry
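For readers following along: ancestor-is and ancestor are not presets in treesit-simple-indent-presets; the post says they were written for this experiment and does not show them. Below is a minimal sketch of what such helpers might look like, assuming Emacs 29's treesit.el, where a matcher or anchor may be any function called with NODE, PARENT and BOL. Every my-ts- name and the offset variable are hypothetical, not the code from the post.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'treesit)

(defvar my-ts-indent-offset 2
  "Hypothetical stand-in for `ruby-ts-mode-indent-offset'.")

(defun my-ts--ancestor (parent type)
  "Return PARENT, or its nearest ancestor, whose type matches regexp TYPE.
Return nil if no such node exists."
  (let ((n parent))
    (while (and n (not (string-match-p type (treesit-node-type n))))
      (setq n (treesit-node-parent n)))
    n))

(defun my-ts--ancestor-is (type)
  "Return a matcher that succeeds when a TYPE node encloses the line start."
  (lambda (_node parent _bol &rest _)
    (my-ts--ancestor parent type)))

(defun my-ts--ancestor-start (type)
  "Return an anchor function giving the start of the enclosing TYPE node."
  (lambda (_node parent _bol &rest _)
    (treesit-node-start (my-ts--ancestor parent type))))

;; The two rules from the post, rewritten with the hypothetical helpers.
(defvar my-ts--continuation-rules
  '(((my-ts--ancestor-is "parenthesized_statements")
     (my-ts--ancestor-start "parenthesized_statements") 1)
    ((my-ts--ancestor-is "assignment")
     (my-ts--ancestor-start "assignment") my-ts-indent-offset)))
```

Because the search walks upward until it finds a matching type, the rules keep working when extra nodes (for example nested binary nodes) sit between the line start and the assignment, which is the property the post is after.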
* Re: More Tree Sitter Questions / Problems.
From: Stefan Monnier @ 2022-12-14 21:15 UTC
To: Perry Smith; +Cc: emacs-devel

> foodog = 12 + 4 *
>     18 * 99 + 8

[ Trying to provide some SMIE perspective: ]

In the context of sh-mode, I've had requests to provide that kind of "AST-oblivious" indentation. The result is controlled by `sh-indent-after-continuation`.

> variable = 12 + 4 *
>                 18 * 99 + 8

That's my favorite, yes.
[ Tho GNU style would recommend breaking the line just before the `*` rather than just after it. ]

> I also wrote ancestor-is and ancestor so now I get:
>
> eddie = (a + b *
>          c * d + 12)

I think this one sucks. Do we really need it? Can we have eddie = (a + b * c * d + 12) instead?

> bobby = a + b *
>   c * d + 12
>
> I fear as I test and play with this more I’m going to need more rules
> to catch all the cases where a line starts with a term of an
> arithmetic expression.

I'm not sure how you're looking at it, but for me, I've found it important to try and understand what those indentation choices "mean". I can see two interpretations of

    foodog = 12 + 4 *
        18 * 99 + 8

One is that this is one logical line spread over several physical lines and the syntactic structure should be ignored, so it leads to:

    foodog = (12 + 4 *
        18 * 99 + 8)

That's the interpretation I used in `sh-indent-after-continuation` and which I found to be easier to understand (and hence define in code).

Another way to look at it is via what I call "virtual indentation" in SMIE: while "12 + 4 *" in the above code is indented 9 columns deeper than "foodog", we could decide that what follows a "=" assignment is always "virtually indented" only 4 columns deeper than the var. So we get

    foodog = 12 + 4 +
        18 * 99 + 8

because the "18" is aligned with (the virtual indentation of) "12". Then we also get

    foodog = (12 + 4 +
              18 * 99 + 8)

because "18" is still aligned with "12", but while "(" is virtually indented to +4, the virtual indentation of "12" is not special (it's the same as its real indentation). But if we want to obey the syntactic structure we still won't get

    foodog = 12 + 4 *
        18 * 99 + 8

because "18" shouldn't be aligned with "12" in this case.

Stefan
* Re: More Tree Sitter Questions / Problems.
From: Perry Smith @ 2022-12-14 23:22 UTC
To: Stefan Monnier; +Cc: emacs-devel

> On Dec 14, 2022, at 15:15, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>
>> foodog = 12 + 4 *
>>     18 * 99 + 8
>
> [ Trying to provide some SMIE perspective: ]
>
> In the context of sh-mode, I've had requests to provide that kind of
> "AST-oblivious" indentation. The result is controlled by
> `sh-indent-after-continuation`.
>
>> variable = 12 + 4 *
>>                 18 * 99 + 8
>
> That's my favorite, yes.

You might be misunderstanding my concern (but I do appreciate all of your examples and thoughts).

My concern is that if Tree Sitter modes deviate too much from the old way, they may not catch on. Perhaps I should not worry about that. The old modes are not going anywhere, so people can keep whichever they prefer. But that is where my worry is coming from.

On a slightly different (but only slightly) topic, I discovered that my first draft also does this:

if 12 * 18 +
   45 - 19
     frog = 12
end

Rather than this:

if 12 * 18 +
   45 - 19
  frog = 12
end

I can change the code, but I mention it because I bet others will be making the same mistake.

“frog” (in the first example) is indented to the bol of parent. The parent is “then” (not “if”) and the “then” is on the 2nd line, not the first. So instead of indenting two spaces from the bol of the “if”, it is indented two spaces from the bol of the “then”, which is the 45.

All this to say that now that I’m getting deeper into this, I plan to rethink things. “Parent”, “grand parent”, “first-sibling”, etc. are likely not going to be good anchor points, because simply adding parens makes a node one level deeper.

I think what will be better in the case of the “if” will be to anchor to the closest “if”.

I’m also coming to the conclusion that “parent-bol” would be better if it was (bol (parent)) so that (bol (grand-parent)) and (bol (ancestor "if")), etc. could be easily done.

These are mostly “thinking out loud” type comments. I plan on piddling with things more.

Perry
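The (bol (parent)) idea above can be sketched as a higher-order anchor: one function that takes a "which node?" function and returns the indentation position of the line where that node starts. This is only an illustration under the assumption that treesit-simple-indent (Emacs 29) accepts function-valued anchors; all my-ts- names are hypothetical, and the offset 2 is assumed from the "two spaces" mentioned in the message.

```elisp
;;; -*- lexical-binding: t; -*-
(require 'treesit)

(defun my-ts--bol (node-fn)
  "Return an anchor function for the indentation of NODE-FN's line.
NODE-FN is called with the usual NODE, PARENT and BOL arguments and
must return a node.  The anchor is the first non-blank position on
the line where that node starts, or nil if NODE-FN returns nil."
  (lambda (node parent bol &rest _)
    (let ((target (funcall node-fn node parent bol)))
      (when target
        (save-excursion
          (goto-char (treesit-node-start target))
          (back-to-indentation)
          (point))))))

(defun my-ts--parent (_node parent &rest _)
  "Node function: return PARENT."
  parent)

(defun my-ts--grand-parent (_node parent &rest _)
  "Node function: return the parent of PARENT."
  (treesit-node-parent parent))

(defun my-ts--enclosing-if (_node parent &rest _)
  "Node function: return PARENT or its nearest ancestor of type \"if\"."
  (let ((n parent))
    (while (and n (not (string-match-p "\\`if\\'" (treesit-node-type n))))
      (setq n (treesit-node-parent n)))
    n))

;; Indent the body relative to the line holding the "if", however many
;; nodes sit in between.  The offset 2 is an assumption.
(defvar my-ts--if-body-rule
  '((parent-is "then") (my-ts--bol my-ts--enclosing-if) 2))
```

With this shape, (my-ts--bol my-ts--parent) plays the role of parent-bol, and anchoring to a grandparent or to the closest "if" only needs a different node function rather than a new preset per combination.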
* Re: More Tree Sitter Questions / Problems.
From: Yuan Fu @ 2022-12-14 23:48 UTC
To: Perry Smith; +Cc: Stefan Monnier, emacs-devel

> On Dec 14, 2022, at 3:22 PM, Perry Smith <pedz@easesoftware.com> wrote:
>
>> On Dec 14, 2022, at 15:15, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>>
>>> foodog = 12 + 4 *
>>>     18 * 99 + 8
>>
>> [ Trying to provide some SMIE perspective: ]
>>
>> In the context of sh-mode, I've had requests to provide that kind of
>> "AST-oblivious" indentation. The result is controlled by
>> `sh-indent-after-continuation`.
>>
>>> variable = 12 + 4 *
>>>                 18 * 99 + 8
>>
>> That's my favorite, yes.
>
> You might be misunderstanding my concern (but I do appreciate all of your examples and thoughts).
>
> My concern is if Tree Sitter modes deviate too much from the old way, they may not catch on. Perhaps I should not worry about that. The old modes are not going anywhere so people can keep whichever they prefer. But, that is where my worry is coming from.
>
> On a slightly different topic but only slightly, I discovered that my first draft also does this:
>
> if 12 * 18 +
>    45 - 19
>      frog = 12
> end
>
> Rather than this:
>
> if 12 * 18 +
>    45 - 19
>   frog = 12
> end
>
> I can change the code but I mention it because I bet others will be making the same mistake.
>
> “frog” (in the first example) is indented to the bol of parent. The parent is “then” (not if) and the “then” is on the 2nd line, not the first. So instead of indenting two spaces from the bol of the “if”, it is indented two spaces from the bol of the “then” which is the 45.

Yes, we’ve seen the same problem in other languages, like in bug#59686.

> All this to say that now that I’m getting deeper into this, I plan to rethink things. “Parent”, “grand parent”, “first-sibling”, etc are likely not going to be good for anchor points because simply adding parens makes a node one level deeper.

Where exactly would you insert the parentheses? If you add them around frog = 12, i.e., the following:

if 12 * 18 +
   45 - 19
  (frog = 12)
end

then we are now indenting the (frog = 12), not frog = 12, so we are still at the same level, and using grand-parent-bol would still work. And there is no need to search for the “if” node.

> I think what will be better in the case of the “if” will be to anchor to the closest “if”.
>
> I’m also coming to the conclusion that “parent-bol” would be better if it was (bol (parent)) so that (bol (grand-parent)) and (bol (ancestor "if")), etc could be easily done.

That’ll depend on how many combinations we end up needing. If we really only need (bol (parent)) and (bol (grand-parent)), simply adding them as parent-bol and grandparent-bol is better (for code complexity, for understanding, for documentation, etc).

Yuan
* Re: More Tree Sitter Questions / Problems.
From: Stefan Monnier @ 2022-12-14 23:53 UTC
To: Perry Smith; +Cc: emacs-devel

> You might be misunderstanding my concern (but I do appreciate all of
> your examples and thoughts).
>
> My concern is if Tree Sitter modes deviate too much from the old way,
> they may not catch on. Perhaps I should not worry about that.
> The old modes are not going anywhere so people can keep whichever
> they prefer. But, that is where my worry is coming from.

Makes sense. There's a delicate balance sometimes. Some indentation styles don't "make sense" if you look at them from the point of view of an AST, yet people like them, and in some cases like them enough that they'd rather use another tool for that. We'll need to come up with ways to accommodate them, but accommodating them still requires "making sense of those non-sensical styles". My examples and thoughts are just trying to help you make sense of the case you're bumping into.

> On a slightly different topic but only slightly, I discovered that my first draft also does this:
>
> if 12 * 18 +
>    45 - 19
>      frog = 12
> end
>
> Rather than this:
>
> if 12 * 18 +
>    45 - 19
>   frog = 12
> end
>
> I can change the code but I mention it because I bet others will be making the same mistake.
>
> “frog” (in the first example) is indented to the bol of parent.
> The parent is “then” (not if) and the “then” is on the 2nd line, not
> the first. So instead of indenting two spaces from the bol of the
> “if”, it is indented two spaces from the bol of the “then” which is
> the 45.

I don't see a "then" keyword above, so I'll assume that the "then" is implicit (can be taken as being represented by the newline or somesuch).

In SMIE this is usually handled so as to allow "frog" to be indented as follows: if 12 * 18 + 45 - 19 then frog = 12 end or if 12 * 18 + 45 - 19 then frog = 12 end or if 12 * 18 + 45 - 19 then frog = 12 end

IOW, "frog" here is *always* indented 2 columns deeper than "then". This is true even for the first case above, because in the first case above the *virtual* indentation of "then" is the column of "if" rather than the column where "then" is actually placed in the file. This is defined by specifying in the indentation rules how "then" is (virtually) indented when it is "hanging" (i.e. the last (and not only) token on a line).

I've found this idea of virtual indentation to be very convenient, arguably the most important idea behind SMIE's indentation rules. For example, it lets me say that "fn x => ..." is virtually indented like its parent when its parent is another "fn x =>", so that we get:

    fn x =>
    fn y =>
    fn z =>
      <BODY>

instead of

    fn x =>
      fn y =>
        fn z =>
          <BODY>

[ In the above example I presumed that "end" is defined to be aligned with "then" but of course, another valid choice would be to align it with "if". ]

Stefan
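The "hanging" rule described above can be illustrated with a toy, buffer-only model. It is not SMIE's implementation (SMIE exposes the idea through functions such as smie-rule-hanging-p and smie-rule-parent inside a mode's rules function); the two my- functions and their arguments are hypothetical, and the model is deliberately non-recursive, so it does not cover chains like the "fn x =>" case, where the parent's own virtual indentation would have to be looked up in turn.

```elisp
(defun my-hanging-p (beg end)
  "Non-nil if the token spanning BEG..END is hanging.
Hanging means it is the last token on its line but not the first."
  (save-excursion
    (goto-char end)
    (skip-chars-forward " \t")
    (and (eolp)
         (progn (goto-char beg)
                (skip-chars-backward " \t")
                (not (bolp))))))

(defun my-virtual-column (beg end parent-pos)
  "Return the column that code indented relative to a token should use.
The token spans BEG..END and its parent construct starts at PARENT-POS.
A hanging token borrows the column of its parent; otherwise the
token's own column is used."
  (save-excursion
    (goto-char (if (my-hanging-p beg end) parent-pos beg))
    (current-column)))
```

Under this model, a body that is always placed two columns deeper than the virtual column of "then" ends up two columns deeper than "if" when "then" hangs at the end of the condition, and two columns deeper than "then" itself when "then" starts its own line.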
* Re: More Tree Sitter Questions / Problems.
From: Eli Zaretskii @ 2022-12-15 6:56 UTC
To: Perry Smith; +Cc: monnier, emacs-devel

> From: Perry Smith <pedz@easesoftware.com>
> Date: Wed, 14 Dec 2022 17:22:44 -0600
> Cc: emacs-devel <emacs-devel@gnu.org>
>
> My concern is if Tree Sitter modes deviate too much from the old way, they may not catch on. Perhaps I
> should not worry about that. The old modes are not going anywhere so people can keep which ever they
> prefer.

Indeed. Moreover, if you can easily allow both behaviors, add a user option to select one of them, and move on.

Emacs 29.1 will be the first release with these TS-based modes, and we will hopefully have a lot of user feedback to guide us in making these decisions in the future. We cannot reasonably hope to solve all those dilemmas now, so we must leave it for users to tell us their experiences and preferences.

Please also keep in mind that I'd like to put out the first pretest of Emacs 29.1 in about 2 months, so things should be becoming stable very soon.

Thanks.
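One possible shape for the "user option to select one of them" suggestion is sketched below. The option name, its values, and the offset variable are hypothetical and are not the options discussed elsewhere in this thread; the rule forms use the parent-is, first-sibling and parent-bol presets of treesit-simple-indent. The `simple' branch only approximates the old behavior, since a nested binary node may itself start on a continuation line.

```elisp
(defcustom my-ts-continuation-style 'ast
  "How to indent the continuation lines of a binary expression.
`ast' aligns a continuation line with the first operand of its
enclosing binary node; `simple' indents it by a fixed offset from
the indentation of the line on which that node starts."
  :type '(choice (const :tag "Follow the syntax tree" ast)
                 (const :tag "Fixed continuation offset" simple))
  :group 'languages)

(defvar my-ts-continuation-offset 4
  "Hypothetical offset used by the `simple' continuation style.")

(defun my-ts--binary-rules ()
  "Return the indent rules for binary expressions per the user option."
  (pcase my-ts-continuation-style
    ('simple '(((parent-is "binary") parent-bol my-ts-continuation-offset)))
    (_ '(((parent-is "binary") first-sibling 0)))))

;; A mode would splice the result into its rule list at setup time,
;; for example as the first entries of the `ruby' rules in
;; `treesit-simple-indent-rules'.
```

Computing the rules from a function keeps both styles in one place, at the cost that changing the option only takes effect when the mode is re-enabled in a buffer.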
* Re: More Tree Sitter Questions / Problems.
From: Dmitry Gutov @ 2022-12-26 16:28 UTC
To: Stefan Monnier, Perry Smith; +Cc: emacs-devel

On 14/12/2022 23:15, Stefan Monnier wrote:
> In the context of sh-mode, I've had requests to provide that kind of
> "AST-oblivious" indentation. The result is controlled by
> `sh-indent-after-continuation`.

*-after-continuation is a cool naming example, would be nice to be able to reuse it.

But in the case of Ruby it seems like we're choosing between AST-based indentation (where the indentation offset can reach arbitrary nesting after several continuation-like breaks) and "continuation"-based indentation, where we basically have 0 or 1 levels of nesting, unless certain subexpressions are used (blocks or parentheses, mostly).

OTOH, ruby-indent-simplified (the name of the option I used in https://debbugs.gnu.org/60186), is even less semantic.
* Re: More Tree Sitter Questions / Problems.
From: Yuri Khan @ 2022-12-15 6:05 UTC
To: Perry Smith; +Cc: emacs-devel

On Thu, 15 Dec 2022 at 03:44, Perry Smith <pedz@easesoftware.com> wrote:

> All three of ruby-mode, java-mode, and c-mode indent a simple arithmetic expression broken into lines like this:
>
> foodog = 12 + 4 *
>     18 * 99 + 8

My opinion is that this is the wrong problem to solve. Assuming the example is simplified and there is in fact reason to break lines, it should be variable = 12 + 4 * 18 * 99 + 8 or variable = 12 + 4 * 18 * 99 + 8 or maybe variable = 12 + 4 * 18 * 99 + 8 i.e. prefer breaking lines higher in the AST.

> All three of ruby-ts-mode, java-ts-mode, and c-ts-mode indent it like this:
>
> variable = 12 + 4 *
>                 18 * 99 + 8

Also, this style only makes sense if tab characters are not used. Otherwise, the alignment will break for users whose tab width differs. (Emacs’s source code suffers from this a lot.)
* Re: More Tree Sitter Questions / Problems.
From: Dmitry Gutov @ 2022-12-26 16:24 UTC
To: Perry Smith, emacs-devel

Hi Perry,

On 14/12/2022 22:43, Perry Smith wrote:
> All three of ruby-mode, java-mode, and c-mode indent a simple arithmetic
> expression broken into lines like this:
>
> foodog = 12 + 4 *
>     18 * 99 + 8
>
> I think this is the Java sample which has the indent set to 4. I’ll
> call this “the old way”.
>
> All three of ruby-ts-mode, java-ts-mode, and c-ts-mode indent it like this:
>
> variable = 12 + 4 *
>                 18 * 99 + 8

First of all, this is IMO not too terrible, as it shows the user how the code is parsed, which can be helpful on different occasions.

And as apparent from practice, people who don't like this behavior (e.g. Rubocop frowns on it) will just break the line after "=". Seems like the different indentation behaviors across different editors (when the team uses several ones) converge on this practice anyway.

> In Ruby’s case, this rule is doing it:
>
> ((parent-is "binary") first-sibling 0)
>
> If I comment that rule out, then no rule hits and so there is no indent
> (the line is left unchanged no matter how it is indented).

As luck would have it, I'm finishing work on a feature request for ruby-mode for a user option which switches indentation from this sort of AST-aware to simpler continuations. See the latest patch in debbugs#60186.

Further, have you looked into supporting/being more-or-less compatible with existing indentation-related options in ruby-mode? At least those that affect SMIE. It should be helpful to synchronize, both to provide a straightforward upgrade path for existing users, and to ensure good experience for those who don't build tree-sitter yet, and for users of older Emacs.