* How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Björn Lindqvist @ 2024-11-27 23:27 UTC
To: Emacs developers

Hello Emacs developers!

I've been trying to get c-ts-mode to indent like I want, but I'm
running into problems related to preprocessor directives. For
example, consider a type definition nested in two #ifdefs:

#ifdef X
#ifdef Y
typedef int foo;
#endif
#endif

Since both the parent and grandparent of the type_definition are
preproc_ifdef nodes, no rule matches. Another issue is that I want my
preprocessor directives kept at column 0, which unfortunately screws
up all rules that refer to the parent. E.g.:

((parent-is "if_statement") standalone-parent 4)

Doesn't work for

int main() {
    if (true)
#ifdef A
        prutt();
#else
        fis();
#endif
}

The rule I'd like to express is "take the indent of the closest
*indenting* parent and add one indent". That rule would match whether
that parent is a "while_statement", "if_statement", "for_statement",
etc. You can't express such rules with tree-sitter, can you?

Btw, I get that tree-sitter can't handle *all* weird preprocessor
constructs you can create, but my examples are really common and
appear in most C code bases.

--
mvh/best regards Björn Lindqvist
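To see which parent types a rule would have to match, it can help to list the node types from point up to the root. The following is a minimal sketch, assuming Emacs 29 or later built with tree-sitter and a C parser active in the buffer; `my-treesit-ancestor-types' is a hypothetical helper name, not part of Emacs.

```elisp
(require 'treesit)

;; Hypothetical helper (not part of Emacs): collect node types from
;; the smallest node at POS up to the root of the parse tree.
(defun my-treesit-ancestor-types (&optional pos)
  "Return the node types from the leaf at POS (default: point) to the root."
  (let ((node (treesit-node-at (or pos (point))))
        (types nil))
    (while node
      (push (treesit-node-type node) types)
      (setq node (treesit-node-parent node)))
    ;; Leaf first, root last.
    (nreverse types)))
```

Evaluating (my-treesit-ancestor-types) with point on the typedef line of the first example should show the preproc_ifdef ancestors described above. M-x treesit-explore-mode displays the same tree interactively.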
* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Eli Zaretskii @ 2024-11-28 7:30 UTC
To: Björn Lindqvist, Yuan Fu; +Cc: emacs-devel

> From: Björn Lindqvist <bjourne@gmail.com>
> Date: Thu, 28 Nov 2024 00:27:17 +0100
>
> I've been trying to get c-ts-mode to indent like I want, but I'm
> running into problems related to preprocessor directives.

Preprocessor directives are difficult because the tree-sitter C/C++
grammars include only partial support for them.

> For
> example, consider a type definition nested in two #ifdefs:
>
> #ifdef X
> #ifdef Y
> typedef int foo;
> #endif
> #endif
>
> Since both the parent and grandparent of the type_definition are
> preproc_ifdef nodes, no rule matches.

But if you go back (up) the parent-child hierarchy, you will
eventually find a node which is not a preproc_SOMETHING, and can go
from there, no?

> Another issue is that I want my
> preprocessor directives kept at column 0, which unfortunately screws
> up all rules that refer to the parent. E.g.:
>
> ((parent-is "if_statement") standalone-parent 4)
>
> Doesn't work for
>
> int main() {
>     if (true)
> #ifdef A
>         prutt();
> #else
>         fis();
> #endif
> }
>
> The rule I'd like to express is "take the indent of the closest
> *indenting* parent and add one indent". That rule would match whether
> that parent is a "while_statement", "if_statement", "for_statement",
> etc. You can't express such rules with tree-sitter, can you?

Not sure, but Yuan will know.
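The idea of walking up the hierarchy until a non-preprocessor node is found can be sketched as an anchor function for `treesit-simple-indent-rules'. This is only an illustration under stated assumptions: the `my-c--' names are hypothetical, preprocessor node types are assumed to start with "preproc", and the offset that should accompany the anchor depends on which ancestor is found.

```elisp
(require 'treesit)

;; Assumption: preprocessor node types in the C grammar all start
;; with "preproc" (e.g. "preproc_ifdef", "preproc_else").
(defun my-c--preproc-type-p (type)
  "Return non-nil if TYPE is a string naming a preprocessor node."
  (and (stringp type) (string-prefix-p "preproc" type)))

;; Hypothetical anchor function.  Indent anchors are called with
;; NODE, PARENT and BOL and must return a buffer position.
(defun my-c--skip-preproc-anchor (_node parent &rest _)
  "Return the indentation position of the closest non-preproc ancestor.
The search starts at PARENT.  Return nil if there is no such ancestor."
  (let ((ancestor parent))
    ;; Climb while the current ancestor is a preprocessor node.
    (while (and ancestor
                (my-c--preproc-type-p (treesit-node-type ancestor)))
      (setq ancestor (treesit-node-parent ancestor)))
    (when ancestor
      (save-excursion
        (goto-char (treesit-node-start ancestor))
        (back-to-indentation)
        (point)))))

;; Possible use (a sketch; an offset of 0 is only right when the
;; ancestor found is the top-level translation_unit):
;;   ((parent-is "preproc") my-c--skip-preproc-anchor 0)
```

For the nested #ifdef example the climb would be expected to end at translation_unit, which puts the typedef at column 0. Inside a function body a non-zero offset would be needed instead.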
* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Yuan Fu @ 2024-11-28 10:03 UTC
To: Eli Zaretskii; +Cc: Björn Lindqvist, Emacs Devel

> On Nov 27, 2024, at 11:30 PM, Eli Zaretskii <eliz@gnu.org> wrote:
>
>> From: Björn Lindqvist <bjourne@gmail.com>
>> Date: Thu, 28 Nov 2024 00:27:17 +0100
>>
>> [...]
>>
>> Another issue is that I want my
>> preprocessor directives kept at column 0, which unfortunately screws
>> up all rules that refer to the parent. E.g.:
>>
>> ((parent-is "if_statement") standalone-parent 4)
>>
>> Doesn't work for
>>
>> int main() {
>>     if (true)
>> #ifdef A
>>         prutt();
>> #else
>>         fis();
>> #endif
>> }
>>
>> The rule I'd like to express is "take the indent of the closest
>> *indenting* parent and add one indent". That rule would match whether
>> that parent is a "while_statement", "if_statement", "for_statement",
>> etc. You can't express such rules with tree-sitter, can you?
>
> Not sure, but Yuan will know.

Everything is possible, it’s just elisp. The only problem is how
generic you can make the rule.

Here’s a POC that only works for this example; specifically, it only
works for if statements and #ifdef directives. It should be extensible
to for statements, while statements, etc., and maybe other directives
too.

Speaking of indent, we need to do something with c-ts-mode’s
indentation rules. They’re getting too long and too complex. But I
don’t have any great ideas at this point. Maybe we can replace the
rules with a hand-rolled function so it has more structure, or try
nvim’s query approach.

Yuan

[-- Attachment #2: preproc-indent.patch --]

From 25de026b3eb32e7457270cd199fe0902876a2715 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Thu, 28 Nov 2024 01:51:44 -0800
Subject: [PATCH] Preproc indent POC

---
 lisp/progmodes/c-ts-mode.el | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index c815ee35501..313dcfb5c05 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -435,6 +435,24 @@ c-ts-mode--indent-styles
            ((parent-is "labeled_statement")
             c-ts-mode--standalone-grandparent c-ts-mode-indent-offset)
 
+           ,(let (anchor)
+              (list
+               (lambda (_node parent &rest _)
+                 (let ((anchor-node
+                        (cond
+                         ((treesit-node-match-p parent "preproc_ifdef")
+                          (treesit-node-prev-sibling parent))
+                         ((treesit-node-match-p parent "preproc_else")
+                          (treesit-node-prev-sibling
+                           (treesit-node-parent parent))))))
+                   (when anchor-node
+                     (setq anchor (treesit-node-start anchor-node))
+                     ;; If parent is preproc and previous sibling is
+                     ;; if_statement, set anchor and return t.
+                     (treesit-node-match-p anchor-node "if_statement"))))
+               (lambda (&rest _) anchor)
+               c-ts-mode-indent-offset))
+
            ;; Preproc directives
            ((node-is "preproc") column-0 0)
            ((node-is "#endif") column-0 0)
-- 
2.39.5 (Apple Git-151)
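One way the POC might be extended to other statement types is sketched below. It is not part of the patch: every `my-c-' name is hypothetical, and it assumes (untested) that the grammar places an #ifdef after a while or for header as a following sibling, the same way it does for if. The node before the preprocessor block is computed separately in the matcher and in the anchor, so no state is shared between the two functions.

```elisp
(require 'treesit)

;; Assumption: these are the statement types after which a
;; preprocessor block should be indented like a statement body.
(defvar my-c-indenting-statement-regexp
  (rx bos (or "if_statement" "while_statement" "for_statement") eos)
  "Regexp matching node types treated as indenting statements.")

(defun my-c--indenting-statement-type-p (type)
  "Return t if TYPE is a string naming an indenting statement."
  (and (stringp type)
       (string-match-p my-c-indenting-statement-regexp type)
       t))

(defun my-c--node-before-preproc (parent)
  "Return the sibling that precedes the #ifdef block containing PARENT.
Return nil unless PARENT is a preproc_ifdef or preproc_else node."
  (pcase (treesit-node-type parent)
    ("preproc_ifdef" (treesit-node-prev-sibling parent))
    ("preproc_else" (treesit-node-prev-sibling
                     (treesit-node-parent parent)))))

(defun my-c--preproc-after-statement-p (_node parent &rest _)
  "Matcher: non-nil if PARENT's #ifdef block follows an indenting statement."
  (my-c--indenting-statement-type-p
   (treesit-node-type (my-c--node-before-preproc parent))))

(defun my-c--preproc-after-statement-anchor (_node parent &rest _)
  "Anchor: start of the statement preceding PARENT's #ifdef block."
  (treesit-node-start (my-c--node-before-preproc parent)))

;; Possible rule, to be placed before the generic preproc rules:
;;   (my-c--preproc-after-statement-p
;;    my-c--preproc-after-statement-anchor
;;    c-ts-mode-indent-offset)
```

Recomputing the preceding node twice costs a little more than caching it in a closure, but it keeps the matcher and the anchor independent of each other.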
* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Filippo Argiolas @ 2024-11-28 18:30 UTC
To: Eli Zaretskii, Björn Lindqvist, Yuan Fu; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Björn Lindqvist <bjourne@gmail.com>
>> Date: Thu, 28 Nov 2024 00:27:17 +0100
>>
>> I've been trying to get c-ts-mode to indent like I want, but I'm
>> running into problems related to preprocessor directives.
>
> Preprocessor directives are difficult because the tree-sitter C/C++
> grammars include only partial support for them.
>
>> For
>> example, consider a type definition nested in two #ifdefs:
>>
>> #ifdef X
>> #ifdef Y
>> typedef int foo;
>> #endif
>> #endif
>>
>> Since both the parent and grandparent of the type_definition are
>> preproc_ifdef nodes, no rule matches.
>
> But if you go back (up) the parent-child hierarchy, you will
> eventually find a node which is not a preproc_SOMETHING, and can go
> from there, no?

I believe we might have a bug here. As far as I can tell, it does not
match

((n-p-gp nil "preproc" "translation_unit") column-0 0)

because both the parent and the grandparent are preproc nodes. So it
matches one of the `c-ts-mode--standalone-parent-skip-preproc' rules
right after.

After skipping preproc nodes, the parent is translation_unit and it
indents by one offset from there. I guess this step could be made
smarter by checking for translation_unit, and then the rule above
could be removed?

>> Another issue is that I want my
>> preprocessor directives kept at column 0, which unfortunately screws
>> up all rules that refer to the parent. E.g.:
>>
>> ((parent-is "if_statement") standalone-parent 4)
>>
>> Doesn't work for
>>
>> int main() {
>>     if (true)
>> #ifdef A
>>         prutt();
>> #else
>>         fis();
>> #endif
>> }
>>
>> The rule I'd like to express is "take the indent of the closest
>> *indenting* parent and add one indent". That rule would match whether
>> that parent is a "while_statement", "if_statement", "for_statement",
>> etc. You can't express such rules with tree-sitter, can you?
>
> Not sure, but Yuan will know.

This can be worked around as Yuan showed, but isn't it a grammar bug?
The problem is that with the #ifdef, the function call and the if
statement become siblings, whereas without the preprocessor directive
they have a child-parent relation.

In my experience c-ts-mode is a bit fragile with preprocessor
statements, probably because the grammar itself is fragile (see
e.g. [1]) and the problem is a hard one.

Yuan, do you think c-ts-mode could in some way benefit from LSP
knowledge about inactive preprocessor branches? The idea is that we
would at least have a good syntax tree in the active branches while
allowing some errors in the inactive ones.

Filippo

1. https://github.com/tree-sitter/tree-sitter-c/issues/108
* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Yuan Fu @ 2024-12-01 6:18 UTC
To: Filippo Argiolas; +Cc: Eli Zaretskii, Björn Lindqvist, emacs-devel

> On Nov 28, 2024, at 10:30 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
>
> [...]
>
> This can be worked around as Yuan showed, but isn't it a grammar bug?
> The problem is that with the #ifdef, the function call and the if
> statement become siblings, whereas without the preprocessor directive
> they have a child-parent relation.
>
> In my experience c-ts-mode is a bit fragile with preprocessor
> statements, probably because the grammar itself is fragile (see
> e.g. [1]) and the problem is a hard one.

Right.

> Yuan, do you think c-ts-mode could in some way benefit from LSP
> knowledge about inactive preprocessor branches? The idea is that we
> would at least have a good syntax tree in the active branches while
> allowing some errors in the inactive ones.

Maybe. Technically you can create a parser and set its range to
include only the active branches. But making it work end-to-end would
require some major effort. I’m not sure it’s worth it (in terms of
code complexity and maintenance cost).

Yuan
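The "parser restricted to the active branches" idea could look roughly like the sketch below. It assumes the inactive regions are already available as a sorted, non-overlapping list of (START . END) buffer positions, for example as reported by a language server; how to obtain them is outside this sketch, and the `my-c-' names are hypothetical. Only `treesit-parser-create' and `treesit-parser-set-included-ranges' are real Emacs functions here.

```elisp
(require 'treesit)

(defun my-c--active-ranges (beg end inactive)
  "Return the parts of BEG..END not covered by INACTIVE.
INACTIVE is a sorted, non-overlapping list of (START . END) cons
cells.  The result has the same form."
  (let ((pos beg)
        (active nil))
    (dolist (range inactive)
      ;; Keep the gap between the previous inactive region and this one.
      (when (< pos (car range))
        (push (cons pos (car range)) active))
      (setq pos (max pos (cdr range))))
    ;; Keep whatever follows the last inactive region.
    (when (< pos end)
      (push (cons pos end) active))
    (nreverse active)))

(defun my-c-make-active-branch-parser (inactive)
  "Create a separate C parser that only sees text outside INACTIVE.
INACTIVE has the same form as in `my-c--active-ranges'."
  ;; NO-REUSE is t so the buffer's main parser is left untouched.
  (let ((parser (treesit-parser-create 'c nil t)))
    (treesit-parser-set-included-ranges
     parser
     (my-c--active-ranges (point-min) (point-max) inactive))
    parser))
```

This only covers creating the parser. Keeping the ranges up to date as the buffer changes, and making indentation, font-lock and imenu consult the restricted parser, is the end-to-end effort mentioned above.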
* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Filippo Argiolas @ 2024-12-01 8:36 UTC
To: Yuan Fu; +Cc: Eli Zaretskii, Björn Lindqvist, emacs-devel

Yuan Fu <casouri@gmail.com> writes:

>> On Nov 28, 2024, at 10:30 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
>>
>> [...]
>>
>> Yuan, do you think c-ts-mode could in some way benefit from LSP
>> knowledge about inactive preprocessor branches? The idea is that we
>> would at least have a good syntax tree in the active branches while
>> allowing some errors in the inactive ones.
>
> Maybe. Technically you can create a parser and set its range to
> include only the active branches. But making it work end-to-end would
> require some major effort. I’m not sure it’s worth it (in terms of
> code complexity and maintenance cost).

Interesting, maybe I'll experiment a bit with it and see where it
goes. I agree that it already sounds like overkill for little gain.

My major annoyance, more than indentation, is when preprocessor
statements break function detection and imenu/breadcrumb. I have one
offending file of this kind at work which unfortunately I cannot
share. I will try to extract a test case that reproduces the issue
and open a bug. Maybe it can be worked around in some way from
c-ts-mode.

Filippo
* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Yuan Fu @ 2024-12-01 9:32 UTC
To: Filippo Argiolas; +Cc: Eli Zaretskii, Björn Lindqvist, emacs-devel

> On Dec 1, 2024, at 12:36 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
>
> [...]
>
> My major annoyance, more than indentation, is when preprocessor
> statements break function detection and imenu/breadcrumb. I have one
> offending file of this kind at work which unfortunately I cannot
> share. I will try to extract a test case that reproduces the issue
> and open a bug. Maybe it can be worked around in some way from
> c-ts-mode.

I share the frustration. Tree-sitter for C could’ve been so much
better if it weren’t for the preprocessor and macros. IME, whether it
can be worked around depends on the specific code. Some code just
generates a parse tree that’s hard to recover.

Yuan