unofficial mirror of emacs-devel@gnu.org 
* How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Björn Lindqvist @ 2024-11-27 23:27 UTC
  To: Emacs developers

Hello Emacs developers!

I've been trying to get c-ts-mode to indent like I want, but I'm
running into problems related to preprocessor directives. For
example, consider a type definition nested in two #ifdefs:

    #ifdef X
    #ifdef Y
    typedef int foo;
    #endif
    #endif

Since both the parent and grandparent of the type_definition are
preproc_ifdef nodes, no rule matches. Another issue is that I want my
preprocessor directives kept at column 0, which unfortunately screws
up all rules that refer to the parent. E.g.:

    ((parent-is "if_statement") standalone-parent 4)

Doesn't work for

    int main() {
        if (true)
    #ifdef A
            prutt();
    #else
            fis();
    #endif
    }

The rule I'd like to express is "take the indent of the closest
*indenting* parent and add one indent". That rule would match whether
that parent is a "while_statement", "if_statement", "for_statement",
etc. You can't express such rules with tree-sitter, can you?

Btw, I get that tree-sitter can't handle *all* weird preprocessor
constructs you can create, but my examples are really common and
appear in most C code bases.


-- 
mvh/best regards Björn Lindqvist



* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Eli Zaretskii @ 2024-11-28  7:30 UTC
  To: Björn Lindqvist, Yuan Fu; +Cc: emacs-devel

> From: Björn Lindqvist <bjourne@gmail.com>
> Date: Thu, 28 Nov 2024 00:27:17 +0100
> 
> I've been trying to get c-ts-mode to indent like I want, but I'm
> running into problems related to preprocessor directives.

Preprocessor directives are difficult because the tree-sitter C/C++
grammars include only partial support for them.

> For
> example, consider a type definition nested in two #ifdefs:
> 
>     #ifdef X
>     #ifdef Y
>     typedef int foo;
>     #endif
>     #endif
> 
> Since both the parent and grandparent of the type_definition are
> preproc_ifdef nodes, no rule matches.

But if you go back (up) the parent-child hierarchy, you will
eventually find a node which is not a preproc_SOMETHING, and can go
from there, no?
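
A rough sketch of that upward walk, using `treesit-parent-until' from
treesit.el. The helper name is made up for illustration and is not part
of c-ts-mode; the only assumption about the grammar is that every
preprocessor node type starts with "preproc".

```elisp
(require 'treesit)

;; Hypothetical helper, not part of c-ts-mode.
(defun my-c-ts-first-non-preproc-ancestor (node)
  "Return the closest ancestor of NODE whose type does not start with \"preproc\".
Return nil if there is no such ancestor."
  (treesit-parent-until
   node
   (lambda (ancestor)
     ;; Keep climbing while the ancestor is some preproc_* node.
     (not (string-prefix-p "preproc" (treesit-node-type ancestor))))))
```

Inside an indent rule, NODE would be the type_definition from the
example, so for the nested #ifdef case the walk should skip both
preproc_ifdef nodes and stop at translation_unit.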

> Another issue is that I want my
> preprocessor directives kept at column 0, which unfortunately screws
> up all rules that refer to the parent. E.g.:
> 
>     ((parent-is "if_statement") standalone-parent 4)
> 
> Doesn't work for
> 
>     int main() {
>         if (true)
>     #ifdef A
>             prutt();
>     #else
>             fis();
>     #endif
>     }
> 
> The rule I'd like to express is "take the indent of the closest
> *indenting* parent and add one indent". That rule would match whether
> that parent is a "while_statement", "if_statement", "for_statement",
> etc. You can't express such rules with tree-sitter, can you?

Not sure, but Yuan will know.



* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Yuan Fu @ 2024-11-28 10:03 UTC
  To: Eli Zaretskii; +Cc: Björn Lindqvist, Emacs Devel



> On Nov 27, 2024, at 11:30 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> Another issue is that I want my
>> preprocessor directives kept at column 0, which unfortunately screws
>> up all rules that refer to the parent. E.g.:
>> 
>>    ((parent-is "if_statement") standalone-parent 4)
>> 
>> Doesn't work for
>> 
>>    int main() {
>>        if (true)
>>    #ifdef A
>>            prutt();
>>    #else
>>            fis();
>>    #endif
>>    }
>> 
>> The rule I'd like to express is "take the indent of the closest
>> *indenting* parent and add one indent". That rule would match whether
>> that parent is a "while_statement", "if_statement", "for_statement",
>> etc. You can't express such rules with tree-sitter, can you?
> 
> Not sure, but Yuan will know.

Everything is possible, it’s just elisp. The only problem is how generic you can make the rule. Here’s a POC that only works for this example; specifically, it only works for if statements and #ifdef directives. It should be extendable to for statements, while statements, etc., and maybe other directives too.
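
One way the POC's idea might be generalized, as an untested sketch. All `my-c-ts-' names are hypothetical, the list of statement types is only a guess at what counts as "indenting", and it relies on the same observation as the POC, namely that the controlling statement shows up as the previous sibling of the preproc node.

```elisp
(require 'treesit)

;; All `my-c-ts-' names are hypothetical; none of this exists in c-ts-mode.

(defvar my-c-ts-indenting-statement-regexp
  (rx bos (or "if_statement" "while_statement"
              "for_statement" "do_statement")
      eos)
  "Regexp matching node types that indent the statement they control.")

(defun my-c-ts--node-type-p (node regexp)
  "Return non-nil if NODE is non-nil and its type matches REGEXP."
  (and node (string-match-p regexp (treesit-node-type node))))

(defun my-c-ts--preproc-anchor-node (parent)
  "Return the indenting statement that the preproc node PARENT follows, or nil.
Climb out of #else/#elif branches to the enclosing #if/#ifdef, then
look at that conditional's previous sibling."
  (let ((cond-node parent))
    ;; preproc_else, preproc_elif, ... are children of the conditional.
    (while (my-c-ts--node-type-p cond-node (rx bos "preproc_el"))
      (setq cond-node (treesit-node-parent cond-node)))
    ;; preproc_if and preproc_ifdef both match this prefix.
    (when (my-c-ts--node-type-p cond-node (rx bos "preproc_if"))
      (let ((prev (treesit-node-prev-sibling cond-node)))
        (and (my-c-ts--node-type-p prev my-c-ts-indenting-statement-regexp)
             prev)))))

(defun my-c-ts--preproc-matcher (_node parent &rest _)
  "Matcher: non-nil when PARENT is a preproc branch after an indenting statement."
  (and (my-c-ts--preproc-anchor-node parent) t))

(defun my-c-ts--preproc-anchor (_node parent &rest _)
  "Anchor: start position of the statement found by the matcher."
  (treesit-node-start (my-c-ts--preproc-anchor-node parent)))

;; The rule in (MATCHER ANCHOR OFFSET) form.
(defvar my-c-ts-preproc-statement-rule
  '(my-c-ts--preproc-matcher my-c-ts--preproc-anchor c-ts-mode-indent-offset))
```

Unlike the POC, the matcher and the anchor each recompute the node instead of sharing a variable, which costs a second short walk but keeps both functions free of state. As in the patch, the rule would have to come before the generic preproc rules to take effect.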

Speaking of indentation, we need to do something about c-ts-mode’s indentation rules. They’re getting too long and too complex. But I don’t have any great ideas at this point. Maybe we can replace the rules with a hand-rolled function so they have more structure, or try nvim’s query approach.

Yuan



[-- Attachment #2: preproc-indent.patch --]
[-- Type: application/octet-stream, Size: 1695 bytes --]

From 25de026b3eb32e7457270cd199fe0902876a2715 Mon Sep 17 00:00:00 2001
From: Yuan Fu <casouri@gmail.com>
Date: Thu, 28 Nov 2024 01:51:44 -0800
Subject: [PATCH] Preproc indent POC

---
 lisp/progmodes/c-ts-mode.el | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/lisp/progmodes/c-ts-mode.el b/lisp/progmodes/c-ts-mode.el
index c815ee35501..313dcfb5c05 100644
--- a/lisp/progmodes/c-ts-mode.el
+++ b/lisp/progmodes/c-ts-mode.el
@@ -435,6 +435,24 @@ c-ts-mode--indent-styles
            ((parent-is "labeled_statement")
             c-ts-mode--standalone-grandparent c-ts-mode-indent-offset)
 
+           ,(let (anchor)
+              (list
+               (lambda (_node parent &rest _)
+                 (let ((anchor-node
+                        (cond
+                         ((treesit-node-match-p parent "preproc_ifdef")
+                          (treesit-node-prev-sibling parent))
+                         ((treesit-node-match-p parent "preproc_else")
+                          (treesit-node-prev-sibling
+                           (treesit-node-parent parent))))))
+                   (when anchor-node
+                     (setq anchor (treesit-node-start anchor-node))
+                     ;; If parent is preproc and previous sibling is
+                     ;; if_statement, set anchor and return t.
+                     (treesit-node-match-p anchor-node "if_statement"))))
+               (lambda (&rest _) anchor)
+               c-ts-mode-indent-offset))
+
            ;; Preproc directives
            ((node-is "preproc") column-0 0)
            ((node-is "#endif") column-0 0)
-- 
2.39.5 (Apple Git-151)


* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Filippo Argiolas @ 2024-11-28 18:30 UTC
  To: Eli Zaretskii, Björn Lindqvist, Yuan Fu; +Cc: emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Björn Lindqvist <bjourne@gmail.com>
>> Date: Thu, 28 Nov 2024 00:27:17 +0100
>> 
>> I've been trying to get c-ts-mode to indent like I want, but I'm
>> running into problems related to preprocessor directives.
>
> Preprocessor directives are difficult because the tree-sitter C/C++
> grammars include only partial support for them.
>
>> For
>> example, consider a type definition nested in two #ifdefs:
>> 
>>     #ifdef X
>>     #ifdef Y
>>     typedef int foo;
>>     #endif
>>     #endif
>> 
>> Since both the parent and grandparent of the type_definition are
>> preproc_ifdef nodes, no rule matches.
>
> But if you go back (up) the parent-child hierarchy, you will
> eventually find a node which is not a preproc_SOMETHING, and can go
> from there, no?
>

I believe we might have a bug here: as far as I can tell, it does not
match

  ((n-p-gp nil "preproc" "translation_unit") column-0 0)

because both the parent and grandparent are preproc nodes. So it matches
one of the `c-ts-mode--standalone-parent-skip-preproc' rules right after.

After skipping the preproc nodes, the parent is translation_unit, and the
line is indented by one offset from there. I guess this step could be made
smarter by checking for translation_unit, and then the rule above could be
removed?
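
A sketch of what such a check might look like as a custom matcher. The
function and variable names are hypothetical, and this is untested
against the current c-ts-mode rules; it only assumes that preprocessor
node types start with "preproc" and that the root is translation_unit.

```elisp
(require 'treesit)

;; Hypothetical matcher, not part of c-ts-mode.
(defun my-c-ts--top-level-in-preproc-p (_node parent &rest _)
  "Return non-nil if PARENT is a preproc node nested only in preproc nodes.
That is, every ancestor between the node and translation_unit is
some preproc_* node, however deep the nesting."
  (and parent
       (string-prefix-p "preproc" (treesit-node-type parent))
       (let ((outer (treesit-parent-until
                     parent
                     (lambda (n)
                       (not (string-prefix-p "preproc"
                                             (treesit-node-type n)))))))
         (and outer
              (equal (treesit-node-type outer) "translation_unit")))))

;; A rule that could stand in for the n-p-gp one, in
;; (MATCHER ANCHOR OFFSET) form with the `column-0' preset anchor.
(defvar my-c-ts-top-level-preproc-rule
  '(my-c-ts--top-level-in-preproc-p column-0 0))
```

Because the matcher walks past any number of preproc ancestors, it
would cover the doubly nested #ifdef example as well as the single
level that the n-p-gp rule handles.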

>> Another issue is that I want my
>> preprocessor directives kept at column 0, which unfortunately screws
>> up all rules that refer to the parent. E.g.:
>> 
>>     ((parent-is "if_statement") standalone-parent 4)
>> 
>> Doesn't work for
>> 
>>     int main() {
>>         if (true)
>>     #ifdef A
>>             prutt();
>>     #else
>>             fis();
>>     #endif
>>     }
>> 
>> The rule I'd like to express is "take the indent of the closest
>> *indenting* parent and add one indent". That rule would match whether
>> that parent is a "while_statement", "if_statement", "for_statement",
>> etc. You can't express such rules with tree-sitter, can you?
>
> Not sure, but Yuan will know.

This can be worked around as Yuan showed, but isn't it a grammar bug?
The problem is that with the #ifdef, the function call and the if
statement become siblings, whereas without the preprocessor directives
they have a child-parent relation.

In my experience c-ts-mode is a bit fragile with preprocessor
statements, probably because the grammar itself is fragile (see
e.g. [1]) and the problem is a hard one.

Yuan, do you think c-ts-mode could somehow benefit from LSP knowledge
about inactive preprocessor branches? The idea is that we would at
least have a good syntax tree in the active branches while allowing
some errors in the inactive ones.


Filippo


1. https://github.com/tree-sitter/tree-sitter-c/issues/108



* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Yuan Fu @ 2024-12-01  6:18 UTC
  To: Filippo Argiolas; +Cc: Eli Zaretskii, Björn Lindqvist, emacs-devel



> On Nov 28, 2024, at 10:30 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
> 
> This can be worked around as Yuan showed, but isn't it a grammar bug?
> The problem is that with the #ifdef, the function call and the if
> statement become siblings, whereas without the preprocessor directives
> they have a child-parent relation.
> 
> In my experience c-ts-mode is a bit fragile with preprocessor
> statements, probably because the grammar itself is fragile (see
> e.g. [1]) and the problem is a hard one.

Right.

> Yuan, do you think c-ts-mode could somehow benefit from LSP knowledge
> about inactive preprocessor branches? The idea is that we would at
> least have a good syntax tree in the active branches while allowing
> some errors in the inactive ones.

Maybe. Technically you can create a parser and set its range to include only the active branches. But making it work end-to-end would require some major effort. I’m not sure it’s worth it (in terms of code complexity and maintenance cost).
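
A minimal sketch of just the parser part, assuming the active ranges are already known. The function name is hypothetical, and where ACTIVE-RANGES comes from (for example, derived from what a language server reports as inactive code) is an assumption that is not shown here.

```elisp
(require 'treesit)

;; Hypothetical helper, not part of c-ts-mode.  ACTIVE-RANGES must be
;; supplied by the caller; computing it is outside this sketch.
(defun my-c-ts-make-active-branch-parser (active-ranges)
  "Create a C parser for the current buffer restricted to ACTIVE-RANGES.
ACTIVE-RANGES is a list of (BEG . END) buffer positions, in
ascending order and not overlapping."
  ;; Pass non-nil NO-REUSE so the buffer's existing parser is left alone.
  (let ((parser (treesit-parser-create 'c nil t)))
    (treesit-parser-set-included-ranges parser active-ranges)
    parser))
```

This covers only the easy step. The ranges would have to be recomputed and reapplied as the buffer changes, and indentation, font-lock and navigation would all need to consult this parser, which is where most of the effort would go.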

Yuan


* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Filippo Argiolas @ 2024-12-01  8:36 UTC
  To: Yuan Fu; +Cc: Eli Zaretskii, Björn Lindqvist, emacs-devel

Yuan Fu <casouri@gmail.com> writes:

> Maybe. Technically you can create a parser and set its range to include only the active branches. But making it work end-to-end would require some major effort. I’m not sure it’s worth it (in terms of code complexity and maintenance cost).

Interesting, maybe I'll experiment a bit with it and see where it
goes. I agree that it already sounds like overkill for little gain.

My major annoyance, more than indentation, is when preprocessor
statements break function detection and imenu/breadcrumb. I have one
offending file of this kind at work which unfortunately I cannot share.
I will try to extract a test case that reproduces the issue and open a
bug. Maybe it can be worked around somehow from c-ts-mode.


Filippo



* Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
From: Yuan Fu @ 2024-12-01  9:32 UTC
  To: Filippo Argiolas; +Cc: Eli Zaretskii, Björn Lindqvist, emacs-devel



> On Dec 1, 2024, at 12:36 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
> 
> Interesting, maybe I'll experiment a bit with it and see where it
> goes. I agree that it already sounds like overkill for little gain.
> 
> My major annoyance, more than indentation, is when preprocessor
> statements break function detection and imenu/breadcrumb. I have one
> offending file of this kind at work which unfortunately I cannot share.
> I will try to extract a test case that reproduces the issue and open a
> bug. Maybe it can be worked around somehow from c-ts-mode.

I share the frustration. Tree-sitter for C could’ve been so much better if it weren’t for the preprocessor and macros.

IME, whether it can be worked around depends on the specific code. Some code just generates a parse tree that’s hard to recover.

Yuan

