From: Yuan Fu <casouri@gmail.com>
To: Filippo Argiolas <filippo.argiolas@gmail.com>
Cc: "Eli Zaretskii" <eliz@gnu.org>,
"Björn Lindqvist" <bjourne@gmail.com>,
emacs-devel@gnu.org
Subject: Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
Date: Sun, 1 Dec 2024 01:32:20 -0800
Message-ID: <FE856458-E3F4-4306-B882-602CFE9BA586@gmail.com>
In-Reply-To: <m2jzcj7qb1.fsf@gmail.com>
> On Dec 1, 2024, at 12:36 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>> On Nov 28, 2024, at 10:30 AM, Filippo Argiolas <filippo.argiolas@gmail.com> wrote:
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>>> From: Björn Lindqvist <bjourne@gmail.com>
>>>>> Date: Thu, 28 Nov 2024 00:27:17 +0100
>>>>>
>>>>> I've been trying to get c-ts-mode to indent like I want, but I'm
>>>>> running into problems related to preprocessor directives.
>>>>
>>>> Preprocessor directives are difficult because the tree-sitter C/C++
>>>> grammars include only partial support for them.
>>>>
>>>>> For
>>>>> example, consider a type definition nested in two #ifdefs:
>>>>>
>>>>> #ifdef X
>>>>> #ifdef Y
>>>>> typedef int foo;
>>>>> #endif
>>>>> #endif
>>>>>
>>>>> Since both the parent and the grandparent of the type_definition are
>>>>> preproc_ifdef nodes, no rule matches.
>>>>
>>>> But if you go back (up) the parent-child hierarchy, you will
>>>> eventually find a node which is not a preproc_SOMETHING, and can go
>>>> from there, no?
>>>>
>>>
>>> I believe we might have a bug here: as far as I can tell, it does not
>>> match
>>>
>>> ((n-p-gp nil "preproc" "translation_unit") column-0 0)
>>>
>>> because both the parent and the grandparent are preproc nodes, so it
>>> matches one of the `c-ts-mode--standalone-parent-skip-preproc' rules
>>> right after.
>>>
>>> After skipping the preproc nodes, the parent is translation_unit, so it
>>> indents by an offset from there. I guess this step could be made smarter
>>> by checking for translation_unit, and then the rule above could be removed?
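A minimal sketch of the "skip preprocessor ancestors, then check for translation_unit" idea. It is written in C over a hypothetical `struct node` (type string, parent pointer, column), not the tree-sitter or Emacs treesit API, and it is not how `c-ts-mode--standalone-parent-skip-preproc' is actually implemented; all names here are illustrative.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a syntax-tree node; this is NOT the
   tree-sitter or Emacs treesit API. */
struct node {
    const char *type;          /* grammar node type, e.g. "preproc_ifdef" */
    const struct node *parent; /* NULL for the root */
    int column;                /* indentation column of the node's first line */
};

/* True for any node type that starts with "preproc". */
static int is_preproc(const struct node *n)
{
    return strncmp(n->type, "preproc", 7) == 0;
}

/* Skip every preproc_* ancestor of NODE and return the first ancestor
   that is not a preprocessor node, or NULL if there is none. */
static const struct node *first_non_preproc_ancestor(const struct node *node)
{
    const struct node *p = node->parent;
    while (p != NULL && is_preproc(p))
        p = p->parent;
    return p;
}

/* Column 0 when the first non-preproc ancestor is the translation_unit
   (top-level code, however many #ifdefs enclose it); otherwise that
   ancestor's column plus OFFSET. */
static int indent_skipping_preproc(const struct node *node, int offset)
{
    assert(node != NULL);
    const struct node *anchor = first_non_preproc_ancestor(node);
    if (anchor == NULL || strcmp(anchor->type, "translation_unit") == 0)
        return 0;
    return anchor->column + offset;
}
```

For the nested #ifdef X / #ifdef Y example, the typedef's first non-preproc ancestor is the translation_unit, so this logic puts it in column 0 regardless of nesting depth, which is the case the separate n-p-gp rule was trying to cover.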
>>>
>>>>> Another issue is that I want my
>>>>> preprocessor directives kept at column 0, which unfortunately screws
>>>>> up all rules that refer to the parent. E.g.:
>>>>>
>>>>> ((parent-is "if_statement") standalone-parent 4)
>>>>>
>>>>> Doesn't work for
>>>>>
>>>>> int main() {
>>>>> if (true)
>>>>> #ifdef A
>>>>> prutt();
>>>>> #else
>>>>> fis();
>>>>> #endif
>>>>> }
>>>>>
>>>>> The rule I'd like to express is "take the indent of the closest
>>>>> *indenting* parent and add one indent". That rule would match whether
>>>>> that parent is a "while_statement", "if_statement", "for_statement",
>>>>> etc. You can't express such rules with tree-sitter, can you?
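One way to state the "closest indenting parent" rule, again as a C sketch over a hypothetical `struct node`. The list of indenting node types is an assumption based on the examples in the question, and the sketch assumes the preprocessor node is nested inside the if_statement; as discussed in this thread, the C grammar does not always produce that shape.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a syntax-tree node (not a real API). */
struct node {
    const char *type;          /* grammar node type */
    const struct node *parent; /* NULL for the root */
    int column;                /* indentation column of the node's first line */
};

/* Assumed set of node types that open a new indentation level. */
static const char *const indenting_types[] = {
    "if_statement", "while_statement", "for_statement", "do_statement",
};

static int is_indenting(const struct node *n)
{
    for (size_t i = 0; i < sizeof indenting_types / sizeof indenting_types[0]; i++)
        if (strcmp(n->type, indenting_types[i]) == 0)
            return 1;
    return 0;
}

/* Indent NODE one OFFSET past its closest indenting ancestor, walking
   past preproc nodes (and any other non-indenting node) on the way.
   Returns -1 when no indenting ancestor exists. */
static int indent_from_indenting_ancestor(const struct node *node, int offset)
{
    assert(node != NULL);
    for (const struct node *p = node->parent; p != NULL; p = p->parent)
        if (is_indenting(p))
            return p->column + offset;
    return -1;
}
```

Returning -1 lets a caller fall back to another rule when no loop or conditional encloses the node. Because the walk skips every non-indenting node, not just preprocessor ones, it also looks through braced blocks; whether that is desirable depends on the brace style.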
>>>>
>>>> Not sure, but Yuan will know.
>>>
>>> This can be worked around as Yuan showed, but isn't it a grammar bug?
>>> The problem is that with the #ifdef, the function call and the if
>>> statement become siblings; without the preprocessor directive they have
>>> a parent-child relation.
>>>
>>> In my experience c-ts-mode is a bit fragile with preprocessor
>>> statements, probably because the grammar itself is fragile (see
>>> e.g. [1]) and the problem is a hard one.
>>
>> Right.
>>
>>> Yuan, do you think c-ts-mode could somehow benefit from LSP knowledge
>>> about inactive preprocessor branches? The idea is that we would at least
>>> have a good syntax tree in the active branches while allowing some
>>> errors in the inactive ones.
>>
>> Maybe. Technically you can create a parser and set its ranges to include only the active branches. But making it work end-to-end would require major effort. I’m not sure if it’s worth it (in terms of code complexity and maintenance cost).
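A rough sketch of the first half of that idea: computing which parts of a file lie in active preprocessor branches, given a list of defined macros (which, in the scenario above, would hypothetically come from the LSP server). It is plain C with no tree-sitter dependency, all names are illustrative, and only #ifdef, #ifndef, #else and #endif are really evaluated. The resulting byte offsets would still need converting to buffer positions before being handed to Emacs's `treesit-parser-set-included-ranges'.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

#define MAX_DEPTH 64

/* Half-open byte range [start, end) of the source that should be parsed. */
struct range {
    size_t start, end;
};

/* Is the LEN-byte macro NAME listed in the NULL-terminated DEFINED array? */
static int is_defined(const char *name, size_t len, const char *const *defined)
{
    for (; *defined != NULL; defined++)
        if (strlen(*defined) == len && strncmp(*defined, name, len) == 0)
            return 1;
    return 0;
}

/* Length of the C identifier starting at S. */
static size_t ident_len(const char *s)
{
    size_t n = 0;
    while (isalnum((unsigned char)s[n]) || s[n] == '_')
        n++;
    return n;
}

/* Collect the byte ranges of SRC that lie in active branches, given the
   macros in DEFINED.  Directive lines are always excluded.  Any #if other
   than #ifdef/#ifndef is assumed true, and #elif is ignored.  Adjacent
   active lines are merged; ranges beyond MAX_OUT are dropped.  Returns
   the number of ranges written to OUT. */
static size_t active_ranges(const char *src, const char *const *defined,
                            struct range *out, size_t max_out)
{
    int active[MAX_DEPTH + 1] = {1}; /* level 0 (top level) is active */
    int taken[MAX_DEPTH + 1] = {0};  /* whether each level's #if branch was taken */
    int depth = 0;
    size_t n = 0;

    for (const char *line = src; *line != '\0';) {
        const char *eol = strchr(line, '\n');
        const char *next = eol ? eol + 1 : line + strlen(line);
        const char *p = line;
        while (*p == ' ' || *p == '\t')
            p++;

        if (*p == '#') {
            p++;
            while (*p == ' ' || *p == '\t')
                p++;
            if (strncmp(p, "ifdef", 5) == 0 || strncmp(p, "ifndef", 6) == 0) {
                int negate = p[2] == 'n';
                const char *name = p + (negate ? 6 : 5);
                while (*name == ' ' || *name == '\t')
                    name++;
                assert(depth < MAX_DEPTH);
                depth++;
                taken[depth] = is_defined(name, ident_len(name), defined) != negate;
                active[depth] = active[depth - 1] && taken[depth];
            } else if (strncmp(p, "if", 2) == 0) {
                assert(depth < MAX_DEPTH);
                depth++;
                taken[depth] = 1; /* expression not evaluated */
                active[depth] = active[depth - 1];
            } else if (strncmp(p, "else", 4) == 0 && depth > 0) {
                active[depth] = active[depth - 1] && !taken[depth];
            } else if (strncmp(p, "endif", 5) == 0 && depth > 0) {
                depth--;
            }
        } else if (active[depth]) {
            size_t start = (size_t)(line - src), end = (size_t)(next - src);
            if (n > 0 && out[n - 1].end == start)
                out[n - 1].end = end; /* merge adjacent active lines */
            else if (n < max_out)
                out[n++] = (struct range){start, end};
        }
        line = next;
    }
    return n;
}
```

Excluding the directive lines themselves means the parser would only ever see ordinary C from one consistent configuration; the cost is that #if expressions would need a real evaluator (or the LSP's answer) to be handled properly.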
>
> Interesting, maybe I'll experiment a bit with it and see where it
> goes. I agree it already sounds like overkill for little gain.
>
> More than indentation, my major annoyance is when preprocessor statements
> break function detection and imenu/breadcrumb. I have one offending file
> of this kind at work, which unfortunately I cannot share. I will try to
> extract a test case that reproduces the issue and open a bug. Maybe it
> can be worked around somehow in c-ts-mode.

I share the frustration. Tree-sitter for C could’ve been so much better if it weren’t for the preprocessor and macros.
IME, whether it can be worked around depends on the specific code. Some code just generates a parse tree that’s hard to recover.
Yuan