From: Yuan Fu
Subject: Re: How does c-ts-mode, tree-sitter indentation, and preprocessor directives work?
Date: Sun, 1 Dec 2024 01:32:20 -0800

> On Dec 1, 2024, at 12:36 AM, Filippo Argiolas <filippo.argiolas@gmail.com>
> wrote:
>
> Yuan Fu <casouri@gmail.com> writes:
>
>>> On Nov 28, 2024, at 10:30 AM, Filippo Argiolas <filippo.argiolas@gmail.com>
>>> wrote:
>>>
>>> Eli Zaretskii <eliz@gnu.org> writes:
>>>
>>>>> From: Björn Lindqvist <bjourne@gmail.com>
>>>>> Date: Thu, 28 Nov 2024 00:27:17 +0100
>>>>>
>>>>> I've been trying to get c-ts-mode to indent like I want, but I'm
>>>>> running into problems related to preprocessor directives.
>>>>
>>>> Preprocessor directives are difficult because the tree-sitter C/C++
>>>> grammars include only partial support for them.
>>>>
>>>>> For example, consider a type definition nested in two #ifdefs:
>>>>>
>>>>> #ifdef X
>>>>> #ifdef Y
>>>>> typedef int foo;
>>>>> #endif
>>>>> #endif
>>>>>
>>>>> Since both the parent and grandparent of the type_definition are
>>>>> preproc_ifdef nodes, no rule matches.
>>>>
>>>> But if you go back (up) the parent-child hierarchy, you will
>>>> eventually find a node which is not a preproc_SOMETHING, and can go
>>>> from there, no?
>>>>
>>>
>>> I believe we might have a bug here: as far as I can tell, it does not
>>> match
>>>
>>> ((n-p-gp nil "preproc" "translation_unit") column-0 0)
>>>
>>> Because both parent and grand parent are preproc. So it matches one of
>>> the `c-ts-mode--standalone-parent-skip-preproc' rules right after.
>>>
>>> After skipping preproc nodes, the parent is translation_unit and it indents
>>> an offset from there. I guess this step could be made smarter to check for
>>> translation_unit, and the rule above could be removed?
>>>
>>>>> Another issue is that I want my
>>>>> preprocessor directives kept at column 0, which unfortunately screws
>>>>> up all rules that refer to the parent. E.g.:
>>>>>
>>>>> ((parent-is "if_statement") standalone-parent 4)
>>>>>
>>>>> Doesn't work for
>>>>>
>>>>> int main() {
>>>>>     if (true)
>>>>> #ifdef A
>>>>>         prutt();
>>>>> #else
>>>>>         fis();
>>>>> #endif
>>>>> }
>>>>>
>>>>> The rule I'd like to express is "take the indent of the closest
>>>>> *indenting* parent and add one indent". That rule would match whether
>>>>> that parent is a "while_statement", "if_statement", "for_statement",
>>>>> etc. You can't express such rules with tree-sitter, can you?
>>>>
>>>> Not sure, but Yuan will know.
>>>
>>> This can be worked around as Yuan showed, but isn't it a grammar bug?
>>> The problem is that with the #ifdef, the function call and the if
>>> statement become siblings; without the preprocessor directive they have
>>> a parent-child relation.
>>>
>>> In my experience c-ts-mode is a bit fragile with preprocessor
>>> statements, probably because the grammar itself is fragile (see
>>> e.g. [1]) and the problem is a hard one.
>>
>> Right.
>>
>>> Yuan, do you think c-ts-mode could somehow benefit from LSP knowledge
>>> about inactive preprocessor branches? The idea is that we would at least
>>> have a good syntax tree in the active branches while allowing some
>>> errors in the inactive ones.
>>
>> Maybe. Technically you can create a parser and set its range to include
>> only the active branches. But for it to work end-to-end would require
>> some major effort. I’m not sure if it’s worth it (in terms of code
>> complexity and maintenance cost).
>
> Interesting, maybe I'll experiment a bit with it and see where it
> goes. Agree that it already sounds overkill for little gain.
>
> My major annoyance, more than indentation, is when the preprocessor
> statements break function detection and imenu/breadcrumb. I have one
> offending file of this kind at work which unfortunately I cannot share.
> Will try to extract a test case that reproduces the issue and open a bug.
> Maybe it can be worked around somehow from c-ts-mode.

I share the frustration. Tree-sitter for C could’ve been so much better if it
weren’t for the preprocessor and macros.

IME, whether it can be worked around depends on the specific code. Some code
just generates a parse tree that’s hard to recover.
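
For reference, the parser-range idea quoted above could be sketched roughly as follows. This is only a sketch under assumptions: `my/c-ts-active-parser` is a made-up name, and the list of active ranges is assumed to come from somewhere else (for example, an LSP server reporting inactive regions), which is not shown here.

```elisp
(require 'treesit)

(defun my/c-ts-active-parser (active-ranges)
  "Return a new C parser for the current buffer limited to ACTIVE-RANGES.
ACTIVE-RANGES is a list of (BEG . END) buffer positions, in ascending
order and not overlapping.  How to compute them is left open here."
  ;; NO-REUSE is t so the buffer's existing C parser is left untouched.
  (let ((parser (treesit-parser-create 'c nil t)))
    ;; Text outside ACTIVE-RANGES is invisible to this parser.
    (treesit-parser-set-included-ranges parser active-ranges)
    parser))

;; Hypothetical usage: parse only two regions of the current buffer.
;; (treesit-parser-root-node
;;  (my/c-ts-active-parser '((1 . 120) (200 . 480))))
```

Creating the parser is the easy part. Wiring it into indentation, font-lock, and imenu, and keeping the ranges up to date as the buffer changes, is the end-to-end work that would take the major effort.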

Yuan