unofficial mirror of emacs-devel@gnu.org 
* tree-sitter: conceptional problem solvable at Emacs' level?
@ 2023-02-09  8:09 Holger Schurig
  2023-02-09  8:17 ` Po Lu
  2023-02-09 16:25 ` Ergus
  0 siblings, 2 replies; 26+ messages in thread
From: Holger Schurig @ 2023-02-09  8:09 UTC (permalink / raw)
  To: Emacs-devel

Hi, I have been running the emacs-29 branch for some time with great
success, and now I wanted to try out tree-sitter and c++-ts-mode.
Unfortunately, I stumbled over a conceptual problem and wonder whether
it is actually solvable by Emacs, or whether it would need a completely
new grammar.

The issue is: tree-sitter doesn't work well with C macros.

I program a lot in C++/Qt. So let's look at this (valid) C++ program:

-----------------------------------------------------------------------------
#include <QObject>

class Test : public QObject
{
        Q_OBJECT
public:
        Test() : QObject() {};
public slots:
        void someSlot() {};
};
-----------------------------------------------------------------------------

If you have the Qt libraries installed (e.g. qtbase5-dev on Debian),
this compiles without problems.

However, tree-sitter produces a garbage syntax tree:

- it contains a bitfield node (which isn't really there)
- it contains an error node (despite the code being compilable)

As a result, BOTH the indentation and the font-locking are wrong.


Would I need to create a tree-sitter grammar in JavaScript that
understands this macro-enhanced C++?   That would be quite difficult.
Or will there be a method to add some kind of tiny-preprocessor to
c++-ts-mode, so that it can substitute "Q_OBJECT", "signals" and "slots"
with nothing before handing things over to tree-sitter?


In comparison, I could teach the old cc-mode about this macro-enriched
C++ just with

  ;; Treat Qt's "signals:" and "... slots:" labels like access specifiers.
  (c-add-style "qt-gnu"
               '("gnu" (c-access-key .
                       "\\<\\(signals\\|public\\|protected\\|private\\|public slots\\|protected slots\\|private slots\\):")))
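
For readers who don't speak Emacs regexps, here is a rough C++ sketch of
what that pattern accepts. It is only an approximation:
`is_qt_access_label` is a made-up helper name, `\b` stands in for the
Emacs word-start anchor `\<`, and `std::regex_search` scans the whole
line, whereas CC Mode (as far as I understand) applies the pattern at
one specific position.

```cpp
#include <regex>
#include <string>

// Approximate ECMAScript translation of the c-access-key regexp above.
// \b replaces the Emacs word-start anchor \<; the trailing ':' is required,
// so a bare "public" (as in a base-class list) does not match.
bool is_qt_access_label(const std::string& line) {
    static const std::regex access_key(
        R"(\b(signals|public|protected|private|public slots|protected slots|private slots):)");
    return std::regex_search(line, access_key);
}
```

With this, `public slots:` and `signals:` count as access labels, while
an ordinary member declaration does not.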


I guess that a lot of C and C++ programs use macros. And if there is no
simple way to aid tree-sitter in understanding this, then I fear
tree-sitter enhanced modes will often be unusable on them.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09  8:09 tree-sitter: conceptional problem solvable at Emacs' level? Holger Schurig
@ 2023-02-09  8:17 ` Po Lu
  2023-02-09  8:50   ` Eli Zaretskii
  2023-02-10  7:33   ` Yuan Fu
  2023-02-09 16:25 ` Ergus
  1 sibling, 2 replies; 26+ messages in thread
From: Po Lu @ 2023-02-09  8:17 UTC (permalink / raw)
  To: Holger Schurig; +Cc: Emacs-devel

Holger Schurig <holgerschurig@gmail.com> writes:

> I guess that a lot of C and C++ programs use macros. And if there is no
> simple way to aid tree-sitter in understanding this, then I fear
> tree-sitter enhanced modes will often be unusable on them.

My suggestion is simply to stay with CC Mode.

Parsers (without a full C preprocessor inside) can only work for
languages like Python, which cannot be enhanced with syntax-modifying
macros.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09  8:17 ` Po Lu
@ 2023-02-09  8:50   ` Eli Zaretskii
  2023-02-09 10:13     ` Po Lu
  2023-02-10  7:33   ` Yuan Fu
  1 sibling, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2023-02-09  8:50 UTC (permalink / raw)
  To: Po Lu; +Cc: holgerschurig, Emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: Emacs-devel@gnu.org
> Date: Thu, 09 Feb 2023 16:17:27 +0800
> 
> My suggestion is simply to stay with CC Mode.

Suggestions for what to do for now aside, I would still want us to try
to figure out the possibilities for better handling of C/C++ macros in
tree-sitter supported modes.  I don't want to give up yet, because the
kludges similar to c-add-style used by CC mode might be possible with
tree-sitter modes as well.  Or maybe some other solution could work,
including the idea of letting tree-sitter see preprocessed source code
(although this is probably harder to implement, and must be done on
the C level).

We just started using these modes in Emacs, so it is small wonder that
issues like this are popping up, and will probably keep popping up for
some time to come.  I see no reason whatsoever to give up on
tree-sitter just because these minor problems in marginal cases are
brought up; we should instead solve them one by one.  Being minor
problems, they in no way invalidate the basic decision to try using
tree-sitter in Emacs, not from where I stand.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09  8:50   ` Eli Zaretskii
@ 2023-02-09 10:13     ` Po Lu
  2023-02-09 10:55       ` Eli Zaretskii
  0 siblings, 1 reply; 26+ messages in thread
From: Po Lu @ 2023-02-09 10:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: holgerschurig, Emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Suggestions for what to do for now aside, I would still want us to try
> to figure out the possibilities for better handling of C/C++ macros in
> tree-sitter supported modes.  I don't want to give up yet, because the
> kludges similar to c-add-style used by CC mode might be possible with
> tree-sitter modes as well.  Or maybe some other solution could work,
> including the idea of letting tree-sitter see preprocessed source code
> (although this is probably harder to implement, and must be done on
> the C level).

I don't oppose us trying, I just see problems with trying to parse C
macros with anything other than guesswork whilst maintaining reasonable
speed.

Preprocessing C can be very slow, and might not be easy to set up to run
on each edit, which would be necessary to let tree-sitter see
preprocessed source code.

And once you start trying to do that, you have to determine how to run
whatever C preprocessor is supposed to preprocess a file.  Even language
servers, which are typically entire C compilers, fail to understand some
kinds of macros not written with them in mind.  I guess one example
would be the very compiler specific ``macro''-like keywords used by
various compilers to designate long and short interrupt service
routines, or Intel's ``near'' and ``far'' keywords.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09 10:13     ` Po Lu
@ 2023-02-09 10:55       ` Eli Zaretskii
  0 siblings, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2023-02-09 10:55 UTC (permalink / raw)
  To: Po Lu; +Cc: holgerschurig, Emacs-devel

> From: Po Lu <luangruo@yahoo.com>
> Cc: holgerschurig@gmail.com,  Emacs-devel@gnu.org
> Date: Thu, 09 Feb 2023 18:13:59 +0800
> 
> I don't oppose us trying, I just see problems with trying to parse C
> macros with anything other than guesswork whilst maintaining reasonable
> speed.

Well, guesswork or not, CC mode does it, so it's definitely doable.  I
see no reason to give up on this without trying.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09  8:09 tree-sitter: conceptional problem solvable at Emacs' level? Holger Schurig
  2023-02-09  8:17 ` Po Lu
@ 2023-02-09 16:25 ` Ergus
  2023-02-09 20:09   ` Dmitry Gutov
  1 sibling, 1 reply; 26+ messages in thread
From: Ergus @ 2023-02-09 16:25 UTC (permalink / raw)
  To: Holger Schurig; +Cc: Emacs-devel

Hi:

I am probably stating the obvious, but did you try reporting this in
the issue tracker of the tree-sitter C++ grammar?

https://github.com/tree-sitter/tree-sitter-cpp/issues

Maybe they could provide a better, native solution and fix it there,
so we won't need to reinvent the wheel.

Best,
Ergus




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09 16:25 ` Ergus
@ 2023-02-09 20:09   ` Dmitry Gutov
  2023-02-10  7:41     ` Holger Schurig
  0 siblings, 1 reply; 26+ messages in thread
From: Dmitry Gutov @ 2023-02-09 20:09 UTC (permalink / raw)
  To: Ergus, Holger Schurig; +Cc: Emacs-devel

On 09/02/2023 18:25, Ergus wrote:
> Hi:
> 
> I am probably stating the obvious, but did you try reporting this in
> the issue tracker of the tree-sitter C++ grammar?
> 
> https://github.com/tree-sitter/tree-sitter-cpp/issues
> 
> Maybe they could provide a better, native solution and fix it there,
> so we won't need to reinvent the wheel.

These might be relevant:

https://github.com/tree-sitter/tree-sitter-cpp/issues/85

https://github.com/tree-sitter/tree-sitter-cpp/issues/40

https://github.com/tree-sitter/tree-sitter-cpp/issues/146




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09  8:17 ` Po Lu
  2023-02-09  8:50   ` Eli Zaretskii
@ 2023-02-10  7:33   ` Yuan Fu
  2023-02-10  8:42     ` Eli Zaretskii
  1 sibling, 1 reply; 26+ messages in thread
From: Yuan Fu @ 2023-02-10  7:33 UTC (permalink / raw)
  To: Po Lu; +Cc: Holger Schurig, Emacs-devel



> On Feb 9, 2023, at 12:17 AM, Po Lu <luangruo@yahoo.com> wrote:
> 
> Parsers (without a full C preprocessor inside) can only work for
> languages like Python, which cannot be enhanced with syntax-modifying
> macros.
> 

Right. Our best hope is for someone to try to extend the current
tree-sitter-c grammar, but I don’t know how feasible that is. Emacs can
also do some limited workarounds, but the potential in that department
is slim.

Yuan



* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-09 20:09   ` Dmitry Gutov
@ 2023-02-10  7:41     ` Holger Schurig
  0 siblings, 0 replies; 26+ messages in thread
From: Holger Schurig @ 2023-02-10  7:41 UTC (permalink / raw)
  To: Dmitry Gutov, Ergus; +Cc: Emacs-devel

> https://github.com/tree-sitter/tree-sitter-cpp/issues/85

From January 2021

Inconclusive.


> https://github.com/tree-sitter/tree-sitter-cpp/issues/40

From June 2019

Talks about regenerating the parser based on environment variables.
That means you'd have to have the whole NPM toolchain installed.

The final suggestion is to replace the macros with spaces before feeding
the text to tree-sitter. That would keep offsets intact.
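
To make the "replace with spaces" idea concrete, here is a minimal C++
sketch, assuming we already know which identifiers to hide. The name
`blank_tokens` and the token list are illustrative assumptions, not
anything from tree-sitter or Emacs. Each whole-word occurrence is
overwritten with the same number of spaces, so every byte offset in the
result still lines up with the original text.

```cpp
#include <cctype>
#include <cstddef>
#include <string>
#include <vector>

// True for characters that can be part of a C/C++ identifier.
static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Overwrite every whole-word occurrence of each token with spaces of the
// same length. The text never grows or shrinks, so offsets stay valid.
std::string blank_tokens(std::string text, const std::vector<std::string>& tokens) {
    for (const std::string& tok : tokens) {
        std::size_t pos = 0;
        while ((pos = text.find(tok, pos)) != std::string::npos) {
            const std::size_t end = pos + tok.size();
            const bool left_ok = pos == 0 || !is_ident_char(text[pos - 1]);
            const bool right_ok = end == text.size() || !is_ident_char(text[end]);
            if (left_ok && right_ok) {
                text.replace(pos, tok.size(), tok.size(), ' ');
            }
            pos = end;  // continue searching after this occurrence
        }
    }
    return text;
}
```

Calling `blank_tokens(source, {"Q_OBJECT", "slots"})` turns
`public slots:` into `public` followed by spaces and a colon, which a
plain C++ grammar should accept. A bare `signals:` is not handled by
blanking alone, because an empty label is not valid C++; it would need a
same-length stand-in instead.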


> https://github.com/tree-sitter/tree-sitter-cpp/issues/146

From January 2022

Points to alternate parsers for C++ dialects, here for OpenFOAM code.



Given that these bugs have been sitting there, some of them for years,
I can only conclude that the tree-sitter project doesn't care at all. Or
lacks the manpower. Or assigns it differently. Or that its core
developers have different itches to scratch.

Whatever the reason, I wouldn't hold my breath waiting for anything to
change on their side soon.

The last idea from their issue 40 sounds like something that could be
implemented on the Emacs side.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-10  7:33   ` Yuan Fu
@ 2023-02-10  8:42     ` Eli Zaretskii
       [not found]       ` <CAOpc7mHX6s0B8vdDee+9FMvQejGTSL3jzgwVekS7Esg-AOf=jw@mail.gmail.com>
  2023-02-11  9:34       ` Ihor Radchenko
  0 siblings, 2 replies; 26+ messages in thread
From: Eli Zaretskii @ 2023-02-10  8:42 UTC (permalink / raw)
  To: Yuan Fu; +Cc: luangruo, holgerschurig, Emacs-devel

> From: Yuan Fu <casouri@gmail.com>
> Date: Thu, 9 Feb 2023 23:33:10 -0800
> Cc: Holger Schurig <holgerschurig@gmail.com>,
>  Emacs-devel@gnu.org
> 
> > Parsers (without a full C preprocessor inside) can only work for
> > languages like Python, which cannot be enhanced with syntax-modifying
> > macros.
> 
> Right. Our best hope is for someone to try to extend the current
> tree-sitter-c grammar, but I don’t know how feasible that is. Emacs can
> also do some limited workarounds, but the potential in that department
> is slim.

I think we still have a way to go before we reach the above
conclusions (which basically mean we give up on improving the
situation with C/C++ macros).  We should explore other approaches.

One such approach would be to perform our own analysis when the parser
returns an error node due to macros.

Another possibility is to complicate the function we pass to
tree-sitter with which to read buffer text, in a way that replaces the
text of a macro with something else (in the simplest case, just space
characters), so as to avoid errors in the parser, and again analyze
the macros in our own code.
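
As a rough illustration of that second possibility, here is a C++ sketch
of a chunked read function that masks macro text on the fly while leaving
the buffer itself untouched. It is only loosely modelled on the kind of
read callback a parser accepts; the real tree-sitter callback has a
different signature, and `MaskedSource`, `read_masked` and the 16-byte
chunk size are assumptions made up for this example. The masked ranges
are assumed to come from our own analysis.

```cpp
#include <algorithm>
#include <cstdint>
#include <string>
#include <utility>
#include <vector>

// Hypothetical payload: the untouched buffer text plus the byte ranges
// [start, end) that our own analysis flagged as macro text.
struct MaskedSource {
    std::string text;
    std::vector<std::pair<std::uint32_t, std::uint32_t>> masked;
    std::string chunk;  // scratch space owning the bytes handed to the parser
};

// Return up to kChunk bytes starting at byte_index, with masked ranges
// replaced by spaces (newlines are kept). Reports 0 bytes at end of text.
const char* read_masked(MaskedSource& src, std::uint32_t byte_index,
                        std::uint32_t* bytes_read) {
    constexpr std::uint32_t kChunk = 16;
    const std::uint32_t size = static_cast<std::uint32_t>(src.text.size());
    if (byte_index >= size) {
        *bytes_read = 0;
        return "";
    }
    const std::uint32_t end = std::min(size, byte_index + kChunk);
    src.chunk.assign(src.text, byte_index, end - byte_index);
    for (const auto& range : src.masked) {
        // Clip the masked range to the chunk being served.
        const std::uint32_t lo = std::max(range.first, byte_index);
        const std::uint32_t hi = std::min(range.second, end);
        for (std::uint32_t i = lo; i < hi; ++i) {
            char& c = src.chunk[i - byte_index];
            if (c != '\n') c = ' ';
        }
    }
    *bytes_read = end - byte_index;
    return src.chunk.data();
}
```

The point of masking at read time is that the parser sees plain C++ at
unchanged offsets, while the buffer still holds the macro text for our
own code to analyze.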

And I'm sure there are other alternatives.  This issue is not unique
to Emacs, so studying how other IDEs deal with it could also yield
ideas.

Volunteers interested in improving support for C/C++ based on
tree-sitter are very welcome to step forward and work on these issues.

Thanks.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
       [not found]       ` <CAOpc7mHX6s0B8vdDee+9FMvQejGTSL3jzgwVekS7Esg-AOf=jw@mail.gmail.com>
@ 2023-02-10 11:48         ` Eli Zaretskii
  2023-02-11  2:17           ` Po Lu
  0 siblings, 1 reply; 26+ messages in thread
From: Eli Zaretskii @ 2023-02-10 11:48 UTC (permalink / raw)
  To: Holger Schurig; +Cc: luangruo, holgerschurig, Emacs-devel

> From: Holger Schurig <holgerschurig@gmail.com>
> Date: Fri, 10 Feb 2023 11:31:39 +0000
> 
> > And I'm sure there are other alternatives.  This issue is not unique
> > to Emacs, so studying how other IDEs deal with it could also yield
> > ideas.
> 
> I'm aware of three FOSS IDEs that are somewhat linked to C++ and/or Qt:
> 
> * KDevelop (from the KDE project)
> * Kate (ditto, although some might not call it an IDE)
> * Qt Creator
> 
> For the first two I have no info on whether they even looked into Tree-Sitter.
> 
> However, Qt Creator doesn't want to implement Tree-Sitter at all, they
> are happy with KSyntaxHighlighting. See
> https://bugreports.qt.io/browse/QTCREATORBUG-26348
> 
> KSyntaxHighlighting is in turn from the KDE project, and used in KDevelop
> and Kate. So basically all three use the same technology. Which seems to
> work nicely for them.

Thanks.

However, I meant the IDEs which are using tree-sitter and support
developing C/C++ programs.  I believe some do.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-10 11:48         ` Eli Zaretskii
@ 2023-02-11  2:17           ` Po Lu
  2023-02-11  6:25             ` Konstantin Kharlamov
  0 siblings, 1 reply; 26+ messages in thread
From: Po Lu @ 2023-02-11  2:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Holger Schurig, Emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> However, I meant the IDEs which are using tree-sitter and support
> developing C/C++ programs.  I believe some do.

I think most of those have similar problems supporting macros.
Who knows their names? I may be able to ask some of their users.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  2:17           ` Po Lu
@ 2023-02-11  6:25             ` Konstantin Kharlamov
  2023-02-11  6:36               ` Konstantin Kharlamov
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11  6:25 UTC (permalink / raw)
  To: Po Lu, Eli Zaretskii; +Cc: Holger Schurig, Emacs-devel

On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > However, I meant the IDEs which are using tree-sitter and support
> > developing C/C++ programs.  I believe some do.
> 
> I think most of those have similar problems supporting macros.
> Who knows their names? I may be able to ask some of their users.

From my experience on and off work, there are just two IDEs (as in, not editors)
used most widely for C++ code: QtCreator and Visual Studio. The first you
discussed, the second is proprietary.

Then again, people most often write C++ and C in text editors; popular
choices there, in my experience, are Sublime Text and VS Code. Neither of
these two uses tree-sitter either.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  6:25             ` Konstantin Kharlamov
@ 2023-02-11  6:36               ` Konstantin Kharlamov
  2023-02-11  6:51                 ` Theodor Thornhill
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11  6:36 UTC (permalink / raw)
  To: Po Lu, Eli Zaretskii; +Cc: Holger Schurig, Emacs-devel

I installed Sublime Text on my Archlinux and tested with the C++ code OP posted.

What I see is that ST does seem confused about indentation when you make
a new line right after the `slots:` line.

However, if you make a new line after the `void someSlot() {};` line, it
reuses the indentation of the previous line.

The default cc-mode in Emacs works similarly. c++-ts-mode, on the other
hand, doesn't reuse the previous indentation, and I think it should. That
would resolve this problem and others, because in my experience it
happens very often in C and C++ code that you want some custom
indentation level, so you just create one and expect the editor to keep
it for the following new lines.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  6:36               ` Konstantin Kharlamov
@ 2023-02-11  6:51                 ` Theodor Thornhill
  2023-02-11  7:11                   ` Konstantin Kharlamov
  2023-04-16 19:21                   ` Konstantin Kharlamov
  0 siblings, 2 replies; 26+ messages in thread
From: Theodor Thornhill @ 2023-02-11  6:51 UTC (permalink / raw)
  To: emacs-devel, Konstantin Kharlamov, Po Lu, Eli Zaretskii
  Cc: Holger Schurig, Emacs-devel



On 11 February 2023 07:36:26 CET, Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
>The default cc-mode in Emacs works similarly. c++-ts-mode, on the other
>hand, doesn't reuse the previous indentation, and I think it should. That
>would resolve this problem and others, because in my experience it
>happens very often in C and C++ code that you want some custom
>indentation level, so you just create one and expect the editor to keep
>it for the following new lines.
>

That last statement sounds easily solvable. Can you send me a short example describing exactly what you want in a code snippet and I'll add it.

Thanks,
Theo




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  6:51                 ` Theodor Thornhill
@ 2023-02-11  7:11                   ` Konstantin Kharlamov
  2023-02-11  7:53                     ` Konstantin Kharlamov
                                       ` (2 more replies)
  2023-04-16 19:21                   ` Konstantin Kharlamov
  1 sibling, 3 replies; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11  7:11 UTC (permalink / raw)
  To: Theodor Thornhill, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig

On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
> That last statement sounds easily solvable. Can you send me a short example
> describing exactly what you want in a code snippet and I'll add it.
> 
> Thanks,
> Theo

Thank you! The example is below, but please wait a bit just to make sure there's no opposition from other people, because I don't know if it works like this on purpose, or not.

Given this C++ code with weird class members indentation:

    class Foo {
           int a;
           bool b;
    };

Now, suppose you put a caret after `bool b;` text and press Enter to make a new
line (all tests are done with `emacs -Q`). The behaviour:

* cc-mode and Sublime Text: creates a newline with the indentation exactly as on
the previous one.
* cc-ts-mode: re-indents the `bool b;` line, then creates a new one with a
custom indentation that is different from one on the `int a;` line.

The cc-mode and Sublime Text behaviour seems less annoying to me, because if I
wanted to reindent the previous line, I would most likely have done so by
pressing an indentation hotkey (e.g. `=` in Evil mode, which I use).
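
To make the request concrete, here is a minimal C++ sketch of the "keep the
previous line's indentation" behaviour. It only illustrates the logic under
simple assumptions (the buffer is a std::string, indentation consists of spaces
and tabs); `line_start`, `leading_whitespace` and `newline_keep_indent` are
hypothetical names, not functions from Emacs, cc-mode or c-ts-mode.

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Offset of the first character of the line that contains `pos`.
std::size_t line_start(const std::string& text, std::size_t pos) {
    if (pos == 0) return 0;
    const std::size_t nl = text.rfind('\n', pos - 1);
    return nl == std::string::npos ? 0 : nl + 1;
}

// Leading spaces/tabs of the line containing `pos` (never reading past `pos`).
std::string leading_whitespace(const std::string& text, std::size_t pos) {
    const std::size_t bol = line_start(text, pos);
    std::size_t end = bol;
    while (end < pos && (text[end] == ' ' || text[end] == '\t')) ++end;
    return text.substr(bol, end - bol);
}

// Insert a newline at `pos` and copy the current line's indentation onto the
// new line. The current line itself is left untouched. Returns the new caret
// position.
std::size_t newline_keep_indent(std::string& text, std::size_t pos) {
    assert(pos <= text.size());
    const std::string indent = leading_whitespace(text, pos);
    text.insert(pos, "\n" + indent);
    return pos + 1 + indent.size();
}
```

With the `class Foo` snippet above and the caret after `bool b;`, this would
leave the `bool b;` line as it is and start the new line at the same column.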




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  7:11                   ` Konstantin Kharlamov
@ 2023-02-11  7:53                     ` Konstantin Kharlamov
  2023-02-11  8:22                     ` Konstantin Kharlamov
  2023-02-11  8:43                     ` Eli Zaretskii
  2 siblings, 0 replies; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11  7:53 UTC (permalink / raw)
  To: Theodor Thornhill, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig

On Sat, 2023-02-11 at 10:11 +0300, Konstantin Kharlamov wrote:
> On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
> > 
> > 
> > On 11 February 2023 07:36:26 CET, Konstantin Kharlamov <hi-angel@yandex.ru>
> > wrote:
> > > On Sat, 2023-02-11 at 09:25 +0300, Konstantin Kharlamov wrote:
> > > > On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
> > > > > Eli Zaretskii <eliz@gnu.org> writes:
> > > > > 
> > > > > > However, I meant the IDEs which are using tree-sitter and support
> > > > > > developing C/C++ programs.  I believe some do.
> > > > > 
> > > > > I think most of those have similar problems supporting macros.
> > > > > Who knows their names? I may be able to ask some of their users.
> > > > 
> > > > From my experience on and off work, there are just two IDEs (as in, not
> > > > editors)
> > > > used most widely for C++ code: QtCreator and Visual Studio. The first
> > > > you
> > > > discussed, the second is proprietary.
> > > > 
> > > > Then again, people most often code in C++ and C with text editors, in
> > > > that
> > > > case
> > > > popular choices from my experience: Sublime Text and VS Code. These two
> > > > have
> > > > don't use tree-sitter either.
> > > 
> > > I installed Sublime Text on my Archlinux and tested with the C++ code OP
> > > posted.
> > > 
> > > What I see is that ST does seem confused about indentation, while trying
> > > to
> > > make
> > > a newline right after `slots:` line.
> > > 
> > > However, if you try to make a newline after the `void someSlot() {};`
> > > line,
> > > it
> > > will use the indentation used on the previous line.
> > > 
> > > The default cc-mode in Emacs works similarly. The cc-ts-mode on the other
> > > hand
> > > doesn't make use of the previous indentation, and I think it should. It
> > > would
> > > resolve that problem and others, because in my experience it happens very
> > > often
> > > in C and C++ code that you want some custom indentation level, so you just
> > > make
> > > one and you expect the editor to keep it while creating more new lines.
> > > 
> > 
> > That last statement sounds easily solvable. Can you send me a short example
> > describing exactly what you want in a code snippet and I'll add it.
> > 
> > Thanks,
> > Theo
> 
> Thank you! The example is below, but please wait a bit just to make sure
> there's no opposition from other people, because I don't know if it works like
> this on purpose, or not.
> 
> Given this C++ code with weird class members indentation:
> 
>     class Foo {
>            int a;
>            bool b;
>     };
> 
> Now, suppose you put a caret after `bool b;` text and press Enter to make a
> new
> line (all tests are done with `emacs -Q`). The behaviour:
> 
> * cc-mode and Sublime Text: creates a newline with the indentation exactly as
> on
> the previous one.
> * cc-ts-mode: re-indents the `bool b;` line, then creates a new one with a
> custom indentation that is different from one on the `int a;` line.
> 
> The cc-mode and Sublime Text behaviour seems like less annoying to me, because
> if I wanted to reindent the prev. line, most likely I'd did it by pressing an
> indentation hotkey (e.g. `=` in Evil mode I use).

FTR, re-using the indentation from the previous line seems to have been the
default in all Emacs modes, and one that has proved useful. To support that,
here are more examples.

While writing C, depending on the circumstances, the pre-existing code might
have function arguments indented like this:

foo(arg1,
    arg2);

Or like this:

foo(
 foo1,
 foo2);

You might want to add another argument to the call, but you don't want to
re-indent everything. So when you press Enter, you expect the new line to have
the previous line's indentation.

Another example from elisp-mode: I have this snippet in my Evil config:

(use-package evil
  ;; […]
  :bind (:map evil-insert-state-map
         ;; after having insert-state keymap wiped out make [escape] switch back
         ;; to normal state
         ([escape] . 'evil-normal-state)

         :map evil-normal-state-map
         ("C-u"    . 'evil-scroll-up)
         ("k"      . 'evil-previous-visual-line)
         ("j"      . 'evil-next-visual-line)
         ;; […]

         :map evil-visual-state-map
         ("k"      . 'evil-previous-visual-line)
         ("j"      . 'evil-next-visual-line)

         :map isearch-mode-map
         ;; allow for "up/down" history scrolling in / search
         ("<down>" . 'isearch-ring-advance)
         ("<up>"   . 'isearch-ring-retreat)
         )
  )

If I were to re-indent everything, the indentation in the `:bind` body would be
different. But I want to keep it the way it is now, and because elisp-mode
re-uses the previous line's indentation, whenever I create a new line, it
always has the indentation I wanted.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  7:11                   ` Konstantin Kharlamov
  2023-02-11  7:53                     ` Konstantin Kharlamov
@ 2023-02-11  8:22                     ` Konstantin Kharlamov
  2023-02-11  8:41                       ` Theodor Thornhill
  2023-02-11  8:43                     ` Eli Zaretskii
  2 siblings, 1 reply; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11  8:22 UTC (permalink / raw)
  To: Theodor Thornhill, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig

On Sat, 2023-02-11 at 10:11 +0300, Konstantin Kharlamov wrote:
> On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
> > 
> > 
> > On 11 February 2023 07:36:26 CET, Konstantin Kharlamov <hi-angel@yandex.ru>
> > wrote:
> > > On Sat, 2023-02-11 at 09:25 +0300, Konstantin Kharlamov wrote:
> > > > On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
> > > > > Eli Zaretskii <eliz@gnu.org> writes:
> > > > > 
> > > > > > However, I meant the IDEs which are using tree-sitter and support
> > > > > > developing C/C++ programs.  I believe some do.
> > > > > 
> > > > > I think most of those have similar problems supporting macros.
> > > > > Who knows their names? I may be able to ask some of their users.
> > > > 
> > > > From my experience on and off work, there are just two IDEs (as in, not
> > > > editors)
> > > > used most widely for C++ code: QtCreator and Visual Studio. The first
> > > > you
> > > > discussed, the second is proprietary.
> > > > 
> > > > Then again, people most often code in C++ and C with text editors, in
> > > > that
> > > > case
> > > > popular choices from my experience: Sublime Text and VS Code. These two
> > > > have
> > > > don't use tree-sitter either.
> > > 
> > > I installed Sublime Text on my Archlinux and tested with the C++ code OP
> > > posted.
> > > 
> > > What I see is that ST does seem confused about indentation, while trying
> > > to
> > > make
> > > a newline right after `slots:` line.
> > > 
> > > However, if you try to make a newline after the `void someSlot() {};`
> > > line,
> > > it
> > > will use the indentation used on the previous line.
> > > 
> > > The default cc-mode in Emacs works similarly. The cc-ts-mode on the other
> > > hand
> > > doesn't make use of the previous indentation, and I think it should. It
> > > would
> > > resolve that problem and others, because in my experience it happens very
> > > often
> > > in C and C++ code that you want some custom indentation level, so you just
> > > make
> > > one and you expect the editor to keep it while creating more new lines.
> > > 
> > 
> > That last statement sounds easily solvable. Can you send me a short example
> > describing exactly what you want in a code snippet and I'll add it.
> > 
> > Thanks,
> > Theo
> 
> Thank you! The example is below, but please wait a bit just to make sure
> there's no opposition from other people, because I don't know if it works like
> this on purpose, or not.
> 
> Given this C++ code with weird class members indentation:
> 
>     class Foo {
>            int a;
>            bool b;
>     };
> 
> Now, suppose you put a caret after `bool b;` text and press Enter to make a
> new
> line (all tests are done with `emacs -Q`). The behaviour:
> 
> * cc-mode and Sublime Text: creates a newline with the indentation exactly as
> on
> the previous one.
> * cc-ts-mode: re-indents the `bool b;` line, then creates a new one with a
> custom indentation that is different from one on the `int a;` line.
> 
> The cc-mode and Sublime Text behaviour seems like less annoying to me, because
> if I wanted to reindent the prev. line, most likely I'd did it by pressing an
> indentation hotkey (e.g. `=` in Evil mode I use).

Oh, wait, I mistakenly used c-mode instead of c++-mode. c-mode works this way: it keeps the previous indentation, whereas c++-mode uses a new indentation. It's odd that they behave differently, and it is certainly different from other modes (e.g. emacs-lisp-mode). In this case I think the question is whether it should re-use the previous line's indentation, which I think it should.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  8:22                     ` Konstantin Kharlamov
@ 2023-02-11  8:41                       ` Theodor Thornhill
  2023-02-11  9:37                         ` Konstantin Kharlamov
  0 siblings, 1 reply; 26+ messages in thread
From: Theodor Thornhill @ 2023-02-11  8:41 UTC (permalink / raw)
  To: Konstantin Kharlamov, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig



On 11 February 2023 09:22:06 CET, Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
>On Sat, 2023-02-11 at 10:11 +0300, Konstantin Kharlamov wrote:
>> On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
>> > 
>> > 
>> > On 11 February 2023 07:36:26 CET, Konstantin Kharlamov <hi-angel@yandex.ru>
>> > wrote:
>> > > On Sat, 2023-02-11 at 09:25 +0300, Konstantin Kharlamov wrote:
>> > > > On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
>> > > > > Eli Zaretskii <eliz@gnu.org> writes:
>> > > > > 
>> > > > > > However, I meant the IDEs which are using tree-sitter and support
>> > > > > > developing C/C++ programs.  I believe some do.
>> > > > > 
>> > > > > I think most of those have similar problems supporting macros.
>> > > > > Who knows their names? I may be able to ask some of their users.
>> > > > 
>> > > > From my experience on and off work, there are just two IDEs (as in, not
>> > > > editors)
>> > > > used most widely for C++ code: QtCreator and Visual Studio. The first
>> > > > you
>> > > > discussed, the second is proprietary.
>> > > > 
>> > > > Then again, people most often code in C++ and C with text editors, in
>> > > > that
>> > > > case
>> > > > popular choices from my experience: Sublime Text and VS Code. These two
>> > > > have
>> > > > don't use tree-sitter either.
>> > > 
>> > > I installed Sublime Text on my Archlinux and tested with the C++ code OP
>> > > posted.
>> > > 
>> > > What I see is that ST does seem confused about indentation, while trying
>> > > to
>> > > make
>> > > a newline right after `slots:` line.
>> > > 
>> > > However, if you try to make a newline after the `void someSlot() {};`
>> > > line,
>> > > it
>> > > will use the indentation used on the previous line.
>> > > 
>> > > The default cc-mode in Emacs works similarly. The cc-ts-mode on the other
>> > > hand
>> > > doesn't make use of the previous indentation, and I think it should. It
>> > > would
>> > > resolve that problem and others, because in my experience it happens very
>> > > often
>> > > in C and C++ code that you want some custom indentation level, so you just
>> > > make
>> > > one and you expect the editor to keep it while creating more new lines.
>> > > 
>> > 
>> > That last statement sounds easily solvable. Can you send me a short example
>> > describing exactly what you want in a code snippet and I'll add it.
>> > 
>> > Thanks,
>> > Theo
>> 
>> Thank you! The example is below, but please wait a bit just to make sure
>> there's no opposition from other people, because I don't know if it works like
>> this on purpose, or not.
>> 
>> Given this C++ code with weird class members indentation:
>> 
>>     class Foo {
>>            int a;
>>            bool b;
>>     };
>> 
>> Now, suppose you put a caret after `bool b;` text and press Enter to make a
>> new
>> line (all tests are done with `emacs -Q`). The behaviour:
>> 
>> * cc-mode and Sublime Text: creates a newline with the indentation exactly as
>> on
>> the previous one.
>> * cc-ts-mode: re-indents the `bool b;` line, then creates a new one with a
>> custom indentation that is different from one on the `int a;` line.
>> 
>> The cc-mode and Sublime Text behaviour seems like less annoying to me, because
>> if I wanted to reindent the prev. line, most likely I'd did it by pressing an
>> indentation hotkey (e.g. `=` in Evil mode I use).
>
>Oh, wait, though I mistakengly used c-mode instead of c++-mode. The c-mode works this way, it keeps prev. indentation, however c++-mode instead uses a new indentation. It's odd they behave differently, and it certainly is different from other modes (e.g. emacs-lisp-mode). In this case I think the question of whether it should re-use prev. line indentation, which I think the should.

C-mode or c-ts-mode?

Yeah, this is what I'm thinking too. I'll look at it tonight or tomorrow :)

Theo




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  7:11                   ` Konstantin Kharlamov
  2023-02-11  7:53                     ` Konstantin Kharlamov
  2023-02-11  8:22                     ` Konstantin Kharlamov
@ 2023-02-11  8:43                     ` Eli Zaretskii
  2 siblings, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2023-02-11  8:43 UTC (permalink / raw)
  To: Konstantin Kharlamov; +Cc: theo, emacs-devel, luangruo, holgerschurig

> From: Konstantin Kharlamov <hi-angel@yandex.ru>
> Cc: Holger Schurig <holgerschurig@gmail.com>
> Date: Sat, 11 Feb 2023 10:11:24 +0300
> 
> On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
> > 
> > 
> > On 11 February 2023 07:36:26 CET, Konstantin Kharlamov <hi-angel@yandex.ru>
> > wrote:
> > > On Sat, 2023-02-11 at 09:25 +0300, Konstantin Kharlamov wrote:
> > > > On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
> > > > > Eli Zaretskii <eliz@gnu.org> writes:
> > > > > 
> > > > > > However, I meant the IDEs which are using tree-sitter and support
> > > > > > developing C/C++ programs.  I believe some do.
> > > > > 
> > > > > I think most of those have similar problems supporting macros.
> > > > > Who knows their names? I may be able to ask some of their users.
> > > > 
> > > > From my experience on and off work, there are just two IDEs (as in, not
> > > > editors)
> > > > used most widely for C++ code: QtCreator and Visual Studio. The first you
> > > > discussed, the second is proprietary.
> > > > 
> > > > Then again, people most often code in C++ and C with text editors, in that
> > > > case
> > > > popular choices from my experience: Sublime Text and VS Code. These two
> > > > have
> > > > don't use tree-sitter either.
> > > 
> > > I installed Sublime Text on my Archlinux and tested with the C++ code OP
> > > posted.
> > > 
> > > What I see is that ST does seem confused about indentation, while trying to
> > > make
> > > a newline right after `slots:` line.
> > > 
> > > However, if you try to make a newline after the `void someSlot() {};` line,
> > > it
> > > will use the indentation used on the previous line.
> > > 
> > > The default cc-mode in Emacs works similarly. The cc-ts-mode on the other
> > > hand
> > > doesn't make use of the previous indentation, and I think it should. It
> > > would
> > > resolve that problem and others, because in my experience it happens very
> > > often
> > > in C and C++ code that you want some custom indentation level, so you just
> > > make
> > > one and you expect the editor to keep it while creating more new lines.
> > > 
> > 
> > That last statement sounds easily solvable. Can you send me a short example
> > describing exactly what you want in a code snippet and I'll add it.
> > 
> > Thanks,
> > Theo
> 
> Thank you! The example is below, but please wait a bit just to make sure there's no opposition from other people, because I don't know if it works like this on purpose, or not.

Since we are close to a pretest, I think we should have a defcustom
which controls this behavior, and leave to users whether to turn this
on or off.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-10  8:42     ` Eli Zaretskii
       [not found]       ` <CAOpc7mHX6s0B8vdDee+9FMvQejGTSL3jzgwVekS7Esg-AOf=jw@mail.gmail.com>
@ 2023-02-11  9:34       ` Ihor Radchenko
  2023-02-11 10:42         ` Eli Zaretskii
  2023-02-11 13:58         ` Lynn Winebarger
  1 sibling, 2 replies; 26+ messages in thread
From: Ihor Radchenko @ 2023-02-11  9:34 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Yuan Fu, luangruo, holgerschurig, Emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

> Another possibility is to complicate the function we pass to
> tree-sitter with which to read buffer text, in a way that replaces the
> text of a macro with something else (in the simplest case, just space
> characters), so as to avoid errors in the parser, and again analyze
> the macros in our own code.

Another idea is delegating parts of the buffer to an Elisp/alternative parser.

Tree-sitter provides support for documents written using a mixture of
grammars: https://tree-sitter.github.io/tree-sitter/using-parsers#multi-language-documents
Macros can be considered such a "mixed" grammar, with macros being a
grammar of their own.

AFAIU, tree-sitter allows excluding certain file ranges from parsing
and instead parsing the excluded ranges using an alternative grammar. If
Elisp can somehow tell the tree-sitter backend to skip parsing
macro-looking lines, it should solve the problem at least partially.
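
As a rough illustration of the range idea, the C++ sketch below computes the
half-open byte ranges of a source string that remain once every occurrence of a
known macro name is cut out. It is only a sketch under stated assumptions: the
macro names are supplied by the caller, `Range`, `included_ranges` and
`visible_text` are hypothetical names, and nothing here calls tree-sitter. In a
real setup the resulting ranges would be what gets handed to the parser (the
tree-sitter C API has `ts_parser_set_included_ranges`, and Emacs 29 has
`treesit-parser-set-included-ranges`); whether the C++ grammar then produces a
clean tree is something I have not verified.

```cpp
#include <algorithm>
#include <cassert>
#include <cctype>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

using Range = std::pair<std::size_t, std::size_t>;  // half-open [start, end)

static bool is_ident_char(char c) {
    return std::isalnum(static_cast<unsigned char>(c)) || c == '_';
}

// Ranges of `src` that stay visible to the parser: everything except
// whole-word occurrences of the names listed in `macros`.
std::vector<Range> included_ranges(const std::string& src,
                                   const std::vector<std::string>& macros) {
    std::vector<Range> out;
    std::size_t keep_from = 0;  // start of the range currently being kept
    std::size_t i = 0;
    while (i < src.size()) {
        if (!is_ident_char(src[i])) { ++i; continue; }
        std::size_t j = i;
        while (j < src.size() && is_ident_char(src[j])) ++j;
        const std::string word = src.substr(i, j - i);
        if (std::find(macros.begin(), macros.end(), word) != macros.end()) {
            if (i > keep_from) out.emplace_back(keep_from, i);
            keep_from = j;  // resume keeping text right after the macro
        }
        i = j;
    }
    if (keep_from < src.size()) out.emplace_back(keep_from, src.size());
    return out;
}

// Concatenation of the kept ranges, i.e. the text the parser would see.
std::string visible_text(const std::string& src,
                         const std::vector<Range>& ranges) {
    std::string out;
    for (const Range& r : ranges) {
        assert(r.first <= r.second && r.second <= src.size());
        out.append(src, r.first, r.second - r.first);
    }
    return out;
}
```

Because the ranges refer to offsets in the unmodified buffer, node positions
would still map back to the original text. The obvious weakness is that any
identifier spelled like a macro name is excluded as well.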

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  8:41                       ` Theodor Thornhill
@ 2023-02-11  9:37                         ` Konstantin Kharlamov
  2023-02-11 10:25                           ` Konstantin Kharlamov
  0 siblings, 1 reply; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11  9:37 UTC (permalink / raw)
  To: Theodor Thornhill, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig

On Sat, 2023-02-11 at 09:41 +0100, Theodor Thornhill wrote:
> 
> 
> On 11 February 2023 09:22:06 CET, Konstantin Kharlamov <hi-angel@yandex.ru>
> wrote:
> > On Sat, 2023-02-11 at 10:11 +0300, Konstantin Kharlamov wrote:
> > > On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
> > > > 
> > > > 
> > > > On 11 February 2023 07:36:26 CET, Konstantin Kharlamov
> > > > <hi-angel@yandex.ru>
> > > > wrote:
> > > > > On Sat, 2023-02-11 at 09:25 +0300, Konstantin Kharlamov wrote:
> > > > > > On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
> > > > > > > Eli Zaretskii <eliz@gnu.org> writes:
> > > > > > > 
> > > > > > > > However, I meant the IDEs which are using tree-sitter and
> > > > > > > > support
> > > > > > > > developing C/C++ programs.  I believe some do.
> > > > > > > 
> > > > > > > I think most of those have similar problems supporting macros.
> > > > > > > Who knows their names? I may be able to ask some of their users.
> > > > > > 
> > > > > > From my experience on and off work, there are just two IDEs (as in,
> > > > > > not
> > > > > > editors)
> > > > > > used most widely for C++ code: QtCreator and Visual Studio. The
> > > > > > first
> > > > > > you
> > > > > > discussed, the second is proprietary.
> > > > > > 
> > > > > > Then again, people most often code in C++ and C with text editors,
> > > > > > in
> > > > > > that
> > > > > > case
> > > > > > popular choices from my experience: Sublime Text and VS Code. These
> > > > > > two
> > > > > > have
> > > > > > don't use tree-sitter either.
> > > > > 
> > > > > I installed Sublime Text on my Archlinux and tested with the C++ code
> > > > > OP
> > > > > posted.
> > > > > 
> > > > > What I see is that ST does seem confused about indentation, while
> > > > > trying
> > > > > to
> > > > > make
> > > > > a newline right after `slots:` line.
> > > > > 
> > > > > However, if you try to make a newline after the `void someSlot() {};`
> > > > > line,
> > > > > it
> > > > > will use the indentation used on the previous line.
> > > > > 
> > > > > The default cc-mode in Emacs works similarly. The cc-ts-mode on the
> > > > > other
> > > > > hand
> > > > > doesn't make use of the previous indentation, and I think it should.
> > > > > It
> > > > > would
> > > > > resolve that problem and others, because in my experience it happens
> > > > > very
> > > > > often
> > > > > in C and C++ code that you want some custom indentation level, so you
> > > > > just
> > > > > make
> > > > > one and you expect the editor to keep it while creating more new
> > > > > lines.
> > > > > 
> > > > 
> > > > That last statement sounds easily solvable. Can you send me a short
> > > > example
> > > > describing exactly what you want in a code snippet and I'll add it.
> > > > 
> > > > Thanks,
> > > > Theo
> > > 
> > > Thank you! The example is below, but please wait a bit just to make sure
> > > there's no opposition from other people, because I don't know if it works
> > > like
> > > this on purpose, or not.
> > > 
> > > Given this C++ code with weird class members indentation:
> > > 
> > >     class Foo {
> > >            int a;
> > >            bool b;
> > >     };
> > > 
> > > Now, suppose you put a caret after `bool b;` text and press Enter to make
> > > a
> > > new
> > > line (all tests are done with `emacs -Q`). The behaviour:
> > > 
> > > * cc-mode and Sublime Text: creates a newline with the indentation exactly
> > > as
> > > on
> > > the previous one.
> > > * cc-ts-mode: re-indents the `bool b;` line, then creates a new one with a
> > > custom indentation that is different from one on the `int a;` line.
> > > 
> > > The cc-mode and Sublime Text behaviour seems like less annoying to me,
> > > because
> > > if I wanted to reindent the prev. line, most likely I'd did it by pressing
> > > an
> > > indentation hotkey (e.g. `=` in Evil mode I use).
> > 
> > Oh, wait, though I mistakengly used c-mode instead of c++-mode. The c-mode
> > works this way, it keeps prev. indentation, however c++-mode instead uses a
> > new indentation. It's odd they behave differently, and it certainly is
> > different from other modes (e.g. emacs-lisp-mode). In this case I think the
> > question of whether it should re-use prev. line indentation, which I think
> > the should.
> 
> C-mode or c-ts-mode?
> 
> Yeah, this is what I'm thinking too. I'll look at it tonight or tomorrow :)

c-ts-mode works the same way as c++-ts-mode does.

Upon further inspection I realised that vanilla c-mode keeps the previous
indentation in the aforementioned case just because it doesn't recognise the
`class` keyword. If you replace it with `struct`, it will use whatever
indentation it thinks is correct instead of the one from the previous line.

However, vanilla c-mode and c++-mode actually behave inconsistently: depending
on the code, they may or may not make use of the previous indentation. So
anyway, below I re-created an example where the indentation is kept in ST,
c-mode and c++-mode, but not in c-ts-mode or c++-ts-mode. I also threw in other
editors for comparison.

Given this code:

    int main() {
        foobar(
             arg1,
             arg2
             );
    }

Suppose you put a caret after `arg2` text and press Enter to make a new line
(all tests are done with `emacs -Q`). The behaviour:

* c-mode, c++-mode, Sublime Text (both with `.c` and `.cpp` files), VS Code
(both with `.c` and `.cpp` files): creates a new line indented the same way as
the previous one.
* c-ts-mode, c++-ts-mode: re-indents the `arg2` line so that its indentation
differs from the `arg1,` line, and creates a new line that also has the new
indentation.
* QtCreator: lol, it does no indentation whatsoever in this case.

Overall, it seems like "using the previous indentation" is the way to go; it
is also what VS Code and Sublime Text do.

As a side note, if a user explicitly wants to re-indent the code, the behaviour
should depend on how much text they selected for re-indentation, which is
intuitive (at least c-mode and c++-mode behave this way). For example: if I
select only the arg2 line, then re-indentation uses the previous offsets, so
basically nothing happens. However, if I select the arg1 and arg2 lines, then
the indentation would be different, because the line before arg1 has a
different syntax construction ("opening parenthesis"), so the default
indentation for that case is used, which is "indent arguments to the opening
parenthesis of the function".




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  9:37                         ` Konstantin Kharlamov
@ 2023-02-11 10:25                           ` Konstantin Kharlamov
  0 siblings, 0 replies; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-02-11 10:25 UTC (permalink / raw)
  To: Theodor Thornhill, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig

On Sat, 2023-02-11 at 12:37 +0300, Konstantin Kharlamov wrote:
> Given this code:
> 
>     int main() {
>         foobar(
>              arg1,
>              arg2
>              );
>     }
> 
> Suppose you put a caret after `arg2` text and press Enter to make a new line
> (all tests are done with `emacs -Q`). The behaviour:
> 
> * c-mode, c++-mode, Sublime Text (both with `.c` and `.cpp` file), VS Code
> (both
> with `.c` and `.cpp` file): creates a new line indented same way as previous
> one.
> * c-ts-mode, c++-ts-mode: re-indents the `arg2` line to have indentation
> different from `arg1,` line, and creates a new line that also has new
> indentation.
> * QtCreator: lol, it does no indentation whatsoever in this case.

Ah, QtCreator does indent; it's just that it isn't clear right away due to the
way its GUI behaves. Anyway, it creates a new line with its own indentation (a
weird one btw: it is 5 spaces further than the opening parenthesis).




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  9:34       ` Ihor Radchenko
@ 2023-02-11 10:42         ` Eli Zaretskii
  2023-02-11 13:58         ` Lynn Winebarger
  1 sibling, 0 replies; 26+ messages in thread
From: Eli Zaretskii @ 2023-02-11 10:42 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, luangruo, holgerschurig, Emacs-devel

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: Yuan Fu <casouri@gmail.com>, luangruo@yahoo.com,
>  holgerschurig@gmail.com, Emacs-devel@gnu.org
> Date: Sat, 11 Feb 2023 09:34:42 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Another possibility is to complicate the function we pass to
> > tree-sitter with which to read buffer text, in a way that replaces the
> > text of a macro with something else (in the simplest case, just space
> > characters), so as to avoid errors in the parser, and again analyze
> > the macros in our own code.
> 
> Another idea is delegating parts of buffer to Elisp/alternative parser.

Could be.  However:

> Tree sitter provides support to documents written using a mixture of
> grammars: https://tree-sitter.github.io/tree-sitter/using-parsers#multi-language-documents
> Macros can be considered such a "mixed" grammar with macros being a
> grammar of their own.
> 
> AFAIU, tree sitter allows excluding certain file ranges from parsing
> and instead parse the excluded ranges using alternative grammar. If
> Elisp can somehow tell tree-sitter backend not skip parsing
> macro-looking lines, it should solve the problem at least partially.

I believe the problem is with handling the parts which _use_ the
macro, not those parts which _define_ macros.

Still, this idea should be explored, I think.

Thanks.




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  9:34       ` Ihor Radchenko
  2023-02-11 10:42         ` Eli Zaretskii
@ 2023-02-11 13:58         ` Lynn Winebarger
  1 sibling, 0 replies; 26+ messages in thread
From: Lynn Winebarger @ 2023-02-11 13:58 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: Eli Zaretskii, Yuan Fu, luangruo, holgerschurig, Emacs-devel

On Sat, Feb 11, 2023 at 4:34 AM Ihor Radchenko <yantar92@posteo.net> wrote:
>
> Eli Zaretskii <eliz@gnu.org> writes:
>
> > Another possibility is to complicate the function we pass to
> > tree-sitter with which to read buffer text, in a way that replaces the
> > text of a macro with something else (in the simplest case, just space
> > characters), so as to avoid errors in the parser, and again analyze
> > the macros in our own code.
>
> Another idea is delegating parts of buffer to Elisp/alternative parser.
>
> Tree sitter provides support to documents written using a mixture of
> grammars: https://tree-sitter.github.io/tree-sitter/using-parsers#multi-language-documents
> Macros can be considered such a "mixed" grammar with macros being a
> grammar of their own.
>
> AFAIU, tree sitter allows excluding certain file ranges from parsing
> and instead parse the excluded ranges using alternative grammar. If
> Elisp can somehow tell tree-sitter backend not skip parsing
> macro-looking lines, it should solve the problem at least partially.
>
What's needed is (a) a generic "macro keyword" terminal in the tree-sitter
grammar, recognized by the lexer because macros can appear anywhere, and
(b) a parser for macro definitions.
Then (b) maintains the set of macros that the lexer uses to recognize
instances of (a).
For extra credit, the macros could be hypothetically expanded, the
results parsed, and the annotations generated on instances of the
macro arguments mapped back to their occurrence as arguments.  Maybe
some kind of "unfold" notation could be used to see the results of the
expansion and the resulting annotations in context.
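
A toy C++ sketch of how (a) and (b) could fit together, purely to illustrate
the data flow: `collect_macros` plays the role of (b) by gathering names from
`#define` lines, and `lex` plays the role of the lexer by tagging any
identifier found in that set as a macro keyword. All names here (`Kind`,
`Token`, `collect_macros`, `lex`) are hypothetical, and this is neither a
tree-sitter external scanner nor a complete preprocessor (it ignores `#undef`,
`# define` with a space, line continuations, and macros defined in included
headers such as Qt's).

```cpp
#include <cctype>
#include <cstddef>
#include <set>
#include <sstream>
#include <string>
#include <vector>

enum class Kind { Identifier, MacroKeyword, Other };

struct Token {
    Kind kind;
    std::string text;
};

// (b) Collect the names introduced by "#define NAME ..." lines.
std::set<std::string> collect_macros(const std::string& src) {
    std::set<std::string> names;
    std::istringstream in(src);
    std::string line;
    while (std::getline(in, line)) {
        std::istringstream fields(line);
        std::string directive, name;
        if (fields >> directive >> name && directive == "#define") {
            // Drop a parameter list: "MAX(a," becomes "MAX".
            names.insert(name.substr(0, name.find('(')));
        }
    }
    return names;
}

// (a) Lexer that emits MacroKeyword for identifiers present in `macros`.
std::vector<Token> lex(const std::string& src,
                       const std::set<std::string>& macros) {
    std::vector<Token> out;
    std::size_t i = 0;
    while (i < src.size()) {
        const unsigned char c = static_cast<unsigned char>(src[i]);
        if (std::isspace(c)) { ++i; continue; }
        if (std::isalpha(c) || c == '_') {
            std::size_t j = i;
            while (j < src.size() &&
                   (std::isalnum(static_cast<unsigned char>(src[j])) ||
                    src[j] == '_')) {
                ++j;
            }
            const std::string word = src.substr(i, j - i);
            const Kind kind =
                macros.count(word) ? Kind::MacroKeyword : Kind::Identifier;
            out.push_back({kind, word});
            i = j;
        } else {
            // Punctuation, digits, etc.: one character per token.
            out.push_back({Kind::Other, std::string(1, src[i])});
            ++i;
        }
    }
    return out;
}
```

The point of the split is that the grammar only needs the single generic
terminal, while the set of names that map to it can change as definitions are
added or removed.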

Lynn




* Re: tree-sitter: conceptional problem solvable at Emacs' level?
  2023-02-11  6:51                 ` Theodor Thornhill
  2023-02-11  7:11                   ` Konstantin Kharlamov
@ 2023-04-16 19:21                   ` Konstantin Kharlamov
  1 sibling, 0 replies; 26+ messages in thread
From: Konstantin Kharlamov @ 2023-04-16 19:21 UTC (permalink / raw)
  To: Theodor Thornhill, emacs-devel, Po Lu, Eli Zaretskii; +Cc: Holger Schurig

On Sat, 2023-02-11 at 07:51 +0100, Theodor Thornhill wrote:
>
>
> On 11 February 2023 07:36:26 CET, Konstantin Kharlamov <hi-angel@yandex.ru>
> wrote:
> > On Sat, 2023-02-11 at 09:25 +0300, Konstantin Kharlamov wrote:
> > > On Sat, 2023-02-11 at 10:17 +0800, Po Lu wrote:
> > > > Eli Zaretskii <eliz@gnu.org> writes:
> > > >
> > > > > However, I meant the IDEs which are using tree-sitter and support
> > > > > developing C/C++ programs.  I believe some do.
> > > >
> > > > I think most of those have similar problems supporting macros.
> > > > Who knows their names? I may be able to ask some of their users.
> > >
> > > From my experience on and off work, there are just two IDEs (as in, not
> > > editors) used most widely for C++ code: QtCreator and Visual Studio. The
> > > first you discussed, the second is proprietary.
> > >
> > > Then again, people most often code in C++ and C with text editors; in that
> > > case the popular choices, in my experience, are Sublime Text and VS Code.
> > > These two don't use tree-sitter either.
> >
> > I installed Sublime Text on my Archlinux and tested with the C++ code OP
> > posted.
> >
> > What I see is that ST does seem confused about indentation when making a
> > newline right after the `slots:` line.
> >
> > However, if you make a newline after the `void someSlot() {};` line, it
> > will reuse the indentation of the previous line.
> >
> > The default cc-mode in Emacs works similarly. The cc-ts-mode, on the other
> > hand, doesn't make use of the previous indentation, and I think it should. It
> > would resolve that problem and others, because in my experience it happens
> > very often in C and C++ code that you want some custom indentation level, so
> > you just make one and expect the editor to keep it while creating more new
> > lines.
> >
>
> That last statement sounds easily solvable. Can you send me a short example
> describing exactly what you want in a code snippet and I'll add it.
>
> Thanks,
> Theo

Incidentally, I'm reading Stefan Monnier's paper presenting SMIE¹, and it has the
following example:

> […] It also means that the indentation code should strive to obey previous choices
> that the user made. For example if the user wants to indent its code in the
> following unconventional way:
>
>  longfunctionname(argument1, argument2,
>                      argument3,
>                      argument4);
>
> while the user should not be surprised if the auto-indenter tries to align
> ‘argument3’ with ‘argument1’, it would be reasonable for them to expect that
> ‘argument4’ stays put by simply aligning it with its nearest sibling rather than
> with the earlier ‘argument1’.

That is to say: yeah, the previous indentation level should be used by default if 
present.
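
For what it's worth, a rough sketch of the "nearest sibling" idea in plain
C++.  This is not how c-ts-mode or SMIE compute indentation: the helper names
are made up, "sibling" is approximated by bracket nesting depth, brackets
inside strings and comments are not handled, and ruleIndent stands for
whatever the mode's normal rules would pick.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of leading spaces/tabs (a tab counts as one column, for simplicity).
std::size_t leadingWhitespace(const std::string& line) {
    std::size_t n = 0;
    while (n < line.size() && (line[n] == ' ' || line[n] == '\t')) ++n;
    return n;
}

bool isBlank(const std::string& line) {
    return leadingWhitespace(line) == line.size();
}

// Net change in bracket depth across one line.
// Simplification: brackets inside strings and comments are counted too.
int depthDelta(const std::string& line) {
    int delta = 0;
    for (char c : line) {
        if (c == '(' || c == '[' || c == '{') ++delta;
        else if (c == ')' || c == ']' || c == '}') --delta;
    }
    return delta;
}

// Hypothetical helper: indentation for a new line appended after `lines`.
// Reuses the indentation of the nearest preceding line that starts at the
// same bracket depth (the "nearest sibling"); otherwise returns ruleIndent.
std::size_t indentForNewLine(const std::vector<std::string>& lines,
                             std::size_t ruleIndent) {
    std::vector<int> startDepth;  // bracket depth at the start of each line
    int depth = 0;
    for (const std::string& line : lines) {
        startDepth.push_back(depth);
        depth += depthDelta(line);
    }
    const int target = depth;  // depth at which the new line starts

    for (std::size_t i = lines.size(); i-- > 0;) {
        if (isBlank(lines[i])) continue;
        if (startDepth[i] < target) break;  // the line that opened our construct
        if (startDepth[i] == target) return leadingWhitespace(lines[i]);
        // Deeper lines belong to nested constructs; skip them.
    }
    return ruleIndent;
}
```

With only the longfunctionname(argument1, argument2, line in the buffer there
is no sibling yet, so the rule-based column is used; once the user has placed
argument3 by hand, the next line picks up argument3's column instead.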

1: https://arxiv.org/abs/2006.03103





end of thread, other threads:[~2023-04-16 19:21 UTC | newest]

Thread overview: 26+ messages
2023-02-09  8:09 tree-sitter: conceptional problem solvable at Emacs' level? Holger Schurig
2023-02-09  8:17 ` Po Lu
2023-02-09  8:50   ` Eli Zaretskii
2023-02-09 10:13     ` Po Lu
2023-02-09 10:55       ` Eli Zaretskii
2023-02-10  7:33   ` Yuan Fu
2023-02-10  8:42     ` Eli Zaretskii
     [not found]       ` <CAOpc7mHX6s0B8vdDee+9FMvQejGTSL3jzgwVekS7Esg-AOf=jw@mail.gmail.com>
2023-02-10 11:48         ` Eli Zaretskii
2023-02-11  2:17           ` Po Lu
2023-02-11  6:25             ` Konstantin Kharlamov
2023-02-11  6:36               ` Konstantin Kharlamov
2023-02-11  6:51                 ` Theodor Thornhill
2023-02-11  7:11                   ` Konstantin Kharlamov
2023-02-11  7:53                     ` Konstantin Kharlamov
2023-02-11  8:22                     ` Konstantin Kharlamov
2023-02-11  8:41                       ` Theodor Thornhill
2023-02-11  9:37                         ` Konstantin Kharlamov
2023-02-11 10:25                           ` Konstantin Kharlamov
2023-02-11  8:43                     ` Eli Zaretskii
2023-04-16 19:21                   ` Konstantin Kharlamov
2023-02-11  9:34       ` Ihor Radchenko
2023-02-11 10:42         ` Eli Zaretskii
2023-02-11 13:58         ` Lynn Winebarger
2023-02-09 16:25 ` Ergus
2023-02-09 20:09   ` Dmitry Gutov
2023-02-10  7:41     ` Holger Schurig
