unofficial mirror of emacs-devel@gnu.org 
* feature/tree-sitter: Where to Put C/C++ Stuff
From: Randy Taylor @ 2022-11-01  2:30 UTC (permalink / raw)
  To: emacs-devel@gnu.org


Hi.

Where specifically should the C and C++ tree-sitter stuff go? I've been using it for a couple months and would like to upstream syntax highlighting for both. I'll focus on getting C done first.

I see there are a lot of cc- files; would it be appropriate to add the tree-sitter stuff into a new cc-treesit.el file?
Thanks.



* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Theodor Thornhill @ 2022-11-01  5:44 UTC (permalink / raw)
  To: emacs-devel, Randy Taylor, emacs-devel@gnu.org



On 1 November 2022 03:30:54 CET, Randy Taylor <dev@rjt.dev> wrote:
>Hi.
>
>Where specifically should the C and C++ tree-sitter stuff go? I've been using it for a couple months and would like to upstream syntax highlighting for both. I'll focus on getting C done first.
>
>I see there are a lot of cc- files; would it be appropriate to add the tree-sitter stuff into a new cc-treesit.el file?
>Thanks.

I'm no authority on the matter, but I'd love for us not to complicate things too much. I vote for separate, non-cc-prefixed _new_ modes that derive from prog-mode.

I understand that this is a controversial opinion, but that's what I want. I believe people will do that anyway if we don't.

Theo




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Eli Zaretskii @ 2022-11-01  7:20 UTC (permalink / raw)
  To: Randy Taylor, Alan Mackenzie; +Cc: emacs-devel

> Date: Tue, 01 Nov 2022 02:30:54 +0000
> From: Randy Taylor <dev@rjt.dev>
> 
> Where specifically should the C and C++ tree-sitter stuff go? I've been using it for a couple months and would
> like to upstream syntax highlighting for both. I'll focus on getting C done first.
> 
> I see there are a lot of cc- files; would it be appropriate to add the tree-sitter stuff into a new cc-treesit.el file?

I suggest a separate cc-*.el file (e.g., cc-treesit.el), and some user
option to trigger its use instead of (or maybe in addition to, as the
case may be) the equivalent CC mode stuff.

Alan, are you okay with this approach?




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Eli Zaretskii @ 2022-11-01  7:24 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, dev, emacs-devel

> Date: Tue, 01 Nov 2022 06:44:38 +0100
> From: Theodor Thornhill <theo@thornhill.no>
> 
> >Where specifically should the C and C++ tree-sitter stuff go? I've been using it for a couple months and would like to upstream syntax highlighting for both. I'll focus on getting C done first.
> >
> >I see there are a lot of cc- files; would it be appropriate to add the tree-sitter stuff into a new cc-treesit.el file?
> >Thanks.
> 
> I'm no authority on the matter, but I'd love for us not to complicate things too much. I vote for separate, non-cc-prefixed _new_ modes that derive from prog-mode.

That'd mean people will need either to invent all the other goodies in
CC mode (everything except fontifications and indentation) from
scratch, or give up all those other goodies.  Does that make sense?

Tree-sitter doesn't (and cannot) replace everything a major mode does
for a programming language.  So a completely new mode means we throw
the baby out with the bathwater.




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Theodor Thornhill @ 2022-11-01  7:55 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dev, emacs-devel


Hi Eli!

>> Date: Tue, 01 Nov 2022 06:44:38 +0100
>> From: Theodor Thornhill <theo@thornhill.no>
>> 
>> >Where specifically should the C and C++ tree-sitter stuff go? I've
>> >been using it for a couple months and would like to upstream syntax
>> >highlighting for both. I'll focus on getting C done first.
>> >
>> >I see there are a lot of cc- files; would it be appropriate to add
>> >the tree-sitter stuff into a new cc-treesit.el file?  Thanks.
>> 
>> I'm no authority on the matter, but I'd love for us not to complicate
>> things too much. I vote for separate, non-cc-prefixed _new_ modes,
>> that derive from prog-mode.
>
> That'd mean people will need either to invent all the other goodies in
> CC mode (everything except fontifications and indentation) from
> scratch, or give up all those other goodies.  Does that make sense?
>

Yes, well, partially.  I think that we are too likely to create unwanted
issues by merging the two too closely.  I have seen several of these
issues over the last couple of years while implementing c-sharp mode in cc
mode, emacs-tree-sitter and treesit.  There are several things that are
happening.  I'll try to expand on some of them, both to give some
perspective and to point out specific places where we could improve so
that maybe we don't have a problem with this at all.

1: Use CC mode for one thing and tree-sitter for the rest
While first implementing tree-sitter in c-sharp mode we tried just
applying font-locking, and using cc mode for indentation and the rest.
What happened was that we immediately inherited the performance issues
from cc mode straight into our code.  Specifically, when typing in a
file with too many (from cc mode's perspective) strings, typing lag rose
to several seconds per press.  I filed several bug reports on this both
here and to Alan.  After some time and much heroics we got some
improvement on this from Alan, but c-sharp already had moved on.

2: Using separate names for modes.
The great advantage here is easy to understand.  You have no inheritance
issues, and are free to develop features without regard to legacy.  A
disadvantage is that some users depend on that major mode name for other
stuff.  We had some issues filed with us to flip over to tree-sitter
completely, because that name (csharp-mode) was so important compared to
(csharp-tree-sitter-mode).  We almost made the change, but then Yuan
started his work so we waited.  This would have sunsetted the cc mode
almost immediately.

3: Confusion with where to file bugs
We have many bugs in c-sharp mode where some things are emacs bugs, some
things are cc mode bugs, some are tree-sitter bugs and some are our own
bugs.  There is a real issue with understanding cc mode and figuring out
where a bug fix should end up.  It has taken me many weeks worth of
digging to understand only the simplest mechanisms of cc mode.
With tree-sitter, contributors can become productive within a couple of
hours.  To dismiss this point purely for the sake of compatibility with
cc mode is a huge mistake, IMO.

4: How do we know what to disable?
If there's a problem somewhere in the tree-sitter variant of the cc mode
derived new mode, and we see some issue - who makes the fix?  For
example, previously there was limited support for multiline strings in
cc mode, which took almost a year to finalize.  The tree-sitter variant
with more performance and accuracy took me maybe 20 minutes in a
work-meeting.  Should a feature that is simple to implement in the
tree-sitter variant wait for a similar cc mode implementation?  The
namespacing seems to suggest that yes, it should.

5: While tree-sitter is only an engine, it provides a lot more goodies
We have a huge opportunity to create real new frameworks for emacs now,
but limiting ourselves to merging the features/modes suggests that we cannot
reliably do overarching advancements such as we see now in the
feature/tree-sitter branch.  For example, many small hacks I've made in
the modes I've submitted thus far have made it into general mechanisms in
treesit.el.  All modes that enable tree-sitter should be able to use
these and all the new ones that come _without_ worrying whether or not some
issue will crop up from inheriting from cc mode or some other thing.
Examples are indentation styles, paredit-like functionalities,
refactorings and more.

6: What are the goodies that we really need from CC mode?
CC mode provides indentation and font locking.  What else does it
provide that isn't replaceable pretty quickly?  I mean this not as a
contrarian, but out of real curiosity.  My guess is that we can get to
feature parity and well beyond that in a very short amount of time, if
we're not hindered by merging everything.


Sorry for the long mail, but I think we are missing the point by viewing
tree-sitter simply as an engine to plop in alongside cc mode for
convenience, and not the real infrastructure change it is.  There is no
need to sunset cc mode, but equally there is no need to limit tree-sitter.


> Tree-sitter doesn't (and cannot) replace everything a major mode does
> for a programming language.  So a completely new mode means we throw
> the baby out with the bathwater.

I don't agree, but I'm very curious as to what else would take
significant effort, _apart_ from indentation feature parity with cc mode.

One thing I know of is integration with package managers such as what
elm-mode and go-mode do, but that is an easy fix.  The upstream
go-mode, if it cannot be moved to core, can just derive from a simple
go-treesit, skip all indentation and font-locking in its own mode, but
supply the goodies.

-- 
Theo




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Yuan Fu @ 2022-11-01  9:22 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: Eli Zaretskii, emacs-devel, dev

Before we jump into discussions, I want to note that many of your (Theo’s) arguments seem to be against cc-mode rather than “using the same major mode”. For major modes that don’t use cc-mode (like python-mode), tree-sitter and non-tree-sitter features so far coexist just fine.

>> 
>> That'd mean people will need either to invent all the other goodies in
>> CC mode (everything except fontifications and indentation) from
>> scratch, or give up all those other goodies.  Does that make sense?
>> 
> 
> Yes, well, partially.  I think that we are too likely to create unwanted
> issues by merging the two too closely.  I have seen several of these
> issues over the last couple of years while implementing c-sharp mode in cc
> mode, emacs-tree-sitter and treesit.  There are several things that are
> happening.  I'll try to expand on some of them, both to give some
> perspective and to point out specific places where we could improve so
> that maybe we don't have a problem with this at all.
> 
> 1: Use CC mode for one thing and tree-sitter for the rest
> While first implementing tree-sitter in c-sharp mode we tried just
> applying font-locking, and using cc mode for indentation and the rest.
> What happened was that we immediately inherited the performance issues
> from cc mode straight into our code.  Specifically, when typing in a
> file with too many (from cc mode's perspective) strings, typing lag rose
> to several seconds per press.  I filed several bug reports on this both
> here and to Alan.  After some time and much heroics we got some
> improvement on this from Alan, but c-sharp already had moved on.
> 
> 2: Using separate names for modes.
> The great advantage here is easy to understand.  You have no inheritance
> issues, and are free to develop features without regard to legacy.  A
> disadvantage is that some users depend on that major mode name for other
> stuff.  We had some issues filed with us to flip over to tree-sitter
> completely, because that name (csharp-mode) was so important compared to
> (csharp-tree-sitter-mode).  We almost made the change, but then Yuan
> started his work so we waited.  This would have sunsetted the cc mode
> almost immediately.
> 
> 3: Confusion with where to file bugs
> We have many bugs in c-sharp mode where some things are emacs bugs, some
> things are cc mode bugs, some are tree-sitter bugs and some are our own
> bugs.  There is a real issue with understanding cc mode and figuring out
> where a bug fix should end up.  It has taken me many weeks worth of
> digging to understand only the simplest mechanisms of cc mode.
> With tree-sitter, contributors can become productive within a couple of
> hours.  To dismiss this point purely for the sake of compatibility with
> cc mode is a huge mistake, IMO.
> 
> 4: How do we know what to disable?
> If there's a problem somewhere in the tree-sitter variant of the cc mode
> derived new mode, and we see some issue - who makes the fix?  For
> example, previously there was limited support for multiline strings in
> cc mode, which took almost a year to finalize.  The tree-sitter variant
> with more performance and accuracy took me maybe 20 minutes in a
> work-meeting.  Should a feature that is simple to implement in the
> tree-sitter variant wait for a similar cc mode implementation?  The
> namespacing seems to suggest that yes, it should.

I don’t think it should (on which I think we both agree). And I don’t think it’s any problem if a major mode has some tree-sitter-powered feature that the non-tree-sitter version doesn’t have.

> 
> 5: While tree-sitter is only an engine, it provides a lot more goodies
> We have a huge opportunity to create real new frameworks for emacs now,
> but limiting ourselves to merging the features/modes suggests that we cannot
> reliably do overarching advancements such as we see now in the
> feature/tree-sitter branch.  For example, many small hacks I've made in
> the modes I've submitted thus far have made it into general mechanisms in
> treesit.el.  All modes that enable tree-sitter should be able to use
> these and all the new ones that come _without_ worrying whether or not some
> issue will crop up from inheriting from cc mode or some other thing.
> Examples are indentation styles, paredit-like functionalities,
> refactorings and more.
> 
> 6: What are the goodies that we really need from CC mode?
> CC mode provides indentation and font locking.  What else does it
> provide that isn't replaceable pretty quickly?  I mean this not as a
> contrarian, but out of real curiosity.  

One thing I found, which might be the only thing, is filling, specifically filling the /* */ style comments while respecting all styles of drawing stars in these comments. I mean all the styles like

/*
 *
 */

/*=====================================

=======================================*/

Etc, etc. I tried to look at c-mask-paragraph, and it is very complicated. Maybe we can use c-fill-paragraph without setting up the rest of cc-mode?

> My guess is that we can get to
> feature parity and well beyond that in a very short amount of time, if
> we're not hindered by merging everything.
> 
> 
> Sorry for the long mail, but I think we are missing the point by viewing
> tree-sitter simply as an engine to plop in alongside cc mode for
> convenience, and not the real infrastructure change it is.  There is no
> need to sunset cc mode, but equally there is no need to limit tree-sitter.
> 

If mixing cc-mode and tree-sitter brings more problems than merit, maybe we can adopt a mutually exclusive policy, where a major mode either sets up cc-mode or uses tree-sitter, but never both together.

> 
>> Tree-sitter doesn't (and cannot) replace everything a major mode does
>> for a programming language.  So a completely new mode means we throw
>> the baby out with the bathwater.
> 
> I don't agree, but I'm very curious as to what else would take
> significant effort, _apart_ from indentation feature parity with cc mode.

Tree-sitter is just a tool; obviously there are things a major mode provides that don’t involve a parser, e.g., python’s REPL. But I see no problem putting this feature alongside tree-sitter features in the same major mode.

> 
> One thing I know of is integration with package managers such as what
> elm-mode and go-mode do, but that is an easy fix.  The upstream
> go-mode, if it cannot be moved to core, can just derive from a simple
> go-treesit, skip all indentation and font-locking in its own mode, but
> supply the goodies.
> 
> -- 
> Theo





* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Theodor Thornhill @ 2022-11-01  9:41 UTC (permalink / raw)
  To: Yuan Fu; +Cc: Eli Zaretskii, emacs-devel, dev


Hi Yuan!

> Before we jump into discussions, I want to note that many of your
> (Theo’s) arguments seem to be against cc-mode rather than “using the
> same major mode”. For major modes that don’t use cc-mode (like
> python-mode), tree-sitter and non-tree-sitter features so far coexist
> just fine.
>

Yes, absolutely, but that's mostly because of the nature of complexity
in cc-mode.  For the record - I'm not against cc-mode, I'm actually
pretty impressed.  But I'm wary of the consequences of mixing
complexities here.  Also I'm not sure if cc-mode "owning" java-mode,
js-mode, c-mode, c++-mode etc makes sense.  They are their own
languages, and should maybe live as such.

>> 4: How do we know what to disable?
>> If there's a problem somewhere in the tree-sitter variant of the cc mode
>> derived new mode, and we see some issue - who makes the fix?  For
>> example, previously there was limited support for multiline strings in
>> cc mode, which took almost a year to finalize.  The tree-sitter variant
>> with more performance and accuracy took me maybe 20 minutes in a
>> work-meeting.  Should a feature that is simple to implement in the
>> tree-sitter variant wait for a similar cc mode implementation?  The
>> namespacing seems to suggest that yes, it should.
>
> I don’t think it should (on which I think we both agree). And I don’t
> think it’s any problem if a major mode has some tree-sitter-powered
> feature that the non-tree-sitter version doesn’t have.
>

I agree.  For example I'm all for some variant of what we're doing in
js-mode.  I think we're still not there, but mixing _can_ be done.

>> 6: What are the goodies that we really need from CC mode?
>> CC mode provides indentation and font locking.  What else does it
>> provide that isn't replaceable pretty quickly?  I mean this not as a
>> contrarian, but out of real curiosity.  
>
> One thing I found, which might be the only thing, is filling,
> specifically filling the /* */ style comments while respecting all
> styles of drawing stars in these comments. I mean all the styles like
>
> /*
>  *
>  */
>
> /*=====================================
>
> =======================================*/
>
> Etc, etc. I tried to look at c-mask-paragraph, and it is very
> complicated. Maybe we can use c-fill-paragraph without setting up the
> rest of cc-mode?

Yes, this is true.  Either we can see if it's possible to reuse it, or we
can roll our own down the line.  For the record, many things that try
to use fill, even in cc mode, aren't 100%.  I doubt that using only parts
of cc mode is really feasible without it bleeding into other places, but I
don't have the expertise to judge that alone.

>
>> My guess is that we can get to
>> feature parity and well beyond that in a very short amount of time, if
>> we're not hindered by merging everything.
>> 
>> 
>> Sorry for the long mail, but I think we are missing the point by viewing
>> tree-sitter simply as an engine to plop in alongside cc mode for
>> convenience, and not the real infrastructure change it is.  There is no
>> need to sunset cc mode, but equally there is no need to limit tree-sitter.
>> 
>
> If mixing cc-mode and tree-sitter brings more problems than merit,
> maybe we can adopt a mutually exclusive policy, where a major mode
> either sets up cc-mode or uses tree-sitter, but never together.
>

This is my hope :-)

>> 
>>> Tree-sitter doesn't (and cannot) replace everything a major mode does
>>> for a programming language.  So a completely new mode means we throw
>>> the baby out with the bathwater.
>> 
>> I don't agree, but I'm very curious as to what else would take
>> significant effort, _apart_ from indentation feature parity with cc mode.
>
> Tree-sitter is just a tool; obviously there are things a major mode
> provides that don’t involve a parser, e.g., python’s REPL. But I see
> no problem putting this feature alongside tree-sitter features in the
> same major mode.
>

I agree with you here as well.

-- 
Theo




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Eli Zaretskii @ 2022-11-01  9:57 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, dev, emacs-devel

> From: Theodor Thornhill <theo@thornhill.no>
> Cc: emacs-devel@gnu.org, dev@rjt.dev, emacs-devel@gnu.org
> Date: Tue, 01 Nov 2022 08:55:44 +0100
> 
> Yes, well, partially.  I think that we are too likely to create unwanted
> issues by merging the two too closely.

Then we should merge them "not too closely", I guess.  The challenge
is to merge them so that we gain the most and lose the least.

> 1: Use CC mode for one thing and tree-sitter for the rest
> While first implementing tree-sitter in c-sharp mode we tried just
> applying font-locking, and using cc mode for indentation and the rest.
> What happened was that we immediately inherited the performance issues
> from cc mode straight into our code.

If those same performance issues exist today, then we don't lose
anything, do we?  We just gain less than we could.  But the amount of
work required for rewriting the other parts of CC Mode is huge, and we
don't want to leave users of CC Mode in a dilemma whether to switch to
a new mode and lose everything else for a significant amount of time,
or give up tree-sitter and stay with CC Mode.  Not something I'd agree
to.

I also have a hard time believing that you can reimplement those slow
parts of CC Mode to be much faster, but if you have code to show which
does that, I'm sure I'd be interested to look at it and consider
improving CC Mode using that code.

> Specifically, when typing in a
> file with too many (from cc mode's perspective) strings, typing lag rose
> to several seconds per press.  I filed several bug reports on this both
> here and to Alan.  After some time and much heroics we got some
> improvement on this from Alan, but c-sharp already had moved on.

I don't know what c-sharp mode does besides fontification and
indentation, but CC Mode does a lot more, see below.  If you
disregarded a significant part of that, or if it is not relevant for
editing C# code, then your particular experience is not very
educational for the purposes of this discussion, and could lead us to
wrong conclusions.

It is trivially correct that a new mode can move much faster and make
breaking changes, but this is unacceptable for a mode that comes with
Emacs.  We respect our users much more than 3rd-party packages out
there do, and we do that for good reasons.

> 2: Using separate names for modes.
> The great advantage here is easy to understand.  You have no inheritance
> issues, and are free to develop features without regard to legacy.  A
> disadvantage is that some users depend on that major mode name for other
> stuff.

That's a _huge_ disadvantage, in my book.

> 3: Confusion with where to file bugs

Not relevant in our case: the bugs should be filed with Emacs.

> 4: How do we know what to disable?
> If there's a problem somewhere in the tree-sitter variant of the cc mode
> derived new mode, and we see some issue - who makes the fix?

Also not relevant: the answer is "we the Emacs project make the fix".

> 5: While tree-sitter is only an engine, it provides a lot more goodies
> We have a huge opportunity to create real new frameworks for emacs now,
> but limiting ourselves to merging the features/modes suggests that we cannot
> reliably do overarching advancements such as we see now in the
> feature/tree-sitter branch.

Yes.  And trying to make breaking changes in important Emacs features
such as CC Mode is really a non-starter.  It isn't going to happen.

> 6: What are the goodies that we really need from CC mode?
> CC mode provides indentation and font locking.  What else does it
> provide that isn't replaceable pretty quickly?  I mean this not as a
> contrarian, but out of real curiosity.

CC Mode has a full-blown manual, where this question is answered.
Here's a partial list of features outside of the fontification and
indentation area, which I collected just by looking at the top-level
menus of that manual:

 . filling and breaking text in comments and strings
 . automatic insertion of newlines after braces, colons, commas, semi-colons
 . whitespace cleanups
 . minor modes: electric, hungry-delete, comment-style
 . c-offsets-alist and interactive indentation customization (related
   to indentation, but still extremely important, and not directly in
   tree-sitter)

> My guess is that we can get to feature parity and well beyond that
> in a very short amount of time, if we're not hindered by merging
> everything.

As they say, "show me the code".  If you can write up a C/C++ mode
from scratch which supports most everything in the CC Mode manual, do
it better/cleaner than CC Mode does, and do it before the emacs-29
branch is cut, in a month or so, I might change my mind.

> Sorry for the long mail, but I think we are missing the point by viewing
> tree-sitter simply as an engine to plop in alongside cc mode for
> convenience, and not the real infrastructure change it is.

Who said we view tree-sitter that way?

What actually happens is that we gradually introduce tree-sitter as an
engine for replacing the implementation of Emacs features where it is
faster and/or better.  That is the plan.  There's no limit to these
replacements, except what tree-sitter can do and how we can use that.
But one thing we will NOT do is throw away existing important features
before we have equivalent replacements and before users tell us the
replacements are indeed better.

> There is no need to sunset cc mode, but equally there is no need to
> limit tree-sitter.

There are no limits.  The fact that we use tree-sitter for what we use
it now is just because _we_ decided to do that initially, in order to
have it in Emacs 29 as a useful infrastructure that users can take
advantage of.  I don't believe in releasing Emacs with infrastructure
that has no user-level features built on it.

> > Tree-sitter doesn't (and cannot) replace everything a major mode does
> > for a programming language.  So a completely new mode means we throw
> > the baby out with the bathwater.
> 
> I don't agree, but I'm very curious as to what else would take
> significant effort, _apart_ from indentation feature parity with cc mode.

See above: just read the CC Mode manual, and see for yourself.




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
From: Theodor Thornhill @ 2022-11-01 11:53 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dev, emacs-devel

Eli Zaretskii <eliz@gnu.org> writes:

>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: emacs-devel@gnu.org, dev@rjt.dev, emacs-devel@gnu.org
>> Date: Tue, 01 Nov 2022 08:55:44 +0100
>> 
>> Yes, well, partially.  I think that we are too likely to create unwanted
>> issues by merging the two too closely.
>
> Then we should merge them "not too closely", I guess.  The challenge
> is to merge them so that we gain the most and lose the least.
>

That is reasonable.  It's just the sentiment that we should do a full-on
merge between tree-sitter and cc mode I don't like.  If we can find a
way to blend and still keep them distinct, we are on the correct path.
I don't have a clear solution, I'm afraid.  Personally I like how I did
it in ts-mode, where we fall back to cc mode if we cannot enable
tree-sitter.  That's not as easy an option for, e.g., Java, because Java
already exists.  So some code has to end up in cc-mode, unless we make
separate modes.

>> 1: Use CC mode for one thing and tree-sitter for the rest
>> While first implementing tree-sitter in c-sharp mode we tried just
>> applying font-locking, and using cc mode for indentation and the rest.
>> What happened was that we immediately inherited the performance issues
>> from cc mode straight into our code.
>
> If those same performance issues exist today, then we don't lose
> anything, do we?  We just gain less than we could.  But the amount of
> work required for rewriting the other parts of CC Mode is huge, and we
> don't want to leave users of CC Mode in a dilemma whether to switch to
> a new mode and lose everything else for a significant amount of time,
> or give up tree-sitter and stay with CC Mode.  Not something I'd agree
> to.
>

That is also reasonable.


> I also have hard time believing that you can reimplement those slow
> parts of CC Mode to be much faster, but if you have code to show which
> does that, I'm sure I'd be interested to look at it and consider
> improving CC Mode using that code.
>

You'd be surprised.

- https://github.com/emacs-csharp/csharp-mode/pull/251
- https://github.com/emacs-csharp/csharp-mode/issues/207
- https://github.com/emacs-csharp/csharp-mode/issues/164
- https://debbugs.gnu.org/db/43/43631.html
- https://github.com/emacs-csharp/csharp-mode/issues/151
- https://github.com/emacs-csharp/csharp-mode/issues/200

All of these are solved with [0], no implementation needed for anything
(apart from generic tree-sitter machinery of course). 


>> Specifically, when typing in a
>> file with too many (from cc mode's perspective) strings, typing lag rose
>> to several seconds per press.  I filed several bug reports on this both
>> here and to Alan.  After some time and much heroics we got some
>> improvement on this from Alan, but c-sharp already had moved on.
>
> I don't know what c-sharp mode does besides fontification and
> indentation, but CC Mode does a lot more, see below.  If you
> disregarded a significant part of that, or if it is not relevant for
> editing C# code, then your particular experience is not very
> educational for the purposes of this discussion, and could lead us to
> wrong conclusions.
>
> It is trivially correct that a new mode can move much faster and make
> breaking changes, but this is unacceptable for a mode that comes with
> Emacs.  We respect our users much more than 3rd-party packages out
> there do, and we do that for good reasons.
>

I don't believe I disregarded much here.  Yes, it is trivially correct, but
I've spent a lot of time to improve on the c#-cc-mode support, out of
the same reasons you mention.

>> 2: Using separate names for modes.
>> The great advantage here is easy to understand.  You have no inheritance
>> issues, and are free to develop features without regards to legacy.  A
>> disadvantage is that some users depend on that major mode name for other
>> stuff.
>
> That's a _huge_ disadvantage, in my book.
>

Yes I agree

>> 3: Confusion with where to file bugs
>
> Not relevant in our case: the bugs should be filed with Emacs.
>

Well, are you sure?  Diagnosing a bug and its origin is as important as
actually writing the code.  Trying to make that diagnosing step easier
isn't worthless.  Even though all bugs end up in Emacs, the likelihood
that some casual reader of this list submits some queries and a function
to tree-sitter is _much_ higher than the likelihood that almost anyone on
this list tries to grok cc.

>> 4: How do we know what to disable?
>> If there's a problem somewhere in the tree-sitter variant of the cc mode
>> derived new mode, and we see some issue - who makes the fix?
>
> Also not relevant: the answer is "we the Emacs project make the fix".
>

Sure, but we want as many as possible to be able to fix them, no?


>> 5: While tree-sitter is only an engine, it provides a lot more goodies
>> We have a huge opportunity to create real new frameworks for emacs now,
>> but limiting us to merge the features/modes suggests that we cannot
>> reliably do overarching advancements such as we see now in the
>> feature/tree-sitter branch.
>
> Yes.  And trying to make breaking changes in important Emacs features
> such as CC Mode is really a non-starter.  It isn't going to happen.
>

Ok.  Let me be clear.  I'm not suggesting breaking changes.  I'm only
saying that CC mode should go.  I agree with you here.  I'm trying to be
mindful with how, and offering some real, hard won experiences in this
exact tree-sitter/cc-mode gap.  It is trivially easy to say that we
should just add it to cc mode, not so much to know what some of the
hidden issues are.

>> 6: What are the goodies that we really need from CC mode?
>> CC mode provides indentation and font locking.  What else does it
>> provide that isn't replaceable pretty quickly?  I mean this not as a
>> contrarian, but out of real curiosity.
>
> CC Mode has a full-blown manual, where this question is answered.
> Here's a partial list of features outside of the fontification and
> indentation area, which I collected just by looking at the top-level
> menus of that manual:
>
>  . filling and breaking text in comments and strings
>  . automatic insertion of newlines after braces, colons, commas, semi-colons
>  . whitespace cleanups
>  . minor modes: electric, hungry-delete, comment-style
>  . c-offsets-alist and interactive indentation customization (related
>    to indentation, but still extremely important, and not directly in
>    tree-sitter)
>

Yes, I've read the manual many times.  Filling is one nice thing,
agreed.  Electric and hungry-delete are just sitting there waiting for us to
create a framework using tree-sitter that would benefit _all_ languages
supported by tree-sitter, not just cc.

>> My guess is that we can get to feature parity and well beyond that
>> in a very short amount of time, if we're not hindered by merging
>> everything.
>
> As they say, "show me the code".  If you can write up a C/C++ mode
> from scratch which supports most everything in the CC Mode manual, do
> it better/cleaner than CC Mode does, and do it before the emacs-29
> branch is cut, in a month or so, I might change my mind.
>

Challenge accepted.  Can I create it for java, which is a language I'm
writing a lot in these days?  It would be simpler for me to test with
stuff I use daily, but still very much related to CC mode functionality.
I can branch out from feature/tree-sitter and create
progmodes/java-ts-mode.el in scratch/tree-sitter/java, then we can
decide if some variant of it should be merged in to tree-sitter before
the branch is cut.  What do you think?  If so, it would be nice to be
able to commit myself to simplify rebasing/merging with
feature/tree-sitter, and also not littering Yuan with reviews.

>> Sorry for the long mail, but I think we are missing the point by viewing
>> tree-sitter simply as an engine to plop in aside cc mode for
>> convenience, and not the real infrastructure change it is.
>
> Who said we view tree-sitter that way?
>
> What actually happens is that we gradually introduce tree-sitter as an
> engine for replacing the implementation of Emacs features where it is
> faster and/or better.  That is the plan.  There's no limit to these
> replacements, except what tree-sitter can do and how we can use that.
> But one thing we will NOT do is throw away existing important features
> before we have equivalent replacements and before users tell us the
> replacements are indeed better.
>

Yes, I don't disagree and never said we should.  If I did, then I misspoke.

>> There is no need to sunset cc mode, but equally there is no need to
>> limit tree-sitter.
>
> There's no limits.  The fact that we use tree-sitter for what we use
> it now is just because _we_ decided to do that initially, in order to
> have it in Emacs 29 as a useful infrastructure that users can take
> advantage of.  I don't believe in releasing Emacs with infrastructure
> that has no user-level features built on it.
>

And which is why I try to create some actual, useful modes for us for
the merge.

>> > Tree-sitter doesn't (and cannot) replace everything a major mode does
>> > for a programming language.  So a completely new mode means we throw
>> > the baby out with the bathwater.
>> 
>> I don't agree, but I'm very curious as to what else, _apart_ from
>> indentation feature parity with cc mode, would take a significant effort.
>
> See above: just read the CC Mode manual, and see for yourself.

I have, many times :-)


-- 
Theo


[0]: https://github.com/emacs-csharp/csharp-mode/blob/master/csharp-tree-sitter.el#L69-L78




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01  7:20 ` Eli Zaretskii
@ 2022-11-01 12:10   ` Alan Mackenzie
  0 siblings, 0 replies; 28+ messages in thread
From: Alan Mackenzie @ 2022-11-01 12:10 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Randy Taylor, emacs-devel

Hello, Eli.

On Tue, Nov 01, 2022 at 09:20:33 +0200, Eli Zaretskii wrote:
> > Date: Tue, 01 Nov 2022 02:30:54 +0000
> > From: Randy Taylor <dev@rjt.dev>

> > Where specifically should the C and C++ tree-sitter stuff go? I've been using it for a couple months and would
> > like to upstream syntax highlighting for both. I'll focus on getting C done first.

> > I see there are a lot of cc- files; would it be appropriate to add the tree-sitter stuff into a new cc-treesit.el file?

> I suggest a separate cc-*.el file (e.g., cc-treesit.el), and some user
> option to trigger its use instead of (or maybe in addition to, as the
> case may be) the equivalent CC mode stuff.

> Alan, are you okay with this approach?

Yes, certainly.  It is the approach I would have chosen myself.  The key
sequence C-c C-t is currently unused in CC Mode, and it would seem ideal
to toggle tree-sitter with.
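
A minimal sketch of the approach agreed on here, assuming CC Mode is
available: a user option that a future cc-treesit.el could consult,
plus a command on C-c C-t that flips it for the current buffer and
re-runs the major mode.  The names `my-cc-treesit-enable' and
`my-cc-treesit-toggle' are hypothetical; nothing in CC Mode defines
them, and the real option would also need the mode's setup code to
honor it.

```elisp
;;; Hypothetical sketch: none of these names exist in CC Mode.
(defcustom my-cc-treesit-enable nil
  "Non-nil means CC Mode should use tree-sitter where available.
A real cc-treesit.el would consult this when the mode starts."
  :type 'boolean
  :group 'languages)

;; Keep the buffer-local value when the major mode is re-run below.
(put 'my-cc-treesit-enable 'permanent-local t)

(defun my-cc-treesit-toggle ()
  "Toggle `my-cc-treesit-enable' in this buffer and restart the mode."
  (interactive)
  (setq-local my-cc-treesit-enable (not my-cc-treesit-enable))
  ;; Re-running the major mode lets its setup code see the new value.
  (funcall major-mode)
  (message "Tree-sitter support %s"
           (if my-cc-treesit-enable "enabled" "disabled")))

;; C-c C-t is the key suggested above; `c-mode-base-map' is the
;; keymap CC Mode shares between all of its language modes.
(with-eval-after-load 'cc-mode
  (define-key c-mode-base-map (kbd "C-c C-t") #'my-cc-treesit-toggle))
```

The `permanent-local' property matters here: without it,
`kill-all-local-variables' (run by every major mode) would discard
the buffer-local setting the moment the mode restarts.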

-- 
Alan Mackenzie (Nuremberg, Germany).




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 11:53         ` Theodor Thornhill
@ 2022-11-01 12:28           ` Eli Zaretskii
  2022-11-01 13:05             ` Theodor Thornhill
                               ` (2 more replies)
  0 siblings, 3 replies; 28+ messages in thread
From: Eli Zaretskii @ 2022-11-01 12:28 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, dev, emacs-devel

> From: Theodor Thornhill <theo@thornhill.no>
> Cc: emacs-devel@gnu.org, dev@rjt.dev, emacs-devel@gnu.org
> Date: Tue, 01 Nov 2022 12:53:11 +0100
> 
> > I also have a hard time believing that you can reimplement those slow
> > parts of CC Mode to be much faster, but if you have code to show which
> > does that, I'm sure I'd be interested to look at it and consider
> > improving CC Mode using that code.
> >
> 
> You'd be surprised.
> 
> - https://github.com/emacs-csharp/csharp-mode/pull/251
> - https://github.com/emacs-csharp/csharp-mode/issues/207
> - https://github.com/emacs-csharp/csharp-mode/issues/164
> - https://debbugs.gnu.org/db/43/43631.html
> - https://github.com/emacs-csharp/csharp-mode/issues/151
> - https://github.com/emacs-csharp/csharp-mode/issues/200
> 
> All of these are solved with [0], no implementation needed for anything
> (apart from generic tree-sitter machinery of course). 

That's for C#, not for C/C++.

But if you can do the same for C/C++, sure, let's see the code and
judge its relative merits and demerits.

> >> 3: Confusion with where to file bugs
> >
> > Not relevant in our case: the bugs should be filed with Emacs.
> >
> 
> Well, are you sure?

You asked where to file the bugs.  The answer is: on debbugs.  If it
eventually turns out the bug is in tree-sitter, we will file a bug
there.  Just like we do with any other library we use.  Nothing new
here, IMO.

> >  . filling and breaking text in comments and strings
> >  . automatic insertion of newlines after braces, colons, commas, semi-colons
> >  . whitespace cleanups
> >  . minor modes: electric, hungry-delete, comment-style
> >  . c-offsets-alist and interactive indentation customization (related
> >    to indentation, but still extremely important, and not directly in
> >    tree-sitter)
> >
> 
> Yes, I've read the manual many times.  Filling is one nice thing,
> agreed.  Electric and hungry-delete are just sitting there waiting for us to
> create a framework using tree-sitter that would benefit _all_ languages
> supported by tree-sitter, not just cc.

If tree-sitter can make these easier or faster or better, I see no
reason not to use tree-sitter for (some of) those features as well.
There's no decision to limit tree-sitter's use to fontification and
indentation, and I don't think we will ever make such decisions,
except if we have some bitter experience.

> > As they say, "show me the code".  If you can write up a C/C++ mode
> > from scratch which supports most everything in the CC Mode manual, do
> > it better/cleaner than CC Mode does, and do it before the emacs-29
> > branch is cut, in a month or so, I might change my mind.
> 
> Challenge accepted.  Can I create it for java, which is a language I'm
> writing a lot in these days?

Sorry, no.  It has to support all the languages supported by CC Mode
now.  That's the challenge.

It is fine by me to have a separate java-mode, but then I personally
will not be very interested in this, since editing the Emacs C code,
which I do a lot, will still need to use CC Mode.  Without decent
support for C/C++, CC Mode cannot be retired.

(Do people really use Emacs to develop Java?  I'd be surprised.)




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 12:28           ` Eli Zaretskii
@ 2022-11-01 13:05             ` Theodor Thornhill
  2022-11-01 13:10               ` Eli Zaretskii
  2022-11-01 13:12             ` Manuel Uberti
  2022-11-04 14:49             ` Benjamin Riefenstahl
  2 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill @ 2022-11-01 13:05 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dev



On 1 November 2022 13:28:05 CET, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: emacs-devel@gnu.org, dev@rjt.dev, emacs-devel@gnu.org
>> Date: Tue, 01 Nov 2022 12:53:11 +0100
>> 
>> > I also have a hard time believing that you can reimplement those slow
>> > parts of CC Mode to be much faster, but if you have code to show which
>> > does that, I'm sure I'd be interested to look at it and consider
>> > improving CC Mode using that code.
>> >
>> 
>> You'd be surprised.
>> 
>> - https://github.com/emacs-csharp/csharp-mode/pull/251
>> - https://github.com/emacs-csharp/csharp-mode/issues/207
>> - https://github.com/emacs-csharp/csharp-mode/issues/164
>> - https://debbugs.gnu.org/db/43/43631.html
>> - https://github.com/emacs-csharp/csharp-mode/issues/151
>> - https://github.com/emacs-csharp/csharp-mode/issues/200
>> 
>> All of these are solved with [0], no implementation needed for anything
>> (apart from generic tree-sitter machinery of course). 
>
>That's for C#, not for C/C++.
>
>But if you can do the same for C/C++, sure, let's see the code and
>judge its relative merits and demerits.
>
>> >> 3: Confusion with where to file bugs
>> >
>> > Not relevant in our case: the bugs should be filed with Emacs.
>> >
>> 
>> Well, are you sure?
>
>You asked where to file the bugs.  The answer is: on debbugs.  If it
>eventually turns out the bug is in tree-sitter, we will file a bug
>there.  Just like we do with any other library we use.  Nothing new
>here, IMO.
>
>> >  . filling and breaking text in comments and strings
>> >  . automatic insertion of newlines after braces, colons, commas, semi-colons
>> >  . whitespace cleanups
>> >  . minor modes: electric, hungry-delete, comment-style
>> >  . c-offsets-alist and interactive indentation customization (related
>> >    to indentation, but still extremely important, and not directly in
>> >    tree-sitter)
>> >
>> 
>> Yes, I've read the manual many times.  Filling is one nice thing,
>> agreed.  Electric and hungry-delete are just sitting there waiting for us to
>> create a framework using tree-sitter that would benefit _all_ languages
>> supported by tree-sitter, not just cc.
>
>If tree-sitter can make these easier or faster or better, I see no
>reason not to use tree-sitter for (some of) those features as well.
>There's no decision to limit tree-sitter's use to fontification and
>indentation, and I don't think we will ever make such decisions,
>except if we have some bitter experience.
>
>> > As they say, "show me the code".  If you can write up a C/C++ mode
>> > from scratch which supports most everything in the CC Mode manual, do
>> > it better/cleaner than CC Mode does, and do it before the emacs-29
>> > branch is cut, in a month or so, I might change my mind.
>> 
>> Challenge accepted.  Can I create it for java, which is a language I'm
>> writing a lot in these days?
>
>Sorry, no.  It has to support all the languages supported by CC Mode
>now.  That's the challenge.
>

Ok let's do it. But let's restrict it to languages considered stable from https://tree-sitter.github.io/tree-sitter/#available-parsers

- c
- c++
- c#
- java
- javascript
- typescript
- json

Ok? 

>It is fine by me to have a separate java-mode, but then I personally
>will not be very interested in this, since editing the Emacs C code,
>which I do a lot, will still need to use CC Mode.  Without decent
>support for C/C++, CC Mode cannot be retired.
>
>(Do people really use Emacs to develop Java?  I'd be surprised.)

Yes. I do, no problem

Theo 




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:05             ` Theodor Thornhill
@ 2022-11-01 13:10               ` Eli Zaretskii
  2022-11-01 13:27                 ` Theodor Thornhill
  2022-11-01 16:09                 ` tomas
  0 siblings, 2 replies; 28+ messages in thread
From: Eli Zaretskii @ 2022-11-01 13:10 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, dev

> Date: Tue, 01 Nov 2022 14:05:39 +0100
> From: Theodor Thornhill <theo@thornhill.no>
> CC: emacs-devel@gnu.org, dev@rjt.dev
> 
> >> Challenge accepted.  Can I create it for java, which is a language I'm
> >> writing a lot in these days?
> >
> >Sorry, no.  It has to support all the languages supported by CC Mode
> >now.  That's the challenge.
> >
> 
> Ok let's do it. But let's restrict it to languages considered stable from https://tree-sitter.github.io/tree-sitter/#available-parsers
> 
> - c
> - c++
> - c#
> - java
> - javascript
> - typescript
> - json
> 
> Ok? 

You mean, C, C++, and Java?  Yes, SGTM.  That'd leave Objective C,
IDL, Awk, and Pike out.

> >(Do people really use Emacs to develop Java?  I'd be surprised.)
> 
> Yes. I do, no problem

I meant the stuff that's missing in Emacs which is present in any
decent Java IDE.  Maybe you use Emacs for Java with many add-on
packages?




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 12:28           ` Eli Zaretskii
  2022-11-01 13:05             ` Theodor Thornhill
@ 2022-11-01 13:12             ` Manuel Uberti
  2022-11-04 14:49             ` Benjamin Riefenstahl
  2 siblings, 0 replies; 28+ messages in thread
From: Manuel Uberti @ 2022-11-01 13:12 UTC (permalink / raw)
  To: Eli Zaretskii, Theodor Thornhill; +Cc: emacs-devel, dev

On 01/11/22 13:28, Eli Zaretskii wrote:
> (Do people really use Emacs to develop Java?  I'd be surprised.)

I've not coded in Java professionally for a while, but I still need some 
Java for work. A combination of LSP (via Eglot), project.el, and 
shell-mode covers my needs these days without having to run to Eclipse. 
But again, it's pet projects and proofs of concept nowadays, nothing big 
nor business related. Still, Emacs is way more useful with Java than it 
used to be when I first picked it up ~10yrs ago.

-- 
Manuel Uberti
https://manueluberti.eu





* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:10               ` Eli Zaretskii
@ 2022-11-01 13:27                 ` Theodor Thornhill
  2022-11-01 13:49                   ` Eli Zaretskii
  2022-11-01 16:09                 ` tomas
  1 sibling, 1 reply; 28+ messages in thread
From: Theodor Thornhill @ 2022-11-01 13:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dev



On 1 November 2022 14:10:43 CET, Eli Zaretskii <eliz@gnu.org> wrote:
>> Date: Tue, 01 Nov 2022 14:05:39 +0100
>> From: Theodor Thornhill <theo@thornhill.no>
>> CC: emacs-devel@gnu.org, dev@rjt.dev
>> 
>> >> Challenge accepted.  Can I create it for java, which is a language I'm
>> >> writing a lot in these days?
>> >
>> >Sorry, no.  It has to support all the languages supported by CC Mode
>> >now.  That's the challenge.
>> >
>> 
>> Ok let's do it. But let's restrict it to languages considered stable from https://tree-sitter.github.io/tree-sitter/#available-parsers
>> 
>> - c
>> - c++
>> - c#
>> - java
>> - javascript
>> - typescript
>> - json
>> 
>> Ok? 
>
>You mean, C, C++, and Java?  Yes, SGTM.  That'd leave Objective C,
>IDL, Awk, and Pike out.
>

Yes, they have no parser apart from objc, which is in development. 

>> >(Do people really use Emacs to develop Java?  I'd be surprised.)
>> 
>> Yes. I do, no problem
>
>I meant the stuff that's missing in Emacs which is present in any
>decent Java IDE.  Maybe you use Emacs for Java with many add-on
>packages?

Nope. Eglot. That's it. Some details and integration are not as nice, but it's 80% there ootb. A little config gets me 95% there. 

Should I begin? I understand there's no obligation from anyone to accept this, but I think it's worth a shot. Can it live in a scratch branch? 

Theo




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01  7:24   ` Eli Zaretskii
  2022-11-01  7:55     ` Theodor Thornhill
@ 2022-11-01 13:32     ` Stefan Monnier
  2022-11-01 14:02       ` Eli Zaretskii
  1 sibling, 1 reply; 28+ messages in thread
From: Stefan Monnier @ 2022-11-01 13:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Theodor Thornhill, emacs-devel, dev

>> I'm no authority on the matter, but I'd love for us not to complicate
>> things too much. I vote for separate, non-cc-prefixed _new_ modes, that
>> derives from prog-mode.
>
> That'd mean people will need either to invent all the other goodies in
> CC mode (everything except fontifications and indentation) from
> scratch, or give up all those other goodies.  Does that make sense?

I'm a strong proponent of keeping "one mode" but from what I've seen so
far, trying to mix tree-sitter with CC-mode's `c-mode`, I agree with
Theodor that it might be better to start from scratch :-(

I have not looked at other languages in CC-mode, so I don't know if the
same should apply to all CC-mode's modes (my guess is that it does, tho).

My best hope so far is to:

- Rename `c-mode` to `cc-c-mode`.
- Make a new `c-mode` which delegates to `cc-c-mode` by default unless
  the user asked for the "new, tree-sitter based, c-mode" in which case
  it uses the brand new code base.

`cc-c-mode` would still set `major-mode` to `c-mode`, so from the user's
point of view there's still only one `c-mode` but the two variants
(tree-sitter and CC-mode) are almost completely separate.

We should make some effort to avoid users thinking "oh, there's the
legacy CC-mode-based c-mode and the shiny new tree-sitter-based C-mode",
but rather think "should I stay with the trusty CC-mode-based c-mode, or
try the toddler c-mode".
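
The plan above can be sketched in a few lines of Emacs Lisp.  This is
a minimal illustration only: hypothetical `my-' names stand in for
`c-mode', `cc-c-mode' and the new tree-sitter mode, and the mode
bodies are empty where the real setup code would go.  What it shows is
the shape: one user-visible command that picks an implementation, and
both implementations reporting the same `major-mode'.

```elisp
;;; Hypothetical sketch of the proposed dispatch; all names are made up.
(defcustom my-c-use-tree-sitter nil
  "Non-nil means `my-c-mode' uses the new tree-sitter based code."
  :type 'boolean
  :group 'languages)

(define-derived-mode my-cc-c-mode prog-mode "C"
  "Stand-in for today's CC Mode based `c-mode', renamed."
  ;; CC Mode setup would go here.  Report the user-visible name.
  (setq major-mode 'my-c-mode))

(define-derived-mode my-c-ts-mode prog-mode "C"
  "Stand-in for a brand new tree-sitter based implementation."
  ;; Tree-sitter setup would go here.
  (setq major-mode 'my-c-mode))

;; Let `derived-mode-p' still see `prog-mode' as the parent of the
;; user-visible mode name, which is not itself a derived mode.
(put 'my-c-mode 'derived-mode-parent 'prog-mode)

(defun my-c-mode ()
  "Major mode for C; delegates to one of two implementations."
  (interactive)
  (if my-c-use-tree-sitter
      (my-c-ts-mode)
    (my-cc-c-mode)))
```

Because both variants set `major-mode' to the same symbol, things
keyed on the mode name (directory-local variables, `auto-mode-alist'
entries, user code testing `major-mode') keep working whichever
implementation the user picks.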

> Tree-sitter doesn't (and cannot) replace everything a major mode does
> for a programming language.

No, indeed.  But it's hard to use one part of CC-mode without another.
One of the great things about CC-mode is how it is all
nicely integrated.  But that cuts both ways :-(

> So a completely new mode means we throw the baby out with the bathwater.

The way I see it is that it will not break backward compatibility, and
in the short term it may fail to provide a strict superset of CC-mode's
`c-mode` features, but it's still going to be better than mixing the two
and then trying to fix the corresponding breakage.

> CC Mode has a full-blown manual, where this question is answered.
> Here's a partial list of features outside of the fontification and
> indentation area, which I collected just by looking at the top-level
> menus of that manual:
>
>  . filling and breaking text in comments and strings

This should be broken out of CC-mode so that all modes can benefit from it.
AFAIK this is the most valuable feature of CC-mode that's sorely missing
in our generic infrastructure (lots and lots of other major modes suffer
from it, so making it available to all major modes will be a great
improvement).

>  . automatic insertion of newlines after braces, colons, commas, semi-colons

This is already provided by `electric-layout-mode`.
[ More specifically it's one of the parts of CC-mode which I
  "broke out of CC-mode so that all modes can benefit from it".
  Of course, CC-mode doesn't use it, because when you try to
  implement something to be more generic, you rarely end up with
  100% identical behavior; and CC-mode wants to be backward compatible
  with old Emacsen that don't have `electric-layout-mode`.  ]
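
For reference, a minimal sketch of getting CC-Mode-style auto-newlines
from the generic `electric-layout-mode' machinery, assuming an Emacs
recent enough to have `electric-layout-local-mode'.  The function name
is hypothetical, and the rule set only approximates CC Mode's
auto-newline behavior; it is not a reimplementation of it.

```elisp
;;; Sketch: CC-Mode-like auto-newlines via the generic electric.el code.
(require 'electric)

(defun my-c-like-electric-layout ()
  "Turn on auto-newlines after `;' and `{' and before `}' in this buffer."
  ;; Each rule is (CHAR . WHERE): WHERE says where to add the newline.
  (setq-local electric-layout-rules
              '((?\; . after)
                (?\{ . after)
                (?\} . before)))
  (electric-layout-local-mode 1))
```

Nothing in it is specific to C, so any mode can opt in, e.g.
(add-hook 'js-mode-hook #'my-c-like-electric-layout).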

>  . whitespace cleanups

Not very familiar with this, but I'd be surprised if it wouldn't benefit
from "break out of CC-mode so that all modes can benefit from it".

>  . minor modes: electric, hungry-delete, comment-style

"Break out of CC-mode so that all modes can benefit from it".
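
As one data point for this, a mode-agnostic hungry delete can already
be built from existing pieces.  A minimal sketch, relying on the
standard `backward-delete-char-untabify' command and its
`backward-delete-char-untabify-method' option, whose value `all'
deletes all preceding whitespace, newlines included.  The minor-mode
name is hypothetical, and CC Mode's own hungry-delete does more than
this.

```elisp
;;; Sketch: a mode-agnostic hungry delete built on existing code.
(defvar my-hungry-delete-mode-map
  (let ((map (make-sparse-keymap)))
    ;; Route DEL to the command that honors the untabify method.
    (define-key map (kbd "DEL") #'backward-delete-char-untabify)
    map)
  "Keymap for `my-hungry-delete-mode'.")

(define-minor-mode my-hungry-delete-mode
  "Make DEL delete all preceding whitespace, in any major mode."
  :lighter " Hungry"
  (if my-hungry-delete-mode
      ;; `all' means spaces, tabs and newlines are deleted in one go.
      (setq-local backward-delete-char-untabify-method 'all)
    (kill-local-variable 'backward-delete-char-untabify-method)))
```

With the mode on, DEL at the end of "foo   \n   " leaves just "foo".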

>  . c-offsets-alist and interactive indentation customization (related
>    to indentation, but still extremely important, and not directly in
>    tree-sitter)

This is indeed important, but we can't use CC-mode's code for that in
any case: it needs to be reimplemented for tree-sitter's indentation.
And it'd be better if we could do that without having to worry about
backward compatibility with existing CC-mode users' settings
(i.e. we're free to cover the same functionality in a different way).
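
To make the contrast concrete: with tree-sitter, indentation is driven
by rules keyed on syntax-tree node types rather than on CC Mode's
syntactic symbols.  A minimal sketch, assuming the Emacs 29
`treesit-simple-indent-rules' format of (MATCHER ANCHOR OFFSET)
entries and node names from the tree-sitter C grammar.  The variable
and function names are hypothetical and the rule set is far from
complete; the user-tweakable offset plays roughly the role that
`c-basic-offset' and `c-offsets-alist' entries play in CC Mode.

```elisp
;;; Sketch: data-driven C indentation rules for Emacs 29's treesit.el.
(defcustom my-c-ts-indent-offset 4
  "Hypothetical basic offset, playing the role of `c-basic-offset'."
  :type 'integer
  :group 'languages)

(defun my-c-ts-indent-rules ()
  "Return a small `treesit-simple-indent-rules' value for C."
  `((c
     ;; A closing brace lines up with the line that opened the block.
     ;; (Checked first, since a "}" node's parent is the block itself.)
     ((node-is "}") parent-bol 0)
     ;; Statements directly inside { ... } get one level of indentation.
     ((parent-is "compound_statement") parent-bol ,my-c-ts-indent-offset)
     ;; So do the members of a struct or union body.
     ((parent-is "field_declaration_list") parent-bol
      ,my-c-ts-indent-offset))))
```

A tree-sitter major mode would set `treesit-simple-indent-rules' to
this value in its body before calling `treesit-major-mode-setup'.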


        Stefan





* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:27                 ` Theodor Thornhill
@ 2022-11-01 13:49                   ` Eli Zaretskii
  2022-11-01 13:54                     ` Theodor Thornhill
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2022-11-01 13:49 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, dev

> Date: Tue, 01 Nov 2022 14:27:19 +0100
> From: Theodor Thornhill <theo@thornhill.no>
> CC: emacs-devel@gnu.org, dev@rjt.dev
> 
> Should I begin?

Yes, please.

> I understand there's no obligation from anyone to accept this, but I think it's worth a shot. Can it live in a scratch branch? 

I'd prefer the tree-sitter branch.  Why make one more?




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:49                   ` Eli Zaretskii
@ 2022-11-01 13:54                     ` Theodor Thornhill
  2022-11-01 14:03                       ` Eli Zaretskii
  0 siblings, 1 reply; 28+ messages in thread
From: Theodor Thornhill @ 2022-11-01 13:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dev

Eli Zaretskii <eliz@gnu.org> writes:

>> Date: Tue, 01 Nov 2022 14:27:19 +0100
>> From: Theodor Thornhill <theo@thornhill.no>
>> CC: emacs-devel@gnu.org, dev@rjt.dev
>> 
>> Should I begin?
>
> Yes, please.
>
>> I understand there's no obligation from anyone to accept this, but I think it's worth a shot. Can it live in a scratch branch? 
>
> I'd prefer the tree-sitter branch.  Why make one more?

No objections here.  I was just trying to enable us to more easily
reject without messing with git history.  Can I get push access?

-- 
Theo




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:32     ` Stefan Monnier
@ 2022-11-01 14:02       ` Eli Zaretskii
  2022-11-01 15:09         ` Stefan Monnier
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2022-11-01 14:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: theo, emacs-devel, dev

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Theodor Thornhill <theo@thornhill.no>,  emacs-devel@gnu.org,  dev@rjt.dev
> Date: Tue, 01 Nov 2022 09:32:18 -0400
> 
> My best hope so far is to:
> 
> - Rename `c-mode` to `cc-c-mode`.
> - Make a new `c-mode` which delegates to `cc-c-mode` by default unless
>   the user asked for the "new, tree-sitter based, c-mode" in which case
>   it uses the brand new code base.
> 
> `cc-c-mode` would still set `major-mode` to `c-mode`, so from the user's
> point of view there's still only one `c-mode` but the two variants
> (tree-sitter and CC-mode) are almost completely separate.
> 
> We should make some effort to avoid users thinking "oh, there's the
> legacy CC-mode-based c-mode and the shiny new tree-sitter-based C-mode",
> but rather think "should I stay with the trusty CC-mode-based c-mode, or
> try the toddler c-mode".
> 
> > Tree-sitter doesn't (and cannot) replace everything a major mode does
> > for a programming language.
> 
> No, indeed.  But it's hard to use one part of CC-mode without another.
> One of the great things about CC-mode is how it is all
> nicely integrated.  But that cuts both ways :-(
> 
> > So a completely new mode means we throw the baby out with the bathwater.
> 
> The way I see it is that it will not break backward compatibility, and
> in the short term it may fail to provide a strict superset of CC-mode's
> `c-mode` features, but it's still going to be better than mixing the two
> and then trying to fix the corresponding breakage.
> 
> > CC Mode has a full-blown manual, where this question is answered.
> > Here's a partial list of features outside of the fontification and
> > indentation area, which I collected just by looking at the top-level
> > menus of that manual:
> >
> >  . filling and breaking text in comments and strings
> 
> This should be broken out of CC-mode so that all modes can benefit from it.
> AFAIK this is the most valuable feature of CC-mode that's sorely missing
> in our generic infrastructure (lots and lots of other major modes suffer
> from it, so making it available to all major modes will be a great
> improvement).
> 
> >  . automatic insertion of newlines after braces, colons, commas, semi-colons
> 
> This is already provided by `electric-layout-mode`.
> [ More specifically it's one of the parts of CC-mode which I
>   "broke out of CC-mode so that all modes can benefit from it".
>   Of course, CC-mode doesn't use it, because when you try to
>   implement something to be more generic, you rarely end up with
>   100% identical behavior; and CC-mode wants to be backward compatible
>   with old Emacsen that don't have `electric-layout-mode`.  ]
> 
> >  . whitespace cleanups
> 
> Not very familiar with this, but I'd be surprised if it wouldn't benefit
> from "break out of CC-mode so that all modes can benefit from it".
> 
> >  . minor modes: electric, hungry-delete, comment-style
> 
> "Break out of CC-mode so that all modes can benefit from it".
> 
> >  . c-offsets-alist and interactive indentation customization (related
> >    to indentation, but still extremely important, and not directly in
> >    tree-sitter)
> 
> This is indeed important, but we can't use CC-mode's code for that in
> any case: it needs to be reimplemented for tree-sitter's indentation.
> And it'd be better if we could do that without having to worry about
> backward compatibility with existing CC-mode users' settings
> (i.e. we're free to cover the same functionality in a different way).

Sorry for being blunt, but you've presented a plan for Emacs 32 if not
42.  If that's what we need, we should first make sure that Theodor
(or whoever picks up the gauntlet) will be willing to work on such a
branch for that long a time ;-)

What _I_ want is to have some decent tree-sitter supported modes in
Emacs 29, and I still hope C/C++ editing could benefit from that, in
Emacs 29.  That calls for a completely different plan, if my
experience with Emacs development is of any significance.

Bottom line: I don't see how we could make a "revolution" the size you
are envisioning in such a short time.  Not unless you somehow can
summon a team of talented and motivated individuals to work on it
starting today.  The only practical way I see is by _evolution_,
gradually replacing CC Mode's features with tree-sitter supported ones
where that makes sense, and at first as opt-in.  And yes, this means
no "breaking out of CC-mode", at least not as part of this particular
effort: it simply is too much, too high a bar to jump.  It could well
enough kill the effort, for all practical purposes.

Of course, I'd be happy to be proven wrong, and be dazzled by a
full-fledged, backward-compatible C/C++ mode based on tree-sitter,
with all of the stuff you mentioned on top of that, within the month.




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:54                     ` Theodor Thornhill
@ 2022-11-01 14:03                       ` Eli Zaretskii
  2022-11-01 14:12                         ` Theodor Thornhill
  0 siblings, 1 reply; 28+ messages in thread
From: Eli Zaretskii @ 2022-11-01 14:03 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: emacs-devel, dev

> From: Theodor Thornhill <theo@thornhill.no>
> Cc: emacs-devel@gnu.org, dev@rjt.dev
> Date: Tue, 01 Nov 2022 14:54:14 +0100
> 
> Can I get push access?

Is it a necessary condition?




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 14:03                       ` Eli Zaretskii
@ 2022-11-01 14:12                         ` Theodor Thornhill
  0 siblings, 0 replies; 28+ messages in thread
From: Theodor Thornhill @ 2022-11-01 14:12 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, dev



On 1 November 2022 15:03:43 CET, Eli Zaretskii <eliz@gnu.org> wrote:
>> From: Theodor Thornhill <theo@thornhill.no>
>> Cc: emacs-devel@gnu.org, dev@rjt.dev
>> Date: Tue, 01 Nov 2022 14:54:14 +0100
>> 
>> Can I get push access?
>
>Is it a necessary condition?

No, of course not, but it's simpler. I can work around it, but it'll be slower :-)

Theo



* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 14:02       ` Eli Zaretskii
@ 2022-11-01 15:09         ` Stefan Monnier
  2022-11-01 15:36           ` Theodor Thornhill
  2022-11-01 16:43           ` Eli Zaretskii
  0 siblings, 2 replies; 28+ messages in thread
From: Stefan Monnier @ 2022-11-01 15:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: theo, emacs-devel, dev

> Sorry for being blunt, but you've presented a plan for Emacs 32 if
> not 42.

Huh?  What makes you think that?

On the contrary it's a plan that lets us get quickly a working
tree-sitter-based C-mode.  Not one that's a strict superset of CC-mode's
`c-mode`, but a quite decent `c-mode` nevertheless.

> Bottom line: I don't see how we could make a "revolution" the size you
> are envisioning in such a short time.

It's not at all a revolution.
It's a very smooth path that breaks nothing and lets us move progressively.

It's a mini "revolution" maybe for users who will have to choose
between two different flavors of `c-mode`, each one with its current
strengths and downsides, but that's the cost to pay for a much smoother
job on the implementation.

> Not unless you somehow can summon a team of talented and motivated
> individuals to work on it starting today.  The only practical way
> I see is by _evolution_, gradually replacing CC Mode's features with
> tree-sitter supported ones where that makes sense, and at first as
> opt-in.  And yes, this means no "breaking out of CC-mode", at least
> not as part of this particular effort: it simply is too much, too high
> a bar to jump.  It could well enough kill the effort, for all
> practical purposes.

Slowly evolving CC-mode itself to use tree-sitter is something I can't
even begin to imagine how to do.  That's what I would expect to take
years :-)

> Of course, I'd be happy to be proven wrong, and be dazzled by a
> full-fledged, backward-compatible C/C++ mode based on tree-sitter,
> with all of the stuff you mentioned on top of that, within the month.

I don't foresee "all of the stuff" to be done immediately, no.
[ Tho I do think the filling code at least can be extracted from CC-mode
within a month (or at least, an important subset of it).  ]

Which is why users will have to choose (and we'll stick to CC-mode by
default, of course).


        Stefan




* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 15:09         ` Stefan Monnier
@ 2022-11-01 15:36           ` Theodor Thornhill
  2022-11-01 16:43           ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: Theodor Thornhill @ 2022-11-01 15:36 UTC (permalink / raw)
  To: Stefan Monnier, Eli Zaretskii; +Cc: emacs-devel, dev



On 1 November 2022 16:09:39 CET, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
>> Sorry for being blunt, but you've presented a plan for Emacs 32 if
>> not 42.
>
>Huh?  What makes you think that?
>
>On the contrary it's a plan that lets us get quickly a working
>tree-sitter-based C-mode.  Not one that's a strict superset of CC-mode's
>`c-mode`, but a quite decent `c-mode` nevertheless.
>

No matter what we decide, I'll make these modes and submit them for review in a few weeks' time. I'm no C++ expert, so I'm bound to make mistakes there, but I think I have an idea of how to do the others.

>
>> Not unless you somehow can summon a team of talented and motivated
>> individuals to work on it starting today.  The only practical way
>> I see is by _evolution_, gradually replacing CC Mode's features with
>> tree-sitter supported ones where that makes sense, and at first as
>> opt-in.  And yes, this means no "breaking out of CC-mode", at least
>> not as part of this particular effort: it simply is too much, too high
>> a bar to jump.  It could well enough kill the effort, for all
>> practical purposes.
>

I'll try to prove you wrong. It seems someone is trying to add it to the proposed cc-treesit.el, so maybe we can have our cake and eat it too ;-)


>
>I don't foresee "all of the stuff" to be done immediately, no.
>[ Tho I do think the filling code at least can be extracted from CC-mode
>within a month (or at least, an important subset of it).  ]
>

I think I'll try to make a tree-sitter-powered auto-fill.

>Which is why users will have to choose (and we'll stick to CC-mode by
>default, of course).
>

Of course. 

Theo 



* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 13:10               ` Eli Zaretskii
  2022-11-01 13:27                 ` Theodor Thornhill
@ 2022-11-01 16:09                 ` tomas
  1 sibling, 0 replies; 28+ messages in thread
From: tomas @ 2022-11-01 16:09 UTC (permalink / raw)
  To: emacs-devel

On Tue, Nov 01, 2022 at 03:10:43PM +0200, Eli Zaretskii wrote:

[...]

> > >(Do people really use Emacs to develop Java?  I'd be surprised.)
> > 
> > Yes. I do, no problem
> 
> I meant the stuff that's missing in Emacs which is present in any
> decent Java IDE.  Maybe you use Emacs for Java with many add-on
> packages?

It was a while ago, but I did participate in a Java project, and
Emacs was absolutely fine back then. No add-ons.

Now I know this isn't everyone's "way of working", but it suits
me perfectly.

Cheers
-- 
t


* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 15:09         ` Stefan Monnier
  2022-11-01 15:36           ` Theodor Thornhill
@ 2022-11-01 16:43           ` Eli Zaretskii
  1 sibling, 0 replies; 28+ messages in thread
From: Eli Zaretskii @ 2022-11-01 16:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: theo, emacs-devel, dev

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: theo@thornhill.no,  emacs-devel@gnu.org,  dev@rjt.dev
> Date: Tue, 01 Nov 2022 11:09:39 -0400
> 
> > Sorry for being blunt, but you've presented a plan for Emacs 32 if
> > not 42.
> 
> Huh?  What makes you think that?

A bit of gray hair, nothing more.

> On the contrary it's a plan that lets us get quickly a working
> tree-sitter-based C-mode.  Not one that's a strict superset of CC-mode's
> `c-mode`, but a quite decent `c-mode` nevertheless.

I just disagree with the "quickly" part, that's all.

> I don't foresee "all of the stuff" to be done immediately, no.

So we basically agree.

> [ Tho I do think the filling code at least can be extracted from CC-mode
> within a month (or at least, an important subset of it).  ]

Let's see.  I'll be happy if that happens.



* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01  5:44 ` Theodor Thornhill
  2022-11-01  7:24   ` Eli Zaretskii
@ 2022-11-02 20:43   ` João Távora
  1 sibling, 0 replies; 28+ messages in thread
From: João Távora @ 2022-11-02 20:43 UTC (permalink / raw)
  To: Theodor Thornhill; +Cc: Randy Taylor, emacs-devel

On Tue, Nov 1, 2022, 05:46 Theodor Thornhill <theo@thornhill.no> wrote:

>
>
> On 1 November 2022 03:30:54 CET, Randy Taylor <dev@rjt.dev> wrote:
> >Hi.
> >
> >Where specifically should the C and C++ tree-sitter stuff go? I've been
> using it for a couple months and would like to upstream syntax highlighting
> for both. I'll focus on getting C done first.
> >
> >I see there are a lot of cc- files; would it be appropriate to add the
> tree-sitter stuff into a new cc-treesit.el file?
> >Thanks.
>
> I'm no authority on the matter, but I'd love for us not to complicate
> things too much. I vote for separate, non-cc-prefixed _new_ modes, that
> derives from prog-mode.
>
> I understand that this is a controversial opinion, but that's what I want.
> I believe people will do that anyway if we don't.
>

+1

João

* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-01 12:28           ` Eli Zaretskii
  2022-11-01 13:05             ` Theodor Thornhill
  2022-11-01 13:12             ` Manuel Uberti
@ 2022-11-04 14:49             ` Benjamin Riefenstahl
  2022-11-04 16:17               ` Pascal Quesseveur
  2 siblings, 1 reply; 28+ messages in thread
From: Benjamin Riefenstahl @ 2022-11-04 14:49 UTC (permalink / raw)
  To: emacs-devel

Eli Zaretskii writes:
> (Do people really use Emacs to develop Java?  I'd be surprised.)

Just FTR, I do.  And I know some other people that use Emacs for Java,
too.  I only use core Emacs for this and a few functions that I wrote
myself.

benny



* Re: feature/tree-sitter: Where to Put C/C++ Stuff
  2022-11-04 14:49             ` Benjamin Riefenstahl
@ 2022-11-04 16:17               ` Pascal Quesseveur
  0 siblings, 0 replies; 28+ messages in thread
From: Pascal Quesseveur @ 2022-11-04 16:17 UTC (permalink / raw)
  To: emacs-devel

>"BR" == Benjamin Riefenstahl <b.riefenstahl@turtle-trading.net> writes:

  BR> Just FTR, I do. 

Me too. In the past I used jde (or jdee) but not anymore. I developed
some functions to use ant and jswat and I'm pretty happy with the
result.


-- 
Pascal Quesseveur
pquessev@gmail.com




Thread overview: 28+ messages
2022-11-01  2:30 feature/tree-sitter: Where to Put C/C++ Stuff Randy Taylor
2022-11-01  5:44 ` Theodor Thornhill
2022-11-01  7:24   ` Eli Zaretskii
2022-11-01  7:55     ` Theodor Thornhill
2022-11-01  9:22       ` Yuan Fu
2022-11-01  9:41         ` Theodor Thornhill
2022-11-01  9:57       ` Eli Zaretskii
2022-11-01 11:53         ` Theodor Thornhill
2022-11-01 12:28           ` Eli Zaretskii
2022-11-01 13:05             ` Theodor Thornhill
2022-11-01 13:10               ` Eli Zaretskii
2022-11-01 13:27                 ` Theodor Thornhill
2022-11-01 13:49                   ` Eli Zaretskii
2022-11-01 13:54                     ` Theodor Thornhill
2022-11-01 14:03                       ` Eli Zaretskii
2022-11-01 14:12                         ` Theodor Thornhill
2022-11-01 16:09                 ` tomas
2022-11-01 13:12             ` Manuel Uberti
2022-11-04 14:49             ` Benjamin Riefenstahl
2022-11-04 16:17               ` Pascal Quesseveur
2022-11-01 13:32     ` Stefan Monnier
2022-11-01 14:02       ` Eli Zaretskii
2022-11-01 15:09         ` Stefan Monnier
2022-11-01 15:36           ` Theodor Thornhill
2022-11-01 16:43           ` Eli Zaretskii
2022-11-02 20:43   ` João Távora
2022-11-01  7:20 ` Eli Zaretskii
2022-11-01 12:10   ` Alan Mackenzie
