From: Theodor Thornhill
Newsgroups: gmane.emacs.devel
Subject: Re: feature/tree-sitter: Where to Put C/C++ Stuff
Date: Tue, 01 Nov 2022 08:55:44 +0100
Message-ID: <878rkv3y7z.fsf@thornhill.no>
References: <83pme7cf23.fsf@gnu.org>
To: Eli Zaretskii
Cc: emacs-devel@gnu.org, dev@rjt.dev
In-Reply-To: <83pme7cf23.fsf@gnu.org>

Hi Eli!

>> Date: Tue, 01 Nov 2022 06:44:38 +0100
>> From: Theodor Thornhill
>>
>> >Where specifically should the C and C++ tree-sitter stuff go? I've
>> >been using it for a couple months and would like to upstream syntax
>> >highlighting for both. I'll focus on getting C done first.
>> >
>> >I see there are a lot of cc- files; would it be appropriate to add
>> >the tree-sitter stuff into a new cc-treesit.el file? Thanks.
>>
>> I'm no authority on the matter, but I'd love for us not to complicate
>> things too much. I vote for separate, non-cc-prefixed _new_ modes,
>> that derives from prog-mode.
>
> That'd mean people will need either to invent all the other goodies in
> CC mode (everything except fontifications and indentation) from
> scratch, or give up all those other goodies. Does that make sense?
>

Yes, well, partially. I think we are too likely to create unwanted
issues by merging the two too closely. I have seen several of these
issues over the last couple of years while implementing c-sharp mode in
CC mode, emacs-tree-sitter and treesit. Several things are going on.
I'll try to expand on some of them to give some perspective, and also
to point at specific places where we could improve so that this might
not be a problem at all.

1: Use CC mode for one thing and tree-sitter for the rest

When we first implemented tree-sitter in c-sharp mode, we tried
applying only font-locking and using CC mode for indentation and the
rest. What happened was that we immediately inherited the performance
issues from CC mode straight into our code. Specifically, when typing
in a file with too many (from CC mode's perspective) strings, typing
lag rose to several seconds per keypress. I filed several bug reports
on this, both here and to Alan. After some time and much heroics we got
some improvement on this from Alan, but c-sharp had already moved on.

2: Using separate names for modes.

The great advantage here is easy to understand. You have no inheritance
issues, and are free to develop features without regard to legacy. A
disadvantage is that some users depend on the major mode name for other
stuff. Issues were filed asking us to flip over to tree-sitter
completely, because the name csharp-mode mattered so much more to users
than csharp-tree-sitter-mode. We almost made the change, but then Yuan
started his work, so we waited.
That would have sunset the CC mode implementation almost immediately.

3: Confusion with where to file bugs

We have many bugs in c-sharp mode where some are Emacs bugs, some are
CC mode bugs, some are tree-sitter bugs and some are our own bugs.
There is a real issue with understanding CC mode and figuring out where
a bug fix should end up. It has taken me many weeks' worth of digging
to understand only the simplest mechanisms of CC mode. With
tree-sitter, contributors become productive within a couple of hours.
To disregard this point purely for the sake of compatibility with CC
mode is a huge mistake, IMO.

4: How do we know what to disable?

If there's a problem somewhere in the tree-sitter variant of a new mode
derived from CC mode, who makes the fix? For example, CC mode
previously had limited support for multiline strings, which took almost
a year to finalize. The tree-sitter variant, with more performance and
accuracy, took me maybe 20 minutes during a work meeting. Should a
feature that is simple to implement in the tree-sitter variant wait for
a similar CC mode implementation? The namespacing seems to suggest
that yes, it should.

5: While tree-sitter is only an engine, it provides a lot more goodies

We have a huge opportunity to create real new frameworks for Emacs now,
but limiting ourselves to merging the features/modes suggests that we
cannot reliably make overarching advancements such as those we see now
in the feature/tree-sitter branch. For example, many small hacks I've
made in the modes I've submitted thus far have made it into general
mechanisms in treesit.el. All modes that enable tree-sitter should be
able to use these, and all the new ones that come, _without_ worrying
whether some issue will crop up from inheriting from CC mode or some
other thing. Examples are indentation styles, paredit-like
functionality, refactorings and more.

6: What are the goodies that we really need from CC mode?

CC mode provides indentation and font locking.
What else does it provide that isn't replaceable pretty quickly? I
mean this not as a contrarian, but out of real curiosity. My guess is
that we can get to feature parity, and well beyond that, in a very
short amount of time if we're not hindered by merging everything.

Sorry for the long mail, but I think we are missing the point by
viewing tree-sitter simply as an engine to plop in beside CC mode for
convenience, and not as the real infrastructure change it is. There is
no need to sunset CC mode, but equally there is no need to limit
tree-sitter.

> Tree-sitter doesn't (and cannot) replace everything a major mode does
> for a programming language. So a completely new mode means we through
> the baby with the bathwater.

I don't agree, but I'm very curious as to what else would take
significant effort, _apart_ from reaching indentation feature parity
with CC mode. One thing I know of is integration with package managers,
such as what elm-mode and go-mode do, but that is an easy fix. The
upstream go-mode, if it cannot be moved to core, can just derive from a
simple go-treesit, skip all indentation and font-locking in its own
mode, and still supply the goodies.

-- 
Theo