From: Ergus
Newsgroups: gmane.emacs.devel
Subject: Re: cc-mode fontification feels random
Date: Sat, 12 Jun 2021 18:59:52 +0200
Message-ID: <20210612165952.h5pv6x6wpotdfuvq@Ergus>
References: <83k0n09tkp.fsf@gnu.org> <837dj09p0e.fsf@gnu.org> <20210611232535.b4dyu3a2yxvdixys@Ergus>
To: Theodor Thornhill
Cc: Stefan Monnier, Eli Zaretskii, dancol@dancol.org, acm@muc.de, rudalics@gmx.at, emacs-devel@gnu.org

On Sat, Jun 12, 2021 at 05:56:34PM +0200, Theodor Thornhill wrote:
>Stefan Monnier writes:
>
>>> @Stefan - I'm not sure I understand what you mean by troublesome for
>>> elisp hackers. These grammars have a lisp-like dsl, and is pretty
>>> usable through C-M-x and defvars, see:
>>> https://github.com/emacs-csharp/csharp-mode/blob/master/csharp-tree-sitter.el#L44.
>>
>> AFAIK the grammar itself is still written in Javascript.
>>
>
>Yeah, but compiled parsers can be supplied through CI or something like that.
>
>
>[...]
>>
>> Agreed. Maybe a first step would be to get copyright assignments and
>> include the tree sitter module in GNU ELPA?
>>
>
>If I read some of these mails correctly it seems like that wouldn't be
>possible due to interest from some of the parties involved in the main
>package. I don't know the details on that, though.
>And Eli seems unhappy with what's there.
>
>As for making a little more concrete proposal for how to move forward,
>would this be something like what we want?
>
>- create/use c or rust bindings

Hi:

Eli and the others will surely give better information, but just to get
started (and they may correct my ideas):

First, we need a "mode-local" initialization of the parser based on the
major mode (as explained in the TS docs). The parser should probably be
stored somewhere in the "mode" to avoid duplicating parsers for the same
language. This should probably run once per mode (so it could perfectly
well live on the Lisp side) and would be a wrapper that calls:

ts_parser_new
ts_parser_set_language

After that, on the C side, I think everything we need is in
buffer.{h,c}: pass current_buffer->text->beg (or similar) directly to
ts_parser_parse_string or ts_parser_parse_string_encoding. Here we must
exclude the gap region, maybe with ts_parser_set_included_ranges (all
that information seems to be available as macros in buffer.h).

Once we have a tree, we associate it with the buffer it belongs to. And
then comes the rest.

>- create an elisp-layer for interaction with the parse tree

Basically we need to expose some of the TS functions, but it is better
to handle as much as we can on the C side, using simpler data types and
handling entire regions with the ts_tree_cursor_* functions. Of course,
some of them will still be needed for other functionality.

I don't know if we can manage font-locking from C, but I think text
properties can be handled there. So the next step is just to traverse
the visible region of the tree and convert the information into text
properties. This will need some sort of translation between the
language's symbols (see ts_language_symbol_count) and font-lock faces.

>- hook fontification and indentation into that elisp-layer
>

If I understood what Eli wants to prevent, then if we set the properties
and faces in step 2, these hooks may not be needed.
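
On the gap question raised earlier: besides restricting the parser to
included ranges, tree-sitter can also pull text in chunks through the
`read` callback of a TSInput passed to ts_parser_parse, which may be a
simpler way to hide the gap. The following is only a sketch of that
idea. `struct gap_text` and `struct point` are hypothetical stand-ins
(the latter for TSPoint) so that it compiles without Emacs or
tree-sitter headers, and nothing here has been tried against a real
buffer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of buffer text: real bytes live in
   [0, gap_start) and [gap_start + gap_size, ...) of the allocation.  */
struct gap_text
{
  const char *beg;      /* start of the allocation */
  uint32_t gap_start;   /* byte offset where the gap begins */
  uint32_t gap_size;    /* size of the gap in bytes */
  uint32_t text_size;   /* bytes of real text, gap excluded */
};

/* Stand-in for tree-sitter's TSPoint (assumption: only its shape matters).  */
struct point { uint32_t row, column; };

/* Same shape as the `read` member of TSInput: return a pointer to the
   chunk starting at logical offset BYTE_INDEX and store its length.  */
static const char *
gap_read (void *payload, uint32_t byte_index, struct point position,
          uint32_t *bytes_read)
{
  const struct gap_text *t = payload;
  (void) position;
  if (byte_index >= t->text_size)
    {
      *bytes_read = 0;          /* end of text */
      return "";
    }
  if (byte_index < t->gap_start)
    {
      /* Chunk before the gap: stop right at the gap.  */
      *bytes_read = t->gap_start - byte_index;
      return t->beg + byte_index;
    }
  /* Chunk after the gap: skip gap_size bytes in the allocation.  */
  *bytes_read = t->text_size - byte_index;
  return t->beg + t->gap_size + byte_index;
}

/* Drain the callback into OUT the way a parser would consume it.
   Returns the number of bytes copied (at most CAP).  */
static uint32_t
gap_copy_text (struct gap_text *t, char *out, uint32_t cap)
{
  struct point origin = { 0, 0 };
  uint32_t done = 0, n = 0;
  assert (t != NULL && out != NULL);
  for (;;)
    {
      const char *chunk = gap_read (t, done, origin, &n);
      if (n == 0 || done + n > cap)
        break;
      memcpy (out + done, chunk, n);
      done += n;
    }
  return done;
}
```

With this shape the parser only ever sees logical byte offsets, so the
positions stored in the tree would not have to be corrected for the gap
afterwards.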
In most cases we will need to call ts_parser_parse_string somewhere
in `after-change-functions` (or maybe earlier, I don't know), passing it
the old tree, and then get the differences from the new one with
ts_tree_get_changed_ranges. This returns something much smaller than the
tree, so maybe we can convert it into a Lisp list to use for font-lock
on the Lisp side if we can't handle most of it in C.

>It feels like the elisp-layer will be the easiest part. I'm not really
>well versed in where to look in the c code of emacs for where and how to
>link this, so some pointers would be nice.
>
>It looks like most people agree that tree sitter support is wanted, so
>maybe it's time to start doing it? I can surely have a stab at it, but
>I'd like some guidance for how to proceed best - if it's wanted, that
>is.
>
>--
>Theodor
>
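
P.S. A small sketch of the changed-ranges idea above. Each TSRange
returned by ts_tree_get_changed_ranges carries start_byte and end_byte
fields, so before handing anything to font-lock the C side could clip
those ranges to the visible part of the buffer. `struct byte_range` and
`clip_to_window` are hypothetical names used only for illustration, and
the struct keeps just the two byte fields so the example stands alone.
Note that the old tree has to be told about the edit with ts_tree_edit
before it is passed to the reparse.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the byte fields of a TSRange.  */
struct byte_range { uint32_t start_byte, end_byte; };

/* Intersect each of the N changed ranges with WINDOW (the visible
   region) and store the non-empty results in OUT, which must have room
   for N entries.  Returns how many ranges were kept.  */
static size_t
clip_to_window (const struct byte_range *changed, size_t n,
                struct byte_range window, struct byte_range *out)
{
  size_t kept = 0;
  assert (out != NULL);
  for (size_t i = 0; i < n; i++)
    {
      uint32_t s = changed[i].start_byte > window.start_byte
                   ? changed[i].start_byte : window.start_byte;
      uint32_t e = changed[i].end_byte < window.end_byte
                   ? changed[i].end_byte : window.end_byte;
      if (s < e)                /* drop ranges entirely off-screen */
        {
          out[kept].start_byte = s;
          out[kept].end_byte = e;
          kept++;
        }
    }
  return kept;
}
```

The clipped list is usually tiny, so turning it into a Lisp list of
(START . END) pairs for the Lisp side should be cheap if doing the
refontification in C turns out to be impractical.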