From mboxrd@z Thu Jan 1 00:00:00 1970
From: Ergus
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
Date: Fri, 30 Jul 2021 15:32:54 +0200
Message-ID: <20210730133254.qtxgjkje36nqehpd@Ergus>
References: <8735rzyzbz.fsf@163.com> <86v94v3xh9.fsf@stephe-leake.org>
 <87wnpargnb.fsf@elite.giraud> <87h7gey7zx.fsf@163.com>
 <83pmv2twrl.fsf@gnu.org> <86sfzwogsn.fsf@stephe-leake.org>
 <87o8akmy4p.fsf@163.com>
To: Arthur Miller
Cc: Andrei Kuznetsov, Eli Zaretskii, Stephen Leake,
 manuel@ledu-giraud.fr, emacs-devel@gnu.org
Xref: news.gmane.io gmane.emacs.devel:271844

On Fri, Jul 30, 2021 at 02:06:00PM +0200, Arthur Miller wrote:
>Andrei Kuznetsov writes:
>
>> Leake writes:
>>
>>> That's true for the common TS runtime, which implements the parser and
>>> error recovery, but the code for each language, that builds the LR parse
>>> table and some other data structures, is generated in C from a grammar
>>> file written in javascript, and must be linked into Emacs somehow. In
>>> addition, some languages require an "external scanner", which is more
>>> code in C that is specific to the language.
>>
>> Interesting. I assume it would be possible to reuse the source grammar
>> files?
>
>It probably is, and looking at neovim's gh repo, there are some
>instructions on how to create a grammar for new language:
>
>https://github.com/nvim-treesitter/nvim-treesitter
>
>The process could probably be somehow automated from lisp.
>
>I have though a sincere question about this entire tree-sitter
>venture. Is it really worth trouble in Emacs case? As I understand TS it
>is a specialized regex matcher, and looking at some language specs leave
>me with that feeling (for example the grammar for bash):
>
>https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json
>
>I understand that having specialized regex matcher is more efficient than
>some generalized regular matcher current font-locking in Emacs relies
>upon, but is it *that* more efficient to be worth the extra troubles?
>TS seem to keep state (a node) for each character typed, that will be a
>lot of memory consumed in some big files. If this syntax tree it keeps
>to implement what it does can be re-used for something else then it
>could be very useful, but just for syntax-highlight and indentation?
>Some years ago, when opening some 10k lines as found in Emacs src dir, I
>noticed some slowdown on font lock. But nowadays I don't experience any
>hiccups with syntax highlighting or indentation.
>
>Anyway, it is very educating to see TS get merged into Emacs and to read
>Eli's tips and guidance about Emacs internals.
>

The TS discussion came up because of issues with c-mode highlighting
reported in that thread: correctness and speed (it slows down things
like scrolling). c-mode does its best, but C++ keeps evolving, and more
complex analysis comes with a performance penalty and ever more code
complexity in the parser.

The same happens with the new, widely used languages. It will be very
difficult to implement a complete, competitive mode like c-mode for all
the languages that are very popular today (Rust, TypeScript, even
Python).
So we end up with some "weak" modes that have inconsistencies and
different bindings and color themes. Those become unmaintained after a
while because their developers migrate to more complete editors/IDEs,
and new developers just don't come to Emacs because it does not satisfy
their needs to start with.

I may be wrong, but 99% of the web developers (React, Node.js, Angular)
are using VSCode and the rest are on neovim, so we don't even have
people with enough knowledge and motivation to implement those modes in
Emacs one by one.

These languages are more complex to analyze, and we don't have people
to maintain a mode for each of them. Trying to do so would spend too
much developer time reinventing what TS already does (and does right,
efficiently, and with a supporting community).

So maintaining a mode for every language we currently don't support
does not scale over time. And reimplementing a replacement for TS in
Elisp won't be worth it: it would end up being very slow and would
repeat all the errors that the TS developers have already solved.

TS may be useful not only for syntax highlighting and indentation but
also for code navigation and some basic syntax checking.

Basically TS is: one "infrastructure" to rule them all.