From: Arthur Miller
Newsgroups: gmane.emacs.devel
Subject: Re: [SPAM UNSURE] Maybe we're taking a wrong approach towards tree-sitter
Date: Fri, 30 Jul 2021 14:06:00 +0200
To: Andrei Kuznetsov
Cc: Eli Zaretskii, Stephen Leake, manuel@ledu-giraud.fr, emacs-devel@gnu.org
In-Reply-To: <87o8akmy4p.fsf@163.com> (Andrei Kuznetsov's message of "Fri, 30 Jul 2021 08:41:26 +0800")

Andrei Kuznetsov writes:

> Stephen Leake writes:
>
>> That's true for the common TS runtime, which implements the parser and
>> error recovery, but the code for each language, that builds the LR parse
>> table and some other data structures, is generated in C from a grammar
>> file written in javascript, and must be linked into Emacs somehow. In
>> addition, some languages require an "external scanner", which is more
>> code in C that is specific to the language.
>
> Interesting. I assume it would be possible to reuse the source grammar
> files?

It probably is. Looking at neovim's GitHub repo, there are some
instructions on how to create a grammar for a new language:

https://github.com/nvim-treesitter/nvim-treesitter

The process could probably be automated from Lisp somehow.

I do have a sincere question about this entire tree-sitter venture,
though. Is it really worth the trouble in Emacs's case? As I understand
it, TS is a specialized regex matcher, and looking at some language
specs leaves me with that feeling (for example, the grammar for bash):

https://github.com/tree-sitter/tree-sitter-bash/blob/master/src/grammar.json

I understand that a specialized regex matcher is more efficient than
the generalized regex matcher that current font-locking in Emacs relies
upon, but is it *that* much more efficient, enough to be worth the
extra trouble?

TS seems to keep state (a node) for each character typed; that will
consume a lot of memory in some big files. If the syntax tree it keeps
in order to do its work can be reused for something else, then it could
be very useful, but just for syntax highlighting and indentation? Some
years ago, when opening some 10k-line files, such as those found in the
Emacs src dir, I noticed some slowdown in font lock.
But nowadays I don't experience any hiccups with syntax highlighting or
indentation.

Anyway, it is very educational to see TS get merged into Emacs and to
read Eli's tips and guidance about Emacs internals.