From: Ihor Radchenko
Newsgroups: gmane.emacs.devel
Subject: Re: Tree-sitter integration on feature/tree-sitter
Date: Fri, 17 Jun 2022 18:40:52 +0800
Message-ID: <87sfo37etn.fsf@localhost>
In-Reply-To: <83tu8jq2vl.fsf@gnu.org>
To: Eli Zaretskii
Cc: casouri@gmail.com, emacs-devel@gnu.org

Eli Zaretskii writes:

>> I am asking in the interest of Org mode parser that is also parsing the
>> buffer AST and tracks buffer modifications.
>
> Please tell more about the need. I'm not happy with exposing this to
> Lisp, and don't understand why the low-level parts of parsing the
> buffer AST should be written in Lisp in the first place. The
> tree-sitter branch does this in C for that very reason.

AFAIK, the tree-sitter branch does not do anything related to _writing_
parsers. Parsers are implemented via tree-sitter modules.

Org mode parses the Org markup elements in a buffer into an AST
structure. This AST is used to fontify Org buffers, modify various
elements, query element properties, build lists of matching elements
according to user queries (agenda), etc.

The Org mode parser implements pretty much the same features that
tree-sitter provides (except that the relevant Org code was in place
before tree-sitter became a thing): only the parts of an Org buffer that
are needed get parsed, and buffer modifications trigger updates only
within the affected parts of the AST.

Thanks to the parser, Org is able to handle quite large buffers. Our
parser is written in Lisp, and yet it can parse a 15Mb Org file within
17sec vs. 8sec if parsed using the available incomplete tree-sitter Org
parser (https://github.com/milisims/tree-sitter-org).
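To make the "parse only what is needed, update only what changed" idea
concrete, here is a toy sketch in Python. It is not how the Org parser
is implemented: the class and method names (LazyParser, element_at,
edit) are hypothetical, and a plain line of text stands in for an Org
element. It only shows the caching scheme: elements are parsed on first
request, an edit drops the cached elements it touches, and elements
after the edit are shifted rather than reparsed.

```python
from dataclasses import dataclass


@dataclass
class Element:
    begin: int  # offset of the first character of the line
    end: int    # offset just past the last character (the newline, if any)
    text: str


class LazyParser:
    """Toy cache of parsed elements; one line stands in for one element."""

    def __init__(self, text):
        self.text = text
        self.cache = []       # elements parsed so far, in no particular order
        self.parse_count = 0  # how many times real parsing work was done

    def _parse_at(self, pos):
        # Parse only the line that contains POS, nothing else.
        begin = self.text.rfind("\n", 0, pos) + 1
        end = self.text.find("\n", pos)
        if end == -1:
            end = len(self.text)
        self.parse_count += 1
        return Element(begin, end, self.text[begin:end])

    def element_at(self, pos):
        # Serve from the cache when possible, parse on demand otherwise.
        for el in self.cache:
            if el.begin <= pos <= el.end:
                return el
        el = self._parse_at(pos)
        self.cache.append(el)
        return el

    def edit(self, pos, removed, inserted):
        """Replace REMOVED characters at POS with INSERTED."""
        self.text = self.text[:pos] + inserted + self.text[pos + removed:]
        delta = len(inserted) - removed
        kept = []
        for el in self.cache:
            if el.end < pos:
                kept.append(el)    # entirely before the change: untouched
            elif el.begin > pos + removed:
                el.begin += delta  # entirely after the change: shift only
                el.end += delta
                kept.append(el)
            # anything else touches the change and is dropped from the cache
        self.cache = kept


parser = LazyParser("* one\n* two\n* three")
parser.element_at(8)        # parses "* two" only
parser.element_at(14)       # parses "* three" only
parser.edit(0, 0, "XX")     # edit in the first line: cached lines just shift
parser.element_at(9)        # still served from the cache
parser.edit(10, 0, "!")     # edit inside "* two": that element is dropped
print(parser.element_at(10).text, parser.parse_count)  # prints "* !two 3"
```

In this sketch the first line is never parsed at all, because nothing
ever asked for it, and the third line is parsed exactly once even
though the text before it changed twice.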

Note that, unlike with tree-sitter, the syntax handled by the Org parser
can be changed from Elisp. For example, adding new link element types is
trivial, as demonstrated by a number of ol-*.el libraries provided by
Org and by third-party packages.

Moreover, on-demand parsing keeps even 15Mb Org files responsive at
runtime with few issues. I was able to get bearable performance even in
a 100Mb Org file.

The Org mode parser, with all its flexibility, would be difficult to
implement using tree-sitter.

As for implementing it in C, I am not even sure how to approach this.
Emacs does provide external modules, but AFAIU modules communicate with
the Emacs process via print-ing/read-ing strings, and the internal
Emacs-C functions are not available. I am not convinced that the speed
difference would be worth the bother of rewriting the whole parser in
Emacs-C.

Best,
Ihor