From: Eli Zaretskii
Newsgroups: gmane.emacs.devel
Subject: Re: Exposing buffer text modifications to Lisp (was: Tree-sitter integration on feature/tree-sitter)
Date: Fri, 17 Jun 2022 14:42:33 +0300
Message-ID: <834k0jplcm.fsf@gnu.org>
In-Reply-To: <87sfo37etn.fsf@localhost> (message from Ihor Radchenko on Fri, 17 Jun 2022 18:40:52 +0800)
To: Ihor Radchenko
Cc: casouri@gmail.com, emacs-devel@gnu.org

[I've changed the Subject, since this is no longer about tree-sitter.]

> From: Ihor Radchenko
> Cc: casouri@gmail.com, emacs-devel@gnu.org
> Date: Fri, 17 Jun 2022 18:40:52 +0800
>
> AFAIK, tree-sitter branch does not do anything related to _writing_
> parsers. Parsers are implemented via tree-sitter modules.

That is correct. However, the tree-sitter support code is called upon
certain changes in buffer text, because tree-sitter needs direct and
efficient access to buffer text when those changes happen, and that
cannot be provided in Lisp. There was a long discussion several months
ago where we came to this conclusion; the original design ideas were
different, and indeed at least some of them were based on
buffer-substring, which IMO is a terrible idea for this class of
features.

> Org mode parses Org markup elements in buffer into AST structure.
> This AST structure is used to fontify Org buffers, modify various
> elements, query element properties, build lists of matching elements
> according to user queries (agenda), etc.
>
> The Org mode parser is implementing pretty much the same features
> tree-sitter provides (except that the relevant Org code was in place
> before tree-sitter became a thing): Only parts of Org buffer are parsed
> as needed; buffer modifications trigger updates only within the affected
> parts of the AST.

OK, but that still doesn't tell me what you need from the Emacs core.
Can you describe those needs? I presume that modification hooks (of
any kind) are just the means; the real need is something else. What
is it?

If (as I presume) you need to know about changes to the buffer, then
can you enumerate the changes that are of interest? For example, are
changes in text properties and overlays of interest, and if so, what
kind of properties/overlays? (But please don't limit your answers to
text properties and overlays just because I asked about them
explicitly.)

Next, what kind of ASTs do you want to build, and how do you represent
text as an AST? In particular, is the AST defined by regexps or by
some other Lisp data structures?

> As for implementing in C, I am not even sure how to approach this.

This is what needs to be discussed. Emacs does have features
implemented partially in Lisp and partially in C, so this is not
impossible, far from that. One example that comes to mind is
character composition -- a feature of the display engine that is
completely controlled by Lisp data structures that can be easily
changed at run time. So, once we understand the needs and the
requirements, I'm quite sure ideas about the possible implementations
will not have us waiting for long.

> Emacs does provide external modules, but AFAIU modules communicate
> with Emacs process via print-ing/read-ing strings and the internal
> Emacs-C functions are not available. I am not convinced that the
> speed difference will be worth it to bother rewriting the whole
> parser in Emacs-C.

I wasn't suggesting using modules. Modules are intentionally limited
in their access to the Emacs internals. For a core feature like the
one you are describing, using modules makes no sense at all.

No, I was talking about providing new primitives and/or extending
existing primitives in order to support the features you want to
provide in Org (and also potentially to enable implementation of other
similar features by other packages).

As for speed, I suggest delaying the discussion of that until we have
a better understanding of the requirements and their various aspects,
and have some ideas regarding the possible implementations. Even if
eventually there is no gain in speed (and I find that hard to
believe), the safety of keeping some of the implementation un-exposed
to Lisp could well be worth our while. Speed alone is not a
good-enough reason to implement something in C, especially if Lisp
performance is acceptable.