From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made Date: Wed, 23 Aug 2023 08:52:30 +0000 Message-ID: <874jkq87jl.fsf@localhost> References: <871qfv2zlk.fsf@localhost> <83a5ujtgfo.fsf@gnu.org> <87jztn1c5x.fsf@localhost> <834jkrters.fsf@gnu.org> <87v8d7i48y.fsf@localhost> <83ttsrrroo.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="8814"; mail-complaints-to="usenet@ciao.gmane.io" Cc: casouri@gmail.com, 65451@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Wed Aug 23 10:53:27 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1qYjcE-00023Q-Js for geb-bug-gnu-emacs@m.gmane-mx.org; Wed, 23 Aug 2023 10:53:26 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1qYjbp-0002hc-C8; Wed, 23 Aug 2023 04:53:01 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1qYjbm-0002Zo-Kc for bug-gnu-emacs@gnu.org; Wed, 23 Aug 2023 04:52:58 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1qYjbm-0007l2-Cp for bug-gnu-emacs@gnu.org; Wed, 23 Aug 2023 04:52:58 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1qYjbp-0001iS-Kz for bug-gnu-emacs@gnu.org; Wed, 23 Aug 2023 04:53:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Wed, 23 Aug 2023 08:53:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 65451 X-GNU-PR-Package: emacs Original-Received: via spool by 65451-submit@debbugs.gnu.org id=B65451.16927807356516 (code B ref 65451); Wed, 23 Aug 2023 08:53:01 +0000 Original-Received: (at 65451) by debbugs.gnu.org; 23 Aug 2023 08:52:15 +0000 Original-Received: from localhost ([127.0.0.1]:32791 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYjb4-0001h2-Nx for submit@debbugs.gnu.org; Wed, 23 Aug 2023 04:52:15 -0400 Original-Received: from mout01.posteo.de ([185.67.36.65]:39245) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1qYjb1-0001gk-Ow for 65451@debbugs.gnu.org; Wed, 23 Aug 2023 04:52:13 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 75AC124002A for <65451@debbugs.gnu.org>; Wed, 23 Aug 2023 10:52:02 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1692780722; bh=SvN7BCt4RuFOItFbl44HxBrmYVsFJrcEP47qHxr4k94=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version:From; b=BVviGfpdmNbUYKHSImIZoElf8kmmzi9iaVprI3dCUC12IGFvHfxIqIxwTT2VM2G7r WwQ00SNbLlylg2IwOy1gd23konJ2B1S/h25gJsbAipvSVz0Qxythveap0PlzKZ09zO aMo2TDGSYz5R2cndOLN92QLOgbtGmNQ9x69lc2gRLpcnQ9l+OJI01ssAfj1R4TMVx0 NuB3DG++r/T38oPch/fTZ0XmNlxyq+p7iGJl6vlOYQmPsJyzJcNagRyOxW9KY347/F 0ZxYson1Im4No51rBg9nsccsTtInQDU5qao+OF3eJ1Dz3ZrBwVxyLoPe63oLYi1sIR AVXWumvs/SxAA== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4RW0MN2KxSz6tsj; Wed, 23 Aug 2023 10:52:00 +0200 (CEST) In-Reply-To: <83ttsrrroo.fsf@gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:268211 Archived-At: Eli Zaretskii writes: >> Org wants to do the same thing tree-sitter does - keep parsed AST in >> sync with buffer modifications without having to re-parse the whole >> buffer. So, we basically need the same information tree-sitter needs - >> the sequence of buffer text changes, in their order. > > We don't expose the data you want to tree-sitter in Lisp. What is > exposed to Lisp are the parser and parse-tree objects that we build > (in C) based on tree-sitter parsing results. I understand. The difference is that Org mode tries to implement the tree sitter algorithm (not exactly, but similar) in Elisp. > ... When the buffer is > modified, the information about the modifications is used internally > by Emacs, in C code, to find and update the relevant parsers, and for > that we call the tree-sitter functions involved in this process. See > the function treesit_record_change which does that, and which is > called from C when buffer text changes in a way relevant to treesit.el > functionalities. (Note that some changes of buffer text are not > visible even to tree-sitter, because we decided they are not relevant, > for now.) I understand. Org also does not need every single change. The changes that, when combined, do not affect the buffer text are not important to Org as well. In fact, Org even merge subsequent edits in the same, or intersecting region of text. For example, if no queries to Org AST are done between edits, Foo bar baz -> Foo bar|new text| baz -> Foo bar|new text| and more| baz is simply treated as a single edit Foo bar baz -> Foo bar|new text| and more| baz > If tracking markers is not enough, then I wonder how the information > from the lower levels, which is basically the same but noisier, will > be able to help you. We do not really need information from lower levels. At every moment of time, Org parser maintains a linkage between buffer text and its parsed AST. After a buffer edit, we need to know how to update the AST corresponding to the last known buffer text state to the new (current) text state. To update the old AST we need to know which regions in the old AST (and old buffer text) were edited and their change in length. Then, we (1) remove elements in AST that are structurally affected by the edit; (2) shift elements after the edit that are not structurally affected but whose boundaries must be adjusted to accommodate for the buffer length being changed; (3) re-parse buffer text (after the edit) and fill the gap in the AST. For step (1), it is critical to have information about changed regions in the same order the edits happen. And, in theory, we might utilize before-change-functions to obtain this information (assuming that before-change-functions are always called in the same order the edits happen). But, unfortunately, before-change-functions are too noisy for this purpose because, for example, simply altering text properties also triggers before-change-functions, while no actual change in buffer text occurs in such scenario and re-parsing AST is unnecessary. So, we instead have to use after-change-functions and work out the original changed region boundaties using beg_changed end_changed pre-change_lentgh -> beg_before_change = beg_changed; end_before_change = end_changed - pre-change_length This calculation of the original changed region becomes incorrect when the order of after-change-functions is not the same as the editing order - this bug report. Step (2) is technically not necessary if we use markers, but we cannot do it yet for performance reasons. And without markers, we again need to know the change in length of th edited region: end_changed - beg_changed - pre-change_lentgh. This calculation is also incorrect when after-change-function is not the same as the editing order. All the above information does not have to be granular. We are good as long as nothing queries the AST between the edits. For example, internal details of edit sequence done by replace-match are not important because Org AST will not be queried from Emacs C internals. >> I hope that we can solve this issue one way or another. This currently >> breaks the very core functionality of Org. Every part of Org relies on >> it to obtain reasonable performance. Prior to using cache, we had orders >> of magnitude slowdowns. > > If you can arrange your design such that Lisp sees only AST-specific > objects affected by the modifications in buffer text, then I believe > we will have a good chance of finding a satisfactory solution. If > that requires to have some of your code in C (preferably, generalized > to some extent), then so be it. The problem is that whether an edit requires re-parsing an AST object or not depends on the object type. For example, :DRAWER: Text inside drawer. :END: Corresponds to the following simplified AST: (drawer (:name "DRAWER") (paragraph)) Then, changing it to :DR: ... :END: will require updating 'drawer' AST node: (drawer (:name "DR") (paragraph)) while :DRAWER: Text inside drawer. :ND: will invalidate drawer and turn the whole thing into (paragraph) and :DRAWER: Text inside drawer. :END: will not affect the drawer, just inner paragraph. Further, :DRAWER: Text inside * Heading drawer. :END: will split the whole thing into (paragraph) (heading (paragraph)) Basically, I do not see an obvious way to abstract AST away into C without implementing Org parser in C as well. With an exception of exposing treesit_record_change to Elisp, so that we can get information about region before and after the edit when updating the AST - that's what I had in mind in https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/ We would need something like after-change-text-functions that will only trigger after actual text edits and provide information about region boundaries before and after the edit. > Moreover, I think the solution you think you want you actually _don't_ > want, because it will overwhelm you with changes that are not relevant > to your purposes. You can see a clear evidence to that in the fact > that treesit_record_change is called only in several strategical > places, not everywhere where we change buffer text, and not at the > lowest level of such changes. There's a reason to that. Indeed. As I tried to explain, the problem for me is not the granularity of changes, but their ordering. -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at . Support Org development at , or support my work at