unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Ihor Radchenko <yantar92@posteo.net>
To: Eli Zaretskii <eliz@gnu.org>
Cc: casouri@gmail.com, 65451@debbugs.gnu.org
Subject: bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
Date: Wed, 23 Aug 2023 08:52:30 +0000	[thread overview]
Message-ID: <874jkq87jl.fsf@localhost> (raw)
In-Reply-To: <83ttsrrroo.fsf@gnu.org>

Eli Zaretskii <eliz@gnu.org> writes:

>> Org wants to do the same thing tree-sitter does - keep parsed AST in
>> sync with buffer modifications without having to re-parse the whole
>> buffer. So, we basically need the same information tree-sitter needs -
>> the sequence of buffer text changes, in their order.
>
> We don't expose the data you want to tree-sitter in Lisp.  What is
> exposed to Lisp are the parser and parse-tree objects that we build
> (in C) based on tree-sitter parsing results.

I understand. The difference is that Org mode tries to implement the
tree sitter algorithm (not exactly, but similar) in Elisp.

> ... When the buffer is
> modified, the information about the modifications is used internally
> by Emacs, in C code, to find and update the relevant parsers, and for
> that we call the tree-sitter functions involved in this process.  See
> the function treesit_record_change which does that, and which is
> called from C when buffer text changes in a way relevant to treesit.el
> functionalities.  (Note that some changes of buffer text are not
> visible even to tree-sitter, because we decided they are not relevant,
> for now.)

I understand. Org also does not need every single change. The changes
that, when combined, do not affect the buffer text are not important to
Org as well. In fact, Org even merge subsequent edits in the same, or
intersecting region of text. For example, if no queries to Org AST are
done between edits,

Foo bar<point> baz ->
Foo bar|new text|<point> baz ->
Foo bar|new text| and more|<point> baz

is simply treated as a single edit
Foo bar<point> baz -> Foo bar|new text| and more|<point> baz

> If tracking markers is not enough, then I wonder how the information
> from the lower levels, which is basically the same but noisier, will
> be able to help you.

We do not really need information from lower levels.

At every moment of time, Org parser maintains a linkage between buffer
text and its parsed AST. After a buffer edit, we need to know how to
update the AST corresponding to the last known buffer text state to the
new (current) text state.

To update the old AST we need to know which regions in the old AST (and
old buffer text) were edited and their change in length. Then, we (1)
remove elements in AST that are structurally affected by the edit;
(2) shift elements after the edit that are not structurally affected but
whose boundaries must be adjusted to accommodate for the buffer length
being changed; (3) re-parse buffer text (after the edit) and fill the
gap in the AST.

For step (1), it is critical to have information about changed regions
in the same order the edits happen. And, in theory, we might utilize
before-change-functions to obtain this information (assuming that
before-change-functions are always called in the same order the edits
happen). But, unfortunately, before-change-functions are too noisy for
this purpose because, for example, simply altering text properties also
triggers before-change-functions, while no actual change in buffer text
occurs in such scenario and re-parsing AST is unnecessary. So, we
instead have to use after-change-functions and work out the original
changed region boundaties using
beg_changed end_changed pre-change_lentgh ->
beg_before_change = beg_changed;
end_before_change = end_changed - pre-change_length
This calculation of the original changed region becomes incorrect when
the order of after-change-functions is not the same as the editing order
- this bug report.

Step (2) is technically not necessary if we use markers, but we cannot
do it yet for performance reasons. And without markers, we again need to
know the change in length of th edited region:
end_changed - beg_changed - pre-change_lentgh.
This calculation is also incorrect when after-change-function is not the
same as the editing order.

All the above information does not have to be granular. We are good as
long as nothing queries the AST between the edits. For example, internal
details of edit sequence done by replace-match are not important because
Org AST will not be queried from Emacs C internals.

>> I hope that we can solve this issue one way or another. This currently
>> breaks the very core functionality of Org. Every part of Org relies on
>> it to obtain reasonable performance. Prior to using cache, we had orders
>> of magnitude slowdowns.
>
> If you can arrange your design such that Lisp sees only AST-specific
> objects affected by the modifications in buffer text, then I believe
> we will have a good chance of finding a satisfactory solution.  If
> that requires to have some of your code in C (preferably, generalized
> to some extent), then so be it.

The problem is that whether an edit requires re-parsing an AST object or
not depends on the object type. For example,

:DRAWER:
Text inside
drawer.
:END:

Corresponds to the following simplified AST:

(drawer (:name "DRAWER") (paragraph))

Then, changing it to

:DR:
...
:END:

will require updating 'drawer' AST node:

(drawer (:name "DR") (paragraph))

while 

:DRAWER:
Text inside
drawer.
:ND:

will invalidate drawer and turn the whole thing into

(paragraph)

and 

:DRAWER:
Text inside
<more text>
drawer.
:END:

will not affect the drawer, just inner paragraph.

Further,

:DRAWER:
Text inside
* Heading
drawer.
:END:

will split the whole thing into

(paragraph) (heading (paragraph))

Basically, I do not see an obvious way to abstract AST away into C
without implementing Org parser in C as well.
With an exception of exposing treesit_record_change to Elisp, so that we
can get information about region before and after the edit when updating
the AST - that's what I had in mind in
https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/ We would need
something like after-change-text-functions that will only trigger after
actual text edits and provide information about region boundaries before
and after the edit.

> Moreover, I think the solution you think you want you actually _don't_
> want, because it will overwhelm you with changes that are not relevant
> to your purposes.  You can see a clear evidence to that in the fact
> that treesit_record_change is called only in several strategical
> places, not everywhere where we change buffer text, and not at the
> lowest level of such changes.  There's a reason to that.

Indeed. As I tried to explain, the problem for me is not the granularity
of changes, but their ordering.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





  reply	other threads:[~2023-08-23  8:52 UTC|newest]

Thread overview: 34+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-08-22  9:30 bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made Ihor Radchenko
2023-08-22 12:22 ` Eli Zaretskii
2023-08-22 12:42   ` Ihor Radchenko
2023-08-22 12:58     ` Eli Zaretskii
2023-08-22 13:41       ` Ihor Radchenko
2023-08-22 16:02         ` Eli Zaretskii
2023-08-23  8:52           ` Ihor Radchenko [this message]
2023-08-23 17:58             ` Eli Zaretskii
2023-08-24  7:46               ` Ihor Radchenko
2023-08-24  8:08                 ` Eli Zaretskii
2023-08-24 11:24                   ` Ihor Radchenko
2023-08-24 12:08                     ` Eli Zaretskii
2023-08-24 13:27                       ` Ihor Radchenko
2023-08-24 14:53                         ` Eli Zaretskii
2023-08-25  6:37                           ` Eli Zaretskii
2023-08-25  9:09                             ` Ihor Radchenko
2023-08-26  7:10                               ` Eli Zaretskii
2023-08-27  8:13                                 ` Ihor Radchenko
2023-08-27  8:29                                   ` Eli Zaretskii
2023-08-29  7:39                                     ` Ihor Radchenko
2023-08-25  8:09                           ` Ihor Radchenko
2023-08-25 10:25                             ` Eli Zaretskii
2023-08-25 10:49                               ` Ihor Radchenko
2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30 14:11     ` Eli Zaretskii
2024-03-30 15:38       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30 16:47         ` Eli Zaretskii
2024-03-31  3:04           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-31  6:06       ` Eli Zaretskii
2024-03-31 13:57         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-08 19:10         ` Ihor Radchenko
2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874jkq87jl.fsf@localhost \
    --to=yantar92@posteo.net \
    --cc=65451@debbugs.gnu.org \
    --cc=casouri@gmail.com \
    --cc=eliz@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).