unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
@ 2023-08-22  9:30 Ihor Radchenko
  2023-08-22 12:22 ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-22  9:30 UTC (permalink / raw)
  To: 65451


Consider the following reproducer:

1. Create /tmp/bug.el with the following contents

(defun my/setup ()
  (interactive)
  (defun my/before-change (beg end)
    (warn "Before: %d %d" beg end))
  (add-hook 'before-change-functions #'my/before-change nil 'local)

  (defun my/after-change (beg end pre)
    (warn "After: %d %d delta: %d" beg end (- end beg pre)))
  (add-hook 'after-change-functions #'my/after-change nil 'local))

2. Create /tmp/bug.org

https://www.gnu.org/software/emacs/manual/html_node/emacs/Mode-Line.html

-UUU: @**-   F2  UTF-8-demo.txt   Top   (1,0)     <N>   (Text s-/ WS Wrap)

| string | meaning                                  | note                                                                 |
|--------+------------------------------------------+----------------------------------------------------------------------|
| -      | input method                             |                                                                      |
| UUU    | coding system (keyboard terminal buffer) | (U utf-8-unix)                                                       |
| :      | end-of-line convention                   | (: LF) (/ CR) (\ CRLF)                                               |
| @      | emacsclient                              |                                                                      |
| **     | buffer status                            | (-- unmodified) (** modified) (%% read-only) (%* read-only_modified) |
| -      | default-directory                        | (- local) (@ remote)                                                 |
| F2     | frame name                               | (F2 the-2nd-frame)                                                   |
|        |                                          |                                                                      |

3. emacs -Q -l /tmp/bug.el /tmp/bug.org
4. M-x my/setup
5. Move point to

| F2     | frame name                               | (F2 the-2nd-frame)                                                   |
| <point>       |                                          |                                                                      |

6. Insert "UTF"
7. M-x dabbrev-expand
8. Observe the following in the *Warnings* buffer

⛔ Warning (emacs): Before: 1278 1278
⛔ Warning (emacs): After: 1278 1279 delta: 1
⛔ Warning (emacs): Before: 1284 1285
⛔ Warning (emacs): After: 1284 1284 delta: -1
⛔ Warning (emacs): Before: 1279 1279
⛔ Warning (emacs): After: 1279 1280 delta: 1
⛔ Warning (emacs): Before: 1284 1285
⛔ Warning (emacs): After: 1284 1284 delta: -1
⛔ Warning (emacs): Before: 1280 1280
⛔ Warning (emacs): After: 1280 1281 delta: 1
⛔ Warning (emacs): Before: 1284 1285
⛔ Warning (emacs): After: 1284 1284 delta: -1
⛔ Warning (emacs): Before: 1278 1281
⛔ Warning (emacs): Before: 1278 1288
⛔ Warning (emacs): After: 1278 1288 delta: 0
⛔ Warning (emacs): After: 1278 1288 delta: 7

The last bit of the messages corresponds to dabbrev expansion of "UTF"
to "utf-8-unix":

⛔ Warning (emacs): Before: 1278 1281
⛔ Warning (emacs): Before: 1278 1288
⛔ Warning (emacs): After: 1278 1288 delta: 0
⛔ Warning (emacs): After: 1278 1288 delta: 7

Note how "After: 1278 1288 delta: 0" reports a change of "utf-8-unix"
that did not alter the buffer text size. It is trigerred _before_
"After: 1278 1288 delta: 7" that corresponds to replacing "UTF" with
"utf-8-unix".

The order of after-change notifications thus do not correspond to the
order of buffer changes. Org mode is relying upon the correct change
order to update parser cache with buffer modifications.

The same recipe using Emacs 27 yields the correct order

<...>
Warning (emacs): Before: 1278 1281
Warning (emacs): After: 1278 1288 delta: 7
Warning (emacs): Before: 1278 1288
Warning (emacs): After: 1278 1288 delta: 0


In GNU Emacs 30.0.50 (build 1, x86_64-pc-linux-gnu, GTK+ Version
 3.24.38, cairo version 1.17.8) of 2023-08-14 built on localhost
Repository revision: d483b38070120f17b1d00975081d27191d1deacc
Repository branch: master
Windowing system distributor 'The X.Org Foundation', version 11.0.12101008
System Description: Gentoo Linux

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22  9:30 bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made Ihor Radchenko
@ 2023-08-22 12:22 ` Eli Zaretskii
  2023-08-22 12:42   ` Ihor Radchenko
  2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-22 12:22 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Date: Tue, 22 Aug 2023 09:30:47 +0000
> 
> The last bit of the messages corresponds to dabbrev expansion of "UTF"
> to "utf-8-unix":
> 
> ⛔ Warning (emacs): Before: 1278 1281
> ⛔ Warning (emacs): Before: 1278 1288
> ⛔ Warning (emacs): After: 1278 1288 delta: 0
> ⛔ Warning (emacs): After: 1278 1288 delta: 7
> 
> Note how "After: 1278 1288 delta: 0" reports a change of "utf-8-unix"
> that did not alter the buffer text size. It is trigerred _before_
> "After: 1278 1288 delta: 7" that corresponds to replacing "UTF" with
> "utf-8-unix".

The inner Before+After is because replace-match also changes the
letter-case of the replaced text (to match the letter-case of the
original).

> The order of after-change notifications thus do not correspond to the
> order of buffer changes. Org mode is relying upon the correct change
> order to update parser cache with buffer modifications.

I think Org mode is relying on something it should not.  This
particular use case aside, Emacs is allowed to call a function that
changes the buffer from a function that itself changes the buffer, and
it is allowed to call that inner function _before_ it did all the
changes it intended to do.  In such cases you will see Before+After
pairs embedded inside an outer Before+After pair, because most
functions which end up calling these hooks try not to call them too
often, so they generally call the before-hook once at the beginning
and the after-hook once near the end.  IOW, we try to report all of
the changes in one go, instead of reporting them one small chunk
(perhaps even one character) at a time.  And that inevitably leads to
situations like this one.  Which is perfectly normal, from my POV, and
not a bug at all.  I'd hate to make our code more complex, let alone
slower, just to satisfy the reliance on the perfect pairing like you
do.

(We've been through this in other discussions: Org uses these hooks
for purposes they were not designed for.  The ELisp manual even warns
about such assumptions.)

> The same recipe using Emacs 27 yields the correct order

AFAIR, Emacs 27 had quite a few bugs in this area, which were fixed in
development since then.  See bug#42424, for example.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22 12:22 ` Eli Zaretskii
@ 2023-08-22 12:42   ` Ihor Radchenko
  2023-08-22 12:58     ` Eli Zaretskii
  2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-22 12:42 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> The order of after-change notifications thus do not correspond to the
>> order of buffer changes. Org mode is relying upon the correct change
>> order to update parser cache with buffer modifications.
>
> I think Org mode is relying on something it should not.

Then, I'd like to point back to the previous discussion where I asked to
expose to Elisp information about buffer changes available to
tree-sitter.
https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/

If there is no guarantee that buffer modifications are not signaled in
order, on-the-fly updates of the parsed AST will become impossible for
Org, even if using markers (as you suggested in the linked thread).

In fact, I am not sure if tree-sitter will behave correctly if it is
signaled changes in incorrect order.

https://tree-sitter.github.io/tree-sitter/using-parsers#advanced-parsing
says

    In applications like text editors, you often need to re-parse a file
    after its source code has changed. Tree-sitter is designed to support
    this use case efficiently. There are two steps required. First, you must
    edit the syntax tree, which adjusts the ranges of its nodes so that they
    stay in sync with the code.
    
    typedef struct {
      uint32_t start_byte;
      uint32_t old_end_byte;
      uint32_t new_end_byte;
      TSPoint start_point;
      TSPoint old_end_point;
      TSPoint new_end_point;
    } TSInputEdit;
    
    void ts_tree_edit(TSTree *, const TSInputEdit *);

Note how the API is similar to `after-change-functions' - it expects
edited region boundaries before/after the edit.

If Emacs feeds the following to tree-sitter
⛔ Warning (emacs): After: 1278 1288 delta: 0
⛔ Warning (emacs): After: 1278 1288 delta: 7
, the first "edit" query will apply to wrong range in the cached AST.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22 12:42   ` Ihor Radchenko
@ 2023-08-22 12:58     ` Eli Zaretskii
  2023-08-22 13:41       ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-22 12:58 UTC (permalink / raw)
  To: Ihor Radchenko, Yuan Fu; +Cc: 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: 65451@debbugs.gnu.org
> Date: Tue, 22 Aug 2023 12:42:18 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> The order of after-change notifications thus do not correspond to the
> >> order of buffer changes. Org mode is relying upon the correct change
> >> order to update parser cache with buffer modifications.
> >
> > I think Org mode is relying on something it should not.
> 
> Then, I'd like to point back to the previous discussion where I asked to
> expose to Elisp information about buffer changes available to
> tree-sitter.
> https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/

I don't want to do that, sorry.  Not without a good understanding of
what exactly do you need from that and in what way.  If we will expose
anything, it will have to be the minimum possible exposure, not the
maximum, so I would like to understand this very well before I agree
to any change in this direction.

> If there is no guarantee that buffer modifications are not signaled in
> order, on-the-fly updates of the parsed AST will become impossible for
> Org, even if using markers (as you suggested in the linked thread).
> 
> In fact, I am not sure if tree-sitter will behave correctly if it is
> signaled changes in incorrect order.

I will defer to Yuan, but tree-sitter doesn't use these hooks, we call
its functions directly from insdel.c where needed.  This makes sense
for a library to which we link and whose interface code we control,
but giving such access to Lisp (and Org on top of that) is out of the
question.  We don't even give such access to modules.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22 12:58     ` Eli Zaretskii
@ 2023-08-22 13:41       ` Ihor Radchenko
  2023-08-22 16:02         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-22 13:41 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Yuan Fu, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> Then, I'd like to point back to the previous discussion where I asked to
>> expose to Elisp information about buffer changes available to
>> tree-sitter.
>> https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/
>
> I don't want to do that, sorry.  Not without a good understanding of
> what exactly do you need from that and in what way.  If we will expose
> anything, it will have to be the minimum possible exposure, not the
> maximum, so I would like to understand this very well before I agree
> to any change in this direction.

Org wants to do the same thing tree-sitter does - keep parsed AST in
sync with buffer modifications without having to re-parse the whole
buffer. So, we basically need the same information tree-sitter needs -
the sequence of buffer text changes, in their order.

Note that the markers discussed in the thread I linked are not
sufficient. When editing near AST node boundaries, even if the
boundaries are represented by markers, we have to re-parse the AST
around to account for the possible structural changes. So, information
about buffer edits is still required.

>> In fact, I am not sure if tree-sitter will behave correctly if it is
>> signaled changes in incorrect order.
>
> I will defer to Yuan, but tree-sitter doesn't use these hooks, we call
> its functions directly from insdel.c where needed.  This makes sense
> for a library to which we link and whose interface code we control,
> but giving such access to Lisp (and Org on top of that) is out of the
> question.  We don't even give such access to modules.

I hope that we can solve this issue one way or another. This currently
breaks the very core functionality of Org. Every part of Org relies on
it to obtain reasonable performance. Prior to using cache, we had orders
of magnitude slowdowns.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22 13:41       ` Ihor Radchenko
@ 2023-08-22 16:02         ` Eli Zaretskii
  2023-08-23  8:52           ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-22 16:02 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: Yuan Fu <casouri@gmail.com>, 65451@debbugs.gnu.org
> Date: Tue, 22 Aug 2023 13:41:17 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Then, I'd like to point back to the previous discussion where I asked to
> >> expose to Elisp information about buffer changes available to
> >> tree-sitter.
> >> https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/
> >
> > I don't want to do that, sorry.  Not without a good understanding of
> > what exactly do you need from that and in what way.  If we will expose
> > anything, it will have to be the minimum possible exposure, not the
> > maximum, so I would like to understand this very well before I agree
> > to any change in this direction.
> 
> Org wants to do the same thing tree-sitter does - keep parsed AST in
> sync with buffer modifications without having to re-parse the whole
> buffer. So, we basically need the same information tree-sitter needs -
> the sequence of buffer text changes, in their order.

We don't expose the data you want to tree-sitter in Lisp.  What is
exposed to Lisp are the parser and parse-tree objects that we build
(in C) based on tree-sitter parsing results.  When the buffer is
modified, the information about the modifications is used internally
by Emacs, in C code, to find and update the relevant parsers, and for
that we call the tree-sitter functions involved in this process.  See
the function treesit_record_change which does that, and which is
called from C when buffer text changes in a way relevant to treesit.el
functionalities.  (Note that some changes of buffer text are not
visible even to tree-sitter, because we decided they are not relevant,
for now.)

> Note that the markers discussed in the thread I linked are not
> sufficient. When editing near AST node boundaries, even if the
> boundaries are represented by markers, we have to re-parse the AST
> around to account for the possible structural changes. So, information
> about buffer edits is still required.

If tracking markers is not enough, then I wonder how the information
from the lower levels, which is basically the same but noisier, will
be able to help you.

> >> In fact, I am not sure if tree-sitter will behave correctly if it is
> >> signaled changes in incorrect order.
> >
> > I will defer to Yuan, but tree-sitter doesn't use these hooks, we call
> > its functions directly from insdel.c where needed.  This makes sense
> > for a library to which we link and whose interface code we control,
> > but giving such access to Lisp (and Org on top of that) is out of the
> > question.  We don't even give such access to modules.
> 
> I hope that we can solve this issue one way or another. This currently
> breaks the very core functionality of Org. Every part of Org relies on
> it to obtain reasonable performance. Prior to using cache, we had orders
> of magnitude slowdowns.

If you can arrange your design such that Lisp sees only AST-specific
objects affected by the modifications in buffer text, then I believe
we will have a good chance of finding a satisfactory solution.  If
that requires to have some of your code in C (preferably, generalized
to some extent), then so be it.

You see, I think the buffer-change hooks we have are already too much:
Lisp programs abuse them all the time (you can see a good example in
the bug which I mentioned up-thread, and which led to the change you
are now complaining about).  Doing more of that is not very wise, to
say the least.

Moreover, I think the solution you think you want you actually _don't_
want, because it will overwhelm you with changes that are not relevant
to your purposes.  You can see a clear evidence to that in the fact
that treesit_record_change is called only in several strategical
places, not everywhere where we change buffer text, and not at the
lowest level of such changes.  There's a reason to that.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22 16:02         ` Eli Zaretskii
@ 2023-08-23  8:52           ` Ihor Radchenko
  2023-08-23 17:58             ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-23  8:52 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> Org wants to do the same thing tree-sitter does - keep parsed AST in
>> sync with buffer modifications without having to re-parse the whole
>> buffer. So, we basically need the same information tree-sitter needs -
>> the sequence of buffer text changes, in their order.
>
> We don't expose the data you want to tree-sitter in Lisp.  What is
> exposed to Lisp are the parser and parse-tree objects that we build
> (in C) based on tree-sitter parsing results.

I understand. The difference is that Org mode tries to implement the
tree sitter algorithm (not exactly, but similar) in Elisp.

> ... When the buffer is
> modified, the information about the modifications is used internally
> by Emacs, in C code, to find and update the relevant parsers, and for
> that we call the tree-sitter functions involved in this process.  See
> the function treesit_record_change which does that, and which is
> called from C when buffer text changes in a way relevant to treesit.el
> functionalities.  (Note that some changes of buffer text are not
> visible even to tree-sitter, because we decided they are not relevant,
> for now.)

I understand. Org also does not need every single change. The changes
that, when combined, do not affect the buffer text are not important to
Org as well. In fact, Org even merge subsequent edits in the same, or
intersecting region of text. For example, if no queries to Org AST are
done between edits,

Foo bar<point> baz ->
Foo bar|new text|<point> baz ->
Foo bar|new text| and more|<point> baz

is simply treated as a single edit
Foo bar<point> baz -> Foo bar|new text| and more|<point> baz

> If tracking markers is not enough, then I wonder how the information
> from the lower levels, which is basically the same but noisier, will
> be able to help you.

We do not really need information from lower levels.

At every moment of time, Org parser maintains a linkage between buffer
text and its parsed AST. After a buffer edit, we need to know how to
update the AST corresponding to the last known buffer text state to the
new (current) text state.

To update the old AST we need to know which regions in the old AST (and
old buffer text) were edited and their change in length. Then, we (1)
remove elements in AST that are structurally affected by the edit;
(2) shift elements after the edit that are not structurally affected but
whose boundaries must be adjusted to accommodate for the buffer length
being changed; (3) re-parse buffer text (after the edit) and fill the
gap in the AST.

For step (1), it is critical to have information about changed regions
in the same order the edits happen. And, in theory, we might utilize
before-change-functions to obtain this information (assuming that
before-change-functions are always called in the same order the edits
happen). But, unfortunately, before-change-functions are too noisy for
this purpose because, for example, simply altering text properties also
triggers before-change-functions, while no actual change in buffer text
occurs in such scenario and re-parsing AST is unnecessary. So, we
instead have to use after-change-functions and work out the original
changed region boundaties using
beg_changed end_changed pre-change_lentgh ->
beg_before_change = beg_changed;
end_before_change = end_changed - pre-change_length
This calculation of the original changed region becomes incorrect when
the order of after-change-functions is not the same as the editing order
- this bug report.

Step (2) is technically not necessary if we use markers, but we cannot
do it yet for performance reasons. And without markers, we again need to
know the change in length of th edited region:
end_changed - beg_changed - pre-change_lentgh.
This calculation is also incorrect when after-change-function is not the
same as the editing order.

All the above information does not have to be granular. We are good as
long as nothing queries the AST between the edits. For example, internal
details of edit sequence done by replace-match are not important because
Org AST will not be queried from Emacs C internals.

>> I hope that we can solve this issue one way or another. This currently
>> breaks the very core functionality of Org. Every part of Org relies on
>> it to obtain reasonable performance. Prior to using cache, we had orders
>> of magnitude slowdowns.
>
> If you can arrange your design such that Lisp sees only AST-specific
> objects affected by the modifications in buffer text, then I believe
> we will have a good chance of finding a satisfactory solution.  If
> that requires to have some of your code in C (preferably, generalized
> to some extent), then so be it.

The problem is that whether an edit requires re-parsing an AST object or
not depends on the object type. For example,

:DRAWER:
Text inside
drawer.
:END:

Corresponds to the following simplified AST:

(drawer (:name "DRAWER") (paragraph))

Then, changing it to

:DR:
...
:END:

will require updating 'drawer' AST node:

(drawer (:name "DR") (paragraph))

while 

:DRAWER:
Text inside
drawer.
:ND:

will invalidate drawer and turn the whole thing into

(paragraph)

and 

:DRAWER:
Text inside
<more text>
drawer.
:END:

will not affect the drawer, just inner paragraph.

Further,

:DRAWER:
Text inside
* Heading
drawer.
:END:

will split the whole thing into

(paragraph) (heading (paragraph))

Basically, I do not see an obvious way to abstract AST away into C
without implementing Org parser in C as well.
With an exception of exposing treesit_record_change to Elisp, so that we
can get information about region before and after the edit when updating
the AST - that's what I had in mind in
https://yhetil.org/emacs-devel/83tu8jq2vl.fsf@gnu.org/ We would need
something like after-change-text-functions that will only trigger after
actual text edits and provide information about region boundaries before
and after the edit.

> Moreover, I think the solution you think you want you actually _don't_
> want, because it will overwhelm you with changes that are not relevant
> to your purposes.  You can see a clear evidence to that in the fact
> that treesit_record_change is called only in several strategical
> places, not everywhere where we change buffer text, and not at the
> lowest level of such changes.  There's a reason to that.

Indeed. As I tried to explain, the problem for me is not the granularity
of changes, but their ordering.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-23  8:52           ` Ihor Radchenko
@ 2023-08-23 17:58             ` Eli Zaretskii
  2023-08-24  7:46               ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-23 17:58 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Wed, 23 Aug 2023 08:52:30 +0000
> 
> To update the old AST we need to know which regions in the old AST (and
> old buffer text) were edited and their change in length. Then, we (1)
> remove elements in AST that are structurally affected by the edit;
> (2) shift elements after the edit that are not structurally affected but
> whose boundaries must be adjusted to accommodate for the buffer length
> being changed; (3) re-parse buffer text (after the edit) and fill the
> gap in the AST.
> 
> For step (1), it is critical to have information about changed regions
> in the same order the edits happen. And, in theory, we might utilize
> before-change-functions to obtain this information (assuming that
> before-change-functions are always called in the same order the edits
> happen). But, unfortunately, before-change-functions are too noisy for
> this purpose because, for example, simply altering text properties also
> triggers before-change-functions, while no actual change in buffer text
> occurs in such scenario and re-parsing AST is unnecessary. So, we
> instead have to use after-change-functions and work out the original
> changed region boundaties using
> beg_changed end_changed pre-change_lentgh ->
> beg_before_change = beg_changed;
> end_before_change = end_changed - pre-change_length
> This calculation of the original changed region becomes incorrect when
> the order of after-change-functions is not the same as the editing order
> - this bug report.

If these measures still don't help you enough, perhaps the conclusion
is that it isn't feasible to implement text parsers in Lisp, at least
as long as you want all those micro-optimizations of knowing exactly
which parts of the buffer text were modified (as opposed to only know
how many characters at the beginning and at the end of the buffer
remain unchanged, and reparse the rest).  Maybe it must be done in C,
if we want Emacs to remain a relatively safe environment.

> Indeed. As I tried to explain, the problem for me is not the granularity
> of changes, but their ordering.

I still don't understand why the ordering matters, but I do know that
you cannot rely on it, and I hope I explained why.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-23 17:58             ` Eli Zaretskii
@ 2023-08-24  7:46               ` Ihor Radchenko
  2023-08-24  8:08                 ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-24  7:46 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

> If these measures still don't help you enough, perhaps the conclusion
> is that it isn't feasible to implement text parsers in Lisp, at least
> as long as you want all those micro-optimizations of knowing exactly
> which parts of the buffer text were modified (as opposed to only know
> how many characters at the beginning and at the end of the buffer
> remain unchanged, and reparse the rest).  Maybe it must be done in C,
> if we want Emacs to remain a relatively safe environment.

Do I understand correctly that `treesit_record_change' is called
__less frequently__ compared to before-change-functions and
after-change-functions? If yes, I do not see how exposing it to Elisp
will make things any worse than already available
`before-change-functions'/`after-change-functions'.

Elisp code that does not care about text property changes will not be
forced to use `before/after-change-functions' (because no other option)
and could be called less frequently.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24  7:46               ` Ihor Radchenko
@ 2023-08-24  8:08                 ` Eli Zaretskii
  2023-08-24 11:24                   ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-24  8:08 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Thu, 24 Aug 2023 07:46:06 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > If these measures still don't help you enough, perhaps the conclusion
> > is that it isn't feasible to implement text parsers in Lisp, at least
> > as long as you want all those micro-optimizations of knowing exactly
> > which parts of the buffer text were modified (as opposed to only know
> > how many characters at the beginning and at the end of the buffer
> > remain unchanged, and reparse the rest).  Maybe it must be done in C,
> > if we want Emacs to remain a relatively safe environment.
> 
> Do I understand correctly that `treesit_record_change' is called
> __less frequently__ compared to before-change-functions and
> after-change-functions?

No, treesit_record_change is called at a lower level than
buffer-change hooks are, and therefore in some cases the hooks will
not be called, but treesit_record_change will be.

The frequency might be lower, but only because treesit_record_change
is called once per change; there's no separate "before" and "after"
calls.  In any case, the correspondence is not 1:1, because they are
called on different levels.

> If yes, I do not see how exposing it to Elisp will make things any
> worse than already available
> `before-change-functions'/`after-change-functions'.

Exposing buffer text changes to Lisp is inherently dangerous, because
Lisp code can do anything and everything in these hooks, and many
packages already do.  The fact that we allow this via those two hooks
is unfortunate as it is, but adding more opportunities for Lisp to do
potentially dangerous stuff, and doing that on a lower level, where
the buffer object is sometimes in a state that is not 100% consistent,
is unwise, to say the least.  It will make Emacs much less stable.
No, thanks.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24  8:08                 ` Eli Zaretskii
@ 2023-08-24 11:24                   ` Ihor Radchenko
  2023-08-24 12:08                     ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-24 11:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

> Exposing buffer text changes to Lisp is inherently dangerous, because
> Lisp code can do anything and everything in these hooks, and many
> packages already do.  The fact that we allow this via those two hooks
> is unfortunate as it is, but adding more opportunities for Lisp to do
> potentially dangerous stuff, and doing that on a lower level, where
> the buffer object is sometimes in a state that is not 100% consistent,
> is unwise, to say the least.  It will make Emacs much less stable.
> No, thanks.

I can see the danger running lisp code while buffer state is transient.

Would it be acceptable to accumulate treesit_record_change transactions
into a queue, combine them later, and run the Elisp hook only when it is
safe (around the same place `after-change-functions' is called, but only
when actual edit is made to the buffer text)?

That way, we can have a hook that will run strictly less frequently
compared to `after-change-functions'.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24 11:24                   ` Ihor Radchenko
@ 2023-08-24 12:08                     ` Eli Zaretskii
  2023-08-24 13:27                       ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-24 12:08 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Thu, 24 Aug 2023 11:24:37 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Exposing buffer text changes to Lisp is inherently dangerous, because
> > Lisp code can do anything and everything in these hooks, and many
> > packages already do.  The fact that we allow this via those two hooks
> > is unfortunate as it is, but adding more opportunities for Lisp to do
> > potentially dangerous stuff, and doing that on a lower level, where
> > the buffer object is sometimes in a state that is not 100% consistent,
> > is unwise, to say the least.  It will make Emacs much less stable.
> > No, thanks.
> 
> I can see the danger running lisp code while buffer state is transient.
> 
> Would it be acceptable to accumulate treesit_record_change transactions
> into a queue, combine them later, and run the Elisp hook only when it is
> safe (around the same place `after-change-functions' is called, but only
> when actual edit is made to the buffer text)?
> 
> That way, we can have a hook that will run strictly less frequently
> compared to `after-change-functions'.

When exactly do you need this to run?  At the same time as
after-change-functions doesn't sound like a good idea to me, but I
think you don't need to run it there.  What about running just before
redisplay kicks in?

Or how about explaining how is the (updated) AST used, i.e. who are
the clients of the AST updates, and what do they do when the AST
changes?





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24 12:08                     ` Eli Zaretskii
@ 2023-08-24 13:27                       ` Ihor Radchenko
  2023-08-24 14:53                         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-24 13:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> Would it be acceptable to accumulate treesit_record_change transactions
>> into a queue, combine them later, and run the Elisp hook only when it is
>> safe (around the same place `after-change-functions' is called, but only
>> when actual edit is made to the buffer text)?
>> 
>> That way, we can have a hook that will run strictly less frequently
>> compared to `after-change-functions'.
>
> When exactly do you need this to run?  At the same time as
> after-change-functions doesn't sound like a good idea to me, but I
> think you don't need to run it there.  What about running just before
> redisplay kicks in?

Usually, we need to update AST when some other Elisp code needs Org
element API. And we update AST on idle timer to speed things up.

What if we use no hooks at all? Instead, the edit transactions are
accumulated in a list that can be examined and processed by Elisp code
as needed.

Elements of the list will be like
[:buffer-chars-modified-tick :region-beginning :region-end-before-edit :region-end-after-edit]

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24 13:27                       ` Ihor Radchenko
@ 2023-08-24 14:53                         ` Eli Zaretskii
  2023-08-25  6:37                           ` Eli Zaretskii
  2023-08-25  8:09                           ` Ihor Radchenko
  0 siblings, 2 replies; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-24 14:53 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Thu, 24 Aug 2023 13:27:02 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > When exactly do you need this to run?  At the same time as
> > after-change-functions doesn't sound like a good idea to me, but I
> > think you don't need to run it there.  What about running just before
> > redisplay kicks in?
> 
> Usually, we need to update AST when some other Elisp code needs Org
> element API. And we update AST on idle timer to speed things up.
> 
> What if we use no hooks at all? Instead, the edit transactions are
> accumulated in a list that can be examined and processed by Elisp code
> as needed.

If you need that from timers, then yes, all you need is access from
the timer function to a data structure that holds the accumulated
transactions.  Timers run approximately at the same time and under the
same conditions as redisplay, so this mechanism will indeed ensure
this data is accessed when Emacs is in a consistent state, and it is
safe to access and use this data.

> Elements of the list will be like
> [:buffer-chars-modified-tick :region-beginning :region-end-before-edit :region-end-after-edit]

If you really need buffer-chars-modified-tick, you will have to verify
that it is updated before calling the function which updates the
"transactions list".





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24 14:53                         ` Eli Zaretskii
@ 2023-08-25  6:37                           ` Eli Zaretskii
  2023-08-25  9:09                             ` Ihor Radchenko
  2023-08-25  8:09                           ` Ihor Radchenko
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-25  6:37 UTC (permalink / raw)
  To: yantar92; +Cc: casouri, 65451

> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Thu, 24 Aug 2023 17:53:26 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> If you need that from timers, then yes, all you need is access from
> the timer function to a data structure that holds the accumulated
> transactions.  Timers run approximately at the same time and under the
> same conditions as redisplay, so this mechanism will indeed ensure
> this data is accessed when Emacs is in a consistent state, and it is
> safe to access and use this data.
> 
> > Elements of the list will be like
> > [:buffer-chars-modified-tick :region-beginning :region-end-before-edit :region-end-after-edit]
> 
> If you really need buffer-chars-modified-tick, you will have to verify
> that it is updated before calling the function which updates the
> "transactions list".

Thinking about this some more, we will need to consider whether this
list of accumulated transactions is ever compacted by deleting old
transactions, or we let it grow indefinitely.  If the former, we
should consider the case where more than one feature wants to track
buffer edits (so it is impossible to remove entries once they have
been processed by a single consumer).





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-24 14:53                         ` Eli Zaretskii
  2023-08-25  6:37                           ` Eli Zaretskii
@ 2023-08-25  8:09                           ` Ihor Radchenko
  2023-08-25 10:25                             ` Eli Zaretskii
  1 sibling, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-25  8:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> Elements of the list will be like
>> [:buffer-chars-modified-tick :region-beginning :region-end-before-edit :region-end-after-edit]
>
> If you really need buffer-chars-modified-tick, you will have to verify
> that it is updated before calling the function which updates the
> "transactions list".

Do you mean that buffer-chars-modified-tick may not be yet updated at
the time treesit_record_change is called?

Then, treesit_record_change only uses byte positions. Will it be
possible to record buffer point positions, or may it be problematic?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-25  6:37                           ` Eli Zaretskii
@ 2023-08-25  9:09                             ` Ihor Radchenko
  2023-08-26  7:10                               ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-25  9:09 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

> Thinking about this some more, we will need to consider whether this
> list of accumulated transactions is ever compacted by deleting old
> transactions, or we let it grow indefinitely.  If the former, we
> should consider the case where more than one feature wants to track
> buffer edits (so it is impossible to remove entries once they have
> been processed by a single consumer).

What I propose is actually quite similar to `buffer-undo-list'.
But a bit less generic - (apply FUN-NAME ARGS) entries cannot be handled
outside the narrow scope of `undo'.
Similar to `buffer-undo-list' it needs to be compacted.

To not lose the information when the edit history is compacted, there
may be a hook executed right before the compaction, so that all the
users can update their state as needed.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-25  8:09                           ` Ihor Radchenko
@ 2023-08-25 10:25                             ` Eli Zaretskii
  2023-08-25 10:49                               ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-25 10:25 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Fri, 25 Aug 2023 08:09:41 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> Elements of the list will be like
> >> [:buffer-chars-modified-tick :region-beginning :region-end-before-edit :region-end-after-edit]
> >
> > If you really need buffer-chars-modified-tick, you will have to verify
> > that it is updated before calling the function which updates the
> > "transactions list".
> 
> Do you mean that buffer-chars-modified-tick may not be yet updated at
> the time treesit_record_change is called?

Yes.  You can audit the code and see it for yourself.

> Then, treesit_record_change only uses byte positions. Will it be
> possible to record buffer point positions, or may it be problematic?

Sorry, I don't understand what you are saying here and how this is
relevant to the issue at hand and to my comment in particular.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-25 10:25                             ` Eli Zaretskii
@ 2023-08-25 10:49                               ` Ihor Radchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-25 10:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> Then, treesit_record_change only uses byte positions. Will it be
>> possible to record buffer point positions, or may it be problematic?
>
> Sorry, I don't understand what you are saying here and how this is
> relevant to the issue at hand and to my comment in particular.

Never mind. I am jumping ahead of the discussion.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-25  9:09                             ` Ihor Radchenko
@ 2023-08-26  7:10                               ` Eli Zaretskii
  2023-08-27  8:13                                 ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-26  7:10 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Fri, 25 Aug 2023 09:09:11 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> > Thinking about this some more, we will need to consider whether this
> > list of accumulated transactions is ever compacted by deleting old
> > transactions, or we let it grow indefinitely.  If the former, we
> > should consider the case where more than one feature wants to track
> > buffer edits (so it is impossible to remove entries once they have
> > been processed by a single consumer).
> 
> What I propose is actually quite similar to `buffer-undo-list'.
> But a bit less generic - (apply FUN-NAME ARGS) entries cannot be handled
> outside the narrow scope of `undo'.
> Similar to `buffer-undo-list' it needs to be compacted.

Not sure what this means in practice.  the entries in the list we are
discussing will be very different from the entries in
buffer-undo-list.

> To not lose the information when the edit history is compacted, there
> may be a hook executed right before the compaction, so that all the
> users can update their state as needed.

If the compaction will run from GC, then we cannot safely call Lisp
hooks at that time.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-26  7:10                               ` Eli Zaretskii
@ 2023-08-27  8:13                                 ` Ihor Radchenko
  2023-08-27  8:29                                   ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-27  8:13 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> What I propose is actually quite similar to `buffer-undo-list'.
>> But a bit less generic - (apply FUN-NAME ARGS) entries cannot be handled
>> outside the narrow scope of `undo'.
>> Similar to `buffer-undo-list' it needs to be compacted.
>
> Not sure what this means in practice.  the entries in the list we are
> discussing will be very different from the entries in
> buffer-undo-list.

What I meant is that similar principles with undo-limit-like variables
may apply.

>> To not lose the information when the edit history is compacted, there
>> may be a hook executed right before the compaction, so that all the
>> users can update their state as needed.
>
> If the compaction will run from GC, then we cannot safely call Lisp
> hooks at that time.

Fair point.
Then, what about compacting the "edit list" more frequently, so that we
do not need to worry about its size? But I am not sure what frequency
will be safe.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-27  8:13                                 ` Ihor Radchenko
@ 2023-08-27  8:29                                   ` Eli Zaretskii
  2023-08-29  7:39                                     ` Ihor Radchenko
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2023-08-27  8:29 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: casouri, 65451

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: casouri@gmail.com, 65451@debbugs.gnu.org
> Date: Sun, 27 Aug 2023 08:13:38 +0000
> 
> Eli Zaretskii <eliz@gnu.org> writes:
> 
> >> What I propose is actually quite similar to `buffer-undo-list'.
> >> But a bit less generic - (apply FUN-NAME ARGS) entries cannot be handled
> >> outside the narrow scope of `undo'.
> >> Similar to `buffer-undo-list' it needs to be compacted.
> >
> > Not sure what this means in practice.  the entries in the list we are
> > discussing will be very different from the entries in
> > buffer-undo-list.
> 
> What I meant is that similar principles with undo-limit-like variables
> may apply.

Well, the devil is in the details ;-)

> >> To not lose the information when the edit history is compacted, there
> >> may be a hook executed right before the compaction, so that all the
> >> users can update their state as needed.
> >
> > If the compaction will run from GC, then we cannot safely call Lisp
> > hooks at that time.
> 
> Fair point.
> Then, what about compacting the "edit list" more frequently, so that we
> do not need to worry about its size? But I am not sure what frequency
> will be safe.

Something like that, yes.  But we need to invent a protocol which
would allow several clients to consume the list safely and without the
risk of missing edits.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-27  8:29                                   ` Eli Zaretskii
@ 2023-08-29  7:39                                     ` Ihor Radchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Ihor Radchenko @ 2023-08-29  7:39 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: casouri, 65451

Eli Zaretskii <eliz@gnu.org> writes:

>> Then, what about compacting the "edit list" more frequently, so that we
>> do not need to worry about its size? But I am not sure what frequency
>> will be safe.
>
> Something like that, yes.  But we need to invent a protocol which
> would allow several clients to consume the list safely and without the
> risk of missing edits.

I can think of two approaches:

1. There will be a new buffer-local variable - `buffer-edit-history'
   that will hold recent `buffer-edit-history-limit' edits. This way,
   Elisp functions will be able to examine it any time they need to.

   In addition, there will be `after-edit-functions' hook that will be
   called after `buffer-edit-history-limit' is exceeded. Before the hook
   is called, `buffer-edit-history' is truncated. The hook functions
   will be called with a single argument - list of edits that have been
   removed from the `buffer-undo-history'. That way, Elisp will be able
   to process edits that will disappear from the `buffer-edit-history'.

   Each entry in `buffer-edit-history' will be a list of
   (beg end_before end_after counter), describing region boundaries
   before and after the edit + a counter that can be used to keep track
   of processed positions. The counter will be useful for the code that
   processes `buffer-edit-history' independently, outside
   `after-edit-functions', and may need to skip already processed
   elements. (I initially though that we can simply hold
   `buffer-chars-modified-tick' here, but it is not necessary to hold
   `buffer-chars-modified-tick' specifically - just something to
   indicate "epoch" in the edit history).

   The downside of exposing `buffer-edit-history' is that some
   bad-written Elisp may be tempted to hold a pointer to a cons cell in
   `buffer-edit-history', thus preventing GC.

2. We can have `after-edit-functions' being called once for each edit
   event with (beg end_before end_after) arguments.

   To avoid skipping edits, in addition to Emacs sometimes calling the
   hook, we should allow Elisp to trigger the hook early, by calling
   `process-buffer-edits'. This way, synchronization can be ensured.

   The downside here is when multiple consumers are using
   `after-edit-functions' - synchronization (`process-buffer-edits')
   requested by one consumer will also trigger all other consumers,
   potentially creating extra overheads.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2023-08-22 12:22 ` Eli Zaretskii
  2023-08-22 12:42   ` Ihor Radchenko
@ 2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30 14:11     ` Eli Zaretskii
  2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 2 replies; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30 13:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, Ihor Radchenko, 65451

>> ⛔ Warning (emacs): Before: 1278 1281
>> ⛔ Warning (emacs): Before: 1278 1288
>> ⛔ Warning (emacs): After: 1278 1288 delta: 0
>> ⛔ Warning (emacs): After: 1278 1288 delta: 7
>> 
>> Note how "After: 1278 1288 delta: 0" reports a change of "utf-8-unix"
>> that did not alter the buffer text size. It is trigerred _before_
>> "After: 1278 1288 delta: 7" that corresponds to replacing "UTF" with
>> "utf-8-unix".

Hmm... yes, that's bad.  Alan, have you looked at this?

I suspect the best option in the above case is to inhibit the inner
calls to before/after (assuming we're sure they change only the "new
text"), so we'd be down to:

    ⛔ Warning (emacs): Before: 1278 1281
    ⛔ Warning (emacs): After: 1278 1288 delta: 7

> I think Org mode is relying on something it should not.  This
> particular use case aside, Emacs is allowed to call a function that
> changes the buffer from a function that itself changes the buffer, and
> it is allowed to call that inner function _before_ it did all the
> changes it intended to do.

AFAIK the above sequences breaks the promise we make about
`before-change-functions` and `after-change-functions`.

Almost all the non-trivial users of those hooks (i.e. basically those
that need to use both hooks) have extra sanity and raise the heads up in
despair when faced with things like the above (my `track-changes.el`
lacks such sanity checks, but that's because it's a PoC).


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30 14:11     ` Eli Zaretskii
  2024-03-30 15:38       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-30 14:11 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 65451

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Ihor Radchenko <yantar92@posteo.net>,  65451@debbugs.gnu.org, Alan
>  Mackenzie <acm@muc.de>
> Date: Sat, 30 Mar 2024 09:51:59 -0400
> 
> > I think Org mode is relying on something it should not.  This
> > particular use case aside, Emacs is allowed to call a function that
> > changes the buffer from a function that itself changes the buffer, and
> > it is allowed to call that inner function _before_ it did all the
> > changes it intended to do.
> 
> AFAIK the above sequences breaks the promise we make about
> `before-change-functions` and `after-change-functions`.
> 
> Almost all the non-trivial users of those hooks (i.e. basically those
> that need to use both hooks) have extra sanity and raise the heads up in
> despair when faced with things like the above (my `track-changes.el`
> lacks such sanity checks, but that's because it's a PoC).

I still stand by my opinion: Org is relying on something it cannot
rely upon, not as long as a function that changes a buffer can be
called from another function which changes the same buffer.  I don't
see how we can avoid breaking code which relies on such assumptions,
not in general anyway.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-30 14:11     ` Eli Zaretskii
@ 2024-03-30 15:38       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30 16:47         ` Eli Zaretskii
  0 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30 15:38 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 65451

> I still stand by my opinion: Org is relying on something it cannot
> rely upon, not as long as a function that changes a buffer can be
> called from another function which changes the same buffer.

`*-change-functions` should not modify the buffer (we could try and
enforce this, tho in my experience those that do will get punished
pretty quickly already), so the only cases I can think of where "a
function that changes a buffer can be called from another function which
changes the same buffer" is when both of those functions are in our
C code and we should have enough control to fix those cases.

> I don't see how we can avoid breaking code which relies on such
> assumptions, not in general anyway.

All sophisticated-enough users of `*-change-functions` rely on this
(and come with sanity checks to detect the problem and fallback on an
expensive recovery when needed, because indeed we tend to break our
promises 🙁).

So we really should try and fix those.
Alan did convince me that we should treat them as bugs and that we
should try and fix them.


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-30 15:38       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30 16:47         ` Eli Zaretskii
  2024-03-31  3:04           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-30 16:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 65451

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: yantar92@posteo.net,  65451@debbugs.gnu.org,  acm@muc.de
> Date: Sat, 30 Mar 2024 11:38:24 -0400
> 
> > I still stand by my opinion: Org is relying on something it cannot
> > rely upon, not as long as a function that changes a buffer can be
> > called from another function which changes the same buffer.
> 
> `*-change-functions` should not modify the buffer

That's not what happened in the case described in that bug, AFAIR.
Simply a buffer change started, then, before it ended, another nested
change was started.

> so the only cases I can think of where "a function that changes a
> buffer can be called from another function which changes the same
> buffer" is when both of those functions are in our C code and we
> should have enough control to fix those cases.

You forget the various hooks, other than buffer modification hooks.

> Alan did convince me that we should treat them as bugs and that we
> should try and fix them.

He didn't convince me.





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30 14:11     ` Eli Zaretskii
@ 2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-31  6:06       ` Eli Zaretskii
                         ` (2 more replies)
  1 sibling, 3 replies; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-31  3:02 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, Ihor Radchenko, 65451

> I suspect the best option in the above case is to inhibit the inner
> calls to before/after (assuming we're sure they change only the "new
> text"), so we'd be down to:
>
>     ⛔ Warning (emacs): Before: 1278 1281
>     ⛔ Warning (emacs): After: 1278 1288 delta: 7

A simpler option is the patch below.


        Stefan


diff --git a/src/search.c b/src/search.c
index 2ad565fadde..83ff7120e43 100644
--- a/src/search.c
+++ b/src/search.c
@@ -2769,6 +2769,8 @@ DEFUN ("replace-match", Freplace_match, Sreplace_match, 1, 5, 0,
   /* Replace the old text with the new in the cleanest possible way.  */
   replace_range (sub_start, sub_end, newtext, 1, 0, 1, true, true);
 
+  signal_after_change (sub_start, sub_end - sub_start, SCHARS (newtext));
+
   if (case_action == all_caps)
     Fupcase_region (make_fixnum (search_regs.start[sub]),
 		    make_fixnum (newpoint),
@@ -2792,7 +2794,6 @@ DEFUN ("replace-match", Freplace_match, Sreplace_match, 1, 5, 0,
   /* Now move point "officially" to the end of the inserted replacement.  */
   move_if_not_intangible (newpoint);
 
-  signal_after_change (sub_start, sub_end - sub_start, SCHARS (newtext));
   update_compositions (sub_start, newpoint, CHECK_BORDER);
 
   return Qnil;






^ permalink raw reply related	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-30 16:47         ` Eli Zaretskii
@ 2024-03-31  3:04           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-31  3:04 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 65451

>> `*-change-functions` should not modify the buffer
> That's not what happened in the case described in that bug, AFAIR.

No, indeed.

>> so the only cases I can think of where "a function that changes a
>> buffer can be called from another function which changes the same
>> buffer" is when both of those functions are in our C code and we
>> should have enough control to fix those cases.
> You forget the various hooks, other than buffer modification hooks.

If we have to run them some time between `before-c-f` and `after-c-f`,
then they should not modify the buffer, just like the
`*-change-functions` hooks, 

>> Alan did convince me that we should treat them as bugs and that we
>> should try and fix them.
> He didn't convince me.

🙂


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-31  6:06       ` Eli Zaretskii
  2024-03-31 13:57         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 34+ messages in thread
From: Eli Zaretskii @ 2024-03-31  6:06 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 65451

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: Ihor Radchenko <yantar92@posteo.net>,  65451@debbugs.gnu.org,  Alan
>  Mackenzie <acm@muc.de>
> Date: Sat, 30 Mar 2024 23:02:14 -0400
> 
> > I suspect the best option in the above case is to inhibit the inner
> > calls to before/after (assuming we're sure they change only the "new
> > text"), so we'd be down to:
> >
> >     ⛔ Warning (emacs): Before: 1278 1281
> >     ⛔ Warning (emacs): After: 1278 1288 delta: 7
> 
> A simpler option is the patch below.

Doesn't that miss the changes done by upcase-region?

Also, what about point not being after the inserted replacement at
that place?





^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-31  6:06       ` Eli Zaretskii
@ 2024-03-31 13:57         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-31 13:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 65451

>> > I suspect the best option in the above case is to inhibit the inner
>> > calls to before/after (assuming we're sure they change only the "new
>> > text"), so we'd be down to:
>> >
>> >     ⛔ Warning (emacs): Before: 1278 1281
>> >     ⛔ Warning (emacs): After: 1278 1288 delta: 7
>> 
>> A simpler option is the patch below.
>
> Doesn't that miss the changes done by upcase-region?

No: `upcase-region` runs its own `before/after-change-functions`
(indeed, these were the problematic nested ones which break the order).

> Also, what about point not being after the inserted replacement at
> that place?

`*-change-functions` can't rely on the position of point so that's not
an issue.


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-31  6:06       ` Eli Zaretskii
@ 2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-08 19:10         ` Ihor Radchenko
  2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-07 18:19 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: Alan Mackenzie, Ihor Radchenko, 65451

> A simpler option is the patch below.

Pushed to `master` along with a corresponding test.
The test uses a "sanity check" suite which encodes the promises we make
after before/after-change-functions.
[ My Emacs now runs those sanity checks in all non-temp buffers. 🙂  ]


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-31  6:06       ` Eli Zaretskii
  2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 0 replies; 34+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-07 18:19 UTC (permalink / raw)
  To: 65451-done

Closing,


        Stefan






^ permalink raw reply	[flat|nested] 34+ messages in thread

* bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made
  2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-08 19:10         ` Ihor Radchenko
  0 siblings, 0 replies; 34+ messages in thread
From: Ihor Radchenko @ 2024-04-08 19:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, Eli Zaretskii, 65451

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> A simpler option is the patch below.
>
> Pushed to `master` along with a corresponding test.
> The test uses a "sanity check" suite which encodes the promises we make
> after before/after-change-functions.

I confirm the fix on the latest master.
Thanks!

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 34+ messages in thread

end of thread, other threads:[~2024-04-08 19:10 UTC | newest]

Thread overview: 34+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-08-22  9:30 bug#65451: 30.0.50; `after-change-functions' are not triggered in the same order the changes are made Ihor Radchenko
2023-08-22 12:22 ` Eli Zaretskii
2023-08-22 12:42   ` Ihor Radchenko
2023-08-22 12:58     ` Eli Zaretskii
2023-08-22 13:41       ` Ihor Radchenko
2023-08-22 16:02         ` Eli Zaretskii
2023-08-23  8:52           ` Ihor Radchenko
2023-08-23 17:58             ` Eli Zaretskii
2023-08-24  7:46               ` Ihor Radchenko
2023-08-24  8:08                 ` Eli Zaretskii
2023-08-24 11:24                   ` Ihor Radchenko
2023-08-24 12:08                     ` Eli Zaretskii
2023-08-24 13:27                       ` Ihor Radchenko
2023-08-24 14:53                         ` Eli Zaretskii
2023-08-25  6:37                           ` Eli Zaretskii
2023-08-25  9:09                             ` Ihor Radchenko
2023-08-26  7:10                               ` Eli Zaretskii
2023-08-27  8:13                                 ` Ihor Radchenko
2023-08-27  8:29                                   ` Eli Zaretskii
2023-08-29  7:39                                     ` Ihor Radchenko
2023-08-25  8:09                           ` Ihor Radchenko
2023-08-25 10:25                             ` Eli Zaretskii
2023-08-25 10:49                               ` Ihor Radchenko
2024-03-30 13:51   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30 14:11     ` Eli Zaretskii
2024-03-30 15:38       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30 16:47         ` Eli Zaretskii
2024-03-31  3:04           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-31  3:02     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-31  6:06       ` Eli Zaretskii
2024-03-31 13:57         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-08 19:10         ` Ihor Radchenko
2024-04-07 18:19       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).