bug#70077: An easier way to track buffer changes

unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed

* bug#70077: An easier way to track buffer changes
@ 2024-03-29 16:15 Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-29 18:12 ` Eli Zaretskii
                   ` (3 more replies)
  0 siblings, 4 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-29 16:15 UTC (permalink / raw)
  To: 70077
  Cc: Nicolas Goaziou, Ihor Radchenko, Alan Mackenzie,
	João Távora, Alan Zimmerman, Frédéric Bour,
	Phillip Lord, Stephen Leake, Yuan Fu, Qiantan Hong, monnier

[-- Attachment #1: Type: text/plain, Size: 2911 bytes --]

Tags: patch

Our `*-change-functions` hook are fairly tricky to use right.
Some of the issues are:

- before and after calls are not necessarily paired.
- the beg/end values don't always match.
- there can be thousands of calls from within a single command.
- these hooks are run at a fairly low-level so there are things they
  really shouldn't do, such as modify the buffer or wait.
- the after call doesn't get enough info to rebuild the before-change state,
  so some callers need to use both before-c-f and after-c-f (and then
  deal with the first two points above).

The worst part is that those problems occur rarely, so many coders don't
see it at first and have to learn them the hard way, sometimes forcing
them to rethink their original design.

So I think we should provide something simpler.
I attached a proof-of-concept API which aims to do that, with the
following entry points:

    (defun track-changes-register ( signal)
      "Register a new tracker and return a new tracker ID.
    SIGNAL is a function that will be called with no argument when
    the current buffer is modified, so that we can react to the change.
    Once called, SIGNAL is not called again until `track-changes-fetch'
    is called with the corresponding tracker ID."
    
    (defun track-changes-unregister (id)
      "Remove the tracker denoted by ID.
    Trackers can consume resources (especially if `track-changes-fetch' is
    not called), so it is good practice to unregister them when you don't
    need them any more."
    
    (defun track-changes-fetch (id func)
      "Fetch the pending changes.
    ID is the tracker ID returned by a previous `track-changes-register'.
    FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
    where BEGIN..END delimit the region that was changed since the last
    time `track-changes-fetch' was called and BEFORE is a string containing
    the previous content of that region.
    
    If no changes occurred since the last time, FUNC is not called and
    we return nil, otherwise we return the value returned by FUNC,
    and re-enable the TRACKER corresponding to ID."

It's not meant as a replacement of the existing hooks since it doesn't
try to accommodate some uses such as those that use before-c-f to
implement a finer-grained form of read-only text.

The driving design was:

- Try to provide enough info such that it is possible and easy to
  maintain a copy of the buffer simply by applying the reported changes.
  E.g. for uses such as `eglot.el` or `crdt.el`.
- Make the API less synchronous: take care of combining small changes
  into larger ones, and let the clients decide when they react to changes.

If you're in the Cc, it's because I believe you have valuable experience
with those hooks, so I'd be happy to hear your thought about whether
you think this would indeed (have) be(en) better than what we have.


        Stefan



[-- Attachment #2: track-changes.el --]
[-- Type: text/x-emacs-lisp, Size: 13002 bytes --]

;;; track-changes.el --- API to react to buffer modifications  -*- lexical-binding: t; -*-

;; Copyright (C) 2024  Free Software Foundation, Inc.

;; Author: Stefan Monnier <monnier@iro.umontreal.ca>

;; This file is part of GNU Emacs.

;; GNU Emacs is free software: you can redistribute it and/or modify
;; it under the terms of the GNU General Public License as published by
;; the Free Software Foundation, either version 3 of the License, or
;; (at your option) any later version.

;; GNU Emacs is distributed in the hope that it will be useful,
;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
;; GNU General Public License for more details.

;; You should have received a copy of the GNU General Public License
;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.

;;; Commentary:

;; This library is a layer of abstraction above `before-change-functions'
;; and `after-change-functions' which takes care of accumulating changes
;; until a time when its client finds it convenient to react to them.

;; It provides the following operations:
;;
;;     (track-changes-register SIGNAL)
;;     (track-changes-fetch ID FUNC)
;;     (track-changes-unregister ID)
;;
;; A typical use case might look like:
;;
;;     (defvar my-foo--change-tracker nil)
;;     (define-minor-mode my-foo-mode
;;       "Fooing like there's no tomorrow."
;;       (if (null my-foo-mode)
;;           (when my-foo--change-tracker
;;             (track-changes-unregister my-foo--change-tracker)
;;             (setq my-foo--change-tracker nil))
;;         (unless my-foo--change-tracker
;;           (setq my-foo--change-tracker
;;                 (track-changes-register
;;                  (lambda ()
;;                    (track-changes-fetch
;;                     my-foo--change-tracker
;;                     (lambda (beg end before)
;;                       ..DO THE THING..))))))))

;;; Code:

;; FIXME: Try and do some sanity-checks (e.g. looking at `buffer-size'),
;; to detect if/when we somehow missed some changes.
;; FIXME: The API doesn't offer an easy way to signal a "full resync"
;; kind of change, as might be needed if we lost changes.

(require 'cl-lib)

(cl-defstruct (track-changes--tracker
               (:noinline t)
               (:constructor nil)
               (:constructor track-changes--tracker ( signal state)))
  ( signal nil :read-only t)
  state)

(cl-defstruct (track-changes--state
               (:noinline t)
               (:constructor nil)
               (:constructor track-changes--state ()))
  (beg (point-max))
  (end (point-min))
  (bbeg (point-max))                    ;BEG of the BEFORE string,
  (bend (point-min))                    ;END of the BEFORE string.
  (before nil)
  (next nil))

(defvar-local track-changes--trackers ())
(defvar-local track-changes--clean-trackers ())

(defvar-local track-changes--state nil)

(defun track-changes-register ( signal)
  "Register a new tracker and return a new tracker ID.
SIGNAL is a function that will be called with no argument when
the current buffer is modified, so that we can react to the change.
Once called, SIGNAL is not called again until `track-changes-fetch'
is called with the corresponding tracker ID."
  ;; FIXME: Add an optional arg to choose between `funcall' and `funcall-later'?
  (track-changes--clean-state)
  (add-hook 'before-change-functions #'track-changes--before nil t)
  (add-hook 'after-change-functions  #'track-changes--after  nil t)
  (let ((tracker (track-changes--tracker signal track-changes--state)))
    (push tracker track-changes--trackers)
    (push tracker track-changes--clean-trackers)
    tracker))

(defun track-changes-unregister (id)
  "Remove the tracker denoted by ID.
Trackers can consume resources (especially if `track-changes-fetch' is
not called), so it is good practice to unregister them when you don't
need them any more."
  (unless (memq id track-changes--trackers)
    (error "Unregistering a non-registered tracker: %S" id))
  (setq track-changes--trackers (delq id track-changes--trackers))
  (setq track-changes--clean-trackers (delq id track-changes--clean-trackers))
  (when (null track-changes--trackers)
    (setq track-changes--state nil)
    (remove-hook 'before-change-functions #'track-changes--before t)
    (remove-hook 'after-change-functions  #'track-changes--after  t)))


(defun track-changes--clean-p ()
  (null (track-changes--state-before track-changes--state)))

(defun track-changes--clean-state ()
  (cond
   ((null track-changes--state)
    ;; No state has been created yet.  Do it now.
    (setq track-changes--state (track-changes--state)))
   ((track-changes--clean-p) nil)
   (t
    ;; FIXME: We may be in-between a before-c-f and an after-c-f, so we
    ;; should save some of the current buffer in case an after-c-f comes
    ;; before a before-c-f.
    (let ((new (track-changes--state)))
      (setf (track-changes--state-next track-changes--state) new)
      (setq track-changes--state new)))))

(defun track-changes--before (beg end)
  (cl-assert track-changes--state)
  (cl-assert (<= beg end))
  (if (track-changes--clean-p)
      (progn
        (setf (track-changes--state-before track-changes--state)
              (buffer-substring-no-properties beg end))
        (setf (track-changes--state-bbeg track-changes--state) beg)
        (setf (track-changes--state-bend track-changes--state) end))
    (cl-assert
     (save-restriction
       (widen)
       (<= (point-min)
           (track-changes--state-bbeg track-changes--state)
           (track-changes--state-bend track-changes--state)
           (point-max))))
    (when (< beg (track-changes--state-bbeg track-changes--state))
      (let* ((old-bbeg (track-changes--state-bbeg track-changes--state))
             ;; To avoid O(N²) behavior when faced with many small changes,
             ;; we copy more than needed.
             (new-bbeg (min (max (point-min)
                                 (- old-bbeg
                                    (length (track-changes--state-before
                                             track-changes--state))))
                            beg)))
        (setf (track-changes--state-bbeg track-changes--state) beg)
        (cl-callf (lambda (old new) (concat new old))
            (track-changes--state-before track-changes--state)
          (buffer-substring-no-properties new-bbeg old-bbeg))))

    (when (< (track-changes--state-bend track-changes--state) end)
      (let* ((old-bend (track-changes--state-bend track-changes--state))
             ;; To avoid O(N²) behavior when faced with many small changes,
             ;; we copy more than needed.
             (new-bend (max (min (point-max)
                                 (+ old-bend
                                    (length (track-changes--state-before
                                             track-changes--state))))
                            end)))
        (setf (track-changes--state-bend track-changes--state) end)
        (cl-callf concat (track-changes--state-before track-changes--state)
          (buffer-substring-no-properties old-bend new-bend))))))

(defun track-changes--after (beg end len)
  (cl-assert track-changes--state)
  (cl-assert (track-changes--state-before track-changes--state))
  (let ((offset (- (- end beg) len)))
    (cl-incf (track-changes--state-bend track-changes--state) offset)
    (cl-assert
     (save-restriction
       (widen)
       (<= (point-min)
           (track-changes--state-bbeg track-changes--state)
           beg end
           (track-changes--state-bend track-changes--state)
           (point-max))))
    ;; Note the new changes.
    (when (< beg (track-changes--state-beg track-changes--state))
      (setf (track-changes--state-beg track-changes--state) beg))
    (cl-callf (lambda (old-end) (max end (+ old-end offset)))
        (track-changes--state-end track-changes--state)))
  (cl-assert (<= (track-changes--state-bbeg track-changes--state)
                 (track-changes--state-beg track-changes--state)
                 beg end
                 (track-changes--state-end track-changes--state)
                 (track-changes--state-bend track-changes--state)))
  (while track-changes--clean-trackers
    (let ((tracker (pop track-changes--clean-trackers)))
      ;; FIXME: Use `funcall'?
      (funcall-later (track-changes--tracker-signal tracker) ()))))

(defun track-changes-fetch (id func)
  "Fetch the pending changes.
ID is the tracker ID returned by a previous `track-changes-register'.
FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
where BEGIN..END delimit the region that was changed since the last
time `track-changes-fetch' was called and BEFORE is a string containing
the previous content of that region.

If no changes occurred since the last time, FUNC is not called and
we return nil, otherwise we return the value returned by FUNC,
and re-enable the TRACKER corresponding to ID."
  (let ((beg nil)
        (end nil)
        (before nil)
        (states ()))
    ;; We want to combine the states from most recent to oldest,
    ;; so reverse them.
    (let ((state (track-changes--tracker-state id)))
      (while state
        (push state states)
        (setq state (track-changes--state-next state))))
    (when (null (track-changes--state-before (car states)))
      (cl-assert (eq (car states) track-changes--state))
      (setq states (cdr states)))
    (if (null states)
        (progn
          (cl-assert (memq id track-changes--clean-trackers))
          nil)
      (dolist (state states)
        (let ((prevbbeg (track-changes--state-bbeg state))
              (prevbend (track-changes--state-bend state))
              (prevbefore (track-changes--state-before state)))
          (if (not before)
              (progn
                ;; This is the most recent change.  Just initialize the vars.
                (setq beg (track-changes--state-beg state))
                (setq end (track-changes--state-end state))
                (setq before prevbefore)
                (unless (and (= beg prevbbeg) (= end prevbend))
                  (setq before
                        (substring before
                                   (- beg (track-changes--state-bbeg state))
                                   (- (length before)
                                      (- (track-changes--state-bend state)
                                         end))))))
            ;; FIXME: When merging "states", we disregard the `beg/end'
            ;; in favor of `bbeg/bend' which also works but is conservative.
            (let ((endb (+ beg (length before))))
              (when (< prevbbeg beg)
                (setq before (concat (buffer-substring-no-properties
                                      prevbbeg beg)
                                     before))
                (setq beg prevbbeg)
                (cl-assert (= endb (+ beg (length before)))))
              (when (< endb prevbend)
                (let ((new-end (+ end (- prevbend endb))))
                  (setq before (concat before
                                       (buffer-substring-no-properties
                                        end new-end)))
                  (setq end new-end)
                  (cl-assert (= prevbend (+ beg (length before))))
                  (setq endb (+ beg (length before)))))
              (cl-assert (<= beg prevbbeg prevbend endb))
              ;; The `prevbefore' is covered by the new one.
              (setq before
                    (concat (substring before 0 (- prevbbeg beg))
                            prevbefore
                            (substring before (- (length before)
                                                 (- endb prevbend)))))))))
      (cl-assert (<= (point-min) beg end (point-max)))
      ;; Clean the state of the tracker before calling `func', in case
      ;; `func' performs buffer modifications.
      (track-changes--clean-state)
      ;; Update the tracker's state before running `func' so we don't risk
      ;; mistakenly replaying the changes in case `func' exits non-locally.
      (setf (track-changes--tracker-state id) track-changes--state)
      (unwind-protect (funcall func beg end before)
        ;; Re-enable the tracker's signal only after running `func', so
        ;; as to avoid recursive invocations.
        (cl-pushnew id track-changes--clean-trackers)))))

(defmacro with-track-changes (id vars &rest body)
  (declare (indent 2) (debug (form sexp body)))
  `(track-changes-fetch ,id (lambda ,vars ,@body)))
                        

(provide 'track-changes)
;;; track-changes.el end here.

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 16:15 bug#70077: An easier way to track buffer changes Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-29 18:12 ` Eli Zaretskii
  2024-03-29 18:53   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30  3:17   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-29 22:20 ` phillip.lord
                   ` (2 subsequent siblings)
  3 siblings, 2 replies; 50+ messages in thread
From: Eli Zaretskii @ 2024-03-29 18:12 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: casouri, 70077, yantar92, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, monnier, phillip.lord

> Cc: Nicolas Goaziou <mail@nicolasgoaziou.fr>,
>  Ihor Radchenko <yantar92@gmail.com>, Alan Mackenzie <acm@muc.de>,
>  João Távora <joaotavora@gmail.com>,
>  Alan Zimmerman <alan.zimm@gmail.com>,
>  Frédéric Bour <frederic.bour@lakaban.net>,
>  Phillip Lord <phillip.lord@russet.org.uk>,
>  Stephen Leake <stephen_leake@stephe-leake.org>, Yuan Fu <casouri@gmail.com>,
>  Qiantan Hong <qhong@alum.mit.edu>, monnier@iro.umontreal.ca
> Date: Fri, 29 Mar 2024 12:15:53 -0400
> From:  Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
>     (defun track-changes-register ( signal)
>       "Register a new tracker and return a new tracker ID.
>     SIGNAL is a function that will be called with no argument when
>     the current buffer is modified, so that we can react to the change.
>     Once called, SIGNAL is not called again until `track-changes-fetch'
>     is called with the corresponding tracker ID."
>     
>     (defun track-changes-unregister (id)
>       "Remove the tracker denoted by ID.
>     Trackers can consume resources (especially if `track-changes-fetch' is
>     not called), so it is good practice to unregister them when you don't
>     need them any more."
>     
>     (defun track-changes-fetch (id func)
>       "Fetch the pending changes.
>     ID is the tracker ID returned by a previous `track-changes-register'.
>     FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
>     where BEGIN..END delimit the region that was changed since the last
>     time `track-changes-fetch' was called and BEFORE is a string containing
>     the previous content of that region.
>     
>     If no changes occurred since the last time, FUNC is not called and
>     we return nil, otherwise we return the value returned by FUNC,
>     and re-enable the TRACKER corresponding to ID."

I cannot imagine how applications would use these APIs.  I'm probably
missing something, org the above documentation does.  Can you show
some real-life examples?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 18:12 ` Eli Zaretskii
@ 2024-03-29 18:53   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30  6:34     ` Eli Zaretskii
  2024-03-30  3:17   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-29 18:53 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: yantar92, 70077, casouri, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

> I cannot imagine how applications would use these APIs.  I'm probably
> missing something, org the above documentation does.  Can you show
> some real-life examples?

Haven't written real code for it yes, no.
The best I can offer is the sample code in the file:

    (defvar my-foo--change-tracker nil)
    (define-minor-mode my-foo-mode
      "Fooing like there's no tomorrow."
      (if (null my-foo-mode)
          (when my-foo--change-tracker
            (track-changes-unregister my-foo--change-tracker)
            (setq my-foo--change-tracker nil))
        (unless my-foo--change-tracker
          (setq my-foo--change-tracker
                (track-changes-register
                 (lambda ()
                   (track-changes-fetch
                    my-foo--change-tracker
                    (lambda (beg end before)
                      ..DO THE THING..))))))))

Where "DO THE THING" is run similarly to what would happen in an
`after-change-functions`, except:

- BEFORE is a string holding the content of what was in BEG..END
  instead of being limited to its length.
- It's run at most once per command, so there's no performance worries.
- It's run "outside" of the modifications themselves,
  so `inhibit-modification-hooks` is nil and the code can wait, modify
  the buffer, or do any kind of crazy things.

The code can also be changed to:

                [...]
                (track-changes-register
                 (lambda ()
                   (run-with-idle-timer
                    2 nil
                    (lambda ()
                      (track-changes-fetch
                       my-foo--change-tracker
                       (lambda (beg end before)
                         ..DO THE THING..))))))))

without any extra work.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 16:15 bug#70077: An easier way to track buffer changes Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-29 18:12 ` Eli Zaretskii
@ 2024-03-29 22:20 ` phillip.lord
  2024-03-29 22:59   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30  9:51 ` Ihor Radchenko
  2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  3 siblings, 1 reply; 50+ messages in thread
From: phillip.lord @ 2024-03-29 22:20 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Yuan Fu, 70077, Ihor Radchenko, Qiantan Hong,
	Frédéric Bour, João Távora, Nicolas Goaziou,
	Alan Mackenzie, Stephen Leake, Alan Zimmerman

On 2024-03-29 12:15, Stefan Monnier wrote:
> Tags: patch
> 
> Our `*-change-functions` hook are fairly tricky to use right.
> Some of the issues are:
> 
> - before and after calls are not necessarily paired.
> - the beg/end values don't always match.
> - there can be thousands of calls from within a single command.
> - these hooks are run at a fairly low-level so there are things they
>   really shouldn't do, such as modify the buffer or wait.
> - the after call doesn't get enough info to rebuild the before-change 
> state,
>   so some callers need to use both before-c-f and after-c-f (and then
>   deal with the first two points above).
> 
> The worst part is that those problems occur rarely, so many coders 
> don't
> see it at first and have to learn them the hard way, sometimes forcing
> them to rethink their original design.
> 
> So I think we should provide something simpler.
> I attached a proof-of-concept API which aims to do that, with the
> following entry points:
> 
>     (defun track-changes-register ( signal)
>       "Register a new tracker and return a new tracker ID.
>     SIGNAL is a function that will be called with no argument when
>     the current buffer is modified, so that we can react to the change.
>     Once called, SIGNAL is not called again until `track-changes-fetch'
>     is called with the corresponding tracker ID."
> 
>     (defun track-changes-unregister (id)
>       "Remove the tracker denoted by ID.
>     Trackers can consume resources (especially if `track-changes-fetch' 
> is
>     not called), so it is good practice to unregister them when you 
> don't
>     need them any more."
> 
>     (defun track-changes-fetch (id func)
>       "Fetch the pending changes.
>     ID is the tracker ID returned by a previous 
> `track-changes-register'.
>     FUNC is a function.  It is called with 3 arguments (BEGIN END 
> BEFORE)
>     where BEGIN..END delimit the region that was changed since the last
>     time `track-changes-fetch' was called and BEFORE is a string 
> containing
>     the previous content of that region.
> 
>     If no changes occurred since the last time, FUNC is not called and
>     we return nil, otherwise we return the value returned by FUNC,
>     and re-enable the TRACKER corresponding to ID."
> 
> It's not meant as a replacement of the existing hooks since it doesn't
> try to accommodate some uses such as those that use before-c-f to
> implement a finer-grained form of read-only text.
> 
> The driving design was:
> 
> - Try to provide enough info such that it is possible and easy to
>   maintain a copy of the buffer simply by applying the reported 
> changes.
>   E.g. for uses such as `eglot.el` or `crdt.el`.
> - Make the API less synchronous: take care of combining small changes
>   into larger ones, and let the clients decide when they react to 
> changes.
> 
> If you're in the Cc, it's because I believe you have valuable 
> experience
> with those hooks, so I'd be happy to hear your thought about whether
> you think this would indeed (have) be(en) better than what we have.


Your description of the problem is entirely consistent with my 
experience. The last time I
checked it was `subst-char-in-region' which was causing most of the 
difficulties, normally as a result of `fill-paragraph'.

If I remember correctly, I think this wouldn't be enough for my use. You 
keep two buffers
in sync, you have to use before-change-function -- it is only before any 
change that the
two buffers are guaranteed to be in sync and it is this that allows you 
to work out what the
`start' and `end' positions mean in the copied buffer. Afterward, you 
cannot work out what the end position because you don't know if the 
change is a change, insertion, deletion or both.

Last time I checked, I did find relatively few primitives that were 
guilty of being inconsistent -- in the case of `subst-char-in-region', 
it returned the maximal area of effect before the and the minimal area 
of effect after. Would it not be easier to fix these?

Phil






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 22:20 ` phillip.lord
@ 2024-03-29 22:59   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30  6:46     ` Eli Zaretskii
  2024-03-30 12:06     ` phillip.lord
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-29 22:59 UTC (permalink / raw)
  To: phillip.lord
  Cc: Ihor Radchenko, 70077, Yuan Fu, Qiantan Hong,
	Frédéric Bour, João Távora, Nicolas Goaziou,
	Alan Mackenzie, Stephen Leake, Alan Zimmerman

> If I remember correctly, I think this wouldn't be enough for my
> use. You keep two buffers in sync, you have to use
> before-change-function -- it is only before any change that the two
> buffers are guaranteed to be in sync and it is this that allows you to
> work out what the `start' and `end' positions mean in the copied
> buffer. Afterward, you cannot work out what the end position because
> you don't know if the change is a change, insertion, deletion or both.

I believe the API I propose does provide that information: you can
recover the state of the buffer before the change (or more specifically,
the state of the buffer as of the last time you called
track-changes-fetch) from the BEG/END/BEFORE arguments as follows:

    (concat (buffer-substring (point-min) beg)
            before
            (buffer-substring end (point-max)))

I don't mean to suggest to do that, since it's costly for large
buffers, but to illustrate that the information is properly preserved.

> Last time I checked, I did find relatively few primitives that were guilty
> of being inconsistent -- in the case of `subst-char-in-region', it returned
> the maximal area of effect before the and the minimal area of effect
> after. Would it not be easier to fix these?

[ IIRC `revert-buffer` has a similar behavior, and in that case the
  difference can be large since the "before" covers the whole buffer.  ]

Also, it would fix only the problem of pairing, and not the other ones.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 18:12 ` Eli Zaretskii
  2024-03-29 18:53   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30  3:17   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30  5:09     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30  3:17 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: casouri, 70077, yantar92, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

[-- Attachment #1: Type: text/plain, Size: 323 bytes --]

> I cannot imagine how applications would use these APIs.  I'm probably
> missing something, org the above documentation does.  Can you show
> some real-life examples?

Here's my first attempt at a real-life use.
Note: this example doesn't make use of the full API (it doesn't need the
`before` argument).


        Stefan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: diff-track-changes.patch --]
[-- Type: text/x-diff, Size: 4845 bytes --]

diff --git a/lisp/vc/diff-mode.el b/lisp/vc/diff-mode.el
index d9146ea8cc5..f8c31fb6748 100644
--- a/lisp/vc/diff-mode.el
+++ b/lisp/vc/diff-mode.el
@@ -53,9 +53,10 @@
 ;; - Handle `diff -b' output in context->unified.
 
 ;;; Code:
+(require 'easy-mmode)
+(require 'track-changes)
 (eval-when-compile (require 'cl-lib))
 (eval-when-compile (require 'subr-x))
-(require 'easy-mmode)
 
 (autoload 'vc-find-revision "vc")
 (autoload 'vc-find-revision-no-save "vc")
@@ -1441,31 +1441,16 @@
   (if (buffer-modified-p) (diff-fixup-modifs (point-min) (point-max)))
   nil)
 
-;; It turns out that making changes in the buffer from within an
-;; *-change-function is asking for trouble, whereas making them
-;; from a post-command-hook doesn't pose much problems
-(defvar diff-unhandled-changes nil)
-(defun diff-after-change-function (beg end _len)
-  "Remember to fixup the hunk header.
-See `after-change-functions' for the meaning of BEG, END and LEN."
-  ;; Ignoring changes when inhibit-read-only is set is strictly speaking
-  ;; incorrect, but it turns out that inhibit-read-only is normally not set
-  ;; inside editing commands, while it tends to be set when the buffer gets
-  ;; updated by an async process or by a conversion function, both of which
-  ;; would rather not be uselessly slowed down by this hook.
-  (when (and (not undo-in-progress) (not inhibit-read-only))
-    (if diff-unhandled-changes
-	(setq diff-unhandled-changes
-	      (cons (min beg (car diff-unhandled-changes))
-		    (max end (cdr diff-unhandled-changes))))
-      (setq diff-unhandled-changes (cons beg end)))))
+(defvar-local diff--track-changes nil)
 
-(defun diff-post-command-hook ()
-  "Fixup hunk headers if necessary."
-  (when (consp diff-unhandled-changes)
-    (ignore-errors
+(defun diff--track-changes-signal (tracker)
+  (cl-assert (eq tracker diff--track-changes))
+  (track-changes-fetch tracker #'diff--track-changes-function))
+
+(defun diff--track-changes-function (beg end _before)
+  (with-demoted-errors "%S"
       (save-excursion
-	(goto-char (car diff-unhandled-changes))
+      (goto-char beg)
 	;; Maybe we've cut the end of the hunk before point.
 	(if (and (bolp) (not (bobp))) (backward-char 1))
 	;; We used to fixup modifs on all the changes, but it turns out that
@@ -1480,17 +1465,16 @@
                           (re-search-forward diff-context-mid-hunk-header-re
                                              nil t)))))
           (when (and ;; Don't try to fixup changes in the hunk header.
-                 (>= (car diff-unhandled-changes) start)
+               (>= beg start)
                  ;; Don't try to fixup changes in the mid-hunk header either.
                  (or (not mid)
-                     (< (cdr diff-unhandled-changes) (match-beginning 0))
-                     (> (car diff-unhandled-changes) (match-end 0)))
+                   (< end (match-beginning 0))
+                   (> beg (match-end 0)))
                  (save-excursion
 		(diff-end-of-hunk nil 'donttrustheader)
                    ;; Don't try to fixup changes past the end of the hunk.
-                   (>= (point) (cdr diff-unhandled-changes))))
-	  (diff-fixup-modifs (point) (cdr diff-unhandled-changes)))))
-      (setq diff-unhandled-changes nil))))
+                 (>= (point) end)))
+	 (diff-fixup-modifs (point) end))))))
 
 (defun diff-next-error (arg reset)
   ;; Select a window that displays the current buffer so that point
@@ -1572,9 +1557,8 @@ diff-mode
   ;; setup change hooks
   (if (not diff-update-on-the-fly)
       (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
-    (make-local-variable 'diff-unhandled-changes)
-    (add-hook 'after-change-functions #'diff-after-change-function nil t)
-    (add-hook 'post-command-hook #'diff-post-command-hook nil t))
+    (setq diff--track-changes
+          (track-changes-register #'diff--track-changes-signal)))
 
   ;; add-log support
   (setq-local add-log-current-defun-function #'diff-current-defun)
@@ -1593,12 +1577,13 @@ diff-minor-mode
 \\{diff-minor-mode-map}"
   :group 'diff-mode :lighter " Diff"
   ;; FIXME: setup font-lock
-  ;; setup change hooks
+  (when diff--track-changes (track-changes-unregister diff--track-changes))
+  (remove-hook 'write-contents-functions #'diff-write-contents-hooks t)
   (if (not diff-update-on-the-fly)
       (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
-    (make-local-variable 'diff-unhandled-changes)
-    (add-hook 'after-change-functions #'diff-after-change-function nil t)
-    (add-hook 'post-command-hook #'diff-post-command-hook nil t)))
+    (unless diff--track-changes
+     (setq diff--track-changes
+           (track-changes-register #'diff--track-changes-signal)))))
 
 ;;; Handy hook functions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30  3:17   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30  5:09     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30  5:09 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: casouri, 70077, yantar92, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

[-- Attachment #1: Type: text/plain, Size: 833 bytes --]

> Here's my first attempt at a real-life use.

And here's a second attempt, which is a tentative patch for `eglot.el`.
This one does make use of the `before` argument, so it exercises more
the API.

The `eglot--virtual-pos-to-lsp-position` is not completely satisfactory,
since to compute the LSP position of the end of the chunk before it was
modified, I end up creating a temp buffer to insert the part of the text
that changed (to count its line+column, which is much easier in a buffer
than in a string).  That kinda sucks performancewise, but we do it at
most once per command rather than once per buffer-modification, so it
should be lost in the noise.

The upside is that we're insulated from the quirks of the
after/before-change-functions evidenced by the copious comments
referring to various bug reports.


        Stefan

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: eglot.patch --]
[-- Type: text/x-diff, Size: 7993 bytes --]

diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index 7d2f1a55165..d2268cea940 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -110,6 +110,7 @@
 (require 'text-property-search nil t)
 (require 'diff-mode)
 (require 'diff)
+(require 'track-changes)
 
 ;; These dependencies are also GNU ELPA core packages.  Because of
 ;; bug#62576, since there is a risk that M-x package-install, despite
@@ -1732,6 +1733,9 @@ eglot-utf-16-linepos
   "Calculate number of UTF-16 code units from position given by LBP.
 LBP defaults to `eglot--bol'."
   (/ (- (length (encode-coding-region (or lbp (eglot--bol))
+                                      ;; FIXME: How could `point' ever be
+                                      ;; larger than `point-max' (sounds like
+                                      ;; a bug in Emacs).
                                       ;; Fix github#860
                                       (min (point) (point-max)) 'utf-16 t))
         2)
@@ -1749,6 +1753,24 @@ eglot--pos-to-lsp-position
          :character (progn (when pos (goto-char pos))
                            (funcall eglot-current-linepos-function)))))
 
+(defun eglot--virtual-pos-to-lsp-position (pos string)
+  "Return the LSP position at the end of STRING if it were inserted at POS."
+  (eglot--widening
+   (goto-char pos)
+   (forward-char 0)
+   ;; LSP line is zero-origin; Emacs is one-origin.
+   (let ((posline (1- (line-number-at-pos nil t)))
+         (linebeg (buffer-substring (point) pos))
+         (colfun eglot-current-linepos-function))
+     ;; Use a temp buffer because:
+     ;; - I don't know of a fast way to count newlines in a string.
+     ;; - We currently don't have `eglot-current-linepos-function' for strings.
+     (with-temp-buffer
+       (insert linebeg string)
+       (goto-char (point-max))
+       (list :line (+ posline (1- (line-number-at-pos nil t)))
+             :character (funcall colfun))))))
+
 (defvar eglot-move-to-linepos-function #'eglot-move-to-utf-16-linepos
   "Function to move to a position within a line reported by the LSP server.
 
@@ -1946,6 +1968,8 @@ eglot-managed-mode-hook
   "A hook run by Eglot after it started/stopped managing a buffer.
 Use `eglot-managed-p' to determine if current buffer is managed.")
 
+(defvar-local eglot--track-changes nil)
+
 (define-minor-mode eglot--managed-mode
   "Mode for source buffers managed by some Eglot project."
   :init-value nil :lighter nil :keymap eglot-mode-map
@@ -1959,8 +1983,9 @@ eglot--managed-mode
       ("utf-8"
        (eglot--setq-saving eglot-current-linepos-function #'eglot-utf-8-linepos)
        (eglot--setq-saving eglot-move-to-linepos-function #'eglot-move-to-utf-8-linepos)))
-    (add-hook 'after-change-functions #'eglot--after-change nil t)
-    (add-hook 'before-change-functions #'eglot--before-change nil t)
+    (unless eglot--track-changes
+      (setq eglot--track-changes
+            (track-changes-register #'eglot--track-changes-signal)))
     (add-hook 'kill-buffer-hook #'eglot--managed-mode-off nil t)
     ;; Prepend "didClose" to the hook after the "nonoff", so it will run first
     (add-hook 'kill-buffer-hook #'eglot--signal-textDocument/didClose nil t)
@@ -1994,8 +2019,8 @@ eglot--managed-mode
       (eldoc-mode 1))
     (cl-pushnew (current-buffer) (eglot--managed-buffers (eglot-current-server))))
    (t
-    (remove-hook 'after-change-functions #'eglot--after-change t)
-    (remove-hook 'before-change-functions #'eglot--before-change t)
+    (when eglot--track-changes
+      (track-changes-unregister eglot--track-changes))
     (remove-hook 'kill-buffer-hook #'eglot--managed-mode-off t)
     (remove-hook 'kill-buffer-hook #'eglot--signal-textDocument/didClose t)
     (remove-hook 'before-revert-hook #'eglot--signal-textDocument/didClose t)
@@ -2564,54 +2589,20 @@ jsonrpc-connection-ready-p
 
 (defvar-local eglot--change-idle-timer nil "Idle timer for didChange signals.")
 
-(defun eglot--before-change (beg end)
-  "Hook onto `before-change-functions' with BEG and END."
-  (when (listp eglot--recent-changes)
-    ;; Records BEG and END, crucially convert them into LSP
-    ;; (line/char) positions before that information is lost (because
-    ;; the after-change thingy doesn't know if newlines were
-    ;; deleted/added).  Also record markers of BEG and END
-    ;; (github#259)
-    (push `(,(eglot--pos-to-lsp-position beg)
-            ,(eglot--pos-to-lsp-position end)
-            (,beg . ,(copy-marker beg nil))
-            (,end . ,(copy-marker end t)))
-          eglot--recent-changes)))
-
 (defvar eglot--document-changed-hook '(eglot--signal-textDocument/didChange)
   "Internal hook for doing things when the document changes.")
 
-(defun eglot--after-change (beg end pre-change-length)
-  "Hook onto `after-change-functions'.
-Records BEG, END and PRE-CHANGE-LENGTH locally."
+(defun eglot--track-changes-signal (id)
   (cl-incf eglot--versioned-identifier)
-  (pcase (car-safe eglot--recent-changes)
-    (`(,lsp-beg ,lsp-end
-                (,b-beg . ,b-beg-marker)
-                (,b-end . ,b-end-marker))
-     ;; github#259 and github#367: with `capitalize-word' & friends,
-     ;; `before-change-functions' records the whole word's `b-beg' and
-     ;; `b-end'.  Similarly, when `fill-paragraph' coalesces two
-     ;; lines, `b-beg' and `b-end' mark end of first line and end of
-     ;; second line, resp.  In both situations, `beg' and `end'
-     ;; received here seemingly contradict that: they will differ by 1
-     ;; and encompass the capitalized character or, in the coalescing
-     ;; case, the replacement of the newline with a space.  We keep
-     ;; both markers and positions to detect and correct this.  In
-     ;; this specific case, we ignore `beg', `len' and
-     ;; `pre-change-len' and send richer information about the region
-     ;; from the markers.  I've also experimented with doing this
-     ;; unconditionally but it seems to break when newlines are added.
-     (if (and (= b-end b-end-marker) (= b-beg b-beg-marker)
-              (or (/= beg b-beg) (/= end b-end)))
-         (setcar eglot--recent-changes
-                 `(,lsp-beg ,lsp-end ,(- b-end-marker b-beg-marker)
-                            ,(buffer-substring-no-properties b-beg-marker
-                                                             b-end-marker)))
-       (setcar eglot--recent-changes
-               `(,lsp-beg ,lsp-end ,pre-change-length
-                          ,(buffer-substring-no-properties beg end)))))
-    (_ (setf eglot--recent-changes :emacs-messup)))
+  (track-changes-fetch
+   id (lambda (beg end before)
+        (if (stringp before)
+            (push `(,(eglot--pos-to-lsp-position beg)
+                    ,(eglot--virtual-pos-to-lsp-position beg before)
+                    ,(length before)
+                    ,(buffer-substring-no-properties beg end))
+                  eglot--recent-changes)
+          (setf eglot--recent-changes :emacs-messup))))
   (when eglot--change-idle-timer (cancel-timer eglot--change-idle-timer))
   (let ((buf (current-buffer)))
     (setq eglot--change-idle-timer
@@ -2741,12 +2732,6 @@ eglot--signal-textDocument/didChange
                               (buffer-substring-no-properties (point-min)
                                                               (point-max)))))
           (cl-loop for (beg end len text) in (reverse eglot--recent-changes)
-                   ;; github#259: `capitalize-word' and commands based
-                   ;; on `casify_region' will cause multiple duplicate
-                   ;; empty entries in `eglot--before-change' calls
-                   ;; without an `eglot--after-change' reciprocal.
-                   ;; Weed them out here.
-                   when (numberp len)
                    vconcat `[,(list :range `(:start ,beg :end ,end)
                                     :rangeLength len :text text)]))))
       (setq eglot--recent-changes nil)

^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 18:53   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30  6:34     ` Eli Zaretskii
  2024-03-30 14:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-03-30  6:34 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: yantar92, 70077, casouri, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 70077@debbugs.gnu.org,  mail@nicolasgoaziou.fr,  yantar92@posteo.net,
>   acm@muc.de,  joaotavora@gmail.com,  alan.zimm@gmail.com,
>   frederic.bour@lakaban.net,  phillip.lord@russet.org.uk,
>   stephen_leake@stephe-leake.org,  casouri@gmail.com,  qhong@alum.mit.edu
> Date: Fri, 29 Mar 2024 14:53:41 -0400
> 
> > I cannot imagine how applications would use these APIs.  I'm probably
> > missing something, org the above documentation does.  Can you show
> > some real-life examples?
> 
> Haven't written real code for it yes, no.
> The best I can offer is the sample code in the file:
> 
>     (defvar my-foo--change-tracker nil)
>     (define-minor-mode my-foo-mode
>       "Fooing like there's no tomorrow."
>       (if (null my-foo-mode)
>           (when my-foo--change-tracker
>             (track-changes-unregister my-foo--change-tracker)
>             (setq my-foo--change-tracker nil))
>         (unless my-foo--change-tracker
>           (setq my-foo--change-tracker
>                 (track-changes-register
>                  (lambda ()
>                    (track-changes-fetch
>                     my-foo--change-tracker
>                     (lambda (beg end before)
>                       ..DO THE THING..))))))))
> 
> Where "DO THE THING" is run similarly to what would happen in an
> `after-change-functions`, except:
> 
> - BEFORE is a string holding the content of what was in BEG..END
>   instead of being limited to its length.
> - It's run at most once per command, so there's no performance worries.
> - It's run "outside" of the modifications themselves,
>   so `inhibit-modification-hooks` is nil and the code can wait, modify
>   the buffer, or do any kind of crazy things.

Thanks.

I understand the last point, but that still doesn't explain enough for
me to see the light.  (I've also looked at the two real-life uses you
posted, and it didn't help, probably because the important ideas
drowned in the sea of modifications to code I'm not familiar with well
enough to understand what's important and what isn't.)  If the last
point, i.e. the problems caused by limitations on what can be safely
done from modification hooks, is basically the main advantage, then I
think I understand the rationale.  Otherwise, the above looks like
doing all the job in after-change-functions, and it is not clear to me
how is that better, since if track-changes-fetch will fetch a series
of changes, deciding how to handle them could be much harder than
handling them one by one when each one happens.  For example, the
BEGIN..END values no longer reflect the current buffer contents, and
each fetched change refers to a different content of the buffer (so
the same values of BEG..END don't necessarily mean the same places in
the buffer).  I think the need for the
eglot--virtual-pos-to-lsp-position function in one of your examples is
the tip of that iceberg.

Also, all of your examples seem to have the signal function just call
track-changes-fetch and do almost nothing else, so I wonder why we
need a separate function for that, and more specifically what would be
a use case where the registered signal function does NOT call
track-changes-fetch, but does something else, and track-changes-fetch
is then called outside of the signal function.

Finally, the doc string of track-changes-register does not describe
the exact place in the buffer-change sequence where the signal
function will be called, which makes it harder to reason about it.
Will it be called where we now call signal_after_change or somewhere
else?  And how do you guarantee that the signal function will not be
called again until track-changes-fetch is called?

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 22:59   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30  6:46     ` Eli Zaretskii
  2024-03-30 12:06     ` phillip.lord
  1 sibling, 0 replies; 50+ messages in thread
From: Eli Zaretskii @ 2024-03-30  6:46 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: yantar92, 70077, casouri, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

> Cc: Ihor Radchenko <yantar92@posteo.net>, 70077@debbugs.gnu.org,
>  Yuan Fu <casouri@gmail.com>, Qiantan Hong <qhong@alum.mit.edu>,
>  Frédéric Bour <frederic.bour@lakaban.net>,
>  João Távora <joaotavora@gmail.com>,
>  Nicolas Goaziou <mail@nicolasgoaziou.fr>, Alan Mackenzie <acm@muc.de>,
>  Stephen Leake <stephen_leake@stephe-leake.org>,
>  Alan Zimmerman <alan.zimm@gmail.com>
> Date: Fri, 29 Mar 2024 18:59:38 -0400
> From:  Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> > If I remember correctly, I think this wouldn't be enough for my
> > use. You keep two buffers in sync, you have to use
> > before-change-function -- it is only before any change that the two
> > buffers are guaranteed to be in sync and it is this that allows you to
> > work out what the `start' and `end' positions mean in the copied
> > buffer. Afterward, you cannot work out what the end position because
> > you don't know if the change is a change, insertion, deletion or both.
> 
> I believe the API I propose does provide that information: you can
> recover the state of the buffer before the change (or more specifically,
> the state of the buffer as of the last time you called
> track-changes-fetch) from the BEG/END/BEFORE arguments as follows:
> 
>     (concat (buffer-substring (point-min) beg)
>             before
>             (buffer-substring end (point-max)))

But if you get several changes, the above will need to be done in
reverse order, back-to-front, no?  And before you do that, you cannot
really handle the changes, right?

> I don't mean to suggest to do that, since it's costly for large
> buffers

Exactly.  But if the buffer _is_ large, then what to do?

> Also, it would fix only the problem of pairing, and not the other ones.

So the main/only problem this mechanism solves is the lack of pairing
between before/after calls to modification hooks?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 16:15 bug#70077: An easier way to track buffer changes Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-29 18:12 ` Eli Zaretskii
  2024-03-29 22:20 ` phillip.lord
@ 2024-03-30  9:51 ` Ihor Radchenko
  2024-03-30 12:49   ` Eli Zaretskii
  2024-03-30 14:09   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  3 siblings, 2 replies; 50+ messages in thread
From: Ihor Radchenko @ 2024-03-30  9:51 UTC (permalink / raw)
  To: 70077
  Cc: casouri, yantar92, qhong, frederic.bour, joaotavora, mail, acm,
	stephen_leake, alan.zimm, monnier, phillip.lord

Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

> The driving design was:
>
> - Try to provide enough info such that it is possible and easy to
>   maintain a copy of the buffer simply by applying the reported changes.
>   E.g. for uses such as `eglot.el` or `crdt.el`.
> - Make the API less synchronous: take care of combining small changes
>   into larger ones, and let the clients decide when they react to changes.

Before we discuss the API, may you allow me to raise one critical
concern: bug#65451.

If my reading of the patch is correct, your code is relying upon the
buffer changes arriving in the same order the changes are being made.
However, it is not always the case, as demonstrated in the linked bug
report.

I am skeptical that you can achieve the desired patch goals purely
relying upon before/after-change-functions, without reaching down to C
internals.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 22:59   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30  6:46     ` Eli Zaretskii
@ 2024-03-30 12:06     ` phillip.lord
  2024-03-30 13:39       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: phillip.lord @ 2024-03-30 12:06 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: Ihor Radchenko, 70077, Yuan Fu, Qiantan Hong,
	Frédéric Bour, João Távora, Nicolas Goaziou,
	Alan Mackenzie, Stephen Leake, Alan Zimmerman

On 2024-03-29 18:59, Stefan Monnier wrote:
>> If I remember correctly, I think this wouldn't be enough for my
>> use. You keep two buffers in sync, you have to use
>> before-change-function -- it is only before any change that the two
>> buffers are guaranteed to be in sync and it is this that allows you to
>> work out what the `start' and `end' positions mean in the copied
>> buffer. Afterward, you cannot work out what the end position because
>> you don't know if the change is a change, insertion, deletion or both.
> 
> I believe the API I propose does provide that information: you can
> recover the state of the buffer before the change (or more 
> specifically,
> the state of the buffer as of the last time you called
> track-changes-fetch) from the BEG/END/BEFORE arguments as follows:
> 
>     (concat (buffer-substring (point-min) beg)
>             before
>             (buffer-substring end (point-max)))
> 
> I don't mean to suggest to do that, since it's costly for large
> buffers, but to illustrate that the information is properly preserved.


Ah, yes, you are correct, I had missed that one. As you note, it would 
be costly,
especially because if you wanted to do anything with that data, you 
would probably
end up dumping it into a temp buffer.


>> Last time I checked, I did find relatively few primitives that were 
>> guilty
>> of being inconsistent -- in the case of `subst-char-in-region', it 
>> returned
>> the maximal area of effect before the and the minimal area of effect
>> after. Would it not be easier to fix these?
> 
> [ IIRC `revert-buffer` has a similar behavior, and in that case the
>   difference can be large since the "before" covers the whole buffer.  
> ]
> 
> Also, it would fix only the problem of pairing, and not the other ones.


Understood. It would be interesting to know how many primitives cause 
issues though.

Phil





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30  9:51 ` Ihor Radchenko
@ 2024-03-30 12:49   ` Eli Zaretskii
  2024-03-30 13:19     ` Ihor Radchenko
  2024-03-30 14:09   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-03-30 12:49 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, yantar92, qhong, frederic.bour, joaotavora, mail,
	acm, phillip.lord, stephen_leake, alan.zimm, monnier

> Cc: casouri@gmail.com, yantar92@gmail.com, qhong@alum.mit.edu,
>  frederic.bour@lakaban.net, joaotavora@gmail.com, mail@nicolasgoaziou.fr,
>  acm@muc.de, stephen_leake@stephe-leake.org, alan.zimm@gmail.com,
>  monnier@iro.umontreal.ca, phillip.lord@russet.org.uk
> From: Ihor Radchenko <yantar92@posteo.net>
> Date: Sat, 30 Mar 2024 09:51:13 +0000
> 
> Before we discuss the API, may you allow me to raise one critical
> concern: bug#65451.
> 
> If my reading of the patch is correct, your code is relying upon the
> buffer changes arriving in the same order the changes are being made.
> However, it is not always the case, as demonstrated in the linked bug
> report.

That bug report is about after-change-functions.  Since Stefan didn't
yet describe where will the changes be recorded, it doesn't
necessarily follow that your worries are justified.  They could be, of
course, but we should decide that after we hear the details.

In any case, who and where said the changes will be fetched by
track-changes-fetch must be in the order they were made? why is the
order at all significant?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30 12:49   ` Eli Zaretskii
@ 2024-03-30 13:19     ` Ihor Radchenko
  2024-03-30 13:31       ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-03-30 13:19 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: casouri, 70077, yantar92, qhong, frederic.bour, joaotavora, mail,
	acm, phillip.lord, stephen_leake, alan.zimm, monnier

Eli Zaretskii <eliz@gnu.org> writes:

> In any case, who and where said the changes will be fetched by
> track-changes-fetch must be in the order they were made? why is the
> order at all significant?

I have concerns that the following API promise can be fulfilled:

    (defun track-changes-fetch (id func)
      "Fetch the pending changes.
    ID is the tracker ID returned by a previous `track-changes-register'.
    FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
    where BEGIN..END delimit the region that was changed since the last
    time `track-changes-fetch' was called and BEFORE is a string containing
                                              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    the previous content of that region.
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

Looking at the implementation of `track-changes--before/after', I can
see that it is updating the BEFORE string and these updates implicitly
assume that the changes arrive in order - which is not true in some edge
cases described in the bug report.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30 13:19     ` Ihor Radchenko
@ 2024-03-30 13:31       ` Eli Zaretskii
  0 siblings, 0 replies; 50+ messages in thread
From: Eli Zaretskii @ 2024-03-30 13:31 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, yantar92, qhong, frederic.bour, joaotavora, mail,
	acm, phillip.lord, stephen_leake, alan.zimm, monnier

> From: Ihor Radchenko <yantar92@posteo.net>
> Cc: 70077@debbugs.gnu.org, casouri@gmail.com, yantar92@gmail.com,
>  qhong@alum.mit.edu, frederic.bour@lakaban.net, joaotavora@gmail.com,
>  mail@nicolasgoaziou.fr, acm@muc.de, stephen_leake@stephe-leake.org,
>  alan.zimm@gmail.com, monnier@iro.umontreal.ca, phillip.lord@russet.org.uk
> Date: Sat, 30 Mar 2024 13:19:31 +0000
> 
> Looking at the implementation of `track-changes--before/after', I can
> see that it is updating the BEFORE string and these updates implicitly
> assume that the changes arrive in order - which is not true in some edge
> cases described in the bug report.

So I guess you have detected a bug in the implementation before it was
even finalized and installed.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30 12:06     ` phillip.lord
@ 2024-03-30 13:39       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30 13:39 UTC (permalink / raw)
  To: phillip.lord
  Cc: Ihor Radchenko, 70077, Yuan Fu, Qiantan Hong,
	Frédéric Bour, João Távora, Nicolas Goaziou,
	Alan Mackenzie, Stephen Leake, Alan Zimmerman

> Ah, yes, you are correct, I had missed that one. As you note, it would
> be costly, especially because if you wanted to do anything with that
> data, you would probably end up dumping it into a temp buffer.

That's indeed what I end up doing in the `eglot.el` case.
But if the API wants to tackle the performance issue of combining many
small changes into a larger one, I don't know how we can do better.

Another idea I considered was to keep a whole buffer as "previous
contents" (instead of a string that covers the modified area).
In some cases it would be more efficient, but the common case would be
a lot more costly.

        Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30  9:51 ` Ihor Radchenko
  2024-03-30 12:49   ` Eli Zaretskii
@ 2024-03-30 14:09   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 0 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30 14:09 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: Ihor Radchenko, 70077, Yuan Fu, Qiantan Hong,
	Frédéric Bour, João Távora, Nicolas Goaziou,
	Alan Mackenzie, Stephen Leake, Alan Zimmerman, Phillip Lord

>> The driving design was:
>>
>> - Try to provide enough info such that it is possible and easy to
>>   maintain a copy of the buffer simply by applying the reported changes.
>>   E.g. for uses such as `eglot.el` or `crdt.el`.
>> - Make the API less synchronous: take care of combining small changes
>>   into larger ones, and let the clients decide when they react to changes.
>
> Before we discuss the API, may you allow me to raise one critical
> concern: bug#65451.

Thanks, I wasn't aware of it.

Note that bug#65451 would affect `track-changes.el` but not its clients
nor its API.

Well, I guess it couldn't really insulate its clients from such bugs,
but it would be `track-changes.el`s responsibility to detect such
problems and pass down to its clients something saying "oops we have
a problem, you have to do a full-resync".

> If my reading of the patch is correct, your code is relying upon the
> buffer changes arriving in the same order the changes are being made.

Indeed.

> I am skeptical that you can achieve the desired patch goals purely
> relying upon before/after-change-functions, without reaching down to C
> internals.

There's a FIXME for that:

    ;; FIXME: Try and do some sanity-checks (e.g. looking at `buffer-size'),
    ;; to detect if/when we somehow missed some changes.

All the current non-trivial users of *-change-functions have such sanity
checks.  They're designed to handle those nasty cases where we have
a bug in the C code.  I don't claim my new API can magically paper over
those bugs.  The intention to deal with those bugs is:

- When they're detected, call the clients with a special value for
  `before` which says that we lost track of some change, so the client
  knows that it may be necessary to do a full resync.
  Luckily for many/most clients a full-resync is only a performance
  problem (e.g. having to reparse the whole file), tho for some (like
  lentic) I suspect it can result in an actual loss of information.

- Try and fix them, of course.  Alan has done a great job in the past
  fixing many of them (tho apparently still not all).
  [ And also a great job of convincing me that we *can* and thus
    *should* fix them.  ]

IOW, no magic bullet: the clients would still have to somehow handle such
a "full-resync"s.

The main advantage would be that the job of sanity-checking would be
taken care of by `track-changes.el` and the clients would have to check
only `before` for a special value to indicate a problem.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30  6:34     ` Eli Zaretskii
@ 2024-03-30 14:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30 16:45         ` Eli Zaretskii
  2024-04-01 11:53         ` Ihor Radchenko
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-30 14:58 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: yantar92, 70077, casouri, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

> If the last point, i.e. the problems caused by limitations on what can
> be safely done from modification hooks, is basically the main
> advantage, then I think I understand the rationale.

That's one of the purposes but not the only one.

> Otherwise, the above looks like doing all the job in
> after-change-functions, and it is not clear to me how is that better,
> since if track-changes-fetch will fetch a series of changes,

`track-changes-fetch` will call its function argument only once.
If several changes happened since last time, `track-changes.el` will
summarize them into a single (BEG END BEFORE).

> deciding how to handle them could be much harder than handling them
> one by one when each one happens.  For example, the BEGIN..END values
> no longer reflect the current buffer contents,

Indeed, one of the purposes of the proposed API is to allow delaying the
handling to a later time, and if you keep individual changes, then it
becomes very difficult to make sense of those changes because they refer
to buffer states that aren't available any more.

That's why `track-changes.el` bundles those into a single (BEG END
BEFORE) change, which makes sure BEG/END are currently-valid buffer
positions and thus easy to use.
The previous buffer state is not directly available but can be
easily reconstructed from (BEG END BEFORE).

> Also, all of your examples seem to have the signal function just call
> track-changes-fetch and do almost nothing else, so I wonder why we
> need a separate function for that,

It gives more freedom to the clients.  For example it would allow
`eglot.el` to get rid of the `eglot--recent-changes` list of changes:
instead of calling `track-changes-fetch` directly from the signal and
collect the changes in `eglot--recent-changes`, it would delay the call
to `track-changes-fetch` to the idle-timer where we run
`eglot--signal-textDocument/didChange` so it would never need to send
more than a "single change" to the server.

Similarly, it would allow changing the specific moment the signal is
called without necessarily moving the moment the changes are performed.
This may be necessary in `eglot.el`: there are various places where we
check the boolean value of `eglot--recent-changes` to know if we're
in sync with the server.  In the current code this value because non-nil
as soon as a modification is made, whereas with my patch it becomes
non-nil only after the end of the command that made the first change.
I don't know yet if the difference is important, but if it is, then
we'd want to ask `track-changes.el` to send the signal more promptly
(there is a FIXME to add such a feature), although we would still want
to delay the `track-changes-fetch` so it's called only at the end of
the command.

> and more specifically what would be a use case where the registered
> signal function does NOT call track-changes-fetch, but does something
> else, and track-changes-fetch is then called outside of the
> signal function.

As mentioned, a use-case I imagine are cases where we want to delay the
processing of changes to an idle timer.  In the current `eglot.el` that
idle timer processes a sequence of changes, but for some use cases it
may too difficult (for the reasons discussed above: it can be difficult
to make sense of earlier changes once they're disconnected from the
corresponding buffer state), so they'd instead prefer to call
`track-changes-fetch` from the idle timer (and thus let
`track-changes.el` combine all those changes).

> Finally, the doc string of track-changes-register does not describe
> the exact place in the buffer-change sequence where the signal
> function will be called, which makes it harder to reason about it.
> Will it be called where we now call signal_after_change or somewhere
> else?

Good point.  Currently it's called via `funcall-later` which isn't in
`master` but can be thought of as `run-with-timer` with a 0 delay.
[ In reality it relies on `pending_funcalls` instead.  ]

But indeed, I have a FIXME to let the caller request to signal
as soon as possible, i.e. directly from the `after-change-functions`.

I think it's better to use something like `funcall-later` by default
than to signal directly from `after-change-functions` because most
coders don't realize that `after-change-functions` can be called
thousands of times for a single command (and in normal testing, they'll
probably never bump into such a case either).  So waiting for the end of
the current command (via `funcall-later`) provides a behavior closer to
what most naive coders expect, I believe, and will save them from the
corner cased they weren't aware of.

> And how do you guarantee that the signal function will not be
> called again until track-changes-fetch is called?

By removing the tracker from `track-changes--clean-trackers` (and
re-adding it once `track-changes-fetch` is finished, which is the main
reason I make `track-changes-fetch` call a function argument rather
than making it return (BEG END CHANGES)).

In a later email you wrote:

>>     (concat (buffer-substring (point-min) beg)
>>             before
>>             (buffer-substring end (point-max)))
>
> But if you get several changes, the above will need to be done in
> reverse order, back-to-front, no?

No, because you (the client) never get several changes.

>> I don't mean to suggest to do that, since it's costly for large
>> buffers
> Exactly.  But if the buffer _is_ large, then what to do?

It all depends on the specific cases.  E.g. in the case of `eglot.el` we
don't need the full content of the buffer before the change.  We only
really need to know how many newlines were in `before` as well as some
kind of length of the last line of `before`.  I compute that
by inserting `before` into a temp buffer.  Note that this is
proportional to the size of `before` and not to the total size of
the buffer.

If we want to do better, I think we'd then need a more complex API where
the client can specify more precisely (presumably via some kind of
function) what information we need to record about the "before state"
(and how to update that information as new changes are performed).

I don't have a good idea for what such an API could look like.

>> Also, it would fix only the problem of pairing, and not the other ones.
> So the main/only problem this mechanism solves is the lack of pairing
> between before/after calls to modification hooks?

No, the text you cite is saying the opposite: that we don't want to
solve only the pairing.  I hope/want it to solve all the
problems I mentioned:

    - before and after calls are not necessarily paired.
    - the beg/end values don't always match.
    - there can be thousands of calls from within a single command.
    - these hooks are run at a fairly low-level so there are things they
      really shouldn't do, such as modify the buffer or wait.
    - the after call doesn't get enough info to rebuild the before-change state,
      so some callers need to use both before-c-f and after-c-f (and then
      deal with the first two points above).

I don't claim it solves them all in a perfect way, tho.

        Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30 14:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-03-30 16:45         ` Eli Zaretskii
  2024-03-31  2:57           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-01 11:53         ` Ihor Radchenko
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-03-30 16:45 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: yantar92, 70077, casouri, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 70077@debbugs.gnu.org,  mail@nicolasgoaziou.fr,  yantar92@posteo.net,
>   acm@muc.de,  joaotavora@gmail.com,  alan.zimm@gmail.com,
>   frederic.bour@lakaban.net,  phillip.lord@russet.org.uk,
>   stephen_leake@stephe-leake.org,  casouri@gmail.com,  qhong@alum.mit.edu
> Date: Sat, 30 Mar 2024 10:58:40 -0400
> 
> > Otherwise, the above looks like doing all the job in
> > after-change-functions, and it is not clear to me how is that better,
> > since if track-changes-fetch will fetch a series of changes,
> 
> `track-changes-fetch` will call its function argument only once.
> If several changes happened since last time, `track-changes.el` will
> summarize them into a single (BEG END BEFORE).

Then I don't think you will be able to guarantee that in all cases.
You are basically trying to solve a problem that many packages which
used the modification hooks tried to solve, but where they relied on
some specifics of the problem they wanted to solve, you are trying to
solve it in general, and I just don't believe it's possible (but will
be happy to learn I'm mistaken).





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30 16:45         ` Eli Zaretskii
@ 2024-03-31  2:57           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-03-31  2:57 UTC (permalink / raw)
  To: Eli Zaretskii
  Cc: yantar92, 70077, casouri, qhong, frederic.bour, joaotavora, mail,
	acm, stephen_leake, alan.zimm, phillip.lord

>> `track-changes-fetch` will call its function argument only once.
>> If several changes happened since last time, `track-changes.el` will
>> summarize them into a single (BEG END BEFORE).
>
> Then I don't think you will be able to guarantee that in all cases.
> You are basically trying to solve a problem that many packages which
> used the modification hooks tried to solve, but where they relied on
> some specifics of the problem they wanted to solve, you are trying to
> solve it in general, and I just don't believe it's possible (but will
> be happy to learn I'm mistaken).

Indeed, there are cases where bugs in our C code will get in the way,
and I'll have to return something like (BEG END :error).
But other than that, the current code already handles
"arbitrary" merging.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-30 14:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-03-30 16:45         ` Eli Zaretskii
@ 2024-04-01 11:53         ` Ihor Radchenko
  2024-04-01 14:51           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-01 11:53 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>  ... That's why `track-changes.el` bundles those into a single (BEG END
> BEFORE) change, which makes sure BEG/END are currently-valid buffer
> positions and thus easy to use.
> The previous buffer state is not directly available but can be
> easily reconstructed from (BEG END BEFORE).

Do I understand correctly that such bundling may result in a situation
when BEFORE is very long? In particular, I am thinking of a situation
when there are changes near the beginning and the end of buffer in quick
succession.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-01 11:53         ` Ihor Radchenko
@ 2024-04-01 14:51           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-01 17:49             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-01 14:51 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

>>  ... That's why `track-changes.el` bundles those into a single (BEG END
>> BEFORE) change, which makes sure BEG/END are currently-valid buffer
>> positions and thus easy to use.
>> The previous buffer state is not directly available but can be
>> easily reconstructed from (BEG END BEFORE).
>
> Do I understand correctly that such bundling may result in a situation
> when BEFORE is very long? In particular, I am thinking of a situation
> when there are changes near the beginning and the end of buffer in quick
> succession.

Yes!


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-01 14:51           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-01 17:49             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-02 14:22               ` Ihor Radchenko
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-01 17:49 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

>>>  ... That's why `track-changes.el` bundles those into a single (BEG END
>>> BEFORE) change, which makes sure BEG/END are currently-valid buffer
>>> positions and thus easy to use.
>>> The previous buffer state is not directly available but can be
>>> easily reconstructed from (BEG END BEFORE).
>> Do I understand correctly that such bundling may result in a situation
>> when BEFORE is very long? In particular, I am thinking of a situation
>> when there are changes near the beginning and the end of buffer in quick
>> succession.
> Yes!

I'm not sure how to combine the benefits of combining small changes into
larger ones with the benefits of keeping distant changes separate.

I don't want to expose in the API a "sequence of changes", because
that's difficult to use in general: the only thing a client can conveniently
do with it is to keep their own copy of the buffer (e.g. in the LSP
server) and apply the changes in the order given.  But if for some
reason you need to do something else (e.g. convert the position from
charpos to line+col) you're in a world of hurt because (except for the
last element in the sequence) you don't have easy access to the state
the buffer was in when the change was made.

We could expose a list of simultaneous (and thus disjoint) changes,
which avoids the last problem.  But it's a fair bit more work for us, it
makes the API more complex for the clients, and it's rarely what the
clients really want anyway.

But it did occur to me that we could solve the "disjoint changes"
problem in the following way: signal the client (from
`before-change-functions`) when a change is about to be made "far" from
the currently pending changes.

The API would still expose only (BEG END BEFORE) rather than
lists/sequences of changes, but the clients could then decide to record
disjoint changes as they occur and thus create&manage their own
list/sequence of changes.  More specifically, someone could come
a create a new API on top which exposes a list/sequence of changes.

The main downside is that this signal of "upcoming disjoint change"
would have to be called from `before-change-functions`, so the client
would need to be careful as usual (e.g. don't modify the buffer, don't
do any blocking operation, ...).

        Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-01 17:49             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-02 14:22               ` Ihor Radchenko
  2024-04-02 15:17                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-02 14:22 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> Do I understand correctly that such bundling may result in a situation
>>> when BEFORE is very long? In particular, I am thinking of a situation
>>> when there are changes near the beginning and the end of buffer in quick
>>> succession.
>> Yes!
>
> I'm not sure how to combine the benefits of combining small changes into
> larger ones with the benefits of keeping distant changes separate.

I am not sure if combining small changes into larger ones is at all a
good idea.  The changes are stored in strings, which get allocated and
re-allocated repeatedly.  Repeated string allocations, especially when
strings keep growing towards the buffer size, is likely going to
increase consing and make GCs more frequent.

> I don't want to expose in the API a "sequence of changes", because
> that's difficult to use in general: the only thing a client can conveniently
> do with it is to keep their own copy of the buffer (e.g. in the LSP
> server) and apply the changes in the order given.  But if for some
> reason you need to do something else (e.g. convert the position from
> charpos to line+col) you're in a world of hurt because (except for the
> last element in the sequence) you don't have easy access to the state
> the buffer was in when the change was made.

Aside...
How nice would it be if buffer state and buffer text were persistent.
Like in https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

> We could expose a list of simultaneous (and thus disjoint) changes,
> which avoids the last problem.  But it's a fair bit more work for us, it
> makes the API more complex for the clients, and it's rarely what the
> clients really want anyway.

FYI, Org parser maintains such a list.
We previously discussed a similar API in
https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/

> But it did occur to me that we could solve the "disjoint changes"
> problem in the following way: signal the client (from
> `before-change-functions`) when a change is about to be made "far" from
> the currently pending changes.
>
> The API would still expose only (BEG END BEFORE) rather than
> lists/sequences of changes, but the clients could then decide to record
> disjoint changes as they occur and thus create&manage their own
> list/sequence of changes.  More specifically, someone could come
> a create a new API on top which exposes a list/sequence of changes.
>
> The main downside is that this signal of "upcoming disjoint change"
> would have to be called from `before-change-functions`, so the client
> would need to be careful as usual (e.g. don't modify the buffer, don't
> do any blocking operation, ...).

This makes sense.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-02 14:22               ` Ihor Radchenko
@ 2024-04-02 15:17                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-02 16:21                   ` Ihor Radchenko
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-02 15:17 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

[-- Attachment #1: Type: text/plain, Size: 3533 bytes --]

>> I'm not sure how to combine the benefits of combining small changes into
>> larger ones with the benefits of keeping distant changes separate.
> I am not sure if combining small changes into larger ones is at all a
> good idea.

In my experience, if the amount of work to be done "per change" is not
completely trivial, combining small changes is indispensable (the worst
part being that the cases where it's needed are sufficiently infrequent
that naive users of `*-change-functions` won't know about that, so we
end up with unacceptable slowdowns in corner cases).

It also keeps the API simpler since the clients only need to care about
at most one (BEG END BEFORE) item at any given time.

> The changes are stored in strings, which get allocated and
> re-allocated repeatedly.

Indeed.  Tho for N small changes, my code should perform only O(log N)
re-allocations.

> Repeated string allocations, especially when strings keep growing
> towards the buffer size, is likely going to increase consing and make
> GCs more frequent.

Similar allocations presumably take place anyway while running the code
(e.g. for the `buffer-undo-list`), so I'm hoping the effect will be
"lost in the noise".

But admittedly for code which does not need the `before` string at all
(such as `diff-mode.el`), it's indeed a waste of effort.  I haven't been
able to come up with an API which is still simple but without such costs.

> Aside...
> How nice would it be if buffer state and buffer text were persistent.
> Like in https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimplementation

🙂

>> We could expose a list of simultaneous (and thus disjoint) changes,
>> which avoids the last problem.  But it's a fair bit more work for us, it
>> makes the API more complex for the clients, and it's rarely what the
>> clients really want anyway.
> FYI, Org parser maintains such a list.

Could you point me to the relevant code (I failed to find it)?

> We previously discussed a similar API in
> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/

IIUC this discusses a *sequence* of edits.  In the point to which you
replied I was discussing keeping a list of *simultaneous* edits.

This said, the downside in both cases is that the specific data that we
need from such a list tends to depend on the user.  E.g. you suggest

    (BEG END_BEFORE END_AFTER COUNTER)

but that is not sufficient reconstruct the corresponding buffer state,
so things like Eglot/CRDT can't use it.  Ideally for CRDT I think you'd
want a sequence of

    (BEG END-BEFORE STRING-AFTER)

but for Eglot this is not sufficient because Eglot needs to convert BEG
and END_BEFORE into LSP positions (i.e. "line+col") and for that it
needs to reproduce the past buffer state.  So instead, what Eglot needs
(and does indeed build using `*-change-functions`) is a sequence of

    (LSP-BEG LSP-END-BEFORE STRING-AFTER)

[ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure
  exactly why, but I guess it's a piece of redundant info the servers
  can use to sanity-check the data?  ]

>> But it did occur to me that we could solve the "disjoint changes"
>> problem in the following way: signal the client (from
>> `before-change-functions`) when a change is about to be made "far" from
>> the currently pending changes.
> This makes sense.

The code changes were actually quite simple, so I have now included it.
See my current PoC code below.

        Stefan

[-- Attachment #2: track-changes.el --]
[-- Type: application/emacs-lisp, Size: 17616 bytes --]

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-02 15:17                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-02 16:21                   ` Ihor Radchenko
  2024-04-02 17:51                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-02 16:21 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>>> I'm not sure how to combine the benefits of combining small changes into
>>> larger ones with the benefits of keeping distant changes separate.
>> I am not sure if combining small changes into larger ones is at all a
>> good idea.
>
> In my experience, if the amount of work to be done "per change" is not
> completely trivial, combining small changes is indispensable (the worst
> part being that the cases where it's needed are sufficiently infrequent
> that naive users of `*-change-functions` won't know about that, so we
> end up with unacceptable slowdowns in corner cases).

I agree, but only when subsequent changes intersect each other.
When distant changes are combined together, they cover a lot of buffer
text that has not been changed - they may trigger a lot more work
compared to more granular changes. Consider, for example, two changes
near the beginning and the end of the buffer:
(1..10) and (1000000...(point-max))
If such changes are tracked by tree-sitter or other parser, there is a
world of difference between requesting to re-parse individual segments
and the whole buffer.

>> The changes are stored in strings, which get allocated and
>> re-allocated repeatedly.
>
> Indeed.  Tho for N small changes, my code should perform only O(log N)
> re-allocations.

May you explain a bit more about this?
My reading of `track-changes--before' is that `concat' is called on
every change except when a new change is already inside the previously
recorded one. (aside: I have a hard time reading the code because of
confusing slot names: bbeg vs beg??)

>> Repeated string allocations, especially when strings keep growing
>> towards the buffer size, is likely going to increase consing and make
>> GCs more frequent.
>
> Similar allocations presumably take place anyway while running the code
> (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be
> "lost in the noise".

I disagree. `buffer-undo-list' only includes the text that has been
actually changed. It never creates strings that span between
distant regions. As a terminal case, consider alternating between first
and second half of the buffer, starting in the middle and editing
towards the buffer boundaries - this will involve re-allocation of
buffer-long strings on average.

>>> We could expose a list of simultaneous (and thus disjoint) changes,
>>> which avoids the last problem.  But it's a fair bit more work for us, it
>>> makes the API more complex for the clients, and it's rarely what the
>>> clients really want anyway.
>> FYI, Org parser maintains such a list.
>
> Could you point me to the relevant code (I failed to find it)?

That code handles a lot more than just changes, so it might not be the
best reference. Anyway...

See the docstring of `org-element--cache-sync-requests', and
the branch in `org-element--cache-submit-request' starting from
	  ;; Current changes can be merged with first sync request: we
	  ;; can save a partial cache synchronization.

>> We previously discussed a similar API in
>> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/
>
> IIUC this discusses a *sequence* of edits.  In the point to which you
> replied I was discussing keeping a list of *simultaneous* edits.

Hmm. By referring to buffer-undo-list, I meant that intersecting edits
will be automatically merged, as it is usually done in undo system.

I am also not 100% sure why edits being simultaneous is any relevant.
Maybe we are talking about different things?

What I was talking about is (1) automatically merging subsequent edits
like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not
intersect with previous edited region, record it separately without
merging.

> This said, the downside in both cases is that the specific data that we
> need from such a list tends to depend on the user.  E.g. you suggest
>
>     (BEG END_BEFORE END_AFTER COUNTER)
>
> but that is not sufficient reconstruct the corresponding buffer state,
> so things like Eglot/CRDT can't use it.  Ideally for CRDT I think you'd
> want a sequence of
>
>     (BEG END-BEFORE STRING-AFTER)

Right. It's just that Org mode does not need STRING-AFTER, which is why
I did not think about it in my proposal. Of course, having STRING-AFTER
is required to get full reconstruction of the buffer state.

> but for Eglot this is not sufficient because Eglot needs to convert BEG
> and END_BEFORE into LSP positions (i.e. "line+col") and for that it
> needs to reproduce the past buffer state.  So instead, what Eglot needs
> (and does indeed build using `*-change-functions`) is a sequence of
>
>     (LSP-BEG LSP-END-BEFORE STRING-AFTER)
>
> [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure
>   exactly why, but I guess it's a piece of redundant info the servers
>   can use to sanity-check the data?  ]

It is to support deprecated LSP spec (I presume that older LSP servers
may still be using the old spec):

https://microsoft.github.io/language-server-protocol/specifications/specification-3-15/
export type TextDocumentContentChangeEvent = {
	 * The range of the document that changed.
	range: Range;

	 * The optional length of the range that got replaced.
	 * @deprecated use range instead.
	rangeLength?: number;

	 * The new text for the provided range.
	text: string;

> The code changes were actually quite simple, so I have now included it.
> See my current PoC code below.

Am I missing something, or will `track-changes--signal-if-disjoint'
alter `track-changes--state' when it calls `track-changes-fetch' ->
`track-changes-reset'? Won't that interfere with the information passed
to non-disjoint trackers?

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-02 16:21                   ` Ihor Radchenko
@ 2024-04-02 17:51                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-03 12:34                       ` Ihor Radchenko
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-02 17:51 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

> I agree, but only when subsequent changes intersect each other.
> When distant changes are combined together, they cover a lot of buffer
> text that has not been changed - they may trigger a lot more work
> compared to more granular changes. Consider, for example, two changes
> near the beginning and the end of the buffer:
> (1..10) and (1000000...(point-max))
> If such changes are tracked by tree-sitter or other parser, there is a
> world of difference between requesting to re-parse individual segments
> and the whole buffer.

Oh, definitely.  For some clients, the amount of work doesn't really
depend on the size of the change, tho (OTOH the work done by
`track-changes.el` *is* affected by the size of the change).
Handling disjoint changes can be very important.

I'm fairly satisfied with my `track-changes-register-disjoint` solution
to this problem.

>>> The changes are stored in strings, which get allocated and
>>> re-allocated repeatedly.
>> Indeed.  Tho for N small changes, my code should perform only O(log N)
>> re-allocations.
> May you explain a bit more about this?
> My reading of `track-changes--before' is that `concat' is called on
> every change except when a new change is already inside the previously
> recorded one.

The "previously recorded one" (i.e. the info stored in
`track-changes--before-beg/end/string`) is a conservative
over-approximation of the changed area.  We (at least) double its size
every time we have to grow it, specifically to reduce the number of
times we need to grow it (otherwise we quickly fall into the O(N²)
trap).

> (aside: I have a hard time reading the code because of
> confusing slot names: bbeg vs beg??)

"bbeg/bend" stands for "before beg" and "before end" (i.e. the positions as
they were before the change).  BTW, those two slots don't exist any more
in the latest code I sent (but vars with those names are still present
here and there).

>>>> We could expose a list of simultaneous (and thus disjoint) changes,
>>>> which avoids the last problem.  But it's a fair bit more work for us, it
>>>> makes the API more complex for the clients, and it's rarely what the
>>>> clients really want anyway.
>>> FYI, Org parser maintains such a list.
>> Could you point me to the relevant code (I failed to find it)?
> That code handles a lot more than just changes, so it might not be the
> best reference. Anyway...
> See the docstring of `org-element--cache-sync-requests', and
> the branch in `org-element--cache-submit-request' starting from
> 	  ;; Current changes can be merged with first sync request: we
> 	  ;; can save a partial cache synchronization.

Thanks.  [ I see I did search the right file, but the code was too
intertwined with other things for me to find that part.  ]

[ Further discussions of Org's code moved off-list.  ]

> Hmm. By referring to buffer-undo-list, I meant that intersecting edits
> will be automatically merged, as it is usually done in undo system.

[ Actually `buffer-undo-list` doesn't do very much merging, if any.
  AFAIK the only merging that happens there is to "amalgamate" consecutive
  `self-insert-command`s or consecutive single-char `delete-char`s. 🙂  ]

> I am also not 100% sure why edits being simultaneous is any relevant.

If they're not simultaneous it means that they describe N different
buffer states and that to interpret a specific change in the list
(e.g. to convert buffer positions to line+col positions) you
may have to reconstruct its before&after states by applying the
other changes.

For some use cases this is quite inconvenient.
In any case, it's not very important, I think.

> What I was talking about is (1) automatically merging subsequent edits
> like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not
> intersect with previous edited region, record it separately without
> merging.

That's what you can get now with `track-changes-register-disjoint`.

>> The code changes were actually quite simple, so I have now included it.
>> See my current PoC code below.
> Am I missing something, or will `track-changes--signal-if-disjoint'
> alter `track-changes--state' when it calls `track-changes-fetch' ->
> `track-changes-reset'?

[ I assume you meant `track-changes--clean-state` instead of
  `track-changes-reset`.  ] 

Yes it will (and there's a bug in `track-changes--before` because of
that that I still need to fix 🙁), but that should not be a problem for
the clients.

> Won't that interfere with the information passed to
> non-disjoint trackers?

No: the separate states will be (re)combined when those trackers call
`track-changes-fetch`.

There is a nearby poor interference between clients, tho: if one
client calls `track-changes-fetch` eagerly all the time, it may prevent
another client from getting a "disjoint change" signal because
disjointness is noticed only between changes that occurred since the
last call to `track-changes--clean-state` (which is called mostly by
`track-changes-fetch`).
I guess this deserves a FIXME.

        Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-02 17:51                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-03 12:34                       ` Ihor Radchenko
  2024-04-03 12:45                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-03 12:34 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> My reading of `track-changes--before' is that `concat' is called on
>> every change except when a new change is already inside the previously
>> recorded one.
>
> The "previously recorded one" (i.e. the info stored in
> `track-changes--before-beg/end/string`) is a conservative
> over-approximation of the changed area.  We (at least) double its size
> every time we have to grow it, specifically to reduce the number of
> times we need to grow it (otherwise we quickly fall into the O(N²)
> trap).

I am not 100% where the O(N^2) is coming from. I'd rather see the
"doubling" explained better in the comment before the code.

In any case, while I do see where the idea of over-expanding the region
comes from, it is not ideal for my use-case in Org mode - it is often
critical to know precise region where the change happened.

Larger region spanning over more lines than needed can easily trigger
re-parsing too much, especially when changes are being made near Org
heading lines. In worst-case scenario, Org mode has to drop parser cache
for the whole edited subtree repeatedly, slowing things down by orders
of magnitude.

And this is not a theoretical consideration - I had to tweak things
considerably in the past for certain Org mass-editing commands in order
to not slow them down because of repeated cache drops.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-03 12:34                       ` Ihor Radchenko
@ 2024-04-03 12:45                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-04 17:58                           ` Ihor Radchenko
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-03 12:45 UTC (permalink / raw)
  To: Ihor Radchenko
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

> I am not 100% where the O(N^2) is coming from.

If you add N times 1 char to an (initially empty) string, the total cost
of constructing the resulting N-char string is O(N²).

> In any case, while I do see where the idea of over-expanding the region
> comes from, it is not ideal for my use-case in Org mode - it is often
> critical to know precise region where the change happened.

The reported changes are precise (modulo the fact that they are
combined): the over-expanded `track-changes--before-string`
is trimmed to the actual changes when it gets moved to a "state"
(that's done in `track-changes--clean-state`).

Also, note that in 99% of the cases, a command performs only a single
change (insertion or deletion), in which case there's no combining nor
over-expansion before the signal gets called.  Of course, approximation
may still happen if you decide not to act immediately when the signal
is sent.

        Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-03 12:45                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-04 17:58                           ` Ihor Radchenko
  0 siblings, 0 replies; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-04 17:58 UTC (permalink / raw)
  To: Stefan Monnier
  Cc: casouri, 70077, qhong, frederic.bour, joaotavora, mail, acm,
	Eli Zaretskii, stephen_leake, alan.zimm, phillip.lord

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> I am not 100% where the O(N^2) is coming from.
>
> If you add N times 1 char to an (initially empty) string, the total cost
> of constructing the resulting N-char string is O(N²).

I guess that another approach is not concatenating the strings and
instead accumulating them into a list (or two lists - before/after).
That will get rid of logN multiplier :)

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-03-29 16:15 bug#70077: An easier way to track buffer changes Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
                   ` (2 preceding siblings ...)
  2024-03-30  9:51 ` Ihor Radchenko
@ 2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-06  8:43   ` Eli Zaretskii
                     ` (2 more replies)
  3 siblings, 3 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-05 22:12 UTC (permalink / raw)
  To: 70077; +Cc: Alan Mackenzie, Ihor Radchenko

[-- Attachment #1: Type: text/plain, Size: 1338 bytes --]

My PoC has matured into something quite usable.  The maturing part
included making it robust against mismatched or missing
`*-change-functions` calls, which it tries to detect (the problem
is then passed on to the client in the form of a change with an
unknown "before" content).
I also wrote a test which makes random changes and verifies that
the clients receive correct descriptions.

I currently use it only for Eglot and diff-mode, but I'm quite happy
with it.  For Eglot, it nicely packs up consecutive changes (like
consecutive `self-insert-command`s) into a single change yet keeps
changes to different parts of the buffer nicely separate.

The API is still about the same as before except that
`track-changes-register` now takes 3 options:

- `immediate` to control when the presence of changes is signaled
  (default to use `funcall-later` but `immediate` makes it use `funcall`
  so there is no delay).
- `disjoint` to prevent changes to different parts of the buffer from
  being combined into too large a change change.
- `nobefore` which indicates that the client doesn't actually need the
  `before` contents, so will only get the length thereof (like
  `after-change-functions` does).

I'm proposing we include it into `master`.
I have pushed the patch to `scratch/track-changes` (and attached it below).


        Stefan

[-- Attachment #2: 0001-lisp-emacs-lisp-track-changes.el-New-file.patch --]
[-- Type: text/x-diff, Size: 54837 bytes --]

From 4fd5a97052472eb1c332ea9b3f9ff90e94ad0cd1 Mon Sep 17 00:00:00 2001
From: Stefan Monnier <monnier@iro.umontreal.ca>
Date: Fri, 5 Apr 2024 17:37:32 -0400
Subject: [PATCH] lisp/emacs-lisp/track-changes.el: New file

This new package provides an API that is easier to use right than
our `*-change-functions` hooks.

The patch includes changes to `diff-mode.el` and `eglot.el` to
make use of this new package.

* lisp/emacs-lisp/track-changes.el: New file.
* test/lisp/emacs-lisp/track-changes-tests.el: New file.

* lisp/progmodes/eglot.el: Require `track-changes`.
(eglot--virtual-pos-to-lsp-position): New function.
(eglot--track-changes): New var.
(eglot--managed-mode): Use `track-changes-register` i.s.o
`after/before-change-functions`.
(eglot--before-change): Delete function.
(eglot--track-changes-signal): Rename from `eglot--after-change`
and adjust arguments accordingly.
(eglot--track-changes-fetch): New function.
(eglot--signal-textDocument/didChange): Call it and simplify now that
corner-cases are handled by `track-changes`.

* lisp/vc/diff-mode.el: Require `track-changes`.
Also require `easy-mmode` before the `eval-when-compile`s.
(diff-unhandled-changes): Delete variable.
(diff-after-change-function): Delete function.
(diff--track-changes-function): Rename from `diff-post-command-hook`
and adjust to new calling convention.
(diff--track-changes): New variable.
(diff--track-changes-signal): New function.
(diff-mode, diff-minor-mode): Use it with `track-changes-register`.
---
 lisp/emacs-lisp/track-changes.el            | 605 ++++++++++++++++++++
 lisp/progmodes/eglot.el                     | 105 ++--
 lisp/vc/diff-mode.el                        | 107 ++--
 test/lisp/emacs-lisp/track-changes-tests.el | 149 +++++
 4 files changed, 852 insertions(+), 114 deletions(-)
 create mode 100644 lisp/emacs-lisp/track-changes.el
 create mode 100644 test/lisp/emacs-lisp/track-changes-tests.el

diff --git a/lisp/emacs-lisp/track-changes.el b/lisp/emacs-lisp/track-changes.el
new file mode 100644
index 00000000000..7644a7de98d
--- /dev/null
+++ b/lisp/emacs-lisp/track-changes.el
@@ -0,0 +1,605 @@
+;;; track-changes.el --- API to react to buffer modifications  -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2024  Free Software Foundation, Inc.
+
+;; Author: Stefan Monnier <monnier@iro.umontreal.ca>
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; This library is a layer of abstraction above `before-change-functions'
+;; and `after-change-functions' which takes care of accumulating changes
+;; until a time when its client finds it convenient to react to them.
+;;
+;; It provides an API that is easier to use correctly than our
+;; `*-change-functions` hooks.  Problems that it claims to solve:
+;;
+;; - Before and after calls are not necessarily paired.
+;; - The beg/end values don't always match.
+;; - There's usually only one call to the hooks per command but
+;;   there can be thousands of calls from within a single command,
+;;   so naive users will tend to write code that performs poorly
+;;   in those rare cases.
+;; - The hooks are run at a fairly low-level so there are things they
+;;   really shouldn't do, such as modify the buffer or wait.
+;; - The after call doesn't get enough info to rebuild the before-change state,
+;;   so some callers need to use both before-c-f and after-c-f (and then
+;;   deal with the first two points above).
+;;
+;; The new API is almost like `after-change-functions` except that:
+;; - It provides the "before string" (i.e. the previous content of
+;;   the changed area) rather than only its length.
+;; - It can combine several changes into larger ones.
+;; - Clients do not have to process changes right away, instead they
+;;   can let changes accumulate (by combining them into a larger change)
+;;   until it is convenient for them to process them.
+;; - By default, changes are signaled at most once per command.
+
+;; The API consists in the following functions:
+;;
+;;     (track-changes-register SIGNAL &key NOBEFORE DISJOINT IMMEDIATE)
+;;     (track-changes-fetch ID FUNC)
+;;     (track-changes-unregister ID)
+;;
+;; A typical use case might look like:
+;;
+;;     (defvar my-foo--change-tracker nil)
+;;     (define-minor-mode my-foo-mode
+;;       "Fooing like there's no tomorrow."
+;;       (if (null my-foo-mode)
+;;           (when my-foo--change-tracker
+;;             (track-changes-unregister my-foo--change-tracker)
+;;             (setq my-foo--change-tracker nil))
+;;         (unless my-foo--change-tracker
+;;           (setq my-foo--change-tracker
+;;                 (track-changes-register
+;;                  (lambda (id)
+;;                    (track-changes-fetch
+;;                     id (lambda (beg end before)
+;;                          ..DO THE THING..))))))))
+
+;;; Code:
+
+(require 'cl-lib)
+
+(unless (fboundp 'funcall-later)
+  (defun funcall-later (&rest args)
+    ;; FIXME: Not sure if `run-with-timer' preserves ordering between
+    ;; different calls with the same target time.
+   (apply #'run-with-timer 0 nil args)))
+
+;;;; Internal types and variables.
+
+(cl-defstruct (track-changes--tracker
+               (:noinline t)
+               (:constructor nil)
+               (:constructor track-changes--tracker ( signal state
+                                                      &optional
+                                                      nobefore immediate)))
+  signal state nobefore immediate)
+
+(cl-defstruct (track-changes--state
+               (:noinline t)
+               (:constructor nil)
+               (:constructor track-changes--state ()))
+  "Object holding a description of a buffer state.
+BEG..END is the area that was changed and BEFORE is its previous content.
+If the current buffer currently holds the content of the next state, you can get
+the contents of the previous state with:
+
+    (concat (buffer-substring (point-min) beg)
+                before
+                (buffer-substring end (point-max)))
+
+NEXT is the next state object (i.e. a more recent state).
+If NEXT is nil it means it's most recent state and it may be incomplete
+\(BEG/END/BEFORE may be nil), in which case those fields will take their
+values from `track-changes--before-(beg|end|before)' when the next
+state is create."
+  (beg (point-max))
+  (end (point-min))
+  (before nil)
+  (next nil))
+
+(defvar-local track-changes--trackers ()
+  "List of trackers currently registered in the current buffer.")
+(defvar-local track-changes--clean-trackers ()
+  "List of trackers that are clean.
+Those are the trackers that get signaled when a change is made.")
+
+(defvar-local track-changes--disjoint-trackers ()
+ "List of trackers that want to react to disjoint changes.
+These trackers' are signaled every time track-changes notices
+that some upcoming changes touch another \"distant\" part of the buffer.")
+
+(defvar-local track-changes--state nil)
+
+;; `track-changes--before-*' keep track of the content of the
+;; buffer when `track-changes--state' was cleaned.
+(defvar-local track-changes--before-beg (point-min)
+  "Beginning position of the remembered \"before string\".")
+(defvar-local track-changes--before-end (point-min)
+  "End position of the text replacing the \"before string\".")
+(defvar-local track-changes--before-string ""
+  "String holding some contents of the buffer before the current change.
+This string is supposed to cover all the already modified areas plus
+the upcoming modifications announced via `before-change-functions'.
+If all trackers are `nobefore', then this holds the `buffer-size' before
+the current change.")
+(defvar-local track-changes--before-no t
+  "If non-nil, all the trackers are `nobefore'.
+Should be equal to (memq #\\='track-changes--before before-change-functions).")
+
+(defvar-local track-changes--before-clean 'unset
+  "If non-nil, the `track-changes--before-*' vars are old.
+More specifically it means they cover a part of the buffer relevant
+for the previous state.
+It can take two non-nil values:
+- `unset': Means that the vars cover some older state.
+  This is what it is set right after creating a fresh new state.
+- `set': Means the vars reflect the current buffer state.
+  This is what it is set to after the first `before-change-functions'
+  but before an `after-change-functions'.")
+
+(defvar-local track-changes--buffer-size nil
+  "Current size of the buffer, as far as this library knows.
+This is used to try and detect cases where buffer modifications are \"lost\".")
+
+;;;; Exposed API.
+
+(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
+  "Register a new tracker and return a new tracker ID.
+SIGNAL is a function that will be called with one argument (the tracker ID)
+after the current buffer is modified, so that we can react to the change.
+Once called, SIGNAL is not called again until `track-changes-fetch'
+is called with the corresponding tracker ID.
+
+If optional argument NOBEFORE is non-nil, it means that this tracker does
+not need the BEFORE strings (it will receive their size instead).
+
+By default SIGNAL is called as soon as convenient after a change, which is
+usually right after the end of the current command.
+If optional argument IMMEDIATE is non-nil it means SIGNAL should be called
+as soon as a change is detected,
+BEWARE: In that case SIGNAL is called directly from `after-change-functions'
+and should thus be extra careful: don't modify the buffer, don't call a function
+that may block, do as little work as possible, ...
+When IMMEDIATE is non-nil, the SIGNAL should preferably not always call
+`track-changes-fetch', since that would defeat the purpose of this library.
+
+If optional argument DISJOINT is non-nil, SIGNAL is called every time we are
+about to combine changes from \"distant\" parts of the buffer.
+This is needed when combining disjoint changes into one bigger change
+is unacceptable, typically for performance reasons.
+These calls are distinguished from normal calls by calling SIGNAL with
+a second argument which is the distance between the upcoming change and
+the previous changes.
+BEWARE: In that case SIGNAL is called directly from `before-change-functions'
+and should thus be extra careful: don't modify the buffer, don't call a function
+that may block, ...
+In order to prevent the upcoming change from being combined with the previous
+changes, SIGNAL needs to call `track-changes-fetch' before it returns."
+  (when (and nobefore disjoint)
+    ;; FIXME: Without `before-change-functions', we can only discover
+    ;; a disjoint change after the fact, which is not good enough.
+    ;; But we could use  stripped down before-change-function,
+    (error "`disjoint' not supported for `nobefore' trackers"))
+  (track-changes--clean-state)
+  (unless nobefore
+    (setq track-changes--before-no nil)
+    (add-hook 'before-change-functions #'track-changes--before nil t))
+  (add-hook 'after-change-functions  #'track-changes--after  nil t)
+  (let ((tracker (track-changes--tracker signal track-changes--state
+                                         nobefore immediate)))
+    (push tracker track-changes--trackers)
+    (push tracker track-changes--clean-trackers)
+    (when disjoint
+      (push tracker track-changes--disjoint-trackers))
+    tracker))
+
+(defun track-changes-unregister (id)
+  "Remove the tracker denoted by ID.
+Trackers can consume resources (especially if `track-changes-fetch' is
+not called), so it is good practice to unregister them when you don't
+need them any more."
+  (unless (memq id track-changes--trackers)
+    (error "Unregistering a non-registered tracker: %S" id))
+  (setq track-changes--trackers (delq id track-changes--trackers))
+  (setq track-changes--clean-trackers (delq id track-changes--clean-trackers))
+  (setq track-changes--disjoint-trackers
+        (delq id track-changes--disjoint-trackers))
+  (when (cl-every #'track-changes--tracker-nobefore track-changes--trackers)
+    (setq track-changes--before-no t)
+    (remove-hook 'before-change-functions #'track-changes--before t))
+  (when (null track-changes--trackers)
+    (mapc #'kill-local-variable
+          '(track-changes--before-beg
+            track-changes--before-end
+            track-changes--before-string
+            track-changes--buffer-size
+            track-changes--before-clean
+            track-changes--state))
+    (remove-hook 'after-change-functions  #'track-changes--after  t)))
+
+(defun track-changes-fetch (id func)
+  "Fetch the pending changes.
+ID is the tracker ID returned by a previous `track-changes-register'.
+FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
+where BEGIN..END delimit the region that was changed since the last
+time `track-changes-fetch' was called and BEFORE is a string containing
+the previous content of that region (or just its length as an integer
+If the tracker ID was registered with the `nobefore' option).
+If some error caused us to miss some changes, then BEFORE will be the
+symbol `error' to indicate that the buffer got out of sync.
+This reflects a bug somewhere, so please report it when it happens.
+
+If no changes occurred since the last time, FUNC is not called and
+we return nil, otherwise we return the value returned by FUNC,
+and re-enable the TRACKER corresponding to ID."
+  (cl-assert (memq id track-changes--trackers))
+  (unless (equal track-changes--buffer-size (buffer-size))
+    (track-changes--recover-from-error))
+  (let ((beg nil)
+        (end nil)
+        (before t)
+        (lenbefore 0)
+        (states ()))
+    ;; Transfer the data from `track-changes--before-string'
+    ;; to the tracker's state object, if needed.
+    (track-changes--clean-state)
+    ;; We want to combine the states from most recent to oldest,
+    ;; so reverse them.
+    (let ((state (track-changes--tracker-state id)))
+      (while state
+        (push state states)
+        (setq state (track-changes--state-next state))))
+
+    (cond
+     ((eq (car states) track-changes--state)
+      (cl-assert (null (track-changes--state-before (car states))))
+      (setq states (cdr states)))
+     (t
+      ;; The states are disconnected from the latest state because
+      ;; we got out of sync!
+      (cl-assert (eq (track-changes--state-before (car states)) 'error))
+      (setq beg (point-min))
+      (setq end (point-max))
+      (setq before 'error)
+      (setq states nil)))
+
+    (dolist (state states)
+      (let ((prevbeg (track-changes--state-beg state))
+            (prevend (track-changes--state-end state))
+            (prevbefore (track-changes--state-before state)))
+        (if (eq before t)
+            (progn
+              ;; This is the most recent change.  Just initialize the vars.
+              (setq beg prevbeg)
+              (setq end prevend)
+              (setq lenbefore
+                    (if (stringp prevbefore) (length prevbefore) prevbefore))
+              (setq before
+                    (unless (track-changes--tracker-nobefore id) prevbefore)))
+          (let ((endb (+ beg lenbefore)))
+            (when (< prevbeg beg)
+              (if (not before)
+                  (setq lenbefore (+ (- beg prevbeg) lenbefore))
+                (setq before
+                      (concat (buffer-substring-no-properties
+                               prevbeg beg)
+                              before))
+                (setq lenbefore (length before)))
+              (setq beg prevbeg)
+              (cl-assert (= endb (+ beg lenbefore))))
+            (when (< endb prevend)
+              (let ((new-end (+ end (- prevend endb))))
+                (if (not before)
+                    (setq lenbefore (+ lenbefore (- new-end end)))
+                  (setq before
+                        (concat before
+                                (buffer-substring-no-properties
+                                 end new-end)))
+                  (setq lenbefore (length before)))
+                (setq end new-end)
+                (cl-assert (= prevend (+ beg lenbefore)))
+                (setq endb (+ beg lenbefore))))
+            (cl-assert (<= beg prevbeg prevend endb))
+            ;; The `prevbefore' is covered by the new one.
+            (if (not before)
+                (setq lenbefore
+                      (+ (- prevbeg beg)
+                         (if (stringp prevbefore)
+                             (length prevbefore) prevbefore)
+                         (- endb prevend)))
+              (setq before
+                    (concat (substring before 0 (- prevbeg beg))
+                            prevbefore
+                            (substring before (- (length before)
+                                                 (- endb prevend)))))
+              (setq lenbefore (length before)))))))
+    (if (null beg)
+        (progn
+          (cl-assert (null states))
+          (cl-assert (memq id track-changes--clean-trackers))
+          (cl-assert (eq (track-changes--tracker-state id)
+                         track-changes--state))
+          ;; Nothing to do.
+          nil)
+      (cl-assert (<= (point-min) beg end (point-max)))
+      ;; Update the tracker's state *before* running `func' so we don't risk
+      ;; mistakenly replaying the changes in case `func' exits non-locally.
+      (setf (track-changes--tracker-state id) track-changes--state)
+      (unwind-protect (funcall func beg end (or before lenbefore))
+        ;; Re-enable the tracker's signal only after running `func', so
+        ;; as to avoid recursive invocations.
+        (cl-pushnew id track-changes--clean-trackers)))))
+
+;;;; Auxiliary functions.
+
+(defun track-changes--clean-state ()
+  (cond
+   ((null track-changes--state)
+    (cl-assert track-changes--before-clean)
+    (cl-assert (null track-changes--buffer-size))
+    ;; No state has been created yet.  Do it now.
+    (setq track-changes--buffer-size (buffer-size))
+    (when track-changes--before-no
+      (setq track-changes--before-string (buffer-size)))
+    (setq track-changes--state (track-changes--state)))
+   (track-changes--before-clean nil)
+   (t
+    (cl-assert (<= (track-changes--state-beg track-changes--state)
+                   (track-changes--state-end track-changes--state)))
+    (let ((actual-beg (track-changes--state-beg track-changes--state))
+          (actual-end (track-changes--state-end track-changes--state)))
+      (if track-changes--before-no
+          (progn
+            (cl-assert (integerp track-changes--before-string))
+            (setf (track-changes--state-before track-changes--state)
+                  (- track-changes--before-string
+                     (- (buffer-size) (- actual-end actual-beg))))
+            (setq track-changes--before-string (buffer-size)))
+        (cl-assert (<= track-changes--before-beg
+                       actual-beg actual-end
+                       track-changes--before-end))
+        (cl-assert (null (track-changes--state-before track-changes--state)))
+        ;; The `track-changes--before-*' vars can cover more text than the
+        ;; actually modified area, so trim it down now to the relevant part.
+        (unless (= (- track-changes--before-end track-changes--before-beg)
+                   (- actual-end actual-beg))
+          (setq track-changes--before-string
+                (substring track-changes--before-string
+                           (- actual-beg track-changes--before-beg)
+                           (- (length track-changes--before-string)
+                              (- track-changes--before-end actual-end))))
+          (setq track-changes--before-beg actual-beg)
+          (setq track-changes--before-end actual-end))
+        (setf (track-changes--state-before track-changes--state)
+              track-changes--before-string)))
+    ;; Note: We preserve `track-changes--before-*' because they may still
+    ;; be needed, in case `after-change-functions' are run before the next
+    ;; `before-change-functions'.
+    ;; Instead, we set `track-changes--before-clean' to `unset' to mean that
+    ;; `track-changes--before-*' can be reset at the next
+    ;; `before-change-functions'.
+    (setq track-changes--before-clean 'unset)
+    (let ((new (track-changes--state)))
+      (setf (track-changes--state-next track-changes--state) new)
+      (setq track-changes--state new)))))
+
+(defvar track-changes--disjoint-threshold 100
+  "Distance below which changes are not considered disjoint.")
+
+(defvar track-changes--error-log ()
+  "List of errors encountered.
+Each element is a triplet (BUFFER-NAME BACKTRACE RECENT-KEYS).")
+
+(defun track-changes--recover-from-error ()
+  ;; We somehow got out of sync.  This is usually the result of a bug
+  ;; elsewhere that causes the before-c-f and after-c-f to be improperly
+  ;; paired, or to be skipped altogether.
+  ;; Not much we can do, other than force a full re-synchronization.
+  (warn "Missing/incorrect calls to `before/after-change-functions'!!
+Details logged to `track-changes--error-log'")
+  (push (list (buffer-name)
+              (backtrace-frames 'track-changes--recover-from-error)
+              (recent-keys 'include-cmds))
+        track-changes--error-log)
+  (setq track-changes--before-clean 'unset)
+  (setq track-changes--buffer-size (buffer-size))
+  ;; Create a new state disconnected from the previous ones!
+  ;; Mark the previous one as junk, just to be clear.
+  (setf (track-changes--state-before track-changes--state) 'error)
+  (setq track-changes--state (track-changes--state)))
+
+(defun track-changes--before (beg end)
+  (cl-assert track-changes--state)
+  (cl-assert (<= beg end))
+  (let* ((size (- end beg))
+         (reset (lambda ()
+                  (cl-assert track-changes--before-clean)
+                  (setq track-changes--before-clean 'set)
+                  (setf track-changes--before-string
+                        (buffer-substring-no-properties beg end))
+                  (setf track-changes--before-beg beg)
+                  (setf track-changes--before-end end)))
+
+         (signal-if-disjoint
+          (lambda (pos1 pos2)
+            (let ((distance (- pos2 pos1)))
+              (when (> distance
+                       (max track-changes--disjoint-threshold
+                            ;; If the distance is smaller than the size of the
+                            ;; current change, then we may as well consider it
+                            ;; as "near".
+                            (length track-changes--before-string)
+                            size
+                            (- track-changes--before-end
+                               track-changes--before-beg)))
+                (dolist (tracker track-changes--disjoint-trackers)
+                  (funcall (track-changes--tracker-signal tracker)
+                           tracker distance))
+                ;; Return non-nil if the state was cleaned along the way.
+                track-changes--before-clean)))))
+
+    (if track-changes--before-clean
+        (progn
+          ;; Detect disjointness with previous changes here as well,
+          ;; so that if a client calls `track-changes-fetch' all the time,
+          ;; it doesn't prevent others from getting a disjointness signal.
+          (when (and track-changes--before-beg
+                     (let ((found nil))
+                       (dolist (tracker track-changes--disjoint-trackers)
+                         (unless (memq tracker track-changes--clean-trackers)
+                           (setq found t)))
+                       found))
+            ;; There's at least one `tracker' that wants to know about disjoint
+            ;; changes *and* it has unseen pending changes.
+            ;; FIXME: This can occasionally signal a tracker that's clean.
+            (if (< beg track-changes--before-beg)
+                (funcall signal-if-disjoint end track-changes--before-beg)
+              (funcall signal-if-disjoint track-changes--before-end beg)))
+          (funcall reset))
+      (cl-assert (save-restriction
+                   (widen)
+                   (<= (point-min)
+                       track-changes--before-beg
+                       track-changes--before-end
+                       (point-max))))
+      (when (< beg track-changes--before-beg)
+        (if (and track-changes--disjoint-trackers
+                 (funcall signal-if-disjoint end track-changes--before-beg))
+            (funcall reset)
+          (let* ((old-bbeg track-changes--before-beg)
+                 ;; To avoid O(N²) behavior when faced with many small changes,
+                 ;; we copy more than needed.
+                 (new-bbeg (min (max (point-min)
+                                     (- old-bbeg
+                                        (length track-changes--before-string)))
+                                beg)))
+            (setf track-changes--before-beg new-bbeg)
+            (cl-callf (lambda (old new) (concat new old))
+                track-changes--before-string
+              (buffer-substring-no-properties new-bbeg old-bbeg)))))
+
+      (when (< track-changes--before-end end)
+        (if (and track-changes--disjoint-trackers
+                 (funcall signal-if-disjoint track-changes--before-end beg))
+            (funcall reset)
+          (let* ((old-bend track-changes--before-end)
+                 ;; To avoid O(N²) behavior when faced with many small changes,
+                 ;; we copy more than needed.
+                 (new-bend (max (min (point-max)
+                                     (+ old-bend
+                                        (length track-changes--before-string)))
+                                end)))
+            (setf track-changes--before-end new-bend)
+            (cl-callf concat track-changes--before-string
+              (buffer-substring-no-properties old-bend new-bend))))))))
+
+(defun track-changes--after (beg end len)
+  (cl-assert track-changes--state)
+  (and (eq track-changes--before-clean 'unset)
+       (not track-changes--before-no)
+       ;; This can be a sign that a `before-change-functions' went missing,
+       ;; or that we called `track-changes--clean-state' between
+       ;; a `before-change-functions' and `after-change-functions'.
+       (track-changes--before beg end))
+  (setq track-changes--before-clean nil)
+  (let ((offset (- (- end beg) len)))
+    (cl-incf track-changes--before-end offset)
+    (cl-incf track-changes--buffer-size offset)
+    (if (not (or track-changes--before-no
+                 (save-restriction
+                   (widen)
+                   (<= (point-min)
+                       track-changes--before-beg
+                       beg end
+                       track-changes--before-end
+                       (point-max)))))
+        ;; BEG..END is not covered by previous `before-change-functions'!!
+        (track-changes--recover-from-error)
+      ;; Note the new changes.
+      (when (< beg (track-changes--state-beg track-changes--state))
+        (setf (track-changes--state-beg track-changes--state) beg))
+      (cl-callf (lambda (old-end) (max end (+ old-end offset)))
+          (track-changes--state-end track-changes--state))
+      (cl-assert (or track-changes--before-no
+                     (<= track-changes--before-beg
+                         (track-changes--state-beg track-changes--state)
+                         beg end
+                         (track-changes--state-end track-changes--state)
+                         track-changes--before-end)))))
+  (while track-changes--clean-trackers
+    (let ((tracker (pop track-changes--clean-trackers)))
+      (if (track-changes--tracker-immediate tracker)
+          (funcall (track-changes--tracker-signal tracker) tracker)
+        (funcall-later #'track-changes--call-signal
+                       (current-buffer) tracker)))))
+
+(defun track-changes--call-signal (buf tracker)
+  (when (buffer-live-p buf)
+    (with-current-buffer buf
+      ;; Silence ourselves if `track-changes-fetch' was called in the mean time.
+      (unless (memq tracker track-changes--clean-trackers)
+        (funcall (track-changes--tracker-signal tracker) tracker)))))
+
+;;;; Extra candidates for the API.
+
+;; This could be a good alternative to using a temp-buffer like I used in
+;; Eglot, since presumably we've just been changing this very area of the
+;; buffer, so the gap should be ready nearby,
+;; It may seem silly to go back to the previous state, since we could have
+;; used `before-change-functions' to run FUNC right then when we were in
+;; that state.  The advantage is that with track-changes we get to decide
+;; retroactively which state is the one for which we want to call FUNC and
+;; which BEG..END to use: when that state was current we may have known
+;; then that it would be "the one" but we didn't know what BEG and END
+;; should be because those depend on the changes that came afterwards.
+(defun track-changes--in-revert (beg end before func)
+  "Call FUNC with the buffer contents temporarily reverted to BEFORE.
+FUNC is called with no arguments and with point right after BEFORE.
+FUNC is not allowed to modify the buffer and it should refrain from using
+operations that use a cache populated from the buffer's content,
+such as `syntax-ppss'."
+  (catch 'track-changes--exit
+    (with-silent-modifications ;; This has to be outside `atomic-change-group'.
+      (atomic-change-group
+        (goto-char end)
+        (insert-before-markers before)
+        (delete-region beg end)
+        (throw 'track-changes--exit
+               (let ((inhibit-read-only nil)
+                     (buffer-read-only t))
+                 (funcall func)))))))
+
+(defun track-changes--reset (id)
+  "Mark all past changes as handled for tracker ID.
+Does not re-enable ID's signal."
+  (track-changes--clean-state)
+  (setf (track-changes--tracker-state id) track-changes--state))
+
+(defun track-changes--pending-p (id)
+  "Return non-nil if there are pending changes for tracker ID."
+  (not (memq id track-changes--clean-trackers)))
+
+(defmacro with--track-changes (id vars &rest body)
+  (declare (indent 2) (debug (form sexp body)))
+  `(track-changes-fetch ,id (lambda ,vars ,@body)))
+
+(provide 'track-changes)
+;;; track-changes.el end here.
diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index 7f4284bf09d..00c09d7f06b 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -110,6 +110,7 @@
 (require 'text-property-search nil t)
 (require 'diff-mode)
 (require 'diff)
+(require 'track-changes)
 
 ;; These dependencies are also GNU ELPA core packages.  Because of
 ;; bug#62576, since there is a risk that M-x package-install, despite
@@ -1732,6 +1733,9 @@ eglot-utf-16-linepos
   "Calculate number of UTF-16 code units from position given by LBP.
 LBP defaults to `eglot--bol'."
   (/ (- (length (encode-coding-region (or lbp (eglot--bol))
+                                      ;; FIXME: How could `point' ever be
+                                      ;; larger than `point-max' (sounds like
+                                      ;; a bug in Emacs).
                                       ;; Fix github#860
                                       (min (point) (point-max)) 'utf-16 t))
         2)
@@ -1749,6 +1753,24 @@ eglot--pos-to-lsp-position
          :character (progn (when pos (goto-char pos))
                            (funcall eglot-current-linepos-function)))))
 
+(defun eglot--virtual-pos-to-lsp-position (pos string)
+  "Return the LSP position at the end of STRING if it were inserted at POS."
+  (eglot--widening
+   (goto-char pos)
+   (forward-line 0)
+   ;; LSP line is zero-origin; Emacs is one-origin.
+   (let ((posline (1- (line-number-at-pos nil t)))
+         (linebeg (buffer-substring (point) pos))
+         (colfun eglot-current-linepos-function))
+     ;; Use a temp buffer because:
+     ;; - I don't know of a fast way to count newlines in a string.
+     ;; - We currently don't have `eglot-current-linepos-function' for strings.
+     (with-temp-buffer
+       (insert linebeg string)
+       (goto-char (point-max))
+       (list :line (+ posline (1- (line-number-at-pos nil t)))
+             :character (funcall colfun))))))
+
 (defvar eglot-move-to-linepos-function #'eglot-move-to-utf-16-linepos
   "Function to move to a position within a line reported by the LSP server.
 
@@ -1946,6 +1968,8 @@ eglot-managed-mode-hook
   "A hook run by Eglot after it started/stopped managing a buffer.
 Use `eglot-managed-p' to determine if current buffer is managed.")
 
+(defvar-local eglot--track-changes nil)
+
 (define-minor-mode eglot--managed-mode
   "Mode for source buffers managed by some Eglot project."
   :init-value nil :lighter nil :keymap eglot-mode-map
@@ -1959,8 +1983,9 @@ eglot--managed-mode
       ("utf-8"
        (eglot--setq-saving eglot-current-linepos-function #'eglot-utf-8-linepos)
        (eglot--setq-saving eglot-move-to-linepos-function #'eglot-move-to-utf-8-linepos)))
-    (add-hook 'after-change-functions #'eglot--after-change nil t)
-    (add-hook 'before-change-functions #'eglot--before-change nil t)
+    (unless eglot--track-changes
+      (setq eglot--track-changes
+            (track-changes-register #'eglot--track-changes-signal :disjoint t)))
     (add-hook 'kill-buffer-hook #'eglot--managed-mode-off nil t)
     ;; Prepend "didClose" to the hook after the "nonoff", so it will run first
     (add-hook 'kill-buffer-hook #'eglot--signal-textDocument/didClose nil t)
@@ -1998,8 +2023,9 @@ eglot--managed-mode
                buffer
                (eglot--managed-buffers (eglot-current-server)))))
    (t
-    (remove-hook 'after-change-functions #'eglot--after-change t)
-    (remove-hook 'before-change-functions #'eglot--before-change t)
+    (when eglot--track-changes
+      (track-changes-unregister eglot--track-changes)
+      (setq eglot--track-changes nil))
     (remove-hook 'kill-buffer-hook #'eglot--managed-mode-off t)
     (remove-hook 'kill-buffer-hook #'eglot--signal-textDocument/didClose t)
     (remove-hook 'before-revert-hook #'eglot--signal-textDocument/didClose t)
@@ -2568,54 +2594,29 @@ jsonrpc-connection-ready-p
 
 (defvar-local eglot--change-idle-timer nil "Idle timer for didChange signals.")
 
-(defun eglot--before-change (beg end)
-  "Hook onto `before-change-functions' with BEG and END."
-  (when (listp eglot--recent-changes)
-    ;; Records BEG and END, crucially convert them into LSP
-    ;; (line/char) positions before that information is lost (because
-    ;; the after-change thingy doesn't know if newlines were
-    ;; deleted/added).  Also record markers of BEG and END
-    ;; (github#259)
-    (push `(,(eglot--pos-to-lsp-position beg)
-            ,(eglot--pos-to-lsp-position end)
-            (,beg . ,(copy-marker beg nil))
-            (,end . ,(copy-marker end t)))
-          eglot--recent-changes)))
-
 (defvar eglot--document-changed-hook '(eglot--signal-textDocument/didChange)
   "Internal hook for doing things when the document changes.")
 
-(defun eglot--after-change (beg end pre-change-length)
-  "Hook onto `after-change-functions'.
-Records BEG, END and PRE-CHANGE-LENGTH locally."
+(defun eglot--track-changes-fetch (id)
+  (if (eq eglot--recent-changes 'pending) (setq eglot--recent-changes nil))
+  (track-changes-fetch
+   id (lambda (beg end before)
+        (if (stringp before)
+            (push `(,(eglot--pos-to-lsp-position beg)
+                    ,(eglot--virtual-pos-to-lsp-position beg before)
+                    ,(length before)
+                    ,(buffer-substring-no-properties beg end))
+                  eglot--recent-changes)
+          (setf eglot--recent-changes :emacs-messup)))))
+
+(defun eglot--track-changes-signal (id &optional distance)
   (cl-incf eglot--versioned-identifier)
-  (pcase (car-safe eglot--recent-changes)
-    (`(,lsp-beg ,lsp-end
-                (,b-beg . ,b-beg-marker)
-                (,b-end . ,b-end-marker))
-     ;; github#259 and github#367: with `capitalize-word' & friends,
-     ;; `before-change-functions' records the whole word's `b-beg' and
-     ;; `b-end'.  Similarly, when `fill-paragraph' coalesces two
-     ;; lines, `b-beg' and `b-end' mark end of first line and end of
-     ;; second line, resp.  In both situations, `beg' and `end'
-     ;; received here seemingly contradict that: they will differ by 1
-     ;; and encompass the capitalized character or, in the coalescing
-     ;; case, the replacement of the newline with a space.  We keep
-     ;; both markers and positions to detect and correct this.  In
-     ;; this specific case, we ignore `beg', `len' and
-     ;; `pre-change-len' and send richer information about the region
-     ;; from the markers.  I've also experimented with doing this
-     ;; unconditionally but it seems to break when newlines are added.
-     (if (and (= b-end b-end-marker) (= b-beg b-beg-marker)
-              (or (/= beg b-beg) (/= end b-end)))
-         (setcar eglot--recent-changes
-                 `(,lsp-beg ,lsp-end ,(- b-end-marker b-beg-marker)
-                            ,(buffer-substring-no-properties b-beg-marker
-                                                             b-end-marker)))
-       (setcar eglot--recent-changes
-               `(,lsp-beg ,lsp-end ,pre-change-length
-                          ,(buffer-substring-no-properties beg end)))))
-    (_ (setf eglot--recent-changes :emacs-messup)))
+  (cond
+   (distance (eglot--track-changes-fetch id))
+   (eglot--recent-changes nil)
+   ;; Note that there are pending changes, for the benefit of those
+   ;; who check it as a boolean.
+   (t (setq eglot--recent-changes 'pending)))
   (when eglot--change-idle-timer (cancel-timer eglot--change-idle-timer))
   (let ((buf (current-buffer)))
     (setq eglot--change-idle-timer
@@ -2729,6 +2730,7 @@ eglot-handle-request
 (defun eglot--signal-textDocument/didChange ()
   "Send textDocument/didChange to server."
   (when eglot--recent-changes
+    (eglot--track-changes-fetch eglot--track-changes)
     (let* ((server (eglot--current-server-or-lose))
            (sync-capability (eglot-server-capable :textDocumentSync))
            (sync-kind (if (numberp sync-capability) sync-capability
@@ -2745,13 +2747,8 @@ eglot--signal-textDocument/didChange
                               (buffer-substring-no-properties (point-min)
                                                               (point-max)))))
           (cl-loop for (beg end len text) in (reverse eglot--recent-changes)
-                   ;; github#259: `capitalize-word' and commands based
-                   ;; on `casify_region' will cause multiple duplicate
-                   ;; empty entries in `eglot--before-change' calls
-                   ;; without an `eglot--after-change' reciprocal.
-                   ;; Weed them out here.
-                   when (numberp len)
                    vconcat `[,(list :range `(:start ,beg :end ,end)
+                                    ;; `rangeLength' is obsolete.
                                     :rangeLength len :text text)]))))
       (setq eglot--recent-changes nil)
       (jsonrpc--call-deferred server))))
diff --git a/lisp/vc/diff-mode.el b/lisp/vc/diff-mode.el
index 66043059d14..e7ac517b72f 100644
--- a/lisp/vc/diff-mode.el
+++ b/lisp/vc/diff-mode.el
@@ -53,9 +53,10 @@
 ;; - Handle `diff -b' output in context->unified.
 
 ;;; Code:
+(require 'easy-mmode)
+(require 'track-changes)
 (eval-when-compile (require 'cl-lib))
 (eval-when-compile (require 'subr-x))
-(require 'easy-mmode)
 
 (autoload 'vc-find-revision "vc")
 (autoload 'vc-find-revision-no-save "vc")
@@ -1431,56 +1432,40 @@ diff-write-contents-hooks
   (if (buffer-modified-p) (diff-fixup-modifs (point-min) (point-max)))
   nil)
 
-;; It turns out that making changes in the buffer from within an
-;; *-change-function is asking for trouble, whereas making them
-;; from a post-command-hook doesn't pose much problems
-(defvar diff-unhandled-changes nil)
-(defun diff-after-change-function (beg end _len)
-  "Remember to fixup the hunk header.
-See `after-change-functions' for the meaning of BEG, END and LEN."
-  ;; Ignoring changes when inhibit-read-only is set is strictly speaking
-  ;; incorrect, but it turns out that inhibit-read-only is normally not set
-  ;; inside editing commands, while it tends to be set when the buffer gets
-  ;; updated by an async process or by a conversion function, both of which
-  ;; would rather not be uselessly slowed down by this hook.
-  (when (and (not undo-in-progress) (not inhibit-read-only))
-    (if diff-unhandled-changes
-	(setq diff-unhandled-changes
-	      (cons (min beg (car diff-unhandled-changes))
-		    (max end (cdr diff-unhandled-changes))))
-      (setq diff-unhandled-changes (cons beg end)))))
-
-(defun diff-post-command-hook ()
-  "Fixup hunk headers if necessary."
-  (when (consp diff-unhandled-changes)
-    (ignore-errors
-      (save-excursion
-	(goto-char (car diff-unhandled-changes))
-	;; Maybe we've cut the end of the hunk before point.
-	(if (and (bolp) (not (bobp))) (backward-char 1))
-	;; We used to fixup modifs on all the changes, but it turns out that
-	;; it's safer not to do it on big changes, e.g. when yanking a big
-	;; diff, or when the user edits the header, since we might then
-	;; screw up perfectly correct values.  --Stef
-	(diff-beginning-of-hunk t)
-        (let* ((style (if (looking-at "\\*\\*\\*") 'context))
-               (start (line-beginning-position (if (eq style 'context) 3 2)))
-               (mid (if (eq style 'context)
-                        (save-excursion
-                          (re-search-forward diff-context-mid-hunk-header-re
-                                             nil t)))))
-          (when (and ;; Don't try to fixup changes in the hunk header.
-                 (>= (car diff-unhandled-changes) start)
-                 ;; Don't try to fixup changes in the mid-hunk header either.
-                 (or (not mid)
-                     (< (cdr diff-unhandled-changes) (match-beginning 0))
-                     (> (car diff-unhandled-changes) (match-end 0)))
-                 (save-excursion
-		(diff-end-of-hunk nil 'donttrustheader)
-                   ;; Don't try to fixup changes past the end of the hunk.
-                   (>= (point) (cdr diff-unhandled-changes))))
-	  (diff-fixup-modifs (point) (cdr diff-unhandled-changes)))))
-      (setq diff-unhandled-changes nil))))
+(defvar-local diff--track-changes nil)
+
+(defun diff--track-changes-signal (tracker)
+  (cl-assert (eq tracker diff--track-changes))
+  (track-changes-fetch tracker #'diff--track-changes-function))
+
+(defun diff--track-changes-function (beg end _before)
+  (with-demoted-errors "%S"
+    (save-excursion
+      (goto-char beg)
+      ;; Maybe we've cut the end of the hunk before point.
+      (if (and (bolp) (not (bobp))) (backward-char 1))
+      ;; We used to fixup modifs on all the changes, but it turns out that
+      ;; it's safer not to do it on big changes, e.g. when yanking a big
+      ;; diff, or when the user edits the header, since we might then
+      ;; screw up perfectly correct values.  --Stef
+      (diff-beginning-of-hunk t)
+      (let* ((style (if (looking-at "\\*\\*\\*") 'context))
+             (start (line-beginning-position (if (eq style 'context) 3 2)))
+             (mid (if (eq style 'context)
+                      (save-excursion
+                        (re-search-forward diff-context-mid-hunk-header-re
+                                           nil t)))))
+        (when (and ;; Don't try to fixup changes in the hunk header.
+               (>= beg start)
+               ;; Don't try to fixup changes in the mid-hunk header either.
+               (or (not mid)
+                   (< end (match-beginning 0))
+                   (> beg (match-end 0)))
+               (save-excursion
+		 (diff-end-of-hunk nil 'donttrustheader)
+                 ;; Don't try to fixup changes past the end of the hunk.
+                 (>= (point) end)))
+	 (diff-fixup-modifs (point) end))))))
 
 (defun diff-next-error (arg reset)
   ;; Select a window that displays the current buffer so that point
@@ -1560,9 +1545,8 @@ diff-mode
   ;; setup change hooks
   (if (not diff-update-on-the-fly)
       (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
-    (make-local-variable 'diff-unhandled-changes)
-    (add-hook 'after-change-functions #'diff-after-change-function nil t)
-    (add-hook 'post-command-hook #'diff-post-command-hook nil t))
+    (setq diff--track-changes
+          (track-changes-register #'diff--track-changes-signal :nobefore t)))
 
   ;; add-log support
   (setq-local add-log-current-defun-function #'diff-current-defun)
@@ -1581,12 +1565,15 @@ diff-minor-mode
 \\{diff-minor-mode-map}"
   :group 'diff-mode :lighter " Diff"
   ;; FIXME: setup font-lock
-  ;; setup change hooks
-  (if (not diff-update-on-the-fly)
-      (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
-    (make-local-variable 'diff-unhandled-changes)
-    (add-hook 'after-change-functions #'diff-after-change-function nil t)
-    (add-hook 'post-command-hook #'diff-post-command-hook nil t)))
+  (when diff--track-changes (track-changes-unregister diff--track-changes))
+  (remove-hook 'write-contents-functions #'diff-write-contents-hooks t)
+  (when diff-minor-mode
+    (if (not diff-update-on-the-fly)
+        (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
+      (unless diff--track-changes
+        (setq diff--track-changes
+              (track-changes-register #'diff--track-changes-signal
+                                      :nobefore t))))))
 
 ;;; Handy hook functions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
diff --git a/test/lisp/emacs-lisp/track-changes-tests.el b/test/lisp/emacs-lisp/track-changes-tests.el
new file mode 100644
index 00000000000..cdccbe80299
--- /dev/null
+++ b/test/lisp/emacs-lisp/track-changes-tests.el
@@ -0,0 +1,149 @@
+;;; track-changes-tests.el --- tests for emacs-lisp/track-changes.el  -*- lexical-binding:t -*-
+
+;; Copyright (C) 2024  Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;; Code:
+
+(require 'track-changes)
+(require 'cl-lib)
+(require 'ert)
+
+(defun track-changes-tests--random-word ()
+  (let ((chars ()))
+    (dotimes (_ (1+ (random 12)))
+      (push (+ ?A (random (1+ (- ?z ?A)))) chars))
+    (apply #'string chars)))
+
+(defvar track-changes-tests--random-verbose nil)
+
+(defun track-changes-tests--message (&rest args)
+  (when track-changes-tests--random-verbose (apply #'message args)))
+
+(ert-deftest track-changes-tests--random ()
+  ;; Keep 2 buffers in sync with a third one as we make random
+  ;; changes to that 3rd one.
+  ;; We have 3 trackers: a "normal" one which we sync
+  ;; at random intervals, one which syncs via the "disjoint" signal,
+  ;; plus a third one which verifies that "nobefore" gets
+  ;; information consistent with the "normal" tracker.
+  (with-temp-buffer
+    (dotimes (_ 100)
+      (insert (track-changes-tests--random-word) "\n"))
+    (let* ((buf1 (generate-new-buffer " *tc1*"))
+           (buf2 (generate-new-buffer " *tc2*"))
+           (char-counts (make-vector 2 0))
+           (sync-counts (make-vector 2 0))
+           (print-escape-newlines t)
+           (file (make-temp-file "tc"))
+           (id1 (track-changes-register #'ignore))
+           (id3 (track-changes-register #'ignore :nobefore t))
+           (sync
+            (lambda (id buf n)
+              (track-changes-tests--message "!! SYNC %d !!" n)
+              (track-changes-fetch
+               id (lambda (beg end before)
+                    (when (eq n 1)
+                      (track-changes-fetch
+                       id3 (lambda (beg3 end3 before3)
+                             (should (eq beg3 beg))
+                             (should (eq end3 end))
+                             (should (eq before3
+                                         (if (symbolp before)
+                                             before (length before)))))))
+                    (cl-incf (aref sync-counts (1- n)))
+                    (cl-incf (aref char-counts (1- n)) (- end beg))
+                    (let ((after (buffer-substring beg end)))
+                      (track-changes-tests--message
+                       "Sync:\n    %S\n=>  %S\nat %d .. %d"
+                       before after beg end)
+                      (with-current-buffer buf
+                        (if (eq before 'error)
+                            (erase-buffer)
+                          (should (equal before
+                                         (buffer-substring
+                                          beg (+ beg (length before)))))
+                          (delete-region beg (+ beg (length before))))
+                        (goto-char beg)
+                        (insert after)))
+                    (should (equal (buffer-string)
+                                   (with-current-buffer buf
+                                     (buffer-string))))))))
+           (id2 (track-changes-register
+                 (lambda (id2 &optional distance)
+                   (when distance
+                     (track-changes-tests--message "Disjoint distance: %d"
+                                                   distance)
+                     (funcall sync id2 buf2 2)))
+                 :disjoint t)))
+      (write-region (point-min) (point-max) file)
+      (insert-into-buffer buf1)
+      (insert-into-buffer buf2)
+      (should (equal (buffer-hash) (buffer-hash buf1)))
+      (should (equal (buffer-hash) (buffer-hash buf2)))
+      (dotimes (_ 1000)
+        (pcase (random 15)
+          (0
+           (track-changes-tests--message "Manual sync1")
+           (funcall sync id1 buf1 1))
+          (1
+           (track-changes-tests--message "Manual sync2")
+           (funcall sync id2 buf2 2))
+          ((pred (< _ 5))
+           (let* ((beg (+ (point-min) (random (1+ (buffer-size)))))
+                  (end (min (+ beg (1+ (random 100))) (point-max))))
+             (track-changes-tests--message "Fill %d .. %d" beg end)
+             (fill-region-as-paragraph beg end)))
+          ((pred (< _ 8))
+           (let* ((beg (+ (point-min) (random (1+ (buffer-size)))))
+                  (end (min (+ beg (1+ (random 12))) (point-max))))
+             (track-changes-tests--message "Delete %S at %d .. %d"
+                                           (buffer-substring beg end) beg end)
+             (delete-region beg end)))
+          ((and 8 (guard (= (random 50) 0)))
+           (track-changes-tests--message "Silent insertion")
+           (let ((inhibit-modification-hooks t))
+             (insert "a")))
+          ((and 8 (guard (= (random 10) 0)))
+           (track-changes-tests--message "Revert")
+           (insert-file-contents file nil nil nil 'replace))
+          ((and 8 (guard (= (random 3) 0)))
+           (let* ((beg (+ (point-min) (random (1+ (buffer-size)))))
+                  (end (min (+ beg (1+ (random 12))) (point-max)))
+                  (after (eq (random 2) 0)))
+             (track-changes-tests--message "Bogus %S %d .. %d"
+                                           (if after 'after 'before) beg end)
+             (if after
+                 (run-hook-with-args 'after-change-functions
+                                     beg end (- end beg))
+               (run-hook-with-args 'before-change-functions beg end))))
+          (_
+           (goto-char (+ (point-min) (random (1+ (buffer-size)))))
+           (let ((word (track-changes-tests--random-word)))
+             (track-changes-tests--message "insert %S at %d" word (point))
+             (insert  word "\n")))))
+      (message "SCOREs: default: %d/%d=%d     disjoint: %d/%d=%d"
+               (aref char-counts 0) (aref sync-counts 0)
+               (/ (aref char-counts 0) (aref sync-counts 0))
+               (aref char-counts 1) (aref sync-counts 1)
+               (/ (aref char-counts 1) (aref sync-counts 1))))))
+
+
+
+;;; track-changes-tests.el ends here
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-06  8:43   ` Eli Zaretskii
  2024-04-08 15:24     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-06 17:37   ` Dmitry Gutov
  2024-04-07 14:07   ` Ihor Radchenko
  2 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-04-06  8:43 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 70077

> Cc: Alan Mackenzie <acm@muc.de>, Ihor Radchenko <yantar92@posteo.net>
> Date: Fri, 05 Apr 2024 18:12:55 -0400
> From:  Stefan Monnier via "Bug reports for GNU Emacs,
>  the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
> 
> +;; This library is a layer of abstraction above `before-change-functions'
> +;; and `after-change-functions' which takes care of accumulating changes
> +;; until a time when its client finds it convenient to react to them.
> +;;
> +;; It provides an API that is easier to use correctly than our
> +;; `*-change-functions` hooks.  Problems that it claims to solve:

Please redo all the quoting from `like this` to one of the two styles
we use in our documentation.

> +(unless (fboundp 'funcall-later)
> +  (defun funcall-later (&rest args)
> +    ;; FIXME: Not sure if `run-with-timer' preserves ordering between
> +    ;; different calls with the same target time.
> +   (apply #'run-with-timer 0 nil args)))

Both funcall-later and run-with-timer??

> +
> +;;;; Internal types and variables.
> +
> +(cl-defstruct (track-changes--tracker
> +               (:noinline t)
> +               (:constructor nil)
> +               (:constructor track-changes--tracker ( signal state
> +                                                      &optional
> +                                                      nobefore immediate)))
> +  signal state nobefore immediate)
> +
> +(cl-defstruct (track-changes--state
> +               (:noinline t)
> +               (:constructor nil)
> +               (:constructor track-changes--state ()))
> +  "Object holding a description of a buffer state.
> +BEG..END is the area that was changed and BEFORE is its previous content.
> +If the current buffer currently holds the content of the next state, you can get
> +the contents of the previous state with:
> +
> +    (concat (buffer-substring (point-min) beg)
> +                before
> +                (buffer-substring end (point-max)))
> +
> +NEXT is the next state object (i.e. a more recent state).
> +If NEXT is nil it means it's most recent state and it may be incomplete
> +\(BEG/END/BEFORE may be nil), in which case those fields will take their
> +values from `track-changes--before-(beg|end|before)' when the next
> +state is create."

This doc string should IMO include the form of the object, because
without that BEG, END, BEFORE, and NEXT are "out of the blue", and
it's entirely unclear what they allude to.

Also, this doc string (and a few others) have very long lines, so
please refill them.

> +(defvar-local track-changes--trackers ()
> +  "List of trackers currently registered in the current buffer.")
                                            ^^^^^^^^^^^^^^^^^^^^^
I think "in the buffer" is more accurate, since this is not limited to
the current buffer.

> +(defvar-local track-changes--disjoint-trackers ()
> + "List of trackers that want to react to disjoint changes.
> +These trackers' are signaled every time track-changes notices
                 ^
That apostrophe is redundant.

> +(defvar-local track-changes--before-clean 'unset
> +  "If non-nil, the `track-changes--before-*' vars are old.
> +More specifically it means they cover a part of the buffer relevant
> +for the previous state.
> +It can take two non-nil values:
> +- `unset': Means that the vars cover some older state.
> +  This is what it is set right after creating a fresh new state.
                        ^^^
"set to"

> +(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
> +  "Register a new tracker and return a new tracker ID.

Please mention SIGNAL in the first line of the doc string.

> +SIGNAL is a function that will be called with one argument (the tracker ID)
> +after the current buffer is modified, so that we can react to the change.
                                                 ^^
"we"?

> +By default SIGNAL is called as soon as convenient after a change, which is
                               ^^^^^^^^^^^^^^^^^^^^^
"as soon as it's convenient", I presume?

> +usually right after the end of the current command.

This should explicitly reference funcall-later, so that users could
understand what "as soon as convenient" means.

> +If optional argument DISJOINT is non-nil, SIGNAL is called every time we are
> +about to combine changes from \"distant\" parts of the buffer.        ^^

"we" again?

> +In order to prevent the upcoming change from being combined with the previous
> +changes, SIGNAL needs to call `track-changes-fetch' before it returns."

This seems to contradict what the doc string says previously: that
SIGNAL should NOT call track-changes-fetch.

> +(defun track-changes-fetch (id func)
> +  "Fetch the pending changes.

The first line of a doc string should mention the arguments.

> +ID is the tracker ID returned by a previous `track-changes-register'.
> +FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
> +where BEGIN..END delimit the region that was changed since the last
> +time `track-changes-fetch' was called and BEFORE is a string containing
> +the previous content of that region (or just its length as an integer
> +If the tracker ID was registered with the `nobefore' option).
   ^^
"if", lower-case.

> +If some error caused us to miss some changes, then BEFORE will be the
                        ^^
"we" again?

> +If no changes occurred since the last time, FUNC is not called and
> +we return nil, otherwise we return the value returned by FUNC,
   ^^
And again.

> +      ;; The states are disconnected from the latest state because
> +      ;; we got out of sync!
> +      (cl-assert (eq (track-changes--state-before (car states)) 'error))

This seem to mean Emacs will signal an error in this case, not pass
'error' in BEFORE?

> +(defun track-changes--clean-state ()
> +  (cond
> +   ((null track-changes--state)
> +    (cl-assert track-changes--before-clean)
> +    (cl-assert (null track-changes--buffer-size))
> +    ;; No state has been created yet.  Do it now.
> +    (setq track-changes--buffer-size (buffer-size))
> +    (when track-changes--before-no
> +      (setq track-changes--before-string (buffer-size)))
> +    (setq track-changes--state (track-changes--state)))
> +   (track-changes--before-clean nil)
> +   (t
> +    (cl-assert (<= (track-changes--state-beg track-changes--state)
> +                   (track-changes--state-end track-changes--state)))
> +    (let ((actual-beg (track-changes--state-beg track-changes--state))
> +          (actual-end (track-changes--state-end track-changes--state)))
> +      (if track-changes--before-no
> +          (progn
> +            (cl-assert (integerp track-changes--before-string))
> +            (setf (track-changes--state-before track-changes--state)
> +                  (- track-changes--before-string
> +                     (- (buffer-size) (- actual-end actual-beg))))
> +            (setq track-changes--before-string (buffer-size)))
> +        (cl-assert (<= track-changes--before-beg
> +                       actual-beg actual-end
> +                       track-changes--before-end))
> +        (cl-assert (null (track-changes--state-before track-changes--state)))

Are all those assertions a good idea in this function?  I can envision
using it as a cleanup, in which case assertions will not be
appreciated.

> +(defvar track-changes--disjoint-threshold 100
> +  "Distance below which changes are not considered disjoint.")

This should tell in what units the distance is measured.

> +;;;; Extra candidates for the API.
> +
> +;; This could be a good alternative to using a temp-buffer like I used in
                                                                   ^^^^^^
"I"?

> +;; Eglot, since presumably we've just been changing this very area of the
> +;; buffer, so the gap should be ready nearby,
> +;; It may seem silly to go back to the previous state, since we could have
> +;; used `before-change-functions' to run FUNC right then when we were in
> +;; that state.  The advantage is that with track-changes we get to decide
> +;; retroactively which state is the one for which we want to call FUNC and
> +;; which BEG..END to use: when that state was current we may have known
> +;; then that it would be "the one" but we didn't know what BEG and END
> +;; should be because those depend on the changes that came afterwards.

Suggest to reword (or remove) this comment, as it sounds like
development-time notes.

> +(defun diff--track-changes-function (beg end _before)
> +  (with-demoted-errors "%S"

Why did you need with-demoted-errors here?

Last, but not least: this needs suitable changes in NEWS and ELisp
manual.

Thanks.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-06  8:43   ` Eli Zaretskii
@ 2024-04-06 17:37   ` Dmitry Gutov
  2024-04-06 19:44     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-07 14:07   ` Ihor Radchenko
  2 siblings, 1 reply; 50+ messages in thread
From: Dmitry Gutov @ 2024-04-06 17:37 UTC (permalink / raw)
  To: Stefan Monnier, 70077; +Cc: Alan Mackenzie, Ihor Radchenko

On 06/04/2024 01:12, Stefan Monnier via Bug reports for GNU Emacs, the 
Swiss army knife of text editors wrote:
> diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
> index 7f4284bf09d..00c09d7f06b 100644
> --- a/lisp/progmodes/eglot.el
> +++ b/lisp/progmodes/eglot.el
> @@ -110,6 +110,7 @@
>   (require 'text-property-search nil t)
>   (require 'diff-mode)
>   (require 'diff)
> +(require 'track-changes)

Eglot is a "ELPA core" package.

Will we put `track-changes' into ELPA as well?





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-06 17:37   ` Dmitry Gutov
@ 2024-04-06 19:44     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-07 14:40       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-06 19:44 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Alan Mackenzie, Ihor Radchenko, 70077

>> diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
>> index 7f4284bf09d..00c09d7f06b 100644
>> --- a/lisp/progmodes/eglot.el
>> +++ b/lisp/progmodes/eglot.el
>> @@ -110,6 +110,7 @@
>>   (require 'text-property-search nil t)
>>   (require 'diff-mode)
>>   (require 'diff)
>> +(require 'track-changes)
> Eglot is a "ELPA core" package.
> Will we put `track-changes' into ELPA as well?

The part of the patch that touches `eglot.el` is not indispensable, but
yes, that's indeed something I've been wondering as well, seeing how
it could be useful for third party packages like Lsp-mode, Crdt,
and more.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-06  8:43   ` Eli Zaretskii
  2024-04-06 17:37   ` Dmitry Gutov
@ 2024-04-07 14:07   ` Ihor Radchenko
  2024-04-08 16:06     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2 siblings, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-07 14:07 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, 70077

Stefan Monnier <monnier@iro.umontreal.ca> writes:

> +(cl-defstruct (track-changes--state
> +               (:noinline t)
> +               (:constructor nil)
> +               (:constructor track-changes--state ()))
> +  "Object holding a description of a buffer state.
> +BEG..END is the area that was changed and BEFORE is its previous content.
> +If the current buffer currently holds the content of the next state, you can get
> +the contents of the previous state with:
> +
> +    (concat (buffer-substring (point-min) beg)
> +                before
> +                (buffer-substring end (point-max)))
> +
> +NEXT is the next state object (i.e. a more recent state).
> +If NEXT is nil it means it's most recent state and it may be incomplete
> +\(BEG/END/BEFORE may be nil), in which case those fields will take their
> +values from `track-changes--before-(beg|end|before)' when the next
> +state is create."

This docstring is a bit confusing.
If a state object is not the most recent, how come 

> +    (concat (buffer-substring (point-min) beg)
> +                before
> +                (buffer-substring end (point-max)))

produces the previous content?

And if the state object is the most recent, "it may be incomplete"...

So, when is it safe to use the above (concat ... ) call?

> +(defvar-local track-changes--before-beg (point-min)
> +  "Beginning position of the remembered \"before string\".")
> +(defvar-local track-changes--before-end (point-min)
> +  "End position of the text replacing the \"before string\".")

Why (point-min)? It will make the values dependent on the buffer
narrowing that happens to be active when the library if first loaded.
Which cannot be right.

> +(defvar-local track-changes--buffer-size nil
> +  "Current size of the buffer, as far as this library knows.
> +This is used to try and detect cases where buffer modifications are \"lost\".")

Just looking at the buffer size may miss unregistered edits that do not
change the total size of the buffer. Although I do not know a better
measure. `buffer-chars-modified-tic' may lead to false-positives
(Bug#51766).

> +(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
> +  "Register a new tracker and return a new tracker ID.
> +SIGNAL is a function that will be called with one argument (the tracker ID)
> +after the current buffer is modified, so that we can react to the change.
> + ...
> +If optional argument DISJOINT is non-nil, SIGNAL is called every time we are
> +about to combine changes from \"distant\" parts of the buffer.
> +This is needed when combining disjoint changes into one bigger change
> +is unacceptable, typically for performance reasons.
> +These calls are distinguished from normal calls by calling SIGNAL with
> +a second argument which is the distance between the upcoming change and
> +the previous changes.

This is a bit confusing. The first paragraph says that SIGNAL is called
with a single argument, but that it appears that two arguments may be
passed. I'd rather tell the calling convention early in the docstring.

> +  (unless nobefore
> +    (setq track-changes--before-no nil)
> +    (add-hook 'before-change-functions #'track-changes--before nil t))
> +  (add-hook 'after-change-functions  #'track-changes--after  nil t)

Note that all the changes made in indirect buffers will be missed.
See bug#60333.

> +(defun track-changes-fetch (id func)
> ...
> +  (unless (equal track-changes--buffer-size (buffer-size))
> +    (track-changes--recover-from-error))
> +  (let ((beg nil)
> +        (end nil)
> +        (before t)
> +        (lenbefore 0)
> +        (states ()))
> +    ;; Transfer the data from `track-changes--before-string'
> +    ;; to the tracker's state object, if needed.
> +    (track-changes--clean-state)

> +(defun track-changes--recover-from-error ()
> ...
> +  (setq track-changes--state (track-changes--state)))

This will create a dummy state with

 (beg (point-max))
 (end (point-min))

such state will not pass (< beg end) assertion in
`track-changes--clean-state' called in `track-changes-fetch' immediately
after `track-changes--recover-from-error'

> +(defun track-changes--in-revert (beg end before func)
> ...
> +      (atomic-change-group
> +        (goto-char end)
> +        (insert-before-markers before)
> +        (delete-region beg end)

What happens if there are markers inside beg...end?

> +(defun track-changes-tests--random-word ()
> +  (let ((chars ()))
> +    (dotimes (_ (1+ (random 12)))
> +      (push (+ ?A (random (1+ (- ?z ?A)))) chars))
> +    (apply #'string chars)))

If you are using random values for edits, how can such test be
reproduced? Maybe first generate a random seed and then log it, so that
the failing test can be repeated if necessary with seed assigned manually.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-06 19:44     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-07 14:40       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-07 15:47         ` Dmitry Gutov
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-07 14:40 UTC (permalink / raw)
  To: Dmitry Gutov; +Cc: Alan Mackenzie, Ihor Radchenko, 70077

>> Eglot is a "ELPA core" package.
>> Will we put `track-changes' into ELPA as well?
>
> The part of the patch that touches `eglot.el` is not indispensable, but
> yes, that's indeed something I've been wondering as well, seeing how
> it could be useful for third party packages like Lsp-mode, Crdt,
> and more.

BTW, by that I meant that maybe it should live in `elpa.git` (i.e. in
GNU ELPA instead of in Emacs).  This is especially since I'm not sure we
want to push this as "the" API.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-07 14:40       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-07 15:47         ` Dmitry Gutov
  0 siblings, 0 replies; 50+ messages in thread
From: Dmitry Gutov @ 2024-04-07 15:47 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, Ihor Radchenko, 70077

On 07/04/2024 17:40, Stefan Monnier wrote:
>>> Eglot is a "ELPA core" package.
>>> Will we put `track-changes' into ELPA as well?
>> The part of the patch that touches `eglot.el` is not indispensable, but
>> yes, that's indeed something I've been wondering as well, seeing how
>> it could be useful for third party packages like Lsp-mode, Crdt,
>> and more.
> BTW, by that I meant that maybe it should live in `elpa.git` (i.e. in
> GNU ELPA instead of in Emacs).  This is especially since I'm not sure we
> want to push this as "the" API.

In that case, the change to Eglot might have to be postponed (since we 
can't make a built-in package depend on code that could be absent).

I don't have an opinion on the API itself, FWIW.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-06  8:43   ` Eli Zaretskii
@ 2024-04-08 15:24     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-08 15:53       ` Eli Zaretskii
  2024-04-08 20:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-08 15:24 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 70077

>> +(cl-defstruct (track-changes--state
>> +               (:noinline t)
>> +               (:constructor nil)
>> +               (:constructor track-changes--state ()))
>> +  "Object holding a description of a buffer state.
>> +BEG..END is the area that was changed and BEFORE is its previous content.
>> +If the current buffer currently holds the content of the next state, you can get
>> +the contents of the previous state with:
>> +
>> +    (concat (buffer-substring (point-min) beg)
>> +                before
>> +                (buffer-substring end (point-max)))
>> +
>> +NEXT is the next state object (i.e. a more recent state).
>> +If NEXT is nil it means it's most recent state and it may be incomplete
>> +\(BEG/END/BEFORE may be nil), in which case those fields will take their
>> +values from `track-changes--before-(beg|end|before)' when the next
>> +state is create."
>
> This doc string should IMO include the form of the object, because
> without that BEG, END, BEFORE, and NEXT are "out of the blue", and
> it's entirely unclear what they allude to.

AFAIK this docstring is displayed only by `describe-type` which shows
something like:

    track-changes--state is a type (of kind ‘cl-structure-class’) in ‘track-changes.el’.
     Inherits from ‘cl-structure-object’.
    
    Object holding a description of a buffer state.
    BEG..END is the area that was changed and BEFORE is its previous content.
    If the current buffer currently holds the content of the next state, you can get
    the contents of the previous state with:
    
        (concat (buffer-substring (point-min) beg)
                    before
                    (buffer-substring end (point-max)))
    
    NEXT is the next state object (i.e. a more recent state).
    If NEXT is nil it means it's most recent state and it may be incomplete
    (BEG/END/BEFORE may be nil), in which case those fields will take their
    values from `track-changes--before-(beg|end|before)' when the next
    state is create.
    
    Instance Allocated Slots:
    
            Name    Type    Default
            ————    ————    ———————
            beg     t       (point-max)
            end     t       (point-min)
            before  t       nil
            next    t       nil
    
    Specialized Methods:
    
    [...]

so the "form of the object" is included.  We don't have much practice
with docstrings for `cl-defstruct`, but I tried to follow the same kind
of conventions we use for functions, taking object slots as equivalent
to formal arguments.

Maybe `describe-type` should lists the slots first and the docstring
underneath rather than other way around?

>> +(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
>> +  "Register a new tracker and return a new tracker ID.
> Please mention SIGNAL in the first line of the doc string.

Hmm... having trouble making that fit on a single line.
Could you clarify why you think SIGNAL needs to be on the first line?

The best I could come up with so far is:

    Register a new change tracker handled via SIGNAL.

>> +By default SIGNAL is called as soon as convenient after a change, which is
>                                ^^^^^^^^^^^^^^^^^^^^^
> "as soon as it's convenient", I presume?

Other than the extra " it's", what is the difference?

>> +usually right after the end of the current command.
> This should explicitly reference funcall-later, so that users could
> understand what "as soon as convenient" means.

I don't see the point of telling which function we use: the programmers
who want to know that, can consult the code.  As for explaining what "as
soon as convenient" means, that's what "usually right after the end of
the current command" is there for.

[ Also, I've changed the code to use `run-with-timer` in the mean time.  ]

>> +In order to prevent the upcoming change from being combined with the previous
>> +changes, SIGNAL needs to call `track-changes-fetch' before it returns."
>
> This seems to contradict what the doc string says previously: that
> SIGNAL should NOT call track-changes-fetch.

I don't kow where you see the docstring saying that.
The closest I can find is:

    When IMMEDIATE is non-nil, the SIGNAL should preferably not always call
    `track-changes-fetch', since that would defeat the purpose of this library.

Note the "When IMMEDIATE is non-nil", "preferably", and "not always",
and the fact that the reason is not that something will break but that
some other solution would probably work better.
[ I changed "preferably" into "probably" since it's just a guess of mine
  rather than a request: there might be a good reason out there to
  prefer track-changes even for such a use case.  ]

>> +      ;; The states are disconnected from the latest state because
>> +      ;; we got out of sync!
>> +      (cl-assert (eq (track-changes--state-before (car states)) 'error))
> This seem to mean Emacs will signal an error in this case, not pass
> 'error' in BEFORE?

No, this verifies that the states were disconnected on purpose by
`track-changes--recover-from-error` rather than due to some bug in
the code.

>> +(defun track-changes--clean-state ()
>> +  (cond
>> +   ((null track-changes--state)
>> +    (cl-assert track-changes--before-clean)
>> +    (cl-assert (null track-changes--buffer-size))
>> +    ;; No state has been created yet.  Do it now.
>> +    (setq track-changes--buffer-size (buffer-size))
>> +    (when track-changes--before-no
>> +      (setq track-changes--before-string (buffer-size)))
>> +    (setq track-changes--state (track-changes--state)))
>> +   (track-changes--before-clean nil)
>> +   (t
>> +    (cl-assert (<= (track-changes--state-beg track-changes--state)
>> +                   (track-changes--state-end track-changes--state)))
>> +    (let ((actual-beg (track-changes--state-beg track-changes--state))
>> +          (actual-end (track-changes--state-end track-changes--state)))
>> +      (if track-changes--before-no
>> +          (progn
>> +            (cl-assert (integerp track-changes--before-string))
>> +            (setf (track-changes--state-before track-changes--state)
>> +                  (- track-changes--before-string
>> +                     (- (buffer-size) (- actual-end actual-beg))))
>> +            (setq track-changes--before-string (buffer-size)))
>> +        (cl-assert (<= track-changes--before-beg
>> +                       actual-beg actual-end
>> +                       track-changes--before-end))
>> +        (cl-assert (null (track-changes--state-before track-changes--state)))
>
> Are all those assertions a good idea in this function?

The library has a lot of `cl-assert`s which are all placed both as
documentation and as sanity checks.  All those should hopefully be true
all the time barring bugs in the code.

> I can envision using it as a cleanup, in which case assertions will
> not be appreciated.

Not sure what you mean by "using it as a cleanup"?
Its purpose is not to cleanup the general state of track-changes, but
to create a new, clean `track-changes--state`.

>> +;;;; Extra candidates for the API.
>> +;; This could be a good alternative to using a temp-buffer like I used in
>                                                                    ^^^^^^
> "I"?

Yes, that refers to the code I wrote.

>> +;; Eglot, since presumably we've just been changing this very area of the
>> +;; buffer, so the gap should be ready nearby,
>> +;; It may seem silly to go back to the previous state, since we could have
>> +;; used `before-change-functions' to run FUNC right then when we were in
>> +;; that state.  The advantage is that with track-changes we get to decide
>> +;; retroactively which state is the one for which we want to call FUNC and
>> +;; which BEG..END to use: when that state was current we may have known
>> +;; then that it would be "the one" but we didn't know what BEG and END
>> +;; should be because those depend on the changes that came afterwards.
>
> Suggest to reword (or remove) this comment, as it sounds like
> development-time notes.

This is in the "Extra candidates for the API" section, which holds
a bunch of things which might be useful or might not.

>> +(defun diff--track-changes-function (beg end _before)
>> +  (with-demoted-errors "%S"
> Why did you need with-demoted-errors here?

It used to be `ignore-errors` because back in 1999 we didn't have
`with-demoted-errors`.

> Last, but not least: this needs suitable changes in NEWS and ELisp
> manual.

Working on it.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 15:24     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-08 15:53       ` Eli Zaretskii
  2024-04-08 17:17         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-08 20:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-04-08 15:53 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 70077

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 70077@debbugs.gnu.org,  acm@muc.de,  yantar92@posteo.net
> Date: Mon, 08 Apr 2024 11:24:38 -0400
> 
> >> +(cl-defstruct (track-changes--state
> >> +               (:noinline t)
> >> +               (:constructor nil)
> >> +               (:constructor track-changes--state ()))
> >> +  "Object holding a description of a buffer state.
> >> +BEG..END is the area that was changed and BEFORE is its previous content.
> >> +If the current buffer currently holds the content of the next state, you can get
> >> +the contents of the previous state with:
> >> +
> >> +    (concat (buffer-substring (point-min) beg)
> >> +                before
> >> +                (buffer-substring end (point-max)))
> >> +
> >> +NEXT is the next state object (i.e. a more recent state).
> >> +If NEXT is nil it means it's most recent state and it may be incomplete
> >> +\(BEG/END/BEFORE may be nil), in which case those fields will take their
> >> +values from `track-changes--before-(beg|end|before)' when the next
> >> +state is create."
> >
> > This doc string should IMO include the form of the object, because
> > without that BEG, END, BEFORE, and NEXT are "out of the blue", and
> > it's entirely unclear what they allude to.
> 
> AFAIK this docstring is displayed only by `describe-type` which shows
> something like:
> 
>     track-changes--state is a type (of kind ‘cl-structure-class’) in ‘track-changes.el’.
>      Inherits from ‘cl-structure-object’.
>     
>     Object holding a description of a buffer state.
>     BEG..END is the area that was changed and BEFORE is its previous content.
>     If the current buffer currently holds the content of the next state, you can get
>     the contents of the previous state with:
>     
>         (concat (buffer-substring (point-min) beg)
>                     before
>                     (buffer-substring end (point-max)))
>     
>     NEXT is the next state object (i.e. a more recent state).
>     If NEXT is nil it means it's most recent state and it may be incomplete
>     (BEG/END/BEFORE may be nil), in which case those fields will take their
>     values from `track-changes--before-(beg|end|before)' when the next
>     state is create.
>     
>     Instance Allocated Slots:
>     
>             Name    Type    Default
>             ————    ————    ———————
>             beg     t       (point-max)
>             end     t       (point-min)
>             before  t       nil
>             next    t       nil
>     
>     Specialized Methods:
>     
>     [...]
> 
> so the "form of the object" is included.

It's included, but _after_ explaining what each member of the object
form means.  That's bad from the methodological POV: we should first
show the form and only afterwards describe each of its members.

> Maybe `describe-type` should lists the slots first and the docstring
> underneath rather than other way around?

That'd also be good.  Then the doc string should say something like

  Object holding a description of a buffer state.
  It has the following Allocated Slots:

            Name    Type    Default
            ————    ————    ———————
            beg     t       (point-max)
            end     t       (point-min)
            before  t       nil
            next    t       nil

  BEG..END is the area that was changed and BEFORE is its previous
  content[...]

(Btw, those "t" under "Type" are also somewhat mysterious.  What do
they signify?)

> >> +(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
> >> +  "Register a new tracker and return a new tracker ID.
> > Please mention SIGNAL in the first line of the doc string.
> 
> Hmm... having trouble making that fit on a single line.

    Register a new tracker whose change-tracking function is SIGNAL.
  Return the ID of the new tracker.

> Could you clarify why you think SIGNAL needs to be on the first line?

It's our convention to mention the mandatory arguments on the first
line of the doc string.

> The best I could come up with so far is:
> 
>     Register a new change tracker handled via SIGNAL.

That's a good start, IMO, except that SIGNAL doesn't hadle the
tracker, it handles changes, right?

> >> +By default SIGNAL is called as soon as convenient after a change, which is
> >                                ^^^^^^^^^^^^^^^^^^^^^
> > "as soon as it's convenient", I presume?
> 
> Other than the extra " it's", what is the difference?

Nothing.  I indeed thing "it's" is missing there.

> >> +usually right after the end of the current command.
> > This should explicitly reference funcall-later, so that users could
> > understand what "as soon as convenient" means.
> 
> I don't see the point of telling which function we use: the programmers
> who want to know that, can consult the code.  As for explaining what "as
> soon as convenient" means, that's what "usually right after the end of
> the current command" is there for.

My point is that by referencing funcall-later you can avoid the need
to explain what is already explained in that function's doc string.
You could, for example, simply say

  By default, SIGNAL is arranged to be called later by using
  `funcall-later'.

> [ Also, I've changed the code to use `run-with-timer` in the mean time.  ]

That'd need a trivial change above, and I still think it's worthwhile,
as it makes this doc string easier to grasp: if one already knows how
run-with-timer works, they don't need anything else to be said; and if
they don't, they can later read on that separately.

IOW, you separate one complex description into two simpler ones: a win
IME.

> >> +In order to prevent the upcoming change from being combined with the previous
> >> +changes, SIGNAL needs to call `track-changes-fetch' before it returns."
> >
> > This seems to contradict what the doc string says previously: that
> > SIGNAL should NOT call track-changes-fetch.
> 
> I don't kow where you see the docstring saying that.
> The closest I can find is:
> 
>     When IMMEDIATE is non-nil, the SIGNAL should preferably not always call
>     `track-changes-fetch', since that would defeat the purpose of this library.
> 
> Note the "When IMMEDIATE is non-nil", "preferably", and "not always",
> and the fact that the reason is not that something will break but that
> some other solution would probably work better.

Then maybe the sentence on which I commented should say

  Except when IMMEDIATE is non-nil, if SIGNAL needs to prevent the
  upcoming change from being combined with the previous ones, it
  should call `track-changes-fetch' before it returns.

> > Are all those assertions a good idea in this function?
> 
> The library has a lot of `cl-assert`s which are all placed both as
> documentation and as sanity checks.  All those should hopefully be true
> all the time barring bugs in the code.
> 
> > I can envision using it as a cleanup, in which case assertions will
> > not be appreciated.
> 
> Not sure what you mean by "using it as a cleanup"?
> Its purpose is not to cleanup the general state of track-changes, but
> to create a new, clean `track-changes--state`.

In general, when I want to create a clean slate, I don't care too much
about the dirt I remove.  Why is it important to signal errors because
a state I am dumping had some errors?

> >> +;;;; Extra candidates for the API.
> >> +;; This could be a good alternative to using a temp-buffer like I used in
> >                                                                    ^^^^^^
> > "I"?
> 
> Yes, that refers to the code I wrote.

We don't usually leave such style in long-term comments and
documentation.

Thanks.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-07 14:07   ` Ihor Radchenko
@ 2024-04-08 16:06     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-09 17:35       ` Ihor Radchenko
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-08 16:06 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Alan Mackenzie, 70077

>> +  "Object holding a description of a buffer state.
>> +BEG..END is the area that was changed and BEFORE is its previous content.
>> +If the current buffer currently holds the content of the next state, you can get
>> +the contents of the previous state with:
>> +
>> +    (concat (buffer-substring (point-min) beg)
>> +                before
>> +                (buffer-substring end (point-max)))
>> +
>> +NEXT is the next state object (i.e. a more recent state).
>> +If NEXT is nil it means it's most recent state and it may be incomplete
>> +\(BEG/END/BEFORE may be nil), in which case those fields will take their
>> +values from `track-changes--before-(beg|end|before)' when the next
>> +state is create."
>
> This docstring is a bit confusing.
> If a state object is not the most recent, how come 
>
>> +    (concat (buffer-substring (point-min) beg)
>> +                before
>> +                (buffer-substring end (point-max)))
>
> produces the previous content?

Because it says "If the current buffer currently holds the content of
the next state".
[ "current ... currently" wasn't my best moment, admittedly.  ]

> And if the state object is the most recent, "it may be incomplete"...
> So, when is it safe to use the above (concat ... ) call?

You never want to use this call, it's only there to show in a concise
manner how BEG/END/BEFORE relate and what information they're supposed
to hold.

>> +(defvar-local track-changes--before-beg (point-min)
>> +  "Beginning position of the remembered \"before string\".")
>> +(defvar-local track-changes--before-end (point-min)
>> +  "End position of the text replacing the \"before string\".")
> Why (point-min)? It will make the values dependent on the buffer
> narrowing that happens to be active when the library if first loaded.
> Which cannot be right.

The precise value should hopefully never matter, barring bugs.
I changed them to 0.

>> +(defvar-local track-changes--buffer-size nil
>> +  "Current size of the buffer, as far as this library knows.
>> +This is used to try and detect cases where buffer modifications are \"lost\".")
> Just looking at the buffer size may miss unregistered edits that do not
> change the total size of the buffer.  Although I do not know a better
> measure.  `buffer-chars-modified-tic' may lead to false-positives
> (Bug#51766).

Yup, hence the "try to".

>> +(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
>> +  "Register a new tracker and return a new tracker ID.
>> +SIGNAL is a function that will be called with one argument (the tracker ID)
>> +after the current buffer is modified, so that we can react to the change.
>> + ...
>> +If optional argument DISJOINT is non-nil, SIGNAL is called every time we are
>> +about to combine changes from \"distant\" parts of the buffer.
>> +This is needed when combining disjoint changes into one bigger change
>> +is unacceptable, typically for performance reasons.
>> +These calls are distinguished from normal calls by calling SIGNAL with
>> +a second argument which is the distance between the upcoming change and
>> +the previous changes.
>
> This is a bit confusing. The first paragraph says that SIGNAL is called
> with a single argument, but that it appears that two arguments may be
> passed. I'd rather tell the calling convention early in the docstring.

The second convention is used only when DISJOINT is non-nil, which is
why it's described where we document the effect of DISJOINT.
An alternative could be to make the `disjoint` arg hold the function to call
to signal disjoint changes.

>> +  (unless nobefore
>> +    (setq track-changes--before-no nil)
>> +    (add-hook 'before-change-functions #'track-changes--before nil t))
>> +  (add-hook 'after-change-functions  #'track-changes--after  nil t)
> Note that all the changes made in indirect buffers will be missed.
> See bug#60333.

Yup.  And misuses of `inhibit-modification-hooks` or
`with-silent-modifications`.  🙁

>> +(defun track-changes-fetch (id func)
>> ...
>> +  (unless (equal track-changes--buffer-size (buffer-size))
>> +    (track-changes--recover-from-error))
>> +  (let ((beg nil)
>> +        (end nil)
>> +        (before t)
>> +        (lenbefore 0)
>> +        (states ()))
>> +    ;; Transfer the data from `track-changes--before-string'
>> +    ;; to the tracker's state object, if needed.
>> +    (track-changes--clean-state)
>
>> +(defun track-changes--recover-from-error ()
>> ...
>> +  (setq track-changes--state (track-changes--state)))
>
> This will create a dummy state with
>
>  (beg (point-max))
>  (end (point-min))
>
> such state will not pass (< beg end) assertion in
> `track-changes--clean-state' called in `track-changes-fetch' immediately
> after `track-changes--recover-from-error'

I can't reproduce that.  Do you have a recipe?

AFAICT all the (< beg end) tests in `track-changes--clean-state` are
conditional on `track-changes--before-clean` being nil, whereas
`track-changes--recover-from-error` sets that var to `unset`.

>> +(defun track-changes--in-revert (beg end before func)
>> ...
>> +      (atomic-change-group
>> +        (goto-char end)
>> +        (insert-before-markers before)
>> +        (delete-region beg end)
>
> What happens if there are markers inside beg...end?

During FUNC they're moved to BEG or END, and when we restore the
original state, well... the undo machinery has some support to restore
the markers where they were, but it's definitely not 100%.  🙁

>> +(defun track-changes-tests--random-word ()
>> +  (let ((chars ()))
>> +    (dotimes (_ (1+ (random 12)))
>> +      (push (+ ?A (random (1+ (- ?z ?A)))) chars))
>> +    (apply #'string chars)))
>
> If you are using random values for edits, how can such test be
> reproduced?

Luck?

> Maybe first generate a random seed and then log it, so that the
> failing test can be repeated if necessary with seed assigned manually.

Good idea.
But my attempt (see patch below) failed.
I'm not sure what I'm doing wrong, but

    make test/lisp/emacs-lisp/track-changes-tests \
         EMACS_EXTRAOPT="--eval '(setq track-changes-tests--random-seed \"15216888\")'"

gives a different score each time.  🙁


        Stefan


diff --git a/test/lisp/emacs-lisp/track-changes-tests.el b/test/lisp/emacs-lisp/track-changes-tests.el
index cdccbe80299..eab9197030f 100644
--- a/test/lisp/emacs-lisp/track-changes-tests.el
+++ b/test/lisp/emacs-lisp/track-changes-tests.el
@@ -36,6 +36,11 @@ track-changes-tests--random-verbose
 (defun track-changes-tests--message (&rest args)
   (when track-changes-tests--random-verbose (apply #'message args)))
 
+(defvar track-changes-tests--random-seed
+  (let ((seed (number-to-string (random (expt 2 24)))))
+    (message "Random seed = %S" seed)
+    seed))
+
 (ert-deftest track-changes-tests--random ()
   ;; Keep 2 buffers in sync with a third one as we make random
   ;; changes to that 3rd one.
@@ -97,6 +102,8 @@ track-changes-tests--random
       (insert-into-buffer buf2)
       (should (equal (buffer-hash) (buffer-hash buf1)))
       (should (equal (buffer-hash) (buffer-hash buf2)))
+      (message "seeding with: %S" track-changes-tests--random-seed)
+      (random track-changes-tests--random-seed)
       (dotimes (_ 1000)
         (pcase (random 15)
           (0






^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 15:53       ` Eli Zaretskii
@ 2024-04-08 17:17         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-08 17:27           ` Andrea Corallo
  2024-04-08 18:36           ` Eli Zaretskii
  0 siblings, 2 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-08 17:17 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 70077

>> Maybe `describe-type` should lists the slots first and the docstring
>> underneath rather than other way around?
>
> That'd also be good.  Then the doc string should say something like
>
>   Object holding a description of a buffer state.
>   It has the following Allocated Slots:
>
>             Name    Type    Default
>             ————    ————    ———————
>             beg     t       (point-max)
>             end     t       (point-min)
>             before  t       nil
>             next    t       nil
>
>   BEG..END is the area that was changed and BEFORE is its previous
>   content[...]

OK, I'll switch the two, thanks.

> (Btw, those "t" under "Type" are also somewhat mysterious.  What do
> they signify?)

`C-h o t RET` says:

    t’s value is t
    
    Not documented as a variable.
    
      Probably introduced at or before Emacs version 16.
    
    
    
    t is also a type.
    
    t is a type (of kind ‘built-in-class’).
     Children ‘sequence’, ‘atom’.
    
    Abstract supertype of everything.
    
    This is a built-in type.
    
    [back]

We could put buttons in the "Type" column, but I'm not sure it'd be
better or worse (I'm worried about turning everything into a button).

>> >> +(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
>> >> +  "Register a new tracker and return a new tracker ID.
>> > Please mention SIGNAL in the first line of the doc string.
>> Hmm... having trouble making that fit on a single line.
>     Register a new tracker whose change-tracking function is SIGNAL.
>   Return the ID of the new tracker.

Thanks.

>> >> +By default SIGNAL is called as soon as convenient after a change, which is
>> >                                ^^^^^^^^^^^^^^^^^^^^^
>> > "as soon as it's convenient", I presume?
>> Other than the extra " it's", what is the difference?
> Nothing.  I indeed thing "it's" is missing there.

My local native-English representative says that "both work fine"
(somewhat dismissively, I must add).

> My point is that by referencing funcall-later you can avoid the need
> to explain what is already explained in that function's doc string.

OK.

>> >> +In order to prevent the upcoming change from being combined with the previous
>> >> +changes, SIGNAL needs to call `track-changes-fetch' before it returns."
>> >
>> > This seems to contradict what the doc string says previously: that
>> > SIGNAL should NOT call track-changes-fetch.
>> 
>> I don't kow where you see the docstring saying that.
>> The closest I can find is:
>> 
>>     When IMMEDIATE is non-nil, the SIGNAL should preferably not always call
>>     `track-changes-fetch', since that would defeat the purpose of this library.
>> 
>> Note the "When IMMEDIATE is non-nil", "preferably", and "not always",
>> and the fact that the reason is not that something will break but that
>> some other solution would probably work better.
>
> Then maybe the sentence on which I commented should say
>
>   Except when IMMEDIATE is non-nil, if SIGNAL needs to prevent the
>   upcoming change from being combined with the previous ones, it
>   should call `track-changes-fetch' before it returns.

But that wouldn't say what I want to say.  It'd want to call
`track-changes-fetch' before it returns even if IMMEDIATE is non-nil.
It just probably wouldn't want to do that for all calls to the signal.
E.g. only for those calls where the second argument is non-nil
(i.e. the disjointness signals).

> In general, when I want to create a clean slate, I don't care too much
> about the dirt I remove.  Why is it important to signal errors because
> a state I am dumping had some errors?

I don't understand why you think it will signal an error?
More to the point, it should signal an error only if I made a mistake in
`track-changes.el` or if you messed with the internals.
Note also that this function is itself internal.

>> >> +;;;; Extra candidates for the API.
>> >> +;; This could be a good alternative to using a temp-buffer like I used in
>> >                                                                    ^^^^^^
>> > "I"?
>> Yes, that refers to the code I wrote.
> We don't usually leave such style in long-term comments and
> documentation.

`grep " I " lisp/**/*.el` suggests otherwise.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 17:17         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-08 17:27           ` Andrea Corallo
  2024-04-08 18:36           ` Eli Zaretskii
  1 sibling, 0 replies; 50+ messages in thread
From: Andrea Corallo @ 2024-04-08 17:27 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, Stefan Monnier, 70077

Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of
text editors" <bug-gnu-emacs@gnu.org> writes:

>>> Maybe `describe-type` should lists the slots first and the docstring
>>> underneath rather than other way around?
>>
>> That'd also be good.  Then the doc string should say something like
>>
>>   Object holding a description of a buffer state.
>>   It has the following Allocated Slots:
>>
>>             Name    Type    Default
>>             ————    ————    ———————
>>             beg     t       (point-max)
>>             end     t       (point-min)
>>             before  t       nil
>>             next    t       nil
>>
>>   BEG..END is the area that was changed and BEFORE is its previous
>>   content[...]
>
> OK, I'll switch the two, thanks.
>
>> (Btw, those "t" under "Type" are also somewhat mysterious.  What do
>> they signify?)
>
> `C-h o t RET` says:
>
>     t’s value is t
>     
>     Not documented as a variable.
>     
>       Probably introduced at or before Emacs version 16.
>     
>     
>     
>     t is also a type.
>     
>     t is a type (of kind ‘built-in-class’).
>      Children ‘sequence’, ‘atom’.
>     
>     Abstract supertype of everything.
>     
>     This is a built-in type.
>     
>     [back]
>
> We could put buttons in the "Type" column, but I'm not sure it'd be
> better or worse (I'm worried about turning everything into a button).

FWIW I'd vote for buttonify every type in our *Help* buffer.

  Andrea





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 17:17         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-08 17:27           ` Andrea Corallo
@ 2024-04-08 18:36           ` Eli Zaretskii
  2024-04-08 20:57             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  1 sibling, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-04-08 18:36 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 70077

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 70077@debbugs.gnu.org,  acm@muc.de,  yantar92@posteo.net
> Date: Mon, 08 Apr 2024 13:17:37 -0400
> 
> >             Name    Type    Default
> >             ————    ————    ———————
> >             beg     t       (point-max)
> >             end     t       (point-min)
> >             before  t       nil
> >             next    t       nil
> >
> >   BEG..END is the area that was changed and BEFORE is its previous
> >   content[...]
> 
> OK, I'll switch the two, thanks.
> 
> > (Btw, those "t" under "Type" are also somewhat mysterious.  What do
> > they signify?)
> 
> `C-h o t RET` says:
> 
>     t’s value is t
>     
>     Not documented as a variable.
>     
>       Probably introduced at or before Emacs version 16.
>     
>     
>     
>     t is also a type.
>     
>     t is a type (of kind ‘built-in-class’).
>      Children ‘sequence’, ‘atom’.
>     
>     Abstract supertype of everything.
>     
>     This is a built-in type.

If this indicates that the slots are of built-in-class type, why do we
show the cryptic t there?

> >> >> +By default SIGNAL is called as soon as convenient after a change, which is
> >> >                                ^^^^^^^^^^^^^^^^^^^^^
> >> > "as soon as it's convenient", I presume?
> >> Other than the extra " it's", what is the difference?
> > Nothing.  I indeed thing "it's" is missing there.
> 
> My local native-English representative says that "both work fine"
> (somewhat dismissively, I must add).

In that case I'll yield, but do note that it got me stumbled.

> > In general, when I want to create a clean slate, I don't care too much
> > about the dirt I remove.  Why is it important to signal errors because
> > a state I am dumping had some errors?
> 
> I don't understand why you think it will signal an error?

Doesn't cl-assert signal an error if the condition is false?

> More to the point, it should signal an error only if I made a mistake in
> `track-changes.el` or if you messed with the internals.

I have the latter possibility in mind, yes.  Why catch me doing that
when I'm cleaning up my mess, _after_ all the damage, such as it is,
was already done?

> >> >> +;;;; Extra candidates for the API.
> >> >> +;; This could be a good alternative to using a temp-buffer like I used in
> >> >                                                                    ^^^^^^
> >> > "I"?
> >> Yes, that refers to the code I wrote.
> > We don't usually leave such style in long-term comments and
> > documentation.
> 
> `grep " I " lisp/**/*.el` suggests otherwise.

"A journey of a thousand miles begins with one first step."





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 15:24     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-08 15:53       ` Eli Zaretskii
@ 2024-04-08 20:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-09  3:56         ` Eli Zaretskii
  1 sibling, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-08 20:45 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 70077

[-- Attachment #1: Type: text/plain, Size: 165 bytes --]

>> Last, but not least: this needs suitable changes in NEWS and ELisp
>> manual.
> Working on it.

Here it is (and aso on `scratch/track-changes`).


        Stefan

[-- Attachment #2: 0001-lisp-emacs-lisp-track-changes.el-New-file.patch --]
[-- Type: text/x-diff, Size: 61054 bytes --]

From b676b0ff3f046a1456a433a4b7741599c7ae4714 Mon Sep 17 00:00:00 2001
From: Stefan Monnier <monnier@iro.umontreal.ca>
Date: Fri, 5 Apr 2024 17:37:32 -0400
Subject: [PATCH] lisp/emacs-lisp/track-changes.el: New file

This new package provides an API that is easier to use right than
our `*-change-functions` hooks.

The patch includes changes to `diff-mode.el` and `eglot.el` to
make use of this new package.

* lisp/emacs-lisp/track-changes.el: New file.
* test/lisp/emacs-lisp/track-changes-tests.el: New file.
* doc/lispref/text.texi (Tracking changes): New subsection.

* lisp/progmodes/eglot.el: Require `track-changes`.
(eglot--virtual-pos-to-lsp-position): New function.
(eglot--track-changes): New var.
(eglot--managed-mode): Use `track-changes-register` i.s.o
`after/before-change-functions` when available.
(eglot--track-changes-signal): New function, partly extracted from
`eglot--after-change`.
(eglot--after-change): Use it.
(eglot--track-changes-fetch): New function.
(eglot--signal-textDocument/didChange): Use it.

* lisp/vc/diff-mode.el: Require `track-changes`.
Also require `easy-mmode` before the `eval-when-compile`s.
(diff-unhandled-changes): Delete variable.
(diff-after-change-function): Delete function.
(diff--track-changes-function): Rename from `diff-post-command-hook`
and adjust to new calling convention.
(diff--track-changes): New variable.
(diff--track-changes-signal): New function.
(diff-mode, diff-minor-mode): Use it with `track-changes-register`.
---
 doc/lispref/text.texi                       | 141 +++++
 etc/NEWS                                    |  11 +
 lisp/emacs-lisp/track-changes.el            | 599 ++++++++++++++++++++
 lisp/progmodes/eglot.el                     |  64 ++-
 lisp/vc/diff-mode.el                        |  85 ++-
 test/lisp/emacs-lisp/track-changes-tests.el | 156 +++++
 6 files changed, 1003 insertions(+), 53 deletions(-)
 create mode 100644 lisp/emacs-lisp/track-changes.el
 create mode 100644 test/lisp/emacs-lisp/track-changes-tests.el

diff --git a/doc/lispref/text.texi b/doc/lispref/text.texi
index 18f0ee88fe5..2875f6f6ba8 100644
--- a/doc/lispref/text.texi
+++ b/doc/lispref/text.texi
@@ -6375,3 +6375,144 @@ Change Hooks
 use @code{combine-change-calls} or @code{combine-after-change-calls}
 instead.
 @end defvar
+
+@node Tracking changes
+@subsection Tracking changes
+@cindex track-changes
+
+Using @code{before-change-functions} and @code{after-change-functions}
+can be difficult in practice because of a number of pitfalls, such as
+the fact that the two calls are not always properly paired, or some
+calls may be missing, either because of bugs in the C code or because of
+inappropriate use of @code{inhibit-modification-hooks}.  Furthermore,
+many restrictions apply to those hook functions, such as the fact that
+they basically should never modify the current buffer, nor use an
+operation that may block, and they proceed quickly because
+some commands may call these hooks a large number of times.
+
+The Track-Changes library fundamentally provides an alternative API,
+built on top of those hooks.  Compared to @code{after-change-functions},
+the first important difference is that, instead of providing the bounds
+of the change and the previous length, it provides the bounds of the
+change and the actual previous content of that region.  The need to
+extract information from the original contents of the buffer is one of
+the main reasons why some packages need to use both
+@code{before-change-functions} and @code{after-change-functions} and
+then try to match them up.
+
+The second difference is that it decouples the notification of a change
+from the act of processing it, and it automatically combines into
+a single change operation all the changes that occur between the first
+change and the actual processing.  This makes it natural and easy to
+process the changes at a larger granularity, such as once per command,
+and eliminates most of the restrictions that apply to the usual change
+hook functions, making it possible to use blocking operations or to
+modify the buffer
+
+The start tracking changes, you have to call
+@code{track-changes-register}, passing it a @var{signal} function as
+argument.  This will return a tracker @var{id} which is used to identify
+your tracker to the other functions of the library.  The other main
+function of the library is @code{track-changes-fetch} which lets you
+fetch the changes you have not yet processed.
+
+When the buffer is modified, the library will call the @var{signal}
+function to inform you of that change and will immediately start
+accumulating subsequent changes into a single combined change.
+The @var{signal} function serves only to warn that a modification
+occurred but does not receive a description of the change.  Also the
+library will not call it again until after you processed
+the change.
+
+To process changes, you need to call @code{track-changes-fetch}, which
+will provide you with the bounds of the changes accumulated since the
+last call, as well as the previous content of that region.  It will also
+``re-arm'' the @var{signal} function so that the library will call it
+again after the next buffer modification.
+
+@defun track-changes-register signal &key nobefore disjoint immediate
+This function creates a new @emph{tracker}.  Trackers are kept abstract,
+so we refer to them as mere identities, and the function thus returns
+the tracker's @var{id}.
+
+@var{signal} is a function that the library will call to notify of
+a change.  It will sometimes call it with a single argument and
+sometimes with two.  Upon the first change to the buffer since this
+tracker last called @code{track-changes-fetch}, the library calls this
+@var{signal} function with a single argument holding the @var{id} of
+the tracker.
+
+By default, the call to the @var{signal} function does not happen
+immediately, but is instead postponed with a 0 seconds timer.  This is
+usually desired to make sure the @var{signal} function is not called too
+frequently and runs in a permissive context, freeing the client from
+performance concerns or worries about which operations might be
+problematic.  If a client wants to have more control, they can provide
+a non-nil value as the @var{immediate} argument in which case the
+library will call the @var{signal} function directly from
+@code{after-change-functions}.  Beware that it means that the
+@var{signal} function has to be careful not to modify the buffer or use
+operations that may block.
+
+If you're not interested in the actual previous content of the buffer,
+but are using this library only for its ability to combine many small
+changes into a larger one and to delay the processing to a more
+convenient time, you can specify a non-nil value for the @var{before}
+argument.  This will make it so the library provides you only with the
+length of the previous content, just like
+@code{after-change-functions}.  It will also allow the library to save
+some work.
+
+While you may like to accumulate many small changes into larger ones,
+you may not want to do that if the changes are too far apart.  If you
+specify a non-nil value for the @var{disjoint} argument, the library
+will let you know when a change is about to occur ``far'' from the
+currently pending ones by calling the @var{signal} function right away,
+passing it two arguments this time: the @var{id} of the tracker, and the
+number of characters that separates the upcoming change from the
+already pending changes.  This in itself does not prevent combining this
+new change with the previous ones, so if you think the upcoming change
+is indeed too far, you need to call @code{track-change-fetch}
+right away.
+Beware that when the @var{signal} function is called because of
+a disjoint change, this happens directly from
+@code{before-change-functions}, so the usual restrictions apply about
+modifying the buffer or using operations that may block.
+@end defun
+
+@defun track-changes-fetch id func
+This is the function that lets you find out what has changed in the
+buffer.  By providing the tracker @var{id} you let the library figure
+out which changes have already been seen by your tracker.  Instead of
+returning a description of the changes, @code{track-changes-fetch} calls
+the @var{func} function with that description in the form of
+3 arguments: @var{beg}, @var{end}, and @var{before}, where
+@code{@var{beg}..@var{end}} delimit the region that was modified and
+@var{before} describes the previous content of that region.
+Usually @var{before} is a string containing the previous text of the
+modified region, but if you specified a non-nil @var{nobefore} argument
+to @code{track-changes-register}, then it is replaced by the number of
+characters of that previous text.
+
+In case no changes occurred since the last call,
+@code{track-changes-fetch} simply does not call @var{func} and returns
+nil.  If changes did occur, it calls @var{func} and returns the value
+returned by @var{func}.  But note that @var{func} is called just once
+regardless of how many changes occurred: those are summarized into
+a single @var{beg}/@var{end}/@var{before} triplet.
+
+Once @var{func} finishes, @code{track-changes-fetch} re-enables the
+@var{signal} function so that it will be called the next time a change
+occurs.  This is the reason why it calls @var{func} instead of returning
+a description: it makes sure that the @var{signal} function will not be
+called while you're still processing past changes.
+@end defun
+
+@defun track-changes-unregister id
+This function tells the library that the tracker @var{id} does not need
+to know about buffer changes any more.  Most clients will never want to
+stop tracking changes, but for clients such as minor modes, it is
+important to call this function when the minor mode is disabled,
+otherwise the tracker will keep accumulating changes and consume more
+and more resources.
+@end defun
diff --git a/etc/NEWS b/etc/NEWS
index b2543ae77d9..d85b65abd0b 100644
--- a/etc/NEWS
+++ b/etc/NEWS
@@ -1569,6 +1569,17 @@ This allows disabling JavaScript in xwidget Webkit sessions.
 \f
 * New Modes and Packages in Emacs 30.1
 
+** New package Track-Changes.
+This library is a layer of abstraction above 'before-change-functions'
+and 'after-change-functions' which provides a superset of
+the functionality of 'after-change-functions':
+- It provides the actual previous text rather than only its length.
+- It takes care of accumulating and bundling changes until a time when
+  its client finds it convenient to react to them.
+- It detects most cases where some changes were not properly
+  reported (calls to 'before/after-change-functions' that are
+  incorrectly paired, missing, etc...) and reports them adequately.
+
 ** New major modes based on the tree-sitter library
 
 +++
diff --git a/lisp/emacs-lisp/track-changes.el b/lisp/emacs-lisp/track-changes.el
new file mode 100644
index 00000000000..fef74074582
--- /dev/null
+++ b/lisp/emacs-lisp/track-changes.el
@@ -0,0 +1,599 @@
+;;; track-changes.el --- API to react to buffer modifications  -*- lexical-binding: t; -*-
+
+;; Copyright (C) 2024  Free Software Foundation, Inc.
+
+;; Author: Stefan Monnier <monnier@iro.umontreal.ca>
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;; This library is a layer of abstraction above `before-change-functions'
+;; and `after-change-functions' which takes care of accumulating changes
+;; until a time when its client finds it convenient to react to them.
+;;
+;; It provides an API that is easier to use correctly than our
+;; `*-change-functions' hooks.  Problems that it claims to solve:
+;;
+;; - Before and after calls are not necessarily paired.
+;; - The beg/end values don't always match.
+;; - There's usually only one call to the hooks per command but
+;;   there can be thousands of calls from within a single command,
+;;   so naive users will tend to write code that performs poorly
+;;   in those rare cases.
+;; - The hooks are run at a fairly low-level so there are things they
+;;   really shouldn't do, such as modify the buffer or wait.
+;; - The after call doesn't get enough info to rebuild the before-change state,
+;;   so some callers need to use both before-c-f and after-c-f (and then
+;;   deal with the first two points above).
+;;
+;; The new API is almost like `after-change-functions' except that:
+;; - It provides the "before string" (i.e. the previous content of
+;;   the changed area) rather than only its length.
+;; - It can combine several changes into larger ones.
+;; - Clients do not have to process changes right away, instead they
+;;   can let changes accumulate (by combining them into a larger change)
+;;   until it is convenient for them to process them.
+;; - By default, changes are signaled at most once per command.
+
+;; The API consists in the following functions:
+;;
+;;     (track-changes-register SIGNAL &key NOBEFORE DISJOINT IMMEDIATE)
+;;     (track-changes-fetch ID FUNC)
+;;     (track-changes-unregister ID)
+;;
+;; A typical use case might look like:
+;;
+;;     (defvar my-foo--change-tracker nil)
+;;     (define-minor-mode my-foo-mode
+;;       "Fooing like there's no tomorrow."
+;;       (if (null my-foo-mode)
+;;           (when my-foo--change-tracker
+;;             (track-changes-unregister my-foo--change-tracker)
+;;             (setq my-foo--change-tracker nil))
+;;         (unless my-foo--change-tracker
+;;           (setq my-foo--change-tracker
+;;                 (track-changes-register
+;;                  (lambda (id)
+;;                    (track-changes-fetch
+;;                     id (lambda (beg end before)
+;;                          ..DO THE THING..))))))))
+
+;;; Code:
+
+(require 'cl-lib)
+
+;;;; Internal types and variables.
+
+(cl-defstruct (track-changes--tracker
+               (:noinline t)
+               (:constructor nil)
+               (:constructor track-changes--tracker ( signal state
+                                                      &optional
+                                                      nobefore immediate)))
+  signal state nobefore immediate)
+
+(cl-defstruct (track-changes--state
+               (:noinline t)
+               (:constructor nil)
+               (:constructor track-changes--state ()))
+  "Object holding a description of a buffer state.
+BEG..END is the area that was changed and BEFORE is its previous content.
+If the current buffer currently holds the content of the next state, you can
+get the contents of the previous state with:
+
+    (concat (buffer-substring (point-min) beg)
+                before
+                (buffer-substring end (point-max)))
+
+NEXT is the next state object (i.e. a more recent state).
+If NEXT is nil it means it's most recent state and it may be incomplete
+\(BEG/END/BEFORE may be nil), in which case those fields will take their
+values from `track-changes--before-(beg|end|before)' when the next
+state is create."
+  (beg (point-max))
+  (end (point-min))
+  (before nil)
+  (next nil))
+
+(defvar-local track-changes--trackers ()
+  "List of trackers currently registered in the buffer.")
+(defvar-local track-changes--clean-trackers ()
+  "List of trackers that are clean.
+Those are the trackers that get signaled when a change is made.")
+
+(defvar-local track-changes--disjoint-trackers ()
+ "List of trackers that want to react to disjoint changes.
+These trackers are signaled every time track-changes notices
+that some upcoming changes touch another \"distant\" part of the buffer.")
+
+(defvar-local track-changes--state nil)
+
+;; `track-changes--before-*' keep track of the content of the
+;; buffer when `track-changes--state' was cleaned.
+(defvar-local track-changes--before-beg 0
+  "Beginning position of the remembered \"before string\".")
+(defvar-local track-changes--before-end 0
+  "End position of the text replacing the \"before string\".")
+(defvar-local track-changes--before-string ""
+  "String holding some contents of the buffer before the current change.
+This string is supposed to cover all the already modified areas plus
+the upcoming modifications announced via `before-change-functions'.
+If all trackers are `nobefore', then this holds the `buffer-size' before
+the current change.")
+(defvar-local track-changes--before-no t
+  "If non-nil, all the trackers are `nobefore'.
+Should be equal to (memq #\\='track-changes--before before-change-functions).")
+
+(defvar-local track-changes--before-clean 'unset
+  "Status of `track-changes--before-*' vars.
+More specifically it indicates which \"before\" they hold.
+- nil: The vars hold the \"before\" info of the current state.
+- `unset': The vars hold the \"before\" info of some older state.
+  This is what it is set to right after creating a fresh new state.
+- `set': Like nil but the state is still clean because the buffer has not
+  been modified yet.  This is what it is set to after the first
+  `before-change-functions'  but before an `after-change-functions'.")
+
+(defvar-local track-changes--buffer-size nil
+  "Current size of the buffer, as far as this library knows.
+This is used to try and detect cases where buffer modifications are \"lost\".")
+
+;;;; Exposed API.
+
+(cl-defun track-changes-register ( signal &key nobefore disjoint immediate)
+  "Register a new tracker whose change-tracking function is SIGNAL.
+Return the ID of the new tracker.
+
+SIGNAL is a function that will be called with one argument (the tracker ID)
+after the current buffer is modified, so that it can react to the change.
+Once called, SIGNAL is not called again until `track-changes-fetch'
+is called with the corresponding tracker ID.
+
+If optional argument NOBEFORE is non-nil, it means that this tracker does
+not need the BEFORE strings (it will receive their size instead).
+
+If optional argument DISJOINT is non-nil, SIGNAL is called every time just
+before combining changes from \"distant\" parts of the buffer.
+This is needed when combining disjoint changes into one bigger change
+is unacceptable, typically for performance reasons.
+These calls are distinguished from normal calls by calling SIGNAL with
+a second argument which is the distance between the upcoming change and
+the previous changes.
+BEWARE: In that case SIGNAL is called directly from `before-change-functions'
+and should thus be extra careful: don't modify the buffer, don't call a function
+that may block, ...
+In order to prevent the upcoming change from being combined with the previous
+changes, SIGNAL needs to call `track-changes-fetch' before it returns.
+
+By default SIGNAL is called after a change via a 0 seconds timer.
+If optional argument IMMEDIATE is non-nil it means SIGNAL should be called
+as soon as a change is detected,
+BEWARE: In that case SIGNAL is called directly from `after-change-functions'
+and should thus be extra careful: don't modify the buffer, don't call a function
+that may block, do as little work as possible, ...
+When IMMEDIATE is non-nil, the SIGNAL should probably not always call
+`track-changes-fetch', since that would defeat the purpose of this library."
+  (when (and nobefore disjoint)
+    ;; FIXME: Without `before-change-functions', we can discover
+    ;; a disjoint change only after the fact, which is not good enough.
+    ;; But we could use a stripped down before-change-function,
+    (error "`disjoint' not supported for `nobefore' trackers"))
+  (track-changes--clean-state)
+  (unless nobefore
+    (setq track-changes--before-no nil)
+    (add-hook 'before-change-functions #'track-changes--before nil t))
+  (add-hook 'after-change-functions  #'track-changes--after  nil t)
+  (let ((tracker (track-changes--tracker signal track-changes--state
+                                         nobefore immediate)))
+    (push tracker track-changes--trackers)
+    (push tracker track-changes--clean-trackers)
+    (when disjoint
+      (push tracker track-changes--disjoint-trackers))
+    tracker))
+
+(defun track-changes-unregister (id)
+  "Remove the tracker denoted by ID.
+Trackers can consume resources (especially if `track-changes-fetch' is
+not called), so it is good practice to unregister them when you don't
+need them any more."
+  (unless (memq id track-changes--trackers)
+    (error "Unregistering a non-registered tracker: %S" id))
+  (setq track-changes--trackers (delq id track-changes--trackers))
+  (setq track-changes--clean-trackers (delq id track-changes--clean-trackers))
+  (setq track-changes--disjoint-trackers
+        (delq id track-changes--disjoint-trackers))
+  (when (cl-every #'track-changes--tracker-nobefore track-changes--trackers)
+    (setq track-changes--before-no t)
+    (remove-hook 'before-change-functions #'track-changes--before t))
+  (when (null track-changes--trackers)
+    (mapc #'kill-local-variable
+          '(track-changes--before-beg
+            track-changes--before-end
+            track-changes--before-string
+            track-changes--buffer-size
+            track-changes--before-clean
+            track-changes--state))
+    (remove-hook 'after-change-functions  #'track-changes--after  t)))
+
+(defun track-changes-fetch (id func)
+  "Fetch the pending changes for tracker ID pass them to FUNC.
+ID is the tracker ID returned by a previous `track-changes-register'.
+FUNC is a function.  It is called with 3 arguments (BEGIN END BEFORE)
+where BEGIN..END delimit the region that was changed since the last
+time `track-changes-fetch' was called and BEFORE is a string containing
+the previous content of that region (or just its length as an integer
+if the tracker ID was registered with the `nobefore' option).
+If track-changes detected that some changes were missed, then BEFORE will
+be the symbol `error' to indicate that the buffer got out of sync.
+This reflects a bug somewhere, so please report it when it happens.
+
+If no changes occurred since the last time, it doesn't call FUNC and
+returns nil, otherwise it returns the value returned by FUNC
+and re-enable the TRACKER corresponding to ID."
+  (cl-assert (memq id track-changes--trackers))
+  (unless (equal track-changes--buffer-size (buffer-size))
+    (track-changes--recover-from-error))
+  (let ((beg nil)
+        (end nil)
+        (before t)
+        (lenbefore 0)
+        (states ()))
+    ;; Transfer the data from `track-changes--before-string'
+    ;; to the tracker's state object, if needed.
+    (track-changes--clean-state)
+    ;; We want to combine the states from most recent to oldest,
+    ;; so reverse them.
+    (let ((state (track-changes--tracker-state id)))
+      (while state
+        (push state states)
+        (setq state (track-changes--state-next state))))
+
+    (cond
+     ((eq (car states) track-changes--state)
+      (cl-assert (null (track-changes--state-before (car states))))
+      (setq states (cdr states)))
+     (t
+      ;; The states are disconnected from the latest state because
+      ;; we got out of sync!
+      (cl-assert (eq (track-changes--state-before (car states)) 'error))
+      (setq beg (point-min))
+      (setq end (point-max))
+      (setq before 'error)
+      (setq states nil)))
+
+    (dolist (state states)
+      (let ((prevbeg (track-changes--state-beg state))
+            (prevend (track-changes--state-end state))
+            (prevbefore (track-changes--state-before state)))
+        (if (eq before t)
+            (progn
+              ;; This is the most recent change.  Just initialize the vars.
+              (setq beg prevbeg)
+              (setq end prevend)
+              (setq lenbefore
+                    (if (stringp prevbefore) (length prevbefore) prevbefore))
+              (setq before
+                    (unless (track-changes--tracker-nobefore id) prevbefore)))
+          (let ((endb (+ beg lenbefore)))
+            (when (< prevbeg beg)
+              (if (not before)
+                  (setq lenbefore (+ (- beg prevbeg) lenbefore))
+                (setq before
+                      (concat (buffer-substring-no-properties
+                               prevbeg beg)
+                              before))
+                (setq lenbefore (length before)))
+              (setq beg prevbeg)
+              (cl-assert (= endb (+ beg lenbefore))))
+            (when (< endb prevend)
+              (let ((new-end (+ end (- prevend endb))))
+                (if (not before)
+                    (setq lenbefore (+ lenbefore (- new-end end)))
+                  (setq before
+                        (concat before
+                                (buffer-substring-no-properties
+                                 end new-end)))
+                  (setq lenbefore (length before)))
+                (setq end new-end)
+                (cl-assert (= prevend (+ beg lenbefore)))
+                (setq endb (+ beg lenbefore))))
+            (cl-assert (<= beg prevbeg prevend endb))
+            ;; The `prevbefore' is covered by the new one.
+            (if (not before)
+                (setq lenbefore
+                      (+ (- prevbeg beg)
+                         (if (stringp prevbefore)
+                             (length prevbefore) prevbefore)
+                         (- endb prevend)))
+              (setq before
+                    (concat (substring before 0 (- prevbeg beg))
+                            prevbefore
+                            (substring before (- (length before)
+                                                 (- endb prevend)))))
+              (setq lenbefore (length before)))))))
+    (if (null beg)
+        (progn
+          (cl-assert (null states))
+          (cl-assert (memq id track-changes--clean-trackers))
+          (cl-assert (eq (track-changes--tracker-state id)
+                         track-changes--state))
+          ;; Nothing to do.
+          nil)
+      (cl-assert (<= (point-min) beg end (point-max)))
+      ;; Update the tracker's state *before* running `func' so we don't risk
+      ;; mistakenly replaying the changes in case `func' exits non-locally.
+      (setf (track-changes--tracker-state id) track-changes--state)
+      (unwind-protect (funcall func beg end (or before lenbefore))
+        ;; Re-enable the tracker's signal only after running `func', so
+        ;; as to avoid recursive invocations.
+        (cl-pushnew id track-changes--clean-trackers)))))
+
+;;;; Auxiliary functions.
+
+(defun track-changes--clean-state ()
+  (cond
+   ((null track-changes--state)
+    (cl-assert track-changes--before-clean)
+    (cl-assert (null track-changes--buffer-size))
+    ;; No state has been created yet.  Do it now.
+    (setq track-changes--buffer-size (buffer-size))
+    (when track-changes--before-no
+      (setq track-changes--before-string (buffer-size)))
+    (setq track-changes--state (track-changes--state)))
+   (track-changes--before-clean nil)
+   (t
+    (cl-assert (<= (track-changes--state-beg track-changes--state)
+                   (track-changes--state-end track-changes--state)))
+    (let ((actual-beg (track-changes--state-beg track-changes--state))
+          (actual-end (track-changes--state-end track-changes--state)))
+      (if track-changes--before-no
+          (progn
+            (cl-assert (integerp track-changes--before-string))
+            (setf (track-changes--state-before track-changes--state)
+                  (- track-changes--before-string
+                     (- (buffer-size) (- actual-end actual-beg))))
+            (setq track-changes--before-string (buffer-size)))
+        (cl-assert (<= track-changes--before-beg
+                       actual-beg actual-end
+                       track-changes--before-end))
+        (cl-assert (null (track-changes--state-before track-changes--state)))
+        ;; The `track-changes--before-*' vars can cover more text than the
+        ;; actually modified area, so trim it down now to the relevant part.
+        (unless (= (- track-changes--before-end track-changes--before-beg)
+                   (- actual-end actual-beg))
+          (setq track-changes--before-string
+                (substring track-changes--before-string
+                           (- actual-beg track-changes--before-beg)
+                           (- (length track-changes--before-string)
+                              (- track-changes--before-end actual-end))))
+          (setq track-changes--before-beg actual-beg)
+          (setq track-changes--before-end actual-end))
+        (setf (track-changes--state-before track-changes--state)
+              track-changes--before-string)))
+    ;; Note: We preserve `track-changes--before-*' because they may still
+    ;; be needed, in case `after-change-functions' are run before the next
+    ;; `before-change-functions'.
+    ;; Instead, we set `track-changes--before-clean' to `unset' to mean that
+    ;; `track-changes--before-*' can be reset at the next
+    ;; `before-change-functions'.
+    (setq track-changes--before-clean 'unset)
+    (let ((new (track-changes--state)))
+      (setf (track-changes--state-next track-changes--state) new)
+      (setq track-changes--state new)))))
+
+(defvar track-changes--disjoint-threshold 100
+  "Number of chars below which changes are not considered disjoint.")
+
+(defvar track-changes--error-log ()
+  "List of errors encountered.
+Each element is a triplet (BUFFER-NAME BACKTRACE RECENT-KEYS).")
+
+(defun track-changes--recover-from-error ()
+  ;; We somehow got out of sync.  This is usually the result of a bug
+  ;; elsewhere that causes the before-c-f and after-c-f to be improperly
+  ;; paired, or to be skipped altogether.
+  ;; Not much we can do, other than force a full re-synchronization.
+  (warn "Missing/incorrect calls to `before/after-change-functions'!!
+Details logged to `track-changes--error-log'")
+  (push (list (buffer-name)
+              (backtrace-frames 'track-changes--recover-from-error)
+              (recent-keys 'include-cmds))
+        track-changes--error-log)
+  (setq track-changes--before-clean 'unset)
+  (setq track-changes--buffer-size (buffer-size))
+  ;; Create a new state disconnected from the previous ones!
+  ;; Mark the previous one as junk, just to be clear.
+  (setf (track-changes--state-before track-changes--state) 'error)
+  (setq track-changes--state (track-changes--state)))
+
+(defun track-changes--before (beg end)
+  (cl-assert track-changes--state)
+  (cl-assert (<= beg end))
+  (let* ((size (- end beg))
+         (reset (lambda ()
+                  (cl-assert track-changes--before-clean)
+                  (setq track-changes--before-clean 'set)
+                  (setf track-changes--before-string
+                        (buffer-substring-no-properties beg end))
+                  (setf track-changes--before-beg beg)
+                  (setf track-changes--before-end end)))
+
+         (signal-if-disjoint
+          (lambda (pos1 pos2)
+            (let ((distance (- pos2 pos1)))
+              (when (> distance
+                       (max track-changes--disjoint-threshold
+                            ;; If the distance is smaller than the size of the
+                            ;; current change, then we may as well consider it
+                            ;; as "near".
+                            (length track-changes--before-string)
+                            size
+                            (- track-changes--before-end
+                               track-changes--before-beg)))
+                (dolist (tracker track-changes--disjoint-trackers)
+                  (funcall (track-changes--tracker-signal tracker)
+                           tracker distance))
+                ;; Return non-nil if the state was cleaned along the way.
+                track-changes--before-clean)))))
+
+    (if track-changes--before-clean
+        (progn
+          ;; Detect disjointness with previous changes here as well,
+          ;; so that if a client calls `track-changes-fetch' all the time,
+          ;; it doesn't prevent others from getting a disjointness signal.
+          (when (and track-changes--before-beg
+                     (let ((found nil))
+                       (dolist (tracker track-changes--disjoint-trackers)
+                         (unless (memq tracker track-changes--clean-trackers)
+                           (setq found t)))
+                       found))
+            ;; There's at least one `tracker' that wants to know about disjoint
+            ;; changes *and* it has unseen pending changes.
+            ;; FIXME: This can occasionally signal a tracker that's clean.
+            (if (< beg track-changes--before-beg)
+                (funcall signal-if-disjoint end track-changes--before-beg)
+              (funcall signal-if-disjoint track-changes--before-end beg)))
+          (funcall reset))
+      (cl-assert (save-restriction
+                   (widen)
+                   (<= (point-min)
+                       track-changes--before-beg
+                       track-changes--before-end
+                       (point-max))))
+      (when (< beg track-changes--before-beg)
+        (if (and track-changes--disjoint-trackers
+                 (funcall signal-if-disjoint end track-changes--before-beg))
+            (funcall reset)
+          (let* ((old-bbeg track-changes--before-beg)
+                 ;; To avoid O(N²) behavior when faced with many small changes,
+                 ;; we copy more than needed.
+                 (new-bbeg (min (max (point-min)
+                                     (- old-bbeg
+                                        (length track-changes--before-string)))
+                                beg)))
+            (setf track-changes--before-beg new-bbeg)
+            (cl-callf (lambda (old new) (concat new old))
+                track-changes--before-string
+              (buffer-substring-no-properties new-bbeg old-bbeg)))))
+
+      (when (< track-changes--before-end end)
+        (if (and track-changes--disjoint-trackers
+                 (funcall signal-if-disjoint track-changes--before-end beg))
+            (funcall reset)
+          (let* ((old-bend track-changes--before-end)
+                 ;; To avoid O(N²) behavior when faced with many small changes,
+                 ;; we copy more than needed.
+                 (new-bend (max (min (point-max)
+                                     (+ old-bend
+                                        (length track-changes--before-string)))
+                                end)))
+            (setf track-changes--before-end new-bend)
+            (cl-callf concat track-changes--before-string
+              (buffer-substring-no-properties old-bend new-bend))))))))
+
+(defun track-changes--after (beg end len)
+  (cl-assert track-changes--state)
+  (and (eq track-changes--before-clean 'unset)
+       (not track-changes--before-no)
+       ;; This can be a sign that a `before-change-functions' went missing,
+       ;; or that we called `track-changes--clean-state' between
+       ;; a `before-change-functions' and `after-change-functions'.
+       (track-changes--before beg end))
+  (setq track-changes--before-clean nil)
+  (let ((offset (- (- end beg) len)))
+    (cl-incf track-changes--before-end offset)
+    (cl-incf track-changes--buffer-size offset)
+    (if (not (or track-changes--before-no
+                 (save-restriction
+                   (widen)
+                   (<= (point-min)
+                       track-changes--before-beg
+                       beg end
+                       track-changes--before-end
+                       (point-max)))))
+        ;; BEG..END is not covered by previous `before-change-functions'!!
+        (track-changes--recover-from-error)
+      ;; Note the new changes.
+      (when (< beg (track-changes--state-beg track-changes--state))
+        (setf (track-changes--state-beg track-changes--state) beg))
+      (cl-callf (lambda (old-end) (max end (+ old-end offset)))
+          (track-changes--state-end track-changes--state))
+      (cl-assert (or track-changes--before-no
+                     (<= track-changes--before-beg
+                         (track-changes--state-beg track-changes--state)
+                         beg end
+                         (track-changes--state-end track-changes--state)
+                         track-changes--before-end)))))
+  (while track-changes--clean-trackers
+    (let ((tracker (pop track-changes--clean-trackers)))
+      (if (track-changes--tracker-immediate tracker)
+          (funcall (track-changes--tracker-signal tracker) tracker)
+        (run-with-timer 0 nil #'track-changes--call-signal
+                        (current-buffer) tracker)))))
+
+(defun track-changes--call-signal (buf tracker)
+  (when (buffer-live-p buf)
+    (with-current-buffer buf
+      ;; Silence ourselves if `track-changes-fetch' was called in the mean time.
+      (unless (memq tracker track-changes--clean-trackers)
+        (funcall (track-changes--tracker-signal tracker) tracker)))))
+
+;;;; Extra candidates for the API.
+
+;; This could be a good alternative to using a temp-buffer like I used in
+;; Eglot, since presumably we've just been changing this very area of the
+;; buffer, so the gap should be ready nearby,
+;; It may seem silly to go back to the previous state, since we could have
+;; used `before-change-functions' to run FUNC right then when we were in
+;; that state.  The advantage is that with track-changes we get to decide
+;; retroactively which state is the one for which we want to call FUNC and
+;; which BEG..END to use: when that state was current we may have known
+;; then that it would be "the one" but we didn't know what BEG and END
+;; should be because those depend on the changes that came afterwards.
+(defun track-changes--in-revert (beg end before func)
+  "Call FUNC with the buffer contents temporarily reverted to BEFORE.
+FUNC is called with no arguments and with point right after BEFORE.
+FUNC is not allowed to modify the buffer and it should refrain from using
+operations that use a cache populated from the buffer's content,
+such as `syntax-ppss'."
+  (catch 'track-changes--exit
+    (with-silent-modifications ;; This has to be outside `atomic-change-group'.
+      (atomic-change-group
+        (goto-char end)
+        (insert-before-markers before)
+        (delete-region beg end)
+        (throw 'track-changes--exit
+               (let ((inhibit-read-only nil)
+                     (buffer-read-only t))
+                 (funcall func)))))))
+
+(defun track-changes--reset (id)
+  "Mark all past changes as handled for tracker ID.
+Does not re-enable ID's signal."
+  (track-changes--clean-state)
+  (setf (track-changes--tracker-state id) track-changes--state))
+
+(defun track-changes--pending-p (id)
+  "Return non-nil if there are pending changes for tracker ID."
+  (not (memq id track-changes--clean-trackers)))
+
+(defmacro with--track-changes (id vars &rest body)
+  (declare (indent 2) (debug (form sexp body)))
+  `(track-changes-fetch ,id (lambda ,vars ,@body)))
+
+(provide 'track-changes)
+;;; track-changes.el end here.
diff --git a/lisp/progmodes/eglot.el b/lisp/progmodes/eglot.el
index 7f4284bf09d..478e7687bb3 100644
--- a/lisp/progmodes/eglot.el
+++ b/lisp/progmodes/eglot.el
@@ -110,6 +110,7 @@
 (require 'text-property-search nil t)
 (require 'diff-mode)
 (require 'diff)
+(require 'track-changes nil t)
 
 ;; These dependencies are also GNU ELPA core packages.  Because of
 ;; bug#62576, since there is a risk that M-x package-install, despite
@@ -1732,6 +1733,9 @@ eglot-utf-16-linepos
   "Calculate number of UTF-16 code units from position given by LBP.
 LBP defaults to `eglot--bol'."
   (/ (- (length (encode-coding-region (or lbp (eglot--bol))
+                                      ;; FIXME: How could `point' ever be
+                                      ;; larger than `point-max' (sounds like
+                                      ;; a bug in Emacs).
                                       ;; Fix github#860
                                       (min (point) (point-max)) 'utf-16 t))
         2)
@@ -1749,6 +1753,24 @@ eglot--pos-to-lsp-position
          :character (progn (when pos (goto-char pos))
                            (funcall eglot-current-linepos-function)))))
 
+(defun eglot--virtual-pos-to-lsp-position (pos string)
+  "Return the LSP position at the end of STRING if it were inserted at POS."
+  (eglot--widening
+   (goto-char pos)
+   (forward-line 0)
+   ;; LSP line is zero-origin; Emacs is one-origin.
+   (let ((posline (1- (line-number-at-pos nil t)))
+         (linebeg (buffer-substring (point) pos))
+         (colfun eglot-current-linepos-function))
+     ;; Use a temp buffer because:
+     ;; - I don't know of a fast way to count newlines in a string.
+     ;; - We currently don't have `eglot-current-linepos-function' for strings.
+     (with-temp-buffer
+       (insert linebeg string)
+       (goto-char (point-max))
+       (list :line (+ posline (1- (line-number-at-pos nil t)))
+             :character (funcall colfun))))))
+
 (defvar eglot-move-to-linepos-function #'eglot-move-to-utf-16-linepos
   "Function to move to a position within a line reported by the LSP server.
 
@@ -1946,6 +1968,8 @@ eglot-managed-mode-hook
   "A hook run by Eglot after it started/stopped managing a buffer.
 Use `eglot-managed-p' to determine if current buffer is managed.")
 
+(defvar-local eglot--track-changes nil)
+
 (define-minor-mode eglot--managed-mode
   "Mode for source buffers managed by some Eglot project."
   :init-value nil :lighter nil :keymap eglot-mode-map
@@ -1959,8 +1983,13 @@ eglot--managed-mode
       ("utf-8"
        (eglot--setq-saving eglot-current-linepos-function #'eglot-utf-8-linepos)
        (eglot--setq-saving eglot-move-to-linepos-function #'eglot-move-to-utf-8-linepos)))
-    (add-hook 'after-change-functions #'eglot--after-change nil t)
-    (add-hook 'before-change-functions #'eglot--before-change nil t)
+    (if (fboundp 'track-changes-register)
+        (unless eglot--track-changes
+          (setq eglot--track-changes
+           (track-changes-register
+            #'eglot--track-changes-signal :disjoint t)))
+      (add-hook 'after-change-functions #'eglot--after-change nil t)
+      (add-hook 'before-change-functions #'eglot--before-change nil t))
     (add-hook 'kill-buffer-hook #'eglot--managed-mode-off nil t)
     ;; Prepend "didClose" to the hook after the "nonoff", so it will run first
     (add-hook 'kill-buffer-hook #'eglot--signal-textDocument/didClose nil t)
@@ -1998,6 +2027,9 @@ eglot--managed-mode
                buffer
                (eglot--managed-buffers (eglot-current-server)))))
    (t
+    (when eglot--track-changes
+      (track-changes-unregister eglot--track-changes)
+      (setq eglot--track-changes nil))
     (remove-hook 'after-change-functions #'eglot--after-change t)
     (remove-hook 'before-change-functions #'eglot--before-change t)
     (remove-hook 'kill-buffer-hook #'eglot--managed-mode-off t)
@@ -2588,7 +2620,6 @@ eglot--document-changed-hook
 (defun eglot--after-change (beg end pre-change-length)
   "Hook onto `after-change-functions'.
 Records BEG, END and PRE-CHANGE-LENGTH locally."
-  (cl-incf eglot--versioned-identifier)
   (pcase (car-safe eglot--recent-changes)
     (`(,lsp-beg ,lsp-end
                 (,b-beg . ,b-beg-marker)
@@ -2616,6 +2647,29 @@ eglot--after-change
                `(,lsp-beg ,lsp-end ,pre-change-length
                           ,(buffer-substring-no-properties beg end)))))
     (_ (setf eglot--recent-changes :emacs-messup)))
+  (eglot--track-changes-signal nil))
+
+(defun eglot--track-changes-fetch (id)
+  (if (eq eglot--recent-changes :pending) (setq eglot--recent-changes nil))
+  (track-changes-fetch
+   id (lambda (beg end before)
+        (cond
+         ((eq eglot--recent-changes :emacs-messup) nil)
+         ((eq before 'error) (setf eglot--recent-changes :emacs-messup))
+         (t (push `(,(eglot--pos-to-lsp-position beg)
+                    ,(eglot--virtual-pos-to-lsp-position beg before)
+                    ,(length before)
+                    ,(buffer-substring-no-properties beg end))
+                  eglot--recent-changes))))))
+
+(defun eglot--track-changes-signal (id &optional distance)
+  (cl-incf eglot--versioned-identifier)
+  (cond
+   (distance (eglot--track-changes-fetch id))
+   (eglot--recent-changes nil)
+   ;; Note that there are pending changes, for the benefit of those
+   ;; who check it as a boolean.
+   (t (setq eglot--recent-changes :pending)))
   (when eglot--change-idle-timer (cancel-timer eglot--change-idle-timer))
   (let ((buf (current-buffer)))
     (setq eglot--change-idle-timer
@@ -2729,6 +2783,8 @@ eglot-handle-request
 (defun eglot--signal-textDocument/didChange ()
   "Send textDocument/didChange to server."
   (when eglot--recent-changes
+    (when eglot--track-changes
+      (eglot--track-changes-fetch eglot--track-changes))
     (let* ((server (eglot--current-server-or-lose))
            (sync-capability (eglot-server-capable :textDocumentSync))
            (sync-kind (if (numberp sync-capability) sync-capability
@@ -2750,7 +2806,7 @@ eglot--signal-textDocument/didChange
                    ;; empty entries in `eglot--before-change' calls
                    ;; without an `eglot--after-change' reciprocal.
                    ;; Weed them out here.
-                   when (numberp len)
+                   when (numberp len) ;FIXME: Not needed with `track-changes'.
                    vconcat `[,(list :range `(:start ,beg :end ,end)
                                     :rangeLength len :text text)]))))
       (setq eglot--recent-changes nil)
diff --git a/lisp/vc/diff-mode.el b/lisp/vc/diff-mode.el
index 66043059d14..0a618dc8f39 100644
--- a/lisp/vc/diff-mode.el
+++ b/lisp/vc/diff-mode.el
@@ -53,9 +53,10 @@
 ;; - Handle `diff -b' output in context->unified.
 
 ;;; Code:
+(require 'easy-mmode)
+(require 'track-changes)
 (eval-when-compile (require 'cl-lib))
 (eval-when-compile (require 'subr-x))
-(require 'easy-mmode)
 
 (autoload 'vc-find-revision "vc")
 (autoload 'vc-find-revision-no-save "vc")
@@ -1431,38 +1432,23 @@ diff-write-contents-hooks
   (if (buffer-modified-p) (diff-fixup-modifs (point-min) (point-max)))
   nil)
 
-;; It turns out that making changes in the buffer from within an
-;; *-change-function is asking for trouble, whereas making them
-;; from a post-command-hook doesn't pose much problems
-(defvar diff-unhandled-changes nil)
-(defun diff-after-change-function (beg end _len)
-  "Remember to fixup the hunk header.
-See `after-change-functions' for the meaning of BEG, END and LEN."
-  ;; Ignoring changes when inhibit-read-only is set is strictly speaking
-  ;; incorrect, but it turns out that inhibit-read-only is normally not set
-  ;; inside editing commands, while it tends to be set when the buffer gets
-  ;; updated by an async process or by a conversion function, both of which
-  ;; would rather not be uselessly slowed down by this hook.
-  (when (and (not undo-in-progress) (not inhibit-read-only))
-    (if diff-unhandled-changes
-	(setq diff-unhandled-changes
-	      (cons (min beg (car diff-unhandled-changes))
-		    (max end (cdr diff-unhandled-changes))))
-      (setq diff-unhandled-changes (cons beg end)))))
-
-(defun diff-post-command-hook ()
-  "Fixup hunk headers if necessary."
-  (when (consp diff-unhandled-changes)
-    (ignore-errors
-      (save-excursion
-	(goto-char (car diff-unhandled-changes))
-	;; Maybe we've cut the end of the hunk before point.
-	(if (and (bolp) (not (bobp))) (backward-char 1))
-	;; We used to fixup modifs on all the changes, but it turns out that
-	;; it's safer not to do it on big changes, e.g. when yanking a big
-	;; diff, or when the user edits the header, since we might then
-	;; screw up perfectly correct values.  --Stef
-	(diff-beginning-of-hunk t)
+(defvar-local diff--track-changes nil)
+
+(defun diff--track-changes-signal (tracker)
+  (cl-assert (eq tracker diff--track-changes))
+  (track-changes-fetch tracker #'diff--track-changes-function))
+
+(defun diff--track-changes-function (beg end _before)
+  (with-demoted-errors "%S"
+    (save-excursion
+      (goto-char beg)
+      ;; Maybe we've cut the end of the hunk before point.
+      (if (and (bolp) (not (bobp))) (backward-char 1))
+      ;; We used to fixup modifs on all the changes, but it turns out that
+      ;; it's safer not to do it on big changes, e.g. when yanking a big
+      ;; diff, or when the user edits the header, since we might then
+      ;; screw up perfectly correct values.  --Stef
+      (when (ignore-errors (diff-beginning-of-hunk t))
         (let* ((style (if (looking-at "\\*\\*\\*") 'context))
                (start (line-beginning-position (if (eq style 'context) 3 2)))
                (mid (if (eq style 'context)
@@ -1470,17 +1456,16 @@ diff-post-command-hook
                           (re-search-forward diff-context-mid-hunk-header-re
                                              nil t)))))
           (when (and ;; Don't try to fixup changes in the hunk header.
-                 (>= (car diff-unhandled-changes) start)
+                 (>= beg start)
                  ;; Don't try to fixup changes in the mid-hunk header either.
                  (or (not mid)
-                     (< (cdr diff-unhandled-changes) (match-beginning 0))
-                     (> (car diff-unhandled-changes) (match-end 0)))
+                     (< end (match-beginning 0))
+                     (> beg (match-end 0)))
                  (save-excursion
-		(diff-end-of-hunk nil 'donttrustheader)
+		   (diff-end-of-hunk nil 'donttrustheader)
                    ;; Don't try to fixup changes past the end of the hunk.
-                   (>= (point) (cdr diff-unhandled-changes))))
-	  (diff-fixup-modifs (point) (cdr diff-unhandled-changes)))))
-      (setq diff-unhandled-changes nil))))
+                   (>= (point) end)))
+	    (diff-fixup-modifs (point) end)))))))
 
 (defun diff-next-error (arg reset)
   ;; Select a window that displays the current buffer so that point
@@ -1560,9 +1545,8 @@ diff-mode
   ;; setup change hooks
   (if (not diff-update-on-the-fly)
       (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
-    (make-local-variable 'diff-unhandled-changes)
-    (add-hook 'after-change-functions #'diff-after-change-function nil t)
-    (add-hook 'post-command-hook #'diff-post-command-hook nil t))
+    (setq diff--track-changes
+          (track-changes-register #'diff--track-changes-signal :nobefore t)))
 
   ;; add-log support
   (setq-local add-log-current-defun-function #'diff-current-defun)
@@ -1581,12 +1565,15 @@ diff-minor-mode
 \\{diff-minor-mode-map}"
   :group 'diff-mode :lighter " Diff"
   ;; FIXME: setup font-lock
-  ;; setup change hooks
-  (if (not diff-update-on-the-fly)
-      (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
-    (make-local-variable 'diff-unhandled-changes)
-    (add-hook 'after-change-functions #'diff-after-change-function nil t)
-    (add-hook 'post-command-hook #'diff-post-command-hook nil t)))
+  (when diff--track-changes (track-changes-unregister diff--track-changes))
+  (remove-hook 'write-contents-functions #'diff-write-contents-hooks t)
+  (when diff-minor-mode
+    (if (not diff-update-on-the-fly)
+        (add-hook 'write-contents-functions #'diff-write-contents-hooks nil t)
+      (unless diff--track-changes
+        (setq diff--track-changes
+              (track-changes-register #'diff--track-changes-signal
+                                      :nobefore t))))))
 
 ;;; Handy hook functions ;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
 
diff --git a/test/lisp/emacs-lisp/track-changes-tests.el b/test/lisp/emacs-lisp/track-changes-tests.el
new file mode 100644
index 00000000000..eab9197030f
--- /dev/null
+++ b/test/lisp/emacs-lisp/track-changes-tests.el
@@ -0,0 +1,156 @@
+;;; track-changes-tests.el --- tests for emacs-lisp/track-changes.el  -*- lexical-binding:t -*-
+
+;; Copyright (C) 2024  Free Software Foundation, Inc.
+
+;; This file is part of GNU Emacs.
+
+;; GNU Emacs is free software: you can redistribute it and/or modify
+;; it under the terms of the GNU General Public License as published by
+;; the Free Software Foundation, either version 3 of the License, or
+;; (at your option) any later version.
+
+;; GNU Emacs is distributed in the hope that it will be useful,
+;; but WITHOUT ANY WARRANTY; without even the implied warranty of
+;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+;; GNU General Public License for more details.
+
+;; You should have received a copy of the GNU General Public License
+;; along with GNU Emacs.  If not, see <https://www.gnu.org/licenses/>.
+
+;;; Commentary:
+
+;;; Code:
+
+(require 'track-changes)
+(require 'cl-lib)
+(require 'ert)
+
+(defun track-changes-tests--random-word ()
+  (let ((chars ()))
+    (dotimes (_ (1+ (random 12)))
+      (push (+ ?A (random (1+ (- ?z ?A)))) chars))
+    (apply #'string chars)))
+
+(defvar track-changes-tests--random-verbose nil)
+
+(defun track-changes-tests--message (&rest args)
+  (when track-changes-tests--random-verbose (apply #'message args)))
+
+(defvar track-changes-tests--random-seed
+  (let ((seed (number-to-string (random (expt 2 24)))))
+    (message "Random seed = %S" seed)
+    seed))
+
+(ert-deftest track-changes-tests--random ()
+  ;; Keep 2 buffers in sync with a third one as we make random
+  ;; changes to that 3rd one.
+  ;; We have 3 trackers: a "normal" one which we sync
+  ;; at random intervals, one which syncs via the "disjoint" signal,
+  ;; plus a third one which verifies that "nobefore" gets
+  ;; information consistent with the "normal" tracker.
+  (with-temp-buffer
+    (dotimes (_ 100)
+      (insert (track-changes-tests--random-word) "\n"))
+    (let* ((buf1 (generate-new-buffer " *tc1*"))
+           (buf2 (generate-new-buffer " *tc2*"))
+           (char-counts (make-vector 2 0))
+           (sync-counts (make-vector 2 0))
+           (print-escape-newlines t)
+           (file (make-temp-file "tc"))
+           (id1 (track-changes-register #'ignore))
+           (id3 (track-changes-register #'ignore :nobefore t))
+           (sync
+            (lambda (id buf n)
+              (track-changes-tests--message "!! SYNC %d !!" n)
+              (track-changes-fetch
+               id (lambda (beg end before)
+                    (when (eq n 1)
+                      (track-changes-fetch
+                       id3 (lambda (beg3 end3 before3)
+                             (should (eq beg3 beg))
+                             (should (eq end3 end))
+                             (should (eq before3
+                                         (if (symbolp before)
+                                             before (length before)))))))
+                    (cl-incf (aref sync-counts (1- n)))
+                    (cl-incf (aref char-counts (1- n)) (- end beg))
+                    (let ((after (buffer-substring beg end)))
+                      (track-changes-tests--message
+                       "Sync:\n    %S\n=>  %S\nat %d .. %d"
+                       before after beg end)
+                      (with-current-buffer buf
+                        (if (eq before 'error)
+                            (erase-buffer)
+                          (should (equal before
+                                         (buffer-substring
+                                          beg (+ beg (length before)))))
+                          (delete-region beg (+ beg (length before))))
+                        (goto-char beg)
+                        (insert after)))
+                    (should (equal (buffer-string)
+                                   (with-current-buffer buf
+                                     (buffer-string))))))))
+           (id2 (track-changes-register
+                 (lambda (id2 &optional distance)
+                   (when distance
+                     (track-changes-tests--message "Disjoint distance: %d"
+                                                   distance)
+                     (funcall sync id2 buf2 2)))
+                 :disjoint t)))
+      (write-region (point-min) (point-max) file)
+      (insert-into-buffer buf1)
+      (insert-into-buffer buf2)
+      (should (equal (buffer-hash) (buffer-hash buf1)))
+      (should (equal (buffer-hash) (buffer-hash buf2)))
+      (message "seeding with: %S" track-changes-tests--random-seed)
+      (random track-changes-tests--random-seed)
+      (dotimes (_ 1000)
+        (pcase (random 15)
+          (0
+           (track-changes-tests--message "Manual sync1")
+           (funcall sync id1 buf1 1))
+          (1
+           (track-changes-tests--message "Manual sync2")
+           (funcall sync id2 buf2 2))
+          ((pred (< _ 5))
+           (let* ((beg (+ (point-min) (random (1+ (buffer-size)))))
+                  (end (min (+ beg (1+ (random 100))) (point-max))))
+             (track-changes-tests--message "Fill %d .. %d" beg end)
+             (fill-region-as-paragraph beg end)))
+          ((pred (< _ 8))
+           (let* ((beg (+ (point-min) (random (1+ (buffer-size)))))
+                  (end (min (+ beg (1+ (random 12))) (point-max))))
+             (track-changes-tests--message "Delete %S at %d .. %d"
+                                           (buffer-substring beg end) beg end)
+             (delete-region beg end)))
+          ((and 8 (guard (= (random 50) 0)))
+           (track-changes-tests--message "Silent insertion")
+           (let ((inhibit-modification-hooks t))
+             (insert "a")))
+          ((and 8 (guard (= (random 10) 0)))
+           (track-changes-tests--message "Revert")
+           (insert-file-contents file nil nil nil 'replace))
+          ((and 8 (guard (= (random 3) 0)))
+           (let* ((beg (+ (point-min) (random (1+ (buffer-size)))))
+                  (end (min (+ beg (1+ (random 12))) (point-max)))
+                  (after (eq (random 2) 0)))
+             (track-changes-tests--message "Bogus %S %d .. %d"
+                                           (if after 'after 'before) beg end)
+             (if after
+                 (run-hook-with-args 'after-change-functions
+                                     beg end (- end beg))
+               (run-hook-with-args 'before-change-functions beg end))))
+          (_
+           (goto-char (+ (point-min) (random (1+ (buffer-size)))))
+           (let ((word (track-changes-tests--random-word)))
+             (track-changes-tests--message "insert %S at %d" word (point))
+             (insert  word "\n")))))
+      (message "SCOREs: default: %d/%d=%d     disjoint: %d/%d=%d"
+               (aref char-counts 0) (aref sync-counts 0)
+               (/ (aref char-counts 0) (aref sync-counts 0))
+               (aref char-counts 1) (aref sync-counts 1)
+               (/ (aref char-counts 1) (aref sync-counts 1))))))
+
+
+
+;;; track-changes-tests.el ends here
-- 
2.43.0


^ permalink raw reply related	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 18:36           ` Eli Zaretskii
@ 2024-04-08 20:57             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-09  4:10               ` Eli Zaretskii
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-08 20:57 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 70077

> If this indicates that the slots are of built-in-class type, why do we
> show the cryptic t there?

No, what it's saying is that these slots can contain values of any type
(since any type is a subtype of t).
This `Type` information is a way to document what kind of values can be
found in those slots.  Very often we don't bother specifying it, in
which case `t` is used as a default.

>> >> >> +By default SIGNAL is called as soon as convenient after a change, which is
>> >> >                                ^^^^^^^^^^^^^^^^^^^^^
>> >> > "as soon as it's convenient", I presume?
>> >> Other than the extra " it's", what is the difference?
>> > Nothing.  I indeed thing "it's" is missing there.
>> My local native-English representative says that "both work fine"
>> (somewhat dismissively, I must add).
> In that case I'll yield, but do note that it got me stumbled.

Wow.  To me this is just as natural as "as soon as possible", tho used
admittedly less frequently.

>> > In general, when I want to create a clean slate, I don't care too much
>> > about the dirt I remove.  Why is it important to signal errors because
>> > a state I am dumping had some errors?
>> I don't understand why you think it will signal an error?
> Doesn't cl-assert signal an error if the condition is false?

Yes.  What makes you think it will be false?

>> More to the point, it should signal an error only if I made a mistake in
>> `track-changes.el` or if you messed with the internals.
> I have the latter possibility in mind, yes.  Why catch me doing that
> when I'm cleaning up my mess, _after_ all the damage, such as it is,
> was already done?

But `track-changes--clean-state` is not a function to "clean up
the mess" any more than, say, `track-changes-fetch` or
`track-changes--before`.

>> >> >> +;;;; Extra candidates for the API.
>> >> >> +;; This could be a good alternative to using a temp-buffer like I used in
>> >> >                                                                    ^^^^^^
>> >> > "I"?
>> >> Yes, that refers to the code I wrote.
>> > We don't usually leave such style in long-term comments and
>> > documentation.
>> `grep " I " lisp/**/*.el` suggests otherwise.
> "A journey of a thousand miles begins with one first step."

I disagree with the goal, tho.
If you want, I can add something like "--Stef" at the end to clarify who
is this "I", tho nowadays I tend to rely on `C-x v h` to find that kind
of information.

The text there is just recording my thoughts about this part of the
design of the API.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 20:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-09  3:56         ` Eli Zaretskii
  2024-04-09 23:30           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Eli Zaretskii @ 2024-04-09  3:56 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 70077

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 70077@debbugs.gnu.org,  acm@muc.de,  yantar92@posteo.net
> Date: Mon, 08 Apr 2024 16:45:39 -0400
> 
> >> Last, but not least: this needs suitable changes in NEWS and ELisp
> >> manual.
> > Working on it.
> 
> Here it is (and aso on `scratch/track-changes`).

Thanks.

> +Using @code{before-change-functions} and @code{after-change-functions}
> +can be difficult in practice because of a number of pitfalls, such as
> +the fact that the two calls are not always properly paired, or some
> +calls may be missing, either because of bugs in the C code or because of
> +inappropriate use of @code{inhibit-modification-hooks}.

I don't think we should talk about bugs in C code in the manual, at
least not so explicitly.  I would rephrase

  the fact that the two calls are not always properly paired, or some
  calls may be missing, either because some Emacs primitives cannot
  properly pair them or because of incorrect use of
  @code{inhibit-modification-hooks}.

> +The start tracking changes, you have to call
   ^^^^^^^^^
"To start"

> +@code{track-changes-register}, passing it a @var{signal} function as
> +argument.  This will return a tracker @var{id} which is used to identify
> +your tracker to the other functions of the library.  The other main
> +function of the library is @code{track-changes-fetch} which lets you
> +fetch the changes you have not yet processed.

The last sentence is redundant, since you are about to describe
track-changes-fetch shortly.

> +When the buffer is modified, the library will call the @var{signal}
> +function to inform you of that change and will immediately start
> +accumulating subsequent changes into a single combined change.
> +The @var{signal} function serves only to warn that a modification
> +occurred but does not receive a description of the change.  Also the
> +library will not call it again until after you processed
> +the change.

The last sentence should IMO say "...until after you retrieved the
change by calling @code{track-changes-fetch}."  The important part
here is to say what "process" means in practice, instead of leaving it
unsaid.

> +To process changes, you need to call @code{track-changes-fetch}, which

That's not really "processing", that's "retrieval", right?  Processing
is what the program does after it retrieves the changes.

> +@defun track-changes-register signal &key nobefore disjoint immediate
> +This function creates a new @emph{tracker}.  Trackers are kept abstract,

I suggest to use "change tracker" instead of just "tracker".  On my
daytime job, "tracker" has a very different meaning, so I stumble each
time I see this used like that.

Also, I suggest to use @dfn for its markup (and add a @cindex for it
for good measure).

> +By default, the call to the @var{signal} function does not happen
> +immediately, but is instead postponed with a 0 seconds timer.
                                                ^^^^^^^^^^^^^^^
A cross-reference to where timers are described is in order there.

> +usually desired to make sure the @var{signal} function is not called too
> +frequently and runs in a permissive context, freeing the client from
> +performance concerns or worries about which operations might be
> +problematic.  If a client wants to have more control, they can provide
> +a non-nil value as the @var{immediate} argument in which case the
         ^^^
@code{nil}

> +If you're not interested in the actual previous content of the buffer,
> +but are using this library only for its ability to combine many small
> +changes into a larger one and to delay the processing to a more
> +convenient time, you can specify a non-nil value for the @var{before}
                                          ^^^
Likewise.

> +While you may like to accumulate many small changes into larger ones,
> +you may not want to do that if the changes are too far apart.  If you
> +specify a non-nil value for the @var{disjoint} argument, the library
                 ^^^
And likewise.

> +modified region, but if you specified a non-nil @var{nobefore} argument
                                               ^^^
And likewise.

> +In case no changes occurred since the last call,
> +@code{track-changes-fetch} simply does not call @var{func} and returns
> +nil.  If changes did occur, it calls @var{func} and returns the value
   ^^^
And likewise.

> +Once @var{func} finishes, @code{track-changes-fetch} re-enables the
> +@var{signal} function so that it will be called the next time a change
> +occurs.  This is the reason why it calls @var{func} instead of returning
> +a description: it makes sure that the @var{signal} function will not be
> +called while you're still processing past changes.

I think there's a subtler issue here that needs to be described
explicitly: if the entire processing of the change is not done inside
FUNC, there's no guarantee that by the time some other function
processes it, the change is still valid and in particular SIGNAL will
not have been called again.  This is not a trivial aspect, since a
program can use FUNC to do just some partial processing, like squirrel
the changes to some buffer for later processing.

>  * New Modes and Packages in Emacs 30.1
>  
> +** New package Track-Changes.

This should be marked with "+++".





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 20:57             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-09  4:10               ` Eli Zaretskii
  0 siblings, 0 replies; 50+ messages in thread
From: Eli Zaretskii @ 2024-04-09  4:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: acm, yantar92, 70077

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Cc: 70077@debbugs.gnu.org,  acm@muc.de,  yantar92@posteo.net
> Date: Mon, 08 Apr 2024 16:57:11 -0400
> 
> > If this indicates that the slots are of built-in-class type, why do we
> > show the cryptic t there?
> 
> No, what it's saying is that these slots can contain values of any type
> (since any type is a subtype of t).
> This `Type` information is a way to document what kind of values can be
> found in those slots.  Very often we don't bother specifying it, in
> which case `t` is used as a default.

Then maybe show "Any" or even "N/A".  t is simply not informative at
all.  Actually, it is counter-productive: it makes a simple issue look
complex and strange.

> >> > In general, when I want to create a clean slate, I don't care too much
> >> > about the dirt I remove.  Why is it important to signal errors because
> >> > a state I am dumping had some errors?
> >> I don't understand why you think it will signal an error?
> > Doesn't cl-assert signal an error if the condition is false?
> 
> Yes.  What makes you think it will be false?

If it should always be true, why have it there?

> >> More to the point, it should signal an error only if I made a mistake in
> >> `track-changes.el` or if you messed with the internals.
> > I have the latter possibility in mind, yes.  Why catch me doing that
> > when I'm cleaning up my mess, _after_ all the damage, such as it is,
> > was already done?
> 
> But `track-changes--clean-state` is not a function to "clean up
> the mess" any more than, say, `track-changes-fetch` or
> `track-changes--before`.

I give up (but I don't give in).

> >> >> > "I"?
> >> >> Yes, that refers to the code I wrote.
> >> > We don't usually leave such style in long-term comments and
> >> > documentation.
> >> `grep " I " lisp/**/*.el` suggests otherwise.
> > "A journey of a thousand miles begins with one first step."
> 
> I disagree with the goal, tho.
> If you want, I can add something like "--Stef" at the end to clarify who
> is this "I", tho nowadays I tend to rely on `C-x v h` to find that kind
> of information.

Yes, adding somee ID of "I" would be the least I'd like to have there.

> The text there is just recording my thoughts about this part of the
> design of the API.

The rewording to avoid saying "I" is very simple, but I won't insist.





^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-08 16:06     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-09 17:35       ` Ihor Radchenko
  2024-04-10  2:02         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Ihor Radchenko @ 2024-04-09 17:35 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Alan Mackenzie, 70077

Stefan Monnier <monnier@iro.umontreal.ca> writes:

>> ...
>> This docstring is a bit confusing.
>> If a state object is not the most recent, how come 
>>
>>> +    (concat (buffer-substring (point-min) beg)
>>> +                before
>>> +                (buffer-substring end (point-max)))
>>
>> produces the previous content?
>
> Because it says "If the current buffer currently holds the content of
> the next state".
> [ "current ... currently" wasn't my best moment, admittedly.  ]
>
>> And if the state object is the most recent, "it may be incomplete"...
>> So, when is it safe to use the above (concat ... ) call?
>
> You never want to use this call, it's only there to show in a concise
> manner how BEG/END/BEFORE relate and what information they're supposed
> to hold.

My point is that it is hard to understand that docstring for someone
reading it first. I did not mean that it does not make sense for someone
who is already familiar with the rest of the code.

I believe that it would help if you explain how to interpret the values
when there is a chain of state1 -> next -> next -> nil.

>> This is a bit confusing. The first paragraph says that SIGNAL is called
>> with a single argument, but that it appears that two arguments may be
>> passed. I'd rather tell the calling convention early in the docstring.
>
> The second convention is used only when DISJOINT is non-nil, which is
> why it's described where we document the effect of DISJOINT.
> An alternative could be to make the `disjoint` arg hold the function to call
> to signal disjoint changes.

Sure. My point is that the current formulation of the docstring might
lead to errors unless one is very careful when reading it.

>>> +  (unless nobefore
>>> +    (setq track-changes--before-no nil)
>>> +    (add-hook 'before-change-functions #'track-changes--before nil t))
>>> +  (add-hook 'after-change-functions  #'track-changes--after  nil t)
>> Note that all the changes made in indirect buffers will be missed.
>> See bug#60333.
>
> Yup.  And misuses of `inhibit-modification-hooks` or
> `with-silent-modifications`.  🙁

Indirect buffer is not at all a misuse. It is a perfectly valid use.
What I implied in my comment is that it would be nice to implement
something like what whitespace.el does in bug#60333.

This point is important from the point of view of Org mode because Org
mode uses a lot of indirect buffers both internally and as a recommended
workflow (org-tree-to-indirect-buffer).

>> ...
>> such state will not pass (< beg end) assertion in
>> `track-changes--clean-state' called in `track-changes-fetch' immediately
>> after `track-changes--recover-from-error'
>
> I can't reproduce that.  Do you have a recipe?
>
> AFAICT all the (< beg end) tests in `track-changes--clean-state` are
> conditional on `track-changes--before-clean` being nil, whereas
> `track-changes--recover-from-error` sets that var to `unset`.

You are right. That sneaky (track-changes--before-clean nil) line is so
sneaky that I'd like to see a comment highlighting that it exists ;)
I simply missed it and thought that `track-changes--before-clean' only
has two cond branches when I was reading the code.

>>> +      (atomic-change-group
>>> +        (goto-char end)
>>> +        (insert-before-markers before)
>>> +        (delete-region beg end)
>>
>> What happens if there are markers inside beg...end?
>
> During FUNC they're moved to BEG or END, and when we restore the
> original state, well... the undo machinery has some support to restore
> the markers where they were, but it's definitely not 100%.  🙁

Interesting. I did not know that markers are recovered by undo.

I had very bad experience with marker preservation when working with
other low-level APIs like `replace-match'. And previous Org mode
commiters had the same issues with kill/yank. We even have a dedicated
machinery in Org mode to save/restore markers in region. For example,
see `org-save-markers-in-region' `org-reinstall-markers-in-region'
(there are more in various places).

May undo be trusted? Or may it be more reliable to deploy a more manual
approach explicitly saving the markers and restoring them via
unwind-protect?

>> If you are using random values for edits, how can such test be
>> reproduced?
>
> Luck?

Such test is barely useful then, IMHO.

>> Maybe first generate a random seed and then log it, so that the
>> failing test can be repeated if necessary with seed assigned manually.
>
> Good idea.
> But my attempt (see patch below) failed.
> I'm not sure what I'm doing wrong, but
>
>     make test/lisp/emacs-lisp/track-changes-tests \
>          EMACS_EXTRAOPT="--eval '(setq track-changes-tests--random-seed \"15216888\")'"
>
> gives a different score each time.  🙁

May it be because you used different Emacs builds?

   Elisp manual 3.10 Random Numbers

   Sometimes you want the random number sequence to be repeatable. For
   example, when debugging a program whose behavior depends on the
   random number sequence, it is helpful to get the same behavior in
   each program run. To make the sequence repeat, execute ‘(random "")’.
   This sets the seed to a constant value for your particular Emacs
   executable (though it may differ for other Emacs builds). You can use
               ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
   other strings to choose various seed values.

As an alternative, you can use `cl-make-random-state' + `cl-random'.
They are less random AFAIU, but it is not a big deal in this particular
use case.

-- 
Ihor Radchenko // yantar92,
Org mode contributor,
Learn more about Org mode at <https://orgmode.org/>.
Support Org development at <https://liberapay.com/org-mode>,
or support my work at <https://liberapay.com/yantar92>

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-09  3:56         ` Eli Zaretskii
@ 2024-04-09 23:30           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  2024-04-13 13:44             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 1 reply; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-09 23:30 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 70077

>> +Using @code{before-change-functions} and @code{after-change-functions}
>> +can be difficult in practice because of a number of pitfalls, such as
>> +the fact that the two calls are not always properly paired, or some
>> +calls may be missing, either because of bugs in the C code or because of
>> +inappropriate use of @code{inhibit-modification-hooks}.
>
> I don't think we should talk about bugs in C code in the manual, at
> least not so explicitly.  I would rephrase
>
>   the fact that the two calls are not always properly paired, or some
>   calls may be missing, either because some Emacs primitives cannot
>   properly pair them or because of incorrect use of
>   @code{inhibit-modification-hooks}.

Thanks.

>> +@code{track-changes-register}, passing it a @var{signal} function as
>> +argument.  This will return a tracker @var{id} which is used to identify
>> +your tracker to the other functions of the library.  The other main
>> +function of the library is @code{track-changes-fetch} which lets you
>> +fetch the changes you have not yet processed.
>
> The last sentence is redundant, since you are about to describe
> track-changes-fetch shortly.

Good point.

>> +When the buffer is modified, the library will call the @var{signal}
>> +function to inform you of that change and will immediately start
>> +accumulating subsequent changes into a single combined change.
>> +The @var{signal} function serves only to warn that a modification
>> +occurred but does not receive a description of the change.  Also the
>> +library will not call it again until after you processed
>> +the change.
>
> The last sentence should IMO say "...until after you retrieved the
> change by calling @code{track-changes-fetch}."  The important part
> here is to say what "process" means in practice, instead of leaving it
> unsaid.

But then we get:

    ...until after you retrieved the
    change by calling @code{track-changes-fetch}.
    
    To process changes, you need to call @code{track-changes-fetch}, ...

which again is kind of redundant/heavy.

I went with:

    To start tracking changes, you have to call
    @code{track-changes-register}, passing it a @var{signal} function as
    argument.  This returns a tracker @var{id} which is used to
    identify your change tracker to the other functions of the library.
    When the buffer is modified, the library calls the @var{signal}
    function to inform you of that change and immediately starts
    accumulating subsequent changes into a single combined change.
    The @var{signal} function serves only to warn that a modification
    occurred but does not receive a description of the change.  Also the
    library will not call it again until after you retrieved the change.
    
    To retrieve changes, you need to call @code{track-changes-fetch},
    which provides you with the bounds of the changes accumulated
    since the last call, as well as the previous content of that region.
    It also ``re-arms'' the @var{signal} function so that the
    library will call it again after the next buffer modification.

[ Where I also switched to the present tense. Not sure why I used the
  future tense.  ]

>> +@defun track-changes-register signal &key nobefore disjoint immediate
>> +This function creates a new @emph{tracker}.  Trackers are kept abstract,
>
> I suggest to use "change tracker" instead of just "tracker".  On my
> daytime job, "tracker" has a very different meaning, so I stumble each
> time I see this used like that.
>
> Also, I suggest to use @dfn for its markup (and add a @cindex for it
> for good measure).

I turned some uses of "tracker" into "change tracker" and I think it's
indeed an improvement.  I tried to do it more systematically but with
all the "track-changes" and "change trackers" I started to feel like I was
writing a tongue twister.

>> +By default, the call to the @var{signal} function does not happen
>> +immediately, but is instead postponed with a 0 seconds timer.
>                                                 ^^^^^^^^^^^^^^^
> A cross-reference to where timers are described is in order there.

Thanks.

>> +usually desired to make sure the @var{signal} function is not called too
>> +frequently and runs in a permissive context, freeing the client from
>> +performance concerns or worries about which operations might be
>> +problematic.  If a client wants to have more control, they can provide
>> +a non-nil value as the @var{immediate} argument in which case the
>          ^^^
> @code{nil}

Duh.

>> +Once @var{func} finishes, @code{track-changes-fetch} re-enables the
>> +@var{signal} function so that it will be called the next time a change
>> +occurs.  This is the reason why it calls @var{func} instead of returning
>> +a description: it makes sure that the @var{signal} function will not be
>> +called while you're still processing past changes.
>
> I think there's a subtler issue here that needs to be described
> explicitly: if the entire processing of the change is not done inside
> FUNC, there's no guarantee that by the time some other function
> processes it, the change is still valid and in particular SIGNAL will
> not have been called again.  This is not a trivial aspect, since a
> program can use FUNC to do just some partial processing, like squirrel
> the changes to some buffer for later processing.

I tried to be more explicit about that as follows:

    Once @var{func} finishes, @code{track-changes-fetch} re-enables the
    @var{signal} function so that it will be called the next time a change
    occurs.  This is the reason why it calls @var{func} instead of returning
    a description: it lets you process the change without worrying about the
    risk that the @var{signal} function gets triggered in the middle of it,
    because the @var{signal} is re-enabled only after @var{func} finishes.

Note that calling SIGNAL before the change is "processed" is not
necessarily a problem.  E.g. Eglot does exactly that: its FUNC just
turns the NEWBEG/NEWEND/BEFORE into something like OLDBEG/OLDEND/AFTER
and pushes it on a list for later processing, it then waits for more
changes to come before later sending them to the server.

>>  * New Modes and Packages in Emacs 30.1
>> +** New package Track-Changes.
> This should be marked with "+++".

Thanks.

>> No, what it's saying is that these slots can contain values of any type
>> (since any type is a subtype of t).
>> This `Type` information is a way to document what kind of values can be
>> found in those slots.  Very often we don't bother specifying it, in
>> which case `t` is used as a default.
> Then maybe show "Any" or even "N/A".  t is simply not informative at
> all.  Actually, it is counter-productive: it makes a simple issue look
> complex and strange.

OK, I just used the empty string.


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-09 17:35       ` Ihor Radchenko
@ 2024-04-10  2:02         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-10  2:02 UTC (permalink / raw)
  To: Ihor Radchenko; +Cc: Alan Mackenzie, 70077

>> You never want to use this call, it's only there to show in a concise
>> manner how BEG/END/BEFORE relate and what information they're supposed
>> to hold.
> My point is that it is hard to understand that docstring for someone
> reading it first. I did not mean that it does not make sense for someone
> who is already familiar with the rest of the code.
> I believe that it would help if you explain how to interpret the values
> when there is a chain of state1 -> next -> next -> nil.

How 'bout:

      "Object holding a description of a buffer state.
    A buffer state is described by a BEG/END/BEFORE triplet which say how to
    recover that state from the next state.  I.e. if the buffer's contents
    reflects the next state, you can recover the previous state by replacing
    the BEG..END region with the BEFORE string.

    NEXT is the next state object (i.e. a more recent state).
    If NEXT is nil it means it's the most recent state and it may be incomplete
    \(BEG/END/BEFORE may be nil), in which case those fields will take their
    values from `track-changes--before-(beg|end|before)' when the next
    state is created."

>> The second convention is used only when DISJOINT is non-nil, which is
>> why it's described where we document the effect of DISJOINT.
>> An alternative could be to make the `disjoint` arg hold the function to call
>> to signal disjoint changes.
>
> Sure. My point is that the current formulation of the docstring might
> lead to errors unless one is very careful when reading it.

The Texinfo doc is more upfront about the fact that it's called
sometimes with one and sometimes with 2 args, but I feel that the
function's docstring is already too long so I'd rather not
repeat that here.

I feel that

    These calls are distinguished from normal calls by calling SIGNAL with
    a second argument which ...

does make it sufficiently clear that it's different.  Also, any errors
should be caught pretty quickly, so I'm not worried about seeing a lot of
broken code with latent bugs because the SIGNAL function doesn't accept
the right number of arguments.

>>>> +  (unless nobefore
>>>> +    (setq track-changes--before-no nil)
>>>> +    (add-hook 'before-change-functions #'track-changes--before nil t))
>>>> +  (add-hook 'after-change-functions  #'track-changes--after  nil t)
>>> Note that all the changes made in indirect buffers will be missed.
>>> See bug#60333.
>> Yup.  And misuses of `inhibit-modification-hooks` or
>> `with-silent-modifications`.  🙁
> Indirect buffer is not at all a misuse.

I didn't claim the opposite here (tho I do admit to thinking that it's
an attractive nuisance).  I only reminded that cases where changes are
missed are things that can and do happen because of various problems in
various pieces of code and indirect buffers are just one of those
sources of problems.

> What I implied in my comment is that it would be nice to implement
> something like what whitespace.el does in bug#60333.

I'll reply the same as in bug#60333: it should be fixed at a lower level.
I don't see any reason why it should be difficult to make it work right.

> This point is important from the point of view of Org mode because Org
> mode uses a lot of indirect buffers both internally and as a recommended
> workflow (org-tree-to-indirect-buffer).

Then I encourage you to try and fix it: rename `make-indirect-buffer` to
`internal-make-indirect-buffer`, define a new `make-indirect-buffer` in
ELisp which registers *-change-functions to propagate the changes to all
the related buffers.

Personally all the problems with indirect buffers put me off.

> You are right. That sneaky (track-changes--before-clean nil) line is so
> sneaky that I'd like to see a comment highlighting that it exists ;)
> I simply missed it and thought that `track-changes--before-clean' only
> has two cond branches when I was reading the code.

OK, I made it stand out more. 🙂

>> During FUNC they're moved to BEG or END, and when we restore the
>> original state, well... the undo machinery has some support to restore
>> the markers where they were, but it's definitely not 100%.  🙁
>
> Interesting. I did not know that markers are recovered by undo.

That's what the following entries are for:

    An entry (MARKER . DISTANCE) indicates that the marker MARKER
    was adjusted in position by the offset DISTANCE (an integer).

> I had very bad experience with marker preservation when working with
> other low-level APIs like `replace-match'. And previous Org mode
> commiters had the same issues with kill/yank.

`replace-match`, `kill`, `yank`, etc... can't do very much because they
have no idea how the new text relates to the old one, so without
additional application-specific information, the only thing they can do
is move the inner markers to one of the ends of the change.
E.g. when you move a subtree, *you* know that the text you insert is the
same text that you previously deleted, but the `insdel.c` primitives
don't know that.
In some cases you can get the desired behavior "automatically" by
using text-properties instead of markers, tho.

Undo is different because we're going back to a state for which Emacs
can know where the markers should be because we've been there.

> May undo be trusted?

I never looked closely to see how reliable it is.
AFAIK for the markers for which we don't call `move-marker`
it should not be too hard to make it reliable (if it's not already).
But for those on which we do call `move-marker`, I don't think the
current form of `buffer-undo-list` provides sufficient info to
know whether the marker should be "moved to its previous position"
or should be left alone (if `move-marker` was called in the mean
time, the user probably doesn't want that marker to move back).

We know this because the `mark` is a marker we move all the time
(instead of creating a new marker each time) and we've had a bug report
about it.  🙁

> Or may it be more reliable to deploy a more manual
> approach explicitly saving the markers and restoring them via
> unwind-protect?

We don't have a primitive to fetch all the markers in a given region.
But if the FUNC in `track-changes--in-revert` doesn't move any of the
markers, then the `buffer-undo-list` should be good enough (in theory at
least, so it depends on whether we have bugs in there).

>>> Maybe first generate a random seed and then log it, so that the
>>> failing test can be repeated if necessary with seed assigned manually.
>>
>> Good idea.
>> But my attempt (see patch below) failed.
>> I'm not sure what I'm doing wrong, but
>>
>>     make test/lisp/emacs-lisp/track-changes-tests \
>>          EMACS_EXTRAOPT="--eval '(setq track-changes-tests--random-seed \"15216888\")'"
>>
>> gives a different score each time.  🙁
>
> May it be because you used different Emacs builds?

Nope, just repeated invocations of the same Emacs binary.
It was much more silly than that: I seeded the random number generator
too late (after generating the initial buffer content).  🙂
It works now, thank you.

        Stefan

^ permalink raw reply	[flat|nested] 50+ messages in thread

* bug#70077: An easier way to track buffer changes
  2024-04-09 23:30           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
@ 2024-04-13 13:44             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
  0 siblings, 0 replies; 50+ messages in thread
From: Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors @ 2024-04-13 13:44 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: acm, yantar92, 70077-done

Pushed to master, closing,


        Stefan






^ permalink raw reply	[flat|nested] 50+ messages in thread

end of thread, other threads:[~2024-04-13 13:44 UTC | newest]

Thread overview: 50+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2024-03-29 16:15 bug#70077: An easier way to track buffer changes Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-29 18:12 ` Eli Zaretskii
2024-03-29 18:53   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30  6:34     ` Eli Zaretskii
2024-03-30 14:58       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30 16:45         ` Eli Zaretskii
2024-03-31  2:57           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-01 11:53         ` Ihor Radchenko
2024-04-01 14:51           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-01 17:49             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-02 14:22               ` Ihor Radchenko
2024-04-02 15:17                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-02 16:21                   ` Ihor Radchenko
2024-04-02 17:51                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-03 12:34                       ` Ihor Radchenko
2024-04-03 12:45                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-04 17:58                           ` Ihor Radchenko
2024-03-30  3:17   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30  5:09     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-29 22:20 ` phillip.lord
2024-03-29 22:59   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30  6:46     ` Eli Zaretskii
2024-03-30 12:06     ` phillip.lord
2024-03-30 13:39       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-03-30  9:51 ` Ihor Radchenko
2024-03-30 12:49   ` Eli Zaretskii
2024-03-30 13:19     ` Ihor Radchenko
2024-03-30 13:31       ` Eli Zaretskii
2024-03-30 14:09   ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-05 22:12 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-06  8:43   ` Eli Zaretskii
2024-04-08 15:24     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-08 15:53       ` Eli Zaretskii
2024-04-08 17:17         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-08 17:27           ` Andrea Corallo
2024-04-08 18:36           ` Eli Zaretskii
2024-04-08 20:57             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-09  4:10               ` Eli Zaretskii
2024-04-08 20:45       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-09  3:56         ` Eli Zaretskii
2024-04-09 23:30           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-13 13:44             ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-06 17:37   ` Dmitry Gutov
2024-04-06 19:44     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-07 14:40       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-07 15:47         ` Dmitry Gutov
2024-04-07 14:07   ` Ihor Radchenko
2024-04-08 16:06     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-04-09 17:35       ` Ihor Radchenko
2024-04-10  2:02         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).