From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Stefan Monnier via "Bug reports for GNU Emacs,
 the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
Newsgroups: gmane.emacs.bugs
Subject: bug#70077: An easier way to track buffer changes
Date: Sat, 30 Mar 2024 10:58:40 -0400
Message-ID: <jwv7chjvmho.fsf-monnier+emacs@gnu.org>
References: <jwvle615806.fsf@iro.umontreal.ca> <86frw8ewk9.fsf@gnu.org>
 <jwvle60x4d6.fsf-monnier+emacs@gnu.org> <86cyrcdy80.fsf@gnu.org>
Reply-To: Stefan Monnier <monnier@iro.umontreal.ca>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214";
	logging-data="20326"; mail-complaints-to="usenet@ciao.gmane.io"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cc: yantar92@posteo.net, 70077@debbugs.gnu.org, casouri@gmail.com,
 qhong@alum.mit.edu, frederic.bour@lakaban.net, joaotavora@gmail.com,
 mail@nicolasgoaziou.fr, acm@muc.de, stephen_leake@stephe-leake.org,
 alan.zimm@gmail.com, phillip.lord@russet.org.uk
To: Eli Zaretskii <eliz@gnu.org>
Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sat Mar 30 15:59:19 2024
Return-path: <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org>
Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org
Original-Received: from lists.gnu.org ([209.51.188.17])
	by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
	(Exim 4.92)
	(envelope-from <bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org>)
	id 1rqaAx-00054X-9E
	for geb-bug-gnu-emacs@m.gmane-mx.org; Sat, 30 Mar 2024 15:59:19 +0100
Original-Received: from localhost ([::1] helo=lists1p.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.90_1)
	(envelope-from <bug-gnu-emacs-bounces@gnu.org>)
	id 1rqaAk-0004S8-KB; Sat, 30 Mar 2024 10:59:06 -0400
Original-Received: from eggs.gnu.org ([2001:470:142:3::10])
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
 id 1rqaAe-0004Rq-L7
 for bug-gnu-emacs@gnu.org; Sat, 30 Mar 2024 10:59:00 -0400
Original-Received: from debbugs.gnu.org ([2001:470:142:5::43])
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128)
 (Exim 4.90_1) (envelope-from <Debian-debbugs@debbugs.gnu.org>)
 id 1rqaAe-0008QY-CU
 for bug-gnu-emacs@gnu.org; Sat, 30 Mar 2024 10:59:00 -0400
Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 (envelope-from <Debian-debbugs@debbugs.gnu.org>) id 1rqaAf-0004xu-Vf
 for bug-gnu-emacs@gnu.org; Sat, 30 Mar 2024 10:59:01 -0400
X-Loop: help-debbugs@gnu.org
Resent-From: Stefan Monnier <monnier@iro.umontreal.ca>
Original-Sender: "Debbugs-submit" <debbugs-submit-bounces@debbugs.gnu.org>
Resent-CC: bug-gnu-emacs@gnu.org
Resent-Date: Sat, 30 Mar 2024 14:59:01 +0000
Resent-Message-ID: <handler.70077.B70077.171181073419059@debbugs.gnu.org>
Resent-Sender: help-debbugs@gnu.org
X-GNU-PR-Message: followup 70077
X-GNU-PR-Package: emacs
X-GNU-PR-Keywords: patch
Original-Received: via spool by 70077-submit@debbugs.gnu.org id=B70077.171181073419059
 (code B ref 70077); Sat, 30 Mar 2024 14:59:01 +0000
Original-Received: (at 70077) by debbugs.gnu.org; 30 Mar 2024 14:58:54 +0000
Original-Received: from localhost ([127.0.0.1]:46015 helo=debbugs.gnu.org)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <debbugs-submit-bounces@debbugs.gnu.org>)
 id 1rqaAX-0004xL-Aj
 for submit@debbugs.gnu.org; Sat, 30 Mar 2024 10:58:53 -0400
Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:59256)
 by debbugs.gnu.org with esmtp (Exim 4.84_2)
 (envelope-from <monnier@iro.umontreal.ca>) id 1rqaAV-0004x6-H7
 for 70077@debbugs.gnu.org; Sat, 30 Mar 2024 10:58:52 -0400
Original-Received: from pmg3.iro.umontreal.ca (localhost [127.0.0.1])
 by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id AAFD5441506;
 Sat, 30 Mar 2024 10:58:43 -0400 (EDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca;
 s=mail; t=1711810721;
 bh=P1DGJ8aMWNYPARIfB+KpIiHJqtgrioIXFBM5CiRbo4w=;
 h=From:To:Cc:Subject:In-Reply-To:References:Date:From;
 b=em2LDzB5CYfPFTvTSBFwv+5F3DKRHp1Ib1gT0r8Cy5p5IypbqQkYBBypRzk7RfSyR
 i/GW9w1oZAmH6HUTAAsedFb90yVuNb1gJZT8D3bSPLHfztXlKwWkd6y07Oq6ju9d/z
 Ysk1cHZgtiNOS1PPeWvExUA+Bupiyrgqs+xrSxusoHQ5zLVPliWnV+WJTXxI3nFV8r
 qShktHowq3k5XzOU2vp+v5/U+yCmK4yL33XTb5UWGthvC64VYc7F9xmkTy/uPC6pNc
 PfCK9LhmCCFnUyJ8cd2NgjrlOofiGO54NJHygfQBJ3zUs8NcO9se5Qv6+3bop0NOe/
 ZadhanINy376g==
Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1])
 by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id 7FD21441421;
 Sat, 30 Mar 2024 10:58:41 -0400 (EDT)
Original-Received: from pastel (unknown [45.72.201.215])
 by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 296221201E7;
 Sat, 30 Mar 2024 10:58:41 -0400 (EDT)
In-Reply-To: <86cyrcdy80.fsf@gnu.org> (Eli Zaretskii's message of "Sat, 30 Mar
 2024 09:34:39 +0300")
X-BeenThere: debbugs-submit@debbugs.gnu.org
X-Mailman-Version: 2.1.18
Precedence: list
X-BeenThere: bug-gnu-emacs@gnu.org
List-Id: "Bug reports for GNU Emacs,
 the Swiss army knife of text editors" <bug-gnu-emacs.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/bug-gnu-emacs>,
 <mailto:bug-gnu-emacs-request@gnu.org?subject=unsubscribe>
List-Archive: <https://lists.gnu.org/archive/html/bug-gnu-emacs>
List-Post: <mailto:bug-gnu-emacs@gnu.org>
List-Help: <mailto:bug-gnu-emacs-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/bug-gnu-emacs>,
 <mailto:bug-gnu-emacs-request@gnu.org?subject=subscribe>
Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org
Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org
Xref: news.gmane.io gmane.emacs.bugs:282377
Archived-At: <http://permalink.gmane.org/gmane.emacs.bugs/282377>

> If the last point, i.e. the problems caused by limitations on what can
> be safely done from modification hooks, is basically the main
> advantage, then I think I understand the rationale.

That's one of the purposes but not the only one.

> Otherwise, the above looks like doing all the job in
> after-change-functions, and it is not clear to me how is that better,
> since if track-changes-fetch will fetch a series of changes,

`track-changes-fetch` will call its function argument only once.
If several changes happened since last time, `track-changes.el` will
summarize them into a single (BEG END BEFORE).

> deciding how to handle them could be much harder than handling them
> one by one when each one happens.  For example, the BEGIN..END values
> no longer reflect the current buffer contents,

Indeed, one of the purposes of the proposed API is to allow delaying the
handling to a later time, and if you keep individual changes, then it
becomes very difficult to make sense of those changes because they refer
to buffer states that aren't available any more.

That's why `track-changes.el` bundles those into a single (BEG END
BEFORE) change, which makes sure BEG/END are currently-valid buffer
positions and thus easy to use.
The previous buffer state is not directly available but can be
easily reconstructed from (BEG END BEFORE).

> Also, all of your examples seem to have the signal function just call
> track-changes-fetch and do almost nothing else, so I wonder why we
> need a separate function for that,

It gives more freedom to the clients.  For example it would allow
`eglot.el` to get rid of the `eglot--recent-changes` list of changes:
instead of calling `track-changes-fetch` directly from the signal and
collect the changes in `eglot--recent-changes`, it would delay the call
to `track-changes-fetch` to the idle-timer where we run
`eglot--signal-textDocument/didChange` so it would never need to send
more than a "single change" to the server.

Similarly, it would allow changing the specific moment the signal is
called without necessarily moving the moment the changes are performed.
This may be necessary in `eglot.el`: there are various places where we
check the boolean value of `eglot--recent-changes` to know if we're
in sync with the server.  In the current code this value because non-nil
as soon as a modification is made, whereas with my patch it becomes
non-nil only after the end of the command that made the first change.
I don't know yet if the difference is important, but if it is, then
we'd want to ask `track-changes.el` to send the signal more promptly
(there is a FIXME to add such a feature), although we would still want
to delay the `track-changes-fetch` so it's called only at the end of
the command.

> and more specifically what would be a use case where the registered
> signal function does NOT call track-changes-fetch, but does something
> else, and track-changes-fetch is then called outside of the
> signal function.

As mentioned, a use-case I imagine are cases where we want to delay the
processing of changes to an idle timer.  In the current `eglot.el` that
idle timer processes a sequence of changes, but for some use cases it
may too difficult (for the reasons discussed above: it can be difficult
to make sense of earlier changes once they're disconnected from the
corresponding buffer state), so they'd instead prefer to call
`track-changes-fetch` from the idle timer (and thus let
`track-changes.el` combine all those changes).

> Finally, the doc string of track-changes-register does not describe
> the exact place in the buffer-change sequence where the signal
> function will be called, which makes it harder to reason about it.
> Will it be called where we now call signal_after_change or somewhere
> else?

Good point.  Currently it's called via `funcall-later` which isn't in
`master` but can be thought of as `run-with-timer` with a 0 delay.
[ In reality it relies on `pending_funcalls` instead.  ]

But indeed, I have a FIXME to let the caller request to signal
as soon as possible, i.e. directly from the `after-change-functions`.

I think it's better to use something like `funcall-later` by default
than to signal directly from `after-change-functions` because most
coders don't realize that `after-change-functions` can be called
thousands of times for a single command (and in normal testing, they'll
probably never bump into such a case either).  So waiting for the end of
the current command (via `funcall-later`) provides a behavior closer to
what most naive coders expect, I believe, and will save them from the
corner cased they weren't aware of.

> And how do you guarantee that the signal function will not be
> called again until track-changes-fetch is called?

By removing the tracker from `track-changes--clean-trackers` (and
re-adding it once `track-changes-fetch` is finished, which is the main
reason I make `track-changes-fetch` call a function argument rather
than making it return (BEG END CHANGES)).

In a later email you wrote:

>>     (concat (buffer-substring (point-min) beg)
>>             before
>>             (buffer-substring end (point-max)))
>
> But if you get several changes, the above will need to be done in
> reverse order, back-to-front, no?

No, because you (the client) never get several changes.

>> I don't mean to suggest to do that, since it's costly for large
>> buffers
> Exactly.  But if the buffer _is_ large, then what to do?

It all depends on the specific cases.  E.g. in the case of `eglot.el` we
don't need the full content of the buffer before the change.  We only
really need to know how many newlines were in `before` as well as some
kind of length of the last line of `before`.  I compute that
by inserting `before` into a temp buffer.  Note that this is
proportional to the size of `before` and not to the total size of
the buffer.

If we want to do better, I think we'd then need a more complex API where
the client can specify more precisely (presumably via some kind of
function) what information we need to record about the "before state"
(and how to update that information as new changes are performed).

I don't have a good idea for what such an API could look like.

>> Also, it would fix only the problem of pairing, and not the other ones.
> So the main/only problem this mechanism solves is the lack of pairing
> between before/after calls to modification hooks?

No, the text you cite is saying the opposite: that we don't want to
solve only the pairing.  I hope/want it to solve all the
problems I mentioned:

    - before and after calls are not necessarily paired.
    - the beg/end values don't always match.
    - there can be thousands of calls from within a single command.
    - these hooks are run at a fairly low-level so there are things they
      really shouldn't do, such as modify the buffer or wait.
    - the after call doesn't get enough info to rebuild the before-change state,
      so some callers need to use both before-c-f and after-c-f (and then
      deal with the first two points above).

I don't claim it solves them all in a perfect way, tho.


        Stefan