From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#70077: An easier way to track buffer changes Date: Tue, 02 Apr 2024 11:17:41 -0400 Message-ID: References: <86frw8ewk9.fsf@gnu.org> <86cyrcdy80.fsf@gnu.org> <877chh8fjk.fsf@localhost> <87wmpfsv2y.fsf@localhost> Reply-To: Stefan Monnier Mime-Version: 1.0 Content-Type: multipart/mixed; boundary="=-=-=" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="30028"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: casouri@gmail.com, 70077@debbugs.gnu.org, qhong@alum.mit.edu, frederic.bour@lakaban.net, joaotavora@gmail.com, mail@nicolasgoaziou.fr, acm@muc.de, Eli Zaretskii , stephen_leake@stephe-leake.org, alan.zimm@gmail.com, phillip.lord@russet.org.uk To: Ihor Radchenko Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Tue Apr 02 17:18:34 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1rrfuD-0007bm-Ua for geb-bug-gnu-emacs@m.gmane-mx.org; Tue, 02 Apr 2024 17:18:34 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rrftk-0007FU-Px; Tue, 02 Apr 2024 11:18:04 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rrftf-0007Eb-GC for bug-gnu-emacs@gnu.org; Tue, 02 Apr 2024 11:18:00 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rrftf-00044U-73 for bug-gnu-emacs@gnu.org; Tue, 02 Apr 2024 11:17:59 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1rrfti-0004nd-Bb for bug-gnu-emacs@gnu.org; Tue, 02 Apr 2024 11:18:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Stefan Monnier Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 02 Apr 2024 15:18:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 70077 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 70077-submit@debbugs.gnu.org id=B70077.171207107618402 (code B ref 70077); Tue, 02 Apr 2024 15:18:02 +0000 Original-Received: (at 70077) by debbugs.gnu.org; 2 Apr 2024 15:17:56 +0000 Original-Received: from localhost ([127.0.0.1]:54983 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rrftc-0004mi-7u for submit@debbugs.gnu.org; Tue, 02 Apr 2024 11:17:56 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:27096) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rrfta-0004lr-IK for 70077@debbugs.gnu.org; Tue, 02 Apr 2024 11:17:55 -0400 Original-Received: from pmg1.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg1.iro.umontreal.ca (Proxmox) with ESMTP id D0D3C10005D; Tue, 2 Apr 2024 11:17:44 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1712071063; bh=eeNWoHWR/wUiP2mhIy2njzwEpHTZJYqdVZ1Aqxh3DhM=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=WLVOCjS47f//H5W3OgEUuyNMycF9xhOwgoxoqjVbuygzfIUopH+GopUUTods8cC2L arW30CBKKzkED8+qaCkzEUQHdfbxEhqeTWeD68fwCKbbiE6J7MpvZN8QDZRAS5ml2K TaO4fecFA7iOoBrVDalNpRlHC6H2AitWQ1bSGP7EKD7BNms228NGY+MIdLqDV6qVem vD3vc71tz/u8P8Z5PVstwBEMtEF9jsmUCakSuyWTXXuTAQWFO5ZmCYh1s+Uwdsrlv3 dktTtzDD0YAhQtGKJhD+B2W//ujVpY8wQjzpgqpqPgPDr4HFPjxj9P1ERFm1o8FsD4 HSL1B1adk6Oiw== Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg1.iro.umontreal.ca (Proxmox) with ESMTP id A7833100048; Tue, 2 Apr 2024 11:17:43 -0400 (EDT) Original-Received: from pastel (unknown [45.72.201.215]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 4972512082A; Tue, 2 Apr 2024 11:17:43 -0400 (EDT) In-Reply-To: <87wmpfsv2y.fsf@localhost> (Ihor Radchenko's message of "Tue, 02 Apr 2024 14:22:29 +0000") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:282519 Archived-At: --=-=-= Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable >> I'm not sure how to combine the benefits of combining small changes into >> larger ones with the benefits of keeping distant changes separate. > I am not sure if combining small changes into larger ones is at all a > good idea. In my experience, if the amount of work to be done "per change" is not completely trivial, combining small changes is indispensable (the worst part being that the cases where it's needed are sufficiently infrequent that naive users of `*-change-functions` won't know about that, so we end up with unacceptable slowdowns in corner cases). It also keeps the API simpler since the clients only need to care about at most one (BEG END BEFORE) item at any given time. > The changes are stored in strings, which get allocated and > re-allocated repeatedly. Indeed. Tho for N small changes, my code should perform only O(log N) re-allocations. > Repeated string allocations, especially when strings keep growing > towards the buffer size, is likely going to increase consing and make > GCs more frequent. Similar allocations presumably take place anyway while running the code (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be "lost in the noise". But admittedly for code which does not need the `before` string at all (such as `diff-mode.el`), it's indeed a waste of effort. I haven't been able to come up with an API which is still simple but without such costs. > Aside... > How nice would it be if buffer state and buffer text were persistent. > Like in https://code.visualstudio.com/blogs/2018/03/23/text-buffer-reimpl= ementation =F0=9F=99=82 >> We could expose a list of simultaneous (and thus disjoint) changes, >> which avoids the last problem. But it's a fair bit more work for us, it >> makes the API more complex for the clients, and it's rarely what the >> clients really want anyway. > FYI, Org parser maintains such a list. Could you point me to the relevant code (I failed to find it)? > We previously discussed a similar API in > https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/ IIUC this discusses a *sequence* of edits. In the point to which you replied I was discussing keeping a list of *simultaneous* edits. This said, the downside in both cases is that the specific data that we need from such a list tends to depend on the user. E.g. you suggest (BEG END_BEFORE END_AFTER COUNTER) but that is not sufficient reconstruct the corresponding buffer state, so things like Eglot/CRDT can't use it. Ideally for CRDT I think you'd want a sequence of (BEG END-BEFORE STRING-AFTER) but for Eglot this is not sufficient because Eglot needs to convert BEG and END_BEFORE into LSP positions (i.e. "line+col") and for that it needs to reproduce the past buffer state. So instead, what Eglot needs (and does indeed build using `*-change-functions`) is a sequence of (LSP-BEG LSP-END-BEFORE STRING-AFTER) [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure exactly why, but I guess it's a piece of redundant info the servers can use to sanity-check the data? ] >> But it did occur to me that we could solve the "disjoint changes" >> problem in the following way: signal the client (from >> `before-change-functions`) when a change is about to be made "far" from >> the currently pending changes. > This makes sense. The code changes were actually quite simple, so I have now included it. See my current PoC code below. Stefan --=-=-= Content-Type: application/emacs-lisp Content-Disposition: attachment; filename=track-changes.el Content-Transfer-Encoding: quoted-printable ;;; track-changes.el --- API to react to buffer modifications -*- lexical-= binding: t; -*- ;; Copyright (C) 2024 Free Software Foundation, Inc. ;; Author: Stefan Monnier ;; This file is part of GNU Emacs. ;; GNU Emacs is free software: you can redistribute it and/or modify ;; it under the terms of the GNU General Public License as published by ;; the Free Software Foundation, either version 3 of the License, or ;; (at your option) any later version. ;; GNU Emacs is distributed in the hope that it will be useful, ;; but WITHOUT ANY WARRANTY; without even the implied warranty of ;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the ;; GNU General Public License for more details. ;; You should have received a copy of the GNU General Public License ;; along with GNU Emacs. If not, see . ;;; Commentary: ;; This library is a layer of abstraction above `before-change-functions' ;; and `after-change-functions' which takes care of accumulating changes ;; until a time when its client finds it convenient to react to them. ;; It provides the following operations: ;; ;; (track-changes-register SIGNAL) ;; (track-changes-fetch ID FUNC) ;; (track-changes-unregister ID) ;; (track-changes-reset ID) ;; (track-changes-register-disjoint ID) ;; ;; A typical use case might look like: ;; ;; (defvar my-foo--change-tracker nil) ;; (define-minor-mode my-foo-mode ;; "Fooing like there's no tomorrow." ;; (if (null my-foo-mode) ;; (when my-foo--change-tracker ;; (track-changes-unregister my-foo--change-tracker) ;; (setq my-foo--change-tracker nil)) ;; (unless my-foo--change-tracker ;; (setq my-foo--change-tracker ;; (track-changes-register ;; (lambda (id) ;; (track-changes-fetch ;; id (lambda (beg end before) ;; ..DO THE THING..)))))))) ;;; Code: ;; FIXME: Try and do some sanity-checks (e.g. looking at `buffer-size'), ;; to detect if/when we somehow missed some changes. ;; FIXME: The API doesn't offer an easy way to signal a "full resync" ;; kind of change, as might be needed if we lost changes. (eval-when-compile (require 'cl-lib)) (cl-defstruct (track-changes--tracker (:noinline t) (:constructor nil) (:constructor track-changes--tracker ( signal state))) ( signal nil :read-only t) state) (cl-defstruct (track-changes--state (:noinline t) (:constructor nil) (:constructor track-changes--state ())) (beg (point-max)) (end (point-min)) (before nil) (next nil)) (defvar-local track-changes--trackers ()) (defvar-local track-changes--clean-trackers () "List of trackers that are clean. Those are the trackers that get signaled when a change is made.") (defvar-local track-changes--disjoint-trackers () "List of trackers that want to react to disjoint changes. These trackers' are signaled every time track-changes notices that some upcoming changes touch another \"distant\" part of the buffer.") (defvar-local track-changes--state nil) ;; `track-changes--before-*' keep track of the content of the ;; buffer when `track-changes--state' was cleaned. (defvar-local track-changes--before-beg nil) (defvar-local track-changes--before-end nil) (defvar-local track-changes--before-string nil) (defvar-local track-changes--buffer-size nil) (defun track-changes-register ( signal) "Register a new tracker and return a new tracker ID. SIGNAL is a function that will be called with one argument (the tracker ID)= when the current buffer is modified, so that we can react to the change. Once called, SIGNAL is not called again until `track-changes-fetch' is called with the corresponding tracker ID." ;; FIXME: Add an optional arg to choose between `funcall' and `funcall-la= ter'? ;; FIXME: Add an optional arg to say we don't need the `before' info? (track-changes--clean-state) (add-hook 'before-change-functions #'track-changes--before nil t) (add-hook 'after-change-functions #'track-changes--after nil t) (let ((tracker (track-changes--tracker signal track-changes--state))) (push tracker track-changes--trackers) (push tracker track-changes--clean-trackers) tracker)) (defun track-changes-register-disjoint (id) "Enable disjoint change tracking (DCT) for tracker ID. This is needed when combining disjoint changes into one bigger change is unacceptable, typically for performance reasons. When DCT is enabled, we call ID's SIGNAL function every time we are about to combine changes from \"distant\" parts of the buffer. These calls are distinguished from normal calls by calling SIGNAL with a second argument which is the distance between the upcoming change and the previous changes. In that case SIGNAL is called directly from `before-change-functions' and should thus be extra careful: don't modify the buffer, don't call a function that may block, ... In order to prevent the upcoming change from being combined with the previo= us changes, SIGNAL needs to call `track-changes-fetch' before it returns." (cl-assert (memq id track-changes--trackers)) (unless (memq id track-changes--disjoint-trackers) (push id track-changes--disjoint-trackers))) (defun track-changes-reset (id) "Mark all past changes as handled for tracker ID. Does not re-enable ID's signal." (track-changes--clean-state) (setf (track-changes--tracker-state id) track-changes--state)) (defun track-changes-unregister (id) "Remove the tracker denoted by ID. Trackers can consume resources (especially if `track-changes-fetch' is not called), so it is good practice to unregister them when you don't need them any more." (unless (memq id track-changes--trackers) (error "Unregistering a non-registered tracker: %S" id)) (setq track-changes--trackers (delq id track-changes--trackers)) (setq track-changes--clean-trackers (delq id track-changes--clean-tracker= s)) (setq track-changes--disjoint-trackers (delq id track-changes--disjoint-trackers)) (when (null track-changes--trackers) (setq track-changes--state nil) (setq track-changes--buffer-size nil) (setq track-changes--before-string nil) (remove-hook 'before-change-functions #'track-changes--before t) (remove-hook 'after-change-functions #'track-changes--after t))) (defun track-changes--clean-state () (cond ((null track-changes--state) (cl-assert (null track-changes--before-string)) (cl-assert (null track-changes--buffer-size)) ;; No state has been created yet. Do it now. (setq track-changes--buffer-size (buffer-size)) (setq track-changes--state (track-changes--state))) ((null track-changes--before-string) nil) ((> (track-changes--state-beg track-changes--state) (track-changes--state-end track-changes--state)) ;; before-c-f was run but not after-c-f, so there was really no change. nil) (t ;; FIXME: We may be in-between a before-c-f and an after-c-f, so we ;; should save some of the current buffer in case an after-c-f comes ;; before a before-c-f. (cl-assert (<=3D track-changes--before-beg (track-changes--state-beg track-changes--state) (track-changes--state-end track-changes--state) track-changes--before-end)) (cl-assert (null (track-changes--state-before track-changes--state))) (setf (track-changes--state-before track-changes--state) ;; The before-* vars can cover more text than the actually modifi= ed ;; area, so trim it down now to the relevant part. (if (=3D (- track-changes--before-end track-changes--before-beg) (- (track-changes--state-end track-changes--state) (track-changes--state-beg track-changes--state))) ;; Common case. track-changes--before-string (substring track-changes--before-string (- (track-changes--state-beg track-changes--state) track-changes--before-beg) (- (length track-changes--before-string) (- track-changes--before-end (track-changes--state-end track-changes--state)))))) (setq track-changes--before-beg nil track-changes--before-end nil track-changes--before-string nil) (let ((new (track-changes--state))) (setf (track-changes--state-next track-changes--state) new) (setq track-changes--state new))))) (defvar track-changes-disjoint-threshold 100 "Distance below which changes are not considered disjoint.") (defun track-changes--signal-if-disjoint (pos1 pos2) (let ((distance (- pos2 pos1))) (when (> distance (max track-changes-disjoint-threshold ;; If the distance is smaller than the size of the current ;; change, then we may as well consider it as "near". (length track-changes--before-string) (- track-changes--before-end track-changes--before-beg))) (dolist (tracker track-changes--disjoint-trackers) (funcall (track-changes--tracker-signal tracker) tracker distance))= ))) (defun track-changes--before (beg end) (cl-assert (=3D track-changes--buffer-size (buffer-size))) (cl-assert track-changes--state) (cl-assert (<=3D beg end)) (if (null track-changes--before-string) (progn (setf track-changes--before-string (buffer-substring-no-properties beg end)) (setf track-changes--before-beg beg) (setf track-changes--before-end end)) (cl-assert (save-restriction (widen) (<=3D (point-min) track-changes--before-beg track-changes--before-end (point-max)))) (when (< beg track-changes--before-beg) (when track-changes--disjoint-trackers (track-changes--signal-if-disjoint end track-changes--before-beg)) (let* ((old-bbeg track-changes--before-beg) ;; To avoid O(N=C2=B2) behavior when faced with many small cha= nges, ;; we copy more than needed. (new-bbeg (min (max (point-min) (- old-bbeg (length track-changes--before-string))) beg))) (setf track-changes--before-beg new-bbeg) (cl-callf (lambda (old new) (concat new old)) track-changes--before-string (buffer-substring-no-properties new-bbeg old-bbeg)))) (when (< track-changes--before-end end) (when track-changes--disjoint-trackers (track-changes--signal-if-disjoint track-changes--before-end beg)) (let* ((old-bend track-changes--before-end) ;; To avoid O(N=C2=B2) behavior when faced with many small cha= nges, ;; we copy more than needed. (new-bend (max (min (point-max) (+ old-bend (length track-changes--before-string))) end))) (setf track-changes--before-end new-bend) (cl-callf concat track-changes--before-string (buffer-substring-no-properties old-bend new-bend)))))) (defun track-changes--after (beg end len) (cl-assert track-changes--state) (cl-assert track-changes--before-string) (let ((offset (- (- end beg) len))) (cl-incf track-changes--before-end offset) (cl-incf track-changes--buffer-size offset) (cl-assert (=3D track-changes--buffer-size (buffer-size))) (cl-assert (save-restriction (widen) (<=3D (point-min) track-changes--before-beg beg end track-changes--before-end (point-max)))) ;; Note the new changes. (when (< beg (track-changes--state-beg track-changes--state)) (setf (track-changes--state-beg track-changes--state) beg)) (cl-callf (lambda (old-end) (max end (+ old-end offset))) (track-changes--state-end track-changes--state))) (cl-assert (<=3D track-changes--before-beg (track-changes--state-beg track-changes--state) beg end (track-changes--state-end track-changes--state) track-changes--before-end)) (while track-changes--clean-trackers (let ((tracker (pop track-changes--clean-trackers))) ;; FIXME: Use `funcall'? (funcall-later #'track-changes--call-signal (list (current-buffer) tracker))))) (defun track-changes--call-signal (buf tracker) (when (buffer-live-p buf) (with-current-buffer buf ;; Silence ourselves if `track-changes-fetch' was called in the mean = time. (unless (memq tracker track-changes--clean-trackers) (funcall (track-changes--tracker-signal tracker) tracker))))) (defun track-changes-fetch (id func) ;; FIXME: Bad name, "fetch" doesn't make much sense for it any more. "Fetch the pending changes. ID is the tracker ID returned by a previous `track-changes-register'. FUNC is a function. It is called with 3 arguments (BEGIN END BEFORE) where BEGIN..END delimit the region that was changed since the last time `track-changes-fetch' was called and BEFORE is a string containing the previous content of that region. If no changes occurred since the last time, FUNC is not called and we return nil, otherwise we return the value returned by FUNC, and re-enable the TRACKER corresponding to ID." (let ((beg nil) (end nil) (before nil) (states ())) ;; Transfer the data from `track-changes--before-string' ;; to the tracker's state object, if needed. (track-changes--clean-state) ;; We want to combine the states from most recent to oldest, ;; so reverse them. (let ((state (track-changes--tracker-state id))) (while state (push state states) (setq state (track-changes--state-next state)))) (when (null (track-changes--state-before (car states))) (cl-assert (eq (car states) track-changes--state)) (setq states (cdr states))) (if (null states) (progn (cl-assert (memq id track-changes--clean-trackers)) nil) (dolist (state states) (let ((prevbbeg (track-changes--state-beg state)) (prevbend (track-changes--state-end state)) (prevbefore (track-changes--state-before state))) (if (not before) (progn ;; This is the most recent change. Just initialize the var= s. (setq beg (track-changes--state-beg state)) (setq end (track-changes--state-end state)) (setq before prevbefore) (unless (and (=3D beg prevbbeg) (=3D end prevbend)) (setq before (substring before (- beg (track-changes--state-beg state)) (- (length before) (- (track-changes--state-end state) end)))))) (let ((endb (+ beg (length before)))) (when (< prevbbeg beg) (setq before (concat (buffer-substring-no-properties prevbbeg beg) before)) (setq beg prevbbeg) (cl-assert (=3D endb (+ beg (length before))))) (when (< endb prevbend) (let ((new-end (+ end (- prevbend endb)))) (setq before (concat before (buffer-substring-no-properties end new-end))) (setq end new-end) (cl-assert (=3D prevbend (+ beg (length before)))) (setq endb (+ beg (length before))))) (cl-assert (<=3D beg prevbbeg prevbend endb)) ;; The `prevbefore' is covered by the new one. (setq before (concat (substring before 0 (- prevbbeg beg)) prevbefore (substring before (- (length before) (- endb prevbend))))))))) (cl-assert (<=3D (point-min) beg end (point-max))) ;; Update the tracker's state before running `func' so we don't risk ;; mistakenly replaying the changes in case `func' exits non-locally. (setf (track-changes--tracker-state id) track-changes--state) (unwind-protect (funcall func beg end before) ;; Re-enable the tracker's signal only after running `func', so ;; as to avoid recursive invocations. (cl-pushnew id track-changes--clean-trackers))))) (defmacro with-track-changes (id vars &rest body) (declare (indent 2) (debug (form sexp body))) `(track-changes-fetch ,id (lambda ,vars ,@body))) =20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20=20 (provide 'track-changes) ;;; track-changes.el end here. --=-=-=--