From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#70077: An easier way to track buffer changes Date: Tue, 02 Apr 2024 16:21:25 +0000 Message-ID: <87ttkjspkq.fsf@localhost> References: <86frw8ewk9.fsf@gnu.org> <86cyrcdy80.fsf@gnu.org> <877chh8fjk.fsf@localhost> <87wmpfsv2y.fsf@localhost> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="31239"; mail-complaints-to="usenet@ciao.gmane.io" Cc: casouri@gmail.com, 70077@debbugs.gnu.org, qhong@alum.mit.edu, frederic.bour@lakaban.net, joaotavora@gmail.com, mail@nicolasgoaziou.fr, acm@muc.de, Eli Zaretskii , stephen_leake@stephe-leake.org, alan.zimm@gmail.com, phillip.lord@russet.org.uk To: Stefan Monnier Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Tue Apr 02 18:22:32 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1rrgu7-0007tq-Uk for geb-bug-gnu-emacs@m.gmane-mx.org; Tue, 02 Apr 2024 18:22:32 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1rrgtr-0000xj-NC; Tue, 02 Apr 2024 12:22:15 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1rrgtc-0000vO-Nu for bug-gnu-emacs@gnu.org; Tue, 02 Apr 2024 12:22:00 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1rrgtc-0001pO-Eo for bug-gnu-emacs@gnu.org; Tue, 02 Apr 2024 12:22:00 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1rrgtg-0008Ld-5Z for bug-gnu-emacs@gnu.org; Tue, 02 Apr 2024 12:22:04 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Tue, 02 Apr 2024 16:22:04 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 70077 X-GNU-PR-Package: emacs X-GNU-PR-Keywords: patch Original-Received: via spool by 70077-submit@debbugs.gnu.org id=B70077.171207489631946 (code B ref 70077); Tue, 02 Apr 2024 16:22:04 +0000 Original-Received: (at 70077) by debbugs.gnu.org; 2 Apr 2024 16:21:36 +0000 Original-Received: from localhost ([127.0.0.1]:55826 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rrgtB-0008Iv-Qx for submit@debbugs.gnu.org; Tue, 02 Apr 2024 12:21:35 -0400 Original-Received: from mout01.posteo.de ([185.67.36.65]:49755) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1rrgt7-0008I5-EQ for 70077@debbugs.gnu.org; Tue, 02 Apr 2024 12:21:31 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 68D2424002A for <70077@debbugs.gnu.org>; Tue, 2 Apr 2024 18:21:19 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1712074879; bh=099D2+4u5fNlEq3tyHM7sYiP/H/3+hOxowmjOz+zMEE=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version:Content-Type: From; b=bqFrDQfgdM08afCXDiK1xF+cU2jG8zl00tJ+WBZh5q+J6Or37KxviLNfWIwuDGi/S AG2WXlsod+kN7e2dw4/ThTEC3SiCrUWsftBruUSQeCDaxkN9+8rBen0pFpRrd+xE+M VnUyZy0k0GMNDTiBtpia54djbCYADFS++IvthpLwBkS31lA9un4q/nP/oNBlwhlCM0 Kd90p3bNFglYmUVpy50jc6w2EhMPlNYiZtVZ7IY2nZdHzvGwEJGSQZdoZBXv3Rjaag zsODSfqm+TG9d9WfSW5xo2bIWKpnT5CocITFEoXsUS5C6TE6pZMx7lRbKAlS0Hi6H+ +2abXf0T9XrCg== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4V8Cms4bw9z6tm8; Tue, 2 Apr 2024 18:21:17 +0200 (CEST) In-Reply-To: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:282524 Archived-At: Stefan Monnier writes: >>> I'm not sure how to combine the benefits of combining small changes into >>> larger ones with the benefits of keeping distant changes separate. >> I am not sure if combining small changes into larger ones is at all a >> good idea. > > In my experience, if the amount of work to be done "per change" is not > completely trivial, combining small changes is indispensable (the worst > part being that the cases where it's needed are sufficiently infrequent > that naive users of `*-change-functions` won't know about that, so we > end up with unacceptable slowdowns in corner cases). I agree, but only when subsequent changes intersect each other. When distant changes are combined together, they cover a lot of buffer text that has not been changed - they may trigger a lot more work compared to more granular changes. Consider, for example, two changes near the beginning and the end of the buffer: (1..10) and (1000000...(point-max)) If such changes are tracked by tree-sitter or other parser, there is a world of difference between requesting to re-parse individual segments and the whole buffer. >> The changes are stored in strings, which get allocated and >> re-allocated repeatedly. > > Indeed. Tho for N small changes, my code should perform only O(log N) > re-allocations. May you explain a bit more about this? My reading of `track-changes--before' is that `concat' is called on every change except when a new change is already inside the previously recorded one. (aside: I have a hard time reading the code because of confusing slot names: bbeg vs beg??) >> Repeated string allocations, especially when strings keep growing >> towards the buffer size, is likely going to increase consing and make >> GCs more frequent. > > Similar allocations presumably take place anyway while running the code > (e.g. for the `buffer-undo-list`), so I'm hoping the effect will be > "lost in the noise". I disagree. `buffer-undo-list' only includes the text that has been actually changed. It never creates strings that span between distant regions. As a terminal case, consider alternating between first and second half of the buffer, starting in the middle and editing towards the buffer boundaries - this will involve re-allocation of buffer-long strings on average. >>> We could expose a list of simultaneous (and thus disjoint) changes, >>> which avoids the last problem. But it's a fair bit more work for us, it >>> makes the API more complex for the clients, and it's rarely what the >>> clients really want anyway. >> FYI, Org parser maintains such a list. > > Could you point me to the relevant code (I failed to find it)? That code handles a lot more than just changes, so it might not be the best reference. Anyway... See the docstring of `org-element--cache-sync-requests', and the branch in `org-element--cache-submit-request' starting from ;; Current changes can be merged with first sync request: we ;; can save a partial cache synchronization. >> We previously discussed a similar API in >> https://yhetil.org/emacs-bugs/87o7iq1emo.fsf@localhost/ > > IIUC this discusses a *sequence* of edits. In the point to which you > replied I was discussing keeping a list of *simultaneous* edits. Hmm. By referring to buffer-undo-list, I meant that intersecting edits will be automatically merged, as it is usually done in undo system. I am also not 100% sure why edits being simultaneous is any relevant. Maybe we are talking about different things? What I was talking about is (1) automatically merging subsequent edits like 1..5 2..7 7..9 into 1..9. (2) if a subsequent edit does not intersect with previous edited region, record it separately without merging. > This said, the downside in both cases is that the specific data that we > need from such a list tends to depend on the user. E.g. you suggest > > (BEG END_BEFORE END_AFTER COUNTER) > > but that is not sufficient reconstruct the corresponding buffer state, > so things like Eglot/CRDT can't use it. Ideally for CRDT I think you'd > want a sequence of > > (BEG END-BEFORE STRING-AFTER) Right. It's just that Org mode does not need STRING-AFTER, which is why I did not think about it in my proposal. Of course, having STRING-AFTER is required to get full reconstruction of the buffer state. > but for Eglot this is not sufficient because Eglot needs to convert BEG > and END_BEFORE into LSP positions (i.e. "line+col") and for that it > needs to reproduce the past buffer state. So instead, what Eglot needs > (and does indeed build using `*-change-functions`) is a sequence of > > (LSP-BEG LSP-END-BEFORE STRING-AFTER) > > [ Tho it seems it also needs a "LENGTH" of the deleted chunk, not sure > exactly why, but I guess it's a piece of redundant info the servers > can use to sanity-check the data? ] It is to support deprecated LSP spec (I presume that older LSP servers may still be using the old spec): https://microsoft.github.io/language-server-protocol/specifications/specification-3-15/ export type TextDocumentContentChangeEvent = { * The range of the document that changed. range: Range; * The optional length of the range that got replaced. * @deprecated use range instead. rangeLength?: number; * The new text for the provided range. text: string; > The code changes were actually quite simple, so I have now included it. > See my current PoC code below. Am I missing something, or will `track-changes--signal-if-disjoint' alter `track-changes--state' when it calls `track-changes-fetch' -> `track-changes-reset'? Won't that interfere with the information passed to non-disjoint trackers? -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at . Support Org development at , or support my work at