On Dec 16, 2023, at 11:45 AM, Eli Zaretskii <eliz@gnu.org> wrote:

From: JD Smith <jdtsmith@gmail.com>
Date: Sat, 16 Dec 2023 11:19:51 -0500
Cc: Yuan Fu <casouri@gmail.com>

Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]).  We’ve been improving it lately with saved state visibility and diff functionality.  It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame.  This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session).  This leads to displayed lines which are >1000 columns wide.  And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]:



(I don't understand what was supposed to be shown in my MUA at this
place; what I see is as single character U+FFFC.)

Apologies; you’d think inline images could survive passage in 2023!   These two images were included in the OP:

https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png
https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png

Is this with Emacs 29?  If so, you should be able to tune
long-line-threshold to get some help from Emacs.  Although I'm a bit
surprised by the number 1000: usually these problems happen when lines
are much wider, which is why the default value of long-line-threshold
is 50K, not 1K.  

Yes, Emacs 29.  The problem is mild until say 10K columns.  I tried dropping `long-line-threshold' to 1K (locally in the vundo buffer) and unfortunately it made no difference in the timing.

In the issue you posted, I see that there's
significant slowdown (hundreds of milliseconds) when you limit the
display to 200 or 80 or even 30(!) columns.  Given that, and the other
evidence in the issue you posted, I'd say that the long lines are not
the main factor for slowness here, because no one will convince me
that the Emacs display has problems when the buffer includes 30
columns.  

Curious indeed, and a valid point.  It’s a logarithmic scale, so the absolute difference isn’t that much.  If long lines are not to blame (whether via display or some related issue), I can’t then understand why breaking the lines into shorter pieces would so radically speed things up.  It’s a pretty simple draw algorithm involving straightforward buffer edits:

• inserting (propertized) text one or two chars at a time (list below)
• looking-at with a simple regex (couple chars) to see if there’s room for the branch
• delete-char in a few places
• goto-char to the saved parents position to start drawing a new branch
• move-to-column to continue a branch on a new line from the same column position

Nothing obvious to me that would seem to give quadratic performance degradation, especially since I’m testing with a purely linear structure (one long line).  Settling down to approximately a constant draw time per edit/node, as occurs for high tree depth at max_columns<1000, seems quite reasonable.  As another piece of evidence which I mentioned in the issue thread, when such slow to display long-line vundo buffers are visible, editing performance in buffers shown in other windows is degraded (a behavior I’ve experienced with long-line buffers before).

Maybe the stuff you display uses some characters that make
redisplay slower for reasons other than line-length?  What kind of
characters are used for showing these "undo-trees"?

Nothing fancy, just characters from either one of these two sets (with faces applied):

(defconst vundo-ascii-symbols
  '((selected-node . ?x)
    (node . ?o)
    (horizontal-stem . ?-)
    (vertical-stem . ?|)
    (branch . ?|)
    (last-branch . ?`)))

(defconst vundo-unicode-symbols
  '((selected-node . ?●)
    (node . ?○)
    (horizontal-stem . ?─)
    (vertical-stem . ?│)
    (branch . ?├)
    (last-branch . ?└)))

I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns.

Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”?  Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines?  Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)?

I don't think I understand your questions, but the reasons for display
slowness with very long lines are not related to character insertion
or looking-at, they are related to the algorithms used for display
layout, which is the first stage of each redisplay cycle, and are also
reused for visual cursor movement (C-n/C-p etc.).

Breaking the long lines at column lengths <200 speeds things up by >50x.  I was hoping to replicate this speedup by limiting the column width of displayed text in some convenient fashion, without explicitly having to create a data structure outside of the buffer, and recreate the buffer contents from it on demand.

To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count?  If so, I really can’t understand how breaking fewer long lines into more shorter ones, drawn with the same algorithm, would improve the performance so much.

If anyone would like to try this test themselves, I’ve included it below (vundo + lorem-ipsum required).

Bottom line: I don't think I understand what causes your problems.
The information presented is insufficient for even guessing the
possible reasons.  Please consider telling more.

I did include a link to the short draw function itself; not sure if you saw that.  And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer.  As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context.  That said, you’ve motivated me to dig a bit deeper and temporarily set aside long lines as the presumed culprit. I used elp to instrument anything significant in vundo—draw-tree, but came up short:

Function Name               Call Count  Elapsed Time  Average Time
vundo--draw-tree            1           3.849056      3.849056
append                      48118       0.0478320000  9.940...e-07
propertize                  12063       0.0423950000  3.514...e-06
insert                      22018       0.0183619999  8.339...e-07
vundo--translate            12002       0.0085510000  7.124...e-07
looking-at                  12202       0.0038980000  3.194...e-07
vundo--node-timestamp       6002        0.0033740000  5.621...e-07
goto-char                   13517       0.0026380000  1.951...e-07
last                        12002       0.0021369999  1.780...e-07
vundo-m-children            12001       0.0021319999  1.776...e-07
pop                         6001        0.0017129999  2.854...e-07
vundo-m-point               6000        0.0015979999  2.663...e-07
vundo--put-node-at-point    6001        0.0015049999  2.507...e-07
vundo-m-parent              6001        0.0007300000  1.216...e-07
erase-buffer                48          0.0005819999  1.212...e-05
delete-char                 29          1.100...e-05  3.793...e-07


I’ll report more if I discover anything, but of course very happy to hear anyone's thoughts on the matter in the meantime.

Thanks for your attention and analysis.

+++
;; create and benchmark starting vundo with long linear undo trees
(defun vundo-linear-stress-test (nsen)
  (interactive
   (list (read-number "Number of edits (sentences): " 20000)))
  (let* ((buf (get-buffer-create "*vundo-linear-test*"))
         (res (cl-loop for nedits = 3 then (round (* nedits 1.25))
                       while (< nedits nsen) do
                       (with-current-buffer buf
                         (read-only-mode -1)
                         (font-lock-mode -1)
                         (erase-buffer)
                         (setq buffer-undo-list nil
                               pending-undo-list nil)
                         (dotimes (_ nedits)
                           (lorem-ipsum-insert-sentences 1)
                           (insert "\n")
                           (undo-boundary)))
                       
                       collect (flatten-tree
                                (list nedits (with-current-buffer buf
                                               (benchmark-call #'vundo))
                                      (with-current-buffer " *vundo tree*"
                                        (current-column))))
                       do (with-current-buffer (get-buffer " *vundo tree*")
                            (vundo-quit)))))
    (with-output-to-temp-buffer "*vundo-linear-stress-test-results*"
      (princ (format "%6s %10s %4s %10s %6s\n"
                     "Edits" "Time(s)" "GCs" "GCTime(s)" "Cols"))
      (dolist (x res) (princ (apply #'format "%6d %10.3f %4d %10.3f %6d\n" x))))))