Mitigating the long-lines penalty in vundo

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Mitigating the long-lines penalty in vundo
@ 2023-12-16 16:19 JD Smith
  2023-12-16 16:45 ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: JD Smith @ 2023-12-16 16:19 UTC (permalink / raw)
  To: emacs-devel; +Cc: Yuan Fu

[-- Attachment #1: Type: text/plain, Size: 2252 bytes --]

Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]).  We’ve been improving it lately with saved state visibility and diff functionality.  It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame.  This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session).  This leads to displayed lines which are >1000 columns wide.  And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]:

I verified that this is due to long lines and not some vundo issue per se by simply breaking the tree-drawing algorithm so it wraps to a new line after some max number of columns:

But obviously such “broken” trees aren’t terribly useful as-is.  What creative solutions can people suggest?  The tree draw algorithm [4] is relatively simple, and depth-first.  Only a total of 6 distinct characters are displayed.  It has to do things like “bend” a branch down if it runs into a preexisting branch deeper in the tree.  I didn’t see any improvement by enabling so-long-minor-mode.

Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”?  Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines?  Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)?

Thanks for your insights.

[1] https://github.com/casouri/vundo
[2] https://archive.casouri.cat/note/2021/visual-undo-tree
[3] https://github.com/casouri/vundo/issues/68#issuecomment-1848992975
[4] https://github.com/casouri/vundo/blob/824be351527f7a606b25637f6ba2a00cceade338/vundo.el#L591

[-- Attachment #2.1: Type: text/html, Size: 3376 bytes --]

[-- Attachment #2.2: image.png --]
[-- Type: image/png, Size: 406873 bytes --]

[-- Attachment #2.3: image.png --]
[-- Type: image/png, Size: 143077 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-16 16:19 Mitigating the long-lines penalty in vundo JD Smith
@ 2023-12-16 16:45 ` Eli Zaretskii
  2023-12-16 18:54   ` JD Smith
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-12-16 16:45 UTC (permalink / raw)
  To: JD Smith; +Cc: emacs-devel, casouri

> From: JD Smith <jdtsmith@gmail.com>
> Date: Sat, 16 Dec 2023 11:19:51 -0500
> Cc: Yuan Fu <casouri@gmail.com>
> 
> Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]).  We’ve been improving it lately with saved state visibility and diff functionality.  It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame.  This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session).  This leads to displayed lines which are >1000 columns wide.  And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]:
> 
> 

(I don't understand what was supposed to be shown in my MUA at this
place; what I see is as single character U+FFFC.)

Is this with Emacs 29?  If so, you should be able to tune
long-line-threshold to get some help from Emacs.  Although I'm a bit
surprised by the number 1000: usually these problems happen when lines
are much wider, which is why the default value of long-line-threshold
is 50K, not 1K.  In the issue you posted, I see that there's
significant slowdown (hundreds of milliseconds) when you limit the
display to 200 or 80 or even 30(!) columns.  Given that, and the other
evidence in the issue you posted, I'd say that the long lines are not
the main factor for slowness here, because no one will convince me
that the Emacs display has problems when the buffer includes 30
columns.  Maybe the stuff you display uses some characters that make
redisplay slower for reasons other than line-length?  What kind of
characters are used for showing these "undo-trees"?

> Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”?  Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines?  Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)?

I don't think I understand your questions, but the reasons for display
slowness with very long lines are not related to character insertion
or looking-at, they are related to the algorithms used for display
layout, which is the first stage of each redisplay cycle, and are also
reused for visual cursor movement (C-n/C-p etc.).

Bottom line: I don't think I understand what causes your problems.
The information presented is insufficient for even guessing the
possible reasons.  Please consider telling more.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-16 16:45 ` Eli Zaretskii
@ 2023-12-16 18:54   ` JD Smith
  2023-12-16 19:13     ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: JD Smith @ 2023-12-16 18:54 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, casouri

[-- Attachment #1: Type: text/plain, Size: 9921 bytes --]



> On Dec 16, 2023, at 11:45 AM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
>> From: JD Smith <jdtsmith@gmail.com>
>> Date: Sat, 16 Dec 2023 11:19:51 -0500
>> Cc: Yuan Fu <casouri@gmail.com>
>> 
>> Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]).  We’ve been improving it lately with saved state visibility and diff functionality.  It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame.  This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session).  This leads to displayed lines which are >1000 columns wide.  And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]:
>> 
>> 
> 
> (I don't understand what was supposed to be shown in my MUA at this
> place; what I see is as single character U+FFFC.)

Apologies; you’d think inline images could survive passage in 2023!   These two images were included in the OP:

https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png
https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png

> Is this with Emacs 29?  If so, you should be able to tune
> long-line-threshold to get some help from Emacs.  Although I'm a bit
> surprised by the number 1000: usually these problems happen when lines
> are much wider, which is why the default value of long-line-threshold
> is 50K, not 1K.  

Yes, Emacs 29.  The problem is mild until say 10K columns.  I tried dropping `long-line-threshold' to 1K (locally in the vundo buffer) and unfortunately it made no difference in the timing.

> In the issue you posted, I see that there's
> significant slowdown (hundreds of milliseconds) when you limit the
> display to 200 or 80 or even 30(!) columns.  Given that, and the other
> evidence in the issue you posted, I'd say that the long lines are not
> the main factor for slowness here, because no one will convince me
> that the Emacs display has problems when the buffer includes 30
> columns.  

Curious indeed, and a valid point.  It’s a logarithmic scale, so the absolute difference isn’t that much.  If long lines are not to blame (whether via display or some related issue), I can’t then understand why breaking the lines into shorter pieces would so radically speed things up.  It’s a pretty simple draw algorithm involving straightforward buffer edits:

    • inserting (propertized) text one or two chars at a time (list below)
    • looking-at with a simple regex (couple chars) to see if there’s room for the branch
    • delete-char in a few places
    • goto-char to the saved parents position to start drawing a new branch
    • move-to-column to continue a branch on a new line from the same column position

Nothing obvious to me that would seem to give quadratic performance degradation, especially since I’m testing with a purely linear structure (one long line).  Settling down to approximately a constant draw time per edit/node, as occurs for high tree depth at max_columns<1000, seems quite reasonable.  As another piece of evidence which I mentioned in the issue thread, when such slow to display long-line vundo buffers are visible, editing performance in buffers shown in other windows is degraded (a behavior I’ve experienced with long-line buffers before).

> Maybe the stuff you display uses some characters that make
> redisplay slower for reasons other than line-length?  What kind of
> characters are used for showing these "undo-trees"?

Nothing fancy, just characters from either one of these two sets (with faces applied):

(defconst vundo-ascii-symbols
  '((selected-node . ?x)
    (node . ?o)
    (horizontal-stem . ?-)
    (vertical-stem . ?|)
    (branch . ?|)
    (last-branch . ?`)))

(defconst vundo-unicode-symbols
  '((selected-node . ?●)
    (node . ?○)
    (horizontal-stem . ?─)
    (vertical-stem . ?│)
    (branch . ?├)
    (last-branch . ?└)))

I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns.

>> Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”?  Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines?  Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)?
> 
> I don't think I understand your questions, but the reasons for display
> slowness with very long lines are not related to character insertion
> or looking-at, they are related to the algorithms used for display
> layout, which is the first stage of each redisplay cycle, and are also
> reused for visual cursor movement (C-n/C-p etc.).

Breaking the long lines at column lengths <200 speeds things up by >50x.  I was hoping to replicate this speedup by limiting the column width of displayed text in some convenient fashion, without explicitly having to create a data structure outside of the buffer, and recreate the buffer contents from it on demand.

To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count?  If so, I really can’t understand how breaking fewer long lines into more shorter ones, drawn with the same algorithm, would improve the performance so much.

If anyone would like to try this test themselves, I’ve included it below (vundo + lorem-ipsum required).

> Bottom line: I don't think I understand what causes your problems.
> The information presented is insufficient for even guessing the
> possible reasons.  Please consider telling more.

I did include a link to the short draw function itself; not sure if you saw that.  And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer.  As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context.  That said, you’ve motivated me to dig a bit deeper and temporarily set aside long lines as the presumed culprit. I used elp to instrument anything significant in vundo—draw-tree, but came up short:

Function Name               Call Count  Elapsed Time  Average Time
vundo--draw-tree            1           3.849056      3.849056
append                      48118       0.0478320000  9.940...e-07
propertize                  12063       0.0423950000  3.514...e-06
insert                      22018       0.0183619999  8.339...e-07
vundo--translate            12002       0.0085510000  7.124...e-07
looking-at                  12202       0.0038980000  3.194...e-07
vundo--node-timestamp       6002        0.0033740000  5.621...e-07
goto-char                   13517       0.0026380000  1.951...e-07
last                        12002       0.0021369999  1.780...e-07
vundo-m-children            12001       0.0021319999  1.776...e-07
pop                         6001        0.0017129999  2.854...e-07
vundo-m-point               6000        0.0015979999  2.663...e-07
vundo--put-node-at-point    6001        0.0015049999  2.507...e-07
vundo-m-parent              6001        0.0007300000  1.216...e-07
erase-buffer                48          0.0005819999  1.212...e-05
delete-char                 29          1.100...e-05  3.793...e-07

I’ll report more if I discover anything, but of course very happy to hear anyone's thoughts on the matter in the meantime.

Thanks for your attention and analysis.

+++
;; create and benchmark starting vundo with long linear undo trees
(defun vundo-linear-stress-test (nsen)
  (interactive
   (list (read-number "Number of edits (sentences): " 20000)))
  (let* ((buf (get-buffer-create "*vundo-linear-test*"))
         (res (cl-loop for nedits = 3 then (round (* nedits 1.25))
                       while (< nedits nsen) do
                       (with-current-buffer buf
                         (read-only-mode -1)
                         (font-lock-mode -1)
                         (erase-buffer)
                         (setq buffer-undo-list nil
                               pending-undo-list nil)
                         (dotimes (_ nedits)
                           (lorem-ipsum-insert-sentences 1)
                           (insert "\n")
                           (undo-boundary)))
                       
                       collect (flatten-tree
                                (list nedits (with-current-buffer buf
                                               (benchmark-call #'vundo))
                                      (with-current-buffer " *vundo tree*"
                                        (current-column))))
                       do (with-current-buffer (get-buffer " *vundo tree*")
                            (vundo-quit)))))
    (with-output-to-temp-buffer "*vundo-linear-stress-test-results*"
      (princ (format "%6s %10s %4s %10s %6s\n"
                     "Edits" "Time(s)" "GCs" "GCTime(s)" "Cols"))
      (dolist (x res) (princ (apply #'format "%6d %10.3f %4d %10.3f %6d\n" x))))))




[-- Attachment #2: Type: text/html, Size: 14335 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-16 18:54   ` JD Smith
@ 2023-12-16 19:13     ` Eli Zaretskii
  2023-12-16 20:32       ` Stephen Berman
  2023-12-16 22:00       ` JD Smith
  0 siblings, 2 replies; 11+ messages in thread
From: Eli Zaretskii @ 2023-12-16 19:13 UTC (permalink / raw)
  To: JD Smith; +Cc: emacs-devel, casouri

> From: JD Smith <jdtsmith@gmail.com>
> Date: Sat, 16 Dec 2023 13:54:40 -0500
> Cc: emacs-devel@gnu.org,
>  casouri@gmail.com
> 
> >> 
> > 
> > (I don't understand what was supposed to be shown in my MUA at this
> > place; what I see is as single character U+FFFC.)
> 
> Apologies; you’d think inline images could survive passage in 2023!   These two images were included in the OP:
> 
> https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png
> https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png

I cannot view these images: my browser says there are errors, and if I
try downloading them with wget, I get 404...

> > Maybe the stuff you display uses some characters that make
> > redisplay slower for reasons other than line-length?  What kind of
> > characters are used for showing these "undo-trees"?
> 
> Nothing fancy, just characters from either one of these two sets (with faces applied):
> 
> (defconst vundo-ascii-symbols
>   '((selected-node . ?x)
>     (node . ?o)
>     (horizontal-stem . ?-)
>     (vertical-stem . ?|)
>     (branch . ?|)
>     (last-branch . ?`)))
> 
> (defconst vundo-unicode-symbols
>   '((selected-node . ?●)
>     (node . ?○)
>     (horizontal-stem . ?─)
>     (vertical-stem . ?│)
>     (branch . ?├)
>     (last-branch . ?└)))
> 
> I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns.

Very strange.  Profiling should point out the culprit, especially if
you load the .el file (not .elc!) and then profile.

> To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count?  

Yes.  Emacs editing commands are largely unaware of lines, unless Lisp
programs invoke functions that are sensitive to lines, like
end-of-line etc.  Buffer text is stored as a C string, it is not
subdivided into lines.  Only the display engine is sensitive to lines.

> > Bottom line: I don't think I understand what causes your problems.
> > The information presented is insufficient for even guessing the
> > possible reasons.  Please consider telling more.
> 
> I did include a link to the short draw function itself; not sure if you saw that.  And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer.  As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context.

You must understand that I don't have time to study non-trivial code
when such questions are posted.  I spent 8 hours today reviewing and
installing patches, merging the release branch to master, fixing bugs
people reported over the last days, etc.  So by "insufficient
information" I meant what was presented explicitly, both here and in
the discussion of the issue.  My best suggestion so far is to profile
the code; if you are certain you know which function is problematic,
profile that after loading the code as .el file, and show the
completely expanded profile.  The profiles you've shown until now
indicate that vundo--draw-tree is the hot spot, so loading it as a .el
file and running "M-x profiler-start" followed by "M-x profiler-report"
when it finishes should show the expensive parts.

Another idea is to increase the GC threshold, in case what slows this
down is the amount of garbage created by this code.

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-16 19:13     ` Eli Zaretskii
@ 2023-12-16 20:32       ` Stephen Berman
  2023-12-16 22:00       ` JD Smith
  1 sibling, 0 replies; 11+ messages in thread
From: Stephen Berman @ 2023-12-16 20:32 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: JD Smith, emacs-devel, casouri

On Sat, 16 Dec 2023 21:13:28 +0200 Eli Zaretskii <eliz@gnu.org> wrote:

>> From: JD Smith <jdtsmith@gmail.com>
>> Date: Sat, 16 Dec 2023 13:54:40 -0500
>> Cc: emacs-devel@gnu.org,
>>  casouri@gmail.com
>> 
>> >> 
>> > 
>> > (I don't understand what was supposed to be shown in my MUA at this
>> > place; what I see is as single character U+FFFC.)
>> 
>> Apologies; you’d think inline images could survive passage in 2023!  These
>> two images were included in the OP:
>> 
>> https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png
>> https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png
>
> I cannot view these images: my browser says there are errors, and if I
> try downloading them with wget, I get 404...

You can see the images in a web broswer here:

https://lists.gnu.org/archive/html/emacs-devel/2023-12/msg00448.html

Steve Berman



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-16 19:13     ` Eli Zaretskii
  2023-12-16 20:32       ` Stephen Berman
@ 2023-12-16 22:00       ` JD Smith
  2023-12-17  7:28         ` Eli Zaretskii
  1 sibling, 1 reply; 11+ messages in thread
From: JD Smith @ 2023-12-16 22:00 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, casouri

[-- Attachment #1: Type: text/plain, Size: 5791 bytes --]



> On Dec 16, 2023, at 2:13 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> I cannot view these images: my browser says there are errors, and if I
> try downloading them with wget, I get 404...

GitHub shenanigans I guess.  Stephen’s link works: https://lists.gnu.org/archive/html/emacs-devel/2023-12/msg00448.html.  

>>> Maybe the stuff you display uses some characters that make
>>> redisplay slower for reasons other than line-length?  What kind of
>>> characters are used for showing these "undo-trees"?
>> 
>> I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns.
> 
> Very strange.  Profiling should point out the culprit, especially if
> you load the .el file (not .elc!) and then profile.

I tried again from a clean session with eval-buffer’d vundo.el.  The profiler for 10,000 edits (no undos) reports a very odd result:

        1799  95% - command-execute
        1799  95%  - call-interactively
        1799  95%   - funcall-interactively
        1799  95%    - eval-expression
        1799  95%     - eval
        1799  95%      - save-current-buffer
        1797  95%       - vundo
        1797  95%        - let
        1756  93%         - vundo-1
        1756  93%          - save-current-buffer
        1756  93%           - let
        1756  93%            - save-current-buffer
        1756  93%             - vundo--refresh-buffer
        1756  93%              - save-current-buffer
        1756  93%               - let
        1651  87%                - if
        1649  87%                 - vundo--draw-tree
        1649  87%                  - let*
        1649  87%                   - while
        1649  87%                    - let*
        1640  86%                     - let
        1560  82%                      - max
        1560  82%                         1-
          71   3%                      - if
          71   3%                       - let
          71   3%                        - if
          65   3%                         + insert
           9   0%                      - rx-to-string
           7   0%                       + rx--translate
           1   0%                       + cons
           2   0%                     + if
           1   0%                     + progn
           1   0%                       vundo--put-node-at-point
           1   0%                       vundo--node-timestamp
           2   0%                 + eq
         105   5%                + let*
          41   2%         + let
          87   4% + ...

Looking closer, the only relevant thing in draw-tree is (1- (current-column)).  Adding that to my elp list and... we have found the principal culprit (here with 10K edits):

vundo--draw-tree  1           9.65838       9.65838
current-column    10001       9.124152      0.0009123239

`current-column’ is apparently pretty expensive here (~1ms).  The current-column docs do say:

In a buffer with very long lines, the value will be an approximation,
because calculating the exact number is very expensive.

I just tested it in a temp buffer with width 10K, and it’s much less expensive (60-100µs).  Perhaps the text properties on each node character (containing the defstruct node object) slows current-column down even further.  So it is a long-lines problem of a sort, just not display related. 

>> To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count?  
> 
> Yes.  Emacs editing commands are largely unaware of lines, unless Lisp
> programs invoke functions that are sensitive to lines, like
> end-of-line etc.  Buffer text is stored as a C string, it is not
> subdivided into lines.  Only the display engine is sensitive to lines.
> 
>>> Bottom line: I don't think I understand what causes your problems.
>>> The information presented is insufficient for even guessing the
>>> possible reasons.  Please consider telling more.
>> 
>> I did include a link to the short draw function itself; not sure if you saw that.  And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer.  As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context.
> 
> You must understand that I don't have time to study non-trivial code
> when such questions are posted.  I spent 8 hours today reviewing and
> installing patches, merging the release branch to master, fixing bugs
> people reported over the last days, etc.  So by "insufficient
> information" I meant what was presented explicitly, both here and in
> the discussion of the issue.  My best suggestion so far is to profile
> the code; if you are certain you know which function is problematic,
> profile that after loading the code as .el file, and show the
> completely expanded profile.  The profiles you've shown until now
> indicate that vundo--draw-tree is the hot spot, so loading it as a .el
> file and running "M-x profiler-start" followed by "M-x profiler-report"
> when it finishes should show the expensive parts.

Thank for the suggestions.  I deeply appreciate all you do for Emacs and am honestly quite blown away by your responsiveness to bugs, queries, long bike-shed discussions, etc.  Please take my comment only as gentle feedback on communication & perceptions, which I always appreciate receiving.   Your advice was, as usual, good... now to workaround current-column slowness. 

Thanks again.

[-- Attachment #2: Type: text/html, Size: 10143 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-16 22:00       ` JD Smith
@ 2023-12-17  7:28         ` Eli Zaretskii
  2023-12-17 17:49           ` JD Smith
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-12-17  7:28 UTC (permalink / raw)
  To: JD Smith; +Cc: emacs-devel, casouri

> From: JD Smith <jdtsmith@gmail.com>
> Date: Sat, 16 Dec 2023 17:00:36 -0500
> Cc: emacs-devel@gnu.org,
>  casouri@gmail.com
> 
> I tried again from a clean session with eval-buffer’d vundo.el.  The profiler for 10,000 edits (no undos)
> reports a very odd result:
> 
>         1799  95% - command-execute
>         1799  95%  - call-interactively
>         1799  95%   - funcall-interactively
>         1799  95%    - eval-expression
>         1799  95%     - eval
>         1799  95%      - save-current-buffer
>         1797  95%       - vundo
>         1797  95%        - let
>         1756  93%         - vundo-1
>         1756  93%          - save-current-buffer
>         1756  93%           - let
>         1756  93%            - save-current-buffer
>         1756  93%             - vundo--refresh-buffer
>         1756  93%              - save-current-buffer
>         1756  93%               - let
>         1651  87%                - if
>         1649  87%                 - vundo--draw-tree
>         1649  87%                  - let*
>         1649  87%                   - while
>         1649  87%                    - let*
>         1640  86%                     - let
>         1560  82%                      - max
>         1560  82%                         1-
>           71   3%                      - if
>           71   3%                       - let
>           71   3%                        - if
>           65   3%                         + insert
>            9   0%                      - rx-to-string
>            7   0%                       + rx--translate
>            1   0%                       + cons
>            2   0%                     + if
>            1   0%                     + progn
>            1   0%                       vundo--put-node-at-point
>            1   0%                       vundo--node-timestamp
>            2   0%                 + eq
>          105   5%                + let*
>           41   2%         + let
>           87   4% + ...
> 
> Looking closer, the only relevant thing in draw-tree is (1- (current-column)).  Adding that to my elp list
> and... we have found the principal culprit (here with 10K edits):
> 
>  vundo--draw-tree  1           9.65838       9.65838
>  current-column    10001       9.124152      0.0009123239
> 
> `current-column’ is apparently pretty expensive here (~1ms).

Yes, I think current-column is the culprit here.

> The current-column docs do say:
> 
>  In a buffer with very long lines, the value will be an approximation,
>  because calculating the exact number is very expensive.

The "very long lines" case happens when the line's length exceeds the
long-line-threshold, which in your case it most probably doesn't
(unless you lower long-line-threshold considerably).  So this part of
the doc string is not relevant to your problem.

> I just tested it in a temp buffer with width 10K, and it’s much less expensive (60-100µs).  Perhaps the
> text properties on each node character (containing the defstruct node object) slows current-column
> down even further.

Yes. The problem is that vundo.el uses 'display' properties _a_lot_,
and those slow down current-column, because each time it sees a
'display' property whose value is a string, it needs to extract the
property value and calculate the width of that string, to account for
its width on display.

This is basically an algorithmic problem in vundo.el: since the width
of each string that vundo--translate is fixed and can be computed in
advance, vundo.el could count the columns itself (by adding those
known widths), instead of asking current-column to measure the width
of those strings over and over and over again.

Alternatively, instead of using 'display' properties, vundo.el could
insert the characters produced by vundo--translate directly into the
buffer, which would allow current-column to work at its usual speed.

> So it is a long-lines problem of a sort, just not display related. 

I disagree.  it's a problem with too many 'display' properties in the
buffer, which is not recommended for more than one reason.  I suggest
to rethink this design.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-17  7:28         ` Eli Zaretskii
@ 2023-12-17 17:49           ` JD Smith
  2023-12-17 18:07             ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: JD Smith @ 2023-12-17 17:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, casouri

[-- Attachment #1: Type: text/plain, Size: 6257 bytes --]

> On Dec 17, 2023, at 2:28 AM, Eli Zaretskii <eliz@gnu.org> wrote:
>> 
>> 
>> Looking closer, the only relevant thing in draw-tree is (1- (current-column)).  Adding that to my elp list
>> and... we have found the principal culprit (here with 10K edits):
>> 
>> vundo--draw-tree  1           9.65838       9.65838
>> current-column    10001       9.124152      0.0009123239
>> 
>> `current-column’ is apparently pretty expensive here (~1ms).
> 
> Yes, I think current-column is the culprit here.

Indeed it was, thanks for pointing us in the right direction.  Luckily there was no need to call current-column on every node, as vundo had been doing, but only in the rare case when a branch needed to bend around a preexisting one — an easy fix that improved launch speed by 25x or more. 

That said, with long lines (>10-20K columns, say) visible, the problem remains that overall Emacs is mildly but noticeably less responsive, in the mini-buffer, in other buffers, etc., until the vundo buffer is hidden.  Perhaps that’s just the minimum penalty paid for such long lines.

>> The current-column docs do say:
>> 
>> In a buffer with very long lines, the value will be an approximation,
>> because calculating the exact number is very expensive.
> 
> The "very long lines" case happens when the line's length exceeds the
> long-line-threshold, which in your case it most probably doesn't
> (unless you lower long-line-threshold considerably).  So this part of
> the doc string is not relevant to your problem.

I was only referring to the “very expensive” part. 

>> I just tested it in a temp buffer with width 10K, and it’s much less expensive (60-100µs).  Perhaps the
>> text properties on each node character (containing the defstruct node object) slows current-column
>> down even further.
> 
> Yes. The problem is that vundo.el uses 'display' properties _a_lot_,
> and those slow down current-column, because each time it sees a
> 'display' property whose value is a string, it needs to extract the
> property value and calculate the width of that string, to account for
> its width on display.

That would have indeed been a good explanation for the additional slowness, but in fact vundo does not use 'display properties:

	(next-single-property-change (point-min) ‘display) -> nil

It uses simple characters with faces, and on each “node” character it also attaches a special 'vundo-node text property pointing to the node data at point (a cl-defstruct type vundo-m).

> This is basically an algorithmic problem in vundo.el: since the width
> of each string that vundo--translate is fixed and can be computed in
> advance, vundo.el could count the columns itself (by adding those
> known widths), instead of asking current-column to measure the width
> of those strings over and over and over again.

Indeed this was a vundo problem in that there was no need to call current-column over and over for each node.  But the basic slowness of current-column remains, so unless there are low-hanging improvements to be made there, algorithms that need frequent column calculations would need to try other approaches.

> Alternatively, instead of using 'display' properties, vundo.el could
> insert the characters produced by vundo--translate directly into the
> buffer, which would allow current-column to work at its usual speed.

This is in fact what vundo already does.  The problem is rather that current-column slows down for some types of buffer content (see below).

>> So it is a long-lines problem of a sort, just not display related.
> 
> I disagree.  it's a problem with too many 'display' properties in the
> buffer, which is not recommended for more than one reason.  I suggest
> to rethink this design.

Since no 'display properties were used (other than in a single overlay for indicating the current node), something else is causing the current-column quadratic slowdown for long lines.  To investigate, I made a generic test of current-column speed on 18K long lines; see below.

What I conclude from this test is:

Current column is reasonably fast counting long lines of plain text — 175ms total to compute each column at 6000 nodes spread over 18K columns.
current-column is quite slow counting unicode characters: about 21x slower than the ascii equivalents.  Perhaps this is because unicode chars can have non-standard widths, so emacs needs to do some additional checks?
But this isn’t the full story, because current-column is also (separately) quite slow counting plain characters with attached faces: 33x slower than the fast case. This one seems less intuitive to me. Can faces also (in principle) alter the character width (like italic for example)?
Combining these two (unicode chars with alternating faces) is the worst case — 43x slower than the fast case of unadorned plain chars.  This was vundo’s case.
Conclusion: unless some improvements can be made, since current-column can get quite slow for long lines of many chars with faces (especially unicode chars), it is best to use it judiciously.

Thanks again for looking into this.

++++
(require 'elp)
(elp-instrument-function #'current-column)

(elp-reset-all)
(with-temp-buffer
  (dotimes (_ 6000)
    (insert "*")
    (insert "--")
    (current-column))
  (elp-results))
;; 1. Plain chars, no face -- average time: 2.906...e-05

(elp-reset-all)
(with-temp-buffer
  (dotimes (_ 6000)
    (insert "○")
    (insert "──")
    (current-column))
  (elp-results))
;; 2. Unicode chars, no face -- average time: 0.0006181615, 21.3x slower than 1

(elp-reset-all)
(with-temp-buffer
  (dotimes (_ 6000)
    (insert (propertize "*" 'face 'bold))
    (insert (propertize "--" 'face 'font-lock-regexp-face))
    (current-column))
  (elp-results))
;; 3. Plain chars, alternating faces -- average time: 0.0009658896, 33.2x slower than 1

(elp-reset-all)
(with-temp-buffer
  (dotimes (_ 6000)
    (insert (propertize "○" 'face 'bold))
    (insert (propertize "──" 'face 'font-lock-regexp-face))
    (current-column))
  (elp-results))
;; 4. Unicode chars, alternative faces -- average time: 0.0012507139, 43x slower than 1

[-- Attachment #2: Type: text/html, Size: 9440 bytes --]

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-17 17:49           ` JD Smith
@ 2023-12-17 18:07             ` Eli Zaretskii
  2023-12-17 19:26               ` JD Smith
  0 siblings, 1 reply; 11+ messages in thread
From: Eli Zaretskii @ 2023-12-17 18:07 UTC (permalink / raw)
  To: JD Smith; +Cc: emacs-devel, casouri

> From: JD Smith <jdtsmith@gmail.com>
> Date: Sun, 17 Dec 2023 12:49:27 -0500
> Cc: emacs-devel@gnu.org,
>  casouri@gmail.com
> 
> That said, with long lines (>10-20K columns, say) visible, the problem remains that overall Emacs is
> mildly but noticeably less responsive, in the mini-buffer, in other buffers, etc., until the vundo buffer is
> hidden.  Perhaps that’s just the minimum penalty paid for such long lines.

Long lines with faces and overlays, I should say.

>  The current-column docs do say:
> 
>  In a buffer with very long lines, the value will be an approximation,
>  because calculating the exact number is very expensive.
> 
>  The "very long lines" case happens when the line's length exceeds the
>  long-line-threshold, which in your case it most probably doesn't
>  (unless you lower long-line-threshold considerably).  So this part of
>  the doc string is not relevant to your problem.
> 
> I was only referring to the “very expensive” part. 

The "very expensive" part is also only when lines are around
long-line-threshold.  Again, not your case.

>  Yes. The problem is that vundo.el uses 'display' properties _a_lot_,
>  and those slow down current-column, because each time it sees a
>  'display' property whose value is a string, it needs to extract the
>  property value and calculate the width of that string, to account for
>  its width on display.
> 
> That would have indeed been a good explanation for the additional slowness, but in fact vundo does
> not use 'display properties:

I do see 'display' properties being used:

  (defun vundo--highlight-node (node)
    "Highlight NODE as current node."
    (unless vundo--highlight-overlay
      (setq vundo--highlight-overlay
	    (make-overlay (1- (vundo-m-point node)) (vundo-m-point node)))
      (overlay-put vundo--highlight-overlay
		   'display (vundo--translate "●"))
      (overlay-put vundo--highlight-overlay
		   'face 'vundo-highlight)
      ;; Make current node’s highlight override last saved node’s
      ;; highlight, should they collide.
      (overlay-put vundo--highlight-overlay 'priority 2))
    (move-overlay vundo--highlight-overlay
		  (1- (vundo-m-point node))
		  (vundo-m-point node)))

> Indeed this was a vundo problem in that there was no need to call current-column over and over for
> each node.  But the basic slowness of current-column remains, so unless there are low-hanging
> improvements to be made there, algorithms that need frequent column calculations would need to try
> other approaches.

Lisp programs are not supposed to need frequent column calculations,
certainly not several times on the same line.

> * Current column is reasonably fast counting long lines of plain text — 175ms total to compute each
>  column at 6000 nodes spread over 18K columns.
> * current-column is quite slow counting unicode characters: about 21x slower than the ascii
>  equivalents.  Perhaps this is because unicode chars can have non-standard widths, so emacs
>  needs to do some additional checks?

No.  I can understand why non-ASCII characters are somewhat slower,
since each one of them is several bytes, not just one, and we convert
the multibyte representation into a codepoint on the fly.  But 21-fold
slowdown sounds excessive.

> * But this isn’t the full story, because current-column is also (separately) quite slow counting plain
>  characters with attached faces: 33x slower than the fast case. This one seems less intuitive to me.
>  Can faces also (in principle) alter the character width (like italic for example)?

When buffer text has faces (or any other text properties, really)
current-column uses a slower algorithm, because the faster one will
yield incorrect results.

> * Combining these two (unicode chars with alternating faces) is the worst case — 43x slower than
>  the fast case of unadorned plain chars.  This was vundo’s case.

Then vundo is better redesigned to avoid such massive reliance on
current-column.

There's one more factor to consider here: current-column always starts
from the beginning of the line.  If vundo needs to call current-column
many times on the same line, it loses because it each time starts from
BOL, in effect throwing away the results of the previous calculation.
This wastes CPU cycles.  It might be better to use sum char-width
values of individual characters instead.



^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-17 18:07             ` Eli Zaretskii
@ 2023-12-17 19:26               ` JD Smith
  2023-12-17 19:43                 ` Eli Zaretskii
  0 siblings, 1 reply; 11+ messages in thread
From: JD Smith @ 2023-12-17 19:26 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: emacs-devel, casouri

> On Dec 17, 2023, at 1:07 PM, Eli Zaretskii <eliz@gnu.org> wrote:
> 
> I do see 'display' properties being used:
> 
>  (defun vundo--highlight-node (node)
>       <snip>
> (overlay-put vundo--highlight-overlay
>   'display (vundo--translate "●”)) ...

That’s just a single display property on the one character wide overlay used to indicate the current node.  The test showed faces and unicode are what lead to the significant current-column slowdown (though I wouldn’t be surprised if ‘display would make it yet worse!).

> Lisp programs are not supposed to need frequent column calculations,
> certainly not several times on the same line.

Maybe then it’s worth a word of warning in the current-column docs mentioning this in the context of faces, unicode, and long-line performance?

>> * current-column is quite slow counting unicode characters: about 21x slower than the ascii
>> equivalents.  Perhaps this is because unicode chars can have non-standard widths, so emacs
>> needs to do some additional checks?
> 
> No.  I can understand why non-ASCII characters are somewhat slower,
> since each one of them is several bytes, not just one, and we convert
> the multibyte representation into a codepoint on the fly.  But 21-fold
> slowdown sounds excessive.

Results may vary by system/font I suppose; perhaps others could run the simple test in my last message (no non-default packages required) and report back what slowdown factor they get.

>> * But this isn’t the full story, because current-column is also (separately) quite slow counting plain
>> characters with attached faces: 33x slower than the fast case. This one seems less intuitive to me.
>> Can faces also (in principle) alter the character width (like italic for example)?
> 
> When buffer text has faces (or any other text properties, really)
> current-column uses a slower algorithm, because the faster one will
> yield incorrect results.

I see.  I had noticed the slowdown does not depend on whether faces alternate or just a single face is used, so this makes sense.

>> * Combining these two (unicode chars with alternating faces) is the worst case — 43x slower than
>> the fast case of unadorned plain chars.  This was vundo’s case.
> 
> Then vundo is better redesigned to avoid such massive reliance on
> current-column.

Yes indeed, and it has been already!  The fix was quite trivial [1].  I was speaking more generally to current-column performance.  You could imagine algorithms that legitimately do need to keep a dynamically updating measure of the column.

> There's one more factor to consider here: current-column always starts
> from the beginning of the line.  If vundo needs to call current-column
> many times on the same line, it loses because it each time starts from
> BOL, in effect throwing away the results of the previous calculation.
> This wastes CPU cycles.  It might be better to use sum char-width
> values of individual characters instead.

Definitely; it’s clear how repetitive/wasteful this is.  I wonder, similar to line-numbers, could we cache current-column at various points along a line and apply a delta?  I.e. for computing line numbers differentially we have

(count-lines start end) 

which linum and others use to good effect.  Is there a similar

(count-columns start end)?  

This could be useful for algorithms that truly need frequent, up-to-date column info (not vundo, luckily).

Thanks for your help and analysis.

[1] https://github.com/casouri/vundo/pull/85

^ permalink raw reply	[flat|nested] 11+ messages in thread

* Re: Mitigating the long-lines penalty in vundo
  2023-12-17 19:26               ` JD Smith
@ 2023-12-17 19:43                 ` Eli Zaretskii
  0 siblings, 0 replies; 11+ messages in thread
From: Eli Zaretskii @ 2023-12-17 19:43 UTC (permalink / raw)
  To: JD Smith; +Cc: emacs-devel, casouri

> From: JD Smith <jdtsmith@gmail.com>
> Date: Sun, 17 Dec 2023 14:26:23 -0500
> Cc: emacs-devel@gnu.org,
>  casouri@gmail.com
> 
> > Lisp programs are not supposed to need frequent column calculations,
> > certainly not several times on the same line.
> 
> Maybe then it’s worth a word of warning in the current-column docs mentioning this in the context of faces, unicode, and long-line performance?

I'm not sure.  It was never a problem until now.  Warning against very
rare issues could have negative effects of its own.

> >> * current-column is quite slow counting unicode characters: about 21x slower than the ascii
> >> equivalents.  Perhaps this is because unicode chars can have non-standard widths, so emacs
> >> needs to do some additional checks?
> > 
> > No.  I can understand why non-ASCII characters are somewhat slower,
> > since each one of them is several bytes, not just one, and we convert
> > the multibyte representation into a codepoint on the fly.  But 21-fold
> > slowdown sounds excessive.
> 
> Results may vary by system/font I suppose; perhaps others could run the simple test in my last message (no non-default packages required) and report back what slowdown factor they get.

current-column does not depend on fonts and knows nothing about fonts.
The canonical width of every character is stored in a char-table that
is indexed only by the character's codepoint, and is independent of
any real font metrics.

> > There's one more factor to consider here: current-column always starts
> > from the beginning of the line.  If vundo needs to call current-column
> > many times on the same line, it loses because it each time starts from
> > BOL, in effect throwing away the results of the previous calculation.
> > This wastes CPU cycles.  It might be better to use sum char-width
> > values of individual characters instead.
> 
> Definitely; it’s clear how repetitive/wasteful this is.  I wonder, similar to line-numbers, could we cache current-column at various points along a line and apply a delta?  I.e. for computing line numbers differentially we have
> 
> (count-lines start end) 
> 
> which linum and others use to good effect.  Is there a similar
> 
> (count-columns start end)?  

This is impossible in general (think TABs).  What we can do is provide
something like

  (count-columns startpos startcol endpos)

IOW, we need to tell the function what is the column at STARTPOS.  And
then there will be problems if STARTPOS is "covered" by an overlay or
a display property...



^ permalink raw reply	[flat|nested] 11+ messages in thread

end of thread, other threads:[~2023-12-17 19:43 UTC | newest]

Thread overview: 11+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-12-16 16:19 Mitigating the long-lines penalty in vundo JD Smith
2023-12-16 16:45 ` Eli Zaretskii
2023-12-16 18:54   ` JD Smith
2023-12-16 19:13     ` Eli Zaretskii
2023-12-16 20:32       ` Stephen Berman
2023-12-16 22:00       ` JD Smith
2023-12-17  7:28         ` Eli Zaretskii
2023-12-17 17:49           ` JD Smith
2023-12-17 18:07             ` Eli Zaretskii
2023-12-17 19:26               ` JD Smith
2023-12-17 19:43                 ` Eli Zaretskii

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).