* Mitigating the long-lines penalty in vundo @ 2023-12-16 16:19 JD Smith 2023-12-16 16:45 ` Eli Zaretskii 0 siblings, 1 reply; 11+ messages in thread From: JD Smith @ 2023-12-16 16:19 UTC (permalink / raw) To: emacs-devel; +Cc: Yuan Fu [-- Attachment #1: Type: text/plain, Size: 2252 bytes --] Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]). We’ve been improving it lately with saved state visibility and diff functionality. It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame. This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session). This leads to displayed lines which are >1000 columns wide. And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]:  I verified that this is due to long lines and not some vundo issue per se by simply breaking the tree-drawing algorithm so it wraps to a new line after some max number of columns:  But obviously such “broken” trees aren’t terribly useful as-is. What creative solutions can people suggest? The tree draw algorithm [4] is relatively simple, and depth-first. Only a total of 6 distinct characters are displayed. It has to do things like “bend” a branch down if it runs into a preexisting branch deeper in the tree. I didn’t see any improvement by enabling so-long-minor-mode. Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”? Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines? Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)? Thanks for your insights. [1] https://github.com/casouri/vundo [2] https://archive.casouri.cat/note/2021/visual-undo-tree [3] https://github.com/casouri/vundo/issues/68#issuecomment-1848992975 [4] https://github.com/casouri/vundo/blob/824be351527f7a606b25637f6ba2a00cceade338/vundo.el#L591 [-- Attachment #2.1: Type: text/html, Size: 3376 bytes --] [-- Attachment #2.2: image.png --] [-- Type: image/png, Size: 406873 bytes --] [-- Attachment #2.3: image.png --] [-- Type: image/png, Size: 143077 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-16 16:19 Mitigating the long-lines penalty in vundo JD Smith @ 2023-12-16 16:45 ` Eli Zaretskii 2023-12-16 18:54 ` JD Smith 0 siblings, 1 reply; 11+ messages in thread From: Eli Zaretskii @ 2023-12-16 16:45 UTC (permalink / raw) To: JD Smith; +Cc: emacs-devel, casouri > From: JD Smith <jdtsmith@gmail.com> > Date: Sat, 16 Dec 2023 11:19:51 -0500 > Cc: Yuan Fu <casouri@gmail.com> > > Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]). We’ve been improving it lately with saved state visibility and diff functionality. It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame. This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session). This leads to displayed lines which are >1000 columns wide. And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]: > >  (I don't understand what was supposed to be shown in my MUA at this place; what I see is as single character U+FFFC.) Is this with Emacs 29? If so, you should be able to tune long-line-threshold to get some help from Emacs. Although I'm a bit surprised by the number 1000: usually these problems happen when lines are much wider, which is why the default value of long-line-threshold is 50K, not 1K. In the issue you posted, I see that there's significant slowdown (hundreds of milliseconds) when you limit the display to 200 or 80 or even 30(!) columns. Given that, and the other evidence in the issue you posted, I'd say that the long lines are not the main factor for slowness here, because no one will convince me that the Emacs display has problems when the buffer includes 30 columns. Maybe the stuff you display uses some characters that make redisplay slower for reasons other than line-length? What kind of characters are used for showing these "undo-trees"? > Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”? Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines? Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)? I don't think I understand your questions, but the reasons for display slowness with very long lines are not related to character insertion or looking-at, they are related to the algorithms used for display layout, which is the first stage of each redisplay cycle, and are also reused for visual cursor movement (C-n/C-p etc.). Bottom line: I don't think I understand what causes your problems. The information presented is insufficient for even guessing the possible reasons. Please consider telling more. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-16 16:45 ` Eli Zaretskii @ 2023-12-16 18:54 ` JD Smith 2023-12-16 19:13 ` Eli Zaretskii 0 siblings, 1 reply; 11+ messages in thread From: JD Smith @ 2023-12-16 18:54 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, casouri [-- Attachment #1: Type: text/plain, Size: 9921 bytes --] > On Dec 16, 2023, at 11:45 AM, Eli Zaretskii <eliz@gnu.org> wrote: > >> From: JD Smith <jdtsmith@gmail.com> >> Date: Sat, 16 Dec 2023 11:19:51 -0500 >> Cc: Yuan Fu <casouri@gmail.com> >> >> Yuan’s excellent ELPA package vundo[1] uses `buffer-undo-list’ and `undo-equiv-table’ to visualize and flexibly navigate the entire tree of edits/undos/redos (see the beautiful explanation how[2]). We’ve been improving it lately with saved state visibility and diff functionality. It displays the tree of edits sideways in just a few lines of text or unicode chars in a small buffer at the bottom of the frame. This works nicely, until you accumulate undo branches >~300 levels deep, which is easy to achieve in long editing sessions, or if you save undo state (e.g. with undo-fu-session). This leads to displayed lines which are >1000 columns wide. And then Emacs' dreaded long-line display/edit slowness starts creeping in, imposing a quadratic time penalty [3]: >> >>  > > (I don't understand what was supposed to be shown in my MUA at this > place; what I see is as single character U+FFFC.) Apologies; you’d think inline images could survive passage in 2023! These two images were included in the OP: https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png > Is this with Emacs 29? If so, you should be able to tune > long-line-threshold to get some help from Emacs. Although I'm a bit > surprised by the number 1000: usually these problems happen when lines > are much wider, which is why the default value of long-line-threshold > is 50K, not 1K. Yes, Emacs 29. The problem is mild until say 10K columns. I tried dropping `long-line-threshold' to 1K (locally in the vundo buffer) and unfortunately it made no difference in the timing. > In the issue you posted, I see that there's > significant slowdown (hundreds of milliseconds) when you limit the > display to 200 or 80 or even 30(!) columns. Given that, and the other > evidence in the issue you posted, I'd say that the long lines are not > the main factor for slowness here, because no one will convince me > that the Emacs display has problems when the buffer includes 30 > columns. Curious indeed, and a valid point. It’s a logarithmic scale, so the absolute difference isn’t that much. If long lines are not to blame (whether via display or some related issue), I can’t then understand why breaking the lines into shorter pieces would so radically speed things up. It’s a pretty simple draw algorithm involving straightforward buffer edits: • inserting (propertized) text one or two chars at a time (list below) • looking-at with a simple regex (couple chars) to see if there’s room for the branch • delete-char in a few places • goto-char to the saved parents position to start drawing a new branch • move-to-column to continue a branch on a new line from the same column position Nothing obvious to me that would seem to give quadratic performance degradation, especially since I’m testing with a purely linear structure (one long line). Settling down to approximately a constant draw time per edit/node, as occurs for high tree depth at max_columns<1000, seems quite reasonable. As another piece of evidence which I mentioned in the issue thread, when such slow to display long-line vundo buffers are visible, editing performance in buffers shown in other windows is degraded (a behavior I’ve experienced with long-line buffers before). > Maybe the stuff you display uses some characters that make > redisplay slower for reasons other than line-length? What kind of > characters are used for showing these "undo-trees"? Nothing fancy, just characters from either one of these two sets (with faces applied): (defconst vundo-ascii-symbols '((selected-node . ?x) (node . ?o) (horizontal-stem . ?-) (vertical-stem . ?|) (branch . ?|) (last-branch . ?`))) (defconst vundo-unicode-symbols '((selected-node . ?●) (node . ?○) (horizontal-stem . ?─) (vertical-stem . ?│) (branch . ?├) (last-branch . ?└))) I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns. >> Since we don’t need access to more than a window-width worth of tree at a time, is there a simple way to display only some range of columns of a very wide buffer, and have the full structure live in a non-displayed “backing data store”? Would that even help with the long-line penalty, or does that penalty come in with any sort of character insertion, looking-at, etc. within long lines? Do we have to resort to our own “backing store” like a ragged array of long vectors, redrawing when the left/right column boundaries change (sounds complicated!)? > > I don't think I understand your questions, but the reasons for display > slowness with very long lines are not related to character insertion > or looking-at, they are related to the algorithms used for display > layout, which is the first stage of each redisplay cycle, and are also > reused for visual cursor movement (C-n/C-p etc.). Breaking the long lines at column lengths <200 speeds things up by >50x. I was hoping to replicate this speedup by limiting the column width of displayed text in some convenient fashion, without explicitly having to create a data structure outside of the buffer, and recreate the buffer contents from it on demand. To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count? If so, I really can’t understand how breaking fewer long lines into more shorter ones, drawn with the same algorithm, would improve the performance so much. If anyone would like to try this test themselves, I’ve included it below (vundo + lorem-ipsum required). > Bottom line: I don't think I understand what causes your problems. > The information presented is insufficient for even guessing the > possible reasons. Please consider telling more. I did include a link to the short draw function itself; not sure if you saw that. And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer. As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context. That said, you’ve motivated me to dig a bit deeper and temporarily set aside long lines as the presumed culprit. I used elp to instrument anything significant in vundo—draw-tree, but came up short: Function Name Call Count Elapsed Time Average Time vundo--draw-tree 1 3.849056 3.849056 append 48118 0.0478320000 9.940...e-07 propertize 12063 0.0423950000 3.514...e-06 insert 22018 0.0183619999 8.339...e-07 vundo--translate 12002 0.0085510000 7.124...e-07 looking-at 12202 0.0038980000 3.194...e-07 vundo--node-timestamp 6002 0.0033740000 5.621...e-07 goto-char 13517 0.0026380000 1.951...e-07 last 12002 0.0021369999 1.780...e-07 vundo-m-children 12001 0.0021319999 1.776...e-07 pop 6001 0.0017129999 2.854...e-07 vundo-m-point 6000 0.0015979999 2.663...e-07 vundo--put-node-at-point 6001 0.0015049999 2.507...e-07 vundo-m-parent 6001 0.0007300000 1.216...e-07 erase-buffer 48 0.0005819999 1.212...e-05 delete-char 29 1.100...e-05 3.793...e-07 I’ll report more if I discover anything, but of course very happy to hear anyone's thoughts on the matter in the meantime. Thanks for your attention and analysis. +++ ;; create and benchmark starting vundo with long linear undo trees (defun vundo-linear-stress-test (nsen) (interactive (list (read-number "Number of edits (sentences): " 20000))) (let* ((buf (get-buffer-create "*vundo-linear-test*")) (res (cl-loop for nedits = 3 then (round (* nedits 1.25)) while (< nedits nsen) do (with-current-buffer buf (read-only-mode -1) (font-lock-mode -1) (erase-buffer) (setq buffer-undo-list nil pending-undo-list nil) (dotimes (_ nedits) (lorem-ipsum-insert-sentences 1) (insert "\n") (undo-boundary))) collect (flatten-tree (list nedits (with-current-buffer buf (benchmark-call #'vundo)) (with-current-buffer " *vundo tree*" (current-column)))) do (with-current-buffer (get-buffer " *vundo tree*") (vundo-quit))))) (with-output-to-temp-buffer "*vundo-linear-stress-test-results*" (princ (format "%6s %10s %4s %10s %6s\n" "Edits" "Time(s)" "GCs" "GCTime(s)" "Cols")) (dolist (x res) (princ (apply #'format "%6d %10.3f %4d %10.3f %6d\n" x)))))) [-- Attachment #2: Type: text/html, Size: 14335 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-16 18:54 ` JD Smith @ 2023-12-16 19:13 ` Eli Zaretskii 2023-12-16 20:32 ` Stephen Berman 2023-12-16 22:00 ` JD Smith 0 siblings, 2 replies; 11+ messages in thread From: Eli Zaretskii @ 2023-12-16 19:13 UTC (permalink / raw) To: JD Smith; +Cc: emacs-devel, casouri > From: JD Smith <jdtsmith@gmail.com> > Date: Sat, 16 Dec 2023 13:54:40 -0500 > Cc: emacs-devel@gnu.org, > casouri@gmail.com > > >>  > > > > (I don't understand what was supposed to be shown in my MUA at this > > place; what I see is as single character U+FFFC.) > > Apologies; you’d think inline images could survive passage in 2023! These two images were included in the OP: > > https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png > https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png I cannot view these images: my browser says there are errors, and if I try downloading them with wget, I get 404... > > Maybe the stuff you display uses some characters that make > > redisplay slower for reasons other than line-length? What kind of > > characters are used for showing these "undo-trees"? > > Nothing fancy, just characters from either one of these two sets (with faces applied): > > (defconst vundo-ascii-symbols > '((selected-node . ?x) > (node . ?o) > (horizontal-stem . ?-) > (vertical-stem . ?|) > (branch . ?|) > (last-branch . ?`))) > > (defconst vundo-unicode-symbols > '((selected-node . ?●) > (node . ?○) > (horizontal-stem . ?─) > (vertical-stem . ?│) > (branch . ?├) > (last-branch . ?└))) > > I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns. Very strange. Profiling should point out the culprit, especially if you load the .el file (not .elc!) and then profile. > To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count? Yes. Emacs editing commands are largely unaware of lines, unless Lisp programs invoke functions that are sensitive to lines, like end-of-line etc. Buffer text is stored as a C string, it is not subdivided into lines. Only the display engine is sensitive to lines. > > Bottom line: I don't think I understand what causes your problems. > > The information presented is insufficient for even guessing the > > possible reasons. Please consider telling more. > > I did include a link to the short draw function itself; not sure if you saw that. And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer. As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context. You must understand that I don't have time to study non-trivial code when such questions are posted. I spent 8 hours today reviewing and installing patches, merging the release branch to master, fixing bugs people reported over the last days, etc. So by "insufficient information" I meant what was presented explicitly, both here and in the discussion of the issue. My best suggestion so far is to profile the code; if you are certain you know which function is problematic, profile that after loading the code as .el file, and show the completely expanded profile. The profiles you've shown until now indicate that vundo--draw-tree is the hot spot, so loading it as a .el file and running "M-x profiler-start" followed by "M-x profiler-report" when it finishes should show the expensive parts. Another idea is to increase the GC threshold, in case what slows this down is the amount of garbage created by this code. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-16 19:13 ` Eli Zaretskii @ 2023-12-16 20:32 ` Stephen Berman 2023-12-16 22:00 ` JD Smith 1 sibling, 0 replies; 11+ messages in thread From: Stephen Berman @ 2023-12-16 20:32 UTC (permalink / raw) To: Eli Zaretskii; +Cc: JD Smith, emacs-devel, casouri On Sat, 16 Dec 2023 21:13:28 +0200 Eli Zaretskii <eliz@gnu.org> wrote: >> From: JD Smith <jdtsmith@gmail.com> >> Date: Sat, 16 Dec 2023 13:54:40 -0500 >> Cc: emacs-devel@gnu.org, >> casouri@gmail.com >> >> >>  >> > >> > (I don't understand what was supposed to be shown in my MUA at this >> > place; what I see is as single character U+FFFC.) >> >> Apologies; you’d think inline images could survive passage in 2023! These >> two images were included in the OP: >> >> https://private-user-images.githubusercontent.com/93749/289367869-b0d8ab4f-c231-4964-85ba-d9ecc20363b0.png >> https://private-user-images.githubusercontent.com/93749/290972630-54f81ca6-4e4e-4943-a5a1-d8a949bd3c33.png > > I cannot view these images: my browser says there are errors, and if I > try downloading them with wget, I get 404... You can see the images in a web broswer here: https://lists.gnu.org/archive/html/emacs-devel/2023-12/msg00448.html Steve Berman ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-16 19:13 ` Eli Zaretskii 2023-12-16 20:32 ` Stephen Berman @ 2023-12-16 22:00 ` JD Smith 2023-12-17 7:28 ` Eli Zaretskii 1 sibling, 1 reply; 11+ messages in thread From: JD Smith @ 2023-12-16 22:00 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, casouri [-- Attachment #1: Type: text/plain, Size: 5791 bytes --] > On Dec 16, 2023, at 2:13 PM, Eli Zaretskii <eliz@gnu.org> wrote: > > I cannot view these images: my browser says there are errors, and if I > try downloading them with wget, I get 404... GitHub shenanigans I guess. Stephen’s link works: https://lists.gnu.org/archive/html/emacs-devel/2023-12/msg00448.html. >>> Maybe the stuff you display uses some characters that make >>> redisplay slower for reasons other than line-length? What kind of >>> characters are used for showing these "undo-trees"? >> >> I just retried my test with the ascii-only symbols, and it resulted in a modest 17% speedup, but still >20s at 32K columns. > > Very strange. Profiling should point out the culprit, especially if > you load the .el file (not .elc!) and then profile. I tried again from a clean session with eval-buffer’d vundo.el. The profiler for 10,000 edits (no undos) reports a very odd result: 1799 95% - command-execute 1799 95% - call-interactively 1799 95% - funcall-interactively 1799 95% - eval-expression 1799 95% - eval 1799 95% - save-current-buffer 1797 95% - vundo 1797 95% - let 1756 93% - vundo-1 1756 93% - save-current-buffer 1756 93% - let 1756 93% - save-current-buffer 1756 93% - vundo--refresh-buffer 1756 93% - save-current-buffer 1756 93% - let 1651 87% - if 1649 87% - vundo--draw-tree 1649 87% - let* 1649 87% - while 1649 87% - let* 1640 86% - let 1560 82% - max 1560 82% 1- 71 3% - if 71 3% - let 71 3% - if 65 3% + insert 9 0% - rx-to-string 7 0% + rx--translate 1 0% + cons 2 0% + if 1 0% + progn 1 0% vundo--put-node-at-point 1 0% vundo--node-timestamp 2 0% + eq 105 5% + let* 41 2% + let 87 4% + ... Looking closer, the only relevant thing in draw-tree is (1- (current-column)). Adding that to my elp list and... we have found the principal culprit (here with 10K edits): vundo--draw-tree 1 9.65838 9.65838 current-column 10001 9.124152 0.0009123239 `current-column’ is apparently pretty expensive here (~1ms). The current-column docs do say: In a buffer with very long lines, the value will be an approximation, because calculating the exact number is very expensive. I just tested it in a temp buffer with width 10K, and it’s much less expensive (60-100µs). Perhaps the text properties on each node character (containing the defstruct node object) slows current-column down even further. So it is a long-lines problem of a sort, just not display related. >> To confirm, it’s your understanding that functions like all those mentioned above (goto-char, looking at, delete-char, move-to-column, insert) should be agnostic w.r.t. the length of lines, for buffers of the same total character count? > > Yes. Emacs editing commands are largely unaware of lines, unless Lisp > programs invoke functions that are sensitive to lines, like > end-of-line etc. Buffer text is stored as a C string, it is not > subdivided into lines. Only the display engine is sensitive to lines. > >>> Bottom line: I don't think I understand what causes your problems. >>> The information presented is insufficient for even guessing the >>> possible reasons. Please consider telling more. >> >> I did include a link to the short draw function itself; not sure if you saw that. And I demonstrated clearly (IMO of course) that the performance slowdown depends directly on the max column count of the displayed buffer. As a small bit of constructive feedback, this take (“insufficient for even guessing the possible reasons”) strikes me as uncharitable given this context. > > You must understand that I don't have time to study non-trivial code > when such questions are posted. I spent 8 hours today reviewing and > installing patches, merging the release branch to master, fixing bugs > people reported over the last days, etc. So by "insufficient > information" I meant what was presented explicitly, both here and in > the discussion of the issue. My best suggestion so far is to profile > the code; if you are certain you know which function is problematic, > profile that after loading the code as .el file, and show the > completely expanded profile. The profiles you've shown until now > indicate that vundo--draw-tree is the hot spot, so loading it as a .el > file and running "M-x profiler-start" followed by "M-x profiler-report" > when it finishes should show the expensive parts. Thank for the suggestions. I deeply appreciate all you do for Emacs and am honestly quite blown away by your responsiveness to bugs, queries, long bike-shed discussions, etc. Please take my comment only as gentle feedback on communication & perceptions, which I always appreciate receiving. Your advice was, as usual, good... now to workaround current-column slowness. Thanks again. [-- Attachment #2: Type: text/html, Size: 10143 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-16 22:00 ` JD Smith @ 2023-12-17 7:28 ` Eli Zaretskii 2023-12-17 17:49 ` JD Smith 0 siblings, 1 reply; 11+ messages in thread From: Eli Zaretskii @ 2023-12-17 7:28 UTC (permalink / raw) To: JD Smith; +Cc: emacs-devel, casouri > From: JD Smith <jdtsmith@gmail.com> > Date: Sat, 16 Dec 2023 17:00:36 -0500 > Cc: emacs-devel@gnu.org, > casouri@gmail.com > > I tried again from a clean session with eval-buffer’d vundo.el. The profiler for 10,000 edits (no undos) > reports a very odd result: > > 1799 95% - command-execute > 1799 95% - call-interactively > 1799 95% - funcall-interactively > 1799 95% - eval-expression > 1799 95% - eval > 1799 95% - save-current-buffer > 1797 95% - vundo > 1797 95% - let > 1756 93% - vundo-1 > 1756 93% - save-current-buffer > 1756 93% - let > 1756 93% - save-current-buffer > 1756 93% - vundo--refresh-buffer > 1756 93% - save-current-buffer > 1756 93% - let > 1651 87% - if > 1649 87% - vundo--draw-tree > 1649 87% - let* > 1649 87% - while > 1649 87% - let* > 1640 86% - let > 1560 82% - max > 1560 82% 1- > 71 3% - if > 71 3% - let > 71 3% - if > 65 3% + insert > 9 0% - rx-to-string > 7 0% + rx--translate > 1 0% + cons > 2 0% + if > 1 0% + progn > 1 0% vundo--put-node-at-point > 1 0% vundo--node-timestamp > 2 0% + eq > 105 5% + let* > 41 2% + let > 87 4% + ... > > Looking closer, the only relevant thing in draw-tree is (1- (current-column)). Adding that to my elp list > and... we have found the principal culprit (here with 10K edits): > > vundo--draw-tree 1 9.65838 9.65838 > current-column 10001 9.124152 0.0009123239 > > `current-column’ is apparently pretty expensive here (~1ms). Yes, I think current-column is the culprit here. > The current-column docs do say: > > In a buffer with very long lines, the value will be an approximation, > because calculating the exact number is very expensive. The "very long lines" case happens when the line's length exceeds the long-line-threshold, which in your case it most probably doesn't (unless you lower long-line-threshold considerably). So this part of the doc string is not relevant to your problem. > I just tested it in a temp buffer with width 10K, and it’s much less expensive (60-100µs). Perhaps the > text properties on each node character (containing the defstruct node object) slows current-column > down even further. Yes. The problem is that vundo.el uses 'display' properties _a_lot_, and those slow down current-column, because each time it sees a 'display' property whose value is a string, it needs to extract the property value and calculate the width of that string, to account for its width on display. This is basically an algorithmic problem in vundo.el: since the width of each string that vundo--translate is fixed and can be computed in advance, vundo.el could count the columns itself (by adding those known widths), instead of asking current-column to measure the width of those strings over and over and over again. Alternatively, instead of using 'display' properties, vundo.el could insert the characters produced by vundo--translate directly into the buffer, which would allow current-column to work at its usual speed. > So it is a long-lines problem of a sort, just not display related. I disagree. it's a problem with too many 'display' properties in the buffer, which is not recommended for more than one reason. I suggest to rethink this design. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-17 7:28 ` Eli Zaretskii @ 2023-12-17 17:49 ` JD Smith 2023-12-17 18:07 ` Eli Zaretskii 0 siblings, 1 reply; 11+ messages in thread From: JD Smith @ 2023-12-17 17:49 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, casouri [-- Attachment #1: Type: text/plain, Size: 6257 bytes --] > On Dec 17, 2023, at 2:28 AM, Eli Zaretskii <eliz@gnu.org> wrote: >> >> >> Looking closer, the only relevant thing in draw-tree is (1- (current-column)). Adding that to my elp list >> and... we have found the principal culprit (here with 10K edits): >> >> vundo--draw-tree 1 9.65838 9.65838 >> current-column 10001 9.124152 0.0009123239 >> >> `current-column’ is apparently pretty expensive here (~1ms). > > Yes, I think current-column is the culprit here. Indeed it was, thanks for pointing us in the right direction. Luckily there was no need to call current-column on every node, as vundo had been doing, but only in the rare case when a branch needed to bend around a preexisting one — an easy fix that improved launch speed by 25x or more. That said, with long lines (>10-20K columns, say) visible, the problem remains that overall Emacs is mildly but noticeably less responsive, in the mini-buffer, in other buffers, etc., until the vundo buffer is hidden. Perhaps that’s just the minimum penalty paid for such long lines. >> The current-column docs do say: >> >> In a buffer with very long lines, the value will be an approximation, >> because calculating the exact number is very expensive. > > The "very long lines" case happens when the line's length exceeds the > long-line-threshold, which in your case it most probably doesn't > (unless you lower long-line-threshold considerably). So this part of > the doc string is not relevant to your problem. I was only referring to the “very expensive” part. >> I just tested it in a temp buffer with width 10K, and it’s much less expensive (60-100µs). Perhaps the >> text properties on each node character (containing the defstruct node object) slows current-column >> down even further. > > Yes. The problem is that vundo.el uses 'display' properties _a_lot_, > and those slow down current-column, because each time it sees a > 'display' property whose value is a string, it needs to extract the > property value and calculate the width of that string, to account for > its width on display. That would have indeed been a good explanation for the additional slowness, but in fact vundo does not use 'display properties: (next-single-property-change (point-min) ‘display) -> nil It uses simple characters with faces, and on each “node” character it also attaches a special 'vundo-node text property pointing to the node data at point (a cl-defstruct type vundo-m). > This is basically an algorithmic problem in vundo.el: since the width > of each string that vundo--translate is fixed and can be computed in > advance, vundo.el could count the columns itself (by adding those > known widths), instead of asking current-column to measure the width > of those strings over and over and over again. Indeed this was a vundo problem in that there was no need to call current-column over and over for each node. But the basic slowness of current-column remains, so unless there are low-hanging improvements to be made there, algorithms that need frequent column calculations would need to try other approaches. > Alternatively, instead of using 'display' properties, vundo.el could > insert the characters produced by vundo--translate directly into the > buffer, which would allow current-column to work at its usual speed. This is in fact what vundo already does. The problem is rather that current-column slows down for some types of buffer content (see below). >> So it is a long-lines problem of a sort, just not display related. > > I disagree. it's a problem with too many 'display' properties in the > buffer, which is not recommended for more than one reason. I suggest > to rethink this design. Since no 'display properties were used (other than in a single overlay for indicating the current node), something else is causing the current-column quadratic slowdown for long lines. To investigate, I made a generic test of current-column speed on 18K long lines; see below. What I conclude from this test is: Current column is reasonably fast counting long lines of plain text — 175ms total to compute each column at 6000 nodes spread over 18K columns. current-column is quite slow counting unicode characters: about 21x slower than the ascii equivalents. Perhaps this is because unicode chars can have non-standard widths, so emacs needs to do some additional checks? But this isn’t the full story, because current-column is also (separately) quite slow counting plain characters with attached faces: 33x slower than the fast case. This one seems less intuitive to me. Can faces also (in principle) alter the character width (like italic for example)? Combining these two (unicode chars with alternating faces) is the worst case — 43x slower than the fast case of unadorned plain chars. This was vundo’s case. Conclusion: unless some improvements can be made, since current-column can get quite slow for long lines of many chars with faces (especially unicode chars), it is best to use it judiciously. Thanks again for looking into this. ++++ (require 'elp) (elp-instrument-function #'current-column) (elp-reset-all) (with-temp-buffer (dotimes (_ 6000) (insert "*") (insert "--") (current-column)) (elp-results)) ;; 1. Plain chars, no face -- average time: 2.906...e-05 (elp-reset-all) (with-temp-buffer (dotimes (_ 6000) (insert "○") (insert "──") (current-column)) (elp-results)) ;; 2. Unicode chars, no face -- average time: 0.0006181615, 21.3x slower than 1 (elp-reset-all) (with-temp-buffer (dotimes (_ 6000) (insert (propertize "*" 'face 'bold)) (insert (propertize "--" 'face 'font-lock-regexp-face)) (current-column)) (elp-results)) ;; 3. Plain chars, alternating faces -- average time: 0.0009658896, 33.2x slower than 1 (elp-reset-all) (with-temp-buffer (dotimes (_ 6000) (insert (propertize "○" 'face 'bold)) (insert (propertize "──" 'face 'font-lock-regexp-face)) (current-column)) (elp-results)) ;; 4. Unicode chars, alternative faces -- average time: 0.0012507139, 43x slower than 1 [-- Attachment #2: Type: text/html, Size: 9440 bytes --] ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-17 17:49 ` JD Smith @ 2023-12-17 18:07 ` Eli Zaretskii 2023-12-17 19:26 ` JD Smith 0 siblings, 1 reply; 11+ messages in thread From: Eli Zaretskii @ 2023-12-17 18:07 UTC (permalink / raw) To: JD Smith; +Cc: emacs-devel, casouri > From: JD Smith <jdtsmith@gmail.com> > Date: Sun, 17 Dec 2023 12:49:27 -0500 > Cc: emacs-devel@gnu.org, > casouri@gmail.com > > That said, with long lines (>10-20K columns, say) visible, the problem remains that overall Emacs is > mildly but noticeably less responsive, in the mini-buffer, in other buffers, etc., until the vundo buffer is > hidden. Perhaps that’s just the minimum penalty paid for such long lines. Long lines with faces and overlays, I should say. > The current-column docs do say: > > In a buffer with very long lines, the value will be an approximation, > because calculating the exact number is very expensive. > > The "very long lines" case happens when the line's length exceeds the > long-line-threshold, which in your case it most probably doesn't > (unless you lower long-line-threshold considerably). So this part of > the doc string is not relevant to your problem. > > I was only referring to the “very expensive” part. The "very expensive" part is also only when lines are around long-line-threshold. Again, not your case. > Yes. The problem is that vundo.el uses 'display' properties _a_lot_, > and those slow down current-column, because each time it sees a > 'display' property whose value is a string, it needs to extract the > property value and calculate the width of that string, to account for > its width on display. > > That would have indeed been a good explanation for the additional slowness, but in fact vundo does > not use 'display properties: I do see 'display' properties being used: (defun vundo--highlight-node (node) "Highlight NODE as current node." (unless vundo--highlight-overlay (setq vundo--highlight-overlay (make-overlay (1- (vundo-m-point node)) (vundo-m-point node))) (overlay-put vundo--highlight-overlay 'display (vundo--translate "●")) (overlay-put vundo--highlight-overlay 'face 'vundo-highlight) ;; Make current node’s highlight override last saved node’s ;; highlight, should they collide. (overlay-put vundo--highlight-overlay 'priority 2)) (move-overlay vundo--highlight-overlay (1- (vundo-m-point node)) (vundo-m-point node))) > Indeed this was a vundo problem in that there was no need to call current-column over and over for > each node. But the basic slowness of current-column remains, so unless there are low-hanging > improvements to be made there, algorithms that need frequent column calculations would need to try > other approaches. Lisp programs are not supposed to need frequent column calculations, certainly not several times on the same line. > * Current column is reasonably fast counting long lines of plain text — 175ms total to compute each > column at 6000 nodes spread over 18K columns. > * current-column is quite slow counting unicode characters: about 21x slower than the ascii > equivalents. Perhaps this is because unicode chars can have non-standard widths, so emacs > needs to do some additional checks? No. I can understand why non-ASCII characters are somewhat slower, since each one of them is several bytes, not just one, and we convert the multibyte representation into a codepoint on the fly. But 21-fold slowdown sounds excessive. > * But this isn’t the full story, because current-column is also (separately) quite slow counting plain > characters with attached faces: 33x slower than the fast case. This one seems less intuitive to me. > Can faces also (in principle) alter the character width (like italic for example)? When buffer text has faces (or any other text properties, really) current-column uses a slower algorithm, because the faster one will yield incorrect results. > * Combining these two (unicode chars with alternating faces) is the worst case — 43x slower than > the fast case of unadorned plain chars. This was vundo’s case. Then vundo is better redesigned to avoid such massive reliance on current-column. There's one more factor to consider here: current-column always starts from the beginning of the line. If vundo needs to call current-column many times on the same line, it loses because it each time starts from BOL, in effect throwing away the results of the previous calculation. This wastes CPU cycles. It might be better to use sum char-width values of individual characters instead. ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-17 18:07 ` Eli Zaretskii @ 2023-12-17 19:26 ` JD Smith 2023-12-17 19:43 ` Eli Zaretskii 0 siblings, 1 reply; 11+ messages in thread From: JD Smith @ 2023-12-17 19:26 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel, casouri > On Dec 17, 2023, at 1:07 PM, Eli Zaretskii <eliz@gnu.org> wrote: > > I do see 'display' properties being used: > > (defun vundo--highlight-node (node) > <snip> > (overlay-put vundo--highlight-overlay > 'display (vundo--translate "●”)) ... That’s just a single display property on the one character wide overlay used to indicate the current node. The test showed faces and unicode are what lead to the significant current-column slowdown (though I wouldn’t be surprised if ‘display would make it yet worse!). > Lisp programs are not supposed to need frequent column calculations, > certainly not several times on the same line. Maybe then it’s worth a word of warning in the current-column docs mentioning this in the context of faces, unicode, and long-line performance? >> * current-column is quite slow counting unicode characters: about 21x slower than the ascii >> equivalents. Perhaps this is because unicode chars can have non-standard widths, so emacs >> needs to do some additional checks? > > No. I can understand why non-ASCII characters are somewhat slower, > since each one of them is several bytes, not just one, and we convert > the multibyte representation into a codepoint on the fly. But 21-fold > slowdown sounds excessive. Results may vary by system/font I suppose; perhaps others could run the simple test in my last message (no non-default packages required) and report back what slowdown factor they get. >> * But this isn’t the full story, because current-column is also (separately) quite slow counting plain >> characters with attached faces: 33x slower than the fast case. This one seems less intuitive to me. >> Can faces also (in principle) alter the character width (like italic for example)? > > When buffer text has faces (or any other text properties, really) > current-column uses a slower algorithm, because the faster one will > yield incorrect results. I see. I had noticed the slowdown does not depend on whether faces alternate or just a single face is used, so this makes sense. >> * Combining these two (unicode chars with alternating faces) is the worst case — 43x slower than >> the fast case of unadorned plain chars. This was vundo’s case. > > Then vundo is better redesigned to avoid such massive reliance on > current-column. Yes indeed, and it has been already! The fix was quite trivial [1]. I was speaking more generally to current-column performance. You could imagine algorithms that legitimately do need to keep a dynamically updating measure of the column. > There's one more factor to consider here: current-column always starts > from the beginning of the line. If vundo needs to call current-column > many times on the same line, it loses because it each time starts from > BOL, in effect throwing away the results of the previous calculation. > This wastes CPU cycles. It might be better to use sum char-width > values of individual characters instead. Definitely; it’s clear how repetitive/wasteful this is. I wonder, similar to line-numbers, could we cache current-column at various points along a line and apply a delta? I.e. for computing line numbers differentially we have (count-lines start end) which linum and others use to good effect. Is there a similar (count-columns start end)? This could be useful for algorithms that truly need frequent, up-to-date column info (not vundo, luckily). Thanks for your help and analysis. [1] https://github.com/casouri/vundo/pull/85 ^ permalink raw reply [flat|nested] 11+ messages in thread
* Re: Mitigating the long-lines penalty in vundo 2023-12-17 19:26 ` JD Smith @ 2023-12-17 19:43 ` Eli Zaretskii 0 siblings, 0 replies; 11+ messages in thread From: Eli Zaretskii @ 2023-12-17 19:43 UTC (permalink / raw) To: JD Smith; +Cc: emacs-devel, casouri > From: JD Smith <jdtsmith@gmail.com> > Date: Sun, 17 Dec 2023 14:26:23 -0500 > Cc: emacs-devel@gnu.org, > casouri@gmail.com > > > Lisp programs are not supposed to need frequent column calculations, > > certainly not several times on the same line. > > Maybe then it’s worth a word of warning in the current-column docs mentioning this in the context of faces, unicode, and long-line performance? I'm not sure. It was never a problem until now. Warning against very rare issues could have negative effects of its own. > >> * current-column is quite slow counting unicode characters: about 21x slower than the ascii > >> equivalents. Perhaps this is because unicode chars can have non-standard widths, so emacs > >> needs to do some additional checks? > > > > No. I can understand why non-ASCII characters are somewhat slower, > > since each one of them is several bytes, not just one, and we convert > > the multibyte representation into a codepoint on the fly. But 21-fold > > slowdown sounds excessive. > > Results may vary by system/font I suppose; perhaps others could run the simple test in my last message (no non-default packages required) and report back what slowdown factor they get. current-column does not depend on fonts and knows nothing about fonts. The canonical width of every character is stored in a char-table that is indexed only by the character's codepoint, and is independent of any real font metrics. > > There's one more factor to consider here: current-column always starts > > from the beginning of the line. If vundo needs to call current-column > > many times on the same line, it loses because it each time starts from > > BOL, in effect throwing away the results of the previous calculation. > > This wastes CPU cycles. It might be better to use sum char-width > > values of individual characters instead. > > Definitely; it’s clear how repetitive/wasteful this is. I wonder, similar to line-numbers, could we cache current-column at various points along a line and apply a delta? I.e. for computing line numbers differentially we have > > (count-lines start end) > > which linum and others use to good effect. Is there a similar > > (count-columns start end)? This is impossible in general (think TABs). What we can do is provide something like (count-columns startpos startcol endpos) IOW, we need to tell the function what is the column at STARTPOS. And then there will be problems if STARTPOS is "covered" by an overlay or a display property... ^ permalink raw reply [flat|nested] 11+ messages in thread
end of thread, other threads:[~2023-12-17 19:43 UTC | newest] Thread overview: 11+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- 2023-12-16 16:19 Mitigating the long-lines penalty in vundo JD Smith 2023-12-16 16:45 ` Eli Zaretskii 2023-12-16 18:54 ` JD Smith 2023-12-16 19:13 ` Eli Zaretskii 2023-12-16 20:32 ` Stephen Berman 2023-12-16 22:00 ` JD Smith 2023-12-17 7:28 ` Eli Zaretskii 2023-12-17 17:49 ` JD Smith 2023-12-17 18:07 ` Eli Zaretskii 2023-12-17 19:26 ` JD Smith 2023-12-17 19:43 ` Eli Zaretskii
Code repositories for project(s) associated with this public inbox https://git.savannah.gnu.org/cgit/emacs.git This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).