* RE: State of the overlay tree branch?

From: Drew Adams @ 2018-03-18 21:37 UTC
To: Eli Zaretskii, Sebastian Sturm; +Cc: emacs-devel

> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).

Having a fast line counter in Elisp would be terrific.
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19  1:33 UTC
To: emacs-devel

>> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
>> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
>> and should run circles around count-lines (used by line-number-at-pos).
> Having a fast line counter in Elisp would be terrific.

It should be pretty easy to provide such a thing by relying on a cache
of the last call.  Though Sebastian's experience seems to indicate that
the current code doesn't only suffer from the time to count LF but also
from the time to process the markers.

I seem to remember someone else experiencing a similar problem and
suggesting that the problem lies in the charpos_to_bytepos (and/or
bytepos_to_charpos) conversion function, which iterates through all the
markers to try and find a "nearby" marker (because markers keep track
of both their bytepos and their charpos).  Looking for a nearby marker
to avoid scanning the whole buffer is a good idea in many cases, but not
if scanning the list of markers takes more time than scanning the
whole buffer.


        Stefan
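A minimal sketch of that caching idea at the Lisp level, in the spirit of
nlinum's approach (the names below are hypothetical, not an existing API,
and the code deliberately ignores narrowing, since clients such as lsp-mode
want absolute line numbers): keep the last beginning-of-line position and
its line number together with the buffer's modification tick, and count
newlines only between the cached position and the requested one.

    (defvar-local my-line-cache nil
      "Either nil or a cons (TICK BOL . LINE) for the last computed line number.")

    (defun my-line-number-at-pos (&optional pos)
      "Like `line-number-at-pos', but scan from the previously cached position."
      (save-excursion
        (save-restriction
          (widen)
          (goto-char (or pos (point)))
          (forward-line 0)              ; work with beginning-of-line positions
          (let ((bol (point))
                (tick (buffer-chars-modified-tick))
                line)
            (if (and my-line-cache (eql (car my-line-cache) tick))
                ;; Valid cache: count newlines only between the cached
                ;; position and BOL, in whichever direction is needed.
                (let ((cbol (nth 1 my-line-cache))
                      (cline (cddr my-line-cache)))
                  (setq line (if (>= bol cbol)
                                 (+ cline (count-lines cbol bol))
                               (- cline (count-lines bol cbol)))))
              ;; Stale or empty cache: fall back to a full scan from point-min.
              (setq line (1+ (count-lines (point-min) bol))))
            (setq my-line-cache (cons tick (cons bol line)))
            line))))

Any buffer change invalidates the cache wholesale, which keeps it trivially
correct at the cost of one full scan after each edit.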
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19  6:50 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 18 Mar 2018 21:33:05 -0400
>
> >> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> >> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> >> and should run circles around count-lines (used by line-number-at-pos).
> > Having a fast line counter in Elisp would be terrific.
>
> It should be pretty easy to provide such a thing by relying on a cache
> of the last call.

This is already coded, see display_count_lines.  That's what the native
line-number display uses.  Exposing it to Lisp should be easy.

But I don't believe it could be orders of magnitude faster than
count-lines, even though it doesn't need to convert character position
to byte position.  I'm guessing something entirely different and
unrelated to line-counting per se is at work here.

> Though Sebastian's experience seems to indicate that the
> current code doesn't only suffer from the time to count LF but also
> from the time to process the markers.

Not sure what marker processing you had in mind.  Can you elaborate?

> I seem to remember someone else experiencing a similar problem and
> suggesting that the problem lies in the charpos_to_bytepos (and/or
> bytepos_to_charpos) conversion function, which iterates through all the
> markers to try and find a "nearby" marker (because markers keep track
> of both their bytepos and their charpos).  Looking for a nearby marker
> to avoid scanning the whole buffer is a good idea in many cases, but not
> if scanning the list of markers takes more time than scanning the
> whole buffer.

But find_newline doesn't look for markers, and it converts character
to byte position just 2 times.  Or am I missing something?
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 12:29 UTC
To: Eli Zaretskii; +Cc: emacs-devel

>> It should be pretty easy to provide such a thing by relying on a cache
>> of the last call.
> This is already coded, see display_count_lines.

I don't see any cache in display_count_lines, but yes, the code that
uses display_count_lines does do such caching, and we could/should
expose it to Lisp.  In nlinum.el I also have something similar (called
nlinum--line-number-at-pos).

> But I don't believe it could be orders of magnitude faster than
> count-lines, even though it doesn't need to convert character position
> to byte position.

Scanning from the last used position can be *very* different from
scanning from point-min.  So yes, it can be orders of magnitude faster
(I wrote nlinum--line-number-at-pos for that reason: I sadly didn't
write down the test case I used back then, but the difference was very
significant).

> I'm guessing something entirely different and unrelated to
> line-counting per se is at work here.

Agreed.

>> Though Sebastian's experience seems to indicate that the
>> current code doesn't only suffer from the time to count LF but also
>> from the time to process the markers.
> Not sure what marker processing you had in mind.  Can you elaborate?

The

    for (tail = BUF_MARKERS (b); tail; tail = tail->next)

loop in buf_charpos_to_bytepos and buf_bytepos_to_charpos.

> But find_newline doesn't look for markers, and it converts character
> to byte position just 2 times.  Or am I missing something?

The idea is that the above loop (even if called only twice) might be
sufficient to make line-number-at-pos take 0.2s.  I don't know that it's
the culprit; I'm just mentioning the possibility, since noverlay removes
all the overlay-induced markers, which would significantly reduce the
number of markers over which the above loop iterates.

Note that those loops stop as soon as we're within 50 chars of the goal,
and they also stop as soon as there's no non-ASCII char between the
"best bounds so far".  So for them to cause the slowdown seen here, we'd
need not only a very large number of markers but also additional
conditions that might not be very likely.  But it's still a possibility.


        Stefan
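Since, before the noverlay branch, every overlay is backed by a pair of
markers (its start and end), a rough lower bound on how long that
BUF_MARKERS chain is can be obtained from Lisp by counting overlays; the
helper below is a hypothetical illustration, not an existing function:

    (defun my-overlay-marker-estimate ()
      "Show a lower bound on how many markers the char/byte conversion loop walks."
      (interactive)
      (let ((n (* 2 (length (overlays-in (point-min) (point-max))))))
        (message "%s: at least %d overlay-backed markers" (buffer-name) n)
        n))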
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19 13:02 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Mar 2018 08:29:40 -0400
>
> > But I don't believe [display_count_lines] could be orders of
> > magnitude faster than count-lines, even though it doesn't need to
> > convert character position to byte position.
>
> Scanning from the last used position can be *very* different from
> scanning from point-min.  So yes, it can be orders of magnitude faster

Well, in my measurements it's already very fast.  I don't understand
why the OP sees times that are 10 times slower.

> The
>
>     for (tail = BUF_MARKERS (b); tail; tail = tail->next)
>
> loop in buf_charpos_to_bytepos and buf_bytepos_to_charpos.
>
> > But find_newline doesn't look for markers, and it converts character
> > to byte position just 2 times.  Or am I missing something?
>
> The idea is that the above loop (even if called only twice) might be
> sufficient to make line-number-at-pos take 0.2s.

I very much doubt that loop could take such a long time.  And running a
benchmark 1000 times means that the 2nd through 1000th iteration find
the mapping much faster, probably bypassing the loop entirely.

> So for them to cause the slowdown seen here, we'd need not only
> a very large number of markers but also additional conditions that might
> not be very likely.  But it's still a possibility.

I'll believe it when I see it.
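One way to keep such a benchmark from measuring only the cached path is to
invalidate the cached position between iterations, for example by touching
the buffer each time.  A rough sketch; note that in a setup like the OP's
the buffer edit will also fire the change hooks of lsp-mode and cc-mode, so
the absolute numbers need to be interpreted with care:

    (benchmark-run 1000
      (progn
        ;; Touch the buffer so BUF_MODIFF changes and the cached
        ;; charpos/bytepos pair from the previous iteration is not reused.
        (save-excursion
          (goto-char (point-min))
          (insert " ")
          (delete-char -1))
        (line-number-at-pos (point))))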
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 13:43 UTC
To: Eli Zaretskii; +Cc: emacs-devel

>> The idea is that the above loop (even if called only twice) might be
>> sufficient to make line-number-at-pos take 0.2s.
> I very much doubt that loop could take such a long time.

I also find it unlikely.

> And running a benchmark 1000 times means that the 2nd through 1000th
> iteration find the mapping much faster, probably bypassing the
> loop entirely.

Quite likely.

BTW, when thinking about how to avoid this loop's worst case (regardless
of whether it's the source of the current problem), I was thinking that
we could probably make that worst case even less likely by replacing the
hardcoded "50" with a limit that grows as the loop progresses.

The patch below should make sure that the total number of iterations
(through markers + through chars) is no more than 2 * buffer-size, no
matter how many markers there are.
[ Rather than incrementing by 1 we should ideally increment by a number
  that corresponds to how much slower each iteration through markers is
  compared to an iteration through chars, but I have no idea what that
  number would typically be.  ]

WDYT?


        Stefan


diff --git a/src/marker.c b/src/marker.c
index f61701f2f6..bfe6de486e 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -141,6 +141,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
   struct Lisp_Marker *tail;
   ptrdiff_t best_above, best_above_byte;
   ptrdiff_t best_below, best_below_byte;
+  ptrdiff_t distance = 50;
 
   eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
 
@@ -180,8 +181,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
       /* If we are down to a range of 50 chars,
 	 don't bother checking any other markers;
 	 scan the intervening chars directly now.  */
-      if (best_above - best_below < 50)
+      if (best_above - best_below < distance)
 	break;
+      else
+	distance++;
     }
 
   /* We get here if we did not exactly hit one of the known places.
@@ -293,6 +296,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
   struct Lisp_Marker *tail;
   ptrdiff_t best_above, best_above_byte;
   ptrdiff_t best_below, best_below_byte;
+  ptrdiff_t distance = 50;
 
   eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
 
@@ -323,8 +327,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
       /* If we are down to a range of 50 chars,
 	 don't bother checking any other markers;
 	 scan the intervening chars directly now.  */
-      if (best_above - best_below < 50)
+      if (best_above - best_below < distance)
 	break;
+      else
+	distance++;
     }
 
   /* We get here if we did not exactly hit one of the known places.
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19 14:28 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Mar 2018 09:43:12 -0400
>
> BTW, when thinking about how to avoid this loop's worst case (regardless
> of whether it's the source of the current problem), I was thinking that
> we could probably make that worst case even less likely by replacing the
> hardcoded "50" with a limit that grows as the loop progresses.
>
> The patch below should make sure that the total number of iterations
> (through markers + through chars) is no more than 2 * buffer-size, no
> matter how many markers there are.
> [ Rather than incrementing by 1 we should ideally increment by a number
>   that corresponds to how much slower each iteration through markers is
>   compared to an iteration through chars, but I have no idea what that
>   number would typically be.  ]
>
> WDYT?

Could be a good idea, but I suggest to time its improvement before we
decide.  I've seen a few surprises in that area.
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 14:39 UTC
To: Eli Zaretskii; +Cc: emacs-devel

> Could be a good idea, but I suggest to time its improvement before we
> decide.  I've seen a few surprises in that area.

I see some speedup in my artificial test (where I create an insane
number of markers at point-min and then ask to goto-char to the middle
of a mid-sized buffer), but I can't see any impact whatsoever (neither
positive nor negative) in real tests (unsurprisingly).

I'd rather first have a non-artificial case where I can measure a
speedup.


        Stefan
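For reference, a sketch of that kind of artificial stress test (the buffer
name, sizes, and marker count below are arbitrary; the non-ASCII character
is only there so that character and byte positions differ and the
conversion loop actually has to search):

    (with-current-buffer (get-buffer-create "*marker-stress*")
      (fundamental-mode)
      (erase-buffer)
      ;; A mid-sized multibyte buffer: ~20k short lines, each with one
      ;; non-ASCII char so that charpos and bytepos differ and the
      ;; conversion code cannot simply interpolate.
      (dotimes (_ 20000) (insert "some text é\n"))
      ;; An "insane" number of markers, all parked at point-min; keep them
      ;; in a list so they are not garbage-collected mid-benchmark.
      (let ((markers nil))
        (dotimes (_ 50000)
          (push (set-marker (make-marker) (point-min)) markers))
        (prog1
            ;; Alternate between two targets so the single-entry cache in
            ;; marker.c cannot short-circuit the conversion.
            (benchmark-run 100
              (progn (goto-char (/ (point-max) 2))
                     (goto-char (/ (point-max) 3))))
          (ignore markers))))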
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19  6:33 UTC
To: Drew Adams; +Cc: s.sturm, emacs-devel

> Date: Sun, 18 Mar 2018 14:37:39 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> > If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> > provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> > and should run circles around count-lines (used by line-number-at-pos).
>
> Having a fast line counter in Elisp would be terrific.

Actually, I see that line-number-at-pos, count-lines, and forward-line
should already be fast, as they do exactly what I thought, something I
failed to see originally.  If that's not "fast enough", a test case
showing the problem would be a good starting point.
* State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 20:14 UTC
To: emacs-devel

Hi,

after finding that the feature/noverlay branch does wonders for my
editing experience[1], I'd like to reinvigorate the discussion on its
inclusion into master.  Are there plans for a merge with the emacs-27
master branch?  Any critical issues blocking such a merge?  If a recent
Emacs 27 variant with the overlay branch feature was available, I'd be
happy to evaluate that in my daily work.

best regards,
Sebastian Sturm

[1] I'm using cquery for my C++ editing needs, which comes with an
overlay-based semantic highlighting mechanism.  With my Emacs
configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
character insertion, which consume ~20 to 25 ms each when performing
edits close to the bottom of a 66KB C++ file (measured using
(benchmark-run 1000 (line-number-at-pos (point))) on a release build of
emacs-27/git commit #9942734...).  Using the noverlay branch, this
figure drops to ~160us per call.
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-18 20:39 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Sun, 18 Mar 2018 21:14:53 +0100
>
> [1] I'm using cquery for my C++ editing needs, which comes with an
> overlay-based semantic highlighting mechanism.  With my Emacs
> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
> character insertion, which consume ~20 to 25 ms each when performing
> edits close to the bottom of a 66KB C++ file (measured using
> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
> emacs-27/git commit #9942734...).  Using the noverlay branch, this
> figure drops to ~160us per call.

If lsp-mode/lsp-ui needs a fast line counter, one can easily be
provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
and should run circles around count-lines (used by line-number-at-pos).

(I'm not sure I even understand how overlays come into play here, btw.)
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 21:04 UTC
To: emacs-devel

I also found it surprising that overlays would slow down line counting,
but since I don't know anything about the architecture of the Emacs
display engine, or its overlay implementation, I figured that overlays
must be to blame because

(i) the issue went away after switching to the feature/noverlay branch

(ii) configuring the semantic highlighter to use its font-lock backend
also resolved the performance issue (though with the font-lock backend,
highlights are easily messed up by editing operations, which makes the
overlay variant far more appealing)

I also found that some other heavy users of overlays such as
avy-goto-word-0-{above,below} feel faster with the feature/noverlay
branch, so I'd welcome a merge of the overlay branch even if there were
a technically superior alternative to line-number-at-pos that didn't
suffer from overlay-related performance issues.

That being said, your suggestion sounds intriguing.  What would be
required to expose find_newline to Lisp?  Would I simply have to wrap it
in one of Emacs's DEFINE_<something> macros?  Is there some
documentation on the Emacs C backend?

On 03/18/2018 09:39 PM, Eli Zaretskii wrote:
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
>
> (I'm not sure I even understand how overlays come into play here, btw.)
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 23:03 UTC
To: emacs-devel

Concerning the performance improvement with noverlay, it seems I spoke
too soon.  I've now had the issue reappear, both with the noverlay
branch and with the semantic highlighter set to use font-lock.  Sorry
for the misinformation.

Again, however, line-number-at-pos shows up as a large CPU time consumer
in the profiler report, and benchmark-run still reports several ms per
invocation (though this time it's usually around 2 to 4 ms instead of
the 20 to 25 I measured earlier), so I'd still be very much interested
in a faster line-number-at-pos implementation.

On 03/18/2018 10:04 PM, Sebastian Sturm wrote:
> [...]
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 23:20 UTC
To: emacs-devel

For the record, I just switched back to emacs master (no noverlay) and
the time reported by (benchmark-run 1000 (line-number-at-pos (point)))
increased by a factor of ~40, to 75-80s.  At this level, editing is
unbearably slow.  With the semantic highlighter disabled, the same
measurement yields ~2.5s (still painfully slow, but borderline usable),
so about the same time reported by the noverlay branch.

To me, this suggests that noverlay indeed improves performance, though
not necessarily to the level I had previously claimed.  find_newline may
solve this particular issue completely.

Since the time taken by line-number-at-pos seems to fluctuate wildly for
(to me) unknown reasons, I'll try and see if I can set up a systematic
way to collect reliable data.

On 03/19/2018 12:03 AM, Sebastian Sturm wrote:
> [...]
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19  6:43 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 00:20:13 +0100
>
> for the record, I just switched back to emacs master (no noverlay) and
> the time reported by (benchmark-run 1000 (line-number-at-pos (point)))
> increased by a factor of ~40, to 75-80s.  At this level, editing is
> unbearably slow.  With the semantic highlighter disabled, the same
> measurement yields ~2.5s (still painfully slow, but borderline usable),
> so about the same time reported by the noverlay branch.

You will have to explain why overlays and the semantic highlighter
affect line-counting.  How about presenting a profile produced by
"M-x profiler-report"?

And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
1000 times), right?  If so, I cannot understand why you say that's
borderline usable, because IME such short times are imperceptible by
humans.  I guess some other factor is at work here, so I'd suggest to
describe more details about your use case.

> Since the time taken by line-number-at-pos seems to fluctuate wildly for
> (to me) unknown reasons, I'll try and see if I can set up a systematic
> way to collect reliable data.

Yes, please do.  I'm guessing there's some factor here that is
important to consider.
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-19  9:53 UTC
To: emacs-devel

On 03/19/2018 07:43 AM, Eli Zaretskii wrote:
>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Mon, 19 Mar 2018 00:20:13 +0100
>>
>> for the record, I just switched back to emacs master (no noverlay) and
>> the time reported by (benchmark-run 1000 (line-number-at-pos (point)))
>> increased by a factor of ~40, to 75-80s.  At this level, editing is
>> unbearably slow.  With the semantic highlighter disabled, the same
>> measurement yields ~2.5s (still painfully slow, but borderline usable),
>> so about the same time reported by the noverlay branch.
>
> You will have to explain why overlays and the semantic highlighter
> affect line-counting.  How about presenting a profile produced by
> "M-x profiler-report"?

please find below a profiler report taken this morning (on my PC at
work, which doesn't suffer from the performance issue as much as my 2014
MacBook Pro, but even here the issue is clearly noticeable)

> And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
> 1000 times), right?  If so, I cannot understand why you say that's
> borderline usable, because IME such short times are imperceptible by
> humans.  I guess some other factor is at work here, so I'd suggest to
> describe more details about your use case.

well no, it's about 2.5ms per call to line-number-at-pos, which is
called at least 6 times per character insertion (with my Emacs config,
at least).  Which already makes for 15ms per character insertion,
excluding anything else done by cc-mode or lsp-mode.

>> Since the time taken by line-number-at-pos seems to fluctuate wildly for
>> (to me) unknown reasons, I'll try and see if I can set up a systematic
>> way to collect reliable data.
>
> Yes, please do.  I'm guessing there's some factor here that is
> important to consider.

I wrote a simple-minded measurement routine that can at least report the
time taken between successive character insertions, and I noticed that
better alternatives have been brought up in response to my other mailing
list thread (titled "Latency profiling?").  I haven't had time to
prepare a systematic test case yet, but I will look into it when I get
home from work this evening.

- command-execute                                          5676  73%
 - call-interactively                                      5676  73%
  - funcall-interactively                                  5676  73%
   - self-insert-command                                   5080  66%
    - lsp-on-change                                        2144  27%
     - lsp--text-document-content-change-event             1964  25%
      - lsp--point-to-position                             1964  25%
         line-number-at-pos                                1964  25%
     + json-encode                                          164   2%
     + file-truename                                         16   0%
    - c-after-change                                       1784  23%
     - mapc                                                1772  23%
      + #<compiled 0x1a24bed>                              1772  23%
     - c-invalidate-sws-region-after                         12   0%
      - c-invalidate-sws-region-after-ins                    12   0%
       - c-beginning-of-macro                                 8   0%
          back-to-indentation                                 4   0%
       - c-literal-limits                                     4   0%
          c-state-full-pp-to-literal                          4   0%
    - lsp-before-change                                    1016  13%
     - lsp--point-to-position                              1016  13%
        line-number-at-pos                                 1016  13%
    - sp--post-self-insert-hook-handler                      56   0%
     - sp-insert-pair                                        28   0%
      - sp--pair-to-insert                                   24   0%
       - sp--all-pairs-to-insert                             20   0%
        - sp--looking-back-p                                  8   0%
           sp--looking-back                                   8   0%
        + sp--do-action-p                                     8   0%
       + sp--get-closing-regexp                               4   0%
     + sp--all-pairs-to-insert                               16   0%
     + sp-escape-open-delimiter                              12   0%
    + c-before-change                                        28   0%
    - jit-lock-after-change                                   4   0%
     + run-hook-with-args                                     4   0%
   + counsel-M-x                                            360   4%
   + evil-open-below                                        232   3%
- lsp-ui-doc--make-request                                  643   8%
   line-number-at-pos                                       627   8%
 + file-truename                                              8   0%
   url-hexify-string                                          4   0%
- lsp-ui-sideline                                           528   6%
   line-number-at-pos                                       528   6%
+ timer-event-handler                                       327   4%
+ evil-escape-pre-command-hook                              220   2%
+ #<compiled 0xb54291>                                      141   1%
+ redisplay_internal (C function)                            88   1%
+ ...                                                        29   0%
+ yas--post-command-handler                                  20   0%
+ global-spacemacs-whitespace-cleanup-mode-check-buffers      4   0%
+ internal-timer-start-idle                                   4   0%
+ flycheck-display-error-at-point-soon                        4   0%
  flycheck-error-list-update-source                           4   0%
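A sketch of the kind of per-insertion timing probe described above, using
the pre-/post-command hooks (all names below are made up); it measures only
the command body, not redisplay, and the echo-area message itself adds a
little noise:

    (defvar my-insert-t0 nil)

    (defun my-insert-start ()
      (setq my-insert-t0 (current-time)))

    (defun my-insert-report ()
      (when (and my-insert-t0 (eq this-command 'self-insert-command))
        (message "self-insert-command took %.1f ms"
                 (* 1000 (float-time (time-subtract (current-time) my-insert-t0)))))
      (setq my-insert-t0 nil))

    (add-hook 'pre-command-hook #'my-insert-start)
    (add-hook 'post-command-hook #'my-insert-report)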
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19 12:57 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 10:53:52 +0100
>
> please find below a profiler report taken this morning (on my PC at
> work, which doesn't suffer from the performance issue as much as my 2014
> MacBook Pro, but even here the issue is clearly noticeable)

That profile says that self-insert-command takes a large percentage of
the time.  So I think we should look into the reasons for such a strange
place to spend hundreds of microseconds.

According to the profile, line-number-at-pos takes about the same
percentage of time as self-insert-command does.  And that is even before
you optimize the successive calls to line-counting code to take
advantage of the previously computed value for some close line.

> well no, it's about 2.5ms per call to line-number-at-pos, which is
> called at least 6 times per character insertion (with my Emacs config,
> at least).  Which already makes for 15ms per character insertion,
> excluding anything else done by cc-mode or lsp-mode.

Then, as I said, I don't understand why it takes so much on your system.
I get times that are 10 times faster.
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 14:56 UTC
To: emacs-devel

> well no, it's about 2.5ms per call to line-number-at-pos, which is
> called at least 6 times per character insertion (with my Emacs config,
> at least).  Which already makes for 15ms per character insertion,
> excluding anything else done by cc-mode or lsp-mode.

Since you say that the noverlay branch helps, could you check the number
of overlays involved?  E.g.

    M-: (length (overlays-in (point-min) (point-max))) RET

If there are more overlays than chars in this buffer, maybe there's
a problem in some Elisp that creates too many overlays?

If there aren't that many overlays, then I wonder why the noverlay
branch would make such a significant difference.

Also, if you can reliably reproduce the "slow editing", would it be
possible to make a recipe for it that we can try and reproduce on
our side?


        Stefan
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-19 15:07 UTC
To: emacs-devel

for the file I was complaining about, the number returned is 3080
(doesn't exceed the number of chars though, (point-max) = 67641).  Will
try to obtain usable timing data later today when I get home from work.
Thanks!

On 03/19/2018 03:56 PM, Stefan Monnier wrote:
> [...]
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 15:13 UTC
To: emacs-devel

> for the file I was complaining about, the number returned is 3080
> (doesn't exceed the number of chars though, (point-max) = 67641).  Will
> try to obtain usable timing data later today when I get home from work.
> Thanks!

Thanks.  It's large enough that noverlay might indeed make a difference
for some operations, but not large enough to explain why
line-number-at-pos would be so slow, I think.


        Stefan
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-20  1:23 UTC
To: emacs-devel

After disabling cquery for testing purposes (which leaves me with 45
overlays generated by flycheck for my troublesome C++ file), I'm now
generating a large number of overlays using the following function:

(defun test-highlight ()
  (save-excursion
    (with-silent-modifications
      (let ((stepsize 10))
        (widen)
        (goto-char 1)
        (cl-loop for n from (point-min) upto (- (point-max) stepsize) by stepsize
                 do (let ((ov (make-overlay n (+ (1- stepsize) n))))
                      (overlay-put ov 'cquery-sem-highlight t)))))))

Evaluating the following function without additional overlays (beyond
the flycheck ones, that is) yields the following results:

(defun benchmark-often ()
  (cl-loop for n from 1 upto 20
           do (message (format "iteration %d: %f" n
                               (nth 0 (benchmark-run
                                        (line-number-at-pos (point))))))))

1st run:
iteration 1: 0.001213
iteration 2: 0.001170
iteration 3: 0.001170
iteration 4: 0.001238
iteration 5: 0.001163
iteration 6: 0.001153
iteration 7: 0.000421
iteration 8: 0.000426
iteration 9: 0.000322
iteration 10: 0.000301
iteration 11: 0.000291
iteration 12: 0.000292
iteration 13: 0.000291
iteration 14: 0.000291
iteration 15: 0.000295
iteration 16: 0.000289
iteration 17: 0.000289
iteration 18: 0.000288
iteration 19: 0.000288
iteration 20: 0.000287

2nd run:
iteration 1: 0.001044
iteration 2: 0.000942
iteration 3: 0.000935
iteration 4: 0.000935
iteration 5: 0.000935
iteration 6: 0.000932
iteration 7: 0.000954
iteration 8: 0.000940
iteration 9: 0.000933
iteration 10: 0.000625
iteration 11: 0.000545
iteration 12: 0.000428
iteration 13: 0.000362
iteration 14: 0.000346
iteration 15: 0.000325
iteration 16: 0.000309
iteration 17: 0.000309
iteration 18: 0.000316
iteration 19: 0.000310
iteration 20: 0.000308

after evaluating (test-highlight) the figures are as follows:

1st run:
iteration 1: 0.026012
iteration 2: 0.020334
iteration 3: 0.020250
iteration 4: 0.020349
iteration 5: 0.020501
iteration 6: 0.020635
iteration 7: 0.020302
iteration 8: 0.020426
iteration 9: 0.020440
iteration 10: 0.020441
iteration 11: 0.020515
iteration 12: 0.020525
iteration 13: 0.020383
iteration 14: 0.020510
iteration 15: 0.019829
iteration 16: 0.019899
iteration 17: 0.019950
iteration 18: 0.019828
iteration 19: 0.019901
iteration 20: 0.019819

2nd run:
iteration 1: 0.026526
iteration 2: 0.020051
iteration 3: 0.020100
iteration 4: 0.020080
iteration 5: 0.020080
iteration 6: 0.020249
iteration 7: 0.020087
iteration 8: 0.020005
iteration 9: 0.019980
iteration 10: 0.019985
iteration 11: 0.020077
iteration 12: 0.019979
iteration 13: 0.020060
iteration 14: 0.020092
iteration 15: 0.019954
iteration 16: 0.019766
iteration 17: 0.019432
iteration 18: 0.019491
iteration 19: 0.019458
iteration 20: 0.019482

I'm not allowed to share my employer's source code as a test case, so I
tried the same procedure with the similarly large DeclBase.h from the
public LLVM repository.  To my surprise, DeclBase.h didn't suffer from
any performance issues at all.  Switching to fundamental-mode while
visiting my file didn't change anything, so I assume that c-mode isn't
to blame either.

There have been claims of overlay-related performance issues with cquery
and some large(-ish) open-source C or C++ files, so I'll try to locate
these files and hope that at least one of them exhibits this issue as
well.

On 03/19/2018 04:07 PM, Sebastian Sturm wrote:
> [...]
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-20  6:30 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Tue, 20 Mar 2018 02:23:02 +0100
>
> 1st run:
> iteration 1: 0.001213
> iteration 2: 0.001170
> [...]
>
> after evaluating (test-highlight) the figures are as follows:
>
> 1st run:
> iteration 1: 0.026012
> iteration 2: 0.020334

So, between 20-fold and 100-fold slow-down.

> I'm not allowed to share my employer's source code as a test case, so I
> tried the same procedure with the similarly large DeclBase.h from the
> public LLVM repository.  To my surprise, DeclBase.h didn't suffer from
> any performance issues at all.  Switching to fundamental-mode while
> visiting my file didn't change anything, so I assume that c-mode isn't
> to blame either.

So it's still a mystery why your original file produces such a large
slowdown with overlays.

Can you show the results of "M-x profiler-report" for the slow test
with your original source file?  It could have some clues.  If that's
impossible, I can only repeat my suggestion to use perf to find the
code in Emacs that takes the lion's share of the processing time.
* Re: State of the overlay tree branch? 2018-03-20 6:30 ` Eli Zaretskii @ 2018-03-21 0:36 ` Sebastian Sturm 2018-03-21 6:47 ` Eli Zaretskii 2018-03-22 13:16 ` Stefan Monnier 0 siblings, 2 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-21 0:36 UTC (permalink / raw) To: emacs-devel > So it's still a mystery why your original file produces such a large > slowdown with overlays. > > Can you show the results of "M-x profiler-report" for the slow test > with your original source file? It could have some clues. If that's > impossible, I can only repeat my suggestion to use perf to find the > code in Emacs that takes the lion's share of the processing time. this is the profiler report I get for the slow case (BTW, is there a way to have the profiler resolve functions within line-number-at-pos? I tried increasing profiler-max-stack-depth to 32, but the profiler still didn't show anything below line-number-at-pos) - command-execute 20522 97% - call-interactively 20522 97% - funcall-interactively 20061 94% - eval-expression 18609 87% - eval 18609 87% - benchmark-often 18609 87% - let* 18609 87% - while 18609 87% - message 18388 86% - format 18376 86% - nth 18376 86% - let 18376 86% - list 18376 86% - let 18368 86% line-number-at-pos 18348 86% ws-butler-after-change 4 0% + evil-open-below 614 2% + counsel-M-x 575 2% + undo-tree-undo 79 0% + next-line 68 0% + evil-normal-state 52 0% + evil-previous-line 48 0% + previous-line 7 0% + evil-emacs-state 6 0% + evil-insert 3 0% + byte-code 461 2% + redisplay_internal (C function) 254 1% + timer-event-handler 116 0% + evil-escape-pre-command-hook 82 0% + ... 63 0% + global-spacemacs-whitespace-cleanup-mode-check-buffers 47 0% + yas--post-command-handler 33 0% + evil-repeat-post-hook 6 0% + evil-visual-post-command 5 0% + flycheck-handle-signal 4 0% evil-snipe-mode-check-buffers 4 0% global-undo-tree-mode-check-buffers 4 0% + sp--save-pre-command-state 2 0% + xselect-convert-to-string 2 0% + evil-visual-pre-command 1 0% + flycheck-pos-tip-hide-messages 1 0% + which-key--hide-popup 1 0% with perf, the ("self") time taken by buf_charpos_to_bytepos increases from ~60% (fast case) to >98%. This is the diff generated by perf diff <fast.perf> <slow.perf>: # Event 'cycles' # # Baseline Delta Shared Object Symbol # ........ ....... .................... .......................................... # 57.77% +40.30% emacs-27.0.50 [.] buf_charpos_to_bytepos 11.12% -10.70% libc-2.23.so [.] __memrchr 6.48% -6.19% emacs-27.0.50 [.] assq_no_quit 5.92% -5.62% emacs-27.0.50 [.] find_cache_boundary 4.26% -4.07% emacs-27.0.50 [.] set_buffer_internal_2 4.10% -3.73% emacs-27.0.50 [.] find_newline 3.54% libc-2.23.so [.] __memmove_avx_unaligned 1.25% -1.19% emacs-27.0.50 [.] region_cache_forward 0.46% -0.41% [kernel.kallsyms] [k] 0xffffffff83004eb0 0.26% -0.26% emacs-27.0.50 [.] revalidate_region_cache.isra.1 0.25% -0.23% emacs-27.0.50 [.] find_interval 0.23% -0.23% emacs-27.0.50 [.] swap_in_symval_forwarding 0.19% -0.18% emacs-27.0.50 [.] do_symval_forwarding 0.16% -0.15% emacs-27.0.50 [.] eval_sub 0.16% -0.15% emacs-27.0.50 [.] x_produce_glyphs 0.13% libXft.so.2.3.2 [.] XftCharIndex 0.13% -0.12% libXft.so.2.3.2 [.] XftGlyphExtents 0.12% -0.12% emacs-27.0.50 [.] store_symval_forwarding 0.12% -0.11% emacs-27.0.50 [.] exec_byte_code 0.09% emacs-27.0.50 [.] Fsymbol_value 0.08% -0.08% emacs-27.0.50 [.] x_get_glyph_overhangs 0.07% emacs-27.0.50 [.] find_symbol_value 0.07% libc-2.23.so [.] __memcpy_avx_unaligned 0.07% emacs-27.0.50 [.] Fassq 0.07% emacs-27.0.50 [.] 
insert_1_both.part.9 0.06% emacs-27.0.50 [.] offset_intervals 0.06% libc-2.23.so [.] __memcmp_sse4_1 0.06% emacs-27.0.50 [.] next_element_from_buffer 0.05% -0.05% emacs-27.0.50 [.] move_it_in_display_line_to 0.05% -0.04% emacs-27.0.50 [.] get_next_display_element 0.05% -0.04% libXft.so.2.3.2 [.] XftFontCheckGlyph 0.05% -0.04% libX11.so.6.3.0 [.] _XSetClipRectangles 0.04% libc-2.23.so [.] __GI___printf_fp_l 0.04% emacs-27.0.50 [.] styled_format 0.04% -0.03% emacs-27.0.50 [.] xftfont_text_extents 0.04% -0.03% emacs-27.0.50 [.] draw_glyphs 0.04% emacs-27.0.50 [.] display_line 0.04% emacs-27.0.50 [.] validate_interval_range 0.04% emacs-27.0.50 [.] update_window_fringes 0.04% libX11.so.6.3.0 [.] 0x000000000001b65a 0.04% -0.03% emacs-27.0.50 [.] lookup_char_property 0.04% emacs-27.0.50 [.] balance_an_interval 0.03% emacs-27.0.50 [.] composition_compute_stop_pos 0.03% libpthread-2.23.so [.] pthread_mutex_unlock 0.03% -0.03% emacs-27.0.50 [.] x_draw_glyph_string 0.03% -0.03% emacs-27.0.50 [.] mem_insert 0.03% -0.03% libpthread-2.23.so [.] pthread_mutex_lock 0.03% libc-2.23.so [.] malloc 0.03% emacs-27.0.50 [.] update_window_line 0.03% emacs-27.0.50 [.] Ffuncall 0.03% -0.03% emacs-27.0.50 [.] get_glyph_face_and_encoding 0.03% -0.02% emacs-27.0.50 [.] set_internal 0.03% -0.02% emacs-27.0.50 [.] update_window 0.03% libXft.so.2.3.2 [.] XftDrawSrcPicture 0.03% libc-2.23.so [.] vfprintf 0.03% -0.02% emacs-27.0.50 [.] do_one_unbind.constprop.20 0.03% emacs-27.0.50 [.] set_iterator_to_next 0.02% emacs-27.0.50 [.] specbind 0.02% libc-2.23.so [.] __memset_avx2 0.02% -0.02% libX11.so.6.3.0 [.] _XFlushGCCache 0.02% emacs-27.0.50 [.] lookup_glyphless_char_display 0.02% libc-2.23.so [.] _int_malloc 0.02% emacs-27.0.50 [.] set_cursor_from_row.isra.40 0.02% -0.02% emacs-27.0.50 [.] message_dolog.part.60 0.02% emacs-27.0.50 [.] gap_left 0.02% -0.02% emacs-27.0.50 [.] unbind_to 0.02% -0.02% emacs-27.0.50 [.] init_iterator 0.02% emacs-27.0.50 [.] Fget_buffer 0.02% -0.01% emacs-27.0.50 [.] overlays_at 0.02% -0.01% [wl] [k] osl_readl 0.02% -0.01% emacs-27.0.50 [.] set_point_both 0.02% emacs-27.0.50 [.] message3_nolog 0.02% libX11.so.6.3.0 [.] _XGetRequest 0.02% libc-2.23.so [.] __sprintf_chk 0.02% emacs-27.0.50 [.] adjust_markers_for_insert 0.02% emacs-27.0.50 [.] adjust_suspend_auto_hscroll 0.02% emacs-27.0.50 [.] get_char_property_and_overlay 0.01% emacs-27.0.50 [.] window_box_width 0.01% emacs-27.0.50 [.] Fcons 0.01% libxcb.so.1.1.0 [.] 0x0000000000009cca 0.01% -0.01% emacs-27.0.50 [.] arith_driver 0.01% -0.01% emacs-27.0.50 [.] resize_mini_window 0.01% emacs-27.0.50 [.] unchain_marker 0.01% emacs-27.0.50 [.] face_for_char 0.01% emacs-27.0.50 [.] unblock_input_to 0.01% emacs-27.0.50 [.] unblock_input 0.01% emacs-27.0.50 [.] balance_possible_root_interval 0.01% emacs-27.0.50 [.] add_text_properties_1 0.01% emacs-27.0.50 [.] save_restriction_restore 0.01% emacs-27.0.50 [.] funcall_subr 0.01% emacs-27.0.50 [.] gap_right 0.01% -0.01% emacs-27.0.50 [.] Fcurrent_buffer 0.01% libXrender.so.1.3.0 [.] XRenderFillRectangle 0.01% libXext.so.6.4.0 [.] XdbeSwapBuffers 0.01% emacs-27.0.50 [.] get_glyph_string_clip_rects.part.72 0.01% emacs-27.0.50 [.] Fwhile 0.01% emacs-27.0.50 [.] window_wants_mode_line 0.01% emacs-27.0.50 [.] display_echo_area_1 0.01% libXft.so.2.3.2 [.] XftDrawSetClipRectangles 0.01% emacs-27.0.50 [.] append_space_for_newline 0.01% emacs-27.0.50 [.] x_update_end 0.01% emacs-27.0.50 [.] Fforward_line 0.01% emacs-27.0.50 [.] memrchr@plt 0.01% emacs-27.0.50 [.] 
make_uninit_multibyte_string 0.01% emacs-27.0.50 [.] prepare_to_modify_buffer_1 0.01% emacs-27.0.50 [.] Fstring_equal 0.01% emacs-27.0.50 [.] init_glyph_string 0.01% libc-2.23.so [.] _IO_old_init 0.01% libXrender.so.1.3.0 [.] XRenderFindDisplay 0.01% emacs-27.0.50 [.] Fnreverse 0.01% emacs-27.0.50 [.] arithcompare 0.01% emacs-27.0.50 [.] font_get_frame_data 0.01% emacs-27.0.50 [.] window_box_left 0.01% libX11.so.6.3.0 [.] XSetClipRectangles 0.01% -0.01% emacs-27.0.50 [.] Flength 0.01% emacs-27.0.50 [.] previous_interval 0.01% libXft.so.2.3.2 [.] 0x0000000000007294 0.01% emacs-27.0.50 [.] x_compute_glyph_string_overhangs 0.01% emacs-27.0.50 [.] XftCharIndex@plt 0.01% libXrender.so.1.3.0 [.] XRenderCompositeString16 0.01% emacs-27.0.50 [.] message3 0.01% emacs-27.0.50 [.] xftfont_encode_char 0.01% emacs-27.0.50 [.] assign_row 0.01% emacs-27.0.50 [.] prepare_desired_row 0.01% emacs-27.0.50 [.] xftfont_draw 0.01% libXext.so.6.4.0 [.] XextFindDisplay 0.01% libpthread-2.23.so [.] pthread_cond_broadcast@@GLIBC_2.3.2 0.01% libXft.so.2.3.2 [.] XftDrawGlyphs 0.01% emacs-27.0.50 [.] marker_position 0.01% emacs-27.0.50 [.] funcall_lambda 0.01% emacs-27.0.50 [.] gettime 0.01% libpthread-2.23.so [.] __GI___libc_recvmsg 0.01% emacs-27.0.50 [.] lisp_time_struct 0.01% emacs-27.0.50 [.] echo_area_display 0.01% emacs-27.0.50 [.] CHECK_MARKER 0.01% emacs-27.0.50 [.] del_range_2 0.01% emacs-27.0.50 [.] prepare_face_for_display 0.01% emacs-27.0.50 [.] interval_deletion_adjustment 0.01% libc-2.23.so [.] strlen 0.01% emacs-27.0.50 [.] verify_interval_modification 0.01% emacs-27.0.50 [.] count_size_as_multibyte 0.01% emacs-27.0.50 [.] adjust_overlays_for_insert 0.01% libXrender.so.1.3.0 [.] XRenderSetPictureClipRectangles 0.01% emacs-27.0.50 [.] buffer_local_value 0.01% emacs-27.0.50 [.] adjust_window_count 0.01% emacs-27.0.50 [.] notice_overwritten_cursor.part.52 0.01% libc-2.23.so [.] __GI___writev 0.01% emacs-27.0.50 [.] record_in_backtrace 0.01% emacs-27.0.50 [.] save_restriction_save 0.01% emacs-27.0.50 [.] Fget_text_property 0.01% -0.00% [vdso] [.] __vdso_clock_gettime 0.01% emacs-27.0.50 [.] window_wants_header_line 0.01% emacs-27.0.50 [.] Fmessage 0.01% emacs-27.0.50 [.] FUNCTIONP 0.01% emacs-27.0.50 [.] window_display_table 0.01% emacs-27.0.50 [.] maybe_quit 0.01% emacs-27.0.50 [.] clear_glyph_matrix 0.01% -0.00% emacs-27.0.50 [.] with_echo_area_buffer 0.01% libc-2.23.so [.] __libc_enable_asynccancel 0.01% emacs-27.0.50 [.] should_produce_line_number 0.01% emacs-27.0.50 [.] x_set_glyph_string_clipping 0.01% emacs-27.0.50 [.] set_message_1 0.01% emacs-27.0.50 [.] del_range_both 0.01% emacs-27.0.50 [.] record_insert 0.01% emacs-27.0.50 [.] adjust_overlays_for_delete 0.01% +0.00% libc-2.23.so [.] _int_free 0.01% libc-2.23.so [.] __strchrnul 0.01% emacs-27.0.50 [.] Fgoto_char 0.01% emacs-27.0.50 [.] free_misc 0.01% emacs-27.0.50 [.] draw_window_fringes 0.01% emacs-27.0.50 [.] x_flush.isra.37.part.38 0.01% libpthread-2.23.so [.] __errno_location 0.01% emacs-27.0.50 [.] make_float 0.01% emacs-27.0.50 [.] fill_glyph_string 0.01% emacs-27.0.50 [.] Fget 0.01% emacs-27.0.50 [.] Flocal_variable_p 0.01% emacs-27.0.50 [.] decode_time_components 0.01% emacs-27.0.50 [.] XftGlyphExtents@plt 0.01% emacs-27.0.50 [.] row_for_charpos_p 0.01% emacs-27.0.50 [.] load_overlay_strings 0.01% emacs-27.0.50 [.] allocate_misc 0.01% -0.00% libX11.so.6.3.0 [.] _XSend 0.01% libX11.so.6.3.0 [.] _XData32 0.01% -0.00% emacs-27.0.50 [.] buf_bytepos_to_charpos 0.01% emacs-27.0.50 [.] face_at_buffer_position 0.01% emacs-27.0.50 [.] 
invalidate_current_column 0.01% emacs-27.0.50 [.] find_composition 0.01% emacs-27.0.50 [.] x_draw_window_cursor 0.01% emacs-27.0.50 [.] x_flip_and_flush 0.01% libc-2.23.so [.] _IO_no_init 0.01% emacs-27.0.50 [.] Ftime_subtract 0.01% emacs-27.0.50 [.] row_hash 0.01% emacs-27.0.50 [.] lookup_basic_face 0.01% emacs-27.0.50 [.] memcpy@plt 0.01% emacs-27.0.50 [.] Fset_buffer 0.01% emacs-27.0.50 [.] produce_special_glyphs 0.00% emacs-27.0.50 [.] x_draw_glyph_string_background.part.44 0.00% [vdso] [.] 0x0000000000000939 0.00% emacs-27.0.50 [.] make_save_obj_obj_obj_obj 0.00% libc-2.23.so [.] __GI___libc_poll 0.00% emacs-27.0.50 [.] do_specbind 0.00% libXft.so.2.3.2 [.] XftGlyphRender 0.00% emacs-27.0.50 [.] move_it_to 0.00% emacs-27.0.50 [.] make_current.isra.14 0.00% emacs-27.0.50 [.] Fnext_single_property_change 0.00% emacs-27.0.50 [.] handle_face_prop 0.00% libX11.so.6.3.0 [.] _XFlush 0.00% emacs-27.0.50 [.] unwind_with_echo_area_buffer 0.00% emacs-27.0.50 [.] evaporate_overlays 0.00% emacs-27.0.50 [.] sort_overlays 0.00% emacs-27.0.50 [.] del_range_1 0.00% libX11.so.6.3.0 [.] XFlush 0.00% emacs-27.0.50 [.] try_window 0.00% emacs-27.0.50 [.] x_set_glyph_string_gc 0.00% libc-2.23.so [.] malloc_consolidate 0.00% emacs-27.0.50 [.] handle_stop 0.00% emacs-27.0.50 [.] recenter_overlay_lists 0.00% libX11.so.6.3.0 [.] pthread_mutex_lock@plt 0.00% emacs-27.0.50 [.] modify_text_properties 0.00% emacs-27.0.50 [.] delete_interval 0.00% emacs-27.0.50 [.] window_text_bottom_y 0.00% +0.00% emacs-27.0.50 [.] copy_text 0.00% emacs-27.0.50 [.] set_default_internal 0.00% +0.00% libxcb.so.1.1.0 [.] xcb_writev 0.00% emacs-27.0.50 [.] temp_set_point_both 0.00% emacs-27.0.50 [.] insert_1_both 0.00% emacs-27.0.50 [.] record_property_change 0.00% emacs-27.0.50 [.] lisp_align_malloc 0.00% emacs-27.0.50 [.] x_mark_frame_dirty 0.00% +0.00% emacs-27.0.50 [.] text_quoting_style 0.00% libc-2.23.so [.] __GI___mempcpy 0.00% emacs-27.0.50 [.] Ffloat_time 0.00% emacs-27.0.50 [.] x_write_glyphs 0.00% +0.00% emacs-27.0.50 [.] insert_from_string_1 0.00% emacs-27.0.50 [.] Fpoint 0.00% libc-2.23.so [.] __vsprintf_chk 0.00% libc-2.23.so [.] hack_digit 0.00% emacs-27.0.50 [.] grow_specpdl 0.00% emacs-27.0.50 [.] time_arith 0.00% emacs-27.0.50 [.] update_end 0.00% emacs-27.0.50 [.] allocate_string_data 0.00% emacs-27.0.50 [.] save_excursion_restore 0.00% +0.00% emacs-27.0.50 [.] Flet 0.00% emacs-27.0.50 [.] mem_rotate_right 0.00% emacs-27.0.50 [.] fetch_buffer_markers 0.00% emacs-27.0.50 [.] move_gap_both 0.00% emacs-27.0.50 [.] xmalloc 0.00% emacs-27.0.50 [.] update_begin 0.00% emacs-27.0.50 [.] float_arith_driver 0.00% emacs-27.0.50 [.] make_specified_string 0.00% emacs-27.0.50 [.] handle_composition_prop 0.00% emacs-27.0.50 [.] display_and_set_cursor 0.00% emacs-27.0.50 [.] compute_line_metrics 0.00% emacs-27.0.50 [.] clock_gettime@plt 0.00% libc-2.23.so [.] free 0.00% emacs-27.0.50 [.] Fplist_get 0.00% libgdk-3.so.0.1800.9 [.] gdk_display_manager_get 0.00% emacs-27.0.50 [.] make_interval 0.00% libxcb.so.1.1.0 [.] xcb_poll_for_event 0.00% emacs-27.0.50 [.] adjust_markers_for_delete 0.00% emacs-27.0.50 [.] do_pending_window_change 0.00% emacs-27.0.50 [.] XftDrawGlyphs@plt 0.00% libc-2.23.so [.] _itoa_word 0.00% emacs-27.0.50 [.] disassemble_lisp_time 0.00% emacs-27.0.50 [.] Ftext_properties_at 0.00% emacs-27.0.50 [.] composition_reseat_it 0.00% libX11.so.6.3.0 [.] _XEventsQueued 0.00% emacs-27.0.50 [.] minmax_driver 0.00% emacs-27.0.50 [.] set_marker_restricted_both 0.00% emacs-27.0.50 [.] 
graft_intervals_into_buffer 0.00% emacs-27.0.50 [.] disp_char_vector 0.00% libc-2.23.so [.] __clock_gettime 0.00% emacs-27.0.50 [.] CHECK_STRING_OR_BUFFER 0.00% emacs-27.0.50 [.] intervals_equal 0.00% emacs-27.0.50 [.] ensure_echo_area_buffers 0.00% emacs-27.0.50 [.] invalidate_buffer_caches 0.00% emacs-27.0.50 [.] set_marker_both 0.00% +0.00% emacs-27.0.50 [.] record_unwind_protect 0.00% emacs-27.0.50 [.] run_hook_with_args 0.00% emacs-27.0.50 [.] set_point_from_marker 0.00% emacs-27.0.50 [.] list4 0.00% libX11.so.6.3.0 [.] XSetClipMask 0.00% emacs-27.0.50 [.] record_buffer_markers 0.00% emacs-27.0.50 [.] default_value +0.00% libXext.so.6.4.0 [.] 0x000000000000bf10 +0.00% emacs-27.0.50 [.] Fsetq +0.00% emacs-27.0.50 [.] reseat_1 +0.00% emacs-27.0.50 [.] update_compositions this is what perf annotate shows when invoked on buf_charpos_to_bytepos (slow case): │ ↓ jle 438 ▒ 4,39 │ mov 0x20(%rax),%r8 ▒ 8,38 │ mov %rdx,%rbp ▒ 0,05 │2f0: mov %rbp,%rdx ▒ 0,13 │ mov %r8,%rdi ▒ 2,70 │ sub %r13,%rdx ▒ 2,22 │ sub %rbx,%rdi ▒ 0,05 │ cmp %rdi,%rdx ▒ │ ↑ je 1e9 ▒ │ ▒ │ /* If we are down to a range of 50 chars, ▒ │ don't bother checking any other markers; ▒ │ scan the intervening chars directly now. */ ▒ │ if (best_above - best_below < 50) ▒ 3,21 │305: cmp $0x31,%rdx ▒ │ ↓ jle 480 ▒ │ CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); ▒ │ ▒ │ if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff) ▒ │ CONSIDER (cached_charpos, cached_bytepos); ▒ │ ▒ │ for (tail = BUF_MARKERS (b); tail; tail = tail->next) ▒ 2,50 │ mov 0x10(%rax),%rax ▒ 4,25 │ test %rax,%rax ▒ │ ↓ je 480 ▒ │ { ◆ │ CONSIDER (tail->charpos, tail->bytepos); ▒ 2,64 │31c: mov 0x18(%rax),%rdx ▒ 59,16 │ cmp %rdx,%rsi ▒ │ ↓ je 638 ▒ 5,70 │ cmp %rdx,%rsi ▒ │ ↑ jl 2e0 ▒ 0,00 │ cmp %rdx,%r13 ▒ │ ↓ jge 438 ▒ 0,00 │ mov 0x20(%rax),%rbx ▒ 0,01 │ mov %rdx,%r13 ▒ │ ↑ jmp 2f0 ▒ │ marker_byte_position(): ▒ │ if (!buf) ▒ │ error ("Marker does not point anywhere"); ▒ │ ▒ │ eassert (BUF_BEG_BYTE (buf) <= m->bytepos && m->bytepos <= BUF_Z_BYTE (b▒ │ ▒ │ return m->bytepos; ▒ │340: mov 0x20(%rdi),%rbx ▒ │ ↑ jmpq cb ▒ │ nop ▒ │ buf_charpos_to_bytepos(): ▒ │ ▒ │ If at any point we can tell that the space between those ▒ │ two best approximations is all single-byte, ▒ │ we interpolate the result immediately. */ ▒ │ ▒ │ CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); ▒ │350: mov 0x2f0(%rdi),%r13 ▒ │ cmp %rsi,%r13 ▒ to my eyes, the fast case doesn't look too different though: │ ↓ jle 438 0,83 │ mov 0x20(%rax),%r8 0,93 │ mov %rdx,%rbp │2f0: mov %rbp,%rdx 0,06 │ mov %r8,%rdi 0,20 │ sub %r13,%rdx 0,11 │ sub %rbx,%rdi 0,12 │ cmp %rdi,%rdx │ ↑ je 1e9 │ │ /* If we are down to a range of 50 chars, │ don't bother checking any other markers; │ scan the intervening chars directly now. 
*/ │ if (best_above - best_below < 50) 11,24 │305: cmp $0x31,%rdx │ ↓ jle 480 │ CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); │ │ if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff) │ CONSIDER (cached_charpos, cached_bytepos); │ │ for (tail = BUF_MARKERS (b); tail; tail = tail->next) 0,16 │ mov 0x10(%rax),%rax 10,41 │ test %rax,%rax │ ↓ je 480 │ { │ CONSIDER (tail->charpos, tail->bytepos); 3,19 │31c: mov 0x18(%rax),%rdx 42,86 │ cmp %rdx,%rsi │ ↓ je 638 10,36 │ cmp %rdx,%rsi │ ↑ jl 2e0 2,24 │ cmp %rdx,%r13 │ ↓ jge 438 0,76 │ mov 0x20(%rax),%rbx 0,43 │ mov %rdx,%r13 │ ↑ jmp 2f0 │ marker_byte_position(): │ if (!buf) │ error ("Marker does not point anywhere"); │ │ eassert (BUF_BEG_BYTE (buf) <= m->bytepos && m->bytepos <= BUF_Z_BYTE (bu │ │ return m->bytepos; │340: mov 0x20(%rdi),%rbx │ ↑ jmpq cb │ nop │ buf_charpos_to_bytepos(): │ │ If at any point we can tell that the space between those │ two best approximations is all single-byte, │ we interpolate the result immediately. */ │ │ CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); │350: mov 0x2f0(%rdi),%r13 │ cmp %rsi,%r13 I hope this is of some use, but I'll keep looking for open source files to reproduce the issue ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-21 0:36 ` Sebastian Sturm @ 2018-03-21 6:47 ` Eli Zaretskii 2018-03-22 13:16 ` Stefan Monnier 1 sibling, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-21 6:47 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Wed, 21 Mar 2018 01:36:38 +0100 > > this is the profiler report I get for the slow case (BTW, is there a way > to have the profiler resolve functions within line-number-at-pos? Yes: load simple.el manually before running the benchmark. > with perf, the ("self") time taken by buf_charpos_to_bytepos increases > from ~60% (fast case) to >98%. This is the diff generated by perf diff > <fast.perf> <slow.perf>: > > # Event 'cycles' > # > # Baseline Delta Shared Object Symbol > > # ........ ....... .................... > .......................................... > # > 57.77% +40.30% emacs-27.0.50 [.] buf_charpos_to_bytepos This seems to confirm Stefan's guess that converting character positions to byte positions takes most of the time, which might make sense with a lot of overlays (because each overlay uses 2 markers). Does your code call line-number-at-pos at random positions in the buffer, or are the positions close to one another? If the latter, you might be better off calling count-lines directly, starting at the line where you previously calculated the line number, instead of calling line-number-at-pos, which always begins at point-min. ^ permalink raw reply [flat|nested] 54+ messages in thread
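A minimal Lisp sketch of that caching approach (not code from this thread; `my-line-cache' and the other names are made up) could look like the following, assuming the cache is flushed on every buffer change:

(defvar-local my-line-cache nil
  "Cons (BOL-POSITION . LINE) from the previous query, or nil.")

(defun my-line-cache-flush (&rest _)
  (setq my-line-cache nil))

(defun my-line-number-at-pos (pos)
  "Like `line-number-at-pos', but count from the previous query's position."
  (save-excursion
    (goto-char pos)
    (forward-line 0)                    ; beginning of POS's line
    (let* ((bol (point))
           (line (cond ((null my-line-cache)
                        (1+ (count-lines (point-min) bol)))
                       ;; Both BOL and the cached position are at line
                       ;; beginnings, so `count-lines' returns exactly
                       ;; the difference in line numbers.
                       ((>= bol (car my-line-cache))
                        (+ (cdr my-line-cache)
                           (count-lines (car my-line-cache) bol)))
                       (t
                        (- (cdr my-line-cache)
                           (count-lines bol (car my-line-cache)))))))
      (setq my-line-cache (cons bol line))
      line)))

;; In the mode that needs it:
;;   (add-hook 'after-change-functions #'my-line-cache-flush nil t)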
* Re: State of the overlay tree branch? 2018-03-21 0:36 ` Sebastian Sturm 2018-03-21 6:47 ` Eli Zaretskii @ 2018-03-22 13:16 ` Stefan Monnier 2018-03-22 19:54 ` Sebastian Sturm 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-22 13:16 UTC (permalink / raw) To: emacs-devel > this is the profiler report I get for the slow case (BTW, is there a way to > have the profiler resolve functions within line-number-at-pos? It should do that without you asking. I mean, it won't show you `goto-char` and `point-min` kind of things since these are "inlined" (actually turned into their own byte-code), but `count-lines` should definitely be there. > - let 18368 86% > line-number-at-pos 18348 86% It's very odd that there's no `count-lines` down here (and `count-lines` is a perfectly normal Elisp function that's not inlined or otherwise treated specially), since it should be where most of the time is spent! If we assume the code works as intended, it would imply that the time is not spent in `count-lines` but elsewhere, e.g. in `goto-char`. > with perf, the ("self") time taken by buf_charpos_to_bytepos increases from > ~60% (fast case) to >98%. This is the diff generated by perf diff > <fast.perf> <slow.perf>: So I guess that could be it: the (goto-char opoint) spends an inordinate amount of time in buf_charpos_to_bytepos. Could you try the patch below, to see if it makes a difference? Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..3d808fd6fa 100644 --- a/src/marker.c +++ b/src/marker.c @@ -141,6 +141,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = 50; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +181,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - best_below < distance) break; + else + distance = distance + 10; } /* We get here if we did not exactly hit one of the known places. ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-22 13:16 ` Stefan Monnier @ 2018-03-22 19:54 ` Sebastian Sturm 2018-03-22 20:04 ` Sebastian Sturm 0 siblings, 1 reply; 54+ messages in thread From: Sebastian Sturm @ 2018-03-22 19:54 UTC (permalink / raw) To: emacs-devel > > It should do that without you asking. I mean, it won't show you > `goto-char` and `point-min` kind of things since these are "inlined" > (actually turned into their own byte-code), but `count-lines` should > definitely be there. > >> - let 18368 86% >> line-number-at-pos 18348 86% > > It's very odd that there's no `count-lines` down here (and `count-lines` > is a perfectly normal Elisp function that's not inlined or otherwise > treated specially), since it should be where most of the time is spent! when first evaluating simple.el as Eli suggested, the profiler indeed shows that pretty much all time is spent somewhere within `count-lines`. Unfortunately the patch didn't seem to help though I'll double-check at home (also, it seemed to improve baseline performance, but I'm not sure if I should regard that as a random fluctuation, will need to do more measurements). Also, this time even the first measurement with a small number of overlays was extremely slow, not sure what to make of that. In general however, the trend seems clear to me --- for this one file, overlays hurt `line-number-at-pos` performance very much, except when using the noverlay branch. Are there other things I should look for in my file that might negatively affect performance (strange codepoints or anything else that might perhaps upset `buf_charpos_to_bytepos`)? baseline (emacs-27 master #667cdf42..., approx. 5-10 overlays created by flycheck): iteration 1: 0.012696 iteration 2: 0.002595 iteration 3: 0.002606 iteration 4: 0.002601 iteration 5: 0.002649 iteration 6: 0.002605 iteration 7: 0.002594 iteration 8: 0.002601 iteration 9: 0.002603 iteration 10: 0.002601 iteration 11: 0.002606 iteration 12: 0.002626 iteration 13: 0.002603 iteration 14: 0.002642 iteration 15: 0.002599 iteration 16: 0.002598 iteration 17: 0.002600 iteration 18: 0.002601 iteration 19: 0.002599 iteration 20: 0.002608 emacs-27 master, approx. 
6200 overlays created by (test-highlight) iteration 1: 0.022795 iteration 2: 0.015560 iteration 3: 0.015697 iteration 4: 0.015913 iteration 5: 0.015894 iteration 6: 0.016063 iteration 7: 0.015928 iteration 8: 0.015890 iteration 9: 0.015278 iteration 10: 0.015515 iteration 11: 0.015327 iteration 12: 0.015326 iteration 13: 0.015574 iteration 14: 0.015319 iteration 15: 0.015370 iteration 16: 0.015354 iteration 17: 0.015333 iteration 18: 0.015312 iteration 19: 0.015481 iteration 20: 0.015411 emacs-27 master + patch "distance+10", ~10 overlays created by flycheck: iteration 1: 0.002938 iteration 2: 0.001132 iteration 3: 0.000550 iteration 4: 0.000468 iteration 5: 0.000470 iteration 6: 0.000451 iteration 7: 0.000449 iteration 8: 0.000451 iteration 9: 0.000450 iteration 10: 0.000449 iteration 11: 0.000448 iteration 12: 0.000452 iteration 13: 0.000451 iteration 14: 0.000443 iteration 15: 0.000448 iteration 16: 0.000443 iteration 17: 0.000444 iteration 18: 0.000445 iteration 19: 0.000445 iteration 20: 0.000445 emacs-27 master + patch "distance+10", ~6200 overlays created by (test-highlight): iteration 1: 0.019673 iteration 2: 0.014469 iteration 3: 0.014491 iteration 4: 0.014430 iteration 5: 0.014493 iteration 6: 0.014704 iteration 7: 0.014741 iteration 8: 0.014536 iteration 9: 0.014433 iteration 10: 0.014469 iteration 11: 0.014429 iteration 12: 0.014509 iteration 13: 0.014484 iteration 14: 0.014487 iteration 15: 0.014524 iteration 16: 0.014449 iteration 17: 0.014501 iteration 18: 0.014469 iteration 19: 0.014429 iteration 20: 0.014507 noverlay branch (#886933...), ~10 overlays: iteration 1: 0.002370 iteration 2: 0.001191 iteration 3: 0.001189 iteration 4: 0.001162 iteration 5: 0.001095 iteration 6: 0.001096 iteration 7: 0.000522 iteration 8: 0.000531 iteration 9: 0.000369 iteration 10: 0.000365 iteration 11: 0.000365 iteration 12: 0.000366 iteration 13: 0.000365 iteration 14: 0.000380 iteration 15: 0.000367 iteration 16: 0.000365 iteration 17: 0.000365 iteration 18: 0.000365 iteration 19: 0.000363 iteration 20: 0.000368 noverlay branch, ~6200 overlays: iteration 1: 0.001878 iteration 2: 0.001831 iteration 3: 0.001838 iteration 4: 0.001826 iteration 5: 0.001027 iteration 6: 0.000722 iteration 7: 0.000531 iteration 8: 0.000479 iteration 9: 0.000480 iteration 10: 0.000479 iteration 11: 0.000479 iteration 12: 0.000479 iteration 13: 0.000478 iteration 14: 0.000479 iteration 15: 0.000478 iteration 16: 0.000478 iteration 17: 0.000482 iteration 18: 0.000479 iteration 19: 0.000480 iteration 20: 0.000476 ^ permalink raw reply [flat|nested] 54+ messages in thread
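As for the "strange codepoints" question: a quick way to see whether (and where) a buffer contains characters that defeat the trivial char/byte interpolation is to list its non-ASCII characters. A small sketch (the function name is made up):

(defun my-find-non-ascii ()
  "Return a list of (POSITION . CHAR) for each non-ASCII character in the buffer."
  (save-excursion
    (goto-char (point-min))
    (let (hits)
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (push (cons (match-beginning 0) (char-after (match-beginning 0)))
              hits))
      (nreverse hits))))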
* Re: State of the overlay tree branch? 2018-03-22 19:54 ` Sebastian Sturm @ 2018-03-22 20:04 ` Sebastian Sturm 0 siblings, 0 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-22 20:04 UTC (permalink / raw) To: emacs-devel for the record, this is what I get with the (non-noverlay, non-patched) emacs-27 master branch after calling `(set-buffer-multibyte nil)`. Guess this merely confirms what you already found out, but for me this is a viable workaround until the issue is properly resolved. Will keep looking for an open source test file over the weekend emacs-27 master, ~6200 overlays, enable-multibyte-characters = nil: iteration 1: 0.000300 iteration 2: 0.000268 iteration 3: 0.000267 iteration 4: 0.000268 iteration 5: 0.000267 iteration 6: 0.000266 iteration 7: 0.000263 iteration 8: 0.000263 iteration 9: 0.000277 iteration 10: 0.000269 iteration 11: 0.000264 iteration 12: 0.000266 iteration 13: 0.000263 iteration 14: 0.000265 iteration 15: 0.000273 iteration 16: 0.000267 iteration 17: 0.000266 iteration 18: 0.000266 iteration 19: 0.000265 iteration 20: 0.000263 On 03/22/2018 08:54 PM, Sebastian Sturm wrote: > > > > It should do that without you asking. I mean, it won't show you > > `goto-char` and `point-min` kind of things since these are "inlined" > > (actually turned into their own byte-code), but `count-lines` should > > definitely be there. > > > >> - let 18368 86% > >> line-number-at-pos 18348 86% > > > > It's very odd that there's no `count-lines` down here (and `count-lines` > > is a perfectly normal Elisp function that's not inlined or otherwise > > treated specially), since it should be where most of the time is spent! > > when first evaluating simple.el as Eli suggested, the profiler indeed > shows that pretty much all time is spent somewhere within `count-lines`. > Unfortunately the patch didn't seem to help though I'll double-check at > home (also, it seemed to improve baseline performance, but I'm not sure > if I should regard that as a random fluctuation, will need to do more > measurements). > Also, this time even the first measurement with a small number of > overlays was extremely slow, not sure what to make of that. > > In general however, the trend seems clear to me --- for this one file, > overlays hurt `line-number-at-pos` performance very much, except when > using the noverlay branch. > Are there other things I should look for in my file that might > negatively affect performance (strange codepoints or anything else that > might perhaps upset `buf_charpos_to_bytepos`)? > > baseline (emacs-27 master #667cdf42..., approx. 5-10 overlays created by > flycheck): > iteration 1: 0.012696 > iteration 2: 0.002595 > iteration 3: 0.002606 > iteration 4: 0.002601 > iteration 5: 0.002649 > iteration 6: 0.002605 > iteration 7: 0.002594 > iteration 8: 0.002601 > iteration 9: 0.002603 > iteration 10: 0.002601 > iteration 11: 0.002606 > iteration 12: 0.002626 > iteration 13: 0.002603 > iteration 14: 0.002642 > iteration 15: 0.002599 > iteration 16: 0.002598 > iteration 17: 0.002600 > iteration 18: 0.002601 > iteration 19: 0.002599 > iteration 20: 0.002608 > > emacs-27 master, approx. 
6200 overlays created by (test-highlight) > iteration 1: 0.022795 > iteration 2: 0.015560 > iteration 3: 0.015697 > iteration 4: 0.015913 > iteration 5: 0.015894 > iteration 6: 0.016063 > iteration 7: 0.015928 > iteration 8: 0.015890 > iteration 9: 0.015278 > iteration 10: 0.015515 > iteration 11: 0.015327 > iteration 12: 0.015326 > iteration 13: 0.015574 > iteration 14: 0.015319 > iteration 15: 0.015370 > iteration 16: 0.015354 > iteration 17: 0.015333 > iteration 18: 0.015312 > iteration 19: 0.015481 > iteration 20: 0.015411 > > emacs-27 master + patch "distance+10", ~10 overlays created by flycheck: > iteration 1: 0.002938 > iteration 2: 0.001132 > iteration 3: 0.000550 > iteration 4: 0.000468 > iteration 5: 0.000470 > iteration 6: 0.000451 > iteration 7: 0.000449 > iteration 8: 0.000451 > iteration 9: 0.000450 > iteration 10: 0.000449 > iteration 11: 0.000448 > iteration 12: 0.000452 > iteration 13: 0.000451 > iteration 14: 0.000443 > iteration 15: 0.000448 > iteration 16: 0.000443 > iteration 17: 0.000444 > iteration 18: 0.000445 > iteration 19: 0.000445 > iteration 20: 0.000445 > > emacs-27 master + patch "distance+10", ~6200 overlays created by > (test-highlight): > iteration 1: 0.019673 > iteration 2: 0.014469 > iteration 3: 0.014491 > iteration 4: 0.014430 > iteration 5: 0.014493 > iteration 6: 0.014704 > iteration 7: 0.014741 > iteration 8: 0.014536 > iteration 9: 0.014433 > iteration 10: 0.014469 > iteration 11: 0.014429 > iteration 12: 0.014509 > iteration 13: 0.014484 > iteration 14: 0.014487 > iteration 15: 0.014524 > iteration 16: 0.014449 > iteration 17: 0.014501 > iteration 18: 0.014469 > iteration 19: 0.014429 > iteration 20: 0.014507 > > noverlay branch (#886933...), ~10 overlays: > iteration 1: 0.002370 > iteration 2: 0.001191 > iteration 3: 0.001189 > iteration 4: 0.001162 > iteration 5: 0.001095 > iteration 6: 0.001096 > iteration 7: 0.000522 > iteration 8: 0.000531 > iteration 9: 0.000369 > iteration 10: 0.000365 > iteration 11: 0.000365 > iteration 12: 0.000366 > iteration 13: 0.000365 > iteration 14: 0.000380 > iteration 15: 0.000367 > iteration 16: 0.000365 > iteration 17: 0.000365 > iteration 18: 0.000365 > iteration 19: 0.000363 > iteration 20: 0.000368 > > noverlay branch, ~6200 overlays: > iteration 1: 0.001878 > iteration 2: 0.001831 > iteration 3: 0.001838 > iteration 4: 0.001826 > iteration 5: 0.001027 > iteration 6: 0.000722 > iteration 7: 0.000531 > iteration 8: 0.000479 > iteration 9: 0.000480 > iteration 10: 0.000479 > iteration 11: 0.000479 > iteration 12: 0.000479 > iteration 13: 0.000478 > iteration 14: 0.000479 > iteration 15: 0.000478 > iteration 16: 0.000478 > iteration 17: 0.000482 > iteration 18: 0.000479 > iteration 19: 0.000480 > iteration 20: 0.000476 > ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-20 1:23 ` Sebastian Sturm 2018-03-20 6:30 ` Eli Zaretskii @ 2018-03-22 20:52 ` Stefan Monnier 2018-03-22 23:11 ` Sebastian Sturm 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-22 20:52 UTC (permalink / raw) To: emacs-devel > (defun benchmark-often () > (cl-loop for n from 1 upto 20 do > (message (format "iteration %d: %f" n (nth 0 (benchmark-run > (line-number-at-pos (point)))))))) ^^^^^^^ Where is this "point" in your tests (I expect the timing to vary significantly depending on this). > 1st run: > iteration 1: 0.001213 > iteration 2: 0.001170 > iteration 3: 0.001170 > iteration 4: 0.001238 > iteration 5: 0.001163 > iteration 6: 0.001153 > iteration 7: 0.000421 > iteration 8: 0.000426 > iteration 9: 0.000322 > iteration 10: 0.000301 > iteration 11: 0.000291 > iteration 12: 0.000292 > iteration 13: 0.000291 > iteration 14: 0.000291 > iteration 15: 0.000295 > iteration 16: 0.000289 > iteration 17: 0.000289 > iteration 18: 0.000288 > iteration 19: 0.000288 > iteration 20: 0.000287 I recommend you don't bother outputting all 20 results: better summarize it by getting rid of the first test and then giving e.g. the sum or the median of the rest. > I'm not allowed to share my employer's source code as a test case, so > I tried the same procedure with the similarly large DeclBase.h from the > public LLVM repository. To my surprise, DeclBase.h didn't suffer from any > performance issues at all. My crystal ball tells me that DeclBase.h is pure ASCII so byte<->char conversion is trivial, whereas your file likely contains umlauts and other disreputable characters. Here's a similar test case to yours but which builds up its own artificial buffer with a few non-ascii chars to spice things up: (with-temp-buffer (dotimes (i 1000) (insert "lksajflahalskjdféefawrgfrüegf\n")) (let ((txtbuf (current-buffer))) (dotimes (s 4) (with-temp-buffer (insert-buffer-substring txtbuf) (let ((stepsize (lsh 10 (* 4 s)))) (cl-loop for n from (point-min) upto (- (point-max) stepsize) by stepsize do (let ((ov (make-overlay n (+ (1- stepsize) n)))) (overlay-put ov 'cquery-sem-highlight t)))) (dotimes (i 4) (let ((timing (benchmark-run 1000 (line-number-at-pos (+ (point-min) (* i (/ (buffer-size) 4))))))) (message "ols=%S pos=%S/4 time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) i (car timing) (cdr timing))) ))))) This gave me (on my top-of-the-line Thinkpad T61 using Debian's `emacs25`): ols=3000 pos=0/4 time=0.0018 (+ (0 0.0)) ols=3000 pos=1/4 time=6.1074 (+ (0 0.0)) ols=3000 pos=2/4 time=10.6876 (+ (0 0.0)) ols=3000 pos=3/4 time=13.7854 (+ (0 0.0)) ols=187 pos=0/4 time=0.0016 (+ (0 0.0)) ols=187 pos=1/4 time=0.3055 (+ (0 0.0)) ols=187 pos=2/4 time=0.6001 (+ (0 0.0)) ols=187 pos=3/4 time=0.8903 (+ (0 0.0)) ols=11 pos=0/4 time=0.0015 (+ (0 0.0)) ols=11 pos=1/4 time=0.0769 (+ (1 0.006324223)) ols=11 pos=2/4 time=0.1439 (+ (0 0.0)) ols=11 pos=3/4 time=0.2215 (+ (0 0.0)) ols=0 pos=0/4 time=0.0015 (+ (0 0.0)) ols=0 pos=1/4 time=0.0548 (+ (0 0.0)) ols=0 pos=2/4 time=0.1102 (+ (0 0.0)) ols=0 pos=3/4 time=0.1690 (+ (0 0.0)) Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
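A small helper along the lines of that summarizing suggestion (a sketch; `my-benchmark-median' is not an existing function) might be:

(require 'benchmark)

(defun my-benchmark-median (n thunk)
  "Call THUNK N+1 times, drop the first (warm-up) timing, return the median."
  (let (times)
    (dotimes (_ (1+ n))
      (push (car (benchmark-run (funcall thunk))) times))
    ;; `push' builds the list in reverse order, so the warm-up run is last.
    (let ((rest (sort (butlast times) #'<)))
      (nth (/ (length rest) 2) rest))))

;; e.g. (my-benchmark-median 20 (lambda () (line-number-at-pos (point))))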
* Re: State of the overlay tree branch? 2018-03-22 20:52 ` Stefan Monnier @ 2018-03-22 23:11 ` Sebastian Sturm 2018-03-23 5:03 ` Stefan Monnier 2018-03-23 8:07 ` Eli Zaretskii 0 siblings, 2 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-22 23:11 UTC (permalink / raw) To: emacs-devel On 03/22/2018 09:52 PM, Stefan Monnier wrote: >> (defun benchmark-often () >> (cl-loop for n from 1 upto 20 do >> (message (format "iteration %d: %f" n (nth 0 (benchmark-run >> (line-number-at-pos (point)))))))) > ^^^^^^^ > Where is this "point" in your tests (I expect the timing to vary > significantly depending on this). yes, these tests were all performed close to the very bottom of the file as I knew the issue to get worse towards the buffer end > My crystal ball tells me that DeclBase.h is pure ASCII so byte<->char > conversion is trivial, whereas your file likely contains umlauts and > other disreputable characters. > > Here's a similar test case to yours but which builds up its own > artificial buffer with a few non-ascii chars to spice things up: thank you! I'm very glad you could come up with a reproducible test case, and it's true that my file contains two instances of the greek letter "μ" that seem to cause this performance issue (though I was surprised to see that this few characters could have such a poisonous effect). Likewise, when replacing the topmost part in your benchmark function with the following: (A) (dotimes (i 1000) (insert "pure ascii pure ascii pure ascii\n")) (B) (dotimes (i 500) (insert "pure ascii pure ascii pure ascii\n")) (insert "μ") (dotimes (i 500) (insert "pure ascii pure ascii pure ascii\n")) respectively, I obtain the following timing results: (A) ols=3300 pos=0/4 time=0.0014 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0155 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0281 (+ (0 0.0)) ols=3300 pos=3/4 time=0.0447 (+ (0 0.0)) ols=206 pos=0/4 time=0.0007 (+ (0 0.0)) ols=206 pos=1/4 time=0.0130 (+ (0 0.0)) ols=206 pos=2/4 time=0.0283 (+ (0 0.0)) ols=206 pos=3/4 time=0.0447 (+ (0 0.0)) ols=12 pos=0/4 time=0.0007 (+ (0 0.0)) ols=12 pos=1/4 time=0.0129 (+ (0 0.0)) ols=12 pos=2/4 time=0.0281 (+ (0 0.0)) ols=12 pos=3/4 time=0.0447 (+ (0 0.0)) ols=0 pos=0/4 time=0.0007 (+ (0 0.0)) ols=0 pos=1/4 time=0.0134 (+ (0 0.0)) ols=0 pos=2/4 time=0.0297 (+ (0 0.0)) ols=0 pos=3/4 time=0.0463 (+ (0 0.0)) (B) ols=3300 pos=0/4 time=0.0007 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0301 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0482 (+ (0 0.0)) ols=3300 pos=3/4 time=8.0213 (+ (0 0.0)) ols=206 pos=0/4 time=0.0007 (+ (0 0.0)) ols=206 pos=1/4 time=0.0141 (+ (0 0.0)) ols=206 pos=2/4 time=0.0325 (+ (0 0.0)) ols=206 pos=3/4 time=0.1786 (+ (0 0.0)) ols=12 pos=0/4 time=0.0007 (+ (0 0.0)) ols=12 pos=1/4 time=0.0136 (+ (0 0.0)) ols=12 pos=2/4 time=0.0323 (+ (0 0.0)) ols=12 pos=3/4 time=0.0794 (+ (0 0.0)) ols=0 pos=0/4 time=0.0009 (+ (0 0.0)) ols=0 pos=1/4 time=0.0139 (+ (0 0.0)) ols=0 pos=2/4 time=0.0326 (+ (0 0.0)) ols=0 pos=3/4 time=0.0632 (+ (0 0.0)) by comparison, these are my results using the noverlay branch: (A) ols=3300 pos=0/4 time=0.0012 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0132 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0291 (+ (0 0.0)) ols=3300 pos=3/4 time=0.0448 (+ (0 0.0)) ols=206 pos=0/4 time=0.0007 (+ (0 0.0)) ols=206 pos=1/4 time=0.0132 (+ (0 0.0)) ols=206 pos=2/4 time=0.0290 (+ (0 0.0)) ols=206 pos=3/4 time=0.0454 (+ (0 0.0)) ols=12 pos=0/4 time=0.0008 (+ (0 0.0)) ols=12 pos=1/4 time=0.0131 (+ (0 0.0)) ols=12 pos=2/4 time=0.0287 (+ (0 0.0)) ols=12 pos=3/4 time=0.0452 (+ (0 0.0)) ols=0 pos=0/4 time=0.0007 (+ 
(0 0.0)) ols=0 pos=1/4 time=0.0131 (+ (0 0.0)) ols=0 pos=2/4 time=0.0289 (+ (0 0.0)) ols=0 pos=3/4 time=0.0457 (+ (0 0.0)) (B) ols=3300 pos=0/4 time=0.0015 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0177 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0345 (+ (0 0.0)) ols=3300 pos=3/4 time=0.0544 (+ (0 0.0)) ols=206 pos=0/4 time=0.0008 (+ (0 0.0)) ols=206 pos=1/4 time=0.0136 (+ (0 0.0)) ols=206 pos=2/4 time=0.0317 (+ (0 0.0)) ols=206 pos=3/4 time=0.0537 (+ (0 0.0)) ols=12 pos=0/4 time=0.0007 (+ (0 0.0)) ols=12 pos=1/4 time=0.0135 (+ (0 0.0)) ols=12 pos=2/4 time=0.0319 (+ (0 0.0)) ols=12 pos=3/4 time=0.0537 (+ (0 0.0)) ols=0 pos=0/4 time=0.0007 (+ (0 0.0)) ols=0 pos=1/4 time=0.0146 (+ (0 0.0)) ols=0 pos=2/4 time=0.0318 (+ (0 0.0)) ols=0 pos=3/4 time=0.0554 (+ (0 0.0)) since noverlay performs so well, I guess the technical issue here is already solved and I'll just have to wait for it to make it into the master branch. Until then I'll continue using feature/noverlay, but if a more recent merge with master was made available, I'd be interested in testing that. thanks again for all the helpful responses in this thread, Sebastian ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-22 23:11 ` Sebastian Sturm @ 2018-03-23 5:03 ` Stefan Monnier 2018-03-23 12:25 ` Sebastian Sturm 2018-03-23 8:07 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 5:03 UTC (permalink / raw) To: emacs-devel > since noverlay performs so well, I guess the technical issue here is already > solved and I'll just have to wait for it to make it into the master > branch. Yes and no: there can still be many markers (even without having any overlays) and that poses the same problem (and the noverlays branch doesn't change that). I've made some further tests and found that the patch I sent indeed doesn't help tremendously with "distance += 10" but that when we reach "+= 50" the effect is more significant. Could you try the patch below (ideally not just with artificial tests but also in actual use) to see if it helps? Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..6ab0d3d61e 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT 50 + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +203,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. 
@@ -293,6 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -323,8 +349,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 5:03 ` Stefan Monnier @ 2018-03-23 12:25 ` Sebastian Sturm 2018-03-23 12:47 ` Eli Zaretskii 0 siblings, 1 reply; 54+ messages in thread From: Sebastian Sturm @ 2018-03-23 12:25 UTC (permalink / raw) To: emacs-devel I haven't tested this very extensively yet, but artificial benchmark results are now comparable to the noverlay branch and editing seems similarly fluid. Many thanks for that! On a tangential note, removing the bottleneck now brought up another hotspot in my performance profile (less severe than the one previously reported) that seems to be due to cc-mode itself. Since I don't want to derail this thread, I'll try to implement some of the suggestions made in the "Latency profiling?" thread and come up with some reproducible data on that other issue. On 03/23/2018 06:03 AM, Stefan Monnier wrote: >> since noverlay performs so well, I guess the technical issue here is already >> solved and I'll just have to wait for it to make it into the master >> branch. > > Yes and no: there can still be many markers (even without having any > overlays) and that poses the same problem (and the noverlays branch > doesn't change that). > > I've made some further tests and found that the patch I sent indeed > doesn't help tremendously with "distance += 10" but that when we reach > "+= 50" the effect is more significant. > > Could you try the patch below (ideally not just with artificial tests > but also in actual use) to see if it helps? > > > Stefan > > > diff --git a/src/marker.c b/src/marker.c > index 7773c4fce0..6ab0d3d61e 100644 > --- a/src/marker.c > +++ b/src/marker.c > @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) > CHECK_TYPE (MARKERP (x), Qmarkerp, x); > } > > +/* When converting bytes from/to chars, we look through the list of > + markers to try and find a good starting point (since markers keep > + track of both bytepos and charpos at the same time). > + But if there are many markers, it can take too much time to find a "good" > + marker from which to start. Worse yet: if it takes a long time and we end > + up finding a nearby markers, we won't add a new marker to cache this > + result, so next time around we'll have to go through this same long list > + to (re)find this best marker. So the further down the list of > + markers we go, the less demanding we are w.r.t what is a good marker. > + > + The previous code used INITIAL=50 and INCREMENT=0 and this lead to > + really poor performance when there are many markers. > + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad > + T61 using various artificial test cases seem to suggest that INCREMENT=50 > + might be "the best compromise": it significantly improved the > + worst case and it was rarely slower and never by much. > + > + The asymptotic behavior is still poor, tho, so in largish buffers with many > + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ > +#define BYTECHAR_DISTANCE_INITIAL 50 > +#define BYTECHAR_DISTANCE_INCREMENT 50 > + > /* Return the byte position corresponding to CHARPOS in B. 
*/ > > ptrdiff_t > @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) > struct Lisp_Marker *tail; > ptrdiff_t best_above, best_above_byte; > ptrdiff_t best_below, best_below_byte; > + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; > > eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); > > @@ -180,8 +203,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) > /* If we are down to a range of 50 chars, > don't bother checking any other markers; > scan the intervening chars directly now. */ > - if (best_above - best_below < 50) > + if (best_above - best_below < distance) > break; > + else > + distance += BYTECHAR_DISTANCE_INCREMENT; > } > > /* We get here if we did not exactly hit one of the known places. > @@ -293,6 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) > struct Lisp_Marker *tail; > ptrdiff_t best_above, best_above_byte; > ptrdiff_t best_below, best_below_byte; > + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; > > eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); > > @@ -323,8 +349,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) > /* If we are down to a range of 50 chars, > don't bother checking any other markers; > scan the intervening chars directly now. */ > - if (best_above - best_below < 50) > + if (best_above - best_below < distance) > break; > + else > + distance += BYTECHAR_DISTANCE_INCREMENT; > } > > /* We get here if we did not exactly hit one of the known places. > > -- Sebastian Sturm Research & Development Phone: +49 (0) 6155 7808977 Fax: +49 (0) 6155 7802880 Email: s.sturm@arkona-technologies.de Web: www.arkona-technologies.de arkona technologies GmbH Im Leuschnerpark 4 64347 Griesheim Germany Amtsgericht / Commercial Register of Darmstadt, HRB 90080 USt-ID: DE273794666 Steuernummer / Tax-ID: 007 / 228 / 19331 Geschäftsführung / Managing Director: Rainer Sturm This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this message and any attachments from your system. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden. We cannot accept any responsibility for the accuracy or completeness of this message as it has been transmitted over a public network. If, despite our use of anti-virus software, a virus enters your systems in connection with the sending of the e-mail, you may not hold us liable for any damages that may possibly arise in that connection. We will accept liability which by law we cannot exclude. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:25 ` Sebastian Sturm @ 2018-03-23 12:47 ` Eli Zaretskii 2018-03-23 13:19 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 12:47 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Fri, 23 Mar 2018 13:25:19 +0100 > > I haven't tested this very extensively yet, but artificial benchmark > results are now comparable to the noverlay branch and editing seems > similarly fluid. Many thanks for that! I'd be interested to see a comparison with a code that ignores the markers entirely, and uses just these 4: CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of the number of markers. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:47 ` Eli Zaretskii @ 2018-03-23 13:19 ` Stefan Monnier 2018-03-23 13:37 ` Noam Postavsky 2018-03-23 14:22 ` Eli Zaretskii 0 siblings, 2 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 13:19 UTC (permalink / raw) To: emacs-devel > I'd be interested to see a comparison with a code that ignores the > markers entirely, and uses just these 4: > > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); I tried that, and in the synthetic benchmark which tries to reproduce Sebastian's lsp-mode situation the result was indeed much better, but then in other benchmarks it caused very significant slowdowns. > That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of > the number of markers. I haven't investigated closely enough to be sure, but in my tests with INCREMENT=50 I reached points where adding more overlays did not make things worse any more, so I suspect that the ideal INCREMENT depends on the buffer size more than on the number of markers. In any case, my patch is just a "quick hack" to try and reduce the pain, but it doesn't really solve the problem: the slow down with many markers and a large buffer still grows pretty significantly with the size of the buffer. I think it's worse than O(sqrt N). If we want to really solve this problem, we should use an algorithm with at most an O(log N) complexity, e.g. keeping the markers in a red-black tree, or inside a sorted array (probably with a gap like we have for the buffer text) so we can do a binary search. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
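To make the O(log N) lookup concrete, here is a toy Lisp sketch of the search side only (the real change would of course live in C, and the hard part is keeping the structure cheap to update on insertions and deletions):

(defun my-nearest-anchor (anchors charpos)
  "Return the (CHARPOS . BYTEPOS) pair in ANCHORS nearest below CHARPOS.
ANCHORS is a vector of conses sorted by their car."
  (let ((lo 0) (hi (length anchors)))
    (while (< lo hi)
      (let ((mid (/ (+ lo hi) 2)))
        (if (<= (car (aref anchors mid)) charpos)
            (setq lo (1+ mid))          ; anchor at MID is usable; look right
          (setq hi mid))))              ; anchor at MID is too far; look left
    (and (> lo 0) (aref anchors (1- lo)))))

;; (my-nearest-anchor [(1 . 1) (100 . 120) (500 . 630)] 250) => (100 . 120)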
* Re: State of the overlay tree branch? 2018-03-23 13:19 ` Stefan Monnier @ 2018-03-23 13:37 ` Noam Postavsky 2018-03-23 13:55 ` Stefan Monnier 2018-03-23 14:22 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Noam Postavsky @ 2018-03-23 13:37 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs developers On 23 March 2018 at 09:19, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > In any case, my patch is just a "quick hack" to try and reduce the pain, > but it doesn't really solve the problem: the slow down with many markers > and a large buffer still grows pretty significantly with the size of > the buffer. I think it's worse than O(sqrt N). > > If we want to really solve this problem, we should use an algorithm with > at most an O(log N) complexity, e.g. keeping the markers in > a red-black tree, or inside a sorted array (probably with a gap like we > have for the buffer text) so we can do a binary search. Is this related to Bug#24548 "Long GC delays with many non-detached markers (PATCH)" aka Bug#29439 "Quadratic complexity in sweep_markers"? https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24548 https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29439 ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 13:37 ` Noam Postavsky @ 2018-03-23 13:55 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 13:55 UTC (permalink / raw) To: Noam Postavsky; +Cc: Emacs developers >> If we want to really solve this problem, we should use an algorithm with >> at most an O(log N) complexity, e.g. keeping the markers in >> a red-black tree, or inside a sorted array (probably with a gap like we >> have for the buffer text) so we can do a binary search. > > Is this related to Bug#24548 "Long GC delays with many non-detached > markers (PATCH)" > aka Bug#29439 "Quadratic complexity in sweep_markers"? Not directly, no, Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 13:19 ` Stefan Monnier 2018-03-23 13:37 ` Noam Postavsky @ 2018-03-23 14:22 ` Eli Zaretskii 2018-03-23 14:39 ` Stefan Monnier 2018-03-23 19:39 ` Stefan Monnier 1 sibling, 2 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 14:22 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 23 Mar 2018 09:19:11 -0400 > > > I'd be interested to see a comparison with a code that ignores the > > markers entirely, and uses just these 4: > > > > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); > > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); > > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); > > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); > > I tried that, and in the synthetic benchmark which tries to reproduce > Sebastian's lsp-mode situation the result was indeed much better, but > then in other benchmarks it caused very significant slowdowns. In what benchmarks did it cause significant slowdowns? > > That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of > > the number of markers. > > I haven't investigated closely enough to be sure, but in my tests with > INCREMENT=50 I reached points where adding more overlays did not make > things worse any more, so I suspect that the ideal INCREMENT depends on > the buffer size more than on the number of markers. Could be. My point was that it isn't a constant. > If we want to really solve this problem, we should use an algorithm with > at most an O(log N) complexity, e.g. keeping the markers in > a red-black tree, or inside a sorted array (probably with a gap like we > have for the buffer text) so we can do a binary search. Yes, agreed. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 14:22 ` Eli Zaretskii @ 2018-03-23 14:39 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 14:39 UTC (permalink / raw) To: emacs-devel >> > I'd be interested to see a comparison with a code that ignores the >> > markers entirely, and uses just these 4: >> > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); >> > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); >> > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); >> > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); >> I tried that, and in the synthetic benchmark which tries to reproduce >> Sebastian's lsp-mode situation the result was indeed much better, but >> then in other benchmarks it caused very significant slowdowns. > In what benchmarks did it cause significant slowdowns? Can't remember exactly; I think it was bad enough for a test case which seemed pretty realistic, so I discarded that option. Basically, what I remember is that I got the impression it would probably harm more users than the problem at hand does. Stefan PS: BTW, the number of markers is not the only issue: the order in which they are created also matters. Maybe we should try Sebastian's test case but creating the markers/overlays in random order (and if that helps, we could get a similar effect by making the GC randomly shuffle the buffer's list of markers ;-). ^ permalink raw reply [flat|nested] 54+ messages in thread
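That random-order experiment is easy to set up from Lisp; a sketch (the names are made up, the shuffle is a plain Fisher-Yates):

(require 'cl-lib)

(defun my-shuffle (list)
  "Return a copy of LIST in random order (Fisher-Yates)."
  (let ((v (vconcat list)))
    (cl-loop for i from (1- (length v)) downto 1
             do (cl-rotatef (aref v i) (aref v (random (1+ i)))))
    (append v nil)))

(defun my-make-overlays-randomly (step)
  "Cover the current buffer with overlays of size STEP, created in random order."
  (dolist (beg (my-shuffle (number-sequence (point-min) (- (point-max) step) step)))
    (make-overlay beg (+ beg step))))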
* Re: State of the overlay tree branch? 2018-03-23 14:22 ` Eli Zaretskii 2018-03-23 14:39 ` Stefan Monnier @ 2018-03-23 19:39 ` Stefan Monnier 2018-03-25 15:11 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 19:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel OK, I think I'm tired of these experiments. Here's my current test, along with the patches I use with it. You can test it with something like src/emacs -Q --batch -l ~/tmp/foo.el --eval '(setq internal--bytechar-distance-increment 50 internal--randomize-markers t)' --eval '(bytechar-test 3000 nil)' Shuffling the markers can make a noticeable difference in some cases but we're only talking about a factor of 2 or 3. It doesn't have much negative impact, so it's not a bad option, but the algorithmic problem remains anyway. Stefan (defun bytechar-test (buffer-kb &optional forward) (random "seed") (with-temp-buffer (dotimes (i (* buffer-kb 33)) (insert "lksajflahalskjdféefawrgfrüegf\n")) (message "buffer-size = %SkB" (/ (buffer-size) 1024)) (let ((txtbuf (current-buffer)) (goto-iterations (/ 10000000 buffer-kb)) (line-iterations (/ 20000 buffer-kb)) (markers ())) (dotimes (s 4) (with-temp-buffer (insert-buffer-substring txtbuf) (let ((stepsize (lsh 10 (* 4 s)))) (if forward (cl-loop for n from (point-min) upto (point-max) by stepsize do (push (copy-marker n) markers)) (cl-loop for n from (point-max) downto (point-min) by stepsize do (push (copy-marker n) markers)))) ;; The GC's internal--randomize-markers just brings-to-front every ;; 8th marker, so when starting with an ordered list of markers (like ;; in our case), we need to run the GC at least 8 times before the ;; whole list starts to look somewhat shuffled. (dotimes (i 20) (garbage-collect)) (let ((timing (benchmark-run goto-iterations (goto-char (+ (point-min) (random (buffer-size))))))) (message "ols=%S goto-random time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) (car timing) (cdr timing))) (garbage-collect) ;throw away the transient markers (let ((timing (benchmark-run line-iterations (dotimes (i 5) (line-number-at-pos (+ (point-min) (* i (/ (buffer-size) 4)))))))) (message "nbmks=%S pos=*/4 time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) (car timing) (cdr timing))) (dotimes (i 5) (let ((timing (benchmark-run line-iterations (line-number-at-pos (+ (point-min) (* i (/ (buffer-size) 4))))))) (message "nbmks=%S pos=%S/4 time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) i (car timing) (cdr timing)))) ))))) diff --git a/lisp/emacs-lisp/benchmark.el b/lisp/emacs-lisp/benchmark.el index b86b56b81e..2f4e38fe35 100644 --- a/lisp/emacs-lisp/benchmark.el +++ b/lisp/emacs-lisp/benchmark.el @@ -50,7 +50,7 @@ benchmark-run garbage collections that ran, and the time taken by garbage collection. See also `benchmark-run-compiled'." (declare (indent 1) (debug t)) - (unless (natnump repetitions) + (unless (or (natnump repetitions) (symbolp repetitions)) (setq forms (cons repetitions forms) repetitions 1)) (let ((i (make-symbol "i")) @@ -58,7 +58,7 @@ benchmark-run (gc (make-symbol "gc"))) `(let ((,gc gc-elapsed) (,gcs gcs-done)) - (list ,(if (> repetitions 1) + (list ,(if (or (symbolp repetitions) (> repetitions 1)) ;; Take account of the loop overhead. `(- (benchmark-elapse (dotimes (,i ,repetitions) ,@forms)) @@ -101,7 +101,7 @@ benchmark For non-interactive use see also `benchmark-run' and `benchmark-run-compiled'." 
(interactive "p\nxForm: ") - (let ((result (eval `(benchmark-run ,repetitions ,form)))) + (let ((result (eval `(benchmark-run ,repetitions ,form) t))) (if (zerop (nth 1 result)) (message "Elapsed time: %fs" (car result)) (message "Elapsed time: %fs (%fs in %d GCs)" (car result) diff --git a/src/alloc.c b/src/alloc.c index 7ba872aaee..16d11e34cd 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -7270,10 +7270,22 @@ static void unchain_dead_markers (struct buffer *buffer) { struct Lisp_Marker *this, **prev = &BUF_MARKERS (buffer); + /* In order to try and avoid worst case behaviors in buf_charpos_to_bytepos + we try and randomize the order of markers here. */ + unsigned i = 4; while ((this = *prev)) if (this->gcmarkbit) - prev = &this->next; + { + if (!randomize_markers || i++ % 8) + prev = &this->next; + else + { /* Move this one to front, just to randomize things a bit. */ + *prev = this->next; + this->next = BUF_MARKERS (buffer); + BUF_MARKERS (buffer) = this; + } + } else { this->buffer = NULL; @@ -7752,6 +7764,9 @@ The time is in seconds as a floating point value. */); DEFVAR_INT ("gcs-done", gcs_done, doc: /* Accumulated number of garbage collections done. */); + DEFVAR_BOOL ("internal--randomize-markers", randomize_markers, doc: /* */); + randomize_markers = true; + defsubr (&Scons); defsubr (&Slist); defsubr (&Svector); diff --git a/src/marker.c b/src/marker.c index 3d808fd6fa..7c1d164927 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT bytechar_distance_increment + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,7 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; - ptrdiff_t distance = 50; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -184,7 +206,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) if (best_above - best_below < distance) break; else - distance++; + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. 
@@ -296,7 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; - ptrdiff_t distance = 50; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -330,7 +352,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) if (best_above - best_below < distance) break; else - distance++; + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. @@ -756,4 +778,9 @@ syms_of_marker (void) defsubr (&Scopy_marker); defsubr (&Smarker_insertion_type); defsubr (&Sset_marker_insertion_type); + + DEFVAR_INT ("internal--bytechar-distance-increment", + bytechar_distance_increment, + doc: /* Haha */); + bytechar_distance_increment = 50; } ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 19:39 ` Stefan Monnier @ 2018-03-25 15:11 ` Stefan Monnier 2018-03-25 16:39 ` Eli Zaretskii 0 siblings, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-25 15:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > OK, I think I'm tired of these experiments. Taking a break finally let me get rid of my blinders so I could see more clearly what's going on: All this time is really spent in find_newline (rather than in things like goto-char). The problem goes as follows:
A - (line-number-at-pos POS) will call find_newline, making it scan all positions between point-min and POS.
B1- if find_newline uses the newline cache, at each newline encountered it will call buf_charpos_to_bytepos.
B2- if find_newline does not use the newline cache, at each newline encountered it will call buf_bytepos_to_charpos.
- Each time buf_charpos_to_bytepos/buf_bytepos_to_charpos is called, it will "CONSIDER" the known positions at point-min, point-max, and at the position of the last call (i.e. at the previous newline). Then it will loop through all the markers until the nearest known position before POS and the nearest known position after POS are within 50 characters of each other.
C - since one of the known positions is that of the last newline, the distance between the nearest position and POS (before considering any marker) is usually pretty small: the length of the current line. It's often even smaller than 50. But that doesn't let us stop, because the marker loop doesn't pay attention to the distance to POS in order to decide to stop: it only considers the distance between the current upper and lower bounds, and since we're scanning in one direction, all the recently seen positions (added as markers) are on the same side of POS as the last seen newline, so they don't help.
So there are various ways to solve this problem. So far, I tried to make the marker loop give up earlier, which helps (C) to some extent but without attacking the core of its problem, which is to pay attention to the case where one of the bounds is already very close to POS even tho the other is still way off. The patch below tweaks my previous patch to take this into account. The result is now that my test cases stay fast (mostly unaffected by the number of markers) even for large buffers with a large number of markers.
Note that the above shows that there are other optimisations which would also solve this problem (and would be worthwhile independently).
A - change line-number-at-pos so it doesn't always rescan all the way from point-min. This would really circumvent the whole problem (contrary to what I thought before with my blinders on; thanks Eli for insisting on that).
B - change find_newline so it doesn't call buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline. E.g. in the no-newline-cache case it'd probably be faster to loop through each char using INC/DEC_BOTH and checking if we're at \n, than the current code which uses mem(r)chr to look for the bytepos of the next \n and then calls buf_bytepos_to_charpos to get the corresponding charpos. Alternatively, we could just delay the computation of the charpos until the end (currently we update it at each newline, for the purpose of filling the newline-cache). 
-- Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..f869b3f948 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT 50 + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +203,11 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - charpos < distance + || charpos - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. @@ -293,6 +319,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -323,8 +350,11 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - bytepos < distance + || bytepos - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-25 15:11 ` Stefan Monnier @ 2018-03-25 16:39 ` Eli Zaretskii 2018-03-25 17:35 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-25 16:39 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Cc: emacs-devel@gnu.org > Date: Sun, 25 Mar 2018 11:11:14 -0400 > > B1- if find_newline uses the newline cache, at each newline encountered it > will call buf_charpos_to_bytepos. > B2- if find_newline does not use the newline cache, at each newline > encountered it will call buf_bytepos_to_charpos. Agree with B1, but not with B2. Unless I'm overlooking something, when the newline cache is disabled, we use memchr, which can search a contiguous sequence of bytes in a loop, without translating byte-to-character positions. It only needs this translation at the beginning of search, after hitting the gap, and when the search is completed. > So there are various ways to solve this problem. So far, I tried to > make the markers-loop give up earlier, which helps (C) to some extent but > without attacking the core of its problem which is to pay attention to > the case where one of the bounds is already very close to POS even tho > the other is still way off. > > The patch below tweaks my previous patch to take this into account. > The result is now that my test cases stay fast (mostly unaffected by the > number of markers) even for large buffers with a large number of markers. This is a good change, I think. But it emphasizes even more the fact that if we instead expose to Lisp display_count_lines, which is basically a stripped-down version of find_newline with all the unnecessary ballast removed, we will get an even better performance in applications that need to count lines a lot. > Note that the above shows that there are other optimisations which would > also solve this problem (and would be worthwhile independently). > A - change line-number-at-pos so it doesn't always rescan all the way > from point-min. This would really circumvent the whole problem > (contrarily to what I thought before with my blinders on, thanks Eli > for insisting on that). > B - change find_newline so it doesn't call > buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline. > E.g. in the no-newline-cache case it'd probably be faster to > loop through each char using INC/DEC_BOTH and checking if we're at > \n, than the current code which uses mem(r)chr to look for the > bytepos of the next \n and then calls buf_bytepos_to_charpos to get > the corresponding charpos. Alternatively, we could just delay the > computation of the charpos until the end (currently we update it > at each newline, for the purpose of filling the newline-cache). I think we need not touch find_newline. It is a very frequently used workhorse, and needs to produce decent performance for every one of its callers. By contrast, applications whose primary need is to count lines, let alone do that _thousands_ of times per keystroke, should have a dedicated function optimized for that job alone. It's not a coincidence the native line-number display uses display_count_lines ;-) Thanks. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-25 16:39 ` Eli Zaretskii @ 2018-03-25 17:35 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-25 17:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > Agree with B1, but not with B2. Unless I'm overlooking something, > when the newline cache is disabled, we use memchr, which can search a > contiguous sequence of bytes in a loop, without translating > byte-to-character positions. It only needs this translation at the > beginning of search, after hitting the gap, and when the search is > completed. Hmm... indeed, when the newline cache is completely disabled the problem should not appear. >> The patch below tweaks my previous patch to take this into account. >> The result is now that my test cases stay fast (mostly unaffected by the >> number of markers) even for large buffers with a large number of markers. > This is a good change, I think. But it emphasizes even more the fact > that if we instead expose to Lisp display_count_lines, which is > basically a stripped-down version of find_newline with all the > unnecessary ballast removed, we will get an even better performance in > applications that need to count lines a lot. Yes, fixing A is definitely worthwhile. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-22 23:11 ` Sebastian Sturm 2018-03-23 5:03 ` Stefan Monnier @ 2018-03-23 8:07 ` Eli Zaretskii 2018-03-23 9:08 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 8:07 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Fri, 23 Mar 2018 00:11:16 +0100
>
> since noverlay performs so well, I guess the technical issue here is
> already solved and I'll just have to wait for it to make it into the
> master branch.

As Stefan points out, the overlays are not the reason. The reason is the number of markers in the buffer; each overlay defines 2 markers. And there could be many markers in a buffer even if there are no overlays.

Btw, why do you have so many overlays in these buffers? Is this part of the lsp-mode implementation, or is the reason unrelated?

^ permalink raw reply [flat|nested] 54+ messages in thread
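As an aside, a quick way to check how many overlays a buffer actually carries (and hence, roughly, how many extra marker-like positions the char/byte conversion has to consider) is a one-liner like the following; `overlays-in' is standard Emacs Lisp.

```
;; Number of overlays in the accessible portion of the current buffer;
;; each one contributes a start and an end position that behave like markers.
(length (overlays-in (point-min) (point-max)))
```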
* Re: State of the overlay tree branch? 2018-03-23 8:07 ` Eli Zaretskii @ 2018-03-23 9:08 ` Eli Zaretskii 2018-03-23 10:15 ` Sebastian Sturm 2018-03-23 12:12 ` Stefan Monnier 0 siblings, 2 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 9:08 UTC (permalink / raw) To: s.sturm; +Cc: emacs-devel > Date: Fri, 23 Mar 2018 11:07:26 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: emacs-devel@gnu.org > > Btw, why do you have so many overlays in these buffers? Is this part > of lsp-mode implementation, or is the reason unrelated? Also, what about my suggestion to count lines in a relative manner, using count-lines from a line with a known number? You never replied to that suggestion. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 9:08 ` Eli Zaretskii @ 2018-03-23 10:15 ` Sebastian Sturm 2018-03-23 12:39 ` Eli Zaretskii 2018-03-23 12:12 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Sebastian Sturm @ 2018-03-23 10:15 UTC (permalink / raw) To: emacs-devel On 03/23/2018 10:08 AM, Eli Zaretskii wrote: >> Date: Fri, 23 Mar 2018 11:07:26 +0300 >> From: Eli Zaretskii <eliz@gnu.org> >> Cc: emacs-devel@gnu.org >> >> Btw, why do you have so many overlays in these buffers? Is this part >> of lsp-mode implementation, or is the reason unrelated? this is not related to lsp-mode itself, but to the semantic highlighter implemented by emacs-cquery (which is built on top of lsp-mode but implements additional features offered by the cquery backend). When set to use overlays (which provides a better visual experience than font-lock, as font-lock tends to get out of sync with the buffer during editing), the semantic highlighter retrieves a list of symbols within the current buffer and creates overlays to provide semantically meaningful syntax highlighting. It's not a feature I couldn't live without, but it's very precise and in principle could probably be faster than the syntax highlighting provided by cc-mode as C++ parsing is handled asynchronously by the clang-based native backend. > Also, what about my suggestion to count lines in a relative manner, > using count-lines from a line with a known number? You never replied > to that suggestion. you're right, sorry. In my opinion, a caching mechanism might be a very useful thing to have if it provides further performance benefits on top of what the noverlay branch has to offer. However, since count-lines may not be the only function that has to convert between char and byte positions (or is it?), and since the noverlay branch seems to resolve the overlay issue without having to introduce additional complexity in the elisp layer, implementing a caching mechanism before noverlay is merged into the master branch seems like a premature optimization to me. Of course this is a layman's opinion (and maybe the case of "few overlays but many markers" is not as pathological as it appears to me); if you think a line number cache should be implemented, I'll go and discuss that with the lsp-mode maintainers (assuming that they are among the heaviest users of line-number-at-pos). Sebastian ^ permalink raw reply [flat|nested] 54+ messages in thread
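For readers unfamiliar with how such a highlighter works, the overlay backend boils down to something like the sketch below. The function names, the property used for tagging, and the face handling are invented for this example; the real emacs-cquery code is more involved and batches its updates.

```
;; Minimal sketch of overlay-based semantic highlighting: for each symbol
;; range reported by the language server, put a face on an overlay.
;; Overlay endpoints move with the text when the buffer is edited, which is
;; why they stay in sync better than re-running font-lock, but every overlay
;; also adds positions that buffer scans may have to take into account.
(defun my-highlight-range (beg end face)
  "Highlight BEG..END in the current buffer with FACE using an overlay."
  (let ((ov (make-overlay beg end)))
    (overlay-put ov 'face face)
    ;; Tag the overlay so it can be found and deleted on the next update.
    (overlay-put ov 'my-semantic-highlight t)
    ov))

(defun my-clear-highlights ()
  "Remove the overlays created by `my-highlight-range'."
  (remove-overlays (point-min) (point-max) 'my-semantic-highlight t))
```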
* Re: State of the overlay tree branch? 2018-03-23 10:15 ` Sebastian Sturm @ 2018-03-23 12:39 ` Eli Zaretskii 0 siblings, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 12:39 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Fri, 23 Mar 2018 11:15:48 +0100 > > > Also, what about my suggestion to count lines in a relative manner, > > using count-lines from a line with a known number? You never replied > > to that suggestion. > > you're right, sorry. In my opinion, a caching mechanism might be a very > useful thing to have if it provides further performance benefits on top > of what the noverlay branch has to offer. However, since count-lines may > not be the only function that has to convert between char and byte > positions (or is it?), and since the noverlay branch seems to resolve > the overlay issue without having to introduce additional complexity in > the elisp layer, implementing a caching mechanism before noverlay is > merged into the master branch seems like a premature optimization to me. It isn't premature optimization, because a buffer could have many markers even if it has no or only a few overlays. > Of course this is a layman's opinion (and maybe the case of "few > overlays but many markers" is not as pathological as it appears to me); > if you think a line number cache should be implemented, I'll go and > discuss that with the lsp-mode maintainers (assuming that they are among > the heaviest users of line-number-at-pos). I think the effect should be at least measured before the decision whether to do that is made. ^ permalink raw reply [flat|nested] 54+ messages in thread
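One cheap way to do that measurement, in the spirit of the benchmark-run figures quoted earlier in the thread, is to compare an absolute count from point-min with a relative count from a nearby known position, in the same buffer and at the same spot. The use of `window-start' as the "known" position here is just for illustration.

```
(require 'benchmark)
;; With point near the end of a large buffer with many overlays/markers:
;; absolute count, rescans from the beginning of the buffer on each call
(benchmark-run 1000 (line-number-at-pos (point)))
;; relative count, only scans from a nearby, already-displayed position
(benchmark-run 1000 (count-lines (window-start) (point)))
```

If the second figure stays small while the first one grows with the number of markers, that supports making line-number-at-pos (or its callers) count relatively.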
* Re: State of the overlay tree branch? 2018-03-23 9:08 ` Eli Zaretskii 2018-03-23 10:15 ` Sebastian Sturm @ 2018-03-23 12:12 ` Stefan Monnier 2018-03-23 12:40 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 12:12 UTC (permalink / raw) To: emacs-devel > Also, what about my suggestion to count lines in a relative manner, > using count-lines from a line with a known number? You never replied > to that suggestion. FWIW, I believe in his case most of the time is spent outside of the actual "count the lines" (aka forward-line) code, so counting fewer lines because we start from a more nearby position probably won't help very much. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:12 ` Stefan Monnier @ 2018-03-23 12:40 ` Eli Zaretskii 2018-03-23 12:55 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 12:40 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 23 Mar 2018 08:12:41 -0400 > > > Also, what about my suggestion to count lines in a relative manner, > > using count-lines from a line with a known number? You never replied > > to that suggestion. > > FWIW, I believe in his case most of the time is spent outside of the > actual "count the lines" (aka forward-line) code, so counting fewer > lines because we start from a more nearby position probably won't help > very much. I'm not sure, because the profile indicates memrchr is a significant runner-up in the CPU time usage. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:40 ` Eli Zaretskii @ 2018-03-23 12:55 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 12:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > I'm not sure, because the profile indicates memrchr is a significant > runner-up in the CPU time usage. Oh, indeed I wasn't clear: it won't solve the "too many markers" case, but it will still be useful, especially in the case where there aren't too many markers. My experience with nlinum--line-number-at-pos is that it's a low-hanging fruit. If the solution can be implemented directly inside line-number-at-pos it'd be even better. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
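A sketch of what such a cache could look like on the Lisp side, loosely modelled on the nlinum approach: the function name, the cache layout, and the use of `buffer-chars-modified-tick' for invalidation are all choices made for this example, not code from nlinum or simple.el.

```
;; Cache the last computed (position . line) pair per buffer and count
;; lines relative to it; drop the cache whenever the buffer text changes.
(defvar-local my--line-cache nil
  "List (TICK BOL LINE) from the last call, or nil.")

(defun my-line-number-at-pos (&optional pos)
  "Like `line-number-at-pos', but count relative to the previous call.
Assumes narrowing does not change between calls."
  (save-excursion
    (goto-char (or pos (point)))
    (forward-line 0)                    ; beginning of POS's line
    (let* ((bol (point))
           (tick (buffer-chars-modified-tick))
           (cache (and my--line-cache
                       (eql (nth 0 my--line-cache) tick)
                       my--line-cache))
           (line (cond
                  ;; No valid cache: count from the start, as the original does.
                  ((null cache) (1+ (count-lines (point-min) bol)))
                  ;; Otherwise count only the lines between the cached
                  ;; beginning-of-line position and the new one.
                  ((>= bol (nth 1 cache))
                   (+ (nth 2 cache) (count-lines (nth 1 cache) bol)))
                  (t (- (nth 2 cache) (count-lines bol (nth 1 cache)))))))
      (setq my--line-cache (list tick bol line))
      line)))
```

The invalidation here is deliberately crude (any text change discards the cache), but even this form avoids rescanning from point-min for the clustered, repeated calls described earlier in the thread.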
* Re: State of the overlay tree branch? 2018-03-18 23:03 ` Sebastian Sturm 2018-03-18 23:20 ` Sebastian Sturm @ 2018-03-19 6:36 ` Eli Zaretskii 1 sibling, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-19 6:36 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Mon, 19 Mar 2018 00:03:11 +0100 > > Again, however, line-number-at-pos shows up as a large CPU time consumer > in the profiler report, and benchmark-run still reports several ms per > invocation (though this time it's usually around 2 to 4 ms instead of > the 20 to 25 I measured earlier), so I'd still be very much interested > in a faster line-number-at-pos implementation. 2 to 4 ms for 6 calls is as fast as you can get for a 70K file. But you should be able to issue just one call, and replace the other 5 with relative counting using count-lines or forward-line, which should then count only a small number of lines from the original location. Right? ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-18 21:04 ` Sebastian Sturm 2018-03-18 23:03 ` Sebastian Sturm @ 2018-03-19 6:28 ` Eli Zaretskii 1 sibling, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-19 6:28 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Sun, 18 Mar 2018 22:04:13 +0100
>
> I also found it surprising that overlays would slow down line counting,
> but since I don't know anything about the architecture of the Emacs
> display engine, or its overlay implementation, I figured that overlays
> must be to blame because
>
> (i) the issue went away after switching to the feature/noverlay branch
>
> (ii) configuring the semantic highlighter to use its font-lock backend
> also resolved the performance issue (though with the font-lock backend,
> highlights are easily messed up by editing operations which makes the
> overlay variant far more appealing)

If you look at the implementation of count-lines, you will see it just calls forward-line, which pays no attention to overlays (as I'd expect).

> I also found that some other heavy users of overlays such as
> avy-goto-word-0-{above,below} feel faster with the feature/noverlay
> branch, so I'd welcome a merge of the overlay branch even if there was a
> technically superior alternative to line-number-at-pos that didn't
> suffer from overlay-related performance issues.

That's unrelated: we want to merge that branch when it's ready for several good reasons. But counting lines shouldn't be one of those reasons.

> That being said, your suggestion sounds intriguing. What would be
> required to expose find_newline to Lisp? Would I simply have to wrap it
> in one of Emacs's DEFINE_<something> macros?

DEFUN, more accurately. But yes. Actually, I see that forward-line already calls find_newline almost directly, so it should be fast enough. Exposing find_newline should perhaps be able to produce some speedup (because you don't need to set point, like forward-line does), but I doubt that it would be significant.

> Is there some documentation on the Emacs C backend?

There's the "Writing Emacs Primitives" section of the "Internals" appendix.

Turning back to your original problem, viz.:

>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.

It looks strange to me that you get such long times. I just tried Emacs 26.0.91 near the end of xdisp.c (a 1MB file with over 33K lines), and I get 3 ms per call there. So I wonder how come you get 4 or 5 ms per call in a file that is 50 times smaller, because if I run the above benchmark with point at 70K, I get 0.16 ms per call. In an unoptimized build of the current master branch, I get 0.25 ms per call under these conditions. (And you could cache the result if you need 6 calls in the same vicinity, and after the initial call only call forward-line to compute relative increments.) This is with a 5-year-old i7-2600 box with a 3.40 GHz clock. Is your CPU significantly slower?

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-18 20:39 ` Eli Zaretskii 2018-03-18 21:04 ` Sebastian Sturm @ 2018-03-21 14:14 ` Sebastien Chapuis 2018-03-21 15:35 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Sebastien Chapuis @ 2018-03-21 14:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Sebastian Sturm, emacs-devel

Eli Zaretskii writes:

>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Sun, 18 Mar 2018 21:14:53 +0100
>>
>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.
>
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
>
> (I'm not sure I even understand how overlays come into play here,
> btw.)

The language server protocol defines a position in a file with zero-indexed line and column offsets [1]:

```
interface Position {
    line: number;
    character: number;
}
```

lsp-mode heavily uses line-number-at-pos to convert an Emacs buffer point to an LSP position, and vice versa [2]. This can happen thousands of times on each keystroke. If Emacs could provide a function to do the conversion very fast (or at least faster than with line-number-at-pos), it would be great.

[1] https://github.com/Microsoft/language-server-protocol/blob/gh-pages/specification.md#position

[2] The conversion from an LSP position to point doesn't use line-number-at-pos, but forward-line and forward-char. It's still very slow.

-- Sebastien Chapuis

^ permalink raw reply [flat|nested] 54+ messages in thread
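To make the conversion concrete, here is an illustrative Emacs Lisp sketch of the two directions. The function names are invented for this example and are not lsp-mode's, it ignores narrowing, and it glosses over the fact that LSP's `character' is officially counted in UTF-16 code units rather than buffer characters.

```
;; Emacs point -> LSP position: zero-indexed line plus offset within the line.
(defun my-point-to-lsp-position (&optional pos)
  (save-excursion
    (goto-char (or pos (point)))
    (list :line (1- (line-number-at-pos))         ; LSP lines are 0-based
          :character (- (point) (line-beginning-position)))))

;; LSP position -> Emacs point: go to the line, then move within it.
(defun my-lsp-position-to-point (line character)
  (save-excursion
    (goto-char (point-min))
    (forward-line line)                           ; LINE is 0-based
    ;; Clamp to the end of the line in case CHARACTER overshoots.
    (min (+ (point) character) (line-end-position))))
```

As the thread notes, the expensive half is the point-to-position direction, because line-number-at-pos rescans from the beginning of the buffer on every call.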
* Re: State of the overlay tree branch? 2018-03-21 14:14 ` Sebastien Chapuis @ 2018-03-21 15:35 ` Eli Zaretskii 0 siblings, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-21 15:35 UTC (permalink / raw) To: Sebastien Chapuis; +Cc: s.sturm, emacs-devel

> From: Sebastien Chapuis <sebastien@chapu.is>
> Cc: Sebastian Sturm <s.sturm@arkona-technologies.de>, emacs-devel@gnu.org
> Date: Wed, 21 Mar 2018 15:14:18 +0100
>
> The language server protocol defines a position in a file with zero-indexed
> line and column offsets [1]:
>
> ```
> interface Position {
>     line: number;
>     character: number;
> }
> ```
>
> lsp-mode heavily uses line-number-at-pos to convert an Emacs buffer
> point to an LSP position, and vice versa [2].

line-number-at-pos is inefficient, in that it always counts from the beginning of the buffer. By keeping already computed line numbers around, you could make a faster implementation if you count relative line offsets using count-lines instead.

> This can happen thousands of times on each keystroke.

_Thousands_ of times for _each_ keystroke? Why is that? Most keystrokes only change a single line, so how come you need thousands of lines recounted each time?

> If Emacs could provide a function to do the conversion very fast (or at
> least faster than with line-number-at-pos), it would be great.

Given the above, I think you need to describe the issue in more detail, before we even begin designing the solution.

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-18 20:14 Sebastian Sturm 2018-03-18 20:39 ` Eli Zaretskii @ 2018-03-26 13:06 ` Stefan Monnier 2018-03-27 20:59 ` Sebastian Sturm 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-26 13:06 UTC (permalink / raw) To: emacs-devel > after finding that the feature/noverlay branch does wonders for my editing > experience[1], I'd like to reinvigorate the discussion on its inclusion into > master. While I also hope that branch can be merged soon, I've installed a patch into Emacs's `master` which should hopefully solve your immediate problems. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-26 13:06 ` Stefan Monnier @ 2018-03-27 20:59 ` Sebastian Sturm 0 siblings, 0 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-27 20:59 UTC (permalink / raw) To: emacs-devel thank you, the master branch now handles my use case a lot better than before On 03/26/2018 03:06 PM, Stefan Monnier wrote: >> after finding that the feature/noverlay branch does wonders for my editing >> experience[1], I'd like to reinvigorate the discussion on its inclusion into >> master. > > While I also hope that branch can be merged soon, I've installed a patch > into Emacs's `master` which should hopefully solve your > immediate problems. > > > Stefan > > ^ permalink raw reply [flat|nested] 54+ messages in thread
end of thread, other threads:[~2018-03-27 20:59 UTC | newest] Thread overview: 54+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <<c24f8534-5245-026e-da18-f6be7b9702bf@arkona-technologies.de> [not found] ` <<834lldp18f.fsf@gnu.org> 2018-03-18 21:37 ` State of the overlay tree branch? Drew Adams 2018-03-19 1:33 ` Stefan Monnier 2018-03-19 6:50 ` Eli Zaretskii 2018-03-19 12:29 ` Stefan Monnier 2018-03-19 13:02 ` Eli Zaretskii 2018-03-19 13:43 ` Stefan Monnier 2018-03-19 14:28 ` Eli Zaretskii 2018-03-19 14:39 ` Stefan Monnier 2018-03-19 6:33 ` Eli Zaretskii 2018-03-18 20:14 Sebastian Sturm 2018-03-18 20:39 ` Eli Zaretskii 2018-03-18 21:04 ` Sebastian Sturm 2018-03-18 23:03 ` Sebastian Sturm 2018-03-18 23:20 ` Sebastian Sturm 2018-03-19 6:43 ` Eli Zaretskii 2018-03-19 9:53 ` Sebastian Sturm 2018-03-19 12:57 ` Eli Zaretskii 2018-03-19 14:56 ` Stefan Monnier 2018-03-19 15:07 ` Sebastian Sturm 2018-03-19 15:13 ` Stefan Monnier 2018-03-20 1:23 ` Sebastian Sturm 2018-03-20 6:30 ` Eli Zaretskii 2018-03-21 0:36 ` Sebastian Sturm 2018-03-21 6:47 ` Eli Zaretskii 2018-03-22 13:16 ` Stefan Monnier 2018-03-22 19:54 ` Sebastian Sturm 2018-03-22 20:04 ` Sebastian Sturm 2018-03-22 20:52 ` Stefan Monnier 2018-03-22 23:11 ` Sebastian Sturm 2018-03-23 5:03 ` Stefan Monnier 2018-03-23 12:25 ` Sebastian Sturm 2018-03-23 12:47 ` Eli Zaretskii 2018-03-23 13:19 ` Stefan Monnier 2018-03-23 13:37 ` Noam Postavsky 2018-03-23 13:55 ` Stefan Monnier 2018-03-23 14:22 ` Eli Zaretskii 2018-03-23 14:39 ` Stefan Monnier 2018-03-23 19:39 ` Stefan Monnier 2018-03-25 15:11 ` Stefan Monnier 2018-03-25 16:39 ` Eli Zaretskii 2018-03-25 17:35 ` Stefan Monnier 2018-03-23 8:07 ` Eli Zaretskii 2018-03-23 9:08 ` Eli Zaretskii 2018-03-23 10:15 ` Sebastian Sturm 2018-03-23 12:39 ` Eli Zaretskii 2018-03-23 12:12 ` Stefan Monnier 2018-03-23 12:40 ` Eli Zaretskii 2018-03-23 12:55 ` Stefan Monnier 2018-03-19 6:36 ` Eli Zaretskii 2018-03-19 6:28 ` Eli Zaretskii 2018-03-21 14:14 ` Sebastien Chapuis 2018-03-21 15:35 ` Eli Zaretskii 2018-03-26 13:06 ` Stefan Monnier 2018-03-27 20:59 ` Sebastian Sturm