* State of the overlay tree branch?
@ 2018-03-18 20:14 Sebastian Sturm
2018-03-18 20:39 ` Eli Zaretskii
2018-03-26 13:06 ` Stefan Monnier
0 siblings, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-18 20:14 UTC (permalink / raw)
To: emacs-devel
Hi,
after finding that the feature/noverlay branch does wonders for my
editing experience[1], I'd like to reinvigorate the discussion on its
inclusion into master. Are there plans for a merge with the emacs-27
master branch? Any critical issues blocking such a merge?
If a recent Emacs 27 variant with the overlay branch feature was
available, I'd be happy to evaluate that in my daily work.
best regards,
Sebastian Sturm
[1] I'm using cquery for my C++ editing needs, which comes with an
overlay-based semantic highlighting mechanism. With my emacs
configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
character insertion, which consume ~20 to 25 ms each when performing
edits close to the bottom of a 66KB C++ file (measured using
(benchmark-run 1000 (line-number-at-pos (point))) on a release build of
emacs-27/git commit #9942734...). Using the noverlay branch, this figure
drops to ~160us per call.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 20:14 Sebastian Sturm
@ 2018-03-18 20:39 ` Eli Zaretskii
2018-03-18 21:04 ` Sebastian Sturm
2018-03-21 14:14 ` Sebastien Chapuis
2018-03-26 13:06 ` Stefan Monnier
1 sibling, 2 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-18 20:39 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Sun, 18 Mar 2018 21:14:53 +0100
>
> [1] I'm using cquery for my C++ editing needs, which comes with an
> overlay-based semantic highlighting mechanism. With my emacs
> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
> character insertion, which consume ~20 to 25 ms each when performing
> edits close to the bottom of a 66KB C++ file (measured using
> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
> drops to ~160us per call.
If lsp-mode/lsp-ui needs a fast line counter, one can easily be
provided by exposing find_newline to Lisp. IME, it's lightning-fast,
and should run circles around count-lines (used by line-number-at-pos).
(I'm not sure I even understand how overlays come into play here,
btw.)
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 20:39 ` Eli Zaretskii
@ 2018-03-18 21:04 ` Sebastian Sturm
2018-03-18 23:03 ` Sebastian Sturm
2018-03-19 6:28 ` Eli Zaretskii
2018-03-21 14:14 ` Sebastien Chapuis
1 sibling, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-18 21:04 UTC (permalink / raw)
To: emacs-devel
I also found it surprising that overlays would slow down line counting,
but since I don't know anything about the architecture of the Emacs
display engine, or its overlay implementation, I figured that overlays
must be to blame because
(i) the issue went away after switching to the feature/noverlay branch
(ii) configuring the semantic highlighter to use its font-lock backend
also resolved the performance issue (though with the font-lock backend,
highlights are easily messed up by editing operations which makes the
overlay variant far more appealing)
I also found that some other heavy users of overlays such as
avy-goto-word-0-{above,below} feel faster with the feature/noverlay
branch, so I'd welcome a merge of the overlay branch even if there was a
technically superior alternative to line-number-at-pos that didn't
suffer from overlay-related performance issues.
That being said, your suggestion sounds intriguing. What would be
required to expose find_newline to Lisp? Would I simply have to wrap it
in one of Emacs's DEFINE_<something> macros? Is there some documentation
on the Emacs C backend?
On 03/18/2018 09:39 PM, Eli Zaretskii wrote:
>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Sun, 18 Mar 2018 21:14:53 +0100
>>
>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.
>
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
>
> (I'm not sure I even understand how overlays come into play here,
> btw.)
^ permalink raw reply [flat|nested] 54+ messages in thread
* RE: State of the overlay tree branch?
[not found] ` <<834lldp18f.fsf@gnu.org>
@ 2018-03-18 21:37 ` Drew Adams
2018-03-19 1:33 ` Stefan Monnier
2018-03-19 6:33 ` Eli Zaretskii
0 siblings, 2 replies; 54+ messages in thread
From: Drew Adams @ 2018-03-18 21:37 UTC (permalink / raw)
To: Eli Zaretskii, Sebastian Sturm; +Cc: emacs-devel
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
Having a fast line counter in Elisp would be terrific.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 21:04 ` Sebastian Sturm
@ 2018-03-18 23:03 ` Sebastian Sturm
2018-03-18 23:20 ` Sebastian Sturm
2018-03-19 6:36 ` Eli Zaretskii
2018-03-19 6:28 ` Eli Zaretskii
1 sibling, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-18 23:03 UTC (permalink / raw)
To: emacs-devel
concerning the performance improvement with noverlay, it seems I spoke
to soon. I've now had the issue reappear, both with the noverlay branch,
and with the semantic highlighter set to use font-lock. Sorry for the
misinformation.
Again, however, line-number-at-pos shows up as a large CPU time consumer
in the profiler report, and benchmark-run still reports several ms per
invocation (though this time it's usually around 2 to 4 ms instead of
the 20 to 25 I measured earlier), so I'd still be very much interested
in a faster line-number-at-pos implementation.
On 03/18/2018 10:04 PM, Sebastian Sturm wrote:
> I also found it surprising that overlays would slow down line counting,
> but since I don't know anything about the architecture of the Emacs
> display engine, or its overlay implementation, I figured that overlays
> must be to blame because
>
> (i) the issue went away after switching to the feature/noverlay branch
>
> (ii) configuring the semantic highlighter to use its font-lock backend
> also resolved the performance issue (though with the font-lock backend,
> highlights are easily messed up by editing operations which makes the
> overlay variant far more appealing)
>
> I also found that some other heavy users of overlays such as
> avy-goto-word-0-{above,below} feel faster with the feature/noverlay
> branch, so I'd welcome a merge of the overlay branch even if there was a
> technically superior alternative to line-number-at-pos that didn't
> suffer from overlay-related performance issues.
>
> That being said, your suggestion sounds intriguing. What would be
> required to expose find_newline to Lisp? Would I simply have to wrap it
> in one of Emacs's DEFINE_<something> macros? Is there some documentation
> on the Emacs C backend?
>
> On 03/18/2018 09:39 PM, Eli Zaretskii wrote:
> >> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> >> Date: Sun, 18 Mar 2018 21:14:53 +0100
> >>
> >> [1] I'm using cquery for my C++ editing needs, which comes with an
> >> overlay-based semantic highlighting mechanism. With my emacs
> >> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
> >> character insertion, which consume ~20 to 25 ms each when performing
> >> edits close to the bottom of a 66KB C++ file (measured using
> >> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
> >> emacs-27/git commit #9942734...). Using the noverlay branch, this
> figure
> >> drops to ~160us per call.
> >
> > If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> > provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> > and should run circles around count-lines (used by line-number-at-pos).
> >
> > (I'm not sure I even understand how overlays come into play here,
> > btw.)
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 23:03 ` Sebastian Sturm
@ 2018-03-18 23:20 ` Sebastian Sturm
2018-03-19 6:43 ` Eli Zaretskii
2018-03-19 6:36 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-18 23:20 UTC (permalink / raw)
To: emacs-devel
for the record, I just switched back to emacs master (no noverlay) and
the time reported by (benchmark-run 1000 (line-number-at-pos (point))
increased by a factor of ~40, to 75-80s. At this level, editing is
unbearably slow. With the semantic highlighter disabled, the same
measurement yields ~2.5s (still painfully slow, but borderline usable),
so about the same time reported by the noverlay branch.
To me, this suggests that noverlay indeed improves performance, though
not necessarily to the level I had previously claimed. find_newline may
solve this particular issue completely.
Since the time taken by line-number-at-pos seems to fluctuate wildly for
(to me) unknown reasons, I'll try and see if I can set up a systematic
way to collect reliable data.
On 03/19/2018 12:03 AM, Sebastian Sturm wrote:
> concerning the performance improvement with noverlay, it seems I spoke
> to soon. I've now had the issue reappear, both with the noverlay branch,
> and with the semantic highlighter set to use font-lock. Sorry for the
> misinformation.
> Again, however, line-number-at-pos shows up as a large CPU time consumer
> in the profiler report, and benchmark-run still reports several ms per
> invocation (though this time it's usually around 2 to 4 ms instead of
> the 20 to 25 I measured earlier), so I'd still be very much interested
> in a faster line-number-at-pos implementation.
>
> On 03/18/2018 10:04 PM, Sebastian Sturm wrote:
>> I also found it surprising that overlays would slow down line
>> counting, but since I don't know anything about the architecture of
>> the Emacs display engine, or its overlay implementation, I figured
>> that overlays must be to blame because
>>
>> (i) the issue went away after switching to the feature/noverlay branch
>>
>> (ii) configuring the semantic highlighter to use its font-lock backend
>> also resolved the performance issue (though with the font-lock
>> backend, highlights are easily messed up by editing operations which
>> makes the overlay variant far more appealing)
>>
>> I also found that some other heavy users of overlays such as
>> avy-goto-word-0-{above,below} feel faster with the feature/noverlay
>> branch, so I'd welcome a merge of the overlay branch even if there was
>> a technically superior alternative to line-number-at-pos that didn't
>> suffer from overlay-related performance issues.
>>
>> That being said, your suggestion sounds intriguing. What would be
>> required to expose find_newline to Lisp? Would I simply have to wrap
>> it in one of Emacs's DEFINE_<something> macros? Is there some
>> documentation on the Emacs C backend?
>>
>> On 03/18/2018 09:39 PM, Eli Zaretskii wrote:
>> >> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> >> Date: Sun, 18 Mar 2018 21:14:53 +0100
>> >>
>> >> [1] I'm using cquery for my C++ editing needs, which comes with an
>> >> overlay-based semantic highlighting mechanism. With my emacs
>> >> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> >> character insertion, which consume ~20 to 25 ms each when performing
>> >> edits close to the bottom of a 66KB C++ file (measured using
>> >> (benchmark-run 1000 (line-number-at-pos (point))) on a release
>> build of
>> >> emacs-27/git commit #9942734...). Using the noverlay branch, this
>> figure
>> >> drops to ~160us per call.
>> >
>> > If lsp-mode/lsp-ui needs a fast line counter, one can easily be
>> > provided by exposing find_newline to Lisp. IME, it's lightning-fast,
>> > and should run circles around count-lines (used by
>> line-number-at-pos).
>> >
>> > (I'm not sure I even understand how overlays come into play here,
>> > btw.)
>>
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 21:37 ` State of the overlay tree branch? Drew Adams
@ 2018-03-19 1:33 ` Stefan Monnier
2018-03-19 6:50 ` Eli Zaretskii
2018-03-19 6:33 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-19 1:33 UTC (permalink / raw)
To: emacs-devel
>> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
>> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
>> and should run circles around count-lines (used by line-number-at-pos).
> Having a fast line counter in Elisp would be terrific.
It should be pretty easy to provide such a thing by relying on a cache
of the last call. Tho Sebastian's experience seems to indicate that the
current code doesn't only suffer from the time to count LF but also from
the time to process the markers.
I seem to remember someone else experiencing a similar problem and
suggesting that the problem lies in the charpos_to_bytepos (and/or
bytepos_to_charpos) conversion function, which iterates through all the
markers to try and find a "nearby" marker (because markers keep track
of both their bytepos and their charpos). Looking for a nearby marker
to avoid scanning the whole buffer is a good idea in many cases, but not
if scanning the list of markers takes more time than scanning the
whole buffer.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 21:04 ` Sebastian Sturm
2018-03-18 23:03 ` Sebastian Sturm
@ 2018-03-19 6:28 ` Eli Zaretskii
1 sibling, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 6:28 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Sun, 18 Mar 2018 22:04:13 +0100
>
> I also found it surprising that overlays would slow down line counting,
> but since I don't know anything about the architecture of the Emacs
> display engine, or its overlay implementation, I figured that overlays
> must be to blame because
>
> (i) the issue went away after switching to the feature/noverlay branch
>
> (ii) configuring the semantic highlighter to use its font-lock backend
> also resolved the performance issue (though with the font-lock backend,
> highlights are easily messed up by editing operations which makes the
> overlay variant far more appealing)
If you look at the implementation of count-lines, you will see it just
calls forward-line, which pays no attention to overlays (as I'd
expect).
> I also found that some other heavy users of overlays such as
> avy-goto-word-0-{above,below} feel faster with the feature/noverlay
> branch, so I'd welcome a merge of the overlay branch even if there was a
> technically superior alternative to line-number-at-pos that didn't
> suffer from overlay-related performance issues.
That's unrelated: we want to merge that branch when it's ready for
several good reasons. But counting lines shouldn't be one of those
reasons.
> That being said, your suggestion sounds intriguing. What would be
> required to expose find_newline to Lisp? Would I simply have to wrap it
> in one of Emacs's DEFINE_<something> macros?
DEFUN, more accurately. But yes.
Actually, I see that forward-line already calls find_newline almost
directly, so it should be fast enough. Exposing find_line should
perhaps be able to produce some speedup (because you don't need to set
point, like forward-line does), but I doubt that it would be
significant.
> Is there some documentation on the Emacs C backend?
There's the "Writing Emacs Primitives" section of the "Internals"
appendix.
Turning back to your original problem, viz.:
>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.
it looks strange to me that you get such long times. I just tried
Emacs 26.0.91 near the end of xdisp.c (a 1MB file with over 33K
lines), and I get 3 ms per call there. So I wonder how come you get 4
or 5 ms per call in a file that is 50 times smaller, because if I run
the above benchmark with point at 70K, I get 0.16 ms per call. In an
unoptimized build of the current master branch, I get 0.25 ms per call
under these conditions. (And you could cache the result if you need 6
calls in the same vicinity, and after the initial call only call
forward-line to compute relative increments.)
This is with a 5-year old i7-2600 box with a 3.40 GHz clock. Is your
CPU significantly slower?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 21:37 ` State of the overlay tree branch? Drew Adams
2018-03-19 1:33 ` Stefan Monnier
@ 2018-03-19 6:33 ` Eli Zaretskii
1 sibling, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 6:33 UTC (permalink / raw)
To: Drew Adams; +Cc: s.sturm, emacs-devel
> Date: Sun, 18 Mar 2018 14:37:39 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> > If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> > provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> > and should run circles around count-lines (used by line-number-at-pos).
>
> Having a fast line counter in Elisp would be terrific.
Actually, I see that line-number-at-pos, count-lines, and forward-line
should already be fast, as they do exactly what I thought, something I
failed to see originally. If that's not "fast enough", a test case
showing the problem would be a good starting point.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 23:03 ` Sebastian Sturm
2018-03-18 23:20 ` Sebastian Sturm
@ 2018-03-19 6:36 ` Eli Zaretskii
1 sibling, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 6:36 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 00:03:11 +0100
>
> Again, however, line-number-at-pos shows up as a large CPU time consumer
> in the profiler report, and benchmark-run still reports several ms per
> invocation (though this time it's usually around 2 to 4 ms instead of
> the 20 to 25 I measured earlier), so I'd still be very much interested
> in a faster line-number-at-pos implementation.
2 to 4 ms for 6 calls is as fast as you can get for a 70K file. But
you should be able to issue just one call, and replace the other 5
with relative counting using count-lines or forward-line, which should
then count only a small number of lines from the original location.
Right?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 23:20 ` Sebastian Sturm
@ 2018-03-19 6:43 ` Eli Zaretskii
2018-03-19 9:53 ` Sebastian Sturm
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 6:43 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 00:20:13 +0100
>
> for the record, I just switched back to emacs master (no noverlay) and
> the time reported by (benchmark-run 1000 (line-number-at-pos (point))
> increased by a factor of ~40, to 75-80s. At this level, editing is
> unbearably slow. With the semantic highlighter disabled, the same
> measurement yields ~2.5s (still painfully slow, but borderline usable),
> so about the same time reported by the noverlay branch.
You will have to explain why overlays and the semantic highlighter
affect line-counting. How about presenting a profile produced by
"M-x profiler-report"?
And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
1000 times), right? If so, I cannot understand why you say that's
borderline usable, because IME such short times are imperceptible by
humans. I guess some other factor is at work here, so I'd suggest to
describe more details about your use case.
> Since the time taken by line-number-at-pos seems to fluctuate wildly for
> (to me) unknown reasons, I'll try and see if I can set up a systematic
> way to collect reliable data.
Yes, please do. I'm guessing there's some factor here that is
important to consider.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 1:33 ` Stefan Monnier
@ 2018-03-19 6:50 ` Eli Zaretskii
2018-03-19 12:29 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 6:50 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 18 Mar 2018 21:33:05 -0400
>
> >> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> >> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> >> and should run circles around count-lines (used by line-number-at-pos).
> > Having a fast line counter in Elisp would be terrific.
>
> It should be pretty easy to provide such a thing by relying on a cache
> of the last call.
This is already coded, see display_count_lines. That's what the
native line-number display uses. Exposing it to Lisp should be easy.
But I don't believe it could be orders of magnitude faster than
count-lines, even though it doesn't need to convert character position
to byte position.
I'm guessing something entirely different and unrelated to
line-counting per se is at work here.
> Tho Sebastian's experience seems to indicate that the
> current code doesn't only suffer from the time to count LF but also from
> the time to process the markers.
Not sure what marker processing did you have in mind. Can you
elaborate?
> I seem to remember someone else experiencing a similar problem and
> suggesting that the problem lies in the charpos_to_bytepos (and/or
> bytepos_to_charpos) conversion function, which iterates through all the
> markers to try and find a "nearby" marker (because markers keep track
> of both their bytepos and their charpos). Looking for a nearby marker
> to avoid scanning the whole buffer is a good idea in many cases, but not
> if scanning the list of markers takes more time than scanning the
> whole buffer.
But find_newline doesn't look for markers, and it converts character
to byte position just 2 times. Or am I missing something?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 6:43 ` Eli Zaretskii
@ 2018-03-19 9:53 ` Sebastian Sturm
2018-03-19 12:57 ` Eli Zaretskii
2018-03-19 14:56 ` Stefan Monnier
0 siblings, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-19 9:53 UTC (permalink / raw)
To: emacs-devel
On 03/19/2018 07:43 AM, Eli Zaretskii wrote:
>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Mon, 19 Mar 2018 00:20:13 +0100
>>
>> for the record, I just switched back to emacs master (no noverlay) and
>> the time reported by (benchmark-run 1000 (line-number-at-pos (point))
>> increased by a factor of ~40, to 75-80s. At this level, editing is
>> unbearably slow. With the semantic highlighter disabled, the same
>> measurement yields ~2.5s (still painfully slow, but borderline usable),
>> so about the same time reported by the noverlay branch.
>
> You will have to explain why overlays and the semantic highlighter
> affect line-counting. How about presenting a profile produced by
> "M-x profiler-report"?
please find below a profiler report taken this morning (on my PC at
work, which doesn't suffer from the performance issue as much as my 2014
MacBook Pro, but even here the issue is clearly noticeable)
>
> And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
> 1000 times), right? If so, I cannot understand why you say that's
> borderline usable, because IME such short times are imperceptible by
> humans. I guess some other factor is at work here, so I'd suggest to
> describe more details about your use case.
well no, it's about 2.5ms per call to line-number-at-pos, which is
called at least 6 times per character insertion (with my Emacs config,
at least). Which already makes for 15ms per character insertion,
excluding anything else done by cc-mode or lsp-mode.
>> Since the time taken by line-number-at-pos seems to fluctuate wildly for
>> (to me) unknown reasons, I'll try and see if I can set up a systematic
>> way to collect reliable data.
>
> Yes, please do. I'm guessing there's some factor here that is
> important to consider.
I wrote a simple-minded measurement routine that can at least report the
time taken between successive character insertions, and I noticed that
better alternatives have been brought up in response to my other mailing
list thread (titled "Latency profiling?"). I haven't had time to prepare
a systematic test case yet, but I will look into it when I get home from
work this evening.
- command-execute 5676 73%
- call-interactively 5676 73%
- funcall-interactively 5676 73%
- self-insert-command 5080 66%
- lsp-on-change 2144 27%
- lsp--text-document-content-change-event 1964 25%
- lsp--point-to-position 1964 25%
line-number-at-pos 1964 25%
+ json-encode 164 2%
+ file-truename 16 0%
- c-after-change 1784 23%
- mapc 1772 23%
+ #<compiled 0x1a24bed> 1772 23%
- c-invalidate-sws-region-after 12 0%
- c-invalidate-sws-region-after-ins 12 0%
- c-beginning-of-macro 8 0%
back-to-indentation 4 0%
- c-literal-limits 4 0%
c-state-full-pp-to-literal 4 0%
- lsp-before-change 1016 13%
- lsp--point-to-position 1016 13%
line-number-at-pos 1016 13%
- sp--post-self-insert-hook-handler 56 0%
- sp-insert-pair 28 0%
- sp--pair-to-insert 24 0%
- sp--all-pairs-to-insert 20 0%
- sp--looking-back-p 8 0%
sp--looking-back 8 0%
+ sp--do-action-p 8 0%
+ sp--get-closing-regexp 4 0%
+ sp--all-pairs-to-insert 16 0%
+ sp-escape-open-delimiter 12 0%
+ c-before-change 28 0%
- jit-lock-after-change 4 0%
+ run-hook-with-args 4 0%
+ counsel-M-x 360 4%
+ evil-open-below 232 3%
- lsp-ui-doc--make-request 643 8%
line-number-at-pos 627 8%
+ file-truename 8 0%
url-hexify-string 4 0%
- lsp-ui-sideline 528 6%
line-number-at-pos 528 6%
+ timer-event-handler 327 4%
+ evil-escape-pre-command-hook 220 2%
+ #<compiled 0xb54291> 141 1%
+ redisplay_internal (C function) 88 1%
+ ... 29 0%
+ yas--post-command-handler 20 0%
+ global-spacemacs-whitespace-cleanup-mode-check-buffers 4 0%
+ internal-timer-start-idle 4 0%
+ flycheck-display-error-at-point-soon 4 0%
flycheck-error-list-update-source 4 0%
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 6:50 ` Eli Zaretskii
@ 2018-03-19 12:29 ` Stefan Monnier
2018-03-19 13:02 ` Eli Zaretskii
0 siblings, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-19 12:29 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
>> It should be pretty easy to provide such a thing by relying on a cache
>> of the last call.
> This is already coded, see display_count_lines.
I don't see any cache in display_count_lines but yes, the code that uses
display_count_lines does do such caching and we could/should expose it
to Lisp.
In nlinum.el I also have something similar (called
nlinum--line-number-at-pos).
> But I don't believe it could be orders of magnitude faster than
> count-lines, even though it doesn't need to convert character position
> to byte position.
Scanning from the last used position can be *very* different from
scanning from point-min. So yes, it can be orders of magnitude faster
(I wrote nlinum--line-number-at-pos for that reason: I sadly didn't
write down the test case I used back then, but the difference was
very significant).
> I'm guessing something entirely different and unrelated to
> line-counting per se is at work here.
Agreed.
>> Tho Sebastian's experience seems to indicate that the
>> current code doesn't only suffer from the time to count LF but also from
>> the time to process the markers.
> Not sure what marker processing did you have in mind. Can you
> elaborate?
The
for (tail = BUF_MARKERS (b); tail; tail = tail->next)
loop in buf_charpos_to_bytepos and buf_bytepos_to_charpos.
> But find_newline doesn't look for markers, and it converts character
> to byte position just 2 times. Or am I missing something?
The idea is that the above loop (even if called only twice) might be
sufficient to make line-number-at-pos take 0.2s.
I don't know that it's the culprit, I'm just mentioning that
possibility, since noverlays removes all the overlay-induced markers
which would significantly reduce the number of markers over which the
above loops.
Note that those loops stop as soon as we're within 50 chars of the goal,
and they also stop as soon as there's no non-ascii char between the
"best bounds so far".
So for them to cause the slow down seen here, we'd need not only
a very large number of markers but also additional conditions that might
not be very likely.
But it's still a possibility.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 9:53 ` Sebastian Sturm
@ 2018-03-19 12:57 ` Eli Zaretskii
2018-03-19 14:56 ` Stefan Monnier
1 sibling, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 12:57 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 10:53:52 +0100
>
> On 03/19/2018 07:43 AM, Eli Zaretskii wrote:
> >> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> >> Date: Mon, 19 Mar 2018 00:20:13 +0100
> >>
> >> for the record, I just switched back to emacs master (no noverlay) and
> >> the time reported by (benchmark-run 1000 (line-number-at-pos (point))
> >> increased by a factor of ~40, to 75-80s. At this level, editing is
> >> unbearably slow. With the semantic highlighter disabled, the same
> >> measurement yields ~2.5s (still painfully slow, but borderline usable),
> >> so about the same time reported by the noverlay branch.
> >
> > You will have to explain why overlays and the semantic highlighter
> > affect line-counting. How about presenting a profile produced by
> > "M-x profiler-report"?
>
> please find below a profiler report taken this morning (on my PC at
> work, which doesn't suffer from the performance issue as much as my 2014
> MacBook Pro, but even here the issue is clearly noticeable)
That profile says that self-insert-command takes a large percentage of
the time. So I think we should look into the reasons for such a
strange place to spend hundreds of microseconds. According to the
profile, line-number-at-pos takes about the same percentage of time as
self-insert-command does. And that is even before you optimize the
successive calls to line-counting code to take advantage of the
previously computed value for some close line.
> > And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
> > 1000 times), right? If so, I cannot understand why you say that's
> > borderline usable, because IME such short times are imperceptible by
> > humans. I guess some other factor is at work here, so I'd suggest to
> > describe more details about your use case.
>
> well no, it's about 2.5ms per call to line-number-at-pos, which is
> called at least 6 times per character insertion (with my Emacs config,
> at least). Which already makes for 15ms per character insertion,
> excluding anything else done by cc-mode or lsp-mode.
Then, as I said, I don't understand why it takes so much on your
system. I get times that are 10 times faster.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 12:29 ` Stefan Monnier
@ 2018-03-19 13:02 ` Eli Zaretskii
2018-03-19 13:43 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 13:02 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Mar 2018 08:29:40 -0400
>
> > But I don't believe [display_count_lines] could be orders of
> > magnitude faster than count-lines, even though it doesn't need to
> > convert character position to byte position.
>
> Scanning from the last used position can be *very* different from
> scanning from point-min. So yes, it can be orders of magnitude faster
Well, in my measurements it's already very fast. I don't understand
why the OP sees times that are 10 times slower.
> The
>
> for (tail = BUF_MARKERS (b); tail; tail = tail->next)
>
> loop in buf_charpos_to_bytepos and buf_bytepos_to_charpos.
>
> > But find_newline doesn't look for markers, and it converts character
> > to byte position just 2 times. Or am I missing something?
>
> The idea is that the above loop (even if called only twice) might be
> sufficient to make line-number-at-pos take 0.2s.
I very much doubt that loop could take such a long time. And running
a benchmark 1000 times means that the 2nd through 1000th iteration
find the mapping much faster, probably bypassing the loop entirely.
> So for them to cause the slow down seen here, we'd need not only
> a very large number of markers but also additional conditions that might
> not be very likely.
> But it's still a possibility.
I'll believe it when I see it.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 13:02 ` Eli Zaretskii
@ 2018-03-19 13:43 ` Stefan Monnier
2018-03-19 14:28 ` Eli Zaretskii
0 siblings, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-19 13:43 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
>> The idea is that the above loop (even if called only twice) might be
>> sufficient to make line-number-at-pos take 0.2s.
> I very much doubt that loop could take such a long time.
I also find it unlikely.
> And running a benchmark 1000 times means that the 2nd through 1000th
> iteration find the mapping much faster, probably bypassing the
> loop entirely.
Quite likely.
BTW, when thinking about how to avoid this loop's worst case (regardless
if it's the source of the current problem), I was thinking that we could
probably make that worst case even less likely by replacing the
hardcoded "50" with a limit that grows as the loop progresses.
The patch below should make sure that the total number of iterations
(through markers + through chars) is no more than 2*buffer-size, no matter
how many markers there are.
[ Rather than incrementing by 1 we should ideally increment by
a number that corresponds to how much slower is each iteration through
markers compared to the iteration through chars, but I have no idea
what that number would typically be. ]
WDYT?
Stefan
diff --git a/src/marker.c b/src/marker.c
index f61701f2f6..bfe6de486e 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -141,6 +141,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = 50;
eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
@@ -180,8 +181,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - best_below < distance)
break;
+ else
+ distance++;
}
/* We get here if we did not exactly hit one of the known places.
@@ -293,6 +296,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = 50;
eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
@@ -323,8 +327,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - best_below < distance)
break;
+ else
+ distance++;
}
/* We get here if we did not exactly hit one of the known places.
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 13:43 ` Stefan Monnier
@ 2018-03-19 14:28 ` Eli Zaretskii
2018-03-19 14:39 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-19 14:28 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Mar 2018 09:43:12 -0400
>
> BTW, when thinking about how to avoid this loop's worst case (regardless
> if it's the source of the current problem), I was thinking that we could
> probably make that worst case even less likely by replacing the
> hardcoded "50" with a limit that grows as the loop progresses.
>
> The patch below should make sure that the total number of iterations
> (through markers + through chars) is no more than 2*buffer-size, no matter
> how many markers there are.
> [ Rather than incrementing by 1 we should ideally increment by
> a number that corresponds to how much slower is each iteration through
> markers compared to the iteration through chars, but I have no idea
> what that number would typically be. ]
>
> WDYT?
Could be a good idea, but I suggest to time its improvement before we
decide. I've seen a few surprises in that area.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 14:28 ` Eli Zaretskii
@ 2018-03-19 14:39 ` Stefan Monnier
0 siblings, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-19 14:39 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> Could be a good idea, but I suggest to time its improvement before we
> decide. I've seen a few surprises in that area.
I see some speed up in my artificial test (where I create an insane
number of markers at point-min and then ask to goto-char to the middle
of a midsized buffer) but I can't see any impact whatsoever (neither
positive nor negative) in real tests (unsurprisingly).
I'd rather first have a non-artificial case where I can measure
a speed up.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 9:53 ` Sebastian Sturm
2018-03-19 12:57 ` Eli Zaretskii
@ 2018-03-19 14:56 ` Stefan Monnier
2018-03-19 15:07 ` Sebastian Sturm
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-19 14:56 UTC (permalink / raw)
To: emacs-devel
> well no, it's about 2.5ms per call to line-number-at-pos, which is called at
> least 6 times per character insertion (with my Emacs config, at
> least). Which already makes for 15ms per character insertion, excluding
> anything else done by cc-mode or lsp-mode.
Since you say that the noverlay branch helps, could you check the number
of overlays involved? E.g.
M-: (length (overlays-in (point-min) (point-max))) RET
If there are more overlays than chars in this buffer, maybe there's
a problem in some Elisp that creates too many overlays?
If there aren't that many overlays, then I wonder why the noverlays
branch would make such a significant difference.
Also, if you can reliably reproduce the "slow editing", would it be
possible to make a recipe for it that we can try and reproduce on
our side?
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 14:56 ` Stefan Monnier
@ 2018-03-19 15:07 ` Sebastian Sturm
2018-03-19 15:13 ` Stefan Monnier
2018-03-20 1:23 ` Sebastian Sturm
0 siblings, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-19 15:07 UTC (permalink / raw)
To: emacs-devel
for the file I was complaining about, the number returned is 3080
(doesn't exceed the number of chars though, (point-max) = 67641). Will
try to obtain usable timing data later today when I get home from work.
Thanks!
On 03/19/2018 03:56 PM, Stefan Monnier wrote:
>> well no, it's about 2.5ms per call to line-number-at-pos, which is called at
>> least 6 times per character insertion (with my Emacs config, at
>> least). Which already makes for 15ms per character insertion, excluding
>> anything else done by cc-mode or lsp-mode.
>
> Since you say that the noverlay branch helps, could you check the number
> of overlays involved? E.g.
>
> M-: (length (overlays-in (point-min) (point-max))) RET
>
> If there are more overlays than chars in this buffer, maybe there's
> a problem in some Elisp that creates too many overlays?
>
> If there aren't that many overlays, then I wonder why the noverlays
> branch would make such a significant difference.
>
> Also, if you can reliably reproduce the "slow editing", would it be
> possible to make a recipe for it that we can try and reproduce on
> our side?
>
>
> Stefan
>
>
--
Sebastian Sturm
Research & Development
Phone: +49 (0) 6155 7808977
Fax: +49 (0) 6155 7802880
Email: s.sturm@arkona-technologies.de
Web: www.arkona-technologies.de
arkona technologies GmbH
Im Leuschnerpark 4
64347 Griesheim
Germany
Amtsgericht / Commercial Register of Darmstadt, HRB 90080
USt-ID: DE273794666
Steuernummer / Tax-ID: 007 / 228 / 19331
Geschäftsführung / Managing Director: Rainer Sturm
This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient (or have received this e-mail in
error) please notify the sender immediately and delete this message and
any attachments from your system. Any unauthorised copying, disclosure
or distribution of the material in this e-mail is strictly forbidden.
We cannot accept any responsibility for the accuracy or completeness of
this message as it has been transmitted over a public network. If,
despite our use of anti-virus software, a virus enters your systems in
connection with the sending of the e-mail, you may not hold us liable
for any damages that may possibly arise in that connection. We will
accept liability which by law we cannot exclude.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 15:07 ` Sebastian Sturm
@ 2018-03-19 15:13 ` Stefan Monnier
2018-03-20 1:23 ` Sebastian Sturm
1 sibling, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-19 15:13 UTC (permalink / raw)
To: emacs-devel
> for the file I was complaining about, the number returned is 3080 (doesn't
> exceed the number of chars though, (point-max) = 67641). Will try to obtain
> usable timing data later today when I get home from work. Thanks!
Thanks.
It's large enough that noverlay might indeed make a difference for some
operations, but not large enough to explain why line-number-at-pos would
be so slow, I think.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-19 15:07 ` Sebastian Sturm
2018-03-19 15:13 ` Stefan Monnier
@ 2018-03-20 1:23 ` Sebastian Sturm
2018-03-20 6:30 ` Eli Zaretskii
2018-03-22 20:52 ` Stefan Monnier
1 sibling, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-20 1:23 UTC (permalink / raw)
To: emacs-devel
after disabling cquery for testing purposes (which leaves me with 45
overlays generated by flycheck for my troublesome C++ file), I'm now
generating a large number of overlays using the following function:
(defun test-highlight ()
(save-excursion
(with-silent-modifications
(let ((stepsize 10))
(widen)
(goto-char 1)
(cl-loop for n from (point-min) upto (- (point-max)
stepsize) by stepsize do
(let ((ov (make-overlay n (+ (1- stepsize) n))))
(overlay-put ov 'cquery-sem-highlight t)))))))
Evaluating the following function without additional overlays (beyond
the flycheck ones, that is) yields the following results:
(defun benchmark-often ()
(cl-loop for n from 1 upto 20 do
(message (format "iteration %d: %f" n (nth 0 (benchmark-run
(line-number-at-pos (point))))))))
1st run:
iteration 1: 0.001213
iteration 2: 0.001170
iteration 3: 0.001170
iteration 4: 0.001238
iteration 5: 0.001163
iteration 6: 0.001153
iteration 7: 0.000421
iteration 8: 0.000426
iteration 9: 0.000322
iteration 10: 0.000301
iteration 11: 0.000291
iteration 12: 0.000292
iteration 13: 0.000291
iteration 14: 0.000291
iteration 15: 0.000295
iteration 16: 0.000289
iteration 17: 0.000289
iteration 18: 0.000288
iteration 19: 0.000288
iteration 20: 0.000287
2nd run:
iteration 1: 0.001044
iteration 2: 0.000942
iteration 3: 0.000935
iteration 4: 0.000935
iteration 5: 0.000935
iteration 6: 0.000932
iteration 7: 0.000954
iteration 8: 0.000940
iteration 9: 0.000933
iteration 10: 0.000625
iteration 11: 0.000545
iteration 12: 0.000428
iteration 13: 0.000362
iteration 14: 0.000346
iteration 15: 0.000325
iteration 16: 0.000309
iteration 17: 0.000309
iteration 18: 0.000316
iteration 19: 0.000310
iteration 20: 0.000308
after evaluating (test-highlight) the figures are as follows:
1st run:
iteration 1: 0.026012
iteration 2: 0.020334
iteration 3: 0.020250
iteration 4: 0.020349
iteration 5: 0.020501
iteration 6: 0.020635
iteration 7: 0.020302
iteration 8: 0.020426
iteration 9: 0.020440
iteration 10: 0.020441
iteration 11: 0.020515
iteration 12: 0.020525
iteration 13: 0.020383
iteration 14: 0.020510
iteration 15: 0.019829
iteration 16: 0.019899
iteration 17: 0.019950
iteration 18: 0.019828
iteration 19: 0.019901
iteration 20: 0.019819
2nd run:
iteration 1: 0.026526
iteration 2: 0.020051
iteration 3: 0.020100
iteration 4: 0.020080
iteration 5: 0.020080
iteration 6: 0.020249
iteration 7: 0.020087
iteration 8: 0.020005
iteration 9: 0.019980
iteration 10: 0.019985
iteration 11: 0.020077
iteration 12: 0.019979
iteration 13: 0.020060
iteration 14: 0.020092
iteration 15: 0.019954
iteration 16: 0.019766
iteration 17: 0.019432
iteration 18: 0.019491
iteration 19: 0.019458
iteration 20: 0.019482
I'm not allowed to share my employer's source code as a test case, so I
tried the same procedure with the similarly large DeclBase.h from the
public LLVM repository. To my surprise, DeclBase.h didn't suffer from
any performance issues at all. Switching to fundamental-mode while
visiting my file didn't change anything, so I assume that c-mode isn't
to blame either.
There have been claims of overlay-related performance issues with cquery
and some large(-ish) open-source C or C++ files, so I'll try to locate
these files and hope that at least one of them exhibits this issue as well.
On 03/19/2018 04:07 PM, Sebastian Sturm wrote:
> for the file I was complaining about, the number returned is 3080
> (doesn't exceed the number of chars though, (point-max) = 67641). Will
> try to obtain usable timing data later today when I get home from work.
> Thanks!
>
> On 03/19/2018 03:56 PM, Stefan Monnier wrote:
>>> well no, it's about 2.5ms per call to line-number-at-pos, which is
>>> called at
>>> least 6 times per character insertion (with my Emacs config, at
>>> least). Which already makes for 15ms per character insertion, excluding
>>> anything else done by cc-mode or lsp-mode.
>>
>> Since you say that the noverlay branch helps, could you check the number
>> of overlays involved? E.g.
>>
>> M-: (length (overlays-in (point-min) (point-max))) RET
>>
>> If there are more overlays than chars in this buffer, maybe there's
>> a problem in some Elisp that creates too many overlays?
>>
>> If there aren't that many overlays, then I wonder why the noverlays
>> branch would make such a significant difference.
>>
>> Also, if you can reliably reproduce the "slow editing", would it be
>> possible to make a recipe for it that we can try and reproduce on
>> our side?
>>
>>
>> Stefan
>>
>>
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-20 1:23 ` Sebastian Sturm
@ 2018-03-20 6:30 ` Eli Zaretskii
2018-03-21 0:36 ` Sebastian Sturm
2018-03-22 20:52 ` Stefan Monnier
1 sibling, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-20 6:30 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Tue, 20 Mar 2018 02:23:02 +0100
>
> 1st run:
> iteration 1: 0.001213
> iteration 2: 0.001170
> [...]
>
> after evaluating (test-highlight) the figures are as follows:
> 1st run:
> iteration 1: 0.026012
> iteration 2: 0.020334
So, between 20-fold and 100-fold slow-down.
> I'm not allowed to share my employer's source code as a test case, so I
> tried the same procedure with the similarly large DeclBase.h from the
> public LLVM repository. To my surprise, DeclBase.h didn't suffer from
> any performance issues at all. Switching to fundamental-mode while
> visiting my file didn't change anything, so I assume that c-mode isn't
> to blame either.
So it's still a mystery why your original file produces such a large
slowdown with overlays.
Can you show the results of "M-x profiler-report" for the slow test
with your original source file? It could have some clues. If that's
impossible, I can only repeat my suggestion to use perf to find the
code in Emacs that takes the lion's share of the processing time.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-20 6:30 ` Eli Zaretskii
@ 2018-03-21 0:36 ` Sebastian Sturm
2018-03-21 6:47 ` Eli Zaretskii
2018-03-22 13:16 ` Stefan Monnier
0 siblings, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-21 0:36 UTC (permalink / raw)
To: emacs-devel
> So it's still a mystery why your original file produces such a large
> slowdown with overlays.
>
> Can you show the results of "M-x profiler-report" for the slow test
> with your original source file? It could have some clues. If that's
> impossible, I can only repeat my suggestion to use perf to find the
> code in Emacs that takes the lion's share of the processing time.
this is the profiler report I get for the slow case (BTW, is there a way
to have the profiler resolve functions within line-number-at-pos? I
tried increasing profiler-max-stack-depth to 32, but the profiler still
didn't show anything below line-number-at-pos)
- command-execute 20522 97%
- call-interactively 20522 97%
- funcall-interactively 20061 94%
- eval-expression 18609 87%
- eval 18609 87%
- benchmark-often 18609 87%
- let* 18609 87%
- while 18609 87%
- message 18388 86%
- format 18376 86%
- nth 18376 86%
- let 18376 86%
- list 18376 86%
- let 18368 86%
line-number-at-pos 18348 86%
ws-butler-after-change 4 0%
+ evil-open-below 614 2%
+ counsel-M-x 575 2%
+ undo-tree-undo 79 0%
+ next-line 68 0%
+ evil-normal-state 52 0%
+ evil-previous-line 48 0%
+ previous-line 7 0%
+ evil-emacs-state 6 0%
+ evil-insert 3 0%
+ byte-code 461 2%
+ redisplay_internal (C function) 254 1%
+ timer-event-handler 116 0%
+ evil-escape-pre-command-hook 82 0%
+ ... 63 0%
+ global-spacemacs-whitespace-cleanup-mode-check-buffers
47 0%
+ yas--post-command-handler 33 0%
+ evil-repeat-post-hook 6 0%
+ evil-visual-post-command 5 0%
+ flycheck-handle-signal 4 0%
evil-snipe-mode-check-buffers 4 0%
global-undo-tree-mode-check-buffers 4 0%
+ sp--save-pre-command-state 2 0%
+ xselect-convert-to-string 2 0%
+ evil-visual-pre-command 1 0%
+ flycheck-pos-tip-hide-messages 1 0%
+ which-key--hide-popup 1 0%
with perf, the ("self") time taken by buf_charpos_to_bytepos increases
from ~60% (fast case) to >98%. This is the diff generated by perf diff
<fast.perf> <slow.perf>:
# Event 'cycles'
#
# Baseline Delta Shared Object Symbol
# ........ ....... ....................
..........................................
#
57.77% +40.30% emacs-27.0.50 [.] buf_charpos_to_bytepos
11.12% -10.70% libc-2.23.so [.] __memrchr
6.48% -6.19% emacs-27.0.50 [.] assq_no_quit
5.92% -5.62% emacs-27.0.50 [.] find_cache_boundary
4.26% -4.07% emacs-27.0.50 [.] set_buffer_internal_2
4.10% -3.73% emacs-27.0.50 [.] find_newline
3.54% libc-2.23.so [.] __memmove_avx_unaligned
1.25% -1.19% emacs-27.0.50 [.] region_cache_forward
0.46% -0.41% [kernel.kallsyms] [k] 0xffffffff83004eb0
0.26% -0.26% emacs-27.0.50 [.]
revalidate_region_cache.isra.1
0.25% -0.23% emacs-27.0.50 [.] find_interval
0.23% -0.23% emacs-27.0.50 [.] swap_in_symval_forwarding
0.19% -0.18% emacs-27.0.50 [.] do_symval_forwarding
0.16% -0.15% emacs-27.0.50 [.] eval_sub
0.16% -0.15% emacs-27.0.50 [.] x_produce_glyphs
0.13% libXft.so.2.3.2 [.] XftCharIndex
0.13% -0.12% libXft.so.2.3.2 [.] XftGlyphExtents
0.12% -0.12% emacs-27.0.50 [.] store_symval_forwarding
0.12% -0.11% emacs-27.0.50 [.] exec_byte_code
0.09% emacs-27.0.50 [.] Fsymbol_value
0.08% -0.08% emacs-27.0.50 [.] x_get_glyph_overhangs
0.07% emacs-27.0.50 [.] find_symbol_value
0.07% libc-2.23.so [.] __memcpy_avx_unaligned
0.07% emacs-27.0.50 [.] Fassq
0.07% emacs-27.0.50 [.] insert_1_both.part.9
0.06% emacs-27.0.50 [.] offset_intervals
0.06% libc-2.23.so [.] __memcmp_sse4_1
0.06% emacs-27.0.50 [.] next_element_from_buffer
0.05% -0.05% emacs-27.0.50 [.] move_it_in_display_line_to
0.05% -0.04% emacs-27.0.50 [.] get_next_display_element
0.05% -0.04% libXft.so.2.3.2 [.] XftFontCheckGlyph
0.05% -0.04% libX11.so.6.3.0 [.] _XSetClipRectangles
0.04% libc-2.23.so [.] __GI___printf_fp_l
0.04% emacs-27.0.50 [.] styled_format
0.04% -0.03% emacs-27.0.50 [.] xftfont_text_extents
0.04% -0.03% emacs-27.0.50 [.] draw_glyphs
0.04% emacs-27.0.50 [.] display_line
0.04% emacs-27.0.50 [.] validate_interval_range
0.04% emacs-27.0.50 [.] update_window_fringes
0.04% libX11.so.6.3.0 [.] 0x000000000001b65a
0.04% -0.03% emacs-27.0.50 [.] lookup_char_property
0.04% emacs-27.0.50 [.] balance_an_interval
0.03% emacs-27.0.50 [.] composition_compute_stop_pos
0.03% libpthread-2.23.so [.] pthread_mutex_unlock
0.03% -0.03% emacs-27.0.50 [.] x_draw_glyph_string
0.03% -0.03% emacs-27.0.50 [.] mem_insert
0.03% -0.03% libpthread-2.23.so [.] pthread_mutex_lock
0.03% libc-2.23.so [.] malloc
0.03% emacs-27.0.50 [.] update_window_line
0.03% emacs-27.0.50 [.] Ffuncall
0.03% -0.03% emacs-27.0.50 [.] get_glyph_face_and_encoding
0.03% -0.02% emacs-27.0.50 [.] set_internal
0.03% -0.02% emacs-27.0.50 [.] update_window
0.03% libXft.so.2.3.2 [.] XftDrawSrcPicture
0.03% libc-2.23.so [.] vfprintf
0.03% -0.02% emacs-27.0.50 [.] do_one_unbind.constprop.20
0.03% emacs-27.0.50 [.] set_iterator_to_next
0.02% emacs-27.0.50 [.] specbind
0.02% libc-2.23.so [.] __memset_avx2
0.02% -0.02% libX11.so.6.3.0 [.] _XFlushGCCache
0.02% emacs-27.0.50 [.]
lookup_glyphless_char_display
0.02% libc-2.23.so [.] _int_malloc
0.02% emacs-27.0.50 [.] set_cursor_from_row.isra.40
0.02% -0.02% emacs-27.0.50 [.] message_dolog.part.60
0.02% emacs-27.0.50 [.] gap_left
0.02% -0.02% emacs-27.0.50 [.] unbind_to
0.02% -0.02% emacs-27.0.50 [.] init_iterator
0.02% emacs-27.0.50 [.] Fget_buffer
0.02% -0.01% emacs-27.0.50 [.] overlays_at
0.02% -0.01% [wl] [k] osl_readl
0.02% -0.01% emacs-27.0.50 [.] set_point_both
0.02% emacs-27.0.50 [.] message3_nolog
0.02% libX11.so.6.3.0 [.] _XGetRequest
0.02% libc-2.23.so [.] __sprintf_chk
0.02% emacs-27.0.50 [.] adjust_markers_for_insert
0.02% emacs-27.0.50 [.] adjust_suspend_auto_hscroll
0.02% emacs-27.0.50 [.]
get_char_property_and_overlay
0.01% emacs-27.0.50 [.] window_box_width
0.01% emacs-27.0.50 [.] Fcons
0.01% libxcb.so.1.1.0 [.] 0x0000000000009cca
0.01% -0.01% emacs-27.0.50 [.] arith_driver
0.01% -0.01% emacs-27.0.50 [.] resize_mini_window
0.01% emacs-27.0.50 [.] unchain_marker
0.01% emacs-27.0.50 [.] face_for_char
0.01% emacs-27.0.50 [.] unblock_input_to
0.01% emacs-27.0.50 [.] unblock_input
0.01% emacs-27.0.50 [.]
balance_possible_root_interval
0.01% emacs-27.0.50 [.] add_text_properties_1
0.01% emacs-27.0.50 [.] save_restriction_restore
0.01% emacs-27.0.50 [.] funcall_subr
0.01% emacs-27.0.50 [.] gap_right
0.01% -0.01% emacs-27.0.50 [.] Fcurrent_buffer
0.01% libXrender.so.1.3.0 [.] XRenderFillRectangle
0.01% libXext.so.6.4.0 [.] XdbeSwapBuffers
0.01% emacs-27.0.50 [.]
get_glyph_string_clip_rects.part.72
0.01% emacs-27.0.50 [.] Fwhile
0.01% emacs-27.0.50 [.] window_wants_mode_line
0.01% emacs-27.0.50 [.] display_echo_area_1
0.01% libXft.so.2.3.2 [.] XftDrawSetClipRectangles
0.01% emacs-27.0.50 [.] append_space_for_newline
0.01% emacs-27.0.50 [.] x_update_end
0.01% emacs-27.0.50 [.] Fforward_line
0.01% emacs-27.0.50 [.] memrchr@plt
0.01% emacs-27.0.50 [.] make_uninit_multibyte_string
0.01% emacs-27.0.50 [.] prepare_to_modify_buffer_1
0.01% emacs-27.0.50 [.] Fstring_equal
0.01% emacs-27.0.50 [.] init_glyph_string
0.01% libc-2.23.so [.] _IO_old_init
0.01% libXrender.so.1.3.0 [.] XRenderFindDisplay
0.01% emacs-27.0.50 [.] Fnreverse
0.01% emacs-27.0.50 [.] arithcompare
0.01% emacs-27.0.50 [.] font_get_frame_data
0.01% emacs-27.0.50 [.] window_box_left
0.01% libX11.so.6.3.0 [.] XSetClipRectangles
0.01% -0.01% emacs-27.0.50 [.] Flength
0.01% emacs-27.0.50 [.] previous_interval
0.01% libXft.so.2.3.2 [.] 0x0000000000007294
0.01% emacs-27.0.50 [.]
x_compute_glyph_string_overhangs
0.01% emacs-27.0.50 [.] XftCharIndex@plt
0.01% libXrender.so.1.3.0 [.] XRenderCompositeString16
0.01% emacs-27.0.50 [.] message3
0.01% emacs-27.0.50 [.] xftfont_encode_char
0.01% emacs-27.0.50 [.] assign_row
0.01% emacs-27.0.50 [.] prepare_desired_row
0.01% emacs-27.0.50 [.] xftfont_draw
0.01% libXext.so.6.4.0 [.] XextFindDisplay
0.01% libpthread-2.23.so [.]
pthread_cond_broadcast@@GLIBC_2.3.2
0.01% libXft.so.2.3.2 [.] XftDrawGlyphs
0.01% emacs-27.0.50 [.] marker_position
0.01% emacs-27.0.50 [.] funcall_lambda
0.01% emacs-27.0.50 [.] gettime
0.01% libpthread-2.23.so [.] __GI___libc_recvmsg
0.01% emacs-27.0.50 [.] lisp_time_struct
0.01% emacs-27.0.50 [.] echo_area_display
0.01% emacs-27.0.50 [.] CHECK_MARKER
0.01% emacs-27.0.50 [.] del_range_2
0.01% emacs-27.0.50 [.] prepare_face_for_display
0.01% emacs-27.0.50 [.] interval_deletion_adjustment
0.01% libc-2.23.so [.] strlen
0.01% emacs-27.0.50 [.] verify_interval_modification
0.01% emacs-27.0.50 [.] count_size_as_multibyte
0.01% emacs-27.0.50 [.] adjust_overlays_for_insert
0.01% libXrender.so.1.3.0 [.]
XRenderSetPictureClipRectangles
0.01% emacs-27.0.50 [.] buffer_local_value
0.01% emacs-27.0.50 [.] adjust_window_count
0.01% emacs-27.0.50 [.]
notice_overwritten_cursor.part.52
0.01% libc-2.23.so [.] __GI___writev
0.01% emacs-27.0.50 [.] record_in_backtrace
0.01% emacs-27.0.50 [.] save_restriction_save
0.01% emacs-27.0.50 [.] Fget_text_property
0.01% -0.00% [vdso] [.] __vdso_clock_gettime
0.01% emacs-27.0.50 [.] window_wants_header_line
0.01% emacs-27.0.50 [.] Fmessage
0.01% emacs-27.0.50 [.] FUNCTIONP
0.01% emacs-27.0.50 [.] window_display_table
0.01% emacs-27.0.50 [.] maybe_quit
0.01% emacs-27.0.50 [.] clear_glyph_matrix
0.01% -0.00% emacs-27.0.50 [.] with_echo_area_buffer
0.01% libc-2.23.so [.] __libc_enable_asynccancel
0.01% emacs-27.0.50 [.] should_produce_line_number
0.01% emacs-27.0.50 [.] x_set_glyph_string_clipping
0.01% emacs-27.0.50 [.] set_message_1
0.01% emacs-27.0.50 [.] del_range_both
0.01% emacs-27.0.50 [.] record_insert
0.01% emacs-27.0.50 [.] adjust_overlays_for_delete
0.01% +0.00% libc-2.23.so [.] _int_free
0.01% libc-2.23.so [.] __strchrnul
0.01% emacs-27.0.50 [.] Fgoto_char
0.01% emacs-27.0.50 [.] free_misc
0.01% emacs-27.0.50 [.] draw_window_fringes
0.01% emacs-27.0.50 [.] x_flush.isra.37.part.38
0.01% libpthread-2.23.so [.] __errno_location
0.01% emacs-27.0.50 [.] make_float
0.01% emacs-27.0.50 [.] fill_glyph_string
0.01% emacs-27.0.50 [.] Fget
0.01% emacs-27.0.50 [.] Flocal_variable_p
0.01% emacs-27.0.50 [.] decode_time_components
0.01% emacs-27.0.50 [.] XftGlyphExtents@plt
0.01% emacs-27.0.50 [.] row_for_charpos_p
0.01% emacs-27.0.50 [.] load_overlay_strings
0.01% emacs-27.0.50 [.] allocate_misc
0.01% -0.00% libX11.so.6.3.0 [.] _XSend
0.01% libX11.so.6.3.0 [.] _XData32
0.01% -0.00% emacs-27.0.50 [.] buf_bytepos_to_charpos
0.01% emacs-27.0.50 [.] face_at_buffer_position
0.01% emacs-27.0.50 [.] invalidate_current_column
0.01% emacs-27.0.50 [.] find_composition
0.01% emacs-27.0.50 [.] x_draw_window_cursor
0.01% emacs-27.0.50 [.] x_flip_and_flush
0.01% libc-2.23.so [.] _IO_no_init
0.01% emacs-27.0.50 [.] Ftime_subtract
0.01% emacs-27.0.50 [.] row_hash
0.01% emacs-27.0.50 [.] lookup_basic_face
0.01% emacs-27.0.50 [.] memcpy@plt
0.01% emacs-27.0.50 [.] Fset_buffer
0.01% emacs-27.0.50 [.] produce_special_glyphs
0.00% emacs-27.0.50 [.]
x_draw_glyph_string_background.part.44
0.00% [vdso] [.] 0x0000000000000939
0.00% emacs-27.0.50 [.] make_save_obj_obj_obj_obj
0.00% libc-2.23.so [.] __GI___libc_poll
0.00% emacs-27.0.50 [.] do_specbind
0.00% libXft.so.2.3.2 [.] XftGlyphRender
0.00% emacs-27.0.50 [.] move_it_to
0.00% emacs-27.0.50 [.] make_current.isra.14
0.00% emacs-27.0.50 [.] Fnext_single_property_change
0.00% emacs-27.0.50 [.] handle_face_prop
0.00% libX11.so.6.3.0 [.] _XFlush
0.00% emacs-27.0.50 [.] unwind_with_echo_area_buffer
0.00% emacs-27.0.50 [.] evaporate_overlays
0.00% emacs-27.0.50 [.] sort_overlays
0.00% emacs-27.0.50 [.] del_range_1
0.00% libX11.so.6.3.0 [.] XFlush
0.00% emacs-27.0.50 [.] try_window
0.00% emacs-27.0.50 [.] x_set_glyph_string_gc
0.00% libc-2.23.so [.] malloc_consolidate
0.00% emacs-27.0.50 [.] handle_stop
0.00% emacs-27.0.50 [.] recenter_overlay_lists
0.00% libX11.so.6.3.0 [.] pthread_mutex_lock@plt
0.00% emacs-27.0.50 [.] modify_text_properties
0.00% emacs-27.0.50 [.] delete_interval
0.00% emacs-27.0.50 [.] window_text_bottom_y
0.00% +0.00% emacs-27.0.50 [.] copy_text
0.00% emacs-27.0.50 [.] set_default_internal
0.00% +0.00% libxcb.so.1.1.0 [.] xcb_writev
0.00% emacs-27.0.50 [.] temp_set_point_both
0.00% emacs-27.0.50 [.] insert_1_both
0.00% emacs-27.0.50 [.] record_property_change
0.00% emacs-27.0.50 [.] lisp_align_malloc
0.00% emacs-27.0.50 [.] x_mark_frame_dirty
0.00% +0.00% emacs-27.0.50 [.] text_quoting_style
0.00% libc-2.23.so [.] __GI___mempcpy
0.00% emacs-27.0.50 [.] Ffloat_time
0.00% emacs-27.0.50 [.] x_write_glyphs
0.00% +0.00% emacs-27.0.50 [.] insert_from_string_1
0.00% emacs-27.0.50 [.] Fpoint
0.00% libc-2.23.so [.] __vsprintf_chk
0.00% libc-2.23.so [.] hack_digit
0.00% emacs-27.0.50 [.] grow_specpdl
0.00% emacs-27.0.50 [.] time_arith
0.00% emacs-27.0.50 [.] update_end
0.00% emacs-27.0.50 [.] allocate_string_data
0.00% emacs-27.0.50 [.] save_excursion_restore
0.00% +0.00% emacs-27.0.50 [.] Flet
0.00% emacs-27.0.50 [.] mem_rotate_right
0.00% emacs-27.0.50 [.] fetch_buffer_markers
0.00% emacs-27.0.50 [.] move_gap_both
0.00% emacs-27.0.50 [.] xmalloc
0.00% emacs-27.0.50 [.] update_begin
0.00% emacs-27.0.50 [.] float_arith_driver
0.00% emacs-27.0.50 [.] make_specified_string
0.00% emacs-27.0.50 [.] handle_composition_prop
0.00% emacs-27.0.50 [.] display_and_set_cursor
0.00% emacs-27.0.50 [.] compute_line_metrics
0.00% emacs-27.0.50 [.] clock_gettime@plt
0.00% libc-2.23.so [.] free
0.00% emacs-27.0.50 [.] Fplist_get
0.00% libgdk-3.so.0.1800.9 [.] gdk_display_manager_get
0.00% emacs-27.0.50 [.] make_interval
0.00% libxcb.so.1.1.0 [.] xcb_poll_for_event
0.00% emacs-27.0.50 [.] adjust_markers_for_delete
0.00% emacs-27.0.50 [.] do_pending_window_change
0.00% emacs-27.0.50 [.] XftDrawGlyphs@plt
0.00% libc-2.23.so [.] _itoa_word
0.00% emacs-27.0.50 [.] disassemble_lisp_time
0.00% emacs-27.0.50 [.] Ftext_properties_at
0.00% emacs-27.0.50 [.] composition_reseat_it
0.00% libX11.so.6.3.0 [.] _XEventsQueued
0.00% emacs-27.0.50 [.] minmax_driver
0.00% emacs-27.0.50 [.] set_marker_restricted_both
0.00% emacs-27.0.50 [.] graft_intervals_into_buffer
0.00% emacs-27.0.50 [.] disp_char_vector
0.00% libc-2.23.so [.] __clock_gettime
0.00% emacs-27.0.50 [.] CHECK_STRING_OR_BUFFER
0.00% emacs-27.0.50 [.] intervals_equal
0.00% emacs-27.0.50 [.] ensure_echo_area_buffers
0.00% emacs-27.0.50 [.] invalidate_buffer_caches
0.00% emacs-27.0.50 [.] set_marker_both
0.00% +0.00% emacs-27.0.50 [.] record_unwind_protect
0.00% emacs-27.0.50 [.] run_hook_with_args
0.00% emacs-27.0.50 [.] set_point_from_marker
0.00% emacs-27.0.50 [.] list4
0.00% libX11.so.6.3.0 [.] XSetClipMask
0.00% emacs-27.0.50 [.] record_buffer_markers
0.00% emacs-27.0.50 [.] default_value
+0.00% libXext.so.6.4.0 [.] 0x000000000000bf10
+0.00% emacs-27.0.50 [.] Fsetq
+0.00% emacs-27.0.50 [.] reseat_1
+0.00% emacs-27.0.50 [.] update_compositions
this is what perf annotate shows when invoked on buf_charpos_to_bytepos
(slow case):
│ ↓ jle 438
▒
4,39 │ mov 0x20(%rax),%r8
▒
8,38 │ mov %rdx,%rbp
▒
0,05 │2f0: mov %rbp,%rdx
▒
0,13 │ mov %r8,%rdi
▒
2,70 │ sub %r13,%rdx
▒
2,22 │ sub %rbx,%rdi
▒
0,05 │ cmp %rdi,%rdx
▒
│ ↑ je 1e9
▒
│
▒
│ /* If we are down to a range of 50 chars,
▒
│ don't bother checking any other markers;
▒
│ scan the intervening chars directly now. */
▒
│ if (best_above - best_below < 50)
▒
3,21 │305: cmp $0x31,%rdx
▒
│ ↓ jle 480
▒
│ CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b));
▒
│
▒
│ if (b == cached_buffer && BUF_MODIFF (b) ==
cached_modiff) ▒
│ CONSIDER (cached_charpos, cached_bytepos);
▒
│
▒
│ for (tail = BUF_MARKERS (b); tail; tail = tail->next)
▒
2,50 │ mov 0x10(%rax),%rax
▒
4,25 │ test %rax,%rax
▒
│ ↓ je 480
▒
│ {
◆
│ CONSIDER (tail->charpos, tail->bytepos);
▒
2,64 │31c: mov 0x18(%rax),%rdx
▒
59,16 │ cmp %rdx,%rsi
▒
│ ↓ je 638
▒
5,70 │ cmp %rdx,%rsi
▒
│ ↑ jl 2e0
▒
0,00 │ cmp %rdx,%r13
▒
│ ↓ jge 438
▒
0,00 │ mov 0x20(%rax),%rbx
▒
0,01 │ mov %rdx,%r13
▒
│ ↑ jmp 2f0
▒
│ marker_byte_position():
▒
│ if (!buf)
▒
│ error ("Marker does not point anywhere");
▒
│
▒
│ eassert (BUF_BEG_BYTE (buf) <= m->bytepos && m->bytepos
<= BUF_Z_BYTE (b▒
│
▒
│ return m->bytepos;
▒
│340: mov 0x20(%rdi),%rbx
▒
│ ↑ jmpq cb
▒
│ nop
▒
│ buf_charpos_to_bytepos():
▒
│
▒
│ If at any point we can tell that the space between
those ▒
│ two best approximations is all single-byte,
▒
│ we interpolate the result immediately. */
▒
│
▒
│ CONSIDER (BUF_PT (b), BUF_PT_BYTE (b));
▒
│350: mov 0x2f0(%rdi),%r13
▒
│ cmp %rsi,%r13
▒
to my eyes, the fast case doesn't look too different though:
│ ↓ jle 438
0,83 │ mov 0x20(%rax),%r8
0,93 │ mov %rdx,%rbp
│2f0: mov %rbp,%rdx
0,06 │ mov %r8,%rdi
0,20 │ sub %r13,%rdx
0,11 │ sub %rbx,%rdi
0,12 │ cmp %rdi,%rdx
│ ↑ je 1e9
│
│ /* If we are down to a range of 50 chars,
│ don't bother checking any other markers;
│ scan the intervening chars directly now. */
│ if (best_above - best_below < 50)
11,24 │305: cmp $0x31,%rdx
│ ↓ jle 480
│ CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b));
│
│ if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff)
│ CONSIDER (cached_charpos, cached_bytepos);
│
│ for (tail = BUF_MARKERS (b); tail; tail = tail->next)
0,16 │ mov 0x10(%rax),%rax
10,41 │ test %rax,%rax
│ ↓ je 480
│ {
│ CONSIDER (tail->charpos, tail->bytepos);
3,19 │31c: mov 0x18(%rax),%rdx
42,86 │ cmp %rdx,%rsi
│ ↓ je 638
10,36 │ cmp %rdx,%rsi
│ ↑ jl 2e0
2,24 │ cmp %rdx,%r13
│ ↓ jge 438
0,76 │ mov 0x20(%rax),%rbx
0,43 │ mov %rdx,%r13
│ ↑ jmp 2f0
│ marker_byte_position():
│ if (!buf)
│ error ("Marker does not point anywhere");
│
│ eassert (BUF_BEG_BYTE (buf) <= m->bytepos && m->bytepos
<= BUF_Z_BYTE (bu
│
│ return m->bytepos;
│340: mov 0x20(%rdi),%rbx
│ ↑ jmpq cb
│ nop
│ buf_charpos_to_bytepos():
│
│ If at any point we can tell that the space between those
│ two best approximations is all single-byte,
│ we interpolate the result immediately. */
│
│ CONSIDER (BUF_PT (b), BUF_PT_BYTE (b));
│350: mov 0x2f0(%rdi),%r13
│ cmp %rsi,%r13
I hope this is of some use, but I'll keep looking for open source files
to reproduce the issue
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-21 0:36 ` Sebastian Sturm
@ 2018-03-21 6:47 ` Eli Zaretskii
2018-03-22 13:16 ` Stefan Monnier
1 sibling, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-21 6:47 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Wed, 21 Mar 2018 01:36:38 +0100
>
> this is the profiler report I get for the slow case (BTW, is there a way
> to have the profiler resolve functions within line-number-at-pos?
Yes: load simple.el manually before running the benchmark.
> with perf, the ("self") time taken by buf_charpos_to_bytepos increases
> from ~60% (fast case) to >98%. This is the diff generated by perf diff
> <fast.perf> <slow.perf>:
>
> # Event 'cycles'
> #
> # Baseline Delta Shared Object Symbol
>
> # ........ ....... ....................
> ..........................................
> #
> 57.77% +40.30% emacs-27.0.50 [.] buf_charpos_to_bytepos
This seems to confirm Stefan's guess that converting character
positions to byte positions takes most of the time, which might make
sense with a lot of overlays (because each overlay uses 2 markers).
Does your code call line-number-at-pos at random positions in the
buffer, or are the positions close to one another? If the latter, you
might be better off calling count-lines directly, starting at the line
where you previously calculated the line number, instead of calling
line-number-at-pos, which always begins at point-min.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 20:39 ` Eli Zaretskii
2018-03-18 21:04 ` Sebastian Sturm
@ 2018-03-21 14:14 ` Sebastien Chapuis
2018-03-21 15:35 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Sebastien Chapuis @ 2018-03-21 14:14 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: Sebastian Sturm, emacs-devel
Eli Zaretskii writes:
>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Sun, 18 Mar 2018 21:14:53 +0100
>>
>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.
>
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
>
> (I'm not sure I even understand how overlays come into play here,
> btw.)
The language server protocol defines a position in file with zero-indexed
line and column offsets [1]:
```
interface Position {
line: number;
character: number;
}
```
lsp-mode uses heavily line-number-at-pos to convert an Emacs buffer
point to a LSP position, and vice-versa [2].
This can happen thousands times on each keystroke.
If Emacs could provide a function to do the conversion very fast (or at
least faster than with line-number-at-pos), it would be great.
[1] https://github.com/Microsoft/language-server-protocol/blob/gh-pages/specification.md#position
[2] The conversion from a LSP position to point doesn't use
line-number-at-pos, but forward-line and forward-char. It's still very slow.
--
Sebastien Chapuis
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-21 14:14 ` Sebastien Chapuis
@ 2018-03-21 15:35 ` Eli Zaretskii
0 siblings, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-21 15:35 UTC (permalink / raw)
To: Sebastien Chapuis; +Cc: s.sturm, emacs-devel
> From: Sebastien Chapuis <sebastien@chapu.is>
> Cc: Sebastian Sturm <s.sturm@arkona-technologies.de>, emacs-devel@gnu.org
> Date: Wed, 21 Mar 2018 15:14:18 +0100
>
> The language server protocol defines a position in file with zero-indexed
> line and column offsets [1]:
>
> ```
> interface Position {
> line: number;
> character: number;
> }
> ```
>
> lsp-mode uses heavily line-number-at-pos to convert an Emacs buffer
> point to a LSP position, and vice-versa [2].
line-number-at-pos is inefficient, in that it always counts from the
beginning of the buffer. By keeping already computed line numbers
around, you could make a faster implementation if you count relative
line offsets using count-lines instead.
> This can happen thousands times on each keystroke.
_Thousands_ times for _each_ keystroke? Why is that? Most keystrokes
only change a single line, so how come you need thousands of lines
recounted each time?
> If Emacs could provide a function to do the conversion very fast (or at
> least faster than with line-number-at-pos), it would be great.
Given the above, I think you need to describe the issue in more
details, before we even begin designing the solution.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-21 0:36 ` Sebastian Sturm
2018-03-21 6:47 ` Eli Zaretskii
@ 2018-03-22 13:16 ` Stefan Monnier
2018-03-22 19:54 ` Sebastian Sturm
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-22 13:16 UTC (permalink / raw)
To: emacs-devel
> this is the profiler report I get for the slow case (BTW, is there a way to
> have the profiler resolve functions within line-number-at-pos?
It should do that without you asking. I mean, it won't show you
`goto-char` and `point-min` kind of things since these are "inlined"
(actually turned into their own byte-code), but `count-lines` should
definitely be there.
> - let 18368 86%
> line-number-at-pos 18348 86%
It's very odd that there's no `count-lines` down here (and `count-lines`
is a perfectly normal Elisp function that's not inlined or otherwise
treated specially), since it should be where most of the time is spent!
If we assume the code works as intended, it would imply that the time is
not spent in `count-lines` but elsewhere, e.g. in `goto-char`.
> with perf, the ("self") time taken by buf_charpos_to_bytepos increases from
> ~60% (fast case) to >98%. This is the diff generated by perf diff
> <fast.perf> <slow.perf>:
So I guess that could be it: the (goto-char opoint) spends an inordinate
amount of time in buf_charpos_to_bytepos.
Could you try the patch below, to see if it makes a difference?
Stefan
diff --git a/src/marker.c b/src/marker.c
index 7773c4fce0..3d808fd6fa 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -141,6 +141,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = 50;
eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
@@ -180,8 +181,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - best_below < distance)
break;
+ else
+ distance = distance + 10;
}
/* We get here if we did not exactly hit one of the known places.
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-22 13:16 ` Stefan Monnier
@ 2018-03-22 19:54 ` Sebastian Sturm
2018-03-22 20:04 ` Sebastian Sturm
0 siblings, 1 reply; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-22 19:54 UTC (permalink / raw)
To: emacs-devel
>
> It should do that without you asking. I mean, it won't show you
> `goto-char` and `point-min` kind of things since these are "inlined"
> (actually turned into their own byte-code), but `count-lines` should
> definitely be there.
>
>> - let
18368 86%
>> line-number-at-pos
18348 86%
>
> It's very odd that there's no `count-lines` down here (and `count-lines`
> is a perfectly normal Elisp function that's not inlined or otherwise
> treated specially), since it should be where most of the time is spent!
when first evaluating simple.el as Eli suggested, the profiler indeed
shows that pretty much all time is spent somewhere within `count-lines`.
Unfortunately the patch didn't seem to help though I'll double-check at
home (also, it seemed to improve baseline performance, but I'm not sure
if I should regard that as a random fluctuation, will need to do more
measurements).
Also, this time even the first measurement with a small number of
overlays was extremely slow, not sure what to make of that.
In general however, the trend seems clear to me --- for this one file,
overlays hurt `line-number-at-pos` performance very much, except when
using the noverlay branch.
Are there other things I should look for in my file that might
negatively affect performance (strange codepoints or anything else that
might perhaps upset `buf_charpos_to_bytepos`)?
baseline (emacs-27 master #667cdf42..., approx. 5-10 overlays created by
flycheck):
iteration 1: 0.012696
iteration 2: 0.002595
iteration 3: 0.002606
iteration 4: 0.002601
iteration 5: 0.002649
iteration 6: 0.002605
iteration 7: 0.002594
iteration 8: 0.002601
iteration 9: 0.002603
iteration 10: 0.002601
iteration 11: 0.002606
iteration 12: 0.002626
iteration 13: 0.002603
iteration 14: 0.002642
iteration 15: 0.002599
iteration 16: 0.002598
iteration 17: 0.002600
iteration 18: 0.002601
iteration 19: 0.002599
iteration 20: 0.002608
emacs-27 master, approx. 6200 overlays created by (test-highlight)
iteration 1: 0.022795
iteration 2: 0.015560
iteration 3: 0.015697
iteration 4: 0.015913
iteration 5: 0.015894
iteration 6: 0.016063
iteration 7: 0.015928
iteration 8: 0.015890
iteration 9: 0.015278
iteration 10: 0.015515
iteration 11: 0.015327
iteration 12: 0.015326
iteration 13: 0.015574
iteration 14: 0.015319
iteration 15: 0.015370
iteration 16: 0.015354
iteration 17: 0.015333
iteration 18: 0.015312
iteration 19: 0.015481
iteration 20: 0.015411
emacs-27 master + patch "distance+10", ~10 overlays created by flycheck:
iteration 1: 0.002938
iteration 2: 0.001132
iteration 3: 0.000550
iteration 4: 0.000468
iteration 5: 0.000470
iteration 6: 0.000451
iteration 7: 0.000449
iteration 8: 0.000451
iteration 9: 0.000450
iteration 10: 0.000449
iteration 11: 0.000448
iteration 12: 0.000452
iteration 13: 0.000451
iteration 14: 0.000443
iteration 15: 0.000448
iteration 16: 0.000443
iteration 17: 0.000444
iteration 18: 0.000445
iteration 19: 0.000445
iteration 20: 0.000445
emacs-27 master + patch "distance+10", ~6200 overlays created by
(test-highlight):
iteration 1: 0.019673
iteration 2: 0.014469
iteration 3: 0.014491
iteration 4: 0.014430
iteration 5: 0.014493
iteration 6: 0.014704
iteration 7: 0.014741
iteration 8: 0.014536
iteration 9: 0.014433
iteration 10: 0.014469
iteration 11: 0.014429
iteration 12: 0.014509
iteration 13: 0.014484
iteration 14: 0.014487
iteration 15: 0.014524
iteration 16: 0.014449
iteration 17: 0.014501
iteration 18: 0.014469
iteration 19: 0.014429
iteration 20: 0.014507
noverlay branch (#886933...), ~10 overlays:
iteration 1: 0.002370
iteration 2: 0.001191
iteration 3: 0.001189
iteration 4: 0.001162
iteration 5: 0.001095
iteration 6: 0.001096
iteration 7: 0.000522
iteration 8: 0.000531
iteration 9: 0.000369
iteration 10: 0.000365
iteration 11: 0.000365
iteration 12: 0.000366
iteration 13: 0.000365
iteration 14: 0.000380
iteration 15: 0.000367
iteration 16: 0.000365
iteration 17: 0.000365
iteration 18: 0.000365
iteration 19: 0.000363
iteration 20: 0.000368
noverlay branch, ~6200 overlays:
iteration 1: 0.001878
iteration 2: 0.001831
iteration 3: 0.001838
iteration 4: 0.001826
iteration 5: 0.001027
iteration 6: 0.000722
iteration 7: 0.000531
iteration 8: 0.000479
iteration 9: 0.000480
iteration 10: 0.000479
iteration 11: 0.000479
iteration 12: 0.000479
iteration 13: 0.000478
iteration 14: 0.000479
iteration 15: 0.000478
iteration 16: 0.000478
iteration 17: 0.000482
iteration 18: 0.000479
iteration 19: 0.000480
iteration 20: 0.000476
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-22 19:54 ` Sebastian Sturm
@ 2018-03-22 20:04 ` Sebastian Sturm
0 siblings, 0 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-22 20:04 UTC (permalink / raw)
To: emacs-devel
for the record, this is what I get with the (non-noverlay, non-patched)
emacs-27 master branch after calling `(set-buffer-multibyte nil)`. Guess
this merely confirms what you already found out, but for me this is a
viable workaround until the issue is properly resolved. Will keep
looking for an open source test file over the weekend
emacs-27 master, ~6200 overlays, enable-multibyte-characters = nil:
iteration 1: 0.000300
iteration 2: 0.000268
iteration 3: 0.000267
iteration 4: 0.000268
iteration 5: 0.000267
iteration 6: 0.000266
iteration 7: 0.000263
iteration 8: 0.000263
iteration 9: 0.000277
iteration 10: 0.000269
iteration 11: 0.000264
iteration 12: 0.000266
iteration 13: 0.000263
iteration 14: 0.000265
iteration 15: 0.000273
iteration 16: 0.000267
iteration 17: 0.000266
iteration 18: 0.000266
iteration 19: 0.000265
iteration 20: 0.000263
On 03/22/2018 08:54 PM, Sebastian Sturm wrote:
> >
> > It should do that without you asking. I mean, it won't show you
> > `goto-char` and `point-min` kind of things since these are "inlined"
> > (actually turned into their own byte-code), but `count-lines` should
> > definitely be there.
> >
> >> - let 18368 86%
> >> line-number-at-pos 18348 86%
> >
> > It's very odd that there's no `count-lines` down here (and `count-lines`
> > is a perfectly normal Elisp function that's not inlined or otherwise
> > treated specially), since it should be where most of the time is spent!
>
> when first evaluating simple.el as Eli suggested, the profiler indeed
> shows that pretty much all time is spent somewhere within `count-lines`.
> Unfortunately the patch didn't seem to help though I'll double-check at
> home (also, it seemed to improve baseline performance, but I'm not sure
> if I should regard that as a random fluctuation, will need to do more
> measurements).
> Also, this time even the first measurement with a small number of
> overlays was extremely slow, not sure what to make of that.
>
> In general however, the trend seems clear to me --- for this one file,
> overlays hurt `line-number-at-pos` performance very much, except when
> using the noverlay branch.
> Are there other things I should look for in my file that might
> negatively affect performance (strange codepoints or anything else that
> might perhaps upset `buf_charpos_to_bytepos`)?
>
> baseline (emacs-27 master #667cdf42..., approx. 5-10 overlays created by
> flycheck):
> iteration 1: 0.012696
> iteration 2: 0.002595
> iteration 3: 0.002606
> iteration 4: 0.002601
> iteration 5: 0.002649
> iteration 6: 0.002605
> iteration 7: 0.002594
> iteration 8: 0.002601
> iteration 9: 0.002603
> iteration 10: 0.002601
> iteration 11: 0.002606
> iteration 12: 0.002626
> iteration 13: 0.002603
> iteration 14: 0.002642
> iteration 15: 0.002599
> iteration 16: 0.002598
> iteration 17: 0.002600
> iteration 18: 0.002601
> iteration 19: 0.002599
> iteration 20: 0.002608
>
> emacs-27 master, approx. 6200 overlays created by (test-highlight)
> iteration 1: 0.022795
> iteration 2: 0.015560
> iteration 3: 0.015697
> iteration 4: 0.015913
> iteration 5: 0.015894
> iteration 6: 0.016063
> iteration 7: 0.015928
> iteration 8: 0.015890
> iteration 9: 0.015278
> iteration 10: 0.015515
> iteration 11: 0.015327
> iteration 12: 0.015326
> iteration 13: 0.015574
> iteration 14: 0.015319
> iteration 15: 0.015370
> iteration 16: 0.015354
> iteration 17: 0.015333
> iteration 18: 0.015312
> iteration 19: 0.015481
> iteration 20: 0.015411
>
> emacs-27 master + patch "distance+10", ~10 overlays created by flycheck:
> iteration 1: 0.002938
> iteration 2: 0.001132
> iteration 3: 0.000550
> iteration 4: 0.000468
> iteration 5: 0.000470
> iteration 6: 0.000451
> iteration 7: 0.000449
> iteration 8: 0.000451
> iteration 9: 0.000450
> iteration 10: 0.000449
> iteration 11: 0.000448
> iteration 12: 0.000452
> iteration 13: 0.000451
> iteration 14: 0.000443
> iteration 15: 0.000448
> iteration 16: 0.000443
> iteration 17: 0.000444
> iteration 18: 0.000445
> iteration 19: 0.000445
> iteration 20: 0.000445
>
> emacs-27 master + patch "distance+10", ~6200 overlays created by
> (test-highlight):
> iteration 1: 0.019673
> iteration 2: 0.014469
> iteration 3: 0.014491
> iteration 4: 0.014430
> iteration 5: 0.014493
> iteration 6: 0.014704
> iteration 7: 0.014741
> iteration 8: 0.014536
> iteration 9: 0.014433
> iteration 10: 0.014469
> iteration 11: 0.014429
> iteration 12: 0.014509
> iteration 13: 0.014484
> iteration 14: 0.014487
> iteration 15: 0.014524
> iteration 16: 0.014449
> iteration 17: 0.014501
> iteration 18: 0.014469
> iteration 19: 0.014429
> iteration 20: 0.014507
>
> noverlay branch (#886933...), ~10 overlays:
> iteration 1: 0.002370
> iteration 2: 0.001191
> iteration 3: 0.001189
> iteration 4: 0.001162
> iteration 5: 0.001095
> iteration 6: 0.001096
> iteration 7: 0.000522
> iteration 8: 0.000531
> iteration 9: 0.000369
> iteration 10: 0.000365
> iteration 11: 0.000365
> iteration 12: 0.000366
> iteration 13: 0.000365
> iteration 14: 0.000380
> iteration 15: 0.000367
> iteration 16: 0.000365
> iteration 17: 0.000365
> iteration 18: 0.000365
> iteration 19: 0.000363
> iteration 20: 0.000368
>
> noverlay branch, ~6200 overlays:
> iteration 1: 0.001878
> iteration 2: 0.001831
> iteration 3: 0.001838
> iteration 4: 0.001826
> iteration 5: 0.001027
> iteration 6: 0.000722
> iteration 7: 0.000531
> iteration 8: 0.000479
> iteration 9: 0.000480
> iteration 10: 0.000479
> iteration 11: 0.000479
> iteration 12: 0.000479
> iteration 13: 0.000478
> iteration 14: 0.000479
> iteration 15: 0.000478
> iteration 16: 0.000478
> iteration 17: 0.000482
> iteration 18: 0.000479
> iteration 19: 0.000480
> iteration 20: 0.000476
>
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-20 1:23 ` Sebastian Sturm
2018-03-20 6:30 ` Eli Zaretskii
@ 2018-03-22 20:52 ` Stefan Monnier
2018-03-22 23:11 ` Sebastian Sturm
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-22 20:52 UTC (permalink / raw)
To: emacs-devel
> (defun benchmark-often ()
> (cl-loop for n from 1 upto 20 do
> (message (format "iteration %d: %f" n (nth 0 (benchmark-run
> (line-number-at-pos (point))))))))
^^^^^^^
Where is this "point" in your tests (I expect the timing to vary
significantly depending on this).
> 1st run:
> iteration 1: 0.001213
> iteration 2: 0.001170
> iteration 3: 0.001170
> iteration 4: 0.001238
> iteration 5: 0.001163
> iteration 6: 0.001153
> iteration 7: 0.000421
> iteration 8: 0.000426
> iteration 9: 0.000322
> iteration 10: 0.000301
> iteration 11: 0.000291
> iteration 12: 0.000292
> iteration 13: 0.000291
> iteration 14: 0.000291
> iteration 15: 0.000295
> iteration 16: 0.000289
> iteration 17: 0.000289
> iteration 18: 0.000288
> iteration 19: 0.000288
> iteration 20: 0.000287
I recommend you don't bother outputting all 20 results: better summarize
it by getting rid of the first test and then giving e.g. the sum or the
median of the rest.
> I'm not allowed to share my employer's source code as a test case, so
> I tried the same procedure with the similarly large DeclBase.h from the
> public LLVM repository. To my surprise, DeclBase.h didn't suffer from any
> performance issues at all.
My crystal ball tells me that DeclBase.h is pure ASCII so byte<->char
conversion is trivial, whereas your file likely contains umlauts and
other disreputable characters.
Here's a similar test case to yours but which builds up its own
artificial buffer with a few non-ascii chars to spice things up:
(with-temp-buffer
(dotimes (i 1000)
(insert "lksajflahalskjdféefawrgfrüegf\n"))
(let ((txtbuf (current-buffer)))
(dotimes (s 4)
(with-temp-buffer
(insert-buffer-substring txtbuf)
(let ((stepsize (lsh 10 (* 4 s))))
(cl-loop for n from (point-min) upto (- (point-max) stepsize)
by stepsize do
(let ((ov (make-overlay n (+ (1- stepsize) n))))
(overlay-put ov 'cquery-sem-highlight t))))
(dotimes (i 4)
(let ((timing
(benchmark-run 1000
(line-number-at-pos
(+ (point-min) (* i (/ (buffer-size) 4)))))))
(message "ols=%S pos=%S/4 time=%.4f (+ %S)"
(/ (buffer-size) (lsh 10 (* 4 s))) i
(car timing) (cdr timing)))
)))))
This gave me (on my top-of-the-line Thinkpad T61 using Debian's `emacs25`):
ols=3000 pos=0/4 time=0.0018 (+ (0 0.0))
ols=3000 pos=1/4 time=6.1074 (+ (0 0.0))
ols=3000 pos=2/4 time=10.6876 (+ (0 0.0))
ols=3000 pos=3/4 time=13.7854 (+ (0 0.0))
ols=187 pos=0/4 time=0.0016 (+ (0 0.0))
ols=187 pos=1/4 time=0.3055 (+ (0 0.0))
ols=187 pos=2/4 time=0.6001 (+ (0 0.0))
ols=187 pos=3/4 time=0.8903 (+ (0 0.0))
ols=11 pos=0/4 time=0.0015 (+ (0 0.0))
ols=11 pos=1/4 time=0.0769 (+ (1 0.006324223))
ols=11 pos=2/4 time=0.1439 (+ (0 0.0))
ols=11 pos=3/4 time=0.2215 (+ (0 0.0))
ols=0 pos=0/4 time=0.0015 (+ (0 0.0))
ols=0 pos=1/4 time=0.0548 (+ (0 0.0))
ols=0 pos=2/4 time=0.1102 (+ (0 0.0))
ols=0 pos=3/4 time=0.1690 (+ (0 0.0))
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-22 20:52 ` Stefan Monnier
@ 2018-03-22 23:11 ` Sebastian Sturm
2018-03-23 5:03 ` Stefan Monnier
2018-03-23 8:07 ` Eli Zaretskii
0 siblings, 2 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-22 23:11 UTC (permalink / raw)
To: emacs-devel
On 03/22/2018 09:52 PM, Stefan Monnier wrote:
>> (defun benchmark-often ()
>> (cl-loop for n from 1 upto 20 do
>> (message (format "iteration %d: %f" n (nth 0 (benchmark-run
>> (line-number-at-pos (point))))))))
> ^^^^^^^
> Where is this "point" in your tests (I expect the timing to vary
> significantly depending on this).
yes, these tests were all performed close to the very bottom of the file
as I knew the issue to get worse towards the buffer end
> My crystal ball tells me that DeclBase.h is pure ASCII so byte<->char
> conversion is trivial, whereas your file likely contains umlauts and
> other disreputable characters.
>
> Here's a similar test case to yours but which builds up its own
> artificial buffer with a few non-ascii chars to spice things up:
thank you! I'm very glad you could come up with a reproducible test
case, and it's true that my file contains two instances of the greek
letter "μ" that seem to cause this performance issue (though I was
surprised to see that this few characters could have such a poisonous
effect). Likewise, when replacing the topmost part in your benchmark
function with the following:
(A) (dotimes (i 1000)
(insert "pure ascii pure ascii pure ascii\n"))
(B) (dotimes (i 500)
(insert "pure ascii pure ascii pure ascii\n"))
(insert "μ")
(dotimes (i 500)
(insert "pure ascii pure ascii pure ascii\n"))
respectively, I obtain the following timing results:
(A)
ols=3300 pos=0/4 time=0.0014 (+ (0 0.0))
ols=3300 pos=1/4 time=0.0155 (+ (0 0.0))
ols=3300 pos=2/4 time=0.0281 (+ (0 0.0))
ols=3300 pos=3/4 time=0.0447 (+ (0 0.0))
ols=206 pos=0/4 time=0.0007 (+ (0 0.0))
ols=206 pos=1/4 time=0.0130 (+ (0 0.0))
ols=206 pos=2/4 time=0.0283 (+ (0 0.0))
ols=206 pos=3/4 time=0.0447 (+ (0 0.0))
ols=12 pos=0/4 time=0.0007 (+ (0 0.0))
ols=12 pos=1/4 time=0.0129 (+ (0 0.0))
ols=12 pos=2/4 time=0.0281 (+ (0 0.0))
ols=12 pos=3/4 time=0.0447 (+ (0 0.0))
ols=0 pos=0/4 time=0.0007 (+ (0 0.0))
ols=0 pos=1/4 time=0.0134 (+ (0 0.0))
ols=0 pos=2/4 time=0.0297 (+ (0 0.0))
ols=0 pos=3/4 time=0.0463 (+ (0 0.0))
(B)
ols=3300 pos=0/4 time=0.0007 (+ (0 0.0))
ols=3300 pos=1/4 time=0.0301 (+ (0 0.0))
ols=3300 pos=2/4 time=0.0482 (+ (0 0.0))
ols=3300 pos=3/4 time=8.0213 (+ (0 0.0))
ols=206 pos=0/4 time=0.0007 (+ (0 0.0))
ols=206 pos=1/4 time=0.0141 (+ (0 0.0))
ols=206 pos=2/4 time=0.0325 (+ (0 0.0))
ols=206 pos=3/4 time=0.1786 (+ (0 0.0))
ols=12 pos=0/4 time=0.0007 (+ (0 0.0))
ols=12 pos=1/4 time=0.0136 (+ (0 0.0))
ols=12 pos=2/4 time=0.0323 (+ (0 0.0))
ols=12 pos=3/4 time=0.0794 (+ (0 0.0))
ols=0 pos=0/4 time=0.0009 (+ (0 0.0))
ols=0 pos=1/4 time=0.0139 (+ (0 0.0))
ols=0 pos=2/4 time=0.0326 (+ (0 0.0))
ols=0 pos=3/4 time=0.0632 (+ (0 0.0))
by comparison, these are my results using the noverlay branch:
(A)
ols=3300 pos=0/4 time=0.0012 (+ (0 0.0))
ols=3300 pos=1/4 time=0.0132 (+ (0 0.0))
ols=3300 pos=2/4 time=0.0291 (+ (0 0.0))
ols=3300 pos=3/4 time=0.0448 (+ (0 0.0))
ols=206 pos=0/4 time=0.0007 (+ (0 0.0))
ols=206 pos=1/4 time=0.0132 (+ (0 0.0))
ols=206 pos=2/4 time=0.0290 (+ (0 0.0))
ols=206 pos=3/4 time=0.0454 (+ (0 0.0))
ols=12 pos=0/4 time=0.0008 (+ (0 0.0))
ols=12 pos=1/4 time=0.0131 (+ (0 0.0))
ols=12 pos=2/4 time=0.0287 (+ (0 0.0))
ols=12 pos=3/4 time=0.0452 (+ (0 0.0))
ols=0 pos=0/4 time=0.0007 (+ (0 0.0))
ols=0 pos=1/4 time=0.0131 (+ (0 0.0))
ols=0 pos=2/4 time=0.0289 (+ (0 0.0))
ols=0 pos=3/4 time=0.0457 (+ (0 0.0))
(B)
ols=3300 pos=0/4 time=0.0015 (+ (0 0.0))
ols=3300 pos=1/4 time=0.0177 (+ (0 0.0))
ols=3300 pos=2/4 time=0.0345 (+ (0 0.0))
ols=3300 pos=3/4 time=0.0544 (+ (0 0.0))
ols=206 pos=0/4 time=0.0008 (+ (0 0.0))
ols=206 pos=1/4 time=0.0136 (+ (0 0.0))
ols=206 pos=2/4 time=0.0317 (+ (0 0.0))
ols=206 pos=3/4 time=0.0537 (+ (0 0.0))
ols=12 pos=0/4 time=0.0007 (+ (0 0.0))
ols=12 pos=1/4 time=0.0135 (+ (0 0.0))
ols=12 pos=2/4 time=0.0319 (+ (0 0.0))
ols=12 pos=3/4 time=0.0537 (+ (0 0.0))
ols=0 pos=0/4 time=0.0007 (+ (0 0.0))
ols=0 pos=1/4 time=0.0146 (+ (0 0.0))
ols=0 pos=2/4 time=0.0318 (+ (0 0.0))
ols=0 pos=3/4 time=0.0554 (+ (0 0.0))
since noverlay performs so well, I guess the technical issue here is
already solved and I'll just have to wait for it to make it into the
master branch. Until then I'll continue using feature/noverlay, but if a
more recent merge with master was made available, I'd be interested in
testing that.
thanks again for all the helpful responses in this thread,
Sebastian
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-22 23:11 ` Sebastian Sturm
@ 2018-03-23 5:03 ` Stefan Monnier
2018-03-23 12:25 ` Sebastian Sturm
2018-03-23 8:07 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 5:03 UTC (permalink / raw)
To: emacs-devel
> since noverlay performs so well, I guess the technical issue here is already
> solved and I'll just have to wait for it to make it into the master
> branch.
Yes and no: there can still be many markers (even without having any
overlays) and that poses the same problem (and the noverlays branch
doesn't change that).
I've made some further tests and found that the patch I sent indeed
doesn't help tremendously with "distance += 10" but that when we reach
"+= 50" the effect is more significant.
Could you try the patch below (ideally not just with artificial tests
but also in actual use) to see if it helps?
Stefan
diff --git a/src/marker.c b/src/marker.c
index 7773c4fce0..6ab0d3d61e 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x)
CHECK_TYPE (MARKERP (x), Qmarkerp, x);
}
+/* When converting bytes from/to chars, we look through the list of
+ markers to try and find a good starting point (since markers keep
+ track of both bytepos and charpos at the same time).
+ But if there are many markers, it can take too much time to find a "good"
+ marker from which to start. Worse yet: if it takes a long time and we end
+ up finding a nearby markers, we won't add a new marker to cache this
+ result, so next time around we'll have to go through this same long list
+ to (re)find this best marker. So the further down the list of
+ markers we go, the less demanding we are w.r.t what is a good marker.
+
+ The previous code used INITIAL=50 and INCREMENT=0 and this lead to
+ really poor performance when there are many markers.
+ I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad
+ T61 using various artificial test cases seem to suggest that INCREMENT=50
+ might be "the best compromise": it significantly improved the
+ worst case and it was rarely slower and never by much.
+
+ The asymptotic behavior is still poor, tho, so in largish buffers with many
+ overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */
+#define BYTECHAR_DISTANCE_INITIAL 50
+#define BYTECHAR_DISTANCE_INCREMENT 50
+
/* Return the byte position corresponding to CHARPOS in B. */
ptrdiff_t
@@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
@@ -180,8 +203,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - best_below < distance)
break;
+ else
+ distance += BYTECHAR_DISTANCE_INCREMENT;
}
/* We get here if we did not exactly hit one of the known places.
@@ -293,6 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
@@ -323,8 +349,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - best_below < distance)
break;
+ else
+ distance += BYTECHAR_DISTANCE_INCREMENT;
}
/* We get here if we did not exactly hit one of the known places.
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-22 23:11 ` Sebastian Sturm
2018-03-23 5:03 ` Stefan Monnier
@ 2018-03-23 8:07 ` Eli Zaretskii
2018-03-23 9:08 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-23 8:07 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Fri, 23 Mar 2018 00:11:16 +0100
>
> since noverlay performs so well, I guess the technical issue here is
> already solved and I'll just have to wait for it to make it into the
> master branch.
As Stefan points out, the overlays are not the reason. The reason are
the number of markers in the buffer; each overlay defines 2 markers.
And there could be many markers in a buffer even if there are no
overlays.
Btw, why do you have so many overlays in these buffers? Is this part
of lsp-mode implementation, or is the reason unrelated?
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 8:07 ` Eli Zaretskii
@ 2018-03-23 9:08 ` Eli Zaretskii
2018-03-23 10:15 ` Sebastian Sturm
2018-03-23 12:12 ` Stefan Monnier
0 siblings, 2 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-23 9:08 UTC (permalink / raw)
To: s.sturm; +Cc: emacs-devel
> Date: Fri, 23 Mar 2018 11:07:26 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: emacs-devel@gnu.org
>
> Btw, why do you have so many overlays in these buffers? Is this part
> of lsp-mode implementation, or is the reason unrelated?
Also, what about my suggestion to count lines in a relative manner,
using count-lines from a line with a known number? You never replied
to that suggestion.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 9:08 ` Eli Zaretskii
@ 2018-03-23 10:15 ` Sebastian Sturm
2018-03-23 12:39 ` Eli Zaretskii
2018-03-23 12:12 ` Stefan Monnier
1 sibling, 1 reply; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-23 10:15 UTC (permalink / raw)
To: emacs-devel
On 03/23/2018 10:08 AM, Eli Zaretskii wrote:
>> Date: Fri, 23 Mar 2018 11:07:26 +0300
>> From: Eli Zaretskii <eliz@gnu.org>
>> Cc: emacs-devel@gnu.org
>>
>> Btw, why do you have so many overlays in these buffers? Is this part
>> of lsp-mode implementation, or is the reason unrelated?
this is not related to lsp-mode itself, but to the semantic highlighter
implemented by emacs-cquery (which is built on top of lsp-mode but
implements additional features offered by the cquery backend). When set
to use overlays (which provides a better visual experience than
font-lock, as font-lock tends to get out of sync with the buffer during
editing), the semantic highlighter retrieves a list of symbols within
the current buffer and creates overlays to provide semantically
meaningful syntax highlighting. It's not a feature I couldn't live
without, but it's very precise and in principle could probably be faster
than the syntax highlighting provided by cc-mode as C++ parsing is
handled asynchronously by the clang-based native backend.
> Also, what about my suggestion to count lines in a relative manner,
> using count-lines from a line with a known number? You never replied
> to that suggestion.
you're right, sorry. In my opinion, a caching mechanism might be a very
useful thing to have if it provides further performance benefits on top
of what the noverlay branch has to offer. However, since count-lines may
not be the only function that has to convert between char and byte
positions (or is it?), and since the noverlay branch seems to resolve
the overlay issue without having to introduce additional complexity in
the elisp layer, implementing a caching mechanism before noverlay is
merged into the master branch seems like a premature optimization to me.
Of course this is a layman's opinion (and maybe the case of "few
overlays but many markers" is not as pathological as it appears to me);
if you think a line number cache should be implemented, I'll go and
discuss that with the lsp-mode maintainers (assuming that they are among
the heaviest users of line-number-at-pos).
Sebastian
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 9:08 ` Eli Zaretskii
2018-03-23 10:15 ` Sebastian Sturm
@ 2018-03-23 12:12 ` Stefan Monnier
2018-03-23 12:40 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 12:12 UTC (permalink / raw)
To: emacs-devel
> Also, what about my suggestion to count lines in a relative manner,
> using count-lines from a line with a known number? You never replied
> to that suggestion.
FWIW, I believe in his case most of the time is spent outside of the
actual "count the lines" (aka forward-line) code, so counting fewer
lines because we start from a more nearby position probably won't help
very much.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 5:03 ` Stefan Monnier
@ 2018-03-23 12:25 ` Sebastian Sturm
2018-03-23 12:47 ` Eli Zaretskii
0 siblings, 1 reply; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-23 12:25 UTC (permalink / raw)
To: emacs-devel
I haven't tested this very extensively yet, but artificial benchmark
results are now comparable to the noverlay branch and editing seems
similarly fluid. Many thanks for that!
On a tangential note, removing the bottleneck now brought up another
hotspot in my performance profile (less severe than the one previously
reported) that seems to be due to cc-mode itself. Since I don't want to
derail this thread, I'll try to implement some of the suggestions made
in the "Latency profiling?" thread and come up with some reproducible
data on that other issue.
On 03/23/2018 06:03 AM, Stefan Monnier wrote:
>> since noverlay performs so well, I guess the technical issue here is already
>> solved and I'll just have to wait for it to make it into the master
>> branch.
>
> Yes and no: there can still be many markers (even without having any
> overlays) and that poses the same problem (and the noverlays branch
> doesn't change that).
>
> I've made some further tests and found that the patch I sent indeed
> doesn't help tremendously with "distance += 10" but that when we reach
> "+= 50" the effect is more significant.
>
> Could you try the patch below (ideally not just with artificial tests
> but also in actual use) to see if it helps?
>
>
> Stefan
>
>
> diff --git a/src/marker.c b/src/marker.c
> index 7773c4fce0..6ab0d3d61e 100644
> --- a/src/marker.c
> +++ b/src/marker.c
> @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x)
> CHECK_TYPE (MARKERP (x), Qmarkerp, x);
> }
>
> +/* When converting bytes from/to chars, we look through the list of
> + markers to try and find a good starting point (since markers keep
> + track of both bytepos and charpos at the same time).
> + But if there are many markers, it can take too much time to find a "good"
> + marker from which to start. Worse yet: if it takes a long time and we end
> + up finding a nearby markers, we won't add a new marker to cache this
> + result, so next time around we'll have to go through this same long list
> + to (re)find this best marker. So the further down the list of
> + markers we go, the less demanding we are w.r.t what is a good marker.
> +
> + The previous code used INITIAL=50 and INCREMENT=0 and this lead to
> + really poor performance when there are many markers.
> + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad
> + T61 using various artificial test cases seem to suggest that INCREMENT=50
> + might be "the best compromise": it significantly improved the
> + worst case and it was rarely slower and never by much.
> +
> + The asymptotic behavior is still poor, tho, so in largish buffers with many
> + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */
> +#define BYTECHAR_DISTANCE_INITIAL 50
> +#define BYTECHAR_DISTANCE_INCREMENT 50
> +
> /* Return the byte position corresponding to CHARPOS in B. */
>
> ptrdiff_t
> @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
> struct Lisp_Marker *tail;
> ptrdiff_t best_above, best_above_byte;
> ptrdiff_t best_below, best_below_byte;
> + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
>
> eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
>
> @@ -180,8 +203,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
> /* If we are down to a range of 50 chars,
> don't bother checking any other markers;
> scan the intervening chars directly now. */
> - if (best_above - best_below < 50)
> + if (best_above - best_below < distance)
> break;
> + else
> + distance += BYTECHAR_DISTANCE_INCREMENT;
> }
>
> /* We get here if we did not exactly hit one of the known places.
> @@ -293,6 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
> struct Lisp_Marker *tail;
> ptrdiff_t best_above, best_above_byte;
> ptrdiff_t best_below, best_below_byte;
> + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
>
> eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
>
> @@ -323,8 +349,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
> /* If we are down to a range of 50 chars,
> don't bother checking any other markers;
> scan the intervening chars directly now. */
> - if (best_above - best_below < 50)
> + if (best_above - best_below < distance)
> break;
> + else
> + distance += BYTECHAR_DISTANCE_INCREMENT;
> }
>
> /* We get here if we did not exactly hit one of the known places.
>
>
--
Sebastian Sturm
Research & Development
Phone: +49 (0) 6155 7808977
Fax: +49 (0) 6155 7802880
Email: s.sturm@arkona-technologies.de
Web: www.arkona-technologies.de
arkona technologies GmbH
Im Leuschnerpark 4
64347 Griesheim
Germany
Amtsgericht / Commercial Register of Darmstadt, HRB 90080
USt-ID: DE273794666
Steuernummer / Tax-ID: 007 / 228 / 19331
Geschäftsführung / Managing Director: Rainer Sturm
This e-mail may contain confidential and/or privileged information. If
you are not the intended recipient (or have received this e-mail in
error) please notify the sender immediately and delete this message and
any attachments from your system. Any unauthorised copying, disclosure
or distribution of the material in this e-mail is strictly forbidden.
We cannot accept any responsibility for the accuracy or completeness of
this message as it has been transmitted over a public network. If,
despite our use of anti-virus software, a virus enters your systems in
connection with the sending of the e-mail, you may not hold us liable
for any damages that may possibly arise in that connection. We will
accept liability which by law we cannot exclude.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 10:15 ` Sebastian Sturm
@ 2018-03-23 12:39 ` Eli Zaretskii
0 siblings, 0 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-23 12:39 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Fri, 23 Mar 2018 11:15:48 +0100
>
> > Also, what about my suggestion to count lines in a relative manner,
> > using count-lines from a line with a known number? You never replied
> > to that suggestion.
>
> you're right, sorry. In my opinion, a caching mechanism might be a very
> useful thing to have if it provides further performance benefits on top
> of what the noverlay branch has to offer. However, since count-lines may
> not be the only function that has to convert between char and byte
> positions (or is it?), and since the noverlay branch seems to resolve
> the overlay issue without having to introduce additional complexity in
> the elisp layer, implementing a caching mechanism before noverlay is
> merged into the master branch seems like a premature optimization to me.
It isn't premature optimization, because a buffer could have many
markers even if it has no or only a few overlays.
> Of course this is a layman's opinion (and maybe the case of "few
> overlays but many markers" is not as pathological as it appears to me);
> if you think a line number cache should be implemented, I'll go and
> discuss that with the lsp-mode maintainers (assuming that they are among
> the heaviest users of line-number-at-pos).
I think the effect should be at least measured before the decision
whether to do that is made.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 12:12 ` Stefan Monnier
@ 2018-03-23 12:40 ` Eli Zaretskii
2018-03-23 12:55 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-23 12:40 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 23 Mar 2018 08:12:41 -0400
>
> > Also, what about my suggestion to count lines in a relative manner,
> > using count-lines from a line with a known number? You never replied
> > to that suggestion.
>
> FWIW, I believe in his case most of the time is spent outside of the
> actual "count the lines" (aka forward-line) code, so counting fewer
> lines because we start from a more nearby position probably won't help
> very much.
I'm not sure, because the profile indicates memrchr is a significant
runner-up in the CPU time usage.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 12:25 ` Sebastian Sturm
@ 2018-03-23 12:47 ` Eli Zaretskii
2018-03-23 13:19 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-23 12:47 UTC (permalink / raw)
To: Sebastian Sturm; +Cc: emacs-devel
> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Fri, 23 Mar 2018 13:25:19 +0100
>
> I haven't tested this very extensively yet, but artificial benchmark
> results are now comparable to the noverlay branch and editing seems
> similarly fluid. Many thanks for that!
I'd be interested to see a comparison with a code that ignores the
markers entirely, and uses just these 4:
CONSIDER (BUF_PT (b), BUF_PT_BYTE (b));
CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b));
CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b));
CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b));
That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of
the number of markers.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 12:40 ` Eli Zaretskii
@ 2018-03-23 12:55 ` Stefan Monnier
0 siblings, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 12:55 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> I'm not sure, because the profile indicates memrchr is a significant
> runner-up in the CPU time usage.
Oh, indeed I wasn't clear: it won't solve the "too many markers" case,
but it will still be useful, especially in the case where there aren't
too many markers.
My experience with nlinum--line-number-at-pos is that it's a low-hanging
fruit. If the solution can be implemented directly inside
line-number-at-pos it'd be even better.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 12:47 ` Eli Zaretskii
@ 2018-03-23 13:19 ` Stefan Monnier
2018-03-23 13:37 ` Noam Postavsky
2018-03-23 14:22 ` Eli Zaretskii
0 siblings, 2 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 13:19 UTC (permalink / raw)
To: emacs-devel
> I'd be interested to see a comparison with a code that ignores the
> markers entirely, and uses just these 4:
>
> CONSIDER (BUF_PT (b), BUF_PT_BYTE (b));
> CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b));
> CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b));
> CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b));
I tried that, and in the synthetic benchmark which tries to reproduce
Sebastian's lsp-mode situation the result was indeed much better, but
then in other benchmarks it caused very significant slowdowns.
> That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of
> the number of markers.
I haven't investigated closely enough to be sure, but in my tests with
INCREMENT=50 I reached points where adding more overlays did not make
things worse any more, so I suspect that the ideal INCREMENT depends on
the buffer size more than on the number of markers.
In any case, my patch is just a "quick hack" to try and reduce the pain,
but it doesn't really solve the problem: the slow down with many markers
and a large buffer still grows pretty significantly with the size of
the buffer. I think it's worse than O(sqrt N).
If we want to really solve this problem, we should use an algorithm with
at most an O(log N) complexity, e.g. keeping the markers in
a red-black tree, or inside a sorted array (probably with a gap like we
have for the buffer text) so we can do a binary search.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 13:19 ` Stefan Monnier
@ 2018-03-23 13:37 ` Noam Postavsky
2018-03-23 13:55 ` Stefan Monnier
2018-03-23 14:22 ` Eli Zaretskii
1 sibling, 1 reply; 54+ messages in thread
From: Noam Postavsky @ 2018-03-23 13:37 UTC (permalink / raw)
To: Stefan Monnier; +Cc: Emacs developers
On 23 March 2018 at 09:19, Stefan Monnier <monnier@iro.umontreal.ca> wrote:
> In any case, my patch is just a "quick hack" to try and reduce the pain,
> but it doesn't really solve the problem: the slow down with many markers
> and a large buffer still grows pretty significantly with the size of
> the buffer. I think it's worse than O(sqrt N).
>
> If we want to really solve this problem, we should use an algorithm with
> at most an O(log N) complexity, e.g. keeping the markers in
> a red-black tree, or inside a sorted array (probably with a gap like we
> have for the buffer text) so we can do a binary search.
Is this related to Bug#24548 "Long GC delays with many non-detached
markers (PATCH)"
aka Bug#29439 "Quadratic complexity in sweep_markers"?
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24548
https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29439
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 13:37 ` Noam Postavsky
@ 2018-03-23 13:55 ` Stefan Monnier
0 siblings, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 13:55 UTC (permalink / raw)
To: Noam Postavsky; +Cc: Emacs developers
>> If we want to really solve this problem, we should use an algorithm with
>> at most an O(log N) complexity, e.g. keeping the markers in
>> a red-black tree, or inside a sorted array (probably with a gap like we
>> have for the buffer text) so we can do a binary search.
>
> Is this related to Bug#24548 "Long GC delays with many non-detached
> markers (PATCH)"
> aka Bug#29439 "Quadratic complexity in sweep_markers"?
Not directly, no,
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 13:19 ` Stefan Monnier
2018-03-23 13:37 ` Noam Postavsky
@ 2018-03-23 14:22 ` Eli Zaretskii
2018-03-23 14:39 ` Stefan Monnier
2018-03-23 19:39 ` Stefan Monnier
1 sibling, 2 replies; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-23 14:22 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Fri, 23 Mar 2018 09:19:11 -0400
>
> > I'd be interested to see a comparison with a code that ignores the
> > markers entirely, and uses just these 4:
> >
> > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b));
> > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b));
> > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b));
> > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b));
>
> I tried that, and in the synthetic benchmark which tries to reproduce
> Sebastian's lsp-mode situation the result was indeed much better, but
> then in other benchmarks it caused very significant slowdowns.
In what benchmarks did it cause significant slowdowns?
> > That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of
> > the number of markers.
>
> I haven't investigated closely enough to be sure, but in my tests with
> INCREMENT=50 I reached points where adding more overlays did not make
> things worse any more, so I suspect that the ideal INCREMENT depends on
> the buffer size more than on the number of markers.
Could be. My point was that it isn't a constant.
> If we want to really solve this problem, we should use an algorithm with
> at most an O(log N) complexity, e.g. keeping the markers in
> a red-black tree, or inside a sorted array (probably with a gap like we
> have for the buffer text) so we can do a binary search.
Yes, agreed.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 14:22 ` Eli Zaretskii
@ 2018-03-23 14:39 ` Stefan Monnier
2018-03-23 19:39 ` Stefan Monnier
1 sibling, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 14:39 UTC (permalink / raw)
To: emacs-devel
>> > I'd be interested to see a comparison with a code that ignores the
>> > markers entirely, and uses just these 4:
>> > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b));
>> > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b));
>> > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b));
>> > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b));
>> I tried that, and in the synthetic benchmark which tries to reproduce
>> Sebastian's lsp-mode situation the result was indeed much better, but
>> then in other benchmarks it caused very significant slowdowns.
> In what benchmarks did it cause significant slowdowns?
Can't remember exactly, I think it was bad enough for a test case
which seemed pretty realistic so I discarded that option.
Basically what I remember is that I got the impression that it would
probably harm more users than the problem at hand.
Stefan
PS: BTW, the number of markers is not the only issue: the order in which
they are created also matters. Maybe we should try Sebastian's test
case but creating the markers/overlays in random order (and if that
helps, we could get a similar effect by making the GC randomly shuffle
the buffers's list of markers ;-).
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 14:22 ` Eli Zaretskii
2018-03-23 14:39 ` Stefan Monnier
@ 2018-03-23 19:39 ` Stefan Monnier
2018-03-25 15:11 ` Stefan Monnier
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-23 19:39 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
OK, I think I'm tired of these experiments.
Here's my current test, along with the patches I use with it.
You can test it with something like
src/emacs -Q --batch -l ~/tmp/foo.el --eval '(setq internal--bytechar-distance-increment 50 internal--randomize-markers t)' --eval '(bytechar-test 3000 nil)'
Shuffling the markers can make a noticeable difference in some cases but
we're only talking about a factor of 2 or 3. It doesn't have much
negative impact, so it's not a bad option, but the algorithmic
problem remains anyway.
Stefan
(defun bytechar-test (buffer-kb &optional forward)
(random "seed")
(with-temp-buffer
(dotimes (i (* buffer-kb 33))
(insert "lksajflahalskjdféefawrgfrüegf\n"))
(message "buffer-size = %SkB" (/ (buffer-size) 1024))
(let ((txtbuf (current-buffer))
(goto-iterations (/ 10000000 buffer-kb))
(line-iterations (/ 20000 buffer-kb))
(markers ()))
(dotimes (s 4)
(with-temp-buffer
(insert-buffer-substring txtbuf)
(let ((stepsize (lsh 10 (* 4 s))))
(if forward
(cl-loop for n from (point-min) upto (point-max) by stepsize do
(push (copy-marker n) markers))
(cl-loop for n from (point-max) downto (point-min) by stepsize do
(push (copy-marker n) markers))))
;; The GC's internal--randomize-markers just brings-to-front every
;; 8th marker, so when starting with an ordered list of markers (like
;; in our case), we need to run the GC at least 8 times before the
;; whole list starts to look somewhat shuffled.
(dotimes (i 20) (garbage-collect))
(let ((timing
(benchmark-run goto-iterations
(goto-char (+ (point-min) (random (buffer-size)))))))
(message "ols=%S goto-random time=%.4f (+ %S)"
(/ (buffer-size) (lsh 10 (* 4 s)))
(car timing) (cdr timing)))
(garbage-collect) ;throw away the transient markers
(let ((timing
(benchmark-run line-iterations
(dotimes (i 5)
(line-number-at-pos
(+ (point-min) (* i (/ (buffer-size) 4))))))))
(message "nbmks=%S pos=*/4 time=%.4f (+ %S)"
(/ (buffer-size) (lsh 10 (* 4 s)))
(car timing) (cdr timing)))
(dotimes (i 5)
(let ((timing
(benchmark-run line-iterations
(line-number-at-pos
(+ (point-min) (* i (/ (buffer-size) 4)))))))
(message "nbmks=%S pos=%S/4 time=%.4f (+ %S)"
(/ (buffer-size) (lsh 10 (* 4 s))) i
(car timing) (cdr timing))))
)))))
diff --git a/lisp/emacs-lisp/benchmark.el b/lisp/emacs-lisp/benchmark.el
index b86b56b81e..2f4e38fe35 100644
--- a/lisp/emacs-lisp/benchmark.el
+++ b/lisp/emacs-lisp/benchmark.el
@@ -50,7 +50,7 @@ benchmark-run
garbage collections that ran, and the time taken by garbage collection.
See also `benchmark-run-compiled'."
(declare (indent 1) (debug t))
- (unless (natnump repetitions)
+ (unless (or (natnump repetitions) (symbolp repetitions))
(setq forms (cons repetitions forms)
repetitions 1))
(let ((i (make-symbol "i"))
@@ -58,7 +58,7 @@ benchmark-run
(gc (make-symbol "gc")))
`(let ((,gc gc-elapsed)
(,gcs gcs-done))
- (list ,(if (> repetitions 1)
+ (list ,(if (or (symbolp repetitions) (> repetitions 1))
;; Take account of the loop overhead.
`(- (benchmark-elapse (dotimes (,i ,repetitions)
,@forms))
@@ -101,7 +101,7 @@ benchmark
For non-interactive use see also `benchmark-run' and
`benchmark-run-compiled'."
(interactive "p\nxForm: ")
- (let ((result (eval `(benchmark-run ,repetitions ,form))))
+ (let ((result (eval `(benchmark-run ,repetitions ,form) t)))
(if (zerop (nth 1 result))
(message "Elapsed time: %fs" (car result))
(message "Elapsed time: %fs (%fs in %d GCs)" (car result)
diff --git a/src/alloc.c b/src/alloc.c
index 7ba872aaee..16d11e34cd 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -7270,10 +7270,22 @@ static void
unchain_dead_markers (struct buffer *buffer)
{
struct Lisp_Marker *this, **prev = &BUF_MARKERS (buffer);
+ /* In order to try and avoid worst case behaviors in buf_charpos_to_bytepos
+ we try and randomize the order of markers here. */
+ unsigned i = 4;
while ((this = *prev))
if (this->gcmarkbit)
- prev = &this->next;
+ {
+ if (!randomize_markers || i++ % 8)
+ prev = &this->next;
+ else
+ { /* Move this one to front, just to randomize things a bit. */
+ *prev = this->next;
+ this->next = BUF_MARKERS (buffer);
+ BUF_MARKERS (buffer) = this;
+ }
+ }
else
{
this->buffer = NULL;
@@ -7752,6 +7764,9 @@ The time is in seconds as a floating point value. */);
DEFVAR_INT ("gcs-done", gcs_done,
doc: /* Accumulated number of garbage collections done. */);
+ DEFVAR_BOOL ("internal--randomize-markers", randomize_markers, doc: /* */);
+ randomize_markers = true;
+
defsubr (&Scons);
defsubr (&Slist);
defsubr (&Svector);
diff --git a/src/marker.c b/src/marker.c
index 3d808fd6fa..7c1d164927 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x)
CHECK_TYPE (MARKERP (x), Qmarkerp, x);
}
+/* When converting bytes from/to chars, we look through the list of
+ markers to try and find a good starting point (since markers keep
+ track of both bytepos and charpos at the same time).
+ But if there are many markers, it can take too much time to find a "good"
+ marker from which to start. Worse yet: if it takes a long time and we end
+ up finding a nearby markers, we won't add a new marker to cache this
+ result, so next time around we'll have to go through this same long list
+ to (re)find this best marker. So the further down the list of
+ markers we go, the less demanding we are w.r.t what is a good marker.
+
+ The previous code used INITIAL=50 and INCREMENT=0 and this lead to
+ really poor performance when there are many markers.
+ I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad
+ T61 using various artificial test cases seem to suggest that INCREMENT=50
+ might be "the best compromise": it significantly improved the
+ worst case and it was rarely slower and never by much.
+
+ The asymptotic behavior is still poor, tho, so in largish buffers with many
+ overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */
+#define BYTECHAR_DISTANCE_INITIAL 50
+#define BYTECHAR_DISTANCE_INCREMENT bytechar_distance_increment
+
/* Return the byte position corresponding to CHARPOS in B. */
ptrdiff_t
@@ -141,7 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
- ptrdiff_t distance = 50;
+ ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
@@ -184,7 +206,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
if (best_above - best_below < distance)
break;
else
- distance++;
+ distance += BYTECHAR_DISTANCE_INCREMENT;
}
/* We get here if we did not exactly hit one of the known places.
@@ -296,7 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
- ptrdiff_t distance = 50;
+ ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
@@ -330,7 +352,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
if (best_above - best_below < distance)
break;
else
- distance++;
+ distance += BYTECHAR_DISTANCE_INCREMENT;
}
/* We get here if we did not exactly hit one of the known places.
@@ -756,4 +778,9 @@ syms_of_marker (void)
defsubr (&Scopy_marker);
defsubr (&Smarker_insertion_type);
defsubr (&Sset_marker_insertion_type);
+
+ DEFVAR_INT ("internal--bytechar-distance-increment",
+ bytechar_distance_increment,
+ doc: /* Haha */);
+ bytechar_distance_increment = 50;
}
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-23 19:39 ` Stefan Monnier
@ 2018-03-25 15:11 ` Stefan Monnier
2018-03-25 16:39 ` Eli Zaretskii
0 siblings, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-25 15:11 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> OK, I think I'm tired of these experiments.
Taking a break finally let me get rid of my blinders so I could see
more clearly what's going on:
All this time is really spent in find_newline (rather than in things
like goto-char). The problem goes as follows:
A - (line-number-at-pos POS) will call find_newline, making it scan all
positions between point-min and POS.
B1- if find_newline uses the newline cache, at each newline encountered it
will call buf_charpos_to_bytepos.
B2- if find_newline does not use the newline cache, at each newline
encountered it will call buf_bytepos_to_charpos.
- Each time buf_charpos_to_bytepos/buf_bytepos_to_charpos is called
it will "CONSIDER" the known positions at point-min, point-max, and at
the position of the last-call (i.e. at the previous newline).
Then it will loop through all the markers until the distance between
the nearest position before POS and the nearest position after POS
are within 50 of each other.
C - since one of the known positions is the one of the last newline, the
distance between the nearest position and POS (before considering any
marker) is usually pretty small: the length of the current line.
It's even often smaller than 50. But that doesn't let us stop because
the marker loop doesn't pay attention to the distance to POS in order
to decide to stop: it only consider the distance between the current upper
and lower bound and since we're scanning in one direction, all the
recently seen positions (added as markers) are on the same side of
POS as the last seen newline so they don't help.
So there are various ways to solve this problem. So far, I tried to
make the markers-loop give up earlier, which helps (C) to some extent but
without attacking the core of its problem which is to pay attention to
the case where one of the bounds is already very close to POS even tho
the other is still way off.
The patch below tweaks my previous patch to take this into account.
The result is now that my test cases stay fast (mostly unaffected by the
number of markers) even for large buffers with a large number of markers.
Note that the above shows that there are other optimisations which would
also solve this problem (and would be worthwhile independently).
A - change line-number-at-pos so it doesn't always rescan all the way
from point-min. This would really circumvent the whole problem
(contrarily to what I thought before with my blinders on, thanks Eli
for insisting on that).
B - change find_newline so it doesn't call
buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline.
E.g. in the no-newline-cache case it'd probably be faster to
loop through each char using INC/DEC_BOTH and checking if we're at
\n, than the current code which uses mem(r)chr to look for the
bytepos of the next \n and then calls buf_bytepos_to_charpos to get
the corresponding charpos. Alternatively, we could just delay the
computation of the charpos until the end (currently we update it
at each newline, for the purpose of filling the newline-cache).
-- Stefan
diff --git a/src/marker.c b/src/marker.c
index 7773c4fce0..f869b3f948 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x)
CHECK_TYPE (MARKERP (x), Qmarkerp, x);
}
+/* When converting bytes from/to chars, we look through the list of
+ markers to try and find a good starting point (since markers keep
+ track of both bytepos and charpos at the same time).
+ But if there are many markers, it can take too much time to find a "good"
+ marker from which to start. Worse yet: if it takes a long time and we end
+ up finding a nearby markers, we won't add a new marker to cache this
+ result, so next time around we'll have to go through this same long list
+ to (re)find this best marker. So the further down the list of
+ markers we go, the less demanding we are w.r.t what is a good marker.
+
+ The previous code used INITIAL=50 and INCREMENT=0 and this lead to
+ really poor performance when there are many markers.
+ I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad
+ T61 using various artificial test cases seem to suggest that INCREMENT=50
+ might be "the best compromise": it significantly improved the
+ worst case and it was rarely slower and never by much.
+
+ The asymptotic behavior is still poor, tho, so in largish buffers with many
+ overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */
+#define BYTECHAR_DISTANCE_INITIAL 50
+#define BYTECHAR_DISTANCE_INCREMENT 50
+
/* Return the byte position corresponding to CHARPOS in B. */
ptrdiff_t
@@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
@@ -180,8 +203,11 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - charpos < distance
+ || charpos - best_below < distance)
break;
+ else
+ distance += BYTECHAR_DISTANCE_INCREMENT;
}
/* We get here if we did not exactly hit one of the known places.
@@ -293,6 +319,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
struct Lisp_Marker *tail;
ptrdiff_t best_above, best_above_byte;
ptrdiff_t best_below, best_below_byte;
+ ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
@@ -323,8 +350,11 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
/* If we are down to a range of 50 chars,
don't bother checking any other markers;
scan the intervening chars directly now. */
- if (best_above - best_below < 50)
+ if (best_above - bytepos < distance
+ || bytepos - best_below < distance)
break;
+ else
+ distance += BYTECHAR_DISTANCE_INCREMENT;
}
/* We get here if we did not exactly hit one of the known places.
^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-25 15:11 ` Stefan Monnier
@ 2018-03-25 16:39 ` Eli Zaretskii
2018-03-25 17:35 ` Stefan Monnier
0 siblings, 1 reply; 54+ messages in thread
From: Eli Zaretskii @ 2018-03-25 16:39 UTC (permalink / raw)
To: Stefan Monnier; +Cc: emacs-devel
> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Sun, 25 Mar 2018 11:11:14 -0400
>
> B1- if find_newline uses the newline cache, at each newline encountered it
> will call buf_charpos_to_bytepos.
> B2- if find_newline does not use the newline cache, at each newline
> encountered it will call buf_bytepos_to_charpos.
Agree with B1, but not with B2. Unless I'm overlooking something,
when the newline cache is disabled, we use memchr, which can search a
contiguous sequence of bytes in a loop, without translating
byte-to-character positions. It only needs this translation at the
beginning of search, after hitting the gap, and when the search is
completed.
> So there are various ways to solve this problem. So far, I tried to
> make the markers-loop give up earlier, which helps (C) to some extent but
> without attacking the core of its problem which is to pay attention to
> the case where one of the bounds is already very close to POS even tho
> the other is still way off.
>
> The patch below tweaks my previous patch to take this into account.
> The result is now that my test cases stay fast (mostly unaffected by the
> number of markers) even for large buffers with a large number of markers.
This is a good change, I think. But it emphasizes even more the fact
that if we instead expose to Lisp display_count_lines, which is
basically a stripped-down version of find_newline with all the
unnecessary ballast removed, we will get an even better performance in
applications that need to count lines a lot.
> Note that the above shows that there are other optimisations which would
> also solve this problem (and would be worthwhile independently).
> A - change line-number-at-pos so it doesn't always rescan all the way
> from point-min. This would really circumvent the whole problem
> (contrarily to what I thought before with my blinders on, thanks Eli
> for insisting on that).
> B - change find_newline so it doesn't call
> buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline.
> E.g. in the no-newline-cache case it'd probably be faster to
> loop through each char using INC/DEC_BOTH and checking if we're at
> \n, than the current code which uses mem(r)chr to look for the
> bytepos of the next \n and then calls buf_bytepos_to_charpos to get
> the corresponding charpos. Alternatively, we could just delay the
> computation of the charpos until the end (currently we update it
> at each newline, for the purpose of filling the newline-cache).
I think we need not touch find_newline. It is a very frequently used
workhorse, and needs to produce decent performance for every one of
its callers. By contrast, applications whose primary need is to count
lines, let alone do that _thousands_ of times per keystroke, should
have a dedicated function optimized for that job alone. It's not a
coincidence the native line-number display uses display_count_lines ;-)
Thanks.
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-25 16:39 ` Eli Zaretskii
@ 2018-03-25 17:35 ` Stefan Monnier
0 siblings, 0 replies; 54+ messages in thread
From: Stefan Monnier @ 2018-03-25 17:35 UTC (permalink / raw)
To: Eli Zaretskii; +Cc: emacs-devel
> Agree with B1, but not with B2. Unless I'm overlooking something,
> when the newline cache is disabled, we use memchr, which can search a
> contiguous sequence of bytes in a loop, without translating
> byte-to-character positions. It only needs this translation at the
> beginning of search, after hitting the gap, and when the search is
> completed.
Hmm... indeed, when the newline cache is completely disabled the problem should
not appear.
>> The patch below tweaks my previous patch to take this into account.
>> The result is now that my test cases stay fast (mostly unaffected by the
>> number of markers) even for large buffers with a large number of markers.
> This is a good change, I think. But it emphasizes even more the fact
> that if we instead expose to Lisp display_count_lines, which is
> basically a stripped-down version of find_newline with all the
> unnecessary ballast removed, we will get an even better performance in
> applications that need to count lines a lot.
Yes, fixing A is definitely worthwhile.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-18 20:14 Sebastian Sturm
2018-03-18 20:39 ` Eli Zaretskii
@ 2018-03-26 13:06 ` Stefan Monnier
2018-03-27 20:59 ` Sebastian Sturm
1 sibling, 1 reply; 54+ messages in thread
From: Stefan Monnier @ 2018-03-26 13:06 UTC (permalink / raw)
To: emacs-devel
> after finding that the feature/noverlay branch does wonders for my editing
> experience[1], I'd like to reinvigorate the discussion on its inclusion into
> master.
While I also hope that branch can be merged soon, I've installed a patch
into Emacs's `master` which should hopefully solve your
immediate problems.
Stefan
^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch?
2018-03-26 13:06 ` Stefan Monnier
@ 2018-03-27 20:59 ` Sebastian Sturm
0 siblings, 0 replies; 54+ messages in thread
From: Sebastian Sturm @ 2018-03-27 20:59 UTC (permalink / raw)
To: emacs-devel
thank you, the master branch now handles my use case a lot better than
before
On 03/26/2018 03:06 PM, Stefan Monnier wrote:
>> after finding that the feature/noverlay branch does wonders for my editing
>> experience[1], I'd like to reinvigorate the discussion on its inclusion into
>> master.
>
> While I also hope that branch can be merged soon, I've installed a patch
> into Emacs's `master` which should hopefully solve your
> immediate problems.
>
>
> Stefan
>
>
^ permalink raw reply [flat|nested] 54+ messages in thread
end of thread, other threads:[~2018-03-27 20:59 UTC | newest]
Thread overview: 54+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
[not found] <<c24f8534-5245-026e-da18-f6be7b9702bf@arkona-technologies.de>
[not found] ` <<834lldp18f.fsf@gnu.org>
2018-03-18 21:37 ` State of the overlay tree branch? Drew Adams
2018-03-19 1:33 ` Stefan Monnier
2018-03-19 6:50 ` Eli Zaretskii
2018-03-19 12:29 ` Stefan Monnier
2018-03-19 13:02 ` Eli Zaretskii
2018-03-19 13:43 ` Stefan Monnier
2018-03-19 14:28 ` Eli Zaretskii
2018-03-19 14:39 ` Stefan Monnier
2018-03-19 6:33 ` Eli Zaretskii
2018-03-18 20:14 Sebastian Sturm
2018-03-18 20:39 ` Eli Zaretskii
2018-03-18 21:04 ` Sebastian Sturm
2018-03-18 23:03 ` Sebastian Sturm
2018-03-18 23:20 ` Sebastian Sturm
2018-03-19 6:43 ` Eli Zaretskii
2018-03-19 9:53 ` Sebastian Sturm
2018-03-19 12:57 ` Eli Zaretskii
2018-03-19 14:56 ` Stefan Monnier
2018-03-19 15:07 ` Sebastian Sturm
2018-03-19 15:13 ` Stefan Monnier
2018-03-20 1:23 ` Sebastian Sturm
2018-03-20 6:30 ` Eli Zaretskii
2018-03-21 0:36 ` Sebastian Sturm
2018-03-21 6:47 ` Eli Zaretskii
2018-03-22 13:16 ` Stefan Monnier
2018-03-22 19:54 ` Sebastian Sturm
2018-03-22 20:04 ` Sebastian Sturm
2018-03-22 20:52 ` Stefan Monnier
2018-03-22 23:11 ` Sebastian Sturm
2018-03-23 5:03 ` Stefan Monnier
2018-03-23 12:25 ` Sebastian Sturm
2018-03-23 12:47 ` Eli Zaretskii
2018-03-23 13:19 ` Stefan Monnier
2018-03-23 13:37 ` Noam Postavsky
2018-03-23 13:55 ` Stefan Monnier
2018-03-23 14:22 ` Eli Zaretskii
2018-03-23 14:39 ` Stefan Monnier
2018-03-23 19:39 ` Stefan Monnier
2018-03-25 15:11 ` Stefan Monnier
2018-03-25 16:39 ` Eli Zaretskii
2018-03-25 17:35 ` Stefan Monnier
2018-03-23 8:07 ` Eli Zaretskii
2018-03-23 9:08 ` Eli Zaretskii
2018-03-23 10:15 ` Sebastian Sturm
2018-03-23 12:39 ` Eli Zaretskii
2018-03-23 12:12 ` Stefan Monnier
2018-03-23 12:40 ` Eli Zaretskii
2018-03-23 12:55 ` Stefan Monnier
2018-03-19 6:36 ` Eli Zaretskii
2018-03-19 6:28 ` Eli Zaretskii
2018-03-21 14:14 ` Sebastien Chapuis
2018-03-21 15:35 ` Eli Zaretskii
2018-03-26 13:06 ` Stefan Monnier
2018-03-27 20:59 ` Sebastian Sturm
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/emacs.git
https://git.savannah.gnu.org/cgit/emacs/org-mode.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.