* RE: State of the overlay tree branch?

From: Drew Adams @ 2018-03-18 21:37 UTC
To: Eli Zaretskii, Sebastian Sturm; +Cc: emacs-devel

> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).

Having a fast line counter in Elisp would be terrific.
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19  1:33 UTC
To: emacs-devel

>> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
>> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
>> and should run circles around count-lines (used by line-number-at-pos).
> Having a fast line counter in Elisp would be terrific.

It should be pretty easy to provide such a thing by relying on a cache
of the last call.  Though Sebastian's experience seems to indicate that
the current code doesn't only suffer from the time to count LF but also
from the time to process the markers.

I seem to remember someone else experiencing a similar problem and
suggesting that the problem lies in the charpos_to_bytepos (and/or
bytepos_to_charpos) conversion function, which iterates through all the
markers to try and find a "nearby" marker (because markers keep track
of both their bytepos and their charpos).  Looking for a nearby marker
to avoid scanning the whole buffer is a good idea in many cases, but not
if scanning the list of markers takes more time than scanning the
whole buffer.


        Stefan
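A minimal sketch of that caching idea at the Lisp level, in the spirit of
nlinum's approach (the names below are hypothetical, not an existing API,
and the code deliberately ignores narrowing, since clients such as lsp-mode
want absolute line numbers): keep the last beginning-of-line position and
its line number together with the buffer's modification tick, and count
newlines only between the cached position and the requested one.

    (defvar-local my-line-cache nil
      "Either nil or a cons (TICK BOL . LINE) for the last computed line number.")

    (defun my-line-number-at-pos (&optional pos)
      "Like `line-number-at-pos', but scan from the previously cached position."
      (save-excursion
        (save-restriction
          (widen)
          (goto-char (or pos (point)))
          (forward-line 0)              ; work with beginning-of-line positions
          (let ((bol (point))
                (tick (buffer-chars-modified-tick))
                line)
            (if (and my-line-cache (eql (car my-line-cache) tick))
                ;; Valid cache: count newlines only between the cached
                ;; position and BOL, in whichever direction is needed.
                (let ((cbol (nth 1 my-line-cache))
                      (cline (cddr my-line-cache)))
                  (setq line (if (>= bol cbol)
                                 (+ cline (count-lines cbol bol))
                               (- cline (count-lines bol cbol)))))
              ;; Stale or empty cache: fall back to a full scan from point-min.
              (setq line (1+ (count-lines (point-min) bol))))
            (setq my-line-cache (cons tick (cons bol line)))
            line))))

Any buffer change invalidates the cache wholesale, which keeps it trivially
correct at the cost of one full scan after each edit.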
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19  6:50 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@iro.umontreal.ca>
> Date: Sun, 18 Mar 2018 21:33:05 -0400
>
> >> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> >> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> >> and should run circles around count-lines (used by line-number-at-pos).
> > Having a fast line counter in Elisp would be terrific.
>
> It should be pretty easy to provide such a thing by relying on a cache
> of the last call.

This is already coded, see display_count_lines.  That's what the native
line-number display uses.  Exposing it to Lisp should be easy.

But I don't believe it could be orders of magnitude faster than
count-lines, even though it doesn't need to convert character position
to byte position.  I'm guessing something entirely different and
unrelated to line-counting per se is at work here.

> Though Sebastian's experience seems to indicate that the
> current code doesn't only suffer from the time to count LF but also
> from the time to process the markers.

Not sure what marker processing you had in mind.  Can you elaborate?

> I seem to remember someone else experiencing a similar problem and
> suggesting that the problem lies in the charpos_to_bytepos (and/or
> bytepos_to_charpos) conversion function, which iterates through all the
> markers to try and find a "nearby" marker (because markers keep track
> of both their bytepos and their charpos).  Looking for a nearby marker
> to avoid scanning the whole buffer is a good idea in many cases, but not
> if scanning the list of markers takes more time than scanning the
> whole buffer.

But find_newline doesn't look for markers, and it converts character
to byte position just 2 times.  Or am I missing something?
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 12:29 UTC
To: Eli Zaretskii; +Cc: emacs-devel

>> It should be pretty easy to provide such a thing by relying on a cache
>> of the last call.
> This is already coded, see display_count_lines.

I don't see any cache in display_count_lines, but yes, the code that
uses display_count_lines does do such caching, and we could/should
expose it to Lisp.  In nlinum.el I also have something similar (called
nlinum--line-number-at-pos).

> But I don't believe it could be orders of magnitude faster than
> count-lines, even though it doesn't need to convert character position
> to byte position.

Scanning from the last used position can be *very* different from
scanning from point-min.  So yes, it can be orders of magnitude faster
(I wrote nlinum--line-number-at-pos for that reason: I sadly didn't
write down the test case I used back then, but the difference was very
significant).

> I'm guessing something entirely different and unrelated to
> line-counting per se is at work here.

Agreed.

>> Though Sebastian's experience seems to indicate that the
>> current code doesn't only suffer from the time to count LF but also
>> from the time to process the markers.
> Not sure what marker processing you had in mind.  Can you elaborate?

The

    for (tail = BUF_MARKERS (b); tail; tail = tail->next)

loop in buf_charpos_to_bytepos and buf_bytepos_to_charpos.

> But find_newline doesn't look for markers, and it converts character
> to byte position just 2 times.  Or am I missing something?

The idea is that the above loop (even if called only twice) might be
sufficient to make line-number-at-pos take 0.2s.  I don't know that it's
the culprit; I'm just mentioning the possibility, since noverlay removes
all the overlay-induced markers, which would significantly reduce the
number of markers over which the above loop iterates.

Note that those loops stop as soon as we're within 50 chars of the goal,
and they also stop as soon as there's no non-ASCII char between the
"best bounds so far".  So for them to cause the slowdown seen here, we'd
need not only a very large number of markers but also additional
conditions that might not be very likely.  But it's still a possibility.


        Stefan
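Since, before the noverlay branch, every overlay is backed by a pair of
markers (its start and end), a rough lower bound on how long that
BUF_MARKERS chain is can be obtained from Lisp by counting overlays; the
helper below is a hypothetical illustration, not an existing function:

    (defun my-overlay-marker-estimate ()
      "Show a lower bound on how many markers the char/byte conversion loop walks."
      (interactive)
      (let ((n (* 2 (length (overlays-in (point-min) (point-max))))))
        (message "%s: at least %d overlay-backed markers" (buffer-name) n)
        n))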
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19 13:02 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Mar 2018 08:29:40 -0400
>
> > But I don't believe [display_count_lines] could be orders of
> > magnitude faster than count-lines, even though it doesn't need to
> > convert character position to byte position.
>
> Scanning from the last used position can be *very* different from
> scanning from point-min.  So yes, it can be orders of magnitude faster

Well, in my measurements it's already very fast.  I don't understand
why the OP sees times that are 10 times slower.

> The
>
>     for (tail = BUF_MARKERS (b); tail; tail = tail->next)
>
> loop in buf_charpos_to_bytepos and buf_bytepos_to_charpos.
>
> > But find_newline doesn't look for markers, and it converts character
> > to byte position just 2 times.  Or am I missing something?
>
> The idea is that the above loop (even if called only twice) might be
> sufficient to make line-number-at-pos take 0.2s.

I very much doubt that loop could take such a long time.  And running a
benchmark 1000 times means that the 2nd through 1000th iteration find
the mapping much faster, probably bypassing the loop entirely.

> So for them to cause the slowdown seen here, we'd need not only
> a very large number of markers but also additional conditions that might
> not be very likely.  But it's still a possibility.

I'll believe it when I see it.
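One way to keep such a benchmark from measuring only the cached path is to
invalidate the cached position between iterations, for example by touching
the buffer each time.  A rough sketch; note that in a setup like the OP's
the buffer edit will also fire the change hooks of lsp-mode and cc-mode, so
the absolute numbers need to be interpreted with care:

    (benchmark-run 1000
      (progn
        ;; Touch the buffer so BUF_MODIFF changes and the cached
        ;; charpos/bytepos pair from the previous iteration is not reused.
        (save-excursion
          (goto-char (point-min))
          (insert " ")
          (delete-char -1))
        (line-number-at-pos (point))))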
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 13:43 UTC
To: Eli Zaretskii; +Cc: emacs-devel

>> The idea is that the above loop (even if called only twice) might be
>> sufficient to make line-number-at-pos take 0.2s.
> I very much doubt that loop could take such a long time.

I also find it unlikely.

> And running a benchmark 1000 times means that the 2nd through 1000th
> iteration find the mapping much faster, probably bypassing the
> loop entirely.

Quite likely.

BTW, when thinking about how to avoid this loop's worst case (regardless
of whether it's the source of the current problem), I was thinking that
we could probably make that worst case even less likely by replacing the
hardcoded "50" with a limit that grows as the loop progresses.

The patch below should make sure that the total number of iterations
(through markers + through chars) is no more than 2 * buffer-size, no
matter how many markers there are.
[ Rather than incrementing by 1 we should ideally increment by a number
  that corresponds to how much slower each iteration through markers is
  compared to an iteration through chars, but I have no idea what that
  number would typically be.  ]

WDYT?


        Stefan


diff --git a/src/marker.c b/src/marker.c
index f61701f2f6..bfe6de486e 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -141,6 +141,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
   struct Lisp_Marker *tail;
   ptrdiff_t best_above, best_above_byte;
   ptrdiff_t best_below, best_below_byte;
+  ptrdiff_t distance = 50;
 
   eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
 
@@ -180,8 +181,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
       /* If we are down to a range of 50 chars,
 	 don't bother checking any other markers;
 	 scan the intervening chars directly now.  */
-      if (best_above - best_below < 50)
+      if (best_above - best_below < distance)
 	break;
+      else
+	distance++;
     }
 
   /* We get here if we did not exactly hit one of the known places.
@@ -293,6 +296,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
   struct Lisp_Marker *tail;
   ptrdiff_t best_above, best_above_byte;
   ptrdiff_t best_below, best_below_byte;
+  ptrdiff_t distance = 50;
 
   eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
 
@@ -323,8 +327,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
       /* If we are down to a range of 50 chars,
 	 don't bother checking any other markers;
 	 scan the intervening chars directly now.  */
-      if (best_above - best_below < 50)
+      if (best_above - best_below < distance)
 	break;
+      else
+	distance++;
     }
 
   /* We get here if we did not exactly hit one of the known places.
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19 14:28 UTC
To: Stefan Monnier; +Cc: emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Mon, 19 Mar 2018 09:43:12 -0400
>
> BTW, when thinking about how to avoid this loop's worst case (regardless
> of whether it's the source of the current problem), I was thinking that
> we could probably make that worst case even less likely by replacing the
> hardcoded "50" with a limit that grows as the loop progresses.
>
> The patch below should make sure that the total number of iterations
> (through markers + through chars) is no more than 2 * buffer-size, no
> matter how many markers there are.
> [ Rather than incrementing by 1 we should ideally increment by a number
>   that corresponds to how much slower each iteration through markers is
>   compared to an iteration through chars, but I have no idea what that
>   number would typically be.  ]
>
> WDYT?

Could be a good idea, but I suggest to time its improvement before we
decide.  I've seen a few surprises in that area.
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 14:39 UTC
To: Eli Zaretskii; +Cc: emacs-devel

> Could be a good idea, but I suggest to time its improvement before we
> decide.  I've seen a few surprises in that area.

I see some speedup in my artificial test (where I create an insane
number of markers at point-min and then ask to goto-char to the middle
of a mid-sized buffer), but I can't see any impact whatsoever (neither
positive nor negative) in real tests (unsurprisingly).

I'd rather first have a non-artificial case where I can measure a
speedup.


        Stefan
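For reference, a sketch of that kind of artificial stress test (the buffer
name, sizes, and marker count below are arbitrary; the non-ASCII character
is only there so that character and byte positions differ and the
conversion loop actually has to search):

    (with-current-buffer (get-buffer-create "*marker-stress*")
      (fundamental-mode)
      (erase-buffer)
      ;; A mid-sized multibyte buffer: ~20k short lines, each with one
      ;; non-ASCII char so that charpos and bytepos differ and the
      ;; conversion code cannot simply interpolate.
      (dotimes (_ 20000) (insert "some text é\n"))
      ;; An "insane" number of markers, all parked at point-min; keep them
      ;; in a list so they are not garbage-collected mid-benchmark.
      (let ((markers nil))
        (dotimes (_ 50000)
          (push (set-marker (make-marker) (point-min)) markers))
        (prog1
            ;; Alternate between two targets so the single-entry cache in
            ;; marker.c cannot short-circuit the conversion.
            (benchmark-run 100
              (progn (goto-char (/ (point-max) 2))
                     (goto-char (/ (point-max) 3))))
          (ignore markers))))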
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19  6:33 UTC
To: Drew Adams; +Cc: s.sturm, emacs-devel

> Date: Sun, 18 Mar 2018 14:37:39 -0700 (PDT)
> From: Drew Adams <drew.adams@oracle.com>
> Cc: emacs-devel@gnu.org
>
> > If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> > provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> > and should run circles around count-lines (used by line-number-at-pos).
>
> Having a fast line counter in Elisp would be terrific.

Actually, I see that line-number-at-pos, count-lines, and forward-line
should already be fast, as they do exactly what I thought, something I
failed to see originally.  If that's not "fast enough", a test case
showing the problem would be a good starting point.
* State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 20:14 UTC
To: emacs-devel

Hi,

after finding that the feature/noverlay branch does wonders for my
editing experience[1], I'd like to reinvigorate the discussion on its
inclusion into master.  Are there plans for a merge with the emacs-27
master branch?  Any critical issues blocking such a merge?  If a recent
Emacs 27 variant with the overlay branch feature was available, I'd be
happy to evaluate that in my daily work.

best regards,
Sebastian Sturm

[1] I'm using cquery for my C++ editing needs, which comes with an
overlay-based semantic highlighting mechanism.  With my Emacs
configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
character insertion, which consume ~20 to 25 ms each when performing
edits close to the bottom of a 66KB C++ file (measured using
(benchmark-run 1000 (line-number-at-pos (point))) on a release build of
emacs-27/git commit #9942734...).  Using the noverlay branch, this
figure drops to ~160us per call.
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-18 20:39 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Sun, 18 Mar 2018 21:14:53 +0100
>
> [1] I'm using cquery for my C++ editing needs, which comes with an
> overlay-based semantic highlighting mechanism.  With my Emacs
> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
> character insertion, which consume ~20 to 25 ms each when performing
> edits close to the bottom of a 66KB C++ file (measured using
> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
> emacs-27/git commit #9942734...).  Using the noverlay branch, this
> figure drops to ~160us per call.

If lsp-mode/lsp-ui needs a fast line counter, one can easily be
provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
and should run circles around count-lines (used by line-number-at-pos).

(I'm not sure I even understand how overlays come into play here, btw.)
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 21:04 UTC
To: emacs-devel

I also found it surprising that overlays would slow down line counting,
but since I don't know anything about the architecture of the Emacs
display engine, or its overlay implementation, I figured that overlays
must be to blame because

(i) the issue went away after switching to the feature/noverlay branch

(ii) configuring the semantic highlighter to use its font-lock backend
also resolved the performance issue (though with the font-lock backend,
highlights are easily messed up by editing operations, which makes the
overlay variant far more appealing)

I also found that some other heavy users of overlays such as
avy-goto-word-0-{above,below} feel faster with the feature/noverlay
branch, so I'd welcome a merge of the overlay branch even if there were
a technically superior alternative to line-number-at-pos that didn't
suffer from overlay-related performance issues.

That being said, your suggestion sounds intriguing.  What would be
required to expose find_newline to Lisp?  Would I simply have to wrap it
in one of Emacs's DEFINE_<something> macros?  Is there some
documentation on the Emacs C backend?

On 03/18/2018 09:39 PM, Eli Zaretskii wrote:
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp.  IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
>
> (I'm not sure I even understand how overlays come into play here, btw.)
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 23:03 UTC
To: emacs-devel

Concerning the performance improvement with noverlay, it seems I spoke
too soon.  I've now had the issue reappear, both with the noverlay
branch and with the semantic highlighter set to use font-lock.  Sorry
for the misinformation.

Again, however, line-number-at-pos shows up as a large CPU time consumer
in the profiler report, and benchmark-run still reports several ms per
invocation (though this time it's usually around 2 to 4 ms instead of
the 20 to 25 I measured earlier), so I'd still be very much interested
in a faster line-number-at-pos implementation.

On 03/18/2018 10:04 PM, Sebastian Sturm wrote:
> [...]
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-18 23:20 UTC
To: emacs-devel

For the record, I just switched back to emacs master (no noverlay) and
the time reported by (benchmark-run 1000 (line-number-at-pos (point)))
increased by a factor of ~40, to 75-80s.  At this level, editing is
unbearably slow.  With the semantic highlighter disabled, the same
measurement yields ~2.5s (still painfully slow, but borderline usable),
so about the same time reported by the noverlay branch.

To me, this suggests that noverlay indeed improves performance, though
not necessarily to the level I had previously claimed.  find_newline may
solve this particular issue completely.

Since the time taken by line-number-at-pos seems to fluctuate wildly for
(to me) unknown reasons, I'll try and see if I can set up a systematic
way to collect reliable data.

On 03/19/2018 12:03 AM, Sebastian Sturm wrote:
> [...]
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19  6:43 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 00:20:13 +0100
>
> for the record, I just switched back to emacs master (no noverlay) and
> the time reported by (benchmark-run 1000 (line-number-at-pos (point)))
> increased by a factor of ~40, to 75-80s.  At this level, editing is
> unbearably slow.  With the semantic highlighter disabled, the same
> measurement yields ~2.5s (still painfully slow, but borderline usable),
> so about the same time reported by the noverlay branch.

You will have to explain why overlays and the semantic highlighter
affect line-counting.  How about presenting a profile produced by
"M-x profiler-report"?

And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
1000 times), right?  If so, I cannot understand why you say that's
borderline usable, because IME such short times are imperceptible by
humans.  I guess some other factor is at work here, so I'd suggest to
describe more details about your use case.

> Since the time taken by line-number-at-pos seems to fluctuate wildly for
> (to me) unknown reasons, I'll try and see if I can set up a systematic
> way to collect reliable data.

Yes, please do.  I'm guessing there's some factor here that is
important to consider.
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-19  9:53 UTC
To: emacs-devel

On 03/19/2018 07:43 AM, Eli Zaretskii wrote:
>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Mon, 19 Mar 2018 00:20:13 +0100
>>
>> for the record, I just switched back to emacs master (no noverlay) and
>> the time reported by (benchmark-run 1000 (line-number-at-pos (point)))
>> increased by a factor of ~40, to 75-80s.  At this level, editing is
>> unbearably slow.  With the semantic highlighter disabled, the same
>> measurement yields ~2.5s (still painfully slow, but borderline usable),
>> so about the same time reported by the noverlay branch.
>
> You will have to explain why overlays and the semantic highlighter
> affect line-counting.  How about presenting a profile produced by
> "M-x profiler-report"?

please find below a profiler report taken this morning (on my PC at
work, which doesn't suffer from the performance issue as much as my 2014
MacBook Pro, but even here the issue is clearly noticeable)

> And the timings you measure are 2.5 _milliseconds_ (the benchmark runs
> 1000 times), right?  If so, I cannot understand why you say that's
> borderline usable, because IME such short times are imperceptible by
> humans.  I guess some other factor is at work here, so I'd suggest to
> describe more details about your use case.

well no, it's about 2.5ms per call to line-number-at-pos, which is
called at least 6 times per character insertion (with my Emacs config,
at least).  Which already makes for 15ms per character insertion,
excluding anything else done by cc-mode or lsp-mode.

>> Since the time taken by line-number-at-pos seems to fluctuate wildly for
>> (to me) unknown reasons, I'll try and see if I can set up a systematic
>> way to collect reliable data.
>
> Yes, please do.  I'm guessing there's some factor here that is
> important to consider.

I wrote a simple-minded measurement routine that can at least report the
time taken between successive character insertions, and I noticed that
better alternatives have been brought up in response to my other mailing
list thread (titled "Latency profiling?").  I haven't had time to
prepare a systematic test case yet, but I will look into it when I get
home from work this evening.

- command-execute                                          5676  73%
 - call-interactively                                      5676  73%
  - funcall-interactively                                  5676  73%
   - self-insert-command                                   5080  66%
    - lsp-on-change                                        2144  27%
     - lsp--text-document-content-change-event             1964  25%
      - lsp--point-to-position                             1964  25%
         line-number-at-pos                                1964  25%
     + json-encode                                          164   2%
     + file-truename                                         16   0%
    - c-after-change                                       1784  23%
     - mapc                                                1772  23%
      + #<compiled 0x1a24bed>                              1772  23%
     - c-invalidate-sws-region-after                         12   0%
      - c-invalidate-sws-region-after-ins                    12   0%
       - c-beginning-of-macro                                 8   0%
          back-to-indentation                                 4   0%
       - c-literal-limits                                     4   0%
          c-state-full-pp-to-literal                          4   0%
    - lsp-before-change                                    1016  13%
     - lsp--point-to-position                              1016  13%
        line-number-at-pos                                 1016  13%
    - sp--post-self-insert-hook-handler                      56   0%
     - sp-insert-pair                                        28   0%
      - sp--pair-to-insert                                   24   0%
       - sp--all-pairs-to-insert                             20   0%
        - sp--looking-back-p                                  8   0%
           sp--looking-back                                   8   0%
        + sp--do-action-p                                     8   0%
       + sp--get-closing-regexp                               4   0%
     + sp--all-pairs-to-insert                               16   0%
     + sp-escape-open-delimiter                              12   0%
    + c-before-change                                        28   0%
    - jit-lock-after-change                                   4   0%
     + run-hook-with-args                                     4   0%
   + counsel-M-x                                            360   4%
   + evil-open-below                                        232   3%
- lsp-ui-doc--make-request                                  643   8%
   line-number-at-pos                                       627   8%
 + file-truename                                              8   0%
   url-hexify-string                                          4   0%
- lsp-ui-sideline                                           528   6%
   line-number-at-pos                                       528   6%
+ timer-event-handler                                       327   4%
+ evil-escape-pre-command-hook                              220   2%
+ #<compiled 0xb54291>                                      141   1%
+ redisplay_internal (C function)                            88   1%
+ ...                                                        29   0%
+ yas--post-command-handler                                  20   0%
+ global-spacemacs-whitespace-cleanup-mode-check-buffers      4   0%
+ internal-timer-start-idle                                   4   0%
+ flycheck-display-error-at-point-soon                        4   0%
  flycheck-error-list-update-source                           4   0%
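A sketch of the kind of per-insertion timing probe described above, using
the pre-/post-command hooks (all names below are made up); it measures only
the command body, not redisplay, and the echo-area message itself adds a
little noise:

    (defvar my-insert-t0 nil)

    (defun my-insert-start ()
      (setq my-insert-t0 (current-time)))

    (defun my-insert-report ()
      (when (and my-insert-t0 (eq this-command 'self-insert-command))
        (message "self-insert-command took %.1f ms"
                 (* 1000 (float-time (time-subtract (current-time) my-insert-t0)))))
      (setq my-insert-t0 nil))

    (add-hook 'pre-command-hook #'my-insert-start)
    (add-hook 'post-command-hook #'my-insert-report)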
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-19 12:57 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Mon, 19 Mar 2018 10:53:52 +0100
>
> please find below a profiler report taken this morning (on my PC at
> work, which doesn't suffer from the performance issue as much as my 2014
> MacBook Pro, but even here the issue is clearly noticeable)

That profile says that self-insert-command takes a large percentage of
the time.  So I think we should look into the reasons for such a strange
place to spend hundreds of microseconds.

According to the profile, line-number-at-pos takes about the same
percentage of time as self-insert-command does.  And that is even before
you optimize the successive calls to line-counting code to take
advantage of the previously computed value for some close line.

> well no, it's about 2.5ms per call to line-number-at-pos, which is
> called at least 6 times per character insertion (with my Emacs config,
> at least).  Which already makes for 15ms per character insertion,
> excluding anything else done by cc-mode or lsp-mode.

Then, as I said, I don't understand why it takes so much on your system.
I get times that are 10 times faster.
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 14:56 UTC
To: emacs-devel

> well no, it's about 2.5ms per call to line-number-at-pos, which is
> called at least 6 times per character insertion (with my Emacs config,
> at least).  Which already makes for 15ms per character insertion,
> excluding anything else done by cc-mode or lsp-mode.

Since you say that the noverlay branch helps, could you check the number
of overlays involved?  E.g.

    M-: (length (overlays-in (point-min) (point-max))) RET

If there are more overlays than chars in this buffer, maybe there's
a problem in some Elisp that creates too many overlays?

If there aren't that many overlays, then I wonder why the noverlay
branch would make such a significant difference.

Also, if you can reliably reproduce the "slow editing", would it be
possible to make a recipe for it that we can try and reproduce on
our side?


        Stefan
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-19 15:07 UTC
To: emacs-devel

for the file I was complaining about, the number returned is 3080
(doesn't exceed the number of chars though, (point-max) = 67641).  Will
try to obtain usable timing data later today when I get home from work.
Thanks!

On 03/19/2018 03:56 PM, Stefan Monnier wrote:
> [...]
* Re: State of the overlay tree branch?

From: Stefan Monnier @ 2018-03-19 15:13 UTC
To: emacs-devel

> for the file I was complaining about, the number returned is 3080
> (doesn't exceed the number of chars though, (point-max) = 67641).  Will
> try to obtain usable timing data later today when I get home from work.
> Thanks!

Thanks.  It's large enough that noverlay might indeed make a difference
for some operations, but not large enough to explain why
line-number-at-pos would be so slow, I think.


        Stefan
* Re: State of the overlay tree branch?

From: Sebastian Sturm @ 2018-03-20  1:23 UTC
To: emacs-devel

After disabling cquery for testing purposes (which leaves me with 45
overlays generated by flycheck for my troublesome C++ file), I'm now
generating a large number of overlays using the following function:

(defun test-highlight ()
  (save-excursion
    (with-silent-modifications
      (let ((stepsize 10))
        (widen)
        (goto-char 1)
        (cl-loop for n from (point-min) upto (- (point-max) stepsize) by stepsize
                 do (let ((ov (make-overlay n (+ (1- stepsize) n))))
                      (overlay-put ov 'cquery-sem-highlight t)))))))

Evaluating the following function without additional overlays (beyond
the flycheck ones, that is) yields the following results:

(defun benchmark-often ()
  (cl-loop for n from 1 upto 20
           do (message (format "iteration %d: %f" n
                               (nth 0 (benchmark-run
                                        (line-number-at-pos (point))))))))

1st run:
iteration 1: 0.001213
iteration 2: 0.001170
iteration 3: 0.001170
iteration 4: 0.001238
iteration 5: 0.001163
iteration 6: 0.001153
iteration 7: 0.000421
iteration 8: 0.000426
iteration 9: 0.000322
iteration 10: 0.000301
iteration 11: 0.000291
iteration 12: 0.000292
iteration 13: 0.000291
iteration 14: 0.000291
iteration 15: 0.000295
iteration 16: 0.000289
iteration 17: 0.000289
iteration 18: 0.000288
iteration 19: 0.000288
iteration 20: 0.000287

2nd run:
iteration 1: 0.001044
iteration 2: 0.000942
iteration 3: 0.000935
iteration 4: 0.000935
iteration 5: 0.000935
iteration 6: 0.000932
iteration 7: 0.000954
iteration 8: 0.000940
iteration 9: 0.000933
iteration 10: 0.000625
iteration 11: 0.000545
iteration 12: 0.000428
iteration 13: 0.000362
iteration 14: 0.000346
iteration 15: 0.000325
iteration 16: 0.000309
iteration 17: 0.000309
iteration 18: 0.000316
iteration 19: 0.000310
iteration 20: 0.000308

after evaluating (test-highlight) the figures are as follows:

1st run:
iteration 1: 0.026012
iteration 2: 0.020334
iteration 3: 0.020250
iteration 4: 0.020349
iteration 5: 0.020501
iteration 6: 0.020635
iteration 7: 0.020302
iteration 8: 0.020426
iteration 9: 0.020440
iteration 10: 0.020441
iteration 11: 0.020515
iteration 12: 0.020525
iteration 13: 0.020383
iteration 14: 0.020510
iteration 15: 0.019829
iteration 16: 0.019899
iteration 17: 0.019950
iteration 18: 0.019828
iteration 19: 0.019901
iteration 20: 0.019819

2nd run:
iteration 1: 0.026526
iteration 2: 0.020051
iteration 3: 0.020100
iteration 4: 0.020080
iteration 5: 0.020080
iteration 6: 0.020249
iteration 7: 0.020087
iteration 8: 0.020005
iteration 9: 0.019980
iteration 10: 0.019985
iteration 11: 0.020077
iteration 12: 0.019979
iteration 13: 0.020060
iteration 14: 0.020092
iteration 15: 0.019954
iteration 16: 0.019766
iteration 17: 0.019432
iteration 18: 0.019491
iteration 19: 0.019458
iteration 20: 0.019482

I'm not allowed to share my employer's source code as a test case, so I
tried the same procedure with the similarly large DeclBase.h from the
public LLVM repository.  To my surprise, DeclBase.h didn't suffer from
any performance issues at all.  Switching to fundamental-mode while
visiting my file didn't change anything, so I assume that c-mode isn't
to blame either.

There have been claims of overlay-related performance issues with cquery
and some large(-ish) open-source C or C++ files, so I'll try to locate
these files and hope that at least one of them exhibits this issue as
well.

On 03/19/2018 04:07 PM, Sebastian Sturm wrote:
> [...]
* Re: State of the overlay tree branch?

From: Eli Zaretskii @ 2018-03-20  6:30 UTC
To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Tue, 20 Mar 2018 02:23:02 +0100
>
> 1st run:
> iteration 1: 0.001213
> iteration 2: 0.001170
> [...]
>
> after evaluating (test-highlight) the figures are as follows:
>
> 1st run:
> iteration 1: 0.026012
> iteration 2: 0.020334

So, between 20-fold and 100-fold slow-down.

> I'm not allowed to share my employer's source code as a test case, so I
> tried the same procedure with the similarly large DeclBase.h from the
> public LLVM repository.  To my surprise, DeclBase.h didn't suffer from
> any performance issues at all.  Switching to fundamental-mode while
> visiting my file didn't change anything, so I assume that c-mode isn't
> to blame either.

So it's still a mystery why your original file produces such a large
slowdown with overlays.

Can you show the results of "M-x profiler-report" for the slow test
with your original source file?  It could have some clues.  If that's
impossible, I can only repeat my suggestion to use perf to find the
code in Emacs that takes the lion's share of the processing time.
* Re: State of the overlay tree branch? 2018-03-20 6:30 ` Eli Zaretskii @ 2018-03-21 0:36 ` Sebastian Sturm 2018-03-21 6:47 ` Eli Zaretskii 2018-03-22 13:16 ` Stefan Monnier 0 siblings, 2 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-21 0:36 UTC (permalink / raw) To: emacs-devel > So it's still a mystery why your original file produces such a large > slowdown with overlays. > > Can you show the results of "M-x profiler-report" for the slow test > with your original source file? It could have some clues. If that's > impossible, I can only repeat my suggestion to use perf to find the > code in Emacs that takes the lion's share of the processing time. this is the profiler report I get for the slow case (BTW, is there a way to have the profiler resolve functions within line-number-at-pos? I tried increasing profiler-max-stack-depth to 32, but the profiler still didn't show anything below line-number-at-pos) - command-execute 20522 97% - call-interactively 20522 97% - funcall-interactively 20061 94% - eval-expression 18609 87% - eval 18609 87% - benchmark-often 18609 87% - let* 18609 87% - while 18609 87% - message 18388 86% - format 18376 86% - nth 18376 86% - let 18376 86% - list 18376 86% - let 18368 86% line-number-at-pos 18348 86% ws-butler-after-change 4 0% + evil-open-below 614 2% + counsel-M-x 575 2% + undo-tree-undo 79 0% + next-line 68 0% + evil-normal-state 52 0% + evil-previous-line 48 0% + previous-line 7 0% + evil-emacs-state 6 0% + evil-insert 3 0% + byte-code 461 2% + redisplay_internal (C function) 254 1% + timer-event-handler 116 0% + evil-escape-pre-command-hook 82 0% + ... 63 0% + global-spacemacs-whitespace-cleanup-mode-check-buffers 47 0% + yas--post-command-handler 33 0% + evil-repeat-post-hook 6 0% + evil-visual-post-command 5 0% + flycheck-handle-signal 4 0% evil-snipe-mode-check-buffers 4 0% global-undo-tree-mode-check-buffers 4 0% + sp--save-pre-command-state 2 0% + xselect-convert-to-string 2 0% + evil-visual-pre-command 1 0% + flycheck-pos-tip-hide-messages 1 0% + which-key--hide-popup 1 0% with perf, the ("self") time taken by buf_charpos_to_bytepos increases from ~60% (fast case) to >98%. This is the diff generated by perf diff <fast.perf> <slow.perf>: # Event 'cycles' # # Baseline Delta Shared Object Symbol # ........ ....... .................... .......................................... # 57.77% +40.30% emacs-27.0.50 [.] buf_charpos_to_bytepos 11.12% -10.70% libc-2.23.so [.] __memrchr 6.48% -6.19% emacs-27.0.50 [.] assq_no_quit 5.92% -5.62% emacs-27.0.50 [.] find_cache_boundary 4.26% -4.07% emacs-27.0.50 [.] set_buffer_internal_2 4.10% -3.73% emacs-27.0.50 [.] find_newline 3.54% libc-2.23.so [.] __memmove_avx_unaligned 1.25% -1.19% emacs-27.0.50 [.] region_cache_forward 0.46% -0.41% [kernel.kallsyms] [k] 0xffffffff83004eb0 0.26% -0.26% emacs-27.0.50 [.] revalidate_region_cache.isra.1 0.25% -0.23% emacs-27.0.50 [.] find_interval 0.23% -0.23% emacs-27.0.50 [.] swap_in_symval_forwarding 0.19% -0.18% emacs-27.0.50 [.] do_symval_forwarding 0.16% -0.15% emacs-27.0.50 [.] eval_sub 0.16% -0.15% emacs-27.0.50 [.] x_produce_glyphs 0.13% libXft.so.2.3.2 [.] XftCharIndex 0.13% -0.12% libXft.so.2.3.2 [.] XftGlyphExtents 0.12% -0.12% emacs-27.0.50 [.] store_symval_forwarding 0.12% -0.11% emacs-27.0.50 [.] exec_byte_code 0.09% emacs-27.0.50 [.] Fsymbol_value 0.08% -0.08% emacs-27.0.50 [.] x_get_glyph_overhangs 0.07% emacs-27.0.50 [.] find_symbol_value 0.07% libc-2.23.so [.] __memcpy_avx_unaligned 0.07% emacs-27.0.50 [.] Fassq 0.07% emacs-27.0.50 [.] 
insert_1_both.part.9 0.06% emacs-27.0.50 [.] offset_intervals 0.06% libc-2.23.so [.] __memcmp_sse4_1 0.06% emacs-27.0.50 [.] next_element_from_buffer 0.05% -0.05% emacs-27.0.50 [.] move_it_in_display_line_to 0.05% -0.04% emacs-27.0.50 [.] get_next_display_element 0.05% -0.04% libXft.so.2.3.2 [.] XftFontCheckGlyph 0.05% -0.04% libX11.so.6.3.0 [.] _XSetClipRectangles 0.04% libc-2.23.so [.] __GI___printf_fp_l 0.04% emacs-27.0.50 [.] styled_format 0.04% -0.03% emacs-27.0.50 [.] xftfont_text_extents 0.04% -0.03% emacs-27.0.50 [.] draw_glyphs 0.04% emacs-27.0.50 [.] display_line 0.04% emacs-27.0.50 [.] validate_interval_range 0.04% emacs-27.0.50 [.] update_window_fringes 0.04% libX11.so.6.3.0 [.] 0x000000000001b65a 0.04% -0.03% emacs-27.0.50 [.] lookup_char_property 0.04% emacs-27.0.50 [.] balance_an_interval 0.03% emacs-27.0.50 [.] composition_compute_stop_pos 0.03% libpthread-2.23.so [.] pthread_mutex_unlock 0.03% -0.03% emacs-27.0.50 [.] x_draw_glyph_string 0.03% -0.03% emacs-27.0.50 [.] mem_insert 0.03% -0.03% libpthread-2.23.so [.] pthread_mutex_lock 0.03% libc-2.23.so [.] malloc 0.03% emacs-27.0.50 [.] update_window_line 0.03% emacs-27.0.50 [.] Ffuncall 0.03% -0.03% emacs-27.0.50 [.] get_glyph_face_and_encoding 0.03% -0.02% emacs-27.0.50 [.] set_internal 0.03% -0.02% emacs-27.0.50 [.] update_window 0.03% libXft.so.2.3.2 [.] XftDrawSrcPicture 0.03% libc-2.23.so [.] vfprintf 0.03% -0.02% emacs-27.0.50 [.] do_one_unbind.constprop.20 0.03% emacs-27.0.50 [.] set_iterator_to_next 0.02% emacs-27.0.50 [.] specbind 0.02% libc-2.23.so [.] __memset_avx2 0.02% -0.02% libX11.so.6.3.0 [.] _XFlushGCCache 0.02% emacs-27.0.50 [.] lookup_glyphless_char_display 0.02% libc-2.23.so [.] _int_malloc 0.02% emacs-27.0.50 [.] set_cursor_from_row.isra.40 0.02% -0.02% emacs-27.0.50 [.] message_dolog.part.60 0.02% emacs-27.0.50 [.] gap_left 0.02% -0.02% emacs-27.0.50 [.] unbind_to 0.02% -0.02% emacs-27.0.50 [.] init_iterator 0.02% emacs-27.0.50 [.] Fget_buffer 0.02% -0.01% emacs-27.0.50 [.] overlays_at 0.02% -0.01% [wl] [k] osl_readl 0.02% -0.01% emacs-27.0.50 [.] set_point_both 0.02% emacs-27.0.50 [.] message3_nolog 0.02% libX11.so.6.3.0 [.] _XGetRequest 0.02% libc-2.23.so [.] __sprintf_chk 0.02% emacs-27.0.50 [.] adjust_markers_for_insert 0.02% emacs-27.0.50 [.] adjust_suspend_auto_hscroll 0.02% emacs-27.0.50 [.] get_char_property_and_overlay 0.01% emacs-27.0.50 [.] window_box_width 0.01% emacs-27.0.50 [.] Fcons 0.01% libxcb.so.1.1.0 [.] 0x0000000000009cca 0.01% -0.01% emacs-27.0.50 [.] arith_driver 0.01% -0.01% emacs-27.0.50 [.] resize_mini_window 0.01% emacs-27.0.50 [.] unchain_marker 0.01% emacs-27.0.50 [.] face_for_char 0.01% emacs-27.0.50 [.] unblock_input_to 0.01% emacs-27.0.50 [.] unblock_input 0.01% emacs-27.0.50 [.] balance_possible_root_interval 0.01% emacs-27.0.50 [.] add_text_properties_1 0.01% emacs-27.0.50 [.] save_restriction_restore 0.01% emacs-27.0.50 [.] funcall_subr 0.01% emacs-27.0.50 [.] gap_right 0.01% -0.01% emacs-27.0.50 [.] Fcurrent_buffer 0.01% libXrender.so.1.3.0 [.] XRenderFillRectangle 0.01% libXext.so.6.4.0 [.] XdbeSwapBuffers 0.01% emacs-27.0.50 [.] get_glyph_string_clip_rects.part.72 0.01% emacs-27.0.50 [.] Fwhile 0.01% emacs-27.0.50 [.] window_wants_mode_line 0.01% emacs-27.0.50 [.] display_echo_area_1 0.01% libXft.so.2.3.2 [.] XftDrawSetClipRectangles 0.01% emacs-27.0.50 [.] append_space_for_newline 0.01% emacs-27.0.50 [.] x_update_end 0.01% emacs-27.0.50 [.] Fforward_line 0.01% emacs-27.0.50 [.] memrchr@plt 0.01% emacs-27.0.50 [.] 
make_uninit_multibyte_string 0.01% emacs-27.0.50 [.] prepare_to_modify_buffer_1 0.01% emacs-27.0.50 [.] Fstring_equal 0.01% emacs-27.0.50 [.] init_glyph_string 0.01% libc-2.23.so [.] _IO_old_init 0.01% libXrender.so.1.3.0 [.] XRenderFindDisplay 0.01% emacs-27.0.50 [.] Fnreverse 0.01% emacs-27.0.50 [.] arithcompare 0.01% emacs-27.0.50 [.] font_get_frame_data 0.01% emacs-27.0.50 [.] window_box_left 0.01% libX11.so.6.3.0 [.] XSetClipRectangles 0.01% -0.01% emacs-27.0.50 [.] Flength 0.01% emacs-27.0.50 [.] previous_interval 0.01% libXft.so.2.3.2 [.] 0x0000000000007294 0.01% emacs-27.0.50 [.] x_compute_glyph_string_overhangs 0.01% emacs-27.0.50 [.] XftCharIndex@plt 0.01% libXrender.so.1.3.0 [.] XRenderCompositeString16 0.01% emacs-27.0.50 [.] message3 0.01% emacs-27.0.50 [.] xftfont_encode_char 0.01% emacs-27.0.50 [.] assign_row 0.01% emacs-27.0.50 [.] prepare_desired_row 0.01% emacs-27.0.50 [.] xftfont_draw 0.01% libXext.so.6.4.0 [.] XextFindDisplay 0.01% libpthread-2.23.so [.] pthread_cond_broadcast@@GLIBC_2.3.2 0.01% libXft.so.2.3.2 [.] XftDrawGlyphs 0.01% emacs-27.0.50 [.] marker_position 0.01% emacs-27.0.50 [.] funcall_lambda 0.01% emacs-27.0.50 [.] gettime 0.01% libpthread-2.23.so [.] __GI___libc_recvmsg 0.01% emacs-27.0.50 [.] lisp_time_struct 0.01% emacs-27.0.50 [.] echo_area_display 0.01% emacs-27.0.50 [.] CHECK_MARKER 0.01% emacs-27.0.50 [.] del_range_2 0.01% emacs-27.0.50 [.] prepare_face_for_display 0.01% emacs-27.0.50 [.] interval_deletion_adjustment 0.01% libc-2.23.so [.] strlen 0.01% emacs-27.0.50 [.] verify_interval_modification 0.01% emacs-27.0.50 [.] count_size_as_multibyte 0.01% emacs-27.0.50 [.] adjust_overlays_for_insert 0.01% libXrender.so.1.3.0 [.] XRenderSetPictureClipRectangles 0.01% emacs-27.0.50 [.] buffer_local_value 0.01% emacs-27.0.50 [.] adjust_window_count 0.01% emacs-27.0.50 [.] notice_overwritten_cursor.part.52 0.01% libc-2.23.so [.] __GI___writev 0.01% emacs-27.0.50 [.] record_in_backtrace 0.01% emacs-27.0.50 [.] save_restriction_save 0.01% emacs-27.0.50 [.] Fget_text_property 0.01% -0.00% [vdso] [.] __vdso_clock_gettime 0.01% emacs-27.0.50 [.] window_wants_header_line 0.01% emacs-27.0.50 [.] Fmessage 0.01% emacs-27.0.50 [.] FUNCTIONP 0.01% emacs-27.0.50 [.] window_display_table 0.01% emacs-27.0.50 [.] maybe_quit 0.01% emacs-27.0.50 [.] clear_glyph_matrix 0.01% -0.00% emacs-27.0.50 [.] with_echo_area_buffer 0.01% libc-2.23.so [.] __libc_enable_asynccancel 0.01% emacs-27.0.50 [.] should_produce_line_number 0.01% emacs-27.0.50 [.] x_set_glyph_string_clipping 0.01% emacs-27.0.50 [.] set_message_1 0.01% emacs-27.0.50 [.] del_range_both 0.01% emacs-27.0.50 [.] record_insert 0.01% emacs-27.0.50 [.] adjust_overlays_for_delete 0.01% +0.00% libc-2.23.so [.] _int_free 0.01% libc-2.23.so [.] __strchrnul 0.01% emacs-27.0.50 [.] Fgoto_char 0.01% emacs-27.0.50 [.] free_misc 0.01% emacs-27.0.50 [.] draw_window_fringes 0.01% emacs-27.0.50 [.] x_flush.isra.37.part.38 0.01% libpthread-2.23.so [.] __errno_location 0.01% emacs-27.0.50 [.] make_float 0.01% emacs-27.0.50 [.] fill_glyph_string 0.01% emacs-27.0.50 [.] Fget 0.01% emacs-27.0.50 [.] Flocal_variable_p 0.01% emacs-27.0.50 [.] decode_time_components 0.01% emacs-27.0.50 [.] XftGlyphExtents@plt 0.01% emacs-27.0.50 [.] row_for_charpos_p 0.01% emacs-27.0.50 [.] load_overlay_strings 0.01% emacs-27.0.50 [.] allocate_misc 0.01% -0.00% libX11.so.6.3.0 [.] _XSend 0.01% libX11.so.6.3.0 [.] _XData32 0.01% -0.00% emacs-27.0.50 [.] buf_bytepos_to_charpos 0.01% emacs-27.0.50 [.] face_at_buffer_position 0.01% emacs-27.0.50 [.] 
invalidate_current_column 0.01% emacs-27.0.50 [.] find_composition 0.01% emacs-27.0.50 [.] x_draw_window_cursor 0.01% emacs-27.0.50 [.] x_flip_and_flush 0.01% libc-2.23.so [.] _IO_no_init 0.01% emacs-27.0.50 [.] Ftime_subtract 0.01% emacs-27.0.50 [.] row_hash 0.01% emacs-27.0.50 [.] lookup_basic_face 0.01% emacs-27.0.50 [.] memcpy@plt 0.01% emacs-27.0.50 [.] Fset_buffer 0.01% emacs-27.0.50 [.] produce_special_glyphs 0.00% emacs-27.0.50 [.] x_draw_glyph_string_background.part.44 0.00% [vdso] [.] 0x0000000000000939 0.00% emacs-27.0.50 [.] make_save_obj_obj_obj_obj 0.00% libc-2.23.so [.] __GI___libc_poll 0.00% emacs-27.0.50 [.] do_specbind 0.00% libXft.so.2.3.2 [.] XftGlyphRender 0.00% emacs-27.0.50 [.] move_it_to 0.00% emacs-27.0.50 [.] make_current.isra.14 0.00% emacs-27.0.50 [.] Fnext_single_property_change 0.00% emacs-27.0.50 [.] handle_face_prop 0.00% libX11.so.6.3.0 [.] _XFlush 0.00% emacs-27.0.50 [.] unwind_with_echo_area_buffer 0.00% emacs-27.0.50 [.] evaporate_overlays 0.00% emacs-27.0.50 [.] sort_overlays 0.00% emacs-27.0.50 [.] del_range_1 0.00% libX11.so.6.3.0 [.] XFlush 0.00% emacs-27.0.50 [.] try_window 0.00% emacs-27.0.50 [.] x_set_glyph_string_gc 0.00% libc-2.23.so [.] malloc_consolidate 0.00% emacs-27.0.50 [.] handle_stop 0.00% emacs-27.0.50 [.] recenter_overlay_lists 0.00% libX11.so.6.3.0 [.] pthread_mutex_lock@plt 0.00% emacs-27.0.50 [.] modify_text_properties 0.00% emacs-27.0.50 [.] delete_interval 0.00% emacs-27.0.50 [.] window_text_bottom_y 0.00% +0.00% emacs-27.0.50 [.] copy_text 0.00% emacs-27.0.50 [.] set_default_internal 0.00% +0.00% libxcb.so.1.1.0 [.] xcb_writev 0.00% emacs-27.0.50 [.] temp_set_point_both 0.00% emacs-27.0.50 [.] insert_1_both 0.00% emacs-27.0.50 [.] record_property_change 0.00% emacs-27.0.50 [.] lisp_align_malloc 0.00% emacs-27.0.50 [.] x_mark_frame_dirty 0.00% +0.00% emacs-27.0.50 [.] text_quoting_style 0.00% libc-2.23.so [.] __GI___mempcpy 0.00% emacs-27.0.50 [.] Ffloat_time 0.00% emacs-27.0.50 [.] x_write_glyphs 0.00% +0.00% emacs-27.0.50 [.] insert_from_string_1 0.00% emacs-27.0.50 [.] Fpoint 0.00% libc-2.23.so [.] __vsprintf_chk 0.00% libc-2.23.so [.] hack_digit 0.00% emacs-27.0.50 [.] grow_specpdl 0.00% emacs-27.0.50 [.] time_arith 0.00% emacs-27.0.50 [.] update_end 0.00% emacs-27.0.50 [.] allocate_string_data 0.00% emacs-27.0.50 [.] save_excursion_restore 0.00% +0.00% emacs-27.0.50 [.] Flet 0.00% emacs-27.0.50 [.] mem_rotate_right 0.00% emacs-27.0.50 [.] fetch_buffer_markers 0.00% emacs-27.0.50 [.] move_gap_both 0.00% emacs-27.0.50 [.] xmalloc 0.00% emacs-27.0.50 [.] update_begin 0.00% emacs-27.0.50 [.] float_arith_driver 0.00% emacs-27.0.50 [.] make_specified_string 0.00% emacs-27.0.50 [.] handle_composition_prop 0.00% emacs-27.0.50 [.] display_and_set_cursor 0.00% emacs-27.0.50 [.] compute_line_metrics 0.00% emacs-27.0.50 [.] clock_gettime@plt 0.00% libc-2.23.so [.] free 0.00% emacs-27.0.50 [.] Fplist_get 0.00% libgdk-3.so.0.1800.9 [.] gdk_display_manager_get 0.00% emacs-27.0.50 [.] make_interval 0.00% libxcb.so.1.1.0 [.] xcb_poll_for_event 0.00% emacs-27.0.50 [.] adjust_markers_for_delete 0.00% emacs-27.0.50 [.] do_pending_window_change 0.00% emacs-27.0.50 [.] XftDrawGlyphs@plt 0.00% libc-2.23.so [.] _itoa_word 0.00% emacs-27.0.50 [.] disassemble_lisp_time 0.00% emacs-27.0.50 [.] Ftext_properties_at 0.00% emacs-27.0.50 [.] composition_reseat_it 0.00% libX11.so.6.3.0 [.] _XEventsQueued 0.00% emacs-27.0.50 [.] minmax_driver 0.00% emacs-27.0.50 [.] set_marker_restricted_both 0.00% emacs-27.0.50 [.] 
graft_intervals_into_buffer 0.00% emacs-27.0.50 [.] disp_char_vector 0.00% libc-2.23.so [.] __clock_gettime 0.00% emacs-27.0.50 [.] CHECK_STRING_OR_BUFFER 0.00% emacs-27.0.50 [.] intervals_equal 0.00% emacs-27.0.50 [.] ensure_echo_area_buffers 0.00% emacs-27.0.50 [.] invalidate_buffer_caches 0.00% emacs-27.0.50 [.] set_marker_both 0.00% +0.00% emacs-27.0.50 [.] record_unwind_protect 0.00% emacs-27.0.50 [.] run_hook_with_args 0.00% emacs-27.0.50 [.] set_point_from_marker 0.00% emacs-27.0.50 [.] list4 0.00% libX11.so.6.3.0 [.] XSetClipMask 0.00% emacs-27.0.50 [.] record_buffer_markers 0.00% emacs-27.0.50 [.] default_value +0.00% libXext.so.6.4.0 [.] 0x000000000000bf10 +0.00% emacs-27.0.50 [.] Fsetq +0.00% emacs-27.0.50 [.] reseat_1 +0.00% emacs-27.0.50 [.] update_compositions this is what perf annotate shows when invoked on buf_charpos_to_bytepos (slow case): │ ↓ jle 438 ▒ 4,39 │ mov 0x20(%rax),%r8 ▒ 8,38 │ mov %rdx,%rbp ▒ 0,05 │2f0: mov %rbp,%rdx ▒ 0,13 │ mov %r8,%rdi ▒ 2,70 │ sub %r13,%rdx ▒ 2,22 │ sub %rbx,%rdi ▒ 0,05 │ cmp %rdi,%rdx ▒ │ ↑ je 1e9 ▒ │ ▒ │ /* If we are down to a range of 50 chars, ▒ │ don't bother checking any other markers; ▒ │ scan the intervening chars directly now. */ ▒ │ if (best_above - best_below < 50) ▒ 3,21 │305: cmp $0x31,%rdx ▒ │ ↓ jle 480 ▒ │ CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); ▒ │ ▒ │ if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff) ▒ │ CONSIDER (cached_charpos, cached_bytepos); ▒ │ ▒ │ for (tail = BUF_MARKERS (b); tail; tail = tail->next) ▒ 2,50 │ mov 0x10(%rax),%rax ▒ 4,25 │ test %rax,%rax ▒ │ ↓ je 480 ▒ │ { ◆ │ CONSIDER (tail->charpos, tail->bytepos); ▒ 2,64 │31c: mov 0x18(%rax),%rdx ▒ 59,16 │ cmp %rdx,%rsi ▒ │ ↓ je 638 ▒ 5,70 │ cmp %rdx,%rsi ▒ │ ↑ jl 2e0 ▒ 0,00 │ cmp %rdx,%r13 ▒ │ ↓ jge 438 ▒ 0,00 │ mov 0x20(%rax),%rbx ▒ 0,01 │ mov %rdx,%r13 ▒ │ ↑ jmp 2f0 ▒ │ marker_byte_position(): ▒ │ if (!buf) ▒ │ error ("Marker does not point anywhere"); ▒ │ ▒ │ eassert (BUF_BEG_BYTE (buf) <= m->bytepos && m->bytepos <= BUF_Z_BYTE (b▒ │ ▒ │ return m->bytepos; ▒ │340: mov 0x20(%rdi),%rbx ▒ │ ↑ jmpq cb ▒ │ nop ▒ │ buf_charpos_to_bytepos(): ▒ │ ▒ │ If at any point we can tell that the space between those ▒ │ two best approximations is all single-byte, ▒ │ we interpolate the result immediately. */ ▒ │ ▒ │ CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); ▒ │350: mov 0x2f0(%rdi),%r13 ▒ │ cmp %rsi,%r13 ▒ to my eyes, the fast case doesn't look too different though: │ ↓ jle 438 0,83 │ mov 0x20(%rax),%r8 0,93 │ mov %rdx,%rbp │2f0: mov %rbp,%rdx 0,06 │ mov %r8,%rdi 0,20 │ sub %r13,%rdx 0,11 │ sub %rbx,%rdi 0,12 │ cmp %rdi,%rdx │ ↑ je 1e9 │ │ /* If we are down to a range of 50 chars, │ don't bother checking any other markers; │ scan the intervening chars directly now. 
*/ │ if (best_above - best_below < 50) 11,24 │305: cmp $0x31,%rdx │ ↓ jle 480 │ CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); │ │ if (b == cached_buffer && BUF_MODIFF (b) == cached_modiff) │ CONSIDER (cached_charpos, cached_bytepos); │ │ for (tail = BUF_MARKERS (b); tail; tail = tail->next) 0,16 │ mov 0x10(%rax),%rax 10,41 │ test %rax,%rax │ ↓ je 480 │ { │ CONSIDER (tail->charpos, tail->bytepos); 3,19 │31c: mov 0x18(%rax),%rdx 42,86 │ cmp %rdx,%rsi │ ↓ je 638 10,36 │ cmp %rdx,%rsi │ ↑ jl 2e0 2,24 │ cmp %rdx,%r13 │ ↓ jge 438 0,76 │ mov 0x20(%rax),%rbx 0,43 │ mov %rdx,%r13 │ ↑ jmp 2f0 │ marker_byte_position(): │ if (!buf) │ error ("Marker does not point anywhere"); │ │ eassert (BUF_BEG_BYTE (buf) <= m->bytepos && m->bytepos <= BUF_Z_BYTE (bu │ │ return m->bytepos; │340: mov 0x20(%rdi),%rbx │ ↑ jmpq cb │ nop │ buf_charpos_to_bytepos(): │ │ If at any point we can tell that the space between those │ two best approximations is all single-byte, │ we interpolate the result immediately. */ │ │ CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); │350: mov 0x2f0(%rdi),%r13 │ cmp %rsi,%r13 I hope this is of some use, but I'll keep looking for open source files to reproduce the issue ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-21 0:36 ` Sebastian Sturm @ 2018-03-21 6:47 ` Eli Zaretskii 2018-03-22 13:16 ` Stefan Monnier 1 sibling, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-21 6:47 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Wed, 21 Mar 2018 01:36:38 +0100 > > this is the profiler report I get for the slow case (BTW, is there a way > to have the profiler resolve functions within line-number-at-pos? Yes: load simple.el manually before running the benchmark. > with perf, the ("self") time taken by buf_charpos_to_bytepos increases > from ~60% (fast case) to >98%. This is the diff generated by perf diff > <fast.perf> <slow.perf>: > > # Event 'cycles' > # > # Baseline Delta Shared Object Symbol > > # ........ ....... .................... > .......................................... > # > 57.77% +40.30% emacs-27.0.50 [.] buf_charpos_to_bytepos This seems to confirm Stefan's guess that converting character positions to byte positions takes most of the time, which might make sense with a lot of overlays (because each overlay uses 2 markers). Does your code call line-number-at-pos at random positions in the buffer, or are the positions close to one another? If the latter, you might be better off calling count-lines directly, starting at the line where you previously calculated the line number, instead of calling line-number-at-pos, which always begins at point-min. ^ permalink raw reply [flat|nested] 54+ messages in thread
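A minimal Lisp sketch of that caching approach (not code from this thread; `my-line-cache' and the other names are made up) could look like the following, assuming the cache is flushed on every buffer change:

(defvar-local my-line-cache nil
  "Cons (BOL-POSITION . LINE) from the previous query, or nil.")

(defun my-line-cache-flush (&rest _)
  (setq my-line-cache nil))

(defun my-line-number-at-pos (pos)
  "Like `line-number-at-pos', but count from the previous query's position."
  (save-excursion
    (goto-char pos)
    (forward-line 0)                    ; beginning of POS's line
    (let* ((bol (point))
           (line (cond ((null my-line-cache)
                        (1+ (count-lines (point-min) bol)))
                       ;; Both BOL and the cached position are at line
                       ;; beginnings, so `count-lines' returns exactly
                       ;; the difference in line numbers.
                       ((>= bol (car my-line-cache))
                        (+ (cdr my-line-cache)
                           (count-lines (car my-line-cache) bol)))
                       (t
                        (- (cdr my-line-cache)
                           (count-lines bol (car my-line-cache)))))))
      (setq my-line-cache (cons bol line))
      line)))

;; In the mode that needs it:
;;   (add-hook 'after-change-functions #'my-line-cache-flush nil t)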
* Re: State of the overlay tree branch? 2018-03-21 0:36 ` Sebastian Sturm 2018-03-21 6:47 ` Eli Zaretskii @ 2018-03-22 13:16 ` Stefan Monnier 2018-03-22 19:54 ` Sebastian Sturm 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-22 13:16 UTC (permalink / raw) To: emacs-devel > this is the profiler report I get for the slow case (BTW, is there a way to > have the profiler resolve functions within line-number-at-pos? It should do that without you asking. I mean, it won't show you `goto-char` and `point-min` kind of things since these are "inlined" (actually turned into their own byte-code), but `count-lines` should definitely be there. > - let 18368 86% > line-number-at-pos 18348 86% It's very odd that there's no `count-lines` down here (and `count-lines` is a perfectly normal Elisp function that's not inlined or otherwise treated specially), since it should be where most of the time is spent! If we assume the code works as intended, it would imply that the time is not spent in `count-lines` but elsewhere, e.g. in `goto-char`. > with perf, the ("self") time taken by buf_charpos_to_bytepos increases from > ~60% (fast case) to >98%. This is the diff generated by perf diff > <fast.perf> <slow.perf>: So I guess that could be it: the (goto-char opoint) spends an inordinate amount of time in buf_charpos_to_bytepos. Could you try the patch below, to see if it makes a difference? Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..3d808fd6fa 100644 --- a/src/marker.c +++ b/src/marker.c @@ -141,6 +141,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = 50; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +181,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - best_below < distance) break; + else + distance = distance + 10; } /* We get here if we did not exactly hit one of the known places. ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-22 13:16 ` Stefan Monnier @ 2018-03-22 19:54 ` Sebastian Sturm 2018-03-22 20:04 ` Sebastian Sturm 0 siblings, 1 reply; 54+ messages in thread From: Sebastian Sturm @ 2018-03-22 19:54 UTC (permalink / raw) To: emacs-devel > > It should do that without you asking. I mean, it won't show you > `goto-char` and `point-min` kind of things since these are "inlined" > (actually turned into their own byte-code), but `count-lines` should > definitely be there. > >> - let 18368 86% >> line-number-at-pos 18348 86% > > It's very odd that there's no `count-lines` down here (and `count-lines` > is a perfectly normal Elisp function that's not inlined or otherwise > treated specially), since it should be where most of the time is spent! when first evaluating simple.el as Eli suggested, the profiler indeed shows that pretty much all time is spent somewhere within `count-lines`. Unfortunately the patch didn't seem to help though I'll double-check at home (also, it seemed to improve baseline performance, but I'm not sure if I should regard that as a random fluctuation, will need to do more measurements). Also, this time even the first measurement with a small number of overlays was extremely slow, not sure what to make of that. In general however, the trend seems clear to me --- for this one file, overlays hurt `line-number-at-pos` performance very much, except when using the noverlay branch. Are there other things I should look for in my file that might negatively affect performance (strange codepoints or anything else that might perhaps upset `buf_charpos_to_bytepos`)? baseline (emacs-27 master #667cdf42..., approx. 5-10 overlays created by flycheck): iteration 1: 0.012696 iteration 2: 0.002595 iteration 3: 0.002606 iteration 4: 0.002601 iteration 5: 0.002649 iteration 6: 0.002605 iteration 7: 0.002594 iteration 8: 0.002601 iteration 9: 0.002603 iteration 10: 0.002601 iteration 11: 0.002606 iteration 12: 0.002626 iteration 13: 0.002603 iteration 14: 0.002642 iteration 15: 0.002599 iteration 16: 0.002598 iteration 17: 0.002600 iteration 18: 0.002601 iteration 19: 0.002599 iteration 20: 0.002608 emacs-27 master, approx. 
6200 overlays created by (test-highlight) iteration 1: 0.022795 iteration 2: 0.015560 iteration 3: 0.015697 iteration 4: 0.015913 iteration 5: 0.015894 iteration 6: 0.016063 iteration 7: 0.015928 iteration 8: 0.015890 iteration 9: 0.015278 iteration 10: 0.015515 iteration 11: 0.015327 iteration 12: 0.015326 iteration 13: 0.015574 iteration 14: 0.015319 iteration 15: 0.015370 iteration 16: 0.015354 iteration 17: 0.015333 iteration 18: 0.015312 iteration 19: 0.015481 iteration 20: 0.015411 emacs-27 master + patch "distance+10", ~10 overlays created by flycheck: iteration 1: 0.002938 iteration 2: 0.001132 iteration 3: 0.000550 iteration 4: 0.000468 iteration 5: 0.000470 iteration 6: 0.000451 iteration 7: 0.000449 iteration 8: 0.000451 iteration 9: 0.000450 iteration 10: 0.000449 iteration 11: 0.000448 iteration 12: 0.000452 iteration 13: 0.000451 iteration 14: 0.000443 iteration 15: 0.000448 iteration 16: 0.000443 iteration 17: 0.000444 iteration 18: 0.000445 iteration 19: 0.000445 iteration 20: 0.000445 emacs-27 master + patch "distance+10", ~6200 overlays created by (test-highlight): iteration 1: 0.019673 iteration 2: 0.014469 iteration 3: 0.014491 iteration 4: 0.014430 iteration 5: 0.014493 iteration 6: 0.014704 iteration 7: 0.014741 iteration 8: 0.014536 iteration 9: 0.014433 iteration 10: 0.014469 iteration 11: 0.014429 iteration 12: 0.014509 iteration 13: 0.014484 iteration 14: 0.014487 iteration 15: 0.014524 iteration 16: 0.014449 iteration 17: 0.014501 iteration 18: 0.014469 iteration 19: 0.014429 iteration 20: 0.014507 noverlay branch (#886933...), ~10 overlays: iteration 1: 0.002370 iteration 2: 0.001191 iteration 3: 0.001189 iteration 4: 0.001162 iteration 5: 0.001095 iteration 6: 0.001096 iteration 7: 0.000522 iteration 8: 0.000531 iteration 9: 0.000369 iteration 10: 0.000365 iteration 11: 0.000365 iteration 12: 0.000366 iteration 13: 0.000365 iteration 14: 0.000380 iteration 15: 0.000367 iteration 16: 0.000365 iteration 17: 0.000365 iteration 18: 0.000365 iteration 19: 0.000363 iteration 20: 0.000368 noverlay branch, ~6200 overlays: iteration 1: 0.001878 iteration 2: 0.001831 iteration 3: 0.001838 iteration 4: 0.001826 iteration 5: 0.001027 iteration 6: 0.000722 iteration 7: 0.000531 iteration 8: 0.000479 iteration 9: 0.000480 iteration 10: 0.000479 iteration 11: 0.000479 iteration 12: 0.000479 iteration 13: 0.000478 iteration 14: 0.000479 iteration 15: 0.000478 iteration 16: 0.000478 iteration 17: 0.000482 iteration 18: 0.000479 iteration 19: 0.000480 iteration 20: 0.000476 ^ permalink raw reply [flat|nested] 54+ messages in thread
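As for the "strange codepoints" question: a quick way to see whether (and where) a buffer contains characters that defeat the trivial char/byte interpolation is to list its non-ASCII characters. A small sketch (the function name is made up):

(defun my-find-non-ascii ()
  "Return a list of (POSITION . CHAR) for each non-ASCII character in the buffer."
  (save-excursion
    (goto-char (point-min))
    (let (hits)
      (while (re-search-forward "[^[:ascii:]]" nil t)
        (push (cons (match-beginning 0) (char-after (match-beginning 0)))
              hits))
      (nreverse hits))))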
* Re: State of the overlay tree branch? 2018-03-22 19:54 ` Sebastian Sturm @ 2018-03-22 20:04 ` Sebastian Sturm 0 siblings, 0 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-22 20:04 UTC (permalink / raw) To: emacs-devel for the record, this is what I get with the (non-noverlay, non-patched) emacs-27 master branch after calling `(set-buffer-multibyte nil)`. Guess this merely confirms what you already found out, but for me this is a viable workaround until the issue is properly resolved. Will keep looking for an open source test file over the weekend emacs-27 master, ~6200 overlays, enable-multibyte-characters = nil: iteration 1: 0.000300 iteration 2: 0.000268 iteration 3: 0.000267 iteration 4: 0.000268 iteration 5: 0.000267 iteration 6: 0.000266 iteration 7: 0.000263 iteration 8: 0.000263 iteration 9: 0.000277 iteration 10: 0.000269 iteration 11: 0.000264 iteration 12: 0.000266 iteration 13: 0.000263 iteration 14: 0.000265 iteration 15: 0.000273 iteration 16: 0.000267 iteration 17: 0.000266 iteration 18: 0.000266 iteration 19: 0.000265 iteration 20: 0.000263 On 03/22/2018 08:54 PM, Sebastian Sturm wrote: > > > > It should do that without you asking. I mean, it won't show you > > `goto-char` and `point-min` kind of things since these are "inlined" > > (actually turned into their own byte-code), but `count-lines` should > > definitely be there. > > > >> - let 18368 86% > >> line-number-at-pos 18348 86% > > > > It's very odd that there's no `count-lines` down here (and `count-lines` > > is a perfectly normal Elisp function that's not inlined or otherwise > > treated specially), since it should be where most of the time is spent! > > when first evaluating simple.el as Eli suggested, the profiler indeed > shows that pretty much all time is spent somewhere within `count-lines`. > Unfortunately the patch didn't seem to help though I'll double-check at > home (also, it seemed to improve baseline performance, but I'm not sure > if I should regard that as a random fluctuation, will need to do more > measurements). > Also, this time even the first measurement with a small number of > overlays was extremely slow, not sure what to make of that. > > In general however, the trend seems clear to me --- for this one file, > overlays hurt `line-number-at-pos` performance very much, except when > using the noverlay branch. > Are there other things I should look for in my file that might > negatively affect performance (strange codepoints or anything else that > might perhaps upset `buf_charpos_to_bytepos`)? > > baseline (emacs-27 master #667cdf42..., approx. 5-10 overlays created by > flycheck): > iteration 1: 0.012696 > iteration 2: 0.002595 > iteration 3: 0.002606 > iteration 4: 0.002601 > iteration 5: 0.002649 > iteration 6: 0.002605 > iteration 7: 0.002594 > iteration 8: 0.002601 > iteration 9: 0.002603 > iteration 10: 0.002601 > iteration 11: 0.002606 > iteration 12: 0.002626 > iteration 13: 0.002603 > iteration 14: 0.002642 > iteration 15: 0.002599 > iteration 16: 0.002598 > iteration 17: 0.002600 > iteration 18: 0.002601 > iteration 19: 0.002599 > iteration 20: 0.002608 > > emacs-27 master, approx. 
6200 overlays created by (test-highlight) > iteration 1: 0.022795 > iteration 2: 0.015560 > iteration 3: 0.015697 > iteration 4: 0.015913 > iteration 5: 0.015894 > iteration 6: 0.016063 > iteration 7: 0.015928 > iteration 8: 0.015890 > iteration 9: 0.015278 > iteration 10: 0.015515 > iteration 11: 0.015327 > iteration 12: 0.015326 > iteration 13: 0.015574 > iteration 14: 0.015319 > iteration 15: 0.015370 > iteration 16: 0.015354 > iteration 17: 0.015333 > iteration 18: 0.015312 > iteration 19: 0.015481 > iteration 20: 0.015411 > > emacs-27 master + patch "distance+10", ~10 overlays created by flycheck: > iteration 1: 0.002938 > iteration 2: 0.001132 > iteration 3: 0.000550 > iteration 4: 0.000468 > iteration 5: 0.000470 > iteration 6: 0.000451 > iteration 7: 0.000449 > iteration 8: 0.000451 > iteration 9: 0.000450 > iteration 10: 0.000449 > iteration 11: 0.000448 > iteration 12: 0.000452 > iteration 13: 0.000451 > iteration 14: 0.000443 > iteration 15: 0.000448 > iteration 16: 0.000443 > iteration 17: 0.000444 > iteration 18: 0.000445 > iteration 19: 0.000445 > iteration 20: 0.000445 > > emacs-27 master + patch "distance+10", ~6200 overlays created by > (test-highlight): > iteration 1: 0.019673 > iteration 2: 0.014469 > iteration 3: 0.014491 > iteration 4: 0.014430 > iteration 5: 0.014493 > iteration 6: 0.014704 > iteration 7: 0.014741 > iteration 8: 0.014536 > iteration 9: 0.014433 > iteration 10: 0.014469 > iteration 11: 0.014429 > iteration 12: 0.014509 > iteration 13: 0.014484 > iteration 14: 0.014487 > iteration 15: 0.014524 > iteration 16: 0.014449 > iteration 17: 0.014501 > iteration 18: 0.014469 > iteration 19: 0.014429 > iteration 20: 0.014507 > > noverlay branch (#886933...), ~10 overlays: > iteration 1: 0.002370 > iteration 2: 0.001191 > iteration 3: 0.001189 > iteration 4: 0.001162 > iteration 5: 0.001095 > iteration 6: 0.001096 > iteration 7: 0.000522 > iteration 8: 0.000531 > iteration 9: 0.000369 > iteration 10: 0.000365 > iteration 11: 0.000365 > iteration 12: 0.000366 > iteration 13: 0.000365 > iteration 14: 0.000380 > iteration 15: 0.000367 > iteration 16: 0.000365 > iteration 17: 0.000365 > iteration 18: 0.000365 > iteration 19: 0.000363 > iteration 20: 0.000368 > > noverlay branch, ~6200 overlays: > iteration 1: 0.001878 > iteration 2: 0.001831 > iteration 3: 0.001838 > iteration 4: 0.001826 > iteration 5: 0.001027 > iteration 6: 0.000722 > iteration 7: 0.000531 > iteration 8: 0.000479 > iteration 9: 0.000480 > iteration 10: 0.000479 > iteration 11: 0.000479 > iteration 12: 0.000479 > iteration 13: 0.000478 > iteration 14: 0.000479 > iteration 15: 0.000478 > iteration 16: 0.000478 > iteration 17: 0.000482 > iteration 18: 0.000479 > iteration 19: 0.000480 > iteration 20: 0.000476 > ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-20 1:23 ` Sebastian Sturm 2018-03-20 6:30 ` Eli Zaretskii @ 2018-03-22 20:52 ` Stefan Monnier 2018-03-22 23:11 ` Sebastian Sturm 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-22 20:52 UTC (permalink / raw) To: emacs-devel > (defun benchmark-often () > (cl-loop for n from 1 upto 20 do > (message (format "iteration %d: %f" n (nth 0 (benchmark-run > (line-number-at-pos (point)))))))) ^^^^^^^ Where is this "point" in your tests (I expect the timing to vary significantly depending on this). > 1st run: > iteration 1: 0.001213 > iteration 2: 0.001170 > iteration 3: 0.001170 > iteration 4: 0.001238 > iteration 5: 0.001163 > iteration 6: 0.001153 > iteration 7: 0.000421 > iteration 8: 0.000426 > iteration 9: 0.000322 > iteration 10: 0.000301 > iteration 11: 0.000291 > iteration 12: 0.000292 > iteration 13: 0.000291 > iteration 14: 0.000291 > iteration 15: 0.000295 > iteration 16: 0.000289 > iteration 17: 0.000289 > iteration 18: 0.000288 > iteration 19: 0.000288 > iteration 20: 0.000287 I recommend you don't bother outputting all 20 results: better summarize it by getting rid of the first test and then giving e.g. the sum or the median of the rest. > I'm not allowed to share my employer's source code as a test case, so > I tried the same procedure with the similarly large DeclBase.h from the > public LLVM repository. To my surprise, DeclBase.h didn't suffer from any > performance issues at all. My crystal ball tells me that DeclBase.h is pure ASCII so byte<->char conversion is trivial, whereas your file likely contains umlauts and other disreputable characters. Here's a similar test case to yours but which builds up its own artificial buffer with a few non-ascii chars to spice things up: (with-temp-buffer (dotimes (i 1000) (insert "lksajflahalskjdféefawrgfrüegf\n")) (let ((txtbuf (current-buffer))) (dotimes (s 4) (with-temp-buffer (insert-buffer-substring txtbuf) (let ((stepsize (lsh 10 (* 4 s)))) (cl-loop for n from (point-min) upto (- (point-max) stepsize) by stepsize do (let ((ov (make-overlay n (+ (1- stepsize) n)))) (overlay-put ov 'cquery-sem-highlight t)))) (dotimes (i 4) (let ((timing (benchmark-run 1000 (line-number-at-pos (+ (point-min) (* i (/ (buffer-size) 4))))))) (message "ols=%S pos=%S/4 time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) i (car timing) (cdr timing))) ))))) This gave me (on my top-of-the-line Thinkpad T61 using Debian's `emacs25`): ols=3000 pos=0/4 time=0.0018 (+ (0 0.0)) ols=3000 pos=1/4 time=6.1074 (+ (0 0.0)) ols=3000 pos=2/4 time=10.6876 (+ (0 0.0)) ols=3000 pos=3/4 time=13.7854 (+ (0 0.0)) ols=187 pos=0/4 time=0.0016 (+ (0 0.0)) ols=187 pos=1/4 time=0.3055 (+ (0 0.0)) ols=187 pos=2/4 time=0.6001 (+ (0 0.0)) ols=187 pos=3/4 time=0.8903 (+ (0 0.0)) ols=11 pos=0/4 time=0.0015 (+ (0 0.0)) ols=11 pos=1/4 time=0.0769 (+ (1 0.006324223)) ols=11 pos=2/4 time=0.1439 (+ (0 0.0)) ols=11 pos=3/4 time=0.2215 (+ (0 0.0)) ols=0 pos=0/4 time=0.0015 (+ (0 0.0)) ols=0 pos=1/4 time=0.0548 (+ (0 0.0)) ols=0 pos=2/4 time=0.1102 (+ (0 0.0)) ols=0 pos=3/4 time=0.1690 (+ (0 0.0)) Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
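A small helper along the lines of that summarizing suggestion (a sketch; `my-benchmark-median' is not an existing function) might be:

(require 'benchmark)

(defun my-benchmark-median (n thunk)
  "Call THUNK N+1 times, drop the first (warm-up) timing, return the median."
  (let (times)
    (dotimes (_ (1+ n))
      (push (car (benchmark-run (funcall thunk))) times))
    ;; `push' builds the list in reverse order, so the warm-up run is last.
    (let ((rest (sort (butlast times) #'<)))
      (nth (/ (length rest) 2) rest))))

;; e.g. (my-benchmark-median 20 (lambda () (line-number-at-pos (point))))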
* Re: State of the overlay tree branch? 2018-03-22 20:52 ` Stefan Monnier @ 2018-03-22 23:11 ` Sebastian Sturm 2018-03-23 5:03 ` Stefan Monnier 2018-03-23 8:07 ` Eli Zaretskii 0 siblings, 2 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-22 23:11 UTC (permalink / raw) To: emacs-devel On 03/22/2018 09:52 PM, Stefan Monnier wrote: >> (defun benchmark-often () >> (cl-loop for n from 1 upto 20 do >> (message (format "iteration %d: %f" n (nth 0 (benchmark-run >> (line-number-at-pos (point)))))))) > ^^^^^^^ > Where is this "point" in your tests (I expect the timing to vary > significantly depending on this). yes, these tests were all performed close to the very bottom of the file as I knew the issue to get worse towards the buffer end > My crystal ball tells me that DeclBase.h is pure ASCII so byte<->char > conversion is trivial, whereas your file likely contains umlauts and > other disreputable characters. > > Here's a similar test case to yours but which builds up its own > artificial buffer with a few non-ascii chars to spice things up: thank you! I'm very glad you could come up with a reproducible test case, and it's true that my file contains two instances of the greek letter "μ" that seem to cause this performance issue (though I was surprised to see that this few characters could have such a poisonous effect). Likewise, when replacing the topmost part in your benchmark function with the following: (A) (dotimes (i 1000) (insert "pure ascii pure ascii pure ascii\n")) (B) (dotimes (i 500) (insert "pure ascii pure ascii pure ascii\n")) (insert "μ") (dotimes (i 500) (insert "pure ascii pure ascii pure ascii\n")) respectively, I obtain the following timing results: (A) ols=3300 pos=0/4 time=0.0014 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0155 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0281 (+ (0 0.0)) ols=3300 pos=3/4 time=0.0447 (+ (0 0.0)) ols=206 pos=0/4 time=0.0007 (+ (0 0.0)) ols=206 pos=1/4 time=0.0130 (+ (0 0.0)) ols=206 pos=2/4 time=0.0283 (+ (0 0.0)) ols=206 pos=3/4 time=0.0447 (+ (0 0.0)) ols=12 pos=0/4 time=0.0007 (+ (0 0.0)) ols=12 pos=1/4 time=0.0129 (+ (0 0.0)) ols=12 pos=2/4 time=0.0281 (+ (0 0.0)) ols=12 pos=3/4 time=0.0447 (+ (0 0.0)) ols=0 pos=0/4 time=0.0007 (+ (0 0.0)) ols=0 pos=1/4 time=0.0134 (+ (0 0.0)) ols=0 pos=2/4 time=0.0297 (+ (0 0.0)) ols=0 pos=3/4 time=0.0463 (+ (0 0.0)) (B) ols=3300 pos=0/4 time=0.0007 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0301 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0482 (+ (0 0.0)) ols=3300 pos=3/4 time=8.0213 (+ (0 0.0)) ols=206 pos=0/4 time=0.0007 (+ (0 0.0)) ols=206 pos=1/4 time=0.0141 (+ (0 0.0)) ols=206 pos=2/4 time=0.0325 (+ (0 0.0)) ols=206 pos=3/4 time=0.1786 (+ (0 0.0)) ols=12 pos=0/4 time=0.0007 (+ (0 0.0)) ols=12 pos=1/4 time=0.0136 (+ (0 0.0)) ols=12 pos=2/4 time=0.0323 (+ (0 0.0)) ols=12 pos=3/4 time=0.0794 (+ (0 0.0)) ols=0 pos=0/4 time=0.0009 (+ (0 0.0)) ols=0 pos=1/4 time=0.0139 (+ (0 0.0)) ols=0 pos=2/4 time=0.0326 (+ (0 0.0)) ols=0 pos=3/4 time=0.0632 (+ (0 0.0)) by comparison, these are my results using the noverlay branch: (A) ols=3300 pos=0/4 time=0.0012 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0132 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0291 (+ (0 0.0)) ols=3300 pos=3/4 time=0.0448 (+ (0 0.0)) ols=206 pos=0/4 time=0.0007 (+ (0 0.0)) ols=206 pos=1/4 time=0.0132 (+ (0 0.0)) ols=206 pos=2/4 time=0.0290 (+ (0 0.0)) ols=206 pos=3/4 time=0.0454 (+ (0 0.0)) ols=12 pos=0/4 time=0.0008 (+ (0 0.0)) ols=12 pos=1/4 time=0.0131 (+ (0 0.0)) ols=12 pos=2/4 time=0.0287 (+ (0 0.0)) ols=12 pos=3/4 time=0.0452 (+ (0 0.0)) ols=0 pos=0/4 time=0.0007 (+ 
(0 0.0)) ols=0 pos=1/4 time=0.0131 (+ (0 0.0)) ols=0 pos=2/4 time=0.0289 (+ (0 0.0)) ols=0 pos=3/4 time=0.0457 (+ (0 0.0)) (B) ols=3300 pos=0/4 time=0.0015 (+ (0 0.0)) ols=3300 pos=1/4 time=0.0177 (+ (0 0.0)) ols=3300 pos=2/4 time=0.0345 (+ (0 0.0)) ols=3300 pos=3/4 time=0.0544 (+ (0 0.0)) ols=206 pos=0/4 time=0.0008 (+ (0 0.0)) ols=206 pos=1/4 time=0.0136 (+ (0 0.0)) ols=206 pos=2/4 time=0.0317 (+ (0 0.0)) ols=206 pos=3/4 time=0.0537 (+ (0 0.0)) ols=12 pos=0/4 time=0.0007 (+ (0 0.0)) ols=12 pos=1/4 time=0.0135 (+ (0 0.0)) ols=12 pos=2/4 time=0.0319 (+ (0 0.0)) ols=12 pos=3/4 time=0.0537 (+ (0 0.0)) ols=0 pos=0/4 time=0.0007 (+ (0 0.0)) ols=0 pos=1/4 time=0.0146 (+ (0 0.0)) ols=0 pos=2/4 time=0.0318 (+ (0 0.0)) ols=0 pos=3/4 time=0.0554 (+ (0 0.0)) since noverlay performs so well, I guess the technical issue here is already solved and I'll just have to wait for it to make it into the master branch. Until then I'll continue using feature/noverlay, but if a more recent merge with master was made available, I'd be interested in testing that. thanks again for all the helpful responses in this thread, Sebastian ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-22 23:11 ` Sebastian Sturm @ 2018-03-23 5:03 ` Stefan Monnier 2018-03-23 12:25 ` Sebastian Sturm 2018-03-23 8:07 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 5:03 UTC (permalink / raw) To: emacs-devel > since noverlay performs so well, I guess the technical issue here is already > solved and I'll just have to wait for it to make it into the master > branch. Yes and no: there can still be many markers (even without having any overlays) and that poses the same problem (and the noverlays branch doesn't change that). I've made some further tests and found that the patch I sent indeed doesn't help tremendously with "distance += 10" but that when we reach "+= 50" the effect is more significant. Could you try the patch below (ideally not just with artificial tests but also in actual use) to see if it helps? Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..6ab0d3d61e 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT 50 + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +203,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. 
@@ -293,6 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -323,8 +349,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 5:03 ` Stefan Monnier @ 2018-03-23 12:25 ` Sebastian Sturm 2018-03-23 12:47 ` Eli Zaretskii 0 siblings, 1 reply; 54+ messages in thread From: Sebastian Sturm @ 2018-03-23 12:25 UTC (permalink / raw) To: emacs-devel I haven't tested this very extensively yet, but artificial benchmark results are now comparable to the noverlay branch and editing seems similarly fluid. Many thanks for that! On a tangential note, removing the bottleneck now brought up another hotspot in my performance profile (less severe than the one previously reported) that seems to be due to cc-mode itself. Since I don't want to derail this thread, I'll try to implement some of the suggestions made in the "Latency profiling?" thread and come up with some reproducible data on that other issue. On 03/23/2018 06:03 AM, Stefan Monnier wrote: >> since noverlay performs so well, I guess the technical issue here is already >> solved and I'll just have to wait for it to make it into the master >> branch. > > Yes and no: there can still be many markers (even without having any > overlays) and that poses the same problem (and the noverlays branch > doesn't change that). > > I've made some further tests and found that the patch I sent indeed > doesn't help tremendously with "distance += 10" but that when we reach > "+= 50" the effect is more significant. > > Could you try the patch below (ideally not just with artificial tests > but also in actual use) to see if it helps? > > > Stefan > > > diff --git a/src/marker.c b/src/marker.c > index 7773c4fce0..6ab0d3d61e 100644 > --- a/src/marker.c > +++ b/src/marker.c > @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) > CHECK_TYPE (MARKERP (x), Qmarkerp, x); > } > > +/* When converting bytes from/to chars, we look through the list of > + markers to try and find a good starting point (since markers keep > + track of both bytepos and charpos at the same time). > + But if there are many markers, it can take too much time to find a "good" > + marker from which to start. Worse yet: if it takes a long time and we end > + up finding a nearby markers, we won't add a new marker to cache this > + result, so next time around we'll have to go through this same long list > + to (re)find this best marker. So the further down the list of > + markers we go, the less demanding we are w.r.t what is a good marker. > + > + The previous code used INITIAL=50 and INCREMENT=0 and this lead to > + really poor performance when there are many markers. > + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad > + T61 using various artificial test cases seem to suggest that INCREMENT=50 > + might be "the best compromise": it significantly improved the > + worst case and it was rarely slower and never by much. > + > + The asymptotic behavior is still poor, tho, so in largish buffers with many > + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ > +#define BYTECHAR_DISTANCE_INITIAL 50 > +#define BYTECHAR_DISTANCE_INCREMENT 50 > + > /* Return the byte position corresponding to CHARPOS in B. 
*/ > > ptrdiff_t > @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) > struct Lisp_Marker *tail; > ptrdiff_t best_above, best_above_byte; > ptrdiff_t best_below, best_below_byte; > + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; > > eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); > > @@ -180,8 +203,10 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) > /* If we are down to a range of 50 chars, > don't bother checking any other markers; > scan the intervening chars directly now. */ > - if (best_above - best_below < 50) > + if (best_above - best_below < distance) > break; > + else > + distance += BYTECHAR_DISTANCE_INCREMENT; > } > > /* We get here if we did not exactly hit one of the known places. > @@ -293,6 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) > struct Lisp_Marker *tail; > ptrdiff_t best_above, best_above_byte; > ptrdiff_t best_below, best_below_byte; > + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; > > eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); > > @@ -323,8 +349,10 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) > /* If we are down to a range of 50 chars, > don't bother checking any other markers; > scan the intervening chars directly now. */ > - if (best_above - best_below < 50) > + if (best_above - best_below < distance) > break; > + else > + distance += BYTECHAR_DISTANCE_INCREMENT; > } > > /* We get here if we did not exactly hit one of the known places. > > -- Sebastian Sturm Research & Development Phone: +49 (0) 6155 7808977 Fax: +49 (0) 6155 7802880 Email: s.sturm@arkona-technologies.de Web: www.arkona-technologies.de arkona technologies GmbH Im Leuschnerpark 4 64347 Griesheim Germany Amtsgericht / Commercial Register of Darmstadt, HRB 90080 USt-ID: DE273794666 Steuernummer / Tax-ID: 007 / 228 / 19331 Geschäftsführung / Managing Director: Rainer Sturm This e-mail may contain confidential and/or privileged information. If you are not the intended recipient (or have received this e-mail in error) please notify the sender immediately and delete this message and any attachments from your system. Any unauthorised copying, disclosure or distribution of the material in this e-mail is strictly forbidden. We cannot accept any responsibility for the accuracy or completeness of this message as it has been transmitted over a public network. If, despite our use of anti-virus software, a virus enters your systems in connection with the sending of the e-mail, you may not hold us liable for any damages that may possibly arise in that connection. We will accept liability which by law we cannot exclude. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:25 ` Sebastian Sturm @ 2018-03-23 12:47 ` Eli Zaretskii 2018-03-23 13:19 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 12:47 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Fri, 23 Mar 2018 13:25:19 +0100 > > I haven't tested this very extensively yet, but artificial benchmark > results are now comparable to the noverlay branch and editing seems > similarly fluid. Many thanks for that! I'd be interested to see a comparison with a code that ignores the markers entirely, and uses just these 4: CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of the number of markers. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:47 ` Eli Zaretskii @ 2018-03-23 13:19 ` Stefan Monnier 2018-03-23 13:37 ` Noam Postavsky 2018-03-23 14:22 ` Eli Zaretskii 0 siblings, 2 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 13:19 UTC (permalink / raw) To: emacs-devel > I'd be interested to see a comparison with a code that ignores the > markers entirely, and uses just these 4: > > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); I tried that, and in the synthetic benchmark which tries to reproduce Sebastian's lsp-mode situation the result was indeed much better, but then in other benchmarks it caused very significant slowdowns. > That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of > the number of markers. I haven't investigated closely enough to be sure, but in my tests with INCREMENT=50 I reached points where adding more overlays did not make things worse any more, so I suspect that the ideal INCREMENT depends on the buffer size more than on the number of markers. In any case, my patch is just a "quick hack" to try and reduce the pain, but it doesn't really solve the problem: the slow down with many markers and a large buffer still grows pretty significantly with the size of the buffer. I think it's worse than O(sqrt N). If we want to really solve this problem, we should use an algorithm with at most an O(log N) complexity, e.g. keeping the markers in a red-black tree, or inside a sorted array (probably with a gap like we have for the buffer text) so we can do a binary search. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
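To make the O(log N) lookup concrete, here is a toy Lisp sketch of the search side only (the real change would of course live in C, and the hard part is keeping the structure cheap to update on insertions and deletions):

(defun my-nearest-anchor (anchors charpos)
  "Return the (CHARPOS . BYTEPOS) pair in ANCHORS nearest below CHARPOS.
ANCHORS is a vector of conses sorted by their car."
  (let ((lo 0) (hi (length anchors)))
    (while (< lo hi)
      (let ((mid (/ (+ lo hi) 2)))
        (if (<= (car (aref anchors mid)) charpos)
            (setq lo (1+ mid))          ; anchor at MID is usable; look right
          (setq hi mid))))              ; anchor at MID is too far; look left
    (and (> lo 0) (aref anchors (1- lo)))))

;; (my-nearest-anchor [(1 . 1) (100 . 120) (500 . 630)] 250) => (100 . 120)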
* Re: State of the overlay tree branch? 2018-03-23 13:19 ` Stefan Monnier @ 2018-03-23 13:37 ` Noam Postavsky 2018-03-23 13:55 ` Stefan Monnier 2018-03-23 14:22 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Noam Postavsky @ 2018-03-23 13:37 UTC (permalink / raw) To: Stefan Monnier; +Cc: Emacs developers On 23 March 2018 at 09:19, Stefan Monnier <monnier@iro.umontreal.ca> wrote: > In any case, my patch is just a "quick hack" to try and reduce the pain, > but it doesn't really solve the problem: the slow down with many markers > and a large buffer still grows pretty significantly with the size of > the buffer. I think it's worse than O(sqrt N). > > If we want to really solve this problem, we should use an algorithm with > at most an O(log N) complexity, e.g. keeping the markers in > a red-black tree, or inside a sorted array (probably with a gap like we > have for the buffer text) so we can do a binary search. Is this related to Bug#24548 "Long GC delays with many non-detached markers (PATCH)" aka Bug#29439 "Quadratic complexity in sweep_markers"? https://debbugs.gnu.org/cgi/bugreport.cgi?bug=24548 https://debbugs.gnu.org/cgi/bugreport.cgi?bug=29439 ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 13:37 ` Noam Postavsky @ 2018-03-23 13:55 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 13:55 UTC (permalink / raw) To: Noam Postavsky; +Cc: Emacs developers >> If we want to really solve this problem, we should use an algorithm with >> at most an O(log N) complexity, e.g. keeping the markers in >> a red-black tree, or inside a sorted array (probably with a gap like we >> have for the buffer text) so we can do a binary search. > > Is this related to Bug#24548 "Long GC delays with many non-detached > markers (PATCH)" > aka Bug#29439 "Quadratic complexity in sweep_markers"? Not directly, no, Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 13:19 ` Stefan Monnier 2018-03-23 13:37 ` Noam Postavsky @ 2018-03-23 14:22 ` Eli Zaretskii 2018-03-23 14:39 ` Stefan Monnier 2018-03-23 19:39 ` Stefan Monnier 1 sibling, 2 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 14:22 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 23 Mar 2018 09:19:11 -0400 > > > I'd be interested to see a comparison with a code that ignores the > > markers entirely, and uses just these 4: > > > > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); > > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); > > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); > > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); > > I tried that, and in the synthetic benchmark which tries to reproduce > Sebastian's lsp-mode situation the result was indeed much better, but > then in other benchmarks it caused very significant slowdowns. In what benchmarks did it cause significant slowdowns? > > That's because BYTECHAR_DISTANCE_INCREMENT is probably a function of > > the number of markers. > > I haven't investigated closely enough to be sure, but in my tests with > INCREMENT=50 I reached points where adding more overlays did not make > things worse any more, so I suspect that the ideal INCREMENT depends on > the buffer size more than on the number of markers. Could be. My point was that it isn't a constant. > If we want to really solve this problem, we should use an algorithm with > at most an O(log N) complexity, e.g. keeping the markers in > a red-black tree, or inside a sorted array (probably with a gap like we > have for the buffer text) so we can do a binary search. Yes, agreed. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 14:22 ` Eli Zaretskii @ 2018-03-23 14:39 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 14:39 UTC (permalink / raw) To: emacs-devel >> > I'd be interested to see a comparison with a code that ignores the >> > markers entirely, and uses just these 4: >> > CONSIDER (BUF_PT (b), BUF_PT_BYTE (b)); >> > CONSIDER (BUF_GPT (b), BUF_GPT_BYTE (b)); >> > CONSIDER (BUF_BEGV (b), BUF_BEGV_BYTE (b)); >> > CONSIDER (BUF_ZV (b), BUF_ZV_BYTE (b)); >> I tried that, and in the synthetic benchmark which tries to reproduce >> Sebastian's lsp-mode situation the result was indeed much better, but >> then in other benchmarks it caused very significant slowdowns. > In what benchmarks did it cause significant slowdowns? Can't remember exactly; I think it was bad enough for a test case which seemed pretty realistic, so I discarded that option. Basically, what I remember is that I got the impression it would probably harm more users than the problem at hand does. Stefan PS: BTW, the number of markers is not the only issue: the order in which they are created also matters. Maybe we should try Sebastian's test case but creating the markers/overlays in random order (and if that helps, we could get a similar effect by making the GC randomly shuffle the buffer's list of markers ;-). ^ permalink raw reply [flat|nested] 54+ messages in thread
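That random-order experiment is easy to set up from Lisp; a sketch (the names are made up, the shuffle is a plain Fisher-Yates):

(require 'cl-lib)

(defun my-shuffle (list)
  "Return a copy of LIST in random order (Fisher-Yates)."
  (let ((v (vconcat list)))
    (cl-loop for i from (1- (length v)) downto 1
             do (cl-rotatef (aref v i) (aref v (random (1+ i)))))
    (append v nil)))

(defun my-make-overlays-randomly (step)
  "Cover the current buffer with overlays of size STEP, created in random order."
  (dolist (beg (my-shuffle (number-sequence (point-min) (- (point-max) step) step)))
    (make-overlay beg (+ beg step))))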
* Re: State of the overlay tree branch? 2018-03-23 14:22 ` Eli Zaretskii 2018-03-23 14:39 ` Stefan Monnier @ 2018-03-23 19:39 ` Stefan Monnier 2018-03-25 15:11 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 19:39 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel OK, I think I'm tired of these experiments. Here's my current test, along with the patches I use with it. You can test it with something like src/emacs -Q --batch -l ~/tmp/foo.el --eval '(setq internal--bytechar-distance-increment 50 internal--randomize-markers t)' --eval '(bytechar-test 3000 nil)' Shuffling the markers can make a noticeable difference in some cases but we're only talking about a factor of 2 or 3. It doesn't have much negative impact, so it's not a bad option, but the algorithmic problem remains anyway. Stefan (defun bytechar-test (buffer-kb &optional forward) (random "seed") (with-temp-buffer (dotimes (i (* buffer-kb 33)) (insert "lksajflahalskjdféefawrgfrüegf\n")) (message "buffer-size = %SkB" (/ (buffer-size) 1024)) (let ((txtbuf (current-buffer)) (goto-iterations (/ 10000000 buffer-kb)) (line-iterations (/ 20000 buffer-kb)) (markers ())) (dotimes (s 4) (with-temp-buffer (insert-buffer-substring txtbuf) (let ((stepsize (lsh 10 (* 4 s)))) (if forward (cl-loop for n from (point-min) upto (point-max) by stepsize do (push (copy-marker n) markers)) (cl-loop for n from (point-max) downto (point-min) by stepsize do (push (copy-marker n) markers)))) ;; The GC's internal--randomize-markers just brings-to-front every ;; 8th marker, so when starting with an ordered list of markers (like ;; in our case), we need to run the GC at least 8 times before the ;; whole list starts to look somewhat shuffled. (dotimes (i 20) (garbage-collect)) (let ((timing (benchmark-run goto-iterations (goto-char (+ (point-min) (random (buffer-size))))))) (message "ols=%S goto-random time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) (car timing) (cdr timing))) (garbage-collect) ;throw away the transient markers (let ((timing (benchmark-run line-iterations (dotimes (i 5) (line-number-at-pos (+ (point-min) (* i (/ (buffer-size) 4)))))))) (message "nbmks=%S pos=*/4 time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) (car timing) (cdr timing))) (dotimes (i 5) (let ((timing (benchmark-run line-iterations (line-number-at-pos (+ (point-min) (* i (/ (buffer-size) 4))))))) (message "nbmks=%S pos=%S/4 time=%.4f (+ %S)" (/ (buffer-size) (lsh 10 (* 4 s))) i (car timing) (cdr timing)))) ))))) diff --git a/lisp/emacs-lisp/benchmark.el b/lisp/emacs-lisp/benchmark.el index b86b56b81e..2f4e38fe35 100644 --- a/lisp/emacs-lisp/benchmark.el +++ b/lisp/emacs-lisp/benchmark.el @@ -50,7 +50,7 @@ benchmark-run garbage collections that ran, and the time taken by garbage collection. See also `benchmark-run-compiled'." (declare (indent 1) (debug t)) - (unless (natnump repetitions) + (unless (or (natnump repetitions) (symbolp repetitions)) (setq forms (cons repetitions forms) repetitions 1)) (let ((i (make-symbol "i")) @@ -58,7 +58,7 @@ benchmark-run (gc (make-symbol "gc"))) `(let ((,gc gc-elapsed) (,gcs gcs-done)) - (list ,(if (> repetitions 1) + (list ,(if (or (symbolp repetitions) (> repetitions 1)) ;; Take account of the loop overhead. `(- (benchmark-elapse (dotimes (,i ,repetitions) ,@forms)) @@ -101,7 +101,7 @@ benchmark For non-interactive use see also `benchmark-run' and `benchmark-run-compiled'." 
(interactive "p\nxForm: ") - (let ((result (eval `(benchmark-run ,repetitions ,form)))) + (let ((result (eval `(benchmark-run ,repetitions ,form) t))) (if (zerop (nth 1 result)) (message "Elapsed time: %fs" (car result)) (message "Elapsed time: %fs (%fs in %d GCs)" (car result) diff --git a/src/alloc.c b/src/alloc.c index 7ba872aaee..16d11e34cd 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -7270,10 +7270,22 @@ static void unchain_dead_markers (struct buffer *buffer) { struct Lisp_Marker *this, **prev = &BUF_MARKERS (buffer); + /* In order to try and avoid worst case behaviors in buf_charpos_to_bytepos + we try and randomize the order of markers here. */ + unsigned i = 4; while ((this = *prev)) if (this->gcmarkbit) - prev = &this->next; + { + if (!randomize_markers || i++ % 8) + prev = &this->next; + else + { /* Move this one to front, just to randomize things a bit. */ + *prev = this->next; + this->next = BUF_MARKERS (buffer); + BUF_MARKERS (buffer) = this; + } + } else { this->buffer = NULL; @@ -7752,6 +7764,9 @@ The time is in seconds as a floating point value. */); DEFVAR_INT ("gcs-done", gcs_done, doc: /* Accumulated number of garbage collections done. */); + DEFVAR_BOOL ("internal--randomize-markers", randomize_markers, doc: /* */); + randomize_markers = true; + defsubr (&Scons); defsubr (&Slist); defsubr (&Svector); diff --git a/src/marker.c b/src/marker.c index 3d808fd6fa..7c1d164927 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT bytechar_distance_increment + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,7 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; - ptrdiff_t distance = 50; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -184,7 +206,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) if (best_above - best_below < distance) break; else - distance++; + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. 
@@ -296,7 +318,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; - ptrdiff_t distance = 50; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -330,7 +352,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) if (best_above - best_below < distance) break; else - distance++; + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. @@ -756,4 +778,9 @@ syms_of_marker (void) defsubr (&Scopy_marker); defsubr (&Smarker_insertion_type); defsubr (&Sset_marker_insertion_type); + + DEFVAR_INT ("internal--bytechar-distance-increment", + bytechar_distance_increment, + doc: /* Haha */); + bytechar_distance_increment = 50; } ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 19:39 ` Stefan Monnier @ 2018-03-25 15:11 ` Stefan Monnier 2018-03-25 16:39 ` Eli Zaretskii 0 siblings, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-25 15:11 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > OK, I think I'm tired of these experiments. Taking a break finally let me get rid of my blinders so I could see more clearly what's going on: All this time is really spent in find_newline (rather than in things like goto-char). The problem goes as follows:
A - (line-number-at-pos POS) will call find_newline, making it scan all positions between point-min and POS.
B1- if find_newline uses the newline cache, at each newline encountered it will call buf_charpos_to_bytepos.
B2- if find_newline does not use the newline cache, at each newline encountered it will call buf_bytepos_to_charpos.
- Each time buf_charpos_to_bytepos/buf_bytepos_to_charpos is called, it will "CONSIDER" the known positions at point-min, point-max, and at the position of the last call (i.e. at the previous newline). Then it will loop through all the markers until the nearest known position before POS and the nearest known position after POS are within 50 characters of each other.
C - since one of the known positions is that of the last newline, the distance between the nearest position and POS (before considering any marker) is usually pretty small: the length of the current line. It's often even smaller than 50. But that doesn't let us stop, because the marker loop doesn't pay attention to the distance to POS in order to decide to stop: it only considers the distance between the current upper and lower bounds, and since we're scanning in one direction, all the recently seen positions (added as markers) are on the same side of POS as the last seen newline, so they don't help.
So there are various ways to solve this problem. So far, I tried to make the marker loop give up earlier, which helps (C) to some extent but without attacking the core of its problem, which is to pay attention to the case where one of the bounds is already very close to POS even tho the other is still way off. The patch below tweaks my previous patch to take this into account. The result is now that my test cases stay fast (mostly unaffected by the number of markers) even for large buffers with a large number of markers.
Note that the above shows that there are other optimisations which would also solve this problem (and would be worthwhile independently).
A - change line-number-at-pos so it doesn't always rescan all the way from point-min. This would really circumvent the whole problem (contrary to what I thought before with my blinders on; thanks Eli for insisting on that).
B - change find_newline so it doesn't call buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline. E.g. in the no-newline-cache case it'd probably be faster to loop through each char using INC/DEC_BOTH and checking if we're at \n, than the current code which uses mem(r)chr to look for the bytepos of the next \n and then calls buf_bytepos_to_charpos to get the corresponding charpos. Alternatively, we could just delay the computation of the charpos until the end (currently we update it at each newline, for the purpose of filling the newline-cache). 
-- Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..f869b3f948 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT 50 + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +203,11 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - charpos < distance + || charpos - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. @@ -293,6 +319,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -323,8 +350,11 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - bytepos < distance + || bytepos - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. ^ permalink raw reply related [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-25 15:11 ` Stefan Monnier @ 2018-03-25 16:39 ` Eli Zaretskii 2018-03-25 17:35 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-25 16:39 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@IRO.UMontreal.CA> > Cc: emacs-devel@gnu.org > Date: Sun, 25 Mar 2018 11:11:14 -0400 > > B1- if find_newline uses the newline cache, at each newline encountered it > will call buf_charpos_to_bytepos. > B2- if find_newline does not use the newline cache, at each newline > encountered it will call buf_bytepos_to_charpos. Agree with B1, but not with B2. Unless I'm overlooking something, when the newline cache is disabled, we use memchr, which can search a contiguous sequence of bytes in a loop, without translating byte-to-character positions. It only needs this translation at the beginning of search, after hitting the gap, and when the search is completed. > So there are various ways to solve this problem. So far, I tried to > make the markers-loop give up earlier, which helps (C) to some extent but > without attacking the core of its problem which is to pay attention to > the case where one of the bounds is already very close to POS even tho > the other is still way off. > > The patch below tweaks my previous patch to take this into account. > The result is now that my test cases stay fast (mostly unaffected by the > number of markers) even for large buffers with a large number of markers. This is a good change, I think. But it emphasizes even more the fact that if we instead expose to Lisp display_count_lines, which is basically a stripped-down version of find_newline with all the unnecessary ballast removed, we will get an even better performance in applications that need to count lines a lot. > Note that the above shows that there are other optimisations which would > also solve this problem (and would be worthwhile independently). > A - change line-number-at-pos so it doesn't always rescan all the way > from point-min. This would really circumvent the whole problem > (contrarily to what I thought before with my blinders on, thanks Eli > for insisting on that). > B - change find_newline so it doesn't call > buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline. > E.g. in the no-newline-cache case it'd probably be faster to > loop through each char using INC/DEC_BOTH and checking if we're at > \n, than the current code which uses mem(r)chr to look for the > bytepos of the next \n and then calls buf_bytepos_to_charpos to get > the corresponding charpos. Alternatively, we could just delay the > computation of the charpos until the end (currently we update it > at each newline, for the purpose of filling the newline-cache). I think we need not touch find_newline. It is a very frequently used workhorse, and needs to produce decent performance for every one of its callers. By contrast, applications whose primary need is to count lines, let alone do that _thousands_ of times per keystroke, should have a dedicated function optimized for that job alone. It's not a coincidence the native line-number display uses display_count_lines ;-) Thanks. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-25 16:39 ` Eli Zaretskii @ 2018-03-25 17:35 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-25 17:35 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > Agree with B1, but not with B2. Unless I'm overlooking something, > when the newline cache is disabled, we use memchr, which can search a > contiguous sequence of bytes in a loop, without translating > byte-to-character positions. It only needs this translation at the > beginning of search, after hitting the gap, and when the search is > completed. Hmm... indeed, when the newline cache is completely disabled the problem should not appear. >> The patch below tweaks my previous patch to take this into account. >> The result is now that my test cases stay fast (mostly unaffected by the >> number of markers) even for large buffers with a large number of markers. > This is a good change, I think. But it emphasizes even more the fact > that if we instead expose to Lisp display_count_lines, which is > basically a stripped-down version of find_newline with all the > unnecessary ballast removed, we will get an even better performance in > applications that need to count lines a lot. Yes, fixing A is definitely worthwhile. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-22 23:11 ` Sebastian Sturm 2018-03-23 5:03 ` Stefan Monnier @ 2018-03-23 8:07 ` Eli Zaretskii 2018-03-23 9:08 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 8:07 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Fri, 23 Mar 2018 00:11:16 +0100
>
> since noverlay performs so well, I guess the technical issue here is
> already solved and I'll just have to wait for it to make it into the
> master branch.

As Stefan points out, the overlays are not the reason. The reason is the number of markers in the buffer; each overlay defines 2 markers. And there could be many markers in a buffer even if there are no overlays.

Btw, why do you have so many overlays in these buffers? Is this part of the lsp-mode implementation, or is the reason unrelated?

^ permalink raw reply [flat|nested] 54+ messages in thread
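As an aside, a quick way to check how many overlays a buffer actually carries (and hence, roughly, how many extra marker-like positions the char/byte conversion has to consider) is a one-liner like the following; `overlays-in' is standard Emacs Lisp.

```
;; Number of overlays in the accessible portion of the current buffer;
;; each one contributes a start and an end position that behave like markers.
(length (overlays-in (point-min) (point-max)))
```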
* Re: State of the overlay tree branch? 2018-03-23 8:07 ` Eli Zaretskii @ 2018-03-23 9:08 ` Eli Zaretskii 2018-03-23 10:15 ` Sebastian Sturm 2018-03-23 12:12 ` Stefan Monnier 0 siblings, 2 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 9:08 UTC (permalink / raw) To: s.sturm; +Cc: emacs-devel > Date: Fri, 23 Mar 2018 11:07:26 +0300 > From: Eli Zaretskii <eliz@gnu.org> > Cc: emacs-devel@gnu.org > > Btw, why do you have so many overlays in these buffers? Is this part > of lsp-mode implementation, or is the reason unrelated? Also, what about my suggestion to count lines in a relative manner, using count-lines from a line with a known number? You never replied to that suggestion. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 9:08 ` Eli Zaretskii @ 2018-03-23 10:15 ` Sebastian Sturm 2018-03-23 12:39 ` Eli Zaretskii 2018-03-23 12:12 ` Stefan Monnier 1 sibling, 1 reply; 54+ messages in thread From: Sebastian Sturm @ 2018-03-23 10:15 UTC (permalink / raw) To: emacs-devel On 03/23/2018 10:08 AM, Eli Zaretskii wrote: >> Date: Fri, 23 Mar 2018 11:07:26 +0300 >> From: Eli Zaretskii <eliz@gnu.org> >> Cc: emacs-devel@gnu.org >> >> Btw, why do you have so many overlays in these buffers? Is this part >> of lsp-mode implementation, or is the reason unrelated? this is not related to lsp-mode itself, but to the semantic highlighter implemented by emacs-cquery (which is built on top of lsp-mode but implements additional features offered by the cquery backend). When set to use overlays (which provides a better visual experience than font-lock, as font-lock tends to get out of sync with the buffer during editing), the semantic highlighter retrieves a list of symbols within the current buffer and creates overlays to provide semantically meaningful syntax highlighting. It's not a feature I couldn't live without, but it's very precise and in principle could probably be faster than the syntax highlighting provided by cc-mode as C++ parsing is handled asynchronously by the clang-based native backend. > Also, what about my suggestion to count lines in a relative manner, > using count-lines from a line with a known number? You never replied > to that suggestion. you're right, sorry. In my opinion, a caching mechanism might be a very useful thing to have if it provides further performance benefits on top of what the noverlay branch has to offer. However, since count-lines may not be the only function that has to convert between char and byte positions (or is it?), and since the noverlay branch seems to resolve the overlay issue without having to introduce additional complexity in the elisp layer, implementing a caching mechanism before noverlay is merged into the master branch seems like a premature optimization to me. Of course this is a layman's opinion (and maybe the case of "few overlays but many markers" is not as pathological as it appears to me); if you think a line number cache should be implemented, I'll go and discuss that with the lsp-mode maintainers (assuming that they are among the heaviest users of line-number-at-pos). Sebastian ^ permalink raw reply [flat|nested] 54+ messages in thread
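For readers unfamiliar with how such a highlighter works, the overlay backend boils down to something like the sketch below. The function names, the property used for tagging, and the face handling are invented for this example; the real emacs-cquery code is more involved and batches its updates.

```
;; Minimal sketch of overlay-based semantic highlighting: for each symbol
;; range reported by the language server, put a face on an overlay.
;; Overlay endpoints move with the text when the buffer is edited, which is
;; why they stay in sync better than re-running font-lock, but every overlay
;; also adds positions that buffer scans may have to take into account.
(defun my-highlight-range (beg end face)
  "Highlight BEG..END in the current buffer with FACE using an overlay."
  (let ((ov (make-overlay beg end)))
    (overlay-put ov 'face face)
    ;; Tag the overlay so it can be found and deleted on the next update.
    (overlay-put ov 'my-semantic-highlight t)
    ov))

(defun my-clear-highlights ()
  "Remove the overlays created by `my-highlight-range'."
  (remove-overlays (point-min) (point-max) 'my-semantic-highlight t))
```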
* Re: State of the overlay tree branch? 2018-03-23 10:15 ` Sebastian Sturm @ 2018-03-23 12:39 ` Eli Zaretskii 0 siblings, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 12:39 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Fri, 23 Mar 2018 11:15:48 +0100 > > > Also, what about my suggestion to count lines in a relative manner, > > using count-lines from a line with a known number? You never replied > > to that suggestion. > > you're right, sorry. In my opinion, a caching mechanism might be a very > useful thing to have if it provides further performance benefits on top > of what the noverlay branch has to offer. However, since count-lines may > not be the only function that has to convert between char and byte > positions (or is it?), and since the noverlay branch seems to resolve > the overlay issue without having to introduce additional complexity in > the elisp layer, implementing a caching mechanism before noverlay is > merged into the master branch seems like a premature optimization to me. It isn't premature optimization, because a buffer could have many markers even if it has no or only a few overlays. > Of course this is a layman's opinion (and maybe the case of "few > overlays but many markers" is not as pathological as it appears to me); > if you think a line number cache should be implemented, I'll go and > discuss that with the lsp-mode maintainers (assuming that they are among > the heaviest users of line-number-at-pos). I think the effect should be at least measured before the decision whether to do that is made. ^ permalink raw reply [flat|nested] 54+ messages in thread
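One cheap way to do that measurement, in the spirit of the benchmark-run figures quoted earlier in the thread, is to compare an absolute count from point-min with a relative count from a nearby known position, in the same buffer and at the same spot. The use of `window-start' as the "known" position here is just for illustration.

```
(require 'benchmark)
;; With point near the end of a large buffer with many overlays/markers:
;; absolute count, rescans from the beginning of the buffer on each call
(benchmark-run 1000 (line-number-at-pos (point)))
;; relative count, only scans from a nearby, already-displayed position
(benchmark-run 1000 (count-lines (window-start) (point)))
```

If the second figure stays small while the first one grows with the number of markers, that supports making line-number-at-pos (or its callers) count relatively.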
* Re: State of the overlay tree branch? 2018-03-23 9:08 ` Eli Zaretskii 2018-03-23 10:15 ` Sebastian Sturm @ 2018-03-23 12:12 ` Stefan Monnier 2018-03-23 12:40 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 12:12 UTC (permalink / raw) To: emacs-devel > Also, what about my suggestion to count lines in a relative manner, > using count-lines from a line with a known number? You never replied > to that suggestion. FWIW, I believe in his case most of the time is spent outside of the actual "count the lines" (aka forward-line) code, so counting fewer lines because we start from a more nearby position probably won't help very much. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:12 ` Stefan Monnier @ 2018-03-23 12:40 ` Eli Zaretskii 2018-03-23 12:55 ` Stefan Monnier 0 siblings, 1 reply; 54+ messages in thread From: Eli Zaretskii @ 2018-03-23 12:40 UTC (permalink / raw) To: Stefan Monnier; +Cc: emacs-devel > From: Stefan Monnier <monnier@iro.umontreal.ca> > Date: Fri, 23 Mar 2018 08:12:41 -0400 > > > Also, what about my suggestion to count lines in a relative manner, > > using count-lines from a line with a known number? You never replied > > to that suggestion. > > FWIW, I believe in his case most of the time is spent outside of the > actual "count the lines" (aka forward-line) code, so counting fewer > lines because we start from a more nearby position probably won't help > very much. I'm not sure, because the profile indicates memrchr is a significant runner-up in the CPU time usage. ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-23 12:40 ` Eli Zaretskii @ 2018-03-23 12:55 ` Stefan Monnier 0 siblings, 0 replies; 54+ messages in thread From: Stefan Monnier @ 2018-03-23 12:55 UTC (permalink / raw) To: Eli Zaretskii; +Cc: emacs-devel > I'm not sure, because the profile indicates memrchr is a significant > runner-up in the CPU time usage. Oh, indeed I wasn't clear: it won't solve the "too many markers" case, but it will still be useful, especially in the case where there aren't too many markers. My experience with nlinum--line-number-at-pos is that it's a low-hanging fruit. If the solution can be implemented directly inside line-number-at-pos it'd be even better. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
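A sketch of what such a cache could look like on the Lisp side, loosely modelled on the nlinum approach: the function name, the cache layout, and the use of `buffer-chars-modified-tick' for invalidation are all choices made for this example, not code from nlinum or simple.el.

```
;; Cache the last computed (position . line) pair per buffer and count
;; lines relative to it; drop the cache whenever the buffer text changes.
(defvar-local my--line-cache nil
  "List (TICK BOL LINE) from the last call, or nil.")

(defun my-line-number-at-pos (&optional pos)
  "Like `line-number-at-pos', but count relative to the previous call.
Assumes narrowing does not change between calls."
  (save-excursion
    (goto-char (or pos (point)))
    (forward-line 0)                    ; beginning of POS's line
    (let* ((bol (point))
           (tick (buffer-chars-modified-tick))
           (cache (and my--line-cache
                       (eql (nth 0 my--line-cache) tick)
                       my--line-cache))
           (line (cond
                  ;; No valid cache: count from the start, as the original does.
                  ((null cache) (1+ (count-lines (point-min) bol)))
                  ;; Otherwise count only the lines between the cached
                  ;; beginning-of-line position and the new one.
                  ((>= bol (nth 1 cache))
                   (+ (nth 2 cache) (count-lines (nth 1 cache) bol)))
                  (t (- (nth 2 cache) (count-lines bol (nth 1 cache)))))))
      (setq my--line-cache (list tick bol line))
      line)))
```

The invalidation here is deliberately crude (any text change discards the cache), but even this form avoids rescanning from point-min for the clustered, repeated calls described earlier in the thread.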
* Re: State of the overlay tree branch? 2018-03-18 23:03 ` Sebastian Sturm 2018-03-18 23:20 ` Sebastian Sturm @ 2018-03-19 6:36 ` Eli Zaretskii 1 sibling, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-19 6:36 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel > From: Sebastian Sturm <s.sturm@arkona-technologies.de> > Date: Mon, 19 Mar 2018 00:03:11 +0100 > > Again, however, line-number-at-pos shows up as a large CPU time consumer > in the profiler report, and benchmark-run still reports several ms per > invocation (though this time it's usually around 2 to 4 ms instead of > the 20 to 25 I measured earlier), so I'd still be very much interested > in a faster line-number-at-pos implementation. 2 to 4 ms for 6 calls is as fast as you can get for a 70K file. But you should be able to issue just one call, and replace the other 5 with relative counting using count-lines or forward-line, which should then count only a small number of lines from the original location. Right? ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-18 21:04 ` Sebastian Sturm 2018-03-18 23:03 ` Sebastian Sturm @ 2018-03-19 6:28 ` Eli Zaretskii 1 sibling, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-19 6:28 UTC (permalink / raw) To: Sebastian Sturm; +Cc: emacs-devel

> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
> Date: Sun, 18 Mar 2018 22:04:13 +0100
>
> I also found it surprising that overlays would slow down line counting,
> but since I don't know anything about the architecture of the Emacs
> display engine, or its overlay implementation, I figured that overlays
> must be to blame because
>
> (i) the issue went away after switching to the feature/noverlay branch
>
> (ii) configuring the semantic highlighter to use its font-lock backend
> also resolved the performance issue (though with the font-lock backend,
> highlights are easily messed up by editing operations which makes the
> overlay variant far more appealing)

If you look at the implementation of count-lines, you will see it just calls forward-line, which pays no attention to overlays (as I'd expect).

> I also found that some other heavy users of overlays such as
> avy-goto-word-0-{above,below} feel faster with the feature/noverlay
> branch, so I'd welcome a merge of the overlay branch even if there was a
> technically superior alternative to line-number-at-pos that didn't
> suffer from overlay-related performance issues.

That's unrelated: we want to merge that branch when it's ready for several good reasons. But counting lines shouldn't be one of those reasons.

> That being said, your suggestion sounds intriguing. What would be
> required to expose find_newline to Lisp? Would I simply have to wrap it
> in one of Emacs's DEFINE_<something> macros?

DEFUN, more accurately. But yes. Actually, I see that forward-line already calls find_newline almost directly, so it should be fast enough. Exposing find_newline should perhaps be able to produce some speedup (because you don't need to set point, like forward-line does), but I doubt that it would be significant.

> Is there some documentation on the Emacs C backend?

There's the "Writing Emacs Primitives" section of the "Internals" appendix.

Turning back to your original problem, viz.:

>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.

It looks strange to me that you get such long times. I just tried Emacs 26.0.91 near the end of xdisp.c (a 1MB file with over 33K lines), and I get 3 ms per call there. So I wonder how come you get 4 or 5 ms per call in a file that is 50 times smaller, because if I run the above benchmark with point at 70K, I get 0.16 ms per call. In an unoptimized build of the current master branch, I get 0.25 ms per call under these conditions. (And you could cache the result if you need 6 calls in the same vicinity, and after the initial call only call forward-line to compute relative increments.) This is with a 5-year-old i7-2600 box with a 3.40 GHz clock. Is your CPU significantly slower?

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-18 20:39 ` Eli Zaretskii 2018-03-18 21:04 ` Sebastian Sturm @ 2018-03-21 14:14 ` Sebastien Chapuis 2018-03-21 15:35 ` Eli Zaretskii 1 sibling, 1 reply; 54+ messages in thread From: Sebastien Chapuis @ 2018-03-21 14:14 UTC (permalink / raw) To: Eli Zaretskii; +Cc: Sebastian Sturm, emacs-devel

Eli Zaretskii writes:

>> From: Sebastian Sturm <s.sturm@arkona-technologies.de>
>> Date: Sun, 18 Mar 2018 21:14:53 +0100
>>
>> [1] I'm using cquery for my C++ editing needs, which comes with an
>> overlay-based semantic highlighting mechanism. With my emacs
>> configuration, lsp-mode/lsp-ui emit 6 calls to line-number-at-pos per
>> character insertion, which consume ~20 to 25 ms each when performing
>> edits close to the bottom of a 66KB C++ file (measured using
>> (benchmark-run 1000 (line-number-at-pos (point))) on a release build of
>> emacs-27/git commit #9942734...). Using the noverlay branch, this figure
>> drops to ~160us per call.
>
> If lsp-mode/lsp-ui needs a fast line counter, one can easily be
> provided by exposing find_newline to Lisp. IME, it's lightning-fast,
> and should run circles around count-lines (used by line-number-at-pos).
>
> (I'm not sure I even understand how overlays come into play here,
> btw.)

The language server protocol defines a position in a file with zero-indexed line and column offsets [1]:

```
interface Position {
    line: number;
    character: number;
}
```

lsp-mode heavily uses line-number-at-pos to convert an Emacs buffer point to an LSP position, and vice versa [2]. This can happen thousands of times on each keystroke. If Emacs could provide a function to do the conversion very fast (or at least faster than with line-number-at-pos), it would be great.

[1] https://github.com/Microsoft/language-server-protocol/blob/gh-pages/specification.md#position

[2] The conversion from an LSP position to point doesn't use line-number-at-pos, but forward-line and forward-char. It's still very slow.

-- Sebastien Chapuis

^ permalink raw reply [flat|nested] 54+ messages in thread
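To make the conversion concrete, here is an illustrative Emacs Lisp sketch of the two directions. The function names are invented for this example and are not lsp-mode's, it ignores narrowing, and it glosses over the fact that LSP's `character' is officially counted in UTF-16 code units rather than buffer characters.

```
;; Emacs point -> LSP position: zero-indexed line plus offset within the line.
(defun my-point-to-lsp-position (&optional pos)
  (save-excursion
    (goto-char (or pos (point)))
    (list :line (1- (line-number-at-pos))         ; LSP lines are 0-based
          :character (- (point) (line-beginning-position)))))

;; LSP position -> Emacs point: go to the line, then move within it.
(defun my-lsp-position-to-point (line character)
  (save-excursion
    (goto-char (point-min))
    (forward-line line)                           ; LINE is 0-based
    ;; Clamp to the end of the line in case CHARACTER overshoots.
    (min (+ (point) character) (line-end-position))))
```

As the thread notes, the expensive half is the point-to-position direction, because line-number-at-pos rescans from the beginning of the buffer on every call.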
* Re: State of the overlay tree branch? 2018-03-21 14:14 ` Sebastien Chapuis @ 2018-03-21 15:35 ` Eli Zaretskii 0 siblings, 0 replies; 54+ messages in thread From: Eli Zaretskii @ 2018-03-21 15:35 UTC (permalink / raw) To: Sebastien Chapuis; +Cc: s.sturm, emacs-devel

> From: Sebastien Chapuis <sebastien@chapu.is>
> Cc: Sebastian Sturm <s.sturm@arkona-technologies.de>, emacs-devel@gnu.org
> Date: Wed, 21 Mar 2018 15:14:18 +0100
>
> The language server protocol defines a position in a file with zero-indexed
> line and column offsets [1]:
>
> ```
> interface Position {
>     line: number;
>     character: number;
> }
> ```
>
> lsp-mode heavily uses line-number-at-pos to convert an Emacs buffer
> point to an LSP position, and vice versa [2].

line-number-at-pos is inefficient, in that it always counts from the beginning of the buffer. By keeping already computed line numbers around, you could make a faster implementation if you count relative line offsets using count-lines instead.

> This can happen thousands of times on each keystroke.

_Thousands_ of times for _each_ keystroke? Why is that? Most keystrokes only change a single line, so how come you need thousands of lines recounted each time?

> If Emacs could provide a function to do the conversion very fast (or at
> least faster than with line-number-at-pos), it would be great.

Given the above, I think you need to describe the issue in more detail, before we even begin designing the solution.

^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-18 20:14 Sebastian Sturm 2018-03-18 20:39 ` Eli Zaretskii @ 2018-03-26 13:06 ` Stefan Monnier 2018-03-27 20:59 ` Sebastian Sturm 1 sibling, 1 reply; 54+ messages in thread From: Stefan Monnier @ 2018-03-26 13:06 UTC (permalink / raw) To: emacs-devel > after finding that the feature/noverlay branch does wonders for my editing > experience[1], I'd like to reinvigorate the discussion on its inclusion into > master. While I also hope that branch can be merged soon, I've installed a patch into Emacs's `master` which should hopefully solve your immediate problems. Stefan ^ permalink raw reply [flat|nested] 54+ messages in thread
* Re: State of the overlay tree branch? 2018-03-26 13:06 ` Stefan Monnier @ 2018-03-27 20:59 ` Sebastian Sturm 0 siblings, 0 replies; 54+ messages in thread From: Sebastian Sturm @ 2018-03-27 20:59 UTC (permalink / raw) To: emacs-devel thank you, the master branch now handles my use case a lot better than before On 03/26/2018 03:06 PM, Stefan Monnier wrote: >> after finding that the feature/noverlay branch does wonders for my editing >> experience[1], I'd like to reinvigorate the discussion on its inclusion into >> master. > > While I also hope that branch can be merged soon, I've installed a patch > into Emacs's `master` which should hopefully solve your > immediate problems. > > > Stefan > > ^ permalink raw reply [flat|nested] 54+ messages in thread
end of thread, other threads:[~2018-03-27 20:59 UTC | newest] Thread overview: 54+ messages (download: mbox.gz follow: Atom feed -- links below jump to the message on this page -- [not found] <<c24f8534-5245-026e-da18-f6be7b9702bf@arkona-technologies.de> [not found] ` <<834lldp18f.fsf@gnu.org> 2018-03-18 21:37 ` State of the overlay tree branch? Drew Adams 2018-03-19 1:33 ` Stefan Monnier 2018-03-19 6:50 ` Eli Zaretskii 2018-03-19 12:29 ` Stefan Monnier 2018-03-19 13:02 ` Eli Zaretskii 2018-03-19 13:43 ` Stefan Monnier 2018-03-19 14:28 ` Eli Zaretskii 2018-03-19 14:39 ` Stefan Monnier 2018-03-19 6:33 ` Eli Zaretskii 2018-03-18 20:14 Sebastian Sturm 2018-03-18 20:39 ` Eli Zaretskii 2018-03-18 21:04 ` Sebastian Sturm 2018-03-18 23:03 ` Sebastian Sturm 2018-03-18 23:20 ` Sebastian Sturm 2018-03-19 6:43 ` Eli Zaretskii 2018-03-19 9:53 ` Sebastian Sturm 2018-03-19 12:57 ` Eli Zaretskii 2018-03-19 14:56 ` Stefan Monnier 2018-03-19 15:07 ` Sebastian Sturm 2018-03-19 15:13 ` Stefan Monnier 2018-03-20 1:23 ` Sebastian Sturm 2018-03-20 6:30 ` Eli Zaretskii 2018-03-21 0:36 ` Sebastian Sturm 2018-03-21 6:47 ` Eli Zaretskii 2018-03-22 13:16 ` Stefan Monnier 2018-03-22 19:54 ` Sebastian Sturm 2018-03-22 20:04 ` Sebastian Sturm 2018-03-22 20:52 ` Stefan Monnier 2018-03-22 23:11 ` Sebastian Sturm 2018-03-23 5:03 ` Stefan Monnier 2018-03-23 12:25 ` Sebastian Sturm 2018-03-23 12:47 ` Eli Zaretskii 2018-03-23 13:19 ` Stefan Monnier 2018-03-23 13:37 ` Noam Postavsky 2018-03-23 13:55 ` Stefan Monnier 2018-03-23 14:22 ` Eli Zaretskii 2018-03-23 14:39 ` Stefan Monnier 2018-03-23 19:39 ` Stefan Monnier 2018-03-25 15:11 ` Stefan Monnier 2018-03-25 16:39 ` Eli Zaretskii 2018-03-25 17:35 ` Stefan Monnier 2018-03-23 8:07 ` Eli Zaretskii 2018-03-23 9:08 ` Eli Zaretskii 2018-03-23 10:15 ` Sebastian Sturm 2018-03-23 12:39 ` Eli Zaretskii 2018-03-23 12:12 ` Stefan Monnier 2018-03-23 12:40 ` Eli Zaretskii 2018-03-23 12:55 ` Stefan Monnier 2018-03-19 6:36 ` Eli Zaretskii 2018-03-19 6:28 ` Eli Zaretskii 2018-03-21 14:14 ` Sebastien Chapuis 2018-03-21 15:35 ` Eli Zaretskii 2018-03-26 13:06 ` Stefan Monnier 2018-03-27 20:59 ` Sebastian Sturm