all messages for Emacs-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Stefan Monnier <monnier@IRO.UMontreal.CA>
To: Eli Zaretskii <eliz@gnu.org>
Cc: emacs-devel@gnu.org
Subject: Re: State of the overlay tree branch?
Date: Sun, 25 Mar 2018 11:11:14 -0400	[thread overview]
Message-ID: <jwvsh8op6h6.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <jwvzi2yio5f.fsf-monnier+emacs@gnu.org> (Stefan Monnier's message of "Fri, 23 Mar 2018 15:39:35 -0400")

> OK, I think I'm tired of these experiments.

Taking a break finally let me get rid of my blinders so I could see
more clearly what's going on:

All this time is really spent in find_newline (rather than in things
like goto-char).  The problem goes as follows:

A - (line-number-at-pos POS) will call find_newline, making it scan all
    positions between point-min and POS.
B1- if find_newline uses the newline cache, at each newline encountered it
    will call buf_charpos_to_bytepos.
B2- if find_newline does not use the newline cache, at each newline
    encountered it will call buf_bytepos_to_charpos.
  - Each time buf_charpos_to_bytepos/buf_bytepos_to_charpos is called
    it will "CONSIDER" the known positions at point-min, point-max, and at
    the position of the last-call (i.e. at the previous newline).
    Then it will loop through all the markers until the distance between
    the nearest position before POS and the nearest position after POS
    are within 50 of each other.
C - since one of the known positions is the one of the last newline, the
    distance between the nearest position and POS (before considering any
    marker) is usually pretty small: the length of the current line.
    It's even often smaller than 50.  But that doesn't let us stop because
    the marker loop doesn't pay attention to the distance to POS in order
    to decide to stop: it only consider the distance between the current upper
    and lower bound and since we're scanning in one direction, all the
    recently seen positions (added as markers) are on the same side of
    POS as the last seen newline so they don't help.

So there are various ways to solve this problem.  So far, I tried to
make the markers-loop give up earlier, which helps (C) to some extent but
without attacking the core of its problem which is to pay attention to
the case where one of the bounds is already very close to POS even tho
the other is still way off.

The patch below tweaks my previous patch to take this into account.
The result is now that my test cases stay fast (mostly unaffected by the
number of markers) even for large buffers with a large number of markers.

Note that the above shows that there are other optimisations which would
also solve this problem (and would be worthwhile independently).
A - change line-number-at-pos so it doesn't always rescan all the way
    from point-min.  This would really circumvent the whole problem
    (contrarily to what I thought before with my blinders on, thanks Eli
    for insisting on that).
B - change find_newline so it doesn't call
    buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline.
    E.g. in the no-newline-cache case it'd probably be faster to
    loop through each char using INC/DEC_BOTH and checking if we're at
    \n, than the current code which uses mem(r)chr to look for the
    bytepos of the next \n and then calls buf_bytepos_to_charpos to get
    the corresponding charpos.  Alternatively, we could just delay the
    computation of the charpos until the end (currently we update it
    at each newline, for the purpose of filling the newline-cache).


-- Stefan


diff --git a/src/marker.c b/src/marker.c
index 7773c4fce0..f869b3f948 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x)
   CHECK_TYPE (MARKERP (x), Qmarkerp, x);
 }
 
+/* When converting bytes from/to chars, we look through the list of
+   markers to try and find a good starting point (since markers keep
+   track of both bytepos and charpos at the same time).
+   But if there are many markers, it can take too much time to find a "good"
+   marker from which to start.  Worse yet: if it takes a long time and we end
+   up finding a nearby markers, we won't add a new marker to cache this
+   result, so next time around we'll have to go through this same long list
+   to (re)find this best marker.  So the further down the list of
+   markers we go, the less demanding we are w.r.t what is a good marker.
+
+   The previous code used INITIAL=50 and INCREMENT=0 and this lead to
+   really poor performance when there are many markers.
+   I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad
+   T61 using various artificial test cases seem to suggest that INCREMENT=50
+   might be "the best compromise": it significantly improved the
+   worst case and it was rarely slower and never by much.
+
+   The asymptotic behavior is still poor, tho, so in largish buffers with many
+   overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck.  */
+#define BYTECHAR_DISTANCE_INITIAL 50
+#define BYTECHAR_DISTANCE_INCREMENT 50
+
 /* Return the byte position corresponding to CHARPOS in B.  */
 
 ptrdiff_t
@@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
   struct Lisp_Marker *tail;
   ptrdiff_t best_above, best_above_byte;
   ptrdiff_t best_below, best_below_byte;
+  ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
 
   eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b));
 
@@ -180,8 +203,11 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos)
       /* If we are down to a range of 50 chars,
 	 don't bother checking any other markers;
 	 scan the intervening chars directly now.  */
-      if (best_above - best_below < 50)
+      if (best_above - charpos < distance
+          || charpos - best_below < distance)
 	break;
+      else
+        distance += BYTECHAR_DISTANCE_INCREMENT;
     }
 
   /* We get here if we did not exactly hit one of the known places.
@@ -293,6 +319,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
   struct Lisp_Marker *tail;
   ptrdiff_t best_above, best_above_byte;
   ptrdiff_t best_below, best_below_byte;
+  ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL;
 
   eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b));
 
@@ -323,8 +350,11 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos)
       /* If we are down to a range of 50 chars,
 	 don't bother checking any other markers;
 	 scan the intervening chars directly now.  */
-      if (best_above - best_below < 50)
+      if (best_above - bytepos < distance
+          || bytepos - best_below < distance)
 	break;
+      else
+        distance += BYTECHAR_DISTANCE_INCREMENT;
     }
 
   /* We get here if we did not exactly hit one of the known places.



  reply	other threads:[~2018-03-25 15:11 UTC|newest]

Thread overview: 54+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-03-18 20:14 State of the overlay tree branch? Sebastian Sturm
2018-03-18 20:39 ` Eli Zaretskii
2018-03-18 21:04   ` Sebastian Sturm
2018-03-18 23:03     ` Sebastian Sturm
2018-03-18 23:20       ` Sebastian Sturm
2018-03-19  6:43         ` Eli Zaretskii
2018-03-19  9:53           ` Sebastian Sturm
2018-03-19 12:57             ` Eli Zaretskii
2018-03-19 14:56             ` Stefan Monnier
2018-03-19 15:07               ` Sebastian Sturm
2018-03-19 15:13                 ` Stefan Monnier
2018-03-20  1:23                 ` Sebastian Sturm
2018-03-20  6:30                   ` Eli Zaretskii
2018-03-21  0:36                     ` Sebastian Sturm
2018-03-21  6:47                       ` Eli Zaretskii
2018-03-22 13:16                       ` Stefan Monnier
2018-03-22 19:54                         ` Sebastian Sturm
2018-03-22 20:04                           ` Sebastian Sturm
2018-03-22 20:52                   ` Stefan Monnier
2018-03-22 23:11                     ` Sebastian Sturm
2018-03-23  5:03                       ` Stefan Monnier
2018-03-23 12:25                         ` Sebastian Sturm
2018-03-23 12:47                           ` Eli Zaretskii
2018-03-23 13:19                             ` Stefan Monnier
2018-03-23 13:37                               ` Noam Postavsky
2018-03-23 13:55                                 ` Stefan Monnier
2018-03-23 14:22                               ` Eli Zaretskii
2018-03-23 14:39                                 ` Stefan Monnier
2018-03-23 19:39                                 ` Stefan Monnier
2018-03-25 15:11                                   ` Stefan Monnier [this message]
2018-03-25 16:39                                     ` Eli Zaretskii
2018-03-25 17:35                                       ` Stefan Monnier
2018-03-23  8:07                       ` Eli Zaretskii
2018-03-23  9:08                         ` Eli Zaretskii
2018-03-23 10:15                           ` Sebastian Sturm
2018-03-23 12:39                             ` Eli Zaretskii
2018-03-23 12:12                           ` Stefan Monnier
2018-03-23 12:40                             ` Eli Zaretskii
2018-03-23 12:55                               ` Stefan Monnier
2018-03-19  6:36       ` Eli Zaretskii
2018-03-19  6:28     ` Eli Zaretskii
2018-03-21 14:14   ` Sebastien Chapuis
2018-03-21 15:35     ` Eli Zaretskii
2018-03-26 13:06 ` Stefan Monnier
2018-03-27 20:59   ` Sebastian Sturm
     [not found] <<c24f8534-5245-026e-da18-f6be7b9702bf@arkona-technologies.de>
     [not found] ` <<834lldp18f.fsf@gnu.org>
2018-03-18 21:37   ` Drew Adams
2018-03-19  1:33     ` Stefan Monnier
2018-03-19  6:50       ` Eli Zaretskii
2018-03-19 12:29         ` Stefan Monnier
2018-03-19 13:02           ` Eli Zaretskii
2018-03-19 13:43             ` Stefan Monnier
2018-03-19 14:28               ` Eli Zaretskii
2018-03-19 14:39                 ` Stefan Monnier
2018-03-19  6:33     ` Eli Zaretskii

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwvsh8op6h6.fsf-monnier+emacs@gnu.org \
    --to=monnier@iro.umontreal.ca \
    --cc=eliz@gnu.org \
    --cc=emacs-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/emacs.git
	https://git.savannah.gnu.org/cgit/emacs/org-mode.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.