From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Stefan Monnier Newsgroups: gmane.emacs.devel Subject: Re: State of the overlay tree branch? Date: Sun, 25 Mar 2018 11:11:14 -0400 Message-ID: References: <834lldp18f.fsf@gnu.org> <9646341d-700b-4240-216b-8c0e753fa79f@arkona-technologies.de> <86d03e78-9984-f33e-a3f3-3faa4b34d78b@arkona-technologies.de> <83vadso9ad.fsf@gnu.org> <5155d5e2-6b5c-581e-89fe-4f3af717304f@arkona-technologies.de> <4c82fcbd-961a-c6ca-b1f0-6b85665cb339@arkona-technologies.de> <1ea4248a-11ce-00a9-0644-df7b2e5a3a58@arkona-technologies.de> <838tajhsau.fsf@gnu.org> <837eq2j2i5.fsf@gnu.org> NNTP-Posting-Host: blaine.gmane.org Mime-Version: 1.0 Content-Type: text/plain X-Trace: blaine.gmane.org 1521990566 18571 195.159.176.226 (25 Mar 2018 15:09:26 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 25 Mar 2018 15:09:26 +0000 (UTC) User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.0.50 (gnu/linux) Cc: emacs-devel@gnu.org To: Eli Zaretskii Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Mar 25 17:09:22 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f07Gq-0004ig-Uh for ged-emacs-devel@m.gmane.org; Sun, 25 Mar 2018 17:09:21 +0200 Original-Received: from localhost ([::1]:51558 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f07Iu-0007rL-0C for ged-emacs-devel@m.gmane.org; Sun, 25 Mar 2018 11:11:28 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:33421) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f07In-0007r4-C1 for emacs-devel@gnu.org; Sun, 25 Mar 2018 11:11:22 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1f07Ik-0006rF-6H for emacs-devel@gnu.org; Sun, 25 Mar 2018 11:11:21 -0400 Original-Received: from chene.dit.umontreal.ca ([132.204.246.20]:53073) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f07Ij-0006oq-W5; Sun, 25 Mar 2018 11:11:18 -0400 Original-Received: from pastel.home (lechon.iro.umontreal.ca [132.204.27.242]) by chene.dit.umontreal.ca (8.14.7/8.14.1) with ESMTP id w2PFBFNv006205; Sun, 25 Mar 2018 11:11:15 -0400 Original-Received: by pastel.home (Postfix, from userid 20848) id C1A23612DF; Sun, 25 Mar 2018 11:11:14 -0400 (EDT) In-Reply-To: (Stefan Monnier's message of "Fri, 23 Mar 2018 15:39:35 -0400") X-NAI-Spam-Flag: NO X-NAI-Spam-Threshold: 5 X-NAI-Spam-Score: 0 X-NAI-Spam-Rules: 2 Rules triggered EDT_SA_DN_PASS=0, RV6249=0 X-NAI-Spam-Version: 2.3.0.9418 : core <6249> : inlines <6517> : streams <1782230> : uri <2614614> X-detected-operating-system: by eggs.gnu.org: Genre and OS details not recognized. X-Received-From: 132.204.246.20 X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:224000 Archived-At: > OK, I think I'm tired of these experiments. Taking a break finally let me get rid of my blinders so I could see more clearly what's going on: All this time is really spent in find_newline (rather than in things like goto-char). The problem goes as follows: A - (line-number-at-pos POS) will call find_newline, making it scan all positions between point-min and POS. B1- if find_newline uses the newline cache, at each newline encountered it will call buf_charpos_to_bytepos. B2- if find_newline does not use the newline cache, at each newline encountered it will call buf_bytepos_to_charpos. - Each time buf_charpos_to_bytepos/buf_bytepos_to_charpos is called it will "CONSIDER" the known positions at point-min, point-max, and at the position of the last-call (i.e. at the previous newline). Then it will loop through all the markers until the distance between the nearest position before POS and the nearest position after POS are within 50 of each other. C - since one of the known positions is the one of the last newline, the distance between the nearest position and POS (before considering any marker) is usually pretty small: the length of the current line. It's even often smaller than 50. But that doesn't let us stop because the marker loop doesn't pay attention to the distance to POS in order to decide to stop: it only consider the distance between the current upper and lower bound and since we're scanning in one direction, all the recently seen positions (added as markers) are on the same side of POS as the last seen newline so they don't help. So there are various ways to solve this problem. So far, I tried to make the markers-loop give up earlier, which helps (C) to some extent but without attacking the core of its problem which is to pay attention to the case where one of the bounds is already very close to POS even tho the other is still way off. The patch below tweaks my previous patch to take this into account. The result is now that my test cases stay fast (mostly unaffected by the number of markers) even for large buffers with a large number of markers. Note that the above shows that there are other optimisations which would also solve this problem (and would be worthwhile independently). A - change line-number-at-pos so it doesn't always rescan all the way from point-min. This would really circumvent the whole problem (contrarily to what I thought before with my blinders on, thanks Eli for insisting on that). B - change find_newline so it doesn't call buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline. E.g. in the no-newline-cache case it'd probably be faster to loop through each char using INC/DEC_BOTH and checking if we're at \n, than the current code which uses mem(r)chr to look for the bytepos of the next \n and then calls buf_bytepos_to_charpos to get the corresponding charpos. Alternatively, we could just delay the computation of the charpos until the end (currently we update it at each newline, for the purpose of filling the newline-cache). -- Stefan diff --git a/src/marker.c b/src/marker.c index 7773c4fce0..f869b3f948 100644 --- a/src/marker.c +++ b/src/marker.c @@ -133,6 +133,28 @@ CHECK_MARKER (Lisp_Object x) CHECK_TYPE (MARKERP (x), Qmarkerp, x); } +/* When converting bytes from/to chars, we look through the list of + markers to try and find a good starting point (since markers keep + track of both bytepos and charpos at the same time). + But if there are many markers, it can take too much time to find a "good" + marker from which to start. Worse yet: if it takes a long time and we end + up finding a nearby markers, we won't add a new marker to cache this + result, so next time around we'll have to go through this same long list + to (re)find this best marker. So the further down the list of + markers we go, the less demanding we are w.r.t what is a good marker. + + The previous code used INITIAL=50 and INCREMENT=0 and this lead to + really poor performance when there are many markers. + I haven't tried to tweak INITIAL, but my experiments on my trusty Thinkpad + T61 using various artificial test cases seem to suggest that INCREMENT=50 + might be "the best compromise": it significantly improved the + worst case and it was rarely slower and never by much. + + The asymptotic behavior is still poor, tho, so in largish buffers with many + overlays (e.g. 300KB and 30K overlays), it can still be a bottlneck. */ +#define BYTECHAR_DISTANCE_INITIAL 50 +#define BYTECHAR_DISTANCE_INCREMENT 50 + /* Return the byte position corresponding to CHARPOS in B. */ ptrdiff_t @@ -141,6 +163,7 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG (b) <= charpos && charpos <= BUF_Z (b)); @@ -180,8 +203,11 @@ buf_charpos_to_bytepos (struct buffer *b, ptrdiff_t charpos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - charpos < distance + || charpos - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. @@ -293,6 +319,7 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) struct Lisp_Marker *tail; ptrdiff_t best_above, best_above_byte; ptrdiff_t best_below, best_below_byte; + ptrdiff_t distance = BYTECHAR_DISTANCE_INITIAL; eassert (BUF_BEG_BYTE (b) <= bytepos && bytepos <= BUF_Z_BYTE (b)); @@ -323,8 +350,11 @@ buf_bytepos_to_charpos (struct buffer *b, ptrdiff_t bytepos) /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ - if (best_above - best_below < 50) + if (best_above - bytepos < distance + || bytepos - best_below < distance) break; + else + distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places.