From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.org!.POSTED!not-for-mail
From: Eli Zaretskii <eliz@gnu.org>
Newsgroups: gmane.emacs.devel
Subject: Re: State of the overlay tree branch?
Date: Sun, 25 Mar 2018 19:39:30 +0300
Message-ID: <834ll4gldp.fsf@gnu.org>
References: <c24f8534-5245-026e-da18-f6be7b9702bf@arkona-technologies.de>
	<834lldp18f.fsf@gnu.org>
	<cee01726-a91d-69c2-a52b-7e7e86164b0b@arkona-technologies.de>
	<9646341d-700b-4240-216b-8c0e753fa79f@arkona-technologies.de>
	<86d03e78-9984-f33e-a3f3-3faa4b34d78b@arkona-technologies.de>
	<83vadso9ad.fsf@gnu.org>
	<f3216876-e29c-a163-bd7e-102574a28ff7@arkona-technologies.de>
	<jwvo9jknmqd.fsf-monnier+gmane.emacs.devel@gnu.org>
	<5155d5e2-6b5c-581e-89fe-4f3af717304f@arkona-technologies.de>
	<4c82fcbd-961a-c6ca-b1f0-6b85665cb339@arkona-technologies.de>
	<jwv7eq3lvuo.fsf-monnier+gmane.emacs.devel@gnu.org>
	<d8dc3c68-2638-409f-78f7-18e0f7369b96@arkona-technologies.de>
	<jwvbmffjwyq.fsf-monnier+gmane.emacs.devel@gnu.org>
	<1ea4248a-11ce-00a9-0644-df7b2e5a3a58@arkona-technologies.de>
	<838tajhsau.fsf@gnu.org>
	<jwvin9nrlru.fsf-monnier+gmane.emacs.devel@gnu.org>
	<837eq2j2i5.fsf@gnu.org> <jwvzi2yio5f.fsf-monnier+emacs@gnu.org>
	<jwvsh8op6h6.fsf-monnier+emacs@gnu.org>
Reply-To: Eli Zaretskii <eliz@gnu.org>
NNTP-Posting-Host: blaine.gmane.org
X-Trace: blaine.gmane.org 1521995887 9196 195.159.176.226 (25 Mar 2018 16:38:07 GMT)
X-Complaints-To: usenet@blaine.gmane.org
NNTP-Posting-Date: Sun, 25 Mar 2018 16:38:07 +0000 (UTC)
Cc: emacs-devel@gnu.org
To: Stefan Monnier <monnier@IRO.UMontreal.CA>
Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Mar 25 18:38:03 2018
Return-path: <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Envelope-to: ged-emacs-devel@m.gmane.org
Original-Received: from lists.gnu.org ([208.118.235.17])
	by blaine.gmane.org with esmtp (Exim 4.84_2)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1f08eh-0002Gw-4e
	for ged-emacs-devel@m.gmane.org; Sun, 25 Mar 2018 18:38:03 +0200
Original-Received: from localhost ([::1]:51882 helo=lists.gnu.org)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>)
	id 1f08gi-0002Du-Ju
	for ged-emacs-devel@m.gmane.org; Sun, 25 Mar 2018 12:40:08 -0400
Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39674)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1f08g3-0002DY-AG
	for emacs-devel@gnu.org; Sun, 25 Mar 2018 12:39:28 -0400
Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <eliz@gnu.org>) id 1f08g0-0003RA-1R
	for emacs-devel@gnu.org; Sun, 25 Mar 2018 12:39:27 -0400
Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41945)
	by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from <eliz@gnu.org>)
	id 1f08fz-0003Qx-UE; Sun, 25 Mar 2018 12:39:23 -0400
Original-Received: from [176.228.60.248] (port=1864 helo=home-c4e4a596f7)
	by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256)
	(Exim 4.82) (envelope-from <eliz@gnu.org>)
	id 1f08fz-0000Mi-9p; Sun, 25 Mar 2018 12:39:23 -0400
In-reply-to: <jwvsh8op6h6.fsf-monnier+emacs@gnu.org> (message from Stefan
	Monnier on Sun, 25 Mar 2018 11:11:14 -0400)
X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic]
X-Received-From: 2001:4830:134:3::e
X-BeenThere: emacs-devel@gnu.org
X-Mailman-Version: 2.1.21
Precedence: list
List-Id: "Emacs development discussions." <emacs-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/emacs-devel/>
List-Post: <mailto:emacs-devel@gnu.org>
List-Help: <mailto:emacs-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/emacs-devel>,
	<mailto:emacs-devel-request@gnu.org?subject=subscribe>
Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org
Original-Sender: "Emacs-devel" <emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org>
Xref: news.gmane.org gmane.emacs.devel:224007
Archived-At: <http://permalink.gmane.org/gmane.emacs.devel/224007>

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Cc: emacs-devel@gnu.org
> Date: Sun, 25 Mar 2018 11:11:14 -0400
> 
> B1- if find_newline uses the newline cache, at each newline encountered it
>     will call buf_charpos_to_bytepos.
> B2- if find_newline does not use the newline cache, at each newline
>     encountered it will call buf_bytepos_to_charpos.

Agree with B1, but not with B2.  Unless I'm overlooking something,
when the newline cache is disabled, we use memchr, which can search a
contiguous sequence of bytes in a loop, without translating
byte-to-character positions.  It only needs this translation at the
beginning of search, after hitting the gap, and when the search is
completed.

> So there are various ways to solve this problem.  So far, I tried to
> make the markers-loop give up earlier, which helps (C) to some extent but
> without attacking the core of its problem which is to pay attention to
> the case where one of the bounds is already very close to POS even tho
> the other is still way off.
> 
> The patch below tweaks my previous patch to take this into account.
> The result is now that my test cases stay fast (mostly unaffected by the
> number of markers) even for large buffers with a large number of markers.

This is a good change, I think.  But it emphasizes even more the fact
that if we instead expose to Lisp display_count_lines, which is
basically a stripped-down version of find_newline with all the
unnecessary ballast removed, we will get an even better performance in
applications that need to count lines a lot.

> Note that the above shows that there are other optimisations which would
> also solve this problem (and would be worthwhile independently).
> A - change line-number-at-pos so it doesn't always rescan all the way
>     from point-min.  This would really circumvent the whole problem
>     (contrarily to what I thought before with my blinders on, thanks Eli
>     for insisting on that).
> B - change find_newline so it doesn't call
>     buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline.
>     E.g. in the no-newline-cache case it'd probably be faster to
>     loop through each char using INC/DEC_BOTH and checking if we're at
>     \n, than the current code which uses mem(r)chr to look for the
>     bytepos of the next \n and then calls buf_bytepos_to_charpos to get
>     the corresponding charpos.  Alternatively, we could just delay the
>     computation of the charpos until the end (currently we update it
>     at each newline, for the purpose of filling the newline-cache).

I think we need not touch find_newline.  It is a very frequently used
workhorse, and needs to produce decent performance for every one of
its callers.  By contrast, applications whose primary need is to count
lines, let alone do that _thousands_ of times per keystroke, should
have a dedicated function optimized for that job alone.  It's not a
coincidence the native line-number display uses display_count_lines ;-)

Thanks.