From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.org!.POSTED!not-for-mail From: Eli Zaretskii Newsgroups: gmane.emacs.devel Subject: Re: State of the overlay tree branch? Date: Sun, 25 Mar 2018 19:39:30 +0300 Message-ID: <834ll4gldp.fsf@gnu.org> References: <834lldp18f.fsf@gnu.org> <9646341d-700b-4240-216b-8c0e753fa79f@arkona-technologies.de> <86d03e78-9984-f33e-a3f3-3faa4b34d78b@arkona-technologies.de> <83vadso9ad.fsf@gnu.org> <5155d5e2-6b5c-581e-89fe-4f3af717304f@arkona-technologies.de> <4c82fcbd-961a-c6ca-b1f0-6b85665cb339@arkona-technologies.de> <1ea4248a-11ce-00a9-0644-df7b2e5a3a58@arkona-technologies.de> <838tajhsau.fsf@gnu.org> <837eq2j2i5.fsf@gnu.org> Reply-To: Eli Zaretskii NNTP-Posting-Host: blaine.gmane.org X-Trace: blaine.gmane.org 1521995887 9196 195.159.176.226 (25 Mar 2018 16:38:07 GMT) X-Complaints-To: usenet@blaine.gmane.org NNTP-Posting-Date: Sun, 25 Mar 2018 16:38:07 +0000 (UTC) Cc: emacs-devel@gnu.org To: Stefan Monnier Original-X-From: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Sun Mar 25 18:38:03 2018 Return-path: Envelope-to: ged-emacs-devel@m.gmane.org Original-Received: from lists.gnu.org ([208.118.235.17]) by blaine.gmane.org with esmtp (Exim 4.84_2) (envelope-from ) id 1f08eh-0002Gw-4e for ged-emacs-devel@m.gmane.org; Sun, 25 Mar 2018 18:38:03 +0200 Original-Received: from localhost ([::1]:51882 helo=lists.gnu.org) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f08gi-0002Du-Ju for ged-emacs-devel@m.gmane.org; Sun, 25 Mar 2018 12:40:08 -0400 Original-Received: from eggs.gnu.org ([2001:4830:134:3::10]:39674) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f08g3-0002DY-AG for emacs-devel@gnu.org; Sun, 25 Mar 2018 12:39:28 -0400 Original-Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1f08g0-0003RA-1R for emacs-devel@gnu.org; Sun, 25 Mar 2018 12:39:27 -0400 Original-Received: from fencepost.gnu.org ([2001:4830:134:3::e]:41945) by eggs.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1f08fz-0003Qx-UE; Sun, 25 Mar 2018 12:39:23 -0400 Original-Received: from [176.228.60.248] (port=1864 helo=home-c4e4a596f7) by fencepost.gnu.org with esmtpsa (TLS1.2:RSA_AES_256_CBC_SHA1:256) (Exim 4.82) (envelope-from ) id 1f08fz-0000Mi-9p; Sun, 25 Mar 2018 12:39:23 -0400 In-reply-to: (message from Stefan Monnier on Sun, 25 Mar 2018 11:11:14 -0400) X-detected-operating-system: by eggs.gnu.org: GNU/Linux 2.2.x-3.x [generic] X-Received-From: 2001:4830:134:3::e X-BeenThere: emacs-devel@gnu.org X-Mailman-Version: 2.1.21 Precedence: list List-Id: "Emacs development discussions." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: emacs-devel-bounces+ged-emacs-devel=m.gmane.org@gnu.org Original-Sender: "Emacs-devel" Xref: news.gmane.org gmane.emacs.devel:224007 Archived-At: > From: Stefan Monnier > Cc: emacs-devel@gnu.org > Date: Sun, 25 Mar 2018 11:11:14 -0400 > > B1- if find_newline uses the newline cache, at each newline encountered it > will call buf_charpos_to_bytepos. > B2- if find_newline does not use the newline cache, at each newline > encountered it will call buf_bytepos_to_charpos. Agree with B1, but not with B2. Unless I'm overlooking something, when the newline cache is disabled, we use memchr, which can search a contiguous sequence of bytes in a loop, without translating byte-to-character positions. It only needs this translation at the beginning of search, after hitting the gap, and when the search is completed. > So there are various ways to solve this problem. So far, I tried to > make the markers-loop give up earlier, which helps (C) to some extent but > without attacking the core of its problem which is to pay attention to > the case where one of the bounds is already very close to POS even tho > the other is still way off. > > The patch below tweaks my previous patch to take this into account. > The result is now that my test cases stay fast (mostly unaffected by the > number of markers) even for large buffers with a large number of markers. This is a good change, I think. But it emphasizes even more the fact that if we instead expose to Lisp display_count_lines, which is basically a stripped-down version of find_newline with all the unnecessary ballast removed, we will get an even better performance in applications that need to count lines a lot. > Note that the above shows that there are other optimisations which would > also solve this problem (and would be worthwhile independently). > A - change line-number-at-pos so it doesn't always rescan all the way > from point-min. This would really circumvent the whole problem > (contrarily to what I thought before with my blinders on, thanks Eli > for insisting on that). > B - change find_newline so it doesn't call > buf_charpos_to_bytepos/buf_bytepos_to_charpos at each newline. > E.g. in the no-newline-cache case it'd probably be faster to > loop through each char using INC/DEC_BOTH and checking if we're at > \n, than the current code which uses mem(r)chr to look for the > bytepos of the next \n and then calls buf_bytepos_to_charpos to get > the corresponding charpos. Alternatively, we could just delay the > computation of the charpos until the end (currently we update it > at each newline, for the purpose of filling the newline-cache). I think we need not touch find_newline. It is a very frequently used workhorse, and needs to produce decent performance for every one of its callers. By contrast, applications whose primary need is to count lines, let alone do that _thousands_ of times per keystroke, should have a dedicated function optimized for that job alone. It's not a coincidence the native line-number display uses display_count_lines ;-) Thanks.