From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#63040: 30.0.50; Performance of buf_bytepos_to_charpos when a buffer has large number of markers Date: Mon, 24 Apr 2023 06:36:07 +0000 Message-ID: <87bkjdzt0o.fsf@localhost> References: <878reiwfm3.fsf@localhost> <83wn22xbj9.fsf@gnu.org> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="12491"; mail-complaints-to="usenet@ciao.gmane.io" Cc: 63040@debbugs.gnu.org To: Eli Zaretskii Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Mon Apr 24 08:34:49 2023 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1pqpmj-0002zI-9E for geb-bug-gnu-emacs@m.gmane-mx.org; Mon, 24 Apr 2023 08:34:49 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pqpm8-0004LS-8E; Mon, 24 Apr 2023 02:34:12 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pqplz-0004Ko-SR for bug-gnu-emacs@gnu.org; Mon, 24 Apr 2023 02:34:05 -0400 Original-Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pqply-000843-Tv for bug-gnu-emacs@gnu.org; Mon, 24 Apr 2023 02:34:03 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pqply-00018q-95 for bug-gnu-emacs@gnu.org; Mon, 24 Apr 2023 02:34:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Mon, 24 Apr 2023 06:34:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 63040 X-GNU-PR-Package: emacs Original-Received: via spool by 63040-submit@debbugs.gnu.org id=B63040.16823180044340 (code B ref 63040); Mon, 24 Apr 2023 06:34:02 +0000 Original-Received: (at 63040) by debbugs.gnu.org; 24 Apr 2023 06:33:24 +0000 Original-Received: from localhost ([127.0.0.1]:47341 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pqplM-00017w-A1 for submit@debbugs.gnu.org; Mon, 24 Apr 2023 02:33:24 -0400 Original-Received: from mout01.posteo.de ([185.67.36.65]:35165) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pqplK-00017g-Cq for 63040@debbugs.gnu.org; Mon, 24 Apr 2023 02:33:23 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 28C46240139 for <63040@debbugs.gnu.org>; Mon, 24 Apr 2023 08:33:16 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1682317996; bh=+FZ8PE6bjUdMApN9r/BX9qrXrQO9ZmGzwVb1V9lRiVo=; h=From:To:Cc:Subject:Date:From; b=VpVnP32kbeH4PQK/NYoDMA4VAbx3EKTovrHiXMHOLprySXVx3KMKkBeRIpbA+f7ud O3vqOfRtqHcm5S1pNseuVZgsEjNZJOaFWZVTq1vYyi4ITV46FXK9HEoB8BDZ6Utz1X yB0Hx6FP1FdtJL1QEYnmlJ593F0cPmKgjsVB8m7+wr0q47BimD6/o4qAdQ2mc0XRcJ 3BwyG8kb8sXMaBZgDXoQS9VjtAEkaOGhOL3xQJ8rRcahE0hAY8zvZWSmS/Yt+0QZkq WxKZ9FAId5MQcwPwA8WDxxRH0gT2wx6LqbPSWIXpsFLsKhON4FQFXHKv4wjlpnOBFs 5Rmyyk8DgLOow== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4Q4b175Htnz6tyb; Mon, 24 Apr 2023 08:33:15 +0200 (CEST) In-Reply-To: <83wn22xbj9.fsf@gnu.org> X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:260547 Archived-At: Eli Zaretskii writes: >> I am sure that my dumb approach is not the best way to improve the >> performance, but this place in buf_bytepos_to_charpos is clearly >> something that can be optimized. > > Interesting. Would it be possible to show the effect of different > values of the cut-off on the performance, so we could decide which > value to use? I can do such test, but I do not think that playing with cut off is the best approach here. The full code in question is below and there is already existing condition to cut the marker loop early based on the distance from best_above to the requested bytepos. So, another approach could be playing with BYTECHAR_DISTANCE_INCREMENT. Now, it is clearly not efficient enough for my large file. (Note that I expressed similar idea in another discussion and was pointed exactly to this existing condition) Further, the later code creates markers to cache recent results and cutting too early may waste this cache. Another idea could be moving the cache markers into a separate array, so that we can examine them without mixing with all other buffer markers. I am not sure if it is feasible though. int i = 0; for (tail = BUF_MARKERS (b); tail && i < 50; tail = tail->next) { i++; CONSIDER (tail->bytepos, tail->charpos); /* If we are down to a range of 50 chars, don't bother checking any other markers; scan the intervening chars directly now. */ if (best_above - bytepos < distance || bytepos - best_below < distance) break; else distance += BYTECHAR_DISTANCE_INCREMENT; } /* We get here if we did not exactly hit one of the known places. We have one known above and one known below. Scan, counting characters, from whichever one is closer. */ if (bytepos - best_below_byte < best_above_byte - bytepos) { ... /* If this position is quite far from the nearest known position, cache the correspondence by creating a marker here. It will last until the next GC. But don't do it if BUF_MARKERS is nil; that is a signal from Fset_buffer_multibyte. */ if (record && BUF_MARKERS (b)) build_marker (b, best_below, best_below_byte); ... return best_below; } else { ... if (record && BUF_MARKERS (b)) build_marker (b, best_above, best_above_byte); ... return best_above; } } -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at . Support Org development at , or support my work at