From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Ihor Radchenko Newsgroups: gmane.emacs.bugs Subject: bug#71644: 30.0.50; Severe slowdown in larger files with markers beginning in emacs 29+ Date: Sat, 22 Jun 2024 14:10:23 +0000 Message-ID: <87wmmhgjao.fsf@localhost> References: <86ed8tozub.fsf@gnu.org> <86jzijmo5a.fsf@gnu.org> <87h6dmbyy2.fsf@localhost> Mime-Version: 1.0 Content-Type: text/plain Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="2662"; mail-complaints-to="usenet@ciao.gmane.io" Cc: Mitchell , Eli Zaretskii , 71644@debbugs.gnu.org To: Stefan Monnier Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sat Jun 22 16:09:20 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sL1Qd-0000Sw-AE for geb-bug-gnu-emacs@m.gmane-mx.org; Sat, 22 Jun 2024 16:09:19 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sL1QP-0005bV-32; Sat, 22 Jun 2024 10:09:05 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sL1QM-0005b2-VN for bug-gnu-emacs@gnu.org; Sat, 22 Jun 2024 10:09:02 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sL1QM-0004cJ-N3 for bug-gnu-emacs@gnu.org; Sat, 22 Jun 2024 10:09:02 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1sL1QM-0006fY-6W for bug-gnu-emacs@gnu.org; Sat, 22 Jun 2024 10:09:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Ihor Radchenko Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 22 Jun 2024 14:09:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 71644 X-GNU-PR-Package: emacs Original-Received: via spool by 71644-submit@debbugs.gnu.org id=B71644.171906532925614 (code B ref 71644); Sat, 22 Jun 2024 14:09:02 +0000 Original-Received: (at 71644) by debbugs.gnu.org; 22 Jun 2024 14:08:49 +0000 Original-Received: from localhost ([127.0.0.1]:47232 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sL1Q8-0006f3-VF for submit@debbugs.gnu.org; Sat, 22 Jun 2024 10:08:49 -0400 Original-Received: from mout02.posteo.de ([185.67.36.66]:49711) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sL1Q6-0006el-Jd for 71644@debbugs.gnu.org; Sat, 22 Jun 2024 10:08:47 -0400 Original-Received: from submission (posteo.de [185.67.36.169]) by mout02.posteo.de (Postfix) with ESMTPS id C00AF240103 for <71644@debbugs.gnu.org>; Sat, 22 Jun 2024 16:08:40 +0200 (CEST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1719065320; bh=NEYDy0q8Al0jhoD5qSbKSfnt4hRb/+sUDIOO8tlMKhU=; h=From:To:Cc:Subject:Date:Message-ID:MIME-Version:Content-Type: From; b=ikyN5m5WUUg2qMmUNSXy3Gb3ElY2daBB2DEsktxM+ZeP41f86JzxMI9vl466GTXjX /VVREveM6bcR8WIfhH1tImbzy3htNt2YNrrqF2T7d5WNrjcyoYEjoel5mXXQxWb9pu 2zLXALYGLa0d0jIyL+jZ+ljokXyJoLf12IzD72we1LMm84Y7l8sjL9m43nmfSgQD5y FTiNeDQ9fHvCCBka6xWUwOz7W74ituXbd+xdo51oT8FjNA0qC3B2PCHvCmR786k3+C MdCW6et53ra7vt7f0b2ITsx27U2TDLDlpIfsQLOSxWL7zc+TnRLtwSeXTJy+W+Qik8 1eBgZ2Ih5ggCA== Original-Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4W5x0S07LJz6tyX; Sat, 22 Jun 2024 16:08:39 +0200 (CEST) In-Reply-To: X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:287707 Archived-At: Stefan Monnier writes: >> I got 5x `re-search-forward' speed improvement in my setup with this >> dumb change. > > Hmm... of course, there'd likely be other circumstances where it would > make it 5x slower. E.g. in a large buffer, doing a forward search from > the middle of the buffer when the first 50 markers happen to all be near > the beginning of the buffer will mean that we always use the "slow path" > which scans all the bytes between PT and the position of interest to > count the number of chars therein. Yup. I _do not_ propose my patch to be merged. (BTW, we should probably merge this bug and bug#63040 where I first shared this patch - just to demonstrate the problem and discuss possible solutions) > BTW, we already stop after at most N/50 markers where N is the smallest > distance between the position of interest and point/bob/eob/gap (oh and > the position of the last conversion). So it seems that in your test, > this distance N is >2500. Also when that distance is >5000 we create > a new marker, so next time around that position should be at the > beginning of the marker-list. So I wonder what happens in your test: > why do we jump "very far" (more than 2500) between each call to the > conversion function? Also, maybe we should increase > BYTECHAR_DISTANCE_INCREMENT? ] It is indeed another option. Also, from bug#63040 Another idea could be moving the cache markers into a separate array, so that we can examine them without mixing with all other buffer markers. > Using markers as a cheap cache of conversions was a cute hack but we > really need to replace it. > > Some options that come to mind: > > - Keep the tradition of abusing an existing data structure, and stash > the bytepos info inside the overlay tree or the text properties. > This way the conversion is bounded by O(log BUFFERSIZE). For overlay tree, it might be even better to stash all the markers in Emacs into itree structure. For now, every operation involving adjusting/searching markers scales linearly - not ideal. > - Use a dedicated data-structure. E.g. a pair of array-with-a-gap > (one indexed by BYTEPOS/STEP the other indexed by CHARPOS/STEP, where > STEP would be a large enough constant to make those arrays cheap yet > small enough that the remaining scan is cheap). > This way the conversion is O(STEP), i.e. "constant-time". I think that it will be less efficient compared to using a tree-like structure (if we can manage to use it). Will it be easier? > BTW, among my various local hacks, I've been using the hack below, which > aims to randomize the order in our markers-list, so as to minimize the > risk that we have to wade through lots of markers all clumped around the > same position. I don't think it does a good job of it, but maybe we can > improve the execution of this idea, tho it still doesn't help if there's > no GC involved. I am not sure if I believe that this approach can yield practical gains. AFAIU, the problem with the slowdown we are discussing here is markers that are all around the same position. It's rather too many markers in general, spaced not far from each other. > BTW, if/when we use some other data-structure to convert bytes<->chars, > then we could presumably get rid of our markers-list and stash markers > inside our overlay tree (basically represent them as degenerate overlays > with beg==end and no properties). I am wondering why it is impossible to stash markers inside overlay tree without doing anything special about bytes<->chars conversion (other than changing the linear loop with itree query). -- Ihor Radchenko // yantar92, Org mode contributor, Learn more about Org mode at . Support Org development at , or support my work at