From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#71644: 30.0.50; Severe slowdown in larger files with markers beginning in emacs 29+ Date: Sat, 22 Jun 2024 11:52:08 -0400 Message-ID: References: <86ed8tozub.fsf@gnu.org> <86jzijmo5a.fsf@gnu.org> <87h6dmbyy2.fsf@localhost> <87wmmhgjao.fsf@localhost> Reply-To: Stefan Monnier Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="13853"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: Mitchell , Eli Zaretskii , 71644@debbugs.gnu.org To: Ihor Radchenko Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Sat Jun 22 17:53:17 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sL33F-0003MW-40 for geb-bug-gnu-emacs@m.gmane-mx.org; Sat, 22 Jun 2024 17:53:17 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sL330-0000zd-S7; Sat, 22 Jun 2024 11:53:02 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sL32z-0000zS-Vr for bug-gnu-emacs@gnu.org; Sat, 22 Jun 2024 11:53:02 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sL32z-0006Fw-O2 for bug-gnu-emacs@gnu.org; Sat, 22 Jun 2024 11:53:01 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1sL32z-00022L-Hx for bug-gnu-emacs@gnu.org; Sat, 22 Jun 2024 11:53:01 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Stefan Monnier Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Sat, 22 Jun 2024 15:53:01 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 71644 X-GNU-PR-Package: emacs Original-Received: via spool by 71644-submit@debbugs.gnu.org id=B71644.17190715457752 (code B ref 71644); Sat, 22 Jun 2024 15:53:01 +0000 Original-Received: (at 71644) by debbugs.gnu.org; 22 Jun 2024 15:52:25 +0000 Original-Received: from localhost ([127.0.0.1]:50501 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sL32O-00020v-Mb for submit@debbugs.gnu.org; Sat, 22 Jun 2024 11:52:25 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:25732) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sL32N-00020Z-0C for 71644@debbugs.gnu.org; Sat, 22 Jun 2024 11:52:23 -0400 Original-Received: from pmg3.iro.umontreal.ca (localhost [127.0.0.1]) by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id 9B57044212F; Sat, 22 Jun 2024 11:52:15 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1719071529; bh=kbWq6dpFJMeDg6qGg+/mtp3lRgzdIGAsP5qKqA0T8x4=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=WDZTMO4tqDHkA7ONe4BPB5v0jfrNAumjvs+wu58Gxv59jWjMKjbQAvyCpzS/mTun0 lTmLro3jTZmxAaFMVXQ27wG546KTFe1zKbXxKzovLZexnjTNU05J3uPF+8gBowwNnw 3GBiUbMwLqA3eB72Vq5NhF5aq9Zyf69RynLBjIcnSPyTAjhkkoeTscYCia8KUayWjv EmYWOAlAdsbgrkx9T/K5WA5Reaot+m1tWsJJXGui6d2CCyl5/VthGv8TopSDuLVlOO V2/jPYM5lB8Pca2786/lL48TZMqdA0kvVAe5nM3XOGRMPMxi47Ri8u4ghtLmYNsGNZ BmIsvvWFxVpLA== Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg3.iro.umontreal.ca (Proxmox) with ESMTP id 5866D4420EF; Sat, 22 Jun 2024 11:52:09 -0400 (EDT) Original-Received: from pastel (unknown [24.140.236.196]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 23F45120828; Sat, 22 Jun 2024 11:52:09 -0400 (EDT) In-Reply-To: <87wmmhgjao.fsf@localhost> (Ihor Radchenko's message of "Sat, 22 Jun 2024 14:10:23 +0000") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:287716 Archived-At: > (BTW, we should probably merge this bug and bug#63040 where I first > shared this patch - just to demonstrate the problem and discuss possible > solutions) Ah, thanks, I'll take a look at that bug. > Another idea could be moving the cache markers into a separate > array, so that we can examine them without mixing with all other > buffer markers. =F0=9F=99=82 >> Using markers as a cheap cache of conversions was a cute hack but we >> really need to replace it. >> >> Some options that come to mind: >> >> - Keep the tradition of abusing an existing data structure, and stash >> the bytepos info inside the overlay tree or the text properties. >> This way the conversion is bounded by O(log BUFFERSIZE). > > For overlay tree, it might be even better to stash all the markers in > Emacs into itree structure. Yes, I don't really distinguish the itree structure and the overlay tree, but indeed we could have a separate itree for the markers. > For now, every operation involving > adjusting/searching markers scales linearly - not ideal. Adjusting only happens upon insertion, and while it's definitely not ideal, it's surprising how little it bites in practice. And if we move to an array-with-gap it'll bite even much less. As for searching markers, AFAIK the only time we do that is for .... .... the bytes<->chars conversion =F0=9F=99=82 [ Admittedly, we also do a search when we delete a marker, but that'd be easy to optimize away in the common case if we cared about it. ] IIRC, XEmacs doesn't use a linked-list of markers but an array-with-gap of markers instead. Not sure if they keep it sorted, but if they do, such an array-with-gap can even be binary-searched, so it's quite efficient. >> - Use a dedicated data-structure. E.g. a pair of array-with-a-gap >> (one indexed by BYTEPOS/STEP the other indexed by CHARPOS/STEP, where >> STEP would be a large enough constant to make those arrays cheap yet >> small enough that the remaining scan is cheap). >> This way the conversion is O(STEP), i.e. "constant-time". > I think that it will be less efficient compared to using a tree-like > structure (if we can manage to use it). Will it be easier? Given we've survived with a *really* poor data-structure until now, I suspect that we don't need to worry about choosing the most efficient option. =F0=9F=99=82 >> BTW, among my various local hacks, I've been using the hack below, which >> aims to randomize the order in our markers-list, so as to minimize the >> risk that we have to wade through lots of markers all clumped around the >> same position. I don't think it does a good job of it, but maybe we can >> improve the execution of this idea, tho it still doesn't help if there's >> no GC involved. > I am not sure if I believe that this approach can yield practical gains. Neither am I. Another idea I had in the same vicinity (and hence arguably just as unconvincing) was that in buf_*pos_to_*pos, when we exit the for (tail =3D BUF_MARKERS (b); tail; tail =3D tail->next) loop, we could move the markers we just skipped to the end of the list, based on the idea that we've just found them to be useless at the head. Doing it efficiently requires keeping a pointer to the end of the list, tho. > AFAIU, the problem with the slowdown we are discussing here is markers > that are all around the same position. It's rather too many markers in > general, spaced not far from each other. Sounds about right. But if we could keep them nicely randomized (which was my original goal with my quick hack), then the total number of markers wouldn't matter that much because the first N markers in the list would still be (probabilistically) nicely spread over the whole buffer. >> BTW, if/when we use some other data-structure to convert bytes<->chars, >> then we could presumably get rid of our markers-list and stash markers >> inside our overlay tree (basically represent them as degenerate overlays >> with beg=3D=3Dend and no properties). > I am wondering why it is impossible to stash markers inside overlay tree > without doing anything special about bytes<->chars conversion (other > than changing the linear loop with itree query). I think the answer is that it's not not possible. Stefan