From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" Newsgroups: gmane.emacs.bugs Subject: bug#71644: 30.0.50; Severe slowdown in larger files with markers beginning in emacs 29+ Date: Fri, 21 Jun 2024 16:53:04 -0400 Message-ID: References: <86ed8tozub.fsf@gnu.org> <86jzijmo5a.fsf@gnu.org> <87h6dmbyy2.fsf@localhost> Reply-To: Stefan Monnier Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="35775"; mail-complaints-to="usenet@ciao.gmane.io" User-Agent: Gnus/5.13 (Gnus v5.13) Cc: Mitchell , Eli Zaretskii , 71644@debbugs.gnu.org To: Ihor Radchenko Original-X-From: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Fri Jun 21 22:54:17 2024 Return-path: Envelope-to: geb-bug-gnu-emacs@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1sKlGy-00092x-Ot for geb-bug-gnu-emacs@m.gmane-mx.org; Fri, 21 Jun 2024 22:54:17 +0200 Original-Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1sKlGo-0005oQ-0z; Fri, 21 Jun 2024 16:54:06 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1sKlGm-0005ns-BQ for bug-gnu-emacs@gnu.org; Fri, 21 Jun 2024 16:54:04 -0400 Original-Received: from debbugs.gnu.org ([2001:470:142:5::43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1sKlGm-0001DR-2t for bug-gnu-emacs@gnu.org; Fri, 21 Jun 2024 16:54:04 -0400 Original-Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1sKlGk-00057l-Do for bug-gnu-emacs@gnu.org; Fri, 21 Jun 2024 16:54:02 -0400 X-Loop: help-debbugs@gnu.org Resent-From: Stefan Monnier Original-Sender: "Debbugs-submit" Resent-CC: bug-gnu-emacs@gnu.org Resent-Date: Fri, 21 Jun 2024 20:54:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 71644 X-GNU-PR-Package: emacs Original-Received: via spool by 71644-submit@debbugs.gnu.org id=B71644.171900319719646 (code B ref 71644); Fri, 21 Jun 2024 20:54:02 +0000 Original-Received: (at 71644) by debbugs.gnu.org; 21 Jun 2024 20:53:17 +0000 Original-Received: from localhost ([127.0.0.1]:43900 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sKlG0-00056o-U0 for submit@debbugs.gnu.org; Fri, 21 Jun 2024 16:53:17 -0400 Original-Received: from mailscanner.iro.umontreal.ca ([132.204.25.50]:48775) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1sKlFx-00056Z-7o for 71644@debbugs.gnu.org; Fri, 21 Jun 2024 16:53:15 -0400 Original-Received: from pmg2.iro.umontreal.ca (localhost.localdomain [127.0.0.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id 6A87880C8E; Fri, 21 Jun 2024 16:53:07 -0400 (EDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=iro.umontreal.ca; s=mail; t=1719003185; bh=7AfMU+VaclZfW0lsebWgyaqBfx88fEvVq2+zFEvVelw=; h=From:To:Cc:Subject:In-Reply-To:References:Date:From; b=SLir+/rYCHh0m8eybxj4gw2rEsnZLCHnlvPSVVQmcnLztMfQ+39YjnwNwOpR+Dkpa O22Vxnwag5FUVWn46YMRkar6f79k57/esvHdH3ayqweM6IMftK6rIwHCZ7i53/mmBK DPaplKDeCRiXaUqatH/S2OlcHWuhMMTw+pexbZBF+sc74zKUwGud7JWiQ/3X451MCD 6h22oGsAu1h4kauQDg0bV/J7EOt5zWHTvBiUgVb6snMRlUr76sbSh4rcI7XUQ2fH1Z l1hs864qZhgfINFG0Wxl43+IjYBtVPJ1mOpIKH2xXmXRaYuXKOa4VR8gBpMSZRO2mn 8lk6dpu/vP9Rw== Original-Received: from mail01.iro.umontreal.ca (unknown [172.31.2.1]) by pmg2.iro.umontreal.ca (Proxmox) with ESMTP id D4D46801D8; Fri, 21 Jun 2024 16:53:05 -0400 (EDT) Original-Received: from asado (lechon.iro.umontreal.ca [132.204.27.242]) by mail01.iro.umontreal.ca (Postfix) with ESMTPSA id 8E2781201A6; Fri, 21 Jun 2024 16:53:05 -0400 (EDT) In-Reply-To: <87h6dmbyy2.fsf@localhost> (Ihor Radchenko's message of "Fri, 21 Jun 2024 06:19:01 +0000") X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: bug-gnu-emacs@gnu.org List-Id: "Bug reports for GNU Emacs, the Swiss army knife of text editors" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Original-Sender: bug-gnu-emacs-bounces+geb-bug-gnu-emacs=m.gmane-mx.org@gnu.org Xref: news.gmane.io gmane.emacs.bugs:287639 Archived-At: >>>From 0bafd288faee8cae33fe4a122f6e3ac73ec10d60 Mon Sep 17 00:00:00 2001 > Message-ID: <0bafd288faee8cae33fe4a122f6e3ac73ec10d60.1718950719.git.yant= ar92@posteo.net> > From: Ihor Radchenko > Date: Sun, 23 Apr 2023 21:31:46 +0200 > Subject: [PATCH] * src/marker.c (buf_bytepos_to_charpos): Limit marker se= arch > > Limit searching across buffer markers to first 50 markers and thus > avoid performance scaling with the number of markers. > > I got 5x `re-search-forward' speed improvement in my setup with this > dumb change. Hmm... of course, there'd likely be other circumstances where it would make it 5x slower. E.g. in a large buffer, doing a forward search from the middle of the buffer when the first 50 markers happen to all be near the beginning of the buffer will mean that we always use the "slow path" which scans all the bytes between PT and the position of interest to count the number of chars therein. BTW, we already stop after at most N/50 markers where N is the smallest distance between the position of interest and point/bob/eob/gap (oh and the position of the last conversion). So it seems that in your test, this distance N is >2500. Also when that distance is >5000 we create a new marker, so next time around that position should be at the beginning of the marker-list. So I wonder what happens in your test: why do we jump "very far" (more than 2500) between each call to the conversion function? Also, maybe we should increase BYTECHAR_DISTANCE_INCREMENT? ] This byte_to_char and char_to_byte conversion business is a real PITA. [ In retrospect I wish we'd stuck to the Emacs-20.1 design where chars=3Dbytes. I know it introduced a lot of breakage (and it'd be even worse now), but it is The Right Thing=E2=84=A2 if you can ignore backward compatibility. ] Using markers as a cheap cache of conversions was a cute hack but we really need to replace it. Some options that come to mind: - Keep the tradition of abusing an existing data structure, and stash the bytepos info inside the overlay tree or the text properties. This way the conversion is bounded by O(log BUFFERSIZE). - Use a dedicated data-structure. E.g. a pair of array-with-a-gap (one indexed by BYTEPOS/STEP the other indexed by CHARPOS/STEP, where STEP would be a large enough constant to make those arrays cheap yet small enough that the remaining scan is cheap). This way the conversion is O(STEP), i.e. "constant-time". BTW, among my various local hacks, I've been using the hack below, which aims to randomize the order in our markers-list, so as to minimize the risk that we have to wade through lots of markers all clumped around the same position. I don't think it does a good job of it, but maybe we can improve the execution of this idea, tho it still doesn't help if there's no GC involved. BTW, if/when we use some other data-structure to convert bytes<->chars, then we could presumably get rid of our markers-list and stash markers inside our overlay tree (basically represent them as degenerate overlays with beg=3D=3Dend and no properties). Stefan diff --git a/src/alloc.c b/src/alloc.c index 9304e4e42bb..07409e7cfc3 100644 --- a/src/alloc.c +++ b/src/alloc.c @@ -7836,10 +7514,22 @@ sweep_symbols (void) unchain_dead_markers (struct buffer *buffer) { struct Lisp_Marker *this, **prev =3D &BUF_MARKERS (buffer); + /* In order to try and avoid worst case behaviors in buf_charpos_to_byte= pos + we try and randomize the order of markers here. */ + unsigned i =3D 4; =20 while ((this =3D *prev)) if (vectorlike_marked_p (&this->header)) - prev =3D &this->next; + { + if (!shuffle_markers || i++ % 8) + prev =3D &this->next; + else + { /* Move this one to front, just to shuffle things a bit. */ + *prev =3D this->next; + this->next =3D BUF_MARKERS (buffer); + BUF_MARKERS (buffer) =3D this; + } + } else { this->buffer =3D NULL;