unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
From: Stefan Monnier via "Bug reports for GNU Emacs, the Swiss army knife of text editors" <bug-gnu-emacs@gnu.org>
To: Ihor Radchenko <yantar92@posteo.net>
Cc: Mitchell <mitchellahren@gmail.com>, Eli Zaretskii <eliz@gnu.org>,
	71644@debbugs.gnu.org
Subject: bug#71644: 30.0.50; Severe slowdown in larger files with markers beginning in emacs 29+
Date: Fri, 21 Jun 2024 16:53:04 -0400	[thread overview]
Message-ID: <jwvbk3u12t1.fsf-monnier+emacs@gnu.org> (raw)
In-Reply-To: <87h6dmbyy2.fsf@localhost> (Ihor Radchenko's message of "Fri, 21 Jun 2024 06:19:01 +0000")

>>From 0bafd288faee8cae33fe4a122f6e3ac73ec10d60 Mon Sep 17 00:00:00 2001
> Message-ID: <0bafd288faee8cae33fe4a122f6e3ac73ec10d60.1718950719.git.yantar92@posteo.net>
> From: Ihor Radchenko <yantar92@posteo.net>
> Date: Sun, 23 Apr 2023 21:31:46 +0200
> Subject: [PATCH] * src/marker.c (buf_bytepos_to_charpos): Limit marker search
>
> Limit searching across buffer markers to first 50 markers and thus
> avoid performance scaling with the number of markers.
>
> I got 5x `re-search-forward' speed improvement in my setup with this
> dumb change.

Hmm... of course, there'd likely be other circumstances where it would
make it 5x slower.  E.g. in a large buffer, doing a forward search from
the middle of the buffer when the first 50 markers happen to all be near
the beginning of the buffer will mean that we always use the "slow path"
which scans all the bytes between PT and the position of interest to
count the number of chars therein.

BTW, we already stop after at most N/50 markers where N is the smallest
distance between the position of interest and point/bob/eob/gap (oh and
the position of the last conversion).  So it seems that in your test,
this distance N is >2500.  Also when that distance is >5000 we create
a new marker, so next time around that position should be at the
beginning of the marker-list.  So I wonder what happens in your test:
why do we jump "very far" (more than 2500) between each call to the
conversion function?  Also, maybe we should increase
BYTECHAR_DISTANCE_INCREMENT?  ]

This byte_to_char and char_to_byte conversion business is a real PITA.
[ In retrospect I wish we'd stuck to the Emacs-20.1 design where
  chars=bytes.  I know it introduced a lot of breakage (and it'd be even
  worse now), but it is The Right Thing™ if you can ignore backward
  compatibility.  ]

Using markers as a cheap cache of conversions was a cute hack but we
really need to replace it.

Some options that come to mind:

- Keep the tradition of abusing an existing data structure, and stash
  the bytepos info inside the overlay tree or the text properties.
  This way the conversion is bounded by O(log BUFFERSIZE).

- Use a dedicated data-structure.  E.g. a pair of array-with-a-gap
  (one indexed by BYTEPOS/STEP the other indexed by CHARPOS/STEP, where
  STEP would be a large enough constant to make those arrays cheap yet
  small enough that the remaining scan is cheap).
  This way the conversion is O(STEP), i.e. "constant-time".

BTW, among my various local hacks, I've been using the hack below, which
aims to randomize the order in our markers-list, so as to minimize the
risk that we have to wade through lots of markers all clumped around the
same position.  I don't think it does a good job of it, but maybe we can
improve the execution of this idea, tho it still doesn't help if there's
no GC involved.

BTW, if/when we use some other data-structure to convert bytes<->chars,
then we could presumably get rid of our markers-list and stash markers
inside our overlay tree (basically represent them as degenerate overlays
with beg==end and no properties).


        Stefan


diff --git a/src/alloc.c b/src/alloc.c
index 9304e4e42bb..07409e7cfc3 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -7836,10 +7514,22 @@ sweep_symbols (void)
 unchain_dead_markers (struct buffer *buffer)
 {
   struct Lisp_Marker *this, **prev = &BUF_MARKERS (buffer);
+  /* In order to try and avoid worst case behaviors in buf_charpos_to_bytepos
+     we try and randomize the order of markers here.  */
+  unsigned i = 4;
 
   while ((this = *prev))
     if (vectorlike_marked_p (&this->header))
-      prev = &this->next;
+      {
+        if (!shuffle_markers || i++ % 8)
+          prev = &this->next;
+        else
+          { /* Move this one to front, just to shuffle things a bit.  */
+            *prev = this->next;
+            this->next = BUF_MARKERS (buffer);
+            BUF_MARKERS (buffer) = this;
+          }
+      }
     else
       {
         this->buffer = NULL;






  reply	other threads:[~2024-06-21 20:53 UTC|newest]

Thread overview: 42+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-06-19  5:25 bug#71644: 30.0.50; Severe slowdown in larger files with markers beginning in emacs 29+ Mitchell
2024-06-19 12:56 ` Eli Zaretskii
2024-06-20 13:49   ` Ihor Radchenko
2024-06-20 16:11     ` Eli Zaretskii
2024-06-20 16:24       ` Eli Zaretskii
2024-06-20 18:57   ` Mitchell
2024-06-20 19:04     ` Eli Zaretskii
2024-06-21  2:46       ` Mitchell
2024-06-21  6:19         ` Ihor Radchenko
2024-06-21 20:53           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors [this message]
2024-06-22  5:23             ` Gerd Möllmann
2024-06-22 14:10             ` Ihor Radchenko
2024-06-22 15:52               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-25  3:07               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-25  4:00                 ` Eli Zaretskii
2024-06-25  9:30                 ` Ihor Radchenko
2024-06-25 13:44                   ` Eli Zaretskii
2024-06-25 13:50                     ` Ihor Radchenko
2024-06-25 13:57                       ` Eli Zaretskii
2024-06-25 14:25                         ` Ihor Radchenko
2024-06-26  3:53                         ` Mitchell
2024-06-26 12:41                           ` Eli Zaretskii
2024-06-25 20:54                 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-26 11:41                   ` Eli Zaretskii
2024-06-26 12:35                     ` Ihor Radchenko
2024-06-26 13:30                     ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-26 13:49                       ` Ihor Radchenko
2024-06-26 14:08                         ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-29 21:27                           ` Mitchell
2024-06-25 21:02               ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-21  6:48         ` Eli Zaretskii
2024-06-21 21:06           ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-22  6:57             ` Eli Zaretskii
2024-06-22 18:03               ` Mitchell
2024-06-22 18:17                 ` Eli Zaretskii
2024-06-24  7:09                   ` Mitchell
2024-06-24 12:38                     ` Eli Zaretskii
2024-06-25  3:08                       ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors
2024-06-22 13:53 ` Ihor Radchenko
2024-06-22 14:12   ` Eli Zaretskii
2024-06-22 14:15     ` Eli Zaretskii
2024-06-22 15:09 ` Stefan Monnier via Bug reports for GNU Emacs, the Swiss army knife of text editors

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/emacs/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=jwvbk3u12t1.fsf-monnier+emacs@gnu.org \
    --to=bug-gnu-emacs@gnu.org \
    --cc=71644@debbugs.gnu.org \
    --cc=eliz@gnu.org \
    --cc=mitchellahren@gmail.com \
    --cc=monnier@iro.umontreal.ca \
    --cc=yantar92@posteo.net \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).