unofficial mirror of bug-gnu-emacs@gnu.org 
 help / color / mirror / code / Atom feed
* bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH)
@ 2016-09-26 15:12 Pip Cet
  2018-03-23 13:55 ` Stefan Monnier
  0 siblings, 1 reply; 5+ messages in thread
From: Pip Cet @ 2016-09-26 15:12 UTC (permalink / raw)
  To: 24548

[-- Attachment #1: Type: text/plain, Size: 5363 bytes --]

This is an enhancement request for Emacs 25.2.

When many markers are created but not explicitly made to point to no
buffer after use, GC performance suffers severely. (Normal performance
(apparently) suffers a little. That's okay and I don't consider that a
problem.)  An automated test script that was in no way explicitly
designed to do so caused an Emacs session to be caught in garbage
collection for more than an hour.

The problem can be observed with this test code:
----------
(defvar l2 nil)

(defun loop (lim)
  (let ((i 0)
        (l nil))
    (while (< i lim)
      (push (copy-marker (point-max)) l)
      (push (copy-marker (point-min)) l)
      (push (copy-marker (point-max)) l2)
      (push (copy-marker (point-min)) l2)
      (setq i (1+ i)))))
----------

On this system, I'm seeing the following behaviour:

----------
$ time emacs -Q --load ~/git/ankol/emacs/loop.el --eval '(loop 10000)'
--eval '(garbage-collect)' --kill

real    0m2.556s
user    0m2.392s
sys    0m0.068s

$ time emacs -Q --load ~/git/ankol/emacs/loop.el --eval '(loop 20000)'
--eval '(garbage-collect)' --kill

real    0m19.576s
user    0m19.340s
sys    0m0.068s
----------

Upon investigation, it turns out that we keep a buffer's markers in a
singly-linked list, which we walk for every marker as it is
collected. That's O(n^2) (where n is the number of collected markers,
not that of markers that survive this garbage collection cycle); if
Emacs is made to grow to considerable size, this begins to dominate and
the hour-long GC problem appears.

This happened to me while testing a new programming language mode that
has a number of interactive commands that transform code without
changing its semantics; in order to make sure the transformations
weren't doing anything wrong, I ran them automatedly in large numbers,
and that's when my test sessions would appear to freeze while caught in
GC.

I thought it would be very easy to modify the code to avoid the problem;
it was a bit harder than I thought, because the GC mark bit is not
equivalent to "this object survives the current GC cycle". In the
attached patch, I ended up sacrificing one more bit to mean precisely
that, so we can walk a buffer's marker list once and unchain all
collected markers. If there's a better way of telling whether a marker
is due to be collected in the current GC cycle, I'd be happy to modify
the code.  (I tried mem_find and PURE_P, but there appear to be some
additional cases that are missed by that approach.)

As always, the patch is merely a suggestion.  However, it improves
performance (of the test code above and the actual test case) so
drastically that I think it is a good idea to fix this problem.

In GNU Emacs 25.2.50.19 (x86_64-pc-linux-gnu, X toolkit, Xaw scroll bars)
 of 2016-09-26 built on amygdala
Repository revision: 5ee56c4613e9380dbbe4bbaa97b29dd377e2134c
Windowing system distributor 'The X.Org Foundation', version 11.0.11804000
System Description:    Debian GNU/Linux unstable (sid)

Recent messages:
For information about GNU Emacs and the GNU system, type C-h C-a.

Configured features:
XPM JPEG TIFF GIF PNG SOUND DBUS GSETTINGS NOTIFY GNUTLS LIBXML2
FREETYPE XFT ZLIB TOOLKIT_SCROLL_BARS LUCID X11

Important settings:
  value of $LANG: en_US.UTF-8
  locale-coding-system: utf-8-unix

Major mode: Lisp Interaction

Minor modes in effect:
  tooltip-mode: t
  global-eldoc-mode: t
  electric-indent-mode: t
  mouse-wheel-mode: t
  tool-bar-mode: t
  menu-bar-mode: t
  file-name-shadow-mode: t
  global-font-lock-mode: t
  font-lock-mode: t
  blink-cursor-mode: t
  auto-composition-mode: t
  auto-encryption-mode: t
  auto-compression-mode: t
  line-number-mode: t
  transient-mark-mode: t

Load-path shadows:
None found.

Features:
(shadow sort mail-extr emacsbug message subr-x puny seq byte-opt gv
bytecomp byte-compile cl-extra help-mode cconv cl-loaddefs pcase cl-lib
dired dired-loaddefs format-spec rfc822 mml easymenu mml-sec
password-cache epa derived epg epg-config gnus-util rmail rmail-loaddefs
mm-decode mm-bodies mm-encode mail-parse rfc2231 mailabbrev gmm-utils
mailheader sendmail rfc2047 rfc2045 ietf-drums mm-util mail-prsvr
mail-utils time-date mule-util tooltip eldoc electric uniquify
ediff-hook vc-hooks lisp-float-type mwheel term/x-win x-win
term/common-win x-dnd tool-bar dnd fontset image regexp-opt fringe
tabulated-list newcomment elisp-mode lisp-mode prog-mode register page
menu-bar rfn-eshadow timer select scroll-bar mouse jit-lock font-lock
syntax facemenu font-core term/tty-colors frame cl-generic cham georgian
utf-8-lang misc-lang vietnamese tibetan thai tai-viet lao korean
japanese eucjp-ms cp51932 hebrew greek romanian slovak czech european
ethiopic indian cyrillic chinese charscript case-table epa-hook
jka-cmpr-hook help simple abbrev obarray minibuffer cl-preloaded nadvice
loaddefs button faces cus-face macroexp files text-properties overlay
sha1 md5 base64 format env code-pages mule custom widget
hashtable-print-readable backquote dbusbind inotify dynamic-setting
system-font-setting font-render-setting x-toolkit x multi-tty
make-network-process emacs)

Memory information:
((conses 16 96543 9103)
 (symbols 48 20425 0)
 (miscs 40 47 168)
 (strings 32 18288 5117)
 (string-bytes 1 571191)
 (vectors 16 13407)
 (vector-slots 8 447852 3741)
 (floats 8 185 69)
 (intervals 56 221 0)
 (buffers 976 11))

[-- Attachment #2: emacs-bug-001-002.diff --]
[-- Type: text/plain, Size: 4337 bytes --]

diff --git a/src/alloc.c b/src/alloc.c
index 41b2f9e..1e9189a 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -3675,6 +3675,7 @@ allocate_misc (enum Lisp_Misc_Type type)
   misc_objects_consed++;
   XMISCANY (val)->type = type;
   XMISCANY (val)->gcmarkbit = 0;
+  XMISCANY (val)->gcsweepbit = 0;
   return val;
 }
 
@@ -6893,6 +6894,24 @@ sweep_misc (void)
   for (mblk = marker_block; mblk; mblk = *mprev)
     {
       register int i;
+
+      for (i = 0; i < lim; i++)
+        {
+          if (mblk->markers[i].m.u_any.type == Lisp_Misc_Marker)
+            {
+              if (!mblk->markers[i].m.u_any.gcmarkbit)
+                {
+                  mblk->markers[i].m.u_marker.gcsweepbit = 1;
+                }
+            }
+        }
+      lim = MARKER_BLOCK_SIZE;
+      mprev = &mblk->next;
+    }
+  lim = marker_block_index;
+  for (mblk = marker_block; mblk; mblk = *mprev)
+    {
+      register int i;
       int this_free = 0;
 
       for (i = 0; i < lim; i++)
@@ -6900,7 +6919,7 @@ sweep_misc (void)
           if (!mblk->markers[i].m.u_any.gcmarkbit)
             {
               if (mblk->markers[i].m.u_any.type == Lisp_Misc_Marker)
-                unchain_marker (&mblk->markers[i].m.u_marker);
+                unchain_collected_markers(mblk->markers[i].m.u_marker.buffer);
               else if (mblk->markers[i].m.u_any.type == Lisp_Misc_Finalizer)
                 unchain_finalizer (&mblk->markers[i].m.u_finalizer);
 #ifdef HAVE_MODULES
diff --git a/src/lisp.h b/src/lisp.h
index f653d85..1af4c34 100644
--- a/src/lisp.h
+++ b/src/lisp.h
@@ -2040,14 +2040,16 @@ struct Lisp_Misc_Any		/* Supertype of all Misc types.  */
 {
   ENUM_BF (Lisp_Misc_Type) type : 16;		/* = Lisp_Misc_??? */
   bool_bf gcmarkbit : 1;
-  unsigned spacer : 15;
+  bool_bf gcsweepbit : 1;
+  unsigned spacer : 14;
 };
 
 struct Lisp_Marker
 {
   ENUM_BF (Lisp_Misc_Type) type : 16;		/* = Lisp_Misc_Marker */
   bool_bf gcmarkbit : 1;
-  unsigned spacer : 13;
+  bool_bf gcsweepbit : 1;
+  unsigned spacer : 12;
   /* This flag is temporarily used in the functions
      decode/encode_coding_object to record that the marker position
      must be adjusted after the conversion.  */
@@ -3976,6 +3978,7 @@ extern void clear_charpos_cache (struct buffer *);
 extern ptrdiff_t buf_charpos_to_bytepos (struct buffer *, ptrdiff_t);
 extern ptrdiff_t buf_bytepos_to_charpos (struct buffer *, ptrdiff_t);
 extern void unchain_marker (struct Lisp_Marker *marker);
+extern void unchain_collected_markers (struct buffer *buffer);
 extern Lisp_Object set_marker_restricted (Lisp_Object, Lisp_Object, Lisp_Object);
 extern Lisp_Object set_marker_both (Lisp_Object, Lisp_Object, ptrdiff_t, ptrdiff_t);
 extern Lisp_Object set_marker_restricted_both (Lisp_Object, Lisp_Object,
diff --git a/src/marker.c b/src/marker.c
index 05e5bb8..9cd7d26 100644
--- a/src/marker.c
+++ b/src/marker.c
@@ -629,6 +629,46 @@ unchain_marker (register struct Lisp_Marker *marker)
     }
 }
 
+/* Remove all markers that are due to be swept from the chain of
+   BUFFER, leaving them pointing to nowhere.  This is called during
+   garbage collection, so we must be careful to ignore and preserve
+   mark bits, including those in chain fields of markers.  */
+
+void
+unchain_collected_markers (register struct buffer *buffer)
+{
+  register struct Lisp_Marker *tail, **prev;
+
+  if (!buffer)
+    return;
+
+  /* No dead buffers here.  */
+  eassert (BUFFER_LIVE_P (buffer));
+
+  prev = &BUF_MARKERS (buffer);
+
+  for (tail = BUF_MARKERS (buffer); tail; prev = &tail->next, tail = *prev)
+    if (tail->gcsweepbit)
+      {
+        register struct Lisp_Marker *next;
+        for (next = tail; next && next->gcsweepbit; next = next->next)
+          {
+            next->gcsweepbit = 0;
+            next->buffer = NULL;
+          }
+        if (*prev == BUF_MARKERS (buffer))
+          {
+            /* Deleting first marker from the buffer's chain.  Crash
+               if new first marker in chain does not say it belongs
+               to the same buffer, or at least that they have the same
+               base buffer.  */
+            if (next && buffer->text != next->buffer->text)
+              emacs_abort ();
+          }
+        *prev = next;
+      }
+}
+
 /* Return the char position of marker MARKER, as a C integer.  */
 
 ptrdiff_t

^ permalink raw reply related	[flat|nested] 5+ messages in thread

* bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH)
  2016-09-26 15:12 bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH) Pip Cet
@ 2018-03-23 13:55 ` Stefan Monnier
  2018-03-23 14:22   ` Stefan Monnier
  0 siblings, 1 reply; 5+ messages in thread
From: Stefan Monnier @ 2018-03-23 13:55 UTC (permalink / raw)
  To: Pip Cet; +Cc: 24548

> I thought it would be very easy to modify the code to avoid the problem;
> it was a bit harder than I thought, because the GC mark bit is not
> equivalent to "this object survives the current GC cycle".

Could you give a bit more details about what you mean by that?

During the mark phase, indeed the markbit only says "if true then this
object won't be GC'd but if false than maybe it's only because we
haven't finished marking".  Is that what you're referring to?

But if we call unchain_collected_markers from within the sweep phase
(e.g. on every buffer we find), `gcmarkbit` should be
sufficient/reliable.  Or am I missing something?


        Stefan





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH)
  2018-03-23 13:55 ` Stefan Monnier
@ 2018-03-23 14:22   ` Stefan Monnier
  2018-03-23 15:10     ` Pip Cet
  2018-03-23 15:11     ` Stefan Monnier
  0 siblings, 2 replies; 5+ messages in thread
From: Stefan Monnier @ 2018-03-23 14:22 UTC (permalink / raw)
  To: Pip Cet; +Cc: 24548

> But if we call unchain_collected_markers from within the sweep phase
> (e.g. on every buffer we find), `gcmarkbit` should be
> sufficient/reliable.  Or am I missing something?

At least the patch below seems to work as well.


        Stefan


diff --git a/src/alloc.c b/src/alloc.c
index da01173fba..369592d70e 100644
--- a/src/alloc.c
+++ b/src/alloc.c
@@ -7095,7 +7091,9 @@ sweep_misc (void)
           if (!mblk->markers[i].m.u_any.gcmarkbit)
             {
               if (mblk->markers[i].m.u_any.type == Lisp_Misc_Marker)
-                unchain_marker (&mblk->markers[i].m.u_marker);
+                /* Make sure markers have been unchained from their buffer
+                   in sweep_buffer before we collect them.  */
+                eassert (!mblk->markers[i].m.u_marker.buffer);
               else if (mblk->markers[i].m.u_any.type == Lisp_Misc_Finalizer)
                 unchain_finalizer (&mblk->markers[i].m.u_finalizer);
 #ifdef HAVE_MODULES
@@ -7142,6 +7140,23 @@ sweep_misc (void)
   total_free_markers = num_free;
 }
 
+/* Remove BUFFER's markers that are due to be swept.  This is needed since
+   we treat BUF_MARKERS and markers's `next' field as weak pointers.  */
+static void
+unchain_dead_markers (struct buffer *buffer)
+{
+  struct Lisp_Marker *this, **prev = &BUF_MARKERS (buffer);
+
+  while ((this = *prev))
+    if (this->gcmarkbit)
+      prev = &this->next;
+    else
+      {
+        this->buffer = NULL;
+        *prev = this->next;
+      }
+}
+
 NO_INLINE /* For better stack traces */
 static void
 sweep_buffers (void)
@@ -7160,6 +7175,7 @@ sweep_buffers (void)
         VECTOR_UNMARK (buffer);
         /* Do not use buffer_(set|get)_intervals here.  */
         buffer->text->intervals = balance_intervals (buffer->text->intervals);
+        unchain_dead_markers (buffer);
         total_buffers++;
         bprev = &buffer->next;
       }
@@ -7179,8 +7195,8 @@ gc_sweep (void)
   sweep_floats ();
   sweep_intervals ();
   sweep_symbols ();
-  sweep_misc ();
   sweep_buffers ();
+  sweep_misc ();
   sweep_vectors ();
   check_string_bytes (!noninteractive);
 }





^ permalink raw reply related	[flat|nested] 5+ messages in thread

* bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH)
  2018-03-23 14:22   ` Stefan Monnier
@ 2018-03-23 15:10     ` Pip Cet
  2018-03-23 15:11     ` Stefan Monnier
  1 sibling, 0 replies; 5+ messages in thread
From: Pip Cet @ 2018-03-23 15:10 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: 24548

I'm afraid I can't make sense of my earlier comment, so please ignore
it. There's nothing I can think of today that would prevent your patch
from working, so if it does work that would fix the bug.





^ permalink raw reply	[flat|nested] 5+ messages in thread

* bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH)
  2018-03-23 14:22   ` Stefan Monnier
  2018-03-23 15:10     ` Pip Cet
@ 2018-03-23 15:11     ` Stefan Monnier
  1 sibling, 0 replies; 5+ messages in thread
From: Stefan Monnier @ 2018-03-23 15:11 UTC (permalink / raw)
  To: 24548-done; +Cc: Pip Cet

Version: 27.1

> At least the patch below seems to work as well.

Yes, I'm sufficiently confident that I installed it into master.


        Stefan





^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2018-03-23 15:11 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2016-09-26 15:12 bug#24548: 25.2.50; Long GC delays with many non-detached markers (PATCH) Pip Cet
2018-03-23 13:55 ` Stefan Monnier
2018-03-23 14:22   ` Stefan Monnier
2018-03-23 15:10     ` Pip Cet
2018-03-23 15:11     ` Stefan Monnier

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).