unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed
* GC: cons sweeping
@ 2007-07-02 15:01 Dmitry Antipov
  2007-07-02 18:15 ` Stefan Monnier
  0 siblings, 1 reply; 4+ messages in thread
From: Dmitry Antipov @ 2007-07-02 15:01 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 4488 bytes --]

For the most common usage patterns, it was observed that from 50% to near 80%
of allocated cons blocks are full (for the dumped Emacs). Other blocks may
contains a free cons cells, and, when Emacs lifetime grows, the fragmentation for
such blocks usually increases (this is a weakness of all non-copying GCs we can't avoid).
Anyway, the number of full blocks is quite large, and some of non-full blocks may
contains large full subareas. This means that we may try to check the whole word of
the cons block bitmap first - if all appropriate conses are marked, it will be -1,
and, since there is nothing to link into free list from there, all mark bits might be
just cleared at once by setting this word to 0. Otherwise, we should scan over
each bit of this word and link non-marked conses to a free list.

Here is an implementation of a such sweeping scheme. It's a bit more complicated
than described because a) first cons block may be used partially and b) number of
bits in a cons block bitmap is not a multiple of a number of bits per word, so
last word of the bitmap might be scanned as usual. This patch also saves an old
sweeping code under #if and provides a simple timing of both sweeping methods
in order to compare the speed of them.

Finally, there are typical results of simple benchmarks I've performed.
Everything is started from 'src' subdirectory of the source tree. Cons sweeping
time (in us) for the original code is 2nd column, for the modified - 3rd.
Since the number of usecs is absolute and highly depends on your hardware,
only the difference (or ratio) between them makes sense.

1. Run 'emacs *.[ch]', then exit immediately:

    GC0     612     514
    GC1     692     578
    GC2     814     674
    GC3     741     522
    GC4     778     534
    GC5     776     505
    GC6     779     490
    GC7     856     543
    GC8     912     590
    GC9     941     611
    GC10    971     597
    GC11    1000    610
    GC12    1031    620
    GC13    1104    661
    GC14    1099    664
    GC15    1134    684
    GC16    1149    679
    GC17    1199    719
    GC18    1215    717
    GC19    1243    733
    GC20    1277    780
    GC21    1316    774
    GC22    1422    802
    GC23    1381    803
    GC24    1400    823
    GC25    1430    1009
    GC26    1474    834
    GC27    1497    840
    GC28    1512    849
    GC29    1536    866
    GC30    1574    968
    GC31    1595    954
    GC32    1606    893
    GC33    1671    921
    GC34    1693    924
    GC35    1727    937
    GC36    1759    994
    GC37    1856    987
    GC38    1809    998
    GC39    1831    987

2. Run 'emacs -batch -f batch-byte-compile ../lisp/textmodes/org.el':

    GC0     602     504
    GC1     561     383
    GC2     613     438
    GC3     650     451
    GC4     676     412
    GC5     696     411
    GC6     712     457
    GC7     799     509
    GC8     825     526
    GC9     850     515
    GC10    914     556
    GC11    893     500
    GC12    1002    584
    GC13    1052    619
    GC14    1145    711
    GC15    1340    909
    GC16    1520    930
    GC17    1333    894
    GC18    1376    951
    GC19    1402    992
    GC20    1331    920
    GC21    1381    1008
    GC22    1382    970
    GC23    1376    998
    GC24    1441    1037
    GC25    1384    991
    GC26    1442    1065
    GC27    1430    1060
    GC28    1400    1017
    GC29    1400    991
    GC30    1521    1072
    GC31    1427    1014

3. Run 'emacs ChangeLog', then '(replace-string "a" "__A__")' and exit
    without saving the modified buffer:

    GC0     607     510
    GC1     697     593
    GC2     809     670
    GC3     757     537
    GC4     736     599
    GC5     863     679
    GC6     937     661
    GC7     1023    702
    GC8     1144    753
    GC9     1195    802
    GC10    1308    829
    GC11    1359    851
    GC12    1509    900
    GC13    1641    912
    GC14    1648    969
    GC15    1728    1004
    GC16    1929    1047
    GC17    2100    1146
    GC18    2141    1175
    GC19    2133    1249
    GC20    2373    1317
    GC21    2440    1369
    GC22    2633    1464
    GC23    2775    1521
    GC24    2834    1615
    GC25    3088    1703
    GC26    3189    1823
    GC27    3399    1915
    GC28    3563    2143
    GC29    3854    2163
    GC30    4133    2262
    GC31    4225    2373
    GC32    4519    2580
    GC33    4653    2605
    GC34    4922    2768
    GC35    5414    2910

Waiting for feedback,

Dmitry

[-- Attachment #2: new_cons_sweep.patch --]
[-- Type: text/plain, Size: 3493 bytes --]

Index: alloc.c
===================================================================
RCS file: /sources/emacs/emacs/src/alloc.c,v
retrieving revision 1.410
diff -u -r1.410 alloc.c
--- alloc.c	8 Jun 2007 19:59:46 -0000	1.410
+++ alloc.c	2 Jul 2007 14:54:46 -0000
@@ -5950,6 +5950,9 @@
 static void
 gc_sweep ()
 {
+  EMACS_TIME beg, end, total;
+  char *sweep_type;
+
   /* Remove or mark entries in weak hash tables.
      This must be done before any object is unmarked.  */
   sweep_weak_hash_tables ();
@@ -5961,6 +5964,11 @@
 #endif
 
   /* Put all unmarked conses on free list */
+
+  EMACS_GET_TIME (beg);
+
+#if 0 /* Old straightforward sweep */
+
   {
     register struct cons_block *cblk;
     struct cons_block **cprev = &cons_block;
@@ -5968,6 +5976,7 @@
     register int num_free = 0, num_used = 0;
 
     cons_free_list = 0;
+    sweep_type = "old";
 
     for (cblk = cons_block; cblk; cblk = *cprev)
       {
@@ -5991,6 +6000,88 @@
 	lim = CONS_BLOCK_SIZE;
 	/* If this block contains only free conses and we have already
 	   seen more than two blocks worth of free conses then deallocate
+	   this block.	*/
+	if (this_free == CONS_BLOCK_SIZE && num_free > CONS_BLOCK_SIZE)
+	  {
+	    *cprev = cblk->next;
+	    /* Unhook from the free list.  */
+	    cons_free_list = cblk->conses[0].u.chain;
+	    lisp_align_free (cblk);
+	    n_cons_blocks--;
+	  }
+	else
+	  {
+	    num_free += this_free;
+	    cprev = &cblk->next;
+	  }
+      }
+    total_conses = num_used;
+    total_free_conses = num_free;
+  }
+
+#else /* New, more compilcated, but hopefully faster, sweep */
+
+  {
+    register struct cons_block *cblk;
+    struct cons_block **cprev = &cons_block;
+    register int lim = cons_block_index;
+    register int num_free = 0, num_used = 0;
+
+    cons_free_list = 0;
+    sweep_type = "new";
+
+    for (cblk = cons_block; cblk; cblk = *cprev)
+      {
+	register int i = 0;
+	int this_free = 0;
+
+	while (1)
+	  {
+	    if (cblk->gcmarkbits[i] == -1)
+	      {
+		/* Fast path - everything is marked.  */
+		cblk->gcmarkbits[i++] = 0;
+		num_used += BITS_PER_INT;
+	      }
+	    else
+	      {
+		/* Slow path - scan over each bit, from the beginning
+		   of current word to 'min (word boundary, LIM)'.  */
+		int start, stop;
+
+		start = i * BITS_PER_INT;
+		stop = lim - start;
+		if (stop > BITS_PER_INT)
+		  stop = BITS_PER_INT;
+		stop += start;
+
+		while (start < stop)
+		  {
+		    if (!CONS_MARKED_P (&cblk->conses[start]))
+		      {
+			this_free++;
+			cblk->conses[start].u.chain = cons_free_list;
+			cons_free_list = &cblk->conses[start];
+#if GC_MARK_STACK
+			cons_free_list->car = Vdead;
+#endif
+		      }
+		    else
+		      {
+			num_used++;
+			CONS_UNMARK (&cblk->conses[start]);
+		      }
+		    start++;
+		  }
+		if (stop < (++i) * BITS_PER_INT)
+		  /* Whole bitmap is scanned or LIM is reached.  */
+		  break;
+	      }
+	  }
+
+	lim = CONS_BLOCK_SIZE;
+	/* If this block contains only free conses and we have already
+	   seen more than two blocks worth of free conses then deallocate
 	   this block.  */
 	if (this_free == CONS_BLOCK_SIZE && num_free > CONS_BLOCK_SIZE)
 	  {
@@ -6010,6 +6101,13 @@
     total_free_conses = num_free;
   }
 
+#endif /* Sweep types */
+
+  EMACS_GET_TIME (end);
+  EMACS_SUB_TIME (total, end, beg);
+  fprintf (stderr, "GC%u [%s sweep]: %u us\n", gcs_done,
+	   sweep_type, EMACS_USECS (total));
+
   /* Put all unmarked floats on free list */
   {
     register struct float_block *fblk;

[-- Attachment #3: Type: text/plain, Size: 142 bytes --]

_______________________________________________
Emacs-devel mailing list
Emacs-devel@gnu.org
http://lists.gnu.org/mailman/listinfo/emacs-devel

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: GC: cons sweeping
  2007-07-02 15:01 GC: cons sweeping Dmitry Antipov
@ 2007-07-02 18:15 ` Stefan Monnier
  2007-07-02 22:39   ` Richard Stallman
  2007-07-03 10:48   ` Dmitry Antipov
  0 siblings, 2 replies; 4+ messages in thread
From: Stefan Monnier @ 2007-07-02 18:15 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>    GC39    1831    987

So it seems to be able to speed up sweeping of cons cells by anywhere between
20% and 50% (up to a factor of 2 speed up).  Now, what fraction of time is
spent sweeping cons-cells?


        Stefan

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: GC: cons sweeping
  2007-07-02 18:15 ` Stefan Monnier
@ 2007-07-02 22:39   ` Richard Stallman
  2007-07-03 10:48   ` Dmitry Antipov
  1 sibling, 0 replies; 4+ messages in thread
From: Richard Stallman @ 2007-07-02 22:39 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: dmantipov, emacs-devel

    So it seems to be able to speed up sweeping of cons cells by anywhere between
    20% and 50% (up to a factor of 2 speed up).  Now, what fraction of time is
    spent sweeping cons-cells?

The change is simple enough that we have no reason to argue more
before we install it.  We just need legal papers.  So I sent Dmitry
a message about that.

^ permalink raw reply	[flat|nested] 4+ messages in thread

* Re: GC: cons sweeping
  2007-07-02 18:15 ` Stefan Monnier
  2007-07-02 22:39   ` Richard Stallman
@ 2007-07-03 10:48   ` Dmitry Antipov
  1 sibling, 0 replies; 4+ messages in thread
From: Dmitry Antipov @ 2007-07-03 10:48 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

Stefan Monnier wrote:

> So it seems to be able to speed up sweeping of cons cells by anywhere between
> 20% and 50% (up to a factor of 2 speed up).  Now, what fraction of time is
> spent sweeping cons-cells?

In general, typical GC spent 80% of time in marking and 20% of time in sweeping
(so even a tiny optimization of the first phase might be more important than a
much more solid enhancement of a second - but this doesn't mean that we should
not try to optimize both :-), and cons sweeping time is typically 2-5 % of the
whole GC time. So, since an average GC takes 20-30 ms on a 3GHz x86 CPU, I agree
that this optimization is likely not to be seen by an average user on an average
modern hardware.

Dmitry

^ permalink raw reply	[flat|nested] 4+ messages in thread

end of thread, other threads:[~2007-07-03 10:48 UTC | newest]

Thread overview: 4+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2007-07-02 15:01 GC: cons sweeping Dmitry Antipov
2007-07-02 18:15 ` Stefan Monnier
2007-07-02 22:39   ` Richard Stallman
2007-07-03 10:48   ` Dmitry Antipov

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).