Proposal: block-based vector allocator

unofficial mirror of emacs-devel@gnu.org 
 help / color / mirror / code / Atom feed

* Proposal: block-based vector allocator
@ 2011-12-06  5:22 Dmitry Antipov
  2011-12-06 13:35 ` Stefan Monnier
       [not found] ` <jwvaa1yjs21.fsf-monnier+emacs@gnu.org>
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-06  5:22 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2178 bytes --]

I've tried a few designs of block-based vector allocation, and this is the first
one which looks good enough to be proposed here.

This vector allocator uses near-to-PAGE_SIZE blocks to allocate small vectors
and dedicated list for large vectors. Array of free lists is used to separately
hold free small vectors of all possible sizes. When an allocation request can't
be satisfied from exact-fit free list, best-fit is tried, and unused tail bytes
forms a free vector which is set up on an appropriate free list. If best-fit
allocation has failed too, and allocation request exceeds an amount of free bytes
in the current vector block, new block is allocated and this request is satisfied
from it; the rest of space from previous block forms a free vector which is set
up on appropriate free list. If allocation request may be satisfied from the
current vector block, it's just a simple bump pointer allocation.

At sweeping, all vector blocks are processed in reverse allocation order,
and pointers to unmarked vectors are collected on a per-block basic. If the
block contains no live vectors, it's freed. Otherwise, adjacent free vectors
are coalesced to form the largest possible free vectors, which are set up on
an appropriate free lists. Finally, dedicated list of large vector is processed,
and unmarked vectors are freed.

I didn't test it too much, but this code survives the bootstrap both in 32 and
64-bit variants and doesn't crash under usual daily editing workloads. Although
the whole stuff looks a bit complicated, the benefits are 4-5% faster speed and
7-8% less peak heap utilization with my favorite byte-force-recompile benchmark,
at the cost of 2% larger (64-bit) dumped executable.

As with the most general case algorithms, some special scenarios may hurt
an allocator. For example, allocating a lot of 3K vectors wastes ~25% of
vector block space; allocating a lot of mid-size vectors, followed by allocating
a lot of small vectors (or vice versa) will trigger an excessive splitting or
coalescing work. But I believe this code is balanced enough to have a measurable
advantage over current scheme for the most of average normal usage patterns.

Dmitry

[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 13435 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2011-11-20 03:07:02 +0000
+++ src/alloc.c	2011-12-06 05:06:33 +0000
@@ -21,7 +21,7 @@
 #include <stdio.h>
 #include <limits.h>		/* For CHAR_BIT.  */
 #include <setjmp.h>
-
+#include <sys/param.h>		/* For roundup.  */
 #include <signal.h>
 
 #ifdef HAVE_PTHREAD
@@ -2896,9 +2896,12 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
 
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE (PAGE_SIZE - sizeof (void *))
+#define VECTOR_BLOCK_BYTES (VECTOR_BLOCK_SIZE - 2 * sizeof (void *))
 
 /* Handy constants for vectorlike objects.  */
 enum
@@ -2907,6 +2910,301 @@
     word_size = sizeof (Lisp_Object)
   };
 
+/* Free lists for all possible block-allocated vectors.  */
+
+#define VECTOR_FREE_LISTS (VECTOR_BLOCK_BYTES / word_size)
+
+/* Vector blocks are aligned to page size, so it's easy to
+   get vector block pointer from the vector pointer.  */
+
+#define VECTOR_BLOCK(v) (struct vector_block *) \
+  ((unsigned long)(v) & ~(PAGE_SIZE - 1))
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_SIZE \
+  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | (PAGE_SIZE - 1)))
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  unsigned char *bump;
+  struct vector_block *next;
+};
+
+/* Current vector block.  */
+
+static struct vector_block *vector_block;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * WORD_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_FREE_LISTS];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *b;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  /* Although lisp_align_malloc should be used, the default
+     1K granularity looks too small for vector allocation.  */
+  b = memalign (PAGE_SIZE, VECTOR_BLOCK_SIZE);
+  if (!b)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+  memset (b->data, 0, VECTOR_BLOCK_BYTES);
+  mem_insert (b->data, b->data + VECTOR_BLOCK_BYTES, MEM_TYPE_VECTORLIKE);
+
+  b->bump = b->data;
+  b->next = vector_block;
+  vector_block = b;
+  return b;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+  vector_block = allocate_vector_block ();
+
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from the vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* Vector with 0 slots for Lisp_Objects is allowed.  */
+  eassert (nbytes >= sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % word_size == 0);
+  eassert (vector_block != NULL);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / word_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / word_size;
+       index < VECTOR_FREE_LISTS; index++)
+    if (vector_free_lists[index])
+      {
+	struct Lisp_Vector *rest;
+
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * word_size - nbytes;
+	eassert (restbytes % word_size == 0);
+	rest = (struct Lisp_Vector *) ((char *) vector + nbytes);
+	rest->header.size = VECTOR_FREE_LIST_SIZE | restbytes;
+	index = restbytes / word_size;
+	eassert (index < VECTOR_FREE_LISTS);
+	rest->header.next.vector = vector_free_lists[index];
+	vector_free_lists[index] = rest;
+
+	return vector;
+      }
+
+  /* Next, try to allocate from current vector block.  */
+  restbytes = vector_block->data + VECTOR_BLOCK_BYTES - vector_block->bump;
+  if (nbytes > restbytes)
+    {
+      /* Current block's free space isn't enough...  */
+      if (restbytes >= sizeof (struct Lisp_Vector))
+	{
+	  /* ...but it's enough to allocate smaller vector,
+	     which should be set on an appropriate free list.  */
+	  eassert (restbytes % word_size == 0);
+	  vector = (struct Lisp_Vector *) vector_block->bump;
+	  vector_block->bump += restbytes;
+	  vector->header.size = VECTOR_FREE_LIST_SIZE | restbytes;
+	  index = restbytes / word_size;
+	  eassert (index < VECTOR_FREE_LISTS);
+	  vector->header.next.vector = vector_free_lists[index];
+	  vector_free_lists[index] = vector;
+	}
+      /* Anyway, need a new vector block.  */
+      block = allocate_vector_block ();
+    }
+  else
+    /* Allocate from current vector block.  */
+    block = vector_block;
+
+  /* Bump pointer allocation from current vector block.  */
+  vector = (struct Lisp_Vector *) block->bump;
+  block->bump += nbytes;
+  vector->header.next.nbytes = nbytes;
+  return vector;
+}
+
+/* Return amount of Lisp_Objects can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) :	(v)->header.size)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_block, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int index, nr;
+
+  total_vector_size = 0;
+  for (index = 0; index < VECTOR_FREE_LISTS; index++)
+    vector_free_lists[index] = NULL;
+
+  /* Looking through vector blocks first.  */
+
+  while (block)
+    {
+      struct Lisp_Vector *freearray
+	[VECTOR_BLOCK_BYTES / sizeof (struct Lisp_Vector)];
+
+      index = nr = 0;
+      vector = (struct Lisp_Vector *) block->data;
+      while ((unsigned char *) vector < block->bump)
+	{
+	  if ((vector->header.size & VECTOR_FREE_LIST_SIZE) == 
+	      VECTOR_FREE_LIST_SIZE)
+	    {
+	      /* This vector is on a free list already.  */
+	      freearray[index++] = vector;
+	      vector->header.next.nbytes = 
+		vector->header.size & (PAGE_SIZE - 1);
+	    }
+	  else
+	    {
+	      if (VECTOR_MARKED_P (vector))
+		{
+		  VECTOR_UNMARK (vector), nr++;
+		  total_vector_size += VECTOR_SIZE (vector);
+		}
+	      else
+		{
+		  /* This vector is not marked, and it's not on a free list. */
+		  freearray[index++] = vector;
+		  vector->header.size = 
+		    VECTOR_FREE_LIST_SIZE | vector->header.next.nbytes;
+		}
+	    }
+	  eassert (index < sizeof (freearray) / sizeof (freearray[0]));
+	  eassert (vector->header.next.nbytes != 0);
+	  vector = (struct Lisp_Vector *)
+	    ((unsigned char *) vector + vector->header.next.nbytes);
+	}
+
+      if (nr)
+	{
+	  /* This block has some used vectors, which are unmarked. Unused
+	     vectors are collected in FREEARRAY. Setup them on a free lists,
+	     coalescing adjacent free vectors if possible.  */
+	  int i, j;
+	  EMACS_INT nbytes;
+
+	  for (i = 0; i < index; i = j)
+	    {
+	      vector = freearray[i];
+	      nbytes = vector->header.next.nbytes;
+
+	      /* Try to form the largest possible free vector.  */
+	      for (j = i + 1; j < index; j++)
+		{
+		  next = freearray[j];
+		  if ((struct Lisp_Vector *)
+		       ((unsigned char *) vector + nbytes) == next)
+		    nbytes += next->header.next.nbytes;
+		  else
+		    break;
+		}
+	      vector->header.next.nbytes = nbytes;
+
+	      /* Setup resulting vector on a free list.  */
+	      eassert (vector->header.next.nbytes % word_size == 0);
+	      nr = vector->header.next.nbytes / word_size;
+	      eassert (nr < VECTOR_FREE_LISTS);
+	      vector->header.size =
+		VECTOR_FREE_LIST_SIZE | vector->header.next.nbytes;
+	      vector->header.next.vector = vector_free_lists[nr];
+	      vector_free_lists[nr] = vector;
+	    }
+	  bprev = block, block = block->next;
+	}
+      else
+	{
+	  /* This block is no longer used.  */
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_block = block->next;
+	  bnext = block->next;
+	  mem_delete (mem_find (block->data));
+	  free (block);
+	  block = bnext;
+	}
+    }
+
+  /* Next, sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2929,7 +3227,15 @@
   /* eassert (!handling_signal); */
 
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is needed for 32-bit code to preserve 8-byte alignment.  */
+    p = allocate_vector_from_block (roundup (nbytes, 8));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2939,9 +3245,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4016,7 +4319,38 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = VECTOR_BLOCK (p);
+	  struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	  /* Scan the whole vector block and see whether P points to
+	     the start of some vector which is not on a free list.  */
+	  while ((unsigned char *) vector < block->bump)
+	    {
+	      if ((vector->header.size & VECTOR_FREE_LIST_SIZE)
+		  == VECTOR_FREE_LIST_SIZE)
+		vector = (struct Lisp_Vector *)
+		  ((unsigned char *) vector +
+		   (vector->header.size & (PAGE_SIZE - 1)));
+	      else if (vector == p)
+		return 1;
+	      else
+		vector = (struct Lisp_Vector *)
+		  ((unsigned char *) vector + vector->header.next.nbytes);
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6169,33 +6503,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6331,7 +6639,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6344,6 +6651,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2011-12-05 00:15:15 +0000
+++ src/lisp.h	2011-12-06 05:04:46 +0000
@@ -868,12 +868,15 @@
 struct vectorlike_header
   {
     EMACS_INT size;
-
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      EMACS_INT nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-06  5:22 Proposal: block-based vector allocator Dmitry Antipov
@ 2011-12-06 13:35 ` Stefan Monnier
  2011-12-06 15:14   ` Dmitry Antipov
       [not found] ` <jwvaa1yjs21.fsf-monnier+emacs@gnu.org>
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2011-12-06 13:35 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

> I've tried a few designs of block-based vector allocation, and this is
> the first one which looks good enough to be proposed here.

I don't have time to look at it, but I like the idea of reducing the
cost of small vector allocation (currently O(log N) where N is the
number of malloc'd objects).

Could you give a quick overview of what you've considered and thrown away?

> This vector allocator uses near-to-PAGE_SIZE blocks to allocate small vectors
> and dedicated list for large vectors.

For large vectors we can probably just keep using what we have now.

> be satisfied from exact-fit free list, best-fit is tried, and unused
> tail bytes forms a free vector which is set up on an appropriate free
> list.

I don't like much splitting&coalescing.  Is it really needed?


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-06 13:35 ` Stefan Monnier
@ 2011-12-06 15:14   ` Dmitry Antipov
  2011-12-06 19:39     ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-06 15:14 UTC (permalink / raw)
  To: emacs-devel

On 12/06/2011 05:35 PM, Stefan Monnier wrote:

> Could you give a quick overview of what you've considered and thrown away?

The most basic idea was to maintain separate vector blocks for vectors
of each possible size (with the room enough to 0, 1, 2, etc. Lisp_Objects -
let's call it order0, order1 and so on), up to 4x1K vectors in 4K blocks,
and dedicated list for >1K vectors. This scheme is simple, but it tends to
hold too much space in free lists, and this space is of 'wrong order'. It was
typical to have 1000 free slots in order[234] blocks, but no free space in
order50 blocks. As a result, the total memory consumption was _more_ than
current.

> For large vectors we can probably just keep using what we have now.

It is so.

> I don't like much splitting&coalescing.  Is it really needed?

I believe splitting is important because the most of vectors are small,
so even a small pieces of unused memory are very likely to be utilized.

For (64-bit) example, consider 600-byte request, and there is no 600-byte free
block in corresponding free list. In this case, proposed allocator will not
consider 608 and 616-byte free lists, because splitting gives 8 and 16-bytes
tails, which can't be used to hold any vector (except 0-sized, which is very
rare). But it will consider free list with 624-bytes blocks, because 24-bytes
tail can be used for the vector holds 1 Lisp_Object. If it fails, next candidate
is 632-byte block free list, and so on. And it's very likely that 24-bytes (or
32, or 40, etc.) tail will be used soon, because most vectors are small.

Coalescing is harder, because... yes, most vectors are small :-). For example,
we can have vector block with 10 free and adjacent 40-byte vectors. While
coalescing as much as possible, an obvious result is to create 400-byte free
vector and set it on a free list. Next, it's very likely to have, for example,
8 requests for 32,40,32,40,48,32,24,48-byte vectors instead of one request for
360-byte vector. So, if corresponding small-size free lists are empty, an
allocator will split 400-byte block back into small pieces. This is an obvious
disadvantage of this allocator. But it's possible to introduce a kind of
'coalescing threshold' and stop coalescing if small-size free lists becomes
too short in favor of mid- and large-size ones.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-06 15:14   ` Dmitry Antipov
@ 2011-12-06 19:39     ` Stefan Monnier
  2011-12-07  5:05       ` Dmitry Antipov
  0 siblings, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2011-12-06 19:39 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

> The most basic idea was to maintain separate vector blocks for vectors
> of each possible size (with the room enough to 0, 1, 2, etc. Lisp_Objects -
> let's call it order0, order1 and so on), up to 4x1K vectors in 4K blocks,
> and dedicated list for >1K vectors. This scheme is simple, but it tends to
> hold too much space in free lists, and this space is of 'wrong order'. It was
> typical to have 1000 free slots in order[234] blocks, but no free space in
> order50 blocks.

Are you saying that you kept separate blocks for 7-slot vectors and
8-slot vectors?  Yes, that tends to lead to too-many blocks.
Instead, each block should cover a range of sizes.

Example for 32bit system: since vectors, like all objects, need to be
aligned on a multiple of 8, and have at least one metadata field (the
vector's size and gcmarkbit), that means that the minimum size of tiny
vectors is:

   0&1-slot:   8bytes.
   2&3-slots: 16bytes.
   4&5-slots: 24bytes.
   6&7-slots: 32bytes.

[If we keep the "next" field (even tho it's not needed for those
vectors if we don't split&coalesce), it shifts things by 1, of course]

The best we can do (with 4KB blocks) without splitting/coalescing is:
- order16 : 256 objects of size <= 16bytes.
- order24 : 170 objects of size <= 24bytes.
- order32 : 128 objects of size <= 32bytes.
- order40 : 102 objects of size <= 40bytes.
- order48 :  85 objects of size <= 48bytes.
- order56 :  73 objects of size <= 56bytes.
- order64 :  64 objects of size <= 64bytes.
- order72 :  56 objects of size <= 72bytes.
- order80 :  51 objects of size <= 80bytes.
- order88 :  46 objects of size <= 88bytes.
- order96 :  42 objects of size <= 96bytes.
- order104:  39 objects of size <= 104bytes.
- order112:  36 objects of size <= 112bytes.
- order120:  34 objects of size <= 120bytes.
- order128:  32 objects of size <= 128bytes.
- order136:  30 objects of size <= 136bytes.
- order144:  28 objects of size <= 144bytes.
- order152:  26 objects of size <= 152bytes.
- order160:  25 objects of size <= 160bytes.
- order168:  24 objects of size <= 168bytes.
- order176:  23 objects of size <= 176bytes.
- order184:  22 objects of size <= 184bytes.
- order192:  21 objects of size <= 192bytes.
- order200:  20 objects of size <= 200bytes.
- order208:  19 objects of size <= 208bytes.
(skip order216:  18 objects of size <= 216bytes)
- order224:  18 objects of size <= 224bytes.
(skip order232:  17 objects of size <= 232bytes)
- order240:  17 objects of size <= 240bytes.
- order256:  16 objects of size <= 256bytes.
- order272:  15 objects of size <= 272bytes.
- order288:  14 objects of size <= 288bytes.
- order312:  13 objects of size <= 312bytes.
- order336:  12 objects of size <= 336bytes.
- order368:  11 objects of size <= 368bytes.
- order408:  10 objects of size <= 408bytes.
- order448:   9 objects of size <= 448bytes.
- order512:   8 objects of size <= 512bytes.
- order584:   7 objects of size <= 584bytes.
- order680:   6 objects of size <= 680bytes.
- order816:   5 objects of size <= 816bytes.
- order1024:  4 objects of size <= 1024bytes.
- order1360:  3 objects of size <= 1360bytes.
- order2048:  2 objects of size <= 2048bytes.
- order4096:  1 objects of size <= 4096bytes.

I.e. 44 different orders.  As objects grow, so does the wasted space,
e.g. for order512 the max wasted space (a 456bytes object) is already
12%, so we'd set a limit (e.g. 200bytes) after which we'd use the large
vector code.

Is that more or less what you had done?  It's a very standard and
popular malloc algorithm, so I'm surprised you say it didn't work well.

If the problem is that we have many "mostly empty" blocks of the wrong
size, then we can do the following:
- reduce the block size to 1024 (as used for cons cells and floats, BTW).
- impose an alignment on multiple of 16 (probably a good idea anyway
  since it will give us a bit more tag space) which reduces the number
  of orders accordingly.

E.g. for 1024bytes blocks with a "multiple of 16" alignment, we have
much fewer orders, and hence correspondingly fewer "mostly empty" blocks:
- order16 : 64 objects of size <= 16bytes.
- order32 : 32 objects of size <= 32bytes.
- order48 : 21 objects of size <= 48bytes.
- order64 : 16 objects of size <= 64bytes.
- order80 : 12 objects of size <= 80bytes.
- order96 : 10 objects of size <= 96bytes.
- order112:  9 objects of size <= 112bytes.
- order128:  8 objects of size <= 128bytes.
- order144:  7 objects of size <= 144bytes.
- order160:  6 objects of size <= 160bytes.
- order192:  5 objects of size <= 192bytes.
- order256:  4 objects of size <= 256bytes.
- order336:  3 objects of size <= 336bytes.
- order512:  2 objects of size <= 512bytes.
- order1024: 1 objects of size <= 1024bytes.
  
> As a result, the total memory consumption was _more_ than current.

I don't think we should care about 0-slot and 1-slot vectors too much,
and even 2-slot vectors are not super important (code that really cares
about it can use a cons cell instead in most cases).  Also, currently,
3-slot vectors take up 24bytes for the Lisp_Vectorlike object plus one
mem_node entry in the red/black tree (which is 7words (could be brought
down to 6), i.e. 32bytes), for a grand total of 56bytes (that's without
counting malloc's own overhead, tho it seemed to be fairly small last
time I looked at it).

So we should have a fair bit of wiggle room here.  Especially since I'm
not specifically interested in reducing memory use as much as getting
rid of the O(log n) overhead and generally making smallish
"defstruct-style" vectors more efficient.

E.g. the "56bit overhead" corresponds more or less to the overhead of
placing a 452B objects in the order512 (for 4096B blocks), so we should
be able to allocate all vectors smaller than 512B in such a scheme
without suffering much increased memory use (other than having some
number of mostly empty blocks).

> block in corresponding free list. In this case, proposed allocator will not
> consider 608 and 616-byte free lists, because splitting gives 8 and 16-bytes
> tails, which can't be used to hold any vector (except 0-sized, which is very
> rare). But it will consider free list with 624-bytes blocks, because 24-bytes
> tail can be used for the vector holds 1 Lisp_Object.

Regardless of whether we can find 1-slot vector to fill this, 624bytes
for a 600bytes object is just 4% overhead, so it's not worth wasting too
much effort trying to make use of that space.  It's still much smaller
than the 56bytes (for 32bit systems) of mem_node we currently allocate
for every vector.

> If it fails, next candidate is 632-byte block free list, and so
> on. And it's very likely that 24-bytes (or 32, or 40, etc.) tail will
> be used soon, because most vectors are small.

Why have different classes for 632B and 600B sizes?  You can't fit more
than 6 vectors of either size in a 4KB block, so you might as well
divide your "order600" block into 6 chunks of 680B (plus 16B of padding)
rather than 6 chunks of 600B plus an unused 496B of padding.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-06 19:39     ` Stefan Monnier
@ 2011-12-07  5:05       ` Dmitry Antipov
  2011-12-07 12:27         ` Carsten Mattner
  2011-12-07 13:52         ` Stefan Monnier
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-07  5:05 UTC (permalink / raw)
  To: emacs-devel

On 12/06/2011 11:39 PM, Stefan Monnier wrote:

> Is that more or less what you had done?  It's a very standard and
> popular malloc algorithm, so I'm surprised you say it didn't work well.

The point is that it makes no sense to have 'standard malloc algorithm'
for vector allocation on top of another 'standard malloc algorithm' used
by C library malloc. To get an advantage over plain malloc()/free() pair,
we should have something specially tailored for our needs instead of
duplicating library functionality at the higher levels. Any good
allocator should be a well-designed balance between complexity, speed,
code size, fragmentation, etc., but I'm trying to target smallest possible
fragmentation and best possible heap utilization with smallest possible
waste. This is especially important because our heap is very unlikely to
shrink and tends to be very fragmented.

> So we should have a fair bit of wiggle room here.  Especially since I'm
> not specifically interested in reducing memory use as much as getting
> rid of the O(log n) overhead and generally making smallish
> "defstruct-style" vectors more efficient.

This approach looks pretty strange for anyone who looks through this list's
archives: "my Emacs is too slow" noise is quite rare, but "my Emacs is
too fat" topics are raised over and over again.

I believe Emacs is fast enough to be a good editor on any reasonable hardware.
Even for an instance bloated with a few hundreds of buffers, I never seen
a GC longer than 0.1s, which is faster than a keystroke. So, eliminating
an logarithmic overhead in one of internal subsystems is insignificant. But,
when an editor with 0 (zero) open files consumes more memory than a web
browser with 10 open tabs, it's just a shame.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-07  5:05       ` Dmitry Antipov
@ 2011-12-07 12:27         ` Carsten Mattner
  2011-12-07 13:52         ` Stefan Monnier
  1 sibling, 0 replies; 62+ messages in thread
From: Carsten Mattner @ 2011-12-07 12:27 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

On Wed, Dec 7, 2011 at 6:05 AM, Dmitry Antipov <dmantipov@yandex.ru> wrote:
> On 12/06/2011 11:39 PM, Stefan Monnier wrote:
>> So we should have a fair bit of wiggle room here.  Especially since I'm
>> not specifically interested in reducing memory use as much as getting
>> rid of the O(log n) overhead and generally making smallish
>> "defstruct-style" vectors more efficient.
>
>
> This approach looks pretty strange for anyone who looks through this list's
> archives: "my Emacs is too slow" noise is quite rare, but "my Emacs is
> too fat" topics are raised over and over again.
>
> I believe Emacs is fast enough to be a good editor on any reasonable
> hardware.
> Even for an instance bloated with a few hundreds of buffers, I never seen
> a GC longer than 0.1s, which is faster than a keystroke. So, eliminating
> an logarithmic overhead in one of internal subsystems is insignificant. But,
> when an editor with 0 (zero) open files consumes more memory than a web
> browser with 10 open tabs, it's just a shame.

For what it's worth I'd like to second that based on my emacs experience.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-07  5:05       ` Dmitry Antipov
  2011-12-07 12:27         ` Carsten Mattner
@ 2011-12-07 13:52         ` Stefan Monnier
  2011-12-07 16:08           ` Dmitry Antipov
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2011-12-07 13:52 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> Is that more or less what you had done?  It's a very standard and
>> popular malloc algorithm, so I'm surprised you say it didn't work well.
> The point is that it makes no sense to have 'standard malloc algorithm'
> for vector allocation on top of another 'standard malloc algorithm' used
> by C library malloc. To get an advantage over plain malloc()/free() pair,
> we should have something specially tailored for our needs instead of
> duplicating library functionality at the higher levels.

Then I suggest you leave the code alone: we have no evidence that the
allocation pattern of vectors is any different than what malloc was
designed for, and the malloc library has seen many years of engineering
and tweaking, so you're very unlikely to do better than it does.

> Any good allocator should be a well-designed balance between
> complexity, speed, code size, fragmentation, etc.,

Yes.  And the design I propose is one that has proved to be such a good
balance.

> but I'm trying to target smallest possible fragmentation and best
> possible heap utilization with smallest possible waste.

Then you're doing it wrong: do it the way Lisp_Strings are handled
(i.e. with compaction).

>> So we should have a fair bit of wiggle room here.  Especially since I'm
>> not specifically interested in reducing memory use as much as getting
>> rid of the O(log n) overhead and generally making smallish
>> "defstruct-style" vectors more efficient.
> This approach looks pretty strange for anyone who looks through this list's
> archives: "my Emacs is too slow" noise is quite rare, but "my Emacs is
> too fat" topics are raised over and over again.

The intention is not to make Emacs faster, but to eliminate gross
inefficiencies on particular operations (because that can influence
coding style negatively).

> I believe Emacs is fast enough to be a good editor on any
> reasonable hardware.

I disagree (it often has trouble keeping up with my typing (and I don't
even touch-type); but of course, this particular optimization won't
help, since the slowness is mostly in the display rendering which is not
affected by the speed of vector allocation).

> Even for an instance bloated with a few hundreds of buffers, I never seen
> a GC longer than 0.1s, which is faster than a keystroke.

None of this is related to the GC speed anyway.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-07 13:52         ` Stefan Monnier
@ 2011-12-07 16:08           ` Dmitry Antipov
  2011-12-07 16:30             ` Stefan Monnier
  2011-12-08  1:53             ` Stephen J. Turnbull
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-07 16:08 UTC (permalink / raw)
  To: emacs-devel

On 12/07/2011 05:52 PM, Stefan Monnier wrote:

> Then I suggest you leave the code alone: we have no evidence that the
> allocation pattern of vectors is any different than what malloc was
> designed for, and the malloc library has seen many years of engineering
> and tweaking, so you're very unlikely to do better than it does.

At least for the vector allocation, I did it in a few days, if you believe
in my numbers and byte-force-recompile as a representative benchmark.

> Yes.  And the design I propose is one that has proved to be such a good
> balance.

The design itself can't prove anything. The devil is in the details, and
implementation is the only thing that matters. My favorite GC textbook
(http://www.amazon.com/gp/product/1420082795) describes a tens of
very good designs, but doesn't provide any ready-to-use recipes to help
us solve memory usage issues.

> Then you're doing it wrong: do it the way Lisp_Strings are handled
> (i.e. with compaction).

Although some designs makes it pretty hard, compaction is orthogonal to
an allocation, and allocator I'm proposing can be enhanced to support
compaction.

> None of this is related to the GC speed anyway.

Wrong. Among others, good design might (and should) try to improve the
spatial locality of allocated objects, thus giving less data cache misses
and so better speed. It's very hard to predict how marking traversal
accesses the memory, and locality of this procedure is known to be very
poor. But, for example, sweeping of block-packed data is definitely
faster than sweeping the same amount of data randomly dispersed
among the heap.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-07 16:08           ` Dmitry Antipov
@ 2011-12-07 16:30             ` Stefan Monnier
  2011-12-08  8:50               ` Dmitry Antipov
  2011-12-08  1:53             ` Stephen J. Turnbull
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2011-12-07 16:30 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> Then I suggest you leave the code alone: we have no evidence that the
>> allocation pattern of vectors is any different than what malloc was
>> designed for, and the malloc library has seen many years of engineering
>> and tweaking, so you're very unlikely to do better than it does.
> At least for the vector allocation, I did it in a few days, if you believe
> in my numbers and byte-force-recompile as a representative benchmark.

I'm not convinced your result is because your code does better than
malloc.  Instead I expect it's because it avoids the mem_node overhead.
Recompiling and trying again without conservative stack scanning should
factor out this effect.

I'm still curious why you have a "order608" for 4KB blocks, since
dividing the 4KB block into 608 would leave you with a lot of padding at
the end and reducing the number of distinct "orders" should
reduce fragmentation.

>> Then you're doing it wrong: do it the way Lisp_Strings are handled
>> (i.e. with compaction).
> Although some designs makes it pretty hard, compaction is orthogonal to
> an allocation, and allocator I'm proposing can be enhanced to support
> compaction.

You can't do compaction without changing the Lisp_Vector's layout
(adding an indirection, as is done for Lisp_Strings).

>> None of this is related to the GC speed anyway.
> Wrong. Among others, good design might (and should) try to improve the
> spatial locality of allocated objects, thus giving less data cache misses
> and so better speed. It's very hard to predict how marking traversal
> accesses the memory, and locality of this procedure is known to be very
> poor. But, for example, sweeping of block-packed data is definitely
> faster than sweeping the same amount of data randomly dispersed
> among the heap.

I thought we agreed that GC performance is not a concern here.
Of course, every change to the layout can impact the performance of the
GC.  Have you measured the impact of your code on GC performance?

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-07 16:30             ` Stefan Monnier
@ 2011-12-08  8:50               ` Dmitry Antipov
  2011-12-08 13:52                 ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-08  8:50 UTC (permalink / raw)
  To: emacs-devel

On 12/07/2011 08:30 PM, Stefan Monnier wrote:

> I'm not convinced your result is because your code does better than
> malloc.  Instead I expect it's because it avoids the mem_node overhead.
> Recompiling and trying again without conservative stack scanning should
> factor out this effect.

Without conservative stack scanning, it works at least not worse
(I believe we should ignore the differences less than, say, 0.5%).
But, as I have said, there are further improvements possible.

> You can't do compaction without changing the Lisp_Vector's layout
> (adding an indirection, as is done for Lisp_Strings).

I know, and I'm just saying that my proposal contains nothing which
makes compaction impossible or too tricky.

> I thought we agreed that GC performance is not a concern here.
> Of course, every change to the layout can impact the performance of the
> GC.  Have you measured the impact of your code on GC performance?

Not yet. But will do it, of course.

Dmitry



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08  8:50               ` Dmitry Antipov
@ 2011-12-08 13:52                 ` Stefan Monnier
  0 siblings, 0 replies; 62+ messages in thread
From: Stefan Monnier @ 2011-12-08 13:52 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> I'm not convinced your result is because your code does better than
>> malloc.  Instead I expect it's because it avoids the mem_node overhead.
>> Recompiling and trying again without conservative stack scanning should
>> factor out this effect.

> Without conservative stack scanning, it works at least not worse
> (I believe we should ignore the differences less than, say, 0.5%).

Sounds about right (performance of vector allocation, both in terms of
memory and CPU use, should be lost in the noise unless the allocator
really suffers from a serious problem, as is the case with the current
allocator because of the mem_node allocated alongside).

> But, as I have said, there are further improvements possible.

I think the improvements should focus on simplicity since I doubt we can
get measurably much better.

>> You can't do compaction without changing the Lisp_Vector's layout
>> (adding an indirection, as is done for Lisp_Strings).
> I know, and I'm just saying that my proposal contains nothing which
> makes compaction impossible or too tricky.

OK.

>> I thought we agreed that GC performance is not a concern here.
>> Of course, every change to the layout can impact the performance of the
>> GC.  Have you measured the impact of your code on GC performance?
> Not yet. But will do it, of course.

I wouldn't bother.

BTW:
I'm still curious why you have a "order608" for 4KB blocks, since
dividing the 4KB block into 608 would leave you with a lot of padding at
the end and reducing the number of distinct "orders" should
reduce fragmentation.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-07 16:08           ` Dmitry Antipov
  2011-12-07 16:30             ` Stefan Monnier
@ 2011-12-08  1:53             ` Stephen J. Turnbull
  2011-12-08  4:41               ` Dmitry Antipov
  1 sibling, 1 reply; 62+ messages in thread
From: Stephen J. Turnbull @ 2011-12-08  1:53 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

Dmitry Antipov writes:
 > On 12/07/2011 05:52 PM, Stefan Monnier wrote:

 > > Yes.  And the design I propose is one that has proved to be such a good
 > > balance.
 > 
 > The design itself can't prove anything. The devil is in the details, and
 > implementation is the only thing that matters. My favorite GC textbook
 > (http://www.amazon.com/gp/product/1420082795) describes a tens of
 > very good designs, but doesn't provide any ready-to-use recipes to help
 > us solve memory usage issues.

I don't think you're going to solve the memory use issues by improving
vector allocation.  "Premature optimization is the root of all error."

 > Wrong. Among others, good design might (and should) try to improve the
 > spatial locality of allocated objects,

I (and I suspect Stefan) disagree with "should".  The goal is worthy,
but success difficult and unlikely: Emacs tend to end up with very
poor locality for many reasons.

As for the memory savings themselves, your benchmark is not
convincing: all it tells us that an allocation algorithm carefully
tuned to decreasing the size of a dumped Emacs can do so, a little.
But the amount of memory that can be saved in practice is rather small
compared to the level of memory usage that users complain about these
days.  "EMACS = 8 MB And Constantly Swapping" is a non sequitur, and
has been for almost 20 years for me.  So it does not tell us that it
will help with the problems being discussed in the "Memory again"
thread, among others.

If you want to keep working on it, fine -- every little bit helps, and
you might succeed.  But I don't think it should be a priority for
Emacs, and you are going to need to show more gains to make it worth a
somewhat more complex code base.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08  1:53             ` Stephen J. Turnbull
@ 2011-12-08  4:41               ` Dmitry Antipov
  2011-12-08 14:10                 ` Stefan Monnier
  2011-12-09  4:44                 ` Stephen J. Turnbull
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-08  4:41 UTC (permalink / raw)
  To: emacs-devel

On 12/08/2011 05:53 AM, Stephen J. Turnbull wrote:

> I don't think you're going to solve the memory use issues by improving
> vector allocation.

Initially I considered the possibility to use mmap()ed areas to store Lisp
data for normal (already dumped) Emacs. Among others, it requires
1) suitable tagging scheme and 2) ability to store all Lisp data within
multiple-of-page-size blocks. That was the main motivation for working on
block-based vector allocator.

> "Premature optimization is the root of all error."

It's not premature, and further improvements are possible. "The way of
thousand miles begins with single step".

> If you want to keep working on it, fine -- every little bit helps, and
> you might succeed.  But I don't think it should be a priority for
> Emacs, and you are going to need to show more gains to make it worth a
> somewhat more complex code base.

Sorry, but did you mean "don't show half-done work to a fool"? If yes,
then I should agree.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08  4:41               ` Dmitry Antipov
@ 2011-12-08 14:10                 ` Stefan Monnier
  2011-12-08 16:48                   ` Dmitry Antipov
  2011-12-09  4:44                 ` Stephen J. Turnbull
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2011-12-08 14:10 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> I don't think you're going to solve the memory use issues by improving
>> vector allocation.
> Initially I considered the possibility to use mmap()ed areas to store Lisp
> data for normal (already dumped) Emacs. Among others, it requires
> 1) suitable tagging scheme and 2) ability to store all Lisp data within
> multiple-of-page-size blocks. That was the main motivation for working on
> block-based vector allocator.

Emacs already allocates all cons, float, string headers,
markers/overlays, interval nodes, and symbols (i.e. everything other
than vectors) within (more or less) multiple-of-page-size blocks.
These are allocated via `malloc' rather than `mmap', but I don't think
our "memory issues" are due to malloc not returning the corresponding
pages to the OS, but rather due to the fact that after a phase of very
high allocation followed by a massive deallocation, many of these blocks
end up mostly (but not fully) empty.

Now, I don't know for a fact that this is the case, it's just a hunch.
Someone could try to change the allocator so those blocks are exactly
1-page in size and get allocated via mmap:
- using mmap will ensure that we factor out malloc from the equation.
- using page-size (presumably 4KB) blocks rather than the current 16KB
  reduces the granularity so there's more probability that a block
  contains no live objects.
Then re-run some problematic case (e.g. the million-lines compile buffer)
to see if this helps.  It really should help, tho I don't hold much hope
that it will make a significant difference (e.g. more than 30%
difference would be significant).
  
>> If you want to keep working on it, fine -- every little bit helps, and
>> you might succeed.  But I don't think it should be a priority for
>> Emacs, and you are going to need to show more gains to make it worth a
>> somewhat more complex code base.
> Sorry, but did you mean "don't show half-done work to a fool"? If yes,
> then I should agree.

It's not a priority for Emacs to improve the CPU and memory performance
in general.  But I do find it important to make Elisp programming
easier, which is why I'm interested in a more complex vector allocation:
defstruct is currently pathetically inefficient.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08 14:10                 ` Stefan Monnier
@ 2011-12-08 16:48                   ` Dmitry Antipov
  2011-12-08 19:58                     ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-08 16:48 UTC (permalink / raw)
  To: emacs-devel

On 12/08/2011 06:10 PM, Stefan Monnier wrote:

> Someone could try to change the allocator so those blocks are exactly
> 1-page in size and get allocated via mmap:

I did this even before I started with block-based vector allocator.

> - using mmap will ensure that we factor out malloc from the equation.
> - using page-size (presumably 4KB) blocks rather than the current 16KB

Current sizes are 1K except string data, which is 8K.

> Then re-run some problematic case (e.g. the million-lines compile buffer)
> to see if this helps.  It really should help, tho I don't hold much hope
> that it will make a significant difference (e.g. more than 30%
> difference would be significant).

My favorite byte-force-recompile benchmark usually consumes ~89M. With my
vector allocator it shrinks to 78M. When mmap() is enabled, it shrinks
to 76M. Not too much.

But for (definitely pathological) compile buffer case, results are
very unexpected (I tried with 100000-lines buffer). Fresh instance
is 18M. With compile buffer, it grows to 142M and shrinks to 141M
after killing *compilation* and garbage-collect. When mmap() is enabled,
it grows to 138M and shrinks back to 21M (twenty one) after kill & GC.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08 16:48                   ` Dmitry Antipov
@ 2011-12-08 19:58                     ` Stefan Monnier
  2011-12-09  7:32                       ` Eli Zaretskii
  2011-12-09  9:04                       ` Dmitry Antipov
  0 siblings, 2 replies; 62+ messages in thread
From: Stefan Monnier @ 2011-12-08 19:58 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> Someone could try to change the allocator so those blocks are exactly
>> 1-page in size and get allocated via mmap:
> I did this even before I started with block-based vector allocator.

How much did it help?

>> - using mmap will ensure that we factor out malloc from the equation.
>> - using page-size (presumably 4KB) blocks rather than the current 16KB
> Current sizes are 1K except string data, which is 8K.

For cons cells and floats, we call malloc for 16KB blocks at a time
(which we then cut into 1024 blocks).  This is because we need the 1024
blocks to be aligned on a 1024 boundary (so when we use malloc instead
of posix_memalign, we waste upto 1023 (well 1016, really) bytes of those
16KB).

> My favorite byte-force-recompile benchmark usually consumes ~89M. With my
> vector allocator it shrinks to 78M.

And you said that using the non-conservative stack scanning resulted in
similar results, so all that extra memory is occupied by mem_nodes (or
fragments that used to hole mem_nodes ;-).

> But for (definitely pathological) compile buffer case, results are
> very unexpected (I tried with 100000-lines buffer).  Fresh instance
> is 18M.  With compile buffer, it grows to 142M and shrinks to 141M
> after killing *compilation* and garbage-collect.  When mmap() is enabled,
> it grows to 138M and shrinks back to 21M (twenty one) after kill & GC.

Hmm.. cool.  So some of the responsability for this test to behave so
poorly might be in malloc rather than in alloc.c.  Good news.
Oh, yes, now I remember:

#ifdef DOUG_LEA_MALLOC
      /* Prevent mmap'ing the chunk.  Lisp data may not be mmap'ed
	 because mapped region contents are not preserved in
	 a dumped Emacs.  */
      mallopt (M_MMAP_MAX, 0);
#endif

So maybe malloc behaves poorly simply because we don't let it
behave better.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08 19:58                     ` Stefan Monnier
@ 2011-12-09  7:32                       ` Eli Zaretskii
  2011-12-09  9:04                       ` Dmitry Antipov
  1 sibling, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2011-12-09  7:32 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: dmantipov, emacs-devel

> From: Stefan Monnier <monnier@IRO.UMontreal.CA>
> Date: Thu, 08 Dec 2011 14:58:04 -0500
> Cc: emacs-devel@gnu.org
> 
> #ifdef DOUG_LEA_MALLOC
>       /* Prevent mmap'ing the chunk.  Lisp data may not be mmap'ed
> 	 because mapped region contents are not preserved in
> 	 a dumped Emacs.  */
>       mallopt (M_MMAP_MAX, 0);
> #endif
> 
> So maybe malloc behaves poorly simply because we don't let it
> behave better.

But only on systems that use DOUG_LEA_MALLOC.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08 19:58                     ` Stefan Monnier
  2011-12-09  7:32                       ` Eli Zaretskii
@ 2011-12-09  9:04                       ` Dmitry Antipov
  2011-12-09 14:05                         ` Stefan Monnier
  1 sibling, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-09  9:04 UTC (permalink / raw)
  To: emacs-devel

On 12/08/2011 11:58 PM, Stefan Monnier wrote:

> How much did it help?

Not too much because it makes no sense without block-based vector
allocation - huge amounts of small vectors and corresponding mem_nodes
allocated from brk()ed heap devastates everything. And it was a bit
slower, most probably because explicit mmmap() implies zeroing
performed by OS.

> And you said that using the non-conservative stack scanning resulted in
> similar results, so all that extra memory is occupied by mem_nodes (or
> fragments that used to hole mem_nodes ;-).

Yes, one mem_node per each vector results in substantial overhead and
fragmentation. _Any_ block-based vector allocator should reduce the number
of mem_nodes, so saving space - but rounding allocator will consume some
of this space back for it's own internal fragmentation. That's why
I proposed the scheme with tightest possible fit instead of 'standard
malloc algorithm'.

Even for non-conservative stack scanning with eliminated mem_nodes overhead,
I'm pretty sure that my proposal works at least not worse, taking both
CPU and memory usage into account; rounding vector allocator will consume
more memory, but may have near to negligible advantage in CPU usage because
it has no splitting/coalescing overhead.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-09  9:04                       ` Dmitry Antipov
@ 2011-12-09 14:05                         ` Stefan Monnier
  2011-12-09 16:15                           ` Dmitry Antipov
  0 siblings, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2011-12-09 14:05 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> How much did it help?
> Not too much because it makes no sense without block-based vector

Your test with the *compilation* seems to indicate otherwise, or am
I missing something?

>> And you said that using the non-conservative stack scanning resulted in
>> similar results, so all that extra memory is occupied by mem_nodes (or
>> fragments that used to hole mem_nodes ;-).
> Yes, one mem_node per each vector results in substantial overhead and
> fragmentation. _Any_ block-based vector allocator should reduce the number
> of mem_nodes, so saving space - but rounding allocator will consume some

Exactly: no need to work hard.

> of this space back for it's own internal fragmentation.  That's why
> I proposed the scheme with tightest possible fit instead of 'standard
> malloc algorithm'.

The apparently tightest fit is not necessarily tighter: what happens if
you use an "order624" to allocate a 600B object, then use the remaining
24B for another object, and then free the 600B object?

I also really want to know the answer to this question (third try):
Why have different classes for 632B and 600B sizes?  You can't fit more
than 6 vectors of either size in a 4KB block, so you might as well
divide your "order600" block into 6 chunks of 680B (plus 16B of padding)
rather than 6 chunks of 600B plus an unused 496B of padding.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-09 14:05                         ` Stefan Monnier
@ 2011-12-09 16:15                           ` Dmitry Antipov
  2011-12-09 21:04                             ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-09 16:15 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2319 bytes --]

On 12/09/2011 06:05 PM, Stefan Monnier wrote:

> Your test with the *compilation* seems to indicate otherwise, or am
> I missing something?

"Not too much" for byte-compile test. Since *compilation* buffer test
allocates mostly an intervals, mmap() helps it regardless of the method
used for vector allocation.

> Exactly: no need to work hard.

Sure, but throwing away the almost-done work makes no sense, especially
when an alternative work is not even started (if I miss something done,
I would be glad to see it).

> The apparently tightest fit is not necessarily tighter: what happens if
> you use an "order624" to allocate a 600B object, then use the remaining
> 24B for another object, and then free the 600B object?

There will be an attempt to coalesce this 600-bytes vector with adjacent
vectors, and resulting vector will be set up on a free list.

> I also really want to know the answer to this question (third try):
> Why have different classes for 632B and 600B sizes?  You can't fit more
> than 6 vectors of either size in a 4KB block, so you might as well
> divide your "order600" block into 6 chunks of 680B (plus 16B of padding)
> rather than 6 chunks of 600B plus an unused 496B of padding.

I don't understand your question. It's _your_ proposal to divide objects
on classes by rounding the size up to nearest magic value ('standard malloc
algorithm'). My proposal assumes that vector block can contains objects
of any size, and I'm using segregated free lists for ..., 600, 608, 616, 624,
632, ... bytes free vectors.  I believe it was well explained why I rejects
both 'exact-fit vector blocks for each possible vector size' and 'good-fit
vector block for each size class' ('standard malloc') algorithms.

To be concrete, I'm attaching the current state of my proposal again.

Did you ever tried to obtain a histogram of vector sizes? Of course, it
highly depends on the benchmark; but I'm pretty sure that 2..6 - Lisp_Objects
vectors are very popular for the wide range of real workloads. That's
another reason why throwing away even 32 bytes is not so reasonable.

I'm scheduling the global benchmark for the night (32 vs. 64-bit
binaries, GCPROs vs.stack marking, block-based vector allocation vs.
current code, in all possible variants, 8 executables in total,
16 runs for each :-)).

Dmitry

[-- Attachment #2: vector_alloc_final.patch --]
[-- Type: text/plain, Size: 13111 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2011-11-20 03:07:02 +0000
+++ src/alloc.c	2011-12-09 10:08:32 +0000
@@ -21,7 +21,7 @@
 #include <stdio.h>
 #include <limits.h>		/* For CHAR_BIT.  */
 #include <setjmp.h>
-
+#include <sys/param.h>		/* For roundup.  */
 #include <signal.h>
 
 #ifdef HAVE_PTHREAD
@@ -2896,9 +2896,12 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
+#ifndef PAGE_SIZE
+#define PAGE_SIZE 4096
+#endif
 
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE (PAGE_SIZE - sizeof (void *))
+#define VECTOR_BLOCK_BYTES (VECTOR_BLOCK_SIZE - 2 * sizeof (void *))
 
 /* Handy constants for vectorlike objects.  */
 enum
@@ -2907,6 +2910,296 @@
     word_size = sizeof (Lisp_Object)
   };
 
+/* Free lists for all possible block-allocated vectors.  */
+
+#define VECTOR_FREE_LISTS (VECTOR_BLOCK_BYTES / word_size)
+
+/* Vector blocks are aligned to page size, so it's easy to
+   get vector block pointer from the vector pointer.  */
+
+#define VECTOR_BLOCK(v) ((struct vector_block *) \
+			 ((unsigned long)(v) & ~(PAGE_SIZE - 1)))
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_SIZE \
+  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | (PAGE_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) \
+  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
+  (v)->header.size = VECTOR_FREE_LIST_SIZE | (nbytes); \
+  (index) = (nbytes) / word_size; \
+  eassert ((index) < VECTOR_FREE_LISTS); \
+  (v)->header.next.vector = vector_free_lists[(index)]; \
+  vector_free_lists[(index)] = (v); } while (0)
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  unsigned char *bump;
+  struct vector_block *next;
+};
+
+/* Current vector block.  */
+
+static struct vector_block *vector_block;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * WORD_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_FREE_LISTS];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  /* Although lisp_align_malloc should be used, the default
+     1K granularity looks too small for vector allocation.  */
+  block = memalign (PAGE_SIZE, VECTOR_BLOCK_SIZE);
+  if (!block)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTORLIKE);
+#endif
+
+  block->bump = block->data;
+  block->next = vector_block;
+  vector_block = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+  vector_block = allocate_vector_block ();
+
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from the vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* Vector with 0 slots for Lisp_Objects is allowed.  */
+  eassert (nbytes >= sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % word_size == 0);
+  eassert (vector_block != NULL);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / word_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / word_size;
+       index < VECTOR_FREE_LISTS; index++)
+    if (vector_free_lists[index])
+      {
+	struct Lisp_Vector *rest;
+
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * word_size - nbytes;
+	eassert (restbytes % word_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Next, try to allocate from current vector block.  */
+  restbytes = vector_block->data + VECTOR_BLOCK_BYTES - vector_block->bump;
+  if (nbytes > restbytes)
+    {
+      /* Current block's free space isn't enough...  */
+      if (restbytes >= sizeof (struct Lisp_Vector))
+	{
+	  /* ...but it's enough to allocate smaller vector,
+	     which should be set on an appropriate free list.  */
+	  eassert (restbytes % word_size == 0);
+	  vector = (struct Lisp_Vector *) vector_block->bump;
+	  vector_block->bump += restbytes;
+	  SETUP_ON_FREE_LIST (vector, restbytes, index);
+	}
+      /* Anyway, need a new vector block.  */
+      block = allocate_vector_block ();
+    }
+  else
+    /* Allocate from current vector block.  */
+    block = vector_block;
+
+  /* Bump pointer allocation from current vector block.  */
+  vector = (struct Lisp_Vector *) block->bump;
+  block->bump += nbytes;
+  vector->header.next.nbytes = nbytes;
+  return vector;
+}
+
+/* Return amount of Lisp_Objects can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_block, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int i;
+
+  total_vector_size = 0;
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+
+  /* Looking through vector blocks.  */
+
+  while (block)
+    {
+      struct Lisp_Vector *
+	free_vectors[VECTOR_BLOCK_BYTES / sizeof (struct Lisp_Vector)];
+      int index, used;
+
+      for (index = 0, used = 0, vector = (struct Lisp_Vector *) block->data;
+	   (unsigned char *) vector < block->bump; vector = next)
+	{
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector), used = 1;
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      EMACS_INT nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_SIZE) == 
+		  VECTOR_FREE_LIST_SIZE)
+		vector->header.next.nbytes =
+		  vector->header.size & (PAGE_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while ((unsigned char *) next < block->bump)
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_SIZE) == 
+		      VECTOR_FREE_LIST_SIZE)
+		    nbytes = next->header.size & (PAGE_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      /* Collect a pointer to resulting vector.  */
+	      eassert (index < (VECTOR_BLOCK_BYTES /
+				sizeof (struct Lisp_Vector)));
+	      free_vectors[index++] = vector;
+	    }
+	}
+
+      if (used)
+	{
+	  /* Setup collected vectors on a free lists.  */
+	  while (--index >= 0)
+	    {
+	      vector = free_vectors[index];
+	      eassert (vector->header.next.nbytes % word_size == 0);
+	      SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, i);
+	    }
+	  bprev = block, block = block->next;
+	}
+      else
+	{
+	  /* Nothing used in this block.  */
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_block = block->next;
+	  bnext = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  free (block);
+	  block = bnext;
+	}
+    }
+
+  /* Sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2929,7 +3222,15 @@
   /* eassert (!handling_signal); */
 
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is needed for 32-bit code to preserve 8-byte alignment.  */
+    p = allocate_vector_from_block (roundup (nbytes, 8));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2939,9 +3240,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4016,7 +4314,36 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = VECTOR_BLOCK (p);
+	  struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	  /* Scan the whole vector block and see whether P points to
+	     the start of some vector which is not on a free list.  */
+	  while ((unsigned char *) vector < block->bump)
+	    {
+	      if ((vector->header.size & VECTOR_FREE_LIST_SIZE)
+		  == VECTOR_FREE_LIST_SIZE)
+		vector = ADVANCE
+		  (vector, (vector->header.size & (PAGE_SIZE - 1)));
+	      else if (vector == p)
+		return 1;
+	      else
+		vector = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6169,33 +6496,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6331,7 +6632,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6344,6 +6644,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2011-12-05 00:15:15 +0000
+++ src/lisp.h	2011-12-09 07:19:16 +0000
@@ -868,12 +868,15 @@
 struct vectorlike_header
   {
     EMACS_INT size;
-
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      EMACS_INT nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-09 16:15                           ` Dmitry Antipov
@ 2011-12-09 21:04                             ` Stefan Monnier
  2011-12-11 13:18                               ` Dmitry Antipov
  2011-12-12  3:07                               ` Dmitry Antipov
  0 siblings, 2 replies; 62+ messages in thread
From: Stefan Monnier @ 2011-12-09 21:04 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> Your test with the *compilation* seems to indicate otherwise, or am
>> I missing something?
> "Not too much" for byte-compile test. Since *compilation* buffer test
> allocates mostly an intervals, mmap() helps it regardless of the method
> used for vector allocation.

Thanks for the clarification.

> I don't understand your question.

It's really me who didn't understand your code.  For some reason
I thought you didn't just have segregated free lists, but also
segregated blocks (I guess the reason was that segregated blocks were
partly designed to reduce problems of fragmentation, and since we were
talking earlier about fragmentation...).

> It's _your_ proposal to divide objects on classes by rounding the size
> up to nearest magic value ('standard malloc algorithm').

Yours is also very standard, indeed (probably something like "best-fit
allocation with segregated free list").

> My proposal assumes that vector block can contains objects
> of any size, and I'm using segregated free lists for ..., 600, 608, 616, 624,
> 632, ... bytes free vectors.

BTW, your code seems to compute the free list index by dividing by
word_size (= 4 on 32bit systems) rather than by 8, so half of the slots
will always be unused, IIUC.

> Did you ever tried to obtain a histogram of vector sizes? Of course, it
> highly depends on the benchmark; but I'm pretty sure that 2..6 - Lisp_Objects
> vectors are very popular for the wide range of real workloads.  That's
> another reason why throwing away even 32 bytes is not so reasonable.

Using segregated blocks will waste some bytes because of rounding and
padding, but you can choose the percentage you're willing to waste.
So you may sometimes waste a 32B area, but it's only once per 4KB block,
so it is of no consequence.

Both segregated and non-segregated blocks have their advantages, both in
terms of CPU and memory use (e.g. the segregated blocks will waste some space
because of rounding, but they save space by removing the need for the
"next" (aka nbytes/buffer/vector) field).

> I'm scheduling the global benchmark for the night (32 vs. 64-bit
> binaries, GCPROs vs.stack marking, block-based vector allocation vs.
> current code, in all possible variants, 8 executables in total,
> 16 runs for each :-)).

Let us how it turns out.

> +#ifndef PAGE_SIZE
> +#define PAGE_SIZE 4096
> +#endif

If it's an arbitrary choice, it should not be called "page_size" but
"vector_block_size" or some such.  Otherwise, we should fetch its value
from the OS.

> +/* Vector blocks are aligned to page size, so it's easy to
> +   get vector block pointer from the vector pointer.  */

Where do you make use of this alignment?

> +  /* Although lisp_align_malloc should be used, the default
> +     1K granularity looks too small for vector allocation.  */

Yes, it would be nice to reuse it, indeed.

>  live_vector_p (struct mem_node *m, void *p)
>  {
> -  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
> +  if (m->type == MEM_TYPE_VECTORLIKE)
> +    {
> +      if (m->end - m->start == VECTOR_BLOCK_BYTES)

This test looks dangerous (we if we allocate a large vector of this
size?).  I guess your code currently can't allocate such a "small large
vector", but if so, we should at least have a comment here.  And we
could trivially remove this assumption by adding a new
MEM_TYPE_LARGE_VECTOR.

> +	  /* Scan the whole vector block and see whether P points to
> +	     the start of some vector which is not on a free list.  */
> +	  while ((unsigned char *) vector < block->bump)
> +	    {
> +	      if ((vector->header.size & VECTOR_FREE_LIST_SIZE)
> +		  == VECTOR_FREE_LIST_SIZE)
> +		vector = ADVANCE
> +		  (vector, (vector->header.size & (PAGE_SIZE - 1)));
> +	      else if (vector == p)
> +		return 1;
> +	      else
> +		vector = ADVANCE (vector, vector->header.next.nbytes);
> +	    }
> +	}

Hmm... I didn't think of it, but this is a disadvantage of
non-segregated blocks: during conservative scanning, you need to scan
the vector block to determine if we really have a live_vector_p whereas
segregated blocks would just divide the offset by the block's
vector size.

For small enough vector blocks, it's probably not a big deal, but for
10-slot vectors on 32bit systems with 4KB blocks, that's about
100 iterations.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-09 21:04                             ` Stefan Monnier
@ 2011-12-11 13:18                               ` Dmitry Antipov
  2011-12-12  3:07                               ` Dmitry Antipov
  1 sibling, 0 replies; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-11 13:18 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 2859 bytes --]

On 12/10/2011 01:04 AM, Stefan Monnier wrote:

> BTW, your code seems to compute the free list index by dividing by
> word_size (= 4 on 32bit systems) rather than by 8, so half of the slots
> will always be unused, IIUC.

You're correct, and this is fixed.

> Let us how it turns out.

It needs to be run again since I introduced some fixes, including
those that suggested by you.

> If it's an arbitrary choice, it should not be called "page_size" but
> "vector_block_size" or some such.  Otherwise, we should fetch its value
> from the OS.

Better name is OK (old one was a trace of mmap()-related experiments :-).

> Where do you make use of this alignment?

Not needed any more.

> This test looks dangerous (we if we allocate a large vector of this
> size?).

It's impossible because VECTOR_BLOCK_BYTES is 'exact-fit' for the largest
possible block-allocated vector. That's why I'm using xmalloc/mem_insert
and not lisp_malloc: VECTOR_BLOCK_SIZE bytes are allocated for the block,
but only block->data..block->data + VECTOR_BLOCK_BYTES range is marked as
vector memory within mem_node tree. So, any MEM_TYPE_VECTORLIKE mem_node
with VECTOR_BLOCK_BYTES size memory range is guaranteed to match the
vector block, and this vector block may contain the only vector of
VECTOR_BLOCK_BYTES bytes size.

> Hmm... I didn't think of it, but this is a disadvantage of
> non-segregated blocks: during conservative scanning, you need to scan
> the vector block to determine if we really have a live_vector_p whereas
> segregated blocks would just divide the offset by the block's
> vector size.

Yes, but this linear scan may be optimized:
  1) scan is required only if examined pointer is less than
     block bump allocation pointer;
  2) scan should not look at addresses beyond examined pointer.

> For small enough vector blocks, it's probably not a big deal, but for
> 10-slot vectors on 32bit systems with 4KB blocks, that's about
> 100 iterations.

This is a worst-case scenario when it's needed to scan up to the end of vector
block, and the block is full of small vectors. Although the most vectors
are small, small vectors usually allocated and died (becomes unreachable)
in groups, and such a group of adjacent died vectors is a subject for coalescing,
which reduces the number of vectors in the block. For the byte-compile
benchmark, this loop does ~12 iterations on the average (for 64-bit code; expect
larger value for 32-bit since average vector size is ~2x smaller).

BTW, what do you think about hooking buffer allocation into common vector
allocation code? I found it controversial (buffers are mid-sized vector-like
objects, so it might be reasonable to generalize buffer allocation with
other vectors; on the other side, buffers needs special treatment for
liveness detection and special processing during sweep), but want to try
anyway.

Dmitry

[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 13305 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2011-11-20 03:07:02 +0000
+++ src/alloc.c	2011-12-11 10:15:11 +0000
@@ -21,7 +21,7 @@
 #include <stdio.h>
 #include <limits.h>		/* For CHAR_BIT.  */
 #include <setjmp.h>
-
+#include <sys/param.h>		/* For roundup.  */
 #include <signal.h>
 
 #ifdef HAVE_PTHREAD
@@ -2896,17 +2896,309 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
-
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE (4096)
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    roundup_size = 8
   };
 
+/* One pointer is malloc overhead, others are BUMP and NEXT fields of struct
+   vector_block.  Rounding helps to maintain alignment constraints.  */
+
+#define VECTOR_BLOCK_BYTES \
+  (VECTOR_BLOCK_SIZE - roundup (3 * sizeof (void *), roundup_size))
+
+/* Maximum amount of vectors allocated from the vector block.  */
+
+#define VECTORS_PER_BLOCK_MAX \
+  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))
+
+/* Free lists for all possible block-allocated vectors.  */
+
+#define VECTOR_FREE_LISTS ((VECTOR_BLOCK_BYTES / roundup_size) + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_SIZE \
+  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
+			(VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) \
+  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
+  (v)->header.size = VECTOR_FREE_LIST_SIZE | (nbytes); \
+  eassert ((nbytes) % roundup_size == 0); \
+  (index) = (nbytes) / roundup_size; \
+  eassert ((index) < VECTOR_FREE_LISTS); \
+  (v)->header.next.vector = vector_free_lists[(index)]; \
+  vector_free_lists[(index)] = (v); } while (0)
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  unsigned char *bump;
+  struct vector_block *next;
+};
+
+/* Current vector block.  */
+
+static struct vector_block *vector_block;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * ROUNDUP_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_FREE_LISTS];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (VECTOR_BLOCK_SIZE);
+  if (!block)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTORLIKE);
+#endif
+
+  block->bump = block->data;
+  block->next = vector_block;
+  vector_block = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+  vector_block = allocate_vector_block ();
+
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from the vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* Vector with 0 slots for Lisp_Objects is allowed.  */
+  eassert (nbytes >= sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % roundup_size == 0);
+  eassert (vector_block != NULL);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / roundup_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
+       index < VECTOR_FREE_LISTS; index++)
+    if (vector_free_lists[index])
+      {
+	struct Lisp_Vector *rest;
+
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Next, try to allocate from current vector block.  */
+  restbytes = vector_block->data + VECTOR_BLOCK_BYTES - vector_block->bump;
+  if (nbytes > restbytes)
+    {
+      /* Current block's free space isn't enough...  */
+      if (restbytes >= sizeof (struct Lisp_Vector))
+	{
+	  /* ...but it's enough to allocate smaller vector,
+	     which should be set on an appropriate free list.  */
+	  eassert (restbytes % roundup_size == 0);
+	  vector = (struct Lisp_Vector *) vector_block->bump;
+	  vector_block->bump += restbytes;
+	  SETUP_ON_FREE_LIST (vector, restbytes, index);
+	}
+      /* Anyway, need a new vector block.  */
+      block = allocate_vector_block ();
+    }
+  else
+    /* Allocate from current vector block.  */
+    block = vector_block;
+
+  /* Bump pointer allocation from current vector block.  */
+  vector = (struct Lisp_Vector *) block->bump;
+  block->bump += nbytes;
+  vector->header.next.nbytes = nbytes;
+  return vector;
+}
+
+/* Return amount of Lisp_Objects can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_block, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int i;
+
+  total_vector_size = 0;
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+
+  /* Looking through vector blocks.  */
+
+  while (block)
+    {
+      struct Lisp_Vector *free_vectors[VECTORS_PER_BLOCK_MAX];
+      int index, used;
+
+      for (index = 0, used = 0, vector = (struct Lisp_Vector *) block->data;
+	   (unsigned char *) vector < block->bump; vector = next)
+	{
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector), used = 1;
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      EMACS_INT nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_SIZE) == 
+		  VECTOR_FREE_LIST_SIZE)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while ((unsigned char *) next < block->bump)
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_SIZE) == 
+		      VECTOR_FREE_LIST_SIZE)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      /* Collect a pointer to resulting vector.  */
+	      eassert (index < VECTORS_PER_BLOCK_MAX);
+	      free_vectors[index++] = vector;
+	    }
+	}
+
+      if (used)
+	{
+	  /* Setup collected vectors on a free lists.  */
+	  while (--index >= 0)
+	    {
+	      vector = free_vectors[index];
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+	      SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, i);
+	    }
+	  bprev = block, block = block->next;
+	}
+      else
+	{
+	  /* Nothing used in this block.  */
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_block = block->next;
+	  bnext = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	  block = bnext;
+	}
+    }
+
+  /* Sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2929,7 +3221,15 @@
   /* eassert (!handling_signal); */
 
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is for 32-bit code to preserve alignment.  */
+    p = allocate_vector_from_block (roundup (nbytes, roundup_size));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2939,9 +3239,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4016,7 +4313,41 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = (struct vector_block *) m->start;
+
+	  if (p < (void *) block->bump)
+	    {
+	      struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	      /* P is in the block's allocation range. Scan the block
+		 up to P and see whether P points to the start of some
+		 vector which is not on a free list.  */
+	      while (vector <= (struct Lisp_Vector *) p)
+		{
+		  if ((vector->header.size & VECTOR_FREE_LIST_SIZE)
+		      == VECTOR_FREE_LIST_SIZE)
+		    vector = ADVANCE (vector, (vector->header.size & 
+					       (VECTOR_BLOCK_SIZE - 1)));
+		  else if (vector == p)
+		    return 1;
+		  else
+		    vector = ADVANCE (vector, vector->header.next.nbytes);
+		}
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6169,33 +6500,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6331,7 +6636,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6344,6 +6648,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2011-12-05 00:15:15 +0000
+++ src/lisp.h	2011-12-09 11:46:15 +0000
@@ -868,12 +868,15 @@
 struct vectorlike_header
   {
     EMACS_INT size;
-
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      EMACS_INT nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-09 21:04                             ` Stefan Monnier
  2011-12-11 13:18                               ` Dmitry Antipov
@ 2011-12-12  3:07                               ` Dmitry Antipov
  2011-12-12 16:24                                 ` Stefan Monnier
  1 sibling, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2011-12-12  3:07 UTC (permalink / raw)
  To: emacs-devel

On 12/10/2011 01:04 AM, Stefan Monnier wrote:

> Let us how it turns out.

Results for the byte-compile benchmark, an average of 16 runs:

        CPU time spent in user mode, seconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        74.07        84.87
default, GCPROs            72.90        81.37
patched, stack mark        71.35        81.57
patched, GCPROs            70.16        82.18

          Peak heap utilization, KBytes
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        41499        73651
default, GCPROs            37918        65648
patched, stack mark        38310        67169
patched, GCPROs            38052        65730

          Total time spent in GC, seconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        23.58        32.32
default, GCPROs            21.94        30.43
patched, stack mark        21.64        29.89
patched, GCPROs            21.13        29.22

         Average time per GC, milliseconds
----< configuration >----< 32bit >----< 64bit >----
default, stack mark        27.62        36.03
default, GCPROs            25.57        33.93
patched, stack mark        25.22        33.34
patched, GCPROs            24.63        32.57

First of all, I was surprised with such a difference between 32- and
64-bit code. Due to this, I'm wondering whether --with-wide-int makes
sense: I expect it to be "too fat, too bloated" for a 32-bit CPU
(but would be glad to mistake).

Next, block vector allocation makes GC a bit faster. I expected it,
and can explain this effect with the better locality of vector-like
objects, which takes a good care of sweeping.

Finally, I'm pretty sure that block vector allocation makes a
very negligible effect for the case of GCPROs. In terms of peak
heap utilization, it's 0.35% worse on 32-bit and just 0.12% worse
on 64-bit. In terms of CPU usage, results are more interesting:
1% worse for 64-bit case, but 3.8% better for 32-bit. The only
explanation I have for this effect is that an arithmetic used in
splitting/coalescing operations creates some pressure on the CPU
in 64-bit mode, but 32-bit version of the same code may be implicitly
executed in parallel by the 64-bit core. Due to this, I don't consider
my 32-bit benchmark as fairly representative - it should be done
on a real 32-bit core and not in 'compatibility mode' on 64-bit one.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-12  3:07                               ` Dmitry Antipov
@ 2011-12-12 16:24                                 ` Stefan Monnier
  0 siblings, 0 replies; 62+ messages in thread
From: Stefan Monnier @ 2011-12-12 16:24 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> Let us how it turns out.
> Results for the byte-compile benchmark, an average of 16 runs:

>        CPU time spent in user mode, seconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        74.07        84.87
> default, GCPROs            72.90        81.37
> patched, stack mark        71.35        81.57
> patched, GCPROs            70.16        82.18

>          Peak heap utilization, KBytes
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        41499        73651
> default, GCPROs            37918        65648
> patched, stack mark        38310        67169
> patched, GCPROs            38052        65730

>          Total time spent in GC, seconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        23.58        32.32
> default, GCPROs            21.94        30.43
> patched, stack mark        21.64        29.89
> patched, GCPROs            21.13        29.22

>         Average time per GC, milliseconds
> ----< configuration >----< 32bit >----< 64bit >----
> default, stack mark        27.62        36.03
> default, GCPROs            25.57        33.93
> patched, stack mark        25.22        33.34
> patched, GCPROs            24.63        32.57

Since most small differences are difficult to separate from noise (and
are not important anyway), the summary from where I stand is that
there's no substantial difference (CPU and memory wise) between your new
code (with or without GCPROs) and the current code with GCPROs, whereas
the current code with conservative stack scanning is a tiny bit slower
and uses a non-negligible amount of extra space in peak usage.
This confirms that the main issue is the mem_nodes.

It would also be interesting to see how your new code performs in terms
of fragmentation (i.e. average memory use for a long running interactive
session), but it's very difficult to measure and I doubt we'd see much
difference (other than the impact of mem_nodes of course).

> on 64-bit. In terms of CPU usage, results are more interesting: 1%
> worse for 64-bit case, but 3.8% better for 32-bit.  The only
> explanation I have for this effect is that an arithmetic used in
> splitting/coalescing operations creates some pressure on the CPU in
> 64-bit mode, but 32-bit version of the same code may be implicitly
> executed in parallel by the 64-bit core.

64bit cores don't implicitly parallelize 32bit code to take advantage of
the 64bit datapath.  So that can't be the explanation.  And register
pressure is worse in the x86 architecture than in the
amd64 architecture.

> Due to this, I don't consider my 32-bit benchmark as fairly
> representative - it should be done on a real 32-bit core and not in
> 'compatibility mode' on 64-bit one.

You misunderstand what is the "compatibility mode" of amd64 processors.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2011-12-08  4:41               ` Dmitry Antipov
  2011-12-08 14:10                 ` Stefan Monnier
@ 2011-12-09  4:44                 ` Stephen J. Turnbull
  1 sibling, 0 replies; 62+ messages in thread
From: Stephen J. Turnbull @ 2011-12-09  4:44 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

Dmitry Antipov writes:

 > Sorry, but did you mean "don't show half-done work to a fool"? If yes,
 > then I should agree.

*plonk*



^ permalink raw reply	[flat|nested] 62+ messages in thread

[parent not found: <jwvaa1yjs21.fsf-monnier+emacs@gnu.org>]

* Re: Proposal: block-based vector allocator
       [not found] ` <jwvaa1yjs21.fsf-monnier+emacs@gnu.org>
@ 2012-05-17  7:58   ` Dmitry Antipov
  2012-05-18 17:40     ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2012-05-17  7:58 UTC (permalink / raw)
  To: emacs-devel; +Cc: Stefan Monnier

[-- Attachment #1: Type: text/plain, Size: 939 bytes --]

On 04/26/2012 10:01 PM, Stefan Monnier wrote:

>> I've tried a few designs of block-based vector allocation, and this is
>> the first one which looks good enough to be proposed here.
>
> I'd like to install something like that for Emacs-24.2, and as the trunk
> is open again for new features, we can do it "any time now".
>
> So, please send your latest code to emacs-devel, so we can review it one
> more time before installation.

OK

> As mentioned earlier, the main point of your patch is to reduce the use
> of mem_nodes, so please keep the patch as simple as possible for now,
> focusing on just reducing the use of mem_nodes.

IIRC from the previous discussions, you're suggesting rounding-up allocator
for the sake of simplicity and better speed; I'm still voting for exact-fit
allocator with splitting/coalescing, for the sake of best possible memory
utilization. It would be helpful to get a broader discussion on this.

Dmitry

[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 13305 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2012-04-23 05:44:49 +0000
+++ src/alloc.c	2012-05-17 07:43:29 +0000
@@ -22,7 +22,7 @@
 #include <stdio.h>
 #include <limits.h>		/* For CHAR_BIT.  */
 #include <setjmp.h>
-
+#include <sys/param.h>		/* For roundup.  */
 #include <signal.h>
 
 #ifdef HAVE_PTHREAD
@@ -2924,17 +2924,309 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
-
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE (4096)
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    roundup_size = 8
   };
 
+/* One pointer is malloc overhead, others are BUMP and NEXT fields of struct
+   vector_block.  Rounding helps to maintain alignment constraints.  */
+
+#define VECTOR_BLOCK_BYTES \
+  (VECTOR_BLOCK_SIZE - roundup (3 * sizeof (void *), roundup_size))
+
+/* Maximum amount of vectors allocated from the vector block.  */
+
+#define VECTORS_PER_BLOCK_MAX \
+  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))
+
+/* Free lists for all possible block-allocated vectors.  */
+
+#define VECTOR_FREE_LISTS ((VECTOR_BLOCK_BYTES / roundup_size) + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_SIZE \
+  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
+			(VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) \
+  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
+  (v)->header.size = VECTOR_FREE_LIST_SIZE | (nbytes); \
+  eassert ((nbytes) % roundup_size == 0); \
+  (index) = (nbytes) / roundup_size; \
+  eassert ((index) < VECTOR_FREE_LISTS); \
+  (v)->header.next.vector = vector_free_lists[(index)]; \
+  vector_free_lists[(index)] = (v); } while (0)
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  unsigned char *bump;
+  struct vector_block *next;
+};
+
+/* Current vector block.  */
+
+static struct vector_block *vector_block;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * ROUNDUP_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_FREE_LISTS];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (VECTOR_BLOCK_SIZE);
+  if (!block)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTORLIKE);
+#endif
+
+  block->bump = block->data;
+  block->next = vector_block;
+  vector_block = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+  vector_block = allocate_vector_block ();
+
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from the vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* Vector with 0 slots for Lisp_Objects is allowed.  */
+  eassert (nbytes >= sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % roundup_size == 0);
+  eassert (vector_block != NULL);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / roundup_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
+       index < VECTOR_FREE_LISTS; index++)
+    if (vector_free_lists[index])
+      {
+	struct Lisp_Vector *rest;
+
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Next, try to allocate from current vector block.  */
+  restbytes = vector_block->data + VECTOR_BLOCK_BYTES - vector_block->bump;
+  if (nbytes > restbytes)
+    {
+      /* Current block's free space isn't enough...  */
+      if (restbytes >= sizeof (struct Lisp_Vector))
+	{
+	  /* ...but it's enough to allocate smaller vector,
+	     which should be set on an appropriate free list.  */
+	  eassert (restbytes % roundup_size == 0);
+	  vector = (struct Lisp_Vector *) vector_block->bump;
+	  vector_block->bump += restbytes;
+	  SETUP_ON_FREE_LIST (vector, restbytes, index);
+	}
+      /* Anyway, need a new vector block.  */
+      block = allocate_vector_block ();
+    }
+  else
+    /* Allocate from current vector block.  */
+    block = vector_block;
+
+  /* Bump pointer allocation from current vector block.  */
+  vector = (struct Lisp_Vector *) block->bump;
+  block->bump += nbytes;
+  vector->header.next.nbytes = nbytes;
+  return vector;
+}
+
+/* Return amount of Lisp_Objects can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_block, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int i;
+
+  total_vector_size = 0;
+  for (i = 0; i < VECTOR_FREE_LISTS; i++)
+    vector_free_lists[i] = NULL;
+
+  /* Looking through vector blocks.  */
+
+  while (block)
+    {
+      struct Lisp_Vector *free_vectors[VECTORS_PER_BLOCK_MAX];
+      int index, used;
+
+      for (index = 0, used = 0, vector = (struct Lisp_Vector *) block->data;
+	   (unsigned char *) vector < block->bump; vector = next)
+	{
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector), used = 1;
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      EMACS_INT nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_SIZE) == 
+		  VECTOR_FREE_LIST_SIZE)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while ((unsigned char *) next < block->bump)
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_SIZE) == 
+		      VECTOR_FREE_LIST_SIZE)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      /* Collect a pointer to resulting vector.  */
+	      eassert (index < VECTORS_PER_BLOCK_MAX);
+	      free_vectors[index++] = vector;
+	    }
+	}
+
+      if (used)
+	{
+	  /* Setup collected vectors on a free lists.  */
+	  while (--index >= 0)
+	    {
+	      vector = free_vectors[index];
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+	      SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, i);
+	    }
+	  bprev = block, block = block->next;
+	}
+      else
+	{
+	  /* Nothing used in this block.  */
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_block = block->next;
+	  bnext = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	  block = bnext;
+	}
+    }
+
+  /* Sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2957,7 +3249,15 @@
   /* eassert (!handling_signal); */
 
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is for 32-bit code to preserve alignment.  */
+    p = allocate_vector_from_block (roundup (nbytes, roundup_size));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2967,9 +3267,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4068,7 +4365,41 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = (struct vector_block *) m->start;
+
+	  if (p < (void *) block->bump)
+	    {
+	      struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	      /* P is in the block's allocation range. Scan the block
+		 up to P and see whether P points to the start of some
+		 vector which is not on a free list.  */
+	      while (vector <= (struct Lisp_Vector *) p)
+		{
+		  if ((vector->header.size & VECTOR_FREE_LIST_SIZE)
+		      == VECTOR_FREE_LIST_SIZE)
+		    vector = ADVANCE (vector, (vector->header.size & 
+					       (VECTOR_BLOCK_SIZE - 1)));
+		  else if (vector == p)
+		    return 1;
+		  else
+		    vector = ADVANCE (vector, vector->header.next.nbytes);
+		}
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6237,33 +6568,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6400,7 +6705,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6413,6 +6717,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-09 17:51:30 +0000
+++ src/lisp.h	2012-05-17 07:43:29 +0000
@@ -899,12 +899,15 @@
 struct vectorlike_header
   {
     EMACS_INT size;
-
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      EMACS_INT nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-17  7:58   ` Dmitry Antipov
@ 2012-05-18 17:40     ` Stefan Monnier
  2012-05-21 12:19       ` Dmitry Antipov
  0 siblings, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-05-18 17:40 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

> IIRC from the previous discussions, you're suggesting rounding-up allocator
> for the sake of simplicity and better speed;

Actually, the main reason is because it has proved over the years to be
a good choice, for example in terms of fragmentation.

> I'm still voting for exact-fit allocator with splitting/coalescing,
> for the sake of best possible memory utilization.

It's not at all clear that the memory utilization will be better in
practice, for example because of fragmentation or because of the need to
keep the "next" field in vectors.

> It would be helpful to get a broader discussion on this.

At this point, I'm OK with using your approach, so I only want to make
sure the code is as simple and clean as possible.

Here are some questions and comments about your code:

> +#define VECTOR_BLOCK_SIZE (4096)

No need to put parens, right?

> +    roundup_size = 8

This deserves a comment explaining the choice of 8 and describing what
this constant is used for.

> +/* One pointer is malloc overhead, others are BUMP and NEXT fields of struct
> +   vector_block.  Rounding helps to maintain alignment constraints.  */
> +#define VECTOR_BLOCK_BYTES \
> +  (VECTOR_BLOCK_SIZE - roundup (3 * sizeof (void *), roundup_size))

Why do we care about malloc overhead?

> +/* Free lists for all possible block-allocated vectors.  */
> +
> +#define VECTOR_FREE_LISTS ((VECTOR_BLOCK_BYTES / roundup_size) + 1)

Some comment here or earlier should make it clear that we have one free
list per vector size.  Maybe just put this declaration together with the
one for "vector_free_lists", after all they go together.

And this macro should be called something like
VECTOR_MAX_FREE_LIST_INDEX; its current names makes it sound like it
returns the free lists (or a pointer to them).

> +/* When the vector is on a free list, vectorlike_header.SIZE is set to
> +   this special value ORed with vector's memory footprint size.  */
> +
> +#define VECTOR_FREE_LIST_SIZE \
> +  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
> +			(VECTOR_BLOCK_SIZE - 1)))

The name doesn't sound right, it's not a real size, so it should be
VECTOR_FREE_LIST_FLAG.

Also, do we really have to put this info in the `size' field?  E.g. for
cons-cells we put Vdead in the `car' slot.  Of course having it in
`size' is not terrible, but `size' is pretty complicated already.

BTW, if needed we could add a special case so there is only 1 vector of
size 0 and (make-vector 0 ?) always returns that same vector (so it
doesn't have to come from a vector_block).

> +/* Current vector block.  */
> +static struct vector_block *vector_block;

Do we actually want a "current vector block"?
Here's what I mean by that: how often are we going to allocate from this
"current vector block" as opposed to allocating from one of the
free lists?
It would be simpler, when we allocate a new vector block, to just carve
out the vector we need from it and put all the rest directly on the
appropriate free list, so there's never such a thing as a "current
vector block".  That also should lets us get rid of the "bump" field.

> +/* Allocate vector from the vector block.  */
                           ^^^
                            a

> +  /* Next, check free lists contains larger vectors.  */
> +  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;

Add a comment explaining that the "+ sizeof (struct Lisp_Vector)"
eliminates the case where the "excess bytes" aren't sufficient to make
them into a valid vector.

> +/* Return amount of Lisp_Objects can be stored in that vector.  */
                                  ^^^
                                 which

> +  while (block)
> +    {
> +      struct Lisp_Vector *free_vectors[VECTORS_PER_BLOCK_MAX];
> +      int index, used;

You should be able to avoid this two-step "put them in free_vectors,
then loop over free_vectors to put them on the free lists": just at the
end of the coalescing loop, if we bump into the end of the vector_block
check if we're also the first vector in the block, and if so free the
whole (otherwise push the vector on the free list).

> +  if (nbytes > VECTOR_BLOCK_BYTES)
> +    {
> +      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
> +      p->header.next.vector = large_vectors;
> +      large_vectors = p;
> +    }
> +  else
> +    /* Rounding is for 32-bit code to preserve alignment.  */
> +    p = allocate_vector_from_block (roundup (nbytes, roundup_size));

Maybe we should only allocate from vector blocks vectors that are
smaller than (say) VECTOR_BLOCK_BYTES/2.  The reason is that allocating
from a vector block incurs an overhead (memory overhead of "roundup (3 *
sizeof (void *), roundup_size)", and CPU overhead of setting up the
block, allocating the vector out of it, scanning the block, ...).
This is shared among all the vectors in the block, so for smallish
vectors it's worth the trouble (since we save other overheads) but if
the vector is large, there won't be many vectors with which to share
that overhead.

> +	      /* P is in the block's allocation range. Scan the block
> +		 up to P and see whether P points to the start of some
> +		 vector which is not on a free list.  */
> +	      while (vector <= (struct Lisp_Vector *) p)
> +		{
> +		  if ((vector->header.size & VECTOR_FREE_LIST_SIZE)
> +		      == VECTOR_FREE_LIST_SIZE)
> +		    vector = ADVANCE (vector, (vector->header.size & 
> +					       (VECTOR_BLOCK_SIZE - 1)));
> +		  else if (vector == p)
> +		    return 1;
> +		  else
> +		    vector = ADVANCE (vector, vector->header.next.nbytes);
> +		}
> +	    }

This deserves a FIXME mentioning that this could be a significant
overhead.  I don't see how we can get rid of this overhead without using
a different allocation technique, but it's always good to make this
choice explicit.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-18 17:40     ` Stefan Monnier
@ 2012-05-21 12:19       ` Dmitry Antipov
  2012-05-21 13:02         ` Andreas Schwab
  2012-05-21 20:12         ` Stefan Monnier
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-05-21 12:19 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 5621 bytes --]

On 05/18/2012 09:40 PM, Stefan Monnier wrote:

> Here are some questions and comments about your code:
>
>> +#define VECTOR_BLOCK_SIZE (4096)
>
> No need to put parens, right?

OK

>> +    roundup_size = 8
>
> This deserves a comment explaining the choice of 8 and describing what
> this constant is used for.

OK

>> +/* One pointer is malloc overhead, others are BUMP and NEXT fields of struct
>> +   vector_block.  Rounding helps to maintain alignment constraints.  */
>> +#define VECTOR_BLOCK_BYTES \
>> +  (VECTOR_BLOCK_SIZE - roundup (3 * sizeof (void *), roundup_size))
>
> Why do we care about malloc overhead?

IIUC, if/when we will have mmap()ed Lisp_Objects someday, this should help
malloc() to do direct mmap()/munmap() of page-size vector blocks.

>> +/* Free lists for all possible block-allocated vectors.  */
>> +
>> +#define VECTOR_FREE_LISTS ((VECTOR_BLOCK_BYTES / roundup_size) + 1)
>
> Some comment here or earlier should make it clear that we have one free
> list per vector size.  Maybe just put this declaration together with the
> one for "vector_free_lists", after all they go together.
>
> And this macro should be called something like
> VECTOR_MAX_FREE_LIST_INDEX; its current names makes it sound like it
> returns the free lists (or a pointer to them).

OK

>> +/* When the vector is on a free list, vectorlike_header.SIZE is set to
>> +   this special value ORed with vector's memory footprint size.  */
>> +
>> +#define VECTOR_FREE_LIST_SIZE \
>> +  (((EMACS_INT) ~0)&  ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
>> +			(VECTOR_BLOCK_SIZE - 1)))
>
> The name doesn't sound right, it's not a real size, so it should be
> VECTOR_FREE_LIST_FLAG.

OK

> Also, do we really have to put this info in the `size' field?  E.g. for
> cons-cells we put Vdead in the `car' slot.  Of course having it in
> `size' is not terrible, but `size' is pretty complicated already.

May be. I'll try to consider some alternatives.

> BTW, if needed we could add a special case so there is only 1 vector of
> size 0 and (make-vector 0 ?) always returns that same vector (so it
> doesn't have to come from a vector_block).

Should we allocate it from pure space?

>> +/* Current vector block.  */
>> +static struct vector_block *vector_block;
>
> Do we actually want a "current vector block"?
> Here's what I mean by that: how often are we going to allocate from this
> "current vector block" as opposed to allocating from one of the
> free lists?
> It would be simpler, when we allocate a new vector block, to just carve
> out the vector we need from it and put all the rest directly on the
> appropriate free list, so there's never such a thing as a "current
> vector block".  That also should lets us get rid of the "bump" field.

Argh, yes.

>> +  /* Next, check free lists contains larger vectors.  */
>> +  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
>
> Add a comment explaining that the "+ sizeof (struct Lisp_Vector)"
> eliminates the case where the "excess bytes" aren't sufficient to make
> them into a valid vector.

OK

>> +  while (block)
>> +    {
>> +      struct Lisp_Vector *free_vectors[VECTORS_PER_BLOCK_MAX];
>> +      int index, used;
>
> You should be able to avoid this two-step "put them in free_vectors,
> then loop over free_vectors to put them on the free lists": just at the
> end of the coalescing loop, if we bump into the end of the vector_block
> check if we're also the first vector in the block, and if so free the
> whole (otherwise push the vector on the free list).

OK

>> +  if (nbytes>  VECTOR_BLOCK_BYTES)
>> +    {
>> +      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
>> +      p->header.next.vector = large_vectors;
>> +      large_vectors = p;
>> +    }
>> +  else
>> +    /* Rounding is for 32-bit code to preserve alignment.  */
>> +    p = allocate_vector_from_block (roundup (nbytes, roundup_size));
>
> Maybe we should only allocate from vector blocks vectors that are
> smaller than (say) VECTOR_BLOCK_BYTES/2.  The reason is that allocating
> from a vector block incurs an overhead (memory overhead of "roundup (3 *
> sizeof (void *), roundup_size)", and CPU overhead of setting up the
> block, allocating the vector out of it, scanning the block, ...).
> This is shared among all the vectors in the block, so for smallish
> vectors it's worth the trouble (since we save other overheads) but if
> the vector is large, there won't be many vectors with which to share
> that overhead.

Hm... didn't checked that, most probably a very negligible difference.

>> +	      /* P is in the block's allocation range. Scan the block
>> +		 up to P and see whether P points to the start of some
>> +		 vector which is not on a free list.  */
>> +	      while (vector<= (struct Lisp_Vector *) p)
>> +		{
>> +		  if ((vector->header.size&  VECTOR_FREE_LIST_SIZE)
>> +		      == VECTOR_FREE_LIST_SIZE)
>> +		    vector = ADVANCE (vector, (vector->header.size&
>> +					       (VECTOR_BLOCK_SIZE - 1)));
>> +		  else if (vector == p)
>> +		    return 1;
>> +		  else
>> +		    vector = ADVANCE (vector, vector->header.next.nbytes);
>> +		}
>> +	    }
>
> This deserves a FIXME mentioning that this could be a significant
> overhead.  I don't see how we can get rid of this overhead without using
> a different allocation technique, but it's always good to make this
> choice explicit.

OK

As for the roundup() check, it looks like GNU/Linux and *BSD systems
both has it in sys/param.h, but (Open)Solaris uses sys/sysmacros.h.
So I'm trying to check for the most common case and then fall back
to simple default.

Dmitry






[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 14315 bytes --]

=== modified file 'configure.in'
--- configure.in	2012-05-21 07:30:23 +0000
+++ configure.in	2012-05-21 10:39:57 +0000
@@ -1773,6 +1773,15 @@
 
 AC_CHECK_LIB(pthreads, cma_open)
 
+dnl Check whether roundup is available.
+if test "$ac_cv_header_sys_param_h"; then
+  AC_MSG_CHECKING([for roundup in sys/param.h])
+  AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <sys/param.h>]], [[roundup (5, 8);]])],
+    [AC_MSG_RESULT([yes])
+     AC_DEFINE(HAVE_ROUNDUP, 1, [Define to 1 if `roundup' is declared in <sys/param.h>.])],
+    [AC_MSG_RESULT([no])])
+fi
+
 ## Note: when using cpp in s/aix4.2.h, this definition depended on
 ## HAVE_LIBPTHREADS.  That was not defined earlier in configure when
 ## the system file was sourced.  Hence the value of LIBS_SYSTEM

=== modified file 'src/alloc.c'
--- src/alloc.c	2012-04-23 05:44:49 +0000
+++ src/alloc.c	2012-05-21 12:10:23 +0000
@@ -25,6 +25,12 @@
 
 #include <signal.h>
 
+#if defined HAVE_ROUNDUP
+#include <sys/param.h>
+#else
+#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
+#endif
+
 #ifdef HAVE_PTHREAD
 #include <pthread.h>
 #endif
@@ -2924,17 +2930,304 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
-
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE 4096
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    /* On a 32-bit system, rounding up vector size (in bytes) up
+       to mult-of-8 helps to maintain mult-of-8 alignment.  */
+    roundup_size = 8
   };
 
+/* One pointer is malloc overhead, other is NEXT field of struct
+   vector_block.  Rounding helps to maintain alignment constraints.  */
+
+#define VECTOR_BLOCK_BYTES \
+  (VECTOR_BLOCK_SIZE - roundup (2 * sizeof (void *), roundup_size))
+
+/* Maximum amount of vectors allocated from the vector block.  */
+
+#define VECTORS_PER_BLOCK_MAX \
+  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))
+
+/* We maintain one free list for each possible block-allocated
+   vector size, and this is now much of the free lists we have.  */
+
+#define VECTOR_MAX_FREE_LIST_INDEX ((VECTOR_BLOCK_BYTES / roundup_size) + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_FLAG \
+  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
+			(VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) \
+  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
+  (v)->header.size = VECTOR_FREE_LIST_FLAG | (nbytes); \
+  eassert ((nbytes) % roundup_size == 0); \
+  (index) = (nbytes) / roundup_size; \
+  eassert ((index) < VECTOR_MAX_FREE_LIST_INDEX); \
+  (v)->header.next.vector = vector_free_lists[(index)]; \
+  vector_free_lists[(index)] = (v); } while (0)
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  struct vector_block *next;
+};
+
+/* Chain of vector blocks.  */
+
+static struct vector_block *vector_blocks;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * ROUNDUP_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* The only vector with 0 slots, allocated from pure space.  */
+
+static struct Lisp_Vector *zero_vector;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (sizeof (struct vector_block));
+  if (!block)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTORLIKE);
+#endif
+
+  block->next = vector_blocks;
+  vector_blocks = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+
+  zero_vector = (struct Lisp_Vector *)
+    pure_alloc (header_size, Lisp_Vectorlike);
+  zero_vector->header.size = 0;
+
+  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from a vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector, *rest;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* No vectors with 0 slots for Lisp_Objects here.  */
+  eassert (nbytes > sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % roundup_size == 0);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / roundup_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  Since we will
+     split the result, we should have remaining space large enough
+     to use for one-slot vector at least.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+    if (vector_free_lists[index])
+      {
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Finally, need a new vector block.  */
+  block = allocate_vector_block ();
+
+  /* New vector will be at the beginning of this block.  */
+  vector = (struct Lisp_Vector *) block->data;
+  vector->header.next.nbytes = nbytes;
+
+  /* The rest of this block is set up on a free list.  */
+  rest = ADVANCE (vector, nbytes);
+  restbytes = VECTOR_BLOCK_BYTES - nbytes;
+  index = restbytes / roundup_size;
+  SETUP_ON_FREE_LIST (rest, restbytes, index);
+
+  return vector;
+ }
+
+/* Return amount of Lisp_Objects which can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_blocks, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int i;
+
+  total_vector_size = 0;
+  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
+    vector_free_lists[i] = NULL;
+
+  /* Looking through vector blocks.  */
+
+  while (block)
+    {
+      struct Lisp_Vector *free_vectors[VECTORS_PER_BLOCK_MAX];
+      int index, free_block_p;
+
+      for (index = 0, vector = (struct Lisp_Vector *) block->data;
+	   (unsigned char *) vector < block->data + VECTOR_BLOCK_BYTES;
+	   vector = next)
+	{
+	  free_block_p = 0;
+
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector);
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      EMACS_INT nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG) == 
+		  VECTOR_FREE_LIST_FLAG)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while ((unsigned char *) next < block->data + VECTOR_BLOCK_BYTES)
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_FLAG) == 
+		      VECTOR_FREE_LIST_FLAG)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      eassert (index < VECTORS_PER_BLOCK_MAX);
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+
+	      if (vector == (struct Lisp_Vector *) block->data &&
+		  (unsigned char *) next >= block->data + VECTOR_BLOCK_BYTES)
+		/* This block should be freed because all of it's
+		   space was coalesced into the only free vector.  */
+		free_block_p = 1;
+	      else
+		SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, i);
+	    }
+	}
+
+      if (free_block_p)
+	{
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_blocks = block->next;
+	  bnext = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	  block = bnext;
+	}
+      else
+	bprev = block, block = block->next;
+    }
+
+  /* Sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2956,8 +3249,20 @@
   /* This gets triggered by code which I haven't bothered to fix.  --Stef  */
   /* eassert (!handling_signal); */
 
+  if (len == 0)
+    return zero_vector;
+
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is to preserve alignment.  */
+    p = allocate_vector_from_block (roundup (nbytes, roundup_size));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2967,9 +3272,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4068,7 +4370,39 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = (struct vector_block *) m->start;
+	  struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	  /* P is in the block's allocation range.  Scan the block
+	     up to P and see whether P points to the start of some
+	     vector which is not on a free list.  FIXME: check whether
+	     some allocation patterns (probably a lot of short vectors)
+	     may cause a substantial overhead of this loop.  */
+	  while (vector <= (struct Lisp_Vector *) p)
+	    {
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+		  == VECTOR_FREE_LIST_FLAG)
+		vector = ADVANCE (vector, (vector->header.size & 
+					   (VECTOR_BLOCK_SIZE - 1)));
+	      else if (vector == p)
+		return 1;
+	      else
+		vector = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6237,33 +6571,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6400,7 +6708,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6413,6 +6720,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-09 17:51:30 +0000
+++ src/lisp.h	2012-05-21 07:51:40 +0000
@@ -899,12 +899,15 @@
 struct vectorlike_header
   {
     EMACS_INT size;
-
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      EMACS_INT nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 12:19       ` Dmitry Antipov
@ 2012-05-21 13:02         ` Andreas Schwab
  2012-05-21 13:48           ` Dmitry Antipov
  2012-05-21 20:12         ` Stefan Monnier
  1 sibling, 1 reply; 62+ messages in thread
From: Andreas Schwab @ 2012-05-21 13:02 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: Stefan Monnier, emacs-devel

Dmitry Antipov <dmantipov@yandex.ru> writes:

> --- src/alloc.c	2012-04-23 05:44:49 +0000
> +++ src/alloc.c	2012-05-21 12:10:23 +0000
> @@ -25,6 +25,12 @@
>  
>  #include <signal.h>
>  
> +#if defined HAVE_ROUNDUP
> +#include <sys/param.h>
> +#else
> +#define roundup(x, y) ((((x) + ((y) - 1)) / (y)) * (y))
> +#endif
> +

It doesn't make much sense to try to find such a trivial but non-std
macro in a system header.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 13:02         ` Andreas Schwab
@ 2012-05-21 13:48           ` Dmitry Antipov
  2012-05-21 15:07             ` Andreas Schwab
  2012-05-22  5:23             ` Ken Raeburn
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-05-21 13:48 UTC (permalink / raw)
  To: Andreas Schwab; +Cc: Stefan Monnier, emacs-devel

On 05/21/2012 05:02 PM, Andreas Schwab wrote:

> It doesn't make much sense to try to find such a trivial but non-std
> macro in a system header.

Hm, glibc provides some not-so-trivial bits (assuming gcc):

# define roundup(x, y)  (__builtin_constant_p (y) && powerof2 (y)             \
                          ? (((x) + (y) - 1) & ~((y) - 1))                     \
                          : ((((x) + ((y) - 1)) / (y)) * (y)))

It seems to be a very negligible optimization, but why not use it if available?

Dmitry



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 13:48           ` Dmitry Antipov
@ 2012-05-21 15:07             ` Andreas Schwab
  2012-05-22  5:23             ` Ken Raeburn
  1 sibling, 0 replies; 62+ messages in thread
From: Andreas Schwab @ 2012-05-21 15:07 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: Stefan Monnier, emacs-devel

Dmitry Antipov <dmantipov@yandex.ru> writes:

> On 05/21/2012 05:02 PM, Andreas Schwab wrote:
>
>> It doesn't make much sense to try to find such a trivial but non-std
>> macro in a system header.
>
> Hm, glibc provides some not-so-trivial bits (assuming gcc):
>
> # define roundup(x, y)  (__builtin_constant_p (y) && powerof2 (y)             \
>                          ? (((x) + (y) - 1) & ~((y) - 1))                     \
>                          : ((((x) + ((y) - 1)) / (y)) * (y)))
>
> It seems to be a very negligible optimization, but why not use it if available?

Since you are always using a constant you can define your roundup macro
accordingly.  But note that the expansion is buggy for some types of
arguments, so one more reason not to use it.

Andreas.

-- 
Andreas Schwab, schwab@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 13:48           ` Dmitry Antipov
  2012-05-21 15:07             ` Andreas Schwab
@ 2012-05-22  5:23             ` Ken Raeburn
  1 sibling, 0 replies; 62+ messages in thread
From: Ken Raeburn @ 2012-05-22  5:23 UTC (permalink / raw)
  To: Emacs Dev

On May 21, 2012, at 09:48, Dmitry Antipov wrote:
> On 05/21/2012 05:02 PM, Andreas Schwab wrote:
> 
>> It doesn't make much sense to try to find such a trivial but non-std
>> macro in a system header.
> 
> Hm, glibc provides some not-so-trivial bits (assuming gcc):
> 
> # define roundup(x, y)  (__builtin_constant_p (y) && powerof2 (y)             \
>                         ? (((x) + (y) - 1) & ~((y) - 1))                     \
>                         : ((((x) + ((y) - 1)) / (y)) * (y)))
> 
> It seems to be a very negligible optimization, but why not use it if available?

Because a good compiler should be able to make that optimization itself if you use the simple, straightforward version?  (You might have to use an unsigned type for x.)  My Mac's compiler does.  I get the impression glibc has a lot of optimizations targeted at older versions of gcc.

You need to supply the fallback version anyways, so just use the version optimized for human readability and maintenance, instead of jumping through hoops to dig through multiple system headers for a trivial arithmetic macro to maybe save a few microseconds on a system with outdated software.

Ken

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 12:19       ` Dmitry Antipov
  2012-05-21 13:02         ` Andreas Schwab
@ 2012-05-21 20:12         ` Stefan Monnier
  2012-05-22  8:24           ` Dmitry Antipov
  2012-05-31 13:44           ` Dmitry Antipov
  1 sibling, 2 replies; 62+ messages in thread
From: Stefan Monnier @ 2012-05-21 20:12 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>>> +#define VECTOR_BLOCK_SIZE (4096)
[...]
>>> +/* One pointer is malloc overhead, others are BUMP and NEXT fields of struct
>>> +   vector_block.  Rounding helps to maintain alignment constraints.  */
>>> +#define VECTOR_BLOCK_BYTES \
>>> +  (VECTOR_BLOCK_SIZE - roundup (3 * sizeof (void *), roundup_size))
>> Why do we care about malloc overhead?
> IIUC, if/when we will have mmap()ed Lisp_Objects someday, this should help
> malloc() to do direct mmap()/munmap() of page-size vector blocks.

There are several conditions here, some of them might never
actually happen.  So I'd rather not worry about it for now.

We can even define VECTOR_BLOCK_BYTES to some arbitrary constant and not
worry about VECTOR_BLOCK_SIZE at all, then.

>> BTW, if needed we could add a special case so there is only 1 vector of
>> size 0 and (make-vector 0 ?) always returns that same vector (so it
>> doesn't have to come from a vector_block).
> Should we allocate it from pure space?

Yes, we could do that.

> As for the roundup() check, it looks like GNU/Linux and *BSD systems
> both has it in sys/param.h, but (Open)Solaris uses sys/sysmacros.h.
> So I'm trying to check for the most common case and then fall back
> to simple default.

Since it's a constant we can choose, we can choose that constant to be
one that makes for efficient code.  So I think we don't need to use some
predefined macro but can code up our own, custom-designed for our
specific needs, without worry about where various systems define
`roundup'.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 20:12         ` Stefan Monnier
@ 2012-05-22  8:24           ` Dmitry Antipov
  2012-05-31 13:44           ` Dmitry Antipov
  1 sibling, 0 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-05-22  8:24 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 870 bytes --]

On 05/22/2012 12:12 AM, Stefan Monnier wrote:

>> IIUC, if/when we will have mmap()ed Lisp_Objects someday, this should help
>> malloc() to do direct mmap()/munmap() of page-size vector blocks.
>
> There are several conditions here, some of them might never
> actually happen.  So I'd rather not worry about it for now.

OK

>> As for the roundup() check, it looks like GNU/Linux and *BSD systems
>> both has it in sys/param.h, but (Open)Solaris uses sys/sysmacros.h.
>> So I'm trying to check for the most common case and then fall back
>> to simple default.
>
> Since it's a constant we can choose, we can choose that constant to be
> one that makes for efficient code.  So I think we don't need to use some
> predefined macro but can code up our own, custom-designed for our
> specific needs, without worry about where various systems define
> `roundup'.

OK

Dmitry

[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 13282 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2012-05-21 15:36:54 +0000
+++ src/alloc.c	2012-05-22 07:34:36 +0000
@@ -2924,17 +2924,305 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
-
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE 4096
+
+/* Round up X to nearest mult-of-Y, assuming Y is a power of 2.  */
+
+#define roundup_powof2(x,y) (((x) + (y) - 1) & ~((y) - 1))
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    /* On a 32-bit system, rounding up vector size (in bytes) up
+       to mult-of-8 helps to maintain mult-of-8 alignment.  */
+    roundup_size = 8
   };
 
+/* Rounding helps to maintain alignment constraints.  */
+
+#define VECTOR_BLOCK_BYTES \
+  (VECTOR_BLOCK_SIZE - roundup_powof2 (sizeof (void *), roundup_size))
+
+/* Maximum amount of vectors allocated from the vector block.  */
+
+#define VECTORS_PER_BLOCK_MAX \
+  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))
+
+/* We maintain one free list for each possible block-allocated
+   vector size, and this is now much of the free lists we have.  */
+
+#define VECTOR_MAX_FREE_LIST_INDEX ((VECTOR_BLOCK_BYTES / roundup_size) + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_FLAG \
+  (((EMACS_INT) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
+			(VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) \
+  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
+  (v)->header.size = VECTOR_FREE_LIST_FLAG | (nbytes); \
+  eassert ((nbytes) % roundup_size == 0); \
+  (index) = (nbytes) / roundup_size; \
+  eassert ((index) < VECTOR_MAX_FREE_LIST_INDEX); \
+  (v)->header.next.vector = vector_free_lists[(index)]; \
+  vector_free_lists[(index)] = (v); } while (0)
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  struct vector_block *next;
+};
+
+/* Chain of vector blocks.  */
+
+static struct vector_block *vector_blocks;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * ROUNDUP_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* The only vector with 0 slots, allocated from pure space.  */
+
+static struct Lisp_Vector *zero_vector;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (sizeof (struct vector_block));
+  if (!block)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTORLIKE);
+#endif
+
+  block->next = vector_blocks;
+  vector_blocks = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+
+  zero_vector = (struct Lisp_Vector *)
+    pure_alloc (header_size, Lisp_Vectorlike);
+  zero_vector->header.size = 0;
+
+  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from a vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector, *rest;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* No vectors with 0 slots for Lisp_Objects here.  */
+  eassert (nbytes > sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % roundup_size == 0);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / roundup_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  Since we will
+     split the result, we should have remaining space large enough
+     to use for one-slot vector at least.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+    if (vector_free_lists[index])
+      {
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Finally, need a new vector block.  */
+  block = allocate_vector_block ();
+
+  /* New vector will be at the beginning of this block.  */
+  vector = (struct Lisp_Vector *) block->data;
+  vector->header.next.nbytes = nbytes;
+
+  /* The rest of this block is set up on a free list.  */
+  rest = ADVANCE (vector, nbytes);
+  restbytes = VECTOR_BLOCK_BYTES - nbytes;
+  index = restbytes / roundup_size;
+  SETUP_ON_FREE_LIST (rest, restbytes, index);
+
+  return vector;
+ }
+
+/* Return amount of Lisp_Objects which can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_blocks, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int i;
+
+  total_vector_size = 0;
+  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
+    vector_free_lists[i] = NULL;
+
+  /* Looking through vector blocks.  */
+
+  while (block)
+    {
+      int free_this_block;
+
+      for (vector = (struct Lisp_Vector *) block->data;
+	   (unsigned char *) vector < block->data + VECTOR_BLOCK_BYTES;
+	   vector = next)
+	{
+	  free_this_block = 0;
+
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector);
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      EMACS_INT nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG) == 
+		  VECTOR_FREE_LIST_FLAG)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while ((unsigned char *) next < block->data + VECTOR_BLOCK_BYTES)
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_FLAG) == 
+		      VECTOR_FREE_LIST_FLAG)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+
+	      if (vector == (struct Lisp_Vector *) block->data &&
+		  (unsigned char *) next >= block->data + VECTOR_BLOCK_BYTES)
+		/* This block should be freed because all of it's
+		   space was coalesced into the only free vector.  */
+		free_this_block = 1;
+	      else
+		SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, i);
+	    }
+	}
+
+      if (free_this_block)
+	{
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_blocks = block->next;
+	  bnext = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	  block = bnext;
+	}
+      else
+	bprev = block, block = block->next;
+    }
+
+  /* Sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2956,8 +3244,20 @@
   /* This gets triggered by code which I haven't bothered to fix.  --Stef  */
   /* eassert (!handling_signal); */
 
+  if (len == 0)
+    return zero_vector;
+
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is to preserve alignment.  */
+    p = allocate_vector_from_block (roundup_powof2 (nbytes, roundup_size));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2967,9 +3267,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4068,7 +4365,39 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = (struct vector_block *) m->start;
+	  struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	  /* P is in the block's allocation range.  Scan the block
+	     up to P and see whether P points to the start of some
+	     vector which is not on a free list.  FIXME: check whether
+	     some allocation patterns (probably a lot of short vectors)
+	     may cause a substantial overhead of this loop.  */
+	  while (vector <= (struct Lisp_Vector *) p)
+	    {
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+		  == VECTOR_FREE_LIST_FLAG)
+		vector = ADVANCE (vector, (vector->header.size & 
+					   (VECTOR_BLOCK_SIZE - 1)));
+	      else if (vector == p)
+		return 1;
+	      else
+		vector = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6237,33 +6566,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6400,7 +6703,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6413,6 +6715,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-21 15:36:54 +0000
+++ src/lisp.h	2012-05-22 07:13:18 +0000
@@ -899,12 +899,15 @@
 struct vectorlike_header
   {
     EMACS_INT size;
-
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      EMACS_INT nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-21 20:12         ` Stefan Monnier
  2012-05-22  8:24           ` Dmitry Antipov
@ 2012-05-31 13:44           ` Dmitry Antipov
  2012-05-31 15:43             ` Paul Eggert
  2012-05-31 21:16             ` Stefan Monnier
  1 sibling, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-05-31 13:44 UTC (permalink / raw)
  To: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 163 bytes --]

Awaiting for more comments on this. If there will be no serious
objections, I'll write ChangeLog entries and do other makeups
for the final consideration.

Dmitry

[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 13823 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2012-05-30 07:59:44 +0000
+++ src/alloc.c	2012-05-31 13:20:29 +0000
@@ -2926,17 +2926,314 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
-
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE 4096
+
+/* Round up X to nearest mult-of-Y, assuming Y is a power of 2.  */
+
+#define roundup_powof2(x,y) (((x) + (y) - 1) & ~((y) - 1))
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    /* On a 32-bit system, rounding up vector size (in bytes) up
+       to mult-of-8 helps to maintain mult-of-8 alignment.  */
+    roundup_size = 8
   };
 
+/* Rounding helps to maintain alignment constraints.  */
+
+#define VECTOR_BLOCK_BYTES \
+  (VECTOR_BLOCK_SIZE - roundup_powof2 (sizeof (void *), roundup_size))
+
+/* Maximum amount of vectors allocated from the vector block.  */
+
+#define VECTORS_PER_BLOCK_MAX \
+  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))
+
+/* We maintain one free list for each possible block-allocated
+   vector size, and this is now much of the free lists we have.  */
+
+#define VECTOR_MAX_FREE_LIST_INDEX ((VECTOR_BLOCK_BYTES / roundup_size) + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_FLAG \
+  (((ptrdiff_t) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
+			(VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) \
+  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
+  (v)->header.size = VECTOR_FREE_LIST_FLAG | (nbytes); \
+  eassert ((nbytes) % roundup_size == 0); \
+  (index) = (nbytes) / roundup_size; \
+  eassert ((index) < VECTOR_MAX_FREE_LIST_INDEX); \
+  (v)->header.next.vector = vector_free_lists[(index)]; \
+  vector_free_lists[(index)] = (v); } while (0)
+
+struct vector_block
+{
+  unsigned char data[VECTOR_BLOCK_BYTES];
+  struct vector_block *next;
+};
+
+/* Chain of vector blocks.  */
+
+static struct vector_block *vector_blocks;
+
+/* Vector free lists, where NTH item points to a chain
+   of free vectors of NTH * ROUNDUP_SIZE bytes.  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* The only vector with 0 slots, allocated from pure space.  */
+
+static struct Lisp_Vector *zero_vector;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (sizeof (struct vector_block));
+  if (!block)
+    memory_full (VECTOR_BLOCK_SIZE);
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTORLIKE);
+#endif
+
+  block->next = vector_blocks;
+  vector_blocks = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  int i;
+
+  large_vectors = NULL;
+
+  zero_vector = (struct Lisp_Vector *)
+    pure_alloc (header_size, Lisp_Vectorlike);
+  zero_vector->header.size = 0;
+
+  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
+    vector_free_lists[i] = NULL;
+}
+
+/* Allocate vector from a vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector, *rest;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  /* No vectors with 0 slots for Lisp_Objects here.  */
+  eassert (nbytes > sizeof (struct vectorlike_header) &&
+	   nbytes <= VECTOR_BLOCK_BYTES);
+  eassert (nbytes % roundup_size == 0);
+
+  /* First, try to allocate from a free list
+     contains vectors of the requested size.  */
+  index = nbytes / roundup_size;
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists contains larger vectors.  Since we will
+     split the result, we should have remaining space large enough
+     to use for one-slot vector at least.  */
+  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+    if (vector_free_lists[index])
+      {
+	/* This vector is larger than it was requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excessive bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Finally, need a new vector block.  */
+  block = allocate_vector_block ();
+
+  /* New vector will be at the beginning of this block.  */
+  vector = (struct Lisp_Vector *) block->data;
+  vector->header.next.nbytes = nbytes;
+
+  /* If the rest of space from this block is large enough
+     for 1-slot vector at least, set up it on a free list.  */
+  restbytes = VECTOR_BLOCK_BYTES - nbytes;
+  if (restbytes >= sizeof (struct Lisp_Vector))
+    {
+      eassert (restbytes % roundup_size == 0);
+      rest = ADVANCE (vector, nbytes);
+      index = restbytes / roundup_size;
+      SETUP_ON_FREE_LIST (rest, restbytes, index);
+    }
+  return vector;
+ }
+
+/* Return amount of Lisp_Objects which can be stored in that vector.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
+  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)
+
+/* Nonzero if VECTOR pointer is valid pointer inside BLOCK.  */
+
+#define VECTOR_IN_BLOCK(vector,block) \
+  (unsigned char *) (vector) <= (block)->data + \
+    VECTOR_BLOCK_BYTES - sizeof (struct Lisp_Vector)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_blocks, *bprev = NULL, *bnext;
+  struct Lisp_Vector *vector, *prev, *next;
+  int i;
+
+  total_vector_size = 0;
+  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
+    vector_free_lists[i] = NULL;
+
+  /* Looking through vector blocks.  */
+
+  while (block)
+    {
+      int free_this_block;
+
+      for (vector = (struct Lisp_Vector *) block->data;
+	   VECTOR_IN_BLOCK (vector, block); vector = next)
+	{
+	  free_this_block = 0;
+
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector);
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      ptrdiff_t nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG) == 
+		  VECTOR_FREE_LIST_FLAG)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while (VECTOR_IN_BLOCK (next, block))
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_FLAG) == 
+		      VECTOR_FREE_LIST_FLAG)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+
+	      if (vector == (struct Lisp_Vector *) block->data &&
+		  (unsigned char *) next >= block->data + VECTOR_BLOCK_BYTES)
+		/* This block should be freed because all of it's
+		   space was coalesced into the only free vector.  */
+		free_this_block = 1;
+	      else
+		SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, i);
+	    }
+	}
+
+      if (free_this_block)
+	{
+	  if (bprev)
+	    bprev->next = block->next;
+	  else
+	    vector_blocks = block->next;
+	  bnext = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	  block = bnext;
+	}
+      else
+	bprev = block, block = block->next;
+    }
+
+  /* Sweep large vectors.  */
+
+  vector = large_vectors, prev = NULL;
+
+  while (vector)
+    if (VECTOR_MARKED_P (vector))
+      {
+	VECTOR_UNMARK (vector);
+	total_vector_size += VECTOR_SIZE (vector);
+	prev = vector, vector = vector->header.next.vector;
+      }
+    else
+      {
+	if (prev)
+	  prev->header.next = vector->header.next;
+	else
+	  large_vectors = vector->header.next.vector;
+	next = vector->header.next.vector;
+	lisp_free (vector);
+	vector = next;
+      }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2958,8 +3255,20 @@
   /* This gets triggered by code which I haven't bothered to fix.  --Stef  */
   /* eassert (!handling_signal); */
 
+  if (len == 0)
+    return zero_vector;
+
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+
+  if (nbytes > VECTOR_BLOCK_BYTES)
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
+  else
+    /* Rounding is to preserve alignment.  */
+    p = allocate_vector_from_block (roundup_powof2 (nbytes, roundup_size));
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2969,9 +3278,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4070,7 +4376,40 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTORLIKE)
+    {
+      if (m->end - m->start == VECTOR_BLOCK_BYTES)
+	{
+	  /* This memory node corresponds to a vector block.  */
+	  struct vector_block *block = (struct vector_block *) m->start;
+	  struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+	  /* P is in the block's allocation range.  Scan the block
+	     up to P and see whether P points to the start of some
+	     vector which is not on a free list.  FIXME: check whether
+	     some allocation patterns (probably a lot of short vectors)
+	     may cause a substantial overhead of this loop.  */
+	  while (VECTOR_IN_BLOCK (vector, block) &&
+		 vector <= (struct Lisp_Vector *) p)
+	    {
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+		  == VECTOR_FREE_LIST_FLAG)
+		vector = ADVANCE (vector, (vector->header.size & 
+					   (VECTOR_BLOCK_SIZE - 1)));
+	      else if (vector == p)
+		return 1;
+	      else
+		vector = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	}
+      else
+	{
+	  if (p == m->start)
+	    /* This memory node corresponds to a large vector.  */
+	    return 1;
+	}
+    }
+  return 0;
 }
 
 
@@ -6239,33 +6578,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6402,7 +6715,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6415,6 +6727,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-30 19:23:37 +0000
+++ src/lisp.h	2012-05-31 10:08:37 +0000
@@ -916,11 +916,15 @@
   {
     ptrdiff_t size;
 
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
-       Lisp_Vector alias, so for convenience it is a union instead of a
-       pointer: this way, one can write P->next.vector instead of ((struct
-       Lisp_Vector *) P->next).  */
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
+        Lisp_Vector alias, so for convenience it is a union instead of a
+        pointer: this way, one can write P->next.vector instead of ((struct
+        Lisp_Vector *) P->next).  */
     union {
+      ptrdiff_t nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-31 13:44           ` Dmitry Antipov
@ 2012-05-31 15:43             ` Paul Eggert
  2012-06-01  5:15               ` Dmitry Antipov
  2012-05-31 21:16             ` Stefan Monnier
  1 sibling, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2012-05-31 15:43 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

On 05/31/2012 06:44 AM, Dmitry Antipov wrote:
> Awaiting for more comments on this.

Thanks for looking into this.  Here's a brief low-level
review.  I have only read the patch; I haven't tried to
build or run it, or to think through the high-level
stuff.

Have you done some performance analysis on
typical 64- and 32-bit hosts?  Has this been published
somewhere?  Should be in the ChangeLog entry or whatnot.

> +#define VECTORS_PER_BLOCK_MAX \
> +  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))

This macro is never used.  And surely the value is wrong, since
empty vectors are never put into blocks; it would be
(VECTOR_BLOCK_BYTES / sizeof (struct Lisp_Vector)).

Check to see whether there are other macros not used.
Configure with --enable-gcc-warnings; you may find other stuff.

> +/* Round up X to nearest mult-of-Y, assuming Y is a power of 2.  */
> +
> +#define roundup_powof2(x,y) (((x) + (y) - 1) & ~((y) - 1))

Need parentheses around "(y) - 1", in case x+y overflows but
x+(y-1) does not.

> +    /* On a 32-bit system, rounding up vector size (in bytes) up
> +       to mult-of-8 helps to maintain mult-of-8 alignment.  */
> +    roundup_size = 8

This alignment is needed only if USE_LSB_TAG, right?  If so, the
roundup should occur only on such hosts.

> +#define VECTOR_BLOCK_BYTES \
> +  (VECTOR_BLOCK_SIZE - roundup_powof2 (sizeof (void *), roundup_size))

A simpler and clearer way to state this might be (VECTOR_BLOCK_SIZE -
sizeof (void *) & (roundup_size - 1)), or perhaps round_down
(VECTOR_BLOCK_SIZE - sizeof (void *)) if you'd rather encapsulate that
into a macro.  Here I'm assuming roundup_size is 1 if USE_LSB_TAG is
not defined.

The above suggests that "roundup_size" should be renamed to
"vector_alignment" or something like that, since it can be used
to round down as well as up.

> +#define VECTOR_MAX_FREE_LIST_INDEX ((VECTOR_BLOCK_BYTES / roundup_size) + 1)

Surely this should be ((VECTOR_BLOCK_BYTES + (roundup_size - 1)) / roundup_size).

> +/* When the vector is on a free list, vectorlike_header.SIZE is set to
> +   this special value ORed with vector's memory footprint size.  */
> +
> +#define VECTOR_FREE_LIST_FLAG \
> +  (((ptrdiff_t) ~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG | \
> +			(VECTOR_BLOCK_SIZE - 1)))

The "(ptrdiff_t) ~0) &" is a no-op and should be removed.

> +#define ADVANCE(v,nbytes) \
> +  (struct Lisp_Vector *) ((unsigned char *) (v) + (nbytes))

No need for the "unsigned char" here, or elsewhere in the code.
Since the code doesn't char whether signed or unsigned char is used
just use "char".

> +
> +/* Common shortcut to setup vector on a free list.  */
> +
> +#define SETUP_ON_FREE_LIST(v,nbytes,index) do { \
> +  (v)->header.size = VECTOR_FREE_LIST_FLAG | (nbytes); \
> +  eassert ((nbytes) % roundup_size == 0); \
> +  (index) = (nbytes) / roundup_size; \
> +  eassert ((index) < VECTOR_MAX_FREE_LIST_INDEX); \
> +  (v)->header.next.vector = vector_free_lists[(index)]; \
> +  vector_free_lists[(index)] = (v); } while (0)

Space after comma.  Reindent the do { ... }, e.g., like CHECK_RANGED_INTEGER.
Since roundup_size is a power of 2, it looks like the code should be
using its log base 2, so that the division can be turned into a shift.
No need for parentheses in "[(index)]".

> +  large_vectors = NULL;

Can't this be removed?  large_vectors must be NULL here.

> +  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
> +    vector_free_lists[i] = NULL;

Likewise.

> +  eassert (nbytes > sizeof (struct vectorlike_header) &&
> +	   nbytes <= VECTOR_BLOCK_BYTES);

Put && at start of line, and use "<" rather than ">" so that
it looks like "A < N && N <= B".

Be consistent about putting binary operators at the start of the next
line.

> +  /* First, try to allocate from a free list
> +     contains vectors of the requested size.  */

"contains"?  I can't parse that.  Maybe you meant "containing"?

> +  /* Next, check free lists contains larger vectors.  Since we will
> +     split the result, we should have remaining space large enough
> +     to use for one-slot vector at least.  */

Likewise.

> +  for (index = (nbytes + sizeof (struct Lisp_Vector)) / roundup_size;
> +       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
> +    if (vector_free_lists[index])
> +      {
> +	/* This vector is larger than it was requested.  */

Remove "it was".

> +	/* Excessive bytes are used for the smaller vector,
> +	   which should be set on an appropriate free list.  */

"Excessive" -> "Excess"

> +#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ? \
> +  (PSEUDOVECTOR_SIZE_MASK & (v)->header.size) : (v)->header.size)

Put "?" at start of next line, and use proper indentation.
No need for parentheses around if-part.

> +#define VECTOR_IN_BLOCK(vector,block) \
> +  (unsigned char *) (vector) <= (block)->data + \
> +    VECTOR_BLOCK_BYTES - sizeof (struct Lisp_Vector)

Put "+" at start of next line.  Parenthesize the entire definiens.

> +  for (i = 0; i < VECTOR_MAX_FREE_LIST_INDEX; i++)
> +    vector_free_lists[i] = NULL;

Use memset.

> +	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG) == 
> +		  VECTOR_FREE_LIST_FLAG)

"==" at start of next line (here, and in other places).

> +	  if (bprev)
> +	    bprev->next = block->next;
> +	  else
> +	    vector_blocks = block->next;

Change bprev's type to be of struct vector_block **, and have it
point to the address of the pointer to be stepped on, rather than
to the block containing the 'next' field.  I.e., bprev is initially
&vector_blocks, and thereafter it's &block->next.  Then the above code
can be simplified to:

      *bprev = block->next;

Use the same trick in the two or three other places that there is a
'prev' pointer.

> +  if (nbytes > VECTOR_BLOCK_BYTES)
> +    {
> +      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
> +      p->header.next.vector = large_vectors;
> +      large_vectors = p;
> +    }
> +  else
> +    /* Rounding is to preserve alignment.  */
> +    p = allocate_vector_from_block (roundup_powof2 (nbytes, roundup_size));

Reverse the condition and swap the then-and-else.

> +    /* When the vector is allocated from a vector block, NBYTES is used
> +       if the vector is not on a free list, and VECTOR is used otherwise.
> +       For large vector-like objects, BUFFER or VECTOR is used as a pointer
> +       to the next vector-like object.  It is generally a buffer or a 
> +        Lisp_Vector alias, so for convenience it is a union instead of a
> +        pointer: this way, one can write P->next.vector instead of ((struct
> +        Lisp_Vector *) P->next).  */

Indent consistently.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-31 15:43             ` Paul Eggert
@ 2012-06-01  5:15               ` Dmitry Antipov
  2012-06-01  5:44                 ` Paul Eggert
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-01  5:15 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

On 05/31/2012 07:43 PM, Paul Eggert wrote:

> Have you done some performance analysis on typical 64- and 32-bit hosts?

Sure.

> Has this been published somewhere?  Should be in the ChangeLog entry or whatnot.

I'll post it here, just after making them readable and comparable.

>> +    /* On a 32-bit system, rounding up vector size (in bytes) up
>> +       to mult-of-8 helps to maintain mult-of-8 alignment.  */
>> +    roundup_size = 8
>
> This alignment is needed only if USE_LSB_TAG, right?  If so, the
> roundup should occur only on such hosts.

It looks like you miss the core idea behind this code.

The rounding is not just to force alignment. For the block-allocated vectors,
this code rounds each possible vector size (in bytes) up to nearest
mult-of-roundup_size bytes, and then uses (size / roundup_size) as an index
into vector_free_lists. I believe this is the most natural way for both 32
and 64-bit systems with USE_LSB_TAG since it guarantees expected alignment
by design. On non-USE_LSB_TAG system, roundup_size is orthogonal to pointer
values (roundup_size can't be less than sizeof (Lisp_Object), of course).
For example, 32-bit system (non-wide-int, i.e. sizeof (Lisp_Object) == 4)
without USE_LSB_TAG is expected to work with roundup_size = 4, but I can't
test this and have no ideas why it's worth trying at all.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01  5:15               ` Dmitry Antipov
@ 2012-06-01  5:44                 ` Paul Eggert
  2012-06-01  9:06                   ` Dmitry Antipov
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2012-06-01  5:44 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

On 05/31/2012 10:15 PM, Dmitry Antipov wrote:
> On non-USE_LSB_TAG system, roundup_size is orthogonal to pointer
> values (roundup_size can't be less than sizeof (Lisp_Object), of course).

Ah, OK, then we can make roundup_size be the equivalent of
COMMON_MULTIPLE (sizeof (Lisp_Object), defined USE_LSB_TAG ? 8 : 1).

> For example, 32-bit system (non-wide-int, i.e. sizeof (Lisp_Object) == 4)
> without USE_LSB_TAG is expected to work with roundup_size = 4, but I can't
> test this and have no ideas why it's worth trying at all.

It would waste less memory due to internal fragmentation, which
would be a win.  I don't see the downside -- perhaps you could
explain?



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01  5:44                 ` Paul Eggert
@ 2012-06-01  9:06                   ` Dmitry Antipov
  2012-06-01 17:36                     ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-01  9:06 UTC (permalink / raw)
  To: Paul Eggert; +Cc: emacs-devel

On 06/01/2012 09:44 AM, Paul Eggert wrote:

>> On non-USE_LSB_TAG system, roundup_size is orthogonal to pointer
>> values (roundup_size can't be less than sizeof (Lisp_Object), of course).
>
> Ah, OK, then we can make roundup_size be the equivalent of
> COMMON_MULTIPLE (sizeof (Lisp_Object), defined USE_LSB_TAG ? 8 : 1).
>
>> For example, 32-bit system (non-wide-int, i.e. sizeof (Lisp_Object) == 4)
>> without USE_LSB_TAG is expected to work with roundup_size = 4, but I can't
>> test this and have no ideas why it's worth trying at all.
>
> It would waste less memory due to internal fragmentation, which
> would be a win.  I don't see the downside -- perhaps you could
> explain?

I expect very negligible win here. But I'll try to reconfigure with
--enable-use-lisp-union-type and see what happens.

BTW, is this configuration usable? When I'm trying to configure with
this option, I still get USE_LSB_TAG - but the comment in lisp.h says
they're incompatible. I suppose we shouldn't enforce USE_LSB_TAG
if USE_LISP_UNION_TYPE is defined, i.e.:

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-30 19:23:37 +0000
+++ src/lisp.h	2012-06-01 08:58:16 +0000
@@ -201,9 +201,10 @@
  # endif
  #endif

-/* Let's USE_LSB_TAG on systems where we know malloc returns mult-of-8.  */
+/* If USE_LISP_UNION_TYPE is not explicitly enabled, let's USE_LSB_TAG
+   on systems where we know malloc returns mult-of-8.  */
  #if (defined GNU_MALLOC || defined DOUG_LEA_MALLOC || defined __GLIBC__ \
-     || defined DARWIN_OS || defined __sun)
+     || defined DARWIN_OS || defined __sun) && !defined USE_LISP_UNION_TYPE
  /* We also need to be able to specify mult-of-8 alignment on static vars.  */
  # if defined DECL_ALIGN
  /* On hosts where pointers-as-ints do not exceed VAL_MAX,

Dmitry



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01  9:06                   ` Dmitry Antipov
@ 2012-06-01 17:36                     ` Stefan Monnier
  2012-06-02  0:32                       ` Paul Eggert
  0 siblings, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-06-01 17:36 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: Paul Eggert, emacs-devel

> I expect very negligible win here. But I'll try to reconfigure with
> --enable-use-lisp-union-type and see what happens.
> BTW, is this configuration usable?

I've been using it daily for several years, yes.

> When I'm trying to configure with this option, I still get USE_LSB_TAG

Actually, with lisp-union-type the C code doesn't really care or know
whether the tags are on LSB or MSB anyway, so I wouldn't bother paying
attention to this situation.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01 17:36                     ` Stefan Monnier
@ 2012-06-02  0:32                       ` Paul Eggert
  2012-06-02  7:41                         ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2012-06-02  0:32 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: Eli Zaretskii, Dmitry Antipov, emacs-devel

On 06/01/2012 10:36 AM, Stefan Monnier wrote:
> with lisp-union-type the C code doesn't really care or know
> whether the tags are on LSB or MSB anyway

True, but USE_LSB_TAG doesn't mean "the internal
representation of Emacs objects uses low-order tag bits".
It means "when tagging, ignore the low order
bits of pointers because it's safe to assume they're zero".
This is not quite the same thing (and perhaps suggests that
USE_LSB_TAG is a poorly-chosen name...).

To put it another way, lisp.h has code in the
USE_LISP_UNION_TYPE case that behaves differently
depending on whether USE_LSB_TAG is set.  Also, the
comment that Dmitry noted is out of date: USE_LSB_TAG does
work when combined with USE_LISP_UNION_TYPE.

This whole area is in a bit of a mess.  It's not just
that comment.  While looking into this and cleaning it up
I also found the following:

 * The union data structure depends on WORDS_BIGENDIAN
   for no particularly good reason nowadays, and this just
   confuses things.

 * The w32heap code is confused about this, and thinks that
   USE_LISP_UNION_TYPE invalidates USE_LSB_TAG, which maybe
   used to be true but is no longer true.

 * There's no way for the builder to force a build
   without USE_LSB_TAG (Dmitry's point).

 * There are a few annoying warnings if you compile unions
   with gcc -Wall.

You can find the patch for all this at
<http://bugs.gnu.org/11604>.  I'll CC: Eli
since this affects w32 code.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-02  0:32                       ` Paul Eggert
@ 2012-06-02  7:41                         ` Eli Zaretskii
  2012-06-03  6:49                           ` Paul Eggert
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2012-06-02  7:41 UTC (permalink / raw)
  To: Paul Eggert; +Cc: dmantipov, monnier, emacs-devel

> Date: Fri, 01 Jun 2012 17:32:54 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: Dmitry Antipov <dmantipov@yandex.ru>, emacs-devel@gnu.org, 
>  Eli Zaretskii <eliz@gnu.org>
> 
> This whole area is in a bit of a mess.  It's not just
> that comment.  While looking into this and cleaning it up
> I also found the following:
> 
>  * The union data structure depends on WORDS_BIGENDIAN
>    for no particularly good reason nowadays, and this just
>    confuses things.
> 
>  * The w32heap code is confused about this, and thinks that
>    USE_LISP_UNION_TYPE invalidates USE_LSB_TAG, which maybe
>    used to be true but is no longer true.
> 
>  * There's no way for the builder to force a build
>    without USE_LSB_TAG (Dmitry's point).
> 
>  * There are a few annoying warnings if you compile unions
>    with gcc -Wall.
> 
> You can find the patch for all this at
> <http://bugs.gnu.org/11604>.  I'll CC: Eli
> since this affects w32 code.

Unfortunately, that patch does nothing IMO to reduce "the mess".  To
reduce the mess, we should add commentary that would replace wading
through twisty macros all alike with reading clear descriptions in
plain English.  By contrast, the patch doesn't add any documentation,
it actually _removes_ some of the comments.

In this situation, saying that w32heap.c is "confused" is really being
facetious, to say the least.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-02  7:41                         ` Eli Zaretskii
@ 2012-06-03  6:49                           ` Paul Eggert
  2012-06-03 14:26                             ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Paul Eggert @ 2012-06-03  6:49 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: dmantipov, monnier, emacs-devel

On 06/02/2012 12:41 AM, Eli Zaretskii wrote:
> To reduce the mess, we should add commentary that would replace wading
> through twisty macros

That would also help, and I can work on doing something along those
lines, but it's a bigger change and right now I'm just trying to help
out with the immediate issues raised by the block-based vector
allocator.  One thing at a time.

> the patch doesn't add any documentation,
> it actually _removes_ some of the comments.

I will revise the patch so that it doesn't contain the comment-removal
that you objected to.

> In this situation, saying that w32heap.c is "confused" is really being
> facetious, to say the least.

I would rather not have the w32heap.c issue slow things down, so I
will revise the patch to leave w32heap.c alone.  Any problems that
w32heap.c has will be independent of the revised patch.

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-03  6:49                           ` Paul Eggert
@ 2012-06-03 14:26                             ` Eli Zaretskii
  0 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2012-06-03 14:26 UTC (permalink / raw)
  To: Paul Eggert; +Cc: dmantipov, monnier, emacs-devel

> Date: Sat, 02 Jun 2012 23:49:33 -0700
> From: Paul Eggert <eggert@cs.ucla.edu>
> CC: monnier@iro.umontreal.ca, dmantipov@yandex.ru, emacs-devel@gnu.org
> 
> > In this situation, saying that w32heap.c is "confused" is really being
> > facetious, to say the least.
> 
> I would rather not have the w32heap.c issue slow things down, so I
> will revise the patch to leave w32heap.c alone.  Any problems that
> w32heap.c has will be independent of the revised patch.

When/if the issue is sufficiently well documented in lisp.h, perhaps
even I will understand what's wrong about w32heap.c.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-31 13:44           ` Dmitry Antipov
  2012-05-31 15:43             ` Paul Eggert
@ 2012-05-31 21:16             ` Stefan Monnier
  2012-06-01  7:34               ` Dmitry Antipov
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-05-31 21:16 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

> +#define VECTOR_BLOCK_SIZE 4096

This needs a comment explaining the choice of constant.  E.g. it should
say that the code makes no particular assumption about this constant, so
it does not have to be a power of 2.

> +    /* On a 32-bit system, rounding up vector size (in bytes) up
> +       to mult-of-8 helps to maintain mult-of-8 alignment.  */
> +    roundup_size = 8

IIUC what the comment should say is something more like "Used to ensure
alignment (e.g. LSB_TAG requires this to be a multiple of 8), as well as
reduce the size of the vector_free_lists array".

> +#define VECTORS_PER_BLOCK_MAX \
> +  (VECTOR_BLOCK_BYTES / sizeof (struct vectorlike_header))

Unused.

> +/* We maintain one free list for each possible block-allocated
> +   vector size, and this is now much of the free lists we have.  */

Should "now much of the" be replace by "the number of"?

> +struct vector_block
> +{
> +  unsigned char data[VECTOR_BLOCK_BYTES];
> +  struct vector_block *next;
> +};

Wouldn't it be easier to define VECTOR_BLOCK_BYTES to be an arbitrary
constant, and then VECTOR_BLOCK_SIZE to be "sizeof (struct vector_block)"?

> -  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
> +  if (m->type == MEM_TYPE_VECTORLIKE)
> +    {
> +      if (m->end - m->start == VECTOR_BLOCK_BYTES)

Please just use a new MEM_TYPE, so we don't need to worry about the
possibility of having a non-vector-block which just happens to have the
same size as a vector-block.

Other than that (and Paul's comments), this looks good, thank you very
much,

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-05-31 21:16             ` Stefan Monnier
@ 2012-06-01  7:34               ` Dmitry Antipov
  2012-06-01 17:40                 ` Stefan Monnier
  2012-06-01 17:43                 ` Stefan Monnier
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-01  7:34 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On 06/01/2012 01:16 AM, Stefan Monnier wrote:

>> -  return (p == m->start&&  m->type == MEM_TYPE_VECTORLIKE);
>> +  if (m->type == MEM_TYPE_VECTORLIKE)
>> +    {
>> +      if (m->end - m->start == VECTOR_BLOCK_BYTES)
>
> Please just use a new MEM_TYPE, so we don't need to worry about the
> possibility of having a non-vector-block which just happens to have the
> same size as a vector-block.

I don't understand this. Any block-allocated vector P0 has the size less than
or equal to VECTOR_BLOCK_BYTES. So, mem_find (P0) will always return mem_node
M0 where M0->end - M0->start == VECTOR_BLOCK_BYTES (IOW, such a mem_node always
corresponds to a vector block). If P1's size is greater than VECTOR_BLOCK_BYTES,
including the case when it's equal to sizeof (vector_block), mem_find (P1) will
always return it's own corresponding node M1 where P1 is equal to M1->start and
M1->end - M1->start is never equal to VECTOR_BLOCK_BYTES.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01  7:34               ` Dmitry Antipov
@ 2012-06-01 17:40                 ` Stefan Monnier
  2012-06-01 17:43                 ` Stefan Monnier
  1 sibling, 0 replies; 62+ messages in thread
From: Stefan Monnier @ 2012-06-01 17:40 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>>> -  return (p == m->start&&  m->type == MEM_TYPE_VECTORLIKE);
>>> +  if (m->type == MEM_TYPE_VECTORLIKE)
>>> +    {
>>> +      if (m->end - m->start == VECTOR_BLOCK_BYTES)
>> 
>> Please just use a new MEM_TYPE, so we don't need to worry about the
>> possibility of having a non-vector-block which just happens to have the
>> same size as a vector-block.

> I don't understand this.  Any block-allocated vector P0 has the size
> less than or equal to VECTOR_BLOCK_BYTES.  So, mem_find (P0) will
> always return mem_node M0 where M0->end - M0->start ==
> VECTOR_BLOCK_BYTES (IOW, such a mem_node always corresponds to
> a vector block).  If P1's size is greater than VECTOR_BLOCK_BYTES,
> including the case when it's equal to sizeof (vector_block), mem_find
> (P1) will always return it's own corresponding node M1 where P1 is
> equal to M1->start and M1-> end - M1->start is never equal to
> VECTOR_BLOCK_BYTES.

I'm not saying the code is wrong.  I'm saying that its correctness
should be made more explicit by using a different MEM_TYPE for
vector blocks.  This way the correctness argument is trivial rather than
relying on some reasoning about which kind of vector allocation
is used in which circumstance.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01  7:34               ` Dmitry Antipov
  2012-06-01 17:40                 ` Stefan Monnier
@ 2012-06-01 17:43                 ` Stefan Monnier
  2012-06-06  7:02                   ` Dmitry Antipov
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-06-01 17:43 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

> I don't understand this.  Any block-allocated vector P0 has the size
> less than or equal to VECTOR_BLOCK_BYTES.  So, mem_find (P0) will
> always return mem_node M0 where M0->end - M0->start ==
> VECTOR_BLOCK_BYTES (IOW, such a mem_node always corresponds to
> a vector block).  If P1's size is greater than VECTOR_BLOCK_BYTES,
> including the case when it's equal to sizeof (vector_block), mem_find
> (P1) will always return it's own corresponding node M1 where P1 is
> equal to M1->start and M1->end - M1->start is never equal to
> VECTOR_BLOCK_BYTES.

BTW, if we change the vector allocation so it only uses vector-blocks
for vectors smaller than "vector-block-size / 2", as I think we should,
then I think your reasoning breaks down.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-01 17:43                 ` Stefan Monnier
@ 2012-06-06  7:02                   ` Dmitry Antipov
  2012-06-06 13:13                     ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-06  7:02 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On 06/01/2012 09:43 PM, Stefan Monnier wrote:

> BTW, if we change the vector allocation so it only uses vector-blocks
> for vectors smaller than "vector-block-size / 2", as I think we should,

Why? Whatever this limit's value, it's possible to construct an allocation
pattern which will be a worst-case for this allocator, and I don't see
why one specially designed worst-case is more probabilistic than another.

[... from another e-mail...]

 > I'm not saying the code is wrong.  I'm saying that its correctness
 > should be made more explicit by using a different MEM_TYPE for
 > vector blocks.  This way the correctness argument is trivial rather than
 > relying on some reasoning about which kind of vector allocation
 > is used in which circumstance.

Yet another mem_type duplicates live_vector_p, so complicates stack
marking code and makes it slower; IMHO it's not worth trying to make the
code larger and slower just for making correctness obvious for the reader.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-06  7:02                   ` Dmitry Antipov
@ 2012-06-06 13:13                     ` Stefan Monnier
  2012-06-06 14:58                       ` Dmitry Antipov
  0 siblings, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-06-06 13:13 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> BTW, if we change the vector allocation so it only uses vector-blocks
>> for vectors smaller than "vector-block-size / 2", as I think we should,
> Why?

I explained this earlier: using a vector-block for largish vectors is
not efficient (because the overhead of the vector-block is not shared
among enough vectors).
E.g. for a vector of size VECTOR_BLOCK_BYTES, using the vector-block
code is a complete waste .

> Whatever this limit's value, it's possible to construct an allocation
> pattern which will be a worst-case for this allocator, and I don't see
> why one specially designed worst-case is more probabilistic than another.

For the case of a vector of size VECTOR_BLOCK_BYTES, allocating in
a vector block will always be a bad idea, no matter the scenario.
Now obviously, this is not true of all vector sizes, and I don't know
where's the threshold where we should switch from one to the other, but
vector-block-size / 2 sounds like a safe choice (even if the threshold
is not quite there, there's probably not much to gain from being
closer).

>> I'm not saying the code is wrong.  I'm saying that its correctness
>> should be made more explicit by using a different MEM_TYPE for
>> vector blocks.  This way the correctness argument is trivial rather than
>> relying on some reasoning about which kind of vector allocation
>> is used in which circumstance.
> Yet another mem_type duplicates live_vector_p,

I don't see why that should be the case.

        Stefan

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-06 13:13                     ` Stefan Monnier
@ 2012-06-06 14:58                       ` Dmitry Antipov
  2012-06-06 19:18                         ` Stefan Monnier
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-06 14:58 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

On 06/06/2012 05:13 PM, Stefan Monnier wrote:

> I explained this earlier: using a vector-block for largish vectors is
> not efficient (because the overhead of the vector-block is not shared
> among enough vectors).
> E.g. for a vector of size VECTOR_BLOCK_BYTES, using the vector-block
> code is a complete waste

...of just one pointer, so 8/4088, or 0.2% in terms of space for this rare
case; an overhead of having one more mem_node is 6x larger. As for the
speed, the difference is harder to predict: for the same amount of vectors,
more blocks adds more overhead of per-block allocation and sweeping; on the
other side, less blocks adds more mem_nodes and thus more overhead of all
mem_node tree operations.

BTW, I suppose the whole thing should be under #if GC_MARK_STACK.

> For the case of a vector of size VECTOR_BLOCK_BYTES, allocating in
> a vector block will always be a bad idea, no matter the scenario.

Allocating a lot of VECTOR_BLOCK_BYTES / 2 + sizeof (Lisp_Object)
vectors (and negligible amount of others) will waste ~50% of space in
blocks; if the block allocation limit is VECTOR_BLOCK_BYTES / 2,
allocating a lot of VECTOR_BLOCK_BYTES / 4 + sizeof (Lisp_Object)
vectors will waste ~25% of space in blocks, etc. I believe this is the
most important problem with current design. So, per-block allocation
limit should be an answer to two questions: 1) how often we expect to
get the worst case and 2) how much we allow to waste for such a case.

Dmitry

^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-06 14:58                       ` Dmitry Antipov
@ 2012-06-06 19:18                         ` Stefan Monnier
  2012-06-07 10:03                           ` Dmitry Antipov
  0 siblings, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-06-06 19:18 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> I explained this earlier: using a vector-block for largish vectors is
>> not efficient (because the overhead of the vector-block is not shared
>> among enough vectors).
>> E.g. for a vector of size VECTOR_BLOCK_BYTES, using the vector-block
>> code is a complete waste

> ...of just one pointer, so 8/4088, or 0.2% in terms of space for this rare
> case; an overhead of having one more mem_node is 6x larger.

No, no, no: since this VECTOR_BLOCK_BYTES will fill the whole
vector-block it will incur the full cost of a mem_mode anyway when
allocating the vector-block.

>> For the case of a vector of size VECTOR_BLOCK_BYTES, allocating in
>> a vector block will always be a bad idea, no matter the scenario.
> Allocating a lot of VECTOR_BLOCK_BYTES / 2 + sizeof (Lisp_Object)
> vectors (and negligible amount of others) will waste ~50% of space in
> blocks;

I'm not too worried about this, actually, what I'm after is different:

The cost (CPU or memory, it doesn't matter much here, and we ignore
fragmentation) of allocating a vector is something like:
and CPU costs alike, ignores fragmentation):
- outside of a vector-block: malloc + mem_node
- inside a vector-block: vector-block + frac * (malloc + mem_node + a-bit-more)
  where "frac" is a fraction that's more or less vector-size /
  VECTOR_BLOCK_BYTES.
So for small vectors, "frac" is small and since we expect the
vector-block overhead to be significantly smaller than malloc+mem_node
we win.  But past a certain value of "frac", we're better off allocating
outside of a vector-block.  As I said, I don't know exactly where that
threshold is, but using VECTOR_BLOCK_BYTES / 2 should be good enough.

And it also happens to help with the problem you mention, reducing the
worst-case 50% (which really means doubling the memory use) to
a worst-case 25% (which only means an extra 33% of memory use, which
I find a lot more acceptable) lost in fragmentation.


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-06 19:18                         ` Stefan Monnier
@ 2012-06-07 10:03                           ` Dmitry Antipov
  2012-06-07 14:07                             ` Stefan Monnier
  2012-06-08  6:38                             ` Paul Eggert
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-07 10:03 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 940 bytes --]

On 06/06/2012 11:18 PM, Stefan Monnier wrote:

> The cost (CPU or memory, it doesn't matter much here, and we ignore
> fragmentation) of allocating a vector is something like:
> and CPU costs alike, ignores fragmentation):
> - outside of a vector-block: malloc + mem_node
> - inside a vector-block: vector-block + frac * (malloc + mem_node + a-bit-more)
>    where "frac" is a fraction that's more or less vector-size /
>    VECTOR_BLOCK_BYTES.
> So for small vectors, "frac" is small and since we expect the
> vector-block overhead to be significantly smaller than malloc+mem_node
> we win.  But past a certain value of "frac", we're better off allocating
> outside of a vector-block.  As I said, I don't know exactly where that
> threshold is, but using VECTOR_BLOCK_BYTES / 2 should be good enough.

OK. This is a bit more complicated since we need some tricks to
manage free space larger than (approx.) VECTOR_BLOCK_BYTES / 2.

Dmitry


[-- Attachment #2: vector_alloc.patch --]
[-- Type: text/plain, Size: 16011 bytes --]

=== modified file 'src/alloc.c'
--- src/alloc.c	2012-06-02 08:52:27 +0000
+++ src/alloc.c	2012-06-07 09:30:04 +0000
@@ -304,7 +304,9 @@
      process, hash_table, frame, terminal, and window, but we never made
      use of the distinction, so it only caused source-code complexity
      and runtime slowdown.  Minor but pointless.  */
-  MEM_TYPE_VECTORLIKE
+  MEM_TYPE_VECTORLIKE,
+  /* Special type to denote vector blocks.  */
+  MEM_TYPE_VECTOR_BLOCK
 };
 
 static void *lisp_malloc (size_t, enum mem_type);
@@ -494,6 +496,11 @@
   xsignal (Qnil, Vmemory_signal_data);
 }
 
+/* A common multiple of the positive integers A and B.  Ideally this
+   would be the least common multiple, but there's no way to do that
+   as a constant expression in C, so do the best that we can easily do.  */
+#define COMMON_MULTIPLE(a, b) \
+  ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b))
 
 #ifndef XMALLOC_OVERRUN_CHECK
 #define XMALLOC_OVERRUN_CHECK_OVERHEAD 0
@@ -525,12 +532,8 @@
       char c;						\
     },							\
     c)
+
 #ifdef USE_LSB_TAG
-/* A common multiple of the positive integers A and B.  Ideally this
-   would be the least common multiple, but there's no way to do that
-   as a constant expression in C, so do the best that we can easily do.  */
-# define COMMON_MULTIPLE(a, b) \
-    ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b))
 # define XMALLOC_HEADER_ALIGNMENT \
     COMMON_MULTIPLE (1 << GCTYPEBITS, XMALLOC_BASE_ALIGNMENT)
 #else
@@ -2928,17 +2931,334 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
+/* This value is balanced well enough to avoid too much internal overhead
+   for the most common cases; it's not required to be a power of two, but
+   it's expected to be a mult-of-ROUNDUP_SIZE (see below).  */
 
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE 4096
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    roundup_size = COMMON_MULTIPLE (sizeof (Lisp_Object),
+#ifdef USE_LSB_TAG
+    8 /* Helps to maintain alignment constraints.  */
+#else
+    1 /* If alignment doesn't matter, should round up
+	 to sizeof (Lisp_Object) at least.  */
+#endif
+    )
   };
 
+/* Round up X to nearest mult-of-ROUNDUP_SIZE,
+   assuming ROUNDUP_SIZE is a power of 2.  */
+
+#define vroundup(x) (((x) + (roundup_size - 1)) & ~(roundup_size - 1))
+
+/* Rounding helps to maintain alignment constraints if USE_LSB_TAG.  */
+
+#define VECTOR_BLOCK_BYTES (VECTOR_BLOCK_SIZE - vroundup (sizeof (void *)))
+
+/* Size of the smallest vector allocated from block.  */
+
+#define VBLOCK_BYTES_MIN (vroundup (sizeof (struct Lisp_Vector)))
+
+/* Size of the largest vector allocated from block.  */
+
+#define VBLOCK_BYTES_MAX \
+  (vroundup ((VECTOR_BLOCK_BYTES / 2) - sizeof (Lisp_Object)))
+
+/* We maintain one free list for each possible block-allocated
+   vector size, and this is the number of free lists we have.  */
+
+#define VECTOR_MAX_FREE_LIST_INDEX				\
+  ((VBLOCK_BYTES_MAX - VBLOCK_BYTES_MIN) / roundup_size + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_FLAG				\
+  ((~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG		\
+	    | (VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) (struct Lisp_Vector *) ((char *) (v) + (nbytes))
+
+/* Common shortcut to calculate NBYTES-vector index in VECTOR_FREE_LISTS.  */
+
+#define VINDEX(nbytes) (((nbytes) - VBLOCK_BYTES_MIN) / roundup_size)
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index)			\
+  do {								\
+    (v)->header.size = VECTOR_FREE_LIST_FLAG | (nbytes);	\
+    eassert ((nbytes) % roundup_size == 0);			\
+    (index) = VINDEX (nbytes);					\
+    eassert ((index) < VECTOR_MAX_FREE_LIST_INDEX);		\
+    (v)->header.next.vector = vector_free_lists[index];		\
+    vector_free_lists[index] = (v);				\
+  } while (0)
+
+struct vector_block
+{
+  char data[VECTOR_BLOCK_BYTES];
+  struct vector_block *next;
+};
+
+/* Chain of vector blocks.  */
+
+static struct vector_block *vector_blocks;
+
+/* Vector free lists, where NTH item points to a chain of free
+   vectors of the same NBYTES size, so NTH == VINDEX (NBYTES).  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* The only vector with 0 slots, allocated from pure space.  */
+
+static struct Lisp_Vector *zero_vector;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (sizeof (struct vector_block));
+  if (!block)
+    memory_full (sizeof (struct vector_block));
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTOR_BLOCK);
+#endif
+
+  block->next = vector_blocks;
+  vector_blocks = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  zero_vector = (struct Lisp_Vector *)
+    pure_alloc (header_size, Lisp_Vectorlike);
+  zero_vector->header.size = 0;
+}
+
+/* If VECTOR is too large to join a free list at once,
+   this function splits it to smaller fragments. Next,
+   all fragments are set up to appropriate free lists.  */
+
+static inline void
+setup_free_space (struct Lisp_Vector *vector)
+{
+  ptrdiff_t index, usedbytes, restbytes;
+
+  for (restbytes = vector->header.next.nbytes;
+       restbytes >= VBLOCK_BYTES_MIN; restbytes -= usedbytes)
+    {
+      eassert (restbytes % roundup_size == 0);
+      usedbytes = min (restbytes, VBLOCK_BYTES_MAX);
+
+      /* Do not create too small fragments.  */
+      if (restbytes - usedbytes > 0)
+	while (restbytes - usedbytes < VBLOCK_BYTES_MIN)
+	  usedbytes -= roundup_size;
+
+      SETUP_ON_FREE_LIST (vector, usedbytes, index);
+      vector = ADVANCE (vector, usedbytes);
+    }
+
+  /* Make sure all space is used.  */
+  eassert (restbytes == 0);
+}
+
+/* Allocate vector from a vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector, *rest;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  eassert (VBLOCK_BYTES_MIN <= nbytes && nbytes <= VBLOCK_BYTES_MAX);
+  eassert (nbytes % roundup_size == 0);
+
+  /* First, try to allocate from a free list
+     containing vectors of the requested size.  */
+  index = VINDEX (nbytes);
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists containing larger vectors.  Since
+     we will split the result, we should have remaining space
+     large enough to use for one-slot vector at least.  */
+  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+    if (vector_free_lists[index])
+      {
+	/* This vector is larger than requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excess bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size + VBLOCK_BYTES_MIN - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Finally, need a new vector block.  */
+  block = allocate_vector_block ();
+
+  /* New vector will be at the beginning of this block.  */
+  vector = (struct Lisp_Vector *) block->data;
+  vector->header.next.nbytes = nbytes;
+
+  /* The rest space from this block is free.  */
+  rest = ADVANCE (vector, nbytes);
+  rest->header.next.nbytes = VECTOR_BLOCK_BYTES - nbytes;
+  setup_free_space (rest);
+
+  return vector;
+ }
+
+/* Return how many Lisp_Objects can be stored in V.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ?		\
+			(PSEUDOVECTOR_SIZE_MASK & (v)->header.size) :	\
+			(v)->header.size)
+
+/* Nonzero if VECTOR pointer is valid pointer inside BLOCK.  */
+
+#define VECTOR_IN_BLOCK(vector,block)			\
+  ((char *) (vector) <= (block)->data			\
+   + VECTOR_BLOCK_BYTES - VBLOCK_BYTES_MIN)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_blocks, **bprev = &vector_blocks;
+  struct Lisp_Vector *vector, *next, **vprev = &large_vectors;
+
+  total_vector_size = 0;
+  memset (vector_free_lists, 0, sizeof (vector_free_lists));
+
+  /* Looking through vector blocks.  */
+
+  for (block = vector_blocks; block; block = *bprev)
+    {
+      int free_this_block = 0;
+
+      for (vector = (struct Lisp_Vector *) block->data;
+	   VECTOR_IN_BLOCK (vector, block); vector = next)
+	{
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector);
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      ptrdiff_t nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+		  == VECTOR_FREE_LIST_FLAG)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while (VECTOR_IN_BLOCK (next, block))
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_FLAG)
+		      == VECTOR_FREE_LIST_FLAG)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+
+	      /* Make sure resulting vector is valid.  */
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+
+	      if (vector == (struct Lisp_Vector *) block->data
+		  && !VECTOR_IN_BLOCK (next, block))
+		/* This block should be freed because all of it's
+		   space was coalesced into the only free vector.  */
+		free_this_block = 1;
+	      else
+		setup_free_space (vector);
+	    }
+	}
+
+      if (free_this_block)
+	{
+	  *bprev = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	}
+      else
+	bprev = &block->next;
+    }
+
+  /* Sweep large vectors.  */
+
+  for (vector = large_vectors; vector; vector = *vprev)
+    {
+      if (VECTOR_MARKED_P (vector))
+	{
+	  VECTOR_UNMARK (vector);
+	  total_vector_size += VECTOR_SIZE (vector);
+	  vprev = &vector->header.next.vector;
+	}
+      else
+	{
+	  *vprev = vector->header.next.vector;
+	  lisp_free (vector);
+	}
+    }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2960,8 +3280,19 @@
   /* This gets triggered by code which I haven't bothered to fix.  --Stef  */
   /* eassert (!handling_signal); */
 
+  if (len == 0)
+    return zero_vector;
+
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+
+  if (nbytes <= VBLOCK_BYTES_MAX)
+    p = allocate_vector_from_block (vroundup (nbytes));
+  else
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2971,9 +3302,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4072,7 +4400,35 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTOR_BLOCK)
+    {
+      /* This memory node corresponds to a vector block.  */
+      struct vector_block *block = (struct vector_block *) m->start;
+      struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+      /* P is in the block's allocation range.  Scan the block
+	 up to P and see whether P points to the start of some
+	 vector which is not on a free list.  FIXME: check whether
+	 some allocation patterns (probably a lot of short vectors)
+	 may cause a substantial overhead of this loop.  */
+      while (VECTOR_IN_BLOCK (vector, block)
+	     && vector <= (struct Lisp_Vector *) p)
+	{
+	  if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+	      == VECTOR_FREE_LIST_FLAG)
+	    vector = ADVANCE (vector, (vector->header.size
+				       & (VECTOR_BLOCK_SIZE - 1)));
+	  else if (vector == p)
+	    return 1;
+	  else
+	    vector = ADVANCE (vector, vector->header.next.nbytes);
+	}
+    }
+  else if (m->type == MEM_TYPE_VECTORLIKE && p == m->start)
+    /* This memory node corresponds to a vector
+       allocated outside of a block.  */
+    return 1;
+  return 0;
 }
 
 
@@ -4272,6 +4628,7 @@
 	  break;
 
 	case MEM_TYPE_VECTORLIKE:
+	case MEM_TYPE_VECTOR_BLOCK:
 	  if (live_vector_p (m, p))
 	    {
 	      Lisp_Object tem;
@@ -4705,6 +5062,7 @@
       return live_float_p (m, p);
 
     case MEM_TYPE_VECTORLIKE:
+    case MEM_TYPE_VECTOR_BLOCK:
       return live_vector_p (m, p);
 
     default:
@@ -6241,33 +6599,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6404,7 +6736,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6417,6 +6748,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-30 19:23:37 +0000
+++ src/lisp.h	2012-06-07 05:19:08 +0000
@@ -916,11 +916,15 @@
   {
     ptrdiff_t size;
 
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      ptrdiff_t nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-07 10:03                           ` Dmitry Antipov
@ 2012-06-07 14:07                             ` Stefan Monnier
  2012-06-08  5:50                               ` Dmitry Antipov
  2012-06-08  6:38                             ` Paul Eggert
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-06-07 14:07 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> The cost (CPU or memory, it doesn't matter much here, and we ignore
>> fragmentation) of allocating a vector is something like:
>> and CPU costs alike, ignores fragmentation):
>> - outside of a vector-block: malloc + mem_node
>> - inside a vector-block: vector-block + frac * (malloc + mem_node + a-bit-more)
>> where "frac" is a fraction that's more or less vector-size /
>> VECTOR_BLOCK_BYTES.
>> So for small vectors, "frac" is small and since we expect the
>> vector-block overhead to be significantly smaller than malloc+mem_node
>> we win.  But past a certain value of "frac", we're better off allocating
>> outside of a vector-block.  As I said, I don't know exactly where that
>> threshold is, but using VECTOR_BLOCK_BYTES / 2 should be good enough.

> OK.  This is a bit more complicated since we need some tricks to
> manage free space larger than (approx.) VECTOR_BLOCK_BYTES / 2.

I do not know what you're referring to.  Could you explain what you mean
by that? ....aahhh.... right, I see what you mean now.
I don't think fragmenting the left-over space is a good idea.  Just keep
your free lists for free chunks of size [VECTOR_BLOCK_BYTES/2
... VECTOR_BLOCK_BYTES], even if you never allocate a vector of that
size from those free lists.  I.e. keep the old code, just change the part
that chooses between "allocate via vector-block or allocate directly" to
use a VECTOR_BLOCK_BYTES/2 threshold.

> +    roundup_size = COMMON_MULTIPLE (sizeof (Lisp_Object),
> +#ifdef USE_LSB_TAG
> +    8 /* Helps to maintain alignment constraints.  */
> +#else
> +    1 /* If alignment doesn't matter, should round up
> +	 to sizeof (Lisp_Object) at least.  */
> +#endif
> +    )

All I wanted was a comment explaining the choice of 8.  I think using
8 for all cases is perfectly fine (I don't know of any significant case
where the above expression will give something smaller anyway).

> +/* If VECTOR is too large to join a free list at once,
> +   this function splits it to smaller fragments. Next,
> +   all fragments are set up to appropriate free lists.  */
> +
> +static inline void
> +setup_free_space (struct Lisp_Vector *vector)
> +{
> +  ptrdiff_t index, usedbytes, restbytes;
> +
> +  for (restbytes = vector->header.next.nbytes;
> +       restbytes >= VBLOCK_BYTES_MIN; restbytes -= usedbytes)
> +    {
> +      eassert (restbytes % roundup_size == 0);
> +      usedbytes = min (restbytes, VBLOCK_BYTES_MAX);
> +
> +      /* Do not create too small fragments.  */
> +      if (restbytes - usedbytes > 0)
> +	while (restbytes - usedbytes < VBLOCK_BYTES_MIN)
> +	  usedbytes -= roundup_size;
> +
> +      SETUP_ON_FREE_LIST (vector, usedbytes, index);
> +      vector = ADVANCE (vector, usedbytes);
> +    }
> +
> +  /* Make sure all space is used.  */
> +  eassert (restbytes == 0);
> +}

This should not have any loop and should not fragment.  I.e. it should
be barely more than SETUP_ON_FREE_LIST (vector, restbytes, index).

The rest looks good now.  Feel free to install it,


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-07 14:07                             ` Stefan Monnier
@ 2012-06-08  5:50                               ` Dmitry Antipov
  2012-06-08  6:17                                 ` Stefan Monnier
  2012-06-08  6:57                                 ` Eli Zaretskii
  0 siblings, 2 replies; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-08  5:50 UTC (permalink / raw)
  To: Stefan Monnier; +Cc: emacs-devel

[-- Attachment #1: Type: text/plain, Size: 188 bytes --]

On 06/07/2012 06:07 PM, Stefan Monnier wrote:

> The rest looks good now.  Feel free to install it

OK, last call for comments, with internals.texi tweaks
and ChangeLog entries...

Dmitry

[-- Attachment #2: vector_alloc_final.patch --]
[-- Type: text/plain, Size: 19825 bytes --]

=== modified file 'doc/lispref/ChangeLog'
--- doc/lispref/ChangeLog	2012-06-03 09:03:23 +0000
+++ doc/lispref/ChangeLog	2012-06-08 05:48:11 +0000
@@ -1,3 +1,8 @@
+2012-06-08  Dmitry Antipov  <dmantipov@yandex.ru>
+
+	* internals.text (Garbage Collection): Document new
+	vector management code and vectorlike_header structure.
+
 2012-06-03  Chong Yidong  <cyd@gnu.org>
 
 	* modes.texi (Mode Line Data): Use "mode line construct"

=== modified file 'doc/lispref/internals.texi'
--- doc/lispref/internals.texi	2012-05-27 01:34:14 +0000
+++ doc/lispref/internals.texi	2012-06-08 05:40:00 +0000
@@ -215,10 +215,20 @@
 (such as by loading a library), that data is placed in normal storage.
 If normal storage runs low, then Emacs asks the operating system to
 allocate more memory.  Different types of Lisp objects, such as
-symbols, cons cells, markers, etc., are segregated in distinct blocks
-in memory.  (Vectors, long strings, buffers and certain other editing
-types, which are fairly large, are allocated in individual blocks, one
-per object, while small strings are packed into blocks of 8k bytes.)
+symbols, cons cells, small vectors, markers, etc., are segregated in
+distinct blocks in memory.  (Large vectors, long strings, buffers and
+certain other editing types, which are fairly large, are allocated in
+individual blocks, one per object; small strings are packed into blocks
+of 8k bytes, and small vectors are packed into blocks of 4k bytes).
+
+@cindex vectors and objects looks like a vectors
+  Beyond the basic vector, a lot of objects like window, buffer, and
+frame are managed as if they're vectors.  So, all corresponding data
+types includes the field @code{struct vectorlike_header}, where
+@code{header.next.buffer} points to the next buffer, in the chain of all
+buffers (including killed buffers), and @code{header.next.vector} points
+to the next vector in a free list.  For small vectors allocated from vector
+blocks, @code{header.next.nbytes} is the vector's memory footprint.
 
 @cindex garbage collection
   It is quite common to use some storage for a while, then release it
@@ -243,8 +253,12 @@
   The sweep phase puts unused cons cells onto a @dfn{free list}
 for future allocation; likewise for symbols and markers.  It compacts
 the accessible strings so they occupy fewer 8k blocks; then it frees the
-other 8k blocks.  Vectors, buffers, windows, and other large objects are
-individually allocated and freed using @code{malloc} and @code{free}.
+other 8k blocks.  Unreachable vectors from vector blocks are coalesced
+to create largest possible free areas; if free area occupies the whole 4k
+block, the block is freed.  Otherwise, the free area is recorded
+to a free list array, where each entry corresponds to a free list
+of areas of the same size. Large vectors, buffers, and other large objects
+are individually allocated and freed using @code{malloc} and @code{free}.
 
 @cindex CL note---allocate more storage
 @quotation

=== modified file 'src/ChangeLog'
--- src/ChangeLog	2012-06-07 05:11:51 +0000
+++ src/ChangeLog	2012-06-08 04:54:07 +0000
@@ -1,3 +1,28 @@
+2012-06-08  Dmitry Antipov  <dmantipov@yandex.ru>
+
+	Block-based vector allocation of small vectors.
+	* lisp.h (struct vectorlike_header): New field `nbytes',
+	adjust comment accordingly.
+	* alloc.c (enum mem_type): New type `MEM_TYPE_VECTOR_BLOCK'
+	to denote vector blocks. Adjust users (live_vector_p,
+	mark_maybe_pointer, valid_lisp_object_p) accordingly.
+	(COMMON_MULTIPLE): Move outside #if USE_LSB_TAG.
+	(VECTOR_BLOCK_SIZE, vroundup, VECTOR_BLOCK_BYTES),
+	(VBLOCK_BYTES_MIN, VBLOCK_BYTES_MAX, VECTOR_MAX_FREE_LIST_INDEX),
+	(VECTOR_FREE_LIST_FLAG, ADVANCE, VINDEX, SETUP_ON_FREE_LIST),
+	(VECTOR_SIZE, VECTOR_IN_BLOCK): New macros.
+	(roundup_size): New constant.
+	(struct vector_block): New data type.
+	(vector_blocks, vector_free_lists, zero_vector): New variables.
+	(all_vectors): Renamed to `large_vectors'.
+	(allocate_vector_from_block, init_vectors, allocate_vector_from_block)
+	(sweep_vectors): New functions.
+	(allocate_vectorlike): Return `zero_vector' as the only vector of
+	0 items. Allocate new vector from block if vector size is less than
+	or equal to VBLOCK_BYTES_MAX.
+	(Fgarbage_collect): Move all vector sweeping code to sweep_vectors.
+	(init_alloc_once): Add call to init_vectors.
+
 2012-06-07  Paul Eggert  <eggert@cs.ucla.edu>
 
 	* doprnt.c (doprnt): Truncate multibyte char correctly.

=== modified file 'src/alloc.c'
--- src/alloc.c	2012-06-02 08:52:27 +0000
+++ src/alloc.c	2012-06-08 03:41:27 +0000
@@ -304,7 +304,9 @@
      process, hash_table, frame, terminal, and window, but we never made
      use of the distinction, so it only caused source-code complexity
      and runtime slowdown.  Minor but pointless.  */
-  MEM_TYPE_VECTORLIKE
+  MEM_TYPE_VECTORLIKE,
+  /* Special type to denote vector blocks.  */
+  MEM_TYPE_VECTOR_BLOCK
 };
 
 static void *lisp_malloc (size_t, enum mem_type);
@@ -494,6 +496,11 @@
   xsignal (Qnil, Vmemory_signal_data);
 }
 
+/* A common multiple of the positive integers A and B.  Ideally this
+   would be the least common multiple, but there's no way to do that
+   as a constant expression in C, so do the best that we can easily do.  */
+#define COMMON_MULTIPLE(a, b) \
+  ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b))
 
 #ifndef XMALLOC_OVERRUN_CHECK
 #define XMALLOC_OVERRUN_CHECK_OVERHEAD 0
@@ -525,12 +532,8 @@
       char c;						\
     },							\
     c)
+
 #ifdef USE_LSB_TAG
-/* A common multiple of the positive integers A and B.  Ideally this
-   would be the least common multiple, but there's no way to do that
-   as a constant expression in C, so do the best that we can easily do.  */
-# define COMMON_MULTIPLE(a, b) \
-    ((a) % (b) == 0 ? (a) : (b) % (a) == 0 ? (b) : (a) * (b))
 # define XMALLOC_HEADER_ALIGNMENT \
     COMMON_MULTIPLE (1 << GCTYPEBITS, XMALLOC_BASE_ALIGNMENT)
 #else
@@ -2928,17 +2931,311 @@
 			   Vector Allocation
  ***********************************************************************/
 
-/* Singly-linked list of all vectors.  */
+/* This value is balanced well enough to avoid too much internal overhead
+   for the most common cases; it's not required to be a power of two, but
+   it's expected to be a mult-of-ROUNDUP_SIZE (see below).  */
 
-static struct Lisp_Vector *all_vectors;
+#define VECTOR_BLOCK_SIZE 4096
 
 /* Handy constants for vectorlike objects.  */
 enum
   {
     header_size = offsetof (struct Lisp_Vector, contents),
-    word_size = sizeof (Lisp_Object)
+    word_size = sizeof (Lisp_Object),
+    roundup_size = COMMON_MULTIPLE (sizeof (Lisp_Object),
+#ifdef USE_LSB_TAG
+    8 /* Helps to maintain alignment constraints imposed by
+	 assumption that least 3 bits of pointers are always 0.  */
+#else
+    1 /* If alignment doesn't matter, should round up
+	 to sizeof (Lisp_Object) at least.  */
+#endif
+    )
   };
 
+/* Round up X to nearest mult-of-ROUNDUP_SIZE,
+   assuming ROUNDUP_SIZE is a power of 2.  */
+
+#define vroundup(x) (((x) + (roundup_size - 1)) & ~(roundup_size - 1))
+
+/* Rounding helps to maintain alignment constraints if USE_LSB_TAG.  */
+
+#define VECTOR_BLOCK_BYTES (VECTOR_BLOCK_SIZE - vroundup (sizeof (void *)))
+
+/* Size of the minimal vector allocated from block.  */
+
+#define VBLOCK_BYTES_MIN (vroundup (sizeof (struct Lisp_Vector)))
+
+/* Size of the largest vector allocated from block.  */
+
+#define VBLOCK_BYTES_MAX					\
+  (vroundup ((VECTOR_BLOCK_BYTES / 2) - sizeof (Lisp_Object)))
+
+/* We maintain one free list for each possible block-allocated
+   vector size, and this is the number of free lists we have.  */
+
+#define VECTOR_MAX_FREE_LIST_INDEX				\
+  ((VECTOR_BLOCK_BYTES - VBLOCK_BYTES_MIN) / roundup_size + 1)
+
+/* When the vector is on a free list, vectorlike_header.SIZE is set to
+   this special value ORed with vector's memory footprint size.  */
+
+#define VECTOR_FREE_LIST_FLAG				\
+  ((~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG		\
+	    | (VECTOR_BLOCK_SIZE - 1)))
+
+/* Common shortcut to advance vector pointer over a block data.  */
+
+#define ADVANCE(v,nbytes) (struct Lisp_Vector *) ((char *) (v) + (nbytes))
+
+/* Common shortcut to calculate NBYTES-vector index in VECTOR_FREE_LISTS.  */
+
+#define VINDEX(nbytes) (((nbytes) - VBLOCK_BYTES_MIN) / roundup_size)
+
+/* Common shortcut to setup vector on a free list.  */
+
+#define SETUP_ON_FREE_LIST(v,nbytes,index)			\
+  do {								\
+    (v)->header.size = VECTOR_FREE_LIST_FLAG | (nbytes);	\
+    eassert ((nbytes) % roundup_size == 0);			\
+    (index) = VINDEX (nbytes);					\
+    eassert ((index) < VECTOR_MAX_FREE_LIST_INDEX);		\
+    (v)->header.next.vector = vector_free_lists[index];		\
+    vector_free_lists[index] = (v);				\
+  } while (0)
+
+struct vector_block
+{
+  char data[VECTOR_BLOCK_BYTES];
+  struct vector_block *next;
+};
+
+/* Chain of vector blocks.  */
+
+static struct vector_block *vector_blocks;
+
+/* Vector free lists, where NTH item points to a chain of free
+   vectors of the same NBYTES size, so NTH == VINDEX (NBYTES).  */
+
+static struct Lisp_Vector *vector_free_lists[VECTOR_MAX_FREE_LIST_INDEX];
+
+/* Singly-linked list of large vectors.  */
+
+static struct Lisp_Vector *large_vectors;
+
+/* The only vector with 0 slots, allocated from pure space.  */
+
+static struct Lisp_Vector *zero_vector;
+
+/* Get a new vector block.  */
+
+static struct vector_block *
+allocate_vector_block (void)
+{
+  struct vector_block *block;
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, 0);
+#endif
+
+  block = xmalloc (sizeof (struct vector_block));
+  if (!block)
+    memory_full (sizeof (struct vector_block));
+
+#ifdef DOUG_LEA_MALLOC
+  mallopt (M_MMAP_MAX, MMAP_MAX_AREAS);
+#endif
+
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+  mem_insert (block->data, block->data + VECTOR_BLOCK_BYTES,
+	      MEM_TYPE_VECTOR_BLOCK);
+#endif
+
+  block->next = vector_blocks;
+  vector_blocks = block;
+  return block;
+}
+
+/* Called once to initialize vector allocation.  */
+
+static void
+init_vectors (void)
+{
+  zero_vector = (struct Lisp_Vector *)
+    pure_alloc (header_size, Lisp_Vectorlike);
+  zero_vector->header.size = 0;
+}
+
+/* Allocate vector from a vector block.  */
+
+static struct Lisp_Vector *
+allocate_vector_from_block (size_t nbytes)
+{
+  struct Lisp_Vector *vector, *rest;
+  struct vector_block *block;
+  size_t index, restbytes;
+
+  eassert (VBLOCK_BYTES_MIN <= nbytes && nbytes <= VBLOCK_BYTES_MAX);
+  eassert (nbytes % roundup_size == 0);
+
+  /* First, try to allocate from a free list
+     containing vectors of the requested size.  */
+  index = VINDEX (nbytes);
+  if (vector_free_lists[index])
+    {
+      vector = vector_free_lists[index];
+      vector_free_lists[index] = vector->header.next.vector;
+      vector->header.next.nbytes = nbytes;
+      return vector;
+    }
+
+  /* Next, check free lists containing larger vectors.  Since
+     we will split the result, we should have remaining space
+     large enough to use for one-slot vector at least.  */
+  for (index = VINDEX (nbytes + VBLOCK_BYTES_MIN);
+       index < VECTOR_MAX_FREE_LIST_INDEX; index++)
+    if (vector_free_lists[index])
+      {
+	/* This vector is larger than requested.  */
+	vector = vector_free_lists[index];
+	vector_free_lists[index] = vector->header.next.vector;
+	vector->header.next.nbytes = nbytes;
+
+	/* Excess bytes are used for the smaller vector,
+	   which should be set on an appropriate free list.  */
+	restbytes = index * roundup_size + VBLOCK_BYTES_MIN - nbytes;
+	eassert (restbytes % roundup_size == 0);
+	rest = ADVANCE (vector, nbytes);
+	SETUP_ON_FREE_LIST (rest, restbytes, index);
+	return vector;
+      }
+
+  /* Finally, need a new vector block.  */
+  block = allocate_vector_block ();
+
+  /* New vector will be at the beginning of this block.  */
+  vector = (struct Lisp_Vector *) block->data;
+  vector->header.next.nbytes = nbytes;
+
+  /* If the rest of space from this block is large enough
+     for one-slot vector at least, set up it on a free list.  */
+  restbytes = VECTOR_BLOCK_BYTES - nbytes;
+  if (restbytes >= VBLOCK_BYTES_MIN)
+    {
+      eassert (restbytes % roundup_size == 0);
+      rest = ADVANCE (vector, nbytes);
+      SETUP_ON_FREE_LIST (rest, restbytes, index);
+    }
+  return vector;
+ }
+
+/* Return how many Lisp_Objects can be stored in V.  */
+
+#define VECTOR_SIZE(v) ((v)->header.size & PSEUDOVECTOR_FLAG ?		\
+			(PSEUDOVECTOR_SIZE_MASK & (v)->header.size) :	\
+			(v)->header.size)
+
+/* Nonzero if VECTOR pointer is valid pointer inside BLOCK.  */
+
+#define VECTOR_IN_BLOCK(vector,block)		\
+  ((char *) (vector) <= (block)->data		\
+   + VECTOR_BLOCK_BYTES - VBLOCK_BYTES_MIN)
+
+/* Reclaim space used by unmarked vectors.  */
+
+static void
+sweep_vectors (void)
+{
+  struct vector_block *block = vector_blocks, **bprev = &vector_blocks;
+  struct Lisp_Vector *vector, *next, **vprev = &large_vectors;
+
+  total_vector_size = 0;
+  memset (vector_free_lists, 0, sizeof (vector_free_lists));
+
+  /* Looking through vector blocks.  */
+
+  for (block = vector_blocks; block; block = *bprev)
+    {
+      int free_this_block = 0;
+
+      for (vector = (struct Lisp_Vector *) block->data;
+	   VECTOR_IN_BLOCK (vector, block); vector = next)
+	{
+	  if (VECTOR_MARKED_P (vector))
+	    {
+	      VECTOR_UNMARK (vector);
+	      total_vector_size += VECTOR_SIZE (vector);
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+	    }
+	  else
+	    {
+	      ptrdiff_t nbytes;
+
+	      if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+		  == VECTOR_FREE_LIST_FLAG)
+		vector->header.next.nbytes =
+		  vector->header.size & (VECTOR_BLOCK_SIZE - 1);
+	      
+	      next = ADVANCE (vector, vector->header.next.nbytes);
+
+	      /* While NEXT is not marked, try to coalesce with VECTOR,
+		 thus making VECTOR of the largest possible size.  */
+
+	      while (VECTOR_IN_BLOCK (next, block))
+		{
+		  if (VECTOR_MARKED_P (next))
+		    break;
+		  if ((next->header.size & VECTOR_FREE_LIST_FLAG)
+		      == VECTOR_FREE_LIST_FLAG)
+		    nbytes = next->header.size & (VECTOR_BLOCK_SIZE - 1);
+		  else
+		    nbytes = next->header.next.nbytes;
+		  vector->header.next.nbytes += nbytes;
+		  next = ADVANCE (next, nbytes);
+		}
+	      
+	      eassert (vector->header.next.nbytes % roundup_size == 0);
+
+	      if (vector == (struct Lisp_Vector *) block->data
+		  && !VECTOR_IN_BLOCK (next, block))
+		/* This block should be freed because all of it's
+		   space was coalesced into the only free vector.  */
+		free_this_block = 1;
+	      else
+		SETUP_ON_FREE_LIST (vector, vector->header.next.nbytes, nbytes);
+	    }
+	}
+
+      if (free_this_block)
+	{
+	  *bprev = block->next;
+#if GC_MARK_STACK && !defined GC_MALLOC_CHECK
+	  mem_delete (mem_find (block->data));
+#endif
+	  xfree (block);
+	}
+      else
+	bprev = &block->next;
+    }
+
+  /* Sweep large vectors.  */
+
+  for (vector = large_vectors; vector; vector = *vprev)
+    {
+      if (VECTOR_MARKED_P (vector))
+	{
+	  VECTOR_UNMARK (vector);
+	  total_vector_size += VECTOR_SIZE (vector);
+	  vprev = &vector->header.next.vector;
+	}
+      else
+	{
+	  *vprev = vector->header.next.vector;
+	  lisp_free (vector);
+	}
+    }
+}
+
 /* Value is a pointer to a newly allocated Lisp_Vector structure
    with room for LEN Lisp_Objects.  */
 
@@ -2960,8 +3257,19 @@
   /* This gets triggered by code which I haven't bothered to fix.  --Stef  */
   /* eassert (!handling_signal); */
 
+  if (len == 0)
+    return zero_vector;
+
   nbytes = header_size + len * word_size;
-  p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+
+  if (nbytes <= VBLOCK_BYTES_MAX)
+    p = allocate_vector_from_block (vroundup (nbytes));
+  else
+    {
+      p = (struct Lisp_Vector *) lisp_malloc (nbytes, MEM_TYPE_VECTORLIKE);
+      p->header.next.vector = large_vectors;
+      large_vectors = p;
+    }
 
 #ifdef DOUG_LEA_MALLOC
   /* Back to a reasonable maximum of mmap'ed areas.  */
@@ -2971,9 +3279,6 @@
   consing_since_gc += nbytes;
   vector_cells_consed += len;
 
-  p->header.next.vector = all_vectors;
-  all_vectors = p;
-
   MALLOC_UNBLOCK_INPUT;
 
   return p;
@@ -4072,7 +4377,34 @@
 static inline int
 live_vector_p (struct mem_node *m, void *p)
 {
-  return (p == m->start && m->type == MEM_TYPE_VECTORLIKE);
+  if (m->type == MEM_TYPE_VECTOR_BLOCK)
+    {
+      /* This memory node corresponds to a vector block.  */
+      struct vector_block *block = (struct vector_block *) m->start;
+      struct Lisp_Vector *vector = (struct Lisp_Vector *) block->data;
+
+      /* P is in the block's allocation range.  Scan the block
+	 up to P and see whether P points to the start of some
+	 vector which is not on a free list.  FIXME: check whether
+	 some allocation patterns (probably a lot of short vectors)
+	 may cause a substantial overhead of this loop.  */
+      while (VECTOR_IN_BLOCK (vector, block)
+	     && vector <= (struct Lisp_Vector *) p)
+	{
+	  if ((vector->header.size & VECTOR_FREE_LIST_FLAG)
+	      == VECTOR_FREE_LIST_FLAG)
+	    vector = ADVANCE (vector, (vector->header.size
+				       & (VECTOR_BLOCK_SIZE - 1)));
+	  else if (vector == p)
+	    return 1;
+	  else
+	    vector = ADVANCE (vector, vector->header.next.nbytes);
+	}
+    }
+  else if (m->type == MEM_TYPE_VECTORLIKE && p == m->start)
+    /* This memory node corresponds to a large vector.  */
+    return 1;
+  return 0;
 }
 
 
@@ -4272,6 +4604,7 @@
 	  break;
 
 	case MEM_TYPE_VECTORLIKE:
+	case MEM_TYPE_VECTOR_BLOCK:
 	  if (live_vector_p (m, p))
 	    {
 	      Lisp_Object tem;
@@ -4705,6 +5038,7 @@
       return live_float_p (m, p);
 
     case MEM_TYPE_VECTORLIKE:
+    case MEM_TYPE_VECTOR_BLOCK:
       return live_vector_p (m, p);
 
     default:
@@ -6241,33 +6575,7 @@
 	}
   }
 
-  /* Free all unmarked vectors */
-  {
-    register struct Lisp_Vector *vector = all_vectors, *prev = 0, *next;
-    total_vector_size = 0;
-
-    while (vector)
-      if (!VECTOR_MARKED_P (vector))
-	{
-	  if (prev)
-	    prev->header.next = vector->header.next;
-	  else
-	    all_vectors = vector->header.next.vector;
-	  next = vector->header.next.vector;
-	  lisp_free (vector);
-	  vector = next;
-
-	}
-      else
-	{
-	  VECTOR_UNMARK (vector);
-	  if (vector->header.size & PSEUDOVECTOR_FLAG)
-	    total_vector_size += PSEUDOVECTOR_SIZE_MASK & vector->header.size;
-	  else
-	    total_vector_size += vector->header.size;
-	  prev = vector, vector = vector->header.next.vector;
-	}
-  }
+  sweep_vectors ();
 
 #ifdef GC_CHECK_STRING_BYTES
   if (!noninteractive)
@@ -6404,7 +6712,6 @@
   Vdead = make_pure_string ("DEAD", 4, 4, 0);
 #endif
 
-  all_vectors = 0;
   ignore_warnings = 1;
 #ifdef DOUG_LEA_MALLOC
   mallopt (M_TRIM_THRESHOLD, 128*1024); /* trim threshold */
@@ -6417,6 +6724,7 @@
   init_marker ();
   init_float ();
   init_intervals ();
+  init_vectors ();
   init_weak_hash_tables ();
 
 #ifdef REL_ALLOC

=== modified file 'src/lisp.h'
--- src/lisp.h	2012-05-30 19:23:37 +0000
+++ src/lisp.h	2012-06-08 03:41:27 +0000
@@ -916,11 +916,15 @@
   {
     ptrdiff_t size;
 
-    /* Pointer to the next vector-like object.  It is generally a buffer or a
+    /* When the vector is allocated from a vector block, NBYTES is used
+       if the vector is not on a free list, and VECTOR is used otherwise.
+       For large vector-like objects, BUFFER or VECTOR is used as a pointer
+       to the next vector-like object.  It is generally a buffer or a 
        Lisp_Vector alias, so for convenience it is a union instead of a
        pointer: this way, one can write P->next.vector instead of ((struct
        Lisp_Vector *) P->next).  */
     union {
+      ptrdiff_t nbytes;
       struct buffer *buffer;
       struct Lisp_Vector *vector;
     } next;


^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-08  5:50                               ` Dmitry Antipov
@ 2012-06-08  6:17                                 ` Stefan Monnier
  2012-06-08  8:49                                   ` Dmitry Antipov
  2012-06-08  6:57                                 ` Eli Zaretskii
  1 sibling, 1 reply; 62+ messages in thread
From: Stefan Monnier @ 2012-06-08  6:17 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

>> The rest looks good now.  Feel free to install it
> OK, last call for comments, with internals.texi tweaks
> and ChangeLog entries...

Looks good to me, thank you,


        Stefan



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-08  6:17                                 ` Stefan Monnier
@ 2012-06-08  8:49                                   ` Dmitry Antipov
  2012-06-08  8:53                                     ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Dmitry Antipov @ 2012-06-08  8:49 UTC (permalink / raw)
  To: Stefan Monnier, Paul Eggert, Eli Zaretskii; +Cc: emacs-devel

Installed as 108520, with all suggested fixes and cleanups.

Dmitry



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-08  8:49                                   ` Dmitry Antipov
@ 2012-06-08  8:53                                     ` Eli Zaretskii
  2012-06-08  9:41                                       ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2012-06-08  8:53 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: eggert, monnier, emacs-devel

> Date: Fri, 08 Jun 2012 12:49:11 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> CC: emacs-devel@gnu.org
> 
> Installed as 108520, with all suggested fixes and cleanups.

This breaks the MS-Windows port: it infloops at startup.  I will look
into it and see what I find.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-08  8:53                                     ` Eli Zaretskii
@ 2012-06-08  9:41                                       ` Eli Zaretskii
  2012-06-08 10:00                                         ` Eli Zaretskii
  0 siblings, 1 reply; 62+ messages in thread
From: Eli Zaretskii @ 2012-06-08  9:41 UTC (permalink / raw)
  To: dmantipov, eggert, monnier, emacs-devel

> Date: Fri, 08 Jun 2012 11:53:18 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> Cc: eggert@cs.ucla.edu, monnier@iro.umontreal.ca, emacs-devel@gnu.org
> 
> > Date: Fri, 08 Jun 2012 12:49:11 +0400
> > From: Dmitry Antipov <dmantipov@yandex.ru>
> > CC: emacs-devel@gnu.org
> > 
> > Installed as 108520, with all suggested fixes and cleanups.
> 
> This breaks the MS-Windows port: it infloops at startup.  I will look
> into it and see what I find.

The immediate reason seems to be some mismatch of BLOCK_INPUT and
UNBLOCK_INPUT somewhere.  The infloop is in
w32term.c:x_make_frame_visible, and it happens because
interrupt_input_blocked is non-zero after UNBLOCK_INPUT on line 5725
of w32term.c, which doesn't let the socket read hook run, and
therefore does not increment input_signal_count, which is a condition
for exiting this loop:

    for (count = input_signal_count + 10;
	 input_signal_count < count && !FRAME_VISIBLE_P (f);)
      {
	/* Force processing of queued events.  */
        /* TODO: x_sync equivalent?  */

	/* Machines that do polling rather than SIGIO have been observed
	   to go into a busy-wait here.  So we'll fake an alarm signal
	   to let the handler know that there's something to be read.
	   We used to raise a real alarm, but it seems that the handler
	   isn't always enabled here.  This is probably a bug.  */
	if (input_polling_used ())
	  {
	    /* It could be confusing if a real alarm arrives while processing
	       the fake one.  Turn it off and let the handler reset it.  */
	    int old_poll_suppress_count = poll_suppress_count;
	    poll_suppress_count = 1;
	    poll_for_input_1 ();
	    poll_suppress_count = old_poll_suppress_count;
	  }
      }

But the problem is not limited to this code alone.  If I reset
interrupt_input_blocked to zero, the initial frame gets displayed
correctly, but Emacs then becomes unresponsive to input, presumably
again because BLOCK_INPUT mismatch somewhere.

Do we have some facilities to catch such mismatches?  Any other ideas?



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-08  9:41                                       ` Eli Zaretskii
@ 2012-06-08 10:00                                         ` Eli Zaretskii
  0 siblings, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2012-06-08 10:00 UTC (permalink / raw)
  To: dmantipov, eggert, monnier, emacs-devel

> Date: Fri, 08 Jun 2012 12:41:24 +0300
> From: Eli Zaretskii <eliz@gnu.org>
> 
> But the problem is not limited to this code alone.  If I reset
> interrupt_input_blocked to zero, the initial frame gets displayed
> correctly, but Emacs then becomes unresponsive to input, presumably
> again because BLOCK_INPUT mismatch somewhere.

Found it.  Should be fixed now.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-08  5:50                               ` Dmitry Antipov
  2012-06-08  6:17                                 ` Stefan Monnier
@ 2012-06-08  6:57                                 ` Eli Zaretskii
  1 sibling, 0 replies; 62+ messages in thread
From: Eli Zaretskii @ 2012-06-08  6:57 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: monnier, emacs-devel

> Date: Fri, 08 Jun 2012 09:50:15 +0400
> From: Dmitry Antipov <dmantipov@yandex.ru>
> Cc: emacs-devel@gnu.org
> 
> OK, last call for comments, with internals.texi tweaks
> and ChangeLog entries...

Thanks.  Allow me a few minor comments and suggestions regarding the
documentation:

> +@cindex vectors and objects looks like a vectors

This index entry is problematic in several ways:

  . it is too long to be useful
  . it is grammatically incorrect ("objects looks")
  . most importantly, it is not a phrase a reader is likely to think
    of when she needs to find this information

The way to write good concept index entries is to come up with phrases
that _you_ might have in mind for locating description of the issues
documented in the text being indexed.

Based on the text, I suggest these 2 index entries:

 @cindex vector-like objects, storage
 @cindex storage of vector-like Lisp objects

(The reason for two permutations is to facilitate the "i TAB"
completion of index entries, when the reader looks for information
about storage of Lisp object.)

> +  Beyond the basic vector, a lot of objects like window, buffer, and
> +frame are managed as if they're vectors.
                     ^^^^^^^^^^^^^^^^^^^^^
"as if they were vectors" is more grammatically correct.

> +                                          So, all corresponding data
> +types includes the field @code{struct vectorlike_header}, where
         ^^^^^^^^
"include", in plural.

> +@code{header.next.buffer} points to the next buffer, in the chain of all
> +buffers (including killed buffers), and @code{header.next.vector} points
> +to the next vector in a free list.  For small vectors allocated from vector
> +blocks, @code{header.next.nbytes} is the vector's memory footprint.

I would rephrase slightly:

  The corresponding C data structures include the
  @code{struct vectorlike_header} field whose @code{next} field points
  to the next object in the chain: @code{header.next.buffer} points to
  the next buffer (which could be a killed buffer), and
  @code{header.next.vector} points to the next vector in a free list.
  If a vector is small (smaller than NNN bytes), then
  @code{header.next.nbytes} contains the vector data.

(And please fill in the value of NNN.)

> +to create largest possible free areas; if free area occupies the whole 4k
> +block, the block is freed.

"if a free area spans a complete 4K block, that block is freed"

> +                            Otherwise, the free area is recorded
> +to a free list array, where each entry corresponds to a free list
   ^^^^^^^^^^^^^^^^^^^^
"in a free list array".

> +of areas of the same size. Large vectors, buffers, and other large objects
                            ^^
Two spaces, please.

> +are individually allocated and freed using @code{malloc} and @code{free}.

This is inaccurate, at least for buffer text and strings.  They are
allocated using either mmap or ralloc.c functions.

Thanks.



^ permalink raw reply	[flat|nested] 62+ messages in thread

* Re: Proposal: block-based vector allocator
  2012-06-07 10:03                           ` Dmitry Antipov
  2012-06-07 14:07                             ` Stefan Monnier
@ 2012-06-08  6:38                             ` Paul Eggert
  1 sibling, 0 replies; 62+ messages in thread
From: Paul Eggert @ 2012-06-08  6:38 UTC (permalink / raw)
  To: Dmitry Antipov; +Cc: emacs-devel

Thanks for working on this.  Here are a few low-level comments.

On 06/07/2012 03:03 AM, Dmitry Antipov wrote:
> +#define VECTOR_FREE_LIST_FLAG				\
> +  ((~0) & ~(ARRAY_MARK_FLAG | PSEUDOVECTOR_FLAG		\

There's no need for the "(~0) &".

> +  zero_vector = (struct Lisp_Vector *)
> +    pure_alloc (header_size, Lisp_Vectorlike);

No need for the cast; "zero_vector = pure_alloc (header_size, Lisp_Vectorlike)"
will work just fine.  In general, it's better to avoid casts when they're
not needed, as casts are error-prone.  (Admittedly the code doesn't always
follow this rule, but this was partly motivated by pre-c89 compilers.)

> +#define ADVANCE(v,nbytes) (struct Lisp_Vector *) ((char *) (v) + (nbytes))

Space after comma (here and elsewhere).
Need to parenthesize the right-hand-side.

> +#define VBLOCK_BYTES_MIN (vroundup (sizeof (struct Lisp_Vector)))
> +
> +/* Size of the largest vector allocated from block.  */
> +
> +#define VBLOCK_BYTES_MAX					\
> +  (vroundup ((VECTOR_BLOCK_BYTES / 2) - sizeof (Lisp_Object)))

No need to parenthesize either right-hand side.

> +  block = xmalloc (sizeof (struct vector_block));
> +  if (!block)
> +    memory_full (sizeof (struct vector_block));

The last two lines are not needed, as 'xmalloc'
already does overflow checking.

^ permalink raw reply	[flat|nested] 62+ messages in thread

end of thread, other threads:[~2012-06-08 10:00 UTC | newest]

Thread overview: 62+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2011-12-06  5:22 Proposal: block-based vector allocator Dmitry Antipov
2011-12-06 13:35 ` Stefan Monnier
2011-12-06 15:14   ` Dmitry Antipov
2011-12-06 19:39     ` Stefan Monnier
2011-12-07  5:05       ` Dmitry Antipov
2011-12-07 12:27         ` Carsten Mattner
2011-12-07 13:52         ` Stefan Monnier
2011-12-07 16:08           ` Dmitry Antipov
2011-12-07 16:30             ` Stefan Monnier
2011-12-08  8:50               ` Dmitry Antipov
2011-12-08 13:52                 ` Stefan Monnier
2011-12-08  1:53             ` Stephen J. Turnbull
2011-12-08  4:41               ` Dmitry Antipov
2011-12-08 14:10                 ` Stefan Monnier
2011-12-08 16:48                   ` Dmitry Antipov
2011-12-08 19:58                     ` Stefan Monnier
2011-12-09  7:32                       ` Eli Zaretskii
2011-12-09  9:04                       ` Dmitry Antipov
2011-12-09 14:05                         ` Stefan Monnier
2011-12-09 16:15                           ` Dmitry Antipov
2011-12-09 21:04                             ` Stefan Monnier
2011-12-11 13:18                               ` Dmitry Antipov
2011-12-12  3:07                               ` Dmitry Antipov
2011-12-12 16:24                                 ` Stefan Monnier
2011-12-09  4:44                 ` Stephen J. Turnbull
     [not found] ` <jwvaa1yjs21.fsf-monnier+emacs@gnu.org>
2012-05-17  7:58   ` Dmitry Antipov
2012-05-18 17:40     ` Stefan Monnier
2012-05-21 12:19       ` Dmitry Antipov
2012-05-21 13:02         ` Andreas Schwab
2012-05-21 13:48           ` Dmitry Antipov
2012-05-21 15:07             ` Andreas Schwab
2012-05-22  5:23             ` Ken Raeburn
2012-05-21 20:12         ` Stefan Monnier
2012-05-22  8:24           ` Dmitry Antipov
2012-05-31 13:44           ` Dmitry Antipov
2012-05-31 15:43             ` Paul Eggert
2012-06-01  5:15               ` Dmitry Antipov
2012-06-01  5:44                 ` Paul Eggert
2012-06-01  9:06                   ` Dmitry Antipov
2012-06-01 17:36                     ` Stefan Monnier
2012-06-02  0:32                       ` Paul Eggert
2012-06-02  7:41                         ` Eli Zaretskii
2012-06-03  6:49                           ` Paul Eggert
2012-06-03 14:26                             ` Eli Zaretskii
2012-05-31 21:16             ` Stefan Monnier
2012-06-01  7:34               ` Dmitry Antipov
2012-06-01 17:40                 ` Stefan Monnier
2012-06-01 17:43                 ` Stefan Monnier
2012-06-06  7:02                   ` Dmitry Antipov
2012-06-06 13:13                     ` Stefan Monnier
2012-06-06 14:58                       ` Dmitry Antipov
2012-06-06 19:18                         ` Stefan Monnier
2012-06-07 10:03                           ` Dmitry Antipov
2012-06-07 14:07                             ` Stefan Monnier
2012-06-08  5:50                               ` Dmitry Antipov
2012-06-08  6:17                                 ` Stefan Monnier
2012-06-08  8:49                                   ` Dmitry Antipov
2012-06-08  8:53                                     ` Eli Zaretskii
2012-06-08  9:41                                       ` Eli Zaretskii
2012-06-08 10:00                                         ` Eli Zaretskii
2012-06-08  6:57                                 ` Eli Zaretskii
2012-06-08  6:38                             ` Paul Eggert

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/emacs.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).