On 12/10/2011 01:04 AM, Stefan Monnier wrote:

> BTW, your code seems to compute the free list index by dividing by
> word_size (= 4 on 32bit systems) rather than by 8, so half of the slots
> will always be unused, IIUC.

You're correct, and this is fixed.

> Let us know how it turns out.

It needs to be run again, since I introduced some fixes, including those
suggested by you.

> If it's an arbitrary choice, it should not be called "page_size" but
> "vector_block_size" or some such.  Otherwise, we should fetch its value
> from the OS.

A better name is OK (the old one was a leftover from mmap()-related
experiments :-).

> Where do you make use of this alignment?

It's not needed any more.

> This test looks dangerous (what if we allocate a large vector of this
> size?).

It's impossible, because VECTOR_BLOCK_BYTES is an 'exact fit' for the
largest possible block-allocated vector.  That's why I'm using
xmalloc/mem_insert and not lisp_malloc: VECTOR_BLOCK_SIZE bytes are
allocated for the block, but only the range from block->data to
block->data + VECTOR_BLOCK_BYTES is marked as vector memory within the
mem_node tree.  So any MEM_TYPE_VECTORLIKE mem_node whose memory range
is VECTOR_BLOCK_BYTES bytes is guaranteed to match a vector block, and
such a block may contain only a single vector of VECTOR_BLOCK_BYTES
bytes.

> Hmm... I didn't think of it, but this is a disadvantage of
> non-segregated blocks: during conservative scanning, you need to scan
> the vector block to determine if we really have a live_vector_p whereas
> segregated blocks would just divide the offset by the block's
> vector size.

Yes, but this linear scan can be optimized: 1) the scan is required only
if the examined pointer is below the block's bump allocation pointer;
2) the scan does not need to look at addresses beyond the examined
pointer.

> For small enough vector blocks, it's probably not a big deal, but for
> 10-slot vectors on 32bit systems with 4KB blocks, that's about
> 100 iterations.
That is the worst-case scenario, where the scan has to run to the end of
a vector block that is full of small vectors.  Although most vectors are
small, small vectors are usually allocated and die (become unreachable)
in groups, and such a group of adjacent dead vectors is subject to
coalescing, which reduces the number of vectors in the block.  For the
byte-compile benchmark, this loop does ~12 iterations on average (for
64-bit code; expect a larger value for 32-bit, since the average vector
size is ~2x smaller).

BTW, what do you think about hooking buffer allocation into the common
vector allocation code?  I find it controversial (buffers are mid-sized
vector-like objects, so it might be reasonable to generalize buffer
allocation with other vectors; on the other hand, buffers need special
treatment for liveness detection and special processing during sweep),
but I want to try anyway.

Dmitry