On 12/09/2011 06:05 PM, Stefan Monnier wrote:

> Your test with the *compilation* seems to indicate otherwise, or am
> I missing something?

"Not too much" for the byte-compile test. Since the *compilation* buffer test
allocates mostly intervals, mmap() helps it regardless of the method
used for vector allocation.

> Exactly: no need to work hard.

Sure, but throwing away almost-finished work makes no sense, especially
when work on an alternative has not even started (if I have missed something
that is already done, I would be glad to see it).

> The apparently tightest fit is not necessarily tighter: what happens if
> you use an "order624" to allocate a 600B object, then use the remaining
> 24B for another object, and then free the 600B object?

There will be an attempt to coalesce this 600-byte vector with adjacent
vectors, and the resulting vector will be put on a free list.
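
As a rough illustration of that coalescing step (this is not the attached
patch), here is a minimal sketch. It assumes merging happens in a pass over
one 4KB block and that only free neighbours can be merged; struct vhdr,
set_vector, new_block and coalesce_block are hypothetical names made up
for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

enum { BLOCK_BYTES = 4096 };   /* assumed vector block size */

/* Hypothetical per-vector header; NBYTES includes the header itself.  */
struct vhdr
{
  size_t nbytes;
  int live;
};

/* Write a header for an NBYTES-byte vector at P.  */
static void
set_vector (char *p, size_t nbytes, int live)
{
  struct vhdr *v = (struct vhdr *) p;
  v->nbytes = nbytes;
  v->live = live;
}

/* Allocate a block that starts out as a single dead (free) vector.  */
static char *
new_block (void)
{
  char *b = malloc (BLOCK_BYTES);
  if (b)
    set_vector (b, BLOCK_BYTES, 0);
  return b;
}

/* Merge every run of adjacent dead vectors in BLOCK into one vector.
   Return the number of free vectors left after merging.  */
static size_t
coalesce_block (char *block)
{
  char *end = block + BLOCK_BYTES;
  size_t nfree = 0;
  for (char *p = block; p < end; p += ((struct vhdr *) p)->nbytes)
    {
      struct vhdr *v = (struct vhdr *) p;
      if (v->live)
        continue;
      char *q = p + v->nbytes;
      while (q < end && !((struct vhdr *) q)->live)
        q += ((struct vhdr *) q)->nbytes;
      v->nbytes = (size_t) (q - p);
      nfree++;   /* a real allocator would put V on a free list here */
    }
  return nfree;
}
```

In the scenario from the question, the freed 600-byte vector sits next to
a live 24-byte one, so under these assumptions it stays a 600-byte free
vector unless its other neighbour is free too.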

> I also really want to know the answer to this question (third try):
> Why have different classes for 632B and 600B sizes?  You can't fit more
> than 6 vectors of either size in a 4KB block, so you might as well
> divide your "order600" block into 6 chunks of 680B (plus 16B of padding)
> rather than 6 chunks of 600B plus an unused 496B of padding.

I don't understand your question. It's _your_ proposal to divide objects
into classes by rounding the size up to the nearest magic value ('standard
malloc algorithm'). My proposal assumes that a vector block can contain
objects of any size, and I'm using segregated free lists for free vectors
of ..., 600, 608, 616, 624, 632, ... bytes.  I believe it was well explained
why I reject both the 'exact-fit vector blocks for each possible vector size'
and the 'good-fit vector block for each size class' ('standard malloc')
algorithms.
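
To make the idea of per-size free lists concrete without reading the whole
patch, here is a minimal sketch. It is not the attached code: the 4KB block
size and the 8-byte step are taken from the sizes discussed above, and every
name (free_vector, free_lists, list_index, put_free, take_free, new_block)
is hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Every name and constant below is hypothetical, chosen for illustration.  */
enum
{
  BLOCK_BYTES = 4096,                    /* assumed vector block size */
  ROUNDUP = 8,                           /* assumed size granularity */
  NFREE_LISTS = BLOCK_BYTES / ROUNDUP    /* one list per exact size */
};

/* A free chunk stores only a link; its size is implied by its list.  */
struct free_vector
{
  struct free_vector *next;
};

static struct free_vector *free_lists[NFREE_LISTS];

/* Index of the list that holds free chunks of exactly NBYTES bytes.  */
static size_t
list_index (size_t nbytes)
{
  assert (nbytes >= ROUNDUP && nbytes <= BLOCK_BYTES && nbytes % ROUNDUP == 0);
  return nbytes / ROUNDUP - 1;
}

/* Put the NBYTES-byte chunk at P on the free list for its exact size.  */
static void
put_free (void *p, size_t nbytes)
{
  struct free_vector *v = p;
  size_t i = list_index (nbytes);
  v->next = free_lists[i];
  free_lists[i] = v;
}

/* Take NBYTES bytes: exact fit first, else split the smallest larger chunk.  */
static void *
take_free (size_t nbytes)
{
  for (size_t i = list_index (nbytes); i < NFREE_LISTS; i++)
    if (free_lists[i])
      {
        struct free_vector *v = free_lists[i];
        size_t found = (i + 1) * ROUNDUP;
        free_lists[i] = v->next;
        if (found > nbytes)
          /* The remainder goes on the list for its own exact size.  */
          put_free ((char *) v + nbytes, found - nbytes);
        return v;
      }
  return NULL;
}

/* Allocate a fresh block and make it available as one free chunk.  */
static void *
new_block (void)
{
  void *b = malloc (BLOCK_BYTES);
  if (b)
    put_free (b, BLOCK_BYTES);
  return b;
}
```

With this layout a 600-byte request and a 24-byte request can be carved
from the same block with no rounding, and a freed 600-byte chunk goes back
on the 600-byte list, where the next 600-byte request finds it as an exact
fit.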

To be concrete, I'm attaching the current state of my proposal again.

Have you ever tried to obtain a histogram of vector sizes? Of course, it
depends heavily on the benchmark, but I'm pretty sure that vectors of 2..6
Lisp_Objects are very common across a wide range of real workloads. That's
another reason why throwing away even 32 bytes is not so reasonable.

I'm scheduling the global benchmark for tonight (32- vs. 64-bit
binaries, GCPROs vs. stack marking, block-based vector allocation vs.
current code, in all possible variants, 8 executables in total,
16 runs for each :-)).

Dmitry