I've tried a few designs of block-based vector allocation, and this is the first one that looks good enough to be proposed here.

This vector allocator uses near-to-PAGE_SIZE blocks to allocate small vectors and a dedicated list for large vectors. An array of free lists holds free small vectors, one list per possible size. When an allocation request can't be satisfied from the exact-fit free list, best fit is tried, and the unused tail bytes form a free vector which is put on the appropriate free list. If best-fit allocation fails too, and the request exceeds the amount of free bytes in the current vector block, a new block is allocated and the request is satisfied from it; the remaining space of the previous block forms a free vector which is put on the appropriate free list. If the request can be satisfied from the current vector block, it's just a simple bump-pointer allocation.

At sweep time, all vector blocks are processed in reverse allocation order, and pointers to unmarked vectors are collected on a per-block basis. If a block contains no live vectors, it is freed. Otherwise, adjacent free vectors are coalesced to form the largest possible free vectors, which are put on the appropriate free lists. Finally, the dedicated list of large vectors is processed, and unmarked vectors are freed.

I didn't test it too much, but this code survives the bootstrap in both the 32-bit and 64-bit variants and doesn't crash under my usual daily editing workloads. Although the whole thing looks a bit complicated, the benefits are 4-5% faster speed and 7-8% lower peak heap utilization with my favorite byte-force-recompile benchmark, at the cost of a 2% larger (64-bit) dumped executable.

As with most general-case algorithms, some special scenarios may hurt the allocator. For example, allocating a lot of 3K vectors wastes ~25% of vector block space, and allocating a lot of mid-size vectors followed by a lot of small vectors (or vice versa) will trigger excessive splitting or coalescing work. But I believe this code is balanced enough to have a measurable advantage over the current scheme for most normal usage patterns.

Dmitry
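
For illustration, here is a rough C sketch of the small-vector allocation path described above (exact fit, then best fit with splitting, then bump-pointer allocation from the current or a new block). All names and constants here (BLOCK_BYTES, free_lists, allocate_small_vector, ...) are invented for the sketch and are not the actual patch; GC marking, the large-vector list, and the sweep/coalescing pass are not shown.

/* Rough sketch only: illustrative names and constants, not the actual
   patch.  Small-vector allocation path; marking, large vectors and
   the sweep/coalesce pass are omitted.  */

#include <stddef.h>
#include <stdlib.h>

#define BLOCK_BYTES 4096		/* near-to-PAGE_SIZE block */
#define WORD sizeof (void *)
#define MAX_FREELIST (BLOCK_BYTES / WORD + 1)

struct free_vector { struct free_vector *next; size_t nbytes; };

struct block
{
  struct block *next;
  size_t used;			/* bump-pointer offset into data */
  char data[BLOCK_BYTES];
};

static struct block *current_block;
/* One free list per possible (word-aligned) free vector size.  */
static struct free_vector *free_lists[MAX_FREELIST];

static size_t
freelist_index (size_t nbytes)
{
  return nbytes / WORD;
}

/* Put the NBYTES chunk at P on the appropriate free list.  */
static void
setup_on_free_list (void *p, size_t nbytes)
{
  struct free_vector *v = p;
  v->nbytes = nbytes;
  v->next = free_lists[freelist_index (nbytes)];
  free_lists[freelist_index (nbytes)] = v;
}

static struct block *
allocate_block (void)
{
  struct block *b = malloc (sizeof *b);
  if (!b)
    abort ();
  b->used = 0;
  b->next = current_block;
  current_block = b;
  return b;
}

/* Allocate NBYTES for a small vector.  NBYTES is assumed to be a
   multiple of the word size and to fit into one block.  */
void *
allocate_small_vector (size_t nbytes)
{
  size_t index = freelist_index (nbytes);

  /* 1. Exact fit from the free list of this size.  */
  if (free_lists[index])
    {
      struct free_vector *v = free_lists[index];
      free_lists[index] = v->next;
      return v;
    }

  /* 2. Best fit: take the smallest larger free vector and put the
     unused tail back on the appropriate free list.  */
  for (size_t i = index + 1; i < MAX_FREELIST; i++)
    if (free_lists[i])
      {
        struct free_vector *v = free_lists[i];
        size_t rest = v->nbytes - nbytes;
        free_lists[i] = v->next;
        if (rest >= sizeof (struct free_vector))
          setup_on_free_list ((char *) v + nbytes, rest);
        return v;
      }

  /* 3. Bump-pointer allocation from the current block.  If the
     request doesn't fit, allocate a new block; the unused tail of
     the previous block becomes a free vector.  */
  if (!current_block || BLOCK_BYTES - current_block->used < nbytes)
    {
      if (current_block
          && BLOCK_BYTES - current_block->used >= sizeof (struct free_vector))
        setup_on_free_list (current_block->data + current_block->used,
                            BLOCK_BYTES - current_block->used);
      allocate_block ();
    }
  void *p = current_block->data + current_block->used;
  current_block->used += nbytes;
  return p;
}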