lru hashes

unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed

* lru hashes
@ 2022-03-17 23:07 Stefan Israelsson Tampe
  0 siblings, 0 replies; only message in thread
From: Stefan Israelsson Tampe @ 2022-03-17 23:07 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2569 bytes --]

Here is an idea for guile hash tables.

So consider n bins,
struct bin x[n];

where the bin is defined by,

struct bin
{
unioni m128      m,
SCM                  h,
SCM                  v,
}

and the matching structure is a MMS friendly datastructure
union m128
{
uint8_t       v[16];
uint128_t   x;
}

let c be the constant 0x01010101010101010101010101010101

Then if k is uint8_t, we can distribute k over all v as such
y = c *  k

And in the binwe can define
z = (y == x[i].m)

This will put ones at the bytes where the bytes matches. We can now find
the index of the first nonzero byte in z and hence find a matching index if
there is a match. So with just a few instructions you will find a matching
index among 16 bytes. This can lead to insane speed in scanning a list of
key value pairs and I noted that scanning 1000 elements takes about 100ns
on my laptop.

Now over to hashes the idea is h is the head and x is a vector that are 4
or 8 or 16 byte long and stores the actual key/value handles. the last slot
however will be kept for a normal assoc in case the 15 slots are not enough
or there are hash clashes.

The happy case for setting a new value are that you will find zero matches
in just essentially a few operations although the bin is filled with 15
elements. And one hence setting a new value is essentially setting the
data, not checking if there is a bin.

Now over to finding a value that are included in the hashtable. Here the
happy case are that you find out that the slot is first (h)  and then after
a equal check of the key part of the handle one have found the handle (this
is as good as one can hope). If we find the slot in one of the other 15
slots a move to front is used in order to try to keep that active set
relevant and fast. In case we scan the hole hashtable one should also keep
a scanning list beside the hash table and use that instead of the hash
table structure so essentially there is very few use cases where this
hashtable has a downside. Also one would probably store hashtables of size
< 128 or such as a scanning list as this can be made very fast.

Now this method compresses the main table, we are talking about 4scm values
for each bin and we can maybe have 12-16 scm values in the mean meaning
that the size of the main hash is 3-4 times smaller than a normal hash
table. This is important as the threshold when the randomisation of memory
can mean a dramatic improvement of speed for quite many sizes of hash table.

At least this is the theory. You can follow my work at
 https://gitlab.com/tampe/guile-persist

[-- Attachment #2: Type: text/html, Size: 3212 bytes --]

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2022-03-17 23:07 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-17 23:07 lru hashes Stefan Israelsson Tampe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).