unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* assocs
@ 2022-03-20 17:27 Stefan Israelsson Tampe
  0 siblings, 0 replies; only message in thread
From: Stefan Israelsson Tampe @ 2022-03-20 17:27 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 2483 bytes --]

from working a bit with huge hashmaps I started to look into the usage
scenario of small hashmaps that are referenced a lot of time. And to
benchmark those I was led to study simple assoc lists.

The observations I found,
1. Call overhead is too large for a simple lookup. It comes from computing
the hash value and  the actual lookup call. So essentially to make it
superfast you need to
1. Make sure that simple hash functions (hashq) are a VM operation. Current
hash implementation take advantage of living in C land, but this design is
not lending itself well to a performant extensible mechanism, sure there is
hashx and friends, but they are horrible slow. Now for example I would like
to make a new hash function that is super fast for numbers and the same as
the other for the rest. That's not doable as any gain one can make is
destroyed by hashx's overhead. Anyhow a simple example with a hash of 16
elements can in a naive implementation potentially be more than 2X as fast
as guile's internal tool.

So I was wondering about how to improve the lookup capabilities of a simple
assoc. My conclusion is that much can be done with bit fiddling and
currently C is superior when it comes to that. Just guide gcc a little and
speed is there in the form of vectorsation. So for 8 64bit handles, we can
have a an array of the format,

Word1:   8 1byte hashes
Word2:   4 2byte hashes
Word3:   4 2byte hashes
Word4:   SCM Data1
...
Word11:  SCM Data8
Word12;  Rest (cons list)

Now as you see the memory overhead is low compared to all the data slot
that we manage and this allows us to with just one check decide if the hash
key is present e.g. we gain an extra 16 bit hash length to decide about
equality. And designing it like this means that with one comparison you can
check if the target might be in the 8 slot's.  and you are close to a
single lookup in the pure map. To see if there is an equality you go about
with divide and conquer so with 5 checks you have a 16bit extra
verification that with quite certainty you have found a match. Looks messy!
Well, it turns out that making a program that writes programs is well
suited for this case so I am about to test run a new case with this. Well
with this approach we increase the number of bits used to match for hashes
with low extra overhead.

But aren't it better to just use a simple approach with a vector and assoc
lists? Well for larger hashmaps one encounter issues with




As you see, the ovehead is really really

[-- Attachment #2: Type: text/html, Size: 2928 bytes --]

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2022-03-20 17:27 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-03-20 17:27 assocs Stefan Israelsson Tampe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).