unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* hastbables in scheme, are they slow? using goops is this madness
@ 2022-02-21 23:05 Stefan Israelsson Tampe
  2022-02-22  7:14 ` Maxime Devos
  0 siblings, 1 reply; 2+ messages in thread
From: Stefan Israelsson Tampe @ 2022-02-21 23:05 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3269 bytes --]

Hi,

I did some research about guile's hashtables and have tried a few versions
of it in guile scheme from using enormous macro expansions to goops object
oriented layers. What I found is

1. For small hashtables, scheme based solutions are superior to C based
2-3X faster for a hastables of size 40

(let lp ((i 0)) (if (< i 2000000) (let lp2 ((j 0)) (if (< j 40) (begin
(hashq-set! h j (+ 1 (hashq-ref h j 0)) (lp2 (+ j 2)) (lp (+ i 1)))


2. For very large hash tables C based solutions are about 1.5-2.0 faster.
(for-each (lambda (i) (hashq-set! h i i)) (iota 20000000))

3. iterating over the hashtable is 15X faster with a scheme based solution
and then it is also preserving the order of insertion, we are not
entirely restricted to fold/for-each but all proper loops are this speedy.
This means that the overhead is about as much as using spaghetti stack one
shot generators. This is actually 5X faster than python3 (but python are
better on large examples due to the fact that (hash n) = n for numbers in
python.

(hash-fold (lambda (k v s) (+ v s)) 0 h)

4. I experimented with goops and a more functional approach and a pure
crazy macro everything approach. Some function needs to be done as a macro,
but compartmetizing and organizing the hash environment with goops is about
10% slower than an expand everything strategy. We use struct-ref extensible
to cheat a little and and also, although tha accessors and remove functions
are goops methods, we can grovel the pure functions inside the goops
architecture like so

(with-hash-accessors (H : href hset hremove)
     ...
     (hset k (+ 12 (href k 0)))
     ...)

So with the goops database we easily construct all the normal accessors and
have a nice extensible system in case one would like to use more general
hash and checking predicates.

Code:
https://gitlab.com/python-on-guile/python-on-guile/-/blob/master/modules/ice-9/hashish.scm

For small hashes, it is better to use a vector or assoc and the system
dispatches between those two regions, here a C based helper function could
speed things up, but that's the only application of C that is valuable and
it is pretty limited in scope.

Removing is done by setting the cdr of the handle to a special tag and a
counter is incremented. When iterating those slots are jumped over and when
there is enough space, the hashtable is copied over to a fresh new one.
This means that removing and restoring the same key is not allocating if
done close in time and also it simplifies the managing of the insertion
order property.

The class is mainly
assoc  = vector which contains the handles in insertion order
bins     = vector where the hash index the assoc which is searched
n          = number of elements in the assoc insertion order vector
m         = number of deleted elements

Fun fact, we can store a hash as well in the class if we like, because the
order is not important in many applications semantically, one can just take
the xor of all the elements in the datastructure, there are different
variants here, like only for the keys to get more of a set hash or both to
get the total hash. This means that it is very fast usually to discover if
two hashmaps or sets are different.

Any thoughts or comments are welcomed.

LICENSE
LGPL v2 or v3

[-- Attachment #2: Type: text/html, Size: 4089 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-02-22  7:14 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-21 23:05 hastbables in scheme, are they slow? using goops is this madness Stefan Israelsson Tampe
2022-02-22  7:14 ` Maxime Devos

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).