unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* hashtables
@ 2022-04-06 12:16 Stefan Israelsson Tampe
  2022-04-06 15:10 ` hashtables Dr. Arne Babenhauserheide
  0 siblings, 1 reply; 2+ messages in thread
From: Stefan Israelsson Tampe @ 2022-04-06 12:16 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3271 bytes --]

Hi all,

After optimising the hashtable implementation I conclude the following,
1. + 0.25s  10M lookups in small hashtable  (0.36s for guiles current)
2. + 6X faster for a large table scanning numbers
3. + The general hash interface is faster also because it does not call SCM
from C
4. + fast (2X) for relative large random working sets although the
hashtable is large
5. - we do not suport the assoc generalisation (sort off though)
6. Compact (low memory overhead)

To note the design allows high performance for non uniform hashes e.g. we
can allow it to not spread out the hashes as optimal e.g. we can grow the
bins quite well and still maintain speed. This is especially well suited
for python hash strategies where integers just translate to integers.

This uses a datastructures that take advantage of SIMD instructions to
quickly find matches and vacant slots. Also an lru strategy is included for
speedup when the hash is large and the working set is not as large.

This is with the low level primitives. I will continue now with the higher
level primitives.

Let me know if you want to have this supported in guile as I have defined
some vm operations that are needed else this will be about 1.5 2x slower on
small hashtables.

Code:
;; This is the slow path where the bin is larger than 15
;; elements
(define (stis-d-ref v k default h eq)
  (let ((l (stis-b-alist-ref v h)))
    (let lp ((ll l))
      (if (pair? ll)
          (let ((it (car ll)))
             (if (eq k (car it))
                  it
                  (lp (cdr ll))))
          default))))


;; stis-b-search v is hashtable, k is key, h is hashvalue eq
;; the eqality predicate, stis-e-ref is the underlining lru
;; VM instruction used to lookup most (key . value)
(define-inlinable (stis-c-ref v k default h eq)
  (define (slow-ref) (stis-d-ref v k default h eq))
    (aif it (stis-e-search v h)
         (if (pair? it)
             (if (eq (car it) k)
                 it
                 (slow-ref))
             (slow-ref))
          default))

;; This is the slow set where one needs to cons the pair
;; on the last slot that is the assoc
(define (stis-d-set! v k val h eq)
  (let ((l (stis-b-alist-ref v h)))
    (let lp ((ll l))
      (if (pair? ll)
          (let ((it (car ll)))
             (if (eq k (car it))
                 (begin
                    (set-cdr! it val)
                    it)
                 (lp (cdr ll))))
           (let* ((it (cons k val))
             (l  (cons it l)))
             (stis-b-alist-set! v h l)
             it)))))

;; This is the storing of a (k . val) pair in the
;; stis hash data structure
;; if we find a handle, set it else if the key hash is
;; missmatching, store it on the assoc, else if not a match
;; add it to the list via the c-primitive stis-b-add!
(define-inlinable (stis-c-set! v k val h eq)
  (define (slow-set!) (stis-d-set! v k val h eq))
    (aif it (stis-e-search v h)
         (if (pair? it)
             (if (eq (car it) k)
                 (begin
                   (set-cdr! it val)
                   it)
                 (slow-set!))
             (slow-set!))
         (let ((it (cons k val)))
            (stis-b-add! v h it)
            it)))

Code:
see my previous post, but it's in guile-persist (gitlab)

[-- Attachment #2: Type: text/html, Size: 5078 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

* Re: hashtables
  2022-04-06 12:16 hashtables Stefan Israelsson Tampe
@ 2022-04-06 15:10 ` Dr. Arne Babenhauserheide
  0 siblings, 0 replies; 2+ messages in thread
From: Dr. Arne Babenhauserheide @ 2022-04-06 15:10 UTC (permalink / raw)
  To: Stefan Israelsson Tampe; +Cc: guile-devel

[-- Attachment #1: Type: text/plain, Size: 1373 bytes --]


Stefan Israelsson Tampe <stefan.itampe@gmail.com> writes:

> After optimising the hashtable implementation I conclude the following,
> 1. + 0.25s  10M lookups in small hashtable  (0.36s for guiles current)
> 2. + 6X faster for a large table scanning numbers

What do you mean by "large taable scanning numbers"? (I don’t understand)

> 3. + The general hash interface is faster also because it does not call SCM from C
> 4. + fast (2X) for relative large random working sets although the hashtable is large

That would likely directly speed up code I have by factor 2. 

> 5. - we do not suport the assoc generalisation (sort off though)

What do you mean by assoc generalization?

> 6. Compact (low memory overhead)

All in all these improvements sound great! And low memory overhead is
the icing on the cake.

How big is the remaining overhead? (compared to storing the values in a vector?)

> Let me know if you want to have this supported in guile as I have defined some vm operations that are needed else this will be about 1.5 2x slower on small hashtables.

Since I do not know the implications of this on the VM, I cannot give an
answer.

If Andy Wingo doesn’t object, then I think these improvements would be
awesome to have!

Best wishes,
Arne
-- 
Unpolitisch sein
heißt politisch sein,
ohne es zu merken.
draketo.de

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 1125 bytes --]

^ permalink raw reply	[flat|nested] 2+ messages in thread

end of thread, other threads:[~2022-04-06 15:10 UTC | newest]

Thread overview: 2+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-04-06 12:16 hashtables Stefan Israelsson Tampe
2022-04-06 15:10 ` hashtables Dr. Arne Babenhauserheide

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).