unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
From: Stefan Israelsson Tampe <stefan.itampe@gmail.com>
To: guile-devel <guile-devel@gnu.org>
Subject: hashtables
Date: Wed, 6 Apr 2022 14:16:12 +0200	[thread overview]
Message-ID: <CAGua6m1mmRoGwvO6b1N+9CuyYcQ5NwsyqJCJ=E_HnQ=Lx6ADhQ@mail.gmail.com> (raw)

[-- Attachment #1: Type: text/plain, Size: 3271 bytes --]

Hi all,

After optimising the hashtable implementation I conclude the following,
1. + 0.25s  10M lookups in small hashtable  (0.36s for guiles current)
2. + 6X faster for a large table scanning numbers
3. + The general hash interface is faster also because it does not call SCM
from C
4. + fast (2X) for relative large random working sets although the
hashtable is large
5. - we do not suport the assoc generalisation (sort off though)
6. Compact (low memory overhead)

To note the design allows high performance for non uniform hashes e.g. we
can allow it to not spread out the hashes as optimal e.g. we can grow the
bins quite well and still maintain speed. This is especially well suited
for python hash strategies where integers just translate to integers.

This uses a datastructures that take advantage of SIMD instructions to
quickly find matches and vacant slots. Also an lru strategy is included for
speedup when the hash is large and the working set is not as large.

This is with the low level primitives. I will continue now with the higher
level primitives.

Let me know if you want to have this supported in guile as I have defined
some vm operations that are needed else this will be about 1.5 2x slower on
small hashtables.

Code:
;; This is the slow path where the bin is larger than 15
;; elements
(define (stis-d-ref v k default h eq)
  (let ((l (stis-b-alist-ref v h)))
    (let lp ((ll l))
      (if (pair? ll)
          (let ((it (car ll)))
             (if (eq k (car it))
                  it
                  (lp (cdr ll))))
          default))))


;; stis-b-search v is hashtable, k is key, h is hashvalue eq
;; the eqality predicate, stis-e-ref is the underlining lru
;; VM instruction used to lookup most (key . value)
(define-inlinable (stis-c-ref v k default h eq)
  (define (slow-ref) (stis-d-ref v k default h eq))
    (aif it (stis-e-search v h)
         (if (pair? it)
             (if (eq (car it) k)
                 it
                 (slow-ref))
             (slow-ref))
          default))

;; This is the slow set where one needs to cons the pair
;; on the last slot that is the assoc
(define (stis-d-set! v k val h eq)
  (let ((l (stis-b-alist-ref v h)))
    (let lp ((ll l))
      (if (pair? ll)
          (let ((it (car ll)))
             (if (eq k (car it))
                 (begin
                    (set-cdr! it val)
                    it)
                 (lp (cdr ll))))
           (let* ((it (cons k val))
             (l  (cons it l)))
             (stis-b-alist-set! v h l)
             it)))))

;; This is the storing of a (k . val) pair in the
;; stis hash data structure
;; if we find a handle, set it else if the key hash is
;; missmatching, store it on the assoc, else if not a match
;; add it to the list via the c-primitive stis-b-add!
(define-inlinable (stis-c-set! v k val h eq)
  (define (slow-set!) (stis-d-set! v k val h eq))
    (aif it (stis-e-search v h)
         (if (pair? it)
             (if (eq (car it) k)
                 (begin
                   (set-cdr! it val)
                   it)
                 (slow-set!))
             (slow-set!))
         (let ((it (cons k val)))
            (stis-b-add! v h it)
            it)))

Code:
see my previous post, but it's in guile-persist (gitlab)

[-- Attachment #2: Type: text/html, Size: 5078 bytes --]

             reply	other threads:[~2022-04-06 12:16 UTC|newest]

Thread overview: 2+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2022-04-06 12:16 Stefan Israelsson Tampe [this message]
2022-04-06 15:10 ` hashtables Dr. Arne Babenhauserheide

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://www.gnu.org/software/guile/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to='CAGua6m1mmRoGwvO6b1N+9CuyYcQ5NwsyqJCJ=E_HnQ=Lx6ADhQ@mail.gmail.com' \
    --to=stefan.itampe@gmail.com \
    --cc=guile-devel@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).