From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.devel Subject: hastbables in scheme, are they slow? using goops is this madness Date: Tue, 22 Feb 2022 00:05:43 +0100 Message-ID: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="0000000000004cfde905d88f46d7" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="17658"; mail-complaints-to="usenet@ciao.gmane.io" To: guile-devel Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Tue Feb 22 00:07:00 2022 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nMHlj-0004P2-AH for guile-devel@m.gmane-mx.org; Tue, 22 Feb 2022 00:06:59 +0100 Original-Received: from localhost ([::1]:47600 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nMHli-00041q-8E for guile-devel@m.gmane-mx.org; Mon, 21 Feb 2022 18:06:58 -0500 Original-Received: from eggs.gnu.org ([209.51.188.92]:51248) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nMHkl-00041h-8v for guile-devel@gnu.org; Mon, 21 Feb 2022 18:05:59 -0500 Original-Received: from [2607:f8b0:4864:20::42a] (port=33290 helo=mail-pf1-x42a.google.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nMHkj-0004Z2-89 for guile-devel@gnu.org; Mon, 21 Feb 2022 18:05:58 -0500 Original-Received: by mail-pf1-x42a.google.com with SMTP id d17so10185599pfl.0 for ; Mon, 21 Feb 2022 15:05:56 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:from:date:message-id:subject:to; bh=MWbK0FsEXzSn7CiEF4dbZdGTW+UcEcD95khnhM9JBJY=; b=ekRzP5c9cvahVfY9e0i+9r4t6aJnbZLhliFODVQFPcdk/o7fZJd0iVm+CEq25dU+Qp Wz1Uv8LKP/8mVJcBgU9ZTOORvAuB5yViq3CLzevU2aXmfsdi5yAw+W/K4azrGv9ULW4q K19b55s7P0I1U4euw24PVpHrObGARiHyyBxC5R2zHELG41qzzuNBTHzYAxW+koDqSJia 2YgFCoqa4APP1stkvCoboJlMBSehFCOI0UtJHQ9wpF34XmN/eVrNcIYBgH2s3TZ4rGoq nEqGP4phvHY1euqOhJN5Pdw9xxdaHmsL4x1KT8Kk6vtjDVpEsoEpNXhKbc/dcifgk7vt R1JQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=MWbK0FsEXzSn7CiEF4dbZdGTW+UcEcD95khnhM9JBJY=; b=iqEYkLGcJF4jpCylWWvTbHeXOrJ5rZ2y4o1+jQhr6NyXvRahXXs6Avvjv5jPY+HAfg 5NjgzitDZ1Gj2VAPun2USPCRV+reAK+zr74mZMYqujKTtMY6PZclsS8i4O5gvRiVOMOf LLVhNCP1UJG9RrMhFoTtudcmdzFdXHTqCmgE5ut2bjOMFrh2xUZyzy8Gwb+GwF98fHSe lQIXfTMW7jrWoAd+FO5M8ImF1i20aZ5mPxQs/OZ+Ibf1c9Z9S46M0iuWHg5LkZOv7z41 w35sa+Zm3PTfVlSpcOfalb14u/YMvzj6bjVXM0qh/9YTWZ6U1SYYTvMUoP1QQiRXNoLS qAbQ== X-Gm-Message-State: AOAM531xITCr95lHYY0HPdteFf70Q/7GO5zG9ZWdseSJ3FUMJx8SuQgv 8ljpPV/irXS1gx3roZh1Wlb03K4p/R9OoVv147IMkdUC0zo= X-Google-Smtp-Source: ABdhPJyWY97K/FbpzeYeZUi3o4jtR3K0Dojf2mytRoo6d8Mz9tJaAaQUjXqeiqq9oAJNEooP8kJuRbBourhbwqn+GtM= X-Received: by 2002:a63:de55:0:b0:374:2526:3596 with SMTP id y21-20020a63de55000000b0037425263596mr8127135pgi.592.1645484754925; Mon, 21 Feb 2022 15:05:54 -0800 (PST) X-Host-Lookup-Failed: Reverse DNS lookup failed for 2607:f8b0:4864:20::42a (failed) Received-SPF: pass client-ip=2607:f8b0:4864:20::42a; envelope-from=stefan.itampe@gmail.com; helo=mail-pf1-x42a.google.com X-Spam_score_int: -6 X-Spam_score: -0.7 X-Spam_bar: / X-Spam_report: (-0.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, PDS_HP_HELO_NORDNS=0.659, RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.io gmane.lisp.guile.devel:21140 Archived-At: --0000000000004cfde905d88f46d7 Content-Type: text/plain; charset="UTF-8" Hi, I did some research about guile's hashtables and have tried a few versions of it in guile scheme from using enormous macro expansions to goops object oriented layers. What I found is 1. For small hashtables, scheme based solutions are superior to C based 2-3X faster for a hastables of size 40 (let lp ((i 0)) (if (< i 2000000) (let lp2 ((j 0)) (if (< j 40) (begin (hashq-set! h j (+ 1 (hashq-ref h j 0)) (lp2 (+ j 2)) (lp (+ i 1))) 2. For very large hash tables C based solutions are about 1.5-2.0 faster. (for-each (lambda (i) (hashq-set! h i i)) (iota 20000000)) 3. iterating over the hashtable is 15X faster with a scheme based solution and then it is also preserving the order of insertion, we are not entirely restricted to fold/for-each but all proper loops are this speedy. This means that the overhead is about as much as using spaghetti stack one shot generators. This is actually 5X faster than python3 (but python are better on large examples due to the fact that (hash n) = n for numbers in python. (hash-fold (lambda (k v s) (+ v s)) 0 h) 4. I experimented with goops and a more functional approach and a pure crazy macro everything approach. Some function needs to be done as a macro, but compartmetizing and organizing the hash environment with goops is about 10% slower than an expand everything strategy. We use struct-ref extensible to cheat a little and and also, although tha accessors and remove functions are goops methods, we can grovel the pure functions inside the goops architecture like so (with-hash-accessors (H : href hset hremove) ... (hset k (+ 12 (href k 0))) ...) So with the goops database we easily construct all the normal accessors and have a nice extensible system in case one would like to use more general hash and checking predicates. Code: https://gitlab.com/python-on-guile/python-on-guile/-/blob/master/modules/ice-9/hashish.scm For small hashes, it is better to use a vector or assoc and the system dispatches between those two regions, here a C based helper function could speed things up, but that's the only application of C that is valuable and it is pretty limited in scope. Removing is done by setting the cdr of the handle to a special tag and a counter is incremented. When iterating those slots are jumped over and when there is enough space, the hashtable is copied over to a fresh new one. This means that removing and restoring the same key is not allocating if done close in time and also it simplifies the managing of the insertion order property. The class is mainly assoc = vector which contains the handles in insertion order bins = vector where the hash index the assoc which is searched n = number of elements in the assoc insertion order vector m = number of deleted elements Fun fact, we can store a hash as well in the class if we like, because the order is not important in many applications semantically, one can just take the xor of all the elements in the datastructure, there are different variants here, like only for the keys to get more of a set hash or both to get the total hash. This means that it is very fast usually to discover if two hashmaps or sets are different. Any thoughts or comments are welcomed. LICENSE LGPL v2 or v3 --0000000000004cfde905d88f46d7 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi,

I did some research about guile'= ;s hashtables and have tried a few versions of it in guile scheme from usin= g enormous=C2=A0macro expansions to goops object oriented layers. What I fo= und is

1. For small hashtables, scheme based solut= ions are superior to C based
2-3X faster for a hastables=C2=A0of = size 40

(let lp ((i 0)) (if (< i 2000000) (= let lp2 ((j 0)) (if (< j 40) (begin (hashq-set! h j (+ 1 (hashq-ref h j = 0)) (lp2 (+ j 2)) (lp (+ i 1)))


2. = For very large hash tables=C2=A0C based solutions are about 1.5-2.0 faster.= =C2=A0
(for-each (lambda (i) (hashq-set! h i i)) (iota 20000000))=

3. iterating over the hashtable is 15X faster wit= h a scheme based solution and then it is also preserving the=C2=A0order of = insertion, we are not entirely=C2=A0restricted to fold/for-each but all pro= per loops are this speedy. This means that the overhead is about as much as= using spaghetti=C2=A0stack one shot generators. This is actually=C2=A05X f= aster=C2=A0than=C2=A0python3 (but python are better on large examples due t= o the fact that (hash n) =3D n for numbers in python.

(hash-fold (lambda (k v s) (+= v s)) 0 h)

4. I experimented with goop= s and a more functional approach and a pure crazy macro everything approach= . Some function needs to be done as a macro, but compartmetizing and organi= zing the hash environment with goops is about 10% slower than an expand eve= rything strategy. We use struct-ref extensible to cheat a little and and al= so, although tha accessors and remove functions are goops methods, we can g= rovel the pure functions inside the goops architecture like so
(with-hash-accessors (H : href hset hremove)
=C2=A0 = =C2=A0 =C2=A0...=C2=A0
=C2=A0 =C2=A0 =C2=A0(hset k (+ 12 (href k = 0)))
=C2=A0 =C2=A0 =C2=A0...)

So with th= e goops database we easily=C2=A0construct all the normal accessors and have= a nice extensible system in case one would like to use more general hash a= nd checking predicates.=C2=A0

Code:

Fo= r small hashes, it is better to use a vector or assoc and the system dispat= ches between those two regions, here a C based helper function could speed = things up, but that's the only application of C that is valuable and it= is pretty limited in scope.

Removing is done by s= etting the cdr of the handle to a special tag and a counter is incremented.= When iterating those slots are jumped over and when there is enough space,= the hashtable is copied over to a fresh new one. This means that removing = and restoring the same key is not allocating if done close in time and also= it simplifies the managing of the insertion order property.

=
The class is mainly
assoc=C2=A0 =3D vector which conta= ins the handles in insertion order
bins=C2=A0 =C2=A0 =C2=A0=3D ve= ctor where the hash index the assoc which is searched
n=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =3D number of elements in the assoc insertion orde= r vector
m=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0=3D number of deleted= elements

Fun fact, we can store a hash as well in= the class if we like, because the order is not important in many applicati= ons semantically, one can just take the xor of all the elements in the data= structure, there are different variants here, like only for the keys to get= more of a set hash or both to get the total hash. This means that it is ve= ry fast usually to discover if two hashmaps or sets are different.

Any thoughts or comments are welcomed.

LICENSE
LGPL v2 or v3
--0000000000004cfde905d88f46d7--