From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.devel Subject: lru hashes Date: Fri, 18 Mar 2022 00:07:55 +0100 Message-ID: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="000000000000f8109e05da721a30" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="14393"; mail-complaints-to="usenet@ciao.gmane.io" To: guile-devel Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Fri Mar 18 00:08:36 2022 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1nUzES-0003ZH-Ae for guile-devel@m.gmane-mx.org; Fri, 18 Mar 2022 00:08:36 +0100 Original-Received: from localhost ([::1]:60342 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nUzER-0008Ru-9c for guile-devel@m.gmane-mx.org; Thu, 17 Mar 2022 19:08:35 -0400 Original-Received: from eggs.gnu.org ([209.51.188.92]:58620) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nUzED-0008RU-Ts for guile-devel@gnu.org; Thu, 17 Mar 2022 19:08:21 -0400 Original-Received: from [2607:f8b0:4864:20::633] (port=41785 helo=mail-pl1-x633.google.com) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1nUzEB-0008By-JH for guile-devel@gnu.org; Thu, 17 Mar 2022 19:08:21 -0400 Original-Received: by mail-pl1-x633.google.com with SMTP id z3so5682165plg.8 for ; Thu, 17 Mar 2022 16:08:18 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:from:date:message-id:subject:to; bh=ofI9eLhtyRBkITN/Sy0lwH8O21NFAgH8EzywZdeP/SM=; b=oBsDqlq6vxo9F7sOJS3927+/KOBe9bPc0yqJdTr53qUsq5Ry15LpAlwzll6HwHAgQX lt+J2csuG3dAr1lecSo+PPU1pPqvCOmxl84I0Obxn+WjcY1ohKAfh4BCiNC9PqpTHObG jZODfto3UKQDoawlISI8CF5cJIQv/uiQXktmhPFqA9uJ8gYZSZR0mrBN2BvqaPI+6aMJ mSH2AnkK5j7kWwfxK9xF1bK1Xpzh1Do0FoVwtaMWdtM4nfegHVMte0yJD2UDKeudmZHY b+yuFOSgYN44NI16nN+pPfih1pj7VXOIphWjsLEyFEUMmBW5MPkmqUzwiuFpiQq5nS+T MBug== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=ofI9eLhtyRBkITN/Sy0lwH8O21NFAgH8EzywZdeP/SM=; b=I5nvIBGRfzyjujH/8jq19vALX9VCdjRY6lQFwedn5KOfhC3YQdNZ7Rn9OIHOB/PoKu R99g1alVI5k5cineFCGF/VYKbnwiU+pmIsTR7xYceVk68tZifYNmyTr2Gp/K30kyGjD4 UpdjBLKfDGMIhCNkW1Er5CivG7oiT2h3OKzyX1eOk63m/T8ovMC4UxDWewuFq9g0rRi9 d/PUP5yNaA5ntIFx2cOGdZWYqJZiwzui+DBa9y8Kqp3qlIxN1aCTICXQXFeoVJkfQdaN kdz/3wcK5Je8EMPUJdxBqMlHjyK0Cbqo1Aj96cADFOg5CQCk7sowKQq7ksSWDLAEwmSm tAyg== X-Gm-Message-State: AOAM530lsn6TEuInEgcyBf+6TBxa1PEVKzVsH/zV5GjCUOCrFQL+UTAP 1KJQDzVzIKQU1fD4lm273VEXLHDyaIka7lzY1rD8DA5h6yQ= X-Google-Smtp-Source: ABdhPJxn9Mi2JCRN1KWXIfFiqeIlLn1zTvFwWbWHhZDmd0RqjZj7COE/tdy1xSvoKBTjF39Ylq7wFUT4zYi8BTaexEM= X-Received: by 2002:a17:90b:4b8a:b0:1c6:35d1:4418 with SMTP id lr10-20020a17090b4b8a00b001c635d14418mr17910050pjb.176.1647558497143; Thu, 17 Mar 2022 16:08:17 -0700 (PDT) X-Host-Lookup-Failed: Reverse DNS lookup failed for 2607:f8b0:4864:20::633 (failed) Received-SPF: pass client-ip=2607:f8b0:4864:20::633; envelope-from=stefan.itampe@gmail.com; helo=mail-pl1-x633.google.com X-Spam_score_int: -6 X-Spam_score: -0.7 X-Spam_bar: / X-Spam_report: (-0.7 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, PDS_HP_HELO_NORDNS=0.659, RCVD_IN_DNSWL_NONE=-0.0001, RDNS_NONE=0.793, SPF_HELO_NONE=0.001, SPF_PASS=-0.001, T_SCC_BODY_TEXT_LINE=-0.01 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.io gmane.lisp.guile.devel:21169 Archived-At: --000000000000f8109e05da721a30 Content-Type: text/plain; charset="UTF-8" Here is an idea for guile hash tables. So consider n bins, struct bin x[n]; where the bin is defined by, struct bin { unioni m128 m, SCM h, SCM v, } and the matching structure is a MMS friendly datastructure union m128 { uint8_t v[16]; uint128_t x; } let c be the constant 0x01010101010101010101010101010101 Then if k is uint8_t, we can distribute k over all v as such y = c * k And in the binwe can define z = (y == x[i].m) This will put ones at the bytes where the bytes matches. We can now find the index of the first nonzero byte in z and hence find a matching index if there is a match. So with just a few instructions you will find a matching index among 16 bytes. This can lead to insane speed in scanning a list of key value pairs and I noted that scanning 1000 elements takes about 100ns on my laptop. Now over to hashes the idea is h is the head and x is a vector that are 4 or 8 or 16 byte long and stores the actual key/value handles. the last slot however will be kept for a normal assoc in case the 15 slots are not enough or there are hash clashes. The happy case for setting a new value are that you will find zero matches in just essentially a few operations although the bin is filled with 15 elements. And one hence setting a new value is essentially setting the data, not checking if there is a bin. Now over to finding a value that are included in the hashtable. Here the happy case are that you find out that the slot is first (h) and then after a equal check of the key part of the handle one have found the handle (this is as good as one can hope). If we find the slot in one of the other 15 slots a move to front is used in order to try to keep that active set relevant and fast. In case we scan the hole hashtable one should also keep a scanning list beside the hash table and use that instead of the hash table structure so essentially there is very few use cases where this hashtable has a downside. Also one would probably store hashtables of size < 128 or such as a scanning list as this can be made very fast. Now this method compresses the main table, we are talking about 4scm values for each bin and we can maybe have 12-16 scm values in the mean meaning that the size of the main hash is 3-4 times smaller than a normal hash table. This is important as the threshold when the randomisation of memory can mean a dramatic improvement of speed for quite many sizes of hash table. At least this is the theory. You can follow my work at https://gitlab.com/tampe/guile-persist --000000000000f8109e05da721a30 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Here is an idea for guile hash tables.=C2=A0

So consider n bins,
struct bin x[n];

where the bin is defined by,

struct bin
{
unioni m128=C2=A0 =C2=A0 =C2=A0 m,
SCM=C2=A0 =C2= =A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 h,
SCM=C2=A0= =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 v,
}

and the matching structure is a MMS friendly=C2=A0dat= astructure
union m128
{
uint8_t=C2=A0 =C2=A0 = =C2=A0 =C2=A0v[16];
uint128_t=C2=A0 =C2=A0x;
}

let c be the constant=C2=A00x01010101010101010101010101010= 101

Then if k is uint8_t, we can distribute k over= all v as such
y =3D c *=C2=A0 k

And in = the binwe can define
z =3D (y =3D=3D x[i].m)

=
This will put ones at the bytes where the bytes matches. We can now fi= nd the index of the first nonzero byte in z and hence find a matching index= if there is a match. So with just a few instructions you will find a match= ing index among 16 bytes. This can lead to insane speed in scanning a list = of key value pairs and I noted that scanning 1000 elements takes about 100n= s on my laptop.=C2=A0

Now over to hashes the idea = is h is the head and x is a vector that are 4 or 8 or 16 byte long and stor= es the actual key/value handles. the last slot however will be kept for a n= ormal assoc in case the 15 slots are not enough or there are hash clashes.<= /div>

The happy case for setting a new value are that yo= u will find zero matches in just essentially a few operations although the = bin is filled with 15 elements. And one hence setting a new value is essent= ially setting the data, not checking if there is a bin.

Now over to finding a value that are included in the hashtable. Here = the happy case are that you find out that the slot is first (h)=C2=A0 and t= hen after a equal check of the key part of the handle one have found the ha= ndle (this is as good as one can hope). If we find the slot in one of the o= ther 15 slots a move to front is used in order to try to keep that active= =C2=A0set relevant and fast. In case we scan the hole hashtable one should = also keep a scanning list beside the hash table and use that instead=C2=A0o= f the hash table structure so essentially there is very few use cases where= this hashtable has a downside. Also one would probably store hashtables of= size < 128 or such as a scanning list as this can be made very fast.

Now this method compresses the main table, we are ta= lking about 4scm values for each bin and we can maybe have 12-16 scm values= in the mean meaning that the size of the main hash is 3-4 times smaller th= an a normal hash table. This is important as the threshold when the randomi= sation of memory can mean a dramatic improvement of speed for quite many si= zes of hash table.

At least this is the theory. Yo= u can follow my work at





--000000000000f8109e05da721a30--