From mboxrd@z Thu Jan 1 00:00:00 1970 Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail From: Stefan Israelsson Tampe Newsgroups: gmane.lisp.guile.devel Subject: super-bit-vectors Date: Tue, 7 Sep 2021 21:26:05 +0200 Message-ID: Mime-Version: 1.0 Content-Type: multipart/alternative; boundary="00000000000049754205cb6cbd83" Injection-Info: ciao.gmane.io; posting-host="blaine.gmane.org:116.202.254.214"; logging-data="15985"; mail-complaints-to="usenet@ciao.gmane.io" To: guile-devel Original-X-From: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Tue Sep 07 21:26:32 2021 Return-path: Envelope-to: guile-devel@m.gmane-mx.org Original-Received: from lists.gnu.org ([209.51.188.17]) by ciao.gmane.io with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.92) (envelope-from ) id 1mNgjo-00044O-Bg for guile-devel@m.gmane-mx.org; Tue, 07 Sep 2021 21:26:32 +0200 Original-Received: from localhost ([::1]:47794 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1mNgjn-0004br-Ep for guile-devel@m.gmane-mx.org; Tue, 07 Sep 2021 15:26:31 -0400 Original-Received: from eggs.gnu.org ([2001:470:142:3::10]:37058) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1mNgjc-0004aA-DH for guile-devel@gnu.org; Tue, 07 Sep 2021 15:26:20 -0400 Original-Received: from mail-pl1-x62d.google.com ([2607:f8b0:4864:20::62d]:42732) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1mNgja-0004qo-J2 for guile-devel@gnu.org; Tue, 07 Sep 2021 15:26:20 -0400 Original-Received: by mail-pl1-x62d.google.com with SMTP id n4so6449578plh.9 for ; Tue, 07 Sep 2021 12:26:17 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20210112; h=mime-version:from:date:message-id:subject:to; bh=oywcxvFWvyOa3qJV8MpsQ9jdgI+CMa2x/LpHWBB0GcI=; b=mseSTo3pxAmS+xQ5+o3/1hck1rP/HgQ+fnu/xGMR88OA159W8Q46yDNLdrW+c3Q7nT ZRFd0P45E0BAsRQ/Y5i36Fc84rSxq+ojdyOBLSFIcOR+5nhn1aXvZYJx2QSvYDqqaxKO cLIThidRyUhPdtuds7C1RVNmh8zm9Z3eJwxnRKU5lmKGe4gTxsaUdAbEINZi4dowYjAN Us9a/7b9Oh84pdMg6+f9d6+JzkWnNH6s0ghVDl5j6DBTAYDKuFABo0rmpnCH4cik80HS MTZhnUUyIcfBuRYC+kQ6EjuNGrqzOf/iReex3GX9bu1BGkZ8zfJ0AXec5jnuVH+HMI36 yjZw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:mime-version:from:date:message-id:subject:to; bh=oywcxvFWvyOa3qJV8MpsQ9jdgI+CMa2x/LpHWBB0GcI=; b=iJnu9YM6eQEPTzpe4xNSoQxzOT4hrvchcN5eVGUUyFoaB+37uR+XVhuDwkGp6GKJEG kJ51zJGOdgrvDLHkAdAfoC2uRCIGHssdqGFORjku76YwcnaAxArbN4XpN32NLrfLwB8Y vS5GioFJ6Ij3p5tn7yml84HqlpyRRRDbGPmHA7EIog6WY0iN0Y04XtfwOPzEc+HrosZu ATJK8tHUZu2VxVp2Xb0NkXQ8bG2Ja08/8MkQSd4BmqLHul58qzjZCMnHLw+2OEmjvI6y S4V9dc2XUGsDfx522fL8HZ/hIpcJIWeE9+tDQrFY7DtyzhESokKlEmEPL98DcFqb3xTe Yn8w== X-Gm-Message-State: AOAM533Y8sDya2aGkpTswTZe1FhAy2wQgR6L6ewCKWNHZmLM8Hbd/lM4 ZV7CsyuvS2xRtwrWXX9hlppcL92+9r2abSaUFnOcc0oi8h0= X-Google-Smtp-Source: ABdhPJzut5ck3GTs1d0l7PJT/FQl6m8GHNvjn+xKXY+NPdrFKAS+xAVcEvAgkw111JtLWCWKImZYUH1iZh1yBNcoX0w= X-Received: by 2002:a17:902:7403:b0:138:e682:43f6 with SMTP id g3-20020a170902740300b00138e68243f6mr15920262pll.14.1631042776166; Tue, 07 Sep 2021 12:26:16 -0700 (PDT) Received-SPF: pass client-ip=2607:f8b0:4864:20::62d; envelope-from=stefan.itampe@gmail.com; helo=mail-pl1-x62d.google.com X-Spam_score_int: -20 X-Spam_score: -2.1 X-Spam_bar: -- X-Spam_report: (-2.1 / 5.0 requ) BAYES_00=-1.9, DKIM_SIGNED=0.1, DKIM_VALID=-0.1, DKIM_VALID_AU=-0.1, DKIM_VALID_EF=-0.1, FREEMAIL_FROM=0.001, HTML_MESSAGE=0.001, RCVD_IN_DNSWL_NONE=-0.0001, SPF_HELO_NONE=0.001, SPF_PASS=-0.001 autolearn=ham autolearn_force=no X-Spam_action: no action X-BeenThere: guile-devel@gnu.org X-Mailman-Version: 2.1.23 Precedence: list List-Id: "Developers list for Guile, the GNU extensibility library" List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guile-devel-bounces+guile-devel=m.gmane-mx.org@gnu.org Original-Sender: "guile-devel" Xref: news.gmane.io gmane.lisp.guile.devel:20843 Archived-At: --00000000000049754205cb6cbd83 Content-Type: text/plain; charset="UTF-8" Hi Maybe you will not know it but bit vectors are sometimes a very good match to represent sets where the world is defined from the beginning. a world of 64 objects can represent subsets of this world as a uint64_t in other words the unit our cpu's are so darn fast at managing. So whenever you plan to spend time on having sets of a moderate set of enums an huge optimisation is to not use a hashmap, but n stead use a bitvector. And as you know, guile ave arbitrary big integers so a world of 128 elements is just a quite small bignum. Now this does not scale to millions where hashmaps starts to excel, but many times you do not need scale. Now what about our favorite set operations you may ask? well we have a translation here set complement = lognot set union = logior set intersection = logand set addition = lugxor And the processor have ops for even some other set operations like finding the first nonzero bit in order to loop through the elements in the set that are 1 (this would be a good addition to guile). In guile-log I have implemented a very featureful prolog matcher on steroids where the bulk of the work is to manage the set of matching elements, where all set operations are used. And is hence super fast for typical implemented predicates as this includes typically have less than 64 cases. So if the number of match cases is below 1000 we typically just keep a bigint to represent the set else we move over to hashmaps. Now un-fourtunately in guile, we do not have self modifying bitwise operators so in guile-log I hack guile i C to have those ops to not waste a lot of memory and time for larger predicates with bigger bit vectors. This is why I hope that guile will keep having big sized immediate integers as copying is so cheap and fast (creating new numbers on the stack). When working with supervectors I realized that a functional approach is sort of viable as we can share bits between different read only structures and even use copy on write if we would like to mutate. Using a supervector approach to bitvectors make sense if we would like to move to higher sizes as well and is much preferable than using hashmaps. we would represent the sets by trees instead which spervectors essentially are. So I realized that there is a good use case for these kinds of data structures in a supervector context and hence I am working on this atm. Kind of cool is that I keep a bit representing if the bin is negated or not, which means that we can optimize essentially all set operations of the type sparse-bitvector Op fat-bitvector her is how it goes. lognot -> just iterate through the bins and change the invert bit 1 or Bitvector => 1 0 or Bitvector => Bitvector 1 and Bitvector => Bitvector 0 and Bitvector => 0 1 xor Bitvector => lognot(Bitvectoe) 0 xor Bitvector => Bitvector cool right. Now the code in stis-supervectors can handle all cases, if the bins are not aligned and try to be smart about it (can treeify a datastructure to match bins) so it is a more complex story here, but in the simplest setup we just have a vector of bins with all bins the same size and we are good to go. A hint is to use binsizes of (ash 1 x), x a number so that we enable the easiest way to match bins else it can be really fragmented. A final note: Why on earth does not guile's bitvectors allow more bitwise operations, looks qiuite a bit of underpowered to me. --00000000000049754205cb6cbd83 Content-Type: text/html; charset="UTF-8" Content-Transfer-Encoding: quoted-printable
Hi

Maybe you will not know it but bit v= ectors are sometimes=C2=A0a very good match to represent sets where the wor= ld is defined from the beginning. a world of 64 objects can represent subse= ts of this world as a uint64_t in other words the unit our cpu's are so= darn fast at managing.=C2=A0 So whenever you plan to spend time on having = sets of a moderate set of enums an huge optimisation=C2=A0is to not use a h= ashmap, but n stead use a bitvector. And as you know, guile ave arbitrary= =C2=A0big integers so a world of 128 elements=C2=A0is just a=C2=A0 quite sm= all bignum. Now this does not scale to millions where hashmaps=C2=A0starts = to excel,=C2=A0but many times you do not need scale.
Now what abo= ut our favorite set operations you may ask? well we have a translation=C2= =A0here

set complement =3D lognot
set un= ion=C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =C2=A0 =3D logior
set inter= section=C2=A0 =3D logand
set addition=C2=A0 =C2=A0 =C2=A0 =C2=A0 = =3D lugxor

And the processor=C2=A0have ops for eve= n some other set operations like finding the first nonzero bit
in= order to loop through the elements in the set that are 1 (this would be a = good addition=C2=A0
to guile).

In guile-= log I have implemented a very featureful prolog matcher on steroids=C2=A0wh= ere the bulk of the work is to manage the set of matching elements, where a= ll set operations=C2=A0are used. And is hence super fast for typical implem= ented predicates as this includes typically have less than 64 cases.=C2=A0 = So if the number of match cases is below 1000 we typically just keep a bigi= nt to represent the set else we move over to hashmaps. Now un-fourtunately= =C2=A0in guile, we do not have self modifying bitwise operators so in guile= -log I hack guile i C to have those ops to not waste=C2=A0
a lot = of memory and time for larger predicates with bigger bit vectors.

This is why I hope that guile will keep having big sized im= mediate integers as=C2=A0 copying is so cheap and fast (creating new number= s on the stack).

When working with supervectors I = realized that a functional approach is sort of viable as we can share bits = between different read only structures and even use copy on write if we wou= ld=C2=A0
like to mutate.=C2=A0
Using=C2=A0 a supervecto= r approach to bitvectors make sense if we would like to move to higher size= s as well and is much preferable than using hashmaps. we would represent th= e sets by trees instead which spervectors essentially are. So I realized th= at there is a good use case for these kinds of data structures in a superve= ctor context and hence I am working on this atm.

K= ind of cool is that I keep a bit representing if the bin is negated or not,= which means=C2=A0that we can optimize essentially all set operations of th= e type sparse-bitvector Op fat-bitvector her is how it goes.

=
lognot=C2=A0 -> just iterate through the bins and change the = invert bit
1 or Bitvector=C2=A0 =C2=A0 =3D> 1
0 or B= itvector=C2=A0 =C2=A0 =3D> Bitvector

1 and Bitv= ector =3D> Bitvector
0 and Bitvector =3D> 0
1 xor= =C2=A0 Bitvector =3D> lognot(Bitvectoe)
0 xor=C2=A0 Bitve= ctor =3D> Bitvector

cool right. Now the c= ode in stis-supervectors can handle all cases, if the bins are not aligned = and try to be smart about it (can treeify a datastructure to match bins) so= it is a more complex story here, but in the simplest setup we just have a = vector of bins with all bins the same size and we are good to go. A hint is= to use binsizes of (ash 1 x), x a number so that we enable the easiest way= to match bins else it can be really fragmented.

A= final note:
Why on earth does not guile's bitvectors=C2=A0al= low more bitwise operations, looks qiuite=C2=A0a bit of underpowered to me.=





=





=
--00000000000049754205cb6cbd83--