unofficial mirror of guile-devel@gnu.org 
 help / color / mirror / Atom feed
* super-bit-vectors
@ 2021-09-07 19:26 Stefan Israelsson Tampe
  0 siblings, 0 replies; only message in thread
From: Stefan Israelsson Tampe @ 2021-09-07 19:26 UTC (permalink / raw)
  To: guile-devel

[-- Attachment #1: Type: text/plain, Size: 3427 bytes --]

Hi

Maybe you will not know it but bit vectors are sometimes a very good match
to represent sets where the world is defined from the beginning. a world of
64 objects can represent subsets of this world as a uint64_t in other words
the unit our cpu's are so darn fast at managing.  So whenever you plan to
spend time on having sets of a moderate set of enums an huge
optimisation is to not use a hashmap, but n stead use a bitvector. And as
you know, guile ave arbitrary big integers so a world of 128 elements is
just a  quite small bignum. Now this does not scale to millions where
hashmaps starts to excel, but many times you do not need scale.
Now what about our favorite set operations you may ask? well we have a
translation here

set complement = lognot
set union            = logior
set intersection  = logand
set addition        = lugxor

And the processor have ops for even some other set operations like finding
the first nonzero bit
in order to loop through the elements in the set that are 1 (this would be
a good addition
to guile).

In guile-log I have implemented a very featureful prolog matcher on
steroids where the bulk of the work is to manage the set of matching
elements, where all set operations are used. And is hence super fast for
typical implemented predicates as this includes typically have less than 64
cases.  So if the number of match cases is below 1000 we typically just
keep a bigint to represent the set else we move over to hashmaps. Now
un-fourtunately in guile, we do not have self modifying bitwise operators
so in guile-log I hack guile i C to have those ops to not waste
a lot of memory and time for larger predicates with bigger bit vectors.

This is why I hope that guile will keep having big sized immediate integers
as  copying is so cheap and fast (creating new numbers on the stack).

When working with supervectors I realized that a functional approach is
sort of viable as we can share bits between different read only structures
and even use copy on write if we would
like to mutate.
Using  a supervector approach to bitvectors make sense if we would like to
move to higher sizes as well and is much preferable than using hashmaps. we
would represent the sets by trees instead which spervectors essentially
are. So I realized that there is a good use case for these kinds of data
structures in a supervector context and hence I am working on this atm.

Kind of cool is that I keep a bit representing if the bin is negated or
not, which means that we can optimize essentially all set operations of the
type sparse-bitvector Op fat-bitvector her is how it goes.

lognot  -> just iterate through the bins and change the invert bit
1 or Bitvector    => 1
0 or Bitvector    => Bitvector

1 and Bitvector => Bitvector
0 and Bitvector => 0
1 xor  Bitvector => lognot(Bitvectoe)
0 xor  Bitvector => Bitvector

cool right. Now the code in stis-supervectors can handle all cases, if the
bins are not aligned and try to be smart about it (can treeify a
datastructure to match bins) so it is a more complex story here, but in the
simplest setup we just have a vector of bins with all bins the same size
and we are good to go. A hint is to use binsizes of (ash 1 x), x a number
so that we enable the easiest way to match bins else it can be really
fragmented.

A final note:
Why on earth does not guile's bitvectors allow more bitwise operations,
looks qiuite a bit of underpowered to me.

[-- Attachment #2: Type: text/html, Size: 4098 bytes --]

^ permalink raw reply	[flat|nested] only message in thread

only message in thread, other threads:[~2021-09-07 19:26 UTC | newest]

Thread overview: (only message) (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2021-09-07 19:26 super-bit-vectors Stefan Israelsson Tampe

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).