Hi, I think I am happy now with the API for binary operators. To see where I am heading, consider the bitvector implementation. Here is, for example, how logxor is implemented (s1 logxor s2 -> s2):

    (define (supervector-logxor! s1 s2)
      (supervector-binary
       ;; The general case
       logxor

       ;; The s1 part is a constant bin holding x
       (lambda (x)
         (if (= x 0)
             (values nothing 0)   ;; will leave the s2 part unchanged              - fast
             (values invert  0))) ;; will invert the s2 part in case it is a full bin - fast

       ;; The s2 part is a constant bin holding x
       (lambda (x)
         (if (= x 0)
             (values nothing 0)   ;; replace s2 with s1 if referable               - fast
             (values invert  0))) ;; if s1 is ro?, refer to it with negation if referable - fast

       ;; Both s1 and s2 are constant and s2 is a complete bin
       logxor
       s1 s2))

We have the following controls:

    (values add val)     : will add the same val to all elements in the bin
                           (the val can e.g. be stored in the bin)
    (values scale val)   : will scale all elements in the bin by the same val
    (values nothing _)   : will do nothing with the bin
    (values replace val) : will set the whole bin to val
    (values invert _)    : will apply lognot to the whole bin
    (values negate _)    : will apply not to the whole bin

With good alignment of the bins and the ability to reference bins when s2 is the constant part, this code produces fast applications of the binary operators. This means that clustered sparse bitvectors are well modeled in the system. For those vectors, especially in advanced matching constructs with very large numbers of clauses, it is possible to make the matching fast while keeping the memory needed to store the matcher small.

We have a similar system for algebraic manipulation and for logical operations. In the current setup, however, logxor is not applied as a fast op; it is passed as a function into the construct in a higher-order manner. This means that busy loops over fat vectors are a bit slow. When things are debugged I will move over to macro versions, which allow the busy loop to be essentially as fast as doing the logxor in a single loop over a bytevector, which you can get to 100-200 Mops per second. This can go much faster, more like 1-2 Gops per second, if we push it to C.
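To illustrate the difference, here is a rough sketch on plain bytevectors rather than supervectors; the names (bv-binary!, define-bv-binary!, bv-logxor!) are made up for the example:

    (use-modules (rnrs bytevectors))

    ;; Higher-order version: op is an opaque procedure that is called
    ;; once per element, so the JIT cannot inline it into the loop.
    (define (bv-binary! op s1 s2 n)
      (let loop ((i 0))
        (when (< i n)
          (bytevector-u8-set! s2 i
            (op (bytevector-u8-ref s1 i) (bytevector-u8-ref s2 i)))
          (loop (+ i 1)))))

    ;; Macro version: the operator is spliced in at expansion time, so
    ;; the compiler sees a plain first-order busy loop.
    (define-syntax-rule (define-bv-binary! name op)
      (define (name s1 s2 n)
        (let loop ((i 0))
          (when (< i n)
            (bytevector-u8-set! s2 i
              (op (bytevector-u8-ref s1 i) (bytevector-u8-ref s2 i)))
            (loop (+ i 1))))))

    (define-bv-binary! bv-logxor! logxor)

(bv-binary! logxor a b n) and (bv-logxor! a b n) compute the same thing, but the second one leaves the JIT a tight loop with no per-element procedure call.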
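To make the control protocol concrete as well, here is a toy interpretation of a single control applied to one bin, modeling a bin as a plain vector of integers and the controls as symbols; this is for illustration only, not the real representation:

    (define (apply-control! ctrl val bin)
      ;; Apply f destructively to every element of the bin.
      (define (map! f)
        (let loop ((i 0))
          (when (< i (vector-length bin))
            (vector-set! bin i (f (vector-ref bin i)))
            (loop (+ i 1)))))
      (case ctrl
        ((add)     (map! (lambda (e) (+ e val))))  ;; add val to every element
        ((scale)   (map! (lambda (e) (* e val))))  ;; scale every element by val
        ((nothing) *unspecified*)                  ;; leave the bin untouched
        ((replace) (vector-fill! bin val))         ;; set the whole bin to val
        ((invert)  (map! lognot))                  ;; lognot over the whole bin
        ((negate)  (map! (lambda (e)               ;; not over the whole bin,
                           (if (zero? e) 1 0)))))) ;; with 0 as false

So the (values invert 0) branch in supervector-logxor! corresponds to (apply-control! 'invert 0 bin) on a full bin: a single tight pass instead of a general element-by-element logxor.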
On Thu, Sep 2, 2021 at 5:45 PM Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:

> Hi guilers!
>
> My next project is to explore a more advanced bytevector structure than
> today's bytevectors. I would like to have this infrastructure in Guile,
> with core code taking advantage of it, like building strings on top of it
> and allowing our string library to use those (the difficult case is
> probably to get regexps working properly).
>
> Anyhow, this is my initial comment in the code:
>
> #|
> The idea of this data structure comes from noting that when we employ
> full scans of large data structures we can allow for a much richer and
> more featureful data structure than a simple bytevector. The reason is
> that we can divide the data into byte chunks and spend most of the time
> scanning, copying and mapping those areas with the usual methods; even
> optimized C code taking advantage of advanced CPU opcodes is possible
> here. And by dividing the data into chunks we get a lot of new features
> at essentially no cost beyond complexity, which we manage mostly in
> Scheme. We gain many things:
>
> 1. many new features that can vary on the pages
>
> 2. less memory hogging, as
>    a. it can use copy-on-write semantics
>    b. it does not need to find 1GB continuous blocks
>
> 3. potentially faster operations, as we can fast-forward over the zeroed
>    copy-on-write pages compared to pure bytevectors
>
> 4. we can allow for swapping and referential data structures to speed up
>    copying even further
>
> 5. better fiber behavior than C programs that spend seconds or minutes
>    performing an operation, because it will just spend a microsecond or
>    so in C land and then swap back to Scheme; CTRL-C will also work
>    nicely
>
> 6. we could potentially build a string library on top of these data
>    structures, and we could also add features to pages that lead to a
>    much richer interface
>
> 7. resizing is much faster and more efficient
>
> 8. reversing is super quick
>
> 9. queues and stacks of byte data can have very efficient
>    implementations
>
> Drawbacks:
> 1. more complexity, as we need to consider boundaries
> 2. slower one-off operations like bytevector-u8-ref, as Guile compiles
>    the core operations to quite effective JIT CPU encodings; but maybe
>    we can design some caching to make those operations much faster and
>    even have supporting JIT operations
> |#
>
> WDYT?
>
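PS: to illustrate the chunk idea in the quoted mail, here is a toy model where a supervector is a vector of fixed-size bins and a bin is either a real bytevector or a single constant byte standing in for the whole bin; again just a sketch with made-up names, not the real code:

    (use-modules (rnrs bytevectors))

    (define bin-size 4096)

    ;; All bins start out as the constant 0, so a fresh huge vector
    ;; costs only one slot per bin.
    (define (make-sv nbins)
      (make-vector nbins 0))

    (define (sv-u8-ref sv i)
      (let ((bin (vector-ref sv (quotient i bin-size))))
        (if (integer? bin)
            bin  ;; constant bin: answer without touching any storage
            (bytevector-u8-ref bin (remainder i bin-size)))))

    ;; A constant bin is materialized into a real bytevector only when
    ;; somebody writes into it, copy-on-write style.
    (define (sv-u8-set! sv i v)
      (let* ((b   (quotient i bin-size))
             (bin (vector-ref sv b)))
        (when (integer? bin)
          (set! bin (make-bytevector bin-size bin))
          (vector-set! sv b bin))
        (bytevector-u8-set! bin (remainder i bin-size) v)))

Constant bins are what make the zeroed pages, the cheap copying and the fast resizing in the list above essentially free.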