From: Stefan Israelsson Tampe
Newsgroups: gmane.lisp.guile.devel
Subject: Re: more advanced bytevector => supervectors
Date: Wed, 15 Sep 2021 02:12:32 +0200
To: guile-devel
Hi,

I think I am happy now with the API for binary operators.

To see where I am heading, consider the bitvector implementation. Here is
e.g. how logxor is implemented (s1 logxor s2, with the result stored in s2):

(define (supervector-logxor! s1 s2)
  (supervector-binary
   ;; The general case
   logxor

   ;; The s1 part is a constant bin
   (lambda (x)
     (if (= x 0)
         (values invert  0)    ;; will invert the s2 part in case it is a full bin - fast
         (values nothing 0)))  ;; will leave the s2 part unchanged - fast

   ;; The s2 part is a constant bin
   (lambda (x)
     (if (= x 0)
         (values invert  0)    ;; if s1 is ro? (read-only), refer to it with negation if referable (fast)
         (values nothing 0)))  ;; replace s2 with s1 if referable (fast)

   ;; Both s1 and s2 are constant and s2 is a complete bin
   logxor

   s1
   s2))

We have the following controls:

(values add     val) : will add the same val to all elements in the bin, e.g. maybe just stored in the bin
(values scale   val) : will scale all elements in the bin by the same val
(values nothing _)   : will do nothing with the bin
(values replace val) : set the whole bin to val
(values invert  _)   : take lognot of the whole bin
(values negate  _)   : take not of the whole bin

With a good alignment of the bins and the ability to reference bins when s2
is the constant part, the code will produce a fast application of the binary
operators. This means that clustered sparse bitvectors are well modeled in
this system, and for those vectors, especially for advanced matching
constructs with very large numbers of clauses, it is possible to make the
matching fast while keeping the amount of memory needed to store the matcher
small.

We have a similar system for algebraic manipulation and for logical operations.

In the current setup logxor is not yet applied as a fast op, however; it is
sent as a function into the construct in a higher-order manner. This means
that busy loops for fat vectors are a bit slow. When things are debugged I
will move over to macro versions which allow the busy loop to essentially be
as fast as doing the logxor in a single loop over a bytevector, which you
can get at 100-200 Mops per second. This can go much faster, like 1-2 Gops
per second, if we push it to C.
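To make the per-bin control protocol above concrete, here is a minimal Guile
sketch of how a driver could apply a returned control to a single bin. None
of the names here (apply-control!, map-bin!) come from the actual supervector
code; bins are assumed to be plain u8 bytevectors, and the controls are
assumed to be plain symbols rather than whatever the real implementation
binds add, invert, nothing, etc. to:

(use-modules (rnrs bytevectors))

(define (apply-control! bin control val)
  ;; Apply one control value to a whole bin, in place.
  (define len (bytevector-length bin))
  (define (map-bin! f)            ;; rewrite every element of the bin with f
    (let loop ((i 0))
      (when (< i len)
        (bytevector-u8-set! bin i (logand #xff (f (bytevector-u8-ref bin i))))
        (loop (+ i 1)))))
  (case control
    ((add)     (map-bin! (lambda (x) (+ x val))))        ;; add val to every element
    ((scale)   (map-bin! (lambda (x) (* x val))))        ;; scale every element by val
    ((nothing) #t)                                       ;; leave the bin untouched
    ((replace) (map-bin! (lambda (x) val)))              ;; set the whole bin to val
    ((invert)  (map-bin! lognot))                        ;; lognot of every element
    ((negate)  (map-bin! (lambda (x) (if (= x 0) 1 0)))) ;; logical not of every element
    (else (error "unknown control:" control))))

With this reading, the lambdas handed to supervector-binary only decide which
control to return for a constant bin; a driver loop would then call something
like apply-control! on the corresponding s2 bin (skipping it entirely on
nothing), and fall back to the general element-wise logxor only when neither
side is constant.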
On Thu, Sep 2, 2021 at 5:45 PM Stefan Israelsson Tampe <
stefan.itampe@gmail.com> wrote:

> Hi guilers!
>
> My next project is to explore a more advanced bytevector structure than
> today's bytevectors. I think it would be worthwhile to have this
> infrastructure in Guile, with core code taking advantage of it, like
> having strings on top of it and allowing our string library to use those
> (the difficult case is probably to get regexps working properly).
>
> Anyhow, this is my initial comment in the code:
>
> #|
> The idea of this data structure is to note that when we employ full
> large data structure scans we can allow for a much richer and more
> featureful data structure than a simple bytevector. The reason is that
> we can divide the data into byte chunks and spend most of the time
> scanning, copying and mapping those areas with the usual methods; even
> optimized C code taking advantage of advanced CPU opcodes is possible
> here. And by dividing the data into chunks we get a lot of new features
> with essentially no cost beyond complexity, which we manage mostly in
> Scheme. We gain many things:
>
> 1. Many new features that can vary on the pages
>
> 2. Less memory hogging, as
>    a. it can use copy-on-write semantics
>    b. it does not need to find 1GB contiguous blocks
>
> 3. Potentially faster operations, as we can fast-forward the zero-on-write
>    pages compared to pure bytevectors
>
> 4. We can allow for swapping and referential data structures to speed up
>    copying even further
>
> 5. Better fiber behavior than C programs that spend seconds or minutes
>    performing an operation, because we will just spend a microsecond or
>    so in C-land and then swap back to Scheme. CTRL-C will also work
>    nicely.
>
> 6. We could potentially build a string library on top of these data
>    structures, and we could also add features to pages that lead to a
>    much richer interface.
>
> 7. Resizing is much faster and more efficient
>
> 8. Reversing is super quick
>
> 9. Queues and stacks of byte data can have very efficient implementations
>
> Drawbacks:
> 1. More complex, as we need to consider boundaries
> 2. Slower one-off operations like bytevector-u8-ref, as Guile compiles the
>    core operations to quite effective JIT CPU encodings. But maybe we can
>    design some caching to make those operations much faster and even have
>    supporting JIT operations.
> |#
>
> WDYT ?
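To illustrate the chunking idea described in the quoted comment, here is a
minimal sketch of the kind of layout it implies. The names (make-superv,
superv-u8-ref, superv-u8-set!, bin-size) are purely illustrative and not
taken from the actual code; a supervector is pictured as a Scheme vector of
fixed-size bytevector bins, so element access computes a bin index plus an
offset:

(use-modules (rnrs bytevectors))

(define bin-size 4096)  ;; illustrative chunk size

(define (make-superv len)
  ;; allocate ceiling(len / bin-size) zero-filled bins
  (let* ((nbins (quotient (+ len bin-size -1) bin-size))
         (bins  (make-vector nbins #f)))
    (let loop ((i 0))
      (when (< i nbins)
        (vector-set! bins i (make-bytevector bin-size 0))
        (loop (+ i 1))))
    (cons len bins)))  ;; a pair of (length . bins)

(define (superv-u8-ref sv i)
  ;; a bin lookup plus an offset computation per access -- this is the
  ;; "slower one-off operations" drawback mentioned above
  (bytevector-u8-ref (vector-ref (cdr sv) (quotient i bin-size))
                     (remainder i bin-size)))

(define (superv-u8-set! sv i v)
  (bytevector-u8-set! (vector-ref (cdr sv) (quotient i bin-size))
                      (remainder i bin-size)
                      v))

Bulk operations, copy-on-write, cheap resizing and reversal then become
manipulations of the small bins vector (sharing, adding or reordering bin
references) rather than of the byte data itself, which is where points 2, 7
and 8 above come from.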