From: lloda
Newsgroups: gmane.lisp.guile.devel
Subject: Re: more advanced bytevector => supervectors
Date: Sat, 11 Sep 2021 20:21:27 +0200
Message-ID: <64B4CFB3-AD56-4789-9498-95DB3F1D6306@sarc.name>
To: Stefan Israelsson Tampe
Cc: guile-devel

A problem is that Guile doesn't really provide a good set of fast rank-1 ops. None of them support strides != 1, for example (this is OK for regular vectors, but it hurts for general arrays), and some are missing start/end arguments, or you have to write wrappers yourself, as for the typed vectors (other than u8). So in some cases you have to do the loop in Scheme. That's fine when the body of the loop is Scheme ops, but if it's something like copy or fill it really hurts compared to C.
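
To make the point concrete, here is a minimal sketch of the kind of loop one ends up writing by hand for a strided fill. The helper name `u8-fill-strided!` and its argument order are assumptions made for illustration; it is not a Guile procedure.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper, not part of Guile: store VAL at every STEP-th
;; index of the u8 bytevector BV, for START <= i < END.
(define (u8-fill-strided! bv val start end step)
  (let loop ((i start))
    (when (< i end)
      (bytevector-u8-set! bv i val)
      (loop (+ i step)))))

(define bv (make-bytevector 8 0))
(u8-fill-strided! bv 7 1 8 3)   ; touches indices 1, 4 and 7
(bytevector->u8-list bv)        ; => (0 7 0 0 7 0 0 7)
```

Every element goes through a Scheme-level call here, which is the cost being compared against a C loop.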


> On 11 Sep 2021, at 19:03, Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:
>
> I did some tests, and wingo's superb compiler is about equally fast for a hand-made Scheme loop as for the automatic dispatch of getter and setter. For example, it can copy from
> u8 to i16 at about 100 ops / second using native byte order. However, compiling it in C led to a nasty 2 G ops / second. So for these kinds of patterns
> it is still better to work in C, as it probably vectorises the operation quite well. Supervectors support pushing busy loops to C very well, and I will probably
> enable fast C code for some simple utility ops.
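
For reference, a hand-made Scheme loop of the kind being benchmarked might look like the sketch below. It is an assumption about the shape of the test, not the code that was actually timed; `copy-u8->s16!` is a hypothetical name, and the "i16" of the message is taken to mean Guile's signed 16-bit (s16) accessors.

```scheme
(use-modules (rnrs bytevectors))

;; Hypothetical helper: widen each u8 element of SRC into the matching
;; s16 slot of DST, using the machine's native byte order.
(define (copy-u8->s16! src dst)
  (let ((n (bytevector-length src)))
    (let loop ((i 0))
      (when (< i n)
        ;; s16 slots are 2 bytes wide, so element i lives at byte 2*i.
        (bytevector-s16-native-set! dst (* 2 i) (bytevector-u8-ref src i))
        (loop (+ i 1))))))

(define src (u8-list->bytevector '(0 1 127 255)))
(define dst (make-bytevector (* 2 (bytevector-length src)) 0))
(copy-u8->s16! src dst)
(bytevector->sint-list dst (native-endianness) 2)  ; => (0 1 127 255)
```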

> On Wed, Sep 8, 2021 at 9:18 AM lloda <lloda@sarc.name> wrote:
>
>> On 8 Sep 2021, at 04:04, Stefan Israelsson Tampe <stefan.itampe@gmail.com> wrote:
>>
> ...

>> So using get-setter typically means
>> ((get-setter #f bin1 #f
>>    (lambda (set) (set v 2 val)))
>>
>>    #:is-endian   'little  ;; only consider little endian setters like I know
>>    #:is-unsigned #t       ;; only use unsigned
>>    #:is-integer  #t       ;; only use integer representations
>>    #:is-fixed    #t       ;; do not use the scm value vector versions
>> )
>> So a version where we only consider handling nonnegative integers of up to 64 bits. The gain is faster compilation, as this idiom will dispatch
>> between 4 different versions of the loop lambda, and the compiler could inline all of them or be able to detect the ones that are used and hot-compile that version
>> (a feature we do not have yet in Guile). Now, when you select between a ref and a set you will similarly end up with 4*4 = 16 different loops. The full version
>> is very large: a double loop with all features consists of (2*2 + 3*2*2*2 + 4 + 1)**2 = 33*33 ~ 1000 versions of the loop, which is crazy if we should expand the loop
>> for all cases in the compilation. Now Guile would just use a functional approach and not expand the loop everywhere. We will have parameterised versions of
>> libraries so that one can select which versions to compile for. For example, the general functions that transform one supervector to another are a general
>> idiom that would use the full dispatch, which is not practical,

> I'm curious where you're going with this.
>
> I implemented something similar (iiuc) in https://github.com/lloda/guile-newra/, specifically https://github.com/lloda/guile-newra/blob/master/mod/newra/map.scm, where the lookup/set methods are inlined in the loop. The compilation times indeed grow exponentially, so I'm forced to have a default 'generic' case.
>
> The idea for fixing this was to have some kind of run-time compilation cache so only a fixed number of type combinations that actually get used would be compiled, instead of the tensor product of all types. But I haven't figured out, or actually tried to do that yet.
>
> Regards
> Daniel
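
The run-time cache idea quoted above could be sketched roughly as follows. This is only an illustration under stated assumptions: it is not how guile-newra or supervectors work, all names (`accessors`, `make-copy-loop`, `copy-loop`, `loop-cache`) are hypothetical, and a closure stands in for the step where real specialised code would be compiled.

```scheme
(use-modules (rnrs bytevectors))

;; Illustrative accessor table: type -> (ref set! element-size-in-bytes).
(define accessors
  `((u8  ,bytevector-u8-ref         ,bytevector-u8-set!         1)
    (s16 ,bytevector-s16-native-ref ,bytevector-s16-native-set! 2)))

(define specialisations 0)   ; counts how many loops were actually built

;; Build a copy loop closed over one (source type, destination type) pair.
;; A real implementation would compile specialised code at this point.
(define (make-copy-loop src-type dst-type)
  (set! specialisations (+ specialisations 1))
  (let* ((s (assq-ref accessors src-type))
         (d (assq-ref accessors dst-type))
         (ref (car s))  (src-size (caddr s))
         (set (cadr d)) (dst-size (caddr d)))
    (lambda (src dst n)
      (let loop ((i 0))
        (when (< i n)
          (set dst (* i dst-size) (ref src (* i src-size)))
          (loop (+ i 1)))))))

(define loop-cache (make-hash-table))

;; Return the loop for this type pair, building it only on first use.
(define (copy-loop src-type dst-type)
  (let ((key (cons src-type dst-type)))
    (or (hash-ref loop-cache key)
        (let ((proc (make-copy-loop src-type dst-type)))
          (hash-set! loop-cache key proc)
          proc))))

(define src (u8-list->bytevector '(1 2 3)))
(define dst (make-bytevector 6 0))
((copy-loop 'u8 's16) src dst 3)
((copy-loop 'u8 's16) src dst 3)   ; second call reuses the cached loop
```

Only the type pairs that are actually requested get built, so the cost scales with the combinations in use rather than with the product of all types.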

