I did some tests, and wingo's superb compiler is about equally fast for a
hand-made Scheme loop as for the automatic dispatch for getter and setter.
For example, it can copy from u8 to i16 at about 100 ops / second using
native byte order. However, compiling it in C led to a nasty 2 G ops /
second. So for these kinds of patterns it is still better to work in C, as
it probably vectorises the operation quite well. Supervectors supports
pushing busy loops to C very well, and I will probably enable fast C code
for some simple utility ops.
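
To make the u8 to i16 case concrete, here is a minimal sketch of the kind of
C busy loop I mean. The name copy_u8_to_i16 is hypothetical and not part of
supervectors; the sketch assumes native byte order and non-overlapping
buffers (hence restrict), which is what should let an optimising compiler
vectorise it, although I have not checked the generated code.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not a supervectors function): widen n unsigned
 * bytes from src into native-order signed 16-bit integers in dst.
 * Every u8 value (0..255) fits in an int16_t, so no range check is needed.
 * The buffers are assumed not to overlap. */
void copy_u8_to_i16(int16_t *restrict dst, const uint8_t *restrict src,
                    size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = (int16_t)src[i];
}
```

The loop body has no dispatch and no endianness handling, so all the
type selection has to happen once, outside the loop, before calling it.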

On Wed, Sep 8, 2021 at 9:18 AM lloda <lloda@sarc.name> wrote:

>
>
> On 8 Sep 2021, at 04:04, Stefan Israelsson Tampe <stefan.itampe@gmail.com>
> wrote:
>
>
> ...
>
>
> So using get-setter typically means
> ((get-setter #f bin1 #f
>    (lambda (set) (set v 2 val)))
>
>    #:is-endian 'little  ;; only consider little endian setters, like I know
>    #:is-unsigned #t     ;; only use unsigned
>    #:is-integer #t      ;; only use integer representations
>    #:is-fixed #t        ;; do not use the scm value vector versions
> )
> So this is a version where we only consider handling nonnegative integers
> of up to 64 bits. The gain is faster compilation, as this idiom will
> dispatch between 4 different versions of the loop lambda, and the compiler
> could inline all of them or be able to detect the one that is used and
> hot compile that version (a feature we do not have yet in guile). Now,
> when you select between a ref and a set you will similarly end up with
> 4*4 versions = 16 different loops. The full version is very large: a
> double loop with all features consists of (2*2 + 3*2*2*2 + 4 + 1)**2 =
> 33*33 ~ 1000 versions of the loop, which is crazy if we should expand the
> loop for all cases in the compilation. Now guile would just use a
> functional approach and not expand the loop everywhere. We will have
> parameterised versions of libraries so that one can select which versions
> to compile for. For example, the general functions that perform a
> transform from one supervector to another are a general idiom that would
> use the full dispatch, which is not practical,
>
>
> I'm curious where you're going with this.
>
> I implemented something similar (iiuc) in
> https://github.com/lloda/guile-newra/, specifically
> https://github.com/lloda/guile-newra/blob/master/mod/newra/map.scm ,
> where the lookup/set methods are inlined in the loop. The compilation times
> indeed grow exponentially so I'm forced to have a default 'generic' case.
>
> The idea for fixing this was to have some kind of run time compilation
> cache so only a fixed number of type combinations that actually get used
> would be compiled, instead of the tensor product of all types. But I
> haven't figured out, or actually tried to do that yet.
>
> Regards
> Daniel
>
>