From mboxrd@z Thu Jan  1 00:00:00 1970
Path: news.gmane.io!.POSTED.blaine.gmane.org!not-for-mail
From: Damien Mattei <damien.mattei@gmail.com>
Newsgroups: gmane.lisp.guile.devel,gmane.lisp.guile.user
Subject: Re: fibers,questions about thread id and mutation of vectors
Date: Fri, 13 Jan 2023 12:10:12 +0100
Message-ID: <CADEOadfrNgjKixyEtsreQT0=HkNUPhcVvgLi+PhwwwFtTDPLAQ@mail.gmail.com>
References: <CADEOadcJmiNnDe2Vh98aZBFOMufH0oJvbDOPsx=nnGv8hwJ61A@mail.gmail.com>
 <a21ee4b1-20ce-4229-882a-d7cde6d060fd@telenet.be>
 <CADEOadfJMG4rM1c=UcK+2+HJFx0WthZM1cXE3qX4P-+v_nOkgQ@mail.gmail.com>
 <97b66c92-d969-a5d1-7371-1ea4a66a0063@telenet.be>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000ae84be05f22347e9"
Cc: guile-user <guile-user@gnu.org>, guile-devel <guile-devel@gnu.org>
To: Maxime Devos <maximedevos@telenet.be>
In-Reply-To: <97b66c92-d969-a5d1-7371-1ea4a66a0063@telenet.be>
List-Id: "Developers list for Guile,
 the GNU extensibility library" <guile-devel.gnu.org>
List-Archive: <https://lists.gnu.org/archive/html/guile-devel>
Xref: news.gmane.io gmane.lisp.guile.devel:21572 gmane.lisp.guile.user:18861
Archived-At: <http://permalink.gmane.org/gmane.lisp.guile.devel/21572>

--000000000000ae84be05f22347e9
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: 8bit

I ran some tests of OpenMP with Guile 3.0.8.99-f3ea8 on macOS (M1) and
Linux (Intel), because I was not sure about the performance. I found a
problem on Linux: the code is slower with OpenMP (by up to a factor of 5),
whereas on macOS the gain is 100% (run time divided by 2) or 15%, depending
on the complexity of the computation.
I cannot explain why it works under macOS and not under Linux. The only
difference in compilation is that under macOS I had to force this option
to get it to compile:
configure --enable-mini-gmp

Anyway, this is not good performance for OpenMP with Scheme: in C or
Fortran, when I use OpenMP for astronomical numerical simulations, I get a
speedup of almost n with n CPUs.
In the parallel (//) region I have only this code on macOS:

  scm_init_guile();

#pragma omp parallel for
  for (i = start; i <= stop; i++) { /* i is private by default */
    scm_call_1(func, scm_from_int(i));
  }

On Linux this causes a segmentation fault unless I move the
scm_init_guile(); line inside the for loop, like this:

#pragma omp parallel for
  for (i = start; i <= stop; i++) { /* i is private by default */
    scm_init_guile();
    scm_call_1(func, scm_from_int(i));
  }

https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c#L91
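The Guile manual describes scm_init_guile() as putting the calling thread into Guile mode. That would explain why a single call on the main thread does not cover the worker threads that OpenMP creates. Under that assumption, here is a minimal sketch of a middle ground. It splits the combined pragma so that each worker thread registers once, before its share of the iterations, rather than on every iteration. To keep the sketch compilable without libguile, register_thread_once() and apply_fn() are hypothetical stand-ins for scm_init_guile() and scm_call_1(func, scm_from_int(i)). Compiled without -fopenmp, the pragmas are ignored and it simply runs serially.

```c
/* Hypothetical stand-ins so the sketch compiles without libguile:
   register_thread_once() plays the role of scm_init_guile(), and
   apply_fn() the role of scm_call_1(func, scm_from_int(i)). */
static int registrations = 0;  /* how many threads registered */
static long long total = 0;    /* result of the last run_range() call */

static void register_thread_once(void) {
#pragma omp atomic
  registrations++;
}

static long long apply_fn(int i) { return (long long)i * i; }

/* Same loop as above, but the per-thread setup runs once per OpenMP
   thread (inside "omp parallel") instead of once per iteration. */
int run_range(int start, int stop) {
  long long sum = 0;
  registrations = 0;
#pragma omp parallel
  {
    register_thread_once();            /* once per worker thread */
#pragma omp for reduction(+:sum)
    for (int i = start; i <= stop; i++)
      sum += apply_fn(i);
  }
  total = sum;
  return registrations;  /* equals the number of threads in the team */
}
```

With the real calls, the parallel region would contain scm_init_guile() followed by the for loop calling scm_call_1(). The Guile manual also suggests preferring scm_with_guile() over scm_init_guile() where possible, because the latter is not available on every platform.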

The Scheme+ code for the speed test looks like this (I use the Collatz
function to make the computation unpredictable for any C compiler
optimisation when I compare it with pure C code):

;; only for speed tests
{vtstlen <+ 2642245}
{vtst <+ (make-vector vtstlen 0)}

{fct <+ (lambda (x) {x * x * x})}

;; neoteric expression of {vtst[i] <- (fct vtst[i])}
(define (fctapply i) {vtst[i] <- fct(vtst[i])})

(define (fctpluscollatzapply i) {vtst[i] <- fctpluscollatz(vtst[i])})

(define (speed-test)

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme alone : start")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       (fctpluscollatzapply i)) ;; or (fctapply i)
  (display-nl "speed-test : testing Scheme alone : end")

  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme with OpenMP : start")
  (openmp 0 {vtstlen - 1} (string->pointer "fctpluscollatzapply")) ;; or "fctapply"
  (display-nl "speed-test : testing Scheme with OpenMP : end")

  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  )


(define (collatz n)
  (cond ({n = 1} 1)
        ({(modulo n 2) = 0} {n / 2})
        (else {{3 * n} + 1})))


(define (fctpluscollatz x)
  (declare c)
  (if {x = 0}
      {c <- 0}
      {c <- collatz(x)})
  {{x * x * x} + c})


(define openmp (foreign-library-function "./libguile-openMP" "openmp"
#:return-type int #:arg-types (list int int '*)))


;; note: requires a link: ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib libomp.dylib
(define libomp (dynamic-link "libomp"))
;; under Linux: export LTDL_LIBRARY_PATH=. with a link as above,
;; or, better: export LTDL_LIBRARY_PATH=/usr/lib/llvm-14/lib

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      '()))

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3581

Output:

scheme@(guile-user)> (speed-test )
speed-test : Initialising data.
speed-test : testing Scheme alone : start
speed-test : testing Scheme alone : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906
speed-test : Initialising data.
speed-test : testing Scheme with OpenMP : start
speed-test : testing Scheme with OpenMP : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906
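The printed values can be checked by hand. Each slot ends up holding f(x) = x^3 + c(x). Here c is one Collatz step, as in collatz, with fctpluscollatz mapping 0 to 0. A few of the displayed values, including the last one, worked out:

```latex
f(x) = x^3 + c(x), \qquad
c(x) =
\begin{cases}
  0      & \text{if } x = 0,\\
  1      & \text{if } x = 1,\\
  x/2    & \text{if } x > 0 \text{ is even},\\
  3x + 1 & \text{if } x > 1 \text{ is odd}.
\end{cases}

f(3) = 27 + 10 = 37, \quad f(4) = 64 + 2 = 66, \quad f(9) = 729 + 28 = 757,

f(2642244) = 18446703239944862784 + 1321122 = 18446703239946183906.
```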

The sequential region: 4 s.
The parallel (//) region: 2 s (twice as fast).

Of course, if I run the equivalent pure C code, it is instantaneous:

// openMP cube - collatz test

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>


// OpenMP on macOS with Xcode tools:
// https://mac.r-project.org/openmp/

// export OMP_NUM_THREADS=8

// this main() in a library was only for testing OpenMP with macOS Xcode and
// Linux; to use it, uncomment main() and comment out the openmp() functions


// macOS:
// clang -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -Xclang -fopenmp -o collatz -lomp collatz.c

// gcc -L/usr/lib/llvm-14/lib/ -fopenmp  -o collatz  -lomp collatz.c


unsigned long long *vtst;


unsigned long long collatz(unsigned long long n) {

  if (n == 1) return 1;

  if ((n % 2) == 0)
    return n / 2;
  else
    return 3*n + 1;

}

unsigned long long fct(unsigned long long x) {

  unsigned long long c;
  if (x == 0)
    c = 0;
  else
    c = collatz(x);

  return (x * x * x) + c;
}


unsigned long long fctapply(unsigned long long i) {
  return vtst[i] = fct(vtst[i]);
}


int main() {
  // floor of the cube root of 18,446,744,073,709,551,615 (ULLONG_MAX),
  // see https://en.wikipedia.org/wiki/C_data_types
  int vtstlen = 2642245;
  vtst = calloc(vtstlen, sizeof(unsigned long long));

  int ncpus = omp_get_max_threads();
  printf("Found a maximum of %i cores.\n", ncpus);
  printf("Program computes the cube of numbers and adds the Collatz result (1) "
         "with and without parallelisation with the OpenMP library.\n\n");
  printf("Initialising data.\n\n");
  //int iam,nthr;

  // init data sequential
  for (int i = 0; i < vtstlen; i++) { /* i is private by default because it is the for index */
    //iam = omp_get_thread_num();
    //printf("iam=%i\n",iam);
    //nthr = omp_get_num_threads() ;
    //printf("total number of threads=%i\n",nthr);
    vtst[i] = i;

  }


  printf("STARTING computation without //.\n");


  for (int i = 0; i < vtstlen; i++) {

    fctapply(i);

  }

  printf("ENDING computation without //.\n\n");

  // display a few results
  for (int i = 0; i < 10; i++) {
    printf("%llu\n", vtst[i]);
  }
  printf(".....\n");
  for (int i = vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n", vtst[i]);
  }


  printf("Initialising data in //.\n\n");
  //int iam,nthr;

#pragma omp parallel for firstprivate(vtstlen) shared(vtst)


  for (int i = 0; i < vtstlen; i++) { /* i is private by default because it is the for index */
    vtst[i] = i;

  }

  printf("STARTING computation in //.\n");


  // firstprivate: each thread gets its own copy of vtstlen, initialised from
  // the original value (a plain private copy would start uninitialised)
#pragma omp parallel for firstprivate(vtstlen) shared(vtst)


  for (int i = 0; i < vtstlen; i++) { /* i is private by default */

    fctapply(i);

  }

  printf("ENDING computation in //.\n\n");


  // display a few results
  for (int i = 0; i < 10; i++) {
    printf("%llu\n", vtst[i]);
  }
  printf(".....\n");
  for (int i = vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n", vtst[i]);
  }


}

https://github.com/damien-mattei/library-FunctProg/blob/master/collatz.c
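The vtstlen value in the C code can be checked with a small helper. This is a minimal sketch, assuming unsigned long long is 64 bits wide (ULLONG_MAX = 18,446,744,073,709,551,615). cube_fits() tests whether x^3 fits without ever overflowing, and max_cube_base() binary-searches for the largest such x. Both are hypothetical helpers and are not part of collatz.c.

```c
#include <limits.h>
#include <stdbool.h>

/* Hypothetical helper: true if x*x*x <= ULLONG_MAX, computed without
   ever overflowing (q is floor(ULLONG_MAX / x)). */
bool cube_fits(unsigned long long x) {
  if (x == 0)
    return true;
  unsigned long long q = ULLONG_MAX / x;
  /* x <= q guarantees x*x cannot overflow; x*x <= q then means x^3 fits */
  return x <= q && x * x <= q;
}

/* Hypothetical helper: largest n with n*n*n <= ULLONG_MAX, by binary
   search. With a 64-bit unsigned long long, (2^22)^3 = 2^66 is too big,
   so the answer lies in [0, 2^22]. */
unsigned long long max_cube_base(void) {
  unsigned long long lo = 0, hi = 1ULL << 22;
  while (lo < hi) {
    unsigned long long mid = lo + (hi - lo + 1) / 2;  /* round up */
    if (cube_fits(mid))
      lo = mid;        /* mid fits: the answer is mid or larger */
    else
      hi = mid - 1;    /* mid is too big: the answer is below mid */
  }
  return lo;
}
```

With vtstlen = 2642245, the largest index is 2642244. Since 2642244^3 + collatz(2642244) = 18446703239946183906, the value still fits, which matches the last value printed above.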

In conclusion, OpenMP with Guile gives a modest speed improvement, by a
factor between 1.15 (with the logic algorithm) and 2 (benchmarks with cube
and Collatz), but only on macOS; under Linux it fails with a segfault or is
slower.

There is probably a difference in Guile's implementation between macOS and
Linux, but I do not know the inner mechanism and algorithm used to run
Guile in a C environment. What is scm_init_guile() doing? Why must it be
placed inside the parallel (//) region on Linux (with slower results), while
under macOS it can go anywhere (and speeds up the code)? Possibly this could
be improved. It is already a good result to see it work with OpenMP in
Scheme.

Best wishes,

Damien


On Fri, Jan 6, 2023 at 6:06 PM Maxime Devos <maximedevos@telenet.be> wrote:

> > No, it returns something based on the address:
> > scheme@(guile-user)> (current-thread)
> > $1 = #<thread 8814535936 (102a61d80)>
> > The good thing is that it is different for each address; the bad thing is
> > that I do not know how to extract it from the result, and anyway I need a
> > number: 0,1,2,3..., ordered and forming a partition, so that the scheduling
> > makes each thread deal with one part of the array (vector), the way it is
> > in OpenMP, like in the FOR example I posted a week ago.
>
> You could define a (weak key) hash table from threads to numbers, and
> whenever a thread is encountered that isn't yet in the table, assign it
> an unused number and insert it in the table.  Requires locking (or an
> atomics equivalent) though, so not ideal.
>
> (Maybe there's a method to get a number, directly, but I don't know any.)
>
> > Just do a 'for' like in OpenMP (mentioned above).
>
> In that case, when implementing slicing the array between different new
> fibers, you can give each of the fibers you spawn (one fiber per slice,
> if I understand the terminology correctly) an entry in the vector, and
> after all the fibers complete do the usual 'sum/multiply/... all
> entries' trick.
>
> As each fiber has its own (independent) storage, not touched by the
> other fibers, that should be safe.
>
> I suppose this might take more memory storage than with openMP.
>
> > I understand fibers are better for scheduling web server requests, but not
> > for parallelizing like OpenMP; they are two different worlds.
>
> You can do parallelisation with fibers (see ‘In that case, when
> implementing slicing ...’), but from what I'm reading, it will be
> somewhat unlike openMP.
>
> On 06-01-2023 16:06, Damien Mattei wrote:
> >
> > (define omp-get-max-threads
> >    (pointer->procedure int
> >                        (dynamic-func "omp_get_max_threads" libomp)
> >                        (list void)))
> >
> > but I get this error:
> > ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> > In procedure pointer->procedure: Wrong type argument in position 3: 0
> >
> > I do not understand why.
>
>
> ‘int omp_get_max_threads(void);’ is C's way to declare that
> omp_get_max_threads has no arguments -- there is no 'void'-typed argument.
>
> Try (untested):
>
> (define omp-get-max-threads
>    (pointer->procedure int
>                        (dynamic-func "omp_get_max_threads" libomp)
>                        (list)))
>
> Greetings,
> Maxime.
>
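Below is a minimal C sketch of Maxime's first suggestion: a table from threads to numbers, filled on first encounter and protected by a lock, using POSIX threads. C has no weak keys, so this hypothetical thread_number() never forgets a thread that has exited. MAX_THREADS is an arbitrary assumed capacity. Numbers are handed out in order of first arrival, not in any scheduler order.

```c
#include <pthread.h>

#define MAX_THREADS 256  /* assumed capacity of the table */

static pthread_t seen[MAX_THREADS];  /* seen[k] is the thread numbered k */
static int nseen = 0;
static pthread_mutex_t seen_lock = PTHREAD_MUTEX_INITIALIZER;

/* Return a small number 0,1,2,... for the calling thread. The first time
   a thread asks, it gets the next unused number; later calls from the
   same thread return the same number. Returns -1 if the table is full. */
int thread_number(void) {
  pthread_t self = pthread_self();
  int result = -1;

  pthread_mutex_lock(&seen_lock);      /* the locking Maxime mentions */
  for (int k = 0; k < nseen; k++) {
    if (pthread_equal(seen[k], self)) {
      result = k;
      break;
    }
  }
  if (result < 0 && nseen < MAX_THREADS) {
    seen[nseen] = self;
    result = nseen++;
  }
  pthread_mutex_unlock(&seen_lock);
  return result;
}
```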

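The slicing suggestion can be sketched the same way, with OpenMP threads standing in for fibers (an assumption; the Guile fibers API is not used here). Slice s owns the index range [s*n/NSLICES, (s+1)*n/NSLICES) and writes only partial[s]. The slice index is therefore itself the 0,1,2,3... number, and no worker touches another worker's storage. The partial results are combined after the loop. NSLICES and sliced_sum_of_cubes() are hypothetical names.

```c
#include <stddef.h>

#define NSLICES 4  /* hypothetical number of slices (one per worker) */

/* Sum of cubes of v[0..n), computed slice by slice. Each slice writes
   only its own partial[s] entry, so the slices never share storage. */
unsigned long long sliced_sum_of_cubes(const unsigned long long *v, size_t n) {
  unsigned long long partial[NSLICES] = {0};

#pragma omp parallel for
  for (int s = 0; s < NSLICES; s++) {
    size_t lo = (size_t)s * n / NSLICES;        /* first index of slice s */
    size_t hi = (size_t)(s + 1) * n / NSLICES;  /* one past its last index */
    for (size_t i = lo; i < hi; i++)
      partial[s] += v[i] * v[i] * v[i];
  }

  /* combine the per-slice results after all slices are done */
  unsigned long long total = 0;
  for (int s = 0; s < NSLICES; s++)
    total += partial[s];
  return total;
}
```

Because the slice boundaries are computed with the same formula for s and s + 1, the slices cover [0, n) exactly once, even when n is smaller than NSLICES.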
--000000000000ae84be05f22347e9--