From: Damien Mattei
To: Maxime Devos
Cc: guile-user, guile-devel
Newsgroups: gmane.lisp.guile.devel,gmane.lisp.guile.user
Subject: Re: fibers,questions about thread id and mutation of vectors
Date: Fri, 13 Jan 2023 12:10:12 +0100

I ran some tests of OpenMP with Guile (Guile 3.0.8.99-f3ea8) on a macOS M1 machine and on Linux Intel, because I was not sure about the performance. On Linux I found a problem: the code is slower with OpenMP (possibly by a factor of 5). On macOS the gain is either 100% (run time divided by 2) or 15%, depending on the complexity of the computation.
I cannot explain why it works under macOS and not under Linux. The only difference in compilation is that under macOS I had to force this option for the build to succeed:
configure --enable-mini-gmp

In any case, this is not good performance for OpenMP with Scheme: in C or Fortran, when I use OpenMP for astronomical numerical simulations, I get a speedup of almost n x 100% with n CPUs.
In the // region I have only this code on macOS:

  scm_init_guile();

#pragma omp parallel for

  for (i=start; i<=stop; i++)  { /* i is private by default */

    scm_call_1( func , scm_from_int(i) );
  }

With Linux this creates a segmentation fault unless I move the line scm_init_guile(); inside the for loop, like this:

#pragma omp parallel for

  for (i=start; i<=stop; i++)  { /* i is private by default */

    scm_init_guile();
    scm_call_1( func , scm_from_int(i) );
  }

https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c#L91
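
One way to run a per-thread setup step once per thread, instead of once per iteration, is to split `parallel for` into a `parallel` region that contains an `omp for` loop. The sketch below only illustrates that OpenMP pattern: `per_thread_init()` and `work()` are hypothetical stand-ins for `scm_init_guile()` and `scm_call_1(...)`, so that it builds without libguile. Whether this removes the slowdown seen on Linux is untested.

```c
#include <assert.h>
#include <stddef.h>

/* Counts how many times the per-thread setup ran (for illustration). */
int init_calls = 0;

/* Hypothetical stand-in for per-thread setup such as scm_init_guile(). */
void per_thread_init(void)
{
#pragma omp atomic
    init_calls++;
}

/* Hypothetical stand-in for scm_call_1(func, scm_from_int(i)). */
void work(long *v, int i)
{
    v[i] = (long)i * i;
}

/* Process v[start..stop] inclusive; the setup runs once per thread. */
void run_range(long *v, int start, int stop)
{
    assert(v != NULL && start <= stop);

#pragma omp parallel
    {
        per_thread_init();      /* executed once by every thread of the team */

#pragma omp for
        for (int i = start; i <= stop; i++) {   /* i is private */
            work(v, i);
        }
    }
}
```

Built without OpenMP support the pragmas are ignored and the same code runs sequentially, with the setup executed once.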

The Scheme+ code for the speed test looks like this (I use the Collatz function to make the computation unpredictable for any C compiler optimisation when I compare with pure C code):

;; only for speed tests
{vtstlen <+ 2642245}
{vtst <+ (make-vector vtstlen 0)}

{fct <+ (lambda (x) {x * x * x})}

(define (fctapply i) {vtst[i] <- fct(vtst[i])}) ;; neoteric expression of {vtst[i] <- (fct vtst[i])}

(define (fctpluscollatzapply i) {vtst[i] <- fctpluscollatz(vtst[i])})

(define (speed-test)

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme alone : start")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       (fctpluscollatzapply i));;(fctapply i))
  (display-nl "speed-test : testing Scheme alone : end")

  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme with OpenMP : start")
  (openmp 0 {vtstlen - 1} (string->pointer "fctpluscollatzapply"));;"fctapply"))
  (display-nl "speed-test : testing Scheme with OpenMP : end")

  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  )


(define (collatz n)
  (cond ({n = 1} 1)
        ({(modulo n 2) = 0} {n / 2})
        (else {{3 * n} + 1})))


(define (fctpluscollatz x)
  (declare c)
  (if {x = 0}
      {c <- 0}
      {c <- collatz(x)})
  {{x * x * x} + c})


(define openmp (foreign-library-function "./libguile-openMP" "openmp" #:return-type int #:arg-types (list int int '*)))


(define libomp (dynamic-link "libomp")) ;; note: requires a link : ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib libomp.dylib
;; export LTDL_LIBRARY_PATH=. under linux with a link as above
;; or better solution: export LTDL_LIBRARY_PATH=/usr/lib/llvm-14/lib

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      '()))

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3581

output:

scheme@(guile-user)> (speed-test )
speed-test : Initialising data.
speed-test : testing Scheme alone : start
speed-test : testing Scheme alone : end
0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906
speed-test : Initialising data.
speed-test : testing Scheme with OpenMP : start
speed-test : testing Scheme with OpenMP : end
0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906

The sequential region: 4 seconds.
The // region: 2 seconds (twice as fast).
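
To put these two timings in the usual terms, the speedup is the sequential time divided by the parallel time, and the efficiency is the speedup divided by the number of threads. A minimal sketch follows; the function names are hypothetical, and the thread count used in the note after it is an assumption, since the measurements above do not state it.

```c
#include <assert.h>

/* Speedup: sequential run time divided by parallel run time. */
double speedup(double t_seq, double t_par)
{
    assert(t_seq > 0.0 && t_par > 0.0);
    return t_seq / t_par;
}

/* Parallel efficiency: speedup divided by the number of threads used. */
double efficiency(double t_seq, double t_par, int nthreads)
{
    assert(nthreads > 0);
    return speedup(t_seq, t_par) / nthreads;
}
```

With 4 seconds and 2 seconds, `speedup(4.0, 2.0)` is 2.0. If 8 threads had been used (an assumption), the efficiency would be 0.25, far from the near-ideal scaling reported above for C and Fortran.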

Of course, if I run equivalent pure C code, it is instantaneous:

// openMP cube - collatz test

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

// OpenMP on macOS with Xcode tools:
// https://mac.r-project.org/openmp/

// export OMP_NUM_THREADS=8

// this main() in a library was only for testing OpenMP with macOS Xcode and Linux; to use it, uncomment main() and comment out the openmp() functions

// mac os :
// clang  -I/opt/homebrew/opt/libomp/include  -L/opt/homebrew/opt/libomp/lib -Xclang -fopenmp -o collatz  -lomp collatz.c

// gcc -L/usr/lib/llvm-14/lib/ -fopenmp  -o collatz  -lomp collatz.c

unsigned long long *vtst;

// one Collatz step: 1 stays 1, even n is halved, odd n becomes 3n + 1
unsigned long long collatz(unsigned long long n) {

  if (n == 1) return 1;

  if ((n % 2) == 0)
    return n / 2;
  else
    return 3*n + 1;

}

// cube of x plus one Collatz step of x (0 is mapped to 0)
unsigned long long fct(unsigned long long x) {

  unsigned long long c;
  if (x == 0)
    c = 0;
  else
    c = collatz(x);

  return (x * x * x) + c;
}

// replace vtst[i] with fct(vtst[i])
unsigned long long fctapply(unsigned long long i) {
  return vtst[i] = fct(vtst[i]);
}

int main() {
  int vtstlen = 2642245; // cubic root of 18,446,744,073,709,551,615 https://en.wikipedia.org/wiki/C_data_types
  vtst = calloc(vtstlen, sizeof(unsigned long long));
  if (vtst == NULL) {
    fprintf(stderr, "Allocation failed.\n");
    return 1;
  }

  int ncpus = omp_get_max_threads();
  printf("Found a maximum of %i cores.\n",ncpus);
  printf("Program computes the cube of numbers and adds the collatz result (1) with and without parallelisation with the OpenMP library.\n\n");
  printf("Initialising data.\n\n");
  //int iam,nthr;

  // init data sequential
  for (int i=0; i<vtstlen; i++) { /* no OpenMP here: plain sequential loop */
    //iam = omp_get_thread_num();
    //printf("iam=%i\n",iam);
    //nthr = omp_get_num_threads() ;
    //printf("total number of threads=%i\n",nthr);
    vtst[i]=i;
  }

  printf("STARTING computation without //.\n");

  for (int i=0; i<vtstlen; i++) {
    fctapply(i);
  }

  printf("ENDING computation without //.\n\n");

  // display a few results
  for (int i=0;i < 10; i++) {
    printf("%llu\n",vtst[i]);
  }
  printf( ".....\n");
  for (int i=vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n",vtst[i]);
  }

  printf("Initialising data in //.\n\n");
  //int iam,nthr;

  // firstprivate (not private) so that each thread's copy of vtstlen is initialised:
  // with private(vtstlen) the loop bound would be read from an uninitialised copy
#pragma omp parallel for firstprivate(vtstlen) shared(vtst)
  for (int i=0; i<vtstlen; i++) { /* i is private by default because it is the for index */
    vtst[i]=i;
  }

  printf("STARTING computation in //.\n");

  // a per-thread copy of vtstlen avoids unnecessary // overhead on shared variables
#pragma omp parallel for firstprivate(vtstlen) shared(vtst)
  for (int i=0; i<vtstlen; i++) { /* i is private by default */
    fctapply(i);
  }

  printf("ENDING computation in //.\n\n");

  // display a few results
  for (int i=0;i < 10; i++) {
    printf("%llu\n",vtst[i]);
  }
  printf( ".....\n");
  for (int i=vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n",vtst[i]);
  }

  free(vtst);
  return 0;
}
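
The vector length 2642245 is chosen as the cube root of the largest 64-bit unsigned value, so that `x * x * x` does not wrap around for the indices used. A small sketch of how that bound can be checked without overflowing, assuming `unsigned long long` is 64 bits wide (`checked_cube` is a hypothetical helper, not part of the program above):

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/* Store x*x*x in *out and return 1 if the cube fits in unsigned long long;
   return 0 (leaving *out untouched) if it would wrap around. */
int checked_cube(unsigned long long x, unsigned long long *out)
{
    assert(out != NULL);
    /* Divisions are used so that no intermediate product can overflow:
       first check that x*x fits, then that (x*x)*x fits. */
    if (x != 0 && (x > ULLONG_MAX / x || x * x > ULLONG_MAX / x))
        return 0;
    *out = x * x * x;
    return 1;
}
```

Under the 64-bit assumption, the call succeeds for 2642245 and fails for 2642246.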

In conclusion, OpenMP with Guile gives only a modest speedup, a factor between 1.15 (with the logic algorithm) and 2 (benchmark with cube and Collatz), and only on macOS; under Linux it either fails with a segfault or is slower.

There must be a difference in the implementation of Guile between macOS and Linux, but I do not know the inner mechanism and algorithm used to run Guile in a C environment. What is scm_init_guile() doing?
Why must it be placed inside the // region on Linux (with a slower result), while it can be placed anywhere under macOS (with a speedup)?
Possibly this could be improved. It is already a good result to see that OpenMP works with Scheme.

Best wishes,

Damien


On Fri, Jan 6, 2023 at 6:06 PM Maxime Devos <maximedevos@telenet.be> wrote:
> no it returns something based on address:
> scheme@(guile-user)> (current-thread)
> $1 = #<thread 8814535936 (102a61d80)>
> the good thing it is that it is different for each address, the bad is that i do not know how to extract it from the result and anyway i need a number : 0,1,2,3... ordered and being a partition to make scheduling that each thread deal with a part of the array (vector) the way it is in OpenMP like in the FOR example i posted a week ago

You could define a (weak key) hash table from threads to numbers, and
whenever a thread is encountered that isn't yet in the table, assign it
an unused number and insert it in the table. Requires locking (or an
atomics equivalent) though, so not ideal.

(Maybe there's a method to get a number, directly, but I don't know any.)

> just do a 'for like in openMP (mentioned above)

In that case, when implementing slicing the array between different new fibers, you can give each of the fibers you spawn (one fiber per slice, if I understand the terminology correctly) an entry in the vector, and
after all the fibers complete do the usual 'sum/multiply/... all
entries' trick.

As each fiber has its own (independent) storage, not touched by the
other fibers, that should be safe.
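
The slicing idea described above (one worker per slice, each writing only to its own entry, followed by a final combine step) can be sketched in C. This is only an illustration using OpenMP threads rather than fibers; `slice_bounds` and `sum_by_slices` are hypothetical names, and summing stands in for whatever per-slice computation is needed.

```c
#include <assert.h>
#include <stdlib.h>

/* Half-open index range [lo, hi) handled by worker k out of nworkers. */
void slice_bounds(int len, int nworkers, int k, int *lo, int *hi)
{
    assert(len >= 0 && nworkers > 0 && k >= 0 && k < nworkers);
    int base = len / nworkers;   /* minimum slice size */
    int rest = len % nworkers;   /* the first `rest` slices get one extra element */
    *lo = k * base + (k < rest ? k : rest);
    *hi = *lo + base + (k < rest ? 1 : 0);
}

/* Sum v[0..len-1] using one partial result per slice. */
long long sum_by_slices(const int *v, int len, int nworkers)
{
    assert(nworkers > 0);
    long long *partial = calloc((size_t)nworkers, sizeof *partial);
    if (partial == NULL)
        abort();

#pragma omp parallel for
    for (int k = 0; k < nworkers; k++) {
        int lo, hi;
        slice_bounds(len, nworkers, k, &lo, &hi);
        for (int i = lo; i < hi; i++)
            partial[k] += v[i];  /* each worker touches only partial[k] */
    }

    long long total = 0;         /* sequential combine step */
    for (int k = 0; k < nworkers; k++)
        total += partial[k];
    free(partial);
    return total;
}
```

The extra storage is one entry per worker, which matches the remark that this may take somewhat more memory.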

I suppose this might take more memory storage than with openMP.

> i undertand fibers is better for scheduling web server request but not for parallelizing like openMP - it is two differents world.

You can do parallelisation with fibers (see ‘In that case, when implementing slicing ...’), but from what I'm reading, it will be
somewhat unlike openMP.

On 06-01-2023 16:06, Damien Mattei wrote:
>
> (define omp-get-max-threads
>   (pointer->procedure int
>                       (dynamic-func "omp_get_max_threads" libomp)
>                       (list void)))
>
> but i get this error:
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> In procedure pointer->procedure: Wrong type argument in position 3: 0
>
> i do not understand why.


‘int omp_get_max_threads(void);’ is C's way to declare that
omp_get_max_threads has no arguments -- there is no 'void'-typed argument.

Try (untested):

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      (list)))
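
The point about `void` can also be seen from the C side: `(void)` in a prototype marks an empty parameter list, so the function is called with zero arguments, which corresponds to an empty argument-type list in the binding. In this sketch, `fake_max_threads` is a hypothetical stand-in with the same shape as `omp_get_max_threads`, so that it builds without OpenMP.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in shaped like: int omp_get_max_threads(void); */
int fake_max_threads(void)
{
    return 4;   /* arbitrary value, for illustration only */
}

/* Pointer to a function that takes no arguments and returns int. */
typedef int (*no_arg_fn)(void);

int call_no_arg(no_arg_fn f)
{
    assert(f != NULL);
    return f();   /* nothing is passed: `void` is not an argument type */
}
```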

Greetings,
Maxime.