I ran some tests of OpenMP with Guile (Guile 3.0.8.99-f3ea8) on macOS M1 and on Linux Intel because I was not sure about the performance.

I found a problem on Linux: the code is slower with OpenMP (possibly by a factor of 5), whereas on macOS the gain is 100% (run time divided by 2) or 15%, depending on the complexity of the computation.

I cannot explain why it works on macOS and not on Linux. The only difference in the build is that on macOS I had to force this option to get the compilation to succeed:

configure --enable-mini-gmp

In any case this is not good performance for OpenMP with Scheme: with OpenMP on n CPUs I get a speedup of almost n x 100% in C or Fortran, when used for astronomical numerical simulations.

In the parallel (//) region I have only this code on macOS:

scm_init_guile();

#pragma omp parallel for
for (i=start; i<=stop; i++) { /* i is private by default */
  scm_call_1( func , scm_from_int(i) );
}

On Linux this causes a segmentation fault unless I move the line scm_init_guile(); inside the for loop, like this:

#pragma omp parallel for
for (i=start; i<=stop; i++) { /* i is private by default */
  scm_init_guile();
  scm_call_1( func , scm_from_int(i) );
}

https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c#L91

The Scheme+ code for the speed test looks like this (I use the Collatz function to make the computation unpredictable for any C compiler optimisations when I compare with pure C code):

;; only for speed tests
{vtstlen <+ 2642245}
{vtst <+ (make-vector vtstlen 0)}
{fct <+ (lambda (x) {x * x * x})}

(define (fctapply i) {vtst[i] <- fct(vtst[i])}) ;; neoteric expression of {vtst[i] <- (fct vtst[i])}

(define (fctpluscollatzapply i) {vtst[i] <- fctpluscollatz(vtst[i])})

(define (speed-test)

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme alone : start")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       (fctpluscollatzapply i));;(fctapply i))
  (display-nl "speed-test : testing Scheme alone : end")
  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme with OpenMP : start")
  (openmp 0 {vtstlen - 1} (string->pointer "fctpluscollatzapply"));;"fctapply"))
  (display-nl "speed-test : testing Scheme with OpenMP : end")
  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  )

(define (collatz n)
  (cond ({n = 1} 1)
        ({(modulo n 2) = 0} {n / 2})
        (else {{3 * n} + 1})))

(define (fctpluscollatz x)
  (declare c)
  (if {x = 0}
      {c <- 0}
      {c <- collatz(x)})
  {{x * x * x} + c})

(define openmp (foreign-library-function "./libguile-openMP" "openmp"
                                         #:return-type int
                                         #:arg-types (list int int '*)))

(define libomp (dynamic-link "libomp")) ;; note: require a link : ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib libomp.dylib
;; export LTDL_LIBRARY_PATH=. under linux with a link as above
;; or better solution: export LTDL_LIBRARY_PATH=/usr/lib/llvm-14/lib

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      '()))

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3581

output:

scheme@(guile-user)> (speed-test )
speed-test : Initialising data.
speed-test : testing Scheme alone : start
speed-test : testing Scheme alone : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906
speed-test : Initialising data.
speed-test : testing Scheme with OpenMP : start
speed-test : testing Scheme with OpenMP : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906

The sequential region takes 4 s and the parallel (//) region 2 s (twice as fast).

Of course, if I run equivalent pure C code it is instantaneous:

// openMP cube - collatz test

#include <stdio.h>
#include <stdlib.h>
#include <omp.h>

// OpenMP on macOS with Xcode tools:
// https://mac.r-project.org/openmp/

// export OMP_NUM_THREADS=8

// this main() in a library was only for testing openMP with Mac OS Xcode and Linux for use uncomment main() and comment openmp() functions

// mac os :
// clang -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -Xclang -fopenmp -o collatz -lomp collatz.c

// gcc -L/usr/lib/llvm-14/lib/ -fopenmp -o collatz -lomp collatz.c

unsigned long long *vtst;

unsigned long long collatz(unsigned long long n) {
  if (n == 1)
    return 1;
  if ((n % 2) == 0)
    return n / 2;
  else
    return 3*n + 1;
}

unsigned long long fct(unsigned long long x) {
  unsigned long long c;
  if (x == 0)
    c = 0;
  else
    c = collatz(x);
  return (x * x * x) + c;
}

unsigned long long fctapply(unsigned long long i) {
  return vtst[i] = fct(vtst[i]);
}

int main() {

  int vtstlen = 2642245; // cubic root of 18,446,744,073,709,551,615 https://en.wikipedia.org/wiki/C_data_types

  vtst = calloc(vtstlen, sizeof(unsigned long long));

  int ncpus = omp_get_max_threads();

  printf("Found a maximum of %i cores.\n",ncpus);
  printf("Program compute cube of numbers and add collatz result (1) with and without parallelisation with OpenMP library.\n\n");
  printf("Initialising data.\n\n");

  //int iam,nthr;

  // init data sequential
  for (int i=0; i

wrote:
> > no it returns something based on address:
> > scheme@(guile-user)> (current-thread)
> > $1 = #
> > the good thing is that it is different for each address, the bad is
> > that i do not know how to extract it from the result and anyway i need a
> > number : 0,1,2,3... ordered and being a partition to make scheduling that
> > each thread deal with a part of the array (vector) the way it is in OpenMP
> > like in the FOR example i posted a week ago
>
> You could define a (weak key) hash table from threads to numbers, and
> whenever a thread is encountered that isn't yet in the table, assign it
> an unused number and insert it in the table. Requires locking (or an
> atomics equivalent) though, so not ideal.
>
> (Maybe there's a method to get a number, directly, but I don't know any.)
>
> > just do a 'for like in openMP (mentioned above)
>
> In that case, when implementing slicing the array between different new
> fibers, you can give each of the fibers you spawn (one fiber per slice,
> if I understand the terminology correctly) an entry in the vector, and
> after all the fibers complete do the usual 'sum/multiply/... all
> entries' trick.
>
> As each fiber has its own (independent) storage, not touched by the
> other fibers, that should be safe.
>
> I suppose this might take more memory storage than with openMP.
>
> > i understand fibers is better for scheduling web server request but not
> > for parallelizing like openMP - it is two different worlds.
>
> You can do parallelisation with fibers (see ‘In that case, when
> implementing slicing ...’), but from what I'm reading, it will be
> somewhat unlike openMP.
>
> On 06-01-2023 16:06, Damien Mattei wrote:
> >
> > (define omp-get-max-threads
> >   (pointer->procedure int
> >                       (dynamic-func "omp_get_max_threads" libomp)
> >                       (list void)))
> >
> > but i get this error:
> > ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> > In procedure pointer->procedure: Wrong type argument in position 3: 0
> >
> > i do not understand why.
> >
> ‘int omp_get_max_threads(void);’ is C's way to declare that
> omp_get_max_threads has no arguments -- there is no 'void'-typed argument.
>
> Try (untested):
>
> (define omp-get-max-threads
>   (pointer->procedure int
>                       (dynamic-func "omp_get_max_threads" libomp)
>                       (list)))
>
> Greetings,
> Maxime.
>