* fibers,questions about thread id and mutation of vectors @ 2023-01-06 10:20 Damien Mattei 2023-01-06 14:12 ` Maxime Devos 0 siblings, 1 reply; 8+ messages in thread From: Damien Mattei @ 2023-01-06 10:20 UTC (permalink / raw) To: guile-user

Hi,
does Fibers have a way to know the number of the thread the code is running on? I mean the equivalent of omp_get_thread_num() in OpenMP.

Does vector-set! in Fibers (I read the mutation section of the doc, and I suppose it is wrapped/rewritten for Fibers) let many threads access a vector without a time penalty if the indices accessed are different? (Because if the indices are different, there is no real concurrency.)

Regards,
Damien

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 10:20 fibers,questions about thread id and mutation of vectors Damien Mattei @ 2023-01-06 14:12 ` Maxime Devos 2023-01-06 15:06 ` Damien Mattei 0 siblings, 1 reply; 8+ messages in thread From: Maxime Devos @ 2023-01-06 14:12 UTC (permalink / raw) To: Damien Mattei, guile-user

On 06-01-2023 11:20, Damien Mattei wrote:
> Hi,
> is fibers having a way to know the thread number the code is running?
> i, mean the equivalent of omp_get_thread_num(); in openMP.

There's (current-thread), but that's just Guile; it's not Fibers-specific. It's also not a number. I don't know if it's equivalent to omp_get_thread_num.

> Does vector-set! (i read the mutation section of the doc and i suppose it is
> wrapped/rewritten for fibers) in fibers

What docs? Guile Fibers doesn't rewrite vector-set!. I don't see what reason it would have to rewrite vector-set! at all. The only thing it rewrites is the suspendable-ports stuff, but that's irrelevant here. Could you quote the relevant part of the documentation?

> let many thread access a vector
> without time penalty if the index access are differents? (because if index
> are different there is no real concurrency)

I've heard there is some kind of CPU cache-migration penalty when mutating and reading nearby regions of memory from different CPUs, but that's CPU stuff, not Fibers-specific or even Guile-specific stuff.

Going by your previous question 'does fibers have a way to determine the thread it is running on', I'm assuming your vector has entries per thread and the fibers modify the per-thread entry (maybe some kind of accumulator?).

Regardless of the time penalty, it's probably unsafe, because due to scheduling, the thread on which a fiber runs can change over time -- between determining the index and writing to the vector, the thread can have changed. So you'd have to do 'atomic-box-compare-and-swap!'
stuff, but then you have a time penalty. Or use 'call-with-blocked-asyncs' appropriately to temporarily block preemption.

Anyway, this reads as an 'XY problem' (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378); instead of asking about your attempted solution, could you say what problem you are trying to solve?

Greetings,
Maxime.

^ permalink raw reply [flat|nested] 8+ messages in thread
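The compare-and-swap approach mentioned above could look roughly like this in plain Guile (an untested sketch; it assumes the (ice-9 atomic) module of Guile >= 2.2, and atomic-add! is an illustrative name, not an existing API):

```scheme
(use-modules (ice-9 atomic))

;; Retry loop: atomic-box-compare-and-swap! returns the value it
;; actually observed, so we loop until that matches what we expected.
(define (atomic-add! box x)
  (let loop ((old (atomic-box-ref box)))
    (let ((seen (atomic-box-compare-and-swap! box old (+ old x))))
      (if (eqv? seen old)
          (+ old x)        ; won the race: the new value is in the box
          (loop seen)))))  ; lost the race: retry from the observed value

(define acc (make-atomic-box 0))
(atomic-add! acc 5)
```

This trades one vector slot for one atomic box per accumulator, at the cost of the retry loop -- the "time penalty" mentioned above.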
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 14:12 ` Maxime Devos @ 2023-01-06 15:06 ` Damien Mattei 2023-01-06 17:06 ` Damien Mattei 2023-01-06 17:06 ` Maxime Devos 0 siblings, 2 replies; 8+ messages in thread From: Damien Mattei @ 2023-01-06 15:06 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-user

Yes, sorry, I misunderstood the Fibers doc about mutation:
https://github.com/wingo/fibers/wiki/Manual#Mutation
set! and vector-set! are not wrapped. (I imagined that.)

Reply continues below:

On Fri, Jan 6, 2023 at 3:12 PM Maxime Devos <maximedevos@telenet.be> wrote:
>
> On 06-01-2023 11:20, Damien Mattei wrote:
> > Hi,
> > is fibers having a way to know the thread number the code is running?
> > i, mean the equivalent of omp_get_thread_num(); in openMP.
>
> There's (current-thread), but that's just Guile; it's not
> Fibers-specific. It's also not a number. I don't know if it's
> equivalent to omp_get_thread_num.
>

No, it returns something based on an address:

scheme@(guile-user)> (current-thread)
$1 = #<thread 8814535936 (102a61d80)>

The good thing is that it is different for each thread; the bad thing is that I do not know how to extract a number from it, and anyway I need numbers 0, 1, 2, 3, ... that are ordered and form a partition, so that each thread deals with a part of the array (vector), the way it is done in OpenMP, like in the FOR example I posted a week ago.

> > let many thread access a vector
> > without time penalty if the index access are differents? (because if index
> > are different there is no real concurrency)
>
> I've heard there is some kind of CPU cache migration penalty when
> mutating and reading nearby regions of memory from different CPUs, but
> that's CPU stuff, not fibers-specific or even Guile-specific stuff.
Yes, that is true.

> Going by your previous question 'does fibers have a way to determine the
> thread is running on', I'm assuming your vector has entries per-thread
> and the fibers modify the per-thread entry (maybe some kind of
> accumulator?).
>
> Regardless of the time penalty, it's probably unsafe, because due to
> scheduling, the thread on which a fiber runs can change over time --
> between determining the index and writing to the vector, the thread can
> have changed.

If the code changes thread, the organisation of the computation is completely broken! In OpenMP (and on GPUs too) a thread keeps its job until it is finished: the data are sliced between the threads at the beginning, and each thread uses its number id to retrieve and compute its part of the array (defined in shared memory). There is no concurrency (except in the CPU cache, as you mentioned, but I do not know how OpenMP deals with that), because each thread uses a separate region of the array (vector).

> So you'd have to do 'atomic-box-compare-and-swap!' stuff,
> but then you have a time penalty. Or use 'call-with-blocked-asyncs'
> appropriately to temporarily block preemption.
>
> Anyway, this reads as an 'XY problem'
> (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378);
> instead of asking about your attempted solution, could you say what
> problem you are trying to solve?

Just to do a 'for like in OpenMP (mentioned above). I understand Fibers is better for scheduling web-server requests, but not for parallelizing like OpenMP -- they are two different worlds.
Regards,
Damien

Note: a little question again. I am trying to use

int omp_get_max_threads(void);
https://www.openmp.org/spec-html/5.0/openmpsu112.html

from Guile with:

(define libomp (dynamic-link "libomp"))

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      (list void)))

but I get this error:

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure pointer->procedure: Wrong type argument in position 3: 0

I do not understand why.

> Greetings,
> Maxime.
>

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 15:06 ` Damien Mattei @ 2023-01-06 17:06 ` Damien Mattei 2023-01-06 17:06 ` Maxime Devos 1 sibling, 0 replies; 8+ messages in thread From: Damien Mattei @ 2023-01-06 17:06 UTC (permalink / raw) To: guile-user when void is not empty , the solution was to replace (list void) with '() :-) (define omp-get-max-threads (pointer->procedure int (dynamic-func "omp_get_max_threads" libomp) '())) On Fri, Jan 6, 2023 at 4:06 PM Damien Mattei <damien.mattei@gmail.com> wrote: > yes sorry i misunderstand the fiber doc about mutation: > https://github.com/wingo/fibers/wiki/Manual#Mutation > set! and vector-set! are not wrapped.(i imaginate it ...) > > reply continue below: > > On Fri, Jan 6, 2023 at 3:12 PM Maxime Devos <maximedevos@telenet.be> > wrote: > >> >> >> On 06-01-2023 11:20, Damien Mattei wrote: >> > Hi, >> > is fibers having a way to know the thread number the code is running? >> > i, mean the equivalent of omp_get_thread_num(); in openMP. >> >> There's (current-thread), but that's just Guile; it's not >> Fibers-specific. It's also not a number. I don't know if it's >> equivalent to omp_get_thread_num. >> > > no it returns something based on address: > scheme@(guile-user)> (current-thread) > $1 = #<thread 8814535936 (102a61d80)> > the good thing it is that it is different for each address, the bad is > that i do not know how to extract it from the result and anyway i need a > number : 0,1,2,3... ordered and being a partition to make scheduling that > each thread deal with a part of the array (vector) the way it is in OpenMP > like in the FOR example i posted a week ago > >> >> >> > let many thread access a vector >> > without time penalty if the index access are differents? 
(because if >> index >> > are different there is no real concurrency) >> >> I've heard there is some kind of CPU cache migration penalty when >> mutating and reading nearby regions of memory from different CPUs, but >> that's CPU stuff, not fibers-specific or even Guile-specific stuff. >> > > yes it is true > >> >> Going by your previous question 'does fibers have a way to determine the >> thread is running on', I'm assuming your vector has entries per-thread >> and the fibers modify the per-thread entry (maybe some kind of >> accumulator?). >> >> Regardless of the time penalty, it's probably unsafe, because due to >> scheduling, the thread on which a fiber runs can change over time -- >> between determining the index and writing to the vector, the thread can >> have changed. > > > ! if the code change of thread the organisation of computation is > completely broken > in OpenMP (and GPUs too) a thread keep it jobs until it is finished , data > are sliced between threads at beginning, each thread use its number id to > retrieve and compute the data of the array (array defined in shared memory) > there is no concurrency (unless in cpu cache as you mentioned it but i do > not know how openMP deal with that) as each thread use separate regions of > the array (vector) > >> So you'd have to do 'atomic-box-compare-and-swap!' stuff, >> but then you have a time penalty. Or use 'call-with-blocked-asyncs' >> appropriately to temporarily block preemption. >> >> Anyway, this reads as an 'XY problem' >> ( >> https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378); >> >> instead of asking about your attempted solution, could you say what >> problem you are trying to solve? >> > > just do a 'for like in openMP (mentioned above) > i undertand fibers is better for scheduling web server request but not for > parallelizing like openMP - it is two differents world. 
> > Regards, > Damien > > note: a little question again: > i try to use > > int omp_get_max_threads(void); > > https://www.openmp.org/spec-html/5.0/openmpsu112.html > > from guile with: > (define libomp (dynamic-link "libomp")) > > (define omp-get-max-threads > (pointer->procedure int > (dynamic-func "omp_get_max_threads" libomp) > (list void))) > > but i get this error: > ice-9/boot-9.scm:1685:16: In procedure raise-exception: > In procedure pointer->procedure: Wrong type argument in position 3: 0 > > i do not understand why. > > >> Greetings, >> Maxime. >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 15:06 ` Damien Mattei 2023-01-06 17:06 ` Damien Mattei @ 2023-01-06 17:06 ` Maxime Devos 2023-01-13 11:10 ` Damien Mattei 1 sibling, 1 reply; 8+ messages in thread From: Maxime Devos @ 2023-01-06 17:06 UTC (permalink / raw) To: Damien Mattei; +Cc: guile-user

> no it returns something based on address:
> scheme@(guile-user)> (current-thread)
> $1 = #<thread 8814535936 (102a61d80)>
> the good thing it is that it is different for each address, the bad is that i do not know how to extract it from the result and anyway i need a number : 0,1,2,3... ordered and being a partition to make scheduling that each thread deal with a part of the array (vector) the way it is in OpenMP like in the FOR example i posted a week ago

You could define a (weak key) hash table from threads to numbers, and whenever a thread is encountered that isn't yet in the table, assign it an unused number and insert it in the table. Requires locking (or an atomics equivalent) though, so not ideal.

(Maybe there's a method to get a number directly, but I don't know any.)

> just do a 'for like in openMP (mentioned above)

In that case, when implementing slicing the array between different new fibers, you can give each of the fibers you spawn (one fiber per slice, if I understand the terminology correctly) an entry in the vector, and after all the fibers complete do the usual 'sum/multiply/... all entries' trick.

As each fiber has its own (independent) storage, not touched by the other fibers, that should be safe.

I suppose this might take more memory storage than with openMP.

> i undertand fibers is better for scheduling web server request but not for parallelizing like openMP - it is two differents world.

You can do parallelisation with fibers (see 'In that case, when implementing slicing ...'), but from what I'm reading, it will be somewhat unlike openMP.
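The one-fiber-per-slice idea described above could look roughly like this (an untested sketch against the Fibers API; parallel-sum, the slice arithmetic, and the done channel are illustrative choices, not code from this thread):

```scheme
(use-modules (fibers) (fibers channels))

;; Sum a vector with NSLICES fibers; fiber S writes only cell S of
;; PARTIAL, so no two fibers ever touch the same location.
(define (parallel-sum vec nslices)
  (define len (vector-length vec))
  (define partial (make-vector nslices 0))
  (define done (make-channel))
  (run-fibers
   (lambda ()
     (do ((s 0 (+ s 1))) ((= s nslices))
       (spawn-fiber
        (lambda ()
          (let ((lo (quotient (* s len) nslices))
                (hi (quotient (* (+ s 1) len) nslices)))
            (do ((i lo (+ i 1))) ((= i hi))
              (vector-set! partial s
                           (+ (vector-ref partial s) (vector-ref vec i))))
            (put-message done s)))))
     ;; wait for every slice to finish before summing
     (do ((s 0 (+ s 1))) ((= s nslices))
       (get-message done))))
  (apply + (vector->list partial)))
```

Note that each fiber's cell is keyed by the slice index s, not by a thread id, so it stays safe even if the scheduler migrates a fiber to another thread.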
On 06-01-2023 16:06, Damien Mattei wrote:
>
> (define omp-get-max-threads
>   (pointer->procedure int
>                       (dynamic-func "omp_get_max_threads" libomp)
>                       (list void)))
>
> but i get this error:
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> In procedure pointer->procedure: Wrong type argument in position 3: 0
>
> i do not understand why.

‘int omp_get_max_threads(void);’ is C's way to declare that omp_get_max_threads has no arguments -- there is no 'void'-typed argument.

Try (untested):

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      (list)))

Greetings,
Maxime.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 17:06 ` Maxime Devos @ 2023-01-13 11:10 ` Damien Mattei 2023-01-13 12:23 ` Maxime Devos 0 siblings, 1 reply; 8+ messages in thread From: Damien Mattei @ 2023-01-13 11:10 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-user, guile-devel

I made some tests of OpenMP and Guile, with Guile 3.0.8.99-f3ea8 on macOS M1 and Linux Intel, because I was not sure of the performance. I found a problem: on Linux the code is slower with OpenMP (it can be a factor of 5), while on macOS the gain is 100% (time divided by 2) or 15%, depending on the complexity of the computation. I cannot explain why it works under macOS and not Linux; the only difference in compilation is that under macOS I had to force this option for the build to succeed: configure --enable-mini-gmp

Anyway, this is not good performance for OpenMP with Scheme: with n CPUs under OpenMP I get a speedup of almost n x 100% in C or Fortran, when using OpenMP for astronomical numerical simulations.
In the // (parallel) region I have only this code on macOS:

scm_init_guile();
#pragma omp parallel for
for (i=start; i<=stop; i++) { /* i is private by default */
    scm_call_1( func , scm_from_int(i) );

With Linux this creates a segmentation fault, unless I move the scm_init_guile() line inside the for loop, like this:

#pragma omp parallel for
for (i=start; i<=stop; i++) { /* i is private by default */
    scm_init_guile();
    scm_call_1( func , scm_from_int(i) );

https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c#L91

The Scheme+ code for the speed test looks like this (I use the Collatz function to make the computation unpredictable to any C compiler optimisation when I compare with pure C code):

;; only for speed tests
{vtstlen <+ 2642245}
{vtst <+ (make-vector vtstlen 0)}

{fct <+ (lambda (x) {x * x * x})}

(define (fctapply i) {vtst[i] <- fct(vtst[i])}) ;; neoteric expression of {vtst[i] <- (fct vtst[i])}

(define (fctpluscollatzapply i) {vtst[i] <- fctpluscollatz(vtst[i])})

(define (speed-test)

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme alone : start")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       (fctpluscollatzapply i)) ;;(fctapply i))
  (display-nl "speed-test : testing Scheme alone : end")
  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme with OpenMP : start")
  (openmp 0 {vtstlen - 1} (string->pointer "fctpluscollatzapply")) ;;"fctapply"))
  (display-nl "speed-test : testing Scheme with OpenMP : end")
  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]})))

(define (collatz n)
  (cond ({n = 1} 1)
        ({(modulo n 2) = 0} {n / 2})
        (else {{3 * n} + 1})))

(define (fctpluscollatz x)
  (declare c)
  (if {x = 0}
      {c <- 0}
      {c <- collatz(x)})
  {{x * x * x} + c})

(define openmp
  (foreign-library-function "./libguile-openMP" "openmp"
                            #:return-type int
                            #:arg-types (list int int '*)))

(define libomp (dynamic-link "libomp"))
;; note: require a link : ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib libomp.dylib
;; export LTDL_LIBRARY_PATH=. under linux with a link as above
;; or better solution: export LTDL_LIBRARY_PATH=/usr/lib/llvm-14/lib

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      '()))

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3581

output:

scheme@(guile-user)> (speed-test )
speed-test : Initialising data.
speed-test : testing Scheme alone : start
speed-test : testing Scheme alone : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906
speed-test : Initialising data.
speed-test : testing Scheme with OpenMP : start
speed-test : testing Scheme with OpenMP : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906

The sequential region: 4". The // (parallel) region: 2" (twice as fast). Of course, if I run an equivalent pure C code it is instantaneous:

// openMP cube - collatz test

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

// OpenMP on macOS with Xcode tools:
// https://mac.r-project.org/openmp/
// export OMP_NUM_THREADS=8
// this main() in a library was only for testing openMP with Mac OS Xcode and Linux; for use, uncomment main() and comment the openmp() functions
// mac os :
// clang -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -Xclang -fopenmp -o collatz -lomp collatz.c
// gcc -L/usr/lib/llvm-14/lib/ -fopenmp -o collatz -lomp collatz.c

unsigned long long *vtst;

unsigned long long collatz(unsigned long long n) {
  if (n == 1)
    return 1;
  if ((n % 2) == 0)
    return n / 2;
  else
    return 3*n + 1;
}

unsigned long long fct(unsigned long long x) {
  unsigned long long c;
  if (x == 0)
    c = 0;
  else
    c = collatz(x);
  return (x * x * x) + c;
}

unsigned long long fctapply(unsigned long long i) {
  return vtst[i] = fct(vtst[i]);
}

int main() {

  int vtstlen = 2642245; // cubic root of 18,446,744,073,709,551,615 https://en.wikipedia.org/wiki/C_data_types
  vtst = calloc(vtstlen, sizeof(unsigned long long));

  int ncpus = omp_get_max_threads();
  printf("Found a maximum of %i cores.\n",ncpus);
  printf("Program compute cube of numbers and add collatz result (1) with and without parallelisation with OpenMP library.\n\n");
  printf("Initialising data.\n\n");

  //int iam,nthr;

  // init data sequential
  for (int i=0; i<vtstlen; i++) { /* i is private by default because it is the for indice*/
    //iam = omp_get_thread_num();
    //printf("iam=%i\n",iam);
    //nthr = omp_get_num_threads() ;
    //printf("total number of threads=%i\n",nthr);
    vtst[i]=i;
  }

  printf("STARTING computation without //.\n");
  for (int i=0;
       i<vtstlen; i++) {
    fctapply(i);
  }
  printf("ENDING computation without //.\n\n");

  // display a few results
  for (int i=0;i < 10; i++) {
    printf("%llu\n",vtst[i]);
  }
  printf( ".....\n");
  for (int i=vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n",vtst[i]);
  }

  printf("Initialising data in //.\n\n");
  //int iam,nthr;
  #pragma omp parallel for private(vtstlen) shared(vtst)
  for (int i=0; i<vtstlen; i++) { /* i is private by default because it is the for indice*/
    vtst[i]=i;
  }

  printf("STARTING computation in //.\n");
  // setting private disable unecessary // overload work on some variables (mutex...)
  #pragma omp parallel for private(vtstlen) shared(vtst)
  for (int i=0; i<vtstlen; i++) { /* i is private by default */
    fctapply(i);
  }
  printf("ENDING computation in //.\n\n");

  // display a few results
  for (int i=0;i < 10; i++) {
    printf("%llu\n",vtst[i]);
  }
  printf( ".....\n");
  for (int i=vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n",vtst[i]);
  }
}

https://github.com/damien-mattei/library-FunctProg/blob/master/collatz.c

In conclusion, OpenMP with Guile gives a speed improvement by a factor between 1.15 (with the logic algorithm) and 2 (benchmarks with cube and Collatz), but only on macOS; under Linux it fails with a segfault, or is slower. There must be a difference in the implementation of Guile between macOS and Linux, but I do not know the inner mechanism and algorithm used to run Guile in a C environment. What is scm_init_guile() doing? Why must it be placed inside the // region on Linux (with a slower result), but can go anywhere under macOS (speeding up the code)? Possibly this could be improved. It is already a good result to see that it works with OpenMP in Scheme.
Best wishes, Damien On Fri, Jan 6, 2023 at 6:06 PM Maxime Devos <maximedevos@telenet.be> wrote: > > no it returns something based on address: > > scheme@(guile-user)> (current-thread) > > $1 = #<thread 8814535936 (102a61d80)> > > the good thing it is that it is different for each address, the bad is > that i do not know how to extract it from the result and anyway i need a > number : 0,1,2,3... ordered and being a partition to make scheduling that > each thread deal with a part of the array (vector) the way it is in OpenMP > like in the FOR example i posted a week ago > > You could define a (weak key) hash table from threads to numbers, and > whenever a thread is encountered that isn't yet in the table, assign it > an unused number and insert it in the table. Requires locking (or an > atomics equivalent) though, so not ideal. > > (Maybe there's a method to get a number, directly, but I don't know any.) > > > just do a 'for like in openMP (mentioned above) > > In that case, when implementing slicing the array between different new > fibers, you can give each of the fibers you spawn (one fiber per slice, > if I understand the terminology correctly) an entry in the vector, and > after all the fibers complete do the usual 'sum/multiply/... all > entries' trick. > > As each fiber has its own (independent) storage, not touched by the > other fibers, that should be safe. > > I suppose this might take more memory storage than with openMP. > > > i undertand fibers is better for scheduling web server request but not > for parallelizing like openMP - it is two differents world. > > You can do parallelisation with fibers (see ‘In that case, when > implementing slicing ...’), but from what I'm reading, it will be > somewhat unlike openMP. 
> > On 06-01-2023 16:06, Damien Mattei wrote: > > > > (define omp-get-max-threads > > (pointer->procedure int > > (dynamic-func "omp_get_max_threads" libomp) > > (list void))) > > > > but i get this error: > > ice-9/boot-9.scm:1685:16: In procedure raise-exception: > > In procedure pointer->procedure: Wrong type argument in position 3: 0 > > > > i do not understand why. > > > ‘int omp_get_max_thread(void);’ is C's way to declare that > omp_get_max_thread has no arguments -- there is no 'void'-typed argument. > > Try (untested): > > (define omp-get-max-threads > (pointer->procedure int > (dynamic-func "omp_get_max_threads" libomp) > (list))) > > Greetings, > Maxime. > [-- Attachment #2: Type: text/html, Size: 16234 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-13 11:10 ` Damien Mattei @ 2023-01-13 12:23 ` Maxime Devos 2023-01-17 9:42 ` Damien Mattei 0 siblings, 1 reply; 8+ messages in thread From: Maxime Devos @ 2023-01-13 12:23 UTC (permalink / raw) To: Damien Mattei; +Cc: guile-user, guile-devel

> for (i=start; i<=stop; i++) { /* i is private by default */
>
>     scm_init_guile();
>     scm_call_1( func , scm_from_int(i) );

IIUC, you are calling scm_init_guile once per index, whereas calling it once per thread would suffice. For better performance, I propose doing it once per thread.

On 13-01-2023 12:10, Damien Mattei wrote:
> there should be difference in implementation of Guile between Mac OS and
> Linux but i do not know the inner mechanism and algorithm used to run
> Guile in a C environment,what scm_init_guile() is doing?

Guile is free software; you can download the source code to read what scm_init_guile is doing.

> why must it be placed under the // region on Linux (with slower result)

From the manual (section 6.4):

   Each thread that wants to use functions from the Guile API needs to
   put itself into guile mode with either ‘scm_with_guile’ or
   ‘scm_init_guile’. The global state of Guile is initialized
   automatically when the first thread enters guile mode.

OpenMP does multi-threading, so you need to call scm_init_guile or scm_with_guile.

> and anywhere under MacOS ? (speed up code)

You need to do it on non-Linux too -- the extract from the manual does not make an 'except on MacOS' exception. It might work without scm_init_guile in some circumstances, but this is not at all guaranteed.

If you want to know the difference between MacOS and Linux in the implementation of scm_init_guile, you can read the source code of scm_init_guile as mentioned before.

Greetings,
Maxime.
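The "once per thread" proposal can be sketched by splitting the combined parallel-for into an explicit parallel region plus a work-sharing for loop, so that scm_init_guile() runs once per OpenMP thread instead of once per index (an untested sketch; openmp_for and the variable names are illustrative, not the actual code from the thread's repository):

```c
#include <libguile.h>
#include <omp.h>

/* Call the Scheme procedure FUNC on every index in [start, stop]. */
void openmp_for (SCM func, int start, int stop)
{
  #pragma omp parallel
  {
    /* Entered once per OpenMP thread: put this thread into guile mode. */
    scm_init_guile ();

    /* Work-sharing loop: the indices are split across the threads above. */
    #pragma omp for
    for (int i = start; i <= stop; i++)
      scm_call_1 (func, scm_from_int (i));
  }
}
```

scm_init_guile() is documented as idempotent per thread, so the explicit parallel region simply moves the per-thread initialisation out of the hot loop.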
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-13 12:23 ` Maxime Devos @ 2023-01-17 9:42 ` Damien Mattei 0 siblings, 0 replies; 8+ messages in thread From: Damien Mattei @ 2023-01-17 9:42 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-user, guile-devel

Hello Maxime,

it runs in the fastest way with your idea; as you said, scm_init_guile() is only needed once per thread.

On Fri, Jan 13, 2023 at 1:23 PM Maxime Devos <maximedevos@telenet.be> wrote:
> > for (i=start; i<=stop; i++) { /* i is private by default */
> >
> >     scm_init_guile();
> >     scm_call_1( func , scm_from_int(i) );
>
> IIUC, you are calling scm_init_guile once per index, whereas calling it
>

Yes. OpenMP slices a 1-to-N for loop into N/number_of_cpus segments of a normal C for loop, but runs one loop per CPU. So if you run the 'top' command on a C OpenMP program you will see a load of number_of_cpus*100%; for example, with 12 CPUs, top will display a load of 1200% for your program. Furthermore, if you hit the '1' key you will see in top the load of each CPU (100% each). The same option does not exist in the BSD-like 'top' of Mac OS.

OpenMP partitions N and runs each part on exactly one thread, each thread on a different CPU or core; I think it is the only library that can do that, because OpenMP is written very close to the compiler and LLVM. In general there is a master thread and slave threads, or you can run special code only on the first thread to fork (the master, or the first to launch).

On Friday I unfortunately tried the 'single' pragma:
https://www.openmp.org/spec-html/5.0/openmpsu38.html
but that cannot help, because it runs only on the first thread.

A solution to the problem could be this one:
Executing Code Once Per Thread in an OpenMP Loop
<https://ofekshilon.com/2014/06/10/executing-code-once-per-thread-in-an-openmp-loop/>
but it is Visual C++, and even with g++ it would not be compatible.
So I used a basic C solution, with a static array that remembers whether scm_init_guile() has already been run for the thread the code is currently running on. I also put the result of omp_get_max_threads() in a static variable, as openmp() is called many times in my code and the number of available hardware CPUs never changes.

The code is here:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c

Unfortunately I found no real speedup; I understood that the only reason for the earlier speedup was that the C 'for loop is much faster than the Scheme 'for one. For this conclusion I compared Scheme, C with OpenMP, and C without OpenMP, and in C I got exactly the same time results:

Scheme:
... [output cut]
Chrono START number: 165
minterms-vector-length = 10944.
chrono STOP : elapsedTime = 36.219 ms.
totalComputationTime = 485311.94
Chrono START number: 166
minterms-vector-length = 12008.
chrono STOP : elapsedTime = 39.82 ms.
totalComputationTime = 485351.76
Chrono START number: 167
minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.215 ms.
totalComputationTime = 485352.97500000003

Scheme with OpenMP call:
... [output cut]
Chrono START number: 165
minterms-vector-length = 10944.
chrono STOP : elapsedTime = 35.039 ms.
Open MP totalComputationTime = 385444.1410000001
Chrono START number: 166
minterms-vector-length = 12008.
chrono STOP : elapsedTime = 37.792 ms.
Open MP totalComputationTime = 385481.93300000014
Chrono START number: 167
minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.163 ms.
Open MP totalComputationTime = 385483.09600000014

Scheme with C 'for loop call:
... [output cut]
Chrono START number: 165
minterms-vector-length = 10944.
chrono STOP : elapsedTime = 33.104 ms.
For Funct totalComputationTime = 385543.4700000001
Chrono START number: 166
minterms-vector-length = 12008.
chrono STOP : elapsedTime = 35.938 ms.
For Funct totalComputationTime = 385579.4080000001
Chrono START number: 167
minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.165 ms.
For Funct totalComputationTime = 385580.5730000001

On the C side (OpenMP and sequential for) the result is almost the same:
totalComputationTime = 385580.5730000001 ms
totalComputationTime = 385483.09600000014 ms
= 385 s

I suppose OpenMP slices the work well across many processors, but the scm_call_1( func , scm_from_int(i) ) calls all do their work on the single thread that hosts the Guile interpreter. A solution would be to have many Guile interpreters running, but I do not know how to do that from the C code with OpenMP.

Damien

note: I did the time measurements both in C and in Scheme with gettimeofday-based code, to compare the 100%-Scheme code and the mixed one:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3500

^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-01-17 9:42 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-06 10:20 fibers,questions about thread id and mutation of vectors Damien Mattei 2023-01-06 14:12 ` Maxime Devos 2023-01-06 15:06 ` Damien Mattei 2023-01-06 17:06 ` Damien Mattei 2023-01-06 17:06 ` Maxime Devos 2023-01-13 11:10 ` Damien Mattei 2023-01-13 12:23 ` Maxime Devos 2023-01-17 9:42 ` Damien Mattei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).