* fibers,questions about thread id and mutation of vectors @ 2023-01-06 10:20 Damien Mattei 2023-01-06 14:12 ` Maxime Devos 0 siblings, 1 reply; 8+ messages in thread From: Damien Mattei @ 2023-01-06 10:20 UTC (permalink / raw) To: guile-user

Hi,
does Fibers have a way to know the number of the thread the code is running on? I mean the equivalent of omp_get_thread_num() in OpenMP.

Does vector-set! in Fibers (I read the mutation section of the doc, and I suppose it is wrapped/rewritten for Fibers) let many threads access a vector without a time penalty if the indices accessed are different? (Because if the indices are different, there is no real concurrency.)

Regards,
Damien

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 10:20 fibers,questions about thread id and mutation of vectors Damien Mattei @ 2023-01-06 14:12 ` Maxime Devos 2023-01-06 15:06 ` Damien Mattei 0 siblings, 1 reply; 8+ messages in thread From: Maxime Devos @ 2023-01-06 14:12 UTC (permalink / raw) To: Damien Mattei, guile-user

On 06-01-2023 11:20, Damien Mattei wrote:
> Hi,
> is fibers having a way to know the thread number the code is running?
> i, mean the equivalent of omp_get_thread_num(); in openMP.

There's (current-thread), but that's just Guile; it's not Fibers-specific. It's also not a number. I don't know if it's equivalent to omp_get_thread_num.

> Does vector-set! (i read the mutation section of the doc and i suppose it is
> wrapped/rewritten for fibers) in fibers

What docs? Guile Fibers doesn't rewrite vector-set!. I don't see what reason it would have to rewrite vector-set! at all. The only thing it rewrites is the suspendable-ports stuff, but that's irrelevant here. Could you quote the relevant part of the documentation?

> let many thread access a vector
> without time penalty if the index access are differents? (because if index
> are different there is no real concurrency)

I've heard there is some kind of CPU cache-migration penalty when mutating and reading nearby regions of memory from different CPUs, but that's CPU stuff, not Fibers-specific or even Guile-specific stuff.

Going by your previous question 'does fibers have a way to determine the thread it is running on', I'm assuming your vector has entries per thread and the fibers modify the per-thread entry (maybe some kind of accumulator?).

Regardless of the time penalty, it's probably unsafe, because due to scheduling, the thread on which a fiber runs can change over time -- between determining the index and writing to the vector, the thread can have changed. So you'd have to do 'atomic-box-compare-and-swap!'
stuff, but then you have a time penalty. Or use 'call-with-blocked-asyncs' appropriately to temporarily block preemption.

Anyway, this reads as an 'XY problem' (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378); instead of asking about your attempted solution, could you say what problem you are trying to solve?

Greetings,
Maxime.

^ permalink raw reply [flat|nested] 8+ messages in thread
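The compare-and-swap approach mentioned above could look roughly like this in plain Guile (an untested sketch; it assumes the (ice-9 atomic) module of Guile >= 2.2, and atomic-add! is an illustrative name, not an existing API):

```scheme
(use-modules (ice-9 atomic))

;; Retry loop: atomic-box-compare-and-swap! returns the value it
;; actually observed, so we loop until that matches what we expected.
(define (atomic-add! box x)
  (let loop ((old (atomic-box-ref box)))
    (let ((seen (atomic-box-compare-and-swap! box old (+ old x))))
      (if (eqv? seen old)
          (+ old x)        ; won the race: the new value is in the box
          (loop seen)))))  ; lost the race: retry from the observed value

(define acc (make-atomic-box 0))
(atomic-add! acc 5)
```

This trades one vector slot for one atomic box per accumulator, at the cost of the retry loop -- the "time penalty" mentioned above.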
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 14:12 ` Maxime Devos @ 2023-01-06 15:06 ` Damien Mattei 2023-01-06 17:06 ` Damien Mattei 2023-01-06 17:06 ` Maxime Devos 0 siblings, 2 replies; 8+ messages in thread From: Damien Mattei @ 2023-01-06 15:06 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-user

Yes, sorry, I misunderstood the Fibers doc about mutation:
https://github.com/wingo/fibers/wiki/Manual#Mutation
set! and vector-set! are not wrapped. (I imagined that.)

Reply continues below:

On Fri, Jan 6, 2023 at 3:12 PM Maxime Devos <maximedevos@telenet.be> wrote:
>
> On 06-01-2023 11:20, Damien Mattei wrote:
> > Hi,
> > is fibers having a way to know the thread number the code is running?
> > i, mean the equivalent of omp_get_thread_num(); in openMP.
>
> There's (current-thread), but that's just Guile; it's not
> Fibers-specific. It's also not a number. I don't know if it's
> equivalent to omp_get_thread_num.
>

No, it returns something based on an address:

scheme@(guile-user)> (current-thread)
$1 = #<thread 8814535936 (102a61d80)>

The good thing is that it is different for each thread; the bad thing is that I do not know how to extract a number from it, and anyway I need numbers 0, 1, 2, 3, ... that are ordered and form a partition, so that each thread deals with a part of the array (vector), the way it is done in OpenMP, like in the FOR example I posted a week ago.

> > let many thread access a vector
> > without time penalty if the index access are differents? (because if index
> > are different there is no real concurrency)
>
> I've heard there is some kind of CPU cache migration penalty when
> mutating and reading nearby regions of memory from different CPUs, but
> that's CPU stuff, not fibers-specific or even Guile-specific stuff.
Yes, that is true.

> Going by your previous question 'does fibers have a way to determine the
> thread is running on', I'm assuming your vector has entries per-thread
> and the fibers modify the per-thread entry (maybe some kind of
> accumulator?).
>
> Regardless of the time penalty, it's probably unsafe, because due to
> scheduling, the thread on which a fiber runs can change over time --
> between determining the index and writing to the vector, the thread can
> have changed.

If the code changes thread, the organisation of the computation is completely broken! In OpenMP (and on GPUs too) a thread keeps its job until it is finished: the data are sliced between the threads at the beginning, and each thread uses its number id to retrieve and compute its part of the array (defined in shared memory). There is no concurrency (except in the CPU cache, as you mentioned, but I do not know how OpenMP deals with that), because each thread uses a separate region of the array (vector).

> So you'd have to do 'atomic-box-compare-and-swap!' stuff,
> but then you have a time penalty. Or use 'call-with-blocked-asyncs'
> appropriately to temporarily block preemption.
>
> Anyway, this reads as an 'XY problem'
> (https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378);
> instead of asking about your attempted solution, could you say what
> problem you are trying to solve?

Just to do a 'for like in OpenMP (mentioned above). I understand Fibers is better for scheduling web-server requests, but not for parallelizing like OpenMP -- they are two different worlds.
Regards,
Damien

Note: a little question again. I am trying to use

int omp_get_max_threads(void);
https://www.openmp.org/spec-html/5.0/openmpsu112.html

from Guile with:

(define libomp (dynamic-link "libomp"))

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      (list void)))

but I get this error:

ice-9/boot-9.scm:1685:16: In procedure raise-exception:
In procedure pointer->procedure: Wrong type argument in position 3: 0

I do not understand why.

> Greetings,
> Maxime.
>

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 15:06 ` Damien Mattei @ 2023-01-06 17:06 ` Damien Mattei 2023-01-06 17:06 ` Maxime Devos 1 sibling, 0 replies; 8+ messages in thread From: Damien Mattei @ 2023-01-06 17:06 UTC (permalink / raw) To: guile-user when void is not empty , the solution was to replace (list void) with '() :-) (define omp-get-max-threads (pointer->procedure int (dynamic-func "omp_get_max_threads" libomp) '())) On Fri, Jan 6, 2023 at 4:06 PM Damien Mattei <damien.mattei@gmail.com> wrote: > yes sorry i misunderstand the fiber doc about mutation: > https://github.com/wingo/fibers/wiki/Manual#Mutation > set! and vector-set! are not wrapped.(i imaginate it ...) > > reply continue below: > > On Fri, Jan 6, 2023 at 3:12 PM Maxime Devos <maximedevos@telenet.be> > wrote: > >> >> >> On 06-01-2023 11:20, Damien Mattei wrote: >> > Hi, >> > is fibers having a way to know the thread number the code is running? >> > i, mean the equivalent of omp_get_thread_num(); in openMP. >> >> There's (current-thread), but that's just Guile; it's not >> Fibers-specific. It's also not a number. I don't know if it's >> equivalent to omp_get_thread_num. >> > > no it returns something based on address: > scheme@(guile-user)> (current-thread) > $1 = #<thread 8814535936 (102a61d80)> > the good thing it is that it is different for each address, the bad is > that i do not know how to extract it from the result and anyway i need a > number : 0,1,2,3... ordered and being a partition to make scheduling that > each thread deal with a part of the array (vector) the way it is in OpenMP > like in the FOR example i posted a week ago > >> >> >> > let many thread access a vector >> > without time penalty if the index access are differents? 
(because if >> index >> > are different there is no real concurrency) >> >> I've heard there is some kind of CPU cache migration penalty when >> mutating and reading nearby regions of memory from different CPUs, but >> that's CPU stuff, not fibers-specific or even Guile-specific stuff. >> > > yes it is true > >> >> Going by your previous question 'does fibers have a way to determine the >> thread is running on', I'm assuming your vector has entries per-thread >> and the fibers modify the per-thread entry (maybe some kind of >> accumulator?). >> >> Regardless of the time penalty, it's probably unsafe, because due to >> scheduling, the thread on which a fiber runs can change over time -- >> between determining the index and writing to the vector, the thread can >> have changed. > > > ! if the code change of thread the organisation of computation is > completely broken > in OpenMP (and GPUs too) a thread keep it jobs until it is finished , data > are sliced between threads at beginning, each thread use its number id to > retrieve and compute the data of the array (array defined in shared memory) > there is no concurrency (unless in cpu cache as you mentioned it but i do > not know how openMP deal with that) as each thread use separate regions of > the array (vector) > >> So you'd have to do 'atomic-box-compare-and-swap!' stuff, >> but then you have a time penalty. Or use 'call-with-blocked-asyncs' >> appropriately to temporarily block preemption. >> >> Anyway, this reads as an 'XY problem' >> ( >> https://meta.stackexchange.com/questions/66377/what-is-the-xy-problem/66378#66378); >> >> instead of asking about your attempted solution, could you say what >> problem you are trying to solve? >> > > just do a 'for like in openMP (mentioned above) > i undertand fibers is better for scheduling web server request but not for > parallelizing like openMP - it is two differents world. 
> > Regards, > Damien > > note: a little question again: > i try to use > > int omp_get_max_threads(void); > > https://www.openmp.org/spec-html/5.0/openmpsu112.html > > from guile with: > (define libomp (dynamic-link "libomp")) > > (define omp-get-max-threads > (pointer->procedure int > (dynamic-func "omp_get_max_threads" libomp) > (list void))) > > but i get this error: > ice-9/boot-9.scm:1685:16: In procedure raise-exception: > In procedure pointer->procedure: Wrong type argument in position 3: 0 > > i do not understand why. > > >> Greetings, >> Maxime. >> > ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 15:06 ` Damien Mattei 2023-01-06 17:06 ` Damien Mattei @ 2023-01-06 17:06 ` Maxime Devos 2023-01-13 11:10 ` Damien Mattei 1 sibling, 1 reply; 8+ messages in thread From: Maxime Devos @ 2023-01-06 17:06 UTC (permalink / raw) To: Damien Mattei; +Cc: guile-user

> no it returns something based on address:
> scheme@(guile-user)> (current-thread)
> $1 = #<thread 8814535936 (102a61d80)>
> the good thing it is that it is different for each address, the bad is that i do not know how to extract it from the result and anyway i need a number : 0,1,2,3... ordered and being a partition to make scheduling that each thread deal with a part of the array (vector) the way it is in OpenMP like in the FOR example i posted a week ago

You could define a (weak key) hash table from threads to numbers, and whenever a thread is encountered that isn't yet in the table, assign it an unused number and insert it in the table. Requires locking (or an atomics equivalent) though, so not ideal.

(Maybe there's a method to get a number directly, but I don't know any.)

> just do a 'for like in openMP (mentioned above)

In that case, when implementing slicing the array between different new fibers, you can give each of the fibers you spawn (one fiber per slice, if I understand the terminology correctly) an entry in the vector, and after all the fibers complete do the usual 'sum/multiply/... all entries' trick.

As each fiber has its own (independent) storage, not touched by the other fibers, that should be safe.

I suppose this might take more memory storage than with openMP.

> i undertand fibers is better for scheduling web server request but not for parallelizing like openMP - it is two differents world.

You can do parallelisation with fibers (see 'In that case, when implementing slicing ...'), but from what I'm reading, it will be somewhat unlike openMP.
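The one-fiber-per-slice idea described above could look roughly like this (an untested sketch against the Fibers API; parallel-sum, the slice arithmetic, and the done channel are illustrative choices, not code from this thread):

```scheme
(use-modules (fibers) (fibers channels))

;; Sum a vector with NSLICES fibers; fiber S writes only cell S of
;; PARTIAL, so no two fibers ever touch the same location.
(define (parallel-sum vec nslices)
  (define len (vector-length vec))
  (define partial (make-vector nslices 0))
  (define done (make-channel))
  (run-fibers
   (lambda ()
     (do ((s 0 (+ s 1))) ((= s nslices))
       (spawn-fiber
        (lambda ()
          (let ((lo (quotient (* s len) nslices))
                (hi (quotient (* (+ s 1) len) nslices)))
            (do ((i lo (+ i 1))) ((= i hi))
              (vector-set! partial s
                           (+ (vector-ref partial s) (vector-ref vec i))))
            (put-message done s)))))
     ;; wait for every slice to finish before summing
     (do ((s 0 (+ s 1))) ((= s nslices))
       (get-message done))))
  (apply + (vector->list partial)))
```

Note that each fiber's cell is keyed by the slice index s, not by a thread id, so it stays safe even if the scheduler migrates a fiber to another thread.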
On 06-01-2023 16:06, Damien Mattei wrote:
>
> (define omp-get-max-threads
>   (pointer->procedure int
>                       (dynamic-func "omp_get_max_threads" libomp)
>                       (list void)))
>
> but i get this error:
> ice-9/boot-9.scm:1685:16: In procedure raise-exception:
> In procedure pointer->procedure: Wrong type argument in position 3: 0
>
> i do not understand why.

‘int omp_get_max_threads(void);’ is C's way to declare that omp_get_max_threads has no arguments -- there is no 'void'-typed argument.

Try (untested):

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      (list)))

Greetings,
Maxime.

^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-06 17:06 ` Maxime Devos @ 2023-01-13 11:10 ` Damien Mattei 2023-01-13 12:23 ` Maxime Devos 0 siblings, 1 reply; 8+ messages in thread From: Damien Mattei @ 2023-01-13 11:10 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-user, guile-devel

I made some tests of OpenMP and Guile, with Guile 3.0.8.99-f3ea8 on macOS M1 and Linux Intel, because I was not sure of the performance. I found a problem: on Linux the code is slower with OpenMP (it can be a factor of 5), while on macOS the gain is 100% (time divided by 2) or 15%, depending on the complexity of the computation. I cannot explain why it works under macOS and not Linux; the only difference in compilation is that under macOS I had to force this option for the build to succeed: configure --enable-mini-gmp

Anyway, this is not good performance for OpenMP with Scheme: with n CPUs under OpenMP I get a speedup of almost n x 100% in C or Fortran, when using OpenMP for astronomical numerical simulations.
In the // (parallel) region I have only this code on macOS:

scm_init_guile();
#pragma omp parallel for
for (i=start; i<=stop; i++) { /* i is private by default */
    scm_call_1( func , scm_from_int(i) );

With Linux this creates a segmentation fault, unless I move the scm_init_guile() line inside the for loop, like this:

#pragma omp parallel for
for (i=start; i<=stop; i++) { /* i is private by default */
    scm_init_guile();
    scm_call_1( func , scm_from_int(i) );

https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c#L91

The Scheme+ code for the speed test looks like this (I use the Collatz function to make the computation unpredictable to any C compiler optimisation when I compare with pure C code):

;; only for speed tests
{vtstlen <+ 2642245}
{vtst <+ (make-vector vtstlen 0)}

{fct <+ (lambda (x) {x * x * x})}

(define (fctapply i) {vtst[i] <- fct(vtst[i])}) ;; neoteric expression of {vtst[i] <- (fct vtst[i])}

(define (fctpluscollatzapply i) {vtst[i] <- fctpluscollatz(vtst[i])})

(define (speed-test)

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme alone : start")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       (fctpluscollatzapply i)) ;;(fctapply i))
  (display-nl "speed-test : testing Scheme alone : end")
  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]}))

  ;; init data
  (display-nl "speed-test : Initialising data.")
  (for ({i <+ 0} {i < vtstlen} {i <- {i + 1}})
       {vtst[i] <- i})

  ;; compute
  (display-nl "speed-test : testing Scheme with OpenMP : start")
  (openmp 0 {vtstlen - 1} (string->pointer "fctpluscollatzapply")) ;;"fctapply"))
  (display-nl "speed-test : testing Scheme with OpenMP : end")
  (newline)

  ;; display a few results
  (for ({i <+ 0} {i < 10} {i <- {i + 1}})
       (display-nl {vtst[i]}))
  (display-nl ".....")
  (for ({i <+ {vtstlen - 10}} {i < vtstlen} {i <- {i + 1}})
       (display-nl {vtst[i]})))

(define (collatz n)
  (cond ({n = 1} 1)
        ({(modulo n 2) = 0} {n / 2})
        (else {{3 * n} + 1})))

(define (fctpluscollatz x)
  (declare c)
  (if {x = 0}
      {c <- 0}
      {c <- collatz(x)})
  {{x * x * x} + c})

(define openmp
  (foreign-library-function "./libguile-openMP" "openmp"
                            #:return-type int
                            #:arg-types (list int int '*)))

(define libomp (dynamic-link "libomp"))
;; note: require a link : ln -s /opt/homebrew/opt/libomp/lib/libomp.dylib libomp.dylib
;; export LTDL_LIBRARY_PATH=. under linux with a link as above
;; or better solution: export LTDL_LIBRARY_PATH=/usr/lib/llvm-14/lib

(define omp-get-max-threads
  (pointer->procedure int
                      (dynamic-func "omp_get_max_threads" libomp)
                      '()))

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3581

output:

scheme@(guile-user)> (speed-test )
speed-test : Initialising data.
speed-test : testing Scheme alone : start
speed-test : testing Scheme alone : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906
speed-test : Initialising data.
speed-test : testing Scheme with OpenMP : start
speed-test : testing Scheme with OpenMP : end

0
2
9
37
66
141
219
365
516
757
.....
18446514741354254581
18446535685572961374
18446556629820732765
18446577574071146391
18446598518350624637
18446619462632745120
18446640406943930245
18446661351257757609
18446682295600649637
18446703239946183906

The sequential region: 4". The // (parallel) region: 2" (twice as fast). Of course, if I run an equivalent pure C code it is instantaneous:

// openMP cube - collatz test

#include <omp.h>
#include <stdio.h>
#include <stdlib.h>

// OpenMP on macOS with Xcode tools:
// https://mac.r-project.org/openmp/
// export OMP_NUM_THREADS=8
// this main() in a library was only for testing openMP with Mac OS Xcode and Linux; for use, uncomment main() and comment the openmp() functions
// mac os :
// clang -I/opt/homebrew/opt/libomp/include -L/opt/homebrew/opt/libomp/lib -Xclang -fopenmp -o collatz -lomp collatz.c
// gcc -L/usr/lib/llvm-14/lib/ -fopenmp -o collatz -lomp collatz.c

unsigned long long *vtst;

unsigned long long collatz(unsigned long long n) {
  if (n == 1)
    return 1;
  if ((n % 2) == 0)
    return n / 2;
  else
    return 3*n + 1;
}

unsigned long long fct(unsigned long long x) {
  unsigned long long c;
  if (x == 0)
    c = 0;
  else
    c = collatz(x);
  return (x * x * x) + c;
}

unsigned long long fctapply(unsigned long long i) {
  return vtst[i] = fct(vtst[i]);
}

int main() {

  int vtstlen = 2642245; // cubic root of 18,446,744,073,709,551,615 https://en.wikipedia.org/wiki/C_data_types
  vtst = calloc(vtstlen, sizeof(unsigned long long));

  int ncpus = omp_get_max_threads();
  printf("Found a maximum of %i cores.\n",ncpus);
  printf("Program compute cube of numbers and add collatz result (1) with and without parallelisation with OpenMP library.\n\n");
  printf("Initialising data.\n\n");

  //int iam,nthr;

  // init data sequential
  for (int i=0; i<vtstlen; i++) { /* i is private by default because it is the for indice*/
    //iam = omp_get_thread_num();
    //printf("iam=%i\n",iam);
    //nthr = omp_get_num_threads() ;
    //printf("total number of threads=%i\n",nthr);
    vtst[i]=i;
  }

  printf("STARTING computation without //.\n");
  for (int i=0;
       i<vtstlen; i++) {
    fctapply(i);
  }
  printf("ENDING computation without //.\n\n");

  // display a few results
  for (int i=0;i < 10; i++) {
    printf("%llu\n",vtst[i]);
  }
  printf( ".....\n");
  for (int i=vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n",vtst[i]);
  }

  printf("Initialising data in //.\n\n");
  //int iam,nthr;
  #pragma omp parallel for private(vtstlen) shared(vtst)
  for (int i=0; i<vtstlen; i++) { /* i is private by default because it is the for indice*/
    vtst[i]=i;
  }

  printf("STARTING computation in //.\n");
  // setting private disable unecessary // overload work on some variables (mutex...)
  #pragma omp parallel for private(vtstlen) shared(vtst)
  for (int i=0; i<vtstlen; i++) { /* i is private by default */
    fctapply(i);
  }
  printf("ENDING computation in //.\n\n");

  // display a few results
  for (int i=0;i < 10; i++) {
    printf("%llu\n",vtst[i]);
  }
  printf( ".....\n");
  for (int i=vtstlen - 10; i < vtstlen; i++) {
    printf("%llu\n",vtst[i]);
  }
}

https://github.com/damien-mattei/library-FunctProg/blob/master/collatz.c

In conclusion, OpenMP with Guile gives a speed improvement by a factor between 1.15 (with the logic algorithm) and 2 (benchmarks with cube and Collatz), but only on macOS; under Linux it fails with a segfault, or is slower. There must be a difference in the implementation of Guile between macOS and Linux, but I do not know the inner mechanism and algorithm used to run Guile in a C environment. What is scm_init_guile() doing? Why must it be placed inside the // region on Linux (with a slower result), but can go anywhere under macOS (speeding up the code)? Possibly this could be improved. It is already a good result to see that it works with OpenMP in Scheme.
Best wishes, Damien On Fri, Jan 6, 2023 at 6:06 PM Maxime Devos <maximedevos@telenet.be> wrote: > > no it returns something based on address: > > scheme@(guile-user)> (current-thread) > > $1 = #<thread 8814535936 (102a61d80)> > > the good thing it is that it is different for each address, the bad is > that i do not know how to extract it from the result and anyway i need a > number : 0,1,2,3... ordered and being a partition to make scheduling that > each thread deal with a part of the array (vector) the way it is in OpenMP > like in the FOR example i posted a week ago > > You could define a (weak key) hash table from threads to numbers, and > whenever a thread is encountered that isn't yet in the table, assign it > an unused number and insert it in the table. Requires locking (or an > atomics equivalent) though, so not ideal. > > (Maybe there's a method to get a number, directly, but I don't know any.) > > > just do a 'for like in openMP (mentioned above) > > In that case, when implementing slicing the array between different new > fibers, you can give each of the fibers you spawn (one fiber per slice, > if I understand the terminology correctly) an entry in the vector, and > after all the fibers complete do the usual 'sum/multiply/... all > entries' trick. > > As each fiber has its own (independent) storage, not touched by the > other fibers, that should be safe. > > I suppose this might take more memory storage than with openMP. > > > i undertand fibers is better for scheduling web server request but not > for parallelizing like openMP - it is two differents world. > > You can do parallelisation with fibers (see ‘In that case, when > implementing slicing ...’), but from what I'm reading, it will be > somewhat unlike openMP. 
> > On 06-01-2023 16:06, Damien Mattei wrote: > > > > (define omp-get-max-threads > > (pointer->procedure int > > (dynamic-func "omp_get_max_threads" libomp) > > (list void))) > > > > but i get this error: > > ice-9/boot-9.scm:1685:16: In procedure raise-exception: > > In procedure pointer->procedure: Wrong type argument in position 3: 0 > > > > i do not understand why. > > > ‘int omp_get_max_thread(void);’ is C's way to declare that > omp_get_max_thread has no arguments -- there is no 'void'-typed argument. > > Try (untested): > > (define omp-get-max-threads > (pointer->procedure int > (dynamic-func "omp_get_max_threads" libomp) > (list))) > > Greetings, > Maxime. > [-- Attachment #2: Type: text/html, Size: 16234 bytes --] ^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-13 11:10 ` Damien Mattei @ 2023-01-13 12:23 ` Maxime Devos 2023-01-17 9:42 ` Damien Mattei 0 siblings, 1 reply; 8+ messages in thread From: Maxime Devos @ 2023-01-13 12:23 UTC (permalink / raw) To: Damien Mattei; +Cc: guile-user, guile-devel

> for (i=start; i<=stop; i++) { /* i is private by default */
>
>     scm_init_guile();
>     scm_call_1( func , scm_from_int(i) );

IIUC, you are calling scm_init_guile once per index, whereas calling it once per thread would suffice. For better performance, I propose doing it once per thread.

On 13-01-2023 12:10, Damien Mattei wrote:
> there should be difference in implementation of Guile between Mac OS and
> Linux but i do not know the inner mechanism and algorithm used to run
> Guile in a C environment,what scm_init_guile() is doing?

Guile is free software; you can download the source code to read what scm_init_guile is doing.

> why must it be placed under the // region on Linux (with slower result)

From the manual (section 6.4):

   Each thread that wants to use functions from the Guile API needs to
   put itself into guile mode with either ‘scm_with_guile’ or
   ‘scm_init_guile’. The global state of Guile is initialized
   automatically when the first thread enters guile mode.

OpenMP does multi-threading, so you need to call scm_init_guile or scm_with_guile.

> and anywhere under MacOS ? (speed up code)

You need to do it on non-Linux too -- the extract from the manual does not make an 'except on MacOS' exception. It might work without scm_init_guile in some circumstances, but this is not at all guaranteed.

If you want to know the difference between MacOS and Linux in the implementation of scm_init_guile, you can read the source code of scm_init_guile as mentioned before.

Greetings,
Maxime.
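The "once per thread" proposal can be sketched by splitting the combined parallel-for into an explicit parallel region plus a work-sharing for loop, so that scm_init_guile() runs once per OpenMP thread instead of once per index (an untested sketch; openmp_for and the variable names are illustrative, not the actual code from the thread's repository):

```c
#include <libguile.h>
#include <omp.h>

/* Call the Scheme procedure FUNC on every index in [start, stop]. */
void openmp_for (SCM func, int start, int stop)
{
  #pragma omp parallel
  {
    /* Entered once per OpenMP thread: put this thread into guile mode. */
    scm_init_guile ();

    /* Work-sharing loop: the indices are split across the threads above. */
    #pragma omp for
    for (int i = start; i <= stop; i++)
      scm_call_1 (func, scm_from_int (i));
  }
}
```

scm_init_guile() is documented as idempotent per thread, so the explicit parallel region simply moves the per-thread initialisation out of the hot loop.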
^ permalink raw reply [flat|nested] 8+ messages in thread
* Re: fibers,questions about thread id and mutation of vectors 2023-01-13 12:23 ` Maxime Devos @ 2023-01-17 9:42 ` Damien Mattei 0 siblings, 0 replies; 8+ messages in thread From: Damien Mattei @ 2023-01-17 9:42 UTC (permalink / raw) To: Maxime Devos; +Cc: guile-user, guile-devel

Hello Maxime,

it runs in the fastest way with your idea; as you said, scm_init_guile() is only needed once per thread.

On Fri, Jan 13, 2023 at 1:23 PM Maxime Devos <maximedevos@telenet.be> wrote:
> > for (i=start; i<=stop; i++) { /* i is private by default */
> >
> >     scm_init_guile();
> >     scm_call_1( func , scm_from_int(i) );
>
> IIUC, you are calling scm_init_guile once per index, whereas calling it
>

Yes. OpenMP slices a 1-to-N for loop into N/number_of_cpus segments of a normal C for loop, but runs one loop per CPU. So if you run the 'top' command on a C OpenMP program you will see a load of number_of_cpus*100%; for example, with 12 CPUs, top will display a load of 1200% for your program. Furthermore, if you hit the '1' key you will see in top the load of each CPU (100% each). The same option does not exist in the BSD-like 'top' of Mac OS.

OpenMP partitions N and runs each part on exactly one thread, each thread on a different CPU or core; I think it is the only library that can do that, because OpenMP is written very close to the compiler and LLVM. In general there is a master thread and slave threads, or you can run special code only on the first thread to fork (the master, or the first to launch).

On Friday I unfortunately tried the 'single' pragma:
https://www.openmp.org/spec-html/5.0/openmpsu38.html
but that cannot help, because it runs only on the first thread.

A solution to the problem could be this one:
Executing Code Once Per Thread in an OpenMP Loop
<https://ofekshilon.com/2014/06/10/executing-code-once-per-thread-in-an-openmp-loop/>
but it is Visual C++, and even with g++ it would not be compatible.
So I used a basic C solution, with a static array that remembers whether scm_init_guile() has already been run for the thread the code is currently running on. I also put the result of omp_get_max_threads() in a static variable, as openmp() is called many times in my code and the number of available hardware CPUs never changes.

The code is here:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile-openMP.c

Unfortunately I found no real speedup; I understood that the only reason for the earlier speedup was that the C 'for loop is much faster than the Scheme 'for one. For this conclusion I compared Scheme, C with OpenMP, and C without OpenMP, and in C I got exactly the same time results:

Scheme:
... [output cut]
Chrono START number: 165
minterms-vector-length = 10944.
chrono STOP : elapsedTime = 36.219 ms.
totalComputationTime = 485311.94
Chrono START number: 166
minterms-vector-length = 12008.
chrono STOP : elapsedTime = 39.82 ms.
totalComputationTime = 485351.76
Chrono START number: 167
minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.215 ms.
totalComputationTime = 485352.97500000003

Scheme with OpenMP call:
... [output cut]
Chrono START number: 165
minterms-vector-length = 10944.
chrono STOP : elapsedTime = 35.039 ms.
Open MP totalComputationTime = 385444.1410000001
Chrono START number: 166
minterms-vector-length = 12008.
chrono STOP : elapsedTime = 37.792 ms.
Open MP totalComputationTime = 385481.93300000014
Chrono START number: 167
minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.163 ms.
Open MP totalComputationTime = 385483.09600000014

Scheme with C 'for loop call:
... [output cut]
Chrono START number: 165
minterms-vector-length = 10944.
chrono STOP : elapsedTime = 33.104 ms.
For Funct totalComputationTime = 385543.4700000001
Chrono START number: 166
minterms-vector-length = 12008.
chrono STOP : elapsedTime = 35.938 ms.
For Funct totalComputationTime = 385579.4080000001
Chrono START number: 167
minterms-vector-length = 342.
chrono STOP : elapsedTime = 1.165 ms.
For Funct totalComputationTime = 385580.5730000001

On the C side (OpenMP and sequential for) the result is almost the same:
totalComputationTime = 385580.5730000001 ms
totalComputationTime = 385483.09600000014 ms
= 385 s

I suppose OpenMP slices the work well across many processors, but the scm_call_1( func , scm_from_int(i) ) calls all do their work on the single thread that hosts the Guile interpreter. A solution would be to have many Guile interpreters running, but I do not know how to do that from the C code with OpenMP.

Damien

note: I did the time measurements both in C and in Scheme with gettimeofday-based code, to compare the 100%-Scheme code and the mixed one:
https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L3500

^ permalink raw reply [flat|nested] 8+ messages in thread
end of thread, other threads:[~2023-01-17 9:42 UTC | newest] Thread overview: 8+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2023-01-06 10:20 fibers,questions about thread id and mutation of vectors Damien Mattei 2023-01-06 14:12 ` Maxime Devos 2023-01-06 15:06 ` Damien Mattei 2023-01-06 17:06 ` Damien Mattei 2023-01-06 17:06 ` Maxime Devos 2023-01-13 11:10 ` Damien Mattei 2023-01-13 12:23 ` Maxime Devos 2023-01-17 9:42 ` Damien Mattei
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).