After intense coding I finally got good results.
My assumption about the global hash-table variable was right: it is incompatible with 'future'. The competition for writes into the hash table breaks the code.
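
As an aside, and not the approach taken below: one hypothetical way to keep the writes inside the // region would be to serialize them with a mutex, using make-mutex and with-mutex from (ice-9 threads). A minimal sketch with illustrative names only:

;; hypothetical sketch, not the code from this thread:
;; one mutex guards every write into a shared hash table
(use-modules (ice-9 threads))  ; make-mutex, with-mutex

(define shared-ht (make-hash-table))
(define shared-ht-mutex (make-mutex))

(define (tag-safe! key)
  ;; only one thread at a time may mutate the shared table
  (with-mutex shared-ht-mutex
    (hash-set! shared-ht key #t)))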

If I do the writes into the hash table outside of the // (parallel) region, all is OK.

Here is a simple function that extracts the hash-table access:

;; tag both minterms of pair i in the global hash table,
;; skipping entries whose unification result umt is #f
(define (tag-minterms i umt)
  (when umt
     {mt1 <+ (first {minterms-vector[i]})}
     {mt2 <+ (second {minterms-vector[i]})}
     {minterms-ht[mt1] <- #t}
     {minterms-ht[mt2] <- #t}))



;; proc to be called with futures
(define (proc-unify-minterms-seg seg)
  {function-unify-minterms-list <+ (λ (L) (apply function-unify-two-minterms L))}
  ;;{function-unify-minterms-list <+ (λ (L) (apply function-unify-two-minterms-and-tag L))}
  {start <+ (segment-start seg)}
  {end <+ (segment-end seg)}
  (for ({i <+ start} {i <= end} {i <- {i + 1}})
       {mtL <+ {minterms-vector[i]}}
       (nodebug
          (dv mtL))
       {unified-minterms-vector-1[i] <- (function-unify-minterms-list mtL)}))


(declare minterms-vector unified-minterms-vector-1)

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L2893

and the call to the code that uses the hash table is now completely outside the // region:

 (debug
   (display-nl "before //"))
 
  (run-in-parallel segmts proc-unify-minterms-seg) ;; run the parallel code

  (debug
   (display-nl "after //"))

  (vector-for-each tag-minterms unified-minterms-vector-1) ;; tag the minterms in the hash table

With a partitioned vector it works, because no single cell is accessed by more than one thread.
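
For readers not using the Scheme+ infix syntax, here is a self-contained sketch of that pattern in plain Guile, with toy data and hypothetical names (not the actual program): each future fills only its own disjoint slice of a shared vector, and the shared hash table is written only after every future has been touched.

;; sketch only: disjoint vector writes in the // region,
;; hash-table writes in the sequential region afterwards
(use-modules (ice-9 futures))  ; future, touch

(define results (make-vector 8 #f))
(define tags-ht (make-hash-table))

(define (fill-segment! start end)
  ;; each future writes only into its own index range of `results`
  (do ((i start (+ i 1)))
      ((> i end))
    (vector-set! results i (* i i))))  ; stand-in for the real unification work

;; // region: only disjoint vector cells are written here
(let ((f1 (future (fill-segment! 0 3)))
      (f2 (future (fill-segment! 4 7))))
  (touch f1)
  (touch f2))

;; sequential region: the shared hash table is written only here
(do ((i 0 (+ i 1)))
    ((>= i (vector-length results)))
  (hash-set! tags-ht (vector-ref results i) #t))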

Other things like continuations seem to work with 'future'. I wanted to rewrite the call/cc code used inside 'future', but it was too long, and apparently it works anyway; in any case I do not see why call/cc would cause trouble in a thread...

I think 'futures' are good if you use them carefully, like OpenMP regions in other languages.

The code is very, very fast now; I reached C15 in a few minutes on an Apple M1:
scheme@(guile-user)>  (current-processor-count)
$1 = 8

C15 = ((B1 ∧ B3 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B13) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B11 ∧ B13) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B13) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B1 ∧ B3 ∧ B4 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B13) ∨ (B1 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B1 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B1 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B13) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B11 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B2 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B14) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B12 ∧ B14) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B14) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9 ∧ B11 ∧ B13) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B8 ∧ B10 ∧ B12 ∧ B14) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B14) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B2 ∧ B4 ∧ B5 ∧ B7 ∧ B9 ∧ B11 ∧ B13) ∨ (B2 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B10 ∧ B12 ∧ B14) ∨ (B2 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B2 ∧ B4 ∧ B6 ∧ B7 ∧ B9 ∧ B11 ∧ B13) ∨ (B2 ∧ B4 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B12 ∧ B14) ∨ (B2 ∧ B4 ∧ B6 ∧ B8 ∧ B9 ∧ B11 ∧ B13) ∨ (B2 ∧ B4 ∧ B6 ∧ B8 ∧ B10 ∧ B11 ∧ B13) ∨ (B2 ∧ B4 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B13) ∨ (B2 ∧ B4 ∧ B6 ∧ B8 ∧ B10 ∧ B12 ∧ B14))

and it consumes more than 2 GB of memory:


PID    COMMAND      %CPU  TIME     #TH   #WQ  #PORT MEM    PURG   CMPRS  PGRP  PPID  STATE    BOOSTS          %CPU_ME
15081  guile        442.8 57:24.28 17/8  0    44    2423M  0B     118M   15081 10551 running  *0[1]           0.0000

I checked part of the logical computation with Mathematica online and it was OK, but I will write a Python program to check the logical expression again and compare speed with SymPy...

As for the space complexity, I do not think it can be reduced.

good night all

On Sat, Oct 22, 2022 at 6:01 PM Damien Mattei <damien.mattei@gmail.com> wrote:
Just for fun: I just found that the parallelized results are wrong :-) even for low indices:
the non-// result: C5 = ((B1 ∧ B3) ∨ (B2 ∧ B3) ∨ (B2 ∧ B4))
is different from the parallelized result below:
C5 = ((B1 ∧ B3) ∨ (B2 ∧ B3) ∨ (B2 ∧ B4) ∨ (B3 ∧ B4))

I have to check things again. I think futures are easy to use but limited for solving acute // problems, such as those involving a global variable like a hash table...

Damien



On Mon, Oct 17, 2022 at 3:17 PM Damien Mattei <damien.mattei@gmail.com> wrote:
Hello,

sorry for my late answer (I wrote the code Friday but could only debug it today, being busy and traveling last weekend)

I modified my code to work with 'futures', and the speed is now incredible!!! (and it computes the exact result)
The speedup even seems to be greater than the number of CPUs (6):
cat /proc/cpuinfo  | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5


(declare minterms-vector unified-minterms-vector-1)

(define (funct-unify-minterms-set-1-unit-future set1 set2)

  {function-unify-minterms-list <+ (λ (L) (apply function-unify-two-minterms-and-tag L))}

  ;; note : sorting is useless

  {minterms-set <+ (product-set-with-set-imperative-sorted set1 set2)} ;;(product-set-with-set-imperative set1 set2)} ;;(product-set-with-set set1 set2)} ;;(associate-set-with-set set1 set2)} ;; set multiplication : create list of pair of minterms

  {minterms-vector <- (list->vector minterms-set)}

  {minterms-vector-length <+ (vector-length minterms-vector)}

  {nb-procs <+ (current-processor-count)}
 
  {segmts <+ (segment 0 {minterms-vector-length - 1} nb-procs)} ;; compute the segments

  {unified-minterms-vector-1 <- (make-vector minterms-vector-length #f)}
 
  (run-in-parallel segmts proc-unify-minterms-seg) ;; run the parallel code

  {unified-minterms-set-1 <+ (vector->list unified-minterms-vector-1)}
 
  {unified-minterms-set-2 <+ (filter (λ (x) x) unified-minterms-set-1)} ;; remove #f results

  {unified-minterms-set <+ (remove-duplicates-sorted unified-minterms-set-2)} ;; uniq
     
  unified-minterms-set)

I kept your 'segment' definitions to index an array of data after converting it from a list to a vector (list->vector), which probably consumes additional time on big data lists (I had lists of more than 1,000,000 elements at some point).

I used a simplified version of run-in-parallel (one that does not do a 'reduce' after processing the data):
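
The simplified definition is not reproduced in this message; the following is only a guess at its general shape, assuming Guile's (ice-9 futures) future/touch API (one future per segment, all waited on with touch, no reduce step):

;; hypothetical sketch, not the actual run-in-parallel used here
(use-modules (ice-9 futures))  ; future, touch

(define (run-in-parallel segments proc)
  ;; start one future per segment...
  (let ((futs (map (lambda (seg) (future (proc seg))) segments)))
    ;; ...then block until every future has finished
    (for-each touch futs)))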


The computation on a segment is done by these functions:


{function-unify-minterms-list <+ (λ (L) (apply function-unify-two-minterms-and-tag L))}

;; proc to be called with futures
(define (proc-unify-minterms-seg seg)
  {start <+ (segment-start seg)}
  {end <+ (segment-end seg)}
  (for ({i <+ start} {i <= end} {i <- {i + 1}})
       {mtL <+ {minterms-vector[i]}}
       {unified-minterms-vector-1[i] <- (function-unify-minterms-list mtL)}))


I use this code in another program to symbolically compute a value named Cn:

C0 = T
C1 = T
C2 = T
C3 = (B1 ∨ B2)
C4 = (B2 ∨ (B1 ∧ B3))
C5 = ((B1 ∧ B3) ∨ (B2 ∧ B3) ∨ (B2 ∧ B4) ∨ (B3 ∧ B4))
C6 = ((B1 ∧ B3 ∧ B5) ∨ (B2 ∧ B3 ∧ B5) ∨ (B2 ∧ B4) ∨ (B3 ∧ B4) ∨ (B4 ∧ B5))
C7 = ((B1 ∧ B3 ∧ B5) ∨ (B2 ∧ B3 ∧ B5) ∨ (B2 ∧ B4 ∧ B6) ∨ (B3 ∧ B4 ∧ B6) ∨ (B4 ∧ B5) ∨ (B5 ∧ B6))
C8 = ((B1 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B4 ∧ B6) ∨ (B3 ∧ B4 ∧ B6) ∨ (B4 ∧ B5 ∧ B7) ∨ (B5 ∧ B6) ∨ (B6 ∧ B7))
C9 = ((B1 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B4 ∧ B5 ∧ B7) ∨ (B5 ∧ B6 ∧ B8) ∨ (B6 ∧ B7) ∨ (B7 ∧ B8))

From C1 to C9 it used to be fast, less than a minute for the whole thing, with or without //.

C10 = ((B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B4 ∧ B5 ∧ B7 ∧ B9) ∨ (B5 ∧ B6 ∧ B8) ∨ (B6 ∧ B7 ∧ B9) ∨ (B7 ∧ B8) ∨ (B8 ∧ B9))

C10 takes a few minutes

But C11 used to take one day before parallelizing with 'future', and now I got the result in less than one hour!!!

C11 = ((B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B10 ∧ B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B5 ∧ B6 ∧ B8) ∨ (B10 ∧ B7 ∧ B8) ∨ (B10 ∧ B9) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B4 ∧ B5 ∧ B7 ∧ B9) ∨ (B6 ∧ B7 ∧ B9) ∨ (B8 ∧ B9))

I never got C12 in the past, even with 7 days of computation! And I got it today during the lunch break!!!

C12 = ((B1 ∧ B11 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B10 ∧ B11) ∨ (B10 ∧ B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B5 ∧ B6 ∧ B8) ∨ (B10 ∧ B7 ∧ B8) ∨ (B10 ∧ B9) ∨ (B11 ∧ B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B11 ∧ B4 ∧ B5 ∧ B7 ∧ B9) ∨ (B11 ∧ B6 ∧ B7 ∧ B9) ∨ (B11 ∧ B8 ∧ B9))

(Note that a wise person could find a general formula for Cn based on Cn-1, but that is another (mathematical) story...)

I'm computing C13, but I see that it consumes 12 GB out of the 16 GB of my system! It stopped on the Linux box:
GC Warning: Repeated allocation of very large block (appr. size 510935040):
May lead to memory leak and poor performance
Processus arrêté [process killed]
But it is still running on the Mac laptop... OK, we will see, but I think the data set is now huge and there is nothing that can be done about that...

Note that there is still a hash table used in function-unify-two-minterms-and-tag, and I will use another data structure and algorithm to avoid that. I feared that the concurrent access to the hash table could make the Guile 'future' crash or fail, or slow down the process, but no! The result is incredibly fast. Also, I use continuations, and I read that they can cause problems with 'future'; I will improve that too...

I will see where I can improve the algorithm and data structures to optimize further, but this is already really good.

Thanks

Damien


On Fri, Oct 14, 2022 at 10:39 AM Zelphir Kaltstahl <zelphirkaltstahl@posteo.de> wrote:

Hello Damien,

On 10/14/22 10:21, Damien Mattei wrote:
Yes Zelphir, I think there is a problem in the threading part of Guile: whatever the code is, it runs well the first time, and after that it is unstable. Long ago, at the very beginning of the Java language, on my master's project at university, I experienced the same problem with Java "green" threads, under Windows and/or Linux, I can't remember which.

I'm trying your code and 'future', which perhaps also has the advantage of portability to other Schemes. 'future' can provide "light" //; with careful code it could be OK.

In your segment.scm code, can segment-count be replaced by the number of available CPUs or nodes, if I understand correctly?

Regards,
Damien

Correct. For each segment one future is used.

However, there are scenarios where one would rather want to interleave the numbers of the whole range, to have a "fairer" workload per future, for example:

(1 2 3 4 5 6 7 8 9)

-> (1 4 7), (2 5 8), (3 6 9)

instead of

-> (1 2 3) (4 5 6) (7 8 9)

(I seem to remember this being the case for Mandelbrot set calculations, but I might be wrong.)

But that is all a matter of forming some segments and of what a segment is, not really part of the logic of creating futures. So one could create them in any way that fits one's use case. I have not generalized the segment logic that far, but it is doable.
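
For illustration, a hypothetical helper (not part of segment.scm) that produces such interleaved groups in plain Guile:

;; hypothetical sketch, not from segment.scm:
;; (interleave-range 1 9 3)  =>  ((1 4 7) (2 5 8) (3 6 9))
(use-modules (srfi srfi-1))  ; iota

(define (interleave-range start end parts)
  (map (lambda (k)
         ;; group k holds start+k, start+k+parts, start+k+2*parts, ...
         (let loop ((n (+ start k)) (acc '()))
           (if (> n end)
               (reverse acc)
               (loop (+ n parts) (cons n acc)))))
       (iota parts)))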

Anyway, if anyone more knowledgeable could chime in on the performance differences between futures and parallel map, that would be great! Or point out some flaws that this kind of future usage might have. Or say at what point it makes sense to switch to the guile-fibers library.

Regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl