Re: map-par slower than map


From: Damien Mattei
Subject: Re: map-par slower than map
Date: Mon, 17 Oct 2022 15:17:04 +0200

Hello,

sorry for my late answer (I wrote the code Friday but could only debug it today, being busy and travelling last weekend)

I modified my code to work with 'futures', and the speed is now incredible (and the computation is still exact).
The speed seems to be multiplied by even more than the number of CPUs (6):
cat /proc/cpuinfo  | grep processor
processor : 0
processor : 1
processor : 2
processor : 3
processor : 4
processor : 5

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L1900

(declare minterms-vector unified-minterms-vector-1)

(define (funct-unify-minterms-set-1-unit-future set1 set2)

  {function-unify-minterms-list <+ (λ (L) (apply function-unify-two-minterms-and-tag L))}

  ;; note : sorting is useless

  {minterms-set <+ (product-set-with-set-imperative-sorted set1 set2)} ;; set multiplication : create a list of minterm pairs
  ;; alternatives tried: (product-set-with-set-imperative set1 set2)
  ;;                     (product-set-with-set set1 set2)
  ;;                     (associate-set-with-set set1 set2)

  {minterms-vector <- (list->vector minterms-set)}

  {minterms-vector-length <+ (vector-length minterms-vector)}

  {nb-procs <+ (current-processor-count)}
 
  {segmts <+ (segment 0 {minterms-vector-length - 1} nb-procs)} ;; compute the segments

  {unified-minterms-vector-1 <- (make-vector minterms-vector-length #f)}
 
  (run-in-parallel segmts proc-unify-minterms-seg) ;; run the parallel code

  {unified-minterms-set-1 <+ (vector->list unified-minterms-vector-1)}
 
  {unified-minterms-set-2 <+ (filter (λ (x) x) unified-minterms-set-1)} ;; remove #f results

  {unified-minterms-set <+ (remove-duplicates-sorted unified-minterms-set-2)} ;; uniq
     
  unified-minterms-set)

I kept your 'segment' definitions to index an array of data after converting it from a list to a vector (list->vector), which probably consumes additional time on big data lists (I had lists of more than 1000000 elements at some point).
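
For reference, a rough sketch of what such a contiguous index segmentation can look like (this is not the segment.scm code, just a hypothetical stand-in where a segment is a (start . end) pair and segment-start/segment-end are plain accessors):

;; split the index range [from, to] into n roughly equal contiguous segments,
;; represented here as (start . end) pairs
(define (segment-sketch from to n)
  (let* ((len (+ 1 (- to from)))
         (base (quotient len n))
         (rest (remainder len n)))
    (let loop ((i 0) (start from) (acc '()))
      (if (= i n)
          (reverse acc)
          (let* ((size (+ base (if (< i rest) 1 0)))  ;; spread the remainder over the first segments
                 (end (+ start size -1)))
            (loop (+ i 1) (+ end 1) (cons (cons start end) acc)))))))

(define (segment-start seg) (car seg))
(define (segment-end seg) (cdr seg))

;; example: (segment-sketch 0 9 3) => ((0 . 3) (4 . 6) (7 . 9))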

I used a simplified version of run-in-parallel (one that does not 'reduce' the data after processing):

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L1794
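
For readers who do not follow the link, here is a minimal sketch of the idea, assuming Guile's (ice-9 futures) module (this is not the exact code at the URL above, and the name run-in-parallel-sketch is made up):

(use-modules (ice-9 futures))

;; launch one future per segment, then wait for all of them;
;; proc is called on each segment purely for its side effects
;; on the shared result vector
(define (run-in-parallel-sketch segments proc)
  (let ((futs (map (lambda (seg) (future (proc seg))) segments)))
    (for-each touch futs)))  ;; touch blocks until the future has completed

Because every future is touched, the procedure returns only after all segments have been processed, so unified-minterms-vector-1 is fully filled when the caller reads it back.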

the computation on a segment is done by these procedures:

https://github.com/damien-mattei/library-FunctProg/blob/master/guile/logiki%2B.scm#L1842

{function-unify-minterms-list <+ (λ (L) (apply function-unify-two-minterms-and-tag L))}

;; procedure run by each future: unify every minterm pair in its segment of the vector
(define (proc-unify-minterms-seg seg)
  {start <+ (segment-start seg)}
  {end <+ (segment-end seg)}
  (for ({i <+ start} {i <= end} {i <- {i + 1}})
       {mtL <+ {minterms-vector[i]}}
       {unified-minterms-vector-1[i] <- (function-unify-minterms-list mtL)}))


I use this code in another program to symbolically compute a value named Cn:

C0 = T
C1 = T
C2 = T
C3 = (B1 ∨ B2)
C4 = (B2 ∨ (B1 ∧ B3))
C5 = ((B1 ∧ B3) ∨ (B2 ∧ B3) ∨ (B2 ∧ B4) ∨ (B3 ∧ B4))
C6 = ((B1 ∧ B3 ∧ B5) ∨ (B2 ∧ B3 ∧ B5) ∨ (B2 ∧ B4) ∨ (B3 ∧ B4) ∨ (B4 ∧ B5))
C7 = ((B1 ∧ B3 ∧ B5) ∨ (B2 ∧ B3 ∧ B5) ∨ (B2 ∧ B4 ∧ B6) ∨ (B3 ∧ B4 ∧ B6) ∨ (B4 ∧ B5) ∨ (B5 ∧ B6))
C8 = ((B1 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B4 ∧ B6) ∨ (B3 ∧ B4 ∧ B6) ∨ (B4 ∧ B5 ∧ B7) ∨ (B5 ∧ B6) ∨ (B6 ∧ B7))
C9 = ((B1 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B3 ∧ B5 ∧ B7) ∨ (B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B4 ∧ B5 ∧ B7) ∨ (B5 ∧ B6 ∧ B8) ∨ (B6 ∧ B7) ∨ (B7 ∧ B8))

C1 through C9 have always been fast: less than a minute for the whole set, with or without // (parallelism).

C10 = ((B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B4 ∧ B5 ∧ B7 ∧ B9) ∨ (B5 ∧ B6 ∧ B8) ∨ (B6 ∧ B7 ∧ B9) ∨ (B7 ∧ B8) ∨ (B8 ∧ B9))

C10 takes a few minutes

but C11 used to take one day before parallelizing with 'future', and I now get the result in less than one hour!

C11 = ((B1 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B10 ∧ B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B5 ∧ B6 ∧ B8) ∨ (B10 ∧ B7 ∧ B8) ∨ (B10 ∧ B9) ∨ (B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B4 ∧ B5 ∧ B7 ∧ B9) ∨ (B6 ∧ B7 ∧ B9) ∨ (B8 ∧ B9))

I never got C12 in the past, even after 7 days of computation, and I got it today during the lunch break!

C12 = ((B1 ∧ B11 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B10 ∧ B11) ∨ (B10 ∧ B2 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B3 ∧ B4 ∧ B6 ∧ B8) ∨ (B10 ∧ B5 ∧ B6 ∧ B8) ∨ (B10 ∧ B7 ∧ B8) ∨ (B10 ∧ B9) ∨ (B11 ∧ B2 ∧ B3 ∧ B5 ∧ B7 ∧ B9) ∨ (B11 ∧ B4 ∧ B5 ∧ B7 ∧ B9) ∨ (B11 ∧ B6 ∧ B7 ∧ B9) ∨ (B11 ∧ B8 ∧ B9))

(note that a wise person could find a general formula for Cn based on Cn-1, but that is another (mathematical) story...)

I'm computing C13, but I see that it consumes 12 GB out of the 16 GB of my system, and the process was killed on the Linux box:

GC Warning: Repeated allocation of very large block (appr. size 510935040):
May lead to memory leak and poor performance
Processus arrêté (process killed)

but it is still running on the Mac laptop... OK, I will look into that, but I think the data set is now huge and there is nothing that can be done about it...

Note that there is still a hash table used in function-unify-two-minterms-and-tag, and I will use another data structure and algorithm to avoid that. I feared that concurrent access to the hash table could make the Guile 'futures' crash, fail or slow down the process, but no: the result is incredibly fast. I also use continuations, and I have read that they can cause problems with 'future', so I will improve that too...
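
If the shared hash table ever does become a problem, one option is to serialize access to it with a mutex from (ice-9 threads). A minimal sketch, with made-up names (tag-table, tag-table-mutex) that are not from the actual program:

(use-modules (ice-9 threads))

(define tag-table (make-hash-table))   ;; hypothetical shared table
(define tag-table-mutex (make-mutex))  ;; guards every access to it

(define (tag-set! key value)
  ;; with-mutex locks tag-table-mutex around the body and unlocks it afterwards
  (with-mutex tag-table-mutex
    (hash-set! tag-table key value)))

(define (tag-ref key)
  (with-mutex tag-table-mutex
    (hash-ref tag-table key #f)))

Of course, taking a lock on every access has a cost, so it would need measuring to know whether it is worth it.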

I will see where I can improve the algorithm and the data structures to optimize further, but this is already really good.

Thanks

Damien


On Fri, Oct 14, 2022 at 10:39 AM Zelphir Kaltstahl <zelphirkaltstahl@posteo.de> wrote:

Hello Damien,

On 10/14/22 10:21, Damien Mattei wrote:
yes Zelphir, I think there is a problem in the threading part of Guile: whatever the code is, it runs well the first time and is unstable afterwards. Long ago, at the very beginning of the Java language, I experienced the same problem on my master's project at university with Java "green" threads, under Windows and/or Linux, I can't remember which.

I'm trying your code and futures, which perhaps also have the advantage of portability to other Schemes; futures can provide "light" parallelism, and with careful code it could be OK.

in your segment.scm code, can segment-count be replaced by the number of available CPUs or nodes, if I understand correctly?

Regards,
Damien

Correct. For each segment one future is used.

However, there are scenarios where one would rather interleave the numbers of the whole range, to get a "fairer" workload per future, for example:

(1 2 3 4 5 6 7 8 9)

-> (1 4 7), (2 5 8), (3 6 9)

instead of

-> (1 2 3) (4 5 6) (7 8 9)

(I seem to remember this being the case for Mandelbrot set calculations, but I might be wrong.)

But that is all a matter of how segments are formed and what a segment is, not really part of the logic of creating futures. So one could create them in any way that fits one's use case. I have not generalized that segment logic that far, but it is doable.
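
Just to illustrate the idea, a rough sketch of such a round-robin split (not code from segment.scm):

;; (interleave '(1 2 3 4 5 6 7 8 9) 3) => ((1 4 7) (2 5 8) (3 6 9))
(define (interleave lst n)
  (let ((buckets (make-vector n '())))
    (let loop ((rest lst) (i 0))
      (if (null? rest)
          (map reverse (vector->list buckets))
          (begin
            (vector-set! buckets (modulo i n)
                         (cons (car rest) (vector-ref buckets (modulo i n))))
            (loop (cdr rest) (+ i 1)))))))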

Anyway, if anyone more knowledgeable could chime in on the performance differences between futures and parallel map, that would be great! Or point out flaws that this kind of future usage might have, or say at what point it becomes worth switching to the guile-fibers library.

Regards,
Zelphir

-- 
repositories: https://notabug.org/ZelphirKaltstahl
