guile-devel
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: map-par slower than map


From: Damien Mattei
Subject: Re: map-par slower than map
Date: Thu, 13 Oct 2022 10:20:19 +0200

ok 'parallelization code and hash table' in google show no real solution, i have another idea,instead of function-unify-two-minterms-and-tag ,just unify the minterms and tag them (which use hash table) later out of the // region, after if it still sucks i will drop list for vectors, making it step by step will show where is the bottleneck, let me just a few time and i will go back to the mailing lists with interesting results i hope....

On Thu, Oct 13, 2022 at 9:40 AM Damien Mattei <damien.mattei@gmail.com> wrote:
ok , i think the problem comes both from my code and from guile parmap so.
Obviously parmap could be slower on other codes because of the nature of list i think, it is hard to split a list in sublist and send them to thread and after redo a single list, i better use vector.
As mentioned and forget by me, i apologize, i use an hash table which is a global variable and mutex can be set on it , no dead lock , here but it slow down the code than it is dead for speed , but not locked.

The examples given are good but i do not want to write long and specific code for //, for // must a ssimple as OpenMP directives (on arrays) not be a pain in the ass like things i already did at work with GPUs in C or Fortran,that's nightmare and i do not want to have this in functional programming.

ok i will try to modify the code but i do not think there is easy solution to replace the hash table, any structure i would use would be global and the problem (this function is not pure, it does a side-effect),at the end i do not think this code could be // but i'm not a specialist of //. I think this is a parallelization problem, how can we deal with a global variable such as an hash table?

On Wed, Oct 12, 2022 at 11:55 PM Olivier Dion <olivier.dion@polymtl.ca> wrote:
On Wed, 12 Oct 2022, Damien Mattei <damien.mattei@gmail.com> wrote:
> Hello,
> all is in the title, i test on a approximately 30000 element list , i got
> 9s with map and 3min 30s with par-map on exactly the same piece of
> code!?

I can only speculate here.  But trying with a very simple example here:
--8<---------------cut here---------------start------------->8---
(use-modules (statprof))
(statprof (lambda () (par-map 1+ (iota 300000))))
--8<---------------cut here---------------end--------------->8---

Performance are terrible.  I don't know how par-map is implemented, but
if it does 1 element static scheduling -- which it probably does because
you pass a linked list and not a vector -- then yeah you can assure that
thing will be very slow.

You're probably better off with dynamic scheduling with vectors.  Here's
a quick snippet I made for static scheduling but with vectors.  Feel
free to roll your own.

--8<---------------cut here---------------start------------->8---
(use-modules
 (srfi srfi-1)
 (ice-9 threads))

(define* (par-map-vector proc input
                         #:optional
                         (max-thread (current-processor-count)))

  (let* ((block-size (quotient (vector-length input) max-thread))
         (rest (remainder (vector-length input) max-thread))
         (output (make-vector (vector-length input) #f)))
    (when (not (zero? block-size))
      (let ((mtx (make-mutex))
            (cnd (make-condition-variable))
            (n 0))
        (fold
         (lambda (scale output)
           (begin-thread
            (let lp ((i 0))
              (when (< i block-size)
                (let ((i (+ i (* scale block-size))))
                  (vector-set! output i (proc (vector-ref input i))))
                (lp (1+ i))))
            (with-mutex mtx
              (set! n (1+ n))
              (signal-condition-variable cnd)))
           output)
         output
         (iota max-thread))
        (with-mutex mtx
          (while (not (< n max-thread))
            (wait-condition-variable cnd mtx))))
    (let ((base (- (vector-length input) rest)))
      (let lp ((i 0))
        (when (< i rest)
          (let ((i (+ i base)))
            (vector-set! output i (proc (vector-ref input i))))
          (lp (1+ i)))))
    output))
--8<---------------cut here---------------end--------------->8---

--
Olivier Dion
oldiob.dev

reply via email to

[Prev in Thread] Current Thread [Next in Thread]