
Re: [Gnucap-devel] [parallelism]


From: Felix Salfelder
Subject: Re: [Gnucap-devel] [parallelism]
Date: Tue, 4 Mar 2014 22:59:39 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Tue, Mar 04, 2014 at 06:52:41PM +0100, beranger six wrote:
> >I think that we cannot parallelize the outermost loop, because we
> >noticed that most elements depend on previous computations (a block
> >depends on all blocks between its lownode (bn) and its row_value
> >(mm)). To work around this we thought about using a dependence
> >tree. This tree would be created when we get the system and updated
> >each time the lownodes change, so that maintaining it is not time
> >consuming. Please refer to the PDF attached to this mail, which
> >explains the idea with some figures. Please keep in mind that this
> >document is a draft for now.

there may always be elements depending on previous computations. the
dependencies are implicit in the system matrix, and imo there is no
need for an extra tree to keep track of them.

> answer_gnucap_digest_88-1.pdf

the example matrix is this (monospace font required).

1  *
 2 *
  3*
***4*
   *5
     6*   *
     *7*  *
      *8  *
        9**
        *a*
     *****b

my understanding of parallelizing the outermost loop yields something
very close to the presented evaluation scheme.

take 6 threads, T1 .. T6, and choose a contiguous node partition.
T1 takes care of mm=1 .. 1
T2 takes care of mm=2 .. 2
T3 takes care of mm=3 .. 3
T4 takes care of mm=4 .. 5
T5 takes care of mm=6 .. 8
T6 takes care of mm=9 .. b
let's run these threads in parallel, each basically executing its
part of the lu_decomp loop.

clearly, T4 needs to wait for T1..T3 to finish, and likewise T6 must
wait for T5. this is obvious from the lownode values: T4 must wait
for T1..T3 before entering node 4, since T1..T3 are responsible for
the nodes >= lownode[4]=1. similarly, T6 needs to wait for T5 before
entering b.

it might be tricky to find a good partition, and your tree view might
give the right ideas. to me it does not make sense to schedule
everything in advance. for example, you may want to start with just 4
threads, and let the first of T1..T4 that finishes become T5. you
cannot know in advance which one that will be, as it depends on which
nodes have changed (a.is_changed(mm)).

if there is a matrix where no such partition induces a useful
parallelization scheme, i'd be interested to know!

hth
felix


