Re: [Gnucap-devel] [devel-gnucap] Parralelism

From: al davis
Subject: Re: [Gnucap-devel] [devel-gnucap] Parralelism
Date: Wed, 26 Feb 2014 19:34:24 -0500
User-agent: KMail/1.13.7 (Linux/3.2.0-4-amd64; KDE/4.8.4; x86_64; ; )
On Wednesday 26 February 2014, beranger six wrote:
> I started work to parallelize (with OpenMP, and then maybe
> CUDA) the LU decomposition in gnucap.
OpenMP looks like a good way to do it. It looks simple, and
comes with most compilers, including gcc.
Do not use CUDA. Unless I misunderstand, the licensing of CUDA
makes it unsuitable for use in a GNU project.
>
> I see your topic about parallelism, and i have some time to
> do it.
>
> Furthermore, we definitely need faster simulation results for
> our application.
>
> What kind of solution did you have in mind:
Very simple .. Identify certain loops that can run in parallel.
That is really all.
You should look at the output of the "status" command to see
where the time is spent, which will show where parallelism could
be of benefit and how much benefit to expect.
In the LU decomposition, running the outermost loop in parallel
should be all that is needed there. But to get enough benefit,
model evaluation should also be parallel, and that is likely
more important.
> -Is it the "sections" you designed with row, diagonal, and
> column? In this case, did you want to use the fact that once
> all the sections between _lownode[mm] and mm are calculated,
> we can compute the element?
>
> In this case we could have a dependence graph (or tree)
> applied to your storage matrix sections, mostly used to
> parallelize the Gilbert-Peierls algorithm.
I don't think that makes sense here, but you might want to try
it. Remember .. gnucap's matrix solver usually does low rank
updates and partial solutions. If you lose this feature, it
could make the solver so much slower that no amount of parallel
operation would come close to recovering the loss.
The simpler solver used for AC analysis is not parallel ready.
To parallelize the AC matrix solution it may be necessary to
switch to the other lu_decomp, which requires double matrix
storage.
> -Is it an iterative method, with the problem that convergence
> could theoretically take an infinite number of operations?
> (so maybe not a good way)
no -- not iterative -- except for the standard DC_tran Newton
iteration, which would not change.
> -Is it parallelizing only the map (multiplication between
> elements) of the dot product, and then maybe parallelizing
> the reduction (addition between elements)?
I think the overhead of parallelizing the dot product would be
too high, thinking of the multi-thread model. The dot product
might be a candidate for GPU type processing, but look at
"status" to judge whether there is enough potential benefit
before doing this.
> -Did you have in mind applying a permutation matrix to ease
> the implementation of parallelism, or directly building the
> best matrix during evaluation of the netlist?
No .. that would probably make it slower. The speed gain of a
better order would be offset by the overhead of ordering and the
more complex access.
Also .. remember that gnucap does incremental updates and
partial solutions. The order that is optimal for this is
different from the ordering optimal for solving the entire
matrix.
I am aware of a problem with read-in where the recursive
"find_looking_out" can waste a lot of time. Again, "status"
will tell you.
>
> Regards,
>
> Beranger Six
>