Re: [Gnucap-devel] [devel-gnucap] Parralelism
From: Felix Salfelder
Subject: Re: [Gnucap-devel] [devel-gnucap] Parralelism
Date: Thu, 27 Feb 2014 10:57:35 +0100
User-agent: Mutt/1.5.20 (2009-06-14)

On Wed, Feb 26, 2014 at 07:34:24PM -0500, al davis wrote:
> Very simple .. Identify certain loops that can run in parallel.
> That is really all.
>
> You should look at the output of the "status" command to see
> where the time is spent, which will show where parallelism could
> be of benefit and how much benefit to expect.

the status command says a lot of time is spent in "advance". advance
apparently just shifts the history for each device. so why does every
device store this individually? is this tied to a condition that is not
implemented, something like 'dormant subcircuits'?

and "review" takes quite some time. i have started to implement an
audio processor, mostly consisting of behavioural models connecting to
JACK. here the sophisticated time step control is of no use, and a
simplified variant of the tran command runs significantly faster. it
reaches realtime on simple circuits.

> I think the overhead of parallelizing the dot product would be
> too high, thinking of the multi-thread model. The dot product
> might be a candidate for GPU type processing, but look at
> "status" to judge whether there is enough potential benefit
> before doing this.

when i read this article [1], i had the impression that, with some hints
to the compiler, the dot product can be computed faster on recent
hardware. i haven't tried it. maybe alignment hints help in other places
as well?

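To make the idea of compiler hints concrete, here is a minimal sketch of a dot product with alignment and no-aliasing hints. It is not gnucap's code: the function name `dot_aligned`, the macro `ASSUME_ALIGNED_32`, and the 32-byte alignment requirement are assumptions made for illustration. `__builtin_assume_aligned` is a GCC/Clang builtin and `__restrict` is a common compiler extension, so neither is standard C++. The four independent partial sums are there because, without options such as `-ffast-math`, a compiler is usually not allowed to reorder a single floating-point accumulation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: tell GCC/Clang that a pointer is 32-byte aligned.
// On other compilers the hint is simply dropped.
#if defined(__GNUC__) || defined(__clang__)
#  define ASSUME_ALIGNED_32(p) \
     static_cast<const double*>(__builtin_assume_aligned((p), 32))
#else
#  define ASSUME_ALIGNED_32(p) (p)
#endif

// Dot product of two arrays of length n.
// Assumption (not from the original mail): both arrays start on a
// 32-byte boundary and do not overlap.
double dot_aligned(const double* __restrict a_in,
                   const double* __restrict b_in,
                   std::size_t n)
{
  // Check the alignment promise in debug builds; breaking it would be
  // undefined behaviour once the hint is applied.
  assert(reinterpret_cast<std::uintptr_t>(a_in) % 32 == 0);
  assert(reinterpret_cast<std::uintptr_t>(b_in) % 32 == 0);

  const double* a = ASSUME_ALIGNED_32(a_in);
  const double* b = ASSUME_ALIGNED_32(b_in);

  // Four independent accumulators: these map naturally onto one SIMD
  // register of four doubles and avoid a serial dependency chain.
  double s0 = 0.0, s1 = 0.0, s2 = 0.0, s3 = 0.0;
  std::size_t i = 0;
  for (; i + 4 <= n; i += 4) {
    s0 += a[i]     * b[i];
    s1 += a[i + 1] * b[i + 1];
    s2 += a[i + 2] * b[i + 2];
    s3 += a[i + 3] * b[i + 3];
  }

  double sum = (s0 + s1) + (s2 + s3);

  // Remainder loop for lengths that are not a multiple of four.
  for (; i < n; ++i) {
    sum += a[i] * b[i];
  }
  return sum;
}
```

Callers would have to allocate their vectors with matching alignment, for example with `alignas(32)` on fixed-size arrays. Note that the summation order differs from a plain loop, so results can differ in the last bits. Whether any of this pays off is exactly what the "status" output should decide.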
> No .. that would probably make it slower. The speed gain of a
> better order would be offset by the overhead of ordering and the
> more complex access.

there's already a permutation matrix involved in matrix allocation,
_sim->_nm (u_sim_data). what i do not understand is why the permutation
is computed before iwant_matrix is called on the circuit. i have changed
this (in gnucap-uf [2]). the current use case is weeding out unconnected
nodes, but it is also possible to compute a permutation from the
adjacency matrix (which is constructed in BSMATRIX), from the netlist
hierarchy, or from anything else. i have implemented some simple examples.

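As an illustration of the "weeding out unconnected nodes" use case, the following is a minimal sketch of how a node permutation could be derived from adjacency information. It does not reproduce gnucap or gnucap-uf code: the names `NodeOrder` and `weed_unconnected` and the adjacency-list representation are hypothetical, and special nodes such as ground are ignored here.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical result type: a permutation of the nodes plus the number
// of nodes that actually need a row/column in the matrix.
struct NodeOrder {
  std::vector<int> position;  // position[node] = index after reordering
  int active;                 // nodes 0..active-1 (reordered) are connected
};

// adjacency[i] lists the nodes that share a matrix entry with node i.
// Connected nodes keep their relative order and are numbered first;
// unconnected nodes are pushed to the end, so a matrix of size `active`
// is enough.
NodeOrder weed_unconnected(const std::vector<std::vector<int>>& adjacency)
{
  const std::size_t n = adjacency.size();
  NodeOrder result{std::vector<int>(n, -1), 0};

  // First pass: number every node that has at least one neighbour.
  for (std::size_t i = 0; i < n; ++i) {
    if (!adjacency[i].empty()) {
      result.position[i] = result.active++;
    }
  }

  // Second pass: park the unconnected nodes behind the active ones.
  int tail = result.active;
  for (std::size_t i = 0; i < n; ++i) {
    if (adjacency[i].empty()) {
      result.position[i] = tail++;
    }
  }

  assert(tail == static_cast<int>(n));  // every node got exactly one slot
  return result;
}
```

A sketch like this needs the adjacency information to exist before the permutation is built, which is one reason the relative order of the permutation step and the matrix-request step matters.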
> Also .. remember that gnucap does incremental updates and
> partial solutions. The order that is optimal for this is
> different from the ordering optimal for solving the entire
> matrix.

i expect that global bandwidth optimization (amd or rcm) will break the
incremental and partial stuff (i haven't gotten around to trying). a
local optimizer, tied to the subcircuit hierarchy, might still improve
the order, particularly if the netlist has not been written very carefully.

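
One way such a local optimizer could look is sketched below: a reverse Cuthill-McKee pass restricted to the nodes of a single block (say, the internal nodes of one subcircuit). Nodes outside the block keep their positions, and the block's nodes only trade positions among themselves. This is an assumption-laden illustration, not code from gnucap or gnucap-uf; `reorder_block`, the adjacency-list input, and the `perm` layout are all hypothetical, and the effect on incremental updates and partial solutions is untested.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <queue>
#include <vector>

// adj[v]  : neighbours of node v (whole circuit)
// block   : nodes belonging to one hypothetical subcircuit, no duplicates
// perm[v] : matrix position of node v; only entries for block nodes change
void reorder_block(const std::vector<std::vector<int>>& adj,
                   const std::vector<int>& block,
                   std::vector<int>& perm)
{
  assert(perm.size() == adj.size());
  const std::size_t n = adj.size();

  std::vector<char> in_block(n, 0);
  for (int v : block) in_block[v] = 1;

  // Degree counted inside the block only; outside connections are ignored.
  std::vector<int> degree(n, 0);
  for (int v : block)
    for (int w : adj[v])
      if (in_block[w]) ++degree[v];

  auto by_degree = [&](int x, int y) {
    return degree[x] != degree[y] ? degree[x] < degree[y] : x < y;
  };

  // Try start nodes in order of increasing degree (covers blocks that
  // fall apart into several components).
  std::vector<int> seeds(block);
  std::sort(seeds.begin(), seeds.end(), by_degree);

  std::vector<char> seen(n, 0);
  std::vector<int> order;  // Cuthill-McKee visiting order
  for (int s : seeds) {
    if (seen[s]) continue;
    std::queue<int> q;
    q.push(s);
    seen[s] = 1;
    while (!q.empty()) {
      int v = q.front();
      q.pop();
      order.push_back(v);
      std::vector<int> next;
      for (int w : adj[v])
        if (in_block[w] && !seen[w]) { seen[w] = 1; next.push_back(w); }
      std::sort(next.begin(), next.end(), by_degree);
      for (int w : next) q.push(w);
    }
  }
  std::reverse(order.begin(), order.end());  // the "reverse" in RCM

  // Hand the block's existing positions back out in the new order.
  std::vector<int> slots;
  for (int v : block) slots.push_back(perm[v]);
  std::sort(slots.begin(), slots.end());
  for (std::size_t k = 0; k < order.size(); ++k) perm[order[k]] = slots[k];
}
```

For a block whose internal nodes form the chain 1-3-2-4 but are numbered 1, 2, 3, 4, this brings neighbouring nodes into adjacent positions while leaving every node outside the block where it was.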
regards
felix
[1] http://locklessinc.com/articles/vectorize
[2] http://tool.em.cs.uni-frankfurt.de/~felix/gnucap-uf