Re: [Gnucap-devel] [devel-gnucap] Parralelism

gnucap-devel

From:	Felix Salfelder
Subject:	Re: [Gnucap-devel] [devel-gnucap] Parralelism
Date:	Thu, 27 Feb 2014 10:57:35 +0100
User-agent:	Mutt/1.5.20 (2009-06-14)

On Wed, Feb 26, 2014 at 07:34:24PM -0500, al davis wrote:
> Very simple ..  Identify certain loops that can run in parallel.  
> That is really all.
> 
> You should look at the output of the "status" command to see 
> where the time is spent, which will show where parallelism could 
> be of benefit and how much benefit to expect.

the status command says that a lot of time is spent in "advance". advance
apparently just shifts the history for each device. so why does every
device store this individually? is this tied to a condition that is not
implemented yet? something like 'dormant subcircuits'?

and... "review" takes quite some time. i have started to implement an
audio processor, mostly consisting of behavioural models connecting to
jack. here, the sophisticated time step control is of no use, and a
simplified variant of the tran command runs significantly faster. it
reaches realtime on simple circuits.

> I think the overhead of parallelizing the dot product would be 
> too high, thinking of the multi-thread model.  The dot product 
> might be a candidate for GPU type processing, but look at 
> "status" to judge whether there is enough potential benefit 
> before doing this.

when i read this article [1] i had the impression that, with some hints
to the compiler, the dot product can be computed faster on recent
hardware. i haven't tried. maybe alignment hints help in other places as
well?
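
for illustration only, a minimal sketch of what such compiler hints could look like. this is not gnucap code: dot_aligned is a hypothetical helper, __restrict and __builtin_assume_aligned are gcc/clang extensions, and the 32-byte alignment is an assumption the caller has to guarantee (the behaviour is undefined otherwise).

```cpp
#include <cstddef>

// hypothetical helper, not part of gnucap.
// __restrict promises the compiler that a and b do not alias.
// __builtin_assume_aligned promises 32-byte alignment (assumed here,
// e.g. for AVX loads); passing unaligned data is undefined behaviour.
double dot_aligned(const double* __restrict a,
                   const double* __restrict b,
                   std::size_t n)
{
  const double* pa =
    static_cast<const double*>(__builtin_assume_aligned(a, 32));
  const double* pb =
    static_cast<const double*>(__builtin_assume_aligned(b, 32));

  double sum = 0.;
  for (std::size_t i = 0; i < n; ++i) {
    sum += pa[i] * pb[i]; // plain reduction loop, left to the vectorizer
  }
  return sum;
}
```

the hints alone are usually not enough: vectorizing the sum reorders floating point additions, so gcc typically only does it with something like -O3 -ffast-math (or -fassociative-math), and the result may then differ in the last bits. whether any of this pays off would have to be measured.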

> No .. that would probably make it slower.  The speed gain of a 
> better order would be offset by the overhead of ordering and the 
> more complex access.

there's already a permutation matrix involved in matrix allocation,
_sim->_nm (u_sim_data). what i do not understand is, why the permutation
is computed before iwant_matrix is called on the circuit. i have changed
this (in gnucap-uf [2]). the current use case is weeding out unconnected
nodes. but it is also possible to compute a permutation from the
adjacency matrix (which is constructed in BSMATRIX), or from the netlist
hierarchy, or from anything else. i have implemented some simple examples.
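
to make the "weeding out" idea concrete, here is a minimal sketch under stated assumptions. it is not the gnucap-uf implementation: connected_first is a hypothetical name, the adjacency is given as a plain neighbour list rather than taken from BSMATRIX, and the result is a simple index vector that does not claim to match the layout of _sim->_nm.

```cpp
#include <cstddef>
#include <vector>

// hypothetical helper, not gnucap code.
// adj[i] lists the neighbours of node i (assumed input format).
// returns an ordering: order[new_position] == old_node_index.
// connected nodes keep their relative order and come first,
// unconnected nodes are pushed to the end, where they could be dropped.
std::vector<int> connected_first(const std::vector<std::vector<int>>& adj)
{
  std::vector<int> order;
  order.reserve(adj.size());

  for (std::size_t i = 0; i < adj.size(); ++i) {
    if (!adj[i].empty()) {
      order.push_back(static_cast<int>(i));
    }
  }
  for (std::size_t i = 0; i < adj.size(); ++i) {
    if (adj[i].empty()) {
      order.push_back(static_cast<int>(i));
    }
  }
  return order;
}
```

for a five node example where node 2 has no neighbours, the ordering comes out as 0, 1, 3, 4, 2, so the matrix only needs to cover the first four positions.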

> Also ..  remember that gnucap does incremental updates and 
> partial solutions.  The order that is optimal for this is 
> different from the ordering optimal for solving the entire 
> matrix.

i expect that global bandwidth optimization (amd or rcm) will break the
incremental and partial stuff (i haven't got around to trying). a local optimizer,
tied to the subcircuit hierarchy, might still improve the order,
particularly if the netlist has not been written very carefully.
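
as a reference point for the rcm idea, a minimal sketch of reverse cuthill-mckee on a neighbour list, plus a helper that measures the resulting bandwidth. both function names and the input format are assumptions made for illustration, and nothing here is gnucap code. a local variant, as suggested above, would presumably run something like this per subcircuit instead of on the whole graph.

```cpp
#include <algorithm>
#include <cstddef>
#include <queue>
#include <vector>

// hypothetical helper: reverse cuthill-mckee ordering.
// adj[i] lists the neighbours of node i (undirected graph assumed).
// returns order[new_position] == old_node_index.
std::vector<int> rcm_order(const std::vector<std::vector<int>>& adj)
{
  const std::size_t n = adj.size();
  auto degree_less = [&adj](int a, int b) {
    return adj[a].size() < adj[b].size();
  };

  // try start nodes in order of increasing degree
  std::vector<int> starts(n);
  for (std::size_t i = 0; i < n; ++i) {
    starts[i] = static_cast<int>(i);
  }
  std::stable_sort(starts.begin(), starts.end(), degree_less);

  std::vector<bool> seen(n, false);
  std::vector<int> order;
  order.reserve(n);
  for (int s : starts) {
    if (seen[s]) {
      continue; // already reached from an earlier component
    }
    std::queue<int> q;
    q.push(s);
    seen[s] = true;
    while (!q.empty()) {
      const int v = q.front();
      q.pop();
      order.push_back(v);
      // enqueue unseen neighbours, lowest degree first
      std::vector<int> next;
      for (int w : adj[v]) {
        if (!seen[w]) {
          seen[w] = true;
          next.push_back(w);
        }
      }
      std::stable_sort(next.begin(), next.end(), degree_less);
      for (int w : next) {
        q.push(w);
      }
    }
  }
  std::reverse(order.begin(), order.end()); // the "reverse" in rcm
  return order;
}

// hypothetical helper: largest |pos(v) - pos(w)| over all edges.
std::size_t bandwidth(const std::vector<std::vector<int>>& adj,
                      const std::vector<int>& order)
{
  std::vector<std::size_t> pos(order.size());
  for (std::size_t i = 0; i < order.size(); ++i) {
    pos[order[i]] = i;
  }
  std::size_t bw = 0;
  for (std::size_t v = 0; v < adj.size(); ++v) {
    for (int w : adj[v]) {
      const std::size_t d =
        pos[v] > pos[w] ? pos[v] - pos[w] : pos[w] - pos[v];
      bw = std::max(bw, d);
    }
  }
  return bw;
}
```

on a chain of five nodes numbered carelessly (0-3, 3-1, 1-4, 4-2) the bandwidth drops from 3 to 1. that is the kind of gain one could hope for on a sloppily written netlist, but it says nothing about how the reordering interacts with incremental updates.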

regards
felix

[1] http://locklessinc.com/articles/vectorize
[2] http://tool.em.cs.uni-frankfurt.de/~felix/gnucap-uf

Current Thread

[Gnucap-devel] [devel-gnucap] Parralelism, beranger six, 2014/02/26
- Re: [Gnucap-devel] [devel-gnucap] Parralelism, al davis, 2014/02/26
  - Re: [Gnucap-devel] [devel-gnucap] Parralelism, Felix Salfelder <=
