
[Gnucap-devel] floating point optimization


From: al davis
Subject: [Gnucap-devel] floating point optimization
Date: Thu, 7 Dec 2006 04:44:53 -0500
User-agent: KMail/1.9.5

This was asked on the ng-spice developer list, in a thread about 
ng-spice and gnucap working together.  It is interesting, so I 
am reposting my reply here.

On Wednesday 06 December 2006 22:07, John Doe wrote:
> But for the same compiler on the same machine, the results
> are much closer across different optimization levels with 64
> bit rounding.  When I run the same regressions on windows,
> freebsd, and linux, the results are off in a much less
> significant decimal place.  I want my regression to show
> significant differences due to algorithmic changes, and not
> just the fact that I chose -O2 versus -O3.

I ran some tests on gnucap ...

Two computers 
1: Intel,    1.8 GHz, Debian testing,  1 GB memory
2: AMD64 X2, 2.4 GHz, Debian unstable, 2 GB memory

Three compiler option settings, all with "-O2":
1. as is
2. "-ffast-math"
3. "-ffloat-store"

Configuration #1 "std" took all defaults except "-O2"

Configuration #2 was the same except for the "-ffast-math" 
option, which turns on all available floating point 
optimizations, including those considered dangerous.

Configuration #3, same as std except for the "-ffloat-store" 
option.  This option forces floating point variables to be 
stored in memory rather than kept in registers, therefore 
rounding them to 64 bits.


Two circuit files, one with 147000 nodes, the other with 590000 
nodes.  The larger circuit swapped unacceptably on the small 
machine, so I tested only the smaller circuit there.  These were 
used to compare speed.

                 AMD large   AMD small   Intel small
std                 39 sec     9.5 sec      11.2 sec
-ffast-math         39 sec     9.5 sec      11.2 sec
-ffloat-store       50 sec      12 sec        13 sec

The "small" circuit takes 30 minutes to run on ng-spice, on the 
AMD, with equivalent results.  Note that the time is 9.5 
SECONDS on gnucap, 30 MINUTES in ng-spice.  The algorithms are 
different.

I also ran the complete gnucap test suite, 345 test files.

The test suite showed:

AMD-64---
no difference between AMD "std" and "-ffloat-store".

13 test differences between AMD "std" and "-ffast-math".
One difference was that an overflow was not properly trapped 
with -ffast-math.


Intel ---
Intel with -ffloat-store had 4 trivial test differences compared 
to AMD std.

Intel standard had 48 test differences compared to AMD std, one 
of them significant.  The significantly different test still 
gave correct answers with trivial differences, but had 
different time steps.

Intel with -ffast-math had 43 test differences compared to AMD 
std, one of them significant.  It had the same time stepping as 
the standard version.  One test had an overflow that was not 
properly trapped.

My conclusion about speed:  The AMD-64 and Intel processor speed 
difference corresponds to clock speed.

The AMD gives more consistent results, apparently because the 
math really is 64 bit, all the time.  "-ffast-math" causes 
problems and does not improve speed.  "-ffloat-store" results 
in a significant speed penalty (28% on the big circuit) with no 
change in results.  The standard setting is therefore the best 
choice.

The Intel has more differences.  With the "-ffloat-store" 
option, only 4 tests had any difference compared to the AMD, 
and these were trivial.  I think this confirms that it was 
doing essentially the same 64 bit rounding.  The standard 
setting resulted in 48 tests with different results, all 
trivial but one.  I am assuming this is because of the excess 
precision you mention.  The "-ffast-math" option gave 43 
differences compared to the reference.  I do not consider the 
gap between these 43 and the 48 with no options to be 
significant.  There were 25 trivial differences comparing Intel 
with -ffast-math to Intel with no options.  One was the numeric 
overflow case.

As to which option is best, I am not sure.  The "-ffast-math" 
option causes problems and does not improve speed, so it should 
not be used.  Whether the "-ffloat-store" option should be used 
could be debated.  It doesn't give improved accuracy, but it 
does give a more predictable error, essentially matching 
another 64 bit system.  The option does give a speed penalty, 
16% in my test on the Intel.

The particular test that resulted in different time stepping 
gives believable but incorrect results in ng-spice, with no 
warnings.  It is a negative resistance oscillator using the 
switch element as the negative resistance device.  The on 
resistance is 1 ohm; the off resistance is 1e9 ohms.  Gnucap 
handles the fast switching correctly, automatically.  Spice hops past,
giving a glitch that is really trapezoidal ringing, making it 
appear to work.

One important point here is that differences in algorithms have 
much more effect than differences in compiler optimization.

> When I do AMD-64 in 64bit mode, it is going to prefer the
> 64-bit SSE instructions over the 80-bit 387 instructions.
>  Now I am going to get closer results to a machine with a
> sparc chip than when I compiled the program on the same
> machine in 32-bit mode.
>
> If my result is rounded to 64-bit in the floating point
> register, less damage is done when that number is written
> back to memory and read back in. I am happier with that than
> having an 80-bit number written from register to memory, read
> back in and zero extended.

I think I just confirmed what you said.  The results were as I 
expected.

> An excellent paper on this issue is:
> http://www.wrcad.com/linux_numerics.txt

I read this paper a long time ago.

> When an EDA customer gets a new update to their tools,
> they're going to validate and they want an explanation why
> the results no longer match their golden files.  EDA
> companies are keenly aware of this, and often provide
> extended precision, but only as a non-default option.

==================

Comments?
Should the Intel Linux version compile by default 
with "-ffloat-store"?

How does NetBSD handle this?