help-octave
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: Same .m file: different results with different versions of Octave


From: Jaroslav Hajek
Subject: Re: Same .m file: different results with different versions of Octave
Date: Tue, 20 Apr 2010 09:55:20 +0200

On Tue, Apr 20, 2010 at 9:18 AM, Judd Storrs <address@hidden> wrote:
> On Tue, Apr 20, 2010 at 1:55 AM, Jaroslav Hajek <address@hidden> wrote:
>> I think we're getting closer. I started from there and made a few more
>> changes that I hope should not disrupt accuracy. I removed __pow calls
>> for squaring in favor of simple multiplication, and decomposed the
>> sin(2*imag(x)) part into sin(imag(x)) * cos(imag(x)), precomputing all
>> common subexpression. Based on this I reduced the computation to one
>> of each sinh, cosh and the combined sincos call, which is the same as
>> the original implementation did (but we do it for different values).
>
> Aha! I was tearing my hair out trying to figure out how to get it
> faster and was stuck. The patch I sent to the list turned out to be
> about twice as slow as glibc. I made it as far as removing the pow()
> and undoing the sinh(x)*cish(x) replacement for sinh(2x) an was able
> to get to 30% slower but I was stuck. Replacing sin(2x) with
> sin(x)*cos(x) is brilliant.
>
>> I also included a check for the den == 0 case. Although in reality
>> perhaps cos() can never be zero (at least I failed to produce an exact
>> root), I think it is not guaranteed.
>
> I'm still pondering this one. As far as I can tell, because
>
> den == 2*(sinh(A)^2 + cos(B)^2)
>
> For A and B real, sinh(A)^2 can only be zero when A==0. cos(B) can
> only be zero for B=(n+1/2)pi. For den to be zero, both must must be
> zero simultaneously. I think tanh has to be NaN+NaNi at those places
> because both the real and imaginary parts limit to +inf and -inf
> depending on path. The reason I hesitate is that the glibc author
> thought something could be done there... I feel like maybe I'm missing
> something obvious.
>

Oops, now I see my mistake. I messed up sin with cos and thought that
den == 0 only when z == 0.
I think you're right, mathematically it's got to be NaN + NaN*i if den == 0.
I think we can simply use the original author's den == 0 branch, it's
supposed to be very infrequent. In fact I doubt that it can happen at
all on my platform, as I said, I can't find an exact root of cos() at
all, even if walking the doubles via nextafter()/nexttoward(). If
there was one, then den == 0 could happen as an underflow in
sinhx*sinhx, so the original author's approach may perhaps still
deliver something slightly better than NaN + NaN*i, although the
accuracy is questionable at best.


>> I didn't yet measure the performance impact, but I'm pretty confident
>> it will be negligible, given that we actually only do a few extra
>> multiplications and an extra check.
>
> I've been testing the performance with
>
> x = -20:0.05:20 ;
> [a,b] = ndgrid(x,x) ;
> x = complex(a,b) ;
> clear a b
> for i=1:10
>   tic ; tanh(x) ; toc
> endfor
>
> For the unaltered glibc I get:
>
> Elapsed time is 0.150706 seconds.
> Elapsed time is 0.153792 seconds.
> Elapsed time is 0.151361 seconds.
> Elapsed time is 0.153138 seconds.
> Elapsed time is 0.14215 seconds.
> Elapsed time is 0.141334 seconds.
> Elapsed time is 0.139582 seconds.
> Elapsed time is 0.141947 seconds.
> Elapsed time is 0.139618 seconds.
> Elapsed time is 0.139667 seconds.
>
> Your optimized version does:
>
> Elapsed time is 0.179921 seconds.
> Elapsed time is 0.180975 seconds.
> Elapsed time is 0.181409 seconds.
> Elapsed time is 0.182956 seconds.
> Elapsed time is 0.166545 seconds.
> Elapsed time is 0.169406 seconds.
> Elapsed time is 0.168018 seconds.
> Elapsed time is 0.170243 seconds.
> Elapsed time is 0.170556 seconds.
> Elapsed time is 0.167231 seconds.
>
> This is the unoptimized version:
>
> Elapsed time is 0.281236 seconds.
> Elapsed time is 0.284801 seconds.
> Elapsed time is 0.270071 seconds.
> Elapsed time is 0.268668 seconds.
> Elapsed time is 0.259948 seconds.
> Elapsed time is 0.256692 seconds.
> Elapsed time is 0.256862 seconds.
> Elapsed time is 0.261013 seconds.
> Elapsed time is 0.258265 seconds.
> Elapsed time is 0.256966 seconds.
>
> Of course, it's "cheating" if we go beyond to x>22.
> Or, maybe, it's "cheating" to limit ourselves to x<22 :)
>
> x = -50:0.1:50 ;
>
> Then glibc does
>
> Elapsed time is 0.231459 seconds.
> Elapsed time is 0.219057 seconds.
> Elapsed time is 0.218268 seconds.
> Elapsed time is 0.216653 seconds.
> Elapsed time is 0.20223 seconds.
> Elapsed time is 0.207265 seconds.
> Elapsed time is 0.20144 seconds.
> Elapsed time is 0.201405 seconds.
> Elapsed time is 0.201688 seconds.
> Elapsed time is 0.202221 seconds.
>
> Your optimizations do:
>
> Elapsed time is 0.16741 seconds.
> Elapsed time is 0.193478 seconds.
> Elapsed time is 0.184134 seconds.
> Elapsed time is 0.188431 seconds.
> Elapsed time is 0.173057 seconds.
> Elapsed time is 0.172517 seconds.
> Elapsed time is 0.172727 seconds.
> Elapsed time is 0.172515 seconds.
> Elapsed time is 0.161827 seconds.
> Elapsed time is 0.155982 seconds.
>
>
>> Unless you see any problems with this change, I'd say let's find out
>> how to do the thresholds for single and long double precision (esp.
>> the latter), complete the patch and submit it to glibc maintainers.
>

How do you try the modified libm in Octave? I'm able to link C
programs against the modified libm, but I can't get Octave to use it.
If I do "LD_LIBRARY_PATH=path/to/libm octave", octave still seems to
use the system libm. Or do I need to substitute the whole libc?

> I don't see any problems except for maybe the den==0 case. I'll look
> into the single and long double parts later today and verify that the
> C99 requirements for +inf/+nan inputs are met. The thing about long
> double is I think it's not standardized. Maybe we can assume at least
> 80 bits... I'll see if the rest of glibc gives any hints about that.
>
> Did I mention that I really like your solution to the optimization :)
>

If you suspect that you're failing to see something obvious, show it
to someone else :)
I liked the combined sincos optimization in the original version, so I
immediately pondered whether it can be preserved. If there was a
combined sinh/cosh call, we could use it here as well (but apparently
there isn't).

best regards

-- 
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz



reply via email to

[Prev in Thread] Current Thread [Next in Thread]