

From: Gordan Bobic
Subject: Re: [Help-gsl] Re: C/C++ speed optimization bible/resources/pointers needed, and about using GSL...
Date: Fri, 27 Jul 2007 14:01:21 +0100 (BST)

On Fri, 27 Jul 2007, Jochen Küpper wrote:

[...example..]

Using floats instead of doubles can lead to quite significant performance differences.

On your Pentium 3, not the average number cruncher these days.
An Opteron or any of the modern Intel CPUs would be more appropriate.

*sigh*

On an x86-64 Core2/1.9GHz, CentOS/x86-64 v5, ICC v9.1.051/x86-64
Using the small sample program I posted earlier.
Compiled with: icc -msse3 -xP -fp-model fast=2

Using floats: 2.65 seconds
Using doubles: 5.29 seconds

Twice as many floats vectorize per operation as doubles. Thus it goes twice as fast. How much more evidence do you require?
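To make the arithmetic concrete, here is a minimal sketch of the kind of loop where this shows up. It is not the sample program mentioned above; the names SSE_LANES, axpy_float and axpy_double are made up for illustration, and it assumes a 128-bit SSE register and the usual x86 sizes of 4 bytes per float and 8 bytes per double.

```c
#include <stddef.h>

/* Width of one SSE register in bytes (128 bits). */
#define SSE_REGISTER_BYTES 16

/* Number of elements of a given type that fit in one SSE register. */
#define SSE_LANES(type) (SSE_REGISTER_BYTES / sizeof(type))

/* y[i] += a * x[i] in single precision. `restrict` promises the compiler
   that x and y do not overlap, which makes the loop easier to vectorize. */
void axpy_float(size_t n, float a, const float *restrict x, float *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}

/* The same loop in double precision. */
void axpy_double(size_t n, double a, const double *restrict x, double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += a * x[i];
}
```

Under those assumptions SSE_LANES(float) is 4 and SSE_LANES(double) is 2, so a vectorized axpy_float needs half as many vector operations as axpy_double for the same n. Whether a particular compiler actually vectorizes either loop depends on the compiler and the flags used.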

Where does this whole nonsense of "doubles are as fast as floats" come from? I know it's taught everywhere these days, but it is absolutely not true. Whoever first came up with it has a lot to answer for.

[...]

ICC 9.1 on x86-64 still makes my code run on the order of 4.5x faster than GCC's best efforts.

Maybe it's due to your code?

I'm not convinced. Can you write some code that demonstrates otherwise? I have not yet managed to do so, but I'd be interested to see it. It may provide a useful example of what to look out for when targeting a specific compiler.

If you can provide an example of GCC being faster on anything like this order of magnitude, please do.

That is a fair request. Let's see what we get...

[...]

Yes, ICC v10 is quite badly broken. It doesn't vectorize a lot of code that 9.1 vectorizes, and it produces broken code in certain cases. Have a look at Intel's Software Community forum for more details. I don't use it at the moment.

The nice thing about GCC is actually its robustness and availability on many platforms.

Absolutely. It's the ubiquitous compiler. And it's good to have a compiler that is reasonably consistent across various platforms.

Generally speaking it is nice to have options, i.e., to choose between icc and GCC on x86 CPUs. I would appreciate it, however, if Intel would simply put some effort into optimizing GCC for ix86. But they pay, therefore they decide.

They did so in the past. They originally produced some Pentium patches for GCC, which, AFAIK, later died a death. I am not sure they ever got ported past PGCC. I saw some mention of this being considered on the GCC mailing list a while back, but I'm not aware of anything actually having happened.

And since Intel offer their compiler for free for non-commercial use, they are being reasonably fair toward us OSS people. :-)

Don't get me wrong, I'm not knocking GCC. I'm just a little disappointed that 10 years after SSE was introduced we still don't have a vectorizer to make use of it. That's a long time.

Maybe Intel should have implemented that in GCC by now to give their CPUs an advantage over others?

Possibly, but economically I'm not sure it would have made as much sense. People who actually need a performance boost like that in a commercial application generally aren't too concerned about forking out a few hundred dollars for a licence. What I'd _really_ like to see is the resumption of efforts to compile the Linux kernel on x86 using ICC. This worked back in the 2.4.20 days, and there were patches, as there were for the early 2.6.x kernels. But none of the new ones work. Intel even had a section on their site about this, which has long since disappeared.

GSL seems to converge on the solution to my problem in fewer iterations. But even though my library is doing 2-3 times more iterations, it goes about 2x as fast. Mind you, the algorithms are different. GSL uses gradient descent via partial derivatives, whereas I do a sampled annealed steepest descent with caching (less sensitive to local minima on a jagged errorscape).

Why don't you then provide your improvements to GSL codes back to the project? (I can see many reasons, but some general improvements seem to be easily within your reach. We would all appreciate it!)

My code is very specific to fitting the functions I am interested in. If I were to generalize it to the point where it is generic enough for inclusion into GSL, it is likely that most of the performance would disappear. Also remember that I am gaining considerable performance from vectorization in my code, something that is very difficult in GSL. GSL doesn't even support multifit solver operations on float-typed rather than double-typed vectors/matrices. THIS, however, I would be interested in developing/contributing. The only problem there is finding the time to do it, especially since I already have a home-brewed solution to my specific problem that is faster on my hardware.

The biggest problem with icc is that it is not free. Neither free as in "free speech", nor as in "free beer". You have to pay for it! Unless you are coding for the fun of it; but then, mostly, speed is not a terrible issue.

ICC is "free as in beer" for non-commercial use. If you're not selling your software compiled with ICC, you can use it free of charge. And from what I understand, FOSS counts as non-commercial.

Gordan
