help-gsl
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Re: [Help-gsl] Non-linear Multi-parameter Least-Squares Example


From: John Gehman
Subject: Re: [Help-gsl] Non-linear Multi-parameter Least-Squares Example
Date: Fri, 20 Jul 2007 18:50:28 +1000

Hi Gordan,

You're heaps more informed on the computer science end of things; I'm only as aware as what my needs have required.

So the only further question you raise that I can hope to address regards the sigma -- there are certainly situations where some data will have more error than others, and whacking these different errors into the sigma vector forces the fit to respect some data points more than others.

For the case where all data has the same error, as yours appears to, then all data is weighted equally. HOWEVER, this value does scale the residual and the sensitivity of the gradients, and I believe therefore does influence convergence. It may not make that big a difference in the regime where you are working, but I have noticed before with some of my applications that it was important to be realistic about the error. It was too long ago for me to remember any more substantive detail, unfortunately, and so I could be misguided about that. I'm not sure whether a gsl_vector_scale after the fact would be more efficient that explicit divisions during the loop?

Cheers,
john

---------------------------------------------------------

John Gehman
Research Fellow
School of Chemistry
University of Melbourne
Australia

On 20/07/2007, at 6:30 PM, Gordan Bobic wrote:

On Fri, 20 Jul 2007, John Gehman wrote:

In using integers from 0 to 40 for your xdata, and computing data
with (b/c) as part of your function, you're introducing an undefined
number into the whole process. Hence the residual will be undefined,
and the sum of the squares of the residuals will be undefined, and
the gradients at that point, etc etc. At a minimum try float f = i
+1.0 where the data is built, and in the _f and _df functions.

Yeah, somebody else pointed it out to me yesterday. I can't believe I
didn't spot that. What a schoolboy error. Worse, I only re-wrote it in the period/t instead of frequency format because I was finding it easier to
read while I was getting it to work.

I now changed it using the frequency, so it both runs faster (due to
multiply being roughly 4 clock cycles instead of about 80 for a divide)
and avoids the div/0 problems. :-)

That's surely the biggest problem. But you also might want to be
careful about mixing your float data types with the doubles assumed
by gsl structures. I didn't look closely, but was left with the
impression that they were mixed a bit in your code. You might be
getting away with it anyway, but just a word of caution.

The only thing that seems to require doubles is the x_init array.
Everything else seems to work OK with floats. My main motivations for
using floats are:

1) Smaller - more data fits in the cache
2) Faster
2.1) SSE1 cannot vectorize doubles
2.2) SSE2+ can vectorize doubles but only 2 at a time instead of 4 at a
time.

Also, with regard to the ultimate intent of fitting the data, I
presume this was just an incremental step in your development of the
fitting procedure, but insofar as this goes, once you clean up the
above, you'll still get a shite fit, as the harmonic function, is not
described well by the data sampling.

Indeed I am aware of this, but it depends on the data, and it depends on
the initial guesstimate. I have a reasonably good guesstimator
implemented, and I am finding that typically the error between the fit and the data falls to within 5-10% in most cases. But this may just mean that my data is quite clean (which, arguably, I expected). But you do have a
point - my worst case data may well be much more noisy.

Which brings me to the next question. I have my own home-brewed solver
implemented and that works by steepest descent annealing (i.e. when it
finds a local minimum it ups the resolution). I'm finding that overall it fits better solutions - possibly because noise in the data somtimes upsets the algebraic gradient descent and it gets stuck in a local minimum, and
the local-inspection approach with increasing precision may be more
robust. The GSL implementation also seems slower on my current platform, despite generally managing to descend in considerably fewer iterations (up to the point where it stops being able to fit a better solution - but it
stops earlier than my own algorithm).

Now, granted, the speed could be an implementation issue (my code was made with ICC's auto-vectorization in mind), but are there parameters on the
solver function that could be tweaked, other than the delta checker
sensitivity?

Also, how does sigma fit into all this? I have tried it with various
settings and it didn't seem to make any difference, so I just set it to 1
and removed it from all the equations to save the seemingly needless
divide.

Finally - where is gsl_matrix_float_set() implemented? grep only seems to find it in the header files. I wanted to have a closer look at the code to
see if I could do anything to help it vectorize with ICC (assuming it
doesn't already).

Thank you all for helping, I really appreciate it.

Gordan



_______________________________________________
Help-gsl mailing list
address@hidden
http://lists.gnu.org/mailman/listinfo/help-gsl



reply via email to

[Prev in Thread] Current Thread [Next in Thread]