help-gsl

Re: [Help-gsl] non-linear least squares fitting

From: Jay Howard
Subject: Re: [Help-gsl] non-linear least squares fitting
Date: Tue, 4 Dec 2007 15:27:57 -0600

```
Thanks for the help, Barrett.  My responses follow yours:

> Typical usage would be to use gsl_multifit_fdfsolver_lmsder or
> gsl_multifit_fdfsolver_lmder as the gsl_multifit_fdfsolver_type. Look at
> the variable "T" in the example here:
> http://www.gnu.org/software/gsl/manual/html_node/Example-programs-for-Nonlinear-Least_002dSquares-Fitting.html

I was hoping to avoid using the fdfsolver set of calls because they
require that I furnish df() and fdf() functions, which I'm not clear
on how to implement based on f().  On this page from the docs:

http://www.gnu.org/software/gsl/manual/html_node/Providing-the-Function-to-be-Minimized.html

the "top" portion (gsl_multifit_function) seems to imply there's a
scenario where I only need to provide f().  Is that not correct?

> I believe you are misinterpreting the docs. The total number of
> observations (n) (= total number of equations) should be greater than or
> equal to the number of parameters (p) that you have in each equation. You
> should only have one equation (called "function" in the docs) per
> observation.

I think you're right; I was phrasing the problem incorrectly.  The
error function is a linear combination of terms, each of which has a
non-linear component.  So it's something like:

F = a * X * G(u, v, b, c, d) + e * Y * H(u, v, f, g, h) + k

where a, b, c, d, e, f, g, h and k are constants, and G and H are
non-linear functions on (u,v) with constants (b,c,d) and (f,g,h)
respectively.

My plan was to represent this to GSL as a non-linear function with
only 6 parameters.  For each call, I'd compute best-fit values (over
my data set) for the linear coefficients a, e and k using LAPACK's
dgels().  In retrospect, this seems like the wrong way to go about it.
Rather, I should throw the linear coefficients in with the non-linear
ones and let them be determined iteratively along with the others.  In
that case, my p = 9 and n = ~100 million.

Should I expect to be able to process that much data, or is that
crazy?  So far I've performed linear regressions (7 terms) on this
same dataset, but I'm unclear on the GSL solver's internal storage
requirements.

```