On 03/17/10 10:01, Corrado wrote:
Dear Fredrik, dear Octave friends,
First of all thanks for coming back to me.
I have 40,000 vectors of observations: {y,p1,p2,p3,p4,p5 ..... pn}_j
where j spans over the 40,000 vectors (that is from 1 to 40,000).
The k={k0,k1,k2,k3,k4,.....,kn} is the vector of parameters to be
determined by fitting.
I believe you are right: in machine learning language, the
{1,p1,p2,....,pn} vector would be called the input.
y is the response variable.
PS: If you build a frequency histogram from the y_j, the distribution
looks approximately beta, but fails tests because of the number of
points ....
Best,
OK, then you have lots of data (which is good :-). How large is n (the
length of your data vector)?
Note that your data, y, is not distributed at all - this is what you
actually know. Your knowledge about the model parameters will be
distributed since your model is not perfect in the sense that you always
have measurement uncertainties and model uncertainties. This is also why
you have the error (or model misfit) term e,
y = 1 - exp(-k'*p) + e.
Essentially, this is a parameter estimation problem and how you obtain
your estimates (of k) depends on what you know about the parameters (do
you know a bound on them, mean value, variance etc.) and what you know
about your error, e (a conservative assumption is to use a zero-mean
Gaussian distribution for e).
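
To make the Gaussian case concrete: if e is assumed zero-mean Gaussian with a constant variance, maximizing the likelihood over k is the same as minimizing the sum of squared residuals. The following is only a minimal sketch of that idea, not a recommended solver. It uses plain projected gradient descent, and the function names, step size, iteration count and synthetic noise-free data are all assumptions made for illustration. The bound k_i >= 0 is applied to i >= 1 only, as in the original question.

```python
import math

def predict(k, p):
    """Model value 1 - exp(-k'p), where p already starts with the constant 1."""
    z = sum(ki * pi for ki, pi in zip(k, p))
    return 1.0 - math.exp(-z)

def fit_least_squares(P, y, step=0.05, iters=20000):
    """Minimize the mean squared residual by projected gradient descent.

    Assumption: e is zero-mean Gaussian with constant variance, so this
    least-squares fit coincides with the maximum likelihood estimate.
    step and iters are hand-picked for the toy data below.
    """
    n = len(P[0])
    k = [0.0] * n
    for _ in range(iters):
        grad = [0.0] * n
        for p, yj in zip(P, y):
            z = sum(ki * pi for ki, pi in zip(k, p))
            r = yj - (1.0 - math.exp(-z))          # residual for this observation
            w = -2.0 * r * math.exp(-z) / len(P)   # d(mean r^2)/dz
            for i in range(n):
                grad[i] += w * p[i]
        new_k = [ki - step * gi for ki, gi in zip(k, grad)]
        # Project onto the constraint k_i >= 0 for i >= 1 (k_0 is left free).
        k = [new_k[0]] + [max(0.0, v) for v in new_k[1:]]
    return k

# Synthetic, noise-free data with made-up parameters, for illustration only.
k_true = [0.2, 0.7]
P = [[1.0, 0.25 * j] for j in range(13)]   # rows are p = [1, p_1]
y = [predict(k_true, p) for p in P]

k_hat = fit_least_squares(P, y)
print(k_hat)
```

With 40,000 observations a pure-Python loop like this would be slow; it is meant only to show the objective and the constraint handling. If the error is not Gaussian, the squared residual would be replaced by the negative log-likelihood of whatever error model is assumed.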
/Fredrik
Fredrik Lingvall wrote:
On 03/16/10 20:01, Corrado wrote:
Dear Octave users,
I have to fit the nonlinear regression:
y~1-exp(-(k0+k1*p1+k2*p2+ .... +kn*pn))
where ki>=0 for each i in [1 .... n] and pi are on R+.
I am using, at the moment, nls, but I would rather use a Maximum
Likelihood based algorithm. The error is not necessarily normally
distributed.
y is approximately beta distributed, and the volume of data is medium to
large (the y,pi may have ~ 40,000 elements).
Any suggestion?
Regards
Corrado,
Can you tell us a little more about your problem?
As I understand it you have a model,
y = 1 - exp(-k'*p) + e
where k = [k_0 k_1 ... k_n]' and p = [1 p_1 p_2 ... p_n]' and where y is
your data vector, p is your "input signal" and k is the parameter vector
of your model. Have I understood you correctly?
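
To check that we mean the same thing, here is a small sketch of how I read the model for a single observation, without the error term e. The function name and the numbers are made up for illustration; the only point is that p gets a leading 1 so that k_0 acts as the constant term.

```python
import math

def model(k, p_obs):
    """Return 1 - exp(-k'p) with p = [1, p_1, ..., p_n]."""
    p = [1.0] + list(p_obs)   # prepend the constant input that multiplies k_0
    if len(k) != len(p):
        raise ValueError("k must have exactly one more entry than p_obs")
    return 1.0 - math.exp(-sum(ki * pi for ki, pi in zip(k, p)))

# Made-up parameters k_0, k_1, k_2 and inputs p_1, p_2.
k = [0.1, 0.5, 0.2]
value = model(k, [1.0, 2.0])   # k'p = 0.1 + 0.5 + 0.4 = 1.0
print(value)                   # 1 - exp(-1), roughly 0.632
```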
/Fredrik