help-octave

## Re: regarding accuracy of fit

 From: CdeMills Subject: Re: regarding accuracy of fit Date: Tue, 19 Jul 2011 01:42:22 -0700 (PDT)

```
rina wrote:
>
> the value of y from the experimental data and value of f from the fit; now I
> want to know the best fit?
>
> I am doing that: suppose R=y-f;
> and then plotting R against the x values. This is giving the correct result, but I
> HAVE TOO much data to handle so I am getting a fluctuation with straight
> line??? not getting what to do??
> thanks in advance for help
>
>
The "accuracy" of the fit is given by the covariance matrix of the estimated
parameters. It can be computed as follows:
- HYPOTHESIS: the noise on your data is normal, zero mean, unknown variance
- let's say that the model is y \simeq a0 + a1 * x, with x and y column vectors
- construct A, the regression matrix: the first column is ones(size(x)), the
second column is x, and so on
- solve for theta =[a0; a1] as theta = A\y
- compute the estimates of y as ye = A*theta
- compute the noise estimate as noise = y - ye
- verify the basic hypothesis!!! Search for outliers and other problems
- compute the noise variance estimate Cn as
Cn = sumsq(noise)/(size(A, 1)-size(A, 2))
explanation: the denominator contains the degrees of freedom: the number
of observations, whose noise terms e1, e2, ... are mutually independent,
minus the number of estimated parameters.
- the parameter covariance matrix is computed as
iA = inv(A.'*A);
Ctheta = Cn * iA
This matrix "explains" how the noise on the data is mapped into noise on
the estimated parameters.
- the accuracy of the regression is obtained by testing the NULL hypothesis:
there is no regression, the components of theta are just pure noise, against
their observed value. To this end, compute their studentised values (t-statistics):
res_theta = abs(theta)./sqrt(diag(Ctheta))
Those numbers have to be validated as
theta_accur = 2*(1-tcdf(res_theta, size(A, 1)-size(A, 2)))
This is the two-sided probability of observing values of theta at least this
large in magnitude, given that the NULL hypothesis is true. Values of 1e-3 or
less tell you that you can be very confident in the existence of a regression
law between y and x. Values of 1% are so-so (if you reject the null hypothesis,
i.e. accept that there IS a regression between y and x, you accept a 1% risk
of doing so when the null hypothesis is actually true). Values greater than
10% clearly indicate there is a problem.

With this arsenal, you end up with a significance level for your model. Note
that the accuracies are given on a per-coefficient basis. This way, you can
refine your search: introduce an x^2 term, see if the associated significance
level is still OK, and so on.

Regards

Pascal

--
View this message in context:
http://octave.1599824.n4.nabble.com/regarding-accuracy-of-fit-tp3675759p3677607.html
Sent from the Octave - General mailing list archive at Nabble.com.

```
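The steps in the message above can be put together roughly as follows. This is only a sketch under stated assumptions: the function name `fit_accuracy`, the output name `p_theta`, and the synthetic data are made up for illustration and do not come from the original post. The parameter covariance is taken as the noise variance times inv(A.'*A), the usual result under the stated hypothesis of independent noise with equal variance. The two-sided p-value is computed with core `betainc` instead of `tcdf`, so the statistics package does not need to be loaded; it should be equivalent to `2*(1-tcdf(t, dof))`.

```octave
% Sketch only: fit_accuracy, p_theta and the data below are illustrative
% assumptions, not part of the original message.
1;  % make this file a script so the function can be defined inline

function [theta, Ctheta, p_theta, noise] = fit_accuracy(A, y)
  theta  = A \ y;                      % least-squares parameter estimates
  noise  = y - A * theta;              % residuals, i.e. the noise estimate
  dof    = size(A, 1) - size(A, 2);    % observations minus parameters
  Cn     = sumsq(noise) / dof;         % noise variance estimate
  Ctheta = Cn * inv(A.' * A);          % parameter covariance matrix
  t      = abs(theta) ./ sqrt(diag(Ctheta));   % studentised parameters
  % Two-sided p-value of each parameter under the null hypothesis,
  % written with the regularized incomplete beta function.
  p_theta = betainc(dof ./ (dof + t.^2), dof / 2, 0.5);
end

% Synthetic data, made up for illustration: y = 1 + 2*x + noise
randn("state", 1);
x = linspace(0, 1, 200).';
y = 1 + 2*x + 0.1*randn(size(x));

% Straight-line model a0 + a1*x
A = [ones(size(x)), x];
[theta, Ctheta, p_theta, noise] = fit_accuracy(A, y);
disp(theta.');      % estimated [a0, a1]
disp(p_theta.');    % per-coefficient significance levels

% Refinement: add an x^2 column and look at its significance level
A2 = [A, x.^2];
[theta2, Ctheta2, p_theta2] = fit_accuracy(A2, y);
disp(p_theta2.');
```

For a large data set, plotting `noise` against `x` with markers only (for example `plot(x, noise, ".")`) tends to be easier to read than a connected line, and makes outliers stand out.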