help-octave
[Top][All Lists]

## Re: Curve Fitting "splattered" data

 From: Henry F. Mollet Subject: Re: Curve Fitting "splattered" data Date: Wed, 20 Oct 2004 19:17:26 -0700 User-agent: Microsoft-Entourage/10.1.1.2418

```Correction: GM regression is also know as *Reduced Major Axis Regression*
not Minor Axis Regression. The Minor Axis goes with the Major axis of the
correlation ellipse. The axis length ratio is given by sqrt (larger
eigenvalue/smaller eigenvalue). I don't know why it's called reduced major
axis regression.

Summary: PCR apparently same as Major Axis "Regression" but Rafael's
approach using svd (cov([x,y])) is more elegant. I also have two plot
questions (5. and 6.).
Henry

1. I have already posted regarding the Geometrical Mean (GM) regression. The
GM-regression is also known as the Minor Axis Regression. It minimizes the
"diagonal" distances of the data points to the line. It is suggested to be
appropriate when the "error" of the x measurements are similar to the error
of the y measurements. The y-on-x regression, which minimizes the sum of
squares in the y-direction, assumes that there's no error for the data used
on the x-axis.

2. PCA is apparently the same as the Major Axis "regression" and probably
more appropriate here because your x and y are not functionally related.
After all they are some numbers in the complex plane.

3. The Major Axis (of the correlation ellipse) should not be called a
regression. I'm not exactly sure what is being minimized in geometrical
terms but it minimizes the sum of the squares of the distances from the
observed points to the line in a direction at right angles to the line, when
one unit of measurement occupies the same absolute distance on the x and y
coordinates. A mouthful and I don't know what it really means.

4. Here I'll calculate the slope of the major axis for comparison
*** local user variables:
prot  type                       rows   cols  name
====  ====                       ====   ====  ====
rwd  matrix                       37      1  x
rwd  matrix                       37      1  y
octave:10> b=cov(x,y)/var(x)
b = -15.309
octave:11> d= cov(x,y)/var(y)
d = -0.022839
octave:12> 1/d
ans = -43.785
octave:13> r = cov(x,y)/sqrt(var(x)*var(y))
r = -0.59131
octave:14> b_GM = b/r
b_GM = 25.891
octave:15> M = [var(x),cov(x,y);cov(x,y),var(y)]
% same as cov(d)
M =
1.8317e-10   -2.8042e-09
-2.8042e-09    1.2278e-07
octave:16> eig(M)
ans =
1.1906e-10
1.2285e-07
% best to think of these eigenvalues as variances.
% same as svd(cov([x,y])) given by Rafael

octave:18> la2=max(eig(M))
la2 =  1.2285e-07
octave:19> b_MA= cov(x,y)/(la2-var(y))
b_MA = -43.743
% It appears that Major Axis slope is close to 1/d = -43.785 (reciprocal of
x-on-y slope. Probably has something to do with fact that correlation
ellipse is so close to the y-xis (theta=1.31 degrees)?
octave:20> tan2theta=2*cov(x,y)/(var(x)-var(y))
tan2theta = 0.045746
octave:21> theta=atan(tan2theta)/2
theta = 0.022857

*********
5. I had to resort to Excel to be able to plot the data using the *same*
axis range for both x and y, so that I could see the shape of the
correlation ellipse. Apparently gnuplot considered the range of x to be
empty:

octave:5> plot (x,y,"x") % no problem here
octave:6> max(x)
ans =  1.9826e-05
octave:7> min(y)
ans =  -4.1757e-04
octave:8> max(y)
ans = 0.00091074
octave:9> min (y)
ans =  -4.1757e-04
octave:10> axis ([0.001,0.001,0.001,0.001], "square")
octave:11> plot (x,y,"x")
gnuplot> pl '/var/tmp/oct-kUoi1a' t "line 1" w points 1 4
^
line 0: Can't plot with an empty x range!
*******
5. I also had no luck with Rafael's plot:
octave:7> [u,v,w] = svd (cov (d));
octave:8> m = mean (d);
octave:9> r = min (d (:, 1));
octave:10> s = max (d (:, 1));
octave:11> plot (d (:, 1), d (:, 2), '*')
octave:12> hold on
octave:13> plot ([r, s], m (2) + [(r-m(1)),(s-m(1))] * u(2) / u(1));
error: single index only valid for row or column vector
error: evaluating binary operator `*' near line 13, column 43
error: evaluating binary operator `/' near line 13, column 50
error: evaluating binary operator `+' near line 13, column 21
error: evaluating argument list element number 2
*******

on 10/18/04 2:07 AM, Rafael Laboissiere at address@hidden
wrote:

> * Robert A. Macy <address@hidden> [2004-10-17 23:36]:
>
>> Trying to fit a curve to data with poor results.
>>
>> Data is a set of 37 complex data points that roughly lie
>> along a straight line in the complex plane.  Sequence of
>> data has no significance, only their location on the plane.
>>
>> Don't care about intercept point, only the slope.  Ran
>> polyfit.m using  real(datapoints) and imag(datapoints)
>> thinking that would yield slope of trend, for example,
>>
>> p=polyfit(real(datapoints),imag(datapoints),1)
>>
>> calculates a slope, but when I reverse the order
>>
>> p=polyfit(imag(datapoints),real(datapoints),1)
>>
>> I don't get a reciprocal slope?!
>>
>> one way I get 15.093, the other way I get 0.0228387
>> Why aren't they reciprocal?
>
> Because in the first case you minimize the sum of squared errors in
> imag(datapoints) and in the second case in real(datapoints).  This is the
> expected behavior of polyfit (see "help polyfit").
>
>> Is there a better program for finding the straight line?
>
> You might do a PCA (principal component analysis) on your data, which boils
> down to using either eig or svd.  Try this:
>
>   d = [real(datapoints), imag(datapoints)];
>   [u,v,w] = svd (cov (d));
>   m = mean (d);
>   r = min (d (:, 1));
>   s = max (d (:, 1));
>   hold off
>   plot (d (:, 1), d (:, 2), '*')
>   hold on
>   plot ([r, s], m (2) + [(r-m(1)),(s-m(1))] * u(2) / u(1));
>
> The slope of the curve plotted is close to -0.022861.

-------------------------------------------------------------
Octave is freely available under the terms of the GNU GPL.

Octave's home on the web:  http://www.octave.org
How to fund new projects:  http://www.octave.org/funding.html
Subscription information:  http://www.octave.org/archive.html
-------------------------------------------------------------

```