Re: Curve Fitting "splattered" data
Henry F. Mollet
Wed, 20 Oct 2004 20:21:46 -0700
I apologize for posting a third time. However, the sample data provided
appears to be particularly unsuitable for what I was trying to check,
because cov(x,y) is so small to begin with (-2.8e-09). The idea of the Major
Axis "Regression" or PCA is to rotate the coordinate system so that
cov(xrot, yrot) becomes zero in the rotated coordinate system. In the
example provided, cov(x,y) is so small that it corresponds to a small
required rotation angle of about 1.3 degrees. After rotating, I end up with
a covariance of the same small magnitude (now +2.8e-09), and the required
angle of rotation is now -1.3 degrees. I suspect that I'm at the limit of
computer accuracy because cov(x,y) is so small, probably zero for practical
purposes? On the other hand, the slopes that were calculated appeared to be
reasonable?
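The rotation described above can be sketched in a few lines. This is a minimal illustration on made-up data (not the 37 points from this thread), and the variable names are mine. It also rotates by twice the angle for comparison: algebraically that flips the sign of the covariance exactly, which would match the symptom reported above (-2.8e-09 turning into +2.8e-09 with a required angle of -1.3 degrees). Whether a factor of two is what actually happened in my session is only an assumption.

```octave
% Synthetic stand-in data; NOT the data discussed in this thread.
x = [1; 2; 3; 4; 5; 6];
y = [2.1; 3.9; 6.2; 7.8; 10.1; 12.2];

C   = cov ([x, y]);          % 2x2 covariance matrix of the two columns
cxy = C(1,2);
vx  = C(1,1);
vy  = C(2,2);

% Angle of the major axis measured from the x-axis.
% atan2 keeps the correct quadrant even when vx < vy.
theta = 0.5 * atan2 (2*cxy, vx - vy);

% Coordinates of the points in axes rotated by theta.
xrot =  x*cos (theta) + y*sin (theta);
yrot = -x*sin (theta) + y*cos (theta);
Crot = cov ([xrot, yrot]);

% Deliberate overshoot: rotate by 2*theta instead of theta.
xrot2 =  x*cos (2*theta) + y*sin (2*theta);
yrot2 = -x*sin (2*theta) + y*cos (2*theta);
Crot2 = cov ([xrot2, yrot2]);

printf ("cov before rotation:          %g\n", cxy);
printf ("cov after rotating by theta:   %g\n", Crot(1,2));
printf ("cov after rotating by 2*theta: %g\n", Crot2(1,2));
```

The covariance matrix is taken from cov ([x, y]) rather than cov (x, y) because the two-argument form does not return the same thing in every Octave version.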
Correction: GM regression is also known as *Reduced Major Axis Regression*,
not Minor Axis Regression. The Minor Axis goes with the Major Axis of the
correlation ellipse. The axis length ratio is given by sqrt (larger
eigenvalue/smaller eigenvalue). I don't know why it's called reduced major
axis.
Summary: PCA is apparently the same as Major Axis "Regression", but Rafael's
approach using svd (cov([x,y])) is more elegant. I also have two plot
questions (5. and 6.).
1. I have already posted regarding the Geometrical Mean (GM) regression. The
GM-regression is also known as the Minor Axis Regression. It minimizes the
"diagonal" distances of the data points to the line. It is suggested to be
appropriate when the "errors" of the x measurements are similar to the
errors of the y measurements. The y-on-x regression, which minimizes the sum
of squares in the y-direction, assumes that there is no error in the data
used on the x-axis.
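To make the relation between the two ordinary regressions and the GM slope concrete, here is a small sketch on made-up data (not the thread's data; the names b, d, r and b_GM follow the session further down, everything else is mine). It shows that the product of the y-on-x and x-on-y slopes equals r^2, which is why the two polyfit slopes are reciprocal only when |r| = 1, and that the GM slope is the geometric mean of b and 1/d carrying the sign of r.

```octave
% Synthetic stand-in data with a negative, imperfect correlation.
x = [1; 2; 3; 4; 5; 6; 7; 8];
y = [9; 7; 8; 5; 6; 3; 4; 1];

C = cov ([x, y]);                      % 2x2 covariance matrix

b = C(1,2) / C(1,1);                   % y-on-x slope (dy/dx)
d = C(1,2) / C(2,2);                   % x-on-y slope (dx/dy)
r = C(1,2) / sqrt (C(1,1) * C(2,2));   % correlation coefficient

% b*d equals r^2, so b and 1/d coincide only for a perfect correlation.
printf ("b = %g, 1/d = %g, b*d = %g, r^2 = %g\n", b, 1/d, b*d, r^2);

% GM (reduced major axis) slope: magnitude sqrt(var(y)/var(x)), sign of r.
b_GM = sign (r) * sqrt (C(2,2) / C(1,1));
printf ("b_GM = %g (lies between b and 1/d)\n", b_GM);
```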
2. PCA is apparently the same as the Major Axis "regression" and probably
more appropriate here because your x and y are not functionally related.
After all they are some numbers in the complex plane.
3. The Major Axis (of the correlation ellipse) should not be called a
regression. I'm not exactly sure what is being minimized in geometrical
terms but it minimizes the sum of the squares of the distances from the
observed points to the line in a direction at right angles to the line, when
one unit of measurement occupies the same absolute distance on the x and y
coordinates. A mouthful and I don't know what it really means.
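One way to see what the major axis minimizes is to compute the perpendicular sum of squares directly. The sketch below uses made-up data and a helper, perp_ss, that is my own naming; it evaluates the sum of squared perpendicular distances to a line through the centroid and compares the major-axis slope with the ordinary y-on-x slope and with a slightly perturbed slope.

```octave
% Synthetic stand-in data; NOT the data discussed in this thread.
x = [1; 2; 3; 4; 5; 6; 7; 8];
y = [9; 7; 8; 5; 6; 3; 4; 1];

m = mean ([x, y]);                 % centroid [mean(x), mean(y)]
C = cov ([x, y]);

% Eigenvector belonging to the larger eigenvalue = major-axis direction.
[V, L] = eig (C);
[lmax, k] = max (diag (L));
v = V(:, k);
b_MA = v(2) / v(1);                % major-axis slope

b = C(1,2) / C(1,1);               % ordinary y-on-x slope, for comparison

% Sum of squared perpendicular distances to the line through the
% centroid with slope s (hypothetical helper, not a library function).
perp_ss = @(s) sum (((y - m(2)) - s * (x - m(1))).^2) / (1 + s^2);

printf ("perpendicular SS, major axis:   %g\n", perp_ss (b_MA));
printf ("perpendicular SS, y-on-x line:  %g\n", perp_ss (b));
printf ("perpendicular SS, 1.05 * b_MA:  %g\n", perp_ss (1.05 * b_MA));
```

The distances are only meaningful when x and y are plotted in the same units at the same scale, which is the condition quoted above.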
4. Here I'll calculate the slope of the major axis for comparison
*** local user variables:
prot type rows cols name
==== ==== ==== ==== ====
rwd matrix 37 1 x
rwd matrix 37 1 y
b = -15.309
octave:11> d= cov(x,y)/var(y)
d = -0.022839
ans = -43.785
octave:13> r = cov(x,y)/sqrt(var(x)*var(y))
r = -0.59131
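octave:14> b_GM = b/r
b_GM = 25.891
% b/r equals sqrt(var(y)/var(x)) and is therefore always positive. The GM
% slope takes the sign of r, so here it is -25.891 (i.e. b/abs(r)).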
octave:15> M = [var(x),cov(x,y);cov(x,y),var(y)]
% same as cov(d)
% best to think of these eigenvalues as variances.
% same as svd(cov([x,y])) given by Rafael
la2 = 1.2285e-07
octave:19> b_MA= cov(x,y)/(la2-var(y))
b_MA = -43.743
% It appears that the Major Axis slope is close to 1/d = -43.785 (the
% reciprocal of the x-on-y slope). This probably has something to do with
% the fact that the correlation ellipse is so close to the y-axis
% (theta = 1.31 degrees)?
tan2theta = 0.045746
theta = 0.022857
thetadegress = 1.3096
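The steps of the session above can be strung together as follows. This is a sketch on made-up data whose spread is much larger in y than in x, as in the thread; the names theta_atan, theta_major and axis_ratio are mine. It suggests that when var(y) > var(x), the angle obtained from a plain atan of tan(2*theta) belongs to the minor axis, so the major-axis slope is -1/tan(theta) rather than tan(theta). With the numbers above, -1/tan(0.022857) is about -43.74, in line with b_MA.

```octave
% Synthetic stand-in data: small spread in x, large spread in y.
x = [0.1; 0.3; 0.2; 0.5; 0.4; 0.6];
y = [9; 1; 7; -4; -1; -8];

C   = cov ([x, y]);
cxy = C(1,2);
vx  = C(1,1);
vy  = C(2,2);

ev  = eig (C);
la1 = min (ev);                        % smaller eigenvalue (minor axis)
la2 = max (ev);                        % larger eigenvalue (major axis)

b_MA = cxy / (la2 - vy);               % major-axis slope, as in the session

tan2theta   = 2*cxy / (vx - vy);
theta_atan  = 0.5 * atan (tan2theta);             % minor axis when vy > vx
theta_major = 0.5 * atan2 (2*cxy, vx - vy);       % always the major axis

axis_ratio = sqrt (la2 / la1);         % length ratio of the ellipse axes

printf ("b_MA               = %g\n", b_MA);
printf ("tan(theta_major)   = %g\n", tan (theta_major));
printf ("-1/tan(theta_atan) = %g\n", -1 / tan (theta_atan));
printf ("axis length ratio  = %g\n", axis_ratio);
```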
5. I had to resort to Excel to be able to plot the data using the *same*
axis range for both x and y, so that I could see the shape of the
correlation ellipse. Apparently gnuplot considered the range of x to be
empty:
octave:5> plot (x,y,"x") % no problem here
ans = 1.9826e-05
ans = -4.1757e-04
ans = 0.00091074
octave:9> min (y)
ans = -4.1757e-04
octave:10> axis ([0.001,0.001,0.001,0.001], "square")
octave:11> plot (x,y,"x")
gnuplot> pl '/var/tmp/oct-kUoi1a' t "line 1" w points 1 4
line 0: Can't plot with an empty x range!
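A possible explanation for the empty range: the four-element form of axis expects [xmin, xmax, ymin, ymax], and in the call above all four values are 0.001, so the lower and upper limits coincide. The sketch below, on made-up data of roughly the same scale, builds limits that are centred on the data and have the same width in x and y. The 10% margin is an arbitrary choice of mine.

```octave
% Synthetic stand-in data of a similar order of magnitude.
x = [2.0e-05; 3.1e-05; 4.4e-05; 5.2e-05; 6.0e-05];
y = [9.0e-04; 5.5e-04; 1.0e-04; -2.0e-04; -4.0e-04];

% Centre of the data range in each direction.
cx = (min (x) + max (x)) / 2;
cy = (min (y) + max (y)) / 2;

% Common half-width: the larger of the two ranges plus a 10% margin.
h = 0.55 * max (max (x) - min (x), max (y) - min (y));

% Limits in the order axis expects: [xmin, xmax, ymin, ymax].
lim = [cx - h, cx + h, cy - h, cy + h];
printf ("axis limits: [%g, %g, %g, %g]\n", lim);
```

With these limits, plot (x, y, "x") followed by axis (lim, "square") should give equal scaling on both axes. Newer Octave versions also accept axis ("equal"), which may be simpler.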
6. I also had no luck with Rafael's plot:
octave:7> [u,v,w] = svd (cov (d));
octave:8> m = mean (d);
octave:9> r = min (d (:, 1));
octave:10> s = max (d (:, 1));
octave:11> plot (d (:, 1), d (:, 2), '*')
octave:12> hold on
octave:13> plot ([r, s], m (2) + [(r-m(1)),(s-m(1))] * u(2) / u(1));
error: single index only valid for row or column vector
error: evaluating binary operator `*' near line 13, column 43
error: evaluating binary operator `/' near line 13, column 50
error: evaluating binary operator `+' near line 13, column 21
error: evaluating argument list element number 2
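The error message suggests that this Octave version rejects a single index on the 2x2 matrix u, so u(2) and u(1) are the likely culprits. A sketch that avoids the problem by addressing the first column of u with explicit row and column indices is given below. It runs on made-up data, and the names xlo, xhi, slope and yline are mine, chosen so they do not clash with r and s in the original.

```octave
% Synthetic stand-in for d = [real(datapoints), imag(datapoints)].
d = [1, 9; 2, 7; 3, 8; 4, 5; 5, 6; 6, 3; 7, 4; 8, 1];

[u, sv, w] = svd (cov (d));   % first column of u = principal direction
m   = mean (d);               % centroid of the data
xlo = min (d(:, 1));
xhi = max (d(:, 1));

% Explicit two-index form instead of u(2) / u(1).
slope = u(2,1) / u(1,1);

% Endpoints of the principal-axis line through the centroid.
yline = m(2) + ([xlo, xhi] - m(1)) * slope;
printf ("slope of the principal axis: %g\n", slope);
```

The line can then be drawn over the points with plot (d(:,1), d(:,2), '*'), hold on, plot ([xlo, xhi], yline) and hold off.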
on 10/18/04 2:07 AM, Rafael Laboissiere at address@hidden
> * Robert A. Macy <address@hidden> [2004-10-17 23:36]:
>> Trying to fit a curve to data with poor results.
>> Data is a set of 37 complex data points that roughly lie
>> along a straight line in the complex plane. Sequence of
>> data has no significance, only their location on the plane.
>> Don't care about intercept point, only the slope. Ran
>> polyfit.m using real(datapoints) and imag(datapoints)
>> thinking that would yield slope of trend, for example,
>> calculates a slope, but when I reverse the order
>> I don't get a reciprocal slope?!
>> one way I get 15.093, the other way I get 0.0228387
>> Why aren't they reciprocal?
> Because in the first case you minimize the sum of squared errors in
> imag(datapoints) and in the second case in real(datapoints). This is the
> expected behavior of polyfit (see "help polyfit").
>> Is there a better program for finding the straight line?
> You might do a PCA (principal component analysis) on your data, which boils
> down to using either eig or svd. Try this:
> d = [real(datapoints), imag(datapoints)];
> [u,v,w] = svd (cov (d));
> m = mean (d);
> r = min (d (:, 1));
> s = max (d (:, 1));
> hold off
> plot (d (:, 1), d (:, 2), '*')
> hold on
> plot ([r, s], m (2) + [(r-m(1)),(s-m(1))] * u(2) / u(1));
> The slope of the curve plotted is close to -0.022861.