[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Principal Components Analysis
From: |
John W. Eaton |
Subject: |
Principal Components Analysis |
Date: |
Wed, 27 Jan 1999 15:31:02 -0600 (CST) |
On 27-Jan-1999, Gordon Haverland <address@hidden> wrote:
| Maybe someone here can help me. One of my users wants to
| draw ellipses around the centroids of her clusters of data
| points from Principal Components Analysis. I can force
| everything to work as I expect, but I don't understand
| some of the why of what I am doing. I have been reading
| Numerical Recipes and Matrix Computation (3rd edition).
|
| So, PCA does an SVD of the data to find eigenvalues
| and eigenvectors of the data set and we ignore
| eigenvalues (and associated vectors) with values
| less than 1. And the combination of eigenvalue and
| eigenvector defines a hyper-ellipsoid. The eigenvalues are
| equal to the square root of the variances in the rotated
| coordinate system.
|
| The above is more or less definitions of PCA. When I go to
| generate the ellipse(s), it turns out that I have to use the
| square root of the eigenvalues in order to get ellipses of
| the correct order of magnitude. This I don't understand.
| Next, the ellipse for the vector x with 2-norm of 1 appears
| to contain far more than 68% of the data points. This may be
| due to the few points lying outside the ellipse being quite far
| outside, but is still puzzling. Last, if I want to plot
| ellipses of 75%, 90%, 95%, ..., what factors do I either
| multiply the eigenvalues (square roots of the eigenvalues)
| by (or the vector x)?
|
| The end result of this should be a octave script or function
| which will take the data and do a 2D plot of the 2 most
| significant components, along with the ellipse that goes
| along with the data points. I'll gladly donate said script
| to this archive.
Perhaps the following code will be of some help. It will take a 2x2
matrix and generate a set of points for plotting an ellipse, its major
and minor axes, and the bounding box that just encloses the ellipse.
The 2x2 matrix is one that defines a quadratic form:
[ x1 x2 ] [ a11 a12 ] [ x1 ]
[ a21 a22 ] [ x2 ]
= a11*x1^2 + (a12+a21)*x1*x2 + a22*x2^2
The level is the value of the this expression along the contour.
jwe
## Plotting an ellipse with an aspect ratio not equal to one can be
## mighty confusing.
gset size ratio 1;
## [x, y, major, minor, bbox] = ellipse (amat, level, n)
##
## Given a 2x2 matrix, generate ellipse data for plotting.
function [x, y, major, minor, bbox] = ellipse (amat, level, n)
if (nargin < 3)
n = 100;
endif
if (nargin > 1)
[v, l] = eig (amat / level);
dl = diag(l);
if (any (imag (dl)) || any (dl <= 0))
error ("ellipse: amat must be positive definite");
endif
## Generate contour data.
a = 1 / sqrt (l(1,1));
b = 1 / sqrt (l(2,2));
t = linspace (0, 2*pi, n)';
xt = a * cos (t);
yt = b * sin (t);
## Rotate the contours.
ra = atan2 (v(2,1), v(1,1));
cos_ra = cos (ra);
sin_ra = sin (ra);
x = xt * cos_ra - yt * sin_ra;
y = xt * sin_ra + yt * cos_ra;
## Endpoints of the major and minor axes.
major = minor = (v * diag ([a, b]))';
major (2,:) = -major (1,:);
minor (1,:) = -minor (2,:);
## Bounding box for the ellipse using magic formula.
ainv = inv (amat);
xbox = sqrt (level * ainv(1,1));
ybox = sqrt (level * ainv(2,2));
bbox = [xbox ybox; xbox -ybox; -xbox -ybox; -xbox ybox; xbox ybox];
else
usage ("ellipse (amat, level, n)");
endif
endfunction
t = rand (100,2);
amat = t' * t;
level = 100;
npts = 181;
[x, y, major, minor, bbox] = ellipse (amat, level, npts);
gset nokey
plot (minor(:,1), minor(:,2), "address@hidden", major(:,1),
major(:,2),"address@hidden",
bbox(:,1), bbox(:,2), "-g", x, y, "-b");