Principal Components Analysis

help-octave

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Principal Components Analysis

From:	John W. Eaton
Subject:	Principal Components Analysis
Date:	Wed, 27 Jan 1999 15:31:02 -0600 (CST)

On 27-Jan-1999, Gordon Haverland <address@hidden> wrote:

|   Maybe someone here can help me.  One of my users wants to
| draw ellipses around the centroids of her clusters of data
| points from Principal Components Analysis.  I can force
| everything to work as I expect, but I don't understand
| some of the why of what I am doing.  I have been reading
| Numerical Recipes and Matrix Computation (3rd edition).
| 
|   So, PCA does an SVD of the data to find eigenvalues
| and eigenvectors of the data set and we ignore 
| eigenvalues (and associated vectors) with values 
| less than 1.  And the combination of eigenvalue and
| eigenvector defines a hyper-ellipsoid.  The eigenvalues are
| equal to the square root of the variances in the rotated
| coordinate system.
| 
|   The above is more or less definitions of PCA.  When I go to
| generate the ellipse(s), it turns out that I have to use the
| square root of the eigenvalues in order to get ellipses of
| the correct order of magnitude.  This I don't understand.
| Next, the ellipse for the vector x with 2-norm of 1 appears
| to contain far more than 68% of the data points.  This may be
| due to the few points lying outside the ellipse being quite far
| outside, but is still puzzling.  Last, if I want to plot
| ellipses of 75%, 90%, 95%, ..., what factors do I either
| multiply the eigenvalues (square roots of the eigenvalues)
| by (or the vector x)?
| 
|   The end result of this should be a octave script or function
| which will take the data and do a 2D plot of the 2 most
| significant components, along with the ellipse that goes
| along with the data points.  I'll gladly donate said script
| to this archive.

Perhaps the following code will be of some help.  It will take a 2x2
matrix and generate a set of points for plotting an ellipse, its major
and minor axes, and the bounding box that just encloses the ellipse.

The 2x2 matrix is one that defines a quadratic form:

  [ x1 x2 ] [ a11 a12 ] [ x1 ]
            [ a21 a22 ] [ x2 ]

    = a11*x1^2 + (a12+a21)*x1*x2 + a22*x2^2

The level is the value of the this expression along the contour.

jwe


## Plotting an ellipse with an aspect ratio not equal to one can be
## mighty confusing.

gset size ratio 1;

## [x, y, major, minor, bbox] = ellipse (amat, level, n)
##
## Given a 2x2 matrix, generate ellipse data for plotting.

function [x, y, major, minor, bbox] = ellipse (amat, level, n)

  if (nargin < 3)
    n = 100;
  endif

  if (nargin > 1)

    [v, l] = eig (amat / level);

    dl = diag(l);
    if (any (imag (dl)) || any (dl <= 0))
      error ("ellipse: amat must be positive definite");
    endif

    ## Generate contour data.

    a = 1 / sqrt (l(1,1));
    b = 1 / sqrt (l(2,2));

    t = linspace (0, 2*pi, n)';

    xt = a * cos (t);
    yt = b * sin (t);

    ## Rotate the contours.

    ra = atan2 (v(2,1), v(1,1));

    cos_ra = cos (ra);
    sin_ra = sin (ra);

    x = xt * cos_ra - yt * sin_ra;
    y = xt * sin_ra + yt * cos_ra;

    ## Endpoints of the major and minor axes.

    major = minor = (v * diag ([a, b]))';

    major (2,:) = -major (1,:);
    minor (1,:) = -minor (2,:);

    ## Bounding box for the ellipse using magic formula.

    ainv = inv (amat);
    xbox = sqrt (level * ainv(1,1));
    ybox = sqrt (level * ainv(2,2));

    bbox = [xbox ybox; xbox -ybox; -xbox -ybox; -xbox ybox; xbox ybox];

  else
    usage ("ellipse (amat, level, n)");
  endif

endfunction

t = rand (100,2);

amat = t' * t;

level = 100;
npts = 181;

[x, y, major, minor, bbox] = ellipse (amat, level, npts);

gset nokey

plot (minor(:,1), minor(:,2), "address@hidden", major(:,1), 
major(:,2),"address@hidden",
      bbox(:,1), bbox(:,2), "-g", x, y, "-b");

[Prev in Thread]

Current Thread

[Next in Thread]

FW: save structs, lists, A. Scottedward Hodel, 1999/01/13
- Re: FW: save structs, lists, Joao Cardoso, 1999/01/13
  - Principal Components Analysis, Gordon Haverland, 1999/01/27
    - Principal Components Analysis, John W. Eaton <=
    - Re: Principal Components Analysis, Rafael Laboissiere, 1999/01/28

Prev by Date: Principal Components Analysis
Next by Date: Pretty gnuplot plots
Previous by thread: Principal Components Analysis
Next by thread: Re: Principal Components Analysis
Index(es):
- Date
- Thread