Re: Principal components analysis on a large dataset

From: Jaroslav Hajek
Subject: Re: Principal components analysis on a large dataset
Date: Sat, 22 Aug 2009 07:41:16 +0200
On Fri, Aug 21, 2009 at 2:46 AM, misha680 <address@hidden> wrote:
>
> Dear Sirs:
>
> Please pardon me, I am very new to Octave. I have been using MATLAB.
>
> I was wondering if Octave would allow me to do principal components analysis
> on a very large dataset.
>
> Specifically, our dataset has 68800 variables and around 6000 observations.
> MATLAB gives "out of memory" errors. I have also tried doing princomp in
> pieces, but this does not quite seem to work for our approach.
>
A real matrix of size 6000x68800 takes 3.3 GB of memory in double
precision (half that in single). To get the SVD of the full matrix via
LAPACK, you need at least twice that much, plus some workspace, so
unless you have 8 GB of memory, forget it.
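The arithmetic behind that figure, as a quick Octave check:

  6000 * 68800 * 8 / 1e9    % elements times 8 bytes per double: 3.3024 GB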
One option, if you're only interested in a few principal components,
is to use eigs with an implicit matrix (a function handle that applies
the operator for you); you need Octave linked with ARPACK for that.
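A minimal sketch of that route, assuming the data matrix X (6000x68800)
fits in memory and k is the number of components you want; both names
are illustrative:

  k = 10;                                   % components wanted
  Xc = X - repmat (mean (X), rows (X), 1);  % center each variable
  op = @(v) Xc' * (Xc * v);                 % implicit product with Xc'*Xc
  opts.issym = true;                        % the operator is symmetric
  [V, D] = eigs (op, columns (Xc), k, 'lm', opts);
  scores = Xc * V;                          % observations in the PC basis

The 68800x68800 covariance matrix is never formed; ARPACK only ever
requests matrix-vector products with it.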
Another possibility is to first perform an out-of-core
orthogonalization of the matrix, say, by doing a QR factorization of
one 100x68800 block at a time and using modified Gram-Schmidt to
complete the process. Then take the SVD of the triangular factor; the
SVD of a 6000x6000 matrix will still take some time, but it is
certainly doable even on a typical PC. Finally, pick the principal
components and transform them back to the original basis.
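A rough in-memory sketch of that pipeline (an out-of-core version would
read each block of X from disk and keep Q there as well; all names are
illustrative, and X should be centered beforehand):

  [n, p] = size (X);            % 6000 x 68800, observations x variables
  b = 100;                      % block size
  Q = zeros (p, n);             % orthonormal basis, built block by block
  R = zeros (n, n);             % triangular factor, X' = Q*R
  for j = 1:b:n
    cols = j : min (j+b-1, n);
    B = X(cols, :)';            % next block of columns of X'
    for i = 1:b:j-1             % modified Gram-Schmidt vs. earlier blocks
      prev = i : i+b-1;
      R(prev, cols) = Q(:, prev)' * B;
      B = B - Q(:, prev) * R(prev, cols);
    end
    [Q(:, cols), R(cols, cols)] = qr (B, 0);  % economy QR of the block
  end
  [U, S, W] = svd (R');         % X = R'*Q' = U*S*(Q*W)'
  V = Q * W;                    % principal directions in the original basis
  scores = U * S;               % component scores of the observations

For the top few components you would then keep only the leading columns
of V and scores.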
--
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz