
Re: Principal components analysis on a large dataset


From: Jaroslav Hajek
Subject: Re: Principal components analysis on a large dataset
Date: Sat, 22 Aug 2009 07:41:16 +0200

On Fri, Aug 21, 2009 at 2:46 AM, misha680 <address@hidden> wrote:
>
> Dear Sirs:
>
> Please pardon me, I am very new to Octave. I have been using MATLAB.
>
> I was wondering if Octave would allow me to do principal components
> analysis on a very large dataset.
>
> Specifically, our dataset has 68800 variables and around 6000 observations.
> MATLAB gives "out of memory" errors. I have also tried doing princomp in
> pieces, but this does not seem to quite work for our approach.
>

A real matrix of size 6000x68800 takes about 3.3 GB of memory in double
precision (half that in single). To get the SVD of the full matrix via
LAPACK, you need at least twice as much plus some workspace, so unless
you have 8 GB of memory, forget it.
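
For reference, the arithmetic behind those figures (a back-of-the-envelope,
assuming 8-byte IEEE doubles; the variable names are just for illustration):

    n_obs  = 6000;
    n_vars = 68800;
    bytes  = n_obs * n_vars * 8;              % 8 bytes per double element
    printf ("data matrix: %.2f GB double, %.2f GB single\n",
            bytes / 1e9, bytes / 2 / 1e9);
    % roughly 3.30 GB and 1.65 GB; LAPACK's SVD needs at least one more
    % copy of that, plus workspace, on top of the matrix itself.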

One option, if you're only interested in a few principal components,
is to use eigs with an implicit matrix; you need to have Octave linked
with ARPACK for that.
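
A minimal sketch of the eigs route (untested; Xc, covfun and k are names I
made up for illustration): hand eigs a function handle that applies the
covariance operator to a vector, so the 68800x68800 covariance matrix is
never formed. It still assumes the centered data matrix itself fits in
memory.

    % Xc: data with the column means removed, observations in rows (6000 x 68800)
    [m, p] = size (Xc);
    k = 10;                                    % number of components wanted
    opts.issym = 1;                            % the operator below is symmetric
    covfun = @(v) Xc' * (Xc * v) / (m - 1);    % covariance times vector, p x p never formed
    [V, D] = eigs (covfun, p, k, "lm", opts);  % V: p x k directions, diag (D): their variances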

Another possibility is to first perform an out-of-core
orthogonalization of the matrix, say, by doing QR factorizations on
100x68800 blocks and using modified Gram-Schmidt to complete the
process. Then take the SVD of the triangular factor; the SVD of a
6000x6000 matrix still takes some time, but is certainly doable even
on a typical PC. Finally, pick the principal components and transform
them back to the original basis.
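
A rough in-memory sketch of that blocked scheme (untested; X, Q, R, the
block size b and k are my names, and in a true out-of-core run the 100-row
blocks of X and the corresponding blocks of Q would live on disk rather
than in arrays):

    [m, p] = size (X);         % 6000 x 68800, columns (variables) assumed mean-centered
    b = 100;                   % block size: 100 observations at a time
    Q = zeros (p, m);          % orthonormal basis for the rows of X (on disk in a real run)
    R = zeros (m, m);          % small triangular factor, X' = Q * R
    for j = 1:b:m
      idx = j:min (j + b - 1, m);
      Aj = X(idx, :)';                          % next block of rows, transposed to p x b
      for i = 1:b:j-1                           % block modified Gram-Schmidt against earlier blocks
        pidx = i:min (i + b - 1, m);
        R(pidx, idx) = Q(:, pidx)' * Aj;
        Aj = Aj - Q(:, pidx) * R(pidx, idx);
      endfor
      [Q(:, idx), R(idx, idx)] = qr (Aj, 0);    % economy QR of what is left of the block
    endfor
    % X = R' * Q', so the SVD of the small m x m factor gives the PCA:
    [Ur, S, Vr] = svd (R);
    k = 10;
    pc_dirs   = Q * Ur(:, 1:k);                 % principal directions, back in the original basis
    pc_scores = Vr(:, 1:k) * S(1:k, 1:k);       % scores of each observation on those directions

Keeping only the first k columns of Ur means the final "transform back"
multiplication is just p x k, so it can also be done block by block.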

-- 
RNDr. Jaroslav Hajek
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

