Re: Import large field-delimited file with strings and numbers


From: Markus Bergholz
Subject: Re: Import large field-delimited file with strings and numbers
Date: Mon, 8 Sep 2014 22:16:49 +0200



On Mon, Sep 8, 2014 at 10:14 PM, Markus Bergholz <address@hidden> wrote:


On Mon, Sep 8, 2014 at 9:44 PM, Joao Rodrigues <address@hidden> wrote:
On 09/08/2014 08:27 PM, Markus Bergholz wrote:

Bottom line: I think the problem comes from the way Octave allocates memory for cell arrays, which is not very efficient (as opposed to dense or sparse numerical data, which it handles very well).
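
For anyone who wants to see that overhead directly, here is a minimal sketch (illustrative only, not from the original mail) comparing what Octave reports for the same numbers stored as a dense column and as a cell array:

% Sketch: compare reported memory for a dense column vs. a cell array.
n = 1e6;
v = rand(n, 1);        % dense numeric column, roughly 8 bytes per element
c = num2cell(v);       % the same values, one cell per element
whos v c               % the cell array typically reports several times more bytes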

I managed to solve the problem I had, thanks to the help of you guys.

However, I think it would be nice if future versions of Octave shipped with something akin to ulimit enabled by default, to prevent a single process from eating up all available memory.

If someone wants to check this issue the data I am working with is public:

http://www.bls.gov/cew/data/files/*/csv/*_annual_singlefile.zip

where * = 1990:2013
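
For reference, one possible way to fetch those files from inside Octave (a sketch; it assumes urlwrite and unzip are available and simply expands the URL pattern above):

% Sketch: download and unpack one annual file per year.
for y = 1990:2013
  f = sprintf('%d_annual_singlefile.zip', y);
  urlwrite(sprintf('http://www.bls.gov/cew/data/files/%d/csv/%s', y, f), f);
  unzip(f);
end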


nvm, got it.
which columns do you need?
Hi. As I said above, I already solved the problem (with your help).

I just posted the link so that anyone interested can reproduce the memory overload problem.

(But the data I need to extract is in columns 1-3 and 8-11.)
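
In case it helps anyone reading later, here is one hedged way to pull just those columns with core Octave's textscan (a sketch; the exact format string depends on the file's layout and quoting):

% Sketch: read only columns 1-3 (strings) and 8-11 (numbers), skip the rest.
fid = fopen('2013.annual.singlefile.csv', 'r');
fgetl(fid);                               % skip the header line
fmt = '%s %s %s %*s %*s %*s %*s %f %f %f %f %*[^\n]';
C = textscan(fid, fmt, 'Delimiter', ',');
fclose(fid);
% C{1}..C{3} are cellstr columns; [C{4:7}] gives the numeric block.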

octave:17> joao
all done, lua stack top 0
all done, lua stack top 0
all done, lua stack top 0
all done, lua stack top 0
all done, lua stack top 0
all done, lua stack top 0
all done, lua stack top 0
Elapsed time is 50.5821 seconds.
octave:18> memory

 Memory used by Octave:   277.438 MB
 Physical Memory (RAM): 7929.52 MB

octave:19> mean(mean(m))
ans =       9605668.464203


octave:21> m(1:10,:)
ans =

                     0                     1                     0                     0                  1196                 54587            4050257902
                     0                     1                     0                     0                  1196                 54587            4050257902
                     0                     1                     0                     0                   587                 11537             694789102
                     0                     1                     0                     0                     2                    13                437863
                     0                     1                     0                     0                    17                   154              11894235
                     0                     1                     0                     0                    46                  1760             144453410
                     0                     1                     0                     0                    32                  6245             406490963
                     0                     1                     0                     0                    26                   862              16926869
                     0                     1                     0                     0                     3                   566              49683552
                     0                     1                     0                     0                   484                 33451            2725581908



% preallocate a dense numeric matrix: one row per csv record, 7 columns of interest
m = ones(3565137, 7);
% csv columns to extract
index = [1 2 3 8 9 10 11];
tic
for n = 1:length(index)
  % read one column at a time via the experimental lua 'colaccess' helper
  m(:,n) = lua('colaccess', '2013.annual.singlefile.csv', index(n), 2)';
end
toc


and the lua script is not optimized... it should be possible to tune it to run in under 30 seconds.
but it's experimental, and I haven't verified the result :P

yeah, columns 1 and 3 are wrong because they are strings in the csv file... that should be easy to handle.
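
One hedged way to deal with those two string columns (a sketch, not the lua fix hinted at above): map each distinct string to an integer category code, keeping the label list to decode later. Here c1 stands for a cellstr column read by whatever means:

% Sketch: turn a string column into integer category codes.
[labels, ~, codes] = unique(c1);   % labels(codes) reproduces the original strings
m(:, 1) = codes;                   % integer codes fit in the dense matrix m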

Many thanks
j







--
icq: 167498924
XMPP|Jabber: address@hidden
