Re: Pre-Allocating Memory for Speedup
From: Jaroslav Hajek
Subject: Re: Pre-Allocating Memory for Speedup
Date: Fri, 23 Jul 2010 07:36:05 +0200
On Fri, Jul 23, 2010 at 6:58 AM, Manoj Deshpande
<address@hidden> wrote:
> My intention is to loop over text files of unknown size, each having rows of
> 16 columns, traversing each row and selectively appending to 12 different
> matrices (1 column wide), letting all the matrices grow. After parsing
> all files, I wish to do some operation over each of the matrices.
>
> Based on the feedback I received on this forum, I have the following:
>
> 1. Create cell arrays, each of a size defined by the file size (wc -l filename).
> 2. Read the file line by line, sscanf each line into an array of 16 values, do
> conditional checks and append to one of the matrices. (see below)
>
> Questions.
>
> 1. How do I optimize sscanf to read directly into an array (onerow), instead
> of the ugly workaround which I did?
> 2. The code below has a problem: the file size is 330,000 rows, and I
> pre-allocate 12 matrices, each zeros(330,000,1). I cannot APPEND to such a
> matrix, since the zeros take up space; I should really be tracking the row
> index and overwriting the zeros starting from index 1, right? In that case
> I will need a row iterator for each matrix (see row_index in code), so I
> end up with 12 iterators. The problem with this approach, though, is that
> I have 5 days to track, hence 5 files to go over, and then I append
> all the matrices, i.e. matrix 5 of day 1 with matrix 5 of day 2, and so
> on. I would end up having 60 different matrices and 60 different
> iterators. Is there a better approach that does not affect speed?
> 3. There is another unknown. At the end of parsing a file, not all matrices
> will be of the same size, i.e. it is not guaranteed that every row in the
> file will be useful, and hence each of the matrices may not necessarily have
> 330k rows; it could be fewer. I do not mind skipping the 5-day loop (thereby
> avoiding nested loops) and instead serially creating sets of 12 matrices, one
> set of 12 for each day, and finally appending all corresponding matrices, if
> we have a cleaner approach for a single day.
> 4. The 12 matrices are only 1 column wide, and I need to do some plots over
> the values, so maybe we don't necessarily need to use matrices? Arrays?
> Cells?
>
> Any suggestions are deeply appreciated. I am hoping the gurus will have
> code review comments.
>
> Thanks
> Manoj
>
>
Given that you know the number of columns, why don't you just load the
whole matrix at once, filter the rows, and then plot? 330000 rows of 16
doubles should be manageable (about 42 MB, so <50MB). Should there
be many more rows, you can proceed in chunks.
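fid = fopen (file_name);
# possibly skip header lines
# fscanf fills column-wise, so read 16 x N and transpose to N x 16
data = fscanf (fid, "%f", [16, Inf]).';
fclose (fid);
# interval check (interval_size and interval_num are assumed to be defined already)
interval = floor (mod (data(:,1), 86400) / interval_size);
# filter data by interval, keeping all 16 columns of the matching rows
data = data(interval == interval_num, :);
# mac address check
mac_address = data(:,2);
# etc... whatever
# now, say, plot 3rd column against 4th column:
plot (data(:,3), data(:,4))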
This style of work is generally much more efficient than writing
loops. This is Octave, not C.
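If a file really is too big to load in one go, the "proceed in chunks"
variant could look roughly like the sketch below. It is only an
illustration under assumptions: the demo file, the chunk size and the
values of interval_size and interval_num are made up, and column 1 is
assumed to hold a timestamp in seconds. Each chunk is filtered with a
logical mask and kept in a cell array, and everything is concatenated
once at the end, so no matrix has to grow row by row and no row
iterators are needed.

```octave
# Hypothetical demo file: 10 rows of 16 numeric columns,
# column 1 holds a timestamp in seconds.
file_name = tempname ();
demo = [(0:9).' * 1000, rand(10, 15)];
fid = fopen (file_name, "w");
fprintf (fid, [repmat("%g ", 1, 16), "\n"], demo.');
fclose (fid);

interval_size = 3000;   # assumed: seconds per interval
interval_num  = 1;      # assumed: the interval to keep
chunk_rows    = 4;      # assumed: rows per chunk (tiny here; use e.g. 1e5 for real files)

kept = {};              # one filtered block per chunk
fid = fopen (file_name, "r");
while (true)
  # read at most chunk_rows rows; fscanf fills column-wise, hence the transpose
  chunk = fscanf (fid, "%f", [16, chunk_rows]).';
  if (isempty (chunk))
    break;              # nothing left to read
  endif
  interval = floor (mod (chunk(:,1), 86400) / interval_size);
  # the logical mask selects whole rows; the block may have zero rows
  kept{end+1} = chunk(interval == interval_num, :);
endwhile
fclose (fid);
delete (file_name);

# concatenate once at the end instead of appending inside the loop
data = vertcat (kept{:});
```

The same pattern should extend to several days: filter each file into
its own block, collect the blocks in the cell array, and call vertcat
once after the last file. The result has however many rows passed the
filter, so the sizes do not need to be known in advance.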
hth
--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz