help-octave

Re: Pre-Allocating Memory for Speedup


From: Manoj Deshpande
Subject: Re: Pre-Allocating Memory for Speedup
Date: Tue, 27 Jul 2010 12:13:20 -0400

That approach works very well, Jaroslav; the row filtering gives a significant speedup. I plan to add a loop over the days, but read each day's data in one shot as you suggested.
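
For reference, that per-day loop with whole-file reads might look like this (a minimal sketch; the file naming scheme is an assumption):

all_data = [];
for day = 1:5
  fname = sprintf ("day%d.txt", day);   # hypothetical naming scheme
  fid = fopen (fname);
  d = fscanf (fid, "%f", [16, Inf]).';  # read the whole day at once
  fclose (fid);
  all_data = [all_data; d];             # stack this day's rows below the rest
endfor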

Thanks, CdeMills, for your suggestion too.

Regards,
Manoj

On Fri, Jul 23, 2010 at 1:36 AM, Jaroslav Hajek <address@hidden> wrote:
On Fri, Jul 23, 2010 at 6:58 AM, Manoj Deshpande
<address@hidden> wrote:
> My intention is to loop over text files of unknown size, each having rows of
> 16 columns, traversing each row and selectively appending to 12 different
> matrices (1 column wide), letting all the matrices grow. At the end of
> parsing all the files, I wish to do some operation over each of the matrices.
>
> Based on the feedback I received on this forum, I have the following:
>
> 1. Create cell arrays, each of a size determined by the file size (wc -l filename).
> 2. Read the file line by line, sscanf into an array of 16 values, do
> conditional checks, and append to one of the matrices (see below).
>
> Questions:
>
> 1. How do I optimize sscanf to read directly into an array (one row), instead
> of the ugly workaround I used?
> 2. The code below has a problem: the file is 330,000 rows, and I pre-allocate
> 12 matrices, each zeros(330000, 1). I cannot APPEND to such a matrix, since
> the zeros already take space; I should really be tracking the row index and
> overwriting the zeros starting from index 1, right? In that case I need a row
> iterator for each matrix (see row_index in the code), so I end up with 12
> iterators. The problem with this approach is that I have 5 days to track,
> hence 5 files to go over, and must then append the corresponding matrices
> (i.e. matrix 5 of day 1 with matrix 5 of day 2, and so on), so I would end up
> with 60 different matrices and 60 different iterators. Is there a better
> approach that does not hurt speed? (A sketch of the preallocate-and-trim
> pattern follows after this message.)
> 3. There is another unknown: at the end of parsing a file, the matrices will
> not all be the same size. It is not guaranteed that every row in the file is
> useful, so each matrix may end up shorter than 330k rows. I do not mind
> skipping the 5-day loop (thereby avoiding nested loops) and instead serially
> creating one set of 12 matrices per day, finally appending all corresponding
> matrices, if there is a cleaner approach for a single day.
> 4. The 12 matrices are only 1 column wide, and I need to do some plots over
> the values, so maybe we don't necessarily need matrices? Arrays? Cells?
>
> Any suggestions are deeply appreciated. I am hoping the gurus will have code
> review comments.
>
> Thanks
> Manoj
>
>
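
A minimal sketch of the preallocate-and-trim pattern that questions 2 and 3 describe, for a single file and a single output vector (the filter condition and column choices are assumptions for illustration):

fid = fopen (file_name);
nrows = 330000;                     # from wc -l, as described above
vals = zeros (nrows, 1);            # preallocate one of the 12 column vectors
row_index = 0;                      # last filled slot
line = fgetl (fid);
while (ischar (line))
  onerow = sscanf (line, "%f", 16); # question 1: sscanf fills a 16-element vector directly
  if (onerow(1) > 0)                # hypothetical filter condition
    row_index += 1;
    vals(row_index) = onerow(2);    # hypothetical column of interest
  endif
  line = fgetl (fid);
endwhile
fclose (fid);
vals = vals(1:row_index);           # question 3: trim the unused zeros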

Given that you know the number of columns, why don't you just load the
whole matrix at once, filter rows,
and then plot? 330000 rows should be manageable (<50MB). Should there
be many more rows, you can proceed in chunks.

fid = fopen (file_name);

# possibly skip header lines

data = fscanf (fid, "%f", [16, Inf]).';
fclose (fid);

# interval check

interval = floor (mod (data(:,1), 86400) / interval_size);

# filter data by interval

data = data(interval == interval_num, :);

# mac address check

mac_address = data(:,2);
# etc... whatever

# now, say, plot 3rd column against 4th column:

plot (data(:,3), data(:,4))
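
Should the file be too large to load at once, the chunked variant mentioned above might look like this (a sketch; chunk_rows is an assumed chunk size, and the interval variables are reused from the snippet above):

chunk_rows = 100000;               # assumed chunk size
filtered = [];
fid = fopen (file_name);
do
  data = fscanf (fid, "%f", [16, chunk_rows]).';  # read up to chunk_rows rows
  if (isempty (data))
    break;
  endif
  interval = floor (mod (data(:,1), 86400) / interval_size);
  filtered = [filtered; data(interval == interval_num, :)];  # keep matching rows
until (rows (data) < chunk_rows)
fclose (fid);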

This style of work is generally much more efficient than writing
loops. This is Octave, not C.

hth

--
RNDr. Jaroslav Hajek, PhD
computing expert & GNU Octave developer
Aeronautical Research and Test Institute (VZLU)
Prague, Czech Republic
url: www.highegg.matfyz.cz

