help-octave

Re: Why is sscanf() so slow?


From: Dirk Eddelbuettel
Subject: Re: Why is sscanf() so slow?
Date: Sat, 25 Jan 2003 15:24:14 -0600
User-agent: Mutt/1.3.28i

On Sat, Jan 25, 2003 at 02:45:39PM -0600, stefan wrote:
> Dear Octavers,
> 
> first of all:  thanks for the great software.  I have been using it for about
> two years, especially for displaying and adjusting measured data (it comes
> from a measurement bus system into the computer).  The software that drives
> these devices almost always produces tab- or comma-separated data.
> 
> For this I do:
[...]
> or very likely...
> 
> For more than 1000 lines this takes *ages*.  Is there a better way to do
> this, or is the implementation in Octave just slow?  I would like to see
> this improved some day.

Please see below for the function aload.m from the octave-ci collection by
Kurt Hornik et al.; I used to use it a lot.  It pre-processes the data
first (dropping comment lines and selecting columns with egrep, sed and
awk) and then reads everything with a single ordinary load (in ASCII mode).
I never quite figured out why JWE (John W. Eaton) didn't include it in
Octave itself when he chose to include other octave-ci functions.  Anyway,
there is a Debian package of octave-ci, and a tarball is available from
Vienna, Austria.  Paul Kienzle also has something similar in octave-forge.

Hope this helps,  Dirk
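If the shell tools are not at hand, the same "parse everything at once, not line by line" idea can be sketched in plain Octave.  This is only a sketch and not part of aload or octave-ci: it assumes a reasonably recent Octave (fileread, strsplit, strjoin), comma-separated real-valued data, and a made-up sample file.  Recent Octave versions also ship dlmread for this kind of file.

```octave
## Sketch: read the whole file at once and parse every number with a
## single sscanf call, instead of one sscanf call per line.  The sample
## file is hypothetical; comma-separated, real-valued data is assumed.
fname = [tempname(), ".csv"];
fid = fopen (fname, "w");
fprintf (fid, "# comment line\n1,2,3\n4,5,6\n\n7,8,9\n");
fclose (fid);

## Split the file into lines (one read from disk).
txtlines = strsplit (fileread (fname), "\n");

## Drop blank lines and lines whose first non-blank character is # or %.
is_blank   = cellfun (@(l) isempty (strtrim (l)), txtlines);
is_comment = cellfun (@(l) ! isempty (regexp (l, '^[ \t]*(#|%)', "once")), ...
                      txtlines);
txtlines = txtlines(! (is_blank | is_comment));

## Turn the field separator into whitespace, which sscanf skips.
txtlines = cellfun (@(l) strrep (l, ",", " "), txtlines, ...
                    "UniformOutput", false);

## Column count from the first data line; then ONE sscanf for all lines.
ncols = numel (sscanf (txtlines{1}, "%f"));
v = sscanf (strjoin (txtlines, "\n"), "%f");
x = reshape (v, ncols, []).';    # one row per data line

delete (fname);
```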


## Copyright (C) 1996, 1997  Kurt Hornik
## 
## This program is free software; you can redistribute it and/or modify
## it under the terms of the GNU General Public License as published by
## the Free Software Foundation; either version 2, or (at your option)
## any later version.
## 
## This program is distributed in the hope that it will be useful, but
## WITHOUT ANY WARRANTY; without even the implied warranty of
## MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
## General Public License for more details. 
## 
## You should have received a copy of the GNU General Public License
## along with this file.  If not, write to the Free Software Foundation,
## 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.

## x = aload (filename [, cw [, rw [, FS [, NA [, ignore_regexp]]]]])
## loads the flat ASCII data file `filename' into x.
##
## With the optional parameters cw and rw one can select the data
## columns (variables) and rows (observations) to load.  Both cw and rw
## may be index vectors or Inf (default), meaning to load everything.
##
## With FS, one can specify the field separator in the data file as one
## would do in AWK.  Default is " ".
##
## With NA, one can specify how unavailable data are represented in the
## data file, and how they should be loaded into Octave.  The default is
## "NA/NaN", meaning that NA's should be converted to NaN's.  (Note that
## this does not work yet.)
##
## Finally, ignore_regexp is an egrep regular expression specifying
## which lines in the data file should be ignored.  The default is
## "^[ \t]*(#|%|$)", meaning that blank lines and lines where # or % are
## the first non-whitespace characters are ignored.
##
## Note that rw selects the data line (observation) numbers and NOT the
## line numbers in the file!
##
## Note also that currently, only real numbers can be loaded.
  
## Author:  KH <address@hidden>
## Description:  Load from a flat ASCII data file

function x = aload (filename, cw, rw, FS, NA, ignore_regexp)

  if ((nargin < 1) || (nargin > 6))
    print_usage ();
  endif

  if (nargin < 6)
    ignore_regexp = "^[ \t]*(#|%|$)";
  endif
  if (nargin < 5)
    NA = "NA/NaN";
  endif
  if (nargin < 4)
    FS = " ";
  endif
  if (nargin < 3)
    rw = Inf;
  endif
  if (nargin < 2)
    cw = Inf;
  endif

  ## maybe_do_more_sanity_checks ();  

  if (! isstruct (stat (filename)))
    error ("aload:  File '%s' not found", filename);
  endif

  tmpfile = tempname ();

  system (["cat ", filename, " | ", ...
           "egrep -ve \'", ignore_regexp, "\' | ", ...
           "sed -e 's/", NA, "/g' > ", tmpfile]);

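  ## Count records (rows) and fields (columns) with awk; its output is
  ## Octave code of the form "rf = ...; cf = ...;", which is then evaluated.
  [status, out] = system (["cat ", tmpfile, " | ", ...
                           "awk 'BEGIN { FS = \"", FS, "\" }; ", ...
                           "END { printf \"rf = %g; cf = %g;\", NR, NF }'"]);
  eval (out);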

  if (cw == Inf)
    cw = 1 : cf;
  elseif (min (size (cw)) == 1)
    cw = cw (find (cw <= cf));
  else
    error ("aload:  cw must be a scalar or a vector");
  endif

  if (rw == Inf)
    rw = 1 : rf;
  elseif (min (size (rw)) == 1)
    rw = rw (find (rw <= rf));
  else
    error ("aload:  rw must be a scalar or a vector");
  endif

  loadfile = tempname ();

  fd = fopen (loadfile, "w");
  fprintf (fd, "# name: x\n# type: matrix\n");
  fprintf (fd, "# rows: %g\n# columns: %g\n", length (rw), length (cw));
  fclose (fd);
  
  ## Build the awk field list, e.g. "$1, $3" for cw = [1 3].
  s = sprintf ("$%d, ", cw);
  s = s(1:end-2);

  system (["cat ", tmpfile, " | ", ...
           "awk 'BEGIN { FS = \"", FS, "\" }; { print ", s, " };' ", ...
           " >> ", loadfile]);

  load ("-text", loadfile);

  x = x(rw, :);

  system (sprintf ("rm -f %s %s", tmpfile, loadfile));
  
endfunction
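To make the three shell stages of aload easier to follow, here is a rough pure-Octave emulation of what they do with ignore_regexp, NA, FS, cw and rw on a few in-memory lines.  It is illustrative only and not part of octave-ci: the input lines are invented, FS is set to a comma, and NaN parsing is done with str2double rather than load.

```octave
## Sketch: what aload's egrep / sed / awk stages do to a few lines of
## input, redone in plain Octave.  The input lines are hypothetical; the
## regexp and the NA -> NaN rule are aload's defaults, FS is set to ",".
ignore_regexp = '^[ \t]*(#|%|$)';
FS = ",";
cw = [1 2];    # columns (variables) to keep
rw = [2 3];    # data lines (observations) to keep, NOT file line numbers

raw = {"# header", "1,2,3", "  % note", "4,NA,6", "   ", "7,8,9"};

## egrep -v: drop every line that matches ignore_regexp.
keep = cellfun (@(l) isempty (regexp (l, ignore_regexp, "once")), raw);
data = raw(keep);    # {"1,2,3", "4,NA,6", "7,8,9"}

## sed 's/NA/NaN/g': spell unavailable values as NaN.
data = cellfun (@(l) strrep (l, "NA", "NaN"), data, "UniformOutput", false);

## awk with FS: split each line into fields and convert them to numbers.
rows_c = cellfun (@(l) str2double (strsplit (l, FS)), data(:), ...
                  "UniformOutput", false);
x = cell2mat (rows_c);    # [1 2 3; 4 NaN 6; 7 8 9]

## Column and row selection, as done with cw and rw in aload.
x = x(rw, cw);
```

Note that rw indexes the three surviving data lines, so rw = [2 3] picks "4,NA,6" and "7,8,9" even though they are lines 4 and 6 of the input.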



> For now I do 'save -mat-binary %s values' to keep loading times short next
> time.  Some cache algo around the above code.
> 
> Any help is appreciated,
>       address@hidden
> 
> 
> 
> -------------------------------------------------------------
> Octave is freely available under the terms of the GNU GPL.
> 
> Octave's home on the web:  http://www.octave.org
> How to fund new projects:  http://www.octave.org/funding.html
> Subscription information:  http://www.octave.org/archive.html
> -------------------------------------------------------------
> 
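The cache mentioned above could look roughly like this: re-parse the text file only when it is newer than its binary copy.  A hedged sketch only; the file names and contents are hypothetical, and load ("-ascii", ...) merely stands in for whatever slow parser is being cached.

```octave
## Sketch of a simple parse cache: re-read the slow text file only when it
## is newer than a binary copy saved with "save -mat-binary".  File names
## and contents are hypothetical.
src = [tempname(), ".txt"];
cache = [src, ".mat"];
fid = fopen (src, "w");
fprintf (fid, "1 2\n3 4\n");
fclose (fid);

src_info = stat (src);
cache_info = stat (cache);    # empty when the cache does not exist yet
if (isempty (cache_info) || cache_info.mtime < src_info.mtime)
  values = load ("-ascii", src);           # slow path: parse the text
  save ("-mat-binary", cache, "values");   # keep a binary copy for next time
else
  load (cache, "values");                  # fast path: binary load
endif

delete (src);
delete (cache);
```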

-- 
Prediction is very difficult, especially about the future. 
                                             -- Niels Bohr