octave-maintainers

Re: New importdata function testing


From: Philip Nienhuis
Subject: Re: New importdata function testing
Date: Mon, 22 Oct 2012 09:34:20 -0700 (PDT)

Rik-4 wrote
> On 10/22/2012 05:51 AM, Jordi Gutiérrez Hermoso wrote:
>> On 21 October 2012 12:07, Rik <rik@> wrote:
>>> 10/20/12
>>>
>>> Erik,
>>>
>>> I did just a small test with importdata and it doesn't seem to work as
>>> expected.
>>>
>>> For a file, I used import.tst containing
>>>
>>> 1,2,3
>>> 4,5,6
>>>
>>> And then in Octave, I used
>>> importdata ('import.tst', ',')
>>> warning: unrecognized escape sequence '\S' -- converting to 'S'
>> Oops, my bad:
>>
>>      http://hg.savannah.gnu.org/hgweb/octave/rev/9a455cf96dbe#l2.365
>>
>>> I am also concerned that the implementation reads the entire file into a
>>> string and then uses a number of for loops and regexp which will be slow
>>> in Octave.  I did a benchmark with the following:
>>>
>>> x = rand (1e4, 10);
>>> dlmwrite ('tst.csv', x, ',')
>>> tic; y = dlmread ('tst.csv', ','); toc
>>> Elapsed time is 0.209933 seconds.
>>> tic; y = importdata ('tst2.csv', ','); toc
>>> Elapsed time is 3.2 seconds.
>>>
>>> I believe it would be faster to have importdata check the header lines
>>> only and then pass off the work to dlmread if possible.  dlmread is
>>> written in C++ and, per the benchmarking above, is very fast.
>> It would be preferable if we could write some minimum common subset
>> of this family of functions as a C++ function and leave the rest in
>> m-file language. I consider writing code in C++ a last resort for
>> optimisation at the very high cost of making the code less
>> understandable for most people. Many of our users are scared by C++,
>> but any Octave user understands the m-file language.
> I think that is why my proposal would make sense.  The parsing of the
> header lines could be done with an m-file script because there won't be
> much work to do there, and then reading could be passed off to dlmread
> which is already a core Octave and Matlab function.  I don't propose
> writing any more C++ if it can be avoided.  On that note, there has been
> talk of having a C++ version of textscan.  When that is done a number of
> these functions could switch to relying on that function because it is the
> most general and can accept mixed numeric and text data.

Well, if mixed data processing is needed (as it seems to be, looking at
importdata.m), I'd again suggest looking at moving csv2cell, cell2csv,
csvexplode and csvconcat from the io package to core. They are written in
C++, handle mixed data well, and are reasonably well commented.

For importdata.m, just have an m-file wrapper that skips the headers, reads
the rest with csv2cell, and post-processes the result.

In textscan.m and textread.m (core) there's ready-made, tested code for
skipping header lines.
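
To make the wrapper idea concrete, here is a rough sketch. It is not the
actual importdata.m: the name import_sketch and the "first field is not a
number" header test are my own assumptions, and it hands the body to dlmread
(core), so it only covers purely numeric data. For mixed data, csv2cell from
the io package would take dlmread's place.

```octave
1;  % marks this as a script file so the function can be defined inline

% Hypothetical helper, for illustration only.
function [data, header] = import_sketch (fname, delim)
  fid = fopen (fname, "r");
  if (fid < 0)
    error ("import_sketch: cannot open %s", fname);
  endif
  header = {};
  while (true)
    txt = fgetl (fid);
    if (! ischar (txt))
      break;                      % end of file reached
    endif
    first = strtrim (strtok (txt, delim));
    if (! isnan (str2double (first)))
      break;                      % first line starting with a number
    endif
    header{end+1} = txt;          % assumed header line, kept verbatim
  endwhile
  fclose (fid);
  % dlmread (file, sep, r0, c0) takes zero-based row/column offsets,
  % so numel (header) skips exactly the header lines found above.
  data = dlmread (fname, delim, numel (header), 0);
endfunction

% Small demo on a temporary file with one header line.
tmp = [tempname() ".csv"];
fid = fopen (tmp, "w");
fputs (fid, "a,b,c\n1,2,3\n4,5,6\n");
fclose (fid);
[data, header] = import_sketch (tmp, ",");
unlink (tmp);
```

Only the header scan runs in m-code, so the cost of the slow part stays
proportional to the number of header lines rather than the file size.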

In the io package there's also parsecell.m, which separates numeric and
string data from a mixed-type cell array (as is usually returned by
spreadsheet code), crops empty outer rows/columns from the results, and tells
you which enclosing rectangles in the raw cell array the numeric and string
arrays came from. It may not be perfect (it uses while loops, etc.), but as
it stands it's not that bad.
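
For readers who don't know parsecell.m, the kind of separation described
above can be sketched with core functions only. This is not parsecell's
code or interface; the variable names and the [rows; columns] layout of the
limits are assumptions made for the example, and it assumes the input has at
least one numeric and one text cell.

```octave
% Mixed-type cell array, as a spreadsheet or CSV reader might return it.
raw = {"name", "x", "y"; "p1", 1, 2; "p2", 3, 4};

% Logical maps of non-empty numeric cells and non-empty text cells.
filled = ! cellfun (@isempty, raw);
isnum  = cellfun (@isnumeric, raw) & filled;
istxt  = cellfun (@ischar, raw) & filled;

% Smallest rectangle enclosing all numeric cells: [rows; columns].
[r, c] = find (isnum);
numlim = [min(r), max(r); min(c), max(c)];
rows = numlim(1,1):numlim(1,2);
cols = numlim(2,1):numlim(2,2);

% Numeric array over that rectangle, NaN where a cell is not numeric.
block  = raw(rows, cols);
mask   = isnum(rows, cols);
numarr = NaN (size (block));
numarr(mask) = [block{mask}];

% Same for text: crop to the enclosing rectangle, blank the other cells.
[r, c] = find (istxt);
txtlim = [min(r), max(r); min(c), max(c)];
txtarr = raw(txtlim(1,1):txtlim(1,2), txtlim(2,1):txtlim(2,2));
txtarr(! cellfun (@ischar, txtarr)) = {""};
```

Here numlim and txtlim play the role of the "enclosing rectangles": they
record where in raw each cropped array came from.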

So, all the basic building blocks are present.

I can't give it priority right now, but I'd be willing to look at it
by the end of November.

Philip





