Re: Speed of odsread


From: siko1056
Subject: Re: Speed of odsread
Date: Fri, 22 Sep 2017 01:57:14 -0700 (MST)

hjborsje wrote
> I'm comparing the speed of loading a LibreOffice (*.ods) spreadsheet in
> Octave and MATLAB.  The results are so vastly different that I wonder if
> I'm doing it correctly.  A spreadsheet with 1900 rows, 23 columns takes
> 135 seconds in Octave. In Matlab 5.5 seconds.  A larger spreadsheet of
> 60,000 x 23 takes 10.5 seconds in Matlab. Octave did not finish after 10
> minutes.  I run Octave 4.2.1 with all the latest packages on Windows 10.
> I enter: [s,~,~,~] = odsload ('MySpreadsheet.ods',1);  Adding a third and
> fourth parameters makes no significant difference. RData_01b.ods
> <http://octave.1599824.n4.nabble.com/file/t372538/RData_01b.ods>

Are you using this code [1] for your benchmark?  Java is not as native to
Octave as it is to Matlab (whose whole GUI is Java), so any call into Java
adds overhead.  I tried the odsread function from the io package [2]
instead.  The results are far better than yours, but still about a factor
of 2 slower than Matlab.  To use the io package, just type:

>> pkg install -forge io
>> pkg load io
>> javaaddpath ("jOpenDocument-1.3.jar")          # Available from [3]
>> javaaddpath ("xerces-2_11_0/xercesImpl.jar") # Available from [4]

>> N = 10; t = 0; for i = 1:N, tic; X = odsread ('RData_01b.ods'); t = t + toc; end
>> fprintf ('odsread: avg. %f seconds\n', t/N)

odsread: avg. 10.013385 seconds

As documented in [2], when reading repeatedly it saves some time to open
the document once and only repeat the read itself, but the gain is modest:

>> ods = odsopen ('RData_01b.ods');
>> N = 10; t = 0; for i = 1:N, tic; X = ods2oct (ods); t = t + toc; end
>> fprintf ('ods2oct: avg. %f seconds\n', t/N), ods = odsclose (ods);

ods2oct: avg. 7.611418 seconds
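
The timing loops in this message all follow the same pattern, so they
could be factored into a small helper.  What follows is only a sketch:
avgtime is a made-up name, not a function from Octave or the io package,
and it measures wall-clock time with tic/toc just like the loops above.

```octave
% Hypothetical helper (not part of Octave or the io package):
% call the function handle f N times and return the mean wall-clock
% time of one call in seconds.
function t_avg = avgtime (f, N)
  t = zeros (N, 1);
  for i = 1:N
    tic;
    f ();          % result is discarded, only the elapsed time matters
    t(i) = toc;
  end
  t_avg = mean (t);
end
```

With the io package loaded, it could be called like this:

>> fprintf ('odsread: avg. %f seconds\n', avgtime (@() odsread ('RData_01b.ods'), 10))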


Anyway, when dealing with a huge amount of data, I think an Excel or ODS
spreadsheet is like running a marathon in knight's armor: unnecessary
overhead.  Simply save your data from your application of choice as
comma-separated values (CSV) and see the magic, which works even far beyond
60,000 data rows:

>> N = 10; t = 0; for i = 1:N, tic; X = csvread ('RData_01b.csv'); t = t + toc; end
>> fprintf ('csvread: avg. %f seconds\n', t/N)

Matlab R2017a: csvread: avg. 0.014881 seconds
Octave 4.2.1: csvread: avg. 0.020552 seconds
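
If the data is already in Octave, the CSV file can also be written from
there.  The sketch below is an illustration, not a measurement: it uses a
random 60,000 x 23 matrix as a stand-in for the spreadsheet, a temporary
file name, and an assumed precision of 15 significant digits.  With the io
package loaded, and assuming a purely numeric sheet, X could instead come
from a single odsread call.

```octave
% Stand-in data with the dimensions mentioned in the original question.
% With the io package one could use X = odsread ('RData_01b.ods') here,
% assuming the sheet contains only numbers.
X = rand (60000, 23);

% Write once as CSV; '%.15g' is an assumed precision that keeps
% roughly 15 significant digits per value.
csvfile = [tempname() '.csv'];
dlmwrite (csvfile, X, 'delimiter', ',', 'precision', '%.15g');

% Every later run only needs the fast CSV read.
tic; Y = csvread (csvfile); t = toc;
fprintf ('csvread: %f seconds for %d x %d values\n', t, size (Y, 1), size (Y, 2));

delete (csvfile);   % clean up the temporary file
```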

Kai.

[1]: https://www.mathworks.com/matlabcentral/fileexchange/28411-read-and-write-open-document-format--odf--spreadsheet---ods-
[2]: https://octave.sourceforge.io/io/function/odsread.html
[3]: http://www.jopendocument.org/downloads.html
[4]: https://xerces.apache.org/mirrors.cgi#binary

RData_01b.csv
<http://octave.1599824.n4.nabble.com/file/t370282/RData_01b.csv>  





