[Octave-bug-tracker] [bug #58345] Emit error and don't parse scripts > 1GB in size
From: Rik
Subject: [Octave-bug-tracker] [bug #58345] Emit error and don't parse scripts > 1GB in size
Date: Tue, 12 May 2020 23:11:35 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; WOW64; Trident/7.0; rv:11.0) like Gecko
Follow-up Comment #5, bug #58345 (project octave):
In previous versions of Octave textscan was an m-file, and hence slow. In
newer versions textscan is written in C++ and might be fast enough for your
needs. Similarly, fgetl is written in C++, but generally it is wrapped in a
for loop in an m-file and loops in interpreted languages are slow (unless
there is a JIT compiler).
I would look at storing the data in a uniform format so that you could take
advantage of some of the routines written in C++.
As an example, consider 2-D matrices. If you write a text file that contains
only numbers separated by spaces, you can use 'load' to have the interpreter
read in all the values.
For example, I made this file and called it A.var (attached to the bug report
as well).
1 3.1415 2
-1e2 10 2.718
I can then do
octave:1> load A.var
octave:2> A
A =

      1.0000      3.1415      2.0000
   -100.0000     10.0000      2.7180

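The same round trip can be tried without the attachment. The following is a minimal sketch, assuming a throwaway file name from tempname() (a stand-in for A.var) and calling load with an output argument so that the variable name does not depend on the file name:

```octave
% Write the two example rows to a temporary text file (hypothetical name).
fname = [tempname() ".var"];
fid = fopen (fname, "w");
fprintf (fid, "1 3.1415 2\n-1e2 10 2.718\n");
fclose (fid);

% Read the file back with the two C++ routines discussed here.
A1 = load (fname);           % with an output argument, load returns the matrix
A2 = dlmread (fname, " ");   % same data, space as the separator

unlink (fname);              % remove the temporary file
disp (isequal (A1, A2))
```

Both calls should yield the same 2x3 matrix; only the time taken differs once the file gets large.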
However, it turns out that even load() is ~3X slower than dlmread().
To test, I first created an array of 1GB in size
octave:13> sz = ceil (sqrt (1e9/8));
octave:14> x = rand (sz, sz);
octave:15> whos x
Variables visible from the current scope:

variables in scope: top scope

  Attr   Name        Size                     Bytes  Class
  ====   ====        ====                     =====  =====
         x       11181x11181             1000118088  double

Then I wrote it out to a space-separated file.
octave:18> dlmwrite ('x.var', x, ' ');
The resulting file is 2.3GB in size. Next, I tried reading it back in with
dlmread.
octave:23> tic; x = dlmread ('x.var', ' '); toc
Elapsed time is 107.864 seconds.
That's okay, I guess. The result was worse for a straight load.
octave:26> tic; load ('x.var'); toc
Elapsed time is 342.51 seconds.
Even with textscan now written in C++, it is far too slow.
octave:41> tic; c = textscan (fid, fmt); toc
Elapsed time is 1911.16 seconds.
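The session above does not show how fid and fmt were set up. As a rough sketch only, one plausible arrangement for a purely numeric, space-separated file is a format with one "%f" per column; the file name, the column count ncols, and the small test matrix below are all assumptions made for illustration:

```octave
% Create a small space-separated file to read (hypothetical stand-in for x.var).
fname = [tempname() ".var"];
dlmwrite (fname, [1 2 3; 4 5 6], " ");

ncols = 3;                          % assumed number of columns in the file
fmt = repmat ("%f", 1, ncols);      % one numeric conversion per column
fid = fopen (fname, "r");
c = textscan (fid, fmt);            % c is a 1 x ncols cell array of column vectors
fclose (fid);
unlink (fname);                     % remove the temporary file

A = [c{:}];                         % join the columns back into a matrix
```

textscan returns a cell array, so the extra concatenation step is needed to get a matrix comparable to what dlmread returns.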
Clearly, however, the winner is to use a binary format rather than a text
one.
octave:29> tic; save -binary x.bin x; toc
Elapsed time is 0.811615 seconds.
octave:30> clear x
octave:32> tic; load x.bin; toc
Elapsed time is 1.38077 seconds.
For this purpose fwrite()/fread() are just about as fast as well.
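For the fwrite()/fread() route, the file carries no metadata, so the dimensions have to be stored somewhere. A minimal sketch, assuming a made-up layout (two uint32 values for the size, followed by the raw doubles) and a temporary file name:

```octave
x = rand (4, 3);                       % small stand-in for the 1GB array

% Write: dimensions first, then the data in column-major order.
fname = [tempname() ".bin"];
fid = fopen (fname, "w");
fwrite (fid, size (x), "uint32");
fwrite (fid, x, "double");
fclose (fid);

% Read: recover the dimensions, then read the data into that shape.
fid = fopen (fname, "r");
dims = fread (fid, [1 2], "uint32");
y = fread (fid, dims, "double");
fclose (fid);
unlink (fname);                        % remove the temporary file

disp (isequal (x, y))
```

Unlike save -binary, this layout is not self-describing, so the reader has to know the header convention and the file is only portable between machines with the same byte order unless an explicit one is passed to fopen.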
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?58345>