help-octave

Re: Loading a large and unusually formatted dataset into an Octave matrix


From: Ben Abbott
Subject: Re: Loading a large and unusually formatted dataset into an Octave matrix
Date: Wed, 19 Jun 2013 01:23:59 +0000 (GMT)

On Jun 18, 2013, at 02:01 PM, Przemek Klosowski <address@hidden> wrote:

On 06/14/2013 02:04 PM, Elliot Gorokhovsky wrote:
> Hello! I am a new Octave user and I am trying to predict the price of
> bitcoins 15 minutes in advance via neural networks for use on the
> website btcoracle.com <http://btcoracle.com>. I have about a gigabyte of
> data that looks like this:

{"data":1279408157,"price":"0.04951","amount":"20","price_int_":"4951","amount_int":"2000000000","tid":"1","price_currency":"USD","item":"BTC","trade_type":""},
{"data":1279424586,"price":"0.05941","amount":"50.01","price_int_":"5941","amount_int":"5001000000","tid":"2","price_currency":"USD","item":"BTC","trade_type":""},
{"data":1279475336,"price":"0.08080","amount":"5","price_int_":"8080","amount_int":"500000000","tid":"3","price_currency":"USD","item":"BTC","trade_type":""}

I responded suggesting a=fscanf(FD,"%d %d").

Eh, make it a=fscanf(FD,"%f %f") of course:

command="perl -F'\"' -lane 'print \"$F[5] $F[9]\"' /tmp/bitcoin";
a=reshape(fscanf(popen(command,'r'),"%f"),2,[]);

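-lane is a very useful Perl idiom: -n loops over every input line, -a
splits each line into words stored in the array @F ($F[0], $F[1], etc.),
-l takes care of line endings, and -e supplies the script on the command
line. -F'"' changes the word break character to the double quote. After
that, it only takes some judicious quoting of special characters, et voilà.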
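For comparison, here is a minimal sketch of the same split done in Octave itself on one sample line. It assumes that strsplit with "collapsedelimiters" turned off behaves like Perl's -F'"' autosplit; since Octave cell indices start at 1, Perl's $F[5] and $F[9] become F{6} and F{10}. The variable names are illustrative only, and looping over a gigabyte of lines this way would likely be slower than the Perl pipe.

```octave
## One record copied from the sample data above.
record = '{"data":1279408157,"price":"0.04951","amount":"20","price_int_":"4951","amount_int":"2000000000","tid":"1","price_currency":"USD","item":"BTC","trade_type":""},';

## Split on double quotes without merging adjacent delimiters,
## mimicking perl -F'"' -a.
F = strsplit (record, '"', "collapsedelimiters", false);

price  = str2double (F{6});    # Perl's $F[5]
amount = str2double (F{10});   # Perl's $F[9]
printf ("%g %g\n", price, amount);
```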

As usual, I got bitten by *scanf code failing silently unless the format
string is perfect. Does anyone have good tips on debugging issues like
that, such as a way to figure out how far into the input the format string
matched?
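One possible approach, sketched under the assumption that the data sits in an ordinary file (the file written here is a throwaway from tempname, not the real dataset): request fscanf's count and errmsg outputs, then use ftell and fgetl to see where conversion stopped and what text follows. Comparing count with the number of values you expected shows whether the format matched everything.

```octave
## Throwaway test file: the second line holds a token that %f cannot convert.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "1.5 2.5\n3.5 oops\n4.5 5.5\n");
fclose (fid);

fid = fopen (fname, "r");
[val, count, errmsg] = fscanf (fid, "%f");
stop_pos = ftell (fid);   # byte offset where conversion stopped
rest = fgetl (fid);       # rest of the line at that offset
fclose (fid);
delete (fname);

if (! ischar (rest))
  rest = "";              # fgetl returns -1 if nothing is left
endif
printf ("converted %d values, errmsg = \"%s\"\n", count, errmsg);
printf ("stopped at byte %d; next text: \"%s\"\n", stop_pos, rest);
```

Here scanning should stop at "oops", so count reports 3 even though the file holds 6 numbers.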
 

From a bash prompt, your Perl command works as expected.


perl -F'"' -lane 'print "$F[5] $F[9]"' bitcoin.txt

0.04951 20
0.05941 50.01
0.08080 5


The snippet below works for me.


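cmd = "perl -F'\"' -lane 'print \"$F[5] $F[9]\"' bitcoin.txt";

## Open the pipe before unwind_protect so the cleanup only runs on a valid fid
fid = popen (cmd, "r");
unwind_protect
  ## Echo each line produced by the Perl filter
  while (ischar (s = fgets (fid)))
    fputs (stdout, s);
  endwhile
unwind_protect_cleanup
  pclose (fid);
end_unwind_protect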


If you modify this code (substituting sscanf() for fputs()) to load a lot of data, the array(s) will be resized on each sscanf() call. That will be inefficient. Something like the code below will be faster.


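cmd = "perl -F'\"' -lane 'print \"$F[5] $F[9]\"' bitcoin.txt > data.txt";

[status, output] = system (cmd);
if (status != 0)
  error ("perl failed: %s", output);
endif

## Two whitespace-separated columns: price and amount
data = load ("data.txt");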
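If a line-by-line loop is still wanted, the per-call resizing can be avoided by preallocating and growing the array geometrically. This is a minimal sketch, assuming a plain file opened with fopen stands in for the popen pipe (the file is a throwaway from tempname) and that each valid line holds exactly two numbers; the starting capacity is deliberately tiny so the growth step actually runs.

```octave
## Throwaway file standing in for the perl output.
fname = [tempname() ".txt"];
fid = fopen (fname, "w");
fprintf (fid, "0.04951 20\n0.05941 50.01\n0.08080 5\n");
fclose (fid);

fid = fopen (fname, "r");
capacity = 2;                 # initial number of preallocated rows
data = zeros (capacity, 2);
n = 0;                        # rows filled so far
while (ischar (s = fgets (fid)))
  v = sscanf (s, "%f");
  if (numel (v) != 2)
    continue;                 # skip blank or malformed lines
  endif
  n = n + 1;
  if (n > capacity)
    capacity = 2 * capacity;  # double instead of growing by one row
    data(capacity, 2) = 0;    # enlarges the matrix, zero-filled
  endif
  data(n, :) = v.';
endwhile
fclose (fid);
delete (fname);

data = data(1:n, :);          # trim the unused preallocated rows
```

Doubling keeps the number of reallocations logarithmic in the number of lines, while reading the whole file in one call still avoids the per-line interpreter overhead.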


Ben


