help-octave

From: Przemek Klosowski
Subject: Re: Loading a large and unusually formatted dataset into an Octave matrix
Date: Wed, 19 Jun 2013 11:52:35 -0400
User-agent: Mozilla/5.0 (X11; Linux x86_64; rv:17.0) Gecko/20130514 Thunderbird/17.0.6

On 06/18/2013 09:23 PM, Ben Abbott wrote:
> On Jun 18, 2013, at 02:01 PM, Przemek Klosowski wrote:
>> command="perl -F'\"' -lane 'print \"$F[5] $F[9]\"' /tmp/bitcoin";
>> a=reshape(fscanf(popen(command,'r'),"%f"),2,[]);


> From a bash prompt, your perl command works as expected.

Am I to infer that those two Octave commands fail to work for you? They read the file for me (Octave 3.6.4 on 64-bit Fedora 19).
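For anyone who wants to see what the perl one-liner does without leaving Octave, here is a rough sketch of the same field extraction using strsplit() instead of perl. This is not the method from this thread, just an illustration: the two sample lines are invented stand-ins for /tmp/bitcoin (whose real layout isn't shown here), and fields 6 and 10 are simply the 1-based counterparts of perl's 0-based $F[5] and $F[9].

```octave
% Hypothetical sample standing in for /tmp/bitcoin; the real layout is unknown.
fname = tempname();
fid = fopen(fname, "w");
fprintf(fid, 'x "a" y "b" z "1.5" w "c" v "10"\n');
fprintf(fid, 'x "d" y "e" z "2.5" w "f" v "20"\n');
fclose(fid);

% Split every line on double quotes, like perl -F'"', and keep fields 6 and
% 10 (1-based), which correspond to perl's 0-based $F[5] and $F[9].
txtlines = strsplit(strtrim(fileread(fname)), "\n");
a = zeros(2, numel(txtlines));           % preallocate one column per line
for k = 1:numel(txtlines)
  % Do not merge adjacent quotes, so field numbering matches perl's split.
  f = strsplit(txtlines{k}, '"', "collapsedelimiters", false);
  a(:, k) = str2double(f([6, 10])).';    % quoted text fields -> numbers
endfor
delete(fname);
disp(a)
```

Preallocating `a` avoids growing the matrix one column at a time, which is the inefficiency raised below; for really large files the perl or textread routes are likely faster than an interpreted loop.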

> If you modify this code (sscanf() for fputs()) to load a lot of data,
> then the array(s) will be resized on each sscanf(). That will be
> inefficient.

I didn't check, but I think there would be no Octave-level resizing in the a=reshape(...) command above, unless fscanf does it internally.
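A small sketch of why the reshape step itself should not cause repeated resizing: fscanf with "%f" returns all values as a single column in file order, and reshape(...,2,[]) only reinterprets that column-major data so that each column holds one input line's pair. Here sscanf on an invented string stands in for the popen stream; whether fscanf grows its buffer internally is, as said above, unchecked.

```octave
% Made-up stand-in for the stream read by fscanf(popen(command,'r'),"%f"):
% two numbers per line, as printed by the perl one-liner.
v = sscanf("1.5 10\n2.5 20\n3.5 30\n", "%f");   % 6x1 column, file order

% Column-major reshape: with 2 rows, each column holds one line's pair.
a = reshape(v, 2, []);
disp(a)
```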


Actually, I found another way of reading the file that avoids external perl, using textread()'s "delimiter" option. Sorry for droning on about it, but I hope it will be useful to others. I often need to read odd files into Octave and often find it awkward, and this looks like a pretty general approach:

b=textread("/tmp/bitcoin","%f","delimiter",'"');

It's the same trick of breaking each line on double quotes to extract the contents of the quoted strings. The resulting array contains a lot of junk, including one extra element from the last line, so I reshape() it and extract the subset of rows that contain valid numbers:

reshape(b(1:end-1),[],3)([6,10,14,18,22],:)
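To unpack that one-liner, here is the same drop-reshape-select pattern on a tiny made-up vector. The values are invented: NaN stands in for the junk (the actual junk textread produces may differ), each hypothetical record contributes four values of which only the 2nd and 4th are wanted, and 3 is the column count of this example, not necessarily of the real file. The two-step form is just a more readable equivalent of the chained indexing, which is Octave-specific syntax.

```octave
% Invented stand-in for textread's output: 3 records of 4 values each,
% only rows 2 and 4 of each record are valid, plus one trailing extra entry.
b = [NaN; 1; NaN; 10; NaN; 2; NaN; 20; NaN; 3; NaN; 30; NaN];

% Two-step form: drop the trailing entry, one column per record, keep valid rows.
recs = reshape(b(1:end-1), [], 3);   % 4x3
vals = recs([2, 4], :);              % 2x3

% Same result with Octave's chained indexing, as in the one-liner above.
vals2 = reshape(b(1:end-1), [], 3)([2, 4], :);
disp(vals)
```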

This is rather unreadable code, but it's actually quite easy to arrive at by trial and error on the Octave command line. To work out the required reshaping and which row indices to use, I found it useful to print pieces of the array using

format + +-.

which prints a compact, one-character-per-element view of the jumble, making it easy to read the required counts off the screen.

