help-octave

Re: Import large field-delimited file with strings and numbers


From: Philip Nienhuis
Subject: Re: Import large field-delimited file with strings and numbers
Date: Sat, 6 Sep 2014 13:03:59 -0700 (PDT)

Joao Rodrigues wrote
> I need to import a large CSV file with multiple columns with mixed 
> string and number entries, such as:
> 
> field1, field2, field3, field4
> A,        a,        1,       1.0,
> B,        b,        2,        2.0,
> C,        c,        3,        3.0,
> 
> and I want to pass this on to something like
> 
> cell1 ={[1,1] = A; [2,1] = B; [3,1] = C};
> cell2 ={[1,1] = a; [2,1] = b; [3,1] = c};
> arr3 =[1 2 3]';
> arr4 =[1.0 2.0 3.0]';
> 
> furthermore, some columns can be ignored, the total number of entries is 
> known and there is a header.

If you can get rid of the header and if the number of columns on each line
is constant, csv2cell() in the io package is by far the fastest option. It
can read mixed numerical/text data.
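
A minimal sketch of the csv2cell route, assuming the io package is installed and the file has no header line and the same number of fields on every line. The temporary file and the variable names are made up for illustration; they only mirror the layout in the question.

```octave
pkg load io                      % csv2cell lives in the io package

% Write a small header-less sample file so the sketch is self-contained.
fname = tempname ();
fid = fopen (fname, 'w');
fprintf (fid, 'A,a,1,1.0\nB,b,2,2.0\nC,c,3,3.0\n');
fclose (fid);

c = csv2cell (fname);            % one cell per field, text and numbers mixed

cell1 = c(:,1);                  % text columns stay cell arrays of strings
cell2 = c(:,2);
arr3  = cell2mat (c(:,3));       % numeric columns become column vectors
arr4  = cell2mat (c(:,4));

unlink (fname);                  % remove the sample file again
```

Columns that are not needed can simply be left out when indexing into c.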

Another useful trick that I sometimes use myself is to read the file with
textscan, but in chunks. You can specify the number of lines to read, and
textscan should remember the file position (see "help textscan").
After having read chunk N, you can simply call textscan again (without the
headerlines parameter!) to read chunk N+1, and repeat until EOF.
If you start with a small chunk you can check whether the format string works.
Later on it is easy to append the data columns together (i.e., vertically
concatenate the corresponding columns of textscan's output cell arrays).
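
The chunked approach could look roughly like the sketch below. It assumes a file opened with fopen (so that there is a file position to remember), one header line, and four comma-separated fields per line. The sample file, the chunk size and the format string are illustrative assumptions, not something taken from the original data.

```octave
% Write a small sample file with a header so the sketch is self-contained.
fname = tempname ();
fid = fopen (fname, 'w');
fprintf (fid, 'field1,field2,field3,field4\n');
fprintf (fid, 'A,a,1,1.0\nB,b,2,2.0\nC,c,3,3.0\nD,d,4,4.0\nE,e,5,5.0\n');
fclose (fid);

fmt   = '%s %s %f %f';           % two text columns, two numeric columns
chunk = 2;                       % lines per call; start small to test fmt

fid = fopen (fname, 'r');
% First chunk: skip the header line.
data = textscan (fid, fmt, chunk, 'Delimiter', ',', 'HeaderLines', 1);

% Later chunks: same call without HeaderLines, continuing from the
% file position left behind by the previous call.
while (~feof (fid))
  part = textscan (fid, fmt, chunk, 'Delimiter', ',');
  if (isempty (part{1}))
    break;                       % nothing more was read, stop
  end
  for k = 1:numel (data)
    data{k} = [data{k}; part{k}];   % append column k of this chunk
  end
end
fclose (fid);
unlink (fname);

cell1 = data{1};  cell2 = data{2};
arr3  = data{3};  arr4  = data{4};
```

For a column that should be ignored, a starred conversion such as %*s in the format string skips that field, so it does not appear in the output at all.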

As for file size: I fairly regularly use textscan to read 30-50 MB csv
files with 32-bit Octave in one go. It takes a while, true, but it works.

Philip




--
View this message in context: 
http://octave.1599824.n4.nabble.com/Import-large-field-delimited-file-with-strings-and-numbers-tp4666380p4666388.html
Sent from the Octave - General mailing list archive at Nabble.com.


