[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
Re: xlsread in Octave 3.6.4
From: |
Philip Nienhuis |
Subject: |
Re: xlsread in Octave 3.6.4 |
Date: |
Mon, 30 Sep 2013 18:42:41 +0200 |
User-agent: |
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.1.11) Gecko/20100701 SeaMonkey/2.0.6 |
Hey Philip, hope everything is going well so far.
I've take a look at io source on svn. Are you finished already?
Referring to __OCT_spsh_open__.m line 51
"## Read file contents. For some reason fgetl needs to be called twice"
In OOXML you have to do this twice because the file hast just two lines.
The first line includes just this: "<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>"
All relevant datas are in the second line. Because of this I triggered
fgetl two times.
Yeah I found out this is valid for most XML files.
Do you just figured out in __OCT_spsh_open__.m whitch sheet should be
opened (given by name or index)?
So you need after __OCT_spsh_open__.m
lastintf = "OCT";
xls.filename = filename;
? If yes, I can split-out this part easily from xlsxread.
Furthermore, what's up with this?
"## FIXME find a class that doesn't display as one looooong string"
That's only for ODS.
ODS, once unzipped, has all of its sheets & formatting in one big file
called "content.xml".
To avoid time-consuming re-reading and re-parsing these xml contents
again and again for each sheet, I decided to just put the entire text
string + pointers to the sheets in the ods file pointer struct.
In unzipped OOXML however each sheet is in a separate file; so for .xlsx
it makes sense to just store the location of the tmp dir containing the
unzipped files.
Back to .ods: later I'll try if just storing the file position pointers
and using fseeks etc. isn't too much of a performance penalty.
That'll save memory, although reading really big spreadsheet files
probably isn't daily routine (even at my work where we spend lots of
time digging around in large databases which are still mostly in Excel
and .mat files).
<snip>
I've found a second mark for xlsx in __OCT_getusedrange__.m: line 43 to
50.
Yeah that's meant "for you".
I guess just extracting the used range from xlsxread is not possible as
you do for ods.
You mean: the really occupied cell range, not the formatted range.
Yeah, maybe for later. Although the latter would do for the moment. As
we saw earlier it is kept somewhere in the top section of one of the
unzipped files.
Note that COM/ActiveX (i.e., Excel behind-the-scenes), and therefore
Matlab as well, has the very same issue. So it isn't all that bad.
__readnum is reading the value and the position of the value. The used
range results from that. So reading just the position in
__OCT_getusedrange__.m is possible, but this information must not be
lost.
But maybe you do this in __ods_get_sheet_dims__.m?
__ods_get_sheet_dims__ is only meant for ODS.
It was written a few years ago for older versions of jOpenDocument when
those still had no methods for determining the used range. As
jOpenDOcument reads ods files as a long text string into the Java heap
memory, it was easy to copy it to the Octave world and apply Octave's
string functions. Already then I saw it as a first step to reading .ods
w/o Java. The unzip part has long been in the way until you sorted that out.
A shedule would be nice. Which file/function comes first, which
information goes in this file/function and which information have to go
out to the next...
See below.
Furthermore I've take a (short) look on xmlread.
No, it's not just "bad documented", it is simple not documented.
Let's say "not" is a subset of "bad" (or the other way round).
There *is* a help string (just try "help xmlread") so I don't agree it
isn't documented at all :-)
Last night it occurred to me that the various Octave/Octave-Forge
mailing lists from around the time Laurent committed these files might
contain a clue.
Trying to use it with OOXML just give an error. "bad declaration" (It
stumbles over the first line (<?xml version="1.0" encoding="UTF-8"
standalone="yes"?>).
By the way, Matlab isn't able to read it too with xmlread :)
It may be (but I know really little about XML) that a .dtd file
(Document Type Description) is needed.
The first line should be skipped in all cases. That is easily fixed in
the source of xmlread.
As to what I think is needed for xlsx in the io package (1.2.4):
-----------------------------
xlsread.m:
Nothing (apart from the help text header, I'll do that)
xlsopen.m, xls2oct.m, xlsclose.m:
Extra stanzas for OCT interface, plus header text. A'la similar parts
for odsopen/ods2oct/odsclose
I could do these too, it's not much work.
private/ functions:
-------------------
__OCT_spsh_open__.m
Unzipping the xlsx file, storing the unzipped dir pointer in the file
pointer struct, perhaps also the sheet file names. Most is already in
place (at the top), there's a section at the bottom for xlsx.
__OCT_xlsx2oct__.m
New file. Contains most of your xlsxread.
__OCT_getusedrange__.m
Needs a little stanza where the stored used range is to be assessed.
Easy, use the <dimension ref="<range>" /> from the top of the worksheet
files; you could use getxmlnode.m/getxmlattv.m to get the value,
parse_sp_range to translate it to row/column numbers.
__OCT_spsh_close__.m
A stanza for xlsx. Just wiping the tmpdir and returning an empty struct
is all that is needed.
__OCT_spsh_info__.m
To be assessed later. AFAICS it could be used w/o change if you set up
the file pointer struct (output of __OCT_spsh_open__.m) similar to what
is done for .ods
FYI I have plans to include reading gnumeric files and add them to the
OCT interface as well. gnumeric also has XML files.
On a related note:
Markus wrote:
> Am 2013-09-18 22:34, schrieb Philip Nienhuis:
<snip>
>> * do you want to add more options, like
>> > specifying a range to read
>
> Range-read would be nice but it has the lowest priority for me.
It doesn't look too hard to get a long way here. After all you can
easily strip a subsection of all rows in a worksheet and apply
'xlsxread' to it. One might read to many columns but that's a matter of
stripping those away.
Philip
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], (continued)
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/18
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/20
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/21
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Philip Nienhuis, 2013/09/21
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], PhilipNienhuis, 2013/09/22
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/22
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], Markus, 2013/09/29
- Re: Fwd: Test files [WAS: Re: xlsread in Octave 3.6.4], PhilipNienhuis, 2013/09/29
Re: xlsread in Octave 3.6.4,
Philip Nienhuis <=