octave-maintainers
[Top][All Lists]
Advanced

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

Table I/O [WAS: io-2.4.13 released]


From: PhilipNienhuis
Subject: Table I/O [WAS: io-2.4.13 released]
Date: Fri, 18 Oct 2019 16:19:46 -0500 (CDT)

apjanke-floss wrote
> On 10/18/19 2:29 AM, Philip Nienhuis wrote:
>> Kai Torben Ohlhus wrote:
>>> On 10/18/19 5:30 PM, PhilipNienhuis wrote:
>>>> [...]
>>>>
>>>> For readTable() and friends Octave needs to have the Table class
>>>> implemented. To my knowlegde there are no definite plans for that; 
>>>> there is
>>>> a prospect of a start by Markus Bergholz somewhere in (IIRC) the bug
>>>> tracker, from several years back.
>>>>
>>>
>>> Regarding this, I want to remind of Andrews project [1].
>> 
>> Thanks, I forgot about that. Yeah, an impressive piece of work. But, 
>> still no Table I/O there AFAICS.
>> On github (or gitlab?) I saw a readTable() that effectively is a wrapper 
>> for xlsread. Could be a temporary solution.
> 
> Thanks Kai and Philip! Table I/O is on my TODO list for Tablicious. Now 
> that I know there's appetite for it, I'll bump up its priority.
> 
> https://github.com/apjanke/octave-tablicious/issues/49
> 
> It looks like Forge io's spreadsheet reading is mature enough that I 
> could build Table I/O on top of that. And that seems like a good design: 
> no need for another package to re-implement basic spreadsheet I/O.
> 
> With one exception I can see: cell formatting. I need to be able to 
> detect which columns are formatted as dates, so they can automatically 
> be converted to @datetime objects. (And eventually I'd like to provide 
> the option to efficiently read selected columns as @categorical, but 
> that's much lower priority.)
> 
> Is there an efficient way in io's spreadsheet support to detect the 
> "type"/format of cells/columns? Or would you be willing to work with me 
> to add one to io? (Maybe by just exposing the cell format as yet another 
> output arg of xlsread()?)
> 
> Same with writing tables: I'd need a way to control cell formatting so 
> that dates could be formatted as dates explicitly instead of shooting 
> them in as strings and relying on Excel's auto-parsing functionality 
> (which I think only works when you're using the Excel COM interface 
> anyway). Would also be nice to do stuff like bold column headers and 
> freeze panes.

First off, sorry for a long reply.

I think, no I'm sure that all that you want there is possible. But it isn't
going to be easy.

The first thing you'll hit (at least, I hit it) is that Octave itself has no
Date or Time type. Only since classdef got implemented there may be a way
out, but for classdef objects there still is no reliable I/O to e.g., .mat
files.
This (no dedicated date or time type) is one of the reasons I left dates and
times aside for spreadsheet I/O. In fact, for file types and support
libraries that do offer date/time types I made the io package convert
date/time values into Octave datenums = doubles.

Then, when you write about cell formatting to uncover cell type I can't help
smelling "Excel". But .ods has a much richer cell type spectrum, maybe you
can largely skip formatting there.

Furthermore, it looks like you imply something like dataframes with headers
and row id's but the spreadsheet file types I know of really only have
individual cell types inside. Spreadsheet I/O usually happens at a fairly
low level (i.e., individual cells). Formatting ("styles") might happen over
ranges but not necessarily along table paradigms.
But I know of Java support SW (e.g., jOpendocument) that might also offer
higher level I/O - the level of entire tables. ActiveX also does that
("Ranges"). I once tried such I/O with jOpenDocument but it was too
complicated for me at the time as there's also a lot of Java itself
involved, i.e., it seemed I needed to build my own .jars. (Oh, and now I
remember) javaArrays didn't work well at the time; even these days I have
the impression they're not so robust (but I may well be wrong).

A bit of context:

.xls files (the old BIFF5 and BIFF8 ones) have no dedicated Date, Time or
DateTime cell type. In fact they basically only have Double, Text and
Boolean. The contents are further indicated by cell formatting; that way one
can find out -in theory!- which text cells are formulas and/or date/time. I
think this is what you referred to when mentioning "cell formatting".
BUT: AFAICS most of the Java libraries for spreadsheet I/O that I used (and
on Windows, ActiveX) shield this from you and offer at most a Formula cell
type. So DateTime etc. is harder to uncover; although in Visual Basic
(ActiveX) there is a dedicated type (see __COM__.cc in the OF windows
package).
The newer .xlsx files do have dedicated Date/Time cell types, see e.g.
__OCT_xlsx2oct__.m; the OCT interface explicitly processes them into
datenums :-)

.ods has a richer type spectrum, it does have Date, Time and even Currency
cell types. But again, not all spreadsheet interfaces return all cell types.

All in all, asking to uncover spreadsheet cell types beyond Double, Boolean,
String and Formula is asking to open a big and fully stuffed can of worms
:-)   It is all file type and interface dependent.
FYI, down in the io package NEWS file there's a table outlining which file
formats can be processed by what interfaces. That'll give a first indication
of complexity.

I am surely willing to help you out, given that I do not have not so much
time anymore for Octave these days. But yeah I think it'll be fun :-) 
As the io package's spreadsheet I/O is so old (it started > 10 years ago)
and has been Just Working all that time I have to admit that my
"operational" knowledge of its innards got fairly rusty.

Maybe open a task in the Task Tracker? that tracker is largely, but IMO
unduly, dormant.

Philip




--
Sent from: https://octave.1599824.n4.nabble.com/Octave-Maintainers-f1638794.html



reply via email to

[Prev in Thread] Current Thread [Next in Thread]