help-octave

Re: Spreadsheet translation


From: Philip Nienhuis
Subject: Re: Spreadsheet translation
Date: Fri, 31 Jan 2020 13:35:23 +0100

"Markus Mützel" wrote:
On 31 January 2020 at 10:31, "Philip Nienhuis" wrote:
Well, OCT can (I wrote the code for it), but .xlsx is a very
complicated spreadsheet format (basically a compressed nested
directory tree full of XML files), and we read it using regular
expressions, while XML is better read with XML parsers and the like.
In fact, all the Java based spreadsheet I/O classes are built on XML
parsers/validators etc. But regexps are much faster than XML parsers
(although I often think that, OTOH, they're much more fragile).

This legendary stackoverflow answer comes to mind ;-)
https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Markus

Thanks Markus, I'll add that link to the wiki.
An amusing read; further down, the arguments pro and con from the strict vs. loose/practical developer types make for more interesting reading :-)

As to our issue, it's especially the order of XML tags in the cell nodes that makes our regexps fragile. Luckily, both Excel and LibreOffice write those tags in a similar order (roughly alphabetical, AFAICT). But some Java based spreadsheet I/O class, embedded software or the like might do otherwise, and then we might have a bit of trouble. I'm unsure whether there are rules for the order of appearance of tags in XML. OTOH, reading and writing just simple data, which is all Octave does, isn't so terribly complicated or demanding.
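
To make the fragility concrete, here is a minimal Python sketch. It is not the actual Octave io package code. The cell strings, the pattern and the function names are made up for illustration, and it assumes the ordering in question is that of the attributes on a cell node. An order-dependent regexp reads the usual layout but misses the same cell once the attributes are swapped, whereas an XML parser returns the same values either way.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical cell nodes carrying identical data;
# only the order of the attributes differs.
CELL_USUAL = '<c r="B2" s="1" t="n"><v>42</v></c>'
CELL_SWAPPED = '<c t="n" s="1" r="B2"><v>42</v></c>'

# Order-dependent pattern: expects r, then s, then t (illustrative only).
ORDERED = re.compile(r'<c r="(\w+)" s="(\d+)" t="(\w+)"><v>([^<]*)</v></c>')


def read_cell_regex(xml):
    """Return (address, type, value), or None if the pattern does not match."""
    m = ORDERED.search(xml)
    if m is None:
        return None
    return (m.group(1), m.group(3), m.group(4))


def read_cell_parser(xml):
    """Return (address, type, value) using an XML parser; attribute order is irrelevant."""
    node = ET.fromstring(xml)
    return (node.get("r"), node.get("t"), node.findtext("v"))


if __name__ == "__main__":
    for cell in (CELL_USUAL, CELL_SWAPPED):
        print(read_cell_regex(cell), read_cell_parser(cell))
```

The regexp could be loosened to accept any attribute order, at the cost of a more complicated pattern and probably some of its speed advantage.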

Philip



