Re: Spreadsheet translation

help-octave

From:	Philip Nienhuis
Subject:	Re: Spreadsheet translation
Date:	Fri, 31 Jan 2020 13:35:23 +0100
User-agent:	Mozilla/5.0 (Windows NT 6.1; rv:52.0) Gecko/20100101 Firefox/52.0

"Markus Mützel" wrote:

On 31 January 2020 at 10:31, "Philip Nienhuis" wrote:

Well, OCT can (I wrote the code for it) but .xlsx is a very very
complicated spreadsheet format (basically it's a compressed nested
directory tree full of XML files), and we read it using regular
expressions while XML had better be read using XML parsers and such.
In fact, all the Java based spreadsheet I/O classes are based on XML
parsers/validators etc. But regexps are much faster than XML parsers
(although OTOH I often think they're much more fragile).


This legendary stackoverflow answer comes to mind ;-)
https://stackoverflow.com/questions/1732348/regex-match-open-tags-except-xhtml-self-contained-tags/1732454#1732454

Markus


Thanks Markus, I'll add that link to the wiki.

An amusing read - further down, the arguments pro and con from the strict vs. loose/practical developer types make for more interesting reading moments :-)

As to our issue, it's especially the order of XML tags in the cell nodes that makes our regexps fragile. Luckily, both Excel and LibreOffice write those tags in similar order (~alphabetical, AFAICT). But maybe some Java based spreadsheet I/O class or embedded SW or so will do otherwise, and then we might have a bit of trouble. I'm unsure if there are rules for the order of appearance of tags in XML.

OTOH reading and writing just simple data, which Octave does, isn't so terribly complicated and demanding.
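To illustrate the fragility, here is a minimal sketch in Python (not the actual Octave io package code). It assumes the "tags" in question are the attributes on a cell node; the two sample cells, the pattern and the function names are all made up for the example, and real .xlsx cell nodes also carry XML namespaces that are left out here. In XML itself the order of attributes on an element carries no meaning, so a parser sees both cells as identical, while an order-dependent regexp does not.

```python
import re
import xml.etree.ElementTree as ET

# Two hypothetical cell nodes holding the same data; only the attribute
# order differs (r = cell address, s = style index, t = cell type).
CELL_A = '<c r="B2" s="1" t="n"><v>42</v></c>'
CELL_B = '<c t="n" s="1" r="B2"><v>42</v></c>'

# Order-dependent pattern: it only matches when r comes first, then s, then t.
FRAGILE = re.compile(r'<c r="(\w+)" s="\d+" t="(\w+)"><v>(.*?)</v></c>')


def read_with_regex(xml):
    """Return (address, type, value), or None if the pattern does not match."""
    m = FRAGILE.search(xml)
    return m.groups() if m else None


def read_with_parser(xml):
    """Return (address, type, value) via an XML parser; attribute order is irrelevant."""
    cell = ET.fromstring(xml)
    return (cell.get("r"), cell.get("t"), cell.findtext("v"))


if __name__ == "__main__":
    for label, xml in (("A", CELL_A), ("B", CELL_B)):
        print(label, "regex :", read_with_regex(xml))
        print(label, "parser:", read_with_parser(xml))
```

The regexp reads the first cell but silently misses the second, whereas the parser returns the same result for both. A regexp can be made order-tolerant by matching each attribute separately, at some cost in speed and readability.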


Philip
