[Octave-bug-tracker] [bug #59277] xls2oct and/or openxls behave unexpect


From:	Dennis
Subject:	[Octave-bug-tracker] [bug #59277] xls2oct and/or openxls behave unexpected
Date:	Tue, 20 Oct 2020 07:48:55 -0400 (EDT)
User-agent:	Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/86.0.4240.75 Safari/537.36

Follow-up Comment #17, bug #59277 (project octave):

Philip,

I figured out why there is a for loop. In some cases the SharedStrings data
contains multiple subentries per string, e.g. when a single word within the
string has been formatted. If the complete array called 'strings'
(__OCT_xlsx2oct__.m line 142) is run through regexp, the number of cells at
the sublevel then differs from one string to the next.

Suppose a string in Excel is "this red word". This is saved in SharedStrings
as:

<si>
        <t>this red word</t>
</si>


In this case regexp (string, '<t[^>]*>(.*?)</t>', "tokens") yields:

{
  [1,1] =
  {
    [1,1] = this red word
  }

}


But if the user colors the word "red" in red, the SharedStrings entry would
look something like this:


<si>
        <r>
                <t xml:space="preserve">this </t>
        </r>
        <r>
                <rPr>
                        <color rgb="FFFF0000"/>
                </rPr>
                <t>red</t>
        </r>
        <r>
                <t> word</t>
        </r>
</si>


regexp would yield:


൓{
  [1,1] =
  {
    [1,1] = this
  }

  [1,2] =
  {
    [1,1] = red
  }

  [1,3] =
  {
    [1,1] =  word
  }

}
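
To make the difference concrete, here is a minimal sketch that runs the same
pattern over both kinds of entry at once. The variable names plain, rich and
nruns are made up for this illustration and do not come from
__OCT_xlsx2oct__.m; the XML is the example from above, written on fewer lines.

```octave
% Two example SharedStrings entries: one plain, one with a formatted word
plain = '<si><t>this red word</t></si>';
rich  = ['<si><r><t xml:space="preserve">this </t></r>', ...
         '<r><rPr><color rgb="FFFF0000"/></rPr><t>red</t></r>', ...
         '<r><t> word</t></r></si>'];
strings = {plain, rich};

% With a cell array as input, regexp returns one cell of tokens per entry
ctext = regexp (strings, '<t[^>]*>(.*?)</t>', 'tokens');

% Count the <t> runs found in each entry; the counts differ between entries
nruns = cellfun (@numel, ctext);
disp (nruns)
```

The plain entry yields one token and the formatted entry yields three, which
is why the sublevel cannot be flattened uniformly in a single step.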


So I guess the for loop is there to process the lower-level cell structures
entry by entry, so that each result ends up at the correct index.

So I created a solution that accounts for this. It still includes a for loop,
but the loop only executes cell2mat(cell2mat()) when ctext{n} contains a
multi-element cell array, which it flattens into a single string. Afterwards,
the complete cell array ctext is run through cell2mat(cell2mat()) in one go.
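
A rough sketch of that idea follows. It is not the attached code: the input
data, the names multi, runs and flat are made up for illustration, and plain
bracket concatenation plus cellfun stand in for the cell2mat(cell2mat())
calls.

```octave
% Hypothetical input: token cells as regexp (..., 'tokens') returns them
ctext = { {{'plain text'}}, ...
          {{'this '}, {'red'}, {' word'}}, ...
          {{'another plain entry'}} };

% Visit only the entries that were split into several runs
multi = find (cellfun (@numel, ctext) > 1);
for n = multi
  runs = [ctext{n}{:}];          % 1x3 cell: {'this ', 'red', ' word'}
  ctext{n} = {{ [runs{:}] }};    % rewrap so the nesting matches plain entries
end

% Every entry now holds exactly one token, so one pass unwraps them all
flat = cellfun (@(c) c{1}{1}, ctext, 'UniformOutput', false);
disp (flat)
```

The loop body runs once per formatted string rather than once per string, so
its cost depends on how many entries contain formatting.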

With my real-life Excel file this works properly AND is much faster than the
original code, because only some Excel strings contain formatting for single
words. Please find attached the proposed code (lines 149-156).

(file #50026)
    _______________________________________________________

Additional Item Attachment:

File name: __OCT_xlsx2oct__.m             Size:10 KB
    <https://file.savannah.gnu.org/file/__OCT_xlsx2oct__.m?file_id=50026>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?59277>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/
