[Octave-bug-tracker] [bug #58368] UTF16 and UTF32 characters in MAT files


From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #58368] UTF16 and UTF32 characters in MAT files
Date: Fri, 15 May 2020 17:39:37 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0

Follow-up Comment #1, bug #58368 (project octave):

Thanks for the test case.

The attached patch adds the conversion from UTF-16 or UTF-32 to Octave's
internal UTF-8 encoding when reading from v5 .mat files. It also no longer
strips non-ASCII characters from UTF-8 strings.
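For illustration only, a UTF-16 to UTF-8 conversion could look roughly like the sketch below. This is not the code from the attached patch. The names `append_utf8` and `utf16_to_utf8` are hypothetical, the input is assumed to be in native byte order, and unpaired surrogates are replaced with "?" as an arbitrary choice.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to `out`, encoded as UTF-8.
static void append_utf8 (std::string& out, std::uint32_t cp)
{
  if (cp < 0x80)
    out += static_cast<char> (cp);
  else if (cp < 0x800)
    {
      out += static_cast<char> (0xC0 | (cp >> 6));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char> (0xE0 | (cp >> 12));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char> (0xF0 | (cp >> 18));
      out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Hypothetical helper: convert `len` UTF-16 code units (native byte
// order assumed) to a UTF-8 byte string.  Unpaired surrogates become '?'.
std::string utf16_to_utf8 (const std::uint16_t *src, std::size_t len)
{
  std::string out;
  for (std::size_t i = 0; i < len; i++)
    {
      std::uint32_t cu = src[i];
      if (cu >= 0xD800 && cu <= 0xDBFF)
        {
          // High surrogate: needs a low surrogate right after it.
          if (i + 1 < len && src[i+1] >= 0xDC00 && src[i+1] <= 0xDFFF)
            {
              std::uint32_t lo = src[++i];
              append_utf8 (out,
                           0x10000 + ((cu - 0xD800) << 10) + (lo - 0xDC00));
            }
          else
            out += '?';
        }
      else if (cu >= 0xDC00 && cu <= 0xDFFF)
        out += '?';  // Low surrogate without a preceding high surrogate.
      else
        append_utf8 (out, cu);
    }
  return out;
}
```

The output length generally differs from the input length, since one UTF-16 code unit can become one to three bytes and a surrogate pair becomes four.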

This is only safe to do for character vectors. For character matrices or N-D
arrays (in UTF-16 or UTF-32), it falls back to the previous behavior of
replacing non-ASCII characters with "?".
Maybe we should fall back to this behavior for column vectors, too?
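One likely reason for the restriction (an assumption, it is not spelled out above) is that UTF-8 is a variable-length encoding, so rows that hold the same number of characters can need different numbers of bytes after conversion and no longer fit a rectangular array. A small sketch with hypothetical helper names `utf8_size` and `utf8_row_lengths`:

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Number of bytes needed to encode one code point in UTF-8.
std::size_t utf8_size (char32_t cp)
{
  return cp < 0x80 ? 1 : cp < 0x800 ? 2 : cp < 0x10000 ? 3 : 4;
}

// Byte length of each row once it is converted to UTF-8.
std::vector<std::size_t>
utf8_row_lengths (const std::vector<std::u32string>& rows)
{
  std::vector<std::size_t> lengths;
  for (const auto& row : rows)
    {
      std::size_t n = 0;
      for (char32_t cp : row)
        n += utf8_size (cp);
      lengths.push_back (n);
    }
  return lengths;
}
```

For the rows "abc" and "äbc", both three characters long, this yields byte lengths 3 and 4. The same issue would apply to a column vector, where each row holds exactly one character.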

Is there a good way of constructing a charMatrix from a C string buffer that
is not null-terminated? In the patch, I constructed a C++ string with the
constructor that takes an explicit length and passed that to the charMatrix
constructor. Can the construction of the intermediate object be avoided?
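The intermediate step described above can be sketched as follows. The helper name `buffer_to_string` is hypothetical, and the hand-off to the charMatrix constructor is left out because it depends on Octave's headers.

```cpp
#include <cstddef>
#include <string>

// Build a std::string from a buffer of known length.  The
// (pointer, length) constructor copies exactly `len` bytes and does
// not look for a terminating '\0'.
std::string buffer_to_string (const char *buf, std::size_t len)
{
  return std::string (buf, len);
}
```

Because the length is passed explicitly, bytes with value zero inside the buffer are preserved instead of ending the string early.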

I only tested with the file from comment #0, which contains UTF-16 encoded
strings. I expect UTF-32 and UTF-8 to work as well, but have not tested them.

(file #49082)
    _______________________________________________________

Additional Item Attachment:

File name: bug58368_utf_mat.patch         Size:7 KB
    <https://savannah.gnu.org/file/bug58368_utf_mat.patch?file_id=49082>



    _______________________________________________________

Reply to this item at:

  <https://savannah.gnu.org/bugs/?58368>

_______________________________________________
  Message sent via Savannah
  https://savannah.gnu.org/