[Octave-bug-tracker] [bug #58368] UTF16 and UTF32 characters in MAT files
From: Markus Mützel
Subject: [Octave-bug-tracker] [bug #58368] UTF16 and UTF32 characters in MAT files
Date: Fri, 15 May 2020 17:39:37 -0400 (EDT)
User-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:77.0) Gecko/20100101 Firefox/77.0
Follow-up Comment #1, bug #58368 (project octave):
Thanks for the test case.
The attached patch adds the conversion from UTF-16 or UTF-32 to Octave's
internal UTF-8 encoding when reading from v5 .mat files. It also no longer
strips non-ASCII characters from UTF-8 strings.
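For illustration, here is a minimal standalone sketch of a UTF-16 to UTF-8
conversion. It is not the code from the attached patch. The function names
(append_utf8, utf16_to_utf8) are hypothetical, the input is assumed to be
already in host byte order, and replacing unpaired surrogates with U+FFFD is
an assumption of this sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>

// Append one Unicode code point to a UTF-8 encoded string.
static void append_utf8 (std::string& out, std::uint32_t cp)
{
  if (cp < 0x80)
    out += static_cast<char> (cp);
  else if (cp < 0x800)
    {
      out += static_cast<char> (0xC0 | (cp >> 6));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else if (cp < 0x10000)
    {
      out += static_cast<char> (0xE0 | (cp >> 12));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
  else
    {
      out += static_cast<char> (0xF0 | (cp >> 18));
      out += static_cast<char> (0x80 | ((cp >> 12) & 0x3F));
      out += static_cast<char> (0x80 | ((cp >> 6) & 0x3F));
      out += static_cast<char> (0x80 | (cp & 0x3F));
    }
}

// Convert UTF-16 code units (host byte order) to a UTF-8 string.
// Assumption of this sketch: unpaired surrogates are replaced by U+FFFD.
std::string utf16_to_utf8 (const std::uint16_t *src, std::size_t len)
{
  std::string out;
  for (std::size_t i = 0; i < len; i++)
    {
      std::uint32_t cp = src[i];
      if (cp >= 0xD800 && cp <= 0xDBFF)
        {
          // High surrogate: combine with the following low surrogate.
          if (i + 1 < len && src[i+1] >= 0xDC00 && src[i+1] <= 0xDFFF)
            {
              cp = 0x10000 + ((cp - 0xD800) << 10) + (src[i+1] - 0xDC00);
              i++;
            }
          else
            cp = 0xFFFD;
        }
      else if (cp >= 0xDC00 && cp <= 0xDFFF)
        cp = 0xFFFD;  // low surrogate without a preceding high surrogate

      append_utf8 (out, cp);
    }
  return out;
}
```

Note that the output length in bytes generally differs from the number of
input code units: one UTF-16 code unit can become one, two, or three bytes,
and a surrogate pair becomes four.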
This is only safe to do for character vectors. For character matrices or N-D
arrays (in UTF-16 or UTF-32), it falls back to the previous behavior of
replacing non-ASCII characters with "?".
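The fallback can be sketched as follows. This is a standalone illustration,
not the code from the patch, and the function name utf16_units_to_ascii is
hypothetical. It maps every code unit to exactly one char, so the array
dimensions stay unchanged, which is presumably why this path remains usable
for matrices while a variable-length UTF-8 conversion is not. How surrogate
pairs are treated here (two "?") is an assumption of the sketch.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Map each UTF-16 code unit to one char: ASCII is kept, everything else
// becomes '?'.  The output has exactly as many elements as the input.
std::vector<char> utf16_units_to_ascii (const std::uint16_t *src,
                                        std::size_t len)
{
  std::vector<char> out (len);
  for (std::size_t i = 0; i < len; i++)
    out[i] = (src[i] < 0x80) ? static_cast<char> (src[i]) : '?';
  return out;
}
```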
Maybe we should fall back to this behavior for column vectors, too?
Is there a good way of constructing a charMatrix from a C string buffer that
is not null-terminated? In the patch, I constructed a C++ string with the
constructor that takes an explicit length and passed that to the charMatrix
constructor. Can the construction of the intermediate object be avoided?
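The intermediate step described above can be sketched with the standard
library alone. The helper name string_from_buffer is hypothetical, and the
hand-off to charMatrix is left out so that the example compiles without the
Octave headers.

```cpp
#include <cstddef>
#include <string>

// Build a std::string from a buffer of known length.  The buffer does not
// need a terminating '\0', and embedded '\0' bytes are preserved.
std::string string_from_buffer (const char *buf, std::size_t len)
{
  return std::string (buf, len);
}
```

The (pointer, length) constructor copies exactly len bytes, so it never reads
past the end of the buffer. The cost is one extra copy of the data.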
I only tested with the file from comment #0, which contains UTF-16 encoded
strings. I hope that UTF-32 and UTF-8 work, too.
(file #49082)
_______________________________________________________
Additional Item Attachment:
File name: bug58368_utf_mat.patch Size:7 KB
<https://savannah.gnu.org/file/bug58368_utf_mat.patch?file_id=49082>
_______________________________________________________
Reply to this item at:
<https://savannah.gnu.org/bugs/?58368>