octave-maintainers

Re: Unicode support in io Forge package


From: PhilipNienhuis
Subject: Re: Unicode support in io Forge package
Date: Sun, 20 Oct 2019 08:13:09 -0500 (CDT)

mmuetzel wrote
> On 19 October 2019 at 20:35, "Andrew Janke" wrote:
>> The io code uses native2unicode as an alternative if it's available,
>> using a feature test. Here's an example from xls2oct.m:
>>
>>
>>    ## Convert from UTF-8 and strip characters that are not supported by Octave
>>    ## (any chars < 32 or > 255).
>>    if (! strcmp (xls.xtype, "COM") && (spsh_opts.convert_utf))
>>      if (exist ("native2unicode", "file"))
>>        conv_fcn = @(str) unicode2native (native2unicode (str, "UTF-8"));
>>      else
>>        conv_fcn = @utf82unicode;
>>      endif
>>      rawarr = tidyxml (rawarr, conv_fcn);
>>    endif
>>
>> This is leaving me even more confused: I'm not sure what the round trip
>> through both native2unicode and unicode2native accomplishes, especially
>> since native2unicode converts from the specified code page to UTF-8, so
>> doing native2unicode(str, "UTF-8") should basically be a no-op.
>>
>> Putting aside the first native2unicode call, I _think_ the use of
>> unicode2native here is incorrect, because even on Windows, Octave's
>> internal strings are now UTF-8 and not the system default code page. I'm
>> going to do some more research and set up some test spreadsheets, but I
>> suspect all the encoding conversion logic here should just be removed.
>>
> 
> Please, ignore my previous messages.
> I think you are right! I also believe it should be removed completely. The
> XML in the .xlsx files is encoded in UTF-8 (always?) and that is Octave's
> internal encoding. No transcoding should be done at all.
> The code was originally introduced for bug #49222:
> https://savannah.gnu.org/bugs/?49222
> It's embarrassing to re-read how I initially completely misunderstood the
> issue and came up with a fix that seemed to work (on a western Windows)
> back then.

Please don't judge yourself too harshly :-) At the time we both agreed the
fix worked, and that it worked is what counts. Things can always be done
better in hindsight.

Looking back at my own code in the io package, I would also do many things
quite differently.
E.g., collapsing the separate but largely identical ods and xls code sets
into one is a fix for an IMO big mistake I made early on.


> If I correctly understand the last few comments, the problem was (or is?)
> that UTF-8 encoded strings weren't displayed correctly on legacy Windows.
> But I don't think that the io package should interfere with the encoding
> of the strings it reads to work around this.
> If this is still an issue (it isn't on Windows 10), it should be resolved
> differently.

In an earlier post in this thread you wrote something about legacy systems
and releases.
io has to remain backward compatible with Octave 4.0.0 and 32-bit systems,
so we need to be sure it still works there.
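
A quick way to check this on an old release might look like the sketch
below. It is only an illustration, not code from io: the variable names and
the sample bytes are made up, and the only thing taken from xls2oct.m is the
exist ("native2unicode", "file") feature test. The first assert needs no
conversion functions at all, so it should also run on Octave 4.0.0. The
second part assumes native2unicode and unicode2native behave as documented
in recent Octave versions.

```octave
## Illustrative sample: "Hé" encoded as UTF-8 (hypothetical test data).
utf8_bytes = uint8 ([72 195 169]);

## Octave char arrays hold one byte per element, so no transcoding is needed
## to carry UTF-8 data around.
str = char (utf8_bytes);
assert (double (str), double (utf8_bytes));

## Same feature test as in xls2oct.m; old releases simply skip this part.
if (exist ("native2unicode", "file"))
  ## Input that is already UTF-8 should come back with identical bytes.
  roundtrip = native2unicode (utf8_bytes, "UTF-8");
  assert (double (roundtrip), double (utf8_bytes));

  ## Converting to a single-byte code page does change the bytes, which is
  ## why an extra unicode2native call alters what was read from the file.
  latin1_bytes = unicode2native (str, "ISO-8859-1");
  assert (numel (latin1_bytes) < numel (utf8_bytes));
endif
```

If the asserts pass on both old and new releases, that would suggest the
strings can be handed on exactly as read, without any conversion step.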

Support for Windows 7 formally ends in January next year, but I'd like to keep
io working on Win7 a little longer.

I'm too unfamiliar with Linux & BSDs to know if there are any distros that
might be affected (I've been using Mageia and predecessors for decades and
that has always been fairly up-to-date).

Philip



