From: Mike Miller
Subject: Re: [help-octave] Re: utf8 does not appear to work for function documentation strings generated with texinfo
Date: Wed, 26 Mar 2014 16:19:12 -0400

On Wed, Mar 26, 2014 at 3:13 PM, Alan W. Irwin
<address@hidden> wrote:
> function test_fwrite_utf8
> fid = fopen("test.out", "w")
> fwrite(fid, "The unicode character, ≥, is output\n")
> endfunction
>
> That creates the following file:
>
> address@hidden> od -a test.out
> 0000000   T   h   e  sp   u   n   i   c   o   d   e  sp   c   h   a   r
> 0000020   a   c   t   e   r   ,  sp nul nul nul   ,  sp   i   s  sp   o
> 0000040   u   t   p   u   t  nl
> 0000046

Yep, same here.

> The default precision for fwrite (according to the documentation of
> that function) is uchar.  I believe that would work if the Octave type
> of the utf8 string, "The unicode character, ≥, is output\n" is also uchar
> since all that is needed here is to transmit the bytes of that string
> unmolested to the output file.

Unfortunately, there is no "uchar" type in the Octave (or Matlab)
language; this keyword is only meaningful to the fread and fwrite
functions.

> So is the problem that the Octave utf8 string does not have a uchar
> type?  Or does the fwrite built-in have some unnecessary filtering in
> place to zero bytes with a non-zero eighth bit (i.e., non-ascii utf8
> bytes)?

The *real* problem is that the "char" type is supposed to (and will
someday) represent a Unicode character. It currently only represents a
C "char" one-byte value in Octave.

If you want to focus on this specific situation, the problem is that
the argument is a string of type "char", whose values are probably
limited to the range [-128,127] and then clamped to [0,127] when
fwrite converts them internally to the "uchar" precision range.
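
To make the suspected clamping concrete, here is a minimal sketch that
models it with Octave's integer types. It is only an illustration of
saturating conversion, not the actual code path inside fwrite; the
variable names are made up for this example.

```octave
% UTF-8 encodes "≥" (U+2265) as the three bytes E2 89 A5.
raw = uint8 ([226 137 165]);

% Reinterpret the same bits as signed 8-bit values, the way a signed
% C "char" would see them.
as_signed = typecast (raw, "int8");   % [-30 -119 -91]

% Converting signed values to an unsigned 8-bit range saturates,
% so every negative value becomes zero.
clamped = uint8 (as_signed);          % [0 0 0]

disp (as_signed);
disp (clamped);
```

The three zeros correspond to the three "nul" bytes in the od output
quoted above, which is consistent with the explanation, though it does
not prove that fwrite takes exactly this route.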

If you try

  fwrite (fid, "The unicode character, ≥, is output\n", "schar");

or

  fwrite (fid, double("The unicode character, ≥, is output\n"));

instead, then the values are not limited and it should work (both work for me).
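
For anyone who wants to check the result without od, a small
round-trip sketch follows. It assumes the source file is saved as
UTF-8; the temporary file and the variable names are just for
illustration.

```octave
msg = "The unicode character, ≥, is output\n";

% Write the string using the double() workaround.
fname = tempname ();
fid = fopen (fname, "w");
fwrite (fid, double (msg));
fclose (fid);

% Read the file back as raw bytes and clean up.
fid = fopen (fname, "r");
bytes = fread (fid, Inf, "uint8")';
fclose (fid);
unlink (fname);

% If nothing was clamped, the byte count matches the string length
% and at least one byte lies above the ASCII range.
printf ("%d bytes read back, %d in the string\n", numel (bytes), numel (msg));
printf ("non-ASCII bytes present: %d\n", any (bytes > 127));
```

Swapping the fwrite line for the "schar" variant should let you compare
both workarounds the same way.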

This may or may not be an acceptable workaround for the particular
case of help strings, but internally fwrite (and other functions) may
still effectively apply ASCII range limits to char matrices (strings)
until Octave actually supports wide characters.

-- 
mike


