help-octave

Re: Problems with file encoding


From: krthie
Subject: Re: Problems with file encoding
Date: Mon, 17 Oct 2011 08:54:28 -0700 (PDT)

Hi


Jordi Gutiérrez Hermoso-2 wrote:
> 
> On 12 May 2011 13:46, Richard Balogh <address@hidden> wrote:
>> From your samples it is clear that the non-working file is Unicode
>> encoded and the working file is ASCII encoded. The difference is that
>> in Unicode each character requires two bytes.
> 
> I couldn't see the file being referred to, but I've seen this problem
> before with encodings. To clarify, Unicode isn't an encoding, but a
> family of them, and not all Unicode encodings require two bytes per
> character. UTF-8 for example, is a kind of Unicode encoding that uses
> one byte for ASCII characters, so UTF-8 and ASCII agree on files that
> do not use more codepoints than those defined by ASCII.
> 
> I have seen that Windows sometimes uses UTF-16, and *that* encoding
> does use at least two bytes per character. I don't think there is a
> way to make Octave guess the encoding, but perhaps it could possibly
> be told what the encoding is. That's a development I don't
> particularly want to undertake myself, though.
> 
> - Jordi G. H.
> 
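
The byte-level point in the quoted message can be checked directly. Below is a minimal sketch in Python (used here purely for illustration, it is not part of the original thread) that encodes the same text as ASCII, UTF-8, and UTF-16 and compares the results. The sample strings are arbitrary.

```python
# Compare how the same text is stored under different encodings.
ascii_text = "x = 1;"            # ASCII characters only
accented = "Guti\u00e9rrez"      # contains one non-ASCII character (e with acute)

# For pure-ASCII text, UTF-8 and ASCII produce identical bytes.
same_bytes = ascii_text.encode("utf-8") == ascii_text.encode("ascii")

# UTF-16 stores each of these characters in two bytes
# (the "-le" variant is used so that no byte order mark is prepended).
utf8_len = len(ascii_text.encode("utf-8"))
utf16_len = len(ascii_text.encode("utf-16-le"))

# In UTF-8 the accented character takes two bytes, the others one each.
accented_chars = len(accented)
accented_utf8_len = len(accented.encode("utf-8"))

print(same_bytes, utf8_len, utf16_len, accented_chars, accented_utf8_len)
```

A reader that assumes one byte per character will therefore handle the UTF-8 version of an ASCII-only file correctly, but will see interleaved zero bytes in the UTF-16 version.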

I stumbled on this problem myself. In MATLAB, I open my files with a
specific encoding (see
http://www.mathworks.co.uk/help/techdoc/ref/fopen.html), for example:
<code>fid = fopen(inputFilename,'rb','b','ISO-8859-1');</code>
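
To illustrate what naming the encoding at open time achieves, here is a rough sketch of the same idea in Python rather than MATLAB or Octave. The file contents and variable names are made up for the example. It writes a few ISO-8859-1 bytes to a temporary file, reads them back with the matching encoding, and shows that the same bytes are not valid UTF-8.

```python
import os
import tempfile

# Hypothetical sample content, stored as ISO-8859-1 bytes.
raw = "caf\u00e9\n".encode("iso-8859-1")

# Write the raw bytes to a temporary file.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    f.write(raw)

# Opening the file with the matching encoding recovers the original text.
with open(path, encoding="iso-8859-1") as f:
    text = f.read()

# The same bytes cannot be decoded as UTF-8: the byte 0xE9 would have to
# start a multi-byte sequence, but it is followed by a newline.
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

os.remove(path)
print(repr(text), utf8_ok)
```

Without a way to name the encoding, the caller has to read raw bytes and do this conversion by hand.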

Unfortunately, Octave doesn't support the fourth argument to fopen, and
adding support is probably a major exercise. I guess it would mean using
locales etc. In an attempt to get a volunteer going, I've had a look at the
documentation for glibc:



> If the opentype string contains the sequence ,ccs=STRING then STRING is
> taken as the name of a coded character set and fopen will mark the stream
> as wide-oriented with appropriate conversion functions in place to
> convert from and to the character set STRING. Any other stream is
> opened initially unoriented and the orientation is decided with the first
> file operation. If the first operation is a wide character operation, the
> stream is not only marked as wide-oriented, also the conversion functions
> to convert to the coded character set used for the current locale are
> loaded. This will not change anymore from this point on even if the locale
> selected for the LC_CTYPE category is changed. 
> 
However, MSDN's documentation for fopen is slightly different, so I don't
know how portable this is...

Kris

--
View this message in context: 
http://octave.1599824.n4.nabble.com/Problems-with-file-encoding-tp3514439p3912486.html
Sent from the Octave - General mailing list archive at Nabble.com.

