help-octave

Re: Problems with file encoding


From: Richard Balogh
Subject: Re: Problems with file encoding
Date: Thu, 12 May 2011 20:46:26 +0200
User-agent: Thunderbird 2.0.0.24 (Windows/20100228)

Hello,

From your samples it is clear that the non-working file is Unicode
(UCS-2) encoded and the working file is ASCII encoded. The difference is
that in UCS-2 each character takes two bytes. The first two bytes (the
strange square) are the Byte Order Mark (BOM). Unfortunately, I have
neither experience with nor knowledge of Unicode support in Octave.
Maybe someone else can help...
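To make the two-byte layout concrete, here is a small sketch in Python (chosen only because its codecs are built in; it is not meant as an Octave solution). It encodes the first data line from this thread both ways, and its output can be compared with the hex dumps quoted further down.

```python
import codecs

# First data line of the exported file, as quoted in this thread.
text = "0,000\t28,500"

# UCS-2 little endian (identical to UTF-16LE for these characters):
# the byte order mark FF FE, then two bytes per character, low byte first.
ucs2 = codecs.BOM_UTF16_LE + text.encode("utf-16-le")

# The same line in ANSI (Windows code page 1252): one byte per character, no BOM.
ansi = text.encode("cp1252")

print(ucs2.hex(" ").upper())
print(ansi.hex(" ").upper())
```

The UCS-2 bytes are 2 + 2 × 12 = 26 bytes long, against 12 for the ANSI version.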

  Richard Balogh


HomeRun4711 wrote:
Hello,

Thanks for your response. The file is created by exporting
data from Autodesk Multiphysics 2012, finite element software for calculating heat fluxes, etc.

I had a look into the files using a hex-editor.

The exported (non-working) file is encoded in UCS-2 little endian.
If I convert the file to ANSI, everything works fine.

The content in clear text should be
---------
0,000    28,500
---------

In the non-working UCS-2 file, this reads in hex:
----------
FF FE 30 00 2C 00 30 00 30 00 30 00 09 00 32 00 38 00 2C 00 35 00 30 00 30

----------
As clear text, the hex editor shows:
----------
ÿþ0.,.0.0.0...2.8.,.5.0.0
----------

The working file, converted to ANSI, which does not cause problems
in Octave, reads in hex:
---
30 2C 30 30  30 09 32 38  2C 35 30 30
---

The clear text shown in the hex editor is:
---
0,000.28,500
---

So somehow the FF FE bytes, the first characters in the UCS-2 file,
seem to cause the problem.
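Why this shows up as strange characters can be sketched in Python: a reader that treats every byte as one character (which is presumably what happens when Octave's fgetl reads this file) sees the BOM as two extra characters and a NUL byte after every real character. The sketch assumes, as hex editors usually do, that unprintable bytes are displayed as '.'.

```python
import codecs

line = "0,000\t28,500"  # first data line quoted in the thread

# The UCS-2 LE file content: BOM, then two bytes per character.
ucs2 = codecs.BOM_UTF16_LE + line.encode("utf-16-le")

# Map each byte to exactly one character (Latin-1 does precisely that),
# as a byte-oriented reader would.
as_bytes = ucs2.decode("latin-1")

# Mimic a hex editor's text column: unprintable bytes (NUL, TAB) become '.'.
shown = "".join(c if c.isprintable() else "." for c in as_bytes)
print(shown)  # ÿþ0.,.0.0.0...2.8.,.5.0.0.
```

The printed text matches the hex editor's column above, plus one trailing '.' for the final 00 byte that the quoted dump does not show.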

I am not familiar with what exactly that means. Is there a way
to work with these files using Octave, or must they be converted
first?
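One way to handle such files without converting them by hand is to look at the first bytes and choose the decoding from the BOM. The Python sketch below does that; sniff_encoding and read_text are hypothetical helper names, and the cp1252 fallback for files without a BOM is an assumption about what "ANSI" means on the machines involved.

```python
import codecs

def sniff_encoding(head: bytes) -> str:
    """Guess an encoding from the first bytes of a file (its BOM, if any).

    Hypothetical helper. Without a BOM it falls back to cp1252, the usual
    "ANSI" code page on western-European Windows (an assumption).
    """
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"  # this codec also drops the UTF-8 BOM
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"     # uses the BOM to pick the byte order, then drops it
    return "cp1252"

def read_text(path: str) -> str:
    """Read a whole file, decoding it with the sniffed encoding."""
    with open(path, "rb") as f:
        head = f.read(4)
    # Text mode also turns Windows CRLF line endings into "\n".
    with open(path, encoding=sniff_encoding(head)) as f:
        return f.read()
```

The decoded text no longer contains the BOM or the NUL bytes, so the comma replacement from the Octave script can be applied to it directly.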

Kind regards,
Walter

On 12.05.2011 15:45, Balogh Richard, Ing. wrote:
Hello,

It is difficult to decide, as we don't know how your file was created. But it definitely looks like an encoding problem: the characters written into the file
have different meanings in different encodings.

Try to find a good hex editor and look at exactly what the CAD software
writes to your file...

No more ideas for now.

Richard

On 11.5.2011 14:12, Fritz Fischer wrote:
Hello Richard,

You are right, I am trying to change the decimal separator using the
Octave script.
The CAD file has commas as separators and I want to change that so
I can calculate using Octave.

But how could this cause the problem with the strange rectangle
character?

The input file I posted was converted from commas to dots manually,
using "search and replace", so it was not really converted by my Octave
script. The original file has commas as separators.

The error showing the rectangle character is direct output from the
Octave console when I try to run the script. This is the relevant part
of the script:

------
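% Copy fin to fout line by line, replacing decimal commas with dots
while true
  s_old = fgetl(fin);   % returns -1 (not a string) once the end of file is hit
  if ~ischar(s_old)     % feof only turns true after such a failed read
    break;
  end
  s_old                 % show output

  s_new = strrep(s_old, ',', '.');

  fprintf(fout, '%s\n', s_new);
end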
-----

Everything works fine if the input file is first saved in another
encoding...

2011/5/11 Balogh Richard, Ing. <address@hidden>

Try changing your local settings for the decimal separator: it is usually
a point, not a comma.
Or, replace all dots in your file with commas.
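Independently of the encoding, the comma decimal separator can be handled without touching locale settings. In this Python sketch, float() always expects a point regardless of locale, so replacing the comma first is enough. parse_row is a hypothetical helper; the sample rows are the first three from the thread with their original commas, and thousands separators are not handled.

```python
from typing import List

def parse_row(line: str) -> List[float]:
    """Turn one exported row such as "0,000\t28,500" into floats.

    Hypothetical helper: assumes whitespace-separated columns and a comma
    as the decimal separator, as in the files described in this thread.
    """
    return [float(field.replace(",", ".")) for field in line.split()]

rows = ["0,000\t28,500", "607,143\t25,357", "1214,286\t22,903"]
matrix = [parse_row(r) for r in rows]
print(matrix)  # [[0.0, 28.5], [607.143, 25.357], [1214.286, 22.903]]
```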

Richard Balogh


On 11.5.2011 13:36, Fritz Fischer wrote:

Hello!

I am using Octave for Windows (octave-forge)

I was trying to import a text file that was spit out by our CAD
software into a nice Octave matrix.

But this was not successful due to a problem with the encoding. I cannot
completely figure out why this happens and hope that you can help me.

My text editor tells me that the file from the CAD software is encoded in
"UCS-2 Little Endian".
Just trying to read this file unmodified results in a row of strange
characters in the output file that I am trying to write.

The input file looks like this:

---
0.000 28.500
607.143 25.357
1214.286 22.903
1821.429 20.976
2428.571 19.448
---

Some code I am using:

---
fin = fopen(filename_input, 'rt');
...
s_old = fgetl(fin);
...
---

It seems that only the first line is read incorrectly: if I display
all the lines read into the variable s_old, the first one shows the
typical ASCII rectangle followed by the numbers it should show:

s_old = ■0 , 0 0 0 2 8 , 5 0 0

All other rows look fine.
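A BOM is written only once, at the very beginning of a file, which fits the observation that only the first line carries the extra character. The Python sketch below shows this with two sample lines (values from the thread, CRLF line endings assumed): decoding with a fixed byte order keeps the BOM as the character U+FEFF at the start of line 1 only, while the plain "utf-16" codec consumes it.

```python
import codecs

# Two exported lines, encoded as UCS-2 LE with a BOM.
raw = codecs.BOM_UTF16_LE + "0,000\t28,500\r\n607,143\t25,357\r\n".encode("utf-16-le")

# "utf-16-le" fixes the byte order up front, so the BOM survives
# as U+FEFF at the very start of the text, i.e. in line 1 only.
with_bom = raw.decode("utf-16-le").splitlines()

# "utf-16" reads the BOM to detect the byte order and then discards it.
without_bom = raw.decode("utf-16").splitlines()

print(with_bom)
print(without_bom)
```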

If I first convert the input file to ANSI or UTF-8 using the text
editor, everything works fine.

Since this calculation has to be done for a lot of text files, manually
converting them to a different encoding is not a solution.

Can I somehow change the input format that Octave accepts? Or can
someone suggest another solution?
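For many files, the conversion can be scripted rather than done in the text editor. This Python sketch mirrors the Octave loop from the thread (read each line, replace ',' with '.', write it followed by a newline) but decodes the input as UTF-16 first, which also drops the BOM. The "*.txt" pattern, the "_ascii" output naming and the function names are hypothetical; writing ASCII will raise an error if an export contains non-ASCII characters such as unit symbols, in which case "cp1252" (ANSI) is an alternative.

```python
import pathlib
from typing import List

def convert_export(src: pathlib.Path, dst: pathlib.Path) -> None:
    """Re-encode one UCS-2 export as ASCII with dots as decimal separators."""
    text = src.read_text(encoding="utf-16")  # BOM picks the byte order and is dropped
    lines = text.splitlines()
    dst.write_text("".join(l.replace(",", ".") + "\n" for l in lines), encoding="ascii")

def convert_folder(folder: str, pattern: str = "*.txt") -> List[pathlib.Path]:
    """Convert every matching file, writing <name>_ascii.txt next to it."""
    written = []
    for src in sorted(pathlib.Path(folder).glob(pattern)):
        if src.stem.endswith("_ascii"):
            continue  # skip outputs from an earlier run
        dst = src.with_name(src.stem + "_ascii.txt")
        convert_export(src, dst)
        written.append(dst)
    return written
```

The converted files can then be read in Octave with fopen and fgetl, or loaded directly as numeric data, since they contain only ASCII digits, dots and tabs.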

Kind regards,
Walter



_______________________________________________
Help-octave mailing list
address@hidden
https://mailman.cae.wisc.edu/listinfo/help-octave
