[bug-gnu-libiconv] iconv not catching bad bytes for ISO-8859-1

From: Kenneth Reid Beesley
Subject: [bug-gnu-libiconv] iconv not catching bad bytes for ISO-8859-1
Date: Thu, 13 Aug 2015 19:10:22 -0600

Problem: iconv does not detect bad bytes when converting a file that is
declared to be ISO-8859-1 but is not.

Dear All,

I’m using iconv (GNU libiconv 1.14), written by Bruno Haible, on a SUSE Linux
system, and also iconv (GNU libiconv 1.11) on a separate machine (OS X 10.10.4).

1.  I create a file, input1252.txt, that contains hex byte values x91 and x92.  
This file is encoded in CP1252,
where x91 and x92 are legal/defined bytes.

These two bytes are not defined in ISO-8859-1.
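
For anyone reproducing step 1 without the attachment, here is a minimal sketch in POSIX shell. The file content is a hypothetical stand-in (the word "quoted" between the two bytes), not the actual attached input1252.txt; the octal escapes \221 and \222 produce bytes x91 and x92.

```shell
# Hypothetical stand-in for the attached file: CP1252 curly quotes around a word.
# \221 and \222 are octal for bytes 0x91 and 0x92.
printf '\221quoted\222\n' > input1252.txt

# Dump the file byte by byte to confirm that 91 and 92 are present.
od -An -tx1 input1252.txt
```

The od listing should show 91 as the first byte and 92 just before the final 0a.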

2.  I run the following script:

iconv -f ISO-8859-1 -t UTF-8 --byte-subst="<PROBLEM: 0x%x>" \
      --unicode-subst="<PROBLEM: U+%04X>" input1252.txt > out.txt

That is, I tell iconv (incorrectly) that the input file is Latin-1 and ask it
to convert the file to UTF-8. I expect the x91 and x92 bytes to be recognized
as bad bytes, and I expect to see <PROBLEM: 0x91> and <PROBLEM: 0x92> in the
out.txt file. But I don’t see them. The x91 and x92 bytes get copied straight
across to the output file on both of the systems that I’m using.
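
To check exactly which bytes ended up in out.txt, it may help to dump the file in hex rather than view it in a terminal or editor. The sketch below assumes an iconv binary on the PATH and uses a hypothetical one-line input in place of the attachment; the --byte-subst and --unicode-subst options are left out so that the sketch does not depend on them being available.

```shell
# Hypothetical stand-in for input1252.txt: bytes 0x91 and 0x92 around the letter x.
printf '\221x\222\n' > input1252.txt

# Same conversion as in step 2, minus the substitution options.
iconv -f ISO-8859-1 -t UTF-8 input1252.txt > out.txt

# Show every output byte in hex, so single-byte 91/92 can be told apart
# from multi-byte UTF-8 sequences.
od -An -tx1 out.txt
```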

What am I missing?



Attachment: input1252.txt
Description: Text document

Attachment: script
Description: Binary data

Kenneth R. Beesley, D.Phil.
PO Box 540475
North Salt Lake UT 84054
