bug-gnu-libiconv

Re: [bug-gnu-libiconv] Issue when using iconv 2.12 on RHEL 6.7


From: Lim, Yongkeong
Subject: Re: [bug-gnu-libiconv] Issue when using iconv 2.12 on RHEL 6.7
Date: Sat, 8 Apr 2017 06:11:27 +0000

Hi Bruno,

Thanks for the reply. We tried other source encodings such as TIS-620, but they did not work. Of all the conversions we tried, only ISO-8859-11 managed to convert the Thai characters.

We tried //IGNORE and //TRANSLIT, but neither worked, so we have to use -c to skip the offending characters so that the conversion proceeds with the other records.

We extracted the records that caused errors and ran iconv on them separately, and that works.

Rgds

Yong Keong, Lim

Consultant, Big Data Practice SEA

Dell EMC | Global Services, Consulting

mobile +65 98354354

address@hidden



-------- Original message --------
From: Bruno Haible <address@hidden>
Date: 08/04/2017 00:21 (GMT+07:00)
To: address@hidden
Cc: "Lim, Yongkeong" <address@hidden>, "Boppana, Hari" <address@hidden>, "Sittipongvorakul, Sutthisak" <address@hidden>, "Dhal, Amaresh" <address@hidden>
Subject: Re: [bug-gnu-libiconv] Issue when using iconv 2.12 on RHEL 6.7

Hi,

Lim, Yongkeong wrote:
> I have a data file which we managed to convert using macbook running on
> iconv (GNU libiconv 1.11), no characters got deleted after conversion.
> But when we upload the same file to the RHEL server running on iconv
> (GNU libiconv 2.12), some characters got deleted by the iconv function.
>
> Below is the command we used:
>
> iconv -c -f iso-8859-11 -t utf-8 <source file> > <output file>

The second machine is using iconv from GNU libc, not GNU libiconv.
So, it's two different implementations of the iconv facility.
But both have very similar conversion tables.
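
One way to check which implementation a given machine provides is to look at the first line of `iconv --version`. The sketch below assumes that GNU libiconv mentions "libiconv" on that line and that GNU libc mentions "libc" or "GLIBC"; the function name `iconv_flavor` is hypothetical and the string matching is only a heuristic.

```shell
#!/bin/sh
# Hypothetical helper: guess which iconv implementation is on the PATH.
# Assumption: the first line of `iconv --version` names the implementation.
iconv_flavor() {
    first_line=$(iconv --version 2>/dev/null | head -n 1)
    case $first_line in
        *libiconv*)             echo "GNU libiconv" ;;
        *libc*|*GLIBC*|*glibc*) echo "GNU libc" ;;
        *)                      echo "unknown" ;;
    esac
}
```

Calling `iconv_flavor` on each machine would show whether the two hosts are really running the same implementation.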

For Thai, your file could be in encoding TIS-620, ISO-8859-11, or
Mac-Thai. [1] The conversion tables used by GNU libiconv and GNU libc
for ISO-8859-11 are identical [2], and likewise for TIS-620 [3].

I'd suggest that you
  1) Don't use the option "-c" of iconv - this option produces lossy
     output by design.
  2) Instead, try harder to find the right encoding. That is, try
     iconv -f iso-8859-11 -t utf-8 source > output1
     iconv -f tis-620 -t utf-8 source > output2
     iconv -f macthai -t utf-8 source > output3
     and compare the resulting three output files.
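
The three trial conversions above could be scripted roughly as follows. This is a minimal sketch, not a definitive procedure: the function name `try_encodings` and the output file naming are hypothetical, and some iconv implementations may not recognize every encoding name (in which case that attempt is simply reported as failed). Because `-c` is not used, iconv exits with an error on the first byte it cannot convert instead of silently dropping it.

```shell
#!/bin/sh
# Hypothetical helper: convert one file from each candidate Thai encoding
# to UTF-8, without -c, and report which conversions succeed.
try_encodings() {
    src=$1
    for enc in iso-8859-11 tis-620 macthai; do
        out="$src.$enc.utf8"
        # Keep iconv's diagnostics next to the output for later inspection.
        if iconv -f "$enc" -t utf-8 "$src" > "$out" 2> "$out.err"; then
            echo "$enc: ok"
        else
            echo "$enc: failed"
        fi
    done
}
```

Running `try_encodings source` leaves one `.utf8` file per encoding; the successful ones can then be compared with `cmp` or `diff`, and the `.err` files show why an attempt failed.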

Also, in general, ISO-8859-11 should not be used, since it is *not*
standardized - unlike TIS-620, which is a (national) standard. See [4],[5].

Bruno

[1] https://haible.de/bruno/charsets/conversion-tables/Thai.html
[2] https://haible.de/bruno/charsets/conversion-tables/ISO-8859-11.html
[3] https://haible.de/bruno/charsets/conversion-tables/TIS-620.html
[4] https://en.wikipedia.org/wiki/ISO/IEC_8859-11
[5] https://en.wikipedia.org/wiki/Thai_Industrial_Standard_620-2533

