bug-gnu-libiconv

Re: [bug-gnu-libiconv] Updating iconv tables


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] Updating iconv tables
Date: Thu, 12 Jun 2008 02:42:00 +0200
User-agent: KMail/1.5.4

Hi,

I'm not sure I have understood all of this correctly.

> When people have
> gone to convert the EDICT file to UTF8 for other
> systems, the iconv utility simply dies on that character

In summary, you are saying that you have a particular character in EUC-JP,
that the iconv conversion from EUC-JP to UTF-8 does not grok?

Then the character is not EUC-JP.

I'm not sure which character you are talking about. Your mail had an
encoding specification of ISO-2022-JP, which usually means ISO-2022-JP-2,
but that particular character was invalid in ISO-2022-JP-2 (it was encoded
as "ESC $ B - j"). The other character in that line was U+682A, and you
were talking about U+3231.
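
To make the "ESC $ B - j" observation concrete, here is a small sketch of
the arithmetic. ESC $ B selects JIS X 0208, so the two bytes that follow
("-" = 0x2D, "j" = 0x6A) name a row and cell in that set. The helper names
(kuten, to_euc_jp, ASSIGNED_ROWS) are made up for this illustration and are
not part of libiconv; the row ranges are the ones JIS X 0208 assigns.

```python
# Rows that JIS X 0208 assigns: 1-8 (non-kanji) and 16-84 (kanji).
# Rows 9-15 and 85-94 are unassigned in the standard.
ASSIGNED_ROWS = set(range(1, 9)) | set(range(16, 85))


def kuten(b1, b2):
    """Convert a 7-bit two-byte JIS code to a (row, cell) pair."""
    if not (0x21 <= b1 <= 0x7E and 0x21 <= b2 <= 0x7E):
        raise ValueError("not a 94x94 two-byte code")
    return b1 - 0x20, b2 - 0x20


def to_euc_jp(b1, b2):
    """EUC-JP stores the same code with the high bit set on both bytes."""
    return bytes([b1 | 0x80, b2 | 0x80])


seq = b"-j"  # the two bytes following ESC $ B in the mail
row, cell = kuten(seq[0], seq[1])
print("row", row, "cell", cell)
print("row assigned in JIS X 0208:", row in ASSIGNED_ROWS)
print("EUC-JP form of the same code:", to_euc_jp(seq[0], seq[1]).hex())
```

The sequence lands in row 13, which JIS X 0208 leaves empty, so a converter
that follows the standard has nothing to map it to. The same holds for the
corresponding EUC-JP byte pair 0xAD 0xEA.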

> The problem, I conclude, is with the compiled-in tables
> in iconv in the Linux distros. It seems Sun has gone to
> the trouble of keeping theirs up-to-date, but the standard
> distros haven't.

You have a misconception of what EUC-JP is. EUC-JP is a character encoding
scheme based on three standards: ASCII, JIS X 0208, and JIS X 0212. These
are standards issued by Japanese authorities, and carved in stone. Anyone
who thinks that EUC-JP tables have to be "kept up-to-date" is asking for
deviation from standards, and for interoperability problems!

The interoperability problem that you encountered is due *precisely* to
your vendor having added "extensions" to their EUC-JP fonts, and to your
expectation that everyone else has the same extensions in their fonts and
tables!
Take a look at
   http://www.haible.de/bruno/charsets/conversion-tables/EUC-JP.html
to see how many variants of EUC-JP already exist!
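
As a rough illustration of how variants disagree, the sketch below decodes
the same byte pair with several of Python's built-in codecs. These codecs
are Python's own and are not the libiconv tables, and the byte pair
0xAD 0xEA is only an assumption about what the file in question contains
(it is the EUC-JP form of the "ESC $ B - j" sequence mentioned above). The
function name try_decode is made up for this example.

```python
def try_decode(data, codecs=("euc_jp", "euc_jisx0213", "euc_jis_2004")):
    """Decode data with each codec; None marks a codec that rejects it."""
    results = {}
    for name in codecs:
        try:
            results[name] = data.decode(name)
        except UnicodeDecodeError:
            results[name] = None
    return results


candidate = bytes([0xAD, 0xEA])  # assumed bytes, see the note above
for name, text in try_decode(candidate).items():
    if text is None:
        print(name, "-> rejected")
    else:
        # Print code points rather than glyphs to stay terminal-safe.
        print(name, "->", " ".join("U+%04X" % ord(c) for c in text))
```

A codec restricted to JIS X 0208 and JIS X 0212 rejects the pair, while a
codec built on a larger or vendor-extended table may accept it. Which of
the two behaviours you get depends entirely on which "EUC-JP" you asked for.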

Bruno




