bug-gnu-libiconv

Re: [bug-gnu-libiconv] problem with iso-8859-8 encoding


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] problem with iso-8859-8 encoding
Date: Tue, 26 Feb 2008 03:25:28 +0100
User-agent: KMail/1.5.4

Hello,

Alexander Sirotkin wrote:
> I find it hard to believe, but apparently iconv has a problem converting
> iso-8859-8 (Hebrew) to any other encoding, for instance UTF-8. Hebrew
> letters in the result appear in the reverse order.

As you can read in [1], [2], text in ISO 8859-8 is "sometimes in logical,
sometimes in visual order". Therefore the request to convert ISO-8859-8 to
UTF-8 is already ambiguous per se. Some others [3] say that ISO-8859-8 is always
visual... Oh well.

Additionally, conversion between visual and logical order requires an
arbitrary amount of memory (whose size depends on the input); this
does not fit into the way iconv is implemented in GNU libc and in GNU libiconv.

For these reasons, GNU libc and GNU libiconv don't implement this reordering.
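
To illustrate both points, here is a minimal Python sketch (not libiconv
code, and not what ICU does). It assumes a line consisting purely of Hebrew
letters stored in visual order; naive_visual_to_logical is a hypothetical
helper introduced only for this example. Real text that mixes digits, Latin
words or punctuation needs a real bidi algorithm, not a plain reversal.

```python
# Hebrew word "shalom" in logical order (shin, lamed, vav, final mem).
LOGICAL = "\u05e9\u05dc\u05d5\u05dd"

# The same word as a visual-order ISO-8859-8 file would store it:
# leftmost glyph first, i.e. the logical sequence reversed.
visual_bytes = LOGICAL[::-1].encode("iso-8859-8")

# A charset conversion maps each byte to one code point, in input order.
# No reordering happens, so visual-order input stays in visual order.
converted = visual_bytes.decode("iso-8859-8")


def naive_visual_to_logical(line: str) -> str:
    """Hypothetical helper: only valid for a line made purely of Hebrew letters."""
    # The first output character is the last input character, so the
    # whole line has to be buffered before anything can be emitted.
    return line[::-1]


if __name__ == "__main__":
    print(visual_bytes.hex())
    print(converted == LOGICAL)
    print(naive_visual_to_logical(converted) == LOGICAL)
```

The decode step is what a converter like iconv does: one byte in, one
character out, in the same order. The reversal step cannot produce its first
output character before it has seen the end of the line, which is why its
memory use grows with the input.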

Fribidi implements reordering from logical to visual order.

The only free software (that I know of) that does reordering of ISO-8859-8
from visual to logical is ICU, and its documentation [4] says:

  "Legacy systems frequently stored text in visual order to avoid
   reordering for display. When exchanging data with such systems for
   processing in Unicode it is necessary to reorder the data from visual
   order to logical order and back. Such not-for-display transformations
   are sometimes referred to as "storage layout" transformations.

   There are two problems with an "inverse reordering" from visual to
   logical order: There may be more than one logical order of text that
   results in the same display (logical-to-visual reordering is a many-to-one
   function), and there is no standard algorithm for it. ICU's BiDi API
   provides a setting for "inverse" operation that modifies the standard
   Unicode Bidi algorithm. However, it may not always produce the expected
   results. Bidirectional data should be converted to Unicode and reordered
   to logical order only once to avoid roundtrip losses. Just as it is best
   to never convert to non-Unicode charsets, data should not be reordered
   from logical to visual order except for display and printing."

Bruno


[1] http://en.wikipedia.org/wiki/ISO_8859-8
[2] http://en.wikipedia.org/wiki/ISO-8859-8-I
[3] http://www.w3.org/TR/2002/WD-xhtml2-20021211/mod-bidi.html
[4] http://www.icu-project.org/userguide/icu.pdf
