Re: [bug-gnu-libiconv] [PATCH] armscii8 bugfix


From: Bruno Haible
Subject: Re: [bug-gnu-libiconv] [PATCH] armscii8 bugfix
Date: Mon, 12 Jul 2010 11:35:25 +0200
User-agent: KMail/1.9.9

Hello Gayane,

> the mail archive was claiming that it updates the archive every 2 hours,
> so, when I couldn't find my mail after more than 20 hours, I decided to
> re-post the mail using another mail account.

Yes, when you post to the mailing list for the first time, there is a delay,
because of the spam filter.

> My main reference is the True Type font "Arial Armenian" [1].
> [1] http://www.armsite.com/software/fonts/arialarmenian.zip
> It was a very popular font in Armenia in pre-unicode era in MS Windows for
> ARMSCII-8 encoding. Maybe, as many proprietary things, the font is not fully
> standard-compliant.

I can give a font only little weight compared to a standard, if one is
available. Also note that this font is from 1994, whereas the AST 34.002
standard is from 1997. So this font cannot really be authoritative.

> [2] http://unicode.org/mail-arch/unicode-ml/Archives-Old/UML025/1232.html

With the two <quote>s in there, it is not clear what to do with this character.
The description as rotating petals sounds like one should leave it unmapped
(because there is no corresponding Unicode character), whereas the description
as "This symbol has neither functional nor any other meaning in the coding
table except that it begins the Armenian character set" sounds like it should
be dropped during conversion.

Unless you show me some web sites that use this character, I would prefer
to leave it as is: unmapped.
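
To make the two readings concrete, here is a minimal sketch of the difference between "unmapped" and "dropped". It is only an illustration, not libiconv code: the function name handle_eternity_sign and the policy strings are hypothetical, and every byte other than 0xA1 is passed through unchanged for simplicity.

```python
# Hypothetical illustration, not libiconv code: two ways a converter could
# treat the ARMSCII-8 byte 0xA1, which has no Unicode counterpart.
ETERNITY_SIGN = 0xA1


def handle_eternity_sign(data: bytes, policy: str) -> bytes:
    """Apply `policy` to every 0xA1 byte; all other bytes pass through as-is."""
    if policy not in ("unmapped", "drop"):
        raise ValueError(f"unknown policy: {policy!r}")
    out = bytearray()
    for offset, byte in enumerate(data):
        if byte != ETERNITY_SIGN:
            out.append(byte)
        elif policy == "unmapped":
            # The conversion fails, so the caller learns about the character.
            raise UnicodeError(f"byte 0xA1 at offset {offset} has no Unicode mapping")
        # policy == "drop": the byte silently disappears from the output.
    return bytes(out)
```

With "unmapped" the caller sees an error and can decide what to do; with "drop" the character is lost without notice.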

> [3] http://tools.ietf.org/html/draft-melikyan-armenian-charsets-00

This copy is from May 1998. In 2002, I downloaded a newer version of this
text, from June 1999, from <http://www.freenet.am/armscii/armcs-006.html>.
Find it attached.

> [AST 34.002 ?] http://users.freenet.am/~vm/AST/002-ArmSCII-8-Encoding.PDF

Now these two, especially the last one, are excellent references. Thank you!
They make it clear:
  - 0xA1 is Armenian eternity sign - not in Unicode.
  - 0xA2 is Armenian Ligature "ew" U+0587 in the standard and in the new
    version of Melikyan's writeup, but Armenian Section Sign U+00A7 in the old
    version of Melikyan's writeup.
  - 0xA8 is Armenian EM Dash - mapped to U+2014.

Conclusion: libiconv's conversion is exactly as it should be. Nothing should
change.
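
As a sketch only, the three byte values discussed above can be written down as a small lookup table. The names ARMSCII8_DISCUSSED and to_unicode are made up for this illustration, the table covers just these three bytes, and it is not how libiconv stores its tables.

```python
# Illustration covering only the three bytes discussed in this mail.
# None marks a byte that has no Unicode mapping.
ARMSCII8_DISCUSSED = {
    0xA1: None,    # Armenian eternity sign, not in Unicode
    0xA2: 0x0587,  # ARMENIAN SMALL LIGATURE ECH YIWN ("ew")
    0xA8: 0x2014,  # EM DASH
}


def to_unicode(byte: int) -> str:
    """Convert one of the discussed ARMSCII-8 bytes to a Unicode character."""
    if byte not in ARMSCII8_DISCUSSED:
        raise KeyError(f"byte 0x{byte:02X} is outside this three-entry sketch")
    code_point = ARMSCII8_DISCUSSED[byte]
    if code_point is None:
        raise UnicodeError(f"byte 0x{byte:02X} has no Unicode mapping")
    return chr(code_point)
```

For example, to_unicode(0xA2) yields U+0587, while to_unicode(0xA1) raises an error because the byte is unmapped.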

> The thing is, if A2 was the "ARMENIAN SMALL LIGATURE ECH YIWN" and was
> later replaced by "section sign punctuation", then what has happened with
> "ARMENIAN SMALL LIGATURE ECH YIWN"?

Maybe it was not so frequently used, because people did not use this ligature?

> In the mentioned font, the "ARMENIAN SMALL LIGATURE ECH YIWN" is mapped
> to 0xA8.

This means nothing. Software can choose ligatures from fonts automatically.
This happens for the "fi" ligature in many text processing systems.

> However, maybe the issue is that the popular fonts in Armenia are not
> standard-compliant, and now we, who want to convert the old documents to
> unicode, get some errors,

Yes, bad luck. You can use modified versions of libiconv for your local
needs - that's why it's free software -, but for the releases that I distribute,
I'll stick with the standard and highly authoritative references.

Btw:
[3] is interesting. Section 2.2 could be used to improve the LC_COLLATE
section of the hy_AM locale in glibc. Are you familiar with sorting (in lists
and dictionaries) in Armenian? Would you like to work with me on putting proper
Armenian sorting into glibc?

Bruno

Attachment: armcs-006.zip
Description: Zip archive

