Document version 006
June 30, 1999
This document presents the set of Armenian characters used in information systems in accordance with the AST 34.001 standard of the State Standards Commission of the Republic of Armenia. It also provides information on the classification and sorting of Armenian characters, and recommendations for implementing basic text processing algorithms.
1. Introduction
2. Basic Character Set
3. Encoding
4. Character Set and Language Tags
5. Acknowledgements
6. Author's Address
7. References
The publication of comments in reference to the standards is due to the following considerations:
1. Armenian character sets have been used in different computer systems since at least 1982, although a national standard was established only in 1997. This time lag resulted in the emergence of incompatible coding systems. Some of the existing discrepancies are also due to the existence of two different grammars of the Armenian language.
2. The emergence of internationalized operating systems and a large number of multilingual applications results in difficulties when national language support is implemented by programmers who are not familiar with Armenian.
The present memo is a recommendation rather than a binding standard.
The recommendations set forth herein are elaborated on the basis of the national standard AST 34.001 (reg.no. 166-97), as well as the ArmSCII Version 2 standard.
The Armenian character set presented below follows the standard AST 34.001. The first column contains the full names of the characters, and the second column provides their abbreviations, which can be used in systems confined to the Latin character set. The detailed classification of the characters follows in the points below.
Although the space, digits and Latin script are also part of the Armenian character set, they were not included in the AST 34.001 standard because they are present in all systems.
Table 1. Basic Character Set
1 | 2 |
---|---|
Armenian Eternity Sign | armeternity |
Armenian Ligature "ew" | armew |
Armenian Section Sign | armsection |
Armenian Full Stop (Verjaket) | armfullstop |
Armenian Right Parenthesis | armparenright |
Armenian Left Parenthesis | armparenleft |
Armenian Right Quotation Mark | armquotright |
Armenian Left Quotation Mark | armquotleft |
Armenian EM Dash | armemdash |
Armenian Dot (Mijaket) | armdot |
Armenian Separation Mark (But) | armsep |
Armenian Comma | armcomma |
Armenian EN Dash | armendash |
Armenian Hyphen (Yentamna) | armyentamna |
Armenian Ellipsis | armellipsis |
Armenian Apostrophe | armapostrophe |
Armenian Exclamation Mark (Amanak) | armexclam |
Armenian Accent (Shesht) | armaccent |
Armenian Question Mark (Paruyk) | armquestion |
Armenian Capital Letter [ayb] | Armayb |
Armenian Small Letter [ayb] | armayb |
Armenian Capital Letter [ben] | Armben |
Armenian Small Letter [ben] | armben |
Armenian Capital Letter [gim] | Armgim |
Armenian Small Letter [gim] | armgim |
Armenian Capital Letter [da] | Armda |
Armenian Small Letter [da] | armda |
Armenian Capital Letter [yech] | Armyech |
Armenian Small Letter [yech] | armyech |
Armenian Capital Letter [za] | Armza |
Armenian Small Letter [za] | armza |
Armenian Capital Letter [e] | Arme |
Armenian Small Letter [e] | arme |
Armenian Capital Letter [at] | Armat |
Armenian Small Letter [at] | armat |
Armenian Capital Letter [to] | Armto |
Armenian Small Letter [to] | armto |
Armenian Capital Letter [zhe] | Armzhe |
Armenian Small Letter [zhe] | armzhe |
Armenian Capital Letter [ini] | Armini |
Armenian Small Letter [ini] | armini |
Armenian Capital Letter [lyun] | Armlyun |
Armenian Small Letter [lyun] | armlyun |
Armenian Capital Letter [khe] | Armkhe |
Armenian Small Letter [khe] | armkhe |
Armenian Capital Letter [tsa] | Armtsa |
Armenian Small Letter [tsa] | armtsa |
Armenian Capital Letter [ken] | Armken |
Armenian Small Letter [ken] | armken |
Armenian Capital Letter [ho] | Armho |
Armenian Small Letter [ho] | armho |
Armenian Capital Letter [dza] | Armdza |
Armenian Small Letter [dza] | armdza |
Armenian Capital Letter [ghat] | Armghat |
Armenian Small Letter [ghat] | armghat |
Armenian Capital Letter [tche] | Armtche |
Armenian Small Letter [tche] | armtche |
Armenian Capital Letter [men] | Armmen |
Armenian Small Letter [men] | armmen |
Armenian Capital Letter [hi] | Armhi |
Armenian Small Letter [hi] | armhi |
Armenian Capital Letter [nu] | Armnu |
Armenian Small Letter [nu] | armnu |
Armenian Capital Letter [sha] | Armsha |
Armenian Small Letter [sha] | armsha |
Armenian Capital Letter [vo] | Armvo |
Armenian Small Letter [vo] | armvo |
Armenian Capital Letter [cha] | Armcha |
Armenian Small Letter [cha] | armcha |
Armenian Capital Letter [pe] | Armpe |
Armenian Small Letter [pe] | armpe |
Armenian Capital Letter [je] | Armje |
Armenian Small Letter [je] | armje |
Armenian Capital Letter [ra] | Armra |
Armenian Small Letter [ra] | armra |
Armenian Capital Letter [se] | Armse |
Armenian Small Letter [se] | armse |
Armenian Capital Letter [vev] | Armvev |
Armenian Small Letter [vev] | armvev |
Armenian Capital Letter [tyun] | Armtyun |
Armenian Small Letter [tyun] | armtyun |
Armenian Capital Letter [re] | Armre |
Armenian Small Letter [re] | armre |
Armenian Capital Letter [tso] | Armtso |
Armenian Small Letter [tso] | armtso |
Armenian Capital Letter [vyun] | Armvyun |
Armenian Small Letter [vyun] | armvyun |
Armenian Capital Letter [pyur] | Armpyur |
Armenian Small Letter [pyur] | armpyur |
Armenian Capital Letter [ke] | Armke |
Armenian Small Letter [ke] | armke |
Armenian Capital Letter [o] | Armo |
Armenian Small Letter [o] | armo |
Armenian Capital Letter [fe] | Armfe |
Armenian Small Letter [fe] | armfe |
The basic character set can be divided into the following functional subsets:
unclassified-symbols ::= {armeternity, armew, armsection}
punctuation-signs ::= {armfullstop, armparenright, armparenleft, armquotright, armquotleft, armemdash, armdot, armsep, armcomma, armendash}
modifier-letters ::= {armyentamna, armellipsis, armapostrophe}
combining-punctuation ::= {armexclam, armaccent, armquestion}
letters ::= {capital-letters, small-letters}
capital-letters ::= {Armayb, Armben, Armgim, Armda, Armyech, Armza, Arme, Armat, Armto, Armzhe, Armini, Armlyun, Armkhe, Armtsa, Armken, Armho, Armdza, Armghat, Armtche, Armmen, Armhi, Armnu, Armsha, Armvo, Armcha, Armpe, Armje, Armra, Armse, Armvev, Armtyun, Armre, Armtso, Armvyun, Armpyur, Armke, Armo, Armfe}
small-letters ::= {armayb, armben, armgim, armda, armyech, armza, arme, armat, armto, armzhe, armini, armlyun, armkhe, armtsa, armken, armho, armdza, armghat, armtche, armmen, armhi, armnu, armsha, armvo, armcha, armpe, armje, armra, armse, armvev, armtyun, armre, armtso, armvyun, armpyur, armke, armo, armfe}
The sorting order is important for letter characters only and should follow the order presented in Table 1.
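The ordering rule above can be sketched as a sort key. The following is a minimal illustration rather than a prescribed algorithm. It assumes text held as Unicode strings with the code points listed in Table 2 (capital letters U+0531 to U+0556, small letters U+0561 to U+0586). The names `table1_rank` and `sort_key` are hypothetical, and skipping non-letter characters is an assumption of this sketch.

```python
# Unicode ranges of the Armenian letters, as listed in Table 2 (column 5).
CAPITAL_FIRST, CAPITAL_LAST = 0x0531, 0x0556
SMALL_FIRST, SMALL_LAST = 0x0561, 0x0586


def table1_rank(ch):
    """Return the position of a letter in Table 1 order, or None for non-letters.

    Table 1 interleaves the cases: Armayb, armayb, Armben, armben, ...
    so a capital gets an even rank and its small letter the next odd rank.
    """
    cp = ord(ch)
    if CAPITAL_FIRST <= cp <= CAPITAL_LAST:
        return 2 * (cp - CAPITAL_FIRST)
    if SMALL_FIRST <= cp <= SMALL_LAST:
        return 2 * (cp - SMALL_FIRST) + 1
    return None


def sort_key(word):
    """Build a sort key from letters only (assumption: other characters are skipped)."""
    ranks = (table1_rank(ch) for ch in word)
    return [r for r in ranks if r is not None]


# Three short letter sequences: small ben + small ayb, capital ayb + small ben,
# small ayb + small ben.
words = ["\u0562\u0561", "\u0531\u0562", "\u0561\u0562"]
ordered = sorted(words, key=sort_key)
```

In an encoding that already follows the Table 1 sequence, such as ArmSCII-8 in Table 2, comparing the raw letter codes gives the same order without a separate key.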
Capitalization applies to letter characters only. The shift from upper case to lower case replaces the capital-letter character with the character that follows it in Table 1. Accordingly, the shift from lower case to upper case replaces the small-letter character with the character that precedes it in Table 1.
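As an illustration of this rule, the sketch below works on ArmSCII-8 codes as listed in Table 2, where the letters run from 0xB2 (Armayb) to 0xFD (armfe) and each capital letter sits on an even code immediately before its small letter. The function names are hypothetical. In an encoding that does not keep the Table 1 sequence, the plus or minus one arithmetic would not apply.

```python
FIRST_LETTER = 0xB2  # Armayb in ArmSCII-8 (Table 2, column 3)
LAST_LETTER = 0xFD   # armfe in ArmSCII-8 (Table 2, column 3)


def is_letter(code):
    """True if the ArmSCII-8 code is an Armenian letter."""
    return FIRST_LETTER <= code <= LAST_LETTER


def is_capital(code):
    """Capital letters occupy the even codes within the letter range."""
    return is_letter(code) and code % 2 == 0


def to_lower(code):
    """Replace a capital letter with the following character; leave others alone."""
    return code + 1 if is_capital(code) else code


def to_upper(code):
    """Replace a small letter with the preceding character; leave others alone."""
    return code - 1 if is_letter(code) and not is_capital(code) else code


def lower_bytes(data):
    """Lower-case every Armenian letter in an ArmSCII-8 byte string."""
    return bytes(to_lower(b) for b in data)
```

Codes outside the letter range, including punctuation and the armapostrophe at 0xFE, pass through unchanged.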
Text search and dictionary applications should take into account the following factors: (1) in the Armenian language, a word is a sequence of letters, combining-punctuation, and modifier-letters; (2) in comparison of words in the text or dictionary, the combining-punctuation and modifier-letters may be ignored.
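Factor (2) can be sketched as a comparison key that drops the ignorable characters before words are compared. The sketch assumes Unicode text and uses the code points given in Table 2 for the combining-punctuation and modifier-letters subsets. The names `search_key` and `same_word` are hypothetical.

```python
# Code points taken from Table 2 (column 5).
COMBINING_PUNCTUATION = {"\u055C", "\u055B", "\u055E"}  # armexclam, armaccent, armquestion
MODIFIER_LETTERS = {"\u058A", "\u2026", "\u055A"}       # armyentamna, armellipsis, armapostrophe
IGNORABLE = COMBINING_PUNCTUATION | MODIFIER_LETTERS


def search_key(word):
    """Return the word with combining-punctuation and modifier-letters removed."""
    return "".join(ch for ch in word if ch not in IGNORABLE)


def same_word(a, b):
    """Compare two words while ignoring the characters in IGNORABLE."""
    return search_key(a) == search_key(b)


# A word written with armquestion after its first letter, and the plain form.
with_mark = "\u056B\u055E\u0576\u0579"
plain = "\u056B\u0576\u0579"
matches = same_word(with_mark, plain)
```

Case folding is deliberately left out here; whether a search should also ignore case is a separate decision for the application.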
In reference to the combining-punctuation, the following factors are important: (1) the combining-punctuation mark follows the letter to which it applies (which can only be a vowel in Armenian), (2) a letter can be followed by more than one combining-punctuation mark.
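The two factors above suggest treating a letter and the marks that follow it as one unit. The sketch below groups a word into such units, assuming Unicode text and the Table 2 code points for the combining-punctuation subset. The function name `split_clusters` is hypothetical, and the sketch does not check that the base letter is a vowel.

```python
# Code points taken from Table 2 (column 5).
COMBINING_PUNCTUATION = {"\u055C", "\u055B", "\u055E"}  # armexclam, armaccent, armquestion


def split_clusters(word):
    """Split a word into (base, marks) pairs.

    Each combining-punctuation mark is attached to the character before it,
    and a base character may collect more than one mark. A mark with nothing
    before it is kept as a base of its own (an assumption of this sketch).
    """
    clusters = []
    for ch in word:
        if ch in COMBINING_PUNCTUATION and clusters:
            base, marks = clusters[-1]
            clusters[-1] = (base, marks + ch)
        else:
            clusters.append((ch, ""))
    return clusters
```

A cursor-movement or deletion routine could then operate on whole pairs rather than on single characters.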
A ligature is a traditional or convenient graphical presentation of a sequence of letters, e.g. the Latin ligature "fi", the German ligature "ss", the Armenian ligature "armmen+armnu", etc. The ligatures can be officially registered and codified (as in the UCS), but the systems supporting ligatures substitute them automatically only on the screen, printer, or other graphical devices.
The Armenian ligature armew, which is a combination of armyech and armvyun, was included in the AST 34.001 standard in view of the following considerations: (1) armew is a "ligature symbol" rather than a ligature, and (2) armew carries an "and" denotation similar to the "&" character.
The Coded Character Set is a mapping of a set of characters into a set of integer numbers, e.g. ArmSCII-7, ArmSCII-8 and ArmSCII-8A tables.
The term "unification" is used in the following denotation: as a rule, the mapping of an Armenian character set takes place in operating environments where other character sets are already available; thus, certain characters, in particular punctuation marks, may have identical graphical mapping and similar functions. In such cases, some characters of the Armenian character set may be mapped into already existing codified characters. The details of unification of Armenian punctuation marks are reviewed below.
The mapping of characters in coding tables has several aspects (in order of priority): (1) scope of the character mapping, (2) sequence of mapping, (3) character unification requirements, (4) general requirements of a given operating environment.
The encoding in every new operating environment should, to the extent possible, use the already existing coding tables (see the next section). Should this be impossible, the newly created coding tables should follow as much as possible the following general principles:
1. The Armenian character set should be comprehensive (with due regard to the unification)
2. The Armenian character set should be mapped into a contiguous sequence of codes in the order in which the characters are presented in Table 1. The codes of unified characters should be left unassigned, i.e. should not be used for other purposes. The most important requirement is the letter sequence.
3. The unification implies both graphical and functional identity of characters. For example, mapping the parentheses (armparenleft and armparenright) into the parentheses existing in ASCII is not an error. On the other hand, the similarity of the Armenian full stop (armfullstop) and the colon is purely graphical. The armdot and armsep bear functions different from the Latin dot and the grave accent character, respectively. Another important factor of character unification is the use of the Latin alphabet and punctuation marks in formal languages. It should be borne in mind, for example, that a comma is often used as a separator in lists (e.g. in a keyword list in an HTML document header), and in order to avoid confusion, the armcomma character may be mapped into the Latin comma.
4. The requirements of a given operating environment may often contradict the above principles. For example, the pseudo-graphical characters in DOS that were supported by video adapters (the "ninth pixel" factor) resulted in the creation of an alternative 8-bit coding table, ArmSCII-8A. Another example is the Macintosh OS, where codes such as ellipsis, nbsp and soft hyphen are recognized and interpreted in a special way by numerous applications, which made meaningful use of the ArmSCII standard in this system impossible (the ArmSCII-8A table is used on the Macintosh OS).
The ArmSCII coding table does not fully correspond to the above principles, and the Armenian block in the current version of Unicode (2.1) satisfies none of principles (1), (2) and (3).
Table 2. Cross reference
1 - Short name
2 - ArmSCII-7
3 - ArmSCII-8
4 - ArmSCII-8A
5 - Unicode Version 2.1
1 | 2 | 3 | 4 | 5 |
---|---|---|---|---|
armeternity | 21 | A1 | DC | - |
armew | 22 | A2 | 15 | 0587 |
armsection | - | - | - | 00A7 |
armfullstop | 23 | A3 | 3A | 0589 |
armparenright | 24 | A4 | 29 | 0029 |
armparenleft | 25 | A5 | 28 | 0028 |
armquotright | 26 | A6 | AF | 00BB |
armquotleft | 27 | A7 | AE | 00AB |
armemdash | 28 | A8 | 2D | 2014 |
armdot | 29 | A9 | 2E | 002E |
armsep | 2A | AA | 60 | 055D |
armcomma | 2B | AB | 2C | 002C |
armendash | 2C | AC | 5F | 002D |
armyentamna | 2D | AD | DD | 058A |
armellipsis | 2E | AE | DE | 2026 |
armapostrophe | 7E | FE | FE | 055A |
armexclam | 2F | AF | 7E | 055C |
armaccent | 30 | B0 | 27 | 055B |
armquestion | 31 | B1 | DF | 055E |
Armayb | 32 | B2 | 80 | 0531 |
armayb | 33 | B3 | 81 | 0561 |
Armben | 34 | B4 | 82 | 0532 |
armben | 35 | B5 | 83 | 0562 |
Armgim | 36 | B6 | 84 | 0533 |
armgim | 37 | B7 | 85 | 0563 |
Armda | 38 | B8 | 86 | 0534 |
armda | 39 | B9 | 87 | 0564 |
Armyech | 3A | BA | 88 | 0535 |
armyech | 3B | BB | 89 | 0565 |
Armza | 3C | BC | 8A | 0536 |
armza | 3D | BD | 8B | 0566 |
Arme | 3E | BE | 8C | 0537 |
arme | 3F | BF | 8D | 0567 |
Armat | 40 | C0 | 8E | 0538 |
armat | 41 | C1 | 8F | 0568 |
Armto | 42 | C2 | 90 | 0539 |
armto | 43 | C3 | 91 | 0569 |
Armzhe | 44 | C4 | 92 | 053A |
armzhe | 45 | C5 | 93 | 056A |
Armini | 46 | C6 | 94 | 053B |
armini | 47 | C7 | 95 | 056B |
Armlyun | 48 | C8 | 96 | 053C |
armlyun | 49 | C9 | 97 | 056C |
Armkhe | 4A | CA | 98 | 053D |
armkhe | 4B | CB | 99 | 056D |
Armtsa | 4C | CC | 9A | 053E |
armtsa | 4D | CD | 9B | 056E |
Armken | 4E | CE | 9C | 053F |
armken | 4F | CF | 9D | 056F |
Armho | 50 | D0 | 9E | 0540 |
armho | 51 | D1 | 9F | 0570 |
Armdza | 52 | D2 | A0 | 0541 |
armdza | 53 | D3 | A1 | 0571 |
Armghat | 54 | D4 | A2 | 0542 |
armghat | 55 | D5 | A3 | 0572 |
Armtche | 56 | D6 | A4 | 0543 |
armtche | 57 | D7 | A5 | 0573 |
Armmen | 58 | D8 | A6 | 0544 |
armmen | 59 | D9 | A7 | 0574 |
Armhi | 5A | DA | A8 | 0545 |
armhi | 5B | DB | A9 | 0575 |
Armnu | 5C | DC | AA | 0546 |
armnu | 5D | DD | AB | 0576 |
Armsha | 5E | DE | AC | 0547 |
armsha | 5F | DF | AD | 0577 |
Armvo | 60 | E0 | E0 | 0548 |
armvo | 61 | E1 | E1 | 0578 |
Armcha | 62 | E2 | E2 | 0549 |
armcha | 63 | E3 | E3 | 0579 |
Armpe | 64 | E4 | E4 | 054A |
armpe | 65 | E5 | E5 | 057A |
Armje | 66 | E6 | E6 | 054B |
armje | 67 | E7 | E7 | 057B |
Armra | 68 | E8 | E8 | 054C |
armra | 69 | E9 | E9 | 057C |
Armse | 6A | EA | EA | 054D |
armse | 6B | EB | EB | 057D |
Armvev | 6C | EC | EC | 054E |
armvev | 6D | ED | ED | 057E |
Armtyun | 6E | EE | EE | 054F |
armtyun | 6F | EF | EF | 057F |
Armre | 70 | F0 | F0 | 0550 |
armre | 71 | F1 | F1 | 0580 |
Armtso | 72 | F2 | F2 | 0551 |
armtso | 73 | F3 | F3 | 0581 |
Armvyun | 74 | F4 | F4 | 0552 |
armvyun | 75 | F5 | F5 | 0582 |
Armpyur | 76 | F6 | F6 | 0553 |
armpyur | 77 | F7 | F7 | 0583 |
Armke | 78 | F8 | F8 | 0554 |
armke | 79 | F9 | F9 | 0584 |
Armo | 7A | FA | FA | 0555 |
armo | 7B | FB | FB | 0585 |
Armfe | 7C | FC | FC | 0556 |
armfe | 7D | FD | FD | 0586 |
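The cross reference in Table 2 can be turned into a conversion routine. The sketch below decodes ArmSCII-8 bytes into Unicode text using columns 3 and 5 of the table. It is an illustration under stated assumptions, not a reference converter: codes below 0x80 are assumed to be plain ASCII, armeternity (0xA1) has no Unicode value in the table and is therefore reported as U+FFFD, as is any other unlisted code. The names `build_armscii8_table` and `decode_armscii8` are hypothetical.

```python
def build_armscii8_table():
    """Build an ArmSCII-8 code -> Unicode character map from Table 2."""
    table = {}
    # Letters: 38 capital/small pairs. Capitals start at 0xB2 and U+0531,
    # small letters at 0xB3 and U+0561; ArmSCII-8 interleaves the cases.
    for i in range(38):
        table[0xB2 + 2 * i] = chr(0x0531 + i)
        table[0xB3 + 2 * i] = chr(0x0561 + i)
    # Symbols and punctuation, copied row by row from Table 2.
    table.update({
        0xA2: "\u0587", 0xA3: "\u0589", 0xA4: ")", 0xA5: "(",
        0xA6: "\u00BB", 0xA7: "\u00AB", 0xA8: "\u2014", 0xA9: ".",
        0xAA: "\u055D", 0xAB: ",", 0xAC: "-", 0xAD: "\u058A",
        0xAE: "\u2026", 0xAF: "\u055C", 0xB0: "\u055B", 0xB1: "\u055E",
        0xFE: "\u055A",
    })
    return table


ARMSCII8_TO_UNICODE = build_armscii8_table()


def decode_armscii8(data):
    """Decode ArmSCII-8 bytes to a Unicode string."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))  # assumption: lower half is ASCII
        else:
            out.append(ARMSCII8_TO_UNICODE.get(b, "\uFFFD"))
    return "".join(out)
```

The reverse direction is not a simple inversion of this map, because several Armenian punctuation marks are unified with Latin characters in the Unicode column (for example armdot and the Latin dot both correspond to U+002E).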
In the systems and protocols using mnemonic tags for coded character sets, the following tags should be used (name, official source):
Name: | armscii-8 |
Source: | Armenian Standard Code for Information Interchange, 8-bit coded character set |
Name: | armscii-8a |
Source: | Armenian Standard Code for Information Interchange, alternative 8-bit coded character set |
Dictionaries, spelling checkers and other linguistic systems, as well as operating environments that distinguish human languages or identify locales, should take into consideration the existence of four mutually incomprehensible forms (dialects) of the Armenian language: Eastern Armenian, Western Armenian, Grabar and Middle Armenian. Table 3 presents suggested MIME-style (RFC-1766) mnemonic tags.
Table 3. Language tags
MIME-style name | Full name |
---|---|
hy-eastern | Eastern Armenian |
hy-western | Western Armenian |
hy-grabar | Grabar |
hy-middle | Middle Armenian |
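An application that needs to recognize these tags could keep them in a small lookup table. The sketch below is one possible shape for that, with the hypothetical names `LANGUAGE_TAGS` and `describe_language_tag`. It compares tags without regard to case, since RFC 1766 treats language tags as case insensitive.

```python
# Suggested tags from Table 3, keyed in lower case.
LANGUAGE_TAGS = {
    "hy-eastern": "Eastern Armenian",
    "hy-western": "Western Armenian",
    "hy-grabar": "Grabar",
    "hy-middle": "Middle Armenian",
}


def describe_language_tag(tag):
    """Return the full name for a Table 3 tag, or None if the tag is not listed."""
    return LANGUAGE_TAGS.get(tag.strip().lower())
```

A bare "hy" tag is not listed in Table 3, so this sketch returns None for it; how to treat an unqualified tag is left to the application.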
This document is the result of long and intensive consultations and cooperation with the staff of the Standards Working Group of the Armenian Computer Center. Special thanks for most valuable inputs and comments go to (in alphabetical order):
Vahram Mekhitarian (vm@acc.am)
Aram Hayrapetian (aramhayr@hotmail.com)
Hovhannes Gizoghian (hkizogh@acc.am)
Tigran Haroutunian (nt1@noyan-tapan.am)
Rouben Taroumian-Hakobian (tarumian@acc.am)
Michael Everson (everson@indigo.ie)
Hovik Melikyan
ArmSCII Working Group
Yerevan, Republic of Armenia
hovik@moon.yerphi.am
[AST 34.001-97]
Information Technologies -- Character Set And Information Encoding: Character Set -- State Standardization Committee of the Republic of Armenia, July 1997
[ArmSCII]
Armenian Standard Code for Information Interchange -- Center of Humane Technologies "Armenian Computer", June 1991
[ArmSCII Version 2]
Armenian Standard Code for Information Interchange, Version 2 -- ArmSCII Working Group, May 1999
[RFC-1766]
Alvestrand, H., "Tags for the Identification of Languages", RFC 1766, March 1995.
[Unicode]
The Unicode Consortium, "The Unicode Standard -- Version 2.0", Addison-Wesley, 1996.
[Unicode Version 2.1]
Unicode Technical Report #8, The Unicode Standard, Version 2.1 -- http://www.unicode.org/unicode/reports/tr8.html