[smc-devel] Re: [Indic-computing-standards] Re: Malayalam Half-U: how


From: Dr. U.B. Pavanaja
Subject: [smc-devel] Re: [Indic-computing-standards] Re: Malayalam Half-U: how
Date: Tue, 12 Nov 2002 18:09:02 +0530

From these discussions I can infer one thing: we need a 
mechanism for choosing one of the many possible display forms 
for a particular combination. 

We have a similar requirement in Kannada for the case of 
"arkavattu" (reph) and "half ra". Both forms of display are 
possible and both are correct. I had mentioned this to the 
people responsible for the OpenType specifications of Indic 
scripts. Currently they have no plans to make this change.

Another point I would like to mention here: the sorting rule in 
Unicode has nothing to do with the order of characters in the 
code charts. They are different things. Unicode has two tables: 
the character chart and the collation table. Details of 
collation are available at 
www.unicode.org/tr10
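
A minimal sketch of the distinction, assuming a toy weight table that is purely illustrative (the weights below are made up and are not taken from the Unicode collation table): sorting by code point and sorting by a separate collation key give different orders for the same strings.

```python
# Toy collation weights: illustrative assumptions only, NOT real
# Unicode collation (DUCET) values.
TOY_WEIGHTS = {"a": 1, "A": 1, "b": 2, "B": 2, "z": 26, "Z": 26}


def toy_sort_key(text):
    """Build a sort key from the toy weight table.

    The weight decides the order; the code point is used only as a
    tie-breaker. Characters missing from the table sort after all
    known ones.
    """
    return [(TOY_WEIGHTS.get(ch, 1000 + ord(ch)), ord(ch)) for ch in text]


words = ["b", "Z", "a", "B"]

# Plain sorted() compares code points: all capitals come first.
by_code_point = sorted(words)

# The same words ordered through the separate weight table.
by_toy_table = sorted(words, key=toy_sort_key)

print(by_code_point)
print(by_toy_table)
```

The point is only that the ordering lives in the key function, not in the code chart.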

Rgds,
-Pavanaja

> 
> In Malayalam (ISO 639-2 language code: mal) there are 37
> 'vyanchanangal' (consonants). These consonants are usually
> pronounced with the support of the 'swaram' (vowel) sound A [U0D05].
> The pure form of a consonant is written with a 'chandrakkala' (virama
> [U0D4D]) above the consonant. When the pure form of a consonant is
> pronounced, there should be a clear sound of the vowel U [U0D09]. Some
> consonants have another form, which is called 'chillu'. A 'chillu' is
> a consonant which does not require any vowel support to pronounce. It
> is written with a vowel sign U [U0D41] and a 'chandrakkala' (virama
> [U0D4D]) above that.
> 
> In fact Malayalam has separate 'lipi' (script) for 7 chillu forms of
> consonants, which are widely used in Malayalam. Since we have separate
> scripts for most of the chillus, in the writing system we have almost
> stopped writing the chillu forms of other consonants (which occur
> rarely) as explained above, although you can still see some texts
> written in this style. Antoine said this is the half form of u, that
> is, the 'samvrutokaram' of U [U0D09]. (In fact 'samvrutokaram' has a
> sound of A and U, so the virama, the vowel sign U and the combination
> of the two are used in different places and texts; some linguists say
> that 'samvrutokaram' has a vowel value.) Now many are writing
> consonants with a virama for the chillu forms of other consonants.
> One example is the one Antoine gave: U0D15 + U0D41 + U0D4D (ka, u,
> virama). So internally a chillu can be represented with a Unicode
> character sequence like this: <consonant> + <vowel sign U [U0D41]> +
> <virama [U0D4D]>. Then you can render the 7 chillu forms with the
> correct script. I will explain how to do this below.
> 
> To make input easy you can use the Inscript keyboard layout
> standardised by the Kerala govt. (They just added chillus to the
> original Inscript keyboard layout at appropriate positions,
> considering the frequency of occurrence of these chillu forms. I will
> explain the drawback of this keyboard layout below.)
> 
> The proposal to include the scripts of the chillu forms of consonants
> as basic characters should not be accepted by the Unicode Consortium.
> (This is going to be submitted (or is already submitted?) by the
> Ministry of Information Technology (Govt. of India), a member of the
> Unicode Consortium.) The proposal includes some other things; in my
> opinion those changes should be accepted.
> 
> Now I will explain how to represent the chillu forms of consonants as
> Unicode sequences. An important thing to notice is that two (or more)
> consonants may have the same script for their chillu forms, and the
> pronunciation is also the same. Even so, each should be represented
> by its correct Unicode sequence. The script for the chillu forms of
> RA [U0D30] and RRA [U0D31] is the same. Similarly, the script for the
> chillu forms of LLA [U0D33] and LLLA [U0D34] is the same. The other
> consonants, which have chillu forms with unique scripts, are NNA
> [U0D23], NA [U0D28] and LA [U0D32].
> 
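
The representation proposed above can be sketched as a small lookup, assuming hypothetical glyph labels (the names such as "chillu-r" are invented for illustration and do not come from any font or standard). It maps a <consonant> + <vowel sign U> + <virama> sequence to one of the 5 shapes shared by the 7 consonants.

```python
VOWEL_SIGN_U = "\u0D41"
VIRAMA = "\u0D4D"

# Hypothetical glyph labels. The grouping follows the post: RA and RRA
# share one shape, LLA and LLLA share another, and NNA, NA, LA each
# have a shape of their own.
CHILLU_GLYPH = {
    "\u0D23": "chillu-nna",  # NNA
    "\u0D28": "chillu-na",   # NA
    "\u0D32": "chillu-la",   # LA
    "\u0D30": "chillu-r",    # RA
    "\u0D31": "chillu-r",    # RRA (same shape as RA)
    "\u0D33": "chillu-ll",   # LLA
    "\u0D34": "chillu-ll",   # LLLA (same shape as LLA)
}


def chillu_glyph(seq):
    """Return the glyph label for <consonant, vowel sign U, virama>.

    Returns None when seq is not such a sequence or when the consonant
    is not one of the 7 listed above.
    """
    if len(seq) == 3 and seq[1] == VOWEL_SIGN_U and seq[2] == VIRAMA:
        return CHILLU_GLYPH.get(seq[0])
    return None


print(chillu_glyph("\u0D30\u0D41\u0D4D"))  # RA sequence
print(chillu_glyph("\u0D31\u0D41\u0D4D"))  # RRA sequence
```

Both sequences resolve to the same shape while the stored text keeps the distinction between RA and RRA.
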
> Why should the 5 scripts of the 7 chillu forms of consonants not be
> included in Unicode?
> ----------------------------------------------------------------------
> 
> * The basic reason is that those 5 'lipi' (scripts) are not part of
>   the Malayalam 'Aksharamala' (character set). Instead these are
>   chillus only (note that a chillu is not a 'koottaksharam'
>   (consonant conjunct)).
> 
> Supporting reasons :-
> 
>   + As I explained above, two (or more) consonants use the same
>     script for their chillu forms. So if these 'simple shapes' become
>     part of Unicode, hard encoding of chillus will be impossible. If
>     someone inputs the correct Unicode sequence, the renderer should
>     still render those characters; this will create more problems.
> 
>   + The sorting rule cannot be implemented effectively.
> 
> Inscript keyboard layout problems :-
> -------------------------------------
> 
>    I think the drawback of the new Inscript keyboard layout
>    standardised by the Kerala govt. will be clear from the above
> discussion. Even so, the layout can be accepted on practical
> grounds. Since we only use those scripts, we can assign any character
> sequence to the keys allocated to them. Here the choice is between
> the RA [U0D30] and RRA [U0D31] chillu, and between LLA [U0D33] and
> LLLA [U0D34]. Considering the accent of pronunciation and the
> frequency of occurrence of these chillus, you can choose RRA [U0D31]
> and LLA [U0D33]. In fact this can only be decided by considering the
> words. For example :- RA [U0D30] + vowel sign U [U0D41] + virama
> [U0D4D] is correct in words : neer - neere (water), avar - avare
> (they),  aar - aare (who) etc.
> 
> and RRA [U0D31] + vowel sign U [U0D41] + virama [U0D4D] is correct in
> words : car - caRe (car), kiNar - kiNaRe (well), sir - saRe (sir) etc.
> 
> So if someone inputs the other correct sequences (without using those
> keys), they should render properly.
> 
> P.S : please reply to address@hidden
> 
> Regards,
> Baiju M
> --
> http://baijum81.tripod.com
> 
> 
> --- In address@hidden, Antoine LECA <address@hidden> wrote:
> > Hi folks,
> >
> > A problem was reported on the Microsoft VOLT mailing list (this
> > list should be dedicated to typography, but it appears that it deals
> > more with Indic scripts, because VOLT is the MS tool used to
> > encode OpenType information in a font, which in turn is required to
> > display Indic scripts on Windows.)
> >
> > The problem deals with the Malayalam half-u. A user reported as an
> > error the fact that Uniscribe displays a dotted circle in the middle
> > of a Malayalam half-u. He wrote
> >         U+0D15 U+0D41 U+0D4D  (ka, u, virama)
> > and Uniscribe displayed (in reformed style) the ku syllable, then a
> > dotted circle, then a virama sign hanging alone.
> >
> > Of course, the problem is that Uniscribe expects virama to come only
> > after consonants, so it displayed it as an error. But I believe the
> > misunderstanding hides a real problem: how can the half-u be
> > displayed? Hence I am coming here to see what the gurus believe
> > about this.
> >
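
The behaviour described above can be modelled with a toy validator. This is a sketch under stated assumptions, not Uniscribe's actual algorithm: it only assumes that a virama is accepted directly after a consonant in the range KA (U+0D15) to HA (U+0D39), and that anything else gets a dotted circle (U+25CC) as a placeholder base. The function names are hypothetical.

```python
DOTTED_CIRCLE = "\u25CC"
VIRAMA = "\u0D4D"


def is_malayalam_consonant(ch):
    """True for Malayalam consonants KA (U+0D15) through HA (U+0D39)."""
    return "\u0D15" <= ch <= "\u0D39"


def mark_stray_viramas(text):
    """Insert a dotted circle before any virama that does not
    directly follow a consonant (toy model of the reported display)."""
    out = []
    prev = ""
    for ch in text:
        if ch == VIRAMA and not is_malayalam_consonant(prev):
            out.append(DOTTED_CIRCLE)
        out.append(ch)
        prev = ch
    return "".join(out)


# ka, vowel sign u, virama: the virama follows a vowel sign.
shaped = mark_stray_viramas("\u0D15\u0D41\u0D4D")
print([f"U+{ord(c):04X}" for c in shaped])
```

For the reported input this yields ka, vowel sign u, dotted circle, virama, which matches the display the user saw.
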
> > To help you, I have done some research. Here is what I have found.
> >
> > First, the phonetic reality: the root is when a word ends with
> > halanta (virama); while in other languages, this "kills" the
> > a-sound, in Malayalam it rather replaces it with the half-u sound,
> > particularly when the consonant is a conjunct. This is for example
> > described in the ISO 15919 standard, available with detailed
> > explanations at Dr Anthony P. Stone's site,
> > <URL:http://homepage.ntlworld.com/stone-catend/trind.htm>
> >
> > According to Varamozhi (a site well informed about Malayalam),
> > <URL:http://varamozhi.sourceforge.net/varamozhi-doc/varamozhi-6.html>
> > when it comes to representation, there exist differing writing
> > "styles" contemplating this single phonetic reality; in North
> > Kerala, usage is to write the halanta sign in place, and Done!
> > Obviously, this is very much in line with the other scripts.
> >
> > However, in South Kerala, as Mr. Cibu said, usage is to write the
> > halanta sign as well as to show the matra for the u vowel. While it
> > is said that this latter usage occurs with the reformed style, I
> > have seen examples with the traditional style as well (although this
> > is from a book printed in Madras, so it might be wrong.) Obviously,
> > the user of Uniscribe intended to display this combination, which to
> > him is the normal way to display a word, when it ends with halanta!
> >
> > Knowing that, we can now notice that Unicode has a note under
> > Malayalam virama (U+0D4D), saying it is the same as Malayalam
> > half-u. To me, this means that under Unicode, the half-u is supposed
> > to *not* be specifically encoded, and only the usage from North
> > Kerala is supposed to be followed.
> >
> > Other relevant information: ISCII-91 seems mute about the subject,
> > and the CDAC products (like iLeap) seem unable to render the half-u
> > in Malayalam (unless one "cheats" using the INV pseudo-consonant.)
> >
> > It is too late to discuss the pros and cons of the choice made by
> > Unicode back in 1992 (probably, they chose to ease as far as
> > possible the unification of encoding, in order to ease sorting and
> > similar tasks.) Now, the problem is, if someone wants to
> > specifically encode the showing of the u matra, in a context (like
> > Uniscribe) where both usages from North and South Kerala could be
> > intended, how should it be done? It seems rather natural to use then
> > the combination
> >                   U+0D41  U+0D4D,
> > following the precedent established in Unicode 3.1 (IIRC) for the
> > modern Bengali A and E initial vowels (from English borrowed words),
> > which are written as Bengali A or E, followed by virama then ya
> > (hence an exception to the rule that virama may only follow a
> > consonant.)
> >
> > Are the gurus here OK with this "solution"?
> >
> > Can it be "sanctified", for example with the inclusion of the
> > adequate words in some revision of Unicode?
> >
> >
> > If this is agreed, when dealing with aspects other than rendering,
> > people should take this into account, and effectively ignore the
> > U+0D41 when followed by U+0D4D, when the task is about searching,
> > sorting, etc. While this is a nuisance, it does not appear
> > completely prohibitive to me. But I admit I have not thought a lot
> > about the consequences of allowing such "presentation encoding."
> >
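
The folding suggested above could look roughly like this. It is a minimal sketch, assuming the folding is applied as a preprocessing step before comparison; the name fold_half_u is hypothetical.

```python
import re

# Vowel sign U (U+0D41) only when it is immediately followed by
# virama (U+0D4D); the lookahead leaves the virama in place.
HALF_U_MATRA = re.compile("\u0D41(?=\u0D4D)")


def fold_half_u(text):
    """Drop U+0D41 when followed by U+0D4D, for searching and sorting.

    Both spellings of the half-u then compare equal, while an ordinary
    vowel sign U (not followed by virama) is left untouched.
    """
    return HALF_U_MATRA.sub("", text)


south_style = "\u0D15\u0D41\u0D4D"  # ka, vowel sign u, virama
north_style = "\u0D15\u0D4D"        # ka, virama

print(fold_half_u(south_style) == fold_half_u(north_style))
```

The stored text is not changed; only the comparison key is folded, so rendering can still honour whichever style was typed.
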
> >
> > Regards,
> > Antoine
> 

-----------------------------------------------------
Dr. U.B. Pavanaja
Editor, Vishva Kannada
World's first Internet magazine in Kannada
http://www.vishvakannada.com/

Note: I don't worry about pselling mixtakes
