freebangfont-devel

Re: [Freebangfont-devel] Bangla Unicode stuff!


From: Deepayan Sarkar
Subject: Re: [Freebangfont-devel] Bangla Unicode stuff!
Date: Sat, 1 Mar 2003 20:16:22 -0600
User-agent: KMail/1.5

Hi,

I might be missing something, but could someone remind me why we need things 
like ya-phala, ba-phala and khondo-to in Unicode? My understanding of the 
spirit of Unicode is that it is supposed to represent the language, not the 
way it is written (it has some hints for decomposition -- e.g., vowel sign o 
is split into vowel sign e and vowel sign aa, etc. -- but not much more than 
that). 
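
To make the decomposition hint concrete, here is a small sketch using Python's standard unicodedata module. It only shows that BENGALI VOWEL SIGN O (U+09CB) canonically decomposes into vowel sign e followed by vowel sign aa; the constant names are just for illustration.

```python
import unicodedata

VOWEL_SIGN_O = "\u09cb"   # BENGALI VOWEL SIGN O
VOWEL_SIGN_E = "\u09c7"   # BENGALI VOWEL SIGN E
VOWEL_SIGN_AA = "\u09be"  # BENGALI VOWEL SIGN AA

# NFD applies the canonical decomposition recorded in the Unicode data.
decomposed = unicodedata.normalize("NFD", VOWEL_SIGN_O)

for ch in decomposed:
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)}")
```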

This is how Bengali is currently described in Unicode, and it seems to work 
well for the most part. I would be convinced that this needs to be extended 
only when I see an example which either 

(1) cannot be represented in Unicode, or
(2) is ambiguous as to how it should be rendered.



Let's consider the ra+hasanta+ya case first. This seems to be a candidate 
for (2). Does anyone know of a real word which demonstrates this ambiguity? 
Taneem mentioned ra+hasanta+ya, but that's not a word. And in any case, I 
don't recall ever seeing any Bengali word which started with a 'ra' where it 
was displayed as a 'reph', so if we really are forced to take a decision 
about this, I would say ra + ya-phala. But I would contend that we should 
_not_ worry about non-existent words.

The only types of words I can think of where ra+hasanta+ya is rendered with a 
ya-phala are those like

(a) ra + hasanta + ya + A-vowel-sign + pa + A-vowel-sign + ... 
   (for example, ryApAr/wrapper)

(b) pa + hasanta + ra + hasanta + ya + A-vowel-sign + ... (e.g., practical) 

From this, I would be inclined to conjecture that ra+hasanta+ya will be 
rendered with a 'reph' if and only if it follows a vowel (whether in full form 
or as a vowel sign, including the inherent vowel 'a', which has no visible 
sign of its own; its full form is U+0985). In any case, I think this should 
be a decision for the rendering application, not Unicode -- unless someone 
can show me a case where the rendering is really ambiguous -- i.e., two words 
that are rendered differently but consist of exactly the same sequence of 
characters.
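
The conjecture can be written down as a rough sketch. This is not how any real shaping engine works; the function names are hypothetical, the code point ranges are approximate (they include a few unassigned positions), and only the immediately preceding code point is inspected. A consonant directly before the ra is taken to carry the inherent vowel.

```python
RA, YA, HASANTA = "\u09b0", "\u09af", "\u09cd"

def is_consonant(ch):
    # Approximate range KA..HA plus RRA, RHA, YYA (assumption for this sketch).
    return "\u0995" <= ch <= "\u09b9" or ch in "\u09dc\u09dd\u09df"

def is_vowel(ch):
    # Independent vowels (U+0985..U+0994) and vowel signs (U+09BE..U+09CC).
    return "\u0985" <= ch <= "\u0994" or "\u09be" <= ch <= "\u09cc"

def ra_ya_forms(text):
    """Return (index, form) for every ra+hasanta+ya found in text."""
    results = []
    for i in range(len(text) - 2):
        if text[i:i + 3] != RA + HASANTA + YA:
            continue
        prev = text[i - 1] if i > 0 else ""
        # A bare consonant before ra implies the inherent vowel 'a'.
        follows_vowel = prev != "" and (is_vowel(prev) or is_consonant(prev))
        results.append((i, "reph" if follows_vowel else "ya-phala"))
    return results

ryapar = "\u09b0\u09cd\u09af\u09be\u09aa\u09be\u09b0"   # case (a): ryApAr
prya = "\u09aa\u09cd\u09b0\u09cd\u09af\u09be"           # case (b): pa+hasanta+ra+hasanta+ya+A
karya = "\u0995\u09be\u09b0\u09cd\u09af"                # ka + A-vowel-sign + ra+hasanta+ya

for word in (ryapar, prya, karya):
    print(ra_ya_forms(word))
```

Under this rule the first two sequences come out as ya-phala (word-initial, and after a hasanta) and the third as reph, which is what the conjecture predicts.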




Next, consider khondo-to. In my understanding, khondo-to is just a 
representation for ta+hasanta, when it is not combined with the following 
consonant, if any. This is handled well, in my opinion, by a halant form for 
ta in an OpenType font, and ZWNJ. I don't see why it should be more special 
than the other consonants in Bengali. 
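
As an illustration of the sequences involved (a sketch only; what actually appears on screen depends on the font and the shaping engine), the two requests differ by a single ZWNJ. The variable names and the choice of sa as the following consonant are assumptions made for the example.

```python
import unicodedata

TA, HASANTA, ZWNJ = "\u09a4", "\u09cd", "\u200c"
SA = "\u09b8"  # any following consonant would do; sa is just an example

# Without ZWNJ the renderer is free to form a conjunct with the next consonant.
conjunct_request = TA + HASANTA + SA
# With ZWNJ after the hasanta, the conjunct is blocked and the font's
# halant form of ta (the khondo-to shape, if the font provides one) is used.
khondo_to_request = TA + HASANTA + ZWNJ + SA

def describe(seq):
    """List the Unicode character names in a sequence."""
    return [unicodedata.name(ch) for ch in seq]

print(describe(conjunct_request))
print(describe(khondo_to_request))
```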

The only way a case for a distinct khondo-to could be made, in my mind, is if 
someone wanted both the khondo-to and an explicit 'ta + hasanta-mark' to 
represent ta+hasanta. My conjecture is that this is an inconsistency in the 
font, and should be handled by combinations of fonts (or perhaps alternate 
forms in fonts). There should NOT be a distinction between these in terms of 
the linguistic content.



Then, of course, there is the problem of a-yaphalaa-aakaar, a case that is 
perhaps of type (1). My take on this is that this is a modern construct 
which, even though it looks like 'a + yaphala', is really a new letter in the 
alphabet. It would be nice to have this added, but that would probably be 
overkill as it is after all an artificial construct. I would be happy as long 
as there is some accepted way to specify this in a Unicode sequence -- 
whether that is as a new code point, or as a+hasanta+ya+aakaar (I don't think 
that Andy's point that this is not semantically valid is a strong objection. 
Since there is no valid semantic definition of this character, something new 
will have to be made up anyway, it might as well be this. Of course, adding a 
new code point like CDAC apparently does (according to Andy) would be the 
preferred way.)


[An alternative argument might be as follows: that this (a+yaphalaa+aakaar) is 
a new vowel -- and the ya-phalaa+aakaar that's used more traditionally, e.g. 
in the word byAkaraNa, is the vowel sign for this vowel. This might justify 
ya-phalaa as an addition to Unicode. I would disagree with this, saying that 
the construct in byAkaraNa is derived from Devanagari, whereas the 
a+yaphalaa+aakaar is not.]


Anyway, this is all personal and not-very-well-thought-out (and not very 
well-researched either) opinion, so feel free to ignore this if it doesn't 
make sense.

Deepayan

On Saturday 01 March 2003 02:02 pm, Andy White wrote:
> Kaushik Ghose wrote on some other list or some place other ;-)
>
> > Hi Andy et al,
> > yes, its important that "we the people" take a part in fixing
> > the standards. [...]. The end game being arguments for
> > the logical encoding of these "exceptions"
> > Anyone know how IISC handles these cases ?
>
> [e.g. 'A'+Japhalaa+AAkar, Ra+Japhalaa etc.]
>
> (I was wondering about cc'ing to the Indic Standards list.)
>
> The problem is that ISCII has not matured much from its original
> specification. It seems to only have been popular with users of the
> Devanagari script and as such, Bengali 'exceptions' have never been much
> of an issue. In the past, if it didn't work you got a different
> software, right? (or in the programmers case, if it doesn't work, use
> yet another font encoding scheme of your own invention.)
>
> The fact is this; the latest specification of ISCII does not define
> anything for Bangla; not 'khondoTo', nor 'jofola', not even 'bofola'.
> However, implementers of ISCII have used their own minds for these
> things. E.g. CDAC encode AW+JAWFOLA+AAKAR as a separate letter, in a
> spare code slot. They encode 'khondoTa' as Ta+Virama when final, or
> Ta+Virama+INV, to stop it forming a conjunct with the next letter, when
> medial. (As Unicode does not have an INV letter, text encoded this way
> may give us problems in the future.)
>
> Why does Unicode need to be backwardly compatible with ISCII? especially
> even when the Indian govt. doesn't seem to care? (If it cared, it would
> be busy updating ISCII to reflect recent additions to Unicode)
> Personally, I don't care! I would be happy to start afresh with a system
> that really works. I don't think that there is that much iscii data out
> there any way. Unfortunately the majority do seem to care, and any
> proposals for additions to TUCS are always met with the question.
>
> Andy




