smc-devel

Re: [smc-devel] Re: Malayalam OpenType font (GPLed)


From: Miikka-Markus Alhonen
Subject: Re: [smc-devel] Re: Malayalam OpenType font (GPLed)
Date: Thu, 14 Nov 2002 12:17:43 +0200 (EET)
User-agent: IMP/PHP IMAP webmail program 2.2.6

Hi!

Quoting Baiju M <address@hidden>:
> I don't know whether U+0D4C is ever used in Malayalam as given in the
> chart, but there is a chance, because in Tamil they are still using a
> similar sign (splitting into both sides). I searched a lot to find a
> single document written like that, but I failed, though I will continue
> my search.

OK, now I understand your point. If it wasn't used even before the
script reform in 1974, then something is wrong. While searching
on the Internet, I didn't find an actual text sample, but apparently
the glyph in the charts wasn't invented by the Unicode Consortium. In
the middle of the page

http://www.proel.org/alfabetos/malayala.html

there's an image malayal7.gif titled "Variación del signo silábico",
where all the Malayalam vowels are shown attached to the letter Tta
(U+0D1F). There you can see the same two-side form of the vowel sign
as in the Unicode charts.

In the manual of Malayalam TeX, available at

http://tex.loria.fr/fontes/malayalam.ps.gz

on page 7, both forms are listed. The glyph of U+0D57 is available
through the key sequence au and the glyph of U+0D4C through the
key sequence au". Perhaps these were interchangeable at some point
in the history? Or maybe the Malayalam script was used to write
some other language besides Malayalam, Sanskrit for example?

Do you know what kind of a sign for "au" is encoded in ISCII?

> And I don't know what the purpose of the AU length mark is in Malayalam.
> But it will be very useful for font developers: they only need to
> design that character once and put it there. The same shape appears in
> lots of glyphs, so they can just refer to it, which will reduce font
> size considerably.

I suppose the original reason for encoding both a MALAYALAM VOWEL SIGN AU
and a MALAYALAM AU LENGTH MARK was backwards compatibility with
older standards or fonts. Apparently the Unicode people thought that the
two-side form is used commonly in Malayalam, and allocated a single
codepoint for it, as was done for the other two-side vowels "o" and
"oo". However, not many older technologies supported the idea of a
split glyph in a single codepoint, and this is why a separate codepoint
was assigned for the glyph of the right side of the split vowel. The
same was done with other Indic scripts having split vowel signs, i.e.
Bengali, Oriya, Tamil, Telugu, and Kannada.
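
To see which length marks are encoded for these scripts, one can simply scan the Indic blocks for characters whose name ends in "LENGTH MARK". The following is only a rough sketch using Python's standard unicodedata module; the function name length_marks and the scanned range U+0900..U+0D7F (Devanagari through Malayalam) are my own choices for illustration, and the result depends on the Unicode version shipped with the interpreter.

```python
import unicodedata

def length_marks(start=0x0900, end=0x0D7F):
    """Return (code point, name) pairs for characters named '... LENGTH MARK'.

    The default range covers the Devanagari through Malayalam blocks;
    it is an illustrative choice, not something defined by Unicode.
    """
    found = []
    for cp in range(start, end + 1):
        # Unassigned code points have no name; use "" as the fallback.
        name = unicodedata.name(chr(cp), "")
        if name.endswith("LENGTH MARK"):
            found.append((cp, name))
    return found

for cp, name in length_marks():
    print(f"U+{cp:04X} {name}")
```

The list should include MALAYALAM AU LENGTH MARK at U+0D57 along with the corresponding marks of the other scripts.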

> But we cannot use length mark AU instead of vowel sign AU anywhere.
> It will cause problems while sorting, or should we give the same value
> to both chars? (I don't know the details of the algorithm.)

AFAIK, the collation weights _can_ be changed between Unicode
versions, so sorting need not remain an issue once the Consortium
is made aware that there is a problem. Or, if the Consortium decides
not to change the collation weights, you can always tailor the
sorting algorithm for your own language, as explained in
Unicode Technical Standard #10, available at

http://www.unicode.org/unicode/reports/tr10/
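
Just to illustrate the idea of tailoring, here is a very small Python sketch. It is _not_ an implementation of the collation algorithm in UTS #10: it sorts by plain code point order and only adds one hypothetical rule, namely that a standalone U+0D57 is weighted exactly like U+0D4C. The name tailored_key and the sample strings are my own assumptions.

```python
import unicodedata

AU_SIGN = "\u0d4c"         # MALAYALAM VOWEL SIGN AU
AU_LENGTH_MARK = "\u0d57"  # MALAYALAM AU LENGTH MARK

def tailored_key(text):
    """Sort key treating a standalone U+0D57 like U+0D4C.

    NFC first folds the sequence U+0D46 U+0D57 into U+0D4C, so all
    three spellings of "au" end up with the same key. Everything else
    is compared by code point, which is a simplification.
    """
    nfc = unicodedata.normalize("NFC", text)
    return nfc.replace(AU_LENGTH_MARK, AU_SIGN)

# KA + AU written three ways, plus KA + AA for comparison.
words = ["\u0d15\u0d57", "\u0d15\u0d4c", "\u0d15\u0d46\u0d57", "\u0d15\u0d3e"]
print(sorted(words, key=tailored_key))
```

A real tailoring would instead be expressed as rules on top of the default collation table, but the effect on the two "au" signs would be the same: they get equal weight.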

> > (Besides, I think it _would_ break even the standard, since then you
> > would have to alter the canonical decomposition of U+0D4C, which is
> > strictly against the policy of the Unicode Consortium.)
> 
> Why should Unicode specify the display of characters?
> I think only code points are important; we can render them as we like.

If it were only a question of whether to display e.g. the Latin letter O
with a rectangular glyph or an elliptical glyph, Unicode wouldn't care.
There is however a _permanent_ Unicode property of canonical/compatibility
decomposition assigned to some characters, which cannot be changed.
This means that if a character has a decomposition, the same character
can be encoded in two different Unicode-compliant ways: either directly
as a character by itself, or as a sequence of the constituent parts of
the decomposable character.

In this case this means that MALAYALAM VOWEL SIGN AU U+0D4C is _defined_
so that it can be represented in text with the codepoint U+0D4C, or
as a sequence of the codepoints of MALAYALAM VOWEL SIGN E U+0D46 and
MALAYALAM AU LENGTH MARK U+0D57. Both of these representations should be
treated the same way in every aspect of data processing.
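
The equivalence described above can be checked with Python's standard unicodedata module. This is just a small sketch; the constant names and the helper canonically_equal are mine, and the sample strings use the letter KA (U+0D15) as an arbitrary base consonant.

```python
import unicodedata

AU_SIGN = "\u0d4c"         # MALAYALAM VOWEL SIGN AU
E_SIGN = "\u0d46"          # MALAYALAM VOWEL SIGN E
AU_LENGTH_MARK = "\u0d57"  # MALAYALAM AU LENGTH MARK

# Raw decomposition field of U+0D4C from the Unicode Character Database.
print(unicodedata.decomposition(AU_SIGN))

# NFD splits the single code point into its two constituents ...
decomposed = unicodedata.normalize("NFD", AU_SIGN)
# ... and NFC puts the sequence back together into U+0D4C.
composed = unicodedata.normalize("NFC", E_SIGN + AU_LENGTH_MARK)

def canonically_equal(a, b):
    """True if two strings are canonically equivalent (compared in NFD)."""
    return unicodedata.normalize("NFD", a) == unicodedata.normalize("NFD", b)

print(canonically_equal("\u0d15" + AU_SIGN,
                        "\u0d15" + E_SIGN + AU_LENGTH_MARK))
```

Note that KA followed by U+0D57 alone is _not_ canonically equivalent to KA followed by U+0D4C, which is the root of the sorting question.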

Now, the Unicode Consortium apparently didn't know the situation in
the Malayalam community well enough, and allocated the codepoints in
a way which you find erroneous. They made a mistake, as they have
done also a few times before, which is very unfortunate.

But because of the absolute permanence of a property like canonical
decomposition, the glyph of U+0D4C cannot just be changed to that
of U+0D57. My suggestion would be to start encoding the au sign
in modern Malayalam with the codepoint U+0D57, even though the name
of the character suggests something else. Then, it can just be
considered a misnomer, nothing more.

There have been other misnomers, too, in the history of Unicode,
which some people have tried to get changed. But as with the
case of decomposition, the name of a character is also permanently
allocated, no matter how erroneous it might be. If these kinds of
things _could_ be changed when somebody notices an error, it would
make Unicode a very unstable standard. Even now, many people think
that many of the less vital properties of Unicode characters
should never be changed, since security issues arise quite easily,
if something is drastically changed.

> OpenType has a single substitution feature. Can't I use that,
> even if Yudit splits it into both sides?

No, because as it stands, it is a _requirement_ of the Unicode standard
to split the codepoint U+0D4C into two sides in display. Every
Unicode-compliant program should do so now and in the future, too,
since this won't be changed even if an official proposal were made.

> But single substitution is not working in Yudit. I tested the raghu.ttf
> (Devanagari) font; there is a single substitution for
> vowel sign I (U+093F), and it's not working. Maybe it's a bug.

I'm not sure what you mean by this. For me, raghu.ttf is working
just fine, and Yudit is placing the vowel sign I to the left of the
consonant cluster it is attached to, as it should.

Best regards,
Miikka-Markus Alhonen

(BTW, I'm not officially affiliated with the Unicode Consortium,
so I cannot speak on its behalf; the above-mentioned policies
etc. are just from my own experience with Unicode and from some
clearly articulated official documents.)



