aspell-devel

[aspell-devel] [aspell #336454] Aspell Malayalam


From: Santhosh Thottingal
Subject: [aspell-devel] [aspell #336454] Aspell Malayalam
Date: Mon, 2 Jul 2007 15:02:59 +0530

Forwarding to this mailing list.
As Gora said, it is strange to make users aware of ZWJ and ZWNJ, but
there is no other way to solve the specific problem of chillaksharas.
We are trying to minimize this problem by using transliteration-based
keyboards. Inscript keyboard layout users must know about these
codepoints as of now. There is a discussion going on about this on the
Unicode mailing list, on whether we need to assign Unicode code points
to chillaksharams or not.

Anyway, can somebody suggest a solution for the aspell-malayalam
problem mentioned in the mail below?

-santhosh

--------
From: Gora Mohanty via RT <address@hidden>
Reply-To: address@hidden
To: address@hidden
Date: Mon, Jul 2, 2007 at 2:29 PM

Dear Santosh,
 I remember our discussing this, and I am sorry that I got awfully
busy on other things. I will take a look at the code for the
Malayalam dictionary that you sent me.
 If you remember, I had suggested that we filter out ZWJ/ZWNJ from
both the words in the dictionary, and from the input words to be
spell-checked. I find it a little strange that you are expecting
end-users to be aware of what ZWJ/ZWNJ are, and how to enter them
correctly. However, it is probably my understanding of Malayalam
chillaksharas that is at fault here, as I am given to understand
that all open-source renderers will now have to start taking this
into account.
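
The filtering idea described above could be sketched roughly as follows.
This is only an illustration in Python, not aspell code: the names
`strip_joiners`, `build_dictionary` and `is_known` are hypothetical, and
the sample word assumes the chillu spelling NA + VIRAMA + ZWJ. Note that
after filtering, the chillu form and the plain NA + VIRAMA form become
indistinguishable, which is the concern raised in this thread.

```python
# Hypothetical sketch: remove ZWNJ/ZWJ from dictionary words and from input
# words before comparing them. Not part of aspell.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER
ZWJ = "\u200d"   # ZERO WIDTH JOINER

# str.translate deletes every character that maps to None.
_JOINERS = {ord(ZWNJ): None, ord(ZWJ): None}


def strip_joiners(word: str) -> str:
    """Return the word with every ZWJ and ZWNJ removed."""
    return word.translate(_JOINERS)


def build_dictionary(words):
    """Store only the joiner-free form of each dictionary word."""
    return {strip_joiners(w) for w in words}


def is_known(word, dictionary):
    """Look up the joiner-free form of an input word."""
    return strip_joiners(word) in dictionary


# Assumed sample: chillu-n written as NA + VIRAMA + ZWJ.
chillu_n = "\u0d28\u0d4d\u200d"
dictionary = build_dictionary([chillu_n])

# The same letters typed without the joiner are now accepted as well.
print(is_known("\u0d28\u0d4d", dictionary))  # True
```
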
 There are two problems here, the first being that ZWJ/ZWNJ have
to be assigned to currently-vacant Unicode Malayalam codepoints.
You are right that this will cause problems later on, should those
codepoints get assigned, but changing the internal aspell encoding
is not too difficult a task. The second problem is that assigning
ZWJ/ZWNJ in this manner does not seem to work in aspell, probably
because it is already aware of the existence of these characters.
Kevin, can you shed more light on this?
 This discussion should also probably be taken to the aspell-devel
mailing list.

Regards,
Gora

------------------------

From: Santhosh Thottingal <address@hidden>
To: Kevin Atkinson <address@hidden>
Date: Mon, Jul 2, 2007 at 12:48 PM

Hi,
I am working on preparing the Aspell Malayalam wordlist. I am facing
a problem related to ZWJ and ZWNJ. In the Malayalam language, the use
of ZWJ and ZWNJ is very common for a particular set of letters named
"chillus". The u-mlym.txt file contains the following entry.
0x80..0xFF = U+0D00..U+0D7F
When we try to prepare the wordlist using this, all the words with ZWJ
and ZWNJ are rejected.
Some of my friends suggested using unused code points in the Malayalam
block for U+200C and U+200D, like this.
0x80..0x81=U+200C..U+200D
0x82..0xFF= U+0D02..U+0D7F
But when I asked about this on the Unicode mailing list, they
discouraged this approach, since these unused code points might be
assigned in the future, and at that point our application would break.
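
To make the two mappings concrete, here is a small Python sketch that
expands range lines of the form quoted above into a code point table and
checks whether a word can be encoded with it. It is not aspell's actual
loader: `parse_ranges` and `encodable` are hypothetical names, the line
syntax is taken only from the entries in this mail, and the sample word
assumes the chillu spelling NA + VIRAMA + ZWJ.

```python
import re

# Matches lines such as "0x80..0xFF = U+0D00..U+0D7F" (spacing around "=" optional).
_RANGE = re.compile(
    r"0x([0-9A-Fa-f]+)\.\.0x([0-9A-Fa-f]+)\s*=\s*"
    r"U\+([0-9A-Fa-f]+)\.\.U\+([0-9A-Fa-f]+)"
)


def parse_ranges(lines):
    """Return a dict mapping each Unicode code point to its 8-bit code."""
    table = {}
    for line in lines:
        m = _RANGE.fullmatch(line.strip())
        if m is None:
            raise ValueError(f"unrecognised range line: {line!r}")
        b_lo, b_hi, u_lo, u_hi = (int(g, 16) for g in m.groups())
        if b_hi - b_lo != u_hi - u_lo:
            raise ValueError(f"range lengths differ: {line!r}")
        for offset in range(b_hi - b_lo + 1):
            table[u_lo + offset] = b_lo + offset
    return table


def encodable(word, table):
    """Assumption: ASCII passes through; other characters must be in the table."""
    return all(ord(ch) < 0x80 or ord(ch) in table for ch in word)


original = parse_ranges(["0x80..0xFF = U+0D00..U+0D7F"])
remapped = parse_ranges([
    "0x80..0x81=U+200C..U+200D",
    "0x82..0xFF= U+0D02..U+0D7F",
])

chillu_n = "\u0d28\u0d4d\u200d"  # NA + VIRAMA + ZWJ (assumed sample)
print(encodable(chillu_n, original))  # False: U+200D has no 8-bit code
print(encodable(chillu_n, remapped))  # True
```

In the remapped table, U+0D00 and U+0D01 no longer have an 8-bit code, so
a word using either of them would be rejected if those code points were
assigned later. That is the breakage the Unicode list warned about.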
Could you please give a solution for this?

Thanks,
Santhosh Thottingal
Swathanthra Malayalam Computing.



