Re: [silpa-discuss] (no subject)
From: Irshad Ahmad
Subject: Re: [silpa-discuss] (no subject)
Date: Thu, 3 Mar 2016 19:12:19 +0530 (IST)
Sorry, I missed the publication link in my previous mail.
Here it is:
http://dl.acm.org/citation.cfm?id=2824872
----- Original Message -----
From: "Irshad Ahmad" <address@hidden>
To: "silpa-discuss" <address@hidden>, "Riyaz Ahmed" <address@hidden>
Sent: Thursday, March 3, 2016 6:59:46 PM
Subject: Re: [silpa-discuss] (no subject)
Thanks for the reply,
Here are the references to my Indic transliteration systems:
Indic-Roman system: https://github.com/irshadbhat/python-irtrans
Hindi-Urdu system: https://github.com/irshadbhat/python-hutrans
Indic-Indic system: https://github.com/irshadbhat/indic-wx-converter
The Indic-Roman system handles Indic-to-Roman transliteration and vice versa. It
currently transliterates between the following language pairs:
English <-> Hindi,
English <-> Gujarati,
English <-> Telugu.
I have a better-performing system in my local repository which also works for
Tamil, Malayalam, Kannada, Bengali, Oriya, Punjabi, Urdu, Assamese etc. I just
haven't committed the changes yet. Please note that the system is not
rule-based; rather, it is built using machine learning.
The Hindi-Urdu system transliterates between Hindi <-> Urdu. It was developed
separately because of the huge vocabulary overlap and other similarities
between the two languages, which make this language pair a special case unlike
other Indian language pairs.
The Indic-Indic system (wx-converter) is actually not a transliteration system.
It converts Indic scripts to WX
(https://en.wikipedia.org/wiki/WX_notation). The main idea of WX is to convert
Indian scripts to a common representation (ASCII) and then convert this ASCII
to Roman letters. The system works reasonably well for transliteration between
Indic scripts because Indic scripts have a special property: their phonemes
are aligned one-to-one across their Unicode tables. The only problem with this
scheme is that some phonemes are missing in some scripts; for example, there is
no "Va" in the Bengali script, which makes transliteration harder. But this can
be handled with some heuristics. This system is completely rule-based, but I can
develop a machine-learning system for Indic-Indic transliteration as well.
Here are a few examples of how this system can be used for transliteration:
echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin |
converter-indic --l mal --s wx
ആമ ആദമീ സേ ആജാദീ ആജ ഭീ കോസോം ദൂര ഹൈ
echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin |
converter-indic --l tel --s wx
ఆమ ఆదమీ సే ఆజాదీ ఆజ భీ కోసోం దూర హై
echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin |
converter-indic --l ori --s wx
ଆମ ଆଦମୀ ସେ ଆଜାଦୀ ଆଜ ଭୀ କୋସୋଂ ଦୂର ହୈ
echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin |
converter-indic --l guj --s wx
આમ આદમી સે આજાદી આજ ભી કોસોં દૂર હૈ
echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin |
converter-indic --l pan --s wx
ਆਮ ਆਦਮੀ ਸੇ ਆਜਾਦੀ ਆਜ ਭੀ ਕੋਸੋਂ ਦੂਰ ਹੈ
echo 'आम आदमी से आजादी आज भी कोसों दूर है' | converter-indic --l hin |
converter-indic --l kan --s wx
ಆಮ ಆದಮೀ ಸೇ ಆಜಾದೀ ಆಜ ಭೀ ಕೋಸೋಂ ದೂರ ಹೈ
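The one-to-one alignment of the Unicode tables mentioned above can be
illustrated with a small Python sketch. This is not how converter-indic works
internally (it goes through WX); it only shows the underlying property, assuming
that each script occupies a 128-code-point Unicode block and that the core
letters sit at the same offset in every block. The function name shift_script
and the "ben" key are hypothetical and chosen for illustration only; the other
keys follow the language codes used in the commands above.

```python
import unicodedata

# First code point of each script's Unicode block (each block is 0x80 wide).
# Keys mirror the codes used with converter-indic above; "ben" is an assumed
# key added only to demonstrate the missing "Va".
BLOCK_START = {
    "hin": 0x0900,  # Devanagari
    "ben": 0x0980,  # Bengali
    "pan": 0x0A00,  # Gurmukhi
    "guj": 0x0A80,  # Gujarati
    "ori": 0x0B00,  # Oriya
    "tel": 0x0C00,  # Telugu
    "kan": 0x0C80,  # Kannada
    "mal": 0x0D00,  # Malayalam
}
BLOCK_SIZE = 0x80


def shift_script(text, src, tgt):
    """Move every character of the source block to the same offset in the
    target block. Characters outside the source block are left unchanged."""
    src_start, tgt_start = BLOCK_START[src], BLOCK_START[tgt]
    out = []
    for ch in text:
        offset = ord(ch) - src_start
        if 0 <= offset < BLOCK_SIZE:
            candidate = chr(tgt_start + offset)
            # Category "Cn" means the code point is unassigned, i.e. the
            # target script has no character at this offset (a missing
            # phoneme). Keep the source character so a heuristic can
            # handle it later.
            if unicodedata.category(candidate) != "Cn":
                ch = candidate
        out.append(ch)
    return "".join(out)


print(shift_script("आम आदमी", "hin", "mal"))  # ആമ ആദമീ
```

With this sketch, the Devanagari "Va" (U+0935) passed through
shift_script(..., "hin", "ben") comes back unchanged, because the Bengali block
has no character at that offset. Code points outside the core letter range do
not always line up between blocks, so a real system needs extra rules there.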
Finally, I would like to share my publication, which describes the procedure
that I've used to build the first two systems.
Thanks
--
Irshad Ahmad
----- Original Message -----
From: "Irshad Ahmad" <address@hidden>
To: "silpa-discuss" <address@hidden>
Sent: Thursday, March 3, 2016 4:31:35 PM
Subject: [silpa-discuss] (no subject)
Hello,
I would like to contribute to the transliteration module of libindic. I have
been working on transliteration for Indian languages for the past six months.
I've already obtained some good results for Indic-script-to-Roman
transliteration and vice versa, and for transliteration within Indic scripts. I
also have more than 1.5 years of Python experience. I would like to share my
ideas and propose some additions regarding Indic transliteration as well. I
was unable to find any mentor information for the project on the wiki page.
Could you please let me know who will be mentoring the project so that I can
discuss my ideas with them?
Thanks
--
Irshad Ahmad