[aspell-devel] Two questions about run-together words, and affixes
From: Gora Mohanty
Subject: [aspell-devel] Two questions about run-together words, and affixes
Date: Mon, 11 Dec 2006 15:11:25 +0530
Hello,
I am in the middle of preparing a write-up on using aspell for
Hindi, and, eventually, other Indian languages. In short, the
problems that I was originally facing were due to a
misunderstanding on my part of the format of the soundslike
file.
As measured by the performance on a test list of some 500 words,
it now works reasonably well with the plain Hindi dictionary, i.e.,
without any support for advanced aspell features. Adding soundslike
support makes the performance comparable to the best modes for
English. This is not surprising, as Indian languages are spelt
phonetically.
I still have two questions about issues that would improve
performance:
(a) Run-together words: It seems that for a long misspelled word that
is close to two smaller words, aspell first suggests combinations
of shorter words. For example, in English, "ratdog" turns up
"rat dog" and "rat-dog" as the first two suggestions. I had
thought that this was because of run-together words, but using
"run-together false" in the .dat file does not seem to make a
difference. I understand why one would want to have run-together
words in the suggestion, but is there any way I could eliminate
them (for example, one does not hyphenate words in Hindi), or
use a weighting to reduce their importance, so that they appear
later in the list of suggestions?
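To illustrate the kind of re-weighting meant in (a), here is a minimal Python sketch that post-processes a suggestion list outside aspell, moving any suggestion containing a space or hyphen to the end. The function name `demote_split_suggestions` is hypothetical, and apart from "rat dog" and "rat-dog" the sample suggestions are made up for illustration; this is a client-side workaround under those assumptions, not an aspell feature.

```python
def demote_split_suggestions(suggestions, split_chars=(" ", "-")):
    """Return suggestions with single-word entries first, split entries last.

    Hypothetical helper: it only re-orders a list that a spell checker
    has already produced; it does not change aspell's own scoring.
    """
    def is_split(suggestion):
        # True when the suggestion is a word broken by a space or hyphen.
        return any(ch in suggestion for ch in split_chars)

    # sorted() is stable, so the original order is kept within each group
    # (False sorts before True, so unsplit suggestions come first).
    return sorted(suggestions, key=is_split)


# Illustrative input only; "ratted" and "radon" are invented entries.
print(demote_split_suggestions(["rat dog", "rat-dog", "ratted", "radon"]))
```

A stable sort is used so that aspell's relative ranking is preserved inside each group; only the split suggestions are pushed down.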
(b) Affix rules: Though affix rules seem to be working properly for
Hindi, is there any way that I could have aspell accept, e.g.,
"word + suffix" as correct, when only "word" is in the dictionary,
but there is an affix rule for "word + suffix"? Alternatively,
would it be possible for "word + suffix" to appear as the first
suggestion in such a case? The reason that this would be useful
is that Hindi makes heavy use of suffixes, and without these
being marked correct, an auto-spellchecked document gets cluttered
with spurious underlinings.
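The behaviour asked for in (b) can be sketched in Python as follows. This is only an illustration of the desired acceptance rule, assuming a plain set of stems and a flat list of suffixes; the name `accepts` is hypothetical, the English sample data stands in for Hindi, and real affix rules (with per-word flags, stripping, and conditions) are deliberately ignored.

```python
def accepts(word, stems, suffixes):
    """Accept a word if it is a known stem, or a known stem plus a known suffix.

    Hypothetical simplification: suffixes are appended unchanged to the stem,
    with none of the strip/condition logic of a real affix file.
    """
    if word in stems:
        return True
    for suffix in suffixes:
        # Skip empty suffixes and suffixes that would leave no stem.
        if suffix and len(word) > len(suffix) and word.endswith(suffix):
            if word[: -len(suffix)] in stems:
                return True
    return False


stems = {"walk", "talk"}      # placeholder dictionary
suffixes = ["ing", "ed"]      # placeholder suffix list
print(accepts("walking", stems, suffixes))
print(accepts("walkzz", stems, suffixes))
```

With a rule of this shape, "word + suffix" would no longer be underlined whenever "word" is in the dictionary and the suffix is known.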
Regards,
Gora