aspell-devel

Re: [aspell-devel] Aspell Status Update as of February 12, 2004


From: Lars Aronsson
Subject: Re: [aspell-devel] Aspell Status Update as of February 12, 2004
Date: Sat, 14 Feb 2004 02:35:36 +0100 (CET)

Kevin Atkinson wrote:
> The biggest change in Aspell 0.51 is support for Affix Compression.
> Affix compression is the act of combining several words with a common base
> word into one word which consists of the base word and a list of affixes
> to apply.  (Affix is the generic term for prefix, suffix or infix).  For
> example "alarm alarms alarmed alarming" will become "alarm/SDG" where SDG
> stands for the suffixes of alarm.  This can make a huge difference in
> space for languages which have extensive affixation such as German.

While I welcome this improvement, I object to the term "affix
compression".  Making the dictionary file smaller (compression) might
be one effect of using affix flags, but more important to languages
such as German and Swedish is a guarantee that every grammatically
legal ending for each word is covered by the dictionary.  "Grammatical
completeness" is the desired effect.  It would be sad if this couldn't
be combined with the highly appreciated sounds-alike function of
Aspell.

Ultimately, every time a new word is added to the dictionary, the
correct affix flags should also be added.  There is little point in
adding "alarming" to the dictionary unless "alarm" and "alarms" are
added at the same time.

The OCR software FineReader version 6 and later, at least in its
English version, contains an example of how a user interface for
adding words to a dictionary with affix patterns can be designed.
This is try-and-buy software (for Microsoft Windows), so you can have
a free look at it at http://www.finereader.com/

Roughly speaking, when the user wants to add a word to the dictionary,
she is asked for the word's basic form (alarming -> alarm) and then
all possible endings resulting from the available affix flags are
listed with check boxes.  The user can check the inflections that
apply and submit the new word.

An example: You want to add "going" to the dictionary.  The system
asks what the basic form is.  You enter "go".  The system asks which
endings are legal: goes, goed, going.  You mark goes and going, and
submit.  The system stores go/SG.  (Assuming that /S adds -es to words
that end in a vowel.)
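
A minimal sketch of that workflow in Python may make it concrete.  The
flag letters S, D and G and their suffix rules below are simplified
assumptions made up for illustration; they are not the actual ispell or
Aspell affix definitions, and the function names are hypothetical.

```python
# Hypothetical, simplified suffix rules keyed by flag letter.
# Illustrative assumptions only, not the real ispell/Aspell affix tables.
VOWELS = "aeiou"


def apply_flag(base, flag):
    """Return the inflected form of `base` for one flag letter."""
    if flag == "S":
        # Assumption from the example: /S adds -es after a final vowel, else -s.
        return base + ("es" if base[-1] in VOWELS else "s")
    if flag == "D":
        return base + ("d" if base.endswith("e") else "ed")
    if flag == "G":
        return (base[:-1] if base.endswith("e") else base) + "ing"
    raise ValueError(f"unknown flag: {flag}")


def candidates(base, flags="SDG"):
    """List every form the available flags could generate (the check boxes)."""
    return {flag: apply_flag(base, flag) for flag in flags}


def make_entry(base, accepted_forms, flags="SDG"):
    """Build the dictionary entry from the forms the user checked."""
    chosen = [f for f, form in candidates(base, flags).items()
              if form in accepted_forms]
    return base + "/" + "".join(chosen) if chosen else base


print(candidates("go"))
print(make_entry("go", {"goes", "going"}))
```

With these assumed rules the candidates for "go" are goes, goed and
going, and checking only goes and going yields the entry go/SG.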

Will the affix definition file follow the ispell or myspell format,
or use its own format?

I personally maintain a Swedish dictionary in ispell format from which
I generate my Aspell dictionary, using "ispell -e" for expansion.
Currently I have no good way to add new words interactively when
using Aspell.  I usually open my source dictionary file in Emacs, edit
it, then run "make" to rebuild my dictionaries, all batch oriented.

Ispell comes with the "munchlist" utility that can be helpful in
developing good dictionaries.  If munchlist fails to apply an affix
flag, it is because the expanded dictionary (the current Aspell format)
is missing a form of the word that the flag would generate.  My expanded
Swedish Aspell dictionary has 5.34 times as many words (264K) as my
source file in ispell format (49K).  Munchlist can compress the expanded
list to slightly fewer entries (48K) than my source, because my source
is organized to be grammatically correct rather than mathematically
optimized for list compression.
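
To illustrate why a missing form keeps a flag from being applied, here
is a rough Python sketch of the idea.  It is not how munchlist itself
is implemented; the greedy strategy, the function names and the
simplified English suffix rules for the flags S, D and G are all
assumptions made for this example.

```python
# Hypothetical, simplified suffix rules (illustrative assumptions only).
VOWELS = "aeiou"


def apply_flag(base, flag):
    """Return the form that one flag letter generates from `base`."""
    if flag == "S":
        return base + ("es" if base[-1] in VOWELS else "s")
    if flag == "D":
        return base + ("d" if base.endswith("e") else "ed")
    if flag == "G":
        return (base[:-1] if base.endswith("e") else base) + "ing"
    raise ValueError(f"unknown flag: {flag}")


def munch(words, flags="SDG"):
    """Greedy sketch: give a base word a flag only if the form that flag
    generates is present in the expanded list."""
    words = set(words)
    covered = set()   # forms already produced by some base word's flags
    entries = []
    # Shorter words first, so a base is always seen before its longer forms.
    for word in sorted(words, key=lambda w: (len(w), w)):
        if word in covered:
            continue
        used = [f for f in flags if apply_flag(word, f) in words]
        covered.update(apply_flag(word, f) for f in used)
        entries.append(word + "/" + "".join(used) if used else word)
    return entries


full = ["alarm", "alarms", "alarmed", "alarming"]
print(munch(full))
print(munch(["alarm", "alarms", "alarming"]))  # "alarmed" is missing
```

With the full list the sketch produces the single entry alarm/SDG; with
"alarmed" missing it can only produce alarm/SG, which shows the gap in
the expanded list.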


-- 
  Lars Aronsson (address@hidden)
  Aronsson Datateknik - http://aronsson.se/





