[Freecats-Dev] Indexing - Yet Another (BETTER) Version
From: Henri Chorand
Subject: [Freecats-Dev] Indexing - Yet Another (BETTER) Version
Date: Tue, 28 Jan 2003 13:00:10 +0100
Hi all,
Sorry to bother you with all these successive files. As in the previous version, the ONLY modified part is the fuzzy search algorithm.
Apart from fixing typos in the numbering, I suddenly realized that we need to weight N-Grams so that "long" words do not get a higher priority than "short" words. This was happening because we generated as many N-Grams as we could from every word, so a long word contributed far more N-Grams (and therefore more potential matches) than a short one.
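A minimal sketch of the weighting idea, under assumptions that are not from the attached document: character N-Grams of length 3 or more, and a weight of c/k for an N-Gram occurring c times among a word's k N-Grams, so that every word's weights add up to 1.0 whatever its length. The names `MIN_N`, `all_ngrams`, `weighted_ngrams` and `match_score` are hypothetical.

```python
from collections import Counter

MIN_N = 3  # hypothetical shortest N-Gram length


def all_ngrams(word: str, min_n: int = MIN_N) -> list[str]:
    """Every character N-Gram of `word` with length >= min_n (the old, unweighted scheme)."""
    word = word.lower()
    return [word[i:i + n]
            for n in range(min_n, len(word) + 1)
            for i in range(len(word) - n + 1)]


def weighted_ngrams(word: str, min_n: int = MIN_N) -> dict[str, float]:
    """Map each distinct N-Gram to its share of the word, so one word's weights sum to 1.0."""
    grams = all_ngrams(word, min_n)
    if not grams:
        return {}
    total = len(grams)
    return {g: count / total for g, count in Counter(grams).items()}


def match_score(query_word: str, candidate_word: str) -> float:
    """Sum the weights of the query word's N-Grams that also occur in the candidate word."""
    candidate = set(all_ngrams(candidate_word))
    return sum(w for g, w in weighted_ngrams(query_word).items() if g in candidate)
```

Without weighting, "translation" yields 45 N-Grams against a single one for "cat", so it would dominate any raw match count. With weighting, a perfect match scores 1.0 for both words.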
Another improvement: when a word is very long, we now keep only its longest (and most discriminating) N-Grams and drop the more numerous, and probably less meaningful, short ones. This also makes the search a little faster.
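One possible way to express "keep only the longest N-Grams of a long word" is sketched below. The per-word budget (`MAX_NGRAMS = 28`) and the rule of keeping whole length levels, longest first, until the budget would be exceeded are hypothetical choices made for illustration. They are not the cut-off used in the attached document.

```python
MIN_N = 3        # hypothetical shortest N-Gram length
MAX_NGRAMS = 28  # hypothetical per-word budget; words of up to 9 letters stay complete


def ngrams_longest_first(word: str, min_n: int = MIN_N,
                         max_ngrams: int = MAX_NGRAMS) -> list[str]:
    """Collect N-Grams level by level, from the whole word down to min_n letters,
    stopping before a level that would push the total past max_ngrams."""
    word = word.lower()
    kept: list[str] = []
    for n in range(len(word), min_n - 1, -1):
        level = [word[i:i + n] for i in range(len(word) - n + 1)]
        if kept and len(kept) + len(level) > max_ngrams:
            break  # the remaining, shorter N-Grams are dropped
        kept.extend(level)
    return kept
```

With these settings, an 8-letter word such as "database" keeps all 21 of its N-Grams. An 11-letter word such as "translation" keeps 28 of its 45, down to 5-letter N-Grams. Its 3- and 4-letter N-Grams are dropped, and those are also the ones most likely to occur by chance in unrelated words.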
I have included a basic working algorithm, which can of course be refined.
Context search may handle all this differently.
Thanks for your patience. I won't publish anything else today, I promise!
Regards,
Henri Chorand
Attachment: DB_indexing 0.2.1.rtf (MS-Word document)