
Re: [GNUnet-developers] Approximate Searches

From: leo stone
Subject: Re: [GNUnet-developers] Approximate Searches
Date: Thu, 25 Jun 2009 10:05:41 +0200

There are two considerations: how many typos are likely, and how the local filtering is done.

If the local result filtering is not relaxed about typos of the sort "Woh", then it would make no sense at all
to sort the consonants, since non-matching results would get filtered out anyway.
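A local filter that is "relaxed about typos" could, for instance, accept a result when every query word is within a small edit distance of some word in the result's name. This is only an illustrative sketch of that idea, not GNUnet code; the function names and the distance threshold are made up here:

```python
import re

def edit_distance(a, b):
    # Classic dynamic-programming Levenshtein distance.
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for cb in b:
            # Cost of deletion, insertion, or substitution/match.
            cur.append(min(prev[len(cur)] + 1,
                           cur[-1] + 1,
                           prev[len(cur) - 1] + (ca != cb)))
        prev = cur
    return prev[-1]

def matches(query_words, result_name, max_dist=1):
    # Split the result name on anything that is not a letter or digit,
    # then require each query word to be close to some name token.
    words = [w.lower() for w in re.split(r"[^A-Za-z0-9]+", result_name) if w]
    return all(any(edit_distance(q.lower(), w) <= max_dist for w in words)
               for q in query_words)
```

With `max_dist=1` a result like "Woh_the.fuck_is ALICe(2008).divx.avi.WMV" would still pass for the query "fuck alice", while allowing one-character typos in either word.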

If the local filtering can handle those typos, it is still a question of cost vs. gain, and that decision will come down to your gut feeling.
One should consider, though, that most typos will probably happen during search input rather
than when inserting a file. And I must say, if a program is smart enough to handle my search typos,
I am likely to be very pleased. You have a much better idea about the impact on the net, so I can't
really say anything about that.

regards leo

PS: I am wondering whether you have an opinion on the matters I am trying to discuss in the forum.

On Wed, Jun 24, 2009 at 8:15 PM, Christian Grothoff <address@hidden> wrote:
I like this idea (at least as an option, which should probably be the default) and
have added it to the list of things to change for 0.9.x.  What I wonder is whether
sorting the consonants should be omitted or not.  Some statistics on bad
collisions with and without sorting would probably be nice to have...
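The kind of collision statistics asked for above could be gathered with a small script like the following. This is only a toy sketch: the helper names are invented here, and the word list is a tiny illustrative sample; real numbers would need a full dictionary:

```python
from collections import defaultdict

VOWELS = set("aeiou")

def skeleton(word, sort=False):
    # Keep only the consonants of the word, optionally sorted.
    cons = [c for c in word.lower() if c.isalpha() and c not in VOWELS]
    return "".join(sorted(cons) if sort else cons)

def collisions(words, sort=False):
    # Group distinct words by skeleton; keep only buckets with >1 member.
    buckets = defaultdict(set)
    for w in words:
        buckets[skeleton(w, sort)].add(w)
    return {k: v for k, v in buckets.items() if len(v) > 1}

sample = ["who", "woh", "how", "cat", "act", "coat", "tack"]
unsorted_hits = collisions(sample, sort=False)  # "who"/"woh" share "wh"
sorted_hits = collisions(sample, sort=True)     # sorting also pulls in "how"
```

Sorting merges anagram skeletons ("wh" and "hw" both become "hw"), so it catches more transposition typos at the price of more bad collisions; comparing bucket sizes over a dictionary would quantify that trade-off.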


On Tuesday 23 June 2009 07:27:17 leo stone wrote:
> I believe the biggest factor in how we judge a system for future usability
> is how many results we get if we are looking for "something" like
> "something".
> Imagine a shoe shop with only two pairs of shoes in it, and one with a few
> hundred.
> The end result might be the same, you leave both shops without finding
> what you want, but most people will consider
> the shop with a few hundred pairs more promising and worth spending time in
> the next time they try to find some shoes.
> So making sure people are getting results in their searches is probably one
> of the more important issues, after
> my doubts about how the routing is handled.
> Even though it might mean some significant overhead, I would consider doing
> something like normalizing keywords.
> If necessary, per language, but in the beginning English should be enough.
> So if I wanted to share the following file, and I would like it public so
> people can find it, why not store it like this:
> "Woh_the.fuck_is ALICe(2008).divx.avi.WMV"  =>  { HW , HT , CFK , S , CL ,
> 2008 , DVX , V ,  MVW }
> Put the file under the hashes of those nine "keywords".
> When I now search for "fuck alice"  =>  { CFK , CL },
> searching for h(CFK) AND h(CL) will return a lot of wrong but similar
> results, which can then be filtered locally in a more elaborate way.
> It might even be more selective than searching for h(video/x-msvideo).
> At least it returns results, whereas "Woh_the.fuck_is
> ALICe(2008).divx.avi.WMV" as a keyword is something hardly anyone
> would think to search for; the file would therefore never be found and
> never be spread....., except by chance of course.
> regards leo
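The normalization in the quoted mail can be sketched in a few lines: split the filename on non-alphanumeric characters, drop the vowels from each token, keep numbers as-is, and (in the variant shown in the example) sort the consonants of each token. Names here are illustrative, not GNUnet APIs:

```python
import re

VOWELS = set("aeiou")

def normalize(filename, sort_consonants=False):
    # Split on anything that is not a letter or digit.
    tokens = re.split(r"[^A-Za-z0-9]+", filename)
    keywords = []
    for tok in tokens:
        if not tok:
            continue
        if tok.isdigit():
            # Keep pure numbers like years untouched.
            keywords.append(tok)
            continue
        # Consonant "skeleton" of the token, uppercased.
        skel = "".join(c for c in tok.upper() if c.lower() not in VOWELS)
        if sort_consonants:
            # The variant used in the quoted example (and questioned above).
            skel = "".join(sorted(skel))
        if skel:
            keywords.append(skel)
    return keywords
```

With `sort_consonants=True` this reproduces the example from the mail: `normalize("Woh_the.fuck_is ALICe(2008).divx.avi.WMV")` yields the nine keywords `HW, HT, CFK, S, CL, 2008, DVX, V, MVW`, and the query "fuck alice" maps to `CFK, CL`.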
