Re: [GNUnet-developers] Approximate Searches


From: Christian Grothoff
Subject: Re: [GNUnet-developers] Approximate Searches
Date: Sat, 27 Jun 2009 14:03:47 -0600
User-agent: KMail/1.11.4 (Linux/2.6.29-2-686; KDE/4.2.4; i686; ; )

On Thursday 25 June 2009 02:05:41 leo stone wrote:
> There are two considerations: how many typos are likely, and how the local
> filtering is done.
>
> If the local result filtering is not relaxed about typos of the sort "Woh",
> then it would make no sense at all to sort the consonants, since non-matching
> results would get filtered out anyway.
>
> If the local filtering can handle those typos, it's still a question of COST
> vs. GAIN, and this decision will be left to your gut.

We can certainly do some kind of local filtering in the end that can handle
typos; I am just not willing to decide whether to sort based on gut feeling
alone, without any facts.
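One cheap way to get facts on the sort/no-sort question is to run both variants of the scheme over a word list and count (a) pairs of distinct words that collapse to the same key and (b) typos that still map to the intended word's key. The sketch below assumes the rule as read from leo's example further down (uppercase, drop vowels, optionally sort the remaining consonants, keep pure numbers unchanged) and treats Y as a consonant; `consonant_key`, `colliding_pairs` and `typos_caught` are hypothetical names, and the word and typo lists are made up for illustration, so the counts say nothing about real traffic.

```python
from collections import defaultdict
from math import comb

VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant


def consonant_key(word: str, sort: bool) -> str:
    """Uppercase, drop vowels and non-letters, optionally sort the rest.
    Purely numeric tokens (e.g. a year) are kept unchanged."""
    word = word.upper()
    if word.isdigit():
        return word
    consonants = [c for c in word if c.isalpha() and c not in VOWELS]
    return "".join(sorted(consonants) if sort else consonants)


def colliding_pairs(words, sort: bool) -> int:
    """Count pairs of distinct words that end up under the same key."""
    buckets = defaultdict(set)
    for w in words:
        buckets[consonant_key(w, sort)].add(w.lower())
    return sum(comb(len(group), 2) for group in buckets.values())


def typos_caught(pairs, sort: bool) -> int:
    """Count (typo, intended) pairs whose keys are identical."""
    return sum(consonant_key(t, sort) == consonant_key(i, sort) for t, i in pairs)


if __name__ == "__main__":
    # Toy data for illustration only; real numbers need a real keyword corpus.
    words = ["alice", "lace", "cola", "coal", "cell", "dog", "god", "good"]
    typos = [("Woh", "Who"), ("Aclie", "Alice")]
    for sort in (False, True):
        print(f"sort={sort}: colliding pairs={colliding_pairs(words, sort)}, "
              f"typos caught={typos_caught(typos, sort)}/{len(typos)}")
```

On these toy lists, sorting catches the consonant swap in "Aclie" but triples the number of colliding pairs (3 without sorting, 9 with), which is the cost/gain trade-off under discussion; only a corpus of real file names could tell which effect dominates in practice.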

> One should consider, though, that most typos will probably happen during
> search input rather than when inserting a file. And I must say, if a program
> is smart enough to handle my search typos, I am likely to be very pleased.
> You have a much better idea about the impact on the network, so I can't
> really say anything about that.

Well, I wonder whether a better way to deal with typos would be to simply run
a spell-checker when the keyword is entered. That would take care of
permutations and avoid transmitting results that then need to be filtered out
later.
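A rough sketch of entry-time correction, assuming a known vocabulary is available on the client. It uses Python's `difflib.get_close_matches`, which is a generic string-similarity ratio and not a real spell-checker with language rules; `correct_keyword` and the tiny `VOCABULARY` list are hypothetical stand-ins for a proper dictionary.

```python
import difflib
from typing import Sequence

# Hypothetical stand-in for a real per-language dictionary.
VOCABULARY = ["alice", "who", "the", "is", "divx", "avi", "wonderland"]


def correct_keyword(keyword: str, vocabulary: Sequence[str], cutoff: float = 0.6) -> str:
    """Return the lowercased keyword if it is a known word, otherwise the
    closest vocabulary entry scoring at least `cutoff`, otherwise the
    lowercased keyword unchanged (e.g. numbers or unknown names)."""
    word = keyword.lower()
    if word in vocabulary:
        return word
    matches = difflib.get_close_matches(word, vocabulary, n=1, cutoff=cutoff)
    return matches[0] if matches else word


if __name__ == "__main__":
    for typed in ["Woh", "Allice", "2008"]:
        print(typed, "->", correct_keyword(typed, VOCABULARY))
```

Because the correction happens before any query is sent, only the corrected keyword's hash would travel over the network. The trade-off is that a legitimate rare word close to a dictionary word can get "corrected" away, so a client would probably want to offer the suggestion rather than apply it silently.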

> regards leo
>
> PS: I am wondering whether you have an opinion on the matters I am trying
> to discuss in the forum.

I do; it usually takes me longer to get to the forum, but I have now added
some comments there.

Christian

> On Wed, Jun 24, 2009 at 8:15 PM, Christian Grothoff <address@hidden> wrote:
> > I like this idea (at least as an option that should likely be the
> > default) and have added it to the list of things to change for 0.9.x.
> > What I wonder is whether sorting the consonants should be omitted. Some
> > statistics on bad collisions with and without sorting would probably be
> > nice to have...
> >
> > Christian
> >
> > On Tuesday 23 June 2009 07:27:17 leo stone wrote:
> > > I believe the biggest factor in how we judge a system's future usability
> > > is how many results we get when we are looking for "something" like
> > > "something".
> > > Imagine a shoe shop with only two pairs of shoes in it, and another with
> > > a few hundred.
> > >
> > > The result in the end might be the same: you leave both shops without
> > > finding what you want. But most people will consider the shop with
> > > hundreds of pairs more promising, and worth spending time in the next
> > > time they look for shoes.
> > >
> > > So making sure people get results for their searches is probably one of
> > > the more important issues, after my doubts about how routing is handled.
> > >
> > > Even though it might mean significant overhead, I would consider doing
> > > something like normalizing keywords. If need be, this could be done per
> > > language, but in the beginning English should be enough.
> >
> > > So if I wanted to share the following file publicly, so that people can
> > > find it, why not store it like this:
> > >
> > > "Woh_the.fuck_is ALICe(2008).divx.avi.WMV"
> > >     =>  { HW, HT, CFK, S, CL, 2008, DVX, V, MVW }
> > >
> > > Put the file under the hashes of those nine "keywords".
> > >
> > > When I now search for "fuck alice"  =>  { CFK, CL }
> > >
> > > searching for h(CFK) AND h(CL) will return a lot of similar but wrong
> > > results; one can then filter them locally in a more elaborate way.
> > >
> > > It might even be more selective than searching for h(video/x-msvideo).
> > >
> > > At least it returns results, whereas nobody is likely to think of
> > > searching for "Woh_the.fuck_is ALICe(2008).divx.avi.WMV" as a keyword,
> > > so the file would never be found and never be spread, except by chance,
> > > of course.
> > >
> > > regards leo
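The proposal as a whole (normalize, publish under keyword hashes, AND-search, filter locally) can be sketched in a few functions. This is an illustration under stated assumptions, not GNUnet's keyword code: tokens are split on anything that is not a letter or digit; numbers are kept as-is; other tokens lose their vowels (Y treated as a consonant) and have their remaining characters sorted; SHA-512 stands in for h(); a plain dictionary stands in for the network; and the typo-tolerant local filter uses `difflib` similarity with an arbitrary 0.8 cutoff. `normalize`, `publish`, `search` and `local_filter` are hypothetical names.

```python
import difflib
import hashlib
import re

VOWELS = set("AEIOU")  # assumption: Y is treated as a consonant
SPLIT = re.compile(r"[^A-Za-z0-9]+")


def tokens(text):
    """Split on runs of anything that is not a letter or digit."""
    return [t for t in SPLIT.split(text) if t]


def normalize(text):
    """Reduce each token to a key: numbers stay as-is, words lose their
    vowels and have their remaining characters sorted. Duplicates and
    empty keys (all-vowel tokens such as "a") are dropped."""
    keys = []
    for token in tokens(text):
        if token.isdigit():
            key = token
        else:
            key = "".join(sorted(c for c in token.upper() if c not in VOWELS))
        if key and key not in keys:
            keys.append(key)
    return keys


def h(key):
    """Stand-in for the network's keyword hash (SHA-512 chosen arbitrarily)."""
    return hashlib.sha512(key.encode("utf-8")).hexdigest()


def publish(index, filename):
    """Put the file under the hash of every normalized keyword."""
    for key in normalize(filename):
        index.setdefault(h(key), set()).add(filename)


def search(index, query):
    """AND-search: only files stored under the hash of every query key."""
    hits = [index.get(h(key), set()) for key in normalize(query)]
    return set.intersection(*hits) if hits else set()


def local_filter(results, query, cutoff=0.8):
    """Keep a result only if every query word closely matches some word
    of its name (difflib similarity, so small typos are tolerated)."""
    kept = []
    for name in sorted(results):
        words = [w.lower() for w in tokens(name)]
        if all(difflib.get_close_matches(q.lower(), words, n=1, cutoff=cutoff)
               for q in tokens(query)):
            kept.append(name)
    return kept


if __name__ == "__main__":
    index = {}
    for f in ["Woh_the.fuck_is ALICe(2008).divx.avi.WMV",
              "fcuk_lace.avi",              # shares the keys CFK and CL
              "alice_in_wonderland.avi"]:   # shares CL but not CFK
        publish(index, f)

    candidates = search(index, "fuck alice")
    print(sorted(candidates))
    # -> ['Woh_the.fuck_is ALICe(2008).divx.avi.WMV', 'fcuk_lace.avi']
    print(local_filter(candidates, "fuck alice"))
    # -> ['Woh_the.fuck_is ALICe(2008).divx.avi.WMV']
```

With these rules the example file name yields exactly the nine keys listed in the proposal, and "fcuk_lace.avi" shows the kind of similar but wrong result the hashed AND search returns and the local filter then drops.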




