bug-guix



From: aurtzy
Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches
Date: Fri, 13 Sep 2024 03:13:41 -0400
User-agent: Mozilla Thunderbird

Hi Richard and bokr,

I've proposed changes to relevance scoring that should help with this issue, if you'd like to try it out here: https://issues.guix.gnu.org/73220

Cheers,

aurtzy

> On +2024-04-30 22:18:03 -0400, Richard Sent wrote:
> > Hi Guix!
> >
> > When running guix search, relevance in the synopsis and description
> > fields is computed strictly from the number of matches, whether a match
> > is a whole word or a subword. Ideally, if a search string matches an
> > isolated word in a package's synopsis or description, that result should
> > be considered more relevant than one that merely matches a subword, even
> > multiple times.
> >
> > To illustrate, imagine trying to find what package provides the `rsh`
> > binary and running `$ guix search rsh`. This binary is part of
> > `inetutils` and the description field contains:
> >
> > > Inetutils is a collection of common network programs, such as an ftp
> > > client and server, a telnet client and server, an rsh client and
> > > server, and hostname.
> >
> > Most likely, this is what the user is interested in. However, inetutils
> > does not show up until roughly the 75th result, with a relevance of 2
> > (the lowest possible relevance).
> >
> > Almost every search result beforehand contains the string "rsh" as a
> > component of another word, such as "marshaling", "powershell", and
> > "hershey". However, these match multiple times and are weighted
> > significantly higher.
> >
> > Ideally, guix search should rate inetutils higher because the string
> > "rsh" occurs as its own word, not as a component of another, unrelated
> > word. (Very, very few people would search "rsh" looking for matches with
> > "hershey", even if "hershey" occurs multiple times.)
> >
> > Another example of where this can happen is with "dig", part of the bind
> > package. Searching for "dig" returns garbage because "dig" is a common
> > subword. Bind is scored with a relevance of 2, even though bind's
> > description emphasises that dig is part of it.
> >
> > This would improve the experience when searching with strings that
> > commonly occur as subwords.
> >
> > Since this change can't occur in a vacuum, care should be taken not to
> > reduce the effectiveness of other reasonably foreseeable search queries.
> >
> > --
> > Take it easy,
> > Richard Sent
> > Making my computer weirder one commit at a time.
> >
> >
> >
>
> I like your proposal :)
>
> I'm wondering how [1] compares in what it does for your use(ful) case.
> (I am not familiar with Hyper Estraier beyond being prompted for gnu.org searching)
>
> [1] <https://directory.fsf.org/wiki/Hyper_Estraier>
>
> --
> Regards,
> Bengt Richter
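For illustration only, here is a minimal Python sketch of the weighting Richard describes. It is not how guix search actually scores packages, and it is not the change proposed in issue 73220. The function name relevance, the weights WORD_WEIGHT and SUBWORD_WEIGHT, and the MARSHAL sample text are all hypothetical. The sketch first counts every case-insensitive occurrence of the query. It then gives occurrences bounded by word boundaries a much larger weight than occurrences buried inside longer words.

```python
import re

# Hypothetical weights (not taken from Guix): an isolated-word match is
# worth far more than a match buried inside a longer word.
WORD_WEIGHT = 10
SUBWORD_WEIGHT = 1


def relevance(query: str, text: str) -> int:
    """Score TEXT for QUERY, favouring whole-word over subword matches.

    Assumes QUERY begins and ends with word characters, so that regex
    word boundaries behave as expected.
    """
    pattern = re.escape(query)
    # Every non-overlapping, case-insensitive occurrence, including those
    # inside other words ("marshaling", "powershell", ...).
    total = len(re.findall(pattern, text, flags=re.IGNORECASE))
    # Only occurrences with a word boundary on both sides.
    whole = len(re.findall(rf"\b{pattern}\b", text, flags=re.IGNORECASE))
    return whole * WORD_WEIGHT + (total - whole) * SUBWORD_WEIGHT


# Description of inetutils, quoted from the report above.
INETUTILS = ("Inetutils is a collection of common network programs, such as "
             "an ftp client and server, a telnet client and server, an rsh "
             "client and server, and hostname.")
# Invented sample text that contains "rsh" only inside other words.
MARSHAL = "Marshaling library for marshaling and unmarshaling data."

if __name__ == "__main__":
    for name, text in [("inetutils", INETUTILS), ("marshal", MARSHAL)]:
        print(name, relevance("rsh", text))
```

Run as a script, this prints 10 for inetutils and 3 for the invented sample. So the single isolated "rsh" now outranks three subword hits. With these weights, a text containing ten or more subword matches would still tie or beat one whole-word match. That trade-off is the kind Richard warns about for other reasonably foreseeable queries.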





