From: aurtzy <aurtzy@gmail.com>
To: 70689@debbugs.gnu.org
Cc: Richard Sent <richard@freakingpenguin.com>, bokr@bokr.com
Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches
Date: Fri, 13 Sep 2024 03:13:41 -0400
Message-ID: <b1592bd0-cd96-4e3a-9f79-a7b793cd5d5c@gmail.com>
In-Reply-To: <20240501134505.GA10144@LionPure>

Hi Richard and bokr,

I've proposed changes to relevance scoring that should help with this
issue. If you'd like to try them out, see: https://issues.guix.gnu.org/73220

Cheers,

aurtzy

 > On +2024-04-30 22:18:03 -0400, Richard Sent wrote:
 > > Hi Guix!
 > >
 > > When running guix search, relevance in synopsis and description fields
 > > is computed strictly by the number of matches, both as a word and as a
 > > subword. Ideally, if a search string matches an isolated word in a
 > > search, that result should be considered more relevant than simply
 > > matching a subword, even multiple times.
 > >
 > > To illustrate, imagine trying to find what package provides the `rsh`
 > > binary and running `$ guix search rsh`. This binary is part of
 > > `inetutils` and the description field contains:
 > >
 > > > Inetutils is a collection of common network programs, such as an ftp
 > > > client and server, a telnet client and server, an rsh client and
 > > > server, and hostname.
 > >
 > > Most likely, this is what the user is interested in. However, inetutils
 > > does not show up until roughly the 75th result with a relevance of 2
 > > (the lowest possible relevance).
 > >
 > > Almost every search result beforehand contains the string "rsh" as a
 > > component of another word, such as "marshaling", "powershell", and
 > > "hershey". However, these match multiple times and are weighted
 > > significantly higher.
 > >
 > > Ideally, guix search should rate inetutils higher because the string
 > > "rsh" occurs as its own word, not as a component of another, unrelated
 > > word. (Very, very few people would search "rsh" looking for matches with
 > > "hershey", even if "hershey" occurs multiple times.)
 > >
 > > Another example of where this can happen is with "dig", part of the bind
 > > package. Searching for "dig" returns garbage because "dig" is a common
 > > subword. Bind is scored with a relevance of 2, even though bind's
 > > description emphasises that dig is part of it.
 > >
 > > This would improve the experience when searching with strings that
 > > commonly occur as subwords.
 > >
 > > Since this change can't occur in a vacuum, care should be taken not to
 > > reduce the effectiveness of other reasonably foreseeable search queries.
 > >
 > > --
 > > Take it easy,
 > > Richard Sent
 > > Making my computer weirder one commit at a time.
 > >
 > >
 > >
 >
 > I like your proposal :)
 >
 > I'm wondering how [1] compares in what it does for your use(ful) case.
 > (I am not familiar with Hyper Estraier beyond being prompted for
 > gnu.org searching)
 >
 > [1] <https://directory.fsf.org/wiki/Hyper_Estraier>
 >
 > --
 > Regards,
 > Bengt Richter
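
To make the request in the quoted report concrete, here is a minimal sketch in Python of scoring that counts whole-word matches more heavily than subword matches. It is not the code Guix uses, nor the change proposed in issue 73220; the function name `relevance` and the weights 4 and 1 are assumptions chosen only for illustration.

```python
import re


def relevance(query, text, word_weight=4, subword_weight=1):
    """Score `text` for `query`; whole-word hits count more than subword hits.

    The weights are hypothetical and are not taken from Guix.
    """
    pattern = re.escape(query.lower())
    lowered = text.lower()
    # Every occurrence of the query, including inside longer words.
    total = len(re.findall(pattern, lowered))
    # Occurrences delimited by word boundaries on both sides.
    whole = len(re.findall(r"\b" + pattern + r"\b", lowered))
    return word_weight * whole + subword_weight * (total - whole)


# Excerpt of the inetutils description quoted in the report.
inetutils_text = "a telnet client and server, an rsh client and server, and hostname"
# Made-up description that only contains "rsh" inside other words.
subword_text = "marshaling and unmarshaling of powershell objects"

print(relevance("rsh", inetutils_text))
print(relevance("rsh", subword_text))
```

With plain match counting the second text would win three matches to one; with the weighting above, the single whole-word match in the inetutils excerpt scores higher than the three subword matches.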

Thread overview: 4+ messages
2024-05-01  2:18 bug#70689: guix search doesn't weigh word matches higher than subword matches Richard Sent
2024-05-01 13:45 ` bokr
2024-09-13  7:13   ` aurtzy [this message]
2024-09-13 15:08 ` Simon Tournier