all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
From: Richard Sent <richard@freakingpenguin.com>
To: 70689@debbugs.gnu.org
Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches
Date: Tue, 30 Apr 2024 22:18:03 -0400	[thread overview]
Message-ID: <87bk5qcm1w.fsf@freakingpenguin.com> (raw)

Hi Guix!

When running guix search, relevance in synopsis and description fields
are computed strictly by the number of matches, both as a word and as a
subword. Ideally, if a search string matches an isolated word in a
search, that result should be considered more relevant than simply
matching a subword, even multiple times.

To illustrate, imagine trying to find what package provides the `rsh`
binary and running running `$ guix search rsh`. This binary is part of
`inetutils` and the description field contains:

> Inetutils is a collection of common network programs, such as an ftp
> client and server, a telnet client and server, an rsh client and
> server, and hostname.

Most likely, this is what the user is interested in. However, inetutils
does not show up until roughly the ~75th result with a relevance of 2
(the lowest possible relevance).

Almost every search result beforehand contains the string "rsh" as a
component of another word, such as "marshaling", "powershell", and
"hershey". However, these match multiple times and are weighted
significantly higher.

Ideally, guix search should rate inetutils higher because the string
"rsh" occurs as its own word, not as a component of another, unrelated
word. (Very, very people would search "rsh" looking for matches with
"hershey", even if "hershey" occurs multiple times.)

Another example of where this can happen is with "dig", part of the bind
package. Searching for "dig" returns garbage because "dig" is a common
subword. Bind is scored with a relevance of 2, even though bind's
description emphasises that dig is part of it.

This would improve the experience when searching with strings that
commonly occur as subwords.

Since this change can't occur in a vacuum, care should be taken not to
reduce the effectiveness of other reasonably forseeable search queries.

-- 
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.




             reply	other threads:[~2024-05-01  2:18 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2024-05-01  2:18 Richard Sent [this message]
2024-05-01 13:45 ` bug#70689: guix search doesn't weigh word matches higher than subword matches bokr
2024-09-13  7:13   ` aurtzy
2024-09-13 15:08 ` Simon Tournier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87bk5qcm1w.fsf@freakingpenguin.com \
    --to=richard@freakingpenguin.com \
    --cc=70689@debbugs.gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.