Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches
To: 70689@debbugs.gnu.org
From: Richard Sent
Date: Tue, 30 Apr 2024 22:18:03 -0400

Hi Guix!

When running `guix search`, relevance in the synopsis and description fields is computed strictly from the number of matches, with whole-word and subword matches counted alike. Ideally, if a search string matches an isolated word, that result should be considered more relevant than one that merely matches a subword, even if the subword matches several times.

To illustrate, imagine trying to find which package provides the `rsh` binary by running `$ guix search rsh`.
This binary is part of `inetutils`, whose description field contains:

> Inetutils is a collection of common network programs, such as an ftp
> client and server, a telnet client and server, an rsh client and
> server, and hostname.

This is most likely what the user is interested in. However, inetutils does not show up until roughly the 75th result, with a relevance of 2 (the lowest possible relevance). Almost every result before it contains the string "rsh" only as part of another word, such as "marshaling", "powershell", and "hershey". Because those results match multiple times, they are weighted significantly higher.

Ideally, `guix search` should rank inetutils higher because "rsh" occurs there as a word of its own, not as a fragment of another, unrelated word. (Very, very few people would search for "rsh" hoping to find "hershey", even if "hershey" occurs multiple times.)

Another example is "dig", which is part of the bind package. Searching for "dig" returns garbage because "dig" is a common subword. Bind is scored with a relevance of 2, even though its description emphasises that dig is part of it.

Fixing this would improve the experience when searching for strings that commonly occur as subwords. Since this change can't be made in a vacuum, care should be taken not to reduce the effectiveness of other reasonably foreseeable search queries.

--
Take it easy,
Richard Sent
Making my computer weirder one commit at a time.
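A minimal Python sketch of the proposed weighting, for discussion only. It is a hypothetical model, not Guix's Guile implementation: the function names, the word/subword test (an occurrence counts as a word when the characters on both sides are not letters or digits), and the weights `WORD_WEIGHT = 5` and `SUBWORD_WEIGHT = 1` are all assumptions. The `OTHER` string is made up to stand in for descriptions that contain "rsh" only inside longer words.

```python
import re

# Hypothetical weights chosen for illustration; they are not Guix's values.
WORD_WEIGHT = 5     # an occurrence that stands alone as a word
SUBWORD_WEIGHT = 1  # an occurrence embedded in a longer word


def count_matches(query: str, text: str) -> tuple[int, int]:
    """Classify each case-insensitive occurrence of QUERY in TEXT.

    Returns (word_matches, subword_matches). An occurrence is a word match
    when the characters on both sides of it are not letters or digits.
    """
    words = subwords = 0
    for m in re.finditer(re.escape(query), text, re.IGNORECASE):
        start, end = m.span()
        before = text[start - 1] if start > 0 else " "
        after = text[end] if end < len(text) else " "
        if before.isalnum() or after.isalnum():
            subwords += 1
        else:
            words += 1
    return words, subwords


def naive_score(query: str, text: str) -> int:
    """Every occurrence counts the same, as the report describes today."""
    words, subwords = count_matches(query, text)
    return words + subwords


def weighted_score(query: str, text: str) -> int:
    """Proposed idea: whole-word occurrences outweigh subword ones."""
    words, subwords = count_matches(query, text)
    return WORD_WEIGHT * words + SUBWORD_WEIGHT * subwords


# Description quoted in the report.
INETUTILS = ("Inetutils is a collection of common network programs, such as "
             "an ftp client and server, a telnet client and server, an rsh "
             "client and server, and hostname.")
# Made-up text: "rsh" appears only inside longer words.
OTHER = "Marshaling helpers for a PowerShell-like marshaller."

if __name__ == "__main__":
    for name, text in (("inetutils", INETUTILS), ("other", OTHER)):
        print(name, naive_score("rsh", text), weighted_score("rsh", text))
```

Run as a script, it prints `inetutils 1 5` and `other 3 3`: with a flat count the subword-only text outranks inetutils, while the word bonus reverses the order. How large the bonus should be, and how it would combine with the per-field weights `guix search` already applies, is the open question about not regressing other queries.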