From: Simon Tournier
To: Richard Sent
Cc: aurtzy, 70689@debbugs.gnu.org
Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches
Date: Fri, 13 Sep 2024 17:08:19 +0200
Message-ID: <877cbfvbfg.fsf@gmail.com>
In-Reply-To: <87bk5qcm1w.fsf@freakingpenguin.com> (Richard Sent's message of "Tue, 30 Apr 2024 22:18:03 -0400")
References: <87bk5qcm1w.fsf@freakingpenguin.com>

Hi,

On Tue, 30 Apr 2024 at 22:18, Richard Sent wrote:

>> Inetutils is a collection of common network programs, such as an ftp
>> client and server, a telnet client and server, an rsh client and
>> server, and hostname.
>
> Most likely, this is what the user is interested in. However, inetutils
> does not show up until roughly the ~75th result with a relevance of 2
> (the lowest possible relevance).

Using Guix 056910e, I get:

    $ guix search rsh | recsel -CP name | grep -n inetutils
    76:inetutils

Then using the proposed v2 patch#73220 [1], I get:

    $ ./pre-inst-env guix search rsh | recsel -CP name | grep -n inetutils
    34:inetutils

Well, that’s not perfect but a bit better.

> Almost every search result beforehand contains the string "rsh" as a
> component of another word, such as "marshaling", "powershell", and
> "hershey".
> However, these match multiple times and are weighted
> significantly higher.

Well, if we consider the current implementation, the relevance score
of the highest-ranked result reads:

      4 * 0      name
    + 2 * 0      upstream-name
    + 1 * 0      outputs
    + 3 * 2 * 1  synopsis
    + 2 * 4 * 1  description
    + 1 * 0      file-name
    = 14

where each term means: field-weight * match * weight-match

Compared to inetutils:

      4 * 0      name
    + 2 * 0      upstream-name
    + 1 * 0      outputs
    + 3 * 0      synopsis
    + 2 * 1 * 1  description
    + 1 * 0      file-name
    = 2

Well, this case cannot be improved much. First, the field-weights are
almost optimal [2]. Second, the number of occurrences depends on the
description; maybe it could be improved, I have not checked yet.

And v2 of #73220 replaces the value of weight-match: the term ’rsh’ in
“an rsh client” should have a higher score than in “uses `json.Marshal'
and `json.Unmarshal'”. In other words, it reads:

      4 * 0      name
    + 2 * 0      upstream-name
    + 1 * 0      outputs
    + 3 * 0      synopsis
    + 2 * 1 * 3  description
    + 1 * 0      file-name
    = 6

I think this addresses your suggestion.

> Ideally, guix search should rate inetutils higher because the string
> "rsh" occurs as its own word, not as a component of another, unrelated
> word. (Very, very few people would search "rsh" looking for matches with
> "hershey", even if "hershey" occurs multiple times.)

Again, considering the case at hand: if, instead of the 3 arbitrarily
picked in v2 of #73220, we picked 7, then inetutils would be ranked
first.

Yeah, maybe 3 isn’t enough… And maybe 7 is a good choice. Do you have
other examples than ’rsh’?

> Another example of where this can happen is with "dig", part of the bind
> package. Searching for "dig" returns garbage because "dig" is a common
> subword. Bind is scored with a relevance of 2, even though bind's
> description emphasises that dig is part of it.

Please note that using v2 of #73220 with the weight of 7, the package
is returned “third”: a relevance of 14 (behind 24 and 20). However, it
appears 8th in the list because the order of packages having the same
relevance score is arbitrary. It just depends on how the modules are
walked. Therefore, we cannot do much, IMHO.

Cheers,
simon

1: https://issues.guix.gnu.org/73220#1

2: Re: Search improvements (Was: Opposition to new single-letter package name "t")
zimoun
Tue, 09 Mar 2021 19:37:23 +0100
id:CAJ3okZ3+hn0nJP98OhnZYLWJvhLGpdTUK+jB0hoM5JArQxO=zw@mail.gmail.com
https://lists.gnu.org/archive/html/guix-devel/2021-03
https://yhetil.org/guix/CAJ3okZ3+hn0nJP98OhnZYLWJvhLGpdTUK+jB0hoM5JArQxO=zw@mail.gmail.com
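
For illustration, the relevance arithmetic discussed in this message can be
sketched in Python. This only models the formula quoted above
(field-weight * match * weight-match); it is not the code in Guix or in
patch #73220. The whole-word test (no alphanumeric character next to the
match), the `word_weight` parameter, the `rank` helper and the
`example-codec` package are all assumptions made up for the example. Only
the field weights and the inetutils description come from the thread.

```python
import re

# Field weights as listed in the message.
FIELD_WEIGHTS = {
    "name": 4,
    "upstream-name": 2,
    "outputs": 1,
    "synopsis": 3,
    "description": 2,
    "file-name": 1,
}


def relevance(package, term, word_weight=1):
    """Sum field-weight * weight-match over every match of term.

    A match counts word_weight when it stands as its own word and 1 when
    it is part of a longer word (assumed definition, see lead-in).
    """
    pattern = re.compile(re.escape(term), re.IGNORECASE)
    score = 0
    for field, field_weight in FIELD_WEIGHTS.items():
        text = package.get(field, "")
        for m in pattern.finditer(text):
            before = text[m.start() - 1] if m.start() > 0 else " "
            after = text[m.end()] if m.end() < len(text) else " "
            whole_word = not (before.isalnum() or after.isalnum())
            score += field_weight * (word_weight if whole_word else 1)
    return score


def rank(packages, term, word_weight=1):
    """Sort by decreasing relevance; ties keep their input order."""
    return sorted(packages,
                  key=lambda p: relevance(p, term, word_weight),
                  reverse=True)


inetutils = {
    "name": "inetutils",
    "description": ("Inetutils is a collection of common network programs, "
                    "such as an ftp client and server, a telnet client and "
                    "server, an rsh client and server, and hostname."),
}

# Hypothetical package: 2 subword matches in the synopsis, 4 in the
# description, mirroring the 3 * 2 * 1 + 2 * 4 * 1 = 14 case.
example_codec = {
    "name": "example-codec",
    "synopsis": "Marshaling and unmarshaling helpers",
    "description": ("Uses json.Marshal and json.Unmarshal; "
                    "marshals and unmarshals structs."),
}

print(relevance(inetutils, "rsh"))                 # 2
print(relevance(inetutils, "rsh", word_weight=3))  # 6
print(relevance(inetutils, "rsh", word_weight=7))  # 14
print(relevance(example_codec, "rsh"))             # 14
```

With `word_weight=7` both packages score 14, so `rank` returns them in
whatever order they were passed in. That is the same effect as described
above for packages with equal relevance: their position depends only on the
order in which they are visited.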