Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches
From: aurtzy
To: 70689@debbugs.gnu.org
Cc: Richard Sent, bokr@bokr.com
Date: Fri, 13 Sep 2024 03:13:41 -0400
In-Reply-To: <20240501134505.GA10144@LionPure>
References: <20240501134505.GA10144@LionPure>
List-Id: Bug reports for GNU Guix

Hi Richard and bokr,

I've proposed changes to relevance scoring that should help with this
issue, if you'd like to try it out here:

https://issues.guix.gnu.org/73220

Cheers,

aurtzy

> On +2024-04-30 22:18:03 -0400, Richard Sent wrote:
> > Hi Guix!
> >
> > When running guix search, relevance in synopsis and description fields
> > is computed strictly by the number of matches, both as a word and as a
> > subword. Ideally, if a search string matches an isolated word in a
> > search, that result should be considered more relevant than simply
> > matching a subword, even multiple times.
> >
> > To illustrate, imagine trying to find what package provides the `rsh`
> > binary and running `$ guix search rsh`.
> > This binary is part of `inetutils` and the description field contains:
> >
> > > Inetutils is a collection of common network programs, such as an ftp
> > > client and server, a telnet client and server, an rsh client and
> > > server, and hostname.
> >
> > Most likely, this is what the user is interested in. However, inetutils
> > does not show up until roughly the 75th result with a relevance of 2
> > (the lowest possible relevance).
> >
> > Almost every search result beforehand contains the string "rsh" as a
> > component of another word, such as "marshaling", "powershell", and
> > "hershey". However, these match multiple times and are weighted
> > significantly higher.
> >
> > Ideally, guix search should rate inetutils higher because the string
> > "rsh" occurs as its own word, not as a component of another, unrelated
> > word. (Very, very few people would search "rsh" looking for matches with
> > "hershey", even if "hershey" occurs multiple times.)
> >
> > Another example of where this can happen is with "dig", part of the bind
> > package. Searching for "dig" returns garbage because "dig" is a common
> > subword. Bind is scored with a relevance of 2, even though bind's
> > description emphasises that dig is part of it.
> >
> > This would improve the experience when searching with strings that
> > commonly occur as subwords.
> >
> > Since this change can't occur in a vacuum, care should be taken not to
> > reduce the effectiveness of other reasonably foreseeable search queries.
> >
> > --
> > Take it easy,
> > Richard Sent
> > Making my computer weirder one commit at a time.
>
> I like your proposal :)
>
> I'm wondering how [1] compares in what it does for your use(ful) case.
> (I am not familiar with Hyper Estraier beyond being prompted for gnu.org searching)
>
> [1]
>
> --
> Regards,
> Bengt Richter
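
For anyone following the thread, the idea in the original report (count a whole-word match for more than a subword match) can be sketched roughly as below. This is only an illustration in Python, not Guix's actual scoring code and not the patch proposed in the linked issue; the function name `relevance`, the weights 4 and 1, and the second sample description are all made up for the example.

```python
import re


def relevance(query, text, word_weight=4, subword_weight=1):
    """Hypothetical score: whole-word hits count more than subword hits.

    The weights are arbitrary assumptions chosen for illustration.
    """
    query = query.lower()
    text = text.lower()
    score = 0
    for match in re.finditer(re.escape(query), text):
        start, end = match.start(), match.end()
        # A hit is a whole word when it is not glued to a letter or digit
        # on either side.
        glued_left = start > 0 and text[start - 1].isalnum()
        glued_right = end < len(text) and text[end].isalnum()
        if glued_left or glued_right:
            score += subword_weight
        else:
            score += word_weight
    return score


inetutils = ("Inetutils is a collection of common network programs, such as "
             "an ftp client and server, a telnet client and server, an rsh "
             "client and server, and hostname.")
# Made-up description with three subword hits and no whole-word hit.
other = "Marshaling helpers for powershell, with faster marshaling."

print(relevance("rsh", inetutils), relevance("rsh", other))
```

With plain match counting the second description would win three hits to one; under the assumed weights the single whole-word hit in the inetutils description scores higher.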