From: bokr@bokr.com
To: Richard Sent
Cc: 70689@debbugs.gnu.org
Date: Wed, 1 May 2024 15:45:05 +0200
Subject: bug#70689: guix search doesn't weigh word matches higher than subword matches

On +2024-04-30 22:18:03 -0400, Richard Sent wrote:
> Hi Guix!
>
> When running guix search, relevance in the synopsis and description
> fields is computed strictly by the number of matches, both as a word
> and as a subword. Ideally, if a search string matches an isolated word
> in a result, that result should be considered more relevant than one
> that merely matches a subword, even multiple times.
>
> To illustrate, imagine trying to find which package provides the `rsh`
> binary and running `$ guix search rsh`.
> This binary is part of `inetutils`, and its description field contains:
>
> > Inetutils is a collection of common network programs, such as an ftp
> > client and server, a telnet client and server, an rsh client and
> > server, and hostname.
>
> Most likely, this is what the user is interested in. However, inetutils
> does not show up until roughly the 75th result, with a relevance of 2
> (the lowest possible relevance).
>
> Almost every search result before it contains the string "rsh" as a
> component of another word, such as "marshaling", "powershell", and
> "hershey". However, these match multiple times and are weighted
> significantly higher.
>
> Ideally, guix search should rate inetutils higher because the string
> "rsh" occurs as its own word, not as a component of another, unrelated
> word. (Very, very few people would search "rsh" looking for matches
> with "hershey", even if "hershey" occurs multiple times.)
>
> Another example of where this can happen is "dig", part of the bind
> package. Searching for "dig" returns garbage because "dig" is a common
> subword. Bind is scored with a relevance of 2, even though bind's
> description emphasises that dig is part of it.
>
> This would improve the experience when searching with strings that
> commonly occur as subwords.
>
> Since this change can't occur in a vacuum, care should be taken not to
> reduce the effectiveness of other reasonably foreseeable search queries.
>
> --
> Take it easy,
> Richard Sent
> Making my computer weirder one commit at a time.

I like your proposal :)

I'm wondering how [1] compares in what it does for your use(ful) case.
(I am not familiar with Hyper Estraier beyond encountering it when
searching gnu.org.)

[1]

--
Regards,
Bengt Richter
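A rough sketch of the weighting proposed above, written in Python rather than Guix's Guile Scheme for brevity. It is not Guix's scoring code: `count_score` is a simplified plain match count standing in for the current behaviour described in the report, and the function names, the `word_bonus` value of 5, and the second example string are all made up for illustration. Only the inetutils description is quoted from the report.

```python
import re

# Quoted from the bug report.
INETUTILS = ("Inetutils is a collection of common network programs, such as "
             "an ftp client and server, a telnet client and server, an rsh "
             "client and server, and hostname.")
# Hypothetical description, invented to contain "rsh" only inside other words.
OTHER = "Marshaling and unmarshaling helpers for PowerShell scripts."


def count_score(pattern: str, text: str) -> int:
    """Simplified current behaviour: count every case-insensitive
    occurrence of pattern, subword matches included."""
    return len(re.findall(re.escape(pattern), text, flags=re.IGNORECASE))


def is_whole_word(text: str, start: int, end: int) -> bool:
    """True if text[start:end] is not flanked by letters or digits."""
    before_ok = start == 0 or not text[start - 1].isalnum()
    after_ok = end == len(text) or not text[end].isalnum()
    return before_ok and after_ok


def word_weighted_score(pattern: str, text: str, word_bonus: int = 5) -> int:
    """Hypothetical scoring: each whole-word match earns word_bonus points,
    each subword match earns 1 point. word_bonus=5 is an arbitrary choice."""
    score = 0
    for m in re.finditer(re.escape(pattern), text, flags=re.IGNORECASE):
        score += word_bonus if is_whole_word(text, m.start(), m.end()) else 1
    return score


if __name__ == "__main__":
    for name, text in (("inetutils", INETUTILS), ("other", OTHER)):
        print(name, count_score("rsh", text), word_weighted_score("rsh", text))
```

With these made-up weights, inetutils goes from 1 point to 5 for "rsh", while the subword-only text stays at 3, so inetutils would rank first. How large the bonus should be, and how it interacts with the existing per-field weights, would still need checking against other queries, as the report says.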