unofficial mirror of guix-patches@gnu.org 
From: Lars-Dominik Braun <ldb@leibniz-psychology.org>
To: zimoun <zimon.toutoune@gmail.com>
Cc: ludo@gnu.org, 58136@debbugs.gnu.org
Subject: [bug#58136] [PATCH] ui: Improve sort order when searching package names.
Date: Wed, 12 Oct 2022 13:24:08 +0200
Message-ID: <Y0aj2GRdkZG6cFhs@zpidnb93>
In-Reply-To: <86wn9na82p.fsf@gmail.com>


[-- Attachment #1.1: Type: text/plain, Size: 1719 bytes --]

Hi Simon,

> In addition to your proposal which LGTM, maybe we could also use the
> ’upstream-name’ properties.  Most of the time, the Guix name matches the
> upstream name, but sometimes not.  Although, it would not fix the issue
> for ggplot2 since there is no upstream-name for this package. :-)
I agree that using the upstream-name would be a good idea.

>  2. set the “namespace” weight to 1 (or 2 if you prefer)
> 
>     Otherwise, for example, generic name as CSV could artificially bump
>     the relevance and hide relevant packages.  For instance, compare
> 
>        guix search csv
The issue here is that we don’t know what the user is searching for. If
we add more weight to the package name, then libraries (rust-csv,
ghc-csv, …) usually win. IMO a search for “csv” should return tools
that manipulate CSV files, like csvkit, csvdiff, xlsx2csv, … Just as
“json” should yield tools like jq, json.sh and possibly others which
I cannot find right now. But maybe I’m searching for a C library that
parses CSV instead. And then what…?

As for ggplot2, the particular issue seems to be that scores are added
up for each match, and the descriptions of some of our packages mention
“ggplot2” a lot. So I tried using MAX instead of +, which works, but
results in little variation between scores and thus a weird sort order
(descending by name). It does not feel like an improvement either.
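
A minimal sketch of the difference between + and MAX, not the actual
code in guix/ui.scm: `toy-score` and the sample description are made up
for illustration, and every match is worth one point. Only
`fold-matches` from (ice-9 regex) is the real procedure.

```scheme
(use-modules (ice-9 regex))

;; Hypothetical helper: give every match of REGEXP in STR one point and
;; fold the points together with COMBINE, which is either + or max.
(define (toy-score combine regexp str)
  (fold-matches regexp str 0
                (lambda (m score)
                  (combine score 1))))

;; Made-up description that mentions the search term three times.
(define description
  "Extension for ggplot2.  Adds ggplot2 themes and ggplot2 scales.")

(display (toy-score + "ggplot2" description))   ; prints 3
(newline)
(display (toy-score max "ggplot2" description)) ; prints 1
(newline)
```

With +, a description that repeats the term outranks one that mentions
it once. With MAX, both end up with the same score, which is where the
lack of variation comes from.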

Cheers,
Lars

-- 
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate

www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964

[-- Attachment #1.2: v2.patch --]
[-- Type: text/plain, Size: 4665 bytes --]

diff --git a/guix/packages.scm b/guix/packages.scm
index 94e464cd01..9934501cdb 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -86,6 +86,7 @@ (define-module (guix packages)
             this-package
             package-name
             package-upstream-name
+            package-upstream-name*
             package-version
             package-full-name
             package-source
@@ -657,6 +658,38 @@ (define (package-upstream-name package)
   (or (assq-ref (package-properties package) 'upstream-name)
       (package-name package)))
 
+(define (package-upstream-name* package)
+  "Return the upstream name of PACKAGE, which could be different from the name
+it has in Guix."
+  (let ((namespaces (list "cl-"
+                          "ecl-"
+                          "emacs-"
+                          "ghc-"
+                          "go-"
+                          "guile-"
+                          "java-"
+                          "julia-"
+                          "lua-"
+                          "minetest-"
+                          "node-"
+                          "ocaml-"
+                          "perl-"
+                          "python-"
+                          "r-"
+                          "ruby-"
+                          "rust-"
+                          "sbcl-"
+                          "texlive-"))
+        (name (package-name package)))
+    (or (assq-ref (package-properties package) 'upstream-name)
+        (let loop ((prefixes namespaces))
+          (match prefixes
+            ('() name)
+            ((prefix rest ...)
+              (if (string-prefix? prefix name)
+                (substring name (string-length prefix))
+                (loop (cdr prefixes)))))))))
+
 (define (hidden-package p)
   "Return a \"hidden\" version of P--i.e., one that 'fold-packages' and thus,
 user interfaces, ignores."
diff --git a/guix/ui.scm b/guix/ui.scm
index dad2b853ac..da16a50f9f 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1623,10 +1623,23 @@ (define (relevance obj regexps metrics)
   (define (score regexp str)
     (fold-matches regexp str 0
                   (lambda (m score)
-                    (+ score
-                       (if (string=? (match:substring m) str)
-                           5             ;exact match
-                           1)))))
+                    (let* ((start (- (match:start m) 1))
+                           (end (match:end m))
+                           (left (if (>= start 0) (string-ref str start) #f))
+                           (right (if (< end (string-length str)) (string-ref str end) #f))
+                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
+                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
+                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
+                      (max score
+                        (cond
+                          ;; regexp is a full match for str.
+                          ((and (eq? left #f) (eq? right #f)) 4)
+                          ;; regexp matches a single word in str.
+                          ((and delim-left delim-right) 3)
+                          ;; regexp matches the beginning or end of a word in str.
+                          ((or delim-left delim-right) 2)
+                          ;; Everything else.
+                          (#t 1)))))))
 
   (define (regexp->score regexp)
     (let ((score-regexp (lambda (str) (score regexp str))))
@@ -1635,10 +1648,11 @@ (define (regexp->score regexp)
                 ((field . weight)
                  (match (field obj)
                    (#f  relevance)
+                   ('() relevance)
                    ((? string? str)
-                    (+ relevance (* (score-regexp str) weight)))
+                    (max relevance (* (score-regexp str) weight)))
                    ((lst ...)
-                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
+                    (max relevance (* weight (apply max (map score-regexp lst)))))))))
             0 metrics)))
 
   (let loop ((regexps regexps)
@@ -1655,7 +1669,8 @@ (define (regexp->score regexp)
 (define %package-metrics
   ;; Metrics used to compute the "relevance score" of a package against a set
   ;; of regexps.
-  `((,package-name . 4)
+  `((,package-name . 5)
+    (,package-upstream-name* . 1)
 
     ;; Match against uncommon outputs.
     (,(lambda (package)
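
The tiers used by the new `score` procedure in the patch above can be
tried in isolation. This is only a sketch under assumptions: the names
`delimiter?`, `match-tier` and `tier` are hypothetical helpers, not part
of the patch, and only the first match of the pattern is looked at.

```scheme
(use-modules (ice-9 regex))

;; Same Unicode general categories as in the patch.
(define delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))

;; CHR is the character next to a match, or #f when the match touches
;; the start or end of the string; #f counts as a delimiter.
(define (delimiter? chr)
  (or (not chr)
      (and (memq (char-general-category chr) delimiter-classes) #t)))

;; Return the tier (4, 3, 2 or 1) of match M within STR.
(define (match-tier m str)
  (let* ((start (match:start m))
         (end   (match:end m))
         (left  (and (> start 0) (string-ref str (- start 1))))
         (right (and (< end (string-length str)) (string-ref str end))))
    (cond ((and (not left) (not right)) 4)                 ;full match
          ((and (delimiter? left) (delimiter? right)) 3)   ;whole word
          ((or (delimiter? left) (delimiter? right)) 2)    ;word start or end
          (else 1))))                                      ;everything else

;; Tier of the first match of PATTERN in STR; assumes there is a match.
(define (tier pattern str)
  (match-tier (string-match pattern str) str))

(display (map (lambda (name) (tier "csv" name))
              '("csv" "rust-csv" "csvkit" "xcsvx")))       ; prints (4 3 2 1)
(newline)
```

Here "xcsvx" is a made-up name that only serves to exercise the last
tier.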

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

Thread overview: 15+ messages
2022-09-28  9:27 [bug#58136] [PATCH] ui: Improve sort order when searching package names Lars-Dominik Braun
2022-09-28 14:26 ` zimoun
2022-09-28 20:23   ` Maxime Devos
2022-09-28 20:45     ` Maxime Devos
2022-09-28 21:40     ` zimoun
2022-09-28 21:43       ` Maxime Devos
2022-10-01 21:42   ` Ludovic Courtès
2022-10-02  8:26     ` zimoun
2022-10-12 11:24   ` Lars-Dominik Braun [this message]
2022-10-17  7:46     ` Ludovic Courtès
2022-10-17  8:19       ` zimoun
2022-12-09 11:49 ` Lars-Dominik Braun
2022-12-13 13:28   ` bug#58136: " Ludovic Courtès
2022-12-13 14:53     ` [bug#58136] " Lars-Dominik Braun
2022-12-13 16:40       ` Ludovic Courtès
