all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
@ 2022-09-28  9:27 Lars-Dominik Braun
  2022-09-28 14:26 ` zimoun
  2022-12-09 11:49 ` Lars-Dominik Braun
  0 siblings, 2 replies; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-09-28  9:27 UTC (permalink / raw)
  To: 58136; +Cc: ludo

Hi,

we provide a `guix serach`-based package search to our users and noticed
that searching for “ggplot2” yields the package r-ggplot2 on position
11 only – when it should be the first. Looking at other potential
search terms (haven, shiny, ape, renv, here, ini, setuptools) reveals
similar issues.

I propose we also score the unprefixed package name, so SCORE can award
bonus points for a full string match. I don’t like that we have to
maintain a list of common prefixes for this and that package names are
scored twice now, but I can’t think of a better solution right now.

Cheers,
Lars


* guix/ui.scm (%package-metrics): Increase score for PACKAGE-NAME and
add score for unprefixed package name.
---
 guix/ui.scm | 33 ++++++++++++++++++++++++++++++++-
 1 file changed, 32 insertions(+), 1 deletion(-)

diff --git a/guix/ui.scm b/guix/ui.scm
index dad2b853ac..55b596ed35 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1655,7 +1655,38 @@ (define (regexp->score regexp)
 (define %package-metrics
   ;; Metrics used to compute the "relevance score" of a package against a set
   ;; of regexps.
-  `((,package-name . 4)
+  `((,package-name . 8)
+
+    ;; For packages with a language prefix (namespaces), also compare the
+    ;; unprefixed name, so searching for “ggplot2” yields
+    ;; r-ggplot2 as first result instead of other, higher-ranked packages,
+    ;; which contain “ggplot2” in their description alot.
+    (,(lambda (package)
+        (let ((namespaces (list "cl-"
+                                "ecl-"
+                                "emacs-"
+                                "ghc-"
+                                "go-"
+                                "guile-"
+                                "java-"
+                                "julia-"
+                                "node-"
+                                "ocaml-"
+                                "perl-"
+                                "python-"
+                                "r-"
+                                "ruby-"
+                                "rust-"
+                                "sbcl-"
+                                "texlive-"))
+              (name (package-name package)))
+          (fold (lambda (prefix accum)
+                  (if (string-prefix? prefix name)
+                      (cons (substring name (string-length prefix)) accum)
+                      accum))
+                '()
+                namespaces)))
+       . 4)
 
     ;; Match against uncommon outputs.
     (,(lambda (package)
-- 
2.35.1





^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28  9:27 [bug#58136] [PATCH] ui: Improve sort order when searching package names Lars-Dominik Braun
@ 2022-09-28 14:26 ` zimoun
  2022-09-28 20:23   ` Maxime Devos
                     ` (2 more replies)
  2022-12-09 11:49 ` Lars-Dominik Braun
  1 sibling, 3 replies; 15+ messages in thread
From: zimoun @ 2022-09-28 14:26 UTC (permalink / raw)
  To: Lars-Dominik Braun, 58136; +Cc: ludo

Hi Lars,

On Wed, 28 Sep 2022 at 11:27, Lars-Dominik Braun <lars@6xq.net> wrote:

> I propose we also score the unprefixed package name, so SCORE can award
> bonus points for a full string match. I don’t like that we have to
> maintain a list of common prefixes for this and that package names are
> scored twice now, but I can’t think of a better solution right now.

In addition to your proposal which LGTM, maybe we could also use the
’upstream-name’ properties.  Most of the time, the Guix name matches the
upstream name, but sometimes not.  Although, it would not fix the issue
for ggplot2 since there is no upstream-name for this package. :-)

Well, I propose something like see below (based on your patch):

 1. keep the weight as 4
 2. set the “namespace” weight to 1 (or 2 if you prefer)

    Otherwise, for example, generic name as CSV could artificially bump
    the relevance and hide relevant packages.  For instance, compare

       guix search csv

 3. use upstream-name if provided

WDYT?


--8<---------------cut here---------------start------------->8---
diff --git a/guix/ui.scm b/guix/ui.scm
index 55b596ed35..14f296a546 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1655,7 +1655,7 @@ (define (regexp->score regexp)
 (define %package-metrics
   ;; Metrics used to compute the "relevance score" of a package against a set
   ;; of regexps.
-  `((,package-name . 8)
+  `((,package-name . 4)
 
     ;; For packages with a language prefix (namespaces), also compare the
     ;; unprefixed name, so searching for “ggplot2” yields
@@ -1684,9 +1684,9 @@ (define %package-metrics
                   (if (string-prefix? prefix name)
                       (cons (substring name (string-length prefix)) accum)
                       accum))
-                '()
+                (list (package-upstream-name package))
                 namespaces)))
-       . 4)
+     . 1)
 
     ;; Match against uncommon outputs.
     (,(lambda (package)
--8<---------------cut here---------------end--------------->8---


Cheers,
simon




^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28 14:26 ` zimoun
@ 2022-09-28 20:23   ` Maxime Devos
  2022-09-28 20:45     ` Maxime Devos
  2022-09-28 21:40     ` zimoun
  2022-10-01 21:42   ` Ludovic Courtès
  2022-10-12 11:24   ` Lars-Dominik Braun
  2 siblings, 2 replies; 15+ messages in thread
From: Maxime Devos @ 2022-09-28 20:23 UTC (permalink / raw)
  To: zimoun, Lars-Dominik Braun, 58136; +Cc: ludo


[-- Attachment #1.1.1: Type: text/plain, Size: 348 bytes --]

> +    ;; For packages with a language prefix (namespaces), also compare the

You missed "minetest-"

On 28-09-2022 16:26, zimoun wrote:
>   3. use upstream-name if provided
> 
> WDYT?

For the Minetest mods (which have upstream-names like "Jeija/mesecons"), 
I don't think think the upstream-name is useful as-is

Greetings,
Maxime.

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28 20:23   ` Maxime Devos
@ 2022-09-28 20:45     ` Maxime Devos
  2022-09-28 21:40     ` zimoun
  1 sibling, 0 replies; 15+ messages in thread
From: Maxime Devos @ 2022-09-28 20:45 UTC (permalink / raw)
  To: zimoun, Lars-Dominik Braun, 58136; +Cc: ludo


[-- Attachment #1.1.1: Type: text/plain, Size: 91 bytes --]



On 28-09-2022 22:23, Maxime Devos wrote:
> You missed "minetest-"

Also, "lua-".

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28 20:23   ` Maxime Devos
  2022-09-28 20:45     ` Maxime Devos
@ 2022-09-28 21:40     ` zimoun
  2022-09-28 21:43       ` Maxime Devos
  1 sibling, 1 reply; 15+ messages in thread
From: zimoun @ 2022-09-28 21:40 UTC (permalink / raw)
  To: Maxime Devos, Lars-Dominik Braun, 58136; +Cc: ludo

Hi,

On mer., 28 sept. 2022 at 22:23, Maxime Devos <maximedevos@telenet.be> wrote:

> On 28-09-2022 16:26, zimoun wrote:
>>   3. use upstream-name if provided
>
> For the Minetest mods (which have upstream-names like "Jeija/mesecons"), 
> I don't think think the upstream-name is useful as-is

Using the change I am proposing, just knowing ’jeija’ returns a match:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix search jeija
name: minetest-mesecons
version: 1.2.1-63.27c3c51
outputs:
+ out: everything else
systems: x86_64-linux
dependencies: 
location: gnu/packages/minetest.scm:367:4
homepage: https://mesecons.net
license: LGPL 3, CC-BY-SA 3.0
synopsis: Digital circuitry for Minetest, including wires, buttons and lights  
description: Mesecons is a mod for Minetest implementing various items
+ related to digital circuitry, such as wires, buttons, lights and programmable
+ controllers.  Among other things, there are also pistons, solar panels,
+ pressure plates and note blocks.
+ 
+ Mesecons has a similar goal to Redstone in Minecraft, but works in its own
+ way, with different rules and mechanics.
relevance: 4
--8<---------------cut here---------------end--------------->8---

which is better than nothing.

Well, upstream-name is not so much useful as-is but it can help for some
cases.


Cheers,
simon




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28 21:40     ` zimoun
@ 2022-09-28 21:43       ` Maxime Devos
  0 siblings, 0 replies; 15+ messages in thread
From: Maxime Devos @ 2022-09-28 21:43 UTC (permalink / raw)
  To: zimoun, Lars-Dominik Braun, 58136; +Cc: ludo


[-- Attachment #1.1.1: Type: text/plain, Size: 321 bytes --]



On 28-09-2022 23:40, zimoun wrote:
> Using the change I am proposing, just knowing ’jeija’ returns a match:
> 
> --8<---------------cut here---------------start------------->8---
> $ ./pre-inst-env guix search jeija
> name: minetest-mesecons
 > [...]

Right, didn't think of that.

Greetings,
Maxime.

[-- Attachment #1.1.2: OpenPGP public key --]
[-- Type: application/pgp-keys, Size: 929 bytes --]

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 236 bytes --]

^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28 14:26 ` zimoun
  2022-09-28 20:23   ` Maxime Devos
@ 2022-10-01 21:42   ` Ludovic Courtès
  2022-10-02  8:26     ` zimoun
  2022-10-12 11:24   ` Lars-Dominik Braun
  2 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2022-10-01 21:42 UTC (permalink / raw)
  To: zimoun; +Cc: 58136, Lars-Dominik Braun

Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

> Well, I propose something like see below (based on your patch):
>
>  1. keep the weight as 4
>  2. set the “namespace” weight to 1 (or 2 if you prefer)
>
>     Otherwise, for example, generic name as CSV could artificially bump
>     the relevance and hide relevant packages.  For instance, compare
>
>        guix search csv
>
>  3. use upstream-name if provided

Maybe something like:

  (define %package-metrics
    `((,package-name . 4)
      (,package-upstream-name* . 4)  ;or a lower weight, dunno
      …))

where:

  (define (package-upstream-name* package)
    (or (assoc-ref (package-properties package) 'upstream-name)
        (package-name-sans-namespace package)))  ;strip "r-", "emacs-", etc.

WDYT?

(Perhaps ‘package-upstream-name’ could be taught about these implicit
name conversions done for R, Python, Emacs, etc.?)

Ludo’.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-10-01 21:42   ` Ludovic Courtès
@ 2022-10-02  8:26     ` zimoun
  0 siblings, 0 replies; 15+ messages in thread
From: zimoun @ 2022-10-02  8:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Lars-Dominik Braun, 58136

Hi,

On Sat, 01 Oct 2022 at 23:42, Ludovic Courtès <ludo@gnu.org> wrote:

>   (define (package-upstream-name* package)
>     (or (assoc-ref (package-properties package) 'upstream-name)
>         (package-name-sans-namespace package)))  ;strip "r-", "emacs-", etc.

I think it is better to rely on ’package-upstream-name’ and hides the
plumbing.  It helps when something needs to be “refactored“, IMHO.


> (Perhaps ‘package-upstream-name’ could be taught about these implicit
> name conversions done for R, Python, Emacs, etc.?)

Maybe, but it requires to scrutinize importer per importer since
upstream name is mainly used by these, IIUC.


Cheers,
simon




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28 14:26 ` zimoun
  2022-09-28 20:23   ` Maxime Devos
  2022-10-01 21:42   ` Ludovic Courtès
@ 2022-10-12 11:24   ` Lars-Dominik Braun
  2022-10-17  7:46     ` Ludovic Courtès
  2 siblings, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-10-12 11:24 UTC (permalink / raw)
  To: zimoun; +Cc: ludo, 58136


[-- Attachment #1.1: Type: text/plain, Size: 1719 bytes --]

Hi simon,

> In addition to your proposal which LGTM, maybe we could also use the
> ’upstream-name’ properties.  Most of the time, the Guix name matches the
> upstream name, but sometimes not.  Although, it would not fix the issue
> for ggplot2 since there is no upstream-name for this package. :-)
I agree that using the upstream-name would be a good idea.

>  2. set the “namespace” weight to 1 (or 2 if you prefer)
> 
>     Otherwise, for example, generic name as CSV could artificially bump
>     the relevance and hide relevant packages.  For instance, compare
> 
>        guix search csv
The issue here is we don’t know what the user is searching for. If we
add more weight to the package name then usually libraries (rust-csv,
ghc-csv, …) win. Imo a search for “csv” should return tools to
manipulate CSV files like csvkit, csvdiff, xlsx2csv, … Just like
“json” should yield tools like jq, json.sh and possibly others which
I cannot find right now. But maybe I’m searching for a C library that
parses CSV instead. And then what…?

As for ggplot2, the particular issue seems to be that scores are added
for each match and the description for some of our packages contains
“ggplot2” alot. So I tried using MAX instead of +, which works,
but results in little variation of scores and thus weird sort order
(descending by name). It does not feel like an improvement either.

Cheers,
Lars

-- 
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate

www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964

[-- Attachment #1.2: v2.patch --]
[-- Type: text/plain, Size: 4665 bytes --]

diff --git a/guix/packages.scm b/guix/packages.scm
index 94e464cd01..9934501cdb 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -86,6 +86,7 @@ (define-module (guix packages)
             this-package
             package-name
             package-upstream-name
+            package-upstream-name*
             package-version
             package-full-name
             package-source
@@ -657,6 +658,38 @@ (define (package-upstream-name package)
   (or (assq-ref (package-properties package) 'upstream-name)
       (package-name package)))
 
+(define (package-upstream-name* package)
+  "Return the upstream name of PACKAGE, which could be different from the name
+it has in Guix."
+  (let ((namespaces (list "cl-"
+                          "ecl-"
+                          "emacs-"
+                          "ghc-"
+                          "go-"
+                          "guile-"
+                          "java-"
+                          "julia-"
+                          "lua-"
+                          "minetest-"
+                          "node-"
+                          "ocaml-"
+                          "perl-"
+                          "python-"
+                          "r-"
+                          "ruby-"
+                          "rust-"
+                          "sbcl-"
+                          "texlive-"))
+        (name (package-name package)))
+    (or (assq-ref (package-properties package) 'upstream-name)
+        (let loop ((prefixes namespaces))
+          (match prefixes
+            ('() name)
+            ((prefix rest ...)
+              (if (string-prefix? prefix name)
+                (substring name (string-length prefix))
+                (loop (cdr prefixes)))))))))
+
 (define (hidden-package p)
   "Return a \"hidden\" version of P--i.e., one that 'fold-packages' and thus,
 user interfaces, ignores."
diff --git a/guix/ui.scm b/guix/ui.scm
index dad2b853ac..da16a50f9f 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1623,10 +1623,23 @@ (define (relevance obj regexps metrics)
   (define (score regexp str)
     (fold-matches regexp str 0
                   (lambda (m score)
-                    (+ score
-                       (if (string=? (match:substring m) str)
-                           5             ;exact match
-                           1)))))
+                    (let* ((start (- (match:start m) 1))
+                           (end (match:end m))
+                           (left (if (>= start 0) (string-ref str start) #f))
+                           (right (if (< end (string-length str)) (string-ref str end) #f))
+                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
+                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
+                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
+                      (max score
+                        (cond
+                          ;; regexp is a full match for str.
+                          ((and (eq? left #f) (eq? right #f)) 4)
+                          ;; regexp matches a single word in str.
+                          ((and delim-left delim-right) 3)
+                          ;; regexp matches the beginning or end of a word in str.
+                          ((or delim-left delim-right) 2)
+                          ;; Everything else.
+                          (#t 1)))))))
 
   (define (regexp->score regexp)
     (let ((score-regexp (lambda (str) (score regexp str))))
@@ -1635,10 +1648,11 @@ (define (regexp->score regexp)
                 ((field . weight)
                  (match (field obj)
                    (#f  relevance)
+                   ('() relevance)
                    ((? string? str)
-                    (+ relevance (* (score-regexp str) weight)))
+                    (max relevance (* (score-regexp str) weight)))
                    ((lst ...)
-                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
+                    (max relevance (* weight (apply max (map score-regexp lst)))))))))
             0 metrics)))
 
   (let loop ((regexps regexps)
@@ -1655,7 +1669,8 @@ (define (regexp->score regexp)
 (define %package-metrics
   ;; Metrics used to compute the "relevance score" of a package against a set
   ;; of regexps.
-  `((,package-name . 4)
+  `((,package-name . 5)
+    (,package-upstream-name* . 1)
 
     ;; Match against uncommon outputs.
     (,(lambda (package)

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 659 bytes --]

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-10-12 11:24   ` Lars-Dominik Braun
@ 2022-10-17  7:46     ` Ludovic Courtès
  2022-10-17  8:19       ` zimoun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2022-10-17  7:46 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58136, zimoun

Hi!

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

> diff --git a/guix/packages.scm b/guix/packages.scm
> index 94e464cd01..9934501cdb 100644
> --- a/guix/packages.scm
> +++ b/guix/packages.scm
> @@ -86,6 +86,7 @@ (define-module (guix packages)
>              this-package
>              package-name
>              package-upstream-name
> +            package-upstream-name*
>              package-version
>              package-full-name
>              package-source
> @@ -657,6 +658,38 @@ (define (package-upstream-name package)
>    (or (assq-ref (package-properties package) 'upstream-name)
>        (package-name package)))
>  
> +(define (package-upstream-name* package)
> +  "Return the upstream name of PACKAGE, which could be different from the name
> +it has in Guix."

s/which could.*Guix/accounting for commonly-used package name prefixes
in addition to the @code{upstream-name} property/

Preferably make this addition in a separate commit.

> +++ b/guix/ui.scm
> @@ -1623,10 +1623,23 @@ (define (relevance obj regexps metrics)
>    (define (score regexp str)
>      (fold-matches regexp str 0
>                    (lambda (m score)
> -                    (+ score
> -                       (if (string=? (match:substring m) str)
> -                           5             ;exact match
> -                           1)))))
> +                    (let* ((start (- (match:start m) 1))
> +                           (end (match:end m))
> +                           (left (if (>= start 0) (string-ref str start) #f))
> +                           (right (if (< end (string-length str)) (string-ref str end) #f))
> +                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
> +                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
> +                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
> +                      (max score
> +                        (cond
> +                          ;; regexp is a full match for str.
> +                          ((and (eq? left #f) (eq? right #f)) 4)
> +                          ;; regexp matches a single word in str.
> +                          ((and delim-left delim-right) 3)
> +                          ;; regexp matches the beginning or end of a word in str.
> +                          ((or delim-left delim-right) 2)
> +                          ;; Everything else.
> +                          (#t 1)))))))

The intent is to have all regexps behave as if the user passed \<STR\>,
is that right?  Would be nice to have a comment clarifying that above
and perhaps making it a separate change?

Stylistic notes:

  (if cond consequent #f)  =>  (and cond consequent)
  (eq? x #f)               =>  (not x)
  (cond … (#t x))          =>  (cond … (else x))

> @@ -1635,10 +1648,11 @@ (define (regexp->score regexp)
>                  ((field . weight)
>                   (match (field obj)
>                     (#f  relevance)
> +                   ('() relevance)
>                     ((? string? str)
> -                    (+ relevance (* (score-regexp str) weight)))
> +                    (max relevance (* (score-regexp str) weight)))
>                     ((lst ...)
> -                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
> +                    (max relevance (* weight (apply max (map score-regexp lst)))))))))

Intuitively I would expect scores to add up, otherwise we’re kinda
losing information; so I would not make this change.  WDYT?

There’s a test for ‘package-relevance’ in tests/ui.scm.  Please make
sure it still passes and ideally add relevant tests such as the CSV
example you gave.

Thanks!

Ludo’.




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-10-17  7:46     ` Ludovic Courtès
@ 2022-10-17  8:19       ` zimoun
  0 siblings, 0 replies; 15+ messages in thread
From: zimoun @ 2022-10-17  8:19 UTC (permalink / raw)
  To: Ludovic Courtès, Lars-Dominik Braun; +Cc: 58136

Hi Lars, Ludo,

In short, I miss why the initial patch with a minor tweak is not enough
for covering the corner cases. :-)


On lun., 17 oct. 2022 at 09:46, Ludovic Courtès <ludo@gnu.org> wrote:
>> +++ b/guix/ui.scm
>> @@ -1623,10 +1623,23 @@ (define (relevance obj regexps metrics)
>>    (define (score regexp str)
>>      (fold-matches regexp str 0
>>                    (lambda (m score)
>> -                    (+ score
>> -                       (if (string=? (match:substring m) str)
>> -                           5             ;exact match
>> -                           1)))))
>> +                    (let* ((start (- (match:start m) 1))
>> +                           (end (match:end m))
>> +                           (left (if (>= start 0) (string-ref str start) #f))
>> +                           (right (if (< end (string-length str)) (string-ref str end) #f))
>> +                           (delimiter-classes '(Cc Cf Pd Pe Pf Pi Po Ps Sk Zs Zl Zp))
>> +                           (delim-left (or (member (and=> left char-general-category) delimiter-classes) (eq? left #f)))
>> +                           (delim-right (or (member (and=> right char-general-category) delimiter-classes) (eq? right #f))))
>> +                      (max score
>> +                        (cond
>> +                          ;; regexp is a full match for str.
>> +                          ((and (eq? left #f) (eq? right #f)) 4)
>> +                          ;; regexp matches a single word in str.
>> +                          ((and delim-left delim-right) 3)
>> +                          ;; regexp matches the beginning or end of a word in str.
>> +                          ((or delim-left delim-right) 2)
>> +                          ;; Everything else.
>> +                          (#t 1)))))))
>
> The intent is to have all regexps behave as if the user passed \<STR\>,
> is that right?  Would be nice to have a comment clarifying that above
> and perhaps making it a separate change?

All this appears to me overcomplicated.  Personally, I have to read it
many times to get the logic; while the initial patch was much clearer,
IMHO.

Other said, I am not convinced the complexity is worth the corner case.


The initial patch with the minor tweak I am proposing (maybe using
package-upstream-name*) appears to me enough for covering the corner
cases initially reported (as ggplot2).


>> @@ -1635,10 +1648,11 @@ (define (regexp->score regexp)
>>                  ((field . weight)
>>                   (match (field obj)
>>                     (#f  relevance)
>> +                   ('() relevance)
>>                     ((? string? str)
>> -                    (+ relevance (* (score-regexp str) weight)))
>> +                    (max relevance (* (score-regexp str) weight)))
>>                     ((lst ...)
>> -                    (+ relevance (* weight (apply + (map score-regexp lst)))))))))
>> +                    (max relevance (* weight (apply max (map score-regexp lst)))))))))
>
> Intuitively I would expect scores to add up, otherwise we’re kinda
> losing information; so I would not make this change.  WDYT?

I agree with Ludo that ’max’ is counterintuitive.


Well, could we list some examples (keyword and expectation)?  Because
the initial patch with a minor tweak LGTM and covers ggplot2, csv, and
some others.


Cheers,
simon




^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-09-28  9:27 [bug#58136] [PATCH] ui: Improve sort order when searching package names Lars-Dominik Braun
  2022-09-28 14:26 ` zimoun
@ 2022-12-09 11:49 ` Lars-Dominik Braun
  2022-12-13 13:28   ` bug#58136: " Ludovic Courtès
  1 sibling, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-12-09 11:49 UTC (permalink / raw)
  To: 58136; +Cc: Ludovic Courtès, zimoun

[-- Attachment #1: Type: text/plain, Size: 282 bytes --]

Hi,

attached is version 2 of my initial patch, which simply moves prefix
detection into PACKAGE-UPSTREAM-NAME* (as suggested by Ludo) and reduces
the score to 2 (as suggested by Simon). It’s therefore quite tailored
to the “ggplot2 problem”, but does the job.

Thanks,
Lars


[-- Attachment #2: 0001-packages-Add-package-upstream-name.patch --]
[-- Type: text/plain, Size: 2840 bytes --]

From e0092d786b29f4f1cdc35217212a86804f36cdb4 Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun <lars@6xq.net>
Date: Fri, 9 Dec 2022 11:46:37 +0100
Subject: [PATCH v2 1/2] packages: Add 'package-upstream-name*'.

* guix/packages.scm (package-upstream-name*): New procedure.
* tests/packages.scm ("package-upstream-name*"): New test.
---
 guix/packages.scm  | 33 +++++++++++++++++++++++++++++++++
 tests/packages.scm |  4 ++++
 2 files changed, 37 insertions(+)

diff --git a/guix/packages.scm b/guix/packages.scm
index 8f119d9fa7..5e8e3a4ff4 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -89,6 +89,7 @@ (define-module (guix packages)
             this-package
             package-name
             package-upstream-name
+            package-upstream-name*
             package-version
             package-full-name
             package-source
@@ -691,6 +692,38 @@ (define (package-upstream-name package)
   (or (assq-ref (package-properties package) 'upstream-name)
       (package-name package)))
 
+(define (package-upstream-name* package)
+  "Return the upstream name of PACKAGE, accounting for commonly-used
+package name prefixes in addition to the @code{upstream-name} property."
+  (let ((namespaces (list "cl-"
+                          "ecl-"
+                          "emacs-"
+                          "ghc-"
+                          "go-"
+                          "guile-"
+                          "java-"
+                          "julia-"
+                          "lua-"
+                          "minetest-"
+                          "node-"
+                          "ocaml-"
+                          "perl-"
+                          "python-"
+                          "r-"
+                          "ruby-"
+                          "rust-"
+                          "sbcl-"
+                          "texlive-"))
+        (name (package-name package)))
+    (or (assq-ref (package-properties package) 'upstream-name)
+        (let loop ((prefixes namespaces))
+          (match prefixes
+            ('() name)
+            ((prefix rest ...)
+              (if (string-prefix? prefix name)
+                (substring name (string-length prefix))
+                (loop (cdr prefixes)))))))))
+
 (define (hidden-package p)
   "Return a \"hidden\" version of P--i.e., one that 'fold-packages' and thus,
 user interfaces, ignores."
diff --git a/tests/packages.scm b/tests/packages.scm
index a5819d8de3..f58c47817b 100644
--- a/tests/packages.scm
+++ b/tests/packages.scm
@@ -626,6 +626,10 @@ (define read-at
     (build-derivations %store (list drv))
     (call-with-input-file output get-string-all)))
 
+(test-equal "package-upstream-name*"
+  (package-upstream-name* (specification->package "guile-gcrypt"))
+  "gcrypt")
+
 \f
 ;;;
 ;;; Source derivation with snippets.
-- 
2.37.4


[-- Attachment #3: 0002-ui-Take-package-upstream-name-into-account-when-sear.patch --]
[-- Type: text/plain, Size: 746 bytes --]

From 5e0e8c0145a728d0ed49116596f06cc15e1e865d Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun <lars@6xq.net>
Date: Fri, 9 Dec 2022 12:01:31 +0100
Subject: [PATCH v2 2/2] ui: Take package upstream name into account when
 searching.

* guix/ui.scm (%package-metrics): Add PACKAGE-UPSTREAM-NAME*.
---
 guix/ui.scm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/guix/ui.scm b/guix/ui.scm
index 45eccb7335..3bca3b1e40 100644
--- a/guix/ui.scm
+++ b/guix/ui.scm
@@ -1668,6 +1668,7 @@ (define %package-metrics
   ;; Metrics used to compute the "relevance score" of a package against a set
   ;; of regexps.
   `((,package-name . 4)
+    (,package-upstream-name* . 2)
 
     ;; Match against uncommon outputs.
     (,(lambda (package)
-- 
2.37.4


^ permalink raw reply related	[flat|nested] 15+ messages in thread

* bug#58136: [PATCH] ui: Improve sort order when searching package names.
  2022-12-09 11:49 ` Lars-Dominik Braun
@ 2022-12-13 13:28   ` Ludovic Courtès
  2022-12-13 14:53     ` [bug#58136] " Lars-Dominik Braun
  0 siblings, 1 reply; 15+ messages in thread
From: Ludovic Courtès @ 2022-12-13 13:28 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58136-done, zimoun

[-- Attachment #1: Type: text/plain, Size: 799 bytes --]

Hi,

Lars-Dominik Braun <lars@6xq.net> skribis:

> From e0092d786b29f4f1cdc35217212a86804f36cdb4 Mon Sep 17 00:00:00 2001
> From: Lars-Dominik Braun <lars@6xq.net>
> Date: Fri, 9 Dec 2022 11:46:37 +0100
> Subject: [PATCH v2 1/2] packages: Add 'package-upstream-name*'.
>
> * guix/packages.scm (package-upstream-name*): New procedure.
> * tests/packages.scm ("package-upstream-name*"): New test.

[...]

> From 5e0e8c0145a728d0ed49116596f06cc15e1e865d Mon Sep 17 00:00:00 2001
> From: Lars-Dominik Braun <lars@6xq.net>
> Date: Fri, 9 Dec 2022 12:01:31 +0100
> Subject: [PATCH v2 2/2] ui: Take package upstream name into account when
>  searching.
>
> * guix/ui.scm (%package-metrics): Add PACKAGE-UPSTREAM-NAME*.

Applied with the minor changes below, thank you!

Ludo’.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 707 bytes --]

diff --git a/guix/packages.scm b/guix/packages.scm
index 5e8e3a4ff4..6e61e16aa4 100644
--- a/guix/packages.scm
+++ b/guix/packages.scm
@@ -718,11 +718,11 @@ (define (package-upstream-name* package)
     (or (assq-ref (package-properties package) 'upstream-name)
         (let loop ((prefixes namespaces))
           (match prefixes
-            ('() name)
+            (() name)
             ((prefix rest ...)
               (if (string-prefix? prefix name)
                 (substring name (string-length prefix))
-                (loop (cdr prefixes)))))))))
+                (loop rest))))))))
 
 (define (hidden-package p)
   "Return a \"hidden\" version of P--i.e., one that 'fold-packages' and thus,

[-- Attachment #3: Type: text/plain, Size: 7 bytes --]


and:


[-- Attachment #4: Type: text/x-patch, Size: 1144 bytes --]

diff --git a/tests/ui.scm b/tests/ui.scm
index 6a25a204ca..438acae525 100644
--- a/tests/ui.scm
+++ b/tests/ui.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2013, 2014, 2015, 2016, 2017, 2019, 2020 Ludovic Courtès <ludo@gnu.org>
+;;; Copyright © 2013-2017, 2019-2020, 2022 Ludovic Courtès <ludo@gnu.org>
 ;;; Copyright © 2022 Taiju HIGASHI <higashi@taiju.info>
 ;;;
 ;;; This file is part of GNU Guix.
@@ -294,6 +294,15 @@ (define guile-2.0.9
          (>0 (package-relevance libb2
                                 (map rx '("crypto" "library")))))))
 
+(test-assert "package-relevance and upstream name"
+  ;; https://issues.guix.gnu.org/58136
+  (let ((ggplot2  (specification->package "r-ggplot2"))
+        (ggstance (specification->package "r-ggstance"))
+        (rx       (make-regexp "ggplot2" regexp/icase)))
+    (> (package-relevance ggplot2 (list rx))
+       (package-relevance ggstance (list rx))
+       0)))
+
 (define (make-empty-file directory file)
   ;; Create FILE in DIRECTORY.
   (close-port (open-output-file (in-vicinity directory file))))

^ permalink raw reply related	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-12-13 13:28   ` bug#58136: " Ludovic Courtès
@ 2022-12-13 14:53     ` Lars-Dominik Braun
  2022-12-13 16:40       ` Ludovic Courtès
  0 siblings, 1 reply; 15+ messages in thread
From: Lars-Dominik Braun @ 2022-12-13 14:53 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 58136

Hi Ludo,

> Applied with the minor changes below, thank you!
thank you!

> +(test-assert "package-relevance and upstream name"
> +  ;; https://issues.guix.gnu.org/58136
> +  (let ((ggplot2  (specification->package "r-ggplot2"))
> +        (ggstance (specification->package "r-ggstance"))
> +        (rx       (make-regexp "ggplot2" regexp/icase)))
> +    (> (package-relevance ggplot2 (list rx))
> +       (package-relevance ggstance (list rx))
> +       0)))
I was hesitant to add a system test, which depends on real package
descriptions (not synthetic ones), because at some point it *will* break.

Cheers,
Lars





^ permalink raw reply	[flat|nested] 15+ messages in thread

* [bug#58136] [PATCH] ui: Improve sort order when searching package names.
  2022-12-13 14:53     ` [bug#58136] " Lars-Dominik Braun
@ 2022-12-13 16:40       ` Ludovic Courtès
  0 siblings, 0 replies; 15+ messages in thread
From: Ludovic Courtès @ 2022-12-13 16:40 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: 58136

Hi,

Lars-Dominik Braun <lars@6xq.net> skribis:

>> Applied with the minor changes below, thank you!
> thank you!
>
>> +(test-assert "package-relevance and upstream name"
>> +  ;; https://issues.guix.gnu.org/58136
>> +  (let ((ggplot2  (specification->package "r-ggplot2"))
>> +        (ggstance (specification->package "r-ggstance"))
>> +        (rx       (make-regexp "ggplot2" regexp/icase)))
>> +    (> (package-relevance ggplot2 (list rx))
>> +       (package-relevance ggstance (list rx))
>> +       0)))
> I was hesitant to add a system test, which depends on real package
> descriptions (not synthetic ones), because at some point it *will* break.

Yes, that’s a tradeoff.  For now, I would think the test is a plus as it
will allow us to see if future tweaks break this use case but yeah, on
the day it breaks, we’ll have to rewrite it or to drop it.

Ludo’.




^ permalink raw reply	[flat|nested] 15+ messages in thread

end of thread, other threads:[~2022-12-13 16:41 UTC | newest]

Thread overview: 15+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-09-28  9:27 [bug#58136] [PATCH] ui: Improve sort order when searching package names Lars-Dominik Braun
2022-09-28 14:26 ` zimoun
2022-09-28 20:23   ` Maxime Devos
2022-09-28 20:45     ` Maxime Devos
2022-09-28 21:40     ` zimoun
2022-09-28 21:43       ` Maxime Devos
2022-10-01 21:42   ` Ludovic Courtès
2022-10-02  8:26     ` zimoun
2022-10-12 11:24   ` Lars-Dominik Braun
2022-10-17  7:46     ` Ludovic Courtès
2022-10-17  8:19       ` zimoun
2022-12-09 11:49 ` Lars-Dominik Braun
2022-12-13 13:28   ` bug#58136: " Ludovic Courtès
2022-12-13 14:53     ` [bug#58136] " Lars-Dominik Braun
2022-12-13 16:40       ` Ludovic Courtès

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.