all messages for Guix-related lists mirrored at yhetil.org
From: zimoun <zimon.toutoune@gmail.com>
To: Arun Isaac <arunisaac@systemreboot.net>
Cc: "Ludovic Courtès" <ludo@gnu.org>, 39258@debbugs.gnu.org
Subject: [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search.
Date: Tue, 3 Mar 2020 20:21:46 +0100
Message-ID: <CAJ3okZ0XiJcAwq8bqOWL7bPWMXrZ6zo9D0JMx86x0nJa4apBpQ@mail.gmail.com>
In-Reply-To: <20200227204150.30985-5-arunisaac@systemreboot.net>

Hi Arun,


On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
>
> * gnu/packages.scm (search-package-index): New function.
> * guix/scripts/package.scm (find-packages-by-description): Search using the
> xapian package index if search patterns are literal strings. Else, search
> using fold-packages.
> ---
>  gnu/packages.scm         | 17 +++++++++++-
>  guix/scripts/package.scm | 57 +++++++++++++++++++++++-----------------
>  2 files changed, 49 insertions(+), 25 deletions(-)
>
> diff --git a/gnu/packages.scm b/gnu/packages.scm
> index e91753e2a8..5b5b29bf84 100644
> --- a/gnu/packages.scm
> +++ b/gnu/packages.scm
> @@ -67,7 +67,8 @@
>              specifications->manifest
>
>              generate-package-cache
> -            generate-package-search-index))
> +            generate-package-search-index
> +            search-package-index))
>
>  ;;; Commentary:
>  ;;;
> @@ -453,6 +454,20 @@ reducing the memory footprint."
>
>    db-path)
>
> +(define (search-package-index profile querystring)
> +  (let ((offset 0)
> +        (pagesize 10))

Why the value 10?
It fixes the number of packages returned. Hmm...
I tried replacing it with 100 and got 100 packages. :-)
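
To make the effect of 'offset' and 'pagesize' concrete, here is a
minimal model in plain Guile, assuming they select a window of the
ranked matches (which is what the experiment above suggests). The
'page' procedure and the 'ranked' list are hypothetical stand-ins, not
part of the patch or of guile-xapian.

```scheme
(use-modules (srfi srfi-1))   ; take, drop

;; Hypothetical helper, not part of the patch: return at most PAGESIZE
;; entries of RANKED, skipping the first OFFSET entries.
(define (page ranked offset pagesize)
  (let* ((skipped (min offset (length ranked)))
         (rest    (drop ranked skipped)))
    (take rest (min pagesize (length rest)))))

;; Stand-in for 25 ranked matches.
(define ranked (iota 25))

(length (page ranked 0 10))    ; 10 results, however many matches exist
(length (page ranked 0 100))   ; 25 results, i.e. all of them
(page ranked 20 10)            ; (20 21 22 23 24), the last partial page
```

Under that assumption, a hard-coded 10 silently truncates the results,
so the value would probably be better passed in by the caller.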


> +    (call-with-database (string-append profile %package-search-index)
> +      (lambda (db)
> +        (let ((query (parse-query querystring #:stemmer (make-stem "en"))))
> +          (mset-fold (lambda (item result)

I do not know what the naming convention is for the bindings.
But since there is 'fold-packages', I would be inclined to use
'fold-msets' or something along those lines.


> +                       (match (find-packages-by-name
> +                               (document-data (mset-item-document item)))
> +                         ((package _ ...)
> +                          (append result `((,package . ,(mset-item-weight item)))))))
> +                     '()
> +                     (enquire-mset (enquire db query) offset pagesize)))))))
> +
>
>  (define %sigint-prompt
>    ;; The prompt to jump to upon SIGINT.
> diff --git a/guix/scripts/package.scm b/guix/scripts/package.scm
> index 1cb0d382bf..6a3b9002dd 100644
> --- a/guix/scripts/package.scm
> +++ b/guix/scripts/package.scm
> @@ -7,6 +7,7 @@
>  ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch>
>  ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com>
>  ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>
> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>
>  ;;;
>  ;;; This file is part of GNU Guix.
>  ;;;
> @@ -178,31 +179,40 @@ hooks\" run when building the profile."
>  ;;; Package specifications.
>  ;;;
>
> -(define (find-packages-by-description regexps)
> +(define (find-packages-by-description patterns)
>    "Return a list of pairs: packages whose name, synopsis, description,
>  or output matches at least one of REGEXPS sorted by relevance, and its
>  non-zero relevance score."
> -  (let ((matches (fold-packages (lambda (package result)
> -                                  (if (package-superseded package)
> -                                      result
> -                                      (match (package-relevance package
> -                                                                regexps)
> -                                        ((? zero?)
> -                                         result)
> -                                        (score
> -                                         (cons (cons package score)
> -                                               result)))))
> -                                '())))
> -    (sort matches
> -          (lambda (m1 m2)
> -            (match m1
> -              ((package1 . score1)
> -               (match m2
> -                 ((package2 . score2)
> -                  (if (= score1 score2)
> -                      (string>? (package-full-name package1)
> -                                (package-full-name package2))
> -                      (> score1 score2))))))))))
> +  (define (regexp? str)
> +    (string-any
> +     (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)
> +     str))

Instead of changing this, I would leave the current
'find-packages-by-description' as it is and add
'find-packages-by-description-indexed', which would just do
'(search-package-index (current-profile) (string-join patterns " "))'.
Maybe the sorting of the scores could be refactored too. Then I would
put the test branch in 'guix/scripts/package.scm'...
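
As an illustration of what refactoring the sort could look like, here
is a minimal sketch that moves the comparison into a named procedure.
The names 'match-before?' and 'sort-matches' are hypothetical, and plain
strings stand in for packages (the real code compares
'package-full-name').

```scheme
(use-modules (ice-9 match))

;; Hypothetical helper: M1 and M2 are (NAME . SCORE) pairs.  Higher
;; scores come first; ties are broken with 'string>?' on the names, as
;; in the existing code.
(define (match-before? m1 m2)
  (match (list m1 m2)
    (((name1 . score1) (name2 . score2))
     (if (= score1 score2)
         (string>? name1 name2)
         (> score1 score2)))))

(define (sort-matches matches)
  (sort matches match-before?))

(sort-matches '(("emacs" . 3) ("vim" . 5) ("zile" . 3)))
;; => (("vim" . 5) ("zile" . 3) ("emacs" . 3))
```

Both the regexp-based search and an indexed search could then share the
same ordering.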


> +  (if (and (current-profile)
> +           (not (any regexp? patterns)))
> +      (search-package-index (current-profile) (string-join patterns " "))
> +      (let* ((regexps (map (cut make-regexp* <> regexp/icase) patterns))
> +             (matches (fold-packages (lambda (package result)
> +                                       (if (package-superseded package)
> +                                           result
> +                                           (match (package-relevance package

Note that I am in the process of implementing BM25 weights as
'package-relevance'; or at least I am seriously thinking about it! :-)
I have already suggested TF-IDF as the relevance score, for example
here [1]. Reading the Xapian documentation [2], it seems doable. Or
maybe not ;-) because of the regexps... It needs more thought; that is
what I mean by "in the process". ;-)
In that case, replacing 'fold-packages' by 'mset-fold' would be almost
a drop-in change; it should add some flexibility and give more unified
code.

(Apart from searching, IMHO 'package-relevance' should also help with
linting badly written descriptions, but that is another story. ;-)

[1] https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html
[2] https://xapian.org/docs/bm25.html
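
For readers unfamiliar with BM25, here is a minimal sketch of the
weight of a single term in a single document, in plain Guile. It
follows the common textbook form with the conventional defaults k1 = 1.2
and b = 0.75; the weighting scheme described in [2] may differ in its
details and parameters, and the procedure name 'bm25-term-weight' and
its arguments are assumptions made for this illustration.

```scheme
;; Textbook BM25 weight of one term in one document (a sketch, not
;; Xapian's implementation).
;;   TF          occurrences of the term in the document
;;   DOC-LENGTH  length of the document, in terms
;;   AVG-LENGTH  average document length in the collection
;;   N-DOCS      number of documents in the collection
;;   N-MATCHING  number of documents containing the term
(define* (bm25-term-weight tf doc-length avg-length n-docs n-matching
                           #:key (k1 1.2) (b 0.75))
  (let ((idf  (log (+ 1 (/ (+ (- n-docs n-matching) 0.5)
                           (+ n-matching 0.5)))))
        (norm (+ (- 1 b) (* b (/ doc-length avg-length)))))
    (* idf (/ (* tf (+ k1 1))
              (+ tf (* k1 norm))))))

;; A rare term weighs more than a common one, all else being equal.
(> (bm25-term-weight 1 100 100 10000 10)
   (bm25-term-weight 1 100 100 10000 1000))   ; #t
```

The score of a package for a whole query would then be the sum of
these weights over the query terms, with each description treated as a
document.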


> +                                                                     regexps)
> +                                             ((? zero?)
> +                                              result)
> +                                             (score
> +                                              (cons (cons package score)
> +                                                    result)))))
> +                                     '())))
> +        (sort matches
> +              (lambda (m1 m2)
> +                (match m1
> +                  ((package1 . score1)
> +                   (match m2
> +                     ((package2 . score2)
> +                      (if (= score1 score2)
> +                          (string>? (package-full-name package1)
> +                                    (package-full-name package2))
> +                          (> score1 score2)))))))))))
>
>  (define (transaction-upgrade-entry store entry transaction)
>    "Return a variant of TRANSACTION that accounts for the upgrade of ENTRY, a
> @@ -777,8 +787,7 @@ processed, #f otherwise."

...here.

+  (define (regexp? str)
+    (string-any
+     (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)
+     str))

>                                        (('query 'search rx) rx)
>                                        (_                   #f))
>                                      opts))
>
> -              (regexps  (map (cut make-regexp* <> regexp/icase) patterns))
> -              (matches  (find-packages-by-description regexps)))

+              (regexps  (map (cut make-regexp* <> regexp/icase) patterns))
+              (matches  (if (any regexp? patterns)
+                            (find-packages-by-description regexps)
+                            (find-packages-by-description-indexed patterns))))

I mean something like that.
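
The dispatch can be tried in isolation with the following sketch in
plain Guile. 'regexp?' is taken from the patch; 'search-with-regexps'
and 'search-with-index' are hypothetical stand-ins for
'find-packages-by-description' and the proposed
'find-packages-by-description-indexed', and only report which branch
was taken.

```scheme
(use-modules (srfi srfi-1))   ; any

(define %regexp-chars
  (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$))

(define (regexp? str)
  "Return true if STR contains a regexp metacharacter."
  (string-any %regexp-chars str))

;; Hypothetical stand-ins for the two search back ends.
(define (search-with-regexps patterns)
  (list 'regexp patterns))
(define (search-with-index patterns)
  (list 'indexed (string-join patterns " ")))

(define (search patterns)
  ;; Use the index only when every pattern is a literal string.
  (if (any regexp? patterns)
      (search-with-regexps patterns)
      (search-with-index patterns)))

(search '("text" "editor"))   ; => (indexed "text editor")
(search '("^emacs"))          ; => (regexp ("^emacs"))
(search '("gtk+"))            ; => (regexp ("gtk+"))
```

The last call shows one consequence of this test: a literal query such
as "gtk+" contains a metacharacter, so it takes the regexp branch.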

>           (leave-on-EPIPE
>            (display-search-results matches (current-output-port)))
>           #t))
> --
> 2.23.0


All the best,
simon
