From: zimoun <zimon.toutoune@gmail.com>
To: Arun Isaac <arunisaac@systemreboot.net>
Cc: "Ludovic Courtès" <ludo@gnu.org>, 39258@debbugs.gnu.org
Subject: [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search.
Date: Tue, 3 Mar 2020 20:21:46 +0100 [thread overview]
Message-ID: <CAJ3okZ0XiJcAwq8bqOWL7bPWMXrZ6zo9D0JMx86x0nJa4apBpQ@mail.gmail.com> (raw)
In-Reply-To: <20200227204150.30985-5-arunisaac@systemreboot.net>
Hi Arun,
On Thu, 27 Feb 2020 at 21:42, Arun Isaac <arunisaac@systemreboot.net> wrote:
>
> * gnu/packages.scm (search-package-index): New function.
> * guix/scripts/package.scm (find-packages-by-description): Search using the
> xapian package index if search patterns are literal strings. Else, search
> using fold-packages.
> ---
> gnu/packages.scm | 17 +++++++++++-
> guix/scripts/package.scm | 57 +++++++++++++++++++++++-----------------
> 2 files changed, 49 insertions(+), 25 deletions(-)
>
> diff --git a/gnu/packages.scm b/gnu/packages.scm
> index e91753e2a8..5b5b29bf84 100644
> --- a/gnu/packages.scm
> +++ b/gnu/packages.scm
> @@ -67,7 +67,8 @@
> specifications->manifest
>
> generate-package-cache
> - generate-package-search-index))
> + generate-package-search-index
> + search-package-index))
>
> ;;; Commentary:
> ;;;
> @@ -453,6 +454,20 @@ reducing the memory footprint."
>
> db-path)
>
> +(define (search-package-index profile querystring)
> + (let ((offset 0)
> + (pagesize 10))
Why this value of 10?
This fix the number of packages returned. Hum?
I have tried to replace by 100 and I got 100 packages. :-)
> + (call-with-database (string-append profile %package-search-index)
> + (lambda (db)
> + (let ((query (parse-query querystring #:stemmer (make-stem "en"))))
> + (mset-fold (lambda (item result)
I do not know what is the convention for the bindings.
But there is 'fold-packages' so I would be inclined to 'fold-msets' or
something in this flavour.
> + (match (find-packages-by-name
> + (document-data (mset-item-document item)))
> + ((package _ ...)
> + (append result `((,package . ,(mset-item-weight item)))))))
> + '()
> + (enquire-mset (enquire db query) offset pagesize)))))))
> +
>
> (define %sigint-prompt
> ;; The prompt to jump to upon SIGINT.
> diff --git a/guix/scripts/package.scm b/guix/scripts/package.scm
> index 1cb0d382bf..6a3b9002dd 100644
> --- a/guix/scripts/package.scm
> +++ b/guix/scripts/package.scm
> @@ -7,6 +7,7 @@
> ;;; Copyright © 2016 Benz Schenk <benz.schenk@uzh.ch>
> ;;; Copyright © 2016 Chris Marusich <cmmarusich@gmail.com>
> ;;; Copyright © 2019 Tobias Geerinckx-Rice <me@tobias.gr>
> +;;; Copyright © 2020 Arun Isaac <arunisaac@systemreboot.net>
> ;;;
> ;;; This file is part of GNU Guix.
> ;;;
> @@ -178,31 +179,40 @@ hooks\" run when building the profile."
> ;;; Package specifications.
> ;;;
>
> -(define (find-packages-by-description regexps)
> +(define (find-packages-by-description patterns)
> "Return a list of pairs: packages whose name, synopsis, description,
> or output matches at least one of REGEXPS sorted by relevance, and its
> non-zero relevance score."
> - (let ((matches (fold-packages (lambda (package result)
> - (if (package-superseded package)
> - result
> - (match (package-relevance package
> - regexps)
> - ((? zero?)
> - result)
> - (score
> - (cons (cons package score)
> - result)))))
> - '())))
> - (sort matches
> - (lambda (m1 m2)
> - (match m1
> - ((package1 . score1)
> - (match m2
> - ((package2 . score2)
> - (if (= score1 score2)
> - (string>? (package-full-name package1)
> - (package-full-name package2))
> - (> score1 score2))))))))))
> + (define (regexp? str)
> + (string-any
> + (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)
> + str))
Instead of reverting this, I would let the current
'find-packages-by-description' and would add
'find-packages-by-description-indexed' doing just
'(search-package-index (current-profile) (string-join patterns " "))'.
And maybe refactoring the sort of scores. Then I would put the test
branch in 'guix/scripts/packages.scm'...
> + (if (and (current-profile)
> + (not (any regexp? patterns)))
> + (search-package-index (current-profile) (string-join patterns " "))
> + (let* ((regexps (map (cut make-regexp* <> regexp/icase) patterns))
> + (matches (fold-packages (lambda (package result)
> + (if (package-superseded package)
> + result
> + (match (package-relevance package
Note that I am in the process of implementing the BM25 weights as
'package-relevance'; at least really thinking about it! :-)
I have already talked about TF-IDF as relevance, for example here [1].
And reading the Xapian documentation [2], it seems affordable. Or not
;-) because of the regexp... Need some thoughts... I mean "in the
process". ;-)
And in this case, it is almost a drop-in replacement of
'fold-packages' by 'mset-fold'; well it should add some flexibility
and a more unified code.
(Aside the searching, IMHO 'package-relevance' should help too in the
linting process of bad written descriptions, another story. ;-)
[1] https://lists.gnu.org/archive/html/guix-devel/2019-07/msg00252.html
[2] https://xapian.org/docs/bm25.html
> + regexps)
> + ((? zero?)
> + result)
> + (score
> + (cons (cons package score)
> + result)))))
> + '())))
> + (sort matches
> + (lambda (m1 m2)
> + (match m1
> + ((package1 . score1)
> + (match m2
> + ((package2 . score2)
> + (if (= score1 score2)
> + (string>? (package-full-name package1)
> + (package-full-name package2))
> + (> score1 score2)))))))))))
>
> (define (transaction-upgrade-entry store entry transaction)
> "Return a variant of TRANSACTION that accounts for the upgrade of ENTRY, a
> @@ -777,8 +787,7 @@ processed, #f otherwise."
...here.
+ (define (regexp? str)
+ (string-any
+ (char-set #\. #\[ #\{ #\} #\( #\) #\\ #\* #\+ #\? #\| #\^ #\$)
+ str))
> (('query 'search rx) rx)
> (_ #f))
> opts))
>
> - (regexps (map (cut make-regexp* <> regexp/icase) patterns))
> - (matches (find-packages-by-description regexps)))
+ (if (any regexp? patterns)
+ (matches (find-packages-by-description regexps))
+ (matches (find-packages-by-description-indexed patterns))
I mean something like that.
> (leave-on-EPIPE
> (display-search-results matches (current-output-port)))
> #t))
> --
> 2.23.0
All the best,
simon
next prev parent reply other threads:[~2020-03-03 19:23 UTC|newest]
Thread overview: 126+ messages / expand[flat|nested] mbox.gz Atom feed top
2020-01-23 19:51 [bug#39258] Faster guix search using an sqlite cache Arun Isaac
2020-01-29 23:33 ` zimoun
2020-01-30 13:48 ` Arun Isaac
2020-01-31 12:48 ` zimoun
2020-02-02 21:16 ` Arun Isaac
2020-02-04 10:19 ` zimoun
2020-02-06 1:58 ` Arun Isaac
2020-02-11 16:29 ` Ludovic Courtès
2020-02-11 18:21 ` zimoun
2020-02-11 18:39 ` Ludovic Courtès
2020-02-11 19:07 ` Arun Isaac
2020-02-11 20:20 ` zimoun
2020-02-15 14:50 ` Arun Isaac
2020-02-11 20:13 ` zimoun
2020-02-27 20:41 ` [bug#39258] [PATCH 0/4] Xapian for Guix package search Arun Isaac
2020-02-27 20:41 ` [bug#39258] [PATCH 1/4] gnu: Add guile-xapian Arun Isaac
2020-03-03 16:29 ` zimoun
2020-02-27 20:41 ` [bug#39258] [PATCH 2/4] build-self: Add guile-xapian to Guix dependencies Arun Isaac
2020-02-27 20:41 ` [bug#39258] [PATCH 3/4] gnu: Generate xapian package search index Arun Isaac
2020-02-28 8:04 ` Pierre Neidhardt
2020-03-05 20:26 ` Arun Isaac
2020-03-03 18:29 ` zimoun
2020-02-27 20:41 ` [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search Arun Isaac
2020-02-28 8:11 ` Pierre Neidhardt
2020-03-03 19:21 ` zimoun [this message]
2020-03-03 19:51 ` zimoun
2020-02-28 8:13 ` [bug#39258] [PATCH 0/4] Xapian for Guix " Pierre Neidhardt
2020-02-28 12:39 ` zimoun
2020-02-28 12:49 ` Pierre Neidhardt
2020-02-28 15:36 ` Arun Isaac
2020-02-28 16:04 ` Arun Isaac
2020-03-02 18:37 ` zimoun
2020-03-02 19:13 ` zimoun
2020-03-03 20:04 ` zimoun
2020-02-29 8:25 ` Arun Isaac
2020-03-02 18:27 ` zimoun
2020-02-28 12:36 ` zimoun
2020-03-05 16:46 ` Ludovic Courtès
2020-03-07 13:31 ` [bug#39258] [PATCH v2 0/3] " Arun Isaac
2020-03-07 13:31 ` [bug#39258] [PATCH v2 1/3] build-self: Add guile-xapian to Guix dependencies Arun Isaac
2020-03-09 18:14 ` zimoun
2020-03-09 23:40 ` Jonathan Brielmaier
2020-03-10 5:24 ` Arun Isaac
2020-03-07 13:31 ` [bug#39258] [PATCH v2 2/3] gnu: Generate Xapian package search index Arun Isaac
2020-03-09 18:19 ` zimoun
2020-03-07 13:31 ` [bug#39258] [PATCH v2 3/3] gnu: Use Xapian index for package search Arun Isaac
2020-03-07 20:33 ` [bug#39258] [PATCH v2 0/3] Xapian for Guix " Ludovic Courtès
2020-03-08 9:01 ` Arun Isaac
2020-03-08 11:33 ` Ludovic Courtès
2020-03-08 20:27 ` Arun Isaac
2020-03-09 7:42 ` Pierre Neidhardt
2020-03-09 12:50 ` zimoun
2020-03-09 10:35 ` Ludovic Courtès
2020-03-10 14:17 ` Arun Isaac
2020-03-10 14:33 ` zimoun
2020-03-11 13:50 ` Ludovic Courtès
2020-03-13 5:37 ` Arun Isaac
2020-03-15 20:40 ` Ludovic Courtès
2020-03-09 7:50 ` Pierre Neidhardt
2020-03-09 10:28 ` Ludovic Courtès
2020-03-09 13:03 ` zimoun
2020-03-09 12:53 ` zimoun
2020-03-09 12:47 ` zimoun
2020-03-09 12:40 ` zimoun
2020-03-09 12:34 ` zimoun
2020-03-08 20:27 ` zimoun
2020-03-08 20:40 ` Arun Isaac
2020-03-09 12:28 ` zimoun
2020-03-27 16:26 ` [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search Arun Isaac
2020-03-27 16:26 ` [bug#39258] [PATCH v3 1/3] guix: Generate package metadata cache Arun Isaac
2020-04-24 20:48 ` Ludovic Courtès
2020-04-26 9:48 ` zimoun
2020-04-26 14:35 ` Ludovic Courtès
2020-04-26 14:54 ` Pierre Neidhardt
2020-04-26 15:33 ` Ludovic Courtès
2020-04-26 15:05 ` zimoun
2020-03-27 16:26 ` [bug#39258] [PATCH v3 2/3] guix: Search " Arun Isaac
2020-04-24 20:58 ` Ludovic Courtès
2020-03-27 16:26 ` [bug#39258] [PATCH v3 3/3] guix: Use package metadata cache for package search Arun Isaac
2020-04-24 21:03 ` Ludovic Courtès
2020-04-05 14:08 ` [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search Ludovic Courtès
2020-04-24 21:05 ` Ludovic Courtès
2020-04-26 3:54 ` [bug#39258] benchmark search: default vs v2 vs v3 zimoun
2020-04-26 7:29 ` Pierre Neidhardt
2020-04-26 15:49 ` Ludovic Courtès
2020-04-26 17:01 ` zimoun
2020-04-26 20:22 ` Ludovic Courtès
2020-04-30 13:10 ` zimoun
2020-05-03 15:01 ` [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) zimoun
2020-05-03 15:01 ` [bug#39258] [PATCH v4 1/3] DRAFT packages: Add fields to packages cache zimoun
2020-05-03 15:01 ` [bug#39258] [PATCH v4 2/3] DRAFT packages: Add new procedure 'fold-packages*' zimoun
2020-05-03 15:01 ` [bug#39258] [PATCH v4 3/3] DRAFT guix package: Use cache in 'find-packages-by-description' zimoun
2020-05-03 16:43 ` [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) Ludovic Courtès
2020-05-03 18:10 ` zimoun
2020-05-03 19:49 ` Ludovic Courtès
2020-06-01 0:00 ` [bug#39258] [PATCH 0/4] Optimize guix search Arun Isaac
2020-06-01 0:00 ` [bug#39258] [PATCH 1/4] ui: Cut off search early if any regexp does not match Arun Isaac
2020-06-09 8:29 ` Ludovic Courtès
2020-06-01 0:00 ` [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings Arun Isaac
2020-06-09 8:33 ` Ludovic Courtès
2020-06-09 9:55 ` zimoun
2020-06-13 12:37 ` Arun Isaac
2020-06-13 13:36 ` zimoun
2020-06-13 17:21 ` Arun Isaac
2020-06-14 19:14 ` zimoun
2020-06-13 19:32 ` Ludovic Courtès
2020-06-15 20:18 ` Arun Isaac
2020-06-01 0:00 ` [bug#39258] [PATCH 3/4] ui: Do not translate package synopsis a second time Arun Isaac
2020-06-09 8:33 ` Ludovic Courtès
2020-06-01 0:00 ` [bug#39258] [PATCH 4/4] ui: Use package-description-string Arun Isaac
2020-06-09 8:34 ` Ludovic Courtès
2020-06-01 1:25 ` [bug#39258] [PATCH v5 0/4] Optimize guix search zimoun
2020-06-01 2:24 ` Arun Isaac
2020-06-01 10:01 ` zimoun
2020-06-01 10:11 ` [bug#39258] KMP string search algorithm? zimoun
2020-06-01 22:24 ` Leo Famulari
2020-06-01 23:48 ` Arun Isaac
2020-06-02 8:49 ` Ludovic Courtès
2021-07-15 7:33 ` [bug#39258] [PATCH v6 0/2] DRAFT "guix search" performances zimoun
2021-07-15 7:33 ` [bug#39258] [PATCH v6 1/2] DRAFT packages: Add fields to packages cache zimoun
2021-07-17 8:31 ` Arun Isaac
2021-07-23 15:30 ` Ludovic Courtès
2021-08-17 14:03 ` zimoun
2021-07-15 7:33 ` [bug#39258] [PATCH v6 2/2] DRAFT scripts: package: Use cache in 'find-packages-by-description' zimoun
2021-07-23 15:43 ` [bug#39258] [PATCH v6 0/2] DRAFT "guix search" performances Ludovic Courtès
2021-08-20 15:42 ` zimoun
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
List information: https://guix.gnu.org/
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=CAJ3okZ0XiJcAwq8bqOWL7bPWMXrZ6zo9D0JMx86x0nJa4apBpQ@mail.gmail.com \
--to=zimon.toutoune@gmail.com \
--cc=39258@debbugs.gnu.org \
--cc=arunisaac@systemreboot.net \
--cc=ludo@gnu.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).