unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: Arun Isaac <arunisaac@systemreboot.net>
Cc: mail@ambrevar.xyz, 39258@debbugs.gnu.org, zimon.toutoune@gmail.com
Subject: [bug#39258] [PATCH v2 0/3] Xapian for Guix package search
Date: Sun, 08 Mar 2020 12:33:44 +0100	[thread overview]
Message-ID: <875zffcc87.fsf@gnu.org> (raw)
In-Reply-To: <cu7tv2zcj9l.fsf@systemreboot.net> (Arun Isaac's message of "Sun, 08 Mar 2020 14:31:42 +0530")

Hi,

Arun Isaac <arunisaac@systemreboot.net> skribis:

>>> It turns out that most of the time is spent in printing and texinfo
>>> rendering of the search results.
>
> Also, when we put all package metadata into the Xapian index, we don't
> have to look up any of the package variables in (gnu packages *) during
> `guix search` time. This also contributes substantially to the speedup.

Yup.

>> In general, pre-rendering doesn’t seem practical to me: the output of
>> ‘guix search’ is locale-dependent (it speaks the user’s language) and
>
> Note that we already need to index package synopses and descriptions in
> all languages. I still haven't implemented this, though.

Oh, right.  Tricky!

>> adjusts to the terminal width (well, this is temporarily broken on
>> Guile 3.0.0, but see ‘%text-width’ in (guix ui)).
>
> This could be accomplished even with pre-rendering. Xapian provides
> "slots" to store arbitrary strings with a document. Instead of storing
> the pre-rendered document as a whole, we could store pre-rendered fields
> in separate slots. Then, during `guix search` time, we can assemble the
> result from these pre-rendered fields.

I’m not sure I understand.  The index wouldn’t store pre-rendered
strings for every possible terminal width, right?

>> Also, if the 12K+ descriptions need to be rendered at the time the user
>> runs ‘guix pull’, the experience may not be great, because it could take
>> a bit of time.
>
> This is a problem, but I would see it as a necessary "compilation"
> step. :-P In fact, this whole patchset speeds up `guix search` by doing
> part of the work of `guix search` ahead of time. So, some such cost is
> unavoidable.

Yeah.  I think we need to take the whole user experience into account,
not just ‘guix search’.  ‘guix pull’ already feels very slow, and it’s a
fairly common operation.  Conversely, ‘guix search’ takes roughly
between 0.5 and 2 seconds and is an uncommon operation on a “slow path”
(in the sense that when you’re searching for software, you’ll probably
have to spend more than a couple of seconds to find what you’re looking
for.)

>> What I like about the recutils format in this context is that it’s both
>> human- and machine-readable.  The examples in the manual show how it can
>> be useful to select the information displayed or to refine the search
>> (info "(guix) Invoking guix package").
>
> Xapian's query language is much more natural (as in natural language)
> than the regexp based techniques we need to use with recutils. I have
> hardly ever used the regexp based search and I suspect many others
> haven't either. Also, refining the search query should be easier to do
> with Xapian. We could even use Xapian's query expansion feature to
> suggest improved queries to the user.

I’m not sufficiently familiar with Xapian’s query language.  The
examples I had in mind were:

  guix search malloc | recsel -p name,version,relevance
  guix search | recsel -p name -e 'license ~ "LGPL 3"'
  guix search crypto library | \
    recsel -e '! (name ~ "^(ghc|perl|python|ruby)")' -p name,synopsis

It’s not so much about regexps than it is about selecting individual
fields.

>> Were you able to measure the cost of rendering specifically?
>
> generate-package-search-index takes around 50 seconds. If I modify
> generate-package-search-index to not pre-render but simply store the
> package description alone, it takes around 20 seconds. That gives us a
> rough idea of the cost of pre-rendering.

To me, adding 20–50 seconds on ‘guix pull’ would be undesirable.  :-/

>> I think we should look at a profile of ‘package->recutils’, there’s
>> probably room for improvement there.
>
> On quick inspection, most of the time in package->recutils is spent in
> texinfo rendering the description. Unless we use the simplified search
> results format as discussed above, we cannot avoid it.

What I meant was that we could use (statprof) to see whether/how Texinfo
rendering/parsing can be optimized.

Thanks,
Ludo’.

  reply	other threads:[~2020-03-08 11:34 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-23 19:51 [bug#39258] Faster guix search using an sqlite cache Arun Isaac
2020-01-29 23:33 ` zimoun
2020-01-30 13:48   ` Arun Isaac
2020-01-31 12:48     ` zimoun
2020-02-02 21:16       ` Arun Isaac
2020-02-04 10:19         ` zimoun
2020-02-06  1:58           ` Arun Isaac
2020-02-11 16:29             ` Ludovic Courtès
2020-02-11 18:21               ` zimoun
2020-02-11 18:39                 ` Ludovic Courtès
2020-02-11 19:07                   ` Arun Isaac
2020-02-11 20:20                     ` zimoun
2020-02-15 14:50                     ` Arun Isaac
2020-02-11 20:13                   ` zimoun
2020-02-27 20:41 ` [bug#39258] [PATCH 0/4] Xapian for Guix package search Arun Isaac
2020-02-27 20:41   ` [bug#39258] [PATCH 1/4] gnu: Add guile-xapian Arun Isaac
2020-03-03 16:29     ` zimoun
2020-02-27 20:41   ` [bug#39258] [PATCH 2/4] build-self: Add guile-xapian to Guix dependencies Arun Isaac
2020-02-27 20:41   ` [bug#39258] [PATCH 3/4] gnu: Generate xapian package search index Arun Isaac
2020-02-28  8:04     ` Pierre Neidhardt
2020-03-05 20:26       ` Arun Isaac
2020-03-03 18:29     ` zimoun
2020-02-27 20:41   ` [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search Arun Isaac
2020-02-28  8:11     ` Pierre Neidhardt
2020-03-03 19:21     ` zimoun
2020-03-03 19:51       ` zimoun
2020-02-28  8:13   ` [bug#39258] [PATCH 0/4] Xapian for Guix " Pierre Neidhardt
2020-02-28 12:39     ` zimoun
2020-02-28 12:49       ` Pierre Neidhardt
2020-02-28 15:36     ` Arun Isaac
2020-02-28 16:04       ` Arun Isaac
2020-03-02 18:37         ` zimoun
2020-03-02 19:13           ` zimoun
2020-03-03 20:04             ` zimoun
2020-02-29  8:25       ` Arun Isaac
2020-03-02 18:27         ` zimoun
2020-02-28 12:36   ` zimoun
2020-03-05 16:46   ` Ludovic Courtès
2020-03-07 13:31 ` [bug#39258] [PATCH v2 0/3] " Arun Isaac
2020-03-07 13:31   ` [bug#39258] [PATCH v2 1/3] build-self: Add guile-xapian to Guix dependencies Arun Isaac
2020-03-09 18:14     ` zimoun
2020-03-09 23:40     ` Jonathan Brielmaier
2020-03-10  5:24       ` Arun Isaac
2020-03-07 13:31   ` [bug#39258] [PATCH v2 2/3] gnu: Generate Xapian package search index Arun Isaac
2020-03-09 18:19     ` zimoun
2020-03-07 13:31   ` [bug#39258] [PATCH v2 3/3] gnu: Use Xapian index for package search Arun Isaac
2020-03-07 20:33   ` [bug#39258] [PATCH v2 0/3] Xapian for Guix " Ludovic Courtès
2020-03-08  9:01     ` Arun Isaac
2020-03-08 11:33       ` Ludovic Courtès [this message]
2020-03-08 20:27         ` Arun Isaac
2020-03-09  7:42           ` Pierre Neidhardt
2020-03-09 12:50             ` zimoun
2020-03-09 10:35           ` Ludovic Courtès
2020-03-10 14:17             ` Arun Isaac
2020-03-10 14:33               ` zimoun
2020-03-11 13:50               ` Ludovic Courtès
2020-03-13  5:37                 ` Arun Isaac
2020-03-15 20:40                   ` Ludovic Courtès
2020-03-09  7:50         ` Pierre Neidhardt
2020-03-09 10:28           ` Ludovic Courtès
2020-03-09 13:03             ` zimoun
2020-03-09 12:53           ` zimoun
2020-03-09 12:47         ` zimoun
2020-03-09 12:40       ` zimoun
2020-03-09 12:34     ` zimoun
2020-03-08 20:27   ` zimoun
2020-03-08 20:40     ` Arun Isaac
2020-03-09 12:28   ` zimoun
2020-03-27 16:26 ` [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search Arun Isaac
2020-03-27 16:26   ` [bug#39258] [PATCH v3 1/3] guix: Generate package metadata cache Arun Isaac
2020-04-24 20:48     ` Ludovic Courtès
2020-04-26  9:48       ` zimoun
2020-04-26 14:35         ` Ludovic Courtès
2020-04-26 14:54           ` Pierre Neidhardt
2020-04-26 15:33             ` Ludovic Courtès
2020-04-26 15:05           ` zimoun
2020-03-27 16:26   ` [bug#39258] [PATCH v3 2/3] guix: Search " Arun Isaac
2020-04-24 20:58     ` Ludovic Courtès
2020-03-27 16:26   ` [bug#39258] [PATCH v3 3/3] guix: Use package metadata cache for package search Arun Isaac
2020-04-24 21:03     ` Ludovic Courtès
2020-04-05 14:08   ` [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search Ludovic Courtès
2020-04-24 21:05   ` Ludovic Courtès
2020-04-26  3:54 ` [bug#39258] benchmark search: default vs v2 vs v3 zimoun
2020-04-26  7:29   ` Pierre Neidhardt
2020-04-26 15:49   ` Ludovic Courtès
2020-04-26 17:01     ` zimoun
2020-04-26 20:22       ` Ludovic Courtès
2020-04-30 13:10     ` zimoun
2020-05-03 15:01 ` [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) zimoun
2020-05-03 15:01   ` [bug#39258] [PATCH v4 1/3] DRAFT packages: Add fields to packages cache zimoun
2020-05-03 15:01   ` [bug#39258] [PATCH v4 2/3] DRAFT packages: Add new procedure 'fold-packages*' zimoun
2020-05-03 15:01   ` [bug#39258] [PATCH v4 3/3] DRAFT guix package: Use cache in 'find-packages-by-description' zimoun
2020-05-03 16:43   ` [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) Ludovic Courtès
2020-05-03 18:10     ` zimoun
2020-05-03 19:49       ` Ludovic Courtès
2020-06-01  0:00 ` [bug#39258] [PATCH 0/4] Optimize guix search Arun Isaac
2020-06-01  0:00   ` [bug#39258] [PATCH 1/4] ui: Cut off search early if any regexp does not match Arun Isaac
2020-06-09  8:29     ` Ludovic Courtès
2020-06-01  0:00   ` [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings Arun Isaac
2020-06-09  8:33     ` Ludovic Courtès
2020-06-09  9:55       ` zimoun
2020-06-13 12:37       ` Arun Isaac
2020-06-13 13:36         ` zimoun
2020-06-13 17:21           ` Arun Isaac
2020-06-14 19:14             ` zimoun
2020-06-13 19:32         ` Ludovic Courtès
2020-06-15 20:18           ` Arun Isaac
2020-06-01  0:00   ` [bug#39258] [PATCH 3/4] ui: Do not translate package synopsis a second time Arun Isaac
2020-06-09  8:33     ` Ludovic Courtès
2020-06-01  0:00   ` [bug#39258] [PATCH 4/4] ui: Use package-description-string Arun Isaac
2020-06-09  8:34     ` Ludovic Courtès
2020-06-01  1:25   ` [bug#39258] [PATCH v5 0/4] Optimize guix search zimoun
2020-06-01  2:24     ` Arun Isaac
2020-06-01 10:01     ` zimoun
2020-06-01 10:11 ` [bug#39258] KMP string search algorithm? zimoun
2020-06-01 22:24   ` Leo Famulari
2020-06-01 23:48     ` Arun Isaac
2020-06-02  8:49       ` Ludovic Courtès
2021-07-15  7:33 ` [bug#39258] [PATCH v6 0/2] DRAFT "guix search" performances zimoun
2021-07-15  7:33   ` [bug#39258] [PATCH v6 1/2] DRAFT packages: Add fields to packages cache zimoun
2021-07-17  8:31     ` Arun Isaac
2021-07-23 15:30       ` Ludovic Courtès
2021-08-17 14:03         ` zimoun
2021-07-15  7:33   ` [bug#39258] [PATCH v6 2/2] DRAFT scripts: package: Use cache in 'find-packages-by-description' zimoun
2021-07-23 15:43   ` [bug#39258] [PATCH v6 0/2] DRAFT "guix search" performances Ludovic Courtès
2021-08-20 15:42     ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=875zffcc87.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=39258@debbugs.gnu.org \
    --cc=arunisaac@systemreboot.net \
    --cc=mail@ambrevar.xyz \
    --cc=zimon.toutoune@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).