From: zimoun <zimon.toutoune@gmail.com>
To: 39258@debbugs.gnu.org, "Arun Isaac" <arunisaac@systemreboot.net>,
	"Ludovic Courtès" <ludo@gnu.org>,
	"Pierre Neidhardt" <mail@ambrevar.xyz>
Subject: [bug#39258] benchmark search: default vs v2 vs v3
Date: Sun, 26 Apr 2020 05:54:21 +0200
Message-ID: <CAJ3okZ0KP0bYuwShZoNkkFWrhGNPsUGOOfiL5azj_x8Q+GNqLA@mail.gmail.com>
In-Reply-To: <cu7pnfaar36.fsf@systemreboot.net>

Hi,

Thank you Arun for the patches and all the work.  Sorry for the delay.


TLDR:

 1) around 25 seconds added to "guix pull"... but more often than not
I am already waiting around 10 minutes when pulling.
 2) the speedup is clear: more than 2x.


The question is the tradeoff between the slowdown of pull and the
speedup of search.  What is acceptable?


Here, let us benchmark 3 versions of Guix:

 - default is a357849f5b
 - v2 is rebased on default and based on Xapian
 - v3 is rebased on default too and based on a "custom" index

and let us compare the time of "guix pull" and then of "guix search".
Because v2 uses Xapian, the accuracy is different and so the list of
outputs differs depending on the query; the impact on the
performance seems minimal.  Let us discuss accuracy and BM25
elsewhere and focus on performance for now.


* guix pull
-----------

The idea is to measure whether computing the new index is expensive
compared to everything else that "guix pull" computes.


** Reference
------------

Maybe I have misconfigured something, or my laptop is really not
powerful at all, but here are some numbers.

(Note: /proc/cpuinfo lists 4 processors, each an Intel(R) Core(TM)
i5-5200U CPU @ 2.20GHz, and /sys/block/sda/queue/rotational says 0,
which means an SSD.)

--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 8    Apr 25 2020 09:00:01    (current)
  guix f84b036
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: f84b0363053e5479464f6ce6ded45f80360d90fc
--8<---------------cut here---------------end--------------->8---

--8<---------------cut here---------------start------------->8---
$ time guix pull -C ~/.config/guix/default-channels.scm
Updating channel 'guix' from Git repository at 'https://git.savannah.gnu.org/git/guix.git'...
Building from this channel:
  guix      https://git.savannah.gnu.org/git/guix.git   8cf6d15
downloading from https://ci.guix.gnu.org/nar/gzip/xgakzpfs3rz57m666hsk1v3d3zcy7wgn-config.scm ...
 config.scm

[...]

building fonts directory...
building directory of Info manuals...
building database for manual pages...
building profile with 1 package...
building /gnu/store/kq1zlj5rxz8wrxc3ha8vck2wv2iakfnb-inferior-script.scm.drv...
building package cache...
building profile with 1 package...
New in this revision:
  2 new packages: cl-osicat, sbcl-osicat


real    13m37.997s
user    1m38.129s
sys     0m0.856s
--8<---------------cut here---------------end--------------->8---


Because "guix search" is used, say, 10 times more often than "guix
pull", a 10% increase of the "guix pull" time would still improve the
overall user experience if it makes "guix search" faster, IMHO.

Therefore, because "guix pull" takes around 13 minutes, the extra cost
to index all the packages can be roughly 1min30s (at most).
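
As a rough sketch of this budget arithmetic (the helper names
`to_seconds` and `budget` are mine, for illustration only, and are not
part of Guix), the "real" value printed by `time` can be converted to
seconds and scaled by the chosen percentage:

```shell
# Convert a "real" value as printed by `time` (e.g. 13m37.997s) to seconds.
to_seconds() {
    printf '%s\n' "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Indexing budget in seconds: a percentage (default 10) of the pull time.
budget() {
    awk -v t="$(to_seconds "$1")" -v pct="${2:-10}" \
        'BEGIN { printf "%.1f\n", t * pct / 100 }'
}

budget 13m37.997s   # prints 81.8, i.e. a bit less than 1min30s
```

The percentage is the second, optional argument, so another tradeoff
(say 5%) can be tried without changing anything else.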


Then, if I pull back from 8cf6d15 with '--commit=a357849f5b', it takes:

real    2m13.693s
user    1m37.418s
sys     0m0.666s

so in this case 10% means around 13s.  But after 1 minute of waiting,
the command already feels too long to me; personally, I am already
waiting, so I do not mind much whether it takes 2m13s or 3m00s.


Well, it is hard to draw a clear line about what indexing time is
acceptable because the time of pulling is already highly variable.


What is the average duration of "guix pull"?

It could be really interesting to poll the users.  They could report:
 - guix describe
 - time guix pull
whichever channels they have set up.

Just to have an idea about what should be the acceptable extra time
added by indexing.  For sure it depends on the hardware, but it would
provide an idea and help to see whether the extra time is worth it.

WDYT?



** Let's compare the index time
-------------------------------

Let us pull for the 3 cases and populate the store with all the
necessary items.  This could be looooonng (20 minutes)!  For example,
for version 2 of the patches, living in my branch 'search-v2' using a
worktree:

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env guix pull -p /tmp/v2 \
     --url=$PWD --branch=search-v2 \
     -C ~/.config/guix/default-channels.scm
--8<---------------cut here---------------end--------------->8---

and then let us locate the index file for each version:

--8<---------------cut here---------------start------------->8---
# ls -l /tmp/default/lib/guix
/gnu/store/g5c08vqsv31nkn2r0hr32dbrkhf3cvd8-guix-package-cache

readlink /tmp/v2/lib/guix/package-search.index
/gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index/lib/guix/package-search.index

readlink /tmp/v3/lib/guix/package-metadata.cache
/gnu/store/8j78b5c4ddic21gcx7wpbq2akjn7x7mr-guix-package-metadata-cache/lib/guix/package-metadata.cache
--8<---------------cut here---------------end--------------->8---

Well, let us remove the profiles and garbage collect the index files:

--8<---------------cut here---------------start------------->8---
rm /tmp/default /tmp/v{2,3}*
guix gc -D \
   /gnu/store/g5c08vqsv31nkn2r0hr32dbrkhf3cvd8-guix-package-cache \
   /gnu/store/8xbzhn81hmshagbgazmnr7xfps1cdsa3-guix-package-search-index \
   /gnu/store/8j78b5c4ddic21gcx7wpbq2akjn7x7mr-guix-package-metadata-cache
--8<---------------cut here---------------end--------------->8---


And then re-run "guix pull".  We are now comparing apples to apples, I guess.


| time | default   | v2        | v3        |
|------+-----------+-----------+-----------|
| real | 1m11.899s | 1m30.806s | 1m34.341s |
| user | 1m23.845s | 1m24.160s | 1m24.233s |
| sys  | 0m0.570s  | 0m0.563s  | 0m0.529s  |


Therefore, the extra "real" time is less than 20s for v2 and less than
25s for v3.


The whole question is which "guix pull" duration this extra 25s is compared to:
 - more than 13m: adding 25s is acceptable
 - less than 2m: adding 25s is questionable
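
To put numbers on these two cases, here is a small sketch (the helper
name `overhead_pct` is hypothetical) that expresses an extra time in
seconds as a percentage of a pull duration given in the format printed
by `time`:

```shell
# Hypothetical helper: extra time (seconds) as a percentage of a pull
# duration given in the "XmY.YYYs" format printed by `time`.
overhead_pct() {
    printf '%s\n' "$2" |
        awk -F'[ms]' -v extra="$1" \
            '{ printf "%.1f\n", 100 * extra / ($1 * 60 + $2) }'
}

overhead_pct 25 13m37.997s   # prints 3.1
overhead_pct 25 2m13.693s    # prints 18.7
```

With the two pull durations measured above, the same 25s is about 3%
of the long pull but almost 19% of the short one.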

Usually, my feeling about "guix pull" is... I am waiting!  Therefore,
I would not notice this extra 25s because it is masked by all the
other work "guix pull" is doing.


* guix search
-------------

Let us compare cold cache (echo 3 | sudo tee /proc/sys/vm/drop_caches)
and warm cache, for example with the query 'inkscape'.


| cache | time | default  | v2       | v3       |
|-------+------+----------+----------+----------|
| cold  | real | 0m1.842s | 0m0.331s | 0m0.437s |
|       | user | 0m1.270s | 0m0.179s | 0m0.336s |
|       | sys  | 0m0.142s | 0m0.047s | 0m0.052s |
|-------+------+----------+----------+----------|
| warm  | real | 0m0.898s | 0m0.132s | 0m0.292s |
|       | user | 0m1.069s | 0m0.168s | 0m0.353s |
|       | sys  | 0m0.072s | 0m0.008s | 0m0.019s |


Therefore, for this query, the speedup on the "real" time is at least 3:

| cache | default-vs-v2 | default-vs-v3 |
|-------+---------------+---------------|
| cold  |           5.6 |           4.2 |
| warm  |           6.8 |           3.1 |
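
These ratios are simply the "real" time of default divided by the
"real" time of the other version.  A minimal sketch to recompute them
(the helper name `speedup` is mine; times are given in seconds):

```shell
# Hypothetical helper: ratio of two "real" times given in seconds.
speedup() {
    awk -v old="$1" -v cur="$2" 'BEGIN { printf "%.1f\n", old / cur }'
}

speedup 1.842 0.331   # cold, default vs v2: prints 5.6
speedup 1.842 0.437   # cold, default vs v3: prints 4.2
speedup 0.898 0.132   # warm, default vs v2: prints 6.8
speedup 0.898 0.292   # warm, default vs v3: prints 3.1
```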


Another query:

--8<---------------cut here---------------start------------->8---
time guix search crypto library | recsel -P name | grep libb2
--8<---------------cut here---------------end--------------->8---

| cache | time | default  | v2       | v3       |
|-------+------+----------+----------+----------|
| cold  | real | 0m2.216s | 0m1.109s | 0m0.689s |
|       | user | 0m1.655s | 0m1.309s | 0m0.683s |
|       | sys  | 0m0.193s | 0m0.073s | 0m0.035s |
|-------+------+----------+----------+----------|
| warm  | real | 0m1.197s | 0m0.490s | 0m0.491s |
|       | user | 0m1.448s | 0m0.819s | 0m0.625s |
|       | sys  | 0m0.089s | 0m0.034s | 0m0.039s |


| cache | default-vs-v2 | default-vs-v3 |
|-------+---------------+---------------|
| cold  |           2.0 |           3.2 |
| warm  |           2.4 |           2.4 |




Before going further, especially about any other more sophisticated
inverted index (BM25), it appears important to me to settle what
"cost" on "guix pull" the users are ready to pay, because somehow the
inverted index has to be computed.  And without an inverted index, it
seems difficult to improve the accuracy.

One solution could be to compute the inverted index in the background
with a low priority.  If the index is not done yet when "guix search"
is called, then fall back to the current default behaviour.
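
To illustrate the idea only, here is a toy sketch in shell.  Everything
in it is hypothetical (the INDEX_FILE path, the function names, and the
fake index content); it is not how Guix or any of the patches work.  It
only shows the pattern: build with nice(1) in the background, publish
the index atomically, and fall back when the index is not there yet.

```shell
# Hypothetical location of the index; not a real Guix path.
INDEX_FILE="${INDEX_FILE:-${TMPDIR:-/tmp}/guix-search-demo.index}"

# Start the (fake) indexing at the lowest priority, in the background.
# The index is written to a temporary file and then renamed, so that a
# concurrent search never sees a half-written index.
start_index_build() {
    nice -n 19 sh -c '
        printf "inkscape\nlibb2\n" > "$1.tmp" &&
        mv "$1.tmp" "$1"
    ' sh "$INDEX_FILE" &
}

# Use the index when it exists, otherwise fall back to the slow path
# (here just a message standing for the current default behaviour).
search() {
    if [ -f "$INDEX_FILE" ]; then
        echo "index: $(grep -c -e "$1" "$INDEX_FILE")"
    else
        echo "fallback: $1"
    fi
}
```

Calling `search inkscape` before `start_index_build` has finished
prints the fallback message; once the background job is done, the same
call reads the index instead.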


WDYT?


Cheers,
simon
