From: zimoun <zimon.toutoune@gmail.com>
To: "Arun Isaac" <arunisaac@systemreboot.net>,
"Ludovic Courtès" <ludo@gnu.org>
Cc: 39258@debbugs.gnu.org
Subject: [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings.
Date: Sun, 14 Jun 2020 21:14:34 +0200
Message-ID: <86sgex8nol.fsf@gmail.com>
In-Reply-To: <cu7zh96dgql.fsf@systemreboot.net>
Dear Arun,
Here, I am speaking only about the first patch: the cut-off.
TL;DR:
1. I was wrong about the bottleneck.
2. The queries were not the right ones to see a clear effect
-- on my machine.
On Sat, 13 Jun 2020 at 22:51, Arun Isaac <arunisaac@systemreboot.net> wrote:
> Yes, I did read your earlier mail. And, I tried again, this time with
> patch 1 alone. It certainly makes a difference on my machine. It is
> clear from the code logic that it should make a difference on your
> machine as well, at least for longer queries. But, somehow it isn't and
> I do not understand why. :-(
Well, I spent some hours* doing some stats (Student's t-test). Roughly
speaking, on my machine, the standard deviation (stddev) hides the
effect -- depending on the query -- and that's why I am not always seeing
the improvement, I guess.
*Ah, all my Sunday in fact. ;-)
I compared different conditions for the query "game strategy":
- cold vs warm
- xterm vs shell in Emacs (my config vs -q)
- no pipe vs pipe
And I ran each experiment 10 times in a row. The conclusion is: on
average -- on my machine -- the cut-off improves the timing. But sometimes,
considering only 3 repeats in a row, the improvement is not obvious (on
the mean), because both tails of the distributions overlap a bit on my
machine, and so it is kind of bad luck. And it is ``worse'' depending
on which commit your patch is rebased against: a357849 (old) vs e782756.
The t-test captures this variation, even with only 3 repeats, but I did
not do it in my previous email and only compared the visible means. Sorry
about that.
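A minimal sketch of the kind of comparison described above: a two-sample Student's t-test with pooled variance, in plain Python. The timings in `before` and `after` are made-up placeholder values, not measurements from this thread, and the function name `student_t` is hypothetical.

```python
import math
import statistics


def student_t(sample_a, sample_b):
    """Return (t, degrees_of_freedom) for a two-sample Student's t-test.

    Uses the pooled-variance form, which assumes both samples have the
    same underlying variance.
    """
    n_a, n_b = len(sample_a), len(sample_b)
    mean_a, mean_b = statistics.mean(sample_a), statistics.mean(sample_b)
    var_a, var_b = statistics.variance(sample_a), statistics.variance(sample_b)
    df = n_a + n_b - 2
    pooled_var = ((n_a - 1) * var_a + (n_b - 1) * var_b) / df
    std_err = math.sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return (mean_a - mean_b) / std_err, df


# HYPOTHETICAL wall-clock timings in seconds (10 repeats each).
before = [2.61, 2.55, 2.70, 2.48, 2.66, 2.59, 2.52, 2.73, 2.57, 2.64]
after = [2.50, 2.58, 2.41, 2.62, 2.47, 2.55, 2.39, 2.60, 2.44, 2.52]

t, df = student_t(before, after)
print(f"t = {t:.3f} with {df} degrees of freedom")
```

The resulting t value is then compared against the critical value of the t distribution for the given degrees of freedom and the chosen confidence level. Comparing only the two means ignores the spread, which is how overlapping tails can hide a real improvement.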
Moreover, printing increases the stddev, so the results fluctuate more
inside Emacs than in xterm, and piping helps in this case.
Piping does not change the final result -- hopefully. :-) It adds some
extra time, but on average it is the same.
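A small sketch of a timing harness for this kind of experiment: it runs a command several times in a row with its output discarded (so terminal printing does not add noise) and reports the mean and stddev. The helper names `time_command` and `summarize` are hypothetical, and the command timed here is only a placeholder; to time a real query one would pass something like `["guix", "search", "game", "strategy"]` instead.

```python
import statistics
import subprocess
import sys
import time


def time_command(argv, repeats=10):
    """Run argv `repeats` times and return the wall-clock durations in seconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        # Discard output so that printing speed does not affect the timing.
        subprocess.run(argv, stdout=subprocess.DEVNULL,
                       stderr=subprocess.DEVNULL, check=True)
        samples.append(time.perf_counter() - start)
    return samples


def summarize(samples):
    """Return (mean, sample standard deviation) of the timings."""
    return statistics.mean(samples), statistics.stdev(samples)


if __name__ == "__main__":
    # Placeholder command: a Python interpreter that does nothing.
    placeholder = [sys.executable, "-c", "pass"]
    mean, stddev = summarize(time_command(placeholder, repeats=10))
    print(f"mean = {mean:.4f} s, stddev = {stddev:.4f} s")
```

Discarding the output is not the same as printing to a terminal or piping to another program, so absolute numbers will differ between the three setups; only runs made under the same conditions should be compared.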
About cold vs warm cache, I notice that the improvement is not the same
(on average). Considering the raw time, there is a difference of about 10%
(with "good" confidence); it could be worth understanding why.
Well, considering that, I did other stats with other queries, and the
conclusion for my machine is that *the patch improves* things on average by
reducing the timing for typical usages. Which is really cool! :-)
I was definitely wrong about the bottleneck, and this one could be
one. One way to get an idea is to use "statprof", but it is hard for me
to read the results (I believe Guile master has a fix improving the
'anon #addr' entries, but I do not really know more).
--8<---------------cut here---------------start------------->8---
$ /tmp/v5-1/bin/guix repl
scheme@(guix-user)> ,use(guix scripts search)
scheme@(guix-user)> ,pr (guix-search "game" "strategy")
%     cumulative   self
time   seconds     seconds  procedure
 17.81      0.29      0.27  anon #xe40178
 12.33      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
 12.33      0.18      0.18  anon #xe3c770
  5.48      0.08      0.08  ice-9/boot-9.scm:1396:0:symbol-append
  4.11      1.57      0.06  guix/memoization.scm:100:0
  4.11      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.74      0.55      0.04  guix/ui.scm:1511:12
  2.74      0.33      0.04  ice-9/regex.scm:170:0:fold-matches
  2.74      0.04      0.04  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  2.74      0.04      0.04  texinfo/string-utils.scm:98:5
  2.74      0.04      0.04  ice-9/vlist.scm:539:0:vhash-assq
  1.37     69.81      0.02  ice-9/threads.scm:388:4
[...]
---
Sample count: 73
Total time: 1.490955132 seconds (0.387756476 seconds in GC)
--8<---------------cut here---------------end--------------->8---
To compare with the default:
--8<---------------cut here---------------start------------->8---
time   seconds     seconds  procedure
 24.47      0.49      0.46  anon #x1d89178
 21.28      0.40      0.40  anon #x1d85770
  9.57      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
  3.19      4.71      0.06  ice-9/boot-9.scm:1673:4:with-exception-handler
  3.19      1.64      0.06  guix/memoization.scm:100:0
  3.19      0.06      0.06  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  3.19      0.06      0.06  anon #x1d84c78
  3.19      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.13      1.01      0.04  guix/ui.scm:1511:12
  2.13      0.08      0.04  ice-9/boot-9.scm:1396:0:symbol-append
  2.13      0.04      0.04  anon #x1d83248
  1.06      0.30      0.02  anon #x7f057e6c90e8
[...]
--8<---------------cut here---------------end--------------->8---
So clearly the patch has an effect! If someone knows what these are:
- ice-9/boot-9.scm:2201:0:%load-announce
- ice-9/boot-9.scm:1396:0:symbol-append
and where they come from, it could help. :-)
Well, I would like to know which part is the regex engine and which is
the string search. :-) This links to the discussion about KMP and others.
> Here are more fresh results. Could you try for longer queries like
> "strategy game caesar" and without the output being piped to recsel,
> grep, etc.? For simplicity, let's talk only about warm cache results.
>
> |----------------------------------+--------+-------|
> | query | before | after |
> |----------------------------------+--------+-------|
> | guix search strategy game | 2.58 | 1.96 |
> | guix search strategy game caesar | 2.95 | 1.76 |
> |----------------------------------+--------+-------|
At first, I was confused about why one more term returns faster. This is
because the query "caesar" returns only one package, so the query
"strategy game caesar" cuts off all the packages when searching the
terms "game" and then "strategy". I mean
guix search julius
should take as long as
guix search strategy game caesar
It does, on average on my machine.
And secondly, I was confused because the timing of the query "caesar
strategy game" is almost the same (2.8% +/- 2.5% with 99.0%
confidence; 10 repeats). Well, it is because in one case the term
"caesar" is applied to 15 packages and in the other case the terms
"strategy" and "game" are applied to 1 package. Add some stddev
and not enough repeats (nor good stats), and the confusion is complete
and my conclusion is wrong.
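To illustrate the reasoning above, here is a toy sketch of a cut-off search that counts how many term-vs-package match attempts are made. It is not the Guix implementation: the package descriptions are invented, plain substring matching stands in for regexps, and the function name `search` and the order in which terms are tried are assumptions made for the example.

```python
def search(packages, terms, cut_off=True):
    """Return (matching packages, number of match attempts).

    With cut_off=True, the remaining terms are skipped for a package as
    soon as one term fails to match it.
    """
    attempts = 0
    hits = []
    for text in packages:
        matched = True
        for term in terms:
            attempts += 1
            if term not in text:
                matched = False
                if cut_off:
                    break
        if matched:
            hits.append(text)
    return hits, attempts


# HYPOTHETICAL package descriptions, for illustration only.
PACKAGES = [
    "a strategy game set in ancient rome, caesar",
    "a strategy game with tanks",
    "a puzzle game",
    "the text editor",
    "a chess strategy trainer",
]

rare_first = search(PACKAGES, ["caesar", "strategy", "game"])
rare_last = search(PACKAGES, ["strategy", "game", "caesar"])
no_cut_off = search(PACKAGES, ["strategy", "game", "caesar"], cut_off=False)

for label, (hits, attempts) in [("rare term first", rare_first),
                                ("rare term last", rare_last),
                                ("no cut-off", no_cut_off)]:
    print(f"{label}: {len(hits)} hit(s), {attempts} match attempts")
```

The result set is the same in all three cases; only the amount of matching work changes. A rare term eliminates most packages early, and the saving from the cut-off depends on how many packages survive each term.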
That said, the effect of the cut-off is clear (on my machine, even with
one shot) with the queries:
- game strategy the
- the game strategy
Thank you,
simon