unofficial mirror of guix-patches@gnu.org 
 help / color / mirror / code / Atom feed
From: zimoun <zimon.toutoune@gmail.com>
To: "Arun Isaac" <arunisaac@systemreboot.net>,
	"Ludovic Courtès" <ludo@gnu.org>
Cc: 39258@debbugs.gnu.org
Subject: [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings.
Date: Sun, 14 Jun 2020 21:14:34 +0200	[thread overview]
Message-ID: <86sgex8nol.fsf@gmail.com> (raw)
In-Reply-To: <cu7zh96dgql.fsf@systemreboot.net>

Dear Arun,

Here, I am speaking about only the first patch: the cut-off.

TL;DR:
 1. I was wrong about the bottleneck.
 2. The queries were not the good ones to see a clear effect
    -- on my machine.
    

On Sat, 13 Jun 2020 at 22:51, Arun Isaac <arunisaac@systemreboot.net> wrote:

> Yes, I did read your earlier mail. And, I tried again, this time with
> patch 1 alone. It certainly makes a difference on my machine. It is
> clear from the code logic that it should make a difference on your
> machine as well, at least for longer queries. But, somehow it isn't and
> I do not understand why. :-(

Well, I spent some hours* to do some stats (Student's t-test).  Roughly
speaking, on my machine, the standard deviation error (stddev) hides the
point -- depending on the query -- and that's why I am not always seeing
the improvement, I guess.

*ah all my Sunday in fact. ;-)


I compared different conditions for the query "game strategy":

 - cold    vs warm
 - xterm   vs shell in Emacs (my config vs -q)
 - no pipe vs pipe

And I run 10 times in a row each experiment.  The conclusion is: in
average -- on my machine -- the cut-off improves.  But sometimes
considering only 3 repeats in a row, the improvement is not obvious (on
the mean); because the both tails of distribution overlap a bit on my
machine and so it is kind of bad luck.  And it is ``worse'' depending
against which commit your patch is rebased: a357849 (old) vs e782756.

The t-test captures this variation, even with only 3 repeats, but I have
not done in my previous email and only compared the visible mean.  Sorry
about that.

Moreover, printing increases the stddev, so the results are more
fluctuating inside Emacs vs xterm and piping helps in this case.

Piping does not change the final result -- hopefully. :-)  It adds an
extra time but in average it is the same.

About cold vs warm cache, I notice that the improvement is not the same
(in average).  Considering the raw time, there is a difference about 10%
(with "good" confidence); it could be worth to understand why.


Well, considering that, I did other stats with other queries and the
conclusion for my machine is that *the patch improves* on average by
reducing the timing for typical usages.  Which is really cool! :-)


I definitively have wrong about the bottleneck and this one could be
one.  One way to have an idea is to use "statprof" but it is hard for me
to read the results (I believe Guile master have a fix improving the
'anon #addr', but do not really know more).

--8<---------------cut here---------------start------------->8---
$ /tmp/v5-1/bin/guix repl
scheme@(guix-user)> ,use(guix scripts search)
scheme@(guix-user)> ,pr (guix-search "game" "strategy")
%     cumulative   self             
time   seconds     seconds  procedure
 17.81      0.29      0.27  anon #xe40178
 12.33      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
 12.33      0.18      0.18  anon #xe3c770
  5.48      0.08      0.08  ice-9/boot-9.scm:1396:0:symbol-append
  4.11      1.57      0.06  guix/memoization.scm:100:0
  4.11      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.74      0.55      0.04  guix/ui.scm:1511:12
  2.74      0.33      0.04  ice-9/regex.scm:170:0:fold-matches
  2.74      0.04      0.04  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  2.74      0.04      0.04  texinfo/string-utils.scm:98:5
  2.74      0.04      0.04  ice-9/vlist.scm:539:0:vhash-assq
  1.37     69.81      0.02  ice-9/threads.scm:388:4
[...]
---
Sample count: 73
Total time: 1.490955132 seconds (0.387756476 seconds in GC)
--8<---------------cut here---------------end--------------->8---

To compare with the default:

--8<---------------cut here---------------start------------->8---
time   seconds     seconds  procedure
 24.47      0.49      0.46  anon #x1d89178
 21.28      0.40      0.40  anon #x1d85770
  9.57      0.20      0.18  ice-9/boot-9.scm:2201:0:%load-announce
  3.19      4.71      0.06  ice-9/boot-9.scm:1673:4:with-exception-handler
  3.19      1.64      0.06  guix/memoization.scm:100:0
  3.19      0.06      0.06  ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress?
  3.19      0.06      0.06  anon #x1d84c78
  3.19      0.06      0.06  ice-9/popen.scm:145:0:reap-pipes
  2.13      1.01      0.04  guix/ui.scm:1511:12
  2.13      0.08      0.04  ice-9/boot-9.scm:1396:0:symbol-append
  2.13      0.04      0.04  anon #x1d83248
  1.06      0.30      0.02  anon #x7f057e6c90e8
[...]
--8<---------------cut here---------------end--------------->8---

So clearly the patch has an effect!  If someone knows what is:

 - ice-9/boot-9.scm:2201:0:%load-announce
 - ice-9/boot-9.scm:1396:0:symbol-append
 
and from where they could come from, it could help. :-)

Well, I am interested to know which part is the Regex Engine and the
string search. :-) Linking to the discussion about KMP and others.


> Here are more fresh results. Could you try for longer queries like
> "strategy game caesar" and without the output being piped to recsel,
> grep, etc.? For simplicity, let's talk only about warm cache results.
>
> |----------------------------------+--------+-------|
> | query                            | before | after |
> |----------------------------------+--------+-------|
> | guix search strategy game        |   2.58 |  1.96 |
> | guix search strategy game caesar |   2.95 |  1.76 |
> |----------------------------------+--------+-------|

At first, I was confused why one more terms returns faster.  This is
because the query "caesar" returns only one package so the query
"strategy game caesar" cuts off all the packages when searching the
terms "game" and then "strategy".  I mean

   guix search julius

should be as long as

   guix search strategy game caesar

It is; in average on my machine.

And secondly, I was confused because the timing of the query "caesar
strategy game" is almost the same (2.8% +/- 2.5% with 99.0% of
confidence; 10 repeats).  Well, it is because in one case the term
"caesar" is applied to 15 packages and in another case the terms
"strategy" and "game" are applied to 1 package.  Adding some stddev
error and not enough repeats (nor good stats), the confusion is complete
and my conclusion is wrong.


That's said, the effect of the cut-off is clear (on my machine even with
on shot) with the queries:

  - game strategy the
  - the game strategy


Thank you,
simon





  reply	other threads:[~2020-06-14 19:15 UTC|newest]

Thread overview: 126+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-01-23 19:51 [bug#39258] Faster guix search using an sqlite cache Arun Isaac
2020-01-29 23:33 ` zimoun
2020-01-30 13:48   ` Arun Isaac
2020-01-31 12:48     ` zimoun
2020-02-02 21:16       ` Arun Isaac
2020-02-04 10:19         ` zimoun
2020-02-06  1:58           ` Arun Isaac
2020-02-11 16:29             ` Ludovic Courtès
2020-02-11 18:21               ` zimoun
2020-02-11 18:39                 ` Ludovic Courtès
2020-02-11 19:07                   ` Arun Isaac
2020-02-11 20:20                     ` zimoun
2020-02-15 14:50                     ` Arun Isaac
2020-02-11 20:13                   ` zimoun
2020-02-27 20:41 ` [bug#39258] [PATCH 0/4] Xapian for Guix package search Arun Isaac
2020-02-27 20:41   ` [bug#39258] [PATCH 1/4] gnu: Add guile-xapian Arun Isaac
2020-03-03 16:29     ` zimoun
2020-02-27 20:41   ` [bug#39258] [PATCH 2/4] build-self: Add guile-xapian to Guix dependencies Arun Isaac
2020-02-27 20:41   ` [bug#39258] [PATCH 3/4] gnu: Generate xapian package search index Arun Isaac
2020-02-28  8:04     ` Pierre Neidhardt
2020-03-05 20:26       ` Arun Isaac
2020-03-03 18:29     ` zimoun
2020-02-27 20:41   ` [bug#39258] [PATCH 4/4] gnu: Use xapian index for package search Arun Isaac
2020-02-28  8:11     ` Pierre Neidhardt
2020-03-03 19:21     ` zimoun
2020-03-03 19:51       ` zimoun
2020-02-28  8:13   ` [bug#39258] [PATCH 0/4] Xapian for Guix " Pierre Neidhardt
2020-02-28 12:39     ` zimoun
2020-02-28 12:49       ` Pierre Neidhardt
2020-02-28 15:36     ` Arun Isaac
2020-02-28 16:04       ` Arun Isaac
2020-03-02 18:37         ` zimoun
2020-03-02 19:13           ` zimoun
2020-03-03 20:04             ` zimoun
2020-02-29  8:25       ` Arun Isaac
2020-03-02 18:27         ` zimoun
2020-02-28 12:36   ` zimoun
2020-03-05 16:46   ` Ludovic Courtès
2020-03-07 13:31 ` [bug#39258] [PATCH v2 0/3] " Arun Isaac
2020-03-07 13:31   ` [bug#39258] [PATCH v2 1/3] build-self: Add guile-xapian to Guix dependencies Arun Isaac
2020-03-09 18:14     ` zimoun
2020-03-09 23:40     ` Jonathan Brielmaier
2020-03-10  5:24       ` Arun Isaac
2020-03-07 13:31   ` [bug#39258] [PATCH v2 2/3] gnu: Generate Xapian package search index Arun Isaac
2020-03-09 18:19     ` zimoun
2020-03-07 13:31   ` [bug#39258] [PATCH v2 3/3] gnu: Use Xapian index for package search Arun Isaac
2020-03-07 20:33   ` [bug#39258] [PATCH v2 0/3] Xapian for Guix " Ludovic Courtès
2020-03-08  9:01     ` Arun Isaac
2020-03-08 11:33       ` Ludovic Courtès
2020-03-08 20:27         ` Arun Isaac
2020-03-09  7:42           ` Pierre Neidhardt
2020-03-09 12:50             ` zimoun
2020-03-09 10:35           ` Ludovic Courtès
2020-03-10 14:17             ` Arun Isaac
2020-03-10 14:33               ` zimoun
2020-03-11 13:50               ` Ludovic Courtès
2020-03-13  5:37                 ` Arun Isaac
2020-03-15 20:40                   ` Ludovic Courtès
2020-03-09  7:50         ` Pierre Neidhardt
2020-03-09 10:28           ` Ludovic Courtès
2020-03-09 13:03             ` zimoun
2020-03-09 12:53           ` zimoun
2020-03-09 12:47         ` zimoun
2020-03-09 12:40       ` zimoun
2020-03-09 12:34     ` zimoun
2020-03-08 20:27   ` zimoun
2020-03-08 20:40     ` Arun Isaac
2020-03-09 12:28   ` zimoun
2020-03-27 16:26 ` [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search Arun Isaac
2020-03-27 16:26   ` [bug#39258] [PATCH v3 1/3] guix: Generate package metadata cache Arun Isaac
2020-04-24 20:48     ` Ludovic Courtès
2020-04-26  9:48       ` zimoun
2020-04-26 14:35         ` Ludovic Courtès
2020-04-26 14:54           ` Pierre Neidhardt
2020-04-26 15:33             ` Ludovic Courtès
2020-04-26 15:05           ` zimoun
2020-03-27 16:26   ` [bug#39258] [PATCH v3 2/3] guix: Search " Arun Isaac
2020-04-24 20:58     ` Ludovic Courtès
2020-03-27 16:26   ` [bug#39258] [PATCH v3 3/3] guix: Use package metadata cache for package search Arun Isaac
2020-04-24 21:03     ` Ludovic Courtès
2020-04-05 14:08   ` [bug#39258] [PATCH v3 0/3] Package metadata cache for guix search Ludovic Courtès
2020-04-24 21:05   ` Ludovic Courtès
2020-04-26  3:54 ` [bug#39258] benchmark search: default vs v2 vs v3 zimoun
2020-04-26  7:29   ` Pierre Neidhardt
2020-04-26 15:49   ` Ludovic Courtès
2020-04-26 17:01     ` zimoun
2020-04-26 20:22       ` Ludovic Courtès
2020-04-30 13:10     ` zimoun
2020-05-03 15:01 ` [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) zimoun
2020-05-03 15:01   ` [bug#39258] [PATCH v4 1/3] DRAFT packages: Add fields to packages cache zimoun
2020-05-03 15:01   ` [bug#39258] [PATCH v4 2/3] DRAFT packages: Add new procedure 'fold-packages*' zimoun
2020-05-03 15:01   ` [bug#39258] [PATCH v4 3/3] DRAFT guix package: Use cache in 'find-packages-by-description' zimoun
2020-05-03 16:43   ` [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) Ludovic Courtès
2020-05-03 18:10     ` zimoun
2020-05-03 19:49       ` Ludovic Courtès
2020-06-01  0:00 ` [bug#39258] [PATCH 0/4] Optimize guix search Arun Isaac
2020-06-01  0:00   ` [bug#39258] [PATCH 1/4] ui: Cut off search early if any regexp does not match Arun Isaac
2020-06-09  8:29     ` Ludovic Courtès
2020-06-01  0:00   ` [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings Arun Isaac
2020-06-09  8:33     ` Ludovic Courtès
2020-06-09  9:55       ` zimoun
2020-06-13 12:37       ` Arun Isaac
2020-06-13 13:36         ` zimoun
2020-06-13 17:21           ` Arun Isaac
2020-06-14 19:14             ` zimoun [this message]
2020-06-13 19:32         ` Ludovic Courtès
2020-06-15 20:18           ` Arun Isaac
2020-06-01  0:00   ` [bug#39258] [PATCH 3/4] ui: Do not translate package synopsis a second time Arun Isaac
2020-06-09  8:33     ` Ludovic Courtès
2020-06-01  0:00   ` [bug#39258] [PATCH 4/4] ui: Use package-description-string Arun Isaac
2020-06-09  8:34     ` Ludovic Courtès
2020-06-01  1:25   ` [bug#39258] [PATCH v5 0/4] Optimize guix search zimoun
2020-06-01  2:24     ` Arun Isaac
2020-06-01 10:01     ` zimoun
2020-06-01 10:11 ` [bug#39258] KMP string search algorithm? zimoun
2020-06-01 22:24   ` Leo Famulari
2020-06-01 23:48     ` Arun Isaac
2020-06-02  8:49       ` Ludovic Courtès
2021-07-15  7:33 ` [bug#39258] [PATCH v6 0/2] DRAFT "guix search" performances zimoun
2021-07-15  7:33   ` [bug#39258] [PATCH v6 1/2] DRAFT packages: Add fields to packages cache zimoun
2021-07-17  8:31     ` Arun Isaac
2021-07-23 15:30       ` Ludovic Courtès
2021-08-17 14:03         ` zimoun
2021-07-15  7:33   ` [bug#39258] [PATCH v6 2/2] DRAFT scripts: package: Use cache in 'find-packages-by-description' zimoun
2021-07-23 15:43   ` [bug#39258] [PATCH v6 0/2] DRAFT "guix search" performances Ludovic Courtès
2021-08-20 15:42     ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=86sgex8nol.fsf@gmail.com \
    --to=zimon.toutoune@gmail.com \
    --cc=39258@debbugs.gnu.org \
    --cc=arunisaac@systemreboot.net \
    --cc=ludo@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).