From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id oKHlNz135l4mFwAA0tVLHw (envelope-from ) for ; Sun, 14 Jun 2020 19:15:09 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id QOnKMz135l5WTwAAB5/wlQ (envelope-from ) for ; Sun, 14 Jun 2020 19:15:09 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 539B0940058 for ; Sun, 14 Jun 2020 19:15:09 +0000 (UTC) Received: from localhost ([::1]:51216 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jkY60-0005AY-Bj for larch@yhetil.org; Sun, 14 Jun 2020 15:15:08 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:48790) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jkY5u-0005A9-Lx for guix-patches@gnu.org; Sun, 14 Jun 2020 15:15:02 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:32977) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jkY5u-0007EI-Ca for guix-patches@gnu.org; Sun, 14 Jun 2020 15:15:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jkY5u-0006bx-2f for guix-patches@gnu.org; Sun, 14 Jun 2020 15:15:02 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#39258] [PATCH 2/4] ui: Use string matching with literal search strings. References: Resent-From: zimoun Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Sun, 14 Jun 2020 19:15:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 39258 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: To: Arun Isaac , Ludovic =?UTF-8?Q?Court=C3=A8s?= Cc: 39258@debbugs.gnu.org Received: via spool by 39258-submit@debbugs.gnu.org id=B39258.159216209925390 (code B ref 39258); Sun, 14 Jun 2020 19:15:02 +0000 Received: (at 39258) by debbugs.gnu.org; 14 Jun 2020 19:14:59 +0000 Received: from localhost ([127.0.0.1]:44523 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jkY5q-0006bS-GB for submit@debbugs.gnu.org; Sun, 14 Jun 2020 15:14:58 -0400 Received: from mail-wr1-f53.google.com ([209.85.221.53]:37304) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jkY5n-0006bF-T2 for 39258@debbugs.gnu.org; Sun, 14 Jun 2020 15:14:56 -0400 Received: by mail-wr1-f53.google.com with SMTP id x13so14909098wrv.4 for <39258@debbugs.gnu.org>; Sun, 14 Jun 2020 12:14:55 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:in-reply-to:date:message-id:mime-version; bh=t0IjIRjNaXuyy/TBt+Ok+vhyTVH35/BEWnKzF+RTR+E=; b=cTJosOqmduwecaI1GJmzevUqWBUpDTpi9RRIbJUcn7CQSCrt9hacOi8+G1e9NCaWXg wx7c3uhpZeZXn4VBEdRe9W9ycvXNd8sFpLIdgW4wWsyNV6eaJYft49oBwp+1W54qqFgJ CRPtT2UQhfJ1pYDl8Yc56xDVuL8sEY4x6oKObleXHS1mO3Ny6oXNvvrUn5H2WaFH/KeV rEhpaohpYiZs/SUHcnKFCfPCczqiUrksIcrTlgrDGYuNKYIK4aQABryWZT5vv9ymHIbv ZPd52vHkZZUHZXpI8c9K3UIDQG6z1EAIoImCQuLUH66sBW4JFHb9+Cmo6LIr7XEwgBiN JuQQ== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:in-reply-to:date:message-id :mime-version; bh=t0IjIRjNaXuyy/TBt+Ok+vhyTVH35/BEWnKzF+RTR+E=; b=bmhlLkM0GPezMnTK6k5OwBd2F5YQxuOGrb4TsKyqtZZXoxbPtF/GTMr3cQCpk6eGd+ d2GigsG1xTyZNAM+fIrfcb3XBVs1abm1Pd2Uu48D09ktN1O6Myr9pd7m6VwV7Rc0EecW SAR4Z63Ln4kYLOVzHYbd76GYx4EkwDwLR2sQ/fl4rH8yFTzk7gat9acU/+t9FsCwgO7N 1yvLARCYqB8HY0peHWj9dXD336uoKvYIJN+1+uN/B22fdREFbTFkG2sbO7DfMKCLkO5z nzdrmWkhLe3gHRThRF9hhDwM3yDfnDL6ZDh+3BPnCABJ3IIRExBYjAU2qnyCbHV/lz8M mV5g== X-Gm-Message-State: AOAM530jgn9HB+95AohcPIjT/LUfPxxhft5JqnZe1YcI+ZrKzwYnua+P e2MYubP9Ed2d3crcQ+OFr7IfCqWdn7I= X-Google-Smtp-Source: ABdhPJxVxrQMwTtR9+nzq+pmBww2vVjo+VoR6azb3L21v2h++wMxi77OWXiU+toDpcJY1tRIVSfyMw== X-Received: by 2002:a05:6000:1292:: with SMTP id f18mr25643027wrx.208.1592162089528; Sun, 14 Jun 2020 12:14:49 -0700 (PDT) Received: from lili (57.246.195.77.rev.sfr.net. [77.195.246.57]) by smtp.gmail.com with ESMTPSA id a1sm19032136wmj.29.2020.06.14.12.14.48 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 14 Jun 2020 12:14:48 -0700 (PDT) From: zimoun In-Reply-To: Date: Sun, 14 Jun 2020 21:14:34 +0200 Message-ID: <86sgex8nol.fsf@gmail.com> MIME-Version: 1.0 Content-Type: text/plain X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -1.0 (-) X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=fail (rsa verify failed) header.d=gmail.com header.s=20161025 header.b=cTJosOqm; dmarc=fail reason="SPF not aligned (relaxed)" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of guix-patches-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-patches-bounces@gnu.org X-Spam-Score: 0.09 X-TUID: r/5PtPD7lCsc Dear Arun, Here, I am speaking about only the first patch: the cut-off. TL;DR: 1. I was wrong about the bottleneck. 2. The queries were not the good ones to see a clear effect -- on my machine. On Sat, 13 Jun 2020 at 22:51, Arun Isaac wrote: > Yes, I did read your earlier mail. And, I tried again, this time with > patch 1 alone. It certainly makes a difference on my machine. It is > clear from the code logic that it should make a difference on your > machine as well, at least for longer queries. But, somehow it isn't and > I do not understand why. :-( Well, I spent some hours* to do some stats (Student's t-test). Roughly speaking, on my machine, the standard deviation error (stddev) hides the point -- depending on the query -- and that's why I am not always seeing the improvement, I guess. *ah all my Sunday in fact. ;-) I compared different conditions for the query "game strategy": - cold vs warm - xterm vs shell in Emacs (my config vs -q) - no pipe vs pipe And I run 10 times in a row each experiment. The conclusion is: in average -- on my machine -- the cut-off improves. But sometimes considering only 3 repeats in a row, the improvement is not obvious (on the mean); because the both tails of distribution overlap a bit on my machine and so it is kind of bad luck. And it is ``worse'' depending against which commit your patch is rebased: a357849 (old) vs e782756. The t-test captures this variation, even with only 3 repeats, but I have not done in my previous email and only compared the visible mean. Sorry about that. Moreover, printing increases the stddev, so the results are more fluctuating inside Emacs vs xterm and piping helps in this case. Piping does not change the final result -- hopefully. :-) It adds an extra time but in average it is the same. About cold vs warm cache, I notice that the improvement is not the same (in average). Considering the raw time, there is a difference about 10% (with "good" confidence); it could be worth to understand why. Well, considering that, I did other stats with other queries and the conclusion for my machine is that *the patch improves* on average by reducing the timing for typical usages. Which is really cool! :-) I definitively have wrong about the bottleneck and this one could be one. One way to have an idea is to use "statprof" but it is hard for me to read the results (I believe Guile master have a fix improving the 'anon #addr', but do not really know more). --8<---------------cut here---------------start------------->8--- $ /tmp/v5-1/bin/guix repl scheme@(guix-user)> ,use(guix scripts search) scheme@(guix-user)> ,pr (guix-search "game" "strategy") % cumulative self time seconds seconds procedure 17.81 0.29 0.27 anon #xe40178 12.33 0.20 0.18 ice-9/boot-9.scm:2201:0:%load-announce 12.33 0.18 0.18 anon #xe3c770 5.48 0.08 0.08 ice-9/boot-9.scm:1396:0:symbol-append 4.11 1.57 0.06 guix/memoization.scm:100:0 4.11 0.06 0.06 ice-9/popen.scm:145:0:reap-pipes 2.74 0.55 0.04 guix/ui.scm:1511:12 2.74 0.33 0.04 ice-9/regex.scm:170:0:fold-matches 2.74 0.04 0.04 ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress? 2.74 0.04 0.04 texinfo/string-utils.scm:98:5 2.74 0.04 0.04 ice-9/vlist.scm:539:0:vhash-assq 1.37 69.81 0.02 ice-9/threads.scm:388:4 [...] --- Sample count: 73 Total time: 1.490955132 seconds (0.387756476 seconds in GC) --8<---------------cut here---------------end--------------->8--- To compare with the default: --8<---------------cut here---------------start------------->8--- time seconds seconds procedure 24.47 0.49 0.46 anon #x1d89178 21.28 0.40 0.40 anon #x1d85770 9.57 0.20 0.18 ice-9/boot-9.scm:2201:0:%load-announce 3.19 4.71 0.06 ice-9/boot-9.scm:1673:4:with-exception-handler 3.19 1.64 0.06 guix/memoization.scm:100:0 3.19 0.06 0.06 ice-9/boot-9.scm:3540:0:autoload-done-or-in-progress? 3.19 0.06 0.06 anon #x1d84c78 3.19 0.06 0.06 ice-9/popen.scm:145:0:reap-pipes 2.13 1.01 0.04 guix/ui.scm:1511:12 2.13 0.08 0.04 ice-9/boot-9.scm:1396:0:symbol-append 2.13 0.04 0.04 anon #x1d83248 1.06 0.30 0.02 anon #x7f057e6c90e8 [...] --8<---------------cut here---------------end--------------->8--- So clearly the patch has an effect! If someone knows what is: - ice-9/boot-9.scm:2201:0:%load-announce - ice-9/boot-9.scm:1396:0:symbol-append and from where they could come from, it could help. :-) Well, I am interested to know which part is the Regex Engine and the string search. :-) Linking to the discussion about KMP and others. > Here are more fresh results. Could you try for longer queries like > "strategy game caesar" and without the output being piped to recsel, > grep, etc.? For simplicity, let's talk only about warm cache results. > > |----------------------------------+--------+-------| > | query | before | after | > |----------------------------------+--------+-------| > | guix search strategy game | 2.58 | 1.96 | > | guix search strategy game caesar | 2.95 | 1.76 | > |----------------------------------+--------+-------| At first, I was confused why one more terms returns faster. This is because the query "caesar" returns only one package so the query "strategy game caesar" cuts off all the packages when searching the terms "game" and then "strategy". I mean guix search julius should be as long as guix search strategy game caesar It is; in average on my machine. And secondly, I was confused because the timing of the query "caesar strategy game" is almost the same (2.8% +/- 2.5% with 99.0% of confidence; 10 repeats). Well, it is because in one case the term "caesar" is applied to 15 packages and in another case the terms "strategy" and "game" are applied to 1 package. Adding some stddev error and not enough repeats (nor good stats), the confusion is complete and my conclusion is wrong. That's said, the effect of the cut-off is clear (on my machine even with on shot) with the queries: - game strategy the - the game strategy Thank you, simon