From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp2 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id EPFrLjVZ1F5DMQAA0tVLHw (envelope-from ) for ; Mon, 01 Jun 2020 01:26:13 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp2 with LMTPS id 0LBKKjVZ1F4+VQAAB5/wlQ (envelope-from ) for ; Mon, 01 Jun 2020 01:26:13 +0000 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id F35D4940B0C for ; Mon, 1 Jun 2020 01:26:12 +0000 (UTC) Received: from localhost ([::1]:59714 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jfZDO-0003RF-8C for larch@yhetil.org; Sun, 31 May 2020 21:26:10 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46508) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jfZDG-0003Qs-KY for guix-patches@gnu.org; Sun, 31 May 2020 21:26:02 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:50877) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jfZDG-0000v7-Bu for guix-patches@gnu.org; Sun, 31 May 2020 21:26:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jfZDG-0005fB-9A for guix-patches@gnu.org; Sun, 31 May 2020 21:26:02 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#39258] [PATCH v5 0/4] Optimize guix search Resent-From: zimoun Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Mon, 01 Jun 2020 01:26:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 39258 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: To: Arun Isaac Cc: Ludovic =?UTF-8?Q?Court=C3=A8s?= , 39258@debbugs.gnu.org Received: via spool by 39258-submit@debbugs.gnu.org id=B39258.159097472821729 (code B ref 39258); Mon, 01 Jun 2020 01:26:02 +0000 Received: (at 39258) by debbugs.gnu.org; 1 Jun 2020 01:25:28 +0000 Received: from localhost ([127.0.0.1]:34190 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jfZCh-0005eP-Tj for submit@debbugs.gnu.org; Sun, 31 May 2020 21:25:28 -0400 Received: from mail-qt1-f193.google.com ([209.85.160.193]:39444) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jfZCg-0005e9-MH for 39258@debbugs.gnu.org; Sun, 31 May 2020 21:25:27 -0400 Received: by mail-qt1-f193.google.com with SMTP id k22so6551607qtm.6 for <39258@debbugs.gnu.org>; Sun, 31 May 2020 18:25:26 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=mime-version:references:in-reply-to:from:date:message-id:subject:to :cc; bh=/gbfEJHcOpkZHNmmRKEOTH6rOPp5GZXuXF3+L3Pe7W4=; b=RJVpNU9buXibDcBj67narNogTWh01MzOur+jAvsaj4KC/NNGcSl9grVoc6y5RRRb7e fPlff1++tSGa79Yos+N0ViBFaSGLvps9twGpqG+8mdBkTKKQ/rn1Qu+uvEjlKMp71zNs dxzaURhDOpVjkPRgojzlNK5T1Kl3n5v+lMCKKtbv4Mt01z8dkEha+TA5hUpwj6a5PqYq EO0LKzNsUhgK8kcSwFgzW5t4ieVUkT9kDh06A9gBXo/Lq+A96EmCO7copG+vzrzM0OXn E5VILhL+Xk/ffd8BiBy82fUOyhT3UyY+E9jfnhpM5zOSALrzrKqEIMABZlQpATapYtBa sdaA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:mime-version:references:in-reply-to:from:date :message-id:subject:to:cc; bh=/gbfEJHcOpkZHNmmRKEOTH6rOPp5GZXuXF3+L3Pe7W4=; b=Acw21+JJ9HCUfRptyDhuz2T2Oxw8YNJyfWIFdbtbPB+y4+L0yuxCS1xVjd0FJ3W1+D FxCFslVm2O4oFmgOzTsnqQM9MMBWiwXIAmQxca6UCgHfu/ufoj8mcDT3WN8dSycFtlwK 5UFcHyfOrIFYsggH3gQU577r8jwuOWLzHHgze3H5nfxI40P9m6BkG5TXK9XcPjXEJd5R H/KDs0DGwAKHm0Tk2LEwBLDJvUI9BQBMVgsmWIVmxjT/+kQFwCz1IkOVVhX0f20UtzN1 lP73kjKbkkD36Wq3U1smim++dsqLCtAfZZ2peZgRLt7hb+/YAW/WBVbXMd+5ZwJIFFwp cGyA== X-Gm-Message-State: AOAM530IFsp6097852P9mPEz6MKRKaYktKP5RmQPcHM0paAMc/gElxBy ZRXfB5oq1diwehQ9Rx1uBO/RMIW3IYtPDdJrf0I= X-Google-Smtp-Source: ABdhPJx8NosTpnZl78ZOtvdCZ9rAxvg+5jRMkVvoQL5iAPDN7bnCmA6VMBsEXBJfTEJaXOGWbSwUIl1SfZf+TimgxjQ= X-Received: by 2002:aed:3169:: with SMTP id 96mr18719772qtg.211.1590974720770; Sun, 31 May 2020 18:25:20 -0700 (PDT) MIME-Version: 1.0 References: <20200601000030.7443-1-arunisaac@systemreboot.net> In-Reply-To: <20200601000030.7443-1-arunisaac@systemreboot.net> From: zimoun Date: Mon, 1 Jun 2020 03:25:09 +0200 Message-ID: Content-Type: text/plain; charset="UTF-8" X-Spam-Score: -0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -1.0 (-) X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Scanner: scn0 Authentication-Results: aspmx1.migadu.com; dkim=fail (rsa verify failed) header.d=gmail.com header.s=20161025 header.b=RJVpNU9b; dmarc=fail reason="SPF not aligned (relaxed)" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of guix-patches-bounces@gnu.org designates 209.51.188.17 as permitted sender) smtp.mailfrom=guix-patches-bounces@gnu.org X-Spam-Score: 0.09 X-TUID: ZZ7m1qJMoapX Hi Arun, On Mon, 1 Jun 2020 at 02:00, Arun Isaac wrote: > Sorry for the long delay in replying to this thread. Based on the Ludo's comments [1] on v4 which is a simple re-write of your v3, I am finishing a vN+1.. but time flies and I am late on the topic too. :-) Well, this still unsent vN+1 series has the same performance of v4 on "guix pull" which is a key point compared to v3. Obviously, the performance on "guix search" are equivalent on both version. This vN+1 builds two caches -- to avoid binary breakage -- in only one go; the consuming 'fold-modules-public-variables*' is applied only once. [1] http://issues.guix.gnu.org/39258#93 > I think Ludo is right in that we can improve guix search performance with only > simple code improvements rather than including xapian or improving our > existing cache. Here are a few patches on those lines. Well, improving the cache is easy; at least as you did in v3 by adding another one. The most annoying part is the arguments rewrite of 'package->recutils' to be compliant. However after some comparisons, I am not convinced that BM25 will be worth to implement... > In `relevance`, we set our score to 0 if any of the regexps don't match. Then, > we might as well not match the remaining regexps. Patch 1 does this early cut > off optimization. Interesting. > Often our search strings are only literal strings. So, we can save some time > by using string-contains instead of invoking the regexp engine. Patch 2 does > this. In addition, guile's string-contains uses a naive O(n^2) string search > algorithm. We should perhaps use the O(n) Knuth-Morris-Pratt algorithm[1]. In > fact, a comment on line 2006 of libguile/srfi-13.c in the guile source code > mentions this. If implemented, the KMP algorithm could speed up guix search > further. > > [1]: https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm Really interesting idea, > Patch 3 and 4 are minor improvements. > > Here's a rough performance comparison. On cold or warm cache? > --8<---------------cut here---------------start------------->8--- > time ./pre-inst-env guix search game > > real 0m2.261s > user 0m2.351s > sys 0m0.104s > --8<---------------cut here---------------end--------------->8--- > > --8<---------------cut here---------------start------------->8--- > time guix search game > > real 0m2.661s > user 0m2.843s > sys 0m0.080s > --8<---------------cut here---------------end--------------->8--- > > --8<---------------cut here---------------start------------->8--- > time ./pre-inst-env guix search strategy game > > real 0m1.613s > user 0m1.635s > sys 0m0.096s > --8<---------------cut here---------------end--------------->8--- > > --8<---------------cut here---------------start------------->8--- > time guix search strategy game > > real 0m2.520s > user 0m2.583s > sys 0m0.112s > --8<---------------cut here---------------end--------------->8--- So in the best case, you have the ratio old/new is 1.5; this new version is 1.5 faster. Well, in the extra cache approach (v3 or v4) the ration old/new is really higher: 3.1 faster on cold cache (which is the one I am interested in) and 2.4 faster on warm cache. I will give a look to this new series and report what happens on my laptop. But basically, I would like "guix search" under the 1.0 second on my machine. ;-) Thank you for this new input. Cheers, simon