From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp1 ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms11 with LMTPS id YFQqEifdrl4GJAAA0tVLHw (envelope-from ) for ; Sun, 03 May 2020 15:03:03 +0000 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp1 with LMTPS id eLHiAjLdrl5KLgAAbx9fmQ (envelope-from ) for ; Sun, 03 May 2020 15:03:14 +0000 Received: from lists.gnu.org (lists.gnu.org [IPv6:2001:470:142::17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 371059445E0 for ; Sun, 3 May 2020 15:03:12 +0000 (UTC) Received: from localhost ([::1]:37578 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jVG9A-00060O-GS for larch@yhetil.org; Sun, 03 May 2020 11:03:12 -0400 Received: from eggs.gnu.org ([2001:470:142:3::10]:46034) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1jVG90-0005o1-KD for guix-patches@gnu.org; Sun, 03 May 2020 11:03:03 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:46102) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1jVG90-0000lF-Al for guix-patches@gnu.org; Sun, 03 May 2020 11:03:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jVG90-0005CJ-8J for guix-patches@gnu.org; Sun, 03 May 2020 11:03:02 -0400 X-Loop: help-debbugs@gnu.org Subject: [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3) References: In-Reply-To: Resent-From: zimoun Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Sun, 03 May 2020 15:03:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 39258 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: To: 
39258@debbugs.gnu.org Cc: arunisaac@systemreboot.net, mail@ambrevar.xyz, ludo@gnu.org, zimoun Received: via spool by 39258-submit@debbugs.gnu.org id=B39258.158851813919858 (code B ref 39258); Sun, 03 May 2020 15:03:02 +0000 Received: (at 39258) by debbugs.gnu.org; 3 May 2020 15:02:19 +0000 Received: from localhost ([127.0.0.1]:57633 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jVG8I-0005AE-Gt for submit@debbugs.gnu.org; Sun, 03 May 2020 11:02:18 -0400 Received: from mail-wr1-f49.google.com ([209.85.221.49]:38465) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1jVG8G-0005A0-Rp for 39258@debbugs.gnu.org; Sun, 03 May 2020 11:02:17 -0400 Received: by mail-wr1-f49.google.com with SMTP id x17so17811311wrt.5 for <39258@debbugs.gnu.org>; Sun, 03 May 2020 08:02:16 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=gmail.com; s=20161025; h=from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=XpDBkCV1DffKgjJoHS+zSKi+yRynZG1skgetGV1GmE8=; b=sNCxwcYe9mZD152kYS+gaBWjO8yxXYXhLP/IU5XiqgN2FGodHdyt8+nmuyY5x7XVrR LABRsXXTvnEx4lb40d5p1PreOpVjEaHkx4yONpi4yWU/FHdjuQbM83CBpBvvxDh41Z02 X3rwY2NtYFhsMG8vCn7aXw45esgeO/sfaPZds3Y8yVnipxAWKrUaFN+o8v8yaXtE8NAo QsROiwJljO+wDpppLM+OmW+PBM3FDnf3U7gFQBXKVeE+t/x2OJliOKkSj4GCkhaOK0QD 15WO6P2SbBEfF4CZ4XkoidGa6JN2GcnVKs5a3IIgcbeuRbfspX4XRq40a2IuaixQernM 18eA== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20161025; h=x-gm-message-state:from:to:cc:subject:date:message-id:mime-version :content-transfer-encoding; bh=XpDBkCV1DffKgjJoHS+zSKi+yRynZG1skgetGV1GmE8=; b=X4A0XfTcJ4E+vcnd/9SEpitajfeG1lMEzyzSAbDHO//eADCWBhWHPv5G+NfpXzCwtd wusA3WSdG+SoiRoZZ0TKXG2w3/ug6I6ulhd/s0ncW7arv0JvejurB7FePi4BTa2z1Sit Dy3+PIGllY/RNDv+9BivgBz5zRyuq0dYPT63T7nZ6Z54MJKekz7mfrX1xi4X9WaZ0io/ aIKxtNMCFjd8se3wu7ZIMbV06CEIgjz7tWTQe4Q7QrZcruwVFy9+QfU3kfXO9J7/1Cj4 hTOGuEXDWJtRNM+E/QPt/p5/ziBqRxZ7i1IIvR7jULjJ/Zg0L8bnJCKYT/Yx8zmCErG5 5WyQ== 
X-Gm-Message-State: AGi0PuZRz/xczQIgAV5VVEKewAc7cSIBSPPmTJVj12iPzDDpwZEG28g/ dGKURhLY10Ohn68Y1tErBGHjBGXN X-Google-Smtp-Source: APiQypL0vX7OC3rAki25pkau40GUKhiwicXoCdr4XaIdqEhuz+OuTbDDw72MjlNspao31qKHR2A/8w== X-Received: by 2002:adf:dd8a:: with SMTP id x10mr14711946wrl.308.1588518130518; Sun, 03 May 2020 08:02:10 -0700 (PDT) Received: from localhost.localdomain (57.246.195.77.rev.sfr.net. [77.195.246.57]) by smtp.gmail.com with ESMTPSA id x13sm9787829wmc.5.2020.05.03.08.02.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Sun, 03 May 2020 08:02:09 -0700 (PDT) From: zimoun Date: Sun, 3 May 2020 17:01:51 +0200 Message-Id: <20200503150154.26532-1-zimon.toutoune@gmail.com> X-Mailer: git-send-email 2.26.1 MIME-Version: 1.0 Content-Transfer-Encoding: 8bit X-Spam-Score: 0.0 (/) X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-Spam-Score: -1.0 (-) X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: "Guix-patches" X-Scanner: scn0 X-Spam-Score: 5.59 Authentication-Results: aspmx1.migadu.com; dkim=fail (rsa verify failed) header.d=gmail.com header.s=20161025 header.b=sNCxwcYe; dmarc=fail reason="SPF not aligned (relaxed)" header.from=gmail.com (policy=none); spf=pass (aspmx1.migadu.com: domain of guix-patches-bounces@gnu.org designates 2001:470:142::17 as permitted sender) smtp.mailfrom=guix-patches-bounces@gnu.org X-Scan-Result: default: False [5.59 / 13.00]; RCVD_VIA_SMTP_AUTH(0.00)[]; GENERIC_REPUTATION(0.00)[-0.49374384522908]; FORGED_SENDER_MAILLIST(0.00)[]; TO_DN_SOME(0.00)[]; R_SPF_ALLOW(-0.20)[+ip6:2001:470:142::/48:c]; R_DKIM_REJECT(1.00)[gmail.com:s=20161025]; DWL_DNSWL_FAIL(0.00)[2001:470:142::17:server fail]; FREEMAIL_FROM(0.00)[gmail.com]; BROKEN_CONTENT_TYPE(1.50)[]; RCPT_COUNT_FIVE(0.00)[5]; R_MISSING_CHARSET(2.50)[]; DKIM_TRACE(0.00)[gmail.com:-]; MX_GOOD(-0.50)[cached: 
eggs.gnu.org]; MAILLIST(-0.20)[mailman]; FORGED_RECIPIENTS_MAILLIST(0.00)[]; RCVD_IN_DNSWL_FAIL(0.00)[2001:470:142::17:server fail]; MIME_TRACE(0.00)[0:+]; RCVD_TLS_LAST(0.00)[]; ASN(0.00)[asn:22989, ipnet:2001:470:142::/48, country:US]; MID_RHS_MATCH_FROM(0.00)[]; TAGGED_FROM(0.00)[larch=yhetil.org]; ARC_NA(0.00)[]; IP_REPUTATION_HAM(0.00)[asn: 22989(0.14), country: US(-0.00), ip: 2001:470:142::17(-0.49)]; FROM_NEQ_ENVFROM(0.00)[zimontoutoune@gmail.com,guix-patches-bounces@gnu.org]; FROM_HAS_DN(0.00)[]; TAGGED_RCPT(0.00)[]; PREVIOUSLY_DELIVERED(0.00)[39258@debbugs.gnu.org]; MIME_GOOD(-0.10)[text/plain]; HAS_LIST_UNSUB(-0.01)[]; FREEMAIL_CC(0.00)[systemreboot.net,ambrevar.xyz,gnu.org,gmail.com]; RCVD_COUNT_SEVEN(0.00)[9]; SUSPICIOUS_RECIPS(1.50)[]; DMARC_POLICY_SOFTFAIL(0.10)[gmail.com : SPF not aligned (relaxed),none]
X-TUID: GC/Q6y0Fruqe

Dear,

The aim of this version v4 is to keep the same search performance as
the previous version v3 while drastically reducing the time needed to
generate the cache.  On my laptop, the overhead is now about 4 seconds,
compared to more than 20 seconds for v2 and v3.

--8<---------------cut here---------------start------------->8---
# default
time guix build /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check

# v4
time guix build /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
--8<---------------cut here---------------end--------------->8---

|      | default  | v4        |
|------+----------+-----------|
| real | 0m6.012s | 0m10.244s |
| user | 0m0.541s | 0m0.542s  |
| sys  | 0m0.033s | 0m0.032s  |

In version v3, the cache is built using 'cons' and 'fold-packages'
(a wrapper around 'fold-module-public-variables').  Version v4 instead
modifies the function 'generate-package-cache', which uses 'vhash' and
'fold-module-public-variables*', by adding other information to it.
Therefore the cache '/lib/guix/package.cache' contains more information.
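
To make the idea concrete, here is a minimal sketch in plain Guile of a
name-keyed 'vhash' holding extra fields that a search can read without
loading any package module.  It is an illustration only, not the code
from the patches: the entry layout (name, version, synopsis), the sample
entries, and the names '%entries', 'entries->cache' and
'search-synopsis' are all hypothetical.

```scheme
(use-modules (ice-9 match)
             (ice-9 vlist)
             (srfi srfi-1))

;; Hypothetical sample entries: (NAME VERSION SYNOPSIS).
;; This is NOT the real layout of 'package.cache'.
(define %entries
  '(("hello" "2.10" "Hello, GNU world: An example GNU package")
    ("libb2" "0.98.1"
     "Library implementing the BLAKE2 family of hash functions")))

(define (entries->cache entries)
  ;; Build a vhash keyed by package name; the value is the list of
  ;; remaining fields.
  (fold (lambda (entry table)
          (match entry
            ((name . fields)
             (vhash-cons name fields table))))
        vlist-null
        entries))

(define cache (entries->cache %entries))

(define (search-synopsis cache word)
  ;; Return the names whose cached synopsis contains WORD,
  ;; ignoring case.  Only the cached strings are inspected.
  (vhash-fold (lambda (name fields result)
                (match fields
                  ((version synopsis)
                   (if (string-contains-ci synopsis word)
                       (cons name result)
                       result))))
              '()
              cache))
```

With these definitions, (search-synopsis cache "blake2") walks the
cached strings only, and (vhash-assoc "libb2" cache) gives direct access
to one entry by name.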

(The v4 structure of 'package.cache' is a quick draft, so the details
should be discussed.  An interesting move would be to have a structured
(binary and all strings) S-exp, because it should become an entry point
to export the package list to JSON.  WDYT?)

Now, we are comparing apples to apples and the cost to compute BM25 (v2)
is not free at all.  Remember that BM25 is the state of the art in
information retrieval (relevance ranking) and that it is delegated to
Xapian (v2).  I do not know if there is a performance bottleneck between
Guix, Guile-Xapian and Xapian itself, but for sure the computation of
BM25 is not free.  More about that soon.

To be clear about BM25 and caching, what I have in mind is:

 1. "guix search --build-index" optionally run by users if they want,
    for example, the BM25 ranking.
 2. Use BM25 metrics to detect poor package meta-data (synopsis and
    description); if it is worth it, why not add another checker to
    "guix lint".

However, ranking is another story and I am not yet convinced whether
BM25 fits Guix's needs.


* Details
~~~~~~~~~

The patches apply against commit a357849f5b (and are not yet rebased).

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env guix pull --branch=search-v4 --url=$PWD -p /tmp/v4
--8<---------------cut here---------------end--------------->8---

Same test as in the previous benchmark (cold cache).

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env /tmp/v4/bin/guix search crypto library \
     | recsel -P name | grep libb2
name: libb2

real    0m0.784s
user    0m0.810s
sys     0m0.037s
--8<---------------cut here---------------end--------------->8---

The option '--load-path' turns off the cache and falls back to the
usual 'fold-packages'.

--8<---------------cut here---------------start------------->8---
time ./pre-inst-env /tmp/v4/bin/guix search -L /tmp/my-pkgs crypto library \
     | recsel -C -p name | grep libb2
name: libb2

real    0m2.446s
user    0m1.872s
sys     0m0.187s
--8<---------------cut here---------------end--------------->8---


* Still draft
~~~~~~~~~~~~~

 1. The name of 'fold-packages*' may be misleading since it does not
    return "true" packages.

--8<---------------cut here---------------start------------->8---
(define (get-hello p r)
  (if (string=? (package-name p) "hello")
      p
      r))

(define no-cache (fold-packages get-hello '()))
(define from-cache (fold-packages* get-hello '()))

(equal? no-cache from-cache)
;;; #f
--8<---------------cut here---------------end--------------->8---

    Another name for the procedure is welcome if this is an issue.

 2. The function 'package->recutils' in 'guix/ui.scm' is modified, but
    the change is not ideal.

--8<---------------cut here---------------start------------->8---
      (match (package-supported-systems p)
        (('cache supported-systems)
         (string-join supported-systems))
        (_
         (string-join (package-transitive-supported-systems p))))
--8<---------------cut here---------------end--------------->8---

    However, it avoids duplicating code, as was done in version v3.

 3. Deprecated packages are displayed (bug in v3 too).

 4. The impolite '@@' is used to access the private license constructor.

 5. Commit messages are incomplete, as are the copyright headers, etc.


* Next?
~~~~~~~

IMHO, simply caching improves the current situation:

 - a bit of extra time at pull time (less than 5s on my machine)
 + speed up at search time (2x faster)
 * maintainable code?

Is it in the right direction?
Could you advise on how to make the code more compliant?
Could you test on your machines to have another point of comparison?


Best regards,
simon


zimoun (3):
  DRAFT packages: Add fields to packages cache.
  DRAFT packages: Add new procedure 'fold-packages*'.
  DRAFT guix package: Use cache in 'find-packages-by-description'.

 gnu/packages.scm         | 98 ++++++++++++++++++++++++++++++++++++++--
 guix/scripts/package.scm |  2 +-
 guix/ui.scm              | 29 +++++++-----
 tests/packages.scm       | 31 +++++++++++++
 4 files changed, 143 insertions(+), 17 deletions(-)

-- 
2.26.1