Subject: [bug#39258] [PATCH v4 0/3] Faster cache generation (similar as v3)
To: zimoun
Cc: arunisaac@systemreboot.net, mail@ambrevar.xyz, 39258@debbugs.gnu.org
From: Ludovic Courtès
Date: Sun, 03 May 2020 18:43:41 +0200
Message-ID: <87r1w1ynnm.fsf@gnu.org>
In-Reply-To: <20200503150154.26532-1-zimon.toutoune@gmail.com> (zimoun's message of "Sun, 3 May 2020 17:01:51 +0200")

Hello!

zimoun skribis:

> The aim of this version v4 is to keep the same search performance as
> the previous version v3 while drastically reducing the time needed to
> generate the cache. On my laptop, the overhead is now 4 seconds,
> compared to more than 20 seconds for v2 and v3.
>
> # default
> time guix build /gnu/store/0nfpp82mqglpwvl1nbfpaphw5db2ivcp-guix-package-cache.drv --check
> # v4
> time guix build /gnu/store/y78gfh1n7m3kyrj8wsqj25qc2cbc1a4d-guix-package-cache.drv --check
>
> |      | default  | v4        |
> |------+----------+-----------|
> | real | 0m6.012s | 0m10.244s |
> | user | 0m0.541s | 0m0.542s  |
> | sys  | 0m0.033s | 0m0.032s  |

Not bad!

> In version v3, the cache is built using 'cons' and 'fold-packages' (a
> wrapper around 'fold-module-public-variables'). Version v4 instead
> modifies, by adding more information, the function
> 'generate-package-cache', which uses 'vhash' and
> 'fold-module-public-variables*'.
>
> Therefore the cache '/lib/guix/package.cache' contains more
> information.

This breaks the binary interface, so we’ll have to analyze the impact of
such a change and devise a strategy.

> (The v4 structure of 'package.cache' is a quick draft, so details
> should be discussed; an interesting move would be to have a
> structured (binary and all strings) S-exp, because it could become an
> entry point to export the package list to JSON. WDYT?)

It’s on purpose that this cache is an object file: it just needs to be
mmap’d, and that’s it. It’s the cheapest possible way to do it. Parsing
sexps would be more costly, and since we’re talking about startup time,
this is sensitive.

> Now we are comparing apples to apples, and computing BM25 (v2) is not
> free at all. Remember that BM25 is the state of the art in information
> retrieval (relevance ranking) and that it is delegated to Xapian (v2).
> I do not know whether there is a performance bottleneck between Guix,
> Guile-Xapian and Xapian itself, but computing BM25 is certainly not
> free. More about that soon.
>
> To be clear about BM25 and caching, what I have in mind is:
>  1. "guix search --build-index", optionally run by users who want,
>     for example, BM25 ranking.
Something that must be done explicitly doesn’t seem great to me. As a
user, I’d rather not think about search indexes and all. But I don’t
know, maybe if it happened automatically on the first ‘guix search’
invocation, that’d be fine.

>  2. Use BM25 metrics to detect poor package metadata (synopsis and
>     description); if it is worth it, why not add another checker to
>     "guix lint".

That’d be interesting!

>  1. The name 'fold-packages*' may be misleading since it does not
>     return "true" packages.

Did you see ‘fold-available-packages’? It seems you could extend it
instead of introducing ‘fold-packages*’, no?

>  2. The function 'package->recutils' in 'guix/ui.scm' is modified, but
>     not in the best way.
>
>           (match (package-supported-systems p)
>             (('cache supported-systems)
>              (string-join supported-systems))
>             (_
>              (string-join (package-transitive-supported-systems p)))))
>
>     However, it avoids the code duplication of version v3.

I made suggestions to Arun’s v3 about the API here. Essentially, I think
I proposed having a procedure that takes the list of fields as keyword
parameters, and ‘package->recutils’ would just delegate to that.

>  3. Deprecated packages are displayed (bug in v3 too).
>
>  4. Impolite '@@' is used to access the private license constructor.

(guix licenses) could provide a ‘string->license’ procedure.

Stopping here for now because I’m sorta drowning in patch review. :-)

Thanks for exploring this design space, we’re making progress!

Ludo’.
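For illustration only: the keyword-parameter API suggested above for ‘package->recutils’ might look roughly like this in Guile Scheme. This is a minimal sketch under stated assumptions, not the Guix API: ‘<toy-package>’, ‘fields->recutils’ and ‘toy-package->recutils’ are hypothetical names, and only three fields are rendered.

```scheme
(use-modules (srfi srfi-9)     ; define-record-type
             (ice-9 format))   ; format with ~a and ~%

;; Hypothetical stand-in for a package: NOT Guix's <package> record.
(define-record-type <toy-package>
  (make-toy-package name version systems)
  toy-package?
  (name    toy-package-name)
  (version toy-package-version)
  (systems toy-package-systems))

;; Hypothetical lower-level procedure: it receives field values as
;; keyword arguments and does not care where they come from, so a real
;; package and a package-cache entry could both be rendered by it.
(define* (fields->recutils port #:key name version (systems '()))
  (format port "name: ~a~%" name)
  (format port "version: ~a~%" version)
  (format port "systems: ~a~%" (string-join systems))
  ;; recutils records are separated by a blank line.
  (newline port))

;; 'package->recutils'-style wrapper: it only extracts the fields from
;; the record and delegates the actual rendering.
(define (toy-package->recutils package port)
  (fields->recutils port
                    #:name (toy-package-name package)
                    #:version (toy-package-version package)
                    #:systems (toy-package-systems package)))
```

Usage: (toy-package->recutils (make-toy-package "hello" "2.10" '("x86_64-linux")) (current-output-port)). With such a split, the (match (package-supported-systems p) ...) special case might no longer be needed: code working from cache entries could call the lower-level procedure directly with the values it already has.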