From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: from mp1 ([2001:41d0:2:4a6f::])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
 by ms11 with LMTPS id CCn8Jz4Wg18jWAAA0tVLHw
 (envelope-from ) for ; Sun, 11 Oct 2020 14:27:10 +0000
Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::])
 (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits))
 by mp1 with LMTPS id 2GS4Iz4Wg1/mAwAAbx9fmQ
 (envelope-from ) for ; Sun, 11 Oct 2020 14:27:10 +0000
Received: from lists.gnu.org (lists.gnu.org [209.51.188.17])
 (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits))
 (No client certificate requested)
 by aspmx1.migadu.com (Postfix) with ESMTPS id 584B89400BF
 for ; Sun, 11 Oct 2020 14:27:10 +0000 (UTC)
Received: from localhost ([::1]:60438 helo=lists1p.gnu.org)
 by lists.gnu.org with esmtp (Exim 4.90_1)
 (envelope-from ) id 1kRcJZ-00089Y-A8
 for larch@yhetil.org; Sun, 11 Oct 2020 10:27:09 -0400
Received: from eggs.gnu.org ([2001:470:142:3::10]:57274)
 by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from ) id 1kRcJO-00089E-PV
 for guix-devel@gnu.org; Sun, 11 Oct 2020 10:26:58 -0400
Received: from relay11.mail.gandi.net ([217.70.178.231]:40633)
 by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256)
 (Exim 4.90_1) (envelope-from ) id 1kRcIY-0007WL-6C;
 Sun, 11 Oct 2020 10:26:57 -0400
Received: from bababa (lfbn-idf2-1-1094-122.w90-92.abo.wanadoo.fr [90.92.160.122])
 (Authenticated sender: mail@ambrevar.xyz)
 by relay11.mail.gandi.net (Postfix) with ESMTPSA id 74149100003;
 Sun, 11 Oct 2020 14:26:01 +0000 (UTC)
From: Pierre Neidhardt
To: zimoun
Subject: Re: File search progress: database review and question on triggers
In-Reply-To:
References: <87sgcuh8rb.fsf@ambrevar.xyz> <86imd4e7cr.fsf@gmail.com>
 <87eenspcf8.fsf@ambrevar.xyz> <865z94dz83.fsf@gmail.com>
 <87zh6gns4l.fsf@ambrevar.xyz> <87zh5c7hx6.fsf@ambrevar.xyz>
 <87k0w4zw8q.fsf@gnu.org> <875z7oijxu.fsf@ambrevar.xyz>
 <865z7iqd9f.fsf@gmail.com> <87wnzx6mdh.fsf@ambrevar.xyz>
Date: Sun, 11 Oct 2020 16:25:59 +0200
Message-ID: <87a6ws7saw.fsf@ambrevar.xyz>
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-=";
 micalg=pgp-sha256; protocol="application/pgp-signature"
Received-SPF: pass client-ip=217.70.178.231;
 envelope-from=mail@ambrevar.xyz; helo=relay11.mail.gandi.net
X-detected-operating-system: by eggs.gnu.org: First seen = 2020/10/11 09:15:08
X-ACL-Warn: Detected OS = Linux 3.11 and newer [fuzzy]
X-Spam_score_int: 0
X-Spam_score: -0.1
X-Spam_bar: /
X-Spam_report: (-0.1 / 5.0 requ) BAYES_00=-1.9, FROM_SUSPICIOUS_NTLD=0.499,
 PDS_OTHER_BAD_TLD=1.999, RCVD_IN_DNSWL_LOW=-0.7, RCVD_IN_MSPIKE_H3=-0.01,
 RCVD_IN_MSPIKE_WL=-0.01, SPF_HELO_NONE=0.001, SPF_PASS=-0.001
 autolearn=no autolearn_force=no
X-Spam_action: no action
X-BeenThere: guix-devel@gnu.org
X-Mailman-Version: 2.1.23
Precedence: list
List-Id: "Development of GNU Guix and the GNU System distribution."
List-Unsubscribe: ,
List-Archive:
List-Post:
List-Help:
List-Subscribe: ,
Cc: Guix Devel , Mathieu Othacehe
Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org
Sender: "Guix-devel"
X-Scanner: scn0
Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=none;
 spf=pass (aspmx1.migadu.com: domain of guix-devel-bounces@gnu.org
 designates 209.51.188.17 as permitted sender)
 smtp.mailfrom=guix-devel-bounces@gnu.org
X-Spam-Score: 0.89
X-TUID: Kp/kLrPA+yC5

--=-=-=
Content-Type: text/plain
Content-Transfer-Encoding: quoted-printable

Hi Zimoun,

Maybe you misunderstood a point: the filesearch database is not a
database of _all store items_, but only of the items that correspond to
the packages of a given Guix generation.

This should answer many of your comments.

>> I don't know the size of your store nor your hardware.  Could you
>> benchmark against my filesearch implementation?
>
> 30G as I reported in my previous email.
> ;-)

Sorry, I was unclear: I meant to benchmark the runtime of the Guile code
I wrote in the patch, i.e.

--8<---------------cut here---------------start------------->8---
,time (persist-all-local-packages)
--8<---------------cut here---------------end--------------->8---

>> I should have benchmarked with Lzip, it would have been more useful.  I
>> think we can get it down to approximately 8 MiB in Lzip.
>
> Well, I think it will be more with all the items of all the packages.

No, the 8 MiB include _all the packages_ of a Guix generation.
We never include the complete store; it would not make sense for
filesearch.

> This means to setup server side, right?  So implement the "diff" in
> "guix publish", right?  Hum? I feel it is overcomplicated.

I don't think it's too complicated: the client sends a request along
with the Guix generation commit and the closest Guix generation commit
for which it has a database; the server diffs the two SQLite databases,
compresses the result and sends it back.

> Well, what is the size of for a full /gnu/store/ containing all the
> packages of one specific revision?  Sorry if you already provided this
> information, I have missed it.

The size of a /gnu/store does not matter.  The size of the database
does, however.

In the email from the 26th of September:

--8<---------------cut here---------------start------------->8---
The database with all package descriptions and synopsis is 46 MiB and
compresses down to 11 MiB in zstd.
--8<---------------cut here---------------end--------------->8---

>> "manually" is not good in my opinion.  The end-user will inevitably
>> forget.  An out-of-sync database would return bad results which is a
>> big no-no for search.  On-demand database updates are ideals I think.
>
> The tradeoff is:
>  - when is "on-demand"?  When updates the database?

"guix build" and "guix pull".

>  - still fast when I search

Sorry, what is your question?

>  - do not slow down other guix subcommands

"guix pull" is not called by other commands.
I don't think that "guix build" would be impacted much because the
database update for a single store item is very fast.

> What you are proposing is:
>
>  - when "guix search --file":
>     + if the database does not exist: fetch it
>     + otherwise: use it

No, do it in "guix pull" since it requires networking already.

>  - after each "guix build", update the database

Yes.

> I am still missing the other update mechanism for updating the database.

Why?

> (Note that the "fetch it" could be done at "guix pull" time which is
> more meaningful since pull requires network access as you said.  And
> the real computations for updating could be done at the first "guix
> search --file" after the pull.)

Maybe this is the misunderstanding: "fetch it" and "update it" are the
same thing.  You fetch the diff from the substitute server and you apply
it onto your local database.

> Note that since the same code is used on build farms and their store
> is several TB (see recent discussion about "guix gc" on Berlin that
> takes hours), the build and update of the database need some care. :-)

There is no difference between the build farm and my computer since I've
generated the database over all 15000+ packages.  That the store has
several TB is irrelevant since only the given 15000 items will be
browsed.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQFGBAEBCAAwFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl+DFfcSHG1haWxAYW1i
cmV2YXIueHl6AAoJEJvc9Jeku8x/ppkH/R0je2PvsFfNAmFCDn1F8FU4im5i7txH
vMtS2WjDH/s3MQ7xvKnG5rGejbCEBPLEJrAlE/XkTXD9q9a+BFB4TCOWYjawyTrY
1SV/+3ckkDYAcEumorKf2IkOa75Fvr9JwVTyJQactYyVnqE5G3g/zMugS4ubFLWT
Rt0fV9/ZwO0ZD2EODv0NMbdxzXTnCV7g5LmLrlp6CCipi6/MxUiWOkKAjP/F8W/7
xtdyL67faNauizrVCr6QmLSyvYoXG0+jd5whAJxucLdVamvxVIYipkN1PJbACYE4
xnE2/4EPsH1bJW54J2uxnFVmjJQrSuY9M+W2v3j9ktR100nwVGKCQyM=
=0B/P
-----END PGP SIGNATURE-----
--=-=-=--
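
Illustration of the diff exchange discussed in the message: the server
diffs two SQLite databases and compresses the result, and the client
applies it to its local database.  This is only a minimal sketch under
stated assumptions, not the code from the patch (which is Guile).  It is
written in Python for brevity; the "files" table with its
(package, path) columns, the JSON diff format, and all function names
are hypothetical, and zlib stands in for Lzip or zstd because it ships
with the standard library.

```python
import json
import sqlite3
import zlib

# Hypothetical schema: one row per (package, file path) pair.
SCHEMA = (
    "CREATE TABLE files ("
    "package TEXT NOT NULL, path TEXT NOT NULL, "
    "PRIMARY KEY (package, path))"
)


def make_db(rows):
    """Create an in-memory database holding the given (package, path) rows."""
    conn = sqlite3.connect(":memory:")
    conn.execute(SCHEMA)
    conn.executemany("INSERT INTO files VALUES (?, ?)", rows)
    conn.commit()
    return conn


def rows_of(conn):
    """Return the database content as a set of (package, path) tuples."""
    return set(conn.execute("SELECT package, path FROM files"))


def make_diff(old, new):
    """Server side: compute the row-level diff and compress it.

    zlib is an assumption made for this sketch; the thread discusses
    Lzip and zstd.
    """
    old_rows, new_rows = rows_of(old), rows_of(new)
    diff = {
        "added": sorted(new_rows - old_rows),
        "removed": sorted(old_rows - new_rows),
    }
    return zlib.compress(json.dumps(diff).encode("utf-8"))


def apply_diff(conn, payload):
    """Client side: decompress the diff and apply it in one transaction."""
    diff = json.loads(zlib.decompress(payload).decode("utf-8"))
    with conn:  # commits on success, rolls back on error
        conn.executemany(
            "DELETE FROM files WHERE package = ? AND path = ?",
            diff["removed"],
        )
        conn.executemany("INSERT INTO files VALUES (?, ?)", diff["added"])


if __name__ == "__main__":
    # Made-up sample data standing in for two Guix generations.
    local = make_db([("hello", "bin/hello"), ("sed", "bin/sed")])
    remote = make_db([("hello", "bin/hello"), ("grep", "bin/grep")])
    apply_diff(local, make_diff(local, remote))
    print(sorted(rows_of(local)))
```

In this sketch "fetch it" and "update it" collapse into the single
apply_diff step, and the payload size depends only on the rows that
changed between the two generations, not on the size of the store.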