From: Pierre Neidhardt
To: zimoun, Ludovic Courtès
Cc: guix-devel@gnu.org, Mathieu Othacehe
Subject: Re: File search progress: database review and question on triggers
Date: Sun, 11 Oct 2020 13:19:22 +0200

Hi Zimoun,

Thanks for the feedback!

> --8<---------------cut here---------------start------------->8---
> echo 3 > /proc/sys/vm/drop_caches
> time updatedb --output=/tmp/store.db --database-root=/gnu/store/
>
> real    0m19.903s
> user    0m1.549s
> sys     0m4.500s

I don't know the size of your store nor your hardware.  Could you benchmark
against my filesearch implementation?

> And then “locate” support regexp and regex and it is fast enough.

But locate does not support word permutations, which is a very important
feature for filesearch in my opinion.

> The only point is that regexp is always cumbersome for me.  Well: «Some
> people, when confronted with a problem, think "I know, I'll use regular
> expressions."  Now they have two problems.» :-) [1]

Exactly.  Full text search is a big step forward for usability I think.

> From my point of view, yes.  Somehow “filesearch” is a subpart of
> “search”.  So it should be the machinery.

I'll work on it.  I'll try to make the code flexible enough so that it
can be moved to another command easily, should we decide that "search"
is not the right fit.

> From my point of view, how to transfer the database from substitutes to
> users and how to locally update (custom channels or custom load path) are
> not easy.  Maybe the core issues.

Absolutely.

> For example, I just did “guix pull” and “–list-generation” says from
> f6dfe42 (Sept. 15) to 4ec2190 (Oct. 10)::
>
>    39.9 MB will be download
>
> more the tiny bits before “Computing Guix derivation”.  Say 50MB max.
>
> Well, the “locate” database for my “/gnu/store” (~30GB) is already to
> ~50MB, and ~20MB when compressed with gzip.  And Pierre said:
>
>      The database will all package descriptions and synopsis is 46 MiB
>      and compresses down to 11 MiB in zstd.

I should have benchmarked with Lzip, it would have been more useful.  I
think we can get it down to approximately 8 MiB in Lzip.

> which is better but still something.  Well, it is not affordable to
> fetch the database with “guix pull”, In My Humble Opinion.

We could send a "diff" of the database.

For instance, if the user already has a file database for the Guix
generation A, then guix pulls to B, the substitute server can send the
diff between A and B.
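
To make the idea concrete, here is a minimal Python sketch.  It assumes
the file database can be modelled as a set of (store item, file path)
pairs; the names database_diff and apply_diff and the sample entries are
hypothetical, and the real on-disk format and transfer mechanism are not
decided here.

```python
# Hypothetical model: a file database is a set of (store-item, file-path)
# pairs.  The actual filesearch database format may differ.

def database_diff(old, new):
    """Return the entries to remove from and add to OLD to obtain NEW."""
    return {"removed": old - new, "added": new - old}


def apply_diff(old, diff):
    """Rebuild the newer database from OLD and a diff sent by the server."""
    return (old - diff["removed"]) | diff["added"]


# Illustrative contents for two Guix generations A and B.
generation_a = {
    ("hello-2.10", "bin/hello"),
    ("hello-2.10", "share/man/man1/hello.1.gz"),
    ("sed-4.8", "bin/sed"),
}
generation_b = {
    ("hello-2.10", "bin/hello"),
    ("hello-2.10", "share/man/man1/hello.1.gz"),
    ("grep-3.4", "bin/grep"),
}

# Only the entries that changed between A and B need to be transferred.
diff = database_diff(generation_a, generation_b)
assert apply_diff(generation_a, diff) == generation_b
```

In this model the client only downloads the "removed" and "added" sets,
whose size depends on how much changed between the two generations rather
than on the size of the whole database.
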
Such a diff would probably amount to less than 1 MiB if the generations
are not too far apart.  (Warning: actual measurements needed!)

> Therefore, the database would be fetched at the first “guix search”
> (assuming point above).  But now, how “search” could know what is custom
> build and what is not?  Somehow, “search” should scan all the store to
> be able to update the database.
>
> And what happens each time I am doing a custom build then “filesearch”.
> The database should be updated, right?  Well, it seems almost unusable.

I mentioned this previously: we need to update the database on "guix
build".  This is very fast and would be mostly transparent to the user.
This is essentially how "guix size" behaves.

> The model “updatedb/locate” seems better.  The user updates “manually”
> if required and then location is fast.

"manually" is not good in my opinion.  The end-user will inevitably
forget.  An out-of-sync database would return bad results, which is a big
no-no for search.  On-demand database updates are ideal I think.

> To me, each time I am using “filesearch”:
>
>  - first time: fetch the database corresponding the Guix commit and then
>    update it with my local store

Possibly using a "diff" to shrink the download size.

>  - otherwise: use this database
>  - optionally update the database if the user wants to include new
>    custom items.

No need for the optional point I believe.

> We could imagine a hook or option to “guix pull” specifying to also
> fetch the database and update it at pull time instead of “search” time.
> Personally, I prefer longer “guix pull” because it is already a bit long
> and then fast “search” than half/half (not so long pull and longer
> search).

I suggest we do it at pull time so that =guix search= does not need a
network connection.
=guix pull= requires networking anyways.

>>   - Find a way to garbage-collect the database(s).  My intuition is that
>>     we should have 1 database per Guix checkout and when we `guix gc` a
>>     Guix checkout we collect the corresponding database.
>
> Well, the exact same strategy as
> ~/.config/guix/current/lib/guix/package.cache can be used.

Oh!  I didn't know about this file!  What is it used for?

> BTW, thanks Pierre for improving the Guix discoverability. :-)

Thank you! :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/