From: Arun Isaac
Subject: Re: Inverted index to accelerate guix package search
Date: Tue, 21 Jan 2020 02:12:31 +0530
To: zimoun
Cc: Guix Devel
References: <87a76r68u6.fsf@ambrevar.xyz> <87sgkgxwir.fsf@elephly.net> <87a76ncvg0.fsf@gnu.org>

I've replaced the cache building code in gnu/packages.scm with code that
builds a sqlite database instead. I haven't finished hooking this up to
the guix search code. I'll have it ready in another day or two.

> To test "guix pull", simple "make as-derivation". Disclaim: can take
> some time :-)
>
> Then the issue is more to avoid to pollute your ~/.cache/guix and
> ~/.config/guix :-)
>
>  1. Update Guix with the result in /tmp/test
>
>       guix pull -p /tmp/test --url=/path/to/guix/repo
>
>  2. Create your SQL index
>
>       /tmp/test/bin/guix pull -p /tmp/trash
>
> Now your index should be created with all the packages currently in master.
> To have something reproducible (and faster), I suggest to add
> --commit= and always pull against the same commit.
>
>  3. Test the index
>
>       /tmp/test/bin/guix search foo
>
> I mean something along these lines. ;-)

Thanks, I got the idea. I'll try it out.

>> I think it is not possible to search using regular expressions in sqlite
>
> I think it is possible. I imagine something using multiple query.
> I will give a look at the Guile module.

Sure, let me know if you find something.

> I disagree. We should keep the regexp. Otherwise we cannot include
> under "guix search" or "guix package --search=" because arguments
> about backward compatibility.

I agree about backward compatibility.

>> About sqlite versus an inverted index using vhashes, I don't know if it
>> is possible to serialize a vhash onto disk. Even if that were possible,
>> we'll have to load the entire vhash based inverted index into memory for
>> every invocation of guix search, and that could hit
>> performance. Something like guile-gdbm could have helped, but that's
>> another story.
>
> And your first test was not fair. ;-)
> Because you compared when the hash table was already in memory.
> I mean to know the real performance, only timing can talk. :-)

Yes, it wasn't completely fair. :-P I was assuming there was some way to
efficiently serialize to disk and read from it without too much
overhead.
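
On the regular-expression question quoted above, here is a minimal sketch,
assuming Python's standard sqlite3 and re modules rather than the Guile
bindings discussed in this thread. SQLite parses the REGEXP operator but
defines no regexp() function by default, so the application has to
register one. The table layout, the package rows, and the helper names
(regexp, build_db, search) are hypothetical and are not taken from the
Guix code. Whether the Guile sqlite module offers an equivalent hook is
not established here.

```python
import re
import sqlite3


def regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" as regexp(Y, X): pattern first, text second.
    if value is None:
        return False
    return re.search(pattern, value, re.IGNORECASE) is not None


def build_db(packages):
    # Hypothetical schema: one row per package, name plus synopsis.
    conn = sqlite3.connect(":memory:")
    # Register the user-defined function that backs the REGEXP operator.
    conn.create_function("regexp", 2, regexp)
    conn.execute("CREATE TABLE packages (name TEXT PRIMARY KEY, synopsis TEXT)")
    conn.executemany("INSERT INTO packages VALUES (?, ?)", packages)
    return conn


def search(conn, pattern):
    # Return the names of packages whose name or synopsis matches the pattern.
    rows = conn.execute(
        "SELECT name FROM packages"
        " WHERE name REGEXP ? OR synopsis REGEXP ?"
        " ORDER BY name",
        (pattern, pattern),
    )
    return [name for (name,) in rows]


# Made-up sample rows, for illustration only.
conn = build_db([
    ("emacs", "The extensible text editor"),
    ("guile", "Scheme implementation"),
    ("guile-sqlite3", "SQLite bindings for Guile"),
])
print(search(conn, "^guile"))
```

A query like this still scans every row and calls the function for each
one, so it preserves regexp semantics but does not by itself give the
speed-up an index would. Only timing can tell whether that is acceptable.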