From: Arun Isaac
Subject: Re: Inverted index to accelerate guix package search
Date: Sat, 18 Jan 2020 00:59:05 +0530
To: zimoun
Cc: Guix Devel

> What is not clear to me right now in both implementations are.
>
> 1.
> How to update the index.
> Give a look at the "pull" code and the ~/.cache/guix folder.

We don't "update" the index. At every guix pull we create it
anew. Currently, generate-package-cache in gnu/packages.scm does
this. generate-package-cache is called by package-cache-file in
guix/channels.scm. package-cache-file is a channel profile hook listed
under %channel-profile-hooks.

Now, what I am unclear about is how to test my sqlite index building
code without actually pushing to master and running a guix pull.
I will go through the various tests in Guix and see if I can figure
something out, but any pointers would be much appreciated.

> 2.
> How to deal with regexp.
> It is more or less clear to me how to deal with using the trigram keys
> but I do not know with SQLite; I have not thought about yet.

I think it is not possible to search using regular expressions in sqlite
unless some external module is loaded. See
https://stackoverflow.com/questions/5071601/how-do-i-use-regex-in-a-sqlite-query/8338515#8338515

I think we should remove regex support altogether. I don't think a good
search interface should expect the user to provide regexes for
search. Certainly, it will be a lot less useful if and when we have
xapian. However, just to keep backward compatibility, we can fall back
to brute force fold-packages search for regexes. As Ludo pointed out, we
can't remove the brute force code since we need to support cases when
the cache is not authoritative.

> If you want to implement it, go ahead. :-)

Yes, I'll give it a shot. :-) I have some other commitments over the
weekend, but hopefully I'll have something by Monday night.

> Otherwise, I will try to finish next week what I started yesterday
> evening using VHash. :-)

About sqlite versus an inverted index using vhashes, I don't know if it
is possible to serialize a vhash onto disk. Even if that were possible,
we'll have to load the entire vhash based inverted index into memory for
every invocation of guix search, and that could hit
performance. Something like guile-gdbm could have helped, but that's
another story.

Also, I now agree with your earlier assessment that we should delegate
all this to sqlite. :-) That guix already uses sqlite for other things
is all the more reason.

> (note that to avoid duplicate , the file sets.scm can be relevant)

I didn't know about sets.scm when I wrote my first proof of concept
inverted index script. That is why I reinvented the set using hash
tables.
I don't know how hash tables are different from VHashes or which is
better.

Cheers! :-)
Arun.
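
To make the plan discussed in this message a little more concrete,
here is a minimal sketch of it. It is written in Python with the
standard sqlite3 module purely for illustration; it is not Guile and
not the code under discussion. The table layout, the tokenizer, the
sample packages and all function names (build_index, search_terms,
search_regex) are assumptions made up for this sketch. It shows the two
ideas from the message: the index is created from scratch each time
rather than updated, and regex queries bypass the index and fall back
to a brute force scan over all packages.

```python
import re
import sqlite3

# Hypothetical sample data; real package metadata would come from Guix.
PACKAGES = [
    ("hello", "Hello, GNU world: An example GNU package"),
    ("guile", "Scheme implementation intended especially for extensions"),
    ("guile-sqlite3", "Access SQLite databases from Guile"),
]


def tokenize(text):
    """Split text into a set of lowercase alphanumeric terms."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(packages):
    """Create the inverted index anew (never updated in place)."""
    db = sqlite3.connect(":memory:")
    # One row per (term, package) pair: the postings of the inverted index.
    db.execute("CREATE TABLE postings (term TEXT NOT NULL, package TEXT NOT NULL)")
    db.execute("CREATE INDEX postings_term ON postings (term)")
    rows = [
        (term, name)
        for name, synopsis in packages
        for term in tokenize(name + " " + synopsis)
    ]
    db.executemany("INSERT INTO postings VALUES (?, ?)", rows)
    db.commit()
    return db


def search_terms(db, query):
    """Return the packages that contain every term of the query."""
    terms = sorted(tokenize(query))
    if not terms:
        return []
    placeholders = ",".join("?" * len(terms))
    sql = (
        "SELECT package FROM postings "
        f"WHERE term IN ({placeholders}) "
        "GROUP BY package HAVING COUNT(DISTINCT term) = ? "
        "ORDER BY package"
    )
    return [row[0] for row in db.execute(sql, terms + [len(terms)])]


def search_regex(packages, pattern):
    """Brute force fallback: scan every package, ignoring the index."""
    rx = re.compile(pattern, re.IGNORECASE)
    return sorted(
        name
        for name, synopsis in packages
        if rx.search(name) or rx.search(synopsis)
    )


db = build_index(PACKAGES)
print(search_terms(db, "guile sqlite"))
print(search_regex(PACKAGES, r"^gui.*3$"))
```

Keeping the regex path entirely separate from the index means the
brute force scan is also available whenever the index itself cannot be
trusted, which is the situation the message describes for a cache that
is not authoritative.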