> What is not clear to me right now in both implementations are.
>
> 1. How to update the index.
> Give a look at the "pull" code and the ~/.cache/guix folder.

We don't "update" the index. At every guix pull we create it anew.
Currently, generate-package-cache in gnu/packages.scm does this.
generate-package-cache is called by package-cache-file in
guix/channels.scm. package-cache-file is a channel profile hook listed
under %channel-profile-hooks.

Now, what I am unclear about is how to test my sqlite index building
code without actually pushing to master and running a guix pull. I will
go through the various tests in Guix and see if I can figure something
out, but any pointers would be much appreciated.

> 2. How to deal with regexp.
> It is more or less clear to me how to deal with using the trigram keys
> but I do not know with SQLite; I have not thought about yet.

I think it is not possible to search using regular expressions in
sqlite unless some external module is loaded. See
https://stackoverflow.com/questions/5071601/how-do-i-use-regex-in-a-sqlite-query/8338515#8338515

I think we should remove regex support altogether. I don't think a good
search interface should expect the user to provide regexes for search.
Certainly, it will be a lot less useful if and when we have xapian.
However, just to keep backward compatibility, we can fall back to brute
force fold-packages search for regexes. As Ludo pointed out, we can't
remove the brute force code since we need to support cases when the
cache is not authoritative.

> If you want to implement it, go ahead. :-)

Yes, I'll give it a shot. :-) I have some other commitments over the
weekend, but hopefully I'll have something by Monday night.

> Otherwise, I will try to finish next week what I started yesterday
> evening using VHash. :-)

About sqlite versus an inverted index using vhashes, I don't know if it
is possible to serialize a vhash onto disk.
Even if that were possible, we would have to load the entire
vhash-based inverted index into memory for every invocation of guix
search, and that could hurt performance. Something like guile-gdbm
could have helped, but that's another story. Also, I now agree with
your earlier assessment that we should delegate all this to sqlite. :-)
That guix already uses sqlite for other things is all the more reason.

> (note that to avoid duplicate, the file sets.scm can be relevant)

I didn't know about sets.scm when I wrote my first proof of concept
inverted index script. That is why I reinvented the set using hash
tables. I don't know how hash tables are different from VHashes or
which is better.

Cheers! :-)
Arun.
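
For reference, the regexp limitation mentioned above can be tried out
with a minimal sketch. SQLite accepts the REGEXP operator in queries,
but a stock library build usually provides no implementation for it, so
the application has to register a function named regexp (or load an
extension). The sketch uses Python's standard sqlite3 module only
because it is easy to run; the packages table, its rows, and the search
helper are hypothetical and are not part of Guix.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Backing function for SQLite's REGEXP operator.

    SQLite rewrites "X REGEXP Y" into the call regexp(Y, X), so the
    pattern arrives as the first argument.
    """
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def search(conn, pattern):
    """Return the names matching PATTERN, sorted (hypothetical helper)."""
    rows = conn.execute(
        "SELECT name FROM packages WHERE name REGEXP ? ORDER BY name",
        (pattern,),
    )
    return [name for (name,) in rows]


# Toy in-memory table standing in for a package index (made-up rows).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE packages (name TEXT)")
conn.executemany(
    "INSERT INTO packages VALUES (?)",
    [("guile",), ("guile-sqlite3",), ("emacs-guix",), ("sqlite",)],
)

# Before registering anything, the query is expected to fail on a
# typical build because no regexp function exists.
try:
    search(conn, "^guile")
    stock_regexp_available = True
except sqlite3.OperationalError:
    stock_regexp_available = False

# Register the application-defined function, then search again.
conn.create_function("REGEXP", 2, regexp)
matches = search(conn, "^guile")
print(stock_regexp_available, matches)
```

Note that a function registered this way is called once per candidate
row, so such a query still scans the table and gains nothing from an
index. That is in line with keeping the brute force fold-packages
search as the fallback for regexes.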