Hi Zimoun,

Maybe you misunderstood a point: the filesearch database is not a
database of _all store items_, but only of the items that correspond to
the packages of a given Guix generation.  This should answer many of
your comments.

>> I don't know the size of your store nor your hardware.  Could you
>> benchmark against my filesearch implementation?
>
> 30G as I reported in my previous email. ;-)

Sorry, I was unclear: I meant to benchmark the runtime of the Guile code
I wrote in the patch, i.e.

--8<---------------cut here---------------start------------->8---
,time (persist-all-local-packages)
--8<---------------cut here---------------end--------------->8---

>> I should have benchmarked with Lzip, it would have been more useful.
>> I think we can get it down to approximately 8 MiB in Lzip.
>
> Well, I think it will be more with all the items of all the packages.

No, the 8 MiB include _all the packages_ of a Guix generation.  We never
include the complete store; that would not make sense for filesearch.

> This means to setup server side, right?  So implement the "diff" in
> "guix publish", right?  Hum?  I feel it is overcomplicated.

I don't think it's too complicated: the client sends a request along
with its Guix generation commit and the closest Guix generation commit
for which it has a database; the server diffs the two SQLite databases,
compresses the result and sends it back.

> Well, what is the size of for a full /gnu/store/ containing all the
> packages of one specific revision?  Sorry if you already provided this
> information, I have missed it.

The size of a /gnu/store does not matter.  The size of the database
does, however.  In the email from the 26th of September:

--8<---------------cut here---------------start------------->8---
The database with all package descriptions and synopses is 46 MiB and
compresses down to 11 MiB in zstd.
--8<---------------cut here---------------end--------------->8---

>> "manually" is not good in my opinion.  The end-user will inevitably
>> forget.
>> An out-of-sync database would return bad results, which is a big
>> no-no for search.  On-demand database updates are ideal, I think.
>
> The tradeoff is:
> - when is "on-demand"?  When updates the database?

"guix build" and "guix pull".

> - still fast when I search

Sorry, what is your question?

> - do not slow down other guix subcommands

"guix pull" is not called by other commands.  I don't think that "guix
build" would be impacted much, because the database update for a single
store item is very fast.

> What you are proposing is:
>
> - when "guix search --file":
>   + if the database does not exist: fetch it
>   + otherwise: use it

No, do it in "guix pull", since it requires networking already.

> - after each "guix build", update the database

Yes.

> I am still missing the other update mechanism for updating the
> database.

Why?

> (Note that the "fetch it" could be done at "guix pull" time which is
> more meaningful since pull requires network access as you said.  And
> the real computations for updating could be done at the first "guix
> search --file" after the pull.)

Maybe this is the misunderstanding: "fetch it" and "update it" are the
same thing.  You fetch the diff from the substitute server and you
apply it onto your local database.

> Note that since the same code is used on build farms and their store
> is several TB (see recent discussion about "guix gc" on Berlin that
> takes hours), the build and update of the database need some care. :-)

There is no difference between the build farm and my computer, since I
generated the database over all 15000+ packages.  That the store has
several TB is irrelevant, since only the given 15000 items will be
browsed.

Cheers!

--
Pierre Neidhardt
https://ambrevar.xyz/
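P.S. The "server diffs the two SQLite databases, client applies the
result" exchange could be sketched roughly as below.  This is only an
illustration in Python with the standard sqlite3 module (the actual
filesearch code is Guile), and the `files (path, package)` table layout
plus the store paths are made up for the example; a real diff would
also handle removed rows and would compress the payload (e.g. zstd or
lzip) before sending it.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS files (path TEXT, package TEXT)"

def make_db(rows):
    """Create an in-memory database holding (path, package) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    return db

def diff(old, new):
    """Server side: rows present in `new` but absent from `old`."""
    old_rows = set(old.execute("SELECT path, package FROM files"))
    return [row for row in new.execute("SELECT path, package FROM files")
            if row not in old_rows]

def apply_diff(db, rows):
    """Client side: merge the received rows into the local database."""
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    db.commit()

# Hypothetical databases for two Guix generations.
old = make_db([("/gnu/store/...-hello/bin/hello", "hello")])
new = make_db([("/gnu/store/...-hello/bin/hello", "hello"),
               ("/gnu/store/...-sed/bin/sed", "sed")])

delta = diff(old, new)   # what the server would send
apply_diff(old, delta)   # what the client would do with it
```

The point of the sketch is that the transferred payload is only the
delta between two generations, not the whole database, which is why the
per-update cost stays small.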