Hi Zimoun,

Maybe you misunderstood a point: the filesearch database is not a
database of _all store items_, but only of the items that correspond to
the packages of a given Guix generation.  This should answer many of
your comments.

>> I don't know the size of your store nor your hardware.  Could you
>> benchmark against my filesearch implementation?
>
> 30G as I reported in my previous email. ;-)

Sorry, I was unclear: I meant to benchmark the runtime of the Guile code
I wrote in the patch, i.e.

--8<---------------cut here---------------start------------->8---
,time (persist-all-local-packages)
--8<---------------cut here---------------end--------------->8---

>> I should have benchmarked with Lzip, it would have been more useful.
>> I think we can get it down to approximately 8 MiB in Lzip.
>
> Well, I think it will be more with all the items of all the packages.

No, the 8 MiB include _all the packages_ of a Guix generation.  We never
include the complete store; that would not make sense for filesearch.

> This means to setup server side, right?  So implement the "diff" in
> "guix publish", right?  Hum?  I feel it is overcomplicated.

I don't think it's too complicated: the client sends a request along
with its Guix generation commit and the closest Guix generation commit
for which it has a database; the server diffs the two SQLite databases,
compresses the result and sends it back.

> Well, what is the size of for a full /gnu/store/ containing all the
> packages of one specific revision?  Sorry if you already provided this
> information, I have missed it.

The size of a /gnu/store does not matter.  The size of the database
does, however.  In the email from the 26th of September:

--8<---------------cut here---------------start------------->8---
The database with all package descriptions and synopses is 46 MiB and
compresses down to 11 MiB in zstd.
--8<---------------cut here---------------end--------------->8---

>> "manually" is not good in my opinion.  The end-user will inevitably
>> forget.
>> An out-of-sync database would return bad results, which is a big
>> no-no for search.  On-demand database updates are ideal, I think.
>
> The tradeoff is:
> - when is "on-demand"?  When updates the database?

"guix build" and "guix pull".

> - still fast when I search

Sorry, what is your question?

> - do not slow down other guix subcommands

"guix pull" is not called by other commands.  I don't think that "guix
build" would be impacted much, because the database update for a single
store item is very fast.

> What you are proposing is:
>
> - when "guix search --file":
>   + if the database does not exist: fetch it
>   + otherwise: use it

No, do it in "guix pull", since it requires networking already.

> - after each "guix build", update the database

Yes.

> I am still missing the other update mechanism for updating the
> database.

Why?

> (Note that the "fetch it" could be done at "guix pull" time which is
> more meaningful since pull requires network access as you said.  And
> the real computations for updating could be done at the first "guix
> search --file" after the pull.)

Maybe this is the misunderstanding: "fetch it" and "update it" are the
same thing.  You fetch the diff from the substitute server and you
apply it onto your local database.

> Note that since the same code is used on build farms and their store
> is several TB (see recent discussion about "guix gc" on Berlin that
> takes hours), the build and update of the database need some care. :-)

There is no difference between the build farm and my computer, since I
generated the database over all 15000+ packages.  That the store has
several TB is irrelevant, since only the given 15000 items will be
browsed.

Cheers!

--
Pierre Neidhardt
https://ambrevar.xyz/
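P.S. The "server diffs the two SQLite databases, client applies the
result" exchange could be sketched roughly as below.  This is only an
illustration in Python with the standard sqlite3 module (the actual
filesearch code is Guile), and the `files (path, package)` table layout
plus the store paths are made up for the example; a real diff would
also handle removed rows and would compress the payload (e.g. zstd or
lzip) before sending it.

```python
import sqlite3

SCHEMA = "CREATE TABLE IF NOT EXISTS files (path TEXT, package TEXT)"

def make_db(rows):
    """Create an in-memory database holding (path, package) rows."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    return db

def diff(old, new):
    """Server side: rows present in `new` but absent from `old`."""
    old_rows = set(old.execute("SELECT path, package FROM files"))
    return [row for row in new.execute("SELECT path, package FROM files")
            if row not in old_rows]

def apply_diff(db, rows):
    """Client side: merge the received rows into the local database."""
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    db.commit()

# Hypothetical databases for two Guix generations.
old = make_db([("/gnu/store/...-hello/bin/hello", "hello")])
new = make_db([("/gnu/store/...-hello/bin/hello", "hello"),
               ("/gnu/store/...-sed/bin/sed", "sed")])

delta = diff(old, new)   # what the server would send
apply_diff(old, delta)   # what the client would do with it
```

The point of the sketch is that the transferred payload is only the
delta between two generations, not the whole database, which is why the
per-update cost stays small.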