From: Pierre Neidhardt <mail@ambrevar.xyz>
To: zimoun <zimon.toutoune@gmail.com>
Cc: Guix Devel <guix-devel@gnu.org>, Mathieu Othacehe <othacehe@gnu.org>
Subject: Re: File search progress: database review and question on triggers
Date: Sun, 11 Oct 2020 16:25:59 +0200
Message-ID: <87a6ws7saw.fsf@ambrevar.xyz>
In-Reply-To: <CAJ3okZ2EbtvBF7O-m1sU0PT1VrJtyM5ezKzy8DFsNXCz+w2x6Q@mail.gmail.com>
Hi Zimoun,
Maybe you misunderstood a point: the filesearch database is not a
database of _all store items_, but only of the items that correspond to
the packages of a given Guix generation.
This should answer many of your comments.
>> I don't know the size of your store nor your hardware. Could you
>> benchmark against my filesearch implementation?
>
> 30G as I reported in my previous email. ;-)
Sorry, I was unclear: I meant to benchmark the runtime of the Guile code
I wrote in the patch, i.e.
--8<---------------cut here---------------start------------->8---
,time (persist-all-local-packages)
--8<---------------cut here---------------end--------------->8---
>> I should have benchmarked with Lzip, it would have been more useful. I
>> think we can get it down to approximately 8 MiB in Lzip.
>
> Well, I think it will be more with all the items of all the packages.
No, the 8 MiB include _all the packages_ of a Guix generation.
We never include the complete store; that would not make sense for filesearch.
> This means to setup server side, right? So implement the "diff" in
> "guix publish", right? Hum? I feel it is overcomplicated.
I don't think it's too complicated: the client sends a request along with
the commit of its current Guix generation and the commit of the closest Guix
generation for which it has a database; the server diffs the two SQLite
databases, compresses the result and sends it back.
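To make the server side concrete, here is a rough sketch in Python rather than Guile, only because it is short to write with the standard library. Everything in it is an assumption for illustration: the `files` table with `package` and `path` columns is not the schema from the patch, `make_db`, `diff_databases` and `pack_diff` are made-up names, and `lzma` stands in for Lzip or zstd.

```python
import json
import lzma
import sqlite3


def make_db(rows):
    """Build an in-memory stand-in for a per-generation file database.

    The schema is hypothetical, not the one from the patch.
    """
    db = sqlite3.connect(":memory:")
    db.execute(
        "CREATE TABLE files (package TEXT, path TEXT, "
        "PRIMARY KEY (package, path))"
    )
    db.executemany("INSERT INTO files VALUES (?, ?)", rows)
    return db


def diff_databases(old, new):
    """Return the rows to add and to remove to turn `old` into `new`."""
    old_rows = set(old.execute("SELECT package, path FROM files"))
    new_rows = set(new.execute("SELECT package, path FROM files"))
    return {
        "added": sorted(new_rows - old_rows),
        "removed": sorted(old_rows - new_rows),
    }


def pack_diff(diff):
    """Serialize and compress the diff before sending it to the client."""
    return lzma.compress(json.dumps(diff).encode("utf-8"))
```

The payload only contains the rows that changed between the two generations, so it should stay small when the generations are close to each other.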
> Well, what is the size of for a full /gnu/store/ containing all the
> packages of one specific revision? Sorry if you already provided this
> information, I have missed it.
The size of a /gnu/store does not matter. The size of the database does,
however. In the email from the 26th of September:
--8<---------------cut here---------------start------------->8---
The database will all package descriptions and synopsis is 46 MiB and
compresses down to 11 MiB in zstd.
--8<---------------cut here---------------end--------------->8---
>> "manually" is not good in my opinion. The end-user will inevitably
>> forget. An out-of-sync database would return bad results which is a
>> big no-no for search. On-demand database updates are ideals I think.
>
> The tradeoff is:
> - when is "on-demand"? When updates the database?
"guix build" and "guix pull".
> - still fast when I search
Sorry, what is your question?
> - do not slow down other guix subcommands
"guix pull" is not called by other commands.
I don't think that "guix build" would be impacted much because the
database update for a single store item is very fast.
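As an illustration of why the per-item update is cheap, here is a sketch in Python (not the Guile code from the patch). The `files` table, `open_db` and `index_store_item` are hypothetical names; the point is only that one store item means one directory walk and one transaction.

```python
import os
import sqlite3


def open_db(path=":memory:"):
    """Open the file database, creating the (hypothetical) table if needed."""
    db = sqlite3.connect(path)
    db.execute(
        "CREATE TABLE IF NOT EXISTS files (package TEXT, path TEXT, "
        "PRIMARY KEY (package, path))"
    )
    return db


def index_store_item(db, package, item_dir):
    """Record every file under `item_dir` as belonging to `package`.

    Returns the number of files found under the store item.
    """
    rows = []
    for root, _dirs, names in os.walk(item_dir):
        for name in names:
            relative = os.path.relpath(os.path.join(root, name), item_dir)
            rows.append((package, relative))
    with db:  # a single transaction for the whole store item
        db.executemany("INSERT OR IGNORE INTO files VALUES (?, ?)", rows)
    return len(rows)
```

Running it twice on the same item is harmless since already-known rows are ignored.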
> What you are proposing is:
>
> - when "guix search --file":
> + if the database does not exist: fetch it
> + otherwise: use it
No, do it in "guix pull" since it requires networking already.
> - after each "guix build", update the database
Yes.
> I am still missing the other update mechanism for updating the database.
Why?
> (Note that the "fetch it" could be done at "guix pull" time which is
> more meaningful since pull requires network access as you said. And
> the real computations for updating could be done at the first "guix
> search --file" after the pull.)
Maybe this is the misunderstanding: "fetch it" and "update it" are the
same thing.
You fetch the diff from the substitute server and you apply it onto your
local database.
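On the client side, applying the diff could look roughly like this. Again this is a Python sketch under assumptions, not the patch: the payload format (LZMA-compressed JSON with "added" and "removed" row lists), the `files` table and the `apply_diff` name are all made up for illustration.

```python
import json
import lzma
import sqlite3


def apply_diff(db, payload):
    """Apply a compressed diff fetched from the server to the local database.

    `payload` is assumed to be LZMA-compressed JSON of the form
    {"added": [[package, path], ...], "removed": [[package, path], ...]}.
    """
    diff = json.loads(lzma.decompress(payload))
    with db:  # commit everything at once, or roll back on error
        db.executemany(
            "DELETE FROM files WHERE package = ? AND path = ?",
            diff["removed"],
        )
        db.executemany(
            "INSERT OR IGNORE INTO files VALUES (?, ?)",
            diff["added"],
        )
```

After this step the local database matches the one for the new generation, so there is no separate "update" computation left to do at search time.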
> Note that since the same code is used on build farms and their store
> is several TB (see recent discussion about "guix gc" on Berlin that
> takes hours), the build and update of the database need some care. :-)
There is no difference between the build farm and my computer since I've
generated the database over all 15000+ packages. That the store has
several TB is irrelevant since only the given 15000 items will be browsed.
Cheers!
--
Pierre Neidhardt
https://ambrevar.xyz/