all messages for Guix-related lists mirrored at yhetil.org
From: Pierre Neidhardt <mail@ambrevar.xyz>
To: zimoun <zimon.toutoune@gmail.com>
Cc: Guix Devel <guix-devel@gnu.org>, Mathieu Othacehe <othacehe@gnu.org>
Subject: Re: File search progress: database review and question on triggers
Date: Sun, 11 Oct 2020 16:25:59 +0200
Message-ID: <87a6ws7saw.fsf@ambrevar.xyz>
In-Reply-To: <CAJ3okZ2EbtvBF7O-m1sU0PT1VrJtyM5ezKzy8DFsNXCz+w2x6Q@mail.gmail.com>

Hi Zimoun,

Maybe you misunderstood a point: the filesearch database is not a
database of _all store items_, but only of the items that correspond to
the packages of a given Guix generation.

This should answer many of your comments.

>> I don't know the size of your store nor your hardware.  Could you
>> benchmark against my filesearch implementation?
>
> 30G as I reported in my previous email. ;-)

Sorry, I was unclear: I meant to benchmark the runtime of the Guile code
I wrote in the patch, i.e.

--8<---------------cut here---------------start------------->8---
,time (persist-all-local-packages)
--8<---------------cut here---------------end--------------->8---

>> I should have benchmarked with Lzip, it would have been more useful.  I
>> think we can get it down to approximately 8 MiB in Lzip.
>
> Well, I think it will be more with all the items of all the packages.

No, the 8 MiB include _all the packages_ of a Guix generation.
We never include the complete store; it would not make sense for filesearch.

> This means to setup server side, right?  So implement the "diff" in
> "guix publish", right?  Hum? I feel it is overcomplicated.

I don't think it's too complicated: the client sends a request along with
its Guix generation commit and the closest Guix generation commit for
which it has a database; the server diffs the two SQLite databases,
compresses the result, and sends it back.
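
To illustrate the diff step, here is a minimal Guile sketch.  It is not
the code from the patch: it assumes, hypothetically, that each database
is a plain list of (STORE-ITEM . FILE-NAME) rows, and it leaves out the
SQLite storage, the compression and the network exchange.  The function
name `database-diff' and the example rows are made up for illustration.

```scheme
(use-modules (srfi srfi-1))             ; for lset-difference

;; Hypothetical model: a filesearch database is a list of rows of the
;; form (STORE-ITEM . FILE-NAME).  The real schema may differ.
(define (database-diff old new)
  "Return a pair (ADDED . REMOVED): the rows that appear only in NEW,
and the rows that appear only in OLD."
  (cons (lset-difference equal? new old)
        (lset-difference equal? old new)))

;; Example with made-up rows for two Guix generations.
(define old-generation
  '(("hello-2.10" . "bin/hello")
    ("sed-4.8" . "bin/sed")))

(define new-generation
  '(("hello-2.10" . "bin/hello")
    ("grep-3.4" . "bin/grep")))

(database-diff old-generation new-generation)
```

Here the grep row comes back as added and the sed row as removed; the
unchanged hello row is not part of the diff, so only the changed rows
would have to be compressed and sent.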

> Well, what is the size of a full /gnu/store/ containing all the
> packages of one specific revision?  Sorry if you already provided this
> information, I missed it.

The size of a /gnu/store does not matter.  The size of the database
does, however.  From my email of 26 September:

--8<---------------cut here---------------start------------->8---
The database with all package descriptions and synopses is 46 MiB and
compresses down to 11 MiB in zstd.
--8<---------------cut here---------------end--------------->8---

>> "manually" is not good in my opinion.  The end-user will inevitably
>> forget.  An out-of-sync database would return bad results which is a
>> big no-no for search.  On-demand database updates are ideal, I think.
>
> The tradeoff is:
>   - when is "on-demand"?  When is the database updated?

"guix build" and "guix pull".

>   - still fast when I search

Sorry, what is your question?

>  - do not slow down other guix subcommands

"guix pull" is not called by other commands.
I don't think that "guix build" would be impacted much because the
database update for a single store item is very fast.

> What you are proposing is:
>
>  - when "guix search --file":
>      + if the database does not exist: fetch it
>      + otherwise: use it

No, do it in "guix pull" since it requires networking already.

>  - after each "guix build", update the database

Yes.

> I am still missing the other update mechanism for updating the database.

Why?

> (Note that the "fetch it" could be done at "guix pull" time which is
> more meaningful since pull requires network access as you said.  And
> the real computations for updating could be done at the first "guix
> search --file" after the pull.)

Maybe this is the misunderstanding: "fetch it" and "update it" are the
same thing.
You fetch the diff from the substitute server and apply it onto your
local database.
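
On the client side, applying such a diff amounts to deleting the removed
rows and adding the new ones.  A minimal Guile sketch, assuming a
hypothetical list-of-rows model rather than the actual SQLite schema, and
assuming the diff arrives as a pair (ADDED . REMOVED); the name
`apply-database-diff' is made up for illustration:

```scheme
(use-modules (srfi srfi-1))             ; for lset-difference

;; Hypothetical model: the local database is a list of rows of the form
;; (STORE-ITEM . FILE-NAME), and a diff is a pair (ADDED . REMOVED).
(define (apply-database-diff local diff)
  "Return LOCAL with the rows in (cdr DIFF) removed and the rows in
(car DIFF) added, without introducing duplicates."
  (let* ((added   (car diff))
         (removed (cdr diff))
         (kept    (lset-difference equal? local removed)))
    (append kept (lset-difference equal? added kept))))

;; Example: the local database still reflects an older generation.
(define local-database
  '(("hello-2.10" . "bin/hello")
    ("sed-4.8" . "bin/sed")))

(define fetched-diff
  (cons '(("grep-3.4" . "bin/grep"))    ; rows added upstream
        '(("sed-4.8" . "bin/sed"))))    ; rows removed upstream

(apply-database-diff local-database fetched-diff)
;; => (("hello-2.10" . "bin/hello") ("grep-3.4" . "bin/grep"))
```

With SQLite, the same step would presumably be a series of DELETE and
INSERT statements run inside a single transaction, so that an
interrupted update does not leave the local database half-applied.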

> Note that since the same code is used on build farms and their store
> is several TB (see recent discussion about "guix gc" on Berlin that
> takes hours), the build and update of the database need some care. :-)

There is no difference between the build farm and my computer, since I
generated the database from all 15000+ packages.  The fact that the store
is several TB is irrelevant, since only those 15000 items are browsed.

Cheers!

-- 
Pierre Neidhardt
https://ambrevar.xyz/

Thread overview: 73+ messages
2020-08-10 14:32 File search progress: database review and question on triggers Pierre Neidhardt
2020-08-11  9:43 ` Mathieu Othacehe
2020-08-11 12:35   ` Pierre Neidhardt
2020-08-15 12:48     ` Hartmut Goebel
2020-08-11 15:43 ` Ricardo Wurmus
2020-08-11 17:54   ` Pierre Neidhardt
2020-08-11 17:58     ` Pierre Neidhardt
2020-08-11 20:08       ` Ricardo Wurmus
2020-08-12 19:10         ` Pierre Neidhardt
2020-08-12 20:13           ` Julien Lepiller
2020-08-12 20:43             ` Pierre Neidhardt
2020-08-12 21:29               ` Julien Lepiller
2020-08-12 22:29                 ` Ricardo Wurmus
2020-08-13  6:55                   ` Pierre Neidhardt
2020-08-13  6:52                 ` Pierre Neidhardt
2020-08-13  9:34                   ` Ricardo Wurmus
2020-08-13 10:04                     ` Pierre Neidhardt
2020-08-15 12:47                       ` Hartmut Goebel
2020-08-15 21:20                         ` Bengt Richter
2020-08-16  8:18                           ` Hartmut Goebel
2020-08-12 20:32           ` Pierre Neidhardt
2020-08-13  0:17           ` Arun Isaac
2020-08-13  6:58             ` Pierre Neidhardt
2020-08-13  9:40             ` Pierre Neidhardt
2020-08-13 10:08               ` Pierre Neidhardt
2020-08-13 11:47               ` Ricardo Wurmus
2020-08-13 13:44                 ` Pierre Neidhardt
2020-08-13 12:20               ` Arun Isaac
2020-08-13 13:53                 ` Pierre Neidhardt
2020-08-13 15:14                   ` Arun Isaac
2020-08-13 15:36                     ` Pierre Neidhardt
2020-08-13 15:56                       ` Pierre Neidhardt
2020-08-15 19:33                         ` Arun Isaac
2020-08-24  8:29                           ` Pierre Neidhardt
2020-08-24 10:53                             ` Pierre Neidhardt
2020-09-04 19:15                               ` Arun Isaac
2020-09-05  7:48                                 ` Pierre Neidhardt
2020-09-06  9:25                                   ` Arun Isaac
2020-09-06 10:05                                     ` Pierre Neidhardt
2020-09-06 10:33                                       ` Arun Isaac
2020-08-18 14:58 ` File search progress: database review and question on triggers OFF TOPIC PRAISE Joshua Branson
2020-08-27 10:00 ` File search progress: database review and question on triggers zimoun
2020-08-27 11:15   ` Pierre Neidhardt
2020-08-27 12:56     ` zimoun
2020-08-27 13:19       ` Pierre Neidhardt
2020-09-26 14:04         ` Pierre Neidhardt
2020-09-26 14:12           ` Pierre Neidhardt
2020-10-05 12:35           ` Ludovic Courtès
2020-10-05 18:53             ` Pierre Neidhardt
2020-10-09 21:16               ` zimoun
2020-10-10  8:57                 ` Pierre Neidhardt
2020-10-10 14:58                   ` zimoun
2020-10-12 10:16                   ` Ludovic Courtès
2020-10-12 11:18                     ` Pierre Neidhardt
2020-10-13 13:48                       ` Ludovic Courtès
2020-10-13 13:59                         ` Pierre Neidhardt
2020-10-10 16:03               ` zimoun
2020-10-11 11:19                 ` Pierre Neidhardt
2020-10-11 13:02                   ` zimoun
2020-10-11 14:25                     ` Pierre Neidhardt [this message]
2020-10-11 16:05                       ` zimoun
2020-10-12 10:20               ` Ludovic Courtès
2020-10-12 11:21                 ` Pierre Neidhardt
2020-10-13 13:45                   ` Ludovic Courtès
2020-10-13 13:56                     ` Pierre Neidhardt
2020-10-13 21:22                       ` Ludovic Courtès
2020-10-14  7:50                         ` Pierre Neidhardt
2020-10-16 10:30                           ` Ludovic Courtès
2020-10-17  9:14                             ` Pierre Neidhardt
2020-10-17 19:17                               ` Pierre Neidhardt
2020-10-21  9:53                               ` Ludovic Courtès
2020-10-21  9:58                                 ` Pierre Neidhardt
2020-10-12 11:23                 ` zimoun