unofficial mirror of guix-devel@gnu.org 
From: Pierre Neidhardt <mail@ambrevar.xyz>
To: zimoun <zimon.toutoune@gmail.com>, "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org, Mathieu Othacehe <othacehe@gnu.org>
Subject: Re: File search progress: database review and question on triggers
Date: Sun, 11 Oct 2020 13:19:22 +0200
Message-ID: <87wnzx6mdh.fsf@ambrevar.xyz>
In-Reply-To: <865z7iqd9f.fsf@gmail.com>

Hi Zimoun,

Thanks for the feedback!

> --8<---------------cut here---------------start------------->8---
> echo 3 > /proc/sys/vm/drop_caches
> time updatedb --output=/tmp/store.db --database-root=/gnu/store/
>
> real    0m19.903s
> user    0m1.549s
> sys     0m4.500s
> --8<---------------cut here---------------end--------------->8---

I know neither the size of your store nor your hardware.  Could you run
the same benchmark against my filesearch implementation?

> And then “locate” supports regexp and regex, and it is fast enough.

But locate does not support word permutations, which is a very important
feature for filesearch in my opinion.
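
As a rough illustration of what "word permutations" means here, the
following Python sketch matches a path when every query word occurs in
it, in any order.  It is not the filesearch code; the store paths and
the function names are made up for the example.

```python
def matches_all_words(path, query):
    """Return True if every word of QUERY occurs somewhere in PATH."""
    haystack = path.lower()
    return all(word in haystack for word in query.lower().split())


# Hypothetical store paths, for illustration only.
paths = [
    "/gnu/store/aaaa-foo-1.0/lib/libfoo.so",
    "/gnu/store/bbbb-bar-2.0/bin/bar",
]


def search(query):
    """Return the paths that contain all the words of QUERY."""
    return [p for p in paths if matches_all_words(p, query)]


# The order of the words does not matter.
same_result = search("libfoo lib") == search("lib libfoo")
```

With a plain "locate" pattern, the user would have to spell out the
order of the words, or write a regexp that covers both orders.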

> The only point is that regexp is always cumbersome for me.  Well: «Some
> people, when confronted with a problem, think "I know, I'll use regular
> expressions." Now they have two problems.» :-) [1]

Exactly.  Full-text search is a big step forward for usability, I think.

> From my point of view, yes.  Somehow “filesearch” is a subpart of
> “search”.  So it should be the machinery.

I'll work on it.  I'll try to make the code flexible enough so that it
can be moved to another command easily, should we decide that "search"
is not the right fit.

> From my point of view, how to transfer the database from substitutes to
> users and how to locally update (custom channels or custom load path) are
> not easy.  Maybe the core issues.

Absolutely.

> For example, I just did “guix pull” and “--list-generations” says from
> f6dfe42 (Sept. 15) to 4ec2190 (Oct. 10)::
>
>    39.9 MB will be downloaded
>
> plus the tiny bits before “Computing Guix derivation”.  Say 50MB max.
>
> Well, the “locate” database for my “/gnu/store” (~30GB) is already
> ~50MB, and ~20MB when compressed with gzip.  And Pierre said:
>
>       The database with all package descriptions and synopses is 46 MiB
>       and compresses down to 11 MiB in zstd.

I should have benchmarked with Lzip; that would have been more useful.
I think we can get it down to approximately 8 MiB with Lzip, but I have
not measured it yet.
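
One way to get a first idea of such ratios is to compress a sample with
several algorithms and compare the sizes.  The Python sketch below is
only a rough proxy: the standard library has no Lzip support, so it
uses the "lzma" module (xz container, same LZMA algorithm family), and
the sample data is synthetic, not the real database.

```python
import gzip
import lzma


def compressed_sizes(data):
    """Return the size in bytes of DATA, raw and compressed."""
    return {
        "raw": len(data),
        "gzip": len(gzip.compress(data, compresslevel=9)),
        # xz container, used here as a stand-in for Lzip (assumption).
        "lzma": len(lzma.compress(data, preset=9)),
    }


# Synthetic list of store-like paths; real measurements need the real
# database.
sample = b"".join(
    b"/gnu/store/%032d-example-1.0/share/doc/file%d\n" % (i, i)
    for i in range(2000)
)
sizes = compressed_sizes(sample)
```

The numbers this prints for synthetic data say nothing about the 46 MiB
database; the point is only the shape of the measurement.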

> which is better but still something.  Well, it is not affordable to
> fetch the database with “guix pull”, In My Humble Opinion.

We could send a "diff" of the database.

For instance, if the user already has a file database for Guix
generation A and then pulls to B, the substitute server can send the
diff between A and B.  This would probably amount to less than 1 MiB if
the generations are not too far apart.  (Warning: actual measurements
needed!)
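
A minimal sketch of the idea in Python, assuming the database can be
seen as a set of (package, file) entries.  The entry format and the
function names are hypothetical; nothing here reflects an existing
Guix protocol.

```python
def make_diff(old, new):
    """Return the entries to remove from OLD and to add to reach NEW."""
    return {"removed": sorted(old - new), "added": sorted(new - old)}


def apply_diff(old, diff):
    """Return the set obtained by applying DIFF to OLD."""
    return (old - set(diff["removed"])) | set(diff["added"])


# Hypothetical database contents for two Guix generations.
generation_a = {("foo-1.0", "bin/foo"), ("bar-2.0", "bin/bar")}
generation_b = {("foo-1.1", "bin/foo"), ("bar-2.0", "bin/bar")}

# Computed on the server side, applied on the client side.
diff = make_diff(generation_a, generation_b)
rebuilt = apply_diff(generation_a, diff)
```

Only the entries that changed between A and B travel over the network,
which is why the diff should stay small when the generations are close.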

> Therefore, the database would be fetched at the first “guix search”
> (assuming point above).  But now, how “search” could know what is custom
> build and what is not?  Somehow, “search” should scan all the store to
> be able to update the database.
>
> And what happens each time I am doing a custom build then “filesearch”.
> The database should be updated, right?  Well, it seems almost unusable.

I mentioned this previously: we need to update the database on "guix
build".  This is very fast and would be mostly transparent to the user.
This is essentially how "guix size" behaves.
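
To make this concrete, here is a sketch in Python of what such a
post-build step could look like: walk the freshly built store item and
record its files in an SQLite table.  The schema and the function name
are assumptions made for the example, not the actual filesearch schema.

```python
import os
import sqlite3
import tempfile


def index_store_item(db, item_dir):
    """Record every file under ITEM_DIR in DB; return the file count."""
    db.execute(
        "CREATE TABLE IF NOT EXISTS files "
        "(item TEXT, path TEXT, UNIQUE (item, path))"
    )
    rows = []
    for root, _dirs, names in os.walk(item_dir):
        for name in names:
            full = os.path.join(root, name)
            rows.append((item_dir, os.path.relpath(full, item_dir)))
    with db:  # commit all the rows in a single transaction
        db.executemany(
            "INSERT OR IGNORE INTO files (item, path) VALUES (?, ?)", rows
        )
    return len(rows)


# A temporary directory stands in for a store item.
with tempfile.TemporaryDirectory() as item:
    os.makedirs(os.path.join(item, "bin"))
    open(os.path.join(item, "bin", "hello"), "w").close()
    db = sqlite3.connect(":memory:")
    count = index_store_item(db, item)
    index_store_item(db, item)  # indexing twice adds no duplicate rows
    indexed = [row[0] for row in db.execute("SELECT path FROM files")]
```

Since only the new item is walked, the cost is proportional to the size
of that item and not to the size of the whole store.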

> The model “updatedb/locate” seems better.  The user updates “manually”
> if required and then location is fast.

"Manually" is not good in my opinion.  The end user will inevitably
forget.  An out-of-sync database would return bad results, which is a
big no-no for search.  On-demand database updates are ideal, I think.

> To me, each time I am using “filesearch”:
>
>  - first time: fetch the database corresponding to the Guix commit and
>  then update it with my local store

Possibly using a "diff" to shrink the download size.

>  - otherwise: use this database
>  - optionally update the database if the user wants to include new
>  custom items.

No need for the optional point I believe.

> We could imagine a hook or option to “guix pull” specifying to also
> fetch the database and update it at pull time instead of “search” time.
> Personally, I prefer longer “guix pull” because it is already a bit long
> and then fast “search” than half/half (not so long pull and longer
> search).

I suggest we do it at pull time so that =guix search= does not need
network access.  =guix pull= requires networking anyway.

>> - Find a way to garbage-collect the database(s).  My intuition is that
>>   we should have 1 database per Guix checkout and when we `guix gc` a
>>   Guix checkout we collect the corresponding database.
>
> Well, the exact same strategy as
> ~/.config/guix/current/lib/guix/package.cache can be used.

Oh!  I didn't know about this file!  What is it used for?

> BTW, thanks Pierre for improving the Guix discoverability. :-)

Thank you! :)

-- 
Pierre Neidhardt
https://ambrevar.xyz/
