unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: zimoun <zimon.toutoune@gmail.com>
To: "Pierre Neidhardt" <mail@ambrevar.xyz>, "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org, Mathieu Othacehe <othacehe@gnu.org>
Subject: Re: File search progress: database review and question on triggers
Date: Sat, 10 Oct 2020 18:03:40 +0200	[thread overview]
Message-ID: <865z7iqd9f.fsf@gmail.com> (raw)
In-Reply-To: <875z7oijxu.fsf@ambrevar.xyz>

Hi,

On Mon, 05 Oct 2020 at 20:53, Pierre Neidhardt <mail@ambrevar.xyz> wrote:

> - Textual database: slow and not lighter than SQLite.  Not worth it I believe.

Maybe I am out-of-scope, but re-reading *all* the discussion about
“fileserch”, is it possible to really do better than “locate”? As
Ricardo mentioned.

--8<---------------cut here---------------start------------->8---
echo 3 > /proc/sys/vm/drop_caches
time updatedb --output=/tmp/store.db --database-root=/gnu/store/

real    0m19.903s
user    0m1.549s
sys     0m4.500s

du -sh /gnu/store /tmp/store.db
30G     /gnu/store
56M     /tmp/store.db

guix gc -F XXG
echo 3 > /proc/sys/vm/drop_caches
time updatedb --output=/tmp/store.db --database-root=/gnu/store/

real    0m10.105s
user    0m0.865s
sys     0m2.020s

du -sh /gnu/store /tmp/store.db
28G     /gnu/store
52M     /tmp/store.db
--8<---------------cut here---------------end--------------->8---

And then “locate” support regexp and regex and it is fast enough.

--8<---------------cut here---------------start------------->8---
echo 3 > /proc/sys/vm/drop_caches
time locate -d /tmp/store.db --regex "emacs-ma[a-z0-9\.\-]+\/[^.]+.el$" | tail -n5
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-transient.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-utils.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-wip.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit-worktree.el
/gnu/store/zawdnn1hhf4a2nscgw7rydkws383dl4l-emacs-magit-2.90.1-6.7f486d4/share/emacs/site-lisp/magit.el

real    0m3.601s
user    0m3.528s
sys     0m0.061s
--8<---------------cut here---------------end--------------->8---

The only point is that regexp is always cumbersome for me.  Well: «Some
people, when confronted with a problem, think "I know, I'll use regular
expressions." Now they have two problems.» :-) [1]

[1] https://en.wikiquote.org/wiki/Jamie_Zawinski


> - Include synopsis and descriptions.  Maybe we should include all fields
>   that are searched by `guix search`.  This incurs a cost on the
>   database size but it would fix the `guix search` speed issue.  Size
>   increases by some 10 MiB.

From my point of view, yes.  Somehow “filesearch” is a subpart of
“search”.  So it should be the machinery.


> I say we go with SQLite full-text search for now with all package
> details.  Switching to without full-text search is just a matter of a
> minor adjustment, which we can decide later when merging the final
> patch.  Same if we decide not to include the description, synopsis, etc.

[...]

> - Populate the database on demand, either after a `guix build` or from a
>   `guix filesearch...`.  This is important so that `guix filesearch`
>   works on packages built locally.  If `guix build`, I need help to know
>   where to plug it in.

[...]

> - Sync the databases from the substitute server to the client when
>   running `guix filesearch`.  For this I suggest we send the compressed
>   database corresponding to a guix generation over the network (around
>   10 MiB).  Not sure sending just the delta is worth it.

From my point of view, how to transfer the database from substitutes to
users and how to locally update (custom channels or custom load path) are
not easy.  Maybe the core issues.


For example, I just did “guix pull” and “–list-generation” says from
f6dfe42 (Sept. 15) to 4ec2190 (Oct. 10)::

   39.9 MB will be download

more the tiny bits before “Computing Guix derivation”.  Say 50MB max.

Well, the “locate” database for my “/gnu/store” (~30GB) is already to
~50MB, and ~20MB when compressed with gzip.  And Pierre said:

      The database will all package descriptions and synopsis is 46 MiB
      and compresses down to 11 MiB in zstd.

which is better but still something.  Well, it is not affordable to
fetch the database with “guix pull”, IMHO.


Therefore, the database would be fetched at the first “guix search”
(assuming point above).  But now, how “search” could know what is custom
build and what is not?  Somehow, “search” should scan all the store to
be able to update the database.

And what happens each time I am doing a custom build then “filesearch”.
The database should be updated, right?  Well, it seems almost unusable.

The model “updatedb/locate” seems better.  The user updates “manually”
if required and then location is fast.

Most of the cases are searching files in packages that are not my custom
packages.  IMHO.


To me, each time I am using “filesearch”:

 - first time: fetch the database corresponding the Guix commit and then
 update it with my local store
 - otherwise: use this database
 - optionally update the database if the user wants to include new
 custom items.


We could imagine a hook or option to “guix pull” specifying to also
fetch the database and update it at pull time instead of “search” time.
Personally, I prefer longer “guix pull” because it is already a bit long
and then fast “search” than half/half (not so long pull and longer
search).


WDYT?

> - Find a way to garbage-collect the database(s).  My intuition is that
>   we should have 1 database per Guix checkout and when we `guix gc` a
>   Guix checkout we collect the corresponding database.

Well, the exact same strategy as
~/.config/guix/current/lib/guix/package.cache can be used.


BTW, thanks Pierre for improving the Guix discoverability. :-)

Cheers,
simon



  parent reply	other threads:[~2020-10-10 16:04 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 14:32 File search progress: database review and question on triggers Pierre Neidhardt
2020-08-11  9:43 ` Mathieu Othacehe
2020-08-11 12:35   ` Pierre Neidhardt
2020-08-15 12:48     ` Hartmut Goebel
2020-08-11 15:43 ` Ricardo Wurmus
2020-08-11 17:54   ` Pierre Neidhardt
2020-08-11 17:58     ` Pierre Neidhardt
2020-08-11 20:08       ` Ricardo Wurmus
2020-08-12 19:10         ` Pierre Neidhardt
2020-08-12 20:13           ` Julien Lepiller
2020-08-12 20:43             ` Pierre Neidhardt
2020-08-12 21:29               ` Julien Lepiller
2020-08-12 22:29                 ` Ricardo Wurmus
2020-08-13  6:55                   ` Pierre Neidhardt
2020-08-13  6:52                 ` Pierre Neidhardt
2020-08-13  9:34                   ` Ricardo Wurmus
2020-08-13 10:04                     ` Pierre Neidhardt
2020-08-15 12:47                       ` Hartmut Goebel
2020-08-15 21:20                         ` Bengt Richter
2020-08-16  8:18                           ` Hartmut Goebel
2020-08-12 20:32           ` Pierre Neidhardt
2020-08-13  0:17           ` Arun Isaac
2020-08-13  6:58             ` Pierre Neidhardt
2020-08-13  9:40             ` Pierre Neidhardt
2020-08-13 10:08               ` Pierre Neidhardt
2020-08-13 11:47               ` Ricardo Wurmus
2020-08-13 13:44                 ` Pierre Neidhardt
2020-08-13 12:20               ` Arun Isaac
2020-08-13 13:53                 ` Pierre Neidhardt
2020-08-13 15:14                   ` Arun Isaac
2020-08-13 15:36                     ` Pierre Neidhardt
2020-08-13 15:56                       ` Pierre Neidhardt
2020-08-15 19:33                         ` Arun Isaac
2020-08-24  8:29                           ` Pierre Neidhardt
2020-08-24 10:53                             ` Pierre Neidhardt
2020-09-04 19:15                               ` Arun Isaac
2020-09-05  7:48                                 ` Pierre Neidhardt
2020-09-06  9:25                                   ` Arun Isaac
2020-09-06 10:05                                     ` Pierre Neidhardt
2020-09-06 10:33                                       ` Arun Isaac
2020-08-18 14:58 ` File search progress: database review and question on triggers OFF TOPIC PRAISE Joshua Branson
2020-08-27 10:00 ` File search progress: database review and question on triggers zimoun
2020-08-27 11:15   ` Pierre Neidhardt
2020-08-27 12:56     ` zimoun
2020-08-27 13:19       ` Pierre Neidhardt
2020-09-26 14:04         ` Pierre Neidhardt
2020-09-26 14:12           ` Pierre Neidhardt
2020-10-05 12:35           ` Ludovic Courtès
2020-10-05 18:53             ` Pierre Neidhardt
2020-10-09 21:16               ` zimoun
2020-10-10  8:57                 ` Pierre Neidhardt
2020-10-10 14:58                   ` zimoun
2020-10-12 10:16                   ` Ludovic Courtès
2020-10-12 11:18                     ` Pierre Neidhardt
2020-10-13 13:48                       ` Ludovic Courtès
2020-10-13 13:59                         ` Pierre Neidhardt
2020-10-10 16:03               ` zimoun [this message]
2020-10-11 11:19                 ` Pierre Neidhardt
2020-10-11 13:02                   ` zimoun
2020-10-11 14:25                     ` Pierre Neidhardt
2020-10-11 16:05                       ` zimoun
2020-10-12 10:20               ` Ludovic Courtès
2020-10-12 11:21                 ` Pierre Neidhardt
2020-10-13 13:45                   ` Ludovic Courtès
2020-10-13 13:56                     ` Pierre Neidhardt
2020-10-13 21:22                       ` Ludovic Courtès
2020-10-14  7:50                         ` Pierre Neidhardt
2020-10-16 10:30                           ` Ludovic Courtès
2020-10-17  9:14                             ` Pierre Neidhardt
2020-10-17 19:17                               ` Pierre Neidhardt
2020-10-21  9:53                               ` Ludovic Courtès
2020-10-21  9:58                                 ` Pierre Neidhardt
2020-10-12 11:23                 ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=865z7iqd9f.fsf@gmail.com \
    --to=zimon.toutoune@gmail.com \
    --cc=guix-devel@gnu.org \
    --cc=ludo@gnu.org \
    --cc=mail@ambrevar.xyz \
    --cc=othacehe@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).