unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: "Ludovic Courtès" <ludo@gnu.org>
To: Pierre Neidhardt <mail@ambrevar.xyz>
Cc: guix-devel@gnu.org, Mathieu Othacehe <othacehe@gnu.org>
Subject: Re: File search progress: database review and question on triggers
Date: Mon, 12 Oct 2020 12:20:43 +0200	[thread overview]
Message-ID: <87eem3u4n8.fsf@gnu.org> (raw)
In-Reply-To: <875z7oijxu.fsf@ambrevar.xyz> (Pierre Neidhardt's message of "Mon, 05 Oct 2020 20:53:01 +0200")

Hi!

Pierre Neidhardt <mail@ambrevar.xyz> skribis:

>> Could you post a summary of what you have done, what’s left to do, and
>> how you’d like to integrate it?  (If you’ve already done it, my
>> apologies, but you can resend a link.  :-))
>
> What I've done: mostly a database benchmark.
>
> - Textual database: slow and not lighter than SQLite.  Not worth it I believe.
>
> - SQLite without full-text search: fast, supports classic patterns
>   (e.g. "foo*bar") but does not support word permutations.
>
> - SQLite with full-text search: fast, supports word permutations but
>   does not support suffix-matching (e.g. "bar" won't match "foobar").
>   Size is about the same as without full-text search.
>
> - Include synopsis and descriptions.  Maybe we should include all fields
>   that are searched by `guix search`.  This incurs a cost on the
>   database size but it would fix the `guix search` speed issue.  Size
>   increases by some 10 MiB.

Oh so this is going beyond file search, right?

Perhaps it would make sense to focus on file search only as a first
step, and see what can be done with synopses/descriptions (like Arun and
zimoun did before) later, separately?

> What's left to do:
>
> - Populate the database on demand, either after a `guix build` or from a
>   `guix filesearch...`.  This is important so that `guix filesearch`
>   works on packages built locally.  If `guix build`, I need help to know
>   where to plug it in.
>
> - Adapt Cuirass so that it builds its file database.
>   I need pointers to get started here.
>
> - Sync the databases from the substitute server to the client when
>   running `guix filesearch`.  For this I suggest we send the compressed
>   database corresponding to a guix generation over the network (around
>   10 MiB).  Not sure sending just the delta is worth it.

It would be nice to see whether/how this could be integrated with
third-party channels.  Of course it’s not a priority, but while
designing this feature, we should keep in mind that we might want
third-party channel authors to be able to offer such a database for
their packages.

> - Find a way to garbage-collect the database(s).  My intuition is that
>   we should have 1 database per Guix checkout and when we `guix gc` a
>   Guix checkout we collect the corresponding database.

If we download a fresh database every time, we might as well simply
overwrite the one we have?

Thanks,
Ludo’.


  parent reply	other threads:[~2020-10-12 10:21 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 14:32 File search progress: database review and question on triggers Pierre Neidhardt
2020-08-11  9:43 ` Mathieu Othacehe
2020-08-11 12:35   ` Pierre Neidhardt
2020-08-15 12:48     ` Hartmut Goebel
2020-08-11 15:43 ` Ricardo Wurmus
2020-08-11 17:54   ` Pierre Neidhardt
2020-08-11 17:58     ` Pierre Neidhardt
2020-08-11 20:08       ` Ricardo Wurmus
2020-08-12 19:10         ` Pierre Neidhardt
2020-08-12 20:13           ` Julien Lepiller
2020-08-12 20:43             ` Pierre Neidhardt
2020-08-12 21:29               ` Julien Lepiller
2020-08-12 22:29                 ` Ricardo Wurmus
2020-08-13  6:55                   ` Pierre Neidhardt
2020-08-13  6:52                 ` Pierre Neidhardt
2020-08-13  9:34                   ` Ricardo Wurmus
2020-08-13 10:04                     ` Pierre Neidhardt
2020-08-15 12:47                       ` Hartmut Goebel
2020-08-15 21:20                         ` Bengt Richter
2020-08-16  8:18                           ` Hartmut Goebel
2020-08-12 20:32           ` Pierre Neidhardt
2020-08-13  0:17           ` Arun Isaac
2020-08-13  6:58             ` Pierre Neidhardt
2020-08-13  9:40             ` Pierre Neidhardt
2020-08-13 10:08               ` Pierre Neidhardt
2020-08-13 11:47               ` Ricardo Wurmus
2020-08-13 13:44                 ` Pierre Neidhardt
2020-08-13 12:20               ` Arun Isaac
2020-08-13 13:53                 ` Pierre Neidhardt
2020-08-13 15:14                   ` Arun Isaac
2020-08-13 15:36                     ` Pierre Neidhardt
2020-08-13 15:56                       ` Pierre Neidhardt
2020-08-15 19:33                         ` Arun Isaac
2020-08-24  8:29                           ` Pierre Neidhardt
2020-08-24 10:53                             ` Pierre Neidhardt
2020-09-04 19:15                               ` Arun Isaac
2020-09-05  7:48                                 ` Pierre Neidhardt
2020-09-06  9:25                                   ` Arun Isaac
2020-09-06 10:05                                     ` Pierre Neidhardt
2020-09-06 10:33                                       ` Arun Isaac
2020-08-18 14:58 ` File search progress: database review and question on triggers OFF TOPIC PRAISE Joshua Branson
2020-08-27 10:00 ` File search progress: database review and question on triggers zimoun
2020-08-27 11:15   ` Pierre Neidhardt
2020-08-27 12:56     ` zimoun
2020-08-27 13:19       ` Pierre Neidhardt
2020-09-26 14:04         ` Pierre Neidhardt
2020-09-26 14:12           ` Pierre Neidhardt
2020-10-05 12:35           ` Ludovic Courtès
2020-10-05 18:53             ` Pierre Neidhardt
2020-10-09 21:16               ` zimoun
2020-10-10  8:57                 ` Pierre Neidhardt
2020-10-10 14:58                   ` zimoun
2020-10-12 10:16                   ` Ludovic Courtès
2020-10-12 11:18                     ` Pierre Neidhardt
2020-10-13 13:48                       ` Ludovic Courtès
2020-10-13 13:59                         ` Pierre Neidhardt
2020-10-10 16:03               ` zimoun
2020-10-11 11:19                 ` Pierre Neidhardt
2020-10-11 13:02                   ` zimoun
2020-10-11 14:25                     ` Pierre Neidhardt
2020-10-11 16:05                       ` zimoun
2020-10-12 10:20               ` Ludovic Courtès [this message]
2020-10-12 11:21                 ` Pierre Neidhardt
2020-10-13 13:45                   ` Ludovic Courtès
2020-10-13 13:56                     ` Pierre Neidhardt
2020-10-13 21:22                       ` Ludovic Courtès
2020-10-14  7:50                         ` Pierre Neidhardt
2020-10-16 10:30                           ` Ludovic Courtès
2020-10-17  9:14                             ` Pierre Neidhardt
2020-10-17 19:17                               ` Pierre Neidhardt
2020-10-21  9:53                               ` Ludovic Courtès
2020-10-21  9:58                                 ` Pierre Neidhardt
2020-10-12 11:23                 ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87eem3u4n8.fsf@gnu.org \
    --to=ludo@gnu.org \
    --cc=guix-devel@gnu.org \
    --cc=mail@ambrevar.xyz \
    --cc=othacehe@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).