unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Pierre Neidhardt <mail@ambrevar.xyz>
To: Mathieu Othacehe <othacehe@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: File search progress: database review and question on triggers
Date: Tue, 11 Aug 2020 14:35:48 +0200	[thread overview]
Message-ID: <87pn7xgy1n.fsf@ambrevar.xyz> (raw)
In-Reply-To: <87sgctwm9d.fsf@gnu.org>

[-- Attachment #1: Type: text/plain, Size: 3454 bytes --]

Hi Mathieu,

Thanks for you comments!
Answers below.

>> 3. Size of the database:
>>    I've persisted all locally-present store items for my current Guix version
>>    and it produced a database of 72 MiB.  It compresses down to 8 MiB in zstd.
>>
>>    But since we can have multiple Guix versions, this means that the
>>    packages have one entry per store path, so we might end up with more
>>    entries than that as the number of Guix generations grows.
>
> I'm not sure we actually need to save the full history. I think we could
> just store that the package X produces [Y1, Y2, ...]  executable files. Then
> on package X update, the executable files list could be refreshed.

Maybe you are missing some context.  The original discussion is there:
https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00019.html.

Unlike Nix, we would like to do more than just index executable files.
Indeed, it's very useful to know where to find, say, a C header, a .so
library, a TeXlive .sty file, etc.

However, as you hinted, maybe it's unnecessary to save the file listings of
the packages for every Guix versions.  Maybe we could only store the
"diffs" between the Guix generations.  I don't know if SQLite supports
this.  If not, it sounds like a rather complex thing to do.

But really, if the compressed database over multiple Guix generations is
<100 MiB, then size is not a big problem.

>>    Question: Should we include empty directories in the database?  I'm tempted
>>    to answer no.
>
> I would also say no, and also exclude non-executable files.

See above, I think we would lose a lot in not including non-executable files.

>>    Question: This bounds us to the SQLite syntax for pattern matching.  Is it a
>>    problem?
>>    It seems powerful enough in practice.  But maybe we can use regular
>>    expression in SQLite as well?
>
> From the UI perspective, we already have "guix search" that expects a
> regex. If we were to include a "guix file-search" command, then I think
> it would make sense that it uses the same regex syntax.

I found out that SQLite has a REGEXP operator, I'll see if it works well enough.

>> 7. Have substitute servers distribute database content.  When the user performs
>>    a file search, Guix asks the substitute server for a database update.  Only
>>    the diff should be sent over the network, not the whole thing since it might
>
> If I understand correctly, you are proposing to create local databases
> that would be kept in sync with a master database populated by the CI
> server. This seems a bit complex.
>
> What about extending Cuirass database to add the two tables you are
> proposing. Then, each time a package is built, if the version is
> updated, the "Files" table would be updated.
>
> Then we could add an HTTP interface such as "/search/file?query=libxxx"
> to Cuirass, that would directly query the database. In Guix itself, we
> could add the counterpart in the (guix ci) module.

The problem with this approach is that it would not work offline.  In my
opinion, this is a big limitation.  I'd rather have a local database.

Besides, we need a local database for non-official, locally-built
packages anyways (which Cuirass would not know about).  Since this is a
requirement, the only piece that'd be missing is database synchronization.

Thoughts?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 487 bytes --]

  reply	other threads:[~2020-08-11 12:36 UTC|newest]

Thread overview: 73+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2020-08-10 14:32 File search progress: database review and question on triggers Pierre Neidhardt
2020-08-11  9:43 ` Mathieu Othacehe
2020-08-11 12:35   ` Pierre Neidhardt [this message]
2020-08-15 12:48     ` Hartmut Goebel
2020-08-11 15:43 ` Ricardo Wurmus
2020-08-11 17:54   ` Pierre Neidhardt
2020-08-11 17:58     ` Pierre Neidhardt
2020-08-11 20:08       ` Ricardo Wurmus
2020-08-12 19:10         ` Pierre Neidhardt
2020-08-12 20:13           ` Julien Lepiller
2020-08-12 20:43             ` Pierre Neidhardt
2020-08-12 21:29               ` Julien Lepiller
2020-08-12 22:29                 ` Ricardo Wurmus
2020-08-13  6:55                   ` Pierre Neidhardt
2020-08-13  6:52                 ` Pierre Neidhardt
2020-08-13  9:34                   ` Ricardo Wurmus
2020-08-13 10:04                     ` Pierre Neidhardt
2020-08-15 12:47                       ` Hartmut Goebel
2020-08-15 21:20                         ` Bengt Richter
2020-08-16  8:18                           ` Hartmut Goebel
2020-08-12 20:32           ` Pierre Neidhardt
2020-08-13  0:17           ` Arun Isaac
2020-08-13  6:58             ` Pierre Neidhardt
2020-08-13  9:40             ` Pierre Neidhardt
2020-08-13 10:08               ` Pierre Neidhardt
2020-08-13 11:47               ` Ricardo Wurmus
2020-08-13 13:44                 ` Pierre Neidhardt
2020-08-13 12:20               ` Arun Isaac
2020-08-13 13:53                 ` Pierre Neidhardt
2020-08-13 15:14                   ` Arun Isaac
2020-08-13 15:36                     ` Pierre Neidhardt
2020-08-13 15:56                       ` Pierre Neidhardt
2020-08-15 19:33                         ` Arun Isaac
2020-08-24  8:29                           ` Pierre Neidhardt
2020-08-24 10:53                             ` Pierre Neidhardt
2020-09-04 19:15                               ` Arun Isaac
2020-09-05  7:48                                 ` Pierre Neidhardt
2020-09-06  9:25                                   ` Arun Isaac
2020-09-06 10:05                                     ` Pierre Neidhardt
2020-09-06 10:33                                       ` Arun Isaac
2020-08-18 14:58 ` File search progress: database review and question on triggers OFF TOPIC PRAISE Joshua Branson
2020-08-27 10:00 ` File search progress: database review and question on triggers zimoun
2020-08-27 11:15   ` Pierre Neidhardt
2020-08-27 12:56     ` zimoun
2020-08-27 13:19       ` Pierre Neidhardt
2020-09-26 14:04         ` Pierre Neidhardt
2020-09-26 14:12           ` Pierre Neidhardt
2020-10-05 12:35           ` Ludovic Courtès
2020-10-05 18:53             ` Pierre Neidhardt
2020-10-09 21:16               ` zimoun
2020-10-10  8:57                 ` Pierre Neidhardt
2020-10-10 14:58                   ` zimoun
2020-10-12 10:16                   ` Ludovic Courtès
2020-10-12 11:18                     ` Pierre Neidhardt
2020-10-13 13:48                       ` Ludovic Courtès
2020-10-13 13:59                         ` Pierre Neidhardt
2020-10-10 16:03               ` zimoun
2020-10-11 11:19                 ` Pierre Neidhardt
2020-10-11 13:02                   ` zimoun
2020-10-11 14:25                     ` Pierre Neidhardt
2020-10-11 16:05                       ` zimoun
2020-10-12 10:20               ` Ludovic Courtès
2020-10-12 11:21                 ` Pierre Neidhardt
2020-10-13 13:45                   ` Ludovic Courtès
2020-10-13 13:56                     ` Pierre Neidhardt
2020-10-13 21:22                       ` Ludovic Courtès
2020-10-14  7:50                         ` Pierre Neidhardt
2020-10-16 10:30                           ` Ludovic Courtès
2020-10-17  9:14                             ` Pierre Neidhardt
2020-10-17 19:17                               ` Pierre Neidhardt
2020-10-21  9:53                               ` Ludovic Courtès
2020-10-21  9:58                                 ` Pierre Neidhardt
2020-10-12 11:23                 ` zimoun

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=87pn7xgy1n.fsf@ambrevar.xyz \
    --to=mail@ambrevar.xyz \
    --cc=guix-devel@gnu.org \
    --cc=othacehe@gnu.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).