From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>, "Pierre Neidhardt" <mail@ambrevar.xyz>
Cc: guix-devel@gnu.org, Mathieu Othacehe <othacehe@gnu.org>
Subject: Re: File search progress: database review and question on triggers
Date: Mon, 12 Oct 2020 13:23:13 +0200
Message-ID: <86h7qzvgbi.fsf@gmail.com>
In-Reply-To: <87eem3u4n8.fsf@gnu.org>

On Mon, 12 Oct 2020 at 12:20, Ludovic Courtès <ludo@gnu.org> wrote:
>> - Textual database: slow and not lighter than SQLite. Not worth it I believe.
>>
>> - SQLite without full-text search: fast, supports classic patterns
>> (e.g. "foo*bar") but does not support word permutations.
>>
>> - SQLite with full-text search: fast, supports word permutations but
>> does not support suffix-matching (e.g. "bar" won't match "foobar").
>> Size is about the same as without full-text search.
>>
>> - Include synopsis and descriptions. Maybe we should include all fields
>> that are searched by `guix search`. This incurs a cost on the
>> database size but it would fix the `guix search` speed issue. Size
>> increases by some 10 MiB.
>
> Oh so this is going beyond file search, right?
>
> Perhaps it would make sense to focus on file search only as a first
> step, and see what can be done with synopses/descriptions (like Arun and
> zimoun did before) later, separately?

Well, the first patch set that Arun sent for improving “guix search” was
the introduction of a SQLite database, replacing the current
’package.cache’. And I quote your wise advice:

    I would rather keep the current package cache as-is instead of
    inserting sqlite in here. I don’t expect it to bring much
    compared performance-wise to the current simple cache
    (especially if we look at load time), and it does increase
    complexity quite a bit.

    However, using sqlite for keyword search as you initially
    proposed on guix-devel does sound like a great idea to me.

    Message-ID: <87sgjhx92g.fsf@gnu.org>

Therefore, if Pierre is going to introduce a SQL database where adding
the synopses/descriptions is cheap, it seems a good idea to use it,
doesn’t it? All while keeping the ’package.cache’ as-is. And in parallel,
“we” can try to use this WIP branch for improving the speed of “guix
search” (by “we”, I mean that I plan to work on it).

BTW, it would be really easy to remove these 2 extra fields if they turn
out not to be conclusive for search, since they only appear in the
function ’add-files’:

--8<---------------cut here---------------start------------->8---
(with-statement
    db
    (string-append "insert into Info (name, synopsis, description, package)"
                   " values (:name, :synopsis, :description, :id)")
    stmt
  (sqlite-bind-arguments stmt
                         #:name name
                         #:synopsis synopsis
                         #:description description
                         #:id id)
--8<---------------cut here---------------end--------------->8---

which is used only once, by ’persist-package-files’.
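
To make the "easy to remove" point concrete, here is a rough sketch of the same insert without the two extra fields. It uses plain guile-sqlite3 calls instead of the branch's ’with-statement’ macro. The two-column schema and the names ’create-info-table’ and ’add-package-info’ are assumptions made for illustration, not code from the WIP branch.

```scheme
(use-modules (sqlite3))   ; guile-sqlite3

;; Hypothetical trimmed schema: the Info table without the synopsis and
;; description columns.  The real schema lives in the WIP branch.
(define (create-info-table db)
  (sqlite-exec db "create table if not exists Info
                     (name text not null, package integer not null)"))

;; Hypothetical helper: the insert from 'add-files' minus the two
;; extra fields.
(define (add-package-info db name id)
  (let ((stmt (sqlite-prepare
               db
               "insert into Info (name, package) values (:name, :id)")))
    (sqlite-bind-arguments stmt #:name name #:id id)
    (sqlite-step stmt)        ; run the insert
    (sqlite-finalize stmt)))  ; release the prepared statement

;; Usage against a throwaway in-memory database.
(let ((db (sqlite-open ":memory:")))
  (create-info-table db)
  (add-package-info db "hello" 1)
  (sqlite-close db))
```

Under these assumptions, the only difference from the full version is the column list and the matching keyword arguments, so the change should stay local to this one function.
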
> It would be nice to see whether/how this could be integrated with
> third-party channels. Of course it’s not a priority, but while
> designing this feature, we should keep in mind that we might want
> third-party channel authors to be able to offer such a database for
> their packages.

If a third-party channel also provides substitutes, then the database
would be part of those substitutes, or easy to build from the substitute
metadata.

>> - Find a way to garbage-collect the database(s). My intuition is that
>> we should have 1 database per Guix checkout and when we `guix gc` a
>> Guix checkout we collect the corresponding database.
>
> If we download a fresh database every time, we might as well simply
> overwrite the one we have?

But you do not want to download it again if you roll back, for example.
From my point of view, it should use the same mechanism as
’package.cache’.
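
One way to picture the roll-back concern is a minimal sketch that keeps one database file per checkout and fetches only when the file is missing. The cache layout, the file naming and the procedures ’database-file’ and ’ensure-database’ are hypothetical. ’package.cache’ itself is not managed this way, so this only illustrates the "do not download again" behaviour, not the actual mechanism.

```scheme
;; Hypothetical layout: one database per Guix checkout, named after
;; the checkout's commit.
(define (database-file cache-directory commit)
  (string-append cache-directory "/file-search-" commit ".db"))

(define (ensure-database cache-directory commit fetch)
  "Return the database file for COMMIT under CACHE-DIRECTORY.  Call
FETCH, a procedure taking the commit and the target file name, only
when that file does not exist yet."
  (let ((file (database-file cache-directory commit)))
    (unless (file-exists? file)
      (fetch commit file))
    file))

;; Usage with a stub FETCH that merely creates an empty file; a real
;; one would download or build the database.  "abc123" is a
;; placeholder, not a real commit.
(define (stub-fetch commit file)
  (call-with-output-file file (const #t)))

(ensure-database "/tmp" "abc123" stub-fetch)  ; creates the file if missing
(ensure-database "/tmp" "abc123" stub-fetch)  ; finds it, does not fetch
```

With such a layout, rolling back to an earlier checkout finds its database already on disk, and garbage collection could remove the file together with the checkout it belongs to.
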
Cheers,
simon