From: Ludovic Courtès
To: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Mon, 12 Oct 2020 12:20:43 +0200
Cc: guix-devel@gnu.org, Mathieu Othacehe
Message-ID: <87eem3u4n8.fsf@gnu.org>

Hi!

Pierre Neidhardt wrote:

>> Could you post a summary of what you have done, what’s left to do, and
>> how you’d like to integrate it?  (If you’ve already done it, my
>> apologies, but you can resend a link.  :-))
>
> What I've done: mostly a database benchmark.
>
> - Textual database: slow and not lighter than SQLite.  Not worth it I believe.
>
> - SQLite without full-text search: fast, supports classic patterns
>   (e.g. "foo*bar") but does not support word permutations.
>
> - SQLite with full-text search: fast, supports word permutations but
>   does not support suffix-matching (e.g. "bar" won't match "foobar").
>   Size is about the same as without full-text search.
>
> - Include synopsis and descriptions.  Maybe we should include all fields
>   that are searched by `guix search`.  This incurs a cost on the
>   database size but it would fix the `guix search` speed issue.  Size
>   increases by some 10 MiB.

Oh so this is going beyond file search, right?

Perhaps it would make sense to focus on file search only as a first step,
and see what can be done with synopses/descriptions (like Arun and zimoun
did before) later, separately?

> What's left to do:
>
> - Populate the database on demand, either after a `guix build` or from a
>   `guix filesearch...`.  This is important so that `guix filesearch`
>   works on packages built locally.  If `guix build`, I need help to know
>   where to plug it in.
>
> - Adapt Cuirass so that it builds its file database.
>   I need pointers to get started here.
>
> - Sync the databases from the substitute server to the client when
>   running `guix filesearch`.  For this I suggest we send the compressed
>   database corresponding to a guix generation over the network (around
>   10 MiB).  Not sure sending just the delta is worth it.

It would be nice to see whether/how this could be integrated with
third-party channels.  Of course it’s not a priority, but while designing
this feature, we should keep in mind that we might want third-party
channel authors to be able to offer such a database for their packages.

> - Find a way to garbage-collect the database(s).  My intuition is that
>   we should have 1 database per Guix checkout and when we `guix gc` a
>   Guix checkout we collect the corresponding database.

If we download a fresh database every time, we might as well simply
overwrite the one we have?

Thanks,
Ludo’.
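
For reference, the trade-off between the two SQLite variants in the quoted
benchmark can be reproduced with a small sketch.  It is only an illustration,
not the schema of the actual prototype: the table names, the sample paths and
the helper functions are made up, and it assumes the SQLite library behind
Python's sqlite3 module was built with FTS5.

```python
import sqlite3

# Hypothetical sample data: file paths as they might appear in store items.
PATHS = [
    "lib/foobar/plugin.so",
    "share/doc/foobar/README",
    "bin/hello",
]


def build_db():
    """Create an in-memory database with a plain table and an FTS5 table."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE files (path TEXT)")
    db.executemany("INSERT INTO files VALUES (?)", [(p,) for p in PATHS])
    # Assumption: FTS5 is compiled in.  The default tokenizer splits paths
    # on non-alphanumeric characters such as "/" and ".".
    db.execute("CREATE VIRTUAL TABLE files_fts USING fts5(path)")
    db.executemany("INSERT INTO files_fts VALUES (?)", [(p,) for p in PATHS])
    return db


def glob_search(db, pattern):
    """Classic pattern search on the plain table (order-sensitive)."""
    rows = db.execute(
        "SELECT path FROM files WHERE path GLOB ? ORDER BY path", (pattern,))
    return [r[0] for r in rows]


def fts_search(db, query):
    """Full-text search: all words must occur, in any order."""
    rows = db.execute(
        "SELECT path FROM files_fts WHERE files_fts MATCH ? ORDER BY path",
        (query,))
    return [r[0] for r in rows]


if __name__ == "__main__":
    db = build_db()
    print(glob_search(db, "*foo*README"))   # pattern in path order
    print(glob_search(db, "*README*foo*"))  # same words, reversed order
    print(fts_search(db, "README foobar"))  # word order does not matter
    print(fts_search(db, "bar"))            # not a whole token of "foobar"
    print(glob_search(db, "*bar*"))         # GLOB can match inside "foobar"
```

The GLOB query depends on the order of the words in the path, while the
full-text query accepts them in any order but only matches whole tokens, which
is why "bar" finds nothing there.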