From: Mathieu Othacehe
To: Pierre Neidhardt
Subject: Re: File search progress: database review and question on triggers
Date: Tue, 11 Aug 2020 11:43:42 +0200
References: <87sgcuh8rb.fsf@ambrevar.xyz>
In-Reply-To: <87sgcuh8rb.fsf@ambrevar.xyz> (Pierre Neidhardt's message of "Mon, 10 Aug 2020 16:32:08 +0200")
Message-ID: <87sgctwm9d.fsf@gnu.org>
Cc: guix-devel@gnu.org
List-Id: "Development of GNU Guix and the GNU System distribution."

Hello Pierre,

Thanks for sharing your progress. A few remarks below.

> 3. Size of the database:
>    I've persisted all locally-present store items for my current Guix version
>    and it produced a database of 72 MiB.  It compresses down to 8 MiB in zstd.
>
>    But since we can have multiple Guix versions, this means that the
>    packages have one entry per store path, so we might end up with more
>    entries than that as the number of Guix generations grows.

I'm not sure we actually need to save the full history. I think we could
just store that the package X produces [Y1, Y2, ...] executable
files. Then on package X update, the executable files list could be
refreshed.

> Question: Should we include empty directories in the database?  I'm tempted
> to answer no.

I would also say no, and also exclude non-executable files.

> Question: This bounds us to the SQLite syntax for pattern matching.  Is it a
> problem?
> It seems powerful enough in practice.  But maybe we can use regular
> expression in SQLite as well?

From the UI perspective, we already have "guix search" that expects a
regex. If we were to include a "guix file-search" command, then I think
it would make sense that it uses the same regex syntax.

> Next points I'd like to address:
>
> 6. Automatically persist the database entry when building a package.
>    Any idea where I should plug that in?
>
> 7. Have substitute servers distribute database content.  When the user performs
>    a file search, Guix asks the substitute server for a database update.  Only
>    the diff should be sent over the network, not the whole thing since it might

If I understand correctly, you are proposing to create local databases
that would be kept in sync with a master database populated by the CI
server. This seems a bit complex.

What about extending the Cuirass database with the two tables you are
proposing? Then, each time a package is built, if the version is
updated, the "Files" table would be updated.

Then we could add an HTTP interface such as
"/search/file?query=libxxx" to Cuirass, that would directly query the
database. In Guix itself, we could add the counterpart in the (guix ci)
module.

WDYT?

Thanks,

Mathieu
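
To make the idea above a bit more concrete, here is a minimal sketch of
"package X produces [Y1, Y2, ...]" with a refresh on version change and
a regex search on top of SQLite. Everything named here is an assumption
for illustration only: the "Packages"/"Files" layout, the column names
and the helpers open_db, refresh_package and search_files are
hypothetical and are not the actual Guix or Cuirass schema or API.
Python's "re" module merely stands in for whichever regex engine the
real command would use. SQLite parses the REGEXP operator but ships no
implementation for it, so the sketch registers one as a user function.

```python
import re
import sqlite3


def _regexp(pattern, value):
    # SQLite evaluates "X REGEXP Y" by calling regexp(Y, X).
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def open_db(path=":memory:"):
    """Open a database using a hypothetical two-table layout."""
    conn = sqlite3.connect(path)
    conn.create_function("regexp", 2, _regexp)
    conn.executescript("""
        CREATE TABLE IF NOT EXISTS Packages (
            id      INTEGER PRIMARY KEY,
            name    TEXT NOT NULL UNIQUE,
            version TEXT NOT NULL
        );
        CREATE TABLE IF NOT EXISTS Files (
            package INTEGER NOT NULL REFERENCES Packages(id),
            path    TEXT NOT NULL
        );
    """)
    return conn


def refresh_package(conn, name, version, files):
    """Store the file list of NAME; replace it only when VERSION changed.

    Returns True when the list was written, False when it was kept.
    """
    with conn:  # single transaction: commit on success, roll back on error
        row = conn.execute(
            "SELECT id, version FROM Packages WHERE name = ?", (name,)
        ).fetchone()
        if row is not None and row[1] == version:
            return False  # same version: no history kept, nothing to do
        if row is None:
            pkg_id = conn.execute(
                "INSERT INTO Packages (name, version) VALUES (?, ?)",
                (name, version),
            ).lastrowid
        else:
            pkg_id = row[0]
            conn.execute(
                "UPDATE Packages SET version = ? WHERE id = ?",
                (version, pkg_id),
            )
            # Drop the old list so only the latest version is recorded.
            conn.execute("DELETE FROM Files WHERE package = ?", (pkg_id,))
        conn.executemany(
            "INSERT INTO Files (package, path) VALUES (?, ?)",
            [(pkg_id, path) for path in files],
        )
        return True


def search_files(conn, pattern):
    """Return (package name, path) pairs whose path matches PATTERN."""
    return conn.execute(
        "SELECT Packages.name, Files.path FROM Files "
        "JOIN Packages ON Packages.id = Files.package "
        "WHERE Files.path REGEXP ? "
        "ORDER BY Packages.name, Files.path",
        (pattern,),
    ).fetchall()
```

With this layout the database holds one file list per package rather
than one per store path, so it does not grow with the number of Guix
generations. An HTTP endpoint such as "/search/file?query=libxxx" could
then run a query like the one in search_files on the server side, but
that part is not sketched here.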