From mboxrd@z Thu Jan 1 00:00:00 1970
From: Pierre Neidhardt
To: Mathieu Othacehe
Cc: guix-devel@gnu.org
Subject: Re: File search progress: database review and question on triggers
In-Reply-To: <87sgctwm9d.fsf@gnu.org>
References: <87sgcuh8rb.fsf@ambrevar.xyz> <87sgctwm9d.fsf@gnu.org>
Date: Tue, 11 Aug 2020 14:35:48 +0200
Message-ID: <87pn7xgy1n.fsf@ambrevar.xyz>
List-Id: "Development of GNU Guix and the GNU System distribution."
MIME-Version: 1.0
Content-Type: multipart/signed; boundary="=-=-="; micalg=pgp-sha256; protocol="application/pgp-signature"

--=-=-=
Content-Type: text/plain; charset=utf-8

Hi Mathieu,

Thanks for your comments!  Answers below.

>> 3. Size of the database:
>>    I've persisted all locally-present store items for my current Guix version
>>    and it produced a database of 72 MiB.  It compresses down to 8 MiB in zstd.
>>
>>    But since we can have multiple Guix versions, this means that the
>>    packages have one entry per store path, so we might end up with more
>>    entries than that as the number of Guix generations grows.
>
> I'm not sure we actually need to save the full history. I think we could
> just store that the package X produces [Y1, Y2, ...] executable files. Then
> on package X update, the executable files list could be refreshed.

Maybe you are missing some context.  The original discussion is here:
https://lists.gnu.org/archive/html/guix-devel/2020-01/msg00019.html

Unlike Nix, we would like to do more than just index executable files.
Indeed, it's very useful to know where to find, say, a C header, a .so
library, a TeX Live .sty file, etc.

However, as you hinted, maybe it's unnecessary to save the file listings
of the packages for every Guix version.  Maybe we could store only the
"diffs" between Guix generations.  I don't know if SQLite supports
this.  If not, it sounds like a rather complex thing to do.

But really, if the compressed database over multiple Guix generations is
<100 MiB, then size is not a big problem.

>> Question: Should we include empty directories in the database?  I'm tempted
>> to answer no.
>
> I would also say no, and also exclude non-executable files.

See above: I think we would lose a lot by not including non-executable files.

>> Question: This bounds us to the SQLite syntax for pattern matching.  Is it a
>> problem?
>> It seems powerful enough in practice.  But maybe we can use regular
>> expression in SQLite as well?
>
> From the UI perspective, we already have "guix search" that expects a
> regex. If we were to include a "guix file-search" command, then I think
> it would make sense that it uses the same regex syntax.

I found out that SQLite has a REGEXP operator; I'll see if it works well
enough.

>> 7. Have substitute servers distribute database content.  When the user performs
>>    a file search, Guix asks the substitute server for a database update.  Only
>>    the diff should be sent over the network, not the whole thing since it might
>
> If I understand correctly, you are proposing to create local databases
> that would be kept in sync with a master database populated by the CI
> server. This seems a bit complex.
>
> What about extending Cuirass database to add the two tables you are
> proposing.
> Then, each time a package is built, if the version is
> updated, the "Files" table would be updated.
>
> Then we could add an HTTP interface such as "/search/file?query=libxxx"
> to Cuirass, that would directly query the database. In Guix itself, we
> could add the counterpart in the (guix ci) module.

The problem with this approach is that it would not work offline.  In my
opinion, this is a big limitation.  I'd rather have a local database.

Besides, we need a local database for non-official, locally-built
packages anyway (which Cuirass would not know about).  Since a local
database is a requirement, the only missing piece would be database
synchronization.

Thoughts?

-- 
Pierre Neidhardt
https://ambrevar.xyz/

--=-=-=
Content-Type: application/pgp-signature; name="signature.asc"

-----BEGIN PGP SIGNATURE-----

iQEzBAEBCAAdFiEEUPM+LlsMPZAEJKvom9z0l6S7zH8FAl8ykKQACgkQm9z0l6S7
zH8Mgwf+MqFP52aQOClx4tCaPA3aBbyJaqNQ88dQ0hDNYGq3oXIWkMhK0UprZxCj
S0mBBvRD5jYa9lh7xf3c1Q1uGxSpJ+8KEC3gqkrbtJjdJ5mQmHxdiol1hvqiauwr
N4h9J0VgGiyuYHsQvzPpQ/HleIZGINVrbW+gsS5RhJ9Km8E6qa9kkOD9kV84dwBp
8E7UtmVUXE/RqXf0046v1PTxbyBfHi+MG22Jh11/l9d4A3NfNxMyG6YYioaQ/MZX
4CzduhcPFkDiFZun94qGdcXgQbtFmzjO7FudJykft8R6FundcY4P45CHNAlibLR0
onBp0507Mxj82KXM6G3yDHRaUjuDug==
=FReA
-----END PGP SIGNATURE-----
--=-=-=--
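
A note on the REGEXP operator mentioned in the message: in stock SQLite,
"X REGEXP Y" is only syntax for calling a user-defined function
regexp(Y, X), and no such function is registered by default, so the
application has to supply one.  Below is a minimal sketch of what that
could look like.  It is written in Python rather than Guile and is not
the code under discussion; the "files" table, its columns, the helper
names and the sample rows are all hypothetical, made up for illustration.

```python
import re
import sqlite3


def regexp(pattern, value):
    """Back the SQL expression `value REGEXP pattern`.

    SQLite calls this as regexp(pattern, value): the pattern comes first.
    """
    if value is None:
        return 0
    return 1 if re.search(pattern, value) else 0


def search_files(conn, pattern):
    """Return (package, path) rows whose path matches the regular expression."""
    cur = conn.execute(
        "SELECT package, path FROM files WHERE path REGEXP ? ORDER BY path",
        (pattern,),
    )
    return cur.fetchall()


# Hypothetical schema and sample data, for illustration only.
conn = sqlite3.connect(":memory:")
conn.create_function("regexp", 2, regexp)  # makes the REGEXP operator usable
conn.execute("CREATE TABLE files (package TEXT, path TEXT)")
conn.executemany(
    "INSERT INTO files VALUES (?, ?)",
    [
        ("zlib", "include/zlib.h"),
        ("zlib", "lib/libz.so"),
        ("texlive-hyperref", "share/texmf-dist/tex/latex/hyperref/hyperref.sty"),
    ],
)

# Find C headers; non-executable files are searchable like any other path.
print(search_files(conn, r"\.h$"))
```

The regex dialect is whatever the registered function implements (here
Python's "re" module), so matching the syntax accepted by "guix search"
would depend on which regex engine backs the function.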