Date: Sat, 22 Jan 2022 05:46:13 +0100
From: raingloom
To: Ludovic Courtès
Cc: Guix Devel
Subject: Re: File search
Message-ID: <20220122054613.4c09367e@riseup.net>
In-Reply-To: <8735lh5ukw.fsf@inria.fr>
References: <8735lh5ukw.fsf@inria.fr>

On Fri, 21 Jan 2022 10:03:43 +0100
Ludovic Courtès wrote:

> Hello Guix!
>
> Lately I found myself going several times to
> to look for packages providing a given
> file and I thought it’s time to do something about it.
>
> The script below creates an SQLite database for the current set of
> packages, but only for those already in the store:
>
>   guix repl file-database.scm populate
>
> That creates /tmp/db; it took about 25mn on berlin, for 18K packages.
> Then you can run, say:
>
>   guix repl file-database.scm search boot-9.scm
>
> to find which packages provide a file named ‘boot-9.scm’. That part
> is instantaneous.
>
> The database for 18K packages is quite big:
>
> --8<---------------cut here---------------start------------->8---
> $ du -h /tmp/db*
> 389M /tmp/db
> 82M /tmp/db.gz
> 61M /tmp/db.zst
> --8<---------------cut here---------------end--------------->8---
>
> How do we expose that information? There are several criteria I can
> think of: accuracy, freshness, privacy, responsiveness, off-line
> operation.
>
> I think accuracy (making sure you get results that correspond
> precisely to, say, your current channel revisions and your current
> system) is not a high priority: some result is better than no result.
> Likewise for freshness: results for an older version of a given
> package may still be valid now.
>
> In terms of privacy, I think it’s better if we can avoid making one
> request per file searched for.
> Off-line operation would be sweet, and
> it comes with responsiveness; fast off-line search is necessary for
> things like ‘command-not-found’ (where the shell tells you what
> package to install when a command is not found).
>
> Based on that, it is tempting to just distribute a full database from
> ci.guix, say, that the client command would regularly fetch. The
> downside is that that’s quite a lot of data to download; if you use
> the file search command infrequently, you might find yourself
> spending more time downloading the database than actually searching
> it.
>
> We could have a hybrid solution: distribute a database that contains
> only files in /bin and /sbin (it should be much smaller), and for
> everything else, resort to a web service (the Data Service could be
> extended to include file lists). That way, we’d have fast
> privacy-respecting search for command names, and on-line search for
> everything else.
>
> Thoughts?
>
> Ludo’.
>

One use case that I hope can be addressed is TeX Live packages. Trying
to figure out which package corresponded to which missing file was a
nightmare the last time I had to use LaTeX.
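
To make the TeX Live case concrete, here is a rough sketch of how a
file database like the one described above could be queried by file
name, and how the smaller "commands only" subset from the hybrid
proposal could be derived from it. It is written in Python rather than
Guile, and the schema, the function names, and the sample package and
file names are all assumptions made up for illustration; none of this
is taken from file-database.scm.

```python
import sqlite3


def create_db(path=":memory:"):
    """Create a hypothetical schema: one row per file per package."""
    db = sqlite3.connect(path)
    db.execute(
        """
        CREATE TABLE IF NOT EXISTS files (
            package  TEXT NOT NULL,  -- package name (assumed key)
            path     TEXT NOT NULL,  -- path relative to the store item
            basename TEXT NOT NULL   -- last path component, used for lookups
        )
        """
    )
    # Index the column we search on so lookups stay fast as the table grows.
    db.execute("CREATE INDEX IF NOT EXISTS files_basename ON files (basename)")
    return db


def populate(db, listing):
    """Insert files from a {package: [relative paths]} mapping."""
    rows = [
        (package, path, path.rsplit("/", 1)[-1])
        for package, paths in listing.items()
        for path in paths
    ]
    db.executemany("INSERT INTO files VALUES (?, ?, ?)", rows)
    db.commit()


def search(db, basename):
    """Return (package, path) pairs for files with exactly this base name."""
    cursor = db.execute(
        "SELECT package, path FROM files WHERE basename = ? "
        "ORDER BY package, path",
        (basename,),
    )
    return cursor.fetchall()


def command_subset(db):
    """Return only files under bin/ or sbin/, for a smaller database."""
    cursor = db.execute(
        "SELECT package, path FROM files "
        "WHERE path LIKE 'bin/%' OR path LIKE 'sbin/%' "
        "ORDER BY package, path"
    )
    return cursor.fetchall()


if __name__ == "__main__":
    db = create_db()
    # Sample data is invented; "texlive-example" is not a real package.
    populate(db, {
        "guile": ["bin/guile", "share/guile/3.0/ice-9/boot-9.scm"],
        "texlive-example": ["share/texmf-dist/tex/latex/example/example.sty"],
    })
    print(search(db, "example.sty"))
    print(command_subset(db))
```

With something like this, a LaTeX error about a missing .sty file turns
into a single lookup by base name, and the rows returned by
command_subset are the only ones a ‘command-not-found’ helper would
need to have locally.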