From: Ryan Prior
To: Ludovic Courtès
Cc: Guix Devel
Date: Tue, 25 Jan 2022 23:45:35 +0000
Subject: Re: File search
In-Reply-To: <8735lh5ukw.fsf@inria.fr>

On Friday, January 21st, 2022 at 9:03 AM, Ludovic Courtès wrote:

> The database for 18K packages is quite big:
>
> --8<---------------cut here---------------start------------->8---
> $ du -h /tmp/db*
> 389M    /tmp/db
> 82M     /tmp/db.gz
> 61M     /tmp/db.zst
> --8<---------------cut here---------------end--------------->8---
>
> [snip]
>
> In terms of privacy, I think it’s better if we can avoid making
> one request per file searched for.
> Off-line operation would be
> sweet, and it comes with responsiveness; fast off-line search is
> necessary for things like ‘command-not-found’ (where the shell
> tells you what package to install when a command is not found).

Offline operation is crucial, and I don't think it's desirable to download tens or hundreds of megabytes.

What about creating and distributing a Bloom filter per package, whose members are the file names in that package? This would let us dramatically reduce the size of the data we distribute, at the cost of not giving 100% reliable answers: a Bloom filter can report that a file is present when it isn't (a false positive), but it never misses a file that is present. We've established, though, that some information is better than none, and the uncertainty can be resolved by querying a web service or by building the package locally and searching its directory.
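To make the idea concrete, here is a minimal Python sketch of one Bloom filter per package, assuming members are relative file names such as `bin/hello` and a 1% target false-positive rate. The helper names (`build_index`, `candidate_packages`) and the package file lists are hypothetical, not Guix APIs; the k bit positions are derived with double hashing over a single SHA-256 digest, which is one common choice among several.

```python
from __future__ import annotations

import hashlib
import math
from collections.abc import Iterator


class BloomFilter:
    """Minimal Bloom filter over strings, e.g. the file names of one package.

    might_contain() can return false positives, never false negatives.
    """

    def __init__(self, expected_items: int, false_positive_rate: float) -> None:
        if not 0.0 < false_positive_rate < 1.0:
            raise ValueError("false_positive_rate must be in (0, 1)")
        n = max(1, expected_items)
        # Standard sizing: m = -n * ln(p) / (ln 2)^2 bits, k = (m / n) * ln 2 hashes.
        self.num_bits = math.ceil(-n * math.log(false_positive_rate) / math.log(2) ** 2)
        self.num_hashes = max(1, round(self.num_bits / n * math.log(2)))
        self.bits = bytearray((self.num_bits + 7) // 8)

    def _positions(self, item: str) -> Iterator[int]:
        # Double hashing: derive k bit positions from two 64-bit slices of a
        # single SHA-256 digest instead of computing k independent hashes.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "little")
        h2 = int.from_bytes(digest[8:16], "little") | 1  # force an odd step
        for i in range(self.num_hashes):
            yield (h1 + i * h2) % self.num_bits

    def add(self, item: str) -> None:
        for pos in self._positions(item):
            self.bits[pos // 8] |= 1 << (pos % 8)

    def might_contain(self, item: str) -> bool:
        # True only if every derived bit is set; False means "definitely absent".
        return all(self.bits[pos // 8] & (1 << (pos % 8)) for pos in self._positions(item))


def build_index(package_files: dict[str, list[str]], fp_rate: float = 0.01) -> dict[str, BloomFilter]:
    """Build one filter per package (hypothetical helper, not a Guix API)."""
    index = {}
    for package, files in package_files.items():
        bloom = BloomFilter(len(files), fp_rate)
        for name in files:
            bloom.add(name)
        index[package] = bloom
    return index


def candidate_packages(index: dict[str, BloomFilter], file_name: str) -> list[str]:
    """Packages that *may* ship file_name; hits still need confirming."""
    return sorted(pkg for pkg, bloom in index.items() if bloom.might_contain(file_name))


if __name__ == "__main__":
    # Hypothetical file lists; a real index would cover every package.
    files = {
        "hello": ["bin/hello", "share/man/man1/hello.1.gz"],
        "coreutils": ["bin/ls", "bin/cat", "bin/cp"],
    }
    index = build_index(files)
    print(candidate_packages(index, "bin/hello"))
```

With the standard sizing formula, a 1% false-positive rate works out to roughly 9.6 bits per file name, however long the name is. A query returns a short list of "maybe" packages, and only those would need confirming through a web service or a local build.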