From: Ludovic Courtès
To: Ryan Prior
Cc: Guix Devel
Subject: Re: File search
Date: Sat, 05 Feb 2022 12:18:44 +0100
Message-ID: <87sfsxk1d7.fsf@gnu.org>
References: <8735lh5ukw.fsf@inria.fr>
In-Reply-To: (Ryan Prior's message of "Tue, 25 Jan 2022 23:45:35 +0000")
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi,

Ryan Prior writes:

> On Friday, January 21st, 2022 at 9:03 AM, Ludovic Courtès wrote:
>
>> The database for 18K packages is quite big:
>>
>> --8<---------------cut here---------------start------------->8---
>> $ du -h /tmp/db*
>> 389M /tmp/db
>> 82M /tmp/db.gz
>> 61M /tmp/db.zst
>> --8<---------------cut here---------------end--------------->8---
>>
>> [snip]
>> In terms of privacy, I think it’s better if we can avoid making
>> one request per file searched for. Off-line operation would be
>> sweet, and it comes with responsiveness; fast off-line search is
>> necessary for things like ‘command-not-found’ (where the shell
>> tells you what package to install when a command is not found).
>
> Offline operation is crucial, and I don't think it's desirable to
> download tens or hundreds of megabytes. What about creating &
> distributing a bloom filter per package, with members being file
> names? This would allow us to dramatically reduce the size of data we
> distribute, at the cost of not giving 100% reliable answers. We've
> established, though, that some information is better than none, and
> the uncertainty can be resolved by querying a web service or building
> the package locally and searching its directory.

My understanding is that Bloom filters are sets essentially, but here we
need more than that: we need to map files to package names. Or am I
misunderstanding what you have in mind?

Thanks,
Ludo’.
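
For illustration only, here is a minimal sketch of one possible reading of the "bloom filter per package" idea quoted above. The thread does not confirm this reading. It assumes the mapping from file to package name is kept outside the filters, as a table keyed by package name, so a lookup tests the file name against every package's filter. The `BloomFilter` class, its size and hash parameters, the helper names, and the file lists are all hypothetical; none of this is existing Guix code.

```python
import hashlib


class BloomFilter:
    """Tiny Bloom filter: a set with possible false positives, no false negatives."""

    def __init__(self, size_bits=1024, num_hashes=4):
        # Sizes are arbitrary illustration values, not tuned for real data.
        self.size_bits = size_bits
        self.num_hashes = num_hashes
        self.bits = 0  # bit array stored in a Python integer

    def _positions(self, item):
        # Derive num_hashes bit positions by salting SHA-256 with an index.
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode("utf-8")).digest()
            yield int.from_bytes(digest[:8], "big") % self.size_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        return all((self.bits >> pos) & 1 for pos in self._positions(item))


def build_index(package_files):
    """Build one filter per package; the package name lives in the dict key."""
    index = {}
    for package, files in package_files.items():
        bloom = BloomFilter()
        for name in files:
            bloom.add(name)
        index[package] = bloom
    return index


def candidate_packages(index, file_name):
    """Return packages whose filter may contain file_name (may over-report)."""
    return sorted(pkg for pkg, bloom in index.items() if file_name in bloom)


# Made-up file lists, for illustration only.
index = build_index({
    "hello": ["bin/hello"],
    "coreutils": ["bin/ls", "bin/cat"],
})
print(candidate_packages(index, "bin/ls"))
```

In this sketch the filters remain plain sets, as noted in the reply; the package name comes from the surrounding table. The trade-off is that each lookup scans every package's filter, and any hit is only a candidate that still has to be confirmed by one of the fallbacks mentioned in the quoted message.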