From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp11.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms0.migadu.com with LMTPS id MNi7Jx7c72EV2wAAgWs5BA (envelope-from ) for ; Tue, 25 Jan 2022 12:16:46 +0100 Received: from aspmx1.migadu.com ([2001:41d0:2:bcc0::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp11.migadu.com with LMTPS id AOIgJR7c72GVLgAA9RJhRA (envelope-from ) for ; Tue, 25 Jan 2022 12:16:46 +0100 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id 404353C27B for ; Tue, 25 Jan 2022 12:16:46 +0100 (CET) Received: from localhost ([::1]:41554 helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1nCJob-00087p-FY for larch@yhetil.org; Tue, 25 Jan 2022 06:16:45 -0500 Received: from eggs.gnu.org ([209.51.188.92]:40292) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nCJni-00085n-Cx for guix-devel@gnu.org; Tue, 25 Jan 2022 06:15:50 -0500 Received: from [2a0c:e300::1] (port=50354 helo=hera.aquilenet.fr) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1nCJng-0007wH-92 for guix-devel@gnu.org; Tue, 25 Jan 2022 06:15:50 -0500 Received: from localhost (localhost [127.0.0.1]) by hera.aquilenet.fr (Postfix) with ESMTP id 7E6F1141; Tue, 25 Jan 2022 12:15:45 +0100 (CET) X-Virus-Scanned: Debian amavisd-new at aquilenet.fr Received: from hera.aquilenet.fr ([127.0.0.1]) by localhost (hera.aquilenet.fr [127.0.0.1]) (amavisd-new, port 10024) with ESMTP id 2X3TKRU1FSfO; Tue, 25 Jan 2022 12:15:44 +0100 (CET) Received: from ribbon (91-160-117-201.subs.proxad.net [91.160.117.201]) by hera.aquilenet.fr (Postfix) with ESMTPSA id 47E579D; Tue, 25 Jan 2022 12:15:44 +0100 (CET) From: =?utf-8?Q?Ludovic_Court=C3=A8s?= To: Maxim Cournoyer Subject: Re: File search References: <8735lh5ukw.fsf@inria.fr> <8735lgap7i.fsf@yucca> <874k5w38s7.fsf@gnu.org> <87pmok8orm.fsf@gmail.com> X-URL: http://www.fdn.fr/~lcourtes/ X-Revolutionary-Date: 6 =?utf-8?Q?Pluvi=C3=B4se?= an 230 de la =?utf-8?Q?R?= =?utf-8?Q?=C3=A9volution?= X-PGP-Key-ID: 0x090B11993D9AEBB5 X-PGP-Key: http://www.fdn.fr/~lcourtes/ludovic.asc X-PGP-Fingerprint: 3CE4 6455 8A84 FDC6 9DB4 0CFB 090B 1199 3D9A EBB5 X-OS: x86_64-pc-linux-gnu Date: Tue, 25 Jan 2022 12:15:43 +0100 In-Reply-To: <87pmok8orm.fsf@gmail.com> (Maxim Cournoyer's message of "Fri, 21 Jan 2022 21:53:17 -0500") Message-ID: <87fspcoylc.fsf@gnu.org> User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.2 (gnu/linux) MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-Spamd-Bar: / X-Rspamd-Server: hera X-Rspamd-Queue-Id: 7E6F1141 X-Spamd-Result: default: False [0.90 / 15.00]; ARC_NA(0.00)[]; RCVD_VIA_SMTP_AUTH(0.00)[]; FROM_HAS_DN(0.00)[]; RCPT_COUNT_THREE(0.00)[3]; FREEMAIL_ENVRCPT(0.00)[gmail.com]; TO_MATCH_ENVRCPT_ALL(0.00)[]; TAGGED_RCPT(0.00)[]; MIME_GOOD(-0.10)[text/plain]; TO_DN_ALL(0.00)[]; FREEMAIL_TO(0.00)[gmail.com]; FROM_EQ_ENVFROM(0.00)[]; MIME_TRACE(0.00)[0:+]; R_MIXED_CHARSET(1.00)[subject]; RCVD_COUNT_TWO(0.00)[2]; RCVD_TLS_ALL(0.00)[]; MID_RHS_MATCH_FROM(0.00)[] X-Host-Lookup-Failed: Reverse DNS lookup failed for 2a0c:e300::1 (failed) Received-SPF: softfail client-ip=2a0c:e300::1; envelope-from=ludo@gnu.org; helo=hera.aquilenet.fr X-Spam_score_int: -3 X-Spam_score: -0.4 X-Spam_bar: / X-Spam_report: (-0.4 / 5.0 requ) BAYES_00=-1.9, RDNS_NONE=0.793, SPF_HELO_PASS=-0.001, SPF_SOFTFAIL=0.665 autolearn=no autolearn_force=no X-Spam_action: no action X-BeenThere: guix-devel@gnu.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Cc: Vagrant Cascadian , Guix Devel Errors-To: guix-devel-bounces+larch=yhetil.org@gnu.org Sender: "Guix-devel" X-Migadu-Flow: FLOW_IN X-Migadu-Country: US ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1643109406; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding: in-reply-to:in-reply-to:references:references:list-id:list-help: list-unsubscribe:list-subscribe:list-post; bh=fXZItpsZtO17NvTniGuSETDZ58+ArK/O6cLEeRr6MdY=; b=bmKDSXj+7Q8b2LGpe1lYeQoHq9FAWvbxybt4cm1ZTdNbyt+FAJJZITJK3FIwrJUBIOlgzO C7Qd4IhCh2ZninxU/aGQ4rtef2aDK105BBoBVBjlGN05pcN/AwCuNMO+CcwDcrc1EG0Kxz ezEN8eiSW2Q805alI44hU1lH3T5yg9vN9W6Z2zebe+ATabht2MUYaL4eOVo1jWBM2GcoUv 5jYnpbDCdkxPpdElzBqy/s+JyOgLhIEQD0z3K+9ISohDNA3IJ6Vkmj9eEkplotzYqJVFqW U0JtuFOC4+kUMNaQibUJjMQAPzU5vNRdWTjyCYqQTj49CJ17W8VYnD2PXjds2g== ARC-Seal: i=1; s=key1; d=yhetil.org; t=1643109406; a=rsa-sha256; cv=none; b=WHaHkOuS9z1Y0pGeuO0iaLVexXzxfyOeeir5UyYJ8SpjCM6k59esUFwMZC6VerJygqU0hx UtX4wJ09FmJLKOiIOpxxZCKpWe/c9cbx96KmBBuTGHP1EBBi9Cz3EKwRMTc4YwGrp8nwsw pSrUOUMj5nCPX2+KMLGDRi19zp2WhciW/saHeczKqeb3shwQn3XEYNTbGkFaK7ajvQPmbC ZQ97LJBqViwS2OopvXKK/w8wnjj2Bic1vSfcxkuCQ70bGBOEWIE4xPlN9btZeZgYgxZCMF tjO101gNgcGZDSN0DKJHiJVDn0IN53bDBi7WpX6Ll7rFZ+GhF4tIIEkpeEZXYg== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=none; dmarc=pass (policy=none) header.from=gnu.org; spf=pass (aspmx1.migadu.com: domain of "guix-devel-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="guix-devel-bounces+larch=yhetil.org@gnu.org" X-Migadu-Spam-Score: -3.63 Authentication-Results: aspmx1.migadu.com; dkim=none; dmarc=pass (policy=none) header.from=gnu.org; spf=pass (aspmx1.migadu.com: domain of "guix-devel-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="guix-devel-bounces+larch=yhetil.org@gnu.org" X-Migadu-Queue-Id: 404353C27B X-Spam-Score: -3.63 X-Migadu-Scanner: scn1.migadu.com X-TUID: /Sgb7uCiSfBA Maxim Cournoyer skribis: > I also had the idea of making it a package... this way only the people > who opt to install the database locally would incur the cost (in > bandwidth). > > Perhaps a question for Vagrant: talking about size, is this SQLite > database file comparable or smaller in size to the apt-file database > that needs to be downloaded? With the Debian software catalog being > about 30% bigger, I'd expect a similarly bigger file size. > > If Debian is doing better in terms of database file size, we could look > at how they're doing it. As a back-of-the-envelope estimate, here=E2=80=99s the amount of text that = needs to be available in the database: --8<---------------cut here---------------start------------->8--- ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages= ; select name from directories;select name from files;'|wc -c 197689978 ludo@berlin ~/src$ guile -c '(pk (/ 197689978 (expt 2. 20)))' ;;; (188.5318546295166) ludo@berlin ~/src$ du -h /tmp/db 389M /tmp/db --8<---------------cut here---------------end--------------->8--- So roughly, SQLite with this particular schema ends up taking twice as much space as the lower bound. We can do a bit better (I=E2=80=99m not an expert, so I=E2=80=99m just tryi= ng things naively) by dropping the index and cleaning up the database: --8<---------------cut here---------------start------------->8--- ludo@berlin ~/src$ cp /tmp/db{,.without-index} ludo@berlin ~/src$ sqlite3 /tmp/db.without-index SQLite version 3.32.3 2020-06-18 14:00:33 Enter ".help" for usage hints. sqlite> drop index IndexFiles; sqlite> .quit ludo@berlin ~/src$ du -h /tmp/db.without-index=20 389M /tmp/db.without-index ludo@berlin ~/src$ sqlite3 /tmp/db.without-index=20 SQLite version 3.32.3 2020-06-18 14:00:33 Enter ".help" for usage hints. sqlite> vacuum; sqlite> .quit ludo@berlin ~/src$ du -h /tmp/db.without-index=20 290M /tmp/db.without-index --8<---------------cut here---------------end--------------->8--- With compression: --8<---------------cut here---------------start------------->8--- ludo@berlin ~/src$ zstd -19 < /tmp/db.without-index > /tmp/db.without-index= .zst ludo@berlin ~/src$ du -h /tmp/db.without-index.zst=20 37M /tmp/db.without-index.zst --8<---------------cut here---------------end--------------->8--- (Down from 61MB.) For comparison, this is smaller than guile, perl, gtk+, and roughly the same as glibc:out. For the record, with compression, the lower bound is about 12=C2=A0MiB: --8<---------------cut here---------------start------------->8--- ludo@berlin ~/src$ sqlite3 -csv /tmp/db 'select name,version from packages= ; select name from directories;select name from files;'|zstd -19|wc -c 12128674 ludo@berlin ~/src$ guile -c '(pk (/ 12128674 (expt 2. 20)))' ;;; (11.566804885864258) --8<---------------cut here---------------end--------------->8--- All this to say that we could distribute the database in a form that gets closer to the optimal size, at the expense of extra processing on the client side upon reception to put it into shape (creating an index, etc.). Ludo=E2=80=99.