From mboxrd@z Thu Jan 1 00:00:00 1970
From: Timothy Sample
To: zimoun
Cc: 42162@debbugs.gnu.org, Maurice Brémond
Subject: bug#42162: Recovering source tarballs
Date: Wed, 26 Aug 2020 17:11:50 -0400
Message-ID: <87k0xlaz8p.fsf@ngyro.com>
In-Reply-To: <86blixyb7c.fsf@gmail.com> (zimoun's message of "Wed, 26 Aug 2020 12:04:55 +0200")
References: <87mu4iv0gc.fsf@inria.fr> <86h7uq8fmk.fsf@gmail.com>
 <87d05etero.fsf@gnu.org> <87r1tit5j6.fsf_-_@gnu.org>
 <875za4ykej.fsf@ngyro.com> <86blixyb7c.fsf@gmail.com>
List-Id: Bug reports for GNU Guix

Hi zimoun,

zimoun writes:

> One question is how this database scales?
>
> For example, a quick back-to-envelop estimation leads to ~1.2GB metadata
> for ~14k packages and then an increase of ~700MB per year, both with the
> Ludo’s code [1].
>
> [1]

It’s a good question.
A good part of the size comes from the representation rather than
the data.  Compression helps a lot here.  I have a database of 3,912
packages.  It’s 295M uncompressed (which is a little better than your
estimation).  If I pass each file through Lzip, it shrinks down to 60M.
That’s more like 15.5K per package, which is almost an order of
magnitude smaller than the estimation you used (120K).  I think that
makes the numbers rather pleasant, but it comes at the expense of easy
storing in Git.

> As mentioned [2], should this service be part of SWH (download cooking
> task)?  Or project side?
>
> [2]

It would be interesting to just have SWH absorb the project.  Since
other distros already know how to produce a “sources.json” and how to
query the SWH archive, it would mean that they benefit for free (and so
would Guix, for that matter).  I’m open to that, but right now having
the freedom to experiment is important.

-- Tim
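
For reference, the per-package figures quoted in the message can be
re-derived with a few lines of Python.  This is only a rough sketch: the
constants are the numbers from the message, reading "M" as MiB and "K"
as KiB is an assumption, and the names used here are purely
illustrative.

```python
# Back-of-the-envelope check of the sizes quoted in the message.
# Assumption: "M" means MiB and "K" means KiB; the message does not say.
PACKAGES = 3912                 # packages in the database
UNCOMPRESSED_MIB = 295          # total size before compression
LZIP_MIB = 60                   # total size after passing each file through Lzip
ESTIMATE_KIB_PER_PACKAGE = 120  # per-package estimate being compared against


def kib_per_package(total_mib, packages):
    """Average size per package in KiB, given a total size in MiB."""
    return total_mib * 1024 / packages


uncompressed = kib_per_package(UNCOMPRESSED_MIB, PACKAGES)
compressed = kib_per_package(LZIP_MIB, PACKAGES)
ratio = ESTIMATE_KIB_PER_PACKAGE / compressed

print(f"uncompressed:    {uncompressed:.1f} KiB per package")
print(f"with Lzip:       {compressed:.1f} KiB per package")
print(f"estimate / Lzip: {ratio:.1f}x")
```

Under these assumptions the compressed figure lands within a fraction of
a KiB of the 15.5K quoted, and the ratio to the 120K estimate is a bit
under eight, in line with "almost an order of magnitude".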