Date: Wed, 7 Dec 2022 11:33:15 +0100
From: Denis 'GNUtoo' Carikli
To: Guix-devel
Subject: Packaging big generated data files?
Message-ID: <20221207113315.6a43a206@primary_laptop>

Hi,

Are there any policies or past decisions of the Guix project on
packaging big generated data files?

I've added packages for software like kiwix-tools and navit, which both
work offline but also need data files to be useful.

Navit is (car) navigation software that needs maps. The maps can be
generated from OpenStreetMap dumps with a tool available in the Navit
source code (maptool)[1], which is not packaged yet. Binary map files
can also be downloaded directly from various sources. Right now the
biggest possible file for such maps is about 47 GiB (for the whole
planet).

As for kiwix-tools, it can serve offline versions of websites like
Wikipedia, and it too needs files to work. The biggest file seems to be
the complete version of the English Wikipedia with scaled-down
pictures[2], which takes about 89 GiB. I haven't yet looked at how these
files are generated, but I guess they can somehow be generated from
Wikipedia dumps.

Packaging the binary files (without generating them) can be useful
because it simplifies maintenance a lot: one can update them by just
updating the package version and checksum. It also keeps the
information (download URL, checksum, license) in one place and enables
easy reuse by Guix services and/or configuration files.

If these files were generated in packages, it would also make it
possible to tweak the data, for instance by adding height data to Navit
maps. As for kiwix-compatible files, it would probably make it possible
to decide when to make the snapshots, or to package additional wikis
(like the Libreplanet Wiki) or websites.

The issue here is probably the size of the generated files: they are
huge, so if they are packaged, they will most likely take significant
resources in the Guix infrastructure.

So what would be the way to go here? Would Guix accept patches to add
packages for these files in Guix proper?

If so, does it need to be done like with the ZFS (kernel module)
package, where "#:substitutable? #f" is used to avoid redistributing
package builds? Or are there better ways to handle such use cases?

Note that so far I've only packaged kiwix-compatible files for various
wikis locally, by just downloading already prepared files. I haven't
yet looked into Navit maps or into generating any of these files, so I
might be missing some details about generating them.

References:
-----------
[1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
[2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim

Denis.
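
To make the question more concrete, here is a minimal, untested sketch
of a package that only downloads an already prepared ZIM file and marks
it as non-substitutable. It is not taken from an existing package. The
module name, package name, install directory, hash and license are all
assumptions made for illustration, and the sketch assumes that
copy-build-system accepts "#:substitutable?" in the same way as the
build system used by the ZFS package.

```scheme
;; Untested sketch.  The module name, package name, install directory,
;; hash and license below are placeholders, not values from a real package.
(define-module (my-channel zim)
  #:use-module (guix packages)
  #:use-module (guix download)
  #:use-module (guix gexp)
  #:use-module (guix build-system copy)
  #:use-module ((guix licenses) #:prefix license:))

(define-public wikipedia-en-all-maxi-zim
  (package
    (name "wikipedia-en-all-maxi-zim")
    (version "2022-05")
    (source
     (origin
       (method url-fetch)
       ;; URL from reference [2]; only the version changes on update.
       (uri (string-append
             "https://mirror.download.kiwix.org/zim/wikipedia/"
             "wikipedia_en_all_maxi_" version ".zim"))
       ;; Placeholder hash: replace it with the output of "guix hash"
       ;; run on the downloaded file.
       (sha256
        (base32
         "0000000000000000000000000000000000000000000000000000"))))
    (build-system copy-build-system)
    (arguments
     (list
      ;; Same flag as mentioned for the ZFS package: avoid
      ;; redistributing builds of this package as substitutes.
      #:substitutable? #f
      ;; Assumed layout: install the ZIM file under share/zim/.
      #:install-plan
      #~'((#$(string-append "wikipedia_en_all_maxi_" version ".zim")
           "share/zim/"))))
    (home-page "https://mirror.download.kiwix.org/zim/wikipedia/")
    (synopsis "Offline snapshot of the English Wikipedia as a ZIM file")
    (description
     "This package provides a prebuilt ZIM file of the English Wikipedia
with scaled-down pictures, for use with kiwix-tools.")
    ;; Assumption: the license has to be checked against the actual
    ;; content of the file before submitting anything.
    (license license:cc-by-sa3.0)))
```

If the file is saved as my-channel/zim.scm, something like
"guix build -L . wikipedia-en-all-maxi-zim" should build it locally once
the placeholder hash is replaced. Updating to a newer snapshot would
then only require changing the version and the hash, which is the
maintenance benefit described above. Whether the source download itself
would still end up on the Guix infrastructure is not addressed by this
sketch.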