References: <20221207113315.6a43a206@primary_laptop>
From: Csepp
To: Denis 'GNUtoo' Carikli
Cc: guix-devel@gnu.org
Subject: Re: Packaging big generated data files?
Date: Thu, 08 Dec 2022 14:46:51 +0100
In-reply-to: <20221207113315.6a43a206@primary_laptop>
Message-ID: <875yemgg37.fsf@riseup.net>
List-Id: "Development of GNU Guix and the GNU System distribution."

Denis 'GNUtoo' Carikli writes:

> [[PGP Signed Part:Undecided]]
> Hi,
>
> Is there any policies or past decisions of the Guix project on
> packaging big generated data files?
>
> I've added packages for software like kiwix-tools and navit that both
> work offline but that also need data files to be useful.
>
> Navit is a (car) navigation software that need maps. The maps can be
> generated from OpenStreetMap dumps with a tool available in Navit
> source code (maptool)[1] which is not packaged yet. Binary map files can
> also be downloaded directly from various sources.
>
> Right now the biggest file possible for such maps is about 47 GiB
> (for the whole planet).
>
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled down
> pictures[2] and it takes about 89 GiB. I didn't look yet how these files
> were generated but I guess that they somehow can be generated from
> Wikipedia dumps.
>
> Packaging the binary files (without generating them) can be useful as
> it simplifies a lot the maintenance as one can just update the package
> version and checksum to update these. It also enables to keep the
> information (download URL, checksum, license) in one place and it
> enables easy reuse by Guix services and/or configuration files.
>
> If these files were generated in packages, it would also enable to
> tweak the data, for instance by adding height data in navit maps. As
> for kiwix compatible files, it would probably enable to decide when to
> make the snapshots or enable to package additional wikis
> (like the Libreplanet Wiki) or websites.
>
> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
>
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?
>
> If so, does it needs to be done like with the ZFS (kernel module)
> package where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?
>
> Note that so far I've only packaged locally only kiwix compatible files
> for various wikis by just downloading already prepared files, so I
> didn't look yet into navit maps or into generating all these files, so
> I might miss some details about generating them.
>
> References:
> -----------
> [1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
> [2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim
>
> Denis.
>
> [[End of PGP Signed Part]]

Could ZIM files be downloaded over BitTorrent as fixed-output
derivations? They can be pretty huge. Also, if the system started
seeding them as well, that would be pretty cool.
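
To illustrate why the transport should not matter for a fixed-output
derivation: the content hash is declared up front, so the bytes can
arrive over HTTP, BitTorrent or anything else, as long as they hash to
the declared value. Below is a minimal Python sketch of that check. It
is not Guix code: the function name verify_fixed_output is made up for
illustration, the payload is a stand-in for a real ZIM file, and it
compares hex SHA-256 digests rather than the base32 form Guix uses.

```python
import hashlib
import io


def verify_fixed_output(stream, expected_sha256, chunk_size=1 << 20):
    """Return True if the bytes read from `stream` hash to `expected_sha256`.

    Hypothetical helper for illustration only; not part of Guix.
    """
    digest = hashlib.sha256()
    # Read in chunks so a multi-gigabyte file never has to fit in memory.
    for chunk in iter(lambda: stream.read(chunk_size), b""):
        digest.update(chunk)
    return digest.hexdigest() == expected_sha256


# Stand-in payload; a real ZIM file would be opened with open(path, "rb").
payload = b"stand-in for a ZIM file"
# The hash that the package definition would declare in advance.
declared = hashlib.sha256(payload).hexdigest()

# The check is the same however the bytes were fetched.
ok = verify_fixed_output(io.BytesIO(payload), declared)
# Any change to the content makes the check fail.
tampered = verify_fixed_output(io.BytesIO(payload + b"x"), declared)
print(ok, tampered)
```

Whether the Guix daemon could actually run a BitTorrent client inside
a fixed-output build, and seed afterwards, is a separate question that
this sketch does not address.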