all messages for Guix-related lists mirrored at yhetil.org
From: Denis 'GNUtoo' Carikli <GNUtoo@cyberdimension.org>
To: Guix-devel <guix-devel@gnu.org>
Subject: Packaging big generated data files?
Date: Wed, 7 Dec 2022 11:33:15 +0100
Message-ID: <20221207113315.6a43a206@primary_laptop>

Hi,

Are there any policies or past decisions of the Guix project on
packaging big generated data files?

I've added packages for software like kiwix-tools and Navit, which both
work offline but need data files to be useful.

Navit is (car) navigation software that needs maps. The maps can be
generated from OpenStreetMap dumps with a tool available in the Navit
source code (maptool)[1], which is not packaged yet. Binary map files can
also be downloaded directly from various sources.

Right now the biggest such map file is about 47 GiB (for the whole
planet).

As for kiwix-tools, it can serve offline versions of websites like
Wikipedia, and there too it needs data files to work. The biggest file
seems to be the complete English Wikipedia with scaled-down
pictures[2], which takes about 89 GiB. I haven't looked yet at how these
files are generated, but I guess they can somehow be generated from
Wikipedia dumps.

Packaging the binary files (without generating them) can be useful
because it greatly simplifies maintenance: updating them is just a
matter of updating the package version and checksum. It also keeps the
information (download URL, checksum, license) in one place and makes it
easy to reuse from Guix services and/or configuration files.
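For illustration only: the checksum that goes into an origin's
"(sha256 (base32 "..."))" field is the SHA-256 of the file, printed in
the nix-base32 encoding that "guix download URL" and "guix hash FILE"
output. With files of 47 to 89 GiB, the hash has to be computed
incrementally rather than by reading the whole file at once. The Python
sketch below is not Guix code; it assumes an already downloaded local
file, and the function names and the 16 MiB chunk size are hypothetical
choices.

```python
import hashlib

# Guix (like Nix) prints hashes in "nix-base32": a 32-character alphabet
# that leaves out the letters e, o, t and u.
NIX_BASE32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(digest: bytes) -> str:
    """Encode DIGEST in nix-base32 (hypothetical helper, mirrors Nix's printHash32)."""
    length = (len(digest) * 8 - 1) // 5 + 1  # 52 characters for a SHA-256 digest
    chars = []
    # Emit 5-bit groups, starting with the most significant one.
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)  # byte index and bit offset of this group
        c = digest[i] >> j
        if i + 1 < len(digest):
            c |= digest[i + 1] << (8 - j)  # borrow the high bits from the next byte
        chars.append(NIX_BASE32_ALPHABET[c & 0x1F])
    return "".join(chars)


def file_sha256_nix_base32(path: str, chunk_size: int = 16 * 1024 * 1024) -> str:
    """Hash a possibly huge file chunk by chunk (hypothetical helper)."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return nix_base32(h.digest())
```

Usage, with a hypothetical local copy of the file from [2]:
print(file_sha256_nix_base32("wikipedia_en_all_maxi_2022-05.zim")).
In practice one would simply run "guix hash" on the file; the sketch
only shows that a version bump plus this one string is all an update of
such a package needs.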

Generating these files in packages would also make it possible to tweak
the data, for instance by adding height data to Navit maps. For
kiwix-compatible files, it would probably make it possible to decide
when to take the snapshots, or to package additional wikis (like the
LibrePlanet wiki) or websites.

The issue here is probably the size of the generated files: they are
huge, so if they are packaged, they will most likely consume significant
resources on the Guix infrastructure.

So what would be the way to go here? Would Guix accept patches adding
packages for these files to Guix proper?

If so, does it need to be done as with the ZFS (kernel module)
package, where "#:substitutable? #f" is used to avoid redistributing
package builds? Or are there better ways for such use cases?

Note that so far I've only packaged kiwix-compatible files locally, for
various wikis, by simply downloading already prepared files. I haven't
looked yet into Navit maps or into generating any of these files, so I
might be missing some details about generating them.

References:
-----------
[1] https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
[2] https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim

Denis.

Thread overview: 7+ messages
2022-12-07 10:33 Denis 'GNUtoo' Carikli [this message]
2022-12-07 14:45 ` Packaging big generated data files? pelzflorian (Florian Pelz)
2022-12-10 17:08   ` Denis 'GNUtoo' Carikli
2022-12-08 13:46 ` Csepp
2022-12-10 17:19   ` Denis 'GNUtoo' Carikli
2022-12-11 10:16     ` Ludovic Courtès
2022-12-12 13:51 ` zimoun
