unofficial mirror of guix-devel@gnu.org 
* Packaging big generated data files?
From: Denis 'GNUtoo' Carikli @ 2022-12-07 10:33 UTC
  To: Guix-devel

Hi,

Are there any policies or past decisions of the Guix project on
packaging big generated data files?

I've added packages for software like kiwix-tools and navit, which both
work offline but also need data files to be useful.

Navit is (car) navigation software that needs maps. The maps can be
generated from OpenStreetMap dumps with a tool available in the Navit
source code (maptool)[1], which is not packaged yet. Binary map files
can also be downloaded directly from various sources.

Right now the biggest file possible for such maps is about 47 GiB
(for the whole planet).

As for kiwix-tools, it can serve offline versions of websites like
Wikipedia, and there too it needs files to work. The biggest file seems
to be the complete version of English Wikipedia with scaled-down
pictures[2], which takes about 89 GiB. I haven't looked yet at how these
files are generated, but I guess that they can somehow be generated
from Wikipedia dumps.

Packaging the binary files (without generating them) can be useful
because it greatly simplifies maintenance: one can just update the
package version and checksum to update them. It also keeps the
information (download URL, checksum, license) in one place and
enables easy reuse by Guix services and/or configuration files.

If these files were generated in packages, it would also become
possible to tweak the data, for instance by adding height data to navit
maps. As for kiwix-compatible files, it would probably make it possible
to decide when to make the snapshots, or to package additional wikis
(like the Libreplanet Wiki) or websites.

The issue here is probably the size of the generated files: they are
huge, so if they are packaged, they will most likely take significant
resources in the Guix infrastructure.

So what would be the way to go here? Would Guix accept patches to add
packages for these files in Guix proper?  

If so, does it need to be done as with the ZFS (kernel module)
package, where "#:substitutable? #f" is used to avoid redistributing
package builds? Or are other ways better for such use cases?
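
To make the question concrete, here is a minimal sketch of what such a
data package might look like. It is only an illustration under stated
assumptions, not an existing Guix package: the variable and package
name, the "share/kiwix/" install directory and the license field are
hypothetical, and the hash is a placeholder. Only the download URL [2]
and the "#:substitutable? #f" argument come from this message.

```scheme
;; Hypothetical sketch of a data-only package; not part of Guix.
(use-modules (guix packages)
             (guix download)
             (guix build-system copy)
             ((guix licenses) #:prefix license:))

(define wikipedia-en-all-maxi-zim          ; hypothetical name
  (package
    (name "wikipedia-en-all-maxi-zim")
    (version "2022-05")
    (source
     (origin
       (method url-fetch)
       (uri (string-append
             "https://mirror.download.kiwix.org/zim/wikipedia/"
             "wikipedia_en_all_maxi_" version ".zim"))
       ;; Placeholder, NOT the real hash: replace it with the value
       ;; reported by "guix download" or "guix hash" for the file.
       (sha256
        (base32
         "0000000000000000000000000000000000000000000000000000"))))
    (build-system copy-build-system)
    (arguments
     `(;; Do not offer or fetch substitutes for this package, so the
       ;; build farm does not have to redistribute the huge output.
       #:substitutable? #f
       ;; "share/kiwix/" is an assumed location, chosen for the example.
       #:install-plan
       '((,(string-append "wikipedia_en_all_maxi_" version ".zim")
          "share/kiwix/"))))
    (home-page "https://mirror.download.kiwix.org/zim/wikipedia/")
    (synopsis "Offline English Wikipedia snapshot in ZIM format")
    (description
     "Prebuilt ZIM file of the English Wikipedia with scaled-down
pictures, usable with kiwix-tools.")
    ;; Assumption: text license only; embedded media may use other
    ;; licenses and would need to be checked.
    (license license:cc-by-sa3.0)))

;; Return the package so that "guix build -f FILE" can build it.
wikipedia-en-all-maxi-zim
```

With this layout, updating to a newer snapshot would only mean changing
the version string and the hash. Saved as, say, zim.scm (a hypothetical
file name), it could be tried with "guix build -f zim.scm".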

Note that so far I've only packaged kiwix-compatible files for various
wikis, locally, by just downloading already prepared files. I haven't
looked yet into navit maps or into generating any of these files, so
I might be missing some details about generating them.

References:
-----------
[1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
[2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim

Denis.

Thread overview: 7+ messages
2022-12-07 10:33 Packaging big generated data files? Denis 'GNUtoo' Carikli
2022-12-07 14:45 ` pelzflorian (Florian Pelz)
2022-12-10 17:08   ` Denis 'GNUtoo' Carikli
2022-12-08 13:46 ` Csepp
2022-12-10 17:19   ` Denis 'GNUtoo' Carikli
2022-12-11 10:16     ` Ludovic Courtès
2022-12-12 13:51 ` zimoun
