From: Csepp <raingloom@riseup.net>
To: Denis 'GNUtoo' Carikli <GNUtoo@cyberdimension.org>
Cc: guix-devel@gnu.org
Subject: Re: Packaging big generated data files?
Date: Thu, 08 Dec 2022 14:46:51 +0100
Message-ID: <875yemgg37.fsf@riseup.net>
In-Reply-To: <20221207113315.6a43a206@primary_laptop>
Denis 'GNUtoo' Carikli <GNUtoo@cyberdimension.org> writes:
> [[PGP Signed Part:Undecided]]
> Hi,
>
> Are there any policies or past decisions of the Guix project on
> packaging big generated data files?
>
> I've added packages for software like kiwix-tools and navit that both
> work offline but that also need data files to be useful.
>
> Navit is (car) navigation software that needs maps. The maps can be
> generated from OpenStreetMap dumps with a tool available in the Navit
> source code (maptool)[1], which is not packaged yet. Binary map files
> can also be downloaded directly from various sources.
>
> Right now the biggest file possible for such maps is about 47 GiB
> (for the whole planet).
>
> As for kiwix-tools, it can serve offline versions of websites like
> Wikipedia, and there too it needs files to work. The biggest file seems
> to be the complete version of English Wikipedia with scaled-down
> pictures[2], and it takes about 89 GiB. I haven't yet looked at how
> these files are generated, but I guess they can somehow be generated
> from Wikipedia dumps.
>
> Packaging the binary files (without generating them) can be useful
> because it greatly simplifies maintenance: one can just update the
> package version and checksum to update them. It also keeps the
> information (download URL, checksum, license) in one place and
> enables easy reuse by Guix services and/or configuration files.
>
> If these files were generated in packages, it would also be possible
> to tweak the data, for instance by adding height data to Navit maps.
> As for Kiwix-compatible files, it would probably make it possible to
> decide when to take the snapshots, or to package additional wikis
> (like the Libreplanet Wiki) or websites.
>
> The issue here is probably the size of the generated files: they are
> huge, so if they are packaged, they will most likely take significant
> resources in the Guix infrastructure.
>
> So what would be the way to go here? Would Guix accept patches to add
> packages for these files in Guix proper?
>
> If so, does it need to be done as with the ZFS (kernel module)
> package, where "#:substitutable? #f" is used to avoid redistributing
> package builds? Or are other ways better for such use cases?
>
> Note that so far I've only packaged Kiwix-compatible files for
> various wikis locally, by just downloading already prepared files. I
> haven't yet looked into Navit maps or into generating any of these
> files, so I might be missing some details about generating them.
>
> References:
> -----------
> [1]https://navit.readthedocs.io/en/latest/maps.html#processing-osm-maps-yourself
> [2]https://mirror.download.kiwix.org/zim/wikipedia/wikipedia_en_all_maxi_2022-05.zim
>
> Denis.
>
> [[End of PGP Signed Part]]
Could ZIM files be downloaded over BitTorrent as fixed-output
derivations? They can be pretty huge. Also, if the system started
seeding them as well, that would be pretty cool.
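
To illustrate why the transport should not matter here: a fixed-output
derivation declares the hash of its result in advance, and whatever gets
fetched is checked against that hash. The sketch below is plain Python,
not Guix code, and only shows that check for a file too large to load
into memory. The function names, the 1 MiB chunk size and the use of a
hexadecimal SHA-256 digest are assumptions made for this example.

```python
import hashlib
import io

# Assumption for this sketch: read 1 MiB at a time so that a file of
# tens of GiB never has to fit in memory.
CHUNK_SIZE = 1024 * 1024


def sha256_of_stream(stream, chunk_size=CHUNK_SIZE):
    """Return the hex SHA-256 digest of a binary stream, read in chunks."""
    digest = hashlib.sha256()
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        digest.update(chunk)
    return digest.hexdigest()


def matches_declared_hash(stream, declared_hex):
    """Check fetched content against a hash declared ahead of time.

    The caller does not need to know whether the bytes arrived over
    HTTP, BitTorrent or anything else; only the content is compared.
    """
    return sha256_of_stream(stream) == declared_hex.lower()


if __name__ == "__main__":
    # Stand-in data instead of a real ZIM file.
    payload = b"example payload " * 4096
    declared = hashlib.sha256(payload).hexdigest()
    print(matches_declared_hash(io.BytesIO(payload), declared))
```

For a real file, the stream would be opened with open(path, "rb"), where
path is whatever location the download was written to.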