From: zimoun
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Sat, 10 Feb 2018 12:28:33 +0100
To: Amirouche Boubekki
Cc: Guix Devel

Hi,

Thank you for the topic, which fed my thoughts. And thank you, Ricardo, for your explanations.

> What I was thinking about, is use guix to distribute data packages just like
> we distribute softwares from pypi. The advantage of using guix seems obvious,
> but apparently it's not desirable or possible and I don't understand why.

Are you talking about packaging a way to fetch the data? The first Debian example I found:
https://packages.debian.org/fr/stretch/astrometry-data-2mass-00

Or about packaging the dataset itself? That does not seem affordable in terms of resources (bandwidth + disk), does it?

Lastly, when considering large datasets (say, hundreds of GB or more), hashing becomes the bottleneck.

All the best,
simon
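To illustrate the hashing point: a Guix origin declares a SHA-256 hash that has to be checked against the downloaded content, so every byte of the dataset is read and hashed at least once. The minimal Python sketch below shows that cost growing linearly with size. It is not Guix's own code; the function names and the 500 MB/s throughput figure are hypothetical, chosen only for the arithmetic.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Stream a file through SHA-256 in fixed-size chunks.

    Memory use stays bounded by chunk_size, but every byte must still be
    read and hashed, so the time grows linearly with the file size.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until f.read() returns b"".
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def estimate_hash_seconds(size_bytes, throughput_bytes_per_s):
    """Back-of-envelope estimate: seconds needed to hash size_bytes
    at an assumed (hypothetical) hashing throughput."""
    return size_bytes / throughput_bytes_per_s
```

For example, `estimate_hash_seconds(100 * 10**9, 500 * 10**6)` gives 200.0 seconds for 100 GB at the assumed 500 MB/s. Doubling the dataset doubles that time, and it is paid on every download and verification, before any build step runs.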