From mboxrd@z Thu Jan 1 00:00:00 1970
From: Amirouche Boubekki
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Fri, 16 Feb 2018 17:43:40 +0100
Message-ID: <24274adb01ba9c928a4701054b686a4a@hypermove.net>
List-Id: "Development of GNU Guix and the GNU System distribution."
To: ludovic.courtes@inria.fr
Cc: Guix Devel

Hello again Ludovic,

On 2018-02-09 18:13, ludovic.courtes@inria.fr wrote:
> Hi!
>
> Amirouche Boubekki skribis:
>
>> tl;dr: Distribution of data and software seems similar.
>> Data is more and more important in software and reproducible
>> science. Data science ecosystem lacks resource sharing.
>> I think guix can help.
>
> I think some of us especially Guix-HPC folks are convinced about the
> usefulness of Guix as one of the tools in the reproducible science
> toolchain (that was one of the themes of my FOSDEM talk). :-)
>
> Now, whether Guix is the right tool to distribute data, I don’t know.
> Distributing large amounts of data is a job in itself, and the store
> isn’t designed for that. It could quickly become a bottleneck.

What does it mean technically that the store “isn't designed for that”?

> That’s one of the reasons why the Guix Workflow Language (GWL)
> does not store scientific data in the store itself.

Sorry, I did not follow the engineering discussion around GWL. Searching
the web led me to [0]. That said, the question I am asking is not
answered there. In particular, the design paper gives no rationale for
that.

[0] http://lists.gnu.org/archive/html/guix-devel/2016-10/msg01248.html

> I think data should probably be stored and distributed out-of-band
> using appropriate storage mechanisms.

Then, in a follow-up mail, you reply to Konrad:

>> Konrad Hinsen skribis:
>
> [...]
>
>> It would be nice if big datasets could conceptually be handled in the
>> same way while being stored elsewhere - a bit like git-annex does for
>> git. And for parallel computing, we could have special build daemons.
>
> Exactly. I think we need a git-annex/git-lfs-like tool for the store.
> (It could also be useful for things like secrets, which we don’t want
> to have in the store.)
>

- "The most basic of all human needs is the need to understand and be
understood" Ralph Nichols