From: zimoun
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Fri, 9 Feb 2018 18:48:48 +0100
To: Ludovic Courtès
Cc: Guix Devel, Amirouche Boubekki

Dear,

From my understanding, what you are describing is what bioinfo guys
call a workflow:

 1- fetch data here and there
 2- clean and prepare data
 3- compute stuff with these data
 4- obtain an answer

and loop several times on several data sets.

The Guix Workflow Language allows one to implement the workflow, i.e.,
all the steps and the links between them, to deal with the data. And
because it is Guix, reproducibility in terms of software comes almost
for free. Moreover, if there is some channel mechanism, then there is
a way to share these workflows.

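To make the four steps above concrete, here is a minimal sketch in
plain Python. It is not GWL code: the function names and the toy data
sets are made up for illustration, and the "fetch" step reads from an
in-memory dict instead of a real source.

```python
from statistics import mean

# Hypothetical toy data sets standing in for "data here and there".
DATA_SETS = {
    "set-a": ["1.0", "2.0", "", "3.0"],
    "set-b": ["10", "n/a", "20"],
}


def fetch(name):
    """Step 1: fetch raw records (here, from an in-memory dict)."""
    return DATA_SETS[name]


def clean(raw):
    """Step 2: drop records that cannot be parsed as numbers."""
    values = []
    for record in raw:
        try:
            values.append(float(record))
        except ValueError:
            continue
    return values


def compute(values):
    """Step 3: compute something with the cleaned data."""
    return mean(values)


def run_workflow(names):
    """Step 4: obtain an answer, looping over several data sets."""
    return {name: compute(clean(fetch(name))) for name in names}


if __name__ == "__main__":
    for name, answer in run_workflow(["set-a", "set-b"]).items():
        print(name, answer)
```

As far as I understand, in GWL each of these functions would roughly
correspond to a process, and the workflow would declare how the
processes are linked.
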
I think the tools are there, modulo UI and corner cases. :-)

From my point of view, workflows are missing because of a lack of
manpower (lispy guys, etc.).

Last, a workflow is not necessarily reproducible bit-for-bit, since
some algorithms use randomness.

Hope that helps.

All the best,
simon

On 9 February 2018 at 18:13, Ludovic Courtès wrote:
> Hi!
>
> Amirouche Boubekki skribis:
>
>> tl;dr: Distribution of data and software seems similar.
>> Data is more and more important in software and reproducible
>> science. Data science ecosystem lacks resources sharing.
>> I think guix can help.
>
> I think some of us especially Guix-HPC folks are convinced about the
> usefulness of Guix as one of the tools in the reproducible science
> toolchain (that was one of the themes of my FOSDEM talk). :-)
>
> Now, whether Guix is the right tool to distribute data, I don’t know.
> Distributing large amounts of data is a job in itself, and the store
> isn’t designed for that. It could quickly become a bottleneck. That’s
> one of the reasons why the Guix Workflow Language (GWL) does not store
> scientific data in the store itself.
>
> I think data should probably be stored and distributed out-of-band using
> appropriate storage mechanisms.
>
> Ludo’.