From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: Guix Devel <guix-devel@gnu.org>,
	Amirouche Boubekki <amirouche@hypermove.net>
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Fri, 9 Feb 2018 18:48:48 +0100
Message-ID: <CAJ3okZ2Lk2eQSFr-_kN33xVukeyogEh7xgraUC+-9dUkk4_53w@mail.gmail.com>
In-Reply-To: <87mv0ixf07.fsf@gnu.org>

Dear,

From my understanding, what you are describing is what
bioinformaticians call a workflow:

 1- fetch data here and there
 2- clean and prepare data
 3- compute stuff with these data
 4- obtain an answer
and loop several times on several data sets.
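
As a rough sketch of these four steps and the loop, in plain Python
rather than GWL syntax (every function name and data value below is
made up for illustration):

```python
# Hypothetical sketch of a workflow; all names and values are invented.

def fetch(name):
    """Step 1: fetch raw data (an in-memory stand-in for a download)."""
    sources = {
        "set-a": [" 3", "1 ", "", "2"],
        "set-b": ["10", " ", "20"],
    }
    return sources[name]


def clean(raw):
    """Step 2: drop empty records and convert the rest to integers."""
    return [int(item) for item in raw if item.strip()]


def compute(values):
    """Step 3: compute something from the prepared data (here, the mean)."""
    return sum(values) / len(values)


def run_workflow(names):
    """Step 4: collect one answer per data set, looping over all of them."""
    return {name: compute(clean(fetch(name))) for name in names}


results = run_workflow(["set-a", "set-b"])
print(results)
```

Each step only consumes the output of the previous one, which is the
kind of link between steps that a workflow language has to describe.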

The Guix Workflow Language lets you implement the workflow, i.e., all
the steps and the links between them that process the data.
And because it builds on Guix, software reproducibility comes almost for free.
Moreover, if there is some channel mechanism, then there is a way to
share these workflows.

I think the tools are there, modulo UI and corner cases. :-)

From my point of view, few workflows exist yet because of a lack of
manpower (people comfortable with Lisp, etc.).


Last, a workflow is not necessarily reproducible bit-for-bit, since
some algorithms use randomness.
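
To illustrate, here is a small sketch in plain Python (estimate_pi is
a made-up example, not taken from any tool discussed here). A Monte
Carlo estimate generally differs between unseeded runs; it repeats
exactly only when the seed is pinned, and even then only under the
assumption that the same Python version and implementation are used:

```python
import random


def estimate_pi(samples, seed=None):
    """Monte Carlo estimate of pi; seed=None draws a fresh random state."""
    rng = random.Random(seed)
    inside = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            inside += 1
    return 4 * inside / samples


# Two runs with the same pinned seed give the same number.
first = estimate_pi(10_000, seed=42)
second = estimate_pi(10_000, seed=42)
print(first == second)

# An unseeded run is still a valid estimate, but not a repeatable one.
print(estimate_pi(10_000))
```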


Hope that helps.

All the best,
simon





On 9 February 2018 at 18:13, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> Hi!
>
> Amirouche Boubekki <amirouche@hypermove.net> skribis:
>
>> tl;dr: Distribution of data and software seems similar.
>>        Data is more and more important in software and reproducible
>>        science. The data science ecosystem lacks resource sharing.
>>        I think guix can help.
>
> I think some of us, especially Guix-HPC folks, are convinced of the
> usefulness of Guix as one of the tools in the reproducible science
> toolchain (that was one of the themes of my FOSDEM talk).  :-)
>
> Now, whether Guix is the right tool to distribute data, I don’t know.
> Distributing large amounts of data is a job in itself, and the store
> isn’t designed for that.  It could quickly become a bottleneck.  That’s
> one of the reasons why the Guix Workflow Language (GWL) does not store
> scientific data in the store itself.
>
> I think data should probably be stored and distributed out-of-band using
> appropriate storage mechanisms.
>
> Ludo’.
>
