From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: Guix Devel <guix-devel@gnu.org>
Subject: Re: Use guix to distribute data & reproducible (data) science
Date: Thu, 15 Feb 2018 18:10:33 +0100
Message-ID: <CAJ3okZ14RAUU4FrBMEOpOLU9aV21BJbY1G5-WeZq3Q-TEmk1Hg@mail.gmail.com>
In-Reply-To: <87lgfvu3dg.fsf@gnu.org>

Hi,

Thank you for this food for thought.


I agree that the frontier between code and data is arbitrary.

However, I am not sure I get the picture about data management in
the context of Reproducible Science. What is the issue?

So, I accept your invitation to explore your idea. :-)


Let us think about the old lab experiment. On one hand, you have your
protocol and the description of all the steps. On the other hand, you
have measurements and results. I can imagine some kind of bit-for-bit
reproducibility mechanism for the protocol part. I am not sure about
the measurements part.

Well, the protocol is code or a workflow; the measurements are data.
And I agree that, e.g., information about electronic orbits or the
weights of a trained neural network is sometimes part of the
protocol. :-)

For me, just talking about code, it is not a straightforward task to
define the properties of a reproducible and fully controlled
computational environment. That is, I guess, what Guix is defining
(transactional, per-user profiles, hackable, etc.). Defining them for
data appears even more difficult to me.

What would such properties be for data management?

In other words, on paper, what are the benefits of managing some piece
of data in the store? For example, for the weights of a trained neural
network, or for the positions of the atoms in a protein structure.


For me (maybe I am wrong), the way is to define a package (or
workflow) that fetches the data from some external source, cleans it
if needed, does some checks, and then puts it in /path/to/somewhere/
outside the store. In parallel computing, this /path/to/somewhere/ is
accessible by all the nodes. Moreover, this /path/to/somewhere/ contains
something hash-based in the folder name.
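
To make the idea above concrete, here is a rough sketch in Python. It
is not Guix code and not a proposal for how Guix should do it. The
names `clean`, `deposit` and `shared_root` are hypothetical, the
cleaning step is only a placeholder, and the fetch step is left out:
the raw bytes are assumed to have been downloaded already.

```python
import hashlib
import tempfile
from pathlib import Path


def clean(raw: bytes) -> bytes:
    """Placeholder cleaning step: normalize line endings, trim trailing blanks."""
    lines = raw.replace(b"\r\n", b"\n").split(b"\n")
    return b"\n".join(line.rstrip() for line in lines).strip(b"\n") + b"\n"


def deposit(raw: bytes, expected_sha256: str, shared_root: Path, name: str) -> Path:
    """Check the fetched bytes, clean them, and store them in a hash-named folder."""
    # Check: refuse data that does not match the hash we expected to fetch.
    actual = hashlib.sha256(raw).hexdigest()
    if actual != expected_sha256:
        raise ValueError(f"hash mismatch: expected {expected_sha256}, got {actual}")

    cleaned = clean(raw)

    # The folder name is derived from the cleaned content, so the same data
    # always lands in the same place under the shared directory.
    digest = hashlib.sha256(cleaned).hexdigest()
    target_dir = shared_root / f"{digest[:32]}-{name}"
    target_dir.mkdir(parents=True, exist_ok=True)
    (target_dir / name).write_bytes(cleaned)
    return target_dir


if __name__ == "__main__":
    raw = b"x,y\r\n1,2  \r\n"            # stands in for the fetched data
    root = Path(tempfile.mkdtemp())      # stands in for /path/to/somewhere/
    print(deposit(raw, hashlib.sha256(raw).hexdigest(), root, "measurements.csv"))
```

Here the temporary directory plays the role of /path/to/somewhere/;
on a cluster it would be a location that all the nodes can read.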

Is that not enough?

Why do you need the history of changes, as Git provides?


Secrets are a different story from the reproducible science toolchain,
I guess.


Thank you again.

All the best,
simon
