name: talks/bluehats-2019/outline 	 # note: path name is non-authoritative(*)

-*- mode: org -*-

This talk was given in French in a slot of 5-7 minutes, questions included.  It
took place during a full-day satellite event of the Paris Open Source Summit.
The initiative was led by Bastien Guerry from https://www.etalab.gouv.fr/.  More
information about the programme is [[https://forum.etalab.gouv.fr/t/journee-bluehats-lors-du-paris-open-source-summit-le-11-decembre-2019/4614][here]].

The slot was very short and the audience very heterogeneous, especially in
their day-to-day concerns.  As an engineer working in a biology research
institute, I tried to explain what the Reproducible Science challenge is in the
modern age of data.

In short, today a scientific result is an experiment producing data *and* a
numerical processing of that data.  From what I see, the experimental part is
more or less well described, or let us say that people in labs are aware of its
importance because they already have several decades (even more) of collective
learning.

However, not enough people take care of the numerical processing, mainly, in my
opinion, because we are living through a scientific paradigm shift.  From what
I see, more often than not, it is not understood that more scientific value
lies in the numerical processing than in the data itself (or in how it is
produced).  That said, I am fully biased, because computing is my job and I
understand nothing about labs.

To guarantee Reproducible Science in the modern age of data, we need to
guarantee several items, especially:
 1. Open Articles
 2. Open Data
 3. Open Source
 4. Controlled computing environment (open, too)
Initiatives have already started.  To name a few: for 1., the [[http://rescience.github.io/][ReScience journal]]
or the French-specific [[https://hal.archives-ouvertes.fr/][HAL]]; for 2., [[https://zenodo.org/][Zenodo]]; and for 3., [[https://www.softwareheritage.org/][Software Heritage]].

However, what about point 4?

To fix ideas, let us consider some examples I encounter every day.
  + Alice uses the tools foo-1.2, bar-3.4 and baz-5.6
  + Carole works with Alice but also works on another project with the tools
    foo-7.8 and bar-9.0
  + Charlie upgrades their system and then nothing works
  + Bob uses the same versions as Alice but gets different results
  + Dan wants to replay the same numerical processing several months (or years)
    later but he is not able to reinstall the same versions of the tools because
    the tools have been updated, breaking backward compatibility.
With these scenarios, the idea is to spot concrete issues in the daily life of
researchers.
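
Bob's scenario can be illustrated with a minimal sketch.  It assumes a toy
model where an environment is identified by hashing a tool together with all of
its transitive dependencies.  The package graphs, the =libm= dependency and the
=fingerprint= helper are hypothetical, and this is not how Guix computes its
store paths; it only shows why "same version of foo" does not imply "same
environment".

```python
import hashlib


def fingerprint(name, graph):
    """Hash a package together with the fingerprints of all its dependencies."""
    version, deps = graph[name]
    h = hashlib.sha256(f"{name}-{version}".encode())
    for dep in sorted(deps):
        # Recursing makes any change deep in the graph visible at the top.
        h.update(fingerprint(dep, graph).encode())
    return h.hexdigest()


# Hypothetical dependency graphs: name -> (version, direct dependencies).
alice = {
    "foo": ("1.2", ["libm"]),
    "libm": ("2.28", []),
}
bob = {
    "foo": ("1.2", ["libm"]),
    "libm": ("2.31", []),  # differs from Alice's, although foo is still 1.2
}

same_version = alice["foo"][0] == bob["foo"][0]
same_environment = fingerprint("foo", alice) == fingerprint("foo", bob)
print(same_version, same_environment)
```

Here Alice and Bob agree on the version of foo, yet the two fingerprints
differ because one dependency underneath is not the same.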

Each issue is fixable separately:
 * package managers fix dependency hell
 * virtual environments fix the coexistence of several versions
 * containers fix the exact same version (and the coexistence).
But now the nightmare is to work with all these layers.  Wait, Guix already
provides all we need.

Guix allows fine-grained control of the toolchain, and this control is the
cornerstone of Reproducible Science.  At least in my opinion.

The two keys are binary transparency, which makes it possible to track down
what could be wrong, and bootstrapping, which is the root ingredient of the
former.
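
The idea behind binary transparency can be sketched as follows, assuming that
the same package has been built independently on several machines and that the
resulting binaries can be compared by hash.  The builder names, the byte
strings and the =disagreements= helper are hypothetical; this is an
illustration of the principle, not Guix's implementation.

```python
import hashlib


def disagreements(builds):
    """Return the packages whose independent builds do not all hash the same."""
    suspicious = []
    for package, outputs in builds.items():
        digests = {hashlib.sha256(data).hexdigest() for data in outputs.values()}
        if len(digests) > 1:
            suspicious.append(package)
    return sorted(suspicious)


# Hypothetical build outputs: package -> {builder: bytes of the binary}.
builds = {
    "foo-1.2": {"laptop": b"\x7fELF foo", "build-farm": b"\x7fELF foo"},
    "bar-3.4": {"laptop": b"\x7fELF bar", "build-farm": b"\x7fELF bar+?"},
}

print(disagreements(builds))
```

When every builder produces the same bits, a binary can be traced back to its
source; when they disagree, the mismatch points at where to investigate.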

Then comes how Guix works: first from an end-user perspective for each
scenario, and second some plumbing presented at length elsewhere (FOSDEM, etc.).

(*) Git path names are given by the tree(s) the blob belongs to.
    Blobs themselves have no identifier aside from the hash of its contents.
