From: Konrad Hinsen <konrad.hinsen@cnrs.fr>
To: Simon TOURNIER <simon.tournier@inserm.fr>,
	"guix-science@gnu.org" <guix-science@gnu.org>
Subject: Re: Rproducibility for Python and beyond
Date: Sat, 15 Apr 2023 09:55:30 +0200
Message-ID: <m1cz45wpe5.fsf@fastmail.net>
In-Reply-To: <73307ac3c0ef44ea9dcdc220a2307506@inserm.fr>

Hi Simon,

> French speakers, here is an interesting presentation by Konrad about
> the state of Python for scientific computing and reproducibility.
>
> https://reproducibility.gricad-pages.univ-grenoble-alpes.fr/web/presentation_110423.html#presentation_110423
>
> Without watching the video, here are the questions I would like to discuss. :-)

Summary: Why you should use Guix rather than Conda to manage your Python
environments.

Now I'll jump to the end:

> For instance, the file produced by "guix describe -f channels" is one
> means of capturing (and citing too!) a computational environment.  Do
> we need to make it more popular?  How to link this means with the
> archiving part of source code (relying on SWH, say)?

Yes, we should make this more popular. With Guix, a full description of
a computational environment is:

 - hardware architecture
 - channel file
 - manifest file

This leaves out the Linux kernel and the file system, which should in
principle be listed but in practice never cause any problems.
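
To make the three-part description concrete, here is a minimal sketch in
Python that bundles the pieces into one record with a digest that could
be quoted alongside a paper. The helper name describe_environment, the
JSON layout and the choice of SHA-256 are assumptions made for
illustration, not an existing Guix feature. The channel file is assumed
to come from "guix describe -f channels", and the strings in the example
are shortened placeholders rather than real file contents.

```python
import hashlib
import json
import platform


def describe_environment(channels_text, manifest_text, architecture=None):
    """Bundle architecture, channel file and manifest file into one record.

    Hypothetical helper: the record layout is an assumption, not a Guix format.
    """
    record = {
        # Fall back to the architecture of the machine running this script.
        "architecture": architecture or platform.machine(),
        "channels": channels_text,
        "manifest": manifest_text,
    }
    # Serialize with sorted keys so identical inputs always give the same digest.
    canonical = json.dumps(record, sort_keys=True).encode("utf-8")
    record["sha256"] = hashlib.sha256(canonical).hexdigest()
    return record


if __name__ == "__main__":
    # Placeholder contents; in practice, read channels.scm and manifest.scm.
    channels = "(list (channel (name 'guix) (url \"https://git.savannah.gnu.org/git/guix.git\")))"
    manifest = "(specifications->manifest (list \"python\" \"python-numpy\"))"
    env = describe_environment(channels, manifest, architecture="x86_64")
    print(env["architecture"], env["sha256"][:12])
```

The digest only identifies the description. Rebuilding the environment
still goes through Guix itself, typically with something like
"guix time-machine -C channels.scm -- shell -m manifest.scm".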

It would be nice to have tools that automatically extract a list of
citations from such a description. That is not as easy as it seems,
because a list meant to appear in a paper for human consumption should
not really be exhaustive.
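
One possible starting point for a non-exhaustive list, sketched in
Python: keep only the packages named explicitly in the manifest and
ignore everything pulled in as a dependency. The function name
requested_packages is hypothetical, and the sketch assumes a manifest
written with specifications->manifest, where packages appear as
double-quoted strings. Mapping each package to an actual citation is
left open here.

```python
import re


def requested_packages(manifest_text):
    """Return package specs named explicitly in a manifest, without duplicates.

    Assumes the specifications->manifest form. Dependencies that Guix adds
    implicitly are deliberately not listed.
    """
    match = re.search(r"\(specifications->manifest(.*)", manifest_text, re.DOTALL)
    if match is None:
        return []
    seen = []
    # Every double-quoted string after the form's name is taken as a package spec.
    for spec in re.findall(r'"([^"]+)"', match.group(1)):
        if spec not in seen:
            seen.append(spec)
    return seen


if __name__ == "__main__":
    example = '(specifications->manifest\n (list "python" "python-numpy" "python"))'
    print(requested_packages(example))
```

This is a regular-expression approximation, not a Scheme parser, so a
manifest with quoted strings inside comments would confuse it.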

> 1. Considering Konrad's schema of some scientific computation
> (Model --technical choices--> Code --computational env--> Results),
> there are also technical choices about the computational environment,
> but they are implicit.  And often impossible to scrutinize because of

Indeed. Most people are happy to leave "the environment" as a black box,
which I think is fine as long as it is (1) archivable and (2)
transparent for those who are willing to open the box.

> the lack of transparency.  The key, IMHO, is not the determinism of
> the computation; instead, the key is its transparency.  Determinism is
> one means of obtaining transparency, and it is not the only means.

Agreed as well. The reason I tend to speak about determinism is to
illustrate why we should consider irreproducibility surprising rather
than normal.

> For instance, this determinism is not affordable for very intensive
> computations, which are not feasible to repeat.  How to think about

True, but a niche topic. Most computational science is not HPC, and yet
suffers from reproducibility issues.

> 2. The "redo" of computations is only possible when the citation is
> correct.  L'Inria is somehow proposing

Correct and complete.

Cheers,
  Konrad.
-- 
---------------------------------------------------------------------
Konrad Hinsen
Centre de Biophysique Moléculaire, CNRS Orléans
Synchrotron Soleil - Division Expériences
Saint Aubin - BP 48
91192 Gif sur Yvette Cedex, France
Tel. +33-1 69 35 97 15
E-Mail: konrad DOT hinsen AT cnrs DOT fr
http://dirac.cnrs-orleans.fr/~hinsen/
ORCID: https://orcid.org/0000-0003-0330-9428
Mastodon: @khinsen@scholar.social
---------------------------------------------------------------------

