From: Lars-Dominik Braun <lars@6xq.net>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>
Cc: Lars-Dominik Braun <ldb@leibniz-psychology.org>,
Simon Tournier <zimon.toutoune@gmail.com>,
guix-science@gnu.org, Simon Tournier <simon.tournier@u-paris.fr>
Subject: Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
Date: Sat, 17 Dec 2022 10:53:01 +0100
Message-ID: <Y52RfZbQEgYjQ0+A@noor.fritz.box>
In-Reply-To: <87tu1vk9nm.fsf@inria.fr>
[-- Attachment #1: Type: text/plain, Size: 444 bytes --]
Hi Ludo,
> Having said all that, you’re the author of the article, so let us know
> whether you want to publish it as-is or to modify it, and we’ll go ahead
> (I’ll be on IRC today). I think it’s already an insightful article!
As discussed on IRC yesterday, I tried to come up with a different,
gentler introduction; see the attached patch. Please have a look and
point out any weirdness I might have introduced.
Thank you,
Lars
[-- Attachment #2: introduction.patch --]
[-- Type: text/plain, Size: 3169 bytes --]
diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index c691163..157c1dc 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -4,10 +4,53 @@ tags: Reproducibility, Packages, Research
date: 2022-12-09 15:50:00
---
-GNU Guix provides scripts (“importer”) to turn packages from
-various language-specific repositories like [PyPi](https://pypi.org/)
-for Python, [crates.io](https://crates.io/) for Rust and
-[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+A recent [study published in *Nature Scientific Data* in February
+2022](https://doi.org/10.1038/s41597-022-01143-6) gives empirical insight
+into the success rate of reproducing R scripts obtained from Harvard’s
+Dataverse:
+
+> _We re-executed R code from each of the replication packages using
+> three R software versions, R 3.2, R 3.6, and R 4.0, in a clean
+> environment._
+> […]
+> _We find that 74% of R files failed to complete without
+> error in the initial execution, while 56% failed when code cleaning
+> was applied, showing that many errors can be prevented with good
+> coding practices._
+
+Given that more than half of the published R files failed to run even when
+tried with three different R versions, recording the exact environment
+a piece of software is supposed to run in could be declared a _good
+coding practice_ for scientific publications.
+
+The R ecosystem itself provides tools to capture and restore R software
+environments, including [Packrat](https://rstudio.github.io/packrat/)
+and its successor [renv](https://rstudio.github.io/renv/),
+both of which originate from within the RStudio project. Two replication
+packages in the study above used renv while the others did not record
+the environment at all.
+
+Looking at renv more closely reveals that it is able to
+capture the current R version and installed packages in a lockfile
+called `renv.lock`. However, [as noted
+before](https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/),
+restoring an environment comes with a few
+[caveats](https://rstudio.github.io/renv/articles/renv.html#caveats):
+First of all, renv does not install a different version of R if the
+recorded and current versions disagree. This is a manual step and up to
+the user. The same is true for packages with external dependencies. Those
+libraries, their headers and binaries also need to be installed by the
+user in the correct version, which is _not_ recorded in the lockfile.
+Furthermore, renv supports restoring packages installed from git
+repositories, but fails if the user did not install git beforehand.
+
+None of the guesswork and manual installation steps are required
+when using GNU Guix, since software in its repositories is
+bit-for-bit reproducible. It also provides scripts (“importers”)
+to turn packages from various language-specific repositories like
+[PyPI](https://pypi.org/) for Python, [crates.io](https://crates.io/)
+for Rust and [CRAN](https://cran.r-project.org/) for R into Guix package
+recipes.
An example workflow for the CRAN package
[zoid](https://CRAN.R-project.org/package=zoid), which is not available