diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index c691163..157c1dc 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -4,10 +4,53 @@ tags: Reproducibility, Packages, Research
 date: 2022-12-09 15:50:00
 ---
 
-GNU Guix provides scripts (“importer”) to turn packages from
-various language-specific repositories like [PyPi](https://pypi.org/)
-for Python, [crates.io](https://crates.io/) for Rust and
-[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+A recent [study published in *Nature Scientific Data* in February
+2022](https://doi.org/10.1038/s41597-022-01143-6) gives empirical insight
+into the success rate of reproducing R scripts obtained from Harvard’s
+Dataverse:
+
+> _We re-executed R code from each of the replication packages using
+> three R software versions, R 3.2, R 3.6, and R 4.0, in a clean
+> environment._
+> […]
+> _We find that 74% of R files failed to complete without
+> error in the initial execution, while 56% failed when code cleaning
+> was applied, showing that many errors can be prevented with good
+> coding practices._
+
+Given that more than half of the published R files failed even when
+run with three different R versions, recording the exact environment
+in which software is supposed to run could be declared a _good
+coding practice_ for scientific publications.
+
+The R ecosystem itself provides tools to capture and restore R software
+environments, including [Packrat](https://rstudio.github.io/packrat/)
+and its successor [renv](https://rstudio.github.io/renv/),
+both of which originate from within the RStudio project. Two replication
+packages in the study above used renv, while the others did not record
+the environment at all.
+
+A closer look at renv reveals that it can
+capture the current R version and installed packages in a lockfile
+called `renv.lock`.
+However, [as noted before](https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/),
+restoring an environment comes with a few
+[caveats](https://rstudio.github.io/renv/articles/renv.html#caveats):
+First, renv does not install a different version of R if the
+recorded and current versions disagree. This is a manual step left to
+the user. The same is true for packages with external dependencies. Those
+libraries, their headers and binaries also need to be installed by the
+user in the correct version, which is _not_ recorded in the lockfile.
+Furthermore, renv supports restoring packages installed from Git
+repositories, but fails if the user did not install Git beforehand.
+
+None of this guesswork or manual installation is required
+when using GNU Guix, since software in its repositories is
+bit-for-bit reproducible. It also provides scripts (“importers”)
+to turn packages from various language-specific repositories like
+[PyPI](https://pypi.org/) for Python, [crates.io](https://crates.io/)
+for Rust and [CRAN](https://cran.r-project.org/) for R into Guix package
+recipes.
 
 An example workflow for the CRAN package
 [zoid](https://CRAN.R-project.org/package=zoid), which is not available