Hi Simon, hi all,

attached is my draft post for hpc.guix.info regarding guix-cran.

Thanks,
Lars

* drafts/reproducible-cran.md: New file.
---
 drafts/reproducible-cran.md | 195 ++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)
 create mode 100644 drafts/reproducible-cran.md

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
new file mode 100644
index 0000000..c759b02
--- /dev/null
+++ b/drafts/reproducible-cran.md
@@ -0,0 +1,195 @@
+# CRAN, a practical example for being reproducible at large scale using GNU Guix
+
+GNU Guix provides scripts (“importers”) to turn packages from
+various language-specific repositories like [PyPI](https://pypi.org/)
+for Python, [crates.io](https://crates.io/) for Rust and
+[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+
+An example workflow for the CRAN package
+[zoid](https://CRAN.R-project.org/package=zoid), which is not available
+in Guix proper, would look like this:
+
+1. Import the package definition into `manifest.scm`.
+
+   ```console
+   $ guix import cran -r zoid > manifest.scm
+   ```
+2. Edit `manifest.scm` to import the required modules and return a
+   usable manifest containing the package and R itself.
+
+   ```scheme
+   (use-modules (guix packages)
+                (guix download)
+                (guix licenses)
+                (guix build-system r)
+                (gnu packages cran)
+                (gnu packages statistics))
+
+   (define-public r-zoid …)
+
+   (packages->manifest (list r-zoid r))
+   ```
+3. Run your code.
+
+   ```console
+   guix shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+Although Guix displays hints about which modules are missing when trying
+to use an incomplete manifest, editing the manifest file to include all
+of them can be quite tedious.
+
+For R specifically, the R package
+[guix.install](https://CRAN.R-project.org/package=guix.install) provides
+a way to automate this import. It also uses `guix import`, but references
+dependencies using package specifications like
+`(specification->package "r-bh")`.
+This way no extra logic to figure out the correct module imports is required.
+It then extends the package search path, including the newly written file
+at `~/.Rguix/packages.scm`, installs the package into the default Guix
+profile at `~/.guix-profile` and adds this profile to R’s search path.
+
+While this approach works well for individual users, Guix installations
+with a larger user base, for instance institution-wide, would benefit
+from default availability of the entire CRAN package collection with
+pre-built substitutes to speed up installation times. Additionally,
+reproducing environments would involve fewer steps if the package
+recipes were available to anyone by default.
+
+## Introducing guix-cran
+
+GNU Guix provides a mechanism called “channels”,
+which can extend the package collection in Guix
+proper. [guix-cran](https://github.com/guix-science/guix-cran) does
+exactly that: It provides all CRAN packages missing in Guix proper in
+a channel and has all of the properties mentioned above. It can be
+installed globally via `/etc/guix/channels.scm` and packages can be
+pre-built on a central server.
+
+As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
+packages available in guix-cran. 95% of them are buildable and only 0.5%
+of these builds are not reproducible according to `guix build --check`.
+It is also possible to use old package versions via `guix time-machine`,
+similar to what [MRAN](https://mran.microsoft.com/documents/rro/reproducibility)
+offers. However, the channel’s history only spans about two months right now.
+
+Creating and updating guix-cran is [fully
+automated](https://github.com/guix-science/guix-cran-scripts) and happens
+without any human intervention. Improvements to the already very good
+CRAN importer also improve the channel’s quality. The channel itself
+is always in a usable state, because updates are tested with `guix pull`
+before committing and pushing them.
+However, some packages may not build or work, because (usually undeclared)
+build or runtime dependencies are missing. This could be improved through
+better auto-detection in the CRAN importer.
+
+Currently, building the channel derivation is very slow, most
+likely due to Guile performance issues. For this reason, packages
+are split into files by first letter. This way they can
+still be referenced deterministically by the first letter of
+their name. Since the number of loadable modules is [limited to
+8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html),
+creating one module file per package is not possible and putting them
+all into the same file is even slower.
+
+The channel is not signed, because all changes are automated anyway.
+
+## Usage
+
+Using guix-cran requires the following steps:
+
+1. Create `channels.scm`:
+
+   ```scheme
+   (cons
+    (channel
+     (name 'guix-cran)
+     (url "https://github.com/guix-science/guix-cran.git"))
+    %default-channels)
+   ```
+2. Create `manifest.scm`:
+
+   ```scheme
+   (specifications->manifest '("r-zoid" "r"))
+   ```
+3. Run:
+
+   ```console
+   guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+For true reproducibility, it’s necessary to pin the channels to a
+specific commit by running
+
+```console
+guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+```
+
+once and using `channels.pinned.scm` instead of `channels.scm` from there on.
+
+## Appendix
+
+Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
+feedback on the draft of this post.
+
+The channel statistics above can be reproduced using the following
+channel file (`channels.scm`):
+
+```scheme
+(list
+ (channel
+  (name 'guix)
+  (url "https://git.savannah.gnu.org/git/guix.git")
+  (branch "master")
+  (commit
+   "4781f0458de7419606b71bdf0fe56bca83ace910")
+  (introduction
+   (make-channel-introduction
+    "9edb3f66fd807b096b48283debdcddccfea34bad"
+    (openpgp-fingerprint
+     "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
+ (channel
+  (name 'guix-cran)
+  (url "https://github.com/guix-science/guix-cran.git")
+  (branch "master")
+  (commit
+   "cc7394098f306550c476316710ccad20a510fa4b")))
+```
+
+The following Scheme code prints a list of all packages provided
+by guix-cran (`list-packages.scm`):
+
+```scheme
+(use-modules (guix discovery)
+             (gnu packages)
+             (guix modules)
+             (guix utils)
+             (guix packages))
+(let* ((modules (all-modules (%package-module-path)))
+       (packages (fold-packages
+                  (lambda (p accum)
+                    (let ((mod (file-name->module-name (location-file (package-location p)))))
+                      (if (member (car mod) '(guix-cran))
+                          (cons p accum)
+                          accum)))
+                  '() modules)))
+  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
+```
+
+This Bash script then builds every package and checks the builds for reproducibility:
+
+```bash
+#!/bin/bash
+
+guix pull -p guix-profile -C channels.scm
+export GUIX_PROFILE="$(pwd)/guix-profile"
+source "$GUIX_PROFILE/etc/profile"
+guix repl list-packages.scm > packages
+cat packages | parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' | tee build.log
+
+echo "total" && wc -l packages
+echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
+echo "failure" && sort -u build.log | grep 'failed$' | wc -l
+echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
+```
+
-- 
2.38.1