# CRAN, a practical example for being reproducible at large scale using GNU Guix

GNU Guix provides scripts (“importers”) to turn packages from various language-specific repositories like [PyPI](https://pypi.org/) for Python, [crates.io](https://crates.io/) for Rust and [CRAN](https://cran.r-project.org/) for R into Guix package recipes. An example workflow for the CRAN package [zoid](https://CRAN.R-project.org/package=zoid), which is not available in Guix proper, would look like this:

1. Import the package into a manifest.

   ```console
   $ guix import cran -r zoid > manifest.scm
   ```

2. Edit `manifest.scm` to import the required modules and return a usable manifest containing the package and R itself.

   ```scheme
   (use-modules (guix packages)
                (guix download)
                (guix licenses)
                (guix build-system r)
                (gnu packages cran)
                (gnu packages statistics))

   (define-public r-zoid …)

   (packages->manifest (list r-zoid r))
   ```

3. Run your code.

   ```console
   $ guix shell -m manifest.scm -- R -e 'library(zoid)'
   ```

Although Guix displays hints about which modules are missing when trying to use an incomplete manifest, editing the manifest file to include all of them can be quite tedious.

For R specifically, the R package [guix.install](https://CRAN.R-project.org/package=guix.install) provides a way to automate this import. It also uses `guix import`, but references dependencies using package specifications like `(specification->package "r-bh")`. This way no extra logic to figure out the correct module imports is required. It then extends the package search path, including the newly written file at `~/.Rguix/packages.scm`, installs the package into the default Guix profile at `~/.guix-profile` and adds this profile to R’s search path.

While this approach works well for individual users, Guix installations with a larger user-base, for instance institution-wide, would benefit from default availability of the entire CRAN package collection with pre-built substitutes to speed up installation times.
Additionally, reproducing environments would involve fewer steps if the package recipes were available to anyone by default.

## Introducing guix-cran

GNU Guix provides a mechanism called “channels”, which can extend the package collection in Guix proper. [guix-cran](https://github.com/guix-science/guix-cran) does exactly that: it provides all CRAN packages missing in Guix proper in a channel and has all of the properties mentioned above. It can be installed globally via `/etc/guix/channels.scm` and packages can be pre-built on a central server.

As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431 packages available in guix-cran. 95% of them are buildable and only 0.5% of these builds are not reproducible, as determined by `guix build --check`. It is also possible to use old package versions via `guix time-machine`, similar to what [MRAN](https://mran.microsoft.com/documents/rro/reproducibility) offers. However, that time-frame only spans about two months right now.

Creating and updating guix-cran is [fully automated](https://github.com/guix-science/guix-cran-scripts) and happens without any human intervention. Improvements to the already very good CRAN importer also improve the channel’s quality. The channel itself is always in a usable state, because updates are tested with `guix pull` before committing and pushing them. However, some packages may not build or work, because (usually undeclared) build or runtime dependencies are missing. This could be improved through better auto-detection in the CRAN importer.

Currently, building the channel derivation is very slow, most likely due to Guile performance issues. For this reason packages are split into files by first letter. This way they can still be referenced deterministically by the first letter of their name.
Since the number of loadable modules is [limited to 8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html), creating one module file per package is not possible and putting them all into the same file is even slower.

The channel is not signed, because all changes are automated anyway.

## Usage

Using guix-cran requires the following steps:

1. Create `channels.scm`:

   ```scheme
   (cons
     (channel
       (name 'guix-cran)
       (url "https://github.com/guix-science/guix-cran.git"))
     %default-channels)
   ```

2. Create `manifest.scm`:

   ```scheme
   (specifications->manifest '("r-zoid" "r"))
   ```

3. Run:

   ```console
   $ guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
   ```

For true reproducibility it’s necessary to pin the channels to a specific commit by running

```console
$ guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
```

once and using `channels.pinned.scm` instead of `channels.scm` from there on.

## Appendix

Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable feedback on the draft of this post.

The channel statistics above can be reproduced using the following channel file (`channels.scm`):

```scheme
(list
  (channel
    (name 'guix)
    (url "https://git.savannah.gnu.org/git/guix.git")
    (branch "master")
    (commit "4781f0458de7419606b71bdf0fe56bca83ace910")
    (introduction
      (make-channel-introduction
        "9edb3f66fd807b096b48283debdcddccfea34bad"
        (openpgp-fingerprint
          "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
  (channel
    (name 'guix-cran)
    (url "https://github.com/guix-science/guix-cran.git")
    (branch "master")
    (commit "cc7394098f306550c476316710ccad20a510fa4b")))
```

And the following Scheme code to obtain a list of all packages provided by guix-cran (`list-packages.scm`):

```scheme
(use-modules (guix discovery)
             (gnu packages)
             (guix modules)
             (guix utils)
             (guix packages))

;; Collect every package whose defining module starts with `guix-cran`
;; and print one package name per line.
(let* ((modules (all-modules (%package-module-path)))
       (packages (fold-packages
                   (lambda (p accum)
                     (let ((mod (file-name->module-name
                                  (location-file (package-location p)))))
                       (if (member (car mod) '(guix-cran))
                           (cons p accum)
                           accum)))
                   '()
                   modules)))
  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
```

And this Bash script:

```bash
#!/bin/bash
# Pull the pinned channels into a local profile and activate it.
guix pull -p guix-profile -C channels.scm
export GUIX_PROFILE="$(pwd)/guix-profile"
source guix-profile/etc/profile

# List all guix-cran packages, then build each one and re-build it with --check.
guix repl list-packages.scm > packages
cat packages | parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' | tee build.log

# Summarize the results.
echo "total" && wc -l packages
echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
echo "failure" && sort -u build.log | grep 'failed$' | wc -l
echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
```
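The script above prints raw counts, while the post quotes percentages. The following is a minimal sketch of that last arithmetic step, not part of the original tooling. The helper names `percent` and `summarize` are hypothetical. It assumes `awk` is available, that `packages` and `build.log` have the format produced above, and that the non-reproducible share is taken relative to successful builds rather than to all packages.

```shell
#!/bin/sh
# Sketch only: convert the counts from the statistics script into percentages.
# `percent` and `summarize` are hypothetical helpers, not part of Guix.

# percent PART TOTAL: print PART/TOTAL as a percentage with one decimal place.
percent() {
    awk -v part="$1" -v total="$2" \
        'BEGIN { if (total > 0) printf "%.1f\n", 100 * part / total; else print "n/a" }'
}

# summarize PACKAGES_FILE BUILD_LOG: print the buildable and non-reproducible shares.
summarize() {
    # One package name per line.
    total=$(grep -c '' "$1")
    # Unique store paths printed by successful builds.
    success=$(sort -u "$2" | grep -c '^/gnu/store')
    # Unique lines ending in "differs", as reported by `guix build --check`.
    differs=$(sort -u "$2" | grep -c 'differs$')
    echo "buildable: $(percent "$success" "$total")% of $total packages"
    echo "non-reproducible: $(percent "$differs" "$success")% of $success builds"
}

# Only run when the files produced by the statistics script are present.
if [ -f packages ] && [ -f build.log ]; then
    summarize packages build.log
fi
```

Run it in the directory that contains `packages` and `build.log`. The counting mirrors the `sort -u | grep` pipelines of the original script, so both report the same numbers.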