From: Lars-Dominik Braun <ldb@leibniz-psychology.org>
To: guix-science@gnu.org
Cc: Simon Tournier <simon.tournier@u-paris.fr>
Subject: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
Date: Tue, 6 Dec 2022 08:53:22 +0100
Message-ID: <Y4708u/sOQYOypHE@zpidnb93>
Hi Simon, hi all,
attached is my draft post for hpc.guix.info regarding guix-cran.
Thanks,
Lars
* drafts/reproducible-cran.md: New file.
---
drafts/reproducible-cran.md | 195 ++++++++++++++++++++++++++++++++++++
1 file changed, 195 insertions(+)
create mode 100644 drafts/reproducible-cran.md
diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
new file mode 100644
index 0000000..c759b02
--- /dev/null
+++ b/drafts/reproducible-cran.md
@@ -0,0 +1,195 @@
+# CRAN, a practical example for being reproducible at large scale using GNU Guix
+
+GNU Guix provides scripts (“importers”) that turn packages from
+various language-specific repositories like [PyPI](https://pypi.org/)
+for Python, [crates.io](https://crates.io/) for Rust and
+[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+
+An example workflow for the CRAN package
+[zoid](https://CRAN.R-project.org/package=zoid), which is not available
+in Guix proper, would look like this:
+
+1. Import the package into a manifest.
+
+   ```console
+   $ guix import cran -r zoid > manifest.scm
+   ```
+2. Edit `manifest.scm` to import the required modules and return a
+   usable manifest containing the package and R itself.
+
+   ```scheme
+   (use-modules (guix packages)
+                (guix download)
+                (guix licenses)
+                (guix build-system r)
+                (gnu packages cran)
+                (gnu packages statistics))
+
+   (define-public r-zoid …)
+
+   (packages->manifest (list r-zoid r))
+   ```
+3. Run your code.
+
+   ```console
+   guix shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+Although Guix displays hints about which modules are missing when trying
+to use an incomplete manifest, editing the manifest file to include all
+of them can be quite tedious.
+
+For R specifically, the package
+[guix.install](https://CRAN.R-project.org/package=guix.install) provides
+a way to automate this import. It also uses `guix import`, but references
+dependencies using package specifications like `(specification->package
+"r-bh")`. This way no extra logic to figure out the correct module
+imports is required. It then extends the package search path, including
+the newly written file at `~/.Rguix/packages.scm`, installs the package
+into the default Guix profile at `~/.guix-profile` and adds this profile
+to R’s search path.
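+
+A minimal sketch of that name-based lookup, written as a manifest rather
+than as the file guix.install actually generates (its exact layout is
+not shown here): `specification->package` resolves a package by name at
+load time, so only `(gnu packages)` has to be imported, whichever module
+happens to define `r-bh`.
+
+```scheme
+;; Illustrative manifest, not guix.install's actual output.
+(use-modules (guix profiles)   ; packages->manifest
+             (gnu packages))   ; specification->package
+
+(packages->manifest
+ (list (specification->package "r-bh")
+       (specification->package "r")))
+```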
+
+While this approach works well for individual users, Guix installations
+with a larger user base, for instance institution-wide ones, would benefit
+from default availability of the entire CRAN package collection with
+pre-built substitutes to speed up installation. Additionally,
+reproducing environments would take fewer steps if the package
+recipes were available to anyone by default.
+
+## Introducing guix-cran
+
+GNU Guix provides a mechanism called “channels”,
+which can extend the package collection in Guix
+proper. [guix-cran](https://github.com/guix-science/guix-cran) does
+exactly that: it provides all CRAN packages missing from Guix proper in
+a channel and has all of the properties mentioned above. It can be
+installed globally via `/etc/guix/channels.scm`, and packages can be
+pre-built on a central server.
+
+As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
+packages available in guix-cran. 95% of them build successfully, and only
+0.5% of those builds turn out not to be reproducible when checked with
+`guix build --check`. It is also possible to use old package versions via
+`guix time-machine`, similar to what
+[MRAN](https://mran.microsoft.com/documents/rro/reproducibility)
+offers. However, the available time frame only spans about two months right now.
+
+Creating and updating guix-cran is [fully
+automated](https://github.com/guix-science/guix-cran-scripts) and happens
+without any human intervention. Improvements to the already very good
+CRAN importer also improve the channel’s quality. The channel itself
+is always in a usable state, because updates are tested with `guix pull`
+before committing and pushing them. However, some packages may not build
+or work, because (usually undeclared) build or runtime dependencies are
+missing. This could be improved through better auto-detection in the
+CRAN importer.
+
+Currently, building the channel derivation is very slow, most
+likely due to Guile performance issues. For this reason, packages
+are split into one module file per first letter of their name, so
+the module that contains a package can still be derived
+deterministically from that name. Since the number of loadable modules
+is [limited to
+8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html),
+creating one module file per package is not possible, and putting them
+all into the same file is even slower.
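+
+As a sketch of what this layout means for users who reference package
+variables directly instead of going through specifications: assuming the
+per-letter modules are named `(guix-cran packages LETTER)` (only the
+`guix-cran` prefix is confirmed by the code in the appendix; the rest is
+an assumption), the module for `r-zoid` follows from the `z` in `zoid`.
+The manifest only loads when the guix-cran channel is available.
+
+```scheme
+(use-modules (guix profiles)             ; packages->manifest
+             (gnu packages statistics)   ; r
+             (guix-cran packages z))     ; assumed home of r-zoid
+
+(packages->manifest (list r-zoid r))
+```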
+
+The channel is not signed, because all changes are automated anyway.
+
+## Usage
+
+Using guix-cran requires the following steps:
+
+1. Create `channels.scm`:
+
+   ```scheme
+   (cons
+    (channel
+     (name 'guix-cran)
+     (url "https://github.com/guix-science/guix-cran.git"))
+    %default-channels)
+   ```
+2. Create `manifest.scm`:
+
+   ```scheme
+   (specifications->manifest '("r-zoid" "r"))
+   ```
+3. Run:
+
+   ```console
+   guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+For true reproducibility, it’s necessary to pin the channels to a
+specific commit by running
+
+```console
+guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+```
+
+once and using `channels.pinned.scm` instead of `channels.scm` from there on.
+
+## Appendix
+
+Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
+feedback on the draft of this post.
+
+The channel statistics above can be reproduced using the following
+channel file (`channels.scm`):
+
+```scheme
+(list
+ (channel
+  (name 'guix)
+  (url "https://git.savannah.gnu.org/git/guix.git")
+  (branch "master")
+  (commit
+   "4781f0458de7419606b71bdf0fe56bca83ace910")
+  (introduction
+   (make-channel-introduction
+    "9edb3f66fd807b096b48283debdcddccfea34bad"
+    (openpgp-fingerprint
+     "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
+ (channel
+  (name 'guix-cran)
+  (url "https://github.com/guix-science/guix-cran.git")
+  (branch "master")
+  (commit
+   "cc7394098f306550c476316710ccad20a510fa4b")))
+```
+
+And the following Scheme code to obtain a list of all packages provided
+by guix-cran (`list-packages.scm`):
+
+```scheme
+(use-modules (guix discovery)
+             (gnu packages)
+             (guix modules)
+             (guix utils)
+             (guix packages))
+
+;; Collect every package whose defining module lives under `guix-cran`
+;; and print its name, one per line.
+(let* ((modules (all-modules (%package-module-path)))
+       (packages (fold-packages
+                  (lambda (p accum)
+                    (let ((mod (file-name->module-name
+                                (location-file (package-location p)))))
+                      (if (member (car mod) '(guix-cran))
+                          (cons p accum)
+                          accum)))
+                  '() modules)))
+  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
+```
+
+And this Bash script:
+
+```bash
+#!/bin/bash
+
+# Pull the pinned channels into a local profile and activate it.
+guix pull -p guix-profile -C channels.scm
+export GUIX_PROFILE="$(pwd)/guix-profile"
+source guix-profile/etc/profile
+
+# List every package provided by guix-cran, one name per line.
+guix repl list-packages.scm > packages
+
+# Build each package, then rebuild it with --check to test reproducibility.
+mkdir -p builds
+parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' < packages | tee build.log
+
+echo "total" && wc -l < packages
+echo "success" && sort -u build.log | grep -c '^/gnu/store'
+echo "failure" && sort -u build.log | grep -c 'failed$'
+echo "non-reproducible" && sort -u build.log | grep -c 'differs$'
+```
+
--
2.38.1