
# CRAN, a practical example for being reproducible at large scale using GNU Guix

GNU Guix provides scripts, called “importers”, that turn packages from
various language-specific repositories like [PyPI](https://pypi.org/)
for Python, [crates.io](https://crates.io/) for Rust and
[CRAN](https://cran.r-project.org/) for R into Guix package recipes.

An example workflow for the CRAN package
[zoid](https://CRAN.R-project.org/package=zoid), which is not available
in Guix proper, would look like this:

1. Import the package into a manifest.

   ```console
   guix import cran -r zoid > manifest.scm
   ```
2. Edit `manifest.scm` to import the required modules and return a
   usable manifest containing the package and R itself.

   ```scheme
   (use-modules (guix packages)
                (guix download)
                ;; The importer emits license fields as `license:NAME`.
                ((guix licenses) #:prefix license:)
                (guix build-system r)
                (gnu packages cran)
                (gnu packages statistics))

   (define-public r-zoid …)

   (packages->manifest (list r-zoid r))
   ```
3. Run your code.

   ```console
   guix shell -m manifest.scm -- R -e 'library(zoid)'
   ```

Although Guix displays hints about which modules are missing when you
try to use an incomplete manifest, editing the manifest file to include
all of them can be quite tedious.

For R specifically, the package
[guix.install](https://CRAN.R-project.org/package=guix.install) provides
a way to automate this import. It also uses `guix import`, but references
dependencies using package specifications like `(specification->package
"r-bh")`. This way no extra logic to figure out the correct module
imports is required. It then extends the package search path, including
the newly written file at `~/.Rguix/packages.scm`, installs the package
into the default Guix profile at `~/.guix-profile` and adds this profile
to R’s search path.

While this approach works well for individual users, Guix installations
with a larger user base, for instance institution-wide ones, would benefit
from having the entire CRAN package collection available by default, with
pre-built substitutes to speed up installation. Additionally,
reproducing environments would take fewer steps if the package
recipes were available to anyone by default.

## Introducing guix-cran

GNU Guix provides a mechanism called “channels”,
which can extend the package collection in Guix
proper. [guix-cran](https://github.com/guix-science/guix-cran) does
exactly that: It provides all CRAN packages missing in Guix proper in
a channel and has all of the properties mentioned above. It can be
installed globally via `/etc/guix/channels.scm` and packages can be
pre-built on a central server.

As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
packages available in guix-cran. 95% of them are buildable, and only 0.5%
of these builds are not reproducible when checked with `guix build --check`.
It is also possible to use old package versions via `guix time-machine`, similar
to what [MRAN](https://mran.microsoft.com/documents/rro/reproducibility)
offers. However, that time frame only spans about two months right now.

Creating and updating guix-cran is [fully
automated](https://github.com/guix-science/guix-cran-scripts) and happens
without any human intervention. Improvements to the already very good
CRAN importer also improve the channel’s quality. The channel itself
is always in a usable state, because updates are tested with `guix pull`
before committing and pushing them. However, some packages may not build
or work, because (usually undeclared) build or runtime dependencies are
missing. This could be improved through better auto-detection in the
CRAN importer.

Currently, building the channel derivation is very slow, most
likely due to Guile performance issues. For this reason, packages
are split into files by the first letter of their name, so the
module that holds a given package can still be determined from
its name alone. Since the number of loadable modules is [limited to
8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html),
creating one module file per package is not possible, and putting them
all into the same file is even slower.
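
The first-letter split can be pictured with a small shell helper. This
is only a sketch: the function name `module_file_for` and the
`guix-cran/packages/<letter>.scm` layout are assumptions made for
illustration and are not taken from the channel's scripts.

```bash
#!/bin/sh
# Hypothetical helper: map a Guix package name such as "r-zoid" to the
# file that would hold it under a first-letter split.
# ASSUMPTION: the "guix-cran/packages/<letter>.scm" layout is illustrative.
module_file_for() {
    name=${1#r-}    # drop the "r-" prefix that Guix gives R packages
    letter=$(printf '%s' "$name" | cut -c1 | tr '[:upper:]' '[:lower:]')
    printf 'guix-cran/packages/%s.scm\n' "$letter"
}

module_file_for r-zoid
```

Because the mapping depends only on the package name, no index of
packages to files is needed to locate a definition.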

The channel is not signed, because all changes are automated anyway.

## Usage
 
Using guix-cran requires the following steps:

1. Create `channels.scm`:

   ```scheme
   (cons
     (channel
       (name 'guix-cran)
       (url "https://github.com/guix-science/guix-cran.git"))
     %default-channels)
   ```
2. Create `manifest.scm`:

   ```scheme
   (specifications->manifest '("r-zoid" "r"))
   ```
3. Run:

   ```console
   guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
   ```

For true reproducibility, it’s necessary to pin the channels to a
specific commit by running

```console
guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
```

once and using `channels.pinned.scm` instead of `channels.scm` from there on.
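
Later runs then look the same as step 3, only with the pinned file. A
minimal sketch, assuming `channels.pinned.scm` and `manifest.scm` sit in
the current directory; the wrapper name `run_pinned` is hypothetical:

```bash
#!/bin/sh
# Hypothetical wrapper: run a command inside the environment described by
# manifest.scm, using the channel commits recorded in channels.pinned.scm.
run_pinned() {
    guix time-machine -C channels.pinned.scm -- shell -m manifest.scm -- "$@"
}
```

With this in place, `run_pinned R -e 'library(zoid)'` starts R from the
pinned channels instead of whatever commit is currently the latest.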

## Appendix

Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
feedback to the draft of this post.

The channel statistics above can be reproduced using the following
channel file (`channels.scm`):

```scheme
(list
  (channel
    (name 'guix)
    (url "https://git.savannah.gnu.org/git/guix.git")
    (branch "master")
    (commit
      "4781f0458de7419606b71bdf0fe56bca83ace910")
    (introduction
      (make-channel-introduction
        "9edb3f66fd807b096b48283debdcddccfea34bad"
        (openpgp-fingerprint
          "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
  (channel
    (name 'guix-cran)
    (url "https://github.com/guix-science/guix-cran.git")
    (branch "master")
    (commit
      "cc7394098f306550c476316710ccad20a510fa4b")))
```

And the following Scheme code to obtain a list of all packages provided
by guix-cran (`list-packages.scm`):

```scheme
(use-modules (guix discovery)
             (gnu packages)
             (guix modules)
             (guix utils)
             (guix packages))
;; Walk every package on the module path, keep those whose defining
;; module starts with `guix-cran`, and print one package name per line.
(let* ((modules (all-modules (%package-module-path)))
       (packages (fold-packages
                   (lambda (p accum)
                     (let ((mod (file-name->module-name
                                 (location-file (package-location p)))))
                       (if (member (car mod) '(guix-cran))
                         (cons p accum)
                         accum)))
                   '() modules)))
  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
```

And this Bash script:

```bash
#!/bin/bash

# Pull the pinned channels into a local profile and activate it.
guix pull -p guix-profile -C channels.scm
export GUIX_PROFILE="$(pwd)/guix-profile"
source guix-profile/etc/profile
guix repl list-packages.scm > packages
# Build each package, then rebuild it with --check to detect non-reproducible output.
mkdir -p builds
parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' < packages | tee build.log

echo "total" && wc -l packages
echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
echo "failure" && sort -u build.log | grep 'failed$' | wc -l
echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
```
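
The percentages quoted earlier follow from these counts. The sketch
below assumes that "buildable" is measured against all packages and
"non-reproducible" against successful builds, which matches the wording
above; the function name `summarize` is hypothetical.

```bash
#!/bin/sh
# Hypothetical helper: turn the raw counts into percentages.
# $1 = file with one package name per line, $2 = build log.
summarize() {
    total=$(wc -l < "$1")
    # grep -c exits non-zero when it counts 0 matches, hence "|| true".
    success=$(sort -u "$2" | grep -c '^/gnu/store' || true)
    differs=$(sort -u "$2" | grep -c 'differs$' || true)
    awk -v t="$total" -v s="$success" -v d="$differs" 'BEGIN {
        if (t == 0 || s == 0) { print "no data"; exit }
        printf "buildable: %.1f%%\n", 100 * s / t
        printf "non-reproducible: %.1f%%\n", 100 * d / s
    }'
}
```

Calling `summarize packages build.log` after the script above has
finished prints both ratios.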

