all messages for Guix-related lists mirrored at yhetil.org
* [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
@ 2022-12-06  7:53 Lars-Dominik Braun
  2022-12-06 12:51 ` Simon Tournier
  0 siblings, 1 reply; 14+ messages in thread
From: Lars-Dominik Braun @ 2022-12-06  7:53 UTC (permalink / raw)
  To: guix-science; +Cc: Simon Tournier

[-- Attachment #1: Type: text/plain, Size: 7972 bytes --]

Hi Simon, hi all,

Attached is my draft post for hpc.guix.info regarding guix-cran.

Thanks,
Lars

* drafts/reproducible-cran.md: New file.
---
 drafts/reproducible-cran.md | 195 ++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)
 create mode 100644 drafts/reproducible-cran.md

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
new file mode 100644
index 0000000..c759b02
--- /dev/null
+++ b/drafts/reproducible-cran.md
@@ -0,0 +1,195 @@
+# CRAN, a practical example for being reproducible at large scale using GNU Guix
+
+GNU Guix provides scripts (“importer”) to turn packages from
+various language-specific repositories like [PyPi](https://pypi.org/)
+for Python, [crates.io](https://crates.io/) for Rust and
+[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+
+An example workflow for the CRAN package
+[zoid](https://CRAN.R-project.org/package=zoid), which is not available
+in Guix proper, would look like this:
+
+1. Import the package into a manifest.
+
+   ```console
+   $ guix import cran -r zoid > manifest.scm
+   ```
+2. Edit `manifest.scm` to import the required modules and return a
+   usable manifest containing the package and R itself.
+
+   ```scheme
+   (use-modules (guix packages)
+                (guix download)
+                (guix licenses)
+                (guix build-system r)
+                (gnu packages cran)
+                (gnu packages statistics))
+   
+   (define-public r-zoid …)
+   
+   (packages->manifest (list r-zoid r))
+   ```
+3. Run your code.
+
+   ```console
+   guix shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+Although Guix displays hints about which modules are missing when trying to
+use an incomplete manifest, editing the manifest file to include all of
+them can be quite tedious.
+
+For R specifically, the R package
+[guix.install](https://CRAN.R-project.org/package=guix.install) provides
+a way to automate this import. It also uses `guix import`, but references
+dependencies using package specifications like `(specification->package
+"r-bh")`. This way no extra logic to figure out the correct module
+imports is required. It then extends the package search path, including
+the newly written file at `~/.Rguix/packages.scm`, installs the package
+into the default Guix profile at `~/.guix-profile` and adds this profile
+to R’s search path.
+
+While this approach works well for individual users, Guix installations
+with a larger user-base, for instance institution-wide, would benefit
+from default availability of the entire CRAN package collection with
+pre-built substitutes to speed up installation times. Additionally
+reproducing environments would include less steps if the package
+recipes were available to anyone by default.
+
+## Introducing guix-cran
+
+GNU Guix provides a mechanism called “channels”,
+which can extend the package collection in Guix
+proper. [guix-cran](https://github.com/guix-science/guix-cran) does
+exactly that: It provides all CRAN packages missing in Guix proper in
+a channel and has all of the properties mentioned above. It can be
+installed globally via `/etc/guix/channels.scm` and packages can be
+pre-built on a central server.
+
+As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
+packages available in guix-cran. 95% of them are buildable and only 0.5%
+of these builds are not reproducible via `guix build --check`.  It is
+also possible to use old package versions via `guix time-machine`, similar
+to what [MRAN](https://mran.microsoft.com/documents/rro/reproducibility)
+offers. However, that time-frame only spans about two months right now.
+
+Creating and updating guix-cran is [fully
+automated](https://github.com/guix-science/guix-cran-scripts) and happens
+without any human intervention. Improvements to the already very good
+CRAN importer also improve the channel’s quality. The channel itself
+is always in a usable state, because updates are tested with `guix pull`
+before committing and pushing them. However some packages may not build
+or work, because (usually undeclared) build or runtime dependencies are
+missing. This could be improved through better auto-detection in the
+CRAN importer.
+
+Currently building the channel derivation is very slow, most
+likely due to Guile performance issues. For this reason packages
+are split into files by first letter.  This way they can
+still be referenced deterministically by the first letter of
+their name.  Since the number of loadable modules is [limited to
+8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html),
+creating one module file per package is not possible and putting them
+all into the same file is even slower.
+
+The channel is not signed, because all changes are automated anyway.
+
+## Usage
+ 
+Using guix-cran requires the following steps:
+
+1. Create `channels.scm`:
+
+   ```scheme
+   (cons
+     (channel
+       (name 'guix-cran)
+       (url "https://github.com/guix-science/guix-cran.git"))
+     %default-channels)
+   ```
+2. Create `manifest.scm`:
+
+   ```scheme
+   (specifications->manifest '("r-zoid" "r"))
+   ```
+3. Run:
+
+   ```console
+   guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+For true reproducibility it’s necessary to pin the channels to a
+specific commit by running
+
+```console
+guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+```
+
+once and using `channels.pinned.scm` instead of `channels.scm` from there on.
+
+## Appendix
+
+Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
+feedback to the draft of this post.
+
+The channel statistics above can be reproduced using the following
+channel file (`channels.scm`):
+
+```scheme
+(list
+  (channel
+    (name 'guix)
+    (url "https://git.savannah.gnu.org/git/guix.git")
+    (branch "master")
+    (commit
+      "4781f0458de7419606b71bdf0fe56bca83ace910")
+    (introduction
+      (make-channel-introduction
+        "9edb3f66fd807b096b48283debdcddccfea34bad"
+        (openpgp-fingerprint
+          "BBB0 2DDF 2CEA F6A8 0D1D  E643 A2A0 6DF2 A33A 54FA"))))
+  (channel
+    (name 'guix-cran)
+    (url "https://github.com/guix-science/guix-cran.git")
+    (branch "master")
+    (commit
+      "cc7394098f306550c476316710ccad20a510fa4b")))
+```
+
+And the following Scheme code to obtain a list of all packages provided
+by guix-cran (`list-packages.scm`):
+
+```scheme
+(use-modules (guix discovery)
+             (gnu packages)
+             (guix modules)
+             (guix utils)
+             (guix packages))
+(let* ((modules (all-modules (%package-module-path)))
+       (packages (fold-packages
+                   (lambda (p accum)
+                     (let ((mod (file-name->module-name (location-file (package-location p)))))
+                       (if (member (car mod) '(guix-cran))
+                         (cons p accum)
+                         accum)))
+                   '() modules)))
+  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
+```
+
+And this Bash script:
+
+```bash
+#!/bin/bash
+
+guix pull -p guix-profile -C channels.scm
+export GUIX_PROFILE=`pwd`/guix-profile
+source guix-profile/etc/profile
+guix repl list-packages.scm > packages
+cat packages| parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' | tee build.log
+
+echo "total" && wc -l packages
+echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
+echo "failure" && sort -u build.log | grep 'failed$' | wc -l
+echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
+```
+
-- 
2.38.1
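
The four counts printed at the end of the script above can be turned into the two percentages quoted in the post (buildable packages, and non-reproducible builds among them). A minimal sketch in Bash, assuming the `packages` and `build.log` files written by that script and the same grep filters; `cran_stats` is a hypothetical helper, not part of guix-cran-scripts:

```shell
#!/bin/bash
# Hypothetical helper: summarise the output of the build script above.
# Assumes "packages" holds one package name per line and "build.log" holds
# the combined output of `guix build` and `guix build --check`.
cran_stats() {
  local packages_file=$1 log_file=$2
  local total success failure nonrepro
  total=$(wc -l < "$packages_file")
  # Same filters as the original script: store paths, "failed", "differs".
  success=$(sort -u "$log_file" | grep -c '^/gnu/store')
  failure=$(sort -u "$log_file" | grep -c 'failed$')
  nonrepro=$(sort -u "$log_file" | grep -c 'differs$')
  echo "total: $total"
  echo "success: $success"
  echo "failure: $failure"
  echo "non-reproducible: $nonrepro"
  # Bash arithmetic is integer-only, so let awk compute the percentages.
  awk -v s="$success" -v t="$total" -v n="$nonrepro" 'BEGIN {
    printf "buildable: %.1f%%\n", (t > 0 ? 100 * s / t : 0)
    printf "non-reproducible among buildable: %.1f%%\n", (s > 0 ? 100 * n / s : 0)
  }'
}
```

Usage would be `cran_stats packages build.log`, run in the directory the build script used.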



* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-06  7:53 [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix" Lars-Dominik Braun
@ 2022-12-06 12:51 ` Simon Tournier
  2022-12-07  7:44   ` Efraim Flashner
  2022-12-07  8:36   ` Lars-Dominik Braun
  0 siblings, 2 replies; 14+ messages in thread
From: Simon Tournier @ 2022-12-06 12:51 UTC (permalink / raw)
  To: Lars-Dominik Braun, guix-science; +Cc: Simon Tournier

Hi Lars,

On Tue, 06 Dec 2022 at 08:53, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:

> Attached is my draft post for hpc.guix.info regarding guix-cran.

Applied, thanks.  It is under drafts/ [1].  Last round proofread before
publishing.  On Friday?

1: <https://gitlab.inria.fr/guix-hpc/website/-/blob/master/drafts/reproducible-cran.md>


Cheers,
simon




* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-06 12:51 ` Simon Tournier
@ 2022-12-07  7:44   ` Efraim Flashner
  2022-12-07  8:39     ` Lars-Dominik Braun
  2022-12-07  8:36   ` Lars-Dominik Braun
  1 sibling, 1 reply; 14+ messages in thread
From: Efraim Flashner @ 2022-12-07  7:44 UTC (permalink / raw)
  To: Simon Tournier; +Cc: Lars-Dominik Braun, guix-science, Simon Tournier


[-- Attachment #1.1: Type: text/plain, Size: 703 bytes --]

On Tue, Dec 06, 2022 at 01:51:23PM +0100, Simon Tournier wrote:
> Hi Lars,
> 
> On Tue, 06 Dec 2022 at 08:53, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:
> 
> > Attached is my draft post for hpc.guix.info regarding guix-cran.
> 
> Applied, thanks.  It is under drafts/ [1].  Last round proofread before
> publishing.  On Friday?
> 
> 1: <https://gitlab.inria.fr/guix-hpc/website/-/blob/master/drafts/reproducible-cran.md>

I had a couple of small changes I would make.

-- 
Efraim Flashner   <efraim@flashner.co.il>   אפרים פלשנר
GPG key = A28B F40C 3E55 1372 662D  14F7 41AA E7DC CA3D 8351
Confidentiality cannot be guaranteed on emails sent or received unencrypted

[-- Attachment #1.2: guix-hpc-cran-blogpost.diff --]
[-- Type: text/plain, Size: 2166 bytes --]

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index e365f70..61f7444 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -55,9 +55,9 @@ to R’s search path.
 
 While this approach works well for individual users, Guix installations
 with a larger user-base, for instance institution-wide, would benefit
-from default availability of the entire CRAN package collection with
-pre-built substitutes to speed up installation times. Additionally
-reproducing environments would include less steps if the package
+from the default availability of the entire CRAN package collection with
+pre-built substitutes to speed up installation times. Additionally,
+reproducing environments would include fewer steps if the package
 recipes were available to anyone by default.
 
 ## Introducing guix-cran
@@ -72,7 +72,7 @@ pre-built on a central server.
 
 As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
 packages available in guix-cran. 95% of them are buildable and only 0.5%
-of these builds are not reproducible via `guix build --check`.  It is
+of these builds are not reproducible via `guix build --check`. It is
 also possible to use old package versions via `guix time-machine`, similar
 to what [MRAN](https://mran.microsoft.com/documents/rro/reproducibility)
 offers. However, that time-frame only spans about two months right now.
@@ -89,9 +89,9 @@ CRAN importer.
 
 Currently building the channel derivation is very slow, most
 likely due to Guile performance issues. For this reason packages
-are split into files by first letter.  This way they can
-still be referenced deterministically by the first letter of
-their name.  Since the number of loadable modules is [limited to
+are split into files by the first letter of their name. This way they can
+still be referenced deterministically by their first letter.
+Since the number of loadable modules is [limited to
 8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html),
 creating one module file per package is not possible and putting them
 all into the same file is even slower.
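
Because packages are grouped by the first letter of their name, the module holding a given package can be computed from the name alone, without loading the channel. A minimal sketch, assuming (this is not stated in the thread) that the modules are named `(guix-cran packages <letter>)` after the lowercased first letter of the CRAN name; `cran_module_for` is a hypothetical helper:

```shell
#!/bin/bash
# Hypothetical helper: derive the guix-cran module holding a CRAN package.
# ASSUMPTION: modules are named (guix-cran packages <letter>), keyed on the
# lowercased first letter of the CRAN package name.
cran_module_for() {
  local cran_name=$1 letter
  # First character of the name, lowercased ("BH" -> "b").
  letter=$(printf '%s' "${cran_name:0:1}" | tr '[:upper:]' '[:lower:]')
  printf '(guix-cran packages %s)\n' "$letter"
}

cran_module_for zoid   # (guix-cran packages z)
cran_module_for BH     # (guix-cran packages b)
```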


* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-06 12:51 ` Simon Tournier
  2022-12-07  7:44   ` Efraim Flashner
@ 2022-12-07  8:36   ` Lars-Dominik Braun
  2022-12-13 13:53     ` Ludovic Courtès
  1 sibling, 1 reply; 14+ messages in thread
From: Lars-Dominik Braun @ 2022-12-07  8:36 UTC (permalink / raw)
  To: Simon Tournier; +Cc: guix-science, Simon Tournier, lars


[-- Attachment #1.1: Type: text/plain, Size: 477 bytes --]

Hi Simon,

> Applied, thanks.  It is under drafts/ [1].  Last round proofread before
> publishing.  On Friday?
Friday sounds good. I’m attaching minor changes to the syntax highlighting.

Thanks,
Lars

-- 
Lars-Dominik Braun
Wissenschaftlicher Mitarbeiter/Research Associate

www.leibniz-psychology.org
ZPID - Leibniz-Institut für Psychologie /
ZPID - Leibniz Institute for Psychology
Universitätsring 15
D-54296 Trier - Germany
Tel.: +49–651–201-4964

[-- Attachment #1.2: 0001-reproducible-cran-Fix-console-syntax-highlighting.patch --]
[-- Type: text/plain, Size: 1546 bytes --]

From f1019fa43a34a9cf5e394c86f8389dd6ce4b07dc Mon Sep 17 00:00:00 2001
From: Lars-Dominik Braun <ldb@leibniz-psychology.org>
Date: Wed, 7 Dec 2022 09:33:21 +0100
Subject: [PATCH] reproducible-cran: Fix console syntax highlighting.

* drafts/reproducible-cran.md: Add $ prompts to the beginning of console lines.
---
 drafts/reproducible-cran.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index c759b02..31e3e2a 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -32,7 +32,7 @@ in Guix proper, would look like this:
 3. Run your code.
 
    ```console
-   guix shell -m manifest.scm -- R -e 'library(zoid)'
+   $ guix shell -m manifest.scm -- R -e 'library(zoid)'
    ```
 
 Although Guix displays hints about which modules are missing when trying to
@@ -115,14 +115,14 @@ Using guix-cran requires the following steps:
 3. Run:
 
    ```console
-   guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
+   $ guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
    ```
 
 For true reproducibility it’s necessary to pin the channels to a
 specific commit by running
 
 ```console
-guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+$ guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
 ```
 
 once and using `channels.pinned.scm` instead of `channels.scm` from there on.
-- 
2.38.1
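
Pinning only pays off if every channel in `channels.pinned.scm` carries a `commit` field. A minimal sketch of a sanity check, assuming the file looks like the `guix describe -f channels` output shown in the appendix of the post; `check_pinned` is a hypothetical helper that counts `(channel` and `(commit` forms rather than parsing Scheme, so it is a heuristic only:

```shell
#!/bin/bash
# Hypothetical sanity check for a pinned channels file.
# Heuristic: compares the number of "(channel" and "(commit" forms;
# it does not parse Scheme.
check_pinned() {
  local file=$1 channels commits
  channels=$(grep -oE '\(channel([[:space:]]|$)' "$file" | wc -l)
  commits=$(grep -oE '\(commit([[:space:]]|$)' "$file" | wc -l)
  if [ "$channels" -gt 0 ] && [ "$channels" -eq "$commits" ]; then
    echo "pinned: $commits/$channels channels"
  else
    echo "not pinned: $commits/$channels channels" >&2
    return 1
  fi
}
```

Run against the unpinned `channels.scm` from the Usage section, which has no `commit` field, it reports `not pinned: 0/1 channels` and returns a non-zero status.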



* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-07  7:44   ` Efraim Flashner
@ 2022-12-07  8:39     ` Lars-Dominik Braun
  2022-12-07 11:11       ` Simon Tournier
  0 siblings, 1 reply; 14+ messages in thread
From: Lars-Dominik Braun @ 2022-12-07  8:39 UTC (permalink / raw)
  To: Simon Tournier, guix-science, Simon Tournier; +Cc: lars

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

Hi Efraim,

> I had a couple of small changes I would make.
these sound good to me. Can you apply them, Simon?

Thanks for proofreading,
Lars


* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-07  8:39     ` Lars-Dominik Braun
@ 2022-12-07 11:11       ` Simon Tournier
  0 siblings, 0 replies; 14+ messages in thread
From: Simon Tournier @ 2022-12-07 11:11 UTC (permalink / raw)
  To: Lars-Dominik Braun, guix-science, Simon Tournier; +Cc: lars

Hi,

On Wed, 07 Dec 2022 at 09:39, Lars-Dominik Braun <ldb@leibniz-psychology.org> wrote:

> these sound good to me. Can you apply them, Simon?

Applied.  And the other one too.

Cheers,
simon



* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-07  8:36   ` Lars-Dominik Braun
@ 2022-12-13 13:53     ` Ludovic Courtès
  2022-12-13 16:34       ` zimoun
  2022-12-16  8:00       ` Lars-Dominik Braun
  0 siblings, 2 replies; 14+ messages in thread
From: Ludovic Courtès @ 2022-12-13 13:53 UTC (permalink / raw)
  To: Lars-Dominik Braun; +Cc: Simon Tournier, guix-science, Simon Tournier, lars

[-- Attachment #1: Type: text/plain, Size: 718 bytes --]

Hello!

Lars-Dominik Braun <ldb@leibniz-psychology.org> skribis:

>> Applied, thanks.  It is under drafts/ [1].  Last round proofread before
>> publishing.  On Friday?
> Friday sounds good. I’m attaching minor changes to the syntax highlighting.

We missed one Friday but there are plenty coming up.  :-)

As mentioned on #guix-hpc, I think it’d be interesting to add a
reference to https://www.nature.com/articles/s41597-022-01143-6 to
illustrate the rationale.  I think it’s important because R users are
likely to wonder why they’d bother with Guix in the first place.

Here’s a proposal in that direction; feel free to take it, tear it down,
change it, or whatever.

Thanks,
Ludo’.


[-- Attachment #2: Type: text/x-patch, Size: 2196 bytes --]

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index c691163..28f6108 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -60,6 +60,42 @@ pre-built substitutes to speed up installation times. Additionally,
 reproducing environments would include fewer steps if the package
 recipes were available to anyone by default.
 
+## Why deploy R software with Guix anyway?
+
+At this point, perhaps you're wondering: R is stable, and tools such as
+[Packrat](https://rstudio.github.io/packrat/) let me save and restore
+the exact R package versions I need.  While this might seem “good
+enough”, we can already tell this approach [has a number of
+shortcomings](https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/),
+one of which is that it cannot handle dependencies not written in
+R, such as R itself.
+
+A [study published in *Nature Scientific Data* in February
+2022](https://doi.org/10.1038/s41597-022-01143-6) gives empirical
+insight into this:
+
+> _[We] retrieve and analyze more than 2000 replication datasets with
+> over 9000 unique R files published from 2010 to 2020. Second, we
+> execute the code in a clean runtime environment to assess its ease of
+> reuse. […] We find that 74% of R files failed to complete without
+> error in the initial execution, while 56% failed when code cleaning
+> was applied, showing that many errors can be prevented with good
+> coding practices._
+
+Three quarters of those R files fail to run out of the box; this is
+huge.  How did the authors re-execute this code?
+
+> _We re-executed R code from each of the replication packages using
+> three R software versions, R 3.2, R 3.6, and R 4.0, in a clean
+> environment._
+
+Despite this guesswork, coupled with automatic “source cleaning”, the
+authors found that more than half of the R files still fail to run.
+
+The motivation to deploy R software with Guix becomes clear: it’s the
+ability to automatically redeploy the same software environment, at
+different points in time, on different machines.
+
 ## Introducing guix-cran
 
 GNU Guix provides a mechanism called “channels”,


* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-13 13:53     ` Ludovic Courtès
@ 2022-12-13 16:34       ` zimoun
  2022-12-16  8:00       ` Lars-Dominik Braun
  1 sibling, 0 replies; 14+ messages in thread
From: zimoun @ 2022-12-13 16:34 UTC (permalink / raw)
  To: Ludovic Courtès, Lars-Dominik Braun
  Cc: guix-science, Simon Tournier, lars

Hi,

On Tue, 13 Dec 2022 at 14:53, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> We missed one Friday but there are plenty coming up.  :-)

Oh, sorry.


> Here’s a proposal in that direction; feel free to take it, tear it down,
> change it, or whatever.

Lars?  If it is fine with you, I suggest that you, Ludo, apply your
change and publish.  WDYT?


Cheers,
simon



* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-13 13:53     ` Ludovic Courtès
  2022-12-13 16:34       ` zimoun
@ 2022-12-16  8:00       ` Lars-Dominik Braun
  2022-12-16  8:58         ` Ludovic Courtès
  1 sibling, 1 reply; 14+ messages in thread
From: Lars-Dominik Braun @ 2022-12-16  8:00 UTC (permalink / raw)
  To: Ludovic Courtès
  Cc: Lars-Dominik Braun, Simon Tournier, guix-science, Simon Tournier

Hi Ludo,

> As mentioned on #guix-hpc, I think it’d be interesting to add a
> reference to https://www.nature.com/articles/s41597-022-01143-6 to
> illustrate the rationale.  I think it’s important because R users are
> likely to wonder why they’d bother with Guix in the first place.
from the article and the quotes in your patch I feel it’s not clear that
the execution failures are the result of mismatched dependencies. Sure,
if I put on my Guix glasses I would assume they are at least partially
responsible, but in “Limitations of the Study” they mention they did
not investigate causes for the failures. So arguing that code quality
in these open repositories is just terrible – as we can see from the
automated cleaning step doing wonders – would be equally valid. Or am
I missing something?

You’re right that if the blog post were published in a non-Guix
context it would need a good reason to use Guix, but in this case I was
just describing a cool new toy for people already using Guix. Is that
mind-set acceptable for posts on hpc.guix.info or do we need a motivating
section?

Sorry for the late and quite negative reply :(
Lars

PS: I believe packrat has been superseded by renv.




* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-16  8:00       ` Lars-Dominik Braun
@ 2022-12-16  8:58         ` Ludovic Courtès
  2022-12-17  9:53           ` Lars-Dominik Braun
  0 siblings, 1 reply; 14+ messages in thread
From: Ludovic Courtès @ 2022-12-16  8:58 UTC (permalink / raw)
  To: Lars-Dominik Braun
  Cc: Lars-Dominik Braun, Simon Tournier, guix-science, Simon Tournier

Hello Lars,

Lars-Dominik Braun <lars@6xq.net> skribis:

>> As mentioned on #guix-hpc, I think it’d be interesting to add a
>> reference to https://www.nature.com/articles/s41597-022-01143-6 to
>> illustrate the rationale.  I think it’s important because R users are
>> likely to wonder why they’d bother with Guix in the first place.
> from the article and the quotes in your patch I feel it’s not clear that
> the execution failures are the result of mismatched dependencies. Sure,
> if I put on my Guix glasses I would assume they are at least partially
> responsible, but in “Limitations of the Study” they mention they did
> not investigate causes for the failures. So arguing that code quality
> in these open repositories is just terrible – as we can see from the
> automated cleaning step doing wonders – would be equally valid. Or am
> I missing something?

The point I wanted to make is that, instead of going through the hacks
they describe (R version guesswork, source “cleanup”) and yet being
unable to run a large part of the code, we could have a tool that
ensures *by construction* that one is going to be able to rerun the
code.

> You’re right that if the blog post were published in a non-Guix
> context it would need a good reason to use Guix, but in this case I was
> just describing a cool new toy for people already using Guix. Is that
> mind-set acceptable for posts on hpc.guix.info or do we need a motivating
> section?

The way I see it, we’re trying to reach out to people who’re using R and
are interested in reproducible research.  Their first reaction might be
“this sounds nice, but is it really necessary?”, or: “isn’t renv/packrat
already doing the job?”  Guix fans already know the answers.  :-)

Having said all that, you’re the author of the article, so let us know
whether you want to publish it as-is or to modify it, and we’ll go ahead
(I’ll be on IRC today).  I think it’s already an insightful article!

Thanks,
Ludo’.



* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-16  8:58         ` Ludovic Courtès
@ 2022-12-17  9:53           ` Lars-Dominik Braun
  2022-12-17 11:43             ` Simon Tournier
  0 siblings, 1 reply; 14+ messages in thread
From: Lars-Dominik Braun @ 2022-12-17  9:53 UTC (permalink / raw)
  To: Ludovic Courtès
  Cc: Lars-Dominik Braun, Simon Tournier, guix-science, Simon Tournier

[-- Attachment #1: Type: text/plain, Size: 444 bytes --]

Hi Ludo,

> Having said all that, you’re the author of the article, so let us know
> whether you want to publish it as-is or to modify it, and we’ll go ahead
> (I’ll be on IRC today).  I think it’s already an insightful article!
as discussed on IRC yesterday, I tried to come up with a different,
gentler introduction – see the attached patch. Please have a look and
point out any weirdness I might have introduced.

Thank you,
Lars
 

[-- Attachment #2: introduction.patch --]
[-- Type: text/plain, Size: 3169 bytes --]

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index c691163..157c1dc 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -4,10 +4,53 @@ tags: Reproducibility, Packages, Research
 date: 2022-12-09 15:50:00
 ---
 
-GNU Guix provides scripts (“importer”) to turn packages from
-various language-specific repositories like [PyPi](https://pypi.org/)
-for Python, [crates.io](https://crates.io/) for Rust and
-[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+A recent [study published in *Nature Scientific Data* in February
+2022](https://doi.org/10.1038/s41597-022-01143-6) gives empirical insight
+into the success rate of reproducing R scripts obtained from Harvard’s
+Dataverse:
+
+> _We re-executed R code from each of the replication packages using
+> three R software versions, R 3.2, R 3.6, and R 4.0, in a clean
+> environment._
+> […]
+> _We find that 74% of R files failed to complete without
+> error in the initial execution, while 56% failed when code cleaning
+> was applied, showing that many errors can be prevented with good
+> coding practices._
+
+Given that more than half of the published R files failed to run even when
+tried with three different R versions, recording the exact environment
+a piece of software is supposed to run in could be considered a _good
+coding practice_ for scientific publications.
+
+The R ecosystem itself provides tools to capture and restore R software
+environments, including [Packrat](https://rstudio.github.io/packrat/)
+and its successor [renv](https://rstudio.github.io/renv/)
+which both originate from within the RStudio project. Two replication
+packages in the study above used renv while the others did not record
+the environment at all.
+
+Looking at renv more closely reveals that it is able to
+capture the current R version and installed packages in a lockfile
+called `renv.lock`. However, [as noted
+before](https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/),
+restoring an environment comes with a few
+[caveats](https://rstudio.github.io/renv/articles/renv.html#caveats):
+First of all, renv does not install a different version of R if the
+recorded and current versions disagree. This is a manual step and up to
+the user. The same is true for packages with external dependencies. Those
+libraries, their headers and binaries also need to be installed by the
+user in the correct version, which is _not_ recorded in the lockfile.
+Furthermore, renv supports restoring packages installed from git
+repositories, but fails if the user did not install git beforehand.
+
+None of the guesswork and manual installation steps are required
+when using GNU Guix, since software in its repositories is
+bit-for-bit reproducible. It also provides scripts (“importers”)
+to turn packages from various language-specific repositories like
+[PyPi](https://pypi.org/) for Python, [crates.io](https://crates.io/)
+for Rust and [CRAN](https://cran.r-project.org/) for R into Guix package
+recipes.
 
 An example workflow for the CRAN package
 [zoid](https://CRAN.R-project.org/package=zoid), which is not available
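
The importers mentioned in this hunk derive Guix package names from upstream names, which is why the manifests in the post ask for `r-zoid` rather than `zoid`. A minimal sketch of that naming rule for CRAN, assuming it amounts to lowercasing the name, turning dots into hyphens and adding an `r-` prefix; `cran_to_guix_name` is a hypothetical helper, not the importer's own code:

```shell
#!/bin/bash
# Hypothetical helper mirroring the CRAN -> Guix naming convention.
# ASSUMPTION: lowercase, dots become hyphens, "r-" prefix.
cran_to_guix_name() {
  local cran_name=$1
  printf 'r-%s\n' "$(printf '%s' "$cran_name" | tr '[:upper:]' '[:lower:]' | tr '.' '-')"
}

cran_to_guix_name zoid        # r-zoid
cran_to_guix_name BH          # r-bh
cran_to_guix_name data.table  # r-data-table
```

The resulting names are what `specifications->manifest` expects, as in the `manifest.scm` from the Usage section.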


* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-17  9:53           ` Lars-Dominik Braun
@ 2022-12-17 11:43             ` Simon Tournier
  2022-12-19 15:06               ` Lars-Dominik Braun
  0 siblings, 1 reply; 14+ messages in thread
From: Simon Tournier @ 2022-12-17 11:43 UTC (permalink / raw)
  To: Lars-Dominik Braun, Ludovic Courtès
  Cc: Lars-Dominik Braun, guix-science, Simon Tournier

Hi Lars,

On Sat, 17 Dec 2022 at 10:53, Lars-Dominik Braun <lars@6xq.net> wrote:

> diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
> index c691163..157c1dc 100644
> --- a/drafts/reproducible-cran.md
> +++ b/drafts/reproducible-cran.md

LGTM, I have merged this patch.  Let’s publish it on Monday if no more
comments. :-)


Cheers,
simon



* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-17 11:43             ` Simon Tournier
@ 2022-12-19 15:06               ` Lars-Dominik Braun
  2022-12-21 14:43                 ` Ludovic Courtès
  0 siblings, 1 reply; 14+ messages in thread
From: Lars-Dominik Braun @ 2022-12-19 15:06 UTC (permalink / raw)
  To: Simon Tournier
  Cc: Ludovic Courtès, Lars-Dominik Braun, guix-science,
	Simon Tournier

Hi simon,

> LGTM, I have merged this patch.  Let’s publish it on Monday if no more
> comments. :-)
alright, let’s do it.

Thanks,
Lars




* Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
  2022-12-19 15:06               ` Lars-Dominik Braun
@ 2022-12-21 14:43                 ` Ludovic Courtès
  0 siblings, 0 replies; 14+ messages in thread
From: Ludovic Courtès @ 2022-12-21 14:43 UTC (permalink / raw)
  To: Lars-Dominik Braun
  Cc: Simon Tournier, Lars-Dominik Braun, guix-science, Simon Tournier

Hi Lars,

Lars-Dominik Braun <lars@6xq.net> skribis:

>> LGTM, I have merged this patch.  Let’s publish it on Monday if no more
>> comments. :-)
> alright, let’s do it.

It was delayed a bit (there was a release on Monday :-)) but it’s now
on-line:

  https://hpc.guix.info/blog/2022/12/cran-a-practical-example-for-being-reproducible-at-large-scale-using-gnu-guix/

Thank you for your work!

Ludo’.

