Date: Sat, 17 Dec 2022 10:53:01 +0100
From: Lars-Dominik Braun
To: Ludovic Courtès
Cc: Lars-Dominik Braun, Simon Tournier, guix-science@gnu.org
Subject: Re: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".
References: <86y1rkitkk.fsf@gmail.com> <875yefwgtd.fsf@gnu.org> <87tu1vk9nm.fsf@inria.fr>
In-Reply-To: <87tu1vk9nm.fsf@inria.fr>
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="9aqt+mCbmNRyltL9"
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

--9aqt+mCbmNRyltL9
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
Content-Transfer-Encoding: 8bit

Hi Ludo,

> Having said all that, you’re the author of the article, so let us know
> whether you want to publish it as-is or to modify it, and we’ll go ahead
> (I’ll be on IRC today).  I think it’s already an insightful article!

As discussed on IRC yesterday, I tried to come up with a different,
gentler introduction: see the attached patch. Please have a look and
point out any weirdness I might have introduced.

Thank you,
Lars

--9aqt+mCbmNRyltL9
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename="introduction.patch"
Content-Transfer-Encoding: 8bit

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
index c691163..157c1dc 100644
--- a/drafts/reproducible-cran.md
+++ b/drafts/reproducible-cran.md
@@ -4,10 +4,53 @@
 tags: Reproducibility, Packages, Research
 date: 2022-12-09 15:50:00
 ---
-GNU Guix provides scripts (“importer”) to turn packages from
-various language-specific repositories like [PyPi](https://pypi.org/)
-for Python, [crates.io](https://crates.io/) for Rust and
-[CRAN](https://cran.r-project.org/) for R into Guix package recipes.
+A recent [study published in *Nature Scientific Data* in February
+2022](https://doi.org/10.1038/s41597-022-01143-6) gives empirical insight
+into the success rate of reproducing R scripts obtained from Harvard’s
+Dataverse:
+
+> _We re-executed R code from each of the replication packages using
+> three R software versions, R 3.2, R 3.6, and R 4.0, in a clean
+> environment._
+> […]
+> _We find that 74% of R files failed to complete without
+> error in the initial execution, while 56% failed when code cleaning
+> was applied, showing that many errors can be prevented with good
+> coding practices._
+
+Given that more than half of the published R files failed to run even when
+trying to run them with three different R versions, recording the exact
+environment software is supposed to run in could be declared a _good
+coding practice_ for scientific publications.
+
+The R ecosystem itself provides tools to capture and restore R software
+environments, including [Packrat](https://rstudio.github.io/packrat/)
+and its successor [renv](https://rstudio.github.io/renv/),
+which both originate from within the RStudio project. Two replication
+packages in the study above used renv while the others did not record
+the environment at all.
+
+Looking at renv more closely reveals that it is able to
+capture the current R version and installed packages in a lockfile
+called `renv.lock`. However, [as noted
+before](https://hpc.guix.info/blog/2022/07/is-reproducibility-practical/),
+restoring an environment comes with a few
+[caveats](https://rstudio.github.io/renv/articles/renv.html#caveats):
+First of all, renv does not install a different version of R if the
+recorded and current versions disagree. This is a manual step and up to
+the user. The same is true for packages with external dependencies. Those
+libraries, their headers and binaries also need to be installed by the
+user in the correct version, which is _not_ recorded in the lockfile.
+Furthermore, renv supports restoring packages installed from git
+repositories, but fails if the user did not install git beforehand.
+
+None of the guesswork and manual installation steps are required
+when using GNU Guix, since software in its repositories is
+bit-for-bit reproducible. It also provides scripts (“importers”)
+to turn packages from various language-specific repositories like
+[PyPI](https://pypi.org/) for Python, [crates.io](https://crates.io/)
+for Rust and [CRAN](https://cran.r-project.org/) for R into Guix package
+recipes.

 An example workflow for the CRAN package
 [zoid](https://CRAN.R-project.org/package=zoid), which is not available

--9aqt+mCbmNRyltL9--