Date: Tue, 6 Dec 2022 08:53:22 +0100
From: Lars-Dominik Braun
To: guix-science@gnu.org
Cc: Simon Tournier
Subject: [PATCH] Add draft post "CRAN, a practical example for being reproducible at large scale using GNU Guix".

Hi Simon, hi all,

attached is my draft post for hpc.guix.info regarding guix-cran.

Thanks,
Lars

* drafts/reproducible-cran.md: New file.
---
 drafts/reproducible-cran.md | 195 ++++++++++++++++++++++++++++++++++++
 1 file changed, 195 insertions(+)
 create mode 100644 drafts/reproducible-cran.md

diff --git a/drafts/reproducible-cran.md b/drafts/reproducible-cran.md
new file mode 100644
index 0000000..c759b02
--- /dev/null
+++ b/drafts/reproducible-cran.md
@@ -0,0 +1,195 @@
+# CRAN, a practical example for being reproducible at large scale using GNU Guix
+
+GNU Guix provides scripts called “importers” that turn packages from
+various language-specific repositories, like [PyPI](https://pypi.org/)
+for Python, [crates.io](https://crates.io/) for Rust and
+[CRAN](https://cran.r-project.org/) for R, into Guix package recipes.
+
+An example workflow for the CRAN package
+[zoid](https://CRAN.R-project.org/package=zoid), which is not available
+in Guix proper, would look like this:
+
+1. Import the package into a manifest.
+
+   ```console
+   $ guix import cran -r zoid > manifest.scm
+   ```
+2. Edit `manifest.scm` to import the required modules and return a
+   usable manifest containing the package and R itself.
+
+   ```scheme
+   (use-modules (guix packages)
+                (guix download)
+                (guix licenses)
+                (guix build-system r)
+                (gnu packages cran)
+                (gnu packages statistics))
+
+   (define-public r-zoid …)
+
+   (packages->manifest (list r-zoid r))
+   ```
+3. Run your code.
+
+   ```console
+   $ guix shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+Although Guix displays hints about which modules are missing when trying
+to use an incomplete manifest, editing the manifest file to include all
+of them can be quite tedious.
+
+For R specifically, the R package
+[guix.install](https://CRAN.R-project.org/package=guix.install) provides
+a way to automate this import. It also uses `guix import`, but references
+dependencies using package specifications like `(specification->package
+"r-bh")`. This way, no extra logic is required to figure out the correct
+module imports. It then extends the package search path to include the
+newly written file at `~/.Rguix/packages.scm`, installs the package
+into the default Guix profile at `~/.guix-profile` and adds this profile
+to R’s search path.
+
+While this approach works well for individual users, Guix installations
+with a larger user base, for instance institution-wide ones, would
+benefit from having the entire CRAN package collection available by
+default, with pre-built substitutes to speed up installation.
+Additionally, reproducing environments would require fewer steps if the
+package recipes were available to everyone by default.
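+
+The specification-based lookup that guix.install relies on, as described
+above, can be sketched as a manifest. In contrast to step 2 of the first
+workflow, it only needs `(gnu packages)`, which provides
+`specification->package`, rather than every module that defines one of
+the packages. This is a minimal illustration, not the code guix.install
+actually generates; `r-bh` is simply the package named in the
+specification example above.
+
+```scheme
+;; Sketch in the spirit of guix.install: each package is looked up by its
+;; name, so the modules defining r-bh and r never have to be imported.
+(use-modules (gnu packages)    ; specification->package
+             (guix profiles))  ; packages->manifest
+
+(packages->manifest
+ (map specification->package '("r-bh" "r")))
+```
+
+It can be tried with `guix shell -m manifest.scm -- R`.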
+
+## Introducing guix-cran
+
+GNU Guix provides a mechanism called “channels”, which can extend the
+package collection in Guix proper.
+[guix-cran](https://github.com/guix-science/guix-cran) does exactly
+that: it provides all CRAN packages missing in Guix proper in a channel
+and has all of the properties mentioned above. It can be enabled
+globally via `/etc/guix/channels.scm`, and its packages can be
+pre-built on a central server.
+
+As of commit `cc7394098f306550c476316710ccad20a510fa4b` there are 17431
+packages available in guix-cran. 95% of them are buildable and only 0.5%
+of these builds are not reproducible via `guix build --check`. It is
+also possible to use old package versions via `guix time-machine`, similar
+to what [MRAN](https://mran.microsoft.com/documents/rro/reproducibility)
+offers. However, that time frame only spans about two months right now.
+
+Creating and updating guix-cran is [fully
+automated](https://github.com/guix-science/guix-cran-scripts) and happens
+without any human intervention. Improvements to the already very good
+CRAN importer also improve the channel’s quality. The channel itself
+is always in a usable state, because updates are tested with `guix pull`
+before they are committed and pushed. However, some packages may not
+build or work, because (usually undeclared) build or runtime
+dependencies are missing. This could be improved through better
+auto-detection in the CRAN importer.
+
+Currently, building the channel derivation is very slow, most likely
+due to Guile performance issues. For this reason, packages are split
+into files by first letter, so they can still be referenced
+deterministically by the first letter of their name. Since the number
+of loadable modules is [limited to
+8192](https://www.mail-archive.com/guile-devel@gnu.org/msg16244.html),
+creating one module file for each of the more than 17,000 packages is
+not possible, and putting them all into the same file is even slower.
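+
+As an illustration of referencing packages by first letter, a manifest
+could import a package’s module directly instead of going through
+specifications. This sketch assumes that guix-cran names its modules
+`(guix-cran packages <letter>)` after the first letter of the CRAN
+package name, so `r-zoid` would live in the module for “z”; the actual
+layout may differ.
+
+```scheme
+;; Assumed layout: r-zoid is defined in the module for the letter "z".
+(use-modules (guix profiles)            ; packages->manifest
+             (gnu packages statistics)  ; r
+             (guix-cran packages z))    ; r-zoid (assumed module name)
+
+(packages->manifest (list r-zoid r))
+```
+
+It would only load with guix-cran enabled, for example through a
+`channels.scm` as described in the Usage section and
+`guix time-machine -C channels.scm -- shell -m manifest.scm`.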
+
+The channel is not signed, because all changes are automated anyway.
+
+## Usage
+
+Using guix-cran requires the following steps:
+
+1. Create `channels.scm`:
+
+   ```scheme
+   (cons
+    (channel
+     (name 'guix-cran)
+     (url "https://github.com/guix-science/guix-cran.git"))
+    %default-channels)
+   ```
+2. Create `manifest.scm`:
+
+   ```scheme
+   (specifications->manifest '("r-zoid" "r"))
+   ```
+3. Run:
+
+   ```console
+   $ guix time-machine -C channels.scm -- shell -m manifest.scm -- R -e 'library(zoid)'
+   ```
+
+For true reproducibility, it’s necessary to pin the channels to a
+specific commit by running
+
+```console
+$ guix time-machine -C channels.scm -- describe -f channels > channels.pinned.scm
+```
+
+once and using `channels.pinned.scm` instead of `channels.scm` from then on.
+
+## Appendix
+
+Ludovic Courtès, Simon Tournier and Ricardo Wurmus provided valuable
+feedback on the draft of this post.
+
+The channel statistics above can be reproduced using the following
+channel file (`channels.scm`):
+
+```scheme
+(list
+ (channel
+  (name 'guix)
+  (url "https://git.savannah.gnu.org/git/guix.git")
+  (branch "master")
+  (commit
+   "4781f0458de7419606b71bdf0fe56bca83ace910")
+  (introduction
+   (make-channel-introduction
+    "9edb3f66fd807b096b48283debdcddccfea34bad"
+    (openpgp-fingerprint
+     "BBB0 2DDF 2CEA F6A8 0D1D E643 A2A0 6DF2 A33A 54FA"))))
+ (channel
+  (name 'guix-cran)
+  (url "https://github.com/guix-science/guix-cran.git")
+  (branch "master")
+  (commit
+   "cc7394098f306550c476316710ccad20a510fa4b")))
+```
+
+And the following Scheme code to obtain a list of all packages provided
+by guix-cran (`list-packages.scm`):
+
+```scheme
+(use-modules (guix discovery)
+             (gnu packages)
+             (guix modules)
+             (guix utils)
+             (guix packages))
+
+;; Walk all package modules on the search path and keep only the packages
+;; whose definition lives in a (guix-cran …) module.
+(let* ((modules (all-modules (%package-module-path)))
+       (packages (fold-packages
+                  (lambda (p accum)
+                    ;; Some packages may lack a source location; skip them.
+                    (let* ((loc (package-location p))
+                           (mod (and loc
+                                     (file-name->module-name
+                                      (location-file loc)))))
+                      (if (and mod (eq? (car mod) 'guix-cran))
+                          (cons p accum)
+                          accum)))
+                  '() modules)))
+  ;; Print one package name per line.
+  (for-each (lambda (p) (format #t "~a~%" (package-name p))) packages))
+```
+
+And this Bash script:
+
+```bash
+#!/bin/bash
+
+# Fetch the pinned channels from channels.scm into a local profile.
+guix pull -p guix-profile -C channels.scm
+export GUIX_PROFILE="$(pwd)/guix-profile"
+source guix-profile/etc/profile
+guix repl list-packages.scm > packages
+
+# Build each package (4 jobs in parallel, 300 s timeout), registering a GC
+# root under builds/; on success, rebuild it with --check to detect
+# non-reproducible builds.
+mkdir -p builds
+parallel -j 4 'rm -f builds/{} && guix build --no-grafts --timeout=300 -r builds/{} -q {} 2>&1 && guix build --no-grafts --timeout=300 --check -q {} 2>&1' < packages | tee build.log
+
+echo "total" && wc -l packages
+echo "success" && sort -u build.log | grep '^/gnu/store' | wc -l
+echo "failure" && sort -u build.log | grep 'failed$' | wc -l
+echo "non-reproducible" && sort -u build.log | grep 'differs$' | wc -l
+```
+
-- 
2.38.1