From: kyle <kyle@posteo.net>
To: 61701@debbugs.gnu.org
Cc: Kyle Andrews <kyle@posteo.net>
Subject: [bug#61701] [PATCH] doc: Propose new cookbook section for reproducible research.
Date: Wed, 22 Feb 2023 05:17:29 +0000
Message-ID: <3ffea5b37541a6f3409299f3e8e6200bc1c9aef6.1677043049.git.kyle@posteo.net>

From: Kyle Andrews <kyle@posteo.net>

This section covers the most common cases in which researchers using R
and Python can quickly obtain the benefits of reproducibility.
---
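Note for reviewers, not part of the commit: the channels file written by
"guix describe --format=channels" is plain Scheme, a list of channel
records.  A trimmed sketch follows.  It reuses the URL and commit that
appear in the inferior example of this patch purely as placeholders;
real output depends on the Guix revision in use and also records fields
such as the branch and the channel introduction.

```scheme
;; channels.scm (illustrative only; the commit is a placeholder taken
;; from the inferior example in this patch, not a recommendation).
(list (channel
       (name 'guix)
       (url "https://git.savannah.gnu.org/git/guix.git")
       (commit
        "d66146073def03d1a3d61607bc6b77997284904b")))
```
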
 doc/guix-cookbook.texi       | 174 +++++++++++++++++++++++++++++++++++
 guix/build-system/python.scm |   1 +
 2 files changed, 175 insertions(+)

diff --git a/doc/guix-cookbook.texi b/doc/guix-cookbook.texi
index b9fb916f4a..8a10bcbec7 100644
--- a/doc/guix-cookbook.texi
+++ b/doc/guix-cookbook.texi
@@ -114,6 +114,7 @@ Top
 
 Environment management
 
+* Reproducible Research in Practice:: Write manifests to create reproducible environments
 * Guix environment via direnv:: Setup Guix environment with direnv
 
 Installing Guix on a Cluster
@@ -3538,9 +3539,182 @@ Environment management
 demonstrate such utilities.
 
 @menu
+* Reproducible Research in Practice:: Write manifests to create reproducible environments
 * Guix environment via direnv:: Setup Guix environment with direnv
 @end menu
 
+@node Reproducible Research in Practice
+@section Common scientific software environments
+
+Many researchers write applied scientific software on top of more
+general-purpose tools from the R and Python ecosystems, together
+with supporting shell utilities.  Even researchers who mostly use
+just R or just Python often have to use both at the same time when
+collaborating with others.  This tutorial covers strategies for
+writing manifests that handle such situations, from a single R
+environment to a profile that mixes R with an older Python.
+
+Widely used R packages are hosted on CRAN, which employs a strict test
+suite backed by continuous integration infrastructure for the latest R
+version. A positive result of this rigid discipline is that most R
+packages from the same period of time will interoperate well together
+when used with a particular R version. This means there is a clear
+low-complexity target for achieving a reproducible environment.
+
+Writing a manifest for packaging R code alone requires only minimal
+knowledge of the Guix infrastructure. This stub should work for most
+cases involving the R packages already in Guix.
+
+@example
+(use-modules
+ (gnu packages cran)
+ (gnu packages statistics))
+
+(packages->manifest
+ (list r r-tidyverse))
+@end example
+
+R packages are defined predominantly in gnu/packages/cran.scm and
+gnu/packages/statistics.scm in the Guix source repository.  Save the
+manifest as manifest.scm and run it with the basic guix shell command:
+
+@example
+guix shell --manifest=manifest.scm --container
+@end example
+
+When you are done, remember to pin your channels so that others can
+later recover your exact Guix environment.
+
+@example
+guix describe --format=channels > channels.scm
+@end example
+
+The recorded channels can later be replayed with guix time-machine:
+
+@example
+guix time-machine --channels=channels.scm \
+  -- guix shell --manifest=manifest.scm --container
+@end example
+
+In contrast, the Python scientific ecosystem is far less
+standardized.  No comparable effort is made to keep all Python packages
+working together.  While there is a latest Python version, it is less
+dominant, for various reasons such as the fact that Python tends to be
+used by much larger teams than R is.  This makes packaging reproducible
+Python environments much more difficult, and mixing R with Python
+complicates things still further.  However, we have to be mindful of
+the goals of reproducible research.
+
+If reproducibility becomes an end in itself and not a catalyst towards
+faster discovery, then Guix will be a non-starter for scientists. Their
+goal is to develop useful understanding about particular aspects of the
+world.
+
+Thankfully, three common scenarios cover the vast majority of
+needs. These are:
+
+@itemize
+@item
+combining standard package definitions with custom package definitions
+@item
+combining package definitions from the current revision with other revisions
+@item
+combining package variants which need a modified build-system
+@end itemize
+
+The rest of this tutorial develops a manifest which tackles all three
+of these common issues at once.  The hope is that once researchers see
+that even the hardest common situation can be solved without writing
+thousands of lines of code, they will find reproducibility worth the
+effort, and will not regard it as a significant detour from the main
+line of their research.
+
+@example
+(use-modules
+ (guix packages)
+ (guix download)
+ (guix licenses)
+ (guix profiles)
+ (gnu packages statistics)
+ (gnu packages cran)
+ (guix inferior)
+ (guix channels)
+ (guix build-system python))
+
+;; guix import pypi APTED
+(define python-apted
+ (package
+  (name "python-apted")
+  (version "1.0.3")
+  (source (origin
+            (method url-fetch)
+            (uri (pypi-uri "apted" version))
+            (sha256
+             (base32
+              "1sawf6s5c64fgnliwy5w5yxliq2fc215m6alisl7yiflwa0m3ymy"))))
+  (build-system python-build-system)
+  (home-page "https://github.com/JoaoFelipe/apted")
+  (synopsis "APTED algorithm for the Tree Edit Distance")
+  (description "APTED algorithm for the Tree Edit Distance")
+  (license expat)))
+
+(define last-guix-with-python-3.6
+ (list
+  (channel
+   (name 'guix)
+   (url "https://git.savannah.gnu.org/git/guix.git")
+   (commit
+    "d66146073def03d1a3d61607bc6b77997284904b"))))
+
+(define connection-to-last-guix-with-python-3.6
+ (inferior-for-channels last-guix-with-python-3.6))
+
+(define first car)
+
+(define python-3.6
+ (first
+  (lookup-inferior-packages
+   connection-to-last-guix-with-python-3.6 "python")))
+
+(define python3.6-numpy
+ (first
+  (lookup-inferior-packages
+   connection-to-last-guix-with-python-3.6 "python-numpy")))
+
+(define included-packages
+ (list r r-reticulate))
+ 
+(define inferior-packages
+ (list python-3.6 python3.6-numpy))
+
+(define package-with-python-3.6
+ (package-with-explicit-python python-3.6
+  "python-" "python3.6-" #:variant-property 'python3-variant))
+ 
+(define custom-variant-packages
+ (list (package-with-python-3.6 python-apted)))
+
+(concatenate-manifests
+ (map packages->manifest
+  (list
+   included-packages
+   inferior-packages
+   custom-variant-packages)))
+@end example
+
+This should produce a profile with the latest R and an older Python
+3.6.  The two should be able to interoperate with code like:
+
+@example
+library(reticulate)
+use_python("python")
+apted = import("apted")
+t1 = '{a{b}{c}}'
+t2 = '{a{b{d}}}'
+metric = apted$APTED(t1, t2)
+distance = metric$compute_edit_distance()
+@end example
+
 @node Guix environment via direnv
 @section Guix environment via direnv
 
diff --git a/guix/build-system/python.scm b/guix/build-system/python.scm
index c8f04b2298..d4aaab906d 100644
--- a/guix/build-system/python.scm
+++ b/guix/build-system/python.scm
@@ -36,6 +36,7 @@ (define-module (guix build-system python)
   #:use-module (srfi srfi-1)
   #:use-module (srfi srfi-26)
   #:export (%python-build-system-modules
+            package-with-explicit-python
             package-with-python2
             strip-python2-variant
             default-python
-- 
2.37.2






Thread overview: 6+ messages
2023-02-22  5:17 kyle [this message]
2023-02-22 10:52 ` [bug#61701] [PATCH] doc: Propose new cookbook section for reproducible research Simon Tournier
2023-02-22 23:21   ` Kyle Andrews
2023-02-28 14:16     ` Simon Tournier
2023-03-02 18:30 ` Ludovic Courtès
2023-09-14 16:24   ` Ludovic Courtès
