unofficial mirror of guix-science@gnu.org 
From: Hugo Buddelmeijer <hugo@buddelmeijer.nl>
To: Konrad Hinsen <konrad.hinsen@cnrs.fr>
Cc: Thibault Lestang <t.lestang@imperial.ac.uk>,
	guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Thu, 1 Dec 2022 15:01:15 +0100
Message-ID: <CA+Jv8O14VdZFGABgY=7ARX0N2LqA9Fshstgocft4Yc0sEXLiTQ@mail.gmail.com>
In-Reply-To: <m1k03d517h.fsf@fastmail.net>


Thanks Konrad,

On Tue, 29 Nov 2022 at 14:39, Konrad Hinsen <konrad.hinsen@cnrs.fr> wrote:

>
> Hugo Buddelmeijer <hugo@buddelmeijer.nl> writes:
>
> > Hi Konrad, Thibault and others,
> >
> > Konrad, is it perhaps possible for you to dig up this broken conda
> > environment file?
>
> Yes:
>
>    https://gist.github.com/brospars/4671d9013f0d99e1c961482dab533c57
>
> That environment was set up in 2018 on a Linux machine, and then tested
> under macOS and Windows as well. It broke in early 2019.
>

Thanks. Those dependencies indeed do not contain the build hashes, so the
file was probably created with "conda env export --no-builds".

I think such a file without build hashes is probably what you want
when you are giving a course, because it allows students to install
these exact versions of the packages, but built for their specific
platform (e.g. Linux / macOS / Windows). It provides only limited
reproducibility in the future, as you noticed. I guess you'd want three
sets of environment files for a conda environment for a course:

1. With unpinned dependencies, so just "scipy", whenever possible. That
way, you'd get the latest versions when rerunning the course. This requires
frequent updates to the files to restrict/pin dependencies when necessary,
e.g. "scipy<=1.8.0". This would be equivalent to a Guix manifest file
without any channel information.
2. With dependencies pinned just on version, "scipy=1.8.0", like the one
you shared. This should allow you to get equivalent stacks on different
platforms. Guix does not really have an equivalent, by design, since it
is not multi-platform. Although I suppose one could create a channel with
many different versions of packages; then the manifest should specify the
ones used.
3. With dependencies pinned on build hash, "scipy=1.8.0=py39hee8e79c_1".
This should give you the exact same binaries every time. Roughly equivalent
to a Guix manifest with a channel file. But Guix is still better, because
its dependency graph is based on source code, which is easier to archive,
so there is less chance of missing binaries (and more determinism).

Guix distinguishes scenarios 1 and 3 more cleanly, by keeping the
manifest separate from the channels.
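
As a rough illustration of the three pinning levels above, here is a
minimal Python sketch that classifies a dependency string by how tightly
it is pinned. The function name pin_level and the labels it returns are
hypothetical, chosen for this example only; they are not part of conda,
and the parsing only covers the simple forms quoted in this message.

```python
import re

# Package name first, then whatever constraint text follows it.
_SPEC = re.compile(r"^([A-Za-z0-9_.\-]+)(.*)$")


def pin_level(spec: str) -> str:
    """Classify a conda-style dependency string by how tightly it is pinned.

    Returns one of the illustrative labels "unpinned", "constrained",
    "version" or "build".
    """
    match = _SPEC.match(spec.strip())
    if match is None:
        raise ValueError(f"unrecognised spec: {spec!r}")
    rest = match.group(2)
    if not rest:
        # Scenario 1: just the package name.
        return "unpinned"
    # A single leading "=" is the name=version[=build] form.
    if rest.startswith("=") and not rest.startswith("=="):
        fields = rest[1:].split("=")
        if len(fields) == 2 and fields[1]:
            # Scenario 3: version plus build string.
            return "build"
        # Scenario 2: version only.
        return "version"
    # Anything else (<=, >=, ==, ...) restricts versions without an exact pin.
    return "constrained"


if __name__ == "__main__":
    for example in ("scipy", "scipy<=1.8.0", "scipy=1.8.0",
                    "scipy=1.8.0=py39hee8e79c_1"):
        print(example, "->", pin_level(example))
```

Running it over the dependency lines of an environment file would show
at a glance which of the three scenarios that file falls into.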

(Let's ignore the pip packages in the conda environment file for now.)

> > It doesn't seem common to overwrite conda binaries. Conda takes some (not
> > enough?) measures to prevent the scenario Konrad describes. In particular,
> > the filenames include a 'hash' since conda 3 (~2014) [1]:
>
> Weird. We worked with official Miniconda downloads from early 2018, and
> our environment files contain no hashes.
>

Probably due to "--no-builds" in "conda env export", or maybe the default
was different back then.


> My conclusion so far is that conda can never attain long-term
> reproducibility, because it wants to be multi-platform. And that means
> that it doesn't control the foundations on which it has to build.
>

Perhaps we are at the right time. I started using conda when I myself, and
my colleagues, used many different environments: Linux, Windows, macOS, and
different versions thereof. Back then, Anaconda was great, because it was
very hard to install everything otherwise.

However, nowadays everyone can run Linux, either directly, or through WSL
(Windows Subsystem for Linux), or through containers. And everyone knows
how to do this, and it is integrated in IDEs and such. So conda isn't
really necessary anymore.

> From a user's point of view, a big problem with conda is the opacity of
> the machinery, which in addition changes all the time as you say. With
> Guix, I can understand how everything is built, and thus understand the
> potential obstacles to a rebuild many years later. With conda, I don't
> really know and my understanding is that the build machinery is not
> even completely public (for Anaconda at least).
>

I agree with you on a philosophical level; ultimately, understanding
everything would be easier with Guix. But we aren't there yet: I don't
understand most of the Guix packages I've looked at. That is probably
because my Guile/Scheme skills are lacking.

Cheers,
Hugo
