From: Hugo Buddelmeijer
Date: Thu, 1 Dec 2022 15:01:15 +0100
Subject: Re: Conda environments and reproducibility
To: Konrad Hinsen
Cc: Thibault Lestang, guix-science
References: <87pmd7ar8k.fsf@imperial.ac.uk> <87zgcayre2.fsf@imperial.ac.uk>

Thanks Konrad,

On Tue, 29 Nov 2022 at 14:39, Konrad Hinsen <konrad.hinsen@cnrs.fr> wrote:

> Buddelmeijer <hugo@buddelmeijer.nl> writes:
>
> > Hi Konrad, Thibault and others,
> >
> > Konrad, is it perhaps possible for you to dig up this broken conda
> > environment file?
>
> Yes:
>
>    https://gist.github.com/brospars/4671d9013f0d99e1c961482peopledab533c57
>
> That environment was set up in 2018 on a Linux machine, and then tested
> under macOS and Windows as well. It broke in early 2019.

Thanks. Those dependencies indeed do not contain the build hashes, so the file was probably created with "conda env export --no-builds".
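To make concrete what a file without build hashes loses, here is a minimal Python sketch. It assumes the "name=version=build" form that "conda env export" writes; strip_build is a hypothetical helper name for illustration, not part of conda, and other spec syntax (such as "==") is not handled.

```python
def strip_build(spec: str) -> str:
    """Drop the build string from a 'name=version=build' spec.

    Hypothetical helper for illustration only; not part of conda.
    """
    parts = spec.split("=")
    # Keep at most the name and the version; shorter specs are unchanged.
    return "=".join(parts[:2])


exported = ["scipy=1.8.0=py39hee8e79c_1", "numpy=1.22.3", "pip"]
for spec in exported:
    print(spec, "->", strip_build(spec))
```

After this step, nothing in the file says which binary build was used, so a later install is free to pick a different one.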

I think such a file without build hashes is probably what you want when you are giving a course, because it allows students to install these exact versions of the packages, but built for their specific environment (e.g. Linux / macOS / Windows). It provides only limited reproducibility in the future, as you noticed. I guess you'd want three sets of environment files for a conda environment for a course:

1. With unpinned dependencies, so just "scipy", whenever possible. That way, you'd get the latest versions when rerunning the course. This requires frequent updates to the files to restrict/pin dependencies when necessary, e.g. "scipy<=1.8.0". This would be equivalent to a Guix manifest file without any channel information.
2. With dependencies pinned just on version, "scipy=1.8.0", like the one you shared. This should allow you to get equivalent stacks on different platforms. Guix does not really have an equivalent, by design, since it is not multi-platform. Although I suppose one could create a channel with many different versions of packages; the manifest would then specify the ones used.
3. With dependencies pinned on build hash, "scipy=1.8.0=py39hee8e79c_1". This should give you the exact same binaries every time. Roughly equivalent to a Guix manifest with a channel file. But Guix is still better, because its dependency graph is based on source code, which is easier to archive, so there is less chance of missing binaries (and more determinism).
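
The three scenarios can be told apart mechanically from the dependency strings. The following Python sketch is only an illustration under assumptions: pin_level and environment_level are hypothetical helpers (not part of conda), and they only cover the spec forms quoted in the list above, not the full conda match-spec syntax.

```python
def pin_level(spec: str) -> int:
    """Classify a conda dependency spec by how tightly it is pinned.

    Hypothetical helper, not part of conda. Returns the scenario number
    from the list above: 1 = unpinned or range, 2 = exact version,
    3 = exact version plus build string.
    """
    # No '=' at all, or a range operator or wildcard: the solver is free to choose.
    if "=" not in spec or any(ch in spec for ch in "<>!~*"):
        return 1
    # 'name=version' has one '=', 'name=version=build' has two.
    return 3 if spec.count("=") >= 2 else 2


def environment_level(specs: list[str]) -> int:
    """Treat an environment as only as pinned as its loosest dependency."""
    return min(pin_level(s) for s in specs)


for spec in ["scipy", "scipy<=1.8.0", "scipy=1.8.0", "scipy=1.8.0=py39hee8e79c_1"]:
    print(spec, "-> scenario", pin_level(spec))
```

Taking the minimum over all dependencies is a design choice of this sketch: one unpinned package is enough to let the whole environment drift.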

Guix distinguishes scenarios 1 and 3 more cleanly, because the manifest and the channels are kept separate.

(Let's ignore the pip packages in the conda environment file for now.)

> > It doesn't seem common to overwrite conda binaries. Conda takes some (not
> > enough?) measures to prevent the scenario Konrad describes. In particular,
> > the filenames include a 'hash' since conda 3 (~2014) [1]:
>
> Weird. We worked with official Miniconda downloads from early 2018, and
> our environment files contain no hashes.

Probably due to "--no-builds" in "conda env export", or maybe the default was different back then.

> My conclusion so far is that conda can never attain long-term
> reproducibility, because it wants to be multi-platform. And that means
> that it doesn't control the foundations on which it has to build.

Perhaps we are at the right time. I started using conda when my colleagues and I used many different environments: Linux, Windows, macOS, and different versions thereof. Back then, Anaconda was great, because it was very hard to install everything otherwise.

However, nowadays everyone can run Linux, either directly, through WSL (Windows Subsystem for Linux), or through containers. And everyone knows how to do this, and it is integrated in IDEs and such. So conda isn't really necessary anymore.

> From a user's point of view, a big problem with conda is the opacity of
> the machinery, which in addition changes all the time as you say. With
> Guix, I can understand how everything is built, and thus understand the
> potential obstacles to a rebuild many years later. With conda, I don't
> really know and my understanding is that the build machinery is not
> even completely public (for Anaconda at least).

I agree with you on a philosophical level; ultimately, understanding everything would be easier with Guix. But we aren't there yet: I don't understand most of the Guix packages I've looked at. That is probably because my Guile/Scheme skills are lacking.

Cheers,
Hugo
