From: Hugo Buddelmeijer
Date: Fri, 2 Dec 2022 15:06:20 +0100
To: Ludovic Courtès
Cc: Thibault Lestang, Konrad Hinsen, guix-science
Subject: Re: Conda environments and reproducibility
In-Reply-To: <87k03at69n.fsf@gnu.org>

Hi Ludovic,

On Fri, 2 Dec 2022 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:
> Hi,
>
> I read this thread with interest—great to have first-hand feedback from
> Conda users and packagers who also understand Guix!
>
> Hugo Buddelmeijer <hugo@buddelmeijer.nl> writes:
>
> > That is, "conda env export" should contain entries like
> > "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
> > dependencies 'that matter', like which compiler is used. What goes into the
> > hash seems rather complicated, and grows over time.
>
> I think one source of many problems here is to think that there are
> dependencies that do not matter.

In the Python world, most dependencies are runtime dependencies. They do not affect the build or the build result, and are therefore arguably dependencies that 'do not matter'. (I disagree, because what matters is whether the software runs and produces the right results.)
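
To make the runtime-versus-build distinction concrete, here is a minimal Python sketch built on a toy package model. The `Package` class, its fields, and the package names are hypothetical; this is neither Conda's nor Guix's data model. An identifier computed only from build-time inputs stays the same when a runtime dependency changes, while one computed over build-time and runtime inputs together does not.

```python
import hashlib
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Package:
    """Hypothetical toy package: name, version, and two kinds of inputs."""
    name: str
    version: str
    build_deps: tuple = ()  # e.g. compilers, headers: affect the built files
    run_deps: tuple = ()    # needed only when the software runs


def _digest(payload: dict) -> str:
    # Stable JSON encoding, so equal inputs always give equal hashes.
    data = json.dumps(payload, sort_keys=True).encode("utf-8")
    return hashlib.sha256(data).hexdigest()[:8]


def build_input_id(pkg: Package) -> str:
    """Identifier over build-time inputs only (runtime deps 'do not matter')."""
    return _digest({"name": pkg.name, "version": pkg.version,
                    "build": sorted(pkg.build_deps)})


def closure_id(pkg: Package) -> str:
    """Identifier over build-time *and* runtime inputs."""
    return _digest({"name": pkg.name, "version": pkg.version,
                    "build": sorted(pkg.build_deps),
                    "run": sorted(pkg.run_deps)})


# Same hypothetical pure-Python package and build inputs, different runtime numpy.
a = Package("mypkg", "1.0", build_deps=("python-3.9",), run_deps=("numpy-1.21",))
b = Package("mypkg", "1.0", build_deps=("python-3.9",), run_deps=("numpy-1.23",))

print(build_input_id(a) == build_input_id(b))  # True: the builds look identical
print(closure_id(a) == closure_id(b))          # False: the environments differ
```

The second identifier reflects the point in parentheses above: two environments that differ only in a runtime dependency can still behave differently.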

> Another one, which those hashes appear
> to address, is to think that a name/version pair is enough to
> unambiguously designate a software artifact.
>
> This hash is a hash of the build result, not a hash of the input, is
> that correct?

No, this conda build hash identifies the build environment, not a particular package build.

The easiest way to explain is with an example. Here is a small part of the "conda env export" output for one of my environments:
  - pybind11-abi=4=hd8ed1ab_3
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycosat=0.6.3=py39h3811e60_1009
  - pycparser=2.21=pyhd8ed1ab_0
  - pydocstyle=6.1.1=pyhd8ed1ab_0
  - pyerfa=2.0.0.1=py39hce5d2b2_1
  - pyflakes=2.4.0=pyhd8ed1ab_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqt-impl=5.12.3=py39hde8b62d_8
  - pyqt5-sip=4.19.18=py39he80948d_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8

As you can see, many packages share the "hd8ed1ab" build hash, two Qt-related packages share h0fcd23e, and the others each have their own. The "hd8ed1ab" hash is by far the most common in this environment. These "hd8ed1ab" packages are mostly independent of each other (separate maintainers, etc.), but they probably all come from conda-forge and probably all use the 'default' conda environment.

(The trailing number is the build number. The shared "8" suggests that all the Qt packages were actually built together, even though their build hashes differ.)
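
The grouping described above can be checked mechanically. This minimal Python sketch parses the `name=version=build` entries from the excerpt and groups them by hash and by build number. It assumes every build string ends in `h<7 hex digits>_<number>` with an optional prefix such as `py39` or `py`; that holds for these entries but is an assumption, not a rule taken from the conda documentation.

```python
import re
from collections import defaultdict

# The "conda env export" excerpt shown above.
EXPORT = """
  - pybind11-abi=4=hd8ed1ab_3
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycosat=0.6.3=py39h3811e60_1009
  - pycparser=2.21=pyhd8ed1ab_0
  - pydocstyle=6.1.1=pyhd8ed1ab_0
  - pyerfa=2.0.0.1=py39hce5d2b2_1
  - pyflakes=2.4.0=pyhd8ed1ab_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqt-impl=5.12.3=py39hde8b62d_8
  - pyqt5-sip=4.19.18=py39he80948d_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8
"""

# Assumed build-string layout: optional prefix, "h" + 7 hex digits, "_" + number.
BUILD_RE = re.compile(r"^(?P<prefix>.*?)h(?P<hash>[0-9a-f]{7})_(?P<number>\d+)$")


def parse_entry(line: str) -> dict:
    """Split '- name=version=build' into its parts."""
    name, version, build = line.strip().lstrip("- ").split("=")
    m = BUILD_RE.match(build)
    if m is None:
        raise ValueError(f"unexpected build string: {build!r}")
    return {"name": name, "version": version, "prefix": m["prefix"],
            "hash": m["hash"], "number": int(m["number"])}


entries = [parse_entry(line) for line in EXPORT.strip().splitlines()]

by_hash = defaultdict(list)
by_number = defaultdict(list)
for e in entries:
    by_hash[e["hash"]].append(e["name"])
    by_number[e["number"]].append(e["name"])

# Most widely shared hash first.
for h, names in sorted(by_hash.items(), key=lambda kv: -len(kv[1])):
    print(f"h{h}: {len(names)} package(s)")
print("build number 8:", by_number[8])
```

On this excerpt, seven entries share `hd8ed1ab`, two share `h0fcd23e`, and all five Qt-related entries carry build number 8.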

I don't really understand what goes into the hash. It is described at
https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html#build-number-and-string

The goal of these hashes is to capture which package builds will work together: two package builds with the same build hash should have been made in the same environment and should therefore work together.
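
As a rough illustration of "same hash, same build environment", here is a Python sketch of a build hash computed from a dictionary of build-variant settings. It follows the visible shape of the conda hashes ("h" plus seven hex digits), but the variant keys, the JSON encoding, and the use of truncated SHA-1 are assumptions for illustration; conda-build's documentation describes what actually goes into the real hash.

```python
import hashlib
import json


def variant_hash(variant: dict, length: int = 7) -> str:
    """Hypothetical build hash: 'h' + first `length` hex digits of a SHA-1
    over a canonical JSON encoding of the build-variant settings."""
    canonical = json.dumps(variant, sort_keys=True, separators=(",", ":"))
    return "h" + hashlib.sha1(canonical.encode("utf-8")).hexdigest()[:length]


def probably_compatible(variant_a: dict, variant_b: dict) -> bool:
    """Equal hashes suggest the builds came from the same environment.
    Different hashes say nothing about *how* the environments differ."""
    return variant_hash(variant_a) == variant_hash(variant_b)


# Hypothetical settings for two packages built in the same environment...
base = {"c_compiler": "gcc", "c_compiler_version": "10", "python": "3.9"}
same = {"python": "3.9", "c_compiler_version": "10", "c_compiler": "gcc"}
# ...and for one built with a different compiler version.
other = dict(base, c_compiler_version="11")

print(variant_hash(base))                # 'h' followed by 7 hex digits
print(probably_compatible(base, same))   # True: key order does not matter
print(probably_compatible(base, other))  # False
```

Because such a digest is flat and truncated, two different hashes only say that the inputs differ, not whether one set of settings is a 'superset' of the other; answering that would require keeping the settings themselves, or a structured hash.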

I'm not sure how it works when the hashes differ. Maybe they are Merkle trees, so that it is possible to determine whether one hash is a 'superset' of another? Probably not.

> I think it would be great to have a blog post that walks through
> shortcomings and concrete issues one may encounter when trying to
> reproduce a software environment with Conda, contrasting it with how
> Guix does things.  This would probably make more sense for people who use
> Conda every day than a high-level overview of Guix.

A key difference might be how different combinations of versions are handled.

For example, you might want to use numpy 3.0 and scipy 18.0, while I want to use numpy 6.0 and scipy 15.0 (made-up numbers, but deliberately chosen so that each of us wants one version lower and one higher than the other). Conda and Guix solve this in fundamentally different ways.
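
One way to picture the difference, as a deliberately simplified model (an assumption for illustration, not how either tool is implemented): a Conda environment holds one version per package name, so the two combinations end up in two separate environments. Guix instead identifies each build by a hash of all its inputs, so "scipy 18.0 built against numpy 3.0" and "scipy 18.0 built against numpy 6.0" are distinct store items that can coexist. The sketch models only the second idea. The `store_item` helper and the shortened `/gnu/store`-style names are hypothetical, and the versions are the made-up ones above.

```python
import hashlib


def store_item(name: str, version: str, inputs: tuple = ()) -> str:
    """Hypothetical input-addressed identifier: the hash covers the package's
    own name and version *and* the identifiers of everything it was built with."""
    material = "\n".join([name, version, *sorted(inputs)])
    digest = hashlib.sha256(material.encode("utf-8")).hexdigest()[:12]
    return f"/gnu/store/{digest}-{name}-{version}"


# Your combination: numpy 3.0 with scipy 18.0.
numpy_3 = store_item("numpy", "3.0")
scipy_18_on_numpy_3 = store_item("scipy", "18.0", (numpy_3,))

# My combination: numpy 6.0 with scipy 15.0.
numpy_6 = store_item("numpy", "6.0")
scipy_15_on_numpy_6 = store_item("scipy", "15.0", (numpy_6,))

# The same scipy version gets a different identity when built on another numpy.
scipy_18_on_numpy_6 = store_item("scipy", "18.0", (numpy_6,))

items = {numpy_3, scipy_18_on_numpy_3, numpy_6,
         scipy_15_on_numpy_6, scipy_18_on_numpy_6}
print(len(items))  # distinct items that can sit side by side in one store
```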

Conda-forge (as a project) sits somewhere in between plain conda and Guix, and can be seen as a Linux distribution in its own right (minus the kernel). Conda-forge is moving closer to Guix every year, packaging more and more dependencies and having more shared recreate-everything moments.

Greetings,
Hugo
