unofficial mirror of guix-science@gnu.org 
From: Hugo Buddelmeijer <hugo@buddelmeijer.nl>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: Thibault Lestang <t.lestang@imperial.ac.uk>,
	Konrad Hinsen <konrad.hinsen@cnrs.fr>,
	guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Fri, 2 Dec 2022 15:06:20 +0100
Message-ID: <CA+Jv8O0S0jkfhxSBw5WUCyQaX8NsZJhmFnYQ68nsCiBEZy8UpA@mail.gmail.com>
In-Reply-To: <87k03at69n.fsf@gnu.org>

Hi Ludovic,

On Fri, 2 Dec 2022 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:

> Hi,
>
> I read this thread with interest—great to have first-hand feedback from
> Conda users and packagers who also understand Guix!
>
> Hugo Buddelmeijer <hugo@buddelmeijer.nl> skribis:
>
> > That is, "conda env export" should contain entries like
> > "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
> > dependencies 'that matter', like which compiler is used. What goes into the
> > hash seems rather complicated, and grows over time.
>
> I think one source of many problems here is to think that there are
> dependencies that do not matter.


In the Python world, most dependencies are runtime dependencies. Those do
not actually affect the build, or the build result, and therefore arguably
'do not matter'. (I disagree, because what matters is whether the software
runs and creates the right results.)


> Another one, which those hashes appear
> to address, is to think that a name/version pair is enough to
> unambiguously designate a software artifact.
>
> This hash is a hash of the build result, not a hash of the input, is
> that correct?
>

No, this conda build hash is used to identify the build environment, not to
identify a particular package build.

The easiest way to explain is to show an example. Here is a small part of a
"conda env export" of one of my environments:
  - pybind11-abi=4=hd8ed1ab_3
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycosat=0.6.3=py39h3811e60_1009
  - pycparser=2.21=pyhd8ed1ab_0
  - pydocstyle=6.1.1=pyhd8ed1ab_0
  - pyerfa=2.0.0.1=py39hce5d2b2_1
  - pyflakes=2.4.0=pyhd8ed1ab_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqt-impl=5.12.3=py39hde8b62d_8
  - pyqt5-sip=4.19.18=py39he80948d_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8

As you see, many packages share the "hd8ed1ab" build hash, two qt-related
packages have h0fcd23e, and some others have their own. The "hd8ed1ab" hash
is by far the most common in this environment. These "hd8ed1ab" packages
are mostly independent (with separate maintainers, etc), but are probably
all in conda-forge and probably all use the 'default' conda environment.

(The number after the final underscore is the build number. The shared "8"
suggests that all the Qt packages are actually built together, even though
their build hashes differ.)
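
To make the structure of these entries concrete, here is a minimal Python
sketch that splits each one into name, version, build hash and build number,
and groups the packages by hash. The layout assumed by the regular expression
(an optional prefix such as "py39", then "h" plus seven hex digits, then "_"
and the build number) is only what I infer from the entries above, not a
specification. ENTRY_RE and parse_entry are names made up for this
illustration.

```python
import re
from collections import defaultdict

# Assumed layout (inferred from the listing, not taken from a spec):
#   name=version=[prefix]h<7 hex digits>_<build number>
ENTRY_RE = re.compile(
    r"^(?P<name>[^=]+)=(?P<version>[^=]+)="
    r"(?P<prefix>.*?)h(?P<hash>[0-9a-f]{7})_(?P<build_number>\d+)$"
)


def parse_entry(entry):
    """Split one 'conda env export' dependency line into its parts."""
    text = entry.strip()
    if text.startswith("- "):
        text = text[2:].strip()
    match = ENTRY_RE.match(text)
    if match is None:
        raise ValueError(f"unexpected entry format: {entry!r}")
    parts = match.groupdict()
    parts["build_number"] = int(parts["build_number"])
    return parts


EXPORT = """
  - pybind11-abi=4=hd8ed1ab_3
  - pycodestyle=2.8.0=pyhd8ed1ab_0
  - pycosat=0.6.3=py39h3811e60_1009
  - pycparser=2.21=pyhd8ed1ab_0
  - pydocstyle=6.1.1=pyhd8ed1ab_0
  - pyerfa=2.0.0.1=py39hce5d2b2_1
  - pyflakes=2.4.0=pyhd8ed1ab_0
  - pygments=2.11.2=pyhd8ed1ab_0
  - pyopenssl=22.0.0=pyhd8ed1ab_0
  - pyqt=5.12.3=py39hf3d152e_8
  - pyqt-impl=5.12.3=py39hde8b62d_8
  - pyqt5-sip=4.19.18=py39he80948d_8
  - pyqtchart=5.12=py39h0fcd23e_8
  - pyqtwebengine=5.12.1=py39h0fcd23e_8
"""

entries = [parse_entry(line) for line in EXPORT.splitlines() if line.strip()]

# Group package names by their build hash.
by_hash = defaultdict(list)
for item in entries:
    by_hash[item["hash"]].append(item["name"])

# Print the most widely shared hashes first.
for build_hash, names in sorted(by_hash.items(), key=lambda kv: -len(kv[1])):
    print(f"h{build_hash}: {', '.join(names)}")
```

On the listing above this puts seven packages under hd8ed1ab and two under
h0fcd23e; the other five each have their own hash.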

I don't really understand what goes into the hash. It is described at
https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html#build-number-and-string

The goal of these hashes is to capture which package builds will work
together. So two package builds with the same build-hash should have been
made with the same environment and thus work together.
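
As a toy illustration of that idea only: assume the hash is a short digest
over the build-configuration variables that a recipe uses. I have not checked
this against what conda-build actually does, and the function name
toy_build_hash and the variable names in the dictionaries are made up for the
example.

```python
import hashlib
import json


def toy_build_hash(variant, length=7):
    """Return 'h' plus the first `length` hex digits of a SHA-1 digest.

    `variant` is a dict of build-configuration variables. This is an
    illustrative assumption, not conda-build's real algorithm.
    """
    # sort_keys makes the digest independent of the dict's insertion order.
    payload = json.dumps(variant, sort_keys=True).encode("utf-8")
    return "h" + hashlib.sha1(payload).hexdigest()[:length]


# Two builds made with the same settings (listed in a different order) ...
env_a = {"c_compiler": "gcc", "c_compiler_version": "10", "python": "3.9"}
env_b = {"python": "3.9", "c_compiler_version": "10", "c_compiler": "gcc"}
# ... and one made with a different compiler version.
env_c = {"c_compiler": "gcc", "c_compiler_version": "11", "python": "3.9"}

print(toy_build_hash(env_a) == toy_build_hash(env_b))  # same settings
print(toy_build_hash(env_a) == toy_build_hash(env_c))  # different settings
```

Under this assumption, equal hashes mean equal build settings, but a differing
hash alone does not say which setting changed or whether the two builds are
still compatible.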

I'm not sure how it works if the hashes are different. Maybe they are
Merkle trees, so that it is possible to determine whether one hash is a
'superset' of another hash? Probably not.


>
> I think it would be great to have a blog post that walks through
> shortcomings and concrete issues one may encounter when trying to
> reproduce a software environment with Conda, contrasting it with how
> Guix does things.  This would probably make more sense for people who use
> Conda everyday than a high-level overview of Guix.
>

A key difference might be how to handle different combinations of versions.

E.g. you might want to use numpy 3.0 and scipy 18.0, while I want to use
numpy 6.0 and scipy 15.0 (made-up numbers, chosen on purpose so that each of
us has one version lower and one higher than the other). Conda and Guix
solve this in fundamentally different ways.

Conda-forge (as a project) is kinda in between conda alone and Guix, and
can kinda be seen as a Linux distribution itself (sans kernel). Conda-forge
is moving closer to Guix every year, including more and more dependencies,
and having more shared recreate-everything moments.

Greetings,
Hugo
