From: Simon Tournier <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>,
"Hugo Buddelmeijer" <hugo@buddelmeijer.nl>
Cc: Thibault Lestang <t.lestang@imperial.ac.uk>,
Konrad Hinsen <konrad.hinsen@cnrs.fr>,
guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Fri, 02 Dec 2022 14:59:42 +0100
Message-ID: <878rjpsy7l.fsf@gmail.com>
In-Reply-To: <87k03at69n.fsf@gnu.org>
Hi,
On Fri, 02 Dec 2022 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:
> Hugo Buddelmeijer <hugo@buddelmeijer.nl> skribis:
>
>> That is, "conda env export" should contain entries like
>> "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
>> dependencies 'that matter', like which compiler is used. What goes into the
>> hash seems rather complicated, and grows over time.
>
> I think one source of many problems here is to think that there are
> dependencies that do not matter. Another one, which those hashes appear
> to address, is to think that a name/version pair is enough to
> unambiguously designate a software artifact.
>
> This hash is a hash of the build result, not a hash of the input, is
> that correct?
Well, IMHO the official Conda documentation explains this reasonably
well. For instance,
https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html#matchspec-vs-packagerecord
From my understanding, if you go via MatchSpec then the SAT solver is
invoked. The SAT solver tries to satisfy all the constraints and the
solution depends on the state of the index (the upstream repository).
Aside from the fact that the SAT solver can take a very long time, and
can even fail if the constraints are too hard, there is no guarantee
that it will find the exact same combination of packages to install.
Having an equality (numpy=1.23) or something else does not really
change this point.
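To make the point about the index concrete, here is a toy sketch in Python. It is not Conda's solver and does not use Conda's MatchSpec class: the `resolve` function, the two index snapshots and the build strings are all made up for illustration, and I assume a rule of "newest record whose version starts with the requested prefix".

```python
# Toy illustration only: this is NOT Conda's solver or its MatchSpec class.
# `resolve`, the index snapshots and the build strings are hypothetical.

def parse_version(text):
    """Turn '1.23.4' into (1, 23, 4) so that versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def resolve(spec, index):
    """Pick the newest record in `index` matching a 'name=version-prefix' spec."""
    name, _, wanted = spec.partition("=")
    prefix = parse_version(wanted) if wanted else ()
    candidates = [
        (parse_version(version), build)
        for (pkg, version, build) in index
        if pkg == name and parse_version(version)[:len(prefix)] == prefix
    ]
    if not candidates:
        raise LookupError(f"no package satisfies {spec!r}")
    # Newest version wins; ties are broken by the build string.
    version, build = max(candidates)
    return name, ".".join(map(str, version)), build


# The same spec resolved against two states of the (made-up) index.
index_in_october = [
    ("numpy", "1.23.3", "py39h_0"),
    ("numpy", "1.23.4", "py39h_0"),
]
index_in_december = index_in_october + [
    ("numpy", "1.23.5", "py39h_0"),
    ("numpy", "1.23.5", "py39h_1"),  # a rebuild of the same version
]

print(resolve("numpy=1.23", index_in_october))
print(resolve("numpy=1.23", index_in_december))
```

The spec is identical in both calls; only the index changed, and so did the selected version and build.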
Conda offers the option to be “explicit”, and in that case the solver
is not invoked. In a way, this deals directly with PackageRecord.
However, the Conda documentation comes with these warnings:
    * Explicit package installs

      Since the solver is not involved, the dependencies of the
      explicit package(s) are not processed at all. This can leave the
      environment in an inconsistent state, which can be fixed by
      running conda update --all, for example.

    * Cloning an environment

      It essentially takes the source environment, generates the URLs
      for each installed packages (filtering conda, conda-env and
      their dependencies) and passes the list of URLs to
      explicit(). If the source tarballs are not in the cache anymore,
      it will query the index for the best possible match for the
      current channels. As such, there’s a slim chance that the copy
      is not exactly a clone of the original environment.
https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html#early-exit-tasks
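The cloning caveat can be sketched the same way. Again this is a toy model and not Conda's implementation of cloning or of explicit(): the `clone` function, the cache, the index and the build strings are hypothetical, and I assume that the fallback looks for the same name and version among what the channels currently offer.

```python
# Toy illustration only: NOT Conda's implementation of cloning or explicit().
# `clone`, the cache, the index and the build strings are hypothetical.

def clone(installed, cache, index):
    """Rebuild the package list for a copy of an environment.

    installed: list of (name, version, build) found in the source environment
    cache:     set of records whose tarballs are still available locally
    index:     records currently offered by the channels
    """
    copy = []
    for record in installed:
        if record in cache:
            copy.append(record)  # exact same artifact
            continue
        name, version, _build = record
        # Tarball is gone: fall back to the best current match (assumed rule:
        # same name and version, highest build string).
        matches = sorted(r for r in index if r[:2] == (name, version))
        if not matches:
            raise LookupError(f"{name} {version} is no longer in the index")
        copy.append(matches[-1])
    return copy


source_env = [("scipy", "1.8.0", "py39_1"), ("numpy", "1.22.3", "py39_0")]
cache = {("numpy", "1.22.3", "py39_0")}  # the scipy tarball was cleaned up
index = [("scipy", "1.8.0", "py39_2"), ("numpy", "1.22.3", "py39_0")]

copy = clone(source_env, cache, index)
print(copy == source_env)  # False: scipy comes back with a different build
```

In this model, the copy is only guaranteed to be identical when every tarball is still in the cache.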
So the official Conda documentation itself explains that there is no
guarantee of reproducing an environment exactly.
> I think it would be great to have a blog post that walks through
> shortcomings and concrete issues one may encounter when trying to
> reproduce a software environment with Conda, contrasting it with how
> Guix does things. This would probably make more sense for people who use
> Conda every day than a high-level overview of Guix.
From my understanding, the main issue is that Conda works perfectly well
within a short time window (2-3 months, say!). Within this window,
people can often reproduce an environment. It becomes more complicated
outside this window, so it is hard to put together a demo that explains
the problem. :-)
For sure, a blog post by people fluent in both Conda and Guix would be
very welcome. Aside from the discussion about reproducibility, even just
a Rosetta Stone comparing how to do the same task with Conda vs Guix
would help. It would smooth the migration and at least encourage people
to give Guix a try. :-)
Cheers,
simon