unofficial mirror of guix-science@gnu.org 
From: Simon Tournier <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludo@gnu.org>,
	"Hugo Buddelmeijer" <hugo@buddelmeijer.nl>
Cc: Thibault Lestang <t.lestang@imperial.ac.uk>,
	Konrad Hinsen <konrad.hinsen@cnrs.fr>,
	guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Fri, 02 Dec 2022 14:59:42 +0100
Message-ID: <878rjpsy7l.fsf@gmail.com>
In-Reply-To: <87k03at69n.fsf@gnu.org>

Hi,

On Fri, 02 Dec 2022 at 12:05, Ludovic Courtès <ludo@gnu.org> wrote:
> Hugo Buddelmeijer <hugo@buddelmeijer.nl> skribis:
>
>> That is, "conda env export" should contain entries like
>> "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
>> dependencies 'that matter', like which compiler is used. What goes into the
>> hash seems rather complicated, and grows over time.
>
> I think one source of many problems here is to think that there are
> dependencies that do not matter.  Another one, which those hashes appear
> to address, is to think that a name/version pair is enough to
> unambiguously designate a software artifact.
>
> This hash is a hash of the build result, not a hash of the input, is
> that correct?

Well, the official Conda documentation is fairly clear on this, IMHO.
For instance,

https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html#matchspec-vs-packagerecord

From my understanding, if you go via MatchSpec then the SAT solver is
invoked.  The SAT solver tries to satisfy all the constraints and the
solution depends on the state of the index (the upstream repository).

Aside from the fact that the SAT solver can take a very long time, and
can even fail if the constraints are too hard, there is no guarantee
that it will find the exact same combination of packages to install.
Using an equality (numpy=1.23) or some other constraint does not really
change this point.
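
To make the point concrete, here is a minimal sketch in Python.  It is
a toy, not Conda's actual solver: the Record class, the matching rule
and the two index snapshots are all invented for illustration.  It only
shows that the same spec, resolved against two states of the index, can
designate two different artifacts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Record:
    """Toy stand-in for a fully resolved package (hypothetical fields)."""
    name: str
    version: tuple      # e.g. (1, 23, 5)
    build: str          # build string, invented here
    build_number: int


def matches(spec: str, rec: Record) -> bool:
    """Toy matcher: 'numpy=1.23' accepts any numpy whose version starts with 1.23."""
    name, _, version = spec.partition("=")
    prefix = tuple(int(p) for p in version.split(".")) if version else ()
    return rec.name == name and rec.version[:len(prefix)] == prefix


def resolve(spec: str, index: list) -> Record:
    """Pick the 'best' match: highest version, then highest build number."""
    candidates = [r for r in index if matches(spec, r)]
    if not candidates:
        raise LookupError(f"no record satisfies {spec!r}")
    return max(candidates, key=lambda r: (r.version, r.build_number))


# Two snapshots of the same (invented) channel, a few months apart.
index_december = [
    Record("numpy", (1, 23, 4), "py39_0", 0),
    Record("numpy", (1, 23, 5), "py39_0", 0),
]
index_march = index_december + [Record("numpy", (1, 23, 5), "py39_1", 1)]

first = resolve("numpy=1.23", index_december)
later = resolve("numpy=1.23", index_march)
print(first)
print(later)
print("same artifact:", first == later)
```

The spec did not change between the two calls; only the index did, and
that is enough to get a different record.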

Conda offers an “explicit” option, in which case the solver is not
invoked.  In a sense, it is a way to deal directly with PackageRecord.
However, the Conda documentation attaches these warnings to it:

        * Explicit package installs

        Since  the  solver is  not  involved,  the dependencies  of  the
        explicit package(s) are not processed at all. This can leave the
        environment  in an  inconsistent state,  which can  be fixed  by
        running conda update --all, for example.

        * Cloning an environment

        It essentially takes the  source environment, generates the URLs
        for  each installed  packages  (filtering  conda, conda-env  and
        their   dependencies)   and  passes   the   list   of  URLs   to
        explicit(). If the source tarballs are not in the cache anymore,
        it will  query the  index for  the best  possible match  for the
        current channels. As  such, there’s a slim chance  that the copy
        is not exactly a clone of the original environment.

        https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html#early-exit-tasks


Therefore, the official Conda documentation itself explains that there
is no guarantee that an environment can be reproduced exactly.
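
The first warning can be sketched the same way.  Again this is a toy
model, not Conda code: the DEPENDS table and both helper functions are
hypothetical.  It shows what "the dependencies are not processed at
all" means for the consistency of the resulting environment.

```python
# Toy dependency table (invented, heavily simplified).
DEPENDS = {
    "scipy": ["numpy", "python"],
    "numpy": ["python"],
    "python": [],
}


def explicit_install(env: set, packages: list) -> set:
    """Add exactly the requested packages; dependencies are not processed."""
    return env | set(packages)


def missing_dependencies(env: set) -> dict:
    """Report, for each installed package, the dependencies absent from env."""
    report = {}
    for pkg in sorted(env):
        missing = [d for d in DEPENDS[pkg] if d not in env]
        if missing:
            report[pkg] = missing
    return report


env = explicit_install({"python"}, ["scipy"])
print(missing_dependencies(env))  # {'scipy': ['numpy']}
```

Nothing complains at install time; the inconsistency only shows up when
something checks for it afterwards.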


> I think it would be great to have a blog post that walks through
> shortcomings and concrete issues one may encounter when trying to
> reproduce a software environment with Conda, contrasting it with how
> Guix does thing.  This would probably make more sense for people who use
> Conda everyday than a high-level overview of Guix.

From my understanding, the main issue is that Conda works perfectly well
within a short time window (2-3 months, say!).  Within that window,
people can often reproduce an environment.  It becomes more complicated
outside it, so it is hard to put together a demo that explains the
problem. :-)

For sure, a blog post by people fluent in both Conda and Guix would be
very welcome.  Aside from the discussion about reproducibility, even
just a Rosetta Stone comparing how to do the same task using Conda vs
Guix would help.  It would smooth the migration and at least encourage
people to give Guix a try. :-)


Cheers,
simon

