unofficial mirror of guix-science@gnu.org 
From: Simon Tournier <zimon.toutoune@gmail.com>
To: Hugo Buddelmeijer <hugo@buddelmeijer.nl>,
	Thibault Lestang <t.lestang@imperial.ac.uk>
Cc: Konrad Hinsen <konrad.hinsen@cnrs.fr>,
	guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Tue, 29 Nov 2022 21:10:20 +0100
Message-ID: <86y1rt5xoz.fsf@gmail.com>
In-Reply-To: <CA+Jv8O1VzXjPgZ04HaDHpeyvuDqaU_e2FYdsckhDzyi8Dgi8Pg@mail.gmail.com>

Hi Hugo, all,

On Tue, 29 Nov 2022 at 14:12, Hugo Buddelmeijer <hugo@buddelmeijer.nl> wrote:

>                                                      However, conda seems
> to work fine for most people. It would therefore be instructive to have
> concrete 'failure stories' in order to show people that conda is not enough.

Here is what I would do if I were trying to convince my colleagues that
Conda is not enough.

1. Target one or two common environments; for example,
   (Python+Numpy+Scipy+Matplotlib) for one, and (R+Seurat) for two.

2. Generate both environments following the Conda documentation.

Up to here, everything should work smoothly. :-)

3. Commit the Conda files in a Git repository; for instance,

       for e in py rseurat
       do
         conda activate "$e"
         # Full description: names, versions and build strings
         conda env export > "environment-$e.yml"
         # Explicit list of package URLs
         conda list --explicit > "explicit-spec-$e.txt"
         conda deactivate
       done

4.
   a) On the same machine, try to recreate the two environments.
   b) On another machine, do the same.
   c) Commit a record of how it went to the Git repository.
   d) Remove the two environments, and more, on both machines.

5. Every month, repeat step 4.


Maybe it can be automated with a cron job.  And maybe we could run this
experiment collectively.  And we could do the same with Guix. :-)
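
As a rough sketch, step 4 could look like the shell function below.  It
assumes the files from step 3 sit in the current directory; the log file
name results.log and the check- prefix for the temporary environments
are hypothetical names made up for this example.

```shell
# Sketch of step 4: try to recreate one environment from the committed
# YAML file and append the outcome to a log kept in the Git repository.
# "results.log" and the "check-" prefix are assumptions, not conventions.

try_recreate() {
    # $1: environment name as used in step 3 (py or rseurat)
    env_name=$1
    check_name="check-$env_name"
    today=$(date +%Y-%m-%d)

    # Recreating from the YAML export goes through the Conda solver.
    if conda env create --name "$check_name" \
            --file "environment-$env_name.yml" >/dev/null 2>&1
    then
        status=ok
    else
        status=FAILED
    fi

    # One line per attempt: date, host name, environment, outcome.
    echo "$today $(uname -n) $env_name $status" >> results.log

    # Step 4d: remove the test environment again, whatever happened.
    conda env remove --name "$check_name" --yes >/dev/null 2>&1 || true
}
```

Calling try_recreate for py and for rseurat, then committing results.log,
would cover 4a to 4d on one machine; a monthly crontab entry could run
a script doing exactly that.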

So far, we have not talked about actually running anything.  We could
also write a small Python script that plots something using Numpy and/or
Scipy, and try to run the Seurat vignette.
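
A minimal sketch of such a check, assuming the Python environment from
step 1 is named py and contains Matplotlib; the function name smoke_test
and the output file sine.png are invented for this example.  It relies
on "conda run", which executes a command inside a named environment.

```shell
# Sketch: run a tiny Numpy/Scipy/Matplotlib script inside an environment.
# The function name and the output file name are hypothetical.

smoke_test() {
    # $1: name of the Conda environment to test
    conda run --name "$1" python -c '
import numpy as np
from scipy import integrate
import matplotlib
matplotlib.use("Agg")  # no display needed, e.g. when run from cron
import matplotlib.pyplot as plt

# The integral of sin over [0, pi] is exactly 2.
value, _ = integrate.quad(np.sin, 0.0, np.pi)
assert abs(value - 2.0) < 1e-8

x = np.linspace(0.0, np.pi, 50)
plt.plot(x, np.sin(x))
plt.savefig("sine.png")
print("ok", np.__version__)
'
}
```

The exit status of smoke_test py could then be recorded next to the
outcome of the recreation itself.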

In my experience, after some months (anywhere from 2-3 to 6), Conda will
fail, especially after a system update (apt upgrade), and it can get
worse with a ’dist-upgrade’. :-)

> On Tue, 29 Nov 2022 at 11:32, Thibault Lestang <t.lestang@imperial.ac.uk>
> wrote:
>
>> That's fair enough. Conda & pip are everywhere around me, and I'd like
>> to form an accurate picture of their shortcomings before mentioning
>> alternative approaches to people who use these tools every day!
>
> I agree, let me share my perspective.

Conda and pip work very well when we look forward in time.  By design,
they fail when we look backward.  For engineering, they are very
efficient, and personally I would rely on them **if** I had some systems
to maintain and only cared about upgrading them.  Well, Conda, pip, or
some other distro package manager.

The trouble starts when you try to restore the past.  The 10 Years
Challenge [1] provides very good examples.  This talk [2] (in French,
but an English version is probably around) provides very good insights,
IMHO, into the limitations of classical package managers (such as those
of Debian, Conda, pip, etc.).

For what my biased opinion is worth, there are many shortcomings. :-)
For instance, this paper [3] points out that the reproduction was «so
time-consuming and resulted in only 11 out of 28 (39%) figure panels
conveying the same information».  Of course, it is hard to know whether
the students tried hard or not, and the paper does not say much about
the computational environment.

(Well, aside from the transparency of the computational stack, which
Conda barely provides, but that’s another story. :-))

1: <https://www.nature.com/articles/d41586-020-02462-7>
2: <https://hpc.guix.info/static/videos/atelier-reproductibilit%C3%A9-2021/arnaud-legrand.webm>
3: <https://doi.org/10.1371/journal.pcbi.1010615>


> That is, "conda env export" should contain entries like
> "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
> dependencies 'that matter', like which compiler is used. What goes into the
> hash seems rather complicated, and grows over time.
>
> This hash is a great step forward in reproducibility. But it is too
> fragile. I can't directly see how, but I can easily assume that this
> dependency-hash mechanism leads to the problem that Konrad faced even when
> no files are overwritten. Maybe because a new dependency resolver in conda
> would have stricter rules on interoperability. (It is still possible that
> files indeed were overwritten though; it was probably an incident like this
> that made them change the hashes.)

Well, I think the Conda documentation [4] about the dependency solver
gives some warnings about this explicit mechanism.  It has been a long
time since I last looked at Conda, but from my understanding of the
solver documentation, this “failure” reported by Konrad appears to me
to be expected, by design of Conda. ;-)

If the solver tries to satisfy many constraints, then the problem
becomes more complex as time goes on.  So, Conda probably fails to find
a working combination.

If the solver is bypassed, then there is no guarantee that the generated
state is a working computational environment.  Conda recommends
updating in order to fix potential issues.
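
To make the two cases concrete, here is a sketch of both ways to
recreate an environment from the files exported in step 3.  The function
names and the target environment name passed as second argument are
only illustrative.

```shell
# Sketch: the two recreation paths, using the file names from step 3.
# $1: original environment name (py or rseurat)
# $2: name for the recreated environment (hypothetical, e.g. py-check)

recreate_with_solver() {
    # environment-*.yml: Conda runs its solver again on the listed
    # name=version=build constraints and may fail to satisfy them.
    conda env create --name "$2" --file "environment-$1.yml"
}

recreate_explicit() {
    # explicit-spec-*.txt: Conda installs the listed package URLs as
    # given; per the Conda documentation, dependencies are not checked.
    conda create --name "$2" --file "explicit-spec-$1.txt"
}
```

For instance, "recreate_with_solver py py-check" exercises the first
case and "recreate_explicit py py-check" the second.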

4: <https://conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html>


> One thing that conda (or actually conda-forge) does well, are their bots.
> I'm a maintainer of some conda packages and once a month or so I get a
> fully automated pull request to update my package [4], e.g. when the
> upstream package is updated, or when a dependency is updated. They even
> have a tracking system for migrating dependencies that are used by many
> packages, such as compilers. This makes maintaining conda-forge packages a
> breeze. Having such bots also within the guix-ecosystem would probably help
> attract developers.

Cool!  Do you know if the code of these bots is available?


> By the way, it is quite hard to use conda in guix,

Maybe you could open bug reports and/or describe the annoyances you are
observing on help-guix or guix-devel.  For my part, I fully removed
Conda from my toolbox, so I never hit such annoyances. ;-)


Cheers,
simon

