unofficial mirror of guix-science@gnu.org 
From: Thibault Lestang <t.lestang@imperial.ac.uk>
To: Simon Tournier <zimon.toutoune@gmail.com>
Cc: guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Tue, 29 Nov 2022 10:41:37 +0000
Message-ID: <87v8myypsg.fsf@imperial.ac.uk>
In-Reply-To: <86v8my7qpe.fsf@gmail.com>


Simon Tournier <zimon.toutoune@gmail.com> writes:

> On Mon, 28 Nov 2022 at 17:28, Thibault Lestang <t.lestang@imperial.ac.uk> wrote:
>> -----
>> @luispedrocoelho
>> Me, 6 months ago: I am going to save this conda
>> environment with all the versions of all the packages so it can be
>> recreated later; this is Reproducible Science!
>>
>> conda, today: these versions don't work together, lol.
>> -----
>>
>> I simply can't explain how such a behavior can happen.
>
> One thing is the link rot.  I do not know if it is currently estimated,
> but for sure, we always underestimate it.

How far back do package versions go in Anaconda's archives? Are there
any guarantees? Good question.

>> I understand that conda ships pre-compiled binaries. I see how that's
>> bad for reproducibility and provenance tracking since it's not
>> straightforward to know how these binaries and dependencies were
>> compiled. I'm assuming that, when conda saves an environment, it records
>> version tags and "everything else required" to pull the same binaries
>> later. Okay - I see how binaries could /technically/ be modified at a
>> later stage whilst maintaining the same version tag (provenance tracking
>> issue).
>
> Aside, you are assuming the availability of such binaries. :-)

Yes I am - I guess that's linked to your point about link rot?
>
> Another thing, from the old time where I used Conda, and I may be wrong,
> is, I guess, the SAT solver [1].  Well, 6 months ago, you described
> your environment, for instance saying:
>
>     1.0 <= foo
>     2.0 <= bar <= 3.0
>     baz <= 4.0
>
> then foo@1.1, foo@1.2 and foo@2.0 had been released in these past 6
> months.  But baz <= 4.0 only works with 0.9 <= foo <= 1.2 and the
> constraint on bar implies other constraints on foo and/or baz.
>
> The complexity about SAT solvers is exponential, IIRC, for sure really
> bad, and I do not know the state-of-the-art but I guess the problem to
> solve is going to be worse and worse as the time flies.
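
To make sure I follow, here is a toy sketch of that scenario in Python.
It is not how conda's solver works: it brute-forces the newest pair that
fits, the version lists are made up, and I left bar out to keep it short.
The only thing it is meant to show is that the same loose description can
resolve to a different environment once new versions are published.

```python
from itertools import product


def satisfies(version, low=None, high=None):
    """True if low <= version <= high, ignoring bounds that are None."""
    return (low is None or version >= low) and (high is None or version <= high)


def solve(available):
    """Brute-force the newest (foo, baz) pair that meets every constraint.

    `available` maps a package name to the versions published so far.
    Versions are (major, minor) tuples so that they compare correctly.
    """
    candidates = []
    for foo, baz in product(available["foo"], available["baz"]):
        if not satisfies(foo, low=(1, 0)):    # 1.0 <= foo
            continue
        if not satisfies(baz, high=(4, 0)):   # baz <= 4.0
            continue
        # From the example: baz <= 4.0 only works with 0.9 <= foo <= 1.2
        if not satisfies(foo, low=(0, 9), high=(1, 2)):
            continue
        candidates.append((foo, baz))
    return max(candidates) if candidates else None


# Hypothetical package indexes at two points in time.
six_months_ago = {"foo": [(0, 9), (1, 0)], "baz": [(3, 0), (4, 0)]}
today = {
    "foo": [(0, 9), (1, 0), (1, 1), (1, 2), (2, 0)],
    "baz": [(3, 0), (4, 0)],
}

then = solve(six_months_ago)
now = solve(today)
print("six months ago:", then)
print("today:         ", now)
```

With these made-up indexes the description picks foo 1.0 six months ago
and foo 1.2 today, so nothing is frozen unless the resolved versions
themselves are recorded.
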
>
> From my experience, you have only one solution to fight against the
> time: freeze.  The question is then how or what to freeze. :-)
>
> One way for freezing is the binary container.  Another way for freezing
> is to have a “summary” capturing the whole (fixed) graph of
> dependencies.  This is (usually named) the channels.scm file (guix
> describe).  Then, the assumptions become:
>
>  1. solve the link rot; tackled by Software Heritage,
>  2. Linux kernel API backward compatibility,
>  3. hardware compatibility,

I think the tweet above is about reproducing an environment after
effectively freezing its constituent packages and their dependencies, as
you describe. They probably used something like

conda env export

which outputs something similar to this (trimmed):

name: justnumpy
channels:
  - defaults
dependencies:
  - _libgcc_mutex=0.1=main
  - _openmp_mutex=5.1=1_gnu
  - blas=1.0=mkl
  - libuuid=1.41.5=h5eee18b_0
  - mkl=2021.4.0=h06a4308_640
  - mkl-service=2.4.0=py310h7f8727e_0
  - mkl_fft=1.3.1=py310hd6ae3a3_0
  - mkl_random=1.2.2=py310h00e6091_0
  - ncurses=6.3=h5eee18b_3
  - numpy=1.23.4=py310hd5efca6_0
  - numpy-base=1.23.4=py310h8e6c178_0
  - ...
  - ...
prefix: /home/thibault/miniconda3/envs/justnumpy
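
Each entry under "dependencies" appears to have the form
name=version=build. Below is a small Python sketch that splits a few of
the entries above into those three fields. The Pin type and parse_pin
helper are names I made up for illustration, and I am assuming every
exported entry has exactly three '='-separated fields, as in the output
shown here.

```python
from typing import NamedTuple


class Pin(NamedTuple):
    name: str
    version: str
    build: str


def parse_pin(spec: str) -> Pin:
    """Split one 'name=version=build' entry into its three fields."""
    parts = spec.split("=")
    if len(parts) != 3:
        raise ValueError(f"not a fully pinned entry: {spec!r}")
    return Pin(*parts)


# A few entries copied from the export above.
exported = [
    "blas=1.0=mkl",
    "mkl=2021.4.0=h06a4308_640",
    "numpy=1.23.4=py310hd5efca6_0",
]

for pin in map(parse_pin, exported):
    print(f"{pin.name:6} version={pin.version:10} build={pin.build}")
```

So the export pins a version and a build string for every package,
rather than a range like the constraints in your example. Whether those
exact builds can still be fetched later is a separate question.
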

> If I might, here some stuff: :-)
>
> https://www.nature.com/articles/s41597-022-01720-9
> https://simon.tournier.info/posts/2022-11-08-bluehats.html
> https://simon.tournier.info/posts/2022-04-15-cafe-guix-long-term.html

Great stuff - thank you. Congratulations on the paper!

-- Thibault

