unofficial mirror of guix-science@gnu.org 
From: Hugo Buddelmeijer <hugo@buddelmeijer.nl>
To: Thibault Lestang <t.lestang@imperial.ac.uk>
Cc: Konrad Hinsen <konrad.hinsen@cnrs.fr>,
	guix-science <guix-science@gnu.org>
Subject: Re: Conda environments and reproducibility
Date: Tue, 29 Nov 2022 14:12:55 +0100
Message-ID: <CA+Jv8O1VzXjPgZ04HaDHpeyvuDqaU_e2FYdsckhDzyi8Dgi8Pg@mail.gmail.com>
In-Reply-To: <87zgcayre2.fsf@imperial.ac.uk>


Hi Konrad, Thibault and others,

Konrad, is it perhaps possible for you to dig up this broken conda
environment file?

First, just like you all, my conclusion is that guix is the answer. The
last two paragraphs by Simon capture it succinctly. However, conda seems
to work fine for most people. It would therefore be instructive to have
concrete 'failure stories' in order to show people that conda is not enough.


On Tue, 29 Nov 2022 at 11:32, Thibault Lestang <t.lestang@imperial.ac.uk>
wrote:

> That's fair enough. Conda & pip are everywhere around me, and I'd like
> to form an accurate picture of their shortcomings before mentioning
> alternative approaches to people who use these tools every day!


I agree, let me share my perspective.

Konrad Hinsen <konrad.hinsen@cnrs.fr> writes:
> > That's in a way what happened in my scenario: rebuilding with a new
> > compilation infrastructure produces different packages that share
> > version numbers and tags with the prior ones.
>
> Okay - this is an explanation I can understand. A better approach
> would have been /not/ to overwrite existing package binaries with new
> ones produced from the new infrastructure.
>

It doesn't seem common to overwrite conda binaries. Conda takes some (not
enough?) measures to prevent the scenario Konrad describes. In particular,
the filenames include a 'hash' since conda-build 3 (~2017) [1]:

> In the past, we have had things like py27np111 in filenames. This is the
> same idea, just generalized. Since we can't readily put every possible
> constraint into the filename, we have kept the old ones, but added the hash
> as a general solution.
>

This hash includes information about the compiler used (~2017) [2, 3]:

> The build hash will be added to the build string if these are true for any
> dependency: [...] package uses {{ compiler() }} jinja2 function
>

That is, "conda env export" should contain entries like
"scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely identify
the dependencies 'that matter', such as which compiler is used. What goes
into the hash seems rather complicated, and the list grows over time.
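
To make the structure of such an entry concrete, here is a minimal Python
sketch that splits one exported line into its parts. The function name and
the regular expressions are my own, not part of conda, and I assume the
hash is an "h" followed by 7 hex digits just before the build number.
Older build strings without a hash simply yield None for it.

```python
import re

# One "name=version=build" entry as printed by "conda env export".
SPEC = re.compile(r"^(?P<name>[^=]+)=(?P<version>[^=]+)=(?P<build>.+)$")

# Assumed build-string layout: optional prefix (e.g. "py39"), then "h" plus
# 7 hex digits (the hash), then "_" and the build number.
BUILD = re.compile(r"^(?P<prefix>.*?)h(?P<hash>[0-9a-f]{7})_(?P<number>\d+)$")


def parse_spec(spec: str) -> dict:
    """Split a conda spec into name, version, build string and hash parts."""
    match = SPEC.match(spec)
    if match is None:
        raise ValueError(f"not a name=version=build spec: {spec!r}")
    parts = match.groupdict()
    build = BUILD.match(parts["build"])
    # Build strings that predate the hash do not match; report None then.
    parts["prefix"] = build.group("prefix") if build else None
    parts["hash"] = build.group("hash") if build else None
    parts["build_number"] = int(build.group("number")) if build else None
    return parts


if __name__ == "__main__":
    print(parse_spec("scipy=1.8.0=py39hee8e79c_1"))
```

Two environments can then be compared hash by hash, which at least shows
where the builds differ even when the version numbers agree.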


This hash is a great step forward for reproducibility, but it is too
fragile. I can't see exactly how, but I can easily imagine that this
dependency-hash mechanism leads to the problem that Konrad faced even when
no files are overwritten, perhaps because a new dependency resolver in
conda applies stricter rules on interoperability. (It is still possible
that files were indeed overwritten, though; it was probably an incident
like this that made them change the hashes.)
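
To illustrate what I mean by fragile, here is a toy Python sketch of a hash
over the inputs 'that matter'. It is not conda-build's actual algorithm:
the key list, the dictionary layout and the "build_image" field are all
made up for the example. The point is only that anything left out of the
hashed set can change without changing the build string.

```python
import hashlib
import json

# Hypothetical list of inputs that are allowed to influence the hash.
HASHED_KEYS = ("c_compiler", "c_compiler_version", "python")


def build_hash(build_inputs: dict) -> str:
    """Return 'h' plus 7 hex digits computed from the selected inputs only."""
    relevant = {k: build_inputs[k] for k in HASHED_KEYS if k in build_inputs}
    payload = json.dumps(relevant, sort_keys=True).encode("utf-8")
    return "h" + hashlib.sha1(payload).hexdigest()[:7]


# Two builds that differ only in something outside the hashed set
# ("build_image" is an invented stand-in for the build infrastructure).
old = {"c_compiler": "gcc", "c_compiler_version": "9",
       "python": "3.9", "build_image": "2021-03"}
rebuilt = dict(old, build_image="2022-11")
new_compiler = dict(old, c_compiler_version="11")

if __name__ == "__main__":
    print(build_hash(old) == build_hash(rebuilt))       # same build string
    print(build_hash(old) == build_hash(new_compiler))  # different one
```

Every time such a gap is found, another key has to be added to the list,
which is how the set of hashed inputs keeps growing.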

My realization was that improving these hashes is a wild goose chase and
will ultimately lead to horrific things like "Turing-complete YAML files".
And at that point it is clear, at least to me, that guix is the answer.


One thing that conda (or actually conda-forge) does well is its bots.
I'm a maintainer of some conda packages and once a month or so I get a
fully automated pull request to update my package [4], e.g. when the
upstream package is updated, or when a dependency is updated. They even
have a tracking system for migrating dependencies that are used by many
packages, such as compilers. This makes maintaining conda-forge packages a
breeze. Having such bots within the guix ecosystem as well would probably
help attract developers.

By the way, it is quite hard to use conda in guix, primarily because "conda
activate myenvironment" will try to set PS1 by calling a bash function
called 'conda'. This bash function calls the 'conda' executable, which
takes PS1, modifies it, and returns it to the bash function. The bash
function subsequently sets PS1 (and makes a backup for deactivating the
environment again). However, in guix the conda executable is replaced by a
bash script that calls conda_real. And bash scripts drop PS1 (because they
run in non-interactive mode), so conda_real gets an empty PS1, fails to
modify it, and then the bash function sets PS1 to nothing. I've got it
working properly on my machine, but don't feel comfortable enough yet with
Scheme / guix to provide a proper patch. The simplest fix might be to use
another shell for the conda wrapper (because I believe only bash drops
PS1); I'm not sure whether that is possible in guix. And I would rather
make guix packages of everything and ditch conda altogether. But
supporting conda properly would help more people transition.
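
Since that chain of calls is hard to follow in prose, here is a toy Python
model of it. It does not run bash or conda; the three functions and the
CONDA_PROMPT_BACKUP name are stand-ins I made up, and conda_real returning
an empty prompt when PS1 is missing follows my description above rather
than conda's source.

```python
from typing import Callable, Dict

Env = Dict[str, str]


def conda_real(env: Env, env_name: str) -> str:
    """Stand-in for the real conda program: compute the new prompt."""
    old_prompt = env.get("PS1")
    if old_prompt is None:
        return ""  # nothing to modify, so nothing comes back
    return f"({env_name}) {old_prompt}"


def bash_wrapper(env: Env, env_name: str) -> str:
    """Stand-in for the wrapper script: PS1 is lost before conda_real runs."""
    child_env = {k: v for k, v in env.items() if k != "PS1"}
    return conda_real(child_env, env_name)


def activate(shell_env: Env, env_name: str,
             conda_executable: Callable[[Env, str], str]) -> Env:
    """Stand-in for the 'conda' shell function in the interactive shell."""
    new_env = dict(shell_env)
    # Hypothetical variable name for the backup used on deactivation.
    new_env["CONDA_PROMPT_BACKUP"] = shell_env.get("PS1", "")
    new_env["PS1"] = conda_executable(shell_env, env_name)
    return new_env


if __name__ == "__main__":
    shell = {"PS1": "$ "}
    print(repr(activate(shell, "myenvironment", conda_real)["PS1"]))
    print(repr(activate(shell, "myenvironment", bash_wrapper)["PS1"]))
```

Calling conda_real directly keeps the prompt, while going through the
wrapper leaves PS1 empty, which is the behaviour I see.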

(Oh, this reminds me of the problems of activation and deactivation scripts
in conda. For another time.)

Greetings,
Hugo


[1] https://www.anaconda.com/blog/package-better-conda-build-3
[2] https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html
[3] https://github.com/conda/conda-build/blob/e4d9b3bd255565d47b6ab6b93380ef246b2a1ddf/conda_build/metadata.py#L1294
[4] https://github.com/conda-forge/python-cpl-feedstock/pulls?q=is%3Apr+is%3Aclosed

