From mboxrd@z Thu Jan 1 00:00:00 1970
From: Hugo Buddelmeijer
Date: Tue, 29 Nov 2022 14:12:55 +0100
Subject: Re: Conda environments and reproducibility
To: Thibault Lestang
Cc: Konrad Hinsen, guix-science
References: <87pmd7ar8k.fsf@imperial.ac.uk> <87zgcayre2.fsf@imperial.ac.uk>
In-Reply-To: <87zgcayre2.fsf@imperial.ac.uk>
MIME-Version: 1.0
Content-Type: multipart/alternative; boundary="000000000000b0b02405ee9bbf84"

--000000000000b0b02405ee9bbf84
Content-Type: text/plain; charset="UTF-8"

Hi Konrad, Thibault and others,

Konrad, is it perhaps possible for you to dig up this broken conda
environment file?

First, just like you all, my conclusion is that guix is the answer. The
last two paragraphs by Simon capture it succinctly. However, conda seems
to work fine for most people. It would therefore be instructive to have
concrete 'failure stories' to show people that conda is not enough.

On Tue, 29 Nov 2022 at 11:32, Thibault Lestang wrote:
> That's fair enough. Conda & pip are everywhere around me, and I'd like
> to form an accurate picture of their shotcomings before mentioning
> alternative approaches to people who use these tools everyday!

I agree, let me share my perspective.

> Konrad Hinsen writes:
> > That's in a way what happened in my scenario: rebuilding with a new
> > compilation infrastructure produces different packages that share
> > version numbers and tags with the prior ones.
>
> Okay - this is an explanation I can understand. A better approach
> would have been /not/ to overwrite existing package binaries with new
> ones produced from the new infrastructure.

It doesn't seem common to overwrite conda binaries. Conda takes some
(not enough?) measures to prevent the scenario Konrad describes. In
particular, the filenames include a 'hash' since conda-build 3 (~2014)
[1]:

> in the past, we have had things like py27np111 in filenames. This is the
> same idea, just generalized. Since we can't readily put every possible
> constraint into the filename, we have kept the old ones, but added the hash
> as a general solution.

This hash includes information about the compiler used (~2017) [2, 3]:

> The build hash will be added to the build string if these are true for any
> dependency: [...] package uses {{ compiler() }} jinja2 function

That is, "conda env export" should contain entries like
"scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely
identify the dependencies 'that matter', such as which compiler is used.
What goes into the hash seems rather complicated, and it grows over
time.

This hash is a great step forward in reproducibility.
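
To make the build string described above concrete, here is a minimal
Python sketch that splits such an entry into its parts. The names
parse_spec and CondaSpec are hypothetical, and the pattern for the hash
('h' plus 7 hex digits, followed by '_' and the build number) is an
assumption based on the example entry, not something taken from conda's
API.

```python
import re
from typing import NamedTuple, Optional


class CondaSpec(NamedTuple):
    name: str
    version: str
    build_string: str
    build_hash: Optional[str]    # e.g. "hee8e79c"; None if no hash is found
    build_number: Optional[int]  # trailing number after the hash, if any


# Assumed shape of the hashed tail of a build string: h<7 hex digits>_<number>.
_HASH_TAIL = re.compile(r"(h[0-9a-f]{7})_(\d+)$")


def parse_spec(spec: str) -> CondaSpec:
    """Split a 'name=version=build' entry as printed by `conda env export`."""
    name, version, build_string = spec.split("=", 2)
    match = _HASH_TAIL.search(build_string)
    if match is None:
        # Older or hash-less build strings, e.g. "py_0".
        return CondaSpec(name, version, build_string, None, None)
    return CondaSpec(name, version, build_string,
                     match.group(1), int(match.group(2)))


spec = parse_spec("scipy=1.8.0=py39hee8e79c_1")
print(spec.name, spec.version, spec.build_hash, spec.build_number)
```

Comparing build_hash for the same name and version across two exported
environments is one way to spot packages that were rebuilt against
different dependencies; whether the hash captures everything that
matters is a separate question.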
But it is too fragile. I can't directly see how, but I can easily
imagine that this dependency-hash mechanism leads to the problem that
Konrad faced even when no files are overwritten, perhaps because a new
dependency resolver in conda would have stricter rules on
interoperability. (It is still possible that files were indeed
overwritten, though; it was probably an incident like this that made
them change the hashes.)

My realization was that improving these hashes is a wild goose chase and
will ultimately lead to horrific things like "Turing-complete YAML
files". And at that point it is clear, at least to me, that guix is the
answer.

One thing that conda (or actually conda-forge) does well is its bots.
I'm a maintainer of some conda packages and once a month or so I get a
fully automated pull request to update my package [4], e.g. when the
upstream package is updated, or when a dependency is updated. They even
have a tracking system for migrating dependencies that are used by many
packages, such as compilers. This makes maintaining conda-forge packages
a breeze. Having such bots within the guix ecosystem as well would
probably help attract developers.

By the way, it is quite hard to use conda in guix, primarily because
"conda activate myenvironment" will try to set PS1 by calling a bash
function called 'conda'. This bash function calls the 'conda'
executable, which takes PS1, modifies it, and returns it to the bash
function. The bash function subsequently sets PS1 (and makes a backup
for deactivating the environment again). However, in the guix package
the conda executable is replaced by a bash script that calls conda_real.
And bash scripts eat PS1 (bash unsets it when running
non-interactively), so conda_real gets an empty PS1, fails to modify it,
and then the bash function sets PS1 to nothing. I've got it working
properly on my machine, but don't feel comfortable enough yet with
Scheme / guix to provide a proper patch.
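
The PS1 behaviour described above can be checked with a small Python
sketch. It exports PS1 to a non-interactive bash (as a wrapper script
would be) and reports what that shell sees. The function name
ps1_seen_by_bash_script is hypothetical, and the sketch assumes a bash
executable on PATH; it does not involve conda itself.

```python
import os
import shutil
import subprocess
from typing import Optional


def ps1_seen_by_bash_script(ps1: str) -> Optional[str]:
    """Return PS1 as seen inside a non-interactive bash, or None without bash."""
    bash = shutil.which("bash")
    if bash is None:
        return None
    # Export PS1 to the child, as the calling shell function would.
    env = dict(os.environ, PS1=ps1)
    # BASH_ENV is sourced by non-interactive shells and could set PS1 again.
    env.pop("BASH_ENV", None)
    result = subprocess.run(
        [bash, "-c", 'printf "%s" "$PS1"'],  # `bash -c` is non-interactive
        env=env,
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


print(repr(ps1_seen_by_bash_script("(myenvironment) $ ")))
```

If the result is an empty string, anything the wrapper script then
executes (conda_real in this case) inherits no PS1 at all, which matches
the symptom described.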
The simplest fix might be to use another shell for the conda package
(because I believe only bash eats PS1); I'm not sure whether that is
possible in guix. And I would rather make guix packages of everything
and ditch conda altogether. But supporting conda properly would help
more people transition.

(Oh, this reminds me of the problems of activation and deactivation
scripts in conda. For another time.)

Greetings,
Hugo

[1] https://www.anaconda.com/blog/package-better-conda-build-3
[2] https://docs.conda.io/projects/conda-build/en/stable/resources/define-metadata.html
[3] https://github.com/conda/conda-build/blob/e4d9b3bd255565d47b6ab6b93380ef246b2a1ddf/conda_build/metadata.py#L1294
[4] https://github.com/conda-forge/python-cpl-feedstock/pulls?q=is%3Apr+is%3Aclosed

--000000000000b0b02405ee9bbf84
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--000000000000b0b02405ee9bbf84--