From: Simon Tournier
To: Hugo Buddelmeijer, Thibault Lestang
Cc: Konrad Hinsen, guix-science
Subject: Re: Conda environments and reproducibility
Date: Tue, 29 Nov 2022 21:10:20 +0100
Message-ID: <86y1rt5xoz.fsf@gmail.com>

Hi Hugo, all,

On Tue, 29 Nov 2022 at 14:12, Hugo Buddelmeijer wrote:

> However, conda seems
> to work fine for most people. It would therefore be instructive to have
> concrete 'failure stories' in order to show people that conda is not enough.

Here is what I would do if I were trying to convince my colleagues that
Conda is not enough.

 1. Target one or two common environments; for example,
    (Python+Numpy+Scipy+Matplotlib) for one, and (R+Seurat) for two.

 2. Generate both environments following the Conda documentation.

    Up to here, everything should work smoothly. :-)

 3. Commit the Conda files in a Git repository; for instance,

        for e in py rseurat
        do
            conda activate "$e"
            conda env export > "environment-$e.yml"
            conda list --explicit > "explicit-spec-$e.txt"
            conda deactivate
        done

 4. a) On the same machine, try to recreate the 2 environments.
    b) On another machine, idem.
    c) Commit to the Git repository how it goes.
    d) Remove the two environments, and more, on both machines.

 5. Every new month, do #4.

Maybe it can be automated with a Cron task. And maybe we could
collectively do this experiment. And we could do the same with Guix.
:-)

Well, we have not spoken about running something. We could also write a
small Python script plotting something using Numpy and/or Scipy and try
to run the Seurat vignette.

From my experience, after some months (from 2-3 to 6), Conda will fail,
especially after an update of the system (apt upgrade), and it can be
worse with a 'dist-upgrade'. :-)

> On Tue, 29 Nov 2022 at 11:32, Thibault Lestang
> wrote:
>
>> That's fair enough. Conda & pip are everywhere around me, and I'd like
>> to form an accurate picture of their shortcomings before mentioning
>> alternative approaches to people who use these tools everyday!
>
> I agree, let me share my perspective.

Conda and pip work very well when we have in mind a forward view of the
history. By design, they fail when looking backward. For engineering,
they are very efficient and personally I would rely on them **if** I had
some systems to maintain and cared only about upgrading them. Well, on
Conda, pip or some other distro package manager.

The trouble starts when you try to restore the past. The 10 Years
Challenge [1] provides very good examples. This report [2] (in French,
but an English version is probably around) provides very good insights,
IMHO, about the limitations of classical package managers (such as
those of Debian, Conda, pip, etc.)

For what my biased opinion is worth, there are many shortcomings. :-)
For instance, this paper [3] points out that the reproduction was «so
time-consuming and resulted in only 11 out of 28 (39%) figure panels
conveying the same information». Well, for sure it is hard to know if
the students tried hard or not, and the paper does not say much about
the computational environment. (Well, aside from the transparency of
the computational stack that Conda barely provides, but that's another
story.
:-))

1:
2:
3:

> That is, "conda env export" should contain entries like
> "scipy=1.8.0=py39hee8e79c_1", where the hee8e79c should uniquely define the
> dependencies 'that matter', like which compiler is used. What goes into the
> hash seems rather complicated, and grows over time.
>
> This hash is a great step forward in reproducibility. But it is too
> fragile. I can't directly see how, but I can easily assume that this
> dependency-hash mechanism leads to the problem that Konrad faced even when
> no files are overwritten. Maybe because a new dependency resolver in conda
> would have stricter rules on interoperability. (It is still possible that
> files indeed were overwritten though; it was probably an incident like this
> that made them change the hashes.)

Well, I think the Conda documentation [4] about the dependency solver
puts some warnings around this explicit mechanism. It has been a long
time since I last looked at Conda, but from my understanding of the
solver documentation, this “failure” reported by Konrad appears to me
expected, by design of Conda. ;-)

If the solver tries to satisfy many constraints, then the problem
becomes more complex as time goes by, so Conda probably fails to find a
working combination. If the solver is bypassed, then there is no
guarantee that the generated state is a working computational
environment. Conda recommends updating in order to fix the potential
issues.

4:

> One thing that conda (or actually conda-forge) does well, are their bots.
> I'm a maintainer of some conda packages and once a month or so I get a
> fully automated pull request to update my package [4], e.g. when the
> upstream package is updated, or when a dependency is updated. They even
> have a tracking system for migrating dependencies that are used by many
> packages, such as compilers. This makes maintaining conda-forge packages a
> breeze.
> Having such bots also within the guix-ecosystem would probably help
> attract developers.

Cool! Do you know if the code of these bots is available?

> By the way, it is quite hard to use conda in guix,

Maybe you could open bugs and/or report on help-guix or guix-devel the
annoyances you are observing. For instance, I fully removed Conda from
my toolbox so I never hit any annoyance. ;-)

Cheers,
simon
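
PS: A minimal sketch of how steps #4 and #5 above could be scripted,
under some assumptions of mine: the function name record_attempt, the
log file recreate-log.txt and the "check-" prefix for the throwaway
environments are all made up for illustration, and the environment-*.yml
files are the ones exported in step #3. It only records whether the
recreation succeeded; it does not run anything inside the environment.

```shell
#!/bin/sh
# record_attempt LOG NAME COMMAND...
# Run COMMAND silently and append "DATE NAME ok" or "DATE NAME FAILED" to LOG.
# (Hypothetical helper, not part of Conda.)
record_attempt() {
    log=$1
    name=$2
    shift 2
    if "$@" >/dev/null 2>&1; then
        status=ok
    else
        status=FAILED
    fi
    printf '%s %s %s\n' "$(date +%Y-%m-%d)" "$name" "$status" >> "$log"
}

# Step #4: try to recreate each environment from its exported file,
# log the outcome, then remove the throwaway environment again.
# Skipped entirely on machines where conda is not installed.
if command -v conda >/dev/null 2>&1; then
    for e in py rseurat
    do
        record_attempt recreate-log.txt "$e" \
            conda env create --name "check-$e" --file "environment-$e.yml"
        conda env remove --name "check-$e" --yes >/dev/null 2>&1
    done
fi
```

Run from the Git repository once a month (by hand or from a Cron task),
then commit recreate-log.txt; the history of that file is the record of
"how it goes" asked for in step #4 c).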