From: Simon Tournier <zimon.toutoune@gmail.com>
To: Ricardo Wurmus <rekado@elephly.net>, "Lestang, Thibault"
 <t.lestang@imperial.ac.uk>
Cc: Ludovic Courtès <ludovic.courtes@inria.fr>,
 guix-science@gnu.org
Subject: Re: Conda environments and reproducibility
In-Reply-To: <87r0ts3o8n.fsf@elephly.net>
References: <87pmd7ar8k.fsf@imperial.ac.uk> <m1h6yioobp.fsf@fastmail.net>
 <87zgcayre2.fsf@imperial.ac.uk>
 <CA+Jv8O1VzXjPgZ04HaDHpeyvuDqaU_e2FYdsckhDzyi8Dgi8Pg@mail.gmail.com>
 <86y1rt5xoz.fsf@gmail.com> <87fsdfejfv.fsf@imperial.ac.uk>
 <87a60jy2e7.fsf@gnu.org>
 <CAJ3okZ1d9HSV9MRSto4kaBG69SCGFpB7EavQ-6dff4+gfyWTVQ@mail.gmail.com>
 <LO2P265MB059058CA46ECF928E3322DF2DDB99@LO2P265MB0590.GBRP265.PROD.OUTLOOK.COM>
 <87r0ts3o8n.fsf@elephly.net>
Date: Mon, 13 Mar 2023 13:38:52 +0100
Message-ID: <87a60gn7vn.fsf@gmail.com>

Hi,

On lun., 13 mars 2023 at 12:00, Ricardo Wurmus <rekado@elephly.net> wrote:

>> If the process of reproducing the environment is going to fail at some point, I
>> wonder if we could accelerate this process by defining a more complex environment.
>> Any ideas?

Maybe something using PyTorch or some other ML framework.


> A more complex environment would increase the chance of failure because
> it increases the complexity of the challenge to the resolver.  While it
> would be a useful demonstration to see the resolver fail I think it is
> the least damning kind of failure.

Yes, I agree the solver will be the last thing to break.  Well, from my
understanding of [1], the breakage of the Conda solver depends on the
state of their index.  Quoting [1]:

        This is where the SAT solver will act. It will use the list of MatchSpec
        objects to pick a number of PackageRecord entries from the index, thus
        building the “final state of the solved environment”. This is detailed
        later in this deep dive guide, if you need more info.

so the more complex the environment, the harder the SAT problem is to
solve, and finding a solution can be slow.  That’s why they implemented
various solvers [2].  And it is not clear to me whether the solvers
described in [1] and [2] always lead to the same environment.
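
As a toy illustration of the dependence on the index state, here is a
minimal Python sketch.  It is not Conda's algorithm: the `resolve`
helper, the version tuples and both index snapshots are made up for the
example.  The same loosely pinned request, resolved against two
snapshots of the index, gives two different environments, and a tight
pin fails once the pinned version has left the index.

```python
# Toy illustration only: not Conda's algorithm, all names are hypothetical.

def resolve(request, index):
    """For each requested package, pick the newest version in the index
    that lies within the inclusive (low, high) bounds of the request."""
    env = {}
    for name, (low, high) in request.items():
        candidates = [v for v in index.get(name, []) if low <= v <= high]
        if not candidates:
            raise LookupError(f"no version of {name} in [{low}, {high}]")
        env[name] = max(candidates)
    return env

# The same loosely pinned request...
request = {"numpy": ((1, 20), (1, 99)), "python": ((3, 8), (3, 99))}

# ...against two snapshots of a hypothetical index, taken at different dates.
index_2022 = {"numpy": [(1, 21), (1, 22)], "python": [(3, 9), (3, 10)]}
index_2023 = {"numpy": [(1, 22), (1, 24)], "python": [(3, 10), (3, 11)]}

env_2022 = resolve(request, index_2022)
env_2023 = resolve(request, index_2023)
print(env_2022)
print(env_2023)
```

A request pinned to numpy (1, 21) only would resolve against
`index_2022` and raise `LookupError` against `index_2023`, because that
version is no longer listed there.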

To my knowledge, the issue is well-identified, for instance by the
Mancoosi project [3]; in short, it reads:

        4.2 Package installation is NP-Complete

        Theorem 1: Checking whether a single package P can be installed,
                   given a repository R, is NP-complete.

        4.2.4 Conclusions

        Despite the apparent differences, the constraint languages in DEB and
        RPM are sensibly equivalent in expressiveness, and the associated
        installation problems are both NP-complete.

        This means that automatic package installation tools like APT, URPMI or
        SMART live dangerously on the edge of intractability, and must carefully
        apply heuristics that may be either safe (the approach advocated by
        SMART), and hence still not guaranteed to avoid intractability, or
        unsafe, thus accepting the risk of not always finding a solution when it
        exists.
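
To give a feeling for the installability question of Theorem 1, here is
a minimal brute-force sketch in Python.  It is only an illustration
under assumptions: the repository `REPO`, its package names and the
helpers `is_consistent` and `installable` are hypothetical, and real
tools use SAT solvers or heuristics rather than this exhaustive search.
Dependencies are groups of alternatives (at least one of each group
must be selected) and conflicts forbid co-installation.

```python
from itertools import combinations

# Hypothetical repository, for illustration only.
REPO = {
    "app":    {"depends": [["libA-1", "libA-2"], ["libB"]], "conflicts": []},
    "libA-1": {"depends": [], "conflicts": ["libA-2"]},
    "libA-2": {"depends": [], "conflicts": ["libA-1"]},
    "libB":   {"depends": [["libA-2"]], "conflicts": []},
}

def is_consistent(selection, repo):
    """True if every selected package has each dependency group satisfied
    and none of its conflicts is selected."""
    for pkg in selection:
        meta = repo[pkg]
        if any(not set(group) & selection for group in meta["depends"]):
            return False
        if any(other in selection for other in meta["conflicts"]):
            return False
    return True

def installable(target, repo):
    """Return a smallest consistent set containing target, or None.
    The loop visits up to 2**(n-1) subsets for n packages."""
    others = [p for p in repo if p != target]
    for size in range(len(others) + 1):
        for extra in combinations(others, size):
            selection = {target, *extra}
            if is_consistent(selection, repo):
                return selection
    return None

print(installable("app", REPO))
```

Checking one candidate set is cheap; the cost comes from the number of
candidate sets, which doubles with each package added to the
repository.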

Therefore, I do not see why Conda would be different.  Admittedly, it
could be hard to construct a concrete example of a failure for the SAT
solver part.  Moreover, the Conda documentation reads [1]:

        Explicit package installs

        These commands do not need a solver because the requested packages are
        expressed with a direct URL or path to a specific tarball. Instead of a
        MatchSpec, we already have a PackageRecord-like entity! For this to
        work, all the requested packages need to be URLs or paths. They can be
        typed in the command line or in a text file including a @EXPLICIT line.

        Since the solver is not involved, the dependencies of the explicit
        package(s) are not processed at all. This can leave the environment in
        an inconsistent state, which can be fixed by running conda update --all,
        for example.

        Explicit installs are taken care of by the explicit function.
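
To make the quoted rule concrete, here is a small Python sketch of what
an explicit list amounts to.  It is an assumption-laden illustration,
not Conda's parser: the layout mimics what `conda list --explicit`
prints, while the `parse_explicit` helper and the example.org URLs are
placeholders.  Every entry must already be a URL or a path, and the
entries are returned as written, with no dependency processing.

```python
# Sketch only: mimics the layout printed by `conda list --explicit`;
# this is not Conda's own parser, and the URLs are placeholders.

SPEC = """\
# platform: linux-64
@EXPLICIT
https://conda.example.org/linux-64/python-3.11.0-0.tar.bz2
https://conda.example.org/linux-64/numpy-1.24.2-0.tar.bz2
"""

def parse_explicit(text):
    """Return the listed tarballs in file order; nothing is resolved."""
    lines = [line.strip() for line in text.splitlines()]
    lines = [line for line in lines if line and not line.startswith("#")]
    if "@EXPLICIT" not in lines:
        raise ValueError("not an explicit spec: no @EXPLICIT line")
    entries = [line for line in lines if line != "@EXPLICIT"]
    for entry in entries:
        # A name or constraint such as "numpy>=1.20" would need a solver.
        if not entry.startswith(("http://", "https://", "file://", "/")):
            raise ValueError(f"not a URL or path: {entry}")
    return entries

print(parse_explicit(SPEC))
```

Such a list only replays what was recorded; whether the tarballs are
still served at those URLs is a separate question.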

For sure, the failure of Conda is by design.  And as with many things in
life, people only believe what they see with their own eyes. :-)

1: https://docs.conda.io/projects/conda/en/latest/dev-guide/deep-dives/solvers.html
2: https://conda.github.io/conda-libmamba-solver/libmamba-vs-classic/
3: https://www.mancoosi.org/edos/algorithmic/


>> Simon Tournier <zimon.toutoune@gmail.com> writes:
>>
>>> 1. also use the image continuumio/miniconda3:latest
>>> 2. install Miniconda on the top of the Docker image of Debian
>>>   unstable and run "apt update && apt upgrade"
>>>
>>> And I expect that #2 will break first, then #1 and last the current
>>> one.
>>
>> Could you elaborate on this? For context the current pipeline
>> pulls a pinned miniconda image then updates conda (=conda update conda=).
>> Do you expect system libraries (I mean software installed through apt, not
>> managed by conda) to influence the conda environment creation?  My current
>> understanding is that conda brings its own copies of these libraries without relying
>> on whatever was/will be installed through other ways (e.g. apt).
>
> This depends on the packages.  There are packages that do link with
> system libraries, and these are provided by a base image in which the
> binary artefacts are built.

As Ricardo explained, sometimes Conda relies on system libraries.  Guix
makes the assumption of a compatible Linux kernel.  Conda also makes
assumptions and, to my knowledge, they are less strict about isolated
environments.

That’s why replacing the base image could also help to expose examples
where it breaks.

Just to point out that I was at a workshop on Reproducible Research last
week and I talked with the developer of BenchOpt [4].  Their aim is to
keep the computational stack for some ML frameworks working as time
passes, by making their benchmarks evolve.  In other words, they take
the opposite direction to Guix.  If they do that, that’s because it is
not possible to run the old stack again. :-)

4: https://benchopt.github.io/


It is hard to predict beforehand where Conda will break. :-)  From my
point of view, from most to least probable:

 1. because of the underlying Linux distribution base
 2. because of the SAT solver

Well, for testing #1, I propose:

 a) to also run the pipeline using continuumio/miniconda3:latest
 b) to install Conda
     i) on top of Debian
     ii) on top of Ubuntu
    and then run the script

As a corollary, it will also test #2. ;-)

The current script is about NumPy; maybe using PyTorch instead would
accelerate the process.


Thanks for the discussion about this topic.  If no one beats me to it, I
will adapt .gitlab-ci.yml.  Well, do not hold your breath… holidays
first! ;-)


Cheers,
simon