From mboxrd@z Thu Jan  1 00:00:00 1970
From: =?UTF-8?Q?G=C3=A1bor_Boskovits?= <boskovits@gmail.com>
Subject: Re: Treating tests as special case
Date: Thu, 5 Apr 2018 08:05:39 +0200
Message-ID: <CAE4v=pjxkScynVas=WVn0acK-OCH0F+WO8PRmeJJW-v1Ma7kvA@mail.gmail.com>
References: <20180405052439.GA30291@thebird.nl>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a113eb8d49da3e7056913bad3"
Return-path: <guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org>
Received: from eggs.gnu.org ([2001:4830:134:3::10]:50399)
	by lists.gnu.org with esmtp (Exim 4.71)
	(envelope-from <boskovits@gmail.com>) id 1f3y1m-0005RT-E3
	for guix-devel@gnu.org; Thu, 05 Apr 2018 02:05:43 -0400
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
	(envelope-from <boskovits@gmail.com>) id 1f3y1l-0006AR-0s
	for guix-devel@gnu.org; Thu, 05 Apr 2018 02:05:42 -0400
Received: from mail-io0-x22a.google.com ([2607:f8b0:4001:c06::22a]:37806)
	by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16)
	(Exim 4.71) (envelope-from <boskovits@gmail.com>) id 1f3y1k-0006AA-PT
	for guix-devel@gnu.org; Thu, 05 Apr 2018 02:05:40 -0400
Received: by mail-io0-x22a.google.com with SMTP id y128so29157758iod.4
	for <guix-devel@gnu.org>; Wed, 04 Apr 2018 23:05:40 -0700 (PDT)
In-Reply-To: <20180405052439.GA30291@thebird.nl>
List-Id: "Development of GNU Guix and the GNU System distribution."
	<guix-devel.gnu.org>
List-Unsubscribe: <https://lists.gnu.org/mailman/options/guix-devel>,
	<mailto:guix-devel-request@gnu.org?subject=unsubscribe>
List-Archive: <http://lists.gnu.org/archive/html/guix-devel/>
List-Post: <mailto:guix-devel@gnu.org>
List-Help: <mailto:guix-devel-request@gnu.org?subject=help>
List-Subscribe: <https://lists.gnu.org/mailman/listinfo/guix-devel>,
	<mailto:guix-devel-request@gnu.org?subject=subscribe>
Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org
Sender: "Guix-devel" <guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org>
To: Pjotr Prins <pjotr.public12@thebird.nl>
Cc: Guix-devel <guix-devel@gnu.org>

--001a113eb8d49da3e7056913bad3
Content-Type: text/plain; charset="UTF-8"

2018-04-05 7:24 GMT+02:00 Pjotr Prins <pjotr.public12@thebird.nl>:

> Last night I was watching Rich Hickey's talk on Specs and deployment. It is
> a very interesting talk in many ways, recommended. He talks about
> tests at 1:02 into the talk:
>
>   https://www.youtube.com/watch?v=oyLBGkS5ICk
>
> and he gave me a new insight which rang immediately true. He said:
> what is the point of running tests everywhere? If two people test the
> same thing, what is the added value of that? (I paraphrase)


Actually, running tests checks the behaviour of the software, and
unfortunately a reproducible build does not guarantee reproducible
behaviour. Furthermore, there are still cases where the environment
around the running software is not the same, such as hardware or
kernel configuration settings leaking into the environment.
These can be spotted by running the tests. Nondeterministic
failures can also be spotted more easily that way. I guess pulling
test results could work for a lot of packages, but probably not
for all of them. WDYT?
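
To make the trade-off concrete, here is a rough sketch of how "pulling"
test results could work. It is only an illustration under my own
assumptions: none of these names exist in Guix, and result_key,
should_run_tests, known_good and always_test are all hypothetical. Note
that keying on the system type alone would still miss the kernel
configuration leaks mentioned above.

```python
import hashlib


def result_key(derivation_hash: str, system: str) -> str:
    """Combine the build hash with the system type it was tested on."""
    return hashlib.sha256(f"{derivation_hash}|{system}".encode()).hexdigest()


def should_run_tests(package, derivation_hash, system, known_good, always_test):
    """Return True unless a matching passing test run was published."""
    if package in always_test:  # packages opted out of skipping (hypothetical list)
        return True
    return result_key(derivation_hash, system) not in known_good


# Hypothetical data: one published passing run for "hello" on x86_64-linux.
known_good = {result_key("abc123", "x86_64-linux")}
always_test = {"openblas"}

print(should_run_tests("hello", "abc123", "x86_64-linux", known_good, always_test))
print(should_run_tests("hello", "abc123", "aarch64-linux", known_good, always_test))
print(should_run_tests("openblas", "abc123", "x86_64-linux", known_good, always_test))
```

Only the first call skips the tests; a different system type or an
opted-out package still runs them locally.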

>
> With Guix a reproducibly building package generates the same Hash on
> all dependencies. Running the same tests every time on that makes no
> sense.
>
> And this hooks in with my main peeve about building from source. The
> building takes long enough. Testing takes incredibly long with many
> packages (especially language related) and is usually single core
> (unlike the build). It is also bad for our carbon footprint. Assuming
> everyone uses Guix on the planet, is that where we want to end up?
>
> Burning down the house.
>
> Like we pull substitutes we could pull a list of hashes of test cases
> that are known to work (on Hydra or elsewhere). This is much lighter
> than storing substitutes, so when the binaries get removed we can
> still retain the test hashes and have fast builds. Also true for guix
> repo itself.
>
> I know there are two 'inputs' I am not accounting for: (1) hardware
> variants and (2) the Linux kernel. But, honestly, I do not think we
> are in the business of testing those. We can assume these work. If
> not, any issues will be found in other ways (typically a segfault ;).
> Our tests are generally meaningless when it comes to (1) and (2). And
> packages that build differently on different platforms, like openblas,
> we should opt out on.
>
> I think this would be a cool innovation (in more ways than one).
>
> Pj.
>
>

--001a113eb8d49da3e7056913bad3--