Treating tests as special case

unofficial mirror of guix-devel@gnu.org 

* Treating tests as special case
@ 2018-04-05  5:24 Pjotr Prins
  2018-04-05  6:05 ` Gábor Boskovits
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05  5:24 UTC (permalink / raw)
  To: guix-devel

Last night I was watching Rich Hickey's talk on specs and deployment.
It is a very interesting talk in many ways; recommended. He talks
about tests around 1:02 into the talk:

  https://www.youtube.com/watch?v=oyLBGkS5ICk

and he gave me a new insight that immediately rang true. He said:
what is the point of running tests everywhere? If two people test the
same thing, what is the added value of that? (I paraphrase)

With Guix, a reproducibly building package generates the same hash
given the same dependencies. Running the same tests every time on
that makes no sense.
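To make the premise concrete, here is a minimal Python sketch of identifying a build by hashing its inputs. The derivation_hash helper is hypothetical and far simpler than the way Guix actually serializes and hashes derivations; it only shows that identical inputs yield an identical identifier, while changing any dependency changes it.

```python
import hashlib

def derivation_hash(name, recipe, input_hashes):
    """Hypothetical, simplified stand-in for a derivation hash.

    Hashes the package name, its build recipe and the hashes of all of
    its inputs. Inputs are sorted so that their listing order does not
    matter. Real Guix derivations are serialized and hashed differently.
    """
    h = hashlib.sha256()
    h.update(name.encode() + b"\0" + recipe.encode())
    for dep in sorted(input_hashes):
        h.update(b"\0" + dep.encode())
    return h.hexdigest()

gcc = derivation_hash("gcc", "./configure && make", [])
zlib = derivation_hash("zlib", "./configure && make", [gcc])

# Same inputs (listed in a different order): same hash.
git_a = derivation_hash("git", "make", [gcc, zlib])
git_b = derivation_hash("git", "make", [zlib, gcc])
print(git_a == git_b)  # True

# A changed dependency propagates into a different hash for git.
zlib_o3 = derivation_hash("zlib", "./configure && make CFLAGS=-O3", [gcc])
git_c = derivation_hash("git", "make", [gcc, zlib_o3])
print(git_a == git_c)  # False
```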

And this hooks into my pet peeve about building from source. The
build takes long enough. Testing takes incredibly long for many
packages (especially language-related ones), and tests usually run on
a single core (unlike the build). It is also bad for our carbon
footprint. If everyone on the planet used Guix, is that where we want
to end up?

Burning down the house.

Just as we pull substitutes, we could pull a list of hashes of test
cases that are known to pass (on Hydra or elsewhere). This is much
lighter than storing substitutes, so even when the binaries get
removed we can retain the test hashes and have fast builds. The same
is true for the guix repo itself.

I know there are two 'inputs' I am not accounting for: (1) hardware
variants and (2) the Linux kernel. But, honestly, I do not think we
are in the business of testing those. We can assume these work. If
not, any issues will be found in other ways (typically a segfault ;).
Our tests are generally meaningless when it comes to (1) and (2). And
we should opt out packages that build differently on different
platforms, like openblas.

I think this would be a cool innovation (in more ways than one).

Pj.

* Re: Treating tests as special case
  2018-04-05  5:24 Treating tests as special case Pjotr Prins
@ 2018-04-05  6:05 ` Gábor Boskovits
  2018-04-05  8:39   ` Pjotr Prins
  2018-04-05  6:21 ` Björn Höfling
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Gábor Boskovits @ 2018-04-05  6:05 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: Guix-devel

2018-04-05 7:24 GMT+02:00 Pjotr Prins <pjotr.public12@thebird.nl>:

> Last night I was watching Rich Hickey's on Specs and deployment. It is
> a very interesting talk in many ways, recommended. He talks about
> tests at 1:02 into the talk:
>
>   https://www.youtube.com/watch?v=oyLBGkS5ICk
>
> and he gave me a new insight which rang immediately true. He said:
> what is the point of running tests everywhere? If two people test the
> same thing, what is the added value of that? (I paraphrase)


Actually, running tests checks the behaviour of the software.
Unfortunately, a reproducible build does not guarantee reproducible
behaviour. Furthermore, there are still cases where the environment
around the running software is not the same, like hardware or kernel
configuration settings leaking into the environment. These can be
spotted by running tests. Nondeterministic failures can also be
spotted more easily. There are a lot of packages for which pulling
test results could work, I guess, but probably not for all of them.
WDYT?

* Re: Treating tests as special case
  2018-04-05  6:05 ` Gábor Boskovits
@ 2018-04-05  8:39   ` Pjotr Prins
  2018-04-05  8:58     ` Hartmut Goebel
  0 siblings, 1 reply; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05  8:39 UTC (permalink / raw)
  To: Gábor Boskovits; +Cc: Guix-devel

On Thu, Apr 05, 2018 at 08:05:39AM +0200, Gábor Boskovits wrote:
>    Actually running tests test the behaviour of a software. Unfortunately
>    reproducible build does not guarantee reproducible behaviour.
>    Furthermore there are still cases, where the environment is
>    not the same around these running software, like hardware or
>    kernel configuration settings leaking into the environment.
>    These can be spotted by running tests. Nondeterministic
>    failures can also be spotted more easily. There are a lot of
>    packages where pulling tests can be done, I guess, but probably not
>    for all of them. WDYT?

Hi Gábor,

If that were a real problem, we should not be providing substitutes
either: it is the same problem. With substitutes we also ship software
whose tests have been run at least once.

We should not forbid people to run tests. But I don't think it should
be the default once tests have been run in a configuration.

Think of it as functional programming. In my opinion, the result of
running tests can be cached.

My point is that we should not overestimate/overdo the idea of
leakage. Save the planet. We have a responsibility.

Pj.

* Re: Treating tests as special case
  2018-04-05  8:39   ` Pjotr Prins
@ 2018-04-05  8:58     ` Hartmut Goebel
  0 siblings, 0 replies; 22+ messages in thread
From: Hartmut Goebel @ 2018-04-05  8:58 UTC (permalink / raw)
  To: guix-devel

On 05.04.2018 at 10:39, Pjotr Prins wrote:
> We should not forbid people to run tests. But I don't think it should
> be the default once tests have been run in a configuration.
+1

> My point is that we should not overestimate/overdo the idea of
> leakage. Save the planet. We have responsibility.
+1

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |

* Re: Treating tests as special case
  2018-04-05  5:24 Treating tests as special case Pjotr Prins
  2018-04-05  6:05 ` Gábor Boskovits
@ 2018-04-05  6:21 ` Björn Höfling
  2018-04-05  8:43   ` Pjotr Prins
  2018-04-05 10:14   ` Ricardo Wurmus
  2018-04-05 10:26 ` Ricardo Wurmus
  2018-04-05 20:26 ` Treating tests as special case Mark H Weaver
  3 siblings, 2 replies; 22+ messages in thread
From: Björn Höfling @ 2018-04-05  6:21 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: guix-devel

On Thu, 5 Apr 2018 07:24:39 +0200
Pjotr Prins <pjotr.public12@thebird.nl> wrote:

> [...]

Hi Pjotr,

great ideas!

Last night I ran

guix pull && guix package -i git

We have substitutes, right? Yes, but someone had updated git, I hadn't
configured berlin.guixsd.org on my new machine yet, and hydra didn't
have any substitutes (maybe the build hadn't started yet?).

Building git was relatively fast, but all the tests took ages. And it
was just git. It should work. The git maintainers ran the tests.
Marius ran the tests when he updated it in commit 5c151862c. And that
should be enough testing. Let's skip the tests.

On the other hand, what if I create a new package definition and
forget to run the tests, and upstream is sloppy, did not run the tests
and has no continuous integration? Who will run the tests then?

What if I build my package with different sources?

And you mentioned different environment conditions like machine and
kernel. We still have "only" 70-90% reproducibility. The remaining
packages should have tests enabled. And the question "is my package
reproducible?" is not trivial to answer, and is not computable.

We saw tests that failed in only 2% of the runs and passed in 98%.
If we ran those tests "just once", we would most likely not notice
that there is a problem (assuming the problem really is in the
software, not just in the tests).
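A rough calculation in Python, assuming every run fails independently with the same probability (real flaky tests often depend on load or timing, so this is a simplification). The helper names are made up for illustration; the 2% figure is the one mentioned above.

```python
import math

def detection_probability(failure_rate, runs):
    """Chance of seeing at least one failure in `runs` independent runs."""
    return 1 - (1 - failure_rate) ** runs

def runs_needed(failure_rate, confidence):
    """Fewest independent runs giving at least `confidence` chance of a failure."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - failure_rate))

print(round(detection_probability(0.02, 1), 3))   # 0.02
print(round(detection_probability(0.02, 35), 3))  # 0.507
print(runs_needed(0.02, 0.50))                    # 35
print(runs_needed(0.02, 0.95))                    # 149
```

Under that assumption, a 2% failure needs about 35 runs before it is more likely than not to show up even once.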

There could also be practical problems with that: if everyone wrote
their software nicely with autoconf and we just had a "make && make
test && make install", it would be easy to skip the tests. But for
more complicated things we have to find a way to tell the build system
how to skip tests.
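For the simple case, skipping tests amounts to dropping one step from a fixed sequence. The sketch below is a hypothetical Python phase runner, loosely modelled on the idea of build phases; it is not the Guix build-system API, and the phase names and commands are assumptions. Packages whose test suite is triggered from inside another step (for example an install target that depends on the check target) would still need patching.

```python
from typing import Callable, List, Tuple

Phase = Tuple[str, Callable[[], None]]

def run_phases(phases: List[Phase], run_tests: bool = True) -> List[str]:
    """Run build phases in order; skip the "check" phase if run_tests is False.

    Returns the names of the phases that actually ran.
    """
    ran = []
    for name, action in phases:
        if name == "check" and not run_tests:
            continue  # the builder opted out of the test suite
        action()
        ran.append(name)
    return ran

commands = []  # stands in for actually running shell commands
phases = [
    ("configure", lambda: commands.append("./configure")),
    ("build", lambda: commands.append("make")),
    ("check", lambda: commands.append("make check")),
    ("install", lambda: commands.append("make install")),
]

print(run_phases(phases, run_tests=False))  # ['configure', 'build', 'install']
print(commands)                             # ['./configure', 'make', 'make install']
```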

Björn

* Re: Treating tests as special case
  2018-04-05  6:21 ` Björn Höfling
@ 2018-04-05  8:43   ` Pjotr Prins
  2018-04-06  8:58     ` Chris Marusich
  2018-04-05 10:14   ` Ricardo Wurmus
  1 sibling, 1 reply; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05  8:43 UTC (permalink / raw)
  To: Björn Höfling; +Cc: guix-devel

On Thu, Apr 05, 2018 at 08:21:15AM +0200, Björn Höfling wrote:
> great ideas!
> 
> Last night I did a 
> 
> guix pull && guix package -i git
> 
> We have substitutes, right? Yeah, but someone updated git, on my new
> machine I didn't configure berlin.guixsd.org yet and hydra didn't have
> any substitutes (build wasn't started yet?).
> 
> Building git was relatively fast, but all the tests took ages. And it
> was just git. It should work. The git maintainers ran the tests. Marius
> when he updated it in commit 5c151862c ran the tests. And that should
> be enough of testing. Let's skip the tests.

Not exactly what I am proposing ;). But, even so, I think we should
have a switch for turning off tests. Let the builder decide what is
good or bad. Too much nannying serves no one.

> On the other hand, if I create a new package definition and forget to
> run the tests. If upstream is too sloppy, did not run the tests and had
> no continuous integration. Who will run the tests then?

Hydra should always run the tests before publishing a hash that says testing is done.

> What if I build my package with different sources?
> 
> And you mentioned different environment conditions like machine and
> kernel. We still have "only" 70-90% reproducibility. The complement
> should have tests enabled. And the question "is my package
> reproducible?" is not trivial to answer, and is not computable.

Well, I believe that case is overrated and we prove that by actually
providing binary substitutes without testing ;)

> We saw tests that failed only in 2% of the runs and were fine in 98%.
> If we would run those tests "just once", we couldn't figure out that
> there is a problem (assuming the problem really is in the software, not
> just the tests).
> 
> There could also be practical problems with that: If all write their
> software nice and with autoconf and we just have a "make && make
> test && make install" it's easy to skip the test. But for more
> complicated things we have to find a way to tell the build-system how
> to skip tests.

Totally agree. At this point I patch the tree not to run tests.

Pj.

* Re: Treating tests as special case
  2018-04-05  8:43   ` Pjotr Prins
@ 2018-04-06  8:58     ` Chris Marusich
  2018-04-06 18:36       ` David Pirotte
  0 siblings, 1 reply; 22+ messages in thread
From: Chris Marusich @ 2018-04-06  8:58 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> I think we should have a switch for turning off tests. Let the builder
> decide what is good or bad. Too much nannying serves no one.

I think it would be OK to give users the choice of not running tests
when building from source, if they really don't want to.  This is
similar to how users can choose to skip the "make check" step (and live
with the risk) when building something manually.  However, I think we
should always run the tests by default.

Maybe you could submit a patch to add a "--no-tests" option?

ludo@gnu.org (Ludovic Courtès) writes:

> That is why I was suggesting putting effort in improving substitute
> delivery rather than trying to come up with special mechanisms.

Yes, I think that improving substitute availability is the best path
forward.  I'm willing to bet that Pjotr would not be so frustrated if
substitutes were consistently available.

Regarding Pjotr's suggestion to add a "test result substitute" feature:
It isn't clear to me how a "test result substitute" is any better than a
substitute in the usual sense.  It sounds like Pjotr is arguing that if
the substitute server can tell me that a package's tests have passed,
then I don't need to run the tests a second time.  But why would I have
to build the package from source in that case, anyway?  Assuming the
substitute server has told me that the package's tests have passed, it
is almost certainly the case that the package has been built and its
substitute is currently available, so I don't have to build the package
myself - I can just download the substitute!  Conversely, if a
substitute server says the tests have not passed, then certainly no
substitute will be available, so I'll have to build it (and run the
tests) myself.  Perhaps I am missing something, but it does not seem to
me that the existence of a "test result substitute" would add value.
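The case analysis above can be written down as a small decision function. This is a hypothetical Python sketch: the two sets stand in for "substitutes currently on the server" and "hashes whose tests passed somewhere", and neither corresponds to an existing Guix interface. The middle branch is the only case where a test-result record changes anything, namely when the binary was removed but the record was kept.

```python
from enum import Enum

class Action(Enum):
    DOWNLOAD = "download the substitute"
    BUILD_SKIP_TESTS = "build locally, skip the tests"
    BUILD_AND_TEST = "build locally and run the tests"

def plan(drv_hash, substitutes, tests_passed):
    """Decide what to do for one derivation hash (hypothetical policy)."""
    if drv_hash in substitutes:
        return Action.DOWNLOAD  # usual case: nothing is built or tested
    if drv_hash in tests_passed:
        # The binary was removed from the server, but the small record
        # that its tests passed was kept.
        return Action.BUILD_SKIP_TESTS
    return Action.BUILD_AND_TEST  # no evidence at all: do everything

substitutes = {"aaa"}
tests_passed = {"aaa", "bbb"}
for h in ("aaa", "bbb", "ccc"):
    print(h, "->", plan(h, substitutes, tests_passed).value)
```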

I think what Pjotr really wants is (1) better substitute availability,
or (2) the option to skip tests when he has to build from source because
substitutes are not available.  I think (1) is the best goal, and (2) is
a reasonable request in line with Guix's goal of giving control to the
user.

Ricardo Wurmus <rekado@elephly.net> writes:

> An idea that came up on #guix several months ago was to separate the
> building of packages from testing.  Testing would be a continuation of
> the build, like grafts could be envisioned as a continuation of the
> build.

What problems would that solve?

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> The building takes long enough. Testing takes incredibly long with
> many packages (especially language related) and are usually single
> core (unlike the build).

Eelco told me that in Nix, they set --max-jobs to the number of CPU
cores, and --cores to 1, since lots of software has concurrency bugs
that are easier to work around by building on a single core.  Notably,
Guix does the opposite: we set --max-jobs to 1 and --cores to the number
of CPU cores.  I wonder if you would see faster builds by adjusting
these options for your use case?
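A toy Python model can show why this split matters for test-heavy packages. It assumes the build phase scales perfectly with cores, the test phase is strictly single-threaded, and the packages are independent; real builds have dependency ordering and imperfect scaling, so the numbers are illustrative only. The max_jobs and cores parameters are meant to mirror the daemon's --max-jobs and --cores options as described above; the workload figures are made up.

```python
import heapq

def makespan(packages, cpus, max_jobs, cores):
    """Toy estimate of wall-clock time to build independent packages.

    Each package is (build_work, test_work) in CPU-seconds. Assumes the
    build phase parallelizes perfectly over `cores` and the test phase
    is strictly single-threaded; up to `max_jobs` packages run at once.
    """
    if max_jobs * cores > cpus:
        raise ValueError("max_jobs * cores exceeds the number of CPUs")
    free_at = [0.0] * max_jobs  # when each job slot becomes free (a min-heap)
    for build_work, test_work in packages:
        start = heapq.heappop(free_at)  # earliest available slot
        heapq.heappush(free_at, start + build_work / cores + test_work)
    return max(free_at)

pkgs = [(8.0, 6.0)] * 4  # four packages: 8 CPU-s of build, 6 s of tests
print(makespan(pkgs, cpus=4, max_jobs=1, cores=4))  # 32.0 (one job, all cores)
print(makespan(pkgs, cpus=4, max_jobs=4, cores=1))  # 14.0 (many jobs, one core each)
```

With single-threaded test suites, running several jobs side by side keeps the otherwise idle cores busy during the test phases.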

> It is also bad for our carbon footprint. Assuming everyone uses Guix
> on the planet, is that where we want to end up?

When everyone uses Guix on the planet, substitutes will be ubiquitous.
You'll be able to skip the tests because, in practice, substitutes will
always be available (which means an authorized substitute server ran the
tests successfully).  Or, if you are very concerned about correctness,
you might STILL choose to build from source - AND run the tests -
because you are concerned that your particular circumstances (kernel
version, file system type, hardware, etc.) were not tested by the build
farm.

> I know there are two 'inputs' I am not accounting for: (1) hardware
> variants and (2) the Linux kernel. But, honestly, I do not think we
> are in the business of testing those. We can assume these work.

Even if those components worked for the maintainers who ran the tests on
their own machines and made a release, they might not work correctly in
your own situation.  Mark's story is a great example of this!  For this
reason, some people will still choose to build things from source
themselves, even if substitutes are available from some other place.

-- 
Chris

* Re: Treating tests as special case
  2018-04-06  8:58     ` Chris Marusich
@ 2018-04-06 18:36       ` David Pirotte
  0 siblings, 0 replies; 22+ messages in thread
From: David Pirotte @ 2018-04-06 18:36 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel

Hello,

> > An idea that came up on #guix several months ago was to separate the
> > building of packages from testing.  Testing would be a continuation of
> > the build, like grafts could be envisioned as a continuation of the
> > build.  

> What problems would that solve?

If one could run test suites locally against already-built packages,
that would already save a great deal of planet heat, I guess: people
would not build from source in the first place, only when they found
a bug and wanted to fix it. And, iiuc, Mark would still have found the
bug he mentioned.

> Even if those components worked for the maintainers who ran the tests on
> their own machines and made a release, they might not work correctly in
> your own situation.  Mark's story is a great example of this!  For this
> reason, some people will still choose to build things from source
> themselves, even if substitutes are available from some other place.

But would they rebuild from source just to run the tests? It sounds to
me that, if possible, separating test suites from the build process
would add value over the current situation.

Cheers,
David



* Re: Treating tests as special case
  2018-04-05  6:21 ` Björn Höfling
  2018-04-05  8:43   ` Pjotr Prins
@ 2018-04-05 10:14   ` Ricardo Wurmus
  2018-04-05 12:19     ` Björn Höfling
  1 sibling, 1 reply; 22+ messages in thread
From: Ricardo Wurmus @ 2018-04-05 10:14 UTC (permalink / raw)
  To: Björn Höfling; +Cc: guix-devel

Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:

> And you mentioned different environment conditions like machine and
> kernel. We still have "only" 70-90% reproducibility.

Where does that number come from?  In my tests for a non-trivial set of
bioinfo pipelines I got to 97.7% reproducibility (or 95.2% if you
include very minor problems) for 355 direct inputs.

I rebuilt on three different machines.

--
Ricardo

* Re: Treating tests as special case
  2018-04-05 10:14   ` Ricardo Wurmus
@ 2018-04-05 12:19     ` Björn Höfling
  2018-04-05 14:10       ` Ricardo Wurmus
  0 siblings, 1 reply; 22+ messages in thread
From: Björn Höfling @ 2018-04-05 12:19 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

On Thu, 05 Apr 2018 12:14:53 +0200
Ricardo Wurmus <rekado@elephly.net> wrote:

> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:
> 
> > And you mentioned different environment conditions like machine and
> > kernel. We still have "only" 70-90% reproducibility.  
> 
> Where does that number come from?  In my tests for a non-trivial set
> of bioinfo pipelines I got to 97.7% reproducibility (or 95.2% if you
> include very minor problems) for 355 direct inputs.
> 
> I rebuilt on three different machines.

I have no numbers of my own, but I checked Ludovic's blog post from October 2017:

https://www.gnu.org/software/guix/blog/2017/reproducible-builds-a-status-update/

"We’re somewhere between 78% and 91%—not as good as Debian yet, [..]".

So if your numbers are valid for the whole repository, that is good
news and would mean we are now better than Debian [1], and that would
be worth a new blog post.

Björn

[1] https://isdebianreproducibleyet.com/




* Re: Treating tests as special case
  2018-04-05 12:19     ` Björn Höfling
@ 2018-04-05 14:10       ` Ricardo Wurmus
  0 siblings, 0 replies; 22+ messages in thread
From: Ricardo Wurmus @ 2018-04-05 14:10 UTC (permalink / raw)
  To: Björn Höfling; +Cc: guix-devel

Hi Björn,

> On Thu, 05 Apr 2018 12:14:53 +0200
> Ricardo Wurmus <rekado@elephly.net> wrote:
>
>> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:
>>
>> > And you mentioned different environment conditions like machine and
>> > kernel. We still have "only" 70-90% reproducibility.
>>
>> Where does that number come from?  In my tests for a non-trivial set
>> of bioinfo pipelines I got to 97.7% reproducibility (or 95.2% if you
>> include very minor problems) for 355 direct inputs.
>>
>> I rebuilt on three different machines.
>
> I have no own numbers but checked Ludovic's blog post from October 2017:
>
> https://www.gnu.org/software/guix/blog/2017/reproducible-builds-a-status-update/
>
> "We’re somewhere between 78% and 91%—not as good as Debian yet, [..]".

Ah, I see.

Back then we didn’t have a fix for Python bytecode, which affects a
large number of packages in Guix but not on Debian (who simply don’t
distribute bytecode AFAIU).
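For context on why bytecode is a problem: a classic timestamp-based .pyc file embeds the source file's modification time in its header, so two otherwise identical builds can differ. Since Python 3.7 (PEP 552), py_compile can instead write hash-based .pyc files that record a hash of the source. The Python sketch below (assuming Python 3.8 or later) compares the headers of both kinds; it illustrates the issue and is not necessarily the fix Guix adopted.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

SOURCE = b"ANSWER = 42\n"

def pyc_header(src, cfile, mode):
    """Compile `src` to `cfile` and return the 16-byte .pyc header."""
    py_compile.compile(src, cfile=cfile, doraise=True, invalidation_mode=mode)
    with open(cfile, "rb") as f:
        return f.read(16)

def headers_at_two_mtimes(mode):
    """Compile the same source twice, changing only its mtime in between."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "mod.py")
        with open(src, "wb") as f:
            f.write(SOURCE)
        os.utime(src, (1_000_000_000, 1_000_000_000))
        first = pyc_header(src, os.path.join(tmp, "a.pyc"), mode)
        os.utime(src, (1_500_000_000, 1_500_000_000))
        second = pyc_header(src, os.path.join(tmp, "b.pyc"), mode)
    return first, second

# Header layout: magic (4 bytes), flags (4 bytes), then 8 bytes that are
# either mtime + source size (timestamp mode) or a source hash (hash mode).
ts_a, ts_b = headers_at_two_mtimes(py_compile.PycInvalidationMode.TIMESTAMP)
print("timestamp headers identical:", ts_a == ts_b)

h_a, h_b = headers_at_two_mtimes(py_compile.PycInvalidationMode.CHECKED_HASH)
print("hash-based headers identical:", h_a == h_b)
print("flags:", struct.unpack("<I", h_a[4:8])[0])
print("stores source hash:", h_a[8:16] == importlib.util.source_hash(SOURCE))
```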

> So if your numbers are valid for the whole repository, that is good
> news and would mean we are now better than Debian [1], and that would
> be worth a new blog post.

The analysis was only done for the “pigx” package and its
direct/propagated inputs.

I’d like to investigate the sources of non-determinism for remaining
packages and fix them one by one.  For some we already know what’s wrong
(e.g. for Haskell packages the random order of packages in the database
seems to be responsible), but for others we haven’t made an effort to
look closely enough.

I’d also take the Debian numbers with a spoonful of salt (and then take
probiotics in an effort to undo some of the damage, see[1]), because
they aren’t actually rebuilding all Debian packages.

[1]: https://insights.mdc-berlin.de/en/2017/11/gut-bacteria-sensitive-salt/

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* Re: Treating tests as special case
  2018-04-05  5:24 Treating tests as special case Pjotr Prins
  2018-04-05  6:05 ` Gábor Boskovits
  2018-04-05  6:21 ` Björn Höfling
@ 2018-04-05 10:26 ` Ricardo Wurmus
  2018-04-05 14:14   ` Ludovic Courtès
  2018-04-05 20:26 ` Treating tests as special case Mark H Weaver
  3 siblings, 1 reply; 22+ messages in thread
From: Ricardo Wurmus @ 2018-04-05 10:26 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: guix-devel

Hi Pjotr,

> And this hooks in with my main peeve about building from source. The
> building takes long enough. Testing takes incredibly long with many
> packages (especially language related) and are usually single core
> (unlike the build).

I share the sentiment.  Waiting for tests to complete can be quite
annoying.

An idea that came up on #guix several months ago was to separate the
building of packages from testing.  Testing would be a continuation of
the build, like grafts could be envisioned as a continuation of the
build.

Packages with tests would then become leaf nodes in the graph — nothing
would depend on the packages with tests, only on the packages without
tests.  Building the test continuation would thus be optional and could
be something that’s done by the build farm but not by users who need to
compile a package for lack of substitutes.
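The graph shape described above can be sketched in Python. Each package P gets a hypothetical extra node "P:tests" that depends only on P, so nothing else ever depends on a test node, and a user who only wants the package never has to build it. The node naming and the example packages are assumptions, and the sketch deliberately ignores the hard part, namely how a test node would get access to the build tree.

```python
from typing import Dict, List, Set

def add_test_nodes(graph: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Return a copy of `graph` (package -> dependencies) in which every
    package P gets an extra node "P:tests" that depends only on P."""
    result = {pkg: list(deps) for pkg, deps in graph.items()}
    for pkg in graph:
        result[pkg + ":tests"] = [pkg]
    return result

def closure(graph: Dict[str, List[str]], roots: List[str]) -> Set[str]:
    """All nodes reachable from `roots`, i.e. everything that must be built."""
    seen: Set[str] = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(graph[node])
    return seen

packages = {"glibc": [], "zlib": ["glibc"], "git": ["glibc", "zlib"]}
g = add_test_nodes(packages)

# Nodes that nothing depends on: exactly the test nodes.
dependents = {dep for deps in g.values() for dep in deps}
print(sorted(n for n in g if n not in dependents))
# ['git:tests', 'glibc:tests', 'zlib:tests']

# A user building git needs only the package nodes...
print(sorted(closure(g, ["git"])))  # ['git', 'glibc', 'zlib']
# ...while a build farm would also build the test node.
print(sorted(closure(g, ["git", "git:tests"])))
# ['git', 'git:tests', 'glibc', 'zlib']
```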

The implementation details are tricky: can it be a proper continuation
from the time after the build phase but before the install phase?  Would
this involve reverting to a snapshot of the build container?  There are
packages that force “make check” before “make install” — do we patch
them or ignore them?  Will every package then produce one extra
derivation for tests?

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* Re: Treating tests as special case
  2018-04-05 10:26 ` Ricardo Wurmus
@ 2018-04-05 14:14   ` Ludovic Courtès
  2018-04-05 14:59     ` Pjotr Prins
  0 siblings, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2018-04-05 14:14 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hello!

I sympathize with what you write about the inconvenience of running
tests when substitutes aren’t available.  However, I do think running
tests has real value.

Of course sometimes we just spend time fiddling with the tests so they
would run in the isolated build environment, and they do run flawlessly
once we’ve done the usual adjustments (no networking, no /bin/sh, etc.)

However, in many packages we found integration issues that we would just
have missed had we not run the tests; that in turn can lead to very bad
user experience.  In other cases we found real upstream bugs and were
able to report them
(cf. <https://github.com/TaylanUB/scheme-bytestructures/issues/30> for
an example from today.)  Back when I contributed to Nixpkgs, tests were
not run by default and I think that it had a negative impact on QA.

So to me, not running tests is not an option.

The problem I’m more interested in is: can we provide substitutes more
quickly?  Can we grow an infrastructure such that ‘master’, by default,
contains software that has already been built?

Ricardo Wurmus <rekado@elephly.net> writes:

> An idea that came up on #guix several months ago was to separate the
> building of packages from testing.  Testing would be a continuation of
> the build, like grafts could be envisioned as a continuation of the
> build.

I agree it would be nice, but I think there’s a significant technical
issue: test suites usually expect to run from the build tree.

Also, would a test failure invalidate the previously-built store
item(s)?

Thanks,
Ludo’.

* Re: Treating tests as special case
  2018-04-05 14:14   ` Ludovic Courtès
@ 2018-04-05 14:59     ` Pjotr Prins
  2018-04-05 15:17       ` Ricardo Wurmus
  2018-04-05 15:24       ` Ludovic Courtès
  0 siblings, 2 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05 14:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Thu, Apr 05, 2018 at 04:14:19PM +0200, Ludovic Courtès wrote:
> I sympathize with what you write about the inconvenience of running
> tests, when substitutes aren’t available.  However, I do think running
> tests has real value.
> 
> Of course sometimes we just spend time fiddling with the tests so they
> would run in the isolated build environment, and they do run flawlessly
> once we’ve done the usual adjustments (no networking, no /bin/sh, etc.)
> 
> However, in many packages we found integration issues that we would just
> have missed had we not run the tests; that in turn can lead to very bad
> user experience.  In other cases we found real upstream bugs and were
> able to report them
> (cf. <https://github.com/TaylanUB/scheme-bytestructures/issues/30> for
> an example from today.)  Back when I contributed to Nixpkgs, tests were
> not run by default and I think that it had a negative impact on QA.
> 
> So to me, not running tests is not an option.

I am *not* suggesting we stop testing and stop writing tests. They are
extremely important for integration (though we could do with far
fewer, more focused integration tests; see Hickey). What I am
writing is that we don't have to rerun tests for everyone *once* they
succeed *somewhere*. If you have a successful reproducible build and
tests on a platform, there is really no point in rerunning tests
everywhere for the exact same setup. It is a nice property of our FP
approach. Proof that it is not necessary is the fact that we
distribute substitute binaries without running the tests again on the
receiving machine. What I am proposing in essence is 'substitute tests'.

Ricardo is suggesting an implementation. I think it can be simpler.
When building a derivation we know its hash. If we keep a list of
hashes of successful test runs in the database (hash-tests-passed), it
is essentially a query and we are done. Even when the substitute gets
removed, that entry can remain at almost no cost.

Ludo, I think we need to do this. There is no point in running tests
that have already been run. Hickey is right. I have reached
enlightenment. Almost everything I thought about testing is wrong. If
all the inputs are the same, the test will *always* pass. There is no
point to it! The only way such a test won't pass is by divine
intervention or real hardware problems, and we don't want to test for
either.

If rerunning tests is so important, tell me: why are we not running
tests when substituting binaries?

> The problem I’m more interested in is: can we provide substitutes more
> quickly?  Can we grow an infrastructure such that ‘master’, by default,
> contains software that has already been built?

Sure, that is another challenge and an important one.

> Ricardo Wurmus <rekado@elephly.net> writes:
> 
> > An idea that came up on #guix several months ago was to separate the
> > building of packages from testing.  Testing would be a continuation of
> > the build, like grafts could be envisioned as a continuation of the
> > build.
> 
> I agree it would be nice, but I think there’s a significant technical
> issue: test suites usually expect to run from the build tree.

As I understand it, Nix already does something like this: they have
split testing out to allow for network access. I don't propose to
split the process; I propose to cache test results as part of the
build.

Pj.

* Re: Treating tests as special case
  2018-04-05 14:59     ` Pjotr Prins
@ 2018-04-05 15:17       ` Ricardo Wurmus
  2018-04-05 15:24       ` Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Ricardo Wurmus @ 2018-04-05 15:17 UTC
  To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> If all the inputs are the same, the test will *always* pass. There is
> no point to it! The only way such a test won't pass is by divine
> intervention or real hardware problems, and we don't want to test
> for either.
>
> If tests are so important to rerun: tell me why we are not running
> tests when substituting binaries?

I don’t understand this.  People only run tests when they haven’t been
run on the build farm, because that’s part of the build.  So when the
tests have passed (and the few short phases after that), then we have
substitutes anyway, and so users won’t re-run tests.

If you get substitutes you don’t need to run the tests.

Any change here seems to only affect the case where you build locally
even though there are substitutes.  I’d say that this is a pretty rare
use case.  Build farms do this, but they build binaries (and if they
differ from binaries built elsewhere the tests may also behave
differently).
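The distinction drawn here can be summarized as a small decision function. This is a hypothetical Python sketch; `plan_build`, `Plan` and the boolean parameters are illustrative names, not a Guix interface.

```python
from enum import Enum


class Plan(Enum):
    SUBSTITUTE = "download the substitute; its tests ran on the build farm"
    BUILD_SKIP_TESTS = "build locally; skip tests, a pass is on record"
    BUILD_AND_TEST = "build locally and run the test suite"


def plan_build(substitute_available: bool, use_substitutes: bool,
               pass_on_record: bool) -> Plan:
    """Decide what happens for one derivation (hypothetical policy)."""
    # A substitute only exists if the whole build, check phase included,
    # succeeded on the build farm, so fetching it never reruns tests.
    if substitute_available and use_substitutes:
        return Plan.SUBSTITUTE
    # Only this branch is new: building locally, yet a passing test run
    # for the same derivation hash has been recorded somewhere.
    if pass_on_record:
        return Plan.BUILD_SKIP_TESTS
    # Current behavior whenever no substitute is used.
    return Plan.BUILD_AND_TEST


if __name__ == "__main__":
    print(plan_build(True, True, True).name)    # SUBSTITUTE
    print(plan_build(True, False, True).name)   # BUILD_SKIP_TESTS
    print(plan_build(False, True, False).name)  # BUILD_AND_TEST
```

The middle branch covers both the case described above (building locally although substitutes exist) and the case where the binary substitute has been removed but a record of the test pass was kept.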

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* Re: Treating tests as special case
  2018-04-05 14:59     ` Pjotr Prins
  2018-04-05 15:17       ` Ricardo Wurmus
@ 2018-04-05 15:24       ` Ludovic Courtès
  2018-04-05 16:41         ` Pjotr Prins
  1 sibling, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2018-04-05 15:24 UTC
  To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> skribis:

> I am *not* suggesting we stop testing and stop writing tests. They are
> extremely important for integration (though we could do with far fewer
> and more focused integration tests - ref Hickey). What I am saying is
> that we don't have to rerun tests for everyone *once* they succeed
> *somewhere*. If you have a successful reproducible build and tests on
> a platform, there is really no point in rerunning tests everywhere for
> the exact same setup. It is a nice property of our FP approach. Proof
> that it is not necessary is the fact that we distribute substitute
> binaries without running tests there. What I am proposing, in essence,
> is 'substitute tests'.

Understood.

> If tests are so important to rerun: tell me why we are not running
> tests when substituting binaries?

Because you have a substitute if and only if those tests already passed
somewhere.  This is exactly the property we’re interested in, right?

That is why I was suggesting putting effort in improving substitute
delivery rather than trying to come up with special mechanisms.

>> Ricardo Wurmus <rekado@elephly.net> skribis:
>> 
>> > An idea that came up on #guix several months ago was to separate the
>> > building of packages from testing.  Testing would be a continuation of
>> > the build, like grafts could be envisioned as a continuation of the
>> > build.
>> 
>> I agree it would be nice, but I think there’s a significant technical
>> issue: test suites usually expect to run from the build tree.
>
> As I understand it, Nix already does something like this: they
> have split testing out to allow for network access.

Do you have pointers to that?  All I’m aware of is the ‘doCheck’
variable that is unset (i.e., false) by default:

  https://github.com/NixOS/nixpkgs/blob/master/pkgs/stdenv/generic/setup.sh#L1192

Ludo’.

* Re: Treating tests as special case
  2018-04-05 15:24       ` Ludovic Courtès
@ 2018-04-05 16:41         ` Pjotr Prins
  2018-04-05 18:35           ` Pjotr Prins
  2018-04-06  7:57           ` Retaining substitutes Ludovic Courtès
  0 siblings, 2 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05 16:41 UTC
  To: Ludovic Courtès; +Cc: guix-devel

On Thu, Apr 05, 2018 at 05:24:12PM +0200, Ludovic Courtès wrote:
> Pjotr Prins <pjotr.public12@thebird.nl> skribis:
> 
> > I am *not* suggesting we stop testing and stop writing tests. They are
> > extremely important for integration (though we could do with far fewer
> > and more focused integration tests - ref Hickey). What I am saying is
> > that we don't have to rerun tests for everyone *once* they succeed
> > *somewhere*. If you have a successful reproducible build and tests on
> > a platform, there is really no point in rerunning tests everywhere for
> > the exact same setup. It is a nice property of our FP approach. Proof
> > that it is not necessary is the fact that we distribute substitute
> > binaries without running tests there. What I am proposing, in essence,
> > is 'substitute tests'.
> 
> Understood.
> 
> > If tests are so important to rerun: tell me why we are not running
> > tests when substituting binaries?
> 
> Because you have a substitute if and only if those tests already passed
> somewhere.  This is exactly the property we’re interested in, right?

Yup. The problem is that substitutes go away. We don't retain them,
and I often run into that case.

Providing test substitutes is much lighter, and they can be retained
forever.

Once tests have passed on a build server, we don't have to repeat
them. That is my story.

Pj.

* Re: Treating tests as special case
  2018-04-05 16:41         ` Pjotr Prins
@ 2018-04-05 18:35           ` Pjotr Prins
  2018-04-06  7:57           ` Retaining substitutes Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05 18:35 UTC
  To: Pjotr Prins; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 665 bytes --]

On Thu, Apr 05, 2018 at 06:41:58PM +0200, Pjotr Prins wrote:
> Providing test-substitutes is much lighter and can be retained
> forever.

See it as a lightweight substitute. It would also let us retire large
binary substitutes sooner, saving disk space. I think it is a
brilliant idea ;)

A result of the Hickey insight is that I am going to cut down on my
own tests (the ones I write). Only integration tests are of interest
for deployment. 

For those interested, the attached patch disables tests in the GNU,
Perl, Python and Ruby build systems. You may need to adapt it a little
for a recent checkout, but you get the idea. Use it at your own risk,
but in a pinch it can be handy.

Pj.

-- 

[-- Attachment #2: disable-tests.patch --]
[-- Type: text/x-diff, Size: 2179 bytes --]

diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index 1786e2e3c..2aff344df 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -286,7 +286,7 @@ makefiles."
 (define* (check #:key target (make-flags '()) (tests? (not target))
                 (test-target "check") (parallel-tests? #t)
                 #:allow-other-keys)
-  (if tests?
+  (if #f
       (zero? (apply system* "make" test-target
                     `(,@(if parallel-tests?
                             `("-j" ,(number->string (parallel-job-count)))
diff --git a/guix/build/perl-build-system.scm b/guix/build/perl-build-system.scm
index b2024e440..8008a7173 100644
--- a/guix/build/perl-build-system.scm
+++ b/guix/build/perl-build-system.scm
@@ -63,7 +63,7 @@
 (define-w/gnu-fallback* (check #:key target
                                (tests? (not target)) (test-flags '())
                                #:allow-other-keys)
-  (if tests?
+  (if #f
       (zero? (apply system* "./Build" "test" test-flags))
       (begin
         (format #t "test suite not run~%")
diff --git a/guix/build/python-build-system.scm b/guix/build/python-build-system.scm
index dd07986b9..dacf58110 100644
--- a/guix/build/python-build-system.scm
+++ b/guix/build/python-build-system.scm
@@ -131,7 +131,7 @@
 
 (define* (check #:key tests? test-target use-setuptools? #:allow-other-keys)
   "Run the test suite of a given Python package."
-  (if tests?
+  (if #f
       ;; Running `setup.py test` creates an additional .egg-info directory in
       ;; build/lib in some cases, e.g. if the source is in a sub-directory
       ;; (given with `package_dir`). This will by copied to the output, too,
diff --git a/guix/build/ruby-build-system.scm b/guix/build/ruby-build-system.scm
index c2d276627..2f12a4362 100644
--- a/guix/build/ruby-build-system.scm
+++ b/guix/build/ruby-build-system.scm
@@ -116,7 +116,7 @@ generate the files list."
 (define* (check #:key tests? test-target #:allow-other-keys)
   "Run the gem's test suite rake task TEST-TARGET.  Skip the tests if TESTS?
 is #f."
-  (if tests?
+  (if #f
       (zero? (system* "rake" test-target))
       #t))
 

* Retaining substitutes
  2018-04-05 16:41         ` Pjotr Prins
  2018-04-05 18:35           ` Pjotr Prins
@ 2018-04-06  7:57           ` Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Ludovic Courtès @ 2018-04-06  7:57 UTC
  To: Pjotr Prins; +Cc: guix-devel

Hello,

Pjotr Prins <pjotr.public12@thebird.nl> skribis:

> On Thu, Apr 05, 2018 at 05:24:12PM +0200, Ludovic Courtès wrote:
>> Pjotr Prins <pjotr.public12@thebird.nl> skribis:
>> 
>> > I am *not* suggesting we stop testing and stop writing tests. They are
>> > extremely important for integration (though we could do with far fewer
>> > and more focused integration tests - ref Hickey). What I am saying is
>> > that we don't have to rerun tests for everyone *once* they succeed
>> > *somewhere*. If you have a successful reproducible build and tests on
>> > a platform, there is really no point in rerunning tests everywhere for
>> > the exact same setup. It is a nice property of our FP approach. Proof
>> > that it is not necessary is the fact that we distribute substitute
>> > binaries without running tests there. What I am proposing, in essence,
>> > is 'substitute tests'.
>> 
>> Understood.
>> 
>> > If tests are so important to rerun: tell me why we are not running
>> > tests when substituting binaries?
>> 
>> Because you have a substitute if and only if those tests already passed
>> somewhere.  This is exactly the property we’re interested in, right?
>
> Yup. Problem is substitutes go away. We don't retain them and I often
> encounter that use case.

I agree this is a problem.  We’ve tweaked ‘guix publish’, our nginx
configs, etc. over time to mitigate this, but I suppose we could still
do better.

When that happens, could you try to gather data about the missing
substitutes?  For example, which packages are missing (where in the
stack they are), and how old the Guix commit you’re using is.

More generally, I think there are connections with telemetry as we
discussed it recently: we should be able to monitor our build farms to
see concretely how much we’re retaining in high-level terms.

FWIW, today, on mirror.hydra.gnu.org, the nginx cache for nars contains
94G (for 3 architectures).

On berlin.guixsd.org, /var/cache/guix/publish takes 118G (3
architectures as well), and there’s room left.

> Providing test-substitutes is much lighter and can be retained
> forever.

I understand.  Now, I agree with Ricardo that this would target the
specific use case where you’re building from source (explicitly
disabling substitutes), yet you’d like to avoid running tests.

We could address this using specific mechanisms (although, as I said, I
really don’t see what it would look like).  However, I believe
optimizing substitute delivery in general would benefit everyone and
would also address the running-tests-takes-too-much-time issue.

Can we focus on measuring the performance of substitute delivery and
thinking about ways to improve it?

Thanks for your feedback,
Ludo’.

* Re: Treating tests as special case
  2018-04-05  5:24 Treating tests as special case Pjotr Prins
                   ` (2 preceding siblings ...)
  2018-04-05 10:26 ` Ricardo Wurmus
@ 2018-04-05 20:26 ` Mark H Weaver
  2018-04-06  6:06   ` Pjotr Prins
  3 siblings, 1 reply; 22+ messages in thread
From: Mark H Weaver @ 2018-04-05 20:26 UTC
  To: Pjotr Prins; +Cc: guix-devel

Hi Pjotr,

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> and he gave me a new insight which rang immediately true. He said:
> what is the point of running tests everywhere? If two people test the
> same thing, what is the added value of that? (I paraphrase)
>
> With Guix a reproducibly building package generates the same Hash on
> all dependencies. Running the same tests every time on that makes no
> sense.

I appreciate your thoughts on this, but I respectfully disagree.

> I know there are two 'inputs' I am not accounting for: (1) hardware
> variants and (2) the Linux kernel. But, honestly, I do not think we
> are in the business of testing those. We can assume these work.

No, we can't.  For example, I recently discovered that GNU Tar fails one
of its tests on my GuixSD system based on Btrfs.  It turned out to be a
real bug in GNU Tar that could lead to data loss when creating an
archive of recently written files, with --sparse enabled.  I fixed it in
commit 45413064c9db1712c845e5a1065aa81f66667abe on core-updates.

I would not have discovered this bug if I had simply assumed that since
GNU Tar passes its tests on ext4fs, it surely must also pass its tests
on every other file system.
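This example shows that the derivation hash does not cover the file system a test runs on. As a loose illustration only (this is not the GNU Tar test suite or the actual bug), the Python sketch below does exactly the same thing on every machine, yet the disk space it reports for a sparse file is decided by the underlying file system. A test asserting on such a value could pass on one file system and fail on another with identical Guix inputs. `sparse_file_usage` is a hypothetical helper, and the sketch assumes a Unix-like system.

```python
import os
import tempfile


def sparse_file_usage(directory: str, hole: int = 1 << 20) -> tuple[int, int]:
    """Write one byte after a HOLE-byte gap in a fresh file under DIRECTORY.

    Returns (logical size, bytes allocated on disk).  Unix only, because
    it relies on st_blocks, which counts 512-byte units.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        with os.fdopen(fd, "wb") as f:
            f.seek(hole)   # move past HOLE bytes without writing them
            f.write(b"x")  # one byte of real data at the end
        st = os.stat(path)
        return st.st_size, st.st_blocks * 512
    finally:
        os.unlink(path)


if __name__ == "__main__":
    size, allocated = sparse_file_usage(tempfile.gettempdir())
    print(size)       # 1048577 on every system
    print(allocated)  # depends on the file system (and possibly timing)
```

The logical size is fixed by the program; the allocated size is not, which is the kind of hidden input that only running the tests locally can exercise.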

> If not, any issues will be found in other ways (typically a segfault
> ;).

The GNU Tar bug on Btrfs would never produce a segfault.  The only way
the bug could be observed is by noticing that data was lost.  I don't
think that's a good way to discover a bug.  I'd much rather discover the
bug by a failing test suite.

Tests on different hardware/kernel/kernel-config/file-system
combinations are quite useful for those who care about reliability of
their systems.  I, for one, would like to keep running test suites on my
own systems.

     Regards,
       Mark

* Re: Treating tests as special case
  2018-04-05 20:26 ` Treating tests as special case Mark H Weaver
@ 2018-04-06  6:06   ` Pjotr Prins
  2018-04-06  8:27     ` Ricardo Wurmus
  0 siblings, 1 reply; 22+ messages in thread
From: Pjotr Prins @ 2018-04-06  6:06 UTC
  To: Mark H Weaver; +Cc: guix-devel

On Thu, Apr 05, 2018 at 04:26:50PM -0400, Mark H Weaver wrote:
> Tests on different hardware/kernel/kernel-config/file-system
> combinations are quite useful for those who care about reliability of
> their systems.  I, for one, would like to keep running test suites on my
> own systems.

Sure. And it is a great example of why we should test such scenarios.
But why force it down everyone's throat? I don't want to test Scipy or
ldc over and over again. Note that I can work around it, but we are
forcing our methods on others here. If I do not like it, others won't
either. I am just looking at tests being run billions of times,
uselessly, around the planet. Does that not matter? We need to be
green.

Ludo is correct that provisioning binary substitutes is one solution,
but it is not cheap.  Can we guarantee keeping all substitutes? At
least the ones with long-running tests ;). I don't know how we remove
substitutes now, but it would make sense to me to base that on
download metrics and size. How about ranking by the number of
downloads in the last 3 months multiplied by the build time, and
trimming from the bottom? That may be interesting.
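The ranking suggested above can be sketched in a few lines of Python. This is a hypothetical policy, not how Guix or its build farms actually expire substitutes; `Substitute`, `retention_score` and `trim` are invented names, and size is used only as the disk budget constraint.

```python
from dataclasses import dataclass


@dataclass
class Substitute:
    name: str
    downloads_90d: int    # downloads over roughly the last 3 months
    build_minutes: float  # time it takes to rebuild from source
    size_mb: float        # disk space the archive occupies


def retention_score(s: Substitute) -> float:
    """Recent demand multiplied by the cost of rebuilding."""
    return s.downloads_90d * s.build_minutes


def trim(cache: list[Substitute], budget_mb: float) -> list[Substitute]:
    """Evict substitutes in order of increasing score until the rest fit
    in BUDGET_MB; return the ones kept, highest score first."""
    kept = sorted(cache, key=retention_score, reverse=True)
    total = sum(s.size_mb for s in kept)
    while kept and total > budget_mb:
        total -= kept.pop().size_mb  # last element has the lowest score
    return kept


if __name__ == "__main__":
    # Made-up numbers, for illustration only.
    cache = [Substitute("scipy", 400, 90, 60),
             Substitute("hello", 1000, 1, 1),
             Substitute("ldc", 20, 60, 100)]
    print([s.name for s in trim(cache, budget_mb=160)])  # ['scipy', 'ldc']
```

With these numbers the cheap-to-rebuild `hello` is evicted first even though it is tiny; dividing the score by size would be an easy variation if disk space per download matters more.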

Even so, with my idea of test substitutes you don't have to opt out of
testing.  And you would still have found that bug. Those who care can
test all they please. 

Anyway, that is enough. I made my point and I am certain that we will
change our ways at some point. The laborious solution is to remove all
meaningless tests. And I am sure over 90% are pretty damn meaningless
for our purposes. Like the glut in binaries, we will trim it down over
time.

One suggestion: let's also look at tests that are *not* about
integration or hardware/kernel configuration and allow running them
optionally. Stupidly running all the tests people come up with is not
a great idea. Right now we just run whatever the authors decided
should be run.

Pj.

* Re: Treating tests as special case
  2018-04-06  6:06   ` Pjotr Prins
@ 2018-04-06  8:27     ` Ricardo Wurmus
  0 siblings, 0 replies; 22+ messages in thread
From: Ricardo Wurmus @ 2018-04-06  8:27 UTC
  To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> Ludo is correct that provisioning binary substitutes is one solution.
> But not cheap.  Can we guarantee keeping all substitutes? At least the
> ones with long running tests ;).

For berlin.guixsd.org we have an external storage array of a couple of
TB, which currently isn’t attached (I’ll get around to it some day).  We
can keep quite a few substitutes with that amount of space.

> Even so, with my idea of test substitutes you don't have to opt out of
> testing.  And you would still have found that bug. Those who care can
> test all they please.

I am not sure there’s an easy implementation that allows us to make
tests optional safely.  They are part of the derivation.  We could make
execution dependent on an environment variable that is set or not by the
daemon, I suppose.
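The environment-variable idea could look roughly like this in a check phase, sketched in Python rather than Guile. `HYPOTHETICAL_SKIP_TESTS` and `check` are invented for illustration and do not exist in Guix; the function only mirrors the shape of the `check` procedures in the patch earlier in this thread, including the "test suite not run" message used by the Perl build system.

```python
import os
import subprocess
import sys
from collections.abc import Mapping
from typing import Optional

# Invented variable name; Guix has no such variable.
SKIP_TESTS_VAR = "HYPOTHETICAL_SKIP_TESTS"


def check(test_command: list[str], tests: bool = True,
          env: Optional[Mapping[str, str]] = None) -> bool:
    """Run TEST_COMMAND unless tests are disabled for this package or the
    daemon has set SKIP_TESTS_VAR to "1" in ENV (defaults to os.environ)."""
    env = os.environ if env is None else env
    if not tests or env.get(SKIP_TESTS_VAR) == "1":
        print("test suite not run")
        return True
    # Success means the test command exited with status 0.
    return subprocess.run(test_command).returncode == 0


if __name__ == "__main__":
    cmd = [sys.executable, "-c", "print('all tests passed')"]
    print(check(cmd, env={}))                     # runs the command
    print(check(cmd, env={SKIP_TESTS_VAR: "1"}))  # skips it
```

What the sketch does not address is the safety question raised above: presumably, skipping the tests must leave the build outputs bit-identical, or the derivation would no longer produce the same result everywhere.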

> One suggestion: let's also look at tests that are *not* about
> integration or hardware/kernel configuration and allow for running them
> optionally. Stupidly running all tests that people come up with is not
> a great idea. We just run what authors decide that should be run.

We’ve already trimmed some of the longer test suites.  There are some
libraries and applications that have different test suites for different
purposes, and in those cases we picked something lighter and more
appropriate for our purposes.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

end of thread

Thread overview: 22+ messages
2018-04-05  5:24 Treating tests as special case Pjotr Prins
2018-04-05  6:05 ` Gábor Boskovits
2018-04-05  8:39   ` Pjotr Prins
2018-04-05  8:58     ` Hartmut Goebel
2018-04-05  6:21 ` Björn Höfling
2018-04-05  8:43   ` Pjotr Prins
2018-04-06  8:58     ` Chris Marusich
2018-04-06 18:36       ` David Pirotte
2018-04-05 10:14   ` Ricardo Wurmus
2018-04-05 12:19     ` Björn Höfling
2018-04-05 14:10       ` Ricardo Wurmus
2018-04-05 10:26 ` Ricardo Wurmus
2018-04-05 14:14   ` Ludovic Courtès
2018-04-05 14:59     ` Pjotr Prins
2018-04-05 15:17       ` Ricardo Wurmus
2018-04-05 15:24       ` Ludovic Courtès
2018-04-05 16:41         ` Pjotr Prins
2018-04-05 18:35           ` Pjotr Prins
2018-04-06  7:57           ` Retaining substitutes Ludovic Courtès
2018-04-05 20:26 ` Treating tests as special case Mark H Weaver
2018-04-06  6:06   ` Pjotr Prins
2018-04-06  8:27     ` Ricardo Wurmus

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
