* Treating tests as special case
@ 2018-04-05  5:24 Pjotr Prins
  2018-04-05  6:05 ` Gábor Boskovits
                   ` (3 more replies)
  0 siblings, 4 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05  5:24 UTC (permalink / raw)
  To: guix-devel

Last night I was watching Rich Hickey's talk on Specs and deployment.
It is a very interesting talk in many ways, recommended. He talks about
tests at 1:02 into the talk:

  https://www.youtube.com/watch?v=oyLBGkS5ICk

and he gave me a new insight which immediately rang true. He said:
what is the point of running tests everywhere? If two people test the
same thing, what is the added value of that? (I paraphrase)

With Guix a reproducibly building package generates the same hash on
all dependencies. Running the same tests every time on that makes no
sense.

And this hooks in with my main peeve about building from source. The
building takes long enough. Testing takes incredibly long with many
packages (especially language related) and is usually single core
(unlike the build). It is also bad for our carbon footprint. Assuming
everyone uses Guix on the planet, is that where we want to end up?

Burning down the house.

Just as we pull substitutes, we could pull a list of hashes of test
cases that are known to work (on Hydra or elsewhere). This is much
lighter than storing substitutes, so when the binaries get removed we
can still retain the test hashes and have fast builds. The same is true
for the guix repo itself.

I know there are two 'inputs' I am not accounting for: (1) hardware
variants and (2) the Linux kernel. But, honestly, I do not think we
are in the business of testing those. We can assume these work. If
not, any issues will be found in other ways (typically a segfault ;).
Our tests are generally meaningless when it comes to (1) and (2). And
packages that build differently on different platforms, like openblas,
should be opted out.

I think this would be a cool innovation (in more ways than one).

Pj.
^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case
  2018-04-05  5:24 Treating tests as special case Pjotr Prins
@ 2018-04-05  6:05 ` Gábor Boskovits
  2018-04-05  8:39   ` Pjotr Prins
  2018-04-05  6:21 ` Björn Höfling
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 22+ messages in thread
From: Gábor Boskovits @ 2018-04-05  6:05 UTC (permalink / raw)
  To: Pjotr Prins; +Cc: Guix-devel

[-- Attachment #1: Type: text/plain, Size: 2388 bytes --]

2018-04-05 7:24 GMT+02:00 Pjotr Prins <pjotr.public12@thebird.nl>:

> Last night I was watching Rich Hickey's on Specs and deployment. It is
> a very interesting talk in many ways, recommended. He talks about
> tests at 1:02 into the talk:
>
>   https://www.youtube.com/watch?v=oyLBGkS5ICk
>
> and he gave me a new insight which rang immediately true. He said:
> what is the point of running tests everywhere? If two people test the
> same thing, what is the added value of that? (I paraphrase)

Actually, running tests checks the behaviour of the software.
Unfortunately, a reproducible build does not guarantee reproducible
behaviour. Furthermore, there are still cases where the environment
around the running software is not the same, such as hardware or
kernel configuration settings leaking into the environment.
These can be spotted by running tests. Nondeterministic
failures can also be spotted more easily. There are a lot of
packages for which pulling test results could work, I guess, but
probably not all of them. WDYT?

>
> With Guix a reproducibly building package generates the same Hash on
> all dependencies. Running the same tests every time on that makes no
> sense.
>
> And this hooks in with my main peeve about building from source. The
> building takes long enough. Testing takes incredibly long with many
> packages (especially language related) and are usually single core
> (unlike the build). It is also bad for our carbon foot print. Assuming
> everyone uses Guix on the planet, is that where we want to end up?
>
> Burning down the house.
>
> Like we pull substitutes we could pull a list of hashes of test cases
> that are known to work (on Hydra or elsewhere). This is much lighter
> than storing substitutes, so when the binaries get removed we can
> still retain the test hashes and have fast builds. Also true for guix
> repo itself.
>
> I know there are two 'inputs' I am not accounting for: (1) hardware
> variants and (2) the Linux kernel. But, honestly, I do not think we
> are in the business of testing those. We can assume these work. If
> not, any issues will be found in other ways (typically a segfault ;).
> Our tests are generally meaningless when it comes to (1) and (2). And
> packages that build differently on different platforms, like openblas,
> we should opt out on.
>
> I think this would be a cool innovation (in more ways than one).
>
> Pj.
>
>

[-- Attachment #2: Type: text/html, Size: 3253 bytes --]

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case
  2018-04-05  6:05 ` Gábor Boskovits
@ 2018-04-05  8:39   ` Pjotr Prins
  2018-04-05  8:58     ` Hartmut Goebel
  0 siblings, 1 reply; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05  8:39 UTC (permalink / raw)
  To: Gábor Boskovits; +Cc: Guix-devel

On Thu, Apr 05, 2018 at 08:05:39AM +0200, Gábor Boskovits wrote:
> Actually running tests test the behaviour of a software. Unfortunately
> reproducible build does not guarantee reproducible behaviour.
> Furthermore there are still cases, where the environment is
> not the same around these running software, like hardware or
> kernel configuration settings leaking into the environment.
> These can be spotted by running tests. Nondeterministic
> failures can also be spotted more easily. There are a lot of
> packages where pulling tests can be done, I guess, but probably not
> for all of them. WDYT?

Hi Gábor,

If that were a real problem we should not be providing substitutes -
same problem. With substitutes we also provide software whose tests
have been run once (at least).

We should not forbid people to run tests. But I don't think it should
be the default once tests have been run in a configuration. Think of
it as functional programming. In my opinion test results can be
cached instead of rerunning the tests.

My point is that we should not overestimate/overdo the idea of
leakage. Save the planet. We have responsibility.

Pj.

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case
  2018-04-05  8:39   ` Pjotr Prins
@ 2018-04-05  8:58     ` Hartmut Goebel
  0 siblings, 0 replies; 22+ messages in thread
From: Hartmut Goebel @ 2018-04-05  8:58 UTC (permalink / raw)
  To: guix-devel

On 05.04.2018 at 10:39, Pjotr Prins wrote:
> We should not forbid people to run tests. But I don't think it should
> be the default once tests have been run in a configuration.

+1

> My point is that we should not overestimate/overdo the idea of
> leakage. Save the planet. We have responsibility.

+1

-- 
Regards
Hartmut Goebel

| Hartmut Goebel          | h.goebel@crazy-compilers.com               |
| www.crazy-compilers.com | compilers which you thought are impossible |

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case 2018-04-05 5:24 Treating tests as special case Pjotr Prins 2018-04-05 6:05 ` Gábor Boskovits @ 2018-04-05 6:21 ` Björn Höfling 2018-04-05 8:43 ` Pjotr Prins 2018-04-05 10:14 ` Ricardo Wurmus 2018-04-05 10:26 ` Ricardo Wurmus 2018-04-05 20:26 ` Treating tests as special case Mark H Weaver 3 siblings, 2 replies; 22+ messages in thread From: Björn Höfling @ 2018-04-05 6:21 UTC (permalink / raw) To: Pjotr Prins; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 3533 bytes --] On Thu, 5 Apr 2018 07:24:39 +0200 Pjotr Prins <pjotr.public12@thebird.nl> wrote: > Last night I was watching Rich Hickey's on Specs and deployment. It is > a very interesting talk in many ways, recommended. He talks about > tests at 1:02 into the talk: > > https://www.youtube.com/watch?v=oyLBGkS5ICk > > and he gave me a new insight which rang immediately true. He said: > what is the point of running tests everywhere? If two people test the > same thing, what is the added value of that? (I paraphrase) > > With Guix a reproducibly building package generates the same Hash on > all dependencies. Running the same tests every time on that makes no > sense. > > And this hooks in with my main peeve about building from source. The > building takes long enough. Testing takes incredibly long with many > packages (especially language related) and are usually single core > (unlike the build). It is also bad for our carbon foot print. Assuming > everyone uses Guix on the planet, is that where we want to end up? > > Burning down the house. > > Like we pull substitutes we could pull a list of hashes of test cases > that are known to work (on Hydra or elsewhere). This is much lighter > than storing substitutes, so when the binaries get removed we can > still retain the test hashes and have fast builds. Also true for guix > repo itself. > > I know there are two 'inputs' I am not accounting for: (1) hardware > variants and (2) the Linux kernel. 
But, honestly, I do not think we
> are in the business of testing those. We can assume these work. If
> not, any issues will be found in other ways (typically a segfault ;).
> Our tests are generally meaningless when it comes to (1) and (2). And
> packages that build differently on different platforms, like openblas,
> we should opt out on.
>
> I think this would be a cool innovation (in more ways than one).
>
> Pj.

Hi Pjotr,

great ideas!

Last night I did a

  guix pull && guix package -i git

We have substitutes, right? Yeah, but someone updated git, on my new
machine I didn't configure berlin.guixsd.org yet and hydra didn't have
any substitutes (build wasn't started yet?).

Building git was relatively fast, but all the tests took ages. And it
was just git. It should work. The git maintainers ran the tests. Marius
ran the tests when he updated it in commit 5c151862c. And that should
be enough testing. Let's skip the tests.

On the other hand, what if I create a new package definition and forget
to run the tests? What if upstream is too sloppy, did not run the tests
and had no continuous integration? Who will run the tests then?

What if I build my package with different sources?

And you mentioned different environment conditions like machine and
kernel. We still have "only" 70-90% reproducibility. The complement
should have tests enabled. And the question "is my package
reproducible?" is not trivial to answer, and is not computable.

We saw tests that failed only in 2% of the runs and were fine in 98%.
If we ran those tests "just once", we couldn't figure out that
there is a problem (assuming the problem really is in the software, not
just the tests).

There could also be practical problems with that: if everyone wrote
their software nicely with Autoconf and we just had a "make && make
test && make install", it would be easy to skip the tests. But for more
complicated things we have to find a way to tell the build system how
to skip tests.

Björn [-- Attachment #2: OpenPGP digital signature --] [-- Type: application/pgp-signature, Size: 181 bytes --] ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case 2018-04-05 6:21 ` Björn Höfling @ 2018-04-05 8:43 ` Pjotr Prins 2018-04-06 8:58 ` Chris Marusich 2018-04-05 10:14 ` Ricardo Wurmus 1 sibling, 1 reply; 22+ messages in thread From: Pjotr Prins @ 2018-04-05 8:43 UTC (permalink / raw) To: Björn Höfling; +Cc: guix-devel On Thu, Apr 05, 2018 at 08:21:15AM +0200, Björn Höfling wrote: > great ideas! > > Last night I did a > > guix pull && guix package -i git > > We have substitutes, right? Yeah, but someone updated git, on my new > machine I didn't configure berlin.guixsd.org yet and hydra didn't have > any substitutes (build wasn't started yet?). > > Building git was relatively fast, but all the tests took ages. And it > was just git. It should work. The git maintainers ran the tests. Marius > when he updated it in commit 5c151862c ran the tests. And that should > be enough of testing. Let's skip the tests. Not exactly what I am proposing ;). But, even so, I think we should have a switch for turning off tests. Let the builder decide what is good or bad. Too much nannying serves no one. > On the other hand, if I create a new package definition and forget to > run the tests. If upstream is too sloppy, did not run the tests and had > no continuous integration. Who will run the tests then? Hydra should always test before providing a hash that testing is done. > What if I build my package with different sources? > > And you mentioned different environment conditions like machine and > kernel. We still have "only" 70-90% reproducibility. The complement > should have tests enabled. And the question "is my package > reproducible?" is not trivial to answer, and is not computable. Well, I believe that case is overrated and we prove that by actually providing binary substitutes without testing ;) > We saw tests that failed only in 2% of the runs and were fine in 98%. 
> If we would run those tests "just once", we couldn't figure out that > there is a problem (assuming the problem really is in the software, not > just the tests). > > There could also be practible problems with that: If all write there > software nice and with autoconfigure and we just have a "make && make > test && make install" it's easy to skip the test. But for more > complicated things we have to find a way to tell the build-system how > to skip tests. Totally agree. At this point I patch the tree not to run tests. Pj. ^ permalink raw reply [flat|nested] 22+ messages in thread
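The 2% figure quoted above can be made concrete with a little arithmetic. The following is only a sketch under a simplifying assumption that is not stated in the thread: each test run fails independently with the same probability. The function names are made up for illustration.

```python
import math


def miss_probability(failure_rate: float, runs: int) -> float:
    """Chance that a flaky test passes in every one of `runs` runs.

    Assumes (hypothetically) that runs are independent and each one
    fails with probability `failure_rate`.
    """
    if not 0.0 < failure_rate < 1.0:
        raise ValueError("failure_rate must be strictly between 0 and 1")
    if runs < 0:
        raise ValueError("runs must be non-negative")
    return (1.0 - failure_rate) ** runs


def runs_needed(failure_rate: float, confidence: float) -> int:
    """Smallest number of runs that shows at least one failure with
    probability >= `confidence`, under the same independence assumption."""
    if not 0.0 < failure_rate < 1.0 or not 0.0 < confidence < 1.0:
        raise ValueError("both arguments must be strictly between 0 and 1")
    return math.ceil(math.log(1.0 - confidence) / math.log(1.0 - failure_rate))


if __name__ == "__main__":
    # A single run of a test that fails 2% of the time hides the problem
    # 98% of the time.
    print(miss_probability(0.02, 1))
    # Number of runs needed to see the failure with 95% probability.
    print(runs_needed(0.02, 0.95))
```

Under these assumptions a single successful run on a build farm says little about a test that fails rarely, which is the concern Björn raises; whether many users rerunning the test is the right remedy is the open question of the thread.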
* Re: Treating tests as special case 2018-04-05 8:43 ` Pjotr Prins @ 2018-04-06 8:58 ` Chris Marusich 2018-04-06 18:36 ` David Pirotte 0 siblings, 1 reply; 22+ messages in thread From: Chris Marusich @ 2018-04-06 8:58 UTC (permalink / raw) To: Pjotr Prins; +Cc: guix-devel [-- Attachment #1: Type: text/plain, Size: 4384 bytes --] Pjotr Prins <pjotr.public12@thebird.nl> writes: > I think we should have a switch for turning off tests. Let the builder > decide what is good or bad. Too much nannying serves no one. I think it would be OK to give users the choice of not running tests when building from source, if they really don't want to. This is similar to how users can choose to skip the "make check" step (and live with the risk) when building something manually. However, I think we should always run the tests by default. Maybe you could submit a patch to add a "--no-tests" option? ludo@gnu.org (Ludovic Courtès) writes: > That is why I was suggesting putting effort in improving substitute > delivery rather than trying to come up with special mechanisms. Yes, I think that improving substitute availability is the best path forward. I'm willing to bet that Pjotr would not be so frustrated if substitutes were consistently available. Regarding Pjotr's suggestion to add a "test result substitute" feature: It isn't clear to me how a "test result substitute" is any better than a substitute in the usual sense. It sounds like Pjotr is arguing that if the substitute server can tell me that a package's tests have passed, then I don't need to run the tests a second time. But why would I have to build the package from source in that case, anyway? Assuming the substitute server has told me that the package's tests have passed, it is almost certainly the case that the package has been built and its substitute is currently available, so I don't have to build the package myself - I can just download the substitute! 
Conversely, if a substitute server says the tests have not passed, then certainly no substitute will be available, so I'll have to build it (and run the tests) myself. Perhaps I am missing something, but it does not seem to me that the existence of a "test result substitute" would add value. I think what Pjotr really wants is (1) better substitute availability, or (2) the option to skip tests when he has to build from source because substitutes are not available. I think (1) is the best goal, and (2) is a reasonable request in line with Guix's goal of giving control to the user. Ricardo Wurmus <rekado@elephly.net> skribis: > An idea that came up on #guix several months ago was to separate the > building of packages from testing. Testing would be a continuation of > the build, like grafts could be envisioned as a continuation of the > build. What problems would that solve? Pjotr Prins <pjotr.public12@thebird.nl> writes: > The building takes long enough. Testing takes incredibly long with > many packages (especially language related) and are usually single > core (unlike the build). Eelco told me that in Nix, they set --max-jobs to the number of CPU cores, and --cores to 1, since lots of software has concurrency bugs that are easier to work around by building on a single core. Notably, Guix does the opposite: we set --max-jobs to 1 and --cores to the number of CPU cores. I wonder if you would see faster builds by adjusting these options for your use case? > It is also bad for our carbon foot print. Assuming everyone uses Guix > on the planet, is that where we want to end up? When everyone uses Guix on the planet, substitutes will be ubiquitous. You'll be able to skip the tests because, in practice, substitutes will always be available (which means an authorized substitute server ran the tests successfully). 
Or, if you are very concerned about correctness, you might STILL
choose to build from source - AND run the tests - because you are
concerned that your particular circumstances (kernel version, file
system type, hardware, etc.) were not tested by the build farm.

> I know there are two 'inputs' I am not accounting for: (1) hardware
> variants and (2) the Linux kernel. But, honestly, I do not think we
> are in the business of testing those. We can assume these work.

Even if those components worked for the maintainers who ran the tests on
their own machines and made a release, they might not work correctly in
your own situation. Mark's story is a great example of this! For this
reason, some people will still choose to build things from source
themselves, even if substitutes are available from some other place.

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply [flat|nested] 22+ messages in thread
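The two parallelism settings Chris contrasts can be written down as command lines. This is a minimal sketch, assuming the `--cores` and `--max-jobs` build options of `guix build`; the helper function and the strategy names are hypothetical and only assemble an argument list, they do not run anything.

```python
import os
from typing import List, Optional


def guix_build_args(package: str, strategy: str,
                    cpus: Optional[int] = None) -> List[str]:
    """Assemble a `guix build` command line for one of two strategies.

    "one-job-many-cores": one build at a time, all cores per build
                          (the Guix default described in the message).
    "many-jobs-one-core": as many builds as cores, one core per build
                          (the Nix setting attributed to Eelco).
    The strategy names are invented for this sketch.
    """
    n = cpus if cpus is not None else (os.cpu_count() or 1)
    if strategy == "one-job-many-cores":
        cores, max_jobs = n, 1
    elif strategy == "many-jobs-one-core":
        cores, max_jobs = 1, n
    else:
        raise ValueError(f"unknown strategy: {strategy!r}")
    return ["guix", "build", f"--cores={cores}",
            f"--max-jobs={max_jobs}", package]


if __name__ == "__main__":
    print(" ".join(guix_build_args("git", "one-job-many-cores", cpus=8)))
    print(" ".join(guix_build_args("git", "many-jobs-one-core", cpus=8)))
```

Which setting is faster depends on the workload: a single large package benefits from many cores per build, while a long chain of small packages with single-core test suites may benefit from more jobs in parallel.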
* Re: Treating tests as special case
  2018-04-06  8:58 ` Chris Marusich
@ 2018-04-06 18:36   ` David Pirotte
  0 siblings, 0 replies; 22+ messages in thread
From: David Pirotte @ 2018-04-06 18:36 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1143 bytes --]

Hello,

> > An idea that came up on #guix several months ago was to separate the
> > building of packages from testing. Testing would be a continuation of
> > the build, like grafts could be envisioned as a continuation of the
> > build.

> What problems would that solve?

If one could run test suites locally against already-built packages,
that would already save quite a great deal of planet heat, I guess: one
would not build from source in the first place, but only after finding
a bug and wanting to fix it ... and, iiuc, Mark would still have found
the bug he mentioned ...

> Even if those components worked for the maintainers who ran the tests on
> their own machines and made a release, they might not work correctly in
> your own situation. Mark's story is a great example of this! For this
> reason, some people will still choose to build things from source
> themselves, even if substitutes are available from some other place.

But would they rebuild from source just to run the tests? It sounds to
me that, if possible, separating the test suites from the build process
would add value over the current situation.

Cheers,
David

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 488 bytes --]

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case 2018-04-05 6:21 ` Björn Höfling 2018-04-05 8:43 ` Pjotr Prins @ 2018-04-05 10:14 ` Ricardo Wurmus 2018-04-05 12:19 ` Björn Höfling 1 sibling, 1 reply; 22+ messages in thread From: Ricardo Wurmus @ 2018-04-05 10:14 UTC (permalink / raw) To: Björn Höfling; +Cc: guix-devel Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes: > And you mentioned different environment conditions like machine and > kernel. We still have "only" 70-90% reproducibility. Where does that number come from? In my tests for a non-trivial set of bioinfo pipelines I got to 97.7% reproducibility (or 95.2% if you include very minor problems) for 355 direct inputs. I rebuilt on three different machines. -- Ricardo ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case
  2018-04-05 10:14   ` Ricardo Wurmus
@ 2018-04-05 12:19     ` Björn Höfling
  2018-04-05 14:10       ` Ricardo Wurmus
  0 siblings, 1 reply; 22+ messages in thread
From: Björn Höfling @ 2018-04-05 12:19 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 998 bytes --]

On Thu, 05 Apr 2018 12:14:53 +0200
Ricardo Wurmus <rekado@elephly.net> wrote:

> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes:
>
> > And you mentioned different environment conditions like machine and
> > kernel. We still have "only" 70-90% reproducibility.
>
> Where does that number come from? In my tests for a non-trivial set
> of bioinfo pipelines I got to 97.7% reproducibility (or 95.2% if you
> include very minor problems) for 355 direct inputs.
>
> I rebuilt on three different machines.

I have no numbers of my own but checked Ludovic's blog post from
October 2017:

https://www.gnu.org/software/guix/blog/2017/reproducible-builds-a-status-update/

"We’re somewhere between 78% and 91%—not as good as Debian yet, [..]".

So if your numbers are valid for the whole repository, that is good
news and would mean we are now better than Debian [1], and that would
be worth a new blog post.

Björn

[1] https://isdebianreproducibleyet.com/

[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 181 bytes --]

^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case 2018-04-05 12:19 ` Björn Höfling @ 2018-04-05 14:10 ` Ricardo Wurmus 0 siblings, 0 replies; 22+ messages in thread From: Ricardo Wurmus @ 2018-04-05 14:10 UTC (permalink / raw) To: Björn Höfling; +Cc: guix-devel Hi Björn, > On Thu, 05 Apr 2018 12:14:53 +0200 > Ricardo Wurmus <rekado@elephly.net> wrote: > >> Björn Höfling <bjoern.hoefling@bjoernhoefling.de> writes: >> >> > And you mentioned different environment conditions like machine and >> > kernel. We still have "only" 70-90% reproducibility. >> >> Where does that number come from? In my tests for a non-trivial set >> of bioinfo pipelines I got to 97.7% reproducibility (or 95.2% if you >> include very minor problems) for 355 direct inputs. >> >> I rebuilt on three different machines. > > I have no own numbers but checked Ludivic's blog post from October 2017: > > https://www.gnu.org/software/guix/blog/2017/reproducible-builds-a-status-update/ > > "We’re somewhere between 78% and 91%—not as good as Debian yet, [..]". Ah, I see. Back then we didn’t have a fix for Python bytecode, which affects a large number of packages in Guix but not on Debian (who simply don’t distribute bytecode AFAIU). > So if your numbers are valid for the whole repository, that is good > news and would mean we are now better than Debian [1], and that would > be worth a new blog post. The analysis was only done for the “pigx” package and its direct/propagated inputs. I’d like to investigate the sources of non-determinism for remaining packages and fix them one by one. For some we already know what’s wrong (e.g. for Haskell packages the random order of packages in the database seems to be responsible), but for others we haven’t made an effort to look closely enough. I’d also take the Debian numbers with a spoonful of salt (and then take probiotics in an effort to undo some of the damage, see[1]), because they aren’t actually rebuilding all Debian packages. 
[1]: https://insights.mdc-berlin.de/en/2017/11/gut-bacteria-sensitive-salt/ -- Ricardo GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC https://elephly.net ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case 2018-04-05 5:24 Treating tests as special case Pjotr Prins 2018-04-05 6:05 ` Gábor Boskovits 2018-04-05 6:21 ` Björn Höfling @ 2018-04-05 10:26 ` Ricardo Wurmus 2018-04-05 14:14 ` Ludovic Courtès 2018-04-05 20:26 ` Treating tests as special case Mark H Weaver 3 siblings, 1 reply; 22+ messages in thread From: Ricardo Wurmus @ 2018-04-05 10:26 UTC (permalink / raw) To: Pjotr Prins; +Cc: guix-devel Hi Pjotr, > And this hooks in with my main peeve about building from source. The > building takes long enough. Testing takes incredibly long with many > packages (especially language related) and are usually single core > (unlike the build). I share the sentiment. Waiting for tests to complete can be quite annoying. An idea that came up on #guix several months ago was to separate the building of packages from testing. Testing would be a continuation of the build, like grafts could be envisioned as a continuation of the build. Packages with tests would then become leaf nodes in the graph — nothing would depend on the packages with tests, only on the packages without tests. Building the test continuation would thus be optional and could be something that’s done by the build farm but not by users who need to compile a package for lack of substitutes. The implementation details are tricky: can it be a proper continuation from the time after the build phase but before the install phase? Would this involve reverting to a snapshot of the build container? There are packages that force “make check” before “make install” — do we patch them or ignore them? Will every package then produce one extra derivation for tests? -- Ricardo GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC https://elephly.net ^ permalink raw reply [flat|nested] 22+ messages in thread
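The "tests as leaf nodes" idea in the message above can be illustrated with a toy dependency graph. This is a sketch, not how Guix represents derivations: the package names, the `:build` and `:check` suffixes, and both helper functions are invented for illustration.

```python
from typing import Dict, List, Set


def split_graph(deps: Dict[str, List[str]]) -> Dict[str, List[str]]:
    """Split every package into a build node and a check node.

    Build nodes depend only on the build nodes of their inputs; each
    check node depends on its own build node and nothing depends on it,
    so check nodes are leaves of the graph.
    """
    graph: Dict[str, List[str]] = {}
    for pkg, inputs in deps.items():
        graph[f"{pkg}:build"] = [f"{i}:build" for i in inputs]
        graph[f"{pkg}:check"] = [f"{pkg}:build"]
    return graph


def closure(graph: Dict[str, List[str]], target: str) -> Set[str]:
    """Return every node that must be realised to obtain `target`."""
    seen: Set[str] = set()
    stack = [target]
    while stack:
        node = stack.pop()
        if node in seen:
            continue
        seen.add(node)
        stack.extend(graph.get(node, []))
    return seen


if __name__ == "__main__":
    # Hypothetical inputs, not taken from real package definitions.
    deps = {"git": ["openssl", "zlib"], "openssl": ["zlib"], "zlib": []}
    graph = split_graph(deps)
    # A user building git locally never touches a :check node.
    print(sorted(closure(graph, "git:build")))
    # A build farm would additionally request the check leaves.
    print(sorted(closure(graph, "git:check")))
```

In this model a user who lacks substitutes realises only build nodes, while a build farm can realise the check leaves as well. The sketch sidesteps the hard questions raised in the message, such as test suites that expect to run from the build tree.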
* Re: Treating tests as special case 2018-04-05 10:26 ` Ricardo Wurmus @ 2018-04-05 14:14 ` Ludovic Courtès 2018-04-05 14:59 ` Pjotr Prins 0 siblings, 1 reply; 22+ messages in thread From: Ludovic Courtès @ 2018-04-05 14:14 UTC (permalink / raw) To: Ricardo Wurmus; +Cc: guix-devel Hello! I sympathize with what you write about the inconvenience of running tests, when substitutes aren’t available. However, I do think running tests has real value. Of course sometimes we just spend time fiddling with the tests so they would run in the isolated build environment, and they do run flawlessly once we’ve done the usual adjustments (no networking, no /bin/sh, etc.) However, in many packages we found integration issues that we would just have missed had we not run the tests; that in turn can lead to very bad user experience. In other cases we found real upstream bugs and were able to report them (cf. <https://github.com/TaylanUB/scheme-bytestructures/issues/30> for an example from today.) Back when I contributed to Nixpkgs, tests were not run by default and I think that it had a negative impact on QA. So to me, not running tests is not an option. The problem I’m more interested in is: can we provide substitutes more quickly? Can we grow an infrastructure such that ‘master’, by default, contains software that has already been built? Ricardo Wurmus <rekado@elephly.net> skribis: > An idea that came up on #guix several months ago was to separate the > building of packages from testing. Testing would be a continuation of > the build, like grafts could be envisioned as a continuation of the > build. I agree it would be nice, but I think there’s a significant technical issue: test suites usually expect to run from the build tree. Also, would a test failure invalidate the previously-built store item(s)? Thanks, Ludo’. ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case
  2018-04-05 14:14   ` Ludovic Courtès
@ 2018-04-05 14:59     ` Pjotr Prins
  2018-04-05 15:17       ` Ricardo Wurmus
  2018-04-05 15:24       ` Ludovic Courtès
  0 siblings, 2 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05 14:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

On Thu, Apr 05, 2018 at 04:14:19PM +0200, Ludovic Courtès wrote:
> I sympathize with what you write about the inconvenience of running
> tests, when substitutes aren’t available. However, I do think running
> tests has real value.
>
> Of course sometimes we just spend time fiddling with the tests so they
> would run in the isolated build environment, and they do run flawlessly
> once we’ve done the usual adjustments (no networking, no /bin/sh, etc.)
>
> However, in many packages we found integration issues that we would just
> have missed had we not run the tests; that in turn can lead to very bad
> user experience. In other cases we found real upstream bugs and were
> able to report them
> (cf. <https://github.com/TaylanUB/scheme-bytestructures/issues/30> for
> an example from today.) Back when I contributed to Nixpkgs, tests were
> not run by default and I think that it had a negative impact on QA.
>
> So to me, not running tests is not an option.

I am *not* suggesting we stop testing and stop writing tests. They are
extremely important for integration (though we could do with far fewer
and more focused integration tests - ref Hickey). What I am
writing is that we don't have to rerun tests for everyone *once* they
succeed *somewhere*. If you have a successful reproducible build and
tests on a platform there is really no point in rerunning tests
everywhere for the exact same setup. It is a nice property of our FP
approach. Proof that it is not necessary is the fact that we
distribute substitute binaries without running tests there. What I am
proposing in essence is 'substitute tests'.

Ricardo is suggesting an implementation. I think it is simpler than
that. When building a derivation we know the hash. If we have a list
of hashes in the database for successful tests (hash-tests-passed), it
can simply be queried and we are done. Even when the substitute gets
removed, that item can still remain at almost no cost.

Ludo, I think we need to do this. There is no point in running tests
that have already been run. Hickey is right. I have reached
enlightenment. Almost everything I thought about testing is wrong. If
all the inputs are the same the test will *always* pass. There is
no point to it! The only way such a test won't pass is by divine
intervention or real hardware problems. We don't want to test for
either.

If tests are so important to rerun: tell me why we are not running
tests when substituting binaries?

> The problem I’m more interested in is: can we provide substitutes more
> quickly? Can we grow an infrastructure such that ‘master’, by default,
> contains software that has already been built?

Sure, that is another challenge and an important one.

> Ricardo Wurmus <rekado@elephly.net> writes:
>
> > An idea that came up on #guix several months ago was to separate the
> > building of packages from testing. Testing would be a continuation of
> > the build, like grafts could be envisioned as a continuation of the
> > build.
>
> I agree it would be nice, but I think there’s a significant technical
> issue: test suites usually expect to run from the build tree.

What I understand is that Nix already does something like this. They
have split testing out to allow for network access.

I don't propose to split the process. I propose to cache testing as
part of the build.

Pj.

^ permalink raw reply [flat|nested] 22+ messages in thread
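The "hash-tests-passed" lookup described in the message above can be sketched in a few lines. This is an illustration only: the hash function below is a stand-in and is not how Guix computes derivation hashes, and the class, function names, package name and input hashes are all hypothetical.

```python
import hashlib
from typing import Iterable, Set


def derivation_hash(name: str, version: str,
                    input_hashes: Iterable[str]) -> str:
    """Stand-in for a derivation hash: a digest over the package
    identity and the (sorted) hashes of all its inputs."""
    h = hashlib.sha256()
    h.update(f"{name}-{version}\n".encode("utf-8"))
    for item in sorted(input_hashes):
        h.update(f"{item}\n".encode("utf-8"))
    return h.hexdigest()


class PassedTests:
    """In-memory set of derivation hashes whose tests are known to pass."""

    def __init__(self) -> None:
        self._passed: Set[str] = set()

    def record_pass(self, drv_hash: str) -> None:
        self._passed.add(drv_hash)

    def has_passed(self, drv_hash: str) -> bool:
        return drv_hash in self._passed


def should_run_tests(cache: PassedTests, drv_hash: str,
                     force: bool = False) -> bool:
    """Run tests unless a pass is recorded for exactly this hash.

    `force` keeps the option of rerunning tests anyway.
    """
    return force or not cache.has_passed(drv_hash)


if __name__ == "__main__":
    cache = PassedTests()
    drv = derivation_hash("example", "1.0", ["aaa", "bbb"])
    print(should_run_tests(cache, drv))   # nothing recorded yet
    cache.record_pass(drv)                # e.g. reported by a build farm
    print(should_run_tests(cache, drv))   # cached result found
```

Because the key covers all inputs, any change to a dependency yields a new hash and the tests run again. The objections raised elsewhere in the thread (hardware, kernel, nondeterministic tests) concern exactly the inputs such a key does not capture.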
* Re: Treating tests as special case 2018-04-05 14:59 ` Pjotr Prins @ 2018-04-05 15:17 ` Ricardo Wurmus 2018-04-05 15:24 ` Ludovic Courtès 1 sibling, 0 replies; 22+ messages in thread From: Ricardo Wurmus @ 2018-04-05 15:17 UTC (permalink / raw) To: Pjotr Prins; +Cc: guix-devel Pjotr Prins <pjotr.public12@thebird.nl> writes: > If all the inputs are the same the test will *always* pass. There is > no point to it! The only way such a test won't pass it by divine > intervention or real hardware problems. Both we don't want to test > for. > > If tests are so important to rerun: tell me why we are not running > tests when substituting binaries? I don’t understand this. People only run tests when they haven’t been run on the build farm, because that’s part of the build. So when the tests have passed (and the few short phases after that), then we have substitutes anyway, and so users won’t re-run tests. If you get substitutes you don’t need to run the tests. Any change here seems to only affect the case where you build locally even though there are substitutes. I’d say that this is a pretty rare use case. Build farms do this, but they build binaries (and if they differ from binaries built elsewhere the tests may also behave differently). -- Ricardo GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC https://elephly.net ^ permalink raw reply [flat|nested] 22+ messages in thread
* Re: Treating tests as special case
  2018-04-05 14:59 ` Pjotr Prins
  2018-04-05 15:17 ` Ricardo Wurmus
@ 2018-04-05 15:24 ` Ludovic Courtès
  2018-04-05 16:41 ` Pjotr Prins
  1 sibling, 1 reply; 22+ messages in thread
From: Ludovic Courtès @ 2018-04-05 15:24 UTC
To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> skribis:

> I am *not* suggesting we stop testing and stop writing tests. They are
> extremely important for integration (though we could do with a lot
> less and more focussed integration tests - ref Hickey). What I am
> writing is that we don't have to rerun tests for everyone *once* they
> succeed *somewhere*. If you have a successful reproducible build and
> tests on a platform there is really no point in rerunning tests
> everywhere for the exact same setup. It is a nice property of our FP
> approach. Proof that it is not necessary is the fact that we
> distribute substitute binaries without running tests there. What I am
> proposing in essence is 'substitute tests'.

Understood.

> If tests are so important to rerun: tell me why we are not running
> tests when substituting binaries?

Because you have a substitute if and only if those tests already passed
somewhere. This is exactly the property we’re interested in, right?

That is why I was suggesting putting effort into improving substitute
delivery rather than trying to come up with special mechanisms.

>> Ricardo Wurmus <rekado@elephly.net> skribis:
>>
>> > An idea that came up on #guix several months ago was to separate the
>> > building of packages from testing. Testing would be a continuation of
>> > the build, like grafts could be envisioned as a continuation of the
>> > build.
>>
>> I agree it would be nice, but I think there’s a significant technical
>> issue: test suites usually expect to run from the build tree.
>
> What I understand is that Nix already does something like this. they
> have split testing out to allow for network access.

Do you have pointers to that?

All I’m aware of is the ‘doCheck’ variable that is unset (i.e., false)
by default:

  https://github.com/NixOS/nixpkgs/blob/master/pkgs/stdenv/generic/setup.sh#L1192

Ludo’.
* Re: Treating tests as special case
  2018-04-05 15:24 ` Ludovic Courtès
@ 2018-04-05 16:41 ` Pjotr Prins
  2018-04-05 18:35 ` Pjotr Prins
  2018-04-06 7:57 ` Retaining substitutes Ludovic Courtès
  0 siblings, 2 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05 16:41 UTC
To: Ludovic Courtès; +Cc: guix-devel

On Thu, Apr 05, 2018 at 05:24:12PM +0200, Ludovic Courtès wrote:
> Pjotr Prins <pjotr.public12@thebird.nl> skribis:
>
> > I am *not* suggesting we stop testing and stop writing tests. They are
> > extremely important for integration (though we could do with a lot
> > less and more focussed integration tests - ref Hickey). What I am
> > writing is that we don't have to rerun tests for everyone *once* they
> > succeed *somewhere*. If you have a successful reproducible build and
> > tests on a platform there is really no point in rerunning tests
> > everywhere for the exact same setup. It is a nice property of our FP
> > approach. Proof that it is not necessary is the fact that we
> > distribute substitute binaries without running tests there. What I am
> > proposing in essence is 'substitute tests'.
>
> Understood.
>
> > If tests are so important to rerun: tell me why we are not running
> > tests when substituting binaries?
>
> Because you have a substitute if and only if those tests already passed
> somewhere. This is exactly the property we’re interested in, right?

Yup. The problem is that substitutes go away. We don't retain them, and
I often run into that case.

Providing test-substitutes is much lighter and they can be retained
forever. Once tests have passed on a build server, we don't have to
repeat them. That is my story.

Pj.
* Re: Treating tests as special case
  2018-04-05 16:41 ` Pjotr Prins
@ 2018-04-05 18:35 ` Pjotr Prins
  2018-04-06 7:57 ` Retaining substitutes Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Pjotr Prins @ 2018-04-05 18:35 UTC
To: Pjotr Prins; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 665 bytes --]

On Thu, Apr 05, 2018 at 06:41:58PM +0200, Pjotr Prins wrote:
> Providing test-substitutes is much lighter and can be retained
> forever.

See it as a light-weight substitute. It can also mean we can retire
large binary substitutes quicker. Saving disk space.

I think it is a brilliant idea ;)

A result of the Hickey insight is that I am going to cut down on my
own tests (the ones I write). Only integration tests are of interest
for deployment.

For those interested, attached patch disables tests in the build
system. You may need to adapt it a little for a recent checkout, but
you get the idea. Use at your own risk, but in a pinch it can be
handy.

Pj.

--

[-- Attachment #2: disable-tests.patch --]
[-- Type: text/x-diff, Size: 2179 bytes --]

diff --git a/guix/build/gnu-build-system.scm b/guix/build/gnu-build-system.scm
index 1786e2e3c..2aff344df 100644
--- a/guix/build/gnu-build-system.scm
+++ b/guix/build/gnu-build-system.scm
@@ -286,7 +286,7 @@ makefiles."
 (define* (check #:key target (make-flags '()) (tests? (not target))
                 (test-target "check") (parallel-tests? #t)
                 #:allow-other-keys)
-  (if tests?
+  (if #f
       (zero? (apply system* "make" test-target
                     `(,@(if parallel-tests?
                             `("-j" ,(number->string (parallel-job-count)))
diff --git a/guix/build/perl-build-system.scm b/guix/build/perl-build-system.scm
index b2024e440..8008a7173 100644
--- a/guix/build/perl-build-system.scm
+++ b/guix/build/perl-build-system.scm
@@ -63,7 +63,7 @@
 (define-w/gnu-fallback* (check #:key target
                                (tests? (not target)) (test-flags '())
                                #:allow-other-keys)
-  (if tests?
+  (if #f
       (zero? (apply system* "./Build" "test" test-flags))
       (begin
         (format #t "test suite not run~%")
diff --git a/guix/build/python-build-system.scm b/guix/build/python-build-system.scm
index dd07986b9..dacf58110 100644
--- a/guix/build/python-build-system.scm
+++ b/guix/build/python-build-system.scm
@@ -131,7 +131,7 @@
 (define* (check #:key tests? test-target use-setuptools?
                 #:allow-other-keys)
   "Run the test suite of a given Python package."
-  (if tests?
+  (if #f
       ;; Running `setup.py test` creates an additional .egg-info directory in
       ;; build/lib in some cases, e.g. if the source is in a sub-directory
       ;; (given with `package_dir`). This will by copied to the output, too,
diff --git a/guix/build/ruby-build-system.scm b/guix/build/ruby-build-system.scm
index c2d276627..2f12a4362 100644
--- a/guix/build/ruby-build-system.scm
+++ b/guix/build/ruby-build-system.scm
@@ -116,7 +116,7 @@ generate the files list."
 (define* (check #:key tests? test-target #:allow-other-keys)
   "Run the gem's test suite rake task TEST-TARGET. Skip the tests if TESTS? is
 #f."
-  (if tests?
+  (if #f
       (zero? (system* "rake" test-target))
       #t))
* Retaining substitutes
  2018-04-05 16:41 ` Pjotr Prins
  2018-04-05 18:35 ` Pjotr Prins
@ 2018-04-06 7:57 ` Ludovic Courtès
  1 sibling, 0 replies; 22+ messages in thread
From: Ludovic Courtès @ 2018-04-06 7:57 UTC
To: Pjotr Prins; +Cc: guix-devel

Hello,

Pjotr Prins <pjotr.public12@thebird.nl> skribis:

> On Thu, Apr 05, 2018 at 05:24:12PM +0200, Ludovic Courtès wrote:
>> Pjotr Prins <pjotr.public12@thebird.nl> skribis:
>>
>> > I am *not* suggesting we stop testing and stop writing tests. They are
>> > extremely important for integration (though we could do with a lot
>> > less and more focussed integration tests - ref Hickey). What I am
>> > writing is that we don't have to rerun tests for everyone *once* they
>> > succeed *somewhere*. If you have a successful reproducible build and
>> > tests on a platform there is really no point in rerunning tests
>> > everywhere for the exact same setup. It is a nice property of our FP
>> > approach. Proof that it is not necessary is the fact that we
>> > distribute substitute binaries without running tests there. What I am
>> > proposing in essence is 'substitute tests'.
>>
>> Understood.
>>
>> > If tests are so important to rerun: tell me why we are not running
>> > tests when substituting binaries?
>>
>> Because you have a substitute if and only if those tests already passed
>> somewhere. This is exactly the property we’re interested in, right?
>
> Yup. Problem is substitutes go away. We don't retain them and I often
> encounter that use case.

I agree this is a problem. We’ve tweaked ‘guix publish’, our nginx
configs, etc. over time to mitigate this, but I suppose we could still
do better.

When that happens, could you try to gather data about the missing
substitutes? Like what packages are missing (where in the stack), and
also how old the Guix commit you’re using is.

More generally, I think there are connections with telemetry as we
discussed it recently: we should be able to monitor our build farms to
see concretely how much we’re retaining in high-level terms.

FWIW, today, on mirror.hydra.gnu.org, the nginx cache for nars contains
94G (for 3 architectures).

On berlin.guixsd.org, /var/cache/guix/publish takes 118G (3
architectures as well), and there’s room left.

> Providing test-substitutes is much lighter and can be retained
> forever.

I understand. Now, I agree with Ricardo that this would target the
specific use case where you’re building from source (explicitly
disabling substitutes), yet you’d like to avoid running tests.

We could address this using specific mechanisms (although, like I said,
I really don’t see what they would look like). However, I believe
optimizing substitute delivery in general would benefit everyone and
would also address the running-tests-takes-too-much-time issue.

Can we focus on measuring the performance of substitute delivery and
thinking about ways to improve it?

Thanks for your feedback,
Ludo’.
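The data requested above (which packages lack substitutes, and how much is retained overall) boils down to a set difference and a ratio. The Python sketch below only illustrates that arithmetic; `substitute_coverage` and its inputs are hypothetical names, and it is not how Guix measures this (the `guix weather` command reports substitute availability against a real server).

```python
from typing import Iterable, List, Tuple


def substitute_coverage(needed: Iterable[str],
                        available: Iterable[str]) -> Tuple[float, List[str]]:
    """Return (fraction of needed items with a substitute, missing items).

    `needed` and `available` are collections of store item names;
    both are assumptions made for this illustration.
    """
    needed_set = set(needed)
    missing = sorted(needed_set - set(available))
    if not needed_set:
        # Nothing is needed, so nothing is missing.
        return 1.0, missing
    covered = len(needed_set) - len(missing)
    return covered / len(needed_set), missing
```

Collecting the `missing` list per Guix commit would show where in the stack substitutes tend to disappear first.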
* Re: Treating tests as special case
  2018-04-05 5:24 Treating tests as special case Pjotr Prins
  ` (2 preceding siblings ...)
  2018-04-05 10:26 ` Ricardo Wurmus
@ 2018-04-05 20:26 ` Mark H Weaver
  2018-04-06 6:06 ` Pjotr Prins
  3 siblings, 1 reply; 22+ messages in thread
From: Mark H Weaver @ 2018-04-05 20:26 UTC
To: Pjotr Prins; +Cc: guix-devel

Hi Pjotr,

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> and he gave me a new insight which rang immediately true. He said:
> what is the point of running tests everywhere? If two people test the
> same thing, what is the added value of that? (I paraphrase)
>
> With Guix a reproducibly building package generates the same Hash on
> all dependencies. Running the same tests every time on that makes no
> sense.

I appreciate your thoughts on this, but I respectfully disagree.

> I know there are two 'inputs' I am not accounting for: (1) hardware
> variants and (2) the Linux kernel. But, honestly, I do not think we
> are in the business of testing those. We can assume these work.

No, we can't. For example, I recently discovered that GNU Tar fails one
of its tests on my GuixSD system based on Btrfs. It turned out to be a
real bug in GNU Tar that could lead to data loss when creating an
archive of recently written files, with --sparse enabled. I fixed it in
commit 45413064c9db1712c845e5a1065aa81f66667abe on core-updates.

I would not have discovered this bug if I had simply assumed that since
GNU Tar passes its tests on ext4fs, it surely must also pass its tests
on every other file system.

> If not, any issues will be found in other ways (typically a segfault
> ;).

The GNU Tar bug on Btrfs would never produce a segfault. The only way
the bug could be observed is by noticing that data was lost. I don't
think that's a good way to discover a bug. I'd much rather discover the
bug by a failing test suite.

Tests on different hardware/kernel/kernel-config/file-system
combinations are quite useful for those who care about reliability of
their systems. I, for one, would like to keep running test suites on my
own systems.

Regards,
Mark
* Re: Treating tests as special case
  2018-04-05 20:26 ` Treating tests as special case Mark H Weaver
@ 2018-04-06 6:06 ` Pjotr Prins
  2018-04-06 8:27 ` Ricardo Wurmus
  0 siblings, 1 reply; 22+ messages in thread
From: Pjotr Prins @ 2018-04-06 6:06 UTC
To: Mark H Weaver; +Cc: guix-devel

On Thu, Apr 05, 2018 at 04:26:50PM -0400, Mark H Weaver wrote:
> Tests on different hardware/kernel/kernel-config/file-system
> combinations are quite useful for those who care about reliability of
> their systems. I, for one, would like to keep running test suites on my
> own systems.

Sure. And it is a great example of why to test such scenarios. But why
force it down everyone's throat? I don't want to test Scipy or ldc
over and over again. Note that I can work around it, but we are
forcing our methods on others here. If I do not like it, others won't
either.

I am just looking at tests being run a billion times uselessly around
the planet. Does that not matter? We need to be green.

Ludo is correct that provisioning binary substitutes is one solution.
But it is not cheap. Can we guarantee keeping all substitutes? At
least the ones with long running tests ;). I don't know how we remove
substitutes now, but it would make sense to me to base that on
download metrics and size. How about ranking by downloads in the last
3 months times the time to build? And trim from the end. That may be
interesting.

Even so, with my idea of test substitutes you don't have to opt out of
testing. And you would still have found that bug. Those who care can
test all they please.

Anyway, that is enough. I made my point and I am certain that we will
change our ways at some point.

The laborious solution is to remove all meaningless tests. And I am
sure over 90% are pretty damn meaningless for our purposes. Like the
glut in binaries, we will trim it down over time.

One suggestion: let's also look at tests that are *not* about
integration or hardware/kernel configuration and allow for running
them optionally. Stupidly running all tests that people come up with
is not a great idea. We just run whatever the authors decide should be
run.

Pj.
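The retention policy proposed above (rank substitutes by recent downloads times build time, then trim from the end) can be sketched as follows. This is an illustration in Python under stated assumptions: the `Substitute` record, its field names, and the byte budget are hypothetical, and nothing here reflects how the build farms actually expire substitutes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Tuple


@dataclass(frozen=True)
class Substitute:
    name: str
    size_bytes: int
    downloads_90d: int    # downloads in roughly the last 3 months
    build_seconds: float  # time to rebuild, tests included

    @property
    def score(self) -> float:
        # Popular and expensive-to-rebuild items rank highest.
        return self.downloads_90d * self.build_seconds


def trim_to_budget(items: Iterable[Substitute],
                   budget_bytes: int) -> Tuple[List[Substitute], List[Substitute]]:
    """Drop the lowest-ranked substitutes until the rest fit the budget."""
    ranked = sorted(items, key=lambda s: s.score, reverse=True)
    total = sum(s.size_bytes for s in ranked)
    dropped: List[Substitute] = []
    while ranked and total > budget_bytes:
        victim = ranked.pop()  # lowest score sits at the end
        total -= victim.size_bytes
        dropped.append(victim)
    return ranked, dropped
```

With this score, a rarely downloaded package that takes an hour to build can outrank a popular one that builds in seconds, which is the effect the proposal seems to aim for.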
* Re: Treating tests as special case
  2018-04-06 6:06 ` Pjotr Prins
@ 2018-04-06 8:27 ` Ricardo Wurmus
  0 siblings, 0 replies; 22+ messages in thread
From: Ricardo Wurmus @ 2018-04-06 8:27 UTC
To: Pjotr Prins; +Cc: guix-devel

Pjotr Prins <pjotr.public12@thebird.nl> writes:

> Ludo is correct that provisioning binary substitutes is one solution.
> But not cheap. Can we guarantee keeping all substitutes? At least the
> ones with long running tests ;).

For berlin.guixsd.org we have an external storage array of a couple of
TB, which currently isn’t attached (I’ll get around to it some day). We
can keep quite a few substitutes with that amount of space.

> Even so, with my idea of test substitutes you don't have to opt out of
> testing. And you would still have found that bug. Those who care can
> test all they please.

I am not sure there’s an easy implementation that allows us to make
tests optional safely. They are part of the derivation. We could make
execution dependent on an environment variable that is set or not by the
daemon, I suppose.

> One suggestion: let's also look at tests that are *not* about
> integration or hardware/kernel configuration and allow for running them
> optionally. Stupidly running all tests that people come up with is not
> a great idea. We just run what authors decide that should be run.

We’ve already trimmed some of the longer test suites. There are some
libraries and applications that have different test suites for different
purposes, and in those cases we picked something lighter and more
appropriate for our purposes.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
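The environment-variable idea floated above could look roughly like this. It is a sketch in Python rather than Guile, and it is purely hypothetical: the variable name `HYPOTHETICAL_SKIP_TESTS` and the `check` signature are invented for illustration, and no such daemon setting is described in this thread.

```python
import os
from typing import Callable, Mapping

# Hypothetical variable; assumed to be set by the daemon, not by the package.
SKIP_TESTS_VAR = "HYPOTHETICAL_SKIP_TESTS"


def check(run_tests: Callable[[], bool],
          tests_enabled: bool = True,
          environ: Mapping[str, str] = os.environ) -> bool:
    """Check phase that honours both the package setting and the daemon."""
    if not tests_enabled:
        # The package definition itself disabled the test suite.
        print("test suite not run")
        return True
    if environ.get(SKIP_TESTS_VAR) == "1":
        # The daemon asked to skip; the derivation itself is unchanged.
        print("test suite skipped on request of the daemon")
        return True
    return run_tests()
```

Because such a variable would sit outside the derivation, the resulting store item would not record whether its tests actually ran, which may be part of why doing this "safely" is considered hard.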