Dear,

Recently, we discovered a regression in the Haskell build system that introduces unreproducible builds. It was a bit of luck: I was testing 'git-annex', wanting to have 'git-annex-assistant', and built it several times (--check) [1].

Aside from this particular issue, ~10% of packages are not reproducible, and I am not convinced that "--check" is run by the submitter/committer at each update or for each new package. Otherwise, the unreproducible Mesa case [2] would have been raised before June. :-) (That's fine: we need packages after all, and we cannot fix the whole world at the same time. :-))

The issue is being able to find them. I proposed (below) running a cron task doing '--check' on the build farms and then reporting the failures by email. Chris pointed me to the work they are doing [3]: instead of a cron task, they propose parsing the JSON. That's what the tiny attached script does:

  guix repl -L . -- weather-repro.scm

For example, I run:

  guix repl -L . -- weather-repro.scm | sort | grep ghc

to list (almost) all the unreproducible Haskell packages. What I would like is to be able to filter by build system, for example.

First, Chris, could you add the package name and version fields? It is hard to reconstruct them automatically by parsing the output path.

Second, the revision does not match the Guix commit. Is it possible to have a bridge? In other words, how is this revision hash computed? (A working revision is 6cf35799dec60723f37d83a559429aa8b90482d5, which I cannot find in the Guix repo.)

Third, this tiny script is better than nothing but *far, far* from perfect. The question about tooling is: does it make sense to include something like this directly in "guix weather"? For example, "guix weather --reproducible", or maybe under "guix challenge"? WDYT?

Feedback and ideas are very welcome. :-)

All the best,
simon

PS: Below are my question and Chris's answer. As Chris told me, both deserve to be public.
:-)

1:
2:
3:

-------------------- Start of forwarded message --------------------
From: zimoun
Subject: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:19:26 +0200

Hi,

Currently, it is hard to catch:

 1. which commit breaks which package
 2. whether the package builds reproducibly

Even though the Data Service helps *a lot!*, there are still many manual steps needed to spot one or the other. And I fully agree that the work initiated by Chris is The Right Thing©. However, it is not ready yet and manpower is limited.

For #1, Danny has started a discussion.

For #2, I am proposing to add a cron task on one build farm. To be concrete, let's *randomly* pick 100 packages once a week, rebuild them with "--check", and send the unreproducible packages by email. I even propose: 1st week, 100 packages of build system "foo"; 2nd week, 100 packages of build system "bar"; 3rd week…

It is far from perfect, but it seems a good heuristic to catch regressions, spot packages with reproducibility troubles, etc. Note that this should not happen, since the committer should catch reproducibility issues; but as a matter of fact, that is not the case. In a way, I am proposing a workaround.

I volunteer to be the recipient of these automatic emails. I can then do some triage (remove false positives, check what's going on, etc.) and open a bug report if there is an issue.

Currently, I do not have the CPU power to do so. So I am asking if it is possible to put something like that on one of the build machines. I totally understand an answer like: « Simon, you are enthusiastic and that's nice, but no, and go to hell! » :-)

Cheers,
simon
-------------------- End of forwarded message --------------------

-------------------- Start of forwarded message --------------------
From: Christopher Baines
Subject: Re: [guix-sysadmin] whishlist: Hook on the build-farm?
Date: Sun, 11 Oct 2020 17:38:03 +0100

> Currently, I do not have the CPU power to do so.
> So I am asking if it is possible to put something like that on one of the
> build machines. I totally understand an answer like: « Simon, you are
> enthusiastic and that's nice, but no, and go to hell! » :-)

I too would really like to be able to identify/prevent regressions, including with respect to build reproducibility. Although the work I'm doing on this is going slowly, I'm hoping that with the Guix Build Coordinator I'll now be able to get something sort of working.

I've just made a few tweaks to the Guix Data Service to make the data it has on this a little easier to use. This URL [1] should show you package reproducibility stats for each architecture, computed from substitute data from ci.guix.gnu.org, bayfront.guix.gnu.org, guix.tobias.gr and guix.cbaines.net. Currently there seem to be 1515 outputs (so not exactly packages, but close) that don't seem to have built reproducibly.

1: https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-reproducibility

Clicking through to the "Not matching" ones for x86_64-linux should give you this URL [2]. In case it's useful to have this data in a more machine-readable form, I've added a JSON output option.

2: https://data.guix-patches.cbaines.net/repository/2/branch/master/latest-processed-revision/package-derivation-outputs?output_consistency=not-matching&system=x86_64-linux

You mention triage, and that's probably the biggest blocker to being able to methodically try to reduce the "Not matching" numbers on [1]. As far as I know, Debian has things like [3] and [4] to help with that.

3: https://tests.reproducible-builds.org/debian/index_issues.html
4: https://salsa.debian.org/reproducible-builds/reproducible-notes/-/blob/master/issues.yml

Going back to the issue of a cron job to run guix build --check on some random packages and send emails: if you're looking for a list of packages (well, actually outputs) that don't build reproducibly, then [2] might do?
The JSON output doesn't contain the package names, but it probably could with only a little effort. If it did, you could download the JSON file for all the non-matching package outputs and record the package names in a sorted list in a Git repository. If you do that every day, then you could read the git log to spot potential patterns/regressions.

I don't think your email hit a mailing list. Feel free to send my reply to one, though, maybe guix-devel, as this discussion probably deserves a wide audience.

Thanks,

Chris
-------------------- End of forwarded message --------------------
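The request for package names and versions arises because, without them, they have to be guessed from the output path. Below is a minimal Python sketch of that guess. It assumes store paths of the form /gnu/store/<32-character hash>-<name>-<version>[-<output>], and that the version starts at the first hyphen followed by a digit. The function names are hypothetical and not part of Guix or the Data Service. The sketch also shows why the guess is fragile: an output suffix such as "-doc" cannot be separated from the version using the path alone. In the same way, filtering on a "ghc-" name prefix is only a proxy for "built with the Haskell build system".

```python
import re

# Loose match for /gnu/store/<32-character hash>-<rest>; the real hash
# alphabet is a subset of [0-9a-z], which is good enough for this sketch.
STORE_ITEM = re.compile(r"^/gnu/store/[0-9a-z]{32}-(?P<rest>.+)$")


def split_output_path(path):
    """Heuristically split a store output path into (name, version).

    Assumption: the version starts at the first hyphen followed by a digit.
    An output suffix such as "-doc" stays attached to the version, because
    it cannot be told apart from the version using the path alone.
    Returns (name, None) when no version part is found.
    """
    match = STORE_ITEM.match(path)
    if match is None:
        raise ValueError(f"not a store path: {path}")
    rest = match.group("rest")
    for i in range(len(rest) - 1):
        if rest[i] == "-" and rest[i + 1].isdigit():
            return rest[:i], rest[i + 1:]
    return rest, None


def names_with_prefix(paths, prefix):
    """Sorted, de-duplicated package names starting with PREFIX.

    A name prefix such as "ghc-" is only a proxy for the build system.
    """
    names = {split_output_path(path)[0] for path in paths}
    return sorted(name for name in names if name.startswith(prefix))
```

For example, names_with_prefix(paths, "ghc-") over the non-matching output paths gives roughly what "grep ghc" gives. It excludes the ghc compiler itself, whose name is just "ghc".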
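The weekly cron proposal ("100 random packages of build system foo the 1st week, bar the 2nd week…") could look roughly like the Python sketch below. The mapping from build system to package names is a hypothetical input; how to obtain it is left open. The sketch only relies on "guix build --check", which rebuilds items already present in the store and fails if the results are not bit-for-bit identical. A non-zero exit can also mean an ordinary build failure, so the returned list still needs the triage mentioned above before any bug report.

```python
import random
import subprocess


def weekly_sample(packages_by_build_system, week, size=100, seed=None):
    """Pick up to SIZE packages of one build system for week number WEEK.

    Build systems are taken in sorted order and rotated week by week.
    PACKAGES_BY_BUILD_SYSTEM (hypothetical input) maps a build-system
    name to a list of package names.
    """
    if not packages_by_build_system:
        raise ValueError("no build systems given")
    systems = sorted(packages_by_build_system)
    system = systems[week % len(systems)]
    candidates = sorted(packages_by_build_system[system])
    rng = random.Random(seed)
    return system, sorted(rng.sample(candidates, min(size, len(candidates))))


def check_command(package):
    """Command line that rebuilds PACKAGE and compares it with the store item."""
    return ["guix", "build", "--check", package]


def run_checks(packages):
    """Run the checks and return the packages whose check exited non-zero."""
    failures = []
    for package in packages:
        if subprocess.run(check_command(package)).returncode != 0:
            failures.append(package)
    return failures
```

Called once a week from cron, e.g. with week=datetime.date.today().isocalendar()[1], run_checks returns the candidate list to send by email.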
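Chris's suggestion of keeping a daily sorted list of non-matching package names in a Git repository relies on a one-name-per-line file, so that "git diff" and "git log -p" show additions and removals line by line. Below is a minimal Python sketch of the snapshot file. It also computes the same day-to-day comparison directly, as an alternative to reading the git log. It assumes the package names have already been extracted; at the time of these messages, the JSON did not contain them. The function and key names are hypothetical.

```python
def write_snapshot(path, names):
    """Write NAMES one per line, sorted and de-duplicated, for readable diffs."""
    with open(path, "w", encoding="utf-8") as out:
        for name in sorted(set(names)):
            out.write(name + "\n")


def read_snapshot(path):
    """Read back a snapshot written by write_snapshot, as a set of names."""
    with open(path, encoding="utf-8") as src:
        return {line.strip() for line in src if line.strip()}


def compare_snapshots(previous, current):
    """Names that appeared in, or disappeared from, the not-matching list."""
    previous, current = set(previous), set(current)
    return {
        # Candidate regressions: not matching today, fine (or absent) before.
        "newly_not_matching": sorted(current - previous),
        # Fixed, removed, or renamed packages: the list alone cannot tell.
        "no_longer_listed": sorted(previous - current),
    }
```

After write_snapshot, a daily "git add" and "git commit" of the file gives the history Chris describes. compare_snapshots only short-cuts the step of reading it.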