On 2022-06-12, Vagrant Cascadian wrote:
> I've been working on Reproducible Builds in guix a fair amount this
> month.

I did another round of this... I fixed a few packages recently, and
noticed some other people fixing packages too, yay!

As of this moment for x86_64, it looks like:

* ~83% matching (a.k.a. reproducible) for 18920 packages
* ~6% not matching (a.k.a. NOT reproducible) for 1337 packages
* ~11% unknown (e.g. not built on both build farms) for 2440 packages

https://data.guix.gnu.org/repository/1/branch/master/latest-processed-revision/package-reproducibility

Ignoring the pesky unknown packages (i.e. counting only the 18920 + 1337
= 20257 packages with a known result), it is more like ~93% reproducible
and ~7% unreproducible... that feels a bit better to me!

These numbers wander around over time, mostly due to packages moving
back into an "unknown" state while the build farms catch up with each
other... although the above numbers seem to have been pretty consistent
over the last few days.

> Some rough summaries about the types of issues:
>
> * ecl-* packages account for nearly half of the issues (~500 out of
>   ~1000 packages)

More like ~570 out of ~1300 this time. Apparently there is an upstream
issue for ecl, which is referenced in the summary.

> * ~850 packages categorized (ecl-* accounting for most of them)

~990 packages reviewed this time (many of them duplicates from the
previous run). There were slightly more to review this time, mostly
because some previously unknown packages are now known to be either
reproducible or not reproducible.

There are a handful of older versions of things (e.g. package@1.0 vs
package@2.0) that fail to build reproducibly; I didn't bother to look at
those, as I only checked the most recent version of each package. So
there are probably 300+ packages that could still be reviewed.
> * 19 packages embed kernel version
22 kernel version
> * 63 packages embed timestamps
92 timestamps
> * 52 packages embed dates (harder to reproduce than full timestamps)
46 dates
> * 5 timestamps in python .pyc files
7 .pyc timestamps
> * 12 timestamps in .jar files
12 .jar timestamps
> * 66 ordering issues
82 ordering
> * 3 ordering issues in .pyc files
3 .pyc ordering
> * 9 ordering in .jar files
10 .jar ordering
> * 16 ordering in guile .go files
13 guile .go ordering
> * ~160 largely unidentified and inscrutable issues
193 unidentified

> This does reveal that there are some opportunities for toolchain fixes,
> fixing multiple packages at a time (and future packages too!), such as
> ecl, sbcl, python, java, guile, clojure, texlive (see FORCE_SOURCE_DATE
> proposal
> https://lists.gnu.org/archive/html/guix-devel/2022-06/msg00171.html ).

Still true! I tried patching texlive directly and failed to come up with
something that worked, but haven't tried again recently.

> I haven't done extensive cross-referencing with other distros, but
> suspect there may be patches to fix some of these toolchain issues... If
> you're savvy with any of the above languages, help fixing toolchain
> issues would be amazing!

Did a little of this, but still more to do!

> If you're looking to get your hands dirty with some reproducibility
> fixes in guix, a fair number of the timestamp, date and kernel version
> fixes are likely fairly easy, but you generally have to manually verify
> that the date or kernel version aren't embedded, as "guix build
> --rounds=2" will likely run both builds with the same kernel version
> and date.

Still very true! Maybe I should arrange a little virtual hackfest or
something...

I should probably normalize these issues a bit more and simplify them,
but the full list I looked at should be attached.

Would it be ok to maintain this and some of the relevant tooling in a
branch in guix.git, say, "reproducibility-notes"? Or make a new
repository just for this?
It most likely wouldn't share history with the other branches (much
like the "keyring" branch), but presumably won't grow too large either.

live well,
  vagrant