* Investigating a reproducibility failure
@ 2022-02-01 14:05 Konrad Hinsen
  2022-02-01 14:30 ` Konrad Hinsen
  2022-02-05 14:05 ` Ludovic Courtès
  0 siblings, 2 replies; 18+ messages in thread

From: Konrad Hinsen @ 2022-02-01 14:05 UTC (permalink / raw)
To: Guix Devel

Hi everyone,

Two years ago, I published a supposedly reproducible computation,
explaining how to re-run it at any time using Guix (it's at
https://github.com/khinsen/rescience-ten-year-challenge-paper-3/).
Yesterday, I got an e-mail from someone who tried, and failed.  I tried
myself, and failed as well.  But I don't understand what's going on.

To see the failure, do

  guix time-machine \
     --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
     -- build openblas

The build log is attached; the first error is

  getarch_2nd.c: In function ‘main’:
  getarch_2nd.c:12:35: error: ‘SGEMM_DEFAULT_UNROLL_M’ undeclared (first use in this function); did you mean ‘XGEMM_DEFAULT_UNROLL_M’?
     printf("SGEMM_UNROLL_M=%d\n", SGEMM_DEFAULT_UNROLL_M);
                                   ^~~~~~~~~~~~~~~~~~~~~~
                                   XGEMM_DEFAULT_UNROLL_M

What makes this complicated is the DYNAMIC_ARCH feature of openblas that
Guix uses on x86 architectures.  I don't know the details of how this is
supposed to work and why it could fail.  In particular, I don't know
whether the source file getarch_2nd.c is supposed to be compiled at all
if all goes well.

I doubt we can do anything to fix the past, but I would like to
understand what exactly went wrong here so we can make sure we do better
in the future.

Cheers,
  Konrad.

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure
  2022-02-01 14:05 Investigating a reproducibility failure Konrad Hinsen
@ 2022-02-01 14:30 ` Konrad Hinsen
  2022-02-02 23:19   ` Ricardo Wurmus
  2022-02-05 14:05 ` Ludovic Courtès
  1 sibling, 1 reply; 18+ messages in thread

From: Konrad Hinsen @ 2022-02-01 14:30 UTC (permalink / raw)
To: Guix Devel

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

> To see the failure, do
>
>   guix time-machine \
>      --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>      -- build openblas
>
> The build log is attached, the first error is

Oops... Two mistakes!  First, I forgot the attachment, so here it comes.
Second, I didn't quote the right commit.  The failure happens with

  guix time-machine \
     --commit=87e7faa2ae641d8302efc8b90f1e45f43f67f6da \
     -- build openblas

Cheers,
  Konrad.

[-- Attachment #2: 6df92lhfz4vccgn5v2z0rc092bhz89-openblas-0.3.a.drv.bz2 --]
[-- Type: application/octet-stream, Size: 28540 bytes --]

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure
  2022-02-01 14:30 ` Konrad Hinsen
@ 2022-02-02 23:19 ` Ricardo Wurmus
  2022-02-02 23:36   ` Ricardo Wurmus
  0 siblings, 1 reply; 18+ messages in thread

From: Ricardo Wurmus @ 2022-02-02 23:19 UTC (permalink / raw)
To: Konrad Hinsen; +Cc: guix-devel

Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

> Konrad Hinsen <konrad.hinsen@fastmail.net> writes:
>
>> To see the failure, do
>>
>>   guix time-machine \
>>      --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>>      -- build openblas
>>
>> The build log is attached, the first error is
>
> Oops... Two mistakes!  First, I forgot the attachment, so here it comes.
> Second, I didn't quote the right commit.  The failure happens with
>
>   guix time-machine \
>      --commit=87e7faa2ae641d8302efc8b90f1e45f43f67f6da \
>      -- build openblas

It builds fine on this laptop.

--8<---------------cut here---------------start------------->8---
$ lscpu
Architecture:            x86_64
CPU op-mode(s):          32-bit, 64-bit
Address sizes:           39 bits physical, 48 bits virtual
Byte Order:              Little Endian
CPU(s):                  4
On-line CPU(s) list:     0-3
Vendor ID:               GenuineIntel
Model name:              Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
CPU family:              6
Model:                   78
Thread(s) per core:      2
Core(s) per socket:      2
Socket(s):               1
Stepping:                3
CPU max MHz:             3100.0000
CPU min MHz:             400.0000
BogoMIPS:                5199.98
Flags:                   fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx smap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp md_clear flush_l1d
Virtualization features:
  Virtualization:        VT-x
Caches (sum of all):
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    512 KiB (2 instances)
  L3:                    4 MiB (1 instance)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
--8<---------------cut here---------------end--------------->8---

CPU detection is a bottomless can of worms.

-- 
Ricardo

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-02 23:19 ` Ricardo Wurmus @ 2022-02-02 23:36 ` Ricardo Wurmus 0 siblings, 0 replies; 18+ messages in thread From: Ricardo Wurmus @ 2022-02-02 23:36 UTC (permalink / raw) To: Konrad Hinsen; +Cc: guix-devel Ricardo Wurmus <rekado@elephly.net> writes: > Konrad Hinsen <konrad.hinsen@fastmail.net> writes: > >> Konrad Hinsen <konrad.hinsen@fastmail.net> writes: >> >>> To see the failure, do >>> >>> guix time-machine \ >>> --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \ >>> -- build openblas >>> >>> The build log is attached, the first error is >> >> Oops... Two mistakes ! First, I forgot the attachment, so here it comes, >> Second, I didn't quote the right commit. The failure happens with >> >> guix time-machine \ >> --commit=87e7faa2ae641d8302efc8b90f1e45f43f67f6da \ >> -- build openblas > > It builds fine on this laptop. > > $ lscpu > Architecture: x86_64 > CPU op-mode(s): 32-bit, 64-bit > Address sizes: 39 bits physical, 48 bits virtual > Byte Order: Little Endian > CPU(s): 4 > On-line CPU(s) list: 0-3 > Vendor ID: GenuineIntel > Model name: Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz > CPU family: 6 > Model: 78 > Thread(s) per core: 2 > Core(s) per socket: 2 > Socket(s): 1 > Stepping: 3 > CPU max MHz: 3100.0000 > CPU min MHz: 400.0000 > BogoMIPS: 5199.98 > Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr ss > e sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nop > l xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg > fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdra > nd lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi > flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx sm > ap clflushopt intel_pt xsaveopt xsavec xgetbv1 
xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act > _window hwp_epp md_clear flush_l1d > Virtualization features: > Virtualization: VT-x > Caches (sum of all): > L1d: 64 KiB (2 instances) > L1i: 64 KiB (2 instances) > L2: 512 KiB (2 instances) > L3: 4 MiB (1 instance) > NUMA: > NUMA node(s): 1 > NUMA node0 CPU(s): 0-3 I also built this on a different machine, foreign distro. Here’s the output of lscpu: --8<---------------cut here---------------start------------->8--- [rwurmus@beast:~] (571) $ lscpu Architecture: x86_64 CPU op-mode(s): 32-bit, 64-bit Byte Order: Little Endian CPU(s): 120 On-line CPU(s) list: 0-119 Thread(s) per core: 2 Core(s) per socket: 15 Socket(s): 4 NUMA node(s): 4 Vendor ID: GenuineIntel CPU family: 6 Model: 62 Model name: Intel(R) Xeon(R) CPU E7-4870 v2 @ 2.30GHz Stepping: 7 CPU MHz: 2127.050 CPU max MHz: 2900.0000 CPU min MHz: 1200.0000 BogoMIPS: 4588.44 Virtualization: VT-x L1d cache: 32K L1i cache: 32K L2 cache: 256K L3 cache: 30720K NUMA node0 CPU(s): 0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116 NUMA node1 CPU(s): 1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109,113,117 NUMA node2 CPU(s): 2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110,114,118 NUMA node3 CPU(s): 3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111,115,119 Flags: fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d 
--8<---------------cut here---------------end--------------->8--- The output differs, but the build did not fail. -- Ricardo ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-01 14:05 Investigating a reproducibility failure Konrad Hinsen 2022-02-01 14:30 ` Konrad Hinsen @ 2022-02-05 14:05 ` Ludovic Courtès 2022-02-08 5:57 ` Konrad Hinsen 1 sibling, 1 reply; 18+ messages in thread From: Ludovic Courtès @ 2022-02-05 14:05 UTC (permalink / raw) To: Konrad Hinsen; +Cc: Guix Devel Hi! Konrad Hinsen <konrad.hinsen@fastmail.net> skribis: > To see the failure, do > > guix time-machine \ > --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \ > -- build openblas For the record, there’s still a substitute available for this one: --8<---------------cut here---------------start------------->8--- $ guix time-machine --commit=7357b3d7a52eb5db1674012c50d308d792741c48 -- weather openblas guile: warning: failed to install locale computing 1 package derivations for x86_64-linux... looking for 1 store items on https://ci.guix.gnu.org... https://ci.guix.gnu.org 100.0% substitutes available (1 out of 1) at least 24.5 MiB of nars (compressed) 78.3 MiB on disk (uncompressed) 0.003 seconds per request (0.0 seconds in total) 343.4 requests per second [ugly but unimportant backtrace omitted…] $ guix time-machine --commit=7357b3d7a52eb5db1674012c50d308d792741c48 -- build openblas guile: warning: failed to install locale /gnu/store/vax1vsg3ivf0r7j7n2xkbi1z3r0504l9-openblas-0.3.7 --8<---------------cut here---------------end--------------->8--- That doesn’t solve the fact that OpenBLAS compilation is not reproducible, as zimoun noted¹, and we need to fix it, but at least this colleague of yours should have been able to fetch substitutes, no? Thanks, Ludo’. ¹ https://issues.guix.gnu.org/51536 ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-05 14:05 ` Ludovic Courtès @ 2022-02-08 5:57 ` Konrad Hinsen 0 siblings, 0 replies; 18+ messages in thread From: Konrad Hinsen @ 2022-02-08 5:57 UTC (permalink / raw) To: Ludovic Courtès; +Cc: Guix Devel Hi Ludo, > Konrad Hinsen <konrad.hinsen@fastmail.net> skribis: > >> To see the failure, do >> >> guix time-machine \ >> --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \ >> -- build openblas > > For the record, there’s still a substitute available for this one: ... > That doesn’t solve the fact that OpenBLAS compilation is not > reproducible, as zimoun noted¹, and we need to fix it, but at least this > colleague of yours should have been able to fetch substitutes, no? Good point. If I try to use it now, it works, fetching the substitute. Back when I started investigating this, on the same machine, Guix tried to build locally. So I guess there was some ephemeral problem with accessing the substitute server. Cheers, Konrad ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure
@ 2022-02-02 20:35 zimoun
  2022-02-02 23:43 ` zimoun
  0 siblings, 1 reply; 18+ messages in thread

From: zimoun @ 2022-02-02 20:35 UTC (permalink / raw)
To: guix-devel

Hi Konrad,

I get the same error as you, and for more commits than just the one you
tested.  For instance, for these commits:

* substitutes and rebuilds

  923dcc3597 Fri Jan 14 12:59:33 2022 +0100  gnu: iverilog: Update to 11.0.
  79ca578182 Thu Nov 11 21:52:08 2021 -0500  gnu: fpc: Fix build.
  ab0cf06244 Thu Nov 11 13:35:51 2021 -0500  gnu: rust: Remove #:rust ,rust-1.52 arguments.
  bbd2864272 Sun Dec 27 15:50:08 2020 +0100  gnu: openblas: Update to 0.3.13.

* substitutes but failed rebuilds (--no-grafts --check)

  4b1538e6ef Thu Nov 11 12:18:37 2021 -0500  gnu: kexec-tools: Fix build on i686-linux.
  ade7638d84 Fri Sep 18 14:05:51 2020 +0200  Revert "gnu: openblas: Update to 0.3.10."
  c59e9f0a03 Fri Sep 18 08:57:48 2020 +0200  gnu: openblas: Update to 0.3.10
  5969598149 Sat Mar 7 12:48:18 2020 +0100   gnu: openblas: Use HTTPS home page.
  2ea095300a Tue Oct 8 21:23:06 2019 +0200   gnu: OpenBLAS: Update to 0.3.7.

* no substitute and failed builds

  a4384dc970 Tue Oct 8 21:23:06 2019 +0200   gnu: OpenBLAS: Incorporate grafted changes.
  ba05be2249 Fri Sep 13 10:50:11 2019 +0200  gnu: openblas: Set 'NUM_THREADS'.
  5855756c81 Thu Feb 21 22:04:48 2019 -0600  gnu: openblas: Honor parallel-job-count.
  602a5ef9f3 Sun Feb 10 21:04:23 2019 +0100  gnu: OpenBLAS: Update to 0.3.5.

Last, note that the time-machine fails earlier for these commits:

  d26584fcda Thu Nov 11 12:18:37 2021 -0500  gnu: binutils-gold: Inherit from binutils-next.
  ac6f677249 Thu Nov 11 12:18:37 2021 -0500  gnu: Add binutils-next.
  661b25a2ed Thu Nov 11 12:18:36 2021 -0500  gnu: openblas: Do not build static library.
  9e497f44ba Thu Nov 11 12:18:36 2021 -0500  gnu: openblas: Add support for older x86 processors.
  bd771edd6c Thu Nov 11 12:18:31 2021 -0500  gnu: openblas: Update to 0.3.18
  e364758d44 Fri Sep 18 22:26:33 2020 +0200  gnu: openblas: Update to 0.3.10
  df5a2e4f83 Thu Mar 5 23:36:05 2020 +0100   gnu: OpenBLAS: Update to 0.3.9
  087c94019d Sat Feb 15 22:02:56 2020 +0100  gnu: OpenBLAS: Update to 0.3.8.
  e77412362f Sat May 4 16:25:53 2019 +0200   gnu: OpenBLAS: Update to 0.3.6.

which reduces the range for testing.

Bug #51536 [1] discusses the reproducibility of openblas and the
compilation flags.

1: <https://issues.guix.gnu.org/51536>

Cheers,
simon

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-02 20:35 zimoun @ 2022-02-02 23:43 ` zimoun 2022-02-03 9:16 ` Konrad Hinsen 0 siblings, 1 reply; 18+ messages in thread From: zimoun @ 2022-02-02 23:43 UTC (permalink / raw) To: Guix Devel, Konrad Hinsen Hi Konrad, What is the output of 'lscpu'? For instance, on machine A running on Intel(R) Xeon(R) Gold 5218 CPU @ 2.30GHz, OpenBLAS for commit 87e7faa2ae641d8302efc8b90f1e45f43f67f6da builds. On machine B running Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz, OpenBLAS for the same commit fails. Cheers, simon ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-02 23:43 ` zimoun @ 2022-02-03 9:16 ` Konrad Hinsen 2022-02-03 11:41 ` Ricardo Wurmus ` (2 more replies) 0 siblings, 3 replies; 18+ messages in thread From: Konrad Hinsen @ 2022-02-03 9:16 UTC (permalink / raw) To: zimoun, Guix Devel, Ricardo Wurmus Hi Ricardo and Simon, Thanks for your insight! I didn't even know about lscpu. The output for my laptop is shown below. I tried building on a virtual machine, and that works fine. > CPU detection is a bottomless can of worms. That sounds very credible. But what can we do about this? There is obviously a trade-off between reproducibility and performance here. Can we support both, in a way that users can understand and manage? The OpenBlas package in Guix is (or at least was, back then) written for performance. Can I, as a user, ask for a reproducible version? That could either be a generic version for any x86 architecture (preferably), or one that always builds for a given sub-architecture and then fails at runtime if the CPU doesn't match. Next: can I, as a user of dependent code, ask for reproducible versions of all my dependencies? In my case, I was packaging Python code that calls OpenBlas via NumPy. Many people in that situation don't even know what OpenBlas is. I did know, but wasn't aware of the build-time CPU detection. There is of course the issue that we can never be sure if a build will be reproducible in the future. But we can at least take care of the cases where the packager is aware of non-reproducibility issues, and make them transparent and manageable. 
Cheers,
  Konrad

--8<---------------cut here---------------start------------->8---
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           140
Model name:                      11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Stepping:                        1
CPU MHz:                         1800.000
CPU max MHz:                     4800,0000
CPU min MHz:                     400,0000
BogoMIPS:                        3609.60
Virtualization:                  VT-x
L1d cache:                       192 KiB
L1i cache:                       128 KiB
L2 cache:                        5 MiB
L3 cache:                        12 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB filling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc cpuid aperfmperf tsc_known_freq pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb stibp ibrs_enhanced tpr_shadow vnmi flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid rdt_a avx512f avx512dq rdseed adx smap avx512ifma clflushopt clwb intel_pt avx512cd sha_ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves split_lock_detect dtherm ida arat pln pts hwp hwp_notify hwp_act_window hwp_epp hwp_pkg_req avx512vbmi umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpid movdiri movdir64b fsrm avx512_vp2intersect md_clear flush_l1d arch_capabilities
--8<---------------cut here---------------end--------------->8---

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-03 9:16 ` Konrad Hinsen @ 2022-02-03 11:41 ` Ricardo Wurmus 2022-02-03 17:05 ` Konrad Hinsen 2022-02-03 12:07 ` zimoun 2022-02-05 14:12 ` Ludovic Courtès 2 siblings, 1 reply; 18+ messages in thread From: Ricardo Wurmus @ 2022-02-03 11:41 UTC (permalink / raw) To: Konrad Hinsen; +Cc: Guix Devel, zimoun Hi Konrad, >> CPU detection is a bottomless can of worms. > > That sounds very credible. But what can we do about this? > > There is obviously a trade-off between reproducibility and performance > here. Can we support both, in a way that users can understand and manage? So far our default approach has been to use the lowest common set of CPU instructions, which generally leads to poorly performing code. Some packages are smarter and provide different code paths for different CPUs. The resulting binary is built the same, but at runtime different parts of the code run dependent on the features the CPU reports. The case of OpenBLAS is an anomaly in that this mechanism seems to produce different binaries dependent on where it is built. When I first encountered this problem I guessed that perhaps it can only build these different code paths up to the feature set of the CPU on the build machine, so if you’re building with an older CPU your binary will lack components that would be used on newer CPUs. This is just a guess, though. Your problem is that the OpenBLAS build system doesn’t recognize your modern CPU. Ideally, it wouldn’t need to know anything about the build-time CPU to build all the different code paths for different CPU features. The only way around this — retroactively — is to pretend to have an older CPU, e.g. by using qemu. In the long term it would be great if we could patch OpenBLAS to not attempt to detect CPU features at build time. 
I’m not sure this will work if it does indeed use the currently
available CPU features to determine “how far up” to build modules in
support of certain CPU features / instruction sets.

> There is of course the issue that we can never be sure if a build will
> be reproducible in the future.  But we can at least take care of the
> cases where the packager is aware of non-reproducibility issues, and
> make them transparent and manageable.

The new “--tune” feature is supposed to take care of cases like this.
We would still patch the code so that by default you’d get a package
that is reproducible (= you get the same exact binary no matter when or
where you build it) but that may not have optimal performance.  With
“--tune” you could opt to replace that generic build with one that uses
features of your current CPU, using grafts to swap the generic library
for the more performant library.

-- 
Ricardo

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-03 11:41 ` Ricardo Wurmus @ 2022-02-03 17:05 ` Konrad Hinsen 0 siblings, 0 replies; 18+ messages in thread From: Konrad Hinsen @ 2022-02-03 17:05 UTC (permalink / raw) To: Ricardo Wurmus; +Cc: Guix Devel, zimoun Hi Ricardo and Simon, Ricardo Wurmus <rekado@elephly.net> writes: > The case of OpenBLAS is an anomaly in that this mechanism seems to > produce different binaries dependent on where it is built. When I first Thanks a lot for those explanations, I hadn't realized how peculiar OpenBLAS is! > Your problem is that the OpenBLAS build system doesn’t recognize your > modern CPU. Ideally, it wouldn’t need to know anything about the > build-time CPU to build all the different code paths for different CPU > features. The only way around this — retroactively — is to pretend to > have an older CPU, e.g. by using qemu. So all we need is a "QEMU build system" in Guix, just for OpenBLAS ;-) > The new “--tune” feature is supposed to take care of cases like this. Right, I remember Ludo's blog post about this. zimoun <zimon.toutoune@gmail.com> writes: > Somehow, “recent” processors cannot build old versions. That's a whole new level of planned obsolescence! Cheers, Konrad. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure
  2022-02-03  9:16 ` Konrad Hinsen
  2022-02-03 11:41 ` Ricardo Wurmus
@ 2022-02-03 12:07 ` zimoun
  2022-02-05 14:12 ` Ludovic Courtès
  2 siblings, 0 replies; 18+ messages in thread

From: zimoun @ 2022-02-03 12:07 UTC (permalink / raw)
To: Konrad Hinsen, Guix Devel, Ricardo Wurmus

Hi Konrad,

On Thu, 03 Feb 2022 at 10:16, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:

>> CPU detection is a bottomless can of worms.
>
> That sounds very credible. But what can we do about this?

Well, I do not know what could be done about this.  Today, the picture
for the OpenBLAS@0.3.6 build looks like:

* Fail
  i7-1185G7E (Tiger Lake)
  i7-10700K  (Comet Lake)

* Build
  i7-6500U   (Skylake)
  E7-4870V2  (Ivy Bridge)
  5218       (Cascade Lake)

Somehow, “recent” processors cannot build old versions.

> There is obviously a trade-off between reproducibility and performance
> here. Can we support both, in a way that users can understand and manage?

Usually, we can have both [1].  However, it is not clear to me why
OpenBLAS v0.3.6 does not build on some “recent” processors, even in a
poor-performance mode with code that is as generic as possible.

1: <https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/>

> The OpenBlas package in Guix is (or at least was, back then) written for
> performance. Can I, as a user, ask for a reproducible version? That
> could either be a generic version for any x86 architecture (preferably),
> or one that always builds for a given sub-architecture and then fails at
> runtime if the CPU doesn't match.
>
> Next: can I, as a user of dependent code, ask for reproducible versions
> of all my dependencies? In my case, I was packaging Python code that
> calls OpenBlas via NumPy. Many people in that situation don't even know
> what OpenBlas is. I did know, but wasn't aware of the build-time CPU
> detection.
>
> There is of course the issue that we can never be sure if a build will
> be reproducible in the future.
> But we can at least take care of the cases where the packager is aware
> of non-reproducibility issues, and make them transparent and
> manageable.

The answer to your concerns is the --tune transformation, I guess.
This transformation provides micro-optimizations for high performance
while preserving provenance tracking.

Here the issue seems different.  OpenBLAS v0.3.6 seems to fail to fall
back to a generic target when it does not recognize the processor,
probably because the microarchitecture did not exist or was not
supported at the time.  (Note that ’Comet Lake’ is not in the list
%gcc-10-x86_64-micro-architectures, so --tune would probably be
ineffective; I do not know.)

Cheers,
simon

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-03 9:16 ` Konrad Hinsen 2022-02-03 11:41 ` Ricardo Wurmus 2022-02-03 12:07 ` zimoun @ 2022-02-05 14:12 ` Ludovic Courtès 2022-02-15 14:10 ` Bengt Richter 2 siblings, 1 reply; 18+ messages in thread From: Ludovic Courtès @ 2022-02-05 14:12 UTC (permalink / raw) To: Konrad Hinsen; +Cc: Guix Devel, zimoun Konrad Hinsen <konrad.hinsen@fastmail.net> skribis: > There is obviously a trade-off between reproducibility and performance > here. I tried hard to dispel that belief: you do not have to trade one for the other. Yes, in some cases scientific software might lack the engineering work that allows for portable performance; but in those cases, there’s ‘--tune’. https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/ We should keep repeating that message: reproducibility and performance are not antithetic. And I really mean it, otherwise fellow HPC practitioners will keep producing unverifiable results on the grounds that they cannot possibly compromise on performance! Thanks, Ludo’. ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure
  2022-02-05 14:12 ` Ludovic Courtès
@ 2022-02-15 14:10 ` Bengt Richter
  2022-02-16 12:03 ` zimoun
  0 siblings, 1 reply; 18+ messages in thread

From: Bengt Richter @ 2022-02-15 14:10 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Guix Devel, zimoun

Hi,

On +2022-02-05 15:12:28 +0100, Ludovic Courtès wrote:
> Konrad Hinsen <konrad.hinsen@fastmail.net> skribis:
>
> > There is obviously a trade-off between reproducibility and performance
> > here.
>

I suspect what you really want to reproduce is not verbatim code, but
the abstract computation that it implements, typically a digitally
simulated experiment?

Thus far, "show me the code" is the usual way to ask someone what they
did, and guix makes it possible to answer in great detail.

But what is really relevant if you are helping a colleague reproduce
e.g. a monte-carlo simulation experiment computing pi by throwing random
darts at a square, to draw a graph showing convergence of
statistically-computed pi on the y-axis vs number of darts thrown on the
x-axis?  (IIRC pi/4 should be hits within inscribed circle / hits in
1x1 square.)

Well, ISTM you can reproduce this experiment in any language and method
that does the abstract job.

The details of Fortran version or Julia/Clang or guile pedigree only
really come into play for forensics looking for where the abstract was
implemented differently.  E.g., if results were different, were the x
and y random numbers displacing the darts within the square really
uniform and independent, and seeded with constants to ensure bit-for-bit
equivalent computations?

How fast the computations happened is not relevant, though of course
nice for getting work done :)

> I tried hard to dispel that belief: you do not have to trade one for the other.
>
> Yes, in some cases scientific software might lack the engineering work
> that allows for portable performance; but in those cases, there’s
> ‘--tune’.
> > https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/ > > We should keep repeating that message: reproducibility and performance > are not antithetic. And I really mean it, otherwise fellow HPC > practitioners will keep producing unverifiable results on the grounds > that they cannot possibly compromise on performance! > Maybe the above pi computation could be a start on some kind of abstract model validation test? It's simple, but it pulls on a lot of simulation tool chains. WDYT? > Thanks, > Ludo’. > -- Regards, Bengt Richter ^ permalink raw reply [flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure 2022-02-15 14:10 ` Bengt Richter @ 2022-02-16 12:03 ` zimoun 2022-02-16 13:04 ` Konrad Hinsen 0 siblings, 1 reply; 18+ messages in thread From: zimoun @ 2022-02-16 12:03 UTC (permalink / raw) To: Bengt Richter, Ludovic Courtès; +Cc: Guix Devel Hi, On Tue, 15 Feb 2022 at 15:10, Bengt Richter <bokr@bokr.com> wrote: > I suspect what you really want to reproduce is not verbatim > code, but the abstract computation that it implements, > typically a digitally simulated experiment? [...] > Maybe the above pi computation could be a start on some kind > of abstract model validation test? It's simple, but it pulls > on a lot of simulation tool chains. WDYT? Well, it depends on the community which term they pick for which concept: - same team, same experimental setup - different team, same experimental setup - different team, different experimental setup and the terms are repeat, replicate, reproduce. For details, see [1]. Since Konrad is editor for the ReScience journal, I guess ’reproduce’ means [2]: Reproduction of a computational study means running the same computation on the same input data, and then checking if the results are the same, or at least “close enough” when it comes to numerical approximations. Reproduction can be considered as software testing at the level of a complete study. Where my understanding of your “abstract computation” looks more as [2]: Replication of a scientific study (computational or other) means repeating a published protocol, respecting its spirit and intentions but varying the technical details. For computational work, this would mean using different software, running a simulation from different initial conditions, etc. The idea is to change something that everyone believes shouldn’t matter, and see if the scientific conclusions are affected or not. Therefore, again from my understanding, you are somehow proposing what science should be. 
:-) It is what the GuixHPC initiative [3] is trying to tackle.

Transparency and full control of the variability––the roots of the
scientific method––allow us to achieve, with more or less success,
’reproduction’.  Here and today, Guix plays a central role for
reproducing because Guix does not cheat with transparency and full
control of variability.

Note that some people are calling for bit-to-bit scientific
reproduction.  I am not, because the meaning of “same” or “equal”
depends on the scientific field.  However, it is up to any scientific
debate or controversy to draw the line for “same” and argue whether the
conclusions hold.  Again, transparency and full control of the
variability are fundamental here.  How to argue if they are not
satisfied?

Then, and out of Guix's scope, if the reproduced result matters enough,
people can try to replicate it, for confirmation, for performance
improvements, or as a step towards other results.  This replication can
use Guix to control the variability and also to help the reproduction of
the replication; but Guix does not take a central role here.

Last, it is in this second and later steps that the “abstract model”
could play a role, and it is out of Guix's scope, IMHO.

1: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5778115/>
2: <http://rescience.github.io/faq/>
3: <https://hpc.guix.info/>

Cheers,
simon

^ permalink raw reply	[flat|nested] 18+ messages in thread
* Re: Investigating a reproducibility failure
  2022-02-16 12:03 ` zimoun
@ 2022-02-16 13:04 ` Konrad Hinsen
  2022-02-17 11:21   ` zimoun
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-16 13:04 UTC (permalink / raw)
To: zimoun, Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi Bengt and Simon,

zimoun <zimon.toutoune@gmail.com> writes:

> Note that some people are calling for bit-for-bit scientific
> reproduction. I am not. Because the meaning of “same” or “equal”

I am. Not as a goal in itself, because in the larger scientific context it's robust replicability that matters, not bit-for-bit re-execution. And yet, the latter matters for two reasons:

 - It's verifiable automatically, making it cheap and fast to check. No
   need to bother an expert for a qualified opinion.

 - If you hit a case of non-replicability (scientifically relevant
   differences between two computations that everybody expects to yield
   equivalent results), it is nearly impossible to investigate unless
   the individual computations are bit-for-bit reproducible.

Making scientific computations bit-for-bit reproducible is the moral equivalent of keeping a detailed lab notebook: doing your best to tell others exactly what you did.

> conclusions hold. Again, transparency and full control of the
> variability are fundamental here. How to argue if they are not
> satisfied?

Exactly, that's very similar to my second point. Or, in Bengt's formulation:

> The details of Fortran version or Julia/Clang or guile
> pedigree only really come into play for forensics looking
> for where the abstract was implemented differently.

When the forensics are called in, then...

> Thus far, "show me the code" is the usual way to ask someone
> what they did, and guix makes it possible to answer in great
> detail.

... "show me the code" is not sufficient. You must also be sure that the code you look at is really the code that was run. And that's the role of bit-for-bit reproducibility.

Cheers,
Konrad.
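Konrad's first point, automatic verifiability, is what makes bit-for-bit checking cheap in practice: comparing a re-run against a published result reduces to a checksum check, with no expert judgement needed. A minimal sketch with made-up file contents (`results.dat` is a hypothetical output file):

```shell
# Original run: publish the output together with its checksum.
printf 'energy = -42.17\n' > results.dat
sha256sum results.dat > results.dat.sha256

# Re-run by someone else, possibly years later, producing a new results.dat.
printf 'energy = -42.17\n' > results.dat

# Verification is a one-liner that fails loudly on any single-bit
# difference; no qualified opinion required.
sha256sum -c results.dat.sha256
```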
* Re: Investigating a reproducibility failure
  2022-02-16 13:04 ` Konrad Hinsen
@ 2022-02-17 11:21 ` zimoun
  2022-02-17 16:55   ` Konrad Hinsen
  0 siblings, 1 reply; 18+ messages in thread
From: zimoun @ 2022-02-17 11:21 UTC (permalink / raw)
To: Konrad Hinsen, Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi Konrad,

We agree on the main points within the scope of Guix. :-) We probably disagree on some specific points about epistemology or epistemic justification; I am not sure I understand these terms well enough to use them here. :-) We are far from OpenBLAS. :-)

On Wed, 16 Feb 2022 at 14:04, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:

> Making scientific computations bit-for-bit reproducible is the moral
> equivalent of keeping a detailed lab notebook: doing your best to tell
> others exactly what you did.

A detailed lab notebook implies transparency and full control of variability, not bit-for-bit reproducibility. Suppose my detailed lab notebook tracks an experiment testing gravity with a pendulum. However detailed and ideal (moral?) the notebook may be, i.e., even if it provides the capacity to build and re-build two identical test benches, two experiments would not produce bit-for-bit identical numbers in a table measuring the oscillations. They would depend, for instance, on the two locations, on the touch of the experimenter, etc.

In many fields, experimental reproduction depends on the variability of the inputs or of the instruments, and therefore each scientific community somehow defines, field by field, what “same” means, based on the variability common in that field. For one, I do not see why it would be different for the computational processing part of the experiment. And for two, asking for bit-for-bit reproducibility of one part of the experiment is asking far more than of the other, non-computational parts of the same experiment.

Because I use computers daily and am deeply interested in what a computation means, I do, for sure, advocate bit-for-bit reproducibility.
But then I discuss with my biologist or MD colleagues and realize my views are somehow biased, i.e., I am trying to apply my own criteria defining “same” from my “field” to their “field”, where the same “same” must be applied to the whole chain, computational processing included. Or at least they have to define what is acceptable for each part. Do not take me wrong: the computational part must be transparent and its variability must be controlled, but neither strictly more nor less than the other parts.

> When the forensics are called in, then...
>
>> Thus far, "show me the code" is the usual way to ask someone
>> what they did, and guix makes it possible to answer in great
>> detail.
>
> ... "show me the code" is not sufficient. You must also be sure that the
> code you look at is really the code that was run.

I agree. It is “show me ALL the code”, and e.g. “guix graph python-scipy” shows that it is a long read. :-) Therefore, being able to build, run, re-build and re-run is a weak requirement for establishing trust.

> And that's the role of
> bit-for-bit reproducibility.

From my understanding, the validation of a reproduction depends on trust: what is the confidence in this or that? Well, bit-for-bit reproducibility is one criterion for establishing such trust. However, IMHO, this criterion is not the only one, and failing it can be compensated for by other criteria used by many experimental sciences. Bah, for what my opinion is worth on this topic. :-)

In any case, thanks Konrad for the materials you provide about this topic. For the interested French reader: :-)

 - https://www.societe-informatique-de-france.fr/wp-content/uploads/2021/11/1024_18_2021_11.html
 - https://webcast.in2p3.fr/video/les-enjeux-et-defis-de-la-recherche-reproductible
 - https://www.fun-mooc.fr/en/courses/reproducible-research-methodological-principles-transparent-scie/

Cheers,
simon
* Re: Investigating a reproducibility failure
  2022-02-17 11:21 ` zimoun
@ 2022-02-17 16:55 ` Konrad Hinsen
  0 siblings, 0 replies; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-17 16:55 UTC (permalink / raw)
To: zimoun, Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi Simon,

> We are far from OpenBLAS. :-)

That's fine with me. The more distance between me and OpenBLAS, the happier I am ;-)

> On Wed, 16 Feb 2022 at 14:04, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:
>
>> Making scientific computations bit-for-bit reproducible is the moral
>> equivalent of keeping a detailed lab notebook: doing your best to tell
>> others exactly what you did.
>
> A detailed lab notebook implies transparency and full control of
> variability, not bit-for-bit reproducibility.

That's why I said "moral" equivalent. Computations are different from experiments. Typical mistakes are different, and technical possibilities are different.

1. You can't have the equivalent of bit-for-bit reproducibility with experiments. You can with computers, and with good tool support (Guix!) it can become a routine task that takes little effort. So... why *not* do it?

2. A computation involves many more details than any typical experiment. Just writing down what you did is *not* enough for documenting a computation, as experience has shown. So you need more than the lab notebook. If your computation is bit-for-bit reproducible, you know that you have documented every last detail. Conversely, if you cannot reproduce down to the bit level, you know that *something* is out of your control.

In the end, my argument is more pragmatic than philosophical. If bit-for-bit reproducibility is (1) useful for resolving issues in the future, and (2) cheap to get with good tool support, then we should go for it. The main reason why people argue against it is the lack of tool support in their work environments.
They conclude that it's a difficult goal to achieve, and then start to reason that it's not strictly necessary for the scientific method. Which is true. But... it's still very useful.

>> And that's the role of bit-for-bit reproducibility.
>
> From my understanding, the validation of a reproduction depends on
> trust: what is the confidence in this or that? Well, bit-for-bit
> reproducibility is one criterion for establishing such trust. However,
> IMHO, this criterion is not the only one, and failing it can be
> compensated for by other criteria used by many experimental sciences.

Definitely. But in many cases, bit-for-bit reproducibility is the cheapest way to build trust, given good tool support. In other cases, e.g. HPC or exotic hardware, it's expensive, and then you look for something else.

Cheers,
Konrad.
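The "cheap to get with good tool support" claim is concrete in Guix itself: verifying a package's bit-for-bit reproducibility is a routine, built-in operation. A sketch using real Guix options (`openblas` is simply the example package from this thread; the commands require a Guix installation and can take a while to build):

```shell
# Build the package twice in a row and report an error unless the two
# results are bit-for-bit identical.
guix build --rounds=2 openblas

# Alternatively, rebuild something already in the store and compare
# against the existing result.
guix build --check openblas

# Or compare the local build against substitutes published by
# independent build farms.
guix challenge openblas
```

None of these require expert judgement: a non-reproducible build is flagged automatically, which is exactly the "cheap and fast to check" property argued for above.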
end of thread, other threads: [~2022-02-17 16:56 UTC | newest]

Thread overview: 18+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-01 14:05 Investigating a reproducibility failure Konrad Hinsen
2022-02-01 14:30 ` Konrad Hinsen
2022-02-02 23:19 ` Ricardo Wurmus
2022-02-02 23:36 ` Ricardo Wurmus
2022-02-05 14:05 ` Ludovic Courtès
2022-02-08  5:57 ` Konrad Hinsen
-- strict thread matches above, loose matches on Subject: below --
2022-02-02 20:35 zimoun
2022-02-02 23:43 ` zimoun
2022-02-03  9:16 ` Konrad Hinsen
2022-02-03 11:41 ` Ricardo Wurmus
2022-02-03 17:05 ` Konrad Hinsen
2022-02-03 12:07 ` zimoun
2022-02-05 14:12 ` Ludovic Courtès
2022-02-15 14:10 ` Bengt Richter
2022-02-16 12:03 ` zimoun
2022-02-16 13:04 ` Konrad Hinsen
2022-02-17 11:21 ` zimoun
2022-02-17 16:55 ` Konrad Hinsen