Re: Investigating a reproducibility failure

unofficial mirror of guix-devel@gnu.org 

* Re: Investigating a reproducibility failure
@ 2022-02-02 20:35 zimoun
  2022-02-02 23:43 ` zimoun
  0 siblings, 1 reply; 18+ messages in thread
From: zimoun @ 2022-02-02 20:35 UTC (permalink / raw)
  To: guix-devel

Hi Konrad,

I get the same error as you, and for more commits than the single one
you tested.  For instance, for these commits,

* substitutes and rebuilds

923dcc3597 Fri Jan 14 12:59:33 2022 +0100 gnu: iverilog: Update to 11.0.
79ca578182 Thu Nov 11 21:52:08 2021 -0500 gnu: fpc: Fix build.
ab0cf06244 Thu Nov 11 13:35:51 2021 -0500 gnu: rust: Remove #:rust ,rust-1.52 arguments.
bbd2864272 Sun Dec 27 15:50:08 2020 +0100 gnu: openblas: Update to 0.3.13.


* substitutes but failed rebuilds (--no-grafts --check)

4b1538e6ef Thu Nov 11 12:18:37 2021 -0500 gnu: kexec-tools: Fix build on i686-linux.
ade7638d84 Fri Sep 18 14:05:51 2020 +0200 Revert "gnu: openblas: Update to 0.3.10."
c59e9f0a03 Fri Sep 18 08:57:48 2020 +0200 gnu: openblas: Update to 0.3.10
5969598149 Sat Mar 7 12:48:18 2020 +0100 gnu: openblas: Use HTTPS home page.
2ea095300a Tue Oct 8 21:23:06 2019 +0200 gnu: OpenBLAS: Update to 0.3.7.


* no substitute and failed builds

a4384dc970 Tue Oct 8 21:23:06 2019 +0200 gnu: OpenBLAS: Incorporate grafted changes.
ba05be2249 Fri Sep 13 10:50:11 2019 +0200 gnu: openblas: Set 'NUM_THREADS'.
5855756c81 Thu Feb 21 22:04:48 2019 -0600 gnu: openblas: Honor parallel-job-count.
602a5ef9f3 Sun Feb 10 21:04:23 2019 +0100 gnu: OpenBLAS: Update to 0.3.5.



Last, note that the time-machine is failing earlier for these commits:

d26584fcda Thu Nov 11 12:18:37 2021 -0500 gnu: binutils-gold: Inherit from binutils-next.
ac6f677249 Thu Nov 11 12:18:37 2021 -0500 gnu: Add binutils-next.
661b25a2ed Thu Nov 11 12:18:36 2021 -0500 gnu: openblas: Do not build static library.
9e497f44ba Thu Nov 11 12:18:36 2021 -0500 gnu: openblas: Add support for older x86 processors.
bd771edd6c Thu Nov 11 12:18:31 2021 -0500 gnu: openblas: Update to 0.3.18
e364758d44 Fri Sep 18 22:26:33 2020 +0200 gnu: openblas: Update to 0.3.10
df5a2e4f83 Thu Mar 5 23:36:05 2020 +0100 gnu: OpenBLAS: Update to 0.3.9
087c94019d Sat Feb 15 22:02:56 2020 +0100 gnu: OpenBLAS: Update to 0.3.8.
e77412362f Sat May 4 16:25:53 2019 +0200 gnu: OpenBLAS: Update to 0.3.6.

which reduces the range for testing.


Bug #51536 [1] discusses the reproducibility of openblas and its
compilation flags.


1: <https://issues.guix.gnu.org/51536>


Cheers,
simon



* Re: Investigating a reproducibility failure
  2022-02-02 20:35 Investigating a reproducibility failure zimoun
@ 2022-02-02 23:43 ` zimoun
  2022-02-03  9:16   ` Konrad Hinsen
  0 siblings, 1 reply; 18+ messages in thread
From: zimoun @ 2022-02-02 23:43 UTC (permalink / raw)
  To: Guix Devel, Konrad Hinsen

Hi Konrad,

What is the output of 'lscpu'?

For instance, on machine A running on Intel(R) Xeon(R) Gold 5218 CPU @
2.30GHz, OpenBLAS for commit 87e7faa2ae641d8302efc8b90f1e45f43f67f6da
builds.
On machine B running Intel(R) Core(TM) i7-10700K CPU @ 3.80GHz,
OpenBLAS for the same commit fails.

Cheers,
simon


* Re: Investigating a reproducibility failure
  2022-02-02 23:43 ` zimoun
@ 2022-02-03  9:16   ` Konrad Hinsen
  2022-02-03 11:41     ` Ricardo Wurmus
                       ` (2 more replies)
  0 siblings, 3 replies; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-03  9:16 UTC (permalink / raw)
  To: zimoun, Guix Devel, Ricardo Wurmus

Hi Ricardo and Simon,

Thanks for your insight! I didn't even know about lscpu. The output for
my laptop is shown below. I tried building on a virtual machine, and
that works fine.

> CPU detection is a bottomless can of worms.

That sounds very credible. But what can we do about this?

There is obviously a trade-off between reproducibility and performance
here. Can we support both, in a way that users can understand and manage?

The OpenBlas package in Guix is (or at least was, back then) written for
performance. Can I, as a user, ask for a reproducible version? That
could either be a generic version for any x86 architecture (preferably),
or one that always builds for a given sub-architecture and then fails at
runtime if the CPU doesn't match.

Next: can I, as a user of dependent code, ask for reproducible versions
of all my dependencies? In my case, I was packaging Python code that
calls OpenBlas via NumPy. Many people in that situation don't even know
what OpenBlas is. I did know, but wasn't aware of the build-time CPU
detection.

There is of course the issue that we can never be sure if a build will
be reproducible in the future. But we can at least take care of the
cases where the packager is aware of non-reproducibility issues, and
make them transparent and manageable.

Cheers,
  Konrad

--8<---------------cut here---------------start------------->8---
$ lscpu
Architecture:                    x86_64
CPU op-mode(s):                  32-bit, 64-bit
Byte Order:                      Little Endian
Address sizes:                   39 bits physical, 48 bits virtual
CPU(s):                          8
On-line CPU(s) list:             0-7
Thread(s) per core:              2
Core(s) per socket:              4
Socket(s):                       1
NUMA node(s):                    1
Vendor ID:                       GenuineIntel
CPU family:                      6
Model:                           140
Model name:                      11th Gen Intel(R) Core(TM) i7-1185G7 @ 3.00GHz
Stepping:                        1
CPU MHz:                         1800.000
CPU max MHz:                     4800,0000
CPU min MHz:                     400,0000
BogoMIPS:                        3609.60
Virtualization:                  VT-x
L1d cache:                       192 KiB
L1i cache:                       128 KiB
L2 cache:                        5 MiB
L3 cache:                        12 MiB
NUMA node0 CPU(s):               0-7
Vulnerability Itlb multihit:     Not affected
Vulnerability L1tf:              Not affected
Vulnerability Mds:               Not affected
Vulnerability Meltdown:          Not affected
Vulnerability Spec store bypass: Mitigation; Speculative Store Bypass disabled via p
                                 rctl and seccomp
Vulnerability Spectre v1:        Mitigation; usercopy/swapgs barriers and __user poi
                                 nter sanitization
Vulnerability Spectre v2:        Mitigation; Enhanced IBRS, IBPB conditional, RSB fi
                                 lling
Vulnerability Srbds:             Not affected
Vulnerability Tsx async abort:   Not affected
Flags:                           fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pg
                                 e mca cmov pat pse36 clflush dts acpi mmx fxsr sse 
                                 sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm cons
                                 tant_tsc art arch_perfmon pebs bts rep_good nopl xt
                                 opology nonstop_tsc cpuid aperfmperf tsc_known_freq
                                  pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm
                                 2 ssse3 sdbg fma cx16 xtpr pdcm pcid sse4_1 sse4_2 
                                 x2apic movbe popcnt tsc_deadline_timer aes xsave av
                                 x f16c rdrand lahf_lm abm 3dnowprefetch cpuid_fault
                                  epb cat_l2 invpcid_single cdp_l2 ssbd ibrs ibpb st
                                 ibp ibrs_enhanced tpr_shadow vnmi flexpriority ept 
                                 vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2
                                  erms invpcid rdt_a avx512f avx512dq rdseed adx sma
                                 p avx512ifma clflushopt clwb intel_pt avx512cd sha_
                                 ni avx512bw avx512vl xsaveopt xsavec xgetbv1 xsaves
                                  split_lock_detect dtherm ida arat pln pts hwp hwp_
                                 notify hwp_act_window hwp_epp hwp_pkg_req avx512vbm
                                 i umip pku ospke avx512_vbmi2 gfni vaes vpclmulqdq 
                                 avx512_vnni avx512_bitalg tme avx512_vpopcntdq rdpi
                                 d movdiri movdir64b fsrm avx512_vp2intersect md_cle
                                 ar flush_l1d arch_capabilities
--8<---------------cut here---------------end--------------->8---
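
A minimal sketch of how the "Flags" line of such a listing could be
checked programmatically.  It assumes unwrapped lscpu output (one
"Flags:" line, as when lscpu is piped rather than pasted into a mail);
the function name parse_lscpu_flags is made up for this illustration.

```python
def parse_lscpu_flags(lscpu_text):
    """Return the set of CPU feature flags listed on the "Flags:" line.

    Assumes unwrapped `lscpu` output, i.e. all flags on a single line.
    Returns an empty set when no "Flags:" line is present.
    """
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()


# Shortened, hand-written sample in the same format as the listing above.
sample = (
    "Architecture:                    x86_64\n"
    "Vendor ID:                       GenuineIntel\n"
    "Flags:                           fpu sse sse2 avx avx2 avx512f\n"
)
flags = parse_lscpu_flags(sample)
print("avx512f" in flags)
```

Comparing the resulting sets from two machines is one cheap way to see
which instruction-set extensions differ between a host where the build
works and one where it fails.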



* Re: Investigating a reproducibility failure
  2022-02-03  9:16   ` Konrad Hinsen
@ 2022-02-03 11:41     ` Ricardo Wurmus
  2022-02-03 17:05       ` Konrad Hinsen
  2022-02-03 12:07     ` zimoun
  2022-02-05 14:12     ` Ludovic Courtès
  2 siblings, 1 reply; 18+ messages in thread
From: Ricardo Wurmus @ 2022-02-03 11:41 UTC (permalink / raw)
  To: Konrad Hinsen; +Cc: Guix Devel, zimoun

Hi Konrad,

>> CPU detection is a bottomless can of worms.
>
> That sounds very credible. But what can we do about this?
>
> There is obviously a trade-off between reproducibility and performance
> here. Can we support both, in a way that users can understand and manage?

So far our default approach has been to use the lowest common set of CPU
instructions, which generally leads to poorly performing code.  Some
packages are smarter and provide different code paths for different
CPUs.  The resulting binary is built the same, but at runtime different
parts of the code run dependent on the features the CPU reports.
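
The runtime-dispatch idea described above can be sketched roughly as
follows.  This is an illustration only, not how any particular library
implements it: the kernel functions and the KERNELS table are
hypothetical, and the "optimized" variants are plain Python stand-ins.

```python
def dot_generic(xs, ys):
    """Portable fallback: works on any CPU."""
    return sum(x * y for x, y in zip(xs, ys))


def dot_avx2(xs, ys):
    """Stand-in for a kernel that would use AVX2 instructions."""
    return sum(x * y for x, y in zip(xs, ys))


def dot_avx512(xs, ys):
    """Stand-in for a kernel that would use AVX-512 instructions."""
    return sum(x * y for x, y in zip(xs, ys))


# Hypothetical dispatch table, most capable kernel first.
KERNELS = [("avx512f", dot_avx512), ("avx2", dot_avx2)]


def select_kernel(cpu_flags):
    """Pick the best kernel supported by the CPU the program runs on.

    Every kernel is present in the program; only the choice made here
    depends on the flags reported at run time.
    """
    for flag, kernel in KERNELS:
        if flag in cpu_flags:
            return kernel
    return dot_generic


kernel = select_kernel({"sse2", "avx2"})
print(kernel.__name__, kernel([1.0, 2.0], [3.0, 4.0]))
```

The point of the design is that the table is fixed when the program is
built, independently of the build machine, so the same binary comes out
everywhere and only the selection differs per machine.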

The case of OpenBLAS is an anomaly in that this mechanism seems to
produce different binaries dependent on where it is built.  When I first
encountered this problem I guessed that perhaps it can only build these
different code paths up to the feature set of the CPU on the build
machine, so if you’re building with an older CPU your binary will lack
components that would be used on newer CPUs.  This is just a guess,
though.

Your problem is that the OpenBLAS build system doesn’t recognize your
modern CPU.  Ideally, it wouldn’t need to know anything about the
build-time CPU to build all the different code paths for different CPU
features.  The only way around this — retroactively — is to pretend to
have an older CPU, e.g. by using qemu.

In the long term it would be great if we could patch OpenBLAS to not
attempt to detect CPU features at build time.  I’m not sure this will
work if it does indeed use the currently available CPU features to
determine “how far up” to build modules in support of certain CPU
features / instruction sets.

> There is of course the issue that we can never be sure if a build will
> be reproducible in the future. But we can at least take care of the
> cases where the packager is aware of non-reproducibility issues, and
> make them transparent and manageable.

The new “--tune” feature is supposed to take care of cases like this.
We would still patch the code so that by default you’d get a package
that is reproducible (= you get the same exact binary no matter when or
where you build it) but that may not have optimal performance.  With
“--tune” you could opt to replace that generic build with one that uses
features of your current CPU, using grafts to swap the generic library
for the more performant library.

-- 
Ricardo


* Re: Investigating a reproducibility failure
  2022-02-03 11:41     ` Ricardo Wurmus
@ 2022-02-03 17:05       ` Konrad Hinsen
  0 siblings, 0 replies; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-03 17:05 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: Guix Devel, zimoun

Hi Ricardo and Simon,

Ricardo Wurmus <rekado@elephly.net> writes:

> The case of OpenBLAS is an anomaly in that this mechanism seems to
> produce different binaries dependent on where it is built.  When I first

Thanks a lot for those explanations, I hadn't realized how peculiar
OpenBLAS is!

> Your problem is that the OpenBLAS build system doesn’t recognize your
> modern CPU.  Ideally, it wouldn’t need to know anything about the
> build-time CPU to build all the different code paths for different CPU
> features.  The only way around this — retroactively — is to pretend to
> have an older CPU, e.g. by using qemu.

So all we need is a "QEMU build system" in Guix, just for OpenBLAS ;-)

> The new “--tune” feature is supposed to take care of cases like this.

Right, I remember Ludo's blog post about this.


zimoun <zimon.toutoune@gmail.com> writes:

> Somehow, “recent” processors cannot build old versions.

That's a whole new level of planned obsolescence!

Cheers,
  Konrad.



* Re: Investigating a reproducibility failure
  2022-02-03  9:16   ` Konrad Hinsen
  2022-02-03 11:41     ` Ricardo Wurmus
@ 2022-02-03 12:07     ` zimoun
  2022-02-05 14:12     ` Ludovic Courtès
  2 siblings, 0 replies; 18+ messages in thread
From: zimoun @ 2022-02-03 12:07 UTC (permalink / raw)
  To: Konrad Hinsen, Guix Devel, Ricardo Wurmus

Hi Konrad,

On Thu, 03 Feb 2022 at 10:16, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:

>> CPU detection is a bottomless can of worms.
>
> That sounds very credible. But what can we do about this?

Well, I do not know what could be done about this.  Today, the picture
for OpenBLAS@0.3.6 build looks like:

* Fail

        i7-1185G7E (Tiger Lake)
        i7-10700K  (Comet Lake)

* Build

        i7-6500U   (Skylake)
        E7-4870V2  (Ivy Bridge)
        5218       (Cascade Lake)


Somehow, “recent” processors cannot build old versions.


> There is obviously a trade-off between reproducibility and performance
> here. Can we support both, in a way that users can understand and manage?

Usually both [1].  However, it is not clear to me why OpenBLAS v0.3.6
does not build on some “recent” processors, even in a low-performance
mode using as much generic code as possible.

1: <https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/>


> The OpenBlas package in Guix is (or at least was, back then) written for
> performance. Can I, as a user, ask for a reproducible version? That
> could either be a generic version for any x86 architecture (preferably),
> or one that always builds for a given sub-architecture and then fails at
> runtime if the CPU doesn't match.
>
> Next: can I, as a user of dependent code, ask for reproducible versions
> of all my dependencies? In my case, I was packaging Python code that
> calls OpenBlas via NumPy. Many people in that situation don't even know
> what OpenBlas is. I did know, but wasn't aware of the build-time CPU
> detection.
>
> There is of course the issue that we can never be sure if a build will
> be reproducible in the future. But we can at least take care of the
> cases where the packager is aware of non-reproducibility issues, and
> make them transparent and manageable.

The answer to your concerns is the --tune transformation, I guess.  This
transformation provides micro-optimizations for high performance
while preserving provenance tracking.

Here the issue seems different.  OpenBLAS v0.3.6 seems to fail to fall
back to a generic processor when it does not recognize the processor,
probably because the microarchitecture did not exist or was not
supported at the time.
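
To make the difference concrete, here is a sketch of a detection step
with and without a generic fallback.  It is purely illustrative and
does not reproduce the OpenBLAS detection code; the set KNOWN_AT_RELEASE
is hypothetical and simply lists the machines reported as building
above.

```python
# Hypothetical: micro-architectures a given release knows about.
KNOWN_AT_RELEASE = {"Ivy Bridge", "Skylake", "Cascade Lake"}


def pick_target_strict(microarch):
    """Detection without fallback: an unknown CPU aborts the build."""
    if microarch not in KNOWN_AT_RELEASE:
        raise LookupError("unknown micro-architecture: " + microarch)
    return microarch


def pick_target_with_fallback(microarch):
    """Detection with fallback: an unknown CPU gets generic code."""
    if microarch in KNOWN_AT_RELEASE:
        return microarch
    return "generic"


for name in ("Skylake", "Tiger Lake"):
    print(name, "->", pick_target_with_fallback(name))
```

With the strict variant, any processor released after the software
fails; with the fallback, the build still succeeds, at the price of
slower generic code on that machine.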

(Note that ’Comet Lake’ is not in the list
%gcc-10-x86_64-micro-architectures, so --tune would probably be
inefficient; I do not know.)


Cheers,
simon



* Re: Investigating a reproducibility failure
  2022-02-03  9:16   ` Konrad Hinsen
  2022-02-03 11:41     ` Ricardo Wurmus
  2022-02-03 12:07     ` zimoun
@ 2022-02-05 14:12     ` Ludovic Courtès
  2022-02-15 14:10       ` Bengt Richter
  2 siblings, 1 reply; 18+ messages in thread
From: Ludovic Courtès @ 2022-02-05 14:12 UTC (permalink / raw)
  To: Konrad Hinsen; +Cc: Guix Devel, zimoun

Konrad Hinsen <konrad.hinsen@fastmail.net> skribis:

> There is obviously a trade-off between reproducibility and performance
> here.

I tried hard to dispel that belief: you do not have to trade one for the other.

Yes, in some cases scientific software might lack the engineering work
that allows for portable performance; but in those cases, there’s
‘--tune’.

  https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/

We should keep repeating that message: reproducibility and performance
are not antithetic.  And I really mean it, otherwise fellow HPC
practitioners will keep producing unverifiable results on the grounds
that they cannot possibly compromise on performance!

Thanks,
Ludo’.


* Re: Investigating a reproducibility failure
  2022-02-05 14:12     ` Ludovic Courtès
@ 2022-02-15 14:10       ` Bengt Richter
  2022-02-16 12:03         ` zimoun
  0 siblings, 1 reply; 18+ messages in thread
From: Bengt Richter @ 2022-02-15 14:10 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel, zimoun

Hi,

On +2022-02-05 15:12:28 +0100, Ludovic Courtès wrote:
> Konrad Hinsen <konrad.hinsen@fastmail.net> skribis:
> 
> > There is obviously a trade-off between reproducibility and performance
> > here.
>

I suspect what you really want to reproduce is not verbatim
code, but the abstract computation that it implements,
typically a digitally simulated experiment?

Thus far, "show me the code" is the usual way to ask someone
what they did, and guix makes it possible to answer in great
detail.

But what is really relevant if you are helping a colleague
reproduce e.g. a monte-carlo simulation experiment computing
pi by throwing random darts at a square, to draw a graph
showing convergence of statistically-computed pi on y-axis
vs number of darts thrown on x-axis?

(IIRC pi should be 4 * hits within inscribed circle / hits in
1x1 square)

Well, ISTM you can reproduce this experiment in any language
and method that does the abstract job.

The details of Fortran version or Julia/Clang or guile
pedigree only really come into play for forensics looking
for where the abstract was implemented differently.

E.g., if results were different, were the x and y random
numbers displacing the darts within the square really
uniform and independent, and seeded with constants to ensure
bit-for-bit equivalent computations?
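
The dart-throwing experiment can be sketched in a few lines.  This is
one possible implementation, assuming Python's standard random module
with an explicit seed; the function name estimate_pi is made up here.
The fraction of darts landing in the inscribed circle approximates
pi/4, hence the factor 4.

```python
import random


def estimate_pi(n_darts, seed):
    """Estimate pi by throwing n_darts uniform darts at the unit square.

    A private generator seeded with a constant makes two runs with the
    same arguments perform the same sequence of computations.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_darts):
        x = rng.random()
        y = rng.random()
        # Inscribed circle: centre (0.5, 0.5), radius 0.5.
        if (x - 0.5) ** 2 + (y - 0.5) ** 2 <= 0.25:
            hits += 1
    return 4.0 * hits / n_darts


first = estimate_pi(100000, seed=42)
second = estimate_pi(100000, seed=42)
print(first, first == second)
```

Running it twice with the same seed is the "repeat" case; changing the
seed, the generator, or the language moves towards replication of the
abstract experiment.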

How fast the computations happened is not relevant,
though of course nice for getting work done :)

> I tried hard to dispel that belief: you do not have to trade one for the other.
> 
> Yes, in some cases scientific software might lack the engineering work
> that allows for portable performance; but in those cases, there’s
> ‘--tune’.
> 
>   https://hpc.guix.info/blog/2022/01/tuning-packages-for-a-cpu-micro-architecture/
> 
> We should keep repeating that message: reproducibility and performance
> are not antithetic.  And I really mean it, otherwise fellow HPC
> practitioners will keep producing unverifiable results on the grounds
> that they cannot possibly compromise on performance!
>

Maybe the above pi computation could be a start on some kind
of abstract model validation test? It's simple, but it pulls
on a lot of simulation tool chains. WDYT?

> Thanks,
> Ludo’.
> 

-- 
Regards,
Bengt Richter


* Re: Investigating a reproducibility failure
  2022-02-15 14:10       ` Bengt Richter
@ 2022-02-16 12:03         ` zimoun
  2022-02-16 13:04           ` Konrad Hinsen
  0 siblings, 1 reply; 18+ messages in thread
From: zimoun @ 2022-02-16 12:03 UTC (permalink / raw)
  To: Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi,

On Tue, 15 Feb 2022 at 15:10, Bengt Richter <bokr@bokr.com> wrote:

> I suspect what you really want to reproduce is not verbatim
> code, but the abstract computation that it implements,
> typically a digitally simulated experiment?

[...]

> Maybe the above pi computation could be a start on some kind
> of abstract model validation test? It's simple, but it pulls
> on a lot of simulation tool chains. WDYT?

Well, it depends on the community which term they pick for which
concept:

 - same team, same experimental setup
 - different team, same experimental setup
 - different team, different experimental setup

and the terms are repeat, replicate, reproduce.  For details, see [1].

Since Konrad is editor for the ReScience journal, I guess ’reproduce’
means [2]:

        Reproduction of a computational study means running the same
        computation on the same input data, and then checking if the
        results are the same, or at least “close enough” when it comes
        to numerical approximations. Reproduction can be considered as
        software testing at the level of a complete study.

Where my understanding of your “abstract computation” looks more like [2]:

        Replication of a scientific study (computational or other) means
        repeating a published protocol, respecting its spirit and
        intentions but varying the technical details. For computational
        work, this would mean using different software, running a
        simulation from different initial conditions, etc. The idea is
        to change something that everyone believes shouldn’t matter, and
        see if the scientific conclusions are affected or not.
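
A small illustration of why “close enough” appears in the first
definition: the same sum, evaluated in a different order, need not give
the same bits.  The tolerance below is an arbitrary choice made for
this example.

```python
import math

# The same three numbers, added in two different orders.
left_to_right = (0.1 + 0.2) + 0.3
right_to_left = 0.1 + (0.2 + 0.3)

# Exact comparison: are the two floating-point results identical?
bitwise_equal = left_to_right == right_to_left

# Tolerant comparison: do they agree to a relative tolerance of 1e-12?
close_enough = math.isclose(left_to_right, right_to_left, rel_tol=1e-12)

print(bitwise_equal, close_enough)
```

Which of the two comparisons counts as “the same result” is exactly the
choice that each community has to make.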

Therefore, again from my understanding, you are somehow proposing what
science should be. :-) It is what the initiative GuixHPC [3] is trying
to tackle.

Transparency and full control of the variability, the roots of the
scientific method, make it possible to achieve, with more or less success,
’reproduction’.  Here and today, Guix plays a central role for
reproducing because Guix does not cheat with transparency and full
control of variability.

Note that some people are calling for bit-to-bit scientific
reproduction.  I am not.  Because the meaning of “same” or “equal”
depends on the scientific fields.  However, it is up to any scientific
debate or controversy to draw the line for “same” and argue if the
conclusions hold.  Again, transparency and full control of the
variability are fundamental here.  How to argue if they are not
satisfied?

Then, and out of Guix scope, if the reproduced result matters enough,
people can try to replicate, for confirmation, for performance
improvements, or as a step targeting other results.  This replication
can use Guix to control the variability and also help the reproduction
of the replication; but Guix does not take a central role here.

Last, it is in this second and other steps that the “abstract model”
could play a role, and it is out of Guix scope, IMHO.

1: <https://www.ncbi.nlm.nih.gov/pmc/articles/PMC5778115/>
2: <http://rescience.github.io/faq/>
3: <https://hpc.guix.info/>

Cheers,
simon


* Re: Investigating a reproducibility failure
  2022-02-16 12:03         ` zimoun
@ 2022-02-16 13:04           ` Konrad Hinsen
  2022-02-17 11:21             ` zimoun
  0 siblings, 1 reply; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-16 13:04 UTC (permalink / raw)
  To: zimoun, Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi Bengt and Simon,

zimoun <zimon.toutoune@gmail.com> writes:

> Note that some people are calling for bit-to-bit scientific
> reproduction.  I am not.  Because the meaning of “same” or “equal”

I am. Not as a goal in itself, because in the larger scientific context
it's robust replicability that matters, not bit-for-bit re-execution.
And yet, the latter matters for two reasons:

 - It's verifiable automatically, making it cheap and fast to check.
   No need to bother an expert for a qualified opinion.

 - If you hit a case of non-replicability (scientifically relevant
   differences in two computations that everybody expects to yield
   equivalent results), then it is nearly impossible to investigate
   if the individual computations are not bit-for-bit reproducible.
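
The first point can be illustrated by a check that needs no expert: a
sketch comparing two outputs by their SHA-256 digests.  The helper
names are invented for this example, and the byte strings stand in for
the contents of two result files.

```python
import hashlib


def sha256_hex(data):
    """Return the SHA-256 digest of a byte string as hexadecimal text."""
    return hashlib.sha256(data).hexdigest()


def bit_for_bit_identical(first_output, second_output):
    """True if two outputs have the same digest, i.e. the same bytes
    (barring a hash collision)."""
    return sha256_hex(first_output) == sha256_hex(second_output)


run_a = b"0.25 0.50 0.75\n"
run_b = b"0.25 0.50 0.75\n"
run_c = b"0.25 0.50 0.76\n"

print(bit_for_bit_identical(run_a, run_b))
print(bit_for_bit_identical(run_a, run_c))
```

A check like this tells you that something differs, not what or why,
which is where the second point comes in.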

Making scientific computations bit-for-bit reproducible is the moral
equivalent of keeping a detailed lab notebook: doing your best to tell
others exactly what you did.

> conclusions hold.  Again, transparency and full control of the
> variability are fundamental here.  How to argue if they are not
> satisfied?

Exactly, that's very similar to my second point.

Or, in Bengt's formulation:

> The details of Fortran version or Julia/Clang or guile
> pedigree only really come into play for forensics looking
> for where the abstract was implemented differently.

When the forensics are called in, then...

> Thus far, "show me the code" is the usual way to ask someone
> what they did, and guix makes it possible to answer in great
> detail.

... "show me the code" is not sufficient. You must also be sure that the
code you look at is really the code that was run. And that's the role of
bit-for-bit reproducibility.

Cheers,
  Konrad.


* Re: Investigating a reproducibility failure
  2022-02-16 13:04           ` Konrad Hinsen
@ 2022-02-17 11:21             ` zimoun
  2022-02-17 16:55               ` Konrad Hinsen
  0 siblings, 1 reply; 18+ messages in thread
From: zimoun @ 2022-02-17 11:21 UTC (permalink / raw)
  To: Konrad Hinsen, Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi Konrad,

We agree on the main points in the scope of Guix. :-)  We probably
disagree on some specific points about epistemology or epistemic
justification; I am not sure I understand these terms well enough to put
them here. :-)

We are far from OpenBLAS. :-)

On Wed, 16 Feb 2022 at 14:04, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:

> Making scientific computations bit-for-bit reproducible is the moral
> equivalent of keeping a detailed lab notebook: doing your best to tell
> others exactly what you did.

A detailed lab notebook implies transparency and full control of
variability, not bit-for-bit reproducibility.

If my detailed lab notebook tracks my experiment to test gravity with a
pendulum, as detailed and ideal (moral?) as it would be, i.e., providing
the capacity to build and re-build two exactly identical benches, then two
experiments would not provide bit-for-bit identical numbers in a table
measuring the oscillations.  Because, for instance, it would depend on
the two locations, on the touch of the experimenter, etc.

In many fields, the experimental reproduction depends on the variability
of the inputs or of the instruments and therefore the scientific
community, field by field, somehow defines what “same” means, depending
on their common variability from their field.

For one, I do not see why it would be different for the computational
processing part of the experiment.  And two, asking for bit-for-bit
reproducibility for one part of the experiment is asking far more than
for the other, non-computational parts of the same experiment.

Because I use computers daily and am deeply interested in what a
computation means, for sure, I advocate for bit-to-bit reproducibility.
But then, I discuss with my colleagues, biologists or MDs, and somehow my
views are biased, i.e., I am trying to apply my own criteria defining
“same” from my “field” to their “field”, where the same “same” must be
applied to the whole chain, computational processing included.  Or at
least they have to define what is acceptable for each part.  Do not get
me wrong, such a computational part must be transparent and its
variability must also be controlled, but not strictly more nor totally
less than the other parts.

> When the forensics are called in, then...
>
>> Thus far, "show me the code" is the usual way to ask someone
>> what they did, and guix makes it possible to answer in great
>> detail.
>
> ... "show me the code" is not sufficient. You must also be sure that the
> code you look at is really the code that was run.

I agree.  It is “show me ALL the code” and, e.g., “guix graph
python-scipy” shows that it is a long read. :-) Therefore, being able to
build, run, re-build and re-run are weak requirements to establish
trust.

>                                                   And that's the role of
> bit-for-bit reproducibility.

From my understanding, the validation of a reproduction depends on
trust: what is the confidence about this or that?  Well, bit-for-bit
reproducibility is one criterion for establishing such trust.  However,
IMHO, this criterion is not the only one, and failing it can be
compensated by other criteria used by many experimental sciences.

Bah for what my opinion is worth on this topic. :-)

In any cases, thanks Konrad for the materials you provide about this
topic.  For the interested French reader: :-)

 - https://www.societe-informatique-de-france.fr/wp-content/uploads/2021/11/1024_18_2021_11.html
 - https://webcast.in2p3.fr/video/les-enjeux-et-defis-de-la-recherche-reproductible
 - https://www.fun-mooc.fr/en/courses/reproducible-research-methodological-principles-transparent-scie/

Cheers,
simon


* Re: Investigating a reproducibility failure
  2022-02-17 11:21             ` zimoun
@ 2022-02-17 16:55               ` Konrad Hinsen
  0 siblings, 0 replies; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-17 16:55 UTC (permalink / raw)
  To: zimoun, Bengt Richter, Ludovic Courtès; +Cc: Guix Devel

Hi Simon,

> We are far from OpenBLAS. :-)

That's fine with me. The more distance between me and OpenBLAS, the
happier I am ;-)

> On Wed, 16 Feb 2022 at 14:04, Konrad Hinsen <konrad.hinsen@fastmail.net> wrote:
>
>> Making scientific computations bit-for-bit reproducible is the moral
>> equivalent of keeping a detailed lab notebook: doing your best to tell
>> others exactly what you did.
>
> A detailed lab notebook implies transparency and full control of
> variability, not bit-for-bit reproducibility.

That's why I said "moral" equivalent. Computations are different from
experiments. Typical mistakes are different, and technical possibilities
are different.

1. You can't have the equivalent of bit-for-bit reproducibility with
   experiments. You can with computers, and with good tool support
   (Guix!)  it can become a routine task that takes little
   effort. So... why *not* do it?

2. A computation involves many more details than any typical experiment.
   Just writing down what you did is *not* enough for documenting a
   computation, as experience has shown. So you need more than the
   lab notebook. If your computation is bit-for-bit reproducible, you
   know that you have documented every last detail. Inversely, if you
   cannot reproduce to the bit level, you know that *something* is out
   of your control.

In the end, my argument is more pragmatic than philosophical. If
bit-for-bit reproducibility is (1) useful for resolving issues in the
future, and (2) cheap to get with good tool support, then we should
go for it.

The main reason why people argue against it is lack of tool support in
their work environments. They conclude that it's a difficult goal to
achieve, and then start to reason that it's not strictly necessary for
the scientific method. Which is true. But... it's still very useful.

>> And that's the role of bit-for-bit reproducibility.
>
> From my understanding, the validation of a reproduction depends on
> trust: what is the confidence about this or that?  Well, bit-for-bit
> reproducibility is one criterion for establishing such trust.  However,
> IMHO, this criterion is not the only one, and failing it can be
> compensated by other criteria used by many experimental sciences.

Definitely. But in many cases, bit-for-bit reproducibility is the
cheapest way to build trust, given good tool support. In other cases,
e.g. HPC or exotic hardware, it's expensive, and then you look for
something else.

Cheers,
  Konrad.


* Investigating a reproducibility failure
@ 2022-02-01 14:05 Konrad Hinsen
  2022-02-01 14:30 ` Konrad Hinsen
  2022-02-05 14:05 ` Ludovic Courtès
  0 siblings, 2 replies; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-01 14:05 UTC (permalink / raw)
  To: Guix Devel

Hi everyone,

Two years ago, I published a supposedly reproducible computation,
explaining how to re-run it at any time using Guix (it's at
https://github.com/khinsen/rescience-ten-year-challenge-paper-3/). Yesterday,
I got an e-mail from someone who tried, and failed. I tried myself, and
failed as well. But I don't understand what's going on.

To see the failure, do

   guix time-machine \
    --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
    -- build openblas

The build log is attached, the first error is

   getarch_2nd.c: In function ‘main’:
   getarch_2nd.c:12:35: error: ‘SGEMM_DEFAULT_UNROLL_M’ undeclared (first use in this function); did you mean ‘XGEMM_DEFAULT_UNROLL_M’?
        printf("SGEMM_UNROLL_M=%d\n", SGEMM_DEFAULT_UNROLL_M);
                                      ^~~~~~~~~~~~~~~~~~~~~~
                                      XGEMM_DEFAULT_UNROLL_M

What makes this complicated is the DYNAMIC_ARCH feature of openblas that
Guix uses on x86 architectures. I don't know the details of how this
should work and why it could fail. In particular, I don't know if the
source code file getarch_2nd.c is supposed to be compiled at all if all
goes well.

I doubt we can do anything to fix the past, but I would like to
understand what exactly went wrong here so we can make sure we do better
in the future.

Cheers,
  Konrad.

* Re: Investigating a reproducibility failure
  2022-02-01 14:05 Konrad Hinsen
@ 2022-02-01 14:30 ` Konrad Hinsen
  2022-02-02 23:19   ` Ricardo Wurmus
  2022-02-05 14:05 ` Ludovic Courtès
  1 sibling, 1 reply; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-01 14:30 UTC (permalink / raw)
  To: Guix Devel

[-- Attachment #1: Type: text/plain, Size: 501 bytes --]

Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

> To see the failure, do
>
>    guix time-machine \
>     --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>     -- build openblas
>
> The build log is attached, the first error is

Oops... Two mistakes! First, I forgot the attachment, so here it comes.
Second, I didn't quote the right commit. The failure happens with

    guix time-machine \
     --commit=87e7faa2ae641d8302efc8b90f1e45f43f67f6da \
     -- build openblas

Cheers,
  Konrad.


[-- Attachment #2: 6df92lhfz4vccgn5v2z0rc092bhz89-openblas-0.3.a.drv.bz2 --]
[-- Type: application/octet-stream, Size: 28540 bytes --]

* Re: Investigating a reproducibility failure
  2022-02-01 14:30 ` Konrad Hinsen
@ 2022-02-02 23:19   ` Ricardo Wurmus
  2022-02-02 23:36     ` Ricardo Wurmus
  0 siblings, 1 reply; 18+ messages in thread
From: Ricardo Wurmus @ 2022-02-02 23:19 UTC (permalink / raw)
  To: Konrad Hinsen; +Cc: guix-devel


Konrad Hinsen <konrad.hinsen@fastmail.net> writes:

> Konrad Hinsen <konrad.hinsen@fastmail.net> writes:
>
>> To see the failure, do
>>
>>    guix time-machine \
>>     --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>>     -- build openblas
>>
>> The build log is attached, the first error is
>
> Oops... Two mistakes ! First, I forgot the attachment, so here it comes,
> Second, I didn't quote the right commit. The failure happens with
>
>     guix time-machine \
>      --commit=87e7faa2ae641d8302efc8b90f1e45f43f67f6da \
>      -- build openblas

It builds fine on this laptop.

--8<---------------cut here---------------start------------->8---
$ lscpu
Architecture:            x86_64
  CPU op-mode(s):        32-bit, 64-bit
  Address sizes:         39 bits physical, 48 bits virtual
  Byte Order:            Little Endian
CPU(s):                  4
  On-line CPU(s) list:   0-3
Vendor ID:               GenuineIntel
  Model name:            Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
    CPU family:          6
    Model:               78
    Thread(s) per core:  2
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            3
    CPU max MHz:         3100.0000
    CPU min MHz:         400.0000
    BogoMIPS:            5199.98
    Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr ss
                         e sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nop
                         l xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg 
                         fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdra
                         nd lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi
                          flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx sm
                         ap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act
                         _window hwp_epp md_clear flush_l1d
Virtualization features: 
  Virtualization:        VT-x
Caches (sum of all):     
  L1d:                   64 KiB (2 instances)
  L1i:                   64 KiB (2 instances)
  L2:                    512 KiB (2 instances)
  L3:                    4 MiB (1 instance)
NUMA:                    
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-3
--8<---------------cut here---------------end--------------->8---

CPU detection is a bottomless can of worms.

-- 
Ricardo


* Re: Investigating a reproducibility failure
  2022-02-02 23:19   ` Ricardo Wurmus
@ 2022-02-02 23:36     ` Ricardo Wurmus
  0 siblings, 0 replies; 18+ messages in thread
From: Ricardo Wurmus @ 2022-02-02 23:36 UTC (permalink / raw)
  To: Konrad Hinsen; +Cc: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> Konrad Hinsen <konrad.hinsen@fastmail.net> writes:
>
>> Konrad Hinsen <konrad.hinsen@fastmail.net> writes:
>>
>>> To see the failure, do
>>>
>>>    guix time-machine \
>>>     --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>>>     -- build openblas
>>>
>>> The build log is attached, the first error is
>>
>> Oops... Two mistakes ! First, I forgot the attachment, so here it comes,
>> Second, I didn't quote the right commit. The failure happens with
>>
>>     guix time-machine \
>>      --commit=87e7faa2ae641d8302efc8b90f1e45f43f67f6da \
>>      -- build openblas
>
> It builds fine on this laptop.
>
> $ lscpu
> Architecture:            x86_64
>   CPU op-mode(s):        32-bit, 64-bit
>   Address sizes:         39 bits physical, 48 bits virtual
>   Byte Order:            Little Endian
> CPU(s):                  4
>   On-line CPU(s) list:   0-3
> Vendor ID:               GenuineIntel
>   Model name:            Intel(R) Core(TM) i7-6500U CPU @ 2.50GHz
>     CPU family:          6
>     Model:               78
>     Thread(s) per core:  2
>     Core(s) per socket:  2
>     Socket(s):           1
>     Stepping:            3
>     CPU max MHz:         3100.0000
>     CPU min MHz:         400.0000
>     BogoMIPS:            5199.98
>     Flags:               fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr ss
>                          e sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc art arch_perfmon pebs bts rep_good nop
>                          l xtopology nonstop_tsc cpuid aperfmperf pni pclmulqdq dtes64 monitor ds_cpl vmx est tm2 ssse3 sdbg 
>                          fma cx16 xtpr pdcm pcid sse4_1 sse4_2 x2apic movbe popcnt tsc_deadline_timer aes xsave avx f16c rdra
>                          nd lahf_lm abm 3dnowprefetch cpuid_fault epb invpcid_single pti ssbd ibrs ibpb stibp tpr_shadow vnmi
>                           flexpriority ept vpid ept_ad fsgsbase tsc_adjust bmi1 avx2 smep bmi2 erms invpcid mpx rdseed adx sm
>                          ap clflushopt intel_pt xsaveopt xsavec xgetbv1 xsaves dtherm ida arat pln pts hwp hwp_notify hwp_act
>                          _window hwp_epp md_clear flush_l1d
> Virtualization features: 
>   Virtualization:        VT-x
> Caches (sum of all):     
>   L1d:                   64 KiB (2 instances)
>   L1i:                   64 KiB (2 instances)
>   L2:                    512 KiB (2 instances)
>   L3:                    4 MiB (1 instance)
> NUMA:                    
>   NUMA node(s):          1
>   NUMA node0 CPU(s):     0-3

I also built this on a different machine, foreign distro.  Here’s the
output of lscpu:

--8<---------------cut here---------------start------------->8---
[rwurmus@beast:~] (571) $ lscpu 
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                120
On-line CPU(s) list:   0-119
Thread(s) per core:    2
Core(s) per socket:    15
Socket(s):             4
NUMA node(s):          4
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Model name:            Intel(R) Xeon(R) CPU E7-4870 v2 @ 2.30GHz
Stepping:              7
CPU MHz:               2127.050
CPU max MHz:           2900.0000
CPU min MHz:           1200.0000
BogoMIPS:              4588.44
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              30720K
NUMA node0 CPU(s):     0,4,8,12,16,20,24,28,32,36,40,44,48,52,56,60,64,68,72,76,80,84,88,92,96,100,104,108,112,116
NUMA node1 CPU(s):     1,5,9,13,17,21,25,29,33,37,41,45,49,53,57,61,65,69,73,77,81,85,89,93,97,101,105,109,113,117
NUMA node2 CPU(s):     2,6,10,14,18,22,26,30,34,38,42,46,50,54,58,62,66,70,74,78,82,86,90,94,98,102,106,110,114,118
NUMA node3 CPU(s):     3,7,11,15,19,23,27,31,35,39,43,47,51,55,59,63,67,71,75,79,83,87,91,95,99,103,107,111,115,119
Flags:                 fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca cmov pat pse36 clflush dts acpi mmx fxsr sse sse2 ss ht tm pbe syscall nx pdpe1gb rdtscp lm constant_tsc arch_perfmon pebs bts rep_good nopl xtopology nonstop_tsc aperfmperf eagerfpu pni pclmulqdq dtes64 monitor ds_cpl vmx smx est tm2 ssse3 cx16 xtpr pdcm pcid dca sse4_1 sse4_2 x2apic popcnt tsc_deadline_timer aes xsave avx f16c rdrand lahf_lm ssbd ibrs ibpb stibp tpr_shadow vnmi flexpriority ept vpid fsgsbase smep erms xsaveopt dtherm ida arat pln pts md_clear spec_ctrl intel_stibp flush_l1d
--8<---------------cut here---------------end--------------->8---

The output differs, but the build did not fail.
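
To see which CPU flags differ between two such listings, a small sketch
like the following can help. It assumes each lscpu output was saved
unwrapped, with all flags on a single "Flags:" line; the function names
are illustrative.

```python
def cpu_flags(lscpu_text):
    """Extract the set of CPU flags from the text printed by lscpu."""
    for line in lscpu_text.splitlines():
        key, sep, value = line.partition(":")
        # Only the line whose key is exactly "Flags" carries the flag list.
        if sep and key.strip() == "Flags":
            return set(value.split())
    return set()


def flag_differences(first, second):
    """Return (only_in_first, only_in_second) as sorted lists of flags."""
    a, b = cpu_flags(first), cpu_flags(second)
    return sorted(a - b), sorted(b - a)
```

For the two listings above, avx2 and fma appear only on the laptop, while
smx and dca appear only on the Xeon.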

-- 
Ricardo


* Re: Investigating a reproducibility failure
  2022-02-01 14:05 Konrad Hinsen
  2022-02-01 14:30 ` Konrad Hinsen
@ 2022-02-05 14:05 ` Ludovic Courtès
  2022-02-08  5:57   ` Konrad Hinsen
  1 sibling, 1 reply; 18+ messages in thread
From: Ludovic Courtès @ 2022-02-05 14:05 UTC (permalink / raw)
  To: Konrad Hinsen; +Cc: Guix Devel

Hi!

Konrad Hinsen <konrad.hinsen@fastmail.net> skribis:

> To see the failure, do
>
>    guix time-machine \
>     --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>     -- build openblas

For the record, there’s still a substitute available for this one:

--8<---------------cut here---------------start------------->8---
$ guix time-machine --commit=7357b3d7a52eb5db1674012c50d308d792741c48 -- weather openblas
guile: warning: failed to install locale
computing 1 package derivations for x86_64-linux...
looking for 1 store items on https://ci.guix.gnu.org...
https://ci.guix.gnu.org
  100.0% substitutes available (1 out of 1)
  at least 24.5 MiB of nars (compressed)
  78.3 MiB on disk (uncompressed)
  0.003 seconds per request (0.0 seconds in total)
  343.4 requests per second
[ugly but unimportant backtrace omitted…]
$ guix time-machine --commit=7357b3d7a52eb5db1674012c50d308d792741c48 -- build openblas
guile: warning: failed to install locale
/gnu/store/vax1vsg3ivf0r7j7n2xkbi1z3r0504l9-openblas-0.3.7
--8<---------------cut here---------------end--------------->8---

That doesn’t solve the fact that OpenBLAS compilation is not
reproducible, as zimoun noted¹, and we need to fix it, but at least this
colleague of yours should have been able to fetch substitutes, no?

Thanks,
Ludo’.

¹ https://issues.guix.gnu.org/51536


* Re: Investigating a reproducibility failure
  2022-02-05 14:05 ` Ludovic Courtès
@ 2022-02-08  5:57   ` Konrad Hinsen
  0 siblings, 0 replies; 18+ messages in thread
From: Konrad Hinsen @ 2022-02-08  5:57 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Guix Devel

Hi Ludo,

> Konrad Hinsen <konrad.hinsen@fastmail.net> skribis:
>
>> To see the failure, do
>>
>>    guix time-machine \
>>     --commit=7357b3d7a52eb5db1674012c50d308d792741c48 \
>>     -- build openblas
>
> For the record, there’s still a substitute available for this one:

...

> That doesn’t solve the fact that OpenBLAS compilation is not
> reproducible, as zimoun noted¹, and we need to fix it, but at least this
> colleague of yours should have been able to fetch substitutes, no?

Good point. If I try to use it now, it works, fetching the substitute.
Back when I started investigating this, on the same machine, Guix tried
to build locally.  So I guess there was some ephemeral problem with
accessing the substitute server.

Cheers,
  Konrad


Thread overview: 18+ messages
2022-02-02 20:35 Investigating a reproducibility failure zimoun
2022-02-02 23:43 ` zimoun
2022-02-03  9:16   ` Konrad Hinsen
2022-02-03 11:41     ` Ricardo Wurmus
2022-02-03 17:05       ` Konrad Hinsen
2022-02-03 12:07     ` zimoun
2022-02-05 14:12     ` Ludovic Courtès
2022-02-15 14:10       ` Bengt Richter
2022-02-16 12:03         ` zimoun
2022-02-16 13:04           ` Konrad Hinsen
2022-02-17 11:21             ` zimoun
2022-02-17 16:55               ` Konrad Hinsen
  -- strict thread matches above, loose matches on Subject: below --
2022-02-01 14:05 Konrad Hinsen
2022-02-01 14:30 ` Konrad Hinsen
2022-02-02 23:19   ` Ricardo Wurmus
2022-02-02 23:36     ` Ricardo Wurmus
2022-02-05 14:05 ` Ludovic Courtès
2022-02-08  5:57   ` Konrad Hinsen
