From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: Ricardo Wurmus <rekado@elephly.net>, 51536@debbugs.gnu.org
Subject: bug#51536: openblas builds not reproducible on different x86_64 machines
Date: Thu, 03 Feb 2022 00:13:33 +0100
Message-ID: <87czk4rheq.fsf@gnu.org>
In-Reply-To: <87h7cw7ewb.fsf@gmail.com> (Maxim Cournoyer's message of "Sun, 31 Oct 2021 23:07:00 -0400")
Hi!
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
> Our OpenBLAS package uses DYNAMIC_ARCH=1 to provide optimizations for
> all supported targets, at least of x86 and x86_64. In theory that seems
> OK, but in practice the builds differ depending on the host CPU.
What follows is the log of an investigation that didn’t find the root
cause, but perhaps it’ll give us ideas…
Right now the build results of ci.guix and bordeaux.guix differ:
--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 202 Jan 30 2022 23:57:03 (current)
guix 43dd34c
repository URL: https://git.savannah.gnu.org/git/guix.git
branch: master
commit: 43dd34c7777a212c99a97da7a2c237158faa9a1b
ludo@ribbon ~/src/guix$ guix challenge openblas
/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18 contents differ:
no local build for '/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18'
https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 0m1jlc26yrwxn8gxwpj8452kw4g84ywclh0hnab93873ifz87s5c
https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 1d0m9v3kpsqzplpl1law2lfhm6rrbhkkqsvh19dlg9wx45vbbvjb
differing file:
/lib/libopenblasp-r0.3.18.so
1 store items were analyzed:
- 0 (0.0%) were identical
- 1 (100.0%) differed
- 0 (0.0%) were inconclusive
--8<---------------cut here---------------end--------------->8---
To get an idea, I thought we could compare the two build logs:
https://ci.guix.gnu.org/log/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18
https://bordeaux.guix.gnu.org/build/3fab433c-e7d3-498d-86f8-4bcd5da9c4db
(Protip: I found the second one via
<http://data.guix.gnu.org/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18>.)
The “ar -ru ../libopenblasp-r0.3.18.a …” invocations are apparently the
same in both logs, which rules out the simple case of unsorted .o files.
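For anyone who wants to repeat that check, here is a minimal sketch. It assumes both build logs have already been saved locally as plain text; the function name `diff_ar_commands` and the file names in the usage line are hypothetical, not something the build farms provide.

```shell
# Compare the "ar -ru" invocations recorded in two saved build logs.
# Exit status 0 means the archive commands, including the order of
# the .o files on each command line, are identical.
diff_ar_commands() {
  _ar_tmp=$(mktemp -d) || return 2
  # Keep only the lines that invoke "ar -ru" from each log.
  grep -F 'ar -ru' "$1" > "$_ar_tmp/a"
  grep -F 'ar -ru' "$2" > "$_ar_tmp/b"
  diff -u "$_ar_tmp/a" "$_ar_tmp/b"
  _ar_status=$?
  rm -rf "$_ar_tmp"
  return "$_ar_status"
}

# Usage (hypothetical file names):
#   diff_ar_commands ci.log bordeaux.log && echo "same ar commands"
```

An empty diff only shows that the archive members were listed in the same order; it says nothing about the contents of the individual .o files.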
The .so on ci.guix is slightly bigger:
--8<---------------cut here---------------start------------->8---
$ wget -qO - https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18| lzip -d | guix archive -x /tmp/o1
$ wget -qO - https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18| lzip -d | guix archive -x /tmp/o2
$ ls -l /tmp/{o1,o2}/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40538768 Jan 1 1970 /tmp/o1/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40436368 Jan 1 1970 /tmp/o2/lib/libopenblasp-r0.3.18.so
--8<---------------cut here---------------end--------------->8---
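Going by the sizes in the listing above, the difference is 40538768 − 40436368 = 102400 bytes, i.e. exactly 100 KiB. A small helper to compute this for any two files, as a sketch (the name `size_delta` is made up for illustration):

```shell
# Print the size difference in bytes between two files
# (size of the first minus size of the second).
size_delta() {
  echo $(( $(wc -c < "$1") - $(wc -c < "$2") ))
}

# Usage:
#   size_delta /tmp/o1/lib/libopenblasp-r0.3.18.so \
#              /tmp/o2/lib/libopenblasp-r0.3.18.so
```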
Both have the same symbols though, and in the same order:
--8<---------------cut here---------------start------------->8---
$ diff -u <(objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60- ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so |cut -c60- )
$ echo $?
0
--8<---------------cut here---------------end--------------->8---
… which suggests they include code optimized for the same
micro-architectures because symbols include the name of the
micro-architecture:
--8<---------------cut here---------------start------------->8---
$ objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60-|tail -10
csymm3m_RU
cgemv_c_BARCELONA
csymv_U_HASWELL
dtrmm_iltncopy_CORE2
LAPACKE_dsytrs2
openblas_num_threads_env
csycon_rook_
csytri_rook_
--8<---------------cut here---------------end--------------->8---
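To see which micro-architectures are actually present, one could tally symbols by their suffix. The sketch below is only a heuristic: the list of names contains just the suffixes visible in this message and is almost certainly incomplete, and `count_by_arch` is a hypothetical helper name.

```shell
# Count symbols per micro-architecture suffix.
# Reads lines on stdin and looks at the last whitespace-separated
# field, so it accepts either bare symbol names or full
# "objdump -T" lines.
# ASSUMPTION: the name list below only covers suffixes seen in this
# message; extend it as needed.
count_by_arch() {
  awk '
    BEGIN {
      n = split("ATOM BARCELONA BOBCAT BULLDOZER CORE2 HASWELL OPTERON PENRYN PRESCOTT SKYLAKEX STEAMROLLER ZEN", names, " ")
      for (i = 1; i <= n; i++) known[names[i]] = 1
    }
    NF == 0 { next }                 # skip blank lines
    {
      k = split($NF, parts, "_")     # suffix = text after the last "_"
      if (parts[k] in known) count[parts[k]]++
    }
    END { for (a in count) print a, count[a] }
  ' | sort
}

# Usage:
#   objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so | count_by_arch
```

Symbols such as csymm3m_RU or LAPACKE_dsytrs2 are ignored because their last component is not in the list.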
Some of the offsets differ though:
$ diff -u <(objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so )
--- /dev/fd/63 2022-02-03 00:10:17.308357982 +0100
+++ /dev/fd/62 2022-02-03 00:10:17.276357923 +0100
@@ -1,5 +1,5 @@
-/tmp/o1/lib/libopenblasp-r0.3.18.so: file format elf64-x86-64
+/tmp/o2/lib/libopenblasp-r0.3.18.so: file format elf64-x86-64
DYNAMIC SYMBOL TABLE:
0000000000000000 DF *UND* 0000000000000000 GLIBC_2.3.2 pthread_cond_signal
@@ -91,57 +91,57 @@
00000000013edb70 g DF .text 00000000000001be Base zgemm3m_incopyb_BULLDOZER
0000000000e6d200 g DF .text 0000000000002b06 Base strsm_kernel_RT_BOBCAT
0000000000512c00 g DF .text 0000000000000a0a Base zsymv_U_PRESCOTT
-00000000023c7530 g DF .text 0000000000000201 Base LAPACKE_dpttrs_work
+00000000023ae930 g DF .text 0000000000000201 Base LAPACKE_dpttrs_work
0000000000692000 g DF .text 0000000000000b89 Base srot_k_PENRYN
000000000179caa0 g DF .text 0000000000000200 Base dgemm_beta_HASWELL
0000000000a44690 g DF .text 00000000000004b4 Base dtrsm_iutucopy_OPTERON
-000000000231cfc0 g DF .text 000000000000021d Base LAPACKE_sstein_work
-0000000002327800 g DF .text 000000000000014b Base LAPACKE_ssytrd
-0000000001ad9100 g DF .text 00000000000002aa Base chemm_outcopy_SKYLAKEX
+00000000023043c0 g DF .text 000000000000021d Base LAPACKE_sstein_work
+000000000230ec00 g DF .text 000000000000014b Base LAPACKE_ssytrd
+0000000001acc900 g DF .text 00000000000002aa Base chemm_outcopy_SKYLAKEX
00000000017d6c10 g DF .text 0000000000000c38 Base cgemv_n_HASWELL
-0000000002327b70 g DF .text 0000000000000143 Base LAPACKE_ssytrf
+000000000230ef70 g DF .text 0000000000000143 Base LAPACKE_ssytrf
000000000018f010 g DF .text 000000000000025c Base cblas_stbmv
0000000000195a20 g DF .text 000000000000003b Base cblas_idamin
-0000000002328d40 g DF .text 0000000000000101 Base LAPACKE_ssytri
+0000000002310140 g DF .text 0000000000000101 Base LAPACKE_ssytri
000000000077be00 g DF .text 0000000000000e65 Base ztrsm_kernel_RN_PENRYN
0000000001583f20 g DF .text 0000000000001c22 Base dtrmm_iltucopy_STEAMROLLER
-00000000021bf830 g DF .text 0000000000000527 Base ztbcon_
-0000000001a70630 g DF .text 00000000000001c7 Base dsymm_oltcopy_SKYLAKEX
-000000000245a910 g DF .text 000000000000001b Base LAPACKE_zpp_nancheck
+00000000021a6c30 g DF .text 0000000000000527 Base ztbcon_
+0000000001a640c0 g DF .text 000000000000066d Base dsymm_oltcopy_SKYLAKEX
+0000000002441d10 g DF .text 000000000000001b Base LAPACKE_zpp_nancheck
000000000108ee20 g DF .text 000000000000014d Base zgemm3m_oncopyb_ATOM
-0000000002409df0 g DF .text 000000000000035c Base LAPACKE_zgtsvx_work
-0000000001e7d120 g DF .text 0000000000001743 Base dlatrs_
-0000000001e948a0 g DF .text 00000000000001d1 Base drscl_
+00000000023f11f0 g DF .text 000000000000035c Base LAPACKE_zgtsvx_work
+0000000001e64520 g DF .text 0000000000001743 Base dlatrs_
+0000000001e7bca0 g DF .text 00000000000001d1 Base drscl_
00000000019ac700 g DF .text 00000000000004bd Base zhemm3m_iucopyb_ZEN
00000000003c0f30 g DF .text 000000000000001e Base support_avx512_bf16
-0000000002329ac0 g DF .text 0000000000000107 Base LAPACKE_ssytrs
+0000000002310ec0 g DF .text 0000000000000107 Base LAPACKE_ssytrs
0000000000f94890 g DF .text 00000000000002d3 Base ztrmm_oltncopy_BOBCAT
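
Most of the hunks above differ only in the address column, but in this excerpt dsymm_oltcopy_SKYLAKEX also differs in size (0x1c7 in the ci.guix build, 0x66d in the bordeaux.guix one). To list every symbol whose size differs, something like the following might help. It is a sketch that assumes each symbol line ends with size, version and name, as in the output shown here; `size_diff` and the file names in the usage lines are hypothetical.

```shell
# List symbols whose size differs between two saved "objdump -T"
# outputs. Prints: NAME SIZE_IN_FIRST SIZE_IN_SECOND
# ASSUMPTION: symbol lines end with "<size> <version> <name>", so the
# size is the third field from the end.
size_diff() {
  awk '
    # Skip headers, blank lines and anything without a hex size field.
    NF < 6 || $(NF-2) !~ /^[0-9a-f]+$/ { next }
    # First file: remember the size of each symbol.
    FNR == NR { size[$NF] = $(NF-2); next }
    # Second file: report symbols whose recorded size differs.
    ($NF in size) && size[$NF] != $(NF-2) { print $NF, size[$NF], $(NF-2) }
  ' "$1" "$2"
}

# Usage (hypothetical file names):
#   objdump -T /tmp/o1/lib/libopenblasp-r0.3.18.so > o1.syms
#   objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so > o2.syms
#   size_diff o1.syms o2.syms
```

Symbols that merely moved are filtered out, which leaves the ones whose generated code actually changed between the two builds.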
On #guix-hpc Ricardo mentioned encountering this reproducibility issue
earlier.
Ludo’.