unofficial mirror of bug-guix@gnu.org 
From: "Ludovic Courtès" <ludo@gnu.org>
To: Maxim Cournoyer <maxim.cournoyer@gmail.com>
Cc: Ricardo Wurmus <rekado@elephly.net>, 51536@debbugs.gnu.org
Subject: bug#51536: openblas builds not reproducible on different x86_64 machines
Date: Thu, 03 Feb 2022 00:13:33 +0100
Message-ID: <87czk4rheq.fsf@gnu.org>
In-Reply-To: <87h7cw7ewb.fsf@gmail.com> (Maxim Cournoyer's message of "Sun, 31 Oct 2021 23:07:00 -0400")


Hi!

Maxim Cournoyer <maxim.cournoyer@gmail.com> wrote:

> Our OpenBLAS package uses DYNAMIC_ARCH=1 to provide optimizations for
> all supported targets, at least on x86 and x86_64.  In theory that seems
> OK, but in practice the builds differ depending on the host CPU.

What follows is the log of an investigation that didn’t find the root
cause, but perhaps it’ll give us ideas…

Right now the build results of ci.guix and bordeaux.guix differ:

--8<---------------cut here---------------start------------->8---
$ guix describe
Generation 202	Jan 30 2022 23:57:03	(current)
  guix 43dd34c
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: 43dd34c7777a212c99a97da7a2c237158faa9a1b
ludo@ribbon ~/src/guix$ guix challenge openblas
/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18 contents differ:
  no local build for '/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18'
  https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 0m1jlc26yrwxn8gxwpj8452kw4g84ywclh0hnab93873ifz87s5c
  https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18: 1d0m9v3kpsqzplpl1law2lfhm6rrbhkkqsvh19dlg9wx45vbbvjb
  differing file:
    /lib/libopenblasp-r0.3.18.so

1 store items were analyzed:
  - 0 (0.0%) were identical
  - 1 (100.0%) differed
  - 0 (0.0%) were inconclusive
--8<---------------cut here---------------end--------------->8---

To get an idea, I thought we could compare the two build logs:

  https://ci.guix.gnu.org/log/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18
  https://bordeaux.guix.gnu.org/build/3fab433c-e7d3-498d-86f8-4bcd5da9c4db

(Protip: I found the second one via
<http://data.guix.gnu.org/gnu/store/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18>.)

The “ar  -ru ../libopenblasp-r0.3.18.a …” commands are apparently the
same in both logs, which rules out the simple case of .o files being
archived in a non-deterministic order.
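
That comparison can be scripted.  Here is a minimal Python sketch.  It
assumes both build logs were saved locally as plain text, and the script
and file names are hypothetical.  It collects the “ar … -ru …” lines in
log order, collapses whitespace, and reports the first position where the
two sequences differ:

```python
import sys
from pathlib import Path
from typing import List, Optional


def archive_commands(log_text: str) -> List[str]:
    """Return the "ar ... -ru ..." lines of a build log, in log order,
    with runs of whitespace collapsed so that "ar  -ru" == "ar -ru"."""
    commands = []
    for line in log_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("ar ") and " -ru " in stripped:
            commands.append(" ".join(stripped.split()))
    return commands


def first_difference(cmds1: List[str], cmds2: List[str]) -> Optional[int]:
    """Index of the first differing command, or None if both lists are equal."""
    for i, (c1, c2) in enumerate(zip(cmds1, cmds2)):
        if c1 != c2:
            return i
    if len(cmds1) != len(cmds2):
        return min(len(cmds1), len(cmds2))  # one log has extra ar commands
    return None


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 compare_ar.py ci.log bordeaux.log
    cmds1, cmds2 = (archive_commands(Path(p).read_text(errors="replace"))
                    for p in sys.argv[1:])
    index = first_difference(cmds1, cmds2)
    if index is None:
        print("same ar commands, in the same order")
    else:
        print(f"first difference at ar command #{index}")
```

Comparing the commands as an ordered list, rather than as a set, is
deliberate: the order of “ar -ru” invocations determines the member order
of the resulting .a.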

The .so from ci.guix is slightly bigger, by 102,400 bytes (100 KiB):

--8<---------------cut here---------------start------------->8---
$ wget -qO - https://ci.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18| lzip -d | guix archive -x /tmp/o1
$ wget -qO - https://bordeaux.guix.gnu.org/nar/lzip/ras6dprsw3wm3swk23jjp8ww5dwxj333-openblas-0.3.18| lzip -d | guix archive -x /tmp/o2
$ ls -l /tmp/{o1,o2}/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40538768 Jan  1  1970 /tmp/o1/lib/libopenblasp-r0.3.18.so
-r-xr-xr-x 1 ludo users 40436368 Jan  1  1970 /tmp/o2/lib/libopenblasp-r0.3.18.so
--8<---------------cut here---------------end--------------->8---
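
To check locally which files differ between the two extracted nars,
independently of “guix challenge”, you could hash every entry under
/tmp/o1 and /tmp/o2.  Here is a minimal Python sketch (the script name is
hypothetical).  It is not how “guix challenge” itself works.  Regular
files are compared by SHA-256, and symlinks are compared by target
without being followed:

```python
import hashlib
import os
import sys
from pathlib import Path
from typing import Dict, List


def tree_fingerprint(root: Path) -> Dict[str, str]:
    """Map each entry under ROOT (relative path) to a SHA-256 digest for
    regular files, or to its target for symlinks (symlinks are not followed)."""
    entries = {}
    for path in sorted(root.rglob("*")):
        rel = path.relative_to(root).as_posix()
        if path.is_symlink():
            entries[rel] = "symlink -> " + os.readlink(path)
        elif path.is_file():
            entries[rel] = hashlib.sha256(path.read_bytes()).hexdigest()
    return entries


def differing_files(root1: str, root2: str) -> List[str]:
    """Relative paths whose contents differ or that exist on one side only."""
    a, b = tree_fingerprint(Path(root1)), tree_fingerprint(Path(root2))
    return sorted(name for name in a.keys() | b.keys() if a.get(name) != b.get(name))


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.: python3 difftree.py /tmp/o1 /tmp/o2
    for name in differing_files(sys.argv[1], sys.argv[2]):
        print(name)
```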

Both have the same symbols, though, and in the same order:

--8<---------------cut here---------------start------------->8---
$ diff -u <(objdump -T  /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60-  ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so |cut -c60- )
$ echo $?
0
--8<---------------cut here---------------end--------------->8---

… which suggests that both include code optimized for the same
micro-architectures, since the symbol names include the name of the
micro-architecture:

--8<---------------cut here---------------start------------->8---
$ objdump -T  /tmp/o1/lib/libopenblasp-r0.3.18.so |cut -c 60-|tail -10
  csymm3m_RU
  cgemv_c_BARCELONA
  csymv_U_HASWELL
  dtrmm_iltncopy_CORE2
  LAPACKE_dsytrs2
  openblas_num_threads_env
  csycon_rook_
  csytri_rook_


--8<---------------cut here---------------end--------------->8---
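
Because the micro-architecture appears as a name suffix, you can make the
same check more explicit by counting symbols per target in each library.
Here is a minimal Python sketch.  It assumes the raw “objdump -T” output
of each .so was saved to a text file (the file and script names are
hypothetical).  The target list is only the set of names visible in this
message, not the complete list OpenBLAS builds with DYNAMIC_ARCH=1.
Unlike “cut -c 60-”, it does not depend on fixed column positions:

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Iterable, List

# Micro-architecture names that appear in the objdump excerpts in this message.
# ASSUMPTION: an illustrative subset, not OpenBLAS's complete DYNAMIC_ARCH list.
TARGETS = {"ATOM", "BARCELONA", "BOBCAT", "BULLDOZER", "CORE2", "HASWELL",
           "OPTERON", "PENRYN", "PRESCOTT", "SKYLAKEX", "STEAMROLLER", "ZEN"}

# Symbol lines of `objdump -T` on a 64-bit ELF start with a 16-digit hex address.
ADDRESS = re.compile(r"[0-9a-f]{16}")


def symbol_names(objdump_t_output: str) -> List[str]:
    """Symbol names (last column) from raw `objdump -T` output, in file order."""
    names = []
    for line in objdump_t_output.splitlines():
        fields = line.split()
        if fields and ADDRESS.fullmatch(fields[0]):
            names.append(fields[-1])
    return names


def per_target_counts(names: Iterable[str]) -> Counter:
    """Count symbols per micro-architecture, keyed by the "_TARGET" name suffix."""
    counts = Counter()
    for name in names:
        _, sep, suffix = name.rpartition("_")
        if sep and suffix in TARGETS:
            counts[suffix] += 1
    return counts


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.: python3 targets.py o1-dynsyms.txt o2-dynsyms.txt
    c1, c2 = (per_target_counts(symbol_names(Path(p).read_text()))
              for p in sys.argv[1:])
    for target in sorted(TARGETS):
        flag = "" if c1[target] == c2[target] else "  <-- differs"
        print(f"{target:12} {c1[target]:6} {c2[target]:6}{flag}")
```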

Some of the offsets differ, though:



$ diff -u <(objdump -T  /tmp/o1/lib/libopenblasp-r0.3.18.so  ) <(objdump -T /tmp/o2/lib/libopenblasp-r0.3.18.so )
--- /dev/fd/63	2022-02-03 00:10:17.308357982 +0100
+++ /dev/fd/62	2022-02-03 00:10:17.276357923 +0100
@@ -1,5 +1,5 @@
 
-/tmp/o1/lib/libopenblasp-r0.3.18.so:     file format elf64-x86-64
+/tmp/o2/lib/libopenblasp-r0.3.18.so:     file format elf64-x86-64
 
 DYNAMIC SYMBOL TABLE:
 0000000000000000      DF *UND*	0000000000000000  GLIBC_2.3.2 pthread_cond_signal
@@ -91,57 +91,57 @@
 00000000013edb70 g    DF .text	00000000000001be  Base        zgemm3m_incopyb_BULLDOZER
 0000000000e6d200 g    DF .text	0000000000002b06  Base        strsm_kernel_RT_BOBCAT
 0000000000512c00 g    DF .text	0000000000000a0a  Base        zsymv_U_PRESCOTT
-00000000023c7530 g    DF .text	0000000000000201  Base        LAPACKE_dpttrs_work
+00000000023ae930 g    DF .text	0000000000000201  Base        LAPACKE_dpttrs_work
 0000000000692000 g    DF .text	0000000000000b89  Base        srot_k_PENRYN
 000000000179caa0 g    DF .text	0000000000000200  Base        dgemm_beta_HASWELL
 0000000000a44690 g    DF .text	00000000000004b4  Base        dtrsm_iutucopy_OPTERON
-000000000231cfc0 g    DF .text	000000000000021d  Base        LAPACKE_sstein_work
-0000000002327800 g    DF .text	000000000000014b  Base        LAPACKE_ssytrd
-0000000001ad9100 g    DF .text	00000000000002aa  Base        chemm_outcopy_SKYLAKEX
+00000000023043c0 g    DF .text	000000000000021d  Base        LAPACKE_sstein_work
+000000000230ec00 g    DF .text	000000000000014b  Base        LAPACKE_ssytrd
+0000000001acc900 g    DF .text	00000000000002aa  Base        chemm_outcopy_SKYLAKEX
 00000000017d6c10 g    DF .text	0000000000000c38  Base        cgemv_n_HASWELL
-0000000002327b70 g    DF .text	0000000000000143  Base        LAPACKE_ssytrf
+000000000230ef70 g    DF .text	0000000000000143  Base        LAPACKE_ssytrf
 000000000018f010 g    DF .text	000000000000025c  Base        cblas_stbmv
 0000000000195a20 g    DF .text	000000000000003b  Base        cblas_idamin
-0000000002328d40 g    DF .text	0000000000000101  Base        LAPACKE_ssytri
+0000000002310140 g    DF .text	0000000000000101  Base        LAPACKE_ssytri
 000000000077be00 g    DF .text	0000000000000e65  Base        ztrsm_kernel_RN_PENRYN
 0000000001583f20 g    DF .text	0000000000001c22  Base        dtrmm_iltucopy_STEAMROLLER
-00000000021bf830 g    DF .text	0000000000000527  Base        ztbcon_
-0000000001a70630 g    DF .text	00000000000001c7  Base        dsymm_oltcopy_SKYLAKEX
-000000000245a910 g    DF .text	000000000000001b  Base        LAPACKE_zpp_nancheck
+00000000021a6c30 g    DF .text	0000000000000527  Base        ztbcon_
+0000000001a640c0 g    DF .text	000000000000066d  Base        dsymm_oltcopy_SKYLAKEX
+0000000002441d10 g    DF .text	000000000000001b  Base        LAPACKE_zpp_nancheck
 000000000108ee20 g    DF .text	000000000000014d  Base        zgemm3m_oncopyb_ATOM
-0000000002409df0 g    DF .text	000000000000035c  Base        LAPACKE_zgtsvx_work
-0000000001e7d120 g    DF .text	0000000000001743  Base        dlatrs_
-0000000001e948a0 g    DF .text	00000000000001d1  Base        drscl_
+00000000023f11f0 g    DF .text	000000000000035c  Base        LAPACKE_zgtsvx_work
+0000000001e64520 g    DF .text	0000000000001743  Base        dlatrs_
+0000000001e7bca0 g    DF .text	00000000000001d1  Base        drscl_
 00000000019ac700 g    DF .text	00000000000004bd  Base        zhemm3m_iucopyb_ZEN
 00000000003c0f30 g    DF .text	000000000000001e  Base        support_avx512_bf16
-0000000002329ac0 g    DF .text	0000000000000107  Base        LAPACKE_ssytrs
+0000000002310ec0 g    DF .text	0000000000000107  Base        LAPACKE_ssytrs
 0000000000f94890 g    DF .text	00000000000002d3  Base        ztrmm_oltncopy_BOBCAT
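
To see how the offsets move, you can tabulate, for each symbol present in
both libraries, its address shift and any change in size.  Here is a
minimal Python sketch, again assuming the raw “objdump -T” output of each
.so was saved to a file (the script and file names are hypothetical).
Like the output above, it assumes a 64-bit ELF:

```python
import re
import sys
from collections import Counter
from pathlib import Path
from typing import Dict, Tuple

HEX16 = re.compile(r"[0-9a-f]{16}")  # 64-bit address and size columns


def parse_dynsyms(objdump_t_output: str) -> Dict[str, Tuple[int, int]]:
    """Map symbol name -> (address, size) from raw `objdump -T` output."""
    syms = {}
    for line in objdump_t_output.splitlines():
        fields = line.split()
        hex_fields = [f for f in fields if HEX16.fullmatch(f)]
        # Symbol lines start with the address; the size is the next 16-digit field.
        if len(hex_fields) >= 2 and fields[0] == hex_fields[0]:
            syms[fields[-1]] = (int(hex_fields[0], 16), int(hex_fields[1], 16))
    return syms


def changed_symbols(dump1: str, dump2: str) -> Dict[str, Tuple[int, int, int]]:
    """For symbols in both dumps whose address or size differs, return
    name -> (address in dump1 minus address in dump2, size1, size2)."""
    s1, s2 = parse_dynsyms(dump1), parse_dynsyms(dump2)
    changes = {}
    for name in s1.keys() & s2.keys():
        (addr1, size1), (addr2, size2) = s1[name], s2[name]
        if (addr1, size1) != (addr2, size2):
            changes[name] = (addr1 - addr2, size1, size2)
    return changes


def shift_histogram(changes: Dict[str, Tuple[int, int, int]]) -> Counter:
    """How many changed symbols moved by each distinct address shift."""
    return Counter(shift for shift, _, _ in changes.values())


if __name__ == "__main__" and len(sys.argv) == 3:
    # e.g.: python3 symshift.py o1-dynsyms.txt o2-dynsyms.txt
    changes = changed_symbols(Path(sys.argv[1]).read_text(),
                              Path(sys.argv[2]).read_text())
    for shift, count in shift_histogram(changes).most_common():
        print(f"shift {shift:#x}: {count} symbol(s)")
    for name, (shift, size1, size2) in sorted(changes.items()):
        if size1 != size2:
            print(f"{name}: size {size1:#x} vs {size2:#x}")
```

Working through the excerpt above by hand gives the following picture.
Every LAPACK/LAPACKE symbol that moved did so by the same 0x18c00 bytes.
chemm_outcopy_SKYLAKEX moved by 0xc800.  dsymm_oltcopy_SKYLAKEX moved by
0xc570 and also changed size (0x1c7 vs 0x66d).  Symbols at lower
addresses, such as the HASWELL and ZEN kernels listed, did not move.
This only covers the excerpt, not the full symbol table.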



On #guix-hpc Ricardo mentioned encountering this reproducibility issue
earlier.

Ludo’.
