* bug#50672: python-pytorch is not reproducible
@ 2021-09-19  9:57 Ludovic Courtès
  2021-09-21 15:17 ` Ludovic Courtès
From: Ludovic Courtès @ 2021-09-19  9:57 UTC (permalink / raw)
  To: 50672

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

Bad news!

--8<---------------cut here---------------start------------->8---
$ guix challenge python-pytorch
/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0 contents differ:
  no local build for '/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0'
  https://ci.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 0i55iwy3z4da4lhn93dnrmz775s9ga5kyfli6cmrchacacf9xfpq
  https://bordeaux.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 1fl2v4pd0gcw7wp5k662q0zd4lvvzsggcm5ii8b4kq4v6synhkic
  differing file:
    /lib/python3.8/site-packages/torch/lib/libtorch_cpu.so

1 store items were analyzed:
  - 0 (0.0%) were identical
  - 1 (100.0%) differed
  - 0 (0.0%) were inconclusive
$ guix describe 
Generation 189  Aug 30 2021 12:09:27    (current)
  guix f91ae94
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: f91ae9425bb385b60396a544afe27933896b8fa3
--8<---------------cut here---------------end--------------->8---

The file is 165 MiB and Diffoscope (which reads the output of ‘objdump’)
takes forever on it.

However, by comparing the output of ‘strings’ on each file, we get a
hint:


[-- Attachment #2: Type: text/x-patch, Size: 1723 bytes --]

diff -ubBr --show-c-function /tmp/str2 /tmp/str1
--- /tmp/str2	2021-09-19 11:14:47.806798779 +0200
+++ /tmp/str1	2021-09-19 11:14:41.962761127 +0200
@@ -1100584,472 +1100584,472 @@ compute_fast_convolution_input_gradient
 compute_grad_kernel_transform
 compute_fast_convolution_kernel_gradient.isra.0
 compute_fast_convolution_output
-nnp_fft8x8_with_offset_and_stream__avx2.__local0
-nnp_fft8x8_with_offset_and_stream__avx2.__local13
-nnp_fft8x8_with_offset_and_stream__avx2.__local18
-nnp_fft8x8_with_offset_and_stream__avx2.__local1
+nnp_fft8x8_with_offset_and_stream__avx2.__local5
 nnp_fft8x8_with_offset_and_stream__avx2.__local16
+nnp_fft8x8_with_offset_and_stream__avx2.__local6
+nnp_fft8x8_with_offset_and_stream__avx2.__local11
+nnp_fft8x8_with_offset_and_stream__avx2.__local0
 nnp_fft8x8_with_offset_and_stream__avx2.__local2
 nnp_fft8x8_with_offset_and_stream__avx2.__local7
-nnp_fft8x8_with_offset_and_stream__avx2.__local17
-nnp_fft8x8_with_offset_and_stream__avx2.__local10
-nnp_fft8x8_with_offset_and_stream__avx2.__local8
 nnp_fft8x8_with_offset_and_stream__avx2.__local15
+nnp_fft8x8_with_offset_and_stream__avx2.__local8
 nnp_fft8x8_with_offset_and_stream__avx2.__local3
-nnp_fft8x8_with_offset_and_stream__avx2.__local6
-nnp_fft8x8_with_offset_and_stream__avx2.__local14
-nnp_fft8x8_with_offset_and_stream__avx2.__local9
+nnp_fft8x8_with_offset_and_stream__avx2.__local1
 nnp_fft8x8_with_offset_and_stream__avx2.__local4
[…]
 nnp_shdotxf8__avx2.__local13
-nnp_shdotxf8__avx2.__local15
 nnp_shdotxf8__avx2.__local0
+nnp_shdotxf8__avx2.__local9
+nnp_shdotxf8__avx2.__local10
+nnp_shdotxf8__avx2.__local11
+nnp_shdotxf8__avx2.__local12
+nnp_shdotxf8__avx2.__local2

[-- Attachment #3: Type: text/plain, Size: 2153 bytes --]
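
A quick way to tell whether two such files contain the same strings in
a different order, or genuinely different strings, is to compare them as
multisets.  Below is a minimal Python sketch of that idea.  It only
approximates what ‘strings’ does (runs of at least four printable ASCII
bytes), and the names ‘printable_strings’ and ‘compare_strings’ are made
up for illustration:

```python
import re
import sys
from collections import Counter
from typing import List

# Runs of at least 4 printable ASCII bytes: a rough approximation of the
# default behaviour of 'strings', not a replacement for it.
PRINTABLE_RUN = re.compile(rb"[\x20-\x7e]{4,}")


def printable_strings(data: bytes) -> List[bytes]:
    """Return the printable runs found in DATA, in file order."""
    return PRINTABLE_RUN.findall(data)


def compare_strings(a: bytes, b: bytes) -> dict:
    """Compare the printable strings of two binaries.

    'same_sequence' is true when both files list the same strings in the
    same order; 'same_multiset' is true when only the order may differ.
    """
    strings_a, strings_b = printable_strings(a), printable_strings(b)
    count_a, count_b = Counter(strings_a), Counter(strings_b)
    return {
        "same_sequence": strings_a == strings_b,
        "same_multiset": count_a == count_b,
        "only_in_a": sorted((count_a - count_b).elements()),
        "only_in_b": sorted((count_b - count_a).elements()),
    }


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage (hypothetical file names): python compare.py lib1.so lib2.so
    with open(sys.argv[1], "rb") as f1, open(sys.argv[2], "rb") as f2:
        result = compare_strings(f1.read(), f2.read())
    print("same sequence:", result["same_sequence"])
    print("same multiset:", result["same_multiset"])
```

If ‘same_multiset’ is true while ‘same_sequence’ is false, the two
builds emitted the same names in a different order.  That would be
consistent with an ordering problem, though it would not prove one.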


This appears to come from NNPACK, one of the libraries that are still
bundled.  These functions seem to be generated by Python scripts that
use PeachPy, such as NNPACK/src/x86_64-fma/2d-fourier-8x8.py:

--8<---------------cut here---------------start------------->8---
for post_operation in ["stream", "store"]:
    fft8x8_arguments = (arg_t_pointer, arg_f_pointer, arg_t_stride, arg_f_stride, arg_row_count, arg_column_count, arg_row_offset, arg_column_offset)
    with Function("nnp_fft8x8_with_offset_and_{post_operation}__avx2".format(post_operation=post_operation),
        fft8x8_arguments, target=uarch.default + isa.fma3 + isa.avx2):
[…]
--8<---------------cut here---------------end--------------->8---


The ‘__local’ bit in the name comes from PeachPy, in peachpy/name.py:

--8<---------------cut here---------------start------------->8---
            suffixed_name = "__local" + str(suffix)
            for name_object in iter(unnamed_objects):
                # Generate a non-conflicting name by appending a suffix
                while suffixed_name in self.names:
                    suffix += 1
                    suffixed_name = "__local" + str(suffix)
--8<---------------cut here---------------end--------------->8---

So the problem may be that these functions get generated in parallel,
and thus the ‘__local’ numbering is non-deterministic.
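
To illustrate why the numbering is sensitive to ordering, here is a toy
Python model of the suffix scheme quoted above.  It is not PeachPy's
code: ‘assign_local_names’ and the object names are hypothetical, and
the sketch says nothing about what actually causes the order to vary.
It only shows that the same objects get different ‘__localN’ names when
they are visited in a different order:

```python
def assign_local_names(unnamed_objects, taken_names):
    """Toy model: give each unnamed object the next free '__localN' name.

    UNNAMED_OBJECTS is any iterable of hashable objects; TAKEN_NAMES lists
    names that are already in use and must be skipped.
    """
    names = set(taken_names)
    assigned = {}
    suffix = 0
    for obj in unnamed_objects:
        candidate = "__local" + str(suffix)
        # Generate a non-conflicting name by bumping the suffix.
        while candidate in names:
            suffix += 1
            candidate = "__local" + str(suffix)
        names.add(candidate)
        assigned[obj] = candidate
    return assigned


# The same three (hypothetical) objects, visited in two different orders.
names_a = assign_local_names(["twiddle", "mask", "scale"], taken_names=[])
names_b = assign_local_names(["scale", "twiddle", "mask"], taken_names=[])

for obj in ["twiddle", "mask", "scale"]:
    print(obj, names_a[obj], names_b[obj])
```

Both runs use the same set of names, but each object ends up with a
different one, which is the kind of difference the ‘strings’ diff shows.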

NNPACK/CMakeLists.txt has this bit, which generates the targets that run
PeachPy on those scripts:

--8<---------------cut here---------------start------------->8---
      ADD_CUSTOM_COMMAND(
        OUTPUT ${obj}
        COMMAND "PYTHONPATH=${PEACHPY_PYTHONPATH}"
          ${PYTHON_EXECUTABLE} -m peachpy.x86_64
            -mabi=sysv -g4 -mimage-format=${PEACHPY_IMAGE_FORMAT}
            "-I${PROJECT_SOURCE_DIR}/src" "-I${PROJECT_SOURCE_DIR}/src/x86_64-fma" "-I${FP16_SOURCE_DIR}/include"
            -o ${obj} "${PROJECT_SOURCE_DIR}/${src}"
        DEPENDS ${NNPACK_BACKEND_PEACHPY_OBJS})
--8<---------------cut here---------------end--------------->8---

It might be that building just those targets sequentially would solve
the problem.
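
One way to test that hypothesis would be to build the PeachPy objects
twice (for instance once in parallel and once sequentially) and compare
the results file by file.  The Python sketch below does the comparison
part only.  The function names are made up, and the two build
directories are whatever trees one wants to compare:

```python
import hashlib
from pathlib import Path


def file_digests(root):
    """Map each file under ROOT (by relative path) to its SHA-256 digest."""
    root = Path(root)
    return {
        str(path.relative_to(root)): hashlib.sha256(path.read_bytes()).hexdigest()
        for path in sorted(root.rglob("*"))
        if path.is_file()
    }


def differing_files(build_a, build_b):
    """Return the relative paths whose contents differ between two trees.

    A file that exists in only one of the trees is reported as differing.
    """
    digests_a = file_digests(build_a)
    digests_b = file_digests(build_b)
    all_names = digests_a.keys() | digests_b.keys()
    return sorted(
        name for name in all_names
        if digests_a.get(name) != digests_b.get(name)
    )
```

An empty list from ‘differing_files’ on two independent builds would
suggest the objects are bit-for-bit identical.  A non-empty list narrows
the search down to the files worth inspecting.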

To be continued…

Ludo’.

Thread overview: 8+ messages
2021-09-19  9:57 bug#50672: python-pytorch is not reproducible Ludovic Courtès
2021-09-21 15:17 ` Ludovic Courtès
2021-09-24 14:04   ` Ludovic Courtès
2021-09-27 13:25     ` zimoun
2021-09-28  9:24       ` Ludovic Courtès
2021-10-22 15:31         ` bug#50672: nnpack " Ludovic Courtès
2021-10-23  3:32           ` Kyle Meyer
2021-10-25 12:54             ` Ludovic Courtès
