unofficial mirror of bug-guix@gnu.org 
* bug#50672: python-pytorch is not reproducible
From: Ludovic Courtès @ 2021-09-19  9:57 UTC
  To: 50672

[-- Attachment #1: Type: text/plain, Size: 1250 bytes --]

Bad news!

--8<---------------cut here---------------start------------->8---
$ guix challenge python-pytorch
/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0 contents differ:
  no local build for '/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0'
  https://ci.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 0i55iwy3z4da4lhn93dnrmz775s9ga5kyfli6cmrchacacf9xfpq
  https://bordeaux.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 1fl2v4pd0gcw7wp5k662q0zd4lvvzsggcm5ii8b4kq4v6synhkic
  differing file:
    /lib/python3.8/site-packages/torch/lib/libtorch_cpu.so

1 store items were analyzed:
  - 0 (0.0%) were identical
  - 1 (100.0%) differed
  - 0 (0.0%) were inconclusive
$ guix describe 
Generation 189   Aug 30 2021 12:09:27    (current)
  guix f91ae94
    repository URL: https://git.savannah.gnu.org/git/guix.git
    branch: master
    commit: f91ae9425bb385b60396a544afe27933896b8fa3
--8<---------------cut here---------------end--------------->8---
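
A minimal sketch of the same kind of check on two locally unpacked copies of a store item: it walks both trees and lists the files whose contents differ. It assumes both copies are available as plain directories; `differing_files` and its helpers are hypothetical names, and this is not how `guix challenge` is implemented (it compares the nar hashes published by the substitute servers).

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as stream:
        for chunk in iter(lambda: stream.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def regular_files(root):
    """Return the set of relative paths of all regular files under root."""
    root = Path(root)
    return {p.relative_to(root).as_posix() for p in root.rglob("*") if p.is_file()}


def differing_files(tree_a, tree_b):
    """List files that differ between the two trees or exist in only one."""
    files_a, files_b = regular_files(tree_a), regular_files(tree_b)
    differing = files_a ^ files_b  # present on one side only
    for rel in files_a & files_b:
        if sha256_of(Path(tree_a, rel)) != sha256_of(Path(tree_b, rel)):
            differing.add(rel)
    # Sort so that the report itself is deterministic.
    return sorted(differing)
```

Calling `differing_files(copy_from_ci, copy_from_bordeaux)` on two such directories would return relative paths such as the `libtorch_cpu.so` entry reported above.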

The file is 165 MiB and Diffoscope (which reads the output of ‘objdump’)
takes forever on it.

However, by comparing the output of ‘strings’ on each file, we get a
hint:


[-- Attachment #2: Type: text/x-patch, Size: 1723 bytes --]

diff -ubBr --show-c-function /tmp/str2 /tmp/str1
--- /tmp/str2	2021-09-19 11:14:47.806798779 +0200
+++ /tmp/str1	2021-09-19 11:14:41.962761127 +0200
@@ -1100584,472 +1100584,472 @@ compute_fast_convolution_input_gradient
 compute_grad_kernel_transform
 compute_fast_convolution_kernel_gradient.isra.0
 compute_fast_convolution_output
-nnp_fft8x8_with_offset_and_stream__avx2.__local0
-nnp_fft8x8_with_offset_and_stream__avx2.__local13
-nnp_fft8x8_with_offset_and_stream__avx2.__local18
-nnp_fft8x8_with_offset_and_stream__avx2.__local1
+nnp_fft8x8_with_offset_and_stream__avx2.__local5
 nnp_fft8x8_with_offset_and_stream__avx2.__local16
+nnp_fft8x8_with_offset_and_stream__avx2.__local6
+nnp_fft8x8_with_offset_and_stream__avx2.__local11
+nnp_fft8x8_with_offset_and_stream__avx2.__local0
 nnp_fft8x8_with_offset_and_stream__avx2.__local2
 nnp_fft8x8_with_offset_and_stream__avx2.__local7
-nnp_fft8x8_with_offset_and_stream__avx2.__local17
-nnp_fft8x8_with_offset_and_stream__avx2.__local10
-nnp_fft8x8_with_offset_and_stream__avx2.__local8
 nnp_fft8x8_with_offset_and_stream__avx2.__local15
+nnp_fft8x8_with_offset_and_stream__avx2.__local8
 nnp_fft8x8_with_offset_and_stream__avx2.__local3
-nnp_fft8x8_with_offset_and_stream__avx2.__local6
-nnp_fft8x8_with_offset_and_stream__avx2.__local14
-nnp_fft8x8_with_offset_and_stream__avx2.__local9
+nnp_fft8x8_with_offset_and_stream__avx2.__local1
 nnp_fft8x8_with_offset_and_stream__avx2.__local4
[…]
 nnp_shdotxf8__avx2.__local13
-nnp_shdotxf8__avx2.__local15
 nnp_shdotxf8__avx2.__local0
+nnp_shdotxf8__avx2.__local9
+nnp_shdotxf8__avx2.__local10
+nnp_shdotxf8__avx2.__local11
+nnp_shdotxf8__avx2.__local12
+nnp_shdotxf8__avx2.__local2

[-- Attachment #3: Type: text/plain, Size: 2153 bytes --]
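
The comparison above can be approximated without external tools. The following is a rough sketch, assuming both files fit in memory: `printable_strings` only approximates what ‘strings’ prints (runs of at least four printable ASCII characters), and all function names are hypothetical.

```python
import difflib
import re


def printable_strings(data, min_len=4):
    """Return runs of at least min_len printable ASCII characters, in file order."""
    pattern = re.compile(rb"[\x20-\x7e]{%d,}" % min_len)
    return [match.decode("ascii") for match in pattern.findall(data)]


def diff_strings(data_a, data_b):
    """Unified diff of the printable strings found in two binary blobs."""
    return list(difflib.unified_diff(printable_strings(data_a),
                                     printable_strings(data_b),
                                     "a", "b", lineterm=""))


def same_strings_reordered(data_a, data_b):
    """True if both blobs contain the same strings but in a different order."""
    strings_a = printable_strings(data_a)
    strings_b = printable_strings(data_b)
    return strings_a != strings_b and sorted(strings_a) == sorted(strings_b)
```

When `same_strings_reordered` returns true, the difference is one of ordering only, which is what the diff above suggests for the `__local` symbols. On a file with over a million strings, `difflib` may be slow; the sorted comparison is the cheaper first check.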


This appears to come from NNPACK, one of the libraries that are still
bundled.  These functions seem to be generated by Python scripts that
use PeachPy, such as NNPACK/src/x86_64-fma/2d-fourier-8x8.py:

--8<---------------cut here---------------start------------->8---
for post_operation in ["stream", "store"]:
    fft8x8_arguments = (arg_t_pointer, arg_f_pointer, arg_t_stride, arg_f_stride, arg_row_count, arg_column_count, arg_row_offset, arg_column_offset)
    with Function("nnp_fft8x8_with_offset_and_{post_operation}__avx2".format(post_operation=post_operation),
        fft8x8_arguments, target=uarch.default + isa.fma3 + isa.avx2):
[…]
--8<---------------cut here---------------end--------------->8---


The ‘__local’ bit in the name comes from PeachPy, in peachpy/name.py:

--8<---------------cut here---------------start------------->8---
            suffixed_name = "__local" + str(suffix)
            for name_object in iter(unnamed_objects):
                # Generate a non-conflicting name by appending a suffix
                while suffixed_name in self.names:
                    suffix += 1
                    suffixed_name = "__local" + str(suffix)
--8<---------------cut here---------------end--------------->8---
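
To see why the numbering is sensitive to ordering, here is a simplified model of the loop above. It is a sketch, not PeachPy's actual code: `assign_local_names` and the `loop_*` labels are made up for illustration, and the only assumption is that each unnamed object receives the first free `__localN` name in iteration order.

```python
def assign_local_names(unnamed_objects, taken_names=()):
    """Give each object the first free "__localN" name, in iteration order."""
    names = set(taken_names)
    assigned = {}
    suffix = 0
    for obj in unnamed_objects:
        candidate = "__local" + str(suffix)
        # Skip over names that are already in use.
        while candidate in names:
            suffix += 1
            candidate = "__local" + str(suffix)
        names.add(candidate)
        assigned[obj] = candidate
    return assigned


forward = assign_local_names(["loop_a", "loop_b", "loop_c"])
backward = assign_local_names(["loop_c", "loop_b", "loop_a"])
# Same objects, different iteration order, different names:
# forward["loop_a"] is "__local0", backward["loop_a"] is "__local2".
```

Any source of non-deterministic iteration order over the unnamed objects is therefore enough to change which symbol gets which suffix.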

So the problem may be that these functions get generated in parallel,
which would make the numbering non-deterministic.

NNPACK/CMakeLists.txt has this bit to generate targets to build all
that:

--8<---------------cut here---------------start------------->8---
      ADD_CUSTOM_COMMAND(
        OUTPUT ${obj}
        COMMAND "PYTHONPATH=${PEACHPY_PYTHONPATH}"
          ${PYTHON_EXECUTABLE} -m peachpy.x86_64
            -mabi=sysv -g4 -mimage-format=${PEACHPY_IMAGE_FORMAT}
            "-I${PROJECT_SOURCE_DIR}/src" "-I${PROJECT_SOURCE_DIR}/src/x86_64-fma" "-I${FP16_SOURCE_DIR}/include"
            -o ${obj} "${PROJECT_SOURCE_DIR}/${src}"
        DEPENDS ${NNPACK_BACKEND_PEACHPY_OBJS})
--8<---------------cut here---------------end--------------->8---

It might be that building just those targets sequentially would solve
the problem.

To be continued…

Ludo’.


* bug#50672: python-pytorch is not reproducible
From: Ludovic Courtès @ 2021-09-21 15:17 UTC
  To: 50672

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> $ guix challenge python-pytorch
> /gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0 contents differ:
>   no local build for '/gnu/store/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0'
>   https://ci.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 0i55iwy3z4da4lhn93dnrmz775s9ga5kyfli6cmrchacacf9xfpq
>   https://bordeaux.guix.gnu.org/nar/lzip/dgdswx4vvf07xmhih21n4fnr68dh3fhd-python-pytorch-1.9.0: 1fl2v4pd0gcw7wp5k662q0zd4lvvzsggcm5ii8b4kq4v6synhkic
>   differing file:
>     /lib/python3.8/site-packages/torch/lib/libtorch_cpu.so
>
> 1 store items were analyzed:
>   - 0 (0.0%) were identical
>   - 1 (100.0%) differed
>   - 0 (0.0%) were inconclusive
> $ guix describe 
> Generation 189   Aug 30 2021 12:09:27    (current)
>   guix f91ae94
>     repository URL: https://git.savannah.gnu.org/git/guix.git
>     branch: master
>     commit: f91ae9425bb385b60396a544afe27933896b8fa3

Reported upstream: <https://github.com/pytorch/pytorch/issues/65404>.

Ludo’.





* bug#50672: python-pytorch is not reproducible
From: Ludovic Courtès @ 2021-09-24 14:04 UTC
  To: 50672

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> Reported upstream: <https://github.com/pytorch/pytorch/issues/65404>.

PyTorch upstream noted that the problem is in NNPACK, not PyTorch
proper.

Having unbundled NNPACK in d326dec8115cf5e2cac9497633dc11ecc970361b, I
can confirm that PyTorch itself is now reproducible, but NNPACK isn’t.

Reported at <https://github.com/Maratyszcza/NNPACK/issues/206>.

Ludo’.





* bug#50672: python-pytorch is not reproducible
From: zimoun @ 2021-09-27 13:25 UTC
  To: Ludovic Courtès; +Cc: 50672

Hi,

On Fri, 24 Sept 2021 at 16:11, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> Having unbundled NNPACK in d326dec8115cf5e2cac9497633dc11ecc970361b, I
> can confirm that PyTorch itself is now reproducible, but NNPACK isn’t.

I can reproduce this: "guix build nnpack --no-grafts --check" reports a
difference, whereas PyTorch does not.

> PyTorch upstream noted that the problem is in NNPACK, not PyTorch
> proper.

Closing this report?

However, I notice 2 things:

 1- Unbundled dependencies are still fetched
 2- Does the Git submodule mechanism work with the SWH fallback?

--8<---------------cut here---------------start------------->8---
Initialized empty Git repository in
/gnu/store/…-python-pytorch-1.9.0-checkout/.git/
From https://github.com/pytorch/pytorch
 * tag               v1.9.0     -> FETCH_HEAD

[...]

HEAD is now at d69c22d [docs] Add torch.package documentation for beta
release (#59886)
/gnu/store/…-bash-minimal-5.0.16/bin/sh: warning: setlocale: LC_ALL:
cannot change locale (en_US.utf8)
Submodule 'android/libs/fbjni'
(https://github.com/facebookincubator/fbjni.git) registered for path
'android/libs/fbjni'
Submodule 'third_party/NNPACK_deps/FP16'
(https://github.com/Maratyszcza/FP16.git) registered for path
'third_party/FP16'
Submodule 'third_party/NNPACK_deps/FXdiv'
(https://github.com/Maratyszcza/FXdiv.git) registered for path
'third_party/FXdiv'
Submodule 'third_party/NNPACK'
(https://github.com/Maratyszcza/NNPACK.git) registered for path
'third_party/NNPACK'
Submodule 'third_party/QNNPACK' (https://github.com/pytorch/QNNPACK)
registered for path 'third_party/QNNPACK'
Submodule 'third_party/XNNPACK'
(https://github.com/google/XNNPACK.git) registered for path
'third_party/XNNPACK'

[...]

Submodule 'third_party/NNPACK_deps/psimd'
(https://github.com/Maratyszcza/psimd.git) registered for path
'third_party/psimd'
Submodule 'third_party/NNPACK_deps/pthreadpool'
(https://github.com/Maratyszcza/pthreadpool.git) registered for path
'third_party/pthreadpool'

[...]

Cloning into '/gnu/store/…-python-pytorch-1.9.0-checkout/third_party/NNPACK'...
Cloning into '/gnu/store/…-python-pytorch-1.9.0-checkout/third_party/QNNPACK'...
Cloning into '/gnu/store/…-python-pytorch-1.9.0-checkout/third_party/XNNPACK'...

[...]

Submodule path 'third_party/NNPACK': checked out
'c07e3a0400713d546e0dea2d5466dd22ea389c73'
Submodule path 'third_party/QNNPACK': checked out
'7d2a4e9931a82adc3814275b6219a03e24e36b4c'
Submodule path 'third_party/XNNPACK': checked out
'55d53a4e7079d38e90acd75dd9e4f9e781d2da35'

[...]
--8<---------------cut here---------------end--------------->8---


Cheers,
simon





* bug#50672: python-pytorch is not reproducible
From: Ludovic Courtès @ 2021-09-28  9:24 UTC
  To: zimoun; +Cc: 50672

Hi,

zimoun <zimon.toutoune@gmail.com> writes:

> On Fri, 24 Sept 2021 at 16:11, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
>
>> Having unbundled NNPACK in d326dec8115cf5e2cac9497633dc11ecc970361b, I
>> can confirm that PyTorch itself is now reproducible, but NNPACK isn’t.
>
> I can reproduce this: "guix build nnpack --no-grafts --check" reports a
> difference, whereas PyTorch does not.
>
>> PyTorch upstream noted that the problem is in NNPACK, not PyTorch
>> proper.
>
> Closing this report?

No, I’ve retitled it.  Now looking at PeachPy:

  https://github.com/Maratyszcza/PeachPy/issues/88

> However, I notice 2 things:
>
>  1- Unbundled dependencies are still fetched

Yes but the snippet wipes them right after.

>  2- Does the Git submodule mechanism work with the SWH fallback?

No, not yet; there’s a comment in (guix git-download).  Fixing it should
be doable.

Thanks,
Ludo’.





* bug#50672: nnpack is not reproducible
From: Ludovic Courtès @ 2021-10-22 15:31 UTC
  To: zimoun; +Cc: 50672

[-- Attachment #1: Type: text/plain, Size: 417 bytes --]

Hi,

Ludovic Courtès <ludovic.courtes@inria.fr> writes:

> No, I’ve retitled it.  Now looking at PeachPy:
>
>   https://github.com/Maratyszcza/PeachPy/issues/88

For the record, I tried the attached patch in an attempt to sort things
as discussed in the issue above, but it doesn’t have the intended
effect.  There must be other unsorted dictionaries elsewhere.

Suggestions welcome!

Ludo’.


[-- Attachment #2: the patch --]
[-- Type: text/x-patch, Size: 755 bytes --]

Make PeachPy processes deterministic:

  https://github.com/Maratyszcza/PeachPy/issues/88
  https://issues.guix.gnu.org/50672

diff --git a/peachpy/name.py b/peachpy/name.py
index b6a03dc..c069fc2 100644
--- a/peachpy/name.py
+++ b/peachpy/name.py
@@ -95,6 +95,10 @@ class Namespace:
             self.prenames[scope_name.prename].add(scope)
 
     def assign_names(self):
+        # Step 0: sort the dictionary for deterministic output
+        self.prenames = dict(sorted(self.prenames.items(),
+                                    key=lambda item: "" if item[0] == None else item[0]))
+
         # Step 1: assign names to symbols with prenames with no conflicts
         for prename in six.iterkeys(self.prenames):
             if prename is not None:


* bug#50672: nnpack is not reproducible
From: Kyle Meyer @ 2021-10-23  3:32 UTC
  To: Ludovic Courtès; +Cc: 50672

Ludovic Courtès writes:

> For the record, I tried the attached patch in an attempt to sort things
> as discussed in the issue above, but it doesn’t have the intended
> effect.  There must be other unsorted dictionaries elsewhere.

Hmm, I don't think dictionaries are a likely culprit here because
Python's dict implementation preserves the insertion order as of Python
v3.6 (and that behavior is declared as part of the language spec with
v3.7).

> diff --git a/peachpy/name.py b/peachpy/name.py
> index b6a03dc..c069fc2 100644
> --- a/peachpy/name.py
> +++ b/peachpy/name.py
> @@ -95,6 +95,10 @@ class Namespace:
>              self.prenames[scope_name.prename].add(scope)
>  
>      def assign_names(self):
> +        # Step 0: sort the dictionary for deterministic output
> +        self.prenames = dict(sorted(self.prenames.items(),
> +                                    key=lambda item: "" if item[0] == None else item[0]))
> +

In cases where the order of the keys isn't specified (i.e. Python 3.5
and below), I think the end result after your change is the same: it
creates a new dictionary from sorted _input_, but iterating over that
dictionary won't necessarily yield the keys in sorted order.

I'm not familiar with PeachPy, but taking a peek at name.py, the sets
used for the values of the prenames dictionary could be the problem.
And if that's the case, one solution would be switching those values
from sets to dictionaries.
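
A small sketch of the idea, assuming the set members are ordinary objects with the default hash; the `Scope` class and `add_in_order` helper are hypothetical stand-ins, not PeachPy code. In CPython the default hash of an object is derived from its address, and string hashes are randomized per process unless PYTHONHASHSEED is set, so the iteration order of a set can change from one run to the next. A dictionary used as an ordered set iterates in insertion order instead.

```python
class Scope:
    """Hypothetical stand-in for the objects kept in the prenames values."""

    def __init__(self, label):
        self.label = label


def add_in_order(ordered, item):
    """Add item to a dict used as an insertion-ordered set."""
    ordered[item] = None


scopes = [Scope("loop%d" % index) for index in range(8)]

# Iteration order follows the objects' hashes, which may differ between runs.
unordered = set(scopes)

# Iteration order is the insertion order (guaranteed since Python 3.7).
ordered = {}
for scope in scopes:
    add_in_order(ordered, scope)
add_in_order(ordered, scopes[0])  # re-adding keeps the original position

labels = [scope.label for scope in ordered]
```

Membership tests and de-duplication behave as with a set; only the iteration order becomes predictable.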

With the change below (on top of PeachPy's 257881e), nnpack builds
reliably for me across a couple of attempts:

  $ guix-dev build --with-git-url=python-peachpy=$local --no-grafts --check nnpack
  successfully built /gnu/store/7z4nl55gssrf9na7wsvmw1dsqgawnj2p-nnpack-0.0-1.c07e3a0.drv
  successfully built /gnu/store/7z4nl55gssrf9na7wsvmw1dsqgawnj2p-nnpack-0.0-1.c07e3a0.drv
  /gnu/store/4ihjil42fbk53q73gpvdakynbv9q5q09-nnpack-0.0-1.c07e3a0

diff --git a/peachpy/name.py b/peachpy/name.py
index b6a03dc..412079d 100644
--- a/peachpy/name.py
+++ b/peachpy/name.py
@@ -86,13 +86,13 @@ def add_scoped_name(self, scoped_name):
                 self.names[scope_name.name] = scope
         else:
             assert scope_name.name is None
-            self.prenames.setdefault(scope_name.prename, set())
+            self.prenames.setdefault(scope_name.prename, {})
             if subscoped_name:
                 for subscope in iter(self.prenames[scope_name.prename]):
                     if isinstance(subscope, Namespace) and subscope.scope_name is scope_name:
                         subscope.add_scoped_name(subscoped_name)
                         return
-            self.prenames[scope_name.prename].add(scope)
+            self.prenames[scope_name.prename][scope] = None
 
     def assign_names(self):
         # Step 1: assign names to symbols with prenames with no conflicts





* bug#50672: nnpack is not reproducible
From: Ludovic Courtès @ 2021-10-25 12:54 UTC
  To: Kyle Meyer; +Cc: 50672-done

Hi!

Kyle Meyer <kyle@kyleam.com> writes:

> Ludovic Courtès writes:
>
>> For the record, I tried the attached patch in an attempt to sort things
>> as discussed in the issue above, but it doesn’t have the intended
>> effect.  There must be other unsorted dictionaries elsewhere.
>
> Hmm, I don't think dictionaries are a likely culprit here because
> Python's dict implementation preserves the insertion order as of Python
> v3.6 (and that behavior is declared as part of the language spec with
> v3.7).

Ah, silly me.

> In cases where the order of the keys isn't specified (i.e. Python 3.5
> and below), I think the end result after your change is the same: it
> creates a new dictionary from sorted _input_, but iterating over that
> dictionary won't necessarily yield the keys in sorted order.

Noted, thanks for explaining.

> I'm not familiar with PeachPy, but taking a peek at name.py, the sets
> used for the values of the prenames dictionary could be the problem.
> And if that's the case, one solution would be switching those values
> from sets to dictionaries.
>
> With the change below (on top of PeachPy's 257881e), nnpack builds
> reliably for me across a couple of attempts:
>
>   $ guix-dev build --with-git-url=python-peachpy=$local --no-grafts --check nnpack
>   successfully built /gnu/store/7z4nl55gssrf9na7wsvmw1dsqgawnj2p-nnpack-0.0-1.c07e3a0.drv
>   successfully built /gnu/store/7z4nl55gssrf9na7wsvmw1dsqgawnj2p-nnpack-0.0-1.c07e3a0.drv
>   /gnu/store/4ihjil42fbk53q73gpvdakynbv9q5q09-nnpack-0.0-1.c07e3a0

Your patch does the trick, indeed.  I went ahead and pushed it as
b87fe805aa66851f17f56078cb0e94f7cc4525df.

Thank you!

Ludo’.




