* bug#22533: Non-determinism in python-3 ".pyc" bytecode
From: Leo Famulari @ 2016-02-02 5:15 UTC
To: 22533

While preparing a package for borg [0], I found that the built output was not reproducible. The problem is that the bytecode compiler [1] for Python 3.4.3 (our current version) encodes the mtime of the corresponding Python source file in the output. This is described in PEP 3147 [2], and the responsible Python code is referenced below [3].

I tested a few of our existing python-3 packages: python-ccm, python-pysam, and python-scripttest all exhibit the same problem.

We fixed this in python-2 with the patch python-2.7-source-date-epoch.patch, but I don't know how to write this patch for python-3. Can somebody write this patch?

I asked about this on #debian-reproducible and they said that it wasn't an issue for Debian since they don't ship bytecode, but instead generate it at install time. Of course, that doesn't really apply to Guix.

I used diffoscope-34 to inspect the build outputs to find this, and you can see the report here:
https://famulari.name/misc/7c55c9e97f668234ddea50299d986f14/borg-diffoscope-report.html

It's first demonstrated in the file ...-borg-0.30.0/lib/python3.4/site-packages/__pycache__/site.cpython-34.pyc.

The first 4 bytes are the "magic number" described in PEP 3147, which specifies the version of the bytecode format. The next 4 bytes are the problematic timestamp, as described in PEP 3147.

[0] http://borgbackup.github.io/
[1] https://docs.python.org/3/library/py_compile.html
[2] https://www.python.org/dev/peps/pep-3147/
[3] Check out the Guix git commit 4efc8eb27502c, and from there:
    $ tar xf $(./pre-inst-env guix build --source python-3)
    $ sed -n 139,140p Python-3.4.3/Lib/py_compile.py
            bytecode = importlib._bootstrap._code_to_bytecode(
                code, source_stats['mtime'], source_stats['size'])
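To make the header layout concrete, here is a minimal sketch, assuming the 12-byte header written by Python 3.3 through 3.6: a 4-byte magic number followed by the source mtime and the source size as 32-bit little-endian integers. `read_pyc_header` is a hypothetical helper, not part of the standard library.

```python
import struct


def read_pyc_header(data):
    """Split a Python 3.3-3.6 .pyc header into (magic, mtime, size).

    Assumes the pre-PEP 552 layout: 4-byte magic number, then the source
    mtime and source size as 32-bit little-endian unsigned integers.
    """
    if len(data) < 12:
        raise ValueError("truncated .pyc header: need 12 bytes")
    magic = bytes(data[0:4])
    mtime, size = struct.unpack("<II", data[4:12])
    return magic, mtime, size


# Usage on a real file (path shortened for illustration):
# with open("__pycache__/site.cpython-34.pyc", "rb") as f:
#     magic, mtime, size = read_pyc_header(f.read(12))
```

Two builds of identical sources whose files carry different mtimes would yield headers that differ only in the `mtime` field, which is the kind of difference diffoscope flags.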
* bug#22533: Non-determinism in python-3 ".pyc" bytecode
From: Leo Famulari @ 2016-02-02 8:54 UTC
To: 22533

On Tue, Feb 02, 2016 at 12:15:44AM -0500, Leo Famulari wrote:
> While preparing a package for borg [0], I found that the built output
> was not reproducible. The problem is that the bytecode compiler [1] for
> Python 3.4.3 (our current version) encodes the mtime of the
> corresponding Python source file in the output. This is described in
> PEP-3147 [2], and the responsible Python code is referenced below [3].
>
> I tested a few of our existing python-3 packages: python-ccm,
> python-pysam, and python-scripttest all exhibit the same problem.
>
> We fixed this in python-2 with the patch
> python-2.7-source-date-epoch.patch, but I don't know how to write this
> patch for python-3.

mark_weaver suggested setting the timestamps of the source files before building. I think this is a better option if it doesn't break anything. It would allow the bytecode "staleness" check to work as expected while keeping the output consistent.
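A minimal sketch of the approach mark_weaver suggests: before byte-compiling, set every source file's mtime to one fixed value (here assumed to come from SOURCE_DATE_EPOCH), so the timestamp copied into each .pyc header is constant, while the loader's staleness check, which compares that header field with the source file's mtime, still sees matching values. The helper name `reset_source_mtimes` and the restriction to `*.py` files are assumptions for illustration.

```python
import os


def reset_source_mtimes(root, epoch):
    """Set atime and mtime of every *.py file under `root` to `epoch`.

    Returns the number of files touched.
    """
    count = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if name.endswith(".py"):
                os.utime(os.path.join(dirpath, name), (epoch, epoch))
                count += 1
    return count


# Hypothetical build step: normalize timestamps, then byte-compile.
# epoch = int(os.environ.get("SOURCE_DATE_EPOCH", "1"))
# reset_source_mtimes("build/lib", epoch)
```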
* bug#22533: Non-determinism in python-3 ".pyc" bytecode
From: Ludovic Courtès @ 2016-02-02 20:41 UTC
To: Leo Famulari; +Cc: 22533

Leo Famulari <leo@famulari.name> skribis:

> We fixed this in python-2 with the patch
> python-2.7-source-date-epoch.patch, but I don't know how to write this
> patch for python-3.

I would imagine something like this (untested):

--- Python-3.4.3/Lib/importlib/_bootstrap.py	2016-02-02 21:38:48.655809055 +0100
+++ Python-3.4.3/Lib/importlib/_bootstrap.py.new	2016-02-02 21:38:43.659769251 +0100
@@ -667,7 +667,10 @@ def _code_to_bytecode(code, mtime=0, sou
     """Compile a code object into bytecode for writing out to a byte-compiled
     file."""
     data = bytearray(MAGIC_NUMBER)
-    data.extend(_w_long(mtime))
+    if 'SOURCE_DATE_EPOCH' in _os.environ:
+        data.extend(_w_long(string.atoi(_os.environ['SOURCE_DATE_EPOCH'])))
+    else:
+        data.extend(_w_long(mtime))
     data.extend(_w_long(source_size))
     data.extend(marshal.dumps(code))
     return data

Could you give it a try and refine as needed? :-)

> I asked about this on #debian-reproducible and they said that it wasn't
> an issue for Debian since they don't ship bytecode, but instead generate
> it at install time. Of course, that doesn't really apply to Guix.

I’d recommend trying #reproducible-builds on OFTC, which is more generic.
Also, in some cases, it’s useful to look at <git://git.debian.org/git/reproducible/notes.git>, which contains notes about non-reproducible packages (currently partly Debian-specific, but we need to lobby to make it more generic. ;-))

Thanks,
Ludo’.
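For comparison, here is a standalone sketch of the header logic in the patch above, written for Python 3: `string.atoi` does not exist in Python 3, so `int()` parses the variable instead. `w_long` mirrors what importlib's private `_w_long` helper does (a 32-bit little-endian encoding). Both function names are hypothetical, and the output uses the 3.4-era layout without the PEP 552 flags field, so it illustrates the idea rather than producing .pyc files a current interpreter would load.

```python
import marshal
import os


def w_long(x):
    # 32-bit little-endian encoding, as used for .pyc header fields.
    return (int(x) & 0xFFFFFFFF).to_bytes(4, "little")


def code_to_bytecode(code, magic, mtime=0, source_size=0):
    """Build magic | timestamp | size | marshalled code.

    When SOURCE_DATE_EPOCH is set and non-empty, it replaces the source mtime.
    """
    data = bytearray(magic)
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    data.extend(w_long(int(epoch) if epoch else mtime))
    data.extend(w_long(source_size))
    data.extend(marshal.dumps(code))
    return data
```

With `SOURCE_DATE_EPOCH=1` in the environment, bytes 4 to 7 of the result are `b'\x01\x00\x00\x00'` regardless of the `mtime` argument.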
* bug#22533: Non-determinism in python-3 ".pyc" bytecode
From: Leo Famulari @ 2016-02-04 23:17 UTC
To: Ludovic Courtès; +Cc: 22533

On Tue, Feb 02, 2016 at 09:41:19PM +0100, Ludovic Courtès wrote:
> Could you give it a try and refine as needed? :-)

I altered your example as shown in the attached patch. It causes some tests related to timestamps to fail, so I disabled them in a very crude way. The final patch should address those tests more carefully.

But the patch doesn't seem to have the desired effect, so I'm asking for help!

Here is how I tested the patch: I built python-3 with it, then ran `export SOURCE_DATE_EPOCH=1` and entered the resulting Python shell. I manually defined the '_w_long' function used by the patched function. Then:

    print (_w_long(locale.atoi(os.getenv('SOURCE_DATE_EPOCH'))))
    b'\x01\x00\x00\x00'

But when I leave the Python shell and issue `python3 -m compileall helloworld.py`, the timestamps are present in the compiled bytecode. I can watch the clock "tick" by doing this repeatedly:

    $ touch helloworld.py && rm -r __pycache__ && \
      python3 -m compileall helloworld.py && \
      hexdump __pycache__/helloworld.cpython-34.pyc | head -n1

I'm not much of a Python programmer, so I'm stumped.
From d34a71e4ec4501cb53acd3e15633bc1a05665be9 Mon Sep 17 00:00:00 2001
Message-Id: <d34a71e4ec4501cb53acd3e15633bc1a05665be9.1454625404.git.leo@famulari.name>
From: Leo Famulari <leo@famulari.name>
Date: Wed, 3 Feb 2016 20:44:02 -0500
Subject: [PATCH 1/1] SOURCE_DATE_EPOCH

---
 .../patches/python-3.4.3-source-date-epoch.patch   | 21 +++++++++++++++++++++
 gnu/packages/python.scm                            | 14 +++++++++++++-
 2 files changed, 34 insertions(+), 1 deletion(-)
 create mode 100644 gnu/packages/patches/python-3.4.3-source-date-epoch.patch

diff --git a/gnu/packages/patches/python-3.4.3-source-date-epoch.patch b/gnu/packages/patches/python-3.4.3-source-date-epoch.patch
new file mode 100644
index 0000000..403b2df
--- /dev/null
+++ b/gnu/packages/patches/python-3.4.3-source-date-epoch.patch
@@ -0,0 +1,21 @@
+diff --git a/Lib/importlib/_bootstrap.py b/Lib/importlib/_bootstrap.py
+index 5b91c05..a87d178 100644
+--- Lib/importlib/_bootstrap.py
++++ Lib/importlib/_bootstrap.py
+@@ -666,8 +666,15 @@ def _compile_bytecode(data, name=None, bytecode_path=None, source_path=None):
+ def _code_to_bytecode(code, mtime=0, source_size=0):
+     """Compile a code object into bytecode for writing out to a byte-compiled
+     file."""
++    """os and locale are required for the SOURCE_DATE_EPOCH
++    deterministic timestamp conditional."""
++    import os
++    import locale
+     data = bytearray(MAGIC_NUMBER)
+-    data.extend(_w_long(mtime))
++    if os.getenv('SOURCE_DATE_EPOCH'):
++        data.extend(_w_long(locale.atoi(os.getenv('SOURCE_DATE_EPOCH'))))
++    else:
++        data.extend(_w_long(mtime))
+     data.extend(_w_long(source_size))
+     data.extend(marshal.dumps(code))
+     return data
diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
index 48f65b5..cd366f5 100644
--- a/gnu/packages/python.scm
+++ b/gnu/packages/python.scm
@@ -173,6 +173,17 @@
              ;; gnu-build-system.scm.
             (setenv "SOURCE_DATE_EPOCH" "1")
             #t))
+         (add-before 'configure 'disable-timestamp-tests
+           (lambda _
+             ;; Filter for existing files, since this only affects
+             ;; Python-3 if the SOURCE_DATE_EPOCH patch is applied.
+             (substitute* (filter file-exists?
+                                  '("Lib/test/test_importlib/test_abc.py"))
+               (("test_code_bad_timestamp") "disable_test_code_bad_timestamp"))
+             (substitute* (filter file-exists?
+                                  '("Lib/test/test_importlib/source/test_file_loader.py"))
+               (("test_old_timestamp") "disable_test_old_timestamp"))
+             ))
          (add-before 'configure 'do-not-record-configure-flags
            (lambda* (#:key configure-flags #:allow-other-keys)
              ;; Remove configure flags from the installed '_sysconfigdata.py'
@@ -268,7 +279,8 @@ data types.")
                  ;; XXX Try removing this patch for python > 3.4.3
                  "python-disable-ssl-test.patch"
                  "python-3-deterministic-build-info.patch"
-                 "python-3-search-paths.patch")))
+                 "python-3-search-paths.patch"
+                 "python-3.4.3-source-date-epoch.patch")))
        (patch-flags '("-p0"))
        (sha256
         (base32
-- 
2.6.3
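Leo's hexdump loop can also be reproduced from Python. This sketch (the helper `compiled_timestamp` is hypothetical) byte-compiles a file with `py_compile` and returns the header's timestamp field, so changing the source mtime, as `touch` does, changes the result. It assumes the header layouts of Python 3.3 to 3.6 (timestamp at offset 4) and of 3.7 and later (timestamp at offset 8, after the PEP 552 flags field).

```python
import py_compile
import sys


def compiled_timestamp(source, cfile):
    """Byte-compile `source` to `cfile` and return the stored source mtime."""
    kwargs = {}
    if sys.version_info >= (3, 7):
        # Ask for a timestamp-based pyc; a set SOURCE_DATE_EPOCH can
        # otherwise make py_compile produce a hash-based one instead.
        kwargs["invalidation_mode"] = py_compile.PycInvalidationMode.TIMESTAMP
    py_compile.compile(source, cfile=cfile, doraise=True, **kwargs)
    with open(cfile, "rb") as f:
        header = f.read(16)
    # Python 3.7+ has a 4-byte flags field between magic and timestamp.
    offset = 8 if sys.version_info >= (3, 7) else 4
    return int.from_bytes(header[offset:offset + 4], "little")
```

Calling it twice with an `os.utime` (or `touch`) in between shows the stored value following the source mtime, which is the "ticking clock" described above.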
* bug#22533: Non-determinism in python-3 ".pyc" bytecode
From: Cyril Roelandt @ 2016-03-29 23:11 UTC
To: 22533

Here is a version of the patch that works with upstream Python, but that I cannot get to work with our Guix recipe. Could you test it and tell me what you think? I intend to push this to CPython.

Cyril.

diff --git a/Lib/importlib/_bootstrap.py b/Lib/importlib/_bootstrap.py
index c4ee41a..d9885c9 100644
--- Lib/importlib/_bootstrap.py
+++ Lib/importlib/_bootstrap.py
@@ -1443,7 +1443,8 @@ class SourceLoader(_LoaderBasics):
         Implementing this method allows the loader to read bytecode files.
         Raises IOError when the path cannot be handled.
         """
-        return {'mtime': self.path_mtime(path)}
+        return {'mtime': float(_os.environ.get(b'SOURCE_DATE_EPOCH',
+                                               st.st_mtime))}

     def _cache_bytecode(self, source_path, cache_path, data):
         """Optional method which writes data (bytes) to a file path (a str).
@@ -1580,7 +1581,10 @@ class SourceFileLoader(FileLoader, SourceLoader):
     def path_stats(self, path):
         """Return the metadata for the path."""
         st = _path_stat(path)
-        return {'mtime': st.st_mtime, 'size': st.st_size}
+        return {
+            'mtime': float(_os.environ.get(b'SOURCE_DATE_EPOCH', st.st_mtime)),
+            'size': st.st_size
+        }

     def _cache_bytecode(self, source_path, bytecode_path, data):
         # Adapt between the two APIs
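The idea behind this patch, reporting SOURCE_DATE_EPOCH instead of the file's real mtime from `path_stats`, can be sketched outside importlib. As far as I can tell from CPython's importlib, `SourceLoader.get_code` uses the `path_stats` result both to validate an existing .pyc and to stamp a newly written one, so overriding it affects both sides of the staleness check. The patch uses a bytes key because inside `importlib._bootstrap`, `_os` is the low-level `posix` module, whose `environ` has bytes keys; the sketch uses the public `os.environ` with a str key. In the first hunk, `st` does not appear to be defined within `SourceLoader.path_stats`, so the sketch calls `os.stat` itself. The function is hypothetical.

```python
import os


def path_stats(path):
    """Return {'mtime': ..., 'size': ...} for `path`.

    When SOURCE_DATE_EPOCH is set, it is reported instead of the real mtime.
    """
    st = os.stat(path)
    epoch = os.environ.get("SOURCE_DATE_EPOCH")
    mtime = float(epoch) if epoch is not None else st.st_mtime
    return {"mtime": mtime, "size": st.st_size}
```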
* bug#22533: Non-determinism in python-3 ".pyc" bytecode
From: Ludovic Courtès @ 2016-04-06 8:29 UTC
To: Cyril Roelandt; +Cc: 22533

Cyril Roelandt <tipecaml@gmail.com> skribis:

> Here is a version of the patch that works with the upstream Python, but
> that I cannot get to work with our Guix recipe.

At first sight the patch LGTM. How does it not work for you? :-)

I applied this:

diff --git a/gnu/packages/patches/python-3-deterministic-build-info.patch b/gnu/packages/patches/python-3-deterministic-build-info.patch
index 22c372a..bdf9f20 100644
--- a/gnu/packages/patches/python-3-deterministic-build-info.patch
+++ b/gnu/packages/patches/python-3-deterministic-build-info.patch
@@ -15,3 +15,28 @@ We cannot pass it in CPPFLAGS due to whitespace in the DATE string.
  #ifndef DATE
  #ifdef __DATE__
  #define DATE __DATE__
+
+--- Lib/importlib/_bootstrap.py
++++ Lib/importlib/_bootstrap.py
+@@ -1443,7 +1443,8 @@ class SourceLoader(_LoaderBasics):
+         Implementing this method allows the loader to read bytecode files.
+         Raises IOError when the path cannot be handled.
+         """
+-        return {'mtime': self.path_mtime(path)}
++        return {'mtime': float(_os.environ.get(b'SOURCE_DATE_EPOCH',
++                                               st.st_mtime))}
+
+     def _cache_bytecode(self, source_path, cache_path, data):
+         """Optional method which writes data (bytes) to a file path (a str).
+@@ -1580,7 +1581,10 @@ class SourceFileLoader(FileLoader, SourceLoader):
+     def path_stats(self, path):
+         """Return the metadata for the path."""
+         st = _path_stat(path)
+-        return {'mtime': st.st_mtime, 'size': st.st_size}
++        return {
++            'mtime': float(_os.environ.get(b'SOURCE_DATE_EPOCH', st.st_mtime)),
++            'size': st.st_size
++        }
+
+     def _cache_bytecode(self, source_path, bytecode_path, data):
+         # Adapt between the two APIs

… and that leads to these test failures:

--8<---------------cut here---------------start------------->8---
$ ./pre-inst-env guix build python@3 --rounds=2 -K
[...]
======================================================================
FAIL: test_bad_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 452, in test_bad_marshal
    self._test_bad_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 342, in _test_bad_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_no_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 441, in test_no_marshal
    self._test_no_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 322, in _test_no_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_non_code_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 445, in test_non_code_marshal
    self._test_non_code_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 331, in _test_non_code_marshal
    self.import_(file_path, '_temp')
AssertionError: ImportError not raised

======================================================================
FAIL: test_old_timestamp (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP302)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 471, in test_old_timestamp
    self.assertEqual(bytecode_file.read(4), source_timestamp)
AssertionError: b'\x01\x00\x00\x00' != b'\x7f\xc7\x04W'
======================================================================
FAIL: test_bad_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 452, in test_bad_marshal
    self._test_bad_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 342, in _test_bad_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_no_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 441, in test_no_marshal
    self._test_no_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 322, in _test_no_marshal
    self.import_(file_path, '_temp')
AssertionError: EOFError not raised

======================================================================
FAIL: test_non_code_marshal (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 445, in test_non_code_marshal
    self._test_non_code_marshal()
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 331, in _test_non_code_marshal
    self.import_(file_path, '_temp')
AssertionError: ImportError not raised

======================================================================
FAIL: test_old_timestamp (test.test_importlib.source.test_file_loader.Source_SourceLoaderBadBytecodeTestPEP451)
----------------------------------------------------------------------
Traceback (most recent call last):
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/util.py", line 22, in wrapper
    to_return = fxn(*args, **kwargs)
  File "/tmp/guix-build-python-minimal-3.4.3.drv-0/Python-3.4.3/Lib/test/test_importlib/source/test_file_loader.py", line 471, in test_old_timestamp
    self.assertEqual(bytecode_file.read(4), source_timestamp)
AssertionError: b'\x01\x00\x00\x00' != b'\x7f\xc7\x04W'

----------------------------------------------------------------------
Ran 951 tests in 1.102s

FAILED (failures=8, skipped=19, expected failures=1)
Makefile:958: recipe for target 'test' failed
--8<---------------cut here---------------end--------------->8---

‘test_old_timestamp’ clearly needs to be adjusted to account for the change. The others have to do with the bytecode loader, so it’s probably a similar story. Could you look into it? Perhaps you tested with SOURCE_DATE_EPOCH unset?

Thanks for working on this, it’s an important bug to fix!

Ludo’.
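The two byte strings in the `test_old_timestamp` failure are 32-bit little-endian Unix timestamps. This small sketch (the helper name is hypothetical) decodes them, showing that the patched build wrote 1, the value of SOURCE_DATE_EPOCH, while the test expected the real mtime of the source file it had just created.

```python
import struct
from datetime import datetime, timezone


def decode_timestamp_field(field):
    """Decode a 4-byte little-endian .pyc timestamp into (seconds, UTC datetime)."""
    (seconds,) = struct.unpack("<I", field)
    return seconds, datetime.fromtimestamp(seconds, tz=timezone.utc)


written = b"\x01\x00\x00\x00"  # what the patched build stored
expected = b"\x7f\xc7\x04W"    # what test_old_timestamp expected

written_secs, _ = decode_timestamp_field(written)
expected_secs, expected_when = decode_timestamp_field(expected)
```

`written` decodes to 1 (1970-01-01 00:00:01 UTC), and `expected` decodes to 1459931007, i.e. 2016-04-06 08:23:27 UTC, the time the test suite created its source file during this build.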
* bug#22533: Python bytecode reproducibility
From: Marius Bakke @ 2017-05-26 13:41 UTC
To: 22533

Hello!

I stumbled across this bug after re-discovering that Python bytecode is not reproducible (through "glib"). Just sharing some notes.

Nix recently made an effort to fix this. AFAICT the ".pyc" files are still a problem, but at least they got the interpreters building reproducibly:

https://github.com/NixOS/nixpkgs/issues/22570
https://github.com/NixOS/nixpkgs/pull/22585

It would be great to revive this longstanding bug!

*walks away slowly before anyone notices*
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2018-03-03 22:37 UTC
To: Marius Bakke; +Cc: 22533

Hi Guix,

Marius Bakke <mbakke@fastmail.com> writes:

> It would be great to revive this longstanding bug!

Indeed.

Here’s another attempt. As far as I understand, the timestamp in the pyc files only affects the header.

Up to and including Python 3.6, the header looks like this:

    magic | timestamp | size

Since Python 3.7 the header may contain either a timestamp or a hash:

    magic | 00000000000000000000000000000000 | timestamp | size
    magic | 00000000000000000000000000000001 | hash      | size

This means we likely won’t have this problem any more with Python 3.7. For Python 3.6 I guess we could add a final build phase that overwrites the timestamp in the *binary*. This needs to happen before any of the compiled files are wrapped up in a wheel.

Should we just wait for Python 3.7, which is expected to be released in June 2018? We’d still have to deal with this problem in Python 2, though.

Is it a bad idea to override the timestamps in the generated binaries? I think that we could avoid the recency check then, which was an obstacle to resetting the timestamps of the source files.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
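The two layouts above can be told apart by the 32-bit flags field that PEP 552 added in Python 3.7: bit 0 set means hash-based, and bit 1 additionally asks the loader to check the source hash (so a checked hash-based file carries the value 3 rather than 1). The sketch below parses both kinds and, for the Python 3.6 idea, overwrites the timestamp of a pre-3.7 file in place; the function names are hypothetical. One caveat: for timestamp-based files, the import system treats a .pyc as stale when the stored value does not match the source file's mtime, so whatever value is written has to agree with the source timestamps.

```python
import struct

FLAG_HASH_BASED = 0b01    # PEP 552: bit 0 set means hash-based pyc
FLAG_CHECK_SOURCE = 0b10  # PEP 552: bit 1 set means verify the source hash


def parse_pep552_header(data):
    """Parse the 16-byte header used by Python 3.7+ (PEP 552)."""
    magic = bytes(data[0:4])
    (flags,) = struct.unpack("<I", data[4:8])
    if flags & FLAG_HASH_BASED:
        return {"magic": magic, "kind": "hash",
                "check_source": bool(flags & FLAG_CHECK_SOURCE),
                "source_hash": bytes(data[8:16])}
    mtime, size = struct.unpack("<II", data[8:16])
    return {"magic": magic, "kind": "timestamp", "mtime": mtime, "size": size}


def reset_legacy_timestamp(path, value=1):
    """Overwrite the mtime field (bytes 4-7) of a pre-3.7 .pyc in place."""
    with open(path, "r+b") as f:
        f.seek(4)
        f.write(struct.pack("<I", value))
```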
* bug#22533: Python bytecode reproducibility
From: Gábor Boskovits @ 2018-03-04 9:21 UTC
To: Ricardo Wurmus; +Cc: 22533

2018-03-03 23:37 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>:
> Should we just wait for Python 3.7 which is expected to be released in
> June 2018? We’d still have to deal with this problem in Python 2,
> though.
>
> Is it a bad idea to override the timestamps in the generated binaries?
> I think that we could avoid the recency check then, which was an
> obstacle to resetting the timestamps of the source files.

Nix had this issue; it seems they have a Python 3.5 solution, which should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570. WDYT?
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2018-03-04 12:46 UTC
To: Gábor Boskovits; +Cc: 22533

Hi Gábor,

> Nix had this issue, it seems they have a python 3.5 solution, which
> should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570.
> WDYT?

Here’s the patch for Nix:

https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/22585.diff

Here are the relevant changes to the Python packages:

* Python 3.4

  substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
  substituteInPlace "Lib/importlib/_bootstrap.py" --replace "source_mtime = int(source_stats['mtime'])" "source_mtime = 1"

* Python 3.5

  substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
  substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"

* Python 3.6

  substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
  substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"

For all packages they set these environment variables:

- set PYTHONHASHSEED=0 (for hashes of str, bytes and datetime objects)

- set DETERMINISTIC_BUILD; for conditional patching of the timestamp
  for package builds. The timestamp is not patched in ad-hoc
  environments, because that would mess with Python’s ability to
  determine whether to compile source files.

They also rebuild all bytecode (with the exception of lib2to3 because it is Python 2 code) three times, once for each optimization level.
--8<---------------cut here---------------start------------->8---
+      # Determinism: rebuild all bytecode
+      # We exclude lib2to3 because that's Python 2 code which fails
+      # We rebuild three times, once for each optimization level
+      find $out -name "*.py" | $out/bin/python -m compileall -q -f -x "lib2to3" -i -
+      find $out -name "*.py" | $out/bin/python -O -m compileall -q -f -x "lib2to3" -i -
+      find $out -name "*.py" | $out/bin/python -OO -m compileall -q -f -x "lib2to3" -i -
--8<---------------cut here---------------end--------------->8---

--
Ricardo
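Both Nix changes can be sketched in Python. `deterministic_mtime` mimics the expression substituted into `Lib/py_compile.py`, and `rebuild_bytecode` approximates the three `compileall` invocations using `compileall.compile_dir` with `optimize` levels 0, 1 and 2 (the equivalents of no flag, `-O` and `-OO`) and a regex that skips lib2to3, like `-x "lib2to3"`. Both names are hypothetical, and walking a directory stands in for the `find ... | compileall -i -` pipeline.

```python
import compileall
import os
import re


def deterministic_mtime(source_mtime):
    # Mirrors: (1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])
    return 1 if "DETERMINISTIC_BUILD" in os.environ else source_mtime


def rebuild_bytecode(root):
    """Force-recompile every module under `root` at optimization levels 0-2.

    Files whose path matches 'lib2to3' are skipped.
    Returns True only if every compilation succeeded.
    """
    exclude = re.compile(r"lib2to3")
    ok = True
    for level in (0, 1, 2):  # plain, -O, -OO
        ok = bool(compileall.compile_dir(root, force=True, quiet=1,
                                         rx=exclude, optimize=level)) and ok
    return ok
```

On Python 3.5 and later, the three passes leave `.pyc`, `.opt-1.pyc` and `.opt-2.pyc` files side by side in each `__pycache__` directory.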
* bug#22533: Python bytecode reproducibility 2018-03-04 12:46 ` Ricardo Wurmus @ 2018-03-04 15:30 ` Gábor Boskovits 2018-03-04 19:18 ` Ricardo Wurmus 1 sibling, 0 replies; 29+ messages in thread From: Gábor Boskovits @ 2018-03-04 15:30 UTC (permalink / raw) To: Ricardo Wurmus; +Cc: 22533 2018-03-04 13:46 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>: > > Hi Gábor, > > > Nix had this issue, it seems they have a python 3.5 solution, which > > should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570. > > WDYT? > > Here’s the patch for Nix: > > https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/22585.diff > > Here are the relevant changes to the Python packages: > > * Python 3.4 > > substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])" > substituteInPlace "Lib/importlib/_bootstrap.py" --replace "source_mtime = int(source_stats['mtime'])" "source_mtime = 1" > > * Python 3.5 > > substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])" > substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1" > > * Python 3.6 > substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])" > substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1" > > > Nice, thanks for the summary. Can we adopt this as is? Do we need the 3.4 and 3.5 fixes, or is the 3.6 one enough? > For all packages they set these environment variables: > > - set PYTHONHASHSEED=0 (for hashes of str, bytes and datetime objects) > > - set DETERMINISTIC_BUILD; for conditional patching of the timestamp > for package builds.
The timestamp is not patched in ad-hoc > environments, because that would mess with Python’s ability to > determine whether to compile source files. > > Should we set these in python-build-system? What about the Python bootstrap? I guess we use gnu-build-system there, so bootstrap packages might need to set these explicitly? > They also rebuild all bytecode (with the exception of lib2to3 because it > is Python 2 code) three times, once for each optimization level. > > --8<---------------cut here---------------start------------->8--- > + # Determinism: rebuild all bytecode > + # We exclude lib2to3 because that's Python 2 code which fails > + # We rebuild three times, once for each optimization level > + find $out -name "*.py" | $out/bin/python -m compileall -q -f -x "lib2to3" -i - > + find $out -name "*.py" | $out/bin/python -O -m compileall -q -f -x "lib2to3" -i - > + find $out -name "*.py" | $out/bin/python -OO -m compileall -q -f -x "lib2to3" -i - > --8<---------------cut here---------------end--------------->8--- > > Do we also have to do this, or should we settle for one optimization level? Which one?
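The effect of `PYTHONHASHSEED=0` is easy to check directly. String hashes are randomized per process by default. As a result, anything whose layout follows hash order can vary between builds, for example the iteration order of a `frozenset` constant that gets marshalled into a `.pyc`. The sketch below asks two fresh interpreters for `hash()` of the same string. `child_hash` is a hypothetical helper, not part of the Nix or Guix patches.

```python
import os
import subprocess
import sys


def child_hash(text, seed=None):
    """Return hash(text) as computed by a fresh interpreter (hypothetical helper).

    seed=None leaves hash randomization on; an integer is passed as
    PYTHONHASHSEED, and 0 disables randomization entirely.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env, stdout=subprocess.PIPE, check=True, universal_newlines=True)
    return int(result.stdout)
```

With the seed pinned to 0, `child_hash("abc", seed=0)` returns the same value on every run. Without a seed, the value usually changes from one run to the next.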
* bug#22533: Python bytecode reproducibility 2018-03-04 12:46 ` Ricardo Wurmus 2018-03-04 15:30 ` Gábor Boskovits @ 2018-03-04 19:18 ` Ricardo Wurmus 2018-03-05 0:02 ` Ricardo Wurmus ` (2 more replies) 1 sibling, 3 replies; 29+ messages in thread From: Ricardo Wurmus @ 2018-03-04 19:18 UTC (permalink / raw) To: Gábor Boskovits; +Cc: 22533 I have applied this patch locally: diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm index 5f701701a..0d1ecc3c6 100644 --- a/gnu/packages/python.scm +++ b/gnu/packages/python.scm @@ -359,8 +359,42 @@ data types.") "Lib/ctypes/test/test_win32.py" ; fails on aarch64 "Lib/test/test_fcntl.py")) ; fails on aarch64 #t)))) - (arguments (substitute-keyword-arguments (package-arguments python-2) - ((#:tests? _) #t))) + (arguments + (substitute-keyword-arguments (package-arguments python-2) + ((#:tests? _) #t) + ((#:phases phases) + `(modify-phases ,phases + (add-after 'unpack 'patch-timestamp-for-pyc-files + (lambda _ + ;; We set DETERMINISTIC_BUILD to only override the mtime when + ;; building with Guix, lest we break auto-compilation in + ;; environments. + (setenv "DETERMINISTIC_BUILD" "1") + (substitute* "Lib/py_compile.py" + (("source_stats\\['mtime'\\]") + "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])")) + + ;; Use deterministic hashes for strings, bytes, and datetime + ;; objects. + (setenv "PYTHONHASHSEED" "0") + + ;; Reset mtime when validating bytecode header.
+ (substitute* "Lib/importlib/_bootstrap_external.py" + (("source_mtime = int\\(source_stats\\['mtime'\\]\\)") + "source_mtime = 1")) + #t)) + (add-after 'unpack 'disable-timestamp-tests + (lambda _ + (substitute* "Lib/test/test_importlib/source/test_file_loader.py" + (("test_bad_marshal") + "disable_test_bad_marshal") + (("test_no_marshal") + "disable_test_no_marshal") + (("test_non_code_marshal") + "disable_test_non_code_marshal")) + #t)) + (add-before 'check 'allow-non-deterministic-compilation + (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t)))))) (native-search-paths (list (search-path-specification (variable "PYTHONPATH") It allows me to build python-six and python-sip reproducibly. It does not fix problems with Python 2, and I haven’t yet tested if it causes any new problems. It’s a little worrying that I had to disable three more tests that I think shouldn’t have failed. What do you think? -- Ricardo
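The first `substitute*` in this patch turns `source_stats['mtime']` in `py_compile.py` into a conditional expression. Python 3.3 to 3.6 write a 12-byte header: a 4-byte magic, then the source mtime and the source size as unsigned 32-bit little-endian integers. The sketch below shows what the conditional does to that header. `pyc_header` is a hypothetical stand-in for CPython's private header-writing code, not its actual implementation, and the magic bytes are passed in rather than read from the running interpreter.

```python
import os
import struct


def pyc_header(magic, source_mtime, source_size, environ=None):
    """Build a 12-byte Python 3.3-3.6 style .pyc header (hypothetical helper).

    Mirrors the patched expression
        (1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])
    """
    if environ is None:
        environ = os.environ
    mtime = 1 if "DETERMINISTIC_BUILD" in environ else int(source_mtime)
    # mtime and size follow the 4-byte magic as unsigned 32-bit little-endian fields.
    return bytes(magic) + struct.pack("<II", mtime & 0xFFFFFFFF,
                                      source_size & 0xFFFFFFFF)
```

When `DETERMINISTIC_BUILD` is set, two builds made at different times produce the same header. When it is unset, the mtime field changes with every build. The loader also compares this field with the source file's mtime. That comparison is what the patch's second substitution ("Reset mtime when validating bytecode header") addresses.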
* bug#22533: Python bytecode reproducibility 2018-03-04 19:18 ` Ricardo Wurmus @ 2018-03-05 0:02 ` Ricardo Wurmus 2018-03-05 0:05 ` Ricardo Wurmus 2018-03-05 22:06 ` Ricardo Wurmus 2018-03-05 23:21 ` Marius Bakke 2018-03-08 10:39 ` Gábor Boskovits 2 siblings, 2 replies; 29+ messages in thread From: Ricardo Wurmus @ 2018-03-05 0:02 UTC (permalink / raw) To: Gábor Boskovits; +Cc: 22533 Ricardo Wurmus <rekado@elephly.net> writes: > I have applied this patch locally: > > diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm > index 5f701701a..0d1ecc3c6 100644 > --- a/gnu/packages/python.scm > +++ b/gnu/packages/python.scm > @@ -359,8 +359,42 @@ data types.") > "Lib/ctypes/test/test_win32.py" ; fails on aarch64 > "Lib/test/test_fcntl.py")) ; fails on aarch64 > #t)))) > - (arguments (substitute-keyword-arguments (package-arguments python-2) > - ((#:tests? _) #t))) > + (arguments > + (substitute-keyword-arguments (package-arguments python-2) > + ((#:tests? _) #t) > + ((#:phases phases) > + `(modify-phases ,phases > + (add-after 'unpack 'patch-timestamp-for-pyc-files > + (lambda _ > + ;; We set DETERMINISTIC_BUILD to only override the mtime when > + ;; building with Guix, lest we break auto-compilation in > + ;; environments. > + (setenv "DETERMINISTIC_BUILD" "1") > + (substitute* "Lib/py_compile.py" > + (("source_stats\\['mtime'\\]") > + "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])")) > + > + ;; Use deterministic hashes for strings, bytes, and datetime > + ;; objects. > + (setenv "PYTHONHASHSEED" "0") > + > + ;; Reset mtime when validating bytecode header. 
> + (substitute* "Lib/importlib/_bootstrap_external.py" > + (("source_mtime = int\\(source_stats\\['mtime'\\]\\)") > + "source_mtime = 1")) > + #t)) > + (add-after 'unpack 'disable-timestamp-tests > + (lambda _ > + (substitute* "Lib/test/test_importlib/source/test_file_loader.py" > + (("test_bad_marshal") > + "disable_test_bad_marshal") > + (("test_no_marshal") > + "disable_test_no_marshal") > + (("test_non_code_marshal") > + "disable_test_non_code_marshal")) > + #t)) > + (add-before 'check 'allow-non-deterministic-compilation > + (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t)))))) > (native-search-paths > (list (search-path-specification > (variable "PYTHONPATH") > > It allows me to build python-six and python-sip reproducibly. It does > not fix problems with Python 2, and I haven’t yet tested if it causes > any new problems. I tested importing modules in an ad-hoc environment — no problems. Unfortunately, this doesn’t fix all reproducibility problems with numpy: --8<---------------cut here---------------start------------->8--- Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and 
/gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ --8<---------------cut here---------------end--------------->8--- But the successes with simpler Python packages are promising. -- Ricardo
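The listing above compares the `-check` rebuild with the original store item. A similar report, limited to `.pyc` files, can be produced with a short script. This sketch assumes both trees are readable locally. `differing_files` is a hypothetical helper, not part of Guix.

```python
import filecmp
import os


def differing_files(tree_a, tree_b, suffix=".pyc"):
    """Yield paths (relative to TREE_A) of SUFFIX files that differ in TREE_B.

    Hypothetical helper; files missing from TREE_B are reported as well.
    """
    for dirpath, _dirnames, filenames in os.walk(tree_a):
        for name in sorted(filenames):
            if not name.endswith(suffix):
                continue
            path_a = os.path.join(dirpath, name)
            rel = os.path.relpath(path_a, tree_a)
            path_b = os.path.join(tree_b, rel)
            # shallow=False compares file contents, not just os.stat() signatures.
            if not os.path.exists(path_b) or not filecmp.cmp(path_a, path_b, shallow=False):
                yield rel
```

For example, `list(differing_files(check_dir, orig_dir))` lists the same kind of paths as the `Binary files ... differ` lines, one relative path per file.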
* bug#22533: Python bytecode reproducibility 2018-03-05 0:02 ` Ricardo Wurmus @ 2018-03-05 0:05 ` Ricardo Wurmus 2018-03-05 15:36 ` Gábor Boskovits 2018-03-05 22:02 ` Ricardo Wurmus 2018-03-05 22:06 ` Ricardo Wurmus 1 sibling, 2 replies; 29+ messages in thread From: Ricardo Wurmus @ 2018-03-05 0:05 UTC (permalink / raw) To: Gábor Boskovits; +Cc: 22533 Ricardo Wurmus <rekado@elephly.net> writes: > Unfortunately, this doesn’t fix all reproducibility problems with numpy: > > --8<---------------cut here---------------start------------->8--- > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and 
/gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ > Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ > --8<---------------cut here---------------end--------------->8--- Here’s what diffoscope says: --8<---------------cut here---------------start------------->8--- diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc --- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc +++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc @@ -1,8 +1,8 @@ -00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000 3......Z&....... +00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000 3......Z&....... 00000010: 0000 0000 0000 0000 0001 0000 0040 0000 .............@.. 00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400 .s ...d.Z.d.Z.d. 00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502 Z.d.Z.d.Z.e.s.e. 00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30 Z.d.S.).z.1.14.0 00000050: da28 3639 3134 6262 3431 6630 6662 3363 .(6914bb41f0fb3c 00000060: 3162 6135 3030 6261 6534 6537 6436 3731 1ba500bae4e7d671 00000070: 6461 3935 3336 3738 3666 544e 2905 da0d da9536786fTN)... --8<---------------cut here---------------end--------------->8--- In other words: this is the timestamp field of the pyc file. Maybe this can be avoided by setting DETERMINISTIC_BUILD in the python-build-system? 
-- Ricardo
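Decoding the first 12 bytes of the two dumps confirms this. The sketch below assumes the Python 3.3 to 3.6 header layout: magic number, mtime, source size. Python 3.7 inserts a 4-byte flags field after the magic (PEP 552), which would shift these offsets. `parse_pyc_header` is a hypothetical helper.

```python
import datetime
import struct


def parse_pyc_header(data):
    """Split a Python 3.3-3.6 .pyc header into (magic_number, mtime, source_size).

    Hypothetical helper; assumes the 12-byte pre-PEP 552 layout.
    """
    # 16-bit little-endian magic, b"\r\n", then two unsigned 32-bit LE fields.
    magic, crlf, mtime, size = struct.unpack("<H2sII", data[:12])
    if crlf != b"\r\n":
        raise ValueError("not a .pyc header")
    return magic, mtime, size


check = bytes.fromhex("330d0d0afa879c5a26010000")  # "-" line (the -check build)
orig = bytes.fromhex("330d0d0ac4859c5a26010000")   # "+" line (the original build)
for label, header in (("-check", check), ("original", orig)):
    magic, mtime, size = parse_pyc_header(header)
    when = datetime.datetime.fromtimestamp(mtime, datetime.timezone.utc)
    print(label, magic, mtime, when.isoformat(), size)
# prints:
# -check 3379 1520207866 2018-03-04T23:57:46+00:00 294
# original 3379 1520207300 2018-03-04T23:48:20+00:00 294
```

Only the mtime differs: the `-check` build is 566 seconds later. The magic number 3379 identifies Python 3.6 bytecode, and 294 is the source size in bytes.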
* bug#22533: Python bytecode reproducibility 2018-03-05 0:05 ` Ricardo Wurmus @ 2018-03-05 15:36 ` Gábor Boskovits 2018-03-05 20:33 ` Gábor Boskovits 2018-03-05 22:02 ` Ricardo Wurmus 1 sibling, 1 reply; 29+ messages in thread From: Gábor Boskovits @ 2018-03-05 15:36 UTC (permalink / raw) To: Ricardo Wurmus; +Cc: 22533 2018-03-05 1:05 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>: > > Ricardo Wurmus <rekado@elephly.net> writes: > > > Unfortunately, this doesn’t fix all reproducibility problems with numpy: > > [...] > > In other words: this is the timestamp field of the pyc file. > > Maybe this can be avoided by setting DETERMINISTIC_BUILD in the > python-build-system? > > It seems that the deterministic build patch already landed upstream https://github.com/python/cpython/pull/5200, so we might consider applying the upstream patches. WDYT?
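For comparison, here is how Python 3.7 and later behave. When `SOURCE_DATE_EPOCH` is set, `py_compile` writes checked hash-based pycs as described in PEP 552. Their header carries a hash of the source instead of its mtime. This sketch runs only on a 3.7+ interpreter and does not apply to the 3.6 package discussed in this thread. `compile_in_child` and `pyc_flags` are hypothetical helpers.

```python
import os
import struct
import subprocess
import sys


def compile_in_child(source_path, source_date_epoch=None):
    """Byte-compile SOURCE_PATH with py_compile in a fresh 3.7+ interpreter.

    Hypothetical helper. SOURCE_DATE_EPOCH is set in the child only when
    SOURCE_DATE_EPOCH is given. Returns the raw bytes of the written .pyc.
    """
    env = dict(os.environ)
    env.pop("SOURCE_DATE_EPOCH", None)
    if source_date_epoch is not None:
        env["SOURCE_DATE_EPOCH"] = str(source_date_epoch)
    code = ("import py_compile, sys; "
            "print(py_compile.compile(sys.argv[1], doraise=True))")
    result = subprocess.run([sys.executable, "-c", code, source_path], env=env,
                            stdout=subprocess.PIPE, check=True,
                            universal_newlines=True)
    with open(result.stdout.strip(), "rb") as f:
        return f.read()


def pyc_flags(data):
    """Return the PEP 552 flags word: bit 0 = hash-based, bit 1 = check source."""
    return struct.unpack("<I", data[4:8])[0]
```

The flags word is 3 (hash-based, checked against the source) when `SOURCE_DATE_EPOCH` is set, and 0 for the classic timestamp header.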
* bug#22533: Python bytecode reproducibility 2018-03-05 15:36 ` Gábor Boskovits @ 2018-03-05 20:33 ` Gábor Boskovits 2018-03-05 21:46 ` Ricardo Wurmus 0 siblings, 1 reply; 29+ messages in thread From: Gábor Boskovits @ 2018-03-05 20:33 UTC (permalink / raw) To: Ricardo Wurmus; +Cc: 22533 2018-03-05 16:36 GMT+01:00 Gábor Boskovits <boskovits@gmail.com>: > 2018-03-05 1:05 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>: > [...] >> In other words: this is the timestamp field of the pyc file. >> >> Maybe this can be avoided by setting DETERMINISTIC_BUILD in the >> python-build-system? >> >> > It seems that the deterministic build patch already landed upstream > https://github.com/python/cpython/pull/5200, so we might consider > applying the upstream patches. WDYT? > And also this: https://github.com/python/cpython/pull/4575. I'm now having a look at this approach. However this second one seems quite invasive...
* bug#22533: Python bytecode reproducibility 2018-03-05 20:33 ` Gábor Boskovits @ 2018-03-05 21:46 ` Ricardo Wurmus 0 siblings, 0 replies; 29+ messages in thread From: Ricardo Wurmus @ 2018-03-05 21:46 UTC (permalink / raw) To: Gábor Boskovits; +Cc: 22533 Gábor Boskovits <boskovits@gmail.com> writes: > 2018-03-05 16:36 GMT+01:00 Gábor Boskovits <boskovits@gmail.com>: > >> 2018-03-05 1:05 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>: >> [...] >> It seems that the deterministic build patch already landed upstream >> https://github.com/python/cpython/pull/5200, so we might consider >> applying the upstream patches. WDYT? >> > > And also this: https://github.com/python/cpython/pull/4575. > I'm now having a look at this approach. However this second one > seems quite invasive... These patches are for what will become Python 3.7. Python 3.6 does not have support for “invalidation_mode”, so at least the first patch would not work for us. -- Ricardo
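On Python 3.7 and later, the same choice can be made explicitly through the `invalidation_mode` argument of `py_compile.compile`. The sketch below runs only on 3.7+, which is exactly the limitation Ricardo points out for 3.6. `compile_bytes` is a hypothetical helper.

```python
import os
import py_compile
import tempfile


def compile_bytes(source_path, mode):
    """Compile SOURCE_PATH to a temporary .pyc using invalidation MODE; return its bytes.

    Hypothetical helper; needs Python 3.7+ (PycInvalidationMode does not exist in 3.6).
    """
    fd, target = tempfile.mkstemp(suffix=".pyc")
    os.close(fd)
    try:
        py_compile.compile(source_path, cfile=target, doraise=True,
                           invalidation_mode=mode)
        with open(target, "rb") as f:
            return f.read()
    finally:
        os.remove(target)
```

With `py_compile.PycInvalidationMode.CHECKED_HASH`, compiling the same source twice with different file mtimes should give identical bytes. In that mode the header holds a hash of the source rather than its mtime.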
* bug#22533: Python bytecode reproducibility 2018-03-05 0:05 ` Ricardo Wurmus 2018-03-05 15:36 ` Gábor Boskovits @ 2018-03-05 22:02 ` Ricardo Wurmus 1 sibling, 0 replies; 29+ messages in thread From: Ricardo Wurmus @ 2018-03-05 22:02 UTC (permalink / raw) To: Gábor Boskovits; +Cc: 22533 Ricardo Wurmus <rekado@elephly.net> writes: > Ricardo Wurmus <rekado@elephly.net> writes: > >> Unfortunately, this doesn’t fix all reproducibility problems with numpy: >> >> --8<---------------cut here---------------start------------->8--- >> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/__config__.cpython-36.pyc differ >> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/exec_command.cpython-36.pyc differ >> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/distutils/__pycache__/system_info.cpython-36.pyc differ >> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/__config__.cpython-36.pyc differ >> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc and 
/gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc differ >> Binary files /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc and /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc differ >> --8<---------------cut here---------------end--------------->8--- > > Here’s what diffoscope says: > > --8<---------------cut here---------------start------------->8--- > diffoscope /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0{-check,}/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc > --- /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0-check/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc > +++ /gnu/store/kd06ql8fynlydymzhhnwk2lh0778dwcc-python-numpy-1.14.0/lib/python3.6/site-packages/numpy/__pycache__/version.cpython-36.pyc > @@ -1,8 +1,8 @@ > -00000000: 330d 0d0a fa87 9c5a 2601 0000 e300 0000 3......Z&....... > +00000000: 330d 0d0a c485 9c5a 2601 0000 e300 0000 3......Z&....... > 00000010: 0000 0000 0000 0000 0001 0000 0040 0000 .............@.. > 00000020: 0073 2000 0000 6400 5a00 6400 5a01 6400 .s ...d.Z.d.Z.d. > 00000030: 5a02 6401 5a03 6402 5a04 6504 731c 6502 Z.d.Z.d.Z.e.s.e. > 00000040: 5a01 6403 5300 2904 7a06 312e 3134 2e30 Z.d.S.).z.1.14.0 > 00000050: da28 3639 3134 6262 3431 6630 6662 3363 .(6914bb41f0fb3c > 00000060: 3162 6135 3030 6261 6534 6537 6436 3731 1ba500bae4e7d671 > 00000070: 6461 3935 3336 3738 3666 544e 2905 da0d da9536786fTN)... > --8<---------------cut here---------------end--------------->8--- > > In other words: this is the timestamp field of the pyc file. > > Maybe this can be avoided by setting DETERMINISTIC_BUILD in the > python-build-system? It cannot. 
So, something’s still missing from my patch. Does anyone see what might be missing? -- Ricardo
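One way to look for what is missing is to list every place in the unpacked sources that reads a source mtime, then compare that list with the lines the patch rewrites. For instance, the Nix summary quoted earlier in the thread rewrites `source_mtime = int(st['mtime'])` for 3.5 and 3.6, while this patch matches `source_mtime = int(source_stats['mtime'])`. The sketch below does this listing. `mtime_reads` is a hypothetical helper, and its regex is an assumption about how those reads are spelled.

```python
import re

# Assumed spelling of mtime reads, e.g. st['mtime'] or source_stats['mtime'].
MTIME_READ = re.compile(r"\w+\[\s*'mtime'\s*\]")


def mtime_reads(path):
    """Return (line_number, stripped_line) for each line of PATH that reads a 'mtime' key."""
    hits = []
    with open(path, encoding="utf-8") as f:
        for number, line in enumerate(f, start=1):
            if MTIME_READ.search(line):
                hits.append((number, line.strip()))
    return hits
```

To use it, point it at `Lib/importlib/_bootstrap_external.py` and `Lib/py_compile.py` in the unpacked source tree. Every hit that no `substitute*` pattern covers still uses the real mtime.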
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2018-03-05 22:06 UTC
To: Gábor Boskovits; +Cc: 22533

Ricardo Wurmus <rekado@elephly.net> writes:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> I have applied this patch locally:
>>
>> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
>> index 5f701701a..0d1ecc3c6 100644
>> --- a/gnu/packages/python.scm
>> +++ b/gnu/packages/python.scm
>> @@ -359,8 +359,42 @@ data types.")
>>                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>>                "Lib/test/test_fcntl.py"))      ; fails on aarch64
>>             #t))))
>> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
>> -                 ((#:tests? _) #t)))
>> +    (arguments
>> +     (substitute-keyword-arguments (package-arguments python-2)
>> +       ((#:tests? _) #t)
>> +       ((#:phases phases)
>> +        `(modify-phases ,phases
>> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
>> +             (lambda _
>> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
>> +               ;; building with Guix, lest we break auto-compilation in
>> +               ;; environments.
>> +               (setenv "DETERMINISTIC_BUILD" "1")
>> +               (substitute* "Lib/py_compile.py"
>> +                 (("source_stats\\['mtime'\\]")
>> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
>> +
>> +               ;; Use deterministic hashes for strings, bytes, and datetime
>> +               ;; objects.
>> +               (setenv "PYTHONHASHSEED" "0")
>> +
>> +               ;; Reset mtime when validating bytecode header.
>> +               (substitute* "Lib/importlib/_bootstrap_external.py"
>> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
>> +                  "source_mtime = 1"))
>> +               #t))
>> +           (add-after 'unpack 'disable-timestamp-tests
>> +             (lambda _
>> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
>> +                 (("test_bad_marshal")
>> +                  "disable_test_bad_marshal")
>> +                 (("test_no_marshal")
>> +                  "disable_test_no_marshal")
>> +                 (("test_non_code_marshal")
>> +                  "disable_test_non_code_marshal"))
>> +               #t))
>> +           (add-before 'check 'allow-non-deterministic-compilation
>> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>>      (native-search-paths
>>       (list (search-path-specification
>>              (variable "PYTHONPATH")
>>
>> It allows me to build python-six and python-sip reproducibly. It does
>> not fix problems with Python 2, and I haven’t yet tested if it causes
>> any new problems.

I should also note that Python 3 itself still contains pyc files with
timestamps. This could be the reason why in Nix all pyc files are
rebuilt (more than once).

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
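The `Lib/py_compile.py` substitution in the patch above replaces `source_stats['mtime']` with `(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])`. The sketch below reproduces that decision outside CPython. It assumes the 12-byte Python 3.6 header layout seen in the diffoscope output (magic, mtime, source size). `write_py36_header` is a hypothetical helper, not a function from `py_compile`. The sketch shows that two builds whose sources have different mtimes yield byte-identical headers only while the variable is set.

```python
import struct

# Magic number of the Python 3.6 bytecode format (bytes 330d 0d0a in the dump).
PY36_MAGIC = b"\x33\x0d\r\n"


def write_py36_header(source_mtime: int, source_size: int, environ: dict) -> bytes:
    """Build a 3.6-style pyc header, pinning the mtime like the Guix patch does."""
    # Same decision as the substituted expression in Lib/py_compile.py.
    mtime = 1 if "DETERMINISTIC_BUILD" in environ else source_mtime
    # CPython 3.6 stores both fields truncated to 32 bits, little-endian.
    return PY36_MAGIC + struct.pack("<II", mtime & 0xFFFFFFFF, source_size & 0xFFFFFFFF)


# Two builds of the same 294-byte source whose files ended up with different mtimes.
deterministic = {"DETERMINISTIC_BUILD": "1"}
print(write_py36_header(1520207300, 294, deterministic)
      == write_py36_header(1520207866, 294, deterministic))  # True
print(write_py36_header(1520207300, 294, {})
      == write_py36_header(1520207866, 294, {}))  # False
```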
* bug#22533: Python bytecode reproducibility
From: Marius Bakke @ 2018-03-05 23:21 UTC
To: Ricardo Wurmus, Gábor Boskovits; +Cc: 22533

Ricardo Wurmus <rekado@elephly.net> writes:

> I have applied this patch locally:
>
> [patch snipped: identical to the one quoted in full in Ricardo's 2018-03-05 22:06 message above]
>
> It allows me to build python-six and python-sip reproducibly. It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.

Woow, nice work!

I can't tell what's going on with the tests; they do some bytecode
manipulation stuff. Maybe it does not expect the low timestamp somehow?

https://github.com/python/cpython/blob/374c6e178a7599aae46c857b17c6c8bc19dfe4c2/Lib/test/test_importlib/source/test_file_loader.py#L457-L484

I guess we'll do at least one 'core-updates' before 3.7 is released, so
it makes sense to include this. It should also give us some experience
that might be relevant for 2.7, since it probably won't get the upstream
reproducibility patch that relies on 3.7 features.

The only remark I have is: is introducing a new variable necessary?
SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
the upstream patch doesn't actually honor it outside of making the
hashing method deterministic. So, I think it might be enough to just
test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD. The former
is also already set in the build environment.

However, I just noticed that you unset DETERMINISTIC_BUILD before the
'check' phase. Did it break more things?

I suppose we'll have to set PYTHONHASHSEED somewhere in
python-build-system as well. Did you check if that makes a difference
for numpy? Perhaps it's enough to set it if we add an auto-compilation
step?
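Marius's suggestion can be sketched as follows. Instead of a Guix-specific DETERMINISTIC_BUILD flag, the override keys on SOURCE_DATE_EPOCH, the variable defined by the Reproducible Builds project, which the build environment already sets. This is a hypothetical variant for illustration, and the helper name `pyc_mtime` is an assumption. Whether to record the epoch value itself or a constant such as 1 is an open design choice. The sketch records the epoch value.

```python
import os
from typing import Mapping


def pyc_mtime(source_mtime: float, environ: Mapping[str, str] = os.environ) -> int:
    """Return the mtime to record in a pyc header.

    Hypothetical variant of the Guix patch: honour SOURCE_DATE_EPOCH
    instead of a separate DETERMINISTIC_BUILD flag.
    """
    epoch = environ.get("SOURCE_DATE_EPOCH")
    if epoch:
        # SOURCE_DATE_EPOCH holds an integer number of seconds since the Unix epoch.
        return int(epoch) & 0xFFFFFFFF
    # Outside reproducible builds, keep the real (truncated) source mtime.
    return int(source_mtime) & 0xFFFFFFFF


print(pyc_mtime(1520207866.25, {"SOURCE_DATE_EPOCH": "1"}))  # 1
print(pyc_mtime(1520207866.25, {}))  # 1520207866
```

The same condition would then have to be used on the validation side in `_bootstrap_external.py`, so that the loader accepts the pinned value.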
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2018-03-06 13:28 UTC
To: Marius Bakke; +Cc: 22533

Marius Bakke <mbakke@fastmail.com> writes:

> The only remark I have is: is introducing a new variable necessary?
> SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
> the upstream patch doesn't actually honor it outside of making the
> hashing method deterministic. So, I think it might be enough to just
> test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD. The former
> is also already set in the build environment.
>
> However, I just noticed that you unset DETERMINISTIC_BUILD before the
> 'check' phase. Did it break more things?

Yes, it broke a bunch of tests that are all about recompiling files when
they are considered stale.

> I suppose we'll have to set PYTHONHASHSEED somewhere in
> python-build-system as well. Did you check if that makes a difference
> for numpy? Perhaps it's enough to set it if we add an auto-compilation
> step?

Right, I’m going to test this with numpy now. Thanks for the hint!

-- 
Ricardo
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2018-03-06 14:43 UTC
To: Marius Bakke; +Cc: 22533

Ricardo Wurmus <rekado@elephly.net> writes:

> Marius Bakke <mbakke@fastmail.com> writes:
>
>> I suppose we'll have to set PYTHONHASHSEED somewhere in
>> python-build-system as well. Did you check if that makes a difference
>> for numpy? Perhaps it's enough to set it if we add an auto-compilation
>> step?
>
> Right, I’m going to test this with numpy now. Thanks for the hint!

It did help with one file, which is now built reproducibly, namely

lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc

This leaves five files in numpy that shouldn’t be but unfortunately are
different.

-- 
Ricardo
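One plausible way PYTHONHASHSEED can affect bytecode is through frozenset constants. A set literal in a membership test is compiled to a frozenset constant, and in the CPython 3.6 era marshal wrote a frozenset's elements in iteration order, which for strings depends on the per-process hash seed. Newer CPython releases may behave differently. The sketch below is an illustration, not part of the Guix patch. It compiles the same snippet in two fresh interpreters with PYTHONHASHSEED=0 and checks that the marshalled code objects match. With hash randomization the outputs may or may not differ on a given run, so the sketch makes no claim about that case.

```python
import os
import subprocess
import sys

# Child program: compile a snippet whose set literal becomes a frozenset
# constant, then print the marshalled code object and a string hash.
CHILD = """
import marshal
code = compile("x in {'numpy', 'scipy', 'six', 'sip'}", "<demo>", "eval")
print(marshal.dumps(code).hex())
print(hash("numpy"))
"""


def run_child(hash_seed: str) -> str:
    """Run CHILD in a fresh interpreter with the given PYTHONHASHSEED."""
    env = dict(os.environ, PYTHONHASHSEED=hash_seed)
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        env=env,
        stdout=subprocess.PIPE,
        check=True,
        universal_newlines=True,
    )
    return result.stdout


first = run_child("0")
second = run_child("0")
print("identical with PYTHONHASHSEED=0:", first == second)
```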
* bug#22533: Python bytecode reproducibility
From: Gábor Boskovits @ 2018-03-06 14:57 UTC
To: Ricardo Wurmus; +Cc: 22533

2018-03-06 15:43 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>:

>
> Ricardo Wurmus <rekado@elephly.net> writes:
>
> > Marius Bakke <mbakke@fastmail.com> writes:
> >
> >> I suppose we'll have to set PYTHONHASHSEED somewhere in
> >> python-build-system as well. Did you check if that makes a difference
> >> for numpy? Perhaps it's enough to set it if we add an auto-compilation
> >> step?
> >
> > Right, I’m going to test this with numpy now. Thanks for the hint!
>
> It did help with one file, which is now built reproducibly, namely
>
> lib/python3.6/site-packages/numpy/testing/nose_tools/__pycache__/utils.cpython-36.pyc
>
> This leaves five files in numpy that shouldn’t be but unfortunately are
> different.
>

Unfortunately backporting the upstream version is not straightforward at
all. There are too many changes. I will have a look at those test
failures instead.
* bug#22533: Python bytecode reproducibility
From: Gábor Boskovits @ 2018-03-08 10:39 UTC
To: Ricardo Wurmus; +Cc: 22533

2018-03-04 20:18 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>:

> I have applied this patch locally:
>
> [patch snipped: identical to the one quoted in full in Ricardo's 2018-03-05 22:06 message above]
>
> It allows me to build python-six and python-sip reproducibly. It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.
>

Ok, I've checked the test issue again. If we change the
_bootstrap_external.py substitution to:

"source_mtime = 1 if 'DETERMINISTIC_BUILD' in _os.environ else int(source_stats['mtime'])"

the tests do not fail any more. WDYT?

> What do you think?
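Gábor's proposed substitution makes the loader's freshness check conditional as well. The sketch below is a simplified, hypothetical model of that check. It is not the code in `importlib/_bootstrap_external.py`, which also validates the magic number. In CPython 3.6 a mismatch there makes the source loader recompile from source instead of using the cached file. With DETERMINISTIC_BUILD set, the expected mtime is pinned to 1, so pyc files written during the Guix build are accepted. Without it, a source whose mtime changed is still detected as stale, which is the behaviour the stale-file tests mentioned earlier in the thread depend on.

```python
from typing import Mapping


def pyc_is_fresh(header_mtime: int, header_size: int,
                 source_mtime: float, source_size: int,
                 environ: Mapping[str, str]) -> bool:
    """Simplified model of the loader's timestamp check for a pyc header.

    Mirrors the proposed substitution in _bootstrap_external.py:
        source_mtime = 1 if 'DETERMINISTIC_BUILD' in _os.environ
                       else int(source_stats['mtime'])
    """
    expected_mtime = 1 if "DETERMINISTIC_BUILD" in environ else int(source_mtime)
    # Header fields hold the values truncated to 32 bits.
    return (header_mtime == (expected_mtime & 0xFFFFFFFF)
            and header_size == (source_size & 0xFFFFFFFF))


guix_build = {"DETERMINISTIC_BUILD": "1"}
# During the Guix build both sides are pinned to 1, so the cache is accepted.
print(pyc_is_fresh(1, 294, 1520207866, 294, guix_build))  # True
# In the test suite the variable is unset: a source whose mtime moved is stale.
print(pyc_is_fresh(1520207300, 294, 1520207866, 294, {}))  # False
```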
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2019-01-14 13:40 UTC
To: Gábor Boskovits; +Cc: 22533

Now that we’re using Python 3.7 and this version supports hash-based pyc
files, is this still an issue? Do we need to do anything to enable
hash-based pyc compilation?

See:
https://docs.python.org/3/whatsnew/3.7.html#pep-552-hash-based-pyc-files
https://www.python.org/dev/peps/pep-0552/

-- 
Ricardo
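On Python 3.7 and later, hash-based pyc files can be requested explicitly with `py_compile.compile(..., invalidation_mode=py_compile.PycInvalidationMode.CHECKED_HASH)`, and `compileall` has a matching `--invalidation-mode` option. The sketch below runs against whatever interpreter executes it and does not describe what Guix's build system does. It compiles one file twice with different source mtimes and compares the 16-byte headers:

- In timestamp mode the headers differ, because they carry the mtime.
- In checked-hash mode the headers are identical, because they carry a flags word and a 64-bit hash of the source bytes (`importlib.util.source_hash`) instead.

The sketch first removes SOURCE_DATE_EPOCH from the environment, because some Python releases override the requested mode when that variable is set.

```python
import importlib.util
import os
import py_compile
import struct
import tempfile

# Keep the requested invalidation mode authoritative: some releases force
# hash-based pyc files whenever SOURCE_DATE_EPOCH is set.
os.environ.pop("SOURCE_DATE_EPOCH", None)

SOURCE = b"VERSION = '1.14.0'\n"


def pyc_header(mode: py_compile.PycInvalidationMode, mtime: int) -> bytes:
    """Compile SOURCE with the given source mtime and return the 16-byte header."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "version.py")
        with open(src, "wb") as f:
            f.write(SOURCE)
        os.utime(src, (mtime, mtime))
        pyc = py_compile.compile(src, cfile=os.path.join(tmp, "version.pyc"),
                                 doraise=True, invalidation_mode=mode)
        with open(pyc, "rb") as f:
            return f.read(16)


TS = py_compile.PycInvalidationMode.TIMESTAMP
HASH = py_compile.PycInvalidationMode.CHECKED_HASH

print(pyc_header(TS, 1000000000) == pyc_header(TS, 1000000566))      # False
print(pyc_header(HASH, 1000000000) == pyc_header(HASH, 1000000566))  # True

header = pyc_header(HASH, 1)
flags, = struct.unpack("<I", header[4:8])
print(flags)  # 3: bit 0 = hash-based, bit 1 = check_source
print(header[8:16] == importlib.util.source_hash(SOURCE))  # True
```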
* bug#22533: Python bytecode reproducibility
From: Ricardo Wurmus @ 2019-02-03 21:22 UTC
To: Gábor Boskovits; +Cc: 22533-done

Ricardo Wurmus <rekado@elephly.net> writes:

> Now that we’re using Python 3.7 and this version supports hash-based pyc
> files, is this still an issue? Do we need to do anything to enable
> hash-based pyc compilation?
>
> See:
> https://docs.python.org/3/whatsnew/3.7.html#pep-552-hash-based-pyc-files
> https://www.python.org/dev/peps/pep-0552/

It looks like this is no longer a problem. I built borg just now and
the pyc files are reproducible. (The man pages include a date stamp,
though, which I’m trying to patch now.)

-- 
Ricardo
* bug#22533: Python bytecode reproducibility
From: Ludovic Courtès @ 2019-02-04 22:39 UTC
To: 22533

Ricardo Wurmus <rekado@elephly.net> skribis:

> Ricardo Wurmus <rekado@elephly.net> writes:
>
>> Now that we’re using Python 3.7 and this version supports hash-based pyc
>> files, is this still an issue? Do we need to do anything to enable
>> hash-based pyc compilation?
>>
>> See:
>> https://docs.python.org/3/whatsnew/3.7.html#pep-552-hash-based-pyc-files
>> https://www.python.org/dev/peps/pep-0552/
>
> It looks like this is no longer a problem. I built borg just now and
> the pyc files are reproducible.

Yay! \o/

Ludo'.
* bug#22533: Python bytecode reproducibility
From: Ludovic Courtès @ 2018-03-05 9:25 UTC
To: Ricardo Wurmus; +Cc: 22533

Hello!

Ricardo Wurmus <rekado@elephly.net> skribis:

> Is it a bad idea to override the timestamps in the generated binaries?
> I think that we could avoid the recency check then, which was an
> obstacle to resetting the timestamps of the source files.

I think it’s good if we can fix Python itself to honor SOURCE_DATE_EPOCH
for its timestamps, but it’s also OK to patch timestamps in generated
binaries. We do that already in gzip headers, with ‘reset-gzip-timestamp’.

Thanks for tackling this!

Ludo’.
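Ludovic's alternative is to patch the timestamp in the generated files after the fact, as `reset-gzip-timestamp` does for gzip headers. The sketch below shows roughly what that could look like. `reset_pyc_timestamp` is a hypothetical helper, not an existing Guix procedure. It assumes the Python 3.7+ header layout (magic, flags, mtime, size); in a 3.6 pyc the mtime sits at offset 4 instead. Hash-based files are returned unchanged, since they carry no timestamp. The rewritten header is only accepted at import time if the source file's own mtime matches the new value, or if the loader is patched as discussed in this thread.

```python
import struct


def reset_pyc_timestamp(data: bytes, mtime: int = 1) -> bytes:
    """Return a copy of a timestamp-based pyc with its mtime field replaced.

    Assumes the PEP 552 layout used since Python 3.7:
    bytes 0-3 magic, 4-7 flags, 8-11 source mtime, 12-15 source size.
    """
    if len(data) < 16 or data[2:4] != b"\r\n":
        raise ValueError("not a Python 3.7+ pyc file")
    flags, = struct.unpack_from("<I", data, 4)
    if flags & 0b1:
        # Hash-based pyc (PEP 552): there is no timestamp to reset.
        return data
    patched = bytearray(data)
    struct.pack_into("<I", patched, 8, mtime & 0xFFFFFFFF)
    return bytes(patched)


# An illustrative timestamp-based header followed by a dummy payload.
original = b"\x42\x0d\r\n" + struct.pack("<III", 0, 1520207866, 294) + b"payload"
print(reset_pyc_timestamp(original)[8:12].hex())  # 01000000
```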
Thread overview: 29+ messages
2016-02-02 5:15 bug#22533: Non-determinism in python-3 ".pyc" bytecode Leo Famulari
2016-02-02 8:54 ` Leo Famulari
2016-02-02 20:41 ` Ludovic Courtès
2016-02-04 23:17 ` Leo Famulari
2016-03-29 23:11 ` Cyril Roelandt
2016-03-29 23:13 ` Cyril Roelandt
2016-04-06 8:29 ` Ludovic Courtès
2017-05-26 13:41 ` bug#22533: Python bytecode reproducibility Marius Bakke
2018-03-03 22:37 ` Ricardo Wurmus
2018-03-04 9:21 ` Gábor Boskovits
2018-03-04 12:46 ` Ricardo Wurmus
2018-03-04 15:30 ` Gábor Boskovits
2018-03-04 19:18 ` Ricardo Wurmus
2018-03-05 0:02 ` Ricardo Wurmus
2018-03-05 0:05 ` Ricardo Wurmus
2018-03-05 15:36 ` Gábor Boskovits
2018-03-05 20:33 ` Gábor Boskovits
2018-03-05 21:46 ` Ricardo Wurmus
2018-03-05 22:02 ` Ricardo Wurmus
2018-03-05 22:06 ` Ricardo Wurmus
2018-03-05 23:21 ` Marius Bakke
2018-03-06 13:28 ` Ricardo Wurmus
2018-03-06 14:43 ` Ricardo Wurmus
2018-03-06 14:57 ` Gábor Boskovits
2018-03-08 10:39 ` Gábor Boskovits
2019-01-14 13:40 ` Ricardo Wurmus
2019-02-03 21:22 ` Ricardo Wurmus
2019-02-04 22:39 ` Ludovic Courtès
2018-03-05 9:25 ` Ludovic Courtès