From: Marius Bakke <mbakke@fastmail.com>
To: "Ricardo Wurmus" <rekado@elephly.net>,
"Gábor Boskovits" <boskovits@gmail.com>
Cc: 22533@debbugs.gnu.org
Subject: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 00:21:21 +0100
Message-ID: <87h8pu153i.fsf@fastmail.com>
In-Reply-To: <871sgz1wg0.fsf@elephly.net>
Ricardo Wurmus <rekado@elephly.net> writes:
> I have applied this patch locally:
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
> "Lib/ctypes/test/test_win32.py" ; fails on aarch64
> "Lib/test/test_fcntl.py")) ; fails on aarch64
> #t))))
> - (arguments (substitute-keyword-arguments (package-arguments python-2)
> - ((#:tests? _) #t)))
> + (arguments
> + (substitute-keyword-arguments (package-arguments python-2)
> + ((#:tests? _) #t)
> + ((#:phases phases)
> + `(modify-phases ,phases
> + (add-after 'unpack 'patch-timestamp-for-pyc-files
> + (lambda _
> + ;; We set DETERMINISTIC_BUILD to only override the mtime when
> + ;; building with Guix, lest we break auto-compilation in
> + ;; environments.
> + (setenv "DETERMINISTIC_BUILD" "1")
> + (substitute* "Lib/py_compile.py"
> + (("source_stats\\['mtime'\\]")
> + "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> + ;; Use deterministic hashes for strings, bytes, and datetime
> + ;; objects.
> + (setenv "PYTHONHASHSEED" "0")
> +
> + ;; Reset mtime when validating bytecode header.
> + (substitute* "Lib/importlib/_bootstrap_external.py"
> + (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> + "source_mtime = 1"))
> + #t))
> + (add-after 'unpack 'disable-timestamp-tests
> + (lambda _
> + (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> + (("test_bad_marshal")
> + "disable_test_bad_marshal")
> + (("test_no_marshal")
> + "disable_test_no_marshal")
> + (("test_non_code_marshal")
> + "disable_test_non_code_marshal"))
> + #t))
> + (add-before 'check 'allow-non-deterministic-compilation
> + (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
> (native-search-paths
> (list (search-path-specification
> (variable "PYTHONPATH")
>
> It allows me to build python-six and python-sip reproducibly. It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.
Wow, nice work! I can't tell what's going on with the tests; they do
some bytecode manipulation. Maybe they don't expect the low
timestamp somehow?
https://github.com/python/cpython/blob/374c6e178a7599aae46c857b17c6c8bc19dfe4c2/Lib/test/test_importlib/source/test_file_loader.py#L457-L484
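For reference, here is a minimal sketch of where the timestamp sits in a compiled file, in case it helps when looking at those tests. It assumes the CPython 3.7+ timestamp-based header layout (magic, flags, mtime, source size, four bytes each); older releases have no flags field, so the offsets differ there. The helper names `read_pyc_mtime` and `demo` are made up for illustration.

```python
import os
import py_compile
import struct
import tempfile


def read_pyc_mtime(source_path, pyc_path):
    """Compile source_path and return the mtime stored in the .pyc header."""
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,
        # Force the timestamp-based header even if SOURCE_DATE_EPOCH is set.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    # Assumed layout (CPython 3.7+): magic, flags, mtime, source size.
    _magic, _flags, mtime, _size = struct.unpack("<4sIII", header)
    return mtime


def demo(mtime=1):
    """Write a tiny module with the given mtime and read it back from the .pyc."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")
        with open(src, "w") as f:
            f.write("x = 1\n")
        os.utime(src, (mtime, mtime))
        return read_pyc_mtime(src, os.path.join(tmp, "example.pyc"))
```

With an unpatched interpreter, `demo(1)` reports the source file's own mtime; the patch quoted above would instead pin the header field to 1 regardless of the source.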
I guess we'll do at least one 'core-updates' before 3.7 is released, so
it makes sense to include this. It should also give us some experience
that might be relevant for 2.7, since it probably won't get the upstream
reproducibility patch that relies on 3.7 features.
The only remark I have is: is introducing a new variable necessary?
SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
the upstream patch doesn't actually honor it outside of making the
hashing method deterministic. So, I think it might be enough to just
test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD. The former
is also already set in the build environment.
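Roughly, the check I have in mind would look like the sketch below. It is not the actual substitution for `Lib/py_compile.py`; `header_mtime` is a hypothetical helper, and the `environ` parameter exists only so the behaviour can be tried without touching the real environment.

```python
import os


def header_mtime(source_stats, environ=None):
    """Pick the mtime to record in the bytecode header.

    Returns the fixed value 1 when SOURCE_DATE_EPOCH is present,
    otherwise the real source mtime truncated to an integer.
    """
    environ = os.environ if environ is None else environ
    if "SOURCE_DATE_EPOCH" in environ:
        return 1
    return int(source_stats["mtime"])
```

Outside a build environment the variable is normally unset, so auto-compilation would keep using the real mtime.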
However, I just noticed that you unset DETERMINISTIC_BUILD before the
'check' phase. Did it break more things?
I suppose we'll have to set PYTHONHASHSEED somewhere in
python-build-system as well. Did you check if that makes a difference
for numpy? Perhaps it's enough to set it if we add an auto-compilation
step?
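A quick way to see what PYTHONHASHSEED changes is to hash the same string in separate interpreter processes. This is only a sketch; the function name and the sample string are arbitrary.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(seed, text="guix"):
    """Return hash(text) as computed by a new interpreter with the given seed."""
    env = dict(os.environ, PYTHONHASHSEED=str(seed))
    result = subprocess.run(
        [sys.executable, "-c", f"print(hash({text!r}))"],
        env=env,
        check=True,
        capture_output=True,
        text=True,
    )
    return int(result.stdout)
```

Two runs with the seed set to 0 should give the same value, whereas runs with the seed set to "random" will usually differ from each other.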