unofficial mirror of bug-guix@gnu.org 
From: Marius Bakke <mbakke@fastmail.com>
To: "Ricardo Wurmus" <rekado@elephly.net>,
	"Gábor Boskovits" <boskovits@gmail.com>
Cc: 22533@debbugs.gnu.org
Subject: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 00:21:21 +0100
Message-ID: <87h8pu153i.fsf@fastmail.com>
In-Reply-To: <871sgz1wg0.fsf@elephly.net>

Ricardo Wurmus <rekado@elephly.net> writes:

> I have applied this patch locally:
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
>                                "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>                                "Lib/test/test_fcntl.py")) ; fails on aarch64
>                    #t))))
> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
> -                 ((#:tests? _) #t)))
> +    (arguments
> +     (substitute-keyword-arguments (package-arguments python-2)
> +       ((#:tests? _) #t)
> +       ((#:phases phases)
> +        `(modify-phases ,phases
> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
> +             (lambda _
> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
> +               ;; building with Guix, lest we break auto-compilation in
> +               ;; environments.
> +               (setenv "DETERMINISTIC_BUILD" "1")
> +               (substitute* "Lib/py_compile.py"
> +                 (("source_stats\\['mtime'\\]")
> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> +               ;; Use deterministic hashes for strings, bytes, and datetime
> +               ;; objects.
> +               (setenv "PYTHONHASHSEED" "0")
> +
> +               ;; Reset mtime when validating bytecode header.
> +               (substitute* "Lib/importlib/_bootstrap_external.py"
> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> +                  "source_mtime = 1"))
> +               #t))
> +           (add-after 'unpack 'disable-timestamp-tests
> +             (lambda _
> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> +                 (("test_bad_marshal")
> +                  "disable_test_bad_marshal")
> +                 (("test_no_marshal")
> +                  "disable_test_no_marshal")
> +                 (("test_non_code_marshal")
> +                  "disable_test_non_code_marshal"))
> +               #t))
> +           (add-before 'check 'allow-non-deterministic-compilation
> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>      (native-search-paths
>       (list (search-path-specification
>              (variable "PYTHONPATH")
>
> It allows me to build python-six and python-sip reproducibly.  It does
> not fix problems with Python 2, and I haven’t yet tested if it causes
> any new problems.
>
> It’s a little worrying that I had to disable three more tests that I
> think shouldn’t have failed.

Wow, nice work!  I can't tell what's going on with the tests; they do
some bytecode manipulation.  Maybe they don't expect the low
timestamp somehow?

https://github.com/python/cpython/blob/374c6e178a7599aae46c857b17c6c8bc19dfe4c2/Lib/test/test_importlib/source/test_file_loader.py#L457-L484
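
To illustrate why the timestamp matters here: a timestamp-based .pyc
file records the source mtime in its header, so pinning the mtime is
what makes the header bytes reproducible.  The following is a minimal
sketch, not part of the patch.  It assumes the Python 3.7+ header
layout from PEP 552 (Python 3.6 uses a 12-byte header without the
flags field), and the helper names are hypothetical.

```python
import os
import py_compile
import struct
import tempfile


def pyc_source_mtime(pyc_path):
    """Return the source mtime stored in a timestamp-based .pyc header.

    Assumes the Python 3.7+ layout (PEP 552): 4-byte magic, 4-byte flags,
    4-byte mtime, 4-byte source size, all little-endian.
    """
    with open(pyc_path, "rb") as f:
        header = f.read(16)
    flags, mtime, _size = struct.unpack("<4xIII", header)
    if flags != 0:
        raise ValueError("hash-based .pyc: the header holds no mtime")
    return mtime


def compile_with_mtime(source_text, mtime, workdir):
    """Write a source file, force its mtime, and byte-compile it.

    Hypothetical helper; returns the .pyc path and its raw bytes.
    """
    src = os.path.join(workdir, "example.py")
    pyc = os.path.join(workdir, "example.pyc")
    with open(src, "w") as f:
        f.write(source_text)
    os.utime(src, (mtime, mtime))
    # Request the timestamp format explicitly, because py_compile
    # switches to hash-based files when SOURCE_DATE_EPOCH is set.
    py_compile.compile(
        src, cfile=pyc, doraise=True,
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP)
    with open(pyc, "rb") as f:
        return pyc, f.read()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        pyc, data = compile_with_mtime("x = 1\n", 1, workdir)
        print(pyc_source_mtime(pyc), len(data))
```

Compiling the same source twice with different mtimes changes only
the header; the marshalled code after it stays the same.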

I guess we'll do at least one 'core-updates' before 3.7 is released, so
it makes sense to include this.  It should also give us some experience
that might be relevant for 2.7, since it probably won't get the upstream
reproducibility patch that relies on 3.7 features.

My only remark: is it necessary to introduce a new variable?
SOURCE_DATE_EPOCH implies that the user wants a deterministic build;
the upstream patch doesn't actually honor it beyond making the
hashing method deterministic.  So I think it might be enough to just
test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD.  The former
is also already set in the build environment.
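
As a rough sketch of that suggestion, the expression substituted into
Lib/py_compile.py could key off SOURCE_DATE_EPOCH instead.  The
function below is a hypothetical stand-alone illustration of the
logic, not the actual substitution; like the patch, it pins the mtime
to 1 and ignores the value of the variable.

```python
import os


def effective_mtime(source_stats, environ=os.environ):
    """Return the mtime to record in the bytecode header.

    Hypothetical helper: pins the value to 1 whenever SOURCE_DATE_EPOCH
    is set, mirroring the expression the patch substitutes for
    source_stats['mtime'], but without a new variable.
    """
    if "SOURCE_DATE_EPOCH" in environ:
        return 1
    return source_stats["mtime"]


if __name__ == "__main__":
    stats = {"mtime": 1520292081.5}
    print(effective_mtime(stats, {"SOURCE_DATE_EPOCH": "1"}))
    print(effective_mtime(stats, {}))
```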

However, I just noticed that you unset DETERMINISTIC_BUILD before the
'check' phase.  Did it break more things?

I suppose we'll have to set PYTHONHASHSEED somewhere in
python-build-system as well.  Did you check if that makes a difference
for numpy?  Perhaps it's enough to set it if we add an auto-compilation
step?
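
For reference, the effect of PYTHONHASHSEED can be checked outside of
any build system.  The sketch below is an illustration only, with a
hypothetical helper name: it asks a fresh interpreter for the hash of
a string, with the seed either fixed or left to the interpreter.

```python
import os
import subprocess
import sys


def string_hash_in_fresh_interpreter(text, seed=None):
    """Return hash(text) as computed by a newly started interpreter.

    With seed=None, PYTHONHASHSEED is removed from the environment, so
    the interpreter picks a random seed; otherwise the given seed is used.
    """
    env = dict(os.environ)
    env.pop("PYTHONHASHSEED", None)
    if seed is not None:
        env["PYTHONHASHSEED"] = str(seed)
    result = subprocess.run(
        [sys.executable, "-c", "import sys; print(hash(sys.argv[1]))", text],
        env=env, check=True, stdout=subprocess.PIPE,
        universal_newlines=True)
    return int(result.stdout)


if __name__ == "__main__":
    # With a fixed seed, the two values printed on this line agree.
    print(string_hash_in_fresh_interpreter("guix", 0),
          string_hash_in_fresh_interpreter("guix", 0))
    # Without a seed, the values normally differ from run to run.
    print(string_hash_in_fresh_interpreter("guix"),
          string_hash_in_fresh_interpreter("guix"))
```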
