From: Marius Bakke
Subject: bug#22533: Python bytecode reproducibility
Date: Tue, 06 Mar 2018 00:21:21 +0100
Message-ID: <87h8pu153i.fsf@fastmail.com>
References: <20160202051544.GA11744@jasmine> <87bmqfu44s.fsf@fastmail.com> <87606c23bq.fsf@elephly.net> <874llw101c.fsf@elephly.net> <871sgz1wg0.fsf@elephly.net>
In-Reply-To: <871sgz1wg0.fsf@elephly.net>
To: Ricardo Wurmus, Gábor Boskovits
Cc: 22533@debbugs.gnu.org

Ricardo Wurmus writes:

> I have applied this patch locally:
>
> diff --git a/gnu/packages/python.scm b/gnu/packages/python.scm
> index 5f701701a..0d1ecc3c6 100644
> --- a/gnu/packages/python.scm
> +++ b/gnu/packages/python.scm
> @@ -359,8 +359,42 @@ data types.")
>                              "Lib/ctypes/test/test_win32.py" ; fails on aarch64
>                              "Lib/test/test_fcntl.py")) ; fails on aarch64
>                    #t))))
> -    (arguments (substitute-keyword-arguments (package-arguments python-2)
> -                 ((#:tests? _) #t)))
> +    (arguments
> +     (substitute-keyword-arguments (package-arguments python-2)
> +       ((#:tests? _) #t)
> +       ((#:phases phases)
> +        `(modify-phases ,phases
> +           (add-after 'unpack 'patch-timestamp-for-pyc-files
> +             (lambda _
> +               ;; We set DETERMINISTIC_BUILD to only override the mtime when
> +               ;; building with Guix, lest we break auto-compilation in
> +               ;; environments.
> +               (setenv "DETERMINISTIC_BUILD" "1")
> +               (substitute* "Lib/py_compile.py"
> +                 (("source_stats\\['mtime'\\]")
> +                  "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"))
> +
> +               ;; Use deterministic hashes for strings, bytes, and datetime
> +               ;; objects.
> +               (setenv "PYTHONHASHSEED" "0")
> +
> +               ;; Reset mtime when validating bytecode header.
> +               (substitute* "Lib/importlib/_bootstrap_external.py"
> +                 (("source_mtime = int\\(source_stats\\['mtime'\\]\\)")
> +                  "source_mtime = 1"))
> +               #t))
> +           (add-after 'unpack 'disable-timestamp-tests
> +             (lambda _
> +               (substitute* "Lib/test/test_importlib/source/test_file_loader.py"
> +                 (("test_bad_marshal")
> +                  "disable_test_bad_marshal")
> +                 (("test_no_marshal")
> +                  "disable_test_no_marshal")
> +                 (("test_non_code_marshal")
> +                  "disable_test_non_code_marshal"))
> +               #t))
> +           (add-before 'check 'allow-non-deterministic-compilation
> +             (lambda _ (unsetenv "DETERMINISTIC_BUILD") #t))))))
>      (native-search-paths
>       (list (search-path-specification
>              (variable "PYTHONPATH")
>
> It allows me to build python-six and python-sip reproducibly.  It does
> not fix problems with Python 2, and I haven't yet tested if it causes
> any new problems.
>
> It's a little worrying that I had to disable three more tests that I
> think shouldn't have failed.

Wow, nice work!

I can't tell what's going on with the tests; they do some bytecode
manipulation stuff.
Maybe it does not expect the low timestamp somehow?

https://github.com/python/cpython/blob/374c6e178a7599aae46c857b17c6c8bc19dfe4c2/Lib/test/test_importlib/source/test_file_loader.py#L457-L484

I guess we'll do at least one 'core-updates' cycle before Python 3.7 is
released, so it makes sense to include this.  It should also give us
some experience that might be relevant for 2.7, which probably won't get
the upstream reproducibility patch since that relies on 3.7 features.

The only remark I have is: is introducing a new variable necessary?
SOURCE_DATE_EPOCH already implies that the user wants a deterministic
build, and the upstream patch doesn't actually honor it beyond switching
to deterministic (hash-based) .pyc invalidation.  So I think it might be
enough to test for SOURCE_DATE_EPOCH instead of DETERMINISTIC_BUILD; the
former is also already set in the build environment.

However, I just noticed that you unset DETERMINISTIC_BUILD before the
'check' phase.  Did keeping it set break more things?

I suppose we'll have to set PYTHONHASHSEED somewhere in
python-build-system as well.  Did you check whether that makes a
difference for numpy?  Perhaps it's enough to set it if we add an
auto-compilation step?
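A minimal Python sketch of the SOURCE_DATE_EPOCH variant suggested above, assuming Python 3.7+ and its 16-byte timestamp-based .pyc header (magic number, then flags, mtime and source size). `effective_mtime` and `pyc_header` are hypothetical helpers written only for illustration; they are not code from CPython, Guix, or the patch, and they model just the header, not the marshalled code object that follows it.

```python
import importlib.util
import os
import struct


def effective_mtime(real_mtime, env=os.environ):
    """Hypothetical helper: the patch's mtime override, keyed on
    SOURCE_DATE_EPOCH instead of a new DETERMINISTIC_BUILD variable."""
    if "SOURCE_DATE_EPOCH" in env:
        return 1  # same constant the patch substitutes into py_compile.py
    return int(real_mtime)


def pyc_header(real_mtime, source_size, env=os.environ):
    """Hypothetical helper: build a timestamp-based .pyc header
    (Python 3.7+ layout): magic number, then flags, mtime and source
    size as 32-bit little-endian integers."""
    mtime = effective_mtime(real_mtime, env) & 0xFFFFFFFF
    flags = 0  # 0 selects timestamp-based (not hash-based) validation
    return importlib.util.MAGIC_NUMBER + struct.pack(
        "<III", flags, mtime, source_size & 0xFFFFFFFF
    )


if __name__ == "__main__":
    size = len(b"x = 1\n")
    # Without SOURCE_DATE_EPOCH the real mtime leaks into the header,
    # so two otherwise identical builds made at different times differ.
    print(pyc_header(1520000000, size, env={}) == pyc_header(1520000999, size, env={}))  # False
    # With SOURCE_DATE_EPOCH set, the stored mtime is pinned to 1.
    env = {"SOURCE_DATE_EPOCH": "1520000000"}
    print(pyc_header(1520000000, size, env) == pyc_header(1520000999, size, env))  # True
```

Pinning the stored mtime only helps if the loader agrees with it, which is what the patch's second substitution in Lib/importlib/_bootstrap_external.py (`source_mtime = 1`) takes care of.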