From mboxrd@z Thu Jan 1 00:00:00 1970
From: Gábor Boskovits
Subject: bug#22533: Python bytecode reproducibility
Date: Sun, 4 Mar 2018 16:30:59 +0100
References: <20160202051544.GA11744@jasmine> <87bmqfu44s.fsf@fastmail.com> <87606c23bq.fsf@elephly.net> <874llw101c.fsf@elephly.net>
In-Reply-To: <874llw101c.fsf@elephly.net>
Mime-Version: 1.0
Content-Type: multipart/alternative; boundary="001a1144123684cf89056697e53c"
Received: from eggs.gnu.org ([2001:4830:134:3::10]:39608)
 by lists.gnu.org with esmtp (Exim 4.71) id 1esVcM-0005RQ-IS
 for bug-guix@gnu.org; Sun, 04 Mar 2018 10:32:07 -0500
Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71)
 id 1esVcI-0007kF-Lz for bug-guix@gnu.org; Sun, 04 Mar 2018 10:32:06 -0500
Received: from debbugs.gnu.org ([208.118.235.43]:36672)
 by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71)
 id 1esVcI-0007jb-Hf for bug-guix@gnu.org; Sun, 04 Mar 2018 10:32:02 -0500
Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2)
 id 1esVcI-0003hD-Au for bug-guix@gnu.org; Sun, 04 Mar 2018 10:32:02 -0500
Sender: "Debbugs-submit"
List-Id: Bug reports for GNU Guix
Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane.org@gnu.org
Sender: "bug-Guix"
To: Ricardo Wurmus
Cc: 22533@debbugs.gnu.org

--001a1144123684cf89056697e53c
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable

2018-03-04 13:46 GMT+01:00 Ricardo Wurmus <rekado@elephly.net>:

>
> Hi Gábor,
>
> > Nix had this issue, it seems they have a python 3.5 solution, which
> > should be easy to adopt: https://github.com/NixOS/nixpkgs/issues/22570.
> > WDYT?
>
> Here’s the patch for Nix:
>
>   https://patch-diff.githubusercontent.com/raw/NixOS/nixpkgs/pull/22585.diff
>
> Here are the relevant changes to the Python packages:
>
> * Python 3.4
>
>   substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
>   substituteInPlace "Lib/importlib/_bootstrap.py" --replace "source_mtime = int(source_stats['mtime'])" "source_mtime = 1"
>
> * Python 3.5
>
>   substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
>   substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"
>
> * Python 3.6
>
>   substituteInPlace "Lib/py_compile.py" --replace "source_stats['mtime']" "(1 if 'DETERMINISTIC_BUILD' in os.environ else source_stats['mtime'])"
>   substituteInPlace "Lib/importlib/_bootstrap_external.py" --replace "source_mtime = int(st['mtime'])" "source_mtime = 1"
>

Nice, thanks for the summary.
Can we adopt this as is?
Do we need the 3.4 and 3.5 fixes, or is the 3.6 one enough?

> For all packages they set these environment variables:
>
>   - set PYTHONHASHSEED=0 (for hashes of str, bytes and datetime objects)
>
>   - set DETERMINISTIC_BUILD; for conditional patching of the timestamp
>     for package builds.  The timestamp is not patched in ad-hoc
>     environments, because that would mess with Python’s ability to
>     determine whether to compile source files.
>

Should we set these in python-build-system? What about the Python bootstrap?
I guess we use gnu-build-system there, so bootstrap packages might need to
set these explicitly?

> They also rebuild all bytecode (with the exception of lib2to3 because it
> is Python 2 code) three times, once for each optimization level.
>
> --8<---------------cut here---------------start------------->8---
> +    # Determinism: rebuild all bytecode
> +    # We exclude lib2to3 because that's Python 2 code which fails
> +    # We rebuild three times, once for each optimization level
> +    find $out -name "*.py" | $out/bin/python -m compileall -q -f -x "lib2to3" -i -
> +    find $out -name "*.py" | $out/bin/python -O -m compileall -q -f -x "lib2to3" -i -
> +    find $out -name "*.py" | $out/bin/python -OO -m compileall -q -f -x "lib2to3" -i -
> --8<---------------cut here---------------end--------------->8---
>

Do we also have to do this, or should we settle on one optimization level?
Which one?

> --
> Ricardo
>
> GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
> https://elephly.net
>
>

--001a1144123684cf89056697e53c
Content-Type: text/html; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
--001a1144123684cf89056697e53c--
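
For readers following the thread: the timestamp that the substituteInPlace
patches pin to 1 is the source file's mtime, which CPython writes into the
header of every timestamp-based .pyc file. The sketch below is not taken from
the Nix patch or from Guix; it only makes the effect visible. It assumes
Python 3.7 or later, because it passes the invalidation_mode parameter of
py_compile.compile to force timestamp-based .pyc files. The file name
example.py, the helper compile_with_mtime, and the mtime values are made up
for illustration.

```python
import os
import py_compile
import tempfile


def compile_with_mtime(source_path, pyc_path, mtime):
    """Set the source file's mtime, byte-compile it, and return the .pyc bytes."""
    os.utime(source_path, (mtime, mtime))
    py_compile.compile(
        source_path,
        cfile=pyc_path,
        doraise=True,
        # Assumption: Python 3.7+. Force the classic timestamp-based format
        # so the result does not depend on SOURCE_DATE_EPOCH being set.
        invalidation_mode=py_compile.PycInvalidationMode.TIMESTAMP,
    )
    with open(pyc_path, "rb") as f:
        return f.read()


def demo():
    """Return (ordinary builds differ, pinned builds match)."""
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "example.py")  # hypothetical source file
        pyc = os.path.join(tmp, "example.pyc")
        with open(src, "w") as f:
            f.write("ANSWER = 42\n")

        # Same source, two different mtimes: the embedded timestamp differs.
        first = compile_with_mtime(src, pyc, 1000000000)
        second = compile_with_mtime(src, pyc, 1500000000)

        # Same source, mtime pinned to 1 both times, which mimics the effect
        # of replacing the timestamp with the constant 1.
        pinned_a = compile_with_mtime(src, pyc, 1)
        pinned_b = compile_with_mtime(src, pyc, 1)

    return first != second, pinned_a == pinned_b


if __name__ == "__main__":
    differ, match = demo()
    print("ordinary builds differ:", differ)
    print("pinned builds match:", match)
```

Comparing whole files rather than parsing the header keeps the sketch
independent of the exact header layout. The pinned case is the property the
DETERMINISTIC_BUILD patch is after: identical source should give identical
bytecode regardless of when the source file was written.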