From: Björn Höfling
To: Guix-Help
Subject: Output of guix build --check foo is not part of store deduplication
Date: Thu, 9 Aug 2018 11:45:55 +0200
Message-ID: <20180809114555.33f21652@alma-ubu>

Is there any reason why the output of 'guix build --check ...' is not
part of deduplication?

Let me explain my problem:

When checking for (un)reproducibility, we use something like:

    guix build --check -K foo

That will build the package foo again and, if the result differs from
what is already in the store, keep the new output as

    /gnu/store/hash..-foo-1.0.0-check

You can then use diffoscope to view the difference between the old
output and the new '-check' output.

Usually, the store gets deduplicated, i.e.
if files bar and baz have the same content, they are hard-linked to the
same data on disk. That's cool for saving space when, for example, a
package gets updated because of a changed dependency but there is
really little or no change to the output files.

But the '-check' files are somehow not part of that deduplication, even
if you force deduplication with guix gc --optimize. You can see it like
this:

    ls -l /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
    -r--r--r--  1 root root 624 Jan  1  1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
    -r--r--r-- 11 root root 624 Jan  1  1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

    ls -i /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
    46161304 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
    45141642 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

The '-check' output has a link count of only 1, while the actual output
has 11 links, because I already have that many store items/generations
of that package around. The inodes differ.
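
The same comparison can be scripted instead of reading ls output by
eye. This is only a minimal sketch: the function name hardlink_info is
made up for illustration and is not part of Guix or diffoscope. It
relies on the standard os.stat fields st_dev, st_ino and st_nlink.

```python
import os


def hardlink_info(path_a, path_b):
    """Report whether two paths are hard links to the same file.

    Hypothetical helper for illustration only. Two paths refer to the
    same file when both the device and the inode number match.
    """
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    same_inode = (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
    return {
        "same_inode": same_inode,
        "links_a": st_a.st_nlink,  # link count, as shown by ls -l
        "links_b": st_b.st_nlink,
    }
```

Called with the two shishi_asreq.3.gz paths above, it should report
same_inode as False, matching the differing inode numbers and link
counts in the ls output.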

If you now run diffoscope on them, diffoscope calls stat and we get
diffs like:

    │ │   --- /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2/share/info/libchop.info.gz
    │ │ ├── +++ /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2-check/share/info/libchop.info.gz
    │ │ │ ├── /gnu/store/as7vb5xx7vqdwmmqj9543470r49b4c0c-coreutils-8.28/bin/stat {}
    │ │ │ │ @@ -1,8 +1,8 @@
    │ │ │ │
    │ │ │ │ Size: 29524 Blocks: 64 IO Block: 4096 regular file
    │ │ │ │ -Links: 3
    │ │ │ │ +Links: 1

This is annoying because it hides the actual unreproducibility problem.

Is there any reason for that?

At least I found a very guixy way around it. There's a patch by Eelco
to filter those Links out:

https://github.com/edolstra/diffoscope/commit/367f77bba8df0dbc89e63c9f66f05736adf5ec59

(with copy/paste errors):

    diffoscope/comparators/directory.py
    @@ -47,14 +47,18 @@ def cmdline(self):
         FILE_RE = re.compile(r'^\s*File:.*$')
         DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
    +    LINKS_RE = re.compile(r'Links: [0-9]+')
         ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
         CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')

         def filter(self, line):
             line = line.decode('utf-8')
             line = Stat.FILE_RE.sub('', line)
             line = Stat.DEVICE_RE.sub('', line)
             line = Stat.INODE_RE.sub('', line)
    +        line = Stat.LINKS_RE.sub('', line)
             line = Stat.ACCESS_TIME_RE.sub('', line)
             line = Stat.CHANGE_TIME_RE.sub('', line)
             return line.encode('utf-8')

So, I did:

    guix build -S diffoscope

to get the source tarball, unpacked the sources, patched them, and
packed them again.

Then:

    guix package -i diffoscope --with-source=diffoscope-96.tar.gz

and now I have a Links-free version of diffoscope in my profile (if I
had thought about it earlier, I would have done it in a separate
profile and not in my main one)!

Björn
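
For reference, the effect of the Links filter from the patch quoted
above can be tried in isolation. This is a rough sketch, not
diffoscope's actual code: the regular expression is the one from the
patch, while the standalone function strip_links and its sample input
are made up for illustration.

```python
import re

# Same pattern as LINKS_RE in the quoted patch.
LINKS_RE = re.compile(r'Links: [0-9]+')


def strip_links(stat_output: bytes) -> bytes:
    """Remove the 'Links: N' field from stat output.

    Hypothetical standalone helper; in the patch this substitution is
    one step inside Stat.filter().
    """
    text = stat_output.decode('utf-8')
    return LINKS_RE.sub('', text).encode('utf-8')


# Two stat outputs that differ only in their link count compare equal
# once the field is stripped.
old = strip_links(b"Size: 29524 Blocks: 64\nLinks: 3\n")
new = strip_links(b"Size: 29524 Blocks: 64\nLinks: 1\n")
```

With the field removed, a difference in hard-link count alone no longer
shows up in the comparison, so only real content differences remain.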