unofficial mirror of help-guix@gnu.org 
From: "Björn Höfling" <bjoern.hoefling@bjoernhoefling.de>
To: Guix-Help <help-guix@gnu.org>
Subject: Output of guix build --check foo is not part of store deduplication
Date: Thu, 9 Aug 2018 11:45:55 +0200
Message-ID: <20180809114555.33f21652@alma-ubu>


Is there any reason why the output of 'guix build --check ...' is not
part of the store deduplication? Let me explain the problem:

When checking for (un)reproducibility, we use something like:

guix build --check -K foo

That builds foo again and, because of -K (--keep-failed), keeps the
differing output in the store as

/gnu/store/hash..-foo-1.0.0-check

You can then use diffoscope to view the differences between the
original output and the new '-check' output.

Usually, the store gets deduplicated, i.e. if files bar and baz have
the same content, they are hard links to the same inode on disk. That's
great for saving space, for example when a package gets rebuilt because
of a changed dependency but its output files change little or not at
all.
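To make the hard-linking idea concrete, here is a minimal Python sketch of content-based deduplication. It is only similar in spirit to the /gnu/store/.links directory used when optimizing the store (one canonical copy per content hash); it is not the daemon's actual code, the names file_digest and deduplicate are hypothetical, and it ignores the permission, locking and race issues a real implementation has to handle.

```python
import hashlib
import os
import tempfile
from pathlib import Path


def file_digest(path):
    """Return the SHA-256 hex digest of a file's contents."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(65536), b""):
            h.update(chunk)
    return h.hexdigest()


def deduplicate(root, links_dir):
    """Hard-link identical regular files under ROOT to one canonical copy
    stored in LINKS_DIR under its content hash.  Returns the number of
    files that were replaced by a hard link."""
    root, links_dir = Path(root), Path(links_dir)
    links_dir.mkdir(parents=True, exist_ok=True)
    # Collect the file list up front so the links we create are not revisited.
    files = sorted(p for p in root.rglob("*")
                   if p.is_file() and not p.is_symlink()
                   and links_dir not in p.parents)
    replaced = 0
    for path in files:
        canonical = links_dir / file_digest(path)
        if not canonical.exists():
            os.link(path, canonical)          # first copy of this content
        elif not os.path.samefile(path, canonical):
            tmp = path.with_name(path.name + ".dedup-tmp")
            os.link(canonical, tmp)           # new link to the shared inode...
            os.replace(tmp, path)             # ...atomically swapped in
            replaced += 1
    return replaced


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as d:
        store = Path(d)
        for out in ("foo-1.0", "foo-1.0-check"):
            (store / out).mkdir()
            (store / out / "README").write_text("same bytes\n")
        n = deduplicate(store, store / ".links")
        a = os.stat(store / "foo-1.0" / "README")
        b = os.stat(store / "foo-1.0-check" / "README")
        print(n, a.st_ino == b.st_ino, a.st_nlink)  # 1 True 3
```

After deduplication both README files share one inode, and the link count of 3 comes from the two outputs plus the canonical copy in .links.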

But the '-check' files are somehow not deduplicated, even after
forcing deduplication with 'guix gc --optimize'. You can see it like
this:

ls -l  /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz 
-r--r--r--  1 root root 624 Jan  1  1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
-r--r--r-- 11 root root 624 Jan  1  1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

ls -i  /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz 
46161304 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
45141642 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

The file in the '-check' output has a link count of 1, while the one in
the regular output has 11 links, because I already have that many store
items/generations of that package around. The inode numbers differ too.
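The same check can be scripted. This is a small sketch assuming Python 3 and only the standard library: os.path.samefile compares device and inode numbers, which is exactly what deduplication by hard links makes equal. The helper name dedup_status and the script usage are hypothetical.

```python
import os
import sys


def dedup_status(original, check):
    """Return (shared, links_original, links_check) for two files:
    SHARED is True when both paths are hard links to the same inode."""
    shared = os.path.samefile(original, check)  # compares st_dev and st_ino
    return shared, os.stat(original).st_nlink, os.stat(check).st_nlink


if __name__ == "__main__" and len(sys.argv) == 3:
    # Usage: python3 dedup_status.py ORIGINAL-FILE CHECK-FILE
    shared, n_orig, n_check = dedup_status(sys.argv[1], sys.argv[2])
    print(f"links: {n_orig} vs {n_check}; "
          f"{'same inode' if shared else 'different inodes'}")
```

Given the ls output above, running it on the two shishi_asreq.3.gz paths would be expected to report 11 vs 1 links and different inodes.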

If you now run diffoscope on them, it calls stat on each file, and we
get diffs like:

│ │   --- /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2/share/info/libchop.info.gz
│ │ ├── +++ /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2-check/share/info/libchop.info.gz
│ │ │ ├── /gnu/store/as7vb5xx7vqdwmmqj9543470r49b4c0c-coreutils-8.28/bin/stat {}
│ │ │ │ @@ -1,8 +1,8 @@
│ │ │ │  
│ │ │ │    Size: 29524          Blocks: 64         IO Block: 4096   regular file
│ │ │ │ -Links: 3
│ │ │ │ +Links: 1


This is annoying because this noise hides the actual reproducibility problems.
Is there any reason for that?
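As a quick first pass that is immune to such metadata noise, two outputs can also be compared by content alone. The following is a minimal Python sketch (not a diffoscope feature; the function names are hypothetical) that hashes every regular file, records symlink targets and directories, and lists the relative paths that differ. Because it deliberately ignores all stat metadata, it also misses genuine differences such as permission bits, so it complements diffoscope rather than replacing it.

```python
import hashlib
import os


def content_digests(root):
    """Map each path under ROOT (relative to ROOT) to a description of its
    content only: link counts, inodes and timestamps are ignored."""
    digests = {}
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            path = os.path.join(dirpath, name)
            rel = os.path.relpath(path, root)
            if os.path.islink(path):
                digests[rel] = "symlink -> " + os.readlink(path)
            elif os.path.isdir(path):
                digests[rel] = "directory"
            elif os.path.isfile(path):
                with open(path, "rb") as f:
                    digests[rel] = hashlib.sha256(f.read()).hexdigest()
            else:
                digests[rel] = "special file"  # FIFOs, devices: not read
    return digests


def content_differences(a, b):
    """Return the sorted relative paths whose content differs between
    trees A and B, including paths present in only one of them."""
    da, db = content_digests(a), content_digests(b)
    return sorted(p for p in da.keys() | db.keys() if da.get(p) != db.get(p))
```

Called with the original and the '-check' output directories, it returns an empty list when the two builds have identical file contents, regardless of how many hard links each file has.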


At least I found a very Guix-y way around it:

There's a patch by Eelco to filter those "Links:" lines out:

https://github.com/edolstra/diffoscope/commit/367f77bba8df0dbc89e63c9f66f05736adf5ec59

(with copy/paste errors):

 diffoscope/comparators/directory.py
@@ -47,14 +47,18 @@ def cmdline(self):
    FILE_RE = re.compile(r'^\s*File:.*$')
    DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
+   LINKS_RE = re.compile(r'Links: [0-9]+')
    ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
    CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
    def filter(self, line):
        line = line.decode('utf-8')
        line = Stat.FILE_RE.sub('', line)
        line = Stat.DEVICE_RE.sub('', line)
        line = Stat.INODE_RE.sub('', line)
+       line = Stat.LINKS_RE.sub('', line)
        line = Stat.ACCESS_TIME_RE.sub('', line)
        line = Stat.CHANGE_TIME_RE.sub('', line)
        return line.encode('utf-8')
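The effect of the patch can be tried outside diffoscope. Below is a standalone Python sketch that applies the same substitutions line by line to stat output; it is not diffoscope's API. The patterns are copied from the patch above, except INODE_RE, whose definition the excerpt does not show, so its pattern here is an assumption. The sample stat output is made up for illustration.

```python
import re

# Patterns from the quoted patch; INODE_RE's pattern is an assumption.
FILE_RE = re.compile(r'^\s*File:.*$')
DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
INODE_RE = re.compile(r'Inode: [0-9]+')
LINKS_RE = re.compile(r'Links: [0-9]+')
ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')

FILTERS = (FILE_RE, DEVICE_RE, INODE_RE, LINKS_RE,
           ACCESS_TIME_RE, CHANGE_TIME_RE)


def filter_stat_line(line):
    """Blank out the volatile fields of one line of stat output."""
    for regex in FILTERS:
        line = regex.sub('', line)
    return line


def filter_stat_output(text):
    return [filter_stat_line(line) for line in text.splitlines()]


# Hypothetical stat output for the same file in the two outputs.
ORIGINAL = """\
  File: a/libchop.info.gz
  Size: 29524      Blocks: 64         IO Block: 4096   regular file
Device: fd01h/64769d  Inode: 45141642  Links: 3
Modify: 1970-01-01 01:00:01.000000000 +0100
"""
CHECK = (ORIGINAL.replace("a/", "b/")
         .replace("45141642", "46161304")
         .replace("Links: 3", "Links: 1"))

print(filter_stat_output(ORIGINAL) == filter_stat_output(CHECK))  # True
```

Without LINKS_RE in FILTERS, the Device line would still differ ("Links: 3" vs "Links: 1"), which is exactly the noise shown in the diffoscope output above.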


So, I did:

guix build -S diffoscope

to get the source tarball, then unpacked the sources, applied the
patch and repacked them. Then:

guix package -i diffoscope --with-source=diffoscope-96.tar.gz

and now have a Links-free version of diffoscope in my profile. (Had I
thought about it earlier, I would have installed it in a separate
profile rather than my main one!)
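The unpack/patch/repack step can also be scripted. Here is a minimal Python sketch using the standard tarfile module; repack_with_patch and its patch_tree callback are hypothetical names, and how the callback edits the tree (for example by running patch -p1 inside the unpacked directory) is left to the caller.

```python
import tarfile
import tempfile
from pathlib import Path


def repack_with_patch(src_tarball, out_tarball, patch_tree):
    """Unpack SRC_TARBALL into a scratch directory, let PATCH_TREE(root)
    modify the sources in place, then write OUT_TARBALL (gzip)."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        with tarfile.open(str(src_tarball)) as tar:  # compression auto-detected
            if hasattr(tarfile, "data_filter"):
                # Safer extraction on Python versions that support filters.
                tar.extractall(str(root), filter="data")
            else:
                tar.extractall(str(root))
        patch_tree(root)
        with tarfile.open(str(out_tarball), "w:gz") as tar:
            for top in sorted(root.iterdir()):
                tar.add(str(top), arcname=top.name)  # adds directories recursively
```

Since --with-source derives the package name and version from the file name, the output should keep the diffoscope-96.tar.gz form.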

Björn






Thread overview: 3+ messages
2018-08-09  9:45 Björn Höfling [this message]
2018-08-10  9:16 ` Output of guix build --check foo is not part of store deduplication Chris Marusich
2018-08-24 22:08 ` Ludovic Courtès
