From: "Björn Höfling" <bjoern.hoefling@bjoernhoefling.de>
To: Guix-Help <help-guix@gnu.org>
Subject: Output of guix build --check foo is not part of store deduplication
Date: Thu, 9 Aug 2018 11:45:55 +0200
Message-ID: <20180809114555.33f21652@alma-ubu>
Is there any reason why the output of 'guix build --check ...' is not
part of deduplication? I will explain my problem:
When checking for (un)reproducibility, we use something like:
guix build --check -K foo
That will build the package foo again and produce a store output
/gnu/store/hash..-foo-1.0.0-check
You can then use diffoscope to view the difference between the old and
the new '-check' output.
Usually, the store gets deduplicated, i.e. if files bar and baz have
the same content, they will be hard links to the same thing on disk.
That's cool for saving space if, for example, some package gets updated
because of a changed dependency but there is really little or no change
to the output files.
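To make the idea concrete, here is a rough Python sketch of content-based
deduplication through hard links. It is only an illustration of the
principle, not the daemon's actual code: the function name deduplicate,
the links_dir parameter and the choice of SHA-256 are all my assumptions.

```python
import hashlib
import os


def deduplicate(paths, links_dir):
    """Hard-link files with identical content to one shared inode.

    links_dir is a hypothetical directory holding one canonical link
    per distinct content hash; it must be on the same file system.
    """
    os.makedirs(links_dir, exist_ok=True)
    for path in paths:
        with open(path, 'rb') as f:
            digest = hashlib.sha256(f.read()).hexdigest()
        canonical = os.path.join(links_dir, digest)
        if not os.path.exists(canonical):
            # First file seen with this content: it becomes the canonical copy.
            os.link(path, canonical)
        elif not os.path.samefile(path, canonical):
            # Same content, different inode: swap in a link to the canonical copy.
            tmp = path + '.tmp-link'
            os.link(canonical, tmp)
            os.replace(tmp, path)
```

After running this over two files with the same content, both share one
inode and their link count goes up accordingly.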
But the '-check' outputs are somehow not part of that deduplication,
even if you force deduplication with 'guix gc --optimize'. You can see
it like this:
ls -l /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
-r--r--r-- 1 root root 624 Jan 1 1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
-r--r--r-- 11 root root 624 Jan 1 1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
ls -i /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
46161304 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
45141642 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz
The '-check' output has a link count of only 1 while the actual output
has 11 links, because I already have so many store items/generations of
that package around. The inodes differ.
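The same check can be scripted instead of reading ls output by eye. This
is a minimal sketch; the helper names link_info and same_inode are made
up for illustration.

```python
import os


def link_info(path):
    """Return (link count, inode number) for path, like 'ls -l' and 'ls -i'."""
    st = os.stat(path)
    return st.st_nlink, st.st_ino


def same_inode(path_a, path_b):
    """Return True if both paths are hard links to the same file on disk."""
    a, b = os.stat(path_a), os.stat(path_b)
    return (a.st_dev, a.st_ino) == (b.st_dev, b.st_ino)
```

Calling same_inode on a file in the regular output and on its '-check'
counterpart should return False whenever the two are not deduplicated,
as in the listing above.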
If you now run diffoscope on them, diffoscope will call stat and we get
diffs like:
│ │ --- /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2/share/info/libchop.info.gz
│ │ ├── +++ /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2-check/share/info/libchop.info.gz
│ │ │ ├── /gnu/store/as7vb5xx7vqdwmmqj9543470r49b4c0c-coreutils-8.28/bin/stat {}
│ │ │ │ @@ -1,8 +1,8 @@
│ │ │ │
│ │ │ │ Size: 29524 Blocks: 64 IO Block: 4096 regular file
│ │ │ │ -Links: 3
│ │ │ │ +Links: 1
This is annoying because it hides the actual unreproducibility problem.
Is there any reason for that?
At least, I found a very guixy way around it:
There's a patch by Eelco to filter those Links out:
https://github.com/edolstra/diffoscope/commit/367f77bba8df0dbc89e63c9f66f05736adf5ec59
(with copy/paste errors):
diffoscope/comparators/directory.py
@@ -47,14 +47,18 @@ def cmdline(self):
     FILE_RE = re.compile(r'^\s*File:.*$')
     DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
+    LINKS_RE = re.compile(r'Links: [0-9]+')
     ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
     CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')

     def filter(self, line):
         line = line.decode('utf-8')
         line = Stat.FILE_RE.sub('', line)
         line = Stat.DEVICE_RE.sub('', line)
         line = Stat.INODE_RE.sub('', line)
+        line = Stat.LINKS_RE.sub('', line)
         line = Stat.ACCESS_TIME_RE.sub('', line)
         line = Stat.CHANGE_TIME_RE.sub('', line)
         return line.encode('utf-8')
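To see what the added LINKS_RE does in isolation, here is a standalone
sketch of the same filtering idea outside of diffoscope's Stat class.
The function name filter_stat_line is hypothetical, and the INODE_RE
pattern is my assumption, since its definition is not part of the
excerpt above.

```python
import re

FILE_RE = re.compile(r'^\s*File:.*$')
DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
INODE_RE = re.compile(r'Inode: [0-9]+')  # assumed pattern, not shown in the patch
LINKS_RE = re.compile(r'Links: [0-9]+')
ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')

PATTERNS = (FILE_RE, DEVICE_RE, INODE_RE, LINKS_RE,
            ACCESS_TIME_RE, CHANGE_TIME_RE)


def filter_stat_line(line):
    """Blank out the fields of one line of stat output that vary per copy."""
    text = line.decode('utf-8')
    for pattern in PATTERNS:
        text = pattern.sub('', text)
    return text.encode('utf-8')
```

With this, two stat lines that differ only in device, inode and link
count filter down to identical bytes, so they no longer show up as a
difference, while lines such as the Size line pass through unchanged.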
So, I did:
guix build -S diffoscope
to get the source tarball, unpacked the sources. Patched. Packed. Then:
guix package -i diffoscope --with-source=diffoscope-96.tar.gz
and now have a Links-free version of diffoscope in my profile (if I had
thought about that earlier, I would have done it in a separate profile
and not in my main one)!
Björn