unofficial mirror of help-guix@gnu.org 
* Output of guix build --check foo is not part of store deduplication
@ 2018-08-09  9:45 Björn Höfling
  2018-08-10  9:16 ` Chris Marusich
  2018-08-24 22:08 ` Ludovic Courtès
  0 siblings, 2 replies; 3+ messages in thread
From: Björn Höfling @ 2018-08-09  9:45 UTC
  To: Guix-Help

Is there any reason why the output of 'guix build --check ...' is not
part of deduplication? I will explain my problem:

When checking for (un)reproducibility, we use something like:

guix build --check -K foo

That will build the package foo again and produce a store output

/gnu/store/hash..-foo-1.0.0-check

You can then use diffoscope to view the difference between the old and
the new '-check' output.

Usually, the store gets deduplicated, i.e. if files bar and baz have
the same content, they are hard links to the same inode on disk. That's
cool for saving space if, for example, some package gets updated because
of a changed dependency but there is really little or no change to the
output files.
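
To make the mechanism concrete, here is a minimal sketch in Python of
content-based hard-linking. It is not the daemon's implementation: the
function name `deduplicate` and the `links_dir` layout are assumptions
for illustration, it hashes file contents only and ignores metadata, and
it expects `links_dir` to live outside `root` on the same file system.

```python
import hashlib
import os


def deduplicate(root, links_dir):
    """Hard-link regular files under ROOT that have identical content.

    LINKS_DIR holds one canonical link per distinct content hash
    (hypothetical layout, for illustration only).
    """
    os.makedirs(links_dir, exist_ok=True)
    for dirpath, _dirs, files in os.walk(root):
        for name in files:
            path = os.path.join(dirpath, name)
            # Skip symlinks and anything that is not a regular file.
            if os.path.islink(path) or not os.path.isfile(path):
                continue
            with open(path, "rb") as f:
                digest = hashlib.sha256(f.read()).hexdigest()
            canonical = os.path.join(links_dir, digest)
            if not os.path.exists(canonical):
                # First file with this content becomes the canonical copy.
                os.link(path, canonical)
            elif not os.path.samefile(path, canonical):
                # Replace the duplicate with a hard link to the canonical copy.
                tmp = path + ".tmp-link"
                os.link(canonical, tmp)
                os.replace(tmp, path)
```

After a run, files with equal content share one inode, and their link
count rises with every additional copy.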

But the '-check' files are somehow not part of that deduplication, even
if you force deduplication with 'guix gc --optimize'. You can see it
like this:

ls -l  /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz 
-r--r--r--  1 root root 624 Jan  1  1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
-r--r--r-- 11 root root 624 Jan  1  1970 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

ls -i  /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz 
46161304 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2-check/share/man/man3/shishi_asreq.3.gz
45141642 /gnu/store/zlxarsbwwkasy69cyv34jvzi7bgmajxz-shishi-1.0.2/share/man/man3/shishi_asreq.3.gz

The '-check' file has a link count of only 1, while the actual output has
11 links because I already have so many store items/generations of
that package around. The inode numbers differ.
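
The same comparison can be scripted instead of reading 'ls' output by
eye. A minimal sketch in Python using only the standard library; the
helper name `link_info` is made up for this example:

```python
import os


def link_info(path_a, path_b):
    """Return (same_inode, nlink_a, nlink_b) for two paths."""
    st_a = os.stat(path_a)
    st_b = os.stat(path_b)
    # Two names refer to the same file only if device and inode both match.
    same_inode = (st_a.st_dev, st_a.st_ino) == (st_b.st_dev, st_b.st_ino)
    return same_inode, st_a.st_nlink, st_b.st_nlink
```

Called on the two shishi_asreq.3.gz paths above, it would report
different inodes along with the two link counts.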

If you now diffoscope them, diffoscope will call stat and then we get
diffs like:

│ │   --- /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2/share/info/libchop.info.gz
│ │ ├── +++ /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2-check/share/info/libchop.info.gz
│ │ │ ├── /gnu/store/as7vb5xx7vqdwmmqj9543470r49b4c0c-coreutils-8.28/bin/stat {}
│ │ │ │ @@ -1,8 +1,8 @@
│ │ │ │  
│ │ │ │    Size: 29524          Blocks: 64         IO Block: 4096   regular file
│ │ │ │ -Links: 3
│ │ │ │ +Links: 1


This is annoying because it hides the actual unreproducibility problem.
Is there any reason for that?


At least, I found a very guixy way around it:

There's a patch by Eelco to filter those Links out:

https://github.com/edolstra/diffoscope/commit/367f77bba8df0dbc89e63c9f66f05736adf5ec59

(with copy/paste errors):

 diffoscope/comparators/directory.py
@@ -47,14 +47,18 @@ def cmdline(self):
    FILE_RE = re.compile(r'^\s*File:.*$')
    DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
+   LINKS_RE = re.compile(r'Links: [0-9]+')
    ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
    CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
    def filter(self, line):
        line = line.decode('utf-8')
        line = Stat.FILE_RE.sub('', line)
        line = Stat.DEVICE_RE.sub('', line)
        line = Stat.INODE_RE.sub('', line)
+       line = Stat.LINKS_RE.sub('', line)
        line = Stat.ACCESS_TIME_RE.sub('', line)
        line = Stat.CHANGE_TIME_RE.sub('', line)
        return line.encode('utf-8')
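
For anyone who wants to try the filtering logic outside of diffoscope,
here is a standalone sketch in Python. It is not diffoscope's code: the
function name `filter_stat_line` is made up, and the INODE_RE pattern is
an assumption, since its definition is missing from the excerpt above.

```python
import re

FILE_RE = re.compile(r'^\s*File:.*$')
DEVICE_RE = re.compile(r'Device: [0-9a-f]+h/[0-9]+d')
INODE_RE = re.compile(r'Inode: [0-9]+')  # assumed; not shown in the excerpt
LINKS_RE = re.compile(r'Links: [0-9]+')
ACCESS_TIME_RE = re.compile(r'^Access: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')
CHANGE_TIME_RE = re.compile(r'^Change: [0-9]{4}-[0-9]{2}-[0-9]{2}.*$')

PATTERNS = (FILE_RE, DEVICE_RE, INODE_RE, LINKS_RE,
            ACCESS_TIME_RE, CHANGE_TIME_RE)


def filter_stat_line(line: bytes) -> bytes:
    """Blank out the fields of one 'stat' output line that vary per copy."""
    text = line.decode('utf-8')
    for pattern in PATTERNS:
        text = pattern.sub('', text)
    return text.encode('utf-8')
```

With the link count filtered out, two 'stat' lines that differ only in
inode number and number of links compare equal.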


So, I did:

guix build -S diffoscope

to get the source tarball, unpacked it, applied the patch, and repacked it. Then:

guix package -i diffoscope --with-source=diffoscope-96.tar.gz

and now have a Links-free version of diffoscope in my profile (if I had
thought about that earlier, I would have done it in a separate
profile and not in my main one)!

Björn






* Re: Output of guix build --check foo is not part of store deduplication
From: Chris Marusich @ 2018-08-10  9:16 UTC
  To: Björn Höfling; +Cc: Guix-Help

Hi Björn,

In registerOutputs from nix/libstore/build.cc, it seems that if a check
was requested (i.e., buildMode == bmCheck), an exception is thrown when
the build is found to be non-deterministic.  In addition, it looks like
that would prevent execution from reaching the line farther down where
we call worker.store.optimisePath(path), which does the deduplication
when deduplication is enabled (i.e., when settings.autoOptimiseStore
evaluates to true).
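
The control flow described above can be modelled in a few lines. This
is a toy sketch in Python, not a translation of build.cc: every name in
it (`register_output`, `NotDeterministic`, the parameters) is
hypothetical, and it captures only the ordering of the two steps.

```python
class NotDeterministic(Exception):
    """Raised when a check build differs from the existing output."""


def register_output(path, build_mode, differs, auto_optimise, optimise_path):
    """Toy model: the check failure is raised before deduplication runs."""
    if build_mode == "check" and differs:
        # Execution leaves the function here, so the step below is skipped.
        raise NotDeterministic(f"output '{path}' differs between builds")
    if auto_optimise:
        optimise_path(path)
```

In this model, a failed check never reaches `optimise_path`, which
matches the single link count seen on the '-check' files.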

The current Nix source code seems to have the same structure, which
suggests that perhaps the current version of the nix-daemon behaves the
same way.  Perhaps you could reproduce the issue in Nix and ask about it
upstream?

Maybe there is a good reason why Eelco changed diffoscope to ignore the
links instead of changing the build.cc code to perform the
deduplication.  For example, maybe the work to ensure that deduplication
occurs even when a check fails is more complicated and error-prone than
simply updating diffoscope to ignore link counts.  It couldn't hurt to
ask!

-- 
Chris


* Re: Output of guix build --check foo is not part of store deduplication
From: Ludovic Courtès @ 2018-08-24 22:08 UTC
  To: Björn Höfling; +Cc: Guix-Help

Hello!

Björn Höfling <bjoern.hoefling@bjoernhoefling.de> skribis:

> Usually, the store gets deduplicated, i.e. if files bar and baz have
> the same content, they will hard-link to the same thing on disk. That's
> cool for saving space if for example some package gets updated because
> of a changed dependency but really there is no or little change to the
> output files.
>
> But the '-check' files are somehow not part of that deduplication. Even
> if you enforce deduplication with guix gc --optimize. You can see it
> like this:

I think it’s on purpose: “-check” items are not registered as valid (in
the database), and thus it makes little sense to deduplicate them since
they’ll be GC’d rapidly.  (And Chris provides useful links to the
relevant code.)

> If you now diffoscope them, diffoscope will call stat and then we get
> diffs like:
>
> │ │   --- /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2/share/info/libchop.info.gz
> │ │ ├── +++ /gnu/store/h63cx6akyrv3m73lky585ba10qq3mydc-libchop-0.5.2-check/share/info/libchop.info.gz
> │ │ │ ├── /gnu/store/as7vb5xx7vqdwmmqj9543470r49b4c0c-coreutils-8.28/bin/stat {}
> │ │ │ │ @@ -1,8 +1,8 @@
> │ │ │ │  
> │ │ │ │    Size: 29524          Blocks: 64         IO Block: 4096   regular file
> │ │ │ │ -Links: 3
> │ │ │ │ +Links: 1
>
>
> This is annoying because it hides the actual unreproducibility-problem. 
> Is there any reason for that?

I think this information is useful for .deb or RPM packages, but
typically useless for Nix and Guix because we “canonicalize” file
metadata.

Diffoscope could have a flag to disable ‘stat’ comparison altogether,
not just nlinks, but the patch by Eelco that you mention should have the
same effect in practice.

Ludo’.
