unofficial mirror of bug-guix@gnu.org 
* bug#25598: R packages are not bit-reproducible
@ 2017-02-01  9:55 Ludovic Courtès
  2017-02-01 11:08 ` Ricardo Wurmus
                   ` (3 more replies)
  0 siblings, 4 replies; 9+ messages in thread
From: Ludovic Courtès @ 2017-02-01  9:55 UTC
  To: 25598

R packages build non-deterministically:

  https://www.gnu.org/software/guix/packages/reproducibility.html

--8<---------------cut here---------------start------------->8---
$ wget -q -O - https://mirror.hydra.gnu.org/nar/imiwif0wn7dxcc7f4zdq09y1l1132pqj-r-zoo-1.7-14 | bunzip2 | guix archive -x one
$ wget -q -O - https://bayfront.guixsd.org/nar/gzip/imiwif0wn7dxcc7f4zdq09y1l1132pqj-r-zoo-1.7-14 | gunzip | guix archive -x two
$ diff -ru one two
diff -ru one/site-library/zoo/DESCRIPTION two/site-library/zoo/DESCRIPTION
--- one/site-library/zoo/DESCRIPTION	2017-02-01 10:49:49.700423133 +0100
+++ two/site-library/zoo/DESCRIPTION	2017-02-01 10:49:57.224462007 +0100
@@ -28,4 +28,4 @@
 Maintainer: Achim Zeileis <Achim.Zeileis@R-project.org>
 Repository: CRAN
 Date/Publication: 2016-12-19 09:38:14
-Built: R 3.3.2; x86_64-unknown-linux-gnu; 2017-01-15 03:12:57 UTC; unix
+Built: R 3.3.2; x86_64-unknown-linux-gnu; 2017-01-23 21:48:44 UTC; unix
Binary files one/site-library/zoo/Meta/package.rds and two/site-library/zoo/Meta/package.rds differ
--8<---------------cut here---------------end--------------->8---

First, there’s a timestamp in ‘DESCRIPTION’ (this is discussed at
<https://bugs.debian.org/782764>).

The .rds differences seem less trivial, but there’s apparently a fix at
<https://bugs.debian.org/774031>.
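
An illustrative check, not part of the original report: in the diff above, the ‘DESCRIPTION’ difference is confined to the third “;”-separated field of the “Built:” line. The shell sketch below blanks out that field before comparing two files. It assumes a “sed” that supports “-E”; the function names “normalize_built” and “same_modulo_timestamp” are hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: replace the build timestamp (third ';'-separated
# field) of the "Built:" line with a placeholder; other lines pass through.
normalize_built() {
  sed -E 's/^(Built: [^;]*;[^;]*; )[^;]*(;.*)$/\1<TIMESTAMP>\2/' "$1"
}

# Exit status 0 when the two files differ at most in the Built: timestamp.
same_modulo_timestamp() {
  [ "$(normalize_built "$1")" = "$(normalize_built "$2")" ]
}
```

Run on “one/site-library/zoo/DESCRIPTION” and “two/site-library/zoo/DESCRIPTION”, it should succeed if the timestamp is the only difference. It says nothing about the binary ‘Meta/package.rds’.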

Ludo’.

^ permalink raw reply	[flat|nested] 9+ messages in thread

* bug#25598: R packages are not bit-reproducible
  2017-02-01  9:55 bug#25598: R packages are not bit-reproducible Ludovic Courtès
@ 2017-02-01 11:08 ` Ricardo Wurmus
  2017-02-01 13:00   ` Ludovic Courtès
       [not found] ` <idj4m03whdy.fsf@bimsb-sys02.mdc-berlin.net>
                   ` (2 subsequent siblings)
  3 siblings, 1 reply; 9+ messages in thread
From: Ricardo Wurmus @ 2017-02-01 11:08 UTC
  To: Ludovic Courtès; +Cc: 25598

[-- Attachment #1: Type: text/plain, Size: 163 bytes --]


It looks like R 3.3.2 already includes the fixes, but they need to be
explicitly requested when installing packages.

Attached is a patch that seems to fix this.


[-- Attachment #2: 0001-build-r-build-system-Use-deterministic-built-date.patch --]
[-- Type: text/x-patch, Size: 1340 bytes --]

From fa42971cb7099e3b370565de5d3f454faecf0369 Mon Sep 17 00:00:00 2001
From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
Date: Wed, 1 Feb 2017 11:42:34 +0100
Subject: [PATCH] build: r-build-system: Use deterministic built date.

Fixes <http://bugs.gnu.org/25598>.

* guix/build/r-build-system.scm (install): Pass "--built-timestamp"
option to make build deterministic.
---
 guix/build/r-build-system.scm | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/guix/build/r-build-system.scm b/guix/build/r-build-system.scm
index 3fc13eb83..24aa73d4f 100644
--- a/guix/build/r-build-system.scm
+++ b/guix/build/r-build-system.scm
@@ -1,5 +1,5 @@
 ;;; GNU Guix --- Functional package management for GNU
-;;; Copyright © 2015 Ricardo Wurmus <rekado@elephly.net>
+;;; Copyright © 2015, 2017 Ricardo Wurmus <rekado@elephly.net>
 ;;;
 ;;; This file is part of GNU Guix.
 ;;;
@@ -84,6 +84,7 @@
          (params       (append configure-flags
                                (list "--install-tests"
                                      (string-append "--library=" site-library)
+                                     "--built-timestamp=1970-01-01"
                                      ".")))
          (site-path    (string-append site-library ":"
                                       (generate-site-path inputs))))
-- 
2.11.0
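
For readers who do not follow the Scheme, here is a rough shell rendering of what the patched “params” list amounts to. It assumes, without the patch confirming it, that the install phase hands “params” to “R CMD INSTALL”. “r_install_command” is a hypothetical dry-run helper that only prints the command line and does not run R.

```shell
#!/bin/sh
# Hypothetical dry-run helper: print the command line the patched install
# phase is assumed to run. $1 is the site-library directory; any further
# arguments stand in for configure-flags, which come first, mirroring
# (append configure-flags (list ...)) in the patch.
r_install_command() {
  lib=$1
  shift
  set -- R CMD INSTALL "$@" --install-tests "--library=$lib" \
    --built-timestamp=1970-01-01 .
  # "$*" joins the words with single spaces (default IFS).
  printf '%s\n' "$*"
}
```

For example, “r_install_command /tmp/site-library” prints “R CMD INSTALL --install-tests --library=/tmp/site-library --built-timestamp=1970-01-01 .”.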


* bug#25598: R packages are not bit-reproducible
  2017-02-01 11:08 ` Ricardo Wurmus
@ 2017-02-01 13:00   ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2017-02-01 13:00 UTC
  To: Ricardo Wurmus; +Cc: 25598

Hi!

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:

> From fa42971cb7099e3b370565de5d3f454faecf0369 Mon Sep 17 00:00:00 2001
> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
> Date: Wed, 1 Feb 2017 11:42:34 +0100
> Subject: [PATCH] build: r-build-system: Use deterministic built date.
>
> Fixes <http://bugs.gnu.org/25598>.
>
> * guix/build/r-build-system.scm (install): Pass "--built-timestamp"
> option to make build deterministic.

Great.  I think it’s fine for master: that’s 276 packages, but they don’t
take long to build.

Does that also help with the .rds discrepancies?

Thank you for the super-fast reply!

Ludo’.

* bug#25598: [PATCH] More reproducibility fixes for R.
       [not found] ` <idj4m03whdy.fsf@bimsb-sys02.mdc-berlin.net>
@ 2017-02-10 12:38   ` Ludovic Courtès
  2017-03-08 17:56   ` Ricardo Wurmus
  1 sibling, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2017-02-10 12:38 UTC
  To: Ricardo Wurmus; +Cc: Guix-devel, 25598

Hi!

Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:

> attached are more reproducibility fixes for R.  Unfortunately, it seems
> that files of type “rdb”, “rdx”, and “rds” are still not reproducible.
> This leaves us with the following files in R that are currently not
> reproducible:

Could it be that --built-timestamp is not honored for R modules within R?
Do the Debian patches mentioned in #25598 help?

> From e8cd2114b824ab6fed671c2214956ee22deeaedf Mon Sep 17 00:00:00 2001
> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
> Date: Thu, 9 Feb 2017 14:34:57 +0100
> Subject: [PATCH 1/2] gnu: r: Fix syntax for INSTALL_OPTS.
>
> This is a follow-up to commit 4621acfd8272fa93d0530faa5f015b26a194b587.
>
> * gnu/packages/statistics.scm (r)[arguments]: Ensure that
> "--built-timestamp" appears on the same line as the other INSTALL_OPTS.

So the previous attempt had no effect, right?

LGTM.

> From 95b939f662a29b3cc6973a2fba286f32faf010c1 Mon Sep 17 00:00:00 2001
> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
> Date: Thu, 9 Feb 2017 15:40:02 +0100
> Subject: [PATCH 2/2] gnu: r: Fix more reproducibility problems.
>
> * gnu/packages/statistics.scm (r)[arguments]: Patch locations in the
> build system that need special treatment for reproducibility.

LGTM, thanks!

Ludo’.

* bug#25598: [PATCH] More reproducibility fixes for R.
       [not found] ` <87bmuafcgg.fsf@gnu.org>
@ 2017-03-08 11:53   ` Ricardo Wurmus
  0 siblings, 0 replies; 9+ messages in thread
From: Ricardo Wurmus @ 2017-03-08 11:53 UTC
  To: Ludovic Courtès; +Cc: Guix-devel, 25598


Ludovic Courtès <ludo@gnu.org> writes:

> Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> skribis:
>
>> attached are more reproducibility fixes for R.  Unfortunately, it seems
>> that files of type “rdb”, “rdx”, and “rds” are still not reproducible.
>> This leaves us with the following files in R that are currently not
>> reproducible:
>
> Could it be that --built-timestamp is not honored for R modules within
> R?

With these two patches the flag *should* be honoured.  I don’t yet
understand where the .rds differences come from, but I’ll
investigate this now.

> Do the Debian patches mentioned in #25598 help?

R 3.3.2 already includes the patches that were posted on Debian bug
#774031.  The patch at #782764 is the equivalent of our change to the
r-build-system to pass the flag down to R packages.


>> From e8cd2114b824ab6fed671c2214956ee22deeaedf Mon Sep 17 00:00:00 2001
>> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
>> Date: Thu, 9 Feb 2017 14:34:57 +0100
>> Subject: [PATCH 1/2] gnu: r: Fix syntax for INSTALL_OPTS.
>>
>> This is a follow-up to commit 4621acfd8272fa93d0530faa5f015b26a194b587.
>>
>> * gnu/packages/statistics.scm (r)[arguments]: Ensure that
>> "--built-timestamp" appears on the same line as the other INSTALL_OPTS.
>
> So the previous attempt had no effect, right?

Yeah, it was not effective, and I failed to use “guix build --check”
properly (it has to be run without grafts), so I thought everything was
already fine.

>> From 95b939f662a29b3cc6973a2fba286f32faf010c1 Mon Sep 17 00:00:00 2001
>> From: Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de>
>> Date: Thu, 9 Feb 2017 15:40:02 +0100
>> Subject: [PATCH 2/2] gnu: r: Fix more reproducibility problems.
>>
>> * gnu/packages/statistics.scm (r)[arguments]: Patch locations in the
>> build system that need special treatment for reproducibility.
>
> LGTM, thanks!

I pushed both to master.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* bug#25598: [PATCH] More reproducibility fixes for R.
       [not found] ` <idj4m03whdy.fsf@bimsb-sys02.mdc-berlin.net>
  2017-02-10 12:38   ` bug#25598: [PATCH] More reproducibility fixes for R Ludovic Courtès
@ 2017-03-08 17:56   ` Ricardo Wurmus
  1 sibling, 0 replies; 9+ messages in thread
From: Ricardo Wurmus @ 2017-03-08 17:56 UTC
  To: Guix-devel; +Cc: 25598


Ricardo Wurmus <ricardo.wurmus@mdc-berlin.de> writes:

> attached are more reproducibility fixes for R.  Unfortunately, it seems
> that files of type “rdb”, “rdx”, and “rds” are still not reproducible.
> This leaves us with the following files in R that are currently not
> reproducible:

[…]
> /lib/R/library/boot/help/paths.rds
> /lib/R/library/class/help/paths.rds
> /lib/R/library/cluster/help/paths.rds
> /lib/R/library/codetools/help/paths.rds
> /lib/R/library/foreign/help/paths.rds
> /lib/R/library/KernSmooth/help/paths.rds
> /lib/R/library/lattice/help/paths.rds
> /lib/R/library/MASS/help/paths.rds
> /lib/R/library/Matrix/help/paths.rds
> /lib/R/library/mgcv/help/paths.rds
> /lib/R/library/nlme/help/paths.rds
> /lib/R/library/nnet/help/paths.rds
> /lib/R/library/rpart/help/paths.rds
> /lib/R/library/spatial/help/paths.rds
> /lib/R/library/survival/help/paths.rds
[…]
>
> I’ll try to figure out if there’s something we can do to make them
> reproducible (there’s a Debian bug report with relevant information).  I
> had originally assumed that 3.3.2 already included fixes for this.

The paths.rds files contain temporary paths like this:

    /tmp/guix-build-r-3.3.2.drv-0/RtmpCmeE9W/R.INSTALL43fb733deccc/survival/

These paths contain random strings produced by “mkdtemp”, which is
called in “src/main/sysutils.c”.
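
To illustrate why such paths differ between builds, the sketch below creates directories from an “RtmpXXXXXX” template. The template is an assumption, chosen to resemble the “RtmpCmeE9W” component above. “mktemp -d” replaces the trailing X's with random characters, much as “mkdtemp” does, so two builds record two different paths. “make_rtmp” is a hypothetical name.

```shell
#!/bin/sh
# Create a fresh directory from an "Rtmp" template; mktemp replaces the
# trailing XXXXXX with random characters, similar to mkdtemp(3).
make_rtmp() {
  mktemp -d "${TMPDIR:-/tmp}/RtmpXXXXXX"
}

first=$(make_rtmp)
second=$(make_rtmp)
# Two "builds" get two different directory names, so any file that records
# the path (such as help/paths.rds) differs between them as well.
printf '%s\n%s\n' "$first" "$second"
rmdir "$first" "$second"
```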

I don’t know if we need these files.  All of them are part of the
recommended packages.  I don’t know if these are also built by Debian.

I patched the package in a previous commit to override the built
timestamp, and it does seem to have an effect on the DESCRIPTION file,
but it does not affect the .rd* files.  More investigation required.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

* bug#25598: [PATCH] gnu: r: Fix remaining reproducibility problems.
  2017-02-01  9:55 bug#25598: R packages are not bit-reproducible Ludovic Courtès
                   ` (2 preceding siblings ...)
       [not found] ` <87bmuafcgg.fsf@gnu.org>
@ 2017-03-16  7:54 ` Ricardo Wurmus
  2017-03-16  9:00   ` Ludovic Courtès
  3 siblings, 1 reply; 9+ messages in thread
From: Ricardo Wurmus @ 2017-03-16  7:54 UTC
  To: 25598

Fixes <https://bugs.gnu.org/25598>.

* gnu/packages/statistics.scm (r)[arguments]: Add remaining reproducibility
fixes to "build-reproducibly" phase.
---
 gnu/packages/statistics.scm | 35 ++++++++++++++++++++++++++++++++++-
 1 file changed, 34 insertions(+), 1 deletion(-)

diff --git a/gnu/packages/statistics.scm b/gnu/packages/statistics.scm
index 656895273..2a20abd86 100644
--- a/gnu/packages/statistics.scm
+++ b/gnu/packages/statistics.scm
@@ -134,11 +134,44 @@ be output in text, PostScript, PDF or HTML.")
              #t))
          (add-after 'unpack 'build-reproducibly
            (lambda _
-             ;; Ensure that gzipped files are reproducible
+             ;; The documentation contains time stamps to demonstrate
+             ;; documentation generation in different phases.
+             (substitute* "src/library/tools/man/Rd2HTML.Rd"
+               (("\\\\%Y-\\\\%m-\\\\%d at \\\\%H:\\\\%M:\\\\%S")
+                "(removed for reproducibility)"))
+
+             ;; Remove timestamp from tracing environment.  This fixes
+             ;; reproducibility of "methods.rd{b,x}".
+             (substitute* "src/library/methods/R/trace.R"
+               (("dateCreated = Sys.time\\(\\)")
+                "dateCreated = as.POSIXct(\"1970-1-1 00:00:00\", tz = \"UTC\")"))
+
+             ;; Ensure that gzipped files are reproducible.
              (substitute* '("src/library/grDevices/Makefile.in"
                             "doc/manual/Makefile.in")
                (("R_GZIPCMD\\)" line)
                 (string-append line " -n")))
+
+             ;; The "srcfile" procedure in "src/library/base/R/srcfile.R"
+             ;; queries the mtime of a given file and records it in an object.
+             ;; This is acceptable at runtime to detect stale source files,
+             ;; but it destroys reproducibility at build time.
+             ;;
+             ;; Instead of disabling this feature, which may have unexpected
+             ;; consequences, we reset the mtime of generated files before
+             ;; passing them to the "srcfile" procedure.
+             (substitute* "src/library/Makefile.in"
+               (("@\\(cd base && \\$\\(MAKE\\) mkdesc\\)" line)
+                (string-append line "\n	find $(top_builddir)/library/tools | xargs touch -d '1970-01-01'; \n"))
+               (("@\\$\\(MAKE\\) Rdobjects" line)
+                (string-append "@find $(srcdir)/tools | xargs touch -d '1970-01-01'; \n	"
+                               line)))
+             (substitute* "src/library/tools/Makefile.in"
+               (("@\\$\\(INSTALL_DATA\\) all.R \\$\\(top_builddir\\)/library/\\$\\(pkg\\)/R/\\$\\(pkg\\)" line)
+                (string-append
+                 line
+                 "\n	find $(srcdir)/$(pkg) $(top_builddir)/library/$(pkg) | xargs touch -d \"1970-01-01\"; \n")))
+
              ;; This library is installed using "install_package_description",
              ;; so we need to pass the "builtStamp" argument.
              (substitute* "src/library/tools/Makefile.in"
-- 
2.12.0
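
The two generic techniques in this patch, “gzip -n” and resetting modification times, can be tried in isolation. The sketch below assumes GNU gzip, touch, find and stat, and the helper names are hypothetical. It deliberately differs from the patch in two small ways. It spells out UTC, because a bare '1970-01-01' is interpreted in the local time zone. It also uses “find -exec ... {} +” instead of “find | xargs”, which splits file names on whitespace.

```shell
#!/bin/sh
# Hypothetical helpers; GNU gzip, touch, find and stat are assumed.

# Compress FILE twice with "gzip -n", changing its mtime in between.
# Exit status 0 means both outputs are byte-identical, i.e. the file's
# name and time stamp were not stored in the gzip header.
gzip_n_ignores_mtime() {
  gz_file=$1
  gz_dir=$(mktemp -d) || return 2
  touch -d '2017-01-15 03:12:57 UTC' "$gz_file"
  gzip -n -c "$gz_file" > "$gz_dir/a.gz"
  touch -d '2017-01-23 21:48:44 UTC' "$gz_file"
  gzip -n -c "$gz_file" > "$gz_dir/b.gz"
  cmp -s "$gz_dir/a.gz" "$gz_dir/b.gz"
  gz_status=$?
  rm -r "$gz_dir"
  return "$gz_status"
}

# Set the mtime of DIR and everything below it to the Unix epoch.
# An explicit "UTC" keeps the result independent of the local time zone,
# and "-exec ... {} +" copes with file names containing spaces.
reset_mtimes() {
  find "$1" -exec touch -d '1970-01-01 00:00:00 UTC' {} +
}
```

After “reset_mtimes "$dir"”, GNU “stat -c %Y” should report 0 for every file under “$dir”.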

* bug#25598: [PATCH] gnu: r: Fix remaining reproducibility problems.
  2017-03-16  7:54 ` bug#25598: [PATCH] gnu: r: Fix remaining reproducibility problems Ricardo Wurmus
@ 2017-03-16  9:00   ` Ludovic Courtès
  2017-03-17  9:17     ` Ricardo Wurmus
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2017-03-16  9:00 UTC
  To: Ricardo Wurmus; +Cc: 25598

Hello!

Ricardo Wurmus <rekado@elephly.net> skribis:

> Fixes <https://bugs.gnu.org/25598>.
>
> * gnu/packages/statistics.scm (r)[arguments]: Add remaining reproducibility
> fixes to "build-reproducibly" phase.

Woow, impressive work.  You’re a reproducibility hero!

You’re welcome to push to master.

Thank you!

Ludo’.

* bug#25598: [PATCH] gnu: r: Fix remaining reproducibility problems.
  2017-03-16  9:00   ` Ludovic Courtès
@ 2017-03-17  9:17     ` Ricardo Wurmus
  0 siblings, 0 replies; 9+ messages in thread
From: Ricardo Wurmus @ 2017-03-17  9:17 UTC
  To: Ludovic Courtès; +Cc: 25598-done


Ludovic Courtès <ludo@gnu.org> writes:

> You’re welcome to push to master.

Done with commit 60c9190e21edfaa3a18be857b9a906b8521e948b.  Thanks for
the quick review!

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net

Thread overview: 9+ messages
2017-02-01  9:55 bug#25598: R packages are not bit-reproducible Ludovic Courtès
2017-02-01 11:08 ` Ricardo Wurmus
2017-02-01 13:00   ` Ludovic Courtès
     [not found] ` <idj4m03whdy.fsf@bimsb-sys02.mdc-berlin.net>
2017-02-10 12:38   ` bug#25598: [PATCH] More reproducibility fixes for R Ludovic Courtès
2017-03-08 17:56   ` Ricardo Wurmus
     [not found] ` <87bmuafcgg.fsf@gnu.org>
2017-03-08 11:53   ` Ricardo Wurmus
2017-03-16  7:54 ` bug#25598: [PATCH] gnu: r: Fix remaining reproducibility problems Ricardo Wurmus
2017-03-16  9:00   ` Ludovic Courtès
2017-03-17  9:17     ` Ricardo Wurmus

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
