all messages for Guix-related lists mirrored at yhetil.org
* Reproducible jar archives.
@ 2015-12-30 19:03 Ricardo Wurmus
  2016-01-01 15:51 ` Ludovic Courtès
From: Ricardo Wurmus @ 2015-12-30 19:03 UTC
  To: guix-devel

Hi Guix,

Java libraries are usually installed as jar archives.  These are
(optionally compressed) archives containing the “.class” files;
optionally, they may also contain a manifest.

These archives are created by the “jar” tool, which is part of the JDK;
they can also be created by “fastjar”, an alternative implementation in
C.

Unlike “tar”, the “jar” tool does not support anything like the “--mtime”
flag, so the modification dates recorded in the archive cannot be
overridden at packing time.  Also, the manifest is usually generated
automatically (unless manually provided), and the generated directories
and the manifest file itself have uncontrollable timestamps.

Although compiling “.java” to “.class” with the javac compiler is
deterministic, packaging up the “.class” files into a “.jar” is not, for
the above reasons.

For Java projects without a build system this is not so bad: we can
generate our own build.xml and build the project with Ant.  The
build.xml would just have to run a command to reset the modification
time of all files before archiving them.  (This is what my WIP
‘ant-build-system’ does and packages built with it are, in fact,
deterministic.)
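
To illustrate the reset step, here is a minimal sketch in Python.  It is
not the code used by ‘ant-build-system’; the function name and the
default timestamp are assumptions made for this example.  The default is
1980-01-02 UTC because the zip format cannot represent dates before 1980.

```python
import os

# Hypothetical default: 1980-01-02 00:00:00 UTC, safely inside the range
# that the zip (and therefore jar) format can represent.
DEFAULT_TIMESTAMP = 315619200


def reset_mtimes(root, timestamp=DEFAULT_TIMESTAMP):
    """Set the access and modification time of every file and directory
    under `root` (and of `root` itself) to `timestamp`."""
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            os.utime(os.path.join(dirpath, name), (timestamp, timestamp))
    os.utime(root, (timestamp, timestamp))
```

Running something like this on the directory of “.class” files right
before the archiving step means “jar” only ever sees the fixed
timestamp for those files.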

For many non-trivial projects, however, we have no way to easily inject
a phase before “jar” is called, because of some indirection through
build plugins (in the case of Maven projects) or because of a convoluted
build.xml for Ant, which may use different Ant tasks to package the
“.jar”.  I only came up with two ways to get around this problem:

* build the “.jar” archive as usual, then unpack it, reset all file
  timestamps, and archive them again (this time without generating an
  additional manifest)

* patch up the “jar” tool so that it resets the mtime for archive
  contents.

The first approach might work if we can reliably find a point in the
build process where all “jar” archives have been created (after the
“build” phase?).  It’s an ugly solution because we have to do so much
more work for every “jar” (unpacking, touching, and repacking can take a
lot of time and space for big projects).
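
A rough sketch of the first approach, assuming Python's standard
“zipfile” module is acceptable as a stand-in for “jar”.  It rewrites the
archive entry by entry instead of unpacking to disk.  The function name,
the fixed date, and the choice to sort entries with the manifest first
are assumptions for this example, not something an existing tool
prescribes.

```python
import zipfile

# Assumed fixed date: the earliest one the zip format can represent.
FIXED_DATE = (1980, 1, 1, 0, 0, 0)


def _entry_order(name):
    # Keep the manifest directory and the manifest at the front, then
    # sort everything else by name so the order is stable.
    if name == "META-INF/":
        return (0, name)
    if name == "META-INF/MANIFEST.MF":
        return (1, name)
    return (2, name)


def normalize_jar(src, dst):
    """Copy the jar `src` to `dst` (paths or file objects), giving every
    entry the same timestamp and a stable order."""
    with zipfile.ZipFile(src) as zin, zipfile.ZipFile(dst, "w") as zout:
        for info in sorted(zin.infolist(),
                           key=lambda i: _entry_order(i.filename)):
            clean = zipfile.ZipInfo(info.filename, date_time=FIXED_DATE)
            clean.compress_type = info.compress_type
            clean.create_system = info.create_system
            clean.external_attr = info.external_attr
            # Extra fields (which may hold further timestamps) are
            # deliberately not copied.
            zout.writestr(clean, zin.read(info))
```

No new manifest is generated here; the existing one is copied like any
other entry.  The compressed bytes still depend on the zlib version in
use, so this only helps when the build environment is itself fixed.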

The second approach is tempting.  I looked at the “fastjar” code and
making it set the file modification time is trivial; of course this must
be optional (e.g. enabled only when a certain environment variable is
set) lest it affect operation under normal conditions.  But it’s also
really hacky.

I haven’t looked at the sources for the JDK-provided “jar” tool yet, so
I cannot say if that would work.

What do you think about this?  Other ideas maybe?

~~ Ricardo


* Re: Reproducible jar archives.
  2015-12-30 19:03 Reproducible jar archives Ricardo Wurmus
@ 2016-01-01 15:51 ` Ludovic Courtès
From: Ludovic Courtès @ 2016-01-01 15:51 UTC
  To: Ricardo Wurmus; +Cc: guix-devel

Ricardo Wurmus <rekado@elephly.net> skribis:

> These archives are created by the “jar” tool, which is part of the JDK;
> they can also be created by “fastjar”, an alternative implementation in
> C.
>
> Unlike “tar”, the “jar” tool does not support anything like the “--mtime”
> flag, so the modification dates recorded in the archive cannot be
> overridden at packing time.  Also, the manifest is usually generated
> automatically (unless manually provided), and the generated directories
> and the manifest file itself have uncontrollable timestamps.

See also
<https://wiki.debian.org/ReproducibleBuilds/TimestampsInJarFiles>.

Debian leaves jar/fastjar unchanged and instead runs
‘strip-nondeterminism’ to remove the timestamps.

> For Java projects without a build system this is not so bad: we can
> generate our own build.xml and build the project with Ant.  The
> build.xml would just have to run a command to reset the modification
> time of all files before archiving them.  (This is what my WIP
> ‘ant-build-system’ does and packages built with it are, in fact,
> deterministic.)

OK.

> For many non-trivial projects, however, we have no way to easily inject
> a phase before “jar” is called, because of some indirection through
> build plugins (in the case of Maven projects) or because of a convoluted
> build.xml for Ant, which may use different Ant tasks to package the
> “.jar”.  I only came up with two ways to get around this problem:
>
> * build the “.jar” archive as usual, then unpack it, reset all file
>   timestamps, and archive them again (this time without generating an
>   additional manifest)
>
> * patch up the “jar” tool so that it resets the mtime for archive
>   contents.
>
> The first approach might work if we can reliably find a point in the
> build process where all “jar” archives have been created (after the
> “build” phase?).  It’s an ugly solution because we have to do so much
> more work for every “jar” (unpacking, touching, and repacking can take a
> lot of time and space for big projects).

Yeah.  ‘strip-nondeterminism’ essentially does that, I think.
Alternatively, since a jar is just a zip file, a tool with some knowledge
of the zip format could modify the timestamps in place.  FWIW, Göran
Weinholt’s Industria has Scheme code to deal with zip files.
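
To make the “directly modify the timestamps” idea concrete, here is a
minimal sketch in Python (not Scheme, and not how ‘strip-nondeterminism’
is implemented).  It overwrites the DOS time and date fields in each
central directory entry and in the matching local file header.  The
function name and the fixed date are assumptions; zip64 archives and
timestamps stored in extra fields are not handled.

```python
import struct

# Assumed fixed date, 1980-01-01 00:00:00, in DOS encoding:
# date = (year - 1980) << 9 | month << 5 | day; time = 0.
DOS_DATE = (0 << 9) | (1 << 5) | 1
DOS_TIME = 0


def patch_zip_timestamps(data):
    """Return a copy of the zip archive `data` (bytes) with every entry's
    modification time set to the fixed date.  Sizes and offsets are
    unchanged because only fixed-width fields are overwritten."""
    buf = bytearray(data)
    eocd = buf.rfind(b"PK\x05\x06")  # end-of-central-directory record
    if eocd < 0:
        raise ValueError("no end-of-central-directory record found")
    # Total entry count, central directory size, central directory offset.
    count, _cd_size, pos = struct.unpack_from("<HII", buf, eocd + 10)
    for _ in range(count):
        if buf[pos:pos + 4] != b"PK\x01\x02":
            raise ValueError("bad central directory entry")
        struct.pack_into("<HH", buf, pos + 12, DOS_TIME, DOS_DATE)
        name_len, extra_len, comment_len = struct.unpack_from(
            "<HHH", buf, pos + 28)
        (local,) = struct.unpack_from("<I", buf, pos + 42)
        if buf[local:local + 4] != b"PK\x03\x04":
            raise ValueError("bad local file header")
        struct.pack_into("<HH", buf, local + 10, DOS_TIME, DOS_DATE)
        pos += 46 + name_len + extra_len + comment_len
    return bytes(buf)
```

Compared with unpacking and repacking, this touches a few bytes per
entry and never recompresses anything, so it stays cheap even for big
archives.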

> The second approach is tempting.  I looked at the “fastjar” code and
> making it set the file modification time is trivial; of course this must
> be optional (e.g. enabled only when a certain environment variable is
> set) lest it affect operation under normal conditions.  But it’s also
> really hacky.
>
> I haven’t looked at the sources for the JDK-provided “jar” tool yet, so
> I cannot say if that would work.

Maybe we could modify jar & fastjar to honor SOURCE_DATE_EPOCH and even
submit the change upstream for discussion.
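
As a sketch of what “honoring SOURCE_DATE_EPOCH” could look like, the
Python snippet below picks the timestamp to record for archive entries.
The function name is hypothetical and this is not a patch for jar or
fastjar; it only shows the decision logic.  Values before 1980 are
clamped, since the zip format cannot represent them.

```python
import os
import time

ZIP_EPOCH = 315532800  # 1980-01-01 00:00:00 UTC, earliest zip date


def archive_date_time(environ=os.environ):
    """Return the (year, month, day, hour, minute, second) tuple to
    record for archive entries."""
    raw = environ.get("SOURCE_DATE_EPOCH")
    if raw is None:
        # Variable unset: behave as usual and use the current local time.
        return time.localtime()[:6]
    seconds = max(int(raw), ZIP_EPOCH)
    return time.gmtime(seconds)[:6]
```

Because the variable is only consulted when set, normal operation is
unaffected, which addresses the concern about the change being optional.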

> What do you think about this?  Other ideas maybe?

It may be worth discussing with the other “reproducible build” folks at
<https://lists.reproducible-builds.org/listinfo/rb-general> and/or on
#reproducible-builds on OFTC.  They probably have useful ideas!

Thanks,
Ludo’.
