Storing serialised graph along with packages

unofficial mirror of guix-devel@gnu.org 

* Storing serialised graph along with packages
@ 2017-07-22 11:32 Ricardo Wurmus
  2017-07-23  5:11 ` Catonano
                   ` (2 more replies)
  0 siblings, 3 replies; 8+ messages in thread
From: Ricardo Wurmus @ 2017-07-22 11:32 UTC (permalink / raw)
  To: guix-devel

Hey Guix,

it always bothered me that after building a package we lose all of the
beautiful features that Guix as a Guile library gives us.  We always
need to keep track of the Guix version at the time of building the
package and only then can we hope to rebuild the same thing again at
some point in the future.

What do you think about storing the serialised subset of the package
graph in a separate output of the package?  Currently, the only place
where we store anything meta is the database.  Wouldn’t it be great if
we could “dump an image” of the state of Guile when it has evaluated the
section of the package graph that is needed to build it?

Then we could just load the serialised state into Guile at a later point
and inspect the package graph as if we had Guix checked out at the given
version.  I suppose we could also store this kind of information in the
database.

I’d really like the graph to stay alive even after Guix has moved on to
later versions.  It also sounds like a really lispy thing to do.

What do you think?

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net


* Re: Storing serialised graph along with packages
  2017-07-22 11:32 Storing serialised graph along with packages Ricardo Wurmus
@ 2017-07-23  5:11 ` Catonano
  2017-07-23  9:30 ` Ricardo Wurmus
  2017-07-24 12:33 ` Ludovic Courtès
  2 siblings, 0 replies; 8+ messages in thread
From: Catonano @ 2017-07-23  5:11 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

2017-07-22 13:32 GMT+02:00 Ricardo Wurmus <rekado@elephly.net>:

> Hey Guix,
>
> it always bothered me that after building a package we lose all of the
> beautiful features that Guix as a Guile library gives us.  We always
> need to keep track of the Guix version at the time of building the
> package and only then can we hope to rebuild the same thing again at
> some point in the future.
>
> What do you think about storing the serialised subset of the package
> graph in a separate output of the package?  Currently, the only place
> where we store anything meta is the database.  Wouldn’t it be great if
> we could “dump an image” of the state of Guile when it has evaluated the
> section of the package graph that is needed to build it?
>
> Then we could just load the serialised state into Guile at a later point
> and inspect the package graph as if we had Guix checked out at the given
> version.  I suppose we could also store this kind of information in the
> database.
>
> I’d really like the graph to stay alive even after Guix has moved on to
> later versions.  It also sounds like a really lispy thing to do.
>
> What do you think?
>

In the Clojure world, there's Datomic

On the Datomic web site there are sentences like

"Build flexible, distributed systems that can leverage the entire history
of your critical data, not just the most current state"

"Critical insights come from knowing the full story of your data, not just
the most recent state."

Disclaimer: I never used Datomic, I only read about it.

I think that would be an extremely cool tool; it could be useful for a lot
of applications, not only Guix.

It could be useful for academia, both for teaching and research.
Stored computations can be very important.
A tool like that could have a place in the "data science" arena.

AND it could be an option for Datomic users (sorry Rich Hickey), so it
could attract some Clojure people to the copyleft world.

Also, the ability to store "images" borrows from the Lisp and Smalltalk
tradition

So, this is an extremely cool idea

Too bad I wouldn't know where to start :-/


* Re: Storing serialised graph along with packages
  2017-07-22 11:32 Storing serialised graph along with packages Ricardo Wurmus
  2017-07-23  5:11 ` Catonano
@ 2017-07-23  9:30 ` Ricardo Wurmus
  2017-07-23 14:53   ` Ricardo Wurmus
  2017-07-23 17:33   ` Jan Nieuwenhuizen
  2017-07-24 12:33 ` Ludovic Courtès
  2 siblings, 2 replies; 8+ messages in thread
From: Ricardo Wurmus @ 2017-07-23  9:30 UTC (permalink / raw)
  To: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> What do you think about storing the serialised subset of the package
> graph in a separate output of the package?

Here’s a simple and stupid way of getting a little closer to this idea.
This is serialize.scm:

--8<---------------cut here---------------start------------->8---
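(use-modules (system base language)
             (system base compile)
             (ice-9 match))

;; Usage: guile serialize.scm EXPR TARGET
;; EXPR is a string holding a Scheme expression; TARGET is the output file.
(match-let (((_ expr target) (command-line)))
  ;; Printing with ~a splices EXPR into the form as code, not as a string.
  (with-input-from-string
      (format #f "~a" `(begin (use-modules (guix packages))
                              ,expr))
    (lambda _
      (call-with-output-file target
        (lambda (out)
          ;; Compile the form to bytecode and write it to TARGET.
          ((language-printer (lookup-language 'bytecode))
           (read-and-compile (current-input-port)
                             #:env (default-environment 'scheme)
                             #:from 'scheme #:to 'bytecode
                             #:opts (list #:to-file? #t))
           out))))))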
--8<---------------cut here---------------end--------------->8---

You run it like this:

--8<---------------cut here---------------start------------->8---
    ./pre-inst-env guile --no-auto-compile serialize.scm \
        '(@@ (gnu packages statistics) r)' \
        /tmp/foo.go
--8<---------------cut here---------------end--------------->8---

You end up with /tmp/foo.go, a little binary which appears to contain
everything you need to play with the closure of the “r” package.

--8<---------------cut here---------------start------------->8---
rekado in guix: guile
GNU Guile 2.2.2
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (load-compiled "/tmp/foo.go")
$1 = #<package r@3.4.0 gnu/packages/statistics.scm:577 21c9000>
scheme@(guile-user)> (package-inputs $1)
$2 = (("pango" #<package pango@1.40.6 gnu/packages/gtk.scm:210 1f6bb40>) ("coreutils" #<package coreutils@8.26 gnu/packages/base.scm:286 1ed96c0>) ("curl" #<package curl@7.53.0 gnu/packages/curl.scm:41 1c29240>) ("openblas" #<package openblas@0.2.19 gnu/packages/maths.scm:2281 1cb5600>) ("gfortran" #<package gfortran@5.4.0 gnu/packages/gcc.scm:488 206a000>) ("icu4c" #<package icu4c@58.2 gnu/packages/icu4c.scm:33 1927840>) ("libjpeg" #<package libjpeg@9b gnu/packages/image.scm:193 20f5f00>) ("libpng" #<package libpng@1.6.28 gnu/packages/image.scm:67 20e1240>) ("libtiff" #<package libtiff@4.0.7 gnu/packages/image.scm:334 20f5b40>) ("libxt" #<package libxt@1.1.5 gnu/packages/xorg.scm:5223 20d2240>) ("pcre" #<package pcre@8.40 gnu/packages/pcre.scm:34 1742e40>) ("readline" #<package readline@7.0 gnu/packages/readline.scm:43 16c50c0>) ("zlib" #<package zlib@1.2.11 gnu/packages/compression.scm:62 157d900>))
scheme@(guile-user)> (cadar $2)
$3 = #<package pango@1.40.6 gnu/packages/gtk.scm:210 2bd9b40>
scheme@(guile-user)> (package-description $3)
$4 = "Pango is the core text and font handling library used in GNOME\napplications.  It has extensive support for the different writing systems\nused throughout the world."
scheme@(guile-user)>
--8<---------------cut here---------------end--------------->8---

“/tmp/foo.go” in the above example is 68K in size.  It is not stripped
and I cannot find which options I can pass to the compiler to make it
smaller.

What do you think?  Is this useful or is it silly?  I’m pretty excited
about it.  The only limitation for future exploration is that one may
need a legacy version of Guile in order to read the possibly outdated
ELF format currently used by Guile 2.2.2.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net


* Re: Storing serialised graph along with packages
  2017-07-23  9:30 ` Ricardo Wurmus
@ 2017-07-23 14:53   ` Ricardo Wurmus
  2017-07-23 17:33   ` Jan Nieuwenhuizen
  1 sibling, 0 replies; 8+ messages in thread
From: Ricardo Wurmus @ 2017-07-23 14:53 UTC (permalink / raw)
  To: guix-devel


Ricardo Wurmus <rekado@elephly.net> writes:

> You run it like this:
>
> --8<---------------cut here---------------start------------->8---
>     ./pre-inst-env guile --no-auto-compile serialize.scm \
>         '(@@ (gnu packages statistics) r)' \
>         /tmp/foo.go
> --8<---------------cut here---------------end--------------->8---
>
> You end up with /tmp/foo.go, a little binary which appears to contain
> everything you need to play with the closure of the “r” package.

Of course, I’m wrong.  I forgot about GUILE_LOAD_PATH and
GUILE_LOAD_COMPILED_PATH, which were still active.  Unsetting them shows
that this doesn’t actually work at all :(

I wonder if this is possible at all.

-- 
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net


* Re: Storing serialised graph along with packages
  2017-07-23  9:30 ` Ricardo Wurmus
  2017-07-23 14:53   ` Ricardo Wurmus
@ 2017-07-23 17:33   ` Jan Nieuwenhuizen
  1 sibling, 0 replies; 8+ messages in thread
From: Jan Nieuwenhuizen @ 2017-07-23 17:33 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Ricardo Wurmus writes:

> Here’s a simple and stupid way of getting a little closer to this idea.
> This is serialize.scm:

A stupid example is even nicer!

> What do you think?  Is this useful or is it silly?  I’m pretty excited
> about it.  The only limitation for future exploration is that one may
> need a legacy version of Guile in order to read the possibly outdated
> elf format currently used by Guile 2.2.2.

I like the idea a lot.  It has really annoyed me a couple of times to
have a profile that "works" but that I have no clue where I created it
from and that I cannot reproduce.

Greetings,
janneke

-- 
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.com


* Re: Storing serialised graph along with packages
  2017-07-22 11:32 Storing serialised graph along with packages Ricardo Wurmus
  2017-07-23  5:11 ` Catonano
  2017-07-23  9:30 ` Ricardo Wurmus
@ 2017-07-24 12:33 ` Ludovic Courtès
  2017-07-24 16:43   ` Ricardo Wurmus
  2 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2017-07-24 12:33 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hello!

Ricardo Wurmus <rekado@elephly.net> skribis:

> it always bothered me that after building a package we lose all of the
> beautiful features that Guix as a Guile library gives us.  We always
> need to keep track of the Guix version at the time of building the
> package and only then can we hope to rebuild the same thing again at
> some point in the future.
>
> What do you think about storing the serialised subset of the package
> graph in a separate output of the package?  Currently, the only place
> where we store anything meta is the database.  Wouldn’t it be great if
> we could “dump an image” of the state of Guile when it has evaluated the
> section of the package graph that is needed to build it?
>
> Then we could just load the serialised state into Guile at a later point
> and inspect the package graph as if we had Guix checked out at the given
> version.  I suppose we could also store this kind of information in the
> database.
>
> I’d really like the graph to stay alive even after Guix has moved on to
> later versions.  It also sounds like a really lispy thing to do.

I sympathize with the goal, and I like the parallel with Lisp.

However I’m skeptical about our ability to do something that is robust
enough.  The package → bag → derivation compilation process is “lossy”
in the sense that at each layer we lose a bit of context from the higher
layers.  Each arrow potentially involves all the code and package
definitions of Guix, as opposed to just a subset of the package
definitions.  We could certainly serialize package objects to sexps, but
that would not capture the implementation of build systems,
‘package-derivation’, or even lower-level primitives.  So this would be
a rough approximation, at best.

The safe way to achieve what you want would be to store the whole Guix
tree (+ GUIX_PACKAGE_PATH), or a pointer to that (a Git commit).

There’s also the problem of bit-for-bit reproducibility: there’s an
infinite set of source trees that can lead to a given store item.  If we
stored along with, say, Emacs, the Guix source tree/commit that led to
it, then we’d effectively remove that equivalence (whitespace would
become significant, for instance[*].)
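
To make the whitespace remark concrete, here is a minimal Guile sketch.
It is not from the thread, and the package-like expressions are made up
for illustration.  Two source texts that differ only in layout read to
the same S-expression, yet they differ as strings, so anything keyed on
the source text would tell them apart.

```scheme
;; Two hypothetical source texts that differ only in whitespace.
(define source-a "(package (name \"emacs\") (inputs ()))")
(define source-b "(package\n  (name   \"emacs\")\n  (inputs ()))")

;; Read a single datum from a string.
(define (string->datum str)
  (call-with-input-string str read))

;; The parsed form is identical ...
(define same-datum?
  (equal? (string->datum source-a) (string->datum source-b)))

;; ... but the raw text is not.
(define same-text?
  (string=? source-a source-b))

(format #t "same datum: ~a, same text: ~a~%" same-datum? same-text?)
```

Storing the source tree next to the store item would move the identity
of the result from the first comparison to the second.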

Tough issue!

Ludo’.

[*] For gexps we have a similar problem, where we’d like to have source
    code mapping, but how that mapping is done should not affect
    the .drv itself.


* Re: Storing serialised graph along with packages
  2017-07-24 12:33 ` Ludovic Courtès
@ 2017-07-24 16:43   ` Ricardo Wurmus
  2017-07-25  8:14     ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Ricardo Wurmus @ 2017-07-24 16:43 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

Hi,

> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> it always bothered me that after building a package we lose all of the
>> beautiful features that Guix as a Guile library gives us.  We always
>> need to keep track of the Guix version at the time of building the
>> package and only then can we hope to rebuild the same thing again at
>> some point in the future.
>>
>> What do you think about storing the serialised subset of the package
>> graph in a separate output of the package?  Currently, the only place
>> where we store anything meta is the database.  Wouldn’t it be great if
>> we could “dump an image” of the state of Guile when it has evaluated the
>> section of the package graph that is needed to build it?
>>
>> Then we could just load the serialised state into Guile at a later point
>> and inspect the package graph as if we had Guix checked out at the given
>> version.  I suppose we could also store this kind of information in the
>> database.
>>
>> I’d really like the graph to stay alive even after Guix has moved on to
>> later versions.  It also sounds like a really lispy thing to do.
>
> I sympathize with the goal, and I like the parallel with Lisp.
>
> However I’m skeptical about our ability to do something that is robust
> enough.  The package → bag → derivation compilation process is “lossy”
> in the sense that at each layer we lose a bit of context from the higher
> layers.  Each arrow potentially involves all the code and package
> definitions of Guix, as opposed to just a subset of the package
> definitions.  We could certainly serialize package objects to sexps, but
> that would not capture the implementation of build systems,
> ‘package-derivation’, or even lower-level primitives.  So this would be
> a rough approximation, at best.

Yes, indeed.  My goal is to get a *better* approximation than what the
references database currently gives us.

Out of curiosity I’ve been playing with serialisation on the train ride
and build systems are indeed a problem.  In my tests I just skipped
them until I figured something out.

I played with cutting out the sources for the package expression (using
“package-location”) and compiling the record to a file.  Unfortunately,
this won’t work for packages that are the result of generator procedures
(like “gfortran”).

My current approach is just to go through each field of a package record
to generate an S-expression representing the package object, and then to
compile that.  In a clean environment I can load that module along with
copies of the modules under the “guix” directory that implement things
like “url-fetch” or the search-path-specifications record.
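
A minimal sketch of the field-by-field approach, assuming a toy record
in place of Guix’s real package record (which has many more fields,
including the build system that is skipped here).  The names
<toy-package> and toy-package->sexp are hypothetical and exist only for
this illustration.

```scheme
(use-modules (srfi srfi-9))

;; Hypothetical stand-in for a package record.
(define-record-type <toy-package>
  (make-toy-package name version inputs)
  toy-package?
  (name    toy-package-name)
  (version toy-package-version)
  (inputs  toy-package-inputs))   ; list of (label package) entries

;; Walk the fields and produce a plain S-expression.  Inputs are recorded
;; by label, name and version only, so the graph has to be re-linked when
;; the serialised form is loaded again.
(define (toy-package->sexp pkg)
  `(package
     (name ,(toy-package-name pkg))
     (version ,(toy-package-version pkg))
     (inputs ,(map (lambda (input)
                     (let ((label (car input))
                           (dep   (cadr input)))
                       (list label
                             (toy-package-name dep)
                             (toy-package-version dep))))
                   (toy-package-inputs pkg)))))

(define zlib (make-toy-package "zlib" "1.2.11" '()))
(define r    (make-toy-package "r" "3.4.0" `(("zlib" ,zlib))))

(write (toy-package->sexp r))
(newline)
```

The resulting datum can be written to a file or compiled; anything that
is a procedure rather than data (build systems, origin methods) needs a
separate treatment.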

To be able to traverse the dependency graph, one must load additional
modules for each of the store items making up the package closure.
(This would require that in addition to just embedded references we
would need to record the store items that were present at build time,
but that’s easy.)
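
As an illustration of the traversal, here is a small sketch that
computes a closure from recorded direct references.  The table
%references is made up; real entries would be store file names recorded
at build time.

```scheme
;; Hypothetical table mapping each item to its direct dependencies.
(define %references
  '(("r"     . ("pango" "zlib"))
    ("pango" . ("glib"))
    ("glib"  . ("zlib"))
    ("zlib"  . ())))

(define (direct-references item)
  (or (assoc-ref %references item) '()))

;; Depth-first traversal returning every item reachable from ITEM,
;; including ITEM itself, each listed once.
(define (closure item)
  (let loop ((todo (list item))
             (seen '()))
    (cond ((null? todo) (reverse seen))
          ((member (car todo) seen) (loop (cdr todo) seen))
          (else
           (loop (append (direct-references (car todo)) (cdr todo))
                 (cons (car todo) seen))))))

(display (closure "r"))
(newline)
```

Each item in the returned list stands for one serialised module that
would have to be loaded before the graph can be walked.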

> The safe way to achieve what you want would be to store the whole Guix
> tree (+ GUIX_PACKAGE_PATH), or a pointer to that (a Git commit).
>
> There’s also the problem of bit-for-bit reproducibility: there’s an
> infinite set of source trees that can lead to a given store item.  If we
> stored along with, say, Emacs, the Guix source tree/commit that led to
> it, then we’d effectively remove that equivalence (whitespace would
> become significant, for instance[*].)

Hmm, that’s true.  And it’s not just a problem of sources.  We might
still introduce unimportant differences if we only serialised the
compiled objects and completely excluded the plain text source code,
e.g. when we refactor supporting code that has no impact on the value of
the result but which would lead to a change in the compiled module.

Can we separate the two?  Instead of installing modules (or the whole
Guix tree) into the output directory of a store item, could we instead
treat them like a table in the database?  Building that part would not
be part of the package derivation; it would just be a pre- or
post-processing step, like registering the references in the database.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6  2150 197A 5888 235F ACAC
https://elephly.net


* Re: Storing serialised graph along with packages
  2017-07-24 16:43   ` Ricardo Wurmus
@ 2017-07-25  8:14     ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2017-07-25  8:14 UTC (permalink / raw)
  To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> skribis:

> Yes, indeed.  My goal is to get a *better* approximation than what the
> references database currently gives us.

I think the problem is that this would remain an approximation; people
might get a false sense that they can “decompile” a store item to a
package object and then be disappointed.

> Out of curiosity I’ve been playing with serialisation on the train ride
> and build systems are indeed a problem.  In my tests I just skipped
> them until I figured something out.
>
> I played with cutting out the sources for the package expression (using
> “package-location”) and compiling the record to a file.  Unfortunately,
> this won’t work for packages that are the result of generator procedures
> (like “gfortran”).
>
> My current approach is just to go through each field of a package record
> to generate an S-expression representing the package object, and then to
> compile that.  In a clean environment I can load that module along with
> copies of the modules under the “guix” directory that implement things
> like “url-fetch” or the search-path-specifications record.
>
> To be able to traverse the dependency graph, one must load additional
> modules for each of the store items making up the package closure.
> (This would require that in addition to just embedded references we
> would need to record the store items that were present at build time,
> but that’s easy.)

‘source-module-closure’ might be helpful:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(guix)
scheme@(guile-user)> ,use(guix modules)
scheme@(guile-user)> (length (source-module-closure '((gnu packages gcc))))
$2 = 272
--8<---------------cut here---------------end--------------->8---

>> The safe way to achieve what you want would be to store the whole Guix
>> tree (+ GUIX_PACKAGE_PATH), or a pointer to that (a Git commit).
>>
>> There’s also the problem of bit-for-bit reproducibility: there’s an
>> infinite set of source trees that can lead to a given store item.  If we
>> stored along with, say, Emacs, the Guix source tree/commit that led to
>> it, then we’d effectively remove that equivalence (whitespace would
>> become significant, for instance[*].)
>
> Hmm, that’s true.  And it’s not just a problem of sources.  We might
> still introduce unimportant differences if we only serialised the
> compiled objects and completely excluded the plain text source code,
> e.g. when we refactor supporting code that has no impact on the value of
> the result but which would lead to a change in the compiled module.
>
> Can we separate the two?  Instead of installing modules (or the whole
> Guix tree) into the output directory of a store item, could we instead
> treat them like a table in the database?  Building that part would not
> be part of the package derivation; it would just be a pre- or
> post-processing step, like registering the references in the database.

To me the source/store mapping should be a separate service.

I imagine we could have some sort of a ledger that maps Git commits to
sets of store items (we could even call that a “blockchain” and be
buzzword-compliant ;-)).  Guix could come with a library to maintain
such a database, and ‘guix publish’ could even publish it.  We’d have
tools to query that database both for mappings and reverse-mappings,
things like that.
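
A rough sketch of what such a mapping could look like, as an in-memory
table only.  Every name here (%ledger, ledger-record!, commit->items,
item->commits) is hypothetical, the commit identifiers are made up, and
the store file names use a placeholder instead of a real hash; nothing
like this exists in Guix as far as this thread says.

```scheme
;; Hypothetical ledger: commit identifier -> list of store items.
(define %ledger (make-hash-table))

(define (ledger-record! commit item)
  "Record that building at COMMIT produced store ITEM."
  (let ((items (hash-ref %ledger commit '())))
    (unless (member item items)
      (hash-set! %ledger commit (cons item items)))))

(define (commit->items commit)
  "Forward mapping: the store items recorded for COMMIT."
  (hash-ref %ledger commit '()))

(define (item->commits item)
  "Reverse mapping: every commit known to produce ITEM."
  (hash-fold (lambda (commit items result)
               (if (member item items)
                   (cons commit result)
                   result))
             '()
             %ledger))

;; Two different commits may lead to the same store item.
(ledger-record! "commit-a" "/gnu/store/<hash>-emacs")
(ledger-record! "commit-b" "/gnu/store/<hash>-emacs")
(ledger-record! "commit-b" "/gnu/store/<hash>-r-3.4.0")

(display (sort (item->commits "/gnu/store/<hash>-emacs") string<?))
(newline)
```

Keeping the table outside the store means that many commits can point
at one store item without changing that item.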

(There are also connections with the “binary transparency” ledger
discussed at the R-B summit.)

WDYT?

Ludo’.

