* Storing serialised graph along with packages
From: Ricardo Wurmus @ 2017-07-22 11:32 UTC
To: guix-devel

Hey Guix,

it always bothered me that after building a package we lose all of the
beautiful features that Guix as a Guile library gives us. We always
need to keep track of the Guix version at the time of building the
package and only then can we hope to rebuild the same thing again at
some point in the future.

What do you think about storing the serialised subset of the package
graph in a separate output of the package? Currently, the only place
where we store anything meta is the database. Wouldn’t it be great if
we could “dump an image” of the state of Guile when it has evaluated the
section of the package graph that is needed to build it?

Then we could just load the serialised state into Guile at a later point
and inspect the package graph as if we had Guix checked out at the given
version. I suppose we could also store this kind of information in the
database.

I’d really like the graph to stay alive even after Guix has moved on to
later versions. It also sounds like a really lispy thing to do.

What do you think?

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
* Re: Storing serialised graph along with packages
From: Catonano @ 2017-07-23 5:11 UTC
To: Ricardo Wurmus; +Cc: guix-devel

2017-07-22 13:32 GMT+02:00 Ricardo Wurmus <rekado@elephly.net>:

> Hey Guix,
>
> it always bothered me that after building a package we lose all of the
> beautiful features that Guix as a Guile library gives us. We always
> need to keep track of the Guix version at the time of building the
> package and only then can we hope to rebuild the same thing again at
> some point in the future.
>
> What do you think about storing the serialised subset of the package
> graph in a separate output of the package? Currently, the only place
> where we store anything meta is the database. Wouldn’t it be great if
> we could “dump an image” of the state of Guile when it has evaluated the
> section of the package graph that is needed to build it?
>
> Then we could just load the serialised state into Guile at a later point
> and inspect the package graph as if we had Guix checked out at the given
> version. I suppose we could also store this kind of information in the
> database.
>
> I’d really like the graph to stay alive even after Guix has moved on to
> later versions. It also sounds like a really lispy thing to do.
>
> What do you think?
>

In the Clojure world, there's Datomic.

On the Datomic web site there are sentences like

"Build flexible, distributed systems that can leverage the entire history
of your critical data, not just the most current state"

"Critical insights come from knowing the full story of your data, not
just the most recent state."

Disclaimer: I never used Datomic, I only read about it.

I think that would be an extremely cool tool; it could be useful for a
lot of applications, not only Guix.

It could be useful for academia, both for teaching and research. Stored
computations can be very important.

A tool like that could have a place in the "data science" arena AND it
could be an option for Datomic users (sorry Rich Hickey), so it could
attract some Clojure people to the copyleft world.

Also, the ability to store "images" borrows from the Lisp and Smalltalk
tradition.

So, this is an extremely cool idea.

Too bad I wouldn't know where to start from :-/
* Re: Storing serialised graph along with packages
From: Ricardo Wurmus @ 2017-07-23 9:30 UTC
To: guix-devel

Ricardo Wurmus <rekado@elephly.net> writes:

> What do you think about storing the serialised subset of the package
> graph in a separate output of the package?

Here’s a simple and stupid way of getting a little closer to this idea.
This is serialize.scm:

--8<---------------cut here---------------start------------->8---
(use-modules (system base language)
             (system base compile)
             (ice-9 match))

;; Usage: guile serialize.scm EXPR TARGET
;; Compile EXPR (a string holding a Scheme expression) to bytecode,
;; with (guix packages) in scope, and write the result to TARGET.
(match-let (((_ expr target) (command-line)))
  (with-input-from-string
      ;; "~a" splices EXPR in as code rather than as a string literal.
      (format #f "~a"
              `(begin
                 (use-modules (guix packages))
                 ,expr))
    (lambda _
      (call-with-output-file target
        (lambda (out)
          ((language-printer (lookup-language 'bytecode))
           (read-and-compile (current-input-port)
                             #:env (default-environment 'scheme)
                             #:from 'scheme
                             #:to 'bytecode
                             #:opts (list #:to-file? #t))
           out))))))
--8<---------------cut here---------------end--------------->8---

You run it like this:

--8<---------------cut here---------------start------------->8---
./pre-inst-env guile --no-auto-compile serialize.scm \
  '(@@ (gnu packages statistics) r)' \
  /tmp/foo.go
--8<---------------cut here---------------end--------------->8---

You end up with /tmp/foo.go, a little binary which appears to contain
everything you need to play with the closure of the “r” package.

--8<---------------cut here---------------start------------->8---
rekado in guix: guile
GNU Guile 2.2.2
Copyright (C) 1995-2017 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guile-user)> (load-compiled "/tmp/foo.go")
$1 = #<package r@3.4.0 gnu/packages/statistics.scm:577 21c9000>
scheme@(guile-user)> (package-inputs $1)
$2 = (("pango" #<package pango@1.40.6 gnu/packages/gtk.scm:210 1f6bb40>)
 ("coreutils" #<package coreutils@8.26 gnu/packages/base.scm:286 1ed96c0>)
 ("curl" #<package curl@7.53.0 gnu/packages/curl.scm:41 1c29240>)
 ("openblas" #<package openblas@0.2.19 gnu/packages/maths.scm:2281 1cb5600>)
 ("gfortran" #<package gfortran@5.4.0 gnu/packages/gcc.scm:488 206a000>)
 ("icu4c" #<package icu4c@58.2 gnu/packages/icu4c.scm:33 1927840>)
 ("libjpeg" #<package libjpeg@9b gnu/packages/image.scm:193 20f5f00>)
 ("libpng" #<package libpng@1.6.28 gnu/packages/image.scm:67 20e1240>)
 ("libtiff" #<package libtiff@4.0.7 gnu/packages/image.scm:334 20f5b40>)
 ("libxt" #<package libxt@1.1.5 gnu/packages/xorg.scm:5223 20d2240>)
 ("pcre" #<package pcre@8.40 gnu/packages/pcre.scm:34 1742e40>)
 ("readline" #<package readline@7.0 gnu/packages/readline.scm:43 16c50c0>)
 ("zlib" #<package zlib@1.2.11 gnu/packages/compression.scm:62 157d900>))
scheme@(guile-user)> (cadar $2)
$3 = #<package pango@1.40.6 gnu/packages/gtk.scm:210 2bd9b40>
scheme@(guile-user)> (package-description $3)
$4 = "Pango is the core text and font handling library used in GNOME\napplications. It has extensive support for the different writing systems\nused throughout the world."
scheme@(guile-user)>
--8<---------------cut here---------------end--------------->8---

“/tmp/foo.go” in the above example is 68K in size. It is not stripped
and I cannot find what options I can pass to the compiler to get it
smaller.

What do you think? Is this useful or is it silly? I’m pretty excited
about it. The only limitation for future exploration is that one may
need a legacy version of Guile in order to read the possibly outdated
ELF format currently used by Guile 2.2.2.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
* Re: Storing serialised graph along with packages
From: Ricardo Wurmus @ 2017-07-23 14:53 UTC
To: guix-devel

Ricardo Wurmus <rekado@elephly.net> writes:

> You run it like this:
>
> --8<---------------cut here---------------start------------->8---
> ./pre-inst-env guile --no-auto-compile serialize.scm \
>   '(@@ (gnu packages statistics) r)' \
>   /tmp/foo.go
> --8<---------------cut here---------------end--------------->8---
>
> You end up with /tmp/foo.go, a little binary which appears to contain
> everything you need to play with the closure of the “r” package.

Of course, I’m wrong. I forgot about GUILE_LOAD_PATH and
GUILE_LOAD_COMPILED_PATH, which were still active. Unsetting them shows
that this doesn’t actually work at all :(

I wonder if this is possible at all.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
* Re: Storing serialised graph along with packages
From: Jan Nieuwenhuizen @ 2017-07-23 17:33 UTC
To: Ricardo Wurmus; +Cc: guix-devel

Ricardo Wurmus writes:

> Here’s a simple and stupid way of getting a little closer to this idea.
> This is serialize.scm:

A stupid example is even nicer!

> What do you think? Is this useful or is it silly? I’m pretty excited
> about it. The only limitation for future exploration is that one may
> need a legacy version of Guile in order to read the possibly outdated
> ELF format currently used by Guile 2.2.2.

I like the idea a lot. It has really annoyed me a couple of times to
have a profile that "works" that I have no clue where I created it from
and that I cannot reproduce.

Greetings,
janneke

--
Jan Nieuwenhuizen <janneke@gnu.org> | GNU LilyPond http://lilypond.org
Freelance IT http://JoyofSource.com | Avatar® http://AvatarAcademy.com
* Re: Storing serialised graph along with packages
From: Ludovic Courtès @ 2017-07-24 12:33 UTC
To: Ricardo Wurmus; +Cc: guix-devel

Hello!

Ricardo Wurmus <rekado@elephly.net> skribis:

> it always bothered me that after building a package we lose all of the
> beautiful features that Guix as a Guile library gives us. We always
> need to keep track of the Guix version at the time of building the
> package and only then can we hope to rebuild the same thing again at
> some point in the future.
>
> What do you think about storing the serialised subset of the package
> graph in a separate output of the package? Currently, the only place
> where we store anything meta is the database. Wouldn’t it be great if
> we could “dump an image” of the state of Guile when it has evaluated the
> section of the package graph that is needed to build it?
>
> Then we could just load the serialised state into Guile at a later point
> and inspect the package graph as if we had Guix checked out at the given
> version. I suppose we could also store this kind of information in the
> database.
>
> I’d really like the graph to stay alive even after Guix has moved on to
> later versions. It also sounds like a really lispy thing to do.

I sympathize with the goal, and I like the parallel with Lisp.

However I’m skeptical about our ability to do something that is robust
enough. The package → bag → derivation compilation process is “lossy”
in the sense that at each layer we lose a bit of context from the higher
layers. Each arrow potentially involves all the code and package
definitions of Guix, as opposed to just a subset of the package
definitions. We could certainly serialize package objects to sexps, but
that would not capture the implementation of build systems,
‘package-derivation’, or even lower-level primitives. So this would be
a rough approximation, at best.

The safe way to achieve what you want would be to store the whole Guix
tree (+ GUIX_PACKAGE_PATH), or a pointer to that (a Git commit).

There’s also the problem of bit-for-bit reproducibility: there’s an
infinite set of source trees that can lead to a given store item. If we
stored along with, say, Emacs, the Guix source tree/commit that led to
it, then we’d effectively remove that equivalence (whitespace would
become significant, for instance[*].)

Tough issue!

Ludo’.

[*] For gexps we have a similar problem, where we’d like to have source
    code mapping, but how that mapping is done should not affect the
    .drv itself.
* Re: Storing serialised graph along with packages
From: Ricardo Wurmus @ 2017-07-24 16:43 UTC
To: Ludovic Courtès; +Cc: guix-devel

Hi,

> Ricardo Wurmus <rekado@elephly.net> skribis:
>
>> it always bothered me that after building a package we lose all of the
>> beautiful features that Guix as a Guile library gives us. We always
>> need to keep track of the Guix version at the time of building the
>> package and only then can we hope to rebuild the same thing again at
>> some point in the future.
>>
>> What do you think about storing the serialised subset of the package
>> graph in a separate output of the package? Currently, the only place
>> where we store anything meta is the database. Wouldn’t it be great if
>> we could “dump an image” of the state of Guile when it has evaluated the
>> section of the package graph that is needed to build it?
>>
>> Then we could just load the serialised state into Guile at a later point
>> and inspect the package graph as if we had Guix checked out at the given
>> version. I suppose we could also store this kind of information in the
>> database.
>>
>> I’d really like the graph to stay alive even after Guix has moved on to
>> later versions. It also sounds like a really lispy thing to do.
>
> I sympathize with the goal, and I like the parallel with Lisp.
>
> However I’m skeptical about our ability to do something that is robust
> enough. The package → bag → derivation compilation process is “lossy”
> in the sense that at each layer we lose a bit of context from the higher
> layers. Each arrow potentially involves all the code and package
> definitions of Guix, as opposed to just a subset of the package
> definitions. We could certainly serialize package objects to sexps, but
> that would not capture the implementation of build systems,
> ‘package-derivation’, or even lower-level primitives. So this would be
> a rough approximation, at best.

Yes, indeed. My goal is to get a *better* approximation than what the
references database currently gives us.

Out of curiosity I’ve been playing with serialisation on the train ride
and build systems are indeed a problem. In my tests I just skipped
them until I figured something out.

I played with cutting out the sources for the package expression (using
“package-location”) and compiling the record to a file. Unfortunately,
this won’t work for packages that are the result of generator procedures
(like “gfortran”).

My current approach is just to go through each field of a package record
to generate an S-expression representing the package object, and then to
compile that. In a clean environment I can load that module along with
copies of the modules under the “guix” directory that implement things
like “url-fetch” or the search-path-specifications record.

To be able to traverse the dependency graph, one must load additional
modules for each of the store items making up the package closure.
(This would require that in addition to just embedded references we
would need to record the store items that were present at build time,
but that’s easy.)

> The safe way to achieve what you want would be to store the whole Guix
> tree (+ GUIX_PACKAGE_PATH), or a pointer to that (a Git commit).
>
> There’s also the problem of bit-for-bit reproducibility: there’s an
> infinite set of source trees that can lead to a given store item. If we
> stored along with, say, Emacs, the Guix source tree/commit that led to
> it, then we’d effectively remove that equivalence (whitespace would
> become significant, for instance[*].)

Hmm, that’s true. And it’s not just a problem of sources. We might
still introduce unimportant differences if we only serialised the
compiled objects and completely excluded the plain text source code,
e.g. when we refactor supporting code that has no impact on the value of
the result but which would lead to a change in the compiled module.

Can we separate the two? Instead of installing modules (or the whole
Guix tree) into the output directory of a store item, could we instead
treat them like a table in the database? Building that part would not
be part of the package derivation; it would just be a pre- or
post-processing step, like registering the references in the database.

--
Ricardo

GPG: BCA6 89B6 3655 3801 C3C6 2150 197A 5888 235F ACAC
https://elephly.net
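Ricardo's field-by-field approach, and Ludo's objection that sexps cannot capture the implementation of build systems, can be illustrated with a minimal sketch in plain Guile. It assumes a hypothetical `<toy-package>` record standing in for Guix's much larger `<package>` record; none of the names below are Guix API. Data fields survive a `write`/`read` round trip, while the build system, being a procedure, can only be recorded by name.

```scheme
(use-modules (srfi srfi-9)
             (ice-9 match))

;; Hypothetical stand-in for Guix's <package> record, reduced to four
;; fields.  Inputs are (label package) lists, like the `package-inputs'
;; REPL output earlier in this thread.
(define-record-type <toy-package>
  (make-toy-package name version inputs build-system)
  toy-package?
  (name         toy-package-name)
  (version      toy-package-version)
  (inputs       toy-package-inputs)
  (build-system toy-package-build-system))   ; a procedure, i.e. code

(define (toy-package->sexp pkg)
  "Walk the fields of PKG and return a plain S-expression.  Inputs are
serialised recursively; the build system is a procedure, which cannot be
written as data, so only its name is kept."
  `(toy-package
    (name ,(toy-package-name pkg))
    (version ,(toy-package-version pkg))
    (inputs ,(map (match-lambda
                    ((label input)
                     (list label (toy-package->sexp input))))
                  (toy-package-inputs pkg)))
    (build-system ,(or (procedure-name (toy-package-build-system pkg))
                       'anonymous))))

;; Hypothetical build procedure and packages.
(define (gnu-build-system . args) 'built)
(define zlib (make-toy-package "zlib" "1.2.11" '() gnu-build-system))
(define r (make-toy-package "r" "3.4.0" `(("zlib" ,zlib)) gnu-build-system))

;; Round trip through text, as a stored "image" would require.
(define text
  (with-output-to-string (lambda () (write (toy-package->sexp r)))))
(define restored (with-input-from-string text read))
```

Reading `text` back yields an S-expression equal to the one that was written, yet nothing in it can reconstruct the behaviour of `gnu-build-system`: the lossiness described above, in miniature.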
* Re: Storing serialised graph along with packages
From: Ludovic Courtès @ 2017-07-25 8:14 UTC
To: Ricardo Wurmus; +Cc: guix-devel

Hi,

Ricardo Wurmus <rekado@elephly.net> skribis:

> Yes, indeed. My goal is to get a *better* approximation than what the
> references database currently gives us.

I think the problem is that this would remain an approximation; people
might get a false sense that they can “decompile” a store item to a
package object and then be disappointed.

> Out of curiosity I’ve been playing with serialisation on the train ride
> and build systems are indeed a problem. In my tests I just skipped
> them until I figured something out.
>
> I played with cutting out the sources for the package expression (using
> “package-location”) and compiling the record to a file. Unfortunately,
> this won’t work for packages that are the result of generator procedures
> (like “gfortran”).
>
> My current approach is just to go through each field of a package record
> to generate an S-expression representing the package object, and then to
> compile that. In a clean environment I can load that module along with
> copies of the modules under the “guix” directory that implement things
> like “url-fetch” or the search-path-specifications record.
>
> To be able to traverse the dependency graph, one must load additional
> modules for each of the store items making up the package closure.
> (This would require that in addition to just embedded references we
> would need to record the store items that were present at build time,
> but that’s easy.)

‘source-module-closure’ might be helpful:

--8<---------------cut here---------------start------------->8---
scheme@(guile-user)> ,use(guix)
scheme@(guile-user)> ,use(guix modules)
scheme@(guile-user)> (length (source-module-closure '((gnu packages gcc))))
$2 = 272
--8<---------------cut here---------------end--------------->8---

>> The safe way to achieve what you want would be to store the whole Guix
>> tree (+ GUIX_PACKAGE_PATH), or a pointer to that (a Git commit).
>>
>> There’s also the problem of bit-for-bit reproducibility: there’s an
>> infinite set of source trees that can lead to a given store item. If we
>> stored along with, say, Emacs, the Guix source tree/commit that led to
>> it, then we’d effectively remove that equivalence (whitespace would
>> become significant, for instance[*].)
>
> Hmm, that’s true. And it’s not just a problem of sources. We might
> still introduce unimportant differences if we only serialised the
> compiled objects and completely excluded the plain text source code,
> e.g. when we refactor supporting code that has no impact on the value of
> the result but which would lead to a change in the compiled module.
>
> Can we separate the two? Instead of installing modules (or the whole
> Guix tree) into the output directory of a store item, could we instead
> treat them like a table in the database? Building that part would not
> be part of the package derivation; it would just be a pre- or
> post-processing step, like registering the references in the database.

To me the source/store mapping should be a separate service.

I imagine we could have some sort of a ledger that maps Git commits to
sets of store items (we could even call that a “blockchain” and be
buzzword-compliant ;-)). Guix could come with a library to maintain
such a database, and ‘guix publish’ could even publish it. We’d have
tools to query that database both for mappings and reverse-mappings,
things like that.

(There are also connections with the “binary transparency” ledger
discussed at the R-B summit.)

WDYT?

Ludo’.
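The commit-to-store-items ledger sketched in words above can be modelled, very roughly, as two in-memory hash tables, one per direction. Everything here is hypothetical: the procedure names, the commit identifiers and the store file names are placeholders, and a real service would at least need persistent storage and signatures. Recording one store item under two commits models the equivalence mentioned earlier in the thread, where a whitespace-only change yields a new commit but the same store item.

```scheme
(use-modules (srfi srfi-1))   ; for delete-duplicates

;; Hypothetical ledger: a pair of hash tables mapping
;; commit -> store items (car) and store item -> commits (cdr).
(define (make-ledger)
  (cons (make-hash-table) (make-hash-table)))

(define (ledger-record! ledger commit items)
  "Record that building COMMIT produced the store file names ITEMS."
  (let ((by-commit (car ledger))
        (by-item   (cdr ledger)))
    (hash-set! by-commit commit
               (delete-duplicates
                (append items (hash-ref by-commit commit '()))))
    (for-each (lambda (item)
                (hash-set! by-item item
                           (delete-duplicates
                            (cons commit (hash-ref by-item item '())))))
              items)))

(define (ledger-store-items ledger commit)
  "Forward mapping: store items known to come from COMMIT."
  (hash-ref (car ledger) commit '()))

(define (ledger-commits ledger item)
  "Reverse mapping: commits known to produce the store ITEM."
  (hash-ref (cdr ledger) item '()))

;; Placeholder data: commit-B differs from commit-A only in whitespace,
;; so it yields the same Emacs store item.
(define ledger (make-ledger))
(ledger-record! ledger "commit-A"
                '("/gnu/store/xxxx-emacs" "/gnu/store/yyyy-zlib"))
(ledger-record! ledger "commit-B" '("/gnu/store/xxxx-emacs"))
```

A reverse query on the Emacs item then returns both commits, which is exactly the many-to-one relationship that storing a single commit inside the store item would hide.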