unofficial mirror of guix-devel@gnu.org 
* Leveraging the synergy of deduplication
From: Ludovic Courtès @ 2015-03-25 13:46 UTC (permalink / raw)
  To: guix-devel

Currently the daemon implements a simple yet efficient way to
deduplicate identical files across store items.  The /gnu/store/.links
directory contains hard links to files in the store; the link name is
the base32-encoded SHA256 of the file.  When the daemon adds a new file
to the store, it checks in /gnu/store/.links whether an identical file
is already in the store, and if so makes a hard link to it.
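
To make the mechanism concrete, here is a minimal sketch in Python of
hard-link deduplication along these lines.  It is not the daemon's
code: the names `content_hash' and `deduplicate' are made up for
illustration, the links directory is a parameter rather than
/gnu/store/.links, and a hex digest of the raw contents stands in for
the daemon's base32-encoded hash.

```python
import hashlib
import os


def content_hash(path):
    """Return the SHA256 of PATH's contents as a hex string.

    Assumption: hex is used here for simplicity; the daemon uses its own
    base32 encoding for the link name.
    """
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 16), b""):
            h.update(chunk)
    return h.hexdigest()


def deduplicate(path, links_dir):
    """Make PATH share storage with any identical file known in LINKS_DIR.

    Return True if PATH was replaced by a hard link to an existing file,
    False if PATH was the first file with this content (or was already
    deduplicated).
    """
    os.makedirs(links_dir, exist_ok=True)
    link = os.path.join(links_dir, content_hash(path))
    try:
        # First file with this content: register it under its hash.
        os.link(path, link)
        return False
    except FileExistsError:
        pass
    if os.path.samefile(path, link):
        return False  # already a hard link to the registered file
    # Create the new link under a temporary name, then rename it over
    # PATH so that PATH never goes missing.
    tmp = path + ".tmp-dedup"
    os.link(link, tmp)
    os.replace(tmp, path)
    return True
```

Both the file and the links directory must live on the same file
system, since hard links cannot cross file system boundaries.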

When installing, say, two different variants of texlive, which in
practice are 90% bit-identical, there’s a lot of deduplication
happening.  However, we still end up downloading the whole texlive
archive only to find that we already have most of its files in the
store.

A solution to this would be to change the HTTP substitute protocol.
‘guix publish’ could serve content-addressed files.  For instance,

  http://example.org/1ghws12lrp62vvxxxqmxp7jgxv2p18ihiyq420ag77nh9bw5qsfg.file

would serve the contents of the store file that has the given hash.

The archive format would have to be different from the one currently
implemented by ‘write-file’: for regular files, ‘write-contents’ would
simply write the hash of the contents, and it would be up to the
substituter to go fetch that file if it’s not already in the store
(which can be determined by looking it up in /gnu/store/.links).
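
The substituter side of this could look roughly like the Python sketch
below.  Everything in it is an assumption made for illustration: the
archive is reduced to a `manifest' mapping relative file names to
content hashes, `fetch' is a caller-supplied procedure that returns the
bytes for a given hash (for instance by requesting a ‘.file’ URL like
the one above), and hashes are hex-encoded SHA256 of the raw contents
rather than the daemon's base32 form.

```python
import hashlib
import os


def restore(manifest, target_dir, links_dir, fetch):
    """Materialize MANIFEST under TARGET_DIR, downloading only what is missing.

    MANIFEST maps relative file names to hex SHA256 hashes (hypothetical
    format).  FETCH is a hypothetical callable taking a hash and returning
    the file contents as bytes.  Return the list of hashes that had to be
    fetched.
    """
    os.makedirs(links_dir, exist_ok=True)
    fetched = []
    for relpath, digest in manifest.items():
        link = os.path.join(links_dir, digest)
        if not os.path.exists(link):
            # Not in the links directory yet: download and verify it.
            data = fetch(digest)
            if hashlib.sha256(data).hexdigest() != digest:
                raise ValueError("hash mismatch for " + relpath)
            tmp = link + ".tmp"
            with open(tmp, "wb") as f:
                f.write(data)
            os.replace(tmp, link)
            fetched.append(digest)
        # Whether fetched or already present, the result is a hard link.
        dest = os.path.join(target_dir, relpath)
        os.makedirs(os.path.dirname(dest), exist_ok=True)
        os.link(link, dest)
    return fetched
```

With two texlive variants that share most of their files, the second
call to `restore' would only invoke `fetch' for the files that differ.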

This is not very sophisticated, but has the advantage of being
relatively easy to implement in Guix itself.

The downside is that Hydra would most likely not implement this new
protocol (which would give us another incentive to move away from it).

Thoughts?  Patches?  :-)

Ludo’.

PS: Title inspired by <http://www.sansbullshitsans.com/>.
