It reads as follows:

  • For IceCat, identical files between subsequent revisions represent
    ~10% of its total size (similarity is the ratio of the size of the
    identical files to the total file size).  The first bar represents
    the similarity between the first and the second revision; the
    second bar shows the similarity between the second one and the
    third one, etc.

  • For Emacs, identical files between subsequent revisions represent
    ~85% of its total size.  Intuitively, this is because the package
    contains mostly source code (.el.gz files) and bytecode, which
    remain unchanged.

  • Diffoscope is written in Python so it’s similar to Emacs: its .py
    files remain unchanged across revisions, and they represent ~60%
    of its total size.

  • GIMP has lots of headers, locale data, and icons that don’t change.

  • For R, we see the effect of the upgrades to 4.0.2 and then 4.0.3,
    where similarity drops to ~25% instead of the ~80% seen when
    changes are only in dependencies.

  • For Open MPI, which is compiled C + headers, ~25% is shared across
    revisions.

The reason I’m looking at this is to understand how much would be
gained in terms of bandwidth usage if we were able to avoid downloading
individual files already in the store.  It would seem to be rather
encouraging.

Below is the program that computes these similarities.  It grabs
revision history from data.guix.gnu.org, fetches nars from
ci.guix.gnu.org, computes a “digest” (list of files along with their
hash and size), compares package digests pairwise, and plots the result
with Guile-Charting.

Example REPL session:

--8<---------------cut here---------------start------------->8---
scheme@(similarities)> (pairwise-similarities (package-archive-contents "icecat" #:max 4))
updating substitutes from 'https://ci.guix.gnu.org'...
100.0%
$86 = (17363915/196387883 11380615/98152193)
scheme@(similarities)> (map exact->inexact $86)
$87 = (0.08841642740249916 0.11594865740799087)

[…]

scheme@(similarities)> ,pp (at-most 10 (package-instances "r-minimal") )
$100 = (#< version: "4.0.3" output: "/gnu/store/vv3ca1r5zw5y35xgkix4r80hdnncx52b-r-minimal-4.0.3">
 #< version: "4.0.3" output: "/gnu/store/5dzad7nhhv3dvmap60d6gsj9ppflgzrd-r-minimal-4.0.3">
 #< version: "4.0.3" output: "/gnu/store/01xi3sig314wgwa1j9sxk37vl816mj74-r-minimal-4.0.3">
 #< version: "4.0.2" output: "/gnu/store/nv7lqhnm0mncqwdpkkhnlsgb562lcwff-r-minimal-4.0.2">
 #< version: "4.0.2" output: "/gnu/store/w0izbm8q26dmyndhv46xr7dgz1irai1z-r-minimal-4.0.2">
 #< version: "4.0.2" output: "/gnu/store/yd83ibzxjrb7cgcc6d4smx4pqcdl8r3p-r-minimal-4.0.2">
 #< version: "4.0.1" output: "/gnu/store/kpdh0lwlwcwfmmfzqhwbi6j7m4zzxlmn-r-minimal-4.0.1">
 #< version: "4.0.1" output: "/gnu/store/9x9nzzsiyn1q7g5myhgwjh0yx9js3nrj-r-minimal-4.0.1">
 #< version: "4.0.0" output: "/gnu/store/ahbm2gsqc3420a23pcwrxd4pdhl7rdpp-r-minimal-4.0.0">
 #< version: "4.0.0" output: "/gnu/store/0sgqhj2628js419wvw1vc1cw07wil7gr-r-minimal-4.0.0">)
$101 = (#< version: "3.6.3" output: "/gnu/store/gmx6p0wk3xbc9ylv83zfj855azgjxr0p-r-minimal-3.6.3">
 #< version: "3.6.2" output: "/gnu/store/dnb6fzp5295fcda66dnjk2y51mcva20f-r-minimal-3.6.2">
 #< version: "3.6.1" output: "/gnu/store/gd6sm42b6fr1qgyp6p1zp3z4aavxwyk2-r-minimal-3.6.1">
 #< version: "3.6.1" output: "/gnu/store/lpmfhxys3vsaqmqvj85r344ygfmlmlbg-r-minimal-3.6.1">
 #< version: "3.6.1" output: "/gnu/store/4413h13v7zrb7rp9lvyp2gfx2laj60wm-r-minimal-3.6.1">
 #< version: "3.6.1" output: "/gnu/store/zm5pl1y0wmh3c845498pbjvrzrm6sb07-r-minimal-3.6.1">
 #< version: "3.6.1" output: "/gnu/store/xrv7y4xgrdrsx5qba7144cpw69qc5f0j-r-minimal-3.6.1">
 #< version: "3.6.0" output: "/gnu/store/cbwhhqv69xi3k3g25dcfhwjkkf2427rp-r-minimal-3.6.0">
 #< version: "3.6.0" output: "/gnu/store/69k46yr70zkzzz9v18wl7sxasha5m0a5-r-minimal-3.6.0">
 #< version: "3.6.0" output: "/gnu/store/71w0383x0hay6ng1zaddz59by3g0gxaf-r-minimal-3.6.0">
 #< version: "3.6.0" output: "/gnu/store/m317mg8faxp9qn949dnv1xgsxyw57s3x-r-minimal-3.6.0">
 #< version: "3.5.3" output: "/gnu/store/33qdfplk4riig77vqg758m1zll16n6f0-r-minimal-3.5.3">
 #< version: "3.5.3" output: "/gnu/store/g8gkrcxn49yj8zjr59l7y4k7wgar5brq-r-minimal-3.5.3">
 #< version: "3.5.1" output: "/gnu/store/vrgbyvnx7b1sva2ij5a1gwrkbfwmg3lm-r-minimal-3.5.1">
 #< version: "3.5.1" output: "/gnu/store/4fzw7s0cv2zbixw4wb58zq6lkd97ghnf-r-minimal-3.5.1">
 #< version: "3.5.1" output: "/gnu/store/yb5048dr40nbmyfa1ar4hfiy3kd06v4c-r-minimal-3.5.1">)
scheme@(similarities)> (similarity-chart '("icecat" "gimp" "openmpi" "emacs" "diffoscope" "r-minimal") "/tmp/t.png" #:max 8)
updating substitutes from 'https://ci.guix.gnu.org'...
100.0%
$102 = #
--8<---------------cut here---------------end--------------->8---

Thoughts?  :-)

Ludo’.