* Some stats about the graph of dependencies
@ 2022-12-09 17:29 zimoun
2022-12-10 0:19 ` jbranso
0 siblings, 1 reply; 3+ messages in thread
From: zimoun @ 2022-12-09 17:29 UTC (permalink / raw)
To: Guix Devel
[-- Attachment #1: Type: text/plain, Size: 4734 bytes --]
Hi,
Preparing some Python stuff, I was toying with the package
python-networkx. And Guix is awesome because it is easy to extract the
graph of dependencies.
Here dependencies are just inputs, native-inputs and propagated-inputs.
It could be interesting to also include build-system dependencies, I
have been lazy. :-)
My initial question is to know what are the “essentials”? By essential,
I mean the “important“ ones, the “hot” ones, etc. The ones which are
“influencers” – yeah the world is a social network. :-)
First, let extract the graph with a tiny Scheme script:
$ guix repl -- packages-to-dict.scm > dod.py
Then, let import that into an IPython session:
$ guix shell python python-ipython \
python-scipy python-matplotlib python-networkx -- ipython
and run another tiny Python script for plotting. See Figure attached.
We can compare a link analysis metrics [1] and a centrality measure
[2]; say PageRank [3] and Eigenvector [4]. More the value is large and
higher the package is “important“ (for this metrics).
And the Directed and Undirected graphs can be compared, using Networkx
[5,6]. Well, Eigenvector centrality (or Katz centrality [7]) is failing
because the power iteration does not converge but other metrics could be
also considered. Here is just a first rough toy. :-)
According to PageRank applied to the Directed Graph, the 10 most
“important” packages are:
--8<---------------cut here---------------start------------->8---
[('pkg-config-0.29.2', 0.02418335991713879),
('perl-5.34.0', 0.015404032767249512),
('coreutils-minimal-8.32', 0.013240458675517012),
('zlib-1.2.11', 0.009107245584307803),
('python-pytest-6.2.5', 0.008413060648307678),
('ncurses-6.2.20210619', 0.007598925467605917),
('r-knitr-1.41', 0.00554772892485958),
('sbcl-rt-1990.12.19-1.a6a7503', 0.004884721933452539),
('bzip2-1.0.8', 0.004800877844001881),
('python-3.9.9', 0.00415536078558266)]
--8<---------------cut here---------------end--------------->8---
And if we compare the 3 results (Undirected with PageRank and
Eigenvector, and Directed with PageRank only, then 10 most “important”
packages are:
--8<---------------cut here---------------start------------->8---
['pkg-config-0.29.2',
'glib-2.70.2',
'zlib-1.2.11',
'gtk+-3.24.30',
'perl-5.34.0',
'gettext-minimal-0.21',
'qtbase-5.15.5',
'libxml2-2.9.12',
'python-3.9.9',
'autoconf-2.69']
--8<---------------cut here---------------end--------------->8---
Somehow, it means that these packages have an high influence on all the
others. Now, we can roughly compare with the release-manifest.scm [8],
--8<---------------cut here---------------start------------->8---
'("bootstrap-tarballs" "gcc-toolchain" "nss-certs"
"openssh" "emacs" "vim" "python" "guile" "guix")))
'("coreutils" "grep" "findutils" "gawk" "make"
#;"gcc-toolchain" "tar" "xz")))
'("xorg-server" "xfce" "gnome" "mate" "enlightenment"
"openbox" "awesome" "i3-wm" "ratpoison"
"emacs" "emacs-exwm" "emacs-desktop-environment"
"xlockmore" "slock" "libreoffice"
"connman" "network-manager" "network-manager-applet"
"openssh" "ntp" "tor"
"linux-libre" "grub-hybrid"
'("coreutils" "grep" "sed" "findutils" "diffutils" "patch"
"gawk" "gettext" "gzip" "xz"
"hello" "zlib"))))
--8<---------------cut here---------------end--------------->8---
Well, we could investigate more and play more with some graphs tools.
For instance, include all the build-system dependencies and so on.
Some list about “statistically important” packages could help for
improving the list of “essential” packages.
Although Python is great, I would like to run Guile. Any Guile library
for manipulating graph is around?
All that to say, Guix is great! :-) And perhaps some of you have already
some Guile code for analysing graphs. Maybe.
Well, comment or idea is welcome. :-)
1: <https://en.wikipedia.org/wiki/Network_theory#Link_analysis>
2: <https://en.wikipedia.org/wiki/Network_theory#Centrality_measures>
3: <https://en.wikipedia.org/wiki/PageRank>
4: <https://en.wikipedia.org/wiki/Eigenvector_centrality>
5: <https://networkx.org/documentation/stable/reference/algorithms/link_analysis.html>
6: <https://networkx.org/documentation/stable/reference/algorithms/centrality.html>
7: <https://en.wikipedia.org/wiki/Katz_centrality>
8: <https://git.savannah.gnu.org/cgit/guix.git/tree/etc/release-manifest.scm#n47>
Cheers,
simon
[-- Attachment #2: figure --]
[-- Type: image/png, Size: 76269 bytes --]
[-- Attachment #3: packages-to-dict.scm --]
[-- Type: application/octet-stream, Size: 1285 bytes --]
(use-modules (guix)
(gnu)
(srfi srfi-1)
(ice-9 match)
)
(define (all-packages)
(fold-packages cons '()))
;(define %packages (take (all-packages) 10))
(define %packages (all-packages))
(define name-version
(match-lambda
((? package? package)
(string-append
(package-name package)
"-"
(package-version package)))
(other other)))
(define (dictify lst)
(for-each (lambda (package)
(format #t " ~s:{}, " (name-version package)))
lst))
(define (list-inputs package)
(append-map
(lambda (inputs)
(map (match-lambda
((_ p out ...) (match p
((? package? pkg) pkg)
((? origin? orig)
(symbol->string (gensym "orig-")))
(_
(symbol->string (gensym "else-"))))))
(inputs package)))
(list package-inputs
package-native-inputs
package-propagated-inputs)))
(format #t "dod = {~%")
(for-each (lambda (package)
(format #t "~s: {" (name-version package))
(dictify (list-inputs package))
(format #t "},~%"))
%packages)
(format #t "~%}~%")
[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #4: stats.py --]
[-- Type: text/x-python, Size: 1204 bytes --]
import networkx as nx
import matplotlib
matplotlib.use('TkAgg')
import matplotlib.pyplot as plt
import numpy as np
exec(open("dod.py").read())
UG = nx.Graph(dod)
DG = nx.DiGraph(dod)
nnodes = DG.number_of_nodes()
# nx.draw(G, with_labels=True);plt.show()
metrics = [
(UG, nx.eigenvector_centrality),
(UG, nx.pagerank),
(DG, nx.pagerank),
]
results = [sorted(metric(G).items(), key=lambda k: k[1], reverse=True)
for G, metric in metrics]
x = [i+1 for i in range(nnodes)]
y = lambda rs: [v for _, v in rs]
plt.loglog(x, y(results[0]), label="eigen Undirected", color='red', linestyle='-.')
plt.loglog(x, y(results[1]), label="pagerank Undirected", color='blue', linestyle='--')
plt.loglog(x, y(results[2]), label="pagerank Directed", color='blue', linestyle='-')
plt.legend()
plt.xlabel("packages")
plt.ylabel("metric value")
plt.grid()
plt.xlim([1, nnodes+2])
plt.ylim([1e-5, 1])
plt.show()
def common(results, n):
packages = results[0][:2*n].copy()
for result in results[1:]:
for name, value in packages:
if not name in [name for name, _ in result]:
packages.remove((name, value))
return [name for name, _ in packages[:n]]
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Some stats about the graph of dependencies
2022-12-09 17:29 Some stats about the graph of dependencies zimoun
@ 2022-12-10 0:19 ` jbranso
2022-12-10 13:30 ` zimoun
0 siblings, 1 reply; 3+ messages in thread
From: jbranso @ 2022-12-10 0:19 UTC (permalink / raw)
To: zimoun, Guix Devel
December 9, 2022 12:32 PM, "zimoun" <zimon.toutoune@gmail.com> wrote:
> Hi,
>
> Preparing some Python stuff, I was toying with the package
> python-networkx. And Guix is awesome because it is easy to extract the
> graph of dependencies.
>
> Here dependencies are just inputs, native-inputs and propagated-inputs.
> It could be interesting to also include build-system dependencies, I
> have been lazy. :-)
>
> My initial question is to know what are the “essentials”? By essential,
> I mean the “important“ ones, the “hot” ones, etc. The ones which are
> “influencers” – yeah the world is a social network. :-)
>
> First, let extract the graph with a tiny Scheme script:
>
> $ guix repl -- packages-to-dict.scm > dod.py
>
> Then, let import that into an IPython session:
>
> $ guix shell python python-ipython \
> python-scipy python-matplotlib python-networkx -- ipython
>
> and run another tiny Python script for plotting. See Figure attached.
>
> We can compare a link analysis metrics [1] and a centrality measure
> [2]; say PageRank [3] and Eigenvector [4]. More the value is large and
> higher the package is “important“ (for this metrics).
>
> And the Directed and Undirected graphs can be compared, using Networkx
> [5,6]. Well, Eigenvector centrality (or Katz centrality [7]) is failing
> because the power iteration does not converge but other metrics could be
> also considered. Here is just a first rough toy. :-)
>
> According to PageRank applied to the Directed Graph, the 10 most
> “important” packages are:
>
> --8<---------------cut here---------------start------------->8---
> [('pkg-config-0.29.2', 0.02418335991713879),
> ('perl-5.34.0', 0.015404032767249512),
> ('coreutils-minimal-8.32', 0.013240458675517012),
> ('zlib-1.2.11', 0.009107245584307803),
> ('python-pytest-6.2.5', 0.008413060648307678),
> ('ncurses-6.2.20210619', 0.007598925467605917),
> ('r-knitr-1.41', 0.00554772892485958),
> ('sbcl-rt-1990.12.19-1.a6a7503', 0.004884721933452539),
> ('bzip2-1.0.8', 0.004800877844001881),
> ('python-3.9.9', 0.00415536078558266)]
> --8<---------------cut here---------------end--------------->8---
>
> And if we compare the 3 results (Undirected with PageRank and
> Eigenvector, and Directed with PageRank only, then 10 most “important”
> packages are:
>
> --8<---------------cut here---------------start------------->8---
> ['pkg-config-0.29.2',
> 'glib-2.70.2',
> 'zlib-1.2.11',
> 'gtk+-3.24.30',
> 'perl-5.34.0',
> 'gettext-minimal-0.21',
> 'qtbase-5.15.5',
> 'libxml2-2.9.12',
> 'python-3.9.9',
> 'autoconf-2.69']
> --8<---------------cut here---------------end--------------->8---
>
> Somehow, it means that these packages have an high influence on all the
> others. Now, we can roughly compare with the release-manifest.scm [8],
>
> --8<---------------cut here---------------start------------->8---
> '("bootstrap-tarballs" "gcc-toolchain" "nss-certs"
> "openssh" "emacs" "vim" "python" "guile" "guix")))
> '("coreutils" "grep" "findutils" "gawk" "make"
> #;"gcc-toolchain" "tar" "xz")))
> '("xorg-server" "xfce" "gnome" "mate" "enlightenment"
> "openbox" "awesome" "i3-wm" "ratpoison"
> "emacs" "emacs-exwm" "emacs-desktop-environment"
> "xlockmore" "slock" "libreoffice"
> "connman" "network-manager" "network-manager-applet"
> "openssh" "ntp" "tor"
> "linux-libre" "grub-hybrid"
> '("coreutils" "grep" "sed" "findutils" "diffutils" "patch"
> "gawk" "gettext" "gzip" "xz"
> "hello" "zlib"))))
> --8<---------------cut here---------------end--------------->8---
>
> Well, we could investigate more and play more with some graphs tools.
> For instance, include all the build-system dependencies and so on.
>
> Some list about “statistically important” packages could help for
> improving the list of “essential” packages.
>
> Although Python is great, I would like to run Guile. Any Guile library
> for manipulating graph is around?
https://packages.guix.gnu.org/packages/guile2.2-charting/0.2.0-1.75f755b/
Thought it may be guile 2 only...?
>
> All that to say, Guix is great! :-) And perhaps some of you have already
> some Guile code for analysing graphs. Maybe.
>
> Well, comment or idea is welcome. :-)
>
> 1: <https://en.wikipedia.org/wiki/Network_theory#Link_analysis>
> 2: <https://en.wikipedia.org/wiki/Network_theory#Centrality_measures>
> 3: <https://en.wikipedia.org/wiki/PageRank>
> 4: <https://en.wikipedia.org/wiki/Eigenvector_centrality>
> 5: <https://networkx.org/documentation/stable/reference/algorithms/link_analysis.html>
> 6: <https://networkx.org/documentation/stable/reference/algorithms/centrality.html>
> 7: <https://en.wikipedia.org/wiki/Katz_centrality>
> 8: <https://git.savannah.gnu.org/cgit/guix.git/tree/etc/release-manifest.scm#n47>
>
> Cheers,
> simon
^ permalink raw reply [flat|nested] 3+ messages in thread
* Re: Some stats about the graph of dependencies
2022-12-10 0:19 ` jbranso
@ 2022-12-10 13:30 ` zimoun
0 siblings, 0 replies; 3+ messages in thread
From: zimoun @ 2022-12-10 13:30 UTC (permalink / raw)
To: jbranso, Guix Devel
Hi,
On Sat, 10 Dec 2022 at 00:19, jbranso@dismail.de wrote:
>> Although Python is great, I would like to run Guile. Any Guile library
>> for manipulating graph is around?
>
> https://packages.guix.gnu.org/packages/guile2.2-charting/0.2.0-1.75f755b/
By graph, I was meaning this kind of graph:
https://en.wikipedia.org/wiki/Graph_(discrete_mathematics)
and not this kind of graph:
https://en.wikipedia.org/wiki/Chart
Sorry for the confusing terminology. :-)
Cheers,
simon
^ permalink raw reply [flat|nested] 3+ messages in thread
end of thread, other threads:[~2022-12-10 13:40 UTC | newest]
Thread overview: 3+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2022-12-09 17:29 Some stats about the graph of dependencies zimoun
2022-12-10 0:19 ` jbranso
2022-12-10 13:30 ` zimoun
Code repositories for project(s) associated with this public inbox
https://git.savannah.gnu.org/cgit/guix.git
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).