* [bug#61527] [PATCH] Add edgelist graph backend @ 2023-02-15 5:21 Kyle Andrews
From: Kyle Andrews @ 2023-02-15 5:21 UTC
To: 61527

[-- Attachment #1: Type: text/plain, Size: 538 bytes --]

Dear Guix,

I would like to be able to conveniently analyze Guix package
dependencies using general-purpose network analysis software such as
igraph. To achieve this, I have added another backend to Guix, exposed
via `guix graph', which spits out a three-column table that, while not
technically an edge list, is readily transformed into one with minimal
data munging.

Please see the attached patch file, which I created with `git diff'
from my working tree since I am not yet comfortable with more advanced
git workflows.

[-- Attachment #2: edgelist backend patch --]
[-- Type: text/plain, Size: 1150 bytes --]

diff --git a/guix/graph.scm b/guix/graph.scm
index 41219ab67d..e1760ed92a 100644
--- a/guix/graph.scm
+++ b/guix/graph.scm
@@ -255,6 +255,24 @@ (define %graphviz-backend
                  emit-prologue emit-epilogue
                  emit-node emit-edge))
 
+(define (emit-edgelist-prologue name port)
+  (display "" port))
+
+(define (emit-edgelist-epilogue port)
+  (display "" port))
+
+(define (emit-edgelist-node id label port)
+  (format port "package, ~a, ~a\n" label id))
+
+(define (emit-edgelist-edge id1 id2 port)
+  (format port "depends, ~a, ~a\n" id1 id2))
+
+(define %edgelist-backend
+  (graph-backend "edgelist"
+                 "Generate graph in CSV edge list format"
+                 emit-edgelist-prologue emit-edgelist-epilogue
+                 emit-edgelist-node emit-edgelist-edge))
+
 \f
 ;;;
 ;;; d3js export.
@@ -338,7 +356,8 @@ (define %cypher-backend
 (define %graph-backends
   (list %graphviz-backend
         %d3js-backend
-        %cypher-backend))
+        %cypher-backend
+        %edgelist-backend))
 
 (define (lookup-backend name)
   "Return the graph backend called NAME.  Raise an error if it is not found."
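To show what the backend's output looks like, here is a minimal Python sketch that mirrors the two `format` strings in the patch. The node IDs and labels are hypothetical. The line order, with each node line followed by one line per direct dependency, is an assumption about how `guix graph` walks the graph and is not a copy of its code.

```python
# Sketch of the output of the proposed edgelist backend.
# Mirrors the patch's format strings:
#   emit-edgelist-node: "package, <label>, <id>"
#   emit-edgelist-edge: "depends, <id1>, <id2>"  (assumed: id1 depends on id2)
# The empty prologue/epilogue in the patch add nothing, so they are omitted.


def emit_node(node_id: str, label: str) -> str:
    return f"package, {label}, {node_id}\n"


def emit_edge(id1: str, id2: str) -> str:
    return f"depends, {id1}, {id2}\n"


def render(labels: dict[str, str], deps: dict[str, list[str]]) -> str:
    """Emit each node, followed by edges to its direct dependencies.

    labels maps node ID -> label; deps maps node ID -> dependency IDs.
    Both are hypothetical inputs used only for illustration.
    """
    lines = []
    for node_id, label in labels.items():
        lines.append(emit_node(node_id, label))
        for dep in deps.get(node_id, []):
            lines.append(emit_edge(node_id, dep))
    return "".join(lines)
```

For example, `render({"101": "hello@2.12.1", "102": "glibc@2.35"}, {"101": ["102"]})` returns the three lines `package, hello@2.12.1, 101`, `depends, 101, 102` and `package, glibc@2.35, 102`.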
From: Simon Tournier @ 2023-02-15 16:32 UTC
To: Kyle Andrews, 61527

Hi,

On Wed, 15 Feb 2023 at 05:21, Kyle Andrews <kyle@posteo.net> wrote:
> Dear Guix,
>
> I would like to be able to conveniently analyze Guix package
> dependencies using general-purpose network analysis software such as
> igraph. To achieve this, I have added another backend to Guix, exposed
> via `guix graph', which spits out a three-column table that, while not
> technically an edge list, is readily transformed into one with minimal
> data munging.

You might be interested in [1], where I export all the packages as
JSON-like data (a Python dictionary) and then import them with
python-networkx.

Feel free to report your analyses; I am very interested in them. :-)

1: https://yhetil.org/guix/874ju4qyd4.fsf@gmail.com

> +(define (emit-edgelist-prologue name port)
> +  (display "" port))

Here, I would add a description of the data as a header of the CSV-like
file.  For instance, something like:

--8<---------------cut here---------------start------------->8---
# type, name-or-edge1, item-or-edge2
# package, name, item
# depends, edge1, edge2
--8<---------------cut here---------------end--------------->8---

Well, is this format a standard format for representing graphs?

From the igraph documentation [1], about ‘igraph_read_graph_edgelist’:

        This format is simply a series of an even number of non-negative
        integers separated by whitespace. The integers represent vertex
        IDs. Placing each edge (i.e. pair of integers) on a separate
        line is not required, but it is recommended for
        readability. Edges of directed graphs are assumed to be in
        "from, to" order.

so maybe it would be nice to use this plain list for the edgelist
backend.  WDYT?

1: https://igraph.org/c/doc/igraph-Foreign.html#igraph_read_graph_edgelist

Cheers,
simon
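The igraph format quoted above treats the integers as vertex IDs, and igraph vertex IDs are contiguous integers starting at 0. The IDs in `guix graph` output need not have that form, so this sketch treats them as opaque strings. It is a minimal Python sketch that assumes the three-column format from the patch. It skips `#` header lines like the ones suggested above, remaps IDs to 0-based integers in first-seen order, and writes one "from to" pair per line.

```python
import csv
import io


def to_igraph_edgelist(text: str) -> tuple[str, list[str]]:
    """Convert the three-column output into igraph's plain edge list.

    Returns (edgelist, names): edgelist has one "from to" pair per line
    using 0-based vertex IDs; names[i] is the label of vertex i (or its
    original ID when no "package" row was seen for it).
    """
    index: dict[str, int] = {}
    labels: dict[str, str] = {}
    edges: list[tuple[int, int]] = []

    def vertex(node_id: str) -> int:
        # Assign contiguous integer IDs in first-seen order.
        if node_id not in index:
            index[node_id] = len(index)
        return index[node_id]

    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue  # blank line or header/comment line
        kind, a, b = row
        if kind == "package":      # "package, <label>, <id>"
            labels[b] = a
            vertex(b)
        elif kind == "depends":    # "depends, <id1>, <id2>"
            edges.append((vertex(a), vertex(b)))

    edgelist = "".join(f"{u} {v}\n" for u, v in edges)
    names = [""] * len(index)
    for node_id, i in index.items():
        names[i] = labels.get(node_id, node_id)
    return edgelist, names
```

Keeping the `names` list alongside the integers addresses the readability concern with bare integer edge lists. The names can be attached as vertex attributes after loading.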
From: Kyle Andrews @ 2023-02-16 3:28 UTC
To: Simon Tournier; +Cc: 61527

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi,
>
> On Wed, 15 Feb 2023 at 05:21, Kyle Andrews <kyle@posteo.net> wrote:
>> Dear Guix,
>>
>> I would like to be able to conveniently analyze Guix package
>> dependencies using general-purpose network analysis software such as
>> igraph. To achieve this, I have added another backend to Guix, exposed
>> via `guix graph', which spits out a three-column table that, while not
>> technically an edge list, is readily transformed into one with minimal
>> data munging.
>
> You might be interested in [1], where I export all the packages as
> JSON-like data (a Python dictionary) and then import them with
> python-networkx.
>
> Feel free to report your analyses; I am very interested in them. :-)

That's my plan. I hope to provide Guix developers with some tools to
guide their efforts. Of course, I would really like to build networks of
many more things in Guix, including the Guile code itself and the Git
history.

> 1: https://yhetil.org/guix/874ju4qyd4.fsf@gmail.com

That looks like a great reference! Regarding your question about graphs
in that email, I remember there being a Racket library that provides
the core graph data structures. Maybe it could be ported to Guile?
However, I wonder if it would be more practical for Guile to interface
with igraph.

>> +(define (emit-edgelist-prologue name port)
>> +  (display "" port))
>
> Here, I would add a description of the data as a header of the CSV-like
> file.  For instance, something like:
>
> --8<---------------cut here---------------start------------->8---
> # type, name-or-edge1, item-or-edge2
> # package, name, item
> # depends, edge1, edge2
> --8<---------------cut here---------------end--------------->8---

I toyed with calling columns 2 and 3 "parent/child", "source/sink",
"input/output", or "origin/destination". The "input/output" option
sounds the best to me.

> Well, is this format a standard format for representing graphs?

No, it's not. Since I am not particularly comfortable with Guile, I was
hesitant to make extensive changes to the existing backend code. If I
could have produced just the "depends" lines, with the integers replaced
by their meaningful name@version labels, I would have done that instead.
If I could pass id1, id2, label1, and label2 to all of the emit
procedures, that would be pretty handy! For example:

```
input, output
package1@1.2, package3@1.4
package1@1.2, package2@3.9
...
```

Technically, R programmers would probably first turn this into a matrix
or a data frame. That intermediate step would provide a convenient
opportunity to extract the version strings, so that networks could more
readily be compared across commits and branches. The versions could be
added back later as node attributes.

Just as with relational tables, network analysis gets much more powerful
when the networks carry many attributes, some of which may be hash table
keys pointing to other data structures. For example, libressl and
openssl might share a protocol attribute with a value of "SSL". With a
rich set of attribute data, researchers could start thinking about how
to sample from the distribution of possible alternative system
configurations when doing reproducibility studies. This might reveal
"hot spots" of irreproducibility that package authors could look out
for. That's one idea I just had while writing this email. I'm sure many
people could come up with many more neat ideas if the biggest barriers
to getting the data in the first place were removed.

> From the igraph documentation [1], about ‘igraph_read_graph_edgelist’:
>
>         This format is simply a series of an even number of non-negative
>         integers separated by whitespace. The integers represent vertex
>         IDs. Placing each edge (i.e. pair of integers) on a separate
>         line is not required, but it is recommended for
>         readability. Edges of directed graphs are assumed to be in
>         "from, to" order.
>
> so maybe it would be nice to use this plain list for the edgelist
> backend.  WDYT?
>
> 1: https://igraph.org/c/doc/igraph-Foreign.html#igraph_read_graph_edgelist

This is exactly why I did what I did in this patch. Just by filtering
rows with "depends" in the first column, you get the edge list the
igraph manual describes. To get the labels, you just need to filter rows
with "package" instead. These are straightforward post-processing steps
for many R and Python users.

I don't like the idea of just returning integers, though. It's no fun
not being able to readily see what the nodes refer to. That's why I
prefer the CSV view of things, with descriptive labels.

Thanks for looking at my patch!

Cheers,
Kyle
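The post-processing described above can be sketched in Python with only the standard library. It filters "package" rows for labels and "depends" rows for edges, substitutes labels for IDs, and extracts version strings. The sketch assumes that labels have the name@version form used in this thread and that `depends, A, B` means A depends on B. The function names are illustrative and not part of Guix.

```python
import csv
import io


def split_tables(text: str):
    """Split the combined output into a node table and an edge table.

    Returns (nodes, edges): nodes maps ID -> label (from "package" rows);
    edges lists (from-ID, to-ID) pairs (from "depends" rows).
    Lines starting with '#' are skipped as comments.
    """
    nodes: dict[str, str] = {}
    edges: list[tuple[str, str]] = []
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue
        kind, a, b = row
        if kind == "package":
            nodes[b] = a            # "package, <label>, <id>"
        elif kind == "depends":
            edges.append((a, b))    # "depends, <id1>, <id2>"
    return nodes, edges


def named_edges(nodes: dict[str, str], edges: list[tuple[str, str]]):
    """Replace IDs with labels, keeping the ID when no label is known."""
    return [(nodes.get(a, a), nodes.get(b, b)) for a, b in edges]


def split_version(label: str) -> tuple[str, str]:
    """Split 'name@version' into (name, version); version is '' if absent."""
    name, _, version = label.partition("@")
    return name, version
```

For the input lines `package, hello@2.12.1, 101`, `depends, 101, 102` and `package, glibc@2.35, 102`, `named_edges(*split_tables(text))` gives `[("hello@2.12.1", "glibc@2.35")]`. That is the "input, output" view requested above, ready to load into a data frame.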
From: Ludovic Courtès @ 2023-02-27 22:48 UTC
To: Kyle Andrews; +Cc: 61527, Simon Tournier

Hello Kyle,

Kyle Andrews <kyle@posteo.net> writes:

> I would like to be able to conveniently analyze Guix package
> dependencies using general-purpose network analysis software such as
> igraph. To achieve this, I have added another backend to Guix, exposed
> via `guix graph', which spits out a three-column table that, while not
> technically an edge list, is readily transformed into one with minimal
> data munging.

Is “CSV edge list” some sort of a standard format, or is it more of an
idea you came up with?

The patch LGTM but we’ll need a couple more things:

  1. Maybe emitting extra metadata as Simon suggested.

  2. Adding documentation under “Invoking guix graph”.  In particular,
     it’d be nice to have an example showing how to query the generated
     CSV with igraph.

  3. Ideally a full patch with commit log as generated with ‘git
     format-patch’.  :-)

Could you send an updated patch?

Thank you!

Ludo’.
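One possible starting point for the requested documentation example is a single query: listing all transitive dependencies of a package from the CSV output. This minimal Python sketch uses only the standard library and does not use igraph. With igraph, one would instead load the edges into a graph object and use its traversal functions. The input format is the three-column one from the patch, and the sample data below is hypothetical.

```python
import csv
import io
from collections import deque


def load(text: str):
    """Return (labels, deps) from the three-column output.

    labels maps node ID -> label; deps maps node ID -> direct dependency IDs.
    """
    labels: dict[str, str] = {}
    deps: dict[str, list[str]] = {}
    for row in csv.reader(io.StringIO(text), skipinitialspace=True):
        if not row or row[0].startswith("#"):
            continue
        kind, a, b = row
        if kind == "package":
            labels[b] = a
        elif kind == "depends":
            deps.setdefault(a, []).append(b)
    return labels, deps


def transitive_deps(text: str, label: str) -> list[str]:
    """Sorted labels of everything `label` depends on, directly or not."""
    labels, deps = load(text)
    by_label = {lab: node_id for node_id, lab in labels.items()}
    start = by_label[label]  # raises KeyError if the package is absent
    seen: set[str] = set()
    queue = deque([start])
    while queue:  # breadth-first traversal along "depends" edges
        node = queue.popleft()
        for dep in deps.get(node, []):
            if dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return sorted(labels.get(n, n) for n in seen)
```

With hypothetical input in which hello@2.12.1 depends on glibc@2.35 and gcc@11.3.0, and glibc@2.35 depends on gcc@11.3.0, `transitive_deps(text, "hello@2.12.1")` returns `["gcc@11.3.0", "glibc@2.35"]`.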
From: Simon Tournier @ 2023-02-28 9:21 UTC
To: Ludovic Courtès, Kyle Andrews; +Cc: 61527

Hi Kyle,

Thank you for your inputs on the topic. :-)

On Mon, 27 Feb 2023 at 23:48, Ludovic Courtès <ludo@gnu.org> wrote:

> Is “CSV edge list” some sort of a standard format, or is it more of an
> idea you came up with?

In addition to Ludo’s suggestions below, and commenting on your answer
[1]: instead of “edgelist”, which igraph already uses for a
well-documented format and which could therefore confuse igraph folks
(if any :-)), I would name the backend “csv”.  WDYT?

Quoting [1] for context:

> > Here, I would add a description of the data as a header of the CSV-like
> > file.  For instance, something like:
> >
> > --8<---------------cut here---------------start------------->8---
> > # type, name-or-edge1, item-or-edge2
> > # package, name, item
> > # depends, edge1, edge2
> > --8<---------------cut here---------------end--------------->8---
>
> I toyed with calling columns 2 and 3 "parent/child", "source/sink",
> "input/output", or "origin/destination". The "input/output" option
> sounds the best to me.

This “input/output” does not sound right to me.  I would keep something
like:

--8<---------------cut here---------------start------------->8---
# type, name-or-from, vertex-or-to
# package, name, vertex
# depends, from, to
--8<---------------cut here---------------end--------------->8---

Thinking a bit about this format, I agree with you that this “format”
covers various needs for feeding Python, R, etc. graph libraries.  And
it is easy to filter via a plain pipe “| grep depends” or similar.

1: <https://issues.guix.gnu.org/msgid/878rgynpox.fsf@posteo.net>

> The patch LGTM but we’ll need a couple more things:
>
>   1. Maybe emitting extra metadata as Simon suggested.
>
>   2. Adding documentation under “Invoking guix graph”.  In particular,
>      it’d be nice to have an example showing how to query the generated
>      CSV with igraph.
>
>   3. Ideally a full patch with commit log as generated with ‘git
>      format-patch’.  :-)
>
> Could you send an updated patch?

Let me know if you need help. :-)

Cheers,
simon
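If the backend emitted the header proposed above, a consumer could use it to name the columns of each row type. The patch as posted emits no header, so this is hypothetical. Here is a minimal Python sketch that assumes header lines of exactly the form `# <type>, <col2>, <col3>` and skips the generic `# type, ...` line.

```python
import csv


def read_with_header(text: str):
    """Parse '#' header lines into a per-type schema, then name row fields.

    Returns (schema, records): schema maps a row type such as "package"
    to its column names; records are dicts keyed by those names.
    Raises KeyError if a data row's type has no header line.
    """
    schema: dict[str, list[str]] = {}
    records: list[dict[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        is_header = line.startswith("#")
        if is_header:
            line = line[1:]
        fields = [f.strip() for f in next(csv.reader([line]))]
        kind, rest = fields[0], fields[1:]
        if is_header:
            if kind != "type":  # the generic "# type, ..." line is skipped
                schema[kind] = rest
            continue
        # Name the columns using the header declared for this row type.
        record = {"type": kind}
        record.update(zip(schema[kind], rest))
        records.append(record)
    return schema, records
```

With the three header lines above followed by `package, hello@2.12.1, 101` and `depends, 101, 102`, the records are `{"type": "package", "name": "hello@2.12.1", "vertex": "101"}` and `{"type": "depends", "from": "101", "to": "102"}`. A self-describing header like this could also serve as the "data dictionary" discussed later in the thread.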
From: Kyle Andrews @ 2023-03-01 3:49 UTC
To: Simon Tournier; +Cc: Ludovic Courtès, 61527

Simon Tournier <zimon.toutoune@gmail.com> writes:

> Hi Kyle,
>
> Thank you for your inputs on the topic. :-)
>
>
> On Mon, 27 Feb 2023 at 23:48, Ludovic Courtès <ludo@gnu.org> wrote:
>
>> Is “CSV edge list” some sort of a standard format, or is it more of an
>> idea you came up with?
>
> In addition to Ludo’s suggestions below, and commenting on your answer
> [1]: instead of “edgelist”, which igraph already uses for a
> well-documented format and which could therefore confuse igraph folks
> (if any :-)), I would name the backend “csv”.  WDYT?

I agree 'csv' would be less confusing, since in reality the output
embeds two (normalized) tables in one.

> Quoting [1] for context:
>
>> > Here, I would add a description of the data as a header of the CSV-like
>> > file.  For instance, something like:
>> >
>> > --8<---------------cut here---------------start------------->8---
>> > # type, name-or-edge1, item-or-edge2
>> > # package, name, item
>> > # depends, edge1, edge2
>> > --8<---------------cut here---------------end--------------->8---
>>
>> I toyed with calling columns 2 and 3 "parent/child", "source/sink",
>> "input/output", or "origin/destination". The "input/output" option
>> sounds the best to me.
>
> This “input/output” does not sound right to me.  I would keep something
> like:
>
> --8<---------------cut here---------------start------------->8---
> # type, name-or-from, vertex-or-to
> # package, name, vertex
> # depends, from, to
> --8<---------------cut here---------------end--------------->8---

I like the name "table" for the first column, following my rationale
above. For the second and third columns, given my sense of style and my
experience working with data in R, I tend to prefer shorter, vaguer
names that are easier to type. I hope it will be clear from context
which column holds a vertex label and which holds a vertex ID. It would
make more sense to me to explain the meaning of each column in a "data
dictionary" included in the documentation than to sacrifice interactive
convenience.

> Thinking a bit about this format, I agree with you that this “format”
> covers various needs for feeding Python, R, etc. graph libraries.  And
> it is easy to filter via a plain pipe “| grep depends” or similar.

Exactly.

>
> 1: <https://issues.guix.gnu.org/msgid/878rgynpox.fsf@posteo.net>
>
>
>> The patch LGTM but we’ll need a couple more things:
>>
>>   1. Maybe emitting extra metadata as Simon suggested.

Mostly done, depending on whether you are satisfied with my "data
dictionary" idea.

>>   2. Adding documentation under “Invoking guix graph”.  In particular,
>>      it’d be nice to have an example showing how to query the generated
>>      CSV with igraph.

I have not added an example yet, but I did mention the backend in the
documentation with a short description.

>>   3. Ideally a full patch with commit log as generated with ‘git
>>      format-patch’.  :-)
>> Could you send an updated patch?

Sorry, I sent an updated patch without reading (2) closely enough, so
it's missing the example. I also want to add the data dictionary I
mentioned above.

I used git send-email for this task.

--8<---------------cut here---------------start------------->8---
git send-email *.patch --to=guix-patches@gnu.org
--8<---------------cut here---------------end--------------->8---

Is that the preferred way to do it? I'm not quite sure how debbugs
associates the issue numbers.

>
> Let me know if you need help. :-)

Thanks for the offer! I really appreciate your patience.

> Cheers,
> simon
From: Simon Tournier @ 2023-03-01 10:34 UTC
To: Kyle Andrews; +Cc: Ludovic Courtès, 61527

Hi Kyle,

On Wed, 01 Mar 2023 at 03:49, Kyle Andrews <kyle@posteo.net> wrote:

> I used git send-email for this task.
>
> --8<---------------cut here---------------start------------->8---
> git send-email *.patch --to=guix-patches@gnu.org
> --8<---------------cut here---------------end--------------->8---
>
> Is that the preferred way to do it? I'm not quite sure how debbugs
> associates the issue numbers.

No, it is not. :-)  Doing this, you trigger a new Debbugs number and,
worse, one per patch matched by *.patch.

Please take a look at the manual,

    https://guix.gnu.org/manual/devel/en/guix.html#Submitting-Patches
    https://guix.gnu.org/manual/devel/en/guix.html#Sending-a-Patch-Series

and let us know if anything there seems unclear, or if you have any
other feedback. :-)

In short: once Debbugs has created a number for tracking the patch
(here, 61527), you must reply to 61527@debbugs.gnu.org.  After a first
version, you can use the --reroll-count option of git format-patch to
increment the version count.  Here, it would be ‘-v2’ (short for
--reroll-count=2), which makes the subject read [PATCH v2 x/N].

Hope that helps,
simon