From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mp12.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by ms5.migadu.com with LMTPS id +EEFAm+/7WPgdgAAbAwnHQ (envelope-from ) for ; Thu, 16 Feb 2023 06:30:23 +0100 Received: from aspmx1.migadu.com ([2001:41d0:2:4a6f::]) (using TLSv1.3 with cipher TLS_AES_256_GCM_SHA384 (256/256 bits)) by mp12.migadu.com with LMTPS id aP09Am+/7WMXCgEAauVa8A (envelope-from ) for ; Thu, 16 Feb 2023 06:30:23 +0100 Received: from lists.gnu.org (lists.gnu.org [209.51.188.17]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by aspmx1.migadu.com (Postfix) with ESMTPS id BCE503C8F8 for ; Thu, 16 Feb 2023 06:30:22 +0100 (CET) Authentication-Results: aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=posteo.net header.s=2017 header.b=puDAX0r+; dmarc=fail reason="SPF not aligned (strict)" header.from=posteo.net (policy=none); spf=pass (aspmx1.migadu.com: domain of "guix-patches-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="guix-patches-bounces+larch=yhetil.org@gnu.org" ARC-Message-Signature: i=1; a=rsa-sha256; c=relaxed/relaxed; d=yhetil.org; s=key1; t=1676525422; h=from:from:sender:sender:reply-to:subject:subject:date:date: message-id:message-id:to:to:cc:cc:mime-version:mime-version: content-type:content-type: content-transfer-encoding:content-transfer-encoding:resent-cc: resent-from:resent-sender:resent-message-id:in-reply-to:in-reply-to: references:references:list-id:list-help:list-unsubscribe: list-subscribe:list-post:dkim-signature; bh=kYThwqE8iKwdKMlfwhah2JabXvvNFqTCEwEtj24OGqM=; b=knBtYB/n/K1LTThBm+Ibu89paABQPmifx5OX7ngS3vH2OqSwu177tb+y6hS1rFGoRs4xTj QBSsf4Vkm2DEKYHM6pkibH3ilSVJ+XIEpvv/PI7WjPZkpnbuKJQBsBqZeCLAcddB92226z KyGFO+bXpI9s8QkoIDvrFw2BNacRXuCB8HimAifpdttPxiToSzTDZcWJtIw9ZspEmjOvfp nGYRaj6UgiXcTe9m73AThBIhFNbLW7iHlC74nTi1kUPggMhvBM+/lHNZ4a2xiq63LM3QOE m2qeS4YEYyhfAS46e8Jn327wVNStZh4CfgjTXxFZmTEJonw6w2puyUxgBAGffA== ARC-Authentication-Results: i=1; aspmx1.migadu.com; dkim=fail ("headers rsa verify failed") header.d=posteo.net header.s=2017 header.b=puDAX0r+; dmarc=fail reason="SPF not aligned (strict)" header.from=posteo.net (policy=none); spf=pass (aspmx1.migadu.com: domain of "guix-patches-bounces+larch=yhetil.org@gnu.org" designates 209.51.188.17 as permitted sender) smtp.mailfrom="guix-patches-bounces+larch=yhetil.org@gnu.org" ARC-Seal: i=1; s=key1; d=yhetil.org; t=1676525422; a=rsa-sha256; cv=none; b=YXeguWZDJ/DTCm+CcP2CeKYzWRBkcVuS5bMPtv5cvQJ6ii5/jzfWjRxKLeIHN/AT0wgaro ihDo+ymOtWKkOGtfb1FXIKoxKbXs6hswBX4eZ0PHuy3knWa5NI0Li07gg5guarObe83Yxs S3lcg2MHnKFXOl+NsMYxit/VxIQiifmnaNgQBZZ/MJPGSo4bKgvksMMMLhNbHg4asDOSuw 5o8q9jy+EOrS3WQ14RPfAugHR2ZSSioGHfR5DRRSWpI6maWw34vYbycGHMWtP3jQe1sy6C KJXu6jQToxTrT4e6T6AA/xE9AoETiHYGmTkC8/NpcSvKGO2CxjZUuPA+0R2MMA== Received: from localhost ([::1] helo=lists1p.gnu.org) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1pSWqL-0005Of-9z; Thu, 16 Feb 2023 00:30:05 -0500 Received: from eggs.gnu.org ([2001:470:142:3::10]) by lists.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_256_GCM_SHA384:256) (Exim 4.90_1) (envelope-from ) id 1pSWqJ-0005OV-Hx for guix-patches@gnu.org; Thu, 16 Feb 2023 00:30:03 -0500 Received: from debbugs.gnu.org ([209.51.188.43]) by eggs.gnu.org with esmtps (TLS1.2:ECDHE_RSA_AES_128_GCM_SHA256:128) (Exim 4.90_1) (envelope-from ) id 1pSWqI-0001E9-T6 for guix-patches@gnu.org; Thu, 16 Feb 2023 00:30:02 -0500 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1pSWqI-0004mI-BH for guix-patches@gnu.org; Thu, 16 Feb 2023 00:30:02 -0500 X-Loop: help-debbugs@gnu.org Subject: [bug#61527] [PATCH] Add edgelist graph backend Resent-From: Kyle Andrews Original-Sender: "Debbugs-submit" Resent-CC: guix-patches@gnu.org Resent-Date: Thu, 16 Feb 2023 05:30:02 +0000 Resent-Message-ID: Resent-Sender: help-debbugs@gnu.org X-GNU-PR-Message: followup 61527 X-GNU-PR-Package: guix-patches X-GNU-PR-Keywords: patch To: Simon Tournier Cc: 61527@debbugs.gnu.org Received: via spool by 61527-submit@debbugs.gnu.org id=B61527.167652537418309 (code B ref 61527); Thu, 16 Feb 2023 05:30:02 +0000 Received: (at 61527) by debbugs.gnu.org; 16 Feb 2023 05:29:34 +0000 Received: from localhost ([127.0.0.1]:34641 helo=debbugs.gnu.org) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pSWpq-0004lF-7n for submit@debbugs.gnu.org; Thu, 16 Feb 2023 00:29:34 -0500 Received: from mout01.posteo.de ([185.67.36.65]:45333) by debbugs.gnu.org with esmtp (Exim 4.84_2) (envelope-from ) id 1pSWpl-0004ky-Ki for 61527@debbugs.gnu.org; Thu, 16 Feb 2023 00:29:33 -0500 Received: from submission (posteo.de [185.67.36.169]) by mout01.posteo.de (Postfix) with ESMTPS id 4ED092401F1 for <61527@debbugs.gnu.org>; Thu, 16 Feb 2023 06:29:23 +0100 (CET) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/simple; d=posteo.net; s=2017; t=1676525363; bh=MqV9IK5pBXsoPMgCJjTwtvyhFVTxd9+aeEZClKUaLs8=; h=From:To:Cc:Subject:Date:From; b=puDAX0r+PpTyjrVw9+cyDAPfUb7+Kkdz5DzBRfPfOD/Bv1wVOlB9TsByL0AyFCxhg Nye0vL0LJtMTT2q6O3+6k7OQv2EM6qCkw4AuywNuwMfG+bWCBbLEasCLHkiFvxk8CR IOxFCfs9V7XA9qgolxWz4sq74d/AP0Ky5Kpap5BjfAS3GcWh8rLgCh30KPnC98GLjC ANR0Awc+Jyr0cN7Z/+pvVUu3BIa3eT/m4wEejiCTu5wyFhs6BTCNwC/HYxgJmEkEiv WcUiLhqhlCOj1QzpPHJj8lG1IJCIkuOROv1kI4HdzNM3Ap3k/PKPL+lOXkGeALMqR6 60Q4sfUJJNbOA== Received: from customer (localhost [127.0.0.1]) by submission (posteo.de) with ESMTPSA id 4PHNmK3hJRz6tmd; Thu, 16 Feb 2023 06:29:20 +0100 (CET) References: <875yc3sdfo.fsf@posteo.net> <86h6vmdh3k.fsf@gmail.com> From: Kyle Andrews Date: Thu, 16 Feb 2023 03:28:34 +0000 In-reply-to: <86h6vmdh3k.fsf@gmail.com> Message-ID: <878rgynpox.fsf@posteo.net> MIME-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable X-BeenThere: debbugs-submit@debbugs.gnu.org X-Mailman-Version: 2.1.18 Precedence: list X-BeenThere: guix-patches@gnu.org List-Id: List-Unsubscribe: , List-Archive: X-Migadu-Queue-Id: BCE503C8F8 X-Spam-Score: 6.28 X-Migadu-Spam-Score: 6.28 X-Migadu-Scanner: scn0.migadu.com List-Post: List-Help: List-Subscribe: , Errors-To: guix-patches-bounces+larch=yhetil.org@gnu.org Sender: guix-patches-bounces+larch=yhetil.org@gnu.org X-Migadu-Country: US X-Migadu-Flow: FLOW_IN X-TUID: wOcONbXT5NY+ Simon Tournier writes: > Hi, > > On Wed, 15 Feb 2023 at 05:21, Kyle Andrews wrote: >> Dear Guix, >> >> I would like to be able to conveniently analyze Guix package >> dependencies using general purpose network analysis software such as >> igraph. To achieve this, I have added another backend to Guix and which >> is exposed via guix graph which spits out a three column table that, >> while not technically and edge list, is readily transformed into one >> with minimal data munging. > > You might be interested by [1] where I export all the packages as > JSON-like (Python dictionary) and then import with python-networkx. > > Feel free to report your analyses, I am very interested by such. :-) That's my plan. I hope to provide Guix developers some tools to guide their efforts. Of course, I would really like to make networks of many more things in Guix including the Guile code itself and the Git history. > 1: https://yhetil.org/guix/874ju4qyd4.fsf@gmail.com That looks like a great reference! With regards to your query about graphs in that email, I remember there being a Racket library which provides the core graph data structures. Maybe those could be ported to guile? However, I wonder if it would more practical for guile to interface with igraph. >> +(define (emit-edgelist-prologue name port) >> + (display "" port)) > > Here, I would add the description of the data as header of the CSV-like > file. For instance, something: > > --8<---------------cut here---------------start------------->8--- > # type, name-or-edge1, item-or-edge2 > # package, name, item > # depends, edge1, edge2 > --8<---------------cut here---------------end--------------->8--- I toyed with calling columns 2 and 3 "parent/child", "source/sink", "input/output", "origin/destination". The "input/output" option sounds the best to me. > Well, is this format a standard format for representing graph? No it's not. Since I am not particularly comfortable with guile, I was hesitant to make extensive changes to the existing backend code. If I could have produced just the "depends" lines with the integers substituted with their meaningful names@versions, I would have done that instead. If I could pass id1 id2 label1 label2 to all of the emit procedures that would be pretty handy! For example: ``` input, output package1@1.2, package3@1.4 package1@1.2, package2@3.9 ... ``` Technically, R programmers would probably first turn this into a matrix or a data frame. That intermediate step would provide a convenient opportuntity to extract out the versions strings so that networks could more readily be compared across commits and branches. The versions could be added back in later as node attributes. Just as with relational tables, network analysis gets much more powerful when they have lots of attributes, some of which may refer to hash table keys pointing to other data structures. For example, libressl and openssl might share a protocol attribute with a value of "SSL". With a rich set of attributes data, researchers could start thinking about how to sample from the distribution of possible alternative system configurations when doing reproducibility studies. This might reveal "hot spots" of irreproducibility which package authors could be looking out for. That's one idea I just had while writing this email. I'm sure many people could come up with many more neat ideas if the biggest barriers to getting the data in the first place were removed. > From igraph documentation [1], it reads =E2=80=99igraph_read_graph_edgeli= st=E2=80=99: > > This format is simply a series of an even number of non-negative > integers separated by whitespace. The integers represent vertex > IDs. Placing each edge (i.e. pair of integers) on a separate > line is not required, but it is recommended for > readability. Edges of directed graphs are assumed to be in > "from, to" order. > > so maybe it could be nice to use this plain list for the edgelist > backend. WDYT? > > 1: https://igraph.org/c/doc/igraph-Foreign.html#igraph_read_graph_edgelist This is exactly why I did what I did in this patch. Just by filtering rows with "depends" in the first column, you get the edge list the igraph manual describes. To get the labels, you just need to filter rows with "package" instead. These are straightfoward post processing steps for many R and python users. I don't like the idea of just returning integers though. It's no fun to not be able to readily see what nodes refer to. That's why I prefer the CSV view of things with descriptive labels. Thanks for looking at my patch! Cheers, Kyle