From mboxrd@z Thu Jan 1 00:00:00 1970 From: Catonano Subject: Re: [GSoC update] Npm & guix Date: Fri, 29 Jul 2016 16:53:39 +0200 Message-ID: References: <8737mxflz4.fsf@gnu.org> Mime-Version: 1.0 Content-Type: multipart/alternative; boundary=001a1140fbc4893ab70538c76a95 Return-path: Received: from eggs.gnu.org ([2001:4830:134:3::10]:42104) by lists.gnu.org with esmtp (Exim 4.71) (envelope-from ) id 1bT9AX-0001r6-Dl for guix-devel@gnu.org; Fri, 29 Jul 2016 10:53:46 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1bT9AV-00015B-NB for guix-devel@gnu.org; Fri, 29 Jul 2016 10:53:45 -0400 In-Reply-To: <8737mxflz4.fsf@gnu.org> List-Id: "Development of GNU Guix and the GNU System distribution." List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: guix-devel-bounces+gcggd-guix-devel=m.gmane.org@gnu.org Sender: "Guix-devel" To: =?UTF-8?Q?Ludovic_Court=C3=A8s?= Cc: guix-devel --001a1140fbc4893ab70538c76a95 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: quoted-printable 2016-07-25 23:26 GMT+02:00 Ludovic Court=C3=A8s : > Hello! > > Jelle Licht skribis: > > > On Ludo's advice, I snarfed Ricardo's recursive importer and bolted it > > on my npm importer. After leaving the importer running for a quite > > some hours (and making it more robust in the face of inconsistent npm > > information), it turns out that jQuery has a direct or indirect > > dependcy on about everything. We are talking pretty much all of the > > build systems, all of the testing frameworks and all of the test > > runners. Literally thousands of packages, and multiple (conflicting) > > versions of most. > > I=E2=80=99m really impressed that your importer can already grovel this m= uch! > In itself, that=E2=80=99s already a significant achievement, despite the > frustration of not getting =E2=80=9Cguix package -i jquery=E2=80=9D right= away. > > Do you have figures on the number of vertices and edges on this graph? > Could it be that the recursive importer keeps revisiting the same nodes > over and over again? :-) > > I would suggest publishing the code somewhere so others can try to > import their favorite JS library and give feedback. > > I'd like to indicate that in the Guix code base there's a function visiting the graph of the dependencies of a package with the concern of covering it all AND of not considering the same node more than one time It's in guix/packages.scm on line 552, it's called "transitive-inputs" and takes as an arguments the inputs of a single package. It could be used as a blueprint for such task of dependencies graphs covering. The main observation that I can do is that both versions do check whether a package being considered is already been seen in a previous passage. So, it doesn't seem to me that Jelle's version could revisit the same package (node) over and over Also, the "official" version clearly distinguishes between the current depth level in the graph and the next one, using 2 different variables for containing the packages (nodes) at the current depth and those in the next level ("inputs" and "propagated") as it does a breadth first visit. Instead the Jelle's version uses the same variable for the current AND the next level and it's a list and it conses the nodes for the next level on the current list So it seems to me that it does a depth first visit (I might be wrong, I didn't do a complete trace) If anything, the "official" version uses a VHash to check if a package has already been "seen" while Jelle's uses a plain list (again) So the cost of this check is constant in the official version and linear in Jelle's version. Further, in Jelle's version, every package gets checked against the list of already imported packages (as a plain list) AND against the packages already present in the store. Both checks at every iteration. It seems to me that there's space for improving here ;-) In fact, in Jelle's guix/import/npm.scm, on line 462 the "and" could be switched to an "or". It would work anyway and it could save a handful of checks on a complete run of the algorithm Anyway, I tried to "import" a random package and it has a ton of dependencies. It seems to me that a more systematic approach (like that of a real data scientist ;-) ) could help here The whole graph should be imported in some database and then graph algorithms should be run on it For example: which are the packages with less or no dependencies (and a lot of dependants) ? Because those should be imported first, in my opinion. Jelle came to the notion that testing frameworks are a dependencies for almost all the packages but as far as I understand that's a quite empirical knowledge. > > I am currently hovering between version 0.6 and 0.7, which can > > properly recompile itself and slightly more contemporary > > version. Getting to version 1.6 from June 2013 should be doable using > > this exact same approach. This will allow us to package a 2014 > > version of the Mocha testing framework. > > Impressive. > Indeed. Impressive. Ok, so these are the first impressions that I came up with. I'd like my observations on the algorithms to be vetted by more experienced people. Just to help me verify if I can actually understand someone's else scheme code Thanks --001a1140fbc4893ab70538c76a95 Content-Type: text/html; charset=UTF-8 Content-Transfer-Encoding: quoted-printable
2016-07-25 23:26 GMT+02:00 Ludovic Court=C3=A8s <ludo@gnu.org= >:
<= blockquote class=3D"gmail_quote" style=3D"margin:0 0 0 .8ex;border-left:1px= #ccc solid;padding-left:1ex">Hello!

Jelle Licht <jlicht@fsfe.org> = skribis:

> On Ludo's advice, I snarfed Ricardo's recursive importer and b= olted it
> on my npm importer. After leaving the importer running for a quite
> some hours (and making it more robust in the face of inconsistent npm<= br> > information), it turns out that jQuery has a direct or indirect
> dependcy on about everything. We are talking pretty much all of the > build systems, all of the testing frameworks and all of the test
> runners. Literally thousands of packages, and multiple (conflicting) > versions of most.

I=E2=80=99m really impressed that your importer can already grovel t= his much!
In itself, that=E2=80=99s already a significant achievement, despite the frustration of not getting =E2=80=9Cguix package -i jquery=E2=80=9D right a= way.

Do you have figures on the number of vertices and edges on this graph?
Could it be that the recursive importer keeps revisiting the same nodes
over and over again?=C2=A0 :-)

=C2=A0
I would suggest publishing the code somewhere so others can try to
import their favorite JS library and give feedback.


I'd like to indi= cate that in the Guix code base there's a function=20 visiting the graph of the dependencies of a package with the concern of=20 covering it all AND of not considering the same node more than one time
=
It's in guix/packages.scm on line 552, it's called &= quot;transitive-inputs" and takes as an arguments the inputs of a sing= le package.

It could be used as a blueprint for such task= of dependencies graphs covering.

The main observation that I can do is that both versions do check whether a package being considered is already been seen in a previous passage.

So, it doesn't seem to me that Jelle's vers= ion could revisit the same package (node) over and over

A= lso, the "official" version clearly distinguishes between the current= depth=20 level in the graph and the next one, using 2 different variables for=20 containing the packages (nodes) at the current depth and those in the=20 next level ("inputs" and "propagated") as it does a bre= adth first visit.

Instead the=20 Jelle's version uses the same variable for the current AND the next=20 level and it's a list and it conses the nodes for the next level on the= =20 current list

So it seems to me that it does a depth first= visit (I might be wrong, I didn't do a complete trace)

If anything, the "official" version uses a VHash to check if a= package has already been "seen" while Jelle's uses a plain l= ist (again)

So the cost of this check is constant in the = official version and linear in Jelle's version.

Further, in Jelle's version, every package gets checked against the list of=20 already imported packages (as a plain list) AND against the packages=20 already present in the store.

Both checks at every iterat= ion. It seems to me that there's space for improving here ;-)

In fact, in Jelle's guix/import/npm.scm, on line 462 the "and" could be switched to an "or". It= would work=20 anyway and it could save a handful of checks on a complete run of the=20 algorithm

Anyway, I tried to "import" a random packag= e and it has a ton of dependencies.

It seems to me that = a more systematic approach (like that of a real data scientist ;-) ) could = help here

The whole graph should be imported in some data= base and then graph algorithms should be run on it

For ex= ample: which are the packages with less or no dependencies (and a lot of de= pendants) ?
Because those should be imported first, in my opi= nion.

Jelle came to the notion that testing frameworks ar= e a dependencies for almost all the packages but as far as I understand tha= t's a quite empirical knowledge.
=C2=A0
> I am currently hovering between version 0.6 and 0.7, which can
> properly recompile itself and slightly more contemporary
> version. Getting to version 1.6 from June 2013 should be doable using<= br> > this exact same approach.=C2=A0 This will allow us to package a 2014 > version of the Mocha testing framework.

Impressive.

Indeed. Impressive. =

Ok, so these are the first impressions that I came up with. I'd like my=20 observations on the algorithms to be vetted by more experienced people.
=
Just to help me verify if I can actually understand someone&= #39;s else scheme code

Thanks
=
--001a1140fbc4893ab70538c76a95--