Hey,

zimoun writes:

> On Wed, 19 Jan 2022 at 11:36, Ludovic Courtès wrote:
>
>> Oh right, so we’ll need to feed them historical ‘sources.json’ files
>> eventually, I think Timothy was planning to do that eventually.
>
> From my side, what I would like to achieve soon:
>
> [...]
>
> Then what is also missing is:
>
> 3- have a collection of sources.json per Guix revision -- at least
> some determined by us;

I’ve attached a script that makes a “sources.json” per commit from the PoG database. It only lists regularly downloaded sources (no VCS sources), since that’s all the SWH loader supports so far.

I also played around with it and came up with

https://ngyro.com/pog-reports/2022-01-16/missing-sources.json

This is a “sources.json” that only lists the “missing” and “unknown” sources from the PoG report. It lists sources across all commits (since 1.0.0). This might be the easiest thing for SWH to handle, since it omits nearly 20k sources that they definitely already have. Since they don’t have the tarball hashes, they have no way to skip downloading and processing tarballs that they already have by hash. Hence, filtering the list with the extra data we have from the PoG project should be something they welcome!

If they want, they could point a loader task at

https://ngyro.com/pog-reports/latest/missing-sources.json

and I could publish updates whenever I publish new PoG reports.

There’s one other thing to think about. Some of our sources are arguably unsuitable for SWH, for instance our bootstrap binaries. I bet we have a bunch of other borderline things, too, like game assets. Of course, if they are indiscriminately ingesting GitHub, I’m sure they’ve loaded plenty of garbage. Mostly, I think about these things because I believe it’s important to maintain the Guix-SWH relationship.

Here’s the per-commit script. You can run it like this:

    $ guile sources.scm pog.db output-directory
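To make the filtering step behind missing-sources.json concrete, here is a minimal Python sketch. It is not the attached Guile script. It shows how per-commit “sources.json” documents could be merged into a single deduplicated file that keeps only the “missing” and “unknown” sources. It assumes the usual sources.json layout: a top-level "sources" list whose entries carry "type", "urls" and "integrity". The status_by_integrity mapping stands in for the PoG database and is hypothetical, as are the function name, the "safe" status label, and the example URLs and hashes.

```python
import json

# Per the message, only "missing" and "unknown" sources go into the file.
KEEP_STATUSES = {"missing", "unknown"}


def merge_missing_sources(per_commit_docs, status_by_integrity):
    """Merge per-commit sources.json dicts into one deduplicated dict.

    Assumed input shape (hypothetical here):
      {"version": "1", "sources": [{"type": "url", "urls": [...],
                                    "integrity": "sha256-..."}, ...]}
    status_by_integrity is a hypothetical stand-in for the PoG database,
    mapping an entry's "integrity" string to its report status.
    """
    seen = set()
    merged = []
    for doc in per_commit_docs:
        for source in doc.get("sources", []):
            # Keep plain downloads only; VCS entries (e.g. "git") are skipped,
            # since the SWH loader discussed here only handles URL sources.
            if source.get("type") != "url":
                continue
            integrity = source.get("integrity")
            # The same tarball appears in many commits; emit it once.
            if integrity is None or integrity in seen:
                continue
            # Assumption: a source with no recorded status counts as "unknown".
            status = status_by_integrity.get(integrity, "unknown")
            if status in KEEP_STATUSES:
                seen.add(integrity)
                merged.append(source)
    return {"version": "1", "sources": merged}


# Example with made-up data: two commits share one tarball.
commit_a = {"version": "1", "sources": [
    {"type": "url", "urls": ["https://example.org/a.tar.gz"],
     "integrity": "sha256-AAA="},
    {"type": "git", "git_url": "https://example.org/b.git", "git_ref": "v1"},
]}
commit_b = {"version": "1", "sources": [
    {"type": "url", "urls": ["https://example.org/a.tar.gz"],
     "integrity": "sha256-AAA="},
    {"type": "url", "urls": ["https://example.org/c.tar.gz"],
     "integrity": "sha256-CCC="},
]}
statuses = {"sha256-AAA=": "missing", "sha256-CCC=": "safe"}

result = merge_missing_sources([commit_a, commit_b], statuses)
print(json.dumps(result, indent=2))
```

In this example only a.tar.gz survives. It appears once even though two commits list it, the git entry is dropped, and c.tar.gz is filtered out because its (made-up) status is neither “missing” nor “unknown”. Deduplicating on the integrity hash, rather than on the URL, reflects the point above that SWH cannot skip tarballs by hash on its own.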