From: zimoun <zimon.toutoune@gmail.com>
To: Timothy Sample <samplet@ngyro.com>, guix-devel@gnu.org
Subject: Re: Preservation of Guix Report
Date: Thu, 21 Oct 2021 09:39:27 +0200
Message-ID: <86sfwug72o.fsf@gmail.com>
In-Reply-To: <87o87jjx54.fsf@ngyro.com>
Hi Timothy,
On Wed, 20 Oct 2021 at 15:48, Timothy Sample <samplet@ngyro.com> wrote:
> Early this summer I did a bunch of work trying to figure out which Guix
> sources are preserved by the SWH archive. I’m finally ready to share
> some preliminary results!
>
> https://ngyro.com/pog-reports/2021-10-20/
Cool! Really interesting.
> What’s cool is that the report is automated. Next on my list is to
> update the database and generate a new report. Then, we can compare the
> results and see if we are improving. (My read on the results so far is
> that improving “sources.json” will yield big improvements, but we might
> not be able to get to that before the next report.)
Here are two minor comments:
1. For a couple of days now, I have been running:
$ GUIX_SWH_TOKEN=$TOKEN guix lint -c archival
where $TOKEN is provided by the SWH Authentication service [1].
With the token, the rate limit is 1200 instead of 120. Therefore, more
’git-fetch’ packages are added. I am in the process of automating
that, but do not hold your breath. :-)
2. For still unknown reasons, the bridge between SWH and Disarchive has
some holes. For instance,
$ guix lint -c archival znc
gnu/packages/messaging.scm:996:12: znc@1.8.2: Disarchive entry refers to non-existent SWH directory '33a3b509b5ff8e9039626d11b7a800281884cf2a'
$ wget https://guix.gnu.org/sources.json
$ cat sources.json | jq | grep znc
"integrity": "sha256-IwbxlQzncsWlmlf1SG1Zu5yrmEl8RfxJy8RawN7BGbs="
"integrity": "sha256-q0jatpd+j0PW//szIo0ViGX2jd5wJtEjxpPXcznc8rs="
"https://znc.in/releases/archive/znc-1.8.2.tar.gz"
$ guix download https://znc.in/releases/archive/znc-1.8.2.tar.gz
Starting download of /tmp/guix-file.hnjWTE
From https://znc.in/releases/archive/znc-1.8.2.tar.gz...
znc-1.8.2.tar.gz 2.0MiB 599KiB/s 00:03 [##################] 100.0%
/gnu/store/58khbiwp2ghhzg00gnzdy2jlfv49vajm-znc-1.8.2.tar.gz
03fyi0j44zcanj1rsdx93hkdskwfvhbywjiwd17f9q1a7yp8l8zz
Therefore, something is wrong somewhere. Thanks to #1, I am finding
many such examples. I do not know whether the SWH-ID computed by
Disarchive is incorrect or whether SWH has not ingested the content.
Investigation required. :-)
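A note on comparing the hashes above: the “integrity” field of
“sources.json” is a base64 string prefixed with the algorithm, while
’guix download’ prints nix-base32, so the two cannot be compared by eye.
Also, a plain ’grep znc’ can match base64 strings by chance. Here is a
minimal Python sketch to look up an entry by URL and convert its hash.
It assumes that “sources.json” has a top-level "sources" list whose
entries carry "urls" and "integrity" (only "integrity" is visible above,
the rest of the layout is an assumption), and the helper names are
hypothetical:

```python
import base64
import hashlib
import json

# Alphabet of the nix-base32 encoding that Guix prints for hashes.
NIX_BASE32 = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32(data: bytes) -> str:
    """Encode DATA in nix-base32 (5-bit groups, emitted from the end)."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIX_BASE32[c & 0x1F])
    return "".join(chars)


def sri_to_nix_base32(integrity: str) -> str:
    """Convert 'sha256-<base64>' to the nix-base32 form."""
    algo, _, b64 = integrity.partition("-")
    if algo != "sha256":
        raise ValueError(f"unsupported algorithm: {algo}")
    return nix_base32(base64.b64decode(b64))


def entries_for_url(sources: dict, needle: str) -> list:
    """Return the entries with a URL containing NEEDLE.

    Assumed layout: {"sources": [{"urls": [...], "integrity": "..."}]}.
    """
    return [
        entry
        for entry in sources.get("sources", [])
        if any(needle in url for url in entry.get("urls", []))
    ]


if __name__ == "__main__":
    # Made-up sample: the integrity is the hash of a dummy payload,
    # NOT the real hash of the znc tarball.
    digest = hashlib.sha256(b"dummy payload").digest()
    sample = json.loads(json.dumps({
        "sources": [{
            "type": "url",
            "urls": ["https://znc.in/releases/archive/znc-1.8.2.tar.gz"],
            "integrity": "sha256-" + base64.b64encode(digest).decode(),
        }]
    }))
    for entry in entries_for_url(sample, "znc-1.8.2"):
        print(entry["urls"][0], sri_to_nix_base32(entry["integrity"]))
```

With the real file, load it with json.load() and compare the printed
value against the hash reported by ’guix download’.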
1: <https://archive.softwareheritage.org/api/>
> It’s surprising to me that SWH is not already getting these from
> “sources.json”. I picked an arbitrary one, “rust-quote-0.6”, and it’s
> simply not in “sources.json”. On the other hand, I bet SWH would like a
> crates.io (and CRAN, etc.) loader, too.
According to the SWH docs, there is a CRAN lister [2], but I have not
checked what it concretely ingests. On our side, we use ’url-fetch’,
and it seems possible to me that there is a tiny mismatch between what
is inside the release tarball (what we concretely use) and what SWH
ingests directly from CRAN.
2: <https://docs.softwareheritage.org/devel/apidoc/swh.lister.cran.html?highlight=cran#module-swh.lister.cran>
To answer your question [3] about “sources.json”, I think the
ingestion started after commit
35bb77108fc7f2339da0b5be139043a5f3f21493 in guix-artwork. In other words,
SWH started to ingest from “sources.json” after July 2020, probably
around September 2020.
3: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00141.html>
> One other way to help would be to suggest improvements to the report. I
> don’t want to fiddle with it too much, but if there is some simple graph
> or table or list that should be there, I’m happy to give it a go.
For the Missing and Unknown fields, could you distinguish the kind of
origin? Is it mainly git-fetch or url-fetch or others?
That would help to spot where the issues are and what to work on
(sources.json, SWH side, Disarchive, etc.).
Cheers,
simon