unofficial mirror of guix-devel@gnu.org 
From: zimoun <zimon.toutoune@gmail.com>
To: Timothy Sample <samplet@ngyro.com>, guix-devel@gnu.org
Subject: Re: Preservation of Guix Report
Date: Thu, 21 Oct 2021 09:39:27 +0200
Message-ID: <86sfwug72o.fsf@gmail.com>
In-Reply-To: <87o87jjx54.fsf@ngyro.com>

Hi Timothy,

On Wed, 20 Oct 2021 at 15:48, Timothy Sample <samplet@ngyro.com> wrote:

> Early this summer I did a bunch of work trying to figure out which Guix
> sources are preserved by the SWH archive.  I’m finally ready to share
> some preliminary results!
>
>     https://ngyro.com/pog-reports/2021-10-20/

Cool!  Really interesting.


> What’s cool is that the report is automated.  Next on my list is to
> update the database and generate a new report.  Then, we can compare the
> results and see if we are improving.  (My read on the results so far is
> that improving “sources.json” will yield big improvements, but we might
> not be able to get to that before the next report.)

Here are two minor comments:

 1. For a couple of days now, I have been running:

        $ GUIX_SWH_TOKEN=$TOKEN guix lint -c archival

    where $TOKEN is provided by the SWH authentication service [1].
    With a token, the API rate limit is 1200 instead of 120, so more
    ‘git-fetch’ packages get submitted for archival.  I am in the
    process of automating that, but do not hold your breath. :-)

 2. For still unknown reasons, the bridge between SWH and Disarchive has
    some holes.  For instance,

        $ guix lint -c archival znc
        gnu/packages/messaging.scm:996:12: znc@1.8.2: Disarchive entry refers to non-existent SWH directory '33a3b509b5ff8e9039626d11b7a800281884cf2a'

        $ wget https://guix.gnu.org/sources.json
        $ cat sources.json | jq | grep znc
             "integrity": "sha256-IwbxlQzncsWlmlf1SG1Zu5yrmEl8RfxJy8RawN7BGbs="
             "integrity": "sha256-q0jatpd+j0PW//szIo0ViGX2jd5wJtEjxpPXcznc8rs="
               "https://znc.in/releases/archive/znc-1.8.2.tar.gz"

        $ guix download https://znc.in/releases/archive/znc-1.8.2.tar.gz
        Starting download of /tmp/guix-file.hnjWTE
        From https://znc.in/releases/archive/znc-1.8.2.tar.gz...
         znc-1.8.2.tar.gz  2.0MiB                                     599KiB/s 00:03 [##################] 100.0%
        /gnu/store/58khbiwp2ghhzg00gnzdy2jlfv49vajm-znc-1.8.2.tar.gz
        03fyi0j44zcanj1rsdx93hkdskwfvhbywjiwd17f9q1a7yp8l8zz

    So something is wrong somewhere.  Thanks to #1, I am spotting many
    such examples.  I do not know whether the SWH-ID computed by
    Disarchive is incorrect or whether SWH has not ingested the tarball.
    Investigation required. :-)
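Note that the `grep znc` above also matches integrity strings that merely contain the letters “znc”, and that `guix download` prints its hash in nix-base32 while “sources.json” uses base64 SRI strings, so the two cannot be compared by eye.  Below is a minimal Python sketch (standard library only) to look up an entry by exact URL and convert the hash.  It assumes “sources.json” has a top-level `sources` array whose entries carry `urls` and `integrity` fields, as the output above suggests; the helper names are hypothetical.

```python
import base64
import json

# Alphabet of the Nix/Guix base32 encoding (the letters e, o, t, u are absent).
NIX32_ALPHABET = "0123456789abcdfghijklmnpqrsvwxyz"


def nix_base32_encode(data: bytes) -> str:
    """Encode bytes the way 'guix hash' and 'guix download' print them."""
    length = (len(data) * 8 - 1) // 5 + 1
    chars = []
    # The most significant 5-bit group is printed first.
    for n in range(length - 1, -1, -1):
        i, j = divmod(n * 5, 8)
        c = data[i] >> j
        if i + 1 < len(data):
            c |= data[i + 1] << (8 - j)
        chars.append(NIX32_ALPHABET[c & 0x1F])
    return "".join(chars)


def nix_base32_decode(text: str, size: int = 32) -> bytes:
    """Decode a nix-base32 string into SIZE raw bytes (32 for SHA-256)."""
    out = bytearray(size)
    for n, ch in enumerate(reversed(text)):
        digit = NIX32_ALPHABET.index(ch)  # raises ValueError on bad characters
        i, j = divmod(n * 5, 8)
        out[i] |= (digit << j) & 0xFF
        if i + 1 < size:
            out[i + 1] |= digit >> (8 - j)
        elif digit >> (8 - j):
            raise ValueError("nix-base32 string overflows the hash size")
    return bytes(out)


def nix_base32_to_sri(nix_hash: str) -> str:
    """Turn a SHA-256 hash printed by 'guix download' into an SRI string."""
    raw = nix_base32_decode(nix_hash)
    return "sha256-" + base64.b64encode(raw).decode("ascii")


def entries_for_url(sources_json: str, url: str) -> list:
    """Return the entries of sources.json whose "urls" list contains URL."""
    data = json.loads(sources_json)
    return [entry for entry in data.get("sources", [])
            if url in entry.get("urls", [])]
```

Passing the znc tarball URL to `entries_for_url` and `03fyi0j44z...` (the full hash above) to `nix_base32_to_sri` yields two strings that can be compared directly with `==`, instead of eyeballing `grep` output.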


1: <https://archive.softwareheritage.org/api/>
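The same token can be used outside `guix lint`, for instance to check by hand whether SWH knows the directory Disarchive refers to (such as the one reported for znc above).  This is a minimal Python sketch using only the standard library; the helper names are hypothetical, and the `directory/<sha1_git>/` endpoint, the Bearer header and the X-RateLimit-* header names are assumptions to double-check against the API documentation [1].

```python
import urllib.error
import urllib.request
from typing import Optional, Tuple

SWH_API = "https://archive.softwareheritage.org/api/1"


def swh_request(path: str, token: Optional[str] = None) -> urllib.request.Request:
    """Build a GET request for the SWH Web API, authenticated if TOKEN is set."""
    req = urllib.request.Request(f"{SWH_API}/{path.lstrip('/')}")
    if token:
        # Token obtained from the SWH authentication service (assumed Bearer scheme).
        req.add_header("Authorization", f"Bearer {token}")
    return req


def rate_limit(headers) -> Tuple[int, int]:
    """Return (remaining, limit) from assumed X-RateLimit-* headers, -1 if absent."""
    return (int(headers.get("X-RateLimit-Remaining", -1)),
            int(headers.get("X-RateLimit-Limit", -1)))


def check_directory(sha1_git: str, token: Optional[str] = None):
    """Return (exists, (remaining, limit)) for a SWH directory identifier."""
    try:
        with urllib.request.urlopen(swh_request(f"directory/{sha1_git}/", token)) as resp:
            return True, rate_limit(resp.headers)
    except urllib.error.HTTPError as err:
        if err.code == 404:  # the archive does not know this directory
            return False, rate_limit(err.headers)
        raise
```

For example, `check_directory("33a3b509b5ff8e9039626d11b7a800281884cf2a", os.environ.get("GUIX_SWH_TOKEN"))` would report whether that directory is missing and how much of the quota is left.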


> It’s surprising to me that SWH is not already getting these from
> “sources.json”.  I picked an arbitrary one, “rust-quote-0.6”, and it’s
> simply not in “sources.json”.  On the other hand, I bet SWH would like a
> crates.io (and CRAN, etc.) loader, too.

According to the SWH documentation, there is a CRAN lister [2], but I
have not checked what it concretely ingests.  On our side, we use
‘url-fetch’, so there could be a tiny mismatch between what is inside
the release tarball (what we actually use) and what SWH ingests directly
from CRAN.
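One way to detect such a mismatch, or to tell a wrong SWH-ID from a missing ingestion, is to recompute the directory identifier locally from the unpacked tarball and compare it with what SWH or Disarchive reports.  SWH directory identifiers are computed like Git tree objects, so here is a rough Python sketch; it is not Disarchive's or SWH's code, it only handles regular files, executables, symlinks and subdirectories, and unlike Git it keeps empty directories (hashed as the empty tree).

```python
import hashlib
import os
import stat


def git_object_id(kind: str, payload: bytes) -> str:
    """sha1_git: SHA-1 over '<kind> <size>' plus a NUL byte, then the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def directory_id(path: str) -> str:
    """Compute the Git tree id (the SWH directory id) of the tree at PATH."""
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            # A symlink is stored as a blob holding its target.
            mode, oid = b"120000", git_object_id("blob", os.fsencode(os.readlink(full)))
        elif stat.S_ISDIR(st.st_mode):
            mode, oid = b"40000", directory_id(full)
        else:
            with open(full, "rb") as f:
                content = f.read()
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            oid = git_object_id("blob", content)
        raw_name = os.fsencode(name)
        # Git sorts subdirectories as if their name ended with '/'.
        sort_key = raw_name + (b"/" if mode == b"40000" else b"")
        entries.append((sort_key, mode, raw_name, bytes.fromhex(oid)))
    entries.sort()
    body = b"".join(mode + b" " + raw_name + b"\0" + oid
                    for _, mode, raw_name, oid in entries)
    return git_object_id("tree", body)
```

Running `directory_id` on the unpacked `znc-1.8.2` tree would give a 40-character identifier that can be compared with the one in the lint warning.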

2: <https://docs.softwareheritage.org/devel/apidoc/swh.lister.cran.html?highlight=cran#module-swh.lister.cran>


To answer your question [3] about “sources.json”: I think ingestion
started after commit 35bb77108fc7f2339da0b5be139043a5f3f21493 in
guix-artwork.  In other words, SWH started ingesting “sources.json”
after July 2020, probably around September 2020.

3: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00141.html>

> One other way to help would be to suggest improvements to the report.  I
> don’t want to fiddle with it too much, but if there is some simple graph
> or table or list that should be there, I’m happy to give it a go.

For the Missing and Unknown fields, could you break them down by kind of
origin?  Are they mainly git-fetch, url-fetch, or something else?

That would help pinpoint where to work (“sources.json”, the SWH side,
Disarchive, etc.).
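Such a breakdown is a small group-by.  A sketch in Python over hypothetical per-source records (package, origin kind, status); the record type and its field names are illustrative, not the report's actual schema.

```python
from collections import Counter
from typing import Dict, Iterable, NamedTuple, Tuple


class SourceStatus(NamedTuple):
    """One row of hypothetical report data."""
    package: str
    origin: str   # e.g. "git-fetch", "url-fetch"
    status: str   # e.g. "Missing", "Unknown"


def breakdown(rows: Iterable[SourceStatus],
              statuses: Tuple[str, ...] = ("Missing", "Unknown")) -> Dict[str, Dict[str, int]]:
    """Count sources per status and origin kind, keeping only STATUSES."""
    counts = Counter((r.status, r.origin) for r in rows if r.status in statuses)
    table: Dict[str, Dict[str, int]] = {}
    for (status, origin), n in sorted(counts.items()):
        table.setdefault(status, {})[origin] = n
    return table
```

The nested dictionary maps directly onto a small table with one row per status and one column per origin kind.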


Cheers,
simon
