unofficial mirror of guix-devel@gnu.org 
From: Timothy Sample <samplet@ngyro.com>
To: "Ludovic Courtès" <ludo@gnu.org>
Cc: guix-devel@gnu.org
Subject: Re: Preservation of Guix Report
Date: Fri, 22 Oct 2021 10:19:17 -0400
Message-ID: <87tuh9cfbu.fsf@ngyro.com>
In-Reply-To: <87a6j2w1et.fsf@gnu.org> ("Ludovic Courtès"'s message of "Thu, 21 Oct 2021 22:47:22 +0200")

Hey,

Ludovic Courtès <ludo@gnu.org> writes:

> Timothy Sample <samplet@ngyro.com> writes:
>
>> Early this summer I did a bunch of work trying to figure out which Guix
>> sources are preserved by the SWH archive.  I’m finally ready to share
>> some preliminary results!
>>
>>     https://ngyro.com/pog-reports/2021-10-20/
>>
>> This report is already quite outdated, though.  It only covers commits
>> up to the end of May, and sometime in June is when the sources were
>> checked against the SWH archive.  I’m sharing it now to avoid any
>> further delays.
>
> This is truly awesome!  (Did you manage to grab all that info with the
> default rate limit?!)

Yes, but I have another trick: the “known” endpoint [1].  If you
already know the SWHIDs you want to check, you can check 1,000 per call.
With the anonymous rate limit, I can check 120,000 every hour, which is
plenty.

[1] https://docs.softwareheritage.org/devel/swh-web/uri-scheme-api.html#get--api-1-content-known-(sha1)[,(sha1),%20...,(sha1)]-
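
To make the arithmetic concrete, here is a rough sketch of the batching.
It is not the script used for the report.  It assumes the SWH API's
POST /api/1/known/ endpoint (a JSON list of SWHIDs in, a
{swhid: {"known": bool}} object out) and an anonymous limit of 120 calls
per hour, which is what 120,000 / 1,000 implies.  The names `batches`,
`check_known`, and `post_to_swh` are made up for illustration.

```python
import json
import urllib.request
from typing import Callable, Dict, Iterable, Iterator, List

BATCH_SIZE = 1000      # SWHIDs per call, as stated above
CALLS_PER_HOUR = 120   # assumption: implied by 120,000 checks per hour


def batches(swhids: Iterable[str], size: int = BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive lists of at most `size` SWHIDs."""
    batch: List[str] = []
    for swhid in swhids:
        batch.append(swhid)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def check_known(
    swhids: Iterable[str],
    post: Callable[[List[str]], Dict[str, Dict[str, bool]]],
) -> Dict[str, bool]:
    """Map each SWHID to True or False using one `post` call per batch."""
    known: Dict[str, bool] = {}
    for batch in batches(swhids):
        for swhid, info in post(batch).items():
            known[swhid] = bool(info["known"])
    return known


def post_to_swh(batch: List[str]) -> Dict[str, Dict[str, bool]]:
    """Send one batch to the archive (URL and reply shape are assumptions)."""
    request = urllib.request.Request(
        "https://archive.softwareheritage.org/api/1/known/",
        data=json.dumps(batch).encode(),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling `check_known(ids, post_to_swh)` would then cost one request per
1,000 IDs.  Passing the poster in as an argument keeps the batching easy
to test without touching the network or the rate limit.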

> I can’t wait for the updated report now that Simon and yourself have
> identified that SWHID computation bug!

I’m computing SWHIDs while writing this.  Not long now!

> Some of our <git-reference> refer to tags, not commits.  How do you
> determine whether they’re saved?

The short answer is “elbow grease”.  Basically, I’m taking a “work
harder, not smarter” approach.  :p  I go out and obtain the source,
verify it with Guix’s hash, and then compute the SWHID.  This is another
thing we could move to the CI infrastructure, but I think there might be
some hiccoughs.  For git-references, I believe we can’t just compute the
ID after the download derivation – we would have to change the download
derivation itself.  Maybe add an ‘swhid’ output?  It’s a little more
complicated than just throwing up some scripts, anyway.
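
For reference, the “compute the SWHID” step can be sketched as below.
This is an illustration, not the code behind the report.  It relies on
content and directory SWHIDs using the same hashing as Git blobs and
trees, and it assumes the checkout has already been verified against
the Guix hash and contains no ‘.git’ directory.  The function names are
made up.

```python
import hashlib
import os
import stat


def git_object_id(kind: str, payload: bytes) -> str:
    """Hex SHA-1 of a Git object: "<kind> <length>\\0" followed by the payload."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).hexdigest()


def content_swhid(data: bytes) -> str:
    """SWHID of a single file's contents (a Git blob ID with a prefix)."""
    return "swh:1:cnt:" + git_object_id("blob", data)


def directory_id(path: str) -> str:
    """Git-style tree ID of `path`, computed recursively."""
    entries = []
    for name in os.listdir(path):
        full = os.path.join(path, name)
        st = os.lstat(full)
        if stat.S_ISLNK(st.st_mode):
            # A symlink is stored as a blob holding its target.
            mode = b"120000"
            oid = git_object_id("blob", os.fsencode(os.readlink(full)))
        elif stat.S_ISDIR(st.st_mode):
            mode = b"40000"
            oid = directory_id(full)
        else:
            mode = b"100755" if st.st_mode & stat.S_IXUSR else b"100644"
            with open(full, "rb") as handle:
                oid = git_object_id("blob", handle.read())
        raw_name = os.fsencode(name)
        # Git sorts entries by name, treating directories as if they ended in "/".
        sort_key = raw_name + (b"/" if mode == b"40000" else b"")
        entries.append((sort_key, mode, raw_name, bytes.fromhex(oid)))
    payload = b"".join(
        mode + b" " + raw_name + b"\0" + raw_oid
        for _, mode, raw_name, raw_oid in sorted(entries)
    )
    return git_object_id("tree", payload)


def directory_swhid(path: str) -> str:
    """SWHID of a source tree such as a Git checkout."""
    return "swh:1:dir:" + directory_id(path)
```

A tarball would go through `content_swhid`, while a git-reference
checkout would go through `directory_swhid`.  That is why the ID cannot
simply be read off the download derivation's existing output.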

> ‘guix lint -c archival’ uses ‘lookup-origin-revision’, which is a good
> approximation, but it’s not 100% reliable because tags can be modified
> and that procedure only tells you that a same-named tag was found, not
> that it’s the commit you were expecting.  (And really, we should stop
> referring to tags.)

Like zimoun said elsewhere in this thread, having an explicit mapping
from Guix hash to SWHID will improve reliability quite a bit.  It’s hard
to get to 100%, though!  With the reports, we will eventually be able to
check everything.  However, there’s still a small possibility of bugs
and false positives.  Ultimately, I’m hoping the reports will help
detect small problems (some specific source is missing) and guide our
efforts on big problems (xz support in Disarchive or support for more
version control systems, etc.).


-- Tim

