From: Timothy Sample <samplet@ngyro.com>
To: zimoun <zimon.toutoune@gmail.com>
Cc: guix-devel@gnu.org
Subject: Re: Preservation of Guix Report
Date: Thu, 21 Oct 2021 12:26:26 -0400
Message-ID: <87zgr2cpjh.fsf@ngyro.com>
In-Reply-To: <86sfwug72o.fsf@gmail.com> (zimoun's message of "Thu, 21 Oct 2021 09:39:27 +0200")

Hi zimoun,

zimoun <zimon.toutoune@gmail.com> writes:

>  2. For still unknown reasons, the bridge between SWH and Disarchive has
>     some holes.  For instance,
>
>         $ guix lint -c archive znc
>         gnu/packages/messaging.scm:996:12: znc@1.8.2: Disarchive entry refers to non-existent SWH directory '33a3b509b5ff8e9039626d11b7a800281884cf2a'
>
> [...]
>
>     Therefore, something is wrong somewhere.  Because of #1, I detect
>     many such examples.  I do not know if the SWH-ID computed by
>     Disarchive is incorrect [...].

Bingo!

According to SWH (emphasis mine):

    SWHIDs for contents, directories, revisions, and releases are, *at
    present*, compatible with the Git way of computing identifiers for
    its objects.

That compatibility no longer holds in every case.  As they go on to say:

    Note that Git compatibility is incidental and is not guaranteed to
    be maintained in future versions of this scheme (or Git).

Disarchive does it the Git way, and SWH does something slightly
different.  The SWH hash is 4e58dc09b8362caf1265102130a593b070562a68,
but the Git hash is 33a3b509b5ff8e9039626d11b7a800281884cf2a.  The
difference is that Disarchive, like Git, ignores empty directories.  It
makes sense that an archival project like SWH would not do that, and
they indeed don’t.
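
To make the difference concrete, here is a minimal Python sketch of a
Git-style directory hash with a switch for empty directories.  It is not
Disarchive's or SWH's actual code: the names (git_object_id, has_files,
tree_id, keep_empty_dirs) and the sample tree are made up for
illustration, every file is assumed to be a regular non-executable file,
and I am assuming that SWH records an empty directory as an ordinary
entry pointing at the empty tree.

```python
import hashlib


def git_object_id(kind, payload):
    """Hash a payload the way Git does: SHA-1 over '<kind> <size>\\0<payload>'."""
    header = f"{kind} {len(payload)}".encode() + b"\0"
    return hashlib.sha1(header + payload).digest()


def has_files(node):
    """True if the directory holds at least one file, at any depth."""
    return any(not isinstance(child, dict) or has_files(child)
               for child in node.values())


def tree_id(tree, keep_empty_dirs):
    """Compute a directory identifier for a nested dict.

    Dict values are either bytes (file contents) or dicts (subdirectories).
    keep_empty_dirs=False mimics Git, which never records a directory
    without files.  keep_empty_dirs=True keeps such directories as entries
    (assumption: this is what SWH does).
    """
    entries = []
    for name, node in tree.items():
        raw = name.encode()
        if isinstance(node, dict):
            if not keep_empty_dirs and not has_files(node):
                continue  # Git-style: the directory leaves no trace
            # Git sorts directories as if their name ended with '/'.
            entries.append((raw + b"/", b"40000", raw,
                            tree_id(node, keep_empty_dirs)))
        else:
            entries.append((raw, b"100644", raw, git_object_id("blob", node)))
    entries.sort(key=lambda entry: entry[0])
    payload = b"".join(mode + b" " + raw + b"\0" + oid
                       for _, mode, raw, oid in entries)
    return git_object_id("tree", payload)


# Hypothetical source tree with one empty directory.
example = {
    "README": b"hello\n",
    "src": {"main.c": b"int main(void) { return 0; }\n"},
    "empty": {},
}

git_style = tree_id(example, keep_empty_dirs=False).hex()
swh_style = tree_id(example, keep_empty_dirs=True).hex()
print("ignoring empty directories:", git_style)
print("keeping empty directories: ", swh_style)
```

The two identifiers differ only because of the extra "empty" entry.
Remove that directory from the sample and both modes give the same hash,
which is why only some packages are affected.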

Fixing this in Disarchive is going to make a *huge* difference, so that
is now high priority for me (it’s a one-line change, but I want to fix
it, release it, update Guix, and recompute the report).

> And answering your question [3] about “sources.json”, I think the
> ingestion started after this commit
> 35bb77108fc7f2339da0b5be139043a5f3f21493 from guix-artwork.  In other
> words, SWH started to ingest from “sources.json” after July 2020;
> probably around September 2020.
>
> 3: <https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00141.html>

Thanks!  While investigating the above problem, I found a page that
lists what SWH is getting from us [1] and another showing when they are
scanning “sources.json” [2].  I don’t know if you’ve seen them before,
but they will be invaluable for figuring this stuff out.

[1] https://archive.softwareheritage.org/browse/origin/branches/?origin_url=https://guix.gnu.org/sources.json
[2] https://archive.softwareheritage.org/browse/origin/visits/?origin_url=https://guix.gnu.org/sources.json

> For the Missing and Unknown fields, could you distinguish the kind of
> origin?  Is it mainly git-fetch or url-fetch or others?

Good idea.  I think I can do this easily enough.  I might shelve it for
a bit, because I’m too excited to update the report with the Disarchive
hash fix.  :)


-- Tim

