unofficial mirror of guix-devel@gnu.org 
* Preservation of Guix Report
From: Timothy Sample @ 2021-10-20 19:48 UTC
  To: guix-devel

Hi everyone!

Early this summer I did a bunch of work trying to figure out which Guix
sources are preserved by the Software Heritage (SWH) archive.  I’m
finally ready to share some preliminary results!

    https://ngyro.com/pog-reports/2021-10-20/

This report is already quite outdated, though.  It only covers commits
up to the end of May, and the sources were checked against the SWH
archive sometime in June.  I’m sharing it now to avoid any further
delays.

What’s cool is that the report is automated.  Next on my list is to
update the database and generate a new report.  Then, we can compare the
results and see if we are improving.  (My read on the results so far is
that improving “sources.json” will yield big improvements, but we might
not be able to get to that before the next report.)

The report itself only provides a very high level overview.  If you want
to check on specifics, you will have to download the database.  There’s
a link at the bottom of the report as well as a link to a detailed
schema definition.  Anyone interested in making some sense of the 5,043
known missing sources is encouraged to look there.  However, I can say
from my own investigation that a lot of them are kinda boring.  For
instance, 3,435 are from crates.io, CRAN, Hackage, Bioconductor, and
CPAN:

    select count(*)
    from fods
        join fod_references using (fod_id)
    where not is_in_swh
        and (reference like '%crates.io%' or
             reference like '%/cran/%' or
             reference like '%hackage%' or
             reference like '%/bioconductor.%' or
             reference like '%/cpan/%');
    => 3435
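
To see how that total splits across the individual repositories, the
same query can be run once per pattern.  The following is a minimal
sketch in Python, assuming the downloadable database is an SQLite file
and that only the tables and columns used in the query above exist
(“fods”, “fod_references”, “fod_id”, “is_in_swh”, “reference”).  The
helper names and the toy rows are hypothetical and are not taken from
the real database.

```python
import sqlite3

# LIKE patterns copied from the query above; the labels are display names.
PATTERNS = {
    "crates.io": "%crates.io%",
    "CRAN": "%/cran/%",
    "Hackage": "%hackage%",
    "Bioconductor": "%/bioconductor.%",
    "CPAN": "%/cpan/%",
}


def missing_by_repository(conn):
    """Count rows not in SWH whose reference matches each pattern."""
    query = """
        select count(*)
        from fods
            join fod_references using (fod_id)
        where not is_in_swh
            and reference like ?
    """
    return {label: conn.execute(query, (pattern,)).fetchone()[0]
            for label, pattern in PATTERNS.items()}


def build_toy_database():
    """Create an in-memory stand-in with an assumed, minimal schema."""
    conn = sqlite3.connect(":memory:")
    conn.execute("create table fods (fod_id integer primary key,"
                 " is_in_swh integer)")
    conn.execute("create table fod_references (fod_id integer,"
                 " reference text)")
    conn.executemany("insert into fods values (?, ?)",
                     [(1, 0), (2, 0), (3, 1), (4, 0)])
    # Made-up references, only shaped to match or miss the patterns.
    conn.executemany("insert into fod_references values (?, ?)", [
        (1, "https://crates.io/example/quote-0.6.tar.gz"),
        (2, "mirror://cpan/example/Foo-1.0.tar.gz"),
        (3, "https://crates.io/example/other-1.0.tar.gz"),
        (4, "(git-reference example)"),
    ])
    return conn


if __name__ == "__main__":
    for label, count in missing_by_repository(build_toy_database()).items():
        print(label, count)
```

To run it against the real data, replace “build_toy_database()” with
“sqlite3.connect(path)”, where “path” points at the downloaded file.
Like the original query, this counts joined rows, so a reference that
matches two patterns is counted under both.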

It’s surprising to me that SWH is not already getting these from
“sources.json”.  I picked an arbitrary one, “rust-quote-0.6”, and it’s
simply not in “sources.json”.  On the other hand, I bet SWH would like a
crates.io (and CRAN, etc.) loader, too.

Another approach might be to check Git sources:

    select count(*)
    from fods
        join fod_references using (fod_id)
    where not is_in_swh
        and reference like '(git-reference%';
    => 336

There are fewer, but they might be more interesting.  Just be sure to
check that they haven’t made it into the SWH archive since June.  For
instance, I just checked “asciidoc@9.1.0” and learned that the database
has “NOT is_in_swh”, but it is now in the SWH archive.  So, caveat
emptor, I guess.  Maybe it would be wise to wait for a more recent
report before diving in.
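
One way to re-check a single Git source is to ask the SWH web API
whether it knows the commit.  The sketch below assumes that the
revision endpoint lives at “/api/1/revision/<id>/” on
archive.softwareheritage.org and answers 404 for unknown revisions, and
that the commit is given as a full 40-digit SHA-1.  A reference that
names a tag instead of a commit cannot be checked this way.  The
function names are hypothetical.

```python
import re
import urllib.error
import urllib.request

# Assumed base URL of the SWH web API.
SWH_API = "https://archive.softwareheritage.org/api/1"


def swh_revision_url(commit):
    """Build the revision URL for a commit given as a 40-digit hex ID."""
    if not re.fullmatch(r"[0-9a-f]{40}", commit):
        raise ValueError("expected a full SHA-1 commit ID, got %r" % commit)
    return "%s/revision/%s/" % (SWH_API, commit)


def is_archived(commit, timeout=30):
    """Return True if SWH answers 200 for the commit, False on 404."""
    url = swh_revision_url(commit)
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except urllib.error.HTTPError as error:
        if error.code == 404:
            return False
        # Anything else (rate limiting, server trouble) is left to the caller.
        raise
```

Calling “is_archived” needs network access and may be rate limited, so
it is probably best used on a handful of commits rather than on all 336
rows at once.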

One other way to help would be to suggest improvements to the report.  I
don’t want to fiddle with it too much, but if there is some simple graph
or table or list that should be there, I’m happy to give it a go.


-- Tim

