unofficial mirror of guix-devel@gnu.org 
* Preservation of Guix report 2022-01-16
@ 2022-01-16 19:51 Timothy Sample
  2022-01-18 15:28 ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Timothy Sample @ 2022-01-16 19:51 UTC (permalink / raw)
  To: guix-devel

Hi all,

I’ve published a new preservation of Guix report:

    https://ngyro.com/pog-reports/latest/

Actually, the URL is <https://ngyro.com/pog-reports/2022-01-16/>, but I
thought having a way to reference the latest report would be helpful.

There’s no big news in the report.  I’ve tracked down a handful of ‘git’
and ‘text’ sources.  There are a few new ‘git’ sources missing from SWH.
Two of them are Mumi and guile-netlink, which both failed when I tried
to get SWH to visit them.

Mostly things look pretty good.  For commit 195bb1f from a week ago, we
have 85.8% coverage.  There are about 300 sources missing from SWH.
I’ve looked over the list, but there are no big obvious problems.  There
are a handful of Ruby “.gem” files, which I guess SWH skips when
visiting our “sources.json” (it probably only takes archives).  There
are some things that aren’t in “sources.json” (e.g., parts of IcedTea
like Shenandoah).  There are also some things that should be fine, but
aren’t in SWH anyway, like Guile Plotutils.  It’s a pretty normal
looking tarball listed in “sources.json”, but SWH doesn’t have it
(despite visiting “sources.json” yesterday).

A really important thing to do at this point is to verify that some
reasonable looking computation is covered by what we are doing already.
For instance, is every source used to build Guile (or Python or R)
preserved?  This will ensure that key sources are not missing, which is
a real possibility given that everything so far has been purely a
numbers game!


-- Tim



* Re: Preservation of Guix report 2022-01-16
  2022-01-16 19:51 Preservation of Guix report 2022-01-16 Timothy Sample
@ 2022-01-18 15:28 ` Ludovic Courtès
  2022-01-18 18:16   ` Vagrant Cascadian
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2022-01-18 15:28 UTC (permalink / raw)
  To: Timothy Sample; +Cc: guix-devel

Hi,

Timothy Sample <samplet@ngyro.com> skribis:

> I’ve published a new preservation of Guix report:
>
>     https://ngyro.com/pog-reports/latest/
>
> Actually, the URL is <https://ngyro.com/pog-reports/2022-01-16/>, but I
> thought having a way to reference the latest report would be helpful.

Nice!

[...]

> A really important thing to do at this point is to verify that some
> reasonable looking computation is covered by what we are doing already.
> For instance, is every source used to build Guile (or Python or R)
> preserved?  This will ensure that key sources are not missing, which is
> a real possibility given that everything so far has been purely a
> numbers game!

I wonder if we could have something similar to ‘guix weather -c’, which
would highlight missing sources with many dependents.
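
A minimal sketch of that ranking idea, assuming the dependency graph is
available as a plain mapping from each node to its direct inputs.  The
graph, the names, and the `rank_missing_by_dependents` helper are all
hypothetical; nothing here is an existing Guix or PoG interface.

```python
from collections import defaultdict


def rank_missing_by_dependents(deps, missing):
    """Rank missing sources by how many nodes transitively depend on them.

    `deps` maps each node to the nodes it directly depends on.
    `missing` is the set of source nodes absent from the archive.
    """
    # Invert the edges so we can walk from a source up to its dependents.
    reverse = defaultdict(set)
    for node, inputs in deps.items():
        for inp in inputs:
            reverse[inp].add(node)

    def dependents(start):
        seen = set()
        stack = [start]
        while stack:
            current = stack.pop()
            for parent in reverse.get(current, ()):
                if parent not in seen:
                    seen.add(parent)
                    stack.append(parent)
        return seen

    counts = {src: len(dependents(src)) for src in missing}
    # Most depended-upon first; ties broken by name for stable output.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical toy graph, for illustration only.
deps = {
    "guile": ["guile-src", "libgc"],
    "libgc": ["libgc-src"],
    "mumi": ["mumi-src", "guile"],
    "guile-plotutils": ["plotutils-src", "guile"],
}
missing = {"mumi-src", "libgc-src"}
ranking = rank_missing_by_dependents(deps, missing)
for source, count in ranking:
    print(f"{source}: {count} dependents")
```

In this toy graph the missing `libgc-src` ranks above `mumi-src`,
because everything built on Guile depends on it.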

TeX Live is a big concern: it’s all Subversion, and everything depends
on those packages.  IIRC, SWH does not support Subversion yet; and when
it does, we’ll have to adjust our code so it can actually fetch
Subversion checkouts from SWH.  One issue is partial checkouts: all
these ‘texlive-’ packages refer to partial checkouts of the big TeX Live
repo.

Thoughts?

Ludo’.



* Re: Preservation of Guix report 2022-01-16
  2022-01-18 15:28 ` Ludovic Courtès
@ 2022-01-18 18:16   ` Vagrant Cascadian
  2022-01-18 19:38     ` Timothy Sample
  0 siblings, 1 reply; 7+ messages in thread
From: Vagrant Cascadian @ 2022-01-18 18:16 UTC (permalink / raw)
  To: Ludovic Courtès, Timothy Sample; +Cc: guix-devel


On 2022-01-18, Ludovic Courtès wrote:
> Timothy Sample <samplet@ngyro.com> skribis:
>> A really important thing to do at this point is to verify that some
>> reasonable looking computation is covered by what we are doing already.
>> For instance, is every source used to build Guile (or Python or R)
>> preserved?  This will ensure that key sources are not missing, which is
>> a real possibility given that everything so far has been purely a
>> numbers game!
>
> I wonder if we could have something similar to ‘guix weather -c’, which
> would highlight missing sources with many dependents.
>
> TeX Live is a big concern: it’s all Subversion, and everything depends
> on those packages.  IIRC, SWH does not support Subversion yet; and when
> it does, we’ll have to adjust our code so it can actually fetch
> Subversion checkouts from SWH.  One issue is partial checkouts: all
> these ‘texlive-’ packages refer to partial checkouts of the big TeX Live
> repo.

Maintain a git mirror of texlive SVN using git-svn or similar, and have
guix packages use that, and make sure SWH ingests it? Is it difficult
because of the size of TeX Live?

A little ugly, sure, but maybe only include the revisions that guix is
interested in for starters...

It has been some years since I used git-svn, but it worked well for me
during a several-year transition on a small number of projects...


live well,
  vagrant



* Re: Preservation of Guix report 2022-01-16
  2022-01-18 18:16   ` Vagrant Cascadian
@ 2022-01-18 19:38     ` Timothy Sample
  2022-01-19 10:44       ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: Timothy Sample @ 2022-01-18 19:38 UTC (permalink / raw)
  To: Vagrant Cascadian; +Cc: guix-devel, Ludovic Courtès

Hi,

Vagrant Cascadian <vagrant@debian.org> writes:

> On 2022-01-18, Ludovic Courtès wrote:
>> Timothy Sample <samplet@ngyro.com> skribis:
>>> A really important thing to do at this point is to verify that some
>>> reasonable looking computation is covered by what we are doing already.
>>> For instance, is every source used to build Guile (or Python or R)
>>> preserved?  This will ensure that key sources are not missing, which is
>>> a real possibility given that everything so far has been purely a
>>> numbers game!
>>
>> I wonder if we could have something similar to ‘guix weather -c’, which
>> would highlight missing sources with many dependents.

Definitely.  The simplest way to do that is to use the PoG database
as-is, and just write a script that traverses the derivation graph
checking for coverage.  It’s a bit trickier to integrate into Guix
itself, since we would have to make the data available.  It’s a good job
for the Data Service, but I feel like it’s a long road from here to
there.
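
Such a traversal script could be sketched roughly as follows.  This
assumes the derivation graph has already been exported into a plain
dictionary; the real PoG database schema is not described in this
thread, so the data layout, the names, and `uncovered_sources` are
hypothetical.

```python
def uncovered_sources(derivations, preserved, root):
    """Return the sources in the closure of `root` that are not preserved.

    `derivations` maps a derivation name to a dict with two optional keys:
    "inputs" (names of other derivations) and "sources" (source names).
    `preserved` is the set of source names known to be archived.
    """
    seen = set()
    missing = set()
    stack = [root]
    while stack:
        name = stack.pop()
        if name in seen:
            continue
        seen.add(name)
        entry = derivations.get(name, {})
        for source in entry.get("sources", []):
            if source not in preserved:
                missing.add(source)
        stack.extend(entry.get("inputs", []))
    return sorted(missing)


# Hypothetical toy data, for illustration only.
derivations = {
    "guile": {"inputs": ["libgc", "libffi"], "sources": ["guile.tar"]},
    "libgc": {"inputs": [], "sources": ["libgc.tar"]},
    "libffi": {"inputs": [], "sources": ["libffi.tar"]},
    "texlive": {"inputs": [], "sources": ["texlive-svn"]},
}
preserved = {"guile.tar", "libffi.tar"}
result = uncovered_sources(derivations, preserved, "guile")
print(result)
```

Only sources reachable from the chosen root are reported, so the
unpreserved `texlive-svn` entry does not show up when checking Guile.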

>> TeX Live is a big concern: it’s all Subversion, and everything depends
>> on those packages.  IIRC, SWH does not support Subversion yet; and when
>> it does, we’ll have to adjust our code so it can actually fetch
>> Subversion checkouts from SWH.  One issue is partial checkouts: all
>> these ‘texlive-’ packages refer to partial checkouts of the big TeX Live
>> repo.
>
> Maintain a git mirror of texlive SVN using git-svn or similar, and have
> guix packages use that, and make sure SWH ingests it? Is it difficult
> because of the size of TeX Live?
>
> A little ugly, sure, but maybe only include the revisions that guix is
> interested in for starters...

Fortunately, SWH does support Subversion, so we can avoid this.  They
haven’t visited the TeX Live sources yet, but I’m sure they will
eventually.

How to actually match up what we want with what they have is a big
question!  I imagine they have to do quite a bit of work to make the SVN
repos fit their Git-inspired data model.  It’s not clear to me how it
works.  I’ll have to look over the loader sometime:

https://forge.softwareheritage.org/source/swh-loader-svn/repository/master/

One way to handle things like partial checkouts is to revisit storing
SWHIDs with our origins.  If you have the directory ID, you can just
download the directory from SWH without worrying about SVN at all.  It
would also give us an easy way to handle Bazaar and CVS (which are under
development at SWH).
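
As a rough illustration of the directory-ID idea: given the 40-digit
hexadecimal identifier of a directory, the SWHID is just that identifier
with a `swh:1:dir:` prefix.  The sketch below only builds strings and
performs no network access.  The API path is an assumption based on the
public SWH Web API and should be checked against its documentation; the
helper names are hypothetical.

```python
import re

# Assumed base URL of the SWH Web API; verify before relying on it.
SWH_API = "https://archive.softwareheritage.org/api/1"


def directory_swhid(directory_id):
    """Build a directory SWHID from a 40-character lowercase hex ID."""
    if not re.fullmatch(r"[0-9a-f]{40}", directory_id):
        raise ValueError(f"not a valid directory ID: {directory_id!r}")
    return f"swh:1:dir:{directory_id}"


def directory_listing_url(directory_id):
    """Build the (assumed) API URL that lists a directory's entries."""
    directory_swhid(directory_id)  # Reuse the validation above.
    return f"{SWH_API}/directory/{directory_id}/"


# The well-known Git hash of the empty tree, used purely as an example.
empty_tree = "4b825dc642cb6eb9a060e54bf8d69288fbee4904"
swhid = directory_swhid(empty_tree)
url = directory_listing_url(empty_tree)
print(swhid)
print(url)
```

The point is that the lookup key is independent of the version control
system the directory originally came from.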


-- Tim



* Re: Preservation of Guix report 2022-01-16
  2022-01-18 19:38     ` Timothy Sample
@ 2022-01-19 10:44       ` Ludovic Courtès
  2022-01-20  9:35         ` zimoun
  0 siblings, 1 reply; 7+ messages in thread
From: Ludovic Courtès @ 2022-01-19 10:44 UTC (permalink / raw)
  To: Timothy Sample; +Cc: Vagrant Cascadian, guix-devel

Hi,

Timothy Sample <samplet@ngyro.com> skribis:

> One way to handle things like partial checkouts is to revisit storing
> SWHIDs with our origins.  If you have the directory ID, you can just
> download the directory from SWH without worrying about SVN at all.  It
> would also give us an easy way to handle Bazaar and CVS (which are under
> development at SWH).

That sounds by far the easiest and most robust solution.

The downside is that we’d be storing both the nar hash and the SWHID.
We could arrange for the daemon to check both hashes when both are
provided.
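
A minimal sketch of the “check every hash that is provided” rule.  This
is not how ‘guix-daemon’ works today.  As a loudly stated simplification,
plain SHA-256 and SHA-1 digests of the raw bytes stand in for the real
nar hash and SWHID, which are computed over different serializations.
`verify_all` is a hypothetical helper.

```python
import hashlib


def verify_all(data, expected):
    """Return True only if every expected digest matches `data`.

    `expected` maps a hashlib algorithm name to a hex digest.  The
    digests here are stand-ins for the nar hash and the SWHID.
    """
    if not expected:
        raise ValueError("at least one hash is required")
    for algorithm, digest in expected.items():
        actual = hashlib.new(algorithm, data).hexdigest()
        if actual != digest:
            return False
    return True


payload = b"hello"
expected = {
    "sha256": hashlib.sha256(payload).hexdigest(),
    "sha1": hashlib.sha1(payload).hexdigest(),
}
ok = verify_all(payload, expected)
tampered = verify_all(b"hellO", expected)
print(ok, tampered)
```

Requiring all provided hashes to match means that adding a second hash
can only make verification stricter, never weaker.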

The more we make progress, the more it seems we won’t be able to avoid
storing multiple hashes.

Thanks,
Ludo’.



* Re: Preservation of Guix report 2022-01-16
  2022-01-19 10:44       ` Ludovic Courtès
@ 2022-01-20  9:35         ` zimoun
  2022-01-24 15:17           ` Ludovic Courtès
  0 siblings, 1 reply; 7+ messages in thread
From: zimoun @ 2022-01-20  9:35 UTC (permalink / raw)
  To: Ludovic Courtès, Timothy Sample; +Cc: Vagrant Cascadian, guix-devel

Hi,

> The more we make progress, the more it seems we won’t be able to avoid
> storing multiple hashes.

Yes, it appears unavoidable to me.  The question is where to store such
hashes.  At the origin level?  At the package level via ’properties’?
In an external service?  In the Disarchive database?  Elsewhere?

From my point of view, the bridge between all the hashes should be
provided by SWH itself.  They promote their ’swhid’, which is far less
common than Git hashes, for instance.  It would make sense, at least to
me, for them to provide various maps using different keys.  They already
more or less provide the map from Git+SHA-1 to swhid; they could also
provide NAR+SHA-256 to swhid, and maybe checksums of other serializers
to swhid.  The world existed before swhid. ;-)
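
For file contents, the Git-to-swhid map is direct: a content SWHID
carries the same SHA-1 that Git computes for a blob, that is, the hash
of a `blob <size>` header, a NUL byte, and the data.  A small sketch
(the function names are illustrative only):

```python
import hashlib


def git_blob_sha1(data):
    """Compute the SHA-1 that Git assigns to a blob holding `data`."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def content_swhid(data):
    """Build the content SWHID, which reuses the Git blob hash."""
    return "swh:1:cnt:" + git_blob_sha1(data)


empty = content_swhid(b"")
print(empty)
```

No comparably cheap relation exists between a NAR+SHA-256 hash and a
swhid, since the two are computed over different serializations; that is
why an explicit map would be needed.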

Metadata is a separate question.

Perhaps all of this is already a work in progress. :-)

So maybe the way forward is for Guix to rely more (or exclusively) on
Disarchive-DB when it uses SWH as a fallback.

Cheers,
simon



* Re: Preservation of Guix report 2022-01-16
  2022-01-20  9:35         ` zimoun
@ 2022-01-24 15:17           ` Ludovic Courtès
  0 siblings, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2022-01-24 15:17 UTC (permalink / raw)
  To: zimoun; +Cc: Vagrant Cascadian, guix-devel

Hi,

zimoun <zimon.toutoune@gmail.com> skribis:

>> The more we make progress, the more it seems we won’t be able to avoid
>> storing multiple hashes.
>
> Yes, it appears to me unavoidable.  The question is where to store such
> hashes?  At the origin level?  At the package level via ’properties’?
> Using an external service?  As Disarchive database?  Other?

I was thinking <origin> could accept several hashes.

> From my point of view, the bridge between all the hashes should be done
> by SWH itself.  They promote their ’swhid’ which is far less common than
> Git hashes, for instance.  It would make sense, at least to me, that
> they would provide various maps using different keys.  They already
> somehow provide the map Git+Sha1 to swhid, they could also provide
> NAR+Sha256 to swhid and maybe other serializers checksum to swhid.  The
> world existed before swhid. ;-)

In principle, sure, it would be nice if SWH could map from one hash
flavor to another.

In practice, I can understand why they wouldn’t want to compute
nar/sha256 or some other underground flavor for all the archived source.

I think we have to do something on our side.  We could “upgrade”
<origin>, ‘guix-daemon’, and ‘guix publish’ so they can usefully handle
multiple hashes.  That’d be a long-term effort.

Now, an external “hash mapping service” has its appeal: it could be put
to work right away.  Tricky!
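
To make the idea concrete, here is a toy in-memory version of such a
mapping, keyed by serializer, algorithm, and digest.  No such service is
specified in this thread; the class, its methods, and the placeholder
digests are all hypothetical.

```python
class HashMappingService:
    """Toy lookup table from (serializer, algorithm, digest) to a SWHID."""

    def __init__(self):
        self._table = {}

    def register(self, serializer, algorithm, digest, swhid):
        key = (serializer, algorithm, digest)
        existing = self._table.get(key)
        # Refuse conflicting entries rather than silently overwriting.
        if existing is not None and existing != swhid:
            raise ValueError(f"conflicting SWHID for {key}")
        self._table[key] = swhid

    def lookup(self, serializer, algorithm, digest):
        """Return the SWHID for the key, or None if it is unknown."""
        return self._table.get((serializer, algorithm, digest))


service = HashMappingService()
# Placeholder values, not real digests.
service.register("nar", "sha256", "example-digest", "swh:1:dir:example")
found = service.lookup("nar", "sha256", "example-digest")
unknown = service.lookup("git", "sha1", "example-digest")
print(found, unknown)
```

Keeping the serializer in the key matters because the same algorithm
applied to a nar and to a Git tree yields unrelated digests.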

Ludo’.




Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git
