unofficial mirror of guix-devel@gnu.org 
* Re: [swh-devel] Call for public review - SWH Nix/GNU Guix stack
       [not found] <CAKFPOSwdnSgjtOSW2CbaNyDEivGqan-gJaE_GVrZ7tbj83zRhg@mail.gmail.com>
@ 2024-01-11 12:32 ` Ludovic Courtès
  2024-01-12 18:42 ` Simon TOURNIER
  1 sibling, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2024-01-11 12:32 UTC (permalink / raw)
  To: Benoit Chauvet; +Cc: swh-devel, community, guix-devel, Simon Tournier

Hi Benoit and all!

(Cc: guix-devel rather than gnu-system-discuss.)

Benoit Chauvet <benoit.chauvet@inria.fr> writes:

> Regarding the Nix/GNU Guix stack, Software Heritage will soon be ready to
> support the ingestion of specific versioned files, tarballs, git, hg, svn
> source code listed in their respective manifests [1] (as origins). The new
> lister (and extra loaders, namely
> {Content|Directory|GitCheckout|SvnExport|HgCheckout}Loader) have been
> deployed in our staging infrastructure [2].

Excellent!  I believe this addresses a problem we recently reported
regarding tarballs published with our own content-addressed URLs, which
look like:

  https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas

My understanding is that so far these URLs were ignored by the
lister/loader because they didn’t end in *.tar.*.⁰
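
To illustrate the suspected failure mode, here is a minimal sketch. It assumes the lister applied a suffix test to the URL path; the function names and the suffix list are hypothetical and are not taken from swh-lister. In a content-addressed URL the file name sits in the middle of the path, so a suffix test on the whole path misses it, while a test on each path segment finds it.

```python
from urllib.parse import urlparse

# Hypothetical suffix list, for illustration only.
TARBALL_SUFFIXES = (".tar.gz", ".tar.bz2", ".tar.xz", ".tgz", ".tar")


def looks_like_tarball_naive(url: str) -> bool:
    """Suffix test on the whole URL path (the assumed old behaviour)."""
    return urlparse(url).path.endswith(TARBALL_SUFFIXES)


def looks_like_tarball_any_segment(url: str) -> bool:
    """Suffix test on every path segment, so a file name followed by
    '/sha256/<hash>' is still recognised."""
    segments = urlparse(url).path.split("/")
    return any(segment.endswith(TARBALL_SUFFIXES) for segment in segments)


URL = (
    "https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz"
    "/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas"
)

if __name__ == "__main__":
    print(looks_like_tarball_naive(URL))        # whole-path suffix test
    print(looks_like_tarball_any_segment(URL))  # per-segment test
```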

> The initial NixGuix loader (currently in production) lists and loads
> origins from a manifest, ignoring the specific origins mentioned above. The
> new stack will be able to ingest those origins. It will also optionally
> associate, if present, a NAR hash (an intrinsic identifier specific to Nix
> and Guix) to what’s called an ExtID (SWH side).
> The reading side of the SWH API for ExtIDs, though, is still work to be done.
>
> On staging, we have currently ingested origins that were listed from the
> GNU Guix manifest [3].
>
> We have already improved the implementations after discussing multiple
> limitations encountered along the way with the Guix community [4].

I’m sure Simon Tournier (Cc’d) already discussed with others at SWH how
crucial it is for us to be able to query content by nar hash.

Essentially, it would fill the gap that currently prevents us from
retrieving Subversion checkouts from SWH¹ and more generally complicates
retrieval of anything not referenced by a Git hash.  So obviously, we’re
looking forward to that ExtID interface for SWH.
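
For readers less familiar with the term: a nar hash is a hash over the NAR (Nix ARchive) serialization of a file tree, not over a tarball or a Git object, which is why a Git hash cannot stand in for it. Below is a minimal sketch of that serialization for the simplest case, a single regular file. Directories and symlinks are omitted, the function names are hypothetical, and the digest is printed in hexadecimal, whereas Guix usually displays hashes in its own base32 alphabet.

```python
import hashlib
import struct


def nar_string(data: bytes) -> bytes:
    """Encode one NAR string: 64-bit little-endian length, the bytes,
    then zero padding up to a multiple of 8 bytes."""
    padding = (8 - len(data) % 8) % 8
    return struct.pack("<Q", len(data)) + data + b"\0" * padding


def nar_of_regular_file(contents: bytes, executable: bool = False) -> bytes:
    """Serialize a single regular file as a NAR (no directories, no symlinks)."""
    parts = [b"nix-archive-1", b"(", b"type", b"regular"]
    if executable:
        parts += [b"executable", b""]
    parts += [b"contents", contents, b")"]
    return b"".join(nar_string(part) for part in parts)


def nar_sha256_hex(contents: bytes) -> str:
    """SHA-256 of the NAR serialization, as a hex string."""
    return hashlib.sha256(nar_of_regular_file(contents)).hexdigest()


if __name__ == "__main__":
    print(nar_sha256_hex(b"hello"))
```

The point of the sketch is that the nar hash of a file differs from the plain hash of its bytes, so an archive indexed only by content or Git hashes cannot answer a lookup by nar hash without an extra mapping.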

Thanks for sharing this status update; the news and the prospects are
all exciting!

Ludo’.

⁰ https://issues.guix.gnu.org/39885#15-lineno60
¹ https://issues.guix.gnu.org/43442#13-lineno37



* RE: [swh-devel] Call for public review - SWH Nix/GNU Guix stack
       [not found] <CAKFPOSwdnSgjtOSW2CbaNyDEivGqan-gJaE_GVrZ7tbj83zRhg@mail.gmail.com>
  2024-01-11 12:32 ` [swh-devel] Call for public review - SWH Nix/GNU Guix stack Ludovic Courtès
@ 2024-01-12 18:42 ` Simon TOURNIER
  2024-01-15  9:04   ` Antoine R. Dumont (@ardumont)
  1 sibling, 1 reply; 6+ messages in thread
From: Simon TOURNIER @ 2024-01-12 18:42 UTC (permalink / raw)
  To: swh-devel@inria.fr, community@nixos.org, guix-devel@gnu.org
  Cc: ludovic.courtes

Hi,

> The initial NixGuix loader (currently in production) lists and loads
> origins from a manifest, ignoring the specific origins mentioned above. The
> new stack will be able to ingest those origins. It will also optionally
> associate, if present, a NAR hash (an intrinsic identifier specific to Nix
> and Guix) to what’s called an ExtID (SWH side).

Cool!  Thank you.

> The reading side of the SWH API for ExtIDs, though, is still work to be done.

In short, Guix currently relies on the SWH API to resolve
“something” to a SWHID, where “something” can be:

 + Git label tag + url
 + Git commit hash
 + plain url

Well, the situation is in good shape IMHO.  I do not have recent
numbers, but say all is fine for 75% of all Guix packages and for 90% of
Guix packages coming from Git repositories.  Still, we have
examples where “Git label tag + url” fails.  For one instance, see [1],
pointed to by [2].

The information (the history of history) is there in SWH, but the Guix
side would have to parse the snapshot information and extract what it
can, trying several SWH snapshots until one matches.  Something
like that.  Chance of success until completion?  Weak. :-)
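
The snapshot scan described above could look roughly like the sketch below. It assumes each snapshot is a dictionary shaped like an SWH snapshot API response (a `branches` mapping whose entries carry `target` and `target_type`) and that snapshots are ordered newest first. The function name and the sample identifiers are made up for illustration.

```python
from typing import Iterable, Optional

# Map SWH target types to the object-type field of a SWHID.
SWHID_KINDS = {"revision": "rev", "release": "rel"}


def find_tag_target(snapshots: Iterable[dict], tag: str) -> Optional[str]:
    """Return a SWHID for the first snapshot that still knows `tag`."""
    wanted = f"refs/tags/{tag}"
    for snapshot in snapshots:  # assumed to be ordered newest first
        branch = snapshot.get("branches", {}).get(wanted)
        if branch is None:
            continue  # tag absent (deleted or rewritten upstream): try older
        kind = SWHID_KINDS.get(branch["target_type"])
        if kind is not None:
            return f"swh:1:{kind}:{branch['target']}"
    return None


# Made-up data: the newest snapshot lost the tag, an older one has it.
SNAPSHOTS = [
    {"id": "1" * 40, "branches": {"refs/heads/main": {
        "target": "c" * 40, "target_type": "revision"}}},
    {"id": "2" * 40, "branches": {"refs/tags/v1.0": {
        "target": "a" * 40, "target_type": "release"}}},
]

if __name__ == "__main__":
    print(find_tag_target(SNAPSHOTS, "v1.0"))
```

Even when this succeeds, the result is only as good as the tag name: it says nothing about whether the archived tree matches the hash recorded in the Guix package, which is the weakness the paragraph above points at.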

Moreover, what about the missing 25%?  They are Guix packages coming
from Mercurial repositories, from Subversion repositories, or from
other sources.

Back in October 2020, we had a discussion [3] about sending a save request
for packages using SVN checkouts, but at the time we did not have a clear
path for retrieving them.  Then in March 2023, this discussion [4] suggested
a possible path for retrieval… but many hacks are still required [5].

Again, the information is there in SWH, but the Guix side would have
to parse the snapshot information and extract what it can,
trying several SWH snapshots until one matches.  Something like that.
Chance of success until completion?  Weak. :-)

If a single source is missing, the whole castle potentially falls down.
A dictionary mapping ExtIDs (nar hashes) to SWHIDs would help make the
castle more robust. :-)

The SWH archive coverage of Guix packages would then no longer be stuck at
75% merely because we, on the Guix side, are unable to identify or retrieve
the missing 25%.  Such a dictionary could reinforce the bridge between
reproducible computational environments and archiving, IMHO.
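
A minimal sketch of such a dictionary, purely illustrative: the key layout, the `nar-sha256` type label, the sample hash and the SWHID below are placeholders, not the actual ExtID interface, whose reading side is still to be done at the time of this thread.

```python
from typing import Callable, Dict, Optional, Tuple

# (extid_type, extid) -> SWHID.  Every entry here is a made-up placeholder.
ExtIDIndex = Dict[Tuple[str, str], str]


def resolve(
    index: ExtIDIndex,
    nar_hash: str,
    fallback: Optional[Callable[[], Optional[str]]] = None,
) -> Optional[str]:
    """Look a source up by nar hash first; only if that fails, fall back to
    another resolver (e.g. by Git commit, or by tag + URL)."""
    swhid = index.get(("nar-sha256", nar_hash))
    if swhid is not None:
        return swhid
    return fallback() if fallback is not None else None


INDEX: ExtIDIndex = {
    ("nar-sha256", "0" * 64): "swh:1:dir:" + "b" * 40,
}

if __name__ == "__main__":
    print(resolve(INDEX, "0" * 64))
    print(resolve(INDEX, "f" * 64))
```

The design point is that the lookup key is the hash Guix already records for every source, whatever the version control system, so Subversion and Mercurial checkouts resolve the same way as Git ones.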

So yeah, we are looking forward to some ExtID interface.  :-)

Cheers,
simon


1: https://issues.guix.gnu.org/66015#0-lineno53
2: https://gitlab.softwareheritage.org/swh/devel/swh-loader-git/-/issues/4751#note_148587
3: https://issues.guix.gnu.org/43442#9
4: https://sympa.inria.fr/sympa/arc/swh-devel/2023-03/msg00009.html
5: https://issues.guix.gnu.org/43442#13




* RE: [swh-devel] Call for public review - SWH Nix/GNU Guix stack
  2024-01-12 18:42 ` Simon TOURNIER
@ 2024-01-15  9:04   ` Antoine R. Dumont (@ardumont)
  2024-01-15 11:22     ` Ludovic Courtès
  2024-01-16 20:39     ` Timothy Sample
  0 siblings, 2 replies; 6+ messages in thread
From: Antoine R. Dumont (@ardumont) @ 2024-01-15  9:04 UTC (permalink / raw)
  To: Simon TOURNIER, swh-devel@inria.fr, community@nixos.org,
	guix-devel@gnu.org
  Cc: ludovic.courtes



>> My understanding is that so far these URLs were ignored by the
>> lister/loader because they didn’t end in *.tar.*.⁰

FWIW, the "new" lister implementation [1] performs a bunch of extra
computations to try to resolve those situations. It now tries
to fetch more information from the upstream server (e.g. for crates URLs,
which end in /download, ...). It's probably not exhaustive though.

[1] https://gitlab.softwareheritage.org/swh/devel/swh-lister/-/blob/master/swh/lister/nixguix/lister.py?ref_type=heads

>> I’m sure Simon Tournier (Cc’d) already discussed with others at SWH
>> how crucial it is for us to be able to query content by nar hash.

> So yeah, we are looking forward to some ExtID interface.  :-)

Yes, and there is an ongoing merge request about the new interface [2]

[2] https://gitlab.softwareheritage.org/swh/devel/swh-web/-/merge_requests/1220

Cheers,
tony / Antoine R. Dumont (@ardumont)

-----------------------------------------------------------------
gpg fingerprint BF00 203D 741A C9D5 46A8 BE07 52E2 E984 0D10 C3B8


* Re: [swh-devel] Call for public review - SWH Nix/GNU Guix stack
  2024-01-15  9:04   ` Antoine R. Dumont (@ardumont)
@ 2024-01-15 11:22     ` Ludovic Courtès
  2024-01-16 20:39     ` Timothy Sample
  1 sibling, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2024-01-15 11:22 UTC (permalink / raw)
  To: Antoine R. Dumont (@ardumont)
  Cc: Simon TOURNIER, swh-devel@inria.fr, guix-devel@gnu.org

Hey Antoine,

"Antoine R. Dumont (@ardumont)" <ardumont@softwareheritage.org> writes:

>>> My understanding is that so far these URLs were ignored by the
>>> lister/loader because they didn’t end in *.tar.*.⁰
>
> FWIW, the "new" lister implementation [1] performs a bunch of extra
> computations to try to resolve those situations. It now tries
> to fetch more information from the upstream server (e.g. for crates URLs,
> which end in /download, ...). It's probably not exhaustive though.
>
> [1] https://gitlab.softwareheritage.org/swh/devel/swh-lister/-/blob/master/swh/lister/nixguix/lister.py?ref_type=heads
>
>>> I’m sure Simon Tournier (Cc’d) already discussed with others at SWH
>>> how crucial it is for us to be able to query content by nar hash.
>
>> So yeah, we are looking forward to some ExtID interface.  :-)
>
> Yes, and there is an ongoing merge request about the new interface [2]
>
> [2] https://gitlab.softwareheritage.org/swh/devel/swh-web/-/merge_requests/1220

This is all excellent news, thank you!

Ludo’.



* Re: [swh-devel] Call for public review - SWH Nix/GNU Guix stack
  2024-01-15  9:04   ` Antoine R. Dumont (@ardumont)
  2024-01-15 11:22     ` Ludovic Courtès
@ 2024-01-16 20:39     ` Timothy Sample
  2024-01-19 16:44       ` Antoine R. Dumont (@ardumont)
  1 sibling, 1 reply; 6+ messages in thread
From: Timothy Sample @ 2024-01-16 20:39 UTC (permalink / raw)
  To: Antoine R. Dumont (@ardumont)
  Cc: Simon TOURNIER, swh-devel@inria.fr, community@nixos.org,
	guix-devel@gnu.org, ludovic.courtes

Hello,

This is very exciting work, thanks everyone!

"Antoine R. Dumont (@ardumont)" <ardumont@softwareheritage.org> writes:

> FWIW, the "new" lister implementation [1] performs a bunch of extra
> computations to try to resolve those situations. It now tries
> to fetch more information from the upstream server (e.g. for crates URLs,
> which end in /download, ...). It's probably not exhaustive though.
>
> [1] https://gitlab.softwareheritage.org/swh/devel/swh-lister/-/blob/master/swh/lister/nixguix/lister.py?ref_type=heads

I was just looking over some of the new results and noticed that crates
are being treated as ‘content’ rather than ‘tarball-directory’.  E.g.:

https://webapp.staging.swh.network/browse/content/sha1_git:e05b33b2d3b40254ceaaa5fe4c501d1b15c75ea6/?origin_url=https://crates.io/api/v1/crates/diff/0.1.12/download

Is that because the changes you describe were done after the staging
data was loaded or is it a bug?


-- Tim



* Re: [swh-devel] Call for public review - SWH Nix/GNU Guix stack
  2024-01-16 20:39     ` Timothy Sample
@ 2024-01-19 16:44       ` Antoine R. Dumont (@ardumont)
  0 siblings, 0 replies; 6+ messages in thread
From: Antoine R. Dumont (@ardumont) @ 2024-01-19 16:44 UTC (permalink / raw)
  To: Timothy Sample
  Cc: Simon TOURNIER, swh-devel@inria.fr, guix-devel@gnu.org,
	ludovic.courtes, julien


Hello,

> Is that because the changes you describe were done after the staging
> data was loaded or is it a bug?

Our staging instance inherits its append-only property from our main
archive. For staging (used for "prototypes", soon-to-be-deployed new
features and so on), that makes it hard to see through the "old bug" noise.
These are old origins that were initially ingested with a first version of
the lister (which has since been fixed iteratively).

----

@anlambert made a pass this week in Docker (from scratch) to check, thx ;)

> Excellent!  I believe this addresses a problem we recently reported
> regarding tarballs published with our own content-addressed URLs, which
> look like:
>
>   https://bordeaux.guix.gnu.org/file/BiocNeighbors_1.20.0.tar.gz/sha256/0a5wg099fgwjbzd6r3mr4l02rcmjqlkdcz1w97qzwx1mir41fmas

As a result, he enhanced the lister so that the URL quoted above is
now classified correctly from the data in the URL itself. (@me: that
needs a bump in deployment [for next week].)

Earlier, I was referring to another heuristic that uses a HEAD query to
parse header information [if any]. As that specific URL does not
provide any, it passed through.
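
A rough sketch of what a header-based heuristic of this kind might look like. It is an assumption that the relevant header is Content-Disposition; the function names, the suffix list and the sample header value are hypothetical and are not taken from swh-lister. The sketch takes already-fetched headers as a plain mapping, so no network access is involved.

```python
from email.message import Message
from typing import Mapping, Optional

# Hypothetical list; ".crate" files are gzip-compressed tarballs.
ARCHIVE_SUFFIXES = (".tar.gz", ".tgz", ".tar.bz2", ".tar.xz", ".crate")


def filename_from_headers(headers: Mapping[str, str]) -> Optional[str]:
    """Extract the file name advertised in Content-Disposition, if any.
    The lookup is case-sensitive here, which real header maps may not be."""
    value = headers.get("Content-Disposition")
    if not value:
        return None
    message = Message()
    message["Content-Disposition"] = value
    return message.get_filename()


def classify(headers: Mapping[str, str]) -> str:
    """Return 'tarball-directory' when the advertised file name looks like
    an archive, else 'content' (the labels used in this thread)."""
    name = filename_from_headers(headers)
    if name and name.endswith(ARCHIVE_SUFFIXES):
        return "tarball-directory"
    return "content"


if __name__ == "__main__":
    sample = {"Content-Disposition": 'attachment; filename="diff-0.1.12.crate"'}
    print(classify(sample))  # header names a .crate file
    print(classify({}))      # no headers: nothing to go on
```

With no usable header the heuristic has nothing to go on and the origin falls through as plain content, which matches the behaviour described above for the bordeaux.guix.gnu.org URL.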

----

Note: cc-ed julien@malka.sh instead of community@nixos.org (as asked in
the thread)

Cheers,
--
tony / Antoine R. Dumont (@ardumont)

-----------------------------------------------------------------
gpg fingerprint BF00 203D 741A C9D5 46A8 BE07 52E2 E984 0D10 C3B8


