>> My understanding is that so far these URLs were ignored by the
>> lister/loader because they didn’t end in *.tar.*.⁰

FWIW, in the "new" lister implementation [1], a bunch of extra
computations are done to try to resolve those situations. It now tries
to fetch more information from the upstream server (e.g. crates URLs,
which end in /download, ...). It's probably not exhaustive though.

[1] https://gitlab.softwareheritage.org/swh/devel/swh-lister/-/blob/master/swh/lister/nixguix/lister.py?ref_type=heads

>> I’m sure Simon Tournier (Cc’d) already discussed with others at SWH
>> how crucial it is for us to be able to query content by nar hash.

> So yeah, we are looking forward to some ExtID interface. :-)

Yes, and there is an ongoing merge request about the new interface [2].

[2] https://gitlab.softwareheritage.org/swh/devel/swh-web/-/merge_requests/1220

Cheers,
tony / Antoine R. Dumont (@ardumont)

-----------------------------------------------------------------
gpg fingerprint BF00 203D 741A C9D5 46A8 BE07 52E2 E984 0D10 C3B8

Simon TOURNIER writes:

> Hi,
>
>> The initial NixGuix loader (currently in production) lists and loads
>> origins from a manifest, ignoring the specific origins mentioned above. The
>> new stack will be able to ingest those origins. It will also optionally
>> associate, if present, a NAR hash (specific intrinsic identifier to Nix and
>> Guix) to what’s called an ExtID (SWH side).
>
> Cool! Thank you.
>
>> The SWH API reading side of the ExtID, though, is work still to be done.
>
> In short, currently Guix relies on the SWH API for resolving from
> “something” to SWHID, where “something” can be:
>
>  + Git label tag + url
>  + Git commit hash
>  + plain url
>
> Well, the situation is in good shape IMHO – I do not have recent
> numbers, say all is fine for 75% of all Guix packages and for 90% of
> Guix packages coming from some Git repositories – but still, we have
> examples where “Git label tag + url” fails.
> For one instance, see [1] pointed by [2].
>
> The information – history of history – is there in SWH but it would
> require the Guix side to parse the snapshot information and extract as
> best as possible; trying several SWH snapshots until a match. Something
> like that. Chance of success until completion? Weak. :-)
>
> Moreover, what about the missing 25%? They are Guix packages coming
> from Mercurial repositories or from Subversion repositories or some
> others.
>
> Back in October 2020, we had a discussion [3] about sending a save request
> for packages using SVN checkouts but at the time we did not have a clear
> path for retrieving. Then in March 2023, maybe a path for retrieving
> with this discussion [4]… but still many hacks are required [5].
>
> Again, the information is there in SWH but it would require the Guix side
> to parse the snapshot information and extract as best as possible;
> trying several SWH snapshots until a match. Something like that.
> Chance of success until completion? Weak. :-)
>
> If only one source is missing, the whole castle potentially falls down.
> Somehow, a dictionary from ExtID as nar hash to SWHID would help make
> the castle more robust. :-)
>
> The SWH archive coverage of Guix packages would then not be 75% merely
> because we, on the Guix side, are not able to know or retrieve these
> missing 25%. Such a dictionary could reinforce the bridge between
> reproducible computational environment and archiving, IMHO.
>
> So yeah, we are looking forward to some ExtID interface. :-)
>
> Cheers,
> simon
>
>
> 1: https://issues.guix.gnu.org/66015#0-lineno53
> 2: https://gitlab.softwareheritage.org/swh/devel/swh-loader-git/-/issues/4751#note_148587
> 3: https://issues.guix.gnu.org/43442#9
> 4: https://sympa.inria.fr/sympa/arc/swh-devel/2023-03/msg00009.html
> 5: https://issues.guix.gnu.org/43442#13
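
For illustration only, the fallback described in the quoted message (trying
several SWH snapshots until one still carries the wanted Git tag) could be
sketched roughly as below. This is not what Guix or SWH actually runs: the
names `find_tag_in_snapshots` and `to_swhid` are hypothetical, and the
snapshot dictionaries are assumed to have an "id" plus a "branches" mapping
whose values carry "target" and "target_type", as in the JSON of the SWH
snapshot endpoint. Fetching the snapshots is deliberately left out.

```python
from typing import Iterable, Mapping, Optional, Tuple

# SWHID object-type codes for the branch target types handled here.
_SWHID_CODES = {
    "content": "cnt",
    "directory": "dir",
    "revision": "rev",
    "release": "rel",
    "snapshot": "snp",
}


def to_swhid(target_type: str, target: str) -> str:
    """Build a core SWHID string from a branch target type and its hex id."""
    return f"swh:1:{_SWHID_CODES[target_type]}:{target}"


def find_tag_in_snapshots(
    snapshots: Iterable[Mapping], tag: str
) -> Optional[Tuple[str, str]]:
    """Return (snapshot id, SWHID) for the first snapshot holding the tag.

    Snapshots are tried in the order given; None means no snapshot matched.
    """
    wanted = f"refs/tags/{tag}"
    for snapshot in snapshots:
        branch = snapshot.get("branches", {}).get(wanted)
        if not branch:
            continue  # tag absent, or a dangling branch (assumed to be None)
        if branch.get("target_type") not in _SWHID_CODES:
            continue  # e.g. "alias" branches are skipped in this sketch
        return snapshot["id"], to_swhid(branch["target_type"], branch["target"])
    return None
```

A caller would pass the snapshots of one origin, most recent first, and fall
back to another lookup when the result is None. A dictionary from nar hash to
SWHID, once exposed, would avoid this walk altogether.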