Hi,

I guess Gitlab means the instance gitlab.com, right?

On Sat, 06 Aug 2022 at 09:08, Olivier Dion via "Development of GNU Guix and the GNU System distribution." wrote:

> Many packages origin in Guix use an url to a GitLab project.  What are
> the consequence of such deletion on Guix reproducibility?  Will it
> affects the time-machine?

As explained by others, thanks to Software Heritage (SWH), the
time-machine should not be impacted if Gitlab.com stops serving some
sources.

First, Guix is able to automatically fall back to SWH when upstream
sources are unavailable.  Assuming substitutes are turned on, fetching
follows this order:

 1. try the Guix build farms
 2. try upstream, as defined by the origin
 3. try SWH
 4. try other “webarchives”

Second, the coverage by SWH depends on the kind of origin (url-fetch,
git-fetch, etc.) because the entry point on the SWH side is not the
same.

On a side note, SWH ingests many forges using what they call a ‘loader’
[1].  For example, their Git loader ingests instances of the Gitlab
forge such as gitlab.com, but also many others such as gitlab.inria.fr,
gitlab.freedesktop.org, gitlab.gnome.org, etc.

There is a (rudimentary) ‘nixguix’ loader [2]. ;-)  This loader reads
the file ‘sources.json’ [3] and then SWH ingests all the tarball
archives.  Moreover, “guix lint -c archival” can send a save request to
SWH, but such a request only works for Git origins.

In summary, it is highly probable that the source code is in SWH.

Third, the issue: being able to later fetch the source back from SWH
using the (meta-)information we have now.  In other words, the fallback
requires a unique identifier.  The catch: this identifier needs to be
compatible with SWH, which provides its own identifier, named swh-id.
The Git commit hash is compatible.  But the checksum is not.  That’s why
the Guix project currently maintains a map (named Disarchive) from the
checksum to the swh-id, allowing to rebuild the expected source code
from the data stored in SWH.

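To make the identifier mismatch above concrete, here is a minimal
sketch in Python (standard library only; it is not Guix or Disarchive
code).  It assumes the scheme where the swh-id of a file content is its
Git blob hash prefixed with “swh:1:cnt:”, and it uses a plain SHA-256 of
the bytes as a stand-in for the checksum recorded in an origin.  The
function names and the toy map are hypothetical; real Disarchive entries
carry more than an identifier, since they also describe how to rebuild
the tarball.

```python
import hashlib


def git_blob_id(data: bytes) -> str:
    """Git blob hash: SHA-1 over a 'blob <size>' header, a NUL byte, then the bytes."""
    header = b"blob %d\x00" % len(data)
    return hashlib.sha1(header + data).hexdigest()


def swhid_for_content(data: bytes) -> str:
    """SWH identifier of a file content; it reuses the Git blob hash."""
    return "swh:1:cnt:" + git_blob_id(data)


def sha256_checksum(data: bytes) -> str:
    """Stand-in for an origin checksum (hypothetical helper): SHA-256 of the bytes."""
    return hashlib.sha256(data).hexdigest()


def build_checksum_map(contents):
    """Toy checksum -> swh-id map, in the spirit of what Disarchive provides."""
    return {sha256_checksum(c): swhid_for_content(c) for c in contents}


if __name__ == "__main__":
    data = b"hello\n"
    print(sha256_checksum(data))    # what the origin records
    print(swhid_for_content(data))  # what SWH can be asked for
    table = build_checksum_map([data])
    print(table[sha256_checksum(data)] == swhid_for_content(data))
```

The two digests of the same bytes have nothing in common, so the SWH
identifier cannot be derived from the checksum alone; someone has to
record the correspondence while the content is still available, hence
the map.
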
Well, many Guix packages refer to upstream by a Git tag string.  That
can lead to issues, such as in-place replacements.  SWH regularly
crawls, ingests and builds “snapshots” (history of history) but there is
no guarantee that the Guix origin is well covered.  Besides, Guix is
currently not able to manage these snapshots. :-)

And today, the main weakness is about Subversion or CVS.  Some packages,
deep in the dependency graph, are svn-fetch or cvs-fetch.  And there is
no robust fallback mechanism, AFAIK.

In summary, the time-machine may or may not work.  The main factor in a
failure is the availability of the substitutes (from the Guix build
farms).  In other words, the older the time-machine jump, the higher the
probability of failure.

Back to Gitlab.com.  Using Guix 8f0d45c from July 18th, let us start
“guix repl”; the code is attached below.

--8<---------------cut here---------------start------------->8---
$ guix repl
GNU Guile 3.0.8
Copyright (C) 1995-2021 Free Software Foundation, Inc.

Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.

Enter `,help' for help.
scheme@(guix-user)> (load "from-gitlab-dot-com.scm")
scheme@(guix-user)> (length packages-on-gitlab.com)
$1 = 223
scheme@(guix-user)> (length git-references-on-gitlab.com)
$2 = 213
scheme@(guix-user)> ,pp tarballs-from-gitlab.com
$3 = (# # # # # # # # # #)
scheme@(guix-user)> ,pp recursive-packages-on-gitlab.com
$4 = (# #)
--8<---------------cut here---------------end--------------->8---

It means that the string “gitlab.com” appears in the origin of 223
packages, and 213 of those use git-fetch.  In other words, 10 packages
use url-fetch with tarballs generated by Gitlab.com.  Only 2 packages
use a recursive git-reference and are therefore badly covered: Guix is
currently not able to fully save them in SWH.
Moreover, fetching the data back from SWH works, but the resulting
checksum does not match, which defeats the fallback.  See bug#48540.

--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> (length archived-packages-on-swh)
$5 = 202
scheme@(guix-user)> ,pp missing-packages
$6 = (# # # # # # # # # # # # # # # # # # # # #)
--8<---------------cut here---------------end--------------->8---

Not that bad. :-)  Note that the 2 packages using recursive checkouts
are not missing; the data is in SWH but the checksum hits bug#48540.

OK, let us save these missing packages.

--8<---------------cut here---------------start------------->8---
$ for p in tint2 surfraw remmina libsequoia zn-poly ecl-cl-utilities \
    cl-utilities sbcl-cl-utilities openrgb guile-ac-d-bus guile-goblins \
    graphviz komikku bitcoin-unlimited fulcrum flowee kicad \
    kicad-footprints kicad-symbols emacs-execline python-pyodbc-c; \
  do guix lint -c archival $p ;done
gnu/packages/xdisorg.scm:1848:12: tint2@0.14.6: Disarchive entry refers to non-existent SWH directory 'b37b584d6b32848a4d57e8cab1af412cd46fcc9e'
gnu/packages/vnc.scm:66:5: remmina@1.4.23: scheduled Software Heritage archival
gnu/packages/sequoia.scm:421:12: libsequoia@0.22.0: scheduled Software Heritage archival
gnu/packages/sagemath.scm:231:5: zn-poly@0.9.2: scheduled Software Heritage archival
gnu/packages/hardware.scm:986:5: openrgb@0.7: scheduled Software Heritage archival
gnu/packages/guile-xyz.scm:3800:12: guile-ac-d-bus@1.0.0-beta.0: scheduled Software Heritage archival
gnu/packages/guile-xyz.scm:5109:5: guile-goblins@0.8: scheduled Software Heritage archival
gnu/packages/gnome.scm:12350:5: komikku@0.39.0: scheduled Software Heritage archival
gnu/packages/finance.scm:1644:5: bitcoin-unlimited@1.10.0.0: scheduled Software Heritage archival
gnu/packages/engineering.scm:949:12: kicad@6.0.6: scheduled Software Heritage archival
gnu/packages/engineering.scm:1118:12: kicad-footprints@6.0.6: scheduled Software Heritage archival
gnu/packages/engineering.scm:1089:12: kicad-symbols@6.0.6: scheduled Software Heritage archival
gnu/packages/emacs-xyz.scm:30313:12: emacs-execline@1.1: scheduled Software Heritage archival
gnu/packages/databases.scm:3061:5: python-pyodbc-c@3.1.5: scheduled Software Heritage archival
--8<---------------cut here---------------end--------------->8---

About tint2, commit 34c0cb5d6305ff7cc56318fbaa649afbe83464c7 from Thu
Aug 4 replaces url-fetch by git-fetch.

Now, let us examine SWH and browse the save requests.  The package
‘remmina@1.4.23’ was already saved, since it had been visited on 11
January 2022.  For instance, have a look at [4].  It means something
unexpected happens although the source code is there: instead of the
Git tag string, let us consider the Git commit hash [5].

--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> (lookup-origin-revision "https://gitlab.com/Remmina/Remmina" "v1.4.23")
$10 = #f
scheme@(guix-user)> (lookup-revision "a03c1648a090458736434c77c0be00a7cf9cc44b")
$11 = #< id: "a03c1648a090458736434c77c0be00a7cf9cc44b" date: # directory: "cc094a7d19d607beea54bfec549b4120d8c2ec92" directory-url: "https://archive.softwareheritage.org/api/1/directory/cc094a7d19d607beea54bfec549b4120d8c2ec92/">
--8<---------------cut here---------------end--------------->8---

Well, it requires more investigation to understand why the lookup by
tag fails on the Guix side.

Last, SWH fails to ingest for instance; that is another investigation.

1:
2:
3:
4:
5:

All in all, a robust time-machine needs some love. :-)

Cheers,
simon