* GitLab to plans to delete dormant projects
@ 2022-08-06 13:08 Olivier Dion via Development of GNU Guix and the GNU System distribution.
2022-08-06 14:34 ` Maxime Devos
` (3 more replies)
0 siblings, 4 replies; 7+ messages in thread
From: Olivier Dion via Development of GNU Guix and the GNU System distribution. @ 2022-08-06 13:08 UTC (permalink / raw)
To: guix-devel
Hi,
Following this article <https://lwn.net/Articles/903858/>, GitLab is
planning to start deleting projects that have been idle for more than
12 months.
Many package origins in Guix use a URL to a GitLab project. What are
the consequences of such deletions for Guix reproducibility? Will it
affect the time-machine?
--
Olivier Dion
oldiob.dev
* Re: GitLab to plans to delete dormant projects
2022-08-06 13:08 GitLab to plans to delete dormant projects Olivier Dion via Development of GNU Guix and the GNU System distribution.
@ 2022-08-06 14:34 ` Maxime Devos
2022-08-06 14:41 ` Julien Lepiller
` (2 subsequent siblings)
3 siblings, 0 replies; 7+ messages in thread
From: Maxime Devos @ 2022-08-06 14:34 UTC (permalink / raw)
To: Olivier Dion, guix-devel
On 06-08-2022 15:08, Olivier Dion via Development of GNU Guix and the
GNU System distribution. wrote:
> Hi,
>
> Following this article <https://lwn.net/Articles/903858/>, GitLab is
> planning to start deleting projects that have been idle for more than
> 12 months.
>
> Many package origins in Guix use a URL to a GitLab project. What are
> the consequences of such deletions for Guix reproducibility? Will it
> affect the time-machine?
Software Heritage should avoid some problems, but from what I've heard
it doesn't support all edge cases yet (something about recursive
checkouts?).
I think it would be a good idea to make some Guile script to find all
GitLab git checkouts in Guix and run the swh linter on them to make sure
they are archived.
Additionally, it would be nice to support multiple URLs as fallbacks in
the 'origin' record for git-fetch (like we have for url-fetch) to avoid
the three points of failure (SWH and the copy at the substitute
servers) in the fallback mechanism.
Greetings,
Maxime.
* Re: GitLab to plans to delete dormant projects
2022-08-06 13:08 GitLab to plans to delete dormant projects Olivier Dion via Development of GNU Guix and the GNU System distribution.
2022-08-06 14:34 ` Maxime Devos
@ 2022-08-06 14:41 ` Julien Lepiller
2022-08-06 14:50 ` Olivier Dion via Development of GNU Guix and the GNU System distribution.
2022-08-06 21:57 ` John Kehayias
2022-08-17 16:41 ` zimoun
3 siblings, 1 reply; 7+ messages in thread
From: Julien Lepiller @ 2022-08-06 14:41 UTC (permalink / raw)
To: Olivier Dion,
Olivier Dion via Development of GNU Guix and the GNU System distribution.,
guix-devel
Our build farms need those sources, so they keep them in cache. If you need a source, you can always substitute from the build farms if the origin disappeared (that's actually the default and you don't even need to trust the build farm for that to work).
Another fallback option when substitution is not possible is to get the source from Software Heritage. They keep an archive of almost everything. To do that, they have listers that help them find sources from different services. They have a lister for GitLab, and even one for Guix. Also, as part of guix lint, a request is sent to SWH if the origin is not yet archived.
Hopefully that means our origins are saved by Software Heritage, so we can transparently fall back to them.
On 6 August 2022 15:08:21 GMT+02:00, "Olivier Dion via Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> wrote:
>Hi,
>
>Following this article <https://lwn.net/Articles/903858/>, GitLab is
>planning to start deleting projects that have been idle for more than
>12 months.
>
>Many package origins in Guix use a URL to a GitLab project. What are
>the consequences of such deletions for Guix reproducibility? Will it
>affect the time-machine?
>
>--
>Olivier Dion
>oldiob.dev
>
>
* Re: GitLab to plans to delete dormant projects
2022-08-06 14:41 ` Julien Lepiller
@ 2022-08-06 14:50 ` Olivier Dion via Development of GNU Guix and the GNU System distribution.
2022-08-07 21:09 ` Ludovic Courtès
0 siblings, 1 reply; 7+ messages in thread
From: Olivier Dion via Development of GNU Guix and the GNU System distribution. @ 2022-08-06 14:50 UTC (permalink / raw)
To: Julien Lepiller,
Olivier Dion via Development of GNU Guix and the GNU System distribution.,
guix-devel
On Sat, 06 Aug 2022, Julien Lepiller <julien@lepiller.eu> wrote:
> Our build farms need those sources, so they keep them in cache. If you
> need a source, you can always substitute from the build farms if the
> origin disappeared (that's actually the default and you don't even
> need to trust the build farm for that to work).
Does the cache have a time to live? For example, would a source from
2020 still be available on the build farms in 20 years? Or would the
build farms make a request to Software Heritage?
> Another fallback option when substitution is not possible is to get
> the source from Software Heritage. They keep an archive of almost
> everything. To do that, they have listers that help them find sources
> from different services. They have a lister for GitLab, and even one
> for Guix. Also, as part of guix lint, a request is sent to SWH if the
> origin is not yet archived.
>
> Hopefully that means our origins are saved by Software Heritage, so we
> can transparently fall back to them.
Okay great then. Maybe it would be a good idea to test this by
simulating the deletion of a Gitlab repo. Better find it out now than
when it's too late.
--
Olivier Dion
oldiob.dev
* Re: GitLab to plans to delete dormant projects
2022-08-06 13:08 GitLab to plans to delete dormant projects Olivier Dion via Development of GNU Guix and the GNU System distribution.
2022-08-06 14:34 ` Maxime Devos
2022-08-06 14:41 ` Julien Lepiller
@ 2022-08-06 21:57 ` John Kehayias
2022-08-17 16:41 ` zimoun
3 siblings, 0 replies; 7+ messages in thread
From: John Kehayias @ 2022-08-06 21:57 UTC (permalink / raw)
To: Olivier Dion; +Cc: guix-devel, Julien Lepiller
Hi all,
------- Original Message -------
On Saturday, August 6th, 2022 at 9:08 AM, Olivier Dion via "Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> wrote:
>
>
> Hi,
>
> Following this article https://lwn.net/Articles/903858/, GitLab is
> planning to start deleting projects that have been idle for more than
> 12 months.
>
> Many package origins in Guix use a URL to a GitLab project. What are
> the consequences of such deletions for Guix reproducibility? Will it
> affect the time-machine?
>
I think having good backup and archival plans is great, and I don't mean to dissuade anyone from working on this, but as an update: it looks like GitLab has walked this back:
https://www.theregister.com/2022/08/05/gitlab_reverses_deletion_policy/
John
* Re: GitLab to plans to delete dormant projects
2022-08-06 14:50 ` Olivier Dion via Development of GNU Guix and the GNU System distribution.
@ 2022-08-07 21:09 ` Ludovic Courtès
0 siblings, 0 replies; 7+ messages in thread
From: Ludovic Courtès @ 2022-08-07 21:09 UTC (permalink / raw)
To: Olivier Dion via Development of GNU Guix and the GNU System distribution.
Cc: Julien Lepiller, Olivier Dion
Hi,
Olivier Dion via "Development of GNU Guix and the GNU System
distribution." <guix-devel@gnu.org> wrote:
> On Sat, 06 Aug 2022, Julien Lepiller <julien@lepiller.eu> wrote:
>> Our build farms need those sources, so they keep them in cache. If you
>> need a source, you can always substitute from the build farms if the
>> origin disappeared (that's actually the default and you don't even
>> need to trust the build farm for that to work).
>
> Does the cache have a time to live? For example, would a source from
> 2020 still be available on the build farms in 20 years? Or would the
> build farms make a request to Software Heritage?
Our build farms have limited capacity, so they probably won’t keep
everything forever, but that’s the mission of Software Heritage (SWH).
Guix has automatically fallen back to SWH for some time now, and SWH
archives all of gitlab.com, so I’m not very concerned:
https://guix.gnu.org/en/blog/2019/connecting-reproducible-deployment-to-a-long-term-source-code-archive/
A shortcoming of our code, as Maxime mentions, is that it doesn’t
correctly handle retrieval of recursive checkouts from SWH. That’s a
bug to fix, and there’s another one in this area:
https://issues.guix.gnu.org/48540
> Okay great then. Maybe it would be a good idea to test this by
> simulating the deletion of a Gitlab repo. Better find it out now than
> when it's too late.
Other than checking whether ‘guix lint -c archival PKG’ complains, one
can run:
guix build -S PKG --check
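As a rough sketch, the two checks above could be wrapped in a small shell function and run over several packages. The function name check_sources is made up for illustration; the only commands used are the two quoted above, and I have not verified how their exit codes behave for every kind of warning.

```shell
# check_sources PKG...: hypothetical helper, not part of Guix.
# For each package, run the archival checker, then rebuild the source
# derivation (that is, download it again) and let --check compare it
# with what is already in the store.
check_sources() {
    status=0
    for pkg in "$@"; do
        echo "== $pkg"
        guix lint -c archival "$pkg" || status=1
        guix build -S "$pkg" --check || status=1
    done
    return "$status"
}
```

For instance, 'check_sources remmina kicad' would run both commands for each of the two packages.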
HTH!
Ludo’.
* Re: GitLab to plans to delete dormant projects
2022-08-06 13:08 GitLab to plans to delete dormant projects Olivier Dion via Development of GNU Guix and the GNU System distribution.
` (2 preceding siblings ...)
2022-08-06 21:57 ` John Kehayias
@ 2022-08-17 16:41 ` zimoun
3 siblings, 0 replies; 7+ messages in thread
From: zimoun @ 2022-08-17 16:41 UTC (permalink / raw)
To: Olivier Dion, guix-devel
Hi,
I guess “GitLab” here means the gitlab.com instance, right?
On Sat, 06 Aug 2022 at 09:08, Olivier Dion via "Development of GNU Guix and the GNU System distribution." <guix-devel@gnu.org> wrote:
> Many package origins in Guix use a URL to a GitLab project. What are
> the consequences of such deletions for Guix reproducibility? Will it
> affect the time-machine?
As explained by others, thanks to Software Heritage, the time-machine
should not be affected if gitlab.com stops serving some sources.
First, Guix is able to automatically fall back to SWH when upstream
sources are unavailable. Assuming substitutes are turned on, fetching
follows this order:
1. try with Guix build farms
2. try upstream defined by origin
3. try SWH
4. try other “webarchives”
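The order above can be modelled with a few lines of shell. This is only an illustration of "try each source in turn and stop at the first success"; Guix implements the real logic internally, and the four try_* functions below are stubs invented for the example.

```shell
#!/bin/sh
# Illustrative model only; the fetcher names are hypothetical stubs.

fetch_with_fallback() {
    # Try each fetcher in order; stop at the first one that succeeds.
    for fetcher in "$@"; do
        if "$fetcher"; then
            echo "fetched via: $fetcher"
            return 0
        fi
    done
    echo "all fetchers failed" >&2
    return 1
}

# Stubs standing in for the four steps listed above.
try_build_farm()  { return 1; }   # pretend the substitute is gone
try_upstream()    { return 1; }   # pretend upstream was deleted
try_swh()         { return 0; }   # pretend SWH still has the source
try_webarchives() { return 0; }   # never reached in this scenario

fetch_with_fallback try_build_farm try_upstream try_swh try_webarchives
```

In this scenario the first two steps fail and the source comes from the third one, which is the situation a deleted gitlab.com project would create once the build farms no longer have the substitute.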
Second, the coverage by SWH depends on the kind of origin (url-fetch,
git-fetch, etc.) because each kind has a different entry point on the
SWH side.
On a side note, SWH ingests many forges using what they call a ‘loader’
[1]. For example, their Git loader ingests instances of the GitLab
forge, e.g., gitlab.com, but also many others such as gitlab.inria.fr,
gitlab.freedesktop.org or gitlab.gnome.org.
There is also a (rudimentary) ‘nixguix’ loader [2]. ;-) This loader
reads the file ‘sources.json’ [3] and then SWH ingests all the tarball
archives.
Moreover, “guix lint -c archival” can send a save request to SWH,
but this request only works for Git origins.
In summary, it is highly probable that the source code is in SWH.
Third, the issue is being able to fetch the source back from SWH later
using the (meta-)information we have now. In other words, the fallback
requires a unique identifier. The catch: this identifier needs to be
compatible with SWH, which provides its own, named swh-id.
The Git commit hash is compatible. But the checksum is not. That’s why
the Guix project currently maintains a map (named Disarchive) from the
checksum to swh-id allowing to rebuild the expected source code from the
data stored in SWH.
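To illustrate why the Git commit hash is "compatible": an swh-id for a revision has the form swh:1:rev: followed by the 40-character commit hash, so it can be derived from information Guix already records. A minimal sketch, using the Remmina commit that appears later in this message; the function name is made up for the example.

```shell
# swhid_for_commit COMMIT: hypothetical helper that turns a full Git
# commit hash into the corresponding revision swh-id.
swhid_for_commit() {
    commit=$1
    # A full Git commit hash is 40 lowercase hexadecimal characters.
    case "$commit" in
        *[!0-9a-f]*|"")
            echo "not a full commit hash: $commit" >&2
            return 1 ;;
    esac
    if [ "${#commit}" -ne 40 ]; then
        echo "not a full commit hash: $commit" >&2
        return 1
    fi
    printf 'swh:1:rev:%s\n' "$commit"
}

swhid_for_commit a03c1648a090458736434c77c0be00a7cf9cc44b
# A tag name carries no content-derived identifier, so this fails:
swhid_for_commit v1.4.23 2>/dev/null || echo "v1.4.23: no swh-id can be derived"
```

A tarball checksum has no such direct counterpart, which is the gap the Disarchive map fills.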
Well, many Guix packages refer to their source by a Git tag string. That
can lead to issues, such as in-place replacements. SWH regularly crawls,
ingests and builds “snapshots” (history of history), but there is no
guarantee that the Guix origin is well covered; besides, Guix is
currently not able to manage these snapshots. :-)
Today, the main weakness concerns Subversion and CVS. Some packages,
deep in the dependency graph, use svn-fetch or cvs-fetch, and there
is no robust fallback mechanism for them, AFAIK.
In summary, the time-machine may or may not work. When it fails, the
main factor is the availability of the substitute (from the Guix build
farms). In other words, the older the time-machine jump, the higher the
probability of failure.
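One way to probe this for a given package is to ask an older Guix revision for its source. A small sketch, assuming the usual "guix time-machine --commit=COMMIT -- COMMAND" form; the wrapper name source_at_commit is invented for the example.

```shell
# source_at_commit COMMIT PKG: hypothetical wrapper that fetches the
# source of PKG as defined by the Guix revision COMMIT.
source_at_commit() {
    if [ "$#" -ne 2 ]; then
        echo "usage: source_at_commit COMMIT PKG" >&2
        return 2
    fi
    guix time-machine --commit="$1" -- build -S "$2"
}
```

Whether this succeeds for an old commit depends on the factors above: a substitute still being available, upstream still serving the source, or the SWH fallback working.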
Back to gitlab.com. Using Guix 8f0d45c from July 18th, let us run
“guix repl”; the code is attached below.
--8<---------------cut here---------------start------------->8---
$ guix repl
GNU Guile 3.0.8
Copyright (C) 1995-2021 Free Software Foundation, Inc.
Guile comes with ABSOLUTELY NO WARRANTY; for details type `,show w'.
This program is free software, and you are welcome to redistribute it
under certain conditions; type `,show c' for details.
Enter `,help' for help.
scheme@(guix-user)> (load "from-gitlab-dot-com.scm")
scheme@(guix-user)> (length packages-on-gitlab.com)
$1 = 223
scheme@(guix-user)> (length git-references-on-gitlab.com)
$2 = 213
scheme@(guix-user)> ,pp tarballs-from-gitlab.com
$3 = (#<package tint2@0.14.6 gnu/packages/xdisorg.scm:1845 7f21cf29d000>
#<package surfraw@2.3.0 gnu/packages/web.scm:5600 7f21cdf99370>
#<package python-dogtail@0.9.11 gnu/packages/python-xyz.scm:2777 7f21cd2cba50>
#<package ecl-cl-utilities@0.0.0-1.dce2d2f gnu/packages/lisp-xyz.scm:4454 7f21cec95790>
#<package cl-utilities@0.0.0-1.dce2d2f gnu/packages/lisp-xyz.scm:4454 7f21cec95840>
#<package sbcl-cl-utilities@0.0.0-1.dce2d2f gnu/packages/lisp-xyz.scm:4454 7f21cec958f0>
#<package iucode-tool@2.3.1 gnu/packages/linux.scm:4476 7f21cd59fc60>
#<package graphviz@2.49.0 gnu/packages/graphviz.scm:67 7f21cd286000>
#<package fulcrum@1.1.1 gnu/packages/finance.scm:1712 7f21cd2a5e70>
#<package flowee@2020.04.1 gnu/packages/finance.scm:1747 7f21cd2a5dc0>)
scheme@(guix-user)> ,pp recursive-packages-on-gitlab.com
$4 = (#<package jucipp@1.7.1 gnu/packages/text-editors.scm:309 7f21d204f580>
#<package emilua@0.3.2 gnu/packages/lua.scm:1122 7f21cd3d7bb0>)
--8<---------------cut here---------------end--------------->8---
It means that the string “gitlab.com” appears in the origin of 223
packages and 213 of those are git-fetch. In other words, 10 packages
are using url-fetch with tarballs generated by gitlab.com.
Only 2 packages use a recursive git-reference and are therefore badly
covered. Guix is currently not able to fully save them in SWH. Moreover,
fetching the data back from SWH works, but the resulting checksum does
not match, which defeats the fallback. See
<https://issues.guix.gnu.org/48540>.
--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> (length archived-packages-on-swh)
$5 = 202
scheme@(guix-user)> ,pp missing-packages
$6 = (#<package tint2@0.14.6 gnu/packages/xdisorg.scm:1845 7f21cf29d000>
#<package surfraw@2.3.0 gnu/packages/web.scm:5600 7f21cdf99370>
#<package remmina@1.4.23 gnu/packages/vnc.scm:62 7f21d1ed1d10>
#<package libsequoia@0.22.0 gnu/packages/sequoia.scm:418 7f21ce25ea50>
#<package zn-poly@0.9.2 gnu/packages/sagemath.scm:227 7f21d1f7f370>
#<package ecl-cl-utilities@0.0.0-1.dce2d2f gnu/packages/lisp-xyz.scm:4454 7f21cec95790>
#<package cl-utilities@0.0.0-1.dce2d2f gnu/packages/lisp-xyz.scm:4454 7f21cec95840>
#<package sbcl-cl-utilities@0.0.0-1.dce2d2f gnu/packages/lisp-xyz.scm:4454 7f21cec958f0>
#<package openrgb@0.7 gnu/packages/hardware.scm:982 7f21cef46f20>
#<package guile-ac-d-bus@1.0.0-beta.0 gnu/packages/guile-xyz.scm:3796 7f21d8585210>
#<package guile-goblins@0.8 gnu/packages/guile-xyz.scm:5105 7f21d858d8f0>
#<package graphviz@2.49.0 gnu/packages/graphviz.scm:67 7f21cd286000>
#<package komikku@0.39.0 gnu/packages/gnome.scm:12346 7f21ce037420>
#<package bitcoin-unlimited@1.10.0.0 gnu/packages/finance.scm:1640 7f21cd2a5f20>
#<package fulcrum@1.1.1 gnu/packages/finance.scm:1712 7f21cd2a5e70>
#<package flowee@2020.04.1 gnu/packages/finance.scm:1747 7f21cd2a5dc0>
#<package kicad@6.0.6 gnu/packages/engineering.scm:946 7f21d20b6000>
#<package kicad-footprints@6.0.6 gnu/packages/engineering.scm:1114 7f21d20d4dc0>
#<package kicad-symbols@6.0.6 gnu/packages/engineering.scm:1086 7f21d20d4e70>
#<package emacs-execline@1.1 gnu/packages/emacs-xyz.scm:30310 7f21ced50c60>
#<package python-pyodbc-c@3.1.5 gnu/packages/databases.scm:3057 7f21cd48f370>)
--8<---------------cut here---------------end--------------->8---
Not that bad. :-)
Note that the 2 packages using recursive checkouts are not missing; the
data is in SWH but the checksum hits bug#48540.
OK, let's save these missing packages.
--8<---------------cut here---------------start------------->8---
$ for p in tint2 surfraw remmina libsequoia zn-poly ecl-cl-utilities cl-utilities sbcl-cl-utilities openrgb guile-ac-d-bus guile-goblins graphviz komikku bitcoin-unlimited fulcrum flowee kicad kicad-footprints kicad-symbols emacs-execline python-pyodbc-c; do guix lint -c archival $p ;done
gnu/packages/xdisorg.scm:1848:12: tint2@0.14.6: Disarchive entry refers to non-existent SWH directory 'b37b584d6b32848a4d57e8cab1af412cd46fcc9e'
gnu/packages/vnc.scm:66:5: remmina@1.4.23: scheduled Software Heritage archival
gnu/packages/sequoia.scm:421:12: libsequoia@0.22.0: scheduled Software Heritage archival
gnu/packages/sagemath.scm:231:5: zn-poly@0.9.2: scheduled Software Heritage archival
gnu/packages/hardware.scm:986:5: openrgb@0.7: scheduled Software Heritage archival
gnu/packages/guile-xyz.scm:3800:12: guile-ac-d-bus@1.0.0-beta.0: scheduled Software Heritage archival
gnu/packages/guile-xyz.scm:5109:5: guile-goblins@0.8: scheduled Software Heritage archival
gnu/packages/gnome.scm:12350:5: komikku@0.39.0: scheduled Software Heritage archival
gnu/packages/finance.scm:1644:5: bitcoin-unlimited@1.10.0.0: scheduled Software Heritage archival
gnu/packages/engineering.scm:949:12: kicad@6.0.6: scheduled Software Heritage archival
gnu/packages/engineering.scm:1118:12: kicad-footprints@6.0.6: scheduled Software Heritage archival
gnu/packages/engineering.scm:1089:12: kicad-symbols@6.0.6: scheduled Software Heritage archival
gnu/packages/emacs-xyz.scm:30313:12: emacs-execline@1.1: scheduled Software Heritage archival
gnu/packages/databases.scm:3061:5: python-pyodbc-c@3.1.5: scheduled Software Heritage archival
--8<---------------cut here---------------end--------------->8---
About tint2, commit 34c0cb5d6305ff7cc56318fbaa649afbe83464c7 from Thu
Aug 4 replaces url-fetch with git-fetch.
Now, let's examine SWH and browse the save requests. The package
‘remmina@1.4.23’ was saved because it had been visited on 11 January
2022. For instance, have a look at [4]. It means something unexpected
is going on although the source code is there: instead of the Git tag
string, let's consider the Git commit hash [5].
--8<---------------cut here---------------start------------->8---
scheme@(guix-user)> (lookup-origin-revision "https://gitlab.com/Remmina/Remmina" "v1.4.23")
$10 = #f
scheme@(guix-user)> (lookup-revision "a03c1648a090458736434c77c0be00a7cf9cc44b")
$11 = #<<revision> id: "a03c1648a090458736434c77c0be00a7cf9cc44b" date: #<date nanosecond: 0 second: 1 minute: 3 hour: 23 day: 19 month: 12 year: 2021 zone-offset: 0> directory: "cc094a7d19d607beea54bfec549b4120d8c2ec92" directory-url: "https://archive.softwareheritage.org/api/1/directory/cc094a7d19d607beea54bfec549b4120d8c2ec92/">
--8<---------------cut here---------------end--------------->8---
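The same lookup can be tried by hand against the SWH web API. The sketch below only builds the URL; it assumes the revision endpoint follows the same pattern as the directory URL shown in the output above, and the function name is made up for the example.

```shell
# Base URL taken from the directory-url printed above.
SWH_API=https://archive.softwareheritage.org/api/1

# swh_revision_url COMMIT: hypothetical helper printing the API URL
# for a revision (assumed endpoint layout: /revision/<commit>/).
swh_revision_url() {
    printf '%s/revision/%s/\n' "$SWH_API" "$1"
}

swh_revision_url a03c1648a090458736434c77c0be00a7cf9cc44b
```

Passing the printed URL to a tool such as curl should return the revision metadata when the commit is archived.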
Well, it requires more investigation to understand why the Guix code
fails. Last, SWH fails to ingest
<https://gitlab.com/sequoia-pgp/sequoia.git>, for instance; that calls
for another investigation.
1: <https://docs.softwareheritage.org/devel/glossary.html#term-loader>
2: <https://docs.softwareheritage.org/devel/apidoc/swh.loader.package.nixguix.html?highlight=nixguix#module-swh.loader.package.nixguix>
3: <https://guix.gnu.org/sources.json>
4: <https://archive.softwareheritage.org/browse/origin/visits/?origin_url=https://gitlab.com/Remmina/Remmina>
5: <https://gitlab.com/Remmina/Remmina/-/tags/v1.4.23>
All in all, a robust time-machine needs some love. :-)
Cheers,
simon
[-- Attachment #2: snippet.scm --]
[-- Type: application/octet-stream, Size: 2617 bytes --]
(use-modules (guix) (gnu)
             (guix git-download)
             ((guix swh) #:hide (origin?))
             (ice-9 match)
             (srfi srfi-1)
             (srfi srfi-26))

;; Return true when the source of PACKAGE mentions "gitlab.com".
(define (gitlab.com? package)
  (define (gitlab-string? str)
    (string-contains str "gitlab.com"))

  (match (package-source package)
    ((? origin? o)
     (match (origin-uri o)
       ((? string? url)
        (gitlab-string? url))
       (((? string? urls) ...)
        (any gitlab-string? urls))
       ((? git-reference? ref)
        (gitlab-string? (git-reference-url ref)))
       (_ #f)))
    (_ #f)))

;; All the packages whose origin refers to gitlab.com.
(define packages-on-gitlab.com
  (fold-packages (lambda (package result)
                   (if (gitlab.com? package)
                       (cons package result)
                       result))
                 '()))

;; The subset fetched with git-fetch.
(define git-references-on-gitlab.com
  (filter (lambda (package)
            (let* ((origin (package-source package))
                   (uri    (origin-uri origin)))
              (git-reference? uri)))
          packages-on-gitlab.com))

;; The rest, i.e., tarballs fetched with url-fetch.
(define tarballs-from-gitlab.com
  (lset-difference eq?
                   packages-on-gitlab.com
                   git-references-on-gitlab.com))

;; Packages using a recursive Git checkout (see bug#48540).
(define recursive-packages-on-gitlab.com
  (filter (lambda (package)
            (let* ((origin (package-source package))
                   (uri    (origin-uri origin)))
              (match uri
                ((? git-reference? ref)
                 (git-reference-recursive? ref))
                (_ #f))))
          packages-on-gitlab.com))

;; Packages whose source can be found in SWH, by revision for Git
;; origins and by content hash otherwise.
(define archived-packages-on-swh
  (filter (lambda (package)
            (let* ((origin (package-source package))
                   (uri    (origin-uri origin)))
              (match uri
                ((? git-reference? ref)
                 (let ((url    (git-reference-url ref))
                       (commit (git-reference-commit ref)))
                   (revision? (if (commit-id? commit)
                                  (or (lookup-revision commit)
                                      (lookup-origin-revision url commit))
                                  (lookup-origin-revision url commit)))))
                (_
                 (let ((hash (origin-hash origin)))
                   (lookup-content (content-hash-value hash) ;TODO: Check on Disarchive too.
                                   (symbol->string
                                    (content-hash-algorithm hash))))))))
          packages-on-gitlab.com))

;; Packages that SWH does not (yet) know about.
(define missing-packages
  (lset-difference eq?
                   packages-on-gitlab.com
                   archived-packages-on-swh))