From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>, guix-devel@gnu.org
Subject: Re: Disarchive update
Date: Tue, 12 Oct 2021 11:19:18 +0200
Message-ID: <86r1cqmwh5.fsf@gmail.com>
In-Reply-To: <87r1cu1pj5.fsf@inria.fr>
Hi Ludo,
On Sat, 09 Oct 2021 at 12:05, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:
> If you run:
>
> guix build /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv
Oh, cool!
> and if you’re patient :-), you eventually get a 579 MB directory
> containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
> missing tarballs are those that “disarchive disassemble” fails to
> handle, for instance because it couldn’t guess what compression method
> is being used.)
Timothy made this table months ago:
    type          count   share
    tar+gz         9090   52.0%
    git            5294   30.3%
    tar+xz         1184    6.8%
    tar+bz2         775    4.4%
    tar             393    2.2%
    zip             273    1.6%
    svn-multi       175    1.0%
    svn             125    0.7%
    file             51    0.3%
    computed         38    0.2%
    hg               36    0.2%
    unknown-uri      20    0.1%
    tar+gz?          15    0.1%
    tar+lz           13    0.1%
    tar+Z             4    0.0%
    cvs               3    0.0%
    bzr               3    0.0%
    tar+lzma          2    0.0%
    total         17494  100.0%
What is really missing is XZ and Bzip2 support in Disarchive, I guess.
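To put a rough number on that guess, here is a small sketch that sums the tarball-like rows of the table above and prints the share taken by tar+xz and tar+bz2. The counts are copied from the table; the function name is only illustrative, and nothing here checks what Disarchive actually supports.

```shell
#!/bin/sh
# Counts copied from the table above (tarball-like rows only).
tarballs='tar+gz 9090
tar+xz 1184
tar+bz2 775
tar 393
tar+gz? 15
tar+lz 13
tar+Z 4
tar+lzma 2'

# Print how many tarball-like sources are tar+xz or tar+bz2,
# and which share of all tarball-like sources that represents.
xz_bz2_share() {
    printf '%s\n' "$tarballs" | awk '
        { total += $2 }
        $1 == "tar+xz" || $1 == "tar+bz2" { subset += $2 }
        END { printf "%d of %d tarballs (%.1f%%)\n", subset, total, 100 * subset / total }'
}

xz_bz2_share
```

So, if the guess holds, XZ and Bzip2 together would account for roughly one tarball in six in that snapshot.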
> Where to go from here? Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated. The goal here
> would be for the Guix project to set up infrastructure populating a
> database automatically and creating backups, possibly via SWH (we’ll
> have to discuss it with them).
Timothy was working on feeding the database from each release. You can
have a look at:
<https://git.ngyro.com/preservation-of-guix>
Then something along these lines:
    $ sqlite3 /tmp/pog.db < schema.sql
    $ guix repl -L . <(echo '
      (use-modules (pog))
      (ingest "6298c3ffd9654d3231a6f25390b056483e8f407c"
              "/tmp/pog.db")
      ')
where the commit hash corresponds to v1.0.0. I do not know whether that
would be equivalent to running:
    guix time-machine --commit=6298c3ffd9654d3231a6f25390b056483e8f407c \
      -- build -m etc/disarchive-manifest.scm
> A plan we can already deploy would be:
>
> 1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
> 2. On berlin, add an mcron job that periodically copies the output of
> the latest “disarchive-collection” build to a directory, say
> /srv/disarchive. Thus, the database would accumulate tarball
> metadata over time.
>
> 3. Add an nginx route so that /srv/disarchive is served at
> https://disarchive.guix.gnu.org.
>
> 4. Add disarchive.guix.gnu.org to (guix download).
To replace (or add to) the current ‘%disarchive-mirrors’, right?
Going down this road (using Cuirass), why not generate sources.json the
same way, instead of the hack using the website builder?
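For step 2 of the plan quoted above, the "accumulate over time" part could look roughly like the sketch below: copy only the files that the target directory does not have yet, so metadata from earlier builds is never overwritten or dropped. The function name is hypothetical, and I am assuming the build output may contain symlinks (hence `find -L`); the real job on berlin may well be written differently, for example in Scheme as part of the mcron configuration.

```shell
#!/bin/sh
# accumulate SRC DEST: copy every regular file found under SRC into DEST,
# keeping the relative path, and skipping files that DEST already has.
# Hypothetical helper; assumes file names contain no newlines.
accumulate() {
    src=$1
    dest=$2
    # -L follows symlinks, in case the collection is a union of symlinks.
    (cd "$src" && find -L . -type f) | while IFS= read -r f; do
        # Never overwrite metadata copied by an earlier run.
        [ -e "$dest/$f" ] && continue
        mkdir -p "$dest/$(dirname "$f")"
        cp "$src/$f" "$dest/$f"
    done
}

# Possible use from a periodic job (not run here):
#   accumulate "$(guix build /gnu/store/…-disarchive-collection.drv)" /srv/disarchive
```

Skipping existing files keeps each run cheap, at the cost of never refreshing an entry whose metadata changed between builds.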
On my side, I will try to resume what I started months ago: measuring the
SWH coverage. For instance, of these ~92% of tarballs, how many are
currently stored in SWH? Well, do not hold your breath, and I would be
happy if someone beats me to it. ;-)
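A minimal sketch of one such coverage check, assuming you already have the SWHID of the directory a tarball unpacks to (how to extract it from the Disarchive metadata is left out here). It asks the public SWH web API whether that directory is known. The two function names are made up for the example, and anonymous API requests are rate-limited, so a full survey would need to be throttled.

```shell
#!/bin/sh
# Build the SWH web API URL for a directory SWHID (swh:1:dir:<sha1_git>).
# Returns non-zero if the argument is not a directory SWHID.
swh_directory_url() {
    case $1 in
        swh:1:dir:*)
            printf '%s/%s/\n' \
                "https://archive.softwareheritage.org/api/1/directory" \
                "${1#swh:1:dir:}" ;;
        *)
            return 1 ;;
    esac
}

# Succeed if SWH answers 200 for the directory, fail otherwise.
# Exit status 2 means the argument was not a directory SWHID.
swh_has_directory() {
    url=$(swh_directory_url "$1") || return 2
    [ "$(curl -s -o /dev/null -w '%{http_code}' "$url")" = 200 ]
}

# Possible use (performs a network request, not run here):
#   swh_has_directory "swh:1:dir:<sha1_git>" && echo archived
```

Counting how many tarballs pass this check, over the total, would give the coverage figure.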
Cheers,
simon