From: zimoun <zimon.toutoune@gmail.com>
To: "Ludovic Courtès" <ludovic.courtes@inria.fr>, guix-devel@gnu.org
Subject: Re: Disarchive update
Date: Tue, 12 Oct 2021 11:19:18 +0200
Message-ID: <86r1cqmwh5.fsf@gmail.com>
In-Reply-To: <87r1cu1pj5.fsf@inria.fr>

Hi Ludo,

On Sat, 09 Oct 2021 at 12:05, Ludovic Courtès <ludovic.courtes@inria.fr> wrote:

> If you run:
>
>   guix build /gnu/store/nnl67m8c2x9rwqbnych1agc6p7g5473g-disarchive-collection.drv

Oh, cool!

> and if you’re patient :-), you eventually get a 579 MB directory
> containing Disarchive metadata for 8,413 tarballs out of 9,113 (the
> missing tarballs are those that “disarchive disassemble” fails to
> handle, for instance because it couldn’t guess what compression method
> is being used.)

Timothy made this table months ago:

        tar+gz        9090  52.0%
        git           5294  30.3%
        tar+xz        1184  06.8%
        tar+bz2        775  04.4%
        tar            393  02.2%
        zip            273  01.6%
        svn-multi      175  01.0%
        svn            125  00.7%
        file            51  00.3%
        computed        38  00.2%
        hg              36  00.2%
        unknown-uri     20  00.1%
        tar+gz?         15  00.1%
        tar+lz          13  00.1%
        tar+Z            4  00.0%
        cvs              3  00.0%
        bzr              3  00.0%
        tar+lzma         2  00.0%
        total        17494 100.0%

What is really missing is XZ and Bzip2 support in Disarchive, I guess.
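To put a number on that guess, here is a minimal sketch that adds up the tar+xz and tar+bz2 rows of the table above and expresses them as a share of all sources. The counts are copied from the table; the `share` helper is a hypothetical name introduced only for this illustration.

```shell
#!/bin/sh
# Counts copied from the table above.
total=17494
tar_xz=1184
tar_bz2=775

# Hypothetical helper: print COUNT as a percentage of TOTAL with one
# decimal.  LC_ALL=C keeps the decimal separator a dot.
share() {
    LC_ALL=C awk -v n="$1" -v t="$2" 'BEGIN { printf "%.1f\n", 100 * n / t }'
}

missing=$((tar_xz + tar_bz2))
echo "tar+xz and tar+bz2: $missing sources, $(share "$missing" "$total")% of the total"
```

By this count, the two formats together cover 1959 sources, about 11.2% of the total.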


> Where to go from here?  Timothy Sample had already set up a Disarchive
> database at <https://disarchive.ngyro.com>, which (guix download) uses
> as a fallback; I’m not sure exactly how it’s populated.  The goal here
> would be for the Guix project to set up infrastructure populating a
> database automatically and creating backups, possibly via SWH (we’ll
> have to discuss it with them).

Timothy was working on feeding the database from each release.  Well,
you can have a look at:

<https://git.ngyro.com/preservation-of-guix>

Then something along these lines:

    $ sqlite3 /tmp/pog.db < schema.sql
    $ guix repl -L . <(echo '
          (use-modules (pog))
          (ingest "6298c3ffd9654d3231a6f25390b056483e8f407c"
                  "/tmp/pog.db")
      ')

where the commit hash corresponds to v1.0.0.  I do not know if it
would be equivalent to run:

   guix time-machine --commit=6298c3ffd9654d3231a6f25390b056483e8f407c \
        -- build -m etc/disarchive-manifest.scm


> A plan we can already deploy would be:
>
>   1. Add the disarchive.guix.gnu.org DNS entry, pointing to berlin.
>
>   2. On berlin, add an mcron job that periodically copies the output of
>      the latest “disarchive-collection” build to a directory, say
>      /srv/disarchive.  Thus, the database would accumulate tarball
>      metadata over time.
>
>   3. Add an nginx route so that /srv/disarchive is served at
>      https://disarchive.guix.gnu.org.
>
>   4. Add disarchive.guix.gnu.org to (guix download).

To replace (or add to) the current ‘%disarchive-mirrors’, right?

Going down this road (using Cuirass), why not generate sources.json
the same way, instead of the hack using the website builder?
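For step 2 of the quoted plan, the periodic copy has to add new metadata without touching entries already in the database. Here is a minimal sketch of that accumulation, assuming the build output and /srv/disarchive are plain directory trees of files. The `accumulate` function is a hypothetical helper, not something that exists on berlin, and the real mcron job may well be written differently.

```shell
#!/bin/sh
# Hypothetical helper: merge the files of one build output into an
# accumulating directory, never overwriting entries already present.
# Usage: accumulate SRC DEST
accumulate() {
    src=$1
    dest=$2
    mkdir -p "$dest"
    # List regular files and symlinks, relative to SRC.
    (cd "$src" && find . \( -type f -o -type l \) -print) |
    while IFS= read -r path; do
        target="$dest/$path"
        if [ ! -e "$target" ]; then
            mkdir -p "$(dirname "$target")"
            # -L copies what a symlink points to, so the copy does not
            # depend on the store item staying alive.
            cp -L "$src/$path" "$target"
        fi
    done
}

# Example call (the store path is a placeholder, not a real output):
#   accumulate /gnu/store/...-disarchive-collection /srv/disarchive
```

Skipping existing files keeps the job idempotent, so running it after every “disarchive-collection” build only ever grows the database.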


On my side, I will try to resume what I started months ago: measuring
the SWH coverage.  For instance, of these ~92% of tarballs, how many are
currently stored in SWH?  Well, do not hold your breath, and I would be
happy if someone beats me to it. ;-)


Cheers,
simon

