From: "Ludovic Courtès" <ludovic.courtes@inria.fr>
To: guix-devel <guix-devel@gnu.org>
Cc: guix-sysadmin@gnu.org, Timothy Sample <samplet@ngyro.com>,
	Simon Tournier <zimon.toutoune@gmail.com>
Subject: Disarchive database synchronization
Date: Tue, 14 Mar 2023 16:55:07 +0100
Message-ID: <877cvj2uqs.fsf@inria.fr>



Hello Guix!

As you may know, there are currently two different Disarchive databases:
the one at <https://disarchive.ngyro.com/> that Timothy Sample set up a
few years back, and the one at <https://disarchive.guix.gnu.org> that we
set up later, with a continuous integration job to populate it¹.

The database at ngyro.com has more historical metadata (metadata about
tarballs that older Guix revisions referred to) because Timothy worked
hard to populate it with tarballs from all the packages Guix has referred
to since version 1.0, which is crucial for long-term reproducibility.

Thanks to Timothy, I have now copied entries from disarchive.ngyro.com
to disarchive.guix.gnu.org.  The stats are as follows:

  disarchive.ngyro.com had 28,396 entries
  12,905 (45%) of them were missing from disarchive.guix.gnu.org
  15,491 (the remaining 55%) were present in both but differed
  3,444 entries of disarchive.guix.gnu.org were missing from disarchive.ngyro.com²

I copied over the 12,905 entries that were missing from
disarchive.guix.gnu.org.  (Note that there are currently only two copies
of the database: one at/in [bB]erlin, and one at/in [Bb]ordeaux.)
disarchive.guix.gnu.org now weighs in at 1.8 GiB for 31,839 entries.

For the remaining entries, it’s trickier.  Sometimes it’s just the
gzip compression parameters that differ, which could be addressed with a
little bit more work:

--8<---------------cut here---------------start------------->8---
$ file ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz
ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz:                         gzip compressed data, max compression, from Unix, original size modulo 2^32 446731
../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max speed, from Unix, original size modulo 2^32 446731
--8<---------------cut here---------------end--------------->8---
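
The "max compression" vs. "max speed" that `file` reports comes from the
XFL ("extra flags") byte of the gzip header, which RFC 1952 places at
offset 8: 2 means maximum compression, 4 means fastest.  Here is a
minimal sketch for triaging such entries, assuming the database files are
plain gzip files and that byte-identical decompressed content means both
sides hold the same spec.  `gzip_xfl` and `same_payload` are hypothetical
helpers, not part of Disarchive or Guix:

```shell
#!/bin/sh
# Print the compression hint recorded in the gzip header of each file.
# RFC 1952: the XFL byte sits at offset 8 (after ID1 ID2 CM FLG MTIME);
# with deflate, 2 means "maximum compression" and 4 means "fastest".
gzip_xfl () {
  for f in "$@"; do
    xfl=$(od -An -tu1 -j8 -N1 "$f" | tr -d ' \n')
    case $xfl in
      2) desc="max compression" ;;
      4) desc="max speed" ;;
      *) desc="XFL=$xfl" ;;
    esac
    printf '%s: %s\n' "$f" "$desc"
  done
}

# Succeed if two gzip files decompress to byte-identical content, i.e.
# they differ at most in their gzip wrapping (level, header fields).
same_payload () {
  a=$(mktemp) && b=$(mktemp) || return 2
  gzip -dc "$1" > "$a" && gzip -dc "$2" > "$b" && cmp -s "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}
```

For instance, `same_payload new.gz old.gz && echo "gzip wrapping only"`
would flag entries where only the compression parameters differ, which
is the kind of case that could be addressed with a little more work.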

Other times the specs themselves differ, as in this case where the
recorded names changed:



# diff -u <(gunzip -d < 0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz) <(gunzip -d < ../../disarchive/sha256/0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz)
--- /dev/fd/63  2023-03-14 16:13:21.635733426 +0100
+++ /dev/fd/62  2023-03-14 16:13:21.635733426 +0100
@@ -1,7 +1,7 @@
 (disarchive
   (version 0)
   (gzip-member
-    (name "webview-sys-0.6.2.tar.gz")
+    (name "rust-webview-sys-0.6.2.tar.gz")
     (digest
       (sha256
         "0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9"))
@@ -13,7 +13,7 @@
     (footer (crc 1807070134) (isize 121344))
     (compressor zlib-best)
     (input (tarball
-             (name "webview-sys-0.6.2.tar")
+             (name "rust-webview-sys-0.6.2.tar")
              (digest
                (sha256
                  "4fb18f3206838e11f7f8caba6fad9e0f796109428b502793b9f2f0613fe0f275"))
@@ -78,7 +78,7 @@
              (padding 0)
              (input (directory-ref
                       (version 0)
-                      (name "webview-sys-0.6.2")
+                      (name "rust-webview-sys-0.6.2")
                       (addresses
                         (swhid "swh:1:dir:fa41df38bf639ada28c900b0915661e787fe6d15"))
                       (digest
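
In the hunks above only the `name` fields differ; the digests shown as
context are unchanged.  A rough sketch for finding entries that differ
only that way, assuming (as in this diff) that each `(name "…")` form
sits on its own line.  It is a line filter, not a structural
S-expression comparison, and `same_except_names` is a hypothetical
helper:

```shell
#!/bin/sh
# Succeed if two gzip-compressed Disarchive specs are identical once every
# line holding a (name "...") form is dropped.  Assumes each such form is
# on its own line, as in the diff above; crude line filtering only.
same_except_names () {
  a=$(mktemp) && b=$(mktemp) || return 2
  gzip -dc "$1" | grep -v '^[[:space:]]*(name "' > "$a"
  gzip -dc "$2" | grep -v '^[[:space:]]*(name "' > "$b"
  cmp -s "$a" "$b"
  status=$?
  rm -f "$a" "$b"
  return "$status"
}
```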



As Tim pointed out, Disarchive disassembly is not fully deterministic
and/or might change a bit over time as Disarchive evolves, and that’s
probably what we’re seeing here.

The admins among us can see the remaining files in
/gnu/disarchive.ngyro.com on berlin.  That directory also contains two
files: ‘files-present-in-both-yet-different.txt’ and
‘files-that-were-missing.txt’.
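
Lists like these can be produced by comparing two local copies of the
database.  A minimal sketch, assuming both copies use the
DIR/sha256/HEX.gz layout suggested by the paths above; `compare_dbs` is
hypothetical and is not the script that produced the numbers in this
message:

```shell
#!/bin/sh
# Hypothetical sketch: compare two local Disarchive database copies, each
# assumed to be laid out as DIR/sha256/HEX.gz, write the two lists named
# above into OUTDIR, then print summary counts.
compare_dbs () {
  src=$1 dst=$2 out=$3
  s=$(mktemp) && d=$(mktemp) || return 2
  # Sorted entry names; comm(1) needs inputs sorted in the same locale.
  ls "$src/sha256" | LC_ALL=C sort > "$s"
  ls "$dst/sha256" | LC_ALL=C sort > "$d"

  # Entries of SRC that DST lacks entirely.
  LC_ALL=C comm -23 "$s" "$d" > "$out/files-that-were-missing.txt"

  # Entries present in both copies whose bytes differ.
  : > "$out/files-present-in-both-yet-different.txt"
  LC_ALL=C comm -12 "$s" "$d" | while read -r f; do
    cmp -s "$src/sha256/$f" "$dst/sha256/$f" ||
      echo "$f" >> "$out/files-present-in-both-yet-different.txt"
  done

  echo "source entries:        $(wc -l < "$s")"
  echo "missing from dest:     $(wc -l < "$out/files-that-were-missing.txt")"
  echo "in both yet different: $(wc -l < "$out/files-present-in-both-yet-different.txt")"
  echo "only in dest:          $(LC_ALL=C comm -13 "$s" "$d" | wc -l)"
  rm -f "$s" "$d"
}
```

Invoked as, say, `compare_dbs ngyro-copy guix-copy .`, it would write
both lists into the current directory.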

Kudos to Timothy for making it possible.

Feedback welcome!

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00060.html

² Some of these have appeared on disarchive.ngyro.com since I copied
  the database ~16h ago.  One example of a missing entry is
  “samplv1-0.9.24.tar.gz”:
  <https://disarchive.ngyro.com/sha256/ff0bfbaacfb514cb1a0194b0a43ca121f7679640a293f907fb1bbb2640d373b0>.


