unofficial mirror of guix-devel@gnu.org 
* Disarchive database synchronization
@ 2023-03-14 15:55 Ludovic Courtès
  2023-03-18 19:49 ` Timothy Sample
From: Ludovic Courtès @ 2023-03-14 15:55 UTC (permalink / raw)
  To: guix-devel; +Cc: guix-sysadmin, Timothy Sample, Simon Tournier

Hello Guix!

As you may know, there are currently two different Disarchive databases:
the one at <https://disarchive.ngyro.com/> that Timothy Sample set up a
few years back, and the one at <https://disarchive.guix.gnu.org> that we
set up later, with a continuous integration job to populate it¹.

The database at ngyro.com has more historical metadata (metadata about
tarballs that older Guix revisions referred to) because Timothy worked
hard to populate it with tarballs from all the packages Guix refers to
starting from 1.0—which is crucial for long-term reproducibility.

Thanks to Timothy, I have now copied over things from
disarchive.ngyro.com to disarchive.guix.gnu.org.  The stats are as
follows:

  disarchive.ngyro.com had 28,396 entries
  12,905 (45%) entries were missing from disarchive.guix.gnu.org
  15,491 (the remaining 55%) entries were present in both, yet different
  3,444 entries of disarchive.guix.gnu.org were missing from disarchive.ngyro.com²
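
For illustration only, here is a minimal Python sketch of how two local
copies of the databases could be classified into the categories above.
It is not the procedure that was actually used; the function name
`compare_databases` and the report keys are hypothetical, and it assumes
both databases are plain directory trees of files compared byte for
byte.

```python
import filecmp
from pathlib import Path


def compare_databases(source: Path, target: Path) -> dict[str, list[str]]:
    """Classify the files of SOURCE relative to TARGET by relative name.

    Hypothetical helper: both arguments are assumed to be local directory
    trees holding one file per database entry.
    """
    def entries(root: Path) -> set[str]:
        # Relative POSIX-style names of all regular files under ROOT.
        return {p.relative_to(root).as_posix()
                for p in root.rglob("*") if p.is_file()}

    source_names = entries(source)
    target_names = entries(target)

    report: dict[str, list[str]] = {
        "missing-from-target": [],
        "identical": [],
        "different": [],
        "missing-from-source": sorted(target_names - source_names),
    }
    for name in sorted(source_names):
        if name not in target_names:
            report["missing-from-target"].append(name)
        elif filecmp.cmp(source / name, target / name, shallow=False):
            # shallow=False forces a byte-for-byte content comparison.
            report["identical"].append(name)
        else:
            report["different"].append(name)
    return report
```

Note that a byte-for-byte comparison counts two entries as different
even when only their compression differs.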

I copied over the 12,905 entries that were missing from
disarchive.guix.gnu.org.  (Note that there are currently only two copies
of the database: one at/in [bB]erlin, and one at/in [Bb]ordeaux.)
disarchive.guix.gnu.org now weighs in at 1.8 GiB for 31,839 entries.

For the remaining entries, it’s trickier.  Sometimes it’s just the
gzip compression parameters that differ, which could be addressed with a
little bit more work:

--8<---------------cut here---------------start------------->8---
$ file ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz ../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz
ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz:                         gzip compressed data, max compression, from Unix, original size modulo 2^32 446731
../../disarchive/sha256/ffdc77f5e5cb2390b9309de63eb7be68d9fe631e898f4da6c04a8159daefc2c0.gz: gzip compressed data, max speed, from Unix, original size modulo 2^32 446731
--8<---------------cut here---------------end--------------->8---
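
As a rough sketch of what that extra work might look like (an
assumption, not something that has been implemented), the Python below
reads the XFL byte of the gzip header, which is what ‘file’ reports as
“max compression” or “max speed”, and checks whether two files differ
only in how they were compressed.  The function names are hypothetical.

```python
import gzip
import struct

# Values of the XFL header byte defined for the deflate method (RFC 1952).
XFL_MEANING = {2: "max compression", 4: "max speed"}


def gzip_header_info(data: bytes) -> dict:
    """Decode the fixed 10-byte gzip member header found at the start of DATA."""
    if len(data) < 10 or data[:2] != b"\x1f\x8b":
        raise ValueError("not a gzip stream")
    # Layout: ID1 ID2, CM, FLG, MTIME (little-endian 32 bits), XFL, OS.
    _magic, method, flags, mtime, xfl, os_code = struct.unpack(
        "<2sBBIBB", data[:10])
    return {
        "method": method,
        "flags": flags,
        "mtime": mtime,
        "xfl": XFL_MEANING.get(xfl, f"other ({xfl})"),
        "os": os_code,
    }


def differ_only_in_compression(a: bytes, b: bytes) -> bool:
    """True if A and B are different gzip streams with identical contents."""
    return a != b and gzip.decompress(a) == gzip.decompress(b)
```

Entries for which `differ_only_in_compression` returns true carry the
same metadata, so either copy could be kept.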

Sometimes it’s trickier:

# diff -u <(gunzip -d < 0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz) <(gunzip -d < ../../disarchive/sha256/0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9.gz)
--- /dev/fd/63  2023-03-14 16:13:21.635733426 +0100
+++ /dev/fd/62  2023-03-14 16:13:21.635733426 +0100
@@ -1,7 +1,7 @@
 (disarchive
   (version 0)
   (gzip-member
-    (name "webview-sys-0.6.2.tar.gz")
+    (name "rust-webview-sys-0.6.2.tar.gz")
     (digest
       (sha256
         "0001f025c1425ffe36270a81cb091eade87dd8d29ac773735ae47e1a8c8066c9"))
@@ -13,7 +13,7 @@
     (footer (crc 1807070134) (isize 121344))
     (compressor zlib-best)
     (input (tarball
-             (name "webview-sys-0.6.2.tar")
+             (name "rust-webview-sys-0.6.2.tar")
              (digest
                (sha256
                  "4fb18f3206838e11f7f8caba6fad9e0f796109428b502793b9f2f0613fe0f275"))
@@ -78,7 +78,7 @@
              (padding 0)
              (input (directory-ref
                       (version 0)
-                      (name "webview-sys-0.6.2")
+                      (name "rust-webview-sys-0.6.2")
                       (addresses
                         (swhid "swh:1:dir:fa41df38bf639ada28c900b0915661e787fe6d15"))
                       (digest

As Tim pointed out, Disarchive disassembly is not fully deterministic
and/or might change a bit over time as Disarchive evolves, and that’s
probably what we’re seeing here.
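
One way to triage such cases, sketched here in Python under loud
assumptions, is to blank out every `(name "...")` field in the
decompressed text before comparing.  The regular expression is a
text-level shortcut rather than a real S-expression parser (it would
mishandle names containing escaped double quotes), and the function
names are hypothetical.

```python
import re

# Matches fields such as (name "webview-sys-0.6.2.tar.gz").
# Assumption: names never contain an escaped double quote.
NAME_FIELD = re.compile(r'\(name\s+"[^"]*"\)')


def strip_names(spec_text: str) -> str:
    """Return SPEC_TEXT with the contents of every name field blanked out."""
    return NAME_FIELD.sub('(name "")', spec_text)


def differ_only_in_names(a: str, b: str) -> bool:
    """True if A and B differ, but only inside their name fields."""
    return a != b and strip_names(a) == strip_names(b)
```

Applied to the two decompressed files in the diff above, as far as the
excerpt shows, `differ_only_in_names` would return true: the hunks
shown differ only in the “rust-” prefix of the names.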

The admins among us can see the remaining files in
/gnu/disarchive.ngyro.com on berlin.  That directory also contains two
files: ‘files-present-in-both-yet-different.txt’ and
‘files-that-were-missing.txt’.

Kudos to Timothy for making it possible.

Feedback welcome!

Ludo’.

¹ https://lists.gnu.org/archive/html/guix-devel/2021-10/msg00060.html

² Some of these have shown up at disarchive.ngyro.com since I copied the
  database ~16h ago.  An example of a missing entry is “samplv1-0.9.24.tar.gz”:
  <https://disarchive.ngyro.com/sha256/ff0bfbaacfb514cb1a0194b0a43ca121f7679640a293f907fb1bbb2640d373b0>.
