From: Timothy Sample
To: guix-devel@gnu.org
Subject: Preservation of Guix (PoG) report 2023-03-13
Date: Mon, 13 Mar 2023 19:37:23 -0600
Message-ID: <87r0tsm7u4.fsf@ngyro.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
MIME-Version: 1.0
Content-Type: multipart/mixed; boundary="=-=-="
List-Id: "Development of GNU Guix and the GNU System distribution."

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit

Hi Guix,

It’s been a while! :)

Allow me to present to you a long-overdue update to the Preservation of
Guix (PoG) report: . 🎉

Note that you can link to the most recent version of the report using .

What is this? Well, I added a description to the report itself, but
here’s a brief teaser. The PoG report shows what we know about the
archival status of the approximately 54K sources (and counting) Guix has
linked to since around the time of the 1.0 release.

For this edition, I took a bit of time to fix the contrast and colours
to be a bit more accessible. They’re about half as garish as they used
to be, too.

Over the whole set, 77.1% are known to be safely tucked away in the
Software Heritage archive. But it’s actually much better than that. If
we only look at the most recent sampled commit (from Sunday the 5th),
that number becomes 87.4%, which is starting to look pretty good!

I have a few more notes on the report, but I want to put this near the
top of the message so that people will see it. :) I wrote a script
(see attached) that uses the PoG database to find missing sources on a
package-by-package basis. That is, you can run

    guix repl specification-to-swhids.scm pog.db bash

and it will print a table of all of the transitive sources needed to
build Bash, along with their preservation status. Here’s a (heavily
edited and snipped to fit an email message) sample of its output:

    [... many “stored” inputs]
    sha256  0r5p.  swh:1:dir:02f7.  stored   /gnu/store/.-gmp-6.0.0a.tar.xz
    sha256  0c3k.  swh:1:dir:6027.  stored   /gnu/store/.-mescc-....tar.xz
    sha256  1r1z.  swh:1:dir:6087.  stored   /gnu/store/.-bash-2.05b.tar.gz
    sha256  14l0.  unknown          unknown  /gnu/store/.-gcc-4.9.4.tar.bz2
    sha256  0m2y.  unknown          unknown  /gnu/store/.-ed-1.17.tar.lz
    [... more “unknown” inputs]

(I had to pipe the output to “sort -k 4” to have it sorted by status.)

The first two columns are the Guix hash. The next two columns are the
SWHID (if known) and whether SWH has it (if known). The last column is
the store filename (which is nice because it usually tells you what it
is we are looking at). In this sample, you can see that GMP, MesCC
Tools, and Bash are all safe. However, we don’t know about GCC 4 and
ed.

This is kinda like an automated version of Simon’s recent investigation
[1]. The two “unknown” entries are due to Disarchive’s lack of support
for those compression formats. I just wrote this script today (mind the
rough edges), and I’ve learned a lot from trying it on a few packages.
It’s a little like a terrifying robotic TODO list, since it shows a lot
of problems, but it’s also exciting because solving all the problems for
the Guix package, say, would be a massive leap forward. Here’s a rough
road map for that based on a glance at the script’s output:

    • Subversion support (for TeX-based documentation stuff, I guess)
    • bzip2 support for Disarchive (there are 45 bzip2 tarballs)
    • ZIP support for Disarchive (for the 8 ZIP files)
    • lzip support for Disarchive (or a workaround for ed)
    • Fix some issues (gettext is .tar.gz, but something went wrong)
    • Do something with the static bootstrap binaries

[1] https://lists.gnu.org/archive/html/guix-devel/2023-02/msg00398.html

If you want to try it out for yourself, you’ll need to download the
database . Heads up: it’s just over 200M, and my server can be pretty
slow.
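
A side note for readers who want to see the lookup logic without
installing anything: the following is a minimal sketch in Python, not
part of the attached script. The table name "fods" and the columns
"algorithm", "hash", "swhid" and "is_in_swh" are taken from the query in
the script. The column types, the placeholder rows, and the way "not in
SWH" is encoded (0 here) are assumptions made only for illustration, and
the real database may look different.

```python
import sqlite3

# Table and column names come from the query in the attached script.
# Column types and the rows below are made-up placeholders, not PoG data.
SCHEMA = """
CREATE TABLE fods (
    algorithm TEXT,
    hash      TEXT,
    swhid     TEXT,
    is_in_swh INTEGER
)
"""

PLACEHOLDER_ROWS = [
    ("sha256", "hash-a", "swh:1:dir:placeholder-a", 1),
    ("sha256", "hash-b", "swh:1:dir:placeholder-b", 0),  # assumed encoding
    ("sha256", "hash-c", None, None),
]


def make_demo_db():
    """Build a throwaway in-memory database shaped like the query expects."""
    db = sqlite3.connect(":memory:")
    db.execute(SCHEMA)
    db.executemany("INSERT INTO fods VALUES (?, ?, ?, ?)", PLACEHOLDER_ROWS)
    return db


def status(swhid, is_in_swh):
    """Three-way status, checked in the same order as print-source."""
    if is_in_swh:
        return "stored"
    if swhid is not None:
        return "missing"
    return "unknown"


def lookup(db, algorithm, hash_):
    """Return (swhid-or-'unknown', status), or None if the hash has no row."""
    row = db.execute(
        "SELECT swhid, is_in_swh FROM fods WHERE algorithm = ? AND hash = ?",
        (algorithm, hash_),
    ).fetchone()
    if row is None:
        return None
    swhid, is_in_swh = row
    return (swhid or "unknown", status(swhid, is_in_swh))


if __name__ == "__main__":
    db = make_demo_db()
    for h in ("hash-a", "hash-b", "hash-c", "hash-d"):
        print(h, lookup(db, "sha256", h))
```

Returning None for a hash with no row is a choice made for this sketch.
It keeps "the database has never heard of this hash" separate from "the
database has a row, but no SWHID".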

One other stray thought: the script should work with the time machine,
so you can check on packages from the past. I didn’t test it, but I bet
it’s fine.

Okay. Here are the rest of my notes about the report itself.

One thing that jumps out at me is 189 Git sources that SWH does not
have. Usually they have basically all of the non-recursive Git sources.
It’s something to look into.

I also took a quick peek at the 1.9K “unknown” tar-gz sources. About
39% of them are old Rust crates. It’s a known problem with Disarchive.
However, 42% of them are old Bioconductor packages. They seem to be
lost. It looks like Bioconductor now stores multiple package versions
per Bioconductor version [2], but before version 3.15 that was not the
case. As an example, take “ggcyto” from Bioconductor 3.10 [3]. We
packaged version 1.14.0, and then at some point Bioconductor 3.10
switched to version 1.14.1. We packaged that, too, but now 1.14.0 is
gone. I know it’s been discussed before, but I can’t remember what the
conclusion was. Are these just gone forever? I’m doing another pass
through all of them and recovering a few from the bordeaux substitute
server, but only a handful.

[2] https://bioconductor.org/packages/3.15/bioc/src/contrib/Archive/DiffBind/
[3] https://bioconductor.org/packages/3.10/bioc/html/ggcyto.html

That’s all for now. Enjoy the update and the script!

-- Tim

--=-=-=
Content-Type: text/plain; charset=utf-8
Content-Disposition: attachment; filename=specification-to-swhids.scm
Content-Transfer-Encoding: 8bit

;;; specification-to-swhids.scm
;;; Copyright © 2023 Timothy Sample
;;;
;;; This program is free software: you can redistribute it and/or modify
;;; it under the terms of the GNU General Public License as published by
;;; the Free Software Foundation, either version 3 of the License, or (at
;;; your option) any later version.
;;;
;;; This program is distributed in the hope that it will be useful, but
;;; WITHOUT ANY WARRANTY; without even the implied warranty of
;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;;; General Public License for more details.
;;;
;;; You should have received a copy of the GNU General Public License
;;; along with this program. If not, see .

(use-modules (gnu packages)
             (guix base32)
             (guix derivations)
             (guix gexp)
             (guix monads)
             (guix store)
             (ice-9 format)
             (ice-9 getopt-long)
             (ice-9 match)
             (sqlite3)
             (srfi srfi-9 gnu))

;;; Database stuff

(define (call-with-sqlite-db filename proc)
  "Open the SQLite database at FILENAME and pass the resulting connection
to PROC. The connection will only be open during the dynamic extent of
PROC. If that dynamic extent is re-entered (using a continuation, say),
the database connection will be re-established."
  (let ((db #f))
    (dynamic-wind
      (lambda ()
        (set! db (sqlite-open filename)))
      (lambda ()
        (proc db))
      (lambda ()
        (sqlite-close db)
        (set! db #f)))))

(define (database-lookup db query params converter)
  "Using the SQLite database connection DB, run QUERY with PARAMS, and map
CONVERTER over the resulting rows."
  (let* ((stmt (sqlite-prepare db query))
         (_ (unless (null? params)
              (apply sqlite-bind-arguments stmt params)))
         (result (sqlite-fold (lambda (x acc) (cons (converter x) acc))
                              '() stmt)))
    (sqlite-finalize stmt)
    result))

(define (lookup-swh-status db algorithm hash)
  "Using the SQLite database connection DB, look up the SWHID of the
fixed-output derivation with the ALGORITHM-computed checksum HASH. Here,
both ALGORITHM and HASH are strings, the latter being the Nix base-32
representation of the hash value."
  (define query "\
SELECT swhid, is_in_swh
FROM fods
WHERE algorithm = ?
  AND hash = ?")
  (define (converter row) row)
  ;; Use the first matching row.  This assumes the hash is present in
  ;; the database: an empty result makes 'car' signal an error.
  (and=> (database-lookup db query (list algorithm hash) converter)
         car))

;;; Guix stuff

(define (derivation-transitive-fixed-output-inputs drv)
  "Compute the list of all fixed-output derivations in the transitive
inputs of the derivation DRV."
  (define seen (make-hash-table))
  (define fod-hashes (make-hash-table))
  (define (seen? drv) (hashq-ref seen drv))
  (let loop ((queue (list drv)))
    (match queue
      (() (hash-map->list cons fod-hashes))
      ((drv . rest)
       (hashq-set! seen drv #t)
       (when (fixed-output-derivation? drv)
         (let* ((out (assoc-ref (derivation-outputs drv) "out"))
                (algo (derivation-output-hash-algo out))
                (hash (derivation-output-hash out))
                (filename (derivation-output-path out)))
           (hash-set! fod-hashes (cons algo hash) filename)))
       (loop (append (filter (negate seen?)
                             (map derivation-input-derivation
                                  (derivation-inputs drv)))
                     rest))))))

(define (lookup-object-hashes obj)
  "Get the list of Guix hashes needed for the lowerable object OBJ."
  (let ((drv (run-with-store (open-connection) (lower-object obj))))
    (derivation-transitive-fixed-output-inputs drv)))

;;; Glue

(define-immutable-record-type <source>
  (make-source algorithm hash filename swhid in-swh?)
  source?
  (algorithm source-algorithm)
  (hash source-hash)
  (filename source-filename)
  (swhid source-swhid)
  (in-swh? source-in-swh?))

(define (guix-hash->source db hash-obj)
  "Using the SQLite database connection DB, convert HASH-OBJ to a source
record. HASH-OBJ should be a result from 'lookup-object-hashes'."
  (match-let* ((((algorithm . hash) . filename) hash-obj)
               (algorithm (symbol->string algorithm))
               (hash (bytevector->nix-base32-string hash))
               (#(swhid in-swh?) (lookup-swh-status db algorithm hash)))
    (make-source algorithm hash filename swhid in-swh?)))

(define (object-sources db obj)
  "Using the SQLite database connection DB, get the list of source records
for the lowerable object OBJ."
  (let ((hashes (lookup-object-hashes obj)))
    (map (lambda (hash)
           (guix-hash->source db hash))
         hashes)))

;; Shell interface

(define (print-source src)
  (format #t "~a\t~a\t~50a\t~a\t~a~%"
          (source-algorithm src)
          (source-hash src)
          (or (source-swhid src) "unknown")
          (cond
           ((source-in-swh? src) "stored")
           ((source-swhid src) "missing")
           (else "unknown"))
          (source-filename src)))

(define version "2023-03-13-0")

(define version-message
  (string-append "\
specification-to-swhids.scm " version "
"))

(define help-message "\
Usage: guix repl specification-to-swhids.scm DB-FILENAME SPECIFICATION
Print a table of Guix hashes, SWHIDs, and store filenames for the Guix
package SPECIFICATION using the Preservation of Guix database at
DB-FILENAME.

See .
")

(define options-grammar
  `((help (single-char #\h))
    (version (single-char #\V))))

(define (main args)
  (let ((options (getopt-long args options-grammar)))
    (when (option-ref options 'help #f)
      (display help-message)
      (exit EXIT_SUCCESS))
    (when (option-ref options 'version #f)
      (display version-message)
      (exit EXIT_SUCCESS))
    (match (option-ref options '() #f)
      ((db-filename specification)
       (for-each print-source
                 (let ((obj (specification->package specification)))
                   (call-with-sqlite-db db-filename
                     (lambda (db)
                       (object-sources db obj)))))
       (exit EXIT_SUCCESS))
      (_ (display help-message)
         (exit EXIT_FAILURE)))))

(main (command-line))

--=-=-=--
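
For readers less used to Scheme, the worklist traversal in
'derivation-transitive-fixed-output-inputs' can be sketched in Python
roughly as follows. This is an illustration under stated assumptions,
not a port of the script: the graph is a plain dictionary, and all node
names, hashes and filenames are hypothetical placeholders. It also
differs from the script in one small way. It checks the "seen" set again
when a node is taken off the queue, so a node that was queued twice is
only processed once.

```python
def transitive_fixed_outputs(root, inputs, fixed_outputs):
    """Collect {(algorithm, hash): filename} for every fixed-output node
    reachable from root, processing each node at most once."""
    seen = set()
    found = {}
    queue = [root]
    while queue:
        node = queue.pop(0)
        if node in seen:
            continue  # already handled via another path
        seen.add(node)
        if node in fixed_outputs:
            algorithm, digest, filename = fixed_outputs[node]
            # Keyed by (algorithm, hash), so duplicates collapse.
            found[(algorithm, digest)] = filename
        # Put the unseen inputs at the front of the queue, as the script does.
        queue = [n for n in inputs.get(node, ()) if n not in seen] + queue
    return found


# Hypothetical toy graph; none of these names come from Guix.
INPUTS = {
    "pkg.drv": ["pkg-src.drv", "tool.drv"],
    "tool.drv": ["tool-src.drv", "pkg-src.drv"],
}
FIXED_OUTPUTS = {
    "pkg-src.drv": ("sha256", "hash-a", "pkg-1.0.tar.gz"),
    "tool-src.drv": ("sha256", "hash-b", "tool-2.0.tar.xz"),
}

if __name__ == "__main__":
    for key, filename in transitive_fixed_outputs(
        "pkg.drv", INPUTS, FIXED_OUTPUTS
    ).items():
        print(key, filename)
```

Keying the result by (algorithm, hash) rather than by node means two
derivations that fetch the same content count as one source, which
matches what the script's 'fod-hashes' table does.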