From: Timothy Sample
To: Simon Tournier
Cc: guix-devel@gnu.org
Subject: Re: Preservation of Guix (PoG) report 2023-03-13
Date: Sat, 18 Mar 2023 14:35:40 -0600
Message-ID: <87o7oplrvn.fsf@ngyro.com>
In-Reply-To: <86356739hb.fsf@gmail.com> (Simon Tournier's message of "Tue, 14 Mar 2023 11:36:48 +0100")
References: <86356739hb.fsf@gmail.com>

Hey,

Simon Tournier writes:

> Well, I do not remember if you consider also the ’origin’
> (fixed-outputs) as ’inputs’ or ’patches’. Do you?

I’m quite confident I’m getting everything. I’ll describe my
approach, because I’m happy with it. :)

The Guix package graph exists twice, essentially. There’s the
high-level representation made up of packages, origins, gexps, etc.
Then, there is the low-level representation which is just derivations.
The high-level representation has nice metadata and makes sense to
humans, while the low-level representation is easy to traverse.

AFAICT, there’s no generic way to traverse the high-level
representation. Every lowerable object has complete control over how it
references other lowerable objects, and is not obliged to provide any
means of listing those references. That is, there’s no
‘lowerable-inputs’ procedure or anything like that. (We have
‘bag-node-edges’ in ‘(guix scripts graph)’, but it doesn’t cover
everything.)

What I do for the report is traverse (as best I can) the high-level
representation and construct a map from derivations to origin objects.
Then, I traverse the low-level representation to find all the
fixed-output derivations. Finally, I use the map to look up origin
objects for each fixed-output derivation. If I miss an origin object,
the fixed-output derivation still gets recorded. It will show up in the
report as “unknown” until I investigate why it’s missing and correct it.

There are currently 56 (out of 54K) fixed-output derivations that are
missing metadata in my database. A fair few of them have to do with
Telegram, Thunderbird, and UBlock Origin. All it means is that those
packages have sneaky ways of referencing origins that my code can’t
handle. It’s harmless and easy to fix as time permits.

>> Over the whole set, 77.1% are known to be safely tucked away in the
>> Software Heritage archive. But it’s actually much better than that. If
>> we only look at the most recent sampled commit (from Sunday the 5th),
>> that number becomes 87.4%, which is starting to look pretty good!
>
> Just to be point the new nixguix loader [1] is still in SWH staging and
> not yet deployed, IIRC. It will not change much the coverage on our
> side but it should be fix some corner-cases.
>
> 1:

Good to know!

>> This is kinda like an automated version of Simon’s recent
>> investigation.
>
> Neat! Note that I also wanted to check the SWH capacity for cooking,
> not only checking the end points. For instance, it allowed to discover
> mismatch due to uncovered CR/LF normalization; now fixed with:
> 58f20fa8181bdcd4269671e1d3cef1268947af3a.

Maybe we need a “chaos monkey mode” for Guix. It could randomly select
packages to build, randomly pick source code fallback methods, and also
test reproducibility (like “--check”). You could have a blocklist for
browsers, etc., but otherwise it could pick the odd package to test
thoroughly.
Those of us with the time and inclination could crank up that knob and
get interesting feedback about reproducibility at the cost of doing a
few package builds here and there.

>> Here’s a rough road map for that based on a glance at the script’s
>> output:
>>
>>     • Subversion support (for TeX-based documentation stuff, I guess)
>
> For the interested reader, details for helping in the implementation:
>
>     https://issues.guix.gnu.org/issue/43442#9
>     https://issues.guix.gnu.org/issue/43442#11

Fantastic. That looks very promising!

> However, it would ease all the dance if SWH would consider to store and
> expose NAR hashes on their side. As discussed here:
>
>     https://gitlab.softwareheritage.org/swh/meta/-/issues/4538

This would be nice, yes.

>> However, 42% of them are old Bioconductor packages. They
>> seem to be lost. It looks like Bioconductor now stores multiple package
>> versions per Bioconductor version [2], but before version 3.15 that was
>> not the case. As an example, take “ggcyto” from Bioconductor 3.10 [3].
>> We packaged version 1.14.0, and then at some point Bioconductor 3.10
>> switched to version 1.14.1. We packaged that, too, but now 1.14.0 is
>> gone.
>
> Well, I have not investigated much because it is between December 2019
> and March 2020 thus “guix time-machine” is not smooth for this old time.
>
> First question, does we have the source tarball in Berlin or Bordeaux or
> somewhere else? If yes, there is a hope. :-) Else, it is probably gone
> forever.

Like I wrote, I picked up a handful from Bordeaux, but not much.

> The hope is: https://git.bioconductor.org/packages/ggcyto
>
> If we have the tarball with the correct checksum from commit
> f5f440312d848e12463f0c6f7510a86b623a9e27
>
> +    (version "1.14.0")
> +    (source
> +     (origin
> +       (method url-fetch)
> +       (uri (bioconductor-uri "ggcyto" version))
> +       (sha256
> +        (base32
> +         "165qszvy5z176h1l3dnjb5dcm279b6bjl5n5gzz8wfn4xpn8anc8"))))
>
> then we can disassemble it and then using the Git repository, we can try
> to assemble the content from SWH and the meta from Disarchive DB.

I played around with this approach a bit, but it’s extremely tedious,
and I’m not hopeful it will work. Even if it does, it will be hard to
automate. I never fully tested the idea, just decided the effort was
too high for such a low probability of success. I’m putting these in
the “low priority” bin for now.

-- 
Tim
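
The two-pass approach described near the top of this message (map
derivations to origins from the high-level graph, then walk the
low-level graph for fixed-output derivations and look each one up) can
be sketched roughly as below. This is only an illustration in Python,
not the report's actual code and not Guix's API: the Origin, Package,
and Derivation classes, the function names, and the ".drv" strings are
all hypothetical stand-ins, and the real high-level traversal has to
cope with many more kinds of lowerable objects.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Origin:
    """Hypothetical stand-in for a high-level origin object."""
    name: str
    drv: str  # path of the derivation this origin lowers to


@dataclass
class Package:
    """Hypothetical stand-in for a package and the references we can list."""
    name: str
    origins: list = field(default_factory=list)
    inputs: list = field(default_factory=list)  # other Package objects


@dataclass
class Derivation:
    """Hypothetical stand-in for a node of the low-level graph."""
    path: str
    inputs: list = field(default_factory=list)  # paths of input derivations
    fixed_output: bool = False


def origins_by_derivation(packages):
    """Pass 1: walk the high-level graph, mapping derivation path -> origin."""
    mapping, seen, stack = {}, set(), list(packages)
    while stack:
        pkg = stack.pop()
        if pkg.name in seen:
            continue
        seen.add(pkg.name)
        for origin in pkg.origins:
            mapping[origin.drv] = origin
        stack.extend(pkg.inputs)
    return mapping


def fixed_output_derivations(store, roots):
    """Pass 2: walk the low-level graph, collecting fixed-output derivations."""
    found, seen, stack = [], set(), list(roots)
    while stack:
        path = stack.pop()
        if path in seen:
            continue
        seen.add(path)
        drv = store[path]
        if drv.fixed_output:
            found.append(path)
        stack.extend(drv.inputs)
    return sorted(found)


def report(packages, store, roots):
    """Pass 3: attach origin metadata, recording "unknown" when it is missing."""
    mapping = origins_by_derivation(packages)
    return {
        path: mapping[path].name if path in mapping else "unknown"
        for path in fixed_output_derivations(store, roots)
    }


# Toy data: one origin the high-level walk finds, and one it misses.
hello_src = Origin("hello-2.12.1.tar.gz", "hello-src.drv")
hello = Package("hello", origins=[hello_src])
store = {
    "hello.drv": Derivation("hello.drv",
                            inputs=["hello-src.drv", "sneaky-src.drv"]),
    "hello-src.drv": Derivation("hello-src.drv", fixed_output=True),
    "sneaky-src.drv": Derivation("sneaky-src.drv", fixed_output=True),
}
result = report([hello], store, ["hello.drv"])
print(result)
```

The point of keeping the two passes separate is that a gap in the
high-level walk never loses a fixed-output derivation: it is still
recorded by the low-level walk and simply carries the “unknown” label
until the mapping is extended.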