From: Timothy Sample
To: guix-devel@gnu.org
Subject: Preservation of Guix Report
Date: Wed, 20 Oct 2021 15:48:07 -0400
Hi everyone!

Early this summer I did a bunch of work trying to figure out which Guix sources are preserved by the Software Heritage (SWH) archive. I’m finally ready to share some preliminary results!

  https://ngyro.com/pog-reports/2021-10-20/

This report is already quite outdated, though. It only covers commits up to the end of May, and the sources were checked against the SWH archive sometime in June. I’m sharing it now to avoid any further delays.

What’s cool is that the report is automated. Next on my list is to update the database and generate a new report. Then we can compare the results and see whether we are improving. (My read on the results so far is that improving “sources.json” will yield big improvements, but we might not be able to get to that before the next report.)

The report itself provides only a very high-level overview. If you want to check on specifics, you will have to download the database. There’s a link to it at the bottom of the report, along with a link to a detailed schema definition. Anyone interested in making some sense of the 5,043 known missing sources is encouraged to look there.

However, I can say from my own investigation that a lot of them are kinda boring. For instance, 3,435 are from crates.io, CRAN, Hackage, Bioconductor, and CPAN:

    select count(*) from fods
      join fod_references using (fod_id)
      where not is_in_swh
        and (reference like '%crates.io%'
             or reference like '%/cran/%'
             or reference like '%hackage%'
             or reference like '%/bioconductor.%'
             or reference like '%/cpan/%');
    => 3435

It’s surprising to me that SWH is not already getting these from “sources.json”. I picked an arbitrary one, “rust-quote-0.6”, and it’s simply not in “sources.json”. On the other hand, I bet SWH would like a crates.io (and CRAN, etc.) loader, too.
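For anyone who wants to split the 3,435 figure by ecosystem, here is a minimal sketch using Python’s standard sqlite3 module. It assumes the downloaded database is an SQLite file and that the `fods` table (presumably fixed-output derivations) and the `fod_references` table have the `fod_id`, `is_in_swh`, and `reference` columns the query above uses. The toy rows and the `make_toy_db` and `missing_by_ecosystem` helpers are hypothetical, made up for illustration, and not taken from the real database.

```python
import sqlite3

# LIKE patterns copied from the query above; the labels are our own.
ECOSYSTEM_PATTERNS = {
    "crates.io": "%crates.io%",
    "CRAN": "%/cran/%",
    "Hackage": "%hackage%",
    "Bioconductor": "%/bioconductor.%",
    "CPAN": "%/cpan/%",
}

# Same join and filter as the query above, one pattern at a time.
# Like the original, this counts joined rows, not distinct FODs.
QUERY = """
select count(*) from fods
  join fod_references using (fod_id)
  where not is_in_swh
    and reference like ?
"""


def missing_by_ecosystem(conn):
    """Return {label: count} of references not (yet) found in SWH."""
    return {
        label: conn.execute(QUERY, (pattern,)).fetchone()[0]
        for label, pattern in ECOSYSTEM_PATTERNS.items()
    }


def make_toy_db():
    """Build an in-memory database with made-up rows (hypothetical data).

    Only the tables and columns the query touches are created; this is
    NOT the real schema, which is linked from the bottom of the report.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("create table fods (fod_id integer primary key, is_in_swh integer)")
    conn.execute("create table fod_references (fod_id integer, reference text)")
    rows = [
        (1, 0, "https://crates.io/api/v1/crates/quote/0.6.13/download"),
        (2, 1, "https://crates.io/api/v1/crates/syn/1.0.0/download"),  # already in SWH
        (3, 0, "mirror://cran/src/contrib/foo_1.0.tar.gz"),
        (4, 0, "https://hackage.haskell.org/package/foo-1.0/foo-1.0.tar.gz"),
        (5, 0, "https://bioconductor.org/packages/release/bioc/src/contrib/foo_1.0.tar.gz"),
        (6, 0, "mirror://cpan/authors/id/X/XX/Foo-1.0.tar.gz"),
        (7, 0, '(git-reference (url "https://example.org/foo.git") (commit "v1.0"))'),
    ]
    conn.executemany("insert into fods values (?, ?)", [(i, s) for i, s, _ in rows])
    conn.executemany("insert into fod_references values (?, ?)", [(i, r) for i, _, r in rows])
    return conn


if __name__ == "__main__":
    for label, n in missing_by_ecosystem(make_toy_db()).items():
        print(f"{label:>12}  {n}")
```

To try it on the real data, pass `sqlite3.connect(path)` for the downloaded file instead of `make_toy_db()`. Because each pattern is counted separately, the per-ecosystem numbers add up to the combined total only if no reference matches more than one pattern.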
Another, more interesting approach might be to check Git sources:

    select count(*) from fods
      join fod_references using (fod_id)
      where not is_in_swh
        and reference like '(git-reference%';
    => 336

There are fewer of these, but they might be more interesting. Just be sure to check that they haven’t made it into the SWH archive since June. For instance, I just checked “asciidoc@9.1.0” and learned that the database has “NOT is_in_swh” for it, but it is now in the SWH archive. So, caveat emptor, I guess. Maybe it would be wise to wait for a more recent report before diving in.

Another way to help would be to suggest improvements to the report. I don’t want to fiddle with it too much, but if there is some simple graph, table, or list that should be there, I’m happy to give it a go.

-- 
Tim