From: Timothy Sample
To: zimoun
Cc: Mathieu Othacehe, 47336@debbugs.gnu.org
Subject: [bug#47336] Disarchive as a fallback for downloads
Date: Tue, 23 Mar 2021 10:31:15 -0400
Message-ID: <87sg4mt00c.fsf@ngyro.com>
In-Reply-To: <86sg4mnreu.fsf@gmail.com> (zimoun's message of "Tue, 23 Mar 2021 10:35:53 +0100")
References: <87eeg6o50b.fsf@ngyro.com> <86sg4mnreu.fsf@gmail.com>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/27.1 (gnu/linux)

Hi zimoun,

You make a lot of good points here.  Let me at least provide some quick
answers even if I’m not ready to comment on some of the bigger picture
stuff.

zimoun writes:

> (CC Mathieu to advice if it could be a feature of Cuirass.)

So far I have been using Cuirass with only a tiny patch.  I’m not sure
we need anything more than what Cuirass already provides.  (The tiny
patch is for allowing sorting the “latestbuilds” results by “stoptime”
and “id”.  This in turn allows paging through all the builds from the
API.)
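
To illustrate why a stable sort order makes paging possible, here is a
minimal sketch.  It assumes nothing about the real Cuirass API:
`fetch_page`, the record fields, and the in-memory `BUILDS` list are
hypothetical stand-ins, and it assumes the two keys are combined into a
single (stoptime, id) ordering.  Because that pair is unique per build,
a client can resume after the last build it saw without skipping or
repeating any.

```python
from typing import Dict, Iterator, List, Optional, Tuple

# Hypothetical stand-in for a build server's data; not Cuirass's schema.
BUILDS: List[Dict[str, int]] = [
    {"id": 3, "stoptime": 100},
    {"id": 1, "stoptime": 100},  # same stoptime as build 3: "id" breaks the tie
    {"id": 2, "stoptime": 90},
    {"id": 5, "stoptime": 120},
    {"id": 4, "stoptime": 110},
]

Key = Tuple[int, int]


def sort_key(build: Dict[str, int]) -> Key:
    """Total order over builds: stoptime first, id as the tie-breaker."""
    return (build["stoptime"], build["id"])


def fetch_page(after: Optional[Key], limit: int) -> List[Dict[str, int]]:
    """Hypothetical API call: up to `limit` builds strictly after `after`."""
    ordered = sorted(BUILDS, key=sort_key)
    if after is not None:
        ordered = [b for b in ordered if sort_key(b) > after]
    return ordered[:limit]


def all_builds(page_size: int = 2) -> Iterator[Dict[str, int]]:
    """Visit every build exactly once by resuming after the last key seen."""
    cursor: Optional[Key] = None
    while True:
        page = fetch_page(cursor, page_size)
        if not page:
            return
        yield from page
        cursor = sort_key(page[-1])


print([b["id"] for b in all_builds()])
```

Sorting by “stoptime” alone would not be enough in this sketch: builds 1
and 3 share a stop time, so a cursor on that value alone could skip or
repeat one of them at a page boundary.
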

> On Tue, 23 Mar 2021 at 00:42, Timothy Sample wrote:
>
>> Now you can ask Guix for a recent .tar.gz source package:
>>
>>     $ ./pre-inst-env guix build --no-substitutes -S python-httpretty
>
> Neat!  Now, there is a way to easily check the coverage, right?  Since
> SWH is ingesting the tarball using ,
> there is now a mean to report what Guix is able to rebuild.

I’m not sure I fully understand.  Disarchive covers about 4,300 Gzip’ed
tarballs (no XZ yet).  There are about 100 for which compression
parameters cannot be found, and a handful (about 5) that have a
particularly funny idea about what a tarball is.  The metadata builds
for my database started one week ago and have been continuously updating
since then.

Are you asking if we could check what SWH has?  Yes!  Each metadata file
contains the SWHID of the input directory.  You could use Disarchive to
get this value or a simple “grep swhid” would do it.  :)

    $ curl https://disarchive.ngyro.com/sha256/67989614004773db349791c37675efb914d084bdb221356a05e4369c35e7eb62 | grep swhid

It would be neat to have a big database of archive coverage from Guix
1.0 through to the present.  It’s quite a big project though.  Of
course, you know all about the SWH rate limit....

>> Checking httpretty-1.0.5 digest... ok
>
> What happens if it is not ok?

For that particular digest, it means the source directory is wrong.
Since we get the source from SWH, it means that the SWH archive is
wrong.  You will have to look elsewhere, I guess (this seems pretty
unlikely).  (There is a vanishing possibility that Disarchive
miscomputed the SWHID and managed to come up with a different, but still
valid SWHID....)

The other digest checks are more likely to fail.  They would indicate
that Disarchive no longer knows how to interpret the metadata.  Maybe
there will be a subtle bug in Disarchive 0.3.0 that causes this.  Either
use an old version of Disarchive or try to fix the current version.
:)  I worry about this, because it would be annoying, but the metadata
does have all the information needed to recover the original archive, so
nothing is really lost (except the user’s time).

>> Assembling the tarball httpretty-1.0.5.tar
>> Checking httpretty-1.0.5.tar digest... ok
>> Assembling the Gzip file httpretty-1.0.5.tar.gz
>> Checking httpretty-1.0.5.tar.gz digest... ok
>> Copying result to
>> /gnu/store/kbcnm57y2q1jvhvd8zw1g5vdiwlv19y9-httpretty-1.0.5.tar.gz
>
> Where is the assembly done?  In /tmp/, right?

Yes.

>> successfully built
>> /gnu/store/k0b3c7kgzyn1nlyhx192pcbcgbfnhnwa-httpretty-1.0.5.tar.gz.drv
>
> Just to be sure, when does Guix check the integrity checksum?  I mean,
> does Guix check the checksum after ’disassemble’ re-assembled the source?

Disarchive checks the result against the metadata to make sure it didn’t
make a mistake.  Guix also checks the final result to make sure the
fixed-output derivation is correct.

A fixed-output derivation is basically just a checksum with a hint about
how the data can be obtained.  Guix really only cares about the
checksum; the hint can do whatever it likes as long as it produces the
result Guix wants.  With this patch series, Disarchive is part of the
hint.

>> First, it looks up the metadata on my server.  This is fine for a demo,
>> but not what we want forever.  The patch series supports adding
>> several
>
> As we talked before, how does the database scale?  Do you have some
> numbers for the current demo?  In order to try to extrapolate what does
> it mean for a server to «store the metadata».

With “gzip -9”, the average metadata file is 6.8KiB.  It’s pretty
manageable.  There’s room for improvement on the Disarchive side, too.
It still stores some redundant information.  Uncompressed, it’s more
like 112KiB per file.  This is still pretty okay, really.  It means we
might hit tens of GiB over a couple of years.
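
As a rough check on those figures, the arithmetic can be sketched as
follows.  The per-file averages (6.8KiB compressed, 112KiB uncompressed)
are the ones quoted above; the file counts are illustrative assumptions,
not measurements.

```python
KIB_PER_GIB = 1024 * 1024

# Average metadata file sizes quoted in the message, in KiB.
COMPRESSED_KIB = 6.8
UNCOMPRESSED_KIB = 112


def total_gib(file_count: int, kib_per_file: float) -> float:
    """Total size in GiB of `file_count` files of the given average size."""
    return file_count * kib_per_file / KIB_PER_GIB


# Illustrative file counts (assumptions): roughly the coverage mentioned
# earlier in the message, and a round one million.
for count in (4_300, 1_000_000):
    print(
        f"{count:>9,} files: "
        f"{total_gib(count, COMPRESSED_KIB):7.2f} GiB compressed, "
        f"{total_gib(count, UNCOMPRESSED_KIB):7.2f} GiB uncompressed"
    )
```

For a million files this works out to about 107GiB uncompressed and
about 6.5GiB with “gzip -9”, a ratio of roughly 16 to 1.
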
(It would take just over 100GiB to store a million uncompressed metadata
files.)  The compression ratio is what drove me to skip Git for now.

>> mirrors for looking up the metadata.  In the past, we talked about
>> putting everything on one or a few of the big Git hosting platforms like
>> GitHub or Gitlab.  That way, it would be easily picked up by SWH and
>> archived “forever”.  Right now, I have Cuirass set up to build the
>> metadata, and a little script that moves it from the build server to my
>> Web server.  It would be simple enough to adjust that script to push it
>> to a remote Git repo.  (Of course, the next step is to move this setup
>> to Guix infrastructure.)  Thoughts?
>
> Maybe this database could be a package, say “guix-tarball-db”, updated
> in agreement with the package “guix”.  The source of this
> “guix-tarball-db” would be a remote big Git hosting platforms like
> GitHub or whatever and not stored on Guix infrastructure, or maybe
> stored on Guix infra.
>
> Regularly, i.e., when the package “guix” is updated, in the same time,
> the package “guix-tarball-db” is updated too.  The “guix lint -c
> archival” sends the saving request to SWH.  Even if this saving request
> should be automated soon. :-)
>
> Then if Cuirass would have a feature to disassemble and update the Git
> repo.
>
> Last, a service should run as your demo.  But for long-term, this
> service could disappear––assuming SWH not :-).  Therefore, we could
> imagine installing “guix-tarball-db” then tweak some parameters of the
> guix-daemon and “guix build ”.  Both installing and building would
> fetch from SWH if both upstream disappear.
>
> Or this “guix-tarball-db” should not be a plain package but only an
> input as origin for the package “guix”.

This is an interesting idea, but one that I would have to think about
more.  :)


-- Tim