Subject: [bug#33432] On tags
From: ludo@gnu.org (Ludovic Courtès)
To: 33432@debbugs.gnu.org
Date: Wed, 21 Nov 2018 11:15:52 +0100
Message-ID: <87muq2u46f.fsf@gnu.org>
In-Reply-To: <20181119161325.7801-1-ludo@gnu.org>
References: <20181119161325.7801-1-ludo@gnu.org>

Hello,

Ludovic Courtès wrote:

> When downloading over SWH, the ‘swh-download’ procedure first resolves
> the tag (if it’s a tag), then tries to download the corresponding tarball

Speaking of tags, it’s not news but tags are bad from a reproducibility
standpoint: they are mutable and per-repository.

Tag lookup is necessarily relative to a repository URL (and to a
snapshot of the repository, since it can be mutated):

  scheme@(guile-user)> (lookup-origin-revision "https://git.savannah.gnu.org/git/guix.git" "v0.15.0")
  $5 = #< id: "359fdda40f754bbf1b5dc261e7427b75463b59be" date:
  # directory: "27c69c5d298a43096a53affbf881e7b13f17bdcd"
  directory-url: "/api/1/directory/27c69c5d298a43096a53affbf881e7b13f17bdcd/">

So if, say, SWH archived a mirror of a repository but not the repository
itself, then tag lookup will fail, which is sad given that the code is
actually there.

To address this, possible options include:

  1. Always store commit IDs rather than tags, effectively giving us
     “normal” Git content-addressability.  This is not great for code
     readability and review though.

  2. Store ‘sha1_git’ hashes (SHA1s of Git trees) instead of, or in
     addition to, nar sha256 hashes so we can perform lookups by content
     hash on SWH or Git mirrors.

#2 might be the best long-term option, though it would require daemon
support to compute, store, and check these Git-style hashes.

Ludo’.
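
To make option #2 more concrete, here is a minimal sketch of how a Git-style
SHA1 is derived from content alone, which is why such a hash does not depend
on any repository URL or tag.  It is written in Python for brevity and is not
Guix, daemon, or SWH code: the names ‘git_object_sha1’ and ‘git_tree_sha1’ are
hypothetical, and the tree helper assumes a flat directory containing only
regular, non-executable files (mode 100644).

```python
import hashlib
from typing import Mapping


def git_object_sha1(obj_type: str, content: bytes) -> str:
    """Return the hex SHA1 Git assigns to CONTENT stored as OBJ_TYPE.

    Git hashes the header "<type> <size>\\0" followed by the raw content.
    """
    header = f"{obj_type} {len(content)}\0".encode("ascii")
    return hashlib.sha1(header + content).hexdigest()


def git_tree_sha1(files: Mapping[str, bytes]) -> str:
    """Return the Git tree SHA1 of a flat directory (hypothetical helper).

    Assumption: every entry is a regular file with mode 100644; nested
    directories, symlinks, and executables are not handled here.
    """
    entries = b""
    # Git orders tree entries by the bytes of their names.
    for name in sorted(files, key=lambda n: n.encode("utf-8")):
        blob_id = bytes.fromhex(git_object_sha1("blob", files[name]))
        # Each entry is "<mode> <name>\0" followed by the 20-byte blob ID.
        entries += b"100644 " + name.encode("utf-8") + b"\0" + blob_id
    return git_object_sha1("tree", entries)


if __name__ == "__main__":
    # The empty blob has the well-known ID e69de29bb2d1d6434b8b29ae775ad8c2e48c5391.
    print(git_object_sha1("blob", b""))
    print(git_tree_sha1({"README": b"hello\n", "COPYING": b"GPL\n"}))
```

Two checkouts with identical contents yield the same tree hash regardless of
where they were fetched from, which is the property a lookup by content hash
would rely on.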