From: Liliana Marie Prikler
To: Timothy Sample
Cc: guix-devel@gnu.org
Subject: Re: On raw strings in commit field
Date: Sat, 01 Jan 2022 20:52:25 +0100
In-Reply-To: <874k6nqrhs.fsf@ngyro.com>
References: <6e451a878b749d4afb6eede9b476e5faabb0d609.camel@gmail.com> <87k0fm7v3k.fsf@netris.org> <871r1smdu6.fsf@netris.org> <874k6nqrhs.fsf@ngyro.com>
List-Id: "Development of GNU Guix and the GNU System distribution."

Hi Timothy,

On Saturday, 01.01.2022 at 12:45 -0500, Timothy Sample wrote:
> If you want a concrete example to think through, there’s ‘eclib’.
> Our package says it’s version “20190909”, but that’s not what
> upstream calls version “20190909”.  It looks like when we packaged
> ‘eclib’, that tag pointed to commit
> 19e7e3e74268bf78bd9a1c4ba07597d5434fb166, but now
> it points to bfbbd7c414521e1bf5e718a2925ea8ad845a2e87.
>
> If you try to build ‘eclib’, everything will work great, since we can
> grab the checkout from our servers.  If you use
>
>     $ guix build --check -S eclib
>
> you get a hash mismatch.  We have CI jobs for sources, but they
> aren’t checking this: .
> That job succeeds after downloading the checkout from our servers.
Under the robustness framework discussed here, this is only partially
robust.  If you have substitutes disabled, then a normal
`guix build -S eclib' also fails, and if CI eventually garbage collects
the source, the same happens for everyone.  If, on the other hand, we
simply hardcoded the commit hash, none of that would happen at all; you
couldn't even use `guix build --check -S' as an oracle.

> There are two things I can highlight from this case.
>
> First, as expected, finding the original commit was painful.  SWH did
> not record the old version of the tag.  Comparing it with the
> checkout from our servers showed that the differences were very
> minor.  With that in mind, I moved backwards through the commit
> history with ‘guix hash’ until I found a match.  As pointed out many
> times, if I had the original commit, I could just ask SWH for it
> directly.
>
> Second, these cases are very, very rare.  (I’ve essentially checked
> every Git origin since Guix version 1.0.0, and this problem is not
> one that worries me).  “Tricking Peer Review”-style problems seem to
> be much more prevalent.  When tracking down a “difficult” Git origin,
> the first thing I do is grep the Guix Git history for a “oops I
> committed the wrong hash” message.  I recommend we focus our energies
> there before worrying too much about replacing tags with commits or
> using both or whatever.
Since you are our expert on preservation, would you mind if I asked
you for some estimates?  How painful is it to track down such commits
in general?  Could it be made easier if you periodically recorded
tag → commit (alternatively file-name x sha256 → SWHID) maps (or do
you already have such a map, with these cases arising while creating
it)?  And how many “Tricking Peer Review”-style problems do you think
are currently around?

> > > Regarding "Tricking Peer Review": I think it would be ideal for
> > > package definitions to include both the git tag _and_ the git
> > > commit hash, and to teach our linter to raise an alarm when the
> > > expected tags are missing or fail to match the expected commit
> > > hash.
> >
> > That is among the solutions I've proposed here, so naturally I'd be
> > fine with it.
>
> Given what I wrote above, maybe we could start by updating the linter
> so that ‘check-source’ actually checks that it gets the right result.
> Right now it uses a few heuristics to check that the result looks
> okay (for instance, it checks if the result is suspiciously small).
> Maybe it should just go through the whole download process and verify
> the hash? Alternatively (or additionally), the CI “source”
> specification could be configured to avoid using our servers as a
> fallback when checking sources.
I think substitutes should be disabled for the source download of a
"check-source".  Even if a substitute or SWH fallback exists, that's
not what we want to check here, no?

> I agree that adding more identifiers (commit hashes or whatever)
> makes things more robust, but the cost is more work when creating,
> updating, and reviewing packages.  I think we should start by
> verifying the identifiers we already have (i.e., checking that the
> URI and method of the origin produce the right output).  It would
> solve many existing problems and would serve as a nice foundation for
> future improvements.
Is this something we can reasonably expect our current CI, or CI in
general, to handle (assuming we tweaked the linter to behave as you
intend)?  Or would it make more sense to implement this as a
weekly/monthly cronjob?

> And as a bonus, if you want to be really kind to future time
> travellers, when fixing an errant hash, please include a nice hint as
> to what the original hash was for (like a commit hash).  We have
> commit ca5a791f6285b08506ccd662d5911ccf0c4d1ece in our repo, which
> says:
>
> > The previous hash was from the "dev" branch of the repository.
>
> I can’t find the source for the previous hash, and if I could
> actually travel through time, I would change the commit message to:
>
> > The previous hash was from commit abcd0123..., which comes from the
> > "dev" branch of the repository.
+1 from me for useful commit messages.
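
To make the tag → commit map idea above a bit more concrete, here is a
minimal sketch in Python, not a proposal for how Guix should implement
it.  It assumes the map is a plain dictionary and that the current
state comes from the text printed by `git ls-remote --tags'.  The
function names `parse_ls_remote' and `moved_tags' are hypothetical, and
the sample text is illustrative only: it reuses the two ‘eclib’ commits
quoted above and assumes the tag is literally named "20190909".

```python
def parse_ls_remote(output):
    """Parse the text printed by `git ls-remote --tags` into {tag: commit}.

    Annotated tags show up twice: once with the tag object's hash and once
    as a peeled "<tag>^{}" entry carrying the commit.  The peeled entry wins.
    """
    tags = {}
    for line in output.splitlines():
        if not line.strip():
            continue
        sha, ref = line.split(None, 1)
        if not ref.startswith("refs/tags/"):
            continue  # ignore branches and other refs
        name = ref[len("refs/tags/"):]
        if name.endswith("^{}"):
            tags[name[:-3]] = sha       # peeled entry: the actual commit
        else:
            tags.setdefault(name, sha)  # keep a peeled value if already seen
    return tags


def moved_tags(recorded, current):
    """Return {tag: (old_commit, new_commit)} for tags that changed."""
    return {
        tag: (old, current[tag])
        for tag, old in recorded.items()
        if tag in current and current[tag] != old
    }


# Hypothetical recorded map, taken when the package was created.
RECORDED = {"20190909": "19e7e3e74268bf78bd9a1c4ba07597d5434fb166"}

# Illustrative ls-remote text, not captured from the real repository.
SAMPLE = "bfbbd7c414521e1bf5e718a2925ea8ad845a2e87\trefs/tags/20190909\n"

for tag, (old, new) in moved_tags(RECORDED, parse_ls_remote(SAMPLE)).items():
    print(f"tag {tag} moved: {old} -> {new}")
```

Run periodically, a check like this would report a moved tag while the
old commit is still known, so it could be looked up directly instead of
being searched for in the history afterwards.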