From: Timothy Sample
To: Liliana Marie Prikler
Cc: guix-devel@gnu.org
Subject: Re: On raw strings in commit field
Date: Sun, 02 Jan 2022 18:00:09 -0500

Hey,

Liliana Marie Prikler writes:

> Since you are our expert on preservation, would you mind if I ask you
> for some estimates on how painful it is to track down such commits in
> general, if it could be made easier were you to record tag → commit
> (alternatively file-name x sha256 → SWHID) maps periodically (or if you
> already have such a map and those arise while creating it), and how
> many “Tricking Peer Review”-style problems you think are currently
> around?

I haven’t been keeping a detailed log of issues or anything, but I have
some notes and recollections. As of last month [1], we have 9,554 Git
sources (fixed-output derivations lowered from ‘git-reference’
origins). Of those, 186 could not be recovered automatically (by simply
cloning the repo or, for about 100 cases with the commit hash, checking
SWH).

Most of the 186 have ‘(recursive? #t)’, which is something I haven’t
implemented yet (there’s no Guix fallback support for it either).
However, there are 51 of those that should just work but don’t.

It turns out that most of these are due to my scripts ignoring the wrong
kind of VCS files (like ignoring “.hg”) when hashing. My scripts
follow the logic of ‘guix hash -S nar -x .’, but Guix actually just
deletes the Git metadata: ‘rm -rf .git; guix hash -S nar .’.
:)

Another couple are .

There were a handful of mutated tags (around a dozen). Some of them
were deleted, but the tag name referred to the commit hash (as if the
tag was named by ‘git describe’). Some of them were changed, but it was
clear that the original tag was just a few commits back. There was only
one I couldn’t figure out:

    https://github.com/jurplel/qView.git
    at tag 2.0
    with hash 1s29hz44rb5dwzq8d4i4bfg77dr0v3ywpvidpa6xzg7hnnv3mhi5

A similar problem is when the repo URL changes, but the tags are still
the same when you track down another copy of the repo. I encountered
this a few times.

Another handful (again around a dozen) were hash mistakes in the style
of tricking peer review. In most cases our commit messages were clear
enough to figure out what the hash was actually for. There are two
mysterious cases:

    https://git.umaneti.net/flycheck-grammalecte/
    at tag v1.3
    with hash 1f1gapvs9j89qr474103dqgsiyb96phlnsmq5hiv4ba242blg9lb
    (see Guix commit ca5a791f6285b08506ccd662d5911ccf0c4d1ece)

    https://github.com/fdik/libetpan
    at commit 210ba2b3b310b8b7a6ee4a4e35e50f7fa379643f
    with hash 00000nij3ray7nssvq0lzb352wmnab8ffzk7dgff2c68mvjbh1l6
    (the hash kinda looks fake, but it was like that for a long time)

There are two other cases that are basically “typos” in the hash. One
is clearly just an edit to the hash to make the build fail and print the
correct hash (see commits 618df2e335acb49a27ca014b555ede34f79503f3 and
bdc7f72fe4391ede313a0388ddd17cbb053931c9). The other one is commit
c0dc4179091f85fe4b8a2bbdb07c154a7f0408ed, which changes the hash of the
package ‘zimg’ without mentioning anything about it in the commit
message. This is fixed in b08c4f5fceff6064baedea3385703689b8a72e47
(back to the original hash). Tobias might remember what happened there,
but it looks like an honest mistake to me. I have no clue what that
other hash was for.
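
To make the earlier hashing mismatch concrete (excluding every kind of
VCS directory versus deleting only the Git metadata), here is a minimal
Python sketch. It is not the nar format and not the code from my
scripts or from Guix: ‘tree_digest’, ‘demo’, and the ‘VCS_DIRS’ list
are hypothetical names and assumptions made up for illustration. It
only shows that the two policies disagree as soon as a checkout
contains something like “.hg”.

```python
import hashlib
import os
import tempfile

# Directory names treated as VCS metadata in this sketch. This list is an
# assumption for illustration; the real exclusion list may differ.
VCS_DIRS = {".git", ".hg", ".svn", ".bzr", "CVS"}


def tree_digest(root, skip_dirs):
    """Hash relative paths and file contents under ROOT, skipping SKIP_DIRS.

    A simplified stand-in for a nar hash, NOT the nar serialization.
    """
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        # Prune skipped directories in place and make the walk deterministic.
        dirnames[:] = sorted(d for d in dirnames if d not in skip_dirs)
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            digest.update(os.path.relpath(path, root).encode())
            digest.update(b"\0")
            with open(path, "rb") as handle:
                digest.update(handle.read())
            digest.update(b"\0")
    return digest.hexdigest()


def demo():
    """Build a tiny checkout containing both .git and .hg, hash it twice."""
    with tempfile.TemporaryDirectory() as root:
        for rel in ("README", ".git/HEAD", ".hg/hgrc"):
            path = os.path.join(root, rel)
            os.makedirs(os.path.dirname(path), exist_ok=True)
            with open(path, "w") as handle:
                handle.write(rel)
        # Policy in the spirit of 'guix hash -S nar -x .': skip all VCS dirs.
        exclude_all = tree_digest(root, VCS_DIRS)
        # Policy in the spirit of 'rm -rf .git; guix hash -S nar .'.
        only_git = tree_digest(root, {".git"})
        return exclude_all, only_git
```

Calling ‘demo()’ returns two different digests, because the second
policy still hashes the “.hg” directory. On a tree with no VCS
directories other than “.git”, the two policies agree.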

Note for all of this that my scripts treat the SHA256 hash as *the*
identifier for a source. That is, if a tag is mutated and someone
adjusts the origin URI to point to the commit that the tag used to refer
to, I would not notice. Similarly, for tricking peer review: fixing the
URI to match the hash is invisible to me. It’s only when we fix the
hash to match the URI that I notice.

See also zimoun’s analysis of the same thing, but with older data: .

[1] https://ngyro.com/pog-reports/2021-12-06/

> On Saturday, 01.01.2022 at 12:45 -0500, Timothy Sample wrote:
>
>> Given what I wrote above, maybe we could start by updating the linter
>> so that ‘check-source’ actually checks that it gets the right result.
>> Right now it uses a few heuristics to check that the result looks
>> okay (for instance, it checks if the result is suspiciously small).
>> Maybe it should just go through the whole download process and verify
>> the hash? Alternatively (or additionally), the CI “source”
>> specification could be configured to avoid using our servers as a
>> fallback when checking sources.
>
> I think substitutes should be disabled for the source download of a
> "check-source". Even if a substitute or SWH fallback exists, that's
> not what we want to check here, no?

Exactly. It should just fetch the source as naïvely as possible, akin
to ‘GUIX_DOWNLOAD_FALLBACK_TEST=none guix build --check -S ...’ (or with
‘--substitute-urls=""’ or whatever).
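
As a rough sketch of what “as naïvely as possible” could look like when
scripted, the Python below only assembles the command line and
environment mentioned above; it does not run anything. The function
name ‘naive_source_check’ and the package name ‘hello’ are hypothetical,
and combining the environment variable with the empty
‘--substitute-urls’ is my assumption (the text above offers them as
alternatives).

```python
import os
import shlex


def naive_source_check(package, base_env=None):
    """Return (argv, env) for fetching PACKAGE's source without fallbacks.

    Nothing is executed here; the caller decides whether to run it.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Disable the download fallbacks, as in the command quoted above.
    env["GUIX_DOWNLOAD_FALLBACK_TEST"] = "none"
    argv = [
        "guix", "build", "--check", "-S",
        # Empty list of substitute servers, i.e. --substitute-urls="".
        "--substitute-urls=",
        package,
    ]
    return argv, env


# 'hello' is only an example package name.
argv, env = naive_source_check("hello", base_env={})
command_line = shlex.join(argv)
```

On a machine with Guix installed, passing ‘argv’ and ‘env’ to
‘subprocess.run’ would attempt the fetch; ‘command_line’ is just the
same command as a single string for logging.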

>> I agree that adding more identifiers (commit hashes or whatever)
>> makes things more robust, but the cost is more work when creating,
>> updating, and reviewing packages. I think we should start by
>> verifying the identifiers we already have (i.e., checking that the
>> URI and method of the origin produce the right output). It would
>> solve many existing problems and would serve as a nice foundation for
>> future improvements.
>
> Is this something we can reasonably expect our current CI or CI in
> general to handle (assuming we tweaked the linter to behave as you
> intend?) Or would it make more sense to implement this as a
> weekly/monthly cronjob?

I really only mentioned the CI because I had to explain to myself why it
didn’t notice the problem. I think the linter is probably the better
place to improve things here. It’s something I’m willing to work on,
but I would need to understand why it doesn’t check the hash already.
It seems like one of those things someone may have already thought about
and decided against.


-- Tim