unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Timothy Sample <samplet@ngyro.com>
To: Liliana Marie Prikler <liliana.prikler@gmail.com>
Cc: guix-devel@gnu.org
Subject: Re: On raw strings in <origin> commit field
Date: Sat, 01 Jan 2022 12:45:51 -0500	[thread overview]
Message-ID: <874k6nqrhs.fsf@ngyro.com> (raw)
In-Reply-To: <b72eb9430694fe7bccb6b37f9cfce7d8e47f1385.camel@gmail.com> (Liliana Marie Prikler's message of "Sat, 01 Jan 2022 12:12:33 +0100")

Hi all,

Liliana Marie Prikler <liliana.prikler@gmail.com> writes:

> Am Freitag, dem 31.12.2021 um 20:41 -0500 schrieb Mark H Weaver:
>
>> If upstream later indicates that version "1.2.3" is now commit YYZ, I
>> don't think that invalidates our basis for continuing to associate
>> version "1.2.3" with commit XYZ.  The aforementioned immutable
>> historical fact still remains our basis and justification for making
>> that association.
>
> I'm pretty sure it does, particularly to a future observer who may not
> have the luxury of a history to distinguish that record from one in
> which a malicious committer linked those versions and tag together and
> then no one bothered to check.

If you want a concrete example to think through, there’s ‘eclib’.  Our
package says it’s version “20190909”, but that’s not what upstream calls
version “20190909”.  It looks like when we packaged ‘eclib’, that tag
pointed to commit 19e7e3e74268bf78bd9a1c4ba07597d5434fb166, but now it
points to bfbbd7c414521e1bf5e718a2925ea8ad845a2e87.

If you try to build ‘eclib’, everything will work great, since we can
grab the checkout from our servers.  If you use

    $ guix build --check -S eclib

you get a hash mismatch.  We have CI jobs for sources, but they aren’t
checking this: <https://ci.guix.gnu.org/build/319/details>.  That job
succeeds after downloading the checkout from our servers.

There are two things I can highlight from this case.

First, as expected, finding the original commit was painful.  SWH did
not record the old version of the tag.  Comparing it with the checkout
from our servers showed that the differences were very minor.  With that
in mind, I moved backwards through the commit history with ‘guix hash’
until I found a match.  As pointed out many times, if I had the original
commit, I could just ask SWH for it directly.

Second, these cases are very, very rare.  (I’ve essentially checked
every Git origin since Guix version 1.0.0, and this problem is not one
that worries me).  “Tricking Peer Review”-style problems seem to be much
more prevalent.  When tracking down a “difficult” Git origin, the first
thing I do is grep the Guix Git history for a “oops I committed the
wrong hash” message.  I recommend we focus our energies there before
worrying too much about replacing tags with commits or using both or
whatever.

>> Regarding "Tricking Peer Review": I think it would be ideal for
>> package definitions to include both the git tag _and_ the git commit
>> hash, and to teach our linter to raise an alarm when the expected
>> tags are missing or fail to match the expected commit hash.
>
> That is among the solutions I've proposed here, so naturally I'd be
> fine with it.

Given what I wrote above, maybe we could start by updating the linter so
that ‘check-source’ actually checks that it gets the right result.
Right now it uses a few heuristics to check that the result looks okay
(for instance, it checks if the result is suspiciously small).  Maybe it
should just go through the whole download process and verify the hash?
Alternatively (or additionally), the CI “source” specification could be
configured to avoid using our servers as a fallback when checking
sources.

I agree that adding more identifiers (commit hashes or whatever) makes
things more robust, but the cost is more work when creating, updating,
and reviewing packages.  I think we should start by verifying the
identifiers we already have (i.e., checking that the URI and method of
the origin produce the right output).  It would solve many existing
problems and would serve as a nice foundation for future improvements.

And as a bonus, if you want to be really kind to future time travellers,
when fixing an errant hash, please include a nice hint as to what the
original hash was for (like a commit hash).  We have commit
ca5a791f6285b08506ccd662d5911ccf0c4d1ece in our repo, which says:

> The previous hash was from the "dev" branch of the repository.

I can’t find the source for the previous hash, and if I could actually
travel through time, I would change the commit message to:

> The previous hash was from commit abcd0123..., which comes from the
> "dev" branch of the repository.

:)


-- Tim


  reply	other threads:[~2022-01-01 17:46 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-28 20:55 On raw strings in <origin> commit field Liliana Marie Prikler
2021-12-29  8:39 ` zimoun
2021-12-29 20:25   ` Liliana Marie Prikler
2021-12-30 12:43     ` zimoun
2021-12-31  0:02       ` Liliana Marie Prikler
2021-12-31  1:23         ` zimoun
2021-12-31  3:27           ` Liliana Marie Prikler
2021-12-31  9:31             ` Ricardo Wurmus
2021-12-31 11:07               ` Liliana Marie Prikler
2021-12-31 12:31                 ` Ricardo Wurmus
2021-12-31 13:18                   ` Liliana Marie Prikler
2021-12-31 13:15               ` zimoun
2021-12-31 15:19                 ` Liliana Marie Prikler
2021-12-31 17:21                   ` zimoun
2021-12-31 20:52                     ` Liliana Marie Prikler
2021-12-31 23:36         ` Mark H Weaver
2022-01-01  1:33           ` Liliana Marie Prikler
2022-01-01  5:00             ` Mark H Weaver
2022-01-01 10:33               ` Liliana Marie Prikler
2022-01-01 20:37                 ` Mark H Weaver
2022-01-01 22:55                   ` Liliana Marie Prikler
2022-01-02 22:57                     ` Mark H Weaver
2022-01-03 21:25                       ` Liliana Marie Prikler
2022-01-03 23:14                         ` Mark H Weaver
2022-01-04 19:55                           ` Liliana Marie Prikler
2022-01-04 23:42                             ` Mark H Weaver
2022-01-05  9:28                               ` Mark H Weaver
2022-01-05 20:43                                 ` Liliana Marie Prikler
2022-01-06 10:38                                   ` Mark H Weaver
2022-01-06 11:25                                     ` Liliana Marie Prikler
2022-01-02 19:30                   ` zimoun
2022-01-02 21:35                     ` Liliana Marie Prikler
2022-01-03  9:22                       ` zimoun
2022-01-03 18:13                         ` Liliana Marie Prikler
2022-01-03 19:07                           ` zimoun
2022-01-03 20:19                             ` Liliana Marie Prikler
2022-01-03 23:00                               ` zimoun
2022-01-04  5:23                                 ` Liliana Marie Prikler
2022-01-04  8:51                                   ` zimoun
2022-01-04 13:15                                     ` zimoun
2022-01-04 19:45                                       ` Liliana Marie Prikler
2022-01-04 19:53                                         ` zimoun
2021-12-31 23:56         ` Mark H Weaver
2022-01-01  0:15           ` Liliana Marie Prikler
2021-12-30  1:13 ` Mark H Weaver
2021-12-30 12:56   ` zimoun
2021-12-31  3:15   ` Liliana Marie Prikler
2021-12-31  7:57     ` Taylan Kammer
2021-12-31 10:55       ` Liliana Marie Prikler
2022-01-01  1:41     ` Mark H Weaver
2022-01-01 11:12       ` Liliana Marie Prikler
2022-01-01 17:45         ` Timothy Sample [this message]
2022-01-01 19:52           ` Liliana Marie Prikler
2022-01-02 23:00             ` Timothy Sample
2022-01-03 15:46           ` Ludovic Courtès
2022-01-01 20:19         ` Mark H Weaver
2022-01-01 23:20           ` Liliana Marie Prikler
2022-01-02 12:25             ` Mark H Weaver
2022-01-02 14:09               ` Liliana Marie Prikler
2022-01-02  2:07         ` Bengt Richter
2021-12-31 17:56 ` Vagrant Cascadian
2022-01-03 15:51   ` Ludovic Courtès
2022-01-03 16:29     ` Vagrant Cascadian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=874k6nqrhs.fsf@ngyro.com \
    --to=samplet@ngyro.com \
    --cc=guix-devel@gnu.org \
    --cc=liliana.prikler@gmail.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).