unofficial mirror of guix-devel@gnu.org 
 help / color / mirror / code / Atom feed
From: Liliana Marie Prikler <liliana.prikler@gmail.com>
To: Timothy Sample <samplet@ngyro.com>
Cc: guix-devel@gnu.org
Subject: Re: On raw strings in <origin> commit field
Date: Sat, 01 Jan 2022 20:52:25 +0100	[thread overview]
Message-ID: <ea6a072346578e34026410e6a8413cea5494f247.camel@gmail.com> (raw)
In-Reply-To: <874k6nqrhs.fsf@ngyro.com>

Hi Timothy,

Am Samstag, dem 01.01.2022 um 12:45 -0500 schrieb Timothy Sample:
> If you want a concrete example to think through, there’s ‘eclib’. 
> Our package says it’s version “20190909”, but that’s not what
> upstream calls version “20190909”.  It looks like when we packaged
> ‘eclib’, that tag pointed to commit
> 19e7e3e74268bf78bd9a1c4ba07597d5434fb166, but now
> it points to bfbbd7c414521e1bf5e718a2925ea8ad845a2e87.
> 
> If you try to build ‘eclib’, everything will work great, since we can
> grab the checkout from our servers.  If you use
> 
>     $ guix build --check -S eclib
> 
> you get a hash mismatch.  We have CI jobs for sources, but they
> aren’t checking this: <https://ci.guix.gnu.org/build/319/details>. 
> That job succeeds after downloading the checkout from our servers.
With the robustness framework that is talked about here, this is only
partially robust.  If you have substitutes disabled, then a normal
`guix build -S eclib' also fails, and if CI eventually garbage collects
the source, the same happens for everyone.

If we simply hardcoded the hash on the other hand, none of that would
happen at all, you couldn't even use `guix build --check -S' as an
oracle.

> There are two things I can highlight from this case.
> 
> First, as expected, finding the original commit was painful.  SWH did
> not record the old version of the tag.  Comparing it with the
> checkout from our servers showed that the differences were very
> minor.  With that in mind, I moved backwards through the commit
> history with ‘guix hash’ until I found a match.  As pointed out many
> times, if I had the original commit, I could just ask SWH for it
> directly.
> 
> Second, these cases are very, very rare.  (I’ve essentially checked
> every Git origin since Guix version 1.0.0, and this problem is not
> one that worries me).  “Tricking Peer Review”-style problems seem to
> be much more prevalent.  When tracking down a “difficult” Git origin,
> the first thing I do is grep the Guix Git history for a “oops I
> committed the wrong hash” message.  I recommend we focus our energies
> there before worrying too much about replacing tags with commits or
> using both or whatever.
Since you are our expert on preservation, would you mind if I ask you
for some estimates on how painful it is to track down such commits in
general, if it could be made easier were you to record tag → commit
(alternatively file-name x sha256 → SWHID) maps periodically (or if you
already have such a map and those arise while creating it), and how
many “Tricking Peer Review”-style problems you think are currently
around?

> > > Regarding "Tricking Peer Review": I think it would be ideal for
> > > package definitions to include both the git tag _and_ the git
> > > commit hash, and to teach our linter to raise an alarm when the
> > > expected tags are missing or fail to match the expected commit
> > > hash.
> > 
> > That is among the solutions I've proposed here, so naturally I'd be
> > fine with it.
> 
> Given what I wrote above, maybe we could start by updating the linter
> so that ‘check-source’ actually checks that it gets the right result.
> Right now it uses a few heuristics to check that the result looks
> okay (for instance, it checks if the result is suspiciously small). 
> Maybe it should just go through the whole download process and verify
> the hash?  Alternatively (or additionally), the CI “source”
> specification could be configured to avoid using our servers as a
> fallback when checking sources.
I think substitutes should be disabled for the source download of a
"check-source".  Even if a substitute or SWH fallback exists, that's
not what we want to check here, no?

> I agree that adding more identifiers (commit hashes or whatever)
> makes things more robust, but the cost is more work when creating,
> updating, and reviewing packages.  I think we should start by
> verifying the identifiers we already have (i.e., checking that the
> URI and method of the origin produce the right output).  It would
> solve many existing problems and would serve as a nice foundation for
> future improvements.
Is this something we can reasonably expect our current CI or CI in
general to handle (assuming we tweaked the linter to behave as you
intend?)  Or would it make more sense to implement this as a
weekly/monthly cronjob?

> And as a bonus, if you want to be really kind to future time
> travellers, when fixing an errant hash, please include a nice hint as
> to what the original hash was for (like a commit hash).  We have
> commit ca5a791f6285b08506ccd662d5911ccf0c4d1ece in our repo, which
> says:
> 
> > The previous hash was from the "dev" branch of the repository.
> 
> I can’t find the source for the previous hash, and if I could
> actually travel through time, I would change the commit message to:
> 
> > The previous hash was from commit abcd0123..., which comes from the
> > "dev" branch of the repository.
+1 from me for useful commit messages.



  reply	other threads:[~2022-01-01 19:52 UTC|newest]

Thread overview: 63+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2021-12-28 20:55 On raw strings in <origin> commit field Liliana Marie Prikler
2021-12-29  8:39 ` zimoun
2021-12-29 20:25   ` Liliana Marie Prikler
2021-12-30 12:43     ` zimoun
2021-12-31  0:02       ` Liliana Marie Prikler
2021-12-31  1:23         ` zimoun
2021-12-31  3:27           ` Liliana Marie Prikler
2021-12-31  9:31             ` Ricardo Wurmus
2021-12-31 11:07               ` Liliana Marie Prikler
2021-12-31 12:31                 ` Ricardo Wurmus
2021-12-31 13:18                   ` Liliana Marie Prikler
2021-12-31 13:15               ` zimoun
2021-12-31 15:19                 ` Liliana Marie Prikler
2021-12-31 17:21                   ` zimoun
2021-12-31 20:52                     ` Liliana Marie Prikler
2021-12-31 23:36         ` Mark H Weaver
2022-01-01  1:33           ` Liliana Marie Prikler
2022-01-01  5:00             ` Mark H Weaver
2022-01-01 10:33               ` Liliana Marie Prikler
2022-01-01 20:37                 ` Mark H Weaver
2022-01-01 22:55                   ` Liliana Marie Prikler
2022-01-02 22:57                     ` Mark H Weaver
2022-01-03 21:25                       ` Liliana Marie Prikler
2022-01-03 23:14                         ` Mark H Weaver
2022-01-04 19:55                           ` Liliana Marie Prikler
2022-01-04 23:42                             ` Mark H Weaver
2022-01-05  9:28                               ` Mark H Weaver
2022-01-05 20:43                                 ` Liliana Marie Prikler
2022-01-06 10:38                                   ` Mark H Weaver
2022-01-06 11:25                                     ` Liliana Marie Prikler
2022-01-02 19:30                   ` zimoun
2022-01-02 21:35                     ` Liliana Marie Prikler
2022-01-03  9:22                       ` zimoun
2022-01-03 18:13                         ` Liliana Marie Prikler
2022-01-03 19:07                           ` zimoun
2022-01-03 20:19                             ` Liliana Marie Prikler
2022-01-03 23:00                               ` zimoun
2022-01-04  5:23                                 ` Liliana Marie Prikler
2022-01-04  8:51                                   ` zimoun
2022-01-04 13:15                                     ` zimoun
2022-01-04 19:45                                       ` Liliana Marie Prikler
2022-01-04 19:53                                         ` zimoun
2021-12-31 23:56         ` Mark H Weaver
2022-01-01  0:15           ` Liliana Marie Prikler
2021-12-30  1:13 ` Mark H Weaver
2021-12-30 12:56   ` zimoun
2021-12-31  3:15   ` Liliana Marie Prikler
2021-12-31  7:57     ` Taylan Kammer
2021-12-31 10:55       ` Liliana Marie Prikler
2022-01-01  1:41     ` Mark H Weaver
2022-01-01 11:12       ` Liliana Marie Prikler
2022-01-01 17:45         ` Timothy Sample
2022-01-01 19:52           ` Liliana Marie Prikler [this message]
2022-01-02 23:00             ` Timothy Sample
2022-01-03 15:46           ` Ludovic Courtès
2022-01-01 20:19         ` Mark H Weaver
2022-01-01 23:20           ` Liliana Marie Prikler
2022-01-02 12:25             ` Mark H Weaver
2022-01-02 14:09               ` Liliana Marie Prikler
2022-01-02  2:07         ` Bengt Richter
2021-12-31 17:56 ` Vagrant Cascadian
2022-01-03 15:51   ` Ludovic Courtès
2022-01-03 16:29     ` Vagrant Cascadian

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

  List information: https://guix.gnu.org/

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=ea6a072346578e34026410e6a8413cea5494f247.camel@gmail.com \
    --to=liliana.prikler@gmail.com \
    --cc=guix-devel@gnu.org \
    --cc=samplet@ngyro.com \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).