unofficial mirror of guix-devel@gnu.org 
* failure when rebuilding the past: long term?
From: zimoun @ 2022-02-02 22:20 UTC
  To: guix-devel

Hi,

While investigating [0] and working on rebuilding a tiny part of
"Preservation of Guix", I noticed some failures of the time-machine.

Let us just display the help message using the time-machine for each of
the 120 commits updating the package guix.  In other words, let us just
check that it is possible to time-machine to these commits.

--8<---------------cut here---------------start------------->8---
# Run from a Guix Git checkout: try each commit that updates the 'guix' package.
for ci in $(git log --format="%H %s" v1.0.0.. | grep 'guix: Update' | cut -f1 -d' '); do
  printf '%s ' "$ci"
  # Any non-zero exit status counts as a failure.
  if guix time-machine --commit="$ci" -- help > /dev/null 2>&1; then echo OK; else echo KO; fi
done
--8<---------------cut here---------------end--------------->8---

These commits are failing:

7bae88b5b9dcacad4dcd11b353b486dc2f8a78e2 Sep 2021
f08587682a631d3fe30159af838c6766dd65205b Oct 2020
7db32c94b0b7d7fe0896389772f7cda802536693 Oct 2020
29d3569c9c712d70466d9175474c8fd1a3262234 Aug 2020
d3eee3c0643a20ba06941ba45d9d27146a8b634d Jul 2020
b778989e9a299102355b7145d1963baed5db7268 Mar 2020
cd2c3dc2d6ed1372ba457d7856b3fdbf097c7095 Nov 2019

7/120, not that bad! :-)


However, I do not understand how it is possible to get, for instance:

--8<---------------cut here---------------start------------->8---
@ build-started /gnu/store/9p560p4gd4f7jpbwnc8sarqkxfyxpxb9-guix-1.1.0-28.d27dbeb-checkout.drv - x86_64-linux /var/log/guix/drvs/9p//560p4gd4f7jpbwnc8sarqkxfyxpxb9-guix-1.1.0-28.d27dbeb-checkout.drv.bz2 3761976
@ build-log 3761976 41
guile: warning: failed to install locale
@ build-log 3761976 152
environment variable `PATH' set to `/gnu/store/378zjf2kgajcfd7mfr98jn5xyc5wa3qv-gzip-1.10/bin:/gnu/store/sf3rbvb6iqcphgm1afbplcs72hsywg25-tar-1.32/bin'
@ build-log 3761976 117
Initialized empty Git repository in /gnu/store/6ccci55npzlzb0pnpm6sf5f8swnnrpfg-guix-1.1.0-28.d27dbeb-checkout/.git/
/@ build-log 3761976 65
fatal: dumb http transport does not support shallow capabilities
@ build-log 3761976 55
Failed to do a shallow fetch; retrying a full fetch...
\@ build-log 3761976 324
From https://git.savannah.gnu.org/r/guix
[...]
HEAD is now at d27dbeb9d8 gnu: guix: Install OpenRC init files to $(prefix)/etc.
@ hash-mismatch /gnu/store/6ccci55npzlzb0pnpm6sf5f8swnnrpfg-guix-1.1.0-28.d27dbeb-checkout r:sha256 05mvljdr4clnv8i89db2hpjm33xg7jcg1vs00dbb4jcivlpkmqrl 0j60m9s47n23flfp2yn4ww4vsk8qvp500m2x1x0ib5bjywj1hiwl
hash mismatch for store item '/gnu/store/6ccci55npzlzb0pnpm6sf5f8swnnrpfg-guix-1.1.0-28.d27dbeb-checkout'
@ build-failed /gnu/store/9p560p4gd4f7jpbwnc8sarqkxfyxpxb9-guix-1.1.0-28.d27dbeb-checkout.drv - 1 hash mismatch for store item '/gnu/store/6ccci55npzlzb0pnpm6sf5f8swnnrpfg-guix-1.1.0-28.d27dbeb-checkout'
--8<---------------cut here---------------end--------------->8---

or another:

--8<---------------cut here---------------start------------->8---
fatal: reference is not a tree: 537080fad8dfa63df2f1d0b0d046a28077d56a56
@ build-log 3768401 160
git-fetch: '/gnu/store/i5b1vv7qc6l2gi4xwa9mqzjy3shvgk30-git-minimal-2.28.0/bin/git checkout 537080fad8dfa63df2f1d0b0d046a28077d56a56' failed with exit code 128
--8<---------------cut here---------------end--------------->8---

In other words, why is time-machine doing a full clone from the network
instead of reusing '~/.cache/guix/checkouts', since that checkout was
already updated earlier with «Updating channel 'guix' from Git repository
at 'https://git.savannah.gnu.org/git/guix.git'...»?
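
To tell apart "the commit is gone from the remote" and "the commit is
simply not in the checkout being used", one can ask Git whether the
commit object is present in a given repository.  Below is a minimal
sketch, assuming only that `git` is in PATH; the helper name `has_commit`
is hypothetical and not part of Guix.

```shell
#!/bin/sh
# has_commit REPO COMMIT: succeed if COMMIT names a commit object that is
# present in the Git repository at REPO.  (Hypothetical helper.)
has_commit() {
  git -C "$1" cat-file -e "$2^{commit}" 2>/dev/null
}
```

For instance, with DIR being one of the directories under
'~/.cache/guix/checkouts' (their names are left out here):

  has_commit DIR 537080fad8dfa63df2f1d0b0d046a28077d56a56 && echo present || echo missing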


The questions are then:

 1. For these failed commits, is it fixable?  If yes, how?

 2. If not, what could be done to bail out earlier?  For instance,
 collect a list of commits known to be unreachable.

 3. Having all the sources is one thing, but being able to rebuild is
 another.  The failure of OpenBLAS [0] is one example; those of some
 mesboot packages [1] and of texlive [2] are others.  It appears to me
 that something is inadequate in the current workflow, which pushes
 everything to master without any automated* checks.  In other words,
 failures such as 8f9fd9b70c
 (value "Unbound variable: ~S") (value (r-biobase)
 seem wrong by design.  Well, ac6f677249 is another recent example.

 Because the package collection is becoming larger and larger (which is
 good!), it is becoming harder and harder to maintain consistency both
 forward and backward.  For the latest Guix revision, my rough estimate
 is that ~5% of packages are broken, and my guess is that this number is
 “independent” of the package collection size.  However, I already have
 a collection of commits that the time-machine cannot reach and, for
 the commits it can reach, I do not have numbers for what is actually
 rebuildable.

 As discussed in this thread [3], maybe Guix is moving too fast; or,
 better worded, maybe the current workflow is at odds with the goals of
 long-term preservation and of building everything from source.  I do
 not know…  My point here: should we provide a list of commits
 (releases, others) to which we apply more care for the long term?

*automated check: “guix lint” is not automated since running it depends
 on the submitter and/or the committer; and, having spent some time
 checking the coverage of git-fetch by SWH (Software Heritage), I can
 tell that “guix lint” is not automated. ;-)
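
For question 2, bailing out earlier could be as simple as a wrapper
that consults a list before calling the time-machine.  This is only a
sketch under assumptions: the file name `known-unreachable.txt`, its
format (one commit hash per line, optionally followed by a note) and
both function names are hypothetical; nothing like this exists in Guix
as far as this message is concerned.

```shell
#!/bin/sh
# Hypothetical file: one known-unreachable commit hash per line,
# optionally followed by a free-form note (e.g. a date).
KNOWN_BAD="${KNOWN_BAD:-known-unreachable.txt}"

# known_unreachable COMMIT: succeed if COMMIT is the first field of some
# line of $KNOWN_BAD.
known_unreachable() {
  [ -f "$KNOWN_BAD" ] &&
    awk -v c="$1" '$1 == c { found = 1 } END { exit !found }' "$KNOWN_BAD"
}

# checked_time_machine COMMIT ARGS...: refuse early (status 2) for a
# listed commit, otherwise hand over to 'guix time-machine'.
checked_time_machine() {
  commit=$1; shift
  if known_unreachable "$commit"; then
    echo "commit $commit is listed as unreachable in $KNOWN_BAD" >&2
    return 2
  fi
  guix time-machine --commit="$commit" -- "$@"
}
```

The list of failing commits above could seed such a file as is, since
each line already starts with the full hash.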


0: <https://lists.gnu.org/archive/html/guix-devel/2022-02/msg00000.html>
1: <https://lists.gnu.org/archive/html/guix-devel/2022-02/msg00009.html>
2: <https://lists.gnu.org/archive/html/guix-devel/2022-02/msg00010.html>
3: <https://lists.gnu.org/archive/html/guix-devel/2021-03/msg00329.html>

Cheers,
simon



* Re: failure when rebuilding the past: long term?
From: Ludovic Courtès @ 2022-02-05 14:40 UTC
  To: zimoun; +Cc: guix-devel

Hi!

As a preamble, I’d like to say that what we’re doing in terms of time
traveling is ambitious; it’s never been done before, AFAIK.

And there are many pitfalls, as you write, mainly: disappearing source
(we’re working on it!), and, a bigger concern IMO, “non-deterministic”
build processes—in a broad sense: that includes build processes that
depend on CPU features, kernel versions, and other things not specified
in derivations.

I think we can address the latter, indirectly, with early cutoffs: if
two different derivations yield the same output, then we can consider
them equivalent and avoid rebuilding anything that depends on them,
grafting instead.  It helps in that it would allow users to
build ‘--without-tests’ and get an equivalent result, or to build with a
hypothetical ‘--in-virtual-machine’ or ‘--with-current-time=2020-01-01’
option.
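
To make the early-cutoff decision concrete, here is a rough sketch,
assuming two build outputs are available as directories.  The function
names are hypothetical, and comparing with `diff -r` is a stand-in: it
only looks at file names and contents, whereas a real implementation
would presumably compare content hashes of the store items, which also
cover things like permissions and symlinks.

```shell
#!/bin/sh
# same_output DIR1 DIR2: succeed if both trees hold the same files with
# the same contents.  Illustration only: permissions and timestamps are
# ignored.
same_output() {
  diff -r -q "$1" "$2" >/dev/null 2>&1
}

# needs_rebuild OLD_OUT NEW_OUT: the "early cutoff" decision.  Succeeds
# when dependents must be rebuilt, fails when they can be kept.
needs_rebuild() {
  if same_output "$1" "$2"; then
    return 1   # identical output: cut off, keep the dependents
  fi
  return 0     # output differs: dependents must be rebuilt
}
```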

zimoun <zimon.toutoune@gmail.com> skribis:

>  3. Having all the sources is one thing, but being able to rebuild is
>  another.  Failure of OpenBLAS [0] is one example, of some mesboot [1]
>  or of texlive [2] are others.  It appears to me that something is
>  inadequate with the current workflow pushing all to master without any
>  automated* checks. Other said, failures as 8f9fd9b70c (value "Unbound
>  variable: ~S") (value (r-biobase) seems wrong by design.  Well,
>  ac6f677249 is another recent example.  Somehow, because the package
>  collection is becoming larger and larger (which is good!)  then it is
>  becoming harder and harder to maintain the consistency both forward and
>  backward.  For the last Guix revision, my rough estimate is that ~5% of
>  packages are broken and my guess is that this number is “independant”
>  of the package collection size.  However, I already have some
>  collection of unreachable commits by the time-machine and then for some
>  reachable commits, I do not have numbers for what is effectively
>  rebuildable.  As discussed in this thread [3], maybe Guix is moving too
>  fast; or better worded, maybe the current workflow is inadequate with
>  some goals of long term and build all from sources.  I do not know…  My
>  point here: do we provide a list of commits (release, others) where we
>  apply more care for long term?

It’s hard for committers to push evidently broken code, such as unbound
variables, because ‘guix lint’, ‘make’, & co. catch it.  Still, it would
be great if it never happened at all, and for that enforcing server-side
checks could help.

The next issue is broken builds.  Automation can definitely help, and
Chris Baines’ work on automated patch handling or anything along these
lines would be a step in the right direction.

Non-deterministic builds are probably harder to identify automatically,
yet they are a bigger concern.

Besides, I think we should do some sort of CI for time traveling, to
make sure we can travel forward and backward for different revisions.
(I would love to see research institutes invest in this work.)

My 2¢,
Ludo’.

