* bug#54447: cuirass: missing derivation error
@ 2022-03-18 12:36 Mathieu Othacehe
2022-08-10 9:43 ` Maxime Devos
` (4 more replies)
0 siblings, 5 replies; 21+ messages in thread
From: Mathieu Othacehe @ 2022-03-18 12:36 UTC (permalink / raw)
To: 54447
Hello,
A lot of builds, among them ~20 system tests[1], are failing with:
"cannot build missing derivation
?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
errors.
Those derivations are present on the CI head node. This means that the
errors occur during substitution. This is most likely caused by some
issue with the publish server, because:
- The publish server serves a 404 error. We should get rid once and for
all of this 404 thing, pushing something like:
https://issues.guix.gnu.org/50040.
or
- The publish server is not fast enough and hits an Nginx timeout that
closes the communication.
Any other cause I could be missing?
Thanks,
Mathieu
[1]: https://ci.guix.gnu.org/eval/159975?status=failed
* bug#54447: cuirass: missing derivation error
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
@ 2022-08-10 9:43 ` Maxime Devos
2022-08-10 15:30 ` Maxime Devos
2022-12-10 10:57 ` Ludovic Courtès
` (3 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Maxime Devos @ 2022-08-10 9:43 UTC (permalink / raw)
To: 54447
Here's another instance: https://ci.guix.gnu.org/eval/528710
* bug#54447: cuirass: missing derivation error
2022-08-10 9:43 ` Maxime Devos
@ 2022-08-10 15:30 ` Maxime Devos
0 siblings, 0 replies; 21+ messages in thread
From: Maxime Devos @ 2022-08-10 15:30 UTC (permalink / raw)
To: 54447
On 10-08-2022 11:43, Maxime Devos wrote:
> Here's another instance: https://ci.guix.gnu.org/eval/528710
>
More information:
* Non-ASCII output does not seem to be set up (note the ‘?’ where
  quotation marks should surround the .drv file name below); this
  looks irrelevant.
* There are connection failures:
Log:
> substitute:
> substitute: updating substitutes from 'http://141.80.167.131'... 0.0%
> guix substitute: warning: 141.80.167.131: connection failed: Connection refused
> substitute:
> cannot build missing derivation ?/gnu/store/4gqj2byvj9zz30wzvwkbijpya3vn1bjw-rust-dogged-0.2.0.drv?
Greetings,
Maxime.
* bug#54447: cuirass: missing derivation error
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
2022-08-10 9:43 ` Maxime Devos
@ 2022-12-10 10:57 ` Ludovic Courtès
2023-10-15 20:21 ` Ludovic Courtès
2023-08-22 3:38 ` Maxim Cournoyer
` (2 subsequent siblings)
4 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2022-12-10 10:57 UTC (permalink / raw)
To: 54447
Mathieu Othacehe <othacehe@gnu.org> skribis:
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
> all of this 404 thing, pushing something like:
> https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
> closes the communication.
Also being discussed at <https://issues.guix.gnu.org/48468#12>.
Ludo’.
* bug#54447: cuirass: missing derivation error
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
2022-08-10 9:43 ` Maxime Devos
2022-12-10 10:57 ` Ludovic Courtès
@ 2023-08-22 3:38 ` Maxim Cournoyer
2023-08-22 20:38 ` Ludovic Courtès
2023-08-30 12:17 ` 宋文武 via Bug reports for GNU Guix
2023-10-10 15:52 ` Ludovic Courtès
2024-04-14 0:15 ` John Kehayias via Bug reports for GNU Guix
4 siblings, 2 replies; 21+ messages in thread
From: Maxim Cournoyer @ 2023-08-22 3:38 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: 54447
Hello,
Mathieu Othacehe <othacehe@gnu.org> writes:
> Hello,
>
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
> all of this 404 thing, pushing something like:
> https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
> closes the communication.
>
> Any other cause I could be missing?
Looking at several recent 'cannot build missing derivation' build
failures on Cuirass, I see for example:
--8<---------------cut here---------------start------------->8---
substitute:
substitute: updating substitutes from 'http://141.80.167.131'... 0.0%
substitute: could not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?
--8<---------------cut here---------------end--------------->8---
So it seems the error originated from guix-publish being too heavily
under load to produce a timely reply, and the nginx proxy issued a 504
(timeout) error response.
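To triage such failures in bulk, the status code at the end of each "could not fetch" line is what separates the two hypotheses raised at the start of the thread (404 from ‘guix publish’ vs. 504 from an nginx upstream timeout). A minimal Guile sketch, assuming the log line format shown above; `parse-fetch-failure` and `status->hypothesis` are hypothetical helpers, not part of Guix or Cuirass:

```scheme
(use-modules (ice-9 regex))

;; Matches lines such as:
;;   substitute: could not fetch http://HOST/HASH.narinfo 504
;; (POSIX extended syntax is the default for make-regexp.)
(define %fetch-failure-rx
  (make-regexp "could not fetch [^ ]+/([0-9a-z]{32})\\.narinfo ([0-9]{3})"))

(define (parse-fetch-failure line)
  "Return a pair (HASH . STATUS) for LINE, or #f if LINE is not a
narinfo fetch failure."
  (let ((m (regexp-exec %fetch-failure-rx line)))
    (and m
         (cons (match:substring m 1)
               (string->number (match:substring m 2))))))

(define (status->hypothesis status)
  ;; Mapping based on the two hypotheses discussed in this thread.
  (case status
    ((404) 'not-served-by-guix-publish)
    ((504) 'nginx-upstream-timeout)
    (else 'unknown)))
```

For the line above, `parse-fetch-failure` yields the hash `rhgrs3ac6h64siz0krqh2ia8kkn3h6ym` and status 504, which maps to the nginx-timeout hypothesis.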
Looking into /var/log/guix-publish.log for a corresponding entry, I
found:
--8<---------------cut here---------------start------------->8---
2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 42> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---
So the connection was apparently severed (?), resulting in the "broken
pipe" error.
Here's a different one:
--8<---------------cut here---------------start------------->8---
substitute:
substitute: updating substitutes from 'http://141.80.167.131'... 0.0%
substitute: could not fetch http://141.80.167.131/p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ?/gnu/store/p2lfyvbxicjqsm4qp6368bx76gp0g948-python-astropy-healpix-0.7.drv?
--8<---------------cut here---------------end--------------->8---
It occurred around the same time, and the failure mode was the same, per
guix-publish.log:
--8<---------------cut here---------------start------------->8---
2023-08-21 23:59:35 GET /p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 50> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---
I wonder if these could be related to the DDoS protection discovered on
the Berlin network. I'll keep looking for other, potentially different
occurrences.
--
Thanks,
Maxim
* bug#54447: cuirass: missing derivation error
2023-08-22 3:38 ` Maxim Cournoyer
@ 2023-08-22 20:38 ` Ludovic Courtès
2023-08-30 12:17 ` 宋文武 via Bug reports for GNU Guix
1 sibling, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2023-08-22 20:38 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 54447
Hi,
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> Looking at several recent 'cannot build missing derivation' build
> failures on Cuirass, I see for example:
>
> substitute:
> substitute: updating substitutes from 'http://141.80.167.131'... 0.0%
> substitute: could not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
> substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
> cannot build missing derivation ?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?
>
>
> So it seems the error originated from guix-publish being too heavily
> under load to produce a timely reply, and the nginx proxy issued a 504
> (timeout) error response.
>
> Looking into /var/log/guix-publish.log for a corresponding entry, I
> found:
>
> 2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
> 2023-08-21 23:59:35 In web/server/http.scm:
> 2023-08-21 23:59:35 159:7 2 (http-write #<<http-server> socket: #<input-output: fi…> …)
> 2023-08-21 23:59:35 In unknown file:
> 2023-08-21 23:59:35 1 (put-bytevector #<input-output: socket 42> #vu8(83 # …) …)
> 2023-08-21 23:59:35 In ice-9/boot-9.scm:
> 2023-08-21 23:59:35 1685:16 0 (raise-exception _ #:continuable? _)
> 2023-08-21 23:59:35 In procedure fport_write: Broken pipe
>
>
> So the connection was apparently severed (?), resulting in the "broken
> pipe" error.
I think it’s just that, when ‘guix publish’ eventually replied, the
client had left, hence EPIPE.
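The "client left, hence EPIPE" explanation can be reproduced locally: when the peer of a stream socket has closed its end, the next write fails with EPIPE, which Guile reports as "In procedure fport_write: Broken pipe", as in the log above. A minimal sketch using a UNIX socket pair to stand in for the nginx-to-‘guix publish’ connection (an assumption; the real connection is TCP on port 3000, where the first write after the peer closes may not fail immediately):

```scheme
(use-modules (rnrs bytevectors)
             (ice-9 binary-ports))

;; Ignore SIGPIPE so that writing to a closed socket raises a
;; 'system-error' (EPIPE) instead of killing the process.
(sigaction SIGPIPE SIG_IGN)

(define (write-after-peer-left)
  "Simulate a server that replies after its client has already closed
the connection.  Return the errno of the failed write, or #f if the
write unexpectedly succeeded."
  (let* ((pair   (socketpair PF_UNIX SOCK_STREAM 0))
         (server (car pair))
         (client (cdr pair)))
    (setvbuf server 'none)              ;write straight to the socket
    ;; The client gives up (e.g., after a proxy timeout) and closes its end.
    (close-port client)
    (let ((errno (catch 'system-error
                   (lambda ()
                     (put-bytevector server
                                     (string->utf8 "HTTP/1.1 200 OK\r\n"))
                     #f)
                   (lambda args
                     (system-error-errno args)))))
      (false-if-exception (close-port server))
      errno)))
```

Calling `(write-after-peer-left)` should return the value of `EPIPE`.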
The initial problem does look like ‘guix publish’ being too slow. Do
the corresponding nginx logs confirm the “backend too slow => 504”
hypothesis?
Thanks for investigating!
Ludo’.
* bug#54447: cuirass: missing derivation error
2023-08-22 3:38 ` Maxim Cournoyer
2023-08-22 20:38 ` Ludovic Courtès
@ 2023-08-30 12:17 ` 宋文武 via Bug reports for GNU Guix
2023-10-11 3:21 ` Maxim Cournoyer
1 sibling, 1 reply; 21+ messages in thread
From: 宋文武 via Bug reports for GNU Guix @ 2023-08-30 12:17 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 54447
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
> I wonder if these could be related to the DDoS protection discovered on
> the Berlin network. I'll keep looking for other, potentially different
> occurrences.
Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw
cannot build missing derivation ?/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv?
Restarting it on CI still produced the same error.
* bug#54447: cuirass: missing derivation error
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
` (2 preceding siblings ...)
2023-08-22 3:38 ` Maxim Cournoyer
@ 2023-10-10 15:52 ` Ludovic Courtès
2023-10-11 3:08 ` Maxim Cournoyer
` (2 more replies)
2024-04-14 0:15 ` John Kehayias via Bug reports for GNU Guix
4 siblings, 3 replies; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-10 15:52 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: 54447, guix-sysadmin
Hello!
Mathieu Othacehe <othacehe@gnu.org> skribis:
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
I have a disappointingly simple hypothesis for this. Remember that
“missing derivation” errors happen primarily for system tests.
Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
mcron job, explicitly removes GC roots for things like *-os-encrypted
once they’re more than two days old, as well as GC roots for the
corresponding .drv.
I think this was increasing the likelihood that a .drv would be GC’d by
the time we run the test: under high load¹, it’s plausible that a system
test wouldn’t be built within two days after it’s been queued.
I’m proposing the change below to address this; I don’t think we need
‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
things in ‘guix publish’ cache first and foremost.
Thoughts?
In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
days in practice). That’s okay, except that it would be safer to delete
GC roots for a .drv if and only if it’s been built already.
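The difference between the two deletion policies can be sketched as two predicates. This is a hypothetical illustration, not the code of ‘cleanup-cuirass-roots’ or ‘register-gc-roots’; ages and TTLs are in seconds:

```scheme
(define %day (* 24 60 60))

;; Behavior described above: roots of selected items (and of their .drv)
;; are removed once they are more than two days old, built or not.
(define (two-day-policy-deletes? root-age)
  (> root-age (* 2 %day)))

;; Safer variant: delete a .drv GC root only once it is older than TTL
;; *and* the derivation has already been built.
(define (drv-root-deletable? root-age ttl built?)
  (and built? (> root-age ttl)))
```

For a system test queued three days ago and not built yet, `(two-day-policy-deletes? (* 3 %day))` is true, so its .drv can be garbage collected before the build starts, whereas `(drv-root-deletable? (* 3 %day) (* 30 %day) #f)` keeps it.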
Thanks,
Ludo’.
¹ The queue was often processed slowly, with many workers remaining idle
due to the bug fixed by
<https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=40f70d28aed55c404cca6a0760860fb4942e6bee>.
[-- Attachment #2: Type: text/x-patch, Size: 3079 bytes --]
diff --git a/hydra/modules/sysadmin/services.scm b/hydra/modules/sysadmin/services.scm
index fecfdde..e6f2b44 100644
--- a/hydra/modules/sysadmin/services.scm
+++ b/hydra/modules/sysadmin/services.scm
@@ -110,9 +110,7 @@
((guix config) => ,(make-config.scm)))
#~(begin
(use-modules (ice-9 ftw)
- (srfi srfi-1)
- (guix store)
- (guix derivations))
+ (srfi srfi-1))
(define %roots-directory
"/var/guix/profiles/per-user/cuirass/cuirass")
@@ -157,28 +155,6 @@
deleted))
deleted))
- (define (root-target root)
- ;; Return the store item ROOT refers to.
- (string-append (%store-prefix) "/" (basename root)))
-
- (define (derivation-referrers store item)
- ;; Return the referrers of the derivers of ITEM.
- (let* ((derivers (valid-derivers store item))
- (referrers (append-map (lambda (drv)
- (referrers store drv))
- derivers)))
- (delete-duplicates referrers)))
-
- (define (delete-gc-root-for-derivation drv)
- ;; Delete the GC root for DRV, if any.
- (catch 'system-error
- (lambda ()
- (let ((item (derivation-path->output-path drv)))
- (delete-file
- (string-append %roots-directory
- "/" (basename drv)))))
- (const #f)))
-
;; Note: 'scandir' would introduce too much overhead due
;; to the large number of entries that it would sort.
(define deleted
@@ -197,17 +173,7 @@
(for-each (lambda (file)
(display file port)
(newline port))
- deleted)))
-
- ;; Since we run 'guix-daemon --gc-keep-outputs
- ;; --gc-keep-derivations', also remove GC roots for the outputs of
- ;; derivations that refer to the derivers of DELETED.
- (for-each delete-gc-root-for-derivation
- (with-store store
- (append-map (lambda (root)
- (derivation-referrers
- store (root-target root)))
- deleted))))))))
+ deleted))))))))
(define (gc-jobs threshold)
"Return the garbage collection mcron jobs. The garbage collection
@@ -251,8 +217,7 @@ collection instead."
(build-accounts (* build-accounts-to-max-jobs-ratio max-jobs))
(extra-options (list "--max-jobs" (number->string max-jobs)
- "--cores" (number->string cores)
- "--gc-keep-outputs" "--gc-keep-derivations"))))
+ "--cores" (number->string cores)))))
\f
;;;
* bug#54447: cuirass: missing derivation error
2023-10-10 15:52 ` Ludovic Courtès
@ 2023-10-11 3:08 ` Maxim Cournoyer
2023-10-15 20:42 ` Ludovic Courtès
2023-10-16 17:44 ` Ludovic Courtès
2 siblings, 0 replies; 21+ messages in thread
From: Maxim Cournoyer @ 2023-10-11 3:08 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Mathieu Othacehe, guix-sysadmin, 54447
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> Hello!
>
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
>> errors.
>
> I have a disappointingly simple hypothesis for this. Remember that
> “missing derivation” errors happen primarily for system tests.
>
> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.
>
> Thoughts?
Ah, so that mcron job is kind of a hack to garbage collect only *some*
items sooner than the default policy of 30 days would? And we'd now
avoid deleting selected .drv files while still deleting their outputs,
so if something that needs them took more than 2 days to be built, we
could end up having to rebuild the garbage-collected outputs?
I'm not sure if we need such a fancy hack with the 100 TiB of data we
now have, but your fix seems reasonable (LGTM!)
> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice). That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.
Hm. I wonder if this could explain the other cases we've seen. It
could be that a build was interrupted or canceled for some reason, then
30 days elapsed and its .drv was garbage collected; since the .drv
doesn't get recreated afterwards, we get the missing .drv error?
--
Thanks,
Maxim
* bug#54447: cuirass: missing derivation error
2023-08-30 12:17 ` 宋文武 via Bug reports for GNU Guix
@ 2023-10-11 3:21 ` Maxim Cournoyer
2023-10-15 16:45 ` Ludovic Courtès
0 siblings, 1 reply; 21+ messages in thread
From: Maxim Cournoyer @ 2023-10-11 3:21 UTC (permalink / raw)
To: 宋文武; +Cc: Mathieu Othacehe, Ludovic Courtès, 54447
Hello,
宋文武 <iyzsong@envs.net> writes:
[...]
> Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw
>
> cannot build missing derivation ?/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv?
>
> Restarting it on CI still produced the same error.
Another example: https://ci.guix.gnu.org/build/1982454/details
--8<---------------cut here---------------start------------->8---
substitute:
substitute: updating substitutes from 'http://10.0.0.1'... 0.0%
substitute: updating substitutes from 'http://10.0.0.1'... 100.0%
cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
--8<---------------cut here---------------end--------------->8---
--
Thanks,
Maxim
* bug#54447: cuirass: missing derivation error
2023-10-11 3:21 ` Maxim Cournoyer
@ 2023-10-15 16:45 ` Ludovic Courtès
2023-10-16 13:25 ` Maxim Cournoyer
2023-11-20 19:09 ` Maxim Cournoyer
0 siblings, 2 replies; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-15 16:45 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 宋文武, 54447
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> Another example: https://ci.guix.gnu.org/build/1982454/details
>
> substitute:
> substitute: updating substitutes from 'http://10.0.0.1'... 0.0%
> substitute: updating substitutes from 'http://10.0.0.1'... 100.0%
> cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
This one is from Sep. 9, which is before I deployed the remote-worker
fixes, so I’ll dismiss it (happy to look at more recent ones though!).
Tip of the day: M-: (build-farm-build 1982454)
Ludo’.
* bug#54447: cuirass: missing derivation error
2022-12-10 10:57 ` Ludovic Courtès
@ 2023-10-15 20:21 ` Ludovic Courtès
2023-10-15 20:34 ` Ludovic Courtès
0 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-15 20:21 UTC (permalink / raw)
To: 54447
Hi!
Ludovic Courtès <ludo@gnu.org> skribis:
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
>> errors.
>>
>> Those derivations are present on the CI head node. This means that the
>> errors occur during substitution. This is most likely caused by some
>> issue with the publish server, because:
>>
>> - The publish server serves a 404 error. We should get rid once and for
>> all of this 404 thing, pushing something like:
>> https://issues.guix.gnu.org/50040.
>>
>> or
>>
>> - The publish server is not fast enough and hits an Nginx timeout that
>> closes the communication.
>
> Also being discussed at <https://issues.guix.gnu.org/48468#12>.
I got confirmation that the cache-bypass-threshold hypothesis holds, at
least for system tests.
Namely, looking at <https://ci.guix.gnu.org/build/2258097/details>,
which ends like this:
--8<---------------cut here---------------start------------->8---
@ substituter-succeeded /gnu/store/qh2876i5l1wvxgwhg9fbl9zmb3px3n2m-gc-roots.drv
fetching path `/gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder'...
@ substituter-started /gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder substitute
Downloading http://141.80.167.131/nar/lzip/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder...
 xdg-mime-database-builder  3.6MiB/s 00:00 | 3KiB transferred
 xdg-mime-database-builder  1.9MiB/s 00:00 | 3KiB transferred
@ substituter-succeeded /gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder
cannot build missing derivation ‘/gnu/store/4r1wij3bzj9zv75ds82a93jl7bcman2x-installed-extlinux-os.drv’
--8<---------------cut here---------------end--------------->8---
Looking at the nginx and ‘guix publish’ logs, I found that the missing
substitute is not that of 4r1wij3bzj9zv75ds82a93jl7bcman2x (the .drv
itself) but rather that of a dependency of that .drv:
[14/Oct/2023:23:22:09 +0200] "GET /wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg.narinfo HTTP/1.1" 404 58 "-" "GNU Guile"
That item’s size is above the cache bypass threshold of 100 MiB as
currently configured on berlin:
--8<---------------cut here---------------start------------->8---
$ du -hs /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
124M /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
--8<---------------cut here---------------end--------------->8---
The immediate fix/workaround is to raise that threshold.
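Why a 124 MiB item yields a 404 can be modeled roughly as follows. This is a simplified sketch of the behavior documented for ‘guix publish --cache’ together with ‘--cache-bypass-threshold’ (items not yet "baked" into the cache are answered with 404 while baking proceeds, unless they are smaller than the threshold); it is not the actual ‘guix publish’ code, and it assumes the ‘du’ figure approximates the nar size being compared:

```scheme
(define MiB (expt 2 20))

(define (narinfo-status cached? item-size bypass-threshold)
  "Return the HTTP status a simplified 'guix publish --cache' would use
for a .narinfo request about an item of ITEM-SIZE bytes."
  (cond (cached? 200)                          ;already baked: serve it
        ((< item-size bypass-threshold) 200)   ;small enough: serve directly
        (else 404)))                           ;baking in progress: 404 for now
```

Under this model, `(narinfo-status #f (* 124 MiB) (* 100 MiB))` gives 404, while raising the threshold to 150 MiB gives 200.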
A better solution would be for system tests to depend on a fixed-output
derivation for the Guix source instead of the “source” above (I use
“source” as it is used in the context of <derivation>).
Thanks,
Ludo’.
* bug#54447: cuirass: missing derivation error
2023-10-15 20:21 ` Ludovic Courtès
@ 2023-10-15 20:34 ` Ludovic Courtès
0 siblings, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-15 20:34 UTC (permalink / raw)
To: 54447
Ludovic Courtès <ludo@gnu.org> skribis:
> Looking at the nginx and ‘guix publish’ logs, I found that the missing
> substitute is not that of 4r1wij3bzj9zv75ds82a93jl7bcman2x (the .drv
> itself) but rather that of a dependency of that .drv:
>
> [14/Oct/2023:23:22:09 +0200] "GET /wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg.narinfo HTTP/1.1" 404 58 "-" "GNU Guile"
>
> That item’s size is above the cache bypass threshold of 100 MiB as
> currently configured on berlin:
>
> $ du -hs /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
> 124M /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
>
> The immediate fix/workaround is to raise that threshold.
I raised the threshold to 150 MiB in maintenance.git commit
213384e43de63ce3a5a55599e8fb89891ffef7eb.
I reconfigured berlin and restarted ‘guix publish’ seconds ago.
Hopefully next time installation tests won’t have that problem.
Ludo’.
* bug#54447: cuirass: missing derivation error
2023-10-10 15:52 ` Ludovic Courtès
2023-10-11 3:08 ` Maxim Cournoyer
@ 2023-10-15 20:42 ` Ludovic Courtès
2023-10-16 17:44 ` Ludovic Courtès
2 siblings, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-15 20:42 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: 54447, guix-sysadmin
Ludovic Courtès <ludo@gnu.org> skribis:
> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice). That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.
Fixed in Cuirass commit 55af0f70c0d4938b8eda777382bbc4d8f5698a37.
Ludo'.
* bug#54447: cuirass: missing derivation error
2023-10-15 16:45 ` Ludovic Courtès
@ 2023-10-16 13:25 ` Maxim Cournoyer
2023-10-16 17:39 ` Ludovic Courtès
2023-11-20 19:09 ` Maxim Cournoyer
1 sibling, 1 reply; 21+ messages in thread
From: Maxim Cournoyer @ 2023-10-16 13:25 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Mathieu Othacehe, 宋文武, 54447
Hi,
Ludovic Courtès <ludo@gnu.org> writes:
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Another example: https://ci.guix.gnu.org/build/1982454/details
>>
>> substitute:
>> substitute: updating substitutes from 'http://10.0.0.1'... 0.0%
>> substitute: updating substitutes from 'http://10.0.0.1'... 100.0%
>> cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
>
> This one is from Sep. 9, which is before I deployed the remote-worker
> fixes, so I’ll dismiss it (happy to look at more recent ones though!).
>
> Tip of the day: M-: (build-farm-build 1982454)
I don't have such a function in scope, is this from the guix-emacs
package?
--
Thanks,
Maxim
* bug#54447: cuirass: missing derivation error
2023-10-16 13:25 ` Maxim Cournoyer
@ 2023-10-16 17:39 ` Ludovic Courtès
0 siblings, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-16 17:39 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 宋文武, 54447
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>> Tip of the day: M-: (build-farm-build 1982454)
>
> I don't have such a function in scope, is this from the guix-emacs
> package?
It’s from the ‘emacs-build-farm’ package, which I recommend. :-)
Ludo’.
* bug#54447: cuirass: missing derivation error
2023-10-10 15:52 ` Ludovic Courtès
2023-10-11 3:08 ` Maxim Cournoyer
2023-10-15 20:42 ` Ludovic Courtès
@ 2023-10-16 17:44 ` Ludovic Courtès
2024-04-04 21:33 ` Ludovic Courtès
2 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2023-10-16 17:44 UTC (permalink / raw)
To: Mathieu Othacehe; +Cc: 54447, guix-sysadmin, Maxim Cournoyer
Ludovic Courtès <ludo@gnu.org> skribis:
> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.
I pushed a variant of this patch:
053839d hydra: services: Leave “guix-binary.tar.xz” GC roots.
e40d961 hydra: services: Preserve Cuirass .drv GC roots.
b8fc66c hydra: cuirass: Fix build product regexps.
I didn’t dare remove “--gc-keep-derivations”. I reconfigured berlin
just now from this commit and restarted mcron (I didn’t restart
guix-daemon to avoid downtime; we should do that when the queue is close
to empty).
We’ll have to monitor disk usage to make sure it’s not negatively
affected.
Ludo’.
* bug#54447: cuirass: missing derivation error
2023-10-15 16:45 ` Ludovic Courtès
2023-10-16 13:25 ` Maxim Cournoyer
@ 2023-11-20 19:09 ` Maxim Cournoyer
1 sibling, 0 replies; 21+ messages in thread
From: Maxim Cournoyer @ 2023-11-20 19:09 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: Mathieu Othacehe, 宋文武, 54447
Hi Ludovic,
Ludovic Courtès <ludo@gnu.org> writes:
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Another example: https://ci.guix.gnu.org/build/1982454/details
>>
>> substitute:
>> substitute: updating substitutes from 'http://10.0.0.1'... 0.0%
>> substitute: updating substitutes from 'http://10.0.0.1'... 100.0%
>> cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
>
> This one is from Sep. 9, which is before I deployed the remote-worker
> fixes, so I’ll dismiss it (happy to look at more recent ones though!).
Here's a more recent occurrence:
https://ci.guix.gnu.org/build/2635272/details
I haven't restarted it to leave proof of its existence :-)
--
Thanks,
Maxim
* bug#54447: cuirass: missing derivation error
2023-10-16 17:44 ` Ludovic Courtès
@ 2024-04-04 21:33 ` Ludovic Courtès
2024-07-14 21:49 ` Ludovic Courtès
0 siblings, 1 reply; 21+ messages in thread
From: Ludovic Courtès @ 2024-04-04 21:33 UTC (permalink / raw)
To: 54447; +Cc: Mathieu Othacehe, guix-sysadmin, Maxim Cournoyer
Hello!
News from the everlasting bug!
cannot build missing derivation ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’
(From <https://ci.guix.gnu.org/build/3861708/>.)
Why was it missing this time? /var/log/nginx/error.log:
--8<---------------cut here---------------start------------->8---
2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo", host: "141.80.167.131"
--8<---------------cut here---------------end--------------->8---
Oops! (There are dozens of upstream timeouts logged on that minute.)
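Counting those timeouts per minute is a quick way to see whether a failure coincides with a burst. A minimal Guile sketch, assuming every error.log line starts with a "YYYY/MM/DD HH:MM:SS" timestamp as in the excerpt above; `timeouts-per-minute` is a hypothetical helper:

```scheme
(define (timeouts-per-minute lines)
  "Return an alist mapping \"YYYY/MM/DD HH:MM\" to the number of nginx
'upstream timed out' errors logged during that minute, sorted by time."
  (let ((table (make-hash-table)))
    (for-each (lambda (line)
                (when (and (>= (string-length line) 16)
                           (string-contains line "upstream timed out"))
                  ;; The first 16 characters are the date and HH:MM.
                  (let ((minute (substring line 0 16)))
                    (hash-set! table minute
                               (+ 1 (hash-ref table minute 0))))))
              lines)
    (sort (hash-map->list cons table)
          (lambda (a b) (string<? (car a) (car b))))))
```

Feeding it the lines of /var/log/nginx/error.log, for instance as read with ‘read-line’ in a loop, yields one entry per minute that had timeouts.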
/var/log/guix-publish.log:
--8<---------------cut here---------------start------------->8---
2024-04-04 17:14:51 GET /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
2024-04-04 17:15:33 GET /nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
2024-04-04 17:15:33 GET /nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
2024-04-04 17:15:33 GET /nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
--8<---------------cut here---------------end--------------->8---
‘guix publish’ replied, but 40s too late (nginx has
“proxy_connect_timeout 10s;” for .narinfo URLs¹).
Notice the ~40s pause between 17:14:51 and 17:15:33. A stop-the-world
GC? Unlikely: ‘guix publish’ had been running for ~3h, so even with a
leak², it’s hard to believe GC could take this long.
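Gaps like this can also be found automatically. The following is a minimal Python sketch that assumes each log line starts with a "DATE TIME" prefix, as in the guix-publish.log excerpt above. The `find_pauses` helper and its 10s threshold are illustrative choices and are not part of Guix:

```python
from datetime import datetime

# Lines copied from the guix-publish.log excerpt above.
LOG = """\
2024-04-04 17:14:51 GET /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
"""


def find_pauses(lines, threshold_s=10):
    """Return (start, end, seconds) for every gap between consecutive
    log entries longer than threshold_s seconds."""
    pauses = []
    previous = None
    for line in lines:
        if not line.strip():
            continue
        # The first two whitespace-separated fields are the date and time.
        date, time = line.split()[:2]
        stamp = datetime.strptime(f"{date} {time}", "%Y-%m-%d %H:%M:%S")
        if previous is not None:
            gap = (stamp - previous).total_seconds()
            if gap > threshold_s:
                pauses.append((previous, stamp, gap))
        previous = stamp
    return pauses


for start, end, seconds in find_pauses(LOG.splitlines()):
    print(f"{start:%H:%M:%S} -> {end:%H:%M:%S}: {seconds:.0f}s without a logged request")
# 17:14:51 -> 17:15:33: 42s without a logged request
```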
Ludo’.
¹ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
² https://issues.guix.gnu.org/69596
* bug#54447: cuirass: missing derivation error
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
` (3 preceding siblings ...)
2023-10-10 15:52 ` Ludovic Courtès
@ 2024-04-14 0:15 ` John Kehayias via Bug reports for GNU Guix
4 siblings, 0 replies; 21+ messages in thread
From: John Kehayias via Bug reports for GNU Guix @ 2024-04-14 0:15 UTC (permalink / raw)
To: Ludovic Courtès
Cc: 54447, guix-sysadmin, Maxim Cournoyer, Mathieu Othacehe
Hi all,
On Thu, Apr 04, 2024 at 11:33 PM, Ludovic Courtès wrote:
> Hello!
>
> News from the everlasting bug!
>
> cannot build missing derivation
> ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’
>
> (From <https://ci.guix.gnu.org/build/3861708/>.)
>
> Why was it missing this time? /var/log/nginx/error.log:
>
> 2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo", host: "141.80.167.131"
>
>
> Oops! (There are dozens of upstream timeouts logged in that minute.)
>
> /var/log/guix-publish.log:
>
> 2024-04-04 17:14:51 GET /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
> 2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
> 2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
> 2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
> 2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
> 2024-04-04 17:15:33 GET /nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
> 2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
> 2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
> 2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
> 2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
> 2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
> 2024-04-04 17:15:33 GET /nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
> 2024-04-04 17:15:33 GET /nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
>
> ‘guix publish’ replied, but 40s too late (nginx has
> “proxy_connect_timeout 10s;” for .narinfo URLs¹).
>
> Notice the 40s pause time between 17:14:51 and 17:15:33. Stop-the-world
> GC? Unlikely, because ‘guix publish’ had been running for ~3h, so even
> with a leak², it’s hard to believe GC could take this long.
>
> Ludo’.
>
> ¹ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
> ² https://issues.guix.gnu.org/69596
I don't have any insight, but if anyone wants to see this in action at a
large scale, take a look at pretty much any red dot on
https://ci.guix.gnu.org/eval/1238471/dashboard?system=i686-linux
From my quick look, all the CL and texlive failures were missing
derivation errors. I've tried restarting a bunch to get i686 coverage
going, so hopefully some will disappear. But I can't/won't manually
restart the thousands(?) of failed builds. I didn't see such issues on
x86_64; other architectures take a really long time to build on Berlin,
so I haven't looked at them.
I don't know if this is helpful, but I thought I would chime in in case
anyone wants a potentially large amount of data. And if there are good
ideas for recovering (just restart all builds?), that would be great, so
that mesa-updates gets built on i686, since otherwise it looks good.
Thanks!
John
* bug#54447: cuirass: missing derivation error
2024-04-04 21:33 ` Ludovic Courtès
@ 2024-07-14 21:49 ` Ludovic Courtès
0 siblings, 0 replies; 21+ messages in thread
From: Ludovic Courtès @ 2024-07-14 21:49 UTC (permalink / raw)
To: 54447; +Cc: Mathieu Othacehe, guix-sysadmin, Maxim Cournoyer
Hi!
Ludovic Courtès <ludo@gnu.org> skribis:
> News from the everlasting bug!
>
> cannot build missing derivation ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’
[...]
> ‘guix publish’ replied, but 40s too late (nginx has
> “proxy_connect_timeout 10s;” for .narinfo URLs¹).
While the exact reason why ‘guix publish’ exhibits this behavior is
unclear, the good news is that this is “fixed” by having ‘cuirass
remote-worker’ retry when it fails to substitute a .drv (thanks Chris
for the obvious-in-hindsight tip!):
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=2365ba786c805477fcbae6eaeb358b0dd0501598
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=71426663f6ea32152782645e4632168dd2b18602
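The retry idea itself is simple. The sketch below is in Python and is not the Cuirass implementation, which is Guile Scheme and lives in the commits above. It shows a retry loop with exponential backoff. `SubstitutionError`, `with_retries`, the attempt count, and the delays are all hypothetical:

```python
import time


class SubstitutionError(Exception):
    """Hypothetical error: the .drv could not be fetched from the publish server."""


def with_retries(operation, attempts=5, initial_delay=1.0, sleep=time.sleep):
    """Call operation() until it succeeds, retrying on SubstitutionError.

    The delay between attempts doubles each time; after the last failed
    attempt the error is re-raised so the caller can give up on the build.
    """
    delay = initial_delay
    for attempt in range(1, attempts + 1):
        try:
            return operation()
        except SubstitutionError:
            if attempt == attempts:
                raise
            sleep(delay)
            delay *= 2


# Usage: a fake substitution that times out twice, then succeeds.
calls = []


def flaky_substitute():
    calls.append(1)
    if len(calls) < 3:
        raise SubstitutionError("narinfo request timed out")
    return "/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv"


result = with_retries(flaky_substitute, sleep=lambda seconds: None)
print(result, "after", len(calls), "attempts")
```

Passing `sleep` as a parameter keeps the example fast to test. A real worker would use the default `time.sleep`, which gives a briefly stalled publish server time to recover.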
Furthermore, workers can now reject builds if they fail to substitute
the .drv, in which case ‘cuirass remote-server’ either reschedules or
cancels the build:
https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=a909fa99340db5e5cd64612ea4e07e929dc643ad
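The server-side choice between rescheduling and canceling could look roughly like the following Python sketch. The policy here is an assumption: reschedule up to a fixed number of rejections, then cancel. The `Build` record, `on_worker_rejection`, and the limit of 3 are all hypothetical and were not taken from Cuirass:

```python
from dataclasses import dataclass

# Hypothetical policy: reschedule a rejected build up to this many times,
# then cancel it. Cuirass's actual rule may differ.
MAX_RESCHEDULES = 3


@dataclass
class Build:
    drv: str
    rejections: int = 0
    status: str = "scheduled"


def on_worker_rejection(build: Build, max_reschedules: int = MAX_RESCHEDULES) -> str:
    """Record that a worker could not substitute build.drv and decide what to do."""
    build.rejections += 1
    build.status = "scheduled" if build.rejections <= max_reschedules else "canceled"
    return build.status


build = Build("/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv")
print([on_worker_rejection(build) for _ in range(4)])
# ['scheduled', 'scheduled', 'scheduled', 'canceled']
```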
This was deployed a few days ago on berlin and on its x86_64 build
machines. Working well so far!
Ludo’.
Thread overview: 21+ messages
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
2022-08-10 9:43 ` Maxime Devos
2022-08-10 15:30 ` Maxime Devos
2022-12-10 10:57 ` Ludovic Courtès
2023-10-15 20:21 ` Ludovic Courtès
2023-10-15 20:34 ` Ludovic Courtès
2023-08-22 3:38 ` Maxim Cournoyer
2023-08-22 20:38 ` Ludovic Courtès
2023-08-30 12:17 ` 宋文武 via Bug reports for GNU Guix
2023-10-11 3:21 ` Maxim Cournoyer
2023-10-15 16:45 ` Ludovic Courtès
2023-10-16 13:25 ` Maxim Cournoyer
2023-10-16 17:39 ` Ludovic Courtès
2023-11-20 19:09 ` Maxim Cournoyer
2023-10-10 15:52 ` Ludovic Courtès
2023-10-11 3:08 ` Maxim Cournoyer
2023-10-15 20:42 ` Ludovic Courtès
2023-10-16 17:44 ` Ludovic Courtès
2024-04-04 21:33 ` Ludovic Courtès
2024-07-14 21:49 ` Ludovic Courtès
2024-04-14 0:15 ` John Kehayias via Bug reports for GNU Guix
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git