unofficial mirror of bug-guix@gnu.org 
* bug#54447: cuirass: missing derivation error
@ 2022-03-18 12:36 Mathieu Othacehe
  2022-08-10  9:43 ` Maxime Devos
                   ` (4 more replies)
  0 siblings, 5 replies; 20+ messages in thread
From: Mathieu Othacehe @ 2022-03-18 12:36 UTC (permalink / raw)
  To: 54447


Hello,

A lot of builds, among them ~20 system tests[1], are failing with:
"cannot build missing derivation
?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
errors.

Those derivations are present on the CI head node. This means that the
errors occur during substitution. This is most likely caused by some
issue with the publish server, because:

- The publish server serves a 404 error. We should get rid once and for
  all of this 404 thing, pushing something like:
  https://issues.guix.gnu.org/50040.

or

- The publish server is not fast enough and hits an Nginx timeout that
  closes the communication.

Any other cause I could be missing?

Thanks,

Mathieu

[1]: https://ci.guix.gnu.org/eval/159975?status=failed




^ permalink raw reply	[flat|nested] 20+ messages in thread

* bug#54447: cuirass: missing derivation error
  2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
@ 2022-08-10  9:43 ` Maxime Devos
  2022-08-10 15:30   ` Maxime Devos
  2022-12-10 10:57 ` Ludovic Courtès
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Maxime Devos @ 2022-08-10  9:43 UTC (permalink / raw)
  To: 54447



Here's another instance: https://ci.guix.gnu.org/eval/528710




* bug#54447: cuirass: missing derivation error
  2022-08-10  9:43 ` Maxime Devos
@ 2022-08-10 15:30   ` Maxime Devos
  0 siblings, 0 replies; 20+ messages in thread
From: Maxime Devos @ 2022-08-10 15:30 UTC (permalink / raw)
  To: 54447




On 10-08-2022 11:43, Maxime Devos wrote:
> Here's another instance: https://ci.guix.gnu.org/eval/528710
>
More information:

  * non-ASCII does not seem to be set up (see: ?) (looks irrelevant)
  * there are connection failures

Log:

> substitute:
> substitute: ^[[Kupdating substitutes from 'http://141.80.167.131'...   0.0%guix substitute: warning: 141.80.167.131: connection failed: Connection refused
> substitute:
> cannot build missing derivation ?/gnu/store/4gqj2byvj9zz30wzvwkbijpya3vn1bjw-rust-dogged-0.2.0.drv?

Greetings,
Maxime.



* bug#54447: cuirass: missing derivation error
  2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
  2022-08-10  9:43 ` Maxime Devos
@ 2022-12-10 10:57 ` Ludovic Courtès
  2023-10-15 20:21   ` Ludovic Courtès
  2023-08-22  3:38 ` Maxim Cournoyer
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2022-12-10 10:57 UTC (permalink / raw)
  To: 54447

Mathieu Othacehe <othacehe@gnu.org> skribis:

> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
>   all of this 404 thing, pushing something like:
>   https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
>   closes the communication.

Also being discussed at <https://issues.guix.gnu.org/48468#12>.

Ludo’.





* bug#54447: cuirass: missing derivation error
  2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
  2022-08-10  9:43 ` Maxime Devos
  2022-12-10 10:57 ` Ludovic Courtès
@ 2023-08-22  3:38 ` Maxim Cournoyer
  2023-08-22 20:38   ` Ludovic Courtès
  2023-08-30 12:17   ` 宋文武 via Bug reports for GNU Guix
  2023-10-10 15:52 ` Ludovic Courtès
  2024-04-14  0:15 ` John Kehayias via Bug reports for GNU Guix
  4 siblings, 2 replies; 20+ messages in thread
From: Maxim Cournoyer @ 2023-08-22  3:38 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 54447

Hello,

Mathieu Othacehe <othacehe@gnu.org> writes:

> Hello,
>
> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.
>
> Those derivations are present on the CI head node. This means that the
> errors occur during substitution. This is most likely caused by some
> issue with the publish server, because:
>
> - The publish server serves a 404 error. We should get rid once and for
>   all of this 404 thing, pushing something like:
>   https://issues.guix.gnu.org/50040.
>
> or
>
> - The publish server is not fast enough and hits an Nginx timeout that
>   closes the communication.
>
> Any other cause I could be missing?

Looking at several recent 'cannot build missing derivation' build
failures on Cuirass, I see for example:

--8<---------------cut here---------------start------------->8---
substitute: 
substitute: [Kupdating substitutes from 'http://141.80.167.131'...   0.0%
substitute: [Kcould not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?
--8<---------------cut here---------------end--------------->8---

So it seems the error originated from guix-publish being under too heavy
a load to produce a timely reply, and the nginx proxy issued a 504
(timeout) error response.
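
For triaging many such logs, a small filter along these lines might
help.  It is only a sketch: the function name `failed_narinfos` is made
up for illustration, and it assumes the build log keeps the
"could not fetch URL STATUS" wording shown above.

```shell
# Hypothetical helper (not part of Guix or Cuirass): read a build log on
# stdin and print "<HTTP status> <store hash>" for each failed narinfo fetch.
failed_narinfos() {
  sed -n 's|.*could not fetch http://[^/]*/\([a-z0-9]*\)\.narinfo \([0-9][0-9]*\).*|\2 \1|p'
}
```

Piping a raw build log through `failed_narinfos | cut -d' ' -f1 | sort |
uniq -c` would then show how many fetches failed per HTTP status, for
instance whether 504s dominate.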

Looking into /var/log/guix-publish.log for a corresponding entry, I
found:

--8<---------------cut here---------------start------------->8---
2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35     159:7  2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35            1 (put-bytevector #<input-output: socket 42> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35   1685:16  0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

So the connection was apparently severed (?), resulting in the "broken
pipe" error.

Here's a different one:

--8<---------------cut here---------------start------------->8---
substitute: 
substitute: [Kupdating substitutes from 'http://141.80.167.131'...   0.0%
substitute: [Kcould not fetch http://141.80.167.131/p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo 504
substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
cannot build missing derivation ?/gnu/store/p2lfyvbxicjqsm4qp6368bx76gp0g948-python-astropy-healpix-0.7.drv?
--8<---------------cut here---------------end--------------->8---

It occurred around the same time, and the failure mode was the same, per
guix-publish.log:

--8<---------------cut here---------------start------------->8---
2023-08-21 23:59:35 GET /p2lfyvbxicjqsm4qp6368bx76gp0g948.narinfo
2023-08-21 23:59:35 In web/server/http.scm:
2023-08-21 23:59:35     159:7  2 (http-write #<<http-server> socket: #<input-output: fi…> …)
2023-08-21 23:59:35 In unknown file:
2023-08-21 23:59:35            1 (put-bytevector #<input-output: socket 50> #vu8(83 # …) …)
2023-08-21 23:59:35 In ice-9/boot-9.scm:
2023-08-21 23:59:35   1685:16  0 (raise-exception _ #:continuable? _)
2023-08-21 23:59:35 In procedure fport_write: Broken pipe
--8<---------------cut here---------------end--------------->8---

I wonder if these could be related to the DDoS protection discovered on
the Berlin network.  I'll keep looking for other, potentially different
occurrences.

-- 
Thanks,
Maxim





* bug#54447: cuirass: missing derivation error
  2023-08-22  3:38 ` Maxim Cournoyer
@ 2023-08-22 20:38   ` Ludovic Courtès
  2023-08-30 12:17   ` 宋文武 via Bug reports for GNU Guix
  1 sibling, 0 replies; 20+ messages in thread
From: Ludovic Courtès @ 2023-08-22 20:38 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 54447

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Looking at several recent 'cannot build missing derivation' build
> failures on Cuirass, I see for example:
>
> substitute: 
> substitute: [Kupdating substitutes from 'http://141.80.167.131'...   0.0%
> substitute: [Kcould not fetch http://141.80.167.131/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo 504
> substitute: updating substitutes from 'http://141.80.167.131'... 100.0%
> cannot build missing derivation ?/gnu/store/rhgrs3ac6h64siz0krqh2ia8kkn3h6ym-python-asdf-standard-1.0.3.drv?
>
>
> So it seems the error originated from guix-publish being under too heavy
> a load to produce a timely reply, and the nginx proxy issued a 504
> (timeout) error response.
>
> Looking into /var/log/guix-publish.log for a corresponding entry, I
> found:
>
> 2023-08-21 23:59:35 GET /rhgrs3ac6h64siz0krqh2ia8kkn3h6ym.narinfo
> 2023-08-21 23:59:35 In web/server/http.scm:
> 2023-08-21 23:59:35     159:7  2 (http-write #<<http-server> socket: #<input-output: fi…> …)
> 2023-08-21 23:59:35 In unknown file:
> 2023-08-21 23:59:35            1 (put-bytevector #<input-output: socket 42> #vu8(83 # …) …)
> 2023-08-21 23:59:35 In ice-9/boot-9.scm:
> 2023-08-21 23:59:35   1685:16  0 (raise-exception _ #:continuable? _)
> 2023-08-21 23:59:35 In procedure fport_write: Broken pipe
>
>
> So the connection was apparently severed (?), resulting in the "broken
> pipe" error.

I think it’s just that, when ‘guix publish’ eventually replied, the
client had left, hence EPIPE.

The initial problem does look like ‘guix publish’ being too slow.  Do
the corresponding nginx logs confirm the “backend too slow => 504”
hypothesis?

Thanks for investigating!

Ludo’.





* bug#54447: cuirass: missing derivation error
  2023-08-22  3:38 ` Maxim Cournoyer
  2023-08-22 20:38   ` Ludovic Courtès
@ 2023-08-30 12:17   ` 宋文武 via Bug reports for GNU Guix
  2023-10-11  3:21     ` Maxim Cournoyer
  1 sibling, 1 reply; 20+ messages in thread
From: 宋文武 via Bug reports for GNU Guix @ 2023-08-30 12:17 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 54447

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> I wonder if these could be related to the DDoS protection discovered on
> the Berlin network.  I'll keep looking for other, potentially different
> occurrences.


Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw

  cannot build missing derivation ?/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv?

Restarting it on CI still gave the same error.





* bug#54447: cuirass: missing derivation error
  2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
                   ` (2 preceding siblings ...)
  2023-08-22  3:38 ` Maxim Cournoyer
@ 2023-10-10 15:52 ` Ludovic Courtès
  2023-10-11  3:08   ` Maxim Cournoyer
                     ` (2 more replies)
  2024-04-14  0:15 ` John Kehayias via Bug reports for GNU Guix
  4 siblings, 3 replies; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-10 15:52 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 54447, guix-sysadmin

[-- Attachment #1: Type: text/plain, Size: 1609 bytes --]

Hello!

Mathieu Othacehe <othacehe@gnu.org> skribis:

> A lot of builds, among them ~20 system tests[1], are failing with:
> "cannot build missing derivation
> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
> errors.

I have a disappointingly simple hypothesis for this.  Remember that
“missing derivation” errors happen primarily for system tests.

Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
mcron job, explicitly removes GC roots for things like *-os-encrypted
once they’re more than two days old, as well as GC roots for the
corresponding .drv.

I think this was increasing the likelihood that a .drv would be GC’d by
the time we run the test: under high load¹, it’s plausible that a system
test wouldn’t be built within two days after it’s been queued.

I’m proposing the change below to address this; I don’t think we need
‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
things in ‘guix publish’ cache first and foremost.

Thoughts?

In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
days in practice).  That’s okay, except that it would be safer to delete
GC roots for a .drv if and only if it’s been built already.
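
The rule proposed here can be sketched in shell as follows.  This is not
the Cuirass code (which is Scheme); the helper name `may_delete_root`
and the "built" status string are invented for the example, and ages are
plain day counts.

```shell
# Hypothetical predicate: succeed only if a GC root for a .drv may be deleted,
# i.e. it is older than the TTL *and* the derivation has already been built.
#   $1 = age of the root in days
#   $2 = TTL in days (30 in the case discussed here)
#   $3 = build status; only the literal string "built" counts as done
may_delete_root() {
  [ "$1" -gt "$2" ] && [ "$3" = "built" ]
}
```

With this rule, a root that is 31 days old but whose build is still
queued is kept, so the .drv cannot be collected before the build runs.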

Thanks,
Ludo’.

¹ The queue was often processed slowly, with many workers remaining idle
  due to the bug fixed by
  <https://git.savannah.gnu.org/cgit/guix/guix-cuirass.git/commit/?id=40f70d28aed55c404cca6a0760860fb4942e6bee>.


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #2: Type: text/x-patch, Size: 3079 bytes --]

diff --git a/hydra/modules/sysadmin/services.scm b/hydra/modules/sysadmin/services.scm
index fecfdde..e6f2b44 100644
--- a/hydra/modules/sysadmin/services.scm
+++ b/hydra/modules/sysadmin/services.scm
@@ -110,9 +110,7 @@
                               ((guix config) => ,(make-config.scm)))
        #~(begin
            (use-modules (ice-9 ftw)
-                        (srfi srfi-1)
-                        (guix store)
-                        (guix derivations))
+                        (srfi srfi-1))
 
            (define %roots-directory
              "/var/guix/profiles/per-user/cuirass/cuirass")
@@ -157,28 +155,6 @@
                      deleted))
                  deleted))
 
-           (define (root-target root)
-             ;; Return the store item ROOT refers to.
-             (string-append (%store-prefix) "/" (basename root)))
-
-           (define (derivation-referrers store item)
-             ;; Return the referrers of the derivers of ITEM.
-             (let* ((derivers  (valid-derivers store item))
-                    (referrers (append-map (lambda (drv)
-                                             (referrers store drv))
-                                           derivers)))
-               (delete-duplicates referrers)))
-
-           (define (delete-gc-root-for-derivation drv)
-             ;; Delete the GC root for DRV, if any.
-             (catch 'system-error
-               (lambda ()
-                 (let ((item (derivation-path->output-path drv)))
-                   (delete-file
-                    (string-append %roots-directory
-                                   "/" (basename drv)))))
-               (const #f)))
-
            ;; Note: 'scandir' would introduce too much overhead due
            ;; to the large number of entries that it would sort.
            (define deleted
@@ -197,17 +173,7 @@
                (for-each (lambda (file)
                            (display file port)
                            (newline port))
-                         deleted)))
-
-           ;; Since we run 'guix-daemon --gc-keep-outputs
-           ;; --gc-keep-derivations', also remove GC roots for the outputs of
-           ;; derivations that refer to the derivers of DELETED.
-           (for-each delete-gc-root-for-derivation
-                     (with-store store
-                       (append-map (lambda (root)
-                                     (derivation-referrers
-                                      store (root-target root)))
-                                   deleted))))))))
+                         deleted))))))))
 
 (define (gc-jobs threshold)
   "Return the garbage collection mcron jobs.  The garbage collection
@@ -251,8 +217,7 @@ collection instead."
 
    (build-accounts (* build-accounts-to-max-jobs-ratio max-jobs))
    (extra-options (list "--max-jobs" (number->string max-jobs)
-                        "--cores" (number->string cores)
-                        "--gc-keep-outputs" "--gc-keep-derivations"))))
+                        "--cores" (number->string cores)))))
 
 \f
 ;;;


* bug#54447: cuirass: missing derivation error
  2023-10-10 15:52 ` Ludovic Courtès
@ 2023-10-11  3:08   ` Maxim Cournoyer
  2023-10-15 20:42   ` Ludovic Courtès
  2023-10-16 17:44   ` Ludovic Courtès
  2 siblings, 0 replies; 20+ messages in thread
From: Maxim Cournoyer @ 2023-10-11  3:08 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, guix-sysadmin, 54447

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Hello!
>
> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
>> errors.
>
> I have a disappointingly simple hypothesis for this.  Remember that
> “missing derivation” errors happen primarily for system tests.
>
> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.
>
> Thoughts?

Ah, so that mcron job is kind of a hack to garbage collect only *some*
items sooner than the default policy of 30 days?  And we'd now avoid
deleting selected .drv files while still deleting their outputs, so if
something that needs them took more than 2 days to build, we might have
to rebuild the garbage-collected outputs?

I'm not sure if we need such a fancy hack with the 100 TiB of data we
now have, but your fix seems reasonable (LGTM!)

> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice).  That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.

Hm.  I wonder if this could explain the other cases we've seen.  It
could be that building a derivation was interrupted or canceled for some
reason, 30 days elapsed, and the .drv was garbage collected, after which
it doesn't get recreated and we get the missing .drv error?

-- 
Thanks,
Maxim





* bug#54447: cuirass: missing derivation error
  2023-08-30 12:17   ` 宋文武 via Bug reports for GNU Guix
@ 2023-10-11  3:21     ` Maxim Cournoyer
  2023-10-15 16:45       ` Ludovic Courtès
  0 siblings, 1 reply; 20+ messages in thread
From: Maxim Cournoyer @ 2023-10-11  3:21 UTC (permalink / raw)
  To: 宋文武; +Cc: Mathieu Othacehe, Ludovic Courtès, 54447

Hello,

宋文武 <iyzsong@envs.net> writes:

[...]

> Hello, this one for ddd: https://ci.guix.gnu.org/build/1372655/log/raw
>
>   cannot build missing derivation ?/gnu/store/anzz2p18b7r9x45y350avnk8br2yihi2-ddd-3.4.0.drv?
>
> Restarting it on CI still gave the same error.

Another example: https://ci.guix.gnu.org/build/1982454/details

--8<---------------cut here---------------start------------->8---
substitute: 
substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
--8<---------------cut here---------------end--------------->8---

-- 
Thanks,
Maxim





* bug#54447: cuirass: missing derivation error
  2023-10-11  3:21     ` Maxim Cournoyer
@ 2023-10-15 16:45       ` Ludovic Courtès
  2023-10-16 13:25         ` Maxim Cournoyer
  2023-11-20 19:09         ` Maxim Cournoyer
  0 siblings, 2 replies; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-15 16:45 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 宋文武, 54447

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Another example: https://ci.guix.gnu.org/build/1982454/details
>
> substitute: 
> substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
> substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
> cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?

This one is from Sep. 9, which is before I deployed the remote-worker
fixes, so I’ll dismiss it (happy to look at more recent ones though!).

Tip of the day: M-: (build-farm-build 1982454)

Ludo’.





* bug#54447: cuirass: missing derivation error
  2022-12-10 10:57 ` Ludovic Courtès
@ 2023-10-15 20:21   ` Ludovic Courtès
  2023-10-15 20:34     ` Ludovic Courtès
  0 siblings, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-15 20:21 UTC (permalink / raw)
  To: 54447

Hi!

Ludovic Courtès <ludo@gnu.org> skribis:

> Mathieu Othacehe <othacehe@gnu.org> skribis:
>
>> A lot of builds, among them ~20 system tests[1], are failing with:
>> "cannot build missing derivation
>> ?/gnu/store/hs6kp1lqgymhyp3jndc0dsp0pn4psgv0-gui-installed-desktop-os-encrypted.drv?"
>> errors.
>>
>> Those derivations are present on the CI head node. This means that the
>> errors occur during substitution. This is most likely caused by some
>> issue with the publish server, because:
>>
>> - The publish server serves a 404 error. We should get rid once and for
>>   all of this 404 thing, pushing something like:
>>   https://issues.guix.gnu.org/50040.
>>
>> or
>>
>> - The publish server is not fast enough and hits an Nginx timeout that
>>   closes the communication.
>
> Also being discussed at <https://issues.guix.gnu.org/48468#12>.

I got confirmation that the cache-bypass-threshold hypothesis holds, at
least for system tests.

Namely, looking at <https://ci.guix.gnu.org/build/2258097/details>,
which ends like this:

--8<---------------cut here---------------start------------->8---
@ substituter-succeeded /gnu/store/qh2876i5l1wvxgwhg9fbl9zmb3px3n2m-gc-roots.drv
fetching path `/gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder'...
@ substituter-started /gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder substitute
Downloading http://141.80.167.131/nar/lzip/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder...
.^[[K xdg-mime-database-builder                    3.6MiB/s 00:00 | 3KiB transferred.^[[K xdg-mime-database-builder                    1.9MiB/s 00:00 | 3KiB transferred

@ substituter-succeeded /gnu/store/fh9dnmrfsz429pwqmvsjnk0snlm959kc-xdg-mime-database-builder
cannot build missing derivation ‘/gnu/store/4r1wij3bzj9zv75ds82a93jl7bcman2x-installed-extlinux-os.drv’
--8<---------------cut here---------------end--------------->8---

Looking at the nginx and ‘guix publish’ logs, I found that the missing
substitute is not that of 4r1wij3bzj9zv75ds82a93jl7bcman2x (the .drv
itself) but rather that of a dependency of that .drv:

  [14/Oct/2023:23:22:09 +0200] "GET /wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg.narinfo HTTP/1.1" 404 58 "-" "GNU Guile"

That item’s size is above the cache bypass threshold of 100 MiB as
currently configured on berlin:

--8<---------------cut here---------------start------------->8---
$ du -hs /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
124M    /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
--8<---------------cut here---------------end--------------->8---

The immediate fix/workaround is to raise that threshold.

A better solution would be for system tests to depend on a fixed-output
derivation for the Guix source instead of the “source” above (I use
“source” as it is used in the context of <derivation>).

Thanks,
Ludo’.





* bug#54447: cuirass: missing derivation error
  2023-10-15 20:21   ` Ludovic Courtès
@ 2023-10-15 20:34     ` Ludovic Courtès
  0 siblings, 0 replies; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-15 20:34 UTC (permalink / raw)
  To: 54447

Ludovic Courtès <ludo@gnu.org> skribis:

> Looking at the nginx and ‘guix publish’ logs, I found that the missing
> substitute is not that of 4r1wij3bzj9zv75ds82a93jl7bcman2x (the .drv
> itself) but rather that of a dependency of that .drv:
>
>   [14/Oct/2023:23:22:09 +0200] "GET /wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg.narinfo HTTP/1.1" 404 58 "-" "GNU Guile"
>
> That item’s size is above the cache bypass threshold of 100 MiB as
> currently configured on berlin:
>
> $ du -hs /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
> 124M    /gnu/store/wqqzcxrhbnv0nzg64iiiqd5grr4vk2zg-guix-5a6b1a5
>
> The immediate fix/workaround is to raise that threshold.

I raised the threshold to 150 MiB in maintenance.git commit
213384e43de63ce3a5a55599e8fb89891ffef7eb.

I reconfigured berlin and restarted ‘guix publish’ seconds ago.
Hopefully next time installation tests won’t have that problem.
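
As a sanity check on the numbers: the 124M item is above the old 100 MiB
threshold and below the new 150 MiB one.  A minimal sketch of that
comparison follows, assuming sizes are compared in bytes (how ‘guix
publish’ measures an item is not shown in this thread) and using a
made-up helper name.

```shell
# Hypothetical helper: succeed if a size in bytes ($1) is strictly greater
# than a threshold expressed in MiB ($2).
exceeds_bypass_threshold() {
  [ "$1" -gt $(( $2 * 1024 * 1024 )) ]
}
```

With GNU du, something like `exceeds_bypass_threshold "$(du -sb ITEM |
cut -f1)" 150` could be used to spot other store items that would still
be above the raised threshold; note that `-b` reports apparent size,
which may differ slightly from the `du -hs` figure above.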

Ludo’.





* bug#54447: cuirass: missing derivation error
  2023-10-10 15:52 ` Ludovic Courtès
  2023-10-11  3:08   ` Maxim Cournoyer
@ 2023-10-15 20:42   ` Ludovic Courtès
  2023-10-16 17:44   ` Ludovic Courtès
  2 siblings, 0 replies; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-15 20:42 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 54447, guix-sysadmin

Ludovic Courtès <ludo@gnu.org> skribis:

> In addition to the mcron job, Cuirass’s own ‘register-gc-roots’
> procedure periodically deletes GC roots older than ‘%gc-roots-ttl’ (30
> days in practice).  That’s okay, except that it would be safer to delete
> GC roots for a .drv if and only if it’s been built already.

Fixed in Cuirass commit 55af0f70c0d4938b8eda777382bbc4d8f5698a37.

Ludo'.





* bug#54447: cuirass: missing derivation error
  2023-10-15 16:45       ` Ludovic Courtès
@ 2023-10-16 13:25         ` Maxim Cournoyer
  2023-10-16 17:39           ` Ludovic Courtès
  2023-11-20 19:09         ` Maxim Cournoyer
  1 sibling, 1 reply; 20+ messages in thread
From: Maxim Cournoyer @ 2023-10-16 13:25 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 宋文武, 54447

Hi,

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Another example: https://ci.guix.gnu.org/build/1982454/details
>>
>> substitute: 
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
>> cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
>
> This one is from Sep. 9, which is before I deployed the remote-worker
> fixes, so I’ll dismiss it (happy to look at more recent ones though!).
>
> Tip of the day: M-: (build-farm-build 1982454)

I don't have such a function in scope, is this from the guix-emacs
package?

-- 
Thanks,
Maxim





* bug#54447: cuirass: missing derivation error
  2023-10-16 13:25         ` Maxim Cournoyer
@ 2023-10-16 17:39           ` Ludovic Courtès
  0 siblings, 0 replies; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-16 17:39 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: Mathieu Othacehe, 宋文武, 54447

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

>> Tip of the day: M-: (build-farm-build 1982454)
>
> I don't have such a function in scope, is this from the guix-emacs
> package?

It’s from the ‘emacs-build-farm’ package, which I recommend.  :-)

Ludo’.





* bug#54447: cuirass: missing derivation error
  2023-10-10 15:52 ` Ludovic Courtès
  2023-10-11  3:08   ` Maxim Cournoyer
  2023-10-15 20:42   ` Ludovic Courtès
@ 2023-10-16 17:44   ` Ludovic Courtès
  2024-04-04 21:33     ` Ludovic Courtès
  2 siblings, 1 reply; 20+ messages in thread
From: Ludovic Courtès @ 2023-10-16 17:44 UTC (permalink / raw)
  To: Mathieu Othacehe; +Cc: 54447, guix-sysadmin, Maxim Cournoyer

Ludovic Courtès <ludo@gnu.org> skribis:

> Turns out that ‘cleanup-cuirass-roots’ in maintenance.git, used as an
> mcron job, explicitly removes GC roots for things like *-os-encrypted
> once they’re more than two days old, as well as GC roots for the
> corresponding .drv.
>
> I think this was increasing the likelihood that a .drv would be GC’d by
> the time we run the test: under high load¹, it’s plausible that a system
> test wouldn’t be built within two days after it’s been queued.
>
> I’m proposing the change below to address this; I don’t think we need
> ‘--gc-keep-outputs --gc-keep-derivations’ anymore now that we keep
> things in ‘guix publish’ cache first and foremost.

I pushed a variant of this patch:

  053839d hydra: services: Leave “guix-binary.tar.xz” GC roots.
  e40d961 hydra: services: Preserve Cuirass .drv GC roots.
  b8fc66c hydra: cuirass: Fix build product regexps.

I didn’t dare remove “--gc-keep-derivations”.  I reconfigured berlin
just now from this commit and restarted mcron (I didn’t restart
guix-daemon to avoid downtime; we should do that when the queue is close
to empty).

We’ll have to monitor disk usage to make sure it’s not negatively
affected.

Ludo’.





* bug#54447: cuirass: missing derivation error
  2023-10-15 16:45       ` Ludovic Courtès
  2023-10-16 13:25         ` Maxim Cournoyer
@ 2023-11-20 19:09         ` Maxim Cournoyer
  1 sibling, 0 replies; 20+ messages in thread
From: Maxim Cournoyer @ 2023-11-20 19:09 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: Mathieu Othacehe, 宋文武, 54447

Hi Ludovic,

Ludovic Courtès <ludo@gnu.org> writes:

> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> Another example: https://ci.guix.gnu.org/build/1982454/details
>>
>> substitute: 
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'...   0.0%
>> substitute: [Kupdating substitutes from 'http://10.0.0.1'... 100.0%
>> cannot build missing derivation ?/gnu/store/vwhgs9dkj9spryglb180j27dr5vidjxv-ecl-23.9.9.drv?
>
> This one is from Sep. 9, which is before I deployed the remote-worker
> fixes, so I’ll dismiss it (happy to look at more recent ones though!).

Here's a more recent occurrence:
https://ci.guix.gnu.org/build/2635272/details

I haven't restarted it to leave proof of its existence :-)

-- 
Thanks,
Maxim





* bug#54447: cuirass: missing derivation error
  2023-10-16 17:44   ` Ludovic Courtès
@ 2024-04-04 21:33     ` Ludovic Courtès
  0 siblings, 0 replies; 20+ messages in thread
From: Ludovic Courtès @ 2024-04-04 21:33 UTC (permalink / raw)
  To: 54447; +Cc: Mathieu Othacehe, guix-sysadmin, Maxim Cournoyer

Hello!

News from the everlasting bug!

  cannot build missing derivation ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’

(From <https://ci.guix.gnu.org/build/3861708/>.)

Why was it missing this time?  /var/log/nginx/error.log:

--8<---------------cut here---------------start------------->8---
2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo", host: "141.80.167.131"
--8<---------------cut here---------------end--------------->8---

Oops!  (There are dozens of upstream timeouts logged on that minute.)
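
A rough way to check the "dozens of timeouts in that minute" observation is to count upstream timeouts per minute in the nginx error log. The sketch below assumes the log lines look like the one quoted above; the function name `timeouts_per_minute` is made up for illustration, and the sample input is just that single quoted line rather than the real file.

```python
import re
from collections import Counter

# Captures "YYYY/MM/DD HH:MM" from an nginx error.log line that reports
# an upstream timeout (format assumed from the line quoted in this message).
LINE_RE = re.compile(
    r"^(\d{4}/\d{2}/\d{2} \d{2}:\d{2}):\d{2} \[error\].*upstream timed out"
)

def timeouts_per_minute(lines):
    """Map each minute to the number of upstream timeouts logged in it."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts

# Sample input: the one error.log line quoted above (shortened only by
# dropping the trailing upstream/host fields, which the regexp ignores).
sample = [
    '2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out '
    '(110: Connection timed out) while reading response header from upstream, '
    'client: 141.80.167.169, server: ci.guix.gnu.org, '
    'request: "GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1"',
]

counts = timeouts_per_minute(sample)
for minute, count in counts.most_common():
    print(minute, count)
```

On the real machine one would presumably pass `open("/var/log/nginx/error.log")` instead of `sample`; a minute with an unusually high count would point at a stall in the upstream rather than at a single slow request.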

/var/log/guix-publish.log:

--8<---------------cut here---------------start------------->8---
2024-04-04 17:14:51 GET /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
2024-04-04 17:15:33 GET /nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
2024-04-04 17:15:33 GET /nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
2024-04-04 17:15:33 GET /nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
--8<---------------cut here---------------end--------------->8---

‘guix publish’ replied, but 40s too late (nginx has
“proxy_connect_timeout 10s;” for .narinfo URLs¹).

Notice the 40s pause time between 17:14:51 and 17:15:33.  Stop-the-world
GC?  Unlikely, because ‘guix publish’ had been running for ~3h, so even
with a leak², it’s hard to believe GC could take this long.
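
The pause can also be located mechanically by looking for the largest gap between consecutive timestamps in the ‘guix publish’ log. This is only a sketch: it assumes every line starts with a "YYYY-MM-DD HH:MM:SS" timestamp as in the excerpt above, and `largest_gap` is a hypothetical helper, not something that exists in Guix.

```python
from datetime import datetime

def largest_gap(lines):
    """Return (seconds, earlier, later) for the longest pause between log lines."""
    stamps = [
        datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        for line in lines
        if line.strip()
    ]
    best = None
    for earlier, later in zip(stamps, stamps[1:]):
        seconds = (later - earlier).total_seconds()
        if best is None or seconds > best[0]:
            best = (seconds, earlier, later)
    return best

# Three lines taken from the guix-publish.log excerpt above.
log = """\
2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
""".splitlines()

gap, before, after = largest_gap(log)
print(f"{gap:.0f}s pause between {before:%H:%M:%S} and {after:%H:%M:%S}")
```

On this excerpt the gap comes out at 42 seconds, the "40s" mentioned above. The nginx error for the same narinfo is stamped 17:15:03, which falls inside that pause.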

Ludo’.

¹ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
² https://issues.guix.gnu.org/69596





* bug#54447: cuirass: missing derivation error
  2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
                   ` (3 preceding siblings ...)
  2023-10-10 15:52 ` Ludovic Courtès
@ 2024-04-14  0:15 ` John Kehayias via Bug reports for GNU Guix
  4 siblings, 0 replies; 20+ messages in thread
From: John Kehayias via Bug reports for GNU Guix @ 2024-04-14  0:15 UTC (permalink / raw)
  To: Ludovic Courtès
  Cc: 54447, guix-sysadmin, Maxim Cournoyer, Mathieu Othacehe

Hi all,

On Thu, Apr 04, 2024 at 11:33 PM, Ludovic Courtès wrote:

> Hello!
>
> News from the everlasting bug!
>
>   cannot build missing derivation
> ‘/gnu/store/dfgc46q3l8wlnymv49a1wjnxypin8p0y-plink-1.07.drv’
>
> (From <https://ci.guix.gnu.org/build/3861708/>.)
>
> Why was it missing this time?  /var/log/nginx/error.log:
>
> 2024/04/04 17:15:03 [error] 98751#0: *152293778 upstream timed out (110: Connection timed out) while reading response header from upstream, client: 141.80.167.169, server: ci.guix.gnu.org, request: "GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo HTTP/1.1", upstream: "http://127.0.0.1:3000/dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo", host: "141.80.167.131"
>
>
> Oops!  (There are dozens of upstream timeouts logged on that minute.)
>
> /var/log/guix-publish.log:
>
> 2024-04-04 17:14:51 GET /nar/lzip/pz39bkq7pd1hgy5rwiynqa33gyjvpgs5-python-pygments-2.12.0
> 2024-04-04 17:14:51 GET /z2xxwwxswdd4b8c8iwmxhqnqbp5nwz09.narinfo
> 2024-04-04 17:14:51 GET /lgyck285bsxzwrnh3x5ix5dwzd3n3wga.narinfo
> 2024-04-04 17:14:51 GET /nar/zstd/jxkglr445f215m2faqz1i2lgmbans4rf-texlive-amsmath-66594-doc
> 2024-04-04 17:15:33 GET /qg5cxb869i42jn7x2dm6k5l41ikkz21w.narinfo
> 2024-04-04 17:15:33 GET /nar/zstd/i2hp3q2pfhsyl0al7z38am7cqpddi4qr-texlive-capt-of-66594-doc
> 2024-04-04 17:15:33 GET /hh0gdbljj3cjdnjbr88kfm21mhys5sy7.narinfo
> 2024-04-04 17:15:33 GET /dfgc46q3l8wlnymv49a1wjnxypin8p0y.narinfo
> 2024-04-04 17:15:33 GET /yj63wifalfr6sla42h7mkqg011qrl5d0.narinfo
> 2024-04-04 17:15:33 GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo
> 2024-04-04 17:15:33 -> GET /h2s2g2adxbnd67g34mnjnpcr6p3nhr69.narinfo: 404
> 2024-04-04 17:15:33 GET /nar/lzip/6zxlrw15b9dsv73s7v5fqabl7iv5v5il-python-exceptiongroup-1.1.1
> 2024-04-04 17:15:33 GET /nar/zstd/pychjd114abscbqlzcr3s7myf1497vw2-julia-compilersupportlibraries-jll-0.4.0%2B1
>
> ‘guix publish’ replied, but 40s too late (nginx has
> “proxy_connect_timeout 10s;” for .narinfo URLs¹).
>
> Notice the 40s pause time between 17:14:51 and 17:15:33.  Stop-the-world
> GC?  Unlikely, because ‘guix publish’ had been running for ~3h, so even
> with a leak², it’s hard to believe GC could take this long.
>
> Ludo’.
>
> ¹ https://git.savannah.gnu.org/cgit/guix/maintenance.git/tree/hydra/nginx/berlin.scm#n103
> ² https://issues.guix.gnu.org/69596

I don't have any insight, but if anyone wants to see this in action at a
large scale, take a look at pretty much any red dot on
https://ci.guix.gnu.org/eval/1238471/dashboard?system=i686-linux

From my quick look, the CL and texlive failures were all missing
derivation errors. I've tried restarting a bunch to get i686 coverage going, so
hopefully some will disappear. But I can't/won't manually restart the
thousands(?) of failed builds. I didn't see such issues on x86_64, while
other architectures take a really long time to build on Berlin so I
haven't looked.

I don't know if this is helpful, but I thought I would chime in in case
anyone wants a potentially large amount of data. And if there are good
ideas to recover (just restart all builds?), that would be great, so that
mesa-updates gets built on i686, since otherwise it looks good.

Thanks!
John






Thread overview: 20+ messages
2022-03-18 12:36 bug#54447: cuirass: missing derivation error Mathieu Othacehe
2022-08-10  9:43 ` Maxime Devos
2022-08-10 15:30   ` Maxime Devos
2022-12-10 10:57 ` Ludovic Courtès
2023-10-15 20:21   ` Ludovic Courtès
2023-10-15 20:34     ` Ludovic Courtès
2023-08-22  3:38 ` Maxim Cournoyer
2023-08-22 20:38   ` Ludovic Courtès
2023-08-30 12:17   ` 宋文武 via Bug reports for GNU Guix
2023-10-11  3:21     ` Maxim Cournoyer
2023-10-15 16:45       ` Ludovic Courtès
2023-10-16 13:25         ` Maxim Cournoyer
2023-10-16 17:39           ` Ludovic Courtès
2023-11-20 19:09         ` Maxim Cournoyer
2023-10-10 15:52 ` Ludovic Courtès
2023-10-11  3:08   ` Maxim Cournoyer
2023-10-15 20:42   ` Ludovic Courtès
2023-10-16 17:44   ` Ludovic Courtès
2024-04-04 21:33     ` Ludovic Courtès
2024-04-14  0:15 ` John Kehayias via Bug reports for GNU Guix
