unofficial mirror of bug-guix@gnu.org 
 help / color / mirror / code / Atom feed
* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
@ 2023-10-23  2:57 Maxim Cournoyer
  2023-10-23 20:49 ` Ludovic Courtès
       [not found] ` <handler.66692.D66692.169867163715469.notifdone@debbugs.gnu.org>
  0 siblings, 2 replies; 8+ messages in thread
From: Maxim Cournoyer @ 2023-10-23  2:57 UTC (permalink / raw)
  To: 66692

Hi,

I've noticed build failure events from Cuirass such as:

 --8<---------------cut here---------------start------------->8---
. [   ?: Cuirass                ] Build u-boot-sifive-unmatched.x86_64-linux on master is broken.
R. [   ?: Cuirass                ] Build u-boot-am335x-boneblack.x86_64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-a20-olinuxino-lime.x86_64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-wandboard.powerpc64le-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-qemu-riscv64-smode.aarch64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-rpi-2-efi.powerpc64le-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-qemu-arm.x86_64-linux on master is broken.
 . [   ?: Cuirass                ] Build u-boot-rpi-2.x86_64-linux on master is broken.
 --8<---------------cut here---------------end--------------->8---

There are a few more, and they were triggered by applying a patch (that
only touches the patman tool) in this commit:
<https://git.savannah.gnu.org/cgit/guix.git/log/?id=faeebdc37e9137d62e11175e83988f77496a6641>.

The build failures are not captured in the logs, which appear truncated,
e.g. from <https://ci.guix.gnu.org/build/2294052/details> (log empty --
builder: sjd-p9, worker N38vG7OW) or from
<https://ci.guix.gnu.org/build/2294218/details> (builder:
hydra-guix-107, worker 4CWieLpQ) which ends with:

--8<---------------cut here---------------start------------->8---
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_fixsfsi_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_fixsfsi.visT
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_fixunssfsi_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_fixunssfsi.visT
mv -f _arm_truncdfsf2.visT _arm_truncdfsf2.vis
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_floatdidf_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_floatdidf.visT
mv -f _arm_negsf2.visT _arm_negsf2.vis
mv -f _arm_addsubsf3.visT _arm_addsubsf3.vis
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_floatdisf_s.o | gawk 'NF == 3 && $2 !~ /^[UN]$/ && $3 !~ /.*_compat/ && $3 !~ /.*@.*/ { print "\t.hidden", $3 }' > _arm_floatdisf.visT
/tmp/guix-build-gcc-cross-arm-linux-gnueabihf-11.3.0.drv-0/build/./gcc/nm -pg _arm_floatundi
--8<---------------cut here---------------end--------------->8---

These appear to be some problem with the CI software (probably not
hardware, as many workers across different architectures suffered the
same problem) rather than packaging problems.

I didn't see anything useful in /var/log/cuirass.log or
/var/log/cuirass-remote-worker.log berlin.  I haven't checked the logs
on the builders.

-- 
Thanks,
Maxim




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
  2023-10-23  2:57 bug#66692: [cuirass] Spurious failed builds following patching u-boot sources Maxim Cournoyer
@ 2023-10-23 20:49 ` Ludovic Courtès
  2023-10-23 21:14   ` Ludovic Courtès
  2023-10-28  9:55   ` Ludovic Courtès
       [not found] ` <handler.66692.D66692.169867163715469.notifdone@debbugs.gnu.org>
  1 sibling, 2 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-23 20:49 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 66692

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> The build failures are not captured in the logs, which appear truncated,
> e.g. from <https://ci.guix.gnu.org/build/2294052/details> (log empty --
> builder: sjd-p9, worker N38vG7OW) 

On sjd-p9, I get:

--8<---------------cut here---------------start------------->8---
root@sjd-p9:~# journalctl -u cuirass-remote-worker.service |grep -C2 s167lh0w0azfxx83ri8k0mn2gj57i5rl
Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: derivation /gnu/store/kcv6hfqz94f1vdw24viw7d9ih6v9zi1j-libkmahjongg-23.04.3.drv build succeeded.
Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: request work.
Oct 23 00:06:21 sjd-p9 cuirass[531]: N38vG7OW: building derivation `/gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv' (system: powerpc64le-linux)
Oct 23 00:06:32 sjd-p9 cuirass[531]: C2q41fts: ping tcp://10.0.0.1:5555.
Oct 23 00:06:32 sjd-p9 cuirass[531]: BrnR2Sju: ping tcp://10.0.0.1:5555.
--
Oct 23 00:34:16 sjd-p9 cuirass[531]: vA6cYbb5: request work.
Oct 23 00:34:16 sjd-p9 cuirass[531]: vA6cYbb5: building derivation `/gnu/store/lj4h1g1zq235l6i8mjh88s1ylvj7ksxd-u-boot-pine64-plus-2023.07.02.drv' (system: powerpc64le-linux)
Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: derivation /gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv build failed.
Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: request work.
Oct 23 00:34:29 sjd-p9 cuirass[531]: N38vG7OW: building derivation `/gnu/store/dy8g5pi1gv8dza1av9957ic6zzj4l88j-u-boot-tools-2023.07.02.drv' (system: powerpc64le-linux)
root@sjd-p9:~# guix build --log-file /gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv
https://ci.guix.gnu.org/log/0yzfwl2b017gsnf60wvr7dnn2g26wrj7-u-boot-rpi-4-32b-2023.07.02
--8<---------------cut here---------------end--------------->8---

So it looks as though the build was attempted and failed, but there’s no
local build log in /var/log/guix/drvs, meaning that, most likely, no
build process was even started.

The log of guix-daemon provide few hints:

--8<---------------cut here---------------start------------->8---
root@sjd-p9:~# journalctl -u guix-daemon |grep 'Oct 23 00:'

[...]

Oct 23 00:06:21 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:06:21 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:06:55 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:07:51 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:16:25 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:17:15 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:33:42 sjd-p9 guix-daemon[2097426]: SIGPOLL
Oct 23 00:33:42 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:33:49 sjd-p9 guix-daemon[2097426]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:16 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:16 sjd-p9 guix-daemon[1990649]: SIGPOLL
Oct 23 00:34:23 sjd-p9 guix-daemon[1990649]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:29 sjd-p9 guix-daemon[2052532]: SIGPOLL
Oct 23 00:34:29 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:30 sjd-p9 guix-daemon[2052532]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:38 sjd-p9 guix-daemon[1996946]: SIGPOLL
Oct 23 00:34:38 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:38 sjd-p9 guix-daemon[1996946]: unexpected build daemon error: interrupted by the user
Oct 23 00:34:45 sjd-p9 guix-daemon[2248556]: SIGPOLL
Oct 23 00:34:45 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:34:45 sjd-p9 guix-daemon[2248556]: unexpected build daemon error: interrupted by the user
Oct 23 00:35:33 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
Oct 23 00:36:46 sjd-p9 guix-daemon[534]: accepted connection from pid 531, user root
--8<---------------cut here---------------end--------------->8---

One possibility would have been that ‘guix gc’ was running, preventing
the build process from starting (“big GC lock” taken); however, if this
were the case, we’d see a connection from a different process in the
guix-daemon log.

> or from <https://ci.guix.gnu.org/build/2294218/details> (builder:
> hydra-guix-107, worker 4CWieLpQ) which ends with:

Here we get:

--8<---------------cut here---------------start------------->8---
root@hydra-guix-107 ~# grep -C2 jy463789qf42dn4zk5i5jzry2lf82ypy /var/log/cuirass-remote-worker.log
2023-10-23 00:17:34 vDKeRcHy: no available build.
2023-10-23 00:17:46 4CWieLpQ: request work.
2023-10-23 00:17:46 4CWieLpQ: building derivation `/gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv' (system: x86_64-linux)
2023-10-23 00:18:03 zARGXQMa: request work.
2023-10-23 00:18:03 zARGXQMa: no available build.
--
2023-10-23 00:27:35 vDKeRcHy: request work.
2023-10-23 00:27:35 vDKeRcHy: no available build.
2023-10-23 00:27:42 4CWieLpQ: derivation /gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv build failed.
2023-10-23 00:27:42 4CWieLpQ: request work.
2023-10-23 00:27:42 4CWieLpQ: no available build.
root@hydra-guix-107 ~# ls /var/log/guix/drvs/*/*u-boot-cubietruck-2023*
ls: cannot access '/var/log/guix/drvs/*/*u-boot-cubietruck-2023*': No such file or directory
--8<---------------cut here---------------end--------------->8---

And likewise:

--8<---------------cut here---------------start------------->8---
root@hydra-guix-107 ~# grep -C2 '10-23 00:27:' /var/log/guix-daemon.log
2023-10-23 00:17:46 accepted connection from pid 64022, user root
2023-10-23 00:19:34 accepted connection from pid 64022, user root
2023-10-23 00:27:42 SIGPOLL
2023-10-23 00:27:45 unexpected build daemon error: interrupted by the user
2023-10-23 00:33:33 accepted connection from pid 18016, user hydra
2023-10-23 00:33:34 accepted connection from pid 18035, user hydra
--8<---------------cut here---------------end--------------->8---

But hey, as a matter of fact, ‘cuirass remote-server’ on berlin was
restarted at around that time, as can be seen in
/var/log/cuirass-remote-server.log:

--8<---------------cut here---------------start------------->8---
2023-10-23 00:26:06 0 items queued for eventual download
2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument
2023-10-23 00:26:59 increased maximum number of open files from 1024 to 2048
2023-10-23 00:26:59 listening for build logs on port 5556
2023-10-23 00:26:59 deleting old build logs from '/var/cache/cuirass/remote/'...
2023-10-23 00:28:49 selected 1115 build logs to remove
2023-10-23 00:28:49 WARNING: (cuirass scripts remote-server): imported module (fibers) overrides core binding `sleep'
2023-10-23 00:28:49 starting fetch worker with up to 8 concurrent downloads
2023-10-23 00:28:49 build failed: '/gnu/store/qlmhcyk9ac30464v99i2ys0irs7jz3lg-u-boot-am335x-evm-2023.07.02.drv'
2023-10-23 00:28:49 build failed: '/gnu/store/dl3far2xa4paq6nl6pp9iklpxrwg1wy2-ungoogled-chromium-wayland-112.0.5615.165-1.drv'
2023-10-23 00:28:50 build failed: '/gnu/store/y9fwzzgvyhvrmf0avihm6944j8rvpg6m-u-boot-mx6cuboxi-2023.07.02.drv'
2023-10-23 00:28:50 build failed: '/gnu/store/5bhl5q86j6bkqx8zybq29khndhf050jm-u-boot-rpi-2-2023.07.02.drv'
2023-10-23 00:28:50 build failed: '/gnu/store/16d5vq3dlzy1f4s7vazlfg6358bl8dlc-u-boot-qemu-arm-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/i88vczhcnlz462cmpls8qgx5n3ssw7rc-u-boot-qemu-arm-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/jy463789qf42dn4zk5i5jzry2lf82ypy-u-boot-cubietruck-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/i1ri2nkh1bcfzynmggbjg0rjlyi121y8-u-boot-rpi-3-32b-2023.07.02.drv'
2023-10-23 00:28:51 build failed: '/gnu/store/q08xw052r484c1c1nj31sx0fg5yf2py9-openjdk-18.0.2.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/5j4pp2b5dx2ykw4na2wbmiwp84g0hi4i-u-boot-am335x-boneblack-2023.07.02.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/nqp61id4bavvcsib307rw5234myl7ykr-u-boot-rpi-3-32b-efi-2023.07.02.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/9h7x080cn44ik79mrzycbg2x0dvvgigc-u-boot-a20-olinuxino-lime-2023.07.02.drv'
2023-10-23 00:28:52 build failed: '/gnu/store/mkwy8ywa6mrw5zdxwr88skyw94afgaqv-u-boot-novena-2023.07.02.drv'
2023-10-23 00:28:53 build failed: '/gnu/store/lfjp146k0z0w2yqd3y6m2mv09fq01948-u-boot-sifive-unmatched-2023.07.02.drv'
2023-10-23 00:28:53 build failed: '/gnu/store/425ykg0ci4i612ryc7z79as94vipp9rb-u-boot-rpi-2-efi-2023.07.02.drv'
2023-10-23 00:28:57 build failed: '/gnu/store/7fyq24jsmvclnapjgf1li43w6dmm5n06-u-boot-mx6cuboxi-2023.07.02.drv'
[…]
2023-10-23 00:34:29 build failed: '/gnu/store/s167lh0w0azfxx83ri8k0mn2gj57i5rl-u-boot-rpi-4-32b-2023.07.02.drv'
--8<---------------cut here---------------end--------------->8---

/var/log/messages shows that ‘remote-server’ failed and was respawned
(the failure manifests as “failed to start worker/database proxy” in the
cuirass-remote-server.log excerpt above):

--8<---------------cut here---------------start------------->8---
Oct 23 00:24:33 localhost vmunix: [12534110.756389] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 2 wanted 61709 found 11
Oct 23 00:24:33 localhost vmunix: [12534110.768173] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 1 wanted 61709 found 11
Oct 23 00:24:34 localhost vmunix: [12534110.779454] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 2 wanted 61709 found 11
Oct 23 00:24:59 localhost ntpd[1953]: Soliciting pool server 81.169.208.5
Oct 23 00:26:04 localhost ntpd[1953]: Soliciting pool server 141.98.136.83
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server (PID 92414) exited with 1. 
Oct 23 00:26:58 localhost shepherd[1]: Respawning cuirass-remote-server. 
Oct 23 00:26:58 localhost shepherd[1]: Starting service user-homes... 
Oct 23 00:26:58 localhost shepherd[1]: Service user-homes has been started. 
Oct 23 00:26:58 localhost shepherd[1]: Service user-homes started. 
Oct 23 00:26:58 localhost shepherd[1]: Service user-homes running with value #t. 
Oct 23 00:26:58 localhost shepherd[1]: Starting service postgres-roles... 
Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles has been started. 
Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles started. 
Oct 23 00:26:58 localhost shepherd[1]: Service postgres-roles running with value #t. 
Oct 23 00:26:58 localhost shepherd[1]: Starting service cuirass-remote-server... 
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server has been started. 
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server started. 
Oct 23 00:26:58 localhost shepherd[1]: Service cuirass-remote-server running with value 63052. 
Oct 23 00:26:59 localhost vmunix: [12534255.969754] validate_extent_buffer: 58 callbacks suppressed
Oct 23 00:26:59 localhost vmunix: [12534255.969764] BTRFS error (device sde2): parent transid verify failed on logical 30457856 mirror 1 wanted 61709 found 11
--8<---------------cut here---------------end--------------->8---

To summarize:

  1. ‘cuirass remote-server’ failed for some unknown reason and was
     respawned.

  2. All the ongoing and about-to-be-started builds carried out by
     worker processes at that point were marked as failed.

We can improve on both parts.

To be continued…

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
  2023-10-23 20:49 ` Ludovic Courtès
@ 2023-10-23 21:14   ` Ludovic Courtès
  2023-10-24  0:59     ` Maxim Cournoyer
  2023-10-28  9:55   ` Ludovic Courtès
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-23 21:14 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 66692

Ludovic Courtès <ludo@gnu.org> skribis:

> /var/log/cuirass-remote-server.log:
>
> 2023-10-23 00:26:06 0 items queued for eventual download
> 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
> 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument

EINVAL here most likely comes from replying to a sender with invalid ID,
due to ZMQ_ROUTER_MANDATORY, exactly what Cuirass commit
40f70d28aed55c404cca6a0760860fb4942e6bee fixed and aimed to catch.

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
  2023-10-23 21:14   ` Ludovic Courtès
@ 2023-10-24  0:59     ` Maxim Cournoyer
  2023-10-24 20:27       ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Maxim Cournoyer @ 2023-10-24  0:59 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 66692

Hi Ludovic!

Ludovic Courtès <ludo@gnu.org> writes:

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> /var/log/cuirass-remote-server.log:
>>
>> 2023-10-23 00:26:06 0 items queued for eventual download
>> 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
>> 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument
>
> EINVAL here most likely comes from replying to a sender with invalid ID,
> due to ZMQ_ROUTER_MANDATORY, exactly what Cuirass commit
> 40f70d28aed55c404cca6a0760860fb4942e6bee fixed and aimed to catch.

Thanks for picking up the investigation.  Does that mean that the above
fix was partial and that there likely is a use after free somewhere
else?

It's good at least that this ZMQ_ROUTER_MANDATORY option catches that.

(Also, these scary Btrfs errors are a reminder that the file system got
corrupted at some point and should be recreated at some point -- I hope
they are not related to the problem at hand but we should keep an eye on
things).

-- 
Thanks,
Maxim




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
  2023-10-24  0:59     ` Maxim Cournoyer
@ 2023-10-24 20:27       ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-24 20:27 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 66692

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Ludovic Courtès <ludo@gnu.org> skribis:
>>
>>> /var/log/cuirass-remote-server.log:
>>>
>>> 2023-10-23 00:26:06 0 items queued for eventual download
>>> 2023-10-23 00:26:55 period update: 0 resumable, 0 failed builds.
>>> 2023-10-23 00:26:58 error: failed to start worker/database proxy: Invalid argument
>>
>> EINVAL here most likely comes from replying to a sender with invalid ID,
>> due to ZMQ_ROUTER_MANDATORY, exactly what Cuirass commit
>> 40f70d28aed55c404cca6a0760860fb4942e6bee fixed and aimed to catch.
>
> Thanks for picking up the investigation.  Does that mean that the above
> fix was partial and that there likely is a use after free somewhere
> else?

Yes, the fix in Cuirass above was insufficient, though you had to be
truly unlucky for use-after-free to happen this quickly.

Anyway, I had also proposed a fix in guile-simple-zmq, which the upgrade
in Guix commit 04215779baaa21f10bd5f20999761dc506e449d8 pulls in.

I have yet to reconfigure on berlin.

In the meantime I’ll look at the worker side of this issue.

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
  2023-10-23 20:49 ` Ludovic Courtès
  2023-10-23 21:14   ` Ludovic Courtès
@ 2023-10-28  9:55   ` Ludovic Courtès
  2023-10-30 13:13     ` Ludovic Courtès
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-28  9:55 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 66692

Hi,

Ludovic Courtès <ludo@gnu.org> skribis:

>   2. All the ongoing and about-to-be-started builds carried out by
>      worker processes at that point were marked as failed.

This part of the problem was a bit more subtle.q

When ‘cuirass remote-server’ goes away, ‘cuirass remote-worker’ keeps
going if it was building something.  In that case the build log will be
lost (because the direct connection to the server was cut), but the
worker eventually notifies the server of build success/failure (ZeroMQ,
makes sure those messages are eventually sent, when the server is back).

The bug was that the worker could in some cases, when the server is
down, send a “build failed” message even though the bug had not
completed.

This is fixed by these Cuirass commits:

  2eb3e13 * remote-worker: Discard log anytime ‘send-log’ throws.
  4cc37f5 * remote-worker: Ignore the return value of ‘build-derivations&’.
  3bbb5c8 * store: ‘build-derivations&’ enforces synchronization with the build.

(I have yet to update the ‘cuirass’ package and to deploy it on ci.guix.
It’s been running for a few days at <https://guix.bordeaux.inria.fr>.)

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: [cuirass] Spurious failed builds following patching u-boot sources
  2023-10-28  9:55   ` Ludovic Courtès
@ 2023-10-30 13:13     ` Ludovic Courtès
  0 siblings, 0 replies; 8+ messages in thread
From: Ludovic Courtès @ 2023-10-30 13:13 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 66692-done

Ludovic Courtès <ludo@gnu.org> skribis:

> This is fixed by these Cuirass commits:
>
>   2eb3e13 * remote-worker: Discard log anytime ‘send-log’ throws.
>   4cc37f5 * remote-worker: Ignore the return value of ‘build-derivations&’.
>   3bbb5c8 * store: ‘build-derivations&’ enforces synchronization with the build.
>
> (I have yet to update the ‘cuirass’ package and to deploy it on ci.guix.
> It’s been running for a few days at <https://guix.bordeaux.inria.fr>.)

I’ve done that just a couple of hours ago, running Cuirass 1.2.0 on both
berlin and the x86_64 nodes listed in ‘berlin-nodes.scm’.

Ludo’.




^ permalink raw reply	[flat|nested] 8+ messages in thread

* bug#66692: closed (Re: bug#66692: [cuirass] Spurious failed builds following patching u-boot sources)
       [not found] ` <handler.66692.D66692.169867163715469.notifdone@debbugs.gnu.org>
@ 2023-10-31 13:31   ` Maxim Cournoyer
  0 siblings, 0 replies; 8+ messages in thread
From: Maxim Cournoyer @ 2023-10-31 13:31 UTC (permalink / raw)
  Cc: 66692, Ludovic Courtès

Hi,

> Ludovic Courtès <ludo@gnu.org> skribis:
>
>> This is fixed by these Cuirass commits:
>>
>>   2eb3e13 * remote-worker: Discard log anytime ‘send-log’ throws.
>>   4cc37f5 * remote-worker: Ignore the return value of ‘build-derivations&’.
>>   3bbb5c8 * store: ‘build-derivations&’ enforces synchronization with the build.
>>
>> (I have yet to update the ‘cuirass’ package and to deploy it on ci.guix.
>> It’s been running for a few days at <https://guix.bordeaux.inria.fr>.)
>
> I’ve done that just a couple of hours ago, running Cuirass 1.2.0 on both
> berlin and the x86_64 nodes listed in ‘berlin-nodes.scm’.

Awesome, thanks for all the fixes toward Cuirass!

-- 
Maxim




^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2023-10-31 13:32 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2023-10-23  2:57 bug#66692: [cuirass] Spurious failed builds following patching u-boot sources Maxim Cournoyer
2023-10-23 20:49 ` Ludovic Courtès
2023-10-23 21:14   ` Ludovic Courtès
2023-10-24  0:59     ` Maxim Cournoyer
2023-10-24 20:27       ` Ludovic Courtès
2023-10-28  9:55   ` Ludovic Courtès
2023-10-30 13:13     ` Ludovic Courtès
     [not found] ` <handler.66692.D66692.169867163715469.notifdone@debbugs.gnu.org>
2023-10-31 13:31   ` bug#66692: closed (Re: bug#66692: [cuirass] Spurious failed builds following patching u-boot sources) Maxim Cournoyer

Code repositories for project(s) associated with this public inbox

	https://git.savannah.gnu.org/cgit/guix.git

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).