all messages for Guix-related lists mirrored at yhetil.org
* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
@ 2023-05-08 10:45 Christopher Baines
  2023-05-10 12:47 ` Christopher Baines
                   ` (2 more replies)
  0 siblings, 3 replies; 9+ messages in thread
From: Christopher Baines @ 2023-05-08 10:45 UTC (permalink / raw)
  To: 63368


Since the recent core-updates merge, I've seen the build coordinator
using less memory, but it's also been crashing in a new way, up to 10
times a day.

In the log, you see something like:

  2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
  2023-05-07 09:15:42 Signals delivery fails constantly

I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
do with this.
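
To put a number on the "up to 10 times a day" figure from a log file, a minimal Python sketch follows. It assumes each log line starts with the `YYYY-MM-DD HH:MM:SS` prefix shown above and counts only the lines ending in `at GC #N`, so that each crash is counted once. The name `count_crashes_per_day` is made up for illustration and is not part of the build coordinator.

```python
import re
from collections import Counter

# Matches lines such as:
#   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
# The second log line of each crash (without "at GC #N") is deliberately
# not matched, so one crash yields one count.
CRASH_RE = re.compile(
    r"^\s*(\d{4}-\d{2}-\d{2}) \d{2}:\d{2}:\d{2} "
    r"Signals delivery fails constantly at GC #\d+\s*$"
)


def count_crashes_per_day(lines):
    """Return a Counter mapping 'YYYY-MM-DD' to the number of crash lines."""
    counts = Counter()
    for line in lines:
        match = CRASH_RE.match(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Example using the two lines quoted in the report.
sample = [
    "  2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051",
    "  2023-05-07 09:15:42 Signals delivery fails constantly",
]
for day, n in sorted(count_crashes_per_day(sample).items()):
    print(day, n)  # prints "2023-05-07 1"
```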


* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-05-08 10:45 bug#63368: Build coordiantor "Signals delivery fails constantly" crashes Christopher Baines
@ 2023-05-10 12:47 ` Christopher Baines
  2023-05-25 15:24 ` Ludovic Courtès
  2024-12-01 14:26 ` Ludovic Courtès
  2 siblings, 0 replies; 9+ messages in thread
From: Christopher Baines @ 2023-05-10 12:47 UTC (permalink / raw)
  To: 63368



Christopher Baines <mail@cbaines.net> writes:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

I think I've found a workaround. I found a list of environment variables
[1] you can set to affect the GC behaviour, and the first one I tried
(GC_RETRY_SIGNALS=0) seems to have had the desired effect, in that the
crashes/restarts have stopped.

1: https://github.com/ivmai/bdwgc/blob/master/docs/README.environment

I've sent a patch [2] to apply this setting as part of the service.

2: https://issues.guix.gnu.org/63417
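
For trying the same workaround outside the Guix service, here is a rough Python sketch of a wrapper that sets the variable before launching a process. The patch in [2] applies the setting in the service itself; the helper name `gc_workaround_env` and the wrapper approach are assumptions for illustration only.

```python
import os


def gc_workaround_env(base=None):
    """Return a copy of an environment with GC_RETRY_SIGNALS set to "0".

    BASE defaults to the current process environment; it is never
    modified in place.
    """
    env = dict(os.environ if base is None else base)
    env["GC_RETRY_SIGNALS"] = "0"
    return env


# Hypothetical usage: start a program with the workaround applied.
#
#   import subprocess
#   subprocess.run(["some-guile-program"], env=gc_workaround_env())
```

The variable has to be in the environment when the process starts, which is why the sketch builds the environment up front rather than changing it after launch.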


* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-05-08 10:45 bug#63368: Build coordiantor "Signals delivery fails constantly" crashes Christopher Baines
  2023-05-10 12:47 ` Christopher Baines
@ 2023-05-25 15:24 ` Ludovic Courtès
  2023-05-25 15:26   ` Christopher Baines
  2024-12-01 14:26 ` Ludovic Courtès
  2 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2023-05-25 15:24 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 63368

Hi,

Christopher Baines <mail@cbaines.net> skribis:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly
>
> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

Normally on GNU/Linux libgc has:

  #define SIG_SUSPEND SIGPWR

The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
which should normally be fine.

Is there anything else that might interfere with libgc?

Ludo’.





* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-05-25 15:24 ` Ludovic Courtès
@ 2023-05-25 15:26   ` Christopher Baines
  2023-06-02 17:07     ` Christopher Baines
  0 siblings, 1 reply; 9+ messages in thread
From: Christopher Baines @ 2023-05-25 15:26 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 63368



Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> Since the recent core-updates merge, I've seen the build coordinator
>> using less memory, but it's also been crashing in a new way, up to 10
>> times a day.
>>
>> In the log, you see something like:
>>
>>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>   2023-05-07 09:15:42 Signals delivery fails constantly
>>
>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>> do with this.
>
> Normally on GNU/Linux libgc has:
>
>   #define SIG_SUSPEND SIGPWR
>
> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
> which should normally be fine.
>
> Is there anything else that might interfere with libgc?

I've seen this issue in both the build coordinator and nar-herder, both
of which use guile-sqlite, so I wonder if that could have something to
do with it.


* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-05-25 15:26   ` Christopher Baines
@ 2023-06-02 17:07     ` Christopher Baines
  2023-06-06 15:09       ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Christopher Baines @ 2023-06-02 17:07 UTC (permalink / raw)
  To: 63368; +Cc: Ludovic Courtès



Christopher Baines <mail@cbaines.net> writes:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Christopher Baines <mail@cbaines.net> skribis:
>>
>>> Since the recent core-updates merge, I've seen the build coordinator
>>> using less memory, but it's also been crashing in a new way, up to 10
>>> times a day.
>>>
>>> In the log, you see something like:
>>>
>>>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>>>   2023-05-07 09:15:42 Signals delivery fails constantly
>>>
>>> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
>>> do with this.
>>
>> Normally on GNU/Linux libgc has:
>>
>>   #define SIG_SUSPEND SIGPWR
>>
>> The Coordinator fiddles with SIGALRM, SIGUSR1, SIGINT, and SIGPIPE,
>> which should normally be fine.
>>
>> Is there anything else that might interfere with libgc?
>
> I've seen this issue in both the build coordinator and nar-herder, both
> of which use guile-sqlite, so I wonder if that could have something to
> do with it.

I've seen this happen with the build coordinator agent now (on
milano-guix-1):

  2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
  2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
  2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
  2023-06-02 19:01:22 Signals delivery fails constantly
  2023-06-02 19:01:29 locale is en_US.utf8
  2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)

Which is a bit more concerning, since the build coordinator agent is
intentionally quite simple (no SQLite for example).


* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-06-02 17:07     ` Christopher Baines
@ 2023-06-06 15:09       ` Ludovic Courtès
  2023-06-06 15:19         ` Christopher Baines
  0 siblings, 1 reply; 9+ messages in thread
From: Ludovic Courtès @ 2023-06-06 15:09 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 63368

Christopher Baines <mail@cbaines.net> skribis:

> I've seen this happen with the build coordinator agent now (on
> milano-guix-1):
>
>   2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
>   2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>   2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>   2023-06-02 19:01:22 Signals delivery fails constantly
>   2023-06-02 19:01:29 locale is en_US.utf8
>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>
> Which is a bit more concerning, since the build coordinator agent is
> intentionally quite simple (no SQLite for example).

The closure of (guix-build-coordinator agent) seems to be quite large
still.

Could you check what .so files are loaded by that code, perhaps via
/proc/PID/maps?

Thanks,
Ludo’.





* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-06-06 15:09       ` Ludovic Courtès
@ 2023-06-06 15:19         ` Christopher Baines
  2023-06-09 13:14           ` Ludovic Courtès
  0 siblings, 1 reply; 9+ messages in thread
From: Christopher Baines @ 2023-06-06 15:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 63368



Ludovic Courtès <ludo@gnu.org> writes:

> Christopher Baines <mail@cbaines.net> skribis:
>
>> I've seen this happen with the build coordinator agent now (on
>> milano-guix-1):
>>
>>   2023-06-02 18:59:55 2023-06-02 18:59:55 (DEBUG): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: checking the availability of build inputs
>>   2023-06-02 18:59:55 2023-06-02 18:59:55 (INFO ): fb9f06cf-cc1d-4493-88b8-3eac9437f5d4: setup successful, building: /gnu/store/7fbrli2a8nzn676q8gz2b0i0y0lr9nxv-r-quasr-1.40.0.drv
>>   2023-06-02 19:00:46 Signals delivery fails constantly at GC #55
>>   2023-06-02 19:01:22 Signals delivery fails constantly
>>   2023-06-02 19:01:29 locale is en_US.utf8
>>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>
>> Which is a bit more concerning, since the build coordinator agent is
>> intentionally quite simple (no SQLite for example).
>
> The closure of (guix-build-coordinator agent) seems to be quite large
> still.
>
> Could you check what .so files are loaded by that code, perhaps via
> /proc/PID/maps?

I think I see these (that's on milano-guix-1 currently):

/gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
/gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
/gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
/gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
/gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
/gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
/gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
/gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
/gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
/gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
/gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
/gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
/gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
/gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
/gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
/gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
/gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1
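
A list like the one above can be produced from /proc/PID/maps with something along these lines. This is a sketch, not the command actually used here: it assumes the usual Linux maps layout (address, perms, offset, dev, inode, optional pathname), and the function names are made up.

```python
def shared_objects_from_maps(maps_text):
    """Return the sorted, de-duplicated .so paths found in maps text."""
    paths = set()
    for line in maps_text.splitlines():
        # Fields: address perms offset dev inode [pathname]
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping, no pathname
        path = fields[5].strip()
        name = path.rsplit("/", 1)[-1]
        # Keep file-backed mappings whose file name looks like a shared
        # object: "libfoo.so" or a versioned "libfoo.so.1.2".
        if path.startswith("/") and (name.endswith(".so") or ".so." in name):
            paths.add(path)
    return sorted(paths)


def shared_objects_for_pid(pid):
    """Read /proc/PID/maps (Linux only) and list the mapped .so files."""
    with open(f"/proc/{pid}/maps") as maps_file:
        return shared_objects_from_maps(maps_file.read())
```

Each library normally appears several times in maps (once per mapped segment), hence the de-duplication through a set.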


* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-06-06 15:19         ` Christopher Baines
@ 2023-06-09 13:14           ` Ludovic Courtès
  0 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2023-06-09 13:14 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 63368

Christopher Baines <mail@cbaines.net> skribis:

> Ludovic Courtès <ludo@gnu.org> writes:
>
>> Christopher Baines <mail@cbaines.net> skribis:

[...]

>>>   2023-06-02 19:01:22 Signals delivery fails constantly
>>>   2023-06-02 19:01:29 locale is en_US.utf8
>>>   2023-06-02 19:01:29 (gnutls version: 3.7.7, guix version: 1.4.0-6.dc5430c)
>>>
>>> Which is a bit more concerning, since the build coordinator agent is
>>> intentionally quite simple (no SQLite for example).
>>
>> The closure of (guix-build-coordinator agent) seems to be quite large
>> still.
>>
>> Could you check what .so files are loaded by that code, perhaps via
>> /proc/PID/maps?
>
> I think I see these (that's on milano-guix-1 currently):
>
> /gnu/store/0i81lpfnn05pmjc5f43q4nfvd27r08f7-guile-gnutls-3.7.12/lib/guile/3.0/extensions/guile-gnutls-v-2.so.0.0.0
> /gnu/store/0jk7sl5xqwwdkzjpp9sxgz9z0d48a3vy-libunistring-1.0/lib/libunistring.so.2.2.0
> /gnu/store/1r1azdi4hvfypnx14d01n60p4aa7g2im-libidn2-2.3.4/lib/libidn2.so.0.3.8
> /gnu/store/1w1r6r56z9lhg8ghcb7lxss6mkn7d5l1-libgc-8.2.2/lib/libgc.so.1.5.1
> /gnu/store/4gvgcfdiz67wv04ihqfa8pqwzsb0qpv5-guile-3.0.9/lib/libguile-3.0.so.1.6.0
> /gnu/store/8y0pwifz8a3d7zbdfzsawa1amf4afx1s-libgcrypt-1.10.1/lib/libgcrypt.so.20.4.1
> /gnu/store/930nwsiysdvy2x5zv1sf6v7ym75z8ayk-gcc-11.3.0-lib/lib/libgcc_s.so.1
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libhogweed.so.6.6
> /gnu/store/c2fx42ial6lr60s96xcbml5hd8vwaxq3-nettle-3.8.1/lib/libnettle.so.8.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/ld-linux-x86-64.so.2
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libcrypt.so.1
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libc.so.6
> /gnu/store/gsjczqir1wbz8p770zndrpw4rnppmxi3-glibc-2.35/lib/libm.so.6
> /gnu/store/ib2n2vzqpchc3bhh9i712w5sq9zapn8d-gmp-6.2.1/lib/libgmp.so.10.4.1
> /gnu/store/j5kzdjan6mnf2ngmkc50fia8vrbpqi9b-libtasn1-4.19.0/lib/libtasn1.so.6.6.3
> /gnu/store/k0p01a6b7hsxjfr65ga4f2gh6lh92aiq-lzlib-1.13/lib/liblz.so.1.13
> /gnu/store/m9wi9hcrf7f9dm4ri32vw1jrbh1csywi-libgpg-error-1.45/lib/libgpg-error.so.0.33.0
> /gnu/store/slzq3zqwj75lbrg4ly51hfhbv2vhryv5-zlib-1.2.13/lib/libz.so.1.2.13
> /gnu/store/vq7dxp5la2lnhsvniwv38j0ggvsmzim7-p11-kit-0.24.1/lib/libp11-kit.so.0.3.0
> /gnu/store/w8b0l8hk6g0fahj4fvmc4qqm3cvaxnmv-libffi-3.4.4/lib/libffi.so.8.1.2
> /gnu/store/yr4lbvdyc4dgs76yij1dw2w2z8s84af8-gnutls-3.7.7/lib/libgnutls.so.30.34.1


Hmm no idea.  I’ve never seen “Signals delivery fails” before so I
really wonder what could be causing this.  Would be great if you could
come up with a reduced test case, but I guess that won’t be easy.

Or perhaps you could run a Coordinator agent under ‘strace -f’ to see if
we get hints?

Ludo’.





* bug#63368: Build coordiantor "Signals delivery fails constantly" crashes
  2023-05-08 10:45 bug#63368: Build coordiantor "Signals delivery fails constantly" crashes Christopher Baines
  2023-05-10 12:47 ` Christopher Baines
  2023-05-25 15:24 ` Ludovic Courtès
@ 2024-12-01 14:26 ` Ludovic Courtès
  2 siblings, 0 replies; 9+ messages in thread
From: Ludovic Courtès @ 2024-12-01 14:26 UTC (permalink / raw)
  To: Christopher Baines; +Cc: 63368

Christopher Baines <mail@cbaines.net> skribis:

> Since the recent core-updates merge, I've seen the build coordinator
> using less memory, but it's also been crashing in a new way, up to 10
> times a day.
>
> In the log, you see something like:
>
>   2023-05-07 09:15:42 Signals delivery fails constantly at GC #71051
>   2023-05-07 09:15:42 Signals delivery fails constantly

Same with ‘guix publish’: https://issues.guix.gnu.org/74632

> I'm guessing the switch from libgc-8.0.4 to libgc-8.2.2 has something to
> do with this.

I’m not sure when these started to happen for ‘guix publish’.

Data point: the ‘guix publish’ instance at guix.bordeaux.inria.fr never
encountered this problem.  The main difference compared to ci.guix is
that it does not produce lzip archives.  (I see the Coordinator uses
Guile-Lzlib; maybe that’s a lead.)

Ludo’.



