Hello,

I've observed a few Cuirass crashes the past days. The log looks like:

--8<---------------cut here---------------start------------->8---
2020-09-11T12:55:35 next evaluation in 300 seconds
GC Warning: Repeated allocation of very large block (appr. size 28766208):
May lead to memory leak and poor performance
2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257
2020-09-11T13:00:35 fetching input 'core-updates' of spec 'core-updates-core-updates'
2020-09-11T13:00:54 fetched input 'core-updates' of spec 'core-updates-core-updates' (commit "1bec03df9b60f156c657a64a323ef27f4ed14b44")
2020-09-11T13:00:54 fetching input 'guix' of spec 'guix-master'
2020-09-11T13:01:13 fetched input 'guix' of spec 'guix-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
2020-09-11T13:01:13 evaluating spec 'guix-master'
2020-09-11T13:01:13 fetching input 'guix-modular' of spec 'guix-modular-master'
2020-09-11T13:01:17 fetched input 'guix-modular' of spec 'guix-modular-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
2020-09-11T13:01:17 evaluating spec 'guix-modular-master'
2020-09-11T13:01:17 fetching input 'kernel-updates' of spec 'kernel-updates'
2020-09-11T13:01:21 fetched input 'kernel-updates' of spec 'kernel-updates' (commit "1de80be489e443e7c0d8c79ea84762e1706e81ff")
2020-09-11T13:01:21 fetching input 'staging' of spec 'staging-staging'
2020-09-11T13:01:24 fetched input 'staging' of spec 'staging-staging' (commit "de3c03a47160dec355d9b19ad5ca210d90c15fd7")
2020-09-11T13:01:24 fetching input 'version-1.0.1' of spec 'version-1.0.1'
2020-09-11T13:01:27 fetched input 'version-1.0.1' of spec 'version-1.0.1' (commit "58d7909c97c1ab2457faee1d7af925ee32ad15c2")
2020-09-11T13:01:27 fetching input 'version-1.1.0' of spec 'version-1.1.0'
mmap(PROT_NONE) failed
WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'
2020-09-11T13:01:30 performing database optimizations
--8<---------------cut here---------------end--------------->8---

It looks like a memory allocation failed causing a Cuirass/Guile crash.

Thanks,

Mathieu
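The log above includes a periodic resource line (heap size, thread count, file descriptors). To see whether the heap grows before a crash, those lines can be pulled out of a longer log. The following is only a minimal sketch: the line format is taken from the single sample in the log, and the names `STATS_RE` and `parse_stats` are hypothetical helpers, not part of Cuirass.

```python
import re
from datetime import datetime

# Matches resource lines such as:
# 2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257
# (format assumed from the one sample in the log above)
STATS_RE = re.compile(
    r"^(?P<ts>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}) "
    r"heap: (?P<heap>[\d.]+) MiB; "
    r"threads: (?P<threads>\d+); "
    r"file descriptors: (?P<fds>\d+)$"
)


def parse_stats(lines):
    """Return one dict per resource-usage line; all other lines are skipped."""
    samples = []
    for line in lines:
        m = STATS_RE.match(line.strip())
        if m:
            samples.append({
                "time": datetime.strptime(m["ts"], "%Y-%m-%dT%H:%M:%S"),
                "heap_mib": float(m["heap"]),
                "threads": int(m["threads"]),
                "fds": int(m["fds"]),
            })
    return samples


# Three lines copied from the log above; only the second is a resource line.
log = """\
2020-09-11T12:55:35 next evaluation in 300 seconds
2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257
2020-09-11T13:00:35 fetching input 'core-updates' of spec 'core-updates-core-updates'
"""
samples = parse_stats(log.splitlines())
```

Run over a full log file, the resulting list could be compared across restarts to check whether the heap figure climbs steadily before each `mmap(PROT_NONE) failed` message.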
Mathieu Othacehe <othacehe@gnu.org> writes:
> Hello,
>
> I've observed a few Cuirass crashes the past days. The log looks like:
>
> --8<---------------cut here---------------start------------->8---
> 2020-09-11T12:55:35 next evaluation in 300 seconds
> GC Warning: Repeated allocation of very large block (appr. size 28766208):
> May lead to memory leak and poor performance
> 2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257
> 2020-09-11T13:00:35 fetching input 'core-updates' of spec 'core-updates-core-updates'
> 2020-09-11T13:00:54 fetched input 'core-updates' of spec 'core-updates-core-updates' (commit "1bec03df9b60f156c657a64a323ef27f4ed14b44")
> 2020-09-11T13:00:54 fetching input 'guix' of spec 'guix-master'
> 2020-09-11T13:01:13 fetched input 'guix' of spec 'guix-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
> 2020-09-11T13:01:13 evaluating spec 'guix-master'
> 2020-09-11T13:01:13 fetching input 'guix-modular' of spec 'guix-modular-master'
> 2020-09-11T13:01:17 fetched input 'guix-modular' of spec 'guix-modular-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
> 2020-09-11T13:01:17 evaluating spec 'guix-modular-master'
> 2020-09-11T13:01:17 fetching input 'kernel-updates' of spec 'kernel-updates'
> 2020-09-11T13:01:21 fetched input 'kernel-updates' of spec 'kernel-updates' (commit "1de80be489e443e7c0d8c79ea84762e1706e81ff")
> 2020-09-11T13:01:21 fetching input 'staging' of spec 'staging-staging'
> 2020-09-11T13:01:24 fetched input 'staging' of spec 'staging-staging' (commit "de3c03a47160dec355d9b19ad5ca210d90c15fd7")
> 2020-09-11T13:01:24 fetching input 'version-1.0.1' of spec 'version-1.0.1'
> 2020-09-11T13:01:27 fetched input 'version-1.0.1' of spec 'version-1.0.1' (commit "58d7909c97c1ab2457faee1d7af925ee32ad15c2")
> 2020-09-11T13:01:27 fetching input 'version-1.1.0' of spec 'version-1.1.0'
> mmap(PROT_NONE) failed
> WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'
> 2020-09-11T13:01:30 performing database optimizations
> --8<---------------cut here---------------end--------------->8---
>
> It looks like a memory allocation failed causing a Cuirass/Guile crash.
On ci.guix.gnu.org? We have 188GiB RAM there according to free.
--
Ricardo
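A large amount of free RAM does not by itself rule out a failing mmap: according to the mmap(2) man page, ENOMEM is also returned when the process's maximum number of mappings would be exceeded. Nothing in this thread establishes that this is what happens here; the sketch below only shows how one could check it on Linux by comparing the number of lines in /proc/<pid>/maps with /proc/sys/vm/max_map_count. The function names are hypothetical.

```python
from pathlib import Path


def count_mappings(maps_text):
    """Count mappings in text formatted like /proc/<pid>/maps (one per line)."""
    return sum(1 for line in maps_text.splitlines() if line.strip())


def mapping_usage(pid):
    """Return (mappings in use, vm.max_map_count) for a Linux process.

    Assumes a Linux /proc filesystem and permission to read the
    target process's maps file.
    """
    used = count_mappings(Path(f"/proc/{pid}/maps").read_text())
    limit = int(Path("/proc/sys/vm/max_map_count").read_text())
    return used, limit
```

Calling `mapping_usage(pid)` with the PID of the running Cuirass process shortly before a crash would show whether the first number is close to the second.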
Mathieu Othacehe <othacehe@gnu.org> writes:

> Hello,
>
> I've observed a few Cuirass crashes the past days. The log looks like:
>
> --8<---------------cut here---------------start------------->8---
> 2020-09-11T12:55:35 next evaluation in 300 seconds
> GC Warning: Repeated allocation of very large block (appr. size 28766208):
> May lead to memory leak and poor performance
> 2020-09-11T12:58:52 heap: 942.38 MiB; threads: 110; file descriptors: 257
> 2020-09-11T13:00:35 fetching input 'core-updates' of spec 'core-updates-core-updates'
> 2020-09-11T13:00:54 fetched input 'core-updates' of spec 'core-updates-core-updates' (commit "1bec03df9b60f156c657a64a323ef27f4ed14b44")
> 2020-09-11T13:00:54 fetching input 'guix' of spec 'guix-master'
> 2020-09-11T13:01:13 fetched input 'guix' of spec 'guix-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
> 2020-09-11T13:01:13 evaluating spec 'guix-master'
> 2020-09-11T13:01:13 fetching input 'guix-modular' of spec 'guix-modular-master'
> 2020-09-11T13:01:17 fetched input 'guix-modular' of spec 'guix-modular-master' (commit "7daa99e52d94e409f05a874813bdf739709807a2")
> 2020-09-11T13:01:17 evaluating spec 'guix-modular-master'
> 2020-09-11T13:01:17 fetching input 'kernel-updates' of spec 'kernel-updates'
> 2020-09-11T13:01:21 fetched input 'kernel-updates' of spec 'kernel-updates' (commit "1de80be489e443e7c0d8c79ea84762e1706e81ff")
> 2020-09-11T13:01:21 fetching input 'staging' of spec 'staging-staging'
> 2020-09-11T13:01:24 fetched input 'staging' of spec 'staging-staging' (commit "de3c03a47160dec355d9b19ad5ca210d90c15fd7")
> 2020-09-11T13:01:24 fetching input 'version-1.0.1' of spec 'version-1.0.1'
> 2020-09-11T13:01:27 fetched input 'version-1.0.1' of spec 'version-1.0.1' (commit "58d7909c97c1ab2457faee1d7af925ee32ad15c2")
> 2020-09-11T13:01:27 fetching input 'version-1.1.0' of spec 'version-1.1.0'
> mmap(PROT_NONE) failed
> WARNING: (guile-user): imported module (fibers) overrides core binding `sleep'
> 2020-09-11T13:01:30 performing database optimizations
> --8<---------------cut here---------------end--------------->8---
>
> It looks like a memory allocation failed causing a Cuirass/Guile crash.

So, I've seen this before but in a slightly different context, [1]. To
summarise, when using a Guile built with libgc@8, the Guix Data Service
couldn't process Guix revisions: the code it ran with that Guile
consistently crashed with this error. The workaround was to add a Guile
variant built with libgc@7 and use this for the guix package [2].

1: http://issues.guix.info/40525
2: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=40684

I'm not quite sure what Guile process is crashing here, but switching to
a Guile built with libgc@7 might help.
Hey Chris,
>> It looks like a memory allocation failed causing a Cuirass/Guile crash.
>
> So, I've seen this before but in a slightly different context, [1]. To
> summarise, when using a Guile built with libgc@8, the Guix Data Service
> couldn't process Guix revisions: the code it ran with that Guile
> consistently crashed with this error. The
> workaround was to add a Guile variant built with libgc@7 and use this
> for the guix package [2].
>
> 1: http://issues.guix.info/40525
> 2: https://debbugs.gnu.org/cgi/bugreport.cgi?bug=40684
>
> I'm not quite sure what Guile process is crashing here, but switching to
> use Guile built with libgc@7 might help.
Thanks for pointing to this, I somehow missed it at the time. I
collected an strace log, which indeed looks very similar:
--8<---------------cut here---------------start------------->8---
[pid 49511] getdents64(271, 0x7f5374304930 /* 455 entries */, 32768) = 32760
[pid 42583] mmap(0x7f5361976000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 42583] write(2, "mmap(PROT_NONE) failed", 22) = 22
[pid 42583] write(2, "\n", 1) = 1
[pid 42583] rt_sigprocmask(SIG_UNBLOCK, [ABRT], NULL, 8) = 0
[pid 42583] rt_sigprocmask(SIG_BLOCK, ~[RTMIN RT_1], [], 8) = 0
[pid 42583] getpid() = 42562
[pid 42583] gettid() = 42583
[pid 42583] tgkill(42562, 42583, SIGABRT) = 0
[pid 42583] rt_sigprocmask(SIG_SETMASK, [], NULL, 8) = 0
[pid 42583] --- SIGABRT {si_signo=SIGABRT, si_code=SI_TKILL, si_pid=42562, si_uid=997} ---
[pid 42738] <... read resumed> <unfinished ...>) = ?
--8<---------------cut here---------------end--------------->8---
The abort seems to be received by the finalizer thread. I can try to use
guile-3.0/libgc-7 to confirm this theory, but I guess we'll need to dig
deeper.
Thanks,
Mathieu
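The strace excerpt in the message above can also be read mechanically: the failing call is the mmap that returns ENOMEM, and `getpid()` returning 42562 while `gettid()` returns 42583 shows that the SIGABRT is raised from thread 42583 of process 42562. The sketch below extracts that from `strace -f` style lines. The regular expression only covers the simple `[pid N] name(args) = result` form seen in the excerpt, and the helper names are hypothetical.

```python
import re

# Matches strace -f lines such as:
# [pid 42583] tgkill(42562, 42583, SIGABRT) = 0
# Signal lines ("--- SIGABRT ... ---") and resumed calls are not matched.
CALL_RE = re.compile(
    r"^\[pid\s+(?P<tid>\d+)\] (?P<name>\w+)\((?P<args>.*)\)\s+= (?P<ret>.+)$"
)


def parse_calls(lines):
    """Return (tid, syscall name, argument string, result string) tuples."""
    calls = []
    for line in lines:
        m = CALL_RE.match(line.strip())
        if m:
            calls.append((int(m["tid"]), m["name"], m["args"], m["ret"]))
    return calls


def failed_calls(calls, errno_name):
    """Keep only the calls whose result mentions the given errno name."""
    return [c for c in calls if errno_name in c[3].split()]


# Lines copied from the strace excerpt above.
trace = r"""
[pid 42583] mmap(0x7f5361976000, 4096, PROT_NONE, MAP_PRIVATE|MAP_FIXED|MAP_ANONYMOUS, -1, 0) = -1 ENOMEM (Cannot allocate memory)
[pid 42583] write(2, "mmap(PROT_NONE) failed", 22) = 22
[pid 42583] getpid() = 42562
[pid 42583] gettid() = 42583
[pid 42583] tgkill(42562, 42583, SIGABRT) = 0
"""
calls = parse_calls(trace.splitlines())
failures = failed_calls(calls, "ENOMEM")
```

Here `failures` holds the single mmap call made by thread 42583. On a complete trace, the same filter would show whether every crash starts with a `MAP_FIXED`, `PROT_NONE` mapping request of this kind.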
Hello,

Closing as the Cuirass evaluation process now uses less memory.

Thanks,

Mathieu