From mboxrd@z Thu Jan 1 00:00:00 1970 From: Ludovic =?UTF-8?Q?Court=C3=A8s?= Subject: bug#37237: mcron randomly stops running jobs Date: Wed, 18 Mar 2020 12:10:27 +0100 Message-ID: <878sjxapgc.fsf@gnu.org> References: <1DDB50D6-8CC4-4617-B2DC-80BCAADE2126@vllmrt.net> Mime-Version: 1.0 Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: quoted-printable Return-path: Received: from eggs.gnu.org ([2001:470:142:3::10]:37783) by lists.gnu.org with esmtp (Exim 4.90_1) (envelope-from ) id 1jEWbJ-0004Y1-4g for bug-guix@gnu.org; Wed, 18 Mar 2020 07:11:07 -0400 Received: from Debian-exim by eggs.gnu.org with spam-scanned (Exim 4.71) (envelope-from ) id 1jEWbG-0001I9-VT for bug-guix@gnu.org; Wed, 18 Mar 2020 07:11:05 -0400 Received: from debbugs.gnu.org ([209.51.188.43]:33910) by eggs.gnu.org with esmtps (TLS1.0:RSA_AES_128_CBC_SHA1:16) (Exim 4.71) (envelope-from ) id 1jEWbG-0001Ht-RL for bug-guix@gnu.org; Wed, 18 Mar 2020 07:11:02 -0400 Received: from Debian-debbugs by debbugs.gnu.org with local (Exim 4.84_2) (envelope-from ) id 1jEWbG-0000vv-M7 for bug-guix@gnu.org; Wed, 18 Mar 2020 07:11:02 -0400 Sender: "Debbugs-submit" Resent-To: bug-guix@gnu.org Resent-Message-ID: In-Reply-To: <1DDB50D6-8CC4-4617-B2DC-80BCAADE2126@vllmrt.net> (Robert Vollmert's message of "Fri, 30 Aug 2019 19:31:44 +0200") List-Id: Bug reports for GNU Guix List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , Errors-To: bug-guix-bounces+gcggb-bug-guix=m.gmane-mx.org@gnu.org Sender: "bug-Guix" To: Robert Vollmert Cc: 37237-done@debbugs.gnu.org Hi, Robert Vollmert skribis: > I have numerous mcron jobs configured, many of which run every 15 minutes, > and one which runs once a minute. Just now, at 19:22, I noticed the minut= ely > cron job didn=E2=80=99t seem to run, and saw that one every-15-minute job= which always > logs to mcron.log had last run at 18:00. I restarted mcron, and jobs star= ted > running again. There is no timezone confusion either: That every-15-minut= e job > logged its first run at 19:30. When mcron is stuck, it=E2=80=99s apparently stuck in =E2=80=98pthread_join= =E2=80=99, waiting for the finalization thread to terminate: --8<---------------cut here---------------start------------->8--- (gdb) bt #0 0x00007f7f730d8478 in __GI___pthread_timedjoin_ex (threadid=3D140185229= 293312, thread_return=3Dthread_return@entry=3D0x0, abstime=3Dabstime@entry= =3D0x0,=20 block=3Dblock@entry=3Dtrue) at pthread_join_common.c:84 #1 0x00007f7f730d82dc in __pthread_join (threadid=3D, threa= d_return=3Dthread_return@entry=3D0x0) at pthread_join.c:24 #2 0x00007f7f731d7508 in stop_finalization_thread () at finalizers.c:265 #3 0x00007f7f731d7729 in scm_i_finalizer_pre_fork () at finalizers.c:290 #4 0x00007f7f73250426 in scm_fork () at posix.c:1220 #5 0x00007f7f7324979f in vm_regular_engine (thread=3D0x7f7f6acb79d0, vp=3D= 0x7f7f6b299f30, registers=3D0xafaf, resume=3D1930265720) at vm-engine.c:786 #6 0x00007f7f7324bfd7 in scm_call_n (proc=3D#, argv= =3Dargv@entry=3D0x7ffd9b10bed8, nargs=3Dnargs@entry=3D1) at vm.c:1260 #7 0x00007f7f731cdf58 in scm_call_1 (proc=3D, arg1=3D) at eval.c:485 #8 0x000000000040130c in inner_main () #9 0x00007f7f731e5cfd in invoke_main_func (body_data=3D0x7ffd9b10c3b0) at = init.c:341 #10 0x00007f7f731c835a in c_body (d=3D0x7ffd9b10c2f0) at continuations.c:422 #11 0x00007f7f7324979f in vm_regular_engine (thread=3D0x7f7f6acb79d0, vp=3D= 0x7f7f6b299f30, registers=3D0xafaf, resume=3D1930265720) at vm-engine.c:786 #12 0x00007f7f7324bfd7 in scm_call_n (proc=3Dproc@entry=3D#, argv=3Dargv@entry=3D0x0, nargs=3Dnargs@entry=3D0) at vm.= c:1260 #13 0x00007f7f731cdf39 in scm_call_0 (proc=3Dproc@entry=3D#) at eval.c:479 #14 0x00007f7f7323ac12 in catch (tag=3Dtag@entry=3D#t, thunk=3D#,=20 handler=3D0x7f7f6b2a1420,=20 pre_unwind_handler=3D0x7f7f6b2a1400) at throw.c:137 #15 0x00007f7f7323aef5 in scm_catch_with_pre_unwind_handler (key=3Dkey@entr= y=3D#t, thunk=3D, handler=3D, pre_unwind_hand= ler=3D) at throw.c:254 #16 0x00007f7f7323b0bf in scm_c_catch (tag=3Dtag@entry=3D#t, body=3Dbody@en= try=3D0x7f7f731c8350 , body_data=3Dbody_data@entry=3D0x7ffd9b10c2f0= ,=20 handler=3Dhandler@entry=3D0x7f7f731c85e0 , handler_data=3Dha= ndler_data@entry=3D0x7ffd9b10c2f0,=20 pre_unwind_handler=3Dpre_unwind_handler@entry=3D0x7f7f731c8440 , pre_unwind_handler_data=3D0x7f7f6b2939a0) at throw.c:377 #17 0x00007f7f731c8940 in scm_i_with_continuation_barrier (body=3Dbody@entr= y=3D0x7f7f731c8350 , body_data=3Dbody_data@entry=3D0x7ffd9b10c2f0,= =20 handler=3Dhandler@entry=3D0x7f7f731c85e0 , handler_data=3Dha= ndler_data@entry=3D0x7ffd9b10c2f0,=20 pre_unwind_handler=3Dpre_unwind_handler@entry=3D0x7f7f731c8440 , pre_unwind_handler_data=3D0x7f7f6b2939a0) at continuations.c:= 360 #18 0x00007f7f731c89d5 in scm_c_with_continuation_barrier (func=3D, data=3D) at continuations.c:456 #19 0x00007f7f7323984c in with_guile (base=3Dbase@entry=3D0x7ffd9b10c358, d= ata=3Ddata@entry=3D0x7ffd9b10c380) at threads.c:661 #20 0x00007f7f73127a68 in GC_call_with_stack_base (fn=3Dfn@entry=3D0x7f7f73= 239800 , arg=3Darg@entry=3D0x7ffd9b10c380) at misc.c:1941 #21 0x00007f7f73239b68 in scm_i_with_guile (dynamic_state=3D= , data=3Ddata@entry=3D0x7ffd9b10c380, func=3Dfunc@entry=3D0x7f7f731e5ce0 ) at threads.c:704 #22 scm_with_guile (func=3Dfunc@entry=3D0x7f7f731e5ce0 , = data=3Ddata@entry=3D0x7ffd9b10c3b0) at threads.c:710 #23 0x00007f7f731e5e92 in scm_boot_guile (argc=3D12, argv=3D0x7ffd9b10c4f8,= main_func=3D0x4012a0 , closure=3D0x0) at init.c:324 #24 0x000000000040118b in main () --8<---------------cut here---------------end--------------->8--- The finalization thread lives its life as if it hadn=E2=80=99t received the =E2=80=9Cstop=E2=80=9D message in its pipe: --8<---------------cut here---------------start------------->8--- (gdb) thread 17 [Switching to thread 17 (Thread 0x7f7f6acb7700 (LWP 44975))] #0 0x00007f7f730e0344 in __libc_read (fd=3D5, buf=3Dbuf@entry=3D0x7f7f6acb= 6a40, nbytes=3Dnbytes@entry=3D1) at ../sysdeps/unix/sysv/linux/read.c:26 26 ../sysdeps/unix/sysv/linux/read.c: No such file or directory. (gdb) bt #0 0x00007f7f730e0344 in __libc_read (fd=3D5, buf=3Dbuf@entry=3D0x7f7f6acb= 6a40, nbytes=3Dnbytes@entry=3D1) at ../sysdeps/unix/sysv/linux/read.c:26 #1 0x00007f7f731d7497 in read_finalization_pipe_data (data=3D0x7f7f6acb6a4= 0) at finalizers.c:199 #2 0x00007f7f7312d503 in GC_do_blocking_inner (data=3D0x7f7f6acb6a00 "\200= t\035s\177\177", context=3D) at pthread_support.c:1362 #3 0x00007f7f73121d62 in GC_with_callee_saves_pushed (fn=3D0x7f7f7312d4c0 = , arg=3Darg@entry=3D0x7f7f6acb6a00 "\200t\035s\177\17= 7") at mach_dep.c:328 #4 0x00007f7f73127a9c in GC_do_blocking (fn=3Dfn@entry=3D0x7f7f731d7480 , client_data=3Dclient_data@entry=3D0x7f7f6acb6a= 40) at misc.c:2053 #5 0x00007f7f73239bba in scm_without_guile (func=3D0x7f7f731d7480 , data=3D0x7f7f6acb6a40) at threads.c:722 #6 0x00007f7f731d784b in finalization_thread_proc (unused=3D) at finalizers.c:212 #7 0x00007f7f731c835a in c_body (d=3D0x7f7f6acb6e50) at continuations.c:422 #8 0x00007f7f7324979f in vm_regular_engine (thread=3D0x5, vp=3D0x7f7f6b299= ea0, registers=3D0x1, resume=3D1930298180) at vm-engine.c:786 #9 0x00007f7f7324bfd7 in scm_call_n (proc=3Dproc@entry=3D#, argv=3Dargv@entry=3D0x0, nargs=3Dnargs@entry=3D0) at vm.= c:1260 #10 0x00007f7f731cdf39 in scm_call_0 (proc=3Dproc@entry=3D#) at eval.c:479 #11 0x00007f7f7323ac12 in catch (tag=3Dtag@entry=3D#t, thunk=3D#,=20 handler=3D0x7f7f6b22f740,=20 pre_unwind_handler=3D0x7f7f6b22f720) at throw.c:137 #12 0x00007f7f7323aef5 in scm_catch_with_pre_unwind_handler (key=3Dkey@entr= y=3D#t, thunk=3D, handler=3D, pre_unwind_hand= ler=3D) at throw.c:254 #13 0x00007f7f7323b0bf in scm_c_catch (tag=3Dtag@entry=3D#t, body=3Dbody@en= try=3D0x7f7f731c8350 , body_data=3Dbody_data@entry=3D0x7f7f6acb6e50= ,=20 handler=3Dhandler@entry=3D0x7f7f731c85e0 , handler_data=3Dha= ndler_data@entry=3D0x7f7f6acb6e50,=20 pre_unwind_handler=3Dpre_unwind_handler@entry=3D0x7f7f731c8440 , pre_unwind_handler_data=3D0x7f7f6b2939a0) at throw.c:377 #14 0x00007f7f731c8940 in scm_i_with_continuation_barrier (body=3Dbody@entr= y=3D0x7f7f731c8350 , body_data=3Dbody_data@entry=3D0x7f7f6acb6e50,= =20 handler=3Dhandler@entry=3D0x7f7f731c85e0 , handler_data=3Dha= ndler_data@entry=3D0x7f7f6acb6e50,=20 pre_unwind_handler=3Dpre_unwind_handler@entry=3D0x7f7f731c8440 , pre_unwind_handler_data=3D0x7f7f6b2939a0) at continuations.c:= 360 #15 0x00007f7f731c89d5 in scm_c_with_continuation_barrier (func=3D, data=3D) at continuations.c:456 #16 0x00007f7f7323984c in with_guile (base=3Dbase@entry=3D0x7f7f6acb6eb8, d= ata=3Ddata@entry=3D0x7f7f6acb6ee0) at threads.c:661 #17 0x00007f7f73127a68 in GC_call_with_stack_base (fn=3Dfn@entry=3D0x7f7f73= 239800 , arg=3Darg@entry=3D0x7f7f6acb6ee0) at misc.c:1941 #18 0x00007f7f73239b68 in scm_i_with_guile (dynamic_state=3D= , data=3D, func=3D) at threads.c:704 #19 scm_with_guile (func=3D, data=3D) at thre= ads.c:710 #20 0x00007f7f730d7015 in start_thread (arg=3D0x7f7f6acb7700) at pthread_cr= eate.c:486 #21 0x00007f7f7300891f in clone () at ../sysdeps/unix/sysv/linux/x86_64/clo= ne.S:95 --8<---------------cut here---------------end--------------->8--- This is on 2.2.6 and very likely the same issue as : =E2=80=98finalization_thread_proc=E2=80=99 ge= ts EINTR, by chance =E2=80=98data.byte=E2=80=99 is zero, and so it keeps going instead o= f exiting. Commit e002332b8a136638c2a4fc1a2f2a26541ce0211d moves mcron to 2.2.7, which should fix this problem. \o/ Ludo=E2=80=99.