* bug#52779: tests/no-home test failure in Shepherd
@ 2021-12-24 14:52 Maxim Cournoyer
  2022-01-03 21:09 ` Maxim Cournoyer
  2022-01-13 15:37 ` Ludovic Courtès
  0 siblings, 2 replies; 6+ messages in thread
From: Maxim Cournoyer @ 2021-12-24 14:52 UTC
  To: 52779

Hello Guix,

I encountered the following (nondeterministic) test failure:

--8<---------------cut here---------------start------------->8---
# TOTAL: 13
# PASS: 12
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/no-home
===================

+ shepherd --version
shepherd (GNU Shepherd) 0.8.1
Copyright (C) 2020 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ HOME=/nowhere
+ export HOME
+ test -w /nowhere
+ socket=t-socket-1651
+ pid=t-pid-1651
+ herd='herd -s t-socket-1651'
+ trap 'rm -f t-socket-1651;
  test -f t-pid-1651 && kill `cat t-pid-1651` || true; rm -f t-pid-1651' EXIT
+ shepherd_pid=1896
+ test -f t-pid-1651
+ shepherd -I -s t-socket-1651 -c /dev/null -l /dev/null --pid=t-pid-1651
+ kill -0 1896
+ sleep 0.3
Service root has been started.
+ test -f t-pid-1651
++ cat t-pid-1651
+ kill -0 1896
+ herd -s t-socket-1651 status root
Started:
 + root
+ herd -s t-socket-1651 stop root
++ cat t-pid-1651
+ kill 1896
+ exit 1
+ rm -f t-socket-1651
+ test -f t-pid-1651
++ cat t-pid-1651
+ kill 1896
+ rm -f t-pid-1651
FAIL tests/no-home.sh (exit status: 1)

error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f>
phase `check' failed after 15.3 seconds
command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
--8<---------------cut here---------------end--------------->8---

Perhaps due to high load, as I can't seem to reproduce when building
Shepherd alone.

Thanks,

Maxim

* bug#52779: tests/no-home test failure in Shepherd
  2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
@ 2022-01-03 21:09 ` Maxim Cournoyer
  2022-01-10 18:17   ` Maxim Cournoyer
  2022-01-13 15:37 ` Ludovic Courtès
  1 sibling, 1 reply; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-03 21:09 UTC
  To: 52779

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hello Guix,
>
> I encountered the following (nondeterministic) test failure:
>
> # TOTAL: 13
> # PASS: 12
> # SKIP: 0
> # XFAIL: 0
> # FAIL: 1
> # XPASS: 0
> # ERROR: 0
>
> .. contents:: :depth: 2
>
> FAIL: tests/no-home
> ===================
>
> + shepherd --version
> shepherd (GNU Shepherd) 0.8.1
> Copyright (C) 2020 the Shepherd authors
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> + HOME=/nowhere
> + export HOME
> + test -w /nowhere
> + socket=t-socket-1651
> + pid=t-pid-1651
> + herd='herd -s t-socket-1651'
> + trap 'rm -f t-socket-1651;
>   test -f t-pid-1651 && kill `cat t-pid-1651` || true; rm -f t-pid-1651' EXIT
> + shepherd_pid=1896
> + test -f t-pid-1651
> + shepherd -I -s t-socket-1651 -c /dev/null -l /dev/null --pid=t-pid-1651
> + kill -0 1896
> + sleep 0.3
> Service root has been started.
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill -0 1896
> + herd -s t-socket-1651 status root
> Started:
>  + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)
>
>
> error: in phase 'check': uncaught exception:
> %exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f>
> phase `check' failed after 15.3 seconds
> command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
>
> Perhaps due to high load, as I can't seem to reproduce when building
> Shepherd alone.

I encountered the problem once more; the errors are slightly different
so I'm pasting the failed test output here:

--8<---------------cut here---------------start------------->8---
FAIL: tests/no-home
===================

+ shepherd --version
shepherd (GNU Shepherd) 0.8.1
Copyright (C) 2020 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ HOME=/nowhere
+ export HOME
+ test -w /nowhere
+ socket=t-socket-1606
+ pid=t-pid-1606
+ herd='herd -s t-socket-1606'
+ trap 'rm -f t-socket-1606;
  test -f t-pid-1606 && kill `cat t-pid-1606` || true; rm -f t-pid-1606' EXIT
+ shepherd_pid=1868
+ test -f t-pid-1606
+ shepherd -I -s t-socket-1606 -c /dev/null -l /dev/null --pid=t-pid-1606
+ kill -0 1868
+ sleep 0.3
Service root has been started.
+ test -f t-pid-1606
++ cat t-pid-1606
+ kill -0 1868
+ herd -s t-socket-1606 status root
Started:
 + root
+ herd -s t-socket-1606 stop root
++ cat t-pid-1606
+ kill 1868
+ exit 1
+ rm -f t-socket-1606
+ test -f t-pid-1606
++ cat t-pid-1606
+ kill 1868
./tests/no-home.sh: line 2: kill: (1868) - No such process
+ true
+ rm -f t-pid-1606
FAIL tests/no-home.sh (exit status: 1)

error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f>
phase `check' failed after 15.6 seconds
command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
--8<---------------cut here---------------end--------------->8---

Thanks,

Maxim

* bug#52779: tests/no-home test failure in Shepherd
  2022-01-03 21:09 ` Maxim Cournoyer
@ 2022-01-10 18:17   ` Maxim Cournoyer
  0 siblings, 0 replies; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-10 18:17 UTC
  To: 52779

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hello Guix,
>>
>> I encountered the following (nondeterministic) test failure:
>>
>> # TOTAL: 13
>> # PASS: 12
>> # SKIP: 0
>> # XFAIL: 0
>> # FAIL: 1
>> # XPASS: 0
>> # ERROR: 0
>>
>> .. contents:: :depth: 2
>>
>> FAIL: tests/no-home
>> ===================

[...]

I tried to reproduce this in a controlled setting, using the 'stress'
command to put the system under high load, but in vain. I could only
trigger the problem on a highly parallel CPU (24 logical cores) busy
building tens of Guix packages at the same time.

Thanks,

Maxim

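A failure this rare is easier to chase with a loop that reruns the test until it first fails. The following is only a minimal sketch, assuming a POSIX shell; `run_until_failure` is a hypothetical helper and not part of Shepherd or its test suite.

```shell
#!/bin/sh
# Hypothetical helper (not part of Shepherd): run a command up to MAX
# times and stop at the first failing run.
# Usage: run_until_failure MAX COMMAND [ARGS...]
# Leaves the number of runs performed in $runs; returns 1 if a run failed.
run_until_failure() {
    max=$1
    shift
    runs=0
    while [ "$runs" -lt "$max" ]; do
        runs=$((runs + 1))
        if ! "$@"; then
            echo "failed on run $runs" >&2
            return 1
        fi
    done
    echo "no failure in $runs runs" >&2
    return 0
}
```

In a Shepherd build tree this could be invoked as `run_until_failure 200 make check TESTS=tests/no-home.sh`, ideally while the machine is busy with other builds, since that is the only situation in which the failure was observed.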
* bug#52779: tests/no-home test failure in Shepherd
  2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
  2022-01-03 21:09 ` Maxim Cournoyer
@ 2022-01-13 15:37 ` Ludovic Courtès
  2022-01-13 17:29   ` Maxim Cournoyer
  1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2022-01-13 15:37 UTC
  To: Maxim Cournoyer; +Cc: 52779

[-- Attachment #1: Type: text/plain, Size: 2613 bytes --]

Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> + herd -s t-socket-1651 status root
> Started:
>  + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)

What happens here is that the shepherd process is still alive after
‘herd stop root’ has completed, contrary to what’s expected:

--8<---------------cut here---------------start------------->8---
$herd stop root

if kill `cat "$pid"`
then
    exit 1
fi
--8<---------------cut here---------------end--------------->8---

The expectation is that shepherd has terminated by the time ‘herd stop
root’ exits; I wonder if that’s bogus.

‘herd stop root’ terminates when shepherd has closed its connection,
which normally happens when shepherd exits:

--8<---------------cut here---------------start------------->8---
28003 read(15, "(shepherd-command (version 0) (action stop) (service root) (arguments ()) (directory \"/data/src/shepherd\"))", 1024) = 107
28003 brk(0x1030000) = 0x1030000
28003 mmap(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0072be8000
28003 brk(0x100f000) = 0x100f000
28003 getcwd("/data/src/shepherd", 100) = 19
28003 chdir("/data/src/shepherd") = 0
28003 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0
28003 write(7, "2022-01-13 16:21:16 Exiting shepherd...\n", 40) = 40
28003 chdir("/data/src/shepherd") = 0
28003 getuid() = 1000
28003 close(13) = 0
28003 unlink("test") = 0
28003 exit_group(0) = ?
28006 <... futex resumed>) = ?
28008 <... read resumed> <unfinished ...>) = ?
28005 <... futex resumed>) = ?
28004 <... futex resumed>) = ?
28008 +++ exited with 0 +++
28006 +++ exited with 0 +++
28005 +++ exited with 0 +++
28004 +++ exited with 0 +++
28003 +++ exited with 0 +++
--8<---------------cut here---------------end--------------->8---

Maybe there’s a chance that the shell hasn’t processed the shepherd’s
SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
case, the shepherd process still exists as a zombie.

A more robust approach might be to use the shell’s builtin ‘wait’,
because then I suppose the shell will be forced to process pending
SIGCHLDs:

[-- Attachment #2: Type: text/x-patch, Size: 553 bytes --]

diff --git a/tests/no-home.sh b/tests/no-home.sh
index 85b6116..5a8c278 100644
--- a/tests/no-home.sh
+++ b/tests/no-home.sh
@@ -1,5 +1,5 @@
 # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
-# Copyright © 2014, 2016 Ludovic Courtès <ludo@gnu.org>
+# Copyright © 2014, 2016, 2022 Ludovic Courtès <ludo@gnu.org>
 #
 # This file is part of the GNU Shepherd.
 #
@@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
 $herd status root
 $herd stop root
 
-if kill `cat "$pid"`
-then
-    exit 1
-fi
+wait `cat "$pid"`

[-- Attachment #3: Type: text/plain, Size: 217 bytes --]

I can’t get it to fail while waiting for a few minutes of:

  while make check TESTS=tests/no-home.sh ; do : ; done

… but I cannot get the original one to fail either.

Does it work for you?

Ludo’.

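The reasoning above (a terminated child can still be signalled until its parent shell has reaped it, and ‘wait’ forces that reaping) can be illustrated with a minimal sketch. It assumes a POSIX shell; the `sh -c 'exit 0'` child is a hypothetical stand-in for shepherd, and the script is not taken from the Shepherd test suite.

```shell
#!/bin/sh
# Hypothetical stand-in for shepherd: a background child of this shell
# that exits almost immediately.
sh -c 'exit 0' &
child=$!

# 'wait' blocks until the child has terminated, reaps it, and returns
# its exit status.  It only works for children of the current shell.
wait "$child"
status=$?

# Once the child has been reaped, its PID no longer names a process
# (barring PID reuse), so 'kill -0' is expected to fail.
if kill -0 "$child" 2>/dev/null; then
    result="still alive"
else
    result="gone"
fi

echo "exit status: $status, process: $result"
```

Whether ‘kill’ would have succeeded before the ‘wait’ depends on timing and on when the shell reaps its children, which fits the failure being nondeterministic. ‘wait’ is applicable in the test because the PID in the pid file matches shepherd_pid in the trace, that is, shepherd is a child of the test script's shell.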
* bug#52779: tests/no-home test failure in Shepherd
  2022-01-13 15:37 ` Ludovic Courtès
@ 2022-01-13 17:29   ` Maxim Cournoyer
  2022-01-17 13:43     ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-13 17:29 UTC
  To: Ludovic Courtès; +Cc: 52779

Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

> Hello,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> + herd -s t-socket-1651 status root
>> Started:
>>  + root
>> + herd -s t-socket-1651 stop root
>> ++ cat t-pid-1651
>> + kill 1896
>> + exit 1
>> + rm -f t-socket-1651
>> + test -f t-pid-1651
>> ++ cat t-pid-1651
>> + kill 1896
>> + rm -f t-pid-1651
>> FAIL tests/no-home.sh (exit status: 1)
>
> What happens here is that the shepherd process is still alive after
> ‘herd stop root’ has completed, contrary to what’s expected:
>
> $herd stop root
>
> if kill `cat "$pid"`
> then
>     exit 1
> fi

Yes!

[...]

> Maybe there’s a chance that the shell hasn’t processed the shepherd’s
> SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
> case, the shepherd process still exists as a zombie.
>
> A more robust approach might be to use the shell’s builtin ‘wait’,
> because then I suppose the shell will be forced to process pending
> SIGCHLDs:
>
> diff --git a/tests/no-home.sh b/tests/no-home.sh
> index 85b6116..5a8c278 100644
> --- a/tests/no-home.sh
> +++ b/tests/no-home.sh
> @@ -1,5 +1,5 @@
>  # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
> -# Copyright © 2014, 2016 Ludovic Courtès <ludo@gnu.org>
> +# Copyright © 2014, 2016, 2022 Ludovic Courtès <ludo@gnu.org>
>  #
>  # This file is part of the GNU Shepherd.
>  #
> @@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
>  $herd status root
>  $herd stop root
>
> -if kill `cat "$pid"`
> -then
> -    exit 1
> -fi
> +wait `cat "$pid"`

As I wrote, I was also unable to reproduce this (but when I had a high
load of packages to build at the same time, I could get it to happen a
couple times upon retrying).

Your analysis (and the narrow window which would allow for a failure)
makes sense to me, along with the proposed fix.

I think you should commit it and tentatively mark this bug as fixed :-).

Thank you for looking into it!

Maxim

* bug#52779: tests/no-home test failure in Shepherd
  2022-01-13 17:29   ` Maxim Cournoyer
@ 2022-01-17 13:43     ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-01-17 13:43 UTC
  To: Maxim Cournoyer; +Cc: 52779-done

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> I think you should commit it and tentatively mark this bug as fixed :-).

Alright, pushed as Shepherd commit
e025f83c2072e4a6533a5747682810a885e20b2e!

Thanks,
Ludo’.