* bug#52779: tests/no-home test failure in Shepherd
@ 2021-12-24 14:52 Maxim Cournoyer
2022-01-03 21:09 ` Maxim Cournoyer
2022-01-13 15:37 ` Ludovic Courtès
0 siblings, 2 replies; 6+ messages in thread
From: Maxim Cournoyer @ 2021-12-24 14:52 UTC (permalink / raw)
To: 52779
Hello Guix,
I encountered the following (nondeterministic) test failure:
--8<---------------cut here---------------start------------->8---
# TOTAL: 13
# PASS: 12
# SKIP: 0
# XFAIL: 0
# FAIL: 1
# XPASS: 0
# ERROR: 0
.. contents:: :depth: 2
FAIL: tests/no-home
===================
+ shepherd --version
shepherd (GNU Shepherd) 0.8.1
Copyright (C) 2020 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ HOME=/nowhere
+ export HOME
+ test -w /nowhere
+ socket=t-socket-1651
+ pid=t-pid-1651
+ herd='herd -s t-socket-1651'
+ trap 'rm -f t-socket-1651;
test -f t-pid-1651 && kill `cat t-pid-1651` || true; rm -f t-pid-1651' EXIT
+ shepherd_pid=1896
+ test -f t-pid-1651
+ shepherd -I -s t-socket-1651 -c /dev/null -l /dev/null --pid=t-pid-1651
+ kill -0 1896
+ sleep 0.3
Service root has been started.
+ test -f t-pid-1651
++ cat t-pid-1651
+ kill -0 1896
+ herd -s t-socket-1651 status root
Started:
+ root
+ herd -s t-socket-1651 stop root
++ cat t-pid-1651
+ kill 1896
+ exit 1
+ rm -f t-socket-1651
+ test -f t-pid-1651
++ cat t-pid-1651
+ kill 1896
+ rm -f t-pid-1651
FAIL tests/no-home.sh (exit status: 1)
error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f>
phase `check' failed after 15.3 seconds
command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
--8<---------------cut here---------------end--------------->8---
Perhaps due to high load, as I can't seem to reproduce when building
Shepherd alone.
Thanks,
Maxim
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#52779: tests/no-home test failure in Shepherd
2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
@ 2022-01-03 21:09 ` Maxim Cournoyer
2022-01-10 18:17 ` Maxim Cournoyer
2022-01-13 15:37 ` Ludovic Courtès
1 sibling, 1 reply; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-03 21:09 UTC (permalink / raw)
To: 52779
Hi,
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
> Hello Guix,
>
> I encountered the following (nondeterministic) test failure:
>
> # TOTAL: 13
> # PASS: 12
> # SKIP: 0
> # XFAIL: 0
> # FAIL: 1
> # XPASS: 0
> # ERROR: 0
>
> .. contents:: :depth: 2
>
> FAIL: tests/no-home
> ===================
>
> + shepherd --version
> shepherd (GNU Shepherd) 0.8.1
> Copyright (C) 2020 the Shepherd authors
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> + HOME=/nowhere
> + export HOME
> + test -w /nowhere
> + socket=t-socket-1651
> + pid=t-pid-1651
> + herd='herd -s t-socket-1651'
> + trap 'rm -f t-socket-1651;
> test -f t-pid-1651 && kill `cat t-pid-1651` || true; rm -f t-pid-1651' EXIT
> + shepherd_pid=1896
> + test -f t-pid-1651
> + shepherd -I -s t-socket-1651 -c /dev/null -l /dev/null --pid=t-pid-1651
> + kill -0 1896
> + sleep 0.3
> Service root has been started.
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill -0 1896
> + herd -s t-socket-1651 status root
> Started:
> + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)
>
>
> error: in phase 'check': uncaught exception:
> %exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f>
> phase `check' failed after 15.3 seconds
> command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
>
> Perhaps due to high load, as I can't seem to reproduce when building
> Shepherd alone.
I encountered the problem once more; the errors are slightly different
so I'm pasting the failed test output here:
--8<---------------cut here---------------end--------------->8---
FAIL: tests/no-home
===================
+ shepherd --version
shepherd (GNU Shepherd) 0.8.1
Copyright (C) 2020 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ HOME=/nowhere
+ export HOME
+ test -w /nowhere
+ socket=t-socket-1606
+ pid=t-pid-1606
+ herd='herd -s t-socket-1606'
+ trap 'rm -f t-socket-1606;
test -f t-pid-1606 && kill `cat t-pid-1606` || true; rm -f t-pid-1606' EXIT
+ shepherd_pid=1868
+ test -f t-pid-1606
+ shepherd -I -s t-socket-1606 -c /dev/null -l /dev/null --pid=t-pid-1606
+ kill -0 1868
+ sleep 0.3
Service root has been started.
+ test -f t-pid-1606
++ cat t-pid-1606
+ kill -0 1868
+ herd -s t-socket-1606 status root
Started:
+ root
+ herd -s t-socket-1606 stop root
++ cat t-pid-1606
+ kill 1868
+ exit 1
+ rm -f t-socket-1606
+ test -f t-pid-1606
++ cat t-pid-1606
+ kill 1868
./tests/no-home.sh: line 2: kill: (1868) - No such process
+ true
+ rm -f t-pid-1606
FAIL tests/no-home.sh (exit status: 1)
error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f>
phase `check' failed after 15.6 seconds
command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
--8<---------------cut here---------------end--------------->8---
Thanks,
Maxim
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#52779: tests/no-home test failure in Shepherd
2022-01-03 21:09 ` Maxim Cournoyer
@ 2022-01-10 18:17 ` Maxim Cournoyer
0 siblings, 0 replies; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-10 18:17 UTC (permalink / raw)
To: 52779
Hi,
Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hello Guix,
>>
>> I encountered the following (nondeterministic) test failure:
>>
>> # TOTAL: 13
>> # PASS: 12
>> # SKIP: 0
>> # XFAIL: 0
>> # FAIL: 1
>> # XPASS: 0
>> # ERROR: 0
>>
>> .. contents:: :depth: 2
>>
>> FAIL: tests/no-home
>> ===================
[...]
I tried to reproduce this is a controlled setting, using the 'stress'
command to effect a high load on the system, but in vain.
I could only trigger the problem on a highly parallel CPU (24 logical
cores) busy building 10s of Guix packages at the same time.
Thanks,
Maxim
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#52779: tests/no-home test failure in Shepherd
2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
2022-01-03 21:09 ` Maxim Cournoyer
@ 2022-01-13 15:37 ` Ludovic Courtès
2022-01-13 17:29 ` Maxim Cournoyer
1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2022-01-13 15:37 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: 52779
[-- Attachment #1: Type: text/plain, Size: 2613 bytes --]
Hello,
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> + herd -s t-socket-1651 status root
> Started:
> + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)
What happens here is that the shepherd process is still alive after
‘herd stop root’ has completed, contrary to what’s expected:
--8<---------------cut here---------------start------------->8---
$herd stop root
if kill `cat "$pid"`
then
exit 1
fi
--8<---------------cut here---------------end--------------->8---
The expectation is that shepherd has terminated by the time ‘herd stop
root’ exits; I wonder if that’s bogus.
‘herd stop root’ terminates when shepherd has closed its connection,
which normally happens when shepherd exits:
--8<---------------cut here---------------start------------->8---
28003 read(15, "(shepherd-command (version 0) (action stop) (service root) (arguments ()) (directory \"/data/src/shepherd\"))", 1024) = 107
28003 brk(0x1030000) = 0x1030000
28003 mmap(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0072be8000
28003 brk(0x100f000) = 0x100f000
28003 getcwd("/data/src/shepherd", 100) = 19
28003 chdir("/data/src/shepherd") = 0
28003 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0
28003 write(7, "2022-01-13 16:21:16 Exiting shepherd...\n", 40) = 40
28003 chdir("/data/src/shepherd") = 0
28003 getuid() = 1000
28003 close(13) = 0
28003 unlink("test") = 0
28003 exit_group(0) = ?
28006 <... futex resumed>) = ?
28008 <... read resumed> <unfinished ...>) = ?
28005 <... futex resumed>) = ?
28004 <... futex resumed>) = ?
28008 +++ exited with 0 +++
28006 +++ exited with 0 +++
28005 +++ exited with 0 +++
28004 +++ exited with 0 +++
28003 +++ exited with 0 +++
--8<---------------cut here---------------end--------------->8---
Maybe there’s a chance that the shell hasn’t processed the shepherd’s
SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
case, the shepherd process still exists as a zombie.
A more robust approach might be to use the shell’s builtin ‘wait’,
because then I suppose the shell will be forced to process pending
SIGCHLDs:
[-- Attachment #2: Type: text/x-patch, Size: 553 bytes --]
diff --git a/tests/no-home.sh b/tests/no-home.sh
index 85b6116..5a8c278 100644
--- a/tests/no-home.sh
+++ b/tests/no-home.sh
@@ -1,5 +1,5 @@
# GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
-# Copyright © 2014, 2016 Ludovic Courtès <ludo@gnu.org>
+# Copyright © 2014, 2016, 2022 Ludovic Courtès <ludo@gnu.org>
#
# This file is part of the GNU Shepherd.
#
@@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
$herd status root
$herd stop root
-if kill `cat "$pid"`
-then
- exit 1
-fi
+wait `cat "$pid"`
[-- Attachment #3: Type: text/plain, Size: 217 bytes --]
I can’t get it to fail while waiting for a few minutes of:
while make check TESTS=tests/no-home.sh ; do : ; done
… but I cannot get the original one to fail either.
Does it work for you?
Ludo’.
^ permalink raw reply related [flat|nested] 6+ messages in thread
* bug#52779: tests/no-home test failure in Shepherd
2022-01-13 15:37 ` Ludovic Courtès
@ 2022-01-13 17:29 ` Maxim Cournoyer
2022-01-17 13:43 ` Ludovic Courtès
0 siblings, 1 reply; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-13 17:29 UTC (permalink / raw)
To: Ludovic Courtès; +Cc: 52779
Hi Ludo!
Ludovic Courtès <ludo@gnu.org> writes:
> Hello,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> + herd -s t-socket-1651 status root
>> Started:
>> + root
>> + herd -s t-socket-1651 stop root
>> ++ cat t-pid-1651
>> + kill 1896
>> + exit 1
>> + rm -f t-socket-1651
>> + test -f t-pid-1651
>> ++ cat t-pid-1651
>> + kill 1896
>> + rm -f t-pid-1651
>> FAIL tests/no-home.sh (exit status: 1)
>
> What happens here is that the shepherd process is still alive after
> ‘herd stop root’ has completed, contrary to what’s expected:
>
> $herd stop root
>
> if kill `cat "$pid"`
> then
> exit 1
> fi
Yes!
[...]
> Maybe there’s a chance that the shell hasn’t processed the shepherd’s
> SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
> case, the shepherd process still exists as a zombie.
>
> A more robust approach might be to use the shell’s builtin ‘wait’,
> because then I suppose the shell will be forced to process pending
> SIGCHLDs:
>
> diff --git a/tests/no-home.sh b/tests/no-home.sh
> index 85b6116..5a8c278 100644
> --- a/tests/no-home.sh
> +++ b/tests/no-home.sh
> @@ -1,5 +1,5 @@
> # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
> -# Copyright © 2014, 2016 Ludovic Courtès <ludo@gnu.org>
> +# Copyright © 2014, 2016, 2022 Ludovic Courtès <ludo@gnu.org>
> #
> # This file is part of the GNU Shepherd.
> #
> @@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
> $herd status root
> $herd stop root
>
> -if kill `cat "$pid"`
> -then
> - exit 1
> -fi
> +wait `cat "$pid"`
As I wrote, I was also unable to reproduce this (but when I had a high
load of packages to build at the same time, I could get it to happen a
couple times upon retrying). Your analysis (and the narrow window which
would allow for a failure) makes sense to me, along with the proposed
fix.
I think you should commit it and tentatively mark this bug as fixed :-).
Thank you for looking into it!
Maxim
^ permalink raw reply [flat|nested] 6+ messages in thread
* bug#52779: tests/no-home test failure in Shepherd
2022-01-13 17:29 ` Maxim Cournoyer
@ 2022-01-17 13:43 ` Ludovic Courtès
0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-01-17 13:43 UTC (permalink / raw)
To: Maxim Cournoyer; +Cc: 52779-done
Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
> I think you should commit it and tentatively mark this bug as fixed :-).
Alright, pushed as Shepherd commit
e025f83c2072e4a6533a5747682810a885e20b2e!
Thanks,
Ludo’.
^ permalink raw reply [flat|nested] 6+ messages in thread
end of thread, other threads:[~2022-01-17 14:54 UTC | newest]
Thread overview: 6+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
2022-01-03 21:09 ` Maxim Cournoyer
2022-01-10 18:17 ` Maxim Cournoyer
2022-01-13 15:37 ` Ludovic Courtès
2022-01-13 17:29 ` Maxim Cournoyer
2022-01-17 13:43 ` Ludovic Courtès
Code repositories for project(s) associated with this external index
https://git.savannah.gnu.org/cgit/guix.git
This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.