all messages for Guix-related lists mirrored at yhetil.org
* bug#52779: tests/no-home test failure in Shepherd
@ 2021-12-24 14:52 Maxim Cournoyer
  2022-01-03 21:09 ` Maxim Cournoyer
  2022-01-13 15:37 ` Ludovic Courtès
  0 siblings, 2 replies; 6+ messages in thread
From: Maxim Cournoyer @ 2021-12-24 14:52 UTC (permalink / raw)
  To: 52779

Hello Guix,

I encountered the following (nondeterministic) test failure:

--8<---------------cut here---------------start------------->8---
# TOTAL: 13
# PASS:  12
# SKIP:  0
# XFAIL: 0
# FAIL:  1
# XPASS: 0
# ERROR: 0

.. contents:: :depth: 2

FAIL: tests/no-home
===================

+ shepherd --version
shepherd (GNU Shepherd) 0.8.1
Copyright (C) 2020 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ HOME=/nowhere
+ export HOME
+ test -w /nowhere
+ socket=t-socket-1651
+ pid=t-pid-1651
+ herd='herd -s t-socket-1651'
+ trap 'rm -f t-socket-1651;
      test -f t-pid-1651 && kill `cat t-pid-1651` || true; rm -f t-pid-1651' EXIT
+ shepherd_pid=1896
+ test -f t-pid-1651
+ shepherd -I -s t-socket-1651 -c /dev/null -l /dev/null --pid=t-pid-1651
+ kill -0 1896
+ sleep 0.3
Service root has been started.
+ test -f t-pid-1651
++ cat t-pid-1651
+ kill -0 1896
+ herd -s t-socket-1651 status root
Started:
 + root
+ herd -s t-socket-1651 stop root
++ cat t-pid-1651
+ kill 1896
+ exit 1
+ rm -f t-socket-1651
+ test -f t-pid-1651
++ cat t-pid-1651
+ kill 1896
+ rm -f t-pid-1651
FAIL tests/no-home.sh (exit status: 1)


error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f> 
phase `check' failed after 15.3 seconds
command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
--8<---------------cut here---------------end--------------->8---

This is perhaps due to high load, as I can't seem to reproduce it when
building Shepherd alone.

Thanks,

Maxim





* bug#52779: tests/no-home test failure in Shepherd
  2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
@ 2022-01-03 21:09 ` Maxim Cournoyer
  2022-01-10 18:17   ` Maxim Cournoyer
  2022-01-13 15:37 ` Ludovic Courtès
  1 sibling, 1 reply; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-03 21:09 UTC (permalink / raw)
  To: 52779

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hello Guix,
>
> I encountered the following (nondeterministic) test failure:
>
> # TOTAL: 13
> # PASS:  12
> # SKIP:  0
> # XFAIL: 0
> # FAIL:  1
> # XPASS: 0
> # ERROR: 0
>
> .. contents:: :depth: 2
>
> FAIL: tests/no-home
> ===================
>
> + shepherd --version
> shepherd (GNU Shepherd) 0.8.1
> Copyright (C) 2020 the Shepherd authors
> License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
> This is free software: you are free to change and redistribute it.
> There is NO WARRANTY, to the extent permitted by law.
> + HOME=/nowhere
> + export HOME
> + test -w /nowhere
> + socket=t-socket-1651
> + pid=t-pid-1651
> + herd='herd -s t-socket-1651'
> + trap 'rm -f t-socket-1651;
>       test -f t-pid-1651 && kill `cat t-pid-1651` || true; rm -f t-pid-1651' EXIT
> + shepherd_pid=1896
> + test -f t-pid-1651
> + shepherd -I -s t-socket-1651 -c /dev/null -l /dev/null --pid=t-pid-1651
> + kill -0 1896
> + sleep 0.3
> Service root has been started.
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill -0 1896
> + herd -s t-socket-1651 status root
> Started:
>  + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)
>
>
> error: in phase 'check': uncaught exception:
> %exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f> 
> phase `check' failed after 15.3 seconds
> command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
>
> Perhaps due to high load, as I can't seem to reproduce when building
> Shepherd alone.

I encountered the problem once more; the errors are slightly different
so I'm pasting the failed test output here:

--8<---------------cut here---------------start------------->8---
FAIL: tests/no-home
===================

+ shepherd --version
shepherd (GNU Shepherd) 0.8.1
Copyright (C) 2020 the Shepherd authors
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.
+ HOME=/nowhere
+ export HOME
+ test -w /nowhere
+ socket=t-socket-1606
+ pid=t-pid-1606
+ herd='herd -s t-socket-1606'
+ trap 'rm -f t-socket-1606;
      test -f t-pid-1606 && kill `cat t-pid-1606` || true; rm -f t-pid-1606' EXIT
+ shepherd_pid=1868
+ test -f t-pid-1606
+ shepherd -I -s t-socket-1606 -c /dev/null -l /dev/null --pid=t-pid-1606
+ kill -0 1868
+ sleep 0.3
Service root has been started.
+ test -f t-pid-1606
++ cat t-pid-1606
+ kill -0 1868
+ herd -s t-socket-1606 status root
Started:
 + root
+ herd -s t-socket-1606 stop root
++ cat t-pid-1606
+ kill 1868
+ exit 1
+ rm -f t-socket-1606
+ test -f t-pid-1606
++ cat t-pid-1606
+ kill 1868
./tests/no-home.sh: line 2: kill: (1868) - No such process
+ true
+ rm -f t-pid-1606
FAIL tests/no-home.sh (exit status: 1)


error: in phase 'check': uncaught exception:
%exception #<&invoke-error program: "make" arguments: ("check" "-j" "24" "GUILE_AUTO_COMPILE=0") exit-status: 2 term-signal: #f stop-signal: #f> 
phase `check' failed after 15.6 seconds
command "make" "check" "-j" "24" "GUILE_AUTO_COMPILE=0" failed with status 2
--8<---------------cut here---------------end--------------->8---

Thanks,

Maxim





* bug#52779: tests/no-home test failure in Shepherd
  2022-01-03 21:09 ` Maxim Cournoyer
@ 2022-01-10 18:17   ` Maxim Cournoyer
  0 siblings, 0 replies; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-10 18:17 UTC (permalink / raw)
  To: 52779

Hi,

Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:

> Hi,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> writes:
>
>> Hello Guix,
>>
>> I encountered the following (nondeterministic) test failure:
>>
>> # TOTAL: 13
>> # PASS:  12
>> # SKIP:  0
>> # XFAIL: 0
>> # FAIL:  1
>> # XPASS: 0
>> # ERROR: 0
>>
>> .. contents:: :depth: 2
>>
>> FAIL: tests/no-home
>> ===================

[...]

I tried to reproduce this in a controlled setting, using the 'stress'
command to put the system under high load, but in vain.

I could only trigger the problem on a highly parallel CPU (24 logical
cores) busy building tens of Guix packages at the same time.
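
For anyone retrying this under load, a bounded retry loop is one way to
catch an intermittent failure without looping forever.  The sketch below
is only an illustration: 'repeat_until_failure' is a hypothetical helper,
not part of the Shepherd test suite, and the iteration count passed to it
is arbitrary.

```shell
# Hypothetical helper: run a command up to MAX times and stop at the
# first failure.  The command's own output is discarded.
repeat_until_failure () {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]
    do
        if ! "$@" >/dev/null 2>&1
        then
            echo "failed on iteration $i"
            return 1
        fi
        i=$((i + 1))
    done
    echo "no failure in $max iterations"
    return 0
}
```

It could be invoked as, for example,
'repeat_until_failure 200 make check TESTS=tests/no-home.sh' while the
machine is busy with other builds.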

Thanks,

Maxim





* bug#52779: tests/no-home test failure in Shepherd
  2021-12-24 14:52 bug#52779: tests/no-home test failure in Shepherd Maxim Cournoyer
  2022-01-03 21:09 ` Maxim Cournoyer
@ 2022-01-13 15:37 ` Ludovic Courtès
  2022-01-13 17:29   ` Maxim Cournoyer
  1 sibling, 1 reply; 6+ messages in thread
From: Ludovic Courtès @ 2022-01-13 15:37 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 52779

[-- Attachment #1: Type: text/plain, Size: 2613 bytes --]

Hello,

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> + herd -s t-socket-1651 status root
> Started:
>  + root
> + herd -s t-socket-1651 stop root
> ++ cat t-pid-1651
> + kill 1896
> + exit 1
> + rm -f t-socket-1651
> + test -f t-pid-1651
> ++ cat t-pid-1651
> + kill 1896
> + rm -f t-pid-1651
> FAIL tests/no-home.sh (exit status: 1)

What happens here is that the shepherd process is still alive after
‘herd stop root’ has completed, contrary to what’s expected:

--8<---------------cut here---------------start------------->8---
$herd stop root

if kill `cat "$pid"`
then
    exit 1
fi
--8<---------------cut here---------------end--------------->8---

The expectation is that shepherd has terminated by the time ‘herd stop
root’ exits; I wonder if that’s bogus.

‘herd stop root’ terminates when shepherd has closed its connection,
which normally happens when shepherd exits:

--8<---------------cut here---------------start------------->8---
28003 read(15, "(shepherd-command (version 0) (action stop) (service root) (arguments ()) (directory \"/data/src/shepherd\"))", 1024) = 107
28003 brk(0x1030000)                    = 0x1030000
28003 mmap(NULL, 262144, PROT_READ|PROT_WRITE|PROT_EXEC, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f0072be8000
28003 brk(0x100f000)                    = 0x100f000
28003 getcwd("/data/src/shepherd", 100) = 19
28003 chdir("/data/src/shepherd")       = 0
28003 newfstatat(AT_FDCWD, "/etc/localtime", {st_mode=S_IFREG|0444, st_size=2962, ...}, 0) = 0
28003 write(7, "2022-01-13 16:21:16 Exiting shepherd...\n", 40) = 40
28003 chdir("/data/src/shepherd")       = 0
28003 getuid()                          = 1000
28003 close(13)                         = 0
28003 unlink("test")                    = 0
28003 exit_group(0)                     = ?
28006 <... futex resumed>)              = ?
28008 <... read resumed> <unfinished ...>) = ?
28005 <... futex resumed>)              = ?
28004 <... futex resumed>)              = ?
28008 +++ exited with 0 +++
28006 +++ exited with 0 +++
28005 +++ exited with 0 +++
28004 +++ exited with 0 +++
28003 +++ exited with 0 +++
--8<---------------cut here---------------end--------------->8---

Maybe there’s a chance that the shell hasn’t processed the shepherd’s
SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
case, the shepherd process still exists as a zombie.

A more robust approach might be to use the shell’s builtin ‘wait’,
because then I suppose the shell will be forced to process pending
SIGCHLDs:


[-- Attachment #2: Type: text/x-patch, Size: 553 bytes --]

diff --git a/tests/no-home.sh b/tests/no-home.sh
index 85b6116..5a8c278 100644
--- a/tests/no-home.sh
+++ b/tests/no-home.sh
@@ -1,5 +1,5 @@
 # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
-# Copyright © 2014, 2016 Ludovic Courtès <ludo@gnu.org>
+# Copyright © 2014, 2016, 2022 Ludovic Courtès <ludo@gnu.org>
 #
 # This file is part of the GNU Shepherd.
 #
@@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
 $herd status root
 $herd stop root
 
-if kill `cat "$pid"`
-then
-    exit 1
-fi
+wait `cat "$pid"`
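
The idea behind the change above can be illustrated outside the test
suite.  This is a minimal sketch, assuming (as the trace suggests, since
'shepherd_pid' and the PID file both read 1896) that shepherd is a direct
child of the test shell, which is what allows 'wait' to be used on it.
The subshell below is only a stand-in for shepherd.

```shell
#!/bin/sh
# Stand-in for shepherd: a background child that exits on its own.
(sleep 1; exit 0) &
child=$!

# 'wait' blocks until the child has terminated and been reaped, and
# returns the child's exit status.
wait "$child"
status=$?

# Once the child is reaped, its PID no longer names a process, so
# 'kill -0' (existence check only, no signal sent) fails.
if kill -0 "$child" 2>/dev/null
then
    alive=yes
else
    alive=no
fi

echo "status=$status alive=$alive"
```

This should print 'status=0 alive=no'.  Unlike 'kill', which only reports
whether the PID still names a process (zombie included), 'wait' returns
only once the child has been reaped, so the check no longer depends on
timing.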

[-- Attachment #3: Type: text/plain, Size: 217 bytes --]


I can’t get it to fail, even after a few minutes of:

  while make check TESTS=tests/no-home.sh ; do : ; done

… but I cannot get the original one to fail either.

Does it work for you?

Ludo’.


* bug#52779: tests/no-home test failure in Shepherd
  2022-01-13 15:37 ` Ludovic Courtès
@ 2022-01-13 17:29   ` Maxim Cournoyer
  2022-01-17 13:43     ` Ludovic Courtès
  0 siblings, 1 reply; 6+ messages in thread
From: Maxim Cournoyer @ 2022-01-13 17:29 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: 52779

Hi Ludo!

Ludovic Courtès <ludo@gnu.org> writes:

> Hello,
>
> Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:
>
>> + herd -s t-socket-1651 status root
>> Started:
>>  + root
>> + herd -s t-socket-1651 stop root
>> ++ cat t-pid-1651
>> + kill 1896
>> + exit 1
>> + rm -f t-socket-1651
>> + test -f t-pid-1651
>> ++ cat t-pid-1651
>> + kill 1896
>> + rm -f t-pid-1651
>> FAIL tests/no-home.sh (exit status: 1)
>
> What happens here is that the shepherd process is still alive after
> ‘herd stop root’ has completed, contrary to what’s expected:
>
> $herd stop root
>
> if kill `cat "$pid"`
> then
>     exit 1
> fi

Yes!

[...]

> Maybe there’s a chance that the shell hasn’t processed the shepherd’s
> SIGCHLD when it evaluates the “if kill `cat "$pid"`” condition; in that
> case, the shepherd process still exists as a zombie.
>
> A more robust approach might be to use the shell’s builtin ‘wait’,
> because then I suppose the shell will be forced to process pending
> SIGCHLDs:
>
> diff --git a/tests/no-home.sh b/tests/no-home.sh
> index 85b6116..5a8c278 100644
> --- a/tests/no-home.sh
> +++ b/tests/no-home.sh
> @@ -1,5 +1,5 @@
>  # GNU Shepherd --- Make sure shepherd doesn't fail when $HOME is not writable.
> -# Copyright © 2014, 2016 Ludovic Courtès <ludo@gnu.org>
> +# Copyright © 2014, 2016, 2022 Ludovic Courtès <ludo@gnu.org>
>  #
>  # This file is part of the GNU Shepherd.
>  #
> @@ -46,7 +46,4 @@ kill -0 `cat "$pid"`
>  $herd status root
>  $herd stop root
>  
> -if kill `cat "$pid"`
> -then
> -    exit 1
> -fi
> +wait `cat "$pid"`

As I wrote, I was also unable to reproduce this in isolation (though
under a high load of concurrent package builds, I could get it to happen
a couple of times upon retrying).  Your analysis (and the narrow window
which would allow for a failure) makes sense to me, along with the
proposed fix.

I think you should commit it and tentatively mark this bug as fixed :-).

Thank you for looking into it!

Maxim





* bug#52779: tests/no-home test failure in Shepherd
  2022-01-13 17:29   ` Maxim Cournoyer
@ 2022-01-17 13:43     ` Ludovic Courtès
  0 siblings, 0 replies; 6+ messages in thread
From: Ludovic Courtès @ 2022-01-17 13:43 UTC (permalink / raw)
  To: Maxim Cournoyer; +Cc: 52779-done

Maxim Cournoyer <maxim.cournoyer@gmail.com> skribis:

> I think you should commit it and tentatively mark this bug as fixed :-).

Alright, pushed as Shepherd commit
e025f83c2072e4a6533a5747682810a885e20b2e!

Thanks,
Ludo’.




