all messages for Guix-related lists mirrored at yhetil.org
 help / color / mirror / code / Atom feed
* Adding a TOR system test - confusing results
@ 2018-07-23  0:14 Chris Marusich
  2018-07-23  9:01 ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Marusich @ 2018-07-23  0:14 UTC (permalink / raw)
  To: guix-devel


[-- Attachment #1.1: Type: text/plain, Size: 2452 bytes --]

Hi Guix,

In response to a recent email thread [1], I'd like to update our TOR
service to allow users to run TOR over a UNIX-domain socket.  I
discovered that TOR has no system tests, so I decided to start by
creating a basic system test to protect against regressions I might
accidentally introduce.

Unfortunately, the system test I've added behaves mysteriously.  It
succeeds when I think it obviously should not.  Could you please help me
understand what's going on?

Let's examine the problem.  I've attached two patches to this email.
They apply cleanly to 889e7fab3c04be98a59b880bf44fbdaa6ddf0a4e.

To begin, let's apply the first patch ("Add system tests for the TOR
service.").  Then run the test:

--8<---------------cut here---------------start------------->8---
$ make -j1 check-system TESTS=tor
[...]
# of expected passes      1
@ build-succeeded /gnu/store/zdh97qnh4b69yr23yp5js16hw9lvjhzm-tor-test.drv -
TOTAL: 1
PASS: /gnu/store/c3f0y7jbqi1qw8s63p3jbp5mrwl1bxnc-tor-test
--8<---------------cut here---------------end--------------->8---

It passed!  Great!  Now, let's make sure it also detects failures
correctly: Let's break TOR by applying the second patch ("For testing
purposes, break the TOR service.").  Then run it again:

--8<---------------cut here---------------start------------->8---
$ make -j1 check-system TESTS=tor
[...]
This is the GNU system.  Welcome.
komputilo login: exec of "/gnu/store/5jhlsv29n6kx76na1kkgfarznrl5kjjg-tor-0.3.3.9DOES-NOT-EXIST" failed: No such file or directory
QEMU runs as PID 4
connected to QEMU's monitor
read QEMU monitor prompt
connected to guest REPL
%%%% Starting test tor  (Writing full log to "tor.log")
marionette is ready
# of expected passes      1
@ build-succeeded /gnu/store/z8q9y8zivsipaq9fdj7yhhcvqgfq8ryl-tor-test.drv -
TOTAL: 1
PASS: /gnu/store/r84kh8x3d9ypqi6nl6r9xkqpk9i8jgk2-tor-test
[0] [env] marusich@garuda.local:~/guix
$ 
--8<---------------cut here---------------end--------------->8---

The test still passes, but I can't figure out why.  There is even an
error message in there, complaining that we can't exec the path to the
TOR program.  What is going on?  I've already tried rebuilding
everything from a clean slate, so that isn't the issue.

Any help would be greatly appreciated!  Thank you in advance,

Footnotes: 
[1]  https://lists.gnu.org/archive/html/help-guix/2018-07/msg00087.html

-- 
Chris

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-Add-system-tests-for-the-TOR-service.patch --]
[-- Type: text/x-patch, Size: 2527 bytes --]

From 63f84210927bc45ad0d239b4b92aaf028fa5c5ac Mon Sep 17 00:00:00 2001
From: Chris Marusich <cmmarusich@gmail.com>
Date: Sun, 22 Jul 2018 16:23:53 -0700
Subject: [PATCH 1/2] Add system tests for the TOR service.

---
 gnu/tests/networking.scm | 55 +++++++++++++++++++++++++++++++++++++++-
 1 file changed, 54 insertions(+), 1 deletion(-)

diff --git a/gnu/tests/networking.scm b/gnu/tests/networking.scm
index 323679e7f..1128e3448 100644
--- a/gnu/tests/networking.scm
+++ b/gnu/tests/networking.scm
@@ -30,7 +30,7 @@
   #:use-module (gnu packages bash)
   #:use-module (gnu packages networking)
   #:use-module (gnu services shepherd)
-  #:export (%test-inetd %test-openvswitch %test-dhcpd))
+  #:export (%test-inetd %test-openvswitch %test-dhcpd %test-tor))
 
 (define %inetd-os
   ;; Operating system with 2 inetd services.
@@ -339,3 +339,56 @@ subnet 192.168.1.0 netmask 255.255.255.0 {
    (name "dhcpd")
    (description "Test a running DHCP daemon configuration.")
    (value (run-dhcpd-test))))
+
+\f
+;;;
+;;; Services related to TOR
+;;;
+
+(define %tor-os
+  (simple-operating-system
+   (tor-service)))
+
+(define (run-tor-test)
+  (define os
+    (marionette-operating-system %tor-os
+                                 #:imported-modules '((gnu services herd))))
+
+  (define test
+    (with-imported-modules '((gnu build marionette))
+      #~(begin
+          (use-modules (gnu build marionette)
+                       (ice-9 popen)
+                       (ice-9 rdelim)
+                       (srfi srfi-64))
+
+          (define marionette
+            (make-marionette (list #$(virtual-machine os))))
+
+          (mkdir #$output)
+          (chdir #$output)
+
+          (test-begin "tor")
+
+          (test-assert "tor is alive"
+            (marionette-eval
+             '(begin
+                (use-modules (gnu services herd)
+                             (srfi srfi-1))
+                (live-service-running
+                 (find (lambda (live)
+                         (memq 'tor
+                               (live-service-provision live)))
+                       (current-services))))
+             marionette))
+
+          (test-end)
+          (exit (= (test-runner-fail-count (test-runner-current)) 0)))))
+
+  (gexp->derivation "tor-test" test))
+
+(define %test-tor
+  (system-test
+   (name "tor")
+   (description "Test a running TOR daemon configuration.")
+   (value (run-tor-test))))
-- 
2.18.0


[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.3: 0002-For-testing-purposes-break-the-TOR-service.patch --]
[-- Type: text/x-patch, Size: 984 bytes --]

From f2ac689f2d1cc308eb7b4371130b4b2a5ea1a393 Mon Sep 17 00:00:00 2001
From: Chris Marusich <cmmarusich@gmail.com>
Date: Sun, 22 Jul 2018 16:38:23 -0700
Subject: [PATCH 2/2] For testing purposes, break the TOR service.

---
 gnu/services/networking.scm | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/gnu/services/networking.scm b/gnu/services/networking.scm
index d5d0cf9d1..7996aae7e 100644
--- a/gnu/services/networking.scm
+++ b/gnu/services/networking.scm
@@ -657,7 +657,7 @@ HiddenServicePort ~a ~a~%"
                            (gnu system file-systems)))
 
                 (start #~(make-forkexec-constructor/container
-                          (list #$(file-append tor "/bin/tor") "-f" #$torrc)
+                          (list #$(file-append tor "DOES-NOT-EXIST") "-f" #$torrc)
 
                           #:mappings (list (file-system-mapping
                                             (source "/var/lib/tor")
-- 
2.18.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-23  0:14 Adding a TOR system test - confusing results Chris Marusich
@ 2018-07-23  9:01 ` Ludovic Courtès
  2018-07-24  3:11   ` Chris Marusich
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-07-23  9:01 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel

Hi Chris,

Chris Marusich <cmmarusich@gmail.com> skribis:

> To begin, let's apply the first patch ("Add system tests for the TOR
> service.").  Then run the test:
>
> $ make -j1 check-system TESTS=tor
> [...]
> # of expected passes      1
> @ build-succeeded /gnu/store/zdh97qnh4b69yr23yp5js16hw9lvjhzm-tor-test.drv -
> TOTAL: 1
> PASS: /gnu/store/c3f0y7jbqi1qw8s63p3jbp5mrwl1bxnc-tor-test
>
>
> It passed!  Great!  Now, let's make sure it also detects failures
> correctly: Let's break TOR by applying the second patch ("For testing
> purposes, break the TOR service.").  Then run it again:
>
> $ make -j1 check-system TESTS=tor
> [...]
> This is the GNU system.  Welcome.
> komputilo login: exec of "/gnu/store/5jhlsv29n6kx76na1kkgfarznrl5kjjg-tor-0.3.3.9DOES-NOT-EXIST" failed: No such file or directory
> QEMU runs as PID 4
> connected to QEMU's monitor
> read QEMU monitor prompt
> connected to guest REPL
> %%%% Starting test tor  (Writing full log to "tor.log")
> marionette is ready
> # of expected passes      1
> @ build-succeeded /gnu/store/z8q9y8zivsipaq9fdj7yhhcvqgfq8ryl-tor-test.drv -

The problem has to do with ‘make-forkexec-constructor/container’: it
creates a process (with ‘run-container’) that exec’s the program, and
then returns immediately.  Thus we know that a process was created, but
we don’t know that soon after that it exited because of a failed ‘exec’
call.

If you run it in a VM, you’ll notice that shepherd tries to respawn it
several times in a row and eventually disables the service.

The test above is unreliable because during this fail/respawn cycle, the
service keeps going up and down.

The solution here would be to use a PID file for the Tor service.  That
would (1) allow us to properly detect failure to run Tor, and (2)
provide better synchronization with Tor startup.

HTH!

Ludo’.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-23  9:01 ` Ludovic Courtès
@ 2018-07-24  3:11   ` Chris Marusich
  2018-07-26 13:36     ` Ludovic Courtès
  0 siblings, 1 reply; 8+ messages in thread
From: Chris Marusich @ 2018-07-24  3:11 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel


[-- Attachment #1.1: Type: text/plain, Size: 2372 bytes --]

Hi Ludo,

ludo@gnu.org (Ludovic Courtès) writes:

> The problem has to do with ‘make-forkexec-constructor/container’: it
> creates a process (with ‘run-container’) that exec’s the program, and
> then returns immediately.  Thus we know that a process was created, but
> we don’t know that soon after that it exited because of a failed ‘exec’
> call.
>
> If you run it in a VM, you’ll notice that shepherd tries to respawn it
> several times in a row and eventually disables the service.
>
> The test above is unreliable because during this fail/respawn cycle, the
> service keeps going up and down.
>
> The solution here would be to use a PID file for the Tor service.  That
> would (1) allow us to properly detect failure to run Tor, and (2)
> provide better synchronization with Tor startup.

That makes sense.  TOR supports a PID file, so I'll give that a try.

In the meantime, there is another similar mystery I could not solve.
Consider the attached patch, which applies cleanly on top of the first
patch in my previous email.  This is what it does when I run it:

--8<---------------cut here---------------start------------->8---
$ make -j 1 check-system TESTS=tor
[...]
This is the GNU system.  Welcome.
komputilo login: Jul 24 03:07:46.688 [notice] Tor 0.3.3.9 (git-45028085ea188baf) running on Linux with L.
Jul 24 03:07:46.692 [notice] Tor can't help you if you use it wrong! Learn how to be safe at https://wwwg
Jul 24 03:07:46.702 [notice] Read configuration file "/gnu/store/g4h37si0fjqijrqp6smn6jq23bqj1051-torrc".
Jul 24 03:07:46.731 [notice] Scheduler type KIST has been enabled.
Jul 24 03:07:46.733 [notice] Opening Socks listener on 127.0.0.1:9050
QEMU runs as PID 4
connected to QEMU's monitor
read QEMU monitor prompt
connected to guest REPL
%%%% Starting test tor  (Writing full log to "tor.log")
marionette is ready
# of expected passes      1
@ build-succeeded /gnu/store/g35kgsr3298k2s15nn3f8qpcxbdxrvrl-tor-test.drv -
TOTAL: 1
PASS: /gnu/store/ghkrf4cx4mkmzjyx1m8hdg5hbk6zjzb2-tor-test
--8<---------------cut here---------------end--------------->8---

Why does it pass?  The (sleep 999999) form appears to be totally
ignored, and I don't understand why.  It's concerning to me that I can't
understand how the code is being executed.  I expected the test to hang.

-- 
Chris

[-- Warning: decoded text below may be mangled, UTF-8 assumed --]
[-- Attachment #1.2: 0001-Sleep-for-about-278-hours-during-the-TOR-test.patch --]
[-- Type: text/x-patch, Size: 811 bytes --]

From 29bcb8ed3181a6d871d04311a484f1bdffa476a8 Mon Sep 17 00:00:00 2001
From: Chris Marusich <cmmarusich@gmail.com>
Date: Mon, 23 Jul 2018 20:10:09 -0700
Subject: [PATCH] Sleep for about 278 hours during the TOR test.

---
 gnu/tests/networking.scm | 1 +
 1 file changed, 1 insertion(+)

diff --git a/gnu/tests/networking.scm b/gnu/tests/networking.scm
index 1128e3448..7984525bc 100644
--- a/gnu/tests/networking.scm
+++ b/gnu/tests/networking.scm
@@ -375,6 +375,7 @@ subnet 192.168.1.0 netmask 255.255.255.0 {
              '(begin
                 (use-modules (gnu services herd)
                              (srfi srfi-1))
+                (sleep 999999)
                 (live-service-running
                  (find (lambda (live)
                          (memq 'tor
-- 
2.18.0


[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply related	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-24  3:11   ` Chris Marusich
@ 2018-07-26 13:36     ` Ludovic Courtès
  2018-07-27  5:13       ` Chris Marusich
  0 siblings, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-07-26 13:36 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel

Hello,

Chris Marusich <cmmarusich@gmail.com> skribis:

> Why does it pass?  The (sleep 999999) form appears to be totally
> ignored, and I don't understand why.  It's concerning to me that I can't
> understand how the code is being executed.  I expected the test to hang.

According to the manual (info "(guile) Signals"):

--8<---------------cut here---------------start------------->8---
 -- Scheme Procedure: sleep secs
 -- Scheme Procedure: usleep usecs
 -- C Function: scm_sleep (secs)
 -- C Function: scm_usleep (usecs)
     Wait the given period SECS seconds or USECS microseconds (both
     integers).  If a signal arrives the wait stops and the return value
     is the time remaining, in seconds or microseconds respectively.  If
     the period elapses with no signal the return is zero.
--8<---------------cut here---------------end--------------->8---

So I suspect that the marionette process got a signal, and thus ‘sleep’
returned quickly.  Could you print its return value?

HTH,
Ludo’.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-26 13:36     ` Ludovic Courtès
@ 2018-07-27  5:13       ` Chris Marusich
  2018-07-27  8:19         ` Chris Marusich
  2018-07-27  8:41         ` Ludovic Courtès
  0 siblings, 2 replies; 8+ messages in thread
From: Chris Marusich @ 2018-07-27  5:13 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1898 bytes --]

ludo@gnu.org (Ludovic Courtès) writes:

> Chris Marusich <cmmarusich@gmail.com> skribis:
>
>> Why does it pass?  The (sleep 999999) form appears to be totally
>> ignored, and I don't understand why.  It's concerning to me that I can't
>> understand how the code is being executed.  I expected the test to hang.
>
> According to the manual (info "(guile) Signals"):
>
>  -- Scheme Procedure: sleep secs
>  -- Scheme Procedure: usleep usecs
>  -- C Function: scm_sleep (secs)
>  -- C Function: scm_usleep (usecs)
>      Wait the given period SECS seconds or USECS microseconds (both
>      integers).  If a signal arrives the wait stops and the return value
>      is the time remaining, in seconds or microseconds respectively.  If
>      the period elapses with no signal the return is zero.
>
> So I suspect that the marionette process got a signal, and thus ‘sleep’
> returned quickly.  Could you print its return value?

Thank you for the reply!  The return value is 999998.  Like you said,
maybe it's possible that the marionette process got a signal.  But why
would that happen?

I have a related question.  Is there a guarantee that the test case will
begin after the TOR service has started up?  The test passes
consistently, which is good.  However, I can't find any code that
guarantees that the test case will begin after TOR has started up.  In
order to make the test succeed even if it happens to check TOR before
TOR has started up, I think I need to check the TOR service in a loop
and eventually time out if it doesn't show up after a while.  That will
be difficult to implement if I can't rely on the "sleep" procedure.

In short, my questions are:

* Why does the sleep get interrupted?
* Do I need to periodically poll for TOR's health in my test?

I appreciate your time answering these detailed questions.  Thank you.

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-27  5:13       ` Chris Marusich
@ 2018-07-27  8:19         ` Chris Marusich
  2018-07-27  8:41         ` Ludovic Courtès
  1 sibling, 0 replies; 8+ messages in thread
From: Chris Marusich @ 2018-07-27  8:19 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1008 bytes --]

Chris Marusich <cmmarusich@gmail.com> writes:

> In short, my questions are:
>
> * Why does the sleep get interrupted?

I've been looking into this, but I still don't know why it happens.  If
you have any ideas, I'd love to know!

> * Do I need to periodically poll for TOR's health in my test?

I figured out the answer to this question by examining the source of
Shepherd and Guix.  The answer is: if I declare the TOR service to be a
requirement of the Marionette service (by including it in the
#:requirements argument when calling marionette-operating-system), then
Shepherd will start the TOR service and set the TOR service's "running"
slot to a non-#f value before starting Marionette.  So if that's all I
care about, I can just ask Shepherd if TOR is alive in my test, and it
will work as expected.

Of course, if I wanted to check some other feature of TOR which maybe
doesn't come online until some time after the process is started, that's
a different story.

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-27  5:13       ` Chris Marusich
  2018-07-27  8:19         ` Chris Marusich
@ 2018-07-27  8:41         ` Ludovic Courtès
  2018-07-27  9:58           ` Chris Marusich
  1 sibling, 1 reply; 8+ messages in thread
From: Ludovic Courtès @ 2018-07-27  8:41 UTC (permalink / raw)
  To: Chris Marusich; +Cc: guix-devel

Chris Marusich <cmmarusich@gmail.com> skribis:

> ludo@gnu.org (Ludovic Courtès) writes:
>
>> Chris Marusich <cmmarusich@gmail.com> skribis:
>>
>>> Why does it pass?  The (sleep 999999) form appears to be totally
>>> ignored, and I don't understand why.  It's concerning to me that I can't
>>> understand how the code is being executed.  I expected the test to hang.
>>
>> According to the manual (info "(guile) Signals"):
>>
>>  -- Scheme Procedure: sleep secs
>>  -- Scheme Procedure: usleep usecs
>>  -- C Function: scm_sleep (secs)
>>  -- C Function: scm_usleep (usecs)
>>      Wait the given period SECS seconds or USECS microseconds (both
>>      integers).  If a signal arrives the wait stops and the return value
>>      is the time remaining, in seconds or microseconds respectively.  If
>>      the period elapses with no signal the return is zero.
>>
>> So I suspect that the marionette process got a signal, and thus ‘sleep’
>> returned quickly.  Could you print its return value?
>
> Thank you for the reply!  The return value is 999998.  Like you said,
> maybe it's possible that the marionette process got a signal.  But why
> would that happen?

I’m not sure!  It could be SIGPIPE or SIGCHLD, this is left as an
exercise to you.  :-)  Seriously, you could register signal handlers and
try to see what’s going on.

It doesn’t change the fact that the return value of ‘sleep’ must be
checked, though.

> I have a related question.  Is there a guarantee that the test case will
> begin after the TOR service has started up?

Most service tests start by waiting for the service to be up using
something like:

            (marionette-eval
             '(begin
                (use-modules (gnu services herd))
                (start-service 'mcron))
             marionette)

That blocks until ‘mcron’ is up or it has failed to start.

HTH!

Ludo’.

^ permalink raw reply	[flat|nested] 8+ messages in thread

* Re: Adding a TOR system test - confusing results
  2018-07-27  8:41         ` Ludovic Courtès
@ 2018-07-27  9:58           ` Chris Marusich
  0 siblings, 0 replies; 8+ messages in thread
From: Chris Marusich @ 2018-07-27  9:58 UTC (permalink / raw)
  To: Ludovic Courtès; +Cc: guix-devel

[-- Attachment #1: Type: text/plain, Size: 1078 bytes --]

ludo@gnu.org (Ludovic Courtès) writes:

>> Like you said, maybe it's possible that the marionette process got a
>> signal.  But why would that happen?
>
> I’m not sure!  It could be SIGPIPE or SIGCHLD, this is left as an
> exercise to you.  :-)  Seriously, you could register signal handlers and
> try to see what’s going on.

That's fair.  I'll do that if I actually need to figure out what's going
on with the sleep during the test.  For now, I don't have to sleep, so
it isn't a big deal yet.

>> I have a related question.  Is there a guarantee that the test case will
>> begin after the TOR service has started up?
>
> Most service tests start by waiting for the service to be up using
> something like:
>
>             (marionette-eval
>              '(begin
>                 (use-modules (gnu services herd))
>                 (start-service 'mcron))
>              marionette)
>
> That blocks until ‘mcron’ is up or it has failed to start.

That's another good way to do it!  Thank you for entertaining my
questions.

-- 
Chris

[-- Attachment #2: signature.asc --]
[-- Type: application/pgp-signature, Size: 832 bytes --]

^ permalink raw reply	[flat|nested] 8+ messages in thread

end of thread, other threads:[~2018-07-27  9:58 UTC | newest]

Thread overview: 8+ messages (download: mbox.gz follow: Atom feed
-- links below jump to the message on this page --
2018-07-23  0:14 Adding a TOR system test - confusing results Chris Marusich
2018-07-23  9:01 ` Ludovic Courtès
2018-07-24  3:11   ` Chris Marusich
2018-07-26 13:36     ` Ludovic Courtès
2018-07-27  5:13       ` Chris Marusich
2018-07-27  8:19         ` Chris Marusich
2018-07-27  8:41         ` Ludovic Courtès
2018-07-27  9:58           ` Chris Marusich

Code repositories for project(s) associated with this external index

	https://git.savannah.gnu.org/cgit/guix.git

This is an external index of several public inboxes,
see mirroring instructions on how to clone and mirror
all data and code used by this external index.