unofficial mirror of notmuch@notmuchmail.org
From: Daniel Kahn Gillmor <dkg@fifthhorseman.net>
To: David Bremner <david@tethera.net>, notmuch@notmuchmail.org
Subject: Re: parallel tests broken on Debian stable
Date: Mon, 20 May 2019 19:49:02 -0400
Message-ID: <87pnocadqp.fsf@fifthhorseman.net>
In-Reply-To: <878sv1avfc.fsf@fifthhorseman.net>

On Mon 2019-05-20 13:27:03 -0400, Daniel Kahn Gillmor wrote:
>  c) we should avoid the timeout hanging :)

I dug into this today, and i'm reporting back my findings.

I have what appears to be a fix (see below), but i don't understand it,
so i'm not advocating for it.

To be clear: my two test systems are KVM instances, one running
stretch (debian stable) and one running sid (debian unstable).  Both
systems have 4 virtual CPUs (on a hardware platform that has 4 cores).

The two VMs are otherwise similarly configured.  Both have moreutils
installed, and GNU parallel is not installed.

On the stretch system, i can reproduce this hang/failure with a simple
"make -j4 check".  On the sid system, i do not see the failure.

When i disable the use of timeout entirely (with NOTMUCH_TEST_TIMEOUT=0,
see id:20190520232535.4904-1-dkg@fifthhorseman.net), the problem goes
away on the stretch system.
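
For illustration, the pattern of "a limit of 0 means no timeout wrapper
at all" can be sketched as below.  This is a minimal sketch and not the
actual notmuch-test code: the function name run_with_optional_timeout
is hypothetical, and the only assumption is that timeout(1) from
coreutils is on the PATH.

```shell
#!/bin/sh
# Hypothetical helper, not taken from notmuch-test: run a command under
# timeout(1) unless the limit is 0, in which case run it directly.
run_with_optional_timeout() {
    limit=$1
    shift
    if [ "$limit" = 0 ]; then
        "$@"
    else
        timeout "$limit" "$@"
    fi
}

# A limit of 0 bypasses timeout(1) entirely.
run_with_optional_timeout 0 echo "ran without timeout"
# Any other limit wraps the command in timeout(1).
run_with_optional_timeout 2m echo "ran under timeout"
```

With the wrapper bypassed, the command runs as a direct child of the
calling shell, which is the configuration where the hang goes away.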

When i inspect the state of the debian stretch system when the tests are
hanging, i see this (from "ps auwx"):

------------------------
USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
dkg       7980  1.8  0.5  10228  4348 pts/1    S+   18:49   0:00 make -j4 check
dkg       8001  0.0  0.3  11228  2884 pts/1    S+   18:49   0:00 bash /home/dkg/src/notmuch/notmuch/test/notmuch-test
dkg       8011  0.0  0.1  10092   804 pts/1    S    18:49   0:00 timeout 2m parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/
dkg       8012  0.0  0.0   4168   744 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8013  0.0  0.0   4168    96 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8014  0.0  0.0   4168    96 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8267  0.0  0.0   4168    96 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8268  0.0  0.0   4276   732 pts/1    T    18:49   0:00 sh -c /home/dkg/src/notmuch/notmuch/test/T050-new.sh
dkg       8270  2.7  0.4  11772  3744 pts/1    T    18:49   0:00 bash /home/dkg/src/notmuch/notmuch/test/T050-new.sh
dkg       8320  0.0  0.0   4168    96 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8321  0.0  0.0   4276   748 pts/1    T    18:49   0:00 sh -c /home/dkg/src/notmuch/notmuch/test/T060-count.sh
dkg       8322  0.7  0.4  11752  3556 pts/1    T    18:49   0:00 bash /home/dkg/src/notmuch/notmuch/test/T060-count.sh
dkg       8345  0.0  0.0   4168    96 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8346  0.0  0.0   4276   744 pts/1    T    18:49   0:00 sh -c /home/dkg/src/notmuch/notmuch/test/T070-insert.sh
dkg       8347  1.7  0.4  11772  3764 pts/1    T    18:49   0:00 bash /home/dkg/src/notmuch/notmuch/test/T070-insert.sh
dkg       8425  0.0  0.0   4168    96 pts/1    T    18:49   0:00 parallel -- /home/dkg/src/notmuch/notmuch/test/T000-basic.sh /home/dkg/src/notmuch/notmuch/test/T010-help-test.sh /home/dkg/src/notmuch
dkg       8426  0.0  0.0   4276   740 pts/1    T    18:49   0:00 sh -c /home/dkg/src/notmuch/notmuch/test/T080-search.sh
dkg       8427  1.5  0.4  11752  3664 pts/1    T    18:49   0:00 bash /home/dkg/src/notmuch/notmuch/test/T080-search.sh
dkg       8763  4.7  2.9  73960 22708 pts/1    T    18:49   0:00 gdb --batch-silent --return-child-result -x count-files.gdb --args notmuch count --output=files *
dkg       8914  0.0  0.8  68508  6228 pts/1    T    18:49   0:00 notmuch search --format=text0 --output=files --offset=1 --limit=1 *
dkg       8915  0.0  0.1   4484  1164 pts/1    T    18:49   0:00 xargs -0 -I {} mv {} /home/dkg/src/notmuch/notmuch/test/tmp.T050-new/mail/moved_messages
dkg       8916  0.0  0.5  68244  3824 pts/1    T    18:49   0:00 notmuch insert --folder=Drafts +draft -unread
dkg       8919  0.0  0.0  13012   704 pts/1    T    18:49   0:00 notmuch new
dkg       8920  0.0  0.0   1412     4 pts/1    t    18:49   0:00 /bin/bash -c exec /home/dkg/src/notmuch/notmuch/notmuch count --output=files \*
------------------------


As you can see in the "STAT" column, nearly all of the hanging processes
are marked with T ("stopped by job control signal" according to ps(1)).

I also note that "t" means "stopped by debugger during the tracing" --
maybe that final line (with "t") is the special one that triggers this?
i don't know.
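
To pick the stopped processes out of a long listing, something like the
following may help.  It is a minimal sketch: list_stopped is a
hypothetical helper name, and it assumes a ps that accepts the pid,
stat and args output keywords (as procps does).

```shell
#!/bin/sh
# Hypothetical helper: print only processes whose STAT field starts
# with T (stopped by a job control signal) or t (stopped by a debugger
# during tracing).  Empty "name=" headers suppress the header line.
list_stopped() {
    ps -e -o pid= -o stat= -o args= | awk '$2 ~ /^[Tt]/ { print }'
}

list_stopped
```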

When i try to connect to any of these stopped processes with "strace -p
$PID", strace reports:

    strace: Process 4204 attached
    --- stopped by SIGTTOU ---

SIGTTOU is novel to me, and i don't really understand why the test suite
would have this problem.  Skimming this guidance:

    http://curiousthing.org/sigttin-sigttou-deep-dive-linux

suggested that maybe if i just decoupled the processes from the terminal
"enough" i could get away with a functioning test suite.  redirecting
all of stdin, stdout, stderr to /dev/null worked!  then i tried pruning
out different pieces, and found that all i needed to do was to redirect
stdin from /dev/null and the test suite would run without problems in
parallel with moreutils parallel.  (it also works with GNU parallel, and
if i run the tests serially).
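
What the stdin redirect changes from a child's point of view can be
checked directly.  The sketch below only shows that fd 0 stops being a
terminal once it is redirected from /dev/null; it does not explain
which process in the test suite was touching the terminal.  The
function name stdin_kind is hypothetical.

```shell
#!/bin/sh
# Report whether stdin (fd 0) is attached to a terminal.
stdin_kind() {
    if [ -t 0 ]; then
        echo "terminal"
    else
        echo "not a terminal"
    fi
}

# With stdin redirected, the function sees /dev/null rather than the tty.
stdin_kind </dev/null
```

Run from an interactive shell without the redirect, the same function
would normally report "terminal" instead.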

So the patch below is a "fix" but it's not a principled one.

the source for moreutils parallel.c doesn't appear to have changed at
all between stretch and buster.  I tried upgrading the version of
moreutils on this stretch system from 0.60-1 to 0.62-1, and i was able
to reproduce the same problem.  So i don't believe the problem is with
moreutils.

Some things that might be different between debian stable (stretch) and
testing (buster):

    package        provides               stretch           buster
    -------        --------               -------           ------
    GNU coreutils  /usr/bin/timeout       8.26-3            8.30-3
    GNU bash       /bin/bash              4.4-5             5.0-4
    dash           /bin/sh (via symlink)  0.5.8-2.4         0.5.10.2-5
    Linux          the kernel             4.9.168-1+deb9u2  4.19.37-3
    GNU gdb        /usr/bin/gdb           7.12-6            8.2.1-2
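
A table like this can be regenerated on each system with dpkg-query.
This is a sketch under the assumption of a Debian-style system; the
helper name pkg_versions is hypothetical, and the package list is just
the set named in this message.

```shell
#!/bin/sh
# Hypothetical helper: print "package<TAB>version" for each named package.
# Assumes a Debian-style system where dpkg-query is available.
pkg_versions() {
    if ! command -v dpkg-query >/dev/null 2>&1; then
        echo "dpkg-query not found" >&2
        return 1
    fi
    dpkg-query -W -f='${Package}\t${Version}\n' "$@"
}

# dpkg-query exits non-zero if any package is missing; carry on anyway.
pkg_versions coreutils bash dash gdb moreutils || true
# The running kernel is easier to read from uname than from a package name.
uname -r
```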

I also tried changing the symlink for /bin/sh to point to bash instead
of dash, and was still able to replicate the problem, so i suspect dash
is not the culprit.

However, i tried selectively upgrading each of these packages *except
for gdb* to the version in buster (or to the version from backports, in
the case of the kernel), and i'm *still* seeing the problem on the
stretch system.

So perhaps it's some interaction between timeout and gdb?  I haven't
managed to test that particular combination yet.

I hope someone else will look into this further, as i'm out of my depth.

  --dkg

diff --git a/test/Makefile.local b/test/Makefile.local
index 47244e8f..3a57b6be 100644
--- a/test/Makefile.local
+++ b/test/Makefile.local
@@ -66,13 +66,13 @@ test-binaries: $(TEST_BINARIES)
 test:  all test-binaries
 ifeq ($V,)
        @echo 'Use "$(MAKE) V=1" to see the details for passing and known broken tests.'
-       @env NOTMUCH_TEST_QUIET=1 $(NOTMUCH_SRCDIR)/$(test_src_dir)/notmuch-test $(OPTIONS)
+       @env NOTMUCH_TEST_QUIET=1 $(NOTMUCH_SRCDIR)/$(test_src_dir)/notmuch-test $(OPTIONS) </dev/null
 else
 # The user has explicitly enabled quiet execution.
 ifeq ($V,0)
-       @env NOTMUCH_TEST_QUIET=1 $(NOTMUCH_SRCDIR)/$(test_src_dir)/notmuch-test $(OPTIONS)
+       @env NOTMUCH_TEST_QUIET=1 $(NOTMUCH_SRCDIR)/$(test_src_dir)/notmuch-test $(OPTIONS) </dev/null
 else
-       @$(NOTMUCH_SRCDIR)/$(test_src_dir)/notmuch-test $(OPTIONS)
+       @$(NOTMUCH_SRCDIR)/$(test_src_dir)/notmuch-test $(OPTIONS) </dev/null
 endif
 endif
 
