Received: from guru.guru-group.fi (unknown [IPv6:2a02:2380:1:9:5054:ff:feb7:a4bc])
	(using TLSv1.2 with cipher AES256-SHA (256/256 bits))
	(No client certificate requested)
	(Authenticated sender: too)
	by lahtoruutu.iki.fi (Postfix) with ESMTPSA id CFF5B1B00257;
	Thu, 25 Feb 2021 21:33:29 +0200 (EET)
From: Tomi Ollila <tomi.ollila@iki.fi>
To: David Bremner, notmuch@notmuchmail.org
Subject: Re: parallel test failures
In-Reply-To: <87wnv4qm7s.fsf@tethera.net>
References: <87wnv4qm7s.fsf@tethera.net>
User-Agent: Notmuch/0.31.4+128~gc67b63a (https://notmuchmail.org) Emacs/27.1
List-Id: "Use and development of the notmuch mail system."
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit

On Fri, Feb 19 2021, David Bremner wrote:

> I have intermittent failures when running the test suite on sufficiently
> parallel machines. I have attached a log of such a failing build,
> although it does not seem especially illuminating.
>
> It takes anywhere from 5 to 300 runs to get a failure for me running on
> 60 hardware threads (30 cores). At least on this machine the number of
> tests that pass seems consistent at 1205

I made the following changes to see file write accesses:

----
diff --git a/test/notmuch-test b/test/notmuch-test
index b58fd3b3..903a5dff 100755
--- a/test/notmuch-test
+++ b/test/notmuch-test
@@ -62,13 +62,16 @@ if test -z "$NOTMUCH_TEST_SERIALIZE" && command -v parallel >/dev/null ; then
         META_FAILURE="parallel test suite returned error code $RES"
     fi
 else
+    rm -rf inw; mkdir inw
     for test in $TESTS; do
+        testname=$(basename $test .sh)
+        inotifywait -d --outfile $PWD/inw/inw-$testname -r -e close_write,delete $PWD/test /tmp
         $TEST_TIMEOUT_CMD $test "$@" &
         wait $!
+        pkill inotifywa
         # If the test failed without producing results, then it aborted,
         # so we should abort, too.
         RES=$?
-        testname=$(basename $test .sh)
         if [[ $RES != 0 && ! -e "$NOTMUCH_BUILDDIR/test/test-results/$testname" ]]; then
             META_FAILURE="Aborting on $testname (returned $RES)"
             break
----

Then I ran the tests with NOTMUCH_TEST_SERIALIZE=t, and after that ran

  for f in inw/*; do echo $f; sed -e 's,.*notmuch/test/, ,' -e '/tmp.T/ s,/.*,,' $f | sort -u; echo; done | less

to examine the "fallout".

Based on that (random glances at the listing), I did not see any
potentially overlapping writes, but I did see an unrelated inconsistency in
the test directories.

Anyway, the log.gz did not show any tests failing, only parallel exiting
nonzero, possibly for some other reason. Cannot say.

Stracing (even with --seccomp-bpf) would probably make it even less likely
to happen :/

Tomi
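
The manual scan of the per-test listings could perhaps be automated. Below is a rough sketch, not something from the original run: `find_overlaps` is a hypothetical helper name, and it assumes the logs use inotifywait's default line format ("<watched dir>/ <EVENTS> <file name>") and that no path contains whitespace. It prints every path that appears in the logs of more than one test.

```sh
#!/bin/sh
# Hypothetical helper: list paths that were written or deleted by more
# than one test, given the directory holding the inw-<testname> logs.
# Assumes inotifywait's default output: "<watched dir>/ <EVENTS> <file>"
# and paths without whitespace.
find_overlaps() {
    for f in "$1"/inw-*; do
        # Join directory and file name, then reduce each log to its
        # set of unique paths so repeats within one test do not count.
        awk '{ print $1 $3 }' "$f" | sort -u
    done | sort | uniq -d   # a path seen twice here came from two logs
}
```

It would be run as `find_overlaps inw | less` from the test directory. Some hits may be harmless (for example files that several tests are expected to touch one after another), so the output is only a starting point for a closer look.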